Calculus of Variations and Partial Differential Equations

Diogo Aguiar Gomes

Contents

Introduction

1. Finite dimensional optimization problems
   1. Unconstrained minimization in Rn
   2. Convexity
   3. Lagrange multipliers
   4. Linear programming
   5. Non-linear optimization with constraints
   6. Bibliographical notes

2. Calculus of variations in one independent variable
   1. Euler-Lagrange Equations
   2. Further necessary conditions
   3. Applications to Riemannian geometry
   4. Hamiltonian dynamics
   5. Sufficient conditions
   6. Symmetries and Noether theorem
   7. Critical point theory
   8. Invariant measures
   9. Non convex problems
   10. Geometry of Hamiltonian systems
   11. Perturbation theory
   12. Bibliographical notes

3. Calculus of variations and elliptic equations
   1. Euler-Lagrange equation
   2. Further necessary conditions and applications
   3. Convexity and sufficient conditions
   4. Direct method in the calculus of variations
   5. Euler-Lagrange equations
   6. Regularity by energy methods
   7. Hölder continuity
   8. Schauder estimates

4. Optimal control and viscosity solutions
   1. Elementary examples and properties
   2. Dynamic programming principle
   3. Pontryagin maximum principle
   4. The Hamilton-Jacobi equation
   5. Verification theorem
   6. Existence of optimal controls - bounded control space
   7. Sub and superdifferentials
   8. Optimal control in the calculus of variations setting
   9. Viscosity solutions
   10. Stationary problems

5. Duality theory
   1. Model problems
   2. Some informal computations
   3. Duality
   4. Generalized Mather problem
   5. Monge-Kantorowich problem

Bibliography

Index

Introduction

This book is dedicated to the study of the calculus of variations and its connections and applications to partial differential equations. We have tried to survey a wide range of techniques and problems, discussing both classical results and more recent developments. This text is suitable for a first one-year graduate course on calculus of variations and optimal control, and is organized in the following way:

1. Finite dimensional optimization problems;
2. Calculus of variations with one independent variable;
3. Calculus of variations and elliptic partial differential equations;
4. Deterministic optimal control and viscosity solutions;
5. Duality theory.

The first chapter is dedicated to finite dimensional optimization, giving emphasis to techniques that can be generalized and applied to infinite dimensional problems. This chapter starts with an elementary discussion of unconstrained optimization in Rn and convexity. Then we discuss constrained optimization problems, linear programming and the KKT conditions.

The following chapter concerns variational problems with one independent variable. We study classical results, including applications to Riemannian geometry and classical mechanics. We also discuss sufficient conditions for minimizers, Hamiltonian dynamics and several other related topics.

The next chapter concerns variational problems with functionals defined through multiple integrals. In many of these problems, the Euler-Lagrange equation is an elliptic partial differential equation, possibly non-linear. Using the direct method in the calculus of variations, we prove the existence of minimizers. Then we show that the minimum is a weak solution to the Euler-Lagrange equation and study its regularity. The study of regularity follows the classical path: first we consider energy methods, then we prove the De Giorgi-Nash-Moser estimates and finally the Schauder estimates.

In the fourth chapter we consider optimal control problems. We study both classical control theory methods, such as dynamic programming and the Pontryagin maximum principle, as well as more recent tools, such as viscosity solutions of Hamilton-Jacobi equations.

The last chapter is a brief introduction to (infinite dimensional) duality theory and its applications to non-linear partial differential equations. We study Mather's problem and the Monge-Kantorowich optimal mass transport problem. These have important relations with the Hamilton-Jacobi and Monge-Ampère equations, respectively.

The prerequisites of these notes are some familiarity with Sobolev spaces and functional analysis, at the level of [Eva98b]. With a few exceptions, we do not assume familiarity with partial differential equations beyond elementary theory. Many of the results discussed, as well as important extensions, can be found in the bibliography. In what concerns finite dimensional optimization and linear programming, the main reference is [Fra02]. On variational problems with one independent variable, a key reference is [AKN97]. The approach to elliptic equations in chapter 3 was strongly influenced by the course taught by Fraydoun Rezakhanlou which the author attended at the University of California at Berkeley, by the (unpublished) notes on elliptic equations by my advisor L. C. Evans, and by the book [Gia83]. The books [GT01] and [Gia93] are also classical references in this area. Optimal control problems are discussed in chapter 4; the main references are [Eva98b], [Lio82], [Bar94], [FS93], [BCD97]. The last chapter concerns duality theory. We recommend the books [Eva99], [Vil03a], [Vil], as well as the author's papers [Gom00], [Gom02b].


I would like to thank my students: Tiago Alcaria, Patrícia Engrácia, Sílvia Guerra, Igor Kravchenko, Anabela Pelicano, Ana Rita Pires, Verónica Quítalo, Lucian Radu, Joana Santos, Ana Santos, and Vitor Saraiva, who took courses based on part of these notes and suggested several corrections and improvements. My friend Pedro Girão deserves special thanks, as he read the first LaTeX version of these notes and suggested many corrections and improvements.

1. Finite dimensional optimization problems

This chapter is an introduction to optimization problems in finite dimension. We are certain that many of the results discussed, as well as their proofs, are familiar to the reader. However, we feel that it is instructive to recall them and, throughout this text, observe how they can be adapted to infinite dimensional problems.

The plan of this chapter is the following: we start in §1 by considering unconstrained minimization problems in Rn; we discuss existence and uniqueness of minimizers, as well as first and second order tests for minimizers. The following section, §2, concerns properties of convex functions which will be needed throughout the text. We start the discussion of constrained optimization problems in §3 by studying the Lagrange multiplier method for equality constraints. Then, the general case involving both equality and inequality constraints is discussed in the two remaining sections. In §4 we consider linear programming problems, and in §5 we discuss non-linear optimization problems and derive the Karush-Kuhn-Tucker (KKT) conditions. The chapter ends with a few bibliographical references.

The general setting of optimization problems is the following: given a function f : Rn → R and a set X ⊂ Rn, called the admissible set, we would like to solve the minimization problem

(1)    min f(x)  subject to  x ∈ X,

i.e., to find the solution set S ⊂ X such that

f(y) = inf_X f,

for all y ∈ S. We should note that the "min" in (1) should be read "minimize" rather than "minimum", as the minimum may not be achieved. The number inf_X f is called the value of problem (1).

1. Unconstrained minimization in Rn

In this section we address the unconstrained minimization case, that is, the case in which the admissible set X is Rn. Let f : Rn → R be an arbitrary function. We look for conditions on f that

• ensure the existence of a minimum;
• show that this minimum is unique.

In many instances, existence and uniqueness results are not enough: we would also like to

• determine necessary or sufficient conditions for a point to be a minimum;
• estimate the location of a possible minimum.

By looking for all points that satisfy necessary conditions one can determine a set of candidate minimizers. Then, by looking at sufficient conditions one may in fact be able to show that some of these points are indeed minimizers.

To study the existence of a minimum of f, we can use the following procedure, called the direct method of the calculus of variations: let (x_n) be a minimizing sequence, that is, a sequence such that

f(x_n) → inf f.

Proposition 1. Let A be an arbitrary set and f : A → R. Then there exists a minimizing sequence.


Proof. If inf_A f = −∞, there exists x_n ∈ A such that f(x_n) → −∞. Otherwise, if inf_A f > −∞, we can always find x_n ∈ A such that

inf_A f ≤ f(x_n) ≤ inf_A f + 1/n,

which again produces a minimizing sequence. □

Let f : Rn → R. Suppose (x_n) is a minimizing sequence for f. If x_n (or some subsequence) converges to a point x and, additionally, f(x_n) converges to f(x), then x is a minimum of f: indeed, f(x) = lim f(x_n) and lim f(x_n) = inf f, because x_n is a minimizing sequence; thus f(x) = inf f.

Although minimizing sequences always exist, they may fail to converge, even up to subsequences, as the next exercise illustrates:

Exercise 1. Consider the function f(x) = e^{−x}. Compute inf f and give an example of a minimizing sequence. Show that no minimizing sequence for f converges.

As the previous exercise suggests, to ensure convergence it is natural to impose certain compactness conditions. In Rn, any bounded sequence (x_n) has a convergent subsequence. A convenient condition on f that ensures boundedness of minimizing sequences is coercivity: a function f : Rn → R is called coercive if

f(x) → +∞, as |x| → ∞.

Exercise 2. Let f be a coercive function and let x_n be a sequence such that f(x_n) is bounded. Show that x_n is bounded. Note in particular that if f(x_n) is convergent then x_n is bounded.

Therefore, from the previous exercise, it follows:

Proposition 2. Let f : Rn → R be a coercive function and let (x_n) be a minimizing sequence for f. Then there exists a point x for which, along some subsequence, x_n → x.


Unfortunately, if f is discontinuous at x, f(x_n) may fail to converge to f(x). This poses a problem because, if x_n is a minimizing sequence, f(x_n) → inf f, and if this limit is not f(x) then x cannot be a minimizer. It would, therefore, seem natural to require f to be continuous. However, to establish that x is a minimizer we do not really need continuity. In fact, a weaker property is sufficient: it is enough that for any sequence (x_n) converging to x the following inequality holds:

(2)    lim inf f(x_n) ≥ f(x).

A function f is called lower semicontinuous if inequality (2) holds for any point x and any sequence x_n converging to x.

Example 1. The function

f(x) = 1 if x ≠ 0,  f(x) = 0 if x = 0,

is lower semicontinuous. However,

g(x) = 0 if x ≠ 0,  g(x) = 1 if x = 0,

is not. J

ADD HERE GRAPH OF FUNCTIONS

Proposition 3. Let f : Rn → R be lower semicontinuous and let (x_n) ⊂ Rn be a minimizing sequence converging to x ∈ Rn. Then x is a minimizer of f.

Proof. Let x_n be a minimizing sequence. Then

inf f = lim f(x_n) = lim inf f(x_n) ≥ f(x),

that is, f(x) ≤ inf f. □

Lower semicontinuity is a weaker property than continuity, and is therefore easier to satisfy.


Establishing the uniqueness of minimizers is, in general, more complex. A convenient condition that implies uniqueness of minimizers is convexity. A set A ⊂ Rn is convex if for all x, y ∈ A and any 0 ≤ λ ≤ 1 we have λx + (1 − λ)y ∈ A. Let A be a convex set. A function f : A → R is convex if, for any x, y ∈ A and 0 ≤ λ ≤ 1,

f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y),

and it is uniformly convex if there exists θ > 0 such that for all x, y ∈ A and 0 ≤ λ ≤ 1,

f(λx + (1 − λ)y) + θλ(1 − λ)|x − y|² ≤ λf(x) + (1 − λ)f(y).

Example 2. Let ‖ · ‖ be any norm in Rn. Then, by the triangle inequality,

‖λx + (1 − λ)y‖ ≤ ‖λx‖ + ‖(1 − λ)y‖ = λ‖x‖ + (1 − λ)‖y‖,

for all 0 ≤ λ ≤ 1. Thus the mapping x ↦ ‖x‖ is convex. J

Exercise 3. Show that the square of the Euclidean norm in Rd, ‖x‖² = Σ_k x_k², is uniformly convex.

Proposition 4. Let A ⊂ Rn be a convex set and f : A → R be a convex function. If x and y are minimizers of f then so is λx + (1 − λ)y, for any 0 ≤ λ ≤ 1. If f is uniformly convex then x = y.

Proof. If x and y are minimizers then f(x) = f(y) = min f. Consequently, by convexity,

f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y) = min f.

Therefore λx + (1 − λ)y is a minimizer of f. If f is uniformly convex, choosing 0 < λ < 1 we obtain

f(λx + (1 − λ)y) + θλ(1 − λ)|x − y|² ≤ min f,

and since f(λx + (1 − λ)y) ≥ min f, this forces |x − y| = 0, that is, x = y. □


The characterization of minimizers through necessary or sufficient conditions is usually made by introducing certain conditions that involve first or second derivatives. Let f : Rn → R be a C² function. Recall that Df and D²f denote, respectively, the first and second derivatives of f. We also write A ≥ 0, for an n × n matrix A, if A is positive semi-definite, and A > 0 if A is positive definite. The next proposition is a well known result that illustrates this:

Proposition 5. Let f : Rn → R be a C² function and x a minimizer of f. Then

Df(x) = 0 and D²f(x) ≥ 0.

Proof. For any vector y ∈ Rn and ε > 0 we have

0 ≤ f(x + εy) − f(x) = εDf(x)y + O(ε²).

Dividing by ε and letting ε → 0, we obtain Df(x)y ≥ 0. Since y is arbitrary (we may replace y by −y), we conclude that Df(x) = 0. In a similar way,

0 ≤ [f(x + εy) + f(x − εy) − 2f(x)] / ε² = yᵀD²f(x)y + o(1),

and so, when ε → 0, we obtain

yᵀD²f(x)y ≥ 0. □

Let f : Rn → R be a C¹ function. A point x is called a critical point of f if Df(x) = 0.

Exercise 4. Let A be any set and f : A → R be a C¹ function in the interior int A of A. Show that any maximizer or minimizer of f is either a critical point or lies on the boundary ∂A of A.
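The conditions in Proposition 5 are easy to check numerically on a concrete function. The following minimal Python sketch uses the hypothetical example f(x, y) = (x − 1)² + xy + y² (not taken from the text) and verifies that the gradient vanishes and the Hessian is positive definite at the critical point.

    # Check Df = 0 and D^2 f > 0 for f(x, y) = (x - 1)^2 + x*y + y^2
    # (a hypothetical example).
    import numpy as np

    def grad(z):
        x, y = z
        return np.array([2 * (x - 1) + y, x + 2 * y])

    hessian = np.array([[2.0, 1.0], [1.0, 2.0]])   # constant, since f is quadratic

    # The gradient is affine: grad(z) = hessian @ z + (-2, 0), so solve Df = 0.
    critical = np.linalg.solve(hessian, np.array([2.0, 0.0]))
    print("critical point:", critical)                       # (4/3, -2/3)
    print("gradient there:", grad(critical))                 # ~ (0, 0)
    print("Hessian eigenvalues:", np.linalg.eigvalsh(hessian))  # 1 and 3, both > 0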


We will now show that any critical point of a convex function is a minimizer. For that we need the following preliminary result:

Proposition 6. Let f : Rn → R be a C¹ convex function. Then, for any x, y, we have

f(y) ≥ f(x) + Df(x)(y − x).

Proof. By convexity we have

(1 − λ)f(x) + λf(y) ≥ f(x + λ(y − x)) = f(x) + λDf(x)(y − x) + o(λ|y − x|).

Thus, reorganizing the inequality and dividing by λ, we obtain

f(y) ≥ f(x) + Df(x)(y − x) + o(1),

as λ → 0. □

We can now use this result to prove:

Proposition 7. Let f : Rn → R be a C¹ convex function and x a critical point of f. Then x is a minimizer of f.

Proof. Since Df(x) = 0 and f is convex, it follows from Proposition 6 that f(y) ≥ f(x), for all y. □

Exercise 5. Let f(x, λ) : Rn × Rm → R be a C² function, and x₀ a minimizer of f(·, 0) with D²ₓₓf(x₀, 0) positive definite. Show that, for each λ in a neighborhood of λ = 0, there exists a unique local minimizer x_λ of f(·, λ) with x_λ|_{λ=0} = x₀. Compute D_λ x_λ at λ = 0.

Growth conditions on f can be used to estimate the norm of a minimizer. In finite dimensional problems, estimates on the norm of a minimizer are important for numerical methods. For instance, if such an estimate exists, it makes it possible to localize the search region for a minimizer. In infinite dimensional problems this issue is even more relevant, as will become clear later in these notes. An elementary result is given in the next exercise:

Exercise 6. Let f : Rn → R be such that f(x) ≥ C₁|x|² + C₂, with C₁ > 0. Let x₀ be a minimizer of f. Show that

|x₀| ≤ √((f(y) − C₂)/C₁),

for any y ∈ Rn.

Exercise 7. Let f(x, λ) : R² → R be a continuous function. Suppose for each λ there is at least one minimizer x_λ of x ↦ f(x, λ). Suppose there exists C such that |x_λ| ≤ C for all λ in a neighborhood of λ = 0. Suppose that for λ = 0 there exists a unique minimizer x₀. Show that lim_{λ→0} x_λ = x₀.

Exercise 8. Let f ∈ C¹(R²). Define u(x) = inf_{y∈R} f(x, y). Suppose that

lim_{|y|→∞} f(x, y) = +∞,

uniformly in x. Let x₀ be a point at which the infimum in y of f is achieved at a single point y₀. Show that u is differentiable in x at x₀ and that

∂u/∂x (x₀) = ∂f/∂x (x₀, y₀).

Give an example that shows that u may fail to be differentiable if the infimum of f in y is achieved at more than one point.

Exercise 9. Find all maxima and minima (both local and global) of the function xy(1 − x² − y²) on the square −1 ≤ x, y ≤ 1.

2. Convexity

As we discussed in the previous section, convexity is a central property in optimization. In this section we discuss additional properties of convex functions which will be necessary in the sequel.


2.1. Characterization of convex functions. We now discuss several tools that are useful to characterize convex functions. We first observe that given a family of convex functions it is possible to build another convex function by taking the pointwise supremum. This is a useful construction and is illustrated in figure ADD FIGURE HERE

Proposition 8. Let I be an arbitrary set and fι : Rn → R, ι ∈ I, an indexed collection of convex functions. Let

f(x) = sup_{ι∈I} fι(x).

Then f is convex.

Proof. Let x, y ∈ Rn and 0 ≤ λ ≤ 1. Then

f(λx + (1 − λ)y) = sup_{ι∈I} fι(λx + (1 − λ)y) ≤ sup_{ι∈I} [λfι(x) + (1 − λ)fι(y)]
≤ sup_{ι₁∈I} λfι₁(x) + sup_{ι₂∈I} (1 − λ)fι₂(y) = λf(x) + (1 − λ)f(y). □

Corollary 9. Suppose f : Rn → R is a C¹ function satisfying

f(y) ≥ f(x) + Df(x)(y − x),

for all x and y. Then f is convex.

Proof. It suffices to observe that

f(y) ≥ sup_{x∈Rn} [f(x) + Df(x)(y − x)],

and the right-hand side, as a pointwise supremum of affine functions of y, is convex by Proposition 8. Finally, taking x = y in the supremum,

sup_{x∈Rn} [f(x) + Df(x)(y − x)] ≥ f(y),

and so equality holds; hence f is convex. □

Proposition 10. Let f : Rn → R be a C² function. Then f is convex if and only if D²f(x) is positive semi-definite for all x ∈ Rn.


Proof. Observe that if f is convex then, for any y ∈ Rn and any ε > 0, we have

[f(x − εy) + f(x + εy) − 2f(x)] / ε² ≥ 0.

Sending ε → 0 and using Taylor's formula, we conclude that yᵀD²f(x)y ≥ 0, and so D²f(x) is positive semi-definite.

Conversely,

f(y) − f(x) = ∫₀¹ Df(x + s(y − x))(y − x) ds
= Df(x)(y − x) + ∫₀¹ [Df(x + s(y − x))(y − x) − Df(x)(y − x)] ds
= Df(x)(y − x) + ∫₀¹ [ ∫₀¹ s (y − x)ᵀ D²f(x + ts(y − x))(y − x) dt ] ds
≥ Df(x)(y − x),

since (y − x)ᵀD²f(x + ts(y − x))(y − x) ≥ 0 by the positive semi-definiteness hypothesis. Therefore f(y) ≥ f(x) + Df(x)(y − x) for all x, y, and f is convex by Corollary 9. □

Proposition 11. Let f : Rn → R be a continuous function. Then f is convex if and only if

(3)    f(x + y) + f(x − y) − 2f(x) ≥ 0,

for any x, y ∈ Rn.

Proof. Clearly convexity implies (3). Conversely, let x, y ∈ Rn and 0 ≤ λ ≤ 1 be such that λx + (1 − λ)y = z. We must prove that

(4)    λf(x) + (1 − λ)f(y) ≥ f(z)

holds. We claim that (4) holds for any dyadic rational λ = k/2^j, 0 ≤ k ≤ 2^j. Clearly (4) holds when j = 1, since this is exactly (3) applied at the midpoint. Now we proceed by induction on j. Assume that (4) holds for every λ = k/2^j; we claim that it then holds for every λ = k/2^{j+1}. If k is even we can reduce the fraction, therefore we may suppose that k is odd, λ = k/2^{j+1} and λx + (1 − λ)y = z. Now note that

z = ½ [ ((k−1)/2^{j+1}) x + (1 − (k−1)/2^{j+1}) y ] + ½ [ ((k+1)/2^{j+1}) x + (1 − (k+1)/2^{j+1}) y ].

Thus, by (3),

f(z) ≤ ½ f( ((k−1)/2^{j+1}) x + (1 − (k−1)/2^{j+1}) y ) + ½ f( ((k+1)/2^{j+1}) x + (1 − (k+1)/2^{j+1}) y ).

But, since k − 1 and k + 1 are even, k̃₀ = (k−1)/2 and k̃₁ = (k+1)/2 are integers. Hence

f(z) ≤ ½ f( (k̃₀/2^j) x + (1 − k̃₀/2^j) y ) + ½ f( (k̃₁/2^j) x + (1 − k̃₁/2^j) y ),

and the induction hypothesis implies that

f(z) ≤ ((k̃₀ + k̃₁)/2^{j+1}) f(x) + (1 − (k̃₀ + k̃₁)/2^{j+1}) f(y).

Since k̃₀ + k̃₁ = k, we get

f(z) ≤ (k/2^{j+1}) f(x) + (1 − k/2^{j+1}) f(y).

Since f is continuous and the rationals of the form k/2^j are dense in [0, 1], we conclude that

f(z) ≤ λf(x) + (1 − λ)f(y),

for any real 0 ≤ λ ≤ 1. □

Exercise 10. Let f : Rn → R be a C² function. Show that the following statements are equivalent:

1. f is uniformly convex;
2. D²f ≥ γ > 0, for some γ > 0;
3. f((x + y)/2) + θ|x − y|²/4 ≤ (f(x) + f(y))/2, for some θ > 0;
4. f(y) ≥ f(x) + Df(x)(y − x) + (γ/2)|x − y|², for some γ > 0.

Exercise 11. Let ϕ : R → R be a non-decreasing convex function and ψ : Rn → R a convex function. Show that ϕ ∘ ψ is convex. Show, by giving an example, that if ϕ is not non-decreasing then ϕ ∘ ψ may fail to be convex.
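The Hessian criterion of Proposition 10 also gives a practical numerical test. The sketch below, for the hypothetical function f(x, y) = x⁴ + y², samples the smallest eigenvalue of D²f on a grid; sampling gives evidence of convexity, not a proof.

    # Sample the smallest Hessian eigenvalue of f(x, y) = x^4 + y^2
    # (a hypothetical example); D^2 f = diag(12 x^2, 2).
    import numpy as np

    def hess(x, y):
        return np.array([[12 * x**2, 0.0], [0.0, 2.0]])

    pts = [(x, y) for x in np.linspace(-2, 2, 9) for y in np.linspace(-2, 2, 9)]
    min_eig = min(np.linalg.eigvalsh(hess(x, y)).min() for x, y in pts)
    print("smallest sampled eigenvalue:", min_eig)   # 0, consistent with convexity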


2.2. Lipschitz continuity. Convex functions enjoy remarkable properties. We will first show that any convex function is locally bounded and locally Lipschitz.

Proposition 12. Let f : Rd → R be a convex function. Then f is locally bounded and locally Lipschitz.

Proof. For x ∈ Rd denote |x|₁ = Σ_k |x_k|, and define X_M = {x ∈ Rd : |x|₁ ≤ M}. We will prove that f is bounded on X_{M/8}. Any point x ∈ X_M can be written as a convex combination of the points ±M e_k, where e_k is the k-th standard unit vector. Thus

f(x) ≤ max_k max{f(M e_k), f(−M e_k)},

so f is bounded from above on X_M. Suppose now that f is not bounded from below on X_{M/8}. Then there exists a sequence x_n ∈ X_{M/8} such that f(x_n) → −∞. Choose a point y ∈ X_{M/4} ∩ X_{M/8}ᶜ. Note that 2y − x_n ∈ X_M. Therefore we can write 2y − x_n as a convex combination of the points ±M e_k, i.e.

y = ½ x_n + ½ Σ_k Σ_± λ_k^± (±M e_k).

Thus

f(y) ≤ ½ f(x_n) + ½ max_k max{f(M e_k), f(−M e_k)},

which is a contradiction if f(x_n) → −∞.

Now we will show the second part of the proposition, i.e., that any convex function is also locally Lipschitz. Arguing by contradiction, and changing coordinates if necessary, we can assume that f is not Lipschitz at 0, that is, there exists a sequence x_n → 0 such that

|f(x_n) − f(0)| ≥ C|x_n|₁,

for all C and all n large enough. In particular, this implies that

lim sup_{n→∞} [f(x_n) − f(0)] / |x_n|₁ ∈ {−∞, +∞}

and, similarly,

lim inf_{n→∞} [f(x_n) − f(0)] / |x_n|₁ ∈ {−∞, +∞}.

By the previous part of the proof, we can assume that f is bounded on X₁. For each n choose a point y_n with |y_n|₁ = 1 such that x_n = |x_n|₁ y_n. Then

f(x_n) ≤ |x_n|₁ f(y_n) + (1 − |x_n|₁) f(0),

which implies

f(y_n) ≥ f(0) + [f(x_n) − f(0)] / |x_n|₁.

Therefore

(5)    lim sup_{n→∞} [f(x_n) − f(0)] / |x_n|₁ = −∞,

otherwise we would have a contradiction (note that f(y_n) is bounded). We can also write

0 = (1/(1 + |x_n|₁)) x_n + (|x_n|₁/(1 + |x_n|₁)) (−y_n),

so

f(0) ≤ (1/(1 + |x_n|₁)) f(x_n) + (|x_n|₁/(1 + |x_n|₁)) f(−y_n).

This implies

f(−y_n) ≥ f(0) + [f(0) − f(x_n)] / |x_n|₁.

Because f(−y_n) is bounded, the same dichotomy yields

lim sup_{n→∞} [f(0) − f(x_n)] / |x_n|₁ = −∞,

which is a contradiction to (5). □

2.3. Separation. In this last subsection we study separation properties that arise from convexity and present some applications.

Proposition 13. Let C be a closed convex set not containing the origin. Then there exists x₀ ∈ C which minimizes |x| over all x ∈ C.


Proof. Consider a minimizing sequence x_n. By a simple computation, we have the parallelogram identity

‖(x_n + x_m)/2‖² + ¼‖x_n − x_m‖² = ½‖x_n‖² + ½‖x_m‖².

Because (x_n + x_m)/2 ∈ C, by convexity, we have the inequality

‖(x_n + x_m)/2‖² ≥ inf_{y∈C} ‖y‖².

As n, m → ∞ we also have

‖x_n‖², ‖x_m‖² → inf_{y∈C} ‖y‖².

But then, as n, m → ∞, we conclude that ‖x_n − x_m‖² → 0. Therefore any minimizing sequence is a Cauchy sequence and hence convergent; since C is closed, the limit belongs to C and minimizes |x|. □

Exercise 12. Let F : Rn → R be a uniformly convex function. Show that any minimizing sequence for F is a Cauchy sequence. Hint:

F(x_n) + F(x_m) − 2 inf F ≥ F(x_n) + F(x_m) − 2F((x_n + x_m)/2) ≥ (θ/2)|x_n − x_m|².

Proposition 14. Let U and V be disjoint closed convex sets, and suppose one of them is compact. Then there exist w ∈ Rn and a > 0 such that

(w, x − y) ≥ a > 0,

for all x ∈ U and y ∈ V.

Proof. Consider the closed convex set W = U − V (this set is closed because either U or V is compact). Then there exists a point w ∈ W with minimal norm. Since 0 ∉ W, w ≠ 0. So, for all x ∈ U and y ∈ V, by the convexity of W,

‖w‖² ≤ ‖λ(x − y) + (1 − λ)w‖² = (1 − λ)²‖w‖² + 2λ(1 − λ)(x − y, w) + λ²‖x − y‖².

The last inequality implies

0 ≤ ((1 − λ)² − 1)‖w‖² + 2λ(1 − λ)(x − y, w) + λ²‖x − y‖².

Dividing by λ and letting λ → 0, we obtain

(x − y, w) ≥ ‖w‖² > 0. □

As a first application of the separation result we discuss a generalization of derivatives for convex functions. The subdifferential ∂⁻f(x) of a convex function f : Rn → R at a point x ∈ Rn is the set of vectors p ∈ Rn such that

f(y) ≥ f(x) + p · (y − x),

for all y ∈ Rn.

Proposition 15. Let f : Rn → R be a convex function and x₀ ∈ Rn. Then ∂⁻f(x₀) ≠ ∅.

Proof. Consider the set E(f) = {(x, y) ∈ Rn+1 : y ≥ f(x)}, the epigraph of f. Because f is convex and hence continuous, E(f) is a closed convex set. Consider the sequence y_n = f(x₀) − 1/n. Because, for each n, the sets E(f) and {(x₀, y_n)} are disjoint closed convex sets, and the second one is compact, there is a separating plane:

(6)    f(x̃) ≥ α_n(x̃ − x₀) + β_n,

for all x̃, and

(7)    f(x₀) − 1/n = y_n ≤ β_n ≤ f(x₀).

Thus, from (7), we get that β_n is bounded. Since f is locally bounded, the inequality (6) implies the boundedness of α_n. Therefore, up to a subsequence, there exist α = lim α_n and β = lim β_n. Furthermore,

f(x̃) ≥ α(x̃ − x₀) + β,

and, again using (7), we get that f(x₀) = β. Thus

f(x̃) ≥ α(x̃ − x₀) + f(x₀),

and so α ∈ ∂⁻f(x₀). □

Exercise 13. Let f : R → R be given by f(x) = |x|. Compute ∂⁻f.

Exercise 14. Let f : Rn → R be convex. Show that if f is differentiable at x ∈ Rn then ∂⁻f(x) = {Df(x)}.

Proposition 16. Let f : Rn → R be a C¹ convex function. Then

(Df(x) − Df(y)) · (x − y) ≥ 0.

Proof. Observe that

f(y) ≥ f(x) + Df(x) · (y − x),
f(x) ≥ f(y) + Df(y) · (x − y).

Adding these two inequalities yields the result. □

Exercise 15. Prove the analogue of the previous proposition for the case in which f is not C¹, replacing derivatives by points in the subdifferential.

Exercise 16. Let f be a uniformly convex function. Show that

(Df(x) − Df(y)) · (x − y) ≥ γ|x − y|².

Exercise 17. Let f : Rn → R be a convex function. Show that a point x ∈ Rn is a minimizer of f if and only if 0 ∈ ∂⁻f(x).

Exercise 18. Let A be a convex set and f : A → R be a uniformly convex function. Let x ∈ A be a maximizer of f. Show that x is an extreme point, that is, that there are no y, z ∈ A, x ≠ y, z, and 0 < λ < 1 such that x = λy + (1 − λ)z.

The second application of Proposition 14 is a very important result called the Farkas lemma:

Lemma 17 (Farkas lemma). Let A be an m × n matrix and c a row vector in Rn. Then one and only one of the following alternatives holds:

1. c = yᵀA, for some y ≥ 0;
2. there exists a column vector w ∈ Rn such that Aw ≤ 0 and cw > 0.


Proof. If the first alternative does not hold, the sets U = {yᵀA : y ≥ 0} and V = {c} are disjoint, closed and convex, and V is compact. Then the separation theorem for convex sets (Proposition 14) implies that there exists a hyperplane with normal w which separates them, that is,

(8)    yᵀAw ≤ a, for all y ≥ 0,

and

cw > a.

Note that a ≥ 0 (by setting y = 0 in (8)), so cw > 0. Furthermore, for any γ ≥ 0 we have γyᵀAw ≤ a; by letting γ → +∞ we conclude that yᵀAw ≤ 0 for all y ≥ 0, and hence Aw ≤ 0. So this corresponds to the second alternative. □

Example 3. Consider a discrete state one-period pricing model, that is, we are given n assets which at the initial time cost cᵢ, 1 ≤ i ≤ n, per unit (we regard c as a row vector), and after one unit of time asset i is worth P_{ji} with probability p_j, 1 ≤ j ≤ m. A portfolio is a (column) vector π ∈ Rn. The value of the portfolio at time 0 is cπ, and at time one, with probability p_j, the value is (Pπ)_j. An arbitrage opportunity is a portfolio such that cπ < 0 and (Pπ)_j ≥ 0 for all j, i.e. a portfolio with negative cost and non-negative return. The Farkas lemma yields that either

1. there exists y ∈ Rm, y ≥ 0, such that c = yᵀP, or
2. there exists an arbitrage portfolio.

Furthermore, if one of the assets is a no-interest bearing bank account, for instance c₁ = 1 and P_{j1} = 1 for all j, then y is a probability vector, which in general may be different from p. J
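The dichotomy in Example 3 can be tested numerically: look for state prices y ≥ 0 with c = yᵀP by solving a linear feasibility problem. A minimal sketch, assuming scipy is available and using hypothetical price and payoff data:

    # Farkas alternative for the pricing model: find y >= 0 with P^T y = c^T,
    # or conclude that an arbitrage portfolio exists (hypothetical data).
    import numpy as np
    from scipy.optimize import linprog

    c = np.array([1.0, 0.9])            # asset prices (row vector c)
    P = np.array([[1.0, 1.5],           # P[j, i] = payoff of asset i in state j
                  [1.0, 0.5]])

    # Pure feasibility: zero objective, equality constraints P^T y = c, y >= 0.
    res = linprog(c=np.zeros(P.shape[0]), A_eq=P.T, b_eq=c,
                  bounds=[(0, None)] * P.shape[0])
    if res.success:
        print("state-price vector y:", res.x)    # here y = (0.4, 0.6): no arbitrage
    else:
        print("no state prices exist: an arbitrage portfolio exists")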


3. Lagrange multipliers

Many important problems require minimizing (or maximizing) functions under equality constraints. The Lagrange multiplier method is the standard tool to study these problems. For inequality constraints, the Lagrange multiplier method can be extended in a suitable way, as will be studied in the two following sections.

Proposition 18. Let f : Rn → R and g : Rn → Rm (m < n) be C¹ functions. Fix c ∈ Rm, and assume that the rank of Dg is m at all points of the set g = c. Then, if x₀ is a minimum of f in the set g(x) = c, there exists λ ∈ Rm such that

Df(x₀) = λᵀDg(x₀).

Proof. Let x₀ be as in the statement. Suppose that w₁, . . . , wₘ are vectors in Rn satisfying

det[Dg(x₀)W] ≠ 0,

where W ≡ [w₁ · · · wₘ] is the matrix with columns w₁, . . . , wₘ. Note that it is possible to choose such vectors because the rank of Dg is m. Given v ∈ Rn, consider the equation

g(x₀ + εv + W i) = c.

The implicit function theorem implies that there exists a unique function i(ε) = (i₁(ε), . . . , iₘ(ε)), defined in a neighborhood of ε = 0, with i(0) = 0, such that

g(x₀ + εv + W i(ε)) = c.

Additionally,

i′(0) = −(Dg(x₀)W)⁻¹ Dg(x₀)v.

Since x₀ is a minimizer of f in the set g(x) = c, the function

I(ε) = f(x₀ + εv + W i(ε))


satisfies

0 = I′(0) = Df(x₀)v + Df(x₀)W i′(0),

that is,

Df(x₀)v = λᵀDg(x₀)v,  with  λᵀ = Df(x₀)W (Dg(x₀)W)⁻¹,

for any vector v. □

Proposition 19. Let f : Rn → R and g : Rn → Rm, with m < n, be smooth functions. Assume that Dg has maximal rank at all points. Let x_c be a minimizer of f(x) under the constraint g(x) = c, and λ_c the corresponding Lagrange multiplier, i.e.

(9)    Df(x_c) = λ_c Dg(x_c).

Suppose that x_c is a differentiable function of c. Define V(c) = f(x_c). Then D_c V(c) = λ_c.

Proof. We have g(x_c) = c. By differentiating with respect to c we obtain

Dg(x_c) ∂x_c/∂c = I.

Multiplying by λ_c and using (9) yields

λ_c = λ_c Dg(x_c) ∂x_c/∂c = Df(x_c) ∂x_c/∂c = D_c V(c). □

Exercise 19. Let f : Rn → R and g : Rn → Rm, with m < n, be smooth functions. Assume that Dg has maximal rank at all points. Let x₀ be a minimizer of f(x) under the constraint g(x) = g(x₀), λ the corresponding Lagrange multiplier, and F = f + λg. Show that

D²_{xᵢxⱼ}F(x₀) ξᵢξⱼ ≥ 0,

for all vectors ξ that satisfy D_{xᵢ}g(x₀) ξᵢ = 0.
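On a concrete problem, the Lagrange system of Proposition 18 can be solved symbolically. A minimal sketch, assuming sympy and the hypothetical problem of minimizing x² + y² on the line x + y = 1:

    # Solve Df = lam * Dg together with g = 0 for f = x^2 + y^2,
    # g = x + y - 1 (hypothetical example).
    import sympy as sp

    x, y, lam = sp.symbols('x y lam', real=True)
    f = x**2 + y**2
    g = x + y - 1

    eqs = [sp.diff(f, x) - lam * sp.diff(g, x),
           sp.diff(f, y) - lam * sp.diff(g, y),
           g]
    print(sp.solve(eqs, [x, y, lam]))   # [(1/2, 1/2, 1)]
    # Consistent with Proposition 19: along g = c the value is V(c) = c**2/2,
    # so Dc V(1) = 1, which equals the multiplier lam.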


Proposition 20. Let f : Rn → R and g : Rn → Rm, with m < n, be C¹ functions. Let x₀ be a minimizer of f(x) under the constraint g(x) = g(x₀). Then there exist constants λ₀, . . . , λₘ, not all zero, such that

λ₀Df + λ₁Dg₁ + · · · + λₘDgₘ = 0

at x₀. Furthermore, if Dg has maximal rank, we can choose λ₀ = 1.

Proof. First observe that the (m + 1) × n matrix whose rows are Df and the rows of Dg cannot have rank m + 1. Indeed, if it did, applying the implicit function theorem to the function (x, c) ↦ (f(x) − c₀, g(x) − c′), with x ∈ Rn and c = (c₀, c′) ∈ R × Rm, we could solve f(x) = c₀ and g(x) = g(x₀) with c₀ < f(x₀), contradicting the fact that x₀ is a minimizer. This fact then implies that there exist constants λ₀, . . . , λₘ, not all zero, such that

λ₀Df + λ₁Dg₁ + · · · + λₘDgₘ = 0

at x₀. Observe also that if Dg has maximal rank we can choose λ₀ = 1. In fact, if λ₀ ≠ 0, it suffices to multiply λ by 1/λ₀. To see that λ₀ ≠ 0 we argue by contradiction: if λ₀ = 0 we would have

λ₁Dg₁ + · · · + λₘDgₘ = 0,

which contradicts the hypothesis that Dg has maximal rank m. □

Example 4 (Minimax principle). There exists a nice formal interpretation of Lagrange multipliers which, although not rigorous, is quite useful. Fix c ∈ Rm, and consider the problem of minimizing a function f : Rn → R under the constraint g(x) − c = 0, with g : Rn → Rm. This problem can be rewritten as

min_x max_λ f(x) + λᵀ(g(x) − c).

The minimax principle asserts that the maximum can be exchanged with the minimum (which is frequently false) and, therefore, we obtain the "equivalent" problem

max_λ min_x f(x) + λᵀ(g(x) − c).

From this we deduce that, for each λ, the minimum x_λ is determined by

(10)    Df(x_λ) + λᵀDg(x_λ) = 0.

Furthermore, the function to maximize in λ is f(x_λ) + λᵀ(g(x_λ) − c). Differentiating this function with respect to λ, assuming that x_λ is differentiable, and using (10), we obtain g(x_λ) = c. J

Exercise 20. Use the minimax principle to determine (formally) optimality conditions for the problem min f(x) under the constraint g(x) ≥ c.

The next exercise illustrates that the minimax principle may indeed be false, although in many problems it is an important heuristic.

Exercise 21. Show that the minimax principle is not valid in the following cases:

1. x + λ;
2. x³ + λ(x² + 1);
3. 1/(1 + (x − λ)²).

Exercise 22. Let A and B be arbitrary sets and F : A × B → R. Show that

inf_{a∈A} sup_{b∈B} F(a, b) ≥ sup_{b∈B} inf_{a∈A} F(a, b).


4. Linear programming

We now continue the study of constrained optimization problems by looking into the minimization of linear functions subject to linear inequality constraints, i.e., linear programming problems. A detailed discussion of this class of problems can be found, for instance, in [GSS08] or [Fra02].

4.1. The setting of linear programming. A model problem in linear programming is the following: given a row vector c ∈ Rn, a real m × n matrix A, and a column vector b ∈ Rm, we look for a column vector x ∈ Rn which solves

(11)    max_x cx  subject to  Ax ≤ b,  x ≥ 0,

where the notation v ≥ 0 for a vector v means that all components of v are non-negative. A vector x is called feasible for (11) if it satisfies the constraints, that is, Ax ≤ b and x ≥ 0; the set of such vectors is the feasible set. The feasible set may be empty, or the function cx may be unbounded from above on it. To simplify the discussion, we assume that this situation does not occur.

Example 5. Add example here. J

Observe that if c ≠ 0 the maximizers of cx cannot be interior points of the feasible set; otherwise, by Exercise 4, they would be critical points. Therefore, the maximizers must lie on the boundary of the set Ax ≤ b, x ≥ 0. Unfortunately, this boundary can be quite complex, as it consists of a finite (but frequently large) union of intersections of hyperplanes (of the form dx = e) with half-spaces (of the form dx ≤ e).


Exercise 23. Suppose that no row of A vanishes. Show that the boundary of the set Ax ≤ b consists of all points which satisfy Ax ≤ b with equality in at least one coordinate.

Note that the linear programming problem (11) is quite general, as it is possible to include equality constraints as inequalities: in fact, A′x = b′ is the conjunction of A′x ≤ b′ and −A′x ≤ −b′.

Example 6 (Diet problem). An animal food factory would like to minimize the production cost of a pet food while keeping it nutritionally balanced. Each food i costs cᵢ per unit. Therefore, if each unit of pet food contains an amount xᵢ of food i, the total cost is cx. There is, of course, the obvious constraint that x ≥ 0. Suppose that A_{ij} represents the amount of nutrient i in food j, and bᵢ the minimum recommended amount of nutrient i. Then, to ensure a nutritionally balanced diet, we must have Ax ≥ b. Thus the diet problem is

min cx  subject to  Ax ≥ b,  x ≥ 0. J
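A toy instance of the diet problem can be solved with an off-the-shelf linear programming routine. A minimal sketch, assuming scipy and hypothetical cost and nutrient data; since linprog handles constraints of the form A_ub x ≤ b_ub, the requirement Ax ≥ b is passed as −Ax ≤ −b:

    # Diet problem with hypothetical data: 3 foods, 2 nutrients.
    import numpy as np
    from scipy.optimize import linprog

    c = np.array([2.0, 3.0, 4.0])        # cost per unit of each food
    A = np.array([[1.0, 2.0, 2.0],       # A[i, j]: amount of nutrient i in food j
                  [3.0, 1.0, 2.0]])
    b = np.array([4.0, 6.0])             # minimum recommended amounts

    res = linprog(c, A_ub=-A, b_ub=-b, bounds=[(0, None)] * 3)
    print("optimal diet:", res.x, "cost:", res.fun)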

Example 7 (Optimal transport). A large multinational needs to transport its supply from each factory i to the distribution points j. The supply at i is sᵢ and the demand at j is d_j. The cost of transporting one unit from i to j is c_{ij}. We would like to determine the quantity π_{ij} transported from i to j by solving the following optimization problem:

min_π Σ_{ij} c_{ij} π_{ij},

under the constraints π_{ij} ≥ 0 and the supply and demand bounds

Σ_j π_{ij} ≤ sᵢ,    Σ_i π_{ij} ≥ d_j. J
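The transport problem of Example 7 is also directly solvable as a linear program once the matrix variable π is flattened into a vector. A sketch with hypothetical supplies, demands and costs:

    # Optimal transport as a linear program (hypothetical data:
    # 2 factories, 3 distribution points); pi is flattened row by row.
    import numpy as np
    from scipy.optimize import linprog

    cost = np.array([[4.0, 6.0, 1.0],
                     [3.0, 2.0, 5.0]])
    s = np.array([30.0, 40.0])           # supplies
    d = np.array([20.0, 25.0, 15.0])     # demands

    m, n = cost.shape
    A_supply = np.kron(np.eye(m), np.ones((1, n)))   # sum_j pi_ij <= s_i
    A_demand = np.kron(np.ones((1, m)), np.eye(n))   # sum_i pi_ij >= d_j
    A_ub = np.vstack([A_supply, -A_demand])
    b_ub = np.concatenate([s, -d])

    res = linprog(cost.ravel(), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0, None)] * (m * n))
    print(res.x.reshape(m, n), res.fun)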

Example 8. The existence of feasible vectors, i.e. vectors satisfying the constraint Ax ≤ b, is not obvious. There exists, however, a procedure that can convert this question into a new linear programming problem. Let x₀ be a new variable. We would like to solve

min x₀,

where the minimum is taken over all vectors (x₀, x) which satisfy the constraints

(Ax)_j ≤ b_j + x₀, for all j.

It is clear that the feasible set for this problem is non-empty: take, for instance, x = 0 and x₀ = max_j |b_j|. This new linear programming problem therefore has a value (which could be −∞ but not +∞). If the value is non-positive, there exist feasible vectors for the constraint Ax ≤ b. Otherwise, if the value is positive, the feasible set of the original problem is empty. J

Exercise 24. Let A be an m × n matrix, with m > n. Consider the overdetermined system Ax = b for b ∈ Rm. In general, this equation has no solution. We would like to determine a vector x ∈ Rn which minimizes the maximum error

sup_i |(Ax)ᵢ − bᵢ|.

Rewrite this problem as a linear programming problem. Compare this problem with the least squares method, which consists in solving

min_x ‖Ax − b‖².


4.2. The dual problem. To problem (11), which we call the primal, we associate another problem, called the dual, which consists in determining y ∈ Rm which solves

(12)    min yᵀb  subject to  yᵀA ≥ c,  y ≥ 0.

As the next exercise shows, the dual problem can be motivated by the minimax principle:

Exercise 25. Show that (11) can be written as

(13)    max_{x≥0} min_{y≥0} cx + yᵀ(b − Ax).

Suppose we can exchange the maximum with the minimum in (13). Relate the resulting problem with (12).

Example 9 (Interpretation of the dual of the diet problem). The dual of the diet problem (Example 6) is the following:

max yᵀb  subject to  yᵀA ≤ c,  y ≥ 0.

This problem admits the following interpretation. A competing company is willing to provide a nutritionally balanced diet, charging for each unit of nutrient i a price yᵢ. Obviously, the competing company would like to maximize its income. There are the following constraints: y ≥ 0 and, furthermore, if the food item j costs c_j, the competing company should charge an amount (yᵀA)_j no larger than c_j. This constraint is quite natural, since if it did not hold, at least part of the diet could be obtained more cheaply by buying the food items j such that (yᵀA)_j > c_j. J

Exercise 26. Show that the dual of the dual is equivalent to the primal.

Exercise 27. Determine the dual of the optimal transport problem and give a possible interpretation.


The next theorem concerns the relation between the primal and the dual problems:

Theorem 21.
1. Weak duality: if x and y are feasible, respectively, for (11) and (12), then cx ≤ yᵀb.
2. Optimality: furthermore, if cx = yᵀb then x and y are solutions of (11) and (12), respectively.
3. Strong duality: if (11) has a solution x∗, then (12) also has a solution y∗, and cx∗ = (y∗)ᵀb. Finally, y∗_j = 0 for all indices j such that (Ax∗)_j < b_j.

Proof. To prove weak duality, observe that

cx ≤ (yᵀA)x = yᵀ(Ax) ≤ yᵀb.

The optimality criterion follows from the previous inequality.

To prove strong duality, we may assume that the system of inequalities Ax ≤ b also includes x ≥ 0, for instance by replacing A with the augmented matrix Ã obtained by stacking A on top of −I, and the vector b by the vector b̃ obtained by appending n zero entries to b. In this case it is enough to prove that there exists a vector ỹ∗ ∈ R^{m+n} such that ỹ∗ ≥ 0,

c = (ỹ∗)ᵀÃ,

with ỹ∗_j = 0 for all indices j such that (Ãx∗)_j < b̃_j. In fact, if such a vector ỹ∗ is given, we just set y∗ to be the vector of the first m coordinates of ỹ∗. Then c ≤ (y∗)ᵀA, so y∗ is feasible for (12), and

cx∗ = (ỹ∗)ᵀÃx∗ = (ỹ∗)ᵀb̃ = (y∗)ᵀb,

since b̃ differs from b only by the n zero entries. From this point on we drop the ˜ to simplify the notation.

First we state the following auxiliary result, whose proof is a simple corollary of Lemma 17:

Lemma 22. Let A be an m × n matrix, c a row vector in Rn and J an arbitrary set of rows of A. Then one and only one of the following alternatives holds:
1. c = yᵀA, for some y ≥ 0 with y_j = 0 for all j ∉ J;
2. there exists a column vector w ∈ Rn such that (Aw)_j ≤ 0 for all j ∈ J and cw > 0.

Exercise 28. Use Lemma 17 to prove Lemma 22.

Let x∗ be a solution of (11). Let J be the set of indices j for which (Ax∗)_j = b_j. We will show that there exists y ≥ 0 such that c = yᵀA and y_j = 0 for j ∉ J. Assume, by contradiction, that no such y exists. By the previous lemma there is w such that cw > 0 and (Aw)_j ≤ 0 for j ∈ J. But then x̃ = x∗ + εw is feasible for ε > 0 sufficiently small, since

Ax̃ = Ax∗ + εAw ≤ b.

However,

cx̃ = c(x∗ + εw) > cx∗,

which contradicts the optimality of x∗. Therefore, for some y ≥ 0 with y_j = 0 for j ∉ J,

cx∗ = yᵀAx∗ = yᵀb.

Consequently, by the second part of the theorem, we conclude that y is optimal. □
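Theorem 21 can be illustrated numerically by solving the primal and the dual as two separate linear programs and comparing the optimal values. A sketch with hypothetical data (scipy's linprog minimizes, so the primal max cx is passed as min −cx):

    # Strong duality check on hypothetical data.
    import numpy as np
    from scipy.optimize import linprog

    c = np.array([3.0, 5.0])
    A = np.array([[1.0, 0.0],
                  [0.0, 2.0],
                  [3.0, 2.0]])
    b = np.array([4.0, 12.0, 18.0])

    # Primal: max c x  <=>  min -c x, subject to A x <= b, x >= 0.
    primal = linprog(-c, A_ub=A, b_ub=b, bounds=[(0, None)] * 2)
    # Dual: min y^T b, subject to y^T A >= c (i.e. -A^T y <= -c), y >= 0.
    dual = linprog(b, A_ub=-A.T, b_ub=-c, bounds=[(0, None)] * 3)

    print("primal value:", -primal.fun)   # 36
    print("dual value:  ", dual.fun)      # 36, equal up to round-off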


Lemma 23. Let x and y be, respectively, feasible for the primal and dual problems. Define

s = b − Ax ≥ 0,    e = Aᵀy − cᵀ ≥ 0.

Then

sᵀy + xᵀe = bᵀy − xᵀcᵀ ≥ 0.

Proof. Since x, y ≥ 0, we have

sᵀy = bᵀy − xᵀAᵀy ≥ 0,
xᵀe = xᵀAᵀy − xᵀcᵀ ≥ 0.

By adding these two expressions we obtain

sᵀy + xᵀe = bᵀy − xᵀcᵀ ≥ 0. □

Theorem 24 (Complementarity). Suppose x and y are solutions of (11) and (12), respectively. Then

sᵀy = 0 and xᵀe = 0.

Proof. We have sᵀy, xᵀe ≥ 0. If x and y are optimal then cx = yᵀb. By the previous lemma, sᵀy + xᵀe = 0, which implies the theorem. □

Exercise 29. Study the following problem in R²: max x₁ + 2x₂ with x₁, x₂ ≥ 0, x₁ + x₂ ≤ 1 and 2x₁ + x₂ ≤ 3/2. Determine the dual problem and its solution, and show that it has the same value as the primal problem.

Exercise 30. Let x∗ be a solution of the problem

min cx

under the constraints Ax ≥ b, x ≥ 0, and let y∗ be a solution of the dual. Use complementarity to show that x∗ minimizes cx − (y∗)ᵀAx under the constraint x ≥ 0.

Exercise 31. Solve by elementary methods the problem max x₁ + x₂ under the constraints 3x₁ + 4x₂ ≤ 12, 5x₁ + 2x₂ ≤ 10.

Exercise 32. Consider the problem min −7x₁ + 9x₂ + 16x₃, under the constraints x ≥ 0, 2 ≤ x₁ + 2x₂ + 9x₃ ≤ 7. Obtain an upper and a lower bound for the value of the minimum.

Exercise 33. Show that the solution set of a linear programming problem is a convex set.

Exercise 34. Consider a linear programming problem in Rn: min c_ε x under the constraints Ax ≤ b, x ≥ 0. Suppose c_ε = c₀ + εc₁. Suppose that for ε > 0 there exists a minimizer x_ε which converges to a point x₀ as ε → 0. Show that x₀ is a minimizer of c₀x under Ax ≤ b, x ≥ 0. Show, furthermore, that if this limit problem has more than one minimizer then x₀ minimizes c₁x among all minimizers of the limit problem.

5. Non-linear optimization with constraints

Let f : Rn → R and g : Rn → Rm be C¹ functions. We consider the following non-linear optimization problem:

(14)    max_x f(x)  subject to  g(x) ≤ 0,  x ≥ 0.


We denote the feasible set by X:

X = {x ∈ Rn : x ≥ 0, g(x) ≤ 0},

and the solution set by S:

S = {x̄ ∈ X : f(x̄) = sup_{x∈X} f(x)}.

In this section we derive necessary conditions, called the Karush-Kuhn-Tucker (KKT) conditions, for a point to be a solution of the problem. We start by explaining these conditions, which generalize both the Lagrange multipliers for equality constraints and the optimality conditions from linear programming. We then show that under a convexity hypothesis these conditions are in fact sufficient. After that we show that, under a condition called constraint qualification, the KKT conditions are indeed necessary optimality conditions. We end the discussion with several criteria that allow one to check in practice whether the constraint qualification condition holds.

5.1. KKT conditions. For (x, y, µ) ∈ Rn × Rm × Rn define the Lagrangian

L(x, y, µ) = f(x) − yᵀg(x) + µᵀx.

For (x, µ, y) ∈ Rn × Rn × Rm the KKT conditions are the following:

(15)    ∂L/∂xᵢ = 0;
        g(x) ≤ 0,  yᵀg(x) = 0;
        x ≥ 0,  µᵀx = 0;
        µ, y ≥ 0.

The variables y and µ are called the Lagrange multipliers.

Several variations of the KKT conditions arise in different problems. For instance, in the case in which there are no positivity constraints on the variable x, the KKT conditions take the form: for (x, y) ∈ Rn × Rm

and L(x, y) = f(x) − yᵀg(x),

(16)    ∂L/∂xᵢ = 0;
        g(x) ≤ 0,  yᵀg(x) = 0;
        y ≥ 0.

Exercise 35. Derive (16) from (15) by writing x = x⁺ − x⁻, where x⁺, x⁻ ≥ 0.
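A concrete sanity check of the KKT system (15) is easy to carry out numerically. The sketch below uses a small hypothetical problem, max f(x) = −(x₁ − 1)² − (x₂ − 1)² subject to g(x) = x₁ + x₂ − 1 ≤ 0 and x ≥ 0, whose solution x = (1/2, 1/2) with multipliers y = 1 and µ = (0, 0) can be found by hand:

    # Verify the KKT conditions (15) at a candidate point (hypothetical data).
    import numpy as np

    def Df(x):
        return np.array([-2 * (x[0] - 1), -2 * (x[1] - 1)])

    Dg = np.array([[1.0, 1.0]])          # gradient of the single constraint

    x = np.array([0.5, 0.5])
    y = np.array([1.0])
    mu = np.array([0.0, 0.0])
    g = x[0] + x[1] - 1.0

    print("stationarity:", Df(x) - y @ Dg + mu)          # ~ (0, 0)
    print("g(x) <= 0 and y^T g(x) = 0:", g, y[0] * g)    # 0.0, 0.0
    print("mu^T x = 0:", mu @ x)
    print("signs ok:", (x >= 0).all() and (y >= 0).all() and (mu >= 0).all())
    # Since -f and g are convex, these conditions also certify optimality
    # (Proposition 25 below).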

Another example is that of equality constraints g(x) = 0, again without positivity constraints on the variable x. We can write the equality constraint as g(x) ≤ 0 and −g(x) ≤ 0. Let y± be the multipliers corresponding to ±g(x) ≤ 0, and define y = y⁺ − y⁻. Then (16) can be written as

∂f/∂xᵢ = Σ_{j=1}^m y_j ∂g_j/∂xᵢ,    g(x) = 0,

that is, y is the Lagrange multiplier for the equality constraint g(x) = 0.

Consider a linear programming problem, where in (14) we set

f(x) = cx,    g(x) = Ax − b.

Then the KKT conditions are

c − yᵀA = −µᵀ;
Ax ≤ b,  yᵀ(Ax − b) = 0;
x ≥ 0,  µᵀx = 0;
µ, y ≥ 0.

In this case, the first line of the KKT conditions can be rewritten as c − yᵀA ≤ 0; that is, since y ≥ 0, y is admissible for the dual problem. Using the condition µᵀx = 0 we conclude that cx = yᵀAx. Then the second line of the KKT conditions yields yᵀAx = yᵀb, which implies cx = yᵀb, which is the optimality criterion for the linear programming problem and shows that a solution of the KKT conditions is in fact a solution of (14). Furthermore, it also shows that y is a solution to the dual problem.

Example 10. Let Q be an n × n real matrix. Consider the quadratic programming problem

(17)    max_x ½xᵀQx  subject to  Ax ≤ b,  x ≥ 0.

The KKT conditions are

(18)    xᵀQ − yᵀA = −µᵀ;
        Ax ≤ b,  yᵀ(Ax − b) = 0;
        x ≥ 0,  µᵀx = 0;
        µ, y ≥ 0. J

5.2. Duality and sufficiency of the KKT conditions. We can write problem (14) in the following minimax form:

sup_{x≥0} inf_{y≥0} f(x) − yᵀg(x).

We define the dual problem as

(19)    inf_{y≥0} sup_{x≥0} f(x) − yᵀg(x).

Let

h*(y) = sup_{x≥0} [f(x) − yᵀg(x)]

and

h_*(x) = inf_{y≥0} [f(x) − yᵀg(x)].

Then (14) is equivalent to

sup_{x≥0} h_*(x),

and (19) is equivalent to the problem

inf_{y≥0} h*(y).

From Exercise 22, we have the duality inequality

sup_{x≥0} h_*(x) = sup_{x≥0} inf_{y≥0} [f(x) − yᵀg(x)] ≤ inf_{y≥0} sup_{x≥0} [f(x) − yᵀg(x)] = inf_{y≥0} h*(y).

Furthermore, if x̄ ≥ 0 and ȳ ≥ 0 satisfy h_*(x̄) = h*(ȳ), then x̄ and ȳ are, respectively, solutions to (14) and (19).

If we choose

f(x) = cx,    g(x) = Ax − b,

then (14) is a linear programming problem. In this case

h_*(x) = cx if Ax − b ≤ 0, and −∞ otherwise,

and

h*(y) = bᵀy if Aᵀy − cᵀ ≥ 0, and +∞ otherwise.

Consider now the quadratic programming problem

(20)    max ½xᵀQx  subject to  Ax − b ≤ 0.

Note that here the variable x does not have any sign constraint. In this case we define

h_*(x) = inf_{y≥0} [½xᵀQx − yᵀ(Ax − b)] = ½xᵀQx if Ax − b ≤ 0, and −∞ otherwise,

and

h*(y) = sup_x [½xᵀQx − yᵀ(Ax − b)].

If we assume that Q is non-singular and negative definite, we have

h*(y) = −½yᵀAQ⁻¹Aᵀy + yᵀb.

It is easy to check directly that h_*(x) ≤ h*(y).

It turns out that the KKT conditions are in fact sufficient if f and g satisfy additional convexity conditions.

Proposition 25. Suppose that −f and each component of g are convex. Let (x̄, µ̄, ȳ) ∈ Rn × Rn × Rm be a solution of the KKT conditions (15). Then x̄ is a solution of (14).

Proof. Let x ∈ X. By the concavity of f we have

f(x) − f(x̄) ≤ Df(x̄)(x − x̄).

By the KKT conditions (15),

Df(x̄)(x − x̄) = ȳᵀDg(x̄)(x − x̄) − µ̄ᵀ(x − x̄).

Since each component of g is convex and ȳ ≥ 0,

ȳᵀDg(x̄)(x − x̄) ≤ ȳᵀ(g(x) − g(x̄)).

Since ȳᵀg(x̄) = 0, ȳᵀg(x) ≤ 0, µ̄ᵀx ≥ 0, and µ̄ᵀx̄ = 0, we have

f(x) − f(x̄) ≤ ȳᵀg(x) − µ̄ᵀx ≤ 0,

that is, x̄ is a solution. □

As the next proposition shows, the KKT conditions imply strong duality.

Proposition 26. Suppose that −f and each component of g are convex. Let (x̄, µ̄, ȳ) ∈ Rn × Rn × Rm be a solution of the KKT conditions (15). Then h_*(x̄) = h*(ȳ).


Proof. Observe that, by the previous proposition, any solution of

Df(x) − yᵀDg(x) + µᵀ = 0,

with µ ≥ 0 and µᵀx = 0, is a maximizer of the function f(x) − yᵀg(x) under the constraint x ≥ 0. Therefore

h*(ȳ) = f(x̄) − ȳᵀg(x̄) = f(x̄),

since ȳᵀg(x̄) = 0. Furthermore,

h_*(x̄) = f(x̄) + inf_{y≥0} (−yᵀg(x̄)) = f(x̄),

because g(x̄) ≤ 0. Thus h_*(x̄) = h*(ȳ). □

5.3. Constraint qualification and the KKT conditions. Consider the constraints

(21)    g(x) ≤ 0,    x ≥ 0.

Let X denote the admissible set for (21). For x ∈ X define the set of active coordinate indices as I(x) = {i : xᵢ = 0}, and the set of active constraint indices as J(x) = {j : g_j(x) = 0}. For x ∈ X define the tangent cone to the admissible set X at the point x as the set T(x) of vectors v ∈ Rn which satisfy

vᵢ ≥ 0,    v · Dg_j(x) ≤ 0,

for all i ∈ I(x) and all j ∈ J(x). We say that the constraints satisfy the constraint qualification condition if for any x ∈ X and any v ∈ T(x) there exists a C¹ curve x(t), with x(0) = x and ẋ(0) = v, such that x(t) ∈ X for all t ≥ 0 sufficiently small.

Proposition 27. Let x be a solution of (14), and assume that the constraint qualification condition holds. Then there exist µ ∈ Rn and y ∈ Rm such that (15) holds.


Proof. Fix v ∈ T(x) and let x(t) be a curve as in the constraint qualification condition. Because x is a maximizer,

(22)    0 ≥ d/dt f(x(t))|_{t=0} = v · Df(x).

From the Farkas lemma (Lemma 17) we know that either there is v ∈ T(x) such that v · Df > 0, or else the vector −Df belongs to the positive cone generated by the vectors eᵢ, i ∈ I, and −Dg_j(x), j ∈ J. By (22) we know that the first alternative does not hold. Hence there exist a vector µ ∈ Rn, with µᵢ ≥ 0 for i ∈ I and µᵢ = 0 for i ∈ Iᶜ, and a vector y ∈ Rm, with y_j ≥ 0 for j ∈ J and y_j = 0 for j ∈ Jᶜ, such that

Df = yᵀDg − µᵀ.

By the construction of y and µ, as well as the definition of I and J, it is clear that µᵀx = 0 as well as yᵀg = 0. □

To give an interpretation of the Lagrange multipliers in the KKT conditions, consider the family of problems

(23)    max_x f(x)  subject to  g^θ(x) ≤ 0,

where θ ∈ Rm and g^θ(x) = g(x) − θ. We will assume that for all θ the constraint qualification condition holds. Furthermore, assume that there exists a unique solution x^θ which is a differentiable function of θ. Define the value function V(θ) = f(x^θ). Let y^θ ∈ Rm be the corresponding Lagrange multipliers, which we assume to be also differentiable. We claim that, for any θ₀ ∈ Rm,

(24)    ∂V/∂θ_j (θ₀) = y_j^{θ₀}.
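Before proving this identity, here is a quick finite-difference sanity check on a one-dimensional instance (an assumed example, not from the text): maximize f(x) = −(x − 2)² subject to x ≤ θ. For θ < 2 the maximizer is x^θ = θ, the multiplier is y^θ = 2(2 − θ), and V(θ) = −(2 − θ)².

    # Finite-difference check of (24) on a hypothetical 1-D problem.
    def V(theta):
        x = min(theta, 2.0)          # maximizer of f on (-infinity, theta]
        return -(x - 2.0) ** 2

    theta, h = 1.0, 1e-6
    dV = (V(theta + h) - V(theta - h)) / (2 * h)
    print(dV, 2 * (2 - theta))       # both ~ 2.0: dV/dtheta equals the multiplier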


To prove this identity, observe first that, using the KKT conditions,

∂V(θ)/∂θ_j = Σ_k ∂f(x^θ)/∂x_k · ∂x_k^θ/∂θ_j = Σ_{k,i} y_i^θ ∂g_i^θ(x^θ)/∂x_k · ∂x_k^θ/∂θ_j.

By differentiating the complementarity condition Σ_k y_k^θ g_k^θ(x^θ) = 0 with respect to θ_j we obtain

(25)    0 = Σ_k [ (∂y_k^θ/∂θ_j) g_k^θ(x^θ) + y_k^θ Σ_i ∂g_k^θ(x^θ)/∂x_i · ∂x_i^θ/∂θ_j ] − y_j^θ.

For θ = θ₀ we either have g_k^θ(x^{θ₀}) = 0 or g_k^θ(x^{θ₀}) < 0, in which case y_k^θ vanishes in a neighborhood of θ₀; consequently, in this last case we have ∂y_k^{θ₀}/∂θ_j = 0. Therefore, in both cases,

(∂y_k^θ/∂θ_j) g_k^θ(x^θ) = 0 at θ = θ₀.

So, from (25), we conclude that

y_j^{θ₀} = Σ_k y_k^{θ₀} Σ_i ∂g_k^{θ₀}(x^{θ₀})/∂x_i · ∂x_i^{θ₀}/∂θ_j.

Thus we obtain (24).

5.4. Checking the constraint qualification conditions. Consider the following optimization problem:

(26)    max_x x₁  subject to  −(1 − x₁)³ + x₂ ≤ 0,  x ≥ 0.

The Lagrangian is

L(x, y, µ) = x₁ − y(x₂ − (1 − x₁)³) + µ₁x₁ + µ₂x₂,

and so

∂L(x, y, µ)/∂x₁ = 1 − 3(1 − x₁)²y + µ₁.

In particular, when x₁ = 1, the equation

1 + µ₁ = 0


does not have a solution with µ₁ ≥ 0. Hence the KKT conditions are not satisfied. Nevertheless, the point (x₁, x₂) = (1, 0) is a solution. This example illustrates the need for simple criteria to check whether the constraint qualification condition holds. We will show that the following are sufficient conditions for the verification of the constraint qualification (in each case, the gradients are those of the constraints active at x):

1. the Mangasarian-Fromowitz condition: for any x ∈ X there is v such that ∇gᵢ(x) · v < 0 for every active index i;
2. the Cotte-Dragominescu condition: for any x ∈ X the active constraints are positively linearly independent, that is, Σᵢ yᵢ∇gᵢ = 0 with y ≥ 0 implies y = 0;
3. the Arrow-Hurwicz and Uzawa condition: for any x ∈ X the active constraints are linearly independent.

It is obvious that 3 implies 2. We will show that 1 is equivalent to 2. To do so we need the following lemma:

Proposition 28 (Gordon alternative). Let A be a real-valued m × n matrix. Then one and only one of the following holds:

• there exists x ∈ Rn such that Ax < 0;
• there exists y ∈ Rm, y ≥ 0, y ≠ 0, such that yᵀA = 0.

Proof. (i) It is clear that the two conditions are disjoint: if Ax < 0 and yᵀA = 0, with y ≥ 0 and y ≠ 0, we would have 0 = yᵀAx < 0, which is a contradiction.

(ii) Consider the following optimization problem:

(27)    max_y y₁ + · · · + y_m  subject to  yᵀA = 0,  y ≥ 0.

It is clear that if the second alternative holds then the value of this problem is +∞. Otherwise, y = 0 is a solution and the value is 0. In this case the dual problem

(28)    min_x 0  subject to  (Ax)ᵢ ≤ −1, i = 1, . . . , m,

has a solution, i.e., there is a point x satisfying the constraints. Hence the first alternative holds. □

Proposition 29. The Cotte-Dragominescu condition is equivalent to the Mangasarian-Fromowitz condition.

Proof. Set A = ∇g, the matrix whose rows are the gradients of the active constraints. The Mangasarian-Fromowitz condition corresponds to the first case in the Gordon alternative. Therefore the only solution of Σᵢ yᵢ∇gᵢ = 0, y ≥ 0, is y = 0, and the Cotte-Dragominescu condition is satisfied. Conversely, if the only solution of Σᵢ yᵢ∇gᵢ = 0, y ≥ 0, is y = 0, the second case of the Gordon alternative does not hold. Then the first alternative holds, and so the Mangasarian-Fromowitz condition is satisfied. □

Theorem 30. If the Mangasarian-Fromowitz condition holds then the constraint qualification condition is satisfied.

Proof. Let x₀ ∈ X. Take w such that ∇gᵢ(x₀) · w ≤ 0 for all active i. We must construct a curve x(ε) such that x(ε) ∈ X for ε sufficiently small and ẋ(0) = w. Let v be a vector as in the Mangasarian-Fromowitz condition. Take M sufficiently large and define

x(ε) = x₀ + εw + Mε²v.

Then, using Taylor's formula, we have

gᵢ(x(ε)) = gᵢ(x₀) + ε∇gᵢ(x₀) · w + Mε²∇gᵢ(x₀) · v + (ε²/2) wᵀD²gᵢ(x₀)w + O(ε³).

Thus, if M is large enough and ε is sufficiently small, gᵢ(x(ε)) < 0 for the active constraints, and so x(ε) ∈ X. □

Theorem 31. If either the Cotte-Dragominescu condition or the Arrow-Hurwicz and Uzawa condition holds, then so does the constraint qualification condition.


6. Bibliographical notes

In what concerns linear programming, we have used the books [GSS08] and [Fra02]...

2. Calculus of variations in one independent variable

This chapter is dedicated to a classical subject in the calculus of variations: variational problems with one independent variable. These are extremely important because of their applications to classical mechanics and Riemannian geometry. Furthermore, they serve as a model for optimal control problems and for problems with multiple integrals. We start in section 1 by deriving the Euler-Lagrange equation and giving some elementary applications. Then, in section 2 we study additional necessary conditions for minimizers, and in section 3 we discuss several applications to Riemannian geometry and classical mechanics. An introduction to the Hamiltonian formalism is given in section 4. The next topic, in section 5, is the study of sufficient conditions for a trajectory to be a minimizer: first we establish the existence of local minimizers, then we study the connections between smooth solutions of Hamilton-Jacobi equations and global minimizers, and finally we discuss the Jacobi equation, conjugate points and curvature. Symmetries are an important topic in the calculus of variations; in section 6 we present Routh's method for the integration of Lagrangian systems and Noether's theorem. Of course, not every solution to the Euler-Lagrange equation is a minimizer. Section 7 is a brief introduction to minimax methods and to the mountain pass theorem. We also consider several examples of non-existence of minimizing orbits (Lavrentiev phenomenon) and relaxation methods (Young measures) in section 9.

49

50

2. CALCULUS OF VARIATIONS IN ONE INDEPENDENT VARIABLE

Invariant measures for Lagrangian and Hamiltonian systems are considered in section 8. The next part of this chapter is dedicated to the study of the geometry of Hamiltonian systems: symplectic and Poisson structures, Darboux theorem and Arnold-Liouville integrability, section 10 In the last section, section 11 we consider perturbation problems and describe the Linstead series perturbation procedure. We end the chapter with bibliographical notes.

1. Euler-Lagrange Equations In classical mechanics, the trajectories x : [0, T ] → Rn of a mechanical system are determined by a variational principle called the minimal action principle. This principle asserts that the trajectories are minimizers (or at least critical points) of an integral functional. In this section we study this problem and discuss several examples. Consider a mechanical system on Rn with kinetic energy K(x, v) and potential energy U (x, v). We define the Lagrangian, L(x, v) : Rn × Rn → R to be difference between the kinetic energy K and potential energy U of the system, that is, L = K−U . The variational formulation of classical mechanics asserts that trajectories of this mechanical system minimize (or are at least critical points) of the action functional Z T ˙ S[x] = L(x(t), x(t))dt, 0

under fixed boundary conditions. More precisely, a C 1 trajectory x : [0, T ] → Rn is a minimizer S under fixed boundary conditions if for any C 1 trajectory y : [0, T ] → Rn such that x(0) = y(0) and x(T ) = y(T ) we have S[x] ≤ S[y].

1. EULER-LAGRANGE EQUATIONS

51

In particular, for any C 1 function ϕ : [0, T ] → Rn with compact support in (0, T ), and any  ∈ R we have i() = S[x + ϕ] ≥ S[x] = i(0). Thus i() has a minimum at  = 0. So, if i is differentiable, i0 (0) = 0. A trajectory x is a critical point of S, if for any C 1 function ϕ : [0, T ] → Rn with compact support in (0, T ) we have d 0 i (0) = S[x + ϕ] = 0. d =0 The critical points of the action which are of class C 2 are solutions to an ordinary differential equation, the Euler-Lagrange equation, that we derive in what follows. Any minimizer of the action functional satisfies further necessary conditions which will be discussed in section 2. Theorem 32 (Euler-Lagrange equation). Let L(x, v) : Rn × Rn → R be a C 2 function. Suppose that x : [0, T ] → Rn is a C 2 critical point of the action S under fixed boundary conditions x(0) and x(T ). Then d ˙ − Dx L(x, x) ˙ = 0. Dv L(x, x) dt

(29)

Proof. Let x be as in the statement. Then for any ϕ : [0, T ] → Rn with compact support on (0, T ), the function i() = S[x + ϕ] has a minimum at  = 0. Thus i0 (0) = 0, that is, Z

T

˙ + Dv L(x, x) ˙ ϕ˙ = 0. Dx L(x, x)ϕ 0

Integrating by parts, we conclude that  Z T d ˙ − Dx L(x, x) ˙ ϕ = 0, Dv L(x, x) dt 0

52

2. CALCULUS OF VARIATIONS IN ONE INDEPENDENT VARIABLE

for all ϕ : [0, T ] → Rn with compact support in (0, T ). This implies (29) and ends the proof of the theorem.  Example 11. In classical mechanics, the kinetic energy K of a particle with mass m with trajectory x(t) is: K=m

˙ 2 |x| . 2

Suppose that the potential energy U (x) depends only on the position x. Assume also that U is smooth. Then the Lagrangian for this mechanical system is then L = K − U. and the corresponding Euler-Lagrange equation is m¨ x = −U 0 (x), which is the Newton’s law.

J

Exercise 36. Let P ∈ Rn , and consider the Lagrangian L(x, v) : Rn × Rn → R defined by L(x, v) = g(x)|v|2 + P · v − U (x), where g and U are C 2 functions. Determine the Euler-Lagrange equation and show that it does not depend on P . Exercise 37. Suppose we form a surface of revolution by connecting a point (x0 , y0 ) with a point (x1 , y1 ) by a curve (x, y(x)), x ∈ [0, 1], and then revolving it around the y axis. The area of this surface is Z x1 p x 1 + y˙ 2 dx. x0

Compute the Euler-Lagrange equation and study its solutions. To understand the behavior of the Euler-Lagrange equation it is sometimes useful to change coordinates. The following proposition shows how this is achieved: Proposition 33. Let x : [0, T ] → Rn be a critical point of the action Z T ˙ L(x, x)dt. 0

1. EULER-LAGRANGE EQUATIONS

53

ˆ given by Let g : Rn → Rn be a C 2 diffeomorphism and L ˆ w) = L(g(y), Dg(y)w). L(y, Then y = g −1 ◦ x is a critical point of Z T ˆ y)dt. ˙ L(y, 0

Proof. This is a simple computation and is left as an exercise to the reader.  Before proceeding, we will discuss some applications of variational methods to classical mechanics. As mentioned before, the trajectories of a mechanical system with kinetic energy K and potential energy U are critical points of the action corresponding to the Lagrangian L = K − U . In the following examples we use this variational principle to study the motion of a particle in a central field, and the planar two body problem. Example 12 (Central field motion). Consider the Lagrangian of a particle in the plane subjected to a radial potential field. p x˙ 2 + y˙ 2 ˙ y) ˙ = − U ( x2 + y2 ). L(x, y, x, 2 Consider polar coordinates, (r, θ), that is (x, y) = (r cos θ, r sin θ) = g(r, θ), We can change coordinates (see proposition 33) and obtain the Lagragian in these new coordinates 2 ˙2 2 ˙ = r θ + r˙ − U (r). ˆ θ, r˙ , θ) L(r, 2 Then the Euler-Lagrange equations can be written as

d 2˙ r θ=0 dt

d r˙ = −U 0 (r) + rθ˙2 . dt The first equation implies that r2 θ˙ ≡ η is conserved. Therefore, rθ˙2 = η2 . Multiplying the second equation by r˙ we get r3   d r˙ 2 η2 + U (r) + 2 = 0. dt 2 2r

54

2. CALCULUS OF VARIATIONS IN ONE INDEPENDENT VARIABLE

Consequently r˙ 2 η2 + U (r) + 2 2 2r is a conserved quantity. Thus, we can solve for r˙ as a function of r (given the values of the conserved quantities Eη and η) and so obtain a first-order differential equation for the trajectories. J Eη =

Example 13 (Planar two-body problem). Consider now the problem of two point bodies in the plane, with trajectories (x1 , y1 ) and (x2 , y2 ). Suppose that the interaction potential energy U depends only on the p distance (x1 − x2 )2 + (y1 − y2 )2 between them. We will show how to reduce this problem to the one of a single body under a radial field. The Lagrangian of this system is L = m1

p x˙ 21 + y˙ 12 x˙ 2 + y˙ 22 + m2 2 − U ( (x1 − x2 )2 + (y1 − y2 )2 ). 2 2

Consider new coordinates (X, Y, x, y), where (X, Y ) is the center of mass m1 x1 + m2 x2 m1 y1 + m2 y2 X= , Y = , m1 + m2 m1 + m2 and (x, y) the relative position of the two bodies x = x1 − x2 ,

y = y1 − y2 .

In these new coordinates the Lagrangian, using proposition 33, is ˙ Y) ˙ +L ˆ=L ˆ 1 (X, ˆ 2 (x, y, x, ˙ y). ˙ L Therefore, the equations for the variables X and Y are decoupled from the ones for x, y. Elementary computations show that d2 d2 X = Y = 0. dt2 dt2 Thus X(t) = X0 + VX t and Y(t) = Y0 + VY t, for suitable constants X0 , Y0 , VX and VY . Since

p m1 m2 x˙ 2 + y˙ 2 − U ( x2 + y2 ), m1 + m2 2 the problem now is reduced to the previous example. L2 =

J

1. EULER-LAGRANGE EQUATIONS

55

Exercise 38 (Two body problem). Consider a system of two point bodies in R3 with masses m1 and m2 , whose relative location is given by the vector r ∈ R3 . Assume that the interaction depends only on the distance between the bodies. Show that by choosing appropriate coordinates, the motion can be reduced to the one of a single point m2 particle with mass M = mm11+m under a radial potential. Show, by 2 proving that r × r˙ is conserved, that the orbit of a particle under a radial field lies in a fixed plane for all times. Exercise 39. Let x : [0, T ] → Rn be a solution to the Euler-Lagrange equation associated to a C 2 Lagrangian L : Rn × Rn → R. Show that ˙ + x˙ · Dv L(x, x) ˙ E(t) = −L(x, x) is constant in time. For mechanical systems this is simply the conservation of energy. Occasionally, the identity dtd E(t) = 0 is also called the Beltrami identity. Exercise 40. Consider a system of n point bodies of mass mi , and positions ri ∈ R3 , 1 ≤ i ≤ n. Suppose the kinetic energy is T = P P mi 2 mm r| and the potential energy is U = − i,j6=i 2|rii−rjj | . Let I = i 2 |˙ P 2 i mi |ri | . Show that d2 I = 4T + 2U, dt2 which is strictly positive if the energy T + U is positive. What implications does this identity have for the stability of planetary systems? Exercise 41 (Jacobi metric). Let L(x, v) : Rn × Rn → R be a C 2 Lagrangian. Let x : [0, T ] → Rn be a solution to the corresponding Euler-Lagrange d Dv L − Dx L = 0, dt

(30) for the Lagrangian

L(x, v) = Let E(t) =

2 ˙ |x(t)| 2

+ V (x(t)).

1. Show that E˙ = 0.

|v|2 − V (x). 2

56

2. CALCULUS OF VARIATIONS IN ONE INDEPENDENT VARIABLE

2. Let E0 = E(0). Show that x is a solution to the Euler-Lagrange equation (31)

d Dv LJ − Dx LJ = 0 dt p ˙ associated to LJ = E0 − V (x)|x|. 3. Show that any reparametrization in time of x is also a solution to (31) and observe that the functional Z Tp ˙ E0 − V (x)|x| 0

represents the lenght of the path between x(0) and x(T ) using p the Jacobi metric g = E0 − V (x). 4. Show that the solutions to the Euler-Lagrange (31) when reparametrized in time in such a way that the energy of the reparametrized trajectory is E0 satisfy (30). Exercise 42 (Braquistochrone problem). Let (x1 , y1 ) be a point in a (vertical) plane. Show that the curve y = u(x) that connects (0, 0) to (x1 , y1 ) in such a way that a particle with unit mass moving under the influence a unit gravity field reaches (x1 , y1 ) in the minimum amount of time minimizes Z x1 r 1 + u˙ 2 dx. −2u 0 Hint: use the fact that the sum of kinetic and potential energy is constant. Determine the Euler-Lagrange equation and study its solutions, using exercise 39. Exercise 43. Consider a second-order variational problem: Z T ˙ x ¨) (32) min L(x, x, x

0

where the minimum is taken over all trajectories x : [0, T ] → Rn ˙ ˙ ). Determine the Eulerwith fixed boundary data x(0), x(T ), x(0), x(T Lagrange equation corresponding to .

2. FURTHER NECESSARY CONDITIONS

57

2. Further necessary conditions A classical strategy in the study of variational problems consists in establishing necessary conditions for minimizers. If there exists a minimizer and if the necessary conditions have a unique solution, then this solution has to be the unique minimizer and thus the problem is solved. In addition to Euler-Lagrange equations, several other necessary conditions can be derived. In this section we discuss boundary conditions which arise, for instance when the end-points are not fixed, and second-order conditions.

2.1. Boundary conditions. In certain problems, the boundary conditions, such as end point values are not prescribed a-priori. In this case, it is possible to prove that the minimizers satisfy certain boundary conditions automatically. These are called natural boundary conditions. Example 14. Consider the problem of minimizing the integral Z T ˙ L(x, x)dt, (33) 0

over all C 2 curves x : [0, T ] → Rn . Note that the boundary values for the trajectory x at t = 0, T are not prescribed a-priori. Let x be a minimizer of (33) (with free endpoints). Then for all ϕ : [0, T ] → Rn , not necessarily compactly supported, Z

T

˙ + Dv L(x, x) ˙ ϕdt Dx L(x, x)ϕ ˙ = 0. 0

Integrating by parts and using the fact that x is a solution to the Euler-Lagrange equation, we conclude that ˙ ˙ )) = 0. Dv L(x(0), x(0)) = Dv L(x(T ), x(T J

58

2. CALCULUS OF VARIATIONS IN ONE INDEPENDENT VARIABLE

Exercise 44. Consider the problem of minimizing the integral Z T ˙ L(x, x)dt, 0

over all C 2 curves x : [0, T ] → Rn such that x(0) = x(T ). Deduce that ˙ ˙ )). Dv L(x(0), x(0)) = Dv L(x(T ), x(T Use the previous identity to show that any periodic (smooth) minimizer is in fact a periodic solutions to the Euler-Lagrange equations. Exercise 45. Consider the problem of minimizing Z T ˙ L(x, x)dt + ψ(x(T )), 0

with x(0) fixed and x(T ) free. Derive a boundary condition at t = T for the minimizers. Exercise 46 (Free boundary). Consider the problem of minimizing Z T ˙ L(x, x), 0

over all terminal times T and all C 2 curves x : [0, T ] → Rn . Show that x is a solution to the Euler-Lagrange equation and that ˙ )) = 0, L(x(T ), x(T ˙ ))x(T ˙ ) + Dv L(x(T ), x(T ˙ ))¨ Dx L(x(T ), x(T x(T ) ≥ 0, ˙ )) = 0. Dv L(x(T ), x(T Let q ∈ R and L : R2 → R given by (v − q)2 x2 + −1 2 2 If possible, determine T and x : [0, T ] → R that are (local) minimizers of Z T ˙ L(x, x)ds, L(x, v) =

0

with x(0) = 0.

2. FURTHER NECESSARY CONDITIONS

59

2.2. Second-order conditions. If f : R → R is a C 2 function which has a minimum at a point x0 then f 0 (x0 ) = 0 and f 00 (x0 ) ≥ 0. For the minimal action problem, the analog of the vanishing of the first derivative is the Euler-Lagrange equation. We will now consider the analog to the second derivative being non-negative. The next theorem concerns second-order conditions for minimizers: Theorem 34 (Jacobi’s test). Let L(x, v) : Rn × Rn → R be a C 2 Lagrangian. Let x : [0, T ] → Rn be a C 1 minimizer of the action under fixed boundary conditions. Then, for each η : [0, T ] → Rn , with compact support in (0, T ), we have Z T 1 T 2 1 2 2 ˙ + η T Dxv ˙ η˙ ≥ 0. ˙ η˙ + η˙ T Dvv (34) η Dxx L(x, x)η L(x, x) L(x, x) 2 0 2 Proof. If x is a minimizer, the function  7→ I[x + η] has a d2 minimum at  = 0. By computing d 2 I[x + η] at  = 0 we obtain (34).  A corollary of the previous theorem is Lagrange’s test that we state next: Corollary 35 (Lagrange’s test). Let L(x, v) : Rn × Rn → R be a C 2 Lagrangian. Suppose x : [0, T ] → Rn is a C 1 minimizer of the action under fixed boundary conditions. Then 2 ˙ ≥ 0. Dvv L(x, x)

Proof. Use Theorem 34 with η = ξ(t) sin t , for ξ : [0, T ] → Rn , with compact support in (0, T ), and let  → 0.  Exercise 47. Let L : R2n → R be a continuous Lagrangian and let x : [0, T ] → Rn be a continuous piecewise C 1 trajectory. Show that for each δ > 0 there exists a trajectory yδ : [0, T ] → Rn of class C 1 such that Z T Z T ˙ − L(x, x) L(yδ , y˙ δ ) < δ. 0

0

60

2. CALCULUS OF VARIATIONS IN ONE INDEPENDENT VARIABLE

As a corollary, show that the value of the infimum of the action over piecewise C 1 trajectories is the same as the infimum over trajectories globally C 1 . Note, however, that a minimizer may not be C 1 . Exercise 48 (Weierstrass test). Let x : [0, T ] → Rn be a C 1 minimum of the action corresponding to a Lagrangian L. Let v, w ∈ Rn and 0 ≤ λ ≤ 1 be such that λv + (1 − λ)w = 0. Show that ˙ λL(x, x˙ + v) + (1 − λ)L(x, x˙ + w) ≥ L(x, x). Hint: To prove the inequality at a point t0 , choose η such that   if t0 ≤ t ≤ t + λ v  η(t) ˙ = w if t + λ < t ≤ t0 +     0 otherwise and consider I[x + η], as  → 0.

3. Applications to Riemannian geometry This section is dedicated to some applications of the calculus of variations to Riemannian geometry, namely the study of geodesics and curvature. We also present some applications to geometric mechanics, namely the study of the rigid body. In our examples we will use most of the time local coordinates and will not try to address global problems in geometry. In fact, by using suitable charts, the problems we address can usually be reduced to problems in Rn . To simplify the notation we will also use the Einstein convention for repeated indices, that is ai bi in fact is an abreviation of P i ai bi . Example 15. Let M be a Riemannian manifold with metric g, defined in local coordinates by the positive definite symmetric matrix gij (x). Let L : T M → R be given by 1 L(x, v) = gij (x)vi vj . 2

3. APPLICATIONS TO RIEMANNIAN GEOMETRY

61

Let x : [a, b] → M be a curve that minimizes Z b ˙ L(x, x)dt, a

over all curves with certain fixed boundary conditions. Then, we have d 1 (gij x˙ i ) − Dj gmk x˙ m x˙ k = 0, dt 2 that is, (35)

¨i + x

 1  ij g (Dk gmj + Dk gmj − Dj gmk ) x˙ m x˙ k = 0, 2

where g ij represents the inverse matrix of gij . We can write the previous equation in the more compact form ¨ i + Γikm x˙ m x˙ k = 0, x where 1 Γikm = g ij (Dk gmj + Dm gkj − Dj gmk ) 2 is the Christoffel symbol for the metric g (note that the change in the order of the indices in the second term does not change the sum in (35) but makes Γ symmetric in the indices m and k). J (36)

Theorem 36. Let gij be a smooth Riemannian metric in Rn . The critical points x of the functional Z T 1 (37) gij (x)x˙ i x˙ j dt 0 2 are also critical points of the functional Z Tq (38) gij (x)x˙ i x˙ j dt, 0

Additionally, we can reparametrize the critical points of (38) in such a way that they are also critical points of (37). Proof. The fact that the critical points of (37) are critical points of (38) is a simple computation. To prove the second part of the theorem it suffices to observe that the solutions of the Euler-Lagrange associated to L preserve the energy E = 21 gij (x)x˙ i x˙ j . Using this fact is

62

2. CALCULUS OF VARIATIONS IN ONE INDEPENDENT VARIABLE

easy to find the correct parametrization of the critical points of (38).  The minimizers of (38) are called geodesics, although sometimes the name is also used for critical points. Example 16. Consider a parametrization f : A ⊂ Rm → Rn of a m-dimensional manifold. The induced metric in Rm is represented by the matrix g = (Df )T Df. The motivation is the following, given a curve θ(t) ∈ M consider the ˙ in T M . Let x = f (θ) and x˙ = Df θ. ˙ corresponding tangent vector θ(t) Then we define ˙ θi ˙ = hx, ˙ xi, ˙ hθ, which gives rise, precisely to the induced metric.

J

Exercise 49. Consider R2 \{0} with polar coordinates (r, θ). Show that the standard metric in R2 can be written in these coordinates as " # 1 0 g= . 0 r2 Let 2 2 ˙2 ˙ = r˙ + r θ , L(r, θ, r, ˙ θ) 2 the Lagrangian of a free particle in polar coordinates. Compute the Euler-Lagrange equation and determine the corresponding Christoffel symbol.

Exercise 50. Consider the sphere x2 + y 2 + z 2 = 1 and the associate spherical coordinates (θ, ϕ) x = cos θ sin ϕ y = sin θ sin ϕ z = cos ϕ,

3. APPLICATIONS TO RIEMANNIAN GEOMETRY

63

θ ∈ (0, 2π) and ϕ ∈ (0, π). Show that the induced metric is given by the matrix " # sin2 ϕ 0 g= . 0 1 Determine the Euler-Lagrange equation for L = 21 gij vi vj and the Christoffel symbol corresponding to the coordinates (θ, ϕ). Exercise 51. Consider the revolution surface in R3 parametrized by (r, θ): x = r cos θ y = r sin θ z = z(r). Show that the induced metric is " # 1 + (z 0 )2 0 g= . 0 r2 Show that the equation for the geodesics is 2 θ¨ + r˙ θ˙ = 0 r 0 00 r ˙2 + z z ¨r − θ r˙ 2 = 0 1 + (z 0 )2 1 + (z 0 )2 Determine the corresponding Christoffel symbols. Prove the Clairaut identity, that is, that r cos β is constant, where β is the angle between ∂ ∂ ∂ and r˙ ∂r + θ˙ ∂θ . ∂θ Exercise 52 (Spherical pendulum). Show that for a spherical pendulum with unit mass, the Lagrangian can be written as θ˙2 sin2 ϕ + θ˙2 L= − U (ϕ). 2 Exercise 53. Determine the Lagrangian of point particle constrained to the cone z 2 = x2 + y 2 . Exercise 54. Consider the Lagrangian for a particle of unit mass constrained to move in the cycloid parametrized by x = θ − sin θ

y = cos θ.

64

2. CALCULUS OF VARIATIONS IN ONE INDEPENDENT VARIABLE

Show that the y coordinate is 2π-periodic for any initial condition that yields a periodic orbit.

3.1. Parallel Transport. The Christoffel symbols Γikm can be used to study parallel transport in a Riemannian manifold. In this section we define and discuss the main properties of parallel transport. Let M be a manifold and Ξ(M ) the set of all C ∞ vector fields in M . As usual in differential geometry, we identify vector fields in M with the corresponding first-order linear differential operators. That is, if X = (X1 , . . . Xn ) is a vector field, we identify X with the first order differential operator X=

X

Xi

i

∂ . ∂xi

Then, the commutator of two vector fields X and Y is the vector field [X, Y ], which is defined through its action as a differential operator in smooth functions f : [X, Y ]f = X(Y (f )) − Y (X(f )).

A connection ∇ in M is a mapping ∇:Ξ×Ξ→Ξ satisfying the following properties 1. ∇f X+gY Z = f ∇X Z + g∇Y Z, 2. ∇X (Y + Z) = ∇X Y + ∇X Z, 3. ∇X (f Y ) = f ∇X Y + X(f )Y , for all X, Y, Z ∈ Ξ(M ) and all f, g ∈ C ∞ (M ). The vector ∇X Y represents the rate of variation of Y along a curve tangent to X.

3. APPLICATIONS TO RIEMANNIAN GEOMETRY

65

Exercise 55. Let M be a manifold and ∇ a connection in M . Define Γikm as ∂ ∂ ∇ ∂ = Γikm . ∂xk ∂x ∂xi m Show that   ∂Yi ∂ i , (39) ∇X Y = Γkm Xk Ym + Xj ∂xj ∂xi whereX = Xj ∂x∂ j e Y = Yj ∂x∂ j . In every point x, the formula (39) only depends on the value of the vector field X at x, this allow us to define the covariant derivative of a vector field Y along a curve x(t) trough DY = ∇x˙ Y. dt A vector field X is parallel along a curve x(t) if DX = 0. dt A connection is symmetric if ∇X Y − ∇Y X = [X, Y ]. In general, connections in a manifold do not have to be symmetric, and therefore ∇X Y − ∇Y X = T (X, Y ) + [X, Y ], where T is the torsion. Exercise 56. Determine an expression for the torsion in local coordinates. Exercise 57. Let ∇ be a symmetric connection. Show that Γkij = Γkji . A manifold can be endowed with different connections. For Riemannian manifolds, are of special interest the connections which are

66

2. CALCULUS OF VARIATIONS IN ONE INDEPENDENT VARIABLE

compatible with the metric, that is, that for all vector fields X and Y satisfy (40)

D D d hX, Y i = h X, Y i + hX, Y i, dt dt dt

where the derivatives are taken along any arbitrary curve x(t). There exists a unique symmetric connection compatible with the metric, the Levi-Civita connection, whose Christoffel symbols are given by (36). Theorem 37. Let M be a Riemannian manifold with metric g. The the Levi-Civita connection, defined in local coordinates by the Christoffel symbols (36), is the unique connection which is symmetric and compatible with the metric g. Proof. Let ∇ be a connection which is symmetric and compatible with the metric g. Then one can use (40) to determine Dk gmj , Dm gkj and Dj gmk and it is a simple computation to show that its Christoffel symbols are give by (36). Exercise 58. Verify that the Christoffel symbols define a connection.

 Exercise 59. Use formula (36) to determine the Christoffel symbol corresponding to the polar coordinates in R2 - compare with the result of exercise 49. Exercise 60. Let X be a vector field and x a trajectory that satisfies dx = X(x). dt Show that in local coordinates ¨i x

∂ ∂Xi ∂ = Xk (x) , ∂xi ∂xk ∂xi

and, therefore,  ∂ DX ¨i = Γikm x˙ k x˙ m + x . dt ∂xi

3. APPLICATIONS TO RIEMANNIAN GEOMETRY

67

Show that the previous definition is independent of the choice of local coordinates, which allow us to define covariant acceleration as:  ∂ Dx˙ ¨i = Γikm x˙ k x˙ m + x , dt ∂xi for any C 2 trajectory. Example 17. Equation (15) can be then rewritten as Dx˙ = 0, dt which should be compared with the Newton law for a particle in the ¨ = 0. absence of forces x J Exercise 61. Let M be a Riemannian manifold in which is defined a potential V : M → R. The corresponding Lagrangian is 1 L(x, v) = gij vi vj − V (x). 2 Determine the Euler-Lagrange equation. Example 18. A force field in a manifold M is a mapping F : T M → T ∗M such that the image of Tx M is a subset of Tx∗ M . The generalized Newton law is Dx˙ = F, g dt in which the metric g is identified with the operator g : T M → T ∗ M defined by (gX)(Y ) = hX, Y i. J 3.2. Rigid Body - I. The rigid body is perhaps one of the best examples in which the geometric formalism of the classical mechanics is natural. Consider a rigid body F with a fixed point at the origin. The position of F at the time t can be described by a matrix M (t) ∈ SO(3) with M (0) = I (recall that SO(3) is the set of 3 × 3 matrices M that satisfy M T M = I and det M = 1). More precisely, consider a point of F which was at the position x in t = 0. Then at time t, the same

68

2. CALCULUS OF VARIATIONS IN ONE INDEPENDENT VARIABLE

point is located at x(t) = M (t)x. If the body has mass density ρ(x), the kinetic energy is given by Z 1 T = ρ(x)|M˙ x|2 . 2 Since M is an isometry, we have |M˙ x|2 = |M −1 M˙ x|2 , that is, 1 T = 2

Z

ρ(x)|M −1 M˙ x|2 .

The mapping that to a vector K, tangent to SO(3) at the point M , associates Z 1 ρ(x)|M −1 Kx|2 (41) K 7→ hK, KiM = 2 is a metric in SO(3) which is invariant by left translation. More precisely, let G ∈ SO(3) be fixed. The left translation by G is the mapping LG : SO(3) → SO(3) defined by LG M = GM,

M ∈ SO(3).

We have that L∗G : TM SO(3) → TGM SO(3) is simply L∗G K = GK

K ∈ TM SO(3).

A metric is called left-invariant if hL∗G K, L∗G KiLG M = hK, KiM . Exercise 62. Verify that the metric (41) is left invariant. Exercise 63. Let M (t) be a C 1 curve in SO(3). Show that: 1. The matrix M −1 M˙ is anti-symmetric. 2. There exists a vector ωM −1 M˙ = (ω1 , ω2 , ω3 ) such that   0 −ω3 ω2   M −1 M˙ =  ω3 0 −ω1  −ω2 ω1 0 and M −1 M˙ x = ωM −1 M˙ × x, in which × is the usual inner product in R3 . The vector ωM T M˙ is called the angular velocity.

3. APPLICATIONS TO RIEMANNIAN GEOMETRY

69

3. Verify that the kinetic energy is a quadratic form in ωM , that is, there exists a symmetric matrix I (the inertia tensor) such that 1 T T = ωM ˙. TM ˙ IωM T M 2 4. Let M1 (t) and M2 (t) be C 1 curves in SO(3) and M (t) = M1 (t)M2 (t). Determine ωM T M˙ as a function of ωM1T M˙ 1 and ωM2T M˙ 2 . 5. Let y(t) be the trajectory of a body in a referential under constant rotation. Identify the forces that act over the body: referential acceleration, centrifugal force and Coriolis force.

Let M (t) be a curve in SO(3). The trajectory of a point x is x(t) = M (t)x. Let G ∈ SO(3) and consider the change of coordinates Gy = x. Then y(t) = GT M (t)Gy. And, therefore, in the new coordinates the trajectory is N (t) = GT M (t)G The kinetic energy can be written as 1 T 1 T ˜ T = ωM ˙ = ωN T N˙ IωN T N˙ . TM ˙ IωM T M 2 2 We would like to relate I˜ and ωN T N˙ with I and ωM T M˙ . We have ωN T N˙ ∧ x = GT M T M˙ Gx = GT (ωM T M˙ ∧ (Gx))   = GT G (GT ωM T M˙ ) ∧ (GT (Gx)) = (GT ωM T M˙ ) ∧ x, that is, ωN T N˙ = GT ωM T M˙ and, consequently, I˜ = GT IG. Since I is symmetric, we can always choose a rotation matrix G such that in the new referential   I1 0 0   I =  0 I2 0  . 0 0 I3 The constants Ii are called the principal moments of inertia.

70

2. CALCULUS OF VARIATIONS IN ONE INDEPENDENT VARIABLE

3.3. Poincar´ e equations. Let M be a differentiable manifold. Consider a set of n linearly independent vector fields Zi in M . The speed x˙ of a trajectory x : [0, T ] → M can be written as a linear combination of these vector fields: ˙ x(t) = wi (t)Zi (x), the functions wi (t) are called quasi-velocities [AKN97]. Sometimes it is useful to write the Lagrangian as a function of the quasi-velocities, that is, we write L(x, w). We will deduce the Euler-Lagrange equations in this situation. Let us consider a family of trajectories xτ (t) depending differentiably on a parameter τ . We have ∂xτ = wi Zi ∂t

∂xτ = ξi Zi . ∂τ

Write Zi = Zik ∂x∂ k . Then by differentiating and dropping the subscript in xτ , ∂ 2 xk ∂wi k ∂Z k ∂xm = Zi + wi i ∂τ ∂t ∂τ ∂xm ∂τ ∂wi k ∂Z k = Zi + wi ξj Zjm i , ∂τ ∂xm and ∂Zjk ∂xm ∂ξi k ∂ 2 xk = Zi + ξj ∂t∂τ ∂t ∂xm ∂t ∂Zjk ∂ξi k m = Z + wi ξj Zi . ∂t i ∂xm As

∂2x ∂t∂τ

=

∂2x ∂τ ∂t

and " [Zj , Zi ] =

∂Z k Zjm i ∂xm



Zim

# ∂Zjk ∂ , ∂xm ∂xk

we have 0=

∂ξi ∂wi Zi − Zi + wi ξj [Zi , Zj ], ∂t ∂τ

that is, ∂wi ∂ξi = + cikj wk ξj , ∂τ ∂t

3. APPLICATIONS TO RIEMANNIAN GEOMETRY

71

where cikj Zi = [Zk , Zj ]. Then   Z T Z T d ∂L ∂ξi i L(xτ , wτ ) = Zi (L)ξi + + ckj wk ξj dτ 0 ∂wi ∂t 0 and, therefore, d ∂L ∂L k = Zi (L) + c wj . dt ∂wi ∂wk ji 3.4. Rigid body - II. Let M and N be differentiable manifolds. Recall that for a diffeomorphism f : M → N and any vector field X in T M we define the vector field f∗ X to be the vector field in T N which satisfies (f∗ X)(h) = (X(h ◦ f )) ◦ f −1 , for all h ∈ C ∞ (N ). In the case of the rigid body, or more generally, in the case of a Lagrangian defined in a Lie group and left invariant, we can choose the vector fields Zi of the form Zi (g) = g∗ Zi (e), that is, left invariant vector fields. Lemma 38. Let Xi and Yi , i = 1, 2, be vector fields in a manifold M and f : M → M a diffeomorfism. Assume that Yi = f ∗ Xi . Then [Y1 , Y2 ] = f∗ [X1 , X2 ]. Proof. Let p ∈ M . We have Yi (g)|f (p) = f∗ Xi (g)|f (p) = Xi (g ◦ f )|p , that is, Yi (g) ◦ f = Xi (g ◦ f ). Therefore Y1 (Y2 (g))|f (p) = X1 (Y2 (g) ◦ f ) = X1 (X2 (g ◦ f )).

72

2. CALCULUS OF VARIATIONS IN ONE INDEPENDENT VARIABLE

Consequently [Y1 , Y2 ] = f∗ [X1 , X2 ].  Thus, from the previous result, ckij is constant since [g∗ Zi , g∗ Zj ] = g∗ [Zi , Zj ]. Therefore, if L is left invariant L ≡ L(w). Consequently d ∂L ∂L k = c wj . dt ∂wi ∂wk ji In the case of a rigid body, using, if necessary, a orthogonal transformation to diagonalize the inertia tensor into diag(I1 , I2 , I3 ). We can choose vectors Z1 , Z2 , Z3 , such that in the identity they have the following form:       0 0 0 0 0 −1 0 1 0       Z1 =  −1 0 0  , Z2 =  0 0 0  e Z3 =  0 0 1  0 −1 0 1 0 0 0 0 0 and that are left invariant. Thus, the Lagrangian is L(w) =

I1 w12 + I2 w22 + I3 w32 . 2

Exercise 64. Verify that the commutator of the vector fields Zi corresponds to the commutator of the corresponding matrices, and that [Z1 , Z2 ] = Z3

[Z2 , Z3 ] = Z1

[Z3 , Z1 ] = Z2 .

Using the previous exercise, the Euler-Lagrange equation are then (42)

I1 w˙ 1 = (I2 − I3 )w2 w3 I2 w˙ 2 = (I3 − I1 )w3 w1 I3 w˙ 3 = (I1 − I2 )w1 w2 ,

that is, (43)

I w˙ + w × (Iw) = 0.

3. APPLICATIONS TO RIEMANNIAN GEOMETRY

73

The angular momentum vector is given by N = Iω. With this notation, (43) can be written as (44)

N˙ = N ∧ ω.

From the previous equation we conclude that d d kN k2 = 0 N · ω = 0. dt dt The first identity represents the conservation of the total angular momentum and the second the conservation of the energy. Let     0 ω1 −ω2 0 N1 −N2     A =  −ω1 L =  −N1 0 ω3  . 0 N3  , ω2 −ω3 0 N2 −N3 0 The equation (44) can be written as (45)

L˙ = [A, L].

A pair (A, L) satisfying (45) is called a Lax pair. Equations with the previous structure have a rich structure and are interesting in the study of diverse equations such as Kortwreg-of-Vries equations. Proposition 39. Let L be a solution of (45). Then the eigenvalues of L are constant. Furthermore, if v0 is an eigenvalue of L at t = 0 and v solves v˙ = Av, with v(0) = v0 then v(t) is an eigenvalue for all t. Proof. Let v(0) = v0 be an eigenvector of L at t = 0 with corresponding eigenvalue λ ∈ C. Define v(t) through the differential equation v˙ = Av. Then

d ˙ + Lv˙ = ALv − LAv + LAv = ALv, Lv = Lv dt that is, w = Lv satisfies w˙ = Aw,

w(0) = λv(0),

74

2. CALCULUS OF VARIATIONS IN ONE INDEPENDENT VARIABLE

which implies w(t) = λv(t).



The Euler equation (42) admits as stationary solutions rotations around each of the principal inertia axis. For instance, ω1 6= 0, ω2 = ω3 = 0. In the case in which I1 = I2 = I3 the only solutions are stationary rotations ω˙ = 0. Proposition 40. The stationary solution ω1 6= 0, ω2 = ω3 = 0 is stable if I1 < I2 , I3 or I2 , I3 < I1 and unstable if I2 < I1 < I3 or I3 < I1 < I2 . Proof. In the unstable cases, it suffices to look at the linearized matrix (42):   0 0 0   I3 −I1 0 ω1  ,  0 I2 2 ω1 0 0 I1I−I 3 and check that it has two eigenvalues with opposite sign. The stable case requires some additional work which is left to the reader.  If I1 = I2 = Ic the body is called a symmetrical top. In this in case, ω˙ 3 = 0 and Ic − I3 ω1 ω3 Ic I3 − Ic ω˙ 1 = ω3 ω2 . Ic ω˙ 2 =

From this last equation one concludes that ω ¨1 = −

(I3 − Ic )2 2 ω3 ω1 , Ic2

that is, ω ¨ 1 = −kω1 , with k > 0, which implies that ω1 is a periodic function, and, in a similar way, the same holds for ω2 .

4. HAMILTONIAN DYNAMICS

75

Finally, in the general case, the conservation of energy and total angular momentum implies that the trajectory ω(t) satisfies: I12 ω12 + I22 ω22 + I32 ω32 = C1 I1 ω12 + I2 ω22 + I3 ω32 = C2 , that is, the trajectories belongs to the intersection of two ellipsoids. Exercise 65. Consider a rigid body with mass density ρ. Show that the inertia tensor admits the matricial representation:  R  R R (y 2 + z 2 )dρ − xydρ − xzdρ R R 2 R   I =  − xydρ (x + z 2 )dρ − yzdρ  . R R R 2 − xzdρ − yzdρ (x + y 2 )dρ Exercise 66. Show that S(θ, ϕ, ψ) given by     cos ϕ − sin ϕ 0 1 0 0 cos ψ − sin ψ 0      sin ϕ cos ϕ 0   0 cos θ − sin θ   sin ψ cos ψ 0  , 0 0 1 0 sin θ cos θ 0 0 1 for (θ, ϕ, ψ) ∈ (0, π) × (0, 2π) × (0, 2π) defines a local parametrization of SO(3). The coordinates (θ, ϕ, ψ) are called the Euler angles. Exercise 67. Consider a rigid body with a fixed point and such that I1 = I2 . Show that the kinetic energy written in the local coordinates (θ, ϕ, ψ) is I1 ˙2 I3 (θ + ϕ˙ 2 sin θ2 ) + (ψ˙ + φ˙ cos θ)2 . 2 2

4. Hamiltonian dynamics In this section we introduce the Hamiltonian formalism of Classical Mechanics. We start by discussing the main properties of the Legendre transform. Then we derive Hamilton’s equations. Afterwards we discuss briefly the classical theory of canonical transformations. The section ends with a discussion of additional variational principles.

76

2. CALCULUS OF VARIATIONS IN ONE INDEPENDENT VARIABLE

4.1. Legendre transform. Before we proceed, we need to discuss the Legendre transform of convex functions. The Legendre transform is used to define the Hamiltonian of a mechanical system and it plays an essential role in many problems in calculus of variations. Additionally, it illustrates many of the tools associated with convexity. Let L(v) : Rn → R be a convex function, satisfying the following superlinear growth condition: L(v) = +∞. |v|→∞ |v| lim

The Legendre transform L∗ of L is L∗ (p) = sup [−v · p − L(v)] . v∈Rn

This is the usual definition of Legendre transform in optimal control, see [FS93] or [BCD97]. However, it differs by a sign from the Legendre transform traditionally used in classical mechanics: L] (p) = sup [v · p − L(v)] , v∈Rn

as it is defined, for instance, in [AKN97] or [Eva98b]. They are related by the elementary identity L∗ (p) = L] (−p). We will frequently denote L∗ (p) by H(p). The Legendre transform of H is denoted by H ∗ and is H ∗ (v) = sup [−p · v − H(p)] . p∈Rn

In classical mechanics, the Lagrangian L can depend also on a position coordinate x ∈ Rn , L(x, v), but for purposes of the Legendre transform x is taken as a fixed parameter. In this case we write also H(p, x) = L∗ (p, x). Proposition 41. Let L(x, v) be a C 2 function, which for each x fixed is uniformly convex and superlinear in v. Let H = L∗ . Then 1. H(p, x) is convex in p;

4. HAMILTONIAN DYNAMICS

77

2. H ∗ = L; 3. for each x H(p, x) = ∞; |p|→∞ |p| 4. let v ∗ be defined by p = −Dv L(x, v ∗ ), then lim

H(p, x) = −v ∗ · p − L(x, v ∗ ); 5. in a similar way, let p∗ be given by v = −Dp H(p∗ , x), then L(x, v) = −v · p∗ − H(p∗ , x); 6. if p = −Dv L(x, v) or v = −Dp H(p, x), then Dx L(x, v) = −Dx H(p, x). Proof. The first statement follows from the fact that the supremum of convex functions is a convex function. To prove the second point, observe that H ∗ (x, w) = sup [−w · p − H(p, x)] p

= sup inf [(v − w) · p + L(x, v)] . p

v

For v = w we conclude that H ∗ (x, w) ≤ L(x, w). The opposite inequality is obtained by observing, since L is convex in v, that for each w ∈ Rn there exists s ∈ Rn such that L(x, v) ≥ L(x, w) + s · (v − w). Therefore, H ∗ (x, w) ≥ sup inf [(p + s) · (v − w) + L(x, w)] ≥ L(x, w), p

v

by letting p = −s. To prove the third point observe that p L(x, −λ |p| ) H(p, x) ≥λ− , |p| |p|

78

2. CALCULUS OF VARIATIONS IN ONE INDEPENDENT VARIABLE

p by choosing v = −λ |p| . Thus, we conclude

lim inf |p|→∞

H(p, x) ≥ λ. |p|

Since λ is arbitrary, we have lim inf |p|→∞

H(p, x) = ∞. |p|

To establish the fourth point, note that for fixed p the function v 7→ v · p + L(x, v) is differentiable and strictly convex. Consequently, its minimum, which exists by coercivity and is unique by the strict convexity, is achieved for −p − Dv L(x, v) = 0. Note also that v as function of p is a differentiable function by the inverse function theorem. The proof of the fifth point is similar. Finally, to prove the last item, observe that for p(x, v) = −Dv L(x, v), we have H(p(x, v), x) = −v · p(x, v) − L(x, v). Differentiating this last equation with respect to x and using v = −Dp H(p(x, v), x), we obtain Dx H = −Dx L.  Exercise 68. Compute the Legendre transform of the following functions:

4. HAMILTONIAN DYNAMICS

1.

79

1 L(x, v) = aij (x)vi vj + hi (x)vi − U (x), 2 where aij is a positive definite matrix and h(x) an arbitrary vector field.

2. L(x, v) =

q aij (x)vi vj ,

where aij is a positive definite matrix. 3.

1 L(x, v) = |v|λ − U (x), 2 with λ > 1.

Exercise 69. By allowing the Lagrangian and its Legendre transform to assume the values ±∞ comute the Legendre transforms of 1. for ω ∈ Rn  0 if v = ω L(v) = +∞ otherwise. 2. for ω ∈ Rn set L(v) = ω · v. 3. for R > 0  0 if |v| ≤ R L(v) = +∞ otherwise. 4.2. Hamiltonian formalism. To motivate the Hamiltonian formalism, we consider the following alternative problem. Rather than looking for curves x : [0, T ] → Rn , which minimize the action Z T ˙ L(x, x)dt 0

we can consider extended curves (x, v) : [0, T ] → R2n which minimize the action Z T (46) L(x, v)dt 0

80

2. CALCULUS OF VARIATIONS IN ONE INDEPENDENT VARIABLE

and that satisfy the additional constraint x˙ = v. Obviously, this problem is equivalent to the original one, however it motivates the introduction of a Lagrange multiplier p in order to enforce the constraint. Therefore, we will look for critical points of Z T ˙ L(x, v) + p · (v − x)dt. (47) 0

Proposition 42. Let L : Rn × Rn → R be a smooth Lagrangian. Let (x, v) : [0, T ] → R2n be a critical point of (46) under fixed boundary conditions and under the constraint x˙ = v (the choice of p is irrelevant since the corresponding term always vanishes). Let p = −Dv L(x, v). Then the curve (x, v, p) is a critical point of (47) under fixed boundary conditions. Additionally, any critical point (x, v, p) of (47) satisfies   x˙ = v  p = −Dv L(x, v)

  

p˙ = Dx L(x, v).

In particular, x is a critical point of (46). Furthermore, the EulerLagrange equation can be rewritten as p˙ = Dx H(p, x)

x˙ = −Dp H(p, x).

Proof. Let φ, ψ and η be C 2 ([0, T ], Rn ) with compact support in (0, T ). Then, at  = 0 Z d T ˙ ˙ + (p + η) · (ψ − φ) L(x + φ, v + ψ) + (p + η) · (v − x) d 0 Z T ˙ + η · (v − x) ˙ + Dv Lψ + p · (ψ − φ) ˙ = Dx L(x, x)φ 0

Z

T

˙ + p] ˙ φ = 0. [Dx L(x, x)

= 0

If p = −Dv L(x, v), then v maximizes −p · v − L(x, v).

4. HAMILTONIAN DYNAMICS

81

Let H(p, x) = max [−p · v − L(x, v)] . v

By proposition 41 we have Dx H(p, x) = −Dx L(x, v) whenever p = −Dv L(x, v). Additionally, we also have v = −Dp H(p, x). Therefore, the Euler-Lagrange equation can be rewritten as p˙ = Dx H(p, x)

x˙ = −Dp H(p, x).

These are the Hamilton equations.



Exercise 70. Suppose H(p, x) : Rn × Rn → R is a C 1 function. Show that the energy, which coincides with H, is conserved by the Hamiltonian flow since d H(p, x) = 0. dt 4.3. Canonical transformations. Before discussing canonical transformations we need to review some basic facts about differential forms in Rn . Firstly, recall that given a C 1 function f : Rn → R its differential, denoted by df , is a mapping df : Rn × Rn → R that to any point x ∈ Rn and each direction v ∈ Rn it associates the derivative of f in the direction v: d df (x)(v) = f (x + vt) . dt t=0 Note that for each x ∈ Rn this mapping is linear in v. For instance, for each coordinate i ∈ {1, . . . , n} we can consider the projection function in this coordinate: x 7→ xi , whose differential is dxi . A (first order) differential form is any mapping Λ : Rn × Rn → R,

82

2. CALCULUS OF VARIATIONS IN ONE INDEPENDENT VARIABLE

which is linear on the second coordinate. For simplicity, we assume also that this map is continuous in the first coordinate. Clearly we can write X Λ= fi (x)dxi , i

where fi (x) = Λ(x)(ei ). An important example of a differential form is the differential df of a C 1 function f . In fact, by linearity, we have df =

X ∂f dxi . ∂xi i

The integral of a differential Λ form along a path γ : [0, T ] → Rn is simply Z T XZ T Λ(γ(t))(γ(t))dt ˙ = fi (γ(t))γ˙ i (t)dt. 0

i

0

Exercise 71 (Poincar´e-Cartan invariant). Fix t ∈ R and consider a closed curve γ = (x(s, t), p(s, t)) : [0, T ] → R2n . Suppose that for each fixed s ∈ [0, 1] d p(s, t) = Dx H(p(s, t), x(s, t)). dt

d x(s, t) = −Dp H(p(s, t), x(s, t)) dt Show that I

Z pdx ≡

1

p· 0

∂x ds ∂s

is independent of t. Exercise 72. Show that the critical points of Z T pdx + H(p, x)dt 0

under fixed boundary conditions satisfy the Hamilton equations.

4. HAMILTONIAN DYNAMICS

83

Let (x, p) be a solution of the Hamilton equation. By exercise 72, (x, p) is a critical point of Z pdx + Hdt. Let S(x, p) : R2n → R be a C 1 function. Then (x, p) is also a critical point of Z pdx + Hdt − dS, because the last integral differs from the previous only be the addition of the differential of a function S. Consider now a change of coordinates R P (x, p), X(x, p). In general the functional pdx + Hdt − dS when rewritten in terms of the new coordinates (P, X) does not have the R ¯ form P dX + H(P, X)dt, and, therefore, the Hamilton equations in these new coordinates may not have the standard form. A change of coordinates (x, p) 7→ (X(x, p), P (x, p)) is called canonical if there exist functions S and H(P, X) such that (48)

¯ pdx + Hdt − dS = P dX + Hdt.

Consider now a solution (x, p) : [0, T ] → R2n of Hamilton’s equations. Suppose the coordinate change (x, p) 7→ (X(x, p), P (x, p)) is canonical. Then the trajectory written in the new coordinates (X, P) is a critical point of the functional Z T ¯ PdX + Hdt. 0

Therefore (X, P) satisfies Hamilton’s equations in the new coordinates, which are (49)

¯ P˙ = DX H(P, X)

¯ X˙ = −DP H(P, X).

Thus, in order to have (48), we must have (because the change of coordinates does not depend on t) H(p, x) = H(P (p, x), X(p, x)). From this we conclude that pdx − P dX = dS.

84

2. CALCULUS OF VARIATIONS IN ONE INDEPENDENT VARIABLE

Suppose now we can write the function S in terms of x and X, that is S ≡ S(x, X). Then (50)

p = Dx S

P = −DX S.

Consider now the inverse procedure. Given S(x, X), suppose that (50) defines a change of coordinates (for this to happen locally it is sufficient, 2 by the implicit function theorem that det DxX S 6= 0). Then, in these new coordinates we have (49). Since S determines (at least formally) the change of coordinates, we call it a generating function. J Example 19. Consider the generating function S(x, X) = xX. Then the corresponding canonical transformation is p = X, P = −x. Thus, (x, p) 7→ (X, P ) = (p, −x) and H(P, X) = H(−P, X). J Suppose now that S, written as a function of (x, P ), is: S(x, P ) = −P X + S1 (x, P ). Then (48) can be written as: pdx + P dX + XdP − Dx S1 dx − DP S1 dP = P dX, that is, p = Dx S1

X = DP S1 .

Example 20. Let S1 (x, P ) = xP . Then p = P and X = x, therefore S1 generates the identity transformation. J Exercise 73. Assume now that S can be written as a function of X and p and that we have S(X, p) = px + S2 (X, p). Determine the corresponding canonical transformation in terms of S2 . Exercise 74. Suppose that S can be written as a function of p and P with the following form: S(p, P ) = px − P X + S3 (p, P ). Determine corresponding canonical transformation in terms of S3 .

4. HAMILTONIAN DYNAMICS

85

Example 21. Consider the Hamiltonian H ≡ H(px , py , x − y). Choosing S1 = P1 (x + y) + P2 (x − y) we obtain p x = P1 + P 2

p y = P1 − P 2 ,

X 1 = x 1 + x2

X2 = x − y,

and  H(P1 , P2 , X1 , X2 ) ≡ H(P1 , P2 , X2 ) = H

 P 1 + P2 P1 − P2 , , X2 , 2 2

which does not depend on X1 and, therefore, P1 , the total linear momentum is conserved. J Example 22. Let S1 (x, P ) be a C 2 solution of the Hamilton-Jacobi equation H(Dx S1 (x, P ), x) = H(P ). Suppose that X = DP S1 (x, P )

p = Dx S1 (x, P )

defines implicitly a change of coordinates (x, p) 7→ (X, P ). Assume 2 that det DxP S1 6= 0. Then, if (x(t), p(t)) satisfy x˙ = −Dp H(p, x)

p˙ = Dx H(p, x),

in the new coordinates we have ˙ = −DP H(P) X

˙ = 0. P J

Example 23. Consider a Hamiltonian H(p, x) with one degree of freedom, that is x, p ∈ R. We would like to construct a canonical change of coordinates such that the new Hamiltonian depends only on P . We will first construct the corresponding generating function. For that, suppose that there exists such a generating function S1 (x, P ). Then dS1 = XdP + pdx.

86

2. CALCULUS OF VARIATIONS IN ONE INDEPENDENT VARIABLE

Fix a value P . We will try to choose S1 so that the new Hamiltonian H depends only on P , that is H(p(P, X), x(P, X)) = H(P ). Along each curve γ = (x, p) : [0, T ] → R2 such that P is constant, we have dS1 = pdx. Therefore, Z

T

˙ p(t) · x(t)dt.

S1 (x(T ), P ) − S1 (x(0), P ) = 0

In principle, from the equation H(p, x) = H(P ). So we can solve for p as a function of x and of the value H(P ). In this case, the generating function is automatically determined as a function of H and of x. In the following example we consider a concrete application of this technique. J Example 24. Consider the Hamiltonian system with one degree of freedom: p2 H(p, x) = + V (x), 2 with V (x) 2π-periodic. For each value of H(P ) we have (assuming for definiteness p > 0) Z xq S1 (x, P ) = 2(H(P ) − V (y))dy. 0

Therefore, Z X= 0

x

∂ ∂H

q 2(H(P ) − V (y))DP H(P )dy.

In principle, the function H(P ) can be more or less arbitrary. To impose uniqueness it is convenient to require periodicity in the change of variables X(0, P ) = X(2π, P ), which implies 

∂ DP H(P ) = ∂H

Z



−1 q   2 H(P ) − V (y) dy .

0

J

4. HAMILTONIAN DYNAMICS

87

Exercise 75. Show that the polar coordinates change of variables (x, p) = (r cos θ, r sin θ) is not canonical. Determine a function g(r) such that (x, p) = (g(r) cos θ, g(r) sin θ) is a canonical transformation (for r > 0). 4.4. Other variational principles. In the case of Hamiltonian systems, as the next exercise shows, there exists an additional variational principle: Exercise 76. Show that the critical points (x, p) of the functional Z T px˙ − xp˙ + H(p, x) 2 0 are solutions to the Hamilton equation Unfortunately the functional of the previous exercise is not coercive in W 1,2 and may not have any minimizer. The Clarke duality principle (following exercise) is another variational principle for convex Hamiltonians which is coercive. Exercise 77 (Clarke duality). Let H(p, x) : R2n → R be a C ∞ function, strictly convex and coercive, both in x and p. Let H ∗ (v˙ x , v˙ p ) : R2n → R be the total Legendre transform H ∗ (wx , wp ) = sup −wx · x − wp · p − H(p, x). x,p

Let (vx , vp ) be a critical point of Z T 1 [vx · v˙ p − vp · v˙ x ] + H ∗ (v˙ x , v˙ p ). 0 2 Show that x = −Dv˙ x H ∗ (v˙ x , v˙ p )

p = −Dv˙ p H ∗ (v˙ x , v˙ p )

is a solution of Hamilton’s equations. Exercise 78. Apply the previous exercise to the Hamiltonian H(p, x) =

p 2 + x2 . 2

88

2. CALCULUS OF VARIATIONS IN ONE INDEPENDENT VARIABLE

Example 25 (Maupertuis principle). Consider a system with Lagrangian L and energy given by ˙ = Dv L(x, x) ˙ x˙ − L(x, x). ˙ E(x, x) Since the energy is conserved by the solutions of the Euler-Lagrange equation, the critical points of the action are also critical points of the functional Z T Z T ˙ x, ˙ L+E = Dv L(x, x) 0

0

under the constraint that energy is conserved. Obviously, in general it is hard to construct energy-preserving variations. We are going to illustrate, in an example, how to avoid this problem. Let L be the Lagrangian 1 L(x, v) = gij vi vj − U (x). 2 Then, 1 E = gij vi vj + U (x) 2 and Dv Lv = gij vi vj . Thus we can write Dv Lv = 2 (E − U (x)) . Therefore the functional can be rewritten as Z Tp p (51) M (x, E) = 2 (E − U (x)) gij x˙ i x˙ j dt 0

The last term represents the arc length along the curve that connects x(0) to x(T ). This integral is independent of the parametrization and therefore we can look at its critical points (without any constraint) which obviously depend on the parameter E. Then, once determined, in principle we can choose a parametrization of the curve that preserves the energy. The next exercise shows that such critical points are solutions to the Euler-Lagrange equation:

5. SUFFICIENT CONDITIONS

89

Exercise 79. Let x be a critical point of M (x, E0 ) parametrized in such a way that ˙ = E0 . E(x, x) Show that x is a solution of the Euler-Lagrange equation.

J

5. Sufficient conditions This section addresses a very classical topic in the calculus of variations, namely the study of conditions that ensure that a solution to the Euler-Lagrange equation is indeed a minimizer.

5.1. Existence of minimizers. In general, it is not possible to guarantee that a solution to the Euler-Lagrange is a minimizer of the action. However, for short time, the next theorem settles this issue. Theorem 43 (Existence of minimizers). Let L(x, v) be strictly convex in v satisfying 2 |Dxx L| ≤ C,

2 |Dxv L| ≤ C.

Let x : [0, T ] → Rn be a solution to the Euler-Lagrange equation. Then, for T sufficiently small, x is a minimizer of the action over all C 1 functions y with the same endpoints: y(0) = x(0), and y(T ) = x(T ).

Proof. Observe that if f is a C 2 function then Z 1Z s 0 f (1) = f (0) + f (0) + f 00 (r)drds. 0

0

Applying this identity to ˙ f (r) = L((1 − r)x + ry, (1 − r)x˙ + ry),

90

2. CALCULUS OF VARIATIONS IN ONE INDEPENDENT VARIABLE

we obtain T

Z

˙ L(y, y)dt 0 T

Z

˙ + Dx L(x, x)(y ˙ ˙ y˙ − x) ˙ [L(x, x) − x) + Dv L(x, x)(

= 0 Z 1

s

Z



+ 0

2 ˙ (y − x)T Dxx L((1 − r)x + ry, (1 − r)x˙ + ry)(y − x)

0

2 ˙ y˙ − x) ˙ + 2(y − x)T Dxv L((1 − r)x + ry, (1 − r)x˙ + ry)(   2 ˙ y˙ − x) ˙ drds dt. ˙ T Dvv L((1 − r)x + ry, (1 − r)x˙ + ry)( +(y˙ − x)

Since x satisfies the Euler-Lagrange equation and, by strict convexity, 2 Dvv L ≥ γ, we have Z

T

T

Z ˙ L(y, y)dt ≥

˙ [L(x, x)

0

0 1

Z

s

Z

2 ˙ (y − x)T Dxx L((1 − r)x + ry, (1 − r)x˙ + ry)(y − x)  2 ˙ y˙ − x) ˙ drds +2(y − x)T Dxv L((1 − r)x + ry, (1 − r)x˙ + ry)(  ˙ 2 dt. +γ|y˙ − x|

+

0

0

The one-dimensional Poincar´e inequality implies T

Z

T2 |y − x| dt ≤ 2 2

0

Z

T

˙ 2 dt, |y˙ − x|

0

that is, Z 0

T

1

Z

Z

s

(y − x)T ·

0

·

0 2 Dxx L((1

≥ −CT 2

Z 0

˙ − r)x + ry, (1 − r)x˙ + ry)(y − x)drdsdt

T

˙ 2. |y˙ − x|

5. SUFFICIENT CONDITIONS

Thus, for any , Z T Z 1Z 0

0

91

s

(y − x)T ·

0

2 ˙ y˙ − x)drdsdt ˙ · Dvx L((1 − r)x + ry, (1 − r)x˙ + ry)( Z T Z C T 2 ˙ − |y˙ − x| ≥ − |y − x|2  0 0  Z T CT 2 ˙ 2. ≥− + |y˙ − x|  0

Thus, choosing T sufficiently small and taking,  = T we obtain Z T Z T Z T ˙ ˙ +θ ˙ 2, L(y, y)dt ≥ L(x, x) |y˙ − x| 0

0

0

for some θ > 0.



Exercise 80. Prove the one-dimensional Poincar´e inequality Z T Z T2 T ˙ 2 2 φ ≤ |φ| 2 0 0 for all C 1 function φ satisfying φ(0) = φ(T ) = 0. Exercise 81. Suppose that the Lagrangian L instead of satisfying 2 |Dxx L| ≤ C,

2 |Dxv L| ≤ C,

as in theorem 43, satisfies 2 |Dxx L| ≤ C(1 + |v|2 ),

2 |Dxv L| ≤ C(1 + |v|).

Assume further that the curves y are constrained to have bounded derivatives in L2 . Can you adapt the proof and the statement of theorem 43 to include this case? 5.2. Hamilton-Jacobi equations. Theorem 44. Let V (x, t) be a C 2 solution of the Hamilton-Jacobi equation (52)

Vt = H(Dx V, x),

for 0 ≤ t ≤ T . Let x be a solution to the equation x˙ = −Dp H(Dx V (x), x).

92

2. CALCULUS OF VARIATIONS IN ONE INDEPENDENT VARIABLE

Then x is a solution to the Euler-Lagrange equation d Dv L − Dx L = 0 dt which minimizes the action Z T ˙ L(x, x)dt, (53) 0

under fixed boundary conditions. Proof. Obviously, it suffices to show that the trajectory x minimizes the action that automatically it will be a solution to the EulerLagrange equation. Observe that the problem of minimizing (53) with fixed endpoints is equivalent to minimize Z T ˙ + xD ˙ x V (x) + Vt dt, L(x, x) 0

with the same endpoint constraint. In the trajectory x we have Z T ˙ + xD ˙ x V (x) + Vt dt = 0. L(x, x) 0

But for any other trajectory y we have ˙ + yD ˙ x V (y) ≥ L(y, y) ˙ − Dp H(Dx V (y), y)Dx V (y) L(y, y) = −H(Dx V (y), y), and, therefore, Z

T

˙ + yD ˙ x V (y) + Vt (y)dt ≥ 0. L(y, y) 0

 To solve the Hamilton-Jacobi equation we can use the method of characteristics: let (p, x) be a solution of Hamilton’s equation: (54)

p˙ = Dx H(p, x)

x˙ = −Dp H(p, x),

with initial data (p(0), x(0)) = (Dx V (x), x). Then Z t ˙ V (x(0), 0) − V (x(t), t) = L(x, x)ds. 0

5. SUFFICIENT CONDITIONS

93

Therefore, in order for the method of characteristics to yield a solution to the Hamilton-Jacobi equation in a neighborhood of the trajectory, we must have that the mapping x 7→ x(t; x) is invertible. As it was seen previously, the equation (54) is equivalent to the Euler-Lagrange equation: d ˙ − Dx L = 0. Dv L(x, x) dt The derivative of this equation with respect to a parameter is the Jacobi equation i d h 2 ˙ 2 2 2 (55) Dvv LY + Dxv LY − Dxv LY˙ − Dxx LY = 0. dt If Y (0) = I then there exists T > 0 such that det Y (t) 6= 0 for all 0 ≤ t < T , and therefore the method of the characteristics yields a local solution of the Euler-Lagrange equation.

5.3. Existence and regularity of minimizers. In this section we assume that the Lagrangian L(x, v) is C ∞ , strictly convex in v, satisfies (56)

− C + θ|v|2 ≤ L(x, v) ≤ C(1 + |v|2 ),

for θ > 0, and that, for each fixed compact K and x ∈ K we have (57)

|Dx L(x, v)| ≤ CK (1 + |v|2 ),

(58)

|Dv L(x, v)| ≤ CK (1 + |v|).

Theorem 45. Suppose L(x, v) : Rn × Rn → R is smooth and satisfies the previous (56), (57) and (58). Then, for any T > 0 and any x0 , x1 ∈ Rn , there exists a minimizer of x ∈ W 1,2 [0, T ] of Z T ˙ (59) L(x, x)ds 0

satisfying x(0) = x0 , x(T ) = x1 .

94

2. CALCULUS OF VARIATIONS IN ONE INDEPENDENT VARIABLE

Proof. Let xn be a minimizing sequence. Then, using (56) we conclude that kx˙ n kL2 is uniformly bounded. By Poincar´e inequality, we conclude that sup kxn kW 1,2 < ∞. n

By Morrey’s theorem, the sequence xn is equicontinuous and bounded (since xn (0) is fixed), thus there exists, by Ascoli-Arzela theorem, a subsequence which converges uniformly. We can extract a further subsequence that converges weakly in W 1,2 to a function x. We would like to prove that x is a minimum. To do that it is enough to prove that the functional is weakly lower semicontinuous, that is, that Z T Z T ˙ L(x, x), L(xn , x˙ n ) ≥ (60) lim inf n→∞

0

0

1,2

whenever xn * x in W . By contradiction suppose that there is a sequence xn * x such that Z T Z T ˙ (61) lim inf L(xn , x˙ n ) < L(x, x), n→∞

0

0

By convexity, Z T (62) L(xn , x˙ n ) 0

Z

T

˙ + Dv L(x, x)( ˙ x˙ n − x). ˙ L(xn , x˙ n ) − L(x, x˙ n ) + L(x, x)

≥ 0

Because x˙ n * x˙ we have Z T ˙ x˙ n − x) ˙ → 0, Dv L(x, x)( 0

˙ ∈ L2 . From the uniform convergence of xn to x we since Dv L(x, x) conclude that Z T L(xn , x˙ n ) − L(x, x˙ n ) → 0, 0

since |L(xn , x˙ n ) − L(x, x˙ n )| ≤ CK |xn − x|(1 + |x˙ n |2 ). Thus by taking the lim inf in (62) we obtain a contradiction to (61), and therefore (60) holds. 

5. SUFFICIENT CONDITIONS

95

Theorem 46. Let x be a minimizer of (59). Then x is a weak solution to the Euler-Lagrange equation, that is, for all ϕ ∈ Cc∞ (0, T ), Z T ˙ + Dv L(x, x) ˙ ϕ˙ = 0. Dx L(x, x)ϕ (63) 0

Proof. To obtain this result, it is enough to prove that at  = 0, Z T Z d T d L(x + ϕ, x˙ + ϕ) ˙ = L(x + ϕ, x˙ + ϕ) ˙ , d 0 0 d =0 =0 that is, justify the exchange of the derivative with the integral. By Morrey’s theorem, since x ∈ W 1,2 (0, T ), we have kxkL∞ ≤ C. So x ∈ K for a suitable compact set K. Let || < 1. Observe that ˜ ⊃ K such that x + ϕ ∈ K ˜ for all t. For there exists a compact K almost every t ∈ [0, T ], the function  7→ L(x + ϕ, x˙ + ϕ) ˙ is a C 1 function of . Furthermore ˙ 2 + |ϕ| |L(x + ϕ, x˙ + ϕ)| ˙ ≤ CK˜ (1 + |x˙ + ϕ| ˙ 2 ) ≤ CK˜ (1 + |x| ˙ 2 ), and, d ≤ C ˜ (1 + |x| L(x + ϕ, x˙ + ϕ) ˙ 2 + |ϕ| ˙ 2 )(|ϕ| + |ϕ|). ˙ ˙ K d This estimate allows us to exchange the derivative with the integral.  Exercise 82. Show that the identity (63) also holds for ϕ ∈ W01,2 . Theorem 47. Suppose L(x, v) : Rn × Rn → R is smooth, satisfies (56) and it is strictly convex. Then the weak solutions to the Euler-Lagrange equation are C 2 and, therefore, classical solutions. Proof. Let x ∈ W 1,2 (0, T ) be a weak solution to the Euler-Lagrange equation. Define Z T ˙ p(t) = p0 + Dx L(x, x)ds, t

96

2. CALCULUS OF VARIATIONS IN ONE INDEPENDENT VARIABLE

with p0 ∈ Rn to be chosen later. For each ϕ ∈ Cc∞ (0, T ) taking values in Rn we have Z T T d (p · ϕ)dt = p · ϕ 0 = 0. 0 dt Thus, Z T ˙ + pϕdt −Dx L(x, x)ϕ ˙ = 0. 0

Using the Euler-Lagrange equation in the weak form we conclude that Z T ˙ ϕdt (p + Dv L(x, x)) ˙ = 0, 0

which implies that p + Dv L is constant, that is, ˙ p = −Dv L(x, x), choosing p0 conveniently. Since p is continuous, by the previous identity, x˙ = −Dp H(p, x). Therefore, x˙ is continuous. Moreover, if H(p, x) is the Hamiltonian associated to L, we have p˙ = Dx H(p, x), which shows that p is C 1 . But, since x˙ = −Dp H(p, x), we have that x˙ is C 1 and, therefore, x is C 2 .



5.4. Conjugate points. In this section we study the second variation of the action and certain issues concerning the existence of minimizing trajectories. If the Lagrangian corresponds to the kinetic energy in a Riemannian Manifold we also study the connections with curvature. 5.4.1. Second variation and conjugate points. The next exercise establishes the connection between Jacobi equation (55) and the second variation: Exercise 83. Let x : [0, T ] → Rn . Consider the functional Z T 1 1 2 Dvi vj LY˙ i Y˙ j + Dx2i vj LYi Y˙ j + Dx2i xj LYi Yj . Y 7→ 2 0 2

5. SUFFICIENT CONDITIONS

97

Show that the Euler-Lagrange equation is Jacobi equation (55). Show that if Y is a solution of the Jacobi equation with Y (0) = Y (T ) = 0 then Z T 1 2 1 (64) Dvi vj LY˙ i Y˙ j + Dx2i vj LYi Y˙ j + Dx2i xj LYi Yj = 0. 2 0 2 Let x is a solution of the Euler-Lagrange equation corresponding to the Lagrangian L. A point x(T ) is conjugate to x(0) if there exists a non vanishing solution of (55) satisfying Y (0) = Y (T ) = 0. The dimension of the space of solutions Y to the Jacobi equation which satisfy Y (0) = 0 is n. Similarly, the space of solutions Y to the Jacobi equation which satisfy Y (0) = 0 is also n. Since the space of solutions to the Jacobi equation is 2n, in general the intersection of these two spaces is 0-dimensional, i.e. it only contains the trivial solution. Exercise 84. Let x be a solution to the Euler-Lagrange equation. Show d that Y¯ (t) = dλ x(λt) λ=0 is a solution to the Jacobi equation satisfying Y¯ (0) = 0. Suppose L = 12 gij vi vj for some Riemannian metric g. Show that Y¯ (t) 6= 0, for all t 6= 0. Conclude that the space of solutions Y to the Jacobi equation which satisfy Y (0) = Y (T ) = 0 is at most n − 1. Theorem 48. Let L(x, v) be a C ∞ Lagrangian, strictly convex and coercive. Let x a solution of the Euler-Lagrange equation corresponding to the Lagrangian L. Let T be such that x(T ) is conjugate to x(0). then the trajectory x is not a local minimum of the action Z T1 ˙ L(x, x) 0

para T1 > T . Proof. Let Y be a non-trivial solution of the Jacobi equation com Y (0) = Y (T ) = 0. for each  > 0 consider the trajectory  x + Y if 0 ≤ t ≤ T x = x otherwise.

98

2. CALCULUS OF VARIATIONS IN ONE INDEPENDENT VARIABLE

for each δ > 0, computing the Taylor expansion up to second order and taking into account (64) we obtain Z T +δ Z T +δ ˙ + O(3 ). L(x , x˙  ) ≤ L(x, x) 0

0

However, if the sign of the term O(3 ) is negative, we obtain a contradiction, if it is positive, by replacing Y by −Y , we are in the previous situation. Therefore the only non-trivial case occurs when the third order term vanishes and we have: Z T +δ Z T +δ ˙ + O(4 ). L(x , x˙  ) ≤ L(x, x) 0

0

Let ϕ be defined in the following way ϕ(t) = Y (t) if 0 ≤ t ≤ T − δ, ϕ(t) = 0 for t > T + δ and is linear in t for T − δ ≤ t ≤ T + δ, interpolating between the values of ϕ(T − δ) = Y (T − δ) and 0 = ϕ(T + δ). We would like to show that Z T +δ Z T +δ ˙ L(x + ϕ, x˙ + ϕ) ˙ < L(x, x), 0

0

if  and δ were chosen conveniently. For that we will proceed to prove a series of estimates. To simplify notation, and for reasons that will be clear later on, we assume that δ = 3/2 and we will use the relation a ∼ b to denote a = b+O(4 ), and similarly for  and ≺ for inequalities. We have Z T +δ

Z

T

L(x , x˙  ) ∼ T −δ

L(x (T ), x˙  (T ))+ T −δ

Dx L(x (T ), x˙  (T ))(x (t) − x (T ))+ Dv L(x (T ), x˙  (T ))(x˙  (t) − x˙  (T ))+ Z

T +δ

˙ ))+ L(x(T ), x(T T

˙ ))(x(t) − x(T ))+ Dx L(x(T ), x(T ˙ ))(x(t) ˙ − x(T ˙ )). Dv L(x(T ), x(T We observe that since |t − T | ≤ δ ˙ )(t − T ) + O(δ 2 ). x(t) − x(T ) = x(T

5. SUFFICIENT CONDITIONS

99

Furthermore ˙ )) + O() Dx L(x (T ), x˙  (T )) = Dx L(x(T ), x(T ˙ )) + O(). Dv L(x (T ), x˙  (T )) = Dv L(x(T ), x(T consequently, Z T +δ ˙ ) L(x , x˙  ) ∼δL(x(T ), x˙  (T )) + δL(x(T ), x(T T −δ   ˙ ) x˙  (T ) + x(T + 2δγ2 |Y˙ (T )|2 ∼ 2δL x(T ), 2 Z T +δ L(x + ϕ, x˙ + ϕ) + Cδ2 . T −δ

and so, Z T −δ

T +δ

Z

T +δ

L(x + ϕ, x˙ + ϕ) ˙ ≺

L(x , x˙  ) + 0

Z

T −δ

˙ L(x, x)ds − Cδ2 ,

0

which, for δ = 3/2 and  sufficiently small, implies that x does not minimize the action between 0 and T + δ.  5.4.2. Curvature. The curvature tensor R is defined by R(X, Y )Z = ∇X ∇Y Z − ∇Y ∇X Z − ∇[X,Y ] Z. Exercise 85. Show that l R(X, Y )Z = Rijk Xi Yz Zk

where l Rijk =

∂ , ∂ xl

∂Γjk ∂Γik l m l − + Γm jk Γim − Γik Γjm . ∂xi ∂xj

Exercise 86 (Bianchi’s identity). Show that for all vector fields X, Y, Z R(X, Y )Z + R(Y, Z)X + R(Z, X)Y = 0. Theorem 49. Let L be the Lagrangian be the kinetic energy defined by a Riemannian metric. Consider a geodesic x with tangent vector ˙ Then Jacobi’s equation can be written as X = x. (65)

D2 Y = R(X, Y )X. dt2

100

2. CALCULUS OF VARIATIONS IN ONE INDEPENDENT VARIABLE

Proof. Consider a one-parameter family of geodesics φ(t, δ), that is for each δ the mapping t 7→ ψ(t, δ) is a geodesic. Let Y =

∂φk ∂ ∂δ ∂xk

X=

∂φk ∂ . ∂t ∂xk

and We have [X, Y ] = 0, and so

R(X, Y )X = ∇X ∇Y X − ∇Y ∇X X − ∇[X,Y ] X = ∇X ∇Y X − ∇ Y ∇X X = ∇X ∇Y X, since ∇X X = DX = 0. Once more, using [X, Y ] = 0 and the fact that dt the connection is symmetric, we have ∇Y X = ∇X Y which then yields (65).  Lemma 50. For all vector fields X, Y, Z, we have hR(X, Y )Z, Zi = 0. Proof. We have h∇Y ∇X Z, Zi = Y h∇X Z, Zi − h∇X Z, ∇Y Zi, and h∇X ∇Y Z, Zi = Xh∇Y Z, Zi − h∇Y Z, ∇X Zi. Therefore h∇X ∇Y Z, Zi − h∇Y ∇X Z, Zi = Xh∇Y Z, Zi − Y h∇X Z, Zi = XY hZ, Zi − Y XhZ, Zi − XhZ, ∇Y Zi + Y hZ, ∇X Zi, that is, 1 h∇X ∇Y Z, Zi − h∇Y ∇X Z, Zi = [X, Y ]hZ, Zi. 2 Since h∇[X,Y ] Z, Zi = [X, Y ]hZ, Zi − hZ, ∇[X,Y ] Zi we have

1 h∇[X,Y ] Z, Zi = [X, Y ]hZ, Zi, 2 which implies the desired identity.



5. SUFFICIENT CONDITIONS

101

Proposition 51. Let Y be a solution of (65) along a geodesic x whose tangent vector is X = x˙ and satisfies D X = 0. dt Suppose that Y (0) = 0 and that hX,

DY i=0 dt

at t = 0. Then hX, DY i = 0 for all t. dt Proof. We have D D D D2 d hX, Y i = h X, Y i + hX, 2 Y i dt dt dt dt dt = hX, R(X, Y )Xi = 0, taking into account that 50.

D X dt

= 0, and using in the last identity lemma 

Suppose we are looking for solutions to the Jacobi equation satisfying Y (0) = 0 along a geodesic x. Consider the solution Y¯ constructed in exercise 84. Observe that Y¯˙ is tangent to the geodesic x. We can write the solution Y = aY¯ + Y⊥ where a ∈ R and Y⊥ is a solution to the Jacobi equation such that Y⊥ (0) = 0 and Y˙ ⊥ (0) is orthogonal to ˙ x(0). By the previous proposition, Y˙ ⊥ (t) is orthogonal at all times to ˙ x(t). Additionally, by exercise 84, Y¯ (t) 6= 0 for all t 6= 0. Therefore, if Y (T ) = 0 then a = 0. Consequently, to look for conjugate points it suffices to consider initial conditions orthogonal to the geodesic. A manifold has constant sectional curvature k0 if for all vector fields X, Y, W, Z we have hR(X, Y )W, Zi = k0 [hX, W ihY, Zi − hY, W ihX, Zi] . Exercise 87. Show that the sphere x2 + y 2 + z 2 has constant sectional curvature.

102

2. CALCULUS OF VARIATIONS IN ONE INDEPENDENT VARIABLE

Exercise 88. Let ei be an orthonormal basis for Tp M . Show that if M has constant sectional curvature then Rijkl = hR(ei , ej )ek , el i = k0 (δik δjl − δil δjk ). Example 26. Let M be a manifold with constant sectional curvature. ˙ = 1 and let Y be a Jacobi field Let x be a geodesic in M with |x| ˙ Then, Jacobi’s equation orthogonal to x. D2 Y ˙ Y )x˙ = R(x, dt2 can be written as D2 Y = k0 Y, dt2 since for each vector field X we have ˙ Y )x, ˙ Xi = k0 (hx, ˙ xihY, ˙ ˙ XihY, xi) ˙ = k0 hY, Xi. hR(x, Xi − hx, Thus, depending on the sign of k0 , we obtain the following solutions  √  sin t k0 e(t) if k0 > 0   Y (t) =

  

te(t) if k0 > 0 √ sinh t −k0 e(t) if k0 < 0,

where e(t) is a parallel vector field. As a conclusion, if the sectional curvature is negative, the geodesics cannot have conjugate points. J 5.4.3. Computation of conjugate points. In this section we explicitly compute conjugate points. Example 27 (Sphere). Consider a sphere of radius 1 in spherical coordinates (θ, ϕ) as in exercise (50). The Euler-Lagrange equations are   d (sin2 ϕθ) ˙ =0 dt  d ϕ˙ + sin ϕ cos ϕθ˙2 = 0. dt

And the corresponding Jacobi equation is   h i  d sin2 ϕp˙θ  + d 2 sin ϕ cos ϕθp ˙ ϕ =0 dt dt  d p˙ϕ + cos2 ϕpϕ θ˙2 − sin2 ϕpϕ θ˙2 + 2 sin ϕ cos ϕθ˙p˙θ = 0. dt

5. SUFFICIENT CONDITIONS

103

Consider a geodesic ϕ = π2 , θ˙ = 1 (the equator). In this case the Jacobi equation is  p¨ = 0 θ

p¨ϕ + pϕ = 0, which has as a particular solution pϕ = sin t

pθ = 0,

which shows that θ = π is conjugated to θ = 0.

J

Example 28 (Lobatchewski plane). Consider the following metric in the upper semiplane (y > 0) given by # " 1 0 2 . g= y 0 y12 The geodesics minimize Z

x˙ 2 + y˙ 2 , 2y2

and, consequently, are solutions to the Euler-Lagrange equation     d x˙ d y˙ x˙ 2 + y˙ 2 = 0. = 0, + dt y2 dt y2 y3 Consider vertical geodesics, that is with x˙ = 0. Then   d y˙ y˙ 2 = 0, + dt y2 y3 which admits y = ae−t as a solution. The Jacobi equation is     ˙ y d 2xp d p˙ x − =0 dt y2 dt y3   ˙ y d p˙ y yp x˙ p˙ x + y˙ p˙ y x˙ 2 + y˙ 2 − 2 + 2 − 3 py = 0. dt y2 y3 y3 y4

104

2. CALCULUS OF VARIATIONS IN ONE INDEPENDENT VARIABLE

Observe that to determine the conjugate points we only need to consider solutions which are orthogonal to the geodesic. So we for vertical geodesics we can set px = 0, and x˙ = 0. Thus   ˙ y d p˙ y y˙ p˙ y yp y˙ 2 + 2 − 2 − 3 py = 0 dt y2 y3 y3 y4 Set p =

py . y2

Then y˙ 2 y˙ p˙ + 2 p − 3 2 p = 0 y y

Since y˙ = −y we have p˙ − p = 0. We leave as a (simple) exercise to check that therefore there are no conjugate points. J 5.4.4. Cut locus. Theorem 52. Let x be a solution of the Euler-Lagrange equation and let T > 0 be the infimum of all t for which x is not a minimizing trajectory. Then either x(0) and x(T ) are conjugated or there exists y such that y(0) = x(0) and y(T ) = x(T ) such that Z T Z T ˙ = ˙ L(x, x) L(y, y). 0

0

Proof. Since for t > T the trajectory x is not minimizing, there exist ti > T and solutions to the Euler-Lagrange equation yi such that ti → T , yi (0) = x(0), yi (ti ) = x(ti ). By the proof of theorem 45, which guarantees the existence of minimizing trajectories, we can ˙ assume that y˙ i (0) is uniformly bounded. Then y˙ i (0) → y(0), through ˙ ˙ some subsequence. If y(0) 6= x(0), it is easy to check that we have Z T Z T ˙ = ˙ L(x, x) L(y, y), 0

0

otherwise the trajectory x would not be minimizing for t < T . In the second case, consider the flow φ(x, v, t) = (φx , φv ) given by the Euler-Lagrange equations with initial conditions (x, v) at 0, that is, ˙ ˙ φ(x(0), x(0), t) = (x(t), x(t). If x(0) is not conjugated to x(T ) the

6. SYMMETRIES AND NOETHER THEOREM

105

˙ matrix Dv φx (x(0), x(0), T ) is non singular, therefore for v in a neigh˙ borhood of x(0) and t sufficiently close to T the mapping v 7→ φ(x(0), v, t) ˙ is a diffeomorfism. But y˙ i (0) → x(0) and yi (ti ) = xi (ti ) which is a contradiction.  6. Symmetries and Noether theorem Noether’s theorem concerns variational problems which admit symmetries. By this theorem, associated to each symmetry there is a quantity that is conserved by the solutions of the Euler-Lagrange equation. In classical mechanics, for instance, translation symmetry yields conservation of linear momentum, to rotation symmetry corresponds conservation of angular momentum and time-invariance implies energy conservation. 6.1. Routh’s method. We start the discussion of symmetries by considering a classical technique to simplify the Euler-Lagrange equa˙ y), ˙ that is, indepentions. Consider a Lagrangian of the form L(x, x, dent of the coordinate y. Note that this corresponds to translation invariance in the coordinate y. The Euler-Lagrange equation shows that ˙ y) ˙ py = −Dy˙ L(x, x, is constant. We will explore this fact to simplify the Euler-Lagrange ˙ w) is strictly convex equations. We assume further that w 7→ L(x, x, and superlinear. Then we define the partial Legendre transform with ˙ Routh’s function, as respect to y, ˙ py ) = sup −py · w − L(x, x, ˙ w). R(x, x, w

˙ py ). By convexity, the supremum is achieved at a unique point w(x, x, We have that py = −Dw L

y˙ = −Dpy R.

106

2. CALCULUS OF VARIATIONS IN ONE INDEPENDENT VARIABLE

Note that, by the Euler-Lagrange equation p˙ y = 0 and,   d ∂R ∂R d ∂L ∂L d ∂L ∂w ∂w − =− + − + py dt ∂ x˙ ∂x dt ∂ x˙ ∂x dt ∂w ∂ x˙ ∂ x˙ ∂L ∂w ∂w + + py ∂w ∂x ∂x d ∂L ∂L = − = 0. dt ∂ x˙ ∂x Therefore, since py is constant, we can solve these equations in the following way: for each fixed py consider the equation d ∂R ∂R − = 0. dt ∂ x˙ ∂x Once this equation is solved, determine y˙ through ˙ py ). y˙ = −Dpy R(x, x, Exercise 89. Apply Routh’s method to the Lagrangian L=

x˙ 2 y˙ 2 + − U (x). 2 2

Exercise 90. Apply Routh’s method to the symmetric to in an external field which has as Lagrangian L=

I1 ˙2 I3 (θ + ϕ˙ 2 sin2 θ) + (ψ˙ + ϕ˙ cos θ)2 − U (ϕ, θ). 2 2

Exercise 91. Apply Routh’s method to the spherical pendulum whose Lagrangian is: θ˙2 sin2 ϕ + ϕ˙ 2 L= − U (ϕ). 2 6.2. Noether theorem. As a motivation for the definition of invariance of a Lagrangian with respect to a transformation group, observe that if φ : Rn → Rn is a diffeomorphism and γ : [0, T ] → Rn is an arbitrary curve, then φ(γ) is another curve in Rn whose velocity is Dx φ(γ)γ. ˙ Suppose for each τ ∈ R, φτ : Rn → Rn is a diffeomorphism.

6. SYMMETRIES AND NOETHER THEOREM

107

We say that a Lagrangian L(x, v) is invariant under a transformation group φτ (x) if for each τ ∈ R L(x, v) = L(φτ (x), Dx φτ (x)v). We will assume additionally φτ is differentiable in τ . Theorem 53. Let L be a Lagrangian invariant under a smooth transformation group φτ (x). Let x be a solution of the Euler-Lagrange equation. then d ˙ )) Dv L(x(T ), x(T φτ (x(T )) dτ τ =0 is independent of T . Proof. Let x be a solution of the Euler-Lagrange equation and xτ (t) = φτ (x(t)). Then ˙ x˙ τ = Dx φτ (x(t))x(t). Consequently, Z

T

L(xτ , x˙ τ )

(66) 0

is constant in τ . Differentiating (66) with respect to τ we obtain Z T dxτ dx˙ τ Dx L(xτ , x˙ τ ) + Dv L(xτ , x˙ τ ) = 0. dτ dτ 0 Integrating by parts, using the Euler-Lagrange equation, and taking τ = 0 we obtain d φτ (x(0)) Dv L(xτ (0), x˙ τ (0)) dτ τ =0 d = Dv L(xτ (T ), x˙ τ (T )) φτ (x(T )) . dτ τ =0  Exercise 92. Let ω ∈ Rn and L(x, v) be a Lagrangian satisfying, for all τ , L(x+ωτ, v) = L(x, v). Show that Dv L·ω is a constant of motion.

108

2. CALCULUS OF VARIATIONS IN ONE INDEPENDENT VARIABLE v 2 +v 2

2

2

Exercise 93. Let L(x, y, vx , vy ) = x 2 y − x +y . Show that L is in2 variant by rotations and, using Noether’s theorem, that the angular momentum xvy − yvx is a constant of motion. Theorem 54. Suppose L is a Lagrangian which does not depend on t. Then the energy is conserved. Proof. Observe that Z T +h ˙ − h))dt L(x(t − h), x(t h

is independent on h. Differentiate with respect to h, integrate by parts using the Euler-Lagrange equation.  Example 29. Consider the Lagrangian L=

x˙ 2 + y˙ 2 , 2y2

corresponding to the geodesic flow in the Lobatchewski plane. Identifying the upper semi-plane with {z ∈ C : =(z) > 0} and the points (x, y) with z = x + iy, the mapping z 7→

az + b cz + d

defines an action of the group SL(2, R), the group of matrices with unit determinant, in the Lobatchewski plane, which leaves the Lagrangian invariant. Use matrices of the form " # " # " # 1 τ eτ 0 1 0 A1 (τ ) = , A2 (τ ) = e A3 (τ ) = , 0 1 0 e−τ τ 1 we obtain the conservation laws x˙ , y2

xx˙ + yy˙ and y2

˙ 2 − y2 ) + 2yxy ˙ x(x . 2 y J

Exercise 94. Obtain the general law F (x, y) = 0 of motion of a geodesic in the Lobatchewski plane.

6. SYMMETRIES AND NOETHER THEOREM

109

6.3. Monotonicity formulas. As before, let L.Rn × Rn → R be a smooth Lagrangian. A sub-symmetry (resp. super-symmetry) of L is a (smooth) one-parameter mapping φτ (x) such that φ0 (x) = x and d L(φτ (x), Dx φτ (x)v) ≤ 0 (resp. ≥ 0). dτ τ =0 A simple variation of the proof of Noether’s theorem yields: Theorem 55. Let φτ be a sub-symmetry of L. Then   d d ˙ Dv L(x, x) φτ (x) ≤ 0, dt dτ τ =0 with the opposite inequality for super-symmetries. Proof. It suffices to observe that Z T d ˙ 0≥ L(φτ (x), Dx φτ (x)x)dt dτ 0 τ =0 Z T d d d ˙ ˙ φτ (x) + Dv L(x, x) φτ (x) = Dx L(x, x) dτ dt dτ 0 τ =0 τ =0 T d ˙ φτ (x) , = Dv L(x, x) dτ τ =0 0 which then implies the result.



An application of this theorem is the following corollary: Corollary 56. Let L(x, v) : Rn × Rn → R be smooth Lagrangian admitting a strict sub-symmetry. Then the corresponding Euler-Lagrange equations does not have periodic orbits. Next we present some additional examples and applications. Example 30. Suppose, for some y ∈ Rn and h ≥ 0, L(x + hy, v) ≤ L(x, v), then d ˙ ≤ 0. Dv L(x, x)y dt J

110

2. CALCULUS OF VARIATIONS IN ONE INDEPENDENT VARIABLE

Example 31. Consider the case in which L(λx, λv) is increasing in λ, for λ ≥ 0. Then d ˙ ≥ 0. Dv L(x, x)x dt J Example 32. Consider the mapping φτ (x) = x + τ F (x), and assume that d L(x + τ F (x), v + τ Dx F v) ≤ 0, dτ at τ = 0. Then d ˙ (x) ≤ 0. Dv L(x, x)F dt 2

Consider the case L = |v|2 , and F = ∇U , for some concave function U . Then d |(I + τ D2 U )v|2 = v T D2 U v ≤ 0. dτ 2 τ =0 Thus d ∇U · v ≤ 0, dt that is d2 U (x) ≤ 0, dt2 that is U (x(t)) is a concave function.

J

Example 33. Consider a system of n non-interacting particles, and set X U= |xi − xj |. i6=j

Clearly U is a convex function. By the previous example we have d2 |xi − xj | ≥ 0. dt2 J Exercise 95. Consider a smooth Lagrangian of the form e−αt L(x, v) This Lagrangian is sub-invariant in time.

7. CRITICAL POINT THEORY

1. Prove that

111

d −αt e E(t) ≥ 0, dt

where ˙ x˙ − L(x, x). ˙ E = Dv L(x, x) In particular, show that this estimate yields exponential blow up of the energy. 2. Impose conditions upon L that ensure that exponential blow up of the kinetic energy can also be bounded using simple estimates by E(t) ≤ Ceβt . Exercise 96. Consider the Lagrangian:, L : Rn × Rn → R L(x, v) =

1 β 1 α |v| − |x| . β α

Deduce the Virial theorem: Z Z 1 T β 1 T α ˙ = lim lim |x| |x| T →∞ T 0 T →∞ T 0 Hint: use the scaling transformation x → λx, for λ in a neighborhood of 1.

7. Critical point theory In this section we discuss methods to construct non-minimizing critical points. 7.1. Some informal computations. Let T > 0 be given. For a ∈ Rn , let xa be an orbit which minimizes the action under the con˙ ˙ ) this orbit does not straint x(0) = x(T ) = a. In general x(0) 6= x(T have period T . Let I[a] be the function that associates to a the action xa : Z T I[a] = L(xa , x˙ a )dt. 0

At the maxima or minima of I[a], if I is differentiable I 0 [a] = 0,

112

2. CALCULUS OF VARIATIONS IN ONE INDEPENDENT VARIABLE

that is (assuming xa is differentiable in a) Z T Dx L(xa , x˙ a )Da xa + Dv L(xa , x˙ a )Da x˙ a . 0= 0

Integrating by parts and using the fact that xa satisfies the EulerLagrange equation, we obtain Dv L(xa (0), x˙ a (0)) = Dv L(xa (T ), x˙ a (T )), which is equivalent to p(0) = p(T ) and, if the Legendre transform v 7→ p = −Dv L is injective (see exercise 97), implies x˙ a (0) = x˙ a (T ). Thus we conclude that the orbits corresponding to maxima or minima of I[a] are T periodic. Exercise 97. Show that if L(x, v) if strictly convex in v then then v 7→ Dv L(x, v) is injective.

In general the differentiablity of xa is hard to establish and in the next section we work around this problem using mountain pass techniques.

7.2. Mountain pass lemma. Let H be a Hilbert space with inner product (·, ·). Consider a functional Φ : H → R. Φ is differentiable if there exists a function Φ0 (u) ∈ H such that kΦ(u) − Φ(v) − (Φ0 (u), v − u)k = 0. ku−vk→0 ku − vk lim

A function Φ is C 1 if Φ0 exists and is continuous. Similarly Φ is C 1,1 if Φ0 is Lipschitz. A point u ∈ H is a critical point if Φ0 (u) = 0. The set of critical points in the level set Φ(u) = c is denoted by Kc = {u : Φ0 (u) = 0, Φ(u) = c}.

7. CRITICAL POINT THEORY

113

A functional Φ : H → R satisfies the Palais-Smale condition if any sequence (uk ) ∈ H satisfying sup |Φ(uk )| ≤ C and kΦ0 (uk )k → 0 is pre-compact, that is, it admits a convergent subsequence. Lemma 57 (Deformation lemma). Let Φ : H → R be a functional satisfying the Palais Smale condition. Let c ∈ R be such that Kc = ∅. Then, there exists  > 0 and δ > 0 and a continuous function η : [0, 1] × H → H such that 1. 2. 3. 4.

η0 (u) = u; η1 (u) = u if |Φ(u) − c| > ; Φ(ηt (u)) ≤ Φ(u); Φ(η1 (u)) ≤ c − δ if Φ(u) ≤ c + δ.

Proof. Firstly, we claim that there exist non-negative real numbers σ and  such that |Φ(u) − c| <  =⇒ kΦ0 (u)k ≥ σ. To show this claim, assume by contradiction that there exist sequences σk → 0 and k → 0 such that |Φ(uk ) − c)| ≤ k and kΦ0 (uk )k ≤ σk . This implies the existence of a convergent subsequence of uk with limit u. This vector is a critical point, which is a contradiction. Choose δ, 0 < δ <  and 0 < δ < A = {u : |Φ(u) − c| > },

σ2 . 2

Define

B = {u : |Φ(u) − c| < δ}.

Let dist(u, A) , dist(u, A) + dist(u, B) 0 ≤ g ≤ 1. We have that g ≡ 0 in A and g ≡ 1 in B. Let also  1 if 0 ≤ t ≤ 1 h(t) =  1 if t > 1. g(u) =

t

Consider V (u) = −g(u)h(kΦ0 (u)k)Φ0 (u).

114

2. CALCULUS OF VARIATIONS IN ONE INDEPENDENT VARIABLE

For each u ∈ H consider the equation (67)

η˙ t = V (ηt ),

with η0 = u. We have that d Φ(ηt ) ≤ 0. dt If ηt ∈ B then d Φ(ηt ) ≤ −σ 2 dt and for ηt ∈ A then V ≡ 0. Finally, to end the proof, it is enough to observe that if |Φ(u)−c| < δ 2 then we have Φ(η1 ) ≤ c − δ since σ2 > δ.  Exercise 98. Show that the solution of (67) is continuous on the initial condition u. Theorem 58 (Mountain pass). Let Φ be a C 1,1 functional satisfying the Palais Smale condition. Suppose that 1. Φ(0) = 0; 2. Φ(u) ≥ a if kuk = r, where a, r > 0 3. there exists v ∈ H such that Φ(v) ≤ 0, with kvk > r. Let Γ = {g ∈ C([0, 1], H) : g(0) = 0, g(1) = v} then the set Kc , with c = inf max Φ(g(t)), g∈Γ 0≤t≤1

is non-empty. Proof. Clearly c > a. Suppose that Kc = ∅. Choose  < a2 and apply the deformation lemma to construct the homeomorphism η. Let g be such that max Φ(g(t)) ≤ c + δ.

0≤t≤1

7. CRITICAL POINT THEORY

115

Then max Φ(η(g(t))) ≤ c − δ,

0≤t≤1

which is a contradiction.



Exercise 99. Consider the Lagrangian 1 1 x4 L(x, v) = |v|2 + x2 −  . 2 2 4 Let Φ be the functional Z 1 ˙ Φ(x) = L(x, x)ds 0

defined in

1 (0, 1). Hper

1. Show that Φ is differentiable and show that its derivative is give by Z 1 0 hΦ (x), yi = x˙ y˙ + xy − x3 y 0

2. Show that Φ0 (x) is Lipschitz in x, that is, the vector w ∈ H that satisfies Z 1 0 hΦ (x), yi = z˙ y˙ + zy, 0

is a Lipschitz function of x. 3. Show that Φ satisfies the Palais-Smale condition: (a) Let xn be a sequence satisfying Φ(xn ) ≤ C and Φ0 (xn ) → 0. Show that Z 1

x˙ 2n + x2n ≤ C.

0

(b) Show that this implies that, through a subsequence, xn * x, 1 for some function x in Hper (0, 1) and that xn → x uniformly. 1 (c) Use the fact that Φ0 (xn ) → 0 in Hper (0, 1) to show that 1 xn → x in Hper (0, 1) using Lax-Milgram theorem. 4. Show that x ≡ 0 is a strict local minimum of the action, that is, 1 , Φ[x] ≥ αkxkHper for some α > 0 and kxk sufficiently small.

116

2. CALCULUS OF VARIATIONS IN ONE INDEPENDENT VARIABLE

5. Show that there exists a curve y 1-periodic that satisfies Φ[y] < 0. 6. Prove the existence of a non-trivial 1-periodic solution EulerLagrange equation.

8. Invariant measures An important issue in dynamical systems are invariant measures under the flow induced by a vector field. In this section we review some results and construct invariant measures under the Hamiltonian flow. Lemma 59. Let µ be a measure on a manifold M . Let χ be a smooth vector field on M . The measure µ is invariant with respect to the flow generated by the vector field χ iff for any smooth compactly supported function ξ : M → R we have Z ∇ξ · χdµ = 0. M

Proof. Let Φt be the flow, generated by the vector field χ. Then if µ is invariant under Φt , for any smooth compactly supported function ξ(x) and any t > 0 we have Z  ξ Φt (x) − ξ(x)dµ = 0. By differentiating with respect to t, and setting t = 0, we obtain the “only if” part of the theorem. To establish the converse, we have to prove that for any t the measure µt is well-defined as  µt (S) = µ (Φt )−1 (S) . and coincides with µ.

8. INVARIANT MEASURES

117

By the Riesz representation theorem it is sufficient to check that the identity Z Z ξdµ = ξdµt holds for any continuous function ξ (vanishing at ∞). Any continuous function can be uniformly approximated by smooth functions. Therefore it is sufficient to prove the above identity for smooth functions ξ with compact support. Assume, without loss of generality, that ξ(x) is a C 2 -smooth function. Fix t > 0. We have to prove that Z  ξ Φt (x) − ξ(x)dµ = 0. We have Z N −1 Z X    ξ Φt (x) − ξ(x)dµ = ξ Φt(k+1)/N (x) − ξ Φtk/N (x) dµ k=0

=

N −1 Z X

 ξk Φt/N (x) − ξk (x)dµ ,

k=0

 where ξk (x) = ξ Φtk/N (x) N −1 Z X

 ξk Φt/N (x) − ξk (x)dµ

k=0 N −1 Z X

=

 ∇ξk (x) · Φt/N (x) − x + O( Nt2 )dµ =

k=0

=

N −1 Z X

∇ξk (x) ·

t χ(x) N

 + O( Nt2 ) + O( Nt2 )dµ

k=0

=

t N

N −1 Z X

∇ξk (x) · χ(x)dµ + O( Nt ) = O( Nt ).

k=0

Taking the limit N → ∞ we complete the proof.



Exercise 100. Consider a measure on R2n with density eβH(p,x) . Show that this measure is invariant under the Hamiltonian flow generated by H.

118

2. CALCULUS OF VARIATIONS IN ONE INDEPENDENT VARIABLE

Exercise 101. Show that the Hamiltonian flow preserves area in phase space. Example 34. Let u(x, P ) be a solution of H(P + Dx u, x) = H(P ). Then the graph (68)

p = P + Dx u(x, P ),

is invariant under the flow generated by (75). Furthermore, the flow ˙ is constant. restricted to this graph is conjugated to a translation as X If the Hamiltonian H(p, x) is Zn periodic in x, and u is a Zn periodic function, that is H(p, x + k) = H(p, x), and u(x + k) = u(x), for all p, x ∈ Rn and k ∈ Zn , the graph (68) can be interpreted as an invariant torus. Furthermore, as the Lebesgue measure dX in the new coordinates is invariant under the Hamiltonian dynamics, the change of variables formula implies that the measure supported in the graph (68) with density (69)

θ(x)dx = det(I + DP2 x u)dx

is an invariant measure. 9. Non convex problems This section is an introduction to the calculus of variations for nonconvex Lagrangians. Exercise 102. Suppose that L(x, v) → ∞, |v|→∞ |v| lim

uniformly in x. Show that any C 1 minimizing sequence of the action with fixed endpoints is equicontinuous. Exercise 103. Consider the problem Z 1 2 ˙ min |tx(t)| dt. x(−1)=0,x(1)=1

Show that xn =

−1

1 arctan nx + 2 2 arctan n

10. GEOMETRY OF HAMILTONIAN SYSTEMS

119

is a minimizing sequence that does not converge uniformly. Exercise 104 (Discontinuities). Let L(x, v) : R2n → R be a C 2 function. Consider a continuous trajectory x(·), sectionally C 1 . Suppose that x is a minimizer of Z T ˙ L(x, x)dt, 0

over all piecewise C 1 curves which satisfy fixed boundary conditions. Let t0 be a point where x˙ is discontinuous with left and right limits v ± . Determine an equation that relates v + with v − . Show that, if L(x, v) is strictly convex in v, the continuous minimizers which are sectionally C 2 and whose left and right derivatives exist at all points are of class C 1. Exercise 105 (Lavrentiev phenomenon). Consider the variational problem Z 1

(u3 − t)2 u˙ 6 .

min u(0)=1,u(1)=1

0

Show that u = t1/3 minimizes this problem when the minimum is taken over continuous functions u on [0, 1] and differentiable in (0, 1). However, for any sequence uk of continuous functions on [0, 1] satisfying uk (0) = 0 and uk (1) = 1 with bounded derivative and converging pointwise to x1/3 we have Z 1 (u3k − t)2 u˙ 6k → ∞. 0

10. Geometry of Hamiltonian systems We can discuss the Hamiltonian formalism using a more geometric approach. Suppose for now that in (47) we can apply the minimax principle and exchange inf v supp with supp inf v . In this case we obtain the problem Z T ˙ (70) inf sup −H(p, x)dt − p · xdt. x(·) p(·)

0

120

2. CALCULUS OF VARIATIONS IN ONE INDEPENDENT VARIABLE

To generalize the problem, suppose the variable x represents a point in a manifold M and consider the differential form on [0, T ] × T ∗ M σ = −Hdt − α, with α = pdx. Then (70) is equivalent to determining critical points of Z (71) σ γ

over all curves (x, p) : [0, T ] → T ∗ M . In a more general setting, suppose we are given an even dimensional manifold S, which replaces T ∗ M , and is endowed with a 1−form α such that dα is non-degenerate. Let H : S → R, we would like to determine the critical curves, γ, of (71). That is, curves γ such that for all C 1 variation γτ we have Z d σ = 0. dτ γτ τ =0 Let γ : [0, T ] → S be a critical point, XH : [0, T ] → T S a tangent vector to the curve γ, Y vector field in S with Y (γ(0)) = Y (γ(T )) = 0 and, finally, set φτ = exp(τ Y ). Consider Z T Z −H(φτ (γ))dt − αφτ (γ) (Dφ∗τ XH )dt i(τ ) = σ = γ

0

along φτ (γ). Exercise 106. Show that Z T di(τ ) = −dH(x, p)(Y ) − dα(XH , Y ). dτ τ =0 0 e, therefore, the critical points satisfy dα(XH , ·) = −dH(·). Hint: Observe that d ∗ αφτ (γ) (Dφτ XH ) = LY α(XH ), dτ τ =0 and recall that LY α = d(iY α) + iY (dα).

10. GEOMETRY OF HAMILTONIAN SYSTEMS

121

A symplectic manifold is an even dimensional manifold S endowed with a closed non-degenerate 2-form ω (recall that a form is nondegenerate if for all non-zero vector field X, ω(X, ·) is non-zero). Given a Hamiltonian H : S → R, the vector field XH which generates the Hamiltonian flow is uniquely determined by the equation ω(XH , ·) = −dH. It is important to observe that the form ω is only required to be closed, and not exact. Locally this distinction is irrelevant, but it has important consequences at the global level. Exercise 107. Consider R4 with the symplectic form ω = dp1 ∧ dx1 + 2dp2 ∧ dx2 . Let H : R4 → R. Determine XH . To determine the vector field XH it is necessary to solve the system of linear equations iXH ω = −dH. To avoid this problem, we introduce the Poisson bracket o {F, G} of two functions F and G defined as {F, G} = ω(XF , XG ). Exercise 108. Show that {F, G} = XF (G). In this way we can identify {F, ·} = XF . P Exercise 109. Let ω = i dpi ∧ dxi . Determine the Poisson bracket. Exercise 110. Show that {·, ·} 1. is bilinear; 2. anti-symmetric; 3. satisfies the Leibnitz rule: {F, GH} = {F, G}H + {F, H}G; 4. satisfies the Jacobi identity: {F, {G, H}} + {H, {F, G}} + {G, {H, F }} = 0. A Poisson manifold is a manifold P (in arbitrary dimension) endowed with a bracket {·, ·} satisfying the properties 1-4 of the previous exercise.

122

2. CALCULUS OF VARIATIONS IN ONE INDEPENDENT VARIABLE

Exercise 111. Show that using the Poisson one can define the vector field corresponding to a Hamiltonian H through the identification of {H, ·} with the vector field XH . Exercise 112. Let M be a Poisson manifold and F1 , F2 : M → R such that {F1 , F2 } = C. Show that [XF1 , XF2 ] = 0. Hint: Consider [XF1 , XF2 ]g for arbitrary g : M → R. 11. Perturbation theory Exercise 113. Consider the Hamiltonian H : R4 → R given by H(p, x) = ω · p +  sin(x1 + x2 ). Assume that ω ∈ R2 satisfies ω · k > 0 for all k ∈ Z2 . Show that exists a canonical transformation(x, p) 7→ (X, P ) such that the new Hamiltonian is H(P ) = ω · P. Consider now the case |p|2 + ω · p +  sin(x + y). 2 Show that in a neighborhood of P = 0 we have, using the same change of coordinates, H(p, x) =

H(P, X) = ω · P + O(2 + |P |2 ). We consider Hamiltonians of the form (72)

H (p, x) = H0 (p) + H1 (p, x),

with H0 , H1 smooth, H0 (p) strictly convex and H1 (p, x) bounded with bounded derivatives, and Zn periodic in x. We would like to approximate the solutions of (73)

H (P + Dx u , x) = H  (P ),

11. PERTURBATION THEORY

123

We are given a reference value P = P0 , and we assume that for  = 0 the rotation vector ω0 = DP H 0 (P0 ) satisfies Diophantine nonresonance conditions C (74) |ω0 · k| ≥ s , |k| for some positive constant C and some real s > 0. In this section we review the classical perturbation theory for Hamiltonian systems using a construction equivalent to the Poincar´e normal form near an invariant tori. Somewhat incorrectly, but following [AKN97], we call it the Linstedt series method. Although these results are fairly standard, see [AKN97], for instance, we present them in a more convenient form for our purposes. Consider the Hamiltonian dynamics:  x˙ = −D H (p, x) p  (75) p˙ = Dx H (p, x), we use the convention that boldface (x, p) are trajectories of the Hamiltonian flow and not the coordinates (x, p). The Hamilton-Jacobi integrability theory suggests that we should look for functions H  (P ) and u (x, P ), periodic in x, solving the Hamilton-Jacobi equation: (76)

H (P + Dx u , x) = H  (P ).

Then, by performing the change of coordinates (x, p) ↔ (X, P ) determined by:  X = x + D u P (77) p = P + Dx u , the dynamics (75) is simplified to  X ˙ = −DP H(P) P ˙ = 0, we use again the convention that boldface (X, P) are trajectories of the Hamiltonian flow and not the new coordinates (X, P ).

124

2. CALCULUS OF VARIATIONS IN ONE INDEPENDENT VARIABLE

If u˜ is an approximate solution to (132) satisfying (78)

H (P + Dx u˜, x) = H  (P ) + f (x, P ),

then the change of coordinates (77) transforms (75) into  X ˙ = −DP H  (P) − DP f (X, P) (79) P ˙ = DX f (X, P), with the convention that f (X, P ) = f (x(X, P ), P ). The KAM theory deals with constructing solutions of (132) by using an iterative procedure, a modified Newton’s method, that yields an expansion u = u0 + v1 + 2 v2 · · · . The main technical point in KAM theory is to prove the convergence of these expansions. An alternate method that yields such an expansion is the Linstedt series [AKN97]. However we should point out that whereas the KAM expansion is a convergent one, the Linstedt series may fail to converge. Nevertheless, since we will only need finitely many terms we will use a variation of the Linstedt series that we describe next. We say that a vector ω ∈ Rn is Diophantine if for all k ∈ Zn \{0}, |ω · k| ≥ |k|Cs , for some C, s > 0. Let P0 be such that ω0 = DP H 0 (P0 ) is Diophantine. We look for an approximate solution of H (P + Dx u (x, P ), x) = H  (P ), valid for P = P0 + O(). When  = 0, H 0 (P ) = H0 (P ) and the solution u0 is constant, for instance we may take u0 ≡ 0. For  > 0 we have, formally, u = O(), and so we suggests the following approximation u˜N to u : (80)

u˜N =v1 (x, P0 ) + (P − P0 )DP v1 (x, P0 ) + 2 v2 (x, P0 )+ 1 + (P − P0 )2 DP2 P v1 (x, P0 ) + 2 (P − P0 )DP v2 (x, P0 )+ 2 + 3 v3 (x, P0 ) + · · · ,

11. PERTURBATION THEORY

125

this expansion is carried out up to order N − 1 in such a way that, formally u − u˜N = O(N ). For example u˜1 = 0,

u˜2 = v1 ,

u˜3 = v1 + 2 v2 + (P − P0 )DP v1 .

The functions vi and DPk k vi satisfy transport equations Dp H0 (P0 )Dx w = f (· · · ), for some suitable f , and can be solved inductively. For instance: H 1 (P0 ) = Dp H0 (P0 )Dx v1 + H1 (P0 , x), 2 H0 (P0 )Dx v1 + Dp H1 (P0 , x), DP H 1 (P0 ) = Dp H0 (P0 )Dx (DP v1 ) + Dpp

and H 2 (P0 ) = 1 2 Dp H0 (P0 )Dx v2 + Dpp H0 (P0 )Dx v1 Dx v1 + Dp H1 (P0 , x)Dx v1 . 2 Note that the derivatives of vi with respect to P , DPk k vi , are computed by solving appropriate transport equations, as is illustrated above for DP v1 , and not by differentiating vi . In fact vi may not be defined for P 6= P0 . However if its derivative exists it satisfies a transport equation. The constants H 1 (P0 ), DP H 1 (P0 ), H 2 (P0 )... are uniquely determined by integral compatibility conditions, for example, Z H 1 (P0 ) = H1 (P0 , x)dx, Z DP H(P0 ) =

Dp H1 (P0 , x)dx,

and Z H 2 (P0 ) =

1 2 D H0 (P0 )Dx v1 Dx v1 + Dp H1 (P0 , x)Dx v1 dx. 2 pp

If H is sufficiently smooth and ω0 is non-resonant then these equations have smooth solutions that are unique up to constants. Finally one can check that (81)

˜ N (P ) + O(N + |P − P0 |N ), H (P + Dx u˜N , x) = H 

126

2. CALCULUS OF VARIATIONS IN ONE INDEPENDENT VARIABLE

with ˜ N (P ) = H 0 (P0 ) + H 1 (P0 ) + (P − P0 )DP H 0 (P0 ) + 2 H 2 (P0 ) + · · · , H and this expansion is carried up to order N − 1 in such a way that formally ˜ N (P ) + O(N + |P − P0 |N ). H  (P ) = H  Consider the change of coordinates  p = P + D u˜ (x, P ) x N X = x + DP u˜ (x, P ). N

Then, by (78) and (79), (75) is transformed into:  X ˙ = −DP H  (P) + O(N + |P − P0 |N −1 ) P ˙ = O(N + |P − P0 |N ). 12. Bibliographical notes There is a very large literature on the topics of this chapter. The main references we have used were [Arn66] and [AKN97]. Two classical physics books on this subject are [Gol80] and [LL76]. On the more geometrical perspective, the reader may want to look at [?] (see also [?]) and [Oli98]. Additional material on classical calculus of variations can be found in [?] and the classical book [?]. In what concerns symmetries, additional material can be consulted in [?]. A very good reference in Portuguese is [?].

3

Calculus of variations and elliptic equations

The objective of this chapter is to study the existence and regularity of minimizers of functionals of the form Z L(Du, u, x)dx, I[u] = U

where U is a open subset of Rn , and L : Rn×m × Rn × U → R is a suitable Lagrangian. The models we will consider are quite simplified, illustrating, however, the ideas used in more general cases. Moreover, we will only establish regularity for minimizers in the interior of U , avoiding, thus, the study of the behavior up to the boundary that, frequently, is quite technical. Also, to simplify, we assume that U is bounded and has a regular boundary. The interested reader will be able to find, in higher generality, the results studied in this chapter in, for instance, [Gia83], [Gia93], or [GT01]. We will consider both the scalar case m = 1 and vectorial case m > 1. However, as the theory is more complete in the scalar case, we will prove a few more results. We will start by establishing necessary conditions for a function to be a minimizer, and then, as before we proceed with studying the existence of minimizers using the direct method in the calculus of variations. This guarantees the existence, for instance if L(p, z, x) is convex in p and satisfies certain growth conditions. Then, we will show that these minimizers are weak solutions to the Euler-Lagrange equation (82)

− divx Dp L + Lz = 0.

Although the results that we prove are valid for more general problems, in a significant part of this chapter, we consider the particular case 127

128

3. CALCULUS OF VARIATIONS AND ELLIPTIC EQUATIONS

where (83)

L = L(p) − zf (x).

The regularity theory for elliptic equations addresses this problem and establishes conditions under which u is smooth enough so that it satisfies (82) in the classical sense. The study of the regularity of elliptic equations follows several steps. First the energy methods show that the minimizers are in W 2,2 (Ω) and solve div Dp L(Du) = f. This establishes the existence of second derivatives in the weak sense. Then we will try to show that these are classic solutions to the EulerLagrange equation. This is a second order partial differential equation, thus we will try to establish that u ∈ C 2,α . We will first will consider the deGiorgi-Nash Moser H¨older estimates for elliptic equations (84)

(aij (x)vxi )xj = (f (x))xk ,

with f ∈ Lp and aij uniformly elliptic, that is, θ|χ|2 ≤ aij (x)χi χj ≤ Θ|χ|2 . These estimates imply that the solutions of (84) are H¨older continuous independently of the regularity of aij . We will use the following strategy: each of the derivatives v = uxk of u is a weak solution of (85)

− (Dp2i pj Lvxj )xi = fxk .

That is, rewriting the equation, we conclude that v solves an equation of the form −(aij vxi )xj = fxk . The deGiorgi-Nash Moser estimates imply that Du is H¨older continuous. Therefore the coefficients Dp2i pj L of (85) are H¨older continuous.

1. EULER-LAGRANGE EQUATION

129

Finally, the Schauder estimates show that the solutions v of (aij (x)vxi )xj = f (x)xk , with a and f H¨older continuous, and a elliptic, have H¨older continuous derivative Dv. The combination of all these estimates yields that v ∈ C 1,α , that is u ∈ C 2,α .

1. Euler-Lagrange equation The first step to study variational problems involving multiple integrals is the derivation of necessary conditions for a function to be a minimizer. In this section we will proceed formally and will not provide a rigorous justification of the calculations, or worry about the convergence of integrals or the regularity of functions. As the reader will have the opportunity to observe in the following sections, these are delicate questions that require careful analysis. However, if adequate hypotheses are imposed, all the calculations in this section can be properly justified. 1.1. Scalar case. Let L, the Lagrangian, be L(p, z, x) : Rn × R × U¯ → R+ be a C ∞ function, U a bounded and open subset of Rn with smooth boundary (C ∞ ). We would like to study the minimizers of Z I[w] = L(Dx w, w, x)dx, U

in the set A of functions w that they satisfy certain boundary conditions, for instance, A = {w = g in ∂U } , where g : ∂U → R is a fixed function. Let w0 be a minimizer of I[·]. In an analogous way to what was done in last chapter, we are also going to deduce the Euler-Lagrange equation.

130

3. CALCULUS OF VARIATIONS AND ELLIPTIC EQUATIONS

Theorem 60 (Euler-Lagrange equation). Let w0 be a C 2 minimizer of I[·]. Then divx [Dp L(Dx w0 , w0 , x)] − Dz L(Dx w0 , w0 , x) = 0.

(86)

Proof. If w0 ∈ A then for all φ ∈ C0∞ (U ), we have w0 + φ ∈ A. Therefore if w0 is a minimum of I[·], the function i() = I[w0 + φ] has a minimum at  = 0. Consequently i0 (0) = 0. Therefore we have Z Dp L(Dx w0 , w0 , x)Dx φ + Dz L(Dx w0 , w0 , x)φdx = 0. U

Using divergence theorem we conclude that Z Z Dp L(Dx w0 , w0 , x)Dx φ = − φ divx [Dp L(Dx w0 , w0 , x)]. U

U

C0∞

Therefore, for all φ ∈ Z φ [divx [Dp L(Dx w0 , w0 , x)] − Dz L(Dx w0 , w0 , x)] dx = 0, U

which implies divx Dp L(Dx w0 , w0 , x) − Dz L(Dx w0 , w0 , x) = 0.  Example 35. Let |p|2 + f (x)z, 2 where f : Rn → R is an arbitrary smooth function. The Euler-Lagrange equation is then L(p, z, x) =

−∆w + f (x) = 0, which is the Poisson equation.

J

Not every solution to the Euler-Lagrange equation is a minimum of I, in general solutions can respond to minimum, maximum or even saddle points. We can, however, as it happens in finite dimension, to

1. EULER-LAGRANGE EQUATION

131

establish further necessary conditions by looking at the second variation, that is by computing d2 , i() d2 =0 which for minimum points is nonnegative. Therefore for any φ ∈ Cc∞ (U ), we have Z 2 2 2 Lφ2 ≥ 0. LDx φ φ + Dzz LDx φDx φ + 2Dpz (87) Dpp U

Let B[u, v] be the bilinear form given by the following expression: Z 2 2 B[u, v] = Dpp L(Dx w0 , w0 , x)Dx uDx v + Dpz L(Dx w0 , w0 , x)vDx u+ U 2 2 + Dpz L(Dx w0 , w0 , x)uDx v + Dzz L(Dx w0 , w0 , x)uv.

From (87) we conclude that B must be positive definite if w0 is a minimum. Example 36. Let L =

|p|2 2

+ f (x)z. Then Z Dx uDx v, B[u, v] = U

which implies B[φ, φ] ≥ 0. In fact, if φ ∈ Cc∞ (U ) and φ 6= 0 then B[φ, φ] > 0. Exercise 114. Prove that in this case this implies that any solution to be Euler-Lagrange equation is in fact a minimum. J Example 37. In this example we derive further necessary conditions for the existence of a minimizer. Let   ξ·x 2 ϕδ (x) = δ η(x) sin , δ where η(x) ∈ Cc∞ (U ). Then   Z ξ·x 2 2 2 0 ≤ B[ϕδ , ϕδ ] = Dpi pj Lξi ξj η(x) sin + O(δ 2 ). δ U

132

3. CALCULUS OF VARIATIONS AND ELLIPTIC EQUATIONS ξ·x δ

Since sin2



* 12 , as δ → 0, we have Dp2i pj Lξi ξj ≥ 0,

that is, the mapping p 7→ L(p, w0 , x) is convex for any minimizer w0 and any x ∈ U . As we will see, in the scalar case, the convexity in p of the Lagrangian is very important both to establish existence of minimizers as well as proving its regularity. For systems, we will be able to derive weaker conditions (which agree with convexity in the scalar case) under which one can show the existence of a minimizer Exercise 115. Compute the Euler-Lagrange equation corresponding to Z u 7→ u2x − u2y , B1 (0)

¯1 ) ∩ C(B1 ) and u = 0 in ∂B1 (0). Show that the solutions with u : C 1 (B to the Euler-Lagrange equation are not minimizers. Exercise 116. Consider the problem Z min (∆u)2 + uf. u,uν =0 em ∂B1 (0)

B1 (0)

Determine the Euler-Lagrange equation and its second variation. Show that the solutions to the Euler-Lagrange equation are (global) minimizers. ¯ Exercise 117. Let Ω ⊂ Rn be a regular domain, and u a C 2 (Ω)∩C(Ω) solution of ∆u = f in Ω, with u = 0 in ∂Ω. Show that u minimizes Z Ω

|∇u|2 + f u, 2

2

over all C functions that vanish in ∂Ω. Exercise 118. Let Ω a regular domain and f ∈ L1 (Ω). Show that if Z fϕ = 0 Ω

1. EULER-LAGRANGE EQUATION

for all ϕ ∈ Cc∞ (Ω) satisfying

R Ω

133

ϕ = 0 then

Z fϕ = 0 Ω

R for all ϕ ∈ C ∞ (Ω) with Ω ϕ = 0. Show that this implies that f is almost everywhere constant. Exercise 119. Consider the problem Z min |∇u|2 + f (x)udx, U

where the minimum is taken over all functions that satisfy Z udx = 0, U

instead of the usual Dirichlet boundary condition u|∂U = 0. Derive the Euler-Lagrange equation and a boundary condition for u. Hint: use the previous exercise. Exercise 120. Determine the curve y = γ(x), γ(0) = γ(1) = 0, in the plane such that the area A defined by 0 ≤ x ≤ 1 and 0 ≤ y ≤ γ(x) is |A| = α (with α sufficiently small) and such that its length is as small as possible. Exercise 121. Determine a differential equation for the surface in R3 defined parametrically by u : B1 (0) ⊂ R2 → R3 such that u|∂B1 = γ, that is, its boundary is a given closed curve γ and which minimizes the area Z det((Du)T Du)1/2 dx. B1

1.2. Systems. For functionals defined for vector valued functions u : U ⊂ Rn → Rm the derivation of the Euler-Lagrange equation is similar: Exercise 122. Let u : U ⊂ Rn → Rm be a minimizer of u : U ⊂ Rn → Rm Z L(Du, u, x). U

134

3. CALCULUS OF VARIATIONS AND ELLIPTIC EQUATIONS

Show that −

X

Dpαk L(x, u, Du)

 xk

= Dzα L(x, u, Du).

k

The second variation is also similar: Exercise 123. Let u : U ⊂ Rn → Rm be a minimizer of u : U ⊂ Rn → Rm Z L(Du, u, x). U

Show that for any compactly supported ϕ : U → Rm " !# X XZ X Dp2α pβ Lϕαxk ϕβxj + Dp2α zβ Lϕαxk ϕβ + Dz2α zβ Lϕα ϕβ ≥ 0. α,β

j,k

U

k

k xj

j

Theorem 61. Let u : U ⊂ Rn → Rm be a C 2 minimizer of u : U ⊂ Rn → Rm Z L(Du, u, x), U

under fixed boundary conditions at ∂U . Then Lpiα pj ξα ξβ ki kj ≥ 0, β

m

for all vectors ξ ∈ R

and k ∈ Rn .

Proof. Let η ∈ Cc∞ (Ω) be a real valued function. Fix ξ ∈ Rm and k ∈ Rn . Use k·x ϕ = 3 ξη(x) sin  in exercise 123 as  → 0.  A function F satisfies the Legendre-Hadamard condition if Fpiα pj (p, z, x)ξα ξβ ηi ηj ≥ θ|ξ|2 |η|2 β

for all vectors ξ ∈ Rn and η ∈ Rm . The Legendre-Hadamard condition is weaker than convexity in p (unless m = 1 or n = 1), which would be Fpiα pj (p, z, x)Mαi Mβj ≥ θ|M |2 , β

for all matrices M ∈ Rm×n .

1. EULER-LAGRANGE EQUATION

135

Exercise 124. Let U be a domain in R2 . Show that if L(P, z, x) : R2×2 → R is given by L(P ) = det P + |P |2 then L satisfies the Legendre-Hadamard condition but is not convex if  is sufficiently small. Exercise 125. Use the Lagrange multiplier method to show that the minimizers of Z |Du|2 U

with u = 0 in ∂U , under the constraint Z |u|2 = 1, U

are eigenvalues of the Laplacian. Exercise 126. Use the Lagrange multiplier method to determine a boundary condition in ∂U for the minimizers of Z |Du|2 U

under the constraints Z

Z

2

|u| = 1,

u = 0.

U

U

Exercise 127. Let 1 < p < ∞. Determine the Euler-Lagrange equation for the minimizers of the functional Z |Du|p , U

with u = g in ∂U . Exercise 128. Let U be a domain in Rn . Let L(P ) : Rn×n → R be given by L(P ) = det P. Determine the Euler-Lagrange equation corresponding to the functional Z u 7→ L(Du). U

Explain why this Lagrangian is called a ”null Lagrangian”.

136

3. CALCULUS OF VARIATIONS AND ELLIPTIC EQUATIONS

2. Further necessary conditions and applications 2.1. Boundary conditions.

2.2. Variational inequalities.

2.3. Lagrange Multipliers.

2.4. Minimal surfaces.

2.5. Higher order problems.

3. Convexity and sufficient conditions 4. Direct method in the calculus of variations 4.1. Scalar case. To ensure the existence of a minimizer, we will impose conditions on the Lagrangian which ensure coercivity and lower semicontinuity. In our discussion we will consider the following model problem: Z (88) min L(Du, u, x)dx. u|∂U =0

U

Similar methods would work if we were to choose the boundary condition at ∂U , u = g ∈ C ∞ (∂U ) (or with adequate regularity). The following condition: (89)

L(p, z, x) ≥ α|p|q − β,

for all (p, z, x) ∈ Rn × R × U , with α, β > 0 and 1 < q < ∞ is enough to ensure coercivity. In fact, it implies I[w] ≥ αkDukqLq (U ) − γ,

4. DIRECT METHOD IN THE CALCULUS OF VARIATIONS

137

for some γ > 0. Consequently, in the Sobolev space W01,q we have I[w] → ∞, as kukW 1,q → ∞. 0

Exercise 129. Show that the functional associated to a Lagrangian satisfying (83) is coercive in W01,α for 1 < α < ∞ if L(p) ≥ C|p|α , f is bounded. Let wk be a maximizing sequence. Then sup kwk kW 1,q < ∞. k

To see this, observe using wk |∂U = 0 and the Poincar´e inequality we have kwk kW 1,q ≤ CkDwk kLq . Exercise 130. Let U be a bounded domain, g ∈ C ∞ (∂U ) and wk a minimizing sequence with the boundary condition w = g on ∂U . Show that sup kwk kW 1,q < ∞. k

Since in an infinite dimensional space a bounded sequence may fail to have any convergent subsequence, we will have to use weak convergence. In a reflexive and separable Banach space any bounded sequence wk has a weakly convergent subsequence (which we still denote by wk ): wk * w. This means, using a bounded sequence in W 1,q , that Z Z Dwk Dφ + wk φ → DwDφ + wφ, 0

for all φ ∈ W 1,q , where

1 q

+

1 q0

= 1.

As the next example shows, one main difficulty in using weak convergence arises from the lack of continuity with respect to weak convergence of non-linear functionals.

138

3. CALCULUS OF VARIATIONS AND ELLIPTIC EQUATIONS

Exercise 131. Let wk (x) = sin 2πkx. Show that wk * 0 in Lq ([0, 1]) R1 and that 0 wk2 = 12 , independently of k. Conclude that wk2 6* 0. For our purposes we will show that under certain conditions, we may have we have weakly lower semicontinuity, that is, whenever wk * w, then lim inf I[wk ] ≥ I[w]. k→∞

Note that in general we do not expect continuity, i.e. I[wk ] → I[w]. Theorem 62. Assume that for fixed z and x the mapping p 7→ L(p, z, x) is convex then I[·] is weakly lower semicontinuous in W 1,q . Remark. From the previous chapter we already know that convexity of L in p is a natural condition. Proof. Suppose that wk * w in W 1,q . Then: 1. supk kwk kW 1,q < ∞. 2. By Rellich-Kondrachov theorem we can extract a subsequence wk → w in Lr , with r < q ∗ . 3. By extracting, if necessary, a further subsequence, we may assume that wk → w almost everywhere. 4. By Egorov’s theorem, for all  > 0 there exists a set E ⊂ U such that |U \E | ≤  and wk → w, uniformly in E . 5. Defining  F =

1 x ∈ U : |w| + |Dw| < 



4. DIRECT METHOD IN THE CALCULUS OF VARIATIONS

139

we have |U \F | → 0 when  → 0 and therefore G = F ∩ E satisfies |U \G | → 0, when  → 0. We can assume, without loss of generality, that L ≥ 0. Then Z I[wk ] = L(Dwk , wk , x)dx ≥ U Z L(Dwk , wk , x) ≥ ≥ G Z Z ≥ L(Dw, wk , x) + Dp L(Dw, wk , x)(Dwk − Dw) → G G Z → L(Dw, w, x), k→∞

G

therefore Z lim inf I[wk ] ≥ k→∞

L(Dw, w, x). G

Using the monotone convergence theorem when  → 0, we obtain lim inf I[wk ] ≥ I[w]. k→∞

 Exercise 132. Give an example of a sequence uk convergent in Lr to a function u ∈ Lr which does not converge pointwise. Show that, however, that there exists a subsequence of uk which converges to u almost everywhere.

As a corollary we have existence of a solution of (88): Theorem 63. Suppose L is coercive, that is, it satisfies (89) and convex in p, then there exists a minimizer of (88). Exercise 133. The following Lagrangian L(p, z, x) =

|p|2 + f (x)u 2

140

3. CALCULUS OF VARIATIONS AND ELLIPTIC EQUATIONS

does not satisfy (89). Show, using similar ideas that there exists a minimizer in W 1,2 (Ω) of the functional Z L(Du, u, x) Ω

in the class of functions u|∂Ω = g, for g ∈ C ∞ (∂Ω). Exercise 134. Generalize the previous exercise for Lagrangians of the form L(Du) + f (x)u, with L convex and L(p) ≥ c1 |p|q + c2 for some 1 < q < ∞. Exercise 135. Use the direct method in the calculus of variations to establish the existence of minimizers in W 2,2 of the functional Z |∆u|2 + f (x) · Du + g(x)u, 2 Ω with u|∂ω = h1 and uν |∂ω = h2 (where uν is the normal derivative). Exercise 136. Let Ω ⊂ Rn be a bounded domain. f : Ω → Rn a C ∞ function with compact support. Show that the variational problem Z |∇u|2 1 min + f · ∇u + u|∂Ω =0 Ω 2 1 + u2 admits a minimizer in W01,2 (Ω). Exercise 137. Let Ω be a regular domain. Establish the existence of minimizers in W 1,2 (Ω) of the functional Z Z 2 2 |∇u| + |u| + g(x)u, Ω

∂Ω

2

where g :∈ L (∂Ω). 4.2. Systems. A functional of the form Z u 7→ L(Du, u, x)dx U n×m

isquasiconvex if for all P ∈ R Q ⊂ Rn Z

, z0 ∈ Rm and x0 ∈ Rm and any cube Z

L(P, z0 , x0 )dx ≤ Q

L(P + Dv, z0 , x0 )dx Q

4. DIRECT METHOD IN THE CALCULUS OF VARIATIONS

141

for all function v with compact support Q. Exercise 138. Consider a minimizer u : U → Rm of Z L(Du, u, x)dx. U

Let Q be a cube containing the origin. Suppose ϕ is a compactly supported function on Q. Let uλ = u + λϕ( λx ). Deduce from Z Z L(Du, u, x) ≤ L(Duλ , uλ , x) λQ

that

Z

λQ

Z L(Du(0), u(0), 0) ≤

L(Du(0) + Dϕ, u(0), 0).

Q

Q

Exercise 139. Show that convexity implies quasiconvexity. Theorem 64. Let L(P, z, x) : Rn×m ×Rm ×U → R, U ⊂ Rn a bounded domain. Suppose that L is quasiconvex and satisfies the following properties: • • • •

0 ≤ c|P |p + c ≤ L ≤ C|P |p + C |DP L| ≤ C|P |p−1 + C |Dz L| ≤ C|P |p−1 + C |Dx L| ≤ C|P |p + C.

Then there exists a minimizer u ∈ W01,p of Z I[U ] = L(Du, u, x). U

Note that a similar result would also hold for non-homogeneous boundary conditions u = g in ∂U . Proof. First recall the following result. Let Qi (x) denote a dyadic cube containing x with sidelenght 2−i . For f ∈ Lp , 1 < p < ∞, define Z hf ii (x) = − f. Qi (x)

Then hf ii → f in Lp .

142

3. CALCULUS OF VARIATIONS AND ELLIPTIC EQUATIONS

Clearly any minimizing sequence uk is bounded in W 1,p and therefore there exists u ∈ W01,p such that uk * u and uk → u strongly in Lr for some r > p. Consider the sequence of measures µk with density µk = |Duk |p + |uk |p . Then there exists µ such that µk * µ. By translation we may choose a dyadic division of Rn , (Qji ) such that µ(∂Qji ) = 0 for all cubes. Let xji denote the center of Qji , and zij = huii (xji ). Fix  > 0 and choose V ⊂⊂ U such that Z L(Du, u, x) ≤ . U \V

Then Z

X

I[uk ] ≥

Qji

j:Qji ∩V 6=∅

Z

X

=

j:Qji ∩V 6=∅

Qji

L(Duk , uk , x) L(Duk , huk ii , xji ) + E0 ,

where the error term E0 can be estimated as follows: X Z E0 ≤ |L(Duk , uk , x) − L(Duk , huk ii , xji )| Qji

j:Qji ∩V 6=∅



Z

X j:Qji ∩V 6=∅



Qji

(C|Duk |p + C) |xji − x|

  + C|Duk |p−1 + C |uk − huk ii | → 0, as i → ∞, uniformly in k. Indeed, in first term the convergence is uniform because kDuk kLp is globally bounded, whereas in the second case |uk − huk ii | → 0, uniformly in k because uk is bounded in W 1,p . Therefore we have X Z I[uk ] ≥ L(Du + D(uk − u), zij , xji ) + oi (1), j:Qji ∩V 6=∅

Qji

4. DIRECT METHOD IN THE CALCULUS OF VARIATIONS

143

where oi (1) stand for the error terms that converge to 0 as i → ∞, uniformly in k. Fix  > 0. Thus, for i sufficiently large and all k: X Z I[uk ] ≥ L(Du + D(uk − u), zij , xji ) − , Qji

j:Qji ∩V 6=∅

ˆ j a concentric cube with Qj Now choose 0 < σ < 1 and denote by Q i i but with edge σ2−i . Choose ϕji smooth, compactly supported with  1 in Q ˆj i j ϕi = 0 in (Qj )C , i

and with |Dϕji | ≤

C2i . 1−σ

Then Z

X

I[uk ] ≥

j:Qji ∩V 6=∅

Qji

L(Du + D(vij ), zij , xji ) + E1 − ,

where vij = ϕji (uk − u), and |E1 | ≤ C

Z

X j:Qji ∩V 6=∅

ˆj Qji \Q i

1 + |Du|p + |Duk |p + |Dϕji |p |uk − u|p ,

and therefore, for any  > 0 lim sup |E1 | < , k→∞

if σ is sufficiently close to 1. Thus we can choose i0 , and k0 large enough so that X Z I[uk ] ≥ L(Du + D(vij ), zij , xji ) − 2, j:Qji ∩V 6=∅

Qji

144

3. CALCULUS OF VARIATIONS AND ELLIPTIC EQUATIONS

for all i ≥ i0 and all k ≥ k0 . Note that X Z L(Du + D(vij ), zij , xji ) Qji

j:Qji ∩V 6=∅

Z

X



Qji

j:Qji ∩V 6=∅

L(hDuii + D(vij ), zij , xji ) + E2 .

Furthermore, we have X Z L(Du + D(v j ), zi , xi ) − L(hDuii + D(v j ), z j , xj ) E2 ≤ i i i i Qji

j:Qji ∩V 6=∅



X

Z

j:Qji ∩V 6=∅

Qji

(1 + |Du|p−1 + |hDuii |p−1 )|Du − hDuii |,

since khDuii kLp ≤ kDukLp , for i large enough, we have |E2 | ≤ . Therefore, using quasiconcavity, for k and i large enough, X Z I[uk ] ≥ L(hDuii , zij , xji ) − 3. Qji

j:Qji ∩V 6=∅

Finally, observe that Z

X j:Qji ∩V

6=∅

Qji

j:Qji ∩V

where

6=∅

Qji

L(Du, u, x) + E3

Z

X j:Qji ∩V

Z

X



E3 ≤

L(hDuii , zi , xi )

6=∅

Qji

|L(hDuii , zi , xi ) − L(Du, u, x)| ,

which also converges to 0 as i → ∞. Therefore, by sending  → 0 we obtain that uk converges weakly in W 1,p to a minimizer.  Exercise 140. Suppose that L(P ) satisfies the uniform strict quasiconvexity property: Z Z γ 2 L(P + Dv), L(P ) + |Dv| ≤ 2 Q Q for all v ∈ Cc∞ (Q). Let uk be a minimizing sequence in W 1,2 . Show that uk → u strongly in W 1,2 .

5. EULER-LAGRANGE EQUATIONS

145

5. Euler-Lagrange equations The minimizers we obtained in the previous section using the direct method in the calculus of variations are critical points and, therefore, we would like to show that they are solutions (in an appropriate sense) of the Euler-Lagrange equations. We will suppose the following additional hypothesis on L: 1. |L(p, z, x)| ≤ C(|p|q + |z|q + 1); 2. |Dp L(p, z, x)| ≤ C(|p|q−1 + |z|q−1 + 1); 3. |Dz L(p, z, x)| ≤ C(|p|q−1 + |z|q−1 + 1). A function u ∈ W 1,q is weak solution of the Euler-Lagrange equation (86) if, for all v ∈ Cc∞ (U ), Z (90) Dp L(Du, u, x)Dx v + Dz L(Du, u, x)vdx = 0. U

Remark. This is a natural constraint to impose since from (90) we can obtain (86) by integration by parts. Theorem 65. Under the previous assumptions, if u ∈ W 1,q minimizes I[·] then u is a weak solution of the Euler-Lagrange equation.

Proof. Let i(τ ) = I[u + τ v]. Then i(τ ) − i(0) = τ

Z

Lτ (x),

U

where Lτ (x) =

L(Du + τ Dv, u + τ v, x) − L(Du, u, x) . τ

Clearly Lτ (x) → Dp L(Du, u, x)Dv + Dz L(Du, u, x)v,

146

3. CALCULUS OF VARIATIONS AND ELLIPTIC EQUATIONS

almost everywhere. Additionally, Z 1 τ d τ L (x) = L(Du + sDv, u + sv, x)ds = τ 0 ds Z 1 τ Dp L(Du + sDv, u + sv, x)Dv+ = τ 0 + Dz L(Du + sDv, u + sv, x)vds ≤ ≤C(|Du|q + |Dv|q + |u|q + |v|q + 1). Therefore, the dominated convergence theorem yields the desired result.  Exercise 141. Prove the last inequality of the previous theorem, that is, |Dp L(Du + sDv, u + sv, x)Dv + Dz L(Du + sDv, u + sv, x)v| ≤ C(|Du|q + |Dv|q + |u|q + |v|q + 1), uniformly for 0 ≤ s ≤ τ . Hint: recall the inequalities ab ≤

ar b s + r s

with

1 1 + =1 r s

and |a + b|r ≤ C(ar + br ). Exercise 142. Impose conditions on F (A, p, z) so that you can prove the existence of minimizers in W 2,2 of Z F (∆u, Du, u), Ω

and that these are weak solutions to the corresponding Euler-Lagrange equation.

6. Regularity by energy methods In order to motivate the results of this section, we start with an example:

6. REGULARITY BY ENERGY METHODS

Example 38. Let L = equation is

|p|2 2

147

+f (x)z. The corresponding Euler-Lagrange −∆u + f (x) = 0.

Let u be a C 2 solution of the previous equation. Multiplying the equation by ∆u and integrating we obtain Z Z 2 (∆u) = f (x)∆u. Integrating by parts the left-hand side of this identity and ignoring the boundary terms (of course this wrong and some effort must be done in order to avoid this difficulty), we have Z XZ C 2 2 |Dxi xj u| = f (x)∆u ≤ kf kL2 + k∆ukL2 .  i,j As a conclusion, we have kD2 uk2L2 ≤ Ckf k2L2 . This example suggest that if it is possible to somehow control the boundary terms then the solutions to the Euler-Lagrange equation should not only be in W 1,2 but also in W 2,2 . J To simplify the presentation we will consider a restricted class of Lagrangians of the form L(p) − zf (x), with 2 θ ≤ Dpp L(p) ≤ Θ,

for suitable constants 0 < θ < Θ. We should note that more complex problems can be handled using similar techniques and nothing essential is really lost by considering this particular problem. We also need to recall the following theorem Let u : Rn → R. For h ∈ R define Dih u =

u(x + hei ) − u(x) . h

148

3. CALCULUS OF VARIATIONS AND ELLIPTIC EQUATIONS

Theorem 66. Let 1 ≤ p < ∞, u ∈ W 1,p (U ) and V , V ⊂⊂ U . Then kDh ukLp (V ) ≤ CkDukLp (U ) . Conversely, if u ∈ Lp and sup kDh ukLp (V ) ≤ C, h

then u ∈ W

1,p

(V ).

Theorem 67. Let u ∈ W 1,2 (U ) be a weak solution of the equation − div(Dp L(Du)) = f. 2,2 Then u ∈ Wloc (U ).

Proof. Let V ⊂⊂ W ⊂⊂ U (recall that A ⊂⊂ B means that A¯ is compact subset of B) and ξ ∈ Cc∞ (U ) with   ξ ≡ 1 in V  ξ ≡ 0 in U \W    0 ≤ ξ ≤ 1.

Let h > 0 be sufficiently small and 1 ≤ k ≤ n. Define v = −Dk−h (ξ 2 Dkh u), where Dkh w =

w(x + hek ) − w(x) . h

Exercise 143. Show that the operator Dkh satisfies an “integration by parts formula”: Z Z h vDk u = − uDk−h v, for u, v ∈ Cc (U ). Suppose u is a weak solution of the Euler-Lagrange equation, then Z 0 = Dp L(Du)Dv − f v = Z = Dkh (Dp L(Du))D(ξ 2 Dkh u) + f Dk−h (ξ 2 Dkh u).

6. REGULARITY BY ENERGY METHODS

149

We can rewrite: Dp L(Du(x + hek )) − Dp L(Du(x)) h Z 1 1 d = Dp L(sDu(x + hek ) + (1 − s)Du(x))ds h 0 ds Z 1 1 2 = D L(· · · )(Du(x + hek ) − Du(x))ds h 0 pp

Dkh (Dp L(Du)) =

= ah (x)Dkh Du, where h

1

Z

2 Dpp L(· · · ).

a (x) = 0 h

The matrix a is positive definite. Therefore Z Z 2 h 2 θ ξ |Dk Du| ≤ ξ 2 ah (Dkh Du)(Dkh Du). Therefore we have the following estimate: Z Z h 2 h Dk (Dp L(Du))D(ξ Dk u) ≥ θ ξ 2 |Dkh Du|2 U ZU + 2 ah (Dkh Du)(Dkh u)ξDξ ZU Z θ 2 h 2 ξ |Dk Du| − C |Dkh u|2 . ≥ 2 U W The second term of the Euler-Lagrange equation satisfies the estimate: Z Z Z Z 2 2 f D−h (ξ 2 Dkh u) ≤ C |f | + C |Du| +  ξ 2 |Dk−h Dkh u|2 k  U Z ZU ZU C |f |2 + C |Du|2 +  ξ 2 |Dkh Du|2 , ≤  U U U where we used the estimates, which follow from theorem 66, Z Z  h  2 −h 2 | Dk ξ Dk u | ≤ C |Du|2 , U

U

and

Z ξ U

2

|Dk−h Dkh u|2

Z ≤C

ξ 2 |Dkh Du|2 .

U

Therefore, for  sufficiently small, Z Z θ 2 h 2 ξ |Dk Du| ≤ |f |2 + |Du|2 . 4 U U

150

3. CALCULUS OF VARIATIONS AND ELLIPTIC EQUATIONS

So u ∈ W 2,2 (V ).



The last theorem implies in particular that the Euler-Lagrange equation div(Dp L) − Dz L = 0 holds almost everywhere. To conclude our discussion concerning energy methods, we are going to review some facts concerning elliptic equations, namely LaxMilgram’s theorem. 2,2 Exercise 144. Let u ∈ Wloc be a solution of the Euler-Lagrange equation

− div(Dp L(Du)) = f (x).

(91)

Show that u is a weak solution of −(Dp2i pj L(Du)uxk xj )xi = fxk , which can be obtained from (91) by differentiation with respect to xk . Let v = uxk . The previous exercise shows that v is a weak solution of − (aij vxj )xi = g,

(92) where

aij = Dp2i pj L(Du),

g = f xk .

Equation (92) is an elliptic equation since the matrix a is positive definite, that is, aij ξi ξj ≥ θ|ξ|2 , for all vectors ξ ∈ Rn . The main result to establish existence of solutions of elliptic equations is Lax Milgram’s theorem Theorem 68 (Lax-Milgram ). Let H be a Hilbert space with norm k·k, inner product (·, ·) and duality pairing denoted by h·, ·i. Let B[·, ·] : H × H → R

6. REGULARITY BY ENERGY METHODS

151

be a continuous bilinear form, that is |B[u, v]| ≤ αkukkvk, and coercive, that is, B[u, u] ≥ βkuk2 . Let f : H → R be a continuous linear functional (f ∈ H 0 ). Then there exists u ∈ H such that B[u, v] = hf, vi

∀v ∈ H.

Proof. For the proof of the theorem, we need the following result from functional analysis: Theorem 69 (Riesz representation theorem). Let H be a Hilbert space and H 0 its dual. Then, for each u∗ ∈ H 0 , there exists u ∈ H such that hu∗ , vi = (u, v)

∀v ∈ H.

For each fixed u, the functional v 7→ B[u, v] is a continuous linear functional. Thus, by Riesz theorem, there exists w ∈ H, dependent upon u that we denote by w = Au, such that B[u, v] = (Au, v). We will show that A is a continuous linear mapping. To establish linearity it suffices to observe that (A(λ1 u1 + λ2 u2 ), v) = B[λ1 u1 + λ2 u2 , v] = = λ1 B[u1 , v] + λ2 B[u2 , v] = = λ1 (Au1 , v) + λ2 (Au2 , v). The continuity follows from the estimate kAuk2 = (Au, Au) = B[u, Au] ≤ αkukkAuk,

152

3. CALCULUS OF VARIATIONS AND ELLIPTIC EQUATIONS

and, therefore, kAuk ≤ αkuk. By coercivity we have βkuk2 ≤ B[u, u] = (Au, u) ≤ kAukkuk, and, therefore, kAuk ≥ βkuk. consequently, A is injective and its image is closed in H. Finally, we claim that the image of A is H. For that, let w ∈ R(A)⊥ . Then βkwk2 ≤ B[w, w] = (Aw, w) = 0 and, therefore, w = 0. Therefore, we have just shown that A has a continuous inverse. Again, by Riesz theorem, there exists w such that hf, vi = (w, v), and, consequently, since A is invertible, there exists u such that Au = w, that is B[u, v] = (Au, v) = (w, v) = hf, vi.  As an application of the Lax-Milgram theorem, we have the following result: Example 39. Let H = W01,2 , f ∈ L2 and Z B[u, v] = aij uxi vxj , U

with aij elliptic, Z hf, vi = −

f vxk . U

We have B[u, v] ≤ CkukW 1,2 kvkW 1,2 0

0

6. REGULARITY BY ENERGY METHODS

153

and, by Poincar´e inequality, B[u, u] ≥ βkuk2W 1,2 . 0

Thus, by Lax-Milgram’s theorem, there exists a weak solution in W01,2 of −(aij uxi )xj = fxk . J Exercise 145. Use Lax-Milgram’s theorem to establish the existence of solutions of ∆2 u = f, with u ∈ W02,2 (B1 (0)). Exercise 146. Suppose that b(x) : Rn → Rn is a bounded C ∞ (Rn ) function and that f ∈ L2 (Rn ). Use Lax-Milgram’s theorem to establish the existence of solutions in W 1,2 (Rn ) of −∆u + b(x) · ∇u + λu = f, for λ large enough. In what remains in this section we will establish an essential result: Garding’s inequality. Theorem 70. Let Aαβ ij (x) be uniformly continuous function and satisfying X X αβ Aij ηα ηβ ξi ξj ≥ C|η|2 |ξ|2 . ij

αβ

Let U be a bounded domain with smooth boundary. Then, for all u ∈ W 1,2 (U ) we have Z Z XX Z αβ 2 α β C |u| + Aij Di u Dj u ≥ C |Du|2 . ij

αβ

Proof. By the extension theorem, for u ∈ W 1,2 (U ) there exists another function u˜ ∈ W 1,2 (Rd ), compactly supported, such that u = u˜ in U and k˜ ukW 1,2 (Rd ) ≤ CkukW 1,2 (U ) . We will drop the ∼ in what follows, to simplify the notation.

154

3. CALCULUS OF VARIATIONS AND ELLIPTIC EQUATIONS

First consider the case in which Aαβ ij is constant. In this case, by using Fourier transform we have Z XX X X Z αβ β αβ α β Aij Di u Dj u = C Aij ξi ξj uˆα uˆ ij

ij

αβ

Z ≥C

αβ

|ξ|2 |ˆ u|2 ≤ CkukW 1,2 (Rd ) .

Now we consider a localized version of the inequality, suppose that supp u ⊂ BR (x0 ) with R sufficiently small. Let ω(R) denote the modulus of continuity of Aαβ ij . Then Z XX Z XX αβ α β α β Aαβ Aij (x)Di u Dj u = ij (x0 )Di u Dj u ij

ij

αβ

αβ

Z XX αβ α β + (Aαβ ij (x) − Aij (x0 ))Di u Dj u ij

αβ

≥ CkukW 1,2 − ω(R)kukW 1,2 ≥

C kukW 1,2 , 2

if R is small enough. Since we can assume that u has compact support in a fixed compact, we can use a partition of unity to write X u= ϕ2k u. k

Then we have XZ XX ij

k

=

α β ϕ2k Aαβ ij (x)Di u Dj u

αβ

XZ XX k

ij

α β Aαβ ij (x)Di (ϕk u )Dj (ϕk u ) + low order terms

αβ

Thus by reassembling everything we obtain the desired inequality.  Exercise 147. Use Garding’s estimate to obtain W 2,2 regularity for minimizers of Lagrangians that satisfy the Legendre-Hadamard condition (for systems, the scalar case was already considered!).

¨ 7. HOLDER CONTINUITY

155

Exercise 148. Let h > 0 and uhn be the following sequence obtained by the following inductive procedure: given uhn ∈ W 1,2 (Rn ), uhn+1 is determined by: Z (uhn+1 − uhn )2 |∇un+1 |2 min + , 2h 2 uh Rn n+1 where uh0 ≡ u0 is the initial data. 1. Use the direct method in the calculus of variations to show that for each uhn there exists a uhn+1 ∈ W 1,2 (Rn ). 2. Show that the sequence k∇uhn kL2 is decreasing. 3. Determine the Euler-Lagrange equation for uhn+1 . 4. Consider the family in L2 indexed by h h

v =

∞ X

uhk (x)1kh≤t 1, α > 0, and C > 0. Suppose φ(h) ≤

C (φ(k))β . (h − k)α

Then φ(M ) = 0,  1/α for M = Cφ(0)β−1 2αβ/(β−1) . Proof. Define kn = M (1 −

1 ). 2n

Then (94)

φ(kn+1 ) ≤

C 2α(n+1) β φ(k ) ≤ C φ(kn )β . n (kn+1 − kn )α Mα

¨ 7. HOLDER CONTINUITY

157

We now will prove by induction that φ(kn ) ≤ φ(0)2−nµ , α > 0. The case n = 0 is trivial. If the induction hypothesis with µ = β−1 holds for some n we must show it also holds for n + 1. Using (94) and the induction hypothesis we have

2α(n+1) φ(0)β 2−βnµ Mα  α C 1/α 2n+1 2−βn/(β−1) β ≤ φ(0) C 1/α φ(0)(β−1)/α 2β/(β−1)  n+1−βn/(β−1) α 2 ≤ φ(0) = φ(0)2−α(n+1)/(β−1) . 2β/(β−1)

φ(kn+1 ) ≤ C

 Our main theorem in this section is the following: Theorem 72. Let fi ∈ Lp for some p > d. Let u be a solution of X (95) − (aij uxi )xj = (fi )xi in Ω i

with u = 0 on ∂Ω. Then 1

1

kukL∞ ≤ Ckf kLp (Ω) |Ω| d − p .

(96)

Proof. Let k > 0 and multiply (95) by (u − k)+ . Then, after an integration by parts Z XZ + aij uxi (u − k)xj dx = − fi (u − k)+ xi dx. Ω

i



Define A(k) = {u > k} ∩ Ω. Then Z aij uxi uxj = − A(k)

XZ i

A(k)

fi uxi .

158

3. CALCULUS OF VARIATIONS AND ELLIPTIC EQUATIONS

Therefore, since aij is elliptic Z

Z

2

2

|∇u| ≤ C

θ

1/2 Z

2

|∇u|

|fi |

A(k)

1/2 ,

A(k)

A(k)

which then yields Z

Z

2

|∇u| ≤ C A(k)

Z

2

p

|fi | ≤

2/p

|fi |

A(k)

|A(k)|1−2/p .

A(k)

If d > 2 (in the case d = 2 we can choose in the place of 2∗ any exponent q > 2, and proceed analogously), by Sobolev theorem Z

+ 2∗

2/2∗

((u − k) )

Z

+ 2∗

2/2∗

((u − k) )

=

A(k)



Z ≤C Z ≤

|∇(u − k)+ |2



|∇u|2 ≤ C

A(k)

X

kfi k2Lp φ(k)1−2/p ,

i

where φ(k) = |A(k)|. We also have for any h > k, 2

2/2∗

(h − k) φ(h)

Z

+ 2∗

2/2∗

((u − k) )

≤ A(h)

≤C

X

kfi k2Lp φ(k)(1−2/p) .

i

Therefore we obtain the following relation !2∗ φ(h) ≤ C

X

kfi kLp

i

where α = 2∗ , β = φ(M ) = 0 for some M ≤C

X i

1− p2 1− d2

φ(k)β , (h − k)α



= (1 − p2 ) 22 . Then lemma 71 implies that

kfi kLp φ(0)(β−1)/α ≤ C

X

kfi kLp |Ω|1/d−1/p .

i



¨ 7. HOLDER CONTINUITY

159

7.2. H¨ older continuity for the homogeneous equation. Now we consider weak solutions to the equation − (aij vxi )xj = 0,

(97) where aij satisfies

θ ≤ [aij ] ≤ Θ, but no regularity assumptions are imposed, as well as no boundary data. A function u ∈ W 1,p (U ) is a subsolution of (97) if Z aij uxi φxj ≤ 0, U 1,p0

for all φ ∈ W0 (U ) with φ ≥ 0. In a similar way, u is a supersolution if −u is a subsolution. Exercise 149. Let u be a smooth subsolution of (97). Show that −(aij uxi )xj ≤ 0. Lemma 73. Let u be a subsolution of (97) in W 1,p and ψ : R → R a non-decreasing convex function, such that ψ(u) ∈ W 1,p (e.g. ψ 0 bounded). Then ψ(u) is also a subsolution. Proof. Let v = ψ(u). Then Z Z aij ψ(u)xi φxj = aij ψ 0 (u)uxi φxj = Z Z 0 = aij uxi (ψ (u)φ)xj − aij uxi uxj ψ 00 (u)φ ≤ 0, since u is a subsolution, ψ 0 (u)φ is non negative and, by the convexity of ψ, the last term is negative.  The next lemma shows that the subsolutions of the equation have its supremum controlled by the Lp norm. This is not a surprising result since the main strategy in the study of elliptic equations is to try to establish control of ”‘high”’ norms in terms of ”‘low”’ norms, recall for instance what was discussed concerning energy methods.

160

3. CALCULUS OF VARIATIONS AND ELLIPTIC EQUATIONS

Lemma 74. Let u be a subsolution (97). Then, for p > 0 and 0 < θ < 1, 1/p Z C + p esssupBRθ u ≤ . − (u ) (1 − θ)n/p BR Proof. Since u+ is a subsolution, we can assume that u ≥ 0. Case 1. p ≥ 2 Let φ = ξ 2 up−1 , with ξ ∈ Cc∞ . Then Z Z   aij uxi φxj = aij uxi (p − 1)up−2 uxj ξ 2 + 2ξξxj up−1 ≤ 0, which implies Z

p−2

u Since we have

2 2

Z

|Du| ξ ≤ C

up |Dξ|2 .

p D(up/2 ξ) = Dξup/2 + ξup/2−1 Du, 2 Z Z p/2 2 |D(u ξ)| ≤ C up |Dξ|2 .

consequently, by Sobolev’s inequality, Z 2/2∗ Z p/2 2∗ (u ξ) ≤ C up |Dξ|2 . Given 0 < ρ < R, let ξ ∈ Cc∞ with 0 ≤ ξ ≤ 1, ξ ≡ 1 in Bρ = B(x0 , ρ) and ξ ≡ 0 em BR = B(x0 , R)C . We can additionally assume that |Dξ| ≤

C . R−ρ

Then, for n ≥ 3 (for n < 3 the estimate is trivial by Sobolev’s theorem), "Z #(n−2)/n Z C pn/(n−2) up . (98) u ≤ 2 (R − ρ) Bρ BR pn

Thus we have obtained an estimate for the L n−2 norm in terms of the Lp norm. Unfortunately, these norms are computed in distinct sets. The main idea is to iterate this inequality and, at the same time, control the

¨ 7. HOLDER CONTINUITY

161

domains and the estimate’s constants in order to obtain a non-trivial estimate for the L∞ norm in terms of the Lp norm. For that, consider 1−θ Rk = R(θ + k ), 2 which satisfies 1−θ Rk − Rk+1 = k+1 R. 2 Let  k n pk = p . n−2 Then, applying estimate (98), with R = Rk , ρ = Rk+1 and p = pk , # n−2 "Z Z n C k+1 pk+1 upk , ≤ 2 4 u 2 R (1 − θ) BR BR k

k+1

that is 

kukLpk+1 (BRk+1 )

C ≤ 2 R (1 − θ)2

 p1

k

4

k+1 pk

kukLpk (BRk ) .

By iteration we obtain 

kukLpk+1 (BRk+1 ) Since

and

C ≤ 2 R (1 − θ)2

Pkj=0 p1

j

j+1 j=0 pj

Pk

4

kukLp (BR ) .

∞ X n 1 = , p 2p j=0 j

P∞

j+1 j=0 pj

is finite, we get kukLpk+1 (BRk+1 ) ≤

C kukLp (BR ) , [R(1 − θ)]n/p

where the last constant, C, is independent of k. Letting k → ∞ we conclude C kukL∞ (BRθ ) ≤ R−n/p kukLp (BR ) = (1 − θ)n/p Z 1/p C p = − u . (1 − θ)n/p BR

Case 2. 0 < p < 2

162

3. CALCULUS OF VARIATIONS AND ELLIPTIC EQUATIONS

By the previous estimate, kukL∞ (BRθ )

C ≤ (1 − θ)n/2 Rn/2

Z

C ≤ (1 − θ)n/2 Rn/2

Z

2

1/2 ≤

u BR

p

1/2

1−p/2

kukL∞ (BR ) .

u BR

Using the inequality: ab ≤

a2/p b2/(2−p) + , 2/p 2/(2 − p)

which holds for 0 < p < 2, we obtain: kukL∞ (BRθ )

1 C ≤ kukL∞ (BR ) + 2 [(1 − θ)R]n/p

Z

p

u

1/p .

BR

If we define ϕ(t) = kukL∞ (Bt ) , we have, for s < t ≤ R, (99)

C 1 kukLp (BR ) . ϕ(s) ≤ ϕ(t) + 2 (t − s)n/p

We need now a technical lemma: Lemma 75. Let ϕ be a bounded non-decreasing function satisfying (99). Then, for s < t ≤ R, ϕ(s) ≤ CkukLp (BR ) (t − s)−n/p . Thus the lemma implies kukL∞ (BRθ ) ≤ C

kukLp (BR ) . (1 − θ)n/p Rn/p

Proof. Let ϕ satisfying 1 ϕ(s) ≤ ϕ(t) + a(t − s)−α , 2 for s < t. Let 0 < τ < 1 and si+1 = si + (1 − τ )τ i (t − s),

¨ 7. HOLDER CONTINUITY

163

with s−1 = s. Then 1 ϕ(si ) ≤ ϕ(si+1 ) + a(1 − τ )−α τ −iα (t − s)−α , 2 and, therefore by induction i−1 X 1 (1 − τ )−α τ −jα (t − s)−α 2−j . ϕ(s) ≤ i ϕ(si ) + a 2 j=0

Choosing τ sufficiently close to one 1 such that i → ∞,

τ −α 2

< 1 we have, as

ϕ(s−1 ) = ϕ(s) ≤ Ca(t − s)−α .  This ends the proof of the lemma.



The next step is to study estimates similar to the ones of lemma 74 for p < 0. In this case we obtain, however, the opposite inequality. Lemma 76. Let u be a non-negative supersolution. Then there exists δ > 0 and p0 > 0 such that Z 1/p0 p0 (100) essinf BR/2 u ≥ δ − u . BR

Proof. We will leave the following fact as an exercise: Exercise 150. Let u be a positive supersolution. Show that subsolution. Combining the last exercise with lemma 74 we obtain Z 1/p −1 −p esssupBR/2 u ≤ C − u , BR

for p > 0. In this way, Z −1/p Z 1/p Z −p p p essinf BR/2 u ≥ C − u − u − u , BR

BR

BR

1 u

is a

164

3. CALCULUS OF VARIATIONS AND ELLIPTIC EQUATIONS

which implies (100) if we can prove Z  Z  −p p − u − u ≤ C, BR

BR

for some p > 0 and C > 0. To prove this inequality we need the John-Nirenberg lemma, whose proof is the subject of the next section. Lemma 77 (John-Nirenberg). Denote by Q a generic cube contained in U and Q0 ⊂ Q a generic subcube of Q. For f ∈ L1 , let |f |∗,Q be given by Z |f |∗,Q = sup − |f − fQ0 |dx, Q0 ⊂Q

Q0

where fQ0 denotes the average of f in Q0 . Then, if |f |∗,Q < ∞, there exist positive constants C1 , C2 and λ such that 1 |{x ∈ Q : |f (x) − fQ | ≥ λ|f |∗,Q }| ≤ C1 e−λC2 . |Q| We leave as an exercise the proof of the following corollary: Corollary 78. If |f |∗,Q < ∞ then for some  > 0 we have Z − ef < C, Q

independent of Q. The proof of the corollary is left as an exercise, which is a variation of the following lemma: Lemma 79. Let f ∈ Lp , f ≥ 0. Then Z Z ∞ p |f | = pλp−1 |{x : f (x) > λ}|dλ. 0

Proof. We have Z Z Z f (x) Z p p−1 |f | dx = pλ dλdx = 0

0



Z

pλp−1 χ{λ 0 to be determined. This suggests we should try to estimate | ln u|∗,Q = | ln v|∗,Q . Let φ(x) =

ξ2 . u(x)

Z −

Then ξ2 aij uxi uxj 2 + u

Z aij uxi

2ξξxj ≥ 0, u

which implies Z |Du|2 2 ξ ≤ C |Dξ|2 . u2 Let ξ ≡ 1 in Q0 and ξ ≡ 0 in the exterior of a cube with twice the sidelenght and same center. Then we conclude Z |D ln u|2 ≤ Cρn−2 , Z

Q0

where ρ is the sidelenght of Q0 . The Poincar´e inequality implies Z Z 2 2 | ln u − (ln u)Q0 | ≤ Cρ |D ln u|2 ≤ Cρn . Q0

Thus, Z

Q0

n/2

Z

| ln u − (ln u)Q0 | ≤ Cρ Q0

2

| ln u − (ln u)Q0 |

1/2

≤ Cρn .

Q0

Therefore | ln u|∗,Q < ∞, which ends the proof.



166

3. CALCULUS OF VARIATIONS AND ELLIPTIC EQUATIONS

Theorem 80 (Harnack inequality). Let u be a positive solution. Then essinf BR/2 u ≥ C esssupBR/2 u. Proof. By the two previous lemmas we have Z 1/p0 p0 essinf BR/2 u ≥ δ − u ≥ C esssupBR/2 u. BR

 Using Harnack’s inequality, the H¨older continuity of u is a consequence of the following theorem Theorem 81 (deGiorgi-Nash-Moser). Let u be a solution of (97) . Then u is H¨older continuous. Furthermore, if we set M (R) = esssupBR u, m(R) = essinf BR u. and let ω(R) = M (R) − m(R), there exists γ < 1 such that ω(R/2) ≤ γω(R). Proof. The Harnack inequality implies, by subtracting m(R) −  to u, and letting  → 0, C[m(R/2) − m(R)] ≥ M (R/2) − m(R). Defining ω(r) = M (r) − m(r) we have ω(R/2) = M (R/2) − m(R/2)   1 ≤ M (R/2) − m(R) + [M (R/2) − m(R)] C   1 = 1− [M (R/2) − m(R)] C   1 ≤ 1− ω(R). C By induction we obtain ω(2−k R) ≤ η k ω(R),

¨ 7. HOLDER CONTINUITY

167

with η < 1. Therefore ω(ρ) ≤ Cρα , that is u is H¨older continuous.



7.3. John-Nirenberg Lemma. Before discussing the proof of John-Nirenberg lemma, we need to establish a version of the Calder´onZygmund decomposition: ˜ ⊃ Q the unique dyadic cube Lemma 82. Let Q be a dyadic cube and Q whose side is twice the size of Q. Let f ∈ L1 (Q0 ) and α > |fQ0 |. Then there exists a disjoint sequence of dyadic cubes Qj such that |fQ˜ j | ≤ α < |fQj |, and |f | ≤ α almost everywhere in Q0 \ ∪j Qj . Proof. We start with Q0 which we divide into 2n dyadic cubes Q0,k . Then we select those in which |fQ0,k | > α and we subdivide again the ones which are not selected. By continuing iteratively, we obtain a sequence of cubes Qj such that |fQj | > α and |fQ˜ j | ≤ α, ˜ j has not been selected. By Lebesgue differentiation theorem, since Q in the complement of ∪j Qj |f | ≤ α, almost everywhere, since no cube of the complement was selected.  Now we give the proof of John-Nirenberg lemma (lemma 77).

168

3. CALCULUS OF VARIATIONS AND ELLIPTIC EQUATIONS

Proof. Without loss of generality we may assume that fQ = 0 and |f |∗,Q = 1. Let α0 > 0 and for each natural l apply lemma (82) with α = α0 l. Let {Qlj } be the sequence of cubes that are obtained in this way. Then, |fQlj | > lα0

|fQ˜ lj | ≤ lα0 ,

and for x 6∈ ∪Qlj |f (x)| ≤ lα0 . Now we are going to estimate ∪j Qlj . In the complement of this set |f | ≤ lα0 and therefore the previous estimate gives an upper bound: |{x : |f (x)| > lα0 }| . We are going to establish a recurrence relation between the estimate at l and the one at l + 1. This estimate will allow us to obtain exponential decay in l. Fix l and j, and suppose i is such taht Ql+1 ⊆ Qlj . Then i Z fQlj − fQl+1 ≤ − |f − fQlj | i

and therefore (101) X |Ql+1 i ||fQlj − fQl+1 | ≤

Ql+1 i

X

Z

i

i:Ql+1 ⊆Qlj i

i:Ql+1 ⊆Qlj i

Ql+1 i

|f − fQlj | ≤ |f |∗,Q |Qlj |.

If, for Ql+1 ⊆ Qlj , we obtain a lower bound for i |fQlj − fQl+1 |, i

P l+1 equation (101) yields a recurrence relation for the values of |Qi | P l as a function of |Qj |, by adding both sides over j. To obtain this

¨ 7. HOLDER CONTINUITY

169

lower bound, observe that |fQlj − fQl+1 | ≥ |fQl+1 | − |fQlj | i

i

≥ |fQl+1 | − |fQlj − fQ˜ lj | − |fQ˜ lj | i

≥ (l + 1)α0 − |fQlj − fQ˜ lj | − lα0 . However, if P is a dyadic subcube of Q with P˜ ⊆ Q, then Z Z 1 1 |fP − fP˜ | = (f − fP˜ ) ≤ |f − fP˜ | |P | P |P | P Z Z 2d 2d |f − fP˜ | ≤ |f − fP˜ | = |P˜ | P |P˜ | P˜ ≤ 2d |f |∗,Q ≤ 2d . Therefore |fQlj − fQl+1 | ≥ (l + 1)α0 − 2d − lα0 = α0 − 2d . i

If we choose α0 = 2 + 2d we obtain |fQlj − fQl+1 | ≥ 2, i

which implies X

|Ql+1 i | ≤

i

1X l |Qj |. 2 j

Therefore |{x : |f (x)| > lα0 }| ≤ 2−l−1 |Q|, which easily yields the lemma.



7.4. H¨ older continuity. Finally we use all the estimates in the previous sections to establish interior H¨older continuity. Theorem 83. Let u be a solution of (102)

− (aij uxi )xj = (fi )xk

in an open set U . Then u is H¨older continuous in any compact subset of U .

170

3. CALCULUS OF VARIATIONS AND ELLIPTIC EQUATIONS

Proof. Write u = v + w where v is a solution of −(aij vxi )xj = (fi )xk , in B2R and v = 0 in ∂B2R . Therefore w solves −(aij wxi )xj = 0, in B2R and with arbitrary boundary data in ∂B2R . Then we have kvkL∞ (B2R ) ≤ CR1−n/p , where C depends on the Lp norm of f , ellipticity of aij but not on the solution u or R. Let ωw be the modulus of continuity of w. Then for all R0 < R we know that R0 ωw ( ) ≤ ηωw (R0 ), 4 for some 0 < η < 1. Hence ωu (R/4) ≤ CR1−d/p + ωw (R/4) ≤ CR1−n/p + ηωw (R) ˜ 1−d/p + ηωu (R). ≤ CR Then the H¨older continuity follows from next lemma: Lemma 84. Suppose ω(R/4) ≤ CRα + ηω(R). Then ω(R) ≤ CRγ . Proof. Suppose M>

sup R0 ≤R≤4R0

ω(R) , Rγ

to be chosen later as a function of γ. Then, for all R0 /4 ≤ R ≤ R0 we have ω(R) ≤ ηω(4R) + C(4R)α ≤ M η(4R)γ + C(4R)α ≤ M Rγ , if we choose γ < α sufficiently small, and then M large enough so that 4γ M η + 4α C < M.

8. SCHAUDER ESTIMATES

Now, if

R0 4i+1

≤R≤

R0 4i

171

we have

ω(R) ≤ (M η4γ + 4α C) Rγ ≤ M Rγ .  

8. Schauder estimates In this section we will prove that weak solutions of equations of the form (aij (x)vxi )xj = f are C 1,α , as long as both the coefficients aij and f are H¨older continuous functions. These are the so called Schauder estimates. We should observe that although we will carry out the proof for the scalar case, the argument is unchanged for elliptic systems, in contrast with the regularity results of the previous section.

8.1. Morrey and Campanato spaces. The key idea in Schauder estimates is to use the ellipticity of the equation to control the oscillation of the solution. For this we will need certain spaces of functions the Campanato and Morrey spaces, as well as some of its basic properties. For p ≥ 1 and λ ≥ 0 we define the Campanato seminorm  1/p Z −λ p [u]p,λ = sup ρ |u − ux,ρ | , x∈U,ρ>0

U (x,ρ)

where U (x, ρ) = U ∩ B(x, ρ) and ux,ρ

Z = − u. U (x,ρ)

To avoid technicalities, we assume that for ρ sufficiently small, |U (x, ρ)| ≥ cρ−n . In any case, our main objective is to establish interior estimates on U and not up to the boundary.

172

3. CALCULUS OF VARIATIONS AND ELLIPTIC EQUATIONS

The Campanato space Lp,λ (U ) is the space of functions u ∈ Lp (U ) which satisfy kukp,λ ≡ kukLp + [u]p,λ < ∞. The Morrey space Lp,λ (U ) is the space of functions u ∈ Lp (U ) for which  1/p Z −λ p kukLp,λ ≡ sup sup ρ |u| < ∞. x∈U ρ>0

U (x,ρ)

Exercise 151. Show that [·]p,λ and k · kLp,λ are, respectively, a seminorm and a norm. Proposition 85. Depending on the relative values of λ, p and n we have the following isomorfisms: (i) If 0 ≤ λ < n then Lp,λ ' Lp,λ ; λ−n (ii) If n < λ < n + p then Lp,λ ' C 0, p ; (iii) Lp,0 = Lp and Lp,λ ' R if λ > n + p. Proof. To prove (i), we start by showing that [u]p,λ ≤ CkukLp,λ and then we will establish the opposite inequality. Let us start by observing that Z Z p (|u|p + |ux,ρ |p ) . |u − ux,ρ | ≤ C U (x,ρ)

U (x,ρ)

Then Jensen’s inequality implies Z |ux,ρ | ≤ − |u|p , p

U (x,ρ)

and, therefore, Z

p

Z

|ux,ρ | ≤ U (x,ρ)

|u|p .

U (x,ρ)

This implies, ρ

−λ

Z U (x,ρ)

that is, kukp,λ ≤ CkukLp,λ .

|u − ux,ρ |p ≤ CkukpLp,λ ,

8. SCHAUDER ESTIMATES

173

To prove the opposite inequality, we need some preliminary estimates. First, observe that Z Z p p |u| ≤ C|ux,ρ | |U (x, ρ)| + C |u − ux,ρ |p U (x,ρ)

U (x,ρ)

≤ Cρn |ux,ρ |p +

C[u]pp,λ ρλ .

Therefore (103)

ρ

−λ

Z

|u|p ≤ C[u]pp,λ + Cρn−λ |ux,ρ |p .

U (x,ρ)

Unfortunately, the norm Lp,λ does not control directly |ux,ρ |p . To use this estimate we need some auxiliary estimates. For R > r we have: Z p −n (104) |ux,R − ux,r | ≤ Cr (|ux,R − u|p + |ux,r − u|p ) U (x,r)

≤ Cr

−n

(Rλ + rλ )[u]pp,λ ≤ Cr−n Rλ [u]pp,λ .

Let R = R0 2−i , r = R0 2−i−1 and R0 > 1. Then (λ−n)/p (n−λ)i/p

|ux,R − ux,r | ≤ CR0

2

[u]p,λ .

Let ρ = R0 2−l−1 . Then |ux,ρ | = |ux,R0 2−l−1 | ≤ |ux,R0 2−l−1 − ux,R0 2−l | + |ux,R0 2−l − ux,R0 2−l+1 | + . . . + |ux,R0 /2 − ux,R0 | + |ux,R0 | ≤ |ux,R0 | + C[u]p,λ

l X

(λ−n)/p (n−λ)i/p

R0

2

i=0

≤ |ux,R0 | + Cρ(λ−n)/p [u]p,λ . Therefore |ux,ρ |p ≤ C|ux,R0 |p + Cρλ−n [u]pp,λ ≤ CkukpLp + Cρλ−n [u]pp,λ . By combining the last inequality with (103) and using λ < n, we have Z −λ ρ |u|p ≤ CkukpLp + [u]pp,λ . U (x,ρ)

In what concerts the second statement of the proposition, (ii), the inclusion λ−n C 0, p ⊂ Lp,λ

174

3. CALCULUS OF VARIATIONS AND ELLIPTIC EQUATIONS

is elementary and is left as an exercise: Exercise 152. Show that C 0,

λ−n p

⊂ Lp,λ . λ−n

So we need to establish the opposite inequality, Lp,λ ⊂ C 0, p . Let u ∈ Lp,λ ∩ C 0 . Given x and y, let R = |x − y| and α = λ−n . We must p show that |u(x) − u(y)| ≤ CRα . By the triangle inequality |u(x) − u(y)| ≤ |u(x) − ux,2R | + |ux,2R − uy,2R | + |uy,2R − u(y)|. Applying (104) we obtain |ux,R − ux,R2−l−1 | ≤

l X

|ux,R2−i − ux,R2−(i+1) |

i=0 α

≤ CR [u]p,λ

l X

2−αi ≤ CRα [u]p,λ ,

i=0

where the constant C is independent of l. Therefore, by taking l → ∞, we obtain |ux,2R − u(x)|, |uy,2R − u(y)| ≤ CRα [u]p,λ . We also have, Z |U (x, 2R) ∩ U (y, 2R)||ux,2R − uy,2R | ≤

|ux,2R − u|+ U (x,2R)

Z |uy,2R − u|.

+ U (y,2R)

Since |U (x, 2R)∩U (y, 2R)| ≥ cRn , we obtain, using H¨older’s inequality, Z 1/p −n 1−1/p p |ux,2R − uy,2R | ≤ CR |U (x, 2R)| |ux,2R − u| + U (x,2R)

+ CR

−n

1−1/p

Z

|U (y, 2R)|

p

1/p

|uy,2R − u| U (y,2R)

≤ CRα [u]p,λ . The last statement of the proposition, (iii), is left as an exercise:

8. SCHAUDER ESTIMATES

175

Exercise 153. Prove (iii). Hint: observe that if |u(x) − u(y)| ≤ C|x − y|α for some α > 1 then u is constant.  8.2. Preliminary estimates. The next lemma gives a key estimate concerning the behavior of solutions of elliptic equations in the interior of U . Lemma 86. Let u ∈ H 1 be a solution of −(aij uxi )xj = fxk , with aij elliptic satisfying θ ≤ aij ≤ Θ. Then, for any r and x for which B(x, r) ⊂ U , we have:   Z Z Z 2 2 −2 2 |f | . |Du| ≤ C r |u| + B(x, r2 )

B(x,r)

B(x,r)

Proof. Without loss of generality we can assume x = 0. Let ξ ∈ Cc∞ with ξ ≡ 1 in B(0, 21 ) and ξ ≡ 0 in B(0, 1)C , and η(x) = ξ( xr ). Then Z Z 2 aij (ηu)xi (ηu)xj |D(ηu)| ≤ θ Br Br Z Z 2 = aij uxi (η u)xj + aij ηxi u(ηu)xj Br Br Z − aij uxi ηηxj u Br Z Z 2 = f xk η u + aij (ηxi u(ηu)xj − uxi ηηxj u) Br Br Z Z Z Z C 2 2 2 2 ≤− f (η u)xk + 2 u + η |Du| +  |D(ηu)|2 r B Br Br Br Z Z Z r C u2 + C η 2 |Du|2 , ≤C f2 + 2 r Br Br Br which implies, by choosing  sufficiently small, Z Z Z C 2 2 |Du| ≤ C f + 2 u2 . r Br Br/2 Br

176

3. CALCULUS OF VARIATIONS AND ELLIPTIC EQUATIONS

 Exercise 154. Show that if aij is constant and u ∈ H 1 satisfies −(aij uxi )xj = 0, then, for all multiindices α ∈ Zn , kDα ukL2 (Br/2 ) ≤ Cr−|α| kukL2 (Br ) . 8.3. Schauder estimates. The main objective of this section is to prove: Theorem 87. Let u ∈ H 1 be a solution of −(aij uxi )xj = fxk . Then (i) If aij is constant and f ∈ L2,λ then Du ∈ L2,λ loc , for 0 ≤ λ < n + 2. (ii) If aij (x) is continuous and f ∈ L2,λ then Du ∈ L2,λ loc , for 0 ≤ λ < n. (iii) If aij (x), f (x) ∈ C 0,α with 0 < α < 1 then 0,β Du ∈ Cloc ,

for some β > 0. Proof. (i). Suppose that B(x, 2R) ⊂ U (without loss of generality assume x = 0). Let w be the unique solution of −(aij wxi )xj = fxk in H01 (B2R ), whose existence is guaranteed by Lax-Milgram theorem (theorem 68).

8. SCHAUDER ESTIMATES

177

We have the following estimate: Z Z Z 2 |Dw| ≤ C aij wxi wxj = −C (f − γ)wxk B2R B2R B2R Z Z 1 2 |Dw| + C |f − γ|2 , ≤ 2 B2R B2R for any constant γ. Define γ = f0,2R ≡ f2R , to simplify the notation, we obtain Z Z 2 (105) |Dw| ≤ C |f − f2R |2 ≤ CRλ [f ]22,λ . B2R

B2R

We will use w to decompose the solution into two parts: u = v + w. w, by definition, satisfies w = 0 in the boundary of B2R and −(aij wxi )xj = fxk . Consequently, v = u − w has unknown boundary data but it satisfies the homogeneous equation −(aij vxi )xj = 0. By hypothesis, the coefficients are constant, therefore v is C ∞ (exercise 154) and, by Poincar´e’s inequality for ρ < 2R Z Z 2 2 |Dv − (Dv)ρ | ≤ Cρ |D2 v|2 . Bρ



It is important to observe that in the Poincar´e the dependence of the constant in ρ is exactly the one used previously: Exercise 155. Show that the Poincar´e inequality in Bρ has the form Z Z 2 2 |u − uρ | ≤ Cρ |Du|2 , Bρ



where C does not depend on ρ. Exercise 156. Use the Fourier transform to prove the following interpolation inequality 1−θ θ kukH k+θ (Rd ) ≤ CkukH k (Rd ) kukH k+1 (Rd ) .

178

3. CALCULUS OF VARIATIONS AND ELLIPTIC EQUATIONS

Recall that for s ∈ R the norm H s is defined by Z 2 (1 + |ξ|2 )s |ˆ u|2 dξ. kukH s (Rd ) = Rd

Exercise 157. Let 1 ≤ p0 , p1 ≤ ∞. Prove the following interpolation inequality kukLpθ ≤ kukθLp0 kukL1−θ p1 , where 1 θ 1−θ = + . pθ p0 p1 Using interpolation techniques and the Fourier transform, it is also possible to prove the following version of the Sobolev theorem: Theorem 88. Let u ∈ H s , where 0 < s < n2 . Let

1 p∗

=

1 2

− ns . Then

kukLp∗ ≤ CkukH s . Proof. If s is integer, this the standard Sobolev theorem, for s fractionary we can use interpolation.  Let v˜ = D2 v. Since the coefficients aij are constant −(aij v˜xi )xj = 0. Therefore, for 1 ≤ p < ∞, Z 0 |˜ v |2 ≤ ρn/p k˜ v k2L2p (BR/2 ) Bρ

0

≤ Cρn/p k˜ v k2H n/(2p0 ) (B

R/2 )

,

using Sobolev’s theorem (theorem 88). By exercise 154, and using interpolation, we obtain Z 0 ρn/p 2 v k2L2 (BR ) . (106) |˜ v | ≤ C n/p0 k˜ R Bρ

8. SCHAUDER ESTIMATES

179

As a conclusion, given ν sufficiently small, there exists p, sufficiently large, such that pn0 = n − ν. By the Poincar´e inequality Z

2

2

Z

|D2 v|2

|Dv − (Dv)ρ | ≤ Cρ Bρ

Bρ n+2−ν

ρ ≤ C n−ν R

Z

2

 ρ n+2−ν Z

2

|D v| ≤ C

R

BR

|Dv − (Dv)2R |2 ,

B2R

where in the last inequality we have applied lemma 86. Let ρ < Z

R . 2

Then Z

2

2

|Du − (Du)ρ | ≤ 2

|Dv − (Dv)ρ | + 2





≤C

Z

|Dw − (Dw)ρ |2



 ρ n+2−ν Z R

2

Z

|Dv − (Dv)2R | + C

B2R

|Dw|2



:= T1 + T2 , where T1 = C

 ρ n+2−ν Z R B2R  ρ n+2−ν Z

|Dv − (Dv)2R |2 =

|Du − (Du)2R − Dw + (Dw)2R |2 ≤ R B2R  ρ n+2−ν Z  ρ n+2−ν Z 2 ≤C |Du − (Du)2R | + C |Dw|2 R R B2R B2R

=C

and Z

|Dw|2 .

T2 = C Bρ

Let Z Φ(ρ) = sup ρ1 ≤ρ

|Du − (Du)ρ1 |2 .

Bρ1

Then, using the estimate (105), we have Φ(ρ) ≤ C

 ρ n+2−ν R

Φ(2R) + CRλ [f ]22,λ .

180

3. CALCULUS OF VARIATIONS AND ELLIPTIC EQUATIONS

From the previous inequality we conclude, when λ < n + 2 and using the lemma that we will prove next, that    ρ λ λ 2 Φ(ρ) ≤ C Φ(2R) + [f ]2,λ ρ , R which implies, when λ < n + 2, Z −λ sup ρ |Du − (Du)ρ |2 ≤ C(kuk2L2 + kf k2L2,λ ). ρ< R 2



Lemma 89. Either Φ ≥ 0, not decreasing and ρ < R/2, with R < R0 and h ρ α i Φ(ρ) ≤ aΦ(2R) +  + bRβ 2R with 0 < β < α and  > 0. Then, if  is small enough,   γ  ρ β Φ(ρ) ≤ C Φ(2R0 ) + bρ , R0 with β < γ < α. Proof. Let β < γ < α and θ sufficiently small such that 2aθα < θγ . Suppose  < θα , such that a < θγ /2. Then Φ(2θR) ≤ Φ(2R)θγ + bRβ . Exercise 158. Estimate Φ(2θ2 R) and Φ(2θ3 R), applying inductively the previous inequality. By induction, Φ(2θk+1 R) ≤ θγ(k+1) Φ(2R) + bRβ

k X

! θγj θβ(k−j)

j=0

≤θ

γ(k+1)

β βk

Φ(2R) + bR θ c(θ).

Therefore, given ρ and k satisfying 2θk+1 R ≤ ρ ≤ 2θk R,

8. SCHAUDER ESTIMATES

181

we have Φ(ρ) ≤ Φ(2θk R) ≤ θγ(k) Φ(2R) + b(Rθk )β c(θ) h ρ γ i ≤C Φ(2R) + b˜ c(θ)ρβ . R 

(ii). We have −(aij (x0 )uxi )xj = [(aij (x) − aij (x0 ))uxi + δkj f ]xj , that is L0 u = g, where L0 is an operator with constant coefficients. To simplify notation, we take x0 = 0. Let w ∈ H01 (BR ), defined by L0 w = g and v = u − w. Let v¯ = Dv and so, L0 v¯ = 0. Proceeding as in (106), Z  ρ n−ν Z 2 |¯ v |2 . |¯ v| ≤ C R BR Bρ Consequently, Z Z 2 |Du| ≤ 2 Bρ

Z

2



≤C ≤C

|Dw|2 ≤

|¯ v| + 2 Bρ

 ρ n−ν Z R

|¯ v| + 2

BR

|Dw|2 ≤



  Z ρ n−ν +1 |Dw|2 . |Du| + C R BR BR

 ρ n−ν Z R

Z

2

2

However, w depends implicitly on u, and therefore we must proceed with caution. Z Z 2 |Dw| ≤ C aij (0)wxi wxj BR BR Z Z Z = gw = − f wxk − (aij (x) − aij (0))uxi wxj BR BR BR Z Z Z 1 2 2 2 ≤C |f | + |Dw| + C(ω(R)) |Du|2 , 4 BR BR BR

182

3. CALCULUS OF VARIATIONS AND ELLIPTIC EQUATIONS

where ω(R) is the modulus of continuity of aij . Therefore Z

2

|Du| ≤ C Bρ

 ρ n−ν Z R

2

|Du| + CR

λ

kf k2L2,λ

2

Z

+ C(ω(R))

BR

|Du|2 .

BR

Thus, for R so that ω(R) is sufficiently small and applying lemma 89 to the function Z Φ(ρ) = sup |Du|2 , ρ˜≤ρ 2,λ

we obtain Du ∈ L

Bρ˜

if λ < n.

(iii). Let g and w as in (ii). We have Z Z  ρ n+2−ν Z 2 2 |Du−(Du)ρ | ≤ C |Dw|2 |Du − (Du)ρ | + R Bρ B2R B2R Z  ρ n+2−ν Z ≤C |Du − (Du)ρ |2 + |f − f2R |2 R B B2R Z 2R |Du|2 . + Cω(2R)2 B2R

By hypothesis we have ω(R) ≤ Rα and Z |f − f2R |2 ≤ CRn+2α . B2R

For λ0 < n, part (ii) implies Du ∈ L2,λ0 . Choosing Z |Du − (Du)ρ˜|2 Φ(ρ) = sup ρ˜≤ρ

we obtain Φ(ρ) ≤ C

Bρ˜

 ρ n+2−ν R

Φ(2R) + CRλ0 +2α ,

and so Φ(ρ) ≤ Cρλ0 +2α . 

4

Optimal control and viscosity solutions

This chapter is dedicated to the study of deterministic optimal control problems and its connection with Hamilton-Jacobi equations. put some more details concerning controlled dynamics, motivation, define a control space A typical problem in optimal control, which is studied in detail in this chapter, is the terminal value optimal control problem. This problem consists in determining the optimal trajectories x(·) which minimize Z t1 L(x, u)ds + ψ(x(t1 )), J[u; x, t] = t

among all controls u(·) : [t, t1 ] → Rn and all (continuous) trajectories x with a initial condition x(t) = x and which are (almost everywhere in time) solutions to the controlled dynamics x˙ = f (x, u). The value function V is defined as (107)

V (x, t) = inf J[u; x, t]

in which the infimum is taken over all controls. An important case is the ”calculus of variations setting”. In this case, f (x, u) = u, and the optimal trajectories x(·) are solutions to the Euler-Lagrange equation d ∂L ∂L ˙ − ˙ = 0, (x, x) (x, x) dt ∂v ∂x 183

184

4. OPTIMAL CONTROL AND VISCOSITY SOLUTIONS

˙ is a solution of Hamilton’s equations: and p = −Dv L(x, x) x˙ = −Dp H(p, x),

p˙ = Dx H(p, x).

This problem was studied in detail in chapter 2. However we will revisit it, and generalize the previous results by allowing more general Lagrangians. In fact, will work under the following assumptions: L(x, v) : R2n → R, 2 x ∈ Rn , v ∈ Rn , is a C ∞ function, strictly convex em v, i.e., Dvv L is positive definite, and satisfying the coercivity condition

L(x, v, t) = ∞, |v|→∞ |v| lim

for each (x, t); without loss of generality, we may also assume that L(x, v, t) ≥ 0, by adding a constant if necessary. We will also assume that L(x, 0, t) ≤ c1 ,

|Dx L| ≤ c2 L + c3 ,

for suitable constants c1 , c2 and c3 ; finally we assume that there exists a function C(R) such that 2 |Dxx L| ≤ C(R),

|Dv L| ≤ C(R)

whenever |v| ≤ R. The o terminal cost, ψ, is assumed to be a bounded Lipschitz function. Example 40. Note that, although the conditions on L are quite technical, they are fulfilled by a wide class of Lagrangians, for instance 1 L(x, v) = v T A(x)v − V (x), 2 where A and V are C ∞ , Zn -periodic is x, and A(x) is positive definite. J Before considering the ”calculus of variations setting” we study a simpler case. Let U , the control space, be a compact convex set. We restrict the class of admissible controls by requiring u(s) ∈ U , for all t ≤ s ≤ t1 . Furthermore, we suppose that L(x, u) is a bounded

4. OPTIMAL CONTROL AND VISCOSITY SOLUTIONS

185

continuous function, convex in u. We suppose that the function f (x, u) satisfies the following Lipschitz condition |f (x, u) − f (y, u)| ≤ C|x − y|. To establish existence of optimal solutions we simplify even further by assuming that f (x, u) has the form (108)

f (x, u) = A(x)u + B(x),

where A and B are Lipschitz continuous functions. In section 1, we start the rigorous study of optimal control problems by establishing basic properties. The dynamic programming principle is proved in §2. The analog of the Euler-Lagrange equation for optimal control problems is the Pontryagin maximum principle, which will be studied in §3. In §4, we show that, if the value function V is differentiable, it satisfies the Hamilton-Jacobi partial differential equation −Vt + H(Dx V, x) = 0, in which H(p, x), the Hamiltonian, is the (generalized) Legendre transform of the Lagrangian L (109)

H(p, x) = sup −p · f (x, v) − L(x, v). v∈U

It is well known that first order partial differential equations such as the Hamilton-Jacobi equation may not admit classical solutions. Using the method of characteristics, the next exercise gives an example of nonexistence of smooth solutions: Exercise 159. Solve, using the method of characteristics, the equation  u + u 2 = 0 x ∈ R, t > 0 t x u(x, 0) = ±x2 . It is therefore necessary to consider weak solutions to the HamiltonJacobi equation: viscosity solutions. In section §9 we develop the theory of viscosity solutions for Hamilton-Jacobi equations, and show that

186

4. OPTIMAL CONTROL AND VISCOSITY SOLUTIONS

the value function is the unique viscosity solution of the HamiltonJacobi equation. Finally, in §10 we address the stationary optimal control problem which corresponds to the Hamilton-Jacobi equation H(Dx u, x) = H, and the discounted cost infinite horizon problem, whose HamiltonJacobi equation is αu + H(Du, x) = 0. Main references on optimal control and viscosity solutions are [BCD97], [FS93], [Lio82], [Bar94], and [Eva98b].

1. Elementary examples and properties In this section we establish some elementary properties and study some explicit examples. Proposition 90. The value function V satisfies the following inequalities −kψk∞ ≤ V ≤ c1 |t1 − t| + kψk∞ . Proof. The first inequality follows from L ≥ 0. To obtain the second inequality it is enough to observe that V ≤ J(x, t; 0) ≤ c1 |t1 − t| + kψk∞ .  Example 41 (Lax-Hopf formula). Suppose that L(x, v) ≡ L(v), L convex in v and coercive. Assume further that f (x, v) = v. By Jensen’s inequality     Z t1 Z t1 1 y−x 1 ˙ ˙ L(x(s)) ≥L x(s) =L , t1 − t t t1 − t t t1 − t

1. ELEMENTARY EXAMPLES AND PROPERTIES

187

where y = x(t1 ). Therefore, to solve the terminal value optimal control problem, it is enough to consider constant controls of the form u(s) = y−x . Thus t1 −t     y−x + ψ(y) , V (x, t) = infn (t1 − t)L y∈R t1 − t and, consequently, the infimum is a minimum. Thus Lax-Hopf formula gives an explicit solution to the optimal control problem. J Exercise 160. Suppose Q and A be n×n constant positive definite matrices. Let L(v) = 12 v T Qv and ψ(y) = 12 y T Ay. Use Lax-Hopf formula to determine V (x, t). Proposition 91. Let ψ1 (x) and ψ2 (x) be continuous functions such that ψ1 ≤ ψ2 , and V1 (x, t) and V2 (x, t) the corresponding value functions. Then V1 (x, t) ≤ V2 (x, t). Proof. Fix  > 0. Then there exists an almost optimal control u and corresponding trajectory x such that Z t1 V2 (x, t) > L(x (s), u (s), s)ds + ψ2 (x (t1 )) − . t

Clearly Z V1 (x, t) ≤

t1

L(x (s), u (s), s)ds + ψ1 (x (t1 )),

t

and therefore V1 (x, t) − V2 (x, t) ≤ ψ1 (x (t1 )) − ψ2 (x (t1 )) +  ≤ . Since  is arbitrary, this ends the proof.



An important corollary is the continuity of the value function (with respect to the L∞ norm) on the terminal value.

188

4. OPTIMAL CONTROL AND VISCOSITY SOLUTIONS

Corollary 92. Let ψ1 (x) and ψ2 (x) be continuous functions and V1 (x, t) and V2 (x, t) the corresponding value functions. Then sup |V1 (x, t) − V2 (x, t)| ≤ sup |ψ1 (x) − ψ2 (x)|. x

x

Proof. Note that ψ1 ≤ ψ˜2 ≡ ψ2 + sup |ψ1 (y) − ψ2 (y)|. y

Let V˜2 be the value function corresponding to ψ˜2 . Clearly, V˜2 = V2 + sup |ψ1 (y) − ψ2 (y)|. y

By the previous proposition, V1 − V˜2 ≤ 0, which implies V1 − V2 ≤ sup |ψ1 (y) − ψ2 (y)|. y

By reverting the roles of V1 and V2 we obtain the other inequality. 

2. Dynamic programming principle The dynamic programming principle, that we prove in the next theorem, is simply a semigroup property that the value function evolution satisfies. Theorem 93 (Dynamic programming principle). Suppose that t0 ≤ t ≤ t0 ≤ t1 . Then "Z 0 # t

(110)

L(x(s), u(s), s)ds + V (y, t0 ) ,

V (x, t) = inf u

t

where x(t) = x and x˙ = f (x, u).

2. DYNAMIC PROGRAMMING PRINCIPLE

189

Proof. Denote by V˜ (x, t) the right hand side of (110). For fixed  > 0, let u be an almost optimal control for V (x, t), and x (s) the corresponding trajectory trajectory , i.e., J(x, t; x˙  ) ≤ V (x, t) + . We claim that V˜ (x, t) ≤ V (x, t) + . To check this statement, let x(·) = x (·) and y = x (t0 ). Then Z t0 ˜ L(x (s), u (s), s)ds + V (y, t0 ). V (x, t) ≤ t

Additionally V (y, t0 ) ≤ J(y, t0 ; u ). Therefore V˜ (x, t) ≤ J(x, t; u ) ≤ V (x, t) + , and, since  is arbitrary, V˜ (x, t) ≤ V (x, t). To prove the opposite inequality, we will proceed by contradiction. Therefore, if V˜ (x, t) < V (x, t), we could choose  > 0 and a control u] such that Z t0 L(x] (s), u] (s), s)ds + V (y, t0 ) < V (x, t) − , t

where x˙ = f (x] , u] ), x] (t) = x, and y = x] (t0 ). Choose u[ such that  J(y, t0 ; u[ ) ≤ V (y, t0 ) + 2 ? Define u as  u? (s) = u] (s) for s < t0 u? (s) = u[ (s) for t0 < s. ]

So, we would have t0

Z

L(x] (s), u] (s), s)ds + V (y, t0 ) ≥

V (x, t) −  > t

Z

t0



L(x] (s), u] (s), s)ds + J(y, t0 ; u[ ) −

t

= J(x, t; u? ) − which is a contradiction.

 = 2

  ≥ V (x, t) − , 2 2 

190

4. OPTIMAL CONTROL AND VISCOSITY SOLUTIONS

3. Pontryagin maximum principle In this section we assume the control space U is bounded and that there exists an optimal control u∗ and corresponding optimal trajectory x∗ . We assume also that the terminal data ψ is differentiable. Let r ∈ [t, t1 ) be a point where u∗ is strongly approximately continuous, i.e., Z 1 r+δ ∗ ϕ(u (r)) = lim ϕ(u∗ (s))ds, δ→0 δ r for all continuous functions ϕ. Denote by Ξ0 the fundamental solution of ξ˙0 = Dx f (x∗ , u∗ )ξ0 ,

(111) with Ξ0 (r) = I. Let p∗ be given by (112) (113)

p∗ (r) =Dx ψ(xR (t1 ))Ξ0 (t1 )f (x∗ (r), u∗ (r)) Z t1 + Dx L(x∗ (s), u∗ (s), s)Ξ0 (s)f (x∗ (r), u∗ (r))ds. r

Lemma 94 (Pontryagin maximum principle). Suppose that ψ is differentiable. Then, for almost all r ∈ [t, t1 ), (114)

f (x∗ (r), u∗ (r)) · p∗ (r) + L(x∗ (r), u∗ (r), r) = min [f (x∗ , v) · p∗ (r) + L(x∗ (r), v, r)] . v∈U

Proof. Let v ∈ U . For almost all r ∈ [t0 , t1 ) u∗ is strongly approximately continuous (see [EG92]). Let r be one of these points. Define  v if r < s < r + δ uδ (s) = u∗ (s) otherwise,

3. PONTRYAGIN MAXIMUM PRINCIPLE

191

and  ∗   x (s) ifR t < s < r xδ (s) = x∗ (r) + rs f (x∗δ , v) if r < s < r + δ    ∗ x (s) + δξδ if r + δ < s < t1 , where 1 ξδ (r + δ) = δ

r+δ

Z

[f (x∗δ (s), v) − f (x∗ (s), u∗ (s))] ds,

r



and yδ = x (s) + δξδ solves, for r + δ < s < t1 , y˙ δ = f (yδ , u∗ ). Observe that ξ0 (r) = lim ξδ (r + δ) = f (x∗ (r), v) − f (x∗ (r), u∗ (r)). δ→0

Then, as δ → 0, ξδ converges to the solution ξ0 of (111). Thus ξ0 (s) = Ξ0 (s) (f (x∗ (r), v) − f (x∗ (r), u∗ (r))). Clearly ∗

Z

J(t, x; u ) ≤

t1

L(xδ (s), uδ (s), s)ds + ψ(x∗ (t1 ) + δξδ ).

t

This last inequality implies Z 1 r+δ [L(xδ (s), v, s) − L(x∗ (s), u∗ (s), s)] ds+ δ r Z 1 t1 + [L(x∗ (s) + δξδ , u∗ (s), s) − L(x∗ (s), u∗ (s), s)] ds+ δ r+δ 1 + [ψ(x∗ (t1 ) + δξδ ) − ψ(x∗ (t1 ))] ≥ 0. δ When δ → 0, the first term converges to L(x∗ (r), v, r) − L(x∗ (r), u∗ (r), r), since u∗ is strongly approximately continuous. The second term tends to Z t1 Dx L(x∗ (s), u∗ (s), s)ξ0 (s)ds, r

whereas the third one has the following limit: Dx ψ(xR (t1 )) · ξ0 (t1 )).

192

4. OPTIMAL CONTROL AND VISCOSITY SOLUTIONS

This implies that for almost all r r, L(x∗ (r), v, r) − L(x∗ (r), u∗ (r), r) + p∗ (r) · (f (x∗ (r), v) − f (x∗ (r), u∗ (r))) ≥ 0. consequently f (x∗ (r), u∗ (r)) · p∗ (r) + L(x∗ (r), u∗ (r), r) = min [f (x∗ (r), v) · p∗ (r) + L(xR (r), v, r)] , v∈U

as required.

 4. The Hamilton-Jacobi equation

Proposition 95. Suppose the value function is C 1 . Let r ∈ [t, t1 ) be a point where u∗ is strongly approximately continuous. Then p∗ (r) = Dx V (x, r). Proof. Let u∗ be an optimal control for the initial condition (x, r). For y ∈ Rn and δ > 0 consider the solution x˙ δ = f (xδ , u∗ ), with initial condition xδ (t) = x + δy. Then ∂xδ (s) = Ξ0 (s)y. ∂δ δ=0

Since for all δ Z V (x + δy, r) ≤

t1

L(xδ , u∗ )ds + ψ(xδ (t1 )),

r

by differentiating with respect to δ we obtain Z t1 Dx V (x, r)y = Dx L(x, u∗ )Ξ0 (s)yds + Dx ψ(x(t1 ))Ξ0 (t1 )y, r

which implies the result. Theorem 96. Suppose the value function V is C 1 . Then it solves (115)

− Vt + H(Dx V, x) = 0.



5. VERIFICATION THEOREM

193

Proof. Consider an optimal trajectory x∗ Z t1 ∗ L(x∗ (s), u∗ (s))ds. V (x (t), t) = t

Then, by differentiating with respect to t we have Vt (x∗ (t), t) + Dx V (x∗ (t), t)f (x∗ (t), u∗ (t)) + L(x∗ (t), u∗ (t)) = 0. Which by Pontryagin maximum principle is equivalent to the HamiltonJacobi equation (115).  Exercise 161. Let M (t), N (t) be n×n matrices with time-differentiable coefficients. Suppose that is N invertible. Let D be a n × n constant matrix. Consider the Lagrangian 1 1 L(x, v) = xT M (t)x + v T N (t)v 2 2 1 T and the terminal condition ψ = 2 x Dx. Show that there exists a solution to the Hamilton-Jacobi with terminal condition ψ at t = T of the form 1 V = xT P (t)x, 2 where P (t) satisfies the Ricatti equation P˙ = P T N −1 P − M and P (T ) = D.

5. Verification theorem Theorem 97. Let L(x, v) be a C 1 Lagrangian, strictly convex in v, and let f (x, u) a control law satisfying (108), and H the generalized Legendre transform (109) of L. Let Φ(x, t) a classical solution to the Hamilton-Jacobi equation (116)

− Φt + H(Dx Φ, x) = 0

on the time interval [0, T ], with terminal data Φ(x, T ) = ϕ(x). Then, for all 0 ≤ t ≤ T , Φ(x, t) = V (x, t), where V is the value function.

194

4. OPTIMAL CONTROL AND VISCOSITY SOLUTIONS

Proof. Let x be any trajectory satisfying x˙ = f (x, u). Then Z

T

ϕ(x(T )) − Φ(x(t), t) = t

Z

d Φ(x(s), s)ds ds

T

Dx Φ(x(s), s) · f (x, u) + Φs (x(s), s)ds.

= t

RT Adding t L(x(s), u(s))ds+Φ(x(t), t) to the above equality and taking the infimum over all controls u, we obtain  Z T  L(x(s), u(s))ds + ϕ x(T ) inf t T

Z

 Φs (x(s), s) + L(x(s), u(s)) + Dx Φ(x(s), s) · f (x, u)ds .

= Φ(x(t), t) + inf t

Now recall that for any v, −H(p, x) ≤ L(x, v) + p · f (x, v), therefore  Z T  ˙ L(x(s), x(s))ds + ϕ x(T ) inf t T

Z

 Φs (x(s), s) + H(Dx Φ(x(s), s), x(s)) ds

≥ Φ(x(t), t) + inf

 = Φ(x(t), t).

t

Let r(x, t) be uniquely defined as (117)

r(x, t) ∈ argminv∈U L(x, v) + Dx Φ(x, t) · f (x, v).

A simple argument shows that r is a continuous function. Now consider the trajectory x given by solving the following differential equation ˙ x(s) = f (x, r(x(s), s)),

6. EXISTENCE OF OPTIMAL CONTROLS - BOUNDED CONTROL SPACE195

with initial condition x(t) = x. Note that since the right-hand side is continuous there is a solution, although it may not be unique. Then Z T   ˙ inf L(x(s), x(s))ds + ϕ x(T ) t

≤ Φ(x(t), t) +

Z T

  Φs x(s), s − H Dx Φ(x(s), s), x(s) ds

t

= Φ(x(t), t), which ends the proof.



We should observe from the proof that (117) gives an optimal feedback law for the optimal control, provided we can find a solution to the Hamilton-Jacobi equation.

6. Existence of optimal controls - bounded control space We now give a proof of the existence of optimal controls for bounded control space. The unbounded case will be addressed in §8. Lemma 98. Suppose that f is as in (108). Then J is weakly lower semicontinuous, with respect to weak-* convergence in L∞ . ∗

Proof. Let un be a sequence of controls such that un *u in L∞ [t, t1 ]. Then, by using Ascoli-Arzela theorem, we can extract a subsequence such that xn (·) converges uniformly to x(·). Furthermore because the control law (108) is linear we have x˙ = f (x, u). We have Z

t1

[L(xn (s), un (s), s) − L(x(s), un (s), s)] ds+

J(x, t; un ) = t

Z +

t1

L(x(s), un (s), s)ds + ψ(xn (t1 )). t

196

4. OPTIMAL CONTROL AND VISCOSITY SOLUTIONS

Rt The first term, t 1 [L(xn (s), un (s), s) − L(x(s), un (s), s)] ds, converges to zero. Similarly, ψ(xn (t1 )) → ψ(x(t1 )). Finally, the convexity of L implies L(x(s), un (s), s) ≥ L(x(s), u(s), s) + Dv L(x(s), u(s), s)(un (s) − u(s)). Since un * u Z

t1

Dv L(x(s), u(s), s)(un (s) − u(s))ds → 0. t

Hence lim inf J(x, t; un ) ≥ J(x, t; u), and so J is weakly lower semicontinuous.



Using the previous result we can now state and prove our first existence result. Lemma 99. Suppose the control set U is bounded. There exists a minimizer u∗ of J in U . Proof. Let un be a minimizing sequence, that is, such that J(x, t; un ) → inf J(x, t; u). u∈UR

Because this sequence is bounded in L∞ , by Banach-Alaoglu theorem ∗ we can extract a sequence un *u∗ . Clearly, we have u∗ ∈ U. We claim now that J(x, t; u∗ ) = inf J(x, t; u). u∈U

This just follows from the weak lower semicontinuity: inf J(x, t; u) ≤ J(x, t; u∗ ) ≤ lim inf J(x, t; un ) = inf J(x, t; u),

u∈U

which ends the proof.

u∈U



Example 42 (Bang-Bang principle). Consider the case of a bounded closed convex control space U and suppose the Lagrangian vanishes. Suppose f (x, v) = v and that the terminal value ψ is convex.

7. SUB AND SUPERDIFFERENTIALS

197

In this setting we first observe that the set of all optimal controls is convex. As such it admits an extreme point u∗ . We claim that u∗ takes values on ∂U . To see this choose a time r and suppose that for some  there is a set of positive measure in [r, r + ] for which u∗ is in the interior of U . Then there exists an L∞ function ν supported on this set such that R r+ dν = 0, and such that u∗ ± ν is an admissible control. By our r assumptions it is also an optimal control. It is clear then that u∗ is not an extreme point which is a contradiction. J Exercise 162. Show that the Bang-Bang principle also holds if the Lagrangian is independent on the state variable x, that is L ≡ L(v). 7. Sub and superdifferentials Let ψ : Rn → R be a continuous function. The superdifferential Dx+ ψ(x) of ψ at x is the set of vectors p ∈ Rn such that lim sup |v|→0

ψ(x + v) − ψ(x) − p · v ≤ 0. |v|

Consequently, p ∈ Dx+ ψ(x) if and only if ψ(x + v) ≤ ψ(x) + p · v + o(|v|), as |v| → 0. Similarly, the subdifferential, Dx− ψ(x), of ψ at x is the set of vectors p such that lim inf |v|→0

ψ(x + v) − ψ(x) − p · v ≥ 0. |v|

Exercise 163. Show that if u : Rn → R has a maximum (resp. minimum) at x0 then 0 ∈ D+ u(x0 ) (resp. D− u(x0 )). We can regard these sets as one-sided derivatives. In fact, ψ is differentiable then Dx− ψ(x) = Dx+ ψ(x) = {Dx ψ(x)}. More precisely,

198

4. OPTIMAL CONTROL AND VISCOSITY SOLUTIONS

Proposition 100. If Dx− ψ(x), Dx+ ψ(x) 6= ∅ then Dx− ψ(x) = Dx+ ψ(x) = {p} and ψ is differentiable at x with Dx ψ = p. Conversely, if ψ is differentiable at x then Dx− ψ(x) = Dx+ ψ(x) = {Dx ψ(x)}. Proof. Suppose that Dx− ψ(x) and Dx+ ψ(x) are both non-empty. Then we claim that these two sets agree and have a single point p. To see this, take p− ∈ Dx− ψ(x) and p+ ∈ Dx+ ψ(x). Then ψ(x + v) − ψ(x) − p− · v ≥0 |v|→0 |v| ψ(x + v) − ψ(x) − p+ · v lim sup ≤ 0. |v| |v|→0

lim inf

By subtracting these two identities lim inf |v|→0

(p+ − p− ) · v ≥ 0. |v| +



In particular, by choosing v = − |pp−−p , we obtain −p+ | −|p− − p+ | ≥ 0, which implies p− = p+ ≡ p. Additionally p satisfies ψ(x + v) − ψ(x) − p · v = 0, |v|→0 |v| lim

and, therefore, Dx ψ = p. To prove the converse it suffices to observe that if ψ is differentiable then ψ(x + v) = ψ(x) + Dx ψ(x) · v + o(|v|).  Exercise 164. Let ψ be a continuous function. Show that if x0 is a local maximum of ψ then 0 ∈ D+ ψ(x0 ).

7. SUB AND SUPERDIFFERENTIALS

199

Proposition 101. Let ψ : Rn → R be a continuous function. Then, if p ∈ Dx+ ψ(x0 )

(resp. p ∈ Dx− ψ(x0 )),

there exists a C 1 function φ such that ψ(x) − φ(x) has a local strict maximum (resp. minimum) at x0 and p = Dx φ(x0 ). On the other hand, if φ is a C 1 function such that ψ(x) − φ(x) has a local maximum (resp. minimum) at x0 then p = Dx φ(x0 ) ∈ Dx+ ψ(x0 )

(resp. Dx− ψ(x0 )).

Proof. By subtracting p · (x − x0 ) + ψ(x0 ) to ψ, we can assume, without loss of generality, that ψ(x0 ) = 0 and p = 0. By changing coordinates, if necessary, we can also assume that x0 = 0. Because 0 ∈ Dx+ ψ(0) we have ψ(x) lim sup ≤ 0. |x| |x|→0 Therefore there exists a continuous function ρ(x), with ρ(0) = 0, such that ψ(x) ≤ |x|ρ(x). Let η(r) = max|x|≤r {ρ(x)}. η is continuous, non decreasing and η(0) = 0. Let Z 2|x| φ(x) = η(r)dr + |x|2 . |x| 1

The function φ is C and satisfies φ(0) = Dx φ(0) = 0. Additionally, if x 6= 0, Z 2|x| ψ(x) − φ(x) ≤ |x|ρ(x) − η(r)dr − |x|2 < 0. |x|

Thus ψ − φ has a strict local maximum at 0.

200

4. OPTIMAL CONTROL AND VISCOSITY SOLUTIONS

To prove the second part of the proposition, suppose that the difference ψ(x) − φ(x) has a strict local maximum at 0. Without loss of generality, we can assume ψ(0) − φ(0) = 0 and φ(0) = 0. Then ψ(x) − φ(x) ≤ 0 or, equivalently, ψ(x) ≤ p · x + (φ(x) − p · x). Thus, by setting p = Dx φ(0), and using the fact that φ(x) − p · x = 0, |x|→0 |x| lim

we conclude that Dx φ(0) ∈ Dx+ ψ(0). The case of a minimum is similar.  A continuous function f is semiconcave if there exists C such that f (x + y) + f (x − y) − 2f (x) ≤ C|y|2 . Similarly, a function f is semiconvex if there exists a constant such that f (x + y) + f (x − y) − 2f (x) ≥ −C|y|2 . Proposition 102. The following statements are equivalent: 1. 2. 3.

f is semiconcave; f˜(x) = f (x) − C2 |x|2 is concave; for all λ, 0 ≤ λ ≤ 1, and any y, z such that λy + (1 − λ)z = 0 we have C λf (x + y) + (1 − λ)f (x + z) − f (x) ≤ (λ|y|2 + (1 − λ)|z|2 ). 2 Additionally, if f is semiconcave, then a. Dx+ f (x) 6= ∅; b. if Dx− f (x) 6= ∅ then f is differentiable at x; c. there exists C such that, for each pi ∈ Dx+ f (xi ) (i = 0, 1), (x0 − x1 ) · (p0 − p1 ) ≤ C|x0 − x1 |2 . Remark. Of course analogous results hold for semiconvex functions.

7. SUB AND SUPERDIFFERENTIALS

201

Proof. Clearly 2 =⇒ 3 =⇒ 1. Therefore, to prove the equivalence, it is enough to show that 1 =⇒ 2. Subtracting C|x|2 to f , we may assume C = 0. Also, by changing coordinates if necessary, it suffices to prove that for all x, y such that λx + (1 − λ)y = 0, for some λ ∈ [0, 1], we have: λf (x) + (1 − λ)f (y) − f (0) ≤ 0.

(118)

We claim now that the previous equation holds for each λ = 2kj , with 0 ≤ k ≤ 2j . Clearly (118) holds for j = 1. We will proceed by induction on j. Suppose that (118) if valid for λ = 2kj . We will show k . If k is even, we can reduce the fraction that it also holds for λ = 2j+1 k and, therefore, we assume that k is odd, λ = 2j+1 and λx+(1−λ)y = 0. Note that        1 k−1 k−1 k+1 1 k+1 0= x + 1 − j+1 y + x + 1 − j+1 y . 2 2j+1 2 2 2j+1 2 consequently, 1 f (0) ≥ f 2



   k−1 k−1 x + 1 − j+1 y + 2j+1 2     1 k+1 k+1 + f x + 1 − j+1 y 2 2j+1 2

but, since k − 1 and k + 1 are even, k˜0 = k−1 2 Therefore ! ! k˜0 k˜0 1 1 x+ 1− j y + f f (0) ≥ f j 2 2 2 2

and k˜1 = k˜1 x+ 2j

k+1 2

are integers.

k˜1 1− j 2

! ! y

But this implies k˜0 + k˜1 f (0) ≥ j+1 f (x) + 2

k˜0 + k˜1 1 − j+1 2

! f (y).

From k˜0 + k˜1 = k we obtain   k f (0) ≥ j+1 f (x) + 1 − j+1 f (y). 2 2 k

Since the function f is continuous and the rationals of the form dense in R, we conclude that f (0) ≥ λf (x) + (1 − λ)f (y),

k 2j

are

202

4. OPTIMAL CONTROL AND VISCOSITY SOLUTIONS

for each real λ, with 0 ≤ λ ≤ 1. To prove the second part of the proposition, observe that by proposition 100, a =⇒ b. To check a, i.e., that Dx+ f (x) 6= ∅, it is enough to observe that if f is concave then Dx+ f (x) 6= ∅. By subtracting C|x|2 to f , we can reduce the problem to concave functions. Finally, if pi ∈ Dx+ f (xi ) (i = 0, 1) then f (x0 ) −

C C |x0 |2 ≤ f (x1 ) − |x1 |2 + (p1 − Cx1 ) · (x0 − x1 ), 2 2

f (x1 ) −

C C |x1 |2 ≤ f (x0 ) − |x0 |2 + (p0 − Cx0 ) · (x1 − x0 ). 2 2

and

Therefore, 0 ≤ (p1 − p0 ) · (x0 − x1 ) + C|x0 − x1 |2 , and so (p0 − p1 ) · (x0 − x1 ) ≤ C|x0 − x1 |2 .



Exercise 165. Let f : Rn → R be a continuous function. Show that if x0 is a local maximum then 0 ∈ D+ f (x0 ).

8. Optimal control in the calculus of variations setting We now consider the calculus of variations setting and prove the existence of optimal controls. The main technical issue is the fact that the control space U = Rn is unbounded and therefore compactness arguments do not work directly. Fortunately, the coercivity of the Lagrangian is enough to establish the existence of a-priori bounds on optimal controls. Theorem 103. Let x ∈ Rn and t0 ≤ t ≤ t1 . Suppose that the Lagrangian L(x, v) satisfies: 2 A. L is C ∞ , strictly convex in v (i.e., Dvv L is positive definite), and satisfying the coercivity condition

L(x, v) = ∞, |v|→∞ |v| lim

8. OPTIMAL CONTROL IN THE CALCULUS OF VARIATIONS SETTING 203

uniformly in (x, t); B. L is bounded by bellow (without loss of generality we assume L(x, v) ≥ 0); C. L satisfies the inequalities L(x, 0) ≤ c1 ,

|Dx L| ≤ c2 L + c3

for suitable c1 , c2 , and c3 ; D. there exist functions C0 (R), C1 (R) : R+ → R+ such that |Dv L| ≤ C0 (R),

2 L| ≤ C1 (R) |Dxx

whenever |v| ≤ R. Then, if ψ is a bounded Lipschitz function, 1. There exists u∗ ∈ L∞ [t, t1 ] such that its corresponding optimal trajectory x∗ , given by x˙ ∗ (s) = u(s)

x∗ (t) = x,

satisfies Z V (x, t) =

t1

L(x∗ (s), x˙ ∗ (s))ds + ψ(x∗ (t1 )).

t

2. There exists C, depending only on L, ψ and t1 − t but not on x or t such that |u(s)| < C for t ≤ s ≤ t1 . The optimal trajectory x∗ (·) is a C 2 [t, t1 ] solution of the Euler-Lagrange equation (119)

d Dv L − Dx L = 0 dt

with initial condition x∗ (t) = x. 3. The adjoint variable p, defined by (120)

p(t) = −Dv L(x∗ , x˙ ∗ ), satisfies the differential equation  p(s) ˙ = Dx H(p(s), x∗ (s)) x˙ ∗ (s) = −Dp H(p(s), x∗ (s))

204

4. OPTIMAL CONTROL AND VISCOSITY SOLUTIONS

with terminal condition p(t1 ) ∈ Dx− ψ(x∗ (t1 )). Additionally, (p(s), H(p(s), x∗ (s))) ∈ D− V (x∗ (s), s) for t < s ≤ t1 . 4. The value function V is Lipschitz, and so almost everywhere differentiable. 2 5. If Dvv L is uniformly bounded, then for each t < t1 , V (x, t) is semiconcave in x. 6. For t ≤ s < t1 (p(s), H(p(s), x∗ (s))) ∈ D+ V (x∗ (s), s) and, therefore, DV (x∗ (s), s) exists for t < s < t1 . Proof. We will divide the proof into several auxiliary lemmas. For R > 0, define UR = {u ∈ U : kuk∞ ≤ R}. From lemma 99 there exists a minimizer uR of J in UR . Then we will show that the minimizer uR satisfies uniform estimates in R. Finally, we will let R → ∞. Let pR be the adjoint variable given by the Pontryagin maximum principle. We now will try to estimate the optimal control uR uniformly in R, in order to send R → ∞. Lemma 104. Suppose ψ is differentiable. Then there exists a constant C, independent on R, such that |pR | ≤ C. Proof. Since ψ is Lipschitz and differentiable we have |Dx ψ| ≤ kDx ψk∞ < ∞. Therefore Z |pR (s)| ≤

t1

|Dx L(xR (r), uR (r)|dr + kDx ψk∞ . s

8. OPTIMAL CONTROL IN THE CALCULUS OF VARIATIONS SETTING 205

Let VR be the value function for the terminal value problem with the additional constraint of bounded control: |v| ≤ R. From |Dx L| ≤ c2 L + c3 , it follows |pR (s)| ≤ C(VR (t, x) + 1), for an appropriate constant C. Proposition 90, shows that there exists a constant C, which does not depend on R, such that VR ≤ C. Therefore |pR | ≤ C.  As we will see, the uniform estimates for pR yield uniform estimates for uR . Lemma 105. Let ψ be differentiable. Then there exists R1 > 0 such that, for all R, kuR k∞ ≤ R1 . Proof. Suppose |p| ≤ C. Then, for each c1 , the coercivity condition on L implies that there exists R1 such that, if v · p + L(x, v) ≤ c1 then |v| ≤ R1 . But then, uR (s) · pR (s) + L(xR (s), uR (s)) ≤ L(xR (s), 0) ≤ c1 , that is, kuR k∞ ≤ R1 .



Since uR is bounded independently of R, we have V = J(x, t; uR0 ), for R0 > R1 . Let u∗ = uR0 and p = pR0 . Lemma 106 (Pontryagin maximum principle - II). If ψ is differentiable, optimal control u∗ satisfies u∗ · p + L(x∗ , u∗ ) = min [v · p + L(x∗ , v)] = −H(p, x∗ ), v

206

4. OPTIMAL CONTROL AND VISCOSITY SOLUTIONS

for almost all s and, therefore, p = −Dv L(x∗ , u∗ )

u∗ = −Dp H(p, x∗ ),

and

where H = L∗ . Additionally, p satisfies the terminal condition p(t1 ) = Dx ψ(x∗ (t1 )). Proof. Clearly it is enough to choose R sufficiently large.



Lemma 107. Let ψ be differentiable. The minimizing trajectory x(·) is C 2 and satisfies the Euler-Lagrange equation (119). Furthermore, p˙ = Dx H(p, x∗ )

x˙ = −Dp H(p, x∗ ).

Proof. By its definition p is continuous. We know that x˙ ∗ (s) = −Dp H(p(s), x∗ (s)), almost everywhere. Since the right hand side of the previous identity is continuous, the identity holds everywhere and, therefore, we conclude that x∗ is C 1 . Because p is given by the integral of a continuous function (112), Z t1 ∗ p(r) = Dx ψ(x (t1 )) + Dx L(x∗ (s), u∗ (s))ds, r

we conclude that p is C 1 . Additionally, x˙ ∗ = −Dp H(p, x∗ ) and, therefore, x˙ ∗ is C 1 , which implies that x is C 2 . We have also p = −Dv L(x∗ , x˙ ∗ )

p˙ = −Dx L(x∗ , x˙ ∗ ),

from which it follows d (121) Dv L(x∗ , x˙ ∗ ) − Dx L(x∗ , x˙ ∗ ) = 0. dt Thus, since Dx L(x∗ , x˙ ∗ ) = −Dx H(p, x∗ ), we conclude that p˙ = Dx H(p, x∗ ) as required.

x˙ ∗ = −Dp H(p, x∗ ), 

8. OPTIMAL CONTROL IN THE CALCULUS OF VARIATIONS SETTING 207

In the case in which ψ is only Lipschitz and not C 1 , we can consider a sequence of C 1 functions, ψn → ψ uniformly, such that kDx ψn k∞ ≤ kDψkL∞ . for each ψn . Let Z

t1

L(xn (s), x˙ n (s))ds + ψn (xn (t1 )),

Jn (x, t; u) = t

and x∗n , u∗n , respectively, the corresponding optimal trajectory and optimal control. Similarly, let pn be the corresponding adjoint variable. Passing to a subsequence, if necessary, the boundary values xn (t1 ) and pn (t1 ) converge, respectively, for some x0 and p0 . The optimal trajectories x∗n and corresponding optimal controls u∗n converge uniformly, by using Ascoli-Arzela theorem, to optimal trajectories and controls of the limit problem. Let p(s) = lim pn (s). n→∞

Then, for almost every s, u∗ · p(s) + L(x∗ (s), u∗ (s)) = inf [v · p(s) + L(x∗ (s), v)] , v

which implies p(s) = −Dv L(x∗ (s), x˙ ∗ (s)), for almost all s. But, in the previous equation both terms are continuous functions thus the identity holds for all s. Lemma 108. For t < s ≤ t1 we have (p(s), H(p(s), x∗ (s))) ∈ D− V (x∗ (s), s). Proof. Let x∗ be the optimal trajectory and u∗ the corresponding optimal control. For r ≤ t1 and y ∈ Rn , define xr = x∗ (r) and consider the sub-optimal control u] = u∗ +

y − xr , r−t

whose trajectory we denote by x] , x] (t) = x. Note that x] (r) = y.

208

4. OPTIMAL CONTROL AND VISCOSITY SOLUTIONS

We have Z

s

L(x∗ (τ ), u∗ (τ ))dτ + V (x∗ (s), s)

V (x, t) = t

and, by the sub-optimality of x] , Z r ∗ L(x] (τ ), u] (τ ))dτ + V (y, r). V (x (t), t) ≤ t

This implies V (x∗ (s), s) − V (y, r) ≤ φ(y, r), with Z

r ]

]

Z

L(x (τ ), u (τ ))dτ −

φ(y, r) = t

s

L(x∗ (τ ), u∗ (τ ))dτ.

t

Since φ is differentiable at y and r, (−Dy φ(x∗ (s), s), −Dr φ(x∗ (s), s)) ∈ D− V (x∗ (s), s). Observe that x] (τ ) = x∗ (τ ) +

y − xr (τ − t), r−t

and, therefore,  Z s 1 τ −t + Dv L Dy φ(x (s), s) = Dx L dτ. s−t s−t t ∗

Integrating by parts and using (121), we obtain Dy φ(x∗ (s), s) = Dv L(x∗ (s), x˙ ∗ (s)) = −p(s). Similarly, Z s y − xr Dr φ(y, r) = L(y, u (r)) + −Dx L (τ − t) (r − t)2 t  −u∗ (r) y − xr −u∗ (r) +Dx L (τ − t) − Dv L + Dv L dτ. (r − t) (r − t)2 r−t ]

Integrating by parts and evaluating at y = x∗ (s), r = s, we obtain Dr φ(x∗ (s), s) = L(x∗ (s), x˙ ∗ (s)) − u∗ (s)Dv L(x∗ (s), x˙ ∗ (s)) = −H(p(s), x∗ (s)), as we needed. Lemma 109. The value function V is Lipschitz.



8. OPTIMAL CONTROL IN THE CALCULUS OF VARIATIONS SETTING 209

Proof. Let t < t1 be fixed and x, y arbitrary. We suppose first that t1 − t < 1. Then V (y, t) − V (x, t) ≤ J(y, t; u∗ ) − V (x, t), where V (x, t) = J(x, t; u∗ ). Therefore, there exists a constant C, depending only on the Lipschitz constant of ψ and of the supremum of |Dx L|, such that V (y, t) − V (x, t) ≤ C|x − y|. Suppose that t1 − t > 1. Let  u ˜ (s) = u∗ + (x − y) if t < s < t + 1 u ˜ (s) = u∗ (s) if t + 1 ≤ s ≤ t1 . Then ˜ ) − V (x, t) ≤ C|x − y|, V (y, t) − V (x, t) ≤ J(y, t; u where the constant C depends only on Dx L and on Dv L, and not on the Lipschitz constant of ψ. Reverting the roles of x and y we conclude |V (y, t) − V (x, t)| ≤ C|x − y|. Without loss of generality we can suppose that t < tˆ. Note that |V (x, t) − V (x∗ (tˆ), tˆ)| ≤ C|t − tˆ|. To prove that V is Lipschitz in t it is enough to check that (122)

|V (x∗ (tˆ), tˆ) − V (x, tˆ)| ≤ C|t − tˆ|.

But since x˙ ∗ is uniformly bounded |x∗ (tˆ) − x| ≤ C|t − tˆ| thus, the previous Lipschitz estimate implies (122).



Lemma 110. V is differentiable almost everywhere. Proof. Since V is Lipschitz, the almost everywhere differentiability follows from Rademacher theorem. 

210

4. OPTIMAL CONTROL AND VISCOSITY SOLUTIONS

In general, the value function is Lipschitz and not C 1 or C 2 . However we can prove an one-side estimate for second derivatives, i.e. that V is semiconcave. 2 2 L| ≤ C(R) whenever |v| ≤ R. L|, |Dvv Lemma 111. Suppose that |Dxv Then, for each t < t1 , V (x, t) is semiconcave in x.

Proof. Fix t and x. Choose y ∈ Rn arbitrary. We claim that V (x + y, t) + V (x − y, t) ≤ 2V (x, t) + C|y|2 , for some constant C. Clearly, V (x + y, t) + V (x − y, t) − 2V (x, t) Z t1 ˙ + L(x∗ − y, x˙ ∗ − y) ˙ − 2L(x∗ , x˙ ∗ )] ds, ≤ [L(x∗ + y, x˙ ∗ + y) t

where y(s) = y

t1 − s . t1 − t

2 L| ≤ C1 (R), Since |Dxx

˙ ≤ L(x∗ , x˙ ∗ + y) ˙ + Dx L(x∗ , x˙ ∗ + y)y ˙ + C|y|2 L(x∗ + y, x˙ ∗ + y) and, in a similar way for the other term. We also have ˙ + L(x∗ , x˙ ∗ − y) ˙ ≤ 2L(x∗ , x˙ ∗ ) + C|y| ˙ 2 + C|y||y|. ˙ L(x∗ , x˙ ∗ + y) Thus ˙ + L(x∗ − y, x˙ ∗ − y) ˙ ≤ 2L(x∗ , x˙ ∗ ) + C|y|2 + C|y| ˙ 2. L(x∗ + y, x˙ ∗ + y) This inequality implies the lemma.



Lemma 112. We have (p(s), H(p(s), x∗ (s))) ∈ D+ V (x∗ (s), s) for t ≤ s < t1 . Therefore DV (x∗ (s), s) exists for t < s < t1 . Proof. Let u∗ be an optimal control at (x, s) and let p be the corresponding adjoint variable. Define W by   x∗ (r) − y ∗ W (y, r) = J y, r; u + − V (x, s). t1 − r

8. OPTIMAL CONTROL IN THE CALCULUS OF VARIATIONS SETTING 211

Hence, for each y ∈ Rn and t ≤ r < t1 , V (y, r) − V (x, s) ≤ W (y, r), with equality at (y, r) = (x, s). Since W is C 1 , it is enough to check that Dy W (x∗ (s), s) = p(s), and Dr W (x∗ (s), s) = H(p(s), x∗ (s)). The first identity follows from Z



t1

Dy W (s, x (s)) =

Dx Lϕ + Dv L s

where ϕ(τ ) =

t1 −τ . t1 −s

dϕ dτ, dτ

Using the Euler-Lagrange equation

d Dv L − Dx L = 0 dt and integration by parts we obtain Dy W (s, x∗ (s)) = −Dv L(x∗ (s), x˙ ∗ (s)) = p(s). On the other hand, ∗





Z

Dr W (s, x (s)) = −L(x (s), x˙ (s)) +

t1

Dx Lφ + Dv L s

dφ dτ, dτ

where

τ − t1 ∗ x˙ (s). t1 − s Using again the Euler-Lagrange equation and integration by parts, we obtain φ(τ ) =

Dr W (s, x∗ (s)) = −L(x∗ (s), x˙ ∗ (s), s) + Dv L(x∗ (s), x˙ ∗ (s))x˙ ∗ (s), or equivalently Dr W (s, x∗ (s)) = H(p(s), x∗ (s)). The last part of the lemma follows from proposition 100. This ends the proof of the theorem.

 

212

4. OPTIMAL CONTROL AND VISCOSITY SOLUTIONS

In what follows we prove that the value function is differentiable at points of uniqueness of optimal trajectory. A point (x, t) is regular if there exists a unique optimal trajectory x (s) such that x∗ (t) = x and Z t1 L(x∗ (s), x˙ ∗ (s))ds + ψ(x∗ (t1 )). V (x, t) = ∗

t

Theorem 113. V is differentiable with respect to x at (x, t) if and only if (x, t) is a regular point. Proof. The next lemma shows that differentiability at a point x implies that x is a regular point: Lemma 114. If V is differentiable with respect to x at a point (x, t), then there exists a unique optimal trajectory Proof. Since V is differentiable with respect to x at (x, t), then any optimal trajectory satisfies x˙ ∗ (t) = −Dp H(p(t), x∗ (t)), since p(t) = Dx V (x). Therefore, once Dx V (x∗ (t), t) is given, the velocity x˙ ∗ (t) is uniquely determined. The solution of the Euler-Lagrange equation (119) is determined by the initial condition and velocity: x∗ (t) and x˙ ∗ (t). Thus, the optimal trajectory is unique.  To prove the other implication we need an auxiliary lemma: Lemma 115. Let p such that kDx V (·, t) − pkL∞ (B(x,2)) → 0 when  → 0. Then V is differentiable with respect to x at (x, t) and Dx V (x, t) = p. Proof. Since V is Lipschitz, it is differentiable almost everywhere. By Fubin theorem, for almost every point with respect to the Lebesgue

8. OPTIMAL CONTROL IN THE CALCULUS OF VARIATIONS SETTING 213

measure induced in S n−1 , V is differentiable y = x + λk, with respect to the Lebesgue measure in R. For these directions V (y, t) − V (x, t) − p · (y − x) |x − y| Z 1 (Dx V (x + s(y − x), t) − p) · (y − x) ds. = |x − y| 0 Suppose 0 < |x − y| < . Then V (x, t) − V (y, t) − p · (x − y) ≤ kDx V (·, t) − pkL∞ (B(x,)) . |x − y| In principle, the last identity only holds almost everywhere. However, for y 6= x, the left-hand side is continuous in y. consequently, the inequality holds for all y 6= x. Therefore, when y → x, V (x, t) − V (y, t) − p · (x − y) → 0, |x − y| which implies Dx V (x, t) = p.



Suppose that V is not differentiable at (x, t). We claim that (x, t) is not regular. By contradiction, suppose that (x, t) is regular. Then if V fails to be differentiable, the previous lemma implies that for each p, kDx V (·, t) − pkL∞ (B(x,)) 9 0. Thus, we could choose two sequences x1n and x2n such that xin → x but whose corresponding optimal trajectories xin satisfy lim x˙ 1n (t) 6= lim x˙ 2n (t). However, this shows that (x, t) is not regular. Indeed if (x, t) were regular, and xn were any sequence converging to x, and x∗n (·) the corresponding optimal trajectory then x˙ ∗n (t) → x˙ ∗ (t). If this were not true, by Ascoli-Arzela theorem, we could extract a ˙ convergent subsequence x˙ nk (·) → y(·), and for which x˙ ∗nk (t) → v 6= x˙ ∗ (t).

214

4. OPTIMAL CONTROL AND VISCOSITY SOLUTIONS

Let y(·) be the solution to the Euler-Lagrange equation with initial ˙ condition y(t) = x(t) and y(t) = v. Note that x∗n (·) → y(·) and ˙ uniformly in compact sets, and, therefore, x˙ ∗n (·) → y(·), V (x, t) = lim V (xn , t) = lim J(xn , t; x˙ n ) n→∞

n→∞

˙ > J(x, t; x˙ ∗ ) = V (x, t), = J(x, t; y) since the trajectory y cannot be optimal, by regularity, which is a contradiction.  Remark. This theorem implies that all points of the form (x∗ (s), s), in which x∗ is and optimal trajectory are regular for t < s < t1 . Exercise 166. Show that the optimal control ”bounded control space” setting, the value function is Lipschitz continuous if the terminal cost is Lipschitz continuous. Exercise 167. In the optimal control ”bounded control space” setting, show that if ψ is Lipschitz, for any (x, t) there exists p such that (p(s), H(p(s), x∗ (s))) ∈ D− V (x∗ (s), s) for t < s ≤ t1 and (p(s), H(p(s), x∗ (s))) ∈ D+ V (x∗ (s), s) for t ≤ s < t1 . 9. Viscosity solutions In this section we discuss the viscosity solutions in the calculus of variations setting. Since, however with small modifications our results hold for the bounded control setting, we have added exercises in which the reader is required to prove the analogous results. Theorem 116. Consider the calculus of variations setting for the optimal control problem. Suppose that the value function V is differentiable at (x, t). Then, at this point, V satisfies the Hamilton-Jacobi equation (123)

− Vt + H(Dx V, x, t) = 0.

9. VISCOSITY SOLUTIONS

215

Proof. If V is differentiable at (x, t) then the result follows by using statement 6 in theorem 103.  Exercise 168. Show that (123) also holds in the ”bounded control case” setting. Hint: use exercises 166 and 167. Corollary 117. Consider the calculus of variations setting for the optimal control problem. The value function V satisfies the HamiltonJacobi equation almost everywhere. Proof. Since the value function V is differentiable almost everywhere, by theorem 103, theorem 116 implies this result.  Exercise 169. Show that the previous corollary also holds in the ”bounded control case” setting. However, it is not true that a Lipschitz function satisfying the Hamilton-Jacobi equation almost everywhere is the value function of the terminal value problem, as shown in the next example. Example 43. Consider the Hamilton-Jacobi equation −Vt + |Dx V |2 = 0 with terminal data V (x, 1) = 0. The value function is V ≡ 0, which is a (smooth) solution of the Hamilton-Jacobi equation However, there are other solutions, for instance,  0 if |x| ≥ 1 − t V (x, t) = |x| − 1 + t if |x| < 1 − t which satisfy the same terminal condition t = 1 and is solution almost everywhere. J A bounded uniformly continuous function V is a viscosity subsolution (resp. supersolution) of the Hamilton-Jacobi equation (123) if for any C 1 function φ and any interior point (x, t) ∈ argmax V − φ (resp. argmin) then −Dt φ + H(Dx φ, x, t) ≤ 0

216

4. OPTIMAL CONTROL AND VISCOSITY SOLUTIONS

(resp. ≥ 0) at (x, t). A bounded uniformly continuous function V is a viscosity solution of the Hamilton-Jacobi equation if it is both a sub and supersolution. The value function is a viscosity solution of (123), although it may not be a classical solution. The motivation behind the definition of viscosity solution is the following: if V is differentiable and (x, t) ∈ argmaxV − φ (or argmin) then Dx V = Dx φ and Dt V = Dt φ, therefore we should have both inequalities. The specific choice of inequalities is related with the following parabolic approximation of the HamiltonJacobi equation − Dt u + H(Dx u , x, t) = ∆u .

(124)

This equation arises naturally in optimal stochastic control (see [FS93]). The limit  → 0 corresponds to the case in which the diffusion coefficient vanishes. Proposition 118. Let u be a family of solutions of (124) such that, as  → 0, the sequence u → u uniformly. Then u is a viscosity solution of (123). Proof. Suppose that φ(x, t) is a C 2 function such that u − φ has a strict local maximum at (x, t). We must show that −Dt φ + H(Dx φ, x, t) ≤ 0. By hypothesis, u → u uniformly. Therefore we can find sequences (x , t ) → (x, t) such that u − φ has a local maximum at (x , t ). Therefore, Du (x , t ) = Dφ(x , t ) and ∆u (x , t ) ≤ ∆φ(x , t ). Consequently, −Dt φ(x , t ) + H(Dx φ(x , t ), x , t ) ≤ ∆φ(x , t ). It is therefore enough to take  → 0 to end the proof.



9. VISCOSITY SOLUTIONS

217

An useful characterization of viscosity solutions is given in the next proposition: Proposition 119. Let V be a bounded uniformly continuous function. Then V is a viscosity subsolution of (123) if and only if for each (p, q) ∈ D+ V (x, t), −q + H(p, x, t) ≤ 0. Similarly, V is a viscosity supersolution if and only if for each (p, q) ∈ D− V (x, t), −q + H(p, x, t) ≥ 0. Proof. This result is an immediate corollary of proposition 101.  Example 44. In example 43 we have found two different solutions to equation −Vt + |Dx V |2 = 0 satisfying the same boundary data. It is easy to check that the value function V = 0 is viscosity solution (it is smooth, satisfies the equation and the terminal condition). The other solution, which is an almost everywhere solution is not a viscosity solution (check!). Now we will show that the definition of viscosity solution is consistent with classical solutions. Proposition 120. A differentiable solution of (123) is a classical solution. Proof. If V is differentiable then D+ V = D− V = {(Dx V, Dt V )}. Since V is a viscosity solution, we obtain immediately −Dt V + H(Dx V, x, t) ≤ 0,

e

therefore −Dt V + H(Dx V, x, t) = 0.

− Dt V + H(Dx V, x, t) ≥ 0, 

218

4. OPTIMAL CONTROL AND VISCOSITY SOLUTIONS

Theorem 121. Let uα be the value function of the infinite horizon discounted cost problem (??). Then uα is a viscosity solution to αuα + H(Duα , x) = 0. Similarly, let V be a solution to the initial value problem (??). Then V is a viscosity solution of Vt + H(Dx V, x) = 0. Proof. We present the proof only for the discounted cost infinite horizon as the other case is similar, and we refer the reader to [Eva98a], for instance. Let ϕ : Td → R, ϕ(x), be a C ∞ function, and let x0 ∈ argmin(uα − ϕ). By adding a suitable constant to ϕ we may assume that u(x0 ) − ϕ(x0 ) = 0, and u(x) − ϕ(x) ≥ 0 at all other points. We must show that αϕ(x0 ) + H(Dx ϕ(x0 ), x0 ) ≥ 0, that is, there exists v ∈ Rd such that αϕ(x0 ) + v · Dx ϕ(x0 ) − L(x0 , v) ≥ 0. By contradiction assume that there exists θ > 0 such that αϕ(x0 ) + v · Dx ϕ(x0 ) − L(x0 , v) < −θ, for all v. Because the mapping v 7→ L is superlinear and ϕ is C 1 , there exists a R > 0 and r1 > 0 such that for all x ∈ Br1 (x0 ) and all v ∈ BRc (0) = Rd \ BR (0) we have θ αϕ(x) + v · Dx ϕ(x) − L(x, v) < − . 2 By continuity, for some 0 < r < r1 and all x ∈ Br (x0 ) we have θ αϕ(x) + v · Dx ϕ(x) − L(x, v) < − , 2 for all v ∈ BR (0). Therefore for any trajectory x with x(0) = x0 and any T ≥ 0 such that the trajectory x stays near x0 on [−T, 0], i.e., x(t) ∈ Br (x0 ) for

9. VISCOSITY SOLUTIONS

219

t ∈ [−T, 0] we have e−αT u(x(−T )) − u(x0 ) ≥ e−αT ϕ(x(−T )) − ϕ(x0 ) Z 0  ˙ · Dx ϕ(x(t)) dt eαt αϕ(x(t)) + x(t) =− θ ≥ 2

Z

−T 0

Z

αt

0

˙ eαt L(x, x)dt.

e dt − −T

−T

This yields θ u(x0 ) ≤ − 2

Z

0 αt

Z

0

e dt + −T

˙ eαt L(x, x)dt + e−αT u(x(−T ))

−T

Since the infimum in (??) is, in fact, a minimum we can choose a time interval [−T ∗ , 0] and a trajectory x∗ that minimizes (??): Z 0 eαt L(x∗ , x˙ ∗ )dt + e−αT u(x∗ (−T ∗ )). u(x0 ) = −T ∗

A minimizing trajectory on [−T ∗ , 0] also minimizes on any sub interval: for any T ∈ (0, T ∗ ) we have Z 0 u(x0 ) = eαt L(x∗ , x˙ ∗ )dt + e−αT u(x∗ (−T )). −T

Taking T small enough we can insure that x∗ stays near x0 on [−T, 0]. This yields a contradiction. Now consider x0 ∈ argmax(uα − ϕ). Again, by adding a suitable constant to ϕ we may assume that u(x0 )−ϕ(x0 ) = 0, and u(x)−ϕ(x) ≤ 0 at all other points. We must show that αϕ(x0 ) + H(Dx ϕ(x0 ), x0 ) ≤ 0, that is, for all v ∈ Rd we have αϕ(x0 ) + v · Dx ϕ(x0 ) − L(x0 , v) ≤ 0. By contradiction assume that there exists θ > 0 such that for some v¯ αϕ(x0 ) + v¯ · Dx ϕ(x0 ) − L(x0 , v¯) > θ.

220

4. OPTIMAL CONTROL AND VISCOSITY SOLUTIONS

By continuity, for some r > 0 and all x ∈ Br (x0 ) we have θ αϕ(x) + v¯ · Dx ϕ(x) − L(x, v¯) > . 2 The trajectory x, with x(0) = x0 , x˙ = v¯ stays near x0 for t ∈ [−T, 0], provided T > 0 is sufficiently small. Therefore e−αT u(x(−T )) − u(x0 ) ≤ e−αT ϕ(x(−T )) − ϕ(x0 ) Z 0  ˙ · Dx ϕ(x(t)) dt =− eαt αϕ(x(t)) + x(t) −T

θ ≤− 2

Z

θ u(x0 ) ≥ 2

Z

0

0

Z

αt

˙ eαt L(x, x)dt.

e dt − −T

−T

This yields 0

Z

αt

0

˙ eαt L(x, x)dt + e−αT u(x(−T )) .

e dt + −T

−T

But since by (??) Z

0

u(x0 ) ≤

˙ eαt L(x, x)dt + e−αT u(x(−T )),

−T

this yields the contradiction

θ 1−e−αT 2 α

≤ 0 with T > 0.



Exercise 170. Show that the function V (x, t) given by the Lax-Hopf formula is Lipschitz in x for each t < t1 , regardless of the smoothness of the terminal data (note, however that the constant depends on t). Exercise 171. Use the Lax-Hopf formula to determine the viscosity solution of −ut + u2x = 0, para t < 0 and u(x, 0) = ±x2 − 2x. Exercise 172. Use the Lax-Hopf formula to determine the viscosity solution of −ut + u2x = 0, for t < 0 and u(x, 0) =

   0 if x < 0

x2 if 0 ≤ x ≤ 1

  

2x − 1 if x > 1.

9. VISCOSITY SOLUTIONS

221

To establish uniqueness of viscosity solutions we need the following lemma: Lemma 122. Let V be a viscosity solution of −Vt + H(Dx V, x) = 0 in [0, T ] × Rn and φ a C 1 function. If V − φ has a maximum (resp. minimum) at (x0 , t0 ) ∈ Rd × (0, T ] then (125) − φt (x0 , t0 ) + H(Dx φ(x0 , t0 ), x0 ) ≤ 0 (resp. ≥ 0)

at (x0 , t0 ).

Remark: The important point is that the inequality is valid even for some non-interior points (t0 = 0). Proof. Only the case t0 = 0 requires proof since in the other case the maximum is interior and then the viscosity property (the definition of viscosity solution) yields the inequality. Consider  φ˜ = φ − . t ˜ Then V − φ has an interior local maximum at (x , t ) with t < 0. Furthermore, (x , t ) → (x0 , 0), as  → 0. At the point (x , t ) we have  φt (x , t ) + 2 + H(Dx φ(x , t ), x ) ≤ 0, t that is, since

 t2

≥ 0, φt (x0 , 0) + H(Dx φ(x0 , 0), x0 ) ≤ 0.

Analogously we obtain the opposite inequality, using φ˜ = φ + t .



Finally we establish uniqueness of viscosity solutions: Theorem 123 (Uniqueness). Suppose H satisfies |H(p, x) − H(q, x)| ≤ C(|p| + |q|)|p − q| |H(p, x) − H(p, y)| ≤ C|x − y|(C + H(p, x)) Then the value function is the unique viscosity solution to the HamiltonJacobi equation −Vt + H(Dx V, x) = 0

222

4. OPTIMAL CONTROL AND VISCOSITY SOLUTIONS

that satisfies the terminal condition V (x, T ) = ψ(x). Proof. Let V and V˜ be two viscosity solutions with sup V − V˜ = σ > 0. −T ≤t≤0

For 0 < , λ < 1 we define 1 ψ(x, y, t, s) = V (x, t)−V˜ (y, s)−λ(t+s+2T )− 2 (|x−y|2 +|t−s|2 )−(|x|2 +|y|2 ).  When , λ are sufficiently small we have σ max ψ(x, y, t, s) = ψ(x,λ , y,λ , t,λ , s,λ ) > . 2 Since ψ(x,λ , y,λ , t,λ , s,λ ) ≥ ψ(0, 0, −T, −T ), and both V and V˜ are bounded, we have |x,λ − y,λ |2 + |t,λ − s,λ |2 ≤ C2 and (|x,λ |2 + |y,λ |2 ) ≤ C. From these estimates and the fact that V and V˜ are continuous, it then follows that |x,λ − y,λ |2 + |t,λ − s,λ |2 = o(1), 2 as  → 0. Denote by ω and ω ˜ the modulus of continuity of V and V˜ . Then σ ≤ V (x,λ , t,λ ) − V˜ (y,λ , s,λ ) 2 = V (x,λ , t,λ ) − V (x,λ , −T ) + V (x,λ , −T ) − V˜ (x,λ , −T )+ + V˜ (x,λ , −T ) − V˜ (x,λ , s,λ ) + V˜ (x,λ , s,λ ) − V˜ (y,λ , s,λ ) ≤ ≤ ω(T + t,λ ) + ω ˜ (T + s,λ ) + ω ˜ (o()). Therefore, if  is sufficiently small T + t,λ > µ > 0, uniformly in . Let φ be given by φ(x, t) = V˜ (y,λ , s,λ ) + λ(2T + t + s,λ )+ +

1 (|x − y,λ |2 + |t − s,λ |2 ) + (|x|2 + |y,λ |2 ). 2

9. VISCOSITY SOLUTIONS

Then, the difference V (x, t) − φ(x, t) achieves a maximum at (x,λ , t,λ ). Similarly, for φ˜ given by ˜ s) = V (x,λ , t,λ ) − λ(2T + t,λ + s)− φ(y, −

1 (|x,λ − y|2 + |t,λ − s|2 ) − (|x,λ |2 + |y|2 ), 2

the difference ˜ s) V˜ (y, s) − φ(y, has a minimum at (y,λ , s,λ ). Therefore φt (x,λ , t,λ ) + H(Dx φ(x,λ , t,λ ), x,λ ) ≤ 0, and ˜ ,λ , s,λ ), y,λ ) ≥ 0. φ˜s (y,λ , s,λ ) + H(Dy φ(y Simplifying, we have (126)

λ+2

x,λ − y,λ t,λ − s,λ + H(2 + 2x,λ , x,λ ) ≤ 0, 2  2

and (127)

−λ+2

t,λ − s,λ x,λ − y,λ + H(2 − 2y,λ , y,λ ) ≥ 0. 2  2

From (126) we gather that (128)

H(2

x,λ − y,λ o(1) + 2x,λ , x,λ ) ≤ −λ + . 2  

223

224

4. OPTIMAL CONTROL AND VISCOSITY SOLUTIONS

By subtracting (126) to (127) we have x,λ − y,λ x,λ − y,λ 2λ ≤ H(2 − 2y , y ) − H(2 + 2x,λ , x,λ ) ,λ ,λ 2 2 x,λ − y,λ x,λ − y,λ ≤ H(2 − 2y,λ , y,λ ) − H(2 − 2y,λ , x,λ ) 2  2 x,λ − y,λ x,λ − y,λ − 2y,λ , x,λ ) − H(2 + 2x,λ , x,λ ) + H(2 2  2   x,λ − y,λ ≤ C + CH(2 + 2x,λ , x,λ ) |x,λ − y,λ | 2   x,λ − y,λ x,λ − y,λ + 2 |x,λ − y,λ | + 2x − 2y + C 2 ,λ ,λ 2 2   o(1) ≤ + C (|x,λ − y,λ | + |t,λ − s,λ |) ,  when  → 0, which is a contradiction.



10. Stationary problems In this section we consider optimal control stationary problems. These problems arise in stationary steady state control and also in the infinite horizon discounted cost problem. In this chapter we consider the calculus of variations setting, however similar results hold for the bounded control setting. We define the discounted cost function Jα , with discount rate α, as Z ∞ −αs ˙ Jα (x; u) = L(x(s), x(s))e ds. 0

In this case, the optimal trajectories x(·) satisfy the differential equation x˙ = u, with the initial condition x(0) = x. As before, the value function, uα , is given by uα (x) = inf Jα (x; u), where infimum is taken over all controls u ∈ L∞ loc .

10. STATIONARY PROBLEMS

225

The dynamic programming principle in this case is Proposition 124. For each t > 0 Z t  −αs −αt ˙ uα (x) = inf L(x(s), x(s))e ds + e uα (x(t)) . x(0)=x

0

Proof. Observe that Z t −αs ˙ L(x(s), x(s))e ds uα (x) = inf x(0)=x 0  Z ∞ −αt −α(s−t) ˙ +e L(x(s), x(s))e ds t  Z t −αs −αt ˙ ≥ inf L(x(s), x(s))e ds + e uα (x(t)) . x(0)=x

0

The other inequality is left as an exercise: Exercise 173. Show that  Z t −αs −αt ˙ uα (x) ≤ inf L(x(s), x(s))e ds + e uα (x(t)) . x(0)=x

0

 Because of the dynamic programming, it is clear that V (x, t) = e−αt uα (x) is a viscosity solution of −Vt + e−αt H(eαt Dx V, x) = 0. This then implies Corollary 125. uα is a viscosity solution of αuα + H(Dx uα , x) = 0. Furthermore Corollary 126. If uα is differentiable then it is a solution of (129)

H(Dx uα , x) + αuα = 0.

226

4. OPTIMAL CONTROL AND VISCOSITY SOLUTIONS

Exercise 174. Show that the optimal trajectories for the discounted cost infinite horizon are solutions to the (negatively damped) EulerLagrange equation ∂L ∂L d ∂L −α − = 0. dt ∂ x˙ ∂ x˙ ∂x

(130)

If x(t) satisfies (130), the energy H may not be conserved Example 45. Let L(x, v) =

v2 2

+ cos x. Then (130) reads

¨ − αx˙ + sin x = 0. x When α = 0 the energy x˙ 2 − cos x 2 is constant in time, but for α > 0 we have H=

dH = αx˙ 2 . dt Therefore, the energy increases in time unless x˙ = 0.

J

Proposition 127. Suppose that x(t) satisfies (130). Then dH ˙ ˙ = αDv L(x(t), x(t)) · x(t). dt Proof. Let ˙ p(t) = −Dv L(x(t), x(t)) we have dH = Dp H · p˙ + Dx H · x˙ dt ˙ = x˙ · (αDv L + Dx L) − Dx L · x˙ = αDv L · x.  We assume now that H is Zn periodic in x. We will show that as α → 0, the solution uα converges (up to constants) to a solution of (131) for some H.

H(Dx u, x) = H.

10. STATIONARY PROBLEMS

227

Theorem 128. Let uα be a viscosity solution to αuα + H(Duα , x) = 0. Then αuα is uniformly bounded and uα is Lipschitz, uniformly in α. Proof. First let xM be the point where uα (x) has a global maximum, and xm a point of global minimum. Then, by the viscosity property, i.e., the definition of the viscosity solution, we have αuα (xM ) + H(0, xM ) ≤ 0,

αuα (xm ) + H(0, xm ) ≥ 0,

which yields that αuα is uniformly bounded. Now we establish the Lipschitz bound. Observe that if uα is Lipschitz, then there exists M > 0 such that uα (x) − uα (y) ≤ M |x − y|, for all x, y. By contradiction, assume that for every M > 0 there exists x and y such that uα (x) − uα (y) > M |x − y|. Let ϕ(x) = uα (y) + M |x − y|. Then uα (x) − ϕ(x) has a maximum at some point x 6= y. Therefore  x−y αuα (x) + H M |x−y| , x ≤ 0, which by the coercivity of H yields a contradiction if M is sufficiently large.  Example 46. We can also use directly calculus of variations methods to show that the exists C, independent of α, such that uα ≤

C . α

Indeed, since L(x, 0) is bounded Z uα (x) ≤ Jα (x, 0) ≤ 0



L(x, 0)e−αs ds ≤

C . α J

228

4. OPTIMAL CONTROL AND VISCOSITY SOLUTIONS

Theorem 129. (Stability theorem for viscosity solutions) Assume that for α > 0 function uα is a viscosity solution for H α (u, Du, x) = 0. Let H α → H uniformly on compact sets, and uα → u uniformly. Then u is a viscosity solution for H(u, Du, x) = 0. Proof. Suppose u−ϕ has a strict local maximum (resp. minimum) at a point x0 . Then there exists xα → x such that uα − ϕ has a local maximum (resp. minimum) at xα . Then H α (uα (xα ), Dϕ(xα ), xα ) ≤ 0

(resp. ≥ 0).

Letting α → 0 finishes the proof.



As demonstrated in context of homogenization of Hamilton-Jacobi equations, in the classic but unpublished paper by Lions, Papanicolaou and Varadhan [Lio82], it is possible to construct, using the previous result, viscosity solutions to the stationary Hamilton-Jacobi equation (132)

H(Du, x) = H.

Theorem 130 (Lions, Papanicolao, Varadhan). There exists a number H and a function u(x), Zd periodic in x, that solves (132) in the viscosity sense. Proof. Since uα − min uα is periodic, equicontinuous, and uniformly bounded, it converges, up to subsequences, to a function u. Moreover uα ≤ Cα , thus αuα converges uniformly, up to subsequences, to a constant, which we denote by −H. Then, the stability theorem for viscosity solutions, theorem 129, implies that u is a viscosity solution of H(Du, x) = H.  Theorem 131. Let u : Td → R be a viscosity solution to H(Du, x) = C. Then u is Lipschitz, and the Lipschitz constant does not depend on u.

10. STATIONARY PROBLEMS

229

Proof. First observe that from the fact that u = u − 0 achieves maximum and minimum in Td we have min H(0, x) ≤ C ≤ max H(0, x).

x∈Td

x∈Td

Then, it is enough to argue as in the proof of Theorem 128.



Exercise 175. Let u : R → R be continuous and piecewise differentiable (with left and right limits for the derivative at any point). Show that u is a viscosity solution of H(Dx u, x) = H if 1. u satisfies the equation almost everywhere; 2. whenever Dx u is discontinuous then Dx u(x− ) > Dx u(x+ ). Example 47 (One dimensional pendulum). The Hamiltonian corresponding to a one-dimensional pendulum with unit mass and unit length is p2 H(p, x) = − cos 2πx. 2 In this case, it is not difficult to determine explicitly the solution to the Hamilton-Jacobi equation H(P + Dx u, x) = H(P ), where P is a real parameter. In fact, for P ∈ R and almost every x ∈ R, the solution u(P, x) satisfies (P + Dx u)2 = H(P ) + cos 2πx. 2 consequently, H(P ) ≥ 1 and, therefore, q Dx u = −P ± 2(H(P ) + cos 2πx), Thus

Z u=

x

q.t.p. x ∈ R.

q −P + s(y) 2(H(P ) + cos 2πy)dy + u(0),

0

where |s(y)| = 1. Since H is convex em p and u is a viscosity solution, the only possible discontinuities on the derivative of u are the ones

230

4. OPTIMAL CONTROL AND VISCOSITY SOLUTIONS

that satisfy Dx u(x− ) − Dx u(x+ ) > 0, see exercise 175. Therefore s can change sign from 1 to −1 at any point, however the jumps from −1 to 1 can only happen when q 2(H(P ) + cos 2πx) = 0. Since we are looking for 1-periodic solutions, there are only two cases to consider. The first, in which H(P ) > 1 and the solution is C 1 q

since 2(H(P ) + cos 2πy) never vanishes. In this case H(P ) can be determined as from P through the equation Z 1q 2(H(P ) + cos 2πy)dy. P =± 0

It is easy to check that this equation has a unique solution H(P ) whenever Z 1p 2(1 + cos 2πy)dy, |P | > 0

that is, 4 . π The second case occurs whenever the last inequality does not hold, that is H(P ) = 1 and thus s(x) can have discontinuities. In fact, s(x) jumps from −1 to 1 whenever x = 12 + k, with k ∈ Z, and there exists a point x0 defined by the equation Z 1 p s(y) 2(1 + cos 2πy)dy = P, |P | >

0

such that s(x) jumps from 1 to −1 at x0 + k, k ∈ Z.

J

Exercise 176. Let φ : Tn → R be a C 1 function not identically constant. Show that there exist two distinct viscosity solutions of Dx u · (Dx u − Dx φ) = 0, whose difference is not constant.

5

Duality theory

This chapter is dedicated to the study of duality theory in optimization problems. The main applications we study are infinite dimensional linear programming problems such as Monge Kantorowich and Mather problems.

1. Model problems In this section we discuss certain minimization problems which involve linear objective functions under linear constraints, that is, infinite dimensional linear programming problems. Surprisingly there are deep relations between these problems and certain nonlinear partial differential equations.

1.1. Mather problem. 1.1.1. Classical Mather problem. Let Td be the d-dimensional standard torus. Consider a Lagrangian L(x, v), L : Td × Rd → R, smooth in both variables, strictly convex in the velocity v, and coercive, that is, L(x, v) = +∞. lim inf |v|→∞ x |v| The minimal action principle of classical mechanics asserts that the trajectories x(t) of mechanical systems are critical points or minimizers of the action Z T ˙ (133) L(x, x)ds. 0

231

232

5. DUALITY THEORY

These critical points are then solutions to the Euler-Lagrange equations d ˙ − Dx L(x, x) ˙ = 0. Dv L(x, x) dt

(134)

Mather’s problem is a relaxed version of this variational principle, and consists in minimizing the action Z (135) L(x, v)dµ(x, v) Td ×Rd

among a suitable class of probability measures µ(x, v). Originally, in [Mat91], this minimization was performed over all measures invariant under the Euler-Lagrange equations (134). However, as realized by [Mn96], it is more convenient to consider a larger class of measures, the holonomic measures. It turns out that both problems are equivalent as any holonomic minimizing measure is automatically invariant under the Euler-Lagrange equations. In what follows, we will define this class of measures and provide the motivation for it. Let x(t) be a trajectory on Td . Define a measure µTx on Td × Rd by its action on test functions ψ ∈ Cc (Td × Rd ), ψ(x, v), (continuous with compact support) as follows: Z  1 T T ˙ hψ, µx i = ψ x(t), x(t) dt. T 0 If x(t) is globally Lipschitz, the family {µTx }T >0 has support contained in a fixed compact set, and therefore is weakly-∗ compact. Consequently one can extract a limit measure µx which encodes some of the asymptotic properties of the trajectory x. Let ϕ ∈ C 1 (Td ). For ψ(x, v) = v · Dϕ(x) we have   Z ϕ x(T ) − ϕ x(0) 1 T hψ, µx i = lim x˙ · Dϕ(x)dt = lim = 0. T →∞ T 0 T →∞ T Let γ(v) be a continuous function, γ : Rd → R, such that inf γ(v) |v|→∞ 1+|v|

0, and lim

γ(v) 1+|v|

>

= ∞. A measure µ in Td × Rd is admissible if

1. MODEL PROBLEMS

233

R

γ(v)dµ < ∞. An admissible measure µ on Td × Rd is called holonomic if for all ϕ ∈ C 1 (Td ) we have Z v · Dϕdµ = 0. (136) Td ×Rd

Td ×Rd

Mather’s problem consists in minimizing (135) under all probability measures that satisfy (136). As pointed out before, however, this problem was introduced by Ma˜ ne in [Mn96] in his study of Mather’s original problem [Mat91]. 1.1.2. Stochastic Mather problem. In the framework of stochastic optimal control one is led to replace deterministic trajectories by stochastic processes. Suppose that x(t) satisfies the stochastic differential equation dx = νdt + σdW, in which ν is a bounded, progressively measurable process, σ > 0 and W a n−dimensional Brownian motion. One would like to minimize the average action Z T

E

L(x, ν)dt. 0

As before, one can associate to these stochastic processes, probability measures µ in Tn × Rn defined as Z Z 1 T φ(x, v)dµ = lim φ(x(t), ν(t))dt, T →∞ T 0 Tn ×Rn in which the limit is taken through an appropriate subsequence. The Dynkin’s formula is the analog for stochastic processes to the fundamental theorem of calculus. This formula applied to ϕ(x(t)), states that Z T σ2 E [ϕ(x(T )) − ϕ(x)] = E νDx ϕ(x(t)) + ∆ϕ(x(t))dt. 2 0 This identity implies Z σ2 vDx ϕ(x) + ∆ϕ(x)dµ = 0, 2 Tn ×Rn for all ϕ(x) : Tn → R, C 2 .

234

5. DUALITY THEORY

The stochastic Mather problem [Gom02a] consists in minimizing Z L(x, v)dµ, Tn ×Rn

over all probability measures µ Tn × Rn that satisfy Z σ2 vDx ϕ(x) + ∆ϕ(x)dµ = 0, 2 Tn ×Rn for all ϕ(x) : Tn → R of class C 2 . 1.1.3. Discrete Mather problem. Also interesting is the discrete case, in which the trajectories are replaced by sequences (xn , vn ) that satisfy xn+1 = xn + vn . In this case, if the sequence vn is globally bounded, for instance, we can associate to this sequence a measure µ in Tn × Rn through Z N 1 X φ(xn , vn ), φ(x, v)dµ = lim N →∞ N Tn ×Rn n=1 in which the limit is take through an appropriate subsequence. Since for all continuous functions ϕ : Tn → R we have N X

ϕ(xn + vn ) − ϕ(xn ) = ϕ(xN +1 ) − ϕ(x1 ),

n=1

we obtain Z [ϕ(x + v) − ϕ(x)] dµ = 0. Tn ×Rn

Therefore, we propose Mather discrete problem, which consists in minimizing Z L(x, v)dµ, Tn ×Rn

over all probability measures µ in Tn × Rn that satisfy Z [ϕ(x + v) − ϕ(x)] dµ = 0, Tn ×Rn

for all continuous function ϕ : Tn → R.

1. MODEL PROBLEMS

235

1.1.4. Generalized Mather problem. To state the generalized Mather problem, we must now make precise our framework. Let U ⊂ Rm be a non-empty closed convex set. Assume that, for some k ≥ 0 (usually k = 0, 1, 2) there exists a linear operator Av : C k (Tn ) → C(Tn × U ), which satisfies the following two conditions: the first one is that for each fixed ϕ ∈ C k (Tn ) we have |Av ϕ| ≤ Cϕ (1 + |v|), uniformly in Tn × U , which of course, if U is bounded means simply that |Av ϕ| is bounded; the second condition is that for ϕ ∈ C k (Tn ) the mapping (x, v) 7→ Av ϕ is continuous in Tn × U . We assume that there exists another operator B defined in C k (Tn ) which satisfies the following compatibility conditions with Av : Av κ = Bκ,

(137)

for any κ ∈ R, and that, for any given probability measure ν on Tn , there exists a probability measure µν in Tn × U such that Z Z v Bϕdν, A ϕdµν = (138) Tn

Tn ×U

for all ϕ ∈ C k (Tn ). The Lagrangian L(x, v) : Tn × U → R is continuous and convex in v, bounded below, and, either U is bounded, and no further hypothesis are required, or if U is unbounded we assume that, uniformly in x L(x, v) = ∞. |v|→∞ |v| lim

The generalized Mather problem consists in minimizing Z (139) L(x, v)dµ, Tn ×U

over all probability measures µ in Tn × U that satisfy the constraint Z Z v (140) A ϕdµ = Bϕdν, Tn ×U

Tn

for all functions ϕ : Tn → R with appropriate regularity.

236

5. DUALITY THEORY

1.2. Monge-Kantorowich problem. The Monge-Kantorowich optimal mass transport problem, see [Eva99] or [Vil03b] is the following: given two positive measures µ+ and µ− in Rn which satisfy the mass balance condition Z Z + dµ− , dµ = Rn

Rn

then one looks for a function s : Rn → Rn which transports µ+ into µ− , that is, Z Z + ϕ(s(x))dµ = ϕ(y)dµ− , Rn ∞ Cc (Rn ),

Rn

more compactly we write this condition as for each ϕ ∈ # + − s µ = µ , and furthermore that minimizes total transport cost Z 1 |x − s(x)|2 dµ+ (x). 2 Rn Unfortunately, proving directly that such a mapping exists is a hard problem, and we will instead consider a relaxed version of the problem. Obviously, given a mapping s for which s# µ+ = µ− we can define a measure π in R2n by Z Z φ(x, y)dπ = φ(x, s(x))dµ+ . R2n

Rn

Additionally, the marginals satisfy π|x = µ+ and π|y = µ− . It is therefore natural to consider the relaxed Monge-Kantorowich problem, which consists in minimizing Z 1 min |x − y|2 dπ, 2 R2n where the minimum is taken over all probability measures that satisfy π|x = µ+ and π|y = µ− , that his Z Z ϕ(x)dπ = ϕ(x)dµ+ , R2n

and

Rn

Z

Z ψ(y)dπ =

R2n

for all continuous functions ϕ and ψ.

Rn

ψ(y)dµ− ,

2. SOME INFORMAL COMPUTATIONS

237

Our strategy is to first prove existence of a solution to the relaxed problem, which can be done under quite general assumptions, and only then to prove (whenever possible) that the support of the optimal plan is in fact a graph (x, s(x)) and, therefore, that there exists an optimal transport mapping. The next example shows that the existence of an optimal transport mapping can in fact fail: Exercise 177. Let µ+ = δ0 and µ− = 12 δ−1 + 21 δ1 . show that there does not exist a function s which transports µ+ into µ− .

2. Some informal computations 2.1. Mather problem. In Mather’s problem, both in the deterministic and in stochastic cases, the constraint Z σ2 vDx ϕ(x) + ∆ϕ(x)dµ = 0, 2 Tn ×Rn (σ ≥ 0) is linear in v. Additionally, the Lagrangian is strictly convex in v. This implies that minimizing measure has support in a graph (x, v¯(x)). In fact, if the minimizing measure µ(x, v) were not support in a graph, we could replace it by another measure µ ˜ given by Z Z φ(x, v)d˜ µ(x, y) = φ(x, v¯(x))dθ(x), Tn ×Rn

Tn

where Z v¯(x) =

vµ(x, v)dv Rn

and Z

Z ψ(x)dθ(x) =

Tn

ψ(x)µ(x, v)dv, Tn ×Rn

for all ψ ∈ C(Tn ). Thus Z vDx ϕ(x) + Tn ×Rn

σ2 ∆ϕ(x)d˜ µ = 0. 2

Additionally, the convexity of L in v implies Z Z Ld˜ µ ≤ Ldµ.

238

5. DUALITY THEORY

If L is strictly convex, the inequality is strict unless v = v¯(x), µ almost everywhere. In conclusion: Theorem 132. Let L(x, v) be strictly convex in v and µ a minimizing measure for Mather’s problem (deterministic or stochastic). Then µ it is supported in a graph (x, v) = (x, v¯(x)). additionally the projection θ of µ in the coordinate x satisfies −∇ · (¯ v (x)θ(x)) +

σ2 ∆θ = 0, 2

and the distribution sense. In order to simplify the presentation we are going to assume that 2 L = |v|2 − U (x). Using formally Lagrange multipliers (see note after exercise 25), we conclude that Mather’s problem is equivalent to the problem without constraints  Z  2 σ2 |v| − U (x) + vDx ϕ + ∆ϕ + H θdx. min θ,v(x) Tn 2 2 The function ϕ corresponds to the Lagrange multiplier for the holoR nomy condition and H to the constraint Tn θ = 1. To obtain the Euler-Lagrange equation, we make the following variations v → v + w,

θ → θ + η.

This implies v = −Dx ϕ(x), and

|v|2 σ2 − U (x) + vDx ϕ + ∆ϕ + H = 0. 2 2

Therefore (141)



σ2 ∆ϕ + H(Dx ϕ, x) = H, 2

2. SOME INFORMAL COMPUTATIONS

239

with H(p, x) =

|p|2 + U (x). 2

Exercise 178. Adapt minimax principle from exercise 25 to Mather’s problem and formally verify the previous results.

As an application, we are going to prove an estimate for the second derivatives of the solution of the Hamilton-Jacobi equation. In order to keep the presentation as elementary as possible we assume that the dimension is 1. We further assume that the solution to equation (141) is twice differentiable in x: −

σ2 ∆(ϕxx ) + Dx ϕDx (ϕxx ) + |Dx ϕx |2 + Uxx = 0. 2

Since v = −Dx ϕ we have Z σ2 − ∆(ϕxx ) + Dx ϕDx (ϕxx )dµ = 0, 2 and therefore Z

|D2 ϕ|2 dµ ≤ C.

In section 5 we will make rigorous many of the ideas discussed in this section. Mather’s problem is an infinite dimensional linear programming problem. In in general, as we have discussed for finite dimensional problems, one can use duality to gain a better understanding of the problem. For Mather’s problem (see exercise 178), the dual is given by σ2 inf sup − ∆φ + H(Dx φ, x). φ 2 x The duality theory implies that the value of this infimum is Z − Ldµ. On the other hand, this value is also the unique number H for which −

σ2 ∆u + H(Dx u, x) = H 2

240

5. DUALITY THEORY

has a periodic solution u. To check this fact directly, let u be a solution of (141) then inf sup − φ

x

σ2 σ2 ∆φ + H(Dx φ, x) ≤ sup − ∆u + H(Dx u, x) = H. 2 2 x

Additionally, for each periodic function φ, u − φ has a minimum at a point x0 . At this point, Dx u = Dx φ, and ∆u ≥ ∆φ. Therefore sup − x

σ2 σ2 ∆φ + H(Dx φ, x) ≥ − ∆φ(x0 ) + H(Dx φ, x0 ) 2 2 σ2 ≥ − ∆u(x0 ) + H(Dx u, x0 ) = H. 2

2.2. Monge-Kantorowich problem. To obtain formally the EulerLagrange equation to the Monge-Kantorowich we will suppose that both µ+ and µ− have densities ρ+ and ρ− . Let s(x) be an optimal mass transport map, µ the measure in R2n induced by s(x), with marginals µ± . Let w be a divergence free vector field in Rn and ϕτ the flow associated to the differential equation w(z) d z= + . dτ ρ (z) Since w has zero divergence   + d ∇· ρ ϕτ = 0. dτ + + 2n Therefore ϕ# as τ µ = µ . Define the measure µτ in R Z Z φ(x, y)dµτ = φ(ϕτ (x), y)dµ.

Since µ0 = µ, and µ is optimal, we have Z d 2 |x − y| dµτ = 0, dτ τ =0 this is Z 2

d (ϕτ (x) − y) · ϕτ (x)dµ = 0. dτ τ =0

This implies Z (x − s(x)) · w(x) = 0.

3. DUALITY

241

This identity holds for all the versions free vector fields. Consequently, the function x − s(x) is a gradient. Therefore s(x) = Dx Ψ(x), for some Ψ(x). The condition s# µ+ = µ− which, by the change of variables formula is equivalent to ρ+ (x) = ρ− (s(x)) det Ds(x), which can be written asMonge-Amp´ere equation ρ+ (x) = ρ− (DΨ(x)) det D2 Ψ(x). Exercise 179. Use the minimax principle, see exercise 25, to determine the dual of Monge-Kantorowich problem. Exercise 180. Consider the anti-optimal transport problem which consists in determining the measure π(x, y) with marginals µ1 and µ2 which maximizes Z |x − y|2 dπ(x, y). R2n

Determine its dual. Exercise 181. Use minimax principle to determine the dual of the problem Z min c(x, y)π(x, y)dxdy R2n

over all nonnegative probability densities π which satisfy Z Z π(x, y)dx = π(y, x)dx. Rn

Rn

3. Duality According to the informal ideas discussed in section 1, we are now going to discuss rigorously the duality theory. The main tool is the Legendre-Fenchel-Rockefellar theorem, whose proof will be presented in what follows, our proof is based in the one presented in [Vil03b]. Let E be a locally convex topological vector space with dual E 0 . The duality pairing between E and E 0 is denoted by (·, ·). Let h : E →

242

5. DUALITY THEORY

(−∞, +∞] be a convex function. The Legendre-Fenchel transform h∗ : E 0 → [−∞, +∞] of h is defined by  h∗ (y) = sup (x, y) − h(x) , x∈E

for y ∈ E 0 . In a similar way, if g : E → [−∞, +∞) is concave we define  g ∗ (y) = inf (x, y) − g(x) . x∈E

Theorem 133 (Fenchel-Legendre-Rockafellar). Let E be a locally convex topological vector space over R with dual E 0 . Let h : E → (−∞, +∞] be a convex function and g : E → [−∞, +∞) a concave function. Then, if there exists a point x0 where both g and h are finite and at least one of them is continuous, (142)

min0 [h∗ (y) − g ∗ (y)] = sup [g(x) − h(x)] . y∈E

x∈E

Remark. It is part of the theorem that the infimum in the left-hand side above is a minimum. Proof. First we show the “≥” inequality in (142). Recall that inf 0 [h∗ (y) − g ∗ (y)] = inf 0 sup [g(x1 ) − h(x2 ) − (y, x1 − x2 )] .

y∈E

y∈E x1 ,x2 ∈E

By choosing x1 = x2 = x we conclude that inf [h∗ (y) − g ∗ (y)] ≥ sup [g(x) − h(x)] .

y∈E 0

x∈E

The opposite inequality is more involved and requires the use of HahnBanach’s theorem. Let λ = sup [g(x) − h(x)] . x∈E

If λ = +∞ there is nothing to prove, thus we may assume λ < +∞. We just need to show that there exists y ∈ E 0 such that for all x1 and x2 we have (143)

g(x1 ) − h(x2 ) − (y, x1 − x2 ) ≤ λ,

since then, by taking the supremum over x1 and x2 yields h∗ (y) − g ∗ (y) ≤ λ.

3. DUALITY

243

From λ ≥ g(x) − h(x) it follows g(x) ≤ λ + h(x). Hence the following convex subsets of E × R:  C1 = (x1 , t1 ) ∈ E × R : t1 < g(x1 ) and  C2 = (x2 , t2 ) ∈ E × R : λ + h(x2 ) < t2 . are disjoint. Let x0 as in the statement of the theorem. We will assume that g is continuous at x0 (for the case in which h is the continuous function the argument is similar). Since (x0 , g(x0 ) − 1) ∈ C1 and g is continuous at x0 , C1 has non empty interior. Therefore, see [?, Chpt 4, sect 14.5], the sets C1 and C2 can be separated by a nonzero linear function, i.e., there exists a nonzero vector z = (w, α) ∈ E 0 × R such that inf (z, c1 ) ≤ sup (z, c2 ),

c1 ∈C1

c2 ∈C2

that is, for any x1 such that g(x1 ) > −∞ and for any x2 s.t. h(x2 ) < +∞ we have (w, x1 ) + αt1 ≤ (w, x2 ) + αt2 , whenever t1 < g(x1 ) and λ + h(x2 ) < t2 . Note that α can not be zero. Otherwise by using x2 = x0 and taking x1 in a neighborhood of x0 where g is finite we deduce that w is also zero. Therefore α > 0, otherwise, by taking t1 → −∞ we would obtain a contradiction. Dividing w by α and letting y = − wα , we would obtain −(y, x1 ) + g(x1 ) ≤ −(y, x2 ) + h(x2 ) + λ. This is equivalent to (143) and thus we completed the proof.



Remark. The condition of continuity at x0 can be relaxed to the condition of “Gˆateaux continuity” or directional continuity, that is the function t 7→ f (x0 + tx) is continuous at t = 0 for any x ∈ E. Here f stands for either h or g.

244

5. DUALITY THEORY

4. Generalized Mather problem The generalized Mather problem is an infinite dimensional linear programming problem. Its dual problem, that we compute in this section, can be obtained using Fenchel-Legendre-Rockafellar’s theorem, as we explain in what follows. Let Ω = Tn × U . If U is bounded, set γ = 1, otherwise, let γ be a function γ(v) : Ω → [1, +∞) satisfying |v| = 0. |v|→+∞ γ(v)

L(x, v) = +∞, |v|→+∞ γ(v) lim

lim

Let M be the set of Radon measures in Ω with weight γ, that is,   Z M = µ signed measure in Ω with γd|µ| < ∞ . Ω

The set M is the dual of the set Cγ,0 (Ω) of continuous functions φ that satisfy φ (144) kφkγ = sup < ∞, γ Ω if U is bounded, and, if U is unbounded, satisfy both (144) and φ(x, v) = 0. |v|→∞ γ(v) lim

Let   Z M1 = µ ∈ M : dµ = 1, µ ≥ 0 , Ω

and   Z Z v k n M2 = cl µ ∈ M : A ϕdµ = Bϕdν, ∀ϕ(x) ∈ C (T ) , Ω



in which k is the degree of differentiability needed on ϕ so that Av ϕ is well defined, and the closure cl is taken in the weak topology. For φ ∈ Cγ,0 (Ω) let h(φ) = sup (−φ(x, v) − L(x, v)). (x,v)∈Ω

4. GENERALIZED MATHER PROBLEM

245

Since h is the supremum of convex functions, it is also a convex function, and, as was shown in [Gom02a], it is also continuous with respect to uniform convergence in Cγ,0 (Ω). Consider the set  C = cl φ : φ = Av ϕ, ϕ ∈ C k (Tn ) , where cl denotes the closure in Cγ,0 . Since Av is a linear operator, C is a convex set. Let ν be a fixed probability measure on Tn , and let µν as in (138). Define  R − φdµ if φ ∈ C, ν g(φ) = −∞ otherwise. As C is a closed convex set, g is concave and upper semicontinuous. R R Note that if φ = Av ϕ, then φdµν = Bϕdν. We claim that the dual of sup g(φ) − h(φ)

(145)

φ∈C0γ (Ω)

is the generalized Mather problem . We start by computing the Legendre transforms of h and g. Proposition 134. We have R  Ldµ h∗ (µ) = +∞ and g ∗ (µ) =

 0

if

µ ∈ M1

otherwise,

if µ ∈ M2

−∞ otherwise.

Proof. By its definition  Z  h (µ) = sup − φdµ − h(φ) . ∗

φ∈C0γ (Ω)

First we show that if µ is non-positive then h∗ (µ) = ∞.

246

5. DUALITY THEORY

Lemma 135. If µ 6≥ 0 then h∗ (µ) = +∞. Proof. If µ 6≥ 0 we can choose a sequence of non-negative functions φn ∈ C0γ (Ω) such that Z −φn dµ → +∞. Therefore, since sup −φn − L ≤ 0, we have h∗ (µ) = +∞.



Lemma 136. If µ ≥ 0 then Z  Z ∗ h (µ) ≥ Ldµ + sup ψdµ − sup ψ . ψ∈C0γ (Ω)

Proof. Let Ln be a sequence of functions in C0γ (Ω) increasing pointwisely to L. Any φ in C0γ (Ω) can be written as φ = −Ln − ψ, for some ψ in C0γ (Ω). Therefore  Z  sup − φdµ − h(φ) = φ∈C0γ (Ω)

Z =

Z Ln dµ +

sup ψ∈C0γ (Ω)

 ψdµ − sup(Ln + ψ − L) .

Since sup (Ln − L) ≤ 0, we have sup(Ln + ψ − L) ≤ sup ψ. Therefore  Z  sup − φdµ − h(φ) ≥ φ∈C0γ (Ω)

Z sup ψ∈C0γ (Ω)

Z Ln dµ +

By the monotone convergence theorem Z Z Ln dµ → Ldµ.

 ψdµ − sup ψ .

4. GENERALIZED MATHER PROBLEM

247

Thus,  Z  Z Z  sup − φdµ − h(φ) ≥ Ldµ + sup ψdµ − sup ψ , φ∈C0γ (Ω)

ψ∈C0γ (Ω)

as required.

If then

R



Ldµ = +∞ then h∗ (µ) = +∞. On the other hand, if Z

 ψdµ − sup ψ

sup

Z ≥ sup α

ψ∈C0γ (Ω)

R

dµ 6= 1

 dµ − 1 = +∞,

α∈R

by choosing ψ = α, constant. Therefore h∗ (µ) = +∞. When

R

dµ = 1, the previous lemma implies Z ∗ h (µ) ≥ Ldµ,

by choosing ψ = 0. Additionally, for each φ Z (−φ − L)dµ ≤ sup(−φ − L), if

R

dµ = 1. Therefore  Z  Z sup − φdµ − h(φ) ≤ Ldµ. φ∈C0γ (Ω)

In this way, R  Ldµ ∗ h (µ) = +∞

if µ ∈ M1 otherwise.

Let µν be such that Z

v

A ϕdµν =

Z Bϕdν,

for all ϕ ∈ C k (Tn ). We can write any measure µ ∈ M2 as a sum of µν + µ ˆ, with Z Av ϕdˆ µ = 0,

for all ϕ ∈ C^k(T^n). By continuity, it follows that

∫ φ dµ̂ = 0,

for all φ ∈ C. Furthermore, for any µ ∉ M_2, there exists φ̂ ∈ C such that

∫ φ̂ d(µ − µ_ν) ≠ 0.

Thus

g*(µ) = inf_{φ∈C} ( −∫ φ dµ + ∫ φ dµ_ν ),

which equals 0 if µ ∈ M_2 and −∞ otherwise. □

Theorem 137.

(146)   sup_{φ∈C_{γ,0}(Ω)} ( g(φ) − h(φ) ) = min_{µ∈M} ( h*(µ) − g*(µ) ).

Note 1: min_{µ∈M} ( h*(µ) − g*(µ) ) = min_{µ∈M_1∩M_2} ∫ L dµ.

Note 2: It is part of the theorem that the right-hand side of (146) is a minimum, and therefore there exists a generalized Mather measure.

Proof. The set {g > −∞} is non-empty, and, in this set, h is a continuous function, as proved in [Gom02a]. Then the result follows from Fenchel-Legendre-Rockafellar's Theorem; see, for instance, [Vil03b]. □

Let

H(ϕ, x) = sup_v ( −L(x, v) − A^v ϕ ).

As an example, suppose A^v ϕ = ∆ϕ + vD_x ϕ. Then H(ϕ, x) = −∆ϕ + H(D_x ϕ, x).

The result in Theorem 137 can then be restated in the more convenient identity

(147)   min_µ ∫ L dµ = − inf_ϕ ( sup_x H(ϕ, x) + ∫ Bϕ dν ),

where the minimum on the left-hand side is taken over all measures µ that satisfy (140), and the infimum on the right-hand side is taken over all ϕ ∈ C^k(T^n).
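Note 1 and formula (147) exhibit the generalized Mather problem as a linear program in the measure µ. The following minimal numerical sketch (not part of the text) illustrates this point of view in the classical case A^v ϕ = vD_x ϕ, B = 0, considered below: discretizing T^1 × [−v_max, v_max] turns the constraints defining M_1 ∩ M_2 into finitely many linear equalities, and the action can be minimized with an off-the-shelf LP solver. The Lagrangian L(x, v) = |v|^2/2 + cos(2πx), the truncated Fourier test basis, and all grid sizes are illustrative assumptions.

```python
# Discretized Mather problem: minimize sum L(x_i, v_j) mu_ij subject to
# mu >= 0, total mass one, and the holonomy constraint tested against a
# truncated Fourier basis phi (so that int v phi'(x) dmu = 0).
import numpy as np
from scipy.optimize import linprog

nx, nv, vmax, K = 40, 41, 2.0, 5
x = np.arange(nx) / nx                         # grid on T^1
v = np.linspace(-vmax, vmax, nv)
X, V = np.meshgrid(x, v, indexing="ij")
L = 0.5 * V**2 + np.cos(2 * np.pi * X)         # an assumed Lagrangian

rows = []                                      # holonomy: phi = sin, cos modes
for k in range(1, K + 1):
    rows.append((V * 2 * np.pi * k * np.cos(2 * np.pi * k * X)).ravel())
    rows.append((-V * 2 * np.pi * k * np.sin(2 * np.pi * k * X)).ravel())
A_eq = np.vstack(rows + [np.ones(nx * nv)])    # last row: int dmu = 1
b_eq = np.zeros(len(rows) + 1); b_eq[-1] = 1.0

res = linprog(L.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
print("discrete Mather value ~", res.fun)
```

For this choice of L the minimizer concentrates where v = 0 and cos(2πx) is minimal, so the computed value is approximately −1.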

In the remainder of this section we consider Mather's classical problem, A^v ϕ = vD_x ϕ and B = 0.

Theorem 138. Let A^v ϕ = vD_x ϕ. Let H* be given by

H* = − sup_{φ∈C_{γ,0}(Ω)} ( h_2(φ) − h_1(φ) ).

Then

H* = inf { λ : ∃ϕ ∈ C^1(T^n) : H(D_x ϕ, x) < λ }.

Proof. It is enough to observe that

H* = inf_{ϕ∈C^1(T^n)} sup_{(x,v)∈Ω} ( −vD_x ϕ − L ) = inf_{ϕ∈C^1(T^n)} sup_{x∈T^n} H(D_x ϕ, x). □

Theorem 139. H* is the only value for which H(D_x u, x) = H* admits a periodic viscosity solution.

Proof. Let u be a periodic viscosity solution of H(D_x u, x) = H̄. We claim that there is no C^1 function ψ such that

(148)   sup_x H(D_x ψ, x) < H̄.

By contradiction, let ψ be a function satisfying (148). Since u and ψ are periodic functions, there exists a point x_0 at which u − ψ has a local minimum. But then H(D_x ψ, x_0) ≥ H̄, which is a contradiction. Thus, we conclude that H* ≥ H̄.
To prove the other inequality, consider a standard mollifier η_ε and define u_ε = η_ε ∗ u. Then

H(D_x u_ε, x) ≤ H̄ + h(ε, x),

where

h(ε, x) = sup_{|p|≤R} sup_{|x−y|≤ε} |H(p, x) − H(p, y)|,

and R is an estimate for the Lipschitz constant of u. Let

H^ε = H̄ + sup_x h(ε, x).

Then u_ε satisfies H(D_x u_ε, x) ≤ H^ε. Therefore

H* ≤ lim_{ε→0} H^ε = H̄.

Consequently H* = H̄. □
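As a quick numerical illustration of the inf-sup characterization in Theorem 138 (a sketch, with an assumed Hamiltonian H(p, x) = p^2/2 + cos(2πx) on T^1, not an example from the text), one can minimize, over a truncated Fourier basis for ϕ, the grid maximum of H(ϕ'(x), x). For this H, every ϕ satisfies sup_x H(ϕ'(x), x) ≥ H(ϕ'(0), 0) ≥ 1, while ϕ ≡ 0 attains the value 1, so the optimizer should return H* ≈ 1; Nelder-Mead is used because a pointwise maximum is not smooth.

```python
import numpy as np
from scipy.optimize import minimize

x = np.arange(400) / 400.0                     # grid on T^1
K = 6                                          # Fourier modes for phi

def sup_H(c):
    # phi'(x) for phi with sine coefficients c[:K], cosine coefficients c[K:]
    dphi = sum(2 * np.pi * (k + 1) * (c[k] * np.cos(2 * np.pi * (k + 1) * x)
                                      - c[K + k] * np.sin(2 * np.pi * (k + 1) * x))
               for k in range(K))
    return np.max(0.5 * dphi**2 + np.cos(2 * np.pi * x))  # sup_x H(D_x phi, x)

rng = np.random.default_rng(0)
res = minimize(sup_H, 0.1 * rng.standard_normal(2 * K), method="Nelder-Mead",
               options={"maxiter": 40000, "fatol": 1e-10})
print("H* ~", res.fun)                         # expected to approach 1
```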

4.1. Regularity. In this section we present (with small adaptations) the regularity results of [EG01] for viscosity solutions in the support of the Mather measures. We should point out that the proofs of Theorems 141-147 presented here appeared in [EG01]. For the setting of this survey, we had to add an elementary lemma, Lemma 140, for the presentation to be self-contained, as our definition of Mather measures differs from the one used in [EG01].

Lemma 140. Let µ be a minimizing holonomic measure. Then

∫_{T^d×R^d} D_x L(x, v) dµ = 0.

Proof. Let h ∈ R^d, and consider the measure µ_h on T^d × R^d given by

∫_{T^d×R^d} φ(x, v) dµ_h = ∫_{T^d×R^d} φ(x + h, v) dµ,

for all continuous and compactly supported functions φ : T^d × R^d → R. Clearly, for every h, µ_h is holonomic. Since µ is minimizing, it follows that

d/dε|_{ε=0} ∫ L(x + εh, v) dµ = 0,

that is,

∫_{T^d×R^d} D_x L(x, v)·h dµ = 0.

Since h ∈ R^d is arbitrary, the statement of the Lemma follows. □

It will be convenient to define the measure µ̃ on T^d × R^d as the push-forward of the measure µ with respect to the one-to-one map (v, x) ↦ (p, x), where p = D_v L(x, v). In other words, we define the measure µ̃ on T^d × R^d by

∫_{T^d×R^d} φ(x, p) dµ̃ = ∫_{T^d×R^d} φ(x, D_v L(x, v)) dµ.

We also define the projection µ̄ in T^d of a measure µ in T^d × R^d as

∫_{T^d} ϕ(x) dµ̄(x) = ∫_{T^d×R^d} ϕ(x) dµ(x, v).

Note that, in a similar way, µ̄ is also the projection of the measure µ̃. Observe that for any smooth function ϕ(x) the measure µ̃ satisfies the following version of the holonomy condition:

∫_{T^d×R^d} D_p H(p, x) D_x ϕ(x) dµ̃ = 0,

because we can use identity (??) if p = D_v L(x, v).

Theorem 141. Let u be any viscosity solution of (132), and let µ be any minimizing holonomic measure. Then, µ̄-almost everywhere, D_x u(x) exists, and p = D_x u(x), µ̃-almost everywhere.

Proof. Let u be any viscosity solution of (132). Let η_ε be a standard mollifier and u_ε = η_ε ∗ u. By strict uniform convexity there exists γ > 0 such that for any p, q ∈ R^d and any x ∈ T^d we have

H(p, x) ≥ H(q, x) + D_p H(q, x)(p − q) + (γ/2)|p − q|^2.

By Theorem 131, any viscosity solution of (132), and in particular u, is Lipschitz. Recall that, by Rademacher's theorem [Eva98a], a locally Lipschitz function is differentiable Lebesgue almost everywhere. Using p = D_x u(y) and q = D_x u_ε(x), we conclude that for every point x and for Lebesgue almost every point y,

H(D_x u(y), x) ≥ H(D_x u_ε(x), x) + D_p H(D_x u_ε(x), x)(D_x u(y) − D_x u_ε(x)) + (γ/2)|D_x u_ε(x) − D_x u(y)|^2.

Multiplying the previous inequality by η_ε(x − y) and integrating over R^d in y yields

H(D_x u_ε(x), x) + (γ/2) ∫_{R^d} η_ε(x−y)|D_x u_ε(x) − D_x u(y)|^2 dy ≤ ∫_{R^d} η_ε(x−y) H(D_x u(y), x) dy ≤ H̄ + O(ε).

Let

β_ε(x) = (γ/2) ∫_{R^d} η_ε(x−y)|D_x u_ε(x) − D_x u(y)|^2 dy.

Now observe that

(γ/2) ∫_{T^d×R^d} |D_x u_ε(x) − p|^2 dµ̃ ≤ ∫_{T^d×R^d} [ H(D_x u_ε(x), x) − H(p, x) − D_p H(p, x)(D_x u_ε(x) − p) ] dµ̃ ≤ ∫_{T^d×R^d} H(D_x u_ε(x), x) dµ̃ − H̄,

because

∫_{T^d×R^d} D_p H(p, x) D_x u_ε(x) dµ̃ = 0,

pD_p H(p, x) − H(p, x) = L(x, D_p H(p, x)), and ∫_{T^d×R^d} L(x, D_p H(p, x)) dµ̃ = −H̄. Therefore,

(γ/2) ∫_{T^d×R^d} |D_x u_ε(x) − p|^2 dµ̃ + ∫_{T^d} β_ε(x) dµ̄ ≤ O(ε).

Thus, for µ̄-almost every point x, β_ε(x) → 0. Therefore, µ̄-almost every point is a point of approximate continuity of D_x u (see [EG92], p. 49).

Since u is semiconcave (Proposition ??), it is differentiable at points of approximate continuity. Furthermore, D_x u_ε → D_x u pointwise, µ̄-almost everywhere, and so D_x u is µ̄-measurable. Also, we have p = D_x u(x), µ̃-almost everywhere. □

By looking at the proof of the previous theorem we can also state the following useful result:

Corollary 142. Let η_ε be a standard mollifier, u_ε = η_ε ∗ u. Then

∫_{T^d} |D_x u_ε − D_x u|^2 dµ̄ ≤ Cε,

as ε → 0.

As a Corollary we formulate an equivalent form of Theorem 141.

Corollary 143. Let u be any viscosity solution of (132), and let µ be any minimizing holonomic measure. Then, µ-almost everywhere, D_x u(x) exists and

(149)   D_v L(v, x) = D_x u(x),   µ-almost everywhere,

and

(150)   D_x L(v, x) = −D_x H(D_x u(x), x),   µ-almost everywhere.

Proof. First we observe that the measure µ̃ is the push-forward of the measure µ with respect to the one-to-one map (v, x) ↦ (p, x), where p = D_v L(v, x). Therefore a µ̃-almost everywhere identity

F_1(p, x) = F_2(p, x),   (p, x)-µ̃ almost everywhere,

implies the µ-almost everywhere identity

F_1(D_v L(v, x), x) = F_2(D_v L(v, x), x),   (v, x)-µ almost everywhere.

Thus (149) follows directly from Theorem 141. Using (149) and the identity D_x L(v, x) = −D_x H(D_v L(v, x), x), we arrive at (150). □

We observe that from the previous corollary it also follows that

∫_{T^d} D_p H(D_x u, x) D_x u dµ̄ = 0.

Indeed,

∫_{T^d} D_p H(D_x u, x) D_x u dµ̄ = ∫_{T^d} D_p H(D_x u, x) D_x u_ε dµ̄ + ∫_{T^d} D_p H(D_x u, x)(D_x u − D_x u_ε) dµ̄.

We have

∫_{T^d} D_p H(D_x u, x) D_x u_ε dµ̄ = 0.

To handle the second term, fix δ > 0. Then

∫_{T^d} D_p H(D_x u, x)(D_x u − D_x u_ε) dµ̄ ≤ δ ∫_{T^d} |D_p H(D_x u, x)|^2 dµ̄ + (1/δ) ∫_{T^d} |D_x u − D_x u_ε|^2 dµ̄.

Note that, since u is Lipschitz, the term D_p H(D_x u, x) is bounded, and so is ∫_{T^d} |D_p H(D_x u, x)|^2 dµ̄. Send ε → 0, and then let δ → 0.

Theorem 144. Let u be any viscosity solution of (132), and let µ be any minimizing holonomic measure. Then

∫_{T^d} |D_x u(x + h) − D_x u(x)|^2 dµ̄ ≤ C|h|^2.

Proof. Applying Theorem ?? we have

H(D_x u_ε(x + h), x + h) ≤ H̄ + Cε.

By Theorem 141, the derivative D_x u(x) exists µ̄-almost everywhere. By Proposition ??, a viscosity solution satisfies equation (132) in the classical sense at all points of differentiability. Thus H(D_x u(x), x) = H̄ for µ̄-almost all points x. Now observe that

Cε ≥ H(D_x u_ε(x + h), x + h) − H(D_x u(x), x) = H(D_x u_ε(x + h), x + h) − H(D_x u_ε(x + h), x) + H(D_x u_ε(x + h), x) − H(D_x u(x), x).

The term

H(D_x u_ε(x + h), x + h) − H(D_x u_ε(x + h), x) = D_x H(D_x u_ε(x + h), x)·h + O(h^2)
   = D_x H(D_x u(x), x)·h + O( h^2 + |h| |D_x u_ε(x + h) − D_x u(x)| )
   ≥ D_x H(D_x u(x), x)·h + O(h^2) − (γ/4)|D_x u_ε(x + h) − D_x u(x)|^2.

Therefore, for µ̄-almost every x, we have

H(D_x u_ε(x+h), x) − H(D_x u, x) ≤ Cε − D_x H(D_x u(x), x)·h + (γ/4)|D_x u_ε(x+h) − D_x u(x)|^2 + Ch^2.

Since

H(D_x u_ε(x+h), x) − H(D_x u, x) ≥ (γ/2)|D_x u_ε(x+h) − D_x u(x)|^2 + D_p H(D_x u, x)(D_x u_ε(x+h) − D_x u(x)),

we have

(γ/4) ∫ |D_x u_ε(x+h) − D_x u(x)|^2 dµ̄ ≤ Cε + C|h|^2 − ∫ D_x H(D_x u(x), x)·h dµ̄.

By (150) and Lemma 140 it follows that

∫ D_x H(D_x u(x), x)·h dµ̄ = −∫ D_x L(v, x)·h dµ = 0.

As ε → 0, through a suitable subsequence (since D_x u_ε(x+h) is bounded in L^2_µ̄), we may assume that D_x u_ε(x+h) ⇀ ξ(x) in L^2_µ̄, for some function ξ ∈ L^2_µ̄, and

∫ |ξ − D_x u|^2 dµ̄ ≤ C|h|^2.

Finally, we claim that ξ(x) = D_x u(x + h) for µ̄-almost all x. This follows from Theorem 141 and the fact that for µ̄-almost all x we have ξ(x) ∈ D_x^− u(x + h), where D_x^− stands for the subdifferential. To see this, observe that by Proposition ?? u is semiconcave, and therefore the u_ε are uniformly semiconcave, that is,

u_ε(y + h) ≤ u_ε(x + h) + D_x u_ε(x + h)(y − x) + C|y − x|^2,

where C is independent of ε. Fixing y and integrating against a nonnegative function ϕ(x) ∈ L^2_µ̄ yields

∫_{T^d} [ u_ε(y + h) − u_ε(x + h) − D_x u_ε(x + h)(y − x) − C|y − x|^2 ] ϕ(x) dµ̄ ≤ 0.

By passing to the limit we have that

u(y + h) ≤ u(x + h) + ξ(x)(y − x) + C|y − x|^2   for all y and µ̄-almost all x,

that is, ξ(x) ∈ D_x^− u(x + h) for µ̄-almost all x. □

Lemma 145. Let u be any viscosity solution of (132), and let µ be any minimizing holonomic measure. Let ψ : T^d × R → R be a smooth function. Then

∫_{T^d} D_p H(D_x u, x) D_x [ ψ(x, u(x)) ] dµ̄ = 0.

Proof. Clearly we have

∫_{T^d} D_p H(D_x u, x) D_x [ ψ(x, u_ε(x)) ] dµ̄ = 0.

By the uniform convergence of u_ε to u, and the L^2_µ̄ convergence of D_x u_ε to D_x u (see Corollary 142), we get the result. □

Theorem 146. Let u be any viscosity solution of (132), and let µ be any minimizing holonomic measure. Then, for µ̄-almost every x and all h ∈ R^d,

|u(x + h) − 2u(x) + u(x − h)| ≤ C|h|^2.

Proof. Let h ≠ 0 and define

ũ(x) = u(x + h),

û(x) = u(x − h).

Consider the mollified functions ũ_ε, û_ε, where we take

(151)   0 < ε ≤ η|h|^2,

for small η > 0. We have

H(Dũ_ε, x + h) ≤ H̄ + Cε,   H(Dû_ε, x − h) ≤ H̄ + Cε.

For µ̄-almost every point x (for which Du(x) exists and therefore H(Du(x), x) = H̄) we have

H(Dũ_ε, x) − 2H(Du, x) + H(Dû_ε, x) ≤ 2Cε + H(Dũ_ε, x) − H(Dũ_ε, x+h) + H(Dû_ε, x) − H(Dû_ε, x−h).

Hence

(γ/2)( |Dũ_ε − Du|^2 + |Dû_ε − Du|^2 ) + D_p H(Du, x)·(Dũ_ε − 2Du + Dû_ε)
   ≤ C(ε + |h|^2) + ( D_x H(Dû_ε, x) − D_x H(Dũ_ε, x) )·h.

Using the inequality

( D_x H(p, x) − D_x H(q, x) )·h ≤ ‖∂²H/∂p∂x‖ |p − q| |h| ≤ (γ/4)|p − q|^2 + (1/γ)‖∂²H/∂p∂x‖^2 |h|^2,

where ‖∂²H/∂p∂x‖ = sup_{p,x} sup_{|z|=1,|h|=1} Σ_{i,j} z_j h_i (∂²H/∂p_j∂x_i)(p, x), we arrive at

(γ/4)( |Dũ_ε − Du|^2 + |Dû_ε − Du|^2 ) + D_p H(Du, x)·(Dũ_ε − 2Du + Dû_ε) ≤ C(ε + |h|^2).

Fix now a smooth, nonnegative function Φ : R → R, and write φ := Φ( (ũ_ε − 2u + û_ε)/|h|^2 ) ≥ 0. Multiply the last inequality above by φ and integrate with respect to µ̄:

(152)   (γ/4) ∫_{T^d} ( |Dũ_ε − Du|^2 + |Dû_ε − Du|^2 ) φ( (ũ_ε − 2u + û_ε)/|h|^2 ) dµ̄
          + ∫_{T^d} D_p H(Du, x)·(Dũ_ε − 2Du + Dû_ε) φ(···) dµ̄
          ≤ C(ε + |h|^2) ∫_{T^d} φ(···) dµ̄.

Now the second term on the left-hand side of (152) equals

(153)   |h|^2 ∫_{T^d} D_p H(Du, x)·D_x [ Φ( (ũ_ε − 2u + û_ε)/|h|^2 ) ] dµ̄,

and thus, by Lemma 145, it vanishes. So now dropping the above term from (152) and rewriting, we deduce

(154)   ∫_{T^d} |Du_ε(x+h) − Du_ε(x−h)|^2 φ( (u_ε(x+h) − 2u(x) + u_ε(x−h))/|h|^2 ) dµ̄
          ≤ C(ε + |h|^2) ∫_{T^d} φ( (u_ε(x+h) − 2u(x) + u_ε(x−h))/|h|^2 ) dµ̄.

We confront now a technical problem, as (154) entails a mixture of first-order difference quotients for Du_ε and second-order difference quotients for u, u_ε. We can however relate these expressions, since u is semiconcave. To see this, first of all define

(155)   E_ε := { x ∈ supp(µ̄) : u_ε(x + h) − 2u(x) + u_ε(x − h) ≤ −κ|h|^2 },

the large constant κ > 0 to be fixed below. The functions

(156)   ū(x) := u(x) − (α/2)|x|^2,   ū_ε(x) := u_ε(x) − (α/2)|x|^2

are concave. Also, a point x ∈ supp(µ̄) belongs to E_ε if and only if

(157)   ū_ε(x + h) − 2ū(x) + ū_ε(x − h) ≤ −(κ + α)|h|^2.

Set

(158)   f_ε(s) := ū_ε( x + s h/|h| )   (−|h| ≤ s ≤ |h|).

Then f_ε is concave, and

ū_ε(x + h) − 2ū_ε(x) + ū_ε(x − h) = f_ε(|h|) − 2f_ε(0) + f_ε(−|h|)
   = ∫_{−|h|}^{|h|} f_ε''(s)(|h| − |s|) ds
   ≥ |h| ∫_{−|h|}^{|h|} f_ε''(s) ds   (since f_ε'' ≤ 0)
   = |h| [ f_ε'(|h|) − f_ε'(−|h|) ]
   = ( Dū_ε(x + h) − Dū_ε(x − h) )·h.

Consequently, if x ∈ E_ε, this inequality and (157) together imply

2|ū_ε(x) − ū(x)| + |Dū_ε(x + h) − Dū_ε(x − h)| |h| ≥ (κ + α)|h|^2.

Now |ū_ε(x) − ū(x)| ≤ Cε on T^d, since u is Lipschitz continuous. We may therefore take η in (151) small enough to deduce from the foregoing that

(159)   |Dū_ε(x + h) − Dū_ε(x − h)| ≥ (κ/2 + α)|h|.

But then

(160)   |Du_ε(x + h) − Du_ε(x − h)| ≥ (κ/2 − α)|h|.

Return now to (154). Take κ > 2α and

φ(z) = 1 if z ≤ −κ,   φ(z) = 0 if z > −κ.

The inequality (154) was derived for smooth functions φ. However, by replacing φ in (154) by a sequence φ_n of smooth functions increasing pointwise to φ, and using the monotone convergence theorem, we conclude that (154) holds for this function φ. Then we discover from (154) that

(κ/2 − α)^2 |h|^2 µ̄(E_ε) ≤ C(ε + |h|^2) µ̄(E_ε).

We fix κ so large that

(κ/2 − α)^2 ≥ C + 1,

to deduce

(|h|^2 − Cε) µ̄(E_ε) ≤ 0.

Thus µ̄(E_ε) = 0 if η in (151) is small enough, and this means

u_ε(x + h) − 2u(x) + u_ε(x − h) ≥ −κ|h|^2

for µ̄-almost every point x. Now let ε → 0:

u(x + h) − 2u(x) + u(x − h) ≥ −κ|h|^2   µ̄-almost everywhere.

Since u(x + h) − 2u(x) + u(x − h) ≤ α|h|^2, owing to the semiconcavity, we have

|u(x + h) − 2u(x) + u(x − h)| ≤ C|h|^2

for µ̄-almost every point x. As u is continuous, the same inequality obtains for all x ∈ supp(µ̄). □

Now we state and prove the main result of this section.

Theorem 147. Let u be any viscosity solution of (132), and let µ be any minimizing holonomic measure. Then, for µ̄-almost every x, D_x u(x) exists, and for Lebesgue almost every y,

(161)   |D_x u(x) − D_x u(y)| ≤ C|x − y|.

Proof. First we show that

(162)   |u(y) − u(x) − (y − x)·D_x u(x)| ≤ C|x − y|^2.

Fix y ∈ R^d and take any point x ∈ supp(µ̄) at which u is differentiable. According to Theorem 146 with h := y − x, we have

(163)   |u(y) − 2u(x) + u(2x − y)| ≤ C|x − y|^2.

By semiconcavity, we have

(164)   u(y) − u(x) − Du(x)·(y − x) ≤ C|x − y|^2,

and also

(165)   u(2x − y) − u(x) − Du(x)·(2x − y − x) ≤ C|x − y|^2.

Use (165) in (163):

u(y) − u(x) − Du(x)·(y − x) ≥ −C|x − y|^2.

This and (164) establish (162).
Estimate (161) follows from (162), as follows. Take x, y as above. Let z be a point to be selected later, with |x − z| ≤ 2|x − y|. The semiconcavity of u implies that

(166)   u(z) ≤ u(y) + Du(y)·(z − y) + C|z − y|^2.

Also,

u(z) = u(x) + Du(x)·(z − x) + O(|x − z|^2),   u(y) = u(x) + Du(x)·(y − x) + O(|x − y|^2),

according to (162). Insert these identities into (166) and simplify:

( Du(x) − Du(y) )·(z − y) ≤ C|x − y|^2.

Now take

z := y + |x − y| ( Du(x) − Du(y) ) / |Du(x) − Du(y)|

to deduce (161).

Now take any point x ∈ supp(µ̄), and fix y. There exist points x_k ∈ supp(µ̄) (k = 1, 2, ...) such that x_k → x and u is differentiable at x_k. According to estimate (162),

|u(y) − u(x_k) − Du(x_k)·(y − x_k)| ≤ C|x_k − y|^2   (k = 1, 2, ...).

The constant C does not depend on k or y. Now let k → ∞. Owing to (161) we see that {Du(x_k)} converges to some vector η, for which

|u(y) − u(x) − η·(y − x)| ≤ C|x − y|^2.

Consequently u is differentiable at x and Du(x) = η. □

It follows from Theorem 147 that the function v defined by Theorem ?? is Lipschitz on a set of full measure µ̄. Indeed, by substituting the left-hand side and the right-hand side of (149) into H_p(p, x) = H_p(p, x) in place of the p's, and using (??), we have

v(x) = D_p H(Du(x), x)   µ̄-almost everywhere.

We can then extend v as a Lipschitz function to the support of µ̄, which is contained in the closure of this set of full measure. Note that any Lipschitz function ϕ defined on a closed set K can be extended to a globally defined Lipschitz function ϕ̂ in the following way: without loss of generality assume that Lip(ϕ) = 1; define

ϕ̂(x) = inf_{y∈K} ( ϕ(y) + 2 d(x, y) ).

An easy exercise then shows that ϕ̂ = ϕ in K and that ϕ̂ is Lipschitz. Therefore we may assume that v is globally defined and Lipschitz.
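The extension formula admits a direct numerical check. The sketch below is illustrative only: it assumes a finite set K ⊂ R, the distance d(x, y) = |x − y|, and 1-Lipschitz data, and verifies that ϕ̂ agrees with ϕ on K and has slope at most 2.

```python
import numpy as np

K = np.array([0.0, 0.3, 0.35, 0.9])            # a closed (here finite) set in R
phi = np.sin(K)                                # 1-Lipschitz data on K

def phi_hat(x):
    # the extension formula from the text, with d(x, y) = |x - y|
    return np.min(phi + 2.0 * np.abs(x - K))

xs = np.linspace(-1.0, 2.0, 2001)
vals = np.array([phi_hat(t) for t in xs])
assert all(abs(phi_hat(y) - p) < 1e-12 for y, p in zip(K, phi))  # phi_hat = phi on K
print("max slope ~", np.max(np.abs(np.diff(vals))) / (xs[1] - xs[0]))  # <= 2
```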

4.2. Holonomy variations. In this section we study a class of variations that preserve the holonomy constraint. These variations will be used later to establish the invariance under the Euler-Lagrange flow of minimizing holonomic measures.
Let ξ : T^d → R^d be a C^1 vector field on T^d. Let Φ(t, x) be the flow by ξ, i.e.,

Φ(0, x) = x,   and   ∂Φ(t, x)/∂t = ξ( Φ(t, x) ).

Consider the prolongation of ξ to T^d × R^d, which is the vector field on T^d × R^d given by

(167)   ẋ_k(x, v) = ξ_k(x),   v̇_k(x, v) = v_i ∂ξ_k/∂x_i (x).

Lemma 148. The flow of (167) is given by

(168)   X_k(t, x, v) = Φ_k(t, x),   V_k(t, x, v) = v_s ∂Φ_k/∂x_s (t, x).

Proof. Since the X-part of the flow coincides with the Φ-flow, it only remains to show that

V(0, x, v) = v,   and   ∂V(t, x, v)/∂t = v̇( X(t, x, v), V(t, x, v) ).

The first statement (V(0, x, v) = v) is clear, since the map x ↦ Φ(0, x) is the identity map. The second statement can be rewritten as

∂V_k(t, x, v)/∂t = V_i(t, x, v) ∂ξ_k/∂x_i |_{Φ(t,x)}.

A simple computation yields

∂V_k(t, x, v)/∂t = v_s ∂/∂x_s [ ∂Φ_k(t, x)/∂t ] = v_s ∂/∂x_s [ ξ_k(Φ(t, x)) ]
   = v_s ∂ξ_k/∂x_i |_{Φ(t,x)} ∂Φ_i/∂x_s |_{(t,x)} = V_i(t, x, v) ∂ξ_k/∂x_i |_{Φ(t,x)},

which is the desired identity. □

For any real number t and any function ψ(x, v), define a new function ψ_t as follows:

(169)   ψ_t(x, v) = ψ( X(t, x, v), V(t, x, v) ).

Thus the flow (168) generates the flow on the space of functions ψ(x, v) given by (169).

Lemma 149. The set C, defined in (??), is invariant under the flow given by (169).

Proof. Let g ∈ C^1(T^d) be such that ψ(x, v) = v_i ∂g/∂x_i (x). Let g_t denote the flow by Φ of the function g, i.e., g_t(x) = g(Φ(t, x)).

We claim that for any real number t we have

ψ_t(x, v) = v_i ∂g_t/∂x_i (x),

where ψ_t is given by (169). Indeed,

ψ_t(x, v) = V_k(t, x, v) ∂g/∂x_k |_{X(t,x,v)} = v_s ∂Φ_k/∂x_s |_{(t,x)} ∂g/∂x_k |_{Φ(t,x)} = v_s ∂/∂x_s [ g(Φ(t, x)) ] = v_s ∂g_t/∂x_s (x),

and so the Lemma is proven. □

The flow on functions (169) generates the flow on measures (t, µ) ↦ µ_t, where

(170)   ∫ ψ dµ_t = ∫ ψ_t dµ.

Lemma 150. The flow (170) preserves the holonomy constraint.

Proof. Let µ be a holonomic measure. We have to prove that µ_t is also holonomic, i.e., that ∫ ψ dµ_t = 0 for any ψ ∈ C. This is clear since the flow (169) preserves the set C. □

Theorem 151. Let µ be a minimizing measure for the action (135), subject to the holonomy constraint. Then for any C^1 vector field ξ : T^d → R^d we have

(171)   ∫ ( ∂L/∂x_s ξ_s + ∂L/∂v_s v_k ∂ξ_s/∂x_k ) dµ = 0.

Proof. Let µ_t be the flow generated from µ by (170). Relation (171) expresses the fact that d/dt|_{t=0} ∫ L(x, v) dµ_t = 0. □

4.3. Invariance. In this section we present a new proof of the invariance of minimal holonomic measures under the Euler-Lagrange flow. In what follows, ( )^{-1}_{js} denotes the j, s entry of the inverse matrix. We will only use this notation for symmetric matrices; thus, this notation will not lead to any ambiguity. Before stating and proving the main Theorem of this section, we will prove an auxiliary lemma.

Lemma 152. Let µ be a minimal holonomic measure. Let v_ε(x) be any smooth function. Let φ(x, v) be any smooth compactly supported function. Then

(172)   ∫ [ v_k ∂φ/∂x_k (x, v_ε(x)) + ∂φ/∂v_j (x, v_ε(x)) (∂²L/∂²v)^{-1}_{js} (x, v_ε(x)) ( ∂L/∂x_s (x, v) − v_k ∂²L/∂x_k∂v_s (x, v_ε(x)) ) ] dµ
        = ∫ v_k ∂/∂x_k [ φ(x, v_ε(x)) ] dµ − ∫ v_k ∂/∂x_k [ ∂L/∂v_s (x, v_ε(x)) Ẋ_s ] dµ + ∫ ( ∂L/∂v_s (x, v_ε(x)) − ∂L/∂v_s (x, v) ) v_k ∂Ẋ_s/∂x_k dµ,

where Ẋ_s is a function of x only (it does not depend on v), defined as follows:

Ẋ_s(x) = (∂²L/∂²v)^{-1}_{js} (x, v_ε(x)) ∂φ/∂v_j (x, v_ε(x)).

Remark. We will only use this lemma for the case when v_ε is the standard smoothing of the function v(x), that is, v_ε = η_ε ∗ v, where η_ε is a standard mollifier. The function v(x) is the function whose graph contains the support of µ, given in Theorem ??. This explains the notation v_ε.

Proof. This Lemma is based on Theorem 151. In this proof and below, v_ε stands for the function v_ε(x). We have

v_k ∂φ/∂x_k (x, v_ε(x)) = v_k ∂/∂x_k [ φ(x, v_ε(x)) ] − v_k ∂φ/∂v_j (x, v_ε(x)) ∂(v_ε)_j/∂x_k (x).

Rewrite the last term:

v_k ∂φ/∂v_j (x, v_ε) ∂(v_ε)_j/∂x_k (x) = v_k ∂φ/∂v_j (x, v_ε) (∂²L/∂²v)^{-1}_{js} (x, v_ε) ∂²L/∂v_s∂v_q (x, v_ε) ∂(v_ε)_q/∂x_k (x) = v_k Ẋ_s(x) ∂²L/∂v_s∂v_q (x, v_ε) ∂(v_ε)_q/∂x_k (x).

Plug these two lines into (172); we therefore reduce (172) to

(173)   ∫ Ẋ_s(x) [ ∂L/∂x_s (x, v) − v_k ( ∂²L/∂x_k∂v_s (x, v_ε) + ∂²L/∂v_s∂v_q (x, v_ε) ∂(v_ε)_q/∂x_k ) ] dµ
        = − ∫ v_k ∂/∂x_k [ ∂L/∂v_s (x, v_ε) Ẋ_s ] dµ + ∫ ( ∂L/∂v_s (x, v_ε) − ∂L/∂v_s (x, v) ) v_k ∂Ẋ_s/∂x_k dµ.

Using the chain rule in the LHS and the Leibniz rule in the RHS, we further reduce (173) to

∫ Ẋ_s [ ∂L/∂x_s (x, v) − v_k ∂/∂x_k ( ∂L/∂v_s (x, v_ε) ) ] dµ = − ∫ v_k Ẋ_s ∂/∂x_k ( ∂L/∂v_s (x, v_ε) ) dµ − ∫ v_k ∂Ẋ_s/∂x_k ∂L/∂v_s (x, v) dµ.

Noting the cancellation of the term ∫ v_k Ẋ_s ∂/∂x_k ( ∂L/∂v_s (x, v_ε) ) dµ, we see that the last identity is equivalent to (171) with ξ_s(x) = Ẋ_s(x). □

Theorem 153. Let µ be a minimizing holonomic measure. Then µ is invariant under the Euler-Lagrange flow.

Proof. By Lemma 59, we have to prove that for any smooth compactly supported function φ(x, v),

(174)   ∫ [ v_k ∂φ/∂x_k + ∂φ/∂v_j (∂²L/∂²v)^{-1}_{js} ( ∂L/∂x_s − v_k ∂²L/∂x_k∂v_s ) ] dµ = 0,

where ( )^{-1}_{js} stands for the j, s entry of the inverse matrix. The idea of the proof is first to rewrite (174) in an equivalent form and then apply an approximation argument. Since µ is supported by the graph v = v(x), we may replace the arguments (x, v) by (x, v(x)) in the four types of functions ∂φ/∂x_k, ∂φ/∂v_j, (∂²L/∂²v)^{-1}_{js}, and ∂²L/∂x_k∂v_s occurring in (174):

(175)   ∫ [ v_k ∂φ/∂x_k (x, v(x)) + ∂φ/∂v_j (x, v(x)) (∂²L/∂²v)^{-1}_{js} (x, v(x)) ( ∂L/∂x_s (x, v) − v_k ∂²L/∂x_k∂v_s (x, v(x)) ) ] dµ = 0.

To complete the proof of the theorem, we use Lemma 152 with v_ε = η_ε ∗ v. The first and second integrals in the RHS of (172) are zero due to the holonomy constraint. The third integral in the RHS of (172) tends to zero as ε → 0, because |v_ε(x) − v(x)| < cε, and therefore |v_ε(x) − v| < cε µ-a.e., and because Ẋ_s is uniformly Lipschitz, and hence ∂Ẋ_s/∂x_k is uniformly bounded. Therefore the LHS of (172) tends to zero as ε → 0.

But the LHS of (172) also tends to the LHS of (175) as ε → 0. Indeed, since v(x) is a Lipschitz vector field, we have that v_ε(x) → v(x) uniformly and that ∂v_ε(x)/∂x is uniformly bounded. Moreover, for any smooth function Ψ(x, v), we have

Ψ(x, v_ε(x)) → Ψ(x, v(x))   (uniformly),

and ∂/∂x [ Ψ(x, v_ε(x)) ] is uniformly bounded. Also note that for µ-almost all (x, v) we have v = v(x). Therefore the Theorem is proven. □

5. Monge-Kantorowich problem

In this section we are going to study the Monge-Kantorowich problem. First we will show that there exists a solution.

Theorem 154. Let µ^± be two probability measures on R^n with

∫_{R^n} |x|^2 dµ^± < ∞.

Then there exists a measure µ which minimizes

(1/2) ∫_{R^{2n}} |x − y|^2 dµ(x, y)

over all probability measures µ on R^{2n} which satisfy µ|_x = µ^+ and µ|_y = µ^−.

Remark. The integrability condition ∫_{R^n} |x|^2 dµ^± < ∞ can be relaxed; see, for instance, [Vil03a].

Proof. Let µ_n be a minimizing sequence, that is,

(1/2) ∫_{R^{2n}} |x − y|^2 dµ_n → inf_µ (1/2) ∫_{R^{2n}} |x − y|^2 dµ.

Since the measures µ_n satisfy µ_n|_x = µ^+ and µ_n|_y = µ^−, we have

sup_n ∫_{R^{2n}} ( |x|^2 + |y|^2 ) dµ_n < ∞.

Consequently, the sequence µ_n is precompact; that is, through a subsequence, µ_n ⇀ µ, for some measure µ with the same marginals. Let c_k(x, y) be a sequence of compactly supported continuous functions such that c_k(x, y) increases pointwise to (1/2)|x − y|^2. Then, by the monotone convergence theorem,

(1/2) ∫_{R^{2n}} |x − y|^2 dµ = lim_{k→∞} ∫_{R^{2n}} c_k(x, y) dµ = lim_{k→∞} lim_{n→∞} ∫_{R^{2n}} c_k(x, y) dµ_n ≤ lim_{n→∞} (1/2) ∫_{R^{2n}} |x − y|^2 dµ_n,

from which we conclude that µ is a minimizer. □
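When the marginals are finitely supported, the problem in Theorem 154 becomes a finite linear program in the joint measure µ(x_i, y_j): the conditions µ|_x = µ^+ and µ|_y = µ^− are row and column sum constraints. The sketch below illustrates this; the point masses are assumed data, not taken from the text.

```python
import numpy as np
from scipy.optimize import linprog

x = np.array([0.0, 1.0, 2.0]); mu_plus  = np.array([0.2, 0.5, 0.3])
y = np.array([0.5, 1.5]);      mu_minus = np.array([0.6, 0.4])
c = 0.5 * (x[:, None] - y[None, :])**2         # cost |x - y|^2 / 2

m, n = len(x), len(y)
A_eq = np.zeros((m + n, m * n)); b_eq = np.concatenate([mu_plus, mu_minus])
for i in range(m): A_eq[i, i*n:(i+1)*n] = 1    # row sums: the x-marginal
for j in range(n): A_eq[m + j, j::n] = 1       # column sums: the y-marginal

res = linprog(c.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
print("optimal transport cost:", res.fun)
print(res.x.reshape(m, n))                     # a minimizing measure mu
```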

Exercise 182. Show that the dual of the Monge-Kantorowich problem consists in determining continuous functions φ(x) and ψ(y) such that

φ(x) + ψ(y) ≤ (1/2)|x − y|^2

and that maximize

∫_{R^n} φ(x) dµ^+(x) + ∫_{R^n} ψ(y) dµ^−(y).

Let φ and ψ be two admissible functions, that is,

φ(x) + ψ(y) ≤ (1/2)|x − y|^2.

Then

φ(x) − |x|^2/2 + ψ(y) − |y|^2/2 ≤ −x·y,

that is,

φ̃(x) + ψ̃(y) ≥ x·y,

with φ̃(x) = |x|^2/2 − φ(x) and ψ̃(y) = |y|^2/2 − ψ(y). On the other hand,

∫_{R^n} φ(x) dµ^+(x) + ∫_{R^n} ψ(y) dµ^−(y) = − ∫_{R^n} φ̃(x) dµ^+(x) − ∫_{R^n} ψ̃(y) dµ^−(y) + ∫_{R^n} |x|^2/2 dµ^+(x) + ∫_{R^n} |y|^2/2 dµ^−(y).

Let

Θ(φ̃, ψ̃) = − ∫_{R^n} φ̃(x) dµ^+(x) − ∫_{R^n} ψ̃(y) dµ^−(y).

Let

ψ̃*(x) = sup_y ( x·y − ψ̃(y) )

be the Legendre transform of ψ̃. The pair (ψ̃*, ψ̃) satisfies

Θ(φ̃, ψ̃) ≤ Θ(ψ̃*, ψ̃).

Applying a similar reasoning to the pair (ψ̃*, ψ̃), and replacing ψ̃(y) by

ψ̃**(y) = sup_x ( x·y − ψ̃*(x) ),

we obtain

Θ(ψ̃*, ψ̃) ≤ Θ(ψ̃*, ψ̃**).

Therefore, the dual of the Monge-Kantorowich problem is equivalent to minimizing

∫_{R^n} ψ̃*(x) dµ^+(x) + ∫_{R^n} ψ̃**(y) dµ^−(y)

over convex conjugate functions satisfying ψ̃*(x) + ψ̃**(y) ≥ x·y.
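The conjugation steps above can be reproduced on a grid. The following sketch starts from an assumed (not necessarily convex) ψ̃, computes ψ̃* and ψ̃** by discrete Legendre transforms, and checks the constraint ψ̃*(x) + ψ̃**(y) ≥ x·y, which holds by construction.

```python
import numpy as np

xs = np.linspace(-2, 2, 201); ys = np.linspace(-2, 2, 201)
psi = 0.5 * ys**2 + 0.3 * np.sin(3 * ys)       # an assumed psi_tilde

# psi*(x) = sup_y (x*y - psi(y));  psi**(y) = sup_x (x*y - psi*(x))
psi_star  = np.max(xs[:, None] * ys[None, :] - psi[None, :], axis=1)
psi_star2 = np.max(xs[:, None] * ys[None, :] - psi_star[:, None], axis=0)

gap = psi_star[:, None] + psi_star2[None, :] - xs[:, None] * ys[None, :]
print("min of psi*(x) + psi**(y) - x*y on the grid:", gap.min())  # >= 0
```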

Bibliography

[AKN97] V. I. Arnold, V. V. Kozlov, and A. I. Neishtadt. Mathematical aspects of classical and celestial mechanics. Springer-Verlag, Berlin, 1997. Translated from the 1985 Russian original by A. Iacob; reprint of the original English edition from the series Encyclopaedia of Mathematical Sciences [Dynamical systems. III, Encyclopaedia Math. Sci., 3, Springer, Berlin, 1993; MR 95d:58043a].
[Arn66] V. Arnold. Sur la géométrie différentielle des groupes de Lie de dimension infinie et ses applications à l'hydrodynamique des fluides parfaits. Ann. Inst. Fourier (Grenoble), 16(fasc. 1):319-361, 1966.
[Bar94] Guy Barles. Solutions de viscosité des équations de Hamilton-Jacobi. Springer-Verlag, Paris, 1994.
[BCD97] Martino Bardi and Italo Capuzzo-Dolcetta. Optimal control and viscosity solutions of Hamilton-Jacobi-Bellman equations. Systems & Control: Foundations & Applications. Birkhäuser Boston Inc., Boston, MA, 1997. With appendices by Maurizio Falcone and Pierpaolo Soravia.
[EG92] L. C. Evans and R. F. Gariepy. Measure theory and fine properties of functions. CRC Press, Boca Raton, FL, 1992.
[EG01] L. C. Evans and D. Gomes. Effective Hamiltonians and averaging for Hamiltonian dynamics. I. Arch. Ration. Mech. Anal., 157(1):1-33, 2001.
[Eva98a] L. C. Evans. Partial differential equations. American Mathematical Society, Providence, RI, 1998.
[Eva98b] Lawrence C. Evans. Partial differential equations, volume 19 of Graduate Studies in Mathematics. American Mathematical Society, Providence, RI, 1998.
[Eva99] Lawrence C. Evans. Partial differential equations and Monge-Kantorovich mass transfer. In Current developments in mathematics, 1997 (Cambridge, MA), pages 65-126. Int. Press, Boston, MA, 1999.
[Fra02] Joel N. Franklin. Methods of mathematical economics, volume 37 of Classics in Applied Mathematics. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2002. Linear and nonlinear programming, fixed-point theorems; reprint of the 1980 original.
[FS93] Wendell H. Fleming and H. Mete Soner. Controlled Markov processes and viscosity solutions, volume 25 of Applications of Mathematics. Springer-Verlag, New York, 1993.
[Gia83] Mariano Giaquinta. Multiple integrals in the calculus of variations and nonlinear elliptic systems, volume 105 of Annals of Mathematics Studies. Princeton University Press, Princeton, NJ, 1983.
[Gia93] Mariano Giaquinta. Introduction to regularity theory for nonlinear elliptic systems. Lectures in Mathematics ETH Zürich. Birkhäuser Verlag, Basel, 1993.
[Gol80] H. Goldstein. Classical mechanics. Addison-Wesley Publishing Co., Reading, Mass., second edition, 1980. Addison-Wesley Series in Physics.
[Gom00] D. Gomes. Viscosity Solutions and Asymptotics for Hamiltonian Systems. Ph.D. Thesis, University of California at Berkeley, 2000.
[Gom02a] D. Gomes. A stochastic analogue of Aubry-Mather theory. Nonlinearity, 15(3):581-603, 2002.
[Gom02b] Diogo Aguiar Gomes. A stochastic analogue of Aubry-Mather theory. Nonlinearity, 15(3):581-603, 2002.
[GSS08] D. Gomes, A. Sernadas, and C. Sernadas. Foundations and applications of linear optimization. Preprint, 2008.
[GT01] David Gilbarg and Neil S. Trudinger. Elliptic partial differential equations of second order. Classics in Mathematics. Springer-Verlag, Berlin, 2001. Reprint of the 1998 edition.
[Lio82] Pierre-Louis Lions. Generalized solutions of Hamilton-Jacobi equations. Pitman (Advanced Publishing Program), Boston, Mass., 1982.
[LL76] L. D. Landau and E. M. Lifshitz. Course of theoretical physics. Vol. 1: Mechanics. Pergamon Press, Oxford, third edition, 1976. Translated from the Russian by J. B. Sykes and J. S. Bell.
[Mat91] J. N. Mather. Action minimizing invariant measures for positive definite Lagrangian systems. Math. Z., 207(2):169-207, 1991.
[Mn96] Ricardo Mañé. Generic properties and problems of minimizing measures of Lagrangian systems. Nonlinearity, 9(2):273-310, 1996.
[Oli98] Waldyr Oliva. Geometric Mechanics. IST Lecture Notes, Lisbon, 1998.
[Vil] C. Villani. Optimal transportation, dissipative PDEs and functional inequalities.
[Vil03a] Cédric Villani. Topics in optimal transportation, volume 58 of Graduate Studies in Mathematics. American Mathematical Society, Providence, RI, 2003.
[Vil03b] Cédric Villani. Topics in optimal transportation, volume 58 of Graduate Studies in Mathematics. American Mathematical Society, Providence, RI, 2003.

Index

Campanato space, 172
canonical transformation, 83
Christoffel symbol, 61
coercivity in R^n, 11
condition, Legendre-Hadamard, 134
conjugate point, 97
connection: compatible with the metric, 66; Levi-Civita, 66; symmetric, 65
convex, 13; strictly, 13
critical point, 14, 112
critical point of the action, 51
curvature: sectional, 101
curvature tensor, 99
derivative, covariant, 65
Dynamic programming principle, 188
equation: Poisson, 130; Euler-Lagrange, 51; Monge-Ampère, 241
equations: Hamilton, 81
Euler-Lagrange equation, 129
generating function, 84
Harnack inequality, 166
invariant, Poincaré-Cartan, 82
Karush-Kuhn-Tucker (KKT) conditions, 38
Legendre transform, 76
Lemma, John-Nirenberg, 164
lower semicontinuity, 12
minimax principle, 28
minimizing sequence, 10
Morrey space, 172
Palais-Smale condition, 113
parallel transport, 64
Poisson manifold, 121
problem, Monge-Kantorowich, 236
quasiconvex, 140
regular point, 212
semiconcave, 200
semiconvex, 200
subdifferential, 23, 197
subsolution, 159
superdifferential, 197
supersolution, 159
symplectic manifold, 121
Theorem: DeGiorgi-Nash-Moser, 166; Fenchel-Legendre-Rockafellar, 242; Lax-Milgram, 150
torsion, 65
viscosity solution, 216
viscosity supersolution/subsolution, 215
weakly lower semicontinuity, 138