Calculus of Variations Notes
Peter J. Olver
School of Mathematics
University of Minnesota
Minneapolis, MN 55455
[email protected]
http://www.math.umn.edu/∼olver
Contents
1. Introduction
2. Examples of Variational Problems
     Minimal Curves, Optics, and Geodesics
     Minimal Surfaces
     Minimum Energy Principles
     Isoperimetric Problems and Constraints
3. The Euler–Lagrange Equation
     The First Variation
     Curves of Shortest Length — Planar Geodesics
     The Brachistochrone Problem
     Minimal Surface of Revolution
     The Fundamental Lemma
     A Cautionary Example
4. Boundary Conditions and Null Lagrangians
     Natural Boundary Conditions
     Null Lagrangians
     General Boundary Conditions
     Problems with Variable Endpoints
5. Variational Problems Involving Several Unknowns
     Boundary Conditions
     Parametric Variational Problems
6. Second Order Variational Problems
where we abbreviate u′ = du/dx. The function u(x) is required to satisfy the boundary
conditions
u(a) = α, u(b) = β, (2.4)
in order that its graph pass through the two prescribed points (2.1). The minimal curve
problem asks us to find the function y = u(x) that minimizes the arc length functional
(2.3) among all “reasonable” functions satisfying the prescribed boundary conditions. The
reader might pause to meditate on whether it is analytically obvious that the affine function
† We assume that a ≠ b, i.e., the points a, b do not lie on a common vertical line.
† Assuming time = money!
to the center line; and spiral helices, the latter illustrated in Figure 2. Similarly, the
geodesics on a sphere are arcs of great circles. In aeronautics, to minimize distance flown,
airplanes follow such geodesic paths around the globe, even though they look longer when
illustrated by projection onto a flat planar map. This example is particularly important.
Starting at a given point on the sphere, as one passes through the antipodal point, the
minimizing geodesic switches from one arc of the great circle to the arc on the opposite
side. At the antipodal point itself, there are infinitely many minimizing geodesics, namely,
all the great semicircles connecting the point to its antipode. The antipodal point is an example
of a “conjugate point”, which will be of importance when we study the second derivative
test in Section 8. Again, while these facts may sound eminently reasonable, an air-tight
justification of the correctness of our geometric intuition is required. Furthermore, the
geometrical characterization of geodesics on other surfaces is far less intuitively evident.
In order to mathematically formulate the geodesic minimization problem, we suppose,
for simplicity, that our surface S ⊂ R³ is realized as the graph† of a function z = F (x, y).
We seek the geodesic curve C ⊂ S that joins the given points
a = ( a, α, F (a, α) )   and   b = ( b, β, F (b, β) ),

both lying on the surface S. We assume that the curve C can be parametrized, for a ≤ x ≤ b, in the form

( x, u(x), v(x) ),   where   v(x) = F ( x, u(x) ),

where the last equation ensures that it lies in the surface S. In particular, this requires
a ≠ b. The length of the curve is supplied by the standard three-dimensional arc length
† Cylinders are not graphs, but can be placed within this framework by passing to cylindrical
coordinates. Similarly, spherical surfaces are best treated in spherical coordinates. One can extend
these constructions to general parametrized surfaces; see below.
Minimal Surfaces
The minimal surface problem is a natural generalization of the minimal curve or geodesic
problem. In its simplest manifestation, we are given a simple closed space curve C ⊂ R³.
The problem is to find the surface of least total area among all those whose boundary is
the curve C. Thus, we seek to minimize the surface area integral
area S = ∫∫_S dS   (2.9)

over all possible surfaces S ⊂ R³ that have the prescribed boundary curve ∂S = C. Such
an area–minimizing surface is known as a minimal surface for short. For example, if C is
a closed plane curve, e.g., a circle, then the minimal surface will just be the planar region
it encloses. But, if the curve C twists into the third dimension, then the shape of the
minimizing surface is by no means evident.
Physically, if we bend a wire in the shape of the curve C and then dip it into soapy
water, the surface tension forces in the resulting soap film will cause it to minimize surface
area, and hence take the form of a minimal surface† ; see Figure 3. Soap films and bubbles
have been the source of much fascination, physical, æsthetical and mathematical, over the
centuries, [27]. The minimal surface problem is also known as Plateau’s Problem, named
after the nineteenth century French physicist Joseph Plateau who conducted systematic
experiments on such soap films. A completely satisfactory mathematical solution to even
the simplest version of the minimal surface problem was only achieved in the mid twentieth
century, [38, 39]. Minimal surfaces and related variational problems remain an active area
of contemporary research, and are of importance in engineering design, architecture, and
biology, including foams, domes, cell membranes, and so on.
Let us mathematically formulate the search for a minimal surface as a problem in the
calculus of variations. For simplicity, we shall assume that the bounding curve C projects
down to a simple closed curve Γ that bounds an open domain Ω ⊂ R² in the (x, y) plane, as
in Figure 4. The space curve C ⊂ R³ is then given by z = g(x, y) for (x, y) ∈ Γ = ∂Ω. For
“reasonable” boundary curves C, we expect that the minimal surface S will be described as
the graph of a function z = u(x, y) parametrized by (x, y) ∈ Ω. According to multivariable
calculus, the surface area of such a graph is given by the double integral
J[ u ] = ∫∫_Ω √( 1 + (∂u/∂x)² + (∂u/∂y)² ) dx dy.   (2.10)
To find the minimal surface, then, we seek the function z = u(x, y) that minimizes the
surface area integral (2.10) when subject to the boundary conditions
u(x, y) = g(x, y) for (x, y) ∈ ∂Ω, (2.11)
† More accurately, the soap film will realize a local but not necessarily global minimum for
the surface area functional. Non-uniqueness of local minimizers can be realized in the physical
experiment — the same wire may support more than one stable soap film.
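To make the area functional (2.10) concrete, here is a short numerical sketch (my own illustration, not part of the notes; function names are arbitrary). It approximates J[ u ] for a graph z = u(x, y) over the unit square by the midpoint rule, with finite differences for the partial derivatives:

```python
import numpy as np

def surface_area(u, n=400):
    """Midpoint-rule approximation of the area functional (2.10),
    J[u] = integral over the unit square of sqrt(1 + u_x^2 + u_y^2)."""
    h = 1.0 / n
    x = (np.arange(n) + 0.5) * h
    X, Y = np.meshgrid(x, x, indexing="ij")
    # central finite differences for the partial derivatives
    eps = 1e-6
    ux = (u(X + eps, Y) - u(X - eps, Y)) / (2 * eps)
    uy = (u(X, Y + eps) - u(X, Y - eps)) / (2 * eps)
    return np.sqrt(1.0 + ux**2 + uy**2).sum() * h * h

# A flat graph has area exactly 1; the tilted plane u = x has area sqrt(2).
flat = surface_area(lambda X, Y: np.zeros_like(X))
tilted = surface_area(lambda X, Y: X)
print(flat, tilted)
```

For the flat graph u ≡ 0 the result is exactly 1, while the tilted plane u = x yields √2, the area of that planar piece, confirming that the integrand reduces to 1 when the graph is horizontal.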
We seek a minimizer of this integral among all non-negative functions u(x) that satisfy the
fixed boundary conditions u(a) = α, u(b) = β. The minimal surface of revolution can be
physically realized by stretching a soap film between two circular wires, of respective radius
α and β, that are held a distance b − a apart. Symmetry considerations will require the
minimizing surface to be rotationally symmetric. Interestingly, the surface of revolution
area functional (2.12) is exactly the same as the optical functional (2.5) when the light
speed at a point is inversely proportional to its distance from the horizontal axis:
v(x, y) = 1/(2 π y).
where T is the tension and f (x) the load at a point x. (For simplicity, we only consider
vertical displacements.) For small deflections, one can replace (2.13) by the simpler quadratic
functional

Q[ u ] = ∫_a^b ( ½ T u′² + f (x) u ) dx.   (2.14)
and thus we seek to maximize the latter integral subject to the arc length constraint (2.15).
We also impose periodic boundary conditions
† A curve is simple if it does not cross itself.
‡ The sign of the line integral depends upon the orientation of the curve.
In the previous formulation (2.15), the arc length constraint was imposed at every point,
whereas here it is manifested as an integral constraint.
Both types of constraints, pointwise and integral, appear in a wide range of applied and
geometrical problems. Such constrained variational problems can profitably be viewed as
function space versions of constrained optimization problems. Thus, not surprisingly, their
analytical solution will require the introduction of suitable Lagrange multipliers, cf. [4, 35];
see Section 7.
† At the endpoints a, b one uses the appropriate one-sided derivative.
Here ϕ(x) is a function that prescribes the “direction” in which the derivative is computed.
Classically, ϕ is known as a variation in the function u, sometimes written ϕ = δu, whence
the term “calculus of variations”. Similarly, the gradient operator on functionals is often
referred to as the variational derivative, and often written δJ. The inner product used in
(3.3) is usually taken (again for simplicity) to be the standard L2 inner product
⟨ f , g ⟩ = ∫_a^b f (x) g(x) dx   (3.4)
on function space. Indeed, while the formula for the gradient will depend upon the under-
lying inner product, the characterization of critical points does not, and so the underlying
choice of inner product is not of significance here.
Now, starting with (3.1), for each fixed u and ϕ, we must compute the derivative of the
function

h(ε) = J[ u + ε ϕ ] = ∫_a^b L(x, u + ε ϕ, u′ + ε ϕ′) dx.   (3.5)
Smoothness of the integrand allows us to bring the derivative inside the integral and so,
by the chain rule,
h′(ε) = d/dε J[ u + ε ϕ ] = ∫_a^b d/dε L(x, u + ε ϕ, u′ + ε ϕ′) dx

      = ∫_a^b [ ϕ ∂L/∂u (x, u + ε ϕ, u′ + ε ϕ′) + ϕ′ ∂L/∂p (x, u + ε ϕ, u′ + ε ϕ′) ] dx.   (3.6)
between some function h(x) = ∇J[ u ] and the variation ϕ. The first summand has this
form, but the derivative ϕ′ appearing in the second summand is problematic. However,
one can move derivatives around inside an integral through integration by parts. If we set
r(x) ≡ ∂L/∂p ( x, u(x), u′(x) ),   (3.7)
we can rewrite the offending term as
∫_a^b r(x) ϕ′(x) dx = r(b) ϕ(b) − r(a) ϕ(a) − ∫_a^b r′(x) ϕ(x) dx,   (3.8)
where, again by the chain rule,
r′(x) = d/dx [ ∂L/∂p (x, u, u′) ] = ∂²L/∂x ∂p (x, u, u′) + u′ ∂²L/∂u ∂p (x, u, u′) + u′′ ∂²L/∂p² (x, u, u′).   (3.9)
So far we have not imposed any conditions on our variation ϕ(x). We are only comparing
the values of J[ u ] among functions that satisfy the imposed boundary conditions (3.2).
Therefore, we must make sure that the varied function
uε (x) = u(x) + ε ϕ(x)
remains within this set of functions, and so
uε (a) = u(a) + ε ϕ(a) = α, uε (b) = u(b) + ε ϕ(b) = β,
for all ε. For this to hold, the variation ϕ(x) must satisfy the corresponding homogeneous
boundary conditions
ϕ(a) = 0, ϕ(b) = 0. (3.10)
As a result, both boundary terms in our integration by parts formula (3.8) vanish, and we
can write (3.6) as
⟨ ∇J[ u ] , ϕ ⟩ = ∫_a^b ∇J[ u ] ϕ dx = ∫_a^b ϕ [ ∂L/∂u (x, u, u′) − d/dx ∂L/∂p (x, u, u′) ] dx.   (3.11)
∂L/∂u (x, u, u′) − ∂²L/∂x ∂p (x, u, u′) − u′ ∂²L/∂u ∂p (x, u, u′) − u′′ ∂²L/∂p² (x, u, u′) = 0,   (3.14)
known as the Euler–Lagrange equation associated with the variational problem (3.1), in
honor of two of the most important contributors to the subject: Leonhard Euler and
Joseph–Louis Lagrange. Any solution to the Euler–Lagrange equation that is subject to
the assumed boundary conditions forms a critical point for the functional, and hence is a
potential candidate for the desired minimizing function. And, in many cases, the Euler–
Lagrange equation suffices to characterize the minimizer without further ado.
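The chain of identities (3.5)–(3.11) can be spot-checked numerically: the derivative h′(0) should agree with the integral of ϕ against the Euler–Lagrange expression whenever ϕ vanishes at the endpoints. The sketch below (my own illustration; the Lagrangian L = p² + u² is an arbitrary test case, not one taken from the notes) does this with NumPy:

```python
import numpy as np

def trapz(y, x):
    """Trapezoidal rule (written out to avoid NumPy version differences)."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2)

# Sample Lagrangian L(x, u, p) = p**2 + u**2 with its partial derivatives
L  = lambda x, u, p: p**2 + u**2
Lu = lambda x, u, p: 2 * u
Lp = lambda x, u, p: 2 * p

a, b = 0.0, 1.0
x = np.linspace(a, b, 2001)

u   = np.sin(x)          # a trial function
phi = x * (b - x)        # variation vanishing at both endpoints, cf. (3.10)

def J(f):
    return trapz(L(x, f, np.gradient(f, x)), x)

# h'(0) from (3.5), by a symmetric difference quotient in epsilon
eps = 1e-6
h_prime = (J(u + eps * phi) - J(u - eps * phi)) / (2 * eps)

# <grad J, phi> from (3.11): integrate phi times the Euler-Lagrange expression
up = np.gradient(u, x)
inner = trapz(phi * (Lu(x, u, up) - np.gradient(Lp(x, u, up), x)), x)

print(h_prime, inner)    # the two numbers agree to discretization accuracy
```

The agreement of the two numbers is exactly the integration by parts step (3.8), with both boundary terms killed by the homogeneous conditions (3.10).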
The right hand side of (3.12) or, equivalently, the left hand side of the Euler–Lagrange
equation (3.14), is often referred to as the Euler–Lagrange expression associated with the
Lagrangian L, and written
E(L) = ∂L/∂u (x, u, u′) − d/dx ∂L/∂p (x, u, u′),   (3.15)
and the associated Euler–Lagrange equation is simply E(L) = 0.
† See Lemma 3.4 and the ensuing discussion for a complete justification of this step.
u(a) = α, u(b) = β.
Since
∂L/∂u = 0,   ∂L/∂p = p/√(1 + p²),   (3.17)
the Euler–Lagrange equation (3.13) in this case takes the form
0 = − d/dx [ u′/√(1 + u′²) ] = − u′′/(1 + u′²)^{3/2} .
Since the denominator does not vanish, this is the same as the simplest second order
ordinary differential equation
u′′ = 0. (3.18)
We deduce that the solutions to the Euler–Lagrange equation are all affine functions,
u = c x + d, whose graphs are straight lines. Since our solution must also satisfy the
boundary conditions, the only critical function — and hence the sole candidate for a
minimizer — is the straight line
y = (β − α)/(b − a) · (x − a) + α   (3.19)
passing through the two prescribed points. Thus, the Euler–Lagrange equation helps to
reconfirm our intuition that straight lines minimize distance.
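The computation is easy to reproduce with a computer algebra system. The following SymPy sketch (my own illustration, not part of the notes) builds the Euler–Lagrange expression (3.15) for the arc length Lagrangian L = √(1 + p²), confirms that it reduces to − u′′/(1 + u′²)^{3/2}, and checks that the straight line (3.19) annihilates it:

```python
import sympy as sp

x, a, b, alpha, beta = sp.symbols('x a b alpha beta')
u = sp.Function('u')(x)
p = u.diff(x)

L = sp.sqrt(1 + p**2)                      # arc length Lagrangian (3.16)

# Euler-Lagrange expression E(L) = dL/du - d/dx (dL/dp), as in (3.15)
EL = sp.simplify(L.diff(u) - L.diff(p).diff(x))
print(EL)                                  # minus u'' over (1 + u'^2)^(3/2)

# The straight line (3.19) solves E(L) = 0
line = (beta - alpha) / (b - a) * (x - a) + alpha
print(EL.subs(u, line).doit())             # 0
```

Since the denominator never vanishes, E(L) = 0 is equivalent to u′′ = 0, and the only solutions are affine functions, exactly as derived above.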
Be that as it may, the fact that a function satisfies the Euler–Lagrange equation and
the boundary conditions merely confirms its status as a critical function, and does not
guarantee that it is the minimizer. Indeed, any critical function is also a candidate for
maximizing the variational problem, too. The nature of a critical function will be elu-
cidated by the variational form of the second derivative test, and requires some further
work. Of course, for the minimum distance problem, we “know” that a straight line cannot
maximize distance, and must be the minimizer. Nevertheless, the reader should have a
small nagging doubt that we may not have completely solved the problem at hand . . .
The most famous classical variational principle is the so-called brachistochrone problem.
The compound Greek word “brachistochrone” means “minimal time”. An experimenter
lets a bead slide down a wire that connects two fixed points under the influence of gravity.
The goal is to shape the wire in such a way that, starting from rest, the bead slides
from one end to the other in minimal time. Naïve guesses for the wire’s optimal shape,
including a straight line, a parabola, a circular arc, or even a catenary, are not optimal,
and one can do better through a careful analysis of the associated variational problem.
The brachistochrone problem was originally posed by the Swiss mathematician Johann
Bernoulli in 1696, and solutions were then found by his brother Jakob Bernoulli, as well as
Newton, Leibniz, von Tschirnhaus and de l’Hôpital. It served as an inspiration for much
of the subsequent development of the subject.
We take, without loss of generality, the starting point of the bead to be at the origin:
a = (0, 0). The wire will bend downwards, and so, to avoid distracting minus signs in the
subsequent formulae, we take the vertical y axis to point downwards. The shape of the
wire will be given by the graph of a function y = u(x) ≥ 0. The end point b = (b, β) is
assumed to lie below and to the right, and so b > 0 and β > 0. The physical set-up is
sketched in Figure 5. Alternatively, the brachistochrone can be viewed as the problem of
designing the optimally shaped playground slide that enables a child to get to the bottom
point the fastest.
To mathematically formulate the problem, the first step is to find the formula for the
transit time of the bead sliding along the wire. Arguing as in our derivation of the optics
functional (2.5), if v(x) denotes the instantaneous speed of descent of the bead when it
for some c ∈ R, whose value can depend upon the solution u(x).
which vanishes as a consequence of the Euler–Lagrange equation (3.13). This implies that
the Hamiltonian function is constant, thereby establishing (3.28). Q.E.D.
Remark : The Hamiltonian function is named after the Irish mathematician William
Rowan Hamilton, and plays a critical role in subsequent developments. Theorem 3.2 is a
special case of Emmy Noether’s powerful Theorem, [42; Chapter 4], that relates symme-
tries of variational problems — in this case translations in the x coordinate — with first
integrals, a.k.a. conservation laws.
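Theorem 3.2 can also be verified symbolically for an arbitrary autonomous Lagrangian. The SymPy sketch below (my own illustration) checks the identity dH/dx = − u′ E(L), from which it follows that H is constant along any solution of the Euler–Lagrange equation:

```python
import sympy as sp

x = sp.symbols('x')
u = sp.Function('u')(x)
p = u.diff(x)

# An arbitrary Lagrangian depending only on u and u' (no explicit x dependence)
L = sp.Function('L')(u, p)

Lp = L.diff(p)                  # dL/dp
EL = L.diff(u) - Lp.diff(x)     # Euler-Lagrange expression E(L)
H = p * Lp - L                  # Hamiltonian H = u' dL/dp - L

# The identity dH/dx = -u' E(L); hence H is constant on solutions of E(L) = 0
identity = sp.simplify(H.diff(x) + p * EL)
print(identity)
```

For the brachistochrone Lagrangian this conservation law is exactly the relation u (1 + u′²) = k exploited below.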
Equation (3.28) has the form of an implicitly defined first order ordinary differential
equation which can, in fact, be integrated. Indeed, solving for
u′ = h(u, c) (3.29)
produces an autonomous first order differential equation, whose general solution can be
obtained by integration:
∫ du/h(u, c) = x + δ,   (3.30)
H(u, p) = p ∂L/∂p − L = − 1/√( u (1 + p²) ) .

H(x, u, u′) = − 1/√( u (1 + u′²) ) = c,   which we rewrite as   u (1 + u′²) = k,
where k = 1/c2 is a constant. Solving for the derivative u′ results in the first order
autonomous ordinary differential equation†
du/dx = √( (k − u)/u ) .
This equation can be explicitly solved by separation of variables, and so
∫ √( u/(k − u) ) du = x + δ
for some constant δ. The left hand integration relies on the trigonometric substitution
u = ½ k (1 − cos θ),
whereby
x + δ = (k/2) ∫ √( (1 − cos θ)/(1 + cos θ) ) sin θ dθ = (k/2) ∫ (1 − cos θ) dθ = ½ k (θ − sin θ).
The left hand boundary condition u(0) = 0 implies δ = 0, and so the solutions to the
Euler–Lagrange equation are curves parametrized by
x = r (θ − sin θ), u = r (1 − cos θ). (3.31)
With a little more work, it can be proved that the parameter r = ½ k is uniquely prescribed
by the right hand boundary condition, and moreover, the resulting curve supplies the global
minimizer of the brachistochrone functional, [21].
The parametrized curve (3.31) is known as a cycloid , which can be visualized as the
curve traced by a point sitting on the edge of a rolling wheel of radius r, as plotted in
Figure 6. (Checking this assertion is a useful exercise.) Interestingly, in certain configura-
tions, namely if β < 2 b/π, the cycloid that solves the brachistochrone problem dips below
the right hand endpoint b = (b, β), and so the bead is moving upwards when it reaches
the end of the wire, as in Figure 5. Also note that the cycloid (3.31) has a vertical tangent
at the initial point, so u′ (0) = ∞ and the solution is not smooth there. This arises due to
the singularity of the brachistochrone Lagrangian (3.24) when u = 0, or, equivalently, the
fact that u = 0 is a singular point of the Euler–Lagrange equation (3.26).
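As a check, the cycloid (3.31) indeed satisfies the first integral u (1 + u′²) = k = 2 r found above. A short SymPy sketch (my own illustration):

```python
import sympy as sp

theta, r = sp.symbols('theta r', positive=True)

# Cycloid (3.31): x = r (theta - sin theta), u = r (1 - cos theta)
X = r * (theta - sp.sin(theta))
U = r * (1 - sp.cos(theta))

# Slope along the curve: du/dx = (du/dtheta) / (dx/dtheta)
slope = U.diff(theta) / X.diff(theta)

# The conserved quantity u (1 + u'^2) collapses to the constant 2 r = k
conserved = sp.simplify(U * (1 + slope**2))
print(conserved)
```

The slope sin θ/(1 − cos θ) blows up as θ → 0, which is the vertical tangent at the starting point noted above.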
† Technically, there is a ± sign in front of the square root, but it is not hard to see that this
ambiguity does not affect the final formulas.
† The square root is real since, by (3.34), | u | ≤ | c |. Also, as in the brachistochrone, the ± sign
ambiguity in the square root does not affect the final formulas.
In this manner, we have produced the general solution to the Euler–Lagrange equation
(3.33). Any solution that also satisfies the boundary conditions provides a critical function
for the surface area functional (3.32), and hence is a candidate for the minimizer. The
curve prescribed by the graph of a hyperbolic cosine function (3.35) is known as a catenary
from the Latin for “chain” since it is also the profile assumed by a hanging chain. It is
not a parabola, even though to the untrained eye it looks quite similar. Owing to their
minimization properties, catenaries are quite common in engineering design — for instance,
the arch in St. Louis is an inverted catenary.
So far, we have not taken into account the boundary conditions. It turns out that there
are three distinct possibilities, depending upon the configuration of the boundary points:
• There is precisely one value of the two integration constants c, δ that satisfies the two
boundary conditions.
• There are two different possible values of c, δ that satisfy the boundary conditions.
• There are no values of c, δ that allow (3.35) to satisfy the two boundary conditions.
This occurs when the two boundary points a, b are relatively far apart.
In the third configuration, the physical soap film spanning the two circular wires breaks
apart into two circular disks, and this defines the minimizer for the problem; there is
no surface of revolution that has a smaller surface area than the two disks. However,
the “function”† that minimizes this configuration consists of two vertical lines from the
boundary points to the x axis, along with the line segment on the x axis connecting them.
More precisely, we can approximate this function by a sequence of genuine functions that
give progressively smaller and smaller values to the surface area functional (2.12), but the
actual minimum is not attained among the class of (smooth) functions. Figure 7 illustrates
the case when there are two catenaries, with the disk represented by the red polygon. The
† Here “function” must be taken in a very broad sense, as this one does not even correspond to
a generalized function!
One can replace the continuous function (3.36) by the smooth bump function

ϕ(x) = exp( − [ 1 − δ⁻² (x − x0)² ]⁻¹ )  for | x − x0 | < δ,   ϕ(x) = 0 otherwise,   (3.37)
which, in addition to being positive on the required interval†, is C∞ everywhere, including
at x = x0 ± δ, where all its derivatives vanish. The support of the bump function
(3.37), meaning the closure of the set where it does not vanish, is the compact (closed and
bounded) interval { | x − x0 | ≤ δ }. This observation produces a useful strengthening of
Lemma 3.3:
Lemma 3.3:
Lemma 3.4. If f (x) is continuous on [ a, b ], and ∫_a^b f (x) ϕ(x) dx = 0 for every C∞
function ϕ(x) with compact support in ( a, b ), then f (x) ≡ 0 for all a ≤ x ≤ b.
Note that the compact support condition imposed on ϕ implies that ϕ(a) = ϕ(b) = 0,
thus justifying our derivation of the Euler–Lagrange equation (3.13).
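A numerical sketch (my own illustration, following the bump-function recipe described above with x0 = 0 and δ = 1) confirms the claimed behavior: the bump is strictly positive inside the interval, vanishes outside, and its derivatives flatten out to zero at x0 ± δ:

```python
import numpy as np

x0, delta = 0.0, 1.0

def bump(x):
    """Smooth bump of type (3.37): positive for |x - x0| < delta, zero outside,
    with all derivatives vanishing at the endpoints x0 +/- delta."""
    s = ((x - x0) / delta) ** 2
    out = np.zeros_like(x)
    inside = s < 1
    out[inside] = np.exp(-1.0 / (1.0 - s[inside]))
    return out

xs = np.linspace(x0 - delta, x0 + delta, 5)
print(bump(xs))      # zero at the two endpoints, positive in between

# Finite-difference derivative near the right endpoint: already negligible
h = 1e-4
edge = x0 + delta - np.array([1e-2, 1e-3])
deriv = (bump(edge + h) - bump(edge - h)) / (2 * h)
print(deriv)
```

The extremely rapid decay near the endpoints is exactly the feature cautioned about in the footnote on the graph in Figure 8.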
A Cautionary Example
While the Euler–Lagrange equation is fundamental in the calculus of variations, it does
have limitations and analysis based solely thereon may miss important features, as we
will gradually learn. One complicating factor is that convergence in infinite-dimensional
function space is considerably more subtle than in finite-dimensional Euclidean space. So
let us conclude this section with one of a large variety of cautionary examples.
Consider the problem of minimizing the integral
J[ u ] = ∫_0^1 [ ½ (u′² − 1)² + ½ u² ] dx   (3.38)
† Its mathematically correct graph in Figure 8 could be open to misinterpretation; between x0 − δ
and x0 + δ, it is indeed positive, but extremely close to the axis.
Thus, as n → ∞, the value of J[ un ] → 0. On the other hand, the functions (3.39) converge
uniformly to the zero function: un (x) → u∗ (x) ≡ 0, and J[ u∗ ] = ½. In other words,

lim_{n→∞} J[ un ] = 0 ≠ ½ = J[ lim_{n→∞} un ],   (3.40)
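This failure to attain the infimum is easy to see numerically. In the sketch below (my own illustration; the minimizing sequence (3.39) is not reproduced in this excerpt, so the standard choice of sawtooth functions with slopes ± 1 and teeth of width 1/n is used), the values J[ un ] shrink toward 0 while the uniform limit u∗ ≡ 0 gives J[ u∗ ] = ½:

```python
import numpy as np

x = np.linspace(0.0, 1.0, 40001)

def J(u_vals):
    """Trapezoidal approximation of (3.38): integral of (u'^2 - 1)^2/2 + u^2/2."""
    du = np.gradient(u_vals, x)
    f = 0.5 * (du**2 - 1.0) ** 2 + 0.5 * u_vals**2
    return float(np.sum((f[1:] + f[:-1]) * np.diff(x)) / 2)

def sawtooth(n):
    """Sawtooth with slopes +-1 and teeth of width 1/n, so 0 <= u <= 1/(2n)."""
    t = (x * n) % 1.0
    return np.minimum(t, 1.0 - t) / n

for n in (1, 4, 16):
    print(n, J(sawtooth(n)))          # decreases toward 0

print(J(np.zeros_like(x)))            # the uniform limit gives 1/2
```

The first term of (3.38) vanishes identically on the sawtooth (its slope is ± 1 everywhere), while the second is at most ½ (1/2n)², so J[ un ] → 0 even though the limit function scores ½.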
† It may provide a local minimum — or maximum — but to ascertain this, one would need to
invoke the second derivative test.
Our goal is to minimize J[ u ] among all sufficiently smooth functions u(x), defined for
a ≤ x ≤ b, but now only subject to a single fixed boundary condition, namely
u(a) = α, (4.2)
0 = ⟨ ∇J[ u ] , ϕ ⟩
  = ϕ(b) ∂L/∂p ( b, u(b), u′(b) ) + ∫_a^b ϕ [ ∂L/∂u (x, u, u′) − d/dx ∂L/∂p (x, u, u′) ] dx.   (4.3)
Thus, we can no longer identify the gradient of the functional J[ u ] as the function given by
the Euler–Lagrange expression, since there is an additional boundary component. Instead,
let us work with (4.3) directly. Keep in mind that we have not imposed any conditions on
ϕ(b). However, if we set ϕ(b) = 0, then (4.3) reduces to the previous integral (3.11), whose
vanishing, by the same argument based on Lemma 3.4, implies that u(x) must continue to
satisfy the Euler–Lagrange equation (3.13), whereby the integral term in (4.3) vanishes.
Thus for the entire right hand side to vanish for all such ϕ(x), the term multiplying ϕ(b)
must also vanish, and we conclude that the minimizing function must also satisfy
∂L/∂p ( b, u(b), u′(b) ) = 0.   (4.4)
This is known as a natural boundary condition, which imposes a constraint on the minimizer
at the free boundary. We conclude that any minimizer for the boundary value problem
that has only one endpoint fixed must be a solution to the Euler–Lagrange equation (3.13)
supplemented by the two boundary conditions (4.2) and (4.4) — one fixed and the other
natural. Thus, the determination of the critical functions continues to require the solution
to a two point boundary value problem for the second order Euler–Lagrange ordinary
differential equation. If, instead of being fixed, the value of the minimizer at the other
endpoint is also left free, then the same argument implies that the minimizer must also
satisfy a natural boundary condition there:
∂L/∂p ( a, u(a), u′(a) ) = 0.   (4.5)
Thus, any minimizer of the free variational problem must solve the Euler–Lagrange equa-
tion along with the two natural boundary conditions (4.4–5). As always, the above are just
necessary conditions for minimizers. Maximizers, if such exist, must also satisfy the same
conditions, which serve to characterize the critical functions of the variational principle
subject to the given boundary constraints. By this line of reasoning, we are led to the
following conclusion.
At each endpoint of the interval, any critical function of a functional, including mini-
mizers and maximizers, must satisfy either homogeneous or inhomogeneous fixed boundary
conditions, or homogeneous natural boundary conditions.
Example 4.1. Let us apply this method to the problem of finding the shortest path
between a point and a straight line. By applying a rigid motion (translation and rotation),
we can take the point to be a = (a, α) and the line to be in the vertical direction, namely
ℓb = { x = b }. Assuming the solution is given by the graph of a function y = u(x), the
length is given by the same arc length functional (3.16), but now there is only a single
fixed boundary condition, namely u(a) = α. To determine the second boundary condition
at x = b, we use (4.4). In view of (3.17), this requires
u′(b) / √( 1 + u′(b)² ) = 0,   or, simply,   u′(b) = 0.
This means that any critical function u(x) must have a horizontal tangent at the point x = b,
or, equivalently, it must be perpendicular to the vertical line ℓb .
The Euler–Lagrange equation is the same as before, reducing to u′′ = 0, and hence the
critical functions are affine: u(x) = c x + d. Substituting into the two boundary conditions
u(a) = α, u′ (b) = 0, we conclude that the minimizer must be a horizontal straight line:
u(x) ≡ α for all a ≤ x ≤ b, reconfirming our earlier observation. As before, this is not a
complete proof that the horizontal straight line is a minimizer — based on what we know
so far, it could be a maximizer — and resolving its status requires further analysis based
on the second variation.
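The same conclusion can be reached by elementary one-variable calculus, since every candidate is affine: the line u = c (x − a) + α has length (b − a) √(1 + c²), which is smallest precisely at c = 0. A SymPy sketch (my own illustration):

```python
import sympy as sp

x, c, a, b, alpha = sp.symbols('x c a b alpha', real=True)

# Affine candidate through (a, alpha): u = c (x - a) + alpha
u = c * (x - a) + alpha

# Arc length of its graph over [a, b]
length = sp.integrate(sp.sqrt(1 + sp.diff(u, x) ** 2), (x, a, b))
print(sp.factor(length))                  # proportional to sqrt(c**2 + 1)

# The unique critical point of the length in c is the horizontal line c = 0
print(sp.solve(sp.diff(length, c), c))    # [0]
```

Note that c = 0 is exactly the natural boundary condition u′(b) = 0 derived above, so the two routes to the answer agree.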
Let us investigate what happens if we try to impose a non-natural, non-fixed boundary
condition at the endpoint — for example, the Robin condition

u′(b) = β u(b) + γ,   (4.6)

where β ≠ 0 and γ are constants. The solution to the Euler–Lagrange equation satisfying
u(a) = α and the Robin condition (4.6) is easily calculated:

u(x) = [ (α β + γ) x + α (1 − β b) − γ a ] / [ 1 − β (b − a) ]   (4.7)
provided the denominator does not vanish. However, unless α β + γ = 0, so the graph of
u is a horizontal line, this function does not provide a minimum to arc length functional
among functions subject to the prescribed boundary conditions. Indeed, one can construct
functions that satisfy the Robin boundary condition (4.6) whose arc length is arbitrarily
close to that of the horizontal line, which is b − a. For example, we can slightly perturb
as ε → 0. One can even smooth off its corner, which has the effect of slightly decreasing
its total arc length and thus does not affect the convergence as ε → 0. Thus, while the
Robin boundary value problem has no minimizing solution, it does admit smooth functions
that come arbitrarily close to the minimum possible value, which is b − a. This example
exemplifies a common type of behavior for variational principles that are bounded from
below, but have no bona fide function that achieves the minimum value.
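To quantify this, take a candidate that is horizontal on [a, b − ε] and linear on the final piece [b − ε, b], with the end slope s chosen so that the Robin condition holds at x = b (here the condition is assumed to have the standard form u′(b) = β u(b) + γ, consistent with formula (4.7); the numerical values of α, β, γ below are arbitrary). Its arc length tends to b − a as ε → 0:

```python
import math

a, b = 0.0, 1.0
alpha = 1.0
beta, gamma = 0.5, 0.25     # sample Robin data (hypothetical values)

def arc_length(eps):
    """Horizontal on [a, b - eps], then a straight segment whose end slope s
    satisfies the Robin condition u'(b) = beta * u(b) + gamma."""
    # u(b) = alpha + s*eps and u'(b) = s  =>  s = (beta*alpha + gamma)/(1 - beta*eps)
    s = (beta * alpha + gamma) / (1 - beta * eps)
    return (b - a - eps) + eps * math.sqrt(1 + s * s)

for eps in (0.5, 0.1, 0.01, 0.001):
    print(eps, arc_length(eps))    # decreases toward b - a = 1
```

The extra length contributed by the final segment is of order ε, so the infimum b − a is approached but, as argued above, never attained by a function satisfying (4.6) unless α β + γ = 0.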
Example 4.2. Let us return to the brachistochrone problem, but now let us seek the
shape of the wire that enables the bead to get from the point a = (0, 0) to a point on the
line ℓb = { x = b } where b > 0 in the shortest time. In this case, we no longer specify which
vertical position on the line the bead ends up at, and so x = b is a free end. Thus, we
continue to minimize the same functional (3.22) subject to the single boundary condition
u(0) = 0. The minimizer must satisfy the natural boundary condition at x = b. Since the
Lagrangian is
L(x, u, p) = √( (1 + p²)/(2 g u) )   with   ∂L/∂p = p / √( 2 g u (1 + p²) ) ,
the natural boundary condition (4.4) is simply u′ (b) = 0. We conclude that the solution
is provided by the cycloid (3.31) that starts out at the origin and has horizontal tangent
at the point x = b. Note that, in view of (3.31), setting
du/dx = sin θ/(1 − cos θ) = 0   implies   θ = π,
Null Lagrangians
Since the Euler–Lagrange expression can be identified with the gradient of the associ-
ated functional, null Lagrangians can be viewed as the calculus of variations equivalent
of constant functions, in that a function defined on a connected subset of R n has zero
gradient if and only if it is constant, [4, 35].
E(N ) = ∂N/∂u − d/dx ∂N/∂p = 2 x − d/dx (x²) ≡ 0,
we conclude that N is a null Lagrangian.
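The displayed partial derivatives correspond to the null Lagrangian N = 2 x u + x² p, the total derivative of S(x, u) = x² u (the definition of N itself is not reproduced in this excerpt, so this particular N is an inference from the displayed computation). SymPy confirms that its Euler–Lagrange expression vanishes identically:

```python
import sympy as sp

x = sp.symbols('x')
u = sp.Function('u')(x)
p = u.diff(x)

# N = d/dx (x**2 * u) = 2*x*u + x**2 * u'
N = 2 * x * u + x**2 * p

# Euler-Lagrange expression E(N) = dN/du - d/dx (dN/dp)
EN = N.diff(u) - N.diff(p).diff(x)
print(sp.simplify(EN))    # 0
```

Since E(N ) ≡ 0 holds for every function u(x), every function is a critical function of the associated variational problem, which is the hallmark of a null Lagrangian.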
Theorem 4.5. A function N (x, u, p) defined for all† (x, u, p) ∈ R³ is a null Lagrangian
if and only if it is a total derivative, meaning

N (x, u, u′) = d/dx S(x, u) = ∂S/∂x + u′ ∂S/∂u   (4.8)
for some function S that depends only on x, u.
† More generally, N can be defined on a subset of R³ with trivial topology.
∂²N/∂p² = 0,   which implies   N (x, u, p) = f (x, u) p + g(x, u)
for some functions f, g. Substituting the latter expression back into (3.14) yields
u′ ∂f/∂u + ∂g/∂u − ∂f/∂x − u′ ∂f/∂u = ∂g/∂u − ∂f/∂x = 0.
Since we are assuming N , and hence f, g, are defined for all x, u, a standard result in
multivariable calculus, [35, 43], implies f = ∂S/∂u, g = ∂S/∂x, for some function S(x, u). Q.E.D.
Remark : Theorem 4.5 is a special case of a general theorem characterizing higher order
and multidimensional null Lagrangians; see [42; Theorem 4.7].
With Theorem 4.5 in hand, we can apply the Fundamental Theorem of Calculus to write
the functional associated with a null Lagrangian in the following form:
K[ u ] = ∫_a^b N (x, u, u′) dx = ∫_a^b d/dx S(x, u) dx = S( b, u(b) ) − S( a, u(a) ).   (4.9)
In other words, the value of a functional associated with a null Lagrangian depends only
on the values of the function u(x) at the endpoints of the interval. We conclude that the
integral defined by a null Lagrangian is path independent, meaning, as with line integrals,
[4, 35], that its value is independent of the path (graph of the function) taken between its
endpoints. This explains why every function solves the Euler–Lagrange equation and
is in fact a minimizer: because the value of the functional K[ u ] depends only upon
the values of u(x) at the endpoints a, b, and not on its behavior for a < x < b, it
achieves the same value, that is, it is constant, on every u(x) satisfying our usual
fixed boundary conditions u(a) = α, u(b) = β; hence, all such functions are minimizers
(and maximizers).
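Path independence is easy to illustrate numerically. The sketch below (my own illustration) uses the null Lagrangian N = 2 x u + x² u′ = d/dx (x² u), consistent with the partial derivatives displayed in the earlier example, for which (4.9) predicts the common value S(b, u(b)) − S(a, u(a)) with S = x² u, whatever the path:

```python
import numpy as np

a, b = 0.0, 1.0
x = np.linspace(a, b, 20001)

def K(u_vals):
    """Trapezoidal approximation of K[u] = integral of 2*x*u + x**2 * u'."""
    du = np.gradient(u_vals, x)
    f = 2 * x * u_vals + x**2 * du
    return float(np.sum((f[1:] + f[:-1]) * np.diff(x)) / 2)

# Two different paths with the same endpoint values u(0) = 0, u(1) = 1
u1 = x          # straight line
u2 = x**3       # cubic
print(K(u1), K(u2))    # both approximate S(1, 1) - S(0, 0) = 1
```

Both integrals evaluate to 1 (up to discretization error), even though the two paths are quite different, exactly as (4.9) predicts.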
General Boundary Conditions
The flexibility afforded by null Lagrangians allows us to expand the range of boundary
conditions that can be handled by the calculus of variations. Namely, we can modify the
original variational problem by adding in a suitable null Lagrangian, which does not alter
the Euler–Lagrange equations but does change the associated natural boundary conditions.
In other words, if N = dS/dx is a null Lagrangian, then the modified objective functional
J̃[ u ] = ∫_a^b [ L(x, u, u′) + N (x, u, u′) ] dx = ∫_a^b [ L(x, u, u′) + d/dx S(x, u) ] dx

        = S( b, u(b) ) − S( a, u(a) ) + ∫_a^b L(x, u, u′) dx   (4.10)
Thus, while the basic arc length functional does not, in general, admit a minimizer that
satisfies the Robin boundary conditions, the modified arc length (4.21), which has the
same Euler–Lagrange equation, (usually) does.
Let us solve the Robin boundary value problem for the Euler–Lagrange equation, which,
as noted above, is merely u′′ = 0, the solutions of which are straight lines u = c x + d.
Substituting into the Robin boundary conditions (4.12) produces
$$c = \beta_1\, (c\, a + d) + \gamma_1 = \beta_2\, (c\, b + d) + \gamma_2.$$
Thus, if
$$(b - a)\, \beta_1\, \beta_2 + \beta_2 - \beta_1 \neq 0,$$
the problem admits a unique solution, while if the left hand side is zero, then there is either
a one-parameter family of solutions that all give the same value to the modified variational
problem (even though they have differing arc lengths), or there is no solution, depending
on the values of γ1 , γ2 .
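The computation above reduces to a 2 × 2 linear solve, which can be sketched numerically; the interval [0, 1] and the coefficients β₁ = 1, β₂ = −1, γ₁ = 1, γ₂ = 0 below are illustrative choices, not values from the text.

```python
import numpy as np

def robin_line(a, b, beta1, beta2, gamma1, gamma2):
    """Solve u'' = 0 with Robin conditions u'(a) = beta1*u(a) + gamma1,
    u'(b) = beta2*u(b) + gamma2, for the straight line u = c*x + d."""
    # c - beta1*(c*a + d) = gamma1,  c - beta2*(c*b + d) = gamma2
    A = np.array([[1 - beta1*a, -beta1],
                  [1 - beta2*b, -beta2]])
    # det vanishes exactly when (b - a)*beta1*beta2 + beta2 - beta1 == 0
    if abs(np.linalg.det(A)) < 1e-12:
        return None   # no solution, or a one-parameter family
    c, d = np.linalg.solve(A, [gamma1, gamma2])
    return c, d

c, d = robin_line(0.0, 1.0, 1.0, -1.0, 1.0, 0.0)
print(c, d)   # the line u = x/3 - 2/3
```

For these values the solvability condition is nonzero, so the Robin boundary value problem admits the unique solution u = x/3 − 2/3.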
What about coupled boundary conditions, which relate the values of the minimizer and
its derivatives at the endpoints? The simplest are periodic boundary conditions
†
We avoid writing out the more complicated integral expression (4.10) involving the correspond-
ing null Lagrangian.
u(b) = α u(a), where α is a nonzero constant, requires
$$\frac{\partial L}{\partial p}\bigl(b, u(b), u'(b)\bigr) = \frac{1}{\alpha}\, \frac{\partial L}{\partial p}\bigl(a, u(a), u'(a)\bigr)$$
as its variationally admissible quasiperiodic counterpart. See [21, 48] for further details
and developments.
Problems with Variable Endpoints
So far, we have kept the endpoints a, b of our interval of integration fixed whilst varying
the functional. However, more general problems also allow us to vary the endpoints of the
interval. Here is a typical example.
Example 4.1 investigated the shortest path — the geodesic — between a point and
a vertical line. Let us generalize this problem to that of determining the shortest path
between a point (a, α) and a given plane curve C ⊂ R², with (a, α) ∉ C (as otherwise the
problem is trivial). The second endpoint (b, β) ∈ C of the minimizing geodesic is allowed to
be anywhere on the curve, and so the value of b is not fixed — unless the curve is a vertical
where E(L) is the usual Euler–Lagrange expression (3.15). On the other hand, the varia-
tions must also satisfy the curve constraint (4.27), and so
Applying the Fundamental Lemma 3.4 to the integral, we deduce that the critical func-
tions u must satisfy the usual Euler–Lagrange equation E(L) = 0, while vanishing of the
boundary term imposes the boundary condition
$$L\bigl(b, u(b), u'(b)\bigr) + \bigl[\, \sigma'(b) - u'(b) \,\bigr]\, \frac{\partial L}{\partial p}\bigl(b, u(b), u'(b)\bigr) = 0 \eqno(4.30)$$
in addition to (4.27). If, instead of being fixed, the other endpoint is required to lie on a
curve, say y = ρ(x), then the same variational calculation leads to the analogous boundary
condition
$$L\bigl(a, u(a), u'(a)\bigr) + \bigl[\, \rho'(a) - u'(a) \,\bigr]\, \frac{\partial L}{\partial p}\bigl(a, u(a), u'(a)\bigr) = 0. \eqno(4.31)$$
Example 4.8. For the geodesic problem introduced above, the Lagrangian for the arc
length functional is L(x, u, p) = √(1 + p²). Thus, the boundary condition (4.30) becomes
$$\sqrt{1 + u'(b)^2} + \bigl[\, \sigma'(b) - u'(b) \,\bigr]\, \frac{u'(b)}{\sqrt{1 + u'(b)^2}} = 0,$$
which can be algebraically simplified and then solved for
$$u'(b) = -\, \frac{1}{\sigma'(b)}\,,$$
which requires that neither derivative vanish at x = b. This boundary condition has
the geometric interpretation that the graph of the solution y = u(x) intersects the curve
y = σ(x) orthogonally, meaning that their tangent lines are perpendicular at the point of
intersection. In the present case, the solutions to the Euler–Lagrange equation are straight
lines, y = m (x − a) + α with slope m = u′ (b), and hence the shortest path connecting the
point (a, α) to the curve C is a line that is orthogonal to the curve. There may be more
than one of these, some being local minima, and hence the global minimizer will be the
one among these that has the shortest overall length; see the first plot in Figure 11, where
the middle line segment is the minimizer. Similarly, the shortest path between two curves
is a straight line segment that intersects each curve orthogonally. In the second plot in
Figure 11, the two curves are a circle and an ellipse, and the plotted straight lines yield one
global minimum, one local minimum (the longer horizontal segment), and two additional
critical line segments of equal length that are not local minima.
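The orthogonality condition is easy to observe numerically. In the sketch below the base point (0, 0) and the curve σ(x) = 2 − x² are hypothetical choices (neither appears in the text): a grid search locates the nearest point (b, σ(b)), and the slope m of the connecting segment satisfies m σ′(b) = −1, i.e. the segment meets the curve at a right angle.

```python
import numpy as np

sigma  = lambda x: 2 - x**2        # the target curve y = σ(x) (illustrative)
dsigma = lambda x: -2*x            # its slope σ'(x)

# squared distance from the point (0, 0) to (b, σ(b)); minimize by grid search
bs = np.linspace(0.01, 2.0, 200001)
f = bs**2 + sigma(bs)**2
b = bs[np.argmin(f)]               # nearest point on the curve, b ≈ √1.5

m = sigma(b) / b                   # slope of the connecting line segment
print(m * dsigma(b))               # ≈ -1: the segment is orthogonal to the curve
```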
prescribed by the Lagrangian L(x, u, v, p, q) involving two unknown functions u(x), v(x).
We introduce simultaneous variations u(x) + ε ϕ(x), v(x) + ε ψ(x). Arguing as before, we
compute the derivative of the scalar function
$$h(\varepsilon) = J[\, u + \varepsilon\varphi,\; v + \varepsilon\psi \,] = \int_a^b L\bigl(x,\, u + \varepsilon\varphi,\, v + \varepsilon\psi,\, u' + \varepsilon\varphi',\, v' + \varepsilon\psi'\bigr)\; dx$$
and require
$$0 = h'(0) = \int_a^b \frac{d}{d\varepsilon}\, L\bigl(x,\, u + \varepsilon\varphi,\, v + \varepsilon\psi,\, u' + \varepsilon\varphi',\, v' + \varepsilon\psi'\bigr)\Bigr|_{\varepsilon = 0}\; dx = \int_a^b \Bigl[\, \frac{\partial L}{\partial u}\, \varphi(x) + \frac{\partial L}{\partial p}\, \varphi'(x) + \frac{\partial L}{\partial v}\, \psi(x) + \frac{\partial L}{\partial q}\, \psi'(x) \Bigr]\; dx. \eqno(5.2)$$
Integrating the terms involving ϕ′ and ψ′ by parts, we arrive at the variational condition
$$B(b) - B(a) + \int_a^b \Bigl[\, \varphi(x)\, E_u(L) + \psi(x)\, E_v(L) \Bigr]\; dx = 0.$$
Here
$$E_u(L) = \frac{\partial L}{\partial u}\bigl(x, u, v, u', v'\bigr) - \frac{d}{dx}\, \frac{\partial L}{\partial p}\bigl(x, u, v, u', v'\bigr), \qquad E_v(L) = \frac{\partial L}{\partial v}\bigl(x, u, v, u', v'\bigr) - \frac{d}{dx}\, \frac{\partial L}{\partial q}\bigl(x, u, v, u', v'\bigr), \eqno(5.3)$$
are the Euler–Lagrange expressions associated with each of the dependent variables, while
the variational boundary terms are obtained by evaluating the following function at the
where we use dots to indicate derivatives with respect to the parameter t. Here
$$n(\mathbf x) = \frac{c}{v(\mathbf x)} \eqno(5.7)$$
is called the index of refraction, and is prescribed by the ratio between the speed of light
in a vacuum c and the speed of light in the medium at a point x ∈ R 3 . The minimizing
light ray is subject to the boundary conditions x(0) = A, x(T ) = B. (An analysis of the
general optics functional (5.6) can be found below.)
Remark : The index of refraction can also depend on the wavelength of the light, causing
white light to split into its color constituents upon refraction. This effect is commonly
observed in rainbows, prisms, soap films, and chromatic aberrations in lenses.
We take the cable to be parallel to the z axis, and use cylindrical coordinates x = r cos θ,
y = r sin θ, z, throughout. We assume that the index of refraction depends only on the
radial coordinate, n(r), and parametrize the light rays by the axial coordinate z, so that
in cylindrical coordinates the curve is given by r(z), θ(z), z for a ≤ z ≤ b. Transforming
(5.6) produces the objective functional
$$J[\, r, \theta \,] = \int_a^b n(r)\, \sqrt{1 + r'^{\,2} + (r\, \theta')^2}\; dz, \eqno(5.8)$$
where the primes are used to indicate derivatives with respect to z. In other words, the
Lagrangian is
$$L(r, \theta, p, q) = n(r)\, \sqrt{1 + p^2 + r^2 q^2}\,,$$
†
Alternatively one or the other function could satisfy a fixed boundary condition; this case is
left to the reader.
cusps as well as points where the curve is smooth but the parametrization is singular. We
may also want to assume that the curve is simple, although this requirement will not play
a role in our local analysis.
Proposition 5.2. Any first order parametric (parameter-independent) variational
problem for plane curves must take the form
$$J[\, u, v \,] = \int_a^b G\Bigl(u, v, \frac{\dot v}{\dot u}\Bigr)\, \dot u\; dt = \int_a^b \widetilde G\Bigl(u, v, \frac{\dot u}{\dot v}\Bigr)\, \dot v\; dt, \eqno(5.18)$$
so that the Lagrangian
$$L(t, u, v, p, q) = p\, G(u, v, q/p) = q\, \widetilde G(u, v, p/q),$$
where G(u, v, w), G̃(u, v, w) are functions of three variables.
Remark : In particular, if the curve is a graph, parametrized by (x, y) = (t, v(t)), so that
u(t) ≡ t, then (5.18) reduces to our usual form for a first order non-parametric variational
problem:
$$\widehat J[\, v \,] = \int_a^b G(x, v, v')\; dx.$$
Vice versa, given any non-parametric problem, we can convert it into an equivalent para-
metric form (5.18) by replacing x ↦ u, v′ ↦ v̇/u̇, dx ↦ u̇ dt.
Proof : Let
$$J[\, u, v \,] = \int_a^b L(t, u, v, \dot u, \dot v)\; dt$$
†
We use dots to denote derivatives with respect to the parameter t, retaining primes to denote
derivatives with respect to x.
Let us write out the Euler–Lagrange equations (5.3, 5) for the parametric variational
problem (5.18). Using the first version involving G(u, v, w), and applying the product and
chain rules to differentiate, we find
$$0 = E_u(L) = \frac{\partial L}{\partial u} - \frac{d}{dt}\, \frac{\partial L}{\partial p} = \dot u\, \frac{\partial G}{\partial u} + \frac{d}{dt}\Bigl[\, \frac{\dot v}{\dot u}\, \frac{\partial G}{\partial w} - G \Bigr] = -\, \dot v\, \frac{\partial G}{\partial v} + \frac{\dot v}{\dot u}\, \frac{d}{dt}\, \frac{\partial G}{\partial w},$$
$$0 = E_v(L) = \frac{\partial L}{\partial v} - \frac{d}{dt}\, \frac{\partial L}{\partial q} = \dot u\, \frac{\partial G}{\partial v} - \frac{d}{dt}\, \frac{\partial G}{\partial w}, \eqno(5.20)$$
where G and its partial derivatives are evaluated at (u, v, v̇/u̇). We observe that the two
Euler–Lagrange equations are not independent; indeed, they are related by
$$\dot u\, E_u(L) + \dot v\, E_v(L) = 0. \eqno(5.21)$$
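The dependency (5.21) can be checked symbolically. This sketch uses SymPy with the parametric arc length Lagrangian L = √(u̇² + v̇²), i.e. the concrete instance G(u, v, w) = √(1 + w²); the identity holds for arbitrary curves u(t), v(t).

```python
import sympy as sp

t = sp.symbols('t')
u, v = sp.Function('u')(t), sp.Function('v')(t)
ut, vt = u.diff(t), v.diff(t)

# parametric arc-length Lagrangian: L = sqrt(u'^2 + v'^2)
L = sp.sqrt(ut**2 + vt**2)

Eu = sp.diff(L, u) - sp.diff(sp.diff(L, ut), t)   # E_u(L)
Ev = sp.diff(L, v) - sp.diff(sp.diff(L, vt), t)   # E_v(L)

# the dependency (5.21): u' E_u(L) + v' E_v(L) should vanish identically
print(sp.simplify(ut*Eu + vt*Ev))   # 0
```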
Example 5.3. Consider the optics functional (5.6) in three-dimensional space, with
Lagrangian
$$L(x, y, z, p, q, r) = n(x, y, z)\, \sqrt{p^2 + q^2 + r^2}\,, \eqno(5.22)$$
whose Euler–Lagrange equations take the form
$$E_x(L) = \frac{\partial L}{\partial x} - \frac{d}{dt}\, \frac{\partial L}{\partial p} = \frac{\partial n}{\partial x}(x, y, z)\, \sqrt{\dot x^2 + \dot y^2 + \dot z^2} - \frac{d}{dt}\, \frac{n(x, y, z)\, \dot x}{\sqrt{\dot x^2 + \dot y^2 + \dot z^2}} = 0,$$
and similarly for y and z. Replacing the t derivative by derivatives with respect to arc
length s, these can be combined into a single vector-valued equation,
$$\frac{d}{ds}\Bigl[\, n(\mathbf x)\, \frac{d\mathbf x}{ds} \,\Bigr] = \nabla n(\mathbf x), \eqno(5.23)$$
known as the ray equation. It can be viewed as the optical analogy of Newton’s laws of
mechanics, in which the right hand side plays the role of force and the left hand side can
be viewed as the rate of change of an “optical momentum”.
or, equivalently,
$$\frac{d\mathbf x}{ds} \cdot \Bigl[\, \nabla n(\mathbf x) - \frac{d}{ds}\Bigl( n(\mathbf x)\, \frac{d\mathbf x}{ds} \Bigr) \Bigr] = 0, \eqno(5.24)$$
as can be verified directly. One can remove the degeneracy by using one of the coordinates
to parametrize the curve, e.g., reducing (5.6) to an objective functional depending on
y = y(x), z = z(x).
prescribed by the Lagrangian L(x, u, p, q), which, for later purposes, we assume to be at
least four times continuously differentiable.
As usual, we introduce the variation
$$h(\varepsilon) = J[\, u + \varepsilon\varphi \,] = \int_a^b L\bigl(x,\, u + \varepsilon\varphi,\, u' + \varepsilon\varphi',\, u'' + \varepsilon\varphi''\bigr)\; dx.$$
where the final formula results from integrating the second and third terms in the preceding
integral by parts. Here
$$E(L) = \frac{\partial L}{\partial u}\bigl(x, u, u', u''\bigr) - \frac{d}{dx}\, \frac{\partial L}{\partial p}\bigl(x, u, u', u''\bigr) + \frac{d^2}{dx^2}\, \frac{\partial L}{\partial q}\bigl(x, u, u', u''\bigr), \eqno(6.3)$$
is the associated Euler–Lagrange expression, which will, in general, depend on fourth order
derivatives of u, while the variational boundary terms are obtained by evaluating
u(a) = u(b), u′ (a) = u′ (b), u′′ (a) = u′′ (b), u′′′ (a) = u′′′ (b).
One can expand the range of variationally compatible boundary conditions by adding in
a null Lagrangian although, unlike first order problems, not all boundary conditions are
admissible. The full analysis of the various possibilities can be found in [48].
With this in hand, the general case of variational problems depending on higher order
derivatives of several unknowns will be clear; see [42] for details and general formulas, in-
cluding the case of several independent variables (multiple integrals) that will be presented
below.
Example 6.1. The simplest nontrivial second order variational problem is
$$J[\, u \,] = \int_a^b \tfrac{1}{2}\, (u'')^2\; dx \eqno(6.6)$$
that represents the potential energy of a flexible beam under small deformations, [3, 49].
An example is a spline, which, in pre–CAD (computer aided design) draftsmanship, was
a long, thin, flexible strip of wood that was used to draw a smooth curve through pre-
scribed points. The points were marked by small pegs, and the spline rested on the pegs.
These inspired the modern mathematical theory of splines, which have become ubiquitous
in numerical analysis, in geometric modeling, in design and manufacturing, in computer
graphics and animation, and in many other applications.
u′′′′ = 0, (6.7)
whose solutions are cubic polynomials:
u(x) = a x³ + b x² + c x + d. (6.8)
The coefficients are prescribed by the four imposed boundary conditions. Using (6.5), we
see that, in order to be variationally compatible, these must be of the following form,
designated by their physical interpretation:
(a) Fixed (clamped) end: u(a) = α, u′ (a) = β,
(b) Simply supported end: u(a) = α, u′′ (a) = 0,
(c) Sliding end: u′ (a) = β, u′′′ (a) = 0,
(d) Free end: u′′ (a) = u′′′ (a) = 0,
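For instance, imposing fixed (clamped) conditions at both ends, with the illustrative values u(0) = 0, u′(0) = 0, u(1) = 1, u′(1) = 0 (not taken from the text), the four boundary conditions determine the four coefficients of the cubic (6.8) by a linear solve:

```python
import numpy as np

# clamped-clamped beam on [0, 1]: u(0)=0, u'(0)=0, u(1)=1, u'(1)=0
# u(x) = a x^3 + b x^2 + c x + d solves u'''' = 0; each row below is one
# boundary condition applied to the coefficient vector (a, b, c, d)
A = np.array([
    [0, 0, 0, 1],   # u(0)  = d
    [0, 0, 1, 0],   # u'(0) = c
    [1, 1, 1, 1],   # u(1)  = a + b + c + d
    [3, 2, 1, 0],   # u'(1) = 3a + 2b + c
], dtype=float)
rhs = np.array([0.0, 0.0, 1.0, 0.0])
a, b, c, d = np.linalg.solve(A, rhs)
print(a, b, c, d)   # -2.0 3.0 0.0 0.0, i.e. u(x) = 3x^2 - 2x^3
```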
Example 6.2. Euler’s elastica models the bending of a thin flexible rod, cf. [3, 31, 36].
(The beam equation in Example 6.1 can be regarded as a first order approximation of the
fully nonlinear elastica, under the assumption that the derivative (deformation gradient)
is small: u′ ≪ 1.) The problem was originally posed by Jacob Bernoulli in 1691, and then
essentially solved by Euler in 1744.
Remark : In Euler’s treatment, he also imposed the isoperimetric constraint on the elas-
tica that its length be fixed. This constraint can be handled by the methods introduced
in the following section; see also [31].
For simplicity, we assume that the elastica is restricted to lie in a plane, and its shape
is a curve that can be identified with the graph of a function y = u(x) for a ≤ x ≤ b.
Let ℓ denote the length of the elastica (which can be computed using the usual arc length
integral), and let
$$\kappa = \frac{u''}{(1 + u'^{\,2})^{3/2}} \eqno(6.9)$$
denote its curvature, [24]. Then the equilibrium configurations minimize the elastica
functional
$$J[\, u \,] = \int_0^{\ell} \tfrac{1}{2}\, \kappa^2\; ds = \int_a^b \frac{(u'')^2}{2\, (1 + u'^{\,2})^{5/2}}\; dx. \eqno(6.10)$$
where δ is a second constant of integration. Now the integral on the left hand side cannot
be performed in terms of elementary functions, and is a type of elliptic integral, whose
inverse is an elliptic function. Thus, the curvature of an elastica is, in general, an elliptic
function of the arc length.
Remark : Elliptic functions and elliptic integrals are of importance in a broad range of
mathematics and applications, [41]. They first appeared in the literature in Euler’s work
on the elastica.
Finally, to recover the elastic curve, we need the formulas for recovering a curve from its
curvature, [24; Theorem 5.14].
Theorem 6.3. Suppose κ(s) is the curvature of a plane curve C ⊂ R², expressed as a
function of arc length. Then the curve itself is parametrized by (x(s), y(s)), where
$$x(s) = \int \cos\theta(s)\; ds + \xi, \qquad y(s) = \int \sin\theta(s)\; ds + \eta, \qquad \theta(s) = \int \kappa(s)\; ds + \zeta, \eqno(6.14)$$
$$\frac{d\theta}{ds} = \frac{1}{\sqrt{1 + u'^{\,2}}}\, \frac{d\theta}{dx} = \frac{u''}{(1 + u'^{\,2})^{3/2}} = \kappa,$$
reproducing the final formula in (6.14). Q.E.D.
Remark : Since the curvature is defined as the inverse of the radius of the osculating
circle to the curve, it is unaffected by a rigid motion, i.e., a translation or rotation of the
plane. Thus any translation or rotation of a curve produces the same curvature function
κ(s). Conversely, any two curves that have the same curvature function differ only by a
rigid motion. The constants of integration in (6.14) can be identified with the effect of
translations and rotations on the curve.
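The reconstruction (6.14) is straightforward to carry out numerically. The sketch below evaluates the three quadratures with the trapezoidal rule and, for the constant curvature κ ≡ 1 with integration constants ξ = η = ζ = 0, recovers the unit circle centered at (0, 1), as one expects.

```python
import numpy as np

def cumtrapz0(f, s):
    # cumulative trapezoidal integral of f over s, starting from 0
    return np.concatenate(([0.0], np.cumsum((f[1:] + f[:-1]) / 2 * np.diff(s))))

def curve_from_curvature(kappa, s, xi=0.0, eta=0.0, zeta=0.0):
    """Recover a plane curve from its curvature via (6.14):
    theta = ∫ kappa ds + zeta,  x = ∫ cos(theta) ds + xi,  y = ∫ sin(theta) ds + eta."""
    theta = zeta + cumtrapz0(kappa, s)
    return xi + cumtrapz0(np.cos(theta), s), eta + cumtrapz0(np.sin(theta), s)

s = np.linspace(0.0, 2*np.pi, 100001)
x, y = curve_from_curvature(np.ones_like(s), s)   # constant curvature kappa = 1
r = np.hypot(x, y - 1.0)                          # distance from the center (0, 1)
print(r.min(), r.max())                           # both ≈ 1: a unit circle
```

Substituting the elliptic function formula for κ(s) in place of the constant gives numerical elastica shapes in exactly the same way.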
Finally, substituting the elliptic function formula for the curvature into (6.14) produces
some quite complicated integrals; nevertheless they can be straightforwardly approximated
through numerical integration, which enabled Euler to produce illustrations of a variety
of possible elastica configurations; Figure 12 is copied from his original paper and shows
several possible shapes achieved by a planar elastica. Note that not all of these are the
graphs of single-valued functions, and for the more general curves one needs to rewrite
the curvature and arc length element for a general parametrized curve x = u(t), y = v(t),
leading to a second order parametric variational problem, whose Euler–Lagrange equations
satisfy the same dependency (5.21) as we found in the first order case. See [31] for further
examples, along with an alternative method for integrating the Euler–Lagrange equation.
where the extremal u(x) is subject to one or more constraints in addition to the usual
range of boundary conditions. Such constraints come in three main flavors:
where c ∈ R is a constant.
• Holonomic † constraints: these are pointwise constraints on the minimizer of the general
form‡
G(x, u) = 0. (7.3)
• Nonholonomic constraints: these are pointwise constraints on the minimizer and its
derivative of the general form
K(x, u, u′ ) = 0. (7.4)
$$\frac{d}{dx}\, G(x, u) = \frac{\partial G}{\partial x} + u'\, \frac{\partial G}{\partial u} = 0,$$
which is equivalent to the holonomic constraint G(x, u) = c, where c ∈ R is the
constant of integration, which can be fixed by the imposed boundary conditions.
Nonholonomic constraints arise in mechanics and in optimal control, and are con-
siderably more subtle than the holonomic version. For full details, we refer the
reader to [7].
Before explaining how to adapt the variational calculus to constrained optimization
problems, let us introduce a few interesting examples.
Example 7.1. Queen Dido’s Problem. According to the legend of the founding of
Carthage, Queen Dido was granted all the land she could enclose within an ox hide. She
cleverly cut the hide into thin strips that she tied together to form a rope, which she then
used to maximize the area enclosed by the rope. The simplest version of her problem is
the isoperimetric problem introduced in Section 2. Here, the rope forms a closed curve in
the (flat) plane, and one seeks to maximize the area enclosed subject to its length being
fixed. The solution is well known to be a circle with perimeter ℓ; the reader may enjoy
adapting our subsequent analysis to establish this.
Since the area enclosed by a circle with perimeter ℓ = 2 π r, where r is its radius, is
given by π r² = ℓ²/(4 π), which is the maximal area among all such curves, we deduce the
†
The word “holonomic” was coined in 1894 by the German physicist Heinrich Hertz by combining
the Greek words for “entire” and “law”.
‡
Actually, a holonomic constraint on a scalar function is a bit silly, as it would essentially
uniquely prescribe the function u(x), and hence the optimization would be trivial. Thus, holo-
nomic constraints will only play a role when the objective functional depends on more than one
function: u(x) = (u1 (x), . . . , un (x)) for n ≥ 2.
Figure 13. Queen Dido's Problem.
among all non-negative functions that satisfy the following length constraint and boundary
conditions:
$$L[\, u \,] = \int_{-a}^{a} \sqrt{1 + u'^{\,2}}\; dx = \ell, \qquad u(-a) = u(a) = 0. \eqno(7.7)$$
The endpoints can either be fixed or variable, and the reader may enjoy speculating as to
the shape of the solution in each case before we solve the problem later in this section.
Example 7.2. Geodesics on Surfaces. As noted in Section 2, the surface geodesic
problem is to find the shortest path between two points in R 3 , but with a constraint that
G(x, y, z) = 0. (7.8)
For example, the geodesics on a sphere are parametrized curves u(t) = (x(t), y(t), z(t))
satisfying the constraint
$$x^2 + y^2 + z^2 = r^2, \eqno(7.9)$$
where 0 < r ∈ R is the radius of the sphere, that minimize the arc length functional
$$J[\, u \,] = \int_a^b \sqrt{\dot x^2 + \dot y^2 + \dot z^2}\; dt \eqno(7.10)$$
among all curves between the initial and final points u(a) = α, u(b) = β that both
lie on the sphere. As written, the arc length functional is independent of the parameter, and
hence one expects the geodesic equations to be underdetermined, of the form discussed in
Section 5.
As with any holonomic constraint, there are two evident methods one can employ to
solve this problem. One is to explicitly parametrize the surface — for example in the
case of a sphere (7.9), one can use spherical coordinates. However, finding an explicit
parametrization of a complicated implicitly defined surface (7.8) may not be so easy. The
other approach is to use the methods based on Lagrange multipliers, to be presented below.
Lagrange Multipliers
Let us begin by reviewing the calculus of Lagrange multipliers for finite-dimensional con-
strained optimization problems, named in honor of Lagrange, who pioneered the method.
The Lagrange multiplier calculus will then be adapted to the infinite-dimensional situation
under consideration.
To keep the calculations simple, let us concentrate first on the three-dimensional case of
finding the minimum value of a scalar-valued objective function F (x) = F (x, y, z) when
its arguments (x, y, z) are constrained to lie on an implicitly defined surface S ⊂ R³, as
given by a constraint of the form (7.8). Suppose x⋆ = (x⋆, y⋆, z⋆)ᵀ ∈ S is a (local)
minimum for the constrained objective function. Let x(t) = (x(t), y(t), z(t))ᵀ ⊂ S be
any curve contained within the surface that passes through the minimum, with x(0) = x⋆,
x′ (0) = v. Then the scalar function g(t) = F (x(t)) must have a local minimum at t = 0,
and hence, by the chain rule,
$$0 = g'(0) = \frac{d}{dt}\, F\bigl(\mathbf x(t)\bigr)\Bigr|_{t=0} = \nabla F\bigl(\mathbf x(0)\bigr) \cdot \mathbf x'(0) = \nabla F(\mathbf x^{\star}) \cdot \mathbf v. \eqno(7.11)$$
Thus, the gradient of the objective function at the surface minimum must be orthogonal
to the tangent vector to the curve. Since the curve is constrained to lie entirely in S, its
tangent vector v is tangent to the surface at the point x⋆ . Since, by the Implicit Function
Theorem, [4, 35], every tangent vector to a sufficiently regular surface is tangent to some
curve contained therein, ∇F (x⋆ ) must be orthogonal to every tangent vector, and hence
point in the normal direction to the surface.
∇F (x⋆ ) = λ n, (7.12)
where n denotes a (nonzero) normal to the surface at the point x⋆ . The scalar factor λ
is known as the Lagrange multiplier . The value of the Lagrange multiplier is not fixed
a priori, but must be determined by solving the critical point system (7.12). The same
reasoning applies to local maxima, which are also constrained critical points. The nature of
a constrained critical point — local minimum, local maximum, local saddle point, etc. —
is, in favorable cases, determined by a constrained second derivative test; see [47; Section
4] for details.
For an implicitly defined surface (7.8), the gradient vector ∇G(x) points in the normal
direction to the surface at the point x ∈ S, and hence, provided the regularity condition
n = ∇G(x) ≠ 0 holds, the surface critical point condition (7.12) (dropping the star) can
be rewritten as
∇F (x) = λ ∇G(x). (7.13)
Thus, to find the constrained critical points, one needs to solve the combined system (7.8,
13) consisting of four equations in four unknowns x, y, z and the Lagrange multiplier λ.
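The four-equation system can be solved symbolically for a concrete choice of objective; the example below (F = x + y + z on the unit sphere) is an illustrative choice, not one from the text.

```python
import sympy as sp

x, y, z, lam = sp.symbols('x y z lambda', real=True)
F = x + y + z                      # objective function (illustrative)
G = x**2 + y**2 + z**2 - 1         # constraint surface: the unit sphere

# four equations in four unknowns: grad F = lambda * grad G together with G = 0
eqs = [sp.Eq(F.diff(s), lam * G.diff(s)) for s in (x, y, z)] + [sp.Eq(G, 0)]
sols = sp.solve(eqs, [x, y, z, lam], dict=True)
print([(sol[x], sol[lam]) for sol in sols])
# two critical points: x = y = z = ±1/sqrt(3), lambda = ±sqrt(3)/2
```

The critical point with x = y = z = −1/√3 is the constrained minimum (value −√3), the other the constrained maximum.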
Formally, one can reformulate the problem as an unconstrained optimization problem by
introducing the augmented objective function
Theorem 7.3. Every regular constrained local minimum and local maximum is a
constrained critical point that satisfies the Lagrange multiplier equation (7.16).
†
When ℓ = 2 a, the solution is a straight line, which does not appear among the solutions to the
Euler–Lagrange equation. Effectively it corresponds to the Lagrange multiplier λ = ∞.
Remark : One can further show that the semicircle is the solution to the variable endpoint
problem, satisfying the natural boundary condition that it be orthogonal to the x axis.
The reader may wish to analyze the case when only one endpoint is fixed.
Example 7.5. A relatively simple variational problem with a holonomic constraint is
that of geodesics on an implicitly defined surface S ⊂ R³, given by an equation of the form
F (x, y, z) = c (7.20)
for c ∈ R (which could be set equal to zero by modifying F ). We will assume the surface is
non-empty, meaning there exist real solutions to the implicit equation (7.20), and, moreover,
is regular, meaning that
$$\nabla F(x, y, z) \neq 0$$
whenever (x, y, z) satisfy (7.20); this ensures that the surface is smooth, without corners,
cusps, or other singularities. Note that ∇F defines a normal vector to the surface at each
point. For example, the unit sphere is defined by the implicit equation x² + y² + z² = 1,
and the normal vector ∇F = 2 (x, y, z) is the outwards pointing radial vector.
A geodesic between two points A, B ∈ S is, by definition, a curve C ⊂ S that provides
a minimum for the arc length functional
$$J[\, \mathbf x \,] = \int_a^b \sqrt{\dot x^2 + \dot y^2 + \dot z^2}\; dt,$$
among all parametrized curves x(t) = (x(t), y(t), z(t)) that satisfy the holonomic con-
straint (7.20) for each a ≤ t ≤ b and join the points A = x(a), B = x(b). Applying
the method of Lagrange multipliers, we see that every geodesic must satisfy the system of
second order ordinary differential equations
$$-\, \frac{d}{dt}\, \frac{\dot x}{\sqrt{\dot x^2 + \dot y^2 + \dot z^2}} = \mu\, \frac{\partial F}{\partial x}\,, \qquad -\, \frac{d}{dt}\, \frac{\dot y}{\sqrt{\dot x^2 + \dot y^2 + \dot z^2}} = \mu\, \frac{\partial F}{\partial y}\,, \qquad -\, \frac{d}{dt}\, \frac{\dot z}{\sqrt{\dot x^2 + \dot y^2 + \dot z^2}} = \mu\, \frac{\partial F}{\partial z}\,, \eqno(7.21)$$
$$\dot x\, E_x(\widetilde L) + \dot y\, E_y(\widetilde L) + \dot z\, E_z(\widetilde L) = 0,$$
or, equivalently,
$$\frac{d\mathbf x}{dt} \cdot \Bigl[\, \frac{d}{dt}\Bigl( \frac{1}{\|\, d\mathbf x/dt \,\|}\, \frac{d\mathbf x}{dt} \Bigr) + \mu\, \nabla F(\mathbf x) \Bigr] = 0,$$
as the reader can verify. Keep in mind that if x(t) satisfies (7.20), then ẋ · ∇F = 0, meaning
that the tangent vector to the geodesic is orthogonal to the normal vector to the surface.
For example, in the case of the unit sphere, one easily checks that the great circle through
the north pole parametrized by
$$x = \sqrt{1 - t^2}\,, \qquad y = 0, \qquad z = t, \qquad \mu(t) = \frac{1}{\sqrt{1 - t^2}}\,, \eqno(7.22)$$
satisfies the Euler–Lagrange equation (7.21). Since the Euler–Lagrange equations are
invariant under rotations of the sphere, we deduce that the solutions, and candidate
geodesics, are the great circles. With some further work one can show that the minimizing
geodesic between any two points on the sphere is the shorter of the two great circles con-
necting them — unless the points are antipodal in which case all the semi-circular great
circles connecting them have the same minimal arc length. For instance, the great circle
(7.22) is the unique geodesic connecting its endpoints provided 0 ≤ t ≤ b < 1. When
b = 1, it is a semicircle from north to south poles. The south pole, where the nature of
the geodesics changes, is known as a conjugate point to the north pole. We will study
conjugate points in depth in Section 8.
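A numerical comparison makes the minimizing property concrete. The endpoints below (the north pole and a point on the equator) and the competing detour path are illustrative choices: the great circle quadrant has length π/2, and any on-sphere path that wanders away from it is strictly longer.

```python
import numpy as np

def arc_length(path):
    # discrete arc length of a space curve sampled as a (3, N) array
    return float(np.linalg.norm(np.diff(path, axis=1), axis=0).sum())

t = np.linspace(0.0, 1.0, 200001)
phi = t * np.pi/2            # colatitude: from the north pole (0,0,1)
                             # to the equatorial point (1,0,0)

# great-circle path: stays in the x-z plane
gc = np.vstack([np.sin(phi), np.zeros_like(t), np.cos(phi)])

# a competing on-sphere path that wanders in longitude (arbitrary choice)
lon = 0.5 * np.sin(np.pi * t)
det = np.vstack([np.sin(phi)*np.cos(lon), np.sin(phi)*np.sin(lon), np.cos(phi)])

l_gc, l_det = arc_length(gc), arc_length(det)
print(l_gc, l_det)           # ≈ π/2 ≈ 1.5708, and something strictly larger
```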
G(x, u) = 0. (7.25)
Suppose that u(x) is a local minimizer. To obtain necessary conditions, the first step
is to perform a variation. Since the constraint will probably not allow linear variations
of the type considered above, we let uε (x) = u(ε, x) be a one-parameter family of func-
tions, depending on (sufficiently small) ε ∈ R, satisfying the boundary conditions and the
constraint† :
G(x, uε (x)) = 0, uε (a) = α, uε (b) = β, (7.26)
We further assume that, at ε = 0, the function u0 (x) = u(x) is our candidate minimizer.
Set
$$\frac{d}{d\varepsilon}\, u_\varepsilon(x)\Bigr|_{\varepsilon=0} = \frac{\partial u}{\partial \varepsilon}(0, x) = \varphi(x). \eqno(7.27)$$
Differentiating the boundary conditions with respect to ε, and then setting ε = 0, we
see that the linearized variation ϕ must satisfy the homogeneous boundary conditions
ϕ(a) = 0, ϕ(b) = 0. Similarly differentiating the constraint (7.26) produces
in which
$$E(L) = \bigl(E_1(L), \ldots, E_n(L)\bigr) \qquad \text{where} \qquad E_i(L) = \frac{\partial L}{\partial u_i} - \frac{d}{dx}\, \frac{\partial L}{\partial p_i} \eqno(7.30)$$
†
Thus, the family plays the role of the curve lying in the constraint submanifold in the finite-
dimensional case.
where
$$\mu(x) = \frac{E(L) \cdot \mathbf g(x)}{\|\, \mathbf g(x) \,\|^2}\,, \eqno(7.32)$$
which will play the role of our Lagrange multiplier. Now since ψ(x) is arbitrary, we can
apply the Fundamental Lemma 3.4 to the latter integral in (7.31), and conclude that the
constrained minimizer u(x) must satisfy the system
$$E(L) - \mu(x)\, \mathbf g(x) = E(L)\bigl(x, u(x), u'(x), u''(x)\bigr) - \mu(x)\, \nabla_{u} G\bigl(x, u(x)\bigr) = 0, \eqno(7.33)$$
which is precisely the Lagrange multiplier equation introduced above. Indeed, (7.32) even
provides an explicit formula for the Lagrange multiplier function µ(x).
In the case of several holonomic constraints
is the n × k matrix with the indicated columns, and the invertibility of the positive
definite Gram matrix G(x)T G(x) follows from our regularity assumption that the columns
of G(x) are linearly independent, [49].
Next, consider the minimization problem for the scalar functional (7.1) subject to the
integral constraint (7.2) and fixed boundary conditions (7.24). Suppose that u(x) is a
local minimizer. Let uε (x) = u(ε, x) be a one-parameter family of functions, depending on
(sufficiently small) ε ∈ R, with u0 = u, that satisfies the constraint, so
$$I[\, u_\varepsilon \,] = \int_a^b F(x, u_\varepsilon, u_\varepsilon')\; dx = c. \eqno(7.37)$$
Set ϕ(x) = ∂uε /∂ε | ε = 0 , which, as usual, satisfies the homogeneous fixed boundary con-
ditions. Differentiating (7.37) with respect to ε, setting ε = 0, and integrating the result
by parts, noting that the induced boundary terms vanish, produces
$$\int_a^b E(F)\, \varphi(x)\; dx = 0 \qquad \text{where} \qquad E(F) = \frac{\partial F}{\partial u} - \frac{d}{dx}\, \frac{\partial F}{\partial p} \eqno(7.38)$$
is the Euler–Lagrange expression associated with the integrand F (x, u, p) evaluated at the
function u(x). On the other hand, the variation of the functional (7.1) is given by our
usual formula (3.11), and hence we require
$$\int_a^b E(L)\, \varphi(x)\; dx = 0$$
whenever ϕ(x) satisfies (7.38). To proceed, we are in need of a constrained version of the
Fundamental Lemma 3.4.
†
In the third equality, we just switched the dummy integration variables x ↔ y.
‡
If not, just reduce the list of functions to a linearly independent subset.
$$\int_a^b g_i(x)\, \eta_j(x)\; dx = \begin{cases} 1, & i = j, \\ 0, & i \neq j. \end{cases}$$
Let ψ(x) be an arbitrary C∞ function with compact support. Then
$$\varphi(x) = \psi(x) - \sum_{j=1}^{k} \eta_j(x) \int_a^b g_j(y)\, \psi(y)\; dy \qquad \text{satisfies} \qquad \int_a^b g_i(x)\, \varphi(x)\; dx = 0, \qquad i = 1, \ldots, k,$$
and hence, performing the same calculation, we conclude that (7.39) holds with
$$\lambda_i = \int_a^b f(x)\, \eta_i(x)\; dx.$$
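The projection construction above is easy to test numerically. In the sketch below the constraint functions g₁(x) = 1, g₂(x) = x on [0, 1] and the test function ψ are illustrative choices: the biorthogonal functions η_j are built from the inverse Gram matrix, and the resulting ϕ is orthogonal to every g_i.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 10001)
dx = x[1] - x[0]
# trapezoidal L^2 inner product on [0, 1]
ip = lambda f, h: float(np.sum((f*h)[1:] + (f*h)[:-1]) * dx / 2)

g = np.vstack([np.ones_like(x), x])        # constraint functions g1 = 1, g2 = x

# build eta_j with  ∫ g_i eta_j dx = delta_ij  via the inverse Gram matrix
M = np.array([[ip(gi, gk) for gk in g] for gi in g])
eta = np.linalg.inv(M) @ g                 # rows are eta_1, eta_2

psi = np.sin(7*np.pi*x)**2                 # an arbitrary smooth test function
phi = psi - sum(eta[j] * ip(g[j], psi) for j in range(len(g)))

residuals = [ip(gi, phi) for gi in g]
print(residuals)                           # ≈ [0, 0]
```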
Optimal Control
The basic problem of optimal control theory is to produce a suitable control function
u(t) depending on the time variable t that governs the state function v(t), which solves an
initial value problem for a differential equation involving the control, say
$$\frac{dv}{dt} = K(t, u, v), \qquad v(a) = \alpha. \eqno(7.40)$$
The goal is to determine the control function that optimizes an objective functional of the
form
$$J[\, u, v \,] = \int_a^b L(t, u, v)\; dt, \eqno(7.41)$$
known as the performance measure, where the final time t = b can be either fixed or
variable, depending on the problem. Thus, from the viewpoint of the calculus of variations,
the optimal control problem is to optimize the objective functional (7.41) when subject to
the nonholonomic constraint and initial condition (7.40).
Optimal control problems arise in a very broad range of applications, including control
of mechanical devices, e.g., oscillators, mechanisms, and robots, of vehicles, e.g., aircraft,
rockets, space ships, and submarines, of chemical reactions, of electrical circuits and ther-
mostats, and so on. Details can be found in textbooks on the subject, including [7, 29, 34].
Remark : Sometimes one includes a quantity dependent on the final time, and seeks to
optimize the augmented objective functional
$$J[\, u, v \,] = S\bigl(b, v(b)\bigr) + \int_a^b L(t, u, v)\; dt. \eqno(7.42)$$
For example, one might seek to minimize the final time itself, in which case L ≡ 0 and
S(b, v) = b. However, by subtracting the constant term S(a, v(a)) = S(a, α), one can
where the Lagrange multiplier µ(t) is a function of t because (7.40) is a pointwise constraint.
As before, let
uε (t) = u(ε, t), vε (t) = v(ε, t), (7.44)
be a one-parameter family of functions, depending on ε ∈ R, satisfying the constraint along
with any imposed boundary conditions at a possibly variable final time b(ε). We further
assume that
$$u(0, t) = u(t), \qquad v(0, t) = v(t), \qquad b(0) = b,$$
$$\frac{d}{d\varepsilon}\, u(\varepsilon, t)\Bigr|_{\varepsilon=0} = \varphi(t), \qquad \frac{d}{d\varepsilon}\, v(\varepsilon, t)\Bigr|_{\varepsilon=0} = \psi(t), \qquad \frac{d}{d\varepsilon}\, b(\varepsilon)\Bigr|_{\varepsilon=0} = \eta, \eqno(7.45)$$
where u(t), v(t) are our candidate minimizing state and control functions, and b is the
corresponding final time. If the final time is fixed, then b(ε) = b for all ε, and so its
variation η = 0. As always, the variation of the augmented functional (7.43) with respect
to the Lagrange multiplier just reproduces the constraint, and so we do not include it in the
computation. Applying the same methodology as in (4.28), integrating the term involving
ψ̇ by parts, and using (7.40) to eliminate some of the boundary terms, produces
$$0 = \frac{d}{d\varepsilon} \int_a^{b(\varepsilon)} \Bigl[\, L(t, u_\varepsilon, v_\varepsilon) - \mu\, \bigl( \dot v_\varepsilon - K(t, u_\varepsilon, v_\varepsilon) \bigr) \Bigr]\; dt\; \Bigr|_{\varepsilon = 0}$$
$$= L\bigl(b, u(b), v(b)\bigr)\, \eta + \mu(b)\, \psi(b) + \int_a^b \Bigl[\, \Bigl( \frac{\partial L}{\partial u} + \mu\, \frac{\partial K}{\partial u} \Bigr)\, \varphi + \Bigl( \dot\mu + \mu\, \frac{\partial K}{\partial v} + \frac{\partial L}{\partial v} \Bigr)\, \psi \Bigr]\; dt, \eqno(7.46)$$
v(b) = β (7.48)
are fixed, then both variations must vanish there: ψ(b) = η = 0, and hence the system con-
sists of the initial value problem (7.40) for the state, as controlled by u, the Euler–Lagrange
equations (7.47), and the boundary condition (7.48). These suffice, under suitably generic
hypotheses, to uniquely determine the solution.
Second, if the final time is fixed but the final state is not specified, then η = 0 but ψ(b)
is no longer fixed, and hence one replaces the boundary condition at t = b by the natural
boundary condition produced by the second boundary term, namely,
µ(b) = 0, (7.49)
specifying the value of the costate at the final time.
Third, if the final state is specified, as in (7.48), but the final time is not fixed, then
we need to make sure that the variations also satisfy the final time condition on the
state:
v(b + ε η) + ε ψ(b + ε η) = β.
Differentiating this equation with respect to ε, setting ε = 0, and using (7.40), produces
$$0 = \dot v(b)\, \eta + \psi(b) = K\bigl(b, u(b), v(b)\bigr)\, \eta + \psi(b).$$
Substituting the resulting formula for ψ(b) back into the boundary terms in (7.46), we
deduce the corresponding final time condition
$$\mu(b) = \frac{L\bigl(b, u(b), v(b)\bigr)}{K\bigl(b, u(b), v(b)\bigr)}\,. \eqno(7.50)$$
Now the system consists of (7.40, 47, 48, 50), which, again under suitable generic hypothe-
ses, serve to uniquely determine the state v(t), the control u(t), the costate µ(t), and the
final time b.
Example 7.7. Let us consider the control of an undamped harmonic oscillator. Choosing
units in which the oscillatory frequency equals 1, the state equation is
\[
\frac{d^2 v}{dt^2} + v = u, \tag{7.51}
\]
in which the state variable v(t) measures the displacement of the oscillator from equilib-
rium, and u(t) is used to control its motion. A simple problem is when the oscillator that
on the state variable. Let us select the control u(t) that minimizes the total effort, as
measured by the integral
\[
J[\,u, v\,] = \int_0^{\pi} \tfrac{1}{2}\, u^2\, dt. \tag{7.53}
\]
Although the constraint (7.51) is second order, we can apply the same Lagrange multi-
plier method, starting with the augmented functional
\[
\widetilde J[\,u, v\,] = \int_0^{\pi} \Bigl[\, \tfrac{1}{2}\, u^2 - \mu\, \bigl(\, \ddot v + v - u \,\bigr) \Bigr]\, dt. \tag{7.54}
\]
\[
E_u(\widetilde L\,) = u + \mu = 0, \qquad E_v(\widetilde L\,) = -\, \frac{d^2 \mu}{dt^2} - \mu = 0, \tag{7.55}
\]
which are supplemented by the constraint (7.51) and the boundary conditions (7.52).
(Since the final time and state are both fixed, there are no boundary conditions imposed
on the costate µ.) We solve
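The first Euler–Lagrange expression in (7.55) can be confirmed by a small finite-difference computation: perturbing only u in the augmented functional (7.54) leaves a derivative equal to the integral of (u + μ) φ. This is a numeric sketch, not part of the original example; the functions u, μ, φ below are arbitrary test choices.

```python
import math

# Check: d/de J~[u + e*phi, v] |_{e=0} = \int_0^pi (u + mu) phi dt,
# i.e. E_u(L~) = u + mu, as in (7.55).  All test functions are arbitrary.
N = 2000
h = math.pi / N
ts = [i * h for i in range(N + 1)]

u   = [math.cos(t) for t in ts]          # arbitrary control
mu  = [math.sin(2 * t) for t in ts]      # arbitrary costate
phi = [math.sin(t) ** 2 for t in ts]     # arbitrary variation of u

def trapz(vals):
    # composite trapezoid rule on the grid ts
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

def J(eps):
    # u-dependent part of (7.54): \int ( u^2/2 + mu*u ) dt;
    # the terms not involving u drop out of the e-derivative.
    ue = [a + eps * b for a, b in zip(u, phi)]
    return trapz([0.5 * w * w + m * w for w, m in zip(ue, mu)])

eps = 1e-6
numeric  = (J(eps) - J(-eps)) / (2 * eps)           # central difference
analytic = trapz([(a + m) * b for a, m, b in zip(u, mu, phi)])
print(abs(numeric - analytic) < 1e-8)
```

Since the perturbed functional is quadratic in ε, the central difference recovers the analytic first variation up to rounding error.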
Nonholonomic Mechanics
Surprisingly, the equations of nonholonomic mechanics cannot be properly formulated
using the Lagrange multiplier methods that we employed to derive the equations of optimal
control! The variational calculus employed in the preceding subsection is often referred to
as Hamilton’s principle, whereas nonholonomic mechanics relies on an alternative varia-
tional calculation, known as the Lagrange-d’Alembert principle. The difference between the
two principles lies in their specifications of the allowable variations. The counterintuitive
fact is that the equations derived using Hamilton’s principle do not reproduce Newton’s
depending on two functions u, v of the time variable t. Suppose they are subject to a single
holonomic constraint
G(t, u, v) = 0. (7.59)
As before, let uε (t) = u(ε, t), vε (t) = v(ε, t), be a family of variations depending on ε ∈ R,
with
\[
u(0, t) = u(t), \qquad v(0, t) = v(t), \qquad \left. \frac{d}{d\varepsilon}\, u(\varepsilon, t) \right|_{\varepsilon = 0} = \varphi(t), \qquad \left. \frac{d}{d\varepsilon}\, v(\varepsilon, t) \right|_{\varepsilon = 0} = \psi(t), \tag{7.60}
\]
where u(t), v(t) is our candidate minimizer. Then, substituting into the constraint (7.59),
differentiating with respect to ε and setting ε = 0, we deduce that the variations ϕ, ψ,
satisfy the linearized constraint equation
\[
\frac{\partial G}{\partial u}\,(t, u, v)\; \varphi + \frac{\partial G}{\partial v}\,(t, u, v)\; \psi = 0, \tag{7.61}
\]
from which, as above, we recover the constrained Euler–Lagrange equations
\[
E_u(L) = \mu\, \frac{\partial G}{\partial u}\,, \qquad E_v(L) = \mu\, \frac{\partial G}{\partial v}\,, \tag{7.62}
\]
where µ(t) is the Lagrange multiplier. The solutions u, v, µ to (7.59, 62) that satisfy the
relevant boundary conditions (fixed or natural) are the critical functions for the constrained
variational principle.
Now, if we differentiate the constraint (7.59) with respect to t, we obtain
\[
\frac{\partial G}{\partial t}\,(t, u, v) + \frac{\partial G}{\partial u}\,(t, u, v)\; \dot u + \frac{\partial G}{\partial v}\,(t, u, v)\; \dot v = 0. \tag{7.63}
\]
The key observation is that the terms involving $\dot u, \dot v$ coincide with the linearized constraint
equation (7.61) under the identification $(\varphi, \psi) \leftrightarrow (\dot u, \dot v)$. This observation underlies the
Lagrange-d’Alembert principle in the general nonholonomic case. At each point (t, u, v),
the variations must satisfy the homogeneous linear constraints imposed on the velocities.
Thus, given a linear nonholonomic constraint
\[
M(t, u, v, \dot u, \dot v\,) = 0, \tag{7.66}
\]
the physically permitted variations will satisfy the constraint obtained by linearization
with respect to the velocity variables:
\[
\frac{\partial M}{\partial \dot u}\; \varphi + \frac{\partial M}{\partial \dot v}\; \psi = 0. \tag{7.67}
\]
In other words, at each configuration (t, u, v), the physically allowed variations (ϕ, ψ) must
lie on a tangent direction to the curve in the $(\dot u, \dot v)$ plane specified by the constraint (7.66).
Let us see how this restriction affects the variational calculus. Substituting the variations
into (7.58), differentiating with respect to ε, setting ε = 0 using (7.44), and then integrating
the result by parts and discarding the boundary terms (e.g., by assuming the variations
have compact support), we find that
\[
\int_a^b \Bigl[\, E_u(L)\, \varphi + E_v(L)\, \psi \,\Bigr]\, dt = 0
\]
whenever ϕ, ψ satisfy the Lagrange-d’Alembert condition (7.65), where Eu (L), Ev (L) are
the usual Euler–Lagrange expressions (5.3). Applying the method of Lagrange multipliers,
we arrive at the Lagrange-d’Alembert variational equations
where µ(t) is the Lagrange multiplier. If A, B come from a holonomic constraint, these
equations coincide with (7.62). Thus, u, v satisfy a second order system of ordinary differ-
ential equations that involves the Lagrange multiplier µ(t), whose value will be prescribed
by the constraint (7.64). In the presence of suitable initial and/or boundary conditions,
the constrained system serves to uniquely specify the critical functions. One finds, [7], that
applying this methodology to physical systems with nonholonomic constraints reproduces
the correct Newtonian dynamics.
Surprisingly, the Lagrange multiplier calculus presented in the preceding subsection —
that is, Hamilton’s principle — leads to a different variational system that, in the case of
nonholonomic mechanics, does not reproduce the physically correct equations. To see this,
note that if the variations ϕ, ψ, are not a priori constrained, then upon substituting (7.44)
into the nonholonomic constraint (7.64), differentiating with respect to ε and setting ε = 0,
the resulting variational condition requires
\[
A\, \dot\varphi + B\, \dot\psi + \left( \frac{\partial A}{\partial u}\, \dot u + \frac{\partial B}{\partial u}\, \dot v + \frac{\partial C}{\partial u} \right) \varphi + \left( \frac{\partial A}{\partial v}\, \dot u + \frac{\partial B}{\partial v}\, \dot v + \frac{\partial C}{\partial v} \right) \psi = 0,
\]
role of the Lagrange multiplier, previously denoted by µ. On the other hand, if (7.69) is
not satisfied, then (7.71) is not the same as the physical Lagrange-d’Alembert equations
(7.68); the additional terms multiplying µ are in violation of Newton’s Laws relating forces,
masses, and accelerations. On the other hand, if this were an optimal control problem,
(7.71) would be the correct equations of motion.
Example 7.8. A simple example of a nonholonomic mechanical system is a skier
sliding down a slope. We idealize the physical situation by assuming that the slope is a
flat plane, and the skier is modeled by a mass on a single blade (a knife edge) that slides
frictionlessly through the snow. Let x, y denote coordinates within the plane, so that the
plane slopes downhill in the positive x direction making an angle α with the horizontal;
see Figure 15.
The position of the mass on the hill is prescribed by (x(t), y(t)) while the blade’s orien-
tation relative to the x-axis is prescribed by the angle θ(t), which, as always, is measured
modulo 2 π. In the absence of external forcing, the variational problem is the difference
between kinetic and potential energy
\[
J[\,x, y, \theta\,] = \int_a^b \Bigl[\, \tfrac{1}{2}\, m\, \bigl( \dot x^2 + \dot y^2 \bigr) + \tfrac{1}{2}\, I\, \dot\theta^{\,2} + m\, g\, x \sin\alpha \,\Bigr]\, dt, \tag{7.72}
\]
where g is the gravitational constant, m is the skier’s mass, and I the skier’s moment of in-
ertia (about the vertical axis), which are all assumed to be constant. The ski blade prevents
the skier from experiencing any transverse velocity, and hence imposes the nonholonomic
constraint
\[
G(x, y, \theta, \dot x, \dot y, \dot\theta\,) = \dot x \sin\theta - \dot y \cos\theta = 0. \tag{7.73}
\]
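As a quick numeric illustration (not in the original text): any motion whose velocity stays parallel to the blade direction (cos θ, sin θ) satisfies the constraint (7.73) identically. The rotation rate and speed profile below are arbitrary, hypothetical choices.

```python
import math

# Trajectories with velocity (speed * cos(theta), speed * sin(theta))
# satisfy G = xdot*sin(theta) - ydot*cos(theta) = 0 exactly.
omega = 0.7                                   # hypothetical rotation rate
theta = lambda t: omega * t
speed = lambda t: 1.0 + 0.5 * math.sin(t)     # hypothetical speed along blade

worst = 0.0
for i in range(1001):
    t = 0.01 * i
    xdot = speed(t) * math.cos(theta(t))
    ydot = speed(t) * math.sin(theta(t))
    G = xdot * math.sin(theta(t)) - ydot * math.cos(theta(t))
    worst = max(worst, abs(G))
print(worst < 1e-12)
```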
h(ε) = J[ u + ε ϕ ],
\[
A(x)\, \varphi'(x)^2 + 2\, B(x)\, \varphi(x)\, \varphi'(x) + C(x)\, \varphi(x)^2 \;>\; 0 \qquad\text{whenever}\qquad a < x < b, \tag{8.4}
\]
then Q[ u ; ϕ ] is positive definite. Viewing the left hand side as a homogeneous quadratic
polynomial in ϕ′, ϕ, positivity at a fixed x is implied by the algebraic inequalities
A(x) > 0 and B(x)² − A(x) C(x) < 0.
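The pointwise criterion follows by completing the square in the quadratic form; writing s, t for ϕ′(x), ϕ(x) at a fixed x:

```latex
A\, s^2 + 2\, B\, s\, t + C\, t^2
  \;=\; A \left( s + \frac{B}{A}\, t \right)^{\!2} + \frac{A\, C - B^2}{A}\; t^2 ,
```

so both terms are positive for (s, t) ≠ (0, 0) precisely when A > 0 and B² − A C < 0.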
Example 8.1. For the arc length minimization functional (2.3), the Lagrangian is
$L(x, u, p) = \sqrt{1 + p^2}$. To analyze the second variation, we first compute
\[
\frac{\partial^2 L}{\partial p^2} = \frac{1}{(1 + p^2)^{3/2}}\,, \qquad \frac{\partial^2 L}{\partial u\, \partial p} = 0, \qquad \frac{\partial^2 L}{\partial u^2} = 0.
\]
For the critical straight line function
\[
u(x) = \frac{\beta - \alpha}{b - a}\, (x - a) + \alpha, \qquad\text{with}\qquad u'(x) = \frac{\beta - \alpha}{b - a}\,,
\]
and so, evaluating the second derivatives of the Lagrangian at the critical function,
\[
A(x) = \frac{\partial^2 L}{\partial p^2} = \frac{(b - a)^3}{\bigl[\, (b - a)^2 + (\beta - \alpha)^2 \,\bigr]^{3/2}} \equiv c, \qquad B(x) = \frac{\partial^2 L}{\partial u\, \partial p} = 0, \qquad C(x) = \frac{\partial^2 L}{\partial u^2} = 0,
\]
where c > 0 is a positive constant. Therefore, the second variation functional (8.2) is
\[
Q[\,u\,; \varphi\,] = c \int_a^b \varphi'(x)^2\, dx \;\ge\; 0.
\]
Moreover, Q[ u ; ϕ ] vanishes if and only if ϕ(x) is a constant function. But the variation
ϕ(x) is required to satisfy the homogeneous boundary conditions ϕ(a) = ϕ(b) = 0, and
hence Q[ u ; ϕ ] > 0 for all allowable nonzero variations. Therefore, we conclude that the
straight line is, indeed, a (local, weak) minimizer for the arc length functional.
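A direct numeric sketch of this conclusion, with the hypothetical concrete boundary values u(0) = 0, u(1) = 1: the straight line u = x has arc length √2, and perturbations that preserve the boundary values are all longer.

```python
import math

# Arc length \int_0^1 sqrt(1 + u'^2) dx for u_c = x + c*sin(pi x),
# which satisfies u_c(0) = 0, u_c(1) = 1 for every c.
def arclen(c, n=20000):
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h                      # midpoint rule
        up = 1.0 + c * math.pi * math.cos(math.pi * x)
        total += math.sqrt(1.0 + up * up) * h
    return total

base = arclen(0.0)                             # straight line: sqrt(2)
longer = all(arclen(c) > base for c in (0.3, -0.2, 0.05))
print(abs(base - math.sqrt(2)) < 1e-9, longer)
```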
However, as the following example demonstrates, the pointwise positivity condition (8.4)
is overly restrictive.
Example 8.2. Consider the quadratic functional
\[
Q[\,\varphi\,] = \int_0^1 \bigl(\, \varphi'^{\,2} - \varphi^2 \,\bigr)\, dx. \tag{8.5}
\]
We claim that Q[ ϕ ] > 0 for all $\varphi \not\equiv 0$ subject to homogeneous Dirichlet boundary
conditions ϕ(0) = 0 = ϕ(1). This result is not trivial! Indeed, the boundary conditions
play an essential role, since choosing $\varphi(x) \equiv c \ne 0$ to be any constant function will produce
a negative value for the functional: Q[ ϕ ] = − c2 .
In the second equality, we integrated the middle term by parts, using (ϕ²)′ = 2 ϕ ϕ′,
and noting that the resulting boundary terms vanish owing to our imposed boundary
conditions. Since $\widetilde Q[\,\varphi\,]$ is positive definite, so is Q[ ϕ ], justifying the previous claim.
To appreciate how delicate this result is, consider the almost identical quadratic functional
\[
\widehat Q[\,\varphi\,] = \int_0^4 \bigl(\, \varphi'^{\,2} - \varphi^2 \,\bigr)\, dx, \tag{8.6}
\]
the only difference being the upper limit of the integral. A quick computation shows that
the function ϕ(x) = x (4 − x) satisfies the boundary conditions ϕ(0) = 0 = ϕ(4), but
\[
\widehat Q[\,\varphi\,] = \int_0^4 \Bigl[\, (4 - 2\,x)^2 - x^2\, (4 - x)^2 \,\Bigr]\, dx = -\,\frac{64}{5} \;<\; 0.
\]
Therefore, $\widehat Q[\,\varphi\,]$ is not positive definite. Our preceding analysis does not apply because
the function $\tan x$ becomes singular at $x = \frac{1}{2}\,\pi$, and so the auxiliary integral
$\displaystyle \int_0^4 \bigl( \varphi' + \varphi \tan x \bigr)^2\, dx$ does not converge.
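The value −64/5 claimed above is easy to confirm by quadrature; this is a small numeric check, not part of the original text.

```python
# Midpoint-rule quadrature of \int_0^4 [(4-2x)^2 - x^2(4-x)^2] dx
# for phi(x) = x(4-x); the exact value is -64/5 = -12.8.
n = 100000
h = 4.0 / n
total = 0.0
for i in range(n):
    x = (i + 0.5) * h
    total += ((4 - 2 * x) ** 2 - (x * (4 - x)) ** 2) * h
print(abs(total + 64 / 5) < 1e-4)
```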
Proposition 8.3. Suppose that A(x), B(x), C(x) are continuous for x ∈ [ a, b ], and
that the quadratic functional (8.7) is positive definite. Then A(x) ≥ 0 for all a ≤ x ≤ b.
Proof : Suppose not, so there exists a point† a < c < b where A(c) = − m < 0 for some
0 < m ∈ R. Assuming this, we will construct a function ϕ(x) satisfying the boundary
conditions such that Q[ ϕ ] < 0, in contradiction to positive definiteness.
By continuity, we can find δ > 0 so that a < c − δ < c + δ < b, and
Using the formula (8.3) for A(x) when the quadratic functional (8.7) represents the sec-
ond variation (8.2), Proposition 8.3 immediately implies the Legendre necessary condition
for a minimizer of a functional, which is named in honor of Adrien-Marie Legendre.
†
Note that if A(a) < 0, then, by continuity, there is a nearby point a < c < b where A(c) < 0;
similarly if A(b) < 0.
‡
One could, with a little more work, construct a smooth function with the required property, as
in the proof of the Fundamental Lemma 3.4.
\[
\frac{\partial^2 L}{\partial p^2} \bigl(\, x, u(x), u'(x) \,\bigr) \;\ge\; 0. \tag{8.10}
\]
∂p2
For a local maximizer, the Legendre inequality is reversed. On the other hand, the
strengthened Legendre condition, in which the inequality is strict, does not suffice to show
that a solution to the Euler–Lagrange boundary value problem is a minimizer (respectively,
maximizer). Indeed, the quadratic functional (8.6) provides an elementary counterexample,
where the zero function satisfies both the Euler–Lagrange equation and the boundary
conditions, as well as the strengthened Legendre condition, since ∂ 2 L/∂p2 ≡ 2, but, as we
saw, u(x) ≡ 0 is not a minimizer of the functional. Thus, to prove that a given solution is
a minimizer, we must find and impose additional conditions.
Envelopes
where F is continuously differentiable, then the envelope E is characterized as the set of all
points (x, y) ∈ R 2 that satisfy, for some t ∈ I, both (8.11) and the equation obtained by
differentiation with respect to the parameter:
\[
\frac{\partial F}{\partial t}\,(x, y, t) = 0. \tag{8.12}
\]
The envelope may be empty.
We are particularly interested in the case when the curves are the graphs of a one-
parameter family of functions, and so (8.11) has the form y = ut (x) = u(x, t). Thus, the
envelope E of the family ut is the set of points (x, y) ∈ R 2 satisfying
\[
y = u(x, t), \qquad \frac{\partial u}{\partial t}\,(x, t) = 0, \qquad\text{for some}\quad t \in I. \tag{8.13}
\]
Most of the results require that the function u(x, t) be continuously differentiable, although,
to avoid complications, we will assume that u ∈ C².
Example 8.5. The family y = u(x, t) = x + t of parallel straight lines has empty
envelope since ∂u/∂t ≡ 1. The family y = u(x, t) = t x of straight lines passing through the
origin satisfies ∂u/∂t = x, and hence the envelope consists of just the origin y = x = 0.
Indeed, in general, if all the curves in the family pass through a common point, then that
point belongs to the envelope.
\[
u(x, t) = \frac{t - x}{t^2}\,.
\]
Since
\[
\frac{\partial u}{\partial t} = \frac{2\,x - t}{t^3} = 0 \qquad\text{when}\qquad t = 2\,x,
\]
the envelope is the hyperbola y = 1/(4 x), as illustrated in the first plot in Figure 16.
Finally, if ρ(t) > 0 is any positive function, consider the one-parameter family of circles
(x − t)2 + y 2 = ρ(t)2
of radii ρ(t) and centered at (t, 0). Differentiating with respect to t as in (8.12), we find
that the envelope must satisfy x = t − ρ(t) ρ′ (t), and hence it is the curve parametrized by
\[
x(t) = t - \rho(t)\, \rho'(t), \qquad y(t) = \pm\, \rho(t)\, \sqrt{1 - \rho'(t)^2}\,,
\]
where t is restricted to where | ρ′ (t) | ≤ 1. Several cases are illustrated in Figure 16; the
circles are in black and the envelope is in red. In the last plot, with ρ(t) = sin t, the
envelope is a pair of mirror image cycloids.
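For the last family, the envelope formulas can be verified numerically; this sketch (not in the original text) takes ρ(t) = sin t and checks that the parametrized curve satisfies both the family equation and the t-derivative condition (8.12).

```python
import math

# For F(x,y,t) = (x-t)^2 + y^2 - rho(t)^2 with rho = sin, the curve
# x(t) = t - rho*rho', y(t) = rho*sqrt(1 - rho'^2) satisfies F = 0 and
# dF/dt = 0 simultaneously -- the envelope characterization.
rho  = math.sin
rhop = math.cos          # rho'(t)

worst_F = worst_Ft = 0.0
for i in range(1, 100):
    t = 0.03 * i                         # range where sin t > 0
    x = t - rho(t) * rhop(t)
    y = rho(t) * math.sqrt(max(0.0, 1.0 - rhop(t) ** 2))
    F  = (x - t) ** 2 + y ** 2 - rho(t) ** 2
    Ft = -2.0 * (x - t) - 2.0 * rho(t) * rhop(t)   # partial of F in t
    worst_F  = max(worst_F, abs(F))
    worst_Ft = max(worst_Ft, abs(Ft))
print(worst_F < 1e-12 and worst_Ft < 1e-12)
```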
An envelope point $(x_0, t_0) \in E$ is called regular if $\dfrac{\partial^2 u}{\partial t^2}\,(x_0, t_0) \ne 0$; conversely, if the
second derivative vanishes, the envelope point is called singular. Near a regular point,
the envelope is the graph of a smooth curve y = f (x). Indeed, the Implicit Function
Theorem tells us that, in a neighborhood of a regular point, one can locally solve the
second envelope equation for t = h(x). Substituting this expression into the first equation
shows that, for such x, the envelope is the graph of the function y = f (x) = u(x, h(x)). At
a regular point, the graph of y = ut0 (x) intersects the envelope curve tangentially, since
\[
\frac{\partial u}{\partial x}\,(x_0, t_0) = f'(x_0) = \frac{\partial u}{\partial x}\,(x_0, t_0) + h'(x_0)\, \frac{\partial u}{\partial t}\,(x_0, t_0),
\]
and the final term is zero because of the envelope condition.
As noted above, envelopes are closely connected with the points of intersection of the
curves in the family. In particular, if they all intersect at a single point A = (a, α), so
$u(a, t) = \alpha$ for all $t$, then clearly $\dfrac{\partial u}{\partial t}\,(a, t) = \dfrac{\partial^2 u}{\partial t^2}\,(a, t) = 0$, and hence $A \in E$ is a singular
envelope point. While general intersection points are not typically points on the envelope,
their limits are.
Proposition 8.6. Let t be fixed. Suppose that, for all s sufficiently close to t, the
graph of us (x) intersects that of ut (x) at a point P (s), and suppose P (s) → P0 as s → t.
Then the limiting intersection point lies in the envelope: P0 ∈ E.
Proof : The intersection assumption implies that $P(s) = \bigl( x(s), y(s) \bigr)$ satisfies
\[
y(s) = u\bigl( x(s), s \bigr) = u\bigl( x(s), t \bigr).
\]
Differentiating with respect to s, we have
\[
y'(s) = \frac{\partial u}{\partial x}\,\bigl( x(s), s \bigr)\; x'(s) + \frac{\partial u}{\partial s}\,\bigl( x(s), s \bigr) = \frac{\partial u}{\partial x}\,\bigl( x(s), t \bigr)\; x'(s).
\]
Taking the limit as s → t, given that P (s) = (x(s), y(s)) → (x0 , y0 ) = P0 , and using the
continuity of the partial derivatives of u, the equation becomes
\[
\frac{\partial u}{\partial x}\,(x_0, t)\; x'(t) + \frac{\partial u}{\partial s}\,(x_0, t) = \frac{\partial u}{\partial x}\,(x_0, t)\; x'(t), \qquad\text{and hence}\qquad \frac{\partial u}{\partial s}\,(x_0, t) = 0,
\]
which implies P0 = (x0 , y0 ) ∈ E is in the envelope. Q.E.D.
In the scalar-valued case under consideration, intersection of the graphs implies the
existence of envelope points.
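Proposition 8.6 can be illustrated with the family u(x, t) = (t − x)/t² treated earlier; this numeric sketch (not in the original text) tracks the intersection point of two nearby graphs as they merge.

```python
# The graphs of u_s and u_t intersect where (t-x)/t^2 = (s-x)/s^2,
# which solves to x = s*t/(s+t).  As s -> t this tends to x = t/2,
# i.e. the envelope relation t = 2x found above (y = 1/(4x)).
t = 1.3
gaps = []
for ds in (0.1, 0.01, 0.001):
    s = t + ds
    x = s * t / (s + t)          # intersection abscissa
    gaps.append(abs(x - t / 2))  # distance to the envelope point
print(gaps[0] > gaps[1] > gaps[2], gaps[2] < 1e-3)
```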
Conjugate Points
The envelope of a certain one-parameter family of solutions to the Euler–Lagrange equa-
tion plays a critical role in the characterization of minima (and maxima) of the functional.
We first note that the Euler–Lagrange equation is a regular second order ordinary differential
equation provided the coefficient of u′′ is nonzero: $\partial^2 L/\partial p^2 \ne 0$. Under this condition,
the standard existence, uniqueness, and continuous dependence results for solutions to or-
dinary differential equations, [9], can be applied. When dealing with a minimizer, we will
thus assume that the strict Legendre condition
\[
\frac{\partial^2 L}{\partial p^2}\,(x, u, p) > 0 \tag{8.14}
\]
holds for all a ≤ x ≤ b and all the solutions under consideration. (When treating maxima,
one imposes the opposite inequality.)
Let A = (a, α) ∈ R 2 . Consider the one-parameter family ut (x) = u(x, t) of solutions to
the second order Euler–Lagrange equation with initial conditions
$C = \{\, (c, 0) \;\mid\; u_1(c) = 0, \; c \ne a \,\}$.
Since ut (c) = t u1 (c) = 0 as well, we deduce that the conjugate values are the common
zeros (except for a) of the solutions† ut (x).
Let us further analyze the functions
\[
v_t(x) = v(x, t) = \frac{\partial u}{\partial t}\,(x, t) \tag{8.18}
\]
whose vanishing determines the conjugate locus. Since all the ut (x) are assumed to solve
the Euler–Lagrange equation (3.15), we can differentiate it with respect to the parameter t
†
Keep in mind that u0 (x) ≡ 0.
Observe that a conjugate value c can thus be identified as a point at which this solution
to the Jacobi equation vanishes: vt (c) = 0. On the other hand, according to Example 8.9,
these are precisely the conjugate values of the second variation functional. We have thus
established that the conjugate values of a functional and its second variation coincide. If c
is a conjugate value, then the associated conjugate point for the quadratic second variation
functional is (c, 0), whereas for the original functional, the conjugate point is (c, ut (c)).
Suppose that we are not at a conjugate point, so $v_t(x) \ne 0$. Then a short computation
based on (8.19) reveals the following useful identity. If ϕ(x) is any smooth function,
\[
A_t\, \varphi'^{\,2} + 2\, B_t\, \varphi\, \varphi' + C_t\, \varphi^2 = \frac{A_t\, \bigl( v_t\, \varphi' - v_t'\, \varphi \bigr)^2}{v_t^2} + \frac{d}{dx} \left[ \frac{\bigl( A_t\, v_t' + B_t\, v_t \bigr)\, \varphi^2}{v_t} \right]. \tag{8.22}
\]
Let us assume ϕ(a) = ϕ(b) = 0. Then, integrating both sides of (8.22),
\[
\int_a^b \bigl(\, A_t\, \varphi'^{\,2} + 2\, B_t\, \varphi\, \varphi' + C_t\, \varphi^2 \,\bigr)\, dx = \int_a^b \frac{A_t\, \bigl( v_t\, \varphi' - v_t'\, \varphi \bigr)^2}{v_t^2}\; dx, \tag{8.23}
\]
where the integral of the final term is zero due to the Fundamental Theorem of Calculus
and the assumed boundary conditions for ϕ. The strict Legendre condition (8.14) implies
$A_t(x) > 0$; moreover, $v_t(x) \ne 0$ for a ≤ x ≤ b, and therefore the right hand side of (8.23)
is ≥ 0. Moreover, the integral equals 0 if and only if vt (x) ϕ′ (x) − vt′ (x) ϕ(x) ≡ 0 for all
a ≤ x ≤ b, which the reader may recognize as the Wronskian of the functions ϕ, vt , [9].
Thus, ϕ(x) satisfies the first order initial value problem
\[
\frac{d\varphi}{dx} = w(x)\, \varphi, \qquad \varphi(b) = 0, \qquad\text{where}\qquad w(x) = \frac{v_t'(x)}{v_t(x)}\,,
\]
1/7/22 79 c 2022 Peter J. Olver
and hence, by uniqueness of solutions, [9], ϕ(x) ≡ 0. Thus, if there are no conjugate
values in the interval [ a, b ], then the quadratic functional on the left hand side of (8.23) is
positive definite.
The alert reader may have spotted the one flaw in the above argument. Namely, (8.21)
implies that vt (a) = 0, and hence the integral on the right hand side may have an insur-
mountable singularity at x = a. The way to get around this is to move the initial point
where we defined the family of solutions to the Euler–Lagrange equation a little to the left
so that it lies outside of the interval [ a, b ]. Thus, we fix ε > 0 small, and define $\widetilde u_t(x)$ to
be the one-parameter family of solutions to the Euler–Lagrange equation satisfying
\[
\widetilde u_t(a - \varepsilon) = \alpha, \qquad \widetilde u_t'(a - \varepsilon) = t.
\]
Setting $\widetilde v_t(x) = \dfrac{\partial \widetilde u}{\partial t}\,(x, t)$ as before, the fact that there were no conjugate values for $u_t$ in
[ a, b ] implies, by the continuous dependence of solutions to ordinary differential equations
on initial data, [9], that, provided ε is sufficiently small, the same holds for $\widetilde u_t$, meaning
$\widetilde v_t(x) \ne 0$ for all a ≤ x ≤ b, including x = a. We now rerun the identity (8.22), but with $v_t$
replaced by $\widetilde v_t$, and this time the argument leading to positive definiteness is completely
legitimate. In particular, setting t = u′ (a) where u = ut is our candidate minimizer, we
have proved, under the above conditions, the positive definiteness of the second variation
of the functional. This implies that u is indeed a strict local minimizer for the variational
problem.
Theorem 8.10. Suppose that u(x) satisfies the Euler–Lagrange equation (3.13) with
fixed boundary conditions. Assume that, in addition, the strict Legendre condition (8.14)
holds and there are no conjugate values to a on the interval ( a, b ]. Then u(x) is a strict
weak local minimum for the functional.
Corollary 8.11. Consider the quadratic functional (8.7). If A(x) > 0 and there are no
conjugate points in the interval ( a, b ], then the quadratic functional is positive definite.
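Corollary 8.11 explains the contrast in Example 8.2; the following numeric sketch (not in the original text) makes it concrete. For Q[ϕ] = ∫₀ˡ (ϕ′² − ϕ²) dx the Jacobi equation is ϕ″ + ϕ = 0, whose solution sin x vanishing at 0 next vanishes at π; so l = 1 < π has no conjugate point while l = 4 > π does.

```python
import math

# Evaluate Q[phi] = \int_0^l (phi'^2 - phi^2) dx for the test function
# phi(x) = sin(pi x / l), which vanishes at both endpoints.
def Q(l, n=20000):
    h = l / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        phi  = math.sin(math.pi * x / l)
        dphi = (math.pi / l) * math.cos(math.pi * x / l)
        total += (dphi * dphi - phi * phi) * h
    return total

print(Q(1.0) > 0.0, Q(4.0) < 0.0)   # positive definite vs. not
```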
subject to the boundary conditions u(0) = u(1) = 0. Since the Lagrangian only depends
on u′ , the Euler–Lagrange equation (3.13) can be immediately integrated, taking the form
\[
u' + u'^{\,2} = c,
\]
where c ∈ R is the constant of integration. Thus u′ is constant, and the solutions to the
Euler–Lagrange equation are straight lines: u = a x+b. Imposing the boundary conditions,
we infer that u(x) ≡ 0 is the unique critical function, with J[ u ] = 0. We claim that u is a
weak local minimizer but is not a strong local minimizer.
To justify the first claim, note that if | v′(x) | < r for all 0 ≤ x ≤ 1, then
\[
J[\,v\,] = \int_0^1 \Bigl(\, \tfrac{1}{2}\, v'^{\,2} + \tfrac{1}{3}\, v'^{\,3} \,\Bigr)\, dx = \int_0^1 \Bigl(\, \tfrac{1}{2} + \tfrac{1}{3}\, v' \,\Bigr)\, v'^{\,2}\, dx \;>\; 0 = J[\,u\,],
\]
provided $r < \frac{3}{2}$ and $v'(x) \not\equiv 0$. Indeed, one easily verifies that the strengthened Legendre
condition holds:
\[
\frac{\partial^2 L}{\partial p^2}\,(x, u, u') = 1 + 2\, u' = 1 > 0 \qquad\text{when}\qquad u \equiv 0.
\]
Moreover, the Jacobi equation (8.19) is simply $-\varphi'' = 0$, so if $\varphi(0) = 0$ and $\varphi \not\equiv 0$, then
$\varphi(x) = c\, x \ne 0$ for all $x \ne 0$, and hence there are no conjugate points. Thus, u ≡ 0 satisfies
the conditions of Theorem 8.10 guaranteeing its status as a strict (weak) local minimizer.
On the other hand, given r > 0, consider the continuous, piecewise affine function
\[
v_r(x) = \begin{cases} \dfrac{r\, x}{1 - r^2}\,, & 0 \le x \le 1 - r^2, \\[2ex] \dfrac{1 - x}{r}\,, & 1 - r^2 \le x \le 1, \end{cases} \tag{9.2}
\]
as graphed in Figure 17. Clearly | vr (x) | ≤ r for all 0 ≤ x ≤ 1. On the other hand, one
easily computes
\[
J[\,v_r\,] = \frac{r^2\, \bigl( 3 + 2\, r - 3\, r^2 \bigr)}{6\, \bigl( 1 - r^2 \bigr)^2} + \frac{1}{2} - \frac{1}{3\, r}\,.
\]
As r → 0+ , the final term on the right hand side becomes increasingly large negative, while
the first two terms are bounded, and hence J[ vr ] < 0 for r sufficiently small, e.g., r < .47.
This implies that u(x) ≡ 0 is not a strong local minimizer. Although vr is uniformly close
to the zero function, its large negative slope on a small interval near x = 1 suffices to make
the functional negative. We will learn how to avoid weak minimizers that are not strong
minimizers in the following section.
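The closed form for J[v_r] is easy to double-check numerically, since v_r has just two constant slopes; this sketch is not part of the original text.

```python
# Evaluate J[v_r] = \int_0^1 ( v'^2/2 + v'^3/3 ) dx for the piecewise affine
# variation (9.2) directly from its two constant slopes, and compare with
# the closed form; at r = 0.3 the value is already negative.
def J_direct(r):
    s1, l1 = r / (1 - r * r), 1 - r * r      # slope and length, first piece
    s2, l2 = -1.0 / r, r * r                 # slope and length, second piece
    return (l1 * (s1**2 / 2 + s1**3 / 3)
            + l2 * (s2**2 / 2 + s2**3 / 3))

def J_closed(r):
    return (r**2 * (3 + 2*r - 3*r**2) / (6 * (1 - r**2)**2)
            + 0.5 - 1.0 / (3 * r))

r = 0.3
print(abs(J_direct(r) - J_closed(r)) < 1e-12, J_direct(r) < 0.0)
```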
Example 9.3. Consider the variational problem
\[
J[\,u\,] = \int_0^1 \frac{dx}{1 + u'^{\,2}}\,, \qquad u(0) = u(1) = 0. \tag{9.3}
\]
As in the preceding example, the solutions to the Euler–Lagrange equation are affine:
u = a x + b, and hence u(x) ≡ 0 is the unique critical function, with J[ u ] = 1. Moreover,
it satisfies the conditions of Theorem 8.10 and hence is a strict weak local minimizer.
However, the integral, while always positive, can be made arbitrarily small by making
the derivative u′ large. Thus, we introduce a “spikier” adaptation of the sawtooth function
(3.39):
\[
u_n(x) = \begin{cases} n \left( x - \dfrac{k}{n^2} \right), & \dfrac{k}{n^2} \le x \le \dfrac{2\,k + 1}{2\, n^2}\,, \\[2ex] n \left( \dfrac{k + 1}{n^2} - x \right), & \dfrac{2\,k + 1}{2\, n^2} \le x \le \dfrac{k + 1}{n^2}\,, \end{cases} \qquad \begin{array}{l} k = 0, \dots, n^2 - 1, \\[.5ex] n = 1, 2, 3, \dots\,. \end{array} \tag{9.4}
\]
Observe that
\[
0 \;\le\; u_n(x) \;\le\; \frac{1}{n}\,, \qquad u_n'(x) = \pm\, n.
\]
We conclude that un (x) → 0 uniformly as n → ∞. Moreover, J[ un ] = (1 + n2 )−1 → 0 as
n → ∞, while, as we already noted, J[ 0 ] = 1. Thus, u(x) ≡ 0 is a weak minimizer, but not
a strong minimizer. A similar result holds for any other fixed boundary conditions: the
affine function u = c x + d that interpolates the boundary values is a weak but not strong
minimizer since one can replace it by a nearby sawtooth-like function that has arbitrarily
small value for the functional.
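A quick numeric sketch of the sawtooth construction (not in the original text): the teeth have width 1/n² and slope ±n, so the maximum height is 1/(2n), consistent with the bound 1/n above, while J[uₙ] = 1/(1 + n²).

```python
# Evaluate the sawtooth (9.4) and confirm its uniform smallness; since
# |u_n'| = n almost everywhere, J[u_n] = \int_0^1 dx/(1+n^2) = 1/(1+n^2).
def un(x, n):
    k = min(int(x * n * n), n * n - 1)   # index of the tooth of width 1/n^2
    xl = x - k / (n * n)                 # offset within the tooth
    half = 1.0 / (2 * n * n)
    return n * xl if xl <= half else n * (1.0 / (n * n) - xl)

n = 10
peak = max(un(i / 10000.0, n) for i in range(10001))
J_value = 1.0 / (1 + n * n)              # exact, since |u_n'| = n a.e.
print(abs(peak - 1.0 / (2 * n)) < 1e-9, J_value)
```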
A related variational problem with similar properties is
\[
\widehat J[\,u\,] = \int_a^b \frac{x\; dx}{1 + u'^{\,2}}\,. \tag{9.5}
\]
This functional represents the resistance of a cylindrically symmetric projectile that moves
through a uniform fluid, and was introduced by Newton who asked which shape would
produce the least resistance, making this (and not the brachistochrone) the oldest genuine
problem in the calculus of variations. While the solutions to the Euler–Lagrange equation
satisfying suitable boundary conditions are weak minimizers, one can similarly replace
them by arbitrarily close jagged sawtooth-like approximants that produce an arbitrarily
small value for the functional. This counterintuitive fact was noted by Legendre, but proved
to be so disconcerting that the problem was mostly ignored (if not actively suppressed)
until Weierstrass appeared on the scene. See [31; §11.3] for details.
There are cases where the minimizer of a variational problem is not everywhere smooth,
but has one or more corners. In the calculus of variations literature, a non-smooth critical
function is known as a broken extremal, [21, 31].
Necessary conditions for a non-smooth critical function can be derived using a similar
calculation to that used when a boundary condition was constrained to lie on a curve.
For simplicity, let us assume the function has just one corner at a < c < b. (Extending
the analysis to finitely many corners is straightforward.) Thus, we suppose our minimizer
takes the form
\[
u(x) = \begin{cases} u_1(x), & a \le x \le c, \\ u_2(x), & c \le x \le b, \end{cases}
\]
where, in order that u be continuous and have a corner at x = c, we require
\[
u(c) = u_1(c) = u_2(c), \qquad\text{whereas}\qquad u'(c^-) = u_1'(c) \ne u_2'(c) = u'(c^+). \tag{9.8}
\]
When we vary u(x) we can also vary the location of the corner, replacing c by c + ε ζ for
some ζ ∈ R; see Figure 18. Thus, the variation takes the form
\[
u_\varepsilon(x) = \begin{cases} u_1(x) + \varepsilon\, \varphi_1(x), & a \le x \le c + \varepsilon\, \zeta, \\ u_2(x) + \varepsilon\, \varphi_2(x), & c + \varepsilon\, \zeta \le x \le b, \end{cases}
\]
where we assume ϕ1 , ϕ2 are smooth and a ≤ c + ε ζ ≤ b. In order that the variations
remain continuous, we require
u1 (c + ε ζ) + ε ϕ1 (c + ε ζ) = u2 (c + ε ζ) + ε ϕ2 (c + ε ζ). (9.9)
The variation in the functional is thus given by
\[
h(\varepsilon) = J[\,u_\varepsilon\,] = \int_a^{c + \varepsilon \zeta} L\bigl(\, x,\; u_1(x) + \varepsilon\, \varphi_1(x),\; u_1'(x) + \varepsilon\, \varphi_1'(x) \,\bigr)\, dx + \int_{c + \varepsilon \zeta}^b L\bigl(\, x,\; u_2(x) + \varepsilon\, \varphi_2(x),\; u_2'(x) + \varepsilon\, \varphi_2'(x) \,\bigr)\, dx.
\]
As always, we take the derivative of h with respect to ε and then set ε = 0, integrating
by parts at the appropriate stage. Using the formula for the derivative of an integral with
variable endpoints, the net result is
\[
\begin{aligned}
0 = h'(0) = {} & \Bigl[\, L\bigl(\, c, u_1(c), u_1'(c) \,\bigr) - L\bigl(\, c, u_2(c), u_2'(c) \,\bigr) \,\Bigr]\, \zeta + \frac{\partial L}{\partial p}\bigl(\, c, u_1(c), u_1'(c) \,\bigr)\, \varphi_1(c) \\
& - \frac{\partial L}{\partial p}\bigl(\, c, u_2(c), u_2'(c) \,\bigr)\, \varphi_2(c) + \int_a^c E(L)[u_1]\, \varphi_1(x)\, dx + \int_c^b E(L)[u_2]\, \varphi_2(x)\, dx.
\end{aligned} \tag{9.10}
\]
which are trivially satisfied for the minimizer (9.7) at c = 0, since u(0) = 0, u′ (0− ) = 0,
u′ (0+ ) = 1.
†
Which has nothing to do with the algebraic notion of a field like R or C.
By a field of extremals ‡ we mean a field of curves each of which is the graph of a critical
function, i.e., a solution to the Euler–Lagrange equation. This imposes the following
condition on the associated slope function ψ(x, u). Differentiating (10.1) and using the
chain rule,
\[
\frac{d^2 u}{dx^2} = \frac{\partial \psi}{\partial x} + \frac{\partial \psi}{\partial u}\, \frac{du}{dx} = \frac{\partial \psi}{\partial x} + \psi\, \frac{\partial \psi}{\partial u}\,. \tag{10.3}
\]
Substituting (10.1) and (10.3) into the Euler–Lagrange equation (3.14), we deduce that
ψ(x, u) must satisfy the first order partial differential equation
\[
\left( \frac{\partial \psi}{\partial x} + \psi\, \frac{\partial \psi}{\partial u} \right) \frac{\partial^2 L}{\partial p^2}\,(x, u, \psi) + \psi\, \frac{\partial^2 L}{\partial u\, \partial p}\,(x, u, \psi) + \frac{\partial^2 L}{\partial x\, \partial p}\,(x, u, \psi) - \frac{\partial L}{\partial u}\,(x, u, \psi) = 0. \tag{10.4}
\]
The general theory of first order partial differential equations, [10, 17], implies that, un-
der fairly mild assumptions on the Lagrangian L, there exist solutions to the slope equa-
tion (10.4) on suitable open domains U ⊂ R 2 .
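As a quick consistency check of the slope equation (10.4), not in the original text: for the arc length Lagrangian all the mixed partial derivatives vanish,

```latex
L = \sqrt{1 + p^2}\,: \qquad
\frac{\partial L}{\partial u}
  = \frac{\partial^2 L}{\partial u\,\partial p}
  = \frac{\partial^2 L}{\partial x\,\partial p} = 0,
\qquad
\frac{\partial^2 L}{\partial p^2} = \frac{1}{(1 + p^2)^{3/2}}\,,
```

so every constant slope function ψ(x, u) ≡ c solves (10.4), and the corresponding field of extremals consists of parallel straight lines of slope c.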
‡
As in the Remark following Theorem 3.1, any solution of the Euler–Lagrange equation is referred
to in the calculus of variations literature as an “extremal” whether or not it provides an extreme
value for the functional — a maximizer or minimizer. To us, a preferable terminology here, more
in tune with the rest of mathematics, would be critical foliation. However, slightly reluctantly,
we shall retain the standard calculus of variations version.
Equivalent Functionals
It turns out that Carathéodory’s Royal Road is paved by null Lagrangians! In Section 4,
we showed how null Lagrangians enable one to enlarge the range of variationally admissible
natural boundary conditions. Here we will employ them to solve the strong minimization
problem!
Let
\[
N(x, u, p) = \frac{d}{dx}\, S(x, u) = \frac{\partial S}{\partial x} + p\, \frac{\partial S}{\partial u} \tag{10.6}
\]
be a null Lagrangian prescribed by a function S(x, u). As noted in the remarks after (4.10),
the modified functional
\[
\begin{aligned}
\widetilde J[\,u\,] = \int_a^b \widetilde L(x, u, u')\, dx & = \int_a^b \Bigl[\, L(x, u, u') - N(x, u, u') \,\Bigr]\, dx \\
& = \int_a^b \left[\, L(x, u, u') - \frac{\partial S}{\partial x} - u'\, \frac{\partial S}{\partial u} \,\right] dx = J[\,u\,] - S\bigl( b, \beta \bigr) + S\bigl( a, \alpha \bigr)
\end{aligned} \tag{10.7}
\]
has the same Euler–Lagrange equations and hence the same critical functions as J[ u ]. In
other words, J[ u ] and $\widetilde J[\,u\,]$ are equivalent functionals.
Given a field of extremals containing our candidate minimizer u(x), let ψ(x, y) denote
the associated slope function. Suppose we can find a null Lagrangian (10.6) such that the
modified Lagrangian
\[
\frac{\partial L}{\partial u}\,(x, u, \psi) - \psi\, \frac{\partial^2 L}{\partial u\, \partial p}\,(x, u, \psi) - \psi\, \frac{\partial \psi}{\partial u}\, \frac{\partial^2 L}{\partial p^2}\,(x, u, \psi) = \frac{\partial^2 S}{\partial x\, \partial u} = \frac{\partial^2 L}{\partial x\, \partial p}\,(x, u, \psi) + \frac{\partial \psi}{\partial x}\, \frac{\partial^2 L}{\partial p^2}\,(x, u, \psi),
\]
which reproduces the condition (10.4), and hence is automatically valid. Thus we are
assured of the existence of a null Lagrangian that satisfies the first condition (10.13). The
resulting integrated null Lagrangian
\[
\begin{aligned}
I[\,u\,] & = \int_a^b \left( \frac{\partial S}{\partial x} + u'\, \frac{\partial S}{\partial u} \right) dx = S(b, \beta) - S(a, \alpha) \\
& = \int_a^b \left[\, L\bigl(\, x, u, \psi(x, u) \,\bigr) - \bigl(\, \psi(x, u) - u' \,\bigr)\, \frac{\partial L}{\partial p}\bigl(\, x, u, \psi(x, u) \,\bigr) \,\right] dx
\end{aligned} \tag{10.16}
\]
depends only on the boundary values of u(x) and thus its value is the same for any
function that satisfies the fixed boundary conditions. The left hand side of (10.16) is
known as Hilbert’s invariant integral, where “invariance” refers to its independence of the
path taken between the points (a, α) and (b, β); see the discussion following (4.9).
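The invariance is easy to see numerically; this sketch (not in the original text) uses the arc length Lagrangian with the constant slope field ψ ≡ 1 and the hypothetical endpoints (0, 0) and (1, 1).

```python
import math

# Hilbert's invariant integral (10.16) for L = sqrt(1+p^2), psi = 1:
# integrand = sqrt(2) - (1 - u')/sqrt(2)  (since dL/dp at p=1 is 1/sqrt(2)).
# Its integral over [0,1] is the same for every u with u(0)=0, u(1)=1.
def I(uprime, n=20000):
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        total += (math.sqrt(2) - (1.0 - uprime(x)) / math.sqrt(2)) * h
    return total

I_line = I(lambda x: 1.0)                                          # u = x
I_wavy = I(lambda x: 1.0 + 0.2 * math.pi * math.cos(math.pi * x))  # u = x + 0.2 sin(pi x)
print(abs(I_line - I_wavy) < 1e-6, abs(I_line - math.sqrt(2)) < 1e-9)
```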
Thus, the Weierstrass condition (10.20) holds. Moreover, we can easily embed any mini-
mizer in a field of extremals — for example, straight lines all possessing the same slope.
We have thus finally rigorously proved our expectation that a straight line is a strong
minimizer of the arc length functional.
Example 10.4. On the other hand, for the Lagrangian $L = \frac{1}{2}\, p^2 + \frac{1}{3}\, p^3$ associated with
the variational problem (9.1), we find
\[
E(x, u, q, p) = \tfrac{1}{2}\, p^2 + \tfrac{1}{3}\, p^3 - \tfrac{1}{2}\, q^2 - \tfrac{1}{3}\, q^3 - (p - q)\, \bigl( q + q^2 \bigr) = \tfrac{1}{2}\, p^2 + \tfrac{1}{3}\, p^3 - p\, q - p\, q^2 + \tfrac{1}{2}\, q^2 + \tfrac{2}{3}\, q^3,
\]
which, for any fixed value of q, will be arbitrarily large negative when p ≪ 0 is large
negative. Thus, the Weierstrass condition (10.20) fails, which explains why u(x) ≡ 0 is
not a strong minimizer.
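This failure is simple to observe numerically; the sketch below (not in the original text) evaluates the excess function directly from its definition.

```python
# Excess function for L = p^2/2 + p^3/3:
#   E(q,p) = L(p) - L(q) - (p - q) * L'(q),  with L'(p) = p + p^2.
# At q = 0 it reduces to p^2/2 + p^3/3, negative once p < -3/2.
def excess(q, p):
    L  = lambda s: s * s / 2 + s ** 3 / 3
    Lp = lambda s: s + s * s
    return L(p) - L(q) - (p - q) * Lp(q)

print(excess(0.0, -3.0) < 0.0, excess(0.0, 1.0) >= 0.0)
```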
While positivity of the Weierstrass excess function suffices for the extremal to be a strong
local minimizer, its non-negativity provides an additional necessary condition that must
be satisfied by a strong minimizer.
Proof : Let p ∈ R, and let ε > 0 be such that a < c − ε < c < b. Consider the following
variation:
\[
u_\varepsilon(x) = \begin{cases} u(x) + s\, (x - a), & a \le x \le c - \varepsilon, \\ p\, (x - c) + u(c), & c - \varepsilon \le x \le c, \\ u(x), & c \le x \le b, \end{cases}
\]
in order that uε be continuous at x = c−ε. See Figure 20. Observe that (10.22) guarantees
that s → 0 as ε → 0+ , and hence uε (x) is uniformly close to u(x). However, when
c − ε ≤ x ≤ c, the graph of uε (x) is a straight line whose slope, p, does not go to zero as
ε → 0+ . Thus, uε (x) constitutes a strong but not a weak variation of u(x) = u0 (x).
For ε > 0, consider
\[
\begin{aligned}
h(\varepsilon) = J[\,u_\varepsilon\,] & = \int_a^{c - \varepsilon} L\bigl(\, x,\; u(x) + s\, (x - a),\; u'(x) + s \,\bigr)\, dx \\
& \qquad + \int_{c - \varepsilon}^c L\bigl(\, x,\; u(c) + p\, (x - c),\; p \,\bigr)\, dx + \int_c^b L\bigl(\, x, u(x), u'(x) \,\bigr)\, dx.
\end{aligned}
\]
Since we are assuming u is a strong local minimum, h(ε) must reach a minimum when
ε → 0+ , and hence h′ (0+ ) ≥ 0. We differentiate (10.22) to compute
\[
\sigma\, (c - a) = u'(c) - p, \qquad\text{where}\qquad \sigma = \left. \frac{ds}{d\varepsilon} \right|_{\varepsilon = 0^+}. \tag{10.23}
\]
Thus, employing an integration by parts, and using the fact that u satisfies the Euler–
Lagrange equation E(L) = 0, we find
\[
\begin{aligned}
0 \le h'(0^+) & = -\, L\bigl(\, c, u(c), u'(c) \,\bigr) + L\bigl(\, c, u(c), p \,\bigr) \\
& \qquad + \int_a^c \left[\, \frac{\partial L}{\partial u}\bigl(\, x, u(x), u'(x) \,\bigr)\, \sigma\, (x - a) + \frac{\partial L}{\partial p}\bigl(\, x, u(x), u'(x) \,\bigr)\, \sigma \,\right] dx \\
& = L\bigl(\, c, u(c), p \,\bigr) - L\bigl(\, c, u(c), u'(c) \,\bigr) + \frac{\partial L}{\partial p}\bigl(\, c, u(c), u'(c) \,\bigr)\, \sigma\, (c - a) \\
& \qquad + \int_a^c \left[\, \frac{\partial L}{\partial u}\bigl(\, x, u(x), u'(x) \,\bigr) - \frac{d}{dx}\, \frac{\partial L}{\partial p}\bigl(\, x, u(x), u'(x) \,\bigr) \,\right] \sigma\, (x - a)\, dx \\
& = L\bigl(\, c, u(c), p \,\bigr) - L\bigl(\, c, u(c), u'(c) \,\bigr) - \bigl(\, p - u'(c) \,\bigr)\, \frac{\partial L}{\partial p}\bigl(\, c, u(c), u'(c) \,\bigr) \\
& = E\bigl(\, c, u(c), u'(c), p \,\bigr).
\end{aligned}
\]
having the form of a double integral over a prescribed domain Ω ⊂ R 2 , which, for simplicity,
is assumed to be bounded. The Lagrangian L(x, y, u, p, q) is assumed to be a sufficiently
smooth function of its five arguments. Our goal is to find the function(s) u = f (x, y) that
minimize the value of J[ u ] when subject to a set of prescribed boundary conditions on ∂Ω,
which is assumed to be reasonably nice, e.g., piecewise smooth. The two principal types
of boundary conditions are (a) the Dirichlet or fixed boundary value problem, which requires
that the minimizer satisfy

u(x, y) = g(x, y)   for   (x, y) ∈ ∂Ω,   (11.2)

and (b) free boundary conditions, in which no conditions are a priori
imposed on the boundary behavior of u. More generally, one can have mixed boundary
conditions, in which the boundary is split into two subsets, ∂Ω = Ω1 ∪ Ω2 , with fixed
boundary conditions imposed on the first and free boundary conditions on the second. As
in the one-dimensional case, the variational calculus will impose certain natural boundary
conditions on the free part of the boundary.
To convert (11.4) into this form, we need to remove the offending derivatives from ϕ.
In two dimensions, the requisite integration by parts formula is based on Green’s Theo-
rem, which is written in divergence form, [43]:
∬_Ω ( w1 ∂ϕ/∂x + w2 ∂ϕ/∂y ) dx dy = ∮_{∂Ω} ϕ ( − w2 dx + w1 dy ) − ∬_Ω ϕ ( ∂w1/∂x + ∂w2/∂y ) dx dy,   (11.5)
in which w1 , w2 are arbitrary smooth functions, and the boundary term is a line integral
around ∂Ω, which is oriented in the counterclockwise direction. Equivalently, we can write
(11.5) in vectorial form
∬_Ω ∇ϕ · w dx dy = ∮_{∂Ω} ϕ w · n ds − ∬_Ω ϕ ∇ · w dx dy,   (11.6)

where w = (w1 , w2 ) is a vector field, s denotes arc length, and n = ( dy/ds, − dx/ds ) is
the unit outward normal on ∂Ω. Setting w = ( ∂L/∂p, ∂L/∂q ), we find
ZZ I
∂L ∂L
ϕx + ϕy dx dy = ϕ w · n ds
Ω ∂p ∂q ∂Ω
ZZ (11.7)
∂L ∂L
− ϕ Dx + Dy dx dy.
Ω ∂p ∂q
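The vectorial integration by parts identity (11.6) can be confirmed symbolically for concrete data. Below is a sketch on the unit square, with hypothetical polynomial choices of ϕ and w (not from the text) and the boundary line integral assembled edge by edge using the outward normals.

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)

# Hypothetical test data on Omega = [0,1]^2:
phi = x**2 * y             # the "variation"
w1, w2 = x + y**2, x*y     # components of the vector field w

# Left-hand side: double integral of grad(phi) . w
lhs = sp.integrate(sp.diff(phi, x)*w1 + sp.diff(phi, y)*w2,
                   (x, 0, 1), (y, 0, 1))

# Boundary term: line integral of phi w . n over the four edges, with
# outward normals (0,-1), (1,0), (0,1), (-1,0) respectively.
bdry = (- sp.integrate((phi*w2).subs(y, 0), (x, 0, 1))     # bottom
        + sp.integrate((phi*w1).subs(x, 1), (y, 0, 1))     # right
        + sp.integrate((phi*w2).subs(y, 1), (x, 0, 1))     # top
        - sp.integrate((phi*w1).subs(x, 0), (y, 0, 1)))    # left

# Remaining area term: double integral of phi div(w)
div_term = sp.integrate(phi*(sp.diff(w1, x) + sp.diff(w2, y)),
                        (x, 0, 1), (y, 0, 1))

assert sp.simplify(lhs - (bdry - div_term)) == 0
```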
Lu − Lxp − Lyq − ux Lup − uy Luq − uxx Lpp − 2 uxy Lpq − uyy Lqq = 0, (11.13)
where we use subscripts to indicate derivatives of L, which are evaluated at (x, y, u, p, q) =
(x, y, u, ux, uy ).
In general, the solutions to the Euler–Lagrange boundary value problem are critical
functions for the variational problem, and hence include all (smooth) local and global min-
imizers. Determination of which solutions are genuine minima requires a further analysis
Ω̃ = { y = x + u(x) | x ∈ Ω } .
The one-dimensional case governs bars, beams and rods, two-dimensional bodies include
thin plates and shells, while n = 3 for fully three-dimensional solid bodies. See [3, 25] for
details and physical derivations.
For small deformations, we can use a linear theory to approximate the more complicated
equations of nonlinear elasticity. The simplest case is that of a homogeneous and isotropic
planar body Ω ⊂ R² . The equilibrium configurations are described by the displacement
function u(x) = ( u(x, y), v(x, y) ), whose Jacobian matrix is denoted by

∇u = ( ux  uy )
     ( vx  vy ).
which we write as an inner product (using the standard L2 inner product between vector
fields) between the variation ϕ = (ϕ, ψ) and the functional gradient ∇J = ( ∇u J, ∇v J ).
For the particular functional (11.16), we find that h′(0) equals

∬_Ω [ (λ + 2 µ) ( ux ϕx + vy ψy ) + µ ( uy ϕy + vx ψx ) + (λ + µ) ( ux ψy + vy ϕx ) ] dx dy.
We use the integration by parts formula (11.5) to remove the derivatives from the variations
ϕ, ψ. Discarding the boundary integrals, which are used to prescribe the variationally
admissible boundary conditions, we find
h′(0) = − ∬_Ω { [ (λ + 2 µ) uxx + µ uyy + (λ + µ) vxy ] ϕ + [ (λ + µ) uxy + µ vxx + (λ + 2 µ) vyy ] ψ } dx dy.
The two terms in braces give the two components of the functional gradient. Setting them
equal to zero produces the second order linear system of Euler–Lagrange equations
(λ + 2 µ) uxx + µ uyy + (λ + µ) vxy = 0,
(11.17)
(λ + µ) uxy + µ vxx + (λ + 2 µ) vyy = 0,
known as Navier’s equations, which can be compactly written as
µ ∆u + (µ + λ) ∇(∇ · u) = 0 (11.18)
for the displacement vector u = ( u, v ). Its solutions are the critical displacements that,
under appropriate boundary conditions, minimize the potential energy functional.
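One can confirm with a short computer algebra sketch that the compact vector form (11.18) reproduces the component equations (11.17):

```python
import sympy as sp

x, y, lam, mu = sp.symbols('x y lambda mu')
u = sp.Function('u')(x, y)
v = sp.Function('v')(x, y)

div_u = sp.diff(u, x) + sp.diff(v, y)          # the dilation  grad . u

# Components of  mu*Laplacian(u) + (mu + lam)*grad(div u)  from (11.18)
vec1 = mu*(sp.diff(u, x, 2) + sp.diff(u, y, 2)) + (mu + lam)*sp.diff(div_u, x)
vec2 = mu*(sp.diff(v, x, 2) + sp.diff(v, y, 2)) + (mu + lam)*sp.diff(div_u, y)

# Component form (11.17)
comp1 = (lam + 2*mu)*sp.diff(u, x, 2) + mu*sp.diff(u, y, 2) \
        + (lam + mu)*sp.diff(v, x, y)
comp2 = (lam + mu)*sp.diff(u, x, y) + mu*sp.diff(v, x, 2) \
        + (lam + 2*mu)*sp.diff(v, y, 2)

assert sp.expand(vec1 - comp1) == 0
assert sp.expand(vec2 - comp2) == 0
```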
As our next example, let us analyze the minimal surface problem, i.e., finding the surface
S ⊂ R 3 whose boundary is a specified space curve, ∂S = C, that minimizes the total
surface area. Initially we will assume, in order to simplify the exposition, that the surface
is described by the graph of a function z = u(x, y), although the ensuing results will apply
as stated to general parametrized surfaces.
From (2.10), the surface area integral is

J[ u ] = ∬_Ω √( 1 + ux² + uy² ) dx dy,   with Lagrangian   L = √( 1 + p² + q² ).   (11.19)

Note that

∂L/∂u = 0,   ∂L/∂p = p / √( 1 + p² + q² ),   ∂L/∂q = q / √( 1 + p² + q² ).
Therefore, replacing p → ux and q → uy and then evaluating the derivatives, the Euler–
Lagrange equation (11.12) becomes

( 1 + uy² ) uxx − 2 ux uy uxy + ( 1 + ux² ) uyy = 0,   (11.20)

known as the minimal surface equation.
We are confronted with a complicated, nonlinear, second order partial differential equation,
which has been the focus of some of the most sophisticated and deep analysis over the past
two centuries. We refer the interested reader to [24, 27, 38, 39] for classical and modern
developments.
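As a quick symbolic sanity check, Scherk's classical minimal surface z = log(cos x / cos y) satisfies the expanded Euler–Lagrange equation of the area functional, (1 + uy²) uxx − 2 ux uy uxy + (1 + ux²) uyy = 0:

```python
import sympy as sp

x, y = sp.symbols('x y')

# Scherk's minimal surface (a classical explicit solution)
u = sp.log(sp.cos(x) / sp.cos(y))

ux, uy = sp.diff(u, x), sp.diff(u, y)
uxx, uxy, uyy = sp.diff(u, x, 2), sp.diff(u, x, y), sp.diff(u, y, 2)

# Minimal surface equation residual
mse = (1 + uy**2)*uxx - 2*ux*uy*uxy + (1 + ux**2)*uyy
assert sp.simplify(mse) == 0
```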
It turns out that one can employ a combination of basic surface differential geometry,
[24], and complex analysis, [2, 46], to derive a large class of solutions to the minimal
surface equation. Consider a parametrized surface S ⊂ R 3 , given by the image of
x(ξ, η) = ( x(ξ, η), y(ξ, η), z(ξ, η) )^T ,   (11.22)

where the parameters range over a connected open subset of the plane: (ξ, η) ∈ U ⊂ R² .
The tangent plane at a point x(ξ, η) ∈ S is spanned by the two tangent vectors

xξ = ( ∂x/∂ξ , ∂y/∂ξ , ∂z/∂ξ )^T ,   xη = ( ∂x/∂η , ∂y/∂η , ∂z/∂η )^T ,
which are required to be linearly independent in order to avoid singularities; equivalently,
we require xξ × xη ≠ 0, where × is used to denote the cross product in R³ . The surface
We should also assume that the surface is simple, meaning that it does not self-intersect,
so that x(ξ, η) ≠ x(ξ̂, η̂) whenever (ξ, η) ≠ (ξ̂, η̂). However, since all our considerations
are local, this global condition can be ignored in the ensuing analysis; moreover, the
Implicit Function Theorem, [4, 35], implies that a surface is locally non-self-intersecting
near any nonsingular point. We will sometimes assume that the surface can be locally
identified with the graph of a function z = u(x, y), in which case we can parametrize S
by x(x, y) = (x, y, u(x, y))T ; this is referred to as a Monge patch on the surface. Again
by the Implicit Function Theorem, this is locally true provided the last component of n
is nonzero, xξ yη − xη yξ ≠ 0; if one of the other components is non-zero, one can do the
same by relabelling the x, y, z coordinates.
The surface metric tensor , also known as the “First Fundamental Form”, is traditionally
denoted by
ds2 = dx · dx = E dξ 2 + 2 F dξ dη + G dη 2 , (11.24)
where dx = xξ dξ + xη dη, and hence
E = k xξ k2 , F = xξ · xη , G = k xη k2 . (11.25)
The mean and Gauss curvatures of S are given, respectively, by, cf. [24; Theorem 13.25],

H = ( e G − 2 f F + g E ) / ( 2 (E G − F²) ),   K = ( e g − f² ) / ( E G − F² ),   (11.26)
where
e = xξξ · n, f = xξη · n, g = xηη · n, (11.27)
are the coefficients of the “Second Fundamental Form”. In particular, on a Monge patch,
†
In fact, there are two unit normals at each point, namely ± n, and the choice of one of them
induces an orientation on S. For example, if S is a closed surface, one might consistently choose
the unit outward normal. Non-orientable surfaces, like Möbius bands, do not admit an everywhere
smooth choice of normal.
†
Two metrics are said to be conformally equivalent if they measure identical angles, which
requires that they differ pointwise only by an overall scalar multiple.
‡
Finding the integrating factor for the first component automatically implies that it is also an
integrating factor for the second component.
and hence
(xλλ + xµµ ) · xλ = (xλλ + xµµ ) · xµ = 0.
Thus, the vector xλλ + xµµ is orthogonal to both tangent vectors, and hence must be a
scalar multiple of the unit normal:
H = ω / ( 2 ρ² ) = ( (xλλ + xµµ) · n ) / ( 2 ρ² ).   (11.34)
In particular, if the surface is minimal, and hence H = 0, then (11.33, 34) imply
Theorem 11.5. Every minimal surface can be locally identified as the real part of a
minimal curve: x = Re z.
Remark : If z(ζ) defines a minimal curve, then one easily sees that so does e^{i t} z(ζ) for any
t ∈ R, and hence any minimal curve defines a one-parameter family of minimal surfaces:

x(t; λ, µ) = Re ( e^{i t} z(ζ) ).   (11.38)
Example 11.6. The function
z(ζ) = ( ζ − ⅓ ζ³ ,  i ζ + ⅓ i ζ³ ,  ζ² )   has   z′(ζ) = ( 1 − ζ² ,  i + i ζ² ,  2 ζ ),   (11.39)
and hence defines a minimal curve. The corresponding minimal surface is known as
Enneper’s surface and is one of a wide variety of classically studied examples. Its isother-
mal parametrization is given by

x(λ, µ) = Re z(λ + i µ) = ( − ⅓ λ³ + λ µ² + λ ,  ⅓ µ³ − λ² µ − µ ,  λ² − µ² ).   (11.40)
A part of it, corresponding to −2.4 ≤ λ, µ ≤ 2.4, is plotted in Figure 21. Enneper’s surface
is not simple, meaning that it has self-intersections, as the figure makes clear.
Remark :
One can use formula (11.38) to construct a one-parameter family of Enneper surfaces,
but it turns out they can be mapped to each other by a rigid motion, and hence define
essentially the same surface.
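The defining property of a minimal curve, that z′(ζ) · z′(ζ) = 0 under the unconjugated complex dot product, can be checked directly for the Enneper curve, with the derivative computed from z itself:

```python
import sympy as sp

zeta = sp.symbols('zeta')

# The Enneper minimal curve (11.39); a minimal curve must satisfy the
# isotropy condition z'(zeta) . z'(zeta) = 0, where the dot product is
# the bilinear (unconjugated) one.
zvec = sp.Matrix([zeta - zeta**3/3,
                  sp.I*zeta + sp.I*zeta**3/3,
                  zeta**2])
dz = zvec.diff(zeta)

# Matrix.dot uses the plain bilinear sum, no complex conjugation
assert sp.simplify(dz.dot(dz)) == 0
```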
Minimal curves were first investigated by the Norwegian mathematician Sophus Lie.
Weierstrass realized that they can be explicitly characterized as follows. Let f (ζ), g(ζ) be
scalar complex analytic functions defined on U ⊂ C. By direct calculation, the analytic
vector

w(ζ) = ( ½ f (ζ) ( 1 − g(ζ)² ) ,  ½ i f (ζ) ( 1 + g(ζ)² ) ,  f (ζ) g(ζ) )^T   (11.41)
satisfies w(ζ) · w(ζ) ≡ 0. Thus, if we set

z(ζ) = ∫ w(ζ) dζ,   (11.42)

where we integrate component-wise and can use any constants of integration, then z′ (ζ)
satisfies (11.37) and hence defines a minimal curve. In fact, most minimal curves can be
so represented. To see this, set z′ = ( z1′ , z2′ , z3′ )^T . Assuming z1′ − i z2′ ≠ 0, then

f = z1′ − i z2′ ,   g = z3′ / ( z1′ − i z2′ ),   (11.43)
recovers the Weierstrass representation (11.41). For example, the Enneper surface obtained
from the minimal curve (11.39) has Weierstrass representation given by f (ζ) = 2, g(ζ) = ζ.
By selecting various complex functions f, g, one can thereby explicitly construct a wide
variety of interesting minimal surfaces through formulas (11.41, 42). See [24; Chapter 22]
for further details, including plots and Mathematica Notebooks for computing a variety
of interesting examples.
where ℘(ζ) is the Weierstrass elliptic function with moduli g2 = 4 e2 , g3 = 0, cf. [2, 41].
For the particular constant in the numerator of g, it can be proved that the Costa surface
is both simple and complete, meaning that it has no self-intersections and no boundary;
see Figure 22. (The soap film in Figure 3 is also a Costa surface.) Until its discovery only
a few such surfaces were known.
†
One could also try other powers in the fidelity term, but this has no appreciable effect on
performance.
is not differentiable when p = q = 0, which is the cause of the singular denominator when
∇u = 0 in the Euler–Lagrange equation (11.50). These singularities causes both analytic
and numerical difficulties that interfere with the efficacy of the method. To avoid them,
one can regularize the TV functional, using

Jε [ u ] = ∬_Ω [ √( ‖∇u‖² + ε² ) + ½ λ (u − f )² ] dx dy,   (11.51)

whose Euler–Lagrange equation is

− div ( ∇u / √( ‖∇u‖² + ε² ) ) + λ u = λ f,   (11.52)
or, in full,

[ − ( uy² + ε² ) uxx + 2 ux uy uxy − ( ux² + ε² ) uyy ] / ( ux² + uy² + ε² )^{3/2} + λ u = λ f.   (11.53)
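A minimal numerical sketch of the one-dimensional analogue of this scheme (all parameter values below are ad hoc choices, not taken from the text): gradient descent on a discretization of the regularized functional, applied to a noisy step signal.

```python
import numpy as np

# Discrete 1D analogue of (11.51):
#   J[u] = sum_i sqrt((u_{i+1}-u_i)^2 + eps^2) + (lam/2) sum_i (u_i - f_i)^2
rng = np.random.default_rng(0)
n, eps, lam, step = 200, 0.05, 8.0, 0.01

xgrid = np.linspace(0.0, 1.0, n)
clean = np.where(xgrid < 0.5, 0.0, 1.0)           # signal with a sharp edge
f = clean + 0.1*rng.standard_normal(n)            # noisy data

def J(u):
    du = np.diff(u)
    return np.sum(np.sqrt(du**2 + eps**2)) + 0.5*lam*np.sum((u - f)**2)

u = f.copy()
for _ in range(500):
    du = np.diff(u)
    w = du / np.sqrt(du**2 + eps**2)              # derivative of TV term in du
    grad = lam*(u - f)
    grad[:-1] -= w                                # chain rule through np.diff
    grad[1:] += w
    u -= step*grad

assert J(u) < J(f)                                # descent decreased the energy
print(np.abs(f - clean).mean(), np.abs(u - clean).mean())
```

With these settings the denoised signal typically tracks the step while averaging out the noise, mirroring the edge-preserving behavior the text describes for TV denoising.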
Observe that if one rescales u 7−→ ε u then the initial term in the regularized functional
(11.51) is ε times the surface area functional (11.19), and so the second order terms in
the Euler–Lagrange equation (11.52) form a rescaled version of the minimal surface equa-
tion (11.20). Figure 23 shows the effect of TV denoising of a one-dimensional noisy signal
— the equation is obtained from (11.53) by eliminating all the y derivative terms; observe
how the sharp discontinuities (edges) are retained while the noise is smoothed out. Fig-
ure 24 shows the effect of TV denoising on an image of Vincent Hall, the home of the
†
These images are copied from Jeff Calder’s lecture notes, [11], with the author’s permission.
†
In particular, Abel was the first to prove that one could not solve a general scalar polynomial
equation, p(x) = 0, of degree ≥ 5 using algebraic operations: addition, subtraction, multiplication,
division, and taking roots.
Example 12.5. An evident example is provided by the usual linear action of the
general linear group GL(n), acting by matrix multiplication on column vectors, whereby
Φ(A, x) = A x for A ∈ GL(n) and x ∈ Rn . This action clearly induces a linear action
of any subgroup of GL(n, R) on R n . In fact most, but not all, Lie groups can be real-
ized as subgroups G ⊂ GL(n) of a general linear group of some order n. A linear action
of G is known as a representation, and representation theory is a well developed field of
mathematics with numerous applications throughout mathematics and physics, particu-
larly in quantum mechanics, [13, 54]. On the other hand, nonlinear actions are also of
great importance in applications, including Noether’s Theorems.
As noted above, for our purposes we need only consider the action of the connected one-
parameter group R, which acts on a space M as a one-parameter transformation group via
a map Φ: R × M → M satisfying‡
Φ(t, x) = x + t ξ(x) + · · · ,
and hence ξ(x) prescribes the “infinitesimal” motion of the point x under the group action
or flow. In particular, if ξ(x0 ) = 0, then x0 is a fixed point for the transformation group,
with Φ(t, x0 ) = x0 for all t ∈ R. Moreover, if ξ(x) ≡ 0, then all points are fixed and
the action is trivial — the group consists of only the identity transformation.
Example 12.6. Let M = R 2 . The simplest nontrivial one-parameter transformation
group is the group of translations in the horizontal direction:
Φ t, (x, y) = (x + t, y). (12.6)
Differentiating, as in (12.5), we see that the corresponding vector field has ξ(x, y) = (1, 0).
It is easily verified that these translations, x(t) = x + t, y(t) = y, form the solution to the
initial value problem dx/dt = 1, dy/dt = 0, with initial conditions x(0) = x, y(0) = y. More
generally, translations in the fixed direction (a, b) ∈ R² are given by
Φ t, (x, y) = (x + t a, y + t b) with infinitesimal generator ξ(x, y) = (a, b).
The planar rotation group is given by
Φ t, (x, y) = (x cos t − y sin t, x sin t + y cos t), (12.7)
†
If the group is a local transformation group, the requirement only holds for those values of t
where defined.
Differentiating with respect to t, setting t = 0, and using the chain rule, we deduce that

v(I) = 0,   where   v = Σ_{i=1}^{n} ξi (x) ∂/∂xi .   (12.9)
The expression on the right hand side is a convenient differential geometric notation for
the vector field defined by the function ξ(x) = (ξ1 (x), . . . , ξn (x)). Indeed, from here on we
will identify vector fields and first order linear partial differential operators as in (12.9).
Given a transformation group G acting on M , there is an induced action on subsets
S ⊂ M . Namely, we set g · S = { g · x | x ∈ S }. If g · S = S, we call g ∈ G a symmetry of
the subset S. In particular, the identity element e is a symmetry of every subset. Moreover,
the set of symmetries of a subset S is easily seen to be a subgroup of G, known as the
symmetry subgroup of S. In general, if every element g ∈ G is a symmetry of a subset
S, we call S a G-invariant subset. If I: M → R is an invariant function, then every level
set, Sc = { I(x) = c } for any c ∈ R, is a G-invariant subset. For the rotation group SO(2)
acting on the plane, the circles (and the origin) are the level sets of the radius invariant
and clearly rotationally invariant.
Action on Functions
We are particularly interested in the action of groups on the graphs of functions. In
the simplest case, consider a one-parameter transformation group action on the plane, so
M = R 2 . We use coordinates (x, u), regarding x as the independent variable and u as the
dependent variable. We identify a function u = f (x) with its graph Γf = { (x, f (x)) } ⊂
R² . The group transformations act on the graph as above, and we identify the transformed
graph g · Γf = Γg·f with the graph of the transformed function F = g · f ; see Figure 25.
In formulas, the group will act on the plane R² via transformations of the form
X = A(t, x, u), U = B(t, x, u), (12.10)
where t ∈ R is the group parameter, while A, B are the components of Φ t, (x, u) . Each
point (x, f (x)) ∈ Γf in the graph of u = f (x) will thus be mapped to the point
X = A(t, x, f (x)), U = B(t, x, f (x)), (12.11)
With this in hand, let us investigate the action of one-parameter transformation groups
on variational problems. To begin, let us concentrate on the simplest case of a scalar first
order functional

J[ u ] = ∫_a^b L(x, u, u′ ) dx   (12.24)
v∗ (L) − p L = 0.
Equation (12.32) produces the first integral
I(x, u, p) = x ∂L/∂p (x, u, p) + u H(x, u, p) = ( x p − u ) F ( x² + u² ) / √( 1 + p² ),
which is constant when u = f (x) and p = f ′(x), where f is any solution to the Euler–
Lagrange equation. This effectively reduces the Euler–Lagrange equation to the first order
ordinary differential equation I( x, f (x), f ′(x) ) = c, where c is the constant of integration.
If one goes to polar coordinates x = r cos θ, u = r sin θ, and parametrizes the solution by
θ = h(r), then the equation reduces to
F (r) r² h′(r) / √( 1 + r² h′(r)² ) = c.   (12.33)
The latter equation can be explicitly integrated by solving for h′ (r). Indeed, in polar
coordinates, the original variational problem has the simpler Lagrangian

L(r, θ, q) = F (r) √( 1 + r² q² ),
where q represents θ ′ . This is independent of θ and hence the first integration method
produces the first integral (12.33). The fact that one can solve the resulting first order
ordinary differential equation by quadrature is a manifestation of Lie’s theory of integration
of ordinary differential equations possessing a one-parameter symmetry group, [42].
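The reduction can be carried out with a short computer algebra sketch: ∂L/∂q gives the first integral, and the explicit solution for q = h′(r) can be verified by back-substitution (comparing squares sidesteps sign assumptions on the square roots).

```python
import sympy as sp

r, c = sp.symbols('r c', positive=True)
q = sp.symbols('q')                       # stands for theta' = h'(r)
F = sp.Function('F')(r)

# Polar Lagrangian L = F(r) sqrt(1 + r^2 q^2); since L is independent of
# theta, dL/dq is a first integral of the Euler-Lagrange equation.
L = F*sp.sqrt(1 + r**2*q**2)
I1 = sp.diff(L, q)                        # F r^2 q / sqrt(1 + r^2 q^2)

# Candidate solution of I1 = c for q = h'(r):
q0 = c/(r*sp.sqrt(F**2*r**2 - c**2))

# Verify by back-substitution, comparing squares to avoid branch issues
assert sp.simplify(I1.subs(q, q0)**2 - c**2) == 0
```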
u^α_J = ∂J f^α (x) of a function u = f (x). We write (x, u^(n) ) for the complete collection of
coordinates x^i and u^α_J for i = 1, . . . , p, α = 1, . . . , q, 0 ≤ #J ≤ n.
A function F (x, u(n) ) depending on the independent variables, the dependent variables,
and a finite number of their derivatives is known as a differential function. The order of F is
the highest order partial derivative it explicitly depends on. A system of partial differential
equations is thus defined by the vanishing of a collection of differential functions:
The total derivative with respect to the i th independent variable x^i is the first order
differential operator

D_i = D_{x^i} = ∂/∂x^i + Σ_{α=1}^{q} Σ_J u^α_{J,i} ∂/∂u^α_J ,   (12.36)

where u^α_{J,i} = D_i (u^α_J ) = u^α_{j1 ... jk i} . The sum in (12.36) is over all symmetric multi-indices
J of arbitrary order. Even though Di involves an infinite summation, when applying the
total derivative to any particular differential function, only finitely many terms are needed.
Applying the total derivative Di to a differential function has the effect of differentiating
it with respect to xi , treating the uα and their derivatives as functions of x1 , . . . , xp . Higher
order total derivatives are defined so that DJ = Dj1 · · · Djk for any symmetric multi-index
J = (j1 , . . . , jk ), 1 ≤ jν ≤ p. If J0 = ∅ is the empty multi-index, with #J0 = 0, then,
by convention DJ0 is the identity map. If u = f (x) is a (smooth) solution to our system
of partial differential equations (12.35) then it also satisfies the differentiated equations
DJ Fj = 0 for all multi-indices J and all j = 1, . . . , k.
Let Ω ⊂ X denote a connected open set with piecewise smooth boundary ∂Ω. By an nth
order variational problem, we mean the problem of finding the extremals (minima and/or
maxima) of a functional
J[ u ] = ∫_Ω L(x, u^(n) ) dx = ∫_Ω L(x, u^(n) ) dx¹ · · · dx^p   (12.37)
over some space of functions u = f (x) for x ∈ Ω. The integrand L(x, u^(n) ), which is a
smooth differential function on the jet space Jn , is referred to as the Lagrangian of the
variational problem.
Theorem 12.10. The smooth extremals u = f (x) of a variational problem with La-
grangian L(x, u(n) ) must satisfy the system of Euler–Lagrange equations obtained by ap-
plying the Euler operators to the Lagrangian:

Eα (L) = Σ_J (−1)^{#J} D_J ∂L/∂u^α_J = 0,   α = 1, . . . , q.   (12.39)
Note that, as with the total derivatives, even though the Euler operator (12.38) is defined
using an infinite sum, for any given Lagrangian only finitely many summands are needed
to compute the corresponding Euler–Lagrange expressions E(L).
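For a single unknown u(x) the truncated sum is easy to implement directly. The helper below is a hypothetical sketch (not from the text), tested on the Lagrangian L = ½(u′)² − ¼u⁴:

```python
import sympy as sp

x = sp.symbols('x')
u = sp.Function('u')(x)
ux, uxx = sp.diff(u, x), sp.diff(u, x, 2)

def euler_operator(L):
    """Truncated Euler operator E(L) = sum_k (-D_x)^k dL/du_k for one
    independent variable x and one unknown u(x), up to second order."""
    return sum((-1)**k * sp.diff(sp.diff(L, d), x, k)
               for k, d in enumerate((u, ux, uxx)))

# Example: L = u'^2/2 - u^4/4 gives E(L) = -u'' - u^3
L = ux**2/2 - u**4/4
assert sp.simplify(euler_operator(L) - (-uxx - u**3)) == 0
```

Only the derivative coordinates that actually appear in L contribute, which is the finiteness remark made above.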
A general vector field on the space of independent and dependent variables takes the
form

v = Σ_{i=1}^{p} ξ^i (x, u) ∂/∂x^i + Σ_{α=1}^{q} ϕ^α (x, u) ∂/∂u^α .   (12.40)
The induced flow or one-parameter transformation group ( x(t), u(t) ) = Φ( t, (x, u) ) is
obtained by integrating the associated system of ordinary differential equations

dx^i/dt = ξ^i (x, u),   du^α/dt = ϕ^α (x, u),
subject to the initial conditions x(0) = x, u(0) = u.
The characteristic of the vector field (12.40) is the q-tuple Q(x, u(1) ) of first order dif-
ferential functions defined by

Q^α (x, u^(1) ) = ϕ^α (x, u) − Σ_{i=1}^{p} ξ^i (x, u) ∂u^α/∂x^i ,   α = 1, . . . , q.   (12.41)
The evolutionary form of v is the generalized vector field
v_Q = Σ_{α=1}^{q} Q^α ∂/∂u^α .   (12.42)
The prolongation of v_Q to the derivative coordinates u^α_J is then given by

v*_Q = Σ_{α=1}^{q} Σ_J D_J Q^α ∂/∂u^α_J ,   (12.43)
where T is referred to as the density while X = (X1 , . . . , Xp−1 ) is the associated flux . In
this case, (12.49) combined with the Divergence Theorem implies
d/dt ∫_Ω T (t, x, u^(n) ) dx = − ∮_{∂Ω} X(t, x, u^(n) ) · n dS.   (12.50)
where the Euler–Lagrange expressions associated with L(x, u(n) ) are given in (12.39) and
the divergence terms resulting from the integrations by parts are collected together in
the p-tuple C. Substituting back into (12.51, 52), and rearranging terms, we deduce the
general Noether identity
are the components of what is called the characteristic of the conservation law. Substi-
tuting back into the Noether identity (12.55), we deduce that if vQ is the evolutionary
vector field corresponding to the q-tuple in (12.56), then it defines a variational symmetry
of the variational problem, meaning that it satisfies (12.53) for some B. We have thus
established Noether’s First Theorem.
Theorem 12.11. Every variational symmetry gives rise to a conservation law of the Euler–
Lagrange equation and, conversely, every conservation law comes from such a symmetry.
Remark : Higher order conservation laws produce higher order generalized symmetries.
Their existence is a hallmark of the integrability of the Euler–Lagrange equations, [42].
Example 12.12. Consider the one-dimensional unforced wave equation
utt − c2 uxx = 0, (12.57)
for the displacement u(t, x) as a function of the time and the single spatial coordinate.
Here, the wave speed c > 0 is assumed to be constant, whereby (12.57) models the unforced
propagation of (small amplitude) waves in a one-dimensional homogeneous elastic medium.
The wave equation is the Euler–Lagrange equation for the variational problem
J[ u ] = ∬ ( ½ ut² − ½ c² ux² ) dx dt,
representing the difference between the kinetic and potential energies. In this case, we
ignore details on the domain of integration, which could be all of R 2 , and boundary con-
ditions.
A variety of conservation laws can be constructed using the Noether machinery. First,
invariance of the Lagrangian with respect to time translation, which has infinitesimal
generator ∂t and corresponding characteristic Q = − ut , produces the conservation law†

Dt ( ½ ut² + ½ c² ux² ) − Dx ( c² ux ut ) = ut ( utt − c² uxx ) = 0
†
In most cases, we use − Q in the conservation law to avoid additional minus signs.
Dt T + Dx X = ( t ut + x ux ) ( utt − c² uxx ) = 0,
where
T = t ( ½ ut² + ½ c² ux² ) + x ux ut ,   X = − x ( ½ ut² + ½ c² ux² ) − c² t ux ut .
The integral of T equals t times the energy integral plus the center of mass integral and, in
the absence of boundary fluxes, implies that the center of mass of the disturbance moves
as an affine function of time: C(t) = E t + C0 .
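Both conservation laws are mechanical identities that a computer algebra system can confirm; in the second, the multiplier is taken as (t ut + x ux), which is what makes the identity close for general wave speed c.

```python
import sympy as sp

t, x, c = sp.symbols('t x c')
u = sp.Function('u')(t, x)
ut, ux = sp.diff(u, t), sp.diff(u, x)
wave = sp.diff(u, t, 2) - c**2*sp.diff(u, x, 2)      # utt - c^2 uxx

# Energy law:  Dt(ut^2/2 + c^2 ux^2/2) - Dx(c^2 ux ut) = ut * (utt - c^2 uxx)
T_e = ut**2/2 + c**2*ux**2/2
X_e = -c**2*ux*ut
assert sp.expand(sp.diff(T_e, t) + sp.diff(X_e, x) - ut*wave) == 0

# Dilation-type law with density T = t*T_e + x ux ut and flux
# X = -x*T_e - c^2 t ux ut, multiplier (t ut + x ux):
T_d = t*T_e + x*ux*ut
X_d = -x*T_e - c**2*t*ux*ut
assert sp.expand(sp.diff(T_d, t) + sp.diff(X_d, x) - (t*ut + x*ux)*wave) == 0
```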
The above are but four of an infinite family of conservation laws of the wave equation;
see [42] for additional examples and their classification.
acceleration vector. For three-dimensional motion, each point mass has three independent
coordinate positions, and so n = 3 k, where k is the number of individual masses, and the
positions and masses are labelled accordingly.
The Lagrangian approach to classical Newtonian mechanics requires one to determine
the critical functions of a first order variational problem
I[ q ] = ∫_a^b L(t, q, q̇) dt,   (13.1)

whose Euler–Lagrange equations take the form

Ei (L) = − d/dt ∂L/∂q̇i + ∂L/∂qi = 0,   i = 1, . . . , n.   (13.2)
In this dynamical context, the solutions are typically specified by initial conditions rather
than boundary conditions.
In particle dynamics, the Lagrangian is the difference of kinetic and potential energy:
L = ½ Σ_{i=1}^{n} mi q̇i² − V (t, q),   (13.3)
where mi is the mass associated with the particle coordinate qi . The Euler-Lagrange
equations (13.2) are just Newton’s laws F = m a:
mi d²qi/dt² + ∂V/∂qi = 0,   i = 1, . . . , n,   (13.4)
dt ∂qi
relating the accelerations to the forces induced by the potential energy function. Observe
that, because the Lagrangian is the difference in energies, we do not expect the solutions
to the Euler–Lagrange equations, i.e., the particle trajectories, to minimize or maximize
the variational problem, hence their status as critical functions and not extremizers.
For example, the two body problem in three-dimensional space governs the gravitational
interactions between two point masses — e.g., the (idealized) earth and its moon — that
are moving in space. Here q(t) = (q1 (t), . . . , q6 (t)) has six components, where (q1 , q2 , q3 )
represent the x, y, z coordinates of the first body, while (q4 , q5 , q6 ) represent the x, y, z
coordinates of the second body. Consequently, m1 = m2 = m3 = M1 equal the mass of the
first, while m4 = m5 = m6 = M2 equal the mass of the second. According to Newtonian
gravitation, the potential function is proportional to the product of the masses divided by
the distance between them:
V (t, q) = − G M1 M2 / √( (q1 − q4 )² + (q2 − q5 )² + (q3 − q6 )² ),
where G ≈ 6.674 × 10⁻¹¹ m³ kg⁻¹ s⁻² is the universal gravitational constant. The partial
derivatives of the potential V appearing in the Newtonian system (13.4) produce the inverse
square law of gravitational attraction in the non-relativistic, flat universe.
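Differentiating the two-body potential symbolically confirms the inverse square law (a sketch; the symbols are those of the formula above):

```python
import sympy as sp

q = sp.symbols('q1:7')                 # q1, ..., q6
G, M1, M2 = sp.symbols('G M_1 M_2', positive=True)

d = sp.sqrt((q[0]-q[3])**2 + (q[1]-q[4])**2 + (q[2]-q[5])**2)
V = -G*M1*M2/d

# Newton's equations (13.4): m_i q_i'' = -dV/dq_i.  The force on the first
# body is the inverse-square attraction along the separation vector.
F1 = [-sp.diff(V, qi) for qi in q[:3]]
expected = [-G*M1*M2*(q[i] - q[i+3])/d**3 for i in range(3)]

assert all(sp.simplify(a - b) == 0 for a, b in zip(F1, expected))
```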
Hamiltonian Systems
The alternative approach to classical mechanics is due to Hamilton, who first studied
optics before extending his methods to all of mechanics. The starting point is to define
the (generalized) momenta associated with the Lagrangian in (13.1). These are given by
then the Implicit Function Theorem, [4, 35], allows us to locally uniquely solve (13.5) for
q̇ as a function of t, q, p:

q̇i = ϕi (t, q, p).   (13.8)

The map between the Lagrangian variables (t, q, q̇) and the Hamiltonian variables (t, q, p)
Since pi² /mi = mi q̇i² , the initial summation is again the kinetic energy, and hence H is the
sum of kinetic and potential energy, i.e., the total energy, whereas the Lagrangian L is the
difference of the two.
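A one-degree-of-freedom sketch of the Legendre transform just described, using the hypothetical Lagrangian L = ½ m q̇² − V:

```python
import sympy as sp

t, q, p, m = sp.symbols('t q p m', positive=True)
qdot = sp.symbols('qdot')
V = sp.Function('V')(t, q)

# Single particle: L = m qdot^2/2 - V; momentum p = dL/dqdot = m qdot
L = m*qdot**2/2 - V
p_def = sp.diff(L, qdot)
qdot_of_p = sp.solve(sp.Eq(p_def, p), qdot)[0]    # invert: qdot = p/m

# Hamiltonian H = p qdot - L is kinetic plus potential energy
H = (p*qdot - L).subs(qdot, qdot_of_p)
assert sp.simplify(H - (p**2/(2*m) + V)) == 0
```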
We now need to compute the partial derivatives of the Hamiltonian function; the easiest
way to see this is to use differentials. Using formula (13.9), we compute
dH = ∂H/∂t dt + Σ_i ( ∂H/∂qi dqi + ∂H/∂pi dpi ).   (13.11)
†
We are changing our earlier notation; from here on p will represent a momentum, and not the
derivative of q.
where n denotes the index of refraction. We cannot apply the usual Hamiltonian theory
since the Lagrangian is degenerate: det( ∂²L/∂q̇i ∂q̇j ) ≡ 0. This is in essence because the
solutions are the curves followed by the light rays, and hence are unaffected by reparametri-
zation. In other words, the variational principle is parameter-independent, and, as we saw
in Example 5.3, there is a dependency among the associated Euler–Lagrange equations.
Now the horizontal coordinate x plays the role of time. The Euler-Lagrange equation of
this variational problem is
d/dx ( n(x, y) ẏ / √( 1 + ẏ² ) ) − ∂n/∂y √( 1 + ẏ² ) = 0,   (13.16)

where now the dot means d/dx. To compute the Hamiltonian form of these equations, the
Lagrangian is

L(x, y, ẏ) = n(x, y) √( 1 + ẏ² ),   (13.17)
hence
p = ∂L/∂ẏ = n ẏ / √( 1 + ẏ² ),   which can be explicitly inverted:   ẏ = p / √( n² − p² ).
Therefore, the Hamiltonian is

H(x, y, p) = p ẏ − L = − √( n(x, y)² − p² ),   (13.18)

and the corresponding Hamilton’s equations are

ṗ = − ∂H/∂y = n (∂n/∂y) / √( n² − p² ),   ẏ = ∂H/∂p = p / √( n² − p² ).   (13.19)
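The pair (13.19) follows mechanically from the Hamiltonian (13.18); a symbolic check:

```python
import sympy as sp

x, y, p = sp.symbols('x y p')
n = sp.Function('n', positive=True)(x, y)

H = -sp.sqrt(n**2 - p**2)               # optics Hamiltonian (13.18)

# Hamilton's equations: ydot = dH/dp, pdot = -dH/dy
ydot = sp.diff(H, p)
pdot = -sp.diff(H, y)

assert sp.simplify(ydot - p/sp.sqrt(n**2 - p**2)) == 0
assert sp.simplify(pdot - n*sp.diff(n, y)/sp.sqrt(n**2 - p**2)) == 0
```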
The field of Hamiltonian dynamics is vast, and we refer the reader to [1, 5, 17, 23, 42, 56]
for further developments.
Hamilton–Jacobi Theory
There is an intimate connection between first order partial differential equations and
systems of first order ordinary differential equations, [10, 17, 43]. The two subjects are,
in essence, equivalent, and a complete solution to one gives the complete solution to the
other. This relationship, going back to Hamilton, Jacobi, and others in the first half of
the nineteenth century, lies at the heart of wave/particle duality, and the interconnections
between classical and quantum mechanics.
We begin with a first order variational problem
We begin with a first order variational problem

∫_{t0}^{t1} L(t, q, q̇) dt
where γ is the extremal curve contained in the field that connects (t0 , q0 ) to the point
(t, q) ∈ U . It is not hard to see that S is a continuously differentiable function of its
arguments. Moreover it satisfies a very important first order partial differential equation.
Theorem 13.3. The action function is a solution to the Hamilton–Jacobi equation:
∂S/∂t + H ( t, q, ∂S/∂q ) = 0,   (13.21)
where H(t, q, p) is the Hamiltonian (13.9) associated with the variational problem.
Proof : This requires us to compute the partial derivatives of S, which we do by varying
its arguments. To avoid confusing the endpoints with the integration variables, let us set
h(ε) = S(b + ε η, β + ε ϑ) = ∫_a^{b+ε η} L( t, q + ε ϕ, q̇ + ε ϕ̇ ) dt,
where the value of the variation in q at the varied endpoint gives the variation in β:
Remark : We will use the mathematical convention for spherical coordinates, cf. [43],
where − π < θ ≤ π is the azimuthal angle or longitude, while 0 ≤ ϕ ≤ π is the zenith angle
or co-latitude, whereby
x = r sin ϕ cos θ, y = r sin ϕ sin θ, z = r cos ϕ. (13.27)
In many books, particularly those in physics, the roles of ϕ and θ are reversed, leading to
much confusion when one is perusing the literature.
Example 13.5. In two-dimensional geometric optics, as we have seen, the Hamiltonian
function (13.18) is

H(x, y, p) = − √( n(x, y)² − p² ),
with x playing the role of t. The Hamilton–Jacobi equation becomes

∂S/∂x − √( n(x, y)² − (∂S/∂y)² ) = 0.
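For a homogeneous medium (constant index n), a plane-wave action solves this equation; the parameter β below is a hypothetical constant with |β| < n, not from the text.

```python
import sympy as sp

x, y = sp.symbols('x y')
n, beta = sp.symbols('n beta', positive=True)

# Plane-wave action S = sqrt(n^2 - beta^2) x + beta y; its level sets are
# straight wavefronts, and it satisfies S_x = sqrt(n^2 - S_y^2).
S = sp.sqrt(n**2 - beta**2)*x + beta*y

residual = sp.diff(S, x) - sp.sqrt(n**2 - sp.diff(S, y)**2)
assert sp.simplify(residual) == 0
```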
Characteristics
The physical motivation for the theory of characteristics comes from geometrical optics,
where the level sets { S = c } of the action function are the wavefronts where the light
waves have constant phase, [8]. The corresponding characteristics are just the paths
followed by the light rays or the photons. This principle is at the heart of Hamilton’s
“optical-mechanical” analogy, and, ultimately, the basis of the wave/particle duality in
quantum mechanics.
In general, consider a first order partial differential equation
S = { x | h(x) = 0 }
as the zero locus of a smooth scalar-valued function h with ∇h 6= 0 on S. Assuming
without loss of generality that ∂h/∂xn 6= 0 we can locally introduce new coordinates
The preceding discussion serves as motivation for the following crucial definition.
Definition 13.6. An n-tuple of numbers ξ = (ξ1 , . . . , ξn ) determines a characteristic
direction for the partial differential equation F (x, u, p) = 0 at the point (x, u, p) if
ξ · Fp (x, u, p) = 0.
u(s) = f ( x(s) ),   p(s) = ∂f/∂x ( x(s) ).
We can use the chain rule to compute how the u and p components of the prolonged curve
depend on s. Thus, in view of (13.32),
du/ds = Σ_{i=1}^{n} ∂f/∂xi dxi/ds = λ Σ_{i=1}^{n} pi ∂F/∂pi = λ p · ∂F/∂p ,
and also,
dpi/ds = Σ_k ∂pi/∂xk dxk/ds = λ Σ_k ∂²f/∂xi ∂xk ∂F/∂pk .
At this stage it appears that we also need to know how the second derivatives of the
solution behave. However, u = f (x) is assumed to be a solution, so it also satisfies
0 = ∂/∂xi [ F ( x, f (x), fx (x) ) ] = ∂F/∂xi + ∂F/∂u ∂f/∂xi + Σ_k ∂F/∂pk ∂²f/∂xi ∂xk .
In general, by standard existence and uniqueness results for ordinary differential equations, given a point $(x_0, u_0, p_0) \in \mathbb{R}^{2n+1}$, there is a unique characteristic curve passing
through it. Moreover, by the preceding calculations, if u0 = f (x0 ), p0 = fx (x0 ) so that
(x0 , u0 , p0 ) belongs to the prolonged graph of a solution u = f (x) to the partial differential
equation (13.28), then the characteristic curve passing through (x0 , u0 , p0 ) is contained in
the prolonged graph. Since the characteristic curves are uniquely determined by their ini-
tial conditions, we deduce that every solution to our partial differential equation is swept
out by an (n − 1)–parameter family of characteristic curves, parametrized by the initial
values of u and p on the (n − 1)-dimensional non-characteristic Cauchy surface.
If F doesn’t depend on u, then the characteristic equations (13.33) are essentially the
same as Hamilton’s equations for the time-independent Hamiltonian F = H(q, p) (with
x = q) since u can be determined from x, p by a single quadrature. For the time-dependent
Hamilton–Jacobi equation
$$F(t, q, S, S_t, S_q) \;\equiv\; \frac{\partial S}{\partial t} + H\left(t,\, q,\, \frac{\partial S}{\partial q}\right) = 0, \qquad (13.34)$$
we replace S by u to find the corresponding equations for characteristic curves
$$\frac{dt}{ds} = 1, \qquad \frac{dq}{ds} = \frac{\partial H}{\partial p}, \qquad \frac{dp}{ds} = -\,\frac{\partial H}{\partial q}, \qquad \frac{d\pi}{ds} = -\,\frac{\partial H}{\partial t}, \qquad \frac{du}{ds} = \pi + p\,\frac{\partial H}{\partial p}, \qquad (13.35)$$
where π represents St . Thus t = s + c, and, after we solve the Hamiltonian system for q, p,
we can recover the complete expression for the characteristic curves by quadrature. The as-
sociated characteristic curves (solutions to Hamilton’s equations) are found by integrating
the first order system
$$\frac{dq}{dt} \;=\; \frac{\partial H}{\partial p}\left(t,\, q,\, \frac{\partial S}{\partial q}(t, q)\right) \qquad (13.36)$$
† s is not necessarily arc length.
$$p(t) \;=\; \frac{\partial S}{\partial q}\,(t, q(t))$$
to get the corresponding momenta. The solution q(t) to (13.36) will describe the particle
trajectories in physical space.
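As a concrete illustration (a sketch, not an example from the text: the free-particle Hamiltonian H = p²/2 with m = 1 and the Hamilton–Jacobi solution S(t, q) = q²/(2(t + 1)) are assumptions made here), equation (13.36) reduces to dq/dt = q/(t + 1), whose solutions are the straight-line trajectories q(t) = q₀(1 + t):

```python
# Sketch (not from the text): free particle H(q, p) = p**2 / 2, with
# S(t, q) = q**2 / (2 * (t + 1)), which solves S_t + S_q**2 / 2 = 0.
# Equation (13.36) then reads dq/dt = H_p evaluated at p = S_q = q / (t + 1).

def rhs(t, q):
    """Right-hand side of (13.36): dq/dt = H_p(t, q, S_q) = q / (t + 1)."""
    return q / (t + 1.0)

def rk4(q0, t0, t1, steps=1000):
    """Classical fourth-order Runge-Kutta integration of dq/dt = rhs(t, q)."""
    h = (t1 - t0) / steps
    t, q = t0, q0
    for _ in range(steps):
        k1 = rhs(t, q)
        k2 = rhs(t + h / 2, q + h * k1 / 2)
        k3 = rhs(t + h / 2, q + h * k2 / 2)
        k4 = rhs(t + h, q + h * k3)
        q += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return q

# The exact trajectory is the straight line q(t) = q0 * (1 + t).
q_num = rk4(q0=1.0, t0=0.0, t1=1.0)
print(q_num)  # close to 2.0
```

Any standard ODE integrator would do here; RK4 is written out only to keep the sketch self-contained.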
As another example, suppose that we know a solution S(q) to the eikonal equation
$$\| \nabla S \|^{2} = n(q)^{2} \qquad (13.37)$$
of geometrical optics. Then the associated light rays, which follow the characteristics, are
found by integrating
$$\frac{dq}{dt} \;=\; 2\,\nabla S(q).$$
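As a minimal sketch (assuming a constant index n and the spherical phase S(q) = n‖q‖, neither of which is taken from the text), one can check the eikonal equation pointwise and trace a ray numerically; the rays of such a phase are radial straight lines:

```python
import numpy as np

n = 1.5  # constant index of refraction (hypothetical value)

def grad_S(q):
    """Gradient of the spherical phase S(q) = n * |q|, i.e. n * q / |q|."""
    return n * q / np.linalg.norm(q)

# Check the eikonal equation |grad S|**2 = n**2 at a sample point.
q = np.array([1.0, 2.0, 2.0])
eikonal_lhs = np.dot(grad_S(q), grad_S(q))

# Trace a ray by Euler steps of dq/dt = 2 * grad S(q); the step is always
# parallel to q, so the ray stays radial.
dt, ray = 0.01, q.copy()
for _ in range(500):
    ray = ray + dt * 2.0 * grad_S(ray)

# The direction q / |q| is preserved along the ray.
dir0 = q / np.linalg.norm(q)
dir1 = ray / np.linalg.norm(ray)
print(eikonal_lhs, dir0, dir1)
```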
The functions R(r), Φ(ϕ) satisfy the pair of ordinary differential equations
$$\frac{1}{2m}\left(\frac{dR}{dr}\right)^{\!2} + V(r) + \beta\, r^{-2} \;=\; \gamma, \qquad \left(\frac{d\Phi}{d\varphi}\right)^{\!2} + \frac{\alpha^{2}}{\sin^{2}\varphi} + \gamma \;=\; 0,$$
The derivation of the equations for geometric optics from those of wave optics pro-
vides the key to Schrödinger’s establishment of the basic equation of quantum mechanics,
and also the classical limit of quantum mechanical phenomena. The basic framework is
encapsulated in the following diagram.
[Diagram: four boxes, Wave Optics, Geometric Optics, Quantum Mechanics, and Classical Mechanics, with arrows for the high frequency limits (Wave Optics to Geometric Optics, Quantum Mechanics to Classical Mechanics) and for the optical-mechanical correspondences between the two columns.]
As we will see, just as the equations of geometric optics, which govern the motion of
photons, can be derived as the high frequency limit of the equations of wave mechanics, so
the equations of classical mechanics appear as the high frequency (or, equivalently, small
Planck's constant) limit of the equations of quantum mechanics. This limiting procedure
is well defined mathematically; on the other hand, going in the reverse direction is not,
since many equations can have the same high frequency limit. For example, the equations
of geometric optics do not uniquely prescribe the equations of wave mechanics. Nor do the
equations of classical mechanics uniquely prescribe the fundamental Schrödinger equation
of quantum mechanics, and the process of “quantization”, that is, going from classical to
quantum mechanics, is more of an art than a well defined mathematical process.
It is fascinating that Hamilton, in the first part of the nineteenth century, had three
of the four boxes in the preceding diagram well in hand. He knew how to frame the
equations of classical mechanics in analogy with those of geometric optics through the
Hamiltonian formulation. But he failed to fill in the remaining box — quantum mechanics
— not because he did not have the required mathematical tools, but because there was
no reason at that time to introduce a “wave theory of matter”. But if he had ignored the
missing physical underpinnings, he might well have discovered the Schrödinger equation
70 or so years earlier!
The Wave Equation
Let us begin by describing the connections between wave mechanics and geometric optics.
For simplicity, we first consider the scalar wave equation
$$\varphi_{tt} \;-\; \frac{c^{2}}{n^{2}}\,\Delta\varphi \;=\; 0 \qquad (14.1)$$
in an inhomogeneous medium, where t is time, x = (x1 , . . . , xk ) are the spatial coordinates
(and so k = 3 for the physical universe), and n(x) is the index of refraction. In the
homogeneous case where n is constant, the solutions to this problem are superpositions of
plane waves
$$\varphi \;=\; A\, e^{\,i\,(k \cdot x \,-\, \omega\, c\, t)}.$$
In general, we look for time-periodic solutions of the form
$$\varphi(t, x) \;=\; u(x)\, e^{\,i\,\omega\, c\, t}.$$
This allows us to factor out the t dependence in the wave equation, implying that u satisfies
the Helmholtz equation
$$\Delta u \;+\; n^{2}\,\omega^{2}\, u \;=\; 0. \qquad (14.2)$$
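A quick symbolic check (a sketch, not part of the text): in one space dimension with constant n, the plane wave u(x) = e^{inωx} satisfies the Helmholtz equation (14.2), and restoring the factor e^{iωct} produces a solution of the wave equation (14.1):

```python
import sympy as sp

x, t, n, omega, c = sp.symbols('x t n omega c', positive=True)

# One-dimensional plane wave with constant index of refraction n.
u = sp.exp(sp.I * n * omega * x)

# Helmholtz equation (14.2): u'' + n**2 * omega**2 * u = 0.
helmholtz = sp.diff(u, x, 2) + n**2 * omega**2 * u
print(sp.simplify(helmholtz))  # 0

# Reinstating the time dependence, phi = u * exp(I*omega*c*t) solves (14.1).
phi = u * sp.exp(sp.I * omega * c * t)
wave = sp.diff(phi, t, 2) - (c**2 / n**2) * sp.diff(phi, x, 2)
print(sp.simplify(wave))  # 0
```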
We seek complex solutions of the form
$$u(x) \;=\; A(x)\, e^{\,i\,\omega\, S(x)},$$
where the amplitude A and the phase S are real. Substituting into the Helmholtz equation (14.2) and collecting powers of ω, the leading term, of order ω², requires that the phase satisfy
$$\| \nabla S \|^{2} = n^{2}, \qquad (14.6)$$
which is the eikonal equation we already encountered in (13.37). It says that the hyper-
surfaces of constant phase { S = c } are the same as the characteristic hypersurfaces for
the wave equation. If we interpret S as the action function, then the eikonal equation is
the same as the Hamilton–Jacobi equation (13.21) for the geometric optics Hamiltonian.
Thus, the phase surfaces propagate along the characteristics, which are just the solutions
to Hamilton’s equations.
The next term, of order ω, says that the leading amplitude A(x) will satisfy the transport
equation
2 ∇S · ∇A + A ∆S = 0. (14.7)
Note that if A(q) = 0, so there is zero amplitude to leading order, then A = 0 along the
entire characteristic emanating from q. Therefore, in the first approximation, the solutions
to the wave equation are concentrated on the characteristics; this reflects the fact that
waves and signals propagate along characteristics.
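For a concrete instance (a sketch assuming constant n in three dimensions; the specific solution is not worked in the text), the spherical wave with phase S = n r and leading amplitude A = 1/r satisfies the transport equation (14.7):

```python
import sympy as sp

x, y, z, n = sp.symbols('x y z n', positive=True)
r = sp.sqrt(x**2 + y**2 + z**2)

S = n * r   # spherical phase: solves the eikonal equation |grad S|**2 = n**2
A = 1 / r   # leading-order amplitude of a spherical wave

def grad(f):
    return [sp.diff(f, v) for v in (x, y, z)]

def laplacian(f):
    return sum(sp.diff(f, v, 2) for v in (x, y, z))

# Transport equation (14.7): 2 grad S . grad A + A * laplacian S = 0.
transport = (2 * sum(gs * ga for gs, ga in zip(grad(S), grad(A)))
             + A * laplacian(S))
print(sp.simplify(transport))  # 0
```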
High Frequency Limit and Quantization
In general, suppose we have a linear partial differential equation
F [ ψ ] = 0, (14.8)
depending on a large parameter ω. The differential operator F can be written as
$$F \;=\; \widehat{F}(x,\; i\,\partial;\; \omega),$$
where $\widehat{F}(x, p; \omega)$ is a smooth function, which is a polynomial in the derivative coordinate
$p = i\,\partial$. We use $\partial = (\partial_1, \ldots, \partial_n)$ to denote the derivative operators $\partial_j = \partial/\partial x_j$, $j =$
1, . . . , n. There is a problematic ambiguity in this representation, since we have to specify
the order of the derivatives and the function. For instance, if $\widehat{F}(x, p) = x\, p$, then there is
a question as to whether this should represent the differential operator i x ∂ or i ∂ · x =
i x ∂ + i . For convenience, we adopt the “normal ordering” convention that the derivatives
always appear last, so x p corresponds to the differential operator i x ∂. (However, this will
come back to haunt us later.)
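The ordering ambiguity can be made concrete symbolically (a sketch, not from the text): applying both candidate operators to an arbitrary test function confirms that i ∂ · x = i x ∂ + i, so the two orderings differ by i times the identity:

```python
import sympy as sp

x = sp.symbols('x')
f = sp.Function('f')(x)  # arbitrary test function

# Candidate operator 1: i x d/dx  (normal ordering: derivative last).
op1 = sp.I * x * sp.diff(f, x)

# Candidate operator 2: i d/dx (x . ), i.e. multiply by x first, then differentiate.
op2 = sp.I * sp.diff(x * f, x)

# Their difference is i * f, so  i d/dx (x . ) = i x d/dx + i * Id.
print(sp.simplify(op2 - op1))  # I*f(x)
```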
In the high frequency limit, we make the ansatz
$$\psi(x, \omega) \;=\; A(x, \omega)\, e^{\,i\,\omega\, S(x, \omega)},$$
where A and S are real, A having the usual asymptotic expansion (14.5) in decreasing
powers of ω. In order to determine the analogue of the eikonal equation, it is helpful to
rewrite the operator in the form
$$F \;=\; \widehat{F}\left(x,\; \frac{1}{i\,\omega}\,\partial;\; \omega\right).$$
$$E \;=\; \hbar\,\omega \qquad (14.10)$$
that relates frequency and energy, which we take to be fixed. The resulting equation
$$H \;=\; -\,\frac{\hbar^{2}}{2m}\,\Delta \;+\; V(r), \qquad (14.13)$$
where ∆ is the ordinary Laplacian. The Schrödinger equation is
$$i\,\hbar\,\psi_t \;=\; -\,\frac{\hbar^{2}}{2m}\,\Delta\psi \;+\; V(r)\,\psi. \qquad (14.14)$$
In the case V (r) = − e2 /r, where e is the charge on an electron, we are in the situation
of the quantum mechanical hydrogen atom, meaning a single electron circling a single
(heavy) proton. The Hamilton–Jacobi equation governs the leading order asymptotics,
i.e., the solution to the eikonal equation.
A Word of Warning: The Schrödinger equation looks a lot like the heat equation, but
the complex factor i makes it of an entirely different character. It is, in fact, a dispersive
hyperbolic partial differential equation, not a dissipative parabolic equation. One way to
see the difference right away is to look at the norm of the solution† $\| \psi \|^{2} = \psi\,\overline{\psi}$. For the
one-dimensional Schrödinger equation
$$i\,\hbar\,\psi_t \;=\; -\,\frac{\hbar^{2}}{2m}\,\psi_{xx} \;+\; V(x)\,\psi, \qquad (14.15)$$
we have
$$\frac{\partial}{\partial t}\,\|\psi\|^{2} \;=\; \overline{\psi}\,\psi_t + \overline{\psi}_t\,\psi \;=\; \overline{\psi}\left[\,-\,\frac{\hbar}{2\,i\,m}\,\psi_{xx} + \frac{1}{i\,\hbar}\,V(x)\,\psi\,\right] + \left[\,\frac{\hbar}{2\,i\,m}\,\overline{\psi}_{xx} - \frac{1}{i\,\hbar}\,V(x)\,\overline{\psi}\,\right]\psi \;=\; \frac{\hbar}{2\,i\,m}\,\frac{\partial}{\partial x}\bigl(\,\psi\,\overline{\psi}_x - \overline{\psi}\,\psi_x\,\bigr).$$
Therefore, assuming that ψ(x) and its derivative ψ′(x) tend to 0 as | x | → ∞, its L²
norm is constant:
$$\frac{d}{dt}\,\| \psi \|_{L^2}^{2} \;=\; \frac{d}{dt}\int_{-\infty}^{\infty} \| \psi \|^{2}\; dx \;=\; \frac{\hbar}{2\,i\,m}\,\bigl(\,\psi\,\overline{\psi}_x - \overline{\psi}\,\psi_x\,\bigr)\Big|_{-\infty}^{\infty} \;=\; 0.$$
† We use an overbar to denote the complex conjugate throughout.
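This conservation law is easy to verify numerically (a sketch with assumed normalizations ℏ = m = 1, V = 0, and periodic boundary conditions, none of which come from the text): the Crank–Nicolson scheme for a Hermitian Hamiltonian is unitary, so the discrete L² norm is preserved up to roundoff:

```python
import numpy as np

# Sketch: free 1D Schrodinger equation, hbar = m = 1, V = 0 (assumed units),
# on a periodic grid.  i psi_t = -(1/2) psi_xx is advanced by Crank-Nicolson,
# which preserves the discrete L2 norm exactly (up to roundoff).
N, L, dt = 256, 20.0, 0.01
x = np.linspace(-L / 2, L / 2, N, endpoint=False)
dx = x[1] - x[0]

# Second-difference Laplacian with periodic boundary conditions.
lap = (np.diag(np.full(N - 1, 1.0), 1) + np.diag(np.full(N - 1, 1.0), -1)
       - 2.0 * np.eye(N))
lap[0, -1] = lap[-1, 0] = 1.0
H = -0.5 * lap / dx**2          # Hamiltonian matrix (real symmetric)

# Crank-Nicolson: (I + i dt H / 2) psi_new = (I - i dt H / 2) psi_old.
A = np.eye(N) + 0.5j * dt * H
B = np.eye(N) - 0.5j * dt * H
step = np.linalg.solve(A, B)    # one-step propagator (unitary)

# Gaussian wave packet with nonzero momentum.
psi = np.exp(-x**2) * np.exp(5j * x)
psi /= np.sqrt(np.sum(np.abs(psi)**2) * dx)

norm0 = np.sum(np.abs(psi)**2) * dx
for _ in range(200):
    psi = step @ psi
norm1 = np.sum(np.abs(psi)**2) * dx
print(norm0, norm1)  # both equal to 1 up to roundoff
```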
$$i\,\hbar\,\psi_t \;=\; H[\,\psi\,]. \qquad (14.18)$$
Assume that the Hamiltonian operator H is independent of t. Then we can separate
variables by setting
$$\psi(t, x) \;=\; \widehat{\psi}(x)\, e^{\,i\,\omega\, t},$$
leading to the time independent form of the Schrödinger equation
$$H[\,\widehat{\psi}\,] \;=\; \hbar\,\omega\,\widehat{\psi} \;=\; E\,\widehat{\psi}, \qquad (14.19)$$
where, in view of Einstein’s relation (14.10), E denotes the energy of the system. Thus,
(14.19) implies that the energy E is an eigenvalue of the corresponding Hamiltonian oper-
ator H. There is thus an intimate connection between the possible energies of a physical
system, and the spectrum of the corresponding Hamiltonian operator. We assume (for
the time being) that the solutions of the time-independent Schrödinger equation must be
smooth and bounded over all space. In the particular case of the hydrogen atom, or a
more general particle in a central force field V (r), with V → 0 as r → ∞, the spectrum of
H = −∆ + V (r)
consists of two parts:
(i ) The discrete spectrum, E < 0, which consists of a finite number of negative eigenvalues
corresponding to bound states. The associated eigenfunctions ψ are in L2 , and,
in particular, ψ → 0 as r → ∞.
(ii ) The continuous spectrum, E > 0, where the associated eigenfunction ψ no longer goes
to zero as r → ∞, but rather its asymptotic behavior is like that of a plane wave
$e^{\,i\,k \cdot x}$. These correspond to scattering states.
The key difference between classical mechanics and quantum mechanics is that in clas-
sical mechanics, the energy can take on any positive value, but in quantum mechanics, the
bound state energies are quantized, i.e. they can only take on discrete values. The investi-
gation of the spectrum of Hamiltonian operators is the fundamental problem of quantum
mechanics.
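The quantization of bound state energies can be exhibited numerically (a sketch substituting the harmonic oscillator V(x) = x²/2 with ℏ = m = 1 for the central force problem above; these choices are assumptions, not the text's example): a finite-difference discretization of the Hamiltonian has lowest eigenvalues near the quantized values n + 1/2:

```python
import numpy as np

# Sketch: 1D harmonic oscillator, hbar = m = 1 (assumed; the text's example
# is the hydrogen atom).  Discretize H = -(1/2) d^2/dx^2 + x^2/2 on a grid
# and compute the lowest eigenvalues, which approximate E_n = n + 1/2.
N, L = 1000, 20.0
x = np.linspace(-L / 2, L / 2, N)
dx = x[1] - x[0]

# Dirichlet finite-difference Laplacian (psi = 0 at the ends of the grid).
lap = (np.diag(np.full(N - 1, 1.0), 1) + np.diag(np.full(N - 1, 1.0), -1)
       - 2.0 * np.eye(N)) / dx**2
H = -0.5 * lap + np.diag(0.5 * x**2)

E = np.linalg.eigvalsh(H)
print(E[:3])  # approximately [0.5, 1.5, 2.5]
```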
Finally, we list a few basic references on quantum mechanics that the reader may prof-
itably study: [19, 33, 37, 44, 52].
References
[1] Abraham, R., and Marsden, J.E., Foundations of Mechanics, 2nd ed., The
Benjamin–Cummings Publ. Co., Reading, Mass., 1978.
[2] Ahlfors, L., Complex Analysis, McGraw–Hill, New York, 1966.
[3] Antman, S.S., Nonlinear Problems of Elasticity, Appl. Math. Sci., vol. 107,
Springer–Verlag, New York, 1995.
[4] Apostol, T.M., Mathematical Analysis, 2nd ed., Addison–Wesley Publ. Co., Reading,
Mass., 1974.
[5] Arnol’d, V.I., Mathematical Methods of Classical Mechanics, Graduate Texts in
Mathematics, vol. 60, Springer–Verlag, New York, 1978.
[6] Ball, J.M., and James, R.D., Fine phase mixtures as minimizers of energy, Arch. Rat.
Mech. Anal. 100 (1987), 13–52.
[7] Bloch, A.M., Nonholonomic Mechanics and Control, 2nd ed., Interdisciplinary Applied
Mathematics, vol. 24, Springer–Verlag, New York, 2003.
[8] Born, M., and Wolf, E., Principles of Optics, 4th ed., Pergamon Press, New
York, 1970.
[9] Boyce, W.E., and DiPrima, R.C., Elementary Differential Equations and Boundary
Value Problems, 7th ed., John Wiley, New York, 2001.
[10] Carathéodory, C., Calculus of Variations and Partial Differential Equations of the
First Order, Parts I, II, Holden-Day, New York, 1965, 1967.
[11] Calder, J., The Calculus of Variations, Lecture Notes, University of Minnesota, 2020.
https://www.math.umn.edu/∼jwcalder/CalculusOfVariations.pdf
[12] Chan, T.F., and Shen, J., Image Processing and Analysis: Variational, PDE, Wavelet,
and Stochastic Methods, SIAM, Philadelphia, PA, 2005.
[13] Chen, J.Q., Group Representation Theory for Physicists, World Scientific, Singapore,
1989.
[14] Cioranescu, D., and Donato, P., An Introduction to Homogenization, Oxford
University Press, Oxford, 1999.
[15] Costa, C.J., Example of a complete minimal immersion in R 3 of genus one and three
embedded ends, Bol. Soc. Brasil. Mat. 15 (1984), 47–54.
[16] Courant, R., Differential and Integral Calculus, vol. 2, Interscience Publ., New York,
1936.
[17] Courant, R., and Hilbert, D., Methods of Mathematical Physics, Interscience Publ.,
New York, 1953.
[18] Dacorogna, B., Introduction to the Calculus of Variations, Imperial College Press,
London, 2004.