
The Calculus of Variations

Peter J. Olver
School of Mathematics
University of Minnesota
Minneapolis, MN 55455
[email protected]
http://www.math.umn.edu/~olver

Contents
 1. Introduction
 2. Examples of Variational Problems
      Minimal Curves, Optics, and Geodesics
      Minimal Surfaces
      Minimum Energy Principles
      Isoperimetric Problems and Constraints
 3. The Euler–Lagrange Equation
      The First Variation
      Curves of Shortest Length — Planar Geodesics
      The Brachistochrone Problem
      Minimal Surface of Revolution
      The Fundamental Lemma
      A Cautionary Example
 4. Boundary Conditions and Null Lagrangians
      Natural Boundary Conditions
      Null Lagrangians
      General Boundary Conditions
      Problems with Variable Endpoints
 5. Variational Problems Involving Several Unknowns
      Boundary Conditions
      Parametric Variational Problems
 6. Second Order Variational Problems
 7. Variational Problems with Constraints
      Lagrange Multipliers
      Justification of the Method of Lagrange Multipliers
      Optimal Control
      Nonholonomic Mechanics
 8. The Second Variation
      The Legendre Condition
      Envelopes
      Conjugate Points
 9. Weak and Strong Extremals
      Extremals with Corners
10. The Royal Road
      Fields of Extremals
      Equivalent Functionals
      The Excess Function
11. Multi-dimensional Variational Problems
      The First Variation and the Euler–Lagrange Equations
      Minimal Surfaces
      Image Processing
      Gradient Descent
12. Noether’s Theorem
      Lie Groups and Symmetry Groups
      Transformation Groups and Their Infinitesimal Generators
      Action on Functions
      Variational Symmetries and First Integrals
      The General Case
      Conservation Laws and Noether’s Identity
13. Lagrangian and Hamiltonian Mechanics
      Hamiltonian Systems
      Hamilton–Jacobi Theory
      Characteristics
14. Geometric Optics and Wave Mechanics
      The Wave Equation
      High Frequency Limit and Quantization
References



1. Introduction.
Minimization and maximization principles form one of the most wide-ranging means
of formulating mathematical models governing the equilibrium configurations of physical
systems. In these notes, we will develop the basic mathematical analysis of nonlinear
optimization principles on infinite-dimensional function spaces — a subject known as the
“calculus of variations”, for reasons that will be explained as soon as we present the basic
ideas. The mathematical techniques that have been developed to handle such optimization
problems are fundamental in many areas of mathematics and its applications.
The history of the calculus of variations is tightly interwoven with the history of math-
ematics, [27]. The field has drawn the attention of a remarkable range of mathematical
luminaries, beginning with Newton and Leibniz, then initiated as a subject in its own right
by the Bernoulli brothers Jakob and Johann. The initial major developments appeared in
the work of Euler, Lagrange, and Legendre. In the nineteenth century, Hamilton, Jacobi,
Dirichlet, Weierstrass, and Hilbert were the leading contributors. In modern times, the
calculus of variations has continued to occupy center stage, witnessing major theoretical
advances coupled with wide-ranging applications in physics, engineering, and throughout
mathematics.
Just as the vanishing of the gradient of a function of several variables singles out the
critical points, among which are the minima, both local and global, so a similar “func-
tional gradient” will distinguish the candidate functions that might be minimizers of the
functional. The finite-dimensional calculus leads to a system of algebraic equations for the critical points; the infinite-dimensional functional analog results in a boundary value problem for a nonlinear ordinary or partial differential equation whose solutions are the critical functions for the variational problem. The second derivative test for distinguishing minimizers, maximizers, and other types of critical functions is formalized through the so-called second variation, which is considerably more subtle than the finite-dimensional counterpart.
Minimization problems that can be analyzed by the calculus of variations serve to char-
acterize the equilibrium configurations of almost all continuous physical systems, ranging
through elasticity, solid and fluid mechanics, electro-magnetism, gravitation, quantum me-
chanics, string theory, and many, many others. Many geometrical configurations, such as
geodesics and minimal surfaces, can be conveniently formulated as optimization problems.
Moreover, numerical approximations to the equilibrium solutions of such boundary value
problems can be based on a nonlinear finite element approach that reduces the infinite-
dimensional minimization problem to a finite-dimensional problem, cf. [43; Chapter 11].
For further details on the calculus of variations, we refer the reader to a variety of
standard textbooks, including [17, 18, 20, 21, 22, 31].

2. Examples of Variational Problems.


The best way to appreciate the calculus of variations is by introducing a few concrete
examples of both mathematical and practical importance. Some of these minimization
problems played a key role in the historical development of the subject. And they still
serve as an excellent means of learning its basic constructions.



Figure 1. The Shortest Path is a Straight Line.

Minimal Curves, Optics, and Geodesics


The minimal curve problem is to find the shortest path between two specified locations.
In its simplest manifestation, we are given two distinct points

a = (a, α) and b = (b, β) in the plane R2, (2.1)


and our task is to find the curve of shortest length connecting them. “Obviously”, as you
learn in childhood, the shortest route between two points is a straight line; see Figure 1.
Mathematically, then, the minimizing curve should be the graph of the particular affine
function†
y = c x + d = \frac{β − α}{b − a} (x − a) + α    (2.2)
that passes through or interpolates the two points. However, this commonly accepted
“fact” — that (2.2) is the solution to the minimization problem — is, upon closer inspec-
tion, perhaps not so immediately obvious from a rigorous mathematical standpoint.
Let us see how we might formulate the minimal curve problem in a mathematically
precise way. For simplicity, we assume that the minimal curve is given as the graph of
a smooth function y = u(x). Then, the length of the curve is given by the standard arc
length integral
J[u] = \int_a^b \sqrt{1 + u′(x)^2} \, dx,    (2.3)

where we abbreviate u′ = du/dx. The function u(x) is required to satisfy the boundary
conditions
u(a) = α, u(b) = β, (2.4)
in order that its graph pass through the two prescribed points (2.1). The minimal curve
problem asks us to find the function y = u(x) that minimizes the arc length functional
(2.3) among all “reasonable” functions satisfying the prescribed boundary conditions. The
reader might pause to meditate on whether it is analytically obvious that the affine function


† We assume that a ≠ b, i.e., the points a, b do not lie on a common vertical line.



(2.2) is the one that minimizes the arc length integral (2.3) subject to the given boundary
conditions. One of the motivating tasks of the calculus of variations, then, is to rigorously
prove that our everyday intuition is indeed correct.
Indeed, the word “reasonable” is important. For the arc length functional (2.3) to be
defined, the function u(x) should be at least piecewise C1 , i.e., continuous with a piecewise
continuous derivative. Indeed, if we were to allow discontinuous functions, then the straight
line (2.2) would not, in most cases, provide the minimizer. (Why?) Moreover, continuous
functions which are not piecewise C1 need not have a well-defined arc length. The more
seriously one thinks about these issues, the less evident the “obvious” solution becomes.
But before you get too worried that your intuition is leading you astray, rest assured that
the straight line (2.2) is indeed the true minimizer. However, a fully rigorous proof of
this fact requires a careful development of the mathematical machinery of the calculus of
variations.
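
As a quick numerical sanity check (not a proof), one can discretize the arc length functional and compare the straight line against any competitor curve through the same endpoints. The following sketch assumes NumPy is available; the endpoints and the bowed competitor are arbitrary illustrative choices.

```python
# Compare J[u] = \int_a^b sqrt(1 + u'(x)^2) dx for the straight line (2.2)
# against a competitor with the same endpoints.  The grid size and the
# particular competitor curve are arbitrary choices for illustration.
import numpy as np

a, b, alpha, beta = 0.0, 1.0, 0.0, 1.0
x = np.linspace(a, b, 10001)

def arc_length(u):
    du = np.gradient(u, x)              # finite-difference approximation to u'
    return np.trapz(np.sqrt(1 + du**2), x)

line = alpha + (beta - alpha) / (b - a) * (x - a)   # the affine function (2.2)
bump = line + 0.3 * (x - a) * (b - x)               # same endpoints, bowed curve

print(arc_length(line))   # ~1.41421..., i.e., sqrt(2)
print(arc_length(bump))   # strictly larger
```

Of course, such experiments only compare finitely many candidates; the point of the calculus of variations is to rule out all competitors at once.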
A closely related problem arises in geometrical optics. The underlying physical principle,
first formulated by the seventeenth century French mathematician Pierre de Fermat, is
that, when a light ray moves through an optical medium, it travels along a path that
minimizes the travel time. As always, Nature seeks the most economical† solution. In
an inhomogeneous planar optical medium, the speed of light, v(x, y), varies from point to
point, depending on its optical properties. Speed equals the time derivative of distance
traveled, namely, the arc length of the curve y = u(x) traced by the light ray. Thus,
v(x, u(x)) = \frac{ds}{dt} = \sqrt{1 + u′(x)^2} \, \frac{dx}{dt}.
Integrating from start to finish, we conclude that the total travel time along the curve is
T[u] = \int_0^T dt = \int_a^b \frac{dt}{dx} \, dx = \int_a^b \frac{\sqrt{1 + u′(x)^2}}{v(x, u(x))} \, dx.    (2.5)
Fermat’s Principle states that, to get from point a = (a, α) to point b = (b, β), the light
ray follows the curve y = u(x) that minimizes this functional subject to the same fixed
boundary conditions (2.4). If the medium is homogeneous, e.g., a vacuum, then v(x, y) ≡ c
is constant, and T [ u ] is a multiple of the arc length functional (2.3), whose minimizers are
the “obvious” straight lines traced by the light rays. In an inhomogeneous medium, the
path taken by the light ray is no longer evident, and we are in need of a systematic method
for solving the minimization problem. Indeed, all of the known laws of geometric optics,
including focusing, refraction, aberrations, lens design, and so on, [8], will be consequences
of the geometric properties of solutions to Fermat’s minimization principle.
Another minimization problem of a similar ilk is to construct the geodesics on a curved
surface, meaning the curves of minimal length lying therein. Given two points a, b on a
surface S ⊂ R 3 , we seek the curve C ⊂ S that joins them and has the minimal possible
length. For example, if S is a circular cylinder, then there are three possible types of
geodesic curves: straight line segments parallel to the center line; arcs of circles orthogonal


† Assuming time = money!




Figure 2. Geodesic on a Cylinder.

to the center line; and spiral helices, the latter illustrated in Figure 2. Similarly, the
geodesics on a sphere are arcs of great circles. In aeronautics, to minimize distance flown,
airplanes follow such geodesic paths around the globe, even though they look longer when
illustrated by projection onto a flat planar map. This example is particularly important.
Starting at a given point on the sphere, as one passes through the antipodal point, the
minimizing geodesic switches from one arc of the great circle to the arc on the opposite
side. At the antipodal point itself, there are infinitely many minimizing geodesics, namely,
all the great semicircles with these as their endpoints. The antipodal point is an example
of a “conjugate point”, which will be of importance when we study the second derivative
test in Section 8. Again, while these facts may sound eminently reasonable, an air-tight
justification of the correctness of our geometric intuition is required. Furthermore, the
geometrical characterization of geodesics on other surfaces is far less intuitively evident.
In order to mathematically formulate the geodesic minimization problem, we suppose,
for simplicity, that our surface S ⊂ R 3 is realized as the graph† of a function z = F (x, y).
We seek the geodesic curve C ⊂ S that joins the given points

a = (a, α, F (a, α)), and b = (b, β, F (b, β)), lying on the surface S.

Let us assume that C can be parametrized by the x coordinate, in the form

y = u(x), z = v(x) = F (x, u(x)),

where the last equation ensures that it lies in the surface S. In particular, this requires
a ≠ b. The length of the curve is supplied by the standard three-dimensional arc length


† Cylinders are not graphs, but can be placed within this framework by passing to cylindrical
coordinates. Similarly, spherical surfaces are best treated in spherical coordinates. One can extend
these constructions to general parametrized surfaces; see below.



Figure 3. A Soap Film.

integral. Thus, to find the geodesics, we must minimize the functional


J[u] = \int_a^b \sqrt{1 + \Big(\frac{dy}{dx}\Big)^2 + \Big(\frac{dz}{dx}\Big)^2} \, dx
     = \int_a^b \sqrt{1 + \Big(\frac{du}{dx}\Big)^2 + \Big(\frac{∂F}{∂x}(x, u(x)) + \frac{∂F}{∂u}(x, u(x)) \frac{du}{dx}\Big)^2} \, dx,    (2.6)
subject to the boundary conditions u(a) = α, u(b) = β. For example, geodesics on the
paraboloid
z = ½ x^2 + ½ y^2    (2.7)
can be found by minimizing the functional
J[u] = \int_a^b \sqrt{1 + u′^2 + (x + u u′)^2} \, dx.    (2.8)
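
The Introduction mentioned that such infinite-dimensional problems can be numerically approximated by finite-dimensional minimization. As a rough sketch of that idea (assuming SciPy is available; the endpoints and grid size are arbitrary choices), one can sample u at interior grid points and minimize the discretized functional (2.8) over those values:

```python
# Discretize the geodesic functional (2.8) on the paraboloid (2.7): represent
# u by its values at n interior grid points, enforce the boundary conditions,
# and minimize with a general-purpose optimizer.
import numpy as np
from scipy.optimize import minimize

a, b, alpha, beta, n = 0.0, 1.0, 0.0, 1.0, 50
x = np.linspace(a, b, n + 2)

def J(interior):
    u = np.concatenate(([alpha], interior, [beta]))   # u(a) = alpha, u(b) = beta
    du = np.diff(u) / np.diff(x)                      # u' on each subinterval
    xm = (x[:-1] + x[1:]) / 2                         # midpoints of the cells
    um = (u[:-1] + u[1:]) / 2
    integrand = np.sqrt(1 + du**2 + (xm + um*du)**2)
    return np.sum(integrand * np.diff(x))             # midpoint-style quadrature

u0 = np.linspace(alpha, beta, n + 2)[1:-1]            # straight-line initial guess
result = minimize(J, u0)
print(result.fun)    # approximate length of the geodesic on the paraboloid
```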

Minimal Surfaces
The minimal surface problem is a natural generalization of the minimal curve or geodesic
problem. In its simplest manifestation, we are given a simple closed space curve C ⊂ R 3 .
The problem is to find the surface of least total area among all those whose boundary is
the curve C. Thus, we seek to minimize the surface area integral
area S = \iint_S dS    (2.9)

over all possible surfaces S ⊂ R³ that have the prescribed boundary curve ∂S = C. Such
an area–minimizing surface is known as a minimal surface for short. For example, if C is
a closed plane curve, e.g., a circle, then the minimal surface will just be the planar region
it encloses. But, if the curve C twists into the third dimension, then the shape of the
minimizing surface is by no means evident.




Figure 4. Minimal Surface.

Physically, if we bend a wire in the shape of the curve C and then dip it into soapy
water, the surface tension forces in the resulting soap film will cause it to minimize surface
area, and hence take the form of a minimal surface† ; see Figure 3. Soap films and bubbles
have been the source of much fascination, physical, æsthetical and mathematical, over the
centuries, [27]. The minimal surface problem is also known as Plateau’s Problem, named
after the nineteenth century Belgian physicist Joseph Plateau who conducted systematic
experiments on such soap films. A completely satisfactory mathematical solution to even
the simplest version of the minimal surface problem was only achieved in the mid twentieth
century, [38, 39]. Minimal surfaces and related variational problems remain an active area
of contemporary research, and are of importance in engineering design, architecture, and
biology, including foams, domes, cell membranes, and so on.
Let us mathematically formulate the search for a minimal surface as a problem in the
calculus of variations. For simplicity, we shall assume that the bounding curve C projects
down to a simple closed curve Γ that bounds an open domain Ω ⊂ R 2 in the (x, y) plane, as
in Figure 4. The space curve C ⊂ R 3 is then given by z = g(x, y) for (x, y) ∈ Γ = ∂Ω. For
“reasonable” boundary curves C, we expect that the minimal surface S will be described as
the graph of a function z = u(x, y) parametrized by (x, y) ∈ Ω. According to multivariable
calculus, the surface area of such a graph is given by the double integral
J[u] = \iint_Ω \sqrt{1 + \Big(\frac{∂u}{∂x}\Big)^2 + \Big(\frac{∂u}{∂y}\Big)^2} \, dx \, dy.    (2.10)
To find the minimal surface, then, we seek the function z = u(x, y) that minimizes the
surface area integral (2.10) when subject to the boundary conditions
u(x, y) = g(x, y) for (x, y) ∈ ∂Ω, (2.11)


† More accurately, the soap film will realize a local but not necessarily global minimum for
the surface area functional. Non-uniqueness of local minimizers can be realized in the physical
experiment — the same wire may support more than one stable soap film.



that prescribe the boundary curve C. As we will see in (11.21), the (smooth) solutions to
this minimization problem satisfy a complicated nonlinear second order partial differential
equation.
A simple version of the minimal surface problem, that still contains some interesting
features, is to find minimal surfaces with rotational symmetry. A surface of revolution is
obtained by revolving a plane curve about an axis, which, for definiteness, we take to be
the x axis. Thus, given two points a = (a, α), b = (b, β) ∈ R 2 , in the upper half plane,
so α, β ≥ 0, the goal is to find the curve y = u(x) ≥ 0 joining them such that the surface
of revolution obtained by revolving the curve around the x-axis has the least surface area.
Each cross-section of the resulting surface is a circle centered on the x axis. The area of
such a surface of revolution is given by
Z b p
J[ u ] = 2π u 1 + u′ 2 dx. (2.12)
a

We seek a minimizer of this integral among all non-negative functions u(x) that satisfy the
fixed boundary conditions u(a) = α, u(b) = β. The minimal surface of revolution can be
physically realized by stretching a soap film between two circular wires, of respective radius
α and β, that are held a distance b − a apart. Symmetry considerations will require the
minimizing surface to be rotationally symmetric. Interestingly, the revolutionary surface
area functional (2.12) is exactly the same as the optical functional (2.5) when the light
speed at a point is inversely proportional to its distance from the horizontal axis: v(x, y) =
1/(2 π y).

Minimum Energy Principles

Many important problems in equilibrium continuum mechanics are described by a minimum energy principle, meaning that the configuration(s) achieved by a system in equilibrium minimize (locally) the potential energy functional. Here is a simple example.
Suppose an elastic cable is stretched between two supports and subjected to a transverse
load. The Principle of Minimum Potential Energy states that the equilibrium configuration
is found by minimizing the energy. Assuming the cable lies in a plane and coincides with
the graph of a function y = u(x), then its local stretch per unit length is given by
\sqrt{1 + u′(x)^2} − 1,

and thus the potential energy is given by


J[u] = \int_a^b \Big[ T \big( \sqrt{1 + u′^2} − 1 \big) + f(x) \, u \Big] dx,    (2.13)

where T is the tension and f (x) the load at a point x. (For simplicity, we only consider ver-
tical displacements.) For small deflections, one can replace (2.13) by the simpler quadratic functional

Q[u] = \int_a^b \Big[ ½ T u′^2 + f(x) \, u \Big] dx.    (2.14)



The equilibrium configuration of the cable will be the graph of the function y = u(x) that
minimizes the functional when subject to the boundary conditions u(a) = α, u(b) = β.
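
Anticipating the Euler–Lagrange equation derived in Section 3, it is worth recording what the quadratic functional (2.14) leads to: with L(x, u, p) = ½ T p^2 + f(x) u, one finds

\frac{∂L}{∂u} − \frac{d}{dx} \frac{∂L}{∂p} = f(x) − T u′′ = 0,    so that    T u′′ = f(x),  u(a) = α,  u(b) = β,

a linear two point boundary value problem that can be solved by two direct integrations of the load.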
Isoperimetric Problems and Constraints
The most basic isoperimetric problem is to construct the simple† closed plane curve with
a prescribed length ℓ > 0 that encloses a domain having the largest possible area. In other
words, we seek to maximize
area Ω = \iint_Ω dx \, dy    subject to the constraint    length ∂Ω = \oint_{∂Ω} ds = ℓ,

over all possible domains Ω ⊂ R². Of course, the “obvious” solution to this problem is that
the curve must be a circle whose perimeter is ℓ, whence the name “isoperimetric”. Note
that the problem, as stated, does not have a unique solution, since if Ω is a maximizing
domain, any translated or rotated version of Ω will also maximize area subject to the
length constraint.
To make progress on the isoperimetric problem, let us assume that the boundary curve
is parametrized by its arc length, so x(s) = ( x(s), y(s) ) with 0 ≤ s ≤ ℓ, subject to the
requirement that
\Big(\frac{dx}{ds}\Big)^2 + \Big(\frac{dy}{ds}\Big)^2 = 1.    (2.15)
We can compute the area of the domain by a line integral‡ around its boundary,
area Ω = \oint_{∂Ω} y \, dx = \int_0^ℓ y \, \frac{dx}{ds} \, ds,    (2.16)

and thus we seek to maximize the latter integral subject to the arc length constraint (2.15).
We also impose periodic boundary conditions

x(0) = x(ℓ), y(0) = y(ℓ), (2.17)


that guarantee that the curve x(s) closes up.
The three-dimensional version of this isoperimetric problem is to find the simple closed
surface that has a prescribed surface area and encloses the largest possible volume. The
evident answer is that the surface must be a sphere. This is the reason that, in the absence
of external forces, soap bubbles assume a spherical form. (A more challenging version is
to determine the shapes of multiple bubbles.) We leave it to the reader to formulate this
isoperimetric problem mathematically, and to see how it relates to the minimal surface
problem.
Another basic isoperimetric problem, with a perhaps less evident solution, is the follow-
ing. Among all curves of length ℓ in the upper half plane that connect two points (−a, 0)


† A curve is simple if it does not cross itself.

‡ The sign of the line integral depends upon the orientation of the curve.



and (a, 0), find the one that, along with the interval [ − a, a ], encloses the region having
the largest area. Of course, we must take ℓ ≥ 2 a, as otherwise the curve will be too short
to connect the points. In this case, we assume the curve is represented by the graph of a
non-negative function y = u(x) ≥ 0, and we seek to maximize the functional
\int_{−a}^{a} u \, dx    subject to the constraint    \int_{−a}^{a} \sqrt{1 + u′^2} \, dx = ℓ.    (2.18)

In the previous formulation (2.15), the arc length constraint was imposed at every point,
whereas here it is manifested as an integral constraint.
Both types of constraints, pointwise and integral, appear in a wide range of applied and
geometrical problems. Such constrained variational problems can profitably be viewed as
function space versions of constrained optimization problems. Thus, not surprisingly, their
analytical solution will require the introduction of suitable Lagrange multipliers, cf. [4, 35];
see Section 7.

3. The Euler–Lagrange Equation.


Even the preceding limited collection of examples of variational problems should already
convince the reader of the tremendous potential afforded by the calculus of variations. Let
us now discuss the most basic analytical techniques for solving such optimization problems.
We will concentrate on classical techniques, leaving more modern direct methods — the
function space counterparts of gradient descent and its relatives — to a more in–depth
treatment of the subject, [18, 50].
Let us concentrate on the simplest class of variational problems, in which the unknown
is a continuously differentiable scalar function, and the functional to be minimized depends
upon at most its first derivative. The basic minimization problem, then, is to determine
a suitable function y = u(x) ∈ C1 [ a, b ], which denotes the space of continuously differen-
tiable† functions on the interval a ≤ x ≤ b, that minimizes the objective functional
J[u] = \int_a^b L(x, u, u′) \, dx.    (3.1)
The integrand is known as the Lagrangian for the variational problem, in honor of Joseph–
Louis Lagrange, one of the founders of the subject. We usually assume that the Lagrangian
L(x, u, p) is a reasonably smooth function of all three of its (scalar) arguments x, u, and
p, which represents the derivative u′. For example, the arc length functional (2.3) has Lagrangian function L(x, u, p) = \sqrt{1 + p^2}, whereas in the surface of revolution problem (2.12), L(x, u, p) = 2 π u \sqrt{1 + p^2}.
In order to uniquely specify a minimizing function, we must impose suitable boundary
conditions at the endpoints of the interval. To begin with, we concentrate on fixing the
values of the function
u(a) = α, u(b) = β, (3.2)


† At the endpoints a, b one uses the appropriate one-sided derivative.



at the two endpoints. Other possibilities are considered in the following Section 4.

The First Variation

The (local) minimizers of a (sufficiently nice) objective function defined on a finite-dimensional vector space are initially characterized as critical points, where the objective function’s gradient vanishes, [35, 47]. An analogous construction applies in the infinite-
function’s gradient vanishes, [35, 47]. An analogous construction applies in the infinite-
dimensional context treated by the calculus of variations. Every sufficiently nice minimizer
of a sufficiently nice functional J[ u ] is a “critical function”, meaning that its functional gra-
dient vanishes: ∇J[ u ] = 0. Of course, not every critical point turns out to be a minimum
— maxima, saddles, and many degenerate points are also critical. The characterization of
nondegenerate critical points as local minima or maxima relies on the second derivative
test, whose functional version, known as the second variation, will be the topic of Section 8.
But we are getting ahead of ourselves. The first order of business is to learn how
to compute the gradient of a functional defined on an infinite-dimensional function space.
The general definition of the gradient requires that we first impose an inner product ⟨ u , v ⟩
on the underlying function space. The gradient ∇J[ u ] of the functional (3.1) will then
be defined by the same basic directional derivative formula as in the finite-dimensional
situation:
⟨ ∇J[u] , ϕ ⟩ = \frac{d}{dε} J[u + ε ϕ] \Big|_{ε=0}.    (3.3)

Here ϕ(x) is a function that prescribes the “direction” in which the derivative is computed.
Classically, ϕ is known as a variation in the function u, sometimes written ϕ = δu, whence
the term “calculus of variations”. Similarly, the gradient operator on functionals is often
referred to as the variational derivative, and often written δJ. The inner product used in
(3.3) is usually taken (again for simplicity) to be the standard L2 inner product
⟨ f , g ⟩ = \int_a^b f(x) \, g(x) \, dx    (3.4)

on function space. Indeed, while the formula for the gradient will depend upon the under-
lying inner product, the characterization of critical points does not, and so the underlying
choice of inner product is not of significance here.
Now, starting with (3.1), for each fixed u and ϕ, we must compute the derivative of the
function

h(ε) = J[u + ε ϕ] = \int_a^b L(x, u + ε ϕ, u′ + ε ϕ′) \, dx.    (3.5)

Smoothness of the integrand allows us to bring the derivative inside the integral and so,
by the chain rule,
h′(ε) = \frac{d}{dε} J[u + ε ϕ] = \int_a^b \frac{d}{dε} L(x, u + ε ϕ, u′ + ε ϕ′) \, dx
      = \int_a^b \Big[ ϕ \, \frac{∂L}{∂u}(x, u + ε ϕ, u′ + ε ϕ′) + ϕ′ \, \frac{∂L}{∂p}(x, u + ε ϕ, u′ + ε ϕ′) \Big] dx.



Therefore, setting ε = 0 in order to evaluate (3.3), we find
⟨ ∇J[u] , ϕ ⟩ = \int_a^b \Big[ ϕ \, \frac{∂L}{∂u}(x, u, u′) + ϕ′ \, \frac{∂L}{∂p}(x, u, u′) \Big] dx.    (3.6)
The resulting integral is often referred to as the first variation of the functional J[u]. The condition

⟨ ∇J[u] , ϕ ⟩ = 0
for a minimizer is known as the weak form of the variational derivative.
To obtain an explicit formula for ∇J[ u ], the right hand side of (3.6) needs to be written
as an inner product,
⟨ ∇J[u] , ϕ ⟩ = \int_a^b ∇J[u] \, ϕ \, dx = \int_a^b h \, ϕ \, dx

between some function h(x) = ∇J[ u ] and the variation ϕ. The first summand has this
form, but the derivative ϕ′ appearing in the second summand is problematic. However,
one can move derivatives around inside an integral through integration by parts. If we set
r(x) ≡ \frac{∂L}{∂p}\big(x, u(x), u′(x)\big),    (3.7)
we can rewrite the offending term as
\int_a^b r(x) \, ϕ′(x) \, dx = r(b) \, ϕ(b) − r(a) \, ϕ(a) − \int_a^b r′(x) \, ϕ(x) \, dx,    (3.8)
where, again by the chain rule,

r′(x) = \frac{d}{dx} \frac{∂L}{∂p}(x, u, u′) = \frac{∂^2 L}{∂x \, ∂p}(x, u, u′) + u′ \, \frac{∂^2 L}{∂u \, ∂p}(x, u, u′) + u′′ \, \frac{∂^2 L}{∂p^2}(x, u, u′).    (3.9)
So far we have not imposed any conditions on our variation ϕ(x). We are only comparing
the values of J[ u ] among functions that satisfy the imposed boundary conditions (3.2).
Therefore, we must make sure that the varied function
uε (x) = u(x) + ε ϕ(x)
remains within this set of functions, and so
uε (a) = u(a) + ε ϕ(a) = α, uε (b) = u(b) + ε ϕ(b) = β,
for all ε. For this to hold, the variation ϕ(x) must satisfy the corresponding homogeneous
boundary conditions
ϕ(a) = 0, ϕ(b) = 0. (3.10)
As a result, both boundary terms in our integration by parts formula (3.8) vanish, and we
can write (3.6) as
⟨ ∇J[u] , ϕ ⟩ = \int_a^b ∇J[u] \, ϕ \, dx = \int_a^b ϕ \Big[ \frac{∂L}{∂u}(x, u, u′) − \frac{d}{dx} \frac{∂L}{∂p}(x, u, u′) \Big] dx.    (3.11)



Since this holds for all variations ϕ(x), we conclude that†

∇J[u] = \frac{∂L}{∂u}(x, u, u′) − \frac{d}{dx} \frac{∂L}{∂p}(x, u, u′).    (3.12)
This is our explicit formula for the functional gradient or variational derivative of the func-
tional (3.1) with Lagrangian L(x, u, p). Observe that the gradient ∇J[ u ] of a functional
is a function.
The critical functions u(x) are, by definition, those for which the functional gradient
vanishes:
∇J[u] = \frac{∂L}{∂u}(x, u, u′) − \frac{d}{dx} \frac{∂L}{∂p}(x, u, u′) = 0.    (3.13)
In view of (3.9), the critical equation (3.13) is, in fact, a second order ordinary differential
equation,

\frac{∂L}{∂u}(x, u, u′) − \frac{∂^2 L}{∂x \, ∂p}(x, u, u′) − u′ \, \frac{∂^2 L}{∂u \, ∂p}(x, u, u′) − u′′ \, \frac{∂^2 L}{∂p^2}(x, u, u′) = 0,    (3.14)
known as the Euler–Lagrange equation associated with the variational problem (3.1), in
honor of two of the most important contributors to the subject: Leonhard Euler and
Joseph–Louis Lagrange. Any solution to the Euler–Lagrange equation that is subject to
the assumed boundary conditions forms a critical point for the functional, and hence is a
potential candidate for the desired minimizing function. And, in many cases, the Euler–
Lagrange equation suffices to characterize the minimizer without further ado.
The right hand side of (3.12) or, equivalently, the left hand side of the Euler–Lagrange
equation (3.14), is often referred to as the Euler–Lagrange expression associated with the
Lagrangian L, and written

E(L) = \frac{∂L}{∂u}(x, u, u′) − \frac{d}{dx} \frac{∂L}{∂p}(x, u, u′),    (3.15)
and the associated Euler–Lagrange equation is simply E(L) = 0.

Remark : The Euler–Lagrange expression (3.15) is also referred to as the variational derivative of the functional (3.1) with respect to its argument u, and denoted δJ/δu = E(L), the notation inspired by the classical notation for variations.

Theorem 3.1. Suppose the Lagrangian function is twice continuously differentiable: L(x, u, p) ∈ C2. Then any C2 minimizer u(x) to the corresponding functional J[u] = \int_a^b L(x, u, u′) \, dx, subject to any imposed boundary conditions, must satisfy the associated Euler–Lagrange equation E(L) = 0.


† See Lemma 3.4 and the ensuing discussion for a complete justification of this step.



Let us now investigate what the Euler–Lagrange equation tells us about the examples
of variational problems presented at the beginning of this section. One word of caution:
there do exist seemingly reasonable functionals whose minimizers are not, in fact, C2 ,
and hence do not solve the Euler–Lagrange equation in the classical sense; see below for
examples. Fortunately, in most (but not all!) variational problems that are of importance
in real-world applications, such pathologies do not appear.
Remark : Confusingly, in the standard variational terminology, all solutions of the Euler–
Lagrange equation are known as “extremals” even when they are not maximizers or mini-
mizers. We will try to avoid this potentially misleading convention, and refer to solutions
of the Euler–Lagrange equation as critical functions.

Curves of Shortest Length — Planar Geodesics


Let us return to the most elementary problem in the calculus of variations: finding the
curve of shortest length connecting two points a = (a, α), b = (b, β) ∈ R 2 . As we noted
in Section 2, such planar geodesics minimize the arc length integral
J[u] = \int_a^b \sqrt{1 + u′^2} \, dx    with Lagrangian    L(x, u, p) = \sqrt{1 + p^2},    (3.16)

subject to the boundary conditions

u(a) = α, u(b) = β.
Since
\frac{∂L}{∂u} = 0,    \frac{∂L}{∂p} = \frac{p}{\sqrt{1 + p^2}},    (3.17)
the Euler–Lagrange equation (3.13) in this case takes the form
0 = − \frac{d}{dx} \frac{u′}{\sqrt{1 + u′^2}} = − \frac{u′′}{(1 + u′^2)^{3/2}}.
Since the denominator does not vanish, this is the same as the simplest second order
ordinary differential equation
u′′ = 0. (3.18)
We deduce that the solutions to the Euler–Lagrange equation are all affine functions,
u = c x + d, whose graphs are straight lines. Since our solution must also satisfy the
boundary conditions, the only critical function — and hence the sole candidate for a
minimizer — is the straight line
y = \frac{β − α}{b − a} (x − a) + α    (3.19)
passing through the two prescribed points. Thus, the Euler–Lagrange equation helps to
reconfirm our intuition that straight lines minimize distance.
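
The computation can also be checked symbolically. The following sketch assumes SymPy is installed; its sympy.calculus.euler.euler_equations routine implements precisely the expression (3.15):

```python
# Symbolic verification that the arc length Lagrangian yields u'' = 0,
# reproducing (3.18).  Assumes SymPy is available.
import sympy as sp
from sympy.calculus.euler import euler_equations

x = sp.symbols('x')
u = sp.Function('u')

L = sp.sqrt(1 + u(x).diff(x)**2)      # L(x, u, p) = sqrt(1 + p^2), cf. (3.16)
print(euler_equations(L, u(x), x))
# -> a single equation of the form -u''(x)/(u'(x)**2 + 1)**(3/2) = 0,
#    whose numerator gives u'' = 0, exactly as above.
```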
Be that as it may, the fact that a function satisfies the Euler–Lagrange equation and
the boundary conditions merely confirms its status as a critical function, and does not




Figure 5. The Brachistochrone Problem.

guarantee that it is the minimizer. Indeed, any critical function is also a candidate for
maximizing the variational problem, too. The nature of a critical function will be elu-
cidated by the variational form of the second derivative test, and requires some further
work. Of course, for the minimum distance problem, we “know” that a straight line cannot
maximize distance, and must be the minimizer. Nevertheless, the reader should have a
small nagging doubt that we may not have completely solved the problem at hand . . .

The Brachistochrone Problem

The most famous classical variational principle is the so-called brachistochrone problem.
The compound Greek word “brachistochrone” means “minimal time”. An experimenter
lets a bead slide down a wire that connects two fixed points under the influence of gravity.
The goal is to shape the wire in such a way that, starting from rest, the bead slides
from one end to the other in minimal time. Naïve guesses for the wire’s optimal shape,
including a straight line, a parabola, a circular arc, or even a catenary, are not optimal,
and one can do better through a careful analysis of the associated variational problem.
The brachistochrone problem was originally posed by the Swiss mathematician Johann
Bernoulli in 1696, and solutions were then found by his brother Jakob Bernoulli, as well as
Newton, Leibniz, von Tschirnhaus and de l’Hôpital. It served as an inspiration for much
of the subsequent development of the subject.
We take, without loss of generality, the starting point of the bead to be at the origin:
a = (0, 0). The wire will bend downwards, and so, to avoid distracting minus signs in the
subsequent formulae, we take the vertical y axis to point downwards. The shape of the
wire will be given by the graph of a function y = u(x) ≥ 0. The end point b = (b, β) is
assumed to lie below and to the right, and so b > 0 and β > 0. The physical set-up is
sketched in Figure 5. Alternatively, the brachistochrone can be viewed as the problem of
designing the optimally shaped playground slide that enables a child to get to the bottom
point the fastest.
To mathematically formulate the problem, the first step is to find the formula for the
transit time of the bead sliding along the wire. Arguing as in our derivation of the optics
functional (2.5), if v(x) denotes the instantaneous speed of descent of the bead when it



reaches position (x, u(x)), then the total travel time is
T[u] = \int_0^ℓ \frac{ds}{v} = \int_0^b \frac{\sqrt{1 + u′^2}}{v} \, dx,    (3.20)

where ds = \sqrt{1 + u′^2} \, dx is the usual arc length element, and ℓ is the overall length of the
wire.
We shall use conservation of energy to determine a formula for the speed v as a function
of the position along the wire. The kinetic energy of the bead is 21 m v 2 , where m is its
mass. On the other hand, due to our sign convention, the potential energy of the bead
when it is at height y = u(x) is − m g u(x), where g the gravitational constant, and we
take the initial height as the zero potential energy level. The bead is initially at rest, with
0 kinetic energy and 0 potential energy. Assuming that frictional forces are negligible,
conservation of energy implies that the total energy must remain equal to 0, and hence
0 = ½ m v² − m g u.
We can solve this equation to determine the bead’s speed as a function of its height:

v = \sqrt{2 g u}.    (3.21)
Substituting this expression into (3.20), we conclude that the shape y = u(x) of the wire
is obtained by minimizing the functional
T[u] = \int_0^b \sqrt{\frac{1 + u′^2}{2 g u}} \, dx,    (3.22)
subject to the boundary conditions
u(0) = 0, u(b) = β. (3.23)
The associated Lagrangian is
L(x, u, p) = \sqrt{\frac{1 + p^2}{u}},    (3.24)
where we omit an irrelevant factor of \sqrt{2g} (or adopt physical units in which g = ½). We compute

\frac{∂L}{∂u} = − \frac{\sqrt{1 + p^2}}{2 u^{3/2}},    \frac{∂L}{∂p} = \frac{p}{\sqrt{u (1 + p^2)}}.
Therefore, the Euler–Lagrange equation for the brachistochrone functional is

− \frac{\sqrt{1 + u′^2}}{2 u^{3/2}} − \frac{d}{dx} \frac{u′}{\sqrt{u (1 + u′^2)}} = − \frac{2 u u′′ + u′^2 + 1}{2 \sqrt{u^3 (1 + u′^2)^3}} = 0.    (3.25)
Thus, the (smooth) minimizing functions must solve the nonlinear second order ordinary
differential equation
2 u u′′ + u′ 2 + 1 = 0, (3.26)
of an unfamiliar form. Rather than try to solve this differential equation directly, we note
that the Lagrangian (3.24) does not explicitly depend upon x, and therefore we can appeal
to the following important result.



Theorem 3.2. Suppose the Lagrangian L(x, u, p) = L(u, p) does not depend on x. Then the Hamiltonian function

H(u, p) = p \frac{∂L}{∂p}(u, p) − L(u, p)    (3.27)

is a first integral for the Euler–Lagrange equation, meaning that it is constant on each solution:

H\big(u(x), u′(x)\big) = c    (3.28)

for some c ∈ R, whose value can depend upon the solution u(x).

Proof : Differentiating (3.27), we find

\frac{d}{dx} H(u, u′) = \frac{d}{dx} \Big[ u′ \frac{∂L}{∂p}(u, u′) − L(u, u′) \Big] = u′ \Big[ \frac{d}{dx} \frac{∂L}{∂p}(u, u′) − \frac{∂L}{∂u}(u, u′) \Big],

which vanishes as a consequence of the Euler–Lagrange equation (3.13). This implies that the Hamiltonian function is constant, thereby establishing (3.28). Q.E.D.

Remark : The Hamiltonian function is named after the Irish mathematician William
Rowan Hamilton, and plays a critical role in subsequent developments. Theorem 3.2 is a
special case of Emmy Noether’s powerful Theorem, [42; Chapter 4], that relates symme-
tries of variational problems — in this case translations in the x coordinate — with first
integrals, a.k.a. conservation laws.

Equation (3.28) has the form of an implicitly defined first order ordinary differential
equation which can, in fact, be integrated. Indeed, solving for

u′ = h(u, c) (3.29)

produces an autonomous first order differential equation, whose general solution can be
obtained by integration:

\int \frac{du}{h(u, c)} = x + δ,    (3.30)

where δ is a second integration constant.


In our case (3.24), the Hamiltonian function is

H(u, p) = p \frac{∂L}{∂p} − L = − \frac{1}{\sqrt{u (1 + p^2)}}.

Thus, in view of (3.28),

H(u, u′) = − \frac{1}{\sqrt{u (1 + u′^2)}} = c,    which we rewrite as    u (1 + u′^2) = k,



Figure 6. A Cycloid.

where k = 1/c² is a constant. Solving for the derivative u′ results in the first order autonomous ordinary differential equation†

\frac{du}{dx} = \sqrt{\frac{k − u}{u}}.
This equation can be explicitly solved by separation of variables, and so

\int \sqrt{\frac{u}{k − u}} \, du = x + δ
for some constant δ. The left hand integration relies on the trigonometric substitution u = ½ k (1 − \cos θ), whereby

x + δ = \int \frac{k}{2} \sqrt{\frac{1 − \cos θ}{1 + \cos θ}} \, \sin θ \, dθ = \int \frac{k}{2} (1 − \cos θ) \, dθ = ½ k (θ − \sin θ).
The left hand boundary condition u(0) = 0 implies δ = 0, and so the solutions to the Euler–Lagrange equation are the curves parametrized by
x = r (θ − sin θ), u = r (1 − cos θ). (3.31)
With a little more work, it can be proved that the parameter r = ½ k is uniquely prescribed
by the right hand boundary condition, and moreover, the resulting curve supplies the global
minimizer of the brachistochrone functional, [21].
The parametrized curve (3.31) is known as a cycloid , which can be visualized as the
curve traced by a point sitting on the edge of a rolling wheel of radius r, as plotted in
Figure 6. (Checking this assertion is a useful exercise.) Interestingly, in certain configura-
tions, namely if β < 2 b/π, the cycloid that solves the brachistochrone problem dips below
the right hand endpoint b = (b, β), and so the bead is moving upwards when it reaches
the end of the wire, as in Figure 5. Also note that the cycloid (3.31) has a vertical tangent
at the initial point, so u′ (0) = ∞ and the solution is not smooth there. This arises due to
the singularity of the brachistochrone Lagrangian (3.24) when u = 0, or, equivalently, the
fact that u = 0 is a singular point of the Euler–Lagrange equation (3.26).
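
Since the cycloid was obtained via the first integral rather than from (3.26) directly, a numerical spot check is reassuring. The following sketch (assuming NumPy; the radius r and the sample points are arbitrary choices) confirms that (3.31) satisfies 2 u u′′ + u′^2 + 1 = 0 away from the singular initial point:

```python
# Check that the cycloid x = r(t - sin t), u = r(1 - cos t) solves (3.26).
# Derivatives with respect to x are computed parametrically:
#   u' = (du/dt)/(dx/dt),  u'' = (du'/dt)/(dx/dt).
import numpy as np

r = 1.0
t = np.linspace(0.5, np.pi, 7)        # avoid t = 0, where u' blows up

up  = np.sin(t) / (1 - np.cos(t))                                   # u'
dup = (np.cos(t)*(1 - np.cos(t)) - np.sin(t)**2) / (1 - np.cos(t))**2
upp = dup / (r * (1 - np.cos(t)))                                   # u''
u   = r * (1 - np.cos(t))

print(2*u*upp + up**2 + 1)            # ~0 at every sample point
```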


† Technically, there is a ± sign in front of the square root, but it is not hard to see that this
ambiguity does not affect the final formulas.



Minimal Surface of Revolution
Consider next the problem of finding the curve connecting two points that generates
a surface of revolution of minimal surface area. Without loss of generality, we assume
that the curve is given by the graph of a non-negative function y = u(x) ≥ 0 that passes
through the points a = (a, α), b = (b, β) ∈ R 2 . According to (2.12), the required curve
will minimize the functional
J[u] = \int_a^b u \sqrt{1 + u′^2} \, dx,    with Lagrangian    L(x, u, p) = u \sqrt{1 + p^2},    (3.32)

where we have omitted an irrelevant factor of 2 π.


We have
\frac{∂L}{∂u} = \sqrt{1 + p^2},    \frac{∂L}{∂p} = \frac{u p}{\sqrt{1 + p^2}},
and so the Euler–Lagrange equation (3.13) is
\sqrt{1 + u′^2} − \frac{d}{dx} \frac{u u′}{\sqrt{1 + u′^2}} = \frac{1 + u′^2 − u u′′}{(1 + u′^2)^{3/2}} = 0.    (3.33)
Since the Lagrangian (3.32) does not depend upon x, we can again apply Theorem 3.2.
The Hamiltonian function (3.27) is

H(u, p) = p \frac{∂L}{∂p} − L = − \frac{u}{\sqrt{1 + p^2}},
and hence (3.28) implies

H(u, u′) = − \frac{u}{\sqrt{1 + u′^2}} = − c    (3.34)
for some constant c ∈ R. (This can be checked by directly calculating dH/dx ≡ 0.) Solving for†

u′ = \frac{du}{dx} = \frac{\sqrt{u^2 − c^2}}{c}
results in an autonomous first order ordinary differential equation, which we can immediately solve:

\int \frac{c \, du}{\sqrt{u^2 − c^2}} = x + δ,
where δ is a constant of integration. The most useful form of the left hand integral is in terms of the inverse to the hyperbolic cosine function cosh z = ½ (e^z + e^{−z}), whereby

c \cosh^{−1} \frac{u}{c} = x + δ,    and hence    u = c \cosh \frac{x + δ}{c}.    (3.35)


† The square root is real since, by (3.34), | u | ≥ | c |. Also, as in the brachistochrone, the ± sign ambiguity in the square root does not affect the final formulas.




Figure 7. Minimal Surfaces of Revolution.

In this manner, we have produced the general solution to the Euler–Lagrange equation
(3.33). Any solution that also satisfies the boundary conditions provides a critical function
for the surface area functional (3.32), and hence is a candidate for the minimizer. The
curve prescribed by the graph of a hyperbolic cosine function (3.35) is known as a catenary
from the Latin for “chain” since it is also the profile assumed by a hanging chain. It is
not a parabola, even though to the untrained eye it looks quite similar. Owing to their
minimization properties, catenaries are quite common in engineering design — for instance,
the arch in St. Louis is an inverted catenary.
So far, we have not taken into account the boundary conditions. It turns out that there
are three distinct possibilities, depending upon the configuration of the boundary points:
• There is precisely one value of the two integration constants c, δ that satisfies the two
boundary conditions.
• There are two different possible values of c, δ that satisfy the boundary conditions.
• There are no values of c, δ that allow (3.35) to satisfy the two boundary conditions.
This occurs when the two boundary points a, b are relatively far apart.
In the third configuration, the physical soap film spanning the two circular wires breaks
apart into two circular disks, and this defines the minimizer for the problem; there is
no surface of revolution that has a smaller surface area than the two disks. However,
the “function”† that minimizes this configuration consists of two vertical lines from the
boundary points to the x axis, along with the line segment on the x axis connecting them.
More precisely, we can approximate this function by a sequence of genuine functions that
give progressively smaller and smaller values to the surface area functional (2.12), but the
actual minimum is not attained among the class of (smooth) functions. Figure 7 illustrates
the case when there are two catenaries, with the disk represented by the red polygon. The


† Here “function” must be taken in a very broad sense, as this one does not even correspond to
a generalized function!



corresponding surfaces are obtained by rotating all three curves around the x axis. In
this configuration, the two disks obtained by rotating the red curve will have the smallest
surface area, while if the endpoints are moved closer together, the surface obtained by
rotating the blue curve will be the minimizer.
There are further subtleties in the other two cases; see [32] for details. When only one
catenary solution exists, it is not the minimizer, and the discontinuous solution constructed
above has a smaller surface area. In the region where there are two possible catenaries,
the one that is farther away from the axis, perhaps counterintuitively, gives the smaller
surface area. However, in some configurations, its surface area is not smaller than that
offered by the two disks; when this occurs, there are again functions whose surface areas
are arbitrarily close to the minimum offered by the disks and hence smaller than the area
of both catenary solutions. Thus, even in such a reasonably simple example, a number of
the subtle complications can already be seen.
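
The trichotomy of boundary configurations can be explored numerically. As a rough sketch (assuming NumPy; the symmetric data and the sampling grid are arbitrary choices), take u(−b) = u(b) = 1, so that δ = 0 in (3.35) and the boundary conditions reduce to a single equation for c:

```python
# Count solutions c of g(c) = c*cosh(b/c) - 1 = 0, i.e., catenaries (3.35)
# through the symmetric endpoints (-b, 1) and (b, 1), by tallying sign
# changes of g on a fine grid.
import numpy as np

def count_catenaries(b):
    c = np.linspace(0.05, 1.0, 100001)
    g = c * np.cosh(b / c) - 1.0
    return int(np.sum(np.sign(g[1:]) != np.sign(g[:-1])))

for b in (0.4, 0.8):
    print(b, count_catenaries(b))   # -> 2 and 0; the two regimes meet at a
                                    #    critical spacing (b ~ 0.66 for this
                                    #    data) with a single tangent catenary.
```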
The Fundamental Lemma
The justification that a critical function satisfies the Euler–Lagrange equation relies on a
basic result known as the Fundamental Lemma of the calculus of variations. The simplest
version can be stated as follows.
Lemma 3.3. If f(x) is continuous on the closed interval [ a, b ], and

\int_a^b f(x) \, ϕ(x) \, dx = 0

for every continuous function ϕ(x), then f(x) ≡ 0 for all a ≤ x ≤ b.
Proof : A simple proof of the stated result proceeds by setting ϕ(x) = f(x). Then

0 = \int_a^b f(x) \, ϕ(x) \, dx = \int_a^b f(x)^2 \, dx,

which, by continuity, is only possible when f(x) ≡ 0.
However, for many purposes, we would like to impose additional conditions on the func-
tions ϕ(x), and the preceding proof will not work unless f (x) satisfies the same conditions.
A proof that has much wider applicability proceeds as follows.
Suppose f(x0) ≠ 0 for some a < x0 < b. By replacing f(x) by − f(x) if necessary, we can assume f(x0) > 0. Then, by continuity, f(x) > 0 for all x lying in some small interval around x0, of the form a < x0 − δ < x < x0 + δ < b for some δ > 0. Choose ϕ(x)
to be a continuous function that is > 0 in this interval and = 0 outside. An example is
the piecewise linear function

ϕ(x) = \begin{cases} δ − | x − x0 |, & | x − x0 | < δ, \\ 0, & \text{otherwise}, \end{cases}    (3.36)
which is illustrated in the first plot in Figure 8. Then f(x) ϕ(x) ≥ 0 everywhere, hence \int_a^b f(x) \, ϕ(x) \, dx ≥ 0, and, by continuity, is equal to zero if and only if f(x) ϕ(x) ≡ 0, which is a contradiction. Q.E.D.



Figure 8. Bump Functions.

One can replace the continuous function (3.36) by the smooth bump function

ϕ(x) = \begin{cases} δ \exp\big[ −\big(1 − δ^{−2} (x − x0)^2\big)^{−1} \big], & | x − x0 | < δ, \\ 0, & \text{otherwise}, \end{cases}    (3.37)
which, in addition to being positive on the required interval†, is C∞ everywhere, including at x = x0 ± δ, where all its derivatives vanish. The support of the bump function (3.37), meaning the closure of the set where it does not vanish, is the compact (closed and bounded) interval { | x − x0 | ≤ δ }. This observation produces a useful strengthening of Lemma 3.3:
Lemma 3.4. If f(x) is continuous on [ a, b ], and \int_a^b f(x) \, ϕ(x) \, dx = 0 for every C∞ function ϕ(x) with compact support in ( a, b ), then f(x) ≡ 0 for all a ≤ x ≤ b.
Note that the compact support condition imposed on ϕ implies that ϕ(a) = ϕ(b) = 0,
thus justifying our derivation of the Euler–Lagrange equation (3.13).
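
As a small sanity check on the bump formula (3.37) as reconstructed above (the sketch below assumes NumPy; the values of x0 and δ are arbitrary), one can verify that the bump is strictly positive inside the interval while never rising above δ/e, consistent with the footnote's warning that it hugs the axis:

```python
# Evaluate the smooth bump (3.37) at interior sample points: positive
# throughout |x - x0| < delta, yet never exceeding delta/e.
import numpy as np

x0, delta = 0.0, 0.1
x = np.linspace(x0 - delta, x0 + delta, 101)[1:-1]   # interior points only
phi = delta * np.exp(-1.0 / (1.0 - ((x - x0)/delta)**2))

print(phi.min())               # > 0: positive on the open interval
print(phi.max(), delta/np.e)   # maximum is delta/e ~ 0.037: hugs the axis
```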
A Cautionary Example
While the Euler–Lagrange equation is fundamental in the calculus of variations, it does
have limitations and analysis based solely thereon may miss important features, as we
will gradually learn. One complicating factor is that convergence in infinite-dimensional
function space is considerably more subtle than in finite-dimensional Euclidean space. So
let us conclude this section with one of a large variety of cautionary examples.
Consider the problem of minimizing the integral
J[u] = \int_0^1 \Big[ ½ (u′^2 − 1)^2 + ½ u^2 \Big] dx    (3.38)

subject to the boundary conditions u(0) = u(1) = 0. The Euler–Lagrange equation is

0 = − \frac{d}{dx}\big[ 2 u′ (u′^2 − 1) \big] + u = (2 − 6 u′^2) \, u′′ + u.
Since the Lagrangian L = ½ (p^2 − 1)^2 + ½ u^2 is independent of x, Theorem 3.2 can be applied
to reduce this to the first order ordinary differential equation given by the constancy of


† Its mathematically correct graph in Figure 8 could be open to misinterpretation; between x0 −δ
and x0 + δ, it is indeed positive, but extremely close to the axis.



Figure 9. Sawtooth Functions u2(x), u3(x), u8(x).

the Hamiltonian function

H(u, u′) = u′ \frac{∂L}{∂p}(u, u′) − L(u, u′) = \frac{3}{2} u′^4 − u′^2 − ½ u^2 − ½ = c,
where c ∈ R. As in (3.29–30), the resulting quartic polynomial in u′ can be solved for u′ = h(u, c), and the final autonomous first order ordinary differential equation can then be integrated; see (3.30). However, the resulting very complicated formula for the solution u(x) is not particularly helpful, especially when one tries to impose the boundary conditions to fix the values of the integration constants c, δ. A numerical approximation to the preceding two point boundary value problem would be considerably more enlightening.
Moreover, the resulting critical function does not, in fact, minimize† the functional, as the following direct analysis shows. Observe that the functional is clearly bounded below by J[u] ≥ 0. However, the minimum value of 0 cannot be achieved, since it would
simultaneously require that u′ (x) = ±1 and u(x) = 0. On the other hand, one can come
arbitrarily close to the minimum as follows. Consider the “sawtooth” functions

u_n(x) = \begin{cases} x − k/n, & k/n ≤ x ≤ (2k+1)/(2n), \\ (k+1)/n − x, & (2k+1)/(2n) ≤ x ≤ (k+1)/n, \end{cases}    k = 0, …, n − 1,    n = 1, 2, 3, …,    (3.39)
which are continuous, piecewise linear, with slope ±1 except at their corners, three of which are plotted in Figure 9. For these particular functions, the first term in the integrand vanishes, and hence

J[u_n] = \int_0^1 ½ u_n(x)^2 \, dx = \sum_{j=0}^{2n−1} \int_{j/(2n)}^{(j+1)/(2n)} ½ u_n(x)^2 \, dx = 2n \cdot \frac{1}{48 n^3} = \frac{1}{24 n^2}.

Thus, as n → ∞, the value of J[u_n] → 0. On the other hand, the functions (3.39) converge uniformly to the zero function: u_n(x) → u∗(x) ≡ 0, and J[u∗] = ½. In other words,

\lim_{n → ∞} J[u_n] = 0 ≠ ½ = J\big[ \lim_{n → ∞} u_n \big],    (3.40)


† It may provide a local minimum — or maximum — but to ascertain this, one would need to
invoke the second derivative test.



illustrating the potential pitfalls of interchanging limits and integration in the absence of
suitable hypotheses. We conclude that one can find functions u(x) for which the value
of the integral (3.38) is arbitrarily close to its overall minimum, but there is no function
which achieves the value J[ u ] = 0.
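
The limiting behavior is easy to reproduce numerically; the following sketch assumes NumPy, with the grid aligned to the corners so that the slopes are exactly ±1:

```python
# Evaluate the functional (3.38) on the sawtooth functions (3.39):
# J[u_n] -> 0, even though J[0] = 1/2 and no function attains the infimum.
import numpy as np

def J(u, x):
    dx = np.diff(x)
    du = np.diff(u) / dx               # slope on each grid cell
    um = (u[:-1] + u[1:]) / 2          # midpoint values for the u^2 term
    return np.sum((0.5*(du**2 - 1)**2 + 0.5*um**2) * dx)

for n in (1, 2, 4, 8, 16):
    x = np.linspace(0, 1, 8*n + 1)     # grid contains all corner points
    u = np.abs(x - np.round(n*x)/n)    # u_n(x) = distance to nearest k/n
    print(n, J(u, x))                  # decays like 1/(24 n^2)
```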
Remark : The reader may object that the functions (3.39) used in this argument have corners at the points x_j = j/(2n), and hence are not C1. But the corners can be smoothed out, replacing each u_n by a nearby C2 or even C∞ function ũ_n, whose value J[ũ_n] can be made arbitrarily close to J[u_n], and thus possesses the same limiting property (3.40).

4. Boundary Conditions and Null Lagrangians.


So far, we have only imposed fixed boundary conditions on our variational problems, in
which the values of the minimizer are specified at both ends of the interval. Other types of
boundary conditions can be handled, but there are some subtleties that should be delved
into.
Natural Boundary Conditions
A second common type of condition is that of a free boundary, in which no conditions
are imposed on the minimizer at one or both ends of the interval. As an example of this
type of problem, consider the planar geodesic problem in which one wishes to find the
curve of shortest length that connects a given point to a vertical line. In other words, we
seek to minimize the arc length functional (2.3) on the interval [ a, b ] where the minimizing
function is required to pass through the point (a, α), so that u(a) = α, but there are
no conditions at the endpoint x = b. Thus we are seeking the shortest curve that goes
from (a, α) to any point on the vertical line ℓb = { x = b }. A moment’s reflection should
convince you that the minimizing curve is a horizontal straight line, so u(x) ≡ α for all
a ≤ x ≤ b, that meets the vertical line ℓb in a perpendicular direction. If we also impose
no conditions at the initial endpoint, we then seek a curve of minimal length that goes
from the vertical line ℓa = { x = a } to the vertical line ℓb = { x = b }. In this case any
horizontal line will serve as a minimizer, and so the minimizer is not unique.
Similarly, one could seek the solution to the brachistochrone problem with a free end, to
determine the shape of a wire starting at the origin with the property that a bead sliding
along it reaches the line ℓb in the shortest possible time, without any need to specify which
point on the line it ends up on. Thus, the functional to be minimized is again (3.22), but
here one only imposes a condition, say u(0) = 0, at one end of the interval.
To analyze such problems, let us return to our general variational calculation. Thus,
consider the usual objective functional
Z b
J[ u ] = L(x, u, u′ ) dx. (4.1)
a

Our goal is to minimize J[ u ] among all sufficiently smooth function u(x), defined for
a ≤ x ≤ b, but now only subject to a single fixed boundary condition, namely

u(a) = α, (4.2)

1/7/22 25 c 2022 Peter J. Olver


while the value u(b) at the other end of the interval remains arbitrary.
We can apply the same variational computation that leads to (3.7–8) since up to this
point, the boundary conditions are not used. In order to preserve the remaining fixed
boundary condition (4.2), the variation ϕ(x) must vanish at the left hand boundary point:
ϕ(a) = 0. However, there are no conditions imposed on ϕ(b) at the right hand end. Thus,
only one of the boundary terms in (3.8) vanishes, and the result of the integration by parts
computation, replacing (3.11), is

0 = h ∇J[ u ] , ϕ i
Z b   
∂L  ∂L d ∂L (4.3)
= ϕ(b) b, u(b), u′(b) + ϕ (x, u, u′ ) − (x, u, u′ ) dx.
∂p a ∂u dx ∂p
Thus, we can no longer identify the gradient of the functional J[ u ] as the function given by
the Euler–Lagrange expression, since there is an additional boundary component. Instead,
let us work with (4.3) directly. Keep in mind that we have not imposed any conditions on
ϕ(b). However, if we set ϕ(b) = 0, then (4.3) reduces to the previous integral (4.3), whose
vanishing, by the same argument based on Lemma 3.4, implies that u(x) must continue to
satisfy the Euler–Lagrange equation (3.13), whereby the integral term in (4.3) vanishes.
Thus for the entire right hand side to vanish for all such ϕ(x), the term multiplying ϕ(b)
must also vanish, and we conclude that the minimizing function must also satisfy

∂L 
b, u(b), u′(b) = 0. (4.4)
∂p
This is known as a natural boundary condition, which imposes a constraint on the minimizer
at the free boundary. We conclude that any minimizer for the boundary value problem
that has only one endpoint fixed must be a solution to the Euler–Lagrange equation (3.13)
supplemented by the two boundary conditions (4.2) and (4.4) — one fixed and the other
natural. Thus, the determination of the critical functions continues to require the solution
to a two point boundary value problem for the second order Euler–Lagrange ordinary
differential equation. If, instead of being fixed, the value of the minimizer at the other
endpoint is also left free, then the same argument implies that the minimizer must also
satisfy a natural boundary condition there:

∂L 
a, u(a), u′(a) = 0. (4.5)
∂p
Thus, any minimizer of the free variational problem must solve the Euler–Lagrange equa-
tion along with the two natural boundary conditions (4.4–5). As always, the above are just
necessary conditions for minimizers. Maximizers, if such exist, must also satisfy the same
conditions, which serve to characterize the critical functions of the variational principle
subject to the given boundary constraints. By this line of reasoning, we are led to the
following conclusion.
At each endpoint of the interval, any critical function of a functional, including mini-
mizers and maximizers, must satisfy either homogeneous or inhomogeneous fixed boundary
conditions, or homogeneous natural boundary conditions.

1/7/22 26 c 2022 Peter J. Olver


Consequently, imposing any other type of boundary conditions at an endpoint, say
x = a, will force the critical function to satisfy more than one boundary condition at the
endpoint — the natural boundary condition plus the imposed boundary condition. This
will effectively prescribe both u(a) and u′ (a) which, by the basic existence and uniqueness
theorem for the solution of second order ordinary differential equations, [9], would imply
that they are satisfied by one and only one solution to the Euler–Lagrange equation. The
resulting solution will probably not satisfy the additional boundary condition(s) at the
other endpoint, in which case there would be no critical functions associated with the
variational principle, and hence no candidate minimizers.

Example 4.1. Let us apply this method to the problem is to find the shortest path
between a point and a straight line. By applying a rigid motion (translation and rotation),
we can take the point to be a = (a, α) and the line to be in the vertical direction, namely
ℓb = { x = b }. Assuming the solution is given by the graph of a function y = u(x), the
length is given by the same arc length functional (3.16), but now there is only a single
fixed boundary condition, namely u(a) = α. To determine the second boundary condition
at x = b, we use (4.4). In view of (3.17), this requires

u′ (b)
p = 0, or, simply, u′ (b) = 0.
1 + u′ (b)2
This means that any critical function u(x) must have horizontal tangent at the point x = b,
or, equivalently, it must be perpendicular to the vertical line ℓb .
The Euler–Lagrange equation is the same as before, reducing to u′′ = 0, and hence the
critical functions are affine: u(x) = c x + d. Substituting into the two boundary conditions
u(a) = α, u′ (b) = 0, we conclude that the minimizer must be a horizontal straight line:
u(x) ≡ α for all a ≤ x ≤ b, reconfirming our earlier observation. As before, this is not a
complete proof that the horizontal straight line is a minimizer — based on what we know
so far, it could be a maximizer — and resolving its status requires further analysis based
on the second variation.
Let us investigate what happens if we try to impose a non-natural, non-fixed boundary
condition at the endpoint — for example, the Robin condition

u′ (b) = β u(b) + γ, (4.6)

where β 6= 0 and γ are constants. The solution to the Euler–Lagrange equation satisfying
u(a) = α and the Robin condition (4.6) is easily calculated:

(α β + γ) x + α(1 − β b) − γ a
u(x) = (4.7)
1 − β (b − a)
provided the denominator does not vanish. However, unless α β + γ = 0, so the graph of
u is a horizontal line, this function does not provide a minimum to arc length functional
among functions subject to the prescribed boundary conditions. Indeed, one can construct
functions that satisfy the Robin boundary condition (4.6) whose arc length is arbitrarily
close to that of the horizontal line, which is b − a. For example, we can slightly perturb

1/7/22 27 c 2022 Peter J. Olver


α

a b−ε b

Figure 10. Perturbed Horizontal Line.

the line by, say, setting



α, a ≤ x ≤ b − ε,
uε (x) =
α + (α β + γ) (x − b + ε)/(1 − ε β), b − ε ≤ x ≤ b,
where 0 < ε < | β |, which does satisfy (4.6); see Figure 10. Its arc length
s   !
αβ + γ 2
b−a+ε 1+ −1 −→ b − a
1 − εβ

as ε → 0. One can even smooth off its corner, which has the effect of slightly decreasing
its total arc length and thus does not affect the convergence as ε → 0. Thus, while the
Robin boundary value problem has no minimizing solution, it does admit smooth functions
that come arbitrarily close to the minimum possible value, which is b − a. This example
exemplifies a common type of behavior for variational principles that are bounded from
below, but have no bona fide function that achieves the minimum value.
Example 4.2. Let us return to the brachistochrone problem, but now let us seek the
shape of the wire that enable the bead to get from the point a = (0, 0) to a point on the
line ℓb = { x = b } where b > 0 in the shortest time. In this case, we no longer specify which
vertical position on the line the bead ends up at, and so x = b is a free end. Thus, we
continue to minimize the same functional (3.22) subject to the single boundary condition
u(0) = 0. The minimizer must satisfy the natural boundary condition at x = b. Since the
Lagrangian is
s
1 + p2 ∂L p
L(x, u, p) = with =p ,
2g u ∂p 2 g u(1 + p2 )
the natural boundary condition (4.4) is simply u′ (b) = 0. We conclude that the solution
is provided by the cycloid (3.31) that starts out at the origin and has horizontal tangent
at the point x = b. Note that, in view of (3.31), setting
du sin θ
= =0 implies θ = π,
dx 1 − cos θ

1/7/22 28 c 2022 Peter J. Olver


which thereby serves to prescribe the parameter r = b/π. The resulting curve is the graph
of a half of a cycloid arc.

Null Lagrangians

Despite the perhaps disappointing conclusion afforded by the inadmissibility of Robin


conditions in Example 4.1, it turns out it is possible to impose other types of boundary
conditions on a variational principle, by suitably modifying the functional. With this aim,
we first introduce an important notion of independent interest in the calculus of variations.

Definition 4.3. A function N (x, u, p) is called a null Lagrangian if its associated


Euler–Lagrange expression vanishes identically: E(N ) ≡ 0.

Since the Euler–Lagrange expression can be identified with the gradient of the associ-
ated functional, null Lagrangians can be viewed as the calculus of variations equivalent
of constant functions, in that a function defined on a connected subset of R n has zero
gradient if and only if it is constant, [4, 35].

Example 4.4. Consider the variational problem


Z ∞

K[ u ] = x2 u′ + 2 x u dx
0

for the Lagrangian N (x, u, p) = x2 p + 2 x u. We compute its Euler–Lagrange expression


using the basic equation (3.15). Since

∂N d ∂N d
E(N ) = − = 2x − (x2 ) ≡ 0,
∂u dx ∂p dx
we conclude that N is a null Lagrangian.

Since the Euler–Lagrange equation of a null Lagrangian is simply 0 = 0, every function


satisfies it and thus, provided it also satisfies the imposed boundary conditions, is a critical
function. We will shortly explain what this signifies. But first let us state an explicit
characterization of all null Lagrangians.

Theorem 4.5. A function N (x, u, p) defined for all † (x, u, p) ∈ R 3 is a null Lagrangian
if and only if it is a total derivative, meaning

d ∂S ∂S
N (x, u, u′ ) = S(x, u) = + u′ (4.8)
dx ∂x ∂u
for some function S that depends only on x, u.


More generally, N can be defined on a subset of R3 with trivial topology.

1/7/22 29 c 2022 Peter J. Olver


Proof : Referring back to the explicit form of the Euler–Lagrange equation (3.14), the
only term involving u′′ is the last one, and so if the left hand side is to vanish for all
possible functions u(x), the coefficient of u′′ must vanish, so

∂ 2N
= 0, which implies N (x, u, p) = f (x, u) p + g(x, u)
∂p2
for some functions f, g. Substituting the latter expression back into (3.14) yields

∂f ∂g ∂f ∂f ∂g ∂f
u′ + − − u′ = − = 0.
∂u ∂u ∂x ∂u ∂u ∂x
Since we are assuming N , and hence f, g are defined for all x, u, a standard result in mul-
∂S ∂S
tivariable calculus, [35, 43], implies f = ,g= , for some function S(x, u). Q.E.D.
∂u ∂x
Remark : Theorem 4.5 is a special case of a general theorem characterizing higher order
and multidimensional null Lagrangians; see [42; Theorem 4.7].
With Theorem 4.5 in hand, we can apply the Fundamental Theorem of Calculus to write
the functional associated with a null Lagrangian in the following form:
Z b Z b
′ d    
K[ u ] = N (x, u, u ) dx = S(x, u) dx = S b, u(b) − S a, u(a) . (4.9)
a a dx
In other words, the value of a functional associated with a null Lagrangian depends only
on the values of the function u(x) at the endpoints of the interval. We conclude that the
integral defined by a null Lagrangian is path independent, meaning, as with line integrals,
[4, 35], that its value is independent of the path (graph of the function) taken between its
endpoints. And this explains why every function solves the Euler–Lagrange equation and
is in fact a minimizer. It is because the value of the functional K[ u ] depends only upon
the values of u(x) at the endpoints a, b, and not on its behavior for a < x < b, and hence
achieves the same value, meaning that it is constant, whenever u(x) satisfies our usual
fixed boundary conditions u(a) = α, u(b) = β; hence, all such functions are minimizers
(and maximizers).
General Boundary Conditions
The flexibility afforded by null Lagrangians allows us to expand the range of boundary
conditions that can be handled by the calculus of variations. Namely, we can modify the
original variational problem by adding in a suitable null Lagrangian, which does not alter
the Euler–Lagrange equations but does change the associated natural boundary conditions.
In other words, if N = dS/dx is a null Lagrangian, then the modified objective functional
Z b Z b  
  d  
e u] =
J[ ′
L(x, u, u ) + N (x, u, u ) dx = ′ ′
L(x, u, u ) + S(x, u) dx
a a dx
Z b
(4.10)
 
= S b, u(b) − S a, u(a) + L(x, u, u′ ) dx
a

1/7/22 30 c 2022 Peter J. Olver


Z b
has the same Euler–Lagrange equations as J[ u ] = L(x, u, u′ ) dx. Moreover, when sub-
a
ject to fixed boundary conditions, they have exactly the same critical functions, and hence
the same minimizers, since their values differ only by the initial two boundary terms in
the final expression, which depend only on the values of u at the endpoints a, b. On the
other hand, as we will see, the two variational problems have different natural boundary
conditions, and this flexibility allows us to admit any (uncoupled) boundary conditions
into the variational framework.
To prove this and explain how to construct the required null Lagrangian, let us assume
that both boundary conditions explicitly involve the derivative of the function u(x) at the
endpoint. We can already handle fixed boundary conditions, and the remaining “mixed”
case, in which one end is fixed and the boundary condition at the other end involves the
derivative, is left as an exercise for the reader. Under mild algebraic assumptions, we can
solve the boundary conditions for the derivative:
 
u′ (a) = B1 u(a) , u′ (b) = B2 u(b) , (4.11)
for some functions B1 , B2 depending only on the value of u at the endpoint in question.
(They may, of course, depend on the respective endpoints a, b, but this is taken care of by
allowing different functions at each end.) Besides a free end, where Bi (u) ≡ 0, another
case of interest in a variety of physical situations are the aforementioned Robin boundary
conditions
u′ (a) = β1 u(a) + γ1 , u′ (b) = β2 u(b) + γ2 , (4.12)
in which β1 , β2 , γ1 , γ2 are prescribed constants.
Given a variational problem with Euler–Lagrange equation supplemented by the pre-
scribed boundary conditions (4.11), let us add in a suitable null Lagrangian (4.8) in order
that the natural boundary conditions associated with the modified Lagrangian

e u, p) = L(x, u, p) + N (x, u, p) = L(x, u, p) + p ∂S (x, u) + ∂S (x, u)


L(x, (4.13)
∂u ∂x
are equivalent to the desired boundary conditions (4.11). Thus, at the right hand endpoint,
in view of (4.4), this means
e
∂L  ∂L  ∂S 
0= b, u(b), u′ (b) = b, u(b), u′ (b) + b, u(b)
∂p ∂p ∂u

is equivalent to the boundary equation u′ (b) = B2 u(b) . For this to occur, we must
require
∂S  ∂L 
b, u(b) = b, u(b), B2 u(b) . (4.14)
∂u ∂p
Similarly, at the left hand endpoint, we require
∂S  ∂L 
a, u(a) = a, u(a), B1 u(a) . (4.15)
∂u ∂p
These two conditions suffice to prescribe (4.11) as the natural boundary conditions for the
variational problem associated with the modified Lagrangian (4.13), thus justifying our

1/7/22 31 c 2022 Peter J. Olver


claim that by a suitable choice of S(x, u), or, equivalently, by adding in a suitable null
Lagrangian N = dS/dx we can arrange for any boundary conditions of the above form to
be the natural boundary conditions associated with the variational problem.
We can combine the conditions (4.14–15) into a simpler form as follows. Choose an
“interpolating function” B(x, u) such that
   
B a, u(a) = B1 u(a) , B b, u(b) = B2 u(b) . (4.16)
For example, we can use linear interpolation and set
x−a x−b
B(x, u) = B2 (u) − B (u). (4.17)
b−a b−a 1
In particular, if B1 (u) ≡ B2 (u), so that we have the same boundary condition at both
ends, we can set B(x, u) = B1 (u) = B2 (u). Then (4.14–15) are implied by the interpolated
equation
Z
∂S ∂L  ∂L 
(x, u) = − x, u, B(x, u) , and thus S(x, u) = − x, u, B(x, u) du
∂u ∂p ∂p
(4.18)
is any anti-derivative of the integrand. We have thus proved a general result about varia-
tional problems with specified boundary conditions.
Z b
Theorem 4.6. Let J[ u ] = L(x, u, u′ ) dx be a variational problem whose minimizers
a
are subject to the boundary conditions
 
u′ (a) = B a, u(a) , u′ (b) = B b, u(b) , (4.19)
for some function B(x, u). Let S(x, u) be defined by (4.18). Then the modified variational
problem
Z b  Z b
d  
e
J[ u ] = ′
L(x, u, u ) + S(x, u) dx = S b, u(b) − S a, u(a) + L(x, u, u′ ) dx
a dx a
(4.20)
has the same Euler–Lagrange equations and natural boundary conditions given by (4.19).
Observe that the modified variational problem (4.20) differs from the original through
the addition of certain boundary “corrections” that depend only on the boundary values
of u. The point is that any solution to the resulting boundary value problem will be a
candidate minimizer for the modified variational problem.
Example 4.7. Let us consider the problem of minimizing arc length subject to the
Robin boundary conditions (4.12) on the interval [ a, b ]. In accordance with (4.17), set
x−a x−b x−a x−b
B(x, u) = β(x) u + γ(x) where β(x) = β2 − β1 , γ(x) = γ2 − γ .
b−a b−a b−a b−a 1
Using the Lagrangian (3.16), we compute
∂L p
=p .
∂p 1 + p2

1/7/22 32 c 2022 Peter J. Olver


Substituting into (4.18), the required function S is obtained by integration:
Z p
β(x) u + γ(x) 1 + [ β(x) u + γ(x) ]2
S(x, u) = − p du = − ,
1 + [ β(x) u + γ(x)]2 β(x)
provided β(x) 6= 0. The modified variational problem (4.20) can thus be written in the
form†
p p Z bp
1 + [ β u(b) + γ ] 2 1 + [ β1 u(a) + γ1 ]2
e
J[ u ] = − 2 2
+ + 1 + u′ (x)2 dx
β2 β1 a
p p Z bp (4.21)

1 + u (b) 2 1 + u (a)′ 2
=− + + 1 + u′ (x)2 dx.
β2 β1 a

Thus, while the basic arc length functional does not, in general, admit a minimizer that
satisfies the Robin boundary conditions, the modified arc length (4.21), which has the
same Euler–Lagrange equation, (usually) does.
Let us solve the Robin boundary value problem for the Euler–Lagrange equation, which,
as noted above, is merely u′′ = 0, the solutions of which are straight lines u = c x + d.
Substituting into the Robin boundary conditions (4.12) produces

c = β1 (c a + d) + γ1 = β2 (c b + d) + γ2 .
Thus, if
(b − a) β1 β2 + β2 − β1 6= 0,
the problem admits a unique solution, while if the left hand side is zero, then there is either
a one-parameter family of solutions that all give the same value to the modified variational
problem (even though they have differing arc lengths), or there is no solution, depending
on the values of γ1 , γ2 .
What about coupled boundary conditions, which relate the values of the minimizer and
its derivatives at the endpoints? The simplest are periodic boundary conditions

u(a) = u(b), u′ (a) = u′ (b). (4.22)


Replacing u 7→ u+ε ϕ, we find that any variation must satisfy the same periodic conditions:
ϕ(a) = ϕ(b), ϕ′ (a) = ϕ′ (b). Thus, the difference of the two variational boundary terms in
(3.8) will vanish provided ∂L/∂p is also periodic in x, meaning
∂L ∂L
(a, u, p) = (b, u, p). (4.23)
∂p ∂p
More generally we could try to impose a pair of coupled boundary conditions of the form
 
F1 u(a), u′ (a), u(b), u′(b) = F2 u(a), u′ (a), u(b), u′(b) = 0,


We avoid writing out the more complicated integral expression (4.10) involving the correspond-
ing null Lagrangian.

1/7/22 33 c 2022 Peter J. Olver


where each Fi (u, p, v, q) depends on 4 arguments. The variation ϕ(x) will thus satisfy the
linearization of each:
∂Fi ∂Fi ′ ∂Fi ∂Fi ′
ϕ(a) + ϕ (a) + ϕ(b) + ϕ (b) = 0. (4.24)
∂u ∂p ∂v ∂q
Now, if there are no constraints relating ϕ(a) and ϕ(b), meaning they can achieve indepen-
dent values, then (3.8) will imply that both natural boundary conditions must be satisfied,
and hence, unless there are more than two boundary conditions and the boundary value
problem is overdetermined, variational admissibility implies that the boundary conditions
decouple into the usual natural conditions at each end. For this not to be the case, the first
coupled boundary condition must relate the values of the critical function at the endpoints,
say 
F u(a), u(b) = 0. (4.25)
Thus equation (4.24) will imply
∂F  ∂F 
u(a), u(b) ϕ(a) + u(a), u(b) ϕ(b) = 0.
∂u ∂v
Further, the vanishing of the boundary terms in (3.8) requires

G u(a), u′ (a), u(b), u′(b) = 0, where
∂F ∂L ∂F ∂L (4.26)
G(u, p, v, q) = (u, v) (a, u, p) − (u, v) (b, v, q),
∂v ∂p ∂u ∂p
which thus provides the second “coupled natural boundary condition” complementing
(4.25). Thus, variationally admissible coupled boundary conditions necessarily take the
form (4.25, 26). For example, the quasiperiodic condition

u(b) = α u(a),
where α is a nonzero constant, requires
∂L  1 ∂L 
b, u(b), u′ (b) = a, u(a), u′(a)
∂p α ∂p
as its variationally admissible quasiperiodic counterpart. See [21, 48] for further details
and developments.
Problems with Variable Endpoints
So far, we have kept the endpoints a, b of our interval of integration fixed whilst varying
the functional. However, more general problems also allow us to vary the endpoints of the
interval. Here is a typical example.
Example 4.1 investigated the shortest path — the geodesic — between a point and
a vertical line. Let us generalize this problem to that of determining the shortest path
between a point (a, α) and a given plane curve C ⊂ R 2 , with (a, α) 6∈ C (as otherwise the
problem is trivial). The second endpoint (b, β) ∈ C of the minimizing geodesic is allowed to
be anywhere on the curve, and so the value of b is not fixed — unless the curve is a vertical

1/7/22 34 c 2022 Peter J. Olver


line. Or, even more generally, one can seek the shortest path between two nonintersecting
plane curves C1 , C2 ⊂ R 2 . Using our earlier considerations as a guide, you may well be
able to figure out what the answer to these problems is, but let’s do this from scratch.
Let’s generalize this problem and seek to minimize a first order functional of the usual
form (3.1). Let’s begin with the case when one endpoint is fixed, say u(a) = α, but the
other endpoint is constrained to lie on a curve, which we assume is the graph of a function
y = σ(x). (More general curves can be treated parametrically, in which case the problem
should be converted into parametric form, to be discussed in Section 5.) Thus, the second
boundary condition should take this into account, and so

u(b) = σ(b), (4.27)

where the endpoint b is variable.


To apply the variational calculus, we now vary both u(x) 7−→ u(x) +ε ϕ(x) as well as the
endpoint b 7−→ b + ε η where η ∈ R, and ε ∈ R varies over an interval containing 0. Using
the standard calculus formula for the derivative of an integral with variable endpoints, we
then compute — as usual performing an integration by parts on the term involving ϕ′ :
Z b+ε η
d 
0= L x, u(x) + ε ϕ(x), u′ (x) + ε ϕ′ (x) dx
dε a
ε=0
Z b  
 ∂L ∂L ′ (4.28)
= L b, u(b), u′ (b) η + ϕ+ ϕ dx
a ∂u ∂p
Z b

 ∂L ′

= L b, u(b), u (b) η + b, u(b), u (b) ϕ(b) + E(L) ϕ dx,
∂p a

where E(L) is the usual Euler–Lagrange expression (3.15). On the other hand, the varia-
tions must also satisfy the curve constraint (4.27), and so

u(b + ε η) + ε ϕ(b + ε η) = σ(b + ε η). (4.29)

Differentiating this equation with respect to ε and setting ε = 0 produces

u′ (b) η + ϕ(b) = σ ′ (b) η.

Substituting into (4.28) produces


  Z b
 ′′ ′
 ∂L ′

0 = L b, u(b), u (b) + σ (b) − u (b) b, u(b), u (b) η + E(L) ϕ dx.
∂p a

Applying the Fundamental Lemma 3.4 to the integral, we deduce that the critical func-
tions u must satisfy the usual Euler–Lagrange equation E(L) = 0, while vanishing of the
boundary term imposes the boundary condition
  ∂L 
L b, u(b), u′ (b) + σ ′ (b) − u′ (b) b, u(b), u′ (b) = 0 (4.30)
∂p

1/7/22 35 c 2022 Peter J. Olver


Figure 11. Shortest Path from a Point to a Curve and Between Two Curves.

in addition to (4.27). If, instead of being fixed, the other endpoint is required to lie on a
curve, say y = ρ(x), then the same variational calculation leads to the analogous boundary
condition
  ∂L 
L a, u(a), u′(a) + ρ′ (a) − u′ (a) a, u(a), u′(a) = 0. (4.31)
∂p

Example 4.8. For the geodesic problem introduced above, the Lagrangian for the arc
p
length functional is L(x, u, p) = 1 + p2 . Thus, the boundary condition (4.30) becomes
p  u′ (b)
1 + u′ (b)2 + σ ′ (b) − u′ (b) p = 0,
1 + u′ (b)2
which can be algebraically simplified and then solved for

u′ (b) = −1/σ ′ (b), (4.32)

which requires that neither derivative vanish at x = b. This boundary condition has
the geometric interpretation that the graph of the solution y = u(x) intersects the curve
y = σ(x) orthogonally, meaning that their tangent lines are perpendicular at the point of
intersection. In the present case, the solutions to the Euler–Lagrange equation are straight
lines, y = m (x − a) + α with slope m = u′ (b), and hence the shortest path connecting the
point (a, α) to the curve C is a line that is orthogonal to the curve. There may be more
than one of these, some being local minima, and hence the global minimizer will be the
one among these that has the shortest overall length; see the first plot in Figure 11, where
the middle line segment is the minimizer. Similarly, the shortest path between two curves
is a straight line segment that intersects each curve orthogonally. In the second plot in
Figure 11, the two curves are a circle and an ellipse, and the plotted straight lines yield one
global minimum, one local minimum (the longer horizontal segment), and two additional
critical line segments of equal length that are not local minima.

1/7/22 36 c 2022 Peter J. Olver


Remark : A similar calculation demonstrates that the same orthogonality condition holds
p
for any variational problem with Lagrangian of the form L(x, u, p) = f (x, u) 1 + p2 ,
which includes the brachistochrone (3.22), minimal surface of revolution (2.12), and ge-
ometric optics (2.5) problems. In all cases, the minimizing curve (which need not be a
straight line) necessarily intersects a prescribed boundary curve orthogonally.

5. Variational Problems Involving Several Unknowns.


The extension of the variational calculus to the case of first order variational problems
involving one independent and several dependent variables is fairly straightforward. The
one twist is that, unlike the scalar first order case, compatibility constraints are required
even for uncoupled boundary conditions in order that they be variationally admissible.
For simplicity, we provide details for the case of two variables. The extension of this
analysis to first order variational problems involving more than two dependent variables is
straightforward.
We thus consider a functional
Z b
J[ u, v ] = L(x, u, v, u′, v ′ ) dx, (5.1)
a

prescribed by the Lagrangian L(x, u, v, p, q) involving two unknown functions u(x), v(x).
We introduce simultaneous variations u(x) + ε ϕ(x), v(x) + ε ψ(x). Arguing as before, we
compute the derivative of the scalar function
Z b
h(ε) = J[ u + ε ϕ, v + ε ψ ] = L(x, u + ε ϕ, v + ε ψ, u′ + ε ϕ′ , v ′ + ε ψ ′ ) dx
a

and require
Z b
′ d
0 = h (0) = L(x, u + ε ϕ, v + ε ψ, u′ + ε ϕ′ , v ′ + ε ψ ′ ) dx
a dε ε=0
Z b  (5.2)
∂L ′ ∂L ∂L ′ ∂L
= ϕ(x) + ϕ (x) + ψ(x) + ψ (x) dx.
a ∂u ∂p ∂v ∂q
Integrating the terms involving ϕ′ and ϕ′ by parts, we arrive at the variational condition
Z b
 
B(b) − B(a) + ϕ(x) Eu(L) + ψ(x) Ev (L) dx = 0.
a

Here
∂L d ∂L
Eu (L) = (x, u, v, u′ , v ′ ) − (x, u, v, u′ , v ′ ),
∂u dx ∂p
(5.3)
∂L d ∂L
Ev (L) = (x, u, v, u′ , v ′ ) − (x, u, v, u′ , v ′ ),
∂v dx ∂q
are the Euler–Lagrange expressions associated with each of the dependent variables, while
the variational boundary terms are obtained by evaluating the following function at the

1/7/22 37 c 2022 Peter J. Olver


endpoints x = a, b:
∂L ∂L
B(x) = ϕ(x) (x, u, v, u′ , v ′ ) + ψ(x) (x, u, v, u′, v ′ ). (5.4)
∂p ∂q
The variations ϕ(x), ψ(x) are independent. Taking them to both have compact sup-
port in ( a, b ) annihilates the boundary terms in (5.2). Thus, applying the Fundamental
Lemma 3.4 to the individual terms in the integral, we conclude that any critical functions
u, v must satisfy the system of second order Euler–Lagrange equations
Eu (L) = 0, Eu (L) = 0. (5.5)
Further, in order that the boundary terms vanish at an endpoint, say x = a, we must
have either inhomogeneous fixed condition u(a) = α or thecorresponding homogeneous
natural boundary condition ∂L/∂p a, u(a), v(a), u′(a), v ′ (a) = 0; the same goes for the
other variable: either v(a) = β or ∂L/∂q a, u(a), v(a), u′(a), v ′ (a) = 0.
Example 5.1. Let us investigate the propagation of light rays in a coaxial cable,
cf. [31]. First off, in a three-dimensional optical medium, Fermat’s Principle implies
that the light rays minimize
 travel time between points. Arguing as in (2.5), the path
x(t) = x(t), y(t), z(t) traced by the ray will minimize the optics functional
Z T p
n(x, y, z) x2 + y 2 + z 2 dt,

J[ x ] = (5.6)


0
where we use dots to indicate derivatives with respect to the parameter t. Here
c
n(x) = (5.7)
v(x)
is called the index of refraction, and is prescribed by the ratio between the speed of light
in a vacuum c and the speed of light in the medium at a point x ∈ R 3 . The minimizing
light ray is subject to the boundary conditions x(0) = A, x(T ) = B. (An analysis of the
general optics functional (5.6) can be found below.)
Remark : The index of refraction can also depend on the wavelength of the light, causing
white light to split into its color constituents upon refraction. This effect is commonly
observed in rainbows, prisms, soap films, and chromatic aberrations in lenses.
We take the cable to be parallel to the z axis, and use cylindrical coordinates x = r cos θ,
y = r cos θ, z, throughout. We assume that the index of refraction depends only on the
radial coordinate, n(r), and parametrize the light rays by the  axial coordinate z, so that
in cylindrical coordinates the curve is given by r(z), θ(z), z for a ≤ z ≤ b. Transforming
(5.6) produces the objective functional
Z b p
J[ r, θ ] = n(r) 1 + r ′ 2 + (r θ ′ )2 dz, (5.8)
a
where the primes are used to indicate derivatives with respect to z. In other words, the
Lagrangian is p
L(r, θ, p, q) = n(r) 1 + p2 + r 2 q 2 ,

1/7/22 38 c 2022 Peter J. Olver


where we use p, q to represent the derivatives r ′ , θ ′ , respectively. As in (5.3), the Euler–
Lagrange equations have the form
∂L d ∂L d ∂L
Er (L) = − = 0, Eθ (L) = − = 0.
∂r dz ∂q dz ∂q
Thus, the second equation can be immediately integrated:
∂L n(r) r 2 θ ′
(z, r, r ′ , θ, θ ′ ) = p = α, (5.9)
∂q 1 + r ′ 2 + (r θ ′ )2
where α ∈ R is the constant of integration. As for the first Euler–Lagrange equation, we
note that the Lagrangian does not depend on the independent variable z, and hence, by
adapting Theorem 3.2, the associated Hamiltonian is constant:
∂L ∂L n(r)
H(z, r, r ′ , θ, θ ′) = r ′ + θ′ −L=− p = − β, (5.10)
∂p ∂q 1 + r ′ 2 + (r θ ′ )2
for some β ∈ R. Replacing the square root in (5.9) by its expression from (5.10) produces
dθ α
= . (5.11)
dz β r2
Substituting back into (5.10) and solving for r ′ yields
dr p −2
= β n(r)2 − β −2 α2 r −2 − 1,
dz
and hence we can determine r(z) by quadrature:
Z
dr
p =γ
β −2 n(r)2 − β −2 α2 r −2 − 1
for some γ ∈ R. The result can then be substituted into (5.11) to produce a formula for
θ(z) via a second quadrature. For example, if n(r) is constant, then the resulting curves
are, of course, straight lines, but re-expressed in cylindrical coordinates; details are left as
an exercise for the reader.
Boundary Conditions
Let us see how far we can expand the range of variationally admissible boundary condi-
tions for such functionals using the techniques of Section 4. In this situation, it is not hard
to prove, adapting the proof of Theorem 4.5, that all first order null Lagrangians have the
form
d ∂A ∂A ∂A
N (x, u, v, p, q) = A(x, u, v) = +p +q (5.12)
dx ∂x ∂u ∂v
for some function A that depends only on x, u, v. Replacing L by L + N in the preceding
computation, we arrive at the modified boundary function
   
∂L ∂A ∂L ∂A
B(x) = + ϕ(x) + + ψ(x). (5.13)
∂p ∂u ∂q ∂v

1/7/22 39 c 2022 Peter J. Olver


Thus, given a set of general first order boundary conditions†
 
u′ (a) = α u(a), v(a) , u′ (a) = β u(a), v(a) , (5.14)
the boundary term B(a) = 0 for all allowable variations if and only if
∂A ∂A
(a, u, v) = P (u, v), (a, u, v) = Q(u, v), (5.15)
∂u ∂u
where
∂L  
P (u, v) = − a, u, v, α u(a), v(a) , β u(a), v(a) ,
∂p
(5.16)
∂L  
Q(u, v) = − a, u, v, α u(a), v(a) , β u(a), v(a) .
∂q
Thus, we can construct a null Lagrangian that makes the boundary conditions (5.14)
variationally admissible if and only if we can solve (5.15), which requires the compatibility
constraint
∂P ∂Q
= . (5.17)
∂v ∂u
We conclude that, unlike the scalar case, not every set of boundary conditions is vari-
ationally admissible for a first order problem in several dependent variables. Boundary
conditions satisfying the constraint (5.17) are also known as self-adjoint boundary condi-
tions, [21].
Parametric Variational Problems
Some variational problems are more naturally formulated in terms of general parametri-
zed curves and surfaces, in which the extremal is no longer required to be the graph of a
function. For example, we earlier insisted that length-minimizing curves — geodesics — be
graphs of functions, but this restricts the class of allowable curves and may exclude those
that minimize length. In the planar case it is not difficult to show that any arc length
minimizing curve connecting two points that do not lie on a common vertical line must
be the graph of a function, namely the straight line connecting them. On the other hand,
often geodesics on more complicated surfaces, for example a torus, cannot be naturally
formulated as graphs of single-valued functions. Similarly, minimal surfaces — those that
minimize the surface area integral (2.9) — are also not necessarily graphs of single-valued
functions z = u(x, y), especially when the boundary is a complicated space curve. The
three-dimensional isometric problem of finding the closed surface of a prescribed surface
area that maximizes its enclosed volume should be formulated for general parametrized
surfaces, since (at least in rectangular coordinates) a closed surface will certainly not be
the graph of a single-valued smooth function.
It is thus of interest to formulate and study what are called parametric variational
problems. In this section, we will treat the simplest case — plane curves. We assume


Alternatively one or the other function could satisfy a fixed boundary condition; this case is
left to the reader.

1/7/22 40 c 2022 Peter J. Olver


a curve C ⊂ R 2 is parametrized by a pair of smooth (C2 ) functions (x, y) = (u(t), v(t))
for a ≤ t ≤ b. We assume that the parametrization is regular , meaning that its tangent
vector † t = (u, v) 6= 0 does not vanish. Singular points, where t = 0, include corners or
 

cusps as well as points where the curve is smooth but the parametrization is singular. We
may also want to assume that the curve is simple, although this requirement will not play
a role in our local analysis.
Proposition 5.2. Any first order parametric (parameter-independent) variational
problem for plane curves must take the form
Z b   Z b   
v  u
e u, v,  v dt,
J[ u, v ] = G u, v,  u dt = G (5.18)
a u a v
so that the Lagrangian
e v, p/q),
L(t, u, v, p, q) = p G(u, v, q/p) = q G(u,
e v, w), are functions of three variables.
where G(u, v, w), G(u,
Remark : In particular, if the curve is a graph, parametrized by (x, y) = (t, v(t)), so that
u(t) ≡ t, then (5.18) reduces to our usual form for a first order non-parametric variational
problem:
Z b
Jb[ v ] = G(x, v, v ′ ) dx.
a

Vice versa, given any non-parametric problem, we can convert it into an equivalent para-
metric form (5.18) by replacing x 7→ u, v ′ 7→ v/u, dx →
  
7 u dt.
Proof : Let
Z b
 
J[ u, v ] = L(t, u, v, u, v) dt
a

be parameter-independent. Suppose t = h(τ ) is an (orientation preserving) reparametriz-


ation of the curve, with c ≤ τ ≤ d, so that
 a = h(c), b = h(d), and h′ (τ ) > 0 for c ≤ τ ≤ d.
Then, by the chain rule d/dt = 1/h′ (τ ) d/dτ , and hence, by the usual formula for change
of variables in an integral,
Z d  
uτ vτ
J[ u, v ] = L h(τ ), u, v, ′ , ′ h′ (τ ) dτ.
c h (τ ) h (τ )
Parameter-independence requires that the latter integral have the form
Z d
L(τ, u, v, uτ , vτ ) dτ,
c


We use dots to denote derivatives with respect to the parameter t, retaining primes to denote
derivatives with respect to x.

1/7/22 41 c 2022 Peter J. Olver


and hence 
L(τ, u, v, p, q) = λ L h(τ ), u, v, p/λ, q/λ (5.19)

for all reparametrization functions h and all λ ∈ R+ . Since h is arbitrary, L must be


independent of τ . Moreover, setting λ = ± p in (5.19) implies

L(u, v, p, q) = p G(u, v, q/p), where G(u, v, w) = ± L(u, v, ±1, ± w),

reproducing the first expression in (5.18). A similar calculation setting λ = ± q produces


the second version. Q.E.D.

Let us write out the Euler–Lagrange equations (5.3, 5) for the parametric variational
problem (5.18). Using the first version involving G(u, v, w), and applying the product and
chain rules to differentiate, we find
   
∂L d ∂L  ∂G d v ∂G  ∂G v d ∂G
0 = Eu (L) = − =u + − G = −v +  ,
∂u dt ∂p ∂u dt u ∂w ∂v u dt ∂w

(5.20)
∂L d ∂L  ∂G d ∂G
0 = Ev (L) = − =u − ,
∂v dt ∂q ∂v dt ∂w
 
where G and its partial derivatives are evaluated at (u, v, v/u). We observe that the two
Euler–Lagrange equations are not independent; indeed, they are related by
 
u Eu (L) + v Ev (L) = 0. (5.21)

This result is not an accident. It is a consequence of Noether’s Second Theorem, and


results from the reparametrization invariance of the variational problem; see [42; Theorem
5.66] for full details.

Example 5.3. Consider the optics functional (5.6) in three-dimensional space, with
Lagrangian p
L(x, y, z, p, q, r) = n(x, y, z) p2 + q 2 + r 2 , (5.22)

where p, q, r represent the derivatives of x, y, z with respect to t. The three associated


Euler–Lagrange equations take the form

∂L d ∂L ∂n p d n(x, y, z) x

Ex (L) = − = (x, y, z) x2 + y 2 + z 2 − p = 0,
 
∂x dt ∂p ∂x dt x 2 + y 2 + z 2

and similarly for y and z. Replacing the t derivative by derivatives with respect to arc
length s, these can be combined into a single vector-valued equation,
 
d dx
n(x) = ∇n(x), (5.23)
ds ds
known as the ray equation. It can be viewed as the optical analogy of Newton’s laws of
mechanics, in which the right hand side plays the role of force and the left hand side can
be viewed as the rate of change of an “optical momentum”.

1/7/22 42 c 2022 Peter J. Olver


The Lagrangian (5.22) is easily seen to be parameter-independent. This implies that the
three associated Euler–Lagrange equations admit the dependency

x Ex (L) + y Ey (L) + z Ez (L) = 0,


  

or, equivalently,   
dx d dx
· ∇n(x) − n(x) = 0, (5.24)
ds ds ds
as can be verified directly. One can remove the degeneracy by using one of the coordinates
to parametrize the curve, e.g., reducing (5.6) to an objective functional depending on
y = y(x), z = z(x).

6. Second Order Variational Problems.


Another extension of the variational calculus is to variational problems involving higher
order derivatives of the functions. Here we discuss the case of a second order variational
problem involving a single unknown, and hence the objective functional has the form
Z b
J[ u ] = L(x, u, u′ , u′′ ) dx, (6.1)
a

prescribed by the Lagrangian L(x, u, p, q), which, for later purposes, we assume to be at
least four times continuously differentiable.
As usual, we introduce the variation
Z b
h(ε) = J[ u + ε ϕ ] = L(x, u + ε ϕ, u′ + ε ϕ′ , u′′ + ε ϕ′′ ) dx.
a

Any sufficiently smooth critical function u satisfies


Z b
′ d d
0 = h (0) = J[ u + ε ϕ ] = L(x, u + ε ϕ, u′ + ε ϕ′ , u′′ + ε ϕ′′ ) dx
dε ε=0 a dε ε=0
Z b 
∂L ′ ′′ ′ ∂L ′ ′′ ′′ ∂L ′ ′′
= ϕ(x) (x, u, u , u ) + ϕ (x) (x, u, u , u ) + ϕ (x) (x, u, u , u ) dx (6.2)
a ∂u ∂p ∂q
Z b
= B(b) − B(a) + ϕ(x) E(L) dx,
a

where the final formula results from integrating the second and third terms in the preceding
integral by parts. Here
   
∂L ′ ′′ d ∂L ′ ′′ d2 ∂L ′ ′′
E(L) = (x, u, u , u ) − (x, u, u , u ) + 2 (x, u, u , u ) , (6.3)
∂u dx ∂p dx ∂q
is the associated Euler–Lagrange expression, which will, in general, depend on fourth order
derivatives of u, while the variational boundary terms are obtained by evaluating

B(x) = g(x) ϕ(x) + h(x) ϕ′ (x), (6.4)

1/7/22 43 c 2022 Peter J. Olver


where
∂L d ∂L ∂L
g(x) = (x, u, u′ , u′′ ) − (x, u, u′ , u′′ ), h(x) = (x, u, u′ , u′′ ), (6.5)
∂p dx ∂q ∂q
at the endpoints x = a, b.
If ϕ(x) has compact support in ( a, b ) for each ε, then the boundary terms in (6.2) both
vanish: B(a) = B(b) = 0. Thus, applying the Fundamental Lemma 3.4, we conclude that
any critical function u must satisfy the fourth order Euler–Lagrange equation E(L) = 0.
This differential equation will be supplemented by four boundary conditions — two at
each endpoint. There are four possible pairs that ensure that the boundary terms in (6.2)
vanish. Working at the left hand endpoint x = a, we first note that if we have the fixed
boundary condition u(a) = α, then the variation satisfies ϕ(a) = 0 and hence the first term
in B(a) vanishes; otherwise, we need to impose a natural boundary condition of the form
g(a) = 0. Similarly, if we have u′ a) = β, then ϕ′ (a) = 0 and hence the second term in B(a)
vanishes; otherwise, we need to impose a natural boundary condition of the form h(a) = 0.
There are thus four possible pairs of variationally admissible boundary conditions at the
endpoint:

u(a) = α, u′ (a) = β; u(a) = α, g(a) = 0; u′ (a) = β, h(a) = 0; g(a) = h(a) = 0,


where g(x) and h(x) are given in (6.5). Similar conditions can be imposed at the right
hand endpoint x = b, leading to a total of 16 possible boundary value problems in which
the boundary conditions are uncoupled; one can also impose coupled boundary conditions
af several types, including periodic conditions

u(a) = u(b), u′ (a) = u′ (b), u′′ (a) = u′′ (b), u′′′ (a) = u′′′ (b).
One can expand the range of variationally compatible boundary conditions by adding in
a null Lagrangian although, unlike first order problems, not all boundary conditions are
admissible. The full analysis of the various possibilities can be found in [48].
With this in hand, the general case of variational problems depending on higher order
derivatives of several unknowns will be clear; see [42] for details and general formulas, in-
cluding the case of several independent variables (multiple integrals) that will be presented
below.
Example 6.1. The simplest nontrivial second order variational problem is
Z b
1 ′′ 2
J[ u ] = 2u dx (6.6)
a

that represents the potential energy of a flexible beam under small deformations, [3, 49].
An example is a spline, which, in pre–CAD (computer aided design) draftsmanship, was
a long, thin, flexible strip of wood that was used to draw a smooth curve through pre-
scribed points. The points were marked by small pegs, and the spline rested on the pegs.
These inspired the modern mathematical theory of splines, which have become ubiquitous
in numerical analysis, in geometric modeling, in design and manufacturing, in computer
graphics and animation, and in many other applications.

1/7/22 44 c 2022 Peter J. Olver


1
The Lagrangian is L = 2 q 2 , and the Euler–Lagrange equation (6.3) is simply

u′′′′ = 0, (6.7)
whose solutions are cubic polynomials:

u(x) = a x3 + b x2 + c x + d. (6.8)
The coefficients are prescribed by the four imposed boundary conditions. Using (6.5), we
see that, in order to be variationally compatible, these must be of the following form,
designated by their physical interpretation:
(a) Fixed (clamped) end: u(a) = α, u′ (a) = β,
(b) Simply supported end: u(a) = α, u′′ (a) = 0,
(c) Sliding end: u′ (a) = β, u′′′ (a) = 0,
(d) Free end: u′′ (a) = u′′′ (a) = 0,

Example 6.2. Euler’s elastica models the bending of a thin flexible rod, cf. [3, 31, 36].
(The beam equation in Example 6.1 can be regarded a first order approximation of the
fully nonlinear elastica, under the assumption that the derivative (deformation gradient)
is small: u′ ≪ 1.) The problem was originally posed by Jacob Bernoulli in 1961, and then
essentially solved by Euler in 1744.
Remark : In Euler’s treatment, he also imposed the isoperimetric constraint on the elas-
tica that its length be fixed. This constraint can be handled by the methods introduced
in the following section; see also [31].
For simplicity, we assume that the elastica is restricted to lie in a plane, and its shape
ius a curve that can be identified with the graph of a function y = u(x) for a ≤ x ≤ b.
Let ℓ denote the length of the elastica (which can be computed using the usual arc length
integral), and let
u′′
κ= (6.9)
(1 + u′ 2 )3/2
denote its curvature, [24]. Then the equilibrium configurations minimize the elastica
functional
Z ℓ Z b
1 2 u′′ 2 dx
J[ u ] = 2 κ ds = ′ 2 5/2
. (6.10)
0 a 2 (1 + u )

The Euler-Lagrange equation thus has the form


   
d ∂L ′ ′′ d2 ∂L ′ ′′
− (x, u, u , u ) + 2 (x, u, u , u ) = 0. (6.11)
dx ∂p dx ∂q
As such, it is a very complicated nonlinear fourth order ordinary differential equation, and
solving it appears to be hopeless.
Nevertheless, in a tour-de-force, Euler figured out how to solve the Euler–Lagrange
equation (6.11) and thus describe the possible equilibrium configurations of the elastica.

1/7/22 45 c 2022 Peter J. Olver


His ingenious idea was to re-express the equation in terms of the curvature (6.9) and its
derivatives with respect to the arc length parameter, using the basic formula
d 1 d
=√ . (6.12)
ds 1 + u′ 2 dx
A lengthy but straightforward computation shows that the left hand side of (6.11) can be
re-expressed in the relatively simple form
 
1 d 1 dκ d2 κ
√ √ + 12 κ3 = 2 + 12 κ3 = 0.
1 + u′ 2 dx 1 + u′ 2 dx ds
Thus, we can determine κ(s) as a function of arclength by solving the nonlinear second
order ordinary differential equation
d2 κ 1 3
+ 2 κ = 0. (6.13)
ds2
This form of the Euler–Lagrange equation is intrinsic and does not require that the curve
be the graph of a single-val;ued function, i.e., it applies as written to all nonsingular curves
that are critical functions for the elastica functional.
It is not difficult to integrate (6.13) to determine the curvature explicitly as a function
of arc length. First, multiplying the equation by dκ/ds allows us to integrate once:
 
1 dκ 2 1 4
+ 8 κ = 18 c,
2 ds
where c is a constant of integration. Solving for dκ/ds produces an autonomous first order
ordinary differential equation, and hence we can recover κ(s) by a second integration and
inverting the result: Z
ds
√ = 21 s + δ,
c−κ 4

where δ is a second constant of integration. Now the integral on the left hand side cannot
be performed in terms of elementary functions, and is a type of elliptic integral, whose
inverse is an elliptic function. Thus, the curvature of an elastica is, in general, an elliptic
function of the arc length.
Remark : Elliptic functions and elliptic integrals are of importance in a broad range of
mathematics and applications, [41]. They first appeared in the literature in Euler’s work
on the elastica.
Finally, to recover the elastic curve, we are in need of the formulas for recovering a curve
from its curvature, [24; Theorem 5.14].
Theorem 6.3. Suppose κ(s) is the curvature of a plane curve C ⊂ R 2expressed as a
function of arc length. Then the curve itself is parametrized by x(s), y(s) , where
Z Z Z
x(s) = cos θ(s) ds + ξ, y(s) = sin θ(s) ds + η, θ(s) = κ(s) ds + ζ, (6.14)

where ξ, η, ζ are constants of integration.

1/7/22 46 c 2022 Peter J. Olver


Proof : First note that
dx dy
= cos θ(s), = sin θ(s),
ds ds
which implies that they satisfy (2.15), proving that the curve defined by (6.14) is param-
etrized by arc length.
Second, if we write out the final integral, using x as the parameter, identifying the curve
with the graph of y = u(x), we find
Z
u′′ dx −1 ′ 1 u′
θ= = tan (u ), whence cos θ = √ , sin θ = √ .
1 + u′ 2 1 + u′ 2 1 + u′ 2
The function θ is known as the turning angle of the curve, and measures the angle between
its tangent and the x-axis, which is evident from the preceding formulas. Moreover,

dθ 1 dθ u′′
= √ = = κ,
ds 1 + u′ 2 dx (1 + u′ 2 )3/2
reproducing the final formula in (6.14). Q.E.D.

Remark : Since the curvature is defined as the inverse of the radius of the osculating
circle to the curve, it is unaffected by a rigid motion, i.e., a translation or rotation of the
plane. Thus any translation or rotation of a curve produces the same curvature function
κ(s). Conversely, any two curves that have the same curvature function differ only by a
rigid motion. The constants of integration in (6.14) can be identified with the effect of
translations and rotations on the curve.

Finally, substituting the elliptic function formula for the curvature into (6.14) produces
some quite complicated integrals; nevertheless they can be straightforwardly approximated
through numerical integration, which enabled Euler to produce illustrations of a variety
of possible elastica configurations; Figure 12 is copied from his original paper and shows
several possible shapes achieved by a planar elastica. Note that not all of these are the
graphs of single-valued functions, and for the more general curves one needs to rewrite
the curvature and arc length element for a general parametrized curve x = u(t), y = v(t),
leading to a second order parametric variational problem, whose Euler–Lagrange equations
satisfy the same dependency (5.21) as we found in the first order case. See [31] for further
examples, along with an alternative method for integrating the Euler–Lagrange equation.

7. Variational Problems with Constraints.


More generally, one might seek to optimize an objective functional, say
Z b
J[ u ] = L(x, u, u′ ) dx, (7.1)
a

where the extremal u(x) is subject to one or more constraints in addition to the usual
range of boundary conditions. Such constraints come in three main flavors:

1/7/22 47 c 2022 Peter J. Olver


Figure 12. Euler’s Solutions to the Elastica.

1/7/22 48 c 2022 Peter J. Olver


• Integral constraints:
Z b
I[ u ] = F (x, u, u′ ) dx = c, (7.2)
a

where c ∈ R is a constant.
• Holonomic † constraints: these are pointwise constraints on the minimizer of the general
form‡
G(x, u) = 0. (7.3)

• Nonholonomic constraints: these are pointwise constraints on the minimizer and its
derivative of the general form

K(x, u, u′ ) = 0. (7.4)

To be genuinely nonholonomic, the constraint should not be the derivative of a


holonomic constraint (7.3), i.e., of the form

d ∂G ∂G
G(x, u) = + u′ = 0,
dx ∂x ∂u
which is equivalent to the holonomic constraint G(x, u) = c, where c ∈ R is the
constant of integration, which can be fixed by the imposed boundary conditions.
Nonholonomic constraints arise in mechanics and in optimal control, and are con-
siderably more subtle than the holonomic version. For full details, we refer the
reader to [7].
Before explaining how to adapt the variational calculus to constrained optimization
problems, let us introduce a few interesting examples.

Example 7.1. Queen Dido’s Problem. According to the legend of the founding of
Carthage, Queen Dido was granted all the land she could enclose within an ox hide. She
cleverly cut the hide into thin strips that she tied together to form a rope, which she then
used to maximize the area enclosed by the rope. The simplest version of her problem is
the isoperimetric problem introduced in Section 2. Here, the rope forms a closed curve in
the (flat) plane, and one seeks to maximize the area enclosed subject to its length being
fixed. The solution is well known to be a circle with perimeter ℓ; the reader may enjoy
adapting our subsequent analysis to establish this.
Since the area enclosed by a circle with perimeter ℓ = 2 π r, where r is its radius, is
given by π r 2 = ℓ2 /(4 π) which is the maximal area among all such curves, we deduce the


The word “holonomic” was coined in 1894 by the German physicist Heinrich Hertz by combining
the Greek words for “entire” and “law”.

Actually, a holonomic constraint on a scalar function is a bit silly, as it would essentially
uniquely prescribe the function u(x), and hence the optimization would be trivial. Thus, holo-
nomic constraints will only play a role when the objective functional depends more than one
function: u(x) = (u1 (x), . . . , un (x)) for n ≥ 2.

1/7/22 49 c 2022 Peter J. Olver


u(x)

−a a
Figure 13. Queen Dido’s Problem.

important isoperimetric inequality: If C ⊂ R 2 is a simple closed curve of length ℓ enclosing


a domain of area A, then
ℓ2
A≤ , (7.5)

with equality if and only if the curve is a circle. Vice versa, if we fix A and ask for
the closed curve of shortest length whose enclosed area equals A, then √ the isoperimetric
inequality (7.5) implies that the curve must be a circle of length ℓ = 2 πA. This is the
reason why drop of oil on the surface of a bowl of soup assumes a circular shape: molecular
forces cause it to minimize the length of its boundary. The similar three-dimensional result
implies that a soap bubble enclosing a fixed volume of air must, in the absence of external
forces, assume a spherical shape, as will a globule of oil or water in outer space.
A slightly more challenging version of Dido’s isoperimetric problem is when the area to
be enclosed is on the shore of the Mediterranean Sea which, for simplicity, we assume to
have a straight shoreline, which we take to be the x axis with the land lying above it. We
seek the curve lying on or above the axis of a given length ℓ that connects two points,
say (− a, 0), (a, 0), with a > 0, that maximizes the area enclosed by it and the x axis;
see Figure 13. If the curve is the graph of a function y = u(x) ≥ 0, then the variational
problem is to maximize the area functional
Z a
J[ u ] = u dx, (7.6)
−a

among all non-negative functions that satisfy the following length constraint and boundary
conditions: Z a p
L[ u ] = 1 + u′ 2 dx = ℓ, u(− a) = u(a) = 0. (7.7)
−a

The endpoints can either be fixed or variable, and the reader may enjoy speculating as to
the shape of the solution in each case before we solve the problem later in this section.
Example 7.2. Geodesics on Surfaces. As noted in Section 2, the surface geodesic
problem is to find the shortest path between two points in R 3 , but with a constraint that

1/7/22 50 c 2022 Peter J. Olver


the curve must lie on a prescribed surface, say that defined by an implicit equation

G(x, y, z) = 0. (7.8)

For example, the geodesics on a sphere are parametrized curves u(t) = x(t), y(t), z(t)
satisfying the constraint
x2 + y 2 + z 2 = r 2 , (7.9)
where 0 < r ∈ R is the radius of the sphere, that minimize the arc length functional
Z bq
x2 + y 2 + z 2 dt
 
J[ u ] = (7.10)


among all curves between the initial and final points u(a) = α, u(a) = β, that both
lie on the sphere. As written, the arc length functional is independent of parameter, and
hence one expects the geodesic equations to be underdetermined, as the form discussed in
Section 5.
As with any holonomic constraint, there are two evident methods one can employ to
solve this problem. One is to explicitly parametrize the surface — for example in the
case of a sphere (7.9), one can use spherical coordinates. However, finding an explicit
parametrization of a complicated implicitly defined surface (7.8) may not be so easy. The
other approach is to use the methods based on Lagrange multipliers, to be presented below.

Lagrange Multipliers
Let us begin by reviewing the calculus of Lagrange multipliers for finite-dimensional con-
strained optimization problems, named in honor of Lagrange, who pioneered the method.
The Lagrange multiplier calculus will then be adapted to the infinite-dimensional situation
under consideration.
To keep the calculations simple, let us concentrate first on the three-dimensional case of
finding the minimum value of a scalar-valued objective function F (x) = F (x, y, z) when
its arguments (x, y, z) are constrained to lie on an implicitly defined surface S ⊂ R 3 , as
given by the constraint of the form (7.8). Suppose x⋆ = (x⋆ , y ⋆ , z ⋆ )T ∈ S is a (local)
T
minimum for the constrained objective function. Let x(t) = ( x(t), y(t), z(t) ) ⊂ S be
any curve contained within the surface that passes through the minimum, with x(0) = x⋆ ,
x′ (0) = v. Then the scalar function g(t) = F (x(t)) must have a local minimum at t = 0,
and hence, by the chain rule,
d
0 = g ′ (0) = F (x(t)) = ∇F (x(0)) · x′ (0) = ∇F (x⋆ ) · v . (7.11)
dt t=0

Thus, the gradient of the objective function at the surface minimum must be orthogonal
to the tangent vector to the curve. Since the curve was constrained to lies entirely in S, its
tangent vector v is tangent to the surface at the point x⋆ . Since, by the Implicit Function
Theorem, [4, 35], every tangent vector to a sufficiently regular surface is tangent to some
curve contained therein, ∇F (x⋆ ) must be orthogonal to every tangent vector, and hence
point in the normal direction to the surface.

Thus, a constrained critical point x⋆ ∈ S of a function on a surface is defined so that

∇F (x⋆ ) = λ n, (7.12)
where n denotes a (nonzero) normal to the surface at the point x⋆ . The scalar factor λ
is known as the Lagrange multiplier . The value of the Lagrange multiplier is not fixed
a priori, but must be determined by solving the critical point system (7.12). The same
reasoning applies to local maxima, which are also constrained critical points. The nature of
a constrained critical point — local minimum, local maximum, local saddle point, etc. —
is, in favorable cases, determined by a constrained second derivative test; see [47; Section
4] for details.
For an implicitly defined surface (7.8), the gradient vector ∇G(x) points in the normal
direction to the surface at the point x ∈ S, and hence, provided the regularity condition
n = ∇G(x) ≠ 0 holds, the surface critical point condition (7.12) (dropping the star) can
be rewritten as
∇F (x) = λ ∇G(x). (7.13)
Thus, to find the constrained critical points, one needs to solve the combined system (7.8,
13) consisting of four equations in four unknowns x, y, z and the Lagrange multiplier λ.
Formally, one can reformulate the problem as an unconstrained optimization problem by
introducing the augmented objective function

K(x, y, z, λ) = F (x, y, z) − λ G(x, y, z). (7.14)


The critical points of the augmented function are where its gradient, with respect to all four
arguments, vanishes. Setting the partial derivatives with respect to x, y, z to 0 reproduces
the system (7.13), while its partial derivative with respect to λ reproduces the constraint
(7.8).
The same ideas can be applied to optimization problems involving objective functions
depending on several variables that are subject to multiple constraints. Suppose we seek
to minimize the objective function F (x) = F (x1 , . . . , xn ) under the constraints

G1 (x) = · · · = Gk (x) = 0. (7.15)


A point x satisfying the constraints is termed regular if the corresponding gradient vectors
∇G1 (x), . . . , ∇Gk (x) are linearly independent. (Singular points are more tricky, and must
be handled separately.) Applying the same calculation, a regular constrained critical point
necessarily satisfies the vector equation

∇F (x) = λ1 ∇G1 (x) + · · · + λk ∇Gk (x), (7.16)


where the unspecified scalars λ1 , . . . , λk are called the Lagrange multipliers. The critical
points are thus found by solving the combined system (7.15–16) for the n + k variables
x1 , . . . , xn , λ1 , . . . , λk . As in (7.14) we can reformulate this as an unconstrained optimiza-
tion problem for the augmented objective function
        K(x, λ) = F (x) − Σ_{i=1}^{k} λi Gi (x).                               (7.17)

The gradient with respect to x reproduces the critical point system (7.16), while its gradient
with respect to λ = (λ1 , . . . , λk ) recovers the constraints (7.15). Putting the pieces
together, we have proved:

Theorem 7.3. Every regular constrained local minimum and local maximum is a
constrained critical point that satisfies the Lagrange multiplier equation (7.16).
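
For readers who wish to experiment, the combined system (7.15–16) is straightforward to
solve numerically. The following minimal Python sketch (an illustration, not part of the
theory) applies a standard root finder to the toy problem of optimizing F (x, y, z) = x + y + z
on the unit sphere G(x, y, z) = x² + y² + z² − 1 = 0, whose constrained critical points are
± (1, 1, 1)/√3 with multipliers λ = ± √3/2:

    # Solve grad F = lambda grad G together with G = 0 (Theorem 7.3) for
    # F = x + y + z on the unit sphere, using a generic nonlinear root finder.
    import numpy as np
    from scipy.optimize import fsolve

    def system(z):
        x, lam = z[:3], z[3]
        gradF = np.ones(3)              # gradient of F = x + y + z
        gradG = 2.0 * x                 # gradient of G = |x|^2 - 1
        return np.concatenate([gradF - lam * gradG, [x @ x - 1.0]])

    z0 = np.array([1.0, 1.0, 1.0, 1.0])      # initial guess
    sol = fsolve(system, z0)
    print(sol[:3], sol[3])   # from this guess: (1,1,1)/sqrt(3), lambda = sqrt(3)/2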

Similarly, solving a constrained variational problem can be effected by the introduction of


suitable Lagrange multipliers. For integral constraints, the Lagrange multiplier is a scalar,
whereas for holonomic and nonholonomic constraints it is a function. (Intuitively, each
individual point where the constraint is imposed requires a separate Lagrange multiplier,
and the net result is that the collection of all such Lagrange multipliers can be viewed as
a function in its own right.) One then modifies the objective functional by adding in all
the constraints multiplied by their respective Lagrange multipliers. For example, given
(7.1) subject to an integral constraint (7.2) and a holonomic constraint (7.3), we form the
augmented variational problem
        Ĵ[ u, λ, µ ] = ∫_a^b L(x, u, u′) dx − λ [ ∫_a^b F (x, u, u′) dx − c ] − ∫_a^b µ(x) G(x, u) dx
                                                                               (7.18)
                     = ∫_a^b [ L(x, u, u′) − λ F (x, u, u′) − µ(x) G(x, u) ] dx + λ c,

where λ ∈ R is the constant Lagrange multiplier corresponding to the integral constraint,


whereas µ(x) is a Lagrange multiplier function associated with the pointwise holonomic
constraint. A justification of this construction appears at the end of this section.
To determine the first variation of the resulting augmented variational problem (7.18),
we vary both u(x) and the Lagrangian multipliers λ and µ(x), and thus compute the
derivative of the scalar function
        h(ε) = Ĵ[ u + ε ϕ, λ + ε ξ, µ + ε η ]
             = ∫_a^b [ L(x, u + ε ϕ, u′ + ε ϕ′) − (λ + ε ξ) F (x, u + ε ϕ, u′ + ε ϕ′) − (µ + ε η) G(x, u + ε ϕ) ] dx + (λ + ε ξ) c,
where ϕ(x), η(x) are functions, while ξ ∈ R is a scalar. We find
        h′(0) = ∫_a^b [ ϕ ( ∂L/∂u − λ ∂F/∂u − µ ∂G/∂u ) + ϕ′ ( ∂L/∂p − λ ∂F/∂p ) − ξ F − η G ] dx + ξ c.
Integrating the term involving ϕ′ by parts, and keeping in mind that λ and ξ are both
constant, produces
        h′(0) = [ ϕ ( ∂L/∂p − λ ∂F/∂p ) ]_{x=a}^{b} − ξ ( ∫_a^b F dx − c )

                  + ∫_a^b { ϕ [ ∂L/∂u − d/dx (∂L/∂p) − λ ( ∂F/∂u − d/dx (∂F/∂p) ) − µ ∂G/∂u ] − η G } dx.

The variations ϕ, ξ, η are independent, and so we can set the coefficients of each to 0.
First, if we impose fixed boundary conditions at x = a, b, then ϕ(a) = ϕ(b) = 0, and hence
the boundary terms vanish. Second, the coefficient of ξ must vanish, which recovers the
integral constraint (7.2). Applying the analogue of the Fundamental Lemma 3.4 to the first
integral, the coefficient of the variation ϕ produces the associated system of constrained
Euler–Lagrange equations, while the coefficient of η reproduces the holonomic constraint:
        ∂L/∂u − d/dx (∂L/∂p) − λ ( ∂F/∂u − d/dx (∂F/∂p) ) − µ ∂G/∂u = 0,    ∫_a^b F (x, u, u′) dx = c,    G(x, u) = 0.
                                                                               (7.19)
In practice, one can solve the Euler–Lagrange equation for u(x) in terms of λ and µ(x)
and then substitute the result in the constraints to fix the latter’s values and hence the
candidate solution.
Let us treat a couple of illustrative examples based on this methodology.
Example 7.4. Consider the Mediterranean version (7.6–7) of Queen Dido’s Problem
of maximizing the area under the graph of a function subject to a length constraint and
imposed boundary conditions. Let λ ∈ R denote the Lagrange multiplier corresponding to
the length constraint, so our augmented functional is
        J̃[ u ] = ∫_{−a}^{a} ( u − λ √(1 + u′²) ) dx + λ ℓ.

The constrained Euler–Lagrange equation

        1 + d/dx [ λ u′ / √(1 + u′²) ] = 0    can immediately be integrated:    λ u′ / √(1 + u′²) = c − x,

where c ∈ R is the integration constant. Solving for

        u′ = ± (x − c) / √( λ² − (x − c)² ),

and then integrating again, we find u = d ± √( λ² − (x − c)² ), where d ∈ R. Equivalently,

        (x − c)² + (u − d)² = λ²,
and so the critical function is a circular arc. Thus, the candidate maximizer is the circular
arc of length ℓ ≥ 2 a passing through the points (− a, 0) and (a, 0), with c = 0 and hence
the center (0, d) of the circle lies on the y axis, midway between the two points. Such a
circular arc coincides with the graph of a function y = u(x) provided† 2 a < ℓ ≤ π a, the
upper bound corresponding to a semicircle centered at the midpoint on the x axis (which is
not differentiable at the endpoints). In fact, the parametric version of the problem implies
that the solution continues to be a circular arc even when ℓ > π a and hence the only
actual length requirement is ℓ ≥ 2 a. Figure 14 illustrates a few representative examples
of such area maximizers.
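
The radius and center of the maximizing arc are easily computed numerically: the arc
through (± a, 0) of radius λ centered at (0, d), with d = − √(λ² − a²) ≤ 0, subtends the
half-angle arcsin(a/λ), so its length is ℓ = 2 λ arcsin(a/λ). A minimal Python sketch
(with illustrative data, assuming 2 a < ℓ ≤ π a so that the arc is a graph):

    # Given a and ell with 2a < ell <= pi*a, solve 2*lam*arcsin(a/lam) = ell
    # for the radius lam of the circular arc in Example 7.4.
    import numpy as np
    from scipy.optimize import brentq

    a, ell = 1.0, 2.5                                # sample data
    lam = brentq(lambda L: 2*L*np.arcsin(a/L) - ell, a, 1e6)
    d = -np.sqrt(lam**2 - a**2)                      # center (0, d) on the y axis
    area = lam**2 * np.arcsin(a/lam) + a*d           # enclosed area (7.6)
    print(lam, d, area)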


† When ℓ = 2 a, the solution is a straight line, which does not appear among the solutions to the
Euler–Lagrange equation. Effectively it corresponds to the Lagrange multiplier λ = ∞.

Figure 14. Queen Dido’s Problem.

Remark : One can further show that the semicircle is the solution to the variable endpoint
problem, satisfying the natural boundary condition that it be orthogonal to the x axis.
The reader may wish to analyze the case when only one endpoint is fixed.
Example 7.5. A relatively simple variational problem with a holonomic constraint is
that of geodesics on an implicitly defined surface S ⊂ R³, given an equation of the form
F (x, y, z) = c (7.20)
for c ∈ R (which could be set equal to zero by modifying F ). We will assume the surface is
non-empty, meaning there exist real solutions to the implicit equation (7.20) and, moreover,
is regular, meaning that
        ∇F (x, y, z) ≠ 0
whenever (x, y, z) satisfy (7.20); this ensures that the surface is smooth, without corners,
cusps, or other singularities. Note that ∇F defines a normal vector to the surface at each
point. For example, the unit sphere is defined by the implicit equation x² + y² + z² = 1,
and the normal vector ∇F = 2 (x, y, z) is the outwards pointing radial vector.
A geodesic between two points A, B ∈ S is, by definition, a curve C ⊂ S that provides
a minimum for the arc length functional
        J[ x ] = ∫_a^b √( ẋ² + ẏ² + ż² ) dt,

among all parametrized curves x(t) = ( x(t), y(t), z(t) ) that satisfy the holonomic con-
straint (7.20) for each a ≤ t ≤ b and join the points A = x(a), B = x(b). Applying
the method of Lagrange multipliers, we see that every geodesic must satisfy the system of
second order ordinary differential equations
        − d/dt [ ẋ / √( ẋ² + ẏ² + ż² ) ] = µ ∂F/∂x,
        − d/dt [ ẏ / √( ẋ² + ẏ² + ż² ) ] = µ ∂F/∂y,
        − d/dt [ ż / √( ẋ² + ẏ² + ż² ) ] = µ ∂F/∂z,

where µ(t) is the Lagrange multiplier associated with the constraint (7.20). The left hand
side is the Euler–Lagrange expression associated with the arc length functional. We can
combine these into a single vector-valued equation,

        d/dt [ (1/‖ dx/dt ‖) dx/dt ] = − µ ∇F (x).                             (7.21)
The geodesic problem is parameter-independent since we can reparametrize the geodesic
without affecting either its length or the fact that it is constrained to lie in the surface.
(The reparametrization does affect the Lagrange multiplier.) This implies a degeneracy
among the Euler–Lagrange equations that takes the form

        ẋ Ex (L̃) + ẏ Ey (L̃) + ż Ez (L̃) = 0,

or, equivalently,

        (dx/dt) · ( d/dt [ (1/‖ dx/dt ‖) dx/dt ] + µ ∇F (x) ) = 0,

as the reader can verify. Keep in mind that if x(t) satisfies (7.20), then ẋ · ∇F = 0, meaning
that the tangent vector to the geodesic is orthogonal to the normal vector to the surface.
For example, in the case of the unit sphere, one easily checks that the great circle through
the north pole parametrized by
        x = √(1 − t²),    y = 0,    z = t,    µ(t) = 1/√(1 − t²),              (7.22)
satisfies the Euler–Lagrange equation (7.21). Since the Euler–Lagrange equations are
invariant under rotations of the sphere, we deduce that the solutions, and candidate
geodesics, are the great circles. With some further work one can show that the minimizing
geodesic between any two points on the sphere is the shorter of the two great circles con-
necting them — unless the points are antipodal in which case all the semi-circular great
circles connecting them have the same minimal arc length. For instance, the great circle
(7.22) is the unique geodesic connecting its endpoints provided 0 ≤ t ≤ b < 1. When
b = 1, it is a semicircle from north to south poles. The south pole, where the nature of
the geodesics changes, is known as a conjugate point to the north pole. We will study
conjugate points in depth in Section 8.
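
For numerical purposes, it is convenient to use the arc length parametrization, where
‖ dx/dt ‖ ≡ 1, so that (7.21) reduces to ẍ = − µ ∇F (x); differentiating the constraint
twice then determines µ, and on the unit sphere one obtains the familiar system
ẍ = − ‖ ẋ ‖² x. The following minimal Python sketch (an illustration only) integrates
this system and recovers a great circle:

    # Geodesics on the unit sphere via (7.21) in arc length parametrization:
    # differentiating |x|^2 = 1 twice gives x'' = -|x'|^2 x.
    import numpy as np
    from scipy.integrate import solve_ivp

    def rhs(t, z):
        x, v = z[:3], z[3:]
        return np.concatenate([v, -(v @ v) * x])

    x0 = [1.0, 0.0, 0.0]                 # point on the equator
    v0 = [0.0, 0.0, 1.0]                 # unit tangent toward the north pole
    sol = solve_ivp(rhs, [0, np.pi], x0 + v0, rtol=1e-10, atol=1e-10)
    print(sol.y[:3, -1])                 # arc length pi leads to the antipode
    print(np.linalg.norm(sol.y[:3, -1])) # remains on the sphere, norm = 1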

Justification of the Method of Lagrange Multipliers


Let us justify the Lagrange multiplier construction for constrained variational problems
that was employed above. For simplicity, we will separately analyze the cases of a single
integral constraint and then a single holonomic constraint, leaving the reader to extend
the arguments to multiple constraints.
Let us start with a single holonomic constraint. Since, as noted above, this makes little
sense for a single function, we consider a variational problem
        J[ u ] = ∫_a^b L(x, u, u′) dx                                          (7.23)

depending on several unknown functions u(x) = (u1 (x), . . . , un (x)) of a scalar variable
x ∈ R. We seek to minimize the functional (7.23), subject to, say, fixed boundary
conditions
u(a) = α, u(b) = β, (7.24)
whenever u(x) satisfies the holonomic constraint

G(x, u) = 0. (7.25)

Suppose that u(x) is a local minimizer. To obtain necessary conditions, the first step
is to perform a variation. Since the constraint will probably not allow linear variations
of the type considered above, we let uε (x) = u(ε, x) be a one-parameter family of func-
tions, depending on (sufficiently small) ε ∈ R, satisfying the boundary conditions and the
constraint† :
G(x, uε (x)) = 0, uε (a) = α, uε (b) = β, (7.26)
We further assume that, at ε = 0, the function u0 (x) = u(x) is our candidate minimizer.
Set
        d/dε |_{ε=0} uε (x) = ∂u/∂ε (0, x) = ϕ(x).                             (7.27)
Differentiating the boundary conditions with respect to ε, and then setting ε = 0, we
see that the linearized variation ϕ must satisfy the homogeneous boundary conditions
ϕ(a) = 0, ϕ(b) = 0. Similarly differentiating the constraint (7.26) produces

g(x) · ϕ(x) = 0, where g(x) = ∇u G(x, u(x)) (7.28)


is the gradient of G with respect to the variables u1 , . . . , un , evaluated at the minimizer
(x, u(x)). As in the finite-dimensional constrained optimization problem, we make the non-
degeneracy assumption that g(x) ≠ 0 at each point x. The Implicit Function Theorem,
[4, 35], implies that, at each fixed x, the constraint equation defines an (n − 1)-dimensional
submanifold. Moreover, these submanifolds depend smoothly on x, which in turn
guarantees the existence of appropriate variations uε (x).
If u is a minimizer, then the scalar function h(ε) = J[ uε ] has a local minimum at ε = 0.
Differentiating with respect to ε, setting ε = 0, and integrating the result by parts as
before, we see that the induced boundary terms vanish, producing the variational equation
        ∫_a^b E(L) · ϕ(x) dx = 0                                               (7.29)

in which
        E(L) = ( E1 (L), . . . , En (L) ),    where    Ei (L) = ∂L/∂ui − d/dx (∂L/∂pi),    (7.30)


† Thus, the family plays the role of the curve lying in the constraint submanifold in the finite-
dimensional case.

is the Euler–Lagrange expression corresponding to ui , with pi representing its derivative;
see (5.3). At this point, we are unable to apply the Fundamental Lemma 3.4 as stated
to (7.29) because the variation ϕ is not arbitrary, but must also satisfy the linearized
constraint (7.28). However, we can circumvent this difficulty by the following device.
Given a general compactly supported vector-valued function ψ(x) = (ψ1 (x), . . . , ψn (x)),
let us set

        ϕ(x) = ψ(x) − [ ( ψ(x) · g(x) ) / ‖ g(x) ‖² ] g(x).
It is easily seen that ϕ(x) is also of compact support and, moreover, satisfies the linearized
constraint (7.28). Substituting into (7.29),
        0 = ∫_a^b E(L) · ϕ(x) dx = ∫_a^b [ E(L) · ψ(x) − ( ( ψ(x) · g(x) ) / ‖ g(x) ‖² ) E(L) · g(x) ] dx
                                                                               (7.31)
          = ∫_a^b ( E(L) − µ(x) g(x) ) · ψ(x) dx,

where
        µ(x) = ( E(L) · g(x) ) / ‖ g(x) ‖²,                                    (7.32)
which will play the role of our Lagrange multiplier. Now since ψ(x) is arbitrary, we can
apply the Fundamental Lemma 3.4 to the latter integral in (7.31), and conclude that the
constrained minimizer u(x) must satisfy the system

E(L) − µ(x) g(x) = E(L)(x, u(x), u′(x), u′′ (x)) − µ(x) ∇u G(x, u(x)) = 0, (7.33)

which is precisely the Lagrange multiplier equation introduced above. Indeed, (7.32) even
provides an explicit formula for the Lagrange multiplier function µ(x).
In the case of several holonomic constraints

G1 (x, u) = · · · = Gk (x, u) = 0, (7.34)

a similar calculation produces the associated Lagrange multiplier equation. In detail, we


set gi (x) = ∇u Gi (x, u(x)), and require the regularity condition that the gradient vectors
g1 (x), . . . , gk (x) are linearly independent at each point x. Let h1 (x), . . . , hk (x) be “dual”
functions satisfying

        gi (x) · hj (x) = { 1, i = j;   0, i ≠ j }.                            (7.35)
For example, we can set

        H(x) = ( h1 (x), . . . , hk (x) ) = G(x) ( G(x)T G(x) )⁻¹,    where    G(x) = ( g1 (x), . . . , gk (x) )

are the n × k matrices with the indicated columns; the invertibility of the positive
definite Gram matrix G(x)T G(x) follows from our regularity assumption that the columns
of G(x) are linearly independent, [49].
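
In matrix form, this construction is one line of linear algebra; a quick numerical sketch
(with random data standing in for the gradients at a fixed x):

    # Dual functions via the Gram matrix: H = G (G^T G)^{-1}, so G^T H = I.
    import numpy as np

    rng = np.random.default_rng(1)
    G = rng.standard_normal((5, 2))     # n = 5, k = 2; generically full rank
    H = G @ np.linalg.inv(G.T @ G)      # columns h_1, ..., h_k
    print(G.T @ H)                      # identity matrix, up to round-off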

As before, let uε (x) be a one-parameter family of functions satisfying the constraints
with u0 = u. Thus, ϕ(x) = ∂uε /∂ε | ε = 0 is subject to the linear constraints
gi (x) · ϕ(x) = 0, i = 1, . . . , k. (7.36)
If u is a minimizer, then h(ε) = J[ uε ] has a local minimum at ε = 0. Differentiating, we
see that the variation ϕ(x) satisfies (7.29). Generalizing the preceding argument, given a
general compactly supported variation ψ(x), we set
        ϕ(x) = ψ(x) − Σ_{i=1}^{k} ( gi (x) · ψ(x) ) hi (x).
In view of (7.35), we readily see that ϕ(x) is compactly supported and satisfies the lin-
earized constraints (7.36). Thus, by a similar calculation,
        0 = ∫_a^b E(L) · ϕ(x) dx = ∫_a^b ( E(L) − Σ_{i=1}^{k} µi (x) gi (x) ) · ψ(x) dx,

where the Lagrange multiplier functions are given by

        µi (x) = E(L) · hi (x),    i = 1, . . . , k.
Since ψ is arbitrary, the Fundamental Lemma 3.4 implies the minimizer u(x) must satisfy
the Lagrange multiplier system of Euler–Lagrange equations
        E(L) − Σ_{i=1}^{k} µi gi = E(L) − Σ_{i=1}^{k} µi ∇u Gi = 0.

Next, consider the minimization problem for the scalar functional (7.1) subject to the
integral constraint (7.2) and fixed boundary conditions (7.24). Suppose that u(x) is a
local minimizer. Let uε (x) = u(ε, x) be a one-parameter family of functions, depending on
(sufficiently small) ε ∈ R, with u0 = u, that satisfies the constraint, so
        I[ uε ] = ∫_a^b F (x, uε , u′ε) dx = c.                                (7.37)
Set ϕ(x) = ∂uε /∂ε | ε = 0 , which, as usual, satisfies the homogeneous fixed boundary con-
ditions. Differentiating (7.37) with respect to ε, setting ε = 0, and integrating the result
by parts, noting that the induced boundary terms vanish, we find

        ∫_a^b E(F ) ϕ(x) dx = 0,    where    E(F ) = ∂F/∂u − d/dx (∂F/∂p)      (7.38)
is the Euler–Lagrange expression associated with the integrand F (x, u, p) evaluated at the
function u(x). On the other hand, the variation of the functional (7.1) is given by our
usual formula (3.11), and hence we require
        ∫_a^b E(L) ϕ(x) dx = 0
whenever ϕ(x) satisfies (7.38). To proceed, we are in need of a constrained version of the
Fundamental Lemma 3.4.

Lemma 7.6. Let f (x) and g(x) be continuous on [ a, b ]. Suppose ∫_a^b f (x) ϕ(x) dx = 0
for every compactly supported C∞ function ϕ(x) that satisfies ∫_a^b g(x) ϕ(x) dx = 0. Then
f (x) ≡ λ g(x) for all a ≤ x ≤ b, for some λ ∈ R.
Proof : If g(x) ≡ 0, there is no constraint and the result reduces to the usual Fundamental
Lemma 3.4. Otherwise, when g(x) ≢ 0, there exists a C∞ function η(x) with compact
support such that ∫_a^b g(x) η(x) dx = 1. Indeed, the Fundamental Lemma 3.4 implies that
there exists a C∞ function ξ(x) with compact support such that ∫_a^b g(x) ξ(x) dx = c ≠ 0,
and we can set η(x) = ξ(x)/c.
Let ψ(x) be an arbitrary C∞ function with compact support. Then

        ϕ(x) = ψ(x) − η(x) ∫_a^b g(y) ψ(y) dy    satisfies    ∫_a^b g(x) ϕ(x) dx = 0,

and hence

        0 = ∫_a^b f (x) ϕ(x) dx = ∫_a^b f (x) ψ(x) dx − ∫_a^b f (x) η(x) ( ∫_a^b g(y) ψ(y) dy ) dx

          = ∫_a^b f (x) ψ(x) dx − ∫_a^b g(x) ψ(x) dx ∫_a^b f (y) η(y) dy = ∫_a^b ( f (x) − λ g(x) ) ψ(x) dx,

where
        λ = ∫_a^b f (x) η(x) dx.

Since ψ is arbitrary, we can apply the Fundamental Lemma 3.4 to the final integral to
conclude that f (x) ≡ λ g(x). Q.E.D.
In the case of several constraints, one extends Lemma 7.6 as follows. Suppose f (x) and
g1 (x), . . . , gk (x) are continuous on [ a, b ]. We can assume, without loss of generality‡ , the
regularity condition that the latter are “linearly independent” in the sense that there exist
C∞ functions ξ1 (x), . . . , ξk (x) with compact support such that the k × k matrix A with
entries aij = ∫_a^b gi (x) ξj (x) dx is nonsingular. Then ∫_a^b f (x) ϕ(x) dx = 0 for every C∞
function ϕ(x) with compact support in ( a, b ) that satisfies the constraints ∫_a^b gi (x) ϕ(x) dx = 0
for i = 1, . . . , k if and only if there exist λ1 , . . . , λk ∈ R such that

        f (x) ≡ λ1 g1 (x) + · · · + λk gk (x),    a ≤ x ≤ b.                   (7.39)


† In the third equality, we just switched the dummy integration variables x ↔ y.

‡ If not, just reduce the list of functions to a linearly independent subset.

The proof is similar. The first step is to set ηi (x) = Σ_{j=1}^{k} bij ξj (x), where B = A⁻¹ is the
inverse matrix, whereby

        ∫_a^b gi (x) ηj (x) dx = { 1, i = j;   0, i ≠ j }.
Let ψ(x) be an arbitrary C∞ function with compact support. Then

        ϕ(x) = ψ(x) − Σ_{j=1}^{k} ηj (x) ∫_a^b gj (y) ψ(y) dy    satisfies    ∫_a^b gi (x) ϕ(x) dx = 0,    i = 1, . . . , k,

and hence, performing the same calculation, we conclude that (7.39) holds with

        λi = ∫_a^b f (x) ηi (x) dx.

Optimal Control
The basic problem of optimal control theory is to produce a suitable control function
u(t) depending on the time variable t that governs the state function v(t), which solves an
initial value problem for a differential equation involving the control, say
        dv/dt = K(t, u, v),    v(a) = α.                                       (7.40)
The goal is to determine the control function that optimizes an objective functional of the
form
        J[ u, v ] = ∫_a^b L(t, u, v) dt,                                       (7.41)

known as the performance measure, where the final time t = b can be either fixed or
variable, depending on the problem. Thus, from the viewpoint of the calculus of variations,
the optimal control problem is to optimize the objective functional (7.41) when subject to
the nonholonomic constraint and initial condition (7.40).
Optimal control problems arise in a very broad range of applications, including control
of mechanical devices, e.g., oscillators, mechanisms, and robots, of vehicles, e.g., aircraft,
rockets, space ships, and submarines, of chemical reactions, of electrical circuits and ther-
mostats, and so on. Details can be found in textbooks on the subject, including [7, 29, 34].
Remark : Sometimes one includes a quantity dependent on the final time, and seeks to
optimize the augmented objective functional
        J[ u, v ] = S(b, v(b)) + ∫_a^b L(t, u, v) dt.                          (7.42)

For example, one might seek to minimize the final time itself, in which case L ≡ 0 and
S(b, v) = b. However, by subtracting the constant term S(a, v(a)) = S(a, α), one can

replace the boundary terms by an integrated null Lagrangian, as in (4.9), and use the
equivalent augmented objective functional
        J̃[ u, v ] = ∫_a^b L̃(t, u, v) dt,

where, using (7.40),

        L̃(t, u, v) = L(t, u, v) + d/dt S(t, v) = L(t, u, v) + ∂S/∂t (t, v) + K(t, u, v) ∂S/∂v (t, v).
Hence, there is no loss of generality in considering a performance measure of the original
form (7.41).
Remark : Another common constraint is to bound the amplitude of the control function
due to physical limitations, e.g., require | u(t) | ≤ c for some c > 0. We will not attempt
to deal with such inequality constraints here, and refer the reader to the above-cited texts
for details.
The method of Lagrange multipliers works as before. We replace the constrained varia-
tional problem by the augmented functional
        Ĵ[ u, v, µ ] = ∫_a^b [ L(t, u, v) − µ ( v̇ − K(t, u, v) ) ] dt,        (7.43)

where the Lagrange multiplier µ(t) is a function of t because (7.40) is a pointwise constraint.
As before, let
uε (t) = u(ε, t), vε (t) = v(ε, t), (7.44)
be a one-parameter family of functions, depending on ε ∈ R, satisfying the constraint along
with any imposed boundary conditions at a possibly variable final time b(ε). We further
assume that
        u(0, t) = u(t),    v(0, t) = v(t),    b(0) = b,
                                                                               (7.45)
        d/dε |_{ε=0} u(ε, t) = ϕ(t),    d/dε |_{ε=0} v(ε, t) = ψ(t),    d/dε |_{ε=0} b(ε) = η,

where u(t), v(t) are our candidate minimizing state and control functions, and b is the
corresponding final time. If the final time is fixed, then b(ε) = b for all ε, and so its
variation η = 0. As always, the variation of the augmented functional (7.43) with respect
to the Lagrange multiplier just reproduces the constraint, and so we do not include it in the
computation. Applying the same methodology as in (4.28), including integrating the term
involving ψ̇ by parts and using (7.40) to eliminate some of the boundary terms, produces
        0 = d/dε |_{ε=0} ∫_a^{b(ε)} [ L(t, uε , vε) − µ ( v̇ε − K(t, uε , vε) ) ] dt

          = L( b, u(b), v(b) ) η − µ(b) ψ(b) + ∫_a^b [ ( ∂L/∂u + µ ∂K/∂u ) ϕ + ( µ̇ + µ ∂K/∂v + ∂L/∂v ) ψ ] dt,
                                                                               (7.46)


where the derivatives of L and K are evaluated at ( t, u(t), v(t) ). Applying the Fundamental
Lemma 3.4 to the integral, we deduce the Euler–Lagrange equations

        ∂L/∂u + µ ∂K/∂u = 0,    dµ/dt = − µ ∂K/∂v − ∂L/∂v.                     (7.47)
In optimal control, the Lagrangian multiplier is referred to as the costate and the second
Euler–Lagrange equation governing its derivative is called the costate equation. The first
Euler–Lagrange equation is an additional algebraic constraint relating t, u(t), v(t), µ(t).
There are three basic types of conditions that can be imposed on the state of the system
at the final time. First, if both the final time t = b and the final state

v(b) = β (7.48)
are fixed, then both variations must vanish there: ψ(b) = η = 0, and hence the system con-
sists of the initial value problem (7.40) for the state, as controlled by u, the Euler–Lagrange
equations (7.47), and the boundary condition (7.48). These suffice, under suitably generic
hypotheses, to uniquely determine the solution.
Second, if the final time is fixed but the final state is not specified, then η = 0 but ψ(b)
is no longer fixed, and hence one replaces the boundary condition at t = b by the natural
boundary condition produced by the second boundary term, namely,

µ(b) = 0, (7.49)
specifying the value of the costate at the final time.
Third, if the final state is specified, as in (7.48), but the final time is not fixed, then
we need to make sure that the variations also satisfy the same final time condition on the
state:
        v(b + ε η) + ε ψ(b + ε η) = β.

Differentiating this equation with respect to ε, setting ε = 0, and using (7.40), produces

        0 = v̇(b) η + ψ(b) = K( b, u(b), v(b) ) η + ψ(b).
Substituting the resulting formula for ψ(b) back into the boundary terms in (7.46), we
deduce the corresponding final time condition

        µ(b) = − L( b, u(b), v(b) ) / K( b, u(b), v(b) ).                      (7.50)
Now the system consists of (7.40, 47, 48, 50), which, again under suitable generic hypothe-
ses, serve to uniquely determine the state v(t), the control u(t), the costate µ(t), and the
final time b.
Example 7.7. Let us consider the control of an undamped harmonic oscillator. Choos-
ing units in which the oscillatory frequency equals 1, the state equation is
        d²v/dt² + v = u,                                                       (7.51)
in which the state variable v(t) measures the displacement of the oscillator from equilib-
rium, and u(t) is used to control its motion. A simple problem is when the oscillator that

starts out at rest in a given position, say v = 1, and the goal is to steer it so that it comes
to rest at the origin at time t = π. We thus impose the boundary conditions

        v(0) = 1,    v̇(0) = 0,    v(π) = 0,    v̇(π) = 0,                     (7.52)

on the state variable. Let us select the control u(t) that minimizes the total effort, as
measured by the integral

        J[ u, v ] = ∫_0^π ½ u² dt.                                             (7.53)

Although the constraint (7.51) is second order, we can apply the same Lagrange multi-
plier method, starting with the augmented functional
        J̃[ u, v ] = ∫_0^π [ ½ u² − µ ( v̈ + v − u ) ] dt.                     (7.54)

The Euler–Lagrange equations are

        Eu (L̃) = u + µ = 0,    Ev (L̃) = − d²µ/dt² − µ = 0,                   (7.55)
which are supplemented by the constraint (7.51) and the boundary conditions (7.52).
(Since the final time and state are both fixed, there are no boundary conditions imposed
on the costate µ.) We solve

u(t) = − µ(t) = a cos t + b sin t, (7.56)


where a, b are constants. Substituting into the state equation (7.51) produces

        v(t) = ( c − ½ b t ) cos t + ( d + ½ a t ) sin t,
where c, d are two further constants. Substituting the latter formula into the boundary
conditions (7.52), we recover a = 0, b = 2/π, c = 1, d = 1/π, and hence

        v(t) = ( 1 − t/π ) cos t + (1/π) sin t,    u(t) = (2/π) sin t,         (7.57)
is the optimal control for the given oscillator problem. Variants of this problem can be
investigated by the reader.
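
The solution is readily checked numerically; a minimal Python sketch confirms that (7.57)
satisfies the state equation (7.51) and the boundary conditions (7.52), and evaluates the
performance measure (7.53), whose optimal value works out to 1/π:

    # Verify the optimal control (7.57) for the oscillator problem.
    import numpy as np

    t = np.linspace(0, np.pi, 20001)
    v = (1 - t/np.pi)*np.cos(t) + np.sin(t)/np.pi
    u = (2/np.pi)*np.sin(t)

    vdd = np.gradient(np.gradient(v, t), t)         # crude second derivative
    print(np.max(np.abs((vdd + v - u)[5:-5])))      # ~ 0 in the interior
    print(v[0], v[-1])                              # v(0) = 1, v(pi) = 0
    dt = t[1] - t[0]
    print(np.sum(0.5*u**2)*dt)                      # J = 1/pi = 0.3183...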

Nonholonomic Mechanics
Surprisingly, the equations of nonholonomic mechanics cannot be properly formulated
using the Lagrange multiplier methods that we employed to derive the equations of optimal
control! The variational calculus employed in the preceding subsection is often referred to
as Hamilton’s principle, whereas nonholonomic mechanics relies on an alternative varia-
tional calculation, known as the Lagrange-d’Alembert principle. The difference between the
two principles lies in their specifications of the allowable variations. The counterintuitive
fact is that the equations derived using Hamilton’s principle do not reproduce Newton’s

laws of motion for the constrained dynamical system, whereas the alternative Lagrange-
d’Alembert principle does. On the other hand, Hamilton’s principle does produce the
correct optimal control equations. (Control of a nonholonomic mechanical system requires
a hybrid approach.) This subtle distinction has been the cause of much controversy and
erroneous analysis over the years, persisting to this day. See [7] for a careful presentation
of both principles in the context of examples and historical developments.
The easiest way to understand what is at issue is to return to the case of holonomic
constraints, where the two principles coincide. To keep the exposition as simple as possible,
we focus our attention on a first order functional
        J[ u, v ] = ∫_a^b L(t, u, v, u̇, v̇) dt                                (7.58)

depending on two functions u, v of the time variable t. Suppose they are subject to a single
holonomic constraint
G(t, u, v) = 0. (7.59)
As before, let uε (t) = u(ε, t), vε (t) = v(ε, t), be a family of variations depending on ε ∈ R,
with
        u(0, t) = u(t),    v(0, t) = v(t),    d/dε |_{ε=0} u(ε, t) = ϕ(t),    d/dε |_{ε=0} v(ε, t) = ψ(t),
                                                                               (7.60)
where u(t), v(t) is our candidate minimizer. Then, substituting into the constraint (7.59),
differentiating with respect to ε and setting ε = 0, we deduce that the variations ϕ, ψ,
satisfy the linearized constraint equation
        ∂G/∂u (t, u, v) ϕ + ∂G/∂v (t, u, v) ψ = 0,                             (7.61)
from which, as above, we recover the constrained Euler–Lagrange equations
        Eu (L) = µ ∂G/∂u,    Ev (L) = µ ∂G/∂v,                                 (7.62)
where µ(t) is the Lagrange multiplier. The solutions u, v, µ to (7.59, 62) that satisfy the
relevant boundary conditions (fixed or natural) are the critical functions for the constrained
variational principle.
Now, if we differentiate the constraint (7.59) with respect to t, we obtain
        ∂G/∂t (t, u, v) + ∂G/∂u (t, u, v) u̇ + ∂G/∂v (t, u, v) v̇ = 0.         (7.63)
 
The key observation is that the terms involving u̇, v̇ coincide with the linearized constraint
equation (7.61) under the identification (ϕ, ψ) ↔ (u̇, v̇). This observation underlies the
 

Lagrange-d’Alembert principle in the general nonholonomic case. At each point (t, u, v),
the variations must satisfy the homogeneous linear constraints imposed on the velocities.
Thus, given a linear nonholonomic constraint

        A(t, u, v) u̇ + B(t, u, v) v̇ + C(t, u, v) = 0                         (7.64)

on the velocities, the Lagrange-d’Alembert principle requires that the variations remain
“physical” in the sense that we require

A(t, u, v) ϕ + B(t, u, v) ψ = 0. (7.65)

More generally, given a nonlinear nonholonomic constraint

        M (t, u, v, u̇, v̇) = 0,                                               (7.66)

the physically permitted variations will satisfy the constraint obtained by linearization
with respect to the velocity variables:

        ∂M/∂u̇ ϕ + ∂M/∂v̇ ψ = 0.                                               (7.67)
In other words, at each configuration (t, u, v), the physically allowed variations (ϕ, ψ) must
lie on a tangent direction to the curve in the (u̇, v̇) plane specified by the constraint (7.66).
 

Let us see how this restriction affects the variational calculus. Substituting the variations
into (7.58), differentiating with respect to ε, setting ε = 0 using (7.44), and then integrating
the result by parts and discarding the boundary terms (e.g., by assuming the variations
have compact support), we find that
        ∫_a^b [ Eu (L) ϕ + Ev (L) ψ ] dt = 0

whenever ϕ, ψ satisfy the Lagrange-d’Alembert condition (7.65), where Eu (L), Ev (L) are
the usual Euler–Lagrange expressions (5.3). Applying the method of Lagrange multipliers,
we arrive at the Lagrange-d’Alembert variational equations

Eu (L) = µ A, Ev (L) = µ B, (7.68)

where µ(t) is the Lagrange multiplier. If A, B come from a holonomic constraint, these
equations coincide with (7.62). Thus, u, v satisfy a second order system of ordinary differ-
ential equations that involves the Lagrange multiplier µ(t), whose value will be prescribed
by the constraint (7.64). In the presence of suitable initial and/or boundary conditions,
the constrained system serves to uniquely specify the critical functions. One finds, [7], that
applying this methodology to physical systems with nonholonomic constraints reproduces
the correct Newtonian dynamics.
Surprisingly, the Lagrange multiplier calculus presented in the preceding subsection —
that is, Hamilton’s principle — leads to a different variational system that, in the case of
nonholonomic mechanics, does not reproduce the physically correct equations. To see this,
note that if the variations ϕ, ψ, are not a priori constrained, then upon substituting (7.44)
into the nonholonomic constraint (7.64), differentiating with respect to ε and setting ε = 0,
the resulting variational condition requires

        A ϕ̇ + B ψ̇ + ( ∂A/∂u u̇ + ∂B/∂u v̇ + ∂C/∂u ) ϕ + ( ∂A/∂v u̇ + ∂B/∂v v̇ + ∂C/∂v ) ψ = 0,

where A, B, C and their derivatives are all evaluated at (t, u, v). On the other hand, differ-
entiating the Lagrange-d’Alembert condition (7.65) with respect to t produces

        A ϕ̇ + B ψ̇ + ( ∂A/∂t + ∂A/∂u u̇ + ∂A/∂v v̇ ) ϕ + ( ∂B/∂t + ∂B/∂u u̇ + ∂B/∂v v̇ ) ψ = 0.
These equations look similar, but, in fact, they coincide if and only if
        ∂A/∂v = ∂B/∂u,    ∂A/∂t = ∂C/∂u,    ∂B/∂t = ∂C/∂v.                    (7.69)
Using the result that the curl of a vector field is zero if and only if the vector field is locally
a gradient, [35], (7.69) implies that, locally,

        A = ∂G/∂u,    B = ∂G/∂v,    C = ∂G/∂t,                                 (7.70)
for some function G(t, u, v), and hence the constraint (7.64) is, in fact, holonomic, being
the time derivative of (7.59). Furthermore, applying the usual variational calculus to the
augmented Lagrangian

        L̂(t, u, v, u̇, v̇, µ) = L(t, u, v, u̇, v̇) − µ ( A(t, u, v) u̇ + B(t, u, v) v̇ + C(t, u, v) )

produces the modified Euler–Lagrange equations

        Eu (L) = − µ̇ A + µ [ ( ∂B/∂u − ∂A/∂v ) v̇ + ( ∂C/∂u − ∂A/∂t ) ],
                                                                               (7.71)
        Ev (L) = − µ̇ B + µ [ ( ∂A/∂v − ∂B/∂u ) u̇ + ( ∂C/∂v − ∂B/∂t ) ].
If the holonomic condition (7.69) is satisfied, then (7.71) coincides with the Lagrange-
d’Alembert system (7.68) and also the holonomic system (7.62), in which − µ̇ plays the
role of the Lagrange multiplier, previously denoted by µ. On the other hand, if (7.69) is
not satisfied, then (7.71) is not the same as the physical Lagrange-d’Alembert equations
(7.68); the additional terms multiplying µ are in violation of Newton’s Laws relating forces,
masses, and accelerations. On the other hand, if this were an optimal control problem,
(7.71) would be the correct equations of motion.
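
The holonomy test (7.69) is easy to automate symbolically. As a hypothetical illustration
(not an example from the text), consider the linear constraint u̇ + v = 0, so that A = 1,
B = 0, C = v; a short sympy sketch shows that the third equation in (7.69) fails, and
hence this constraint is not the time derivative of any holonomic constraint G(t, u, v) = 0:

    # Check the closure conditions (7.69) for A = 1, B = 0, C = v.
    import sympy as sp

    t, u, v = sp.symbols('t u v')
    A, B, C = sp.Integer(1), sp.Integer(0), v
    checks = [sp.diff(A, v) - sp.diff(B, u),    # dA/dv - dB/du
              sp.diff(A, t) - sp.diff(C, u),    # dA/dt - dC/du
              sp.diff(B, t) - sp.diff(C, v)]    # dB/dt - dC/dv
    print([sp.simplify(c) == 0 for c in checks])   # [True, True, False]
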
Example 7.8. A simple example of a nonholonomic mechanical system is a skier
sliding down a slope. We idealize the physical situation by assuming that the slope is a
flat plane, and the skier is modeled by a mass on a single blade (a knife edge) that slides
frictionlessly through the snow. Let x, y denote coordinates within the plane, so that the
plane slopes downhill in the positive x direction making an angle α with the horizontal;
see Figure 15.
The position of the mass on the hill is prescribed by (x(t), y(t)) while the blade’s orien-
tation relative to the x-axis is prescribed by the angle θ(t), which, as always, is measured
modulo 2 π. In the absence of external forcing, the variational problem is the difference
between kinetic and potential energy
        J[ x, y, θ ] = ∫_a^b [ ½ m ( ẋ² + ẏ² ) + ½ I θ̇² + m g x sin α ] dt,  (7.72)

Figure 15. Spinning Skier on a Slope.

where g is the gravitational constant, m is the skier’s mass, and I the skier’s moment of in-
ertia (about the vertical axis), which are all assumed to be constant. The ski blade prevents
the skier from experiencing any transverse velocity, and hence imposes the nonholonomic
constraint
  
G(x, y, θ, x, y, θ) = x sin θ − y cos θ = 0. (7.73)
 

According to the Lagrange-d’Alembert principle, as in (7.68), the nonholonomic dynamics
of the skier is governed by the system

        Ex (L) = µ ∂G/∂ẋ,    Ey (L) = µ ∂G/∂ẏ,    Eθ (L) = µ ∂G/∂θ̇.
Explicitly,

        − m ẍ + m g sin α = µ sin θ,    − m ÿ = − µ cos θ,    − I θ̈ = 0,      (7.74)
which are coupled with the velocity constraint (7.73). Let us impose the initial data

        x(0) = y(0) = θ(0) = ẋ(0) = ẏ(0) = 0,    θ̇(0) = ω,                   (7.75)
so that the skier starts at rest while spinning at angular velocity ω. To solve the initial
value problem, we first integrate the equation for θ and use the initial conditions (7.75) to
deduce
θ(t) = ω t. (7.76)
Second, we note that the total energy density (the sum of kinetic and potential energies)

        E(t) = ½ m ( ẋ² + ẏ² ) + ½ I θ̇² − m g x sin α = ½ m ( ẋ² + ẏ² ) + ½ I ω² − m g x sin α    (7.77)
is constant; indeed, a straightforward calculation using (7.73, 74) shows that dE/dt = 0.
(This is the analog of the constancy of the Hamiltonian function, as in Theorem 3.2.)
Thus, using the initial conditions (7.75), we deduce
        E(t) = E(0) = ½ I ω².                                                  (7.78)

Substituting the constraint (7.73) along with (7.76) into (7.77), and using a trigonometric
identity, (7.78) implies
        ½ m ẋ² sec² ω t − m g x sin α = 0,    whence    dx/dt = √( 2 g x sin α ) cos ω t,

which is a separable first order ordinary differential equation, [9]. Dividing both sides by
√x, integrating, and using the initial conditions (7.75), we finally deduce the form of x(t),
from which y(t) follows from integrating the constraint (7.73) and again using the initial
conditions (7.75). We conclude that the skier’s motion is given by
        x(t) = ( g sin α / (2 ω²) ) sin² ω t,    y(t) = ( g sin α / (2 ω²) ) ( ω t − ½ sin 2 ω t ),    θ(t) = ω t,    (7.79)
while the Lagrange multiplier is

µ(t) = 2 m g sin α sin ω t. (7.80)


Comparing the formulas for x(t), y(t) with (3.31), we deduce that, in view of the periodicity
of x(t), the skier traces a path that goes periodically down and back up the hill, never
reaching the bottom, while tracing out a path in the shape of a cycloid as they move
laterally across the slope.
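
The closed-form solution (7.79) can be confirmed by integrating the system (7.73–74)
directly. Eliminating the multiplier by differentiating the constraint (7.73) gives
µ = m [ g sin α sin θ + ω ( ẋ cos θ + ẏ sin θ ) ], which leads to the following minimal
Python sketch (with illustrative parameter values):

    # Integrate the skier equations of Example 7.8 and compare with (7.79).
    import numpy as np
    from scipy.integrate import solve_ivp

    g, alpha, m, omega = 9.81, 0.3, 1.0, 2.0

    def rhs(t, z):
        x, y, xd, yd = z
        th = omega*t
        mu = m*(g*np.sin(alpha)*np.sin(th) + omega*(xd*np.cos(th) + yd*np.sin(th)))
        return [xd, yd, g*np.sin(alpha) - mu*np.sin(th)/m, mu*np.cos(th)/m]

    sol = solve_ivp(rhs, [0, 10], [0, 0, 0, 0], dense_output=True, rtol=1e-10)
    ts = np.linspace(0, 10, 5)
    C = g*np.sin(alpha)/(2*omega**2)
    print(sol.sol(ts)[0] - C*np.sin(omega*ts)**2)                # matches x(t)
    print(sol.sol(ts)[1] - C*(omega*ts - np.sin(2*omega*ts)/2))  # matches y(t)
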
A more realistic version of the problem is to place the skier on a curved surface, and
introduce a control mechanism and performance measure, [26].

8. The Second Variation.


The solutions to the Euler–Lagrange boundary value problem are the critical functions
for the generating variational principle, meaning that they cause its functional gradient
to vanish. For finite-dimensional optimization problems, being a critical point is only a
necessary condition for minimality. One must impose additional conditions, based on the
second derivative of the objective function at the critical point, in order to guarantee that
the critical point is a (local) minimum and not a maximum or a saddle point. Similarly, in
the calculus of variations, the solutions to the Euler–Lagrange equation may also include
(local) maxima, as well as other non-extremizing critical functions. To distinguish between
the possibilities, we are in need of a second derivative test. In the calculus of variations,
the second derivative of a functional is known as its second variation, and the goal of this
section is to construct and analyze it in its simplest manifestation. A more in depth study
of the sufficient conditions for minimality, and their various subtleties, will be the subject
of the ensuing sections.
Recall that, for a finite-dimensional objective function F (x) = F (x1 , . . . , xn ), the second
derivative test is based on the positive definiteness of its n × n Hessian matrix ∇2 F , with
entries ∂ 2 F/∂xi ∂xj , evaluated at the critical point; cf. [49]. The justification comes
directly from the second order Taylor expansion of the objective function there.
In an analogous fashion, we expand an objective functional J[ u ] near the critical func-
tion. Consider the scalar function

h(ε) = J[ u + ε ϕ ],

depending on ε ∈ R, where the function ϕ(x) represents a variation. The second order
Taylor expansion of h(ε) at ε = 0 takes the form
        h(ε) = J[ u + ε ϕ ] = J[ u ] + ε K[ u ; ϕ ] + ½ ε² Q[ u ; ϕ ] + · · · .
The first order terms are linear in the variation ϕ, and, according to our earlier calculation,
given by the inner product
        h′(0) = K[ u ; ϕ ] = ⟨ ∇J[ u ] , ϕ ⟩

between the variation and the functional gradient. In particular, u is a critical function if
and only if the first order terms vanish,

        K[ u ; ϕ ] = ⟨ ∇J[ u ] , ϕ ⟩ = 0,
for all allowable variations ϕ. Therefore, the nature of the critical function u — minimum,
maximum, or neither — will, in most cases, be determined by the second derivative terms
h′′ (0) = Q[ u ; ϕ ].
Now, if u is a minimizer, then Q[ u ; ϕ] ≥ 0. Conversely, if Q[ u ; ϕ] > 0 for all ϕ 6≡ 0, i.e.,
the second variation is positive definite, then the critical function u should be a strict local
minimizer. This forms the crux of the second derivative test.
Let us explicitly evaluate the second variation of the simplest functional (3.1). Consider
the scalar function
        h(ε) = J[ u + ε ϕ ] = ∫_a^b L(x, u + ε ϕ, u′ + ε ϕ′) dx,               (8.1)

whose first derivative h′(0) was already determined in (3.6); here we require the second
derivative:

        Q[ u ; ϕ ] = h′′(0) = ∫_a^b ( A ϕ′² + 2 B ϕ ϕ′ + C ϕ² ) dx,            (8.2)
where (assuming sufficient smoothness throughout) the coefficient functions
        A(x) = ∂²L/∂p² ( x, u(x), u′(x) ),    B(x) = ∂²L/∂u∂p ( x, u(x), u′(x) ),    C(x) = ∂²L/∂u² ( x, u(x), u′(x) ),
                                                                               (8.3)
are obtained by evaluating certain second order partial derivatives of the Lagrangian at
the critical function u(x). In contrast to the first variation, integration by parts will
not eliminate all of the derivatives on ϕ in the quadratic functional (8.2), which causes
significant complications in the ensuing analysis.
Remark : The function (8.1) is known as a weak variation of the functional J. The term
“weak” refers to the fact that not only is the variation u + ε ϕ close to the minimizer u,
its derivative u′ + ε ϕ′ is also assumed to be close. Functions that are minima under weak
variations are thus known as weak minimizers of the functional. There is a substantial
difference between weak and strong minimizers, the derivatives of whose variations need
not be close to those of the minimizer. The precise nature of the two types of minimizer
will be developed in the following section.

The second derivative test for a weak minimizer relies on the positivity of the second
variation. So, in order to determine whether the critical function is a minimizer for the
functional, we need to establish criteria guaranteeing the positive definiteness of such
a quadratic functional, meaning that Q[ u ; ϕ] > 0 for all allowable non-zero variations
ϕ(x) ≢ 0. Clearly, if the integrand is positive at each point, so

        A(x) ϕ′(x)² + 2 B(x) ϕ(x) ϕ′(x) + C(x) ϕ(x)² > 0    whenever    a < x < b,    (8.4)

then Q[ u ; ϕ ] is positive definite. Viewing the left hand side as a homogeneous quadra-
tic polynomial in ϕ, ϕ′ , positivity at a fixed x will be implied by the algebraic inequalities
A(x) > 0 and B(x)² − A(x) C(x) < 0.
Example 8.1. For the arc length minimization functional (2.3), the Lagrangian is
L(x, u, p) = √(1 + p²). To analyze the second variation, we first compute

        ∂²L/∂p² = 1/(1 + p²)^{3/2},    ∂²L/∂u∂p = 0,    ∂²L/∂u² = 0.
For the critical straight line function

        u(x) = ( (β − α)/(b − a) ) (x − a) + α,    with    u′(x) = (β − α)/(b − a),

and so, evaluating the second derivatives of the Lagrangian at the critical function,

        A(x) = ∂²L/∂p² = (b − a)³ / ( (b − a)² + (β − α)² )^{3/2} ≡ c,    B(x) = ∂²L/∂u∂p = 0,    C(x) = ∂²L/∂u² = 0,

where c > 0 is a positive constant. Therefore, the second variation functional (8.2) is
        Q[ u ; ϕ ] = c ∫_a^b ϕ′(x)² dx ≥ 0.

Moreover, Q[ u ; ϕ ] vanishes if and only if ϕ(x) is a constant function. But the variation
ϕ(x) is required to satisfy the homogeneous boundary conditions ϕ(a) = ϕ(b) = 0, and
hence Q[ u ; ϕ ] > 0 for all allowable nonzero variations. Therefore, we conclude that the
straight line is, indeed, a (local, weak) minimizer for the arc length functional.
However, as the following example demonstrates, the pointwise positivity condition (8.4)
is overly restrictive.
Example 8.2. Consider the quadratic functional
        Q[ ϕ ] = ∫_0^1 ( ϕ′² − ϕ² ) dx.                                        (8.5)

We claim that Q[ ϕ ] > 0 for all nonzero ϕ ≢ 0 subject to homogeneous Dirichlet boundary
conditions ϕ(0) = 0 = ϕ(1). This result is not trivial! Indeed, the boundary conditions
play an essential role, since choosing ϕ(x) ≡ c ≠ 0 to be any constant function will produce
a negative value for the functional: Q[ ϕ ] = − c².

To prove the claim, consider the quadratic functional
        Q̃[ ϕ ] = ∫_0^1 ( ϕ′ + ϕ tan x )² dx,

which is clearly non-negative, since the integrand is everywhere ≥ 0. Moreover, by continu-
ity, the integral vanishes if and only if ϕ satisfies the first order linear ordinary differential
equation
        ϕ′ + ϕ tan x = 0,    for all    0 ≤ x ≤ 1.

The only solution that also satisfies the boundary condition ϕ(0) = 0 is the trivial one ϕ ≡ 0.
We conclude that Q̃[ ϕ ] = 0 if and only if ϕ ≡ 0, and hence Q̃[ ϕ ] is a positive definite
quadratic functional on the space of allowable variations.
Let us expand the latter functional,
        Q̃[ ϕ ] = ∫_0^1 ( ϕ′² + 2 ϕ ϕ′ tan x + ϕ² tan² x ) dx

                = ∫_0^1 [ ϕ′² − ϕ² (tan x)′ + ϕ² tan² x ] dx = ∫_0^1 ( ϕ′² − ϕ² ) dx = Q[ ϕ ].

In the second equality, we integrated the middle term by parts, using (ϕ²)′ = 2 ϕ ϕ′,
and noting that the resulting boundary terms vanish owing to our imposed boundary
conditions. Since Q̃[ ϕ ] is positive definite, so is Q[ ϕ ], justifying the previous claim.
To appreciate how delicate this result is, consider the almost identical quadratic func-
tional
        Q̂[ ϕ ] = ∫_0^4 ( ϕ′² − ϕ² ) dx,                                       (8.6)

the only difference being the upper limit of the integral. A quick computation shows that
the function ϕ(x) = x (4 − x) satisfies the boundary conditions ϕ(0) = 0 = ϕ(4), but
        Q̂[ ϕ ] = ∫_0^4 [ (4 − 2 x)² − x² (4 − x)² ] dx = − 64/5 < 0.

Therefore, Q̂[ ϕ ] is not positive definite. Our preceding analysis does not apply be-
cause the function tan x becomes singular at x = ½ π, and so the auxiliary integral
∫_0^4 ( ϕ′ + ϕ tan x )² dx does not converge.
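
Both computations are simple to confirm numerically; a brief Python sketch:

    # Evaluate Q[phi] = int (phi'^2 - phi^2) dx by the trapezoidal rule.
    import numpy as np

    def Q(phi, dphi, a, b, n=200001):
        x = np.linspace(a, b, n)
        f = dphi(x)**2 - phi(x)**2
        return np.sum(0.5*(f[1:] + f[:-1]))*(x[1] - x[0])

    print(Q(lambda x: x*(1 - x), lambda x: 1 - 2*x, 0.0, 1.0))  # 0.3 > 0
    print(Q(lambda x: x*(4 - x), lambda x: 4 - 2*x, 0.0, 4.0))  # -64/5 = -12.8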

The Legendre Condition


The complete analysis of positive definiteness of quadratic (first order) functionals can
be effected by adapting the preceding examples. The goal is to determine when a quadratic
functional
        Q[ ϕ ] = ∫_a^b [ A(x) ϕ′(x)² + 2 B(x) ϕ(x) ϕ′(x) + C(x) ϕ(x)² ] dx     (8.7)

is positive definite, meaning Q[ ϕ ] > 0 for all ϕ ≢ 0 satisfying the homogeneous Dirichlet
boundary conditions ϕ(a) = ϕ(b) = 0. Example 8.2 demonstrated that positivity of the
integrand is overly restrictive. On the other hand, one does need positivity of the initial
coefficient.

Proposition 8.3. Suppose that A(x), B(x), C(x) are continuous for x ∈ [ a, b ], and
that the quadratic functional (8.7) is positive definite. Then A(x) ≥ 0 for all a ≤ x ≤ b.

Proof : Suppose not, so there exists a point† a < c < b where A(c) = − m < 0 for some
0 < m ∈ R. Assuming this, we will construct a function ϕ(x) satisfying the boundary
conditions such that Q[ ϕ ] < 0, in contradiction to positive definiteness.
By continuity, we can find δ > 0 so that a < c − δ < c + δ < b, and

        A(x) ≤ − ½ m    for    c − δ ≤ x ≤ c + δ.                              (8.8)


Furthermore, we can bound

− M ≤ B(x) ≤ M, C(x) ≤ M, for c − δ ≤ x ≤ c + δ, (8.9)


for some M > 0. Let 0 < ε < δ, and consider the continuous, piecewise quadratic function‡

        ϕε (x) = { ε⁻² (x − c)² − 1,  | x − c | < ε;   0,  | x − c | ≥ ε },
        ϕ′ε (x) = { 2 ε⁻² (x − c),  | x − c | < ε;   0,  | x − c | > ε }.
We note that ϕε (a) = ϕε (b) = 0; further, ϕε (x) ϕ′ε (x) < 0 when c < x < c + ε and is > 0
when c − ε < x < c. Using the bounds (8.8, 9), a straightforward calculation shows that
        Q[ ϕε ] = ∫_{c−ε}^{c+ε} [ A(x) ϕ′ε (x)² + 2 B(x) ϕε (x) ϕ′ε (x) + C(x) ϕε (x)² ] dx

                ≤ − ½ m ∫_{c−ε}^{c+ε} ϕ′ε (x)² dx + 2 M ∫_{c−ε}^{c} ϕε (x) ϕ′ε (x) dx

                    − 2 M ∫_{c}^{c+ε} ϕε (x) ϕ′ε (x) dx + M ∫_{c−ε}^{c+ε} ϕε (x)² dx = − 4m/(3ε) + 2 M + 16 M ε/15.
Thus, if 0 < ε < δ is sufficiently small, the final expression is negative, which establishes
the result. Q.E.D.

Using the formula (8.3) for A(x) when the quadratic functional (8.7) represents the sec-
ond variation (8.2), Proposition 8.3 immediately implies the Legendre necessary condition
for a minimizer of a functional, which is named in honor of Adrien-Marie Legendre.


† Note that if A(a) < 0, then, by continuity, there is a nearby point a < c < b where A(c) < 0;
similarly if A(b) < 0.

‡ One could, with a little more work, construct a smooth function with the required property, as
in the proof of the Fundamental Lemma 3.4.

Theorem 8.4. If u(x) is a local minimizer of the variational problem with Lagrangian
L(x, u, p) subject to fixed boundary conditions, then

        ∂²L/∂p² ( x, u(x), u′(x) ) ≥ 0.                                        (8.10)

For a local maximizer, the Legendre inequality is reversed. On the other hand, the
strengthened Legendre condition, where the inequality is strict, does not suffice to show
that a solution to the Euler–Lagrange boundary value problem is a minimizer (respectively,
maximizer). Indeed, the quadratic functional (8.6) provides an elementary counterexample,
where the zero function satisfies both the Euler–Lagrange equation and the boundary
conditions, as well as the strengthened Legendre condition, since ∂²L/∂p² ≡ 2, but, as we
saw, u(x) ≡ 0 is not a minimizer of the functional. Thus, to prove that a given solution is
a minimizer, we must find and impose additional conditions.

Envelopes

Before proceeding further, let us review the construction of envelopes of families of


curves, [16]. In general, given a collection of plane curves Ct ⊂ R² depending on a real
parameter t ∈ I ⊂ R, where I is an open interval, their envelope is, roughly speaking,
the set of limit points of their intersection points. If the curves are given by the implicit
equations
F (x, y, t) = 0, (8.11)

where F is continuously differentiable, then the envelope E is characterized as the set of all
points (x, y) ∈ R² that satisfy, for some t ∈ I, both (8.11) and the equation obtained by
differentiation with respect to the parameter:

        ∂F/∂t (x, y, t) = 0.                                                   (8.12)
The envelope may be empty.
We are particularly interested in the case when the curves are the graphs of a one-
parameter family of functions, and so (8.11) has the form y = ut (x) = u(x, t). Thus, the
envelope E of the family ut is the set of points (x, y) ∈ R² satisfying

        y = u(x, t),    ∂u/∂t (x, t) = 0,    for some t ∈ I.                   (8.13)
Most of the results require that the function u(x, t) be continuously differentiable, although
to avoid complications, we will assume that u ∈ C².

Example 8.5. The family y = u(x, t) = x + t of parallel straight lines has empty
envelope since ∂u/∂t ≡ 1. The family y = u(x, t) = t x of straight lines passing through the
origin satisfies ∂u/∂t = x and hence the envelope consists of just the origin y = x = 0.
Indeed, in general, if all the curves in the family pass through a common point, then that
point belongs to the envelope.

A more interesting envelope is provided by the family of straight lines that go through
the points (t, 0) and (0, 1/t) on the x and y axes, respectively; these are given by

        u(x, t) = (t − x)/t².

Since
        ∂u/∂t = (2x − t)/t³ = 0    when    t = 2x,
the envelope is the hyperbola y = 1/(4 x), as illustrated in the first plot in Figure 16.
Finally, if ρ(t) > 0 is any positive function, consider the one-parameter family of circles

        (x − t)² + y² = ρ(t)²

of radii ρ(t) and centered at (t, 0). Differentiating with respect to t as in (8.12), we find
that the envelope must satisfy x = t − ρ(t) ρ′ (t), and hence it is the curve parametrized by
        x(t) = t − ρ(t) ρ′(t),    y(t) = ± ρ(t) √( 1 − ρ′(t)² ),

where t is restricted to where | ρ′ (t) | ≤ 1. Several cases are illustrated in Figure 16; the
circles are in black and the envelope is in red. In the last plot, with ρ(t) = sin t, the
envelope is a pair of mirror image cycloids.
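
The parametric envelope formulas are immediate to implement; for instance, for
ρ(t) = sin t one finds x(t) = t − ½ sin 2t, y(t) = ± sin² t, which is indeed a pair of
cycloids. A brief numerical sketch:

    # Envelope of circles of radius rho(t) = sin(t) centered at (t, 0):
    #   x = t - rho*rho',  y = +- rho*sqrt(1 - rho'^2).
    import numpy as np

    t = np.linspace(0.1, 3.0, 7)
    rho, drho = np.sin(t), np.cos(t)
    x = t - rho*drho
    y = rho*np.sqrt(1 - drho**2)
    print(np.max(np.abs((x - t)**2 + y**2 - rho**2)))  # each point lies on its circle
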
An envelope point (x0 , t0 ) ∈ E is called regular if ∂²u/∂t² (x0 , t0 ) ≠ 0; vice versa, if the
second derivative vanishes, the envelope point is called singular . Nearby a regular point,
the envelope is the graph of a smooth curve y = f (x). Indeed, the Implicit Function
Theorem tells us that, in a neighborhood of a regular point, one can locally solve the
second envelope equation for t = h(x). Substituting this expression into the first equation
shows that, for such x, the envelope is the graph of the function y = f (x) = u(x, h(x)). At
a regular point, the graph of y = ut0 (x) intersects the envelope curve tangentially, since

        ∂u/∂x (x0 , t0 ) = f′(x0 ) = ∂u/∂x (x0 , t0 ) + h′(x0 ) ∂u/∂t (x0 , t0 ),
and the final term is zero because of the envelope condition.
As noted above, envelopes are closely connected with the points of intersection of the
curves in the family. In particular, if they all intersect at a single point A = (a, α), so
u(a, t) = α for all t, then clearly ∂u/∂t (a, t) = ∂²u/∂t² (a, t) = 0, and hence A ∈ E is a singular
envelope point. While general intersection points are not typically points on the envelope,
their limits are.

Proposition 8.6. Let t be fixed. Suppose that, for all s sufficiently close to t, the
graph of us (x) intersects that of ut (x) at a point P (s), and suppose P (s) → P0 as s → t.
Then the limiting intersection point lies in the envelope: P0 ∈ E.


Figure 16. Envelopes: lines; circles with ρ(t) = t/√2; circles with ρ(t) = t²; circles with ρ(t) = sin t.


Proof : The intersection assumption implies P (s) = ( x(s), y(s) ) satisfies

        y(s) = u(x(s), s) = u(x(s), t).
Differentiating with respect to s, we have
        y′(s) = ∂u/∂x (x(s), s) x′(s) + ∂u/∂s (x(s), s) = ∂u/∂x (x(s), t) x′(s).
Taking the limit as s → t, given that P (s) = (x(s), y(s)) → (x0 , y0 ) = P0 , and using the
continuity of the partial derivatives of u, the equation becomes
        ∂u/∂x (x0 , t) x′(t) + ∂u/∂s (x0 , t) = ∂u/∂x (x0 , t) x′(t),    and hence    ∂u/∂s (x0 , t) = 0,
which implies P0 = (x0 , y0 ) ∈ E is in the envelope. Q.E.D.
In the scalar-valued case under consideration, intersection of the graphs implies the
existence of envelope points.

Lemma 8.7. Suppose s < t, and the graphs of us and ut intersect at a point (x, α), so
that u(x, s) = u(x, t) = α. Then there exists an envelope point (x, y) ∈ E with y = u(x, r)
for some s < r < t.
Proof : Applying the Mean Value Theorem,

        0 = u(x, s) − u(x, t) = (s − t) ∂u/∂t ( x, r ),
for some s < r < t. Since s ≠ t, we deduce that the point (x, y) = (x, u(x, r)) satisfies the
envelope conditions (8.13). Q.E.D.

Conjugate Points
The envelope of a certain one-parameter family of solutions to the Euler–Lagrange equa-
tion plays a critical role in the characterization of minima (and maxima) of the functional.
We first note that the Euler–Lagrange equation is a regular second order ordinary differen-
tial equation provided the coefficient of u′′ is nonzero: ∂²L/∂p² ≠ 0. Under this condition,
the standard existence, uniqueness, and continuous dependence results for solutions to or-
dinary differential equations, [9], can be applied. When dealing with a minimizer, we will
thus assume that the strict Legendre condition
        ∂²L/∂p² (x, u, p) > 0                                                  (8.14)
holds for all a ≤ x ≤ b and all the solutions under consideration. (When treating maxima,
one imposes the opposite inequality.)
Let A = (a, α) ∈ R². Consider the one-parameter family ut (x) = u(x, t) of solutions to
the second order Euler–Lagrange equation with initial conditions

ut (a) = α, u′t (a) = t, (8.15)


where t ∈ R. In particular, our potential minimizer, which also satisfies u(b) = β, is an
element of this family. We let E denote the associated envelope set.
Definition 8.8. The conjugate locus to the point A consists of all points in the envelope
except for A itself: C = E \ { A }. The points C ∈ C are known as conjugate points.
As above, since all solutions pass through A, it must be an envelope point: A ∈ E.
However, the conjugate locus may be empty. In particular, this is the case when the graphs
of the solutions ut do not intersect each other (except at the point A). In the literature,
such a family of non-intersecting solutions is known as a central field of extremals based
at the point A, and will be of importance in Section 10.
For example, consider the problem of geodesics on the sphere. Starting at the north
pole, there is a unique geodesic to any non-antipodal point, i.e., anywhere except the
south pole, namely the arc of a great circle. All such geodesics intersect at the south
pole, which hence belongs to their envelope set and is a conjugate point to the north pole.
The fact that there is not a unique minimizing geodesic from north to south indicates an
interrelationship between conjugate points and minimizers, as we now explore.
Since all solutions u(x, t) = ut (x) pass through the point A = (a, α), the abscissa of a
conjugate point C = (c, γ) ∈ C satisfies c ≠ a, and is known as the associated conjugate
value. Thus, using the envelope conditions (8.13), a point C = (c, γ) with c ≠ a is a
conjugate point provided

∂u/∂t(c, t) = 0,    u(c, t) = ut (c) = γ,        (8.16)

for some t, and hence C ∈ Γut ∩ C is an intersection point of the graph Γut = { (x, ut (x)) }
of ut and the conjugate locus. We will refer to C = (c, γ) as a conjugate point to A for
the solution ut (x), and its abscissa, c, is called a conjugate value to a. As we will show,
the condition that the given solution u be a minimizer requires that it have no conjugate
points, meaning that its graph Γu = { (x, u(x)) | a ≤ x ≤ b } has empty intersection with
the conjugate locus: Γu ∩ C = ∅.
Example 8.9. Let us investigate the case of a quadratic functional (8.7). Its Euler–
Lagrange equation is readily found:
−(d/dx)[ A(x) u′ ] + ( C(x) − B′(x) ) u = 0.        (8.17)
Assuming A(x) ≠ 0 for all a ≤ x ≤ b, (8.17) is a regular, homogeneous linear second order
ordinary differential equation. This implies that the solutions ut (x) to the initial value
problem corresponding to (8.15), namely

ut (a) = 0,    u′t (a) = t,

are scalar multiples of a single solution:

u(x, t) = ut (x) = t u1 (x),    where    u1 (a) = 0,    u′1 (a) = 1.
Since

∂u/∂t = u1 (x),

equation (8.16) requires that the conjugate locus is

C = { (c, 0) | u1 (c) = 0, c ≠ a } .
Since ut (c) = t u1 (c) = 0 as well, we deduce that the conjugate values are the common
zeros (except for a) of the solutions† ut (x).
Let us further analyze the functions
vt (x) = v(x, t) = ∂u/∂t(x, t)        (8.18)
whose vanishing determines the conjugate locus. Since all the ut (x) are assumed to solve
the Euler–Lagrange equation (3.15), we can differentiate it with respect to the parameter t
† Keep in mind that u0 (x) ≡ 0.
and conclude that each vt (x) satisfies a linear second order ordinary differential equation,
known as the Jacobi equation in honor of Carl Jacobi, who also gave his name to the
Jacobian matrix:

−(d/dx)( At (x) dvt/dx ) + ( Ct (x) − B′t (x) ) vt = 0,        (8.19)

where

At (x) = ∂²L/∂p²( x, ut (x), u′t (x) ),    Bt (x) = ∂²L/∂u∂p( x, ut (x), u′t (x) ),
                                                                        (8.20)
Ct (x) = ∂²L/∂u²( x, ut (x), u′t (x) ).
Comparison with (8.2, 3, 17) demonstrates that the Jacobi equation coincides with the
Euler–Lagrange equation for the quadratic functional given by the second variation of the
original variational problem evaluated on the solution ut (x), i.e., the quadratic functional
Q[ ut ; ϕ], where we identify vt = ϕ. Moreover, differentiation of (8.15) with respect to t
implies that vt has initial conditions
vt (a) = 0,    v′t (a) = 1.        (8.21)
Observe that a conjugate value c can thus be identified as a point at which this solution
to the Jacobi equation vanishes: vt (c) = 0. On the other hand, according to Example 8.9,
these are precisely the conjugate values of the second variation functional. We have thus
established that the conjugate values of a functional and its second variation coincide. If c
is a conjugate value, then the associated conjugate point for the quadratic second variation
functional is (c, 0), whereas for the original functional, the conjugate point is (c, ut (c)).
Suppose that we are not at a conjugate point, so vt (x) ≠ 0. Then a short computation
based on (8.19) reveals the following useful identity. If ϕ(x) is any smooth function,

At ϕ′² + 2 Bt ϕ ϕ′ + Ct ϕ² = At (vt ϕ′ − v′t ϕ)²/v²t + (d/dx)[ (At v′t + Bt vt ) ϕ²/vt ].        (8.22)
Let us assume ϕ(a) = ϕ(b) = 0. Then, integrating both sides of (8.22),
∫_a^b ( At ϕ′² + 2 Bt ϕ ϕ′ + Ct ϕ² ) dx = ∫_a^b At (vt ϕ′ − v′t ϕ)²/v²t dx,        (8.23)
where the integral of the final term is zero due to the Fundamental Theorem of Calculus
and the assumed boundary conditions for ϕ. The strict Legendre condition (8.14) implies
At (x) > 0; moreover, vt (x) ≠ 0 for a ≤ x ≤ b, and therefore the right hand side of (8.23)
is ≥ 0. Moreover, the integral equals 0 if and only if vt (x) ϕ′ (x) − vt′ (x) ϕ(x) ≡ 0 for all
a ≤ x ≤ b, which the reader may recognize as the Wronskian of the functions ϕ, vt , [9].
Thus, ϕ(x) satisfies the first order initial value problem
dϕ/dx = w(x) ϕ,    ϕ(b) = 0,    where    w(x) = v′t (x)/vt (x),
and hence, by uniqueness of solutions, [9], ϕ(x) ≡ 0. Thus, if there are no conjugate
values in the interval [ a, b ], then the quadratic functional on the left hand side of (8.23) is
positive definite.
The alert reader may have spotted the one flaw in the above argument. Namely, (8.21)
implies that vt (a) = 0, and hence the integral on the right hand side may have an insur-
mountable singularity at x = a. The way to get around this is to move the initial point
where we defined the family of solutions to the Euler–Lagrange equation a little to the left
so that it lies outside of the interval [ a, b ]. Thus, we fix ε > 0 small, and define ũt (x) to
be the one-parameter family of solutions to the Euler–Lagrange equation satisfying

ũt (a − ε) = α,    ũ′t (a − ε) = t.

Setting ṽt (x) = ∂ũ/∂t(x, t) as before, the fact that there were no conjugate values for ut in
[ a, b ] implies, by the continuous dependence of solutions to ordinary differential equations
on initial data, [9], that, provided ε is sufficiently small, the same holds for ũt , meaning
ṽt (x) ≠ 0 for all a ≤ x ≤ b, including x = a. We now rerun the identity (8.22), but with vt
replaced by ṽt , and this time the argument leading to positive definiteness is completely
legitimate. In particular, setting t = u′(a) where u = ut is our candidate minimizer, we
have proved, under the above conditions, the positive definiteness of the second variation
of the functional. This implies that u is indeed a strict local minimizer for the variational
problem.
Theorem 8.10. Suppose that u(x) satisfies the Euler–Lagrange equation (3.13) with
fixed boundary conditions. Assume that, in addition, the strict Legendre condition (8.14)
holds and there are no conjugate values to a on the interval ( a, b ]. Then u(x) is a strict
weak local minimum for the functional.
Corollary 8.11. Consider the quadratic functional (8.7). If A(x) > 0 and there are no
conjugate points in the interval ( a, b ], then the quadratic functional is positive definite.
Example 8.12. The quadratic functional
J[ u ] = ∫_0^b ( u′² − u² ) dx        (8.24)
has Euler–Lagrange equation

− u′′ − u = 0.
The solutions to the initial value problems u(0) = 0, u′ (0) = t, are ut (x) = t sin x. They
all pass through the points (n π, 0) for n ∈ Z, and hence, as in Example 8.9, the conjugate
locus consists of all these points except the origin. (Note that these are all singular envelope
points.) Theorem 8.10 implies that u ≡ 0 is a strict minimizer, and hence the quadratic
functional (8.24) is positive definite, provided the upper integration limit b < π. This
explains why the original quadratic functional (8.5) is positive definite, since there are no
conjugate values to 0 contained in the interval ( 0, 1 ], while the modified version (8.6) is
not, because the first conjugate value π lies in the interval ( 0, 4 ].
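These conclusions are easily confirmed numerically. The following sketch (assuming numpy and scipy are available) integrates the Jacobi equation v′′ + v = 0 with v(0) = 0, v′(0) = 1 to locate the first conjugate value, and then evaluates (8.24) on the test function ϕ(x) = sin(πx/b), which vanishes at both endpoints, to exhibit indefiniteness once b > π:

```python
# Numerical illustration of Example 8.12 (assumes numpy and scipy).
import numpy as np
from scipy.integrate import solve_ivp, quad

# Integrate the Jacobi equation v'' + v = 0, v(0) = 0, v'(0) = 1,
# written as the first order system v' = w, w' = -v.
sol = solve_ivp(lambda x, y: [y[1], -y[0]], (0.0, 10.0), [0.0, 1.0],
                dense_output=True, rtol=1e-10, atol=1e-12)
xs = np.linspace(0.01, 10.0, 100001)
v = sol.sol(xs)[0]
i = np.argmax(np.sign(v[1:]) != np.sign(v[:-1]))    # first sign change of v
print("first conjugate value ~", xs[i])             # ~ 3.14159 = pi

def J(b):
    # The functional (8.24) evaluated at phi(x) = sin(pi x / b).
    g = lambda x: (np.pi/b * np.cos(np.pi*x/b))**2 - np.sin(np.pi*x/b)**2
    return quad(g, 0.0, b)[0]

print("b = 1:", J(1.0))    # positive: no conjugate value in ( 0, 1 ]
print("b = 4:", J(4.0))    # negative: the conjugate value pi lies in ( 0, 4 ]
```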
Finally, let us show that, conversely, a strict local minimizer cannot admit any conjugate
points. This is based on the fact that its second variation must be positive definite, and
hence, according to the following result, cannot admit any conjugate values.
Theorem 8.13. Suppose that the quadratic functional (8.7) is positive definite. Then
there is no conjugate value in ( a, b ].
Proof : Suppose that there is a conjugate value a < c ≤ b. Let ϕ(x) ≢ 0 be a solution
to the Jacobi equation (8.17) satisfying ϕ(a) = ϕ(c) = 0. Define the piecewise smooth
function

ψ(x) = { ϕ(x),   a ≤ x ≤ c,
         0,      c ≤ x ≤ b.        (8.25)
Factoring out ϕ′ from the first two terms and then integrating by parts, we find

Q[ ψ ] = ∫_a^c ( A ϕ′² + 2 B ϕ ϕ′ + C ϕ² ) dx = ∫_a^c ( −(A ϕ′)′ + (C − B′) ϕ ) ϕ dx = 0,

noting that the boundary contributions vanish by our assumptions on ϕ. Since ψ ≢ 0, this
shows that the quadratic functional is not positive definite. Q.E.D.
9. Weak and Strong Extremals.
In the classical era of the calculus of variations, meaning before the foundational con-
tributions of Karl Weierstrass, only “weak” variations were considered, meaning, in ac-
cordance with the Remark towards the beginning of Section 8, that one compares the
candidate minimizer with functions having the property that both they and their deriva-
tives lie close to it. This was viewed as the most natural setting for both theory and
applications. However, troubling examples, similar in nature to the cautionary example
(3.38), showed that this setting is insufficient for a fully general mathematical theory. And
such analytically challenging phenomena play a key role in modern applications of the cal-
culus of variations, including homogenization theory, [14], and phase transitions in metal
alloys, [6, 28].
To proceed, let us formally define what is meant by weak and strong minimizers.
(Maximizers are, of course, defined in a completely analogous manner.) We will, to keep
the exposition as simple as possible, restrict our attention to scalar first order functionals,
so that x, u ∈ R and the Lagrangian L(x, u, p) depends smoothly on x, u and the derivative
variable p ∈ R. We focus on fixed (Dirichlet) boundary conditions. We will further assume
that the allowable functions are piecewise smooth, meaning that u(x) is continuous and
its derivative u′(x) is piecewise continuous; in other words, the function u(x) is allowed
to have a finite number of corners, but its derivatives (slopes) must have well-defined and
finite limiting values on either side of each corner; a simple example is the absolute value
function | x |, which has a corner at x = 0.
To begin with, however, let us assume that the minimizing function is smooth, meaning
that u′ (x) is also continuous. At the end of this section we will formulate additional
necessary conditions for a piecewise smooth minimizer: the Weierstrass–Erdmann corner
conditions.
Definition 9.1. Let J[ u ] be a first order functional that is defined on a bounded
interval [ a, b ].
• A function u(x) is a strong local minimizer if J[ u ] ≤ J[ v ] for all piecewise smooth
functions v(x) satisfying the boundary conditions and such that
| u(x) − v(x) | < r for some r > 0.
• A function u(x) is a weak local minimizer if J[ u ] ≤ J[ v ] for all piecewise smooth
functions v(x) satisfying the boundary conditions and such that
| u(x) − v(x) | < r and | u′ (x) − v ′ (x) | < r for some r > 0.
Thus, for a strong minimizer, one compares the value of the functional at the minimizer
with its values at all uniformly close functions, i.e., those whose graphs lie within a band of
width 2 r surrounding the minimizer. For a weak minimizer, one further requires that the
comparison functions have derivatives that lie uniformly close to those of the minimizer.
Clearly any strong minimizer is automatically a weak minimizer, but the converse is not
necessarily valid as the following examples demonstrate.
Example 9.2. Consider the problem of minimizing the integral
J[ u ] = ∫_0^1 ( ½ u′² + ⅓ u′³ ) dx        (9.1)
subject to the boundary conditions u(0) = u(1) = 0. Since the Lagrangian only depends
on u′ , the Euler–Lagrange equation (3.13) can be immediately integrated, taking the form
u′ + u′² = c,
where c ∈ R is the constant of integration. Thus u′ is constant, and the solutions to the
Euler–Lagrange equation are straight lines: u = a x+b. Imposing the boundary conditions,
we infer that u(x) ≡ 0 is the unique critical function, with J[ u ] = 0. We claim that u is a
weak local minimizer but is not a strong local minimizer.
To justify the first claim, note that if | v′(x) | < r for all 0 ≤ x ≤ 1, then

J[ v ] = ∫_0^1 ( ½ v′² + ⅓ v′³ ) dx = ∫_0^1 ( ½ + ⅓ v′ ) v′² dx > 0 = J[ u ],

provided r < 3/2 and v′(x) ≢ 0. Indeed, one easily verifies that the strengthened Legendre
condition holds:

∂²L/∂p²(x, u, u′) = 1 + 2 u′ = 1 > 0    when    u ≡ 0.
Moreover, the Jacobi equation (8.19) is simply − ϕ′′ = 0, so if ϕ(0) = 0 and ϕ ≢ 0, then
ϕ(x) = c x ≠ 0 for all x ≠ 0, and hence there are no conjugate points. Thus, u ≡ 0 satisfies
the conditions of Theorem 8.10 guaranteeing its status as a strict (weak) local minimizer.
On the other hand, given r > 0, consider the continuous, piecewise affine function
vr (x) = { r x/(1 − r²),   0 ≤ x ≤ 1 − r²,
           (1 − x)/r,      1 − r² ≤ x ≤ 1,        (9.2)
Figure 17. The Function vr (x). (The graph rises with small slope to height r at x = 1 − r², then drops steeply to 0 at x = 1.)
as graphed in Figure 17. Clearly | vr (x) | ≤ r for all 0 ≤ x ≤ 1. On the other hand, one
easily computes
J[ vr ] = r² (3 + 2 r − 3 r²) / ( 6 (1 − r²)² ) + ½ − 1/(3 r).
As r → 0+ , the final term on the right hand side becomes arbitrarily large and negative, while
the first two terms remain bounded, and hence J[ vr ] < 0 for r sufficiently small, e.g., r < .47.
This implies that u(x) ≡ 0 is not a strong local minimizer. Although vr is uniformly close
to the zero function, its large negative slope on a small interval near x = 1 suffices to make
the functional negative. We will learn how to avoid weak minimizers that are not strong
minimizers in the following section.
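Since the Lagrangian depends only on v′r, which is constant on each of the two affine pieces, J[ vr ] can be evaluated exactly; the following plain Python sketch reproduces the sign change near r ≈ .47:

```python
# Exact evaluation of J[v_r] for the piecewise affine competitor (9.2).
def J(r):
    L = lambda p: p**2/2 + p**3/3              # Lagrangian L(u') of (9.1)
    s1, s2 = r/(1 - r**2), -1.0/r              # slopes of the two pieces
    return L(s1)*(1 - r**2) + L(s2)*r**2       # piece lengths: 1 - r^2 and r^2

for r in (0.6, 0.47, 0.3, 0.1):
    print(f"r = {r}:  J[v_r] = {J(r):+.4f}")
# J[v_r] is positive at r = 0.6 but negative once r drops below about 0.47,
# even though |v_r(x)| <= r everywhere, so u = 0 is not a strong minimizer.
```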
Example 9.3. Consider the variational problem
J[ u ] = ∫_0^1 dx/(1 + u′²),    u(0) = u(1) = 0.        (9.3)
As in the preceding example, the solutions to the Euler–Lagrange equation are affine:
u = a x + b, and hence u(x) ≡ 0 is the unique critical function, with J[ u ] = 1. Moreover,
it satisfies the conditions of Theorem 8.10 and hence is a strict weak local minimizer.
However, the integral, while always positive, can be made arbitrarily small by making
the derivative u′ large. Thus, we introduce a “spikier” adaptation of the sawtooth function
(3.39):
un (x) = { n ( x − k/n² ),          k/n² ≤ x ≤ (2 k + 1)/(2 n²),
           n ( (k + 1)/n² − x ),    (2 k + 1)/(2 n²) ≤ x ≤ (k + 1)/n²,        (9.4)

for k = 0, . . . , n² − 1 and n = 1, 2, 3, . . . . Observe that

0 ≤ un (x) ≤ 1/n,    u′n (x) = ± n.
We conclude that un (x) → 0 uniformly as n → ∞. Moreover, J[ un ] = (1 + n²)⁻¹ → 0 as
n → ∞, while, as we already noted, J[ 0 ] = 1. Thus, u(x) ≡ 0 is a weak minimizer, but not
a strong minimizer. A similar result holds for any other fixed boundary conditions: the
affine function u = c x + d that interpolates the boundary values is a weak but not strong
minimizer, since one can replace it by a nearby sawtooth-like function that gives the
functional an arbitrarily small value.
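Both claims are straightforward to check numerically. In the sketch below (assuming numpy), the sawtooth (9.4) is written compactly as the rescaled distance to the nearest integer, and the integral is computed by the midpoint rule using the slopes ±n read off from (9.4):

```python
# Numerical check that u_n -> 0 uniformly while J[u_n] = 1/(1 + n^2) -> 0.
import numpy as np

def u_n(x, n):
    # The sawtooth (9.4): the distance from n^2 x to the nearest integer,
    # rescaled by 1/n; its slope is +-n almost everywhere.
    return np.abs(n**2 * x - np.round(n**2 * x)) / n

def J_n(n, m=100000):
    # Midpoint rule for J[u_n]; by (9.4) the slope is +-n on each piece, so
    # the integrand 1/(1 + u'^2) is the constant 1/(1 + n^2).
    x = (np.arange(m) + 0.5) / m
    du = np.where(n**2 * x - np.floor(n**2 * x) < 0.5, n, -n)
    return np.mean(1.0 / (1.0 + du**2))

for n in (1, 2, 5, 10):
    xs = np.linspace(0.0, 1.0, 10**5)
    print(f"n = {n:2d}:  max u_n = {u_n(xs, n).max():.4f},"
          f"  J[u_n] = {J_n(n):.6f} = 1/(1 + n^2)")
```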
A related variational problem with similar properties is
Ĵ[ u ] = ∫_a^b x dx/(1 + u′²).        (9.5)
This functional represents the resistance of a cylindrically symmetric projectile that moves
through a uniform fluid, and was introduced by Newton who asked which shape would
produce the least resistance, making this (and not the brachistochrone) the oldest genuine
problem in the calculus of variations. While the solutions to the Euler–Lagrange equation
satisfying suitable boundary conditions are weak minimizers, one can similarly replace
them by arbitrarily close jagged sawtooth-like approximants that produce an arbitrarily
small value for the functional. This counterintuitive fact was noted by Legendre, but proved
to be so disconcerting that the problem was mostly ignored (if not actively suppressed)
until Weierstrass appeared on the scene. See [31; §11.3] for details.
Extremals with Corners
There are cases where the minimizer of a variational problem is not everywhere smooth,
but has one or more corners. In the calculus of variations literature, a non-smooth critical
function is known as a broken extremal, [21, 31].
Example 9.4. Consider the minimization problem for

J[ u ] = ∫_{−1}^1 ½ u² (u′ − 1)² dx,    u(−1) = 0,    u(1) = 1.        (9.6)
Clearly J[ u ] ≥ 0, and J[ u ] = 0 if and only if, at each point, either u = 0 or u′ = 1. Thus, a
global minimum is the function

u(x) = max{ 0, x } = { 0,   −1 ≤ x ≤ 0,
                       x,    0 ≤ x ≤ 1.        (9.7)
Moreover, this is the only function that satisfies both boundary conditions and gives a zero
value to the functional, so there is no smooth minimizer. (We leave it to the reader to
investigate the nature of solutions to the Euler–Lagrange equation. Keep in mind that the
Lagrangian does not depend on x and hence the Hamiltonian function (3.27) is constant
on solutions.)
Necessary conditions for a non-smooth critical function can be derived using a similar
calculation to that used when a boundary condition was constrained to lie on a curve.
For simplicity, let us assume the function has just one corner at a < c < b. (Extending
Figure 18. Varying a Broken Extremal. (The pieces u1 (x), u2 (x) of the broken extremal with corner at c, and a varied curve with pieces v1 (x), v2 (x) whose corner lies at c + ε, over the interval from a to b.)
the analysis to finitely many corners is straightforward.) Thus, we suppose our minimizer
takes the form

u(x) = { u1 (x),   a ≤ x ≤ c,
         u2 (x),   c ≤ x ≤ b,
where, in order that u be continuous and have a corner at x = c, we require
u(c) = u1 (c) = u2 (c),    whereas    u′(c−) = u′1 (c) ≠ u′2 (c) = u′(c+).        (9.8)
When we vary u(x) we can also vary the location of the corner, replacing c by c + ε ζ for
some ζ ∈ R; see Figure 18. Thus, the variation takes the form
uε (x) = { u1 (x) + ε ϕ1 (x),   a ≤ x ≤ c + ε ζ,
           u2 (x) + ε ϕ2 (x),   c + ε ζ ≤ x ≤ b,
where we assume ϕ1 , ϕ2 are smooth and a ≤ c + ε ζ ≤ b. In order that the variations
remain continuous, we require
u1 (c + ε ζ) + ε ϕ1 (c + ε ζ) = u2 (c + ε ζ) + ε ϕ2 (c + ε ζ). (9.9)
The variation in the functional is thus given by
h(ε) = J[ uε ] = ∫_a^{c+εζ} L( x, u1 (x) + ε ϕ1 (x), u′1 (x) + ε ϕ′1 (x) ) dx
    + ∫_{c+εζ}^b L( x, u2 (x) + ε ϕ2 (x), u′2 (x) + ε ϕ′2 (x) ) dx.
As always, we take the derivative of h with respect to ε and then set ε = 0, integrating
by parts at the appropriate stage. Using the formula for the derivative of an integral with
variable endpoints, the net result is
0 = h′(0) = [ L( c, u1 (c), u′1 (c) ) − L( c, u2 (c), u′2 (c) ) ] ζ + ∂L/∂p( c, u1 (c), u′1 (c) ) ϕ1 (c)
    − ∂L/∂p( c, u2 (c), u′2 (c) ) ϕ2 (c) + ∫_a^c E(L)[ u1 ] ϕ1 (x) dx + ∫_c^b E(L)[ u2 ] ϕ2 (x) dx.
                                                                        (9.10)
Here E(L)[ui], for i = 1, 2, denotes the usual second order Euler–Lagrange expression
(3.15) for the Lagrangian L evaluated on the function ui (x), and we have omitted the
boundary terms at x = a, b since these will vanish when u satisfies either the fixed or
natural boundary conditions with ϕ1 , ϕ2 having the appropriate behavior at their respec-
tive endpoint a, b. Thus, taking ϕ1 (x), ϕ2 (x) to have compact support in their respective
intervals ( a, c ), ( c, b ), the Fundamental Lemma 3.4 implies that both Euler–Lagrange ex-
pressions must vanish, or, equivalently, that the non-smooth critical function u(x) satisfies
the Euler–Lagrange equation for all a < x < b, x ≠ c.
Thus, we need to analyze the remaining terms evaluated at x = c in (9.10). Differenti-
ating the continuity equation (9.9) with respect to ε and setting ε = 0 produces

u′1 (c) ζ + ϕ1 (c) = u′2 (c) ζ + ϕ2 (c).        (9.11)
Substituting into (9.10) yields
0 = [ ∂L/∂p( c, u1 (c), u′1 (c) ) − ∂L/∂p( c, u2 (c), u′2 (c) ) ] ϕ1 (c)
    − [ H( c, u1 (c), u′1 (c) ) − H( c, u2 (c), u′2 (c) ) ] ζ,

where

H(x, u, p) = p ∂L/∂p(x, u, p) − L(x, u, p)        (9.12)
is the Hamiltonian function associated with the Lagrangian, cf. (3.27).
Since the variations ζ and ϕ1 (c) are independent — although once these are in hand,
ϕ2 (c) is fixed by (9.11) — their coefficients must individually vanish. Using the identifica-
tions (9.8), we deduce the two Weierstrass–Erdmann corner conditions
∂L/∂p( c, u(c), u′(c−) ) = ∂L/∂p( c, u(c), u′(c+) ),    H( c, u(c), u′(c−) ) = H( c, u(c), u′(c+) ),
                                                                        (9.13)

that are required of any broken extremal. In other words, although u′(x) need not be
continuous at a corner, ∂L/∂p( x, u(x), u′(x) ) and H( x, u(x), u′(x) ) are required to be
continuous there. If the broken extremal has several corners, the Weierstrass–Erdmann
corner conditions (9.13) must hold at each one, or, equivalently, ∂L/∂p( x, u(x), u′(x) ) and
H( x, u(x), u′(x) ) must be continuous on the entire interval [ a, b ].
Example 9.5. In the case of Example 9.4, we have

L(x, u, p) = ½ u² (p − 1)²,    ∂L/∂p(x, u, p) = u² (p − 1),    H(x, u, p) = ½ u² (p² − 1),

and so the corner conditions (9.13) read

½ u(c)² ( u′(c−) − 1 )² = ½ u(c)² ( u′(c+) − 1 )²,    ½ u(c)² ( 1 − u′(c−)² ) = ½ u(c)² ( 1 − u′(c+)² ),

which are trivially satisfied for the minimizer (9.7) at c = 0, since u(0) = 0, u′(0−) = 0,
u′(0+) = 1.
Example 9.6. On the other hand, for the arc length functional (2.3),
L(x, u, p) = √(1 + p²),    ∂L/∂p(x, u, p) = p/√(1 + p²),    H(x, u, p) = −1/√(1 + p²),

and hence the corner conditions (9.13) are

u′(c−)/√(1 + u′(c−)²) = u′(c+)/√(1 + u′(c+)²),    1/√(1 + u′(c−)²) = 1/√(1 + u′(c+)²).
These immediately imply that u′ (c− ) = u′ (c+ ), and hence, as we know, a minimizing
geodesic curve cannot contain a corner.
The latter example is a special case of the following result.
Proposition 9.7. If the Lagrangian satisfies
∂²L/∂p²( x, u, p ) ≠ 0    for all    (x, u, p) ∈ R³,        (9.14)
then no critical function can have a corner.
Proof : We apply the Mean Value Theorem to the first corner condition (9.13) to deduce
that

0 = ( u′(c+) − u′(c−) ) ∂²L/∂p²( c, u(c), p* )

for some p* that lies between u′(c+) and u′(c−). Thus, (9.14) implies that u′(c+) = u′(c−),
and hence there is no corner. Q.E.D.
10. The Royal Road.
The discovery of unexpected and unnerving counterexamples, such as those in Examples
9.2–9.4, to the classical variational analysis of the Bernoullis, Euler, Lagrange, Jacobi, etc.,
inspired Weierstrass, in many ways the founder of modern analysis, to delve deeply into the
foundations of the subject from the rigorous analytic point of view that he was developing
for ordinary calculus. Later, David Hilbert refined and further developed Weierstrass’
breakthroughs. In the 1920’s, the Greek mathematician Constantin Carathéodory, devised
what later became known as the “Royal Road” to their results, [10]. Rather than pursue
a convoluted historical development, we will proceed directly to Carathéodory’s key idea,
and use it to motivate and establish the earlier contributions of Weierstrass and Hilbert.
Fields of Extremals
Let U ⊂ R² be an open subset. In the calculus of variations, a field† of curves on U
means a one-parameter family of non-intersecting regular plane curves Cσ ⊂ U , parame-
trized by σ ∈ J where J ⊂ R is an open interval, such that every point (x, y) ∈ U belongs
† Which has nothing to do with the algebraic notion of a field like R or C.
to one and only one such curve: (x, y) ∈ Cσ . Thus, the curves fill up the domain U and,
moreover, Cσ ∩ Cτ = ∅ when σ ≠ τ . In more standard mathematical terminology, a
field of curves is a foliation of U by curves (one-dimensional submanifolds), [53]. We shall
assume, for simplicity, that the curves Cσ are all given by graphs of C² functions, namely
y = uσ (x) = u(x, σ), where x ∈ Iσ ⊂ R belongs to an open interval, which may depend on
the parameter σ, and such that uσ (x) ≠ uτ (x) for all x (where defined) when σ ≠ τ .
The slope function ψ: U → R of a field of curves is defined so that its value at a point
in U equals the slope of the curve Cσ passing through it. In other words,
ψ(x, y) = u′σ (x) = ∂u/∂x(x, σ)    when    y = uσ (x),    (x, y) ∈ U.
Vice versa, a field of curves is uniquely determined by its slope function; indeed, the curves
Cσ can be identified with the graphs of the solutions to the first order ordinary differential
equation
du/dx = ψ(x, u),        (10.1)
where the parameter σ corresponds to the initial condition that prescribes the solution. The
basic Existence and Uniqueness Theorem for ordinary differential equations, [9], implies
that, provided the slope function ψ(x, y) is continuously differentiable, the solution curves
to (10.1) cannot intersect each other, and so necessarily form a field of curves (a foliation).
Consider a first order scalar functional
J[ u ] = ∫_a^b L(x, u, u′) dx.        (10.2)
By a field of extremals ‡ we mean a field of curves each of which is the graph of a critical
function, i.e., a solution to the Euler–Lagrange equation. This imposes the following
condition on the associated slope function ψ(x, u). Differentiating (10.1) and using the
chain rule,
d²u/dx² = ∂ψ/∂x + ∂ψ/∂u du/dx = ∂ψ/∂x + ψ ∂ψ/∂u.        (10.3)

Substituting (10.1) and (10.3) into the Euler–Lagrange equation (3.14), we deduce that
ψ(x, u) must satisfy the first order partial differential equation

( ∂ψ/∂x + ψ ∂ψ/∂u ) ∂²L/∂p²(x, u, ψ) + ψ ∂²L/∂u∂p(x, u, ψ) + ∂²L/∂x∂p(x, u, ψ) − ∂L/∂u(x, u, ψ) = 0.
                                                                        (10.4)
The general theory of first order partial differential equations, [10, 17], implies that, un-
der fairly mild assumptions on the Lagrangian L, there exist solutions to the slope equa-
tion (10.4) on suitable open domains U ⊂ R².
‡ As in the Remark following Theorem 3.1, any solution of the Euler–Lagrange equation is referred
to in the calculus of variations literature as an “extremal” whether or not it provides an extreme
value for the functional — a maximizer or minimizer. To us, a preferable terminology here, more
in tune with the rest of mathematics, would be critical foliation. However, slightly reluctantly,
we shall retain the standard calculus of variations version.
In the derivation of sufficient conditions for a candidate minimizer u(x), one assumes
that it can be embedded in a field of extremals, meaning that its graph is contained in one
of the field curves.
Theorem 10.1. Let u(x) for a ≤ x ≤ b be a critical function and suppose that
∂²L/∂p²( x, u(x), u′(x) ) ≠ 0 for a ≤ x ≤ b. If the interval ( a, b ] does not contain a conjugate
point, then there exists a field of extremals defined on an open set U containing the graph C0 of u.
Proof : As in the argument leading to Theorem 8.10, if ( a, b ] contains no conjugate
point to a, a slightly larger interval ( a − ε, b + ε ), for ε > 0 sufficiently small, contains
no conjugate point to a − ε for u(x) extended to [ a − ε, b + ε ]. Also, by continuity, the
non-vanishing condition on ∂ 2 L/∂p2 continues to hold on this larger interval.
As in (8.15), but now with a replaced by a−ε, let ut (x) denote the one-parameter family
of solutions satisfying the initial conditions
ut (a − ε) = u(a − ε),    u′t (a − ε) = u′(a − ε) + t.        (10.5)
Given that there are no conjugate points to a − ε, there exists δ > 0 such that the graphs
of the solutions ut and u cannot intersect on ( a − ε, b + ε ) for 0 < | t | < δ. Moreover, by
continuous dependence of solutions of the Euler–Lagrange equation on their initial data,
the union of the graphs of these solutions over ( a −ε, b +ε ) forms an open set U containing
the graph of u = u0 , and thus serves to define the desired field of extremals. Q.E.D.
Equivalent Functionals
It turns out that Carathéodory’s Royal Road is paved by null Lagrangians! In Section 4,
we showed how null Lagrangians enable one to enlarge the range of variationally admissible
natural boundary conditions. Here we will employ them to solve the strong minimization
problem!
Let

N (x, u, p) = (d/dx) S(x, u) = ∂S/∂x + p ∂S/∂u        (10.6)

be a null Lagrangian prescribed by a function S(x, u). As noted in the remarks after (4.10),
the modified functional

J̃[ u ] = ∫_a^b L̃(x, u, u′) dx = ∫_a^b ( L(x, u, u′) − N (x, u, u′) ) dx
        = ∫_a^b ( L(x, u, u′) − ∂S/∂x − u′ ∂S/∂u ) dx = J[ u ] − S(b, β) + S(a, α)        (10.7)

has the same Euler–Lagrange equations and hence the same critical functions as J[ u ]. In
other words, J[ u ] and J̃[ u ] are equivalent functionals.
Given a field of extremals containing our candidate minimizer u(x), let ψ(x, y) denote
the associated slope function. Suppose we can find a null Lagrangian (10.6) such that the
modified Lagrangian
L̃(x, u, p) = L(x, u, p) − N (x, u, p) = L(x, u, p) − ∂S/∂x(x, u) − p ∂S/∂u(x, u)        (10.8)
enjoys the following property:

L̃( x, u, ψ(x, u) ) = 0,    while    L̃(x, u, p) > 0    whenever    p ≠ ψ(x, u).        (10.9)
In other words, at each point (x, u) ∈ U , the domain of definition of our field of extremals,
the modified Lagrangian L̃(x, u, p) has a strict local minimum when p = ψ(x, u) assumes
the value of the slope function at the point (x, u). On first thought, the existence of such
a “magic” null Lagrangian seems highly unlikely, but it turns out that, under appropriate
conditions on the minimizer that are satisfied in a large range of applications, it exists
and, moreover, can be employed to determine sufficient conditions for a strong minimizer.
Before delving into the details, let us explain why its existence enables us to establish the
strong minimization property.
Given (10.9), if v(x) is any function whose graph belongs to U , then

L̃( x, v(x), v′(x) ) ≥ 0,

and

L̃( x, v(x), v′(x) ) ≡ 0    if and only if    v′(x) = ψ(x, v(x)) for all a ≤ x ≤ b.

Uniqueness of solutions to the slope equation (10.1) implies that this occurs if and only
if v = uσ coincides with one of the critical functions whose graph is contained in the field of
extremals. We have thus proved that

J̃[ v ] ≥ 0    and    J̃[ v ] = 0 if and only if v = uσ ,        (10.10)
for some value of the field parameter σ. This result can then be easily converted into
the following sufficient conditions that guarantee that a solution to the Euler–Lagrange
equation is a strong local minimizer.
Theorem 10.2. Let u(x) be a critical function for the variational problem (10.2)
subject to fixed boundary conditions u(a) = α, u(b) = β. Suppose we can embed its graph
{ (x, u(x)) | a ≤ x ≤ b } ⊂ U in a field of extremals defined on an open subset U ⊂ R²
with slope function ψ(x, u). Suppose further we can construct an equivalent Lagrangian
L̃(x, u, p) with the property (10.9). Then u(x) is a strict strong local minimizer of both
functionals J[ u ] and J̃[ u ] subject to the prescribed fixed boundary conditions.
Proof : First note that, since U is open and contains the graph of u, which is compact,
there exists r > 0 such that

Ur = { (x, y) | a ≤ x ≤ b, | y − u(x) | < r } ⊂ U.        (10.11)

If v is any function whose graph lies in Ur ⊂ U and has the same boundary values as u,
then, according to (10.7, 10),

J[ v ] = J̃[ v ] + S(b, β) − S(a, α) ≥ S(b, β) − S(a, α) = J̃[ u ] + S(b, β) − S(a, α) = J[ u ],
with equality if and only if v = uσ is one of the extremals in the field. Moreover, since
one and only one extremal passes through each point of U , the only possibility for equality
when v also has the same boundary values as u is when v = u itself. Thus, u satis-
fies the requirements of Definition 9.1, confirming that it is indeed a strict strong local
minimizer. Q.E.D.
Thus, our remaining task becomes to ascertain when such a “magic” null Lagrangian
exists — that is, when can we construct a function S(x, u) such that N = dS/dx induces
the minimization property (10.10). Referring back to (10.8), the first requirement in (10.9)
means that
L( x, u, ψ(x, u) ) − ∂S/∂x(x, u) − ψ(x, u) ∂S/∂u(x, u) = 0.        (10.12)
The second requirement is that, for fixed x, u, the scalar function
g(p) = L̃(x, u, p) = L(x, u, p) − ∂S/∂x(x, u) − p ∂S/∂u(x, u)

has a minimum when p = ψ(x, u). According to basic calculus, this requires

0 = dg/dp( ψ(x, u) ) = ∂L/∂p( x, u, ψ(x, u) ) − ∂S/∂u(x, u),        (10.13)

and, moreover,

0 ≤ d²g/dp²( ψ(x, u) ) = ∂²L/∂p²( x, u, ψ(x, u) ) = ∂²L/∂p²(x, u, u′),        (10.14)
which is the weakened Legendre condition (8.10). Putting (10.12, 13) together, we deduce
that
∂S/∂x(x, u) = L( x, u, ψ(x, u) ) − ψ(x, u) ∂L/∂p( x, u, ψ(x, u) ),
                                                                        (10.15)
∂S/∂u(x, u) = ∂L/∂p( x, u, ψ(x, u) ).
The existence of a function S(x, u) requires that the integrability condition imposed by
cross differentiation of the two equations (10.15) be satisfied:
∂L/∂u(x, u, ψ) − ψ ∂²L/∂u∂p(x, u, ψ) − ψ ∂ψ/∂u ∂²L/∂p²(x, u, ψ)
    = ∂²S/∂x∂u = ∂²L/∂x∂p(x, u, ψ) + ∂ψ/∂x ∂²L/∂p²(x, u, ψ),
which reproduces the condition (10.4), and hence is automatically valid. Thus we are
assured of the existence of a null Lagrangian that satisfies the first condition (10.13). The
resulting integrated null Lagrangian
I[ u ] = ∫_a^b ( ∂S/∂x + u′ ∂S/∂u ) dx = S(b, β) − S(a, α)
                                                                        (10.16)
       = ∫_a^b [ L( x, u, ψ(x, u) ) − ( ψ(x, u) − u′ ) ∂L/∂p( x, u, ψ(x, u) ) ] dx
depends only on the boundary values of u(x) and thus its value is the same for any
function that satisfies the fixed boundary conditions. The left hand side of (10.16) is
known as Hilbert’s invariant integral, where “invariance” refers to its independence of the
path taken between the points (a, α) and (b, β); see the discussion following (4.9).
Figure 19. The Weierstrass Condition. (The graph of L(x, u, p), as a function of p, lies above its tangent line Tq (x, u, p) at p = q.)
The Excess Function
Let us now investigate what the second derivative condition (10.14) requires of our null
Lagrangian. First, combining (10.8, 15), the modified Lagrangian has the explicit formula
L̃(x, u, p) = E( x, u, ψ(x, u), p ),        (10.17)

where

E(x, u, q, p) = L(x, u, p) − L(x, u, q) − (p − q) ∂L/∂p(x, u, q)        (10.18)

is known as the Weierstrass excess function. Thus, the key condition (10.9) can be written
simply as

E( x, u, ψ(x, u), p ) ≥ 0,    with equality only when p = ψ(x, u).        (10.19)
This will, of course, be ensured by the stronger requirement
E(x, u, q, p) ≥ 0,    with equality only when p = q.        (10.20)
The conditions (10.19–20) were discovered by Weierstrass during his investigations into
why the classical sufficiency conditions for weak minimizers did not guarantee the strong
minimizer property.
To give a geometrical interpretation of the excess function conditions, we note that
E(x, u, q, p) = L(x, u, p) − Tq (x, u, p),    where    Tq (x, u, p) = L(x, u, q) + (p − q) ∂L/∂p(x, u, q),
as a function of p for fixed x, u, q, is the equation of the tangent line to the graph of L(x, u, p)
at the point p = q. Thus, Weierstrass’ condition (10.20) requires that the graph of L(x, u, p)
lie strictly above the tangent line when p ≠ q, as sketched in Figure 19 — the blue curve
on top represents L(x, u, p) as a function of p for fixed x, u, while the purple tangent line
to its graph at p = q lies everywhere below it. Further, observe that the strengthened
Legendre condition (8.14) ensures that L(x, u, p), as a function of p, is strictly convex near
q = u′ , and hence lies above its tangent line nearby; however, this does not guarantee that
it holds for values of p that are far away, as required by the Weierstrass condition. Indeed,
this failure of (10.20) is exactly what enables the phenomenon epitomized by Example 9.2,
where a nearby function with a large negative slope makes the Lagrangian highly negative.
Example 10.3. For the arc length Lagrangian L = √(1 + p²), the Weierstrass excess
function is

E(x, u, q, p) = √(1 + p²) − √(1 + q²) − (p − q) q/√(1 + q²) = ( √(1 + p²) √(1 + q²) − (1 + p q) ) / √(1 + q²).

The numerator is ≥ 0 with equality if and only if p = q, since

(1 + p²)(1 + q²) − (1 + p q)² = (p − q)².

Thus, the Weierstrass condition (10.20) holds. Moreover, we can easily embed any mini-
mizer in a field of extremals — for example, straight lines all possessing the same slope.
We have thus finally rigorously proved our expectation that a straight line is a strong
minimizer of the arc length functional.
Example 10.4. On the other hand, for the Lagrangian L = ½ p² + ⅓ p³ associated with
the variational problem (9.1), we find

E(x, u, q, p) = ½ p² + ⅓ p³ − ½ q² − ⅓ q³ − (p − q)(q + q²) = ½ p² + ⅓ p³ − p q − p q² + ½ q² + ⅔ q³,

which, for any fixed value of q, becomes arbitrarily large and negative as p → −∞. Thus,
the Weierstrass condition (10.20) fails, which explains why u(x) ≡ 0 is not a strong
minimizer.
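Both computations are easily verified symbolically. The following sketch (assuming sympy is available) checks the algebraic identity underlying Example 10.3, and evaluates the excess function of Example 10.4 along the critical function u ≡ 0, i.e., at q = 0, where it turns negative as soon as p < −3/2:

```python
# Symbolic spot checks of Examples 10.3 and 10.4 (assumes sympy).
import sympy as sp

p, q = sp.symbols('p q', real=True)

def excess(L):
    # The Weierstrass excess function (10.18) for a Lagrangian L = L(p).
    return L(p) - L(q) - (p - q)*sp.diff(L(q), q)

# Example 10.3: the identity (1 + p^2)(1 + q^2) - (1 + p q)^2 = (p - q)^2.
print(sp.expand((1 + p**2)*(1 + q**2) - (1 + p*q)**2 - (p - q)**2))   # 0

# Example 10.4: E at q = 0 for L = p^2/2 + p^3/3.
E0 = sp.simplify(excess(lambda r: r**2/2 + r**3/3).subs(q, 0))
print(E0)                  # p**3/3 + p**2/2
print(E0.subs(p, -2))      # -2/3 < 0: the Weierstrass condition (10.20) fails
```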
While positivity of the Weierstrass excess function suffices for the extremal to be a strong
local minimizer, its non-negativity provides an additional necessary condition that must
be satisfied by a strong minimizer.

Theorem 10.5. If u(x) is a strong minimizer, then

E( x, u(x), u′(x), p ) ≥ 0    for all    a ≤ x ≤ b,    p ∈ R.        (10.21)
Proof : Let p ∈ R, and let ε > 0 be such that a < c − ε < c < b. Consider the following
variation:

uε (x) = { u(x) + s (x − a),    a ≤ x ≤ c − ε,
           p (x − c) + u(c),    c − ε ≤ x ≤ c,
           u(x),                c ≤ x ≤ b,

where we require that s ∈ R satisfy

s (c − ε − a) = u(c) − u(c − ε) − p ε        (10.22)
Figure 20. A Strong Variation of u(x).
in order that uε be continuous at x = c−ε. See Figure 20. Observe that (10.22) guarantees
that s → 0 as ε → 0+ , and hence uε (x) is uniformly close to u(x). However, when
c − ε ≤ x ≤ c, the graph of uε (x) is a straight line whose slope p remains fixed, and in
general differs from u′(x), as ε → 0+ . Thus, uε (x) constitutes a strong, but not a weak,
variation of u(x) = u0 (x).
For ε > 0, consider
h(ε) = J[ uε ] = ∫_a^{c−ε} L( x, u(x) + s (x − a), u′(x) + s ) dx
    + ∫_{c−ε}^c L( x, u(c) + p (x − c), p ) dx + ∫_c^b L( x, u(x), u′(x) ) dx.
Since we are assuming u is a strong local minimum, h(ε) must reach a minimum when
ε → 0+ , and hence h′(0+) ≥ 0. We differentiate (10.22) to compute

σ (c − a) = u′(c) − p,    where    σ = ds/dε |_{ε = 0+} .        (10.23)
Thus, employing an integration by parts, and using the fact that u satisfies the Euler–
Lagrange equation E(L) = 0, we find
0 ≤ h′(0+) = − L( c, u(c), u′(c) ) + L( c, u(c), p )
    + ∫_a^c [ ∂L/∂u( x, u(x), u′(x) ) σ (x − a) + ∂L/∂p( x, u(x), u′(x) ) σ ] dx

  = L( c, u(c), p ) − L( c, u(c), u′(c) ) + ∂L/∂p( c, u(c), u′(c) ) σ (c − a)
    + ∫_a^c [ ∂L/∂u( x, u(x), u′(x) ) − (d/dx) ∂L/∂p( x, u(x), u′(x) ) ] σ (x − a) dx

  = L( c, u(c), p ) − L( c, u(c), u′(c) ) − ( p − u′(c) ) ∂L/∂p( c, u(c), u′(c) )

  = E( c, u(c), u′(c), p ).
Since this is valid for all a < c < b, by continuity it also holds when c = a or c = b, thus
establishing Weierstrass’ necessary condition (10.21). Q.E.D.
11. Multi-dimensional Variational Problems.
The methods of variational analysis can be applied to an enormous variety of physical
systems, whose equilibrium configurations inevitably minimize a suitable functional, which,
typically, represents the potential energy of the system, and, in higher dimensions, is
represented by a multiple integral. Following similar computational procedures as in the
one-dimensional version, we find that the critical functions are characterized as solutions to
a system of partial differential equations, known as the Euler–Lagrange equations derived
from the variational principle. Each solution to the associated boundary value problem is,
thus, a candidate minimizer. In many applications, the Euler–Lagrange boundary value
problem suffices to single out the physically relevant solutions, and one need not press on
to the considerably more difficult second variation considerations.
Implementation of the variational calculus for functionals in higher dimensions will be
illustrated by looking at a specific example — a first order variational problem involving a
single scalar function of two variables. Once this is fully understood, generalizations and
extensions to higher dimensions and/or higher order Lagrangians depending on several
unknown functions follow from similar manipulations; see, for instance, [42] for the gen-
eral case. In this section, subscripts are used to indicate partial derivatives of functions;
for example ux = ∂u/∂x, uxy = ∂²u/∂x∂y, and so on. We thus consider an objective
functional

J[ u ] = ∫∫_Ω L(x, y, u, ux , uy ) dx dy,        (11.1)
having the form of a double integral over a prescribed domain Ω ⊂ R², which, for simplicity,
is assumed to be bounded. The Lagrangian L(x, y, u, p, q) is assumed to be a sufficiently
smooth function of its five arguments. Our goal is to find the function(s) u = f (x, y) that
minimize the value of J[ u ] when subject to a set of prescribed boundary conditions on ∂Ω,
which is assumed to be reasonably nice, e.g., piecewise smooth. The two principal types
of boundary conditions are (a) the Dirichlet or fixed boundary value problem, which requires
that the minimizer satisfy

u(x, y) = g(x, y),        (11.2)
for (x, y) ∈ ∂Ω and (b) free boundary conditions, in which no conditions are a priori
imposed on the boundary behavior of u. More generally, one can have mixed boundary
conditions, in which the boundary is split into two subsets, ∂Ω = Ω1 ∪ Ω2 , with fixed
boundary conditions imposed on the first and free boundary conditions on the second. As
in the one-dimensional case, the variational calculus will impose certain natural boundary
conditions on the free part of the boundary.
The First Variation and the Euler–Lagrange Equations
The basic necessary condition for an extremum (minimum or maximum) is obtained
in precisely the same manner as in the one-dimensional framework. Consider the scalar
function

h(ε) ≡ J[ u + ε ϕ ] = ∫∫_Ω L(x, y, u + ε ϕ, ux + ε ϕx , uy + ε ϕy ) dx dy
depending on ε ∈ R. The variation ϕ(x, y) is assumed to satisfy homogeneous Dirichlet
boundary conditions

ϕ(x, y) = 0    for    (x, y) ∈ ∂Ω,        (11.3)
to ensure that u + ε ϕ satisfies the same boundary conditions (11.2) as u itself. Under
these conditions, if u is a minimizer, then the scalar function h(ε) will have a minimum at
ε = 0, and hence
h′ (0) = 0.
When computing h′ (ε), we assume that the functions involved are sufficiently smooth so
as to allow us to bring the derivative inside the integral, and then apply the chain rule. At
ε = 0, the result is
h′(0) = (d/dε) J[ u + ε ϕ ] |_{ε=0} = ∫∫_Ω ( ∂L/∂u ϕ + ∂L/∂p ϕx + ∂L/∂q ϕy ) dx dy,        (11.4)

where the derivatives of L are all evaluated at (x, y, u, ux , uy ). To identify the functional
gradient, we need to rewrite this integral in the form of an inner product:

h′(0) = ⟨ ∇J[ u ] , ϕ ⟩ = ∫∫_Ω ψ(x, y) ϕ(x, y) dx dy,    where    ψ = ∇J[ u ].
To convert (11.4) into this form, we need to remove the offending derivatives from ϕ.
In two dimensions, the requisite integration by parts formula is based on Green’s Theo-
rem, which is written in divergence form, [43]:
∫∫_Ω ( ∂ϕ/∂x w1 + ∂ϕ/∂y w2 ) dx dy = ∮_∂Ω ϕ ( − w2 dx + w1 dy ) − ∫∫_Ω ϕ ( ∂w1/∂x + ∂w2/∂y ) dx dy,
                                                                        (11.5)
in which w1 , w2 are arbitrary smooth functions, and the boundary term is a line integral
around ∂Ω, which is oriented in the counterclockwise direction. Equivalently, we can write
(11.5) in vectorial form
∫∫_Ω ∇ϕ · w dx dy = ∮_∂Ω ϕ w · n ds − ∫∫_Ω ϕ ∇ · w dx dy,        (11.6)
where w = (w1 , w2 ) is a vector field, s denotes arc length, and n = ( dy/ds, − dx/ds ) is
the unit outward normal on ∂Ω. Setting w = ( ∂L/∂p, ∂L/∂q ), we find

∫∫_Ω ( ∂L/∂p ϕx + ∂L/∂q ϕy ) dx dy = ∮_∂Ω ϕ w · n ds − ∫∫_Ω ϕ [ Dx (∂L/∂p) + Dy (∂L/∂q) ] dx dy.
                                                                        (11.7)
The differential operators Dx , Dy are total derivatives with respect to x, y, respectively, in
which the function being differentiated depends on one or more functions and their deriva-
tives; this notation is introduced in order to distinguish from the usual partial derivatives.
For example, given a function L(x, y, u, p, q) in which p, q represent the first order partial
derivatives of u, its total derivatives are, in view of the chain rule,
Dx L(x, y, u, ux , uy ) = ∂L/∂x + ux ∂L/∂u + uxx ∂L/∂p + uxy ∂L/∂q,
                                                                        (11.8)
Dy L(x, y, u, ux , uy ) = ∂L/∂y + uy ∂L/∂u + uxy ∂L/∂p + uyy ∂L/∂q,
where the indicated partial derivatives of L are all evaluated at (x, y, u, ux, uy ).
The boundary integral in (11.7) vanishes if and only if, at each point of ∂Ω, either
ϕ = 0, which follows from the fixed boundary conditions (11.2), or the natural boundary
condition

w · n = ( ∂L/∂p, ∂L/∂q ) · n = 0        (11.9)
holds. At each point on the boundary, one or the other must be imposed. The natural
boundary conditions can be modified by addition of a null Lagrangian, but there are
restrictions on the types of boundary conditions that admit a variational formulation; see
[48] for details.
Further, substituting (11.7) back into (11.4), we conclude that
h′(0) = ∫∫_Ω ϕ [ ∂L/∂u − Dx (∂L/∂p) − Dy (∂L/∂q) ] dx dy = ⟨ ∇J[ u ] , ϕ ⟩,        (11.10)

where

∇J[ u ] = E(L) = ∂L/∂u − Dx (∂L/∂p) − Dy (∂L/∂q)        (11.11)
is the Euler–Lagrange expression that gives the desired first variation or functional gra-
dient. Since the gradient must vanish at a critical function, using the two-dimensional
counterpart of the Fundamental Lemma 3.4 — which is proved in an analogous fashion —
we conclude that the minimizer u(x, y) must satisfy the Euler–Lagrange equation
∂L/∂u(x, y, u, ux , uy ) − Dx [ ∂L/∂p(x, y, u, ux , uy ) ] − Dy [ ∂L/∂q(x, y, u, ux , uy ) ] = 0.   (11.12)
Once we explicitly evaluate the total derivatives using (11.8), the net result is a second
order partial differential equation
Lu − Lxp − Lyq − ux Lup − uy Luq − uxx Lpp − 2 uxy Lpq − uyy Lqq = 0,        (11.13)
where we use subscripts to indicate derivatives of L, which are evaluated at (x, y, u, p, q) =
(x, y, u, ux, uy ).
In general, the solutions to the Euler–Lagrange boundary value problem are critical
functions for the variational problem, and hence include all (smooth) local and global min-
imizers. Determination of which solutions are genuine minima requires a further analysis
of the positivity properties of the second variation, which is beyond the scope of our intro-
ductory treatment. Indeed, a complete analysis of the positive definiteness of the second
variation of multi-dimensional variational problems is quite complicated, and still awaits
a completely satisfactory resolution!
Example 11.1. As a first elementary example, consider the Dirichlet minimization
problem

J[ u ] = ∫∫_Ω ( ½ ‖∇u‖² − f u ) dx dy = ∫∫_Ω ( ½ ( ux² + uy² ) − f (x, y) u ) dx dy,        (11.14)
where f : Ω → R is a specified function. In this case, the associated Lagrangian is

L = ½ (p² + q²) − f (x, y) u,    with    ∂L/∂u = −f,    ∂L/∂p = p = ux ,    ∂L/∂q = q = uy .
Therefore, the Euler–Lagrange equation (11.12) becomes
− Dx (ux ) − Dy (uy ) − f = − uxx − uyy − f = − ∆u − f = 0,        (11.15)
which is the two-dimensional Poisson equation, or, in the homogeneous case when f ≡ 0,
the Laplace equation, [43]. The natural boundary conditions involve the vanishing of
the normal component of w = (p, q) = ∇u, which is the normal derivative of u at the
boundary:
∂u
∇u · n = ∂u/∂n = 0.
These are known as homogeneous Neumann boundary conditions. Thus, to formulate the
boundary value problem for the Poisson equation in variational form using the Dirichlet
functional, at each point on ∂Ω we must impose either inhomogeneous Dirichlet bound-
ary conditions or homogeneous Neumann boundary conditions. Subject to the selected
boundary conditions, the solutions, i.e., the harmonic functions, are critical functions for
the Dirichlet variational principle.
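As a concrete illustration of this variational structure, the sketch below (assuming numpy and scipy are available; the unit square domain, source f ≡ 1, zero boundary data, and grid size are all illustrative choices) minimizes a finite-difference discretization of the Dirichlet functional (11.14) and checks that the minimizer satisfies the discrete Poisson equation −∆u = f at the interior grid points:

```python
# Minimizing a discretized Dirichlet functional recovers the Poisson equation.
import numpy as np
from scipy.optimize import minimize

n, h = 21, 1.0/20                  # (n x n) grid on the unit square, spacing h
f = np.ones((n, n))                # illustrative source term f(x, y) = 1

def pad(u_int):
    u = np.zeros((n, n))           # zero Dirichlet boundary values built in
    u[1:-1, 1:-1] = u_int.reshape(n - 2, n - 2)
    return u

def lap5(u):
    # Standard five-point discrete Laplacian at the interior nodes.
    return (u[2:, 1:-1] + u[:-2, 1:-1] + u[1:-1, 2:] + u[1:-1, :-2]
            - 4.0*u[1:-1, 1:-1]) / h**2

def J(u_int):
    # Discrete version of (11.14): sum of (u_x^2 + u_y^2)/2 - f u over the grid.
    u = pad(u_int)
    ux, uy = np.diff(u, axis=0)/h, np.diff(u, axis=1)/h
    return 0.5*(np.sum(ux**2) + np.sum(uy**2))*h**2 - np.sum(f*u)*h**2

def gradJ(u_int):
    # Discrete functional gradient: (-lap(u) - f) h^2, mirroring (11.11).
    return ((-lap5(pad(u_int)) - f[1:-1, 1:-1])*h**2).ravel()

res = minimize(J, np.zeros((n - 2)**2), jac=gradJ, method="CG", tol=1e-12)
u = pad(res.x)
print("max |Delta u + f| at interior nodes:",
      np.abs(lap5(u) + f[1:-1, 1:-1]).max())
```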
Example 11.2. The deformations of an elastic body Ω ⊂ Rⁿ are described by the
displacement field, u: Ω → Rⁿ. Each material point x ∈ Ω in the undeformed body will
move to a new position y = x + u(x) in the deformed body
Ω̃ = { y = x + u(x) | x ∈ Ω } .
The one-dimensional case governs bars, beams and rods, two-dimensional bodies include
thin plates and shells, while n = 3 for fully three-dimensional solid bodies. See [3, 25] for
details and physical derivations.
For small deformations, we can use a linear theory to approximate the more complicated
equations of nonlinear elasticity. The simplest case is that of a homogeneous and isotropic
planar body Ω ⊂ R 2 . The equilibrium configurations are described by the displacement
function u(x) = ( u(x, y), v(x, y) ), whose Jacobian matrix is denoted by

∇u = [ ux   uy
       vx   vy ].
In the absence of external forcing, detailed physical analysis of the underlying constitutive
assumptions leads to a minimization principle based on the following functional:
J[ u, v ] = ∫∫_Ω [ ½ µ ‖∇u‖² + ½ (λ + µ)(∇ · u)² ] dx dy
                                                                        (11.16)
          = ∫∫_Ω [ ( ½ λ + µ )( ux² + vy² ) + ½ µ ( uy² + vx² ) + (λ + µ) ux vy ] dx dy.
The parameters λ, µ are known as the Lamé moduli of the material, and govern its intrinsic
elastic properties. For a homogeneous isotropic material, which we assume here, they
are constant. They are measured by performing suitable experiments on a sample of
the material. Physically, (11.16) represents the stored (or potential) energy in the body
under the prescribed displacement, and Nature, as always, seeks the displacement that will
minimize the total energy.
To determine the Euler–Lagrange equations, we consider the functional variation
h(ε) = J[ u + ε ϕ, v + ε ψ ],
in which the individual variations ϕ, ψ are arbitrary functions subject only to the given
homogeneous boundary conditions. If u, v minimize J, then h(ε) has a minimum at ε = 0,
and so we are led to compute
h′(0) = ⟨ ∇J , ϕ ⟩ = ∫∫_Ω ( ϕ ∇u J + ψ ∇v J ) dx dy,
which we write as an inner product (using the standard L2 inner product between vector
fields) between the variation ϕ = (ϕ, ψ) and the functional gradient ∇J = ( ∇u J, ∇v J ).
For the particular functional (11.16), we find h′ (0) equals
∫∫_Ω [ (λ + 2 µ)( ux ϕx + vy ψy ) + µ ( uy ϕy + vx ψx ) + (λ + µ)( ux ψy + vy ϕx ) ] dx dy.

We use the integration by parts formula (11.5) to remove the derivatives from the variations
ϕ, ψ. Discarding the boundary integrals, which are used to prescribe the variationally
admissible boundary conditions, we find

h′(0) = − ∫∫_Ω [ ( (λ + 2 µ) uxx + µ uyy + (λ + µ) vxy ) ϕ + ( (λ + µ) uxy + µ vxx + (λ + 2 µ) vyy ) ψ ] dx dy.
The two terms in braces give the two components of the functional gradient. Setting them
equal to zero produces the second order linear system of Euler–Lagrange equations
(λ + 2 µ) uxx + µ uyy + (λ + µ) vxy = 0,
                                                                        (11.17)
(λ + µ) uxy + µ vxx + (λ + 2 µ) vyy = 0,

known as Navier's equations, which can be compactly written as

µ ∆u + (µ + λ) ∇(∇ · u) = 0        (11.18)
for the displacement vector u = ( u, v ). Its solutions are the critical displacements that,
under appropriate boundary conditions, minimize the potential energy functional.
Minimal Surfaces
As our next example, let us analyze the minimal surface problem, i.e., finding the surface
S ⊂ R 3 whose boundary is a specified space curve, ∂S = C, that minimizes the total
surface area. Initially we will assume, in order to simplify the exposition, that the surface
is described by the graph of a function z = u(x, y), although the ensuing results will apply
as stated to general parametrized surfaces.
From (2.10), the surface area integral
J[ u ] = ∫∫_Ω √(1 + ux² + uy²) dx dy    has Lagrangian    L = √(1 + p² + q²).        (11.19)
Note that

∂L/∂u = 0,    ∂L/∂p = p/√(1 + p² + q²),    ∂L/∂q = q/√(1 + p² + q²).

Therefore, replacing p → ux and q → uy and then evaluating the derivatives, the Euler–
Lagrange equation (11.12) becomes

− Dx( ux/√(1 + ux² + uy²) ) − Dy( uy/√(1 + ux² + uy²) )
    = ( − (1 + uy²) uxx + 2 ux uy uxy − (1 + ux²) uyy ) / ( 1 + ux² + uy² )^{3/2} = 0.        (11.20)
Thus, u(x, y) is a critical function, and hence a candidate for minimizing surface area,
provided it satisfies the minimal surface equation
(1 + uy²) uxx − 2 ux uy uxy + (1 + ux²) uyy = 0.        (11.21)
We are confronted with a complicated, nonlinear, second order partial differential equation,
which has been the focus of some of the most sophisticated and deep analysis over the past
two centuries. We refer the interested reader to [24, 27, 38, 39] for classical and modern
developments.
It turns out that one can employ a combination of basic surface differential geometry,
[24], and complex analysis, [2, 46], to derive a large class of solutions to the minimal
surface equation. Consider a parametrized surface S ⊂ R³, given by the image of

x(ξ, η) = ( x(ξ, η), y(ξ, η), z(ξ, η) )ᵀ,        (11.22)
where the parameters range over a connected open subset of the plane: (ξ, η) ∈ U ⊂ R².
The tangent plane at a point x(ξ, η) ∈ S is spanned by the two tangent vectors

xξ = ( ∂x/∂ξ, ∂y/∂ξ, ∂z/∂ξ )ᵀ,    xη = ( ∂x/∂η, ∂y/∂η, ∂z/∂η )ᵀ,
which are required to be linearly independent in order to avoid singularities; equivalently,
we require xξ × xη ≠ 0, where × is used to denote the cross product in R³. The surface
unit normal† is then given by

n = xξ × xη / ‖ xξ × xη ‖    with    ‖ n ‖ = 1.        (11.23)
We should also assume that the surface is simple, meaning that it does not self-intersect,
so that x(ξ, η) ≠ x(ξ̂, η̂) whenever (ξ, η) ≠ (ξ̂, η̂). However, since all our considerations
are local, this global condition can be ignored in the ensuing analysis; moreover, the
Implicit Function Theorem, [4, 35], implies that a surface is locally non-self-intersecting
near any nonsingular point. We will sometimes assume that the surface can be locally
identified with the graph of a function z = u(x, y), in which case we can parametrize S
by x(x, y) = ( x, y, u(x, y) )ᵀ; this is referred to as a Monge patch on the surface. Again
by the Implicit Function Theorem, this is locally true provided the last component of n
is nonzero, xξ yη − xη yξ ≠ 0; if one of the other components is non-zero, one can do the
same by relabelling the x, y, z coordinates.
The surface metric tensor, also known as the “First Fundamental Form”, is traditionally
denoted by

ds² = dx · dx = E dξ² + 2 F dξ dη + G dη²,        (11.24)

where dx = xξ dξ + xη dη, and hence

E = ‖ xξ ‖²,    F = xξ · xη ,    G = ‖ xη ‖².        (11.25)
The mean and Gauss curvatures of S are given, respectively, by, cf. [24; Theorem 13.25],
H = ( e G − 2 f F + g E ) / ( 2 (E G − F²) ),    K = ( e g − f² ) / ( E G − F² ),        (11.26)
where
e = xξξ · n, f = xξη · n, g = xηη · n, (11.27)
are the coefficients of the “Second Fundamental Form”. In particular, on a Monge patch,
E = 1 + ux²,    F = ux uy ,    G = 1 + uy²,    E G − F² = 1 + ux² + uy²,
                                                                        (11.28)
e = uxx/√(1 + ux² + uy²),    f = uxy/√(1 + ux² + uy²),    g = uyy/√(1 + ux² + uy²),

and hence

H = ( (1 + uy²) uxx − 2 ux uy uxy + (1 + ux²) uyy ) / ( 2 (1 + ux² + uy²)^{3/2} ),
K = ( uxx uyy − uxy² ) / ( 1 + ux² + uy² )².        (11.29)
† In fact, there are two unit normals at each point, namely ± n, and the choice of one of them
induces an orientation on S. For example, if S is a closed surface, one might consistently choose
the unit outward normal. Non-orientable surfaces, like Möbius bands, do not admit an everywhere
smooth choice of normal.
Observe that the minimal surface Euler–Lagrange equation (11.20) is, up to an irrelevant
factor of −½, the equation

H = 0.        (11.30)
In other words, a surface is minimal, or, more accurately, a critical point for the surface
area functional, if and only if its mean curvature vanishes. This identification is the first
step in our solution to the minimal surface equation.
Remark : It is not hard to prove that the mean and Gauss curvatures are invariant under
rigid motions — translations and rotations of three-dimensional space. In other words, if
one translates or rotates the surface, the curvatures are the same at the image point. A
much deeper theorem, due to Gauss, is that the Gauss curvature is intrinsic, whereas the
mean curvature depends on how the surface is embedded in three-dimensional space, [24].
The next step in our analysis is to introduce special coordinates on the surface.
Definition 11.3. The parameters (λ, µ) ∈ U ⊂ R² are called isothermal coordinates
on the surface S parametrized by x(λ, µ) if the corresponding metric coefficients (11.25)
satisfy E = G, F = 0; equivalently,

‖ xλ ‖² = ‖ xµ ‖² = ρ²,    xλ · xµ = 0,        (11.31)

for some positive function ρ(λ, µ) > 0.
Isothermal coordinates thus serve to diagonalize the metric tensor (11.24):

ds² = ρ² ( dλ² + dµ² ),        (11.32)
which is thus conformally† equivalent to the flat metric. It is not hard to prove that one
can locally construct isothermal coordinates on any surface. One strategy is to (complex)
factorize the metric tensor (11.24):
ds² = (1/E) ( E dξ + ( F + i √(E G − F²) ) dη ) ( E dξ + ( F − i √(E G − F²) ) dη ).

Then, as a consequence of the basic theory of ordinary differential equations, [9], one can
construct a common complex integrating factor‡ σ for the two components, satisfying

σ(λ, µ) ( dλ + i dµ ) = E dξ + ( F + i √(E G − F²) ) dη,
σ(λ, µ) ( dλ − i dµ ) = E dξ + ( F − i √(E G − F²) ) dη.

Thus,

ds² = (σ²/E) ( dλ + i dµ ) ( dλ − i dµ ) = ρ² ( dλ² + dµ² ),    where    ρ² = σ²/E.


† Two metrics are said to be conformally equivalent if they measure identical angles, which
requires that they differ pointwise only by an overall scalar multiple.

‡ Finding the integrating factor for the first component automatically implies that it is also an
integrating factor for the second component.



Next, differentiation of (11.31) produces

xλλ · xλ = xλµ · xµ , xλµ · xλ = xµµ · xµ , xλλ · xµ + xλµ · xλ = xλµ · xµ + xµµ · xλ = 0,

and hence
(xλλ + xµµ ) · xλ = (xλλ + xµµ ) · xµ = 0.

Thus, the vector xλλ + xµµ is orthogonal to both tangent vectors, and hence must be a
scalar multiple of the unit normal:

xλλ + xµµ = ω n, (11.33)


where ω(λ, µ) is a scalar-valued function. Moreover, using the formula for the mean cur-
vature in (11.26, 27), combined with (11.32), we find

H = ω/(2ρ²) = [(xλλ + xµµ) · n] / (2ρ²). (11.34)
In particular, if the surface is minimal, and hence H = 0, then (11.33, 34) imply

xλλ + xµµ = 0, (11.35)


meaning that the isothermal parametrization (x(λ, µ), y(λ, µ), z(λ, µ))ᵀ of the surface is
determined by a vector of harmonic functions, i.e., its components are individually solutions
to the Laplace equation (11.15).
With this in hand, we can produce an explicit parametrization of a general minimal
surface using the connection between complex analytic functions and harmonic functions;
see, for example, [2, 46]. We accordingly write

x(λ, µ) = Re z(ζ), (11.36)


where z = (z₁, z₂, z₃)ᵀ is a vector of complex analytic functions depending on the com-
plex variable ζ = λ + i µ. By the standard rules of complex differentiation, the complex
derivative of z can be written as
z′(ζ) = ½ (xλ − i xµ).
Thus, taking the complex dot product of z′ with itself, meaning the sum of the squares
of its entries (not the Hermitian inner product, which involves a conjugate on the second
factor), we find

z′ · z′ = ¼ (xλ − i xµ) · (xλ − i xµ) = ¼ (xλ · xλ − xµ · xµ − 2 i xλ · xµ) = 0, (11.37)


where the final equality is a direct consequence of the isothermality conditions (11.31).

Definition 11.4. A complex analytic vector-valued function z(ζ) is said to define a
minimal curve if it satisfies z′ · z′ = 0.

Thus, we have proved the following striking result:



Figure 21. Enneper’s Minimal Surface.

Theorem 11.5. Every minimal surface can be locally identified as the real part of a
minimal curve: x = Re z.
Remark: If z(ζ) defines a minimal curve, then one easily sees that so does e^{i t} z(ζ) for any
t ∈ R, and hence any minimal curve defines a one-parameter family of minimal surfaces:
x(t; λ, µ) = Re [e^{i t} z(ζ)]. (11.38)
Example 11.6. The function
 
z(ζ) = (ζ − ⅓ ζ³, i ζ + ⅓ i ζ³, ζ²) has z′(ζ) = (1 − ζ², i + i ζ², 2 ζ), (11.39)
and hence defines a minimal curve. The corresponding minimal surface is known as
Enneper’s surface and is one of a wide variety of classically studied examples. Its isother-
mal parametrization is given by

x(λ, µ) = Re z(λ + i µ) = (− ⅓ λ³ + λ µ² + λ, ⅓ µ³ − λ² µ − µ, λ² − µ²). (11.40)
A part of it, corresponding to −2.4 ≤ λ, µ ≤ 2.4, is plotted in Figure 21. Enneper's surface
is not simple, meaning that it has self-intersections, as the figure makes clear.
Remark: One can use formula (11.38) to construct a one-parameter family of Enneper
surfaces, but it turns out they can be mapped to each other by a rigid motion, and hence
define essentially the same surface.
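As a quick numerical sanity check — my own illustration, with an arbitrary sample point — one can confirm with Python/numpy that the minimal curve (11.39) satisfies z′ · z′ = 0 and that (11.40) is indeed its real part:

    import numpy as np

    def zprime(zeta):                       # derivative of the minimal curve (11.39)
        return np.array([1 - zeta**2, 1j*(1 + zeta**2), 2*zeta])

    rng = np.random.default_rng(0)
    for _ in range(5):
        zeta = complex(rng.normal(), rng.normal())
        w = zprime(zeta)
        assert abs(np.sum(w*w)) < 1e-10     # complex dot product, no conjugation

    def enneper(lam, mu):                   # the isothermal parametrization (11.40)
        return np.array([-lam**3/3 + lam*mu**2 + lam,
                         mu**3/3 - lam**2*mu - mu,
                         lam**2 - mu**2])

    zeta = 0.3 + 0.7j
    z = np.array([zeta - zeta**3/3, 1j*(zeta + zeta**3/3), zeta**2])
    assert np.allclose(z.real, enneper(0.3, 0.7))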
Minimal curves were first investigated by the Norwegian mathematician Sophus Lie.
Weierstrass realized that they can be explicitly characterized as follows. Let f (ζ), g(ζ) be
scalar complex analytic functions defined on U ⊂ C. By direct calculation, the analytic
vector
w(ζ) = (½ f(ζ)(1 − g(ζ)²), ½ i f(ζ)(1 + g(ζ)²), f(ζ) g(ζ))ᵀ (11.41)



Figure 22. Costa’s Minimal Surface.

satisfies w · w = 0. Thus, if we set


z(ζ) = ∫ w(ζ) dζ, (11.42)

where we integrate component-wise and can use any constants of integration, then z′ (ζ)
satisfies (11.37) and hence defines a minimal curve. In fact, most minimal curves can be
so represented. To see this, set z′ = (z₁′, z₂′, z₃′)ᵀ. Assuming z₁′ − i z₂′ ≠ 0, then

f = z₁′ − i z₂′, g = z₃′ / (z₁′ − i z₂′), (11.43)

recovers the Weierstrass representation (11.41). For example, the Enneper surface obtained
from the minimal curve (11.39) has Weierstrass representation given by f(ζ) = 2, g(ζ) = ζ.
By selecting various complex functions f, g, one can thereby explicitly construct a wide
variety of interesting minimal surfaces through formulas (11.41, 42). See [24; Chapter 22]
for further details, including plots and Mathematica Notebooks for computing a variety
of interesting examples.
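In the same experimental spirit, the following sympy sketch — an added illustration, using the Enneper data f = 2, g = ζ just discussed — constructs the Weierstrass vector (11.41), confirms the minimal curve condition (11.37), integrates via (11.42), and then recovers f, g through (11.43):

    import sympy as sp

    zeta = sp.symbols('zeta')
    f = sp.Integer(2)                        # Enneper data from the text
    g = zeta

    w = sp.Matrix([sp.Rational(1, 2)*f*(1 - g**2),
                   sp.Rational(1, 2)*sp.I*f*(1 + g**2),
                   f*g])
    assert sp.simplify(w.dot(w)) == 0        # minimal curve condition (11.37)

    z = w.applyfunc(lambda c: sp.integrate(c, zeta))   # minimal curve (11.42)
    zp = z.diff(zeta)
    f_rec = sp.simplify(zp[0] - sp.I*zp[1])            # recover f via (11.43)
    g_rec = sp.simplify(zp[2] / f_rec)                 # recover g via (11.43)
    print(f_rec, g_rec)                                # prints 2 and zeta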

Example 11.7. A particularly famous modern example of a minimal surface is the


Costa surface, constructed by the Brazilian mathematician Celso Costa in 1982, [15]. The
Costa minimal surface has Weierstrass representation

f(ζ) = ℘(ζ), g(ζ) = 2 e √(2π) / ℘′(ζ), (11.44)

where ℘(ζ) is the Weierstrass elliptic function with moduli g₂ = 4e², g₃ = 0, cf. [2, 41].
For the particular constant in the numerator of g, it can be proved that the Costa surface
is both simple and complete, meaning that it has no self-intersections and no boundary;
see Figure 22. (The soap film in Figure 3 is also a Costa surface.) Until its discovery only
a few such surfaces were known.



Image Processing
Variational principles are of great importance for a wide variety of tasks in modern day
image processing. Here we mention one of these — denoising; for more details, the reader
may consult [11, 12, 51].
A digital image or video is based on a collection of pixels. In the three-dimensional
case these are often referred to as voxels, but we will use “pixel” to describe all situations.
In a grayscale image, each pixel is assigned a number that indicates its gray level. After
possible rescaling, the gray level of a pixel ranges from 0, which corresponds to black, to
1, corresponding to white. Thus the value is a number in the interval [0, 1]. Or, more
correctly, since everything is digitized, the gray level belongs to a discrete subset of the
interval, e.g., { k/n | k = 0, . . . , n }, where n is typically a power of 2. Similarly, a color
image assigns a vector, say v = (r, g, b) ∈ [ 0, 1 ]3 ⊂ R 3 indicating the RGB (red, green,
blue) intensities of the pixel or v = (c, m, y, k) ∈ [ 0, 1 ]4 ⊂ R 4 for a CMYK (cyan, magenta,
yellow, black) image.
Just as in fluid and solid mechanics, where the discrete physical system consisting of
individual molecules is effectively modeled by viewing it as a continuum, one similarly
replaces the discrete pixels of a digital image by a continuum. Ultimately, when solving
the continuum equations numerically, one re-discretizes using a different set of tools. The
passage from discrete to continuous back to discrete has proven to be a very fruitful and
insightful means of understanding the many phenomena of physical systems.
In the same vein, it is helpful to identify a grayscale image with a function u: Ω → [ 0, 1 ]
whose value 0 ≤ u(x) ≤ 1 at a point x = (x1 , . . . , xn ) ∈ Ω indicates the gray level of the
point. Here Ω ⊂ R n is a bounded, connected domain, and n = 2 for planar images (photos),
n = 3 for three-dimensional images and videos, in which the third coordinate is time, n = 4
for three-dimensional videos, while n = 1 is often a good testing ground for algorithms. In
the two-dimensional case, Ω is usually a rectangle, or, in higher dimensions, a rectangular
box. For color images, the only difference is that u: R n → R m is vector-valued. Here, to
simplify the exposition, we restrict our attention to grayscale images.
A basic task in image processing is removing extraneous noise from a raw image, a process
known as denoising or image restoration. A variety of methods have been developed for
this purpose; here we summarize an effective approach that relies on a variational problem.
Let f : Ω → R represent the original image, and u: Ω → R the sought-after denoised
version. Here, for specificity, we restrict our attention to planar images, so Ω ⊂ R 2 ,
although the techniques apply in general. A key observation is that noise occurs where the
derivatives of the image function are large, i.e., ‖∇f‖ ≫ 0, where ‖·‖ is any convenient
norm on R n , e.g., the standard Euclidean norm. Actually, the gradient vector is also large
at edges of objects in the image, where the grayscale value has a sudden transition. One
of the main challenges in denoising is to remove the noise from the image while retaining
the bona fide boundaries of objects without significant blurring or alteration.
Thus, one could try to reduce the noise by minimizing a functional depending on some
power of the norm of the gradient, e.g.,
J₀[u] = ∫∫ ‖∇u‖^σ dx dy. (11.45)



For example, if σ = 2, then (11.45) coincides (up to a factor) with the Dirichlet functional
(11.14), whose Euler–Lagrange equation is the Laplace equation (11.15). However, as it
stands (11.45) contains no information on the image we are trying to denoise. We include
this by adding in a fidelity term, leading to the objective functional of the form†
J_f[u] = ∫∫ [‖∇u‖^σ + λ (u − f)²] dx dy, (11.46)

where λ > 0 is a parameter that measures the relative importance of the two terms and
can be adjusted to optimize the result. The first term in the Lagrangian seeks to minimize
any large gradient-induced noise, while the second term strives to stay close to the original
image. At the minimizer, they adjust to a denoised version u that is close the original image
f . One must also impose suitable boundary conditions on ∂Ω. The most common choice
is the natural boundary conditions which, in the case of (11.46), are the homogeneous
Neumann conditions that the normal derivative of the function vanish on the boundary:
∂u/∂n = 0 on ∂Ω.
For example, in the case σ = 2, (11.46) is a modified version of the Dirichlet functional
and the corresponding Euler–Lagrange equation is (after removing an overall factor of 2)
the following linear second order partial differential equation:
− ∆u + λ u = λ f. (11.47)
This is an inhomogeneous version of the well-studied Helmholtz equation, in which λ plays
the role of an eigenvalue. The homogeneous Helmholtz equation arises when applying the
method of separation of variables to many fundamental second order partial differential
equations, including heat, wave, Laplace, and so on, [43]. However, it has been observed
that, while the solution to (11.47) does a good job removing the unwanted noise, it tends
to blur edges to an unacceptable extent, and so this functional is not used as much in
current practice.
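To see the blurring effect concretely, here is a minimal Python sketch — discretization and parameter values are illustrative choices of mine, not prescriptions from the text — that solves the one-dimensional analogue of (11.47), namely −u″ + λu = λf with homogeneous Neumann boundary conditions, by second order finite differences:

    import numpy as np

    n, lam = 200, 500.0
    x = np.linspace(0.0, 1.0, n)
    h = x[1] - x[0]
    rng = np.random.default_rng(1)
    f = np.where(x < 0.5, 1.0, -1.0) + 0.2*rng.normal(size=n)   # noisy step

    # tridiagonal matrix discretizing -u'' + lam*u with Neumann conditions
    A = np.zeros((n, n))
    idx = np.arange(n)
    A[idx, idx] = 2.0/h**2 + lam
    A[idx[:-1], idx[:-1] + 1] = -1.0/h**2
    A[idx[1:], idx[1:] - 1] = -1.0/h**2
    A[0, 1] = A[-1, -2] = -2.0/h**2        # ghost-point Neumann closure
    u = np.linalg.solve(A, lam*f)          # denoised, but the edge is blurred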
A minimization principle that has been found to perform better is obtained by setting
σ = 1 in (11.46). The resulting objective functional is known as the total variation (TV),
and has the form
J_tv[u] = ∫∫ [‖∇u‖ + ½ λ (u − f)²] dx dy. (11.48)

A short calculation shows that the corresponding Euler–Lagrange equation is the following
nonlinear second order partial differential equation:
− div(∇u / ‖∇u‖) + λ u = λ f, (11.49)
which, in the planar case, has the explicit form
− Dx(ux / √(ux² + uy²)) − Dy(uy / √(ux² + uy²)) + λ u
  = − [uy² uxx − 2 ux uy uxy + ux² uyy] / (ux² + uy²)^{3/2} + λ u = λ f. (11.50)


† One could also try other powers in the fidelity term, but this has no appreciable effect on
performance.




Figure 23. TV Denoising of a One Dimensional Signal.

However, the TV Lagrangian


L(x, y, u, p, q) = √(p² + q²) + ½ λ (u − f(x, y))²

is not differentiable when p = q = 0, which is the cause of the singular denominator when
∇u = 0 in the Euler–Lagrange equation (11.50). These singularities cause both analytic
and numerical difficulties that interfere with the efficacy of the method. To avoid them,
one can regularize the TV functional, using
Jε[u] = ∫∫ [√(‖∇u‖² + ε²) + ½ λ (u − f)²] dx dy, (11.51)

where ε2 > 0 is a small parameter. In practice, one tries to adjust ε to be as small as


feasible without causing undesired numerical instabilities.
The associated regularized Euler–Lagrange equation for TV denoising is

− div(∇u / √(‖∇u‖² + ε²)) + λ u = λ f, (11.52)
or, in full,
[− (uy² + ε²) uxx + 2 ux uy uxy − (ux² + ε²) uyy] / (ux² + uy² + ε²)^{3/2} + λ u = λ f. (11.53)

Observe that if one rescales u ↦ ε u then the initial term in the regularized functional
(11.51) is ε times the surface area functional (11.19), and so the second order terms in
the Euler–Lagrange equation (11.52) form a rescaled version of the minimal surface equa-
tion (11.20). Figure 23 shows the effect of TV denoising of a one-dimensional noisy signal
— the equation is obtained from (11.53) by eliminating all the y derivative terms; observe
how the sharp discontinuities (edges) are retained while the noise is smoothed out. Fig-
ure 24 shows the effect of TV denoising on an image of Vincent Hall, the home of the



Figure 24. TV Denoising of Vincent Hall Image.



School of Mathematics of the University of Minnesota† . The upper noisy image has been
reasonably well restored in the middle image, at a fairly large value of λ, but has become
noticeably blurred in the bottom image, when λ is small and the fidelity term has minimal
impact.
Gradient Descent
In practice, one proceeds, not by trying to solve the Euler–Lagrange equation (11.49), but
by approximating the solution through gradient descent, [35, 47]. Recall that the gradient
∇F of an objective function F : R n → R points in the direction of its steepest increase,
whereas its negative − ∇F is in the direction of steepest decrease. Thus, starting at a given
point x0 , in order to minimize F , one should consistently move in the direction − ∇F to
eventually reach — or, more accurately, come arbitrarily close to — the minimum — or,
more accurately, reach either a local minimum or, in rare cases, another type of critical
point, or become unbounded. If one were hiking in the mountains, this would correspond
to consistently walking in the direction that is steepest downhill, eventually reaching the
bottom of a valley if one does not wander off to infinity or end up at a mountain pass (a
saddle point), although the latter can be avoided by slightly perturbing the route. In the
continuous version, gradient descent takes the form of an initial value problem for a system
of ordinary differential equations
dx
= − ∇F (x), x(0) = x0 , (11.54)
dt
known as the gradient descent flow . The solution starts out at position x0 and, provided
it remains bounded, asymptotically approaches a critical point where ∇F = 0. If the
critical point is a local minimum, then it is stable under gradient flow, and all nearby
solutions will also end up there. In practical implementations, one replaces the continuous
gradient descent flow (11.54) by a discretized version, in which one computes a series of
approximations xn for n = 0, 1, 2, . . . . To locate the next iterate, one moves a small
amount in the gradient direction, so

xn+1 = xn − tn ∇F (xn ), (11.55)


where the “step sizes” tn ∈ R are suitably chosen to try to optimize the descent and avoid
numerical instabilities. Further refinements, including conjugate gradients and the like,
[49], are the subject of current research activity.
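A minimal sketch of the basic iteration (11.55), with a fixed step size and a simple quadratic objective of my own choosing, reads as follows:

    import numpy as np

    def gradF(x):                    # gradient of F(x) = (x1-1)^2 + 10*(x2+2)^2
        return np.array([2.0*(x[0] - 1.0), 20.0*(x[1] + 2.0)])

    x = np.array([5.0, 5.0])         # starting point x0
    t = 0.04                         # fixed step size t_n
    for _ in range(500):
        x = x - t*gradF(x)           # the update (11.55)
    print(x)                         # converges to the minimizer (1, -2)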
In the infinite dimensional case under consideration here, the objective function becomes
a functional, J[ u ], specified by a Lagrangian L, and the role of the gradient is played by
the functional gradient or Euler–Lagrange expression E(L). Thus, the gradient descent
flow (11.54) becomes an evolutionary partial differential equation
∂u
= − G(u), where G = E(L), (11.56)
∂t


These images are copied from Jeff Calder's lecture notes, [11], with the author's permission.



typically of parabolic type, [43]. The initial condition u|t=0 can be taken to be the original
image f . Thus, for example, the gradient descent flow (11.56) for the Dirichlet functional
(11.14) is the heat equation
∂u
= ∆u, (11.57)
∂t
while for the modified denoising functional (11.46), it is a linear modification with the
original image providing a forcing term:
∂u
= ∆u + λ (f − u). (11.58)
∂t
The smoothing out (denoising) of initial data is a well known property of parabolic partial
differential equations like the heat equation, [43]. Finally, for the regularized TV func-
tional (11.51), gradient descent takes the form of a nonlinear parabolic partial differential
equation
∂u ∇u
= div p + λ (f − u). (11.59)
∂t k ∇u k2 + ε2
When λ = 0 and ε = 1, the equation (11.59) represents gradient descent for the surface
area functional (11.19), whose solutions, when subject to suitable boundary conditions,
will converge to minimal surfaces as t → ∞. As in the finite-dimensional case, one
replaces the continuous flow (11.56) by a discrete iterative equation of the form
u^{(n+1)} = u^{(n)} − tₙ G(u^{(n)}). (11.60)


When used for image denoising, one then has the option of stopping the discrete updates as
soon as the image is sufficiently noise-free, and before features such as edges have become
degraded through blurring.
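For concreteness, here is a sketch of the discretized descent (11.60) for the regularized TV functional in one space dimension, as in Figure 23; the grid, step size, regularization, and fidelity parameters are illustrative choices of mine (explicit time stepping of this stiff flow requires quite small steps for stability), not values taken from the text:

    import numpy as np

    n, lam, eps, dt, steps = 200, 50.0, 0.1, 1.0e-6, 30000
    x = np.linspace(0.0, 1.0, n)
    h = x[1] - x[0]
    rng = np.random.default_rng(2)
    f = np.where(x < 0.5, 1.0, -1.0) + 0.2*rng.normal(size=n)   # noisy step

    u = f.copy()
    for _ in range(steps):
        ux = np.diff(u)/h                          # slopes at the n-1 midpoints
        flux = ux/np.sqrt(ux**2 + eps**2)          # regularized TV flux
        div = np.empty(n)
        div[1:-1] = np.diff(flux)/h                # interior divergence
        div[0], div[-1] = flux[0]/h, -flux[-1]/h   # no-flux (natural) boundaries
        u += dt*(div + lam*(f - u))                # explicit step of (11.59)
    # u now approximates the TV-denoised signal: edges kept, noise flattened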

12. Noether’s Theorems.


In a remarkable and extremely influential paper published in 1918, [40], Emmy Noether
proved two general theorems relating symmetries of variational problems and conservation
laws of their Euler–Lagrange equations. In the case of variational problems involving a
single independent variable, a conservation law is the same as a first integral, meaning a
function that is constant on each solution to the Euler–Lagrange equation. An example
is the Hamiltonian function for an x-independent Lagrangian; see Theorem 3.2. In this
section, we will review these results, initially in the simplest context of a scalar first order
variational problem. More generally, symmetry under time translations corresponds to
conservation of energy, symmetry under space translations corresponds to conservation
of (linear) momentum, while symmetry under rotations corresponds to conservation of
angular momentum. Further physical (and non-physical) conservation laws arise from
other types of symmetries.
We begin with an elementary introduction to groups and one-parameter transformation
groups, specializing to symmetry groups of a variational problem and their relationship
to their first integrals/conservation laws. The general version of Noether’s First Theorem,
based on the presentation in [42], follows. Noether’s Second Theorem, which we will not



cover, relates degeneracies of the Euler–Lagrange equation to infinite-dimensional varia-
tional symmetry groups depending on arbitrary functions of the independent variables. An
example of the latter is the dependency (5.21) among the Euler–Lagrange equations for
a parametric variational problem, where the symmetry group is the infinite-dimensional
group of reparametrization maps. We refer the reader to [42] for a definitive account
including complete proofs of both Theorems. See [30] for their strange history, as well as
an English translation of Noether’s original paper, [40]; further historical remarks can be
found in [45].
Lie Groups and Symmetry Groups
Recall first that a group is a set G that has an associative but not necessarily commu-
tative multiplication operation, which will be (mostly) denoted by g · h ∈ G for group
elements g, h ∈ G. The group must also contain a (necessarily unique) identity element,
denoted e, and each group element g has an inverse g −1 satisfying g · g −1 = g −1 · g = e.
The simplest example of all is the group that contains only the identity element, which is
its own inverse.
Example 12.1. A very basic example of a group is the real numbers R with addition as
the group operation. The identity is 0 and the inverse of x ∈ R is − x. The integers Z ⊂ R
form a subgroup, meaning a subset that forms a group in its own right by inheriting the
group operations from the larger group. Note that the group operation on R, and hence on
Z, is commutative: x + y = y + x; commutative groups are often called abelian, in honor
of the early nineteenth century Norwegian mathematician Niels Henrik Abel, who applied
group theory to the solution to algebraic equations† . Another abelian group is R n , with
vector addition as the group operation.
Example 12.2. A basic example of a non-abelian (non-commutative) group is the
group of invertible n × n matrices, called the general linear group of order n, and denoted

GL(n) = { A an n × n matrix | det A ≠ 0 }.


The group operation is matrix multiplication; the identity element is the identity matrix I, and the
group inverse of A is the usual matrix inverse A⁻¹.
Example 12.3. Symmetries of geometric objects, meaning transformations that map
the object back to itself, necessarily form groups because the composition of two symmetries
is a symmetry, as is the inverse of any symmetry. The identity symmetry is the identity
map, that does nothing to the object and so trivially defines a symmetry.
For example, the rotational symmetries of a square form a 4 element group, consisting
of rotations through 0◦ (the identity), 90◦ , 180◦ and 270◦ . If one further allows reflections,
then there are 4 additional symmetries: the reflections through the 2 axes and the 2


In particular, Abel was the first to prove that one could not solve a general scalar polynomial
equation, p(x) = 0, of degree ≥ 5 using algebraic operations: addition, subtraction, multiplication,
division, and taking roots.



diagonals. The 4 element rotation group is abelian, but the 8 element rotation/reflection
symmetry group is not.
On the other hand, the rotational symmetry group of a circle contains every rotation of
the plane around its center. Rotations are parametrized by their angle θ, which is well-
defined modulo 2 π, and so this group can be identified with the unit circle S 1 . The group
operation of composition of rotations corresponds to addition of their angles, θ + ϕ, again
modulo 2 π, and hence this group is abelian; the identity is the rotation through 0◦ , and
the inverse of a rotation through θ is the rotation through angle −θ. Alternatively, one
can identify a rotation with its matrix
A = ( cos θ  −sin θ
      sin θ   cos θ )
that acts linearly on vectors
in R 2 . Rotation matrices can be characterized by the conditions AT A = I and det A = 1,
and form the special orthogonal group of degree 2, denoted SO(2) ≃ S 1 . Dropping the
determinantal condition produces the non-abelian orthogonal group O(2), consisting of
rotational and reflectional symmetries of the circle. Both SO(2) and O(2) are realized as
subgroups of GL(2).
In general, the orthogonal group of order n is the subgroup

O(n) = { A ∈ GL(n) | Aᵀ A = I }
of rotational and reflectional symmetries of the (n − 1)-dimensional sphere Sⁿ⁻¹ ⊂ Rⁿ. Note
that the orthogonality condition implies either det A = +1, in which case A is a rotation,
or det A = −1, in which case A is a reflection possibly combined with a rotation. The
special orthogonal group, consisting only of the rotations, is thus

SO(n) = { A ∈ O(n) | det A = 1 } ,


and is non-abelian if n ≥ 3. (As the student can check by physically manipulating an
object, e.g., a book, rotations in three-dimensional space do not, in general, commute.)
Groups that depend on continuous parameters are known as Lie groups (pronounced
“Lee”), in honor of the late nineteenth century Norwegian mathematician Sophus Lie,
who, in part inspired by Abel, introduced them to solve differential equations. More
rigorously, [42], a Lie group is a group that is also a smooth manifold such that the
group multiplication and inversion are smooth maps. In particular, a Lie group carries a
topology, i.e., a notion of open subsets. A Lie group is connected if it forms a connected
topological space, meaning that it cannot be written as the disjoint union of two nonempty
open subsets. For example, it can be proved that SO(n) is connected, but O(n) is not,
since it consists of two disjoint open subsets, namely the rotations and the reflections. The
general linear group is also a Lie group — the parameters are the matrix entries — but
is not connected. On the other hand, the subgroup GL+ (n) = { A ∈ GL(n) | det A > 0 },
consisting of all orientation-preserving linear maps, is connected.
Remark : Finite groups, like the symmetry group of a square, or discrete groups, like Z,
are, technically, zero-dimensional Lie groups. However, none of them, with the exception of
the trivial group consisting only of the identity element, G = { e }, are connected. Moreover,
one cannot use the calculus-based tools of importance in Lie theory. The groups that play
a role in Noether’s Theorems are the connected Lie groups of dimension ≥ 1.



Transformation Groups and Their Infinitesimal Generators

In fact, in an introductory treatment, we need only consider connected one-parameter


groups, meaning that they depend on a single real parameter, or, equivalently, are one-
dimensional manifolds (curves). It is not difficult to show that, up to isomorphism† , there
are, in fact, only two such groups, namely R and SO(2) ≃ S 1 , the only difference being
whether or not we do addition modulo 2 π. Consequently, we will not lose anything by
only considering R from here on.

Definition 12.4. A transformation group is determined by a Lie group G acting on a


space M via a map Φ: G × M → M , denoted by Φ(g, x) = g · x, which satisfies

e · x = x, g · (h · x) = (g · h) · x, for all x ∈ M, g ∈ G. (12.1)

In particular, g −1 · g · x = e · x = x, and hence g −1 determines the inverse transformation


to g. In applications, M is a smooth manifold, e.g., M = R n or an open subset thereof,
and the action is defined by a smooth map Φ with the above properties.

Example 12.5. An evident example is provided by the usual linear action of the
general linear group GL(n), acting by matrix multiplication on column vectors, whereby
Φ(A, x) = A x for A ∈ GL(n) and x ∈ Rn . This action clearly induces a linear action
of any subgroup of GL(n, R) on R n . In fact most, but not all, Lie groups can be real-
ized as subgroups G ⊂ GL(n) of a general linear group of some order n. A linear action
of G is known as a representation, and representation theory is a well developed field of
mathematics with numerous applications throughout mathematics and physics, particu-
larly in quantum mechanics, [13, 54]. On the other hand, nonlinear actions are also of
great importance in applications, including Noether’s Theorems.

As noted above, for our purposes we need only consider the action of the connected one-
parameter group R, which acts on a space M as a one-parameter transformation group via
a map Φ: R × M → M satisfying‡

Φ(s, Φ(t, x)) = Φ(s + t, x), Φ(0, x) = x, s, t ∈ R, x ∈ M. (12.2)

Some basic examples can be found below.


There is a close correspondence between one-parameter transformation groups and ordi-
nary differential equations. To explain this, we assume that M = R n , although extending
these constructions to general manifolds is not difficult, [42]. Namely, every one-parameter

† Two groups G and G̃ are called isomorphic, written G ≃ G̃, if there is an invertible map
F: G → G̃ that preserves the group operations: F(g · h) = F(g) · F(h). Isomorphic groups are, for
all practical purposes, identical.

‡ These conditions also govern actions of the circle group SO(2), where now s, t denote angles
and so are defined modulo 2 π. But any action of SO(2) can be viewed as an action of R — just
identify Φ(t + 2 n π, x) for n ∈ Z with Φ(t, x) — and so we need only consider actions of R.



transformation group arises as the solution to an autonomous system of first order ordinary
differential equations
dx
= ξ(x), x ∈ M, t ∈ R. (12.3)
dt
Let Φ(t, x0 ) = x(t) denote the solution with initial condition x(0) = x0 . In other words
(dropping the 0 subscript on the initial data),
∂Φ/∂t (t, x) = ξ(Φ(t, x)), Φ(0, x) = x. (12.4)
Assuming ξ(x) is continuously differentiable, the basic existence and uniqueness theorems
for first order systems imply that Φ(t, x) satisfies the transformation group axioms (12.2);
details are left as an exercise for the reader. Vice versa, given a one-parameter group, one
constructs the corresponding system of ordinary differential equations by setting t = 0 in
the first equation in (12.4):
ξ(x) = ∂Φ/∂t (0, x). (12.5)
The right hand side of (12.3) can be viewed as a vector field on M , and the associated one-
parameter transformation group is the flow induced by the vector field. In fluid mechanics,
ξ(x) represents the velocity vector field of a steady state fluid flow, where the fluid particles
follow the paths x(t) defined by the solutions. In transformation group theory, the vector
field is known as the infinitesimal generator of the group action. This is because the group
transformations or, equivalently, the solutions to the ordinary differential equation (12.3)
for t near 0 have the Taylor expansion

Φ(t, x) = x + t ξ(x) + · · · ,
and hence ξ(x) prescribes the “infinitesimal” motion of the point x under the group action
or flow. In particular, if ξ(x0 ) = 0, then x0 is a fixed point for the transformation group,
with Φ(t, x0 ) = x0 for all t ∈ R. In particular, if ξ(x) ≡ 0, then all points are fixed and
the action is trivial — the group consists of only the identity transformation.
Example 12.6. Let M = R 2 . The simplest nontrivial one-parameter transformation
group is the group of translations in the horizontal direction:

Φ(t, (x, y)) = (x + t, y). (12.6)
Differentiating, as in (12.5), we see that the corresponding vector field has ξ(x, y) = (1, 0).
It is easily verified that these translations, x(t) = x + t, y(t) = y, form the solution to the
initial value problem dx/dt = 1, dy/dt = 0, with initial conditions x(0) = x, y(0) = y. More
generally, translations in the fixed direction (a, b) ∈ R 2 are given by

Φ(t, (x, y)) = (x + t a, y + t b) with infinitesimal generator ξ(x, y) = (a, b).
The planar rotation group is given by

Φ(t, (x, y)) = (x cos t − y sin t, x sin t + y cos t), (12.7)



where t ∈ R denotes the rotation angle. Differentiating with respect to t and setting
t = 0, we find that the corresponding vector field or infinitesimal generator is given by
ξ(x, y) = (− y, x). As before, the rotations form the solution to the associated system of
ordinary differential equations dx/dt = −y, dy/dt = x. Note that the origin is a fixed point
under rotations since ξ(0, 0) = (0, 0).
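One can confirm this correspondence numerically; the following sketch (my own illustration, using scipy's standard initial value solver) integrates the vector field ξ(x, y) = (−y, x) and compares the result with the closed-form rotation flow (12.7):

    import numpy as np
    from scipy.integrate import solve_ivp

    def xi(t, p):                    # infinitesimal generator xi(x, y) = (-y, x)
        x, y = p
        return [-y, x]

    s, p0 = 1.2, [2.0, 0.5]          # group parameter and initial point
    sol = solve_ivp(xi, [0.0, s], p0, rtol=1e-10, atol=1e-12)
    flow = sol.y[:, -1]
    exact = [p0[0]*np.cos(s) - p0[1]*np.sin(s),
             p0[0]*np.sin(s) + p0[1]*np.cos(s)]
    assert np.allclose(flow, exact, atol=1e-6)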
The one complication in this correspondence between transformation groups and or-
dinary differential equations is that the solutions to the system of ordinary differential
equations (12.3) may not be defined for all t ∈ R. For example, given the planar vector
field ξ(x, y) = (x², y), the solution to the associated system
dx/dt = x², dy/dt = y, is Φ(t, (x, y)) = (x(t), y(t)) = (x/(1 − t x), eᵗ y),
which is not defined when t = 1/x for x 6= 0. Nevertheless, the solutions satisfy the
transformation group axioms (12.2) where defined, i.e., for t, s sufficiently close to 0, or,
equivalently, for transformations that are sufficiently close to the identity.
Such considerations lead one to introduce the concept of a local one-parameter trans-
formation group, where the group axioms hold where defined and not necessarily for all
t, s ∈ R, although the identity axiom, when t = 0, is required everywhere; this is the origi-
nal version that Lie developed, and full details can be found in [42]. There is a one-to-one
correspondence between local one-parameter transformation groups and autonomous first
order systems of ordinary differential equations, or, equivalently vector fields, which serve
as the infinitesimal generators.
Given a (local) transformation group, a real-valued function I: M → R is called an
invariant if its values are unaffected by the group action:

I(g · x) = I(x) for all g ∈ G, x ∈ M. (12.8)


More generally, I may only be defined on an open subset of M , and/or the group action
may be local, so the invariance condition (12.8) is only required to hold when well-defined.
In the sequel, we will only be interested in smooth invariants, meaning that I(x) has
continuous derivatives up to some order.
For example, I(x, y) = y is an invariant for the translation group (12.6), as is any
function thereof: Ĩ(x, y) = F(y). Similarly, the radius I(x, y) = r = √(x² + y²) is an
invariant for the rotation group (12.7). Again, any function of the radius is also invariant.
In general, if I₁(x), . . . , Iₖ(x) are invariants, so is Ĩ(x) = F(I₁(x), . . . , Iₖ(x)) when F is
any function.
For a general one-parameter transformation group, invariance of I(x) requires†

I(Φ(t, x)) = I(x) for all t ∈ R.


If the group is a local transformation group, the requirement only holds for those values of t
where defined.



Figure 25. Group Action on the Graph of a Function.

Differentiating with respect to t, setting t = 0, and using the chain rule, we deduce that
v(I) = 0, where v = Σ_{i=1}^n ξᵢ(x) ∂/∂xᵢ. (12.9)
The expression on the right hand side is a convenient differential geometric notation for
the vector field defined by the function ξ(x) = (ξ1 (x), . . . , ξn (x)). Indeed, from here on we
will identify vector fields and first order linear partial differential operators as in (12.9).
Given a transformation group G acting on M , there is an induced action on subsets
S ⊂ M . Namely, we set g · S = { g · x | x ∈ S }. If g · S = S, we call g ∈ G a symmetry of
the subset S. In particular, the identity element e is a symmetry of every subset. Moreover,
the set of symmetries of a subset S is easily seen to be a subgroup of G, known as the
symmetry subgroup of S. In general, if every element g ∈ G is a symmetry of a subset
S, we call S a G-invariant subset. If I: M → R is an invariant function, then every level
set, Sc = { I(x) = c } for any c ∈ R, is a G-invariant subset. For the rotation group SO(2)
acting on the plane, the circles (and the origin) are the level sets of the radius invariant
and clearly rotationally invariant.
Action on Functions
We are particularly interested in the action of groups on the graphs of functions. In
the simplest case, consider a one-parameter transformation group action on the plane, so
M = R 2 . We use coordinates (x, u), regarding x as the independent variable  and u as the
dependent variable. We identify a function u = f(x) with its graph Γ_f = { (x, f(x)) } ⊂
R². The group transformations act on the graph as above, and we identify the transformed
graph g · Γf = Γg·f with the graph of the transformed function F = g · f ; see Figure 25.
In formulas, the group will act on the plane R² via transformations of the form
X = A(t, x, u), U = B(t, x, u), (12.10)

where t ∈ R is the group parameter, while A, B are the components of Φ(t, (x, u)). Each
point (x, f (x)) ∈ Γf in the graph of u = f (x) will thus be mapped to the point
X = A(t, x, f (x)), U = B(t, x, f (x)), (12.11)



which, for fixed t, belongs to the graph ΓFt of the transformed function U = Ft (X). Thus,
(12.11) serves to parametrize the graph of F , with x serving as the parameter. And hence,
to find the formula for U = Ft (X), we must eliminate x from the two equations.
Example 12.7. Consider the one-parameter group SO(2) of rotations

Φ(t, (x, u)) = (x cos t − u sin t, x sin t + u cos t). (12.12)
Such a rotation transforms a function u = f (x) by rotating its graph. Therefore, the
transformed graph will be the graph of a well-defined function provided the rotation angle
t is not too large. Thus, in general, the group action on functions may only be a local
action, defined for group elements sufficiently close to the identity.
Fixing the rotation angle t, the equation for the transformed function is given in implicit
form
X = x cos t − f (x) sin t, U = x sin t + f (x) cos t,
so that U = Ft (X) is found by eliminating x from these two equations. For example, if
f (x) = a x + b is affine, i.e., its graph is a straight line, then the transformed function is
also affine, and given explicitly by
F_t(X) = [(sin t + a cos t)/(cos t − a sin t)] X + b/(cos t − a sin t),
which is well defined provided cot t ≠ a, i.e., provided the graph of f has not been rotated
to be vertical.
Before proceeding further, let us formally state the definition of the total derivative of a
function F (x, u, p, q, . . .) that depends on the independent variable x, the dependent vari-
able u, and, possibly, the derivative variables: p, representing u′ , along with q representing
u′′ , and so on. Specifically the total derivative of F is equal to its derivative with respect
to x when u represents a function of x, etc., and is denoted
Dx F(x, u, p, q, . . . ) = ∂F/∂x + p ∂F/∂u + q ∂F/∂p + ⋯ , (12.13)
where the terms on the right hand side represent the usual partial derivatives of F with
respect to its arguments. Thus, if u = f (x), then
(d/dx) F(x, f(x), f′(x), f″(x), . . . ) = Dx F(x, f(x), f′(x), f″(x), . . . ).
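Since the total derivative (12.13) will be used repeatedly below, it may help to see it implemented; the following sympy sketch (an added illustration of mine, truncated at third order derivative coordinates) checks it against an ordinary derivative for a sample F and u = sin x:

    import sympy as sp

    x, u, p, q, r = sp.symbols('x u p q r')   # p, q, r stand for u', u'', u'''

    def Dx(F):                                # total derivative (12.13), truncated
        return sp.diff(F, x) + p*sp.diff(F, u) + q*sp.diff(F, p) + r*sp.diff(F, q)

    F = x*u**2 + p                            # a sample differential function
    fx = sp.sin(x)                            # substitute u = sin(x)
    lhs = sp.diff(F.subs({u: fx, p: fx.diff(x), q: fx.diff(x, 2)}), x)
    subs = {u: fx, p: fx.diff(x), q: fx.diff(x, 2), r: fx.diff(x, 3)}
    assert sp.simplify(Dx(F).subs(subs) - lhs) == 0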
Now, the one-parameter transformation group will also act on the derivatives of the
function u = f (x), by mapping the tangent line to its graph at a point (x, f (x)) to the
tangent line to the graph of (X, Ft (X)) at the transformed point (12.11). The formulae
are found by implicit differentiation, so
dU/dX = [ (d/dx) B(x, f(x), t) ] / [ (d/dx) A(x, f(x), t) ]
      = [ ∂B/∂x (x, f(x), t) + ∂B/∂u (x, f(x), t) f′(x) ] / [ ∂A/∂x (x, f(x), t) + ∂A/∂u (x, f(x), t) f′(x) ].
Note that the numerator and denominator are the total derivatives of B, A, respectively,
evaluated at (x, f (x), f ′(x), t). Thus, the so-called prolonged transformation rule for the
derivative coordinate p that represents f ′ (x) is a linear fractional transformation
P = C(t, x, u, p) = Dx B(t, x, u, p) / Dx A(t, x, u, p) = [Bx(t, x, u) + Bu(t, x, u) p] / [Ax(t, x, u) + Au(t, x, u) p], (12.14)
where Dx denotes the total derivative (12.13), and we use subscripts to denote partial
derivatives of A, B. Note that the transformed derivative is not defined when Dx A = 0;
these are precisely the points at which the transformed graph has a vertical tangent and
we can no longer solve for U as a single-valued smooth function of X.
The formulas for the transformed higher order derivatives can be found similarly, but
rapidly become quite complicated. On the other hand, the infinitesimal generator of the
prolonged action has a relatively easy formula that can be readily extended to higher
order and higher dimensions. First, differentiating (12.10) with respect to t and setting
t = 0 produces the entries of the infinitesimal generator (vector field) of the original one-
parameter transformation group:
v = ξ(x, u) ∂/∂x + ϕ(x, u) ∂/∂u, (12.15)
where
ξ(x, u) = ∂A/∂t (0, x, u), ϕ(x, u) = ∂B/∂t (0, x, u), (12.16)
in accordance with (12.5), and where we adopt the vector field notation in (12.9). Thus,
to find the infinitesimal change in the derivative coordinate p we differentiate (12.14) with
respect to t and set t = 0, which yields
 
π(x, u, p) = ∂C/∂t (0, x, u, p) = ∂ϕ/∂x + (∂ϕ/∂u − ∂ξ/∂x) p − (∂ξ/∂u) p². (12.17)
This is appended to the original infinitesimal generator (12.15) to form its first prolongation
v∗ = ξ(x, u) ∂/∂x + ϕ(x, u) ∂/∂u + π(x, u, p) ∂/∂p, (12.18)
which is a vector field on R 3 , with underlying coordinates x, u, p. The corresponding
prolonged group transformations can be recovered by solving the associated system of
ordinary differential equations
dx/dt = ξ(x, u), du/dt = ϕ(x, u), dp/dt = π(x, u, p). (12.19)
The formulas for the prolongations to higher order derivatives will be given, in complete
generality, below.
For later purposes, it will help to rewrite the prolonged vector field (12.18) in a more
convenient form. First, define the characteristic of v to be the function

Q(x, u, p) = ϕ(x, u) − p ξ(x, u). (12.20)



Then, using the formula (12.13) for the total derivative, the prolongation formula (12.17)
can be written in the form
   
π(x, u, p) = Dx Q + q ξ = ∂Q/∂x + p ∂Q/∂u + q ∂Q/∂p + q ξ = ∂ϕ/∂x + p (∂ϕ/∂u − ∂ξ/∂x) − p² ∂ξ/∂u, (12.21)
where the terms involving q cancel out since ∂Q/∂p = − ξ. Moreover, we can write (12.18)
in the form
 
v∗ = Q ∂/∂u + (Dx Q) ∂/∂p + ξ (∂/∂x + p ∂/∂u + q ∂/∂p) = v_Q^{(1)} + ξ Dx, (12.22)
where
v_Q = Q ∂/∂u, v_Q^{(1)} = Q ∂/∂u + (Dx Q) ∂/∂p. (12.23)
In fact, as we will see, formulae (12.18, 23) can be straightforwardly adapted to determine
the higher order prolongations of the infinitesimal generator v. The differential operator
vQ is known as the evolutionary form of v, which indicates that it has no x component,
while v_Q^{(1)} is its first prolongation, in accordance with the prolongation formula (12.21)
when ξ = 0, ϕ = Q. The evolutionary form v_Q is not an ordinary vector field on R²
because the characteristic Q also depends on the derivative coordinate p. Furthermore,
v_Q^{(1)} is not an ordinary vector field on R³ because Dx Q also depends on q. Such objects
are known in the literature as generalized vector fields, [42].
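The prolongation formula (12.21) is easily mechanized; this sympy sketch (my own illustration) applies it to the rotation generator v = −u ∂/∂x + x ∂/∂u, recovering the coefficient π = 1 + p² that reappears in Example 12.8 below:

    import sympy as sp

    x, u, p, q = sp.symbols('x u p q')
    xi, phi = -u, x                      # coefficients of the rotation generator

    def Dx(F):                           # total derivative (12.13), up to q
        return sp.diff(F, x) + p*sp.diff(F, u) + q*sp.diff(F, p)

    Q = phi - p*xi                       # characteristic (12.20): x + p*u
    pi = sp.expand(Dx(Q) + q*xi)         # prolongation coefficient (12.21)
    assert pi == 1 + p**2                # the q-terms cancel, as the text notes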

Variational Symmetries and First Integrals

With this in hand, let us investigate the action of one-parameter transformation groups
on variational problems. To begin, let us concentrate on the simplest case of a scalar first
order functional
J[u] = ∫ₐᵇ L(x, u, u′) dx (12.24)

with Lagrangian L(x, u, p). Given a one-parameter transformation group as in (12.10),


we first form its prolongation (12.14) to find the action on the derivative coordinate p.
Furthermore, the transformed integrand has the form L(X, b U, P ) dX, where, according to
the usual change of variables for integrals,
 
dX = dA(t, x, f(x)) = [ (d/dx) A(t, x, f(x)) ] dx
   = [ ∂A/∂x (t, x, f(x)) + f′(x) ∂A/∂u (t, x, f(x)) ] dx = Dx A(t, x, f(x), f′(x)) dx.
We deduce that the transformed Lagrangian L̂_t(X, U, P) is related to the original Lagrangian L(x, u, p) according to the formula
L(x, u, p) = L̂_t(A(t, x, u), B(t, x, u), C(t, x, u, p)) Dx A(t, x, u, p). (12.25)



Equivalently, since −t determines the inverse transformation,
x = A(−t, X, U), u = B(−t, X, U), p = C(−t, X, U, P),
and hence
L̂_t(X, U, P) = L(A(−t, X, U), B(−t, X, U), C(−t, X, U, P)) Dx A(−t, X, U, P). (12.26)

The one-parameter transformation group is a symmetry group of the variational problem


if all its transformations leave the Lagrangian invariant. This requires

L(A(t, x, u), B(t, x, u), C(t, x, u, p)) Dx A(t, x, u, p) = L(x, u, p). (12.27)
Remark : Unlike invariants, a function of an invariant Lagrangian is not an invariant
Lagrangian owing to the factor Dx A in (12.27). (In the language of differential invariants,
an invariant Lagrangian is a “relative invariant”.) On the other hand, if one multiplies an
invariant Lagrangian by any invariant function, one obtains another invariant Lagrangian.
Functions invariant under the prolonged group action are known as differential invariants,
and have been deeply investigated since the time of Sophus Lie, [42].
Differentiating the invariance condition (12.27) with respect to t, setting t = 0, and
using (12.16, 17) produces the infinitesimal invariance criterion
v∗ (L) + L Dx ξ = 0. (12.28)
Vice versa, any Lagrangian that satisfies this condition is invariant under the connected
one-parameter transformation group generated by v. Equivalently, in view of (12.18), this
implies
0 = v_Q^{(1)}(L) + ξ Dx L + L Dx ξ = v_Q^{(1)}(L) + Dx(ξ L). (12.29)
On the other hand, we can write
v_Q^{(1)}(L) = Q ∂L/∂u + (Dx Q) ∂L/∂p = Q [∂L/∂u − Dx(∂L/∂p)] + Dx(Q ∂L/∂p) = Q E(L) + Dx(Q ∂L/∂p). (12.30)
Substituting back into (12.28–29), and rearranging terms, we deduce Noether’s identity
   
v∗(L) + L Dx ξ = Q E(L) + Dx(Q ∂L/∂p + ξ L) = (ϕ − p ξ) E(L) + Dx(ϕ ∂L/∂p − ξ H), (12.31)
where
H(x, u, p) = p ∂L/∂p − L
is the Hamiltonian associated with L, as in (9.12). Thus, if we have an infinitesimal
symmetry of the variational problem, so that the left hand side of (12.31) vanishes, then
Dx I = Q E(L), where I(x, u, p) = ξ(x, u) H(x, u, p) − ϕ(x, u) ∂L/∂p (x, u, p) (12.32)
is a first integral of the Euler–Lagrange equation. Indeed, Dx I = 0, and hence I is
conserved, meaning that it remains constant, independent of x, whenever u(x) is a solution
to the Euler–Lagrange equation E(L) = 0.



Example 12.8. If L(x, p) is independent of u, then the variational problem is invariant
under the one-parameter group of translations (x, u) 7−→ (x, u + t), with infinitesimal
generator v = ∂/∂u, so that ξ = 0, ϕ = 1. Thus the first integral is ∂L/∂p = c. Indeed,
the Euler–Lagrange equation in this case is simply Dx(∂L/∂p) = 0.
Similarly, if L(u, p) is independent of x, then the symmetry group is translations in x,
so (x, u) 7−→ (x + t, u), with infinitesimal generator v = ∂/∂x, so that ξ = 1, ϕ = 0. Thus,
the first integral is the Hamiltonian H, and we recover Theorem 3.2.
Finally, consider the rotation group SO(2) with infinitesimal generator
v∗ = −u ∂/∂x + x ∂/∂u + (1 + p²) ∂/∂p.
Any Lagrangian of the form
L(x, u, p) = F(x² + u²) √(1 + p²)
satisfies the infinitesimal invariance criterion (12.28) which, in this case, is

v∗ (L) − p L = 0.
Equation (12.32) produces the first integral

I(x, u, p) = x ∂L/∂p (x, u, p) + u H(x, u, p) = (x p − u) F(x² + u²) / √(1 + p²),
which is constant when u = f(x) and p = f′(x), where f is any solution to the Euler–
Lagrange equation. This effectively reduces the Euler–Lagrange equation to the first order
ordinary differential equation I(x, f(x), f′(x)) = c, where c is the constant of integration.
If one goes to polar coordinates x = r cos θ, u = r sin θ, and parametrizes the solution by
θ = h(r), then the equation reduces to

F(r²) r² h′(r) / √(1 + r² h′(r)²) = c. (12.33)
The latter equation can be explicitly integrated by solving for h′ (r). Indeed, in polar
coordinates, the original variational problem has the simpler Lagrangian
L(r, θ, q) = F(r²) √(1 + r² q²),
where q represents θ′. This is independent of θ, and hence the first case treated in this
example produces the first integral (12.33). The fact that one can solve the resulting first order
ordinary differential equation by quadrature is a manifestation of Lie’s theory of integration
of ordinary differential equations possessing a one-parameter symmetry group, [42].
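The identity Dx I = Q E(L) in (12.32) can be confirmed symbolically; the following sketch does so for the rotationally invariant Lagrangian of this example, with the particular choice F(s) = s (an arbitrary choice of mine for the illustration):

    import sympy as sp

    x, u, p, q = sp.symbols('x u p q')
    L = (x**2 + u**2)*sp.sqrt(1 + p**2)      # the Lagrangian with F(s) = s

    def Dx(F):                               # total derivative (12.13), up to q
        return sp.diff(F, x) + p*sp.diff(F, u) + q*sp.diff(F, p)

    EL = sp.diff(L, u) - Dx(sp.diff(L, p))   # Euler-Lagrange expression E(L)
    H = p*sp.diff(L, p) - L                  # the Hamiltonian
    xi, phi = -u, x                          # rotation generator coefficients
    Q = phi - p*xi                           # characteristic: x + p*u
    I = xi*H - phi*sp.diff(L, p)             # first integral from (12.32)
    assert sp.simplify(Dx(I) - Q*EL) == 0    # the identity Dx I = Q E(L)
    print(sp.simplify(-I))                   # (p*x - u)*(x**2 + u**2)/sqrt(p**2 + 1)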

The General Case


Let us now present the general form of Noether’s First Theorem, relating symmetries of
a variational problem and conservation laws of the Euler–Lagrange equations.



A general system of (partial) differential equations involves p independent variables x =
(x1 , . . . , xp ), and q dependent variables u = (u1 , . . . , uq ), which, together, form coordinates
on the total space M = R p+q . Functions u = f (x) are thus provided by q functions of p
variables,
u1 = f 1 (x1 , . . . , xp ), . . . , uq = f q (x1 , . . . , xp ). (12.34)
A smooth, scalar-valued function f(x¹, . . . , xᵖ) depending on p independent variables has
p_k = C(p + k − 1, k) different k-th order partial derivatives
∂_J f(x) = ∂ᵏf / (∂x^{j₁} ∂x^{j₂} ⋯ ∂x^{jₖ}),
indexed by all unordered (symmetric) multi-indices J = (j₁, . . . , jₖ), 1 ≤ jκ ≤ p, of order
k = #J. Therefore, if we have q dependent variables (u¹, . . . , u^q), we require q_k = q p_k
different coordinates u^α_J, 1 ≤ α ≤ q, #J = k, to represent all the k-th order derivatives
u^α_J = ∂_J f^α(x) of a function u = f(x). We write (x, u⁽ⁿ⁾) for the complete collection of
coordinates xⁱ and u^α_J for i = 1, . . . , p, α = 1, . . . , q, 0 ≤ #J ≤ n.
A function F (x, u(n) ) depending on the independent variables, the dependent variables,
and a finite number of their derivatives is known as a differential function. The order of F is
the highest order partial derivative on which it explicitly depends. A system of partial differential
equations is thus defined by the vanishing of a collection of differential functions:

F1 (x, u(n) ) = · · · = Fk (x, u(n) ) = 0. (12.35)

The total derivative with respect to the ith independent variable xi is the first order
differential operator
Di = D_{xⁱ} = ∂/∂xⁱ + Σ_{α=1}^q Σ_J u^α_{J,i} ∂/∂u^α_J, (12.36)
where u^α_{J,i} = Di(u^α_J) = u^α_{j₁…jₖ i}. The sum in (12.36) is over all symmetric multi-indices
J of arbitrary order. Even though Di involves an infinite summation, when applying the
total derivative to any particular differential function, only finitely many terms are needed.
Applying the total derivative Di to a differential function has the effect of differentiating
it with respect to xⁱ treating the u^α and their derivatives as functions of x¹, . . . , xᵖ. Higher
order total derivatives are defined so that DJ = Dj1 · · · Djk for any symmetric multi-index
J = (j1 , . . . , jk ), 1 ≤ jν ≤ p. If J0 = ∅ is the empty multi-index with #J = 0, then,
by convention DJ0 is the identity map. If u = f (x) is a (smooth) solution to our system
of partial differential equations (12.35) then it also satisfies the differentiated equations
DJ Fj = 0 for all multi-indices J and all j = 1, . . . , k.
Let Ω ⊂ X denote a connected open set with piecewise smooth boundary ∂Ω. By an nth
order variational problem, we mean the problem of finding the extremals (minima and/or
maxima) of a functional
J[u] = ∫_Ω L(x, u⁽ⁿ⁾) dx = ∫_Ω L(x, u⁽ⁿ⁾) dx¹ ⋯ dxᵖ (12.37)

over some space of functions u = f (x) for x ∈ Ω. The integrand L(x, u(n) ), which is a
smooth differential function on the jet space Jn , is referred to as the Lagrangian of the



variational problem (12.37). In this presentation, we will ignore the precise nature of the
boundary conditions — fixed, natural, etc.
The Euler–Lagrange equations are found by the usual variational procedure combined
with integration by parts, the vanishing of the resulting boundary integral being a conse-
quence of the imposed boundary conditions. The result can be formalized as follows.
Definition 12.9. Let 1 ≤ α ≤ q. The Euler operator is the q-tuple of differential
operators E = (E1 , . . . , Eq ), with components
Eα = Σ_J (−1)^{#J} D_J ∂/∂u^α_J, α = 1, . . . , q. (12.38)

Theorem 12.10. The smooth extremals u = f (x) of a variational problem with La-
grangian L(x, u(n) ) must satisfy the system of Euler–Lagrange equations obtained by ap-
plying the Euler operators to the Lagrangian:
Eα(L) = Σ_J (−1)^{#J} D_J ∂L/∂u^α_J = 0, α = 1, . . . , q. (12.39)

Note that, as with the total derivatives, even though the Euler operator (12.38) is defined
using an infinite sum, for any given Lagrangian only finitely many summands are needed
to compute the corresponding Euler–Lagrange expressions E(L).
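In practice, such computations can be delegated to a computer algebra system; for instance, sympy ships an euler_equations routine implementing (12.39), which the following sketch (an added illustration of mine) applies to the wave Lagrangian that appears in Example 12.12 below:

    import sympy as sp
    from sympy.calculus.euler import euler_equations

    t, x, c = sp.symbols('t x c')
    u = sp.Function('u')(t, x)
    L = sp.Rational(1, 2)*sp.diff(u, t)**2 - sp.Rational(1, 2)*c**2*sp.diff(u, x)**2
    print(euler_equations(L, [u], [t, x]))
    # a single equation equivalent to u_tt - c^2 u_xx = 0, up to an overall sign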
A general vector field on the space of independent and dependent variables takes the
form
v = Σ_{i=1}^p ξⁱ(x, u) ∂/∂xⁱ + Σ_{α=1}^q ϕ^α(x, u) ∂/∂u^α. (12.40)
 
The induced flow or one-parameter transformation group (x(t), u(t)) = Φ(t, (x, u)) is
obtained by integrating the associated system of ordinary differential equations
dxⁱ/dt = ξⁱ(x, u), du^α/dt = ϕ^α(x, u),
subject to the initial conditions x(0) = x, u(0) = u.
The characteristic of the vector field (12.40) is the q-tuple Q(x, u(1) ) of first order dif-
ferential functions defined by
Q^α(x, u⁽¹⁾) = ϕ^α(x, u) − Σ_{i=1}^p ξⁱ(x, u) ∂u^α/∂xⁱ, α = 1, . . . , q. (12.41)
The evolutionary form of v is the generalized vector field
v_Q = Σ_{α=1}^q Q^α ∂/∂u^α. (12.42)
The prolongation of v_Q to the derivative coordinates u^α_J is then given by
v_Q^∗ = Σ_{α=1}^q Σ_J (D_J Q^α) ∂/∂u^α_J, (12.43)



while the prolongation of v itself can be written as
v∗ = v_Q^∗ + Σ_{i=1}^p ξⁱ Di = Σ_{i=1}^p ξⁱ ∂/∂xⁱ + Σ_{α=1}^q Σ_J ϕ^α_J ∂/∂u^α_J, where ϕ^α_J = D_J Q^α + Σ_{i=1}^p ξⁱ u^α_{J,i}, (12.44)
in direct analogy with (12.22). Complete proofs of these formulas are in [42].
Conservation Laws and Noether’s Identity
The total divergence of a p-tuple P = (P₁, . . . , P_p) of differential functions is given by
Div P = Σ_{i=1}^p Di Pi. (12.45)

Starting with the elementary product formula


Di (P Q) = P Di Q + Q Di P,
we deduce integration by parts formulas for the total derivatives:
P Di Q + Q Di P = Div A, where A_j = P Q for j = i, A_j = 0 for j ≠ i. (12.46)
By induction, we can show that
P D_J Q = (−1)^{#J} Q D_J P + Div B, (12.47)
where B = (B₁, . . . , B_p) is a linear combination of terms involving a total derivative of P
times a total derivative of Q, whose coefficients are integers determined by the multi-index
J, but whose somewhat complicated explicit formula is not required for what follows.
In higher dimensions, conservation laws of a system of partial differential equations are
defined as divergence expressions that vanish on all solutions:
Div P = Σ_{i=1}^p Di Pi = 0, (12.48)

where P1 , . . . , Pp are differential functions. According to the Divergence Theorem, this


implies that the integral of a conservation law over a domain equals a boundary integral.
For example, if the law is conservation of mass, the boundary integral is the mass flux. If
one of the independent variables is time t, and the spatial variables are x = (x1 , . . . , xp−1 ),
then one writes the conservation law in the form
Dt T + Div_x X = Dt T + Σ_{i=1}^{p−1} D_{xⁱ} Xi = 0, (12.49)

where T is referred to as the density while X = (X1 , . . . , Xp−1 ) is the associated flux . In
this case, (12.49) combined with the Divergence Theorem implies
(d/dt) ∫_Ω T(t, x, u⁽ⁿ⁾) dx = − ∮_{∂Ω} X(t, x, u⁽ⁿ⁾) · n dS. (12.50)



Thus, if, as a consequence of the boundary conditions, the normal component of the flux
X vanishes on the boundary ∂Ω, then the integral of the density T over the domain is
conserved, meaning it is constant in time. For example, if T = u is the mass density, then
(12.50) reduces to the law of conservation of mass, provided there is no mass flux through
the boundary.
For a multidimensional Lagrangian, the general infinitesimal invariance criterion under
the one-parameter group with infinitesimal generator v takes the form

v∗ (L) + L Div ξ = Div A, (12.51)


for some p-tuple of differential functions A = (A1 , . . . , Ap ), where ξ = (ξ1 , . . . , ξp ) are
the coefficients of the infinitesimal generator (12.40). If Div A ≡ 0, then v is a strict
variational symmetry; in the general case, v is known as a divergence symmetry. The
term Div ξ arises as a result of how the group transformations change the integration
element dx = dx1 · · · dxp , multiplying it by the Jacobian determinant of the transforma-
tion. Differentiation of the determinant with respect to the group parameter produces the
divergence term; see [42] for details. Equivalently, in view of (12.44), this implies
Div A = v_Q^∗(L) + Σ_{i=1}^p ξⁱ Di L + L Σ_{i=1}^p Di ξⁱ = v_Q^∗(L) + Div(L ξ). (12.52)

Thus, an evolutionary vector field vQ defines a variational symmetry if and only if


v_Q^∗(L) = Div B (12.53)
for some p-tuple B, which is related to the p-tuple in (12.51) via B = A − L ξ.
On the other hand, in view of (12.43), we can write
q
X X ∂L
v∗ (L) =
Q DJ Qα . (12.54)
α=1
∂uα
J
J

We use (12.47) to integrate each summand by parts, producing the formula


v_Q^∗(L) = Σ_{α=1}^q Q^α ( Σ_J (−1)^{#J} D_J ∂L/∂u^α_J ) + Div C = Q · E(L) + Div C,

where the Euler–Lagrange expressions associated with L(x, u(n) ) are given in (12.39) and
the divergence terms resulting from the integrations by parts are collected together in
the p-tuple C. Substituting back into (12.51, 52), and rearranging terms, we deduce the
general Noether identity

v∗ (L) + L Div ξ = Q · E(L) + Div C. (12.55)


Thus, if we have an infinitesimal symmetry of the variational problem, satisfying (12.51),
then
Div P = Q · E(L),



where P = A − C. Since the right hand side vanishes on solutions to the Euler–Lagrange
equation, we conclude that P defines a conservation law.
Conversely, suppose we have a conservation law (12.48). Under mild regularity con-
ditions, [42], this means that the divergence expression is a linear combination of total
derivatives of the Euler–Lagrange equations:
Div P = Σ_{α=1}^q Σ_J Q^α_J D_J Eα(L),
where the coefficients Q^α_J are differential functions. If we apply the integration by parts
formula (12.47) to each summand, we arrive at an equivalent conservation law
Div P̃ = Q · E(L), where Q^α = Σ_J (−1)^{#J} D_J Q^α_J (12.56)

are the components of what is called the characteristic of the conservation law. Substi-
tuting back into the Noether identity (12.55), we deduce that if vQ is the evolutionary
vector field corresponding to the q-tuple in (12.56), then it defines a variational symmetry
of the variational problem, meaning that it satisfies (12.53) for some B. We have thus
established Noether’s First Theorem.
Theorem 12.11. Every variational symmetry gives a conservation laws of the Euler–
Lagrange equation and, conversely, every conservation law comes from such a symmetry.
Remark : Higher order conservation laws produce higher order generalized symmetries.
Their existence is a hallmark of the integrability of the Euler–Lagrange equations, [42].
Example 12.12. Consider the one-dimensional unforced wave equation
$u_{tt} - c^2 u_{xx} = 0,$   (12.57)
for the displacement u(t, x) as a function of the time and the single spatial coordinate.
Here, the wave speed c > 0 is assumed to be constant, whereby (12.57) models the unforced
propagation of (small amplitude) waves in a one-dimensional homogeneous elastic medium.
The wave equation is the Euler–Lagrange equation for the variational problem

$J[u] = \iint \left( \tfrac{1}{2}\, u_t^2 - \tfrac{1}{2}\, c^2 u_x^2 \right) dx\, dt,$

representing the difference between the kinetic and potential energies. In this case, we ignore the details of the domain of integration, which could be all of R^2, and the boundary conditions.
A variety of conservation laws can be constructed using the Noether machinery. First, invariance of the Lagrangian with respect to time translation, which has infinitesimal generator ∂_t and corresponding characteristic Q = −u_t, produces the conservation law†

$D_t\left( \tfrac{1}{2}\, u_t^2 + \tfrac{1}{2}\, c^2 u_x^2 \right) - D_x\left( c^2 u_x u_t \right) = u_t\, (u_{tt} - c^2 u_{xx}) = 0$

† In most cases, we use −Q in the conservation law to avoid additional minus signs.

in which the integrated conserved density is the total energy — the sum of the kinetic and potential energies. Similarly, invariance under space translations, with infinitesimal generator ∂_x and characteristic Q = −u_x, produces the conservation law

$D_t\left( u_x u_t \right) - D_x\left( \tfrac{1}{2}\, c^2 u_x^2 + \tfrac{1}{2}\, u_t^2 \right) = u_x\, (u_{tt} - c^2 u_{xx}) = 0,$

in which the integrated conserved density represents the momentum. Translations in u, with infinitesimal generator ∂_u and characteristic Q = 1, produce

$D_t(u_t) - D_x(c^2 u_x) = u_{tt} - c^2 u_{xx} = 0,$


and so, assuming no net flux, the integral of the velocity u_t is conserved. Finally, the infinitesimal symmetry generator v = x ∂_t + t ∂_x corresponds to the one-parameter group of "hyperbolic rotations"

$\Phi_\varepsilon(t, x) = (t \cosh \varepsilon + x \sinh \varepsilon,\; t \sinh \varepsilon + x \cosh \varepsilon).$

The characteristic is Q = −x u_t − t u_x, and the corresponding conservation law is

$D_t T + D_x X = (x\, u_t + t\, u_x)\, (u_{tt} - c^2 u_{xx}) = 0,$

where

$T = t \left( \tfrac{1}{2}\, u_t^2 + \tfrac{1}{2}\, c^2 u_x^2 \right) + x\, u_x u_t, \qquad X = -x \left( \tfrac{1}{2}\, u_t^2 + \tfrac{1}{2}\, c^2 u_x^2 \right) - c^2 t\, u_x u_t.$

The integral of T equals t times the energy integral plus the center of mass integral and, in the absence of boundary fluxes, implies that the center of mass of the disturbance moves as an affine function of time: C(t) = E t + C₀.
The above are but four of an infinite family of conservation laws of the wave equation;
see [42] for additional examples and their classification.
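These identities are straightforward to verify by hand, or by machine. The following short computation (our illustration, in Python with SymPy; it is not part of the notes) confirms the energy conservation law above:

    import sympy as sp

    t, x, c = sp.symbols('t x c')
    u = sp.Function('u')(t, x)

    # density and flux for the time-translation symmetry of the wave equation
    T = sp.Rational(1, 2) * u.diff(t)**2 + sp.Rational(1, 2) * c**2 * u.diff(x)**2
    X = -c**2 * u.diff(x) * u.diff(t)

    # D_t T + D_x X should equal u_t (u_tt - c^2 u_xx) as an identity
    residual = T.diff(t) + X.diff(x) - u.diff(t) * (u.diff(t, 2) - c**2 * u.diff(x, 2))
    print(sp.simplify(residual))   # prints 0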

13. Lagrangian and Hamiltonian Mechanics.


In this section, we will change our notational conventions to bring them into line with
the standard usage in the Lagrangian and Hamiltonian approaches to classical mechanics.
The independent variable will be time, denoted by t. The dependent variables represent positions of classical particles, and are denoted by q = (q_1, ..., q_n), so that the solutions to our systems of differential equations will be vector-valued functions q(t) = (q_1(t), ..., q_n(t)) of the time variable. We will use dots to denote time derivatives, so that $\dot q = dq/dt$ is the velocity vector of the particle motion, with components $\dot q_i(t)$, while $\ddot q = d^2 q/dt^2$ is the acceleration vector. For three-dimensional motion, each point mass has three independent coordinate positions, and so n = 3k, where k is the number of individual masses, and the positions and masses are labelled accordingly.
The Lagrangian approach to classical Newtonian mechanics requires one to determine the critical functions of a first order variational problem

$I[q] = \int_a^b L(t, q, \dot q)\, dt,$   (13.1)

where the Lagrangian L is a smooth (usually C² suffices) function of time, position, and velocity. The associated Euler–Lagrange equations are a second order system of ordinary differential equations

$E_i(L) = -\frac{d}{dt}\, \frac{\partial L}{\partial \dot q_i} + \frac{\partial L}{\partial q_i} = 0, \qquad i = 1, \dots, n.$   (13.2)

In this dynamical context, the solutions are typically specified by initial conditions rather
than boundary conditions.
In particle dynamics, the Lagrangian is the difference of kinetic and potential energy:

$L = \frac{1}{2} \sum_{i=1}^n m_i\, \dot q_i^2 - V(t, q),$   (13.3)

where m_i is the mass associated with the particle coordinate q_i. The Euler–Lagrange equations (13.2) are just Newton's laws F = m a:

$m_i\, \frac{d^2 q_i}{dt^2} + \frac{\partial V}{\partial q_i} = 0, \qquad i = 1, \dots, n,$   (13.4)

relating the accelerations to the forces induced by the potential energy function. Observe
that, because the Lagrangian is the difference in energies, we do not expect the solutions
to the Euler–Lagrange equations, i.e., the particle trajectories, to minimize or maximize
the variational problem, hence their status as critical functions and not extremizers.
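To make the correspondence concrete, here is a short symbolic computation (our own sketch in Python with SymPy, not from the notes) that recovers Newton's law (13.4) from the Lagrangian (13.3) in a single degree of freedom:

    import sympy as sp

    t, m = sp.symbols('t m')
    q = sp.Function('q')(t)
    V = sp.Function('V')

    # one-degree-of-freedom Lagrangian (13.3)
    L = sp.Rational(1, 2) * m * q.diff(t)**2 - V(q)

    # Euler-Lagrange expression (13.2): -(d/dt) dL/d(qdot) + dL/dq
    EL = -(L.diff(q.diff(t))).diff(t) + L.diff(q)
    print(sp.simplify(EL))   # -m*q'' - V'(q), i.e., Newton's law m*q'' = -V'(q)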
For example, the two body problem in three-dimensional space governs the gravitational interactions between two point masses — e.g., the (idealized) earth and its moon — that are moving in space. Here q(t) = (q_1(t), ..., q_6(t)) has six components, where (q_1, q_2, q_3) represent the x, y, z coordinates of the first body, while (q_4, q_5, q_6) represent the x, y, z coordinates of the second body. Consequently, m_1 = m_2 = m_3 = M_1 equal the mass of the first, while m_4 = m_5 = m_6 = M_2 equal the mass of the second. According to Newtonian gravitation, the potential function is proportional to the product of the masses divided by the distance between them:

$V(t, q) = -\, \frac{G\, M_1 M_2}{\sqrt{(q_1 - q_4)^2 + (q_2 - q_5)^2 + (q_3 - q_6)^2}},$

where $G \approx 6.674 \times 10^{-11}\ \mathrm{m^3\, kg^{-1}\, s^{-2}}$ is the universal gravitational constant. The partial derivatives of the potential V appearing in the Newtonian system (13.4) produce the inverse square law of gravitational attraction in the non-relativistic, flat universe.
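A minimal numerical sketch (ours, with arbitrary initial data) of the resulting motion, reduced to the planar Kepler problem with one body pinned at the origin and units chosen so that G M₁ = 1:

    import numpy as np
    from scipy.integrate import solve_ivp

    # Newton's equations (13.4) for the reduced planar Kepler problem:
    # acceleration -q/|q|^3 in units with G M = 1
    def kepler(t, y):
        q, v = y[:2], y[2:]
        r = np.linalg.norm(q)
        return np.concatenate([v, -q / r**3])

    # start at (1, 0) with sub-circular tangential speed: an elliptical orbit
    sol = solve_ivp(kepler, (0.0, 20.0), [1.0, 0.0, 0.0, 0.8], rtol=1e-9)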

Hamiltonian Systems

The alternative approach to classical mechanics is due to Hamilton, who first studied
optics before extending his methods to all of mechanics. The starting point is to define
the (generalized) momenta associated with the Lagrangian in (13.1). These are given by

the partial derivatives of the Lagrangian with respect to the velocity variables†:

$p_i = \frac{\partial L}{\partial \dot q_i}(t, q, \dot q).$   (13.5)

† We are changing our earlier notation; from here on p will represent a momentum, and not the derivative of q.

For example, in the case of a Newtonian system (13.3), the momentum

$p_i = m_i\, \dot q_i$   (13.6)
is the usual expression for momentum involving the product of mass and velocity. If we impose the nondegeneracy assumption that the Hessian matrix of the Lagrangian with respect to the velocity coordinates is nonsingular:

$\det\left( \frac{\partial^2 L}{\partial \dot q_i\, \partial \dot q_j} \right) \neq 0,$   (13.7)

then the Implicit Function Theorem, [4, 35], allows us to locally uniquely solve (13.5) for $\dot q$ as a function of t, q, p:

$\dot q_i = \varphi_i(t, q, p).$   (13.8)

The map between the Lagrangian variables $(t, q, \dot q)$ and the Hamiltonian variables (t, q, p) is known as the Legendre transformation.


Define the Hamiltonian function

$H(t, q, p) = \sum_{i=1}^n p_i\, \dot q_i - L(t, q, \dot q) = \sum_{i=1}^n \dot q_i\, \frac{\partial L}{\partial \dot q_i} - L(t, q, \dot q),$   (13.9)

where we use (13.8) to replace $\dot q$ on the right hand side. (Compare with formula (9.12), keeping in mind the change in notation.) Typically H represents the physical energy in the system. For example, in the case of Newtonian mechanics with Lagrangian (13.3), the associated Hamiltonian function, in view of (13.6), is

$H(t, q, p) = \frac{1}{2} \sum_{i=1}^n \frac{p_i^2}{m_i} + V(t, q).$   (13.10)

Since $p_i^2/m_i = m_i\, \dot q_i^2$, the initial summation is again the kinetic energy, and hence H is the sum of kinetic and potential energy, i.e., the total energy, whereas the Lagrangian L is the difference of the two.
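The Legendre transformation can also be carried out symbolically. As a sketch (ours; the harmonic oscillator Lagrangian L = ½ m q̇² − ½ k q² is chosen purely for illustration):

    import sympy as sp

    m, k, q, qdot, p = sp.symbols('m k q qdot p')

    # Lagrangian, momentum (13.5), and Hamiltonian (13.9)
    L = m * qdot**2 / 2 - k * q**2 / 2
    qdot_of_p = sp.solve(sp.Eq(p, sp.diff(L, qdot)), qdot)[0]   # invert (13.5)
    H = (p * qdot - L).subs(qdot, qdot_of_p)
    print(sp.simplify(H))    # p**2/(2*m) + k*q**2/2: total energy, as in (13.10)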
We now need to compute the partial derivatives of the Hamiltonian function; the easiest way to see this is to use differentials. On the one hand,

$dH = \frac{\partial H}{\partial t}\, dt + \sum_i \frac{\partial H}{\partial q_i}\, dq_i + \sum_i \frac{\partial H}{\partial p_i}\, dp_i.$   (13.11)

On the other hand, using formula (13.9),

$dH = \sum_i \big( p_i\, d\dot q_i + \dot q_i\, dp_i \big) - dL = \sum_i \big( p_i\, d\dot q_i + \dot q_i\, dp_i \big) - \frac{\partial L}{\partial t}\, dt - \sum_i \frac{\partial L}{\partial q_i}\, dq_i - \sum_i \frac{\partial L}{\partial \dot q_i}\, d\dot q_i = \sum_i \dot q_i\, dp_i - \frac{\partial L}{\partial t}\, dt - \sum_i \frac{\partial L}{\partial q_i}\, dq_i.$   (13.12)
Equating (13.11, 12), and then using the Euler–Lagrange equations (13.2), we deduce that

$\frac{\partial H}{\partial t} = -\frac{\partial L}{\partial t}, \qquad \frac{\partial H}{\partial p_i} = \dot q_i, \qquad \frac{\partial H}{\partial q_i} = -\frac{\partial L}{\partial q_i} = -\frac{d}{dt}\, \frac{\partial L}{\partial \dot q_i} = -\dot p_i.$

Rearranging the latter two equations, we deduce Hamilton's equations

$\frac{dq_i}{dt} = \frac{\partial H}{\partial p_i}, \qquad \frac{dp_i}{dt} = -\frac{\partial H}{\partial q_i}, \qquad i = 1, \dots, n.$   (13.13)

They are entirely equivalent to the Euler–Lagrange equations (13.2) via the Legendre transformation (13.5, 8). Hamilton's equations provide a convenient "canonical form" for writing the equations of classical mechanics, that, presciently, turned out to be the essential ingredient in Schrödinger's development of quantum mechanics in the twentieth century.
The law of conservation of energy in Newtonian mechanics is a particular case of the
following general result.
Proposition 13.1. If the Hamiltonian function is independent of t, then it is constant
on any solution q(t), p(t) to Hamilton’s equations.
Proof: To prove that H(q, p) is constant, we show that its time derivative is zero:

$\frac{d}{dt}\, H(q(t), p(t)) = \sum_{i=1}^n \left( \frac{\partial H}{\partial q_i}\, \frac{dq_i}{dt} + \frac{\partial H}{\partial p_i}\, \frac{dp_i}{dt} \right) = \sum_{i=1}^n \left( \frac{\partial H}{\partial q_i}\, \frac{\partial H}{\partial p_i} - \frac{\partial H}{\partial p_i}\, \frac{\partial H}{\partial q_i} \right) = 0,$

where we used (13.13) to evaluate the derivatives of q_i and p_i. Q.E.D.


Thus, in the time-independent case, where Hamilton’s equations form an autonomous
system of first order ordinary differential equations, the Hamiltonian function forms a first
integral or conservation law. Time independence of H is implied by time independence of the Lagrangian $L(q, \dot q)$, and the consequential conservation law prescribed by H follows,
through Noether’s Theorem, from the fact that time translation is a symmetry of the vari-
ational problem. Similarly, invariance of the variational problem under space translations
or rotations leads to the conservation of linear and angular momentum, respectively. See
[42] for details.
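For instance (an illustrative sketch of ours), numerically integrating Hamilton's equations (13.13) for the harmonic oscillator H = ½p² + ½q² confirms that H stays constant along solutions:

    import numpy as np
    from scipy.integrate import solve_ivp

    # Hamilton's equations (13.13) for H = p^2/2 + q^2/2
    def hamilton(t, y):
        q, p = y
        return [p, -q]         # dq/dt = dH/dp, dp/dt = -dH/dq

    sol = solve_ivp(hamilton, (0.0, 50.0), [1.0, 0.0], rtol=1e-10, atol=1e-12)
    q, p = sol.y
    H = 0.5 * p**2 + 0.5 * q**2
    print(H.max() - H.min())   # ~1e-8: the Hamiltonian is conserved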
Example 13.2. Geometric Optics: Consider the optics functional (5.6) in three-dimensional space, with Lagrangian

$L(x, y, z, \dot x, \dot y, \dot z) = n(x, y, z)\, \sqrt{\dot x^2 + \dot y^2 + \dot z^2},$   (13.14)

where n denotes the index of refraction. We cannot apply the usual Hamiltonian theory since the Lagrangian is degenerate: $\det(\partial^2 L / \partial \dot q_i\, \partial \dot q_j) \equiv 0$. This is in essence because the solutions are the curves followed by the light rays, and hence are unaffected by reparametrization. In other words, the variational principle is parameter-independent, and, as we saw in Example 5.3, there is a dependency among the associated Euler–Lagrange equations.

To proceed with a Hamiltonian formulation, we must remove this degeneracy. The easiest way is to assume that the curve is given by the graph of a function, and use one of the coordinates as the parameter. For instance, if we specialize to a planar medium, so q = (x, y), and suppose that the path is given as the graph of a curve y = f(x), the variational problem (5.6) takes the form

$J[y] = \int_a^b n(x, y)\, ds = \int_a^b n(x, y)\, \sqrt{1 + y'^2}\, dx, \qquad y(a) = \alpha, \quad y(b) = \beta.$   (13.15)

Now the horizontal coordinate x plays the role of time. The Euler–Lagrange equation of this variational problem is

$-\frac{d}{dx} \left( \frac{n(x, y)\, y'}{\sqrt{1 + y'^2}} \right) + \frac{\partial n}{\partial y}\, \sqrt{1 + y'^2} = 0,$   (13.16)

where ′ now means d/dx. To compute the Hamiltonian form of these equations, the Lagrangian is

$L(x, y, y') = n(x, y)\, \sqrt{1 + y'^2},$   (13.17)

hence

$p = \frac{\partial L}{\partial y'} = \frac{n\, y'}{\sqrt{1 + y'^2}},$  which can be explicitly inverted:  $y' = \frac{p}{\sqrt{n^2 - p^2}}.$

Therefore, the Hamiltonian is

$H(x, y, p) = p\, y' - L = -\sqrt{n(x, y)^2 - p^2},$   (13.18)

with canonical equations, first derived by Hamilton,

$p' = -\frac{\partial H}{\partial y} = \frac{n\, \partial n/\partial y}{\sqrt{n^2 - p^2}}, \qquad y' = \frac{\partial H}{\partial p} = \frac{p}{\sqrt{n^2 - p^2}}.$   (13.19)
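The canonical ray equations (13.19) are easy to integrate numerically. As a concrete illustration (our own sketch; the refractive index n(x, y) = 1 + y²/10 is an arbitrary choice, not one from the notes):

    import numpy as np
    from scipy.integrate import solve_ivp

    def n(x, y):                       # sample refractive index
        return 1.0 + 0.1 * y**2

    def n_y(x, y):                     # its partial derivative dn/dy
        return 0.2 * y

    # the canonical ray equations (13.19), with x playing the role of time
    def ray(x, state):
        y, p = state
        root = np.sqrt(n(x, y)**2 - p**2)
        return [p / root, n(x, y) * n_y(x, y) / root]

    sol = solve_ivp(ray, (0.0, 5.0), [0.0, 0.3], max_step=0.01)   # one light ray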

The field of Hamiltonian dynamics is vast, and we refer the reader to [1, 5, 17, 23, 42, 56]
for further developments.
Hamilton–Jacobi Theory
There is an intimate connection between first order partial differential equations and
systems of first order ordinary differential equations, [10, 17, 43]. The two subjects are,
in essence, equivalent, and a complete solution to one gives the complete solution to the
other. This relationship, going back to Hamilton, Jacobi, and others in the first half of
the nineteenth century, lies at the heart of wave/particle duality, and the interconnections
between classical and quantum mechanics.
We begin with a first order variational problem

$\int_{t_0}^{t_1} L(t, q, \dot q)\, dt$

with q = (q_1, ..., q_n). We use the compact notation $\partial L/\partial q$, $\partial L/\partial \dot q$ for the "gradients" of L with respect to the indicated vector variables, with respective entries $\partial L/\partial q_i$, $\partial L/\partial \dot q_i$, for i = 1, ..., n.
Let $(t_0, q_0) \in \mathbb{R}^{n+1}$. We use the multi-dimensional generalization of the construction in Theorem 10.1 to construct a field of extremals on an open set $U \subset \{\, t > t_0 \,\}$ consisting of solutions to the Euler–Lagrange equations starting at the initial point $(t_0, q_0)$. We then define the action function to be

$S(t, q) = \int_\gamma L(t, q, \dot q)\, dt,$   (13.20)

where γ is the extremal curve contained in the field that connects (t0 , q0 ) to the point
(t, q) ∈ U . It is not hard to see that S is a continuously differentiable function of its
arguments. Moreover it satisfies a very important first order partial differential equation.
Theorem 13.3. The action function is a solution to the Hamilton–Jacobi equation:

$\frac{\partial S}{\partial t} + H\left( t, q, \frac{\partial S}{\partial q} \right) = 0,$   (13.21)

where H(t, q, p) is the Hamiltonian (13.9) associated with the variational problem.
Proof: This requires us to compute the partial derivatives of S, which we do by varying its arguments. To avoid confusing the endpoints with the integration variables, let us set

$h(\varepsilon) = S(b + \varepsilon\, \eta,\; \beta + \varepsilon\, \vartheta) = \int_a^{b + \varepsilon \eta} L(t, q + \varepsilon\, \varphi, \dot q + \varepsilon\, \dot\varphi)\, dt,$

where the value of the variation in q at the varied endpoint gives the variation in β:

$q(b + \varepsilon\, \eta) + \varepsilon\, \varphi(b + \varepsilon\, \eta) = \beta + \varepsilon\, \vartheta.$   (13.22)


Then, by the chain rule, we can compute the partial derivatives of S by evaluating the derivative of h:

$h'(0) = \frac{\partial S}{\partial t}(b, \beta)\, \eta + \frac{\partial S}{\partial q}(b, \beta) \cdot \vartheta.$

Recall the formula (4.28) which, in the present notation, implies

$h'(0) = L\big(b, q(b), \dot q(b)\big)\, \eta + \frac{\partial L}{\partial \dot q}\big(b, q(b), \dot q(b)\big) \cdot \varphi(b)$   (13.23)

whenever q(t) solves the Euler–Lagrange equations (13.2). On the other hand, differentiating (13.22) with respect to ε and setting ε = 0, we find

$\dot q(b)\, \eta + \varphi(b) = \vartheta.$

Substituting into (13.23), we deduce that

$h'(0) = \left[ L\big(b, q(b), \dot q(b)\big) - \dot q(b) \cdot \frac{\partial L}{\partial \dot q}\big(b, q(b), \dot q(b)\big) \right] \eta + \frac{\partial L}{\partial \dot q}\big(b, q(b), \dot q(b)\big) \cdot \vartheta,$

and hence

$\frac{\partial S}{\partial t}(b, \beta) = L\big(b, q(b), \dot q(b)\big) - \dot q(b) \cdot \frac{\partial L}{\partial \dot q}\big(b, q(b), \dot q(b)\big), \qquad \frac{\partial S}{\partial q}(b, \beta) = \frac{\partial L}{\partial \dot q}\big(b, q(b), \dot q(b)\big).$

Recalling the definition of the momenta (13.5) and the Hamiltonian (13.9), and reverting to (t, q) instead of (b, β), we see that the preceding pair of equations can be written in the more suggestive form

$\frac{\partial S}{\partial t} = -H(t, q, p), \qquad \frac{\partial S}{\partial q} = p.$   (13.24)

Replacing p in the first formula by its expression in the second produces the Hamilton–Jacobi equation (13.21). Q.E.D.
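As a sanity check (our example, using the standard free-particle action), for H = p²/(2m) the extremals are straight lines and the action from (t₀, q₀) to (t, q) is S = m (q − q₀)²/(2 (t − t₀)); a short SymPy computation confirms the Hamilton–Jacobi equation (13.21):

    import sympy as sp

    t, q, m, t0, q0 = sp.symbols('t q m t0 q0')
    S = m * (q - q0)**2 / (2 * (t - t0))     # free-particle action function

    # Hamilton-Jacobi equation (13.21) with H(t, q, p) = p^2/(2 m)
    print(sp.simplify(S.diff(t) + S.diff(q)**2 / (2 * m)))    # prints 0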
Example 13.4. Consider the special case of Newton's equations for the motion of a mass in a static central gravitational force field. The Lagrangian is

$L(q, \dot q) = \frac{m\, \| \dot q \|^2}{2} - V(\| q \|) = \frac{\| p \|^2}{2m} - V(\| q \|),$

where $p = m\, \dot q$. The Hamiltonian is

$H(q, p) = p \cdot \dot q - L(q, \dot q) = \frac{\| p \|^2}{2m} + V(\| q \|).$
Thus, the Hamilton–Jacobi equation (13.21) takes the form

$\frac{\partial S}{\partial t} + \frac{1}{2m} \left\| \frac{\partial S}{\partial q} \right\|^2 + V(\| q \|) = 0.$   (13.25)

In spherical coordinates (r, ϕ, θ), this becomes

$\frac{\partial S}{\partial t} + \frac{1}{2m} \left[ \left( \frac{\partial S}{\partial r} \right)^2 + \frac{1}{r^2} \left( \frac{\partial S}{\partial \varphi} \right)^2 + \frac{1}{r^2 \sin^2 \varphi} \left( \frac{\partial S}{\partial \theta} \right)^2 \right] + V(r) = 0.$   (13.26)

Remark : We will use the mathematical convention for spherical coordinates, cf. [43],
where − π < θ ≤ π is the azimuthal angle or longitude, while 0 ≤ ϕ ≤ π is the zenith angle
or co-latitude, whereby
x = r sin ϕ cos θ, y = r sin ϕ sin θ, z = r cos ϕ. (13.27)
In many books, particularly those in physics, the roles of ϕ and θ are reversed, leading to
much confusion when one is perusing the literature.
Example 13.5. In two-dimensional geometric optics, as we have seen, the Hamiltonian function (13.18) is

$H(x, y, p) = -\sqrt{n(x, y)^2 - p^2},$

with x playing the role of t. The Hamilton–Jacobi equation becomes

$\frac{\partial S}{\partial x} - \sqrt{n(x, y)^2 - \left( \frac{\partial S}{\partial y} \right)^2} = 0,$

which is equivalent to the eikonal equation

$\left( \frac{\partial S}{\partial x} \right)^2 + \left( \frac{\partial S}{\partial y} \right)^2 = n(x, y)^2.$

More generally, in n-dimensional optics, the Hamilton–Jacobi equation is the multidimensional eikonal equation

$\| \nabla S \|^2 = n^2.$

The solutions to the Hamilton–Jacobi equation (13.21) have an intimate relationship


with the solutions to Hamilton’s equations (13.13). In fact, they are in a certain sense
equivalent differential equations, and knowing the complete solution to one allows one
to determine the complete solution to the other. There are several approaches to this;
here we invoke the method of characteristics, [10, 17, 43], and show that the solutions to
Hamilton’s equations are just the characteristic curves for the Hamilton–Jacobi equation.

Characteristics

The physical motivation for the theory of characteristics comes from geometrical optics,
where the level sets { S = c } of the action function are the wavefronts where the light
waves have constant phase, [8]. The corresponding characteristics are just the paths
followed by the light rays or the photons. This principle is at the heart of Hamilton’s
“optical-mechanical” analogy, and, ultimately, the basis of the wave/particle duality in
quantum mechanics.
In general, consider a first order partial differential equation

F (x, u, ∇u) = 0, x = (x1 , . . . , xn ) ∈ R n , (13.28)


where F (x, u, p) is a smooth function on the (2 n + 1)-dimensional space (sometimes known
as the first order jet space, [42]) whose coordinates are the independent variables x =
(x1 , . . . , xn ), the dependent variable u, and the gradient coordinates p = (p1 , . . . , pn ), with
pi representing the derivative ∂u/∂xi (and not a momentum variable in this section). In
future, we will abbreviate the gradient of u as ux = ∇u, adopting the subscript notation
to denote partial derivatives.
The most basic problem associated with such an equation is the Cauchy problem, mean-
ing the initial value problem in which one specifies the values of the function u on an
(n − 1)-dimensional submanifold (hypersurface) S of the base space R n . To solve the
Cauchy problem one is required to find a solution u, defined in a neighborhood of the
initial hypersurface S, with the given initial values: u |S = f . For instance, suppose we
adopt the particular coordinates (t, y) = (t, y1 , . . . , yn−1 ) in which the initial hypersurface
is the flat hyperplane S = { (t, y) | t = 0 }. In this case the equation takes the form

F (t, y, u, ut, uy ) = 0. (13.29)


The Cauchy data is specified on the hyperplane S by

u(0, y) = f (y). (13.30)

Suppose we can solve this equation for the normal derivative u_t, placing the equation in Cauchy–Kovalevskaya form

$u_t = G(y, t, u, u_y).$   (13.31)

According to the Implicit Function Theorem, [4, 35], this is possible (locally) provided the partial derivative ∂F/∂u_t does not vanish at the point $(t, y, u, u_t, u_y)$. In this case, the initial hyperplane S is called non-characteristic. The Cauchy–Kovalevskaya Existence Theorem, [17], shows that, for analytic G, the above Cauchy problem (13.30–31) has a unique solution.
More generally, a hypersurface S ⊂ R n is called non-characteristic if the corresponding
Cauchy problem
u |S = f
is similarly well-posed. Let us represent

$S = \{\, x \mid h(x) = 0 \,\}$

as the zero locus of a smooth scalar-valued function h with ∇h ≠ 0 on S. Assuming without loss of generality that ∂h/∂x_n ≠ 0, we can locally introduce new coordinates

$t = h(x), \qquad y_1 = x_1, \quad \dots, \quad y_{n-1} = x_{n-1},$

to flatten out S to be the hyperplane { t = 0 }. Then

$\frac{\partial u}{\partial x_j} = \xi_j\, \frac{\partial u}{\partial t} + \frac{\partial u}{\partial y_j}, \quad 1 \leq j \leq n-1, \qquad \frac{\partial u}{\partial x_n} = \xi_n\, \frac{\partial u}{\partial t}, \qquad \text{where} \quad \xi_j = \frac{\partial h}{\partial x_j}.$
The Implicit Function Theorem requires that, to be able to solve smoothly for the "normal" derivative u_t, we must have

$0 \neq \frac{\partial F}{\partial u_t} = \sum_{j=1}^n \xi_j\, \frac{\partial F}{\partial p_j}.$

The preceding discussion serves as motivation for the following crucial definition.
Definition 13.6. An n-tuple of numbers ξ = (ξ1 , . . . , ξn ) determines a characteristic
direction for the partial differential equation F (x, u, p) = 0 at the point (x, u, p) if

ξ · Fp (x, u, p) = 0.

Then, very loosely speaking, a hypersurface S = { h(x) = 0 } is characteristic if its


normal ξ = ∇h determines a characteristic direction at each point.
Away from singular points where the derivative Fp vanishes, the characteristic directions
span an (n − 1)-dimensional subspace of (the cotangent space to) R n . Throughout this
section, we will always assume that $F_p \neq 0$ so as to avoid singularities of the equation. The
orthogonal complement to this characteristic subspace will be a one-dimensional subspace
of (the tangent space to) R n . Thus we will call a tangent vector v = (v1 , . . . , vn ) at a point
x ∈ R n a characteristic vector for the point (x, u, p) sitting over x if it is orthogonal to all
the characteristic directions ξ at this point, i.e., v · ξ = 0 for all ξ such that ξ · Fp = 0.

Clearly this is true if and only if v is parallel to Fp , i.e., v = λ Fp for some scalar λ.
In particular, except in the special case of a linear equation, we need to know the values
of u and p in order to specify the characteristic vectors at a point x. A hypersurface S
is characteristic if and only if the corresponding characteristic vector v is contained in its tangent space at each point.
An alternative, but equivalent definition of a characteristic direction is a direction in
which the derivative of some solution admits a possible discontinuity. Note that, if u is a
continuous solution to the equation that is C1 except on the hypersurface S = { t = 0 },
where the normal derivative ut has a discontinuity, then S must be characteristic, since if
we can solve for ut as above, then it is necessarily continuous, as all the functions on the
right hand side are continuous. Both definitions of characteristics extend to higher order
equations, although they can typically no longer be used to derive the general solution.
Let u = f (x) be any C2 solution to the partial differential equation (13.28) defined on
a domain D ⊂ R n . At each point x ∈ D, the solution will determine (up to multiple) a
characteristic vector v = λ Fp (x, f (x), fx(x)). A parametrized curve x(s) whose non-zero
tangent vector is everywhere a characteristic vector for the given solution u will be called
a characteristic curve for the solution. This requires that dx/ds be proportional to Fp at
the point:

$\frac{dx}{ds} = \lambda\, \frac{\partial F}{\partial p}.$   (13.32)
Now, except in the case of linear equations, the characteristic vectors depend not only on
the base point x, but also on the value of u and its derivatives p at x. This suggests that,
in order to determine a characteristic curve, we not only look at the base curve x(s), but
also its “prolongation” to the jet space R 2 n+1 . There will be a unique curve contained in
the prolonged graph Γf of the solution sitting over the base curve x(s), namely the curve
$(x(s), u(s), p(s))$, where

$u(s) = f(x(s)), \qquad p(s) = \frac{\partial f}{\partial x}(x(s)).$
We can use the chain rule to compute how the u and p components of the prolonged curve
depend on s. Thus, in view of (13.32),
$\frac{du}{ds} = \sum_{i=1}^n \frac{\partial f}{\partial x_i}\, \frac{dx_i}{ds} = \lambda \sum_{i=1}^n p_i\, \frac{\partial F}{\partial p_i} = \lambda\, p \cdot \frac{\partial F}{\partial p},$

and also

$\frac{dp_i}{ds} = \sum_k \frac{\partial p_i}{\partial x_k}\, \frac{dx_k}{ds} = \lambda \sum_k \frac{\partial^2 f}{\partial x_i\, \partial x_k}\, \frac{\partial F}{\partial p_k}.$

At this stage it appears that we also need to know how the second derivatives of the
solution behave. However, u = f (x) is assumed to be a solution, so it also satisfies

$0 = \frac{\partial}{\partial x_i}\, F\big(x, f(x), f_x(x)\big) = \frac{\partial F}{\partial x_i} + \frac{\partial F}{\partial u}\, \frac{\partial f}{\partial x_i} + \sum_k \frac{\partial F}{\partial p_k}\, \frac{\partial^2 f}{\partial x_i\, \partial x_k}.$

Comparing these two equations, we see that

$\frac{dp_i}{ds} = -\lambda \left( \frac{\partial F}{\partial x_i} + p_i\, \frac{\partial F}{\partial u} \right).$
Finally, note that we can absorb the proportionality factor λ into the parameter s by
reparametrizing the curve, so λ = 1 without loss of generality. We are thus left with a
system of first order ordinary differential equations for the components x(s), u(s), p(s)
which do not refer any longer to the particular solution u = f (x).

Definition 13.7. A characteristic curve, parametrized by† $(x(s), u(s), p(s))$, is a solution to the characteristic system of ordinary differential equations:

$\frac{dx}{ds} = \frac{\partial F}{\partial p}, \qquad \frac{du}{ds} = p \cdot \frac{\partial F}{\partial p}, \qquad \frac{dp}{ds} = -\left( \frac{\partial F}{\partial x} + p\, \frac{\partial F}{\partial u} \right).$   (13.33)

† The parameter s is not necessarily arc length.

In general, by standard existence and uniqueness results for ordinary differential equa-
tions, given a point (x0 , u0 , p0 ) ∈ R 2 n+1 , there is a unique characteristic curve passing
through it. Moreover, by the preceding calculations, if u0 = f (x0 ), p0 = fx (x0 ) so that
(x0 , u0 , p0 ) belongs to the prolonged graph of a solution u = f (x) to the partial differential
equation (13.28), then the characteristic curve passing through (x0 , u0 , p0 ) is contained in
the prolonged graph. Since the characteristic curves are uniquely determined by their ini-
tial conditions, we deduce that every solution to our partial differential equation is swept
out by an (n − 1)–parameter family of characteristic curves, parametrized by the initial
values of u and p on the (n − 1)-dimensional non-characteristic Cauchy surface.
If F doesn’t depend on u, then the characteristic equations (13.33) are essentially the
same as Hamilton’s equations for the time-independent Hamiltonian F = H(q, p) (with
x = q) since u can be determined from x, p by a single quadrature. For the time-dependent
Hamilton–Jacobi equation

$F(t, q, S, S_t, S_q) \equiv \frac{\partial S}{\partial t} + H\left( t, q, \frac{\partial S}{\partial q} \right) = 0,$   (13.34)

we replace S by u to find the corresponding equations for characteristic curves

$\frac{dt}{ds} = 1, \qquad \frac{dq}{ds} = \frac{\partial H}{\partial p}, \qquad \frac{dp}{ds} = -\frac{\partial H}{\partial q}, \qquad \frac{d\pi}{ds} = -\frac{\partial H}{\partial t}, \qquad \frac{du}{ds} = \pi + p \cdot \frac{\partial H}{\partial p},$   (13.35)

where π represents S_t. Thus t = s + c, and, after we solve the Hamiltonian system for q, p, we can recover the complete expression for the characteristic curves by quadrature. The associated characteristic curves (solutions to Hamilton's equations) are found by integrating the first order system

$\frac{dq}{dt} = \frac{\partial H}{\partial p}\left( t, q, \frac{\partial S}{\partial q}(t, q) \right)$   (13.36)




for the positions, and then substituting

$p(t) = \frac{\partial S}{\partial q}(t, q(t))$

to get the corresponding momenta. The solution q(t) to (13.36) will describe the particle
trajectories in physical space.
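Continuing the free-particle illustration (ours) from the proof of Theorem 13.3: with S = m (q − q₀)²/(2 (t − t₀)), the system (13.36) reads dq/dt = (q − q₀)/(t − t₀), and its solutions are the expected straight-line trajectories:

    import sympy as sp

    t, t0, q0 = sp.symbols('t t0 q0')
    q = sp.Function('q')

    # equation (13.36) for H = p^2/(2m) and the free-particle action function
    ode = sp.Eq(q(t).diff(t), (q(t) - q0) / (t - t0))
    print(sp.dsolve(ode))    # q(t) = q0 + C1*(t - t0): straight lines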
As another example, suppose that we know a solution S(q) to the eikonal equation

$\| \nabla S \|^2 = n(q)^2$   (13.37)

of geometrical optics. Then the associated light rays, which follow the characteristics, are found by integrating

$\frac{dq}{dt} = 2\, \nabla S(q),$

since $F = \| p \|^2 - n(q)^2$, so $F_p = 2\, p$. Since the gradient ∇S of the solution is orthogonal to its level sets, we deduce the important fact that the light rays are orthogonal to the wave front sets { S = c } for c ∈ R. In fact, (13.36) can be interpreted as the corresponding "orthogonality" relation for classical mechanics.
As an application, we can solve the central force field problem from Example 13.4 in
this manner. The Hamilton–Jacobi equation (13.26) can be solved by additive separation
of variables:
S(t, r, ϕ, θ) = R(r) + Φ(ϕ) + α θ + β t.

The functions R(r), Φ(ϕ) satisfy the pair of ordinary differential equations

$\frac{1}{2m} \left( \frac{dR}{dr} \right)^2 + V(r) + \beta = \gamma\, r^{-2}, \qquad \left( \frac{d\Phi}{d\varphi} \right)^2 + \frac{\alpha^2}{\sin^2 \varphi} + 2m\, \gamma = 0,$

where γ is the separation constant. These can be straightforwardly integrated by quadrature, resulting in the required three-parameter solution. One can use this result to establish
all of Kepler’s laws of planetary motion, [56].

14. Geometric Optics and Wave Mechanics.

The derivation of the equations for geometric optics from those of wave optics pro-
vides the key to Schrödinger’s establishment of the basic equation of quantum mechanics,
and also the classical limit of quantum mechanical phenomena. The basic framework is
encapsulated in the following diagram.

    Geometric Optics            Wave Mechanics
           ↓                          ↓
    Classical Mechanics   ←   Quantum Mechanics

As we will see, just as the equations of geometric optics, which govern the motion of photons, can be derived as the high frequency limit of the equations of wave mechanics, so the equations of classical mechanics appear as the high frequency (or, equivalently, small Planck's constant) limit of the equations of quantum mechanics. This limiting procedure is well defined mathematically; going in the reverse direction, on the other hand, is not, since many equations can have the same high frequency limit. For example, the equations of geometric optics do not uniquely prescribe the equations of wave mechanics. Nor do the equations of classical mechanics uniquely prescribe the fundamental Schrödinger equation of quantum mechanics, and the process of "quantization", that is, going from classical to quantum mechanics, is more of an art than a well defined mathematical process.
It is fascinating that Hamilton, in the first part of the nineteenth century, had three of the four boxes in the preceding diagram well in hand. He knew how to frame the equations of classical mechanics in analogy with those of geometric optics through the Hamiltonian formulation. But he failed to fill in the remaining box — quantum mechanics — not because he did not have the required mathematical tools, but because there was no reason at that time to introduce a "wave theory of matter". Had he ignored the missing physical underpinnings, he might well have discovered the Schrödinger equation 70 or so years earlier!
The Wave Equation
Let us begin by describing the connections between wave mechanics and geometric optics.
For simplicity, we first consider the scalar wave equation

$\varphi_{tt} - \frac{c^2}{n^2}\, \Delta \varphi = 0$   (14.1)

in an inhomogeneous medium, where t is time, x = (x_1, ..., x_k) are the spatial coordinates (and so k = 3 for the physical universe), and n(x) is the index of refraction. In the homogeneous case where n is constant, the solutions to this problem are superpositions of plane waves

$\varphi = A\, e^{\,i\, (k \cdot x - \omega c t)},$

where the wave number (spatial frequency) k and the temporal frequency ω are connected by the dispersion relation

$\| k \| = n\, \omega.$

In the geometrical optics approximation, we consider the case when the wave number is large in comparison to the variation in the refractive index. We begin by restricting our attention to a simple harmonic wave

$\varphi(t, x) = u(x)\, e^{\,i\, \omega c t}.$

This allows us to factor out the t dependence in the wave equation, implying that u satisfies the Helmholtz equation

$\Delta u + n^2 \omega^2 u = 0.$   (14.2)
We seek complex solutions of the form

$u(x) = A(x, \omega)\, e^{\,i\, \omega S(x, \omega)},$   (14.3)

where the amplitude A(x, ω) and the phase S(x, ω) are real. Substituting (14.3) into the Helmholtz equation (14.2) yields

$\Delta u + n^2 \omega^2 u = \Big[\, \omega^2 \big( n^2 - \| \nabla S \|^2 \big) A + \Delta A + i\, \omega \big( 2\, \nabla S \cdot \nabla A + (\Delta S)\, A \big) \Big]\, e^{\,i\, \omega S}.$

Since both A and S are real, we deduce the following system of partial differential equations:

$\Delta A + \omega^2 \big( n^2 - \| \nabla S \|^2 \big) A = 0, \qquad 2\, \nabla S \cdot \nabla A + (\Delta S)\, A = 0,$   (14.4)

which is, so far, completely equivalent to the Helmholtz equation.
Now in the high frequency approximation, we introduce asymptotic expansions of the amplitude and phase,

$A(x, \omega) \sim A(x) + \omega^{-1} A_1(x) + \omega^{-2} A_2(x) + \cdots, \qquad S(x, \omega) \sim S(x) + \omega^{-1} S_1(x) + \omega^{-2} S_2(x) + \cdots,$   (14.5)
in decreasing powers of the frequency ω. Substituting into the system (14.4), we collect
terms involving the same powers of ω. The leading term, in ω 2 , occurs in the first equation.
Since A 6= 0, this implies that the leading term S(x) in the phase expansion must satisfy

k ∇S k2 = n2 , (14.6)
which is the eikonal equation we already encountered in (13.37). It says that the hyper-
surfaces of constant phase { S = c } are the same as the characteristic hypersurfaces for
the wave equation. If we interpret S as the action function, then the eikonal equation is
the same as the Hamilton–Jacobi equation (13.21) for the geometric optics Hamiltonian.
Thus, the phase surfaces propagate along the characteristics, which are just the solutions
to Hamilton’s equations.
The next term, of order ω, says that the leading amplitude A(x) will satisfy the transport equation

$2\, \nabla S \cdot \nabla A + (\Delta S)\, A = 0.$   (14.7)


To solve this equation, suppose the curve $(p(s), q(s))$ in phase space determines a characteristic for the eikonal equation defined by

$F(q, u, p) = \| p \|^2 - n(q)^2.$

Then, if Φ(q) is any function of position, we have, along the characteristic,

$\frac{d\Phi}{ds} = \frac{\partial F}{\partial p} \cdot \frac{\partial \Phi}{\partial q} = 2\, p \cdot \frac{\partial \Phi}{\partial q} = 2\, \nabla S \cdot \nabla \Phi.$

Therefore, along the characteristics, once S(x) is specified, the transport equation (14.7) reduces to an ordinary differential equation

$\frac{dA}{ds} = -\tfrac{1}{2}\, A\, \Delta S,$

which can be solved explicitly:

$A\big(q(s)\big) = A(q_0)\, \exp\left( -\frac{1}{2} \int_{s_0}^{s} \Delta S\big(q(s')\big)\, ds' \right).$

Note that if A(q_0) = 0, so there is zero amplitude to leading order, then A = 0 along the entire characteristic emanating from q_0. Therefore, in the first approximation, the solutions to the wave equation are concentrated on the characteristics; this reflects the fact that waves and signals propagate along characteristics.
High Frequency Limit and Quantization
In general, suppose we have a linear partial differential equation

$F[\psi] = 0,$   (14.8)

depending on a large parameter ω. The differential operator F can be written as

$F = \widehat{F}(x, i\, \partial;\, \omega),$

where $\widehat{F}(x, p; \omega)$ is a smooth function, which is a polynomial in the derivative coordinate $p = i\, \partial$. We use ∂ = (∂_1, ..., ∂_n) to denote the derivative operators ∂_j = ∂/∂x_j, j = 1, ..., n. There is a problematic ambiguity in this representation, since we have to specify the order of the derivatives and the function. For instance, if $\widehat{F}(x, p) = x\, p$, then there is a question as to whether this should represent the differential operator $i\, x\, \partial$ or $i\, \partial \cdot x = i\, x\, \partial + i$. For convenience, we adopt the "normal ordering" convention that the derivatives always appear last, so $x\, p$ corresponds to the differential operator $i\, x\, \partial$. (However, this will come back to haunt us later.)
In the high frequency limit, we make the ansatz

$\psi(x, \omega) = A(x, \omega)\, e^{\,i\, \omega S(x, \omega)},$

where A and S are real, A having the usual asymptotic expansion (14.5) in decreasing powers of ω. In order to determine the analogue of the eikonal equation, it is helpful to rewrite the operator in the form

$F = F\left( x,\ \frac{1}{i\, \omega}\, \partial,\ \omega \right).$


Now we abbreviate $p = (i\, \omega)^{-1} \partial$. We assume that for large ω we can expand F in an asymptotic series

$F(x, p, \omega) \sim F_n(x, p)\, \omega^n + F_{n-1}(x, p)\, \omega^{n-1} + \cdots,$

where we call F_n the leading component of the operator. Now, note that if ψ is given as above, then

$\frac{1}{i\, \omega}\, \frac{\partial \psi}{\partial x_j} = \frac{1}{i\, \omega}\, \frac{\partial}{\partial x_j} \left( A\, e^{\,i\, \omega S} \right) = \frac{\partial S}{\partial x_j}\, \psi + \frac{1}{i\, \omega}\, \frac{\partial \log A}{\partial x_j}\, \psi.$

But the second term has order less than the first. It is not difficult to see that, in general,

$F[\psi] = F\left( x,\ \frac{1}{i\, \omega}\, \partial,\ \omega \right) \psi = \omega^n F_n(x, \nabla S)\, \psi + O(\omega^{n-1}).$

Therefore, in the high frequency limit, the term $\omega^n F_n(x, \nabla S)\, \psi$ will dominate the asymptotic expansion. In order that the equation F[ψ] = 0 hold, then, we find the analogue of the eikonal equation to be

$F_n(x, \nabla S) = 0.$   (14.9)
For instance, the Helmholtz equation is already homogeneous of degree 0, corresponding to $F(x, p, \omega) = -\| p \|^2 + n^2$. Thus (14.9) coincides with the eikonal equation (14.6).
Now, what about the Hamilton–Jacobi equation? It will be the high frequency limit of some linear wave equation. In fact, if we set

$F(t, x, \pi, p) = \pi + H(t, x, p),$

where H is the Hamiltonian function, then

$F(t, x, S_t, \nabla S) = S_t + H(t, x, \nabla S) = 0$

is the Hamilton–Jacobi equation. This indicates that the corresponding wave equation is

$\frac{1}{i\, \omega}\, \psi_t + H[\psi] = 0, \qquad \text{where} \qquad H = H\left( t, x, \frac{1}{i\, \omega}\, \partial \right).$
We denote $\omega^{-1} = \hbar$, which is known as Planck's constant and has the units of inverse frequency, according to Einstein's formula

$E = \hbar\, \omega$   (14.10)

that relates frequency and energy, which we take to be fixed. The resulting equation

$i\, \hbar\, \psi_t = H[\psi], \qquad \text{where} \qquad H = H(t, x, -i\, \hbar\, \partial),$

is known as the Schrödinger equation, and is the fundamental equation of quantum mechanics. The differential operator $H = H(t, x, -i\, \hbar\, \partial)$ is known as the Hamiltonian operator for the quantum mechanical Schrödinger equation. We have found it by requiring that its high frequency (low ℏ) limit reduce to the Hamilton–Jacobi equation of classical mechanics. This means that we are endowing classical particles with a wave-like interpretation.

Example 14.1. Consider the case of a particle in a central force field. Here the Hamiltonian is given by

$H(q, p) = \frac{\| p \|^2}{2m} + V(r),$   (14.11)

where r = ‖q‖. (Recall that this also describes the motion of two interacting masses, provided we go to center of mass coordinates.) The corresponding Hamilton–Jacobi equation is

$\frac{\partial S}{\partial t} + \frac{1}{2m} \left\| \frac{\partial S}{\partial q} \right\|^2 + V(r) = 0.$   (14.12)

Replacing p_j by $-i\, \hbar\, \partial_j$ produces the associated Hamiltonian operator

$H = -\frac{\hbar^2}{2m}\, \Delta + V(r),$   (14.13)

where Δ is the ordinary Laplacian. The Schrödinger equation is

$i\, \hbar\, \psi_t = -\frac{\hbar^2}{2m}\, \Delta \psi + V(r)\, \psi.$   (14.14)
In the case V (r) = − e2 /r, where e is the charge on an electron, we are in the situation
of the quantum mechanical hydrogen atom, meaning a single electron circling a single
(heavy) proton. The Hamilton–Jacobi equation governs the leading order asymptotics,
i.e., the solution to the eikonal equation.
A Word of Warning: The Schrödinger equation looks a lot like the heat equation, but the complex factor i makes it of an entirely different character. It is, in fact, a dispersive hyperbolic partial differential equation, not a dissipative parabolic equation. One way to see the difference right away is to look at the norm of the solution† $\| \psi \|^2 = \psi\, \overline{\psi}$. For the one-dimensional Schrödinger equation

$i\, \hbar\, \psi_t = -\frac{\hbar^2}{2m}\, \psi_{xx} + V(x)\, \psi,$   (14.15)

we have

$\frac{\partial}{\partial t}\, \| \psi \|^2 = \psi\, \overline{\psi}_t + \psi_t\, \overline{\psi} = \psi \left( \frac{\hbar}{2\, i\, m}\, \overline{\psi}_{xx} - \frac{V(x)}{i\, \hbar}\, \overline{\psi} \right) + \left( -\frac{\hbar}{2\, i\, m}\, \psi_{xx} + \frac{V(x)}{i\, \hbar}\, \psi \right) \overline{\psi} = \frac{\hbar}{2\, i\, m}\, \frac{\partial}{\partial x} \left( \psi\, \overline{\psi}_x - \psi_x\, \overline{\psi} \right).$

Therefore, assuming that ψ and its derivative ψ_x tend to 0 as x → ±∞, its L² norm is constant:

$\frac{d}{dt}\, \| \psi \|_{L^2}^2 = \frac{d}{dt} \int_{-\infty}^{\infty} \| \psi \|^2\, dx = \frac{\hbar}{2\, i\, m} \Big[\, \psi\, \overline{\psi}_x - \psi_x\, \overline{\psi} \,\Big]_{-\infty}^{\infty} = 0.$

† We use an overbar to denote complex conjugate throughout.

This is in contrast with the heat equation, where the L² norm of solutions decreases as $t^{-1/2}$, and, owing to the rapid decay of the high order Fourier modes, the solutions are immediately smoothed out, [43].
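The conservation of the L² norm can also be observed numerically. The following sketch (ours; units with ℏ = m = 1 and V = 0) evolves (14.15) on a periodic grid by the Crank–Nicolson scheme, whose Cayley-transform structure preserves the discrete norm exactly:

    import numpy as np

    # periodic grid and second-difference Laplacian
    N, L, dt = 256, 20.0, 0.01
    x = np.linspace(-L / 2, L / 2, N, endpoint=False)
    dx = x[1] - x[0]
    lap = (np.diag(np.full(N - 1, 1.0), -1) + np.diag(np.full(N - 1, 1.0), 1)
           - 2.0 * np.eye(N))
    lap[0, -1] = lap[-1, 0] = 1.0
    H = -0.5 * lap / dx**2                   # discrete Hamiltonian with V = 0

    # Crank-Nicolson: (I + i dt H/2) psi_new = (I - i dt H/2) psi_old
    A = np.eye(N) + 0.5j * dt * H
    B = np.eye(N) - 0.5j * dt * H

    psi = np.exp(-x**2 + 5j * x)             # Gaussian wave packet
    psi /= np.sqrt(np.sum(np.abs(psi)**2) * dx)
    for _ in range(200):
        psi = np.linalg.solve(A, B @ psi)

    print(np.sum(np.abs(psi)**2) * dx)       # remains 1 to machine precision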
At the moment, things seem rather simple. However, there are genuine problems with the passage from a classical mechanical system to its corresponding quantum mechanical counterpart. As an example, consider the central force field problem in R², but written in polar coordinates. The classical Hamiltonian becomes

$H = \frac{1}{2m} \left( p_r^2 + \frac{1}{r^2}\, p_\theta^2 \right) + V(r),$

where p_r, p_θ are the momenta conjugate to the r, θ coordinates. The Hamilton–Jacobi equation is

$\frac{\partial S}{\partial t} + \frac{1}{2m} \left[ \left( \frac{\partial S}{\partial r} \right)^2 + \frac{1}{r^2} \left( \frac{\partial S}{\partial \theta} \right)^2 \right] + V(r) = 0.$
If we write the corresponding Schrödinger equation, we have

$i\, \hbar\, \psi_t = -\frac{\hbar^2}{2m} \left( \psi_{rr} + \frac{1}{r^2}\, \psi_{\theta\theta} \right) + V(r)\, \psi.$   (14.16)

On the other hand, the rectangular coordinate Schrödinger equation involves the Laplacian, which, in polar coordinates, assumes a slightly different form:

$\Delta \psi = \psi_{xx} + \psi_{yy} = \psi_{rr} + \frac{1}{r}\, \psi_r + \frac{1}{r^2}\, \psi_{\theta\theta},$   (14.17)
containing a first order term not present in (14.16). Thus, although the two classical
systems are completely equivalent under the change of variables, the same is not true for
the two corresponding Schrödinger equations.
What is going on? In our derivation of the geometric optics approximation to wave
optics, we only looked at the leading order terms in the partial differential equation, and
ignored lower order terms in ω = ~−1 . Therefore, many different equations will reduce
down to the same classical system in the high frequency limit, and we have no way of
knowing from classical mechanics alone what the correct lower order terms are. (These
are, as I understand it, even difficult to determine experimentally in quantum mechani-
cal systems.) Thus, there is an inherent ambiguity in our derivation of the Schrödinger
equation. The point is that changes of coordinates will preserve the leading order terms,
but will not preserve the lower order components (even if there are none in one particular
coordinate system). Thus, the rectangular Schrödinger equation, re-expressed in polar coordinates, has the form

$0 = \left[ -i\, \hbar\, \frac{\partial}{\partial t} + \frac{1}{2m} \left( -i\, \hbar\, \frac{\partial}{\partial r} \right)^{\!2} + \frac{1}{2m r^2} \left( -i\, \hbar\, \frac{\partial}{\partial \theta} \right)^{\!2} + V(r) \right] \psi + \frac{\hbar}{2\, i\, m\, r} \left( -i\, \hbar\, \frac{\partial}{\partial r} \right) \psi.$

The leading order terms agree with the polar coordinate Schrödinger equation (14.16), but the final term, of order ℏ, does not appear in the classical limit and is absent from (14.16).

For the early quantum mechanists, this was not viewed as a problem. Basically, they required one to quantize only in rectangular coordinates. However, this is far from a relativistic viewpoint, which asserts that physical laws must be independent of any particular coordinate system on the space-time manifold. And, to this day, this inherent ambiguity in quantization is still causing problems with the mathematical foundations of quantum mechanics. There are some mathematically reasonable ways to do this, most notably the method of geometric quantization, [55]. However, some elementary systems, e.g., the helium atom, are still not covered by this approach, and it has not been significantly developed in recent years. Moreover, if we impose a few additional reasonable assumptions, then we run into a roadblock: there are no fully consistent methods of quantization.
Now, return to the Schrödinger equation

$i\, \hbar\, \psi_t = H[\psi].$   (14.18)

Assume that the Hamiltonian operator H is independent of t. Then we can separate variables by setting

$\psi(t, x) = \widehat{\psi}(x)\, e^{-i\, \omega t},$

leading to the time independent form of the Schrödinger equation

$H[\widehat{\psi}] = \hbar\, \omega\, \widehat{\psi} = E\, \widehat{\psi},$   (14.19)
where, in view of Einstein’s relation (14.10), E denotes the energy of the system. Thus,
(14.19) implies that the energy E is an eigenvalue of the corresponding Hamiltonian oper-
ator H. There is thus an intimate connection between the possible energies of a physical
system, and the spectrum of the corresponding Hamiltonian operator. We assume (for
the time being) that the solutions of the time-independent Schrödinger equation must be
smooth and bounded over all space. In the particular case of the hydrogen atom, or a
more general particle in a central force field V (r), with V → 0 as r → ∞, the spectrum of

H = −∆ + V (r)
consists of two parts:
(i ) The discrete spectrum, E < 0, which consists of a finite number of negative eigenvalues
corresponding to bound states. The associated eigenfunctions ψ are in L2 , and,
in particular, ψ → 0 as r → ∞.
(ii ) The continuous spectrum, E > 0, where the associated eigenfunction ψ no longer goes
to zero as r → ∞, but rather its asymptotic behavior is like that of a plane wave
e i k·x . These correspond to scattering states.
The key difference between classical mechanics and quantum mechanics is that in classical mechanics, the energy can take on any positive value, but in quantum mechanics, the bound state energies are quantized, i.e., they can only take on discrete values. The investigation of the spectrum of Hamiltonian operators is the fundamental problem of quantum mechanics.
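To illustrate the dichotomy (our sketch, not from the notes), a finite-difference discretization of H = −d²/dx² + V(x) with a finite square well exhibits only finitely many negative eigenvalues (the discrete bound-state energies), while the positive part of the spectrum fills in densely as the interval grows:

    import numpy as np

    # H = -d^2/dx^2 + V(x) on a large interval; V is a finite square well
    N, L = 2000, 60.0
    x = np.linspace(-L / 2, L / 2, N)
    dx = x[1] - x[0]
    V = np.where(np.abs(x) < 1.0, -2.0, 0.0)

    H = (np.diag(2.0 / dx**2 + V) + np.diag(np.full(N - 1, -1.0 / dx**2), 1)
         + np.diag(np.full(N - 1, -1.0 / dx**2), -1))
    E = np.linalg.eigvalsh(H)

    print(E[E < 0])    # a finite list of bound-state energies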
Finally, we list a few basic references on quantum mechanics that the reader may prof-
itably study: [19, 33, 37, 44, 52].

Acknowledgments: I thank Olivier de La Grandville, Aritra Lahiri, and Maria Clara
Nucci for comments, corrections, and inspiration on earlier versions of these notes. Thanks
to Jeff Calder for permission to use his image processing figures from [11].

References

[1] Abraham, R., and Marsden, J.E., Foundations of Mechanics, 2nd ed., The
Benjamin–Cummings Publ. Co., Reading, Mass., 1978.
[2] Ahlfors, L., Complex Analysis, McGraw–Hill, New York, 1966.
[3] Antman, S.S., Nonlinear Problems of Elasticity, Appl. Math. Sci., vol. 107,
Springer–Verlag, New York, 1995.
[4] Apostol, T.M., Mathematical Analysis, 2nd ed., Addison–Wesley Publ. Co., Reading,
Mass., 1974.
[5] Arnol’d, V.I., Mathematical Methods of Classical Mechanics, Graduate Texts in
Mathematics, vol. 60, Springer–Verlag, New York, 1978.
[6] Ball, J.M., and James, R.D., Fine phase mixtures as minimizers of energy, Arch. Rat.
Mech. Anal. 100 (1987), 13–52.
[7] Bloch, A.M., Nonholonomic Mechanics and Control, 2nd ed., Interdisciplinary Applied
Mathematics, vol. 24, Springer–Verlag, New York, 2003.
[8] Born, M., and Wolf, E., Principles of Optics, Fourth Edition, Pergamon Press, New
York, 1970.
[9] Boyce, W.E., and DiPrima, R.C., Elementary Differential Equations and Boundary
Value Problems, 7th ed., John Wiley, New York, 2001.
[10] Carathéodory, C., Calculus of Variations and Partial Differential Equations of the
First Order, Parts I, II, Holden-Day, New York, 1965, 1967.
[11] Calder, J., The Calculus of Variations, Lecture Notes, University of Minnesota, 2020.
https://www.math.umn.edu/~jwcalder/CalculusOfVariations.pdf
[12] Chan, T.F., and Shen, J., Image Processing and Analysis. Variational, PDE, Wavelet,
and Stochastic methods, SIAM, Philadelphia, PA, 2005.
[13] Chen, J.Q., Group Representation Theory for Physicists, World Scientific, Singapore,
1989.
[14] Cioranescu, D., and Donato, P., An Introduction to Homogenization, Oxford
University Press, Oxford, 1999.
[15] Costa, C.J., Example of a complete minimal immersion in R 3 of genus one and three
embedded ends, Bol. Soc. Brasil. Mat. 15 (1984), 47–54.
[16] Courant, R., Differential and Integral Calculus, vol. 2, Interscience Publ., New York,
1936.
[17] Courant, R., and Hilbert, D., Methods of Mathematical Physics, Interscience Publ.,
New York, 1953.
[18] Dacorogna, B., Introduction to the Calculus of Variations, Imperial College Press,
London, 2004.



[19] Dirac, P.A.M., The Principles of Quantum Mechanics, 3rd ed., Clarendon Press,
Oxford, 1947.
[20] Forsyth, A.R., Calculus of Variations, Dover Publ., New York, 1960.
[21] Gel’fand, I.M., and Fomin, S.V., Calculus of Variations, Dover Publ., New York, 2000.
[22] Giaquinta, M., and Hildebrandt, S., Calculus of Variations I. The Lagrangian
Formalism, Springer–Verlag, New York, 1996.
[23] Goldstein, H., Classical Mechanics, 2nd ed., Addison–Wesley, Reading, Mass., 1980.
[24] Gray, A., Abbena, E., and Salamon, S., Modern Differential Geometry of Curves and
Surfaces with Mathematica, 3rd ed., Chapman & Hall/CRC, Boca Raton, Fl., 2006.
[25] Gurtin, M.E., An Introduction to Continuum Mechanics, Academic Press, New York,
1981.
[26] Hennessey, M.P., Olson, D.A., and Shakiban, C., Steering options for maneuvering
a particle on a surface, in: Proceedings of the ASME International Mechanical
Engineering Congress and Exposition, Salt Lake City, Utah, 2019.
[27] Hildebrandt, S., and Tromba, A., Mathematics and Optimal Form, Scientific American
Books, New York, 1985.
[28] James, R.D., Materials from mathematics, Bull. Amer. Math. Soc. 56 (2019), 1–28.
[29] Kirk, D.E., Optimal Control Theory, Dover Publ., Mineola, New York, 2004.
[30] Kosmann-Schwarzbach, Y., The Noether Theorems. Invariance and Conservation Laws
in the Twentieth Century, Springer, New York, 2011.
[31] Kot, M., A First Course in the Calculus of Variations, American Mathematical
Society, Providence, R.I., 2014.
[32] de La Grandville, O., On a classic problem in the calculus of variations: setting
straight key properties of the catenary, Amer. Math. Monthly, to appear.
[33] Landau, L.D., and Lifshitz, E.M., Quantum Mechanics (Non-relativistic Theory),
Course of Theoretical Physics, vol. 3, Pergamon Press, New York, 1977.
[34] Lee, E.B., and Markus, L., Foundations of Optimal Control Theory, John Wiley &
Sons, New York, 1967.
[35] Marsden, J.E., and Tromba, A.J., Vector Calculus, 6th ed., W.H. Freeman, New York,
2012.
[36] Matsutani, S., Euler's elastica and beyond, J. Geom. Symmetry Physics 17 (2010),
45–86.
[37] Messiah, A., Quantum Mechanics, John Wiley & Sons, New York, 1976.
[38] Morgan, F., Geometric Measure Theory: a Beginner’s Guide, Academic Press, New
York, 2000.
[39] Nitsche, J.C.C., Lectures on Minimal Surfaces, Cambridge University Press,
Cambridge, 1988.
[40] Noether, E., Invariante Variationsprobleme, Nachr. König. Gesell. Wissen. Göttingen,
Math.–Phys. Kl. (1918), 235–257. (See Transport Theory and Stat. Phys. 1 (1971),
186–207 for an English translation.)
[41] Olver, F.W.J., Lozier, D.W., Boisvert, R.F., and Clark, C.W., eds., NIST Handbook of
Mathematical Functions, Cambridge University Press, Cambridge, 2010.



[42] Olver, P.J., Applications of Lie Groups to Differential Equations, 2nd ed., Graduate
Texts in Mathematics, vol. 107, Springer–Verlag, New York, 1993.
[43] Olver, P.J., Introduction to Partial Differential Equations, Undergraduate Texts in
Mathematics, Springer, New York, 2014.
[44] Olver, P.J., Quantum Mathematics, Lecture Notes, University of Minnesota, 2016.
[45] Olver, P.J., Emmy Noether’s enduring legacy in symmetry, Symmetry: Culture and
Science 29 (2018), 475–485.
[46] Olver, P.J., Complex Analysis and Conformal Mapping, Lecture Notes, University of
Minnesota, 2020. http://www.math.umn.edu/~olver/ln/cml.pdf
[47] Olver, P.J., Nonlinear Systems, Lecture Notes, University of Minnesota, 2021.
http://www.math.umn.edu/~olver/ln/nls.pdf
[48] Olver, P.J., Boundary conditions in the calculus of variations, preprint, University of
Minnesota, 2021.
[49] Olver, P.J., and Shakiban, C., Applied Linear Algebra, Second Edition, Undergraduate
Texts in Mathematics, Springer, New York, 2018.
[50] Rindler, F., Calculus of Variations, Springer–Verlag, New York, 2018.
[51] Sapiro, G., Geometric Partial Differential Equations and Image Analysis, Cambridge
University Press, Cambridge, 2001.
[52] Thirring, W., Quantum Mechanics of Atoms and Molecules, A Course in Mathematical
Physics, vol. 3, Springer–Verlag, New York, 1981.
[53] Tondeur, P., Geometry of Foliations, Birkhäuser Verlag, Boston, 1997.
[54] Varadarajan, V.S., Lie Groups, Lie Algebras, and Their Representations,
Springer–Verlag, New York, 1984.
[55] Woodhouse, N.M.J., Geometric Quantization, Second edition; Oxford University Press,
New York, 1992.
[56] Whittaker, E.T., A Treatise on the Analytical Dynamics of Particles and Rigid Bodies,
Cambridge University Press, Cambridge, 1937.
