
MATHEMATICAL PHYSICS II

MATH2071 · MICHAELMAS · 2023

Lagrangian and Hamiltonian Mechanics

Draft version, September 12, 2023

Iñaki García Etxebarria · Durham

(Based on notes by P. Bowcock and P. Mansfield)


Contents

1 Introduction
2 The action principle
   2.1 Calculus of variations
   2.2 Configuration space and generalized coordinates
   2.3 Lagrangians for classical mechanics
   2.4 Ignorable coordinates and conservation of generalised momenta
3 Symmetries, Noether’s theorem and conservation laws
   3.1 Ordinary symmetries
   3.2 Energy conservation
4 Normal modes
   4.1 Canonical kinetic terms
   4.2 Non-canonical kinetic terms
5 Fields and the wave equation
   5.1 Variational Principle for Continuous Systems
   5.2 Example: the wave equation from the Lagrangian for a string
       5.2.1 Derivation of the massless scalar Lagrangian from a physical system
   5.3 D’Alembert’s Solution to the Wave Equation
   5.4 Noether’s theorem for fields
   5.5 The Energy-Momentum Tensor
   5.6 Monochromatic Waves
   5.7 Strings with Boundaries
       5.7.1 Dirichlet boundary condition
       5.7.2 Neumann boundary condition
   5.8 Junctions
6 The Hamiltonian formalism
   6.1 Phase space
   6.2 The Poisson bracket and Hamiltonian flows
       6.2.1 Flows for conserved charges
   6.3 The Hamiltonian and Hamilton’s equations
   6.4 There and back again
A A review of some results in calculus
   A.1 Two useful lemmas for coordinate changes

Preliminaries

This course relies on some of the results you learned last year in calculus and linear algebra,
but should be otherwise self-contained. You might nevertheless find it useful to consult
additional textbooks for alternative approaches to the same ideas. Some good choices are:

• Landau, L.D. (1976) Mechanics.

• Marion, J.B. and Thornton, S.T. (1995) Classical dynamics of particles and systems.

• Goldstein, H., Poole, C.P. and Safko, J. (2002) Classical mechanics.

• Kibble, T. and Berkshire, F.H. (2004) Classical mechanics.

• Helliwell, T.M. and Sahakian, V.V. (2021) Modern classical mechanics.

These are all available from the library.

§1 Introduction

During this term we will be studying two closely connected reformulations of classical
mechanics, known as “Lagrangian” and “Hamiltonian” mechanics. Anything that can be
done in these frameworks can also be done using the language of Newtonian mechanics, but
this does not mean that they are uninteresting. In fact, one can argue that, despite being
ultimately equivalent to Newtonian mechanics, they are a “more correct” way of looking
at classical mechanics. This statement can be justified by noting that Lagrangian and
Hamiltonian mechanics are naturally obtained — in favourable cases at least, for systems
that admits an approximate classical description — as a limit of quantum mechanics.
As a consequence, many of the concepts used in the formulation of quantum mechanics
already make an appearance in the Lagrangian and Hamiltonian frameworks. Additionally,
understanding the classical solutions for a given quantum mechanical system can often serve
as a first step towards its full quantum mechanical solution. This makes the study of this
formalism for classical mechanics a natural stepping stone on our way towards quantum
mechanics.
Accordingly, during this term we will reformulate classical mechanics in this new frame-
work, focusing particularly on those aspects that still play an important role in the quantum
theory.

§2 The action principle

We will start with the Lagrangian formulation. The underlying physical principle behind
this formulation can be traced back to the idea that for some physical processes, the natural
answer to the question “what is the trajectory that a particle follows” is something like
“the most efficient one”. Our goal in this section is to understand, in a precise sense, how
to characterize this notion of efficiency.
A fundamental example is the free particle in flat space: its motion is along a straight
path. What makes the straight path special? The answer is well known: the straight path
is the one that minimizes the distance travelled between the origin and the destination of
a path. This is equivalent to saying that the motion of the particle is along a trajectory
that, assuming constant speed for the particle, minimizes the total time travelled.
This second formulation connects with Fermat’s principle, which states that the path
that a ray of light takes, when moving in a medium, is the one that minimizes the time
spent by the light beam. Or, more precisely, one should impose that the time is stationary
(we will define this precisely below) under small variations of the path.
These two examples suggest a natural question: is there always some quantity, in
problems of classical mechanics, that is minimized along physical motion? The answer is
that there is indeed such a quantity, known as the action. We will now explain how to
determine the equations of motion from the action, and then determine the form of the
action that reproduces classical Newtonian physics.
Our basic tool will be the “Calculus of variations”, which we now describe.

§2.1 Calculus of variations

Let us start by reviewing how to find the maxima and minima of a function f (s) : R → R.
As you will recall, this can be done by solving the equation
df/ds = 0

as a function of s. As an example, if our function is f(s) = ½s² − s we have

df/ds = s − 1

so the function has an extremum (a minimum, in this case) at s = 1. An alternative way
of formulating the same condition makes use of the definition of the derivative as encoding the
change in the function under small changes in s. For a small δs ∈ R, we have

f(s + δs) = f(s) + (df(s)/ds) δs + R(s, δs)

where R(s, δs) is an error term. It is convenient to introduce the notation

δf := f (s + δs) − f (s)

so the statement above becomes


δf = (df(s)/ds) δs + R(s, δs) .
We note that the usual definition of the derivative implies

lim_{δs→0} R(s, δs)/δs = 0 .
In these cases we say that “δf vanishes to first order in δs”. The functions that we will
study will almost always admit a well-behaved Taylor expansion, so this result implies that
R(s, δs) is at least of quadratic order in δs. Henceforth we will encode this vanishing to
first order by writing O((δs)²) instead of R(s, δs).
So, finally, we can say that the extrema of f (s) are located at the points where

δf = O((δs)²) .

The same reasoning can be applied in the case of functions of multiple variables. Consider
a function f(s₁, . . . , s_N) : R^N → R, and introduce a small displacement sᵢ → sᵢ + δsᵢ.
In this case the partial derivatives ∂f/∂sᵢ are defined by

δf = Σ_{i=1}^{N} (∂f/∂sᵢ) δsᵢ + O(δs²)

where the error term includes terms vanishing faster than δsᵢ (so terms of the form δs₁²,
δs₁δs₂, . . .). Stationary points¹ of f are located wherever δf vanishes to first order in δsᵢ.
In fact, we need to go one step further, and work with functionals: these are maps from
functions to R. One (heuristic, but sometimes useful) way of thinking of them is as the
limit of the previous multi-variate case when the number of variables N goes to infinity.
For instance, we could have a functional S[y(t)] defined by

S[y(t)] = ∫_a^b y(t)² dt

for some fixed choice of (a, b). I emphasize that one should think of S as the analogue of
f above, and the different functions y(t) as the “points” in the domain of this functional.
We want to define a meaning for a function y(t) to give an extremal value for the
functional S. In analogy with what happened in the finite dimensional case above, we
can study the variation of S as we displace y(t) slightly. We need to be a bit careful
when specifying which class of functions y(t) we are going to include in our extremization
problem. In the case of interest to us, we will extremize over the set of smooth² functions
y(t) with fixed values at the endpoints a and b. That is, we fix y(a) = y_a and y(b) = y_b,
for some fixed values of y_a and y_b.

¹ You might want to remind yourself of section 1.9 of the Calculus I Epiphany notes.
² My conventions are that smooth functions are those which have continuous derivatives to all orders.

Definition 2.1.1. We say that a function y(t) is stationary (for the functional S) if

(d/dϵ) S[y(t) + ϵz(t)] |_{ϵ=0} = 0

for all smooth z(t) such that z(a) = z(b) = 0.

Note 2.1.2
Consider the Taylor expansion in ϵ (which is a constant) of S[y(t) + ϵz(t)]:

S[y(t) + ϵz(t)] = S[y(t)] + ϵ (dS[y(t) + ϵz(t)]/dϵ)|_{ϵ=0} + ½ϵ² (d²S[y(t) + ϵz(t)]/dϵ²)|_{ϵ=0} + . . .

The condition for y(t) to be stationary is that the term proportional to ϵ vanishes:

δS := S[y(t) + ϵz(t)] − S[y(t)] = O(ϵ²) .

It is useful to think of the combination ϵz(t) as a small variation of y(t), which we
denote δy(t) := ϵz(t). We define O((δy(t))ⁿ) to mean simply O(ϵⁿ). In particular, we
can rewrite the stationary condition as

δS = O((δy(t))²) .

If you are ever confused about the expansions in δy(t) below, you can replace δy(t)
with ϵz(t), and expand in the constant ϵ. For instance, consider the integral
∫ g(t)(δy(t))ⁿ dt

for any positive integer n and any function g(t). I claim that this is O((δy(t))ⁿ). The
proof is as follows: replacing δy(t) with ϵz(t) we have

∫ g(t)ϵⁿz(t)ⁿ dt = ϵⁿ ∫ g(t)z(t)ⁿ dt .


(Here we have used that ϵ is a constant.) In our O((δy(t))ⁿ) notation, this means that

∫ O((δy(t))ⁿ) dt = O((δy(t))ⁿ) .

That is, integration does not change the order in δy(t) (which, once more, when talking
about action functionals I define to be really just the order in ϵ).

Note 2.1.3
Because of our interest in dynamical problems, we will often refer to the functions y(t)
as paths, so that the conditions above define what a “stationary path” is.

We are now in a position to introduce the action principle. Assume that we have
an action functional (or simply action) S : {functions} → R, which takes a function and
produces a real number. In the Lagrangian formalism all of the physical content of the
theory can be summarized in the choice of action functional.
For now we also assume that we have a particle moving in one dimension, and we
want to determine its motion. Its trajectory is given by a function x(t), with t the time
coordinate. For many physical problems the equations of motion are second order in x(t), so
we need data to fix two integration constants. In the Lagrangian formalism these are given
by fixing the initial and final positions. That is, we will assume that we know the initial
position x(t0 ) of the particle at time t0 , and its final position x(t1 ) at time t1 .
The action principle³ then states that for arbitrary smooth small deformations δx(t)
around the “true” path x(t) (that is, the path that the particle will actually follow) we have

δS := S[x + δx] − S[x] = O((δx)²) .   (2.1.1)

Or in other words:

Action principle:
The paths described by particles are stationary paths of S.

³ This goes under various names in the literature. Common ones are action principle, least action principle, extremal action principle and (less precisely) variational principle. I will mostly use “action principle”, which has the advantage of being concise.

In a moment we will need an important result known as the fundamental lemma of the
calculus of variations. It goes as follows:

Lemma 2.1.4 (Fundamental lemma of the calculus of variations). Consider a function

f (x) continuous in the interval [a, b] (we assume a < b) such that
∫_a^b f(x)g(x) dx = 0

for all smooth functions g(x) in [a, b] such that g(a) = g(b) = 0. Then f (x) = 0 for all
x ∈ [a, b].
Proof. I will prove the result by contradiction. Assume that there is a point p ∈ (a, b) such
that f (p) > 0 (the case f (p) < 0 can be proven analogously). By continuity of f (x), there
will be some non-vanishing interval [p₀, p₁] where f(x) > 0. Construct

g(x) = ν(x − p₀)ν(p₁ − x)   if x ∈ [p₀, p₁] ,
g(x) = 0                    otherwise,

with ν(x) = exp(−1/x). It is an interesting exercise in first year calculus to prove that
this function is smooth everywhere, including at p₀ and p₁ (it is an example of a “bump
function”, useful in many domains). Clearly f(x)g(x) > 0 for x ∈ (p₀, p₁), and vanishes
otherwise. This implies that

∫_a^b f(x)g(x) dx = ∫_{p₀}^{p₁} f(x)g(x) dx > 0

which is a contradiction.
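If you want to see the bump function concretely, here is a minimal Python sketch (assuming numpy is available; the interval [p₀, p₁] = [0.3, 0.7] is an arbitrary illustrative choice) that evaluates g(x) on a grid:

```python
import numpy as np

def nu(x):
    # nu(x) = exp(-1/x) for x > 0, and 0 otherwise; this is smooth at x = 0
    return np.where(x > 0, np.exp(-1.0 / np.where(x > 0, x, 1.0)), 0.0)

def g(x, p0=0.3, p1=0.7):
    # The bump function from the proof: positive on (p0, p1), zero elsewhere
    return nu(x - p0) * nu(p1 - x)

x = np.linspace(0.0, 1.0, 11)
print(g(x))  # vanishes outside [0.3, 0.7], strictly positive inside
```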
Now, for the systems that we will study during this term, it will be the case that S can
be expressed in a particularly nice way as the time integral of a Lagrangian. That is, we
will have

S[x] = ∫_{t₀}^{t₁} dt L(x(t), ẋ(t))   (2.1.2)

for some function L(a, b) of two real variables, where ẋ(t) := dx/dt.
Whenever a Lagrangian exists, the variational principle together with the fundamental
lemma of the calculus of variations leads to a set of differential equations that determine
x(t). The argument is as follows. If we Taylor expand the perturbed Lagrangian to first
order in δx(t) we get⁴

L(x(t) + δx(t), ẋ(t) + δẋ(t)) = L(x(t), ẋ(t)) + (∂L/∂x) δx(t) + (∂L/∂ẋ) δẋ(t) + . . .

⁴ To bring the main point to light here: note that from the point of view of the Lagrangian x(t), δx(t), ẋ(t) and δẋ(t) are simply numbers, not functions. Let me call them a, ϵα, b and ϵβ, respectively, to emphasize this point, where a, α, b, β ∈ R, and ϵ ∈ R is as in definition 2.1.1. Then all we are doing here is taking the first order in the Taylor expansion of the Lagrangian in ϵ:

L(a + ϵα, b + ϵβ) = L(a, b) + ϵα (∂L(r, s)/∂r)|_{(r,s)=(a,b)} + ϵβ (∂L(r, s)/∂s)|_{(r,s)=(a,b)} + . . .

Putting this expansion of the Lagrangian into the variation of the action we have
δS = ∫_{t₀}^{t₁} dt [ L(x(t) + δx(t), ẋ(t) + δẋ(t)) − L(x(t), ẋ(t)) ]
   = ∫_{t₀}^{t₁} dt [ (∂L/∂x) δx(t) + (∂L/∂ẋ) δẋ(t) ]   (2.1.3)
where we have omitted terms of second order or higher in δx.⁵ For notational simplicity I
will often write ∂L/∂x instead of the more precise but much more cumbersome

(∂L(r, s)/∂r)|_{(r,s)=(x(t),ẋ(t))}

where (r, s) are names for the two arguments of the Lagrangian L (which are conventionally,
but somewhat confusingly, also named x and ẋ, a convention that I will follow most of the
time. . . but here I want to be as clear as possible about what I mean). Similarly

∂L/∂ẋ := (∂L(r, s)/∂s)|_{(r,s)=(x(t),ẋ(t))} .

We proceed by noting that δẋ(t) = d/dt (δx(t)), so we can write the above as

δS = ∫_{t₀}^{t₁} dt [ (∂L/∂x) δx(t) + (∂L/∂ẋ) (d/dt)(δx(t)) ] .

Integration by parts of the second term⁶ now allows us to rewrite this as

δS = ∫_{t₀}^{t₁} dt [ ( ∂L/∂x − d/dt (∂L/∂ẋ) ) δx(t) + d/dt ( (∂L/∂ẋ) δx(t) ) ] .

The last term is a total derivative, so we can integrate it trivially to give:

δS = [ (∂L/∂ẋ) δx(t) ]_{t₀}^{t₁} + ∫_{t₀}^{t₁} dt ( ∂L/∂x − d/dt (∂L/∂ẋ) ) δx(t) .
⁵ Recall from definition 2.1.1 that any time we talk about expanding in δx we are really expanding in a small parameter ϵ inside δx(t) = ϵz(t) (where z(t) is as in definition 2.1.1). The variation δẋ(t) = ϵż(t) clearly has the same dependence on ϵ, since ϵ is just a constant that does not depend on time. We therefore have that “δẋ(t) is first order in δx(t)”, at least for the purposes of counting degrees when expanding.
⁶ Recall that

d(f(t)g(t))/dt = (df(t)/dt) g(t) + f(t) (dg(t)/dt)

or equivalently

f(t) (dg(t)/dt) = d(f(t)g(t))/dt − (df(t)/dt) g(t) .

In the text we have f(t) = ∂L/∂ẋ and g(t) = δx(t).

Now we have that δx(t₀) = δx(t₁) = 0, as the paths that we consider all start and end at
the same positions. This implies that the first term vanishes, so

δS = ∫_{t₀}^{t₁} dt ( ∂L/∂x − d/dt (∂L/∂ẋ) ) δx(t) .

Recall that the action principle demands that this variation vanishes (to first order in
δx(t), i.e. ignoring possible terms that we have not written) for arbitrary δx(t). By the
fundamental lemma of the calculus of variations, the only way that this can be
true is if the function multiplying δx(t) in the integral vanishes:

∂L/∂x − d/dt (∂L/∂ẋ) = 0 .   (2.1.4)

This is known as the Euler-Lagrange equation, in the case of one-dimensional problems.
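As a quick sanity check of (2.1.4), the Euler-Lagrange equation can be computed symbolically. Here is a minimal sketch assuming sympy is available; the harmonic-oscillator Lagrangian L = ½mẋ² − ½κx² is chosen purely for illustration:

```python
import sympy as sp

t = sp.symbols('t')
m, kappa = sp.symbols('m kappa', positive=True)
x = sp.Function('x')
xdot = sp.diff(x(t), t)

# L = (1/2) m xdot^2 - (1/2) kappa x^2
L = sp.Rational(1, 2) * m * xdot**2 - sp.Rational(1, 2) * kappa * x(t)**2

# Euler-Lagrange: dL/dx - d/dt (dL/dxdot)
EL = sp.diff(L, x(t)) - sp.diff(sp.diff(L, xdot), t)
print(EL)  # -kappa*x(t) - m*Derivative(x(t), (t, 2))
```

The printed expression set to zero is mẍ + κx = 0, as expected for a harmonic oscillator.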

Note 2.1.5
There is a somewhat subtle point in the Lagrangian formulation that I want to make
explicit. Note that the Lagrangian L is an ordinary function of two parameters, and
it knows nothing about paths. (In general it is a function of 2N parameters, with
N the number of “generalised coordinates” that we need to describe the system, see
below.) Let me emphasize this by writing L(r, s). When using the Lagrangian function
to construct the action we evaluated the Lagrangian function at (r, s) = (x(t), ẋ(t)) at
each instant in time, but it is important to keep in mind that the Lagrangian itself
treats r and s as independent variables: they are simply the two arguments to the
function.
In general, if we want to study how this function changes under small displacements
of r and s we would use the chain rule:
L(r + δr, s + δs) = L(r, s) + (∂L/∂r) δr + (∂L/∂s) δs + . . .
where the dots denote terms of higher order in δr and δs. This is what we did above
in (2.1.3), again with (r, s) = (x(t), ẋ(t)).
What this all means is that the partial derivatives appearing in the Euler-Lagrange
equations treat the first and second arguments of the Lagrangian function indepen-
dently, leading to the somewhat funny-looking rules:
∂x/∂ẋ = ∂ẋ/∂x = 0 .   (2.1.5)
This would probably be a little clearer if we used a different notation for ẋ (such as v)
when writing Lagrangians, to emphasize that in the Lagrangian formalism ẋ should be
treated as a variable which is entirely independent of x itself. But I will stick to the
standard (if somewhat puzzling at first) notation, with the understanding that in the
Lagrangian formalism one should impose (2.1.5).
This also makes clear that (2.1.5) is not something you should generically expect to
hold outside the Lagrangian formalism. And indeed, when we study the Hamiltonian
framework below this rule will be replaced by a different one.

§2.2 Configuration space and generalized coordinates

We now want to extend the Lagrangian formalism to deal with more general situations,
beyond the rather special case of a particle moving in one dimension. We start with
Definition 2.2.1. The set of all possible (in principle) instantaneous configurations for a
given physical system is known as configuration space. We will denote it by C.
It is important to note that this includes positions, but it does not include velocities.
One informal way of thinking about configuration space is as the space of all distinct
photographs one can take of the system, at least in principle.
Additionally, in constructing configuration space we make no statement about dynam-
ics: we need to construct configuration space before we construct a Lagrangian, which
tells us about the dynamics in this configuration space.
Example 2.2.2. A particle moving in R^d (that is, d-dimensional Euclidean space) has
configuration space R^d. We discussed the d = 1 example above, where we had a particle
moving on the one-dimensional line R, which we parametrized by the coordinate x.
Example 2.2.3. N particles moving freely in R^d have configuration space (R^d)^N =
R^{dN}. (Assuming that we can always distinguish the particles. I leave it as an amusing
exercise to think about what the configuration space is if you cannot distinguish the particles.)
Example 2.2.4. N electrically charged particles moving in R^d: since the particles are
electrically charged they repel or attract. But this is a dynamical question, and configuration
space is insensitive to such matters: it is still R^{dN}. One way to see this is that you can
always place a set of N particles at any desired positions in R^d (barring some singular
points where particles overlap, so that the system has infinite energy and is unphysical, but we
can ignore such subtleties here). After being released, the particles will subsequently move
in a manner described by the Euler-Lagrange equations, but any initial choice is permitted.
Example 2.2.5. Two particles joined by a rigid rod of length ℓ in d dimensions: without
the rod the configuration space is R^{2d}, but the rod introduces the constraint that the particles
are at fixed distance ℓ from each other. This can be written as

∥x⃗₁ − x⃗₂∥² = ℓ²

where x⃗₁ and x⃗₂ are the positions of the two particles in R^d. The configuration space is
(2d − 1)-dimensional, given by the surface defined by this equation inside R^{2d}.

Example 2.2.6. Finally, consider a rigid body in R³, a desk for instance. We can view it as
formed by ~10²⁷ atoms, joined by atomic forces. But for the purposes of classical mechanics
we certainly do not care about the motion of the individual atoms (we are doing classical
mechanics, not quantum mechanics, so we would get the answer wrong anyway, even if
we could compute it!). Rather, for classical dynamics we can think about the classical
configurations that the desk can take. And this is a six-dimensional space, given (for
instance) by the position of the centre of mass of the desk, and three rotational angles.
Definition 2.2.7. Given a configuration space C for a physical system S, we say that S
has dim(C) degrees of freedom.
Although it is illuminating to think of configuration space abstractly, in practice we will
often want to put coordinates on it, so that we can write and analyse concrete equations. I
emphasize that this is a matter of convenience: any choice of coordinate system is equally
valid, and the Lagrangian formalism holds regardless of the choice.
Definition 2.2.8. Given a configuration space C, any set of coordinates in this space is
known as a set of generalized coordinates. Conventionally, when we want to indicate that
some equation holds for arbitrary choices of generalized coordinates, we will use “qi ” for the
coordinate names, with i ∈ {1, . . . , dim(C)}, and “q” (without indices) for the coordinate
vector with components qi .
Example 2.2.9. Consider the case of a particle moving on R2 . The configuration space
is R2 . There are two natural sets of coordinates in this space (although I emphasize again
that any choice is valid): we could choose the ordinary Cartesian coordinates (x, y), or it
might be more convenient to choose polar coordinates r, θ satisfying
x = r cos(θ) ,
y = r sin(θ) .
Example 2.2.10. Consider instead the case of a bead attached to a circular wire of unit
radius in R², defined by the equation x² + y² = 1. The configuration space is the circle, S¹.
A possible coordinate on this space is the angular variable θ appearing in the description of
the circle in polar coordinates.

Note 2.2.11
We will only be dealing with unconstrained generalized coordinates when describing
configuration space. That is, we want a set of exactly dim(C) coordinates (and no
more) that describes, at least locally, the geometry of the configuration space C. So
in example 2.2.10 we can take θ as our generalized coordinate, but we do not want to
consider (x, y) as generalized coordinates, as they are subject to the constraint x² +
y² = 1. While there is nothing wrong geometrically with such systems of coordinates,
the existence of the constraint implies that we cannot vary x and y independently
in our variational problem (as we will implicitly do below), and this complicates
the analysis of the problem somewhat. So, for simplicity, we will just declare that
henceforth we are dealing with unconstrained systems of generalized coordinates.

We can now repeat the derivation of the Euler-Lagrange equations for a general con-
figuration space C. Consider a general path in configuration space given by q(t) ∈ C,⁷ and
assume the existence of a Lagrangian function, L(q, q̇), such that the action for the path
is given by

S = ∫_{t₀}^{t₁} dt L(q(t), q̇(t)) .

The variational principle states that, if we fix the initial and final positions in configuration
space, that is q(t₀) = q⁽⁰⁾ and q(t₁) = q⁽¹⁾, the path taken by the physical system satisfies

δS = 0

to first order in δq(t). The derivation runs parallel to the one above (here N := dim(C)):
δS = ∫_{t₀}^{t₁} dt Σ_{i=1}^{N} [ (∂L/∂qᵢ) δqᵢ + (∂L/∂q̇ᵢ) δq̇ᵢ ]
   = ∫_{t₀}^{t₁} dt Σ_{i=1}^{N} [ (∂L/∂qᵢ) δqᵢ + (∂L/∂q̇ᵢ) (d/dt)(δqᵢ) ]
   = ∫_{t₀}^{t₁} dt Σ_{i=1}^{N} [ ( ∂L/∂qᵢ − d/dt (∂L/∂q̇ᵢ) ) δqᵢ + d/dt ( (∂L/∂q̇ᵢ) δqᵢ ) ]
   = [ Σ_{i=1}^{N} (∂L/∂q̇ᵢ) δqᵢ ]_{t₀}^{t₁} + ∫_{t₀}^{t₁} dt Σ_{i=1}^{N} ( ∂L/∂qᵢ − d/dt (∂L/∂q̇ᵢ) ) δqᵢ .

As mentioned above, we are dealing with unconstrained coordinates, meaning that we


can vary the qi independently in configuration space. Since there are dim(C) independent
coordinates, applying the fundamental lemma of the calculus of variations leads to the
system of dim(C) equations

 
∂L/∂qᵢ − d/dt (∂L/∂q̇ᵢ) = 0   ∀i ∈ {1, . . . , dim(C)}   (2.2.1)


known as the Euler-Lagrange equations. I want to emphasize the fact that we have not
made any assumptions about the specific choice of coordinate system used in deriving these
equations, so the Euler-Lagrange equations are valid in any coordinate system.⁸

⁷ We know that q lives in C, by definition. Where does q̇ live? Imagine that at each point in C we attach a tangent space T(q), the space of all tangent vectors at that point. The vector q̇ is a velocity, so it is a vector in T(q). The total space of all such tangent spaces over all points in C is known as TC (the “tangent bundle”). So, if I wanted to be fully precise, I would say that L : TC → R. While this is the true geometric nature of the Lagrangian function, and the resulting geometric ideas are beautiful to explore, during the course we will take the more pedestrian approach of looking at things locally in TC, where TC ≈ R^{dim(C)} × R^{dim(C)}. The Lagrangian is then L : R^{dim(C)} × R^{dim(C)} → R, that is, a function of two vectors, which we call q and q̇.
⁸ Alternatively, you can derive the Euler-Lagrange equations in any fixed coordinate system, and check that they stay invariant when you change to a different coordinate system, as done in the appendix.

Note 2.2.12
We emphasized in note 2.1.5 above that in the case of systems with one degree of
freedom the Lagrangian is a function of the coordinate x (a coordinate on the one-
dimensional configuration space) and ẋ, and these should be treated as independent
variables when writing down the Euler-Lagrange equations for the system.
Similarly, for N -dimensional configuration spaces, with generalized coordinates qi
with i ∈ {1, . . . , N }, we have in the Lagrangian formalism

∂qᵢ/∂q̇ⱼ = ∂q̇ᵢ/∂qⱼ = 0   (2.2.2)

and

∂qᵢ/∂qⱼ = ∂q̇ᵢ/∂q̇ⱼ = δᵢⱼ   (2.2.3)

where δᵢⱼ = 1 if i = j and δᵢⱼ = 0 otherwise.

Note 2.2.13
We will later on include the possibility of Lagrangians that depend on time explicitly.
We indicate this as L(q, q̇, t); an example could be L = ½mẋ² − t²x².
This is a mild modification of the discussion above, and it does not affect the form
of the Euler-Lagrange equations, but there are a couple of things to keep in mind:

1. When taking partial derivatives, t should be taken to be independent from q and


q̇. The reasoning for this is as in note 2.1.5: the Lagrangian is now a function of
2 dim(C) + 1 arguments (the generalized coordinates, their velocities, and time),
which are unrelated to each other. It is only when we use the Lagrangian to
build the action that the parameters become related, but the partial derivatives
that appear in the functional variation do not care about this, since they arise in
computing the variation of the action under small changes in the path.
For instance, for L = ½mẋ² − ½t²x² we have

∂L/∂ẋ = mẋ ;   ∂L/∂x = −t²x ;   ∂L/∂t = −tx² .


2. Since in extremizing the action we change the path, but leave the time coordinate
untouched, there is no Euler-Lagrange equation associated to t. In the example
above there would be a single Euler-Lagrange equation, of the form
 
d/dt (∂L/∂ẋ) − ∂L/∂x = mẍ + t²x = 0 .

§2.3 Lagrangians for classical mechanics

So far we have kept L(q, q̇) unspecified. How should we choose the Lagrangian in order
to reproduce the classical equations of motion? Ultimately, this needs to be decided by
experiment, but in problems in classical mechanics there is a very simple prescription, which
I will now state. Consider a system with kinetic energy T (q, q̇) and potential energy V (q).
Then the Lagrangian that leads to the right equations of motion is

L = T − V

Let us see that this gives the right equations of motion in the simple case of a particle
moving in three dimensions. The configuration space is R³, and if we choose Cartesian
coordinates xᵢ (that is, we choose qᵢ = xᵢ) we have

T = ½m(ẋ₁² + ẋ₂² + ẋ₃²)
and V = V(x₁, x₂, x₃). Note, in particular, that T depends only on the ẋᵢ, and V depends
on the xᵢ only. We have three degrees of freedom, so we have three Euler-Lagrange equations,
given by

0 = ∂L/∂xᵢ − d/dt (∂L/∂ẋᵢ)
  = −∂V/∂xᵢ − m (d/dt)(ẋᵢ)
  = −∂V/∂xᵢ − mẍᵢ

where we have used that ∂V/∂ẋᵢ = 0 and ∂T/∂xᵢ = 0, since xᵢ and ẋᵢ are independent variables in
the Lagrangian formalism, as we explained above. We can rewrite the equations above in
vector notation as

m (d²/dt²) x⃗ = −∇⃗V

which is precisely Newton’s second law for a conservative force F⃗ = −∇⃗V.

Example 2.3.1. The simplest example of the discussion so far is the free particle of mass
m moving in d dimensions. Its configuration space is R^d, which we can parametrize using
Cartesian coordinates xᵢ. In these coordinates the kinetic energy is given by

T = ½m Σ_{i=1}^{d} ẋᵢ²

and the potential energy V vanishes. This gives a Lagrangian

L = T − V = ½m Σ_{i=1}^{d} ẋᵢ²

which leads to the d Euler-Lagrange equations of motion

mẍᵢ = 0   ∀i ∈ {1, . . . , d} .

These equations are solved by the particle moving at constant velocity, xᵢ = vᵢt + bᵢ, with vᵢ, bᵢ
constants.

Example 2.3.2. Our second example will be a pendulum moving under the influence of
gravity. Our conventions will be as in figure 1: we have a mass m attached by a rigid
massless rod of length ℓ to a fixed point at the origin. The pendulum can swing on the (x, y)
plane. The configuration space of the system is S¹. We choose as a coordinate the angle θ
of the rod with the downward vertical axis from the origin, measured counterclockwise. The
whole system is affected by gravity, which acts downwards.

Figure 1: The pendulum discussed in example 2.3.2.

We now need to compute the kinetic and potential energy in terms of θ. The expression
of the kinetic energy in the (x, y) coordinates is ½m(ẋ² + ẏ²). In terms of θ we have

x = ℓ sin(θ) and y = −ℓ cos(θ) .



This implies ẋ = ℓ cos(θ)θ̇ and ẏ = ℓ sin(θ)θ̇, so

T = ½mℓ²θ̇² .
The potential energy, in turn, is (up to an irrelevant additive constant) given by

V = mgy = −mgℓ cos(θ)

leading to the Lagrangian

L = T − V = ½mℓ²θ̇² + mgℓ cos(θ) .
The corresponding Euler-Lagrange equations are

mℓ²θ̈ + mgℓ sin(θ) = 0

or equivalently

θ̈ + (g/ℓ) sin(θ) = 0 .

The exact solution of this system requires using something known as elliptic integrals, but
as a simple check of our result, note that for small angles sin(θ) ≈ θ, and the Euler-
Lagrange equation reduces to

θ̈ + (g/ℓ) θ = 0

with solution θ(t) = a sin(ωt) + b cos(ωt), where ω = √(g/ℓ), and a, b are arbitrary constants
that encode initial conditions. These are the simple oscillatory solutions that one expects
close to θ = 0.
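To get a feeling for how good the small-angle approximation is, one can integrate the full equation of motion numerically and compare against the linearized solution. A minimal sketch, assuming scipy is available and taking g/ℓ = 1 for illustration:

```python
import numpy as np
from scipy.integrate import solve_ivp

g_over_l = 1.0  # illustrative value of g/l

def pendulum(t, y):
    theta, omega = y
    return [omega, -g_over_l * np.sin(theta)]  # full pendulum equation

theta0 = 0.2  # small initial angle, released from rest
sol = solve_ivp(pendulum, (0.0, 10.0), [theta0, 0.0],
                dense_output=True, rtol=1e-9, atol=1e-9)

ts = np.linspace(0.0, 10.0, 101)
full = sol.sol(ts)[0]
linear = theta0 * np.cos(np.sqrt(g_over_l) * ts)  # small-angle solution
print(np.max(np.abs(full - linear)))  # small when theta0 << 1
```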
Example 2.3.3. Consider instead a spring with a mass attached to it. The spring is
attached on one end to the origin, but it is otherwise free to rotate on the (x, y) plane,
without friction. In this case we ignore the effect of gravity, and we assume that the spring
has vanishing natural length, and constant κ. The configuration is shown in figure 2.

Figure 2: The rotating spring studied in example 2.3.3.

In this case the configuration space is R². It is easiest to solve the Euler-Lagrange
equations in Cartesian coordinates. We have the kinetic energy

T = ½m(ẋ² + ẏ²) .

The potential energy is given by one half the spring constant times the square of the
extension of the spring. We are assuming that the natural length of the spring is 0, so the
extension of the spring is ℓ = √(x² + y²). So the potential energy is

V = ½κℓ² = ½κ(x² + y²) .
Putting everything together, we find that

L = T − V = ½m(ẋ² + ẏ²) − ½κ(x² + y²) .
The Euler-Lagrange equations split into independent equations for x and y, given by

ẍ + (κ/m) x = 0 ,
ÿ + (κ/m) y = 0 .

The general solution is then simply

x(t) = a_x sin(ωt) + b_x cos(ωt) ,
y(t) = a_y sin(ωt) + b_y cos(ωt) ,

with a_x, a_y, b_x, b_y constants encoding the initial conditions, and ω = √(κ/m).
Example 2.3.4. Let us try to solve this last example in polar coordinates r, θ. These are
related to Cartesian coordinates by
x = r cos(θ) ,
y = r sin(θ) .
Taking time derivatives and using the chain rule, we find

ẋ = ṙ cos(θ) − r sin(θ)θ̇ ,
ẏ = ṙ sin(θ) + r cos(θ)θ̇ .

A little bit of algebra then shows that

T = ½m(ẋ² + ẏ²) = ½m(ṙ² + r²θ̇²) .

On the other hand, the potential energy is simpler. We have

V = ½κ(x² + y²) = ½κr² .

We thus find that the Lagrangian in polar coordinates is

L = T − V = ½m(ṙ² + r²θ̇²) − ½κr² .

Let us write the Euler-Lagrange equations. For the coordinate r we have

d/dt (∂L/∂ṙ) − ∂L/∂r = mr̈ − mrθ̇² + κr = 0

while for the θ coordinate we have

d/dt (∂L/∂θ̇) − ∂L/∂θ = d/dt (mr²θ̇) = 0 .
This equation is quite remarkable: it tells us that there is a conserved quantity in this
system, given by mr²θ̇. This was not obvious at all in the Cartesian formulation of the
problem,⁹ but it follows immediately in polar coordinates, since the Lagrangian does not
depend on θ, only on θ̇, and accordingly ∂L/∂θ = 0. We can use this knowledge to simplify
the problem. Define

J := mr²θ̇ .

This is a constant of motion, so on any given classical trajectory it is simply a real num-
ber fixed by initial conditions. We can use this knowledge to simplify the Euler-Lagrange
equation for r, which after replacing θ̇ = J/(mr²) becomes an equation purely in terms of r:

mr̈ − mr (J/(mr²))² + κr = mr̈ − J²/(mr³) + κr = 0 .

§2.4 Ignorable coordinates and conservation of generalised momenta

It is useful to formalize what we just saw happen in example 2.3.4.


Definition 2.4.1. Given a set {q1 , . . . , qN } of generalized coordinates, we say that a spe-
cific coordinate qi in the set is ignorable if the Lagrangian function, expressed in these
generalised coordinates, does not depend on qi . That is, a coordinate is ignorable iff
∂L(q₁, . . . , q_N, q̇₁, . . . , q̇_N)/∂qᵢ = 0 .
Definition 2.4.2. The generalized momentum pi associated to a generalized coordinate is
pᵢ := ∂L/∂q̇ᵢ .
With these two definitions in place we have
Proposition 2.4.3. The generalized momentum associated to an ignorable coordinate is
conserved.
⁹ Once we know that the conserved charge is there, it is not difficult to find its expression in Cartesian coordinates: we have mr²θ̇ = m(xẏ − yẋ).

Proof. This follows immediately from the Euler-Lagrange equation for the ignorable coor-
dinate. Denoting the ignorable coordinate qi and its associated generalized momentum pi ,
we have

d/dt (∂L/∂q̇ᵢ) − ∂L/∂qᵢ = dpᵢ/dt − 0 = dpᵢ/dt = 0 .

Example 2.4.4. We already found an ignorable coordinate in example 2.3.4. We have that
θ was ignorable, and its associated generalized momentum is

p_θ = ∂L/∂θ̇ = mr²θ̇ .
Example 2.4.5. An even simpler example is the free particle moving in d dimensions. In
Cartesian coordinates we have
d
1 X 2
L=T −V = m ẋ ,
2 i=1 i

so every coordinate is ignorable. The associated generalized momenta are


∂L
pi = = mẋi .
∂ ẋi
In this case conservation of generalized momenta is simply conservation of linear momen-
tum.

Example 2.4.6. Let us look again at the free particle, but this time in two dimensions
(d = 2), and in polar coordinates. We have

L = T − V = ½m(ṙ² + r²θ̇²) .
We have that θ is ignorable. The associated conserved generalized momentum is

p_θ = ∂L/∂θ̇ = mr²θ̇ .

You might recognize this as the angular momentum of the particle (that is, position vector
× linear momentum), which should indeed be conserved for the free particle.

§3 Symmetries, Noether’s theorem and conservation laws

§3.1 Ordinary symmetries

Our discussion of ignorable coordinates hints at a connection between symmetries and


conservation laws: the fact that the Lagrangian does not depend on qi can be rephrased
as the statement that the Lagrangian is invariant under the transformation qi → qi + ϵai ,
with ϵai an arbitrary constant shift. (We will define all these concepts more carefully
momentarily.) And we saw that whenever this happens, there is a conserved quantity, the
generalized momentum pi .
This result is somewhat unsatisfactory, in that we can only understand the appearance
of the conserved charges in carefully chosen coordinate systems. And, as we saw in the
example of the free particle above, we might need to patch together results in different
coordinate systems in order to access all the conserved charges in the system.
Noether’s theorem fixes these deficiencies, providing a coordinate-independent connec-
tion between symmetries and conservation laws. Before we get to the theorem itself, we
will need some preliminary results and definitions.
Definition 3.1.1. Consider a uniparametric family of smooth maps φ(ϵ) : C → C from
configuration space to itself, with the property that φ(0) is the identity map. We call this
family of maps a transformation depending on ϵ. In any given coordinate system we can
write the transformation as
qi → ϕi (q1 , . . . , qN , ϵ)
with ϕi a set of N := dim(C) functions representing the finite transformation in the given
coordinate system. We take the change in velocities to be
q̇ᵢ → (d/dt) ϕᵢ .

Note 3.1.2
At the level of the Lagrangian we treat qi and q̇i as independent variables, so it is not
automatic that the transformation of the velocities q̇ᵢ is as given. One should take the
prescription q̇ᵢ → (d/dt)ϕᵢ as part of the definition above.

Remark 3.1.3. A word on notation: when it is clear from the context which transformation
we are talking about, we often write qᵢ′ instead of ϕᵢ(q, ϵ). That is, we often write

qᵢ → qᵢ′ = . . .

where the omitted terms are some function of qᵢ and ϵ.



Definition 3.1.4. The generator of φ is

dφ(ϵ)/dϵ |_{ϵ=0} := lim_{ϵ→0} [φ(ϵ) − φ(0)] / ϵ .
In any given coordinate system we have

qᵢ → ϕᵢ(q, ϵ) = qᵢ + ϵaᵢ(q) + O(ϵ²)

where

aᵢ = (∂ϕᵢ(q, ϵ)/∂ϵ)|_{ϵ=0}

is a function of the generalized coordinates. So, in coordinates, the generator of the trans-
formation is aᵢ. Similarly, for the velocities we have

q̇ᵢ → q̇ᵢ + ϵȧᵢ(q₁, . . . , q_N, q̇₁, . . . , q̇_N) + O(ϵ²)

generated by ȧᵢ.

Example 3.1.5. A particle moving in R^d can be described in Cartesian coordinates xᵢ. The
transformation associated to translations of the origin of coordinates in the first direction
is x₁ → x₁ + ϵ, with the other coordinates unchanged. So we have that shifts of the coordinate
system in the x₁ direction are generated by

aᵢ = 1 for i = 1, and aᵢ = 0 otherwise,

and ȧᵢ = 0.

Example 3.1.6. Say that we have a particle moving in two dimensions, and we want
to consider the finite transformations given by rotations around the origin. In Cartesian
coordinates we have
x → x cos(ϵ) − y sin(ϵ)
y → x sin(ϵ) + y cos(ϵ) .

In order to find the generators, we can derive the associated infinitesimal transformations
by using the expansions sin(ϵ) = ϵ + O(ϵ³) and cos(ϵ) = 1 + O(ϵ²). We find

x → x − ϵy + O(ϵ²)
y → y + ϵx + O(ϵ²) .

This implies that the transformation is generated in Cartesian coordinates by

a_x = −y ;   a_y = x ;   ȧ_x = −ẏ ;   ȧ_y = ẋ .
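The expansion can also be done mechanically; a small sketch assuming sympy is available:

```python
import sympy as sp

x, y, eps = sp.symbols('x y epsilon')

x_rot = x * sp.cos(eps) - y * sp.sin(eps)
y_rot = x * sp.sin(eps) + y * sp.cos(eps)

# First order in epsilon reproduces the generators a_x = -y, a_y = x
print(sp.series(x_rot, eps, 0, 2))  # x - epsilon*y + O(epsilon**2)
print(sp.series(y_rot, eps, 0, 2))  # y + epsilon*x + O(epsilon**2)
```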



Lemma 3.1.7. The equations of motion do not change if we modify the Lagrangian by
addition of a total derivative of a function of coordinates and time. That is,

L → L + dF(q₁, . . . , q_N, t)/dt
does not affect the equations of motion.

Proof. Since the term that we add is a total time derivative, the effect on the action is
S = ∫_{t₀}^{t₁} dt L  →  S′ = S + F(q₁(t₁), . . . , q_N(t₁), t₁) − F(q₁(t₀), . . . , q_N(t₀), t₀) .   (3.1.1)

Now, recall that the variational principle tells us that the equations of motion are obtained
by imposing that δS vanishes to first order in δqi (t), keeping the qi fixed at the endpoints
of the path. This implies that in the variational problem both F (q1 (t0 ), . . . , qN (t0 ), t0 ) and
F (q1 (t1 ), . . . , qN (t1 ), t1 ) are kept fixed. So

δS′ = S′[q + δq] − S′[q]
    = S[q + δq] + F(q₁(t₁), . . . , q_N(t₁), t₁) − F(q₁(t₀), . . . , q_N(t₀), t₀)
      − (S[q] + F(q₁(t₁), . . . , q_N(t₁), t₁) − F(q₁(t₀), . . . , q_N(t₀), t₀))
    = S[q + δq] − S[q] = δS .

We learn that the addition of dF/dt to the Lagrangian does not affect the variation of the
action in the variational problem, so it cannot affect the equations of motion.

This result motivates the following definition:
This result motivates the following definition:

Definition 3.1.8. A transformation φ(ϵ) is a symmetry if, to first order in ϵ, there exists
some function F (q, t) such that the change in the Lagrangian is a total time derivative of
F (q, t):

L → L′ = L(ϕ(q₁, ϵ), . . . , ϕ(q_N, ϵ)) = L + ϵ dF(q₁, . . . , q_N, t)/dt + O(ϵ²) .
Remark 3.1.9. I emphasize that F (q, t) is only defined up to a constant: if some F (q, t)
exists such that
dF (q, t)
L′ = L + ϵ + O(ϵ2 )
dt
any other F ′ (q, t) = F (q, t) + c with c is a constant will also satisfy the same equation.
The specific choice of c is arbitrary, and any choice will lead to correct results. In what
follows I will simply pick a convenient representative F (q, t) — for instance F (q, t) = 0
whenever this is possible.

Example 3.1.10. Whenever we have an ignorable coordinate qᵢ, the transformation
associated to shifting it by a constant, qᵢ → qᵢ + cᵢ, is clearly a symmetry, since by definition the
coordinate does not appear in the Lagrangian, and q̇ᵢ stays invariant. So in this case F
can be chosen to be 0.
As an example, consider the rotating spring discussed in examples 2.3.3 and 2.3.4.
In polar coordinates (r, θ), we have

L = ½m(ṙ² + r²θ̇²) − ½κr² .
2 2
In this case the θ coordinate is ignorable, so the associated shift θ → θ + ϵ is a symmetry.
The generators of the symmetry are
a_r = 0 ;   a_θ = 1 ;   ȧ_r = 0 ;   ȧ_θ = 0 .
Example 3.1.11. Let us study the same system as in the previous example, but now in
Cartesian coordinates. We have
1 1
L = m(ẋ2 + ẏ 2 ) − κ(x2 + y 2 ) .
2 2
The transformation θ → θ + ϵ is a rotation around the origin. Whenever ϵ ≪ 1, we have
x → x′ = x − ϵy + O(ϵ²)
y → y′ = y + ϵx + O(ϵ²) .
as we argued in example 3.1.6. And accordingly, for the time derivatives we have
ẋ → ẋ′ = ẋ − ϵẏ + O(ϵ²)
ẏ → ẏ′ = ẏ + ϵẋ + O(ϵ²) .
Note that these transformations imply that

x² + y² → x′² + y′² = (x − ϵy)² + (y + ϵx)² = x² + y² + O(ϵ²)

and similarly that

ẋ² + ẏ² → ẋ′² + ẏ′² = ẋ² + ẏ² + O(ϵ²) .
The action of the symmetry on the Lagrangian is then, to first order in ϵ:
L → L′ = L(x′, y′, ẋ′, ẏ′) = L + O(ϵ²)
so we also see in this coordinate system that the rotation is a symmetry.
Note that this argument generalizes straightforwardly to any Lagrangian of the form

L = ½m(ẋ² + ẏ²) − V(x² + y²)

with V(r) an analytic function of r, since in this case

V(x² + y² + O(ϵ²)) = V(x² + y²) + O(ϵ²) .

Example 3.1.12. Consider a system with Lagrangian

L = ½m(ẋ² + ẏ²) − yẋ − ½x² ,
and a transformation generated by

x → x′ = x ,
y → y′ = y + ϵ .

Then ẋ′ = ẋ and ẏ ′ = ẏ and

δL = L(x′, y′, ẋ′, ẏ′) − L(x, y, ẋ, ẏ) = −y′ẋ′ + yẋ = −ϵẋ .

So this is also a symmetry, this time with F = −x.

Note 3.1.13
It is important to notice that the definition of symmetry above does not involve the
equations of motion: the Lagrangian must stay invariant (up to a total derivative)
without using the equations of motion. That is, the Lagrangian must be invariant
also for those paths in configuration space that do not extremize S.

We are finally in a position to state and prove Noether’s theorem.


Theorem 3.1.14 (Noether). Consider a transformation generated by ai (q1 , . . . , qN ) (in a
given set of generalized coordinates), such that
L → L + ϵ dF(q₁, . . . , q_N, t)/dt + O(ϵ²) ,
so that it is a symmetry. Then

Q := Σ_{i=1}^{N} aᵢ (∂L/∂q̇ᵢ) − F

is conserved (that is, dQ/dt = 0). The conserved quantity Q is known as the Noether charge.
Proof. I will start by giving the intuitive idea behind the proof. Recall that physical
trajectories qi (t) are those that satisfy δS = 0 to first order in δqi (t), keeping the endpoints
qᵢ(t₀) and qᵢ(t₁) fixed. A general transformation acts as qᵢ(t) → qᵢ(t) + ϵaᵢ(q), but crucially
it does not necessarily keep the endpoints qi (t0 ) and qi (t1 ) fixed. So the action of a physical
path can change to first order in ϵ under a generic transformation. But it does so in a
fairly localised way: only the behaviour near the endpoints of the path, at t0 and t1 , can
contribute to δS. If the transformation is furthermore a symmetry, we can compute δS


(to first order in ϵ) in a second way, as a function of quantities at t0 and t1 only, using our
result (3.1.1) above. Equating the result of both approaches leads to Noether’s theorem.
In detail, this goes as follows. We want to understand the variation of the action under
the transformation
qᵢ → qᵢ + δqᵢ = qᵢ + ϵaᵢ
in two different ways. On one hand, as for any other variation of the path, we can Taylor
expand to obtain
δS = ∫_{t₀}^{t₁} dt Σ_{i=1}^{N} [ ϵaᵢ (∂L/∂qᵢ) + ϵȧᵢ (∂L/∂q̇ᵢ) ] + O(ϵ²)

which becomes, using the Euler-Lagrange equations,

δS = ∫_{t₀}^{t₁} dt Σ_{i=1}^{N} [ ϵaᵢ d/dt (∂L/∂q̇ᵢ) + ϵȧᵢ (∂L/∂q̇ᵢ) ] + O(ϵ²)
   = ∫_{t₀}^{t₁} dt (d/dt) [ ϵ Σ_{i=1}^{N} aᵢ (∂L/∂q̇ᵢ) ] + O(ϵ²)
   = ϵ [ Σ_{i=1}^{N} aᵢ (∂L/∂q̇ᵢ) ]_{t₀}^{t₁} + O(ϵ²) .

Note that we have used the Euler-Lagrange equations of motion in going from the first to
the second line, so the result will only be valid along the path that satisfies the equations
of motion.
On the other hand, using the fact that the variation is a symmetry, we have
δS = S[q + δq] − S[q]
   = ∫_{t₀}^{t₁} dt [ L + ϵ dF/dt + O(ϵ²) − L ]
   = ϵ [F]_{t₀}^{t₁} + O(ϵ²) .
Equating both results, we immediately obtain that Q(t₁) = Q(t₀). Since the choice of t₀
and t₁ is arbitrary, the result now follows easily: choose t₁ = t₀ + ϵ. We have

Q(t₁) − Q(t₀) = Q(t₀ + ϵ) − Q(t₀) = ϵ dQ/dt + O(ϵ²) = 0

so dQ/dt = 0.
Example 3.1.15. Whenever the coordinate qᵢ is ignorable, we have a symmetry (with
F = 0) generated by qᵢ → qᵢ + ϵ, leaving the other coordinates unchanged. That is,

aₖ = δᵢₖ := 1 if i = k, and 0 otherwise.

The corresponding Noether charge is then

Q = Σ_{k=1}^{N} aₖ (∂L/∂q̇ₖ) = Σ_{k=1}^{N} δᵢₖ (∂L/∂q̇ₖ) = ∂L/∂q̇ᵢ

as expected.
Example 3.1.16. Let us come back to the conservation of angular momentum in rota-
tionally symmetric systems, expressed in Cartesian coordinates. Assume that we have a
system with Lagrangian
L = ½m(ẋ² + ẏ²) − V(x² + y²) .
We saw in example 3.1.11 that rotations around the origin, which are generated by

ax = −y ; ay = x ,

are a symmetry of the system with F = 0.


Noether’s theorem then tells us that the associated charge is
Q = a_x (∂L/∂ẋ) + a_y (∂L/∂ẏ) = m(−yẋ + xẏ) .

It is a simple exercise to show that this is indeed equal to mr²θ̇.


Example 3.1.17. Finally, let us revisit example 3.1.12. We have a Lagrangian

L = ½m(ẋ² + ẏ²) − yẋ − ½x² ,
and a transformation generated by

x → x′ = x ,
y → y′ = y + ϵ .

That is, a_x = 0 and a_y = 1. We found in example 3.1.12 that this transformation is a
symmetry with F = −x. The associated Noether charge is

Q = a_x (∂L/∂ẋ) + a_y (∂L/∂ẏ) − F = mẏ + x .
We can check that this is conserved from the equations of motion, which are

d/dt (∂L/∂ẋ) − ∂L/∂x = mẍ − ẏ + x = 0 ,
d/dt (∂L/∂ẏ) − ∂L/∂y = mÿ + ẋ = 0 .

Note in particular that the second equation is precisely dQ/dt = 0.

§3.2 Energy conservation

Conservation of energy can be understood in a way quite similar to what we have seen:
energy can be defined as the Noether charge associated with time translations. The deriva-
tion is quite similar to the one above, but with some small (but crucial) differences needed
in order to take into account the fact that the time coordinate “t” is treated specially in
the Lagrangian formalism.
Let us consider the possibility that the Lagrangian depends explicitly on time. That
is, we promote the Lagrangian L to a function of the generalized coordinates qi (t), their
associated velocities q̇i (t), and time itself. We write this as L(q, q̇, t). The expression of
the action is now
S = ∫_{t₀}^{t₁} L(q₁(t), . . . , q_N(t), q̇₁(t), . . . , q̇_N(t), t) dt .

It is not difficult to see that the Euler-Lagrange equations do not change if we do this.¹⁰
Definition 3.2.1. Given a Lagrangian L(q, q̇, t), we define the energy to be

E := Σ_{i=1}^{N} q̇ᵢ (∂L/∂q̇ᵢ) − L .

Theorem 3.2.2. Along a path q(t) satisfying the equations of motion, we have
dE/dt = −∂L/∂t .
In particular, the energy is conserved if and only if the Lagrangian does not depend explicitly
on time.
Remark 3.2.3. In this theorem ∂L/∂t denotes taking the derivative of the Lagrangian with
respect to time, keeping q and q̇ fixed. See note 2.2.13 for a further discussion of this
point.
Elementary proof. It is easy to verify directly, by taking the time derivative of the definition
of energy, that the theorem holds. The calculation goes as follows. If we take the time
derivative of the energy, we have (from definition 3.2.1)
dE/dt = d/dt [ Σ_{i=1}^{N} q̇ᵢ (∂L/∂q̇ᵢ) − L ]
      = Σ_{i=1}^{N} [ q̈ᵢ (∂L/∂q̇ᵢ) + q̇ᵢ d/dt (∂L/∂q̇ᵢ) ] − dL/dt
¹⁰ I leave this as an exercise. All you need to do is to convince yourself that our derivation of the Euler-Lagrange equations, above equation (2.2.1), is not modified if the Lagrangian includes an explicit dependence on time.

Using the Euler-Lagrange equations, this becomes


dE/dt = Σ_{i=1}^{N} [ q̈ᵢ (∂L/∂q̇ᵢ) + q̇ᵢ (∂L/∂qᵢ) ] − dL/dt .

On the other hand, from the chain rule, we have


dL/dt = Σ_{i=1}^{N} [ q̈ᵢ (∂L/∂q̇ᵢ) + q̇ᵢ (∂L/∂qᵢ) ] + ∂L/∂t .

The result now follows from substitution.


Alternative proof. Here I will present a less straightforward but (in my opinion) more
illuminating proof, closer in spirit to the one we used in proving Noether’s theorem.
(This alternative proof is not examinable.)

Imagine that we take a path q(t) satisfying the equations of motion, and we displace it
to a new path q′ (t) = q(t − ϵ). That is, we move the whole path slightly forward in time,
keeping its shape. We have
Z t1

S = dt L(q1′ (t), . . . , qN

(t), q̇1′ (t), . . . , q̇N

(t), t)
t0
Z t1
= dt L(q1 (t − ϵ), . . . , qN (t − ϵ), q̇1 (t − ϵ), . . . , q̇N (t − ϵ), t) .
t0

We can compute this expression in two different ways. First, by the chain rule, we have
that

L(q₁(t − ϵ), . . . , q_N(t − ϵ), q̇₁(t − ϵ), . . . , q̇_N(t − ϵ), t) =
L(q₁(t), . . . , q_N(t), q̇₁(t), . . . , q̇_N(t), t) − ϵ Σ_{i=1}^{N} [ q̇ᵢ (∂L/∂qᵢ) + q̈ᵢ (∂L/∂q̇ᵢ) ] + O(ϵ²) .

Using the Euler-Lagrange equations of motion, we can write this as


Σ_{i=1}^{N} [ q̇ᵢ (∂L/∂qᵢ) + q̈ᵢ (∂L/∂q̇ᵢ) ] = Σ_{i=1}^{N} [ q̇ᵢ d/dt (∂L/∂q̇ᵢ) + q̈ᵢ (∂L/∂q̇ᵢ) ]
                                        = d/dt [ Σ_{i=1}^{N} q̇ᵢ (∂L/∂q̇ᵢ) ] .

Substituting these expressions into the action, we have just proven that
S′ = S − ϵ [ Σ_{i=1}^{N} q̇ᵢ (∂L/∂q̇ᵢ) ]_{t₀}^{t₁} + O(ϵ²) .

On the other hand, introducing a new variable t′ = t − ϵ, we can write


S′ = ∫_{t₀}^{t₁} dt L(q₁(t − ϵ), . . . , q_N(t − ϵ), q̇₁(t − ϵ), . . . , q̇_N(t − ϵ), t)
   = ∫_{t₀−ϵ}^{t₁−ϵ} dt′ L(q₁(t′), . . . , q_N(t′), q̇₁(t′), . . . , q̇_N(t′), t′ + ϵ) .

We can expand this as a series in ϵ using Leibniz’s rule (see equation (A.0.1) in the appendix
for a reminder), to get:

S′ = S + ϵ (dS′/dϵ)|_{ϵ=0} + O(ϵ²)
   = S − ϵ L(q₁(t₁), . . . , q_N(t₁), q̇₁(t₁), . . . , q̇_N(t₁), t₁)
       + ϵ L(q₁(t₀), . . . , q_N(t₀), q̇₁(t₀), . . . , q̇_N(t₀), t₀)
       + ϵ ∫_{t₀−ϵ}^{t₁−ϵ} dt′ ( ∂L(q₁(t′), . . . , q_N(t′), q̇₁(t′), . . . , q̇_N(t′), t′ + ϵ)/∂ϵ )|_{ϵ=0}
       + O(ϵ²)

Now we note that, by the chain rule, we have

∂L(q₁(t′), . . . , q_N(t′), q̇₁(t′), . . . , q̇_N(t′), t′ + ϵ)/∂ϵ = ∂L(q₁(t′), . . . , q_N(t′), q̇₁(t′), . . . , q̇_N(t′), t′ + ϵ)/∂t′

so

∫_{t₀−ϵ}^{t₁−ϵ} dt′ ( ∂L(q₁(t′), . . . , q_N(t′), q̇₁(t′), . . . , q̇_N(t′), t′ + ϵ)/∂ϵ )|_{ϵ=0}
= ∫_{t₀}^{t₁} dt ∂L(q₁(t), . . . , q_N(t), q̇₁(t), . . . , q̇_N(t), t)/∂t .

The theorem now follows from equating the two expressions for S′ that we found.

Note 3.2.4
It is not obvious that the quantity E that is conserved if ∂L/∂t = 0 is what is usually
known as “energy” in classical mechanics. But this is easy to verify. Assume that we
have a particle with Lagrangian

L = T(ẋ₁, . . . , ẋ_N) − V(x₁, . . . , x_N)

with T(ẋ₁, . . . , ẋ_N) = ½m(ẋ₁² + . . . + ẋ_N²), as we often do in classical mechanics. Then
applying definition 3.2.1 above one easily finds the expected relation

E = T + V .

The result holds more generally. Consider a Lagrangian of the form

L = Σ_{i,j=1}^{N} Kᵢⱼ(q₁, . . . , q_N) q̇ᵢq̇ⱼ − V(q) ,

where the first term is the kinetic energy T(q, q̇), and the Kᵢⱼ(q) and V(q) are arbitrary
functions on configuration space C. Then it is easy to verify that

E = T + V .

Example 3.2.5. Say that we have a spring that becomes weaker with time, with a spring
constant κ(t) = e^{−t}. A mass attached to the spring can then be described by the Lagrangian

L = ½mẋ² − ½κ(t)x² .
The resulting equation of motion is

mẍ + κ(t)x = 0 .

The energy of the system is

E = ½mẋ² + ½κ(t)x² .
Since the Lagrangian depends explicitly on time, we expect that energy is not conserved.
And indeed:
dE/dt = mẋẍ + κ(t)xẋ + ½x² dκ(t)/dt
      = ẋ(mẍ + κ(t)x) + ½x² dκ(t)/dt
      = ½x² dκ(t)/dt

where in the last step we have used the equation of motion. On the other hand

∂L/∂t = −½x² dκ(t)/dt

since time appears explicitly only in κ(t). So we have verified that

dE/dt = −∂L/∂t .
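One can also watch the energy drain away numerically; a minimal sketch assuming scipy is available, with m = 1 chosen for illustration:

```python
import numpy as np
from scipy.integrate import solve_ivp

m = 1.0
kappa = lambda t: np.exp(-t)  # weakening spring constant

def eom(t, y):
    x, v = y
    return [v, -kappa(t) * x / m]  # m x'' + kappa(t) x = 0

sol = solve_ivp(eom, (0.0, 5.0), [1.0, 0.0], dense_output=True)

for ti in (0.0, 2.5, 5.0):
    x, v = sol.sol(ti)
    E = 0.5 * m * v**2 + 0.5 * kappa(ti) * x**2
    print(ti, E)  # E decreases, since dE/dt = (1/2) x^2 dkappa/dt <= 0 here
```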
Example 3.2.6. Note that our definition 3.2.1 for the energy does not require the La-
grangian to have the specific form L = T − V . Consider for instance the Lagrangian
L = −m √(1 − ẋ² − ẏ² − ż²) .

(This specific Lagrangian is in fact fairly important, as it describes the motion of a particle
in R3 in special relativity.) Definition 3.2.1 gives

E = ẋ ∂L/∂ẋ + ẏ ∂L/∂ẏ + ż ∂L/∂ż − L .
We have
ẋ ∂L/∂ẋ = mẋ²/√(1 − ẋ² − ẏ² − ż²) ,
and similarly for ẏ and ż. Putting everything together we find

E = m(ẋ² + ẏ² + ż²)/√(1 − ẋ² − ẏ² − ż²) + m√(1 − ẋ² − ẏ² − ż²)
  = m/√(1 − ẋ² − ẏ² − ż²) .

§4 Normal modes

So far we have studied the Euler-Lagrange equations abstractly, but we have not spent
much effort actually trying to solve them, except on some fairly elementary examples. The
reason for this is simple: in most cases we cannot solve the equations in closed form. Even
if we can, it is rarely the case that the answer can be written in terms of elementary
functions. Recall, for instance, example 2.3.2 above, where we discussed the pendulum.
We found that the Euler-Lagrange equations of motion were of the form
θ̈ + (g/ℓ) sin(θ) = 0 .
This equation can be solved in closed form, in terms of a class of special functions known as
“elliptic functions”, but the solution is relatively involved, and not particularly illuminating
for our current purposes. Rather than insisting on solving the problem exactly from the
outset, it is often illuminating to instead try to understand what the system does for small
displacements away from equilibrium. That is, for small values of θ. In this regime we
have that sin(θ) ≈ θ, and the equation of motion becomes
θ̈ + (g/ℓ) θ = 0
which can be solved straightforwardly to give

θ = a cos(ωt) + b sin(ωt)

with ω = √(g/ℓ) and a, b constants that depend on the initial conditions.
The technology of normal modes, which we introduce in this section, is a way of for-
malizing this observation, and applying it systematically to more complicated systems.

§4.1 Canonical kinetic terms

Let us restrict ourselves to the neighbourhood of minima of the potential. Assume, to start
with, that we have a Lagrangian
L = ½ Σ_{i=1}^{n} q̇i² − V(q) .        (4.1.1)

This particularly simple form for the kinetic term T = ½ Σ_{i=1}^{n} q̇i² is known as a canonical
kinetic term.
Assume that there is a stationary point of V (q) at q = 0, that is
(∂V/∂qi)|_{q=0} = 0   ∀i .

If the stationary point we are interested in is at some other position q = (a1 , . . . , aN ), we


can simply introduce new variables qi′ = qi − ai such that the stationary point is now at
q′ = 0. Clearly in doing this the form of equation (4.1.1) is preserved, so for simplicity we
will assume henceforth that the stationary point we are studying is indeed at q = 0.
We can write an approximate Lagrangian, describing the dynamics around this ex-
tremum, by expanding V (q) to second order in q
Lapprox = ½ Σ_{i=1}^{n} q̇i² − ½ Σ_{i,j} Aij qi qj        (4.1.2)

with

Aij = (∂²V/∂qi∂qj)|_{q=0} .        (4.1.3)
The equations of motion arising from the approximate Lagrangian are given in matrix
notation by
q̈ + Aq = 0 . (4.1.4)
The approximate equations of motion are linear, since they can be written as
DA q := ( d²/dt² + A ) q = 0 .        (4.1.5)

where we have defined DA := d²/dt² + A. This is a linear operator, meaning that given any
two vectors a and b we have DA (a + b) = DA a + DA b, and also for any c ∈ R and vector
a we have DA (ca) = cDA a. We have n equations, and the equations are of second order
and linear, so we expect to be able to express any solution of the approximate equations
of motion as a linear superposition of some 2n basic solutions.
To find these solutions, let us start by noticing that the n × n matrix A is real and
symmetric (for any potential whose second partial derivatives are continuous, which will
be the case during this course), so it has real eigenvalues and eigenvectors. We denote the
set of eigenvalues of A by λ(i) , and the n corresponding eigenvectors by v(i) , so that
Av(i) = λ(i) v(i) . (4.1.6)
Let us now take an ansatz¹¹

q(i)(t) = f(i)(t) v(i)        (4.1.7)
for some function f (i) (t) that we will determine. Since v(i) is an eigenvector with eigenvalue
λ(i) , we have that
( d²/dt² + A ) q(i)(t) = ( d²/dt² + A ) f(i)(t) v(i)
                       = v(i) ( d²/dt² + λ(i) ) f(i)(t)
                       = 0 .
¹¹ An ansatz is an assumed form for the solution of the problem. We test the assumption by inserting
the ansatz into the equation, and verifying that it does provide a solution for an appropriate choice of f(t).

Since v(i) ̸= 0, this implies that


( d²/dt² + λ(i) ) f(i)(t) = 0 .
Solving this equation is elementary, but the form of the solution depends on the sign of
λ(i) . We have
 √ √

α (i)
cos( λ (i) t) + β (i) sin( λ(i) t) if λ(i) > 0
f (i) (t) = C (i) t + D(i) if λ(i) = 0
√ √
α cosh( −λ(i) t) + β (i) sinh( −λ(i) t) if λ(i) < 0

 (i)

where the α(i) , β (i) , C (i) and D(i) are constants to be fixed by initial conditions. Note that
whatever the value of λ(i) , each eigenvector leads to a two-dimensional space of solutions.
Since the eigenvectors span n-dimensional space, our ansatz gives us the full 2n-dimensional
space of solutions to the linear equation. So we can write the general solution of the system
in terms of the ansatz (4.1.7) as
q(t) = Σ_{i=1}^{n} v(i) f(i)(t)

with the f(i) as above.


The qualitative behaviour of the solution depends on the sign of the eigenvalues λ(i). If all
the λ(i) are positive we are at a local minimum, and we have oscillatory behaviour around
the minimum. If we have a negative eigenvalue we instead have exponential behaviour away
from the stationary point. This agrees with expectations: if we are at a maximum along
some direction, small perturbations away from the point will quickly grow, and we are trying
to expand around an unstable solution. Finally, zero eigenvalues are associated with motion
with constant velocity, displaying no oscillatory behaviour.
Definition 4.1.1. Each basic solution
q(t) = v(i) ( α(i) cos(√λ(i) t) + β(i) sin(√λ(i) t) )

associated with an eigenvalue λ(i) > 0 is a normal mode.


Definition 4.1.2. Each basic solution
q(t) = v(i) ( C(i) t + D(i) )
associated with a zero eigenvalue λ(i) = 0 is a zero mode.


Definition 4.1.3. Each basic solution
q(t) = v(i) ( α(i) cosh(√(−λ(i)) t) + β(i) sinh(√(−λ(i)) t) )

associated with an eigenvalue λ(i) < 0 is an instability.



The general solution in the absence of instabilities is the superposition of the ordinary
normal modes for the non-zero eigenvalues and the zero modes

q(t) = Σ_{i: λ(i)≠0} v(i) ( α(i) cos(ω(i) t) + β(i) sin(ω(i) t) ) + Σ_{i: λ(i)=0} v(i) ( C(i) t + D(i) )

where ω(i) := √λ(i).

Note 4.1.4
Let me emphasize that the existence of zero modes is fairly brittle: if we slightly deform
our starting potential V (q) in a generic way, then the eigenvalues of A will generically
change slightly, and the zero eigenvalues will generically become either positive or
negative. So whenever we find a zero mode in a real physical system this tells us very
valuable information: we expect to be able to find some principle that restricts the
possible deformations of V (q)!
As an example, imagine that we have two particles with the same mass moving in
one dimension, located at x1 and x2 . Assume that the physics is independent of the
choice of origin of coordinates, or equivalently that there is a symmetry

x1 → x1 + ϵa
x2 → x2 + ϵa

for any constant a. Then the potential can only depend on the difference x1 − x2 , and
we have
L = ½m(ẋ1² + ẋ2²) − V(x1 − x2) .
This symmetry will then always lead to the existence of a zero mode, associated with
translation of the centre of mass of the system. We can see this explicitly if we introduce
new coordinates x+ := (1/√2)(x1 + x2), x− := (1/√2)(x1 − x2). Then our Lagrangian can be
written as

L = ½m(ẋ+² + ẋ−²) − V(√2 x−)
which clearly leads to a zero mode for x+ , no matter the specific form of V . So in
this case we find that the existence of the zero mode is ultimately protected by the
translation symmetry!

Example 4.1.5. Consider two pendula, each of length one with mass one, suspended a
distance d apart. Connecting the masses is a spring of constant κ and also of natural
length d.

The speed squared of the left hand mass is simply (−cos(θ1)θ̇1)² + (sin(θ1)θ̇1)² = θ̇1². We
get a similar result for the right hand mass, so the total kinetic energy T is

T = ½(θ̇1² + θ̇2²) .
The potential comes from gravity, which gives a contribution g(− cos(θ1 ) − cos(θ2 )), and
from the spring. For a spring of constant κ, its potential energy is given by κ(l − d)2 /2,
where l − d is the extension of the spring. The length l of the spring is given by Pythagoras'
theorem as

l = √( (sin(θ1) − sin(θ2) + d)² + (cos(θ1) − cos(θ2))² ) .
Thus the Lagrangian for the system is given by
L = ½(θ̇1² + θ̇2²) + g(cos(θ1) + cos(θ2))
    − (κ/2) ( √( (sin(θ1) − sin(θ2) + d)² + (cos(θ1) − cos(θ2))² ) − d )² .
Finding the exact solution to the equations of motion resulting from this Lagrangian seems
hopeless. However, it is clear that the system would be happy to sit at θ1 = θ2 = 0, as
this configuration minimises both the gravitational potential energy, and the spring energy
since the spring would be at its natural unextended length d. Let us now try to find an
approximate Lagrangian which describes the system when θi ≪ 1.
Approximating the gravitational potential is easy: cos(θ) ≈ 1 − θ²/2 + O(θ⁴), so we can take

−g(cos(θ1) + cos(θ2)) ≈ −g ( 2 − θ1²/2 − θ2²/2 ) .
The constant term −2g can be discarded for the usual reason: adding a constant to a
potential or Lagrangian has no effect. The spring potential looks more tricky to deal with,
but note that to calculate κ(l − d)²/2 to quadratic order in the small θi we only need to
calculate l − d to first order in the θi, since l − d appears squared:

l − d = √( (sin(θ1) − sin(θ2) + d)² + (cos(θ1) − cos(θ2))² ) − d
      = √( (sin(θ1) − sin(θ2) + d)² ) − d + O(θ²)
      = θ1 − θ2 + O(θ²) .

Finally we can write the approximate Lagrangian as


Lapprox = ½(θ̇1² + θ̇2²) − (g/2)(θ1² + θ2²) − (κ/2)(θ1 − θ2)² .
The equations which follow from this are

θ̈1 + (g + κ)θ1 − κθ2 = 0


θ̈2 − κθ1 + (g + κ)θ2 = 0.

If one arranges the equations of motion in this way, so that all the terms proportional to
θ1 and those proportional to θ2 appear in columns then it is straightforward to read the
elements of the matrix A from the equations as

A = ( g + κ    −κ
      −κ     g + κ ) .

Solving for the eigenvalues of A we find that λ = g or g + 2κ, with eigenvectors (1, 1) or
(1, −1) respectively. So we can write the normal modes as

(θ1, θ2) = (1, 1) e^{±i√g t}   or   (θ1, θ2) = (1, −1) e^{±i√(g+2κ) t}

The first of these has θ1 = θ2 whilst the second has θ1 = −θ2. These two normal modes
can be pictured as follows. For the normal mode which has θ1 = θ2, the spring always
remains exactly of length d, and therefore remains unextended and exerts no force. The
result of this is that the angular frequency of this normal mode is √g, which does not
involve the spring constant κ. On the other hand, for the second normal mode the pendula
move in opposite directions, and in this case the spring stretches and contracts, enhancing
the effect of gravity. This results in an angular frequency √(g + 2κ), which is greater than
that of the first normal mode, in which only gravity plays a role.
The general solution of the system is thus given by

(θ1, θ2) = (1, 1) [ α(1) cos(√g t) + β(1) sin(√g t) ]
         + (1, −1) [ α(2) cos(√(g + 2κ) t) + β(2) sin(√(g + 2κ) t) ]

with α(i) and β (i) arbitrary constants. To see how the general solution we found helps in
practice when studying the motion of the system, let us use this solution to study what
happens if we release the two masses from rest at t = 0 from θ1 = −θ2 = δ. Setting
θ1 = −θ2 = δ at t = 0 we find

(δ, −δ) = ( α(1) + α(2), α(1) − α(2) )        (4.1.8)

so α(1) = 0 and α(2) = δ. Similarly, the condition that the masses are released from rest is
encoded in

(θ̇1(t = 0), θ̇2(t = 0)) = (0, 0)        (4.1.9)

which, taking derivatives in our general solution, is easily shown to lead to

(0, 0) = ( √g β(1) + √(g + 2κ) β(2), √g β(1) − √(g + 2κ) β(2) )        (4.1.10)

which implies β (1) = β (2) = 0. So we find that the motion is given by


   
(θ1, θ2) = (δ, −δ) cos(√(g + 2κ) t)

which is an oscillatory motion in which the masses move oppositely, without changing the
centre of mass, as one might have guessed.
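As a sanity check, we can compare this closed-form answer against the general normal-mode construction sketched in §4.1 (the numerical values g = 9.8, κ = 1 and δ = 0.1 are arbitrary choices):

    import numpy as np

    g, kappa, delta = 9.8, 1.0, 0.1
    A = np.array([[g + kappa, -kappa],
                  [-kappa, g + kappa]])

    t = np.linspace(0.0, 10.0, 1001)
    # normal_mode_solution is the sketch from section 4.1
    q = normal_mode_solution(A, np.array([delta, -delta]), np.zeros(2), t)

    exact = np.outer(delta * np.cos(np.sqrt(g + 2 * kappa) * t), [1.0, -1.0])
    print(np.max(np.abs(q - exact)))   # ~ 1e-15: the two agree to machine precision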

§4.2 Non-canonical kinetic terms

Finally, we consider configurations with non-canonical kinetic terms of the form


L = ½ Σ_{i,j} Bij(q) q̇i q̇j − V(q) .        (4.2.1)

We still obtain a linear differential operator if we restrict B(q) → B(0). Physically, this
corresponds to considering oscillations that do not carry too much kinetic energy, which is
consistent with staying close to the minimum. The resulting equations of motion are

Bq̈ + Aq = 0 (4.2.2)

where we have defined B ≡ B(0), a constant matrix. B should not have zero eigenvalues
(these would correspond to generalised coordinates without a kinetic term), so we assume
that it has none. This implies that det(B) ≠ 0, and so B⁻¹ exists. We
then have an equivalent set of equations

q̈ + B−1 Aq = 0 (4.2.3)

which reduces to the case we have already studied if we define C := B⁻¹A.
There is one small subtlety that needs to be mentioned here: the fact that A was
symmetric was quite important in our discussion above, since it ensured that its eigenvalues
were real, but in general B⁻¹A will not be symmetric, even if both B⁻¹ and A separately
are. Let us assume that A is positive semi-definite: that is, all its eigenvalues are either
positive or zero. Then there exists a symmetric matrix A^{1/2} such that (A^{1/2})² = A.¹² We can use
this matrix to rewrite

C := B⁻¹A = A^{−1/2} ( A^{1/2} B⁻¹ A^{1/2} ) A^{1/2}

so we find that C is similar (in the sense of similarity transformations of matrices) to
A^{1/2} B⁻¹ A^{1/2}. This matrix is manifestly symmetric, so its eigenvalues are real. Since similar
matrices have the same eigenvalues, the eigenvalues of C will be real too. It is straightfor-
ward to check that if v is an eigenvector of A^{1/2} B⁻¹ A^{1/2} with eigenvalue λ, then A^{−1/2} v will be
an eigenvector of C with the same eigenvalue. So, in practice, we can simply compute the
eigenvalues and eigenvectors of C, and proceed as we did above.
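In numerical practice one does not even need to form B⁻¹A explicitly: the generalized symmetric eigenproblem Av = λBv can be solved directly. A small sketch (the matrices shown are arbitrary examples, with B positive-definite):

    import numpy as np
    from scipy.linalg import eigh

    B = np.array([[2.0, 0.5],
                  [0.5, 1.0]])       # kinetic matrix B(0), positive-definite
    A = np.array([[3.0, -1.0],
                  [-1.0, 3.0]])      # Hessian of the potential

    lam, V = eigh(A, B)              # solves A v = lam B v; lam are the eigenvalues of B^{-1}A
    print(lam)                       # real, as argued above
    print(np.allclose(np.linalg.inv(B) @ A @ V, V @ np.diag(lam)))   # True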

§5 Fields and the wave equation

§5.1 Variational Principle for Continuous Systems

In the next section we will derive the equations of motion for the string. Before going into
the details of that particular system, we will derive in general how to deduce the Euler-
Lagrange equations for fields, which is a simple generalisation of what we did in the case
of systems with a finite number of degrees of freedom.
Assume that we can express the action S in terms of some Lagrangian density L (we
will determine L for the string in the next section)
S = ∫dt ∫dx L(u, ut, ux, x, t)

where we have introduced for convenience the notation


ux := ∂u/∂x ;   ut := ∂u/∂t .

¹² The simplest way to prove this is to note that A is real symmetric, and thus diagonalisable by an
orthogonal transformation O as A = ODOᵗ. We can then define A^{1/2} = OD^{1/2}Oᵗ.

Note 5.1.1
I emphasize that that the “x” coordinate plays a significantly different role in field
theory than it did for the point particle: in field theory “x” is a coordinate that fields
depend on, and it is on the same footing as “t”. They are, in particular, independent
variables, and they are not generalized coordinates.
On the other hand, for the point particle we often denoted by “x(t)” the position of
the particle, which was a generalized coordinate that for any given path was a function
of time. In field theory the closest thing to this “x” is the field value “u(x, t)”.

Note 5.1.2
There is an important notational point that I want to clarify: say that we have a
Lagrangian density L(u, ux , ut , x, t) depending on the field, its first derivatives, and x
and t themselves. Then we have two notions of “derivative of L with respect to t” (the
following discussion generalizes straightforwardly to x, so I will not consider this case
separately). We might mean either:

1. The derivative with respect to any explicit appearances of t, keeping u, ux , ut


and x fixed.

2. The derivative of L with respect to t, taking into account that u, ux and ut are
functions of t, so we need to use the chain rule.

In the context of the point particle, we denoted the first derivative “∂/∂t” and the
second “d/dt”.
In the context of field theory it is more common and useful to switch conventions,
and denote the second option by ∂L/∂t. That is, we define:

∂L(u, ux, ut, x, t)/∂t := lim_{h→0} (1/h) [ L(u(x, t + h), ux(x, t + h), ut(x, t + h), x, t + h)
                                            − L(u(x, t), ux(x, t), ut(x, t), x, t) ]

We will simply never need to consider the first notion of partial derivative in the context
of fields during this course, so this leads to no ambiguity.
The main reason to switch conventions is that this reproduces the natural definition:

ut := ∂u(x, t)/∂t
that we gave above, since the meaning of the derivative here is the usual one: we are
varying t keeping x fixed.

In this case we expect to be able to derive the equations of motion for the system by
making use of the variational principle we discussed in previous sections. To see how this
goes, consider a solution of the equations of motion us (x, t), and consider a small variation
δu(x, t) around it:
u(x, t) = us (x, t) + δu(x, t) .
If us is indeed a stationary function for the action, we expect the first order change in S
to vanish:
δS = S[us + δu] − S[us] = ∫dt ∫dx ( (∂L/∂u) δu + (∂L/∂ux) δux + (∂L/∂ut) δut ) + O((δu)²)
We will work to first order, and drop the O((δu)2 ) terms henceforth. Now, for our
variations we have
   
δux = δ(∂u/∂x) = ∂(δu)/∂x ;   δut = δ(∂u/∂t) = ∂(δu)/∂t
which allows us to integrate δS by parts, in order to obtain

δS = ∫dt ∫dx δu [ ∂L/∂u − ∂/∂x(∂L/∂ux) − ∂/∂t(∂L/∂ut) ]
     + ∫dt [ (∂L/∂ux) δu ]_{x=xi}^{x=xf} + ∫dx [ (∂L/∂ut) δu ]_{t=ti}^{t=tf}
If we assume that we hold u fixed at the endpoints both in x and t, the last two terms
on the right vanish. Imposing δS = 0 for arbitrary δu then implies, by the fundamental
lemma of the calculus of variations,¹³ the generalised Euler-Lagrange equations for
fields:

∂L/∂u − ∂/∂x(∂L/∂ux) − ∂/∂t(∂L/∂ut) = 0 .

Here are some easy generalisations. Clearly, if we have n fields u(i) we end up with n
generalised equations of motion:

∂L/∂u(i) − ∂/∂x(∂L/∂u(i)x) − ∂/∂t(∂L/∂u(i)t) = 0   for all i .

Another possible easy generalisation is considering fields that depend on more coordinates
than two. If we replace (t, x) by a set of d coordinates xk we have

∂L/∂u(i) − Σ_{k=1}^{d} ∂/∂xk ( ∂L/∂u(i)k ) = 0   for all i

where we have defined u(i)k := ∂u(i)/∂xk.
13
The proof that we gave above for the fundamental lemma of the calculus of variations was for functions
of a single variable. I leave it as a small exercise to generalise the proof to the case of functions of multiple
variables.

§5.2 Example: the wave equation from the Lagrangian for a string

Our main example will be a Lagrangian density that can be thought of as the Lagrangian
density for the one-dimensional string oscillating in one dimension. The standard name
for this Lagrangian is the “massless scalar field” Lagrangian.
Definition 5.2.1. The massless scalar field Lagrangian is
L := ½ρut² − ½τux² .
We refer to the constants ρ and τ as the density and tension, respectively. The field “u”
in this expression is the massless scalar.
Remark 5.2.2. It is in fact possible, and we do this in section 5.2.1 below, to derive this La-
grangian density from the physics of an idealized string in the limit in which the oscillations
are small. This explains the origin of the labels “density” and “tension” above. I empha-
size that the uses of this Lagrangian in Mathematical Physics go well beyond explaining
vibrating strings.
Definition 5.2.3. The Euler-Lagrange equations for fields immediately imply the equation
of motion
ρutt − τ uxx = 0
for the massless scalar u, where
utt = ∂ut/∂t = ∂²u/∂t²
and similarly for uxx . Introducing for convenience c2 = τ /ρ (both the tension and the
density are assumed to be positive, so c is real), the equation of motion for the massless
scalar becomes:
utt = c2 uxx .
We will refer to this equation as the wave equation. More precisely, what we are describing
here is known as the wave equation in one spatial dimension.
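It is a quick exercise to let a computer algebra system run the field Euler-Lagrange equation on this Lagrangian density. A sympy sketch (the symbol names are arbitrary; ∂L/∂u vanishes here since L does not contain u itself):

    import sympy as sp

    x, t = sp.symbols('x t')
    rho, tau = sp.symbols('rho tau', positive=True)
    u = sp.Function('u')(x, t)
    ut, ux = sp.symbols('u_t u_x')

    L = sp.Rational(1, 2) * rho * ut**2 - sp.Rational(1, 2) * tau * ux**2
    subs = {ux: sp.diff(u, x), ut: sp.diff(u, t)}

    # field Euler-Lagrange equation: -d/dx(dL/du_x) - d/dt(dL/du_t) = 0
    eom = -sp.diff(sp.diff(L, ux).subs(subs), x) - sp.diff(sp.diff(L, ut).subs(subs), t)
    print(eom)   # tau*u_xx - rho*u_tt, i.e. the wave equation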

§5.2.1 Derivation of the massless scalar Lagrangian from a physical system

(This section is not examinable.)

We will now derive the massless scalar Lagrangian from the dynamics of a string vibrat-
ing in one dimension, in the approximation where the displacements are small. Similarly
to the case of point particles, the Lagrangian density can be constructed in terms of the
kinetic and potential energy densities. That is, if we have
T(u, ux, ut, x, t) = ∫dx 𝒯(u, ux, ut, x, t)

and

V(u, ux, ut, x, t) = ∫dx 𝒱(u, ux, ut, x, t)

for the total kinetic energy T and total potential energy V of the string, then we call 𝒯
and 𝒱 the corresponding densities of kinetic and potential energy, and we have

L = 𝒯 − 𝒱

So we need to find expressions for the kinetic and potential energy densities. We will work
to leading (that is, quadratic) order in ux and ut . This is the regime in which the oscillations
are neither too large nor too fast. We do this because it leads to much simpler equations,
while still being quite useful for modelling many systems in Nature. Similarly, we will
assume that the string is only displaced vertically, without any horizontal displacement.
The kinetic energy can be obtained relatively straightforwardly by subdividing the
string into small pieces. Consider the small piece lying between x and x + δx. If the
segment is small enough its behaviour will be approximately point-like; therefore its kinetic
energy will be of the form ½mv². The mass of the small segment of string is given by

m = ρ ds ≈ ρ√(1 + (ux)²) δx ≈ ρ δx .

Here ρ is the density of the string (which we take to be constant), and ds the arc-length
of the string segment. The final approximation follows from taking ux ≪ 1. Since u(x, t)
denotes the vertical displacement of the string it is clear that the vertical velocity is ut . The
contribution to the kinetic energy from the small piece of string that we are considering
is then ½(ut)² ρ δx. We then immediately obtain the kinetic energy of the whole string by
integrating over all the segments to find that the kinetic energy is given by
T = (ρ/2) ∫_{−∞}^{∞} dx (ut)²

so the kinetic energy density is

𝒯 = (ρ/2)(ut)² .
Obtaining the potential energy is a little bit more subtle. We know that the tension in
the string is a constant, which we call τ . It follows that the work done in extending the
string’s length by a distance δl will be τ δl. If we imagine extruding the entire length of
the string from a point we reach the conclusion that the potential energy of the string is
τ times its length. Of course, our string is infinitely long, so that this may initially be a
concern, until we recall that adding a constant to the potential energy makes no difference.
We are not really interested in the absolute value of the potential energy, but rather
the differences in potential energy between string in various configurations. Therefore we
will take the potential energy of a string in some configuration u(x, t) to be defined as τ
times the difference in length between the string with shape u(x, t) and the length of the

undisturbed string lying along the x-axis for which u(x, t) = 0. To be more precise we have
V = τ ( ∫_{−∞}^{∞} ds − ∫_{−∞}^{∞} dx )
  = τ ∫_{−∞}^{∞} ( √(1 + (ux)²) − 1 ) dx
  ≈ τ ∫_{−∞}^{∞} ( 1 + (ux)²/2 − 1 ) dx
  = (τ/2) ∫_{−∞}^{∞} (ux)² dx

again to leading order in oscillations. From here we obtain the potential energy density

𝒱 = (τ/2)(ux)²

and thus the Lagrangian density

L = 𝒯 − 𝒱 = (ρ/2)(ut)² − (τ/2)(ux)² .
2 2

§5.3 D’Alembert’s Solution to the Wave Equation

The general solution to the wave equation in one spatial dimension was given by D’Alembert,
and it is simply
u(x, t) = f (x − ct) + g(x + ct)
where f and g are arbitrary functions. The part of the solution f (x − ct) corresponds to
a wave moving to the right with speed c, whilst the remaining part g(x + ct) corresponds
to a wave moving to the left with speed c.

Theorem 5.3.1. D’Alembert’s solution u(x, t) = f (x−ct)+g(x+ct) is the general solution


to the wave equation.

Proof. We introduce new variables x+ = x + ct and x− = x − ct, or equivalently x =
½(x+ + x−) and t = (1/2c)(x+ − x−). By the chain rule:

∂u/∂x = (∂u/∂x+)(∂x+/∂x) + (∂u/∂x−)(∂x−/∂x) = ∂u/∂x+ + ∂u/∂x− ,
∂u/∂t = (∂u/∂x+)(∂x+/∂t) + (∂u/∂x−)(∂x−/∂t) = c ( ∂u/∂x+ − ∂u/∂x− ) .

Taking derivatives again, once more using the chain rule:

∂²u/∂x² = ( ∂/∂x+ + ∂/∂x− ) ( ∂u/∂x+ + ∂u/∂x− )
        = ∂²u/∂x+² + ∂²u/∂x−² + 2 ∂²u/∂x+∂x−

∂²u/∂t² = c ∂/∂t ( ∂u/∂x+ − ∂u/∂x− ) = c² ( ∂/∂x+ − ∂/∂x− ) ( ∂u/∂x+ − ∂u/∂x− )
        = c² ( ∂²u/∂x+² + ∂²u/∂x−² − 2 ∂²u/∂x+∂x− )

As usual, we have used the assumption that partial derivatives commute. We see that in
these variables the wave equation utt = c2 uxx becomes

utt − c²uxx = −4c² ∂²u(x+, x−)/∂x+∂x− = 0 .
The general solution of this equation is indeed

u(x+ , x− ) = f (x− ) + g(x+ ) .

In practice, we are often interested in understanding what happens if we release a string


from a given configuration. How does the string evolve? This is an initial value problem,
which D’Alembert also solved in general. Assume that we are told that at t = 0 the string
has profile φ(x), that is
u(x, 0) = φ(x)
and in addition we know with which speed the string is moving at that instant:

ut (x, 0) = ψ(x) .

In terms of f and g, which parametrise the general form of the solution, these equation are

f (x) + g(x) = φ(x)

and
−cf ′ (x) + cg ′ (x) = ψ(x) .
This last equation can be integrated (formally) to give

g(x) − f(x) = d + (1/c) ∫_{−∞}^{x} ds ψ(s)

with d some unknown constant. We now have two equations for two unknowns, so solving
for f and g we find

f(x) = ½ ( φ(x) − d − (1/c) ∫_{−∞}^{x} ds ψ(s) )
g(x) = ½ ( φ(x) + d + (1/c) ∫_{−∞}^{x} ds ψ(s) )
so we finally find

u(x, t) = f(x − ct) + g(x + ct)
        = ( φ(x − ct) + φ(x + ct) )/2 + (1/2c) ∫_{x−ct}^{x+ct} ds ψ(s) .
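D'Alembert's formula translates directly into code. A minimal sketch (the profile φ and c = 1 are example choices; ψ = 0 corresponds to releasing the string from rest):

    import numpy as np
    from scipy.integrate import quad

    c = 1.0
    phi = lambda x: np.exp(-x**2)       # initial profile u(x, 0)
    psi = lambda s: 0.0                 # initial velocity u_t(x, 0): released from rest

    def u(x, t):
        integral, _ = quad(psi, x - c * t, x + c * t)
        return 0.5 * (phi(x - c * t) + phi(x + c * t)) + integral / (2 * c)

    # the pulse splits into two half-height copies moving left and right:
    print(u(-2.0, 2.0), u(2.0, 2.0))    # both close to 0.5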

§5.4 Noether’s theorem for fields

The Lagrangian density that we found for the string does not involve u explicitly: u
enters only through its derivatives. This is a situation analogous to that of having an ignorable
coordinate in the case of point particles. So we should expect that there is a symmetry
associated to this fact, generated by the infinitesimal transformation u → u′ = u + ϵ,
and associated to this symmetry some conserved quantity, by some analogue for fields of
Noether’s theorem. This analogue does exist, as we now describe.
It is illuminating to do this more generally, for d spatial dimensions, and arbitrary
symmetries. So let us introduce coordinates x0 , . . . , xd . The case d = 1 would have x0 = t,
x1 = x. Our field u(x0 , . . . , xd ) is a map from Rd+1 → R. For convenience, we introduce
the notation

ui := ∂u/∂xi .
Definition 5.4.1. A symmetry (in the context of field theory) is a transformation
u → u′ = u + ϵa(u)
such that
δL = O(ϵ2 )
without having to use the equations of motion.
Remark 5.4.2. We could include a total derivative, or more precisely, a divergence, on the
right hand side of the variation δL, as we did in the case of the point particle, but we
ignore this possibility for simplicity.
Definition 5.4.3. We define the generalised momentum vector
 
Π := ( ∂L/∂u0 , . . . , ∂L/∂ud ) .

Definition 5.4.4. Given a transformation generated by a, we define the Noether current


associated to the transformation by
J := aΠ ,
or in components
Ji := a ∂L/∂ui .
Theorem 5.4.5 (Noether’s theorem for fields). If J is the Noether current associated to
a symmetry, then
∇⃗ · J := Σ_{i=0}^{d} ∂Ji/∂xi = 0 .        (5.4.1)

Proof. We can proceed analogously as to what we did when proving Noether’s theorem for
discrete systems. Under a generic transformation we have
δL = ϵ a ∂L/∂u + ϵ Σ_{i=0}^{d} (∂a/∂xi)(∂L/∂ui) + O(ϵ²) .

Using the Euler-Lagrange equations, this becomes

δL = ϵ Σ_{i=0}^{d} ∂/∂xi ( a ∂L/∂ui ) + O(ϵ²)

which, equating with the explicit action of the symmetry on L, leads to

Σ_{i=0}^{d} ∂/∂xi ( a ∂L/∂ui ) = 0 .

Definition 5.4.6. Given a Noether current J associated to a transformation, we define
the (Noether) charge density

Q := J0 .

Furthermore, in the d = 1 case (one spatial dimension)¹⁴ we define the charge contained in
an interval (a, b) to be

Q(a,b) := ∫_a^b Q dx = ∫_a^b J0 dx .

Proposition 5.4.7. Assume d = 1. Then

dQ(a,b)/dt = J1(a) − J1(b) .
14
The d > 1 case can be treated similarly, with the total charge in some region being the integral of the
function Q over that region.

Proof. Taking the derivative inside the integral, we have

dQ(a,b)/dt = d/dt ∫_a^b J0 dx = ∫_a^b ∂J0/∂t dx .

Now, in our d = 1 case the conservation equation (5.4.1) becomes

∂J0/∂t + ∂J1/∂x = 0

so replacing ∂J0/∂t by −∂J1/∂x inside the integral above we have

dQ(a,b)/dt = − ∫_a^b ∂J1/∂x dx = J1(a) − J1(b) .

Remark 5.4.8. The way to interpret proposition 5.4.7 is that it is telling us that the charge
within some region changes only due to charge leaving or entering through the boundaries
of the region. The current J1 measures how much charge is leaving or entering per unit
time at a given boundary component.
Definition 5.4.9. Given a Noether current J associated to a transformation, we define the
Noether charge to be the total charge over all space. In the case of one spatial dimension
(d = 1) this is

Q := Q(−∞,∞) = ∫_{−∞}^{∞} J0 dx .

Corollary 5.4.10. Assume that d = 1, and lim_{x→±∞} J1 = 0. Then

dQ/dt = 0
for the Noether charge associated to a symmetry.
Proof. This follows immediately from proposition 5.4.7, since we assume J1 (±∞) = 0.
Example 5.4.11. Let us apply all this abstract discussion to our guiding example, the one-
dimensional string, and the symmetry arising from u being ignorable, namely u → u + ϵ.
In this case we have a = 1, so the Noether current is simply given by
 
J = Π = ( ∂L/∂ut , ∂L/∂ux ) = (ρut, −τux) .

From here, we conclude that the Noether charge

Q = ∫dx J0 = ρ ∫dx ut

is conserved in time, assuming that J1 = −τ ux vanishes at infinity (in this case, since τ is
a non-zero constant, this is equivalent to ux vanishing at infinity). Indeed
dQ/dt = ρ ∫dx utt = τ ∫dx uxx = τ [ux]_{−∞}^{+∞} = 0 ,
where in the middle step we have used the wave equation ρutt = τ uxx for the string.

§5.5 The Energy-Momentum Tensor

In addition to the conservation laws for transformations of the field itself, we also expect
conservation laws associated to transformations of x and t. This is analogous to the fact
that for systems with discrete degrees of freedom, we could construct an energy that
satisfied
dE/dt = −∂L/∂t .
Since t does not appear explicitly in the Lagrangian density for the string, we would expect
energy to be conserved for oscillations of the string too. And indeed, it will prove quite
easy to show that the total energy of the string is conserved. But the situation for the
string is more interesting than that for the point particle. The string’s energy is distributed
along its length; some places may have no energy, whilst other parts of the string may be
very energetic. As a wave packet travels, regions that had no energy may energise for some
time, and then come back to having no energy. So we should not expect to have that the
energy density at any given point is conserved. Additionally, in the case of fields the t and
x directions are treated on equal footing, so there should be some generalised notion that
treats the x variable the same as the t variable.
Definition 5.5.1. The energy-momentum tensor is

Tij := (∂L/∂uj)(∂u/∂xi) − δij L .        (5.5.1)

Definition 5.5.2. The energy density E is defined to be equal to T00 .

Note 5.5.3
As for the case of the point particle, you can convince yourself that this definition of
the energy density agrees with the ordinary one whenever the Lagrangian density is of
the form L = ½ρut² − ½τux² − 𝒱(u); that is, a kinetic energy density minus a potential
energy contribution (which in this case contains a possible contribution from the string
tension, plus an additional term 𝒱(u) containing arbitrary extra contributions to the
potential energy). See for instance example 5.5.6 below. In cases where the Lagrangian
density is not of this form we can still define the energy-momentum tensor, and we
simply define the energy density to be the T00 component.

Theorem 5.5.4. The conservation laws for the energy-momentum tensor are:

Σ_{j=0}^{d} ∂Tij/∂xj = 0 .        (5.5.2)

Proof. Consider the variation of the Lagrangian density L(u, u0, . . . , ud) as we move in the
xi direction.¹⁵ By the chain rule, this is given by

∂L/∂xi = (∂L/∂u)(∂u/∂xi) + Σ_{j=0}^{d} (∂L/∂uj)(∂²u/∂xi∂xj)

Using the Euler-Lagrange equations for the field, we can rewrite this as

∂L/∂xi = Σ_{j=0}^{d} ∂/∂xj(∂L/∂uj) (∂u/∂xi) + Σ_{j=0}^{d} (∂L/∂uj)(∂²u/∂xi∂xj)
       = Σ_{j=0}^{d} ∂/∂xj ( (∂L/∂uj)(∂u/∂xi) )

or equivalently

Σ_{j=0}^{d} ∂/∂xj ( (∂L/∂uj)(∂u/∂xi) − δij L ) = 0 .

Remark 5.5.5. Note that we have d + 1 conservation equations for the energy-momentum
tensor, one for each choice of “i”.
Example 5.5.6. This may look a little complicated, but it is not hard to evaluate in prac-
tice. For instance, for our string we have

Ttt = ut ∂L/∂ut − L = (ρ/2)(ut)² + (τ/2)(ux)²

which is indeed the energy density for the string. The rest of the components can be com-
puted similarly, with the result

T = ( (ρ/2)(ut)² + (τ/2)(ux)²    −τ ut ux
      ρ ut ux                    −(ρ/2)(ut)² − (τ/2)(ux)² )
15
We could consider more general cases, in which the Lagrangian density also depends explicitly on the
space and time coordinates t, x0 , . . . , xd . I leave the generalization of the discussion to this case as an
(optional) exercise.

The conservation laws in the case of the string are then:

∂Ttt/∂t + ∂Ttx/∂x = 0

and similarly

∂Txt/∂t + ∂Txx/∂x = 0
In order to see what these laws mean physically, let us denote the energy in the piece of
string lying between x = a and x = b by E(a,b) (t). Since we had that the energy density is
given by Ttt, we have that

E(a,b) = ∫_a^b Ttt dx .
The energy in this piece of string will not be conserved. It might be at rest at one time,
and then a few seconds later acquire energy as a wave passes between x = a and x = b,
and then later, lose all its energy as the wave passes on. How the energy in this portion of
the string varies is given by

d/dt E(a,b)(t) = d/dt ∫_a^b Ttt dx
             = ∫_a^b ∂Ttt/∂t dx
             = − ∫_a^b ∂Ttx/∂x dx
             = − [Ttx]_a^b
             = (Ttx)_{x=a} − (Ttx)_{x=b}

where in going from the second to the third line we have used the conservation law. In this
way, the rate of change of the energy in the interval (a, b) can be expressed in terms of the
difference of a function evaluated at x = a and x = b. If we interpret Ttx = −τ ut ux as the
flux of energy moving from left to right, then our formula says that the rate of change of
the energy of the string in the interval (a, b) is equal to the flux of energy coming into the
segment of string from the left at x = a minus the flux of energy leaving the string segment
to the right at x = b.
Note that the rate of change of E, the total energy on the whole string, is given by

dE/dt = d/dt E(−∞,∞) = τ [ut ux]_{−∞}^{∞} .
This rate of change vanishes, so that the total energy is conserved, provided that ut ux → 0
as |x| → ∞. In other words, the energy is conserved provided none of it leaks away at
infinity. If we disturb the string at t = 0 near x = 0, it will take an infinite amount of
time before the disturbance propagates out to infinity, so indeed energy will be conserved.
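This can be watched happening in a simple finite-difference simulation. The sketch below (a leapfrog scheme with ρ = τ = 1 and a Gaussian pulse released from rest; all of these are arbitrary choices) checks that the total energy stays constant, to the accuracy of the scheme, while the disturbance remains away from the edges of the grid:

    import numpy as np

    rho = tau = 1.0                        # so c = 1
    x = np.linspace(-20, 20, 2001)
    dx = x[1] - x[0]
    dt = 0.5 * dx                          # comfortably below the stability limit

    def energy(u, u_prev):
        ut = (u - u_prev) / dt
        ux = np.gradient(u, dx)
        return np.sum(0.5 * rho * ut**2 + 0.5 * tau * ux**2) * dx

    u = np.exp(-x**2)                      # initial pulse, released from rest
    u_prev = u.copy()
    E0 = energy(u, u_prev)
    for _ in range(1500):                  # evolve to t = 15, before reaching |x| = 20
        u_next = 2*u - u_prev + (dt/dx)**2 * (np.roll(u, -1) - 2*u + np.roll(u, 1))
        u_prev, u = u, u_next
    print(E0, energy(u, u_prev))           # approximately equal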

§5.6 Monochromatic Waves

We have already seen that we can write down a general solution to the wave equation,
which is solvable as a result of its linearity. Below we will analyse what happens to waves
in the presence of boundaries and junctions. This analysis is often simplified if, rather than
considering what happens to an arbitrary wave on the string, we ‘decompose’ the wave into
its various constituent wavelengths and consider what happens to each wavelength sepa-
rately. Using the linearity of the wave equation, the full answer can then be reconstructed
by superposing the solution for the constituent wavelengths. A physical analogy would
be to imagine the wave to be a light wave. One finds out how red, orange, yellow, green,
blue, indigo and violet light behave, and then deduce how a general light wave behaves
by mixing the colours together. More mathematically one is simply Fourier analysing the
signal. For example, a right-moving wave can be written as a sum, or more precisely an
integral, over waves with different frequencies as follows:

u(x, t) = f(x − ct) = ∫_{−∞}^{∞} dk A(k) e^{ik(x−ct)} .

The solutions with a definite frequency, or monochromatic waves, are A(k)e^{ik(x−ct)}. We
have chosen to work with complex exponentials rather than coses and sines, as this makes
life easier, but if we need to recover a real solution we can take instead

uk = ℜ( A(k)e^{ik(x−ct)} ) = ℜ( |A|e^{iθ} e^{ik(x−ct)} ) = |A| cos(k(x − ct) + θ) .

The graph of uk shows that |A| is the amplitude of the wave and that the wavelength is
2π/k.

A monochromatic wave moving to the left is given by u(x, t) = A(k)e^{−ik(x+ct)}, or we
can again take the real part of this to obtain a real solution.
Let us calculate the energy flux of a monochromatic wave. The expression Ttx = −τ ut ux
we derived measures the flux of energy carried by a solution past a point moving from left
to right, so we should expect the answer to be positive for a right moving wave. Taking
our solution to be u(x, t) = uk defined above we see that the flux is given by
Ttx = −τ(uk)t(uk)x = −τ ( kc|A| sin(k(x − ct) + θ) ) ( −k|A| sin(k(x − ct) + θ) )
    = τck²|A|² sin²(k(x − ct) + θ)

which is clearly positive although it fluctuates with time. If we average over a whole period
we see that the average energy passing a point per unit time is given by

(kc/2π) ∫_0^{2π/(kc)} τck²|A|² sin²(k(x − ct) + θ) dt = τck²|A|²/2 .
Note that the energy flux proved to be positive, as we had predicted. If we had performed
the same calculation on a left moving wave u = ℜ( A(k)e^{−ik(x+ct)} ) we would find the
average flux to be −τck²|A|²/2.
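The averaging is simple enough to hand to sympy as a check (a sketch; the symbols mirror the ones in the text):

    import sympy as sp

    tau, c, k, A, theta, x, t = sp.symbols('tau c k A theta x t', positive=True)
    flux = tau * c * k**2 * A**2 * sp.sin(k*(x - c*t) + theta)**2
    period = 2 * sp.pi / (k * c)
    average = sp.simplify(sp.integrate(flux, (t, 0, period)) / period)
    print(average)   # A**2*c*k**2*tau/2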

§5.7 Strings with Boundaries

Now that we know how to deal with infinitely long strings which run from x = −∞ to
x = ∞, let us complicate the situation a bit by introducing a boundary, or end, to our
string at x = 0. The string is still infinitely long but now runs from x = −∞ to x = 0. In
such a situation it is necessary to specify a boundary condition at x = 0, specifying how
the string interacts with the boundary. The most natural thing that we can impose is that
no energy flows into the boundary. This is what one should expect if the string is attached
to a rigid boundary of infinite mass: in this (idealised) case the vibrations of the string do
not affect the boundary at all, and in particular there is no energy flow into the boundary.
We have seen above that the right-moving energy flux for the string is Ttx = −τ ux ut .
So the condition that no energy flows into the boundary is

lim_{x→0⁻} Ttx(x, t) = − lim_{x→0⁻} τ ux(x, t) ut(x, t) = 0 .

There are two natural solutions to this equation: lim_{x→0⁻} ut(x, t) = 0 and lim_{x→0⁻} ux(x, t) =
0. For convenience, at the cost of some slight imprecision, we will refer to these conditions
as ut (0, t) = 0 and ux (0, t) = 0. We study them in turn.

§5.7.1 Dirichlet boundary condition

The first case, ut (0, t) = 0 is perhaps the most natural: it enforces that the endpoint of
the string at x = 0 does not change with time, or in other words u(0, t) is a constant. This
is what you get if you simply tie a string to a wall. Given that there is a shift symmetry
for u, let us simply assume that the condition is that u(0, t) = 0. This is called a Dirichlet
boundary condition. It is quite straightforward to find the general solution in this case.
We know that u(x, t) satisfies the wave equation for x < 0, so the solution must be of
D’Alembert’s form

u(x, t) = f (x − ct) + g(x + ct) = f (x − ct) + h(−x − ct)



where for convenience we have introduced a function h(ξ) = g(−ξ). The boundary condi-
tion tells us that
u(0, t) = 0 = f (−ct) + h(−ct),
from which it follows that h(ξ) = −f (ξ). It follows that u(x, t) = f (x − ct) − f (−x − ct).
To understand this solution a little better, note that, considered as a function on the
whole of the x-axis, u(x, t) is an odd function in x; that is u(x, t) = −u(−x, t). The figure

shows the solution u(x, t) for all x. In the physical region there is a wave moving towards
the boundary. The dotted line represents a mirror image of the physical string. This mirror
image moves to the left, and after some time will pass the line x = 0, emerging into the
physical region x < 0 as the reflected wave. At later times the solution will look like below.

So we see from this that waves reflect off the boundary and are turned upside down by
this boundary condition.

§5.7.2 Neumann boundary condition

The other classic boundary condition for a string is the Neumann (sometimes called free)
boundary condition ux (0, t) = 0. Again the flux of energy into the boundary vanishes, so
5.8 JUNCTIONS 55

that energy is conserved on the string. Once more we can deduce the general solution from
D’Alembert’s solution u(x, t) = f (x − ct) + h(−x − ct). Demanding that ux (0, t) = 0 gives
us that
ux (0, t) = f ′ (0 − ct) − h′ (0 − ct) = 0
from which we deduce that it is possible to take f (ξ) = h(ξ) (up to a constant shift of u),
so that
u(x, t) = f (x − ct) + f (−x − ct).
In this case, the function u(x, t) considered over the whole line is an even function. As

before, given enough time the mirror image of the incoming wave emerges from behind the
boundary x = 0 as the reflected wave, but in this case since u(x, t) is even rather than odd
it will emerge the same way up as the incoming wave.
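Both boundary conditions are conveniently implemented via this method of images. A small sketch (c = 1 and the incoming pulse are arbitrary choices):

    import numpy as np

    c = 1.0
    f = lambda s: np.exp(-(s + 5)**2)      # right-moving pulse that starts around x = -5

    def u_dirichlet(x, t):
        return f(x - c*t) - f(-x - c*t)    # odd image: reflected pulse comes back inverted

    def u_neumann(x, t):
        return f(x - c*t) + f(-x - c*t)    # even image: reflected pulse comes back upright

    print(u_dirichlet(0.0, 3.0))           # 0.0: the Dirichlet end never moves
    x = np.linspace(-15.0, 0.0, 4)
    print(u_dirichlet(x, 8.0))             # the inverted pulse, now moving left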

§5.8 Junctions

Junctions or defects afford another possible way of introducing boundary conditions. We


shall explain the idea of junctions through an example.

Figure 3: String with a spring of constant κ attached at x = 0.

Consider a setup in which we attach at x = 0 a spring, with constant κ and zero natural
length, to the string, as in figure 3. We can view this system as two strings, one on the right
and another on the left, joined at a junction at x = 0. Away from the junction at x = 0 we
have a vanilla string, so we expect the monochromatic wave to be a good solution there.
We want to understand what happens to such a monochromatic wave coming from the left
as it hits the junction. Physically, we expect that part of the wave will be transmitted
across the junction, and part will be reflected.
In order to solve the problem, it is essential to introduce junction conditions, describing
which conditions should u satisfy as we cross the junction. The first condition is straight-
forward, namely that u is continuous at x = 0:

lim_{ϵ→0⁺} u(ϵ, t) = lim_{ϵ→0⁻} u(ϵ, t)        (5.8.1)

The second condition is energy conservation across the junction. In order to formulate

this, note that on an infinitesimal neighbourhood [−ϵ, ϵ] of x = 0 we have the energy


½κ u(0, t)² + ∫_{−ϵ}^{+ϵ} dx ( ½ρ(ut)² + ½τ(ux)² ) .

That is, there is a contribution coming from the vibrating string between −ϵ and ϵ, and a
contribution from the extended spring at x = 0. We will assume that

lim_{ϵ→0} ∫_{−ϵ}^{+ϵ} dx ( ½ρ(ut)² + ½τ(ux)² ) = 0 ,

so the only contribution to the total energy of the small interval in the limit ϵ → 0 is the
one coming from the extension of the spring. Conservation of energy tells us

d/dt lim_{ϵ→0} E(−ϵ, ϵ) = lim_{ϵ→0} (Ttx)_{x=−ϵ} − lim_{ϵ→0} (Ttx)_{x=ϵ} .        (5.8.2)

As an example, suppose that we send in a monochromatic wave of unit amplitude. We


expect that upon encountering the spring, this will be partially reflected into a left moving
wave on the left side of the string, and partially transmitted to a right moving wave on the
right side of the string. Putting this together our ansatz is
u(x, t) = ℜ( (e^{ipx} + R e^{−ipx}) e^{−ipct} )   for x ≤ 0
u(x, t) = ℜ( T e^{ip(x−ct)} )                     for x > 0
where T gives the amplitude/phase of the transmitted wave. Away from x = 0 we have
monochromatic waves, which satisfy the wave equation. All that remains is to ensure that
the ansatz also satisfies the junction conditions, by adjusting R and T . Continuity of
u(x, t) at x = 0 — that is, equation (5.8.1) — implies

ℜ( (1 + R) e^{−ipct} ) = ℜ( T e^{−ipct} ) .

This will hold for all t if and only if16

1 + R = T .

This is our first junction condition in this case.


In order to study energy conservation, as given by equation (5.8.2), it is convenient to
note that for our monochromatic wave solution continuity of u(x, t) at x = 0 (or equiva-
lently 1 + R = T , as we just showed) implies

lim_{x→0⁻} ut(x, t) = lim_{x→0⁺} ut(x, t) .

16
To see this, choose for instance t = 0 and t = π/(2pc).

In other words, ut (x, t) is continuous at x = 0, and ut (0, t) is well defined.


We have computed above that Ttx = −τ ux ut, and we have that lim_{ϵ→0} E(−ϵ, ϵ) =
½κ u(0, t)², so in the current case energy conservation across the junction becomes

κ u(0, t) ut(0, t) = [ τ ut ux ]_{−ϵ}^{ϵ}        (5.8.3)

This is our second junction condition. This equation can be simplified, since there is a
factor of ut(0, t) on both sides that we can divide by, to obtain:

κ u(0, t) = τ ( lim_{x→0⁺} ux(x, t) − lim_{x→0⁻} ux(x, t) ) .

Plugging in our candidate monochromatic solution, this is

κ ℜ( (1 + R) e^{−ipct} ) = τ ℜ( ip(T − (1 − R)) e^{−ipct} )
which holds for all t if and only if

κ(1 + R) = iτ p(R + T − 1) .

Solving this equation together with the continuity equation 1 + R = T we find that

R = κ/(2ipτ − κ)
T = 2ipτ/(2ipτ − κ)
To get some intuition for these formulas, first assume that we make the spring very
stiff by sending κ → ∞. Then R → −1 and T → 0. This is as we expect; if the spring
becomes very stiff, then the left hand piece of string has its end effectively pinned so it
has a Dirichlet boundary condition, and nothing gets through to the right hand side. On
the other hand if we send κ → 0, then we are effectively removing the spring, and the two
pieces of string will become as one. In this case we can explicitly see that as κ → 0, R → 0
and T → 1.
Alternatively, we can think of fixing κ and consider the effect on waves with different
values of p. The energy flux associated with the incoming wave, whose amplitude is fixed
at one, is given by

τcp²|A|²/2 = τcp²/2 .
If p is very small, that is to say the wavelength is very long, then the energy flux is very
small, and again it is hard for the wave to excite the spring since it does not have enough
energy, so effectively we have Dirichlet boundary conditions. Again in this limit p → 0,
R → −1 and T → 0, the value for Dirichlet boundary conditions. On the other hand, if p
is very large, the energy of the incoming wave is so large that the spring has little effect,
and indeed R → 0, T → 1 in this limit.
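A final consistency check, which is easy to run (a sketch; the parameter values are arbitrary): the time-averaged energy flux computed in §5.6 scales as the amplitude squared, so conservation of energy at the junction requires |R|² + |T|² = 1.

    import numpy as np

    def RT(p, tau, kappa):
        denom = 2j * p * tau - kappa
        return kappa / denom, 2j * p * tau / denom

    for p in (0.01, 1.0, 100.0):
        R, T = RT(p, tau=1.0, kappa=1.0)
        print(p, abs(R)**2 + abs(T)**2)    # 1.0 for every p
    # limits: p -> 0 gives (R, T) -> (-1, 0); p -> infinity gives (R, T) -> (0, 1)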

§6 The Hamiltonian formalism

§6.1 Phase space

So far we have discussed the Lagrangian formalism, in which the evolution of the system
is determined by the Euler-Lagrange equations. Given a set of initial conditions, these
equations determine the time evolution of the system in configuration space. Recall from
§2.2 that this is the space described by the generalized coordinates q, without including
the information about the velocities q̇.
The Hamiltonian formalism is closely related to the Lagrangian formalism that we
have been studying so far, but it starts from a slightly different perspective: instead of
considering configuration space, we now want to consider the space of all states of our
physical systems. This space is known as phase space. I now define these notions.
Definition 6.1.1. The state of a classical system at a given instant in time is a complete
set of data that fully fixes the future evolution of the system.
Example 6.1.2. The Euler-Lagrange equations are second order differential equa-
tions for q(t) (assuming that the Lagrangian depends on q(t) and q̇(t) only, and not higher
derivatives of q(t)). We can fix the integration constants that appear in solving these equa-
tions by giving the positions q(t0 ) and velocities q̇(t0 ) at any chosen time t0 , for some
convenient choice of generalised coordinates and velocities. Once we have fixed these con-
stants we know the behaviour of the system for all future times, so in this case we can
parametrize the state at a given time t by giving q(t) and q̇(t).
Remark 6.1.3. The parametrization of the physical state in terms of q(t) and q̇(t) is not
the only possible one: any parametrisation that allows us to fully fix future evolution is
valid. We will see an example of a different parametrisation momentarily.
Definition 6.1.4. The phase (or state) space P of a classical system is the space of all
possible states that the system can be in at a given instant in time.
Remark 6.1.5. This definition for phase space sounds rather similar to the definition of
configuration space (this was definition 2.2.1). But note that phase space has twice
the dimension of configuration space: while configuration space encodes the (generalised)
position of the system at a time t, phase space encodes the generalised positions and the
velocities.
Example 6.1.6. Consider a particle moving in one dimension. Phase space in this case
is the two dimensional plane R2 : one coordinate for x and one coordinate for ẋ. Every
possible point in this plane is a possible state for the particle. For instance, the point
(x, ẋ) = (0, 10) parametrizes a particle at the origin, moving toward positive values of x.
Similarly the particle moving in d dimensions has a phase space R2d . Note that the precise

form of the Lagrangian does not enter in our definition of phase space: given a point in
phase space the Lagrangian will determine future evolution, but any point in phase space
is acceptable as an initial condition (by definition).
Definition 6.1.7. The Hamiltonian formalism studies dynamics on phase space, parametrized
by generalised coordinates q(t) and their associated generalised momenta p(t).
The fundamental step in going from the Lagrangian to the Hamiltonian formalism is
to invert the definition equations for the generalised momenta:
pi := ∂L(q, q̇, t)/∂q̇i .
The right hand sides of these equations are a set of functions of q, q̇ and t. We often¹⁷ can
invert these equations to express q̇ in terms of q, p and t. Once we do this, we can express
any function in phase space (the Lagrangian, for instance) in terms of q, p and t only.
Example 6.1.8. Consider a particle of mass m moving in one dimension, expressed in
Cartesian coordinates. Its Lagrangian is
L(x, ẋ) = ½mẋ² ,
2
so its associated momentum is
p = ∂L/∂ẋ = mẋ .
We can trivially solve this equation to find ẋ = p/m. We find that the Lagrangian for this
system is thus
L(x, p) = p²/2m
in the Hamiltonian formalism.
Example 6.1.9. Let us now take a particle moving in two dimensions, expressed in polar
coordinates. Its Lagrangian is
L(q, q̇) = ½m(ṙ² + r²θ̇²)
so its generalised momenta are
pr = mṙ ; pθ = mr2 θ̇ .
We can easily invert these equations, to find ṙ = pr /m and θ̇ = pθ /(mr2 ). In this way we
can express any function of phase space in terms of the q and p. For instance, for the
Lagrangian itself we have

L(q, p) = (1/2m) ( pr² + pθ²/r² ) .
17
This will be true in the examples that we discuss during this course, at any rate. There are interesting
situations in which this inversion cannot be done, but we will not study them during this course. I
encourage those of you who are curious to search for material on “Dirac brackets” if you want to see how
our story below generalises to these more complicated cases. A good reference is the book “Quantization
of Gauge Systems”, by Henneaux and Teitelboim.

§6.2 The Poisson bracket and Hamiltonian flows

We still need to understand how a given state evolves in time in this new formalism.
That is, if we know which point in phase space describes a system at a given time, which
trajectory in phase space will describe subsequent motion of the system?
In fact, there would be little point in doing this if all we gained was a description of
the dynamics in a different set of variables. After all, the Lagrangian formalism will do
the job of giving the equations of motion for the system perfectly well.18 The advantage of
switching to the Hamiltonian formalism is that we will be able to exhibit a rather deep and
beautiful geometric structure to classical dynamics, in which we will obtain (in a sense) a
reciprocal of Noether’s theorem! Recall that Noether’s theorem states that every symmetry
has an associated conserved charge. We will see below that in the Hamiltonian formalism
the conserved charge generates the symmetry: if we know the form of the conserved charge
for a symmetry we will be able to reconstruct systematically the infinitesimal form of the
symmetry transformation.
The fundamental object that allows us to think of charges as generating transformations
is the Poisson bracket:
Definition 6.2.1. The Poisson bracket between two functions f (q, p, t) and g(q, p, t) on
phase space is the function in phase space defined by

{f, g} := Σ_{i=1}^{n} ( (∂f/∂qi)(∂g/∂pi) − (∂f/∂pi)(∂g/∂qi) )        (6.2.1)

where n is the dimension of configuration space (so half the dimension of phase space).
Remark 6.2.2. Note that in the definition of the Poisson bracket the position and momenta
are independent coordinates in phase space, and are treated as independent variables when
taking partial derivatives:
∂qi/∂pj = ∂pi/∂qj = 0 ;   ∂qi/∂qj = ∂pi/∂pj = δij .
Example 6.2.3. The simplest functions in phase space that we can construct are those
that give the coordinates of a point in a given basis. From the definition of the Poisson
bracket, we have the fundamental brackets

{qi , qj } = {pi , pj } = 0 ; {qi , pj } = δij .


18
Or Newton’s formalism, for that matter! We went through all this trouble during the past weeks not
because we wanted to find more efficient methods of solving the dynamics of classical systems (although
that is sometimes a useful byproduct of switching perspectives), but rather because we wanted to un-
derstand better the structure of classical mechanics — important ideas like the action principle or the
relation between symmetries and conserved charges become much more transparent in the Lagrangian and
Hamiltonian formalisms.
6.2 THE POISSON BRACKET AND HAMILTONIAN FLOWS 62

The Poisson bracket has a number of interesting properties, which I now list. The proof
of these properties is straightforward, and can be found in the problem sheet for week 10:

Proposition 6.2.4. The Poisson bracket is antisymmetric:

{f, g} = −{g, f } .

Proposition 6.2.5. The Poisson bracket is linear:

{αf + βg, h} = α{f, h} + β{g, h}

for α, β ∈ R. Note that together with antisymmetry this implies

{h, αf + βg} = α{h, f } + β{h, g}

so the Poisson bracket is in fact bilinear (that is, linear on both terms).

Proposition 6.2.6. The Poisson bracket obeys the Leibniz identity:

{f g, h} = f {g, h} + g{f, h} .

Proposition 6.2.7. The Poisson bracket obeys the Jacobi identity for the sum of the
cyclic permutations:

{{f, g}, h} + {{h, f }, g} + {{g, h}, f } = 0 .
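These properties are mechanical enough to verify with a computer algebra system. A sketch for one degree of freedom (extending the sum over i is straightforward; the sample functions at the end are arbitrary):

    import sympy as sp

    q, p = sp.symbols('q p')

    def pb(f, g):
        """Poisson bracket {f, g} for a single (q, p) pair."""
        return sp.diff(f, q) * sp.diff(g, p) - sp.diff(f, p) * sp.diff(g, q)

    print(pb(q, p))                            # 1, the fundamental bracket
    f, g, h = q**2 * p, sp.sin(q), p**3 + q
    jacobi = pb(pb(f, g), h) + pb(pb(h, f), g) + pb(pb(g, h), f)
    print(sp.simplify(jacobi))                 # 0, the Jacobi identity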

Denote by F the space of all functions from phase space P to R. Given any function
f ∈ F , we can define an operator Φf that generates infinitesimal transformations on F
using the Poisson bracket.

Definition 6.2.8. The Hamiltonian flow defined by f : P → R is the infinitesimal trans-


formation on F defined by
Φf^{(ϵ)} : F → F
Φf^{(ϵ)}(g) = g + ϵ{g, f} + O(ϵ²) .

Remark 6.2.9. I am taking a small liberty with the language here to avoid having to intro-
duce some additional formalism: what I have just introduced is the infinitesimal version
of what is commonly known as “Hamiltonian flow” in the literature, which is typically
defined for finite (that is, non-infinitesimal) transformations. The finite version of the
transformation is obtained by exponentiation:

Φf^{(a)}(g) = e^{a{·,f}} g := g + a{g, f} + (a²/2!) {{g, f}, f} + (a³/3!) {{{g, f}, f}, f} + . . .

Remark 6.2.10. By studying the action of Φf^{(ϵ)} on the coordinates q, p of phase space, we
can also understand Φf^{(ϵ)} as the generator of a map from phase space to itself. We have

Φf^{(ϵ)}(qi) = qi + ϵ{qi, f} + O(ϵ²) = qi + ϵ ∂f/∂pi + O(ϵ²)
Φf^{(ϵ)}(pi) = pi + ϵ{pi, f} + O(ϵ²) = pi − ϵ ∂f/∂qi + O(ϵ²) .
The two definitions are compatible:
(ϵ)
Φf (g) = g(q1 + ϵ{q1 , f }, . . . , qn + ϵ{qn , f }, p1 + ϵ{p1 , f }, . . . , pn + ϵ{pn , f })
n  
X ∂g ∂g
= g(q1 , . . . , qn , p1 , . . . , pn ) + ϵ {qi , f } + {pi , f }
i=1
∂q i ∂p i
n  
X ∂g ∂f ∂g ∂f
= g(q1 , . . . , qn , p1 , . . . , pn ) + ϵ −
i=1
∂q i ∂p i ∂pi ∂qi
= g + ϵ{g, f }
where in the second line we have done a Taylor expansion, and we have omitted higher
order terms in ϵ throughout for notational simplicity.
Example 6.2.11. As a simple example, consider a particle moving in one dimension. The Hamiltonian flow $\Phi_p^{(\epsilon)}$ associated to the canonical momentum p acts on phase space functions as:
$$\Phi_p^{(\epsilon)}(g(q, p)) = g(q, p) + \epsilon \frac{\partial g}{\partial q} + O(\epsilon^2) .$$
Alternatively, $\Phi_p^{(\epsilon)}$ acts on the coordinate q as q → q + ϵ, so the effect of $\Phi_p^{(\epsilon)}$ on phase space is a uniform shift in the q direction:

[Figure: the (q, p) plane, with every point displaced by ϵ in the q direction.]

We can reproduce the effect on arbitrary functions of q from this viewpoint by doing a Taylor expansion:
$$g(q + \epsilon, p) = g(q, p) + \epsilon \frac{\partial g}{\partial q} + O(\epsilon^2) .$$
(You might also find it interesting to reproduce the full form of the Taylor expansion of f(x + a) around x using the exponentiated version in remark 6.2.9.)
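Here is one way to do that exercise by machine: a small sympy sketch (our own illustration, with an arbitrarily chosen g) checking that the truncated exponential series of remark 6.2.9 reproduces the Taylor expansion of g(q + a, p):

```python
import sympy as sp

q, p, a = sp.symbols('q p a')

def pb(f, g):
    # One-dimensional Poisson bracket {f, g}.
    return sp.diff(f, q) * sp.diff(g, p) - sp.diff(f, p) * sp.diff(g, q)

g = sp.sin(q) * p**2     # an arbitrary phase-space function
N = 6                    # truncation order of the exponential series

flow = g
term = g
for k in range(1, N):
    term = pb(term, p)                         # nested brackets {...{g, p}..., p}
    flow = flow + a**k / sp.factorial(k) * term

taylor = sp.series(sp.sin(q + a) * p**2, a, 0, N).removeO()
assert sp.simplify(flow - taylor) == 0         # the two expansions agree
```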

Example 6.2.12. As a second example, consider a particle of unit mass moving in two dimensions, expressed in Cartesian coordinates, which we call q1 and q2. We choose the Lagrangian to be of the form
$$L = \frac{1}{2}\left( \dot{q}_1^2 + \dot{q}_2^2 \right) - V(q_1, q_2) .$$
For the function generating the flow we will choose J = q1 q̇2 − q2 q̇1. (Recall from example 3.1.16 that this function is angular momentum, which Noether's theorem associated with rotations around the origin.) From the Lagrangian we have p1 = q̇1 and p2 = q̇2, so in terms of standard (q, p) coordinates of phase space we have J(q, p) = q1 p2 − q2 p1. The Hamiltonian flow $\Phi_J^{(\epsilon)}$ then acts on phase space as
$$\begin{aligned}
\Phi_J^{(\epsilon)}(q_1) &= q_1 + \epsilon\{q_1, J\} = q_1 + \epsilon \frac{\partial J}{\partial p_1} = q_1 - \epsilon q_2 , \\
\Phi_J^{(\epsilon)}(q_2) &= q_2 + \epsilon\{q_2, J\} = q_2 + \epsilon \frac{\partial J}{\partial p_2} = q_2 + \epsilon q_1 , \\
\Phi_J^{(\epsilon)}(p_1) &= p_1 + \epsilon\{p_1, J\} = p_1 - \epsilon \frac{\partial J}{\partial q_1} = p_1 - \epsilon p_2 , \\
\Phi_J^{(\epsilon)}(p_2) &= p_2 + \epsilon\{p_2, J\} = p_2 - \epsilon \frac{\partial J}{\partial q_2} = p_2 + \epsilon p_1 ,
\end{aligned}$$
omitting higher orders in ϵ. So the effect of J on the coordinates can be written as an infinitesimal rotation on the q and the p (independently):
$$\Phi_J^{(\epsilon)} \begin{pmatrix} q_1 \\ q_2 \end{pmatrix} = \begin{pmatrix} 1 & -\epsilon \\ \epsilon & 1 \end{pmatrix} \begin{pmatrix} q_1 \\ q_2 \end{pmatrix} , \qquad \Phi_J^{(\epsilon)} \begin{pmatrix} p_1 \\ p_2 \end{pmatrix} = \begin{pmatrix} 1 & -\epsilon \\ \epsilon & 1 \end{pmatrix} \begin{pmatrix} p_1 \\ p_2 \end{pmatrix} .$$
For instance, the action of $\Phi_J^{(\epsilon)}$ on the (q1, q2) slice of phase space (which in this case has four dimensions) is as in the following picture:

[Figure: points of the (q1, q2) plane displaced infinitesimally along circles centred at the origin, i.e. an infinitesimal rotation.]
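To see that these infinitesimal rotations really exponentiate to finite ones, we can apply remark 6.2.9 numerically. The following sketch (ours, assuming numpy and scipy; the angle is arbitrary) reads off the generator from the matrices above:

```python
import numpy as np
from scipy.linalg import expm

# Infinitesimal generator read off from the matrix above:
# (q1, q2) -> (q1 - eps*q2, q2 + eps*q1), i.e. identity + eps*G with
G = np.array([[0.0, -1.0],
              [1.0,  0.0]])

alpha = 0.7                               # an arbitrary finite rotation angle
R = expm(alpha * G)                       # exponentiated flow, as in remark 6.2.9
R_expected = np.array([[np.cos(alpha), -np.sin(alpha)],
                       [np.sin(alpha),  np.cos(alpha)]])
assert np.allclose(R, R_expected)         # an honest finite rotation
```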

§6.2.1 Flows for conserved charges

We have just seen that linear momentum p generates spatial translations, and angular mo-
mentum generates rotations. This is in fact general: assume that we have a transformation
acting as qi → qi +ϵai (q)+O(ϵ2 ) on the generalised coordinates. Noether’s theorem assigns
a charge to this transformation given, in the Lagrangian framework, by
$$Q(q, \dot{q}, t) = \sum_{i=1}^{n} a_i(q)\, \frac{\partial L(q, \dot{q}, t)}{\partial \dot{q}_i} - F(q, t) .$$

This charge can be written in the Hamiltonian framework in terms of generalised coordi-
nates and generalised momenta as
$$Q(q, p, t) = \sum_{i=1}^{n} a_i(q)\, p_i - F(q, t) .$$

If we now compute the Hamiltonian flow associated to this charge on the generalised
coordinates we find
$$\Phi_Q^{(\epsilon)}(q_i) = q_i + \epsilon\{q_i, Q\} + O(\epsilon^2) = q_i + \epsilon a_i + O(\epsilon^2) .$$

Note 6.2.13
This is a very important result: Noether’s theorem told us that symmetries imply the
existence of conserved quantities. We have just seen that we can go in the other direc-
tion too: conserved quantities generate the corresponding symmetry transformations,
via the associated Hamiltonian flow.
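This computation is also easy to verify symbolically. A small sketch (ours, assuming sympy; the choices of the functions a_i(q) and F(q) are arbitrary):

```python
import sympy as sp

q1, q2, p1, p2 = sp.symbols('q1 q2 p1 p2')
qs, ps = [q1, q2], [p1, p2]

def pb(f, g):
    # Poisson bracket in two dimensions, as in (6.2.1).
    return sum(sp.diff(f, qi) * sp.diff(g, pi) - sp.diff(f, pi) * sp.diff(g, qi)
               for qi, pi in zip(qs, ps))

a1, a2 = q2**2, q1 * q2          # arbitrary functions a_i(q)
F = q1**3                        # arbitrary F(q)
Q = a1 * p1 + a2 * p2 - F

# The flow of Q moves the coordinates as q_i -> q_i + eps * a_i(q):
assert sp.simplify(pb(q1, Q) - a1) == 0
assert sp.simplify(pb(q2, Q) - a2) == 0
```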

§6.3 The Hamiltonian and Hamilton’s equations

We have just proven that conserved quantities generate the corresponding symmetries. It
is natural to guess at this point that energy will generate time evolution, via Hamiltonian
flow. This is indeed the case.

Definition 6.3.1. The Hamiltonian “H” of a physical system is the energy expressed in
terms of generalised coordinates and generalised momenta. That is:
$$H := \sum_{i=1}^{n} p_i\, \dot{q}_i(q, p, t) - L(q, \dot{q}(q, p, t), t) .$$

Example 6.3.2. Consider the harmonic oscillator in one dimension, with Lagrangian
$$L = \frac{1}{2} m \dot{x}^2 - \frac{1}{2} \kappa x^2 .$$
The generalised momentum is p = mẋ, so the Hamiltonian for this system is
$$H = \frac{1}{2m} p^2 + \frac{1}{2} \kappa x^2 .$$
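The Legendre transform in this definition is mechanical enough to automate. Here is a minimal sympy sketch (ours) reproducing example 6.3.2 by inverting p = ∂L/∂ẋ:

```python
import sympy as sp

x, xdot, p = sp.symbols('x xdot p')
m, kappa = sp.symbols('m kappa', positive=True)

L = sp.Rational(1, 2) * m * xdot**2 - sp.Rational(1, 2) * kappa * x**2
p_def = sp.diff(L, xdot)                          # p = m*xdot
xdot_of_p = sp.solve(sp.Eq(p, p_def), xdot)[0]    # invert: xdot = p/m

H = sp.expand(p * xdot_of_p - L.subs(xdot, xdot_of_p))
print(H)                                          # kappa*x**2/2 + p**2/(2*m)
```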
Theorem 6.3.3. The time evolution of the generalised coordinates and momenta is given by the Hamiltonian flow $\Phi_H^{(\epsilon)}$:
$$\Phi_H^{(\epsilon)}(q_i) = q_i(t + \epsilon) + O(\epsilon^2) \ ; \qquad \Phi_H^{(\epsilon)}(p_i) = p_i(t + \epsilon) + O(\epsilon^2) .$$
Equivalently (expanding $q_i(t + \epsilon) = q_i(t) + \epsilon \dot{q}_i(t) + \ldots$, and similarly for $p_i$):
$$\dot{q}_i = \{q_i, H\} = \frac{\partial H}{\partial p_i} \ ; \qquad \dot{p}_i = \{p_i, H\} = -\frac{\partial H}{\partial q_i} . \tag{6.3.1}$$
These equations are known as Hamilton's equations of motion.

Proof. The first thing to do is to note that when we write the partial derivative $\frac{\partial A}{\partial q_j}$ in the Hamiltonian picture we mean: differentiate A with respect to qj keeping the other q's, any explicit time dependence in A, and the p's fixed. This should be contrasted with the Lagrangian picture, where differentiating with respect to qj involved keeping the other q's, time, and the q̇'s fixed. To highlight this point, in this proof I will write $\frac{\partial A}{\partial q_j}\big|_p$ or $\frac{\partial A}{\partial q_j}\big|_{\dot{q}}$ to clarify which set of variables are being held fixed when taking partial derivatives.¹⁹

Given this, let us calculate the derivatives of H with respect to qj and pj. We have
$$\begin{aligned}
\left.\frac{\partial H}{\partial q_j}\right|_p &= \left.\frac{\partial}{\partial q_j}\left( \sum_i p_i\, \dot{q}_i(q, p, t) - L(q, \dot{q}(q, p, t), t) \right)\right|_p \\
&= \sum_i p_i \left.\frac{\partial \dot{q}_i}{\partial q_j}\right|_p - \sum_i \left.\frac{\partial L}{\partial q_i}\right|_{\dot{q}} \left.\frac{\partial q_i}{\partial q_j}\right|_p - \sum_i \left.\frac{\partial L}{\partial \dot{q}_i}\right|_q \left.\frac{\partial \dot{q}_i}{\partial q_j}\right|_p \\
&= \sum_i p_i \left.\frac{\partial \dot{q}_i}{\partial q_j}\right|_p - \left.\frac{\partial L}{\partial q_j}\right|_{\dot{q}} - \sum_i \left.\frac{\partial L}{\partial \dot{q}_i}\right|_q \left.\frac{\partial \dot{q}_i}{\partial q_j}\right|_p \\
&= \sum_i \left( p_i - \left.\frac{\partial L}{\partial \dot{q}_i}\right|_q \right) \left.\frac{\partial \dot{q}_i}{\partial q_j}\right|_p - \left.\frac{\partial L}{\partial q_j}\right|_{\dot{q}} .
\end{aligned}$$

¹⁹ Not to overload notation too much, I will leave implicit the fact that we are also keeping fixed any explicit time parameters in A, unless explicitly stated otherwise.

The first bracket in this expression is zero by the definition of pi . Using the Euler-Lagrange
equations we conclude that along a physical path
$$\left.\frac{\partial H}{\partial q_j}\right|_p = -\left.\frac{\partial L}{\partial q_j}\right|_{\dot{q}} = -\frac{d}{dt}\left( \left.\frac{\partial L}{\partial \dot{q}_j}\right|_q \right) = -\dot{p}_j .$$

Similarly, calculating $\left.\frac{\partial H}{\partial p_j}\right|_q$ we find
$$\begin{aligned}
\left.\frac{\partial H}{\partial p_j}\right|_q &= \left.\frac{\partial}{\partial p_j}\left( \sum_i p_i\, \dot{q}_i(q, p, t) - L(q, \dot{q}(q, p, t), t) \right)\right|_q \\
&= \sum_i \frac{\partial p_i}{\partial p_j}\, \dot{q}_i + \sum_i p_i \left.\frac{\partial \dot{q}_i}{\partial p_j}\right|_q - \sum_i \left.\frac{\partial L}{\partial \dot{q}_i}\right|_q \left.\frac{\partial \dot{q}_i}{\partial p_j}\right|_q \\
&= \sum_i \delta_{ij}\, \dot{q}_i + \sum_i \left( p_i - \left.\frac{\partial L}{\partial \dot{q}_i}\right|_q \right) \left.\frac{\partial \dot{q}_i}{\partial p_j}\right|_q \\
&= \dot{q}_j
\end{aligned}$$

again using the definition of pi to show the last term vanishes. Note that we did not need to
use the Euler-Lagrange equations to derive this last equation. Accordingly, in practice this
equation generally just reproduces the result of inverting the definition of the generalised
momentum in the Lagrangian formalism to express q̇ in terms of q, p and t.
Corollary 6.3.4. The time evolution of any function f (q, p) on phase space is generated
by ΦH :
$$\frac{df}{dt} = \{f, H\} .$$
In the case that f depends explicitly on time, we have
$$\frac{df}{dt} = \frac{\partial f}{\partial t} + \{f, H\} .$$
Proof. The function f will depend on time through its explicit dependence on t, if any, and via its implicit dependence via q and p, which are themselves functions of time. Using the chain rule we find
$$\begin{aligned}
\frac{df}{dt} &= \frac{\partial f}{\partial t} + \sum_{i=1}^{n} \left( \frac{\partial f}{\partial q_i}\, \dot{q}_i + \frac{\partial f}{\partial p_i}\, \dot{p}_i \right) \\
&= \frac{\partial f}{\partial t} + \sum_{i=1}^{n} \left( \frac{\partial f}{\partial q_i} \frac{\partial H}{\partial p_i} - \frac{\partial f}{\partial p_i} \frac{\partial H}{\partial q_i} \right) \\
&= \frac{\partial f}{\partial t} + \{f, H\} ,
\end{aligned}$$
where we have used Hamilton’s equations in going to the second line.

Remark 6.3.5. We can apply this corollary to give a very neat proof of conservation of
energy: the energy, in the Hamiltonian formalism, is equal to the Hamiltonian itself. So
we have that
$$\frac{dH}{dt} = \frac{\partial H}{\partial t} + \{H, H\} = \frac{\partial H}{\partial t} \tag{6.3.2}$$
using the fact that the Poisson bracket is antisymmetric. So, if time does not appear
explicitly in the expression for the Hamiltonian, then the Hamiltonian is conserved.
Remark 6.3.6. A small variation of this last equation is sometimes included as part of
Hamilton’s equations. From the definition of the Hamiltonian we have that
$$\begin{aligned}
\frac{\partial H(q, p, t)}{\partial t} &= \frac{\partial}{\partial t}\left( \sum_{i=1}^{n} \dot{q}_i(q, p, t)\, p_i - L(q, \dot{q}(q, p, t), t) \right) \\
&= \sum_{i=1}^{n} p_i\, \frac{\partial \dot{q}_i(q, p, t)}{\partial t} - \frac{\partial L(q, \dot{q}(q, p, t), t)}{\partial t} .
\end{aligned}$$

Now note that L can have an explicit dependence on t through q̇, if q̇(q, p, t) depends
explicitly on time. Using the chain rule:
$$\frac{\partial L(q, \dot{q}(q, p, t), t)}{\partial t} = \left.\frac{\partial L(q, \dot{q}, t)}{\partial t}\right|_{q, \dot{q}} + \sum_{i=1}^{n} \left.\frac{\partial L(q, \dot{q}, t)}{\partial \dot{q}_i}\right|_q \frac{\partial \dot{q}_i(q, p, t)}{\partial t}$$
so
$$\frac{\partial H(q, p, t)}{\partial t} = -\left.\frac{\partial L(q, \dot{q}, t)}{\partial t}\right|_{q, \dot{q}} + \sum_{i=1}^{n} \left( p_i - \left.\frac{\partial L(q, \dot{q}, t)}{\partial \dot{q}_i}\right|_q \right) \frac{\partial \dot{q}_i(q, p, t)}{\partial t} .$$
The second term vanishes due to the definition of the generalised momentum, so we con-
clude that
$$\left.\frac{\partial H(q, p, t)}{\partial t}\right|_{q, p} = -\left.\frac{\partial L(q, \dot{q}, t)}{\partial t}\right|_{q, \dot{q}} \tag{6.3.3}$$

In particular, this makes (6.3.2) compatible with theorem 3.2.2.


Remark 6.3.7. More generally, assume that we have a function Q(q, p, t) on phase space.
We have that Q is conserved if
$$\frac{dQ}{dt} = \{Q, H\} + \frac{\partial Q}{\partial t} = 0 .$$
In particular, if Q does not depend explicitly on time, we have that Q is conserved if and
only if {Q, H} = 0. By antisymmetry of the Poisson bracket we can also read this condition
as
{H, Q} = 0
which can be interpreted as saying that the Hamiltonian is left invariant, to first order in
ϵ, by the transformation generated by Q:
$$\Phi_Q^{(\epsilon)}(H) = H + \epsilon\{H, Q\} + O(\epsilon^2) = H + O(\epsilon^2) .$$

Example 6.3.8. Consider a system whose Lagrangian is given by
$$L = \frac{1}{2}\left( \dot{r}^2 + r^2 \dot{\theta}^2 \right) - \frac{r^2}{2} .$$
We define the momenta to be
$$p_r = \frac{\partial L}{\partial \dot{r}} = \dot{r} \ ; \qquad p_\theta = \frac{\partial L}{\partial \dot{\theta}} = r^2 \dot{\theta}$$
so that
$$\dot{r} = p_r \ ; \qquad \dot{\theta} = \frac{p_\theta}{r^2} .$$
The Hamiltonian is given by
$$\begin{aligned}
H &= p_r \dot{r} + p_\theta \dot{\theta} - \left( \frac{1}{2}\left( \dot{r}^2 + r^2 \dot{\theta}^2 \right) - \frac{r^2}{2} \right) \\
&= p_r^2 + \frac{p_\theta^2}{r^2} - \frac{1}{2}\left( p_r^2 + \frac{p_\theta^2}{r^2} \right) + \frac{r^2}{2} \\
&= \frac{1}{2}\left( p_r^2 + \frac{p_\theta^2}{r^2} \right) + \frac{r^2}{2} .
\end{aligned}$$
Hamilton's equations of motion tell us that
$$\begin{aligned}
\dot{r} &= \frac{\partial H}{\partial p_r} = p_r \\
\dot{\theta} &= \frac{\partial H}{\partial p_\theta} = \frac{p_\theta}{r^2} \\
\dot{p}_r &= -\frac{\partial H}{\partial r} = \frac{p_\theta^2}{r^3} - r \\
\dot{p}_\theta &= -\frac{\partial H}{\partial \theta} = 0 .
\end{aligned}$$
Note that the first two equations here simply reproduce the results of expressing the q̇'s in terms of the p's. This is always the case when we derive the Hamiltonian system from a Lagrangian system as above. The last equation shows that pθ is conserved as a result of the Hamiltonian being independent of θ: the concept of an ignorable coordinate goes over completely from the Lagrangian picture to the Hamiltonian picture. The real ‘meat’ of the dynamics is in the remaining equation for ṗr. Given that pθ is a constant and that pr = ṙ, it can be read as
$$\ddot{r} = \frac{p_\theta^2}{r^3} - r .$$
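As a sanity check, we can integrate Hamilton's equations for this system numerically and watch pθ stay constant along the flow. A short sketch (ours, assuming scipy; the initial conditions are arbitrary):

```python
import numpy as np
from scipy.integrate import solve_ivp

def hamilton_rhs(t, y):
    r, theta, pr, ptheta = y
    return [pr,                          # rdot        =  dH/dp_r
            ptheta / r**2,               # thetadot    =  dH/dp_theta
            ptheta**2 / r**3 - r,        # p_r dot     = -dH/dr
            0.0]                         # p_theta dot = -dH/dtheta = 0

y0 = [1.0, 0.0, 0.2, 0.8]                # arbitrary initial conditions
sol = solve_ivp(hamilton_rhs, (0.0, 10.0), y0, rtol=1e-10, atol=1e-10)
assert np.allclose(sol.y[3], 0.8)        # p_theta is conserved along the flow
```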

Example 6.3.9. Suppose we start instead with a Hamiltonian:

$$H = \frac{p^2}{2} + xp .$$
Hamilton’s equations are
$$\dot{x} = \frac{\partial H}{\partial p} = p + x \ ; \qquad \dot{p} = -\frac{\partial H}{\partial x} = -p .$$
Solving the second equation, we have that p = Ae−t . Substituting this into the first equation
we find
$$\dot{x} - x = Ae^{-t}$$
which is a linear first order differential equation. Multiplying through by the integrating
factor we find
$$\frac{d}{dt}\left( xe^{-t} \right) = Ae^{-2t}$$
which can be integrated to give $x = Ce^t - Ae^{-t}/2$.
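A quick symbolic check (ours, assuming sympy) confirms that this solution satisfies both of Hamilton's equations:

```python
import sympy as sp

t, A, C = sp.symbols('t A C')
p = A * sp.exp(-t)
x = C * sp.exp(t) - A * sp.exp(-t) / 2

assert sp.simplify(sp.diff(x, t) - (p + x)) == 0   # xdot =  dH/dp = p + x
assert sp.simplify(sp.diff(p, t) + p) == 0         # pdot = -dH/dx = -p
```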

Example 6.3.10. The following is a Hamiltonian for the damped harmonic oscillator:

$$H = \frac{e^{-bt} p^2}{2} + \frac{e^{bt} w^2 x^2}{2} .$$
Notice that H explicitly depends on time; this implies that it is not conserved, as we would expect for the damped harmonic oscillator, whose motion dies away to nothing. Hamilton's equations of motion are
$$\dot{x} = \frac{\partial H}{\partial p} = e^{-bt} p \ ; \qquad \dot{p} = -\frac{\partial H}{\partial x} = -w^2 e^{bt} x .$$
Differentiating the first equation with respect to t, we see that

$$\ddot{x} = -b e^{-bt} p + e^{-bt} \dot{p} = -b\dot{x} - w^2 x ,$$
which is precisely the equation of motion of the damped harmonic oscillator.
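We can also confirm this numerically: integrating Hamilton's equations for this H reproduces the standard underdamped solution of ẍ + bẋ + w²x = 0. A sketch (ours, assuming scipy; the values of b and w are arbitrary, with w > b/2 so the motion is underdamped):

```python
import numpy as np
from scipy.integrate import solve_ivp

b, w = 0.3, 2.0                              # arbitrary parameter choices

def rhs(t, y):
    x, p = y
    return [np.exp(-b * t) * p,              # xdot =  dH/dp
            -w**2 * np.exp(b * t) * x]       # pdot = -dH/dx

sol = solve_ivp(rhs, (0.0, 20.0), [1.0, 0.0], dense_output=True,
                rtol=1e-10, atol=1e-10)

# Standard underdamped solution with x(0) = 1, xdot(0) = 0:
Omega = np.sqrt(w**2 - b**2 / 4)
ts = np.linspace(0.0, 20.0, 200)
x_exact = np.exp(-b * ts / 2) * (np.cos(Omega * ts)
                                 + b / (2 * Omega) * np.sin(Omega * ts))
assert np.allclose(sol.sol(ts)[0], x_exact, atol=1e-6)
```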

§6.4 There and back again

This section is not examinable.

Let me finish by closing the circle of ideas that we have been developing. We have seen in §3.1 that, by Noether's theorem, every symmetry implies the existence of a conserved Noether charge Q, and we have also shown in §6.2.1 that the Noether charge associated to a symmetry generates the right transformations on the generalized coordinates. It is natural to ask at this point: does any conserved charge generate a symmetry transformation? This is indeed the case, as we now show for the class of conserved charges that we have been discussing.
Theorem 6.4.1. Assume that we have a function Q(q, p, t) of the form
$$Q(q, p, t) = \sum_{i=1}^{n} a_i(q)\, p_i - F(q, t)$$
such that
$$\frac{dQ}{dt} = \{Q, H\} + \frac{\partial Q}{\partial t} = 0 .$$
Then $\Phi_Q^{(\epsilon)}(L) = L + \epsilon \frac{dF}{dt} + O(\epsilon^2)$, so Q generates a symmetry, whose Noether charge is Q.
Proof. Let me start by proving some simple auxiliary results. Note first that
$$\{q_i, Q\} = \frac{\partial Q}{\partial p_i} = a_i(q) \tag{6.4.1}$$
which implies, in particular, that
$$\frac{\partial \{q_i, Q\}}{\partial t} = \frac{\partial a_i(q)}{\partial t} = 0 .$$
Note also that since Q is conserved, and the only explicit dependence of Q on time is via F, we have
$$\{Q, H\} = \frac{dQ}{dt} - \frac{\partial Q}{\partial t} = -\frac{\partial Q}{\partial t} = \frac{\partial F}{\partial t} ,$$
so
$$-\{\{Q, H\}, q_i\} = \{q_i, \{Q, H\}\} = \frac{\partial \{Q, H\}}{\partial p_i} = \frac{\partial}{\partial p_i}\left( \frac{\partial F(q, t)}{\partial t} \right) = 0 .$$
Using these two results we find that
$$\begin{aligned}
\frac{d\{q_i, Q\}}{dt} &= \frac{\partial \{q_i, Q\}}{\partial t} + \{\{q_i, Q\}, H\} \\
&= \{\{q_i, Q\}, H\} \\
&= -\{\{H, q_i\}, Q\} - \{\{Q, H\}, q_i\} \\
&= \{\{q_i, H\}, Q\} ,
\end{aligned}$$

where on the third line we have used the Jacobi identity in proposition 6.2.7. Hamilton's equations then imply that
$$\frac{d\{q_i, Q\}}{dt} = \{\dot{q}_i, Q\} . \tag{6.4.2}$$
Using these results it is straightforward to compute the change in the Lagrangian due
to Q. The Lagrangian in Hamiltonian coordinates is
$$L(q, p, t) = \sum_{i=1}^{n} \dot{q}_i(q, p)\, p_i - H(q, p, t) .$$

Since Q is conserved, we have
$$\begin{aligned}
\{L, Q\} &= \sum_{i=1}^{n} \left( \{\dot{q}_i, Q\}\, p_i + \dot{q}_i\, \{p_i, Q\} \right) + \{Q, H\} \\
&= \sum_{i=1}^{n} \left( \{\dot{q}_i, Q\}\, p_i + \dot{q}_i\, \{p_i, Q\} \right) - \frac{\partial Q}{\partial t} ,
\end{aligned}$$
which becomes, using the results above:
$$\begin{aligned}
\{L, Q\} &= \sum_{i=1}^{n} \left( \frac{d}{dt}\big( \{q_i, Q\} \big)\, p_i + \dot{q}_i\, \{p_i, Q\} \right) - \frac{\partial Q}{\partial t} \\
&= \sum_{i=1}^{n} \left( \frac{d}{dt}\big( \{q_i, Q\}\, p_i \big) - \dot{p}_i\, \{q_i, Q\} + \dot{q}_i\, \{p_i, Q\} \right) - \frac{\partial Q}{\partial t} \\
&= \sum_{i=1}^{n} \left( \frac{d}{dt}\big( \{q_i, Q\}\, p_i \big) - \dot{p}_i\, \frac{\partial Q}{\partial p_i} - \dot{q}_i\, \frac{\partial Q}{\partial q_i} \right) - \frac{\partial Q}{\partial t} \\
&= \frac{d}{dt}\left( \sum_{i=1}^{n} \{q_i, Q\}\, p_i \right) - \frac{dQ}{dt} ,
\end{aligned}$$

where on the first line we have used (6.4.2), and in going from the third to the fourth the
chain rule. Finally, using (6.4.1) we find that
$$\sum_{i=1}^{n} \{q_i, Q\}\, p_i = \sum_{i=1}^{n} a_i\, p_i = Q + F$$

which implies
$$\{L, Q\} = \frac{d(Q + F - Q)}{dt} = \frac{dF}{dt} .$$
Recalling the definition of the Hamiltonian flow operator, this gives
$$\Phi_Q^{(\epsilon)}(L) = L + \epsilon\{L, Q\} + O(\epsilon^2) = L + \epsilon \frac{dF}{dt} + O(\epsilon^2) .$$
Additional material


§A A review of some results in calculus

I include here a brief review of some basic results in many variable calculus that will appear
often during the course.

Derivatives and partial derivatives

Let me introduce the notation


δf (x) = f (x + δx) − f (x)
for the variation of a function f as we change its argument. We typically want to make δx
small, and understand how δf depends on δx. The answer is that
$$\delta f = \frac{df}{dx}\, \delta x + O\big((\delta x)^2\big)$$
where the last term is a “correction term”, satisfying
$$\lim_{\delta x \to 0} \frac{O\big((\delta x)^2\big)}{\delta x} = 0 .$$
Notice that this is just a restatement of the usual definition of the derivative
$$\frac{df}{dx} = \lim_{\delta x \to 0} \frac{f(x + \delta x) - f(x)}{(x + \delta x) - x}$$
in a form which is more convenient for our applications.
This definition extends straightforwardly to functions of several variables. We define
the partial derivatives of f (x1 , . . . , xn ) by
$$\frac{\partial f}{\partial x_i} = \lim_{\delta x_i \to 0} \frac{f(x_1, \ldots, x_{i-1}, x_i + \delta x_i, x_{i+1}, \ldots, x_n) - f(x_1, \ldots, x_n)}{\delta x_i} .$$
Note that in the case of functions of a single variable the definitions of the partial and
ordinary derivatives coincide.
We can now express the change in f (⃗x) under small changes δ⃗x of ⃗x as
$$\delta f(\vec{x}) = f(\vec{x} + \delta\vec{x}) - f(\vec{x}) = \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}\, \delta x_i + O(\delta\vec{x}^{\,2}) = \delta\vec{x} \cdot \vec{\nabla} f + O(\delta\vec{x}^{\,2})$$
where we have defined the vector of derivatives
$$(\vec{\nabla} f)_i := \frac{\partial f(\vec{x})}{\partial x_i} .$$
When the variation is infinitesimal we write δxi → dxi , and we have
$$df(\vec{x}) = \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}\, dx_i = d\vec{x} \cdot \vec{\nabla} f .$$

The chain rule and commuting derivatives

Assume now that the vector ⃗x is a function of time, which we denote as ⃗x(t), and that we
have a function f (⃗x(t), t) as above (where we have included a possible explicit dependence
on the time coordinate). Note that f is now implicitly a function of t via its dependence
on ⃗x(t), in addition to any possible explicit dependence on t it might have. The variation
of this function as t varies is given by the chain rule
$$\frac{df}{dt} = \sum_{i=1}^{n} \frac{\partial f}{\partial x_i} \frac{dx_i}{dt} + \frac{\partial f}{\partial t} = \vec{\nabla} f \cdot \frac{d\vec{x}}{dt} + \frac{\partial f}{\partial t} .$$
There is a version of this rule for the case of multiple variables. Say that you have a set of
variables (x1 , . . . , xn ) that depend on other variables (y1 , . . . , ym ). Then:
$$\frac{\partial f}{\partial y_j} = \sum_{i=1}^{n} \frac{\partial f}{\partial x_i} \frac{\partial x_i}{\partial y_j} .$$

Another theorem that we will use later is that if we partially differentiate a function first
with respect to xi and then with respect to xj we obtain the same as if we differentiated
in the opposite order provided that the result is continuous:
   
$$\frac{\partial}{\partial x_i}\left( \frac{\partial f}{\partial x_j} \right) = \frac{\partial}{\partial x_j}\left( \frac{\partial f}{\partial x_i} \right) .$$
This result is known as Schwarz’s theorem. During this course all second derivatives will
be continuous, so we will apply this result freely.
As it is a relatively common mistake, let me note that in general partial derivatives associated to variables belonging to different coordinate systems do not commute. Denoting by xi the first set of coordinates, and ui a second set, we have
$$\frac{\partial}{\partial x_i}\left( \frac{\partial f}{\partial u_j} \right) \neq \frac{\partial}{\partial u_j}\left( \frac{\partial f}{\partial x_i} \right) .$$
As a simple example, consider the function f : R² → R that tells us how far a point is from the vertical axis. We choose as xi the Cartesian coordinates (x, y), and as uj the polar coordinates (r, θ). In Cartesian coordinates we have simply f(x, y) = x, while in polar coordinates we have f(r, θ) = r cos θ. We have
$$\frac{\partial}{\partial r}\left( \frac{\partial f}{\partial x} \right) = \frac{\partial}{\partial r}(1) = 0$$
while
$$\frac{\partial}{\partial x}\left( \frac{\partial f}{\partial r} \right) = \frac{\partial}{\partial x}(\cos\theta) = \frac{\partial}{\partial x}\left( \frac{x}{\sqrt{x^2 + y^2}} \right) = \frac{1}{\sqrt{x^2 + y^2}} - \frac{x^2}{(x^2 + y^2)^{3/2}} \neq 0 .$$
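The same computation can be reproduced by computer algebra. A short sympy sketch (our own rendering of the example above):

```python
import sympy as sp

x, y, r = sp.symbols('x y r')

# d/dr (df/dx): df/dx = 1 everywhere, so its r-derivative vanishes.
first = sp.diff(sp.Integer(1), r)

# d/dx (df/dr): df/dr = cos(theta) = x / sqrt(x^2 + y^2) in Cartesian coordinates.
df_dr = x / sp.sqrt(x**2 + y**2)
second = sp.simplify(sp.diff(df_dr, x))

print(first)    # 0
print(second)   # y**2/(x**2 + y**2)**(3/2), nonzero away from the x-axis
assert first == 0 and second != 0
```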

Leibniz’s rule

Assume that you have a function of x expressed in integral form:


$$f(x) = \int_{a(x)}^{b(x)} dt\, g(x, t) .$$
Then
$$\frac{df}{dx} = \frac{db}{dx}\, g(x, b(x)) - \frac{da}{dx}\, g(x, a(x)) + \int_{a(x)}^{b(x)} dt\, \frac{\partial g(x, t)}{\partial x} . \tag{A.0.1}$$
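Equation (A.0.1) is easy to test on a concrete integrand. A small sympy sketch (ours; the choices of g, a and b are arbitrary):

```python
import sympy as sp

x, t = sp.symbols('x t')
g = x * t**2                    # integrand g(x, t)
a, b = x**2, sp.sin(x)          # variable integration limits a(x), b(x)

f = sp.integrate(g, (t, a, b))
lhs = sp.diff(f, x)
rhs = (sp.diff(b, x) * g.subs(t, b) - sp.diff(a, x) * g.subs(t, a)
       + sp.integrate(sp.diff(g, x), (t, a, b)))
assert sp.simplify(lhs - rhs) == 0     # Leibniz's rule (A.0.1) holds
```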

Notation for time derivatives

Finally, an additional piece of notation: the time coordinate t will play a special role during
this course, so for convenience we introduce special notation for derivatives with respect
to t. Given a function x(t) we will write

$$\dot{x} \equiv \frac{dx}{dt}$$
and similarly for higher order time derivatives. For instance,

$$\ddot{x} \equiv \frac{d^2 x}{dt^2} .$$

§A.1 Two useful lemmas for coordinate changes

Consider two sets of generalised coordinates {ui} and {qi} related by ui = ui(q1, q2, . . . , qn, t). Note that we allow for the change of coordinates to depend on time.²⁰ Such a transformation is known as a point transformation. We also note that the {ui} coordinates depend on {qi} (and possibly t), but not on {q̇i}. This is no longer true if we take a time derivative: generically u̇i will depend on {qi, q̇i, t}. We start by proving the following two simple lemmas:

Lemma (A). If ui = ui (q1 , q2 , ...qn , t) then

$$\frac{\partial u_i}{\partial q_j} = \frac{\partial \dot{u}_i}{\partial \dot{q}_j} .$$
²⁰ For instance, we could have $u_i = e^t q_i$, giving a sort of “expanding” set of coordinates. Such things appear fairly naturally when one is studying cosmology, for example.



Proof. By the chain rule


$$\dot{u}_i = \sum_{k=1}^{n} \frac{\partial u_i}{\partial q_k}\, \dot{q}_k + \frac{\partial u_i}{\partial t} .$$

Further differentiating with respect to q̇j just picks out the coefficient of q̇j (since ui does
not depend on q̇i its derivative ∂ui /∂qk does not either) giving the advertised result
$$\frac{\partial}{\partial \dot{q}_j}(\dot{u}_i) = \frac{\partial}{\partial \dot{q}_j}\left( \sum_{k=1}^{n} \frac{\partial u_i}{\partial q_k}\, \dot{q}_k + \frac{\partial u_i}{\partial t} \right) = \frac{\partial u_i}{\partial q_j} .$$

Lemma (B). If ui = ui (q1 , q2 , ...qn , t) then


 
$$\frac{\partial \dot{u}_i}{\partial q_j} = \frac{d}{dt}\left( \frac{\partial u_i}{\partial q_j} \right) .$$
Proof. We again use the chain rule, and the fact that partial derivatives on the same set
of coordinates commute (if the result is continuous):
$$\frac{d}{dt}\left( \frac{\partial u_i}{\partial q_j} \right) = \sum_{k=1}^{n} \frac{\partial^2 u_i}{\partial q_k \partial q_j}\, \dot{q}_k + \frac{\partial^2 u_i}{\partial t\, \partial q_j} = \frac{\partial}{\partial q_j}\left( \sum_{k=1}^{n} \frac{\partial u_i}{\partial q_k}\, \dot{q}_k + \frac{\partial u_i}{\partial t} \right) = \frac{\partial}{\partial q_j}(\dot{u}_i) .$$
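Both lemmas can be spot-checked symbolically for a concrete time-dependent change of coordinates, say u = eᵗq as in the footnote above. A sketch (ours, assuming sympy, which treats q(t) and q̇(t) as independent when differentiating with respect to either):

```python
import sympy as sp

t = sp.symbols('t')
q = sp.Function('q')(t)
qdot = q.diff(t)

u = sp.exp(t) * q            # time-dependent point transformation
udot = u.diff(t)             # = exp(t)*q + exp(t)*qdot

# Lemma (A): du/dq == d(udot)/d(qdot)
assert sp.simplify(u.diff(q) - udot.diff(qdot)) == 0
# Lemma (B): d(udot)/dq == d/dt (du/dq)
assert sp.simplify(udot.diff(q) - u.diff(q).diff(t)) == 0
```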

Example: invariance of the Euler-Lagrange equations under coordinate changes

As an example of how the theorems above are useful, let us prove explicitly that the choice
of generalized coordinates does not affect the form of the Euler-Lagrange equations.
Theorem. Assume that we have two sets of generalized coordinates {u1 , . . . , un } and
{q1 , . . . , qn } related by an invertible change of coordinates ui = ui (q1 , . . . , qn , t). Then
the Euler-Lagrange equations
 
$$\frac{\partial L}{\partial q_i} - \frac{d}{dt}\left( \frac{\partial L}{\partial \dot{q}_i} \right) = 0 \quad \forall i \in \{1, \ldots, n\}$$
are equivalent to
$$\frac{\partial L}{\partial u_k} - \frac{d}{dt}\left( \frac{\partial L}{\partial \dot{u}_k} \right) = 0 \quad \forall k \in \{1, \ldots, n\} .$$
Proof. We will prove the result by repeated application of the chain rule. For the first term
in the Euler-Lagrange equations we get
 
$$\frac{d}{dt}\left( \frac{\partial L}{\partial \dot{q}_i} \right) = \frac{d}{dt}\left( \sum_{k=1}^{n} \frac{\partial L}{\partial u_k} \underbrace{\frac{\partial u_k}{\partial \dot{q}_i}}_{=0} + \sum_{k=1}^{n} \frac{\partial L}{\partial \dot{u}_k} \frac{\partial \dot{u}_k}{\partial \dot{q}_i} + \frac{\partial L}{\partial t} \underbrace{\frac{\partial t}{\partial \dot{q}_i}}_{=0} \right)$$

which using Lemma (A) becomes


$$\begin{aligned}
&= \frac{d}{dt}\left( \sum_{k=1}^{n} \frac{\partial L}{\partial \dot{u}_k} \frac{\partial u_k}{\partial q_i} \right) \\
&= \sum_{k=1}^{n} \frac{d}{dt}\left( \frac{\partial L}{\partial \dot{u}_k} \right) \frac{\partial u_k}{\partial q_i} + \sum_{k=1}^{n} \frac{\partial L}{\partial \dot{u}_k}\, \frac{d}{dt}\left( \frac{\partial u_k}{\partial q_i} \right)
\end{aligned}$$

which is now, using Lemma (B)


$$= \sum_{k=1}^{n} \frac{d}{dt}\left( \frac{\partial L}{\partial \dot{u}_k} \right) \frac{\partial u_k}{\partial q_i} + \sum_{k=1}^{n} \frac{\partial L}{\partial \dot{u}_k} \frac{\partial \dot{u}_k}{\partial q_i} .$$

The second term in the Euler-Lagrange equations is easier. Again using the chain rule:
$$\frac{\partial L}{\partial q_i} = \sum_{k=1}^{n} \frac{\partial L}{\partial u_k} \frac{\partial u_k}{\partial q_i} + \sum_{k=1}^{n} \frac{\partial L}{\partial \dot{u}_k} \frac{\partial \dot{u}_k}{\partial q_i} + \frac{\partial L}{\partial t} \underbrace{\frac{\partial t}{\partial q_i}}_{=0} .$$

Taking the difference of both equations we get


$$\frac{d}{dt}\left( \frac{\partial L}{\partial \dot{q}_i} \right) - \frac{\partial L}{\partial q_i} = \sum_{k=1}^{n} \left( \frac{d}{dt}\left( \frac{\partial L}{\partial \dot{u}_k} \right) - \frac{\partial L}{\partial u_k} \right) \frac{\partial u_k}{\partial q_i} .$$

We are almost there. In order to exhibit the rest of the argument most clearly, we will
switch to matrix notation. Denote the matrix associated to the change of variables by
$$J_{ik} := \frac{\partial u_k}{\partial q_i} .$$
This matrix (known as the “Jacobian matrix”) is invertible, since by assumption the change
of coordinates is invertible. Denote the vector of Euler-Lagrange equations on the q coor-
dinates by
$$E_i^{(q)} = \frac{\partial L}{\partial q_i} - \frac{d}{dt}\left( \frac{\partial L}{\partial \dot{q}_i} \right)$$
and similarly for the u coordinates
 
$$E_k^{(u)} = \frac{\partial L}{\partial u_k} - \frac{d}{dt}\left( \frac{\partial L}{\partial \dot{u}_k} \right) .$$
Using these definitions we can rewrite the Euler-Lagrange equations as the vector equations $\vec{E}^{(q)} = 0$ and $\vec{E}^{(u)} = 0$, and we have just shown that
$$\vec{E}^{(q)} = J \vec{E}^{(u)}$$
with J invertible, so $\vec{E}^{(q)} = 0$ iff $\vec{E}^{(u)} = 0$, which is what we wanted to show.
