Basdevant JL Variational Principles in Physics
Variational Principles in Physics
Second Edition
Jean-Louis Basdevant
Professeur Honoraire
Département de Physique
École Polytechnique
Paris, France
Original French edition published by de Boeck Superieur S.A., Louvain-la-Neuve, Belgium, 2022
1st edition: © Springer Science+Business Media, LLC 2007
2nd edition: © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer
Nature Switzerland AG 2023
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or
information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
At the École Polytechnique, my major teaching activity was the general course on
Quantum Mechanics, but I had many opportunities to get interested in other fields
such as Statistical physics, Particle physics, Energy and the environment. The last
course I constructed concerned Variational Principles in Physics. Actually, this was
unexpected: I had to replace a colleague.
I had not thought about that subject and it was really a discovery that there was
much more than what I could teach. But I was lucky enough to have an interesting
group of students and colleagues who were happy when I told them we would learn
physics together. The spirit was excellent, and together we discovered many aspects
of the evolution of physics in the minds of creative people.
In the meantime, two major results occurred in fundamental physics. The first, in
July 2012, was the discovery, at the CERN Large Hadron Collider, of the long-anticipated
Higgs boson, which completed, in some sense, the Standard Model of Particle
Physics. On 8 October 2013, the Nobel Prize in Physics was awarded jointly to
François Englert and Peter Higgs “for the theoretical discovery of a mechanism that
contributes to our understanding of the origin of mass of subatomic particles...”. Two
years later came the great news of the first observation of gravitational waves,
on September 14, 2015, by the LIGO and Virgo collaborations, one hundred
years after their prediction by Einstein. These will remain the two most important
physical discoveries of the first part of the twenty-first century.
The physics of the Higgs field is too far from the purpose of this book. But it was
obviously a must to add a new chapter on gravitational waves to the 2007 edition of
“Variational Principles in Physics”. My colleagues and students said that although
variational principles are intellectually attractive, if one does not show what they aim
at, they may seem formal.
Another example is the fact that, around 1840, Hamilton was fascinated by an
unknown fact: geometrical optics should be considered as a limiting case of wave
optics, and classical mechanics seems to be a similar limit of some yet unknown
mechanical theory, but which theory? In 1890, the mathematician Felix Klein
deplored that no one had pursued that idea. But 35 years later, some of the “fathers”
of quantum mechanics, such as Dirac and Louis de Broglie, gained much inspiration
[...]
Ancient thinkers and builders were attracted by the optimality of phenomena and
their applications. Archimedes studied the optimal form to be given to the hulls of
ships; Aristotle, in a more metaphysical vein, claimed that the orbits of planets are
circles because their even curvature, which shows neither an origin nor an end, must
emanate from a creator who wanted them to be eternal.
The original idea that a physical phenomenon could be optimal is due to Hero of
Alexandria,1 in the first century AD, who explained his experimental observations
by a correct geometrical reasoning: on its way between several mirrors, light follows
the shortest path.
The variational principles, born with Fermat (1601–1665), Maupertuis (1698–
1759), Euler (1707–1783) and Lagrange (1736–1813), marked the continuation of
the work of Galileo (1564–1642) and Newton (1642–1727) (and others) towards
contemporary physics.
This view of things, quite obvious nowadays, had disappeared from scientific
considerations for fifteen centuries, and reappeared in the Variational Principles,
which now form the framework of fundamental physical theories, and in the Variational
Calculus, a particularly fruitful chapter of mathematics. The step forward appears in
an astonishing remark of Pierre de Fermat in 1661:
It is most likely that nature always acts by the easiest means, that is, either by the shortest
lines, that do not take more time, or in any case, by the quickest time, in order to reduce its
task and to end its operation sooner.
He thus developed a Method for the search of the maximum and the minimum,2 which
founded the laws of geometrical optics. This remark coincided with the major rise of
differential calculus in mathematics, and this resulted, a few decades later, in Euler
and Lagrange’s work on mechanics in 1744. Euler was inspired by Maupertuis’s
Principle of Least Action. These principles are in the same vein as the observation
of Hero of Alexandria.
1 Outstanding mathematician and physicist of the first century AD. In particular, he built the first
steam engine, the Aeolipile.
2 Fermat’s work Méthode pour la recherche du maximum et du minimum was published by Paul
Tannery and Charles Henry, Gauthier-Villars, 1896 (Third volume, excerpt, pp. 121–123).
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
J.-L. Basdevant, Variational Principles in Physics,
https://doi.org/10.1007/978-3-031-21692-3_1
1 Structure of Physical Theories
The list of famous physicists and mathematicians who have since left their names
to this work is considerable. This field now forms the conceptual framework of the-
oretical physics. It inspired the founders of statistical physics and quantum physics,
it became an essential tool of quantum field theory and is the basis of general rel-
ativity. It is also at the origin of many amazing applications in pure and applied
mathematics.3
Variational Principles
The mathematical formalisation of such ideas came first from Pierre de Fermat
(1601–1665), as we will see in Chap. 2. His principle of geometrical optics is a principle
of least time, which gives the Snell–Descartes law of refraction and also leads to
the understanding of mirages and curved rays.
In fact, everything started around 1637 in a lively critique addressed to Descartes
by Fermat about the notion of proof. Fermat’s irritation followed the publication of the
Dioptrics in Descartes’ Discourse on the Method. Fermat, a Toulouse magistrate, was a
mathematician, not a physicist, but he was interested in the structure of physical laws.4
The lack of rigour of Descartes’ “pseudo-proof” irritated Fermat, who was
convinced that things could be done properly: “It seems to me that a little geometry
will get us out of this mess”. When he succeeded in geometrically demonstrating the
law of refraction $n_1 \sin\theta_1 = n_2 \sin\theta_2$, Fermat was literally fascinated: “[...] I found
that my principle gave precisely the same proportion of refractions established by
Monsieur Descartes.”
In 1744, Pierre-Louis Moreau de Maupertuis (1698–1759) stated for the first
time what he called the “principle of least amount of action” for mechanics. He had
introduced Newton’s ideas in France in 1730. The statement and rationale originally
proposed by Maupertuis are very confused, but this is a historic date in the evolution
of ideas in physics and, at the time, in philosophy.
He pursued the works of Fermat, and he understood that, in well determined
conditions, Newton’s equations are equivalent to the fact that a quantity, which he
called Action, is minimal. In his own words:
The Action is proportional to the product of the mass by the velocity and by the space. Now,
here is this principle, so wise, so worthy of the supreme being: When there is some change
in the Nature, the amount of Action used for this change is always as small as possible.
For a particle of mass m and velocity v, the action of Maupertuis is therefore the product
of three factors: the mass, the speed and the distance travelled. Fortunately, a proper
formulation and demonstration of Maupertuis’s Principle of Least Action were given
shortly thereafter by his close friend Leonhard Euler (1707–1783), who realized
that Maupertuis’s action was the integral of the momentum over the Newtonian
trajectory.
These principles had a great impact in the 18th century. That the laws of nature
could be deduced from optimization principles, leading to a balance between con-
flicting causes, struck people’s minds. The “principle of natural economy” was a
fascinating view of things. It asserted an agreement between different laws of nature
in opposition or even in conflict. It was readily linked to the “principle of the best”
advocated by Leibniz.
4 He corresponded in particular with Étienne Pascal, father of Blaise Pascal, on mechanical
equilibrium.
In the last section of Chap. 2, we will turn to a completely analogous case, which
is fascinating by the number and the power of its consequences, since it deals with a complex
system, in contrast with the simplicity of the initial assumption. This is the basis of
statistical thermodynamics. By introducing the principle of equiprobability
of configurations, Boltzmann obtains an extremely simple definition of the notion
of temperature, accompanied by its primary property, which is the equalization of the
temperatures of systems in thermal contact. Then we will arrive at the statistical
and absolute definition of entropy, due to Boltzmann. This leads to the surprisingly
simple principle:
Thermodynamic equilibrium corresponds to a situation that maximizes entropy, that is,
disorder, given the constraints.
This result underlies many developments in sciences other than physics. Since the
beginning of the 20th century, the successive developments of the means of
communication, of electronics and of its advances through quantum mechanics have produced
vertiginous applications such as the theory of Information, or Communication, due to
Claude Shannon (1916–2001), an American engineer and mathematician who is
its “father”. His founding article, A Mathematical Theory of Communication,
published in 1948,5 opened a gigantic field of activity, both scientific and technical,
which our societies cannot ignore, and which finds part of its basis in the notion of
Boltzmann’s entropy, that we will see in Chap. 2. The word “digital” has become
ubiquitous in today’s societies. Quantum Information is a current sector of
activity for both researchers and industrial companies.
The same line of thought has been, understandably, one of the cornerstones in the
construction of economic models. The leading personality in this domain was Paul
A. Samuelson (1915–2009) [5], who published his first major work, Foundations of
Economic Analysis,6 in 1947. This work, which originated in his thesis, is based
on the chemical thermodynamics of Willard Gibbs. He was awarded the (second)
Nobel Prize in Economics in 1970 “for the scientific work through which he has
developed static and dynamic economic theory and actively contributed to raising
the level of analysis in economic science”. He was an adviser to President John
F. Kennedy. The book reveals a common mathematical structure underlying multiple
branches of economics from two basic principles: the maximizing behavior of agents
(such as utility for consumers and profits for firms) and the stability of equilibrium
(of markets or economies, for instance). One of its key insights about comparative statics, called
the correspondence principle, states that stability of equilibrium implies testable
predictions about how the equilibrium changes when parameters are changed.
The same line of thought has been developed in medicine and biology, in the
principles of maximization in evolutionary biology (evolution can be seen as a
maximization of entropy7), or in the development of population genetics in a Darwinian
evolution.
Coming back to physics, the variational principles have continued to produce richer
and richer physical results.
We will see, in Chap. 3, the contributions of Leonhard Euler (1707–1783) and
Joseph-Louis Lagrange (1736–1813), whose work was pursued by William R. Hamil-
ton (1805–1865). These are the fathers, together with Boltzmann, of the cornerstones
of contemporary theoretical physics.
The consequences of this vision of physics are at the source of Einstein’s gen-
eral relativity as well as quantum mechanics and modern theories of fundamental
interactions.
The central mathematical tool is the variational calculus. We owe it to Euler
who understood the mechanism, and Lagrange who made a decisive contribution in
1766.8 Variational calculus is an astonishing part of mathematics, both by its unifying
character and by the number of questions it has permitted to answer.
Euler published in 1744 his treatise Methodus inveniendi lineas curvas maximi
minimive proprietate gaudentes, which founded the variational calculus and had a
considerable influence on Lagrange. It is in this work that Euler justified a posteriori
the least action principle of his friend Maupertuis.
Lagrange was particularly gifted and precocious. Euler’s favourable reaction to his
work encouraged him and in 1756 he applied his techniques to the principle of least
action in a form that makes it the foundation of modern mechanics and theoretical
physics.
One of Lagrange’s major contributions is his Analytical Mechanics, where he
synthesizes all the methods he had previously developed in statics and dynamics. The
work, completed in 1782, appeared only in 1788 in Paris. Lagrange’s mechanics is
as important in the history of physics as the celestial mechanics of Newton.
This work will be the starting point for all later research, including that of Hamilton,
who, given the admiration he had for Lagrange, called it a “scientific poem written
by the Shakespeare of mathematics”. It was indeed Hamilton who named this theory,
and invented the word Lagrangian, after Lagrange.
8 Euler, who had been visually impaired since the age of 28, became blind in the same year 1766.
In 1754 he received the visit of the young Lagrange who showed him his works. Amazed by
this young man’s talent, he concealed his own results, to leave the merit to Lagrange alone.
This is an example of an almost unique, and now disappeared, act of human courtesy and passion
for science.
Chapter 4 will take us to the next step and to the so-called canonical formulation
of Analytical mechanics, due to Hamilton. It is actually a very long, abundant and
deep contribution to physics as well as mathematics (Hamilton invented quaternions,
which are the mathematical structure of spin 1/2, a fundamental feature of quantum
physics with which physicists, one century later, fought helplessly
for 25 years). This canonical formalism, dating from 1834,
is based not on the empirical variables and their time derivatives (x, ẋ) but on the
variables and their “conjugate momenta” (x, p). We will see that it brought many people to work
on ideas that were completely unpredictable before him and opened new paths to
physics. The canonical point of view is more convenient for a number of problems,
including mechanics of points or sets of points. But above all, it is impressive in its
mathematical and physical developments and its ability to bring out the symmetries
of problems.
We will refer to a large number of impacts of Hamilton’s work, including some
aspects of dynamical systems. This type of physical problem has indeed been an
extraordinary source of discoveries, both in physics and in mathematics. The founder
of this field of study is Henri Poincaré, in 1885, after he studied the three-body problem.
This leads to fascinating problems: limiting behaviour as t → ∞, attractors and
strange attractors, bifurcations, chaos, etc. Perhaps the most famous strange attractor
is the Lorenz attractor, named after Edward N. Lorenz, who discovered
it in 1963 in a mathematical model of the atmosphere. With the “butterfly effect” in
meteorology, he spectacularly revived interest in chaos, which had been invented by
Poincaré 80 years before.
We will then naturally come in Chap. 5, to the amazing discovery of the similarity
between analytical mechanics and geometrical optics, which Hamilton had under-
stood.
Between 1825 and 1828 he presented the theory of a single function, the charac-
teristic function, which unified mechanics, optics and mathematics and helped him
establish the wave theory of light. More precisely, Hamilton showed that mechanics
(of that time) had a striking similarity with geometrical optics, which, itself, was the
limit of wave optics. He wondered about what kind of theory would have classical
mechanics as a limit. Of course, at that time there was no experimental manifestation
of Planck’s constant. But both his characteristic function, which had the dimension
of an action, and his remarks were strong sources of inspiration for some fathers
of quantum theory, such as Dirac and Louis de Broglie, one century later. We shall
come back to this point in the last chapter of this book when we describe Richard
Feynman’s Path Integral approach to the sources of quantum mechanics.
Hamilton was fascinated by the action and the Fermat principle. He understood
that it is a stationary and not minimal principle of action. The variational principle,
also known as the Hamilton principle, is the essential element of his articles. We will
describe Hamilton’s results on geometrical optics, where he chooses to work directly
with the action in the form of his characteristic function S, function of canonical
variables (x, p). Hamilton formalized the fact that geometric optics is a limit of short
wavelength optics, and we will see its amazing structural similarity with mechanics.
Gravitational Waves
In Chap. 8, we will address general relativity itself by describing one of the greatest
results of recent years: the quantitative detection of gravitational waves, emitted by
accelerated masses. This verification took place one century after Einstein’s
prediction. The first event was detected on September 14, 2015 by the international
LIGO-Virgo collaboration. For these results, the 2017 Nobel Prize in Physics was
awarded to Rainer Weiss, Barry C. Barish and Kip Thorne. In this case, the waves
were emitted by a rotating binary system of black holes, before they merged into a
single black hole. A double discovery: gravitational waves and the existence of black holes!
Early evidence of gravitational wave emission was recognized by the 1993 Nobel
Prize awarded to Joseph H. Taylor and Russell A. Hulse “for the discovery of a new
type of pulsar, which opened up new possibilities for the study of gravitation”. It
was a result of exceptional precision on a totally unexpected phenomenon. Taylor
and Hulse had discovered the first example of a binary pulsar. The orbital
motion of this system radiates so much gravitational energy (although the signal
is very weak) that its orbital period decreases over time, at a rate that agrees with
the best theoretical calculations of General Relativity to within the accuracy of the measurements.
Quadrupolar gravitational waves propagate at the speed of light, but they are
waves of space-time itself, that is, of the medium which carries them. At the point of
detection, their amplitude, measured by the relative variation of the distance
(of proper time, to be precise) between two points of the detector, is of the order of $10^{-21}$
(the relative order of magnitude of an atom compared to the Earth-Sun distance).
The realization of the detection devices is in itself a prodigy.
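The order of magnitude quoted above can be checked with a few lines of arithmetic. The sketch below uses rounded, illustrative values (an atomic size of about one ångström, one astronomical unit for the Earth-Sun distance) and assumes a detector arm of 4 km, the length of the LIGO arms; the variable names are ours and are not taken from the book.

```python
# Orders of magnitude behind the comparison in the text (rounded, illustrative values).
atom_size_m = 1.0e-10            # typical atomic diameter, about one angstrom
earth_sun_distance_m = 1.496e11  # one astronomical unit
strain = 1.0e-21                 # relative amplitude quoted in the text

# Ratio of an atomic size to the Earth-Sun distance.
ratio = atom_size_m / earth_sun_distance_m
print(f"atom / Earth-Sun distance = {ratio:.1e}")

# Assumed arm length of a kilometre-scale interferometer (4 km, as for LIGO).
arm_length_m = 4.0e3
delta_l = strain * arm_length_m
print(f"change of arm length = {delta_l:.1e} m")
```

Under these assumptions the change in arm length comes out several orders of magnitude smaller than the size of an atom, which gives an idea of the experimental challenge.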
Our last Chap. 9 deals with the variational formulation of quantum mechanics due
to Feynman.
In 1941, he discovers a 1932 text by Dirac which contains a remarkable idea that
will allow him to build a completely new variational formulation of non-relativistic
quantum mechanics. In this work, he introduces the mathematical concept of path
integrals, which has been developed ever since. This will be the subject of his thesis,
defended in May 1942 and published only after the end of the war.
Dirac, like Schrödinger and Louis de Broglie, had reread Hamilton’s articles,
and in particular meditated on the characteristic function and the connection between
geometric optics and classical mechanics. He was interested in the phase, that is, the ratio of
the action S to the universal Planck constant $\hbar$, in the expression $\exp(iS/\hbar)$, similar
to Hamilton’s characteristic function in optics. But he did not know how to make an
essential calculation.
Feynman solved the problem and formulated quantum mechanics on the basis of
the hypothesis of the existence of probability amplitudes, the principle of superpo-
sition and this quantum version of the characteristic function. He could thus deduce
from these assumptions the form and algebra of observables and the evolution equa-
tion, by writing the amplitude of probability of a process as the superposition of
amplitudes from the totality of possible quantum paths, generalization of interfer-
ence by Young’s holes to the complete set of possible trajectories. The sum of all the
amplitudes realizing the process under consideration is a subtle mathematical object
called a path integral, on which all the formalism is based.
Feynman showed that one thus obtains the relations of Einstein and de Broglie, as
well as the Schrödinger equation, the algebra of observables and all of traditional
quantum mechanics. Countless results have been achieved, and this tool plays a central
role in contemporary quantum field theory.
If we consider systems and processes where the classical action S is macroscopic,
that is to say much greater than the Planck constant $\hbar$, paths that
may seem very close to each other in the classical sense, but for which the difference
in the action is also much larger than $\hbar$, will, with a
high probability, interfere destructively. The contribution of all such paths is then
zero. In other words, under these conditions, only an infinitesimal neighbourhood of
the classical trajectory remains, which is impossible to scrutinize experimentally in detail.
The “probability” of the classical trajectory is therefore equal to one. Thus, [...]
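The destructive interference of paths far from the classical one can be illustrated with a toy model. The sketch below is not Feynman's path integral: it assumes a one-parameter family of "paths" labelled by a real number x, with a hypothetical action S(x) = x² (arbitrary units) that is stationary at x = 0, and a small value of ħ in the same units. It simply compares the sum of the phases exp(iS/ħ) near the stationary point with the sum far from it.

```python
import cmath
import math

def phase_sum(a, b, hbar, n=200_000):
    """Trapezoidal estimate of the integral of exp(i*S(x)/hbar) on [a, b], with S(x) = x**2."""
    dx = (b - a) / n
    total = 0.5 * (cmath.exp(1j * a * a / hbar) + cmath.exp(1j * b * b / hbar))
    for k in range(1, n):
        x = a + k * dx
        total += cmath.exp(1j * x * x / hbar)
    return total * dx

hbar = 0.01  # hypothetical value, small compared with the actions considered

near = phase_sum(-0.5, 0.5, hbar)  # paths close to the stationary point x = 0
far = phase_sum(0.5, 3.0, hbar)    # paths far from it, over a wider range
full = phase_sum(-3.0, 3.0, hbar)  # all paths considered

print("near:", abs(near))
print("far: ", abs(far))
print("full:", abs(full), " sqrt(pi*hbar):", math.sqrt(math.pi * hbar))
```

In this toy model the wide region far from the stationary point contributes much less than the narrow region around it, because the rapidly rotating phases cancel; the total is close to the Gaussian-type value sqrt(πħ) in modulus.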
recover and find local laws such as Newton’s law on the proportionality of force
and acceleration of a body in mechanics, or Coulomb’s law on the force between
two electric charges. This approach, deeper, richer and more powerful,
gives local laws their full origin and exhibits the fundamental principles which
are behind them. In fundamental physics, quantum physics, elementary particles or
general relativity, it is mainly symmetry properties and invariance laws that govern
the construction of theories.
In this chapter, we review some important examples and introduce the necessary
mathematical tools that show how they work. In Sect. (2.1), we turn back to the
Fermat principle, in particular Fermat’s proof of the laws of refraction. Fermat did
not know the velocity of light and the existence of an index of refraction. He assumed
that the time it takes light to travel a certain distance in a medium is increased by the
“resistance” of that medium to the propagation of light. Fermat stated his “least time
principle” at the end of 1661. He called it the “principle of natural economy.” This
principle explains curved light rays and mirages, which the Snell–Descartes laws
do not account for. This will directly lead us to the central underlying mathematical
foundation of the problem under consideration: the variational calculus of Euler and
Lagrange in Sect. (2.2). It is an amazing chapter of mathematics, both by its unifying
aspect and by the number of problems that it allows one to solve. The mathematical details
are remarkably exposed by Jean-Pierre Bourguignon [8]. We deliberately do
not go into mathematical details here. Such details can be found in the literature,
and we shall focus on physical applications and results.2
The direct consequences of this first mathematical approach are the phenomena
of optical mirages, which we discuss and illustrate. In Sect. (2.3), we
will give a first example of the “least action principle,” as first stated by Maupertuis in
mechanics in 1744. This was a landmark in the evolution of ideas in physics as well
as philosophy at the time.
In this chapter, we will examine three important examples and the mathematical
tools needed to formalise them. In Sect. 2.1, we return to the Fermat principle, and
in particular to Fermat’s demonstration of the laws of reflection and refraction. Fermat
did not know about the speed of light and refractive indices. Assuming that the time
it takes light to travel in a medium is increased by the “resistance” of that medium, Fermat
enunciated at the end of 1661 his least time principle, which he called the “principle of
natural economy”. This principle directly explains, as we shall see, the curved rays
responsible for mirage phenomena, which are absent in the framework of the Snell-Descartes laws.
This will lead us, in Sect. (2.2), to a first approach to the mathematical core of our
subject: the variational calculus of Euler and Lagrange. This is an amazing part
of mathematics, by its unifying aspect, by the number of questions it allows us to
answer and by the diversity of its consequences and applications. We’ll come
back to those points later on. The mathematical details are remarkably exposed by
Jean-Pierre Bourguignon [8].
2 In addition to this mathematical presentation, a classic treatise is C. Lanczos’s book, The Variational
Principles of Mechanics, Dover Publications, 1970, and Lawrence Schulman’s highly detailed work
[27]; there is a basic presentation in: P.K. Townsend, Variational Principles, Part 1A Mathematics
Tripos 2018, https://www.damtp.cam.ac.uk/user/examples/B6La.pdf.
2.1 Fermat’s Least Time Principle 13
14 2 Variational Principles
Fig. 2.1 Possible light rays between an emitter A and an observer B when there is a reflection on
a plane. Since B′ is symmetric to B with respect to the plane of the mirror, the length of AOB is
equal to that of AOB′. The shortest path between A and B′ is a straight line. A path AFB is longer
whenever F ≠ O
Refraction
Concerning the laws of refraction, Descartes had assumed that the velocity of light
in matter (a dense medium) is larger than in vacuum (or in a diluted medium).4 That
fact, together with the lack of rigor of Descartes’s “proof,” had made Fermat angry.
He was convinced that things could be done properly.
Fermat solved the problem of refraction only much later, in 1661, annoyed by the
critiques of Descartes’s supporters. The key point of his proof lies in the assumption
that the velocity of light is just smaller in a dense medium than in a dilute one.
Let (X, Y) be the plane separating the two media of indices $n_1$ and $n_2$. The source
is at point A and the observer is at point B, as represented in Fig. 2.2.
Let $H$ and $H'$ be the projections of A and B on the (X, Y) plane. We denote by $h$
the distance of A to the surface and by $h'$ that of B. The distance $HH'$ is $l$. We consider
a path AOB and we denote by $x$ the distance $HO$. We want to minimize the optical
path $n_1\,AO + n_2\,OB$.
By the Pythagorean theorem, we have
\[ AO^2 = h^2 + x^2, \qquad OB^2 = h'^2 + (l - x)^2. \]
The time taken by light to travel along the path AOB is
\[ T = (n_1\,AO + n_2\,OB)/c. \tag{2.1} \]
4 This idea probably comes from the fact that many interfaces under consideration were horizontal
liquid surfaces, perpendicular to the direction of gravity. Since the refracted light ray appears to be
closer to the vertical when it passes from air to water, for instance, it seemed intuitive that it “falls”
more rapidly.
Fig. 2.2 Possible light ray between an emitter A and an observer B when there is refraction across
a plane surface between two media of indices $n_1$ and $n_2$. $H$ and $H'$ are the projections of A and B
on the surface. $h$ is the distance between A and the surface, and $h'$ that between B and the surface.
The distance $HH'$ is $l$
Here, we give an analytic proof, at present simpler to understand than the beau-
tiful purely geometric argument of Fermat. (Fermat did not fully know differential
calculus, which was developed later by Newton and Leibniz, although he had correct
ideas on the subject.)
We seek $x$ such that (2.1) is minimal. By taking the derivative of this expression
with respect to $x$, and writing that the derivative $dT/dx$ vanishes, we obtain
\[ \frac{n_1\, x}{\sqrt{h^2 + x^2}} = \frac{n_2\,(l - x)}{\sqrt{h'^2 + (l - x)^2}}. \tag{2.2} \]
We note that
\[ \frac{x}{\sqrt{h^2 + x^2}} = \cos\theta_1 = \sin i_1, \quad\text{and}\quad \frac{l - x}{\sqrt{h'^2 + (l - x)^2}} = \cos\theta_2 = \sin i_2, \tag{2.3} \]
where the angles $\theta_1$ and $\theta_2$ are indicated in the figure, and $i_1$ and $i_2$ are the angles of
incidence and refraction.
Therefore, we obtain the Snell–Descartes law
\[ n_1 \sin i_1 = n_2 \sin i_2. \]
[...] “principle of natural economy,” and he added the remark that “Nature always acts by
the shortest paths.” As we said, this principle had a great impact in the 18th century.
It was used by Maupertuis in mechanics.
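The minimization of the travel time (2.1) can also be checked numerically. The following sketch minimizes T over the position x of the crossing point with a golden-section search, then compares n1 sin i1 with n2 sin i2. The geometry and the indices (roughly air and water) are hypothetical values chosen for illustration, c is set to 1, and the function names are ours.

```python
import math

def travel_time(x, h, h_prime, l, n1, n2, c=1.0):
    """Travel time (2.1) along the broken path A -> O -> B, where x is the distance HO."""
    ao = math.hypot(h, x)
    ob = math.hypot(h_prime, l - x)
    return (n1 * ao + n2 * ob) / c

def minimize_golden(f, a, b, tol=1e-12):
    """Golden-section search for the minimum of a unimodal function on [a, b]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    p = b - inv_phi * (b - a)
    q = a + inv_phi * (b - a)
    while abs(b - a) > tol:
        if f(p) < f(q):
            b = q
        else:
            a = p
        p = b - inv_phi * (b - a)
        q = a + inv_phi * (b - a)
    return 0.5 * (a + b)

# Hypothetical geometry and indices, chosen only for illustration.
h, h_prime, l = 1.0, 2.0, 3.0
n1, n2 = 1.0, 1.33

x_min = minimize_golden(lambda x: travel_time(x, h, h_prime, l, n1, n2), 0.0, l)
sin_i1 = x_min / math.hypot(h, x_min)
sin_i2 = (l - x_min) / math.hypot(h_prime, l - x_min)

# The two products should agree up to the accuracy of the search.
print(n1 * sin_i1, n2 * sin_i2)
```

The search relies on T(x) being a convex function of x (a sum of two convex terms), so that it has a single minimum on the interval.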
Rescuing a Swimmer
This result can be transposed into many other situations. One example is the optimal
path that a rescuer must follow on a beach and in the water in order to rescue a bather
in difficulty as quickly as possible. The velocities of the rescuer on the beach, v1 ,
and in the water, $v_2$, are not the same. The optimal trajectory, which can be sketched
as in Fig. 2.2, obeys the law
\[ \sin i_1 / v_1 = \sin i_2 / v_2. \]
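As an illustration, the rescuer's law can be solved for the point where the rescuer should enter the water, and the resulting time compared with that of the straight line to the bather. This is only a sketch: the distances and speeds below are hypothetical, and the root of sin i1/v1 − sin i2/v2 is found by a simple bisection.

```python
import math

def rescue_time(x, d_beach, d_water, l, v1, v2):
    """Total time when the rescuer enters the water at horizontal offset x."""
    return math.hypot(d_beach, x) / v1 + math.hypot(d_water, l - x) / v2

def snell_residual(x, d_beach, d_water, l, v1, v2):
    """sin(i1)/v1 - sin(i2)/v2, which vanishes on the optimal trajectory."""
    sin_i1 = x / math.hypot(d_beach, x)
    sin_i2 = (l - x) / math.hypot(d_water, l - x)
    return sin_i1 / v1 - sin_i2 / v2

def bisect(f, a, b, iterations=200):
    """Bisection for a root of f on [a, b], assuming f(a) and f(b) have opposite signs."""
    for _ in range(iterations):
        m = 0.5 * (a + b)
        if f(a) * f(m) <= 0.0:
            b = m
        else:
            a = m
    return 0.5 * (a + b)

# Hypothetical situation: distances in metres, speeds in metres per second.
d_beach, d_water, l = 20.0, 30.0, 50.0
v1, v2 = 7.0, 1.5  # running on the beach, swimming in the water

x_opt = bisect(lambda x: snell_residual(x, d_beach, d_water, l, v1, v2), 0.0, l)
x_straight = l * d_beach / (d_beach + d_water)  # entry point of the straight line

t_opt = rescue_time(x_opt, d_beach, d_water, l, v1, v2)
t_straight = rescue_time(x_straight, d_beach, d_water, l, v1, v2)
print(f"optimal entry point: {x_opt:.2f} m, time {t_opt:.2f} s")
print(f"straight-line entry: {x_straight:.2f} m, time {t_straight:.2f} s")
```

With a rescuer who runs faster than he swims, the optimal entry point lies further along the beach than the straight-line one: it pays to lengthen the fast part of the path in order to shorten the slow part.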
Curved Rays
Fig. 2.3 Light ray between an emitter A and an observer B in a medium whose index of refraction
varies with the altitude z. The variable x is the horizontal distance. We assume that the problem is
translation invariant in the perpendicular y direction. The apparent direction of point A as seen by
B is the tangent to the light ray reaching B
2.2 Variational Calculus of Euler and Lagrange 17
where the endpoints A and B are fixed, where $\dot z(x) \equiv dz(x)/dx$, and where $L$ is a
known function, called the Lagrange function. Clearly, this is exactly
the problem of Eq. (2.5).
Let us assume there exists a solution, that we denote as z = Z (x). We want that, for
any infinitesimal variation δz(x) of Z (x), there corresponds a second-order (or more)
variation of the integral I . In the transformation Z → Z + δz(x), Ż → Ż + δ ż(x),
where it is assumed that the endpoints of the integration do not change, δz(A) =
δz(B) = 0, the variation δ I of the integral is
$$\delta I = \int_A^B \left( \frac{\partial L}{\partial z}\,\delta z(x) + \frac{\partial L}{\partial \dot z}\,\delta \dot z(x) \right) dx.$$
The second term can be integrated by parts since by definition δ ż = (d/d x)δz. Since
δz(A) = δz(B) = 0, the variation δ I is
18 2 Variational Principles
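$$\delta I = \int_A^B \left( \frac{\partial L}{\partial z} - \frac{d}{dx}\frac{\partial L}{\partial \dot z} \right) \delta z(x)\, dx. \tag{2.7}$$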
We want this integral to vanish for any infinitesimal variation δz. The integrand
must vanish identically. Therefore, the solution z = Z (x) must satisfy the second-
order differential equation
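$$\frac{\partial L}{\partial z} = \frac{d}{dx}\left(\frac{\partial L}{\partial \dot z}\right), \tag{2.8}$$
$$\frac{\partial L}{\partial \dot z} = C_1. \tag{2.10}$$
$$\frac{\partial L}{\partial z} = 0. \tag{2.11}$$
$$\frac{dL}{dx} = \frac{\partial L}{\partial x} + \frac{\partial L}{\partial z}\,\dot z + \frac{\partial L}{\partial \dot z}\,\ddot z.$$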
or equivalently:
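$$\frac{d}{dx}\left( L - \dot z\,\frac{\partial L}{\partial \dot z} \right) = \dot z \left( \frac{\partial L}{\partial z} - \frac{d}{dx}\frac{\partial L}{\partial \dot z} \right). \tag{2.13}$$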
Let us come back to the case of curved rays considered above. Consider the integral
(2.5) and let us assume for definiteness that the index of refraction varies with the
altitude as n(z) = 1 + νz, with ν constant. We also assume that the endpoints are at
the same height z(x = 0) = h and z(x = l) = h. The Lagrange function is
$$L = \frac{1}{c}\,(1 + \nu z)\sqrt{1 + \dot z(x)^2},$$
from which we deduce the Lagrange–Euler equation
$$\nu\,(1 + \dot z^2) = (1 + \nu z)\,\ddot z. \tag{2.15}$$
The solution of this equation is simple. We shift to the function u = z + 1/ν and
insert this into (2.15). We obtain
$$1 + \dot u^2 = u\,\ddot u, \tag{2.16}$$
Fig. 2.4 Diagrams of an inferior mirage (left) and a superior mirage (right)
In this simplistic model, the trajectory of the curved ray is a cosh function whose
minimum (or maximum) altitude is attained at x = l/2 (by the symmetry of the
problem).
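As a quick numerical check of the cosh solution, the sketch below verifies by finite differences that a curve of the form u(x) = a cosh((x − l/2)/a) satisfies 1 + u̇² = u ü. The scale `a`, the span `l`, and the function names are assumptions made for this illustration; in the physical problem `a` would be fixed by the boundary condition at the endpoints.

```python
import math

def u(x, a, x_mid):
    """Candidate solution: a cosh curve centred at x_mid."""
    return a * math.cosh((x - x_mid) / a)

def ode_residual(x, a, x_mid, step=1e-3):
    """Residual of 1 + u'^2 - u u'' using central finite differences."""
    u_minus, u_zero, u_plus = u(x - step, a, x_mid), u(x, a, x_mid), u(x + step, a, x_mid)
    du = (u_plus - u_minus) / (2.0 * step)
    d2u = (u_plus - 2.0 * u_zero + u_minus) / step**2
    return 1.0 + du**2 - u_zero * d2u

a, l = 2.0, 3.0  # illustrative values (assumptions)
xs = [l * j / 50 for j in range(51)]
max_residual = max(abs(ode_residual(x, a, l / 2.0)) for x in xs)
print(max_residual)
```

The residual is limited only by the finite-difference step, and the curve is symmetric about x = l/2 as stated.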
This situation is encountered in mirages. Perhaps the most common is highway
shimmer. Parts of a hot highway can appear as “lakes.” This sort of mirage is sketched
in Fig. 2.4. The index of refraction is smaller near the highway, where the temperature
is high and the air less dense, whereas it increases with the altitude, where the
temperature is lower. The “lake” is a reflection of the sky. Such a case is called an
inferior mirage. The apparent image is below the actual direction of the object.
As one can understand from this simple example, a more complex variation of
the index of refraction n(z) will lead to a variety of phenomena. The reverse happens
if the index is smaller at high altitudes than at lower altitudes. This type of situation
happens when light rays propagate near a hot hill. These are called superior mirages.
One can then see an object that should be hidden geometrically by the hill, such as
the mirages in the desert.
At sunset, one can see the sun for quite a long time after it has gone below the
geometrical horizon. As shown in Fig. 2.5, when the sun is close to the horizon,
light rays cross an atmosphere whose index of refraction varies considerably with
the altitude. At sunset, the angle between the apparent direction of the sun and its
actual direction is roughly half a degree. The sun is far below the horizon (see [4]
for other examples).
Mirages happen frequently in the Arctic and Antarctic. For a long time, the line
of sight crosses a large thickness of the atmosphere. Over that distance, the density
and chemical composition of the atmosphere can vary considerably. This results in
spectacular effects.
Figure 2.6 is a picture taken during a German expedition led by the ship Germania
in the Arctic in 1888. It is particularly rich, since for both ships there are two superior
mirages, inverted with respect to one another. Between the ships, one can see an
iceberg. This picture is reminiscent of the legend of the Flying Dutchman5 (in the
Southern Hemisphere).
Figure 2.7 shows two mirages. Above, an inferior mirage of the sun in
the forefront is caused by the strong density and temperature variations inside a layer
of clouds visible in the picture. Below, a superior mirage of icebergs in the Arctic
(courtesy of Pekka Parviainen). See also http://www.atoptics.co.uk/.
The variations of the index of refraction of the atmosphere generate a series of
effects, in particular lensing effects. It is possible to observe islands, ships, and coasts
that are several hundred kilometers away. The variation of the index allows one to
see and take pictures of the famous “Green ray” at sunset (see Pekka Parviainen at
http://virtual.finland.fi/finfo/english/mirage2.html).
5 The “Flying Dutchman” was a famous sailor. He claimed he could sail around the Cape of Good
Hope whatever the weather conditions. Years after he disappeared in a huge storm, many sailors
claimed they had seen his ship, in particular in the sky, which was proof that storms were unable to
beat him.
Fig. 2.7 Above: remarkable double sunset mirage observed at Paranal Observatory in the Atacama
Desert, Chile, by Luc Arnold in 2002 at the site of the European Southern Observatory Very Large
Telescope (courtesy of Luc Arnold). Below: superior mirage of icebergs in the Arctic (courtesy
of Pekka Parviainen)
In 1744, Maupertuis stated for the first time his principle of the least quantity of
action in mechanics. Even though the initial version and justification of Maupertuis
are confused, it is a historical landmark in the evolution of ideas, both in physics and,
at the time, in philosophy.
Consider a particle of mass m and velocity v. The action of Maupertuis is the
product of three terms: the mass, the velocity, and the distance covered.
Actually, it is the integral of the linear momentum along the trajectory: $A = \int m v\, dl$.
The correct formulation and the proof of Maupertuis’s principle were given a
little later by Euler. In present terminology, consider a point-like particle of mass m
in a potential $V(\mathbf{r})$. We denote by $\mathbf{v}$ the velocity and by $v$ its norm. Assuming (this is
essential) that the energy $E$ is a constant of the motion, we have
$$E = \frac{1}{2} m v^2 + V(\mathbf{r}).$$
2.3 Maupertuis, Principle of Least Action 23
where dl is the length element along the trajectory. The principle of Maupertuis is
that the physical trajectory that the particle follows to go from a to b with a fixed
energy E is the path that makes (2.19) minimum.
There are many proofs. We parameterize the state variables {r, ṙ}, where r =
(x, y, z), by the time t on the physical trajectory (i.e., we work with {r(t), ṙ(t)}).
The times of departure $t_a$ and arrival $t_b$ are therefore well defined. We have
$dl = v\,dt = \sqrt{\dot x^2 + \dot y^2 + \dot z^2}\,dt$, and the action (2.19) is
$$A_{a,b} = \int_{t_a}^{t_b} \sqrt{2m\,(E - V(\mathbf{r}))}\,\sqrt{\dot x^2 + \dot y^2 + \dot z^2}\; dt. \tag{2.20}$$
$$2m\,(E - V(\mathbf{r})) = m^2 v^2.$$
$$-\nabla V = m\,\frac{d\mathbf{v}}{dt} \quad \text{QED.} \tag{2.22}$$
Consider now a slightly more complicated problem. We want to determine the elec-
trostatic potential φ(r) created by a given distribution of charges ρ(r). We know that
the answer is Poisson’s law,
$$\Delta\phi = -\frac{\rho}{\varepsilon_0}. \tag{2.23}$$
The problem under consideration is to find the potential φ(r) that minimizes this
expression.
We remark on the following points:
1. As usual, we assume there are no charges at infinity, so that φ can be chosen to
vanish at infinity. The integrals run over all three-dimensional space.
2. Since the first term is positive, if there exists a minimum of this expression for
a function φ(r), this minimum corresponds to an equilibrium situation. In this
respect, it is similar to the case of the massive string in Sect. 2.5. There is an
equilibrium between two contributions to the total energy that compete with one
another. Any excess of one form of energy corresponds to an unstable situation.
3. In comparison with the mirage (2.5) or the massive string (A.1), it is the potential
φ and its gradient ∇φ that play the role of the previous single variable z and its
derivative ż. The variable x of the previous simple examples is now a point r of
three-dimensional space (i.e., r ∈ R3 ).
Let φ be the solution and η(r) an infinitesimal variation of this potential. In the
variation φ → φ + η, we have, to first order,
Integrating the first term by parts, and taking into account the fact that φ vanishes at
infinity, we obtain
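$$\int (\nabla\phi \cdot \nabla\eta)\, d^3r = -\int \Delta\phi\;\eta\, d^3r.$$
Therefore,
$$\delta U = \int \left[ -\varepsilon_0\,\Delta\phi - \rho \right] \eta\, d^3r. \tag{2.26}$$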
The fact that δU = 0 for any infinitesimal η(r) yields the Poisson Eq. (2.23)
2.4 Thermodynamic Equilibrium: Maximal Disorder 25
$$\Delta\phi = -\frac{\rho}{\varepsilon_0}.$$
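The link between minimizing the energy and solving Poisson's equation can be illustrated on a small discretized problem. The sketch below is a one-dimensional toy model, not the three-dimensional problem of the text. It assumes an energy of the form U = ∫ [ε0 (φ′)²/2 − ρφ] dx, which is the form consistent with (2.26), a constant charge density, and φ = 0 at both ends of a unit interval (instead of at infinity). Each grid value is repeatedly set to the value that minimizes U with the neighbours held fixed; all names and numerical values are hypothetical.

```python
def energy(phi, rho, eps0, h):
    """Discretized U = sum of eps0/2 * (dphi/dx)^2 * h  minus  rho * phi * h."""
    gradient_term = sum((phi[i + 1] - phi[i]) ** 2 for i in range(len(phi) - 1)) * eps0 / (2.0 * h)
    source_term = sum(rho * value for value in phi) * h
    return gradient_term - source_term

def minimize_energy(n, rho, eps0, sweeps=2000):
    """Coordinate-wise minimization of U on n intervals with phi = 0 at both ends."""
    h = 1.0 / n
    phi = [0.0] * (n + 1)
    for _ in range(sweeps):
        for i in range(1, n):
            # Setting dU/dphi_i = 0 with the neighbours fixed gives this update.
            phi[i] = 0.5 * (phi[i - 1] + phi[i + 1] + rho * h * h / eps0)
    return phi, h

n, rho, eps0 = 20, 1.0, 1.0  # illustrative units (assumptions)
phi, h = minimize_energy(n, rho, eps0)

# Exact solution of phi'' = -rho/eps0 with phi(0) = phi(1) = 0.
exact = [rho / (2.0 * eps0) * (i * h) * (1.0 - i * h) for i in range(n + 1)]
max_error = max(abs(p - e) for p, e in zip(phi, exact))

# Any perturbation that keeps the boundary values raises the energy.
perturbed = list(phi)
perturbed[n // 2] += 0.01
print(max_error, energy(phi, rho, eps0, h), energy(perturbed, rho, eps0, h))
```

The minimizer of the discretized energy coincides with the solution of the discretized Poisson equation, and the perturbed potential has a strictly larger energy.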
A particular case is when the charge density vanishes. By that, we mean that there
are a certain number of charged conductors each of which is at a given potential
V1 , V2 , . . . , Vn . There is a surface charge density, but the volume density ρ vanishes
everywhere. Let $\Sigma_1, \Sigma_2, \ldots, \Sigma_n$ be the surfaces of the conductors. Then Eq. (2.26)
boils down to
$$\Delta\phi = 0,$$
Let us turn to a case that is similar in its motivation but that has fascinating conse-
quences compared with the simplicity of the starting point.
As Schrödinger wrote [9], there is, basically, only one problem in statistical ther-
modynamics: the distribution of a given amount of energy E over N identical systems.
We only consider here classical statistical thermodynamics. Quantum statistics
is outside the scope of this book. The only “quantum” feature lies in the fact that we
assume there are discrete energy levels.
We consider an isolated assembly of N identical systems {s1 , s2 , . . . , s N }, each
of which can occupy one of the energy levels εk (for instance, the energy levels in a
box where we place the atoms of a monatomic gas).
We assume that the pairwise interactions of these systems are weak in the sense
that they do not affect their energy levels. The energy of the assembly is therefore
the sum of the energies of the N systems.
Let us call the state or configuration of the assembly the fact that
The ei belong to the set {εk } and, of course, the sum is equal to the (given) total
energy E.
We call distribution of the N systems the fact that
etc.
$$\sum_i n_i = N \quad\text{and}\quad \sum_i n_i \varepsilon_i = E.$$
$$\ln W = N \ln N - N - \sum_i n_i \ln n_i + \sum_i n_i = \sum_i n_i \ln(N/n_i) = -N \sum_i p_i \ln(p_i), \tag{2.28}$$
where we have introduced the probabilities pi = n i /N .
We want to find the distribution {n i } that maximizes this expression under the
constraints
$$\sum_i n_i = N, \qquad \sum_i n_i \varepsilon_i = E. \tag{2.29}$$
In order to solve this problem, we use the technique of Lagrange multipliers, which
has many applications.
The problem under consideration is to find the maximum of a function f (x, y)
with the constraint that (x, y) lie on a path y = y0 (x) (for instance, find the highest
point not of a mountain but of a road on that mountain).
Of course, one can think of injecting the equation of the path in f and calculating
x such that
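$$\frac{d}{dx} f(x, y_0(x)) = \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y}\,\frac{d}{dx}\bigl(y_0(x)\bigr) = 0. \tag{2.30}$$
$$\frac{\partial f}{\partial x} + \lambda \frac{\partial g}{\partial x} = 0 \;\;(1), \qquad \frac{\partial f}{\partial y} + \lambda \frac{\partial g}{\partial y} = 0 \;\;(2), \qquad g = 0 \;\;(3). \tag{2.32}$$
$$\frac{\partial f}{\partial x} + \lambda \frac{\partial y_0}{\partial x} = 0 \;\;(1), \qquad \frac{\partial f}{\partial y} - \lambda = 0 \;\;(2). \tag{2.33}$$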
Eliminating λ between (1) and (2) obviously amounts to solving the initial equation
(2.30).
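The equivalence between the direct substitution (2.30) and the multiplier conditions (2.33) can be checked on a toy case. In the sketch below, the function f(x, y) = xy and the path y0(x) = 2 − x are arbitrary choices made for this illustration; the stationary point is first found from (2.30) by bisection, then λ is read off from condition (2) and condition (1) is verified. Derivatives are taken by central differences, and all names are hypothetical.

```python
def f(x, y):
    return x * y          # illustrative function (assumption)

def y0(x):
    return 2.0 - x        # illustrative path (assumption)

def derivative(func, x, h=1e-6):
    """Central finite-difference derivative of a one-variable function."""
    return (func(x + h) - func(x - h)) / (2.0 * h)

def total_derivative(x):
    """d/dx f(x, y0(x)), the left-hand side of (2.30)."""
    return derivative(lambda s: f(s, y0(s)), x)

# Bisection on (2.30): the total derivative is positive at x = 0 and negative at x = 2.
lo, hi = 0.0, 2.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if total_derivative(mid) > 0.0:
        lo = mid
    else:
        hi = mid
x_star = 0.5 * (lo + hi)
y_star = y0(x_star)

# Condition (2) of (2.33) fixes lambda; condition (1) must then hold as well.
lam = derivative(lambda t: f(x_star, t), y_star)
residual_1 = derivative(lambda s: f(s, y_star), x_star) + lam * derivative(y0, x_star)
print(x_star, y_star, lam, residual_1)
```

For this choice the constrained maximum sits at x = y = 1 with λ = 1, and the residual of condition (1) vanishes up to finite-difference error.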
This method applies in the case of a function $f(\{x_i\})$ of any number of variables
$x_i$, $i = 1, \ldots, n$, related by any number $p$ of constraints $g_k(\{x_i\}) = 0$, $k = 1, \ldots, p$
(with $p < n$).
It is simpler to work not with the occupation numbers n i but with the probabilities
pi = n i /N , which can be considered continuous quantities since N is very large.
$$\sum_i p_i = 1, \qquad \sum_i p_i \varepsilon_i = E/N.$$
$$\ln W = \sum_i n_i \ln(N/n_i) = -N \sum_i p_i \ln(p_i).$$
We must therefore introduce two Lagrange multipliers, α and β, and the proba-
bility law { pi }, which maximizes this expression under the above constraints, is the
function for which the variation of the quantity
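$$\delta \ln W - \alpha\,\delta N - \beta\,\delta E \tag{2.34}$$
vanishes. In other words, whatever the $\delta p_i$ (with $\sum_i \delta p_i = 0$), the following variation
must vanish
$$-\sum_i \delta p_i \ln(p_i) - \alpha \sum_i \delta p_i - \beta \sum_i \varepsilon_i\,\delta p_i. \tag{2.35}$$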
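$$\sum_i e^{-\alpha - \beta\varepsilon_i} = 1, \qquad \sum_i \varepsilon_i\, e^{-\alpha - \beta\varepsilon_i} = \frac{E}{N}.$$
$$e^{-\alpha} = \frac{1}{\sum_i e^{-\beta\varepsilon_i}}. \tag{2.36}$$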
Therefore, the equilibrium distribution is characterized by the fact that there exists a
number β related to the total energy E by
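$$E = N\,\frac{\sum_i \varepsilon_i\, e^{-\beta\varepsilon_i}}{\sum_i e^{-\beta\varepsilon_i}}. \tag{2.37}$$
$$\beta = 1/kT, \tag{2.38}$$
$$p_i = \frac{e^{-\varepsilon_i/kT}}{Z} \tag{2.39}$$
where the function
$$Z = \sum_i e^{-\varepsilon_i/kT} \tag{2.40}$$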
is called the partition function of the system (from the German Zustandssumme,
sum over states). This function plays an important role in statistical physics; −k ln Z
is the free energy divided by T . The form (2.39) is called the Boltzmann–Gibbs
distribution.
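Relation (2.37) determines β implicitly from the energy per system. The following minimal sketch solves it by bisection for a small set of equally spaced levels and then builds the distribution (2.39). The levels, the target energy per system, and the function names are illustrative assumptions, with energies expressed in arbitrary units so that β carries the inverse unit.

```python
import math

def boltzmann(levels, beta):
    """Probabilities p_i = exp(-beta * eps_i) / Z for the given energy levels."""
    weights = [math.exp(-beta * eps) for eps in levels]
    z = sum(weights)
    return [w / z for w in weights]

def mean_energy(levels, beta):
    """Energy per system, E/N, for the distribution at the given beta."""
    return sum(p * eps for p, eps in zip(boltzmann(levels, beta), levels))

def solve_beta(levels, target, lo=1e-9, hi=50.0, iterations=200):
    """Bisection on beta > 0: the mean energy decreases as beta increases."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if mean_energy(levels, mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

levels = [0.0, 1.0, 2.0, 3.0]   # illustrative energy levels (assumption)
target = 1.0                    # prescribed E/N, below the beta -> 0 average of 1.5
beta = solve_beta(levels, target)
p = boltzmann(levels, beta)
print(beta, p)
```

Both constraints are satisfied by construction, and ln p_i + β ε_i takes the same value −ln Z on every level, which is the stationarity condition obtained from (2.35).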
Consider two assemblies $\mathcal{E}$ and $\mathcal{E}'$, which may be of different natures, formed respectively
of $N$ and $N'$ systems $S$ and $S'$. The energy levels of $S$ are $\varepsilon_i$ and those of $S'$
are $\varepsilon'_j$. These two assemblies are in “thermal contact,” which means that they can
exchange energy but that their interaction is sufficiently weak that it does not change
the individual energy levels $\varepsilon_i$ and $\varepsilon'_j$ of the two systems considered separately.
Furthermore, these two systems are isolated. We denote by $E$ the total energy.
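1. The number $W$ of states in a distribution $(\{n_i\}, \{n'_j\})$ of the systems $S$ and $S'$ is
$$W = \frac{N!\,N'!}{\prod_i (n_i!)\,\prod_j (n'_j!)}.$$
$$\sum_i n_i = N, \qquad \sum_j n'_j = N', \qquad \sum_i n_i \varepsilon_i + \sum_j n'_j \varepsilon'_j = E.$$
$$\delta\Bigl[ \ln W - \alpha N \sum_i p_i - \alpha' N' \sum_j p'_j - \beta\Bigl( N \sum_i p_i \varepsilon_i + N' \sum_j p'_j \varepsilon'_j \Bigr) \Bigr]$$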
or equivalently
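$$-N \sum_i \delta p_i \ln(p_i) - N' \sum_j \delta p'_j \ln(p'_j) - \alpha N \sum_i \delta p_i - \alpha' N' \sum_j \delta p'_j - \beta\Bigl( N \sum_i \varepsilon_i\,\delta p_i + N' \sum_j \varepsilon'_j\,\delta p'_j \Bigr) = 0. \tag{2.41}$$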
3. This expression must vanish for all infinitesimal $\delta p_i$ and $\delta p'_j$. Therefore, one
obtains
$$-\ln p_i - \alpha - \beta\varepsilon_i = 0, \qquad -\ln p'_j - \alpha' - \beta\varepsilon'_j = 0,$$
or
$$p_i = e^{-\alpha - \beta\varepsilon_i}, \qquad p'_j = e^{-\alpha' - \beta\varepsilon'_j}. \tag{2.42}$$
We notice that it is the same Lagrange multiplier β that appears in both expressions.
This is due to the fact that it is the total energy that is a given quantity. The
constants α, α′, and β are fixed by the constraints as above. Therefore, the two
temperatures are equal if β = 1/kT.
$$\varepsilon_{n,l,m} = \frac{\pi^2 \hbar^2}{2ma^2}\,(n^2 + l^2 + m^2).$$
We assume that the spacing of these levels is very small compared with the mean
thermal energy kT . We go to the continuum limit using the density of states in phase
space6
$$d^6N = \frac{d^3r\, d^3p}{(2\pi\hbar)^3}; \quad\text{i.e.,}\quad d^3N = \frac{V\, d^3p}{(2\pi\hbar)^3},$$
where V is the volume of the cubic container. Inserting this in (2.39), we obtain the
probability d P of finding an atom of the gas in an element d 3 v around the velocity
v = p/m,
$$dP = \left(\frac{m\beta}{2\pi}\right)^{3/2} \exp(-\beta m v^2/2)\, d^3v. \tag{2.43}$$
$$dP = \left(\frac{m}{2\pi kT}\right)^{3/2} e^{-\frac{m v^2}{2kT}}\, d^3v,$$
we obtain the fundamental relation
$$\beta = \frac{1}{kT}, \tag{2.44}$$
where T is the absolute temperature.
We now come back to the equalization of the factors β of two assemblies in
thermal contact. We see that, since one of these assemblies can be an ideal gas, the
energy–absolute-temperature relation holds for any assembly of N identical systems
of individual energy levels εi ,
$$E = N\,\frac{\sum_i \varepsilon_i\, e^{-\varepsilon_i/kT}}{\sum_i e^{-\varepsilon_i/kT}} \tag{2.45}$$
(all appropriate care is assumed in the counting of degenerate states and in taking
the continuum limit).
Finally, this method allows us to define a thermostat by considering the limit
where one of the assemblies is much larger than the other. Establishing thermal
contact with the second, small assembly does not change the temperature of the first
one. We therefore recover the usual treatment of thermodynamics of assemblies in
thermal contact with a thermostat at a given temperature.
S = k ln W (2.46)
is nothing but the entropy of the assembly, which is therefore defined in an absolute
manner (k is Boltzmann’s constant). This provides a measure of the state of disorder
of the assembly. The greater its disorder, the more stable it is.
This is a fundamental result of great simplicity:
The thermodynamic equilibrium corresponds to a situation that maximizes the
entropy for a given set of constraints. In other words, it maximizes the disorder
given the constraints.
This principle applies to a large variety of daily life situations. What is the state of a
child’s room that optimizes the satisfaction of all the family? Its range of application
goes far beyond physics. This notion is one of the founding blocks of economic
models.
The notion of heat, which is very intuitive and has been known since very ancient
times, was viewed for a long time as emanating from some fluid that could flow
from one body to another. The first principle of thermodynamics tells us that it is a
particular form of energy. Statistical thermodynamics allows us to understand this in
a very natural manner.
Indeed, consider an assembly at equilibrium whose total energy is $E = N \sum_i p_i \varepsilon_i$
and whose temperature is T . In any infinitesimal evolution of this assembly through
a contact with the outside, two things can happen. One is the variation of the energy
levels εi if the total volume changes, or if an electric field is applied, etc. Another
is the reorganization of the populations of the various energy levels n i = N pi . The
corresponding variation $dE$ of the total energy of the assembly is
$$dE = \sum_i n_i\, d\varepsilon_i + \sum_i \varepsilon_i\, dn_i.$$
The first term is obvious. It corresponds simply to the work of the external forces
$$d\tau = \sum_i n_i\, d\varepsilon_i \tag{2.48}$$
(we avoid the traditional dW in order to avoid confusion). Under the external action,
the energy levels vary, resulting in a variation (2.48) of the total energy of the system.
The second term is less obvious. It comes from the fact that, even if the energy
levels εi do not change (in the absence of external work), the total energy can be
modified by a rearrangement of the populations n i of the levels. This variation of the
(internal) energy without any intervention of external forces is what we call “heat.”
We obtain the statistical definition of heat as
$$dQ = \sum_i \varepsilon_i\, dn_i. \tag{2.49}$$
In order to relate Boltzmann’s entropy Eq. (2.46) and the usual formula of ther-
modynamics, consider a variation d S = kd(ln W ). We obtain
$$dS = -k \sum_i dn_i \ln n_i \tag{2.50}$$
(of course, $\sum_i dn_i = 0$). Suppose the evolution is sufficiently slow that at any time
thermodynamical equilibrium is achieved. (The temperature may evolve during the
process). This is called a reversible transformation in macroscopic thermodynamics.
If this is the case, the n i are proportional to exp (−εi /kT ), which yields
$$dS_{\text{rev.}} = \frac{1}{T} \sum_i \varepsilon_i\, dn_i. \tag{2.51}$$
2.5 Exercises
We want to determine the relative intensities I1 and I2 of the electric current in the
two legs of the simple electric circuit shown in Fig. 2.8, whose resistances are R1
and R2 . The incoming current has an intensity I . The well-known result is easily
obtained with the Ohm–Kirchhoff laws.
Show that it is simpler to assume that the loss of energy by Joule heating is
minimal.
Consider a massive string of constant linear mass density μ and length L whose
endpoints are fixed at A (x = 0, z = z 0 ) and B (x = a, z = z 1 ). The string lies in
the vertical plane (x, z), and it is in the gravitational field, oriented along the vertical
z axis. Determine the shape of the string at equilibrium. (Of course, we assume that
$(z_1 - z_0)^2 + a^2 \le L^2$.)
Reconsider the massive string exercise above using Lagrange multipliers in order to
express the constraints; i.e., the length of the string and the positions of the endpoints.
2.6 Problem. Win a Downhill 35
A skier slides down a snowy plane slope. The plane makes an angle α with respect
to the horizontal direction. The skier is in the vertical field of gravity, of acceleration
g. The skier starts with a zero velocity from some point O and wants to reach a given
point A, downhill, in the shortest time. What is the optimal trajectory?
We choose in the plane a reference frame with origin at O, with horizontal axis
Oy, and whose x axis is along the line of greatest slope, as shown in Fig. 2.10.
We choose the origin of the potential energy at point O so that the initial energy E
of the skier is zero.
We neglect friction of air and the track, as well as the efforts of the skier to maintain
his trajectory. Therefore, the total energy of the skier is a constant of the motion.
1. Check that with this definition of the variable x, the potential energy of the skier
at point (x, y) is V = −mgx sin α.
2. Write the expression of the skier’s total energy at a given time. We denote ẋ ≡
d x/dt, ẏ ≡ dy/dt. What is the relation between the potential energy and the
kinetic energy owing to energy conservation?
3. Use the previous expression to express the square of the time interval dt between
two positions, (x, y) and (x + d x, y + dy), of the skier, in terms of d x 2 , dy 2 , x,
y, g, and α.
4. Calculate the time it takes to go from O to A if the skier follows a trajectory
defined by a function y(x) (note y′ ≡ dy/dx).
5. What is the equation of the optimal trajectory?
6. Show that along the optimal trajectory the quantity $C = y'/\sqrt{x\,(1 + (y')^2)}$ is a
constant. Deduce from this that along the trajectory the quantity $f(t) = \dot y/x$ is a
constant $K$, and express its value in terms of $C$, $g$, and $\alpha$.
7. Check that the parametric form $x(\theta) = (1 - \cos 2\theta)/(2C^2)$ and $y(\theta) = (2\theta - \sin 2\theta)/(2C^2)$
is a solution. Use the result of the previous question to calculate
the function θ(t).
8. What kind of curve is it? Draw the trajectory qualitatively in the case y (A) 1.
9. Explain the result physically. (It is not necessary to do all previous calculations
in order to answer this question.)
Chapter 3
The Analytical Mechanics of Lagrange
1 Discursive reasoning and mathematical proofs concerning two new sciences of mechanics and
local movements.
1782)), and was followed by Euler, Lagrange, and later on by Hamilton (1805–1865).
The true structure of mechanics was discovered to be a geometric structure. A large
category of mechanical problems could be reduced to purely geometrical problems.
D’Alembert (1717–1783), who was the first to understand the concept of mass
through the notion of linear momentum and its conservation, attacked the abstract
concept of force introduced by Newton. For d’Alembert, the only observable phe-
nomenon is motion, whereas the “cause of motion” is an abstraction; hence the idea
of studying not a particular trajectory of the theory but the global set of motions that
it predicts (this is a very modern conception of forces or interactions).
The crowning achievement of these ideas came with Lagrange in 1788, one century
after the Principia. Lagrange published, in his Analytical Mechanics (Méchanique
Analitique), a new formulation of mechanics where the global and geometric structure
of the theory was emphasized.2 Lagrange wrote (nevertheless) at that time,
One will not find Figures in this work. The methods I present do not require any construction
or geometric arguments, but only algebraic operations, subject to a uniform and continuous
methodology. Those who appreciate Analysis will discover with pleasure how Mechanics
becomes a new branch of it, and they will be grateful to me for extending its realm.
2 There are many books on analytic mechanics, or dynamics. One can refer to the classics of Landau
and Lifshitz, [1] and [2], and the book Classical Mechanics [3] by Herbert Goldstein, which is clear
and complete.
3.1 Lagrangian Formalism and Least Action 39
Finally, in Sect. 3.4, we shall extend such considerations to the case of a relativistic
particle. We shall restrict ourselves to the case of a massive particle that is free or
placed in an electromagnetic field. The basic assumption will be Lorentz invariance.
In order for the least action principle to have any physical meaning, it must determine
the motion of the particle independently of the state of motion of the observer. This
will allow us to construct the Lagrangian of a relativistic particle. We will see how
the energy and momentum of a free particle are related to its mass and velocity. Two
points must be emphasized. First, the Lagrangian formalism allows us to prove that
the set {E/c, p} is a four-vector of space-time, whereas we have no idea a priori of
the values of energy and momentum in terms of the velocity. The second point is
that these properties come from the assumption of relativistic invariance of physical
laws.
In order to make things simple, let us consider first the case of only one space
dimension. Among the infinite class of possible trajectories (see Fig. 3.1), what is
the law that determines the physical one? Lagrange knows that the answer to this
question lies in the “principle of natural economy” of Fermat, further developed by
Maupertuis, as we said in Chap. 2.
The variational principle we present here is not the original one used by Lagrange;
it was formulated by Hamilton in 1834 and is simpler in this discussion. In order not
to complicate things, we reverse chronology.
One assumes the following:
1. Any mechanical system is characterized by a Lagrange function, or Lagrangian
L(x, ẋ, t), which depends on the position x, on its time derivative ẋ = d x/dt,
and possibly on time. The quantities x and ẋ are called the state variables of the
particle. For a particle in a potential V (x, t), we have for instance
$$L = \frac{1}{2} m \dot x^2 - V(x, t). \tag{3.1}$$
Fig. 3.1 Examples of trajectories starting from x1 at time t1 and arriving at x2 at time t2 . Among
all such trajectories, the physical trajectory actually followed by the particle renders the action S
minimal (extremal)
2. For any trajectory x(t), one can define the action S by the integral
$$S = \int_{t_1}^{t_2} L(x, \dot x, t)\, dt. \tag{3.2}$$
The Least Action Principle states that the physical trajectory X (t) followed by
the particle is such that S is minimum, or, more generally, has an extremum.
We call X (t) the physical trajectory, and we proceed as in Sect. 2.2, except that the
variable is now the time t. Consider a trajectory x(t) infinitely close to X (t), which
also starts from x1 at t1 and reaches x2 at t2 ,
$$x(t) = X(t) + \delta x(t), \qquad \dot x(t) = \dot X(t) + \delta \dot x(t), \qquad \delta \dot x(t) = \frac{d}{dt}\,\delta x(t), \tag{3.3}$$
where by assumption
δx(t1 ) = δx(t2 ) = 0. (3.4)
We integrate the second term by parts and take into account (3.4), so that the integrated
term vanishes. This leads to
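$$\delta S = \int_{t_1}^{t_2} \left( \frac{\partial L}{\partial x} - \frac{d}{dt}\frac{\partial L}{\partial \dot x} \right) \delta x(t)\, dt. \tag{3.6}$$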
The least action principle states that δS must vanish whatever the infinitesimal vari-
ation δx(t). Therefore, the equation of motion (i.e., the equation that determines the
physical trajectory), is the Lagrange–Euler equation
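$$\frac{\partial L}{\partial x} = \frac{d}{dt}\left(\frac{\partial L}{\partial \dot x}\right). \tag{3.7}$$
$$m \ddot x = -\frac{\partial V}{\partial x} \equiv f,$$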
where f is the force.
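The stationarity of the action on the physical trajectory can be illustrated numerically. The sketch below assumes a particle in a uniform gravitational field, V(x) = m g x, thrown upward and returning to its starting point after a time T, so that the physical trajectory is X(t) = g t (T − t)/2. It compares the discretized action (3.2) on this path with the action on nearby paths sharing the same endpoints. The parameter values, the sinusoidal perturbation, and the function names are assumptions made for this example.

```python
import math

def action(path, dt, m, g):
    """Discretized S = sum of (m v^2 / 2 - m g x) dt with midpoint positions."""
    total = 0.0
    for x_start, x_end in zip(path, path[1:]):
        velocity = (x_end - x_start) / dt
        x_mid = 0.5 * (x_start + x_end)
        total += (0.5 * m * velocity**2 - m * g * x_mid) * dt
    return total

m, g, T, n = 1.0, 9.81, 1.0, 200   # illustrative values (assumptions)
dt = T / n
times = [j * dt for j in range(n + 1)]
physical = [0.5 * g * t * (T - t) for t in times]

def perturbed(eps, k=1):
    """Physical path plus eps * sin(k pi t / T), which vanishes at both endpoints."""
    return [x + eps * math.sin(k * math.pi * t / T) for x, t in zip(physical, times)]

s_phys = action(physical, dt, m, g)
s_big = action(perturbed(0.10), dt, m, g)
s_small = action(perturbed(0.05), dt, m, g)
ratio = (s_big - s_phys) / (s_small - s_phys)
print(s_phys, s_big, s_small, ratio)
```

Halving the amplitude of the perturbation divides the change in S by four, which shows that the first-order variation vanishes on the physical trajectory; for this particular Lagrangian the extremum is a minimum.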
Generalization
Remarks
$$L' = L + \frac{d}{dt} f(\{x_i\}, t),$$
the equations of motion are unchanged.
$$L' = L + 2\,\mathbf{v} \cdot \boldsymbol{\varepsilon}\, \frac{\partial L}{\partial v^2}.$$
The second term on the right-hand side is a total derivative with respect to time
if and only if ∂L/∂v 2 = constant. Therefore, the Lagrangian of a free particle
is of the form L = K v 2 , where K is a constant. We choose this constant to
be m/2 since this entails momentum conservation for an isolated system, as
we will see. Therefore, for a free particle,
$$L = \frac{m}{2}\, v^2. \tag{3.10}$$
(e) In a reference frame with constant velocity V with respect to the previous
one, the Lagrangian becomes
$$L' = \frac{m}{2}\,(\mathbf{v} + \mathbf{V})^2 = L + \frac{d}{dt}\left( m\,\mathbf{r} \cdot \mathbf{V} + \frac{m V^2 t}{2} \right)$$
and the equations of motion are the same in both reference frames.
(f) If the particle is in a field of force, the potential energy term in (3.1) is merely
a definition of the force. We wish to recover Newton’s law, and this choice
guarantees it for forces that derive from potentials.
3. Generalization
The Lagrangian of a set of N particles in a potential V (r1 , . . . , r N ; t) (which
includes the mutual interactions of the particles) is
$$L = \frac{1}{2} \sum_{i=1}^{N} m_i\, (\dot{\mathbf{r}}_i)^2 - V(\mathbf{r}_1, \ldots, \mathbf{r}_N; t). \tag{3.11}$$
3.2 Invariances and Conservation Laws 43
It is remarkable that the laws of mechanics can be derived from a variational principle.
The physical trajectory is that for which the action is optimal.
This optimization appears as a “compromise” between various causes in “conflict.”
Indeed, in the absence of forces (V = constant in (3.1)), S is minimum for ẋ =
constant: the motion is linear and uniform. In the absence of inertia, on the contrary,
the particle would go to the maximum of the potential at the initial point and come
back at the final point. The presence of the potential can be considered as a property
of space that curves the trajectory. Inertia and force can be viewed as conflicting
effects. The particle follows a path of minimal “length,” this length being measured
by the action S.
We see here how the mechanical problem can be transformed into a geometric
problem. As we shall see later on, the motion of a particle in a flat Euclidean space
can be transformed into the free motion in a curved space, where it moves along
geodesics. We will come back extensively to this point in Chap. 7. Einstein had this
idea in mind in 1908 when he was constructing general relativity. It took him seven
years to elaborate the mathematical details of the final theory.
Invariance laws of physical phenomena are fundamental. They form the set of what
is known a priori about a physical problem. They imply corresponding conservation
laws, which play a crucial role. In more elaborate problems than those we have seen
here, they constitute the guiding line in order to construct the Lagrangian of a system
(we have seen a simple example above in discussing the form of the free-particle
Lagrangian).
A system with s degrees of freedom possesses a priori 2s conserved quantities.
Indeed, the evolution of the system is completely determined by the knowledge of
the 2s initial conditions {xi (0), ẋi (0)}. Therefore, there are in principle 2s relations
between the variables {xi (t), ẋi (t)}, which allow one to calculate {xi (0), ẋi (0)} at
any time. In practice, only a subset of such relations are usable.
$$\dot p_i = \frac{\partial L}{\partial x_i}, \tag{3.13}$$
$$(x_1, x_2, \ldots, x_N) \to (q_1, q_2, \ldots, q_N),$$
or
$$L(\{x_i\}, \{\dot x_i\}; t) \to L'(\{q_i\}, \{\dot q_i\}; t).$$
In this change of variables, the Lagrange–Euler equations keep the same form. We
can define the conjugate momentum $p_i$ of the generalized variable $q_i$ by the relation
$$p_i = \frac{\partial L'}{\partial \dot q_i}. \tag{3.14}$$
This quantity satisfies the same equation as (3.13); i.e., $\dot p_i = \partial L'/\partial q_i$.
A cyclic variable is a variable $q_i$ that does not appear explicitly in the Lagrangian
$L'$. This means that
$$\frac{\partial L'}{\partial q_i} = 0.$$
Assume the system is isolated (i.e., ∂L/∂t = 0). Another way to describe this
assumption is to say that the system is invariant under translations in time or that
time is homogeneous.
We evaluate the evolution of L(x, ẋ) along the physical trajectory x(t),
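$$\frac{dL}{dt}(x, \dot x) = \frac{\partial L}{\partial x}\,\dot x(t) + \frac{\partial L}{\partial \dot x}\,\ddot x(t) = \frac{d}{dt}\left( \dot x(t)\,\frac{\partial L}{\partial \dot x} \right), \tag{3.15}$$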
where we have transformed the first term by taking into account the Lagrange equa-
tion (3.7). We deduce that
$$\frac{d}{dt}\left( \dot x(t)\,\frac{\partial L}{\partial \dot x} - L \right) = 0. \tag{3.16}$$
If we use the Lagrange conjugate momenta, the expression of the energy becomes
$$E = \sum_i p_i \dot x_i - L. \tag{3.19}$$
Examples
Consider the massive string of Chap. 2 and Eq. (A.1). The Lagrangian is (up to factors)
$L \propto z(x)\sqrt{1 + \dot z(x)^2}$ (here the variable is x). This Lagrangian does not depend on
the variable x. Therefore, the quantity $p\dot z - L$, where $p$ is the conjugate momentum
of $z$, is constant along the curve (it is a “constant of the motion” in the language of the
present chapter). One obtains with no difficulty $p = z\dot z/\sqrt{1 + \dot z(x)^2}$ and
$p\dot z - L = -z/\sqrt{1 + \dot z(x)^2} = -c$, where $c$ is a constant.
If, by the definition of φ(x), we set ż(x) = sinh(φ(x)), we obtain, by inserting this
into the equation,
We conclude that cφ̇(x) = 1, and the solution given in (2.17) follows: z(x) =
c cosh((x − x0 )/c). Using this conserved quantity simplifies the resolution of the
problem.
In general, if we consider a Lagrangian of the form
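$$L = f(z(x))\sqrt{1 + \dot z(x)^2}, \tag{3.20}$$
$$p = \frac{f(z)\,\dot z}{\sqrt{1 + \dot z(x)^2}}. \tag{3.21}$$
Since the Lagrangian does not depend explicitly on variable x, the quantity
$$A = p\dot z - L = -\frac{f(z)}{\sqrt{1 + \dot z(x)^2}}, \tag{3.22}$$
$$A^2 (1 + \dot z^2) = f(z)^2; \quad\text{i.e.,}\quad \dot z = \sqrt{\left(\frac{f(z)}{A}\right)^2 - 1}. \tag{3.23}$$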
This is the generalization of the usual method of integration of the equation of motion
when there is energy conservation.
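The first-order equation (3.23) lends itself to direct numerical integration. The sketch below takes the massive-string case f(z) = z with A = −c, integrates the positive branch of (3.23) with a fourth-order Runge–Kutta scheme, and compares the result with z(x) = c cosh((x − x0)/c). The starting abscissa is chosen away from the minimum z = c, where ż vanishes and the first-order equation alone does not single out the solution. The value of c, the integration interval, and the function names are assumptions made for this illustration.

```python
import math

def slope(z, f, A):
    """Positive branch of (3.23): dz/dx = sqrt((f(z)/A)^2 - 1)."""
    return math.sqrt(max((f(z) / A) ** 2 - 1.0, 0.0))

def integrate(z_start, x_start, x_end, f, A, steps=2000):
    """Fourth-order Runge-Kutta integration of dz/dx = slope(z)."""
    h = (x_end - x_start) / steps
    z = z_start
    for _ in range(steps):
        k1 = slope(z, f, A)
        k2 = slope(z + 0.5 * h * k1, f, A)
        k3 = slope(z + 0.5 * h * k2, f, A)
        k4 = slope(z + h * k3, f, A)
        z += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return z

c, x0 = 1.5, 0.0                  # illustrative constants (assumptions)
x_start, x_end = 0.5, 2.0         # start on the rising side of the curve

def string_f(z):
    return z                      # f(z) = z for the massive string

z_start = c * math.cosh((x_start - x0) / c)
z_numeric = integrate(z_start, x_start, x_end, string_f, -c)
z_exact = c * math.cosh((x_end - x0) / c)
print(z_numeric, z_exact)
```

The two values agree to the accuracy of the integrator, which illustrates how the conserved quantity A reduces the second-order Lagrange–Euler equation to a quadrature.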
and, for s = 0,
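$$0 = \left.\frac{dL}{ds}\right|_{s=0} = \sum_i \left( \frac{\partial L}{\partial q_i} \left.\frac{\partial Q_i}{\partial s}\right|_{s=0} + \frac{\partial L}{\partial \dot q_i} \left.\frac{\partial \dot Q_i}{\partial s}\right|_{s=0} \right). \tag{3.28}$$
$$\sum_i \frac{\partial L}{\partial \dot q_i} \left.\frac{\partial Q_i}{\partial s}\right|_{s=0} \tag{3.30}$$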
Suppose the problem is invariant under translations in space. This is the case for
a free particle, and it is also the case for a system of particles whose interactions
depend only on the relative coordinates: V ({ri − r j }).
In this case, for any infinitesimal transformation ri → ri + ε, the Lagrangian is
invariant:
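$$\delta L = \sum_i \frac{\partial L}{\partial \mathbf r_i}\cdot\boldsymbol\varepsilon = 0 \quad \forall\,\boldsymbol\varepsilon; \quad\text{i.e.,}\quad \sum_i \frac{\partial L}{\partial \mathbf r_i} = 0. \tag{3.31}$$
$$\frac{\partial \Phi}{\partial \mathbf r_i} \equiv \nabla_i \Phi, \tag{3.32}$$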
However, there is another interpretation of the result (3.31). Using the definitions
(3.12) and (3.13) of the momenta and their time derivatives, this relation can be
written as
$$\frac{d}{dt}\sum_{i=1}^{N} \mathbf p_i \equiv \frac{d}{dt}\,\mathbf P = 0, \tag{3.34}$$
where P is the total momentum $\mathbf P = \sum_{i=1}^{N} \mathbf p_i$.
Translation invariance in space implies conservation of the total momentum of a
system of particles.
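$$\delta L = \sum_i \left[\frac{\partial L}{\partial \mathbf r_i}\cdot(\delta\varphi\,\hat{\mathbf u}\times\mathbf r_i) + \frac{\partial L}{\partial \dot{\mathbf r}_i}\cdot(\delta\varphi\,\hat{\mathbf u}\times\dot{\mathbf r}_i)\right]. \tag{3.35}$$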
If there is rotation invariance, δL = 0 for all δφ û. Coming back to the definition
of conjugate momenta and their derivatives, we obtain
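$$\sum_i \left(\mathbf r_i\times\dot{\mathbf p}_i + \dot{\mathbf r}_i\times\mathbf p_i\right) = 0.$$
In other words,
$$\frac{d}{dt}\sum_i (\mathbf r_i\times\mathbf p_i) \equiv \frac{d}{dt}\sum_i \mathbf L_i = \frac{d}{dt}\,\mathbf L = 0, \tag{3.36}$$
where the angular momentum Li of each particle and the total angular momentum
L are defined by
$$\mathbf L_i = \mathbf r_i\times\mathbf p_i, \qquad \mathbf L = \sum_i \mathbf L_i. \tag{3.37}$$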
To rotation invariance there corresponds the conservation of the total angular momen-
tum.
A problem may have symmetries of dynamical origin, which can be more or less
hidden. We will examine in Chap. 4 some of the many symmetries of the harmonic
oscillator.
The Kepler problem V(r) = −g²/r, with Lagrangian L = mv²/2 + g²/r, has a well-known
symmetry associated with the conservation of the Lenz vector. This vector is
$$\mathbf A = \frac{\mathbf p\times\mathbf L}{m} - g^2\,\frac{\mathbf r}{r}, \tag{3.38}$$
where p is the momentum and L = r × p the angular momentum of the particle.
In Kepler’s problem, we must determine six quantities as a function of time
r(t), ṙ(t). Conservation of angular momentum and energy fixes four of them. The
conservation of the Lenz vector, which is perpendicular to the angular momentum
and therefore lies in the plane of the trajectory, fixes the two others. Therefore, the
solution of the problem does not necessitate any quadrature. One consequence is that
in the case of bound states, the trajectory is closed, which is exceptional: Only the
harmonic potential (∝ r 2 ) and the Newtonian potential (∝ 1/r ) lead to this property.
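The conservation of the Lenz vector can be checked numerically. The sketch below integrates a planar Kepler orbit with a classical fourth-order Runge-Kutta scheme and compares the vector (3.38) at the start and at the end of the run. The units m = g² = 1, the initial conditions, the step size and all function names are illustrative assumptions.

```python
import math

def deriv(s, m, g2):
    """Time derivative of the state s = (x, y, vx, vy) for V(r) = -g2/r."""
    x, y, vx, vy = s
    r3 = (x * x + y * y) ** 1.5
    return (vx, vy, -g2 * x / (m * r3), -g2 * y / (m * r3))

def rk4_step(s, dt, m, g2):
    """One classical Runge-Kutta step of size dt."""
    def shift(a, k, h):
        return tuple(ai + h * ki for ai, ki in zip(a, k))
    k1 = deriv(s, m, g2)
    k2 = deriv(shift(s, k1, dt / 2), m, g2)
    k3 = deriv(shift(s, k2, dt / 2), m, g2)
    k4 = deriv(shift(s, k3, dt), m, g2)
    return tuple(si + dt / 6 * (a + 2 * b + 2 * c + d)
                 for si, a, b, c, d in zip(s, k1, k2, k3, k4))

def lenz(s, m, g2):
    """In-plane components of A = p x L / m - g2 * r / |r| for planar motion."""
    x, y, vx, vy = s
    r = math.hypot(x, y)
    lz = m * (x * vy - y * vx)      # angular momentum along z
    px, py = m * vx, m * vy
    return (py * lz / m - g2 * x / r, -px * lz / m - g2 * y / r)

m, g2 = 1.0, 1.0                    # illustrative units (assumption)
state = (1.0, 0.0, 0.0, 0.8)        # bound orbit, started at its farthest point
a_start = lenz(state, m, g2)
for _ in range(20000):              # roughly five revolutions
    state = rk4_step(state, 1e-3, m, g2)
a_end = lenz(state, m, g2)
print(a_start, a_end)               # the two vectors should agree closely
```

The vector points along the major axis of the ellipse, and its constancy is the numerical counterpart of the statement that the bound trajectory is closed.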
One can convince oneself that the formalisms of Lagrange and Newton coincide in the
case of conservative forces, which derive from potentials. However, the Lagrangian
formalism does not easily accommodate dissipative forces that depend on the veloc-
ity, such as friction. Dissipative forces belong to the mechanics of continuous media,
and we are not much concerned with that here.6
We can nevertheless give, as a concrete example, a Lagrangian method that can
deal with simple dissipative systems by a trick. Consider a system that loses energy
by friction, Joule heating, or any other process. The trick consists in coupling the
system appropriately with a fictitious mirror system that formally absorbs the energy
in such a way that the total energy of the two systems remains constant. Naturally,
one only attributes a physical meaning to quantities or results that possess one.
Consider, for definiteness, a damped harmonic oscillator in one dimension, of
coordinate x, whose equation of motion is
m ẍ + R ẋ + kx = 0. (3.39)
[12].
3.3 Velocity-Dependent Forces 51
They have nothing to do with the linear momentum of the damped oscillator (3.39).
Applying the Lagrange–Euler equations to the two variables {x, x ∗ }, one obtains
the two equations
m ẍ + R ẋ + kx = 0 and m ẍ ∗ − R ẋ ∗ + kx ∗ = 0. (3.42)
The second equation represents an oscillator that “absorbs” the energy lost by the
damping of the first one.
The energy of the set of two systems is
$$E = p\,\dot x + p^*\,\dot x^* - L = m\,\dot x\,\dot x^* + k\,x\,x^* \tag{3.43}$$
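That this combined energy is constant can be verified directly from the two equations of motion (3.42), without using the explicit Lagrangian. The short derivation below is a sketch that assumes the kinetic term carries the mass m, as dimensional consistency with (3.42) requires.

```latex
\begin{aligned}
\frac{d}{dt}\left(m\,\dot x\,\dot x^* + k\,x\,x^*\right)
  &= \dot x^*\,(m\ddot x + kx) + \dot x\,(m\ddot x^* + kx^*) \\
  &= \dot x^*\,(-R\,\dot x) + \dot x\,(R\,\dot x^*) = 0 .
\end{aligned}
```

The energy lost by the damped oscillator is exactly compensated by the energy gained by the mirror oscillator.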
$$L = \frac{1}{2}\,m\,\dot{\mathbf r}^2 + \dot{\mathbf r}\cdot\mathbf A(\mathbf r, t), \tag{3.44}$$
where A(r, t) is a given vector field.
The Lagrange equations give, for the x component for instance,
$$\dot{\mathbf r}\cdot\frac{\partial \mathbf A(\mathbf r, t)}{\partial x} = m\,\ddot x + \frac{d}{dt}A_x(\mathbf r, t). \tag{3.45}$$
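Taking into account
$$\frac{d}{dt}A_x(\mathbf r, t) = \frac{\partial A_x}{\partial t} + \dot x\,\frac{\partial A_x}{\partial x} + \dot y\,\frac{\partial A_x}{\partial y} + \dot z\,\frac{\partial A_x}{\partial z},$$
we obtain
$$m\,\ddot x = \dot y\left(\frac{\partial A_y(\mathbf r, t)}{\partial x} - \frac{\partial A_x(\mathbf r, t)}{\partial y}\right) - \dot z\left(\frac{\partial A_x(\mathbf r, t)}{\partial z} - \frac{\partial A_z(\mathbf r, t)}{\partial x}\right) - \frac{\partial A_x(\mathbf r, t)}{\partial t}. \tag{3.47}$$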
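Therefore, if we introduce the vector field $\mathbf B(\mathbf r, t) = \nabla\times\mathbf A(\mathbf r, t)$, we obtain the
equation of motion
$$m\,\ddot{\mathbf r} = \dot{\mathbf r}\times\mathbf B(\mathbf r, t) - \frac{\partial \mathbf A(\mathbf r, t)}{\partial t}, \tag{3.49}$$
whose form is of obvious interest.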
This force is velocity dependent and does not derive from a potential. The magnetic
part qv × B does no work. Let Φ be the electric potential. The potential energy boils
down to its electric part V = qΦ, and the total energy is E = mv²/2 + qΦ.
The Lagrangian cannot be L = mv²/2 − qΦ since one would lose track of the
magnetic field.
The result (3.49) shows how a linear dependence of the Lagrangian on the velocity
can solve this problem, owing to the properties of the electromagnetic field.
Maxwell’s homogeneous equations
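$$\nabla\cdot\mathbf B = 0, \qquad \nabla\times\mathbf E = -\frac{\partial \mathbf B}{\partial t}, \tag{3.50}$$
allow us to express the fields E and B in terms of the scalar and the vector potentials
Φ and A,
$$\mathbf B = \nabla\times\mathbf A, \qquad \mathbf E = -\nabla\Phi - \frac{\partial \mathbf A}{\partial t}. \tag{3.51}$$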
Consider a particle of mass m and charge q placed in this electromagnetic field.
We denote, as usual, by r and ṙ = v the position and velocity of the particle.
A possible Lagrangian for this particle is expressed in terms of the potentials A
and Φ,
$$L = \frac{1}{2}\,m\,\dot{\mathbf r}^2 + q\,\dot{\mathbf r}\cdot\mathbf A(\mathbf r, t) - q\,\Phi(\mathbf r, t). \tag{3.52}$$
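$$\frac{d}{dt}\mathbf A(\mathbf r, t) = \frac{\partial \mathbf A}{\partial t} + \dot x\,\frac{\partial \mathbf A}{\partial x} + \dot y\,\frac{\partial \mathbf A}{\partial y} + \dot z\,\frac{\partial \mathbf A}{\partial z}$$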
One thing may, however, seem surprising. We have expressed the Lagrangian in
terms of the potentials Φ and A. However, these are not unique. The fields E and B
are invariant under gauge transformations,
$$\mathbf A \to \mathbf A' = \mathbf A + \nabla\chi(\mathbf r, t), \qquad \Phi \to \Phi' = \Phi - \frac{\partial \chi}{\partial t}, \tag{3.53}$$
3.3.4 Momentum
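Consider now the conjugate momentum p. From the definition (3.12), we obtain
$$\mathbf p = \frac{\partial L}{\partial \dot{\mathbf r}} = m\,\dot{\mathbf r} + q\,\mathbf A(\mathbf r, t). \tag{3.56}$$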
7 See, for instance, A. Tonomura et al., “Evidence for Aharonov-Bohm effect with magnetic field
completely shielded from electron wave,” Phys. Rev. Lett. 56, 792 (1986).
In other words, in a magnetic field, the momentum p does not coincide with the linear
momentum m ṙ!
Similarly, the angular momentum L = r × p does not coincide with r × mv.
Finally, note that the relation (3.56) and our result (3.52) simply combine in the
expression of the energy (3.19) of a charged particle placed in the potentials Φ and
A. The corresponding Hamiltonian H, which we will talk about in the following
chapters, is:
$$H = \frac{1}{2m}\,(\mathbf p - q\mathbf A)^2 + q\,\Phi. \tag{3.57}$$
The Lagrangian formalism has the fundamental physical property (which could not
have been guessed by either Lagrange or Maxwell) that it can be extended directly to
the case of a relativistic particle because it is formulated directly in space and time.
Therefore it can be written in the four-dimensional space-time of Minkowski, and
can become the relativistic theory of the motion of a charged particle. We will see this
here for the case of a massive particle, either free or placed in an electromagnetic field.
$$\eta_{ij}\,a^i b^j = a^0 b^0 - \mathbf a\cdot\mathbf b. \tag{3.59}$$
Note that, under the above Lorentz transformation, the duration and the distance
separating two observed events satisfy the relation:
$$c^2 t^2 - \mathbf r^2 = c^2 t'^2 - \mathbf r'^2. \tag{3.60}$$
The argument is based on Lorentz invariance. The least action principle must
yield the same equation of motion whatever the relative state of free motion of the
observer. We proceed as in Sect. 3.1. We want to determine the path followed to go
from A(r1, t1) to B(r2, t2) by minimizing the action
$$S = \int_{t_1}^{t_2} L(\mathbf r, \dot{\mathbf r})\,dt. \tag{3.62}$$
Consider first a free particle of mass m. We know the result: The motion is linear
and uniform.
Among all possible paths, the free motion corresponds to the largest proper time.
Let dt be the time interval measured by an observer with a relative velocity v
with respect to the particle. The proper time of the particle is $d\tau = dt\,\sqrt{1 - v^2/c^2}$.
Therefore, free motion maximizes the quantity
$$\tau = \int_{t_1}^{t_2} \sqrt{1 - \frac{v^2}{c^2}}\;dt \tag{3.63}$$
This action is Lorentz invariant, whereas the Lagrangian (3.64) is not. This comes
from the fact that in the present approach, time, over which we integrate, plays a
special role. One can get rid of this problem, but we shall not do it here.
We remark that in the limit of small velocities, we recover the non-relativistic
Lagrangian up to a constant: L = −mc² + mv²/2.
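The small-velocity limit can be made explicit by a Taylor expansion. The sketch below assumes that the free relativistic Lagrangian has the form of the kinetic term of (3.72), namely $L = -mc^2\sqrt{1 - v^2/c^2}$.

```latex
\sqrt{1 - \frac{v^2}{c^2}} = 1 - \frac{v^2}{2c^2} - \frac{v^4}{8c^4} + \cdots
\quad\Longrightarrow\quad
L = -mc^2\sqrt{1 - \frac{v^2}{c^2}} = -mc^2 + \frac{1}{2}\,m v^2 + \frac{m v^4}{8 c^2} + \cdots
```

The constant −mc² does not affect the equations of motion, and the first correction to the non-relativistic Lagrangian is of relative order v²/c².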
We deduce the expression of the energy and momentum by following the same
method as in Sect. 3.2. These quantities are of interest since they are conserved if
space-time is homogeneous. This holds in any reference system.
The conjugate momentum is
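$$\mathbf p = \frac{\partial L}{\partial \mathbf v} = \frac{m\,\mathbf v}{\sqrt{1 - v^2/c^2}}. \tag{3.66}$$
The energy is
$$E = \mathbf p\cdot\mathbf v - L = \frac{mc^2}{\sqrt{1 - \frac{v^2}{c^2}}} \quad\text{or also}\quad E = \sqrt{p^2c^2 + m^2c^4}. \tag{3.67}$$
$$(E/c)^2 - p^2 = m^2c^2. \tag{3.68}$$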
Two points must be emphasized. First, the Lagrangian formalism allows us to prove
that the set {E/c, p} is a four-vector of space-time, whereas neither energy nor
momentum are defined a priori, and we have only worked with positions and veloc-
ities. Second, this property follows, of course, from our starting assumption (3.65)
based on the relativistic invariance of physical laws.
3.4 Lagrangian of a Relativistic Particle 57
The observed velocity of the particle is related to its momentum and to its energy
by
$$\mathbf v = \frac{\mathbf p\,c^2}{E}. \tag{3.69}$$
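The relations (3.66) to (3.69) are easy to check numerically. The following minimal sketch computes the momentum and energy for a given speed and verifies the invariant (3.68) and the velocity formula (3.69). The function names and the sample values (unit mass, v = 0.6 c) are illustrative assumptions.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def momentum(m, v, c=C):
    """Relativistic momentum, Eq. (3.66), for motion along one axis."""
    return m * v / math.sqrt(1.0 - (v / c) ** 2)

def energy(m, v, c=C):
    """Relativistic energy, Eq. (3.67)."""
    return m * c ** 2 / math.sqrt(1.0 - (v / c) ** 2)

m = 1.0            # illustrative mass in kg (assumption)
v = 0.6 * C        # illustrative speed (assumption)
p = momentum(m, v)
E = energy(m, v)

invariant = (E / C) ** 2 - p ** 2   # should equal (m*c)**2, Eq. (3.68)
v_back = p * C ** 2 / E             # should give back v, Eq. (3.69)
print(invariant / (m * C) ** 2, v_back / v)
```

Both printed ratios should be equal to one up to rounding errors, whatever the speed chosen below c.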
Let us note a simple but amazing result. The duration of an event depends only on
the velocity and not on the distance travelled, even if the speed varies ((3.63) is then a
sum). The μ lepton, or muon, a heavy cousin of the electron with a mass about 200 times
that of the electron (m_μ c² ∼ 105 MeV), has a mean lifetime of 2.2 µs. This particle is
studied at Fermilab in a high-energy storage ring with a diameter of 35 meters, around
which are positioned superconducting magnets that produce a constant, uniform and
very precise magnetic field. The lifetime measured at this energy (Lorentz factor
γ ∼ 30) in this magnetic field is perfectly consistent with the value deduced from the
lifetime at rest and the circular velocity, although the muons undergo a radial
acceleration greater than $10^{18}\,g$. In other words, the magnetic field does not alter
the dilation of the lifetime.
$$u^\mu A_\mu = \gamma\,(\phi - \mathbf v\cdot\mathbf A) \tag{3.70}$$
$$L_I = -q\,u^\mu A_\mu/\gamma = q\,(\mathbf v\cdot\mathbf A - \phi) \tag{3.71}$$
has the properties wanted. This expression is identical to the interaction term
of (3.52). It is called the “minimal interaction” of a charged particle with an
electromagnetic field.
We therefore obtain the expression of the relativistic Lagrangian of a particle of
mass m and charge q in an electromagnetic field that derives from the potentials φ
and A,
$$L = -mc^2\sqrt{1 - \frac{v^2}{c^2}} + q\,(\mathbf v\cdot\mathbf A - \phi). \tag{3.72}$$
The equation of motion follows from the standard procedure.
1. Conjugate Momentum
Let p be the momentum in the absence of the field, as defined by (3.66):
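$$\mathbf p = \frac{m\,\mathbf v}{\sqrt{1 - v^2/c^2}}. \tag{3.73}$$
$$\mathbf P = \frac{\partial L}{\partial \mathbf v} = \mathbf p + q\,\mathbf A. \tag{3.74}$$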
2. Lagrange–Euler Equations
The equation of motion follows from the Lagrange–Euler equations.
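$$\frac{d}{dt}\frac{\partial L}{\partial \mathbf v} = \frac{\partial L}{\partial \mathbf r}. \tag{3.75}$$
We have
$$\frac{\partial L}{\partial \mathbf r} = q\,(\nabla(\mathbf v\cdot\mathbf A) - \nabla\phi), \tag{3.76}$$
which yields
$$\frac{d\mathbf P}{dt} = \frac{d(\mathbf p + q\mathbf A)}{dt} = q\,(\nabla(\mathbf v\cdot\mathbf A) - \nabla\phi). \tag{3.77}$$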
3. Equations of Motion
We use the relations
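$$\frac{d\mathbf A}{dt} = \frac{\partial \mathbf A}{\partial t} + \dot x\,\frac{\partial \mathbf A}{\partial x} + \dot y\,\frac{\partial \mathbf A}{\partial y} + \dot z\,\frac{\partial \mathbf A}{\partial z} = \frac{\partial \mathbf A}{\partial t} + (\mathbf v\cdot\nabla)\mathbf A, \tag{3.78}$$
and
$$\nabla(\mathbf v\cdot\mathbf A) = (\mathbf v\cdot\nabla)\mathbf A + \mathbf v\times(\nabla\times\mathbf A). \tag{3.79}$$
$$\frac{d\mathbf p}{dt} = q\,(\mathbf E + \mathbf v\times\mathbf B), \tag{3.80}$$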
where the momentum p and the velocity v are related by (3.73).
4. We must take care of the relation (3.73). If we define the kinetic energy Ekin by
$$E_{\rm kin} = \frac{mc^2}{\sqrt{1 - \frac{v^2}{c^2}}}, \tag{3.81}$$
by taking the derivative of this equation with respect to time, and taking into
account the definition (3.73), we obtain
$$\frac{dE_{\rm kin}}{dt} = \mathbf v\cdot\frac{d\mathbf p}{dt}. \tag{3.82}$$
3.5 Exercises
$$E_2 = -\frac{\partial S_{12}}{\partial t_2}.$$
We consider a non-relativistic particle of mass m in a central potential V(r), where
$r = \sqrt{x^2 + y^2 + z^2}$. We denote the velocity v ≡ ṙ and v² its square.
We study the problem in spherical coordinates (r, θ, φ) defined by
3.4. Brachistochrone
A popular problem for mathematicians of the 17th century was the brachistochrone
curve. Consider two points A and B in a vertical plane, joined by a curve C. In A,
a massive particle is dropped with zero initial velocity, and it slides without friction
along the curve under the effect of gravity. We want to determine the curve C such that
the time for the particle to go from A to B is minimum. We note z the altitude and x
the abscissa of a point on the curve. The endpoints A and B correspond respectively
to (x = a, z = α) and (x = b, z = β).
A sailboat has velocity v(θ), which is a function of the angle θ between the direction
of the wind and the direction of the boat and also of the norm w of the velocity of the
3.6 Problem. Strategy of a Regatta 61
Fig. 3.2 Diagram of the direction of the sailboat v compared with that of the wind w
wind. We assume that the velocity of the boat v is proportional to the velocity of the
wind w and that it depends on the angle θ chosen by the skipper. For convenience,
in what follows, we shall write this velocity in the form
$$v(\theta) = \frac{w}{\cos(\theta)\,h(\tan\theta)}, \quad\text{with}\quad h(u) = \frac{1}{2}\left(u + \frac{1}{u}\right). \tag{3.86}$$
We are interested in the strategy where the sailboat tacks to the wind (i.e., θ ≤
π/2), as shown in Fig. 3.2. We assume that the x component vx of the velocity of
the boat is opposite to that of the wind and that the position of the sailboat along the
x axis always increases with time. We assume the coast is linear (land = half-plane
z < 0, sea = half-plane z > 0).
We assume the wind is parallel to the coast, of direction opposite to the x axis,
and that the norm of its velocity w(z) depends only on the distance z to the coast.
Here, we assume that the velocity of the wind has the form
$$w(z) = w_0 - w_1\,\frac{z_0}{z + z_0}, \tag{3.87}$$
z + z0
where w0 is the velocity far from the coast, which is larger than the velocity (w0 −
w1 ) ≥ 0 on the coast z = 0.
1. We denote
$$\dot x = \frac{dx}{dt}, \qquad \dot z = \frac{dz}{dt}, \qquad z' = \frac{dz}{dx}.$$
tack). We want to determine the fastest trajectory z(x). Write the expression of
the time dt to go, on this trajectory, from x to x + dx in terms of the functions
w and h. Give the value of the total time T between the starting point and the
arrival.
4. Deduce from (3) the equation that determines the optimal trajectory (which min-
imizes T ).
5. Show that the translation invariance of the problem along the x direction yields
$$\frac{h'(z')\,z' - h(z')}{w(z)} = A,$$
where A is a constant.
6. Use the previous result to calculate the trajectory in the form of a function x(z)
(and not a function z(x)). Fix the value of the constant A.
7. Calculate the value of z′ = dz/dx as a function of z. We assume that z1 ≪ L and
z1 ≪ z0. Do you think the result corresponds to the best strategy? If not, what
modifications must the skipper make?
Chapter 4
Hamilton’s Canonical Formalism
The work of Lagrange was followed by the monumental, five volume, Traité de
Mécanique Céleste (Treatise of Celestial Mechanics) of Laplace, published between
1799 and 1825. This work had a crucial importance both in Astronomy and in the
evolution of philosophical ideas.
This leads us to the next step, in the 1830’s, and to the so-called canonical
formalism of Analytical mechanics due to Hamilton.1
Hamilton’s canonical formalism was elaborated in 1834.2 It is more convenient for
a series of problems such as the dynamics of point-like particles. But it is impressive,
above all, by the number of its developments, both in physics and in mathematics. In
the present book, we are mainly concerned with applications to mechanics, but we
shall describe several other spin-offs of Hamilton’s work. This work is actually
impressively rich in completely new mathematical and physical developments
concerning the symmetry properties that it reveals, which allow us to discover
unexpected properties and to tackle entirely new problems.
Hamilton had understood that, unlike intuitive variables, coordinates and their
time derivatives, coordinates and conjugate momenta have symmetric dynamical
1 As in the previous chapter, one can refer to Herbert Goldstein [3] or to Landau and Lifshitz [1]
and [2], for any development which would be missing here.
2 This formalism was developed by Hamilton in: On a General Method in Dynamics, Philosophical
Transactions of the Royal Society, part II for 1834, pp. 247–308, in a brilliant form.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 63
J.-L. Basdevant, Variational Principles in Physics,
https://doi.org/10.1007/978-3-031-21692-3_4
64 4 Hamilton’s Canonical Formalism
roles in Lagrangian mechanics, and this gives a lot of depth both to the problems
considered and to their solution.
This text is primarily oriented towards applications to mechanics, but we will end
with another consequence of Hamilton’s work, dynamical systems, whose founder
was Henri Poincaré (1854–1912).
It is in the next chapter that we will describe more fully Hamilton’s complete work,
which goes from optics to mathematics: in particular, he invented quaternions.
In Sect. (4.1), we explain this canonical formalism, which consists in describing
the state of a system by the conjugate variables, positions {x} and Lagrange con-
jugate momenta {p}, and not by positions and velocities. In other words, a system
is described by a point in phase space, and its evolution is governed by a Hamiltonian
which is obtained from the Lagrangian by a Legendre transformation.
In Sect. (4.2), we present the Poisson brackets, an important new mathematical
structure. Jacobi considered them as Poisson’s greatest discovery. In fact, Poisson
brackets bear the germ of the group theory of Sophus Lie (1842–1899). This study
allows us to have a new vision of conservation laws.
At this point, we will arrive at a discovery of Dirac in 1925. There is a perfect
structural symmetry between analytical and quantum mechanics, if we match the
Poisson brackets of the classical physical quantities, and the commutators of quantum
observables (divided by iħ).
In Sect. (4.3), we will define the canonical transformations, which have many
applications, and which exemplify the equivalence between the state variables {x}
and { p}.
This will lead us, in Sect. (4.4), to the phase space which is, from the mathemati-
cal point of view, the actual space appropriate to the description of the evolution of a
system of points, contrary to the empirical space of position and velocity variables.
We will establish in particular the theorem of Liouville (1809–1882), a geometric
property of the Hamiltonian flow of the trajectories of a system in its phase space. We
will see the remarkable geometric structure of the evolution of a dynamic system, a
structure that cannot appear with the classical variables x, ẋ.
In Sect. (4.5), we will come back to the case of a charged particle in a magnetic
field treated in Sect. (3.3.4) of the previous chapter, where precisely the conjugate
momentum and the classical momentum differ radically. In quantum mechanics, it
has been demonstrated experimentally that the Hamiltonian is expressed in
terms of the potentials, not the fields.
Finally, after writing Hamilton’s canonical equations, which are first order coupled
differential equations of the evolution of the state variables, we shall present,
in Sect. (4.6), some aspects of dynamical systems. In fact, this type of physical
problem was an amazing source of discoveries, both in mathematics and in physics.
Henri Poincaré founded this field of research in 1885 when he studied the three-body
problem. This leads to fascinating developments, such as the behavior for
t → ∞, attractors and strange attractors, bifurcations, chaos etc. The most famous
strange attractor is the Lorenz attractor, after its inventor, Edward N. Lorenz, who
discovered it in 1963 in a mathematical model of the evolution of the atmosphere.
4.1 Hamilton’s Canonical Formalism 65
Lorenz gave a new and spectacular source of interest to chaos with his “butterfly”
effect in meteorology.
Actually, the formulation (3.2) of the least action principle is not due to Lagrange
(who used a more complicated form). It is due to Hamilton in 1834. Hamilton, one
of the greatest figures of science, was fascinated by Lagrange and by his analyt-
ical mechanics which he called a “scientific poem written by the Shakespeare of
mathematics”.
Hamilton’s canonical formalism was formulated in 1834. It is more convenient
for some categories of problems, but it goes far beyond that. It contains a particularly
fruitful mathematical structure which leads to Lie groups, to dynamical systems, and
many other developments.
Hamilton’s starting point is to describe the state of a system by the variables xi ,
positions, and pi , the conjugate momenta, instead of xi and ẋi .
Suppose that we invert Eq. (3.12) and that we can calculate the {ẋi } in terms of the
{xi } and { pi }, which are our new state variables.3
The problem is to obtain the equations of motion of the {xi } and { pi } in terms of
these same variables, by eliminating the {ẋi }.
The solution consists in performing a Legendre transformation. Let us introduce
the Hamilton function or Hamiltonian
$$H(\{x_i\}, \{p_i\}, t) = \sum_i p_i\,\dot x_i - L. \tag{4.1}$$
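$$dH = p\,d\dot x + \dot x\,dp - \frac{\partial L}{\partial x}\,dx - \frac{\partial L}{\partial \dot x}\,d\dot x - \frac{\partial L}{\partial t}\,dt.$$
Taking into account (3.12) and (3.13), the first and fourth terms cancel, and the third
one is −ṗ dx. Therefore, we have
$$dH = \dot x\,dp - \dot p\,dx - \frac{\partial L}{\partial t}\,dt, \tag{4.2}$$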
3 Conjugate momenta always exist since the Lagrangian contains a quadratic term in ẋi .
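$$\dot x = \frac{\partial H}{\partial p}, \qquad \dot p = -\frac{\partial H}{\partial x}. \tag{4.3}$$
$$\dot x_i = \frac{\partial H}{\partial p_i}, \qquad \dot p_i = -\frac{\partial H}{\partial x_i} \tag{4.4}$$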
Hamilton’s equations (4.4) form a set of first order coupled differential equations
in the time variable, which is a major advantage. They are symmetric in x and p (up
to the minus sign, which we shall come back to). They possess the major technical
advantage of giving directly the time evolution of the state variables in terms of
these same variables.
The value of the Hamilton function is, of course, the energy (3.18). If the
Lagrangian does not depend explicitly on time, ∂L/∂t = 0, then ∂H/∂t = 0 and
the energy is conserved
$$\frac{\partial H}{\partial t} = 0 \;\longrightarrow\; \frac{d}{dt}H = 0. \tag{4.5}$$
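Because Hamilton's equations are first order and give the evolution of (x, p) directly in terms of (x, p), they translate almost literally into a numerical scheme. The sketch below builds the right-hand sides of (4.4) from any function H(x, p) by central differences and advances them with a Runge-Kutta step. The pendulum Hamiltonian, the step sizes and the function names are illustrative assumptions, not taken from the text.

```python
import math

def hamilton_rhs(H, x, p, eps=1e-6):
    """Right-hand sides of (4.4): (dx/dt, dp/dt) = (dH/dp, -dH/dx), by central differences."""
    dH_dp = (H(x, p + eps) - H(x, p - eps)) / (2 * eps)
    dH_dx = (H(x + eps, p) - H(x - eps, p)) / (2 * eps)
    return dH_dp, -dH_dx

def rk4_step(H, x, p, dt):
    """One classical Runge-Kutta step for the pair (x, p)."""
    k1x, k1p = hamilton_rhs(H, x, p)
    k2x, k2p = hamilton_rhs(H, x + dt / 2 * k1x, p + dt / 2 * k1p)
    k3x, k3p = hamilton_rhs(H, x + dt / 2 * k2x, p + dt / 2 * k2p)
    k4x, k4p = hamilton_rhs(H, x + dt * k3x, p + dt * k3p)
    return (x + dt / 6 * (k1x + 2 * k2x + 2 * k3x + k4x),
            p + dt / 6 * (k1p + 2 * k2p + 2 * k3p + k4p))

def H_pendulum(x, p):
    """Illustrative time-independent Hamiltonian (unit mass, length and gravity)."""
    return 0.5 * p ** 2 - math.cos(x)

x, p = 1.0, 0.0
E0 = H_pendulum(x, p)
for _ in range(10000):
    x, p = rk4_step(H_pendulum, x, p, 1e-3)
E1 = H_pendulum(x, p)
print(E0, E1)   # H has no explicit time dependence, so the two values should agree closely
```

The near equality of the two printed values illustrates (4.5). The small residual difference comes from the finite step and the finite-difference derivatives, not from the equations themselves.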
Consider two physical quantities f and g, which are functions of the state variables
(xi , pi ), i = 1, . . . , N and possibly of time. One calls Poisson bracket of f and g
the quantity
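$$\{f, g\} = \sum_{i=1}^{N}\left(\frac{\partial f}{\partial x_i}\frac{\partial g}{\partial p_i} - \frac{\partial f}{\partial p_i}\frac{\partial g}{\partial x_i}\right). \tag{4.6}$$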
Poisson brackets have the following properties, which are straightforward to estab-
lish
$$\{f, g\} = -\{g, f\}, \qquad \{f_1 + f_2, g\} = \{f_1, g\} + \{f_2, g\} \tag{4.7}$$
4.2 Poisson Brackets, Phase Space 67
$$\{f_1 f_2, g\} = f_1\{f_2, g\} + \{f_1, g\}\,f_2. \tag{4.8}$$
and
$$\{x_i, f\} = \frac{\partial f}{\partial p_i}, \qquad \{p_i, f\} = -\frac{\partial f}{\partial x_i}. \tag{4.10}$$
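$$\dot f = \frac{df}{dt} = \sum_i\left(\frac{\partial f}{\partial x_i}\,\dot x_i + \frac{\partial f}{\partial p_i}\,\dot p_i\right) + \frac{\partial f}{\partial t}. \tag{4.12}$$
$$\dot f = \{f, H\} + \frac{\partial f}{\partial t}. \tag{4.13}$$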
In particular, the canonical equations (4.4) are now written in a symmetric way:
$$\dot x_i = \{x_i, H\}, \qquad \dot p_i = \{p_i, H\}.$$
In the canonical formalism, the Hamiltonian governs the time evolution of the
system. If a physical quantity f does not depend explicitly on time, i.e. ∂ f /∂t = 0
(which amounts to saying that the system is isolated) then its time evolution is
obtained by taking the Poisson bracket of f and the Hamiltonian
f˙ = { f, H }. (4.15)
In fact, the Poisson brackets contain the germ of the group theory of Sophus Lie
(1842–1899).
Poisson Theorem
Theorem 1 If f and g are two constants of the motion, then their Poisson bracket
is also a constant of the motion.
This theorem, due to Poisson, can be derived from the Jacobi identity (4.11).
We assume that f and g are constants of the motion, i.e. {g, H } = 0 and {H, f } = 0.
Therefore,
{H, { f, g}} = 0
and { f, g} is a constant of the motion. In certain cases, this allows one to find new
constants of the motion.
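The step from the hypotheses to the conclusion can be written out as follows. This is a sketch that assumes the Jacobi identity (4.11) has its standard cyclic form and that f and g do not depend explicitly on time.

```latex
\{f, \{g, H\}\} + \{g, \{H, f\}\} + \{H, \{f, g\}\} = 0,
\qquad
\{g, H\} = 0,\ \{H, f\} = 0
\;\Longrightarrow\;
\{H, \{f, g\}\} = 0 .
```

The first two terms vanish because the inner brackets are zero, which leaves only the third term.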
Jacobi wrote, with sharpness: [this is] Mr. Poisson’s deepest discovery, which, I believe, was
well understood neither by Lagrange, nor by the many others who quoted it, nor by its
author himself. The theorem I am talking about seems to me the most important in mechanics
[. . .] this truly prodigious theorem remained at the same time discovered and hidden.
$$\{L_x, L_y\} = L_z, \qquad \{L_y, L_z\} = L_x, \qquad \{L_z, L_x\} = L_y. \tag{4.17}$$
This reflects the 3-dimensional group structure of rotations, SO(3). If the Poisson
bracket of a Hamiltonian with the angular momentum cancels, that Hamiltonian is
rotation invariant.
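The relations (4.17) can be checked numerically from the definition (4.6). The sketch below evaluates the Poisson bracket by central differences at an arbitrary phase-space point. The helper name `poisson`, the step `eps` and the sample point are illustrative assumptions.

```python
def poisson(f, g, x, p, eps=1e-5):
    """Poisson bracket (4.6) of f and g at the point (x, p), by central differences."""
    def d(func, which, i):
        xs, ps = list(x), list(p)
        target = xs if which == "x" else ps   # list whose i-th entry is varied
        target[i] += eps
        up = func(xs, ps)
        target[i] -= 2 * eps
        down = func(xs, ps)
        return (up - down) / (2 * eps)
    return sum(d(f, "x", i) * d(g, "p", i) - d(f, "p", i) * d(g, "x", i)
               for i in range(len(x)))

# Components of the angular momentum L = r x p.
def Lx(x, p): return x[1] * p[2] - x[2] * p[1]
def Ly(x, p): return x[2] * p[0] - x[0] * p[2]
def Lz(x, p): return x[0] * p[1] - x[1] * p[0]

x0, p0 = [0.3, -1.2, 0.7], [1.1, 0.4, -0.5]   # arbitrary sample point (assumption)
bracket = poisson(Lx, Ly, x0, p0)
print(bracket, Lz(x0, p0))                    # the two numbers should coincide
```

Replacing `Ly` by a rotation-invariant function such as the squared momentum gives a bracket that vanishes, in line with the remark on rotation-invariant Hamiltonians.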
The above formulas reveal an amazing fact. There is a strong analogy, if not more,
between the structures of analytical mechanics and of quantum mechanics. In quan-
tum mechanics, one proves quite easily what is called the Ehrenfest theorem:4 the
time derivative of the expectation value ⟨a⟩ of a physical quantity A is related to the
commutator of the observable Â and the Hamiltonian Ĥ by the relation
$$\frac{d}{dt}\langle a\rangle = \frac{1}{i\hbar}\,\langle[\hat A, \hat H]\rangle + \left\langle\frac{\partial \hat A}{\partial t}\right\rangle. \tag{4.18}$$
If, by definition, we introduce an operator $\hat{\dot A}$ such that whatever the state vector |ψ⟩
we have
$$\langle\psi|\hat{\dot A}|\psi\rangle := \frac{d}{dt}\langle a\rangle,$$
then we have the equality of observables
$$\hat{\dot A} = \frac{1}{i\hbar}\,[\hat A, \hat H] + \frac{\partial \hat A}{\partial t}, \tag{4.19}$$
which has the same structure as (4.13) if one replaces the Poisson brackets by the
commutators, divided by iħ, of the quantum observables.
The same remark applies to the canonical commutation relations of the conjugate
variables of position x̂ and momentum p̂
In the Lagrangian formalism, the Lagrange-Euler equations keep the same form in any
change of coordinates xi −→ X i (x1 , . . . , xn ) (for instance, changing from Cartesian
coordinates (x, y, z) to polar coordinates (r, θ, φ)). These changes of coordinates in
configuration space are called point transformations.
In the Hamiltonian formalism there exists a much larger class of transformations
under which the equations of motion are invariant. One can, indeed, mix the state
variables, i.e. the positions {xi } and the conjugate momenta { pi }, and perform a
transformation in phase space, whose interest we see below. One calls canonical
transformation a coordinate transformation
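such that Hamilton’s equations keep the same form in the new variables. Let
H′(X1, . . . , XN, P1, . . . , PN; t) be the Hamilton function expressed in terms of the
new variables [Xi, Pi]; then, in a canonical transformation, by definition one has
$$\dot X_i = \frac{\partial H'}{\partial P_i}, \qquad \dot P_i = -\frac{\partial H'}{\partial X_i}. \tag{4.22}$$
$$\{X_i, X_j\} = 0, \qquad \{P_i, P_j\} = 0, \qquad \{X_i, P_j\} = \delta_{ij}. \tag{4.23}$$
$$\dot X = \frac{\partial X}{\partial x}\frac{\partial H}{\partial p} - \frac{\partial X}{\partial p}\frac{\partial H}{\partial x}. \tag{4.24}$$
$$\frac{\partial H}{\partial x} = \frac{\partial H'}{\partial X}\frac{\partial X}{\partial x} + \frac{\partial H'}{\partial P}\frac{\partial P}{\partial x}; \qquad \frac{\partial H}{\partial p} = \frac{\partial H'}{\partial X}\frac{\partial X}{\partial p} + \frac{\partial H'}{\partial P}\frac{\partial P}{\partial p}. \tag{4.27}$$
$$\dot X = \frac{\partial H'}{\partial P}, \qquad \dot P = -\frac{\partial H'}{\partial X} \qquad \text{QED}. \tag{4.28}$$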
(a) Comments
1. The extension to an arbitrary number N of variables
(x1 , . . . , x N , p1 , . . . , p N ) → (X 1 , . . . , X N , P1 , . . . , PN )
4. In general, one calls canonical conjugate variables q and p two physical quantities
such that {q, p} = 1. One example, in spherical coordinates, is the azimuthal angle
ϕ and the z component of the angular momentum L z (see exercise (3.3) of Chap.
2).
which amounts to
$$A = \frac{X^2 + P^2}{2}, \qquad \varphi = \arctan\left(\frac{P}{X}\right). \tag{4.31}$$
The variables (A, ϕ) are canonical conjugate variables, as one can check with no
difficulty. In these variables, the Hamiltonian reduces to the simple expression H =
ω A. Hence the equations of motion
H = ω A, {A, ϕ} = 1 ⇒ Ȧ = 0 , ϕ̇ = ω, (4.32)
Here, E is the energy of the oscillator, a constant of the motion. The interesting point
about this operation is that we have reduced the problem to a single time-dependent
variable, the angle ϕ. Since the energy, which is proportional to the action A, is
conserved, only the angular variable ϕ evolves. The variable ϕ is a cyclic variable. It
does not appear explicitly in the Hamiltonian, and this results in the properties (4.32)
and (4.33).
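The statement that (A, ϕ) are canonically conjugate can be checked numerically from the expressions (4.31). The sketch below evaluates the bracket {A, ϕ} with respect to (X, P) by central differences. The function names, the step `eps` and the sample point are illustrative assumptions, and `atan2` is used as the branch of arctan(P/X) valid for X > 0.

```python
import math

def action(X, P):
    """A = (X**2 + P**2)/2, first relation of (4.31)."""
    return 0.5 * (X * X + P * P)

def angle(X, P):
    """phi = arctan(P/X), second relation of (4.31), for X > 0."""
    return math.atan2(P, X)

def bracket(f, g, X, P, eps=1e-6):
    """Poisson bracket {f, g} with respect to the single pair (X, P)."""
    df_dX = (f(X + eps, P) - f(X - eps, P)) / (2 * eps)
    df_dP = (f(X, P + eps) - f(X, P - eps)) / (2 * eps)
    dg_dX = (g(X + eps, P) - g(X - eps, P)) / (2 * eps)
    dg_dP = (g(X, P + eps) - g(X, P - eps)) / (2 * eps)
    return df_dX * dg_dP - df_dP * dg_dX

value = bracket(action, angle, 0.8, 0.5)   # arbitrary sample point (assumption)
print(value)                               # should be numerically equal to 1
```

The result does not depend on the point chosen, as long as it is not the origin, where the angle is undefined.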
The geometric interpretation in the (X, P) space, which is here equivalent to
phase space, is simple. The motion occurs on a circle of radius $\sqrt{2A} = \sqrt{2E/\omega}$, which
depends on the energy E, a constant of the motion. On this circle, the motion of the
point (X, P) is uniform, of angular velocity ω : ϕ = ωt + ϕ0.
We already mentioned cyclic variables in Sect. (3.2.2). This is a simple example
of the role played by such variables, in particular in the investigation of integrable
systems.
Coupled Oscillators
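$$H = \frac{1}{2}\,(p_1^2 + k^2 x_1^2) + \frac{1}{2}\,(p_2^2 + k^2 x_2^2) + \frac{1}{2}\,K^2 (x_1 - x_2)^2$$
separates by the canonical transformation
$$P = (p_1 + p_2)/\sqrt{2}, \quad X = (x_1 + x_2)/\sqrt{2}, \quad Q = (p_1 - p_2)/\sqrt{2}, \quad Y = (x_1 - x_2)/\sqrt{2}$$
into
$$H = H_1 + H_2 \quad\text{with}\quad H_1 = \frac{1}{2}\,(P^2 + \omega_1^2 X^2) \quad\text{and}\quad H_2 = \frac{1}{2}\,(Q^2 + \omega_2^2 Y^2),$$
where $\omega_1 = k$ and $\omega_2 = \sqrt{k^2 + 2K^2}$.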
We then obtain, in phase space, a toric motion, periodic or not depending on
whether ω1 /ω2 is rational or not, represented in Fig. (4.1b)
Fig. 4.1 a–Left: Path of the state of a one-dimensional harmonic oscillator in phase space. b–Right:
Dynamics on a torus, in phase space, of two coupled oscillators
Note that in this case the total energy of the system can be written as E = E 1 + E 2
where the two energies E 1 and E 2 are constants of the motion. If we refer to (4.33),
we see that the relative scale depends on E 1 and E 2 , that are determined by the initial
conditions.
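The separation of the coupled oscillators into two independent modes can be checked by direct substitution. The sketch below compares the original Hamiltonian with the sum H1 + H2 at random phase-space points. With the coupling term written as K²(x1 − x2)²/2, the substitution gives ω2² = k² + 2K² for the relative mode, which is the value used here. The function names and the numerical values of k and K are illustrative assumptions.

```python
import math
import random

def H_original(x1, x2, p1, p2, k, K):
    """Hamiltonian of the two coupled oscillators in the original variables."""
    return (0.5 * (p1 ** 2 + k ** 2 * x1 ** 2)
            + 0.5 * (p2 ** 2 + k ** 2 * x2 ** 2)
            + 0.5 * K ** 2 * (x1 - x2) ** 2)

def H_normal_modes(x1, x2, p1, p2, k, K):
    """Same energy written as H1 + H2 in the variables (X, P) and (Y, Q)."""
    s = math.sqrt(2.0)
    P, X = (p1 + p2) / s, (x1 + x2) / s
    Q, Y = (p1 - p2) / s, (x1 - x2) / s
    w1_sq = k ** 2
    w2_sq = k ** 2 + 2 * K ** 2   # follows from (x1 - x2)**2 = 2*Y**2
    return 0.5 * (P ** 2 + w1_sq * X ** 2) + 0.5 * (Q ** 2 + w2_sq * Y ** 2)

random.seed(0)
k, K = 1.3, 0.7                    # illustrative values (assumption)
max_gap = 0.0
for _ in range(100):
    point = [random.uniform(-2.0, 2.0) for _ in range(4)]
    gap = abs(H_original(*point, k, K) - H_normal_modes(*point, k, K))
    max_gap = max(max_gap, gap)
print(max_gap)                     # should be at the level of rounding errors
```

Each of the two terms H1 and H2 is then a one-dimensional oscillator of the kind treated above, with its own conserved energy E1 or E2.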
$$d\Omega = dx_1 \ldots dx_N\,dp_1 \ldots dp_N. \tag{4.34}$$
Consider an arbitrary volume Ω of phase space, $\Omega = \int d\Omega$; we claim that this
volume is invariant under canonical transformations
$$dx_1 \ldots dx_N\,dp_1 \ldots dp_N = dX_1 \ldots dX_N\,dP_1 \ldots dP_N. \tag{4.35}$$
$$\dot x = \frac{\partial H}{\partial p}, \qquad \dot p = -\frac{\partial H}{\partial x}.$$
One calls the flow of this vector field the set of curves whose tangents at each point
are collinear with the vector at this point. We notice that the flow of (ẋ, ṗ), also called
the Hamiltonian flow, is orthogonal at each point to the gradient of the Hamiltonian
at this point
$$\nabla H = \left(\frac{\partial H}{\partial x}, \frac{\partial H}{\partial p}\right).$$
In the example (4.29) above, the result is very simple. The trajectories in the (X, P)
plane are circles centered at the origin, and the gradient of H = (P 2 + X 2 )/2 lies
along straight lines going through the origin. This can be stated in the reverse way:
the gradient of H = (P² + X²)/2 lies along straight lines going through the ori-
gin, therefore the trajectories are circles centered at the origin. This result can be
generalized to any number of variables. One can express the conservation laws of
energy, momentum and angular momentum with geometrical considerations of that
kind (using the corresponding invariance properties).
Consider the form (3.52) of the Lagrangian of a charged particle placed in an elec-
tromagnetic field
$$L = \frac{1}{2}\,m\,\dot{\mathbf r}^2 + q\,\dot{\mathbf r}\cdot\mathbf A(\mathbf r, t) - q\,\Phi(\mathbf r, t). \tag{4.37}$$
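The conjugate momentum is $\mathbf p = \partial L/\partial \dot{\mathbf r} = m\,\dot{\mathbf r} + q\,\mathbf A(\mathbf r, t)$.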
4.5.1 Hamiltonian
$$H = \frac{1}{2m}\,(\mathbf p - q\mathbf A(\mathbf r, t))^2 + q\,\Phi(\mathbf r, t), \tag{4.39}$$
which is expressed in terms of the potentials A and Φ, and not the fields E and B.
As an exercise, one can write the Hamiltonian in the relativistic case (3.72); the
result is
$$H = \sqrt{m^2c^4 + c^2(\mathbf p - q\mathbf A)^2} + q\,\Phi, \tag{4.40}$$
where one discovers the “prescription” to introduce the electromagnetic field. One
must substitute p − qA for the momentum p and E − qΦ for the energy E in the
expression of the energy-momentum relation for a free particle.
It is that prescription which Schrödinger applied to the free wave equation for de
Broglie waves in order to calculate the energy levels of the hydrogen atom. After
some unexpected mismatches, he finally ended up with his celebrated equation (to
leading order in v/c the hydrogen atom spectrum is nonrelativistic).
More generally, if we note X(t) = (ri (t), pi (t)) the position of the system at time
t in phase space, Hamilton’s equations are of the form Ẋ = F(X), i.e. a first order
differential equation for the evolution of the 2N -component vector X (t). This is
called a dynamical system.
This seems relatively simple when the number of components is small. In fact, this
was the case in the early development of mechanics with the discoveries of Galileo,
Newton and all the mathematicians and physicists who built the evolution in time
of fairly simple systems, even in astronomy (Kepler’s ellipses, Halley’s comet etc.).
Statistical physics, on the other hand, has grown thanks to the methods developed as
a result of Boltzmann’s work. But many domains seem out of reach of computers as
soon as the number of variables is too large.
This type of problem has been an important source of discoveries both in mathe-
matics and in physics; one can refer to the book of I. Percival and D. Richards [13]
and to the works of D. Ruelle and I. Ekeland.8
In the case of large dimensions, certainly out of reach of the techniques of differential calculus, the study of what are called Dynamical Systems has been an incredible source of discoveries in mathematics, in physics and in many other fields. The evolution over time of complex systems has been, and still is, one of the great problems of both basic research, in mathematics, and applied research. What is important is not to obtain numerical precision on a given quantity, but to identify, qualitatively and quantitatively, astonishing global behaviors that a sequence of numbers can hardly express.
8 David Ruelle, “Turbulence, Strange Attractors, and Chaos”, World Scientific. 16: 195, (1995); Ivar
Ekeland, The broken dice, and other mathematical tales of chance, University of Chicago Press.
pp. iv+183. ISBN 978-0-226-19991-7 (1993).
9 See for example Charpentier Eric, Ghys Etienne, Lesne Annick editors (2010). The scientific legacy
attractors and strange attractors; bifurcations, which are sudden changes in the nature
of these flows for certain values of the parameters entering the function F(X), chaos
and the “butterfly effect” in meteorology, etc.
Poincaré proved that in a gravitating system involving more than two bodies, say
planets around the sun, to initial conditions as close as one wishes, there corresponds
a time when two of the planets can be as far away from each other (and from their
starting point) as one wishes. In the 19th century, Laplace and others had extensively
developed perturbation theory which provided extremely accurate predictions for
celestial mechanics. However, in the course of his work, Poincaré showed that the
perturbation expansion never converges; it is only an asymptotic expansion of which
only the first terms are useful and can be used over a finite time interval. This
effect is called chaos. It manifests itself by the fact that two solutions corresponding to extremely close initial conditions differ by a quantity increasing with time as exp(t/τ), where the characteristic time τ, called the Liapounov horizon, depends on the problem considered. It occurs in many other physical problems. Liapounov's horizon varies widely from one system to another; it is about 200 million to one billion years for the solar system.11
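The practical meaning of the exp(t/τ) growth can be made concrete with a small worked computation. The sketch below assumes pure exponential growth of an initial error δ0 and asks when it reaches a chosen tolerance; the function names and the numerical values are illustrative assumptions, with times measured in units of τ.

```python
import math

def separation(delta0, t, tau):
    """Distance between two trajectories after time t, assuming pure exp(t/tau) growth."""
    return delta0 * math.exp(t / tau)

def prediction_horizon(delta0, tolerance, tau):
    """Time at which an initial error delta0 has grown to the accepted tolerance."""
    return tau * math.log(tolerance / delta0)

# Illustrative numbers only: times are measured in units of tau.
t_coarse = prediction_horizon(1e-3, 1.0, 1.0)
t_fine = prediction_horizon(1e-6, 1.0, 1.0)   # initial data 1000 times more precise
```

Because the horizon depends only logarithmically on the initial error, improving the initial precision by a factor of 1000 merely doubles the prediction time in this example.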
A very simple example of a chaotic system is playing dice. In principle, in classical mechanics, if we were to determine extremely accurately the conditions of the problem (i.e. the initial conditions, the way to throw the dice, the geometry of the dice, etc.), one could predict the result of a throw of dice, and the phenomenon would lose its probabilistic character. However, it is quite obvious and intuitive that the outcome of different experiments on dice would be highly sensitive to the initial conditions and that it would require an enormous amount of information to make the prediction. It is therefore much more efficient in practice to use a probabilistic description of the problem, where one imposes some ignorance on the initial conditions, which are said to be chosen “at random”. This phenomenon is encountered in celestial mechanics, and in many other problems, when initial conditions are close but not “infinitesimally” close, provided the time of evolution is long enough.
The case of three unequal mass planets orbiting around a “sun”, taking into account their mutual interactions, is shown in Fig. 4.2. At the beginning, everything evolves
rather smoothly. However, after some time, the lightest planet is simply ejected from
the system; this is of course compatible with energy conservation, which would not
be the case for a two-body system. By letting the computer run for a longer time,
the two other planets, which have a smooth motion at first, also reach unexpected
configurations.
11Jacques Laskar, Is the Solar System stable ?, Progress in Mathematical Physics, 66, pp 239–270,
(2013); G.J. Sussman, J. Wisdom; Chaotic evolution of the solar system, Science 257 (1992), 56–62.
4.6 Dynamical Systems 79
Fig. 4.2 Evolution of three planets around a star taking into account their mutual interactions. The
time sequence of the pictures must be read from left to right and from bottom to top. The time interval
between two pictures is the same. One sees that at the eleventh stage, the third planet, lighter and
initially close to the second one, is expelled from the system. Pictures due to Jean-François Colonna, http://www.lactamme.polytechnique.fr; all rights reserved
Poincaré’s recurrence theorem12 (1890) says that for almost all “initial conditions”, a conservative dynamical system with a phase space of finite volume will come back, in the course of time, as close as one wishes to its initial condition, and will do so repeatedly. That the volume of the phase space is bounded is not exceptional: if the energy E is constant and the potential V is positive, the positivity of the kinetic energy ensures that the phase space is bounded by the condition V(r) ≤ E.
To establish the theorem, consider a point P in a neighborhood D0 of the phase space. Of course, it will evolve and find itself, at time t, in a different region D1 of the phase space. These two regions may not have anything in common, but they have the same volume, according to the Liouville theorem. After a time kt, with k as large as one wishes, the system will end up in a neighborhood Dk, always of the same volume. It can thus visit as many domains Dk as one wishes. But, since the volume of these domains is the same and the total volume of the phase space is finite, while their number is as large as we want, necessarily two of these domains
12 Barreira, Luis (2006). Zambrini, Jean-Claude (ed.). Poincaré recurrence: Old and new. XIVth International Congress on Mathematical Physics. World Scientific. pp. 415–422. doi:10.1142/9789812704016_0039.
80 4 Hamilton’s Canonical Formalism
Fig. 4.3 Zermelo paradox: the molecules, all to the left of the chamber at t = 0, are found there
again after a certain time T
will have a non-zero intersection after a sufficiently long time. The dynamical system will therefore pass, at some time t, as close as one wants to its initial condition, and this repeatedly.
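The recurrence argument can be illustrated on a toy model. The sketch below is not a mechanical system: it assumes the simplest volume-preserving map on a space of finite volume, a rotation of the circle by a fixed irrational fraction of a turn, and looks for the first step at which the point re-enters a small neighborhood of its starting position. The function name and the choice of the golden-ratio angle are illustrative assumptions.

```python
import math

def first_return(alpha, eps, max_steps=10**6):
    """Smallest n >= 1 such that n*alpha (mod 1) lies within eps of the start, 0."""
    theta = 0.0
    for n in range(1, max_steps + 1):
        theta = (theta + alpha) % 1.0            # length-preserving map of the circle
        if min(theta, 1.0 - theta) < eps:        # distance to the start on the circle
            return n
    return None                                  # no return found within max_steps

golden = (math.sqrt(5) - 1) / 2                  # irrational rotation angle (in turns)
n_return = first_return(golden, 0.01)
```

Shrinking the neighborhood makes the first return later but never prevents it, which is the content of the theorem for this simple map.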
This effect, called the “Zermelo paradox”, thus seems to deprive us of the second
principle of thermodynamics. Indeed, imagine molecules of a gas all enclosed at
t = 0 in a half of a container, as in Fig. (4.3). We remove the intermediate wall, the
molecules will spread throughout the container. But Poincaré’s theorem says there
is a time T where they all end up in the initial half!
The paradox of Zermelo13 is schematized in Fig. 4.3. Of course, this time T is quite long. It is estimated to be of the order of $2^{10^{23}}$ times a characteristic time of molecule migration, which easily leads to a duration incomparably greater than any extrapolation of the age of the universe.
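To get a feeling for this order of magnitude, one can compare numbers of decimal digits. The sketch below assumes that the factor 2^N is the inverse of the probability 2^(−N) of finding N independent molecules in the same half of the container, and takes the age of the universe as roughly 13.8 billion years; both the function name and these inputs are assumptions made for illustration.

```python
import math

def log10_recurrence_factor(n_molecules):
    """log10 of 2**N, where 2**-N is the chance that N independent molecules
    are all found in the same given half of the container."""
    return n_molecules * math.log10(2.0)

AGE_OF_UNIVERSE_S = 4.4e17   # assumption: about 13.8 billion years, in seconds

digits_gas = log10_recurrence_factor(1e23)   # decimal digits of 2**(10**23)
digits_age = math.log10(AGE_OF_UNIVERSE_S)   # decimal digits of the age in seconds
```

The recurrence factor has about 3 × 10²² decimal digits, whereas the age of the universe in seconds has fewer than twenty, so the choice of the characteristic time unit is irrelevant to the conclusion.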
The most famous strange attractor is probably the Lorenz attractor, named after Edward N. Lorenz,14 who discovered it in 1963 starting from a mathematical model of the atmosphere. This discovery sparked a spectacular new interest in chaos.
Consider the evolution of a rectangular slice of the atmosphere which is heated
from below and cooled from above. There are three variables, x, the convective
flow of the atmosphere, y the horizontal temperature distribution and z the vertical
temperature distribution.
The details of the physics involved are of little interest here. In the Lorenz model, the evolution of these variables is given by the non-linear differential system
13 Ernst Zermelo, Über einen Satz der Dynamik und die mechanischen Wärmetheorie, Wied. Ann.
57 (1896, 793).
14 Edward N. Lorenz, Deterministic Nonperiodic Flow, Journal of the Atmospheric Sciences, vol. 20, no. 2, March 1963, p. 130–141; Lorenz, Edward N., The essence of chaos, University of Washington Press, 1993.
$$\begin{aligned}
\frac{dx}{dt} &= \sigma (y - x) \\
\frac{dy}{dt} &= \rho x - y - x z \\
\frac{dz}{dt} &= x y - \beta z, \qquad (4.41)
\end{aligned}$$
where σ is the ratio between the viscosity and the thermal conductivity, ρ is the
temperature difference between the top and the bottom of the slice, and β is the ratio
of the width and the height of the slice.
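The system (4.41) is easy to encode as a function F(X). The sketch below assumes the parameter values σ = 10, ρ = 28, β = 8/3 commonly associated with Lorenz's 1963 study (they are not given in the text), and checks two simple properties: the three stationary points of the flow, and the constant divergence of F. The function names are hypothetical.

```python
import math

SIGMA, RHO, BETA = 10.0, 28.0, 8.0 / 3.0   # assumed: the classical parameter values

def lorenz(state, sigma=SIGMA, rho=RHO, beta=BETA):
    """Right-hand side of Eq. (4.41) at the point state = (x, y, z)."""
    x, y, z = state
    return (sigma * (y - x), rho * x - y - x * z, x * y - beta * z)

def fixed_points(rho=RHO, beta=BETA):
    """Points where all three time derivatives vanish (requires rho > 1)."""
    r = math.sqrt(beta * (rho - 1.0))
    return [(0.0, 0.0, 0.0), (r, r, rho - 1.0), (-r, -r, rho - 1.0)]

def divergence(sigma=SIGMA, beta=BETA):
    """d(dx/dt)/dx + d(dy/dt)/dy + d(dz/dt)/dz, the same at every point."""
    return -(sigma + 1.0 + beta)
```

Since the divergence is negative everywhere, volumes in the (x, y, z) space shrink under the flow, in contrast with the volume-preserving flows covered by Liouville's theorem.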
Lorenz used to solve this problem numerically using hours of computer time at
night,15 by standard successive iteration techniques (xi , yi , z i ) → (xi+1 , yi+1 , z i+1 ).
At that time, this generated kilograms of paper called computer listings. One day,
Lorenz had the idea of redoing a calculation whose solution he had found the day
before, using as a starting point not the last point obtained the day before, but some
intermediate value (xi , yi , z i ) obtained in the calculation. To his great surprise, after
a relatively small number of iterations, the following values appeared completely
different from those obtained previously. Lorenz had rediscovered chaos, due, in
that case, to round-off errors of the numbers he used.
The sensitivity of the results to the initial conditions induces the same type of difference between two solutions initially close to one another. Lorenz called that the “butterfly effect”. Actually, the title of one of his talks was: Can the beat of a butterfly’s wing in Brazil cause a tornado in Texas? Whether or not it is a coincidence, the “Lorenz attractor” has the shape of the wings of a butterfly.
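This sensitivity is simple to reproduce numerically. The sketch below is not Lorenz's original procedure: it assumes the parameter values σ = 10, ρ = 28, β = 8/3, a fourth-order Runge-Kutta scheme with time step 0.01, and an initial offset of 10⁻⁸ on x that plays the role of a round-off error. It records the distance between the two trajectories at each step; all names are hypothetical.

```python
import math

def lorenz(s, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of Eq. (4.41); parameter values are assumed, not from the text."""
    x, y, z = s
    return (sigma * (y - x), rho * x - y - x * z, x * y - beta * z)

def rk4_step(f, s, dt):
    """One classical Runge-Kutta step for ds/dt = f(s)."""
    def moved(k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))
    k1 = f(s)
    k2 = f(moved(k1, dt / 2))
    k3 = f(moved(k2, dt / 2))
    k4 = f(moved(k3, dt))
    return tuple(si + dt / 6 * (a + 2 * b + 2 * c + d)
                 for si, a, b, c, d in zip(s, k1, k2, k3, k4))

def separation_history(s0, eps, dt, n_steps):
    """Distances between two runs whose initial x values differ by eps."""
    a = s0
    b = (s0[0] + eps, s0[1], s0[2])
    history = [math.dist(a, b)]
    for _ in range(n_steps):
        a = rk4_step(lorenz, a, dt)
        b = rk4_step(lorenz, b, dt)
        history.append(math.dist(a, b))
    return history

history = separation_history((1.0, 1.0, 1.0), 1e-8, 0.01, 4000)
```

Plotting `history` on a logarithmic scale shows a roughly linear rise followed by saturation once the separation reaches the size of the attractor.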
In Figs. 4.4 and 4.5 one can see the result of an iteration of Eq. (4.41). We notice that the time evolution of the point (x, y, z) is at first perfectly quiet: the point turns around on a wing of the attractor, but unexpectedly it “jumps”
Fig. 4.4 Lorenz attractor viewed from two different sides. The points correspond to a discrete
numerical iteration of (4.41). One can follow the points and observe the sudden and unexpected
transition from one wing of the attractor to the other, which was not possible to predict half a
semi-period before. (Courtesy Jean-François Colonna)
15As an order of magnitude of the performances of computers in the early 1960’s, his first computer,
called Royal McBee, could perform 60 multiplications per second.
from one wing to the other at certain times. This occurs unexpectedly in space as
well as in time, in the sense that the trajectories of two points which are initially very
close (in positions and velocities) can become completely different at a later time.
In particular, the two positions can be on different wings of the attractor.
The rigorous proof that the Lorenz system indeed possesses a strange attractor was only given in 2001 by Warwick Tucker.16
4.7 Exercises
Calculate the Poisson brackets of the three components of the angular momentum
L = r × p.
$$\mathbf{A} = \frac{\mathbf{p} \times \mathbf{L}}{m} - e^2\, \frac{\mathbf{r}}{r}$$
between each other, with the components of the angular momentum and with the
Hamiltonian. What can one conclude on the number of unknowns in that problem?
16W. Tucker, A Rigorous ODE Solver and Smale’s 14th Problem, Found. Comp. Math., vol. 2,
2002, p. 53–117.
The problem of three coupled oscillators is treated in analogous manner as the two-
body case of Sect. (4.3) with the Jacobi variables.
The Hamiltonian is
$$H = \frac{1}{2m} (p_1^2 + p_2^2 + p_3^2) + \frac{m\omega^2}{2} (x_1^2 + x_2^2 + x_3^2) + \frac{m\Omega^2}{2} \left( (x_1 - x_2)^2 + (x_2 - x_3)^2 + (x_3 - x_1)^2 \right).$$
The canonical transformation (Jacobi variables) is
$$X_1 = \frac{x_1 - x_2}{\sqrt{2}}, \quad X_2 = \frac{2x_3 - x_1 - x_2}{\sqrt{6}}, \quad X_3 = \frac{x_1 + x_2 + x_3}{\sqrt{3}},$$
$$P_1 = \frac{p_1 - p_2}{\sqrt{2}}, \quad P_2 = \frac{2p_3 - p_1 - p_2}{\sqrt{6}}, \quad P_3 = \frac{p_1 + p_2 + p_3}{\sqrt{3}},$$
which gives $\sum_i P_i^2 = \sum_i p_i^2$, and $3(X_1^2 + X_2^2) = (x_1 - x_2)^2 + (x_2 - x_3)^2 + (x_3 - x_1)^2$.
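The two identities quoted for the Jacobi variables can be checked numerically before using them. The following sketch applies the same linear map to positions and momenta and compares both sides for arbitrary test values; the helper names and the numbers are illustrative assumptions.

```python
import math

def jacobi(v):
    """Map (v1, v2, v3) to the Jacobi variables; the same linear map serves x and p."""
    v1, v2, v3 = v
    return ((v1 - v2) / math.sqrt(2),
            (2 * v3 - v1 - v2) / math.sqrt(6),
            (v1 + v2 + v3) / math.sqrt(3))

def pair_sum(x):
    """Sum of the squared pairwise differences entering the coupling term."""
    x1, x2, x3 = x
    return (x1 - x2) ** 2 + (x2 - x3) ** 2 + (x3 - x1) ** 2

x = (0.3, -1.2, 2.5)     # arbitrary test values
p = (1.1, 0.4, -0.7)
X, P = jacobi(x), jacobi(p)
```

The first identity reflects the fact that the map is an orthogonal rotation, which is also why it can be applied to x and p simultaneously without spoiling the canonical structure.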
$$H = \frac{p^2}{2m} + \frac{1}{2} m \omega^2 x^2 \qquad (4.42)$$
where x and p are Lagrange conjugate variables.
1. We set $x = X/\sqrt{m\omega}$ and $p = P \sqrt{m\omega}$.
Write the expression of the Hamiltonian in terms of X and P, and calculate the Poisson bracket {X, P}.
2. We introduce the functions a and a∗, the complex conjugate of a, defined by
$$a = \frac{X + iP}{\sqrt{2}}, \qquad a^* = \frac{X - iP}{\sqrt{2}}.$$
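$$\frac{1}{N} \sum_{k=1}^{N} \exp\left( \frac{2ik(n - n')\pi}{N} \right) = \delta_{nn'} \quad \text{(Kronecker } \delta \text{)}.$$
$$H = \sum_{n=1}^{N} \left[ \frac{p_n^2}{2m} + \frac{1}{2} m \omega^2 x_n^2 + \frac{1}{2} m \Omega^2 (x_n - x_{n+1})^2 \right] \qquad (4.43)$$
where p_n is the conjugate momentum to x_n and where we use the cyclic convention $x_{N+1} \equiv x_1$.
1. Define the following complex variables
$$y_k = \frac{1}{\sqrt{N}} \sum_{n=1}^{N} e^{2ikn\pi/N} x_n, \qquad q_k = \frac{1}{\sqrt{N}} \sum_{n=1}^{N} e^{-2ikn\pi/N} p_n \qquad (4.44)$$
$$x_n = \frac{1}{\sqrt{N}} \sum_{k=1}^{N} e^{-2ikn\pi/N} y_k, \qquad p_n = \frac{1}{\sqrt{N}} \sum_{k=1}^{N} e^{2ikn\pi/N} q_k. \qquad (4.45)$$
$$\{y_j, q_k\}, \quad \{y_j^*, q_k^*\}, \quad \{y_j, q_{N-k}^*\}, \quad \{y_j^*, q_{N-k}\}. \qquad (4.48)$$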
(c) Write the differential equations satisfied by the variables {yk , yk∗ , qk , qk∗ }.
(d) Write the general expression of {yk (t)}; deduce from it the expression of
{xn (t)}.
3. We assume that at time t = 0 we have $y_N(0) = 1$, $\dot{y}_N(0) = 0$ and $\{y_n(0) = 0, \dot{y}_n(0) = 0, \forall n \neq N\}$. Calculate {x_n(t)} and interpret the result.
4. Propagation of waves.
We now assume, for simplicity, that ω = 0. We also assume that $N \gg 1$, so that $\sin(k\pi/N) \simeq k\pi/N$ for $k \ll N$. We assume that for t = 0 we have $y_{N-1} = 1$, $y_1 = 1$, $y_n = 0$ if $n \neq 1$ and $n \neq N - 1$, and $\dot{y}_n = 0\ \forall n$.
Fig. 5.1 Examples of caustics: a inside a ring; b light coming out of a glass of water; c caustic of
a parallel beam reflected by an ideal cylindrical mirror
theory of light. His three main articles were eventually named Theory of Systems of
Rays and published from 1828 in Transactions of the Royal Irish Academy.1 To this
must be added his three fundamental publications on mechanics.2
Hamilton was fascinated by the action and the Fermat principle. He understood
that it is a principle of stationary action and not minimal (in a concave mirror, there
are two images of the same object and not just one). The variational principle, also
known as the Hamilton principle, is the essential element of these articles. This
principle, reformulated by Jacobi, resulted in a new formulation of classical mechanics; it is currently known as Hamiltonian mechanics.
This formulation, like the Lagrangian mechanics on which it is based, is at first glance very mathematical, with no new impact on physics; yet it constitutes a more powerful method for solving complex problems. Lagrangian and Hamiltonian mechanics were developed to describe the movement of discrete systems; they were later extended to continuous systems and field theory.
In Sect. 5.1 we will briefly describe Hamilton's results on geometric optics, where he chooses to work directly with the action in the form of his characteristic function S, a function of the canonical variables (x, p), and not with the Lagrangian or the Hamiltonian.
Then we will see how Hamilton formalized the fact that geometrical optics is the
limit of wave optics for short wavelengths, and we will see its amazing structural
similarity with mechanics (which Hamilton discovered a hundred years before the
discovery of quantum mechanics). In the approximation of small wavelengths known
as the eikonal approximation, the wave propagates with a wave vector locally perpen-
dicular to geometric wave fronts. This corresponds exactly to the Fermat principle,
and the geometric interpretation is nothing else than the Huygens-Fresnel principle.
1 Theory of systems of rays, Transactions of the Royal Irish Academy, vol. 15, 69, 1828; Second
supplement, ibid. vol. 16, 93, 1831; Third supplement, ibid. vol. 17, 1, 1837.
2 On a General Method of expressing the Paths of Light and of the Planets by the Coefficients
Hamilton seeks to show how geometrical optics presents itself as the limit of wave optics at small wavelengths. In particular, he wants to exhibit the Fermat principle in a wave propagation equation. For simplicity, we will not follow the original wording of Hamilton.3 We will consider the propagation of a scalar wave (and not a vector wave as in the Maxwell equations4) in a medium of variable refractive index. The general case of the propagation of electromagnetic waves in an anisotropic, non-conductive medium of electric and magnetic susceptibilities ε and μ, taking into account possible discontinuities between two media and polarization, is treated in the book of Born and Wolf, Principles of Optics [14], Chap. 3 and Appendix I. It is enough for our purpose to consider a non-magnetic isotropic medium, and the result is basically identical to the evolution of a scalar wave.
Consider the propagation of a scalar wave Φ in a variable refractive index medium
n(r), assuming that the medium is inhomogeneous, but isotropic: the refractive index
n depends on the point considered but not on the direction of propagation.
The propagation equation of this wave Φ(r, t) is
$$\frac{n^2}{c^2} \frac{\partial^2 \Phi}{\partial t^2} - \nabla^2 \Phi = 0. \qquad (5.1)$$
3 Theory of systems of rays, Transactions of the Royal Irish Academy, vol. 15, 69, 1828; Second supplement, ibid. vol. 16, 93, 1831; Third supplement, ibid. vol. 17, 1, 1837.
4 The validity of the results was demonstrated in 1911 by A. Sommerfeld and J. Runge Ann. d.
$$\frac{n^2 \omega^2}{c^2} \varphi + \nabla^2 \varphi = 0. \qquad (5.2)$$
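We seek a solution to this equation of the form
$$\varphi(\mathbf{r}) = \varphi_0(\mathbf{r})\, e^{i k_0 S(\mathbf{r})}, \qquad (5.3)$$
where
$$k_0 = \omega/c = 2\pi/\lambda_0, \qquad (5.4)$$
is the modulus of the mean wave vector in the vicinity of the relevant point r, λ0 being the corresponding wavelength.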
Hamilton calls the quantity S in (5.3) the characteristic function. In 1895, in an independent study, Bruns gave it the name eikonal (from the Greek εικων, image or picture), a name that was later kept for Hamilton’s function.
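If we insert (5.3) in (5.2), we get, after simplification by $e^{i k_0 S(\mathbf{r})}$ and dividing by $k_0^2$,
$$\varphi_0 (\nabla S)^2 - \frac{i}{k_0} \left( 2 \nabla \varphi_0 \cdot \nabla S + \varphi_0 \nabla^2 S \right) - \frac{1}{k_0^2} \nabla^2 \varphi_0 = n^2 \varphi_0. \qquad (5.5)$$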
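The intermediate steps leading to (5.5) can be written out explicitly. The sketch below assumes the ansatz φ = φ0 e^{i k0 S}, which is what the common factor e^{i k0 S(r)} in the text suggests, together with k0 = ω/c in (5.2):

```latex
\begin{aligned}
\nabla\varphi &= \left( \nabla\varphi_0 + i k_0\, \varphi_0\, \nabla S \right) e^{i k_0 S}, \\
\nabla^2\varphi &= \left( \nabla^2\varphi_0 + 2 i k_0\, \nabla\varphi_0 \cdot \nabla S
  + i k_0\, \varphi_0\, \nabla^2 S - k_0^2\, \varphi_0\, (\nabla S)^2 \right) e^{i k_0 S}, \\
0 &= n^2 k_0^2\, \varphi + \nabla^2\varphi .
\end{aligned}
```

Dividing the last line by $k_0^2\, e^{i k_0 S}$ and moving the term in n² to the right-hand side reproduces (5.5); the terms in i/k0 form the imaginary part and the remaining terms the real part.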
In this equation, the cancellation of the imaginary term proportional to 1/k0 can be written, multiplying by φ0,
$$\nabla \cdot (\varphi_0^2 \nabla S) = 0. \qquad (5.6)$$
This is a conservation equation, in this case conservation of energy. The wave propagates in the direction of ∇S and the energy density is proportional to $\varphi_0^2$ (Fig. 5.2).
One will find the complete interpretation in terms of the Poynting vector in Born and
Wolf’s book [14], Chap. 3.
Consider the real part, reinstating the wavelength λ,
$$\varphi_0 |\nabla S|^2 - \frac{\lambda^2}{4\pi^2} \nabla^2 \varphi_0 = n^2 \varphi_0. \qquad (5.7)$$
Suppose that the wavelength is very small, that is, the index n does not vary appreciably over a wavelength and the size of instruments (for example diaphragms) is much larger than λ defined in (5.4). This hypothesis can also be expressed as λ → 0, hence k0 → ∞. For Hamilton, this is the limit of geometrical optics.
We then neglect the term in $1/k_0^2$, which leads to the fundamental equation of geometrical optics
$$(\nabla S)^2 = n^2 \qquad (5.8)$$
$$n \frac{d\mathbf{r}}{ds} = \nabla S \quad \text{and} \quad \frac{dS}{ds} = \frac{d\mathbf{r}}{ds} \cdot \nabla S = n. \qquad (5.10)$$
The geometric interpretation of (5.2) or (5.10) is nothing but the Huygens-Fresnel principle. This principle, the first wave theory of light, states that light propagates as a wavefront. At every moment t, each point of the wavefront can be
Fig. 5.3 a Two surfaces of S = constant, of infinitesimal separation ds where the light ray k,
perpendicular to the surfaces, follows a path ds of optical length nds; b Flow of rays in the vicinity
of a divergent device
92 5 Action, Optics, Hamilton-Jacobi Equation
considered as a point source. At the next moment t + δt the new wavefront is the envelope of the spheres of radius δr = (c/n)δt centered at each point of the previous wavefront (Fig. 5.4).
As for the Fermat principle, the optical path λ along a curve C in a medium of index n is $\lambda = \int_C n\, ds$, the corresponding light travel time being τ = λ/c.
Consider, in a beam, the light ray T issued from P1, passing through a point R1 of the surface S1 and ending at P2 after passing through R2 on S2. Any other path from P1 to P2 that does not pass along the ray R1R2 would, by the triangle inequality, be longer. S1 and S2 being at equal distance, the Q 1 Q 2 portion is greater than Q 1 Q 2 which is equal to R1 R2 . The ray follows the shortest optical path as required by the Fermat principle: $\delta \int n(\mathbf{r})\, ds = 0$. Hamilton's characteristic function principle is equivalent to the Fermat principle within the bounds of the eikonal approximation.
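A classic consequence of the Fermat principle can be checked numerically: the law of refraction at a plane interface. The sketch below assumes two homogeneous media of indices n1 and n2 separated by the plane y = 0, a source at (0, h1) and a target at (d, −h2); it minimizes the optical path over the crossing point by a golden-section search. The geometry, the numerical values and the function names are illustrative assumptions.

```python
import math

def optical_path(x, n1, n2, h1, h2, d):
    """Optical length of the broken line (0, h1) -> (x, 0) -> (d, -h2)."""
    return n1 * math.hypot(x, h1) + n2 * math.hypot(d - x, h2)

def golden_section_min(f, a, b, tol=1e-9):
    """Minimise a unimodal function f on [a, b] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    while b - a > tol:
        c = b - inv_phi * (b - a)
        e = a + inv_phi * (b - a)
        if f(c) < f(e):
            b = e          # the minimum lies in [a, e]
        else:
            a = c          # the minimum lies in [c, b]
    return (a + b) / 2

n1, n2, h1, h2, d = 1.0, 1.5, 1.0, 1.0, 1.0          # illustrative values
x_star = golden_section_min(lambda x: optical_path(x, n1, n2, h1, h2, d), 0.0, d)
sin1 = x_star / math.hypot(x_star, h1)               # sine of the incidence angle
sin2 = (d - x_star) / math.hypot(d - x_star, h2)     # sine of the refraction angle
```

At the minimizing crossing point the products n1 sin θ1 and n2 sin θ2 agree to within the accuracy of the search, which is the Snell-Descartes law.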
The principle of least action consists in finding the equations of the motion by mini-
mizing the action defined according to the Lagrangian and the departure and arrival
points by (3.2).
Hamilton discovered in 1831 that it is useful to work with the action itself as a
physical quantity. To do so, we will first express the action as a function of coordinates
and time S(x1 , x2 , . . . , xn ; t), then use the properties we know.
For one degree of freedom, the question is to calculate the values of S along the set of physical trajectories, that is, as a function of the point and time of arrival (x, t), the starting point and time being fixed.
5.2 Action and the Hamilton-Jacobi Equation 93
Generally speaking, we will characterize the various trajectories from (x1 , t1 ) and
arriving at (x, t) by the value of the action S(x, t; x1 , t1 ).
The action is defined by
$$S = \int_{t_1}^{t} L(x, \dot{x}, t')\, dt', \qquad (5.11)$$
the (x(t), ẋ(t)) variables assuming in this expression their physical values, which
satisfy the Lagrange-Euler equations.
The variation of the action written in (3.5) is
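$$\delta S = \int_{t_1}^{t} \left( \frac{\partial L}{\partial x} \delta x(t) + \frac{\partial L}{\partial \dot{x}} \delta \dot{x}(t) \right) dt. \qquad (5.12)$$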
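We integrate the second term by parts, but we no longer require the trajectory to arrive at the same point x(t): it arrives at a neighbouring point x(t) + δx(t) (while maintaining δx(t1) = 0). The integrated term therefore no longer vanishes, and we obtain
$$\delta S = \left[ \frac{\partial L}{\partial \dot{x}} \delta x(t) \right]_{t_1}^{t} + \int_{t_1}^{t} \left( \frac{\partial L}{\partial x} - \frac{d}{dt} \frac{\partial L}{\partial \dot{x}} \right) \delta x(t)\, dt. \qquad (5.13)$$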
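By hypothesis, the trajectory is physical, and the integral on the right-hand side vanishes. We thus obtain a variation of the action
$$\delta S = \frac{\partial L}{\partial \dot{x}} \delta x(t) = p\, \delta x(t), \qquad (5.14)$$
or, in general,
$$\delta S = \sum_{i=1}^{N} p_i\, \delta x_i. \qquad (5.15)$$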
Time Dependence
Similarly, if we change the arrival time t, the action being the integral of the Lagrangian (5.11), we have, obviously,
$$\frac{dS}{dt} = L. \qquad (5.16)$$
If the action is seen as a function of coordinates and time, we have
$$\frac{dS}{dt} = \frac{\partial S}{\partial t} + \sum_{i=1}^{N} \frac{\partial S}{\partial x_i} \dot{x}_i = \frac{\partial S}{\partial t} + \sum_{i=1}^{N} p_i \dot{x}_i. \qquad (5.17)$$
By combining these two equalities, we obtain that the partial derivative of the
action with respect to time is, up to its sign, equal to the Hamiltonian
$$\frac{\partial S}{\partial t} = L - \sum_{i=1}^{N} p_i \dot{x}_i = -H, \qquad (5.18)$$
and the total differential of the action is therefore written in terms of coordinates and
time
$$dS = \sum_{i=1}^{N} p_i\, dx_i - H\, dt. \qquad (5.19)$$
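These relations can be verified on the simplest example. The sketch below assumes a free particle of mass m starting from (x1, t1) = (0, 0), for which the action along the straight-line path is S = m(x − x1)²/(2(t − t1)), and checks by central finite differences that ∂S/∂x equals the momentum and ∂S/∂t equals −H. The function names and numerical values are illustrative assumptions.

```python
def free_action(x, t, x1=0.0, t1=0.0, m=1.0):
    """Action of a free particle going from (x1, t1) to (x, t) on the physical path."""
    return 0.5 * m * (x - x1) ** 2 / (t - t1)

def partial_x(S, x, t, h=1e-5):
    """Central-difference estimate of dS/dx at fixed arrival time t."""
    return (S(x + h, t) - S(x - h, t)) / (2 * h)

def partial_t(S, x, t, h=1e-5):
    """Central-difference estimate of dS/dt at fixed arrival point x."""
    return (S(x, t + h) - S(x, t - h)) / (2 * h)

m, x, t = 1.0, 2.0, 4.0       # illustrative arrival point and time
p = m * x / t                 # momentum on the physical path from (0, 0) to (x, t)
H = p ** 2 / (2 * m)          # free-particle Hamiltonian
dS_dx = partial_x(free_action, x, t)
dS_dt = partial_t(free_action, x, t)
```

Here `dS_dx` reproduces p and `dS_dt` reproduces −H, the one-dimensional content of the differential of the action in terms of coordinates and time.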
Since no reference is made to the point and time of departure, the formal expression of the action is therefore
$$S = \int \left( \sum_{i=1}^{N} p_i\, dx_i - H\, dt \right). \qquad (5.20)$$
Hamilton’s least action principle is written δS = 0; indeed Eq. (5.20) simply gives
$$\delta S = \delta \int \left( \sum_{i=1}^{N} p_i \frac{dx_i}{dt} - H \right) dt \equiv \delta \int L\, dt = 0. \qquad (5.21)$$
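The second term in the integral can be integrated by parts. The integrated term p δx cancels out since by assumption δx(2) = δx(1) = 0, and we get
$$\delta S = \int_{(1)}^{(2)} \left\{ \delta p \left[ dx - \frac{\partial H}{\partial p} dt \right] - \delta x \left[ dp + \frac{\partial H}{\partial x} dt \right] \right\}, \qquad (5.24)$$
which cancels for any variation (δx, δp) if and only if the terms to be integrated are identically zero, that is to say
$$dx - \frac{\partial H}{\partial p} dt = 0, \qquad dp + \frac{\partial H}{\partial x} dt = 0,$$
which are Hamilton's canonical equations.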
From the expressions (5.18) and (5.15), we can replace in the Hamilton function
the conjugate momenta pi by the partial derivatives of the action. This leads to the
Hamilton-Jacobi equation
$$\frac{\partial S}{\partial t} + H\left( x_1, \ldots, x_N, \frac{\partial S}{\partial x_1}, \ldots, \frac{\partial S}{\partial x_N}; t \right) = 0. \qquad (5.25)$$
This equation is a single non-linear partial differential equation of first order. In the same way as the Lagrange-Euler equations or the canonical equations, it allows one to calculate the motion.
The function S({xi}; t) completely determines the motion; it is called Hamilton's principal function.
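A numerical check on a non-trivial case may help to see what the Hamilton-Jacobi equation asserts. The sketch below assumes a one-dimensional harmonic oscillator, H = p²/2m + mω²x²/2, and uses the standard closed form of its action from (x1, 0) to (x, t), valid for 0 < ωt < π. It evaluates ∂S/∂t + H(x, ∂S/∂x) by central differences; function names and numerical values are illustrative assumptions.

```python
import math

def oscillator_action(x, t, x1=0.3, m=1.0, omega=1.0):
    """Action of a harmonic oscillator from (x1, 0) to (x, t), for 0 < omega*t < pi."""
    s, c = math.sin(omega * t), math.cos(omega * t)
    return m * omega / (2 * s) * ((x * x + x1 * x1) * c - 2 * x * x1)

def hj_residual(S, x, t, m=1.0, omega=1.0, h=1e-5):
    """dS/dt + H(x, dS/dx) for the assumed oscillator H, by central differences."""
    dS_dt = (S(x, t + h) - S(x, t - h)) / (2 * h)
    dS_dx = (S(x + h, t) - S(x - h, t)) / (2 * h)
    return dS_dt + dS_dx ** 2 / (2 * m) + 0.5 * m * omega ** 2 * x ** 2

residual = hj_residual(oscillator_action, 1.2, 0.7)        # should be close to zero
residual_bad = hj_residual(lambda x, t: x * x, 1.2, 0.7)   # an arbitrary non-solution
```

The residual vanishes (up to discretization error) for the true action and is of order one for an arbitrary function, which shows that the equation is a genuine constraint on S.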
The use of either of these formalisms is a matter of convenience or of the mathematical structure of the problem.
The Hamilton-Jacobi equation is particularly suitable for the separation of variables, and for choosing variables adapted to the symmetry of a problem, as we will see below. One can refer to the book by Landau and Lifshitz [1], Mechanics, Chap. 7, Sect. 48, for a discussion of this point, which involves a formulation of canonical transformations more complete than what we have done in the previous chapter.
Reduced Action
Suppose Hamilton's function H does not depend explicitly on time. Then the energy is conserved. Let E be the value of the energy in the problem under consideration; Eq. (5.18) then reads
$$\frac{\partial S}{\partial t} = -E, \qquad (5.26)$$
that is
$$S = -Et + S_0(x_1, \ldots, x_N), \qquad (5.27)$$
and, for a conservative system, we see that the variational principle applies to this quantity: δS0 = 0.
Geometrical Interpretation
The relation (5.15) can also be written in terms of the reduced action
$$\frac{\partial S_0}{\partial x_i} = p_i. \qquad (5.30)$$
This form shows a simple geometric property that makes a direct link with optics. Let us use Cartesian coordinates for clarity and consider the simple case where the conjugate momenta coincide with the linear momenta, $p_i = m_i \dot{x}_i$. Consider, in the space of the coordinates (x1, x2, . . . , xN), the surfaces on which the reduced action is constant, S0 = Cst. The relationship (5.30) means that the vector P ≡ (p1, p2, . . . , pN) is at all points orthogonal to these surfaces (Fig. 5.5).
If we consider the simple case of a particle in 3-dimensional space, we see that
the trajectory is, at any point of space, orthogonal to the surface S0 = Cst passing by
this point. At a given time, this property is also valid for the action S. If we note d r̃
Maupertuis Principle
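$$\frac{1}{2m} (\nabla S_0)^2 + V(\mathbf{r}) = E, \quad \text{or} \quad (\nabla S_0)^2 = 2m(E - V(\mathbf{r})). \qquad (5.32)$$
In this problem, the conjugate momentum is simply the linear momentum p = m ṙ. The reduced action (5.29) is therefore written
$$S_0 = \int \mathbf{p} \cdot d\mathbf{r} = \int m \dot{\mathbf{r}} \cdot d\mathbf{r}. \qquad (5.33)$$
But the kinetic energy is $T = m \dot{\mathbf{r}}^2/2 = m \dot{\ell}^2/2$, where ℓ is the arc length along the trajectory, and since T = E − V, we obviously get
$$\dot{\ell} = \sqrt{\frac{2(E - V)}{m}}, \qquad (5.35)$$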
Hence the simple form of the Maupertuis principle given in Chap. 2, Sect. 2.3:
$$\delta \int \sqrt{2m(E - V)}\, d\ell = 0. \qquad (5.37)$$
Of course, we note the great similarity between the eikonal equation (5.8) and the Hamilton-Jacobi equation (5.32) for a material point. The reduced action S0 of the latter and the eikonal S of a light wave follow the same law if we make the correspondence
$$n(\mathbf{r}) \longleftrightarrow \sqrt{2m(E - V(\mathbf{r}))}. \qquad (5.38)$$
One just has to go backwards along the path that leads to (5.32), in particular (5.21), to see that the eikonal approximation corresponds exactly to the Fermat principle
$$\delta \int n(\mathbf{r})\, d\ell = 0 \iff \delta T = \delta \int \frac{n(\mathbf{r})}{c}\, d\ell = 0. \qquad (5.39)$$
The Fermat Principle and the Maupertuis Principle (5.37) have an obvious simi-
larity if one makes the correspondence (5.38).
Hamilton made this discovery in 1834. He had understood, in 1830, how and within what limits geometrical optics is an approximation of wave optics. Fascinated by variational principles, and in particular by the similarity between the Maupertuis principle in mechanics and the Fermat principle in geometric optics, he made in 1830 the surprising remark that the formalisms of optics and mechanics could be unified, and (prophetic vision) that Newtonian mechanics corresponds to the same limit, or approximation, as geometrical optics compared to wave optics.
This remark was ignored by his contemporaries, which the famous mathematician Felix Klein (1849–1925) deplored in 1891. It is true that in 1830 no experiment showed the role of Planck’s constant in mechanics.
The same idea can be applied to mechanics and the Schrödinger equation. This is
called the semi-classical approximation of Brillouin, Kramers and Wentzel (BKW).
5.3 Semi-Classical Approximation in Quantum Mechanics 99
We refer, for example, to the book of Messiah [15], Quantum Mechanics, volume 1, Chap. 6, for all details, in particular on the practical application of this method.
Consider the Schrödinger equation
$$i\hbar \frac{\partial}{\partial t} \psi(\mathbf{r}, t) = -\frac{\hbar^2}{2m} \Delta \psi(\mathbf{r}, t) + V(\mathbf{r})\, \psi(\mathbf{r}, t). \qquad (5.40)$$
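We separate in the wave function the modulus and the phase as
$$\psi(\mathbf{r}, t) = A(\mathbf{r}, t) \exp\left( \frac{i}{\hbar} S(\mathbf{r}, t) \right). \qquad (5.41)$$
Substituting in (5.40) and separating the real part and imaginary part, we get
$$\frac{\partial S}{\partial t} + \frac{1}{2m} (\nabla S)^2 + V = \frac{\hbar^2}{2m} \frac{\nabla^2 A}{A} \qquad (5.42)$$
$$m \frac{\partial A}{\partial t} + \nabla A \cdot \nabla S + \frac{1}{2} A \nabla^2 S = 0. \qquad (5.43)$$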
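For readers who wish to follow the substitution, the two sides of the Schrödinger equation can be expanded as follows. This is a sketch assuming real A and S and the form ψ = A e^{iS/ħ}; the potential term V ψ = V A e^{iS/ħ} is purely real once the common phase factor is removed.

```latex
\begin{aligned}
i\hbar\, \frac{\partial \psi}{\partial t}
  &= \left( i\hbar\, \frac{\partial A}{\partial t} - A\, \frac{\partial S}{\partial t} \right) e^{iS/\hbar}, \\
-\frac{\hbar^2}{2m}\, \nabla^2 \psi
  &= \left( -\frac{\hbar^2}{2m}\, \nabla^2 A - \frac{i\hbar}{m}\, \nabla A \cdot \nabla S
     - \frac{i\hbar}{2m}\, A\, \nabla^2 S + \frac{A}{2m}\, (\nabla S)^2 \right) e^{iS/\hbar} .
\end{aligned}
```

Equating the real parts and dividing by A gives the first equation of the pair; equating the imaginary parts and multiplying by m/ħ gives the second.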
The second equation expresses the conservation of probability. If we introduce the probability density ρ and the probability current density J as
$$\rho(\mathbf{r}, t) = \psi^*(\mathbf{r}, t) \psi(\mathbf{r}, t) = A^2, \qquad \mathbf{J}(\mathbf{r}, t) = \frac{\hbar}{2im} (\psi^* \nabla \psi - \psi \nabla \psi^*) = \frac{A^2}{m} \nabla S,$$
the conservation of probability is written in local form
$$\frac{\partial}{\partial t} \rho(\mathbf{r}, t) + \nabla \cdot \mathbf{J}(\mathbf{r}, t) = 0. \qquad (5.44)$$
This equation amounts, with the form (5.41) and by multiplying (5.43) by 2A, to
$$m \frac{\partial}{\partial t} A^2 + \nabla \cdot (A^2 \nabla S) = 0. \qquad (5.45)$$
This equation is analogous to Eq. (5.6).
The classical approximation is to take the limit ħ → 0 in Eq. (5.42), that is
$$\frac{\partial S}{\partial t} + \frac{1}{2m} (\nabla S)^2 + V = 0, \qquad (5.46)$$
which is the classical Hamilton-Jacobi equation.
Therefore, in the semi-classical approximation, the wave function can be considered as describing a fluid of classical particles without mutual interactions, in the potential V. The density and the current density of these particles are at all times equal to the quantum probability density ρ and probability current density J.
$$\frac{\partial S}{\partial t} + H\left( x_1, \ldots, x_N, \frac{\partial S}{\partial x_1}, \ldots, \frac{\partial S}{\partial x_N}; t \right) = 0. \qquad (5.47)$$
Jacobi’s Theorem
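A useful and important result is Jacobi’s theorem, which we will explain in the simplest case of one dimension q.
Theorem Let a be an integration constant, and suppose that we know the action S(q, a, t). Then β = ∂S/∂a is a constant of the motion.
$$\beta = \frac{\partial S}{\partial a}, \quad \text{that is} \quad \frac{d\beta}{dt} = \frac{\partial}{\partial t} \frac{\partial S}{\partial a} + \dot{q}\, \frac{\partial^2 S}{\partial q\, \partial a}. \qquad (5.48)$$
Now, q̇ is, by definition, the derivative of q along the physical trajectory, therefore
$$\dot{q} = \frac{\partial H}{\partial p}, \quad \text{and} \quad \frac{d\beta}{dt} = \frac{\partial}{\partial t} \frac{\partial S}{\partial a} + \frac{\partial H}{\partial p} \frac{\partial^2 S}{\partial q\, \partial a}. \qquad (5.49)$$
Moreover, we have
$$p = \frac{\partial S(q, a, t)}{\partial q}, \quad \text{therefore} \quad \frac{\partial}{\partial a} H\left( q, \frac{\partial S(q, a, t)}{\partial q} \right) = \frac{\partial H}{\partial p} \frac{\partial^2 S}{\partial a\, \partial q}. \qquad (5.50)$$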
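Jacobi's theorem can be tested on the simplest possible case. The sketch below assumes a free particle of mass m, for which S(q, E, t) = √(2mE) q − Et solves the Hamilton-Jacobi equation with the energy E playing the role of the constant a. It evaluates β = ∂S/∂E by a central difference at several times along the physical trajectory q(t) = q0 + √(2E/m) t; names and numerical values are illustrative assumptions.

```python
import math

def S(q, E, t, m=1.0):
    """Complete integral of the free-particle Hamilton-Jacobi equation, with a = E."""
    return math.sqrt(2 * m * E) * q - E * t

def beta(q, E, t, m=1.0, h=1e-6):
    """beta = dS/dE, estimated by a central difference in the constant E."""
    return (S(q, E + h, t, m) - S(q, E - h, t, m)) / (2 * h)

m, E, q0 = 1.0, 2.0, 0.5                      # illustrative values
v = math.sqrt(2 * E / m)                      # speed on the physical trajectory
times = [0.0, 1.0, 2.5, 7.0]
betas = [beta(q0 + v * t, E, t, m) for t in times]
```

All entries of `betas` coincide with √(m/2E) q0, and solving β = const for q gives back the uniform motion q(t), which is how the theorem yields the equation of motion.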
5.4 Hamilton-Jacobi Formalism 101
There are many applications of this equation, especially when variables can be sep-
arated.
Let us confine ourselves here, as an example, to a problem which encompasses
the Kepler problem, and spherical coordinates.
In spherical coordinates (r, θ, φ) the Hamiltonian is written
$$H = \frac{1}{2m} \left( p_r^2 + \frac{p_\theta^2}{r^2} + \frac{p_\phi^2}{r^2 \sin^2\theta} \right) + V(r, \theta, \phi). \qquad (5.52)$$
$$V = V_0(r) + \frac{f(\theta)}{r^2} \qquad (5.53)$$
Multiplying by 2mr² we see that this equation separates into the sum of two terms, one related to the variable θ, the other to the variable r.
We are therefore looking for a solution of the form
$$S_0 = \ell \phi + S_1(\theta) + S_2(r). \qquad (5.58)$$
We get
$$\left( \frac{dS_1}{d\theta} \right)^2 + 2m f(\theta) + \frac{\ell^2}{\sin^2\theta} = a, \qquad (5.59)$$
$$\frac{1}{2m} \left( \frac{dS_2}{dr} \right)^2 + V_0(r) + \frac{a}{2mr^2} = E, \qquad (5.60)$$
where a is, like E and ℓ, a constant of the motion, determined by the initial conditions.
The integration of these equations gives
$$S = -Et + \ell \phi + \int \sqrt{2m(E - V_0(r)) - \frac{a}{r^2}}\, dr + \int \sqrt{a - 2m f(\theta) - \frac{\ell^2}{\sin^2\theta}}\, d\theta \qquad (5.61)$$
where (E, ℓ, a) are arbitrary integration constants.
To obtain the equations of motion, we use Jacobi's theorem.
Let us return to the result (5.61) and consider the three constants of the motion (E, ℓ, a). From the expression (5.61) of the action, the three constants β_E, β_ℓ, β_a are defined by
$$\beta_E = \frac{\partial S}{\partial E}, \qquad \beta_\ell = \frac{\partial S}{\partial \ell}, \qquad \beta_a = \frac{\partial S}{\partial a}.$$
The value of these constants is fixed by the initial conditions of the problem. We thus obtain the trajectory and the equation of the motion (5.61) from the three equations in one variable obtained from E, ℓ and a.
5.5 Exercises
Prove that, with the Hamiltonian (4.39), Hamilton’s equations give the expected
equation of motion.
$$2 \left\langle \frac{p^2}{2m} \right\rangle = \langle \mathbf{r} \cdot \nabla V \rangle \qquad (5.63)$$
3. What does this equality become if the potential V is a central power law function $V = g\, r^n$ with r = |r|?
4. In the above case, what is the relation between the total energy E, the mean kinetic energy ⟨E_k⟩ and the mean potential energy ⟨V⟩ for
The Lagrangian formalism acquires its real power when one deals with systems that
possess a large, possibly infinite, number of degrees of freedom. That is the case
in mechanics of continuous media. We will now examine how this formalism deals
with field theory.
In itself, field theory is a vast domain that acquires its completeness when one
considers the quantization of fields and the theory of fundamental interactions. We
cannot ignore the practical importance of this subject in many present technologies,
which range from the acoustics of concert halls to the many modes of communication,
whether terrestrial, submarine, or in space, with vibratory modes, advanced optics and
electronics. In this rather short chapter, we will only give the principles of Lagrangian
field theory and its application to the electromagnetic field. The classical theory of
gravitation is beyond the scope of this book. It is thoroughly treated in the literature,
and we refer the interested reader to references [2, 17, 18], for instance.
In Sect. 6.1, we study the principle of the Lagrangian formulation of field theory,
starting with the case of a vibrating string. Actually, the procedure is rather simple.
One starts by considering a discrete problem with finite elements of the string. One
then takes the continuum limit such that a Lagrangian space density appears. It is in
this limiting procedure that one appreciates how well the Lagrangian formalism is
adapted to this type of problem.
The extension to three space dimensions, as well as several degrees of freedom,
is dealt with in Sect. 6.2. One can easily guess the extension of the method to
four-dimensional space-time and relativistic fields. In Sect. 6.3, we will consider a scalar
field, and in Sect. 6.4 the electromagnetic field and the Maxwell equations. In Sect.
6.5, we shall say a few words about field equations that are of first order in time. The
first example is the Fourier diffusion equation, which corresponds to a nonreversible
problem; i.e., a dissipative problem. This example is interesting because of the
similarity between the Fourier equation and the Schrödinger equation. We shall see that
a Lagrangian approach can be constructed for the latter but that essentially it leads
nowhere in nonrelativistic quantum mechanics.
The vibrating string is the prototype of a system with an infinite number of degrees
of freedom.
Consider an elastic string of length l, fixed horizontally between the endpoints
x = 0 and x = l (we do not take gravity into account). Its linear mass density ρ is
assumed to be uniform.
We only consider deformations of the string in the transverse plane (transverse
waves). We denote by ψ(x, t) the transverse (vertical) displacement at point x with
respect to its position at rest. For simplicity, we assume that this displacement occurs
in a single direction (the vertical axis).
One can consider the string to be the set of a large number of elements of length
$dx$, each of which obeys the usual laws of dynamics. In the limiting procedure, this
will result in an infinite number of degrees of freedom.
Consider an element of length $dx$. Its kinetic energy is
$$ dE_k = \frac{1}{2}(\rho\, dx)\left(\frac{\partial\psi}{\partial t}\right)^2. \tag{6.1} $$
Let τ be the elasticity constant of the string. If the displacement of two successive
elements located at $x$ and $x + dx$ varies compared with its value at rest, the
corresponding potential energy $V$ varies by
$$ dV = \tau\left(\sqrt{1 + \left(\frac{\partial\psi}{\partial x}\right)^2} - 1\right) dx, $$
where obviously $(\partial\psi/\partial x)^2 \ll 1$. The variation $V$ of the potential energy of the string
when it is deformed is therefore
$$ V = \frac{1}{2}\int_0^l \tau\left(\frac{\partial\psi}{\partial x}\right)^2 dx. \tag{6.2} $$
The quantity L that appears in this expression is called the Lagrangian density of
the string. In fact, the action of the string is
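$$ S = \int L\, dx\, dt = \frac{1}{2}\int\left[\rho\left(\frac{\partial\psi}{\partial t}\right)^2 - \tau\left(\frac{\partial\psi}{\partial x}\right)^2\right] dx\, dt. \tag{6.5} $$
We therefore see how the problem of wave propagation can be deduced from
a variational principle. Here, the time integral of the difference between the total
kinetic energy of the string and its potential energy must be stationary (in practice,
as small as possible).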
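As a rough numerical illustration of this statement, the sketch below checks by finite differences that a standing wave on the string satisfies ρ ∂²ψ/∂t² = τ ∂²ψ/∂x², which is the Euler–Lagrange equation of the action (6.5). The function name `wave_residual`, the values of ρ, τ and l, and the step `h` are assumptions chosen for the example, not quantities taken from the text.

```python
import math

def wave_residual(x, t, rho=2.0, tau=8.0, length=1.0, h=1e-3):
    """Finite-difference value of rho*psi_tt - tau*psi_xx for a standing wave."""
    c = math.sqrt(tau / rho)   # propagation speed, c^2 = tau / rho
    k = math.pi / length       # fundamental mode: psi = 0 at x = 0 and x = length

    def psi(xx, tt):
        # Standing wave with fixed endpoints (illustrative trial solution)
        return math.sin(k * xx) * math.cos(c * k * tt)

    # Centered second differences in time and in space
    psi_tt = (psi(x, t + h) - 2.0 * psi(x, t) + psi(x, t - h)) / h**2
    psi_xx = (psi(x + h, t) - 2.0 * psi(x, t) + psi(x - h, t)) / h**2
    return rho * psi_tt - tau * psi_xx

residual = wave_residual(0.3, 0.1)
scale = 8.0 * math.pi**2       # typical size of each of the two terms
print(residual / scale)        # small compared with 1 for small h
```

The residual does not vanish exactly because of the discretization; it shrinks as `h` is reduced, until rounding errors take over.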
The previous case is slightly more complex than the equations we saw in (2.8) and
(2.9). Indeed, for a field, the dynamical variable ψ depends on several variables. In
the example (6.8), the field ψ depends on two variables, t and x.
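More generally, consider $n$ dynamical variables $\psi_k$, $k = 1, \ldots, n$, that depend
on $m$ variables $x_s$, $s = 1, \ldots, m$ (including time); i.e., $\psi_k(x_s)$, $s = 1, \ldots, m$.
We define
$$ \psi_{ks} \equiv \frac{\partial\psi_k}{\partial x_s}, \tag{6.9} $$
and we denote by $[\psi_{ks}]$ the set of partial derivatives of $\psi_k(x_1, \ldots, x_m)$. The Lagrangian
density is of the form
$$ L(\psi_1, [\psi_{1s}], \ldots, \psi_n, [\psi_{ns}]). $$
It is a bit tedious but not difficult to convince oneself that the determination of the
extremum of the action $S$ under the set of all infinitesimal transformations
$\psi_k \to \psi_k + \delta\psi_k$, $k = 1, \ldots, n$, which vanish on the edge of the integration volume,
leads, once one has performed all integrations by parts, to the generalized Lagrange–Euler
equations
$$ \frac{\partial L}{\partial\psi_k} = \sum_{s=1}^{m} \frac{\partial}{\partial x_s}\left(\frac{\partial L}{\partial\psi_{ks}}\right). \tag{6.10} $$
Singling out the time variable and defining
$$ \dot\psi_k \equiv \frac{\partial\psi_k}{\partial t}, $$
we obtain
$$ \frac{\partial}{\partial t}\left(\frac{\partial L}{\partial\dot\psi_k}\right) = \frac{\partial L}{\partial\psi_k} - \sum_{s=1}^{m-1} \frac{\partial}{\partial x_s}\left(\frac{\partial L}{\partial\psi_{ks}}\right), \tag{6.11} $$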
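Consider again the vibrating string, adding for more generality a linear term in ψ
(which can come from an external force $F(x)$ that we apply at each point). For
simplicity, we define
$$ \dot\psi \equiv \frac{\partial\psi}{\partial t} \quad\text{and}\quad \psi' \equiv \frac{\partial\psi}{\partial x}. \tag{6.12} $$
The Lagrangian density is
$$ L = \frac{1}{2}\left[\rho(\dot\psi)^2 - \tau(\psi')^2\right] + F\psi, \tag{6.13} $$
which leads to the equation of motion
$$ \frac{\partial^2\psi}{\partial t^2} - c^2\frac{\partial^2\psi}{\partial x^2} = G, \tag{6.14} $$
where $G = F/\rho$.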
Since we are interested in the time evolution of the system, we define the density
of conjugate momentum $p$ by
$$ p = \frac{\partial L}{\partial\dot\psi}, \quad\text{i.e., here}\quad p = \rho\frac{\partial\psi}{\partial t}. \tag{6.15} $$
The corresponding Hamiltonian density is
$$ H = p\dot\psi - L = \frac{1}{2}\rho\left[(\dot\psi)^2 + c^2(\psi')^2\right] - F\psi = \frac{p^2}{2\rho} + \frac{1}{2}\tau(\psi')^2 - F\psi. \tag{6.16} $$
This density depends on ψ and $p$, but also on $\psi'$, and the form of the canonical
equations must be modified. Inserting (6.16) (i.e., $L = p\dot\psi - H$) in the least action
principle, and integrating by parts in the two variables $x$ and $t$, we obtain
$$ 0 = \delta\int dt\, dx\, \big(p\dot\psi - H(p, \psi, \psi')\big). $$
One can check that they yield the propagation equation (6.14).
The previous results allow us to understand the form of the Lagrangian of a scalar field
in three-dimensional space, for instance sound waves in a compressible nonviscous
fluid. Calling ψ(r, t) the compression of the fluid, and c the sound velocity in the
fluid, the Lagrangian density has the form
$$ L = \frac{1}{2}\rho\left[(\nabla\psi)^2 - \frac{1}{c^2}\left(\frac{\partial\psi}{\partial t}\right)^2\right]. \tag{6.19} $$
Notice that, compared to the vibrating string, space and time derivatives are
interchanged. The kinetic term (local velocity) comes from a vector quantity, whereas the
potential (the pressure) is a scalar.
With the Lagrangian density (6.19), one obtains the propagation equation
$$ \frac{1}{c^2}\frac{\partial^2\psi}{\partial t^2} - \Delta\psi = 0. \tag{6.20} $$
The case of the electromagnetic field is more complex and deeper. In fact, it involves
two vector fields, and above all, we must take care of relativistic invariance, which is
the fundamental property of Maxwell’s equations. This problem is treated thoroughly
in the book of Landau and Lifshitz [2], for instance. Here we want to point out the
major features.
Physically, the electromagnetic field cannot be separated from its sources, the
charges, on which it acts. For a system of charged particles in an electromagnetic
field, the action is written in full generality as
where $S_{\text{field}}$ is the action of free fields, $S_{\text{part}}$ is the action of the free particles in
the absence of fields, and $S_{\text{int}}$ corresponds to the interaction of these particles and
the field, which we know already from Sect. 3.3.2. Since an electromagnetic field derives
from the potentials A and Φ, the Lagrangian of a particle of charge q and mass m is
expressed in terms of these potentials,
Relativistic Field
The previous form (6.22) has the right Lorentz transformation properties.
We said in Chaps. 2 and 4 that Minkowski’s space-time is based on a four-
dimensional vector space with a Lorentz scalar product of signature (+, −, −, −).
An orthogonal basis, in this convention, consists of four-vectors $e_i$, $i = 0, 1, 2, 3$, that
have orthogonal relationships
$$ g_{ij}\, a^i b^j = a^0 b^0 - \mathbf{a}\cdot\mathbf{b}. \tag{6.23} $$
where ρ and j are respectively the charge density and current density of particles,
and the potential four-vector field is
which is obviously invariant. We have seen that it is easier to work with poten-
tials than with the fields themselves, whose transverse parts are mixed in Lorentz
transformations.
(We keep the same symbol L for the Lagrangian density; the integration runs
along space and time.) The action is invariant since d 3r dt is a relativistic invariant.
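$$ \mathbf{B} = \nabla\times\mathbf{A}, \qquad \mathbf{E} = -\nabla\Phi - \frac{\partial\mathbf{A}}{\partial t}. \tag{6.27} $$
Using the notation $\partial^\mu = \partial/\partial x_\mu$, one expresses the electromagnetic tensor field as
$$ F^{\mu\nu} = \partial^\mu A^\nu - \partial^\nu A^\mu; \tag{6.28} $$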
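The pair of homogeneous Maxwell equations follows from the structure of the
tensor $F^{\mu\nu}$, and the four equations (or identities)
$$ \partial^\mu F^{\nu\rho} + \partial^\nu F^{\rho\mu} + \partial^\rho F^{\mu\nu} = 0, \tag{6.29} $$
which lead to
$$ \nabla\times\mathbf{E} = -\frac{\partial\mathbf{B}}{\partial t}, \qquad \nabla\cdot\mathbf{B} = 0. \tag{6.30} $$
The relationship between the sources and the field (second pair of Maxwell equations)
$$ \nabla\cdot\mathbf{E} = \frac{\rho}{\varepsilon_0}, \qquad c^2\,\nabla\times\mathbf{B} = \frac{\mathbf{j}}{\varepsilon_0} + \frac{\partial\mathbf{E}}{\partial t}, \tag{6.31} $$
is
$$ \partial_\mu F^{\mu\nu} = \frac{j^\nu}{\varepsilon_0 c^2} = \mu_0 j^\nu. $$
From this tensor and its covariant conjugate $F_{\mu\nu}$, we can construct two relativistic
invariants:
$$ F_{\mu\nu}F^{\mu\nu} = -\frac{2}{c^2}\left(E^2 - c^2B^2\right) \quad\text{and}\quad \epsilon_{\mu\nu\rho\sigma}F^{\mu\nu}F^{\rho\sigma} = -\frac{8}{c}\,\mathbf{E}\cdot\mathbf{B}, \tag{6.32} $$
where $\epsilon_{\mu\nu\rho\sigma}$, the Levi-Civita tensor, is equal to +1 for an even permutation of μνρσ, to
−1 for an odd permutation and to 0 otherwise. The second invariant is the transverse
orientation of fields and does not contribute to the energy.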
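The invariance of the two combinations E² − c²B² and E·B can be checked numerically. The sketch below is only an illustration: it applies the standard field-transformation formulas for a boost of velocity v along the x axis (formulas assumed here, not written in the text) and compares both quantities before and after. The function names and the sample field values are hypothetical.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def boost_x(E, B, v):
    """Fields seen from a frame moving with velocity v along x (standard formulas)."""
    g = 1.0 / math.sqrt(1.0 - (v / C) ** 2)   # Lorentz factor
    Ex, Ey, Ez = E
    Bx, By, Bz = B
    E_new = (Ex, g * (Ey - v * Bz), g * (Ez + v * By))
    B_new = (Bx, g * (By + v * Ez / C**2), g * (Bz - v * Ey / C**2))
    return E_new, B_new

def invariants(E, B):
    """Return (E^2 - c^2 B^2, E.B), the two combinations appearing in (6.32)."""
    e2 = sum(x * x for x in E)
    b2 = sum(x * x for x in B)
    return e2 - C**2 * b2, sum(e * b for e, b in zip(E, B))

E = (1.0, 2.0, -0.5)        # V/m, arbitrary sample values
B = (3e-9, 1e-9, 2e-9)      # T, arbitrary sample values
E2, B2 = boost_x(E, B, 0.6 * C)
print(invariants(E, B))
print(invariants(E2, B2))   # same two numbers up to rounding
```

The individual components of E and B change under the boost, while the two printed pairs agree to rounding accuracy.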
The inhomogeneous Maxwell equations relate the fields to the charge and current
densities.
Suppose there is a given charge and current density $\{j^\mu\} = (c\rho, \mathbf{j})$. The Lagrangian
density of the electromagnetic field and of its interaction with these sources is therefore
$$ L = -j_\mu A^\mu - \frac{\varepsilon_0 c^2}{4}F_{\mu\nu}F^{\mu\nu} \quad\text{with}\quad \varepsilon_0\mu_0 c^2 = 1. \tag{6.33} $$
The action $S$ is defined as the integral over time and all space
$$ S = \int L\, d^3r\, dt. \tag{6.34} $$
We can see that the equations of motion of the electromagnetic field are, in covariant
form,
$$ \partial_\mu F^{\mu\nu} = \mu_0 j^\nu. \tag{6.37} $$
$$ A_\mu \longrightarrow A_\mu + \partial_\mu\chi \tag{6.38} $$
which leaves invariant the $F_{\mu\nu}$ components of the electromagnetic tensor. This allows
one to impose an additional condition on the $A_\mu$, for instance the Lorentz gauge
$$ \partial_\mu A^\mu = 0. \tag{6.39} $$
$$ \partial^\mu\partial_\mu A^\nu \equiv \left(\frac{1}{c^2}\frac{\partial^2}{\partial t^2} - \nabla^2\right)A^\nu = \mu_0 j^\nu. \tag{6.40} $$
$$ \nabla\cdot\mathbf{E} = \frac{\rho}{\varepsilon_0}, \qquad c^2\,\nabla\times\mathbf{B} = \frac{\mathbf{j}}{\varepsilon_0} + \frac{\partial\mathbf{E}}{\partial t}. \tag{6.41} $$
We see from (6.35) that the physical electromagnetic field in the vacuum, away
from charges, minimizes the difference $(E^2 - c^2B^2)$ given the constraints imposed
by the presence of the sources. This was implicit in the example of the simple
electrostatic field in (2.3.1).
In order to deal with equations of first order in time, such as the Fourier diffusion
equation or the Schrödinger equation, we use the technique described in Sect. 3.3.1
for dissipative systems.
The equation satisfied by ψ is the usual diffusion equation. That written for ψ ∗ would
represent a diffusion reversed in time, or a “concentration”.
It is necessary to use similar techniques in order to write in a Lagrangian form
the flow of a viscous fluid (see [12], Chap. 3, Sect. 3).
The Schrödinger equation is not a dissipative system since there is conservation of the
norm and wave propagation. Nevertheless, the formal similarity between its structure
and that of the Fourier equation1 allows us to write a Lagrangian formulation similar
to what we developed above.
We consider the simple case of a particle of mass m placed in a potential V .
Here, the wave function ψ is complex. Therefore, one can simply use its complex
conjugate ψ ∗ as the “mirror” dynamical variable. This amounts to considering the
real and imaginary parts of the wave function as independent dynamical variables.
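In direct analogy with (6.42), the Lagrangian density is
$$ L = -\frac{\hbar^2}{2m}\nabla\psi\cdot\nabla\psi^* - \frac{\hbar}{2i}\left(\psi^*\frac{\partial\psi}{\partial t} - \psi\frac{\partial\psi^*}{\partial t}\right) - \psi^* V\psi. \tag{6.44} $$
It leads to the equations
$$ -\frac{\hbar^2}{2m}\Delta\psi + \frac{\hbar}{i}\frac{\partial\psi}{\partial t} = -V\psi \quad\text{and}\quad -\frac{\hbar^2}{2m}\Delta\psi^* - \frac{\hbar}{i}\frac{\partial\psi^*}{\partial t} = -V\psi^*, \tag{6.45} $$
as one can easily check.
The densities of conjugate momenta are
$$ p = \frac{\partial L}{\partial\dot\psi} = -\frac{\hbar\,\psi^*}{2i}; \qquad p^* = \frac{\partial L}{\partial\dot\psi^*} = \frac{\hbar\,\psi}{2i}, \tag{6.46} $$
and the Hamiltonian density is
$$ H = \frac{\hbar^2}{2m}\nabla\psi\cdot\nabla\psi^* + \psi^* V\psi. \tag{6.47} $$
This form is appealing since its integral over space is simply the expectation value
of the quantum energy
$$ \int H\, d^3r = \langle E\rangle = \int\left(-\frac{\hbar^2}{2m}\psi^*\Delta\psi + \psi^* V\psi\right) d^3r. \tag{6.48} $$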
1 One says that the Schrödinger equation is a Fourier equation with an imaginary time.
6.6 Problem
$$ a^2\frac{\partial\rho}{\partial t} + \frac{3}{v^2}\frac{\partial^2\rho}{\partial t^2} - \Delta\rho = 0, \tag{6.49} $$
called the telegraph equation (see, for instance, Appendix D of [16]) which shows a
propagation term of the neutron density, of individual velocities v which we assume
to be the same and constant here. In the diffusive regime, in reactor cores, this term
is negligible. There exist situations (for instance, neutrino transport in supernovae)
where all terms must be kept owing to the discontinuities of the diffusive medium.
Proceeding as in (3.40), write the form of a Lagrangian from which this equation
is derived.
Chapter 7
Motion in a Curved Space
1 Roland v. Eötvös, Mathematische und Naturwissenschaftliche Berichte aus Ungarn, 8, 65, 1890.
Einstein used to say2 that in 1907, when he was working on how to incorporate
Newtonian gravitation in relativity (the incorporation of electromagnetism was by
construction automatic), he had the “happiest thought of his life” (the original version
is “glücklichster Gedanke meines Lebens”). He was thinking of what someone falling
from the roof would feel. For such an “observer” (and of course as long as they
do not encounter any obstacle) there is no gravitational field (the italics are from
Einstein). If such observers let any object “fall” from their pocket, this object stands
still or has a uniform linear motion with respect to them, whatever its nature and its
physical and chemical composition (the resistance of the atmosphere is neglected).
The basic idea of the equivalence principle and its consequences is explained in
many texts including those of David Langlois [17], of Nathalie Deruelle and Jean-
Philippe Uzan [18], and Hans Stephani [20]. The ambition of this chapter is to show
how the notion of motion in a curved space can lead to a theory such that the equality
of the “two masses” emerges naturally.
The equivalence principle can be stated in the following way.
For a short time, the laws of physics in a small laboratory in free fall are the same
as they would be in the same laboratory in an inertial reference frame in the absence
of gravitation.
One usually makes a distinction between this principle, which only concerns the
motion, and the theory of general relativity itself, i.e. the Einstein equations that
relate the curvature tensor of space-time to the energy momentum tensor of matter.
In this book, we shall not describe fully Einstein’s equations and their consequences.
Here we will discuss various initial applications of the principle of equivalence,
such as precession of Mercury’s perihelion and the deviation of light rays by the
gravitational field. In the next chapter we will describe a great discovery of funda-
mental research: the detection of gravitational waves in September 2015. This natural
phenomenon was established one century after it was predicted by Albert Einstein.
We will start by studying the free motion of a particle in a curved space. In
Sect. 7.1, we define what one calls a curved space and introduce the fundamental
notion of the metric of the space. In Sect. 7.2, we will write the motion of a free
particle in such a space. This will lead us, in Sect. 7.3, to a fundamental result: The
physical trajectories are the geodesics of the space; i.e., the curves of minimal (or
extremal) length. As we shall see, this is how the motion of a particle of constant
energy E in a Euclidean space-time, can be transformed into the free motion of a
similar particle in a curved space, which is equivalent to the Maupertuis principle.
This will allow us to understand in Sect. 7.4 the reasoning of Einstein when he
constructed general relativity, and some consequences of this theory. We will display
three historical examples: the variation of the beat of a clock due to the gravitational
field, the corrections to Newton’s celestial mechanics, and the deviation of light rays
by a gravitational field.
2 See, for instance, A. Pais, Subtle Is the Lord, Chapter 9, Oxford University Press, New York, 1982.
The original letter of Einstein to R. W. Lawson in January 1920 has been found. The published
article, A. Einstein, Nature, 106, 782 (1921), is not as light in spirit.
These examples are historical. As we shall see in Sect. 7.5, they are also very
important in present-day astrophysics and cosmology. The deviation of light by a
gravitational field plays an important role via the gravitational lensing effect that
it induces. One application is the search for a baryonic component in the “missing
mass” of the universe. Another is that the mass distribution in the universe, be it the
visible mass or the missing mass, acts as a natural telescope that can enable us to see
faraway objects, and therefore much younger objects. Through this natural cosmic
telescope (or microscope), the universe appears as an endless gallery of gravitational
mirages.
7.2.1 Generalities
3. Gauss’s idea was similar to that of Eratosthenes when he measured the radius of
the Earth by comparing the shadows of two vertical sticks on the same meridian,
in Syene and in Alexandria, on the summer solstice. Eratosthenes had read in a
document the observation that on the day of the summer solstice, and only on
that day, the wells in Syene (Aswan), on the tropic of Cancer, had no shadow
inside at noon. At any other moment, a shadow appeared somewhere on the sides
and on the bottom. Eratosthenes concluded that the sun was, at that moment, at
the vertical of Syene. He measured the shadow of a vertical bar in Alexandria at
noon on the same day on the same meridian.
The measurement gave him an angle of 7° 12′. He figured out the
distance between Syene and Alexandria (probably the most difficult task in the
experiment) and found a value of the circumference of the order of 40,350 km
(compared with the actual value of 40,074 km). There is part luck in the accuracy
of the value (Eratosthenes did not know how far each city was from the same
meridian). However, intellectually, the experiment is fascinating since it provides
a means to probe the structure of the space in which we live and to measure its
radius.
4. The necessary tool for this type of measurement is to have straight lines (i.e.,
geodesics) of the space. It appears that always, whether it was Thales measuring
the height of the Great Pyramid, Eratosthenes, or Gauss, it was implicit in the
minds of people that light rays are physical entities that possess the “perfect”
mathematical property of propagating along straight lines.
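The arithmetic behind Eratosthenes' estimate in item 3 can be sketched in a few lines: the shadow angle measured in Alexandria is the fraction of a full turn subtended by the arc Syene–Alexandria. The function name is hypothetical, and the distance of 807 km is not given in the text; it is simply the value back-computed from the quoted circumference of 40,350 km, used here as an assumption.

```python
def circumference_km(shadow_angle_deg, distance_km):
    """Circumference of a sphere from the arc subtended by two points on one meridian."""
    return distance_km * 360.0 / shadow_angle_deg

angle = 7.0 + 12.0 / 60.0   # 7 degrees 12 minutes, as quoted in the text
distance = 807.0            # km; assumed value, back-computed from 40,350 km
print(360.0 / angle)        # the arc is about one fiftieth of the full circle
print(circumference_km(angle, distance))
```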
In his celebrated memoir on the theory of surfaces, Gauss understood that the
geometry of a surface is an intrinsic property of the surface, independent of whether
this surface is embedded in a Euclidean space or not. Gauss’s ideas were the starting
points of the developments performed by Riemann.
In order to see whether or not a space is Euclidean, one can check whether the
Pythagorean theorem, the triangle inequality, and the angle formula above are sat-
isfied or not. Analyzing this further shows that everything boils down to measuring
distances and comparing sets of them. Hence the importance of what is called the
metric tensor or simply the metric of the space, which we shall introduce below.
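A famous example, due to Einstein, illustrates this fact. Consider four points in a
space, which we denote 1, 2, 3, 4, and let us denote $d_{ij}$ the distance between points
i and j. In a flat, Euclidean, two-dimensional space (a plane) the following relation is
always satisfied
$$
\begin{aligned}
& d_{12}^4 d_{34}^2 + d_{13}^4 d_{24}^2 + d_{14}^4 d_{23}^2 + d_{23}^4 d_{14}^2 + d_{24}^4 d_{13}^2 + d_{34}^4 d_{12}^2 \\
&\quad + d_{12}^2 d_{23}^2 d_{31}^2 + d_{12}^2 d_{24}^2 d_{41}^2 + d_{13}^2 d_{34}^2 d_{41}^2 + d_{23}^2 d_{34}^2 d_{42}^2 \\
&\quad - d_{12}^2 d_{23}^2 d_{34}^2 - d_{13}^2 d_{32}^2 d_{24}^2 - d_{12}^2 d_{24}^2 d_{43}^2 - d_{14}^2 d_{42}^2 d_{23}^2 \\
&\quad - d_{13}^2 d_{34}^2 d_{42}^2 - d_{14}^2 d_{43}^2 d_{32}^2 - d_{23}^2 d_{31}^2 d_{14}^2 - d_{21}^2 d_{13}^2 d_{34}^2 \\
&\quad - d_{24}^2 d_{41}^2 d_{13}^2 - d_{21}^2 d_{14}^2 d_{43}^2 - d_{31}^2 d_{12}^2 d_{24}^2 - d_{41}^2 d_{12}^2 d_{23}^2 = 0.
\end{aligned}
$$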
One can use an airline schedule (and some courage) to verify that this equality is
not satisfied by Paris, New York, Johannesburg, and Shanghai (or any other set of
four airports), provided one uses the actual distances covered by airplanes going as
“straight” as possible from one place to the other.
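Rather than an airline schedule, a short numerical sketch can illustrate the four-point test. The code below evaluates the left-hand side of the relation (written in an equivalent grouped form, which is an assumption of this sketch) for four points of a Euclidean plane and for four points of a unit sphere with great-circle distances. The function names and the sample points are hypothetical.

```python
import math

def flatness_defect(d12, d13, d14, d23, d24, d34):
    """Left-hand side of the four-point relation; zero for points in a Euclidean plane."""
    a, b, c, d, e, f = (x * x for x in (d12, d13, d14, d23, d24, d34))
    positive = (a*a*f + a*f*f + b*b*e + b*e*e + c*c*d + c*d*d   # d^4 d^2 terms
                + a*b*d + a*c*e + b*c*f + d*e*f)                # triangle terms
    # The twelve "open path" terms, grouped by pairs of opposite segments
    negative = a*f*(b + c + d + e) + b*e*(a + c + d + f) + c*d*(a + b + e + f)
    return positive - negative

def defect_for(points, dist):
    p1, p2, p3, p4 = points
    return flatness_defect(dist(p1, p2), dist(p1, p3), dist(p1, p4),
                           dist(p2, p3), dist(p2, p4), dist(p3, p4))

def great_circle(p, q):
    """Arc length between two unit vectors p and q on the unit sphere."""
    dot = sum(x * y for x, y in zip(p, q))
    return math.acos(max(-1.0, min(1.0, dot)))

flat = [(0.0, 0.0), (3.0, 0.0), (1.0, 2.0), (-1.0, 4.0)]
sphere = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (-1.0, 0.0, 0.0)]
print(defect_for(flat, math.dist))        # compatible with zero
print(defect_for(sphere, great_circle))   # clearly nonzero: the sphere is curved
```

Only distances measured within the surface enter the test, which is the point of the example: no reference to an embedding space is needed.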
In the second form, we make use of Einstein’s convention of summation over repeated
indices.
The inverse g αβ is defined by
In spherical coordinates (r, θ, φ), the metric tensor is also diagonal, but its elements
are no longer constants:
7.2.4 Examples
Sphere S2 in R3
$$ ds^2 = dx^2 + dy^2 + \frac{(x\,dx + y\,dy)^2}{R^2 - x^2 - y^2}; $$
$$ g_{xx} = 1 + \frac{x^2}{R^2 - x^2 - y^2}, \qquad g_{yy} = 1 + \frac{y^2}{R^2 - x^2 - y^2}, \qquad g_{xy} = g_{yx} = \frac{xy}{R^2 - x^2 - y^2}. $$
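As an illustrative check of this metric, the sketch below integrates ds along a straight segment of the (x, y) chart on the upper hemisphere and compares the result with the known arc length on the sphere. The helper names, the midpoint rule and the sample endpoints are choices made for this example only.

```python
import math

def sphere_metric(x, y, R):
    """Components (g_xx, g_xy, g_yy) of the induced metric on the upper hemisphere."""
    denom = R * R - x * x - y * y
    return 1.0 + x * x / denom, x * y / denom, 1.0 + y * y / denom

def path_length(start, end, R, n=2000):
    """Midpoint-rule length of the straight chart segment from start to end."""
    dx = (end[0] - start[0]) / n
    dy = (end[1] - start[1]) / n
    total = 0.0
    for i in range(n):
        x = start[0] + (i + 0.5) * dx
        y = start[1] + (i + 0.5) * dy
        gxx, gxy, gyy = sphere_metric(x, y, R)
        total += math.sqrt(gxx * dx * dx + 2.0 * gxy * dx * dy + gyy * dy * dy)
    return total

# A segment through the chart origin is an arc of a great circle through the pole;
# its length on the unit sphere is asin(rho) with rho = 0.5 here.
print(path_length((0.0, 0.0), (0.3, 0.4), 1.0), math.asin(0.5))
```

The length measured with the metric (about 0.524) exceeds the Euclidean length 0.5 of the segment in the chart, as expected on a curved surface.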
$$ w^2 + x^2 + y^2 + z^2 = R^2; \quad\text{i.e.,}\quad dw^2 = \frac{\rho^2\, d\rho^2}{R^2 - \rho^2}. \tag{7.3} $$
2. Hyperbolic spaces:
Two-sheet hyperboloid:
$$ w^2 - (x^2 + y^2 + z^2) = R^2; \quad\text{i.e.,}\quad dw^2 = \frac{\rho^2\, d\rho^2}{R^2 + \rho^2}, \tag{7.4} $$
One-sheet hyperboloid:
$$ (x^2 + y^2 + z^2) - w^2 = R^2; \quad\text{i.e.,}\quad dw^2 = \frac{\rho^2\, d\rho^2}{\rho^2 - R^2}. \tag{7.5} $$
3. Parabolic space:
$$ w - \frac{x^2 + y^2 + z^2}{2a} = 0; \quad\text{i.e.,}\quad dw^2 = \frac{\rho^2\, d\rho^2}{a^2}. \tag{7.6} $$
In all these cases, the metric tensor is expressed as
General Case
One could, of course, continue playing the same type of game as in these examples
by imposing any constraint of the type Φ(x, y, z, w) = 0 in the space R4 . Actually,
one would be far from discovering all three-dimensional curved spaces.
The definition of a curved space consists of choosing the metric {gαβ }; the simple
examples above are only illustrations.
Historically, the most famous example was given by Felix Klein in 1890. It was
a concrete example of the geometries of Gauss, János Bólyai, and Lobatchevsky.
Klein’s “bottle” consists of an analytical geometry where each point is represented
by two real numbers, $x_1$ and $x_2$, such that $x_1^2 + x_2^2 < 1$ and where the distance $d(x, y)$
between two points is defined to be
$$ \cosh\frac{d(x, y)}{a} = \frac{1 - x_1 y_1 - x_2 y_2}{\sqrt{(1 - x_1^2 - x_2^2)(1 - y_1^2 - y_2^2)}}, \tag{7.8} $$
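A minimal sketch of this distance function, assuming the square root in the denominator as in the usual form of this model, shows two of its characteristic features: the distance from the center to a point at radius r equals a·artanh(r), and it grows without bound as the point approaches the unit circle. The function name and the default a = 1 are assumptions of the example.

```python
import math

def klein_distance(x, y, a=1.0):
    """Distance (7.8) between two points x, y of the open unit disk."""
    num = 1.0 - x[0] * y[0] - x[1] * y[1]
    den = math.sqrt((1.0 - x[0]**2 - x[1]**2) * (1.0 - y[0]**2 - y[1]**2))
    # Clamp to 1 so that rounding errors never push acosh outside its domain
    return a * math.acosh(max(1.0, num / den))

origin = (0.0, 0.0)
for r in (0.5, 0.9, 0.999):
    print(r, klein_distance(origin, (r, 0.0)), math.atanh(r))
```

The unit circle is therefore at infinite distance from any interior point, although it looks close in the coordinates $(x_1, x_2)$.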
7.3.1 Lagrangian
Since the particle is free, the Lagrangian boils down to its kinetic part, $E_{\text{kin}} = mv^2/2$;
i.e.,
$$ L = \frac{1}{2}m\left(\frac{ds}{dt}\right)^2 = \frac{1}{2}m\, g_{\alpha\beta}\frac{dx^\alpha}{dt}\frac{dx^\beta}{dt}. \tag{7.9} $$
Note that although the space variables do not seem to appear explicitly in this Lagrangian,
they are present in the metric $g_{\alpha\beta}$.
The conjugate momenta are obtained with no difficulty. Assuming the metric is
symmetric, $g_{\alpha\beta} = g_{\beta\alpha}$, which does not restrain the generality, one obtains
$$ p_\alpha = \frac{\partial L}{\partial\dot x^\alpha} = m\, g_{\alpha\beta}\frac{dx^\beta}{dt}. \tag{7.10} $$
The Hamiltonian is
$$ H = p_\alpha\dot x^\alpha - L = L. \tag{7.11} $$
The value of the Hamiltonian is the same as the value of the Lagrangian as it should
be since we consider a free particle. (Of course, the Lagrangian and Hamiltonian
functions are not expressed with the same variables.) We deduce a consequence
which is both obvious and important. Because of energy conservation, the square of
the velocity is a constant of the motion along the trajectory:
$$ \frac{d}{dt}\left(\frac{1}{2}mv^2\right) = \frac{d}{dt}L = \frac{d}{dt}H = 0. \tag{7.12} $$
We shall use this in the next section. The expanded form of this equation is
$$ m\,(g_{\nu\beta}\ddot x^\beta) + m\,\frac{\partial g_{\nu\beta}}{\partial x^\alpha}\dot x^\alpha\dot x^\beta = \frac{1}{2}m\,\frac{\partial g_{\alpha\beta}}{\partial x^\nu}\dot x^\alpha\dot x^\beta. \tag{7.14} $$
We notice, and it is not surprising, that the mass cancels off identically:
The free motion of a particle in a curved space is independent of the mass of the
particle. The trajectory only depends on the initial conditions.
We shall make no further use of these symbols in this chapter, but we mention them in
the next. It is a good example to show that the formal complexity of general relativity
is a matter of writing; what is subtle is the physics.
1. Motion on $S^2$
One can, as an exercise, recover that the motion on a usual sphere $S^2$ is a uniform
motion on a great circle.
2. Free motion on $S^3$
Consider now the case of the three-dimensional “spherical” space of (7.3); i.e.,
the free motion on the sphere $S^3$. Obviously, the volume of this space is also
finite since $\rho^2 = x^2 + y^2 + z^2 \le R^2$.
The motion exhibits more diversity than on $S^2$.
In spherical coordinates, the Lagrangian of the problem is
$$ L = \frac{m}{2}\left(\frac{R^2}{R^2 - \rho^2}\,\dot\rho^2 + \rho^2\dot\theta^2 + \rho^2\sin^2\theta\,\dot\phi^2\right). \tag{7.17} $$
$$ \frac{d}{dt}(\rho^2\dot\phi) = 0 \;\Longrightarrow\; \dot\phi = \frac{A}{\rho^2}, \tag{7.19} $$
$$ A^2 \le \frac{2R^2E}{m}, \tag{7.21} $$
which is a direct consequence of the fact that the energy is greater than
the rotational energy $mA^2/2\rho^2$. This is a consequence of (7.135); i.e.,
$E \ge mA^2/2\rho^2 \ge mA^2/2R^2$.
$$ \omega^2 = \frac{2E}{mR^2} \quad\text{and}\quad \gamma^2 = \frac{mA^2}{2ER^2}. \tag{7.22} $$
From (7.21), we have the inequality
$$ \gamma^2 \le 1. \tag{7.23} $$
$$ E = \frac{1}{2}m(\dot x^2 + \dot y^2) + V, $$
where the “effective potential” $V$ is energy dependent:
$$ V = \frac{1}{2}m\,\frac{\rho^2\dot\rho^2}{R^2 - \rho^2} = \frac{E\rho^2}{R^2}. $$
2E
2 = .
m R2
(d) Of course, if the square of the velocity is a constant in the curved four-
dimensional space, this is not the case if one visualizes the phenomenon in
a Euclidean plane, as above.
(e) The simplicity of the result is intuitive. Quite obviously, as one can see in
the definition (7.3), the symmetry of the problem is much larger than the
sole rotation in R3 . There is a rotation invariance in R4 . The solutions of
maximal symmetry correspond to a uniform motion on a circle of radius R
in a plane whose orientation is arbitrary in R4 . The whole set of solutions
is obtained by projecting these particular solutions on planes of R3 , which
leads to the elliptic trajectories that we have found.
7.4.1 Definition
In the case of a positive metric, a possible definition of a geodesic line, hereafter called
a geodesic for simplicity, going through two points A and B is that it is the curve of
minimal (extremal) length between these two points. In differential geometry, there
are other equivalent definitions.5
5 In Minkowski space, light rays follow trajectories of vanishing “length.” The notion of parallel
transport allows us to overcome this apparent difficulty; see [8, 21], or [23].
Therefore, a geodesic is the path that minimizes the length
$$ s_{AB} = \int_A^B ds = \int_A^B \sqrt{g_{\alpha\beta}\, dx^\alpha dx^\beta}; \tag{7.24} $$
i.e., the path $\{X^\alpha\}$ such that $\delta s_{AB} = 0$ for any infinitesimal variation $\{\delta x^\alpha, \delta\dot x^\alpha\}$.
Considering an arbitrary parameterization $\{x^\alpha(\lambda)\}$ of the path, we must find the path
that minimizes the integral
$$ s_{AB} = \int_A^B ds = \int_A^B \sqrt{g_{\alpha\beta}\frac{dx^\alpha}{d\lambda}\frac{dx^\beta}{d\lambda}}\; d\lambda. \tag{7.25} $$
The assertion above is that these paths are the same as those along which the
action
$$ S = \int_A^B L\, dt = \frac{1}{2}m\int_A^B g_{\alpha\beta}\frac{dx^\alpha}{dt}\frac{dx^\beta}{dt}\, dt \tag{7.26} $$
is stationary.
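The variational problem posed in Eq. (7.25) is similar in every way to those
considered in Chap. 2. Consider a variation
$$ x^\nu(\lambda) \longrightarrow x^\nu(\lambda) + \epsilon^\nu(\lambda), \tag{7.27} $$
where
$$ \dot\epsilon^\nu = \frac{d\epsilon^\nu}{d\lambda} \quad\text{and}\quad \epsilon^\nu(A) = \epsilon^\nu(B) = 0. \tag{7.28} $$
To first order, the variation of $s_{AB}$ is
$$ \delta s_{AB} = \int_A^B \frac{1}{2\sqrt{g_{\alpha\beta}\dot x^\alpha\dot x^\beta}}\left(\frac{\partial g_{\alpha\beta}}{\partial x^\nu}\dot x^\alpha\dot x^\beta\,\epsilon^\nu + 2g_{\nu\beta}\dot x^\beta\,\dot\epsilon^\nu\right) d\lambda, \tag{7.29} $$
where we have set $\dot x^\alpha \equiv (d/d\lambda)x^\alpha$. We now integrate the second term by parts.
Consider the quantity
$$ F = \sqrt{g_{\alpha\beta}\dot x^\alpha\dot x^\beta}. $$
We have
$$ \delta s_{AB} = \int_A^B \left[\frac{1}{2F}\left(\frac{\partial g_{\alpha\beta}}{\partial x^\nu}\dot x^\alpha\dot x^\beta - \frac{d}{d\lambda}(2g_{\nu\beta}\dot x^\beta)\right) - (2g_{\nu\beta}\dot x^\beta)\frac{d}{d\lambda}\left(\frac{1}{2F}\right)\right]\epsilon^\nu\, d\lambda = 0. \tag{7.30} $$
This variation must vanish for any $\{\epsilon^\nu\}$, and we obtain the equations
$$ \frac{1}{2F}\left(\frac{\partial g_{\alpha\beta}}{\partial x^\nu}\dot x^\alpha\dot x^\beta - \frac{d}{d\lambda}(2g_{\nu\beta}\dot x^\beta)\right) - (2g_{\nu\beta}\dot x^\beta)\frac{d}{d\lambda}\left(\frac{1}{2F}\right) = 0 \quad \forall\nu. \tag{7.31} $$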
These equations are simplified if one makes an appropriate choice of the parameter
λ. Consider the choice λ = s; i.e., λ is the length along the geodesic and $d\lambda = ds$.6
Then, by definition, inserting this in Eq. (7.25), we have along the geodesic
$$ F = 1, \quad\text{and}\quad \frac{dF}{d\lambda} = 0. $$
Consequently, the equation of the geodesic becomes
$$ \frac{\partial g_{\alpha\beta}}{\partial x^\nu}\frac{\partial x^\alpha}{\partial s}\frac{\partial x^\beta}{\partial s} - \frac{d}{ds}\left(2g_{\nu\beta}\frac{\partial x^\beta}{\partial s}\right) = 0 \quad \forall\nu. \tag{7.32} $$
Not only does this equation have the same form as the equation of motion (7.13),
but it is equivalent to it. Indeed, we can choose λ = t. In the case of a free motion,
we have seen that $v = ds/dt$ is a constant along the trajectory. Therefore, $ds = v\,dt$
and the factor $1/v^2$ cancels off identically in (7.32).
We have proven our assertion. The trajectories of a free particle in a curved space
are the geodesics of the space. In other words, the trajectory followed by a free
particle to go from A to B in a curved space is the path of shortest length.7 Galileo’s
principle of inertia appears as a particular case of this property in flat space.
7.4.3 Examples
If we keep in mind how we have treated Example 2 of the free motion on S3 , we can
use the constants of the motion in order to determine the geodesics in simple but non
trivial cases that are not totally academic.
1. Isotropic spaces
Metrics of the form (7.7), or more generally
$$ \int_{\rho_0}^{\rho} \sqrt{\frac{g(\sigma)}{2E/m - A^2/\sigma^2}}\; d\sigma = t - t_0. \tag{7.35} $$
The only difficulty lies in the inversion of this formula in order to obtain the
dependence ρ(t).
2. Hyperbolic geodesics
Consider the metric
$$ ds^2 = \frac{R^2}{\rho^2 + R^2}\, d\rho^2 + \rho^2 d\theta^2 + \rho^2\sin^2\theta\, d\phi^2, \tag{7.36} $$
$$ ds^2 = dx^2 + dy^2 + dz^2 - dw^2, \tag{7.37} $$
$$ w^2 = x^2 + y^2 + z^2 + R^2 = \rho^2 + R^2. \tag{7.38} $$
$$ \frac{d}{dt}(\rho^2\dot\phi) = 0 \;\Longrightarrow\; \dot\phi = \frac{A}{\rho^2} \tag{7.41} $$
The solution of the problem is obtained rather easily. One defines the parameters
ω and γ as before,
$$ \omega^2 = \frac{2E}{mR^2} \quad\text{and}\quad \gamma^2 = \frac{mA^2}{2ER^2}. \tag{7.43} $$
We obtain in a way similar to example 2, except that hyperbolic functions replace
trigonometric functions,
$$ \rho(t) = R\sqrt{\gamma^2\cosh^2\omega(t - t_0) + \sinh^2\omega(t - t_0)} \tag{7.44} $$
and
$$ \tan(\phi(t) - \phi_0) = \gamma\coth\omega(t - t_0). \tag{7.45} $$
We notice that the distance to the origin increases exponentially when $|t| \to \infty$.
The geodesics of the metric (7.36) are hyperbolas
$$ x^2 - y^2/\gamma^2 = R^2. \tag{7.46} $$
$$ L = \frac{1}{2}\sum_{i,j=1}^{N} m_{i,j}(q)\,\dot q_i\dot q_j - V(q). \tag{7.47} $$
Here, we denote by $m_{i,j}(q)$ the coefficients of the quadratic form that constitutes the
kinetic energy. In Cartesian coordinates, $m_{i,j}(q)$ is diagonal and does not depend on
the coordinates. This is no longer true in general.
The conjugate momenta are
$$ p_i = \frac{\partial L}{\partial\dot q_i} = \sum_{j=1}^{N} m_{i,j}(q)\,\dot q_j. \tag{7.48} $$
$$ E = \frac{1}{2}\sum_{i,j=1}^{N} m_{i,j}(q)\,\dot q_i\dot q_j + V(q). \tag{7.49} $$
$$ dS_0 = \sum_{i=1}^{N} p_i\, dq_i = \sum_{i,j=1}^{N} m_{i,j}(q)\,\frac{dq_j}{dt}\, dq_i, \tag{7.51} $$
In this form, we see that the trajectory that minimizes the reduced action $S_0$ is a
geodesic of a curved space whose metric, which depends on $E$, is given by
$$ ds^2 = 2(E - V(q))\sum_{i,j=1}^{N} m_{i,j}(q)\, dq_i\, dq_j. \tag{7.53} $$
$$ t - t_0 = \int_{q_0}^{q} \sqrt{\frac{\sum_{i,j} m_{i,j}(q')\, dq'_i\, dq'_j}{2(E - V(q'))}}. \tag{7.54} $$
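To see the last formula at work, the sketch below evaluates the time integral in the simplest case of a single degree of freedom (N = 1, constant mass m), where it reduces to the integral of sqrt(m / (2(E − V))) dq. The harmonic potential, the energy and the function name are assumptions chosen so that the result can be compared with the known answer arcsin(q) for m = 1, V = q²/2, E = 1/2.

```python
import math

def travel_time(q0, q1, E, V, m=1.0, n=20000):
    """Midpoint-rule evaluation of the time integral for one degree of freedom."""
    dq = (q1 - q0) / n
    total = 0.0
    for i in range(n):
        q = q0 + (i + 0.5) * dq
        # Integrand: sqrt(m dq^2 / (2 (E - V(q)))), valid while E > V(q)
        total += math.sqrt(m * dq * dq / (2.0 * (E - V(q))))
    return total

# Harmonic oscillator with m = 1, V = q^2 / 2, E = 1/2: q(t) = sin(t),
# so the time needed to go from q = 0 to q = 0.5 is asin(0.5).
t = travel_time(0.0, 0.5, E=0.5, V=lambda q: 0.5 * q * q)
print(t, math.asin(0.5))
```

The integrand diverges at a turning point, where E = V(q); the simple midpoint rule used here is only adequate away from such points.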
The scheme we have studied up to now is appealing since the equality between the
inertial and the gravitational masses follows automatically. However, the theory has
an embarrassing by-product in that the norm of the velocity is a constant in time.
In order to get rid of this defect, we must introduce the time variable in the problem
and extend it to space-time and not only space.
Our purpose here is not to enter the domain of relativistic gravitation and gen-
eral relativity as a whole (see, for instance, [2, 20, 21], or [22]). We only want to
introduce a curvature of space-time that, at least to lowest order in v 2 /c2 , v being a
characteristic velocity of the problem, allows us to recover Newton’s usual equations
while maintaining the nice properties encountered above, in particular the fact that
the mass drops out from the equations of motion.
What metric of space-time can we choose in order to achieve this program?
We have seen in Chap. 3 that the Lagrangian of a free relativistic particle is
$$ L = -mc^2\sqrt{1 - \frac{v^2}{c^2}}. \tag{7.56} $$
The Lorentz-invariant action is
$$ S = -mc^2\int_{t_1}^{t_2}\sqrt{1 - \frac{v^2}{c^2}}\; dt. \tag{7.57} $$
8 Unfortunately, we follow the particle-physics tradition of having a metric with negative space
components instead of positive ones as we have used above.
$$ L = -mc^2 + \frac{1}{2}mv^2 - m\phi, \tag{7.60} $$
and an action
$$ S = \int L\, dt = -mc\int\left(c - \frac{v^2}{2c} + \frac{\phi}{c}\right) dt. \tag{7.61} $$
we end up quite naturally, to lowest order in $v^2/c^2$ and in $\phi/c^2$, with the expression
of the invariant element
$$ ds^2 = c^2\left(1 + \frac{2\phi}{c^2}\right) dt^2 - d\mathbf{r}^2. \tag{7.63} $$
This is the simplest, or most naive, extension of the metric (7.55) which accounts for
the phenomena that interest us
$$ g_{00} = 1 + \frac{2\phi}{c^2}, \qquad g_{\alpha\beta} = \delta_{\alpha\beta}. \tag{7.64} $$
$$ \ddot{\mathbf{r}} = -\nabla\phi, \tag{7.65} $$
which involves neither the velocity of light, since we are in non-relativistic mechanics,
nor the mass, of course.
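The link between the invariant element and the Newtonian Lagrangian can be made explicit by a short expansion. The lines below are a sketch of that step, assuming that the action of the particle is $S = -mc\int ds$ and keeping only first-order terms in $v^2/c^2$ and $\phi/c^2$:

```latex
\begin{aligned}
ds &= \sqrt{c^2\left(1 + \frac{2\phi}{c^2}\right) - v^2}\; dt
    = c\,\sqrt{1 + \frac{2\phi}{c^2} - \frac{v^2}{c^2}}\; dt
    \simeq \left(c + \frac{\phi}{c} - \frac{v^2}{2c}\right) dt, \\
S &= -mc\int ds
    \simeq -mc\int\left(c - \frac{v^2}{2c} + \frac{\phi}{c}\right) dt
    = \int\left(-mc^2 + \frac{1}{2}mv^2 - m\phi\right) dt .
\end{aligned}
```

The constant term $-mc^2$ does not affect the equations of motion, and the mass m is an overall factor of the remaining terms, which is why it drops out of the equation of motion.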
In the theory of general relativity, the metric is related to the mass distribution (actu-
ally to the energy-momentum tensor) by Einstein’s equations.
An exact solution of these equations was given in 1916 by Karl Schwarzschild. It is
the metric generated by the static gravitational field of an isotropic mass distribution
7.5 Gravitation and the Curvature of Space-Time 135
$$ r_0 = \frac{2GM}{c^2} . \tag{7.67} $$
One can write this formula in the more general way
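$$ ds^2 = \left( 1 + \frac{2\Phi(\rho)}{c^2} \right) c^2 dt^2 - \frac{d\rho^2}{1 + \dfrac{2\Phi(\rho)}{c^2}} - \rho^2 \left( d\theta^2 + \sin^2\theta\, d\phi^2 \right) , \tag{7.68} $$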
Some arbitrariness remains as far as the pair of variables $(\rho, t)$ is concerned.
Here, these variables are chosen so that there is no off-diagonal term $d\rho\, dt$ in the
metric.
The proof of this formula is, of course, beyond the scope of this book. One can
refer to Landau and Lifshitz [2], Sect. 97, and to Misner, Thorne, and Wheeler [23],
Chap. 25. One can find the complete description of black holes (i.e. physics inside
the Schwarzschild radius) in [23].
The “naive” metric (7.63) is the approximation of (7.66) to lowest order in $v^2/c^2$
and $\phi/c^2$.
Note that the spatial part of the form (7.66) is not locally Euclidean. There is
no local rotation invariance, which is intuitive since the radial variable plays a special
role. When fields are weak (i.e. $r_0/\rho \ll 1$), or at large distances, one can use locally
Euclidean space variables $(x, y, z)$, and, to a good approximation, the Schwarzschild
metric (7.66) is of the form
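$$ \begin{aligned} ds^2 &= \left( 1 - \frac{r_0}{r} \right) c^2 dt^2 - \left( 1 + \frac{r_0}{r} \right) \left( dx^2 + dy^2 + dz^2 \right) \\ &= \left( 1 - \frac{r_0}{r} \right) c^2 dt^2 - \left( 1 + \frac{r_0}{r} \right) \left( dr^2 + r^2 (d\theta^2 + \sin^2\theta\, d\phi^2) \right) , \end{aligned} \tag{7.70} $$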
where (r, θ, φ) are the usual spherical coordinates. (The proof of this result can be
found in [2], Sect. 97).
We notice that if the metric (7.64) gives us the classical Newton equation, it “curves”
time at each point in space. In that respect, it is in full agreement with the gen-
eral solution of Schwarzschild, which predicts a dilation of the proper time in an
algebraically increasing gravitational potential
$$ d\tau = \sqrt{g_{00}}\; dt \simeq \left( 1 + \frac{\Phi}{c^2} \right) dt . \tag{7.71} $$
This effect, as well as the “twin effect” of special relativity, has been measured
with great accuracy by R.F.C. Vessot and collaborators.9 A hydrogen maser was
sent to an altitude of 10,000 km by a Scout rocket, and the variation in time of its
frequency was measured as the gravitational potential increased (algebraically). There
are many corrections, in particular due to the Doppler effect of the spacecraft and to
the Earth’s rotation. It was possible to test the predictions of general relativity on the
variation of the pace of a clock as a function of the gravitational field with a relative
accuracy of $7 \times 10^{-5}$. This was done by comparison with atomic clocks, or masers,
on Earth. Up to now, it has been one of the best verifications of general relativity.
The recording of the beats between the onboard maser and a test maser on Earth is
shown in Fig. 7.1. (These are actually beats between signals, which are first recorded
and then processed in order to take into account all physical corrections.)
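To get a feeling for the size of the effect, one can evaluate the gravitational part of (7.71) between the ground and an altitude of 10,000 km. The sketch below is only an order-of-magnitude illustration: it assumes a spherical, non-rotating Earth with the Newtonian potential $\Phi = -GM/r$, and it ignores the Doppler and velocity corrections mentioned above. The numerical constants and the function names are assumptions of this example.

```python
C = 299_792_458.0        # speed of light, m/s
GM_EARTH = 3.986004e14   # G times the Earth's mass, m^3/s^2 (assumed value)
R_EARTH = 6.371e6        # mean radius of the Earth, m (assumed value)


def potential(r):
    """Newtonian potential Phi(r) = -GM/r outside a spherical Earth."""
    return -GM_EARTH / r


def clock_rate_shift(altitude):
    """Fractional rate difference between a clock at the given altitude and
    a clock on the ground, [Phi(R + h) - Phi(R)] / c^2, following (7.71)."""
    return (potential(R_EARTH + altitude) - potential(R_EARTH)) / C ** 2


shift = clock_rate_shift(1.0e7)  # 10,000 km, the altitude quoted in the text
print(f"fractional frequency shift at 10,000 km: {shift:.3e}")
```

The shift is a few parts in $10^{10}$, which gives an idea of the clock stability required to test the prediction at the $10^{-4}$ level.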
To next order, the Schwarzschild metric curves space. This causes a variety of observable
phenomena in celestial mechanics. Among these is the famous precession of
the perihelion of planets and comets.
Here we choose to work with the form (7.70). The value of Schwarzschild’s
radius $r_s = 2GM/c^2$ is $r_s \simeq 3$ km for the sun and $r_s \simeq 0.89$ cm for the Earth. It is
very small compared to the orders of magnitude of celestial mechanics in the solar
system (1 A.U. $= 150 \times 10^6$ km). The effects are small corrections to the Newtonian
terms.
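These orders of magnitude are easy to reproduce. The following sketch evaluates $r_s = 2GM/c^2$ for the sun and for the Earth and compares the solar value with the astronomical unit. The values of $GM$ and of the astronomical unit are standard figures introduced here as assumptions.

```python
C = 299_792_458.0        # speed of light, m/s
GM_SUN = 1.32712440e20   # G times the solar mass, m^3/s^2 (assumed value)
GM_EARTH = 3.986004e14   # G times the Earth's mass, m^3/s^2 (assumed value)
AU = 1.495978707e11      # astronomical unit, m


def schwarzschild_radius(gm):
    """Schwarzschild radius r_s = 2GM/c^2 in metres."""
    return 2.0 * gm / C ** 2


rs_sun = schwarzschild_radius(GM_SUN)
rs_earth = schwarzschild_radius(GM_EARTH)
print(f"sun:   r_s = {rs_sun / 1e3:.2f} km, r_s / 1 A.U. = {rs_sun / AU:.1e}")
print(f"Earth: r_s = {rs_earth * 1e2:.2f} cm")
```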
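The length element is given by
$$ ds^2 = (1 - 2\alpha)\, c^2 dt^2 - (1 + 2\alpha) \left( dr^2 + r^2 (d\theta^2 + \sin^2\theta\, d\phi^2) \right) $$
with $\alpha = \dfrac{GM}{rc^2} \ll 1$.
For non-relativistic velocities, one has to good approximation
$$ ds = \left[ (1 - \alpha)\, c - \frac{1 + 3\alpha}{2c} \left( \dot r^2 + r^2 (\dot\theta^2 + \sin^2\theta\, \dot\phi^2) \right) \right] dt . \tag{7.73} $$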
Classical Calculation
$$ L = \frac{1}{2} m \left( \dot r^2 + r^2 \dot\phi^2 \right) + \frac{GMm}{r} . $$
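Let $E$ be the energy of the planet and $\Lambda$ the norm of its angular momentum. For
convenience, we define the corresponding quantities per unit mass
$$ \mathcal{E} \equiv \frac{E}{m} \quad \text{and} \quad \mathcal{L} \equiv \frac{\Lambda}{m} . \tag{7.75} $$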
Conservation of angular momentum yields
We study the trajectory $r(\phi)$, from which the time dependence follows by using
(7.76). We define $r' \equiv dr/d\phi$, therefore $\dot r = r'\, d\phi/dt = r' \mathcal{L}/r^2$. By introducing
the variable $u(\phi) = 1/r(\phi)$, we obtain the first-order equation
$$ \frac{2\mathcal{E}}{\mathcal{L}^2} = u'^2 + u^2 - \frac{2GMu}{\mathcal{L}^2} . \tag{7.78} $$
The trajectory is obtained by a simple quadrature (one can take the derivative of
(7.78) with respect to $\phi$, which leads to a linear equation whose general solution is
inserted into (7.78) in order to fix the constants):
$$ u = \frac{1 + e\cos\phi}{p} \quad \text{with} \quad p = \frac{\mathcal{L}^2}{GM} \quad \text{and} \quad e = \sqrt{1 + \frac{2\mathcal{E}\mathcal{L}^2}{G^2M^2}} , \tag{7.79} $$
where one recognizes, for negative energies ($\mathcal{E} < 0$), an ellipse of parameter $p$ and
eccentricity $e$.
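As a quick consistency check, one can verify numerically that the conic (7.79) satisfies the first-order equation (7.78) for every value of $\phi$. The sketch below does this for an arbitrary bound orbit, in units where $GM = 1$; the chosen values of the energy and angular momentum per unit mass, and the function names, are assumptions made for the example.

```python
import math


def kepler_orbit(phi, gm, ell, energy):
    """Return (u, du/dphi) for the conic u = (1 + e cos phi)/p of (7.79).

    ell and energy are the angular momentum and energy per unit mass.
    """
    p = ell ** 2 / gm
    e = math.sqrt(1.0 + 2.0 * energy * ell ** 2 / gm ** 2)
    u = (1.0 + e * math.cos(phi)) / p
    du = -e * math.sin(phi) / p
    return u, du


def first_integral(phi, gm, ell, energy):
    """Right-hand side of (7.78): u'^2 + u^2 - 2 G M u / L^2."""
    u, du = kepler_orbit(phi, gm, ell, energy)
    return du ** 2 + u ** 2 - 2.0 * gm * u / ell ** 2


# Arbitrary bound orbit in units where G M = 1 (assumed test values).
GM, ELL, ENERGY = 1.0, 1.2, -0.2
target = 2.0 * ENERGY / ELL ** 2
worst = max(abs(first_integral(0.1 * k, GM, ELL, ENERGY) - target)
            for k in range(63))
print(f"largest deviation from 2E/L^2 over one turn: {worst:.1e}")
```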
Relativistic Correction
With the curvature of space-time, the motion remains planar. One chooses as above
θ = π/2, and the Lagrangian is given by (7.74); i.e.,
$$ L = \frac{1}{2} m \left( 1 + \frac{3GM}{rc^2} \right) \left( \dot r^2 + r^2 \dot\phi^2 \right) + \frac{GMm}{r} . \tag{7.81} $$
$$ \mathcal{E} \equiv \frac{E}{m} , \quad \mathcal{L} \equiv \frac{\Lambda}{m} , \quad u = \frac{1}{r} . \tag{7.82} $$
We make use of the parameter p, the eccentricity e of the Newtonian ellipse, and the
parameter λ defined by
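$$ p = \frac{\mathcal{L}^2}{GM} , \quad e = \sqrt{1 + \frac{2\mathcal{E}\mathcal{L}^2}{G^2M^2}} , \quad \text{and} \quad \lambda = \frac{3GM}{pc^2} = \frac{3G^2M^2}{\mathcal{L}^2c^2} . \tag{7.83} $$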
$$ \frac{2\mathcal{E}}{\mathcal{L}^2} = \frac{u'^2 + u^2}{1 + \dfrac{3GM}{c^2 r}} - \frac{2}{p}\, u . \tag{7.85} $$
Of course, we notice that in the absence of the relativistic correction (λ = 0), the
solution is
v0 = 1 + e cos φ. (7.88)
This is a necessary condition (the complete solution is obtained by inserting this into
(7.87)).
First-Order Perturbation
where v0 is the Kepler solution v0 = 1 + e cos φ and v1 is the correction that interests
us. Inserting this into (7.89) and retaining only the first-order terms in λ, we obtain
the equation
$$ v_1'' + v_1 = \frac{3 + e^2}{2} + 2e\cos\phi , \tag{7.91} $$
whose solution is
$$ v_1 = \frac{3 + e^2}{2} + e\phi\sin\phi + \alpha + \frac{e}{2}\sin\phi , \tag{7.92} $$
α being an arbitrary constant that we choose to be equal to zero. We notice that to
first order in λ the initial equation (7.87) is satisfied.
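The complete solution of the problem in first-order perturbation theory, taking
into account that $\cos[(1 - \varepsilon)\phi] \simeq \cos\phi + \varepsilon\phi\sin\phi$, is therefore
$$ \frac{1}{r} = \frac{GM}{\mathcal{L}^2} \left[ 1 + e\cos\!\left( \left( 1 - \frac{3G^2M^2}{\mathcal{L}^2c^2} \right) \phi \right) + \frac{3G^2M^2}{\mathcal{L}^2c^2} \left( \frac{3 + e^2}{2} + \frac{e}{2}\sin\phi \right) \right] . \tag{7.93} $$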
This is the equation of a deformed ellipse that precesses. The precession of the major
axis in one period ($\delta\phi = 2\pi$) corresponds to an angle
$$ \Delta\omega \simeq \frac{6\pi G^2M^2}{c^2\mathcal{L}^2} = \frac{6\pi GM}{c^2 a(1 - e^2)} , \tag{7.94} $$
where $a$ is half of the major axis of the ellipse and $e$ its eccentricity.
The parameters of the planet Mercury are $a = 57.9 \times 10^6$ km, $\nu = 1/T = 415$
revolutions per century, and its eccentricity is $e = 0.2056$ (the mass of the sun is
$M = 2 \times 10^{30}$ kg). The calculated value is
compared with the observed $43.11'' \pm 0.45''$ per century. Einstein said that this result
was the strongest emotional experience of his scientific life.10
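The arithmetic behind this comparison can be sketched in a few lines, using formula (7.94). The eccentricity and the number of revolutions per century are those quoted above; the semi-major axis $57.91 \times 10^6$ km and the value of $GM$ for the sun are commonly quoted figures that we introduce here as assumptions of the example.

```python
import math

C = 299_792_458.0        # speed of light, m/s
GM_SUN = 1.32712440e20   # G times the solar mass, m^3/s^2 (assumed value)
ARCSEC_PER_RAD = 180.0 * 3600.0 / math.pi


def precession_per_orbit(a, e):
    """Perihelion advance per revolution, 6 pi G M / (c^2 a (1 - e^2)), in rad."""
    return 6.0 * math.pi * GM_SUN / (C ** 2 * a * (1.0 - e ** 2))


# Mercury: semi-major axis in metres (assumed value), eccentricity and
# number of revolutions per century as quoted in the text.
A_MERCURY, E_MERCURY, ORBITS_PER_CENTURY = 57.91e9, 0.2056, 415

per_century = (precession_per_orbit(A_MERCURY, E_MERCURY)
               * ORBITS_PER_CENTURY * ARCSEC_PER_RAD)
print(f"relativistic precession of Mercury: {per_century:.1f} arcsec per century")
```

With these inputs the result falls within the quoted uncertainty of the observed value.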
Another effect of the metric (7.66) and the corresponding geodesics is the deviation of
light rays by a gravitational field. This effect, which was one of the first verifications of
general relativity, in 1919, has regained considerable interest in recent years because
of its astrophysical and cosmological consequences through the gravitational lensing
effect that it induces.11 We use the weak-field approximation
$$ ds^2 \sim c^2 dt^2 \left( 1 + \frac{2\Phi(r)}{c^2} \right) - \left( 1 - \frac{2\Phi(r)}{c^2} \right) d\mathbf{r}^2 . \tag{7.95} $$
The most important astrophysical use of this effect is the gravitational lensing
of remote objects, which results from the curvature of photon trajectories in a
gravitational field, as shown in Fig. 7.2.
In order to calculate the trajectory of a photon, we can use the fact that the proper
time $d\tau$ of a photon is zero; i.e., inserting this in (7.95), we have
$$ d\tau^2 = 0 = dt^2 \left( 1 + \frac{2\Phi(r)}{c^2} \right) - \left( 1 - \frac{2\Phi(r)}{c^2} \right) \frac{d\mathbf{r}^2}{c^2} , \tag{7.96} $$
Fig. 7.2 Deviation of a photon trajectory in a gravitational potential Φ(r ). This potential is assumed
to be spherically symmetric. The photon position at a given time is parameterized by r and θ. The
straight line represents the trajectory of a photon in the absence of a gravitational field. In the
presence of this potential, the photon “falls” in the gravitational field (full curve)
With this expression, we can calculate the photon trajectory by the Fermat principle
exactly as for curved rays in Eq. (2.5); i.e., by minimizing the integral
$$ T = \int_A^B \frac{d\ell}{v} , \tag{7.98} $$
where $A$ and $B$ are the endpoints of the photon trajectory and $d\ell$ is the length element
along the trajectory.
We assume that the potential Φ(r ) is spherically symmetric and centered at the
origin. We consider the motion in the plane (AO B) and we use polar coordinates
(r, θ) as shown in Fig. 7.2. We consider a situation where A and B are symmetric to
each other, so that θ = 0 corresponds to the point of shortest distance to the origin. It
is convenient to determine the function r (θ) that minimizes the time T . Under these
conditions, Eq. (7.98) can be written as
$$ T = \frac{1}{c} \int_A^B \frac{\sqrt{1 + r^2\dot\theta^2}}{1 + \dfrac{2\Phi(r)}{c^2}}\; dr , \tag{7.99} $$
where $\dot\theta = d\theta/dr$.
We consider the potential created by a total mass M, and we assume the photon
path is outside the mass distribution so that we can set
$$ \frac{2\Phi(r)}{c^2} = -\frac{2GM}{rc^2} \equiv \frac{\lambda}{r} , \tag{7.100} $$
$$ L = \frac{\sqrt{1 + r^2\dot\theta^2}}{1 + \dfrac{\lambda}{r}} . \tag{7.101} $$
where R is a length that is a constant of the motion. Solving this first-order differential
equation causes no special difficulty. We change to dimensionless variables
and, by squaring and taking into account that the gravitational term is small, $\mu/x \ll 1$,
we obtain
$$ x^4\theta'^2 = \left( 1 + x^2\theta'^2 \right) \left( 1 + \frac{2\mu}{x} \right) . \tag{7.104} $$
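$$ \theta' = \frac{1}{x\sqrt{x^2 - 1}} ; \quad \text{i.e.,} \quad \theta = \arccos\frac{R}{r} \quad \text{or} \quad r\cos\theta = R , $$
whose solution is
$$ \theta = \arccos\frac{R}{r} - \frac{\lambda r}{R\sqrt{r^2 - R^2}} \tag{7.106} $$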
$$ R = r_0 (1 - \varepsilon) \quad \text{with} \quad \varepsilon = \frac{GM}{r_0c^2} . $$
One can check that, to the same order, R is nothing but the impact parameter of the
photon (i.e. the distance between its trajectory, which is linear at long distances, and
the parallel line going through the center r = 0).
What is more interesting is the angular deflection compared with a straight line.
In the absence of the gravitational field, the photon follows a straight line, so that the
difference between the direction of arrival and the direction of departure is
$\Delta\theta_\infty^{0} = \pi$.
This direction of departure is also the (Euclidean) direction of observation of the
source that emits the photon.
In solution (7.106), in the presence of the gravitational potential, this same difference
is twice the difference $\theta(r = \infty) - \theta(r = r_0)$. By definition, $\theta(r = r_0) = 0$. For
$r \to \infty$, Eq. (7.106) gives $\theta(r = \infty) = \pm\pi/2 - \lambda/R$, according to whether it is the
initial or final direction of the photon. The difference between the direction of reception
of the photon and the geometrical direction of its source is
$\Delta\theta_\infty^{GM} = \pi - 2\lambda/R$.
In other words, one observes a deflection of the light rays compared with a straight
line, due to the gravitational potential, of
$$ \Delta\theta_\infty = \Delta\theta_\infty^{GM} - \Delta\theta_\infty^{0} = \frac{4GM}{Rc^2} , \tag{7.107} $$
where $R$ is the impact parameter or, to good approximation, the closest distance
between the photon and the center of the potential.12
For a light ray coming from a star and grazing the edge of the sun, the calculated
deflection is $1.75''$. In the case of Jupiter, it is $0.02''$.
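These two numbers follow directly from (7.107) with the impact parameter set equal to the radius of the body. The sketch below evaluates the formula; the values of $GM$ and of the radii of the sun and of Jupiter are standard figures introduced here as assumptions, and the function name is ours.

```python
import math

C = 299_792_458.0  # speed of light, m/s
ARCSEC_PER_RAD = 180.0 * 3600.0 / math.pi


def deflection_arcsec(gm, impact_parameter):
    """Deflection 4GM/(R c^2) of a light ray, Eq. (7.107), in arcseconds."""
    return 4.0 * gm / (impact_parameter * C ** 2) * ARCSEC_PER_RAD


# G*M in m^3/s^2 and radius in m for each body (assumed values).
sun = deflection_arcsec(1.32712440e20, 6.957e8)
jupiter = deflection_arcsec(1.26687e17, 7.1492e7)
print(f"grazing the sun:     {sun:.2f} arcsec")
print(f"grazing Jupiter:     {jupiter:.3f} arcsec")
```

Since the deflection varies as $1/R$, a ray passing at two solar radii from the center of the sun is deflected by half the grazing value.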
The first measurement of this effect was performed by teams led by Sir Arthur
Stanley Eddington.13 It was done at Sobral in Brazil and on the island of Principe
in the Gulf of Guinea on May 29, 1919. The experiment consisted in observing
the apparent motion of stars (seven at Sobral and five at Principe) during a total eclipse
of the sun. The results, $1.98'' \pm 0.16''$ at Sobral and $1.61'' \pm 0.31''$ at Principe, were
in agreement with Einstein’s prediction. It is most probably this experiment that
generated the public’s interest in relativity and the fame of Einstein himself.
The most precise measurement at present comes from interferometric radioastronomical
observation of radio waves coming from the source 3C 279.14 It gives the
result $1.77'' \pm 0.20''$.
The effect we have just described, the action of a gravitational field on the propa-
gation of light, shows that a vast field has now been opened in what one could call
gravitational optics.
12 It is amusing that this is exactly twice as much as the deflection that a Newtonian argument would
give using the Rutherford scattering formula.
13 F.W. Dyson, A.S. Eddington, and C. Davidson, Philos. Trans. R. Soc. London, Ser. 220 A, 291
As we shall see, the most important cosmological use of this effect is through gravi-
tational lensing of light on remote objects in the universe. This effect is due to the fact
that mass (not only mass of galaxies but also of “dark matter”) in the universe acts as
an optical instrument that can enable one to observe faraway objects and therefore
very “young” objects.
Two potentials are of particular interest. The first is that of a point-like mass M:
$$ \frac{\partial\Phi}{\partial r} = \frac{GM}{r^2} \;\Rightarrow\; \alpha = \frac{4GM}{Rc^2} . \tag{7.108} $$
The angle of deflection $\alpha$ is of the order of $|\Phi(R)|/c^2$, the gravitational potential at the
shortest distance of approach in units of $c^2$, or, equivalently, of the ratio $v_c^2/c^2$, where $v_c$
is the velocity of an object orbiting on a circle at the shortest distance of approach.
The second potential of interest gives a constant rotation velocity and is a good
approximation for extended objects such as galactic halos or clusters of galaxies:
$$ \frac{\partial\Phi}{\partial r} = \frac{v_c^2}{r} \;\Rightarrow\; \alpha = 2\pi\, \frac{v_c^2}{c^2} . \tag{7.109} $$
Here, vc is the (constant) circular velocity of objects orbiting in the galaxy or in the
cluster of galaxies.
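The deflection angles for these two potentials can be checked by a direct numerical integration. The sketch below assumes the usual weak-field expression $\alpha = (2/c^2)\int (\partial\Phi/\partial r)(b/r)\, dz$, integrated along the undeflected straight path with impact parameter $b$; this integral is not written explicitly in the text and is taken here as an assumption. The substitution $z = b\tan t$ and the midpoint rule are choices made for this example, and units with $c = 1$ are used.

```python
import math


def deflection(dphi_dr, b, c=1.0, n=20000):
    """Weak-field deflection (2/c^2) * integral of (dPhi/dr)(b/r) dz,
    with r = sqrt(b^2 + z^2), along a straight path of impact parameter b.

    The substitution z = b tan t maps the infinite path onto the interval
    (-pi/2, pi/2), which is integrated with the midpoint rule.
    """
    h = math.pi / n
    total = 0.0
    for k in range(n):
        t = -0.5 * math.pi + (k + 0.5) * h
        r = b / math.cos(t)
        dz_dt = b / math.cos(t) ** 2
        total += dphi_dr(r) * (b / r) * dz_dt * h
    return 2.0 * total / c ** 2


GM, VC, B = 1.0e-6, 1.0e-3, 1.0  # arbitrary small values in units with c = 1

alpha_point = deflection(lambda r: GM / r ** 2, B)   # point-like mass
alpha_flat = deflection(lambda r: VC ** 2 / r, B)    # flat rotation curve
print(f"point mass:          alpha / (4GM/b)    = {alpha_point / (4 * GM / B):.6f}")
print(f"flat rotation curve: alpha / (2 pi vc^2) = {alpha_flat / (2 * math.pi * VC ** 2):.6f}")
```

Under this assumption, the point-like mass gives $4GM/(bc^2)$, in agreement with (7.108), and the flat rotation curve gives a deflection $2\pi v_c^2/c^2$ that does not depend on the impact parameter.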
As shown in Fig. 7.3, the gravitational deflection can yield two images of a source.
The two images have impact parameters b1 and b2 . The potential created by a point-
like mass always gives two images because the angle of deflection diverges for small
values of the impact parameter. We will see later that one of the images is in general
much more luminous than the other.
In the case of an extended mass distribution, such as a cluster of galaxies (7.109),
one can only observe two images separated by an angle $\alpha$ if the undeflected impact
parameter satisfies $b_0 < b_{\max} = L\alpha/2$, where $L$ is the distance between the source
and the lens (which we take here to be equal to the distance between the observer
and the lens for simplicity). The reason is that if $b_0$ were larger than $L\alpha/2$, the two
images would be on the same side, which is impossible.
The large clusters correspond to $\alpha = v_c^2/c^2 \sim 10^{-5}$, and the two images can be
separated by terrestrial telescopes of resolution $\sigma_\theta \sim 3 \times 10^{-6}$.
The “cross-section” necessary for a double image to occur is $\sigma \sim \pi b_{\max}^2 =
\pi L^2 v_c^4/c^4$. This cross-section increases with $L$ because the necessary angle of
deflection, of order $b_0/L$, decreases as $L$ increases.
Fig. 7.3 Creation of two images of a source S by a gravitational potential symmetric around the
origin. The undeflected impact parameter is b0 , whereas the physical photon trajectories have impact
parameters b1 and b2 and deflection angles α1 = 2θ1 and α2 = 2θ2 . For simplicity, the source and
the observer O are assumed to be at equal distances from the origin. For clarity, the angles have
been grossly exaggerated (Courtesy of James Rich)
The probability for a given object to have two images because of the lensing effect
due to a cluster of galaxies is simply equal to the probability P that this object hides
behind a cluster. This probability is proportional to the cross-section, to the number
density n of clusters, and to the total length of the path ∼ L:
$$ P \sim 2Ln\sigma \propto L^3 . \tag{7.110} $$
The fact that this probability increases rapidly with L makes the number of double-
image quasars sensitive to the value of the undeflected impact parameter b0 .
A second practical application of deflection by clusters is that the time of flight is
not the same for both images. Quasars have an intrinsic variability and by comparing
the light curves (light flux as a function of time) one can determine the difference
Δt of the two times of flight.
The time it takes light to go from one point to another can be deduced with no
difficulty from the calculations performed in Sect. 7.5.5. We have
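$$ \frac{dr}{c\, dt} = \left( 1 - \frac{2GM}{c^2 r} \right) \left[ 1 - \frac{1 - 2GM/rc^2}{1 - 2GM/Rc^2} \left( \frac{R}{r} \right)^2 \right]^{1/2} . \tag{7.111} $$
To first order in $2GM/rc^2$ and $2GM/Rc^2$, the time interval to get from $r$ to $R$ is
$$ c\, t(r, R) = \sqrt{r^2 - R^2} + \frac{2GM}{c^2} \ln\frac{r + \sqrt{r^2 - R^2}}{R} + \frac{GM}{c^2} \sqrt{\frac{r - R}{r + R}} . \tag{7.112} $$
The first term is the obvious term in the absence of a gravitational effect.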
7.6 Gravitational Optics and Mirages 147
If we consider a mirage, the time delay is the difference between the integrals
calculated along each path. The first-order term vanishes obviously. This leaves a
“gravitational” term, which is proportional to the potential difference, and a “geo-
metric term.”
For an angle of deflection independent of the point of impact, which is approxi-
mately the case for clusters of galaxies (7.109), the geometric term vanishes, leaving
$$ \Delta t \sim 2 \int_{-\infty}^{\infty} dz\, \left[ \Phi(y_2(z)) - \Phi(y_1(z)) \right] , \tag{7.113} $$
where $y_1(z)$ and $y_2(z)$ are the photon trajectories in the two images. Consider the
nearly symmetric case $|y_1| \sim |y_2|$. Going back to Fig. 7.3, we see that in the case
where $\theta_1 \sim \theta_2 = \theta$, we have $b_1 - b_0 = b_2 + b_0$. The integral will be dominated by
the nearby region of the cluster, and we can make the approximation
The integral is simply the deflection angle 2θ given by (7.107), and the time delay is
$$ \Delta t = 4L\, 2\theta\, \frac{b_0}{L} . \tag{7.116} $$
The factor b0 /L is the angular separation between the center of the cluster and the
average position of the two images.
In order to estimate the length of the delay, we can take $b_0/L \sim \theta \sim 10^{-5}$ and
$L \sim d_{\rm hub}$ (where $d_{\rm hub}$ is the Hubble distance, $d_{\rm hub} = c/H_0 \simeq 4300$ Mpc, $H_0$ being
the Hubble constant), which gives $\Delta t \sim 1$ year.
The first historical observation of this effect was the observation in 1979 of the
“double quasar” caused by the gravitational lens Q0957+561.15 The original image
is shown in Fig. 7.4.
The two quasars have exactly the same spectrum. However, the time variation of
the signals emitted is the same except for a delay of 417 days. Once the two images
are subtracted from each other, taking this delay into account, the galaxy that acts as
a gravitational lens appears clearly.
15 D. Walsh, R.F. Carswell, and R.J. Weymann, Nature, 279, 381 (1979).
Fig. 7.4 First high resolution observations of the double quasar QSO 0957+561 A/B. Left image
of the two spots, comparable, observed directly. Center; Top spectral distribution of the two spots
(Photo credit D. Walsh); Bottom Evolution of the brightness (apparent magnitude) of the two spots
over time—the red dots come from QSO A, with a delay of 417 days compared to the green squares
of QSO B (Luis J. Goicoechea and Vyacheslav N. Shalyapin), (Photo credit: the SAO/NASA
Astrophysics Data System, Liverpool Quasar Lens Monitoring (LQLM) Programme.) Right image
obtained by subtracting from the intensity of the bottom spot (QSO 0957+561 B) the intensity and
spectral distribution of QSO A (top spot), measured 417 days later. This simple subtraction,
whose orientation is adjusted with precision, almost erases the image of the quasar itself, and reveals
the galactic cluster G1 responsible for the mirage, with a total mass of 10,000 billion solar masses
(Alan Stockton, Op. Cit.). Photo Alan Stockton
One can observe images with a multiplicity greater than two. The most spectacular
example is perhaps the Einstein cross shown in Fig. 7.5. Four images of the
quasar Q2237+0305 appear, together with the spiral galaxy that causes the mirage.
In the case of a straight alignment, one can observe an “Einstein ring,” as shown in
Fig. 7.6.
This is of course a very exceptional situation. Einstein did not believe that such a
phenomenon could ever be observed. Nevertheless, he did the calculation to please
his friend Mandl. In Fig. 7.6 the galaxy B1938+666 is “hidden” behind a nearby
galaxy. This latter galaxy does not act as a screen but, on the contrary, as a gigantic
gravitational lens. It amplifies considerably the luminosity of the first galaxy, in the
shape of a nearly circular ring caused by the fact that the sun and the two galaxies
are nearly exactly aligned.
In (7.116), the deflection angle $2\theta$ can be determined from the angular separation
of the two images. The angle $b_0/L$ is more difficult to determine because it depends
on the mass distribution in the cluster.
Once the two angles θ and b0 /L are determined, the distance L of the cluster can
be determined by measuring the value of Δt. This determination of the distance is a
very useful tool for the evaluation of Hubble’s constant (see [24]).
The last effect of gravitational lensing comes from the distortion of the image of an
extended object. The distortion along the radial and tangent directions is illustrated
in Fig. 7.7. This distortion causes the arcs that can be seen in Fig. 7.8.
This effect can be used to determine the mass of the cluster. The masses determined
by this effect can be compared with the visible masses and with the masses one
estimates with the virial theorem and the dispersion in velocities. It is a method to
evaluate the amount of dark matter in the cluster.
In this respect, the universe appears as an endless gallery of mirages.
Fig. 7.7 Gravitational lensing effect by a spherically symmetric potential centered at the origin on
the light emitted by an extended object S in the background of the sky. In this example, two images
are seen by the observer O. The right-hand panel shows a projection on the x, y plane (at the z
value of the lens) of the two images. The image one would observe in the absence of the lens is also
shown. Owing to the cylindrical symmetry, the motion of photons is planar and the two images are
therefore extended in the tangent direction. (If the object were exactly behind the lens, the image
would be a circle around the origin.) Owing to this distortion, galaxies of the background of the sky
can appear as arcs, as one can see in Fig. 7.8 (Figure courtesy of James Rich)
Fig. 7.8 In this image taken by the Hubble Space Telescope, virtually all of the bright objects
are galaxies of the giant cluster Abell 370, surrounded by arcs, giant gravitational mirages. This
galaxy cluster is so massive and compact that its gravitational field focuses the light of the galaxies
behind it. The result is images stretched over arcs, a simple focusing effect analogous to what can
be observed through a glass of water looking at the lights of a city at night. The Abell 370 cluster
is 5 billion light-years away in the constellation Cetus. The spectrum of these arcs is very
strongly shifted towards the blue compared to that of the stars in the cluster because the focused light
comes from very young and hot stars at the beginning of their evolution (Photo credits NASA,
ESA/Hubble, HST Frontier Fields)
The theory of nucleosynthesis indicates that the baryon density in the universe is
0.04 times the critical density. This leads to the idea that baryons cannot account
for all the dark matter. It is nevertheless possible that baryons are a component of
the galactic dark matter if they are in a form that does not emit light in appreciable
amounts. The simplest way this can happen is if the baryons are hidden in objects
that either do not burn (for instance, brown dwarfs) or have ceased to burn (for
instance, white dwarfs, neutron stars, or black holes). Brown dwarfs have a mass
$< 0.07\, M_\odot$, which is not sufficient to create high enough temperatures for the burning
of hydrogen. Initially, they were the preferred candidates because they completely
avoid the problems associated with background light emission or the pollution of the
interstellar medium by heavy elements caused by mass loss or supernova explosions.
The dark objects located in galactic halos are called “machos”, for “massive
compact halo objects.”
Paczyński16 has suggested that machos could be detected through the gravitational
lensing effect they induce on individual stars of the Large Magellanic Cloud (LMC)
(Fig. 7.9). This small galaxy is 50 kpc away from the Earth.
The theory of gravitational lensing was developed above. For a point-like lens, it is
simple to show that the two amplifications17 depend on the reduced impact parameter
$u = b_0/R_E$,
$$ A_\pm = \frac{u^2 + 2 \pm u\sqrt{u^2 + 4}}{2u\sqrt{u^2 + 4}} , \tag{7.117} $$
$$ R_E^2 = \frac{4GM\, Lx(1 - x)}{c^2} , \tag{7.118} $$
where $Lx$ is the distance between the observer and the lens, $L$ being the distance
between the observer and the source. We see that for $b_0 \gg R_E$, $A_+ = 1$ and $A_- = 0$,
as expected. For $b_0 \to 0$, the amplifications formally become infinite; this
actually results from the fact that a point-like source is deformed into a ring. The
divergence ceases if one takes into account the finite extension of the source, which
gives an effective extension at $b_0$.
In the case of “lensing” by stellar objects of the galactic halo, the angle between
the two images is very small (<1 milliarcsec). This type of effect is therefore called
“microlensing”. Terrestrial telescopes cannot resolve the two images because atmo-
spheric turbulence blurs the images and stellar objects have an angular dimension of
the order of 1 arcsec. Therefore, the only observable effect is a temporary amplifica-
tion of the total intensity when the macho comes close to the line of sight, and then
recedes from it. The amplification is
$$ A = \frac{u^2 + 2}{u\sqrt{u^2 + 4}} , \tag{7.119} $$
where $u$ is the closest distance to the (undeflected) line of sight that the deflector
reaches in units of “Einstein’s radius,” $R_E = \sqrt{4GM\, Lx(1 - x)/c^2}$, where $L$ is the
distance between the source and the observer, $Lx$ is the distance between the observer
and the deflector, and $M$ is the mass of the macho.
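The amplification formulas (7.117) and (7.119) are simple to evaluate. The sketch below computes the two amplifications and their sum, and builds a light curve under the assumption, introduced here for the example, that the deflector moves uniformly on a straight line so that $u(t) = \sqrt{u_{\min}^2 + (t/\Delta t)^2}$, with $t$ measured from the time of closest approach. The function names and this parametrization are ours.

```python
import math


def amplification_pair(u):
    """Amplifications (A_plus, A_minus) of the two images, Eq. (7.117)."""
    root = math.sqrt(u * u + 4.0)
    a_plus = (u * u + 2.0 + u * root) / (2.0 * u * root)
    a_minus = (u * u + 2.0 - u * root) / (2.0 * u * root)
    return a_plus, a_minus


def total_amplification(u):
    """Total amplification of the unresolved images, Eq. (7.119)."""
    return (u * u + 2.0) / (u * math.sqrt(u * u + 4.0))


def light_curve(t, u_min, timescale):
    """Amplification at time t for uniform rectilinear relative motion
    (assumed): u(t) = sqrt(u_min^2 + (t / timescale)^2)."""
    return total_amplification(math.hypot(u_min, t / timescale))


for u_min in (0.5, 0.7, 1.0, 1.5):
    peak = light_curve(0.0, u_min, 1.0)
    print(f"closest approach u = {u_min}: peak amplification = {peak:.3f}")
```

For a closest approach of exactly one Einstein radius, the peak amplification is $3/\sqrt{5}$. The difference $A_+ - A_-$ is equal to 1 for every $u$, and the light curve is symmetric in time by construction.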
The amplification is larger than 1.34 when the distance to the line of sight is less
than RE . This amplification corresponds to an acceptable observational threshold
since photometry can be performed quite easily to better than 10% accuracy. At a
given moment, the probability P for a star to be amplified by a factor 1.34 is equal
to the probability that its undeflected light path passes within one Einstein radius of
the macho,
$$ P \sim n_{\rm macho}\, L\, \pi R_E^2 , \tag{7.120} $$
where $n_{\rm macho}$ is the average number density of machos lying between the LMC and
us, and $L$ is the distance of the LMC. The macho density is $n_{\rm macho} \sim M_{\rm halo}/(M L^3)$,
where $M_{\rm halo}$ is the total mass of the halo up to the position of the LMC. Using the
expression of the Einstein radius, one finds that $P$ does not depend on the mass $M$
but is determined only by the velocity of the LMC:
$$ P \sim \frac{GM_{\rm halo}}{Lc^2} \sim \frac{v_{\rm LMC}^2}{c^2} . \tag{7.121} $$
Fig. 7.10 Microlensing curves for a point-like source. The curves correspond to four values of the
closest approach distance (0.5, 0.7, 1.0, and 1.5 times the Einstein radius). The duration timescale
Δt, which depends on the mass, is normalized according to (7.122). (Courtesy of James Rich.)
The LMC is believed to orbit around the Milky Way with a velocity of $v_{\rm LMC} \sim
200$ km/s. (This corresponds to a flat rotation curve up to the LMC.) In that case, $P$
is of the order of $10^{-6}$. More refined calculations give $P = 0.5 \times 10^{-6}$ [24].
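The order of magnitude quoted here is simply the square of $v_{\rm LMC}/c$, as in (7.121). A minimal sketch of this arithmetic, in which the function name is ours and all numerical factors of order one are ignored:

```python
C_KM_S = 299_792.458  # speed of light, km/s


def lensing_probability_estimate(v_km_s):
    """Order-of-magnitude estimate P ~ v^2/c^2 of Eq. (7.121)."""
    return (v_km_s / C_KM_S) ** 2


p_lmc = lensing_probability_estimate(200.0)
print(f"P ~ (v_LMC / c)^2 = {p_lmc:.1e}")
```

At any given moment, only about one star of the LMC in a million is therefore expected to be amplified, which is why such searches must monitor millions of stars.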
Since the observer, the star, and the deflector are in relative motion with respect
to one another, a noticeable amplification lasts only as long as the non-deflected ray
remains within one Einstein radius. The resulting light curve, which is achromatic and
symmetric, is represented in Fig. 7.10 for a variety of values of the impact parameter.
The timescale of the amplification is the time Δt that it takes the object to cross one
Einstein radius between the observer and the source. For the lensing of stars of the
LMC and deflectors of our halo, the relative velocities are of the order of 200 km s$^{-1}$
and the position of the deflector is roughly halfway between the observer and the
source ($x \sim 0.5$). The average $\Delta t$ is then
$$ \Delta t \sim \frac{R_E}{200\ \mathrm{km/s}} \sim 75\ \text{days}\, \sqrt{\frac{M}{M_\odot}} . \tag{7.122} $$
The duration distribution can therefore be used to estimate the mass of machos,
assuming they are in the galactic halo (and not in the LMC).
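The scaling of the duration with the mass can be made explicit with a short numerical sketch based on (7.118) and (7.122). The distance of 50 kpc, the value $x = 0.5$, and the velocity of 200 km/s are those quoted in the text; the value of $GM$ for the sun, the conversion of the kiloparsec, and the function names are assumptions of this example. The result is only an estimate, since the prefactor depends on the values adopted for $L$, $x$, and the velocity.

```python
import math

C = 299_792_458.0        # speed of light, m/s
GM_SUN = 1.32712440e20   # G times the solar mass, m^3/s^2 (assumed value)
KPC = 3.0856776e19       # one kiloparsec in metres
DAY = 86400.0            # seconds


def einstein_radius(mass_solar, source_distance, x):
    """Einstein radius sqrt(4 G M L x (1 - x)) / c in metres, Eq. (7.118)."""
    return math.sqrt(4.0 * GM_SUN * mass_solar * source_distance * x * (1.0 - x)) / C


def crossing_time_days(mass_solar, source_distance=50.0 * KPC, x=0.5, speed=2.0e5):
    """Time needed to cross one Einstein radius at the given speed (m/s)."""
    return einstein_radius(mass_solar, source_distance, x) / speed / DAY


for mass in (1.0e-7, 0.07, 0.4, 1.0):
    print(f"M = {mass:g} solar masses: Delta t ~ {crossing_time_days(mass):.3g} days")
```

For one solar mass, this estimate is of the same order as the 75 days of (7.122), and the duration grows as the square root of the mass, which is what allows the distribution of durations to constrain the mass of the deflectors.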
Two experimental groups, the MACHO collaboration and the EROS collaboration,
have published results of searches for events in the directions of the LMC and the
SMC (the nearby Small Magellanic Cloud). The absence of events lasting less than
15 days allowed both groups to exclude objects of masses in the interval $10^{-7} M_\odot <
M < 10^{-1} M_\odot$ as the main component of the halo.18 These limits exclude brown
dwarfs of masses $\sim 0.07\, M_\odot$ as the major components of the halo.
Fig. 7.11 Gravitational microlensing event in the EROS experiment. The upper picture is in the
blue part of the optical spectrum and the lower picture is in the red part. The phenomenon is
symmetric and achromatic, as expected (Courtesy of James Rich)
One important aspect of the phenomenon is that the amplification should be sym-
metric and achromatic. Figure 7.11 shows an event recorded by the EROS experiment
that possesses both properties.
The MACHO collaboration, however, observed events of a duration of $\sim 50$
days.19 If these are interpreted as originating from dark lenses of the galactic halo,
this rate corresponds to a fraction $f = 0.2$ of machos contributing to the total mass
of the halo. The timescale corresponds to objects of mass $\sim 0.4\, M_\odot$.
EROS has only published upper bounds on the fraction of the halo components
made of machos.20
The results of the two experiments show that it is unlikely that the halo is made up
predominantly of objects of the order of stellar masses. The present challenge is to
prove that the events observed by the MACHO collaboration are caused by lensing
by objects of the halo and not, for instance, lensing objects in the clouds themselves.
If that is the case, the mass estimation implies that they correspond to very old white
dwarfs, perhaps the oldest stars.
The information on the localization of the lenses (in the galactic halo or in the
Magellanic Clouds) is difficult to obtain. The simplest case is that of events with a
very large amplification, in particular the events due to binary lenses. In such events,
the light curve is modified in a way that depends on the relative distance of the lens
and the source stars.21 It is also possible to obtain information on the distance of
lenses in events of very long duration when the motion of the Earth around the sun
modifies the light curve.22 In the future, it will also be possible to resolve the two
microlensing images with interferometric space telescopes. Such measurements will
give enough information to determine the distances of the lensing objects and to
draw a definite conclusion.
The search for dark objects by microlensing is under way in the Andromeda
Nebula, the spiral galaxy M31, which is close to our galaxy.23
Experimental results on these problems and on general relativity can be found in
the book by James Rich [24].
7.7 Exercises
7.1. Geodesics
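$$ ds^2 = \frac{R^2}{\rho^2 - R^2}\, d\rho^2 + \rho^2 d\theta^2 + \rho^2\sin^2\theta\, d\phi^2 , \tag{7.123} $$
$$ ds^2 = dx^2 + dy^2 + dz^2 - dw^2 , \tag{7.124} $$
$$ w^2 = x^2 + y^2 + z^2 - R^2 = \rho^2 - R^2 . \tag{7.125} $$
$$ ds^2 = \frac{R^2}{\rho^2 + R^2}\, d\rho^2 + \rho^2 d\theta^2 + \rho^2\sin^2\theta\, d\phi^2 , \tag{7.126} $$
$$ ds^2 = dx^2 + dy^2 + dz^2 - dw^2 , \tag{7.127} $$
$$ w^2 = x^2 + y^2 + z^2 + R^2 = \rho^2 + R^2 . \tag{7.128} $$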
$$ \frac{d}{dt} \left( \rho^2\dot\phi \right) = 0 \;\Longrightarrow\; \dot\phi = \frac{A}{\rho^2} \tag{7.131} $$
Prove the conservation laws of the problem which bring simplifications to the
motion:
$$ \frac{d}{dt} \left( \rho^2\dot\phi \right) = 0 \;\Longrightarrow\; \dot\phi = \frac{A}{\rho^2} , \tag{7.134} $$
$$ A^2 \le \frac{2R^2E}{m} , \tag{7.136} $$
which is a direct consequence of the fact that the energy is greater than the rotational
energy $mA^2/2\rho^2$. This is a consequence of (7.135); i.e., $E \ge mA^2/2\rho^2 \ge
mA^2/2R^2$.
The Eqs. (7.134) and (7.135) are first-order differential equations that determine
the motion in terms of the constants of the motion E and A.
$$ \omega^2 = \frac{2E}{mR^2} \quad \text{and} \quad \gamma^2 = \frac{mA^2}{2ER^2} . \tag{7.137} $$
From (7.136), we have the inequality
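$$ \gamma^2 \le 1 . \tag{7.138} $$
Setting
$$ \rho = R\cos(\omega\psi) ; \quad \text{i.e.,} \quad \dot\rho = -\omega\dot\psi R\sin(\omega\psi) . \tag{7.139} $$
$$ \omega^2 = \omega^2\dot\psi^2 + \frac{\omega^2\gamma^2}{\cos^2(\omega\psi)} ; \tag{7.140} $$
i.e.,
$$ \omega^2\dot\psi^2\cos^2(\omega\psi) = \omega^2 \left( \cos^2(\omega\psi) - \gamma^2 \right) . \tag{7.141} $$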
and to the result, i.e., the expressions of $\rho(t)$ and $\tan(\phi(t) - \phi_0)$ as well as the
frequency $\omega$.
$$ \rho(t) = R\sqrt{\cos^2\omega(t - t_0) + \gamma^2\sin^2\omega(t - t_0)} , \tag{7.145} $$
which is periodic and of frequency ω. The calculation of the time evolution of the
azimuthal angle φ(t) is obtained by this expression and Eq. (7.134),
$$\dot\phi = \frac{A}{R^2\left(\cos^2\omega(t - t_0) + \gamma^2\sin^2\omega(t - t_0)\right)}; \tag{7.146}$$
7.8 Problem. Motion on the Sphere S3 159
i.e.,
tan(φ(t) − φ0 ) = γ tan ω(t − t0 ), (7.147)
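The solution (7.145) and (7.147) can be checked numerically against the conservation law (7.134). The following is only a sketch: the values of m, R, E and A are hypothetical and chosen for illustration, and the derivative of φ is estimated by a finite difference rather than computed analytically.

```python
import math

# Hypothetical parameter values, chosen only for illustration
m, R = 1.0, 1.0
E, A = 0.5, 0.6            # must satisfy A**2 <= 2*R**2*E/m, Eq. (7.136)
t0, phi0 = 0.0, 0.0

omega = math.sqrt(2 * E / (m * R**2))          # Eq. (7.137)
gamma = math.sqrt(m * A**2 / (2 * E * R**2))   # Eq. (7.137)

def rho(t):
    """Radial coordinate, Eq. (7.145)."""
    c = math.cos(omega * (t - t0))
    s = math.sin(omega * (t - t0))
    return R * math.sqrt(c**2 + gamma**2 * s**2)

def phi(t):
    """Azimuthal angle, Eq. (7.147); valid while |omega (t - t0)| < pi/2."""
    return phi0 + math.atan(gamma * math.tan(omega * (t - t0)))

def phi_dot(t, dt=1e-6):
    """Central finite-difference estimate of d(phi)/dt."""
    return (phi(t + dt) - phi(t - dt)) / (2 * dt)

# rho^2 * phi_dot should reproduce the constant A at every time
for t in (0.1, 0.5, 1.0):
    print(t, rho(t)**2 * phi_dot(t), A)
```

The printed second and third columns should coincide up to the finite-difference error, which is the content of (7.134).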
$$E = \frac{1}{2}m(\dot x^2 + \dot y^2) + V,$$
where the “effective potential” V is energy dependent:
$$V = \frac{1}{2}m\,\frac{\rho^2\dot\rho^2}{R^2 - \rho^2} = \frac{E\rho^2}{R^2}.$$
$$\omega^2 = \frac{2E}{mR^2}.$$
4. Of course, although the square of the velocity is a constant in the curved
four-dimensional space, this is no longer the case when one visualizes the phenomenon in
a Euclidean plane, as above.
5. The simplicity of the result is intuitive. Quite obviously, as one can see in the
definition (7.3), the symmetry of the problem is much larger than the sole rotation
in R3 . There is a rotation invariance in R4 . The solutions of maximal symmetry
correspond to a uniform motion on a circle of radius R in a plane whose orienta-
tion is arbitrary in R4 . The whole set of solutions is obtained by projecting these
particular solutions on planes of R3 , which leads to the elliptic trajectories that
we have found.
$$p_\rho = m\dot\rho\,\frac{R^2}{R^2 - \rho^2}, \qquad p_\theta = m\rho^2\dot\theta, \qquad p_\phi = m\rho^2\sin^2\theta\,\dot\phi, \tag{7.148}$$
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
J.-L. Basdevant, Variational Principles in Physics,
https://doi.org/10.1007/978-3-031-21692-3_8
162 8 Gravitational Waves
$$G_{\mu\nu} \equiv R_{\mu\nu} - \frac{1}{2}g_{\mu\nu}R = \frac{8\pi G}{c^4}T_{\mu\nu}, \tag{8.1}$$
G is the gravitational constant and c is the velocity of light in the vacuum, gμν is the
metric tensor (here we adopt the (−, +, +, +) signature).
The Tμν energy-momentum tensor (also called stress-energy tensor) generalizes
a simple point mass distribution. Its components consist of:
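$$\begin{pmatrix} T_{00} & T_{0j} \\ T_{i0} & T_{ij} \end{pmatrix} = \begin{pmatrix} \text{energy density}/c^2 & \text{flux of energy density} \\ \text{momentum density} & \text{flux of momentum} \end{pmatrix}.$$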
– The Ricci tensor $R_{\mu\nu}$ is obtained by contracting two indices of the Riemann
tensor: $R_{\mu\nu} \equiv R^{e}{}_{\mu e\nu}$. R is the corresponding curvature scalar: $R = g^{\mu\nu}R_{\mu\nu}$.
Variational Formulation
Einstein’s equations can be obtained with a variational principle and the Einstein-
Hilbert action S E H which describes the dynamics of the gravitational field:
$$S_{EH} = \frac{c^4}{16\pi G}\int d^4x\, R\,\sqrt{|g|}, \tag{8.3}$$
where R is the Ricci scalar curvature and g is the determinant of the metric tensor.
The constant (c4 /16πG) is chosen so that one can simply add to it the actions
that describe the matter to get the Einstein equations.
This (important) discovery is due quasi-simultaneously to Hilbert and Einstein.1
This topic is detailed by Sean Carroll.2 Simple presentations can be found in
references [17] (annex B), [18] page 438, and in Feynman’s book.3
1 One can find the details and passions that it revealed in the article by Dieter W. Ebner,
arXiv:physics/0610154v1.
2 Carroll, Sean M. (2004), Spacetime and Geometry: An Introduction to General Relativity, San
Francisco: Addison-Wesley.
3 Richard P. Feynman, Feynman Lectures on Gravitation, Addison-Wesley, (1995).
8.1 Evolution of Space-Time 163
Einstein’s equations are therefore a set of ten nonlinear partial differential equations,
which are very difficult to solve completely. Here we take advantage of the fact that
gravitation is an extremely weak interaction, and in many cases it can be treated in
a perturbative way. Of course this is the dominant interaction of the vast majority
of astrophysical phenomena, but most manifest themselves in such a way that their
local description by Newtonian mechanics, in Minkowski space, is excellent. Note
that electromagnetism and the physics of electromagnetic radiation, of indisputable
importance in the cosmos, are in conformity with general relativity because this the-
ory is, by construction, Lorentz invariant and that the mass of the photon is zero (see
[17] Sect. 2.2). Let us point out that the calculations we have made previously come
on the one hand from what Schwarzschild found in 1916, a particular solution of the
Einstein equations in the vicinity of a large mass and, on the other hand, that at great
distance from such a mass, the objects that populate the universe follow Newton’s
law in a Minkowski space.
Indeed, the circumstances in which the generality of Einstein’s equations is
unavoidable are reduced to the presence of ultra-dense clusters. One can obviously
think of the origin of the Big Bang, but as early as recombination, 380,000
years later, the Newtonian approximation becomes quite suitable. The real objects
for which one cannot escape a complete and difficult theory are the black holes
themselves in their own structure.
It is useful to analyze the Schwarzschild formula (7.66) and how it was useful to
us. As we have said, the correction due to the curvature of space-time at the perihelion of
Mercury can be calculated perturbatively because the effect is small. We could not
have done this type of calculation at distances of the order of the Schwarzschild radius
of the source. This metric has a singularity at the Schwarzschild radius $r = r_S$.
We obviously cannot calculate the deviation of light rays, nor any electromagnetic
(or gravitational) phenomena, in the $r \le r_S$ region: the physics inside a black hole or
in its vicinity, as in Fig. 8.1,4 is entirely in the realm of general relativity.
It remains that the question of the evolution of the curvature of space-time is
inevitable. How does it occur, and especially is it detectable in another way than
by observing the motion of the stars? How does it spread from one star to another?
Let us recall that Newton himself was troubled by the apparent instantaneity of the
action of the sun’s gravity on Earth. Halley’s results on the comet that bears his name
seemed to confirm that this gravitational field was acting instantaneously at all times.
Poincaré had already noticed in 1905 that special relativity suggested the existence
of “gravific waves”5 which would propagate.
4 Kazunori Akiyama, Antxon Alberdi et al. First M87 Event Horizon Telescope Results I, The
shadow of the supermassive Black Hole, The Astrophysical Journal, vol.875, 1, 2019, p. L1; First
M87 Event Horizon Telescope results VIII: magnetic field structure near the event horizon. Astro-
physical Journal Letters. Published online March 24, 2021.
5 H. Poincaré, On the dynamics of the electron, Proceedings of the Académie des Sciences, t. 140,
Fig. 8.1 Left: First image of a black hole, published by the Event Horizon Telescope collaboration
in 2019. This is the supermassive black hole M87∗ at the heart of the galaxy M87. The image
shows the shadow of the event horizon of the black hole M87∗ in contrast with its accretion disk
and what is called its photon sphere. Right: a more recent image, from Event Horizon Telescope data of March 2021,
reveals twisted magnetic fields around the black hole M87∗. Credit ESO, Event Horizon Telescope
Collaboration
Within the framework of general relativity, Einstein was led to predict in 1916
the existence of gravitational waves, produced by these evolutions of the curvature
of space-time. These waves were to be emitted in the evolution of the curvature,
in a manner similar to the emission of electromagnetic waves that results from the
motion of electrically charged objects. The subject remained for a long time a source
of speculation (even from Einstein at the beginning) until the spectacular response
given by the detection of these waves which interest us here.
The first gravitational waves were observed experimentally a century after Einstein’s
prediction, during a direct detection of these waves on September 14th, 2015, con-
firmed on February 11th, 2016, by the detectors and physicists of the LIGO labo-
ratory.6 The emission, named GW150914,7 came from the rapid spiral rotation of
two black holes, located at 400 Mpc, or 1.3 billion light-years, of respective masses
6 At the time of the event, the Virgo gravitational wave detector was shut down for an upgrade
of its equipment, but all its researchers worked on the data. The LIGO detector was on standby
and was not supposed to resume its observations until September 18th; only four officials were
aware of the reception of signals at the time it occurred. These four managers obeyed the working
charter of the collaboration (more than 1000 physicists) and only announced it to the rest
of the collaboration four days later, for all the data processing work. Full illustrated details of
the collaboration between the LIGO and Virgo laboratories can be found at http://www.ligo.org and
http://www.virgo.infn.it.
7 Abbreviation of “Gravitational Wave”, September 14th, 2015; Abbott et al., Observation of Grav-
itational Waves from a Binary Black Hole Merger, Physical Review Letters, American Physical
Society, vol. 116, no 061102, February 2016. This 12-page article, of very unusual length for Physical Review
Letters, contains 5 pages of names of physicists and institutions of the collaboration. The
article itself contains all the information about the theoretical design, the experimental
realization, and the results of this detection, with, of course, all the useful references.
8.2 Detection of Gravitational Waves 165
8 This amount of energy, which comes from a subtraction done by a child between the masses
expressed in solar masses, is totally unimaginable in the solar system.
9 “The most important is the proof of the existence of black holes”, on lemonde.fr, February 11,
2016 (https://www.lemonde.fr/cosmos/article/2016/02/11).
10 Full details of the results from the LIGO and Virgo laboratories can be found at
Fig. 8.2 The GW150914 gravitational event observed by the Hanford (H1 left panel) and Livingston
(L1, right panel) LIGO detectors. The times shown horizontally correspond to September 14, 2015,
at 09:50:45 UTC. For easy reading, all time series are filtered with a 35 to 350 Hz band-pass filter
to suppress large fluctuations outside the most sensitive frequency band of the detectors. Vertically,
the amplitude h of the strain in units of $10^{-21}$. Top row: left amplitude in H1, right amplitude in L1.
GW150914 arrived first in L1 and 6.9 ms later in H1; For ease of visual comparison, H1 data is also
shown, right with L1 data, offset in time by this amount and with an opposite sign (to take into account
the relative orientation of the detectors). Middle Row: Gravitational strain projected on each detector
in the 25–350 Hz band. The lines come from an optimized relativistic numerical calculation of the
waveform with parameters compatible with those obtained by GW150914, confirmed at 99.9% by an
independent calculation. Shaded areas are credible at 90% for two independent reconstructions of the
waveform. One (in dark grey) models the signal using binary waveforms of black hole models. The
other (light grey) does not use an astrophysical model, but rather calculates the deformation signal
as a linear combination of Gaussian sinusoidal wavelets. These reconstructions have an overlap of
90%. These data treatments can be found in B.P. Abbott et al., Phys. Rev. Lett. 116, 241102 (2016).
Bottom row: Time dependence of the frequency, and the intensity, of the strain signal shown above;
we see a growth in time before coalescence, qualitatively in accordance with what we calculate
below. As the coalescence approaches, the gravitational signal has a characteristic behaviour: both
its frequency and its amplitude increase. (On the left Hanford, on the right, Livingston.) (Photo
credit B.P. Abbott et al., Phys. Rev. Lett. 116, 241102 (2016))
8.3 Gravitational Waves 167
We do not wish to carry out here a complete theoretical study of gravitational waves.
We will refer to specialized literature, but many of the key qualitative and quantitative
aspects are easy to understand. We are interested in the nature of these waves, the
orders of magnitude involved, the mechanism of their emission and their interaction
with matter.
In the following, we will explain the nature and origin of the amplitude of the
waves in these data, including its cause, its value, its evolution and its observation
during the interaction of gravitational waves with the devices of the detectors.
This will allow us to understand the physics of the double pulsar, discovered in
1974 and analyzed by Taylor and Hulse who were awarded the Nobel Prize in 1993.
$$L \sim \frac{G}{c^5}\, s^2\omega^6 M^2 R^4$$
where M and R are the mass and radius of the system, ω its pulsation, s its asym-
metry factor (a spherical mass has a zero asymmetry s = 0, a dumbbell system an
asymmetry s = 1), G is the Newton constant and c the speed of light.
With everyday objects, this effect is undetectable. But the expression can be
reformulated with quantities better suited to the systems that actually emit detectable
waves, namely the speed $v = \omega R$ and the Schwarzschild radius $R_S = 2GM/c^2$. The same
expression is then rewritten as
$$L \sim \frac{c^5}{G}\, s^2 \left(\frac{v}{c}\right)^6 \Xi^2 \tag{8.4}$$
where Ξ = R S /R is called the system’s “relativistic compactness parameter”.
$$L_P = \frac{c^5}{G} \approx 3.6\times10^{52}\ \text{watt}. \tag{8.5}$$
This considerable maximum value, of the order of $10^{26}$ times that of the Sun, is
effectively reached by black holes. We will show it in Sect. 3.7. It takes place only
during the brief time of the fusion of two black holes, a fraction of a second. It is
one of those mythical numbers of general relativity, like the Planck force $F_P = c^4/4G$,
the coefficient of passage from the Einstein tensor to the energy-momentum tensor
in Eq. (8.1). But it is now accessible to our intuitive perception.
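The orders of magnitude in (8.4) and (8.5) are easy to reproduce. The sketch below evaluates c⁵/G and compares it with the Sun; the solar luminosity and the "human-scale rotor" parameters are assumed values introduced only for illustration, not data from the text.

```python
G = 6.674e-11        # m^3 kg^-1 s^-2
c = 2.998e8          # m/s
L_SUN = 3.828e26     # W, nominal solar luminosity (assumed value)

L_planck = c**5 / G  # Eq. (8.5)
ratio = L_planck / L_SUN

def luminosity(s, v, compactness):
    """Order-of-magnitude estimate of Eq. (8.4): (c^5/G) s^2 (v/c)^6 Xi^2."""
    return L_planck * s**2 * (v / c)**6 * compactness**2

print(f"c^5/G = {L_planck:.2e} W, i.e. {ratio:.1e} solar luminosities")

# Hypothetical human-scale rotor: v = 100 m/s, R_S/R ~ 1e-25 (assumed)
print(f"rotor: {luminosity(1.0, 100.0, 1e-25):.1e} W")
```

The contrast between the two printed luminosities illustrates why only compact, relativistic systems (Ξ ∼ 1, v ∼ c) are detectable sources.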
What is perhaps more astonishing is to see how our intuition is troubled by the
arrival of the third fundamental constant, the Planck constant $\hbar$. The latter, present
in all the usual effects of the other three fundamental interactions, such as matter at
our scale, leads to a Planck length $\ell_P$ and a Planck time $t_P$ of
$$\ell_P = \sqrt{\frac{\hbar G}{c^3}} \sim 1.6\times10^{-35}\ \mathrm{m}, \qquad t_P = \sqrt{\frac{\hbar G}{c^5}} \sim 5.4\times10^{-44}\ \mathrm{s}.$$
• The first is that this physical process, simple to describe and understand, is the
same one that Einstein had imagined. The quadrupole formula obtained by
him in 1916 links the amplitude h of the wave emitted by a physical system to
the variation of its quadrupole moment Q (recall that the gravitational field, being only
attractive, has no dipole moment):
$$h = \frac{2G}{c^4 r}\,\ddot Q(t - r/c), \tag{8.6}$$
We will see that the parameter h of (8.6) determines the gravitational deforma-
tion (tidal effect) that the passage of the wave causes on the distance between two
points (in free fall) of the receiver. It is this parameter of deformation which is mea-
sured and shown in Fig. 8.2. We will show in Sect. 3.5 how the theory predicts that.
The smallness of the factor $2G/c^4 \approx 1.65\times10^{-44}\ \mathrm{m^{-1}\,kg^{-1}\,s^2}$, cause of the smallness
of h, illustrates what is called the “stiffness” of space-time. It must be compensated
by large variations of the quadrupole moment in order to produce detectable effects
(any attempt on a human scale, such as gigantic wagons on a merry-go-round,
would give tiny results).
• On the other hand, concerning the source of the wave, we will examine below a
binary system of massive stars, typically black holes or pulsars in relative motion,
in the Newtonian approximation which gives an extremely precise result. We will
see at the end of this chapter the amazing observation of Taylor and Hulse that led
them to the discovery of the double Pulsar PSR B1913+16 where they measured the
loss of rotational energy of two orbiting pulsars with an accuracy that is consistent
with the Newtonian calculation of motion, and the prediction of general relativity
for the loss of energy by the corresponding gravitational radiation.
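To make the "stiffness" of space-time in (8.6) concrete, the following sketch evaluates the prefactor 2G/c⁴ and the strain produced by a laboratory-scale rotor. All rotor parameters (mass, radius, angular velocity, observer distance) are hypothetical, and the second derivative of Q is only estimated by its rough scale MR²ω².

```python
G = 6.674e-11   # m^3 kg^-1 s^-2
c = 2.998e8     # m/s

stiffness = 2 * G / c**4          # prefactor of Eq. (8.6), in s^2 kg^-1 m^-1

def strain(q_ddot, r):
    """Amplitude h from Eq. (8.6) for a given second derivative of Q (SI units)."""
    return stiffness * q_ddot / r

# Hypothetical laboratory rotor (all values assumed for illustration)
M, R, omega = 1.0e4, 10.0, 10.0   # kg, m, rad/s
q_ddot = M * R**2 * omega**2      # rough scale of the quadrupole variation
r = 1.0e8                         # m, observer roughly one wavelength away

print(f"2G/c^4 = {stiffness:.3e}")
print(f"h ~ {strain(q_ddot, r):.1e}")
```

The resulting strain is more than twenty orders of magnitude below the 10⁻²¹ level of Fig. 8.2, which is the sense in which any attempt on a human scale gives tiny results.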
where ημν is the Minkowski metric and h μν is dimensionless. (We follow the method
used in [17], Chap. 7.)
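$${}^{(1)}\Gamma^{\lambda}_{\mu\nu} = \frac{1}{2}\eta^{\lambda\sigma}\left(\partial_\mu h_{\sigma\nu} + \partial_\nu h_{\mu\sigma} - \partial_\sigma h_{\mu\nu}\right) \tag{8.9}$$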
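$${}^{(1)}R^{\rho}{}_{\lambda\mu\nu} = \partial_\lambda\,{}^{(1)}\Gamma^{\rho}_{\mu\nu} - \partial_\mu\,{}^{(1)}\Gamma^{\rho}_{\lambda\nu}. \tag{8.10}$$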
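$${}^{(1)}R_{\mu\nu} = \partial^\sigma\partial_{(\mu}h_{\nu)\sigma} - \frac{1}{2}\partial^\sigma\partial_\sigma h_{\mu\nu} - \frac{1}{2}\eta^{\rho\sigma}\partial_\mu\partial_\nu h_{\rho\sigma}, \tag{8.11}$$
where indices in parentheses are symmetrized: $\partial_{(\mu}h_{\nu)\sigma} = \frac{1}{2}(\partial_\mu h_{\nu\sigma} + \partial_\nu h_{\mu\sigma})$.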
Defining
$$\bar h_{\mu\nu} = h_{\mu\nu} - \frac{1}{2}h\,\eta_{\mu\nu} \quad\text{with}\quad h \equiv \eta^{\rho\sigma}h_{\rho\sigma}, \tag{8.12}$$
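Einstein’s equation at that order becomes
$$\partial^\sigma\partial_\sigma\bar h_{\mu\nu} + \eta_{\mu\nu}\partial^\rho\partial^\sigma\bar h_{\rho\sigma} - \partial_\nu\partial^\rho\bar h_{\rho\mu} - \partial_\mu\partial^\rho\bar h_{\rho\nu} = -2\,\frac{8\pi G}{c^4}\,T_{\mu\nu}. \tag{8.13}$$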
Choice of Gauge
$$h'_{\mu\nu} = h_{\mu\nu} - \partial_\mu\xi_\nu - \partial_\nu\xi_\mu.$$
The amplitude must meet the above gauge conditions. Consider a wave propagating
in the direction of Oz. Among the ten components of the symmetric tensor h μν a
suitable choice of the four functions ξ μ (x) above, cancels 4 combinations of the h μν
in the vacuum:
$$h = \eta^{\mu\nu}h_{\mu\nu} = 0, \qquad h_{0i} = 0 \tag{8.16}$$
This defines what is called the TT gauge (Transverse and Traceless). Another similar
set of considerations (see [17] chap.7 §3) shows that we are reduced to four non-zero
components. In this case (propagation along Oz), these components are $H_{xx}$,
$H_{yy} = -H_{xx}$, and $H_{xy} = H_{yx}$.
The wave is transverse and characterized by two polarizations.
The mode Hx x is called polarization + and the mode Hx y is called
polarization ×.
Consider a free test particle initially at rest. In the TT gauge, this particle will travel
on a geodesic determined by the metric
$$L = \sqrt{g_{ij}\,\Delta x^i \Delta x^j} \quad\text{where}\quad \Delta x^i = x_B^i - x_A^i. \tag{8.19}$$
The relative variation in distance between A and B at the passage of the gravita-
tional wave is therefore
$$\frac{\delta L}{L_0} = \frac{1}{2}\,h^{TT}_{ij}\,n^i n^j \tag{8.20}$$
where n i is the unit vector (δi j n i n j = 1) such that Δx i = L 0 n i before the wave
passes.
To visualize the effect of the gravitational wave, it is convenient to introduce the
coordinates
$$X^i = x^i + \frac{1}{2}\,h^{TT}_{ij}(t, 0)\,x^j$$
that will fluctuate as the wave passes. We see in this expression how the amplitude
h enters the order of magnitude of the effect to be measured. In current
detections, this amplitude is of the order of $h \sim 10^{-21}$ (Fig. 8.2), a value that we
will calculate on a model. This deformation effect (a tidal effect) is of the same order
of magnitude as the size of an atom compared to the Earth-Sun distance.
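As a rough illustration of (8.20), the sketch below converts a strain h ∼ 10⁻²¹ into an arm-length change for kilometre-scale detectors, and evaluates the atom-to-astronomical-unit ratio quoted above. The arm lengths are those of the LIGO and Virgo interferometers; the atomic size is an assumed round value, and the arm is assumed to lie along a direction where the projected amplitude equals h.

```python
def arm_length_change(h, arm_length):
    """delta L = (1/2) h L0, i.e. Eq. (8.20) for a unit vector n along the arm
    and a wave whose projected amplitude h_ij n^i n^j equals h."""
    return 0.5 * h * arm_length

h = 1e-21
for name, L0 in (("LIGO", 4.0e3), ("Virgo", 3.0e3)):   # arm lengths in metres
    print(name, arm_length_change(h, L0), "m")

# Comparison quoted in the text: atom size / Earth-Sun distance
ATOM = 1e-10          # m, typical atomic size (assumed round value)
AU = 1.496e11         # m, astronomical unit
print("atom/AU =", ATOM / AU)
```

The displacement to be measured is thus of the order of 10⁻¹⁸ m, far smaller than an atomic nucleus.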
Consider a monochromatic wave propagating along Oz. The wave corresponding
to the + polarization is of the form
A set of test particles, all in free fall, located on a circle of radius a in the plane z = 0
undergoes a variable distortion in time according to
$$X = a\left[\cos\theta + \frac{1}{2}h_+\sin(\omega t)\cos\theta\right], \qquad Y = a\left[\sin\theta - \frac{1}{2}h_+\sin(\omega t)\sin\theta\right] \tag{8.22}$$
For the polarization ×, the gravitational wave is of the form
Fig. 8.3 Deformation, at successive moments, of a circle of test free particles at the passage of a
gravitational wave. Top: polarization +, bottom: polarization ×
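The deformation of the ring of test particles can be sketched numerically from the coordinates $X^i = x^i + \frac12 h^{TT}_{ij}x^j$, taking $h_{xx} = -h_{yy} = h_+\sin\omega t$ and $h_{xy} = h_{yx} = h_\times\sin\omega t$ for a wave travelling along Oz. The function name and the exaggerated amplitude below are illustrative assumptions.

```python
import math

def displaced_ring(a, h_plus, h_cross, phase, n=8):
    """Positions (X, Y) of n test particles initially on a circle of radius a,
    using X^i = x^i + (1/2) h_ij x^j with h_xx = -h_yy = h_plus*sin(phase)
    and h_xy = h_yx = h_cross*sin(phase)."""
    s = math.sin(phase)
    points = []
    for k in range(n):
        theta = 2 * math.pi * k / n
        x, y = a * math.cos(theta), a * math.sin(theta)
        X = x + 0.5 * s * (h_plus * x + h_cross * y)
        Y = y + 0.5 * s * (h_cross * x - h_plus * y)
        points.append((X, Y))
    return points

# Exaggerated amplitude (h = 0.2) so that the deformation is visible
for X, Y in displaced_ring(a=1.0, h_plus=0.2, h_cross=0.0, phase=math.pi / 2):
    print(f"{X:+.3f} {Y:+.3f}")
```

With the + polarization the circle is stretched along Ox while it is squeezed along Oy; with the × polarization the same pattern appears rotated by 45°.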
Fig. 8.4 Observatories of LIGO, at Hanford (left): the arms are 4 km long, and of Virgo at Pisa
(right): the arms are 3 km long
In the Virgo interferometer, shown below, the purified laser beam, first split in
two by a beam splitter, is reflected on mirrors in each of the two perpendicular
arms, one to the north (NE), the other to the west (WE). These two 40 kg mirrors are
almost perfectly reflective: they reflect 99.999% of the incident light. The two
reflected beams recombine on the mirror at 45°, producing an interference pattern.
Each beam travels back and forth, reflecting on the mirrors. It eventually comes out
and crosses the other beam, with which it recombines. If both beams have travelled
the same distance, the light waves cancel exactly and the output of the
interferometer remains dark. However, if a gravitational wave passes, it shortens
one arm and lengthens the other. The two beams then recombine with a slight phase
shift and light comes out of the interferometer. This is observed by a photodetector,
which converts the incident light into an electric current, which is then amplified and
digitally recorded for analysis (Fig. 8.5).
Fig. 8.5 Simplified optical diagram of the Virgo interferometer. One tube is directed to the north:
NE, and the other to the west: WE. The widening of the lines represents the amplification of the
laser rays by the Fabry-Pérot cavities. (Credits: Virgo Collaboration)
This detection is in itself a conceptual and experimental feat as one can see in the
article Virgo interferometer 12
After propagation and detection, let’s see how gravitational waves are produced by
the presence of matter.
We seek to solve the linearized Einstein equations with the energy-momentum
tensor:
$$\partial^\sigma\partial_\sigma\bar h_{\mu\nu} = -\frac{16\pi G}{c^4}\,T_{\mu\nu} \tag{8.25}$$
which is analogous to the electromagnetic equation (6.40) in the presence of a source.
In the absence of incoming gravitational waves, the solution of (8.25) is formally
written as
$$\bar h_{\mu\nu}(x) = -\frac{16\pi G}{c^4}\int d^4y\; G_{ret}(x - y)\,T_{\mu\nu}(y) \tag{8.26}$$
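$$G_{ret}(x^\lambda - y^\lambda) = -\frac{1}{4\pi|\vec x - \vec y|}\;\delta\!\left[|\vec x - \vec y| - (x^0 - y^0)\right]\theta(x^0 - y^0), \tag{8.27}$$
and is a solution of $\partial^\sigma\partial_\sigma G_{ret}(x^\lambda) = \delta^4(x^\lambda)$.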
Substituting in (8.26) the explicit form (8.27), we obtain
$$\bar h_{\mu\nu}(t, \vec x) = \frac{4G}{c^4}\int d^3y\;\frac{T_{\mu\nu}(ct - |\vec x - \vec y|,\ \vec y)}{|\vec x - \vec y|}. \tag{8.28}$$
of the same order as the h μν and the corrective terms from the covariant derivative
are of second order.
As far as we are concerned, the observer, at $\vec x$, is located very far from the source. By
setting the origin of the coordinates inside the source, we can replace the denominator
$|\vec x - \vec y|$ by the distance of the observer $r = |\vec x|$. Approximating the retarded time $t_r$ by $t - r/c$
in the energy-momentum tensor assumes a slowly varying source. The expression
(8.28) becomes under these conditions
$$\bar h_{\mu\nu}(t, \vec x) = \frac{4G}{c^4 r}\int d^3y\; T_{\mu\nu}(ct - r, \vec y) \tag{8.29}$$
Let us now consider the spatial components of this expression. These are the only
ones we need to work with in the TT gauge. Thanks to the energy-momentum tensor
conservation, the integrals $\int d^3y\, T_{ij}(y)$ can be expressed simply.
The integral of the components of $T^{ij}$ is, after integrating by parts,
$$\int d^3y\; T^{ij} = \int d^3y\; \partial_k(y^i T^{kj}) - \int d^3y\; y^i\,\partial_k T^{kj},$$
where the first term gives a surface term that cancels out because the source is
compact. Energy-momentum tensor conservation implies
$$\partial_k T^{kj} = -\partial_0 T^{0j},$$
and, using the gauge condition, which gives $\partial_k T^{0k} = -\partial_0 T^{00}$, one obtains the equality
$$\int d^3y\; T_{ij} = \frac{1}{2}\frac{d^2}{dt^2}\int d^3y\; y^i y^j\, T^{00}. \tag{8.30}$$
$$h = \frac{2G}{c^4 r}\,\ddot Q(t - r/c). \tag{8.32}$$
Gravitational radiation causes a decrease of the energy of the source. The calculation
of this energy loss was used in particular in the case of the variation of the distance,
and thus of the rotation frequency, between the two sources of a binary pulsar system, and
constituted the first, and very remarkable, verification of the predictions of general
relativity: the discovery of the double pulsar PSR B1913+16 by Taylor and Hulse,
1993 Nobel Prize winners. We will show these results below.
In electromagnetism, the flux of energy radiated in the form of electro-magnetic
waves is obtained from the energy-momentum tensor which varies quadratically with
Aμ .
Similarly, in this case, we must evaluate the energy carried by gravitational waves
from the energy-momentum tensor which must be quadratic in the perturbation h μν .
We must therefore develop Einstein’s equations to second order of perturbations:
$${}^{(1)}G_{\mu\nu} + {}^{(2)}G_{\mu\nu} = 8\pi G\,T_{\mu\nu} \tag{8.33}$$
By shifting the term h μν to the right (see [17] Chap. 7.6) one gets the expression
$$F = \frac{c^3}{16\pi G}\left\langle \dot h_+^2 + \dot h_\times^2 \right\rangle \tag{8.36}$$
$$F = \frac{\pi c^3}{4G}\, f^2 h^2, \tag{8.37}$$
so that by integrating on a large sphere surrounding the source and using the
quadrupole formula, we find the brightness (8.35).
We can now understand the change from the very large factor $c^3/G$ of the expression
(8.37) to the very small factor $G/c^5$ of (8.35). We discussed this above. To have a
realistic order of magnitude of this gravitational power, or brightness, we must relate
it to the “natural” orders of magnitude of this effect in the bodies that emit it. We
must compare the speed v with the speed of light c, and the radius R with the Schwarzschild
radius $R_S = 2GM/c^2$. A simple rearrangement of terms then leads us to rewrite this formula
for the brightness with the appropriate parameters, as in (8.4):
$$L \sim \frac{c^5}{G}\, s^2 \left(\frac{v}{c}\right)^6 \left(\frac{R_S}{R}\right)^2$$
so that for systems or objects for which $s \sim 1$, $R \sim R_S$ and $v/c \sim 1$, which is the
case for systems of black holes or neutron stars in close orbit, we obtain luminosities
of the order of $L \sim 10^{52}$ W, i.e. $10^{26}$ times the light power of the Sun or $10^4$ times
that of all the stars of the currently accessible universe, for a relatively short time, a
fraction of a second.
The calculation of the Newtonian motion of the two stars in gravitational interaction
is simple and consistent with the observation.
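$$M_1\ddot{\mathbf x}_1 = -\frac{GM_1M_2}{|\mathbf x_1 - \mathbf x_2|^3}(\mathbf x_1 - \mathbf x_2), \qquad M_2\ddot{\mathbf x}_2 = -\frac{GM_1M_2}{|\mathbf x_1 - \mathbf x_2|^3}(\mathbf x_2 - \mathbf x_1). \tag{8.38}$$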
The separation vector between the two stars d = x1 − x2 satisfies the equation of
motion
$$\ddot{\mathbf d} = -\frac{GM_{tot}}{d^3}\,\mathbf d \tag{8.39}$$
where Mtot ≡ M1 + M2 is the total mass of the system. The total energy of the system
can be written in two ways:
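$$E = \frac{1}{2}M_1\dot{\mathbf x}_1^2 + \frac{1}{2}M_2\dot{\mathbf x}_2^2 - \frac{GM_1M_2}{d} = \frac{1}{2}M_{tot}\dot{\mathbf x}_B^2 + \frac{1}{2}\mu\dot{\mathbf d}^2 - \frac{GM_{tot}\,\mu}{d} \tag{8.40}$$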
$$\Omega = \frac{(GM_{tot})^{1/2}}{d^{3/2}}. \tag{8.41}$$
The energy of the system is then
$$E = \frac{1}{2}\mu\Omega^2 d^2 - \frac{G\mu M_{tot}}{d} = -\frac{G\mu M_{tot}}{2d}. \tag{8.42}$$
The trajectories of the two stars are then given by
$$\mathbf x_1 = \frac{M_2}{M_1 + M_2}\,\mathbf d, \qquad \mathbf x_2 = -\frac{M_1}{M_1 + M_2}\,\mathbf d \tag{8.43}$$
By choosing x and y coordinates in the binary system plane, the origin being placed
at the c.m. of the system, the components of d are of the form
or:
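$$x_1 = \frac{M_2}{M_{tot}}\,d\cos\Omega t, \qquad y_1 = \frac{M_2}{M_{tot}}\,d\sin\Omega t,$$
$$x_2 = -\frac{M_1}{M_{tot}}\,d\cos\Omega t, \qquad y_2 = -\frac{M_1}{M_{tot}}\,d\sin\Omega t.$$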
The two stars are treated as point objects. The quadrupole moment of the binary
system is simply
$$I_{ij} = \int d^3x\; \rho\, x_i x_j = M_1(x_1)_i(x_1)_j + M_2(x_2)_i(x_2)_j \tag{8.45}$$
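$$I_{xx} = M_1x_1^2 + M_2x_2^2 = \mu d^2\cos^2\Omega t = \frac{1}{2}\mu d^2(1 + \cos 2\Omega t)$$
$$I_{yy} = M_1y_1^2 + M_2y_2^2 = \mu d^2\sin^2\Omega t = \frac{1}{2}\mu d^2(1 - \cos 2\Omega t)$$
$$I_{xy} = M_1x_1y_1 + M_2x_2y_2 = \mu d^2\cos\Omega t\,\sin\Omega t = \frac{1}{2}\mu d^2\sin 2\Omega t$$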
By substituting in (8.31) we obtain the gravitational wave emitted by this binary
system:
$$\bar h_{ij}(t, \mathbf x) = \frac{4G^2\mu M_{tot}}{r\,d\,c^4}\begin{pmatrix} -\cos(2\Omega t_e) & -\sin(2\Omega t_e) & 0 \\ -\sin(2\Omega t_e) & \cos(2\Omega t_e) & 0 \\ 0 & 0 & 0 \end{pmatrix}$$
with te ≡ t − r/c, where r is the distance between the source and the observer, and
te the time of emission.
The previous calculation gives the order of magnitude $h \sim (GM)^2/c^4 r d \sim r_S^2/rd$ of the amplitude of
the gravitational waves emitted in GW150914. Taking
two initial black holes of the same mass $M \sim 30\,M_\odot$, i.e. of Schwarzschild radius
$r_S \sim 90$ km, orbiting at $d \sim 7r_S = 630$ km, with a detection at a distance of
400 Mpc $\sim 1.3\times10^{22}$ km, leads to an acceptable value of this amplitude, $h \sim 10^{-21}$.
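This estimate can be reproduced in a few lines. The sketch below uses the values quoted in the text (two 30 solar-mass black holes, separation 7 r_S, distance 400 Mpc); the numerical constants are standard rounded values, and the wave frequency is taken as twice the orbital frequency given by Kepler's law (8.41).

```python
import math

G = 6.674e-11          # m^3 kg^-1 s^-2
c = 2.998e8            # m/s
M_SUN = 1.989e30       # kg
MPC = 3.086e22         # m

M = 30 * M_SUN                   # mass of each black hole
r_s = 2 * G * M / c**2           # Schwarzschild radius of one hole
d = 7 * r_s                      # orbital separation used in the text
r = 400 * MPC                    # distance to the source

h = r_s**2 / (r * d)                         # order-of-magnitude amplitude
Omega = math.sqrt(G * (2 * M) / d**3)        # Eq. (8.41) with M_tot = 2M
f_gw = 2 * Omega / (2 * math.pi)             # wave frequency, twice the orbital one

print(f"r_S = {r_s / 1e3:.0f} km, d = {d / 1e3:.0f} km")
print(f"h ~ {h:.1e}, f_gw ~ {f_gw:.0f} Hz")
```

Both the amplitude and the frequency obtained in this way fall in the range displayed in Fig. 8.2.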
Back to the case of a double system and the formula of the quadrupole (8.35). The
gravitational brightness of the binary system is
$$L_G = \frac{32}{5}\,\frac{G^4\mu^2 M_{tot}^3}{c^5 d^5} \tag{8.46}$$
where $M_{tot}$ is the total mass, μ is the reduced mass, and d is the distance between the two objects.
The angular speed of rotation is $\Omega = (GM_{tot}/d^3)^{1/2}$, with a period of $P = 2\pi/\Omega$
(Kepler’s third law).
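As an illustration of (8.46), the following sketch evaluates the gravitational luminosity of the GW150914-like binary considered above (two 30 solar-mass black holes at a separation of 7 Schwarzschild radii) and expresses it in units of c⁵/G. The function name and the rounded constants are illustrative assumptions.

```python
G = 6.674e-11          # m^3 kg^-1 s^-2
c = 2.998e8            # m/s
M_SUN = 1.989e30       # kg

def binary_luminosity(m1, m2, d):
    """Gravitational luminosity of a circular binary, Eq. (8.46), in watts."""
    m_tot = m1 + m2
    mu = m1 * m2 / m_tot
    return 32.0 / 5.0 * G**4 * mu**2 * m_tot**3 / (c**5 * d**5)

m = 30 * M_SUN
r_s = 2 * G * m / c**2
L = binary_luminosity(m, m, 7 * r_s)
print(f"L_G ~ {L:.1e} W = {L * G / c**5:.1e} x c^5/G")
```

At this separation the luminosity is still a small fraction of c⁵/G; it grows as d⁻⁵ as the two bodies approach each other.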
The emission of gravitational waves corresponds to a decrease in the energy of
the system:
$$\dot E = -L_G \tag{8.47}$$
This causes a gradual bringing together of the two bodies, a decrease in the distance
d and the period P :
$$\dot P = -\frac{3P}{2|E|}\,L_G = -\frac{96}{5}\,\frac{G^3\mu M_{tot}^2}{c^5 d^4}\,P. \tag{8.48}$$
Using Kepler’s law $\Omega = (GM_{tot}/d^3)^{1/2}$, we can deduce a differential equation for
the period:
$$\dot P = -\frac{96}{5}\,(2\pi)^{8/3}\,\frac{G^{5/3}\mu M_{tot}^{2/3}}{c^5}\,P^{-5/3} \tag{8.50}$$
The distance d between the two stars changes, according to the above,
$$\dot d = -\frac{64}{5}\,\frac{G^3\mu M_{tot}^2}{c^5 d^3}, \tag{8.51}$$
which is directly integrated.
$$d^4 = \frac{256}{5}\,\frac{G^3\mu M_{tot}^2}{c^5}\,(t_c - t) \tag{8.52}$$
where tc is the fusion or coalescence time (to be determined).
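Equation (8.52) can be inverted to estimate the time remaining before coalescence from the current separation. The sketch below does so in SI units, writing the factor c⁵ explicitly, for the same illustrative binary as above (two 30 solar-mass black holes at d = 7 r_S); the function name is hypothetical.

```python
G = 6.674e-11          # m^3 kg^-1 s^-2
c = 2.998e8            # m/s
M_SUN = 1.989e30       # kg

def time_to_coalescence(m1, m2, d):
    """Solve Eq. (8.52), with the factor c^5 written explicitly,
    for t_c - t given the current separation d (SI units)."""
    m_tot = m1 + m2
    mu = m1 * m2 / m_tot
    return 5.0 * c**5 * d**4 / (256.0 * G**3 * mu * m_tot**2)

m = 30 * M_SUN
r_s = 2 * G * m / c**2
tau = time_to_coalescence(m, m, 7 * r_s)
print(f"t_c - t ~ {tau * 1e3:.0f} ms")
```

The result is a fraction of a second, consistent with the duration of the signal in Fig. 8.2; the Newtonian point-mass treatment is of course only indicative this close to the merger.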
The change in the frequency of the waves f = 2Ω/2π = 2/P is deduced by
8.5 Double Pulsar Discovery PSR B1913+16 181
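$$\frac{\dot f}{f} = -\frac{\dot P}{P} = \frac{96}{5}\,\frac{G^3\mu M_{tot}^2}{c^5 d^4}, \tag{8.53}$$
or
$$\dot f = \frac{96}{5}\,\pi^{8/3}\,\frac{G^{5/3}\mu M_{tot}^{2/3}}{c^5}\,f^{11/3}. \tag{8.54}$$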
The time evolution of the frequency and amplitude of gravitational waves at the
approach of the moment of fusion or coalescence tc are deduced from that:
$$f \propto (t_c - t)^{-3/8}, \qquad h \propto \frac{1}{d} \propto (t_c - t)^{-1/4}. \tag{8.55}$$
As coalescence approaches, the gravitational signal has a characteristic behaviour:
both its frequency and its amplitude increase.
One can see this in the results of LIGO-Virgo, in the lower part of Fig. 8.2, which
shows the evolution of the frequency of the waves detected in GW150914, at
Hanford and Livingston. We observe the increase of this frequency, until the moment
of coalescence, when the waves cease to be emitted.
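The growth of the frequency can be made quantitative by integrating (8.54), which gives $f^{-8/3} = \frac{8}{3}k\,(t_c - t)$, where k is the coefficient of $f^{11/3}$ (with the factor 1/c⁵ written explicitly for SI units). The sketch below evaluates this for two 30 solar-mass black holes; the function name and the sample times are illustrative assumptions.

```python
import math

G = 6.674e-11          # m^3 kg^-1 s^-2
c = 2.998e8            # m/s
M_SUN = 1.989e30       # kg

def chirp_frequency(m1, m2, tau):
    """Wave frequency f at time tau = t_c - t before coalescence, obtained by
    integrating Eq. (8.54) (with 1/c^5 explicit): f^(-8/3) = (8/3) k tau."""
    m_tot = m1 + m2
    mu = m1 * m2 / m_tot
    k = (96.0 / 5.0 * math.pi ** (8.0 / 3.0) * G ** (5.0 / 3.0)
         * mu * m_tot ** (2.0 / 3.0) / c**5)
    return (8.0 / 3.0 * k * tau) ** (-3.0 / 8.0)

m = 30 * M_SUN
for tau in (0.2, 0.1, 0.05, 0.02):       # seconds before coalescence
    print(f"tau = {tau:5.2f} s  ->  f ~ {chirp_frequency(m, m, tau):5.1f} Hz")
```

The frequency rises from a few tens of hertz to above a hundred hertz in the last tenths of a second, following the $(t_c - t)^{-3/8}$ law of (8.55).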
The result of Mercury’s perihelion had been Einstein’s greatest scientific emotion.
Eddington’s results in 1919 on the deviation of light rays had amazed the world and
instantly made Einstein’s international scientific reputation. Many other measure-
ments on time dilation took place in the following decades, including the remarkable
experiment of Pound and Rebka on the redshift of spectral lines in 1959.
However, it is a result of exceptional precision on an unexpected phenomenon that
was the first to be admitted as a proof of General Relativity when the 1993 Nobel
Prize was awarded to Joseph H. Taylor and Russell A. Hulse “For the discovery
of a new type of pulsar, which has opened up new possibilities for the study of
gravitation”. The discovery was twofold. On the one hand Taylor and Hulse had
discovered the first example of a double pulsar, on the other hand, the rotation of this
system emitted gravitational energy and its rotation period decreased over time with
an accuracy identical to that of the best theoretical calculations of General Relativity.
The discovery of the double pulsar PSR B1913+16 is in itself amazing. In 1974,
Hulse and Taylor recorded pulsar signals on Arecibo’s 305-meter reflector (destroyed
in December 2020). These radio signals, instead of being perfectly regular as stan-
dard ones, varied by advancing or delaying repetitively with a period of 7.75 hours.
They understood that they came from the fact that the pulsar in question was in orbit
(binary) with another star, identified as another pulsar whose possible emissions did
not reach the solar system. The effect of the rotation of the two pulsars was reflected
in the frequency of the observed radio signal of the first pulsar. This was an impor-
tant astronomical discovery. Taylor and Hulse were first able to establish with great
Fig. 8.6 Cumulative decrease in the orbital period of PSR B1913+16 between 1975 and 2014.
The points are experimental measurements, the curve is the value calculated in general relativity.
(T. Damour and N. Deruelle, Ann. Inst. Henri Poincaré, Vol. 44, No. 3, 1986). (Photo credit: The
Astrophysical Journal, Volume 829, Number 1 (2016))
precision the parameters of the binary system: a pulsar of mass $M_1 = 1.4414\,M_\odot$, a satellite
of mass $M_2 = 1.3857\,M_\odot$, and a very eccentric orbit with periastron $\sim 1.1\,R_\odot$, apastron
$\sim 4.8\,R_\odot$ and eccentricity e = 0.617. Since its discovery, the orbit has evolved in accordance
with the predictions of general relativity: precession of the periastron of 4.22°
per year, and decay of the semi-major axis (of $1.95\times10^6$ km) of 3.5 m per year.
The most interesting precision measurement is that of the 27,907 s period, which
decreases by 76 µs per year. This decrease comes from the system’s loss of energy
by gravitational wave emission. There are 1130 orbits described per year and the
cumulative decrease of the period observed between 1975 and 2003 is shown in the
Fig. 8.6 accompanied by the forecast calculated by general relativity.13
The relativistic calculation has been done and refined by several authors.14 Apart from details too complex for this chapter, we find the structure of the expression (8.55), taking fully into account the eccentricity $e$ of the system, that is
$$\dot P_{GR} = -\frac{192\pi G^{5/3}}{5c^{5}}\left(\frac{P_b}{2\pi}\right)^{-5/3} m_1 m_2\,(m_1+m_2)^{-1/3}\left(1+\frac{73}{24}e^{2}+\frac{37}{96}e^{4}\right)(1-e^{2})^{-7/2} \tag{8.56}$$
where $P_b$ is the observed period. By inserting the measured values for the parameters, this equation gives
$$\dot P_{GR} = -2.402 \times 10^{-12} \quad\text{for}\quad \dot P_{exp} = -2.398 \times 10^{-12} \tag{8.57}$$
13 See in particular J. M. Weisberg and Y. Huang, Relativistic measurements from timing the binary pulsar PSR B1913+16, The Astrophysical Journal, Volume 829, Number 1 (2016), and references.
14 See Nathalie Deruelle, Jean-Pierre Lasota: Gravitational Waves, Odile Jacob “Sciences”, March 2018; Damour T. and Deruelle N. 1986 AnIHP 44 263; Damour T. and Taylor J. H. 1991 ApJ 366 501; Damour T. and Taylor J. H. 1992 PhRevD 45 1840; Weisberg J. M. and Taylor J. H. 1981 GReGr 13 1; Weisberg J. M. and Taylor J. H. 2002 ApJ 576 942.
15 Lyne AG, Burgay M, Kramer M, Possenti A, Manchester RN, Camilo F, McLaughlin MA,
Lorimer DR, D’Amico N, Joshi BC, Reynolds J, Freire PC. A double-pulsar system: a rare labo-
ratory for relativistic gravity and plasma physics. Science. 2004 Feb 20;303(5661):1153–7.
16 Foster, R. S., Wolszczan, A., Camilo, F., ApJ, 410, L91, (1993).
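As a rough numerical illustration of Eq. (8.56), the following sketch evaluates the period derivative from the rounded parameters quoted in the text. It assumes the standard $c^{5}$ in the denominator, SI values of $G$, $c$ and the solar mass that are not given in the text, and a Julian year of 365.25 days; the function name `pdot_gr` is purely illustrative. With such rounded inputs one can only expect agreement with (8.57) at the percent level.

```python
import math

G = 6.67430e-11          # gravitational constant, m^3 kg^-1 s^-2 (assumed value)
C = 299_792_458.0        # speed of light, m/s
M_SUN = 1.98847e30       # solar mass in kg (assumed reference value)
SECONDS_PER_YEAR = 365.25 * 86400.0


def pdot_gr(m1_sun, m2_sun, p_b, e):
    """Period derivative from Eq. (8.56); masses in solar masses, p_b in seconds."""
    m1 = m1_sun * M_SUN
    m2 = m2_sun * M_SUN
    eccentricity_factor = (1 + 73 / 24 * e**2 + 37 / 96 * e**4) * (1 - e**2) ** (-3.5)
    return (
        -192 * math.pi * G ** (5 / 3) / (5 * C**5)
        * (p_b / (2 * math.pi)) ** (-5 / 3)
        * m1 * m2 * (m1 + m2) ** (-1 / 3)
        * eccentricity_factor
    )


P_B = 27907.0  # orbital period in s, rounded value quoted in the text
pdot = pdot_gr(1.4414, 1.3857, P_B, 0.617)
orbits_per_year = SECONDS_PER_YEAR / P_B
drift_us_per_year = -pdot * SECONDS_PER_YEAR * 1e6

print(f"Pdot_GR         = {pdot:.3e}")
print(f"orbits per year = {orbits_per_year:.0f}")
print(f"period decrease = {drift_us_per_year:.1f} microseconds per year")
```

The same run also recovers the two orders of magnitude quoted above: roughly 1130 orbits per year and a period decrease of about 76 µs per year.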
Chapter 9
Feynman’s Path Integrals in Quantum
Mechanics
1 John Archibald Wheeler and Richard Phillips Feynman, Interaction with the Absorber as the
Mechanism of Radiation, Rev. Mod. Phys. 17, 157, 1945.
2 At a beer party in the Nassau Tavern at Princeton. The account of this event can be found in
University, 1942; Space-Time approach to Non-Relativistic Quantum Mechanics, Rev. Mod. Phys,
vol. 20, p. 367, 1948: see also [26].
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
J.-L. Basdevant, Variational Principles in Physics,
https://doi.org/10.1007/978-3-031-21692-3_9
Relativistic Quantum Mechanics, defended in May 1942 and published only after
the end of the war.
Let us stand next to Feynman and Jehle the next day, when they open Dirac’s 1932 article. An astonishing sentence appears:
At two times in the vicinity of $t$ and $t+\epsilon$, the basic transition amplitude $\langle q_2, t+\epsilon\,|\,q_1, t\rangle$ is analogous to $\exp(iS[x]/\hbar)$.
$S[x]$ is the classical expression of the action, but what does analogous mean?5
Note: A state $|q_i, t\rangle$ is, in Dirac’s article, an eigenstate of the position operator $\hat q(t)$ in the Heisenberg representation.
We work here in one space dimension, with the spatial variable x. We will see later that everything is easily generalized. In terms of the Lagrangian L, the action is
$$S = \int_{t}^{t+\epsilon} L(x,\dot x)\,dt = \int_{t}^{t+\epsilon}\left(\frac{m\dot x^{2}}{2} - V(x)\right)dt \tag{9.1}$$
where we integrate over the values of y, that is to say, over the full extent of the initial wave function. The number A is here a quantity to be determined.
5 The exact and tasty words of the dialogue are reported by Feynman in his Nobel Prize-winning
address.
6 Common term due to Feynman, in field theory, for a Green’s function.
In the exponential, the first term $(i/\hbar)(m\eta^{2}/2\epsilon)$, with $\eta = y - x$, becomes very large as soon as y is a little different from x. The phase of the integrand of (9.5) then varies rapidly, and the integrand oscillates quickly. On the average, these contributions to the integral cancel each other and only the sufficiently small values of $|x - y| = |\eta|$ contribute appreciably. We can therefore rewrite Eq. (9.5), keeping in mind that only the small values of $|\eta|$ contribute:
$$\psi(x,t+\epsilon) = A\int_{-\infty}^{+\infty} d\eta\,\exp\left[\frac{i}{\hbar}\frac{m\eta^{2}}{2\epsilon}\right]\exp\left[-\frac{i}{\hbar}\,\epsilon\,V(x+\eta/2)\right]\psi(x+\eta,t). \tag{9.6}$$
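Let us expand, in this equation, the function $\psi$ in series in $\epsilon$ and $\eta$. Keeping the first order in $\epsilon$ and the second order in $\eta$, we get
$$\psi(x,t) + \epsilon\,\frac{\partial\psi}{\partial t} = A\int_{-\infty}^{+\infty} d\eta\,\exp\left[\frac{i}{\hbar}\frac{m\eta^{2}}{2\epsilon}\right]\left[1-\frac{i}{\hbar}\,\epsilon\,V(x)\right]\left[\psi(x,t)+\eta\,\frac{\partial\psi}{\partial x}+\frac{\eta^{2}}{2}\frac{\partial^{2}\psi}{\partial x^{2}}\right]. \tag{9.7}$$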
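The term in $\partial\psi/\partial x$ cancels (odd factor in $\eta$). The values of the two remaining Gaussian integrals are recalled in table (9.12):
$$C_1 = \int d\eta\,\exp\left[\frac{i}{\hbar}\left(\frac{m\eta^{2}}{2\epsilon}\right)\right] = \sqrt{\frac{2i\pi\hbar\epsilon}{m}} \tag{9.8}$$
$$C_2 = \int d\eta\,\eta^{2}\exp\left[\frac{i}{\hbar}\left(\frac{m\eta^{2}}{2\epsilon}\right)\right] = \frac{i\hbar\epsilon}{m}\,C_1 \tag{9.9}$$
$$A = \frac{1}{C_1} = \sqrt{\frac{m}{2i\pi\hbar\epsilon}}, \tag{9.10}$$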
$$-\frac{\hbar}{i}\frac{\partial}{\partial t}\psi(x,t) = -\frac{\hbar^{2}}{2m}\frac{\partial^{2}}{\partial x^{2}}\psi(x,t) + V(x)\,\psi(x,t). \tag{9.11}$$
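The algebra leading to the Schrödinger equation can be sketched as follows. This is our own reconstruction, not a quotation: it uses only the Gaussian integrals $C_1$ and $C_2$ of (9.8) and (9.9), and it assumes that $\eta^{2}$ counts as first order in $\epsilon$, so that $V(x+\eta/2)$ may be replaced by $V(x)$ at this order.

```latex
\begin{align*}
\psi + \epsilon\,\frac{\partial\psi}{\partial t}
 &= A\int d\eta\; e^{\frac{i}{\hbar}\frac{m\eta^2}{2\epsilon}}
    \Big[1-\frac{i}{\hbar}\,\epsilon V(x)\Big]
    \Big[\psi+\eta\,\frac{\partial\psi}{\partial x}
         +\frac{\eta^2}{2}\frac{\partial^2\psi}{\partial x^2}\Big] \\
 &= A\,C_1\Big[1-\frac{i}{\hbar}\,\epsilon V(x)\Big]\psi
    + \frac{A\,C_2}{2}\,\frac{\partial^2\psi}{\partial x^2} + O(\epsilon^2).
\end{align*}
% Order \epsilon^0: \psi = A C_1 \psi, hence A = 1/C_1.
% Order \epsilon^1, using A C_2 = i\hbar\epsilon/m:
\begin{equation*}
\epsilon\,\frac{\partial\psi}{\partial t}
 = -\frac{i}{\hbar}\,\epsilon V\psi
   + \frac{i\hbar\epsilon}{2m}\,\frac{\partial^2\psi}{\partial x^2}
\quad\Longrightarrow\quad
i\hbar\,\frac{\partial\psi}{\partial t}
 = -\frac{\hbar^2}{2m}\,\frac{\partial^2\psi}{\partial x^2} + V\psi .
\end{equation*}
```

The zeroth order fixes the normalization $A$, which is why this coefficient is essential; the first order is the wave equation itself.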
Silvan Schweber says that in the autumn of 1946, during the bicentennial of
Princeton University, Feynman met Dirac and that the following exchange, rather
concise, took place:
Feynman: Did you know that these two quantities are proportional?
Dirac: Are they?
Feynman: Yes.
Dirac: Oh! That’s interesting.7
Dirac, like Schrödinger and Louis de Broglie, had reread Hamilton’s articles,
and in particular meditated on the characteristic function and connection between
geometric optics and classical mechanics. Dirac was interested in the phase, the ratio of the action to the universal Planck constant $\hbar$, in the expression $\exp(iS/\hbar)$, similar to the characteristic function of Hamilton in optics (Chap. 5, Eq. 5.3). But
Dirac had not been able to pursue his effort to find the normalization coefficient of
this expression. This normalization coefficient is difficult to find, but is essential, as
we shall see. Dirac had meanwhile reconsidered the subject by quoting Feynman.8
But there was a crucial point in Feynman’s analysis, which we shall explain. The
difficulty, and Feynman’s discovery, is that a probability amplitude must be calculated
as the sum of amplitudes from all possible paths to get from one point to another,
and not just from the classical path of minimal action. At the mathematical level, it
is therefore a matter of summing an infinite number of trajectories in what he called
a Path Integral.
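Gaussian Integrals
$$\int_{-\infty}^{\infty} e^{-\lambda x^{2}}\,dx = \sqrt{\frac{\pi}{\lambda}},\qquad \int_{-\infty}^{\infty} e^{i\lambda x^{2}}\,dx = \sqrt{\frac{i\pi}{\lambda}},\qquad \int_{-\infty}^{\infty} x^{2}e^{i\lambda x^{2}}\,dx = \frac{i}{2\lambda}\sqrt{\frac{i\pi}{\lambda}} \tag{9.12}$$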
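The oscillatory entries of this table can be checked numerically, provided the integrand is damped slightly. The sketch below is an illustration under stated assumptions: it replaces $i\lambda$ by $i\lambda - \delta$ with a small $\delta > 0$ (a regularization that is not in the text), evaluates the damped integrals by a plain Riemann sum, and compares them with the closed forms $\sqrt{\pi/a}$ and $\frac{1}{2a}\sqrt{\pi/a}$ for $a = \delta - i\lambda$. In the limit $\delta \to 0$ these reduce to $\sqrt{i\pi/\lambda}$ and $\frac{i}{2\lambda}\sqrt{i\pi/\lambda}$, the latter being equivalent to the tabulated third entry for a suitable branch of the square root. The helper name `damped_moments` is hypothetical.

```python
import cmath
import math


def damped_moments(lam, delta, half_width=30.0, n=60_001):
    """Riemann sums of exp((i*lam - delta) x^2) and x^2 times it over [-L, L]."""
    h = 2 * half_width / (n - 1)
    s0 = 0j
    s2 = 0j
    for j in range(n):
        x = -half_width + j * h
        w = cmath.exp((1j * lam - delta) * x * x)
        s0 += w
        s2 += x * x * w
    return s0 * h, s2 * h


lam, delta = 1.0, 0.05
num0, num2 = damped_moments(lam, delta)

a = delta - 1j * lam                 # exponent is -a x^2 with Re(a) > 0
exact0 = cmath.sqrt(math.pi / a)     # damped version of sqrt(i*pi/lam)
exact2 = exact0 / (2 * a)            # damped version of (i/(2 lam)) sqrt(i*pi/lam)

# Closed forms in the undamped limit delta -> 0
limit0 = cmath.sqrt(1j * math.pi / lam)
limit2 = 1j / (2 * lam) * limit0
tiny = 1e-9 - 1j * lam
near0 = cmath.sqrt(math.pi / tiny)
near2 = near0 / (2 * tiny)

print(abs(num0 - exact0), abs(num2 - exact2))
print(abs(near0 - limit0), abs(near2 - limit2))
```

The damping is only a numerical convenience: it makes the integrand decay so that a finite integration window suffices.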
Starting with this discovery, Feynman builds a version of quantum mechanics on the
following bases.
195, 1945.
9 Feynman’s Thesis; A new approach to Quantum Theory: The Principle of Least Action in Quantum
Mechanics, Laurie M. Brown (Editor), World Scientific, Singapore, (2005).
10 R. P. Feynman, Space-Time Approach to Non-relativistic Quantum Mechanics, Rev. Mod.
and presented his statements in a differential form, easier to read than Feynman’s
unconventional formulation.
4. The least action principle states that the actual physical trajectory X (t) renders
S minimal (extremal).
5. The equation of motion that determines the actual trajectory is the Lagrange–Euler equation
$$\frac{\partial L}{\partial x} = \frac{d}{dt}\frac{\partial L}{\partial \dot x}. \tag{9.14}$$
6. For a free particle, $L = m\dot x^{2}/2$, the classical action between $(x_1, t_1)$ and $(x_2, t_2)$ is
$$S_{cl} = \frac{m}{2}\,\frac{(x_2 - x_1)^{2}}{t_2 - t_1}. \tag{9.15}$$
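Points 4 to 6 can be illustrated with a small numerical sketch. It computes the action of a free particle along a broken-line path, once for the straight classical path and once for a path perturbed by a sine bump that vanishes at the endpoints. The helper `discretized_action`, the units ($m = 1$) and the particular endpoints and bump amplitude are illustrative assumptions, not taken from the text.

```python
import math


def discretized_action(xs, dt, m=1.0, potential=lambda x: 0.0):
    """Action of a broken-line path xs sampled every dt (midpoint rule for V)."""
    s = 0.0
    for x0, x1 in zip(xs[:-1], xs[1:]):
        v = (x1 - x0) / dt
        s += dt * (0.5 * m * v * v - potential(0.5 * (x0 + x1)))
    return s


m, x1, x2, t1, t2, n = 1.0, 0.0, 2.0, 0.0, 4.0, 200
dt = (t2 - t1) / n

straight = [x1 + (x2 - x1) * j / n for j in range(n + 1)]
wiggly = [x + 0.3 * math.sin(math.pi * j / n) for j, x in enumerate(straight)]

s_cl = 0.5 * m * (x2 - x1) ** 2 / (t2 - t1)      # Eq. (9.15)
s_straight = discretized_action(straight, dt, m)
s_wiggly = discretized_action(wiggly, dt, m)

print(s_cl, s_straight, s_wiggly)
```

The straight path reproduces (9.15), while the perturbed path with the same endpoints has a larger action, as expected for a free particle, for which the extremum is a minimum.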
The basic concept on which Feynman relies is that of the amplitude of a process.
The concept of the quantum state of a system (i.e., the description of the state of a
system) only comes afterwards. This point of view is more realistic in the sense that
any experiment consists of a series of processes. Feynman wants to obtain the laws
of quantum processes. Therefore, Feynman’s principle concerns the dynamics and,
to a lesser extent, the physical quantities.
Feynman’s approach relies on the superposition principle. To each physical process, there correspond complex amplitudes, which we denote by $\phi_k$, that add up.
The probability of observing an event coming from several interfering alternatives
for a process is given by the modulus squared of the sum of amplitudes that lead to
this event.
In the Young slit interference experiment, the interfering alternatives correspond to the passage through each slit. To each of them there corresponds an amplitude (i.e., $\phi_1$ and $\phi_2$), and the probability P of observing the outgoing particle at a given point of the detector is the modulus squared of the sum, $P = |\phi_1 + \phi_2|^{2}$.
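A minimal sketch of this rule, assuming two amplitudes of equal modulus that differ only by a relative phase (the function name and the unnormalized moduli are illustrative choices):

```python
import cmath
import math


def two_slit_probability(phase_difference, modulus=1.0):
    """Unnormalized P = |phi_1 + phi_2|^2 for two amplitudes of equal modulus."""
    phi1 = modulus
    phi2 = modulus * cmath.exp(1j * phase_difference)
    return abs(phi1 + phi2) ** 2


bright = two_slit_probability(0.0)        # amplitudes in phase
dark = two_slit_probability(math.pi)      # amplitudes in opposition
no_interference = 2 * 1.0 ** 2            # |phi_1|^2 + |phi_2|^2, for comparison

print(bright, dark, no_interference)
```

Adding amplitudes before squaring is what produces the fringes: the result ranges from zero to twice the value obtained by adding the two probabilities.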
We can generalize the experiment by placing a series of screens one after the other,
each of which bears a set of slits. To each possible path of the particle, between the
source and the detector there corresponds an amplitude. The sum of these amplitudes
gives the total amplitude on the detector, and its modulus squared is the probability.
Consider a simple process where a particle moves from a point (xa , ta ) to another
point (xb , tb ) (we work in space-time, and we include the time variable in the def-
inition of the position of the particle). As in classical physics, one can imagine a
variety of paths by which this process can happen. Of course, the classical trajectory
is well defined, and it corresponds to an extremum of the action S(b, a). In quantum
mechanics, all paths x(t) coming from a and ending at b contribute to the amplitude
of the process, as one can visualize from the successive interferences represented in
Fig. 9.1.
The modulus of all these amplitudes is roughly the same,12 but the phase differs
appreciably from one path to another. The amplitude K (b, a) is the sum of individual
amplitudes
$$K(b,a) = \sum_{k}\phi_k(b,a), \tag{9.17}$$
where x(t) defines a path between a and b. Of course, the specific structure of the
setup in Fig. 9.1 does not matter.
The Feynman principle consists in stating that, in full generality, in any experimental setup, the phase of the amplitude $\phi(x(t))$ corresponding to a given path is the classical action along this path, calculated according to Eq. (9.13), divided by $\hbar$:
$$\phi(x(t)) = \frac{1}{C}\,e^{\frac{i}{\hbar}S(x(t))}. \tag{9.19}$$
We shall see later on how one fixes the normalization constant C (which is essential).
We insist on the fact that the quantity S(x(t)) in this expression is the value of the
action (9.13) along the path x(t). It does not necessarily correspond to an extremum
of the classical action.
This leads us to a central point, which is the evaluation of the sum (9.18) with the
definition (9.19). In fact, the family of possible trajectories x(t) between a and b is a
complicated set. The result does not correspond to a simple limit of the discrete set,
which we could calculate in the case of Fig. 9.1, to a continuous set.
In order to define the sum over all paths, we first divide the time interval $t_b - t_a$ into N successive equal intervals of length $\epsilon$, delimited by the instants $t_i$, $i = 0, \dots, N$:
$$t_b - t_a = N\epsilon,\qquad \epsilon = t_{i+1} - t_i,\qquad t_0 = t_a,\qquad t_N = t_b. \tag{9.20}$$
For each value $t_i$, we choose a value $x_i$ of the variable x. This gives a set of N − 1 values since the endpoints are fixed,
$$x_0 = x_a,\qquad x_N = x_b.$$
12 Of course, it is only after we have understood the physical and mathematical structure of the
problem that this claim appears justified in good approximation.
[Fig. 9.2: time axis $t$ with ticks $t_0, t_1, t_2, \dots, t_i, t_{i+1}, \dots, t_N$]
By joining the successive xi ’s by straight lines (we shall come back to this point),
we define a trajectory in the form of a broken line that joins the points a and b. Each
set {xi } defines a different possible trajectory.
If we integrate on the values of each xi from −∞ to +∞, we sum over all
“trajectories” corresponding to this particular discretization of the time variable.
This procedure is illustrated in Fig. 9.2.
For a given value of $\epsilon$, let $C(\epsilon)$ be the normalization constant of (9.19). The amplitude K(b, a) is given by
$$K(b,a) = \lim_{\epsilon\to 0}\frac{1}{C(\epsilon)}\int\!\!\int\cdots\int e^{\frac{i}{\hbar}S(b,a)}\,\frac{dx_1}{C(\epsilon)}\,\frac{dx_2}{C(\epsilon)}\cdots\frac{dx_{N-1}}{C(\epsilon)}, \tag{9.21}$$
where, for each value of the set $\{x_i(t_i)\}$, S(b, a) is the action calculated on the trajectory defined by this set, as represented in Fig. 9.2.
The end of the calculation is to take the limit $\epsilon \to 0$. This is where the normalization factor $C(\epsilon)$ enters, as well as the number of such factors. Indeed, the limit must exist and only involve physical quantities. Assuming this is achieved, and writing $K_\epsilon(b,a)$ for the multiple integral of (9.21) at fixed $\epsilon$, the amplitude K(b, a) is given by
$$K(b,a) = \lim_{\epsilon\to 0}K_\epsilon(b,a). \tag{9.22}$$
that the second derivative is not defined at the instants $t_i$. In the case we consider, this has no importance since the Lagrangian does not involve $\ddot x(t)$. In other cases that one can imagine and that are not too pathological, the prescription $\ddot x = (x_{i+1} - 2x_i + x_{i-1})/\epsilon^{2}$ leads to acceptable results.
3. From the mathematical viewpoint, a satisfactory definition of the path integral
requires a formalism and some concepts more subtle than this discretization of
time. However, for what concerns us here, the only important points are that
the summation exists and that the prescriptions (9.21) and (9.22) lead to correct
results.
Most of the time in what follows, we will avoid writing the sum over paths as the
limits (9.21) and (9.22). We will write this sum as
$$K(b,a) = \int_{a}^{b} e^{\frac{i}{\hbar}S(b,a)}\,\mathcal{D}x(t), \tag{9.23}$$
where the symbol Dx(t) characterizes the mathematical nature of this expression.
The form (9.23) is called a path integral. In this expression, S(b, a) is a number
whose value depends on the function x(t). The “integration” over this function x(t),
which is represented by Dx(t), is called a functional integral.
One justification of the form (9.21) is obtained if we consider the combination law
for amplitudes of successive events.
Consider a process $(a \to b)$ and some intermediate time $t_c$ such that $t_a \le t_c \le t_b$. The action S(b, a) is therefore the sum
$$S(b,a) = S(b,c) + S(c,a).$$
Indeed, the action is a time integral, and we work with Lagrangians $L(x, \dot x, t)$ that do not depend on higher-order derivatives such as $\ddot x$. The integral (9.23) is written as
$$K(b,a) = \int_{a}^{b}\exp\left[\frac{i}{\hbar}\big(S(b,c) + S(c,a)\big)\right]\mathcal{D}x(t),$$
where, of course, the integral over xc is a usual integral. We have factorized the
integrand.
In other words, the amplitudes for two successive events going through the same
given intermediate point c, (a → c) and (c → b), are multiplied. The amplitude
K (b, a) is the sum of these products on all possible values of the intermediate point.
This is simply the superposition principle, which is a basic principle of Feynman.
This argument can be extended to any number N of intervals, with intermediate points $x_i$, $i = 1, \dots, N-1$, which leads to
$$K(b,a) = \int\!\!\int\cdots\int K(b,N-1)\,K(N-1,N-2)\cdots K(i+1,i)\cdots K(1,a)\;dx_1\,dx_2\cdots dx_{N-1}. \tag{9.27}$$
Assuming these intervals are infinitesimal and of equal length $\epsilon$, the corresponding expression resembles Eq. (9.21). It is not identical, however, since the latter form is a limit, whereas (9.27) is an equality. This, however, enables us to obtain an infinitesimal form of the amplitude K between two points separated by an infinitesimal time interval $\epsilon$. In fact, when $t_2 - t_1 = \epsilon$ is infinitesimal, the action (9.13) is, to first order in $\epsilon$,
$$S(2,1) = \epsilon\,L\!\left(\frac{x_2+x_1}{2},\,\frac{x_2-x_1}{\epsilon},\,\frac{t_2+t_1}{2}\right), \tag{9.28}$$
so that
$$K(2,1) = \frac{1}{C(t_2-t_1=\epsilon)}\exp\left[\frac{i}{\hbar}\,\epsilon\,L\!\left(\frac{x_2+x_1}{2},\,\frac{x_2-x_1}{\epsilon},\,\frac{t_2+t_1}{2}\right)\right]. \tag{9.29}$$
Inserting this result into the formula (9.27), and assuming we can exchange the order of the integration and the limit $\epsilon \to 0$, we indeed obtain an equality between the two expressions. This justifies the method (9.21) and (9.22) in all cases where the expressions converge sufficiently well.
We first apply what we have done to the case of a particle propagating freely in space.
One calls propagator the amplitude K (b, a) of the propagation of a particle (free or
not) from point a to point b.
A free classical particle propagates according to a uniform linear motion. The corresponding classical action is
$$S_{cl} = \frac{m}{2}\,\frac{(x_2 - x_1)^{2}}{t_2 - t_1}. \tag{9.30}$$
This result is obvious. In this problem, the velocity $\dot x$ is a constant and the Lagrangian is $L = m\dot x^{2}/2 = m[(x_2 - x_1)/(t_2 - t_1)]^{2}/2$, which leads directly to (9.30) since $S = \int L\,dt$.
In order to calculate the propagator of a free particle, we could use the limiting form (9.21). However, in this case, it is completely equivalent to use the expression (9.27) because the result is independent of the value of $\epsilon = t_{i+1} - t_i$. We will also obtain the value of the normalization coefficient $C(\epsilon)$ in (9.19).
The final result is that the propagator of a free particle between points a and b is
$$K(b,a) = \left[\frac{2\pi i\hbar(t_b - t_a)}{m}\right]^{-1/2}\exp\left[\frac{im(x_b - x_a)^{2}}{2\hbar(t_b - t_a)}\right]. \tag{9.31}$$
This gives the value of the normalization factor. For a free particle and a time interval $(t_b - t_a)$, we read in the formula above
$$C(t_b - t_a) = \left[\frac{2\pi i\hbar(t_b - t_a)}{m}\right]^{1/2}. \tag{9.32}$$
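Of course, we have made use of the fact that $t_2 - t_1 = t_1 - t_0 = \epsilon$, but we have not taken any limit.
The expression (9.35) can be written in terms of $(t_2 - t_0) = 2\epsilon$ as
$$K(2,a) = \left(\frac{1}{C(\epsilon)}\right)^{2}\sqrt{\frac{i\pi\hbar\,(2\epsilon)}{2m}}\;\exp\left[\frac{im}{2\hbar(t_2 - t_a)}(x_2 - x_0)^{2}\right]. \tag{9.36}$$
Consequently, the equality of (9.37) and (9.36) for infinitesimal time intervals $t_b - t_a = \epsilon$ imposes the choice
$$C(\epsilon) = \left(\frac{2i\pi\hbar\epsilon}{m}\right)^{1/2} \equiv \left(\frac{2i\pi\hbar(t_b - t_a)}{m}\right)^{1/2}. \tag{9.38}$$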
The proof of (9.31) can be obtained by recursion. Assuming this result is exact, we insert it in (9.26) and, by considering an intermediate point (x, t), we obtain
$$K(b,a) = \int\left[\frac{2\pi i\hbar(t_b - t)}{m}\,\frac{2\pi i\hbar(t - t_a)}{m}\right]^{-1/2}\exp\left\{\frac{im}{2\hbar}\left[\frac{(x_b - x)^{2}}{(t_b - t)} + \frac{(x - x_a)^{2}}{(t - t_a)}\right]\right\}dx. \tag{9.39}$$
It is straightforward to obtain
$$\frac{(x_b - x)^{2}}{(t_b - t)} + \frac{(x - x_a)^{2}}{(t - t_a)} = \frac{(x_b - x_a)^{2}}{(t_b - t_a)} + \frac{(t_b - t_a)}{(t_b - t)(t - t_a)}\left[x - \frac{x_b(t - t_a) + x_a(t_b - t)}{(t_b - t_a)}\right]^{2}. \tag{9.40}$$
The first term, which is independent of x, factorizes in the integral, which boils down to a Gaussian integral. The value I of this integral (without the prefactors of (9.39)) is therefore
$$I = \left[\frac{m(t_b - t_a)}{2i\pi\hbar\,(t_b - t)(t - t_a)}\right]^{-1/2}\exp\left[\frac{im}{2\hbar}\,\frac{(x_b - x_a)^{2}}{(t_b - t_a)}\right]. \tag{9.41}$$
The proof is completed by noticing that the previous result does not require any
condition on a, b and the intermediate point (x, t). Therefore, the method can be
extended to any partition of the interval [a, b]. The formula therefore coincides with the “definition” (9.35) in the infinitesimal limit $(t_b - t_a) = N\epsilon$ if it is legitimate to interchange the order of the integration and the limit $\epsilon \to 0$.
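The composition property used in this recursion can be checked numerically. The sketch below is only an illustration under stated assumptions: it works in units $\hbar = m = 1$, and it gives the time intervals a small negative imaginary part (a regularization that is not in the text) so that the otherwise purely oscillatory integrand decays and a finite Riemann sum converges. The function names are hypothetical.

```python
import cmath
import math

HBAR = 1.0   # natural units (assumption of this sketch)
M = 1.0


def free_propagator(xb, xa, dt):
    """Free propagator (9.31) for a (possibly complex) time interval dt."""
    prefactor = cmath.sqrt(M / (2j * math.pi * HBAR * dt))
    return prefactor * cmath.exp(1j * M * (xb - xa) ** 2 / (2 * HBAR * dt))


def composed(xb, xa, dt2, dt1, half_width=12.0, n=12_001):
    """Riemann sum of K(b, x) K(x, a) over the intermediate point x."""
    h = 2 * half_width / (n - 1)
    total = 0j
    for j in range(n):
        x = -half_width + j * h
        total += free_propagator(xb, x, dt2) * free_propagator(x, xa, dt1)
    return total * h


damping = 1 - 0.2j                 # Im(dt) < 0 makes the integrand decay
dt1, dt2 = 0.4 * damping, 0.6 * damping
xa, xb = 0.0, 1.0

lhs = composed(xb, xa, dt2, dt1)             # integral over the intermediate point
rhs = free_propagator(xb, xa, dt1 + dt2)     # direct propagator over the full interval

print(abs(lhs - rhs))
```

The two sides agree to numerical precision for any split of the interval, which is the content of the recursion argument.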
so that the expression (9.43), which actually factors into an exponential times a Gaussian integral, is identical to (9.42):
$$K(x', t'; x, t) = \sqrt{\frac{m}{2i\pi\hbar(t' - t)}}\;\exp\left[\frac{im(x' - x)^{2}}{2\hbar(t' - t)}\right]. \tag{9.44}$$
For simplicity, let us fix the origin of time and position at a. We call $b \equiv (x, t)$ the point of arrival, and we examine the properties of the propagator K(x, t; 0, 0) as a function of the endpoint (x, t). If we set $\mathcal{K}(x, t) \equiv K(x, t; 0, 0)$, we have
$$\mathcal{K}(x,t) = \sqrt{\frac{m}{2\pi i\hbar t}}\;\exp\left[\frac{im}{2\hbar}\,\frac{x^{2}}{t}\right]. \tag{9.45}$$
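One can check with no difficulty that the free propagator obeys the partial differential equation
$$i\hbar\,\frac{\partial\mathcal{K}}{\partial t} = -\frac{\hbar^{2}}{2m}\,\frac{\partial^{2}\mathcal{K}}{\partial x^{2}} \tag{9.46}$$
for t > 0 (or $t_b > t_a$).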
Equation (9.46) has the same form as the Schrödinger equation for a free particle.
We must, however, be careful since we do not yet know the physical nature of the
amplitude K and how it is related to a physical probability amplitude.
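That the free propagator satisfies this equation can be verified with finite differences. The following sketch assumes units $\hbar = m = 1$ and an arbitrary test point; the step size and the function name are illustrative choices.

```python
import cmath
import math

HBAR = 1.0   # natural units (assumption of this sketch)
M = 1.0


def free_kernel(x, t):
    """Free propagator from the origin, Eq. (9.45)."""
    return cmath.sqrt(M / (2j * math.pi * HBAR * t)) * cmath.exp(
        1j * M * x * x / (2 * HBAR * t)
    )


x, t, h = 0.7, 1.3, 1e-3

# Centered finite differences for the time and space derivatives
dK_dt = (free_kernel(x, t + h) - free_kernel(x, t - h)) / (2 * h)
d2K_dx2 = (free_kernel(x + h, t) - 2 * free_kernel(x, t) + free_kernel(x - h, t)) / h**2

lhs = 1j * HBAR * dK_dt
rhs = -(HBAR**2) / (2 * M) * d2K_dx2
residual = abs(lhs - rhs)

print(abs(lhs), residual)
```

The residual is limited only by the discretization error of the finite differences, and is several orders of magnitude smaller than either side of the equation.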
The same thing happens here. It is convenient to work with the propagator K (b, a)
and to call it a probability amplitude, even though we are aware that a true probability
amplitude is obtained after a summation of K (b, a) over vicinities of b and a.
One can check that if we forget this precaution, the “probability” of observing the particle in a vicinity dx of point $x_b$, knowing that it originates from a, would be
$$P(b)\,dx = \frac{m}{2\pi\hbar(t_b - t_a)}\,dx,$$
whose integral over all space is infinite. This is exactly the same problem as encountered in quantum mechanics when shifting from de Broglie plane waves to wave packets.
Some authors stress the fact that the free Schrödinger equation can be considered as a Fourier diffusion equation,
$$\frac{\partial\rho}{\partial t} = D\nabla^{2}\rho,$$
for a purely imaginary time $t = i\tau$. This remark is interesting in that the same mathematical techniques apply to both and that the solutions have obvious formal similarities.
Two points are in order. First, the function K that we use here becomes a density
ρ (of heat or matter), which is positive. The solution is then real and positive, or zero.
The result (9.47) expresses the conservation of energy, and the limit (9.48) represents
an initial condition where some quantity of heat has been deposited on a given point,
which avoids any problem of interpretation.
Secondly, and this is perhaps more interesting, this is an example of the fact that
path integral techniques are useful in a large category of problems. One can refer to
the remarkable book Techniques and Applications of Path Integration by Schulman
[27]. In the present case, the solution of a partial differential equation of first order
in time can be cast quite directly into the form of a path integral. This is the case for
the Fourier equation as well as for the Schrödinger equation.
$$\mathcal{K}(x,t) = \sqrt{\frac{m}{2\pi i\hbar t}}\;\exp i\phi(x,t) \propto e^{i(kx - \omega t)}, \tag{9.49}$$
where k is the wave vector and $\omega$ the frequency. These are locally related by
$$k = \frac{\partial\phi}{\partial x},\qquad \omega = -\frac{\partial\phi}{\partial t}. \tag{9.50}$$
Here, the value of the phase is
$$\phi(x,t) = \frac{m}{2\hbar}\,\frac{x^{2}}{t}.$$
Therefore, we obtain
$$k = \frac{mx}{\hbar t},\quad\text{i.e.,}\quad \lambda = \frac{2\pi}{k} = \frac{h}{m(x/t)}, \tag{9.51}$$
and
$$\omega = \frac{m}{2\hbar}\left(\frac{x}{t}\right)^{2}. \tag{9.52}$$
If we place ourselves far from the origin, $x \gg \lambda$, the propagator oscillates in x and t with a wavelength $\lambda$ and a frequency $\omega$, which are both nearly constant. If the particle, emitted at the origin at t = 0, is detected at point x at time t, its velocity is $v = x/t$, its momentum is $p = mv = m(x/t)$, and its kinetic energy is $E_k = m(x/t)^{2}/2$.13 We therefore obtain the de Broglie relation between the wavelength of the propagator and the momentum of the free particle
$$\lambda = \frac{h}{p}. \tag{9.53}$$
Similarly, the kinetic energy $E_k = mv^{2}/2$ of the free particle is related to the frequency $\omega$ of the wave by the relation $\omega = (m/2\hbar)(x^{2}/t^{2})$; i.e.,
$$E_k = \hbar\omega. \tag{9.54}$$
These de Broglie–Einstein relations (9.53) and (9.54) will appear later as being
in agreement with the definition of energy and momentum in the classical limit.
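These relations can be illustrated numerically by differentiating the phase $\phi(x,t) = m x^{2}/(2\hbar t)$ with finite differences. The sketch below is an example under stated assumptions: the particle is taken to be an electron detected 1 cm from the origin after 1 µs, values chosen only for illustration, and the constants are standard SI values not quoted in the text.

```python
import math

HBAR = 1.054571817e-34      # reduced Planck constant, J s
H = 2 * math.pi * HBAR      # Planck constant, J s
M_E = 9.1093837015e-31      # electron mass, kg (illustrative choice of particle)


def phase(x, t, m=M_E):
    """Phase of the free propagator, phi = m x^2 / (2 hbar t)."""
    return m * x * x / (2 * HBAR * t)


x, t = 1.0e-2, 1.0e-6       # detection point (m) and time (s), illustrative
dx, dt = 1.0e-9, 1.0e-12    # finite-difference steps

k = (phase(x + dx, t) - phase(x - dx, t)) / (2 * dx)          # k = d(phi)/dx
omega = -(phase(x, t + dt) - phase(x, t - dt)) / (2 * dt)     # omega = -d(phi)/dt

wavelength = 2 * math.pi / k
momentum = M_E * x / t
kinetic_energy = 0.5 * M_E * (x / t) ** 2

print(wavelength, H / momentum)
print(HBAR * omega, kinetic_energy)
```

Both printed pairs coincide to within the finite-difference error, which is the content of (9.53) and (9.54).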
13 See, for instance, [11], Chap. 2, Sect. 6, for a discussion of this point.
In the book by Feynman and Hibbs, there are several examples and calculations
of this type. We shall not elaborate further on this aspect.
The physical content of this formula is important. The amplitude $\psi(x, t)$ for the particle to arrive at (x, t) is the sum over all possible values of an intermediate point $x'$ of the product of the total amplitude $\psi(x', t')$ and the amplitude $K(x, t; x', t')$ to go from $(x', t')$ to (x, t).
In other words (we intentionally keep the enthusiastic presentation of Feynman),
the effect of all the past history of a particle is contained in a single function ψ(x, t).
One can forget everything one knows about the past history of a particle. If one
knows its wave function at a given time t, one can calculate and “read” in it all that
can happen to the particle in the future.15
In fact, Eq. (9.58) is nothing else than the modern expression of the Huygens–
Fresnel principle in optics (see, for instance, Born and Wolf [14], Chap. VIII),
which founded wave optics. The Huygens principle, given in 1690, was that “Each
infinitesimal element of a wave front can be considered as a secondary perturbation
which radiates spherical wavelets. The wave front at a later time is the envelope of
these wavelets”. Fresnel completed this principle later, in 1818, by postulating that
the secondary wavelets “are in mutual interference”. The fundamental principles of
wave optics were stated.
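We have abundantly treated the case of a free particle above. The propagator can be calculated with no difficulty:
$$\mathcal{K}(x_2, t_2, x_1, t_1) = \sqrt{\frac{m}{2\pi i\hbar(t_2 - t_1)}}\;\exp\left[\frac{im}{2\hbar}\,\frac{(x_2 - x_1)^{2}}{(t_2 - t_1)}\right]. \tag{9.59}$$
$$i\hbar\,\frac{\partial\psi(x,t)}{\partial t} = -\frac{\hbar^{2}}{2m}\,\frac{\partial^{2}\psi(x,t)}{\partial x^{2}}. \tag{9.60}$$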
As in the case of a particle in a potential, which we examine in the next paragraph,
we will not further pursue the analysis, which is completely analogous to the usual
analysis of Schrödinger theory. One can refer to the book by Feynman and Hibbs
[25] for all details.
15 Feynman added, with his legendary sense of humor, “The effect of the entire History on the future
of the universe could be obtained from a single gigantic wave function.”
We have already dealt, at the beginning, with the case of the Schrödinger equation for a particle placed in a potential V(x). We can easily consider a potential V(x, t) that varies with time (this is done in the book by Feynman and Hibbs [25]).
We will not pursue the analysis of quantum mechanics in general; one can refer to the book by Feynman and Hibbs.
Let us note only the following points.
1. The theory of observables and their algebraic properties follows; we shall see two important cases: the Hamiltonian and the momentum.
2. There are significant technical and conceptual simplifications in the design and treatment of perturbation theory.
3. The resulting formalism extends much more easily to several particles.
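It is a simple calculation to extend what we have just done to situations that differ in the number of variables. In three dimensions, we obtain the Schrödinger equation
$$i\hbar\,\frac{\partial\psi(\mathbf{r},t)}{\partial t} = -\frac{\hbar^{2}}{2m}\nabla^{2}\psi(\mathbf{r},t) + V(\mathbf{r})\,\psi(\mathbf{r},t). \tag{9.61}$$
These equations are written in the form
$$i\hbar\,\frac{\partial\psi}{\partial t} = \hat H\psi \tag{9.62}$$
with
$$\hat H = -\frac{\hbar^{2}}{2m}\nabla^{2} + V. \tag{9.63}$$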
It is of course possible to extend this result to situations where the interaction is different. The same calculation is a little longer in the well-known case of a particle of charge q placed in a magnetic field B and an electric field E deriving from the potentials A, Φ, and subject to the Lorentz force, where the Lagrangian is
$$L = \frac{m\dot{\mathbf{r}}^{2}}{2} + q\,\dot{\mathbf{r}}\cdot\mathbf{A}(\mathbf{r},t) - q\,\Phi(\mathbf{r},t). \tag{9.64}$$
The three-dimensional calculation is similar to the previous one, with some complications that are treated by L. S. Schulman ([27] Chap. 4), who refers to what is said by Feynman himself in his original article [26]. It leads to the well-known Schrödinger equation:
$$i\hbar\,\frac{\partial\psi(\mathbf{r},t)}{\partial t} = \frac{1}{2m}\left(\frac{\hbar}{i}\nabla - q\mathbf{A}\right)\cdot\left(\frac{\hbar}{i}\nabla - q\mathbf{A}\right)\psi + q\,\Phi\,\psi. \tag{9.65}$$
Let us emphasize that this form comes from the classical Lagrangian and the technique of path integrals. We have not introduced the conjugate momentum $\mathbf{p} = (\hbar/i)\nabla$, whose definition comes later in Feynman’s presentation.
The wave equation (9.62) means that if ψ(x, t) is a probability amplitude, then there
is conservation of probability.
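First we see that if f and g are square integrable, then
$$\int_{-\infty}^{+\infty}(\hat H g)^{*} f\,dx = \int_{-\infty}^{+\infty} g^{*}(\hat H f)\,dx \tag{9.67}$$
$$\frac{\partial\psi^{*}}{\partial t} = +\frac{i}{\hbar}\,(\hat H\psi^{*}). \tag{9.69}$$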
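Therefore
$$\int_{-\infty}^{+\infty}\frac{\partial\psi^{*}}{\partial t}\,\psi\,dx = -\int_{-\infty}^{+\infty}\psi^{*}\,\frac{\partial\psi}{\partial t}\,dx \tag{9.70}$$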
This result can be obtained directly from the starting hypotheses of Feynman, and his propagator. If f is the wave function at time $t_a$, then the integral of its modulus squared is the same at time $t_b$:
$$\text{if}\quad \psi(b) = \int_{-\infty}^{+\infty}K(b,a)\,f(a)\,dx_a,\quad\text{then}\quad \int_{-\infty}^{+\infty}\psi^{*}(b)\,\psi(b)\,dx_b = \int_{-\infty}^{+\infty}\psi^{*}(a)\,\psi(a)\,dx_a$$
$$\frac{f'(t)}{f(t)} = \frac{-i}{\hbar}\,\frac{\hat H\phi(x)}{\phi(x)}. \tag{9.73}$$
The equality of two functions, one of x, the other of t, means that they are both equal to a constant, which we set equal to $-(i/\hbar)E$. This particular solution is therefore of the form
$$\psi(x,t) = \phi(x)\,e^{-iEt/\hbar},\qquad \hat H\phi(x) = E\,\phi(x).$$
In particular, this means that the wave function oscillates at any point in space with the same well-defined frequency. We have seen in (9.54) that the frequency at which the phase oscillates corresponds to the energy. The system therefore has a well-defined energy.
The presence probability of the particle at a point is given by the modulus squared $|\psi(x)|^{2}$ of the wave function at that point. This probability is invariant in time; the system is in a stationary state.
By defining the momentum operator, we open the way to all traditional observables, functions of position and momentum.
One way to measure the momentum of a particle (or any other quantity) is to organize a process of position measurements that tells us about this quantity.
Let us consider a particle in one dimension, initially localized near the origin, and proceed as follows.
We observe where the particle ends up after a time t = T. If the position is y, then the particle velocity is y/T and the momentum is p = my/T. In fact, to have a precise determination of the probability law of the momentum, that is, the probability that the value of the momentum is between p and p + dp, it is necessary to measure the probability P(y) dy that the free particle, that is, a particle disconnected from any interaction, ends up between y and y + dy. We then determine p from the measurement of y by p = my/T.
Let f(x) be the particle wave function at t = 0. We want to express P(p) directly from f(x).
The probability amplitude that the particle will arrive at y at time t = T is
$$\psi(y,T) = \int_{-\infty}^{+\infty}K_0(y,T;x,0)\,f(x)\,dx. \tag{9.75}$$
If we put the value of the free propagator (9.31) in this expression, we get
$$\psi(y,T) = \left(\frac{m}{2\pi i\hbar T}\right)^{1/2}\exp\left\{\frac{imy^{2}}{2\hbar T}\right\}\int_{-\infty}^{+\infty}\exp\left\{\frac{im(-2yx + x^{2})}{2\hbar T}\right\}f(x)\,dx. \tag{9.76}$$
The modulus squared of this expression gives the probability that the particle is between y and y + dy, that is, in the limit $T \to \infty$, the probability that the momentum is between p and p + dp:
$$P(y)\,dy = \frac{m\,dy}{2\pi\hbar T}\left|\int_{-\infty}^{+\infty}\exp\left\{\frac{im(-2yx + x^{2})}{2\hbar T}\right\}f(x)\,dx\right|^{2} = P(p)\,dp\quad\text{for } T\to\infty. \tag{9.77}$$
Starting from a wave function $\psi(x, t)$, we can therefore construct the probability amplitude $\phi(p, t)$ such that $P(p, t) = |\phi(p, t)|^{2}$:
$$\phi(p,t) = \int_{-\infty}^{+\infty}\exp\left\{\frac{-ipx}{\hbar}\right\}\psi(x,t)\,\frac{dx}{(2\pi\hbar)^{1/2}}. \tag{9.79}$$
The two amplitudes are Fourier transforms of each other. The inverse Fourier transform of (9.79) is
$$\psi(x,t) = \int_{-\infty}^{+\infty}\exp\left\{\frac{ipx}{\hbar}\right\}\phi(p,t)\,\frac{dp}{(2\pi\hbar)^{1/2}}. \tag{9.81}$$
The isometry of the Fourier transformation means that if $f_1(x)$ and $f_2(x)$ are transforms of $g_1(p)$ and $g_2(p)$, then
$$\int_{-\infty}^{+\infty}f_1^{*}(x)\,f_2(x)\,dx = \int_{-\infty}^{+\infty}g_1^{*}(p)\,g_2(p)\,dp, \tag{9.82}$$
so that
$$\int_{-\infty}^{+\infty}|\psi(x,t)|^{2}\,dx = \int_{-\infty}^{+\infty}|\phi(p,t)|^{2}\,dp = 1, \tag{9.83}$$
φ( p, t) and ψ(x, t) are both probability amplitudes, and either can represent the
state of a particle and its evolution.
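The Fourier relation (9.79) and the normalization (9.83) can be illustrated on a Gaussian wave packet. The sketch below is an example under stated assumptions: units $\hbar = 1$, a packet of width $\sigma = 1$ and mean momentum $p_0 = 1.5$ chosen arbitrarily, and simple Riemann sums on finite grids in place of the integrals. All names are illustrative.

```python
import cmath
import math

HBAR = 1.0     # natural units (assumption of this sketch)
SIGMA = 1.0    # position width of the packet (illustrative)
P0 = 1.5       # mean momentum of the packet (illustrative)


def psi(x):
    """Normalized Gaussian wave packet with mean momentum P0."""
    norm = (2 * math.pi * SIGMA**2) ** (-0.25)
    return norm * cmath.exp(-x * x / (4 * SIGMA**2) + 1j * P0 * x / HBAR)


dx = dp = 0.05
xs = [-10.0 + j * dx for j in range(401)]
ps = [-8.0 + j * dp for j in range(321)]
psi_values = [psi(x) for x in xs]


def phi(p):
    """Momentum amplitude (9.79), approximated by a Riemann sum over xs."""
    total = sum(cmath.exp(-1j * p * x / HBAR) * f for x, f in zip(xs, psi_values))
    return total * dx / math.sqrt(2 * math.pi * HBAR)


phi_values = [phi(p) for p in ps]

norm_x = sum(abs(f) ** 2 for f in psi_values) * dx
norm_p = sum(abs(a) ** 2 for a in phi_values) * dp
mean_p = sum(p * abs(a) ** 2 for p, a in zip(ps, phi_values)) * dp

print(norm_x, norm_p, mean_p)
```

Both norms come out equal to one, and the mean of $|\phi(p)|^{2}$ recovers the momentum built into the phase of the packet.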
Feynman and Hibbs present several examples, such as cases of diffraction, where
each time the Heisenberg inequality is satisfied. They do not give a proof of the
relationship itself, which, if violated, would mark the end of quantum mechanics.
This inequality is a very simple consequence of Fourier analysis applied to the probability amplitudes $\psi(x, t)$ and $\phi(p, t)$ that we have just obtained above.
Both from the conceptual and the technical points of view, the method of Feynman
path integrals has an undeniable elegance and richness. We have mentioned that it
extends to many other physical problems such as quantum field theory, Brownian
motion, polarons, spin physics, statistical mechanics, and critical phenomena, as one
can see in the book of Schulman [27]. This book contains, in particular, a very
pleasant discussion of quantum mechanics in curved spaces. We end this chapter
with a series of remarks that the present results have induced after going through the
previous five chapters of this book.
There is no hierarchical relationship between the depth of the various approaches
and different chapters of physics, neither do we wish to discuss any axiomatics of
physics. It is a personal matter of taste to prefer such and such a line of thought.
What is interesting here is to see the unifying character of what we have discussed,
from the Fermat principle up to the Feynman path integrals.
and suppose the classical action S(b, a) is macroscopic, i.e., it is much larger than the Planck constant $\hbar$. Consider the contribution of several paths that can perfectly well be close to each other in the classical sense but whose actions differ by much more than $\hbar$. The contributions of these paths to the phase will be completely different (and very difficult to determine with an accuracy better than, say, $\pi$). With great probability, they will interfere destructively. If one considers the set of all those paths, their total contribution to the integral will vanish.
However, in the vicinity of the classical trajectory $x_{cl}(t)$, the action $S_{cl}(b, a)$ is stationary. Therefore, paths that are sufficiently close to the classical trajectory will give contributions that will interfere in a constructive way. Only those paths along which the action S(b, a) is sufficiently close to the classical action $S_{cl}(b, a)$ will contribute, the difference being noticeably smaller than the unit of action $\hbar$. Notice that for all processes involving macroscopic values of the action, this quantity will be larger than, say, $10^{25}$ to $10^{30}\,\hbar$.
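To get a feeling for these orders of magnitude, the sketch below evaluates the free-particle classical action $S_{cl} = m\,(\Delta x)^{2}/(2\,\Delta t)$ in units of $\hbar$ for two cases. The numerical values (a 1 g object moving 1 cm in 1 s, and an electron crossing 1 Å in $10^{-16}$ s) are illustrative assumptions, not taken from the text.

```python
HBAR = 1.054571817e-34  # reduced Planck constant, J s


def free_action_over_hbar(mass, distance, duration):
    """Classical free-particle action m d^2 / (2 t), expressed in units of hbar."""
    return 0.5 * mass * distance**2 / duration / HBAR


# A 1 g object moving 1 cm in 1 s (illustrative macroscopic process)
macroscopic = free_action_over_hbar(1.0e-3, 1.0e-2, 1.0)

# An electron crossing 1 angstrom in 1e-16 s (illustrative atomic-scale process)
atomic = free_action_over_hbar(9.1093837015e-31, 1.0e-10, 1.0e-16)

print(f"macroscopic: S/hbar ~ {macroscopic:.1e}")
print(f"atomic:      S/hbar ~ {atomic:.2f}")
```

In the macroscopic case the phase $S/\hbar$ is so large that a minute change of path shifts it by many multiples of $2\pi$, whereas at the atomic scale the action is comparable to $\hbar$ and many paths contribute coherently.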
In other words, under these conditions, the only appreciable contribution will
come from an infinitesimal vicinity of the classical trajectory that cannot be resolved
experimentally. Consequently, the “probability” of the classical trajectory is equal to
one. The probability for any trajectory that can be distinguished from the classical
one vanishes.
Therefore, classical mechanics appears here as the limit of quantum mechanics
for macroscopic actions. Of course, one may wonder about the fact that Feynman’s
starting point involves the classical action in (9.84), which means that some care
should be taken with the previous assertion. However, from the very beginning,
Feynman operates in space-time (x, t). Therefore, all quantities defined in Sect.
9.3.1 (i.e., (x, ẋ, t), the Lagrangian L, and the action S) are perfectly well-defined
quantities, even though they do not have to possess any intuitive meaning.
This approach removes one of the sometimes confusing aspects of traditional
quantum mechanics, which tends to make us live in abstract spaces of infinite dimension.
In addition, in theoretical physics, this approach makes it possible to deal with
problems directly in the space-time of Einstein, Lorentz and Poincaré. Special rel-
ativity can therefore be easily incorporated into calculations. On this subject, it is
interesting to see that Feynman does not hesitate to take historical examples such as
the Lamb shift of the levels of the hydrogen atom, which was long considered as one
of the summits of relativistic quantum theory.
It is thus understandable that, while this approach to quantum mechanics remains
on the sidelines of the teaching of elementary theory, it has for many years
been a necessary step in the quantum field theory of fundamental interactions. The
fundamental structures, the gauge groups, which are the basis, for example, of the unified
theory of electro-weak interactions, are far easier to handle with path integrals.
And of course, this theory applies to an impressive number of other fields, in physics
as well as in mathematics and engineering, as can be seen in Lawrence S. Schulman’s
book [27].
compatible limits. This dream had first been glimpsed by Dirac and many people
took part in the adventure.
Is it really the case? Not quite. We stumble upon one of the most difficult subjects
to unravel in the birth of quantum mechanics at the beginning of the 20th century:
spin 1/2. This quantum physical quantity remained a mystery for more than
twenty-five years, between Zeeman’s 1896 measurements of atomic levels splitting
into even numbers of components and the final solution given by Uhlenbeck,
Goudsmit and Pauli in 1925-26. Spin was an enigma precisely because it does not
correspond to any intuitive notion of the world around us. And yet, it is a fundamental physical quantity,
without which we would not understand the structure of matter. At the end of his
book, Feynman wrote “The path integrals suffer from a terrible defect. They do not
allow discussion of spin operators or other operators of this type in a simple and
lucid way.”
Indeed, spin is not an intuitive quantity! Of course, Feynman16 has done this,
attempting to harmonize integration formulas with quaternions, but the lack of com-
mutativity of these numbers is a serious complication. The optimism of great minds
is always fascinating.17
In addition, a current topic of great interest is entangled states, which are the
basis of an impressive amount of research and technological discoveries in quantum
information, including quantum cryptography and the Holy Grail of the quantum
computer. But we do not see this topic appearing together with a label of path integral.
In states called “GHZ”,18 there is not only a physical quantity that is out of our
intuitive perception, but physical states whose measurement leads to paradoxical
results in the same way as Schrödinger’s cat.19
remarkable analysis of Alexander Altland and Ben D. Simons, Condensed Matter Field Theory,
Cambridge University Press (2010), Chap. 3, 134.
18 D. M. Greenberger, M. Horne, and A. Zeilinger, in Bell’s Theorem, Quantum Theory, and Con-
quantum nonlocality in three photon GHZ entanglement, Nature, 403 (6769), 515 (2000); Jian-Wei
Pan and Anton Zeilinger, (2002) Multi-Photo Entanglement and Quantum Non-Locality https://
vcq.quantum.at/fileadmin/Publications/2002-12.pdf.
(one can also vary the Lagrangian itself). The fact that the amplitude corresponding
to the propagator is stationary to first order in the variation of the $x_i(t)$ implies the
classical equation
$$\delta \int_{t_a}^{t_b} L\,dt = 0;$$
Feynman’s principle can also be compared with what is called the stationary
phase method in analysis. Consider the limit for $\mu \to \infty$ of the integral
$$G(\mu) = \int_{-\infty}^{\infty} e^{i\mu f(t)}\,dt. \tag{9.85}$$
For large values of $\mu$, the phase of $e^{i\mu f(t)}$ varies very rapidly unless $f'(t) = 0$.
Therefore, the dominant contributions to the integral will come from values of $t$ for
which $f'(t)$ vanishes. If $f'(t)$ vanishes at a single point $t_0$, we can expand $f$ as a
power series in the vicinity of $t_0$; i.e.,
$$G(\mu) = \int_{-\infty}^{\infty} e^{\,i\mu\left[f(t_0) + \frac{1}{2}(t-t_0)^2 f''(t_0) + \cdots\right]}\,dt. \tag{9.86}$$
If one neglects higher-order terms in the expansion, one obtains the result
$$G(\mu) = \sqrt{\frac{2\pi i}{\mu f''(t_0)}}\; e^{i\mu f(t_0)}, \tag{9.87}$$
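The leading-order estimate (9.87) can be checked numerically. The sketch below is only an illustration under stated assumptions: the phase $f(t) = t^2$, the value $\mu = 1000$ and the truncation of the integral to $[-1, 1]$ are choices made here, not taken from the text, and the function names are hypothetical. The truncation leaves endpoint terms of relative order $1/\sqrt{\pi\mu}$, so agreement is expected only at the few-percent level.

```python
import cmath
import math

import numpy as np


def stationary_phase(mu, f0, f2):
    """Leading-order estimate (9.87): sqrt(2*pi*i / (mu*f''(t0))) * exp(i*mu*f(t0))."""
    return cmath.sqrt(2j * math.pi / (mu * f2)) * cmath.exp(1j * mu * f0)


def direct_integral(mu, f, lo=-1.0, hi=1.0, n=400_001):
    """Trapezoidal estimate of the integral of exp(i*mu*f(t)) over [lo, hi]."""
    t = np.linspace(lo, hi, n)
    y = np.exp(1j * mu * f(t))
    h = (hi - lo) / (n - 1)
    return h * (y.sum() - 0.5 * (y[0] + y[-1]))


mu = 1000.0                      # assumed "large" parameter
f = lambda t: t**2               # assumed phase: f'(0) = 0, f(0) = 0, f''(0) = 2
g_num = direct_integral(mu, f)
g_sp = stationary_phase(mu, f0=0.0, f2=2.0)
rel_err = abs(g_num - g_sp) / abs(g_sp)
print(g_num, g_sp, rel_err)
```

Increasing `mu` (with a correspondingly finer grid) should shrink `rel_err`, which is the sense in which only the neighbourhood of the stationary point contributes.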
9.8 Exercises
$$L = \frac{1}{2} m\dot{x}^2 - \frac{1}{2} m\omega^2 x^2.$$
Exercises of Chapter 2
The variational principle here consists in imposing that the energy losses by Joule
heating are as small as possible. In other words, we want to find the minimum value
of
$$W = R_1 I_1^2 + R_2 I_2^2 \quad \text{with the constraint} \quad I_1 + I_2 = I.$$
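A minimal sketch of this constrained minimization, assuming illustrative resistances and total current that do not come from the text. Making $W - \lambda(I_1 + I_2 - I)$ stationary gives $R_1 I_1 = R_2 I_2$, hence $I_1 = I R_2/(R_1 + R_2)$; the brute-force scan is only an independent check, and the function names are hypothetical.

```python
def split_current(r1, r2, total):
    """Currents minimising W = r1*i1**2 + r2*i2**2 under i1 + i2 = total.

    Stationarity of W - lam*(i1 + i2 - total) gives 2*r1*i1 = 2*r2*i2 = lam,
    i.e. equal voltage drops across the two branches.
    """
    i1 = total * r2 / (r1 + r2)
    return i1, total - i1


def joule_power(r1, r2, i1, i2):
    """Dissipated power W for a given splitting of the current."""
    return r1 * i1**2 + r2 * i2**2


# Illustrative values (assumptions, not from the text).
r1, r2, total = 2.0, 3.0, 10.0
i1, i2 = split_current(r1, r2, total)
w_min = joule_power(r1, r2, i1, i2)

# Brute-force scan over admissible splittings as an independent check.
scan = [joule_power(r1, r2, k * 0.001, total - k * 0.001) for k in range(10001)]
print(i1, i2, w_min, min(scan))
```

The minimum reproduces the usual law for resistors in parallel: the dissipated power equals $R_{eq} I^2$ with $R_{eq} = R_1 R_2/(R_1 + R_2)$.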
© The Editor(s) (if applicable) and The Author(s), under exclusive license to 217
Springer Nature Switzerland AG 2023
J.-L. Basdevant, Variational Principles in Physics,
https://doi.org/10.1007/978-3-031-21692-3
218 Solutions of the Problems and Exercises
$$V = \int_0^a \mu g z \sqrt{1 + \dot{z}(x)^2}\,dx. \tag{A.1}$$
$$1 + \dot{z}^2 = z\ddot{z}, \tag{A.2}$$
$$z = c \cosh((x - x_0)/c)$$
where the parameters $c$ and $x_0$ are determined by the constraints $z(0) = z_0$, $z(a) = z_1$,
and the length of the string $L = \int_0^a \sqrt{1 + \dot{z}(x)^2}\,dx$.
The minimum is located in the interval x ∈ [0, a] according to the relative posi-
tions of the endpoints.
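As a quick consistency check, one can verify by direct differentiation that the hyperbolic cosine quoted above does satisfy the differential equation $1 + \dot{z}^2 = z\ddot{z}$ of this solution:

```latex
\begin{aligned}
z(x) &= c\cosh\frac{x-x_0}{c}, \qquad
\dot z = \sinh\frac{x-x_0}{c}, \qquad
\ddot z = \frac{1}{c}\cosh\frac{x-x_0}{c},\\
1+\dot z^{2} &= \cosh^{2}\frac{x-x_0}{c} = z\,\ddot z .
\end{aligned}
```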
In this exercise one can see that by using the technique of Lagrange multipliers,
which we defined in Sect. 2.4.3, the problem can be cast as a translation-invariant
problem along the z axis since the length L is an intrinsic quantity of the string.
Consider the interval {z, z + dz} and let r (z) be the radius of a transverse section of
the surface. We want to minimize the energy
$$A = \int_{-h}^{h} 2\pi r \sqrt{1 + \dot{r}^2}\,dz$$
This surface, which is rotation invariant around the z axis, bears the sweet name of
a catenoid (Fig. A.1).
One can attempt to determine shapes of bubbles attached to more complicated
structures. (In general this must be done numerically.)
and therefore a φ̇(z) = 1 and the solution r (z) = a cosh((z − z 0 )/a). This is a par-
ticular case of the use of conserved quantities discussed in Chap. 3.
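To make the solution $r(z) = a\cosh((z - z_0)/a)$ concrete, the scale $a$ has to be fixed by the boundary. The sketch below assumes, purely for illustration, that the film is attached to two identical coaxial rings of radius `ring_radius` at $z = \pm h$, so that $z_0 = 0$ and $a\cosh(h/a) = R$. The ring radius, the function name and the choice of the larger root (usually identified with the stable film) are assumptions made here, not statements from the text.

```python
import math


def catenoid_scale(ring_radius, half_height, tol=1e-12):
    """Solve a*cosh(h/a) = R for the larger root a by bisection (z0 = 0 by symmetry)."""
    h, big_r = half_height, ring_radius

    def g(a):
        return a * math.cosh(h / a) - big_r

    # a*cosh(h/a) is smallest where (h/a)*tanh(h/a) = 1, i.e. h/a ~ 1.19968.
    a_min = h / 1.19968
    if g(a_min) > 0:
        raise ValueError("rings too far apart: no catenoid spans them")

    lo, hi = a_min, big_r        # g(lo) <= 0 and g(hi) = R*(cosh(h/R) - 1) > 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


# Illustrative geometry (assumption): rings of radius 2 at z = -1 and z = +1.
a = catenoid_scale(ring_radius=2.0, half_height=1.0)
print(a, a * math.cosh(1.0 / a))
```

When the rings are pulled apart beyond roughly $h/R \approx 0.66$, the equation has no solution and the function raises an error, which mirrors the fact that no catenoid then spans the two rings.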
We must minimize
$$V = \int_0^a \mu g z \sqrt{1 + \dot{z}(x)^2}\,dx, \tag{A.3}$$
yields ż = sinh φ(x), i.e., μgz + λ = C cosh φ with C φ̇ = μg. The solution is
$$z = -\frac{\lambda}{\mu g} + \frac{C}{\mu g}\cosh\left(\frac{\mu g}{C}(x - x_0)\right). \tag{A.6}$$
The constants $x_0$, $C$, and $\lambda$ are fixed by the conditions $z(0) = z_0$, $z(a) = z_1$, and
$\int_0^a \sqrt{1 + \dot{z}(x)^2}\,dx = L$.
$$0 = \frac{d}{dx}\left(\frac{y'}{\sqrt{x(1 + (y')^2)}}\right).$$
6. We deduce
$$C = \frac{y'}{\sqrt{x(1 + (y')^2)}},$$
$$\frac{y'}{\sqrt{x(1 + (y')^2)}} = \frac{dy}{\sqrt{x(dx^2 + dy^2)}} = \frac{\dot{y}}{\sqrt{x(\dot{x}^2 + \dot{y}^2)}} = \frac{\dot{y}}{x\sqrt{2g\sin\alpha}} = C, \tag{A.7}$$
and therefore $\dot{y} = Kx$ with $K = C\sqrt{2g\sin\alpha}$.
7. The parametric form $x(\theta) = (1 - \cos 2\theta)/2C^2 = \sin^2\theta/C^2$, $y(\theta) = (2\theta - \sin 2\theta)/2C^2$
satisfies the equation $(y')^2 = C^2x/(1 - C^2x)$; i.e., $(dy/d\theta)^2 = (dx/d\theta)^2\tan^2\theta$.
From $\dot{y}/x = K$, we obtain $(dy/d\theta)(d\theta/dt)/x = K$; i.e., $d\theta/dt = K/2$
and $\theta = Kt/2$ since, for $t = 0$, $\theta = 0$.
8. The curve is a portion of a cycloid. We have $dy/dx = \tan\theta$ and therefore $y' \gg 1$
for $\theta \sim \pi/2$. The trajectory starts vertically ($dy/dx = 0$ for $\theta = 0$) and becomes
horizontal if $y(A) \gg x(A)$, as shown in Fig. 7.1.
9. Since point A is fixed, the velocity v A at A is fixed by energy conservation. It is
the maximum velocity of the skier. Therefore, the time to get horizontally from
y(A) to y(0) is larger than the time (y(A) − y(0))/v A it would take to cover this
distance at the maximum velocity. On the other hand, one must start vertically in
order to acquire the maximum velocity as quickly as possible. The ideal trajectory
comes from an optimization between these two effects (Fig. A.2).
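The claim in item 7 can be checked numerically: along the parametric curve, the quantity $y'/\sqrt{x(1 + (y')^2)}$ of item 6 should stay equal to the constant $C$. The sketch below is a hedged illustration; the value of $C$, the sample angles and the function names are assumptions, and the slope is obtained by central differences rather than analytically.

```python
import math


def cycloid(theta, c):
    """Parametric form of item 7: x = sin^2(theta)/C^2, y = (2*theta - sin(2*theta))/(2*C^2)."""
    x = math.sin(theta) ** 2 / c**2
    y = (2 * theta - math.sin(2 * theta)) / (2 * c**2)
    return x, y


def slope(theta, c, h=1e-6):
    """dy/dx along the curve, from central differences in theta."""
    x_lo, y_lo = cycloid(theta - h, c)
    x_hi, y_hi = cycloid(theta + h, c)
    return (y_hi - y_lo) / (x_hi - x_lo)


def first_integral(theta, c):
    """y' / sqrt(x*(1 + y'^2)), which item 6 states is the constant C."""
    x, _ = cycloid(theta, c)
    yp = slope(theta, c)
    return yp / math.sqrt(x * (1 + yp**2))


c = 0.7                                   # illustrative constant (assumption)
thetas = (0.3, 0.8, 1.2)
residuals = [abs(first_integral(th, c) - c) for th in thetas]
slope_errors = [abs(slope(th, c) - math.tan(th)) for th in thetas]
print(residuals, slope_errors)
```

The second list checks the relation $dy/dx = \tan\theta$ used in item 8.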
Exercises of Chapter 3
$$L = \frac{m_1 + m_2}{2}\dot{x}^2 + \frac{m_2}{2}\left(l^2\dot{\phi}^2 + 2l\dot{x}\dot{\phi}\cos\phi\right) + m_2 g l\cos\phi.$$
1. Free particle
$$S = \frac{m}{2}\,\frac{(x_2 - x_1)^2}{t_2 - t_1}.$$
2. Harmonic oscillator
$$S = \frac{m\omega}{2\sin\omega T}\left[(x_2^2 + x_1^2)\cos\omega T - 2x_2x_1\right].$$
3. Constant force
$$S = \left(\frac{m}{2}v_0^2 - Fx_1\right)(t_2 - t_1) - \frac{1}{3}\frac{F^2}{m}(t_2 - t_1)^3$$
5. One varies t2 , taking into account that the variation of the time of arrival yields a
variation of the trajectory.
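The closed forms of items 1 and 2 can be cross-checked by integrating the Lagrangian along the classical path. The sketch below assumes $T = t_2 - t_1$, the Lagrangian $L = \frac{1}{2}m\dot{x}^2 - \frac{1}{2}m\omega^2x^2$, and illustrative endpoint values; the function names and the Simpson quadrature are choices made here, not part of the text.

```python
import math


def classical_path(t, x1, x2, omega, T):
    """Position and velocity on the classical path from (x1, 0) to (x2, T)."""
    s = math.sin(omega * T)
    x = (x1 * math.sin(omega * (T - t)) + x2 * math.sin(omega * t)) / s
    v = omega * (-x1 * math.cos(omega * (T - t)) + x2 * math.cos(omega * t)) / s
    return x, v


def lagrangian(x, v, m, omega):
    return 0.5 * m * v * v - 0.5 * m * omega**2 * x * x


def action_numeric(x1, x2, m, omega, T, n=2000):
    """Simpson integration of L along the classical path (n must be even)."""
    h = T / n
    total = 0.0
    for k in range(n + 1):
        x, v = classical_path(k * h, x1, x2, omega, T)
        weight = 1 if k in (0, n) else (4 if k % 2 else 2)
        total += weight * lagrangian(x, v, m, omega)
    return total * h / 3


def action_closed_form(x1, x2, m, omega, T):
    """Item 2: S = m*omega/(2 sin(omega T)) * [(x1^2 + x2^2) cos(omega T) - 2 x1 x2]."""
    return m * omega / (2 * math.sin(omega * T)) * (
        (x1**2 + x2**2) * math.cos(omega * T) - 2 * x1 * x2)


# Illustrative values (assumptions).
m, omega, T, x1, x2 = 1.0, 2.0, 0.5, 0.3, 1.1
s_num = action_numeric(x1, x2, m, omega, T)
s_exact = action_closed_form(x1, x2, m, omega, T)

# For omega -> 0 the oscillator result should approach the free-particle action of item 1.
s_free = m * (x2 - x1) ** 2 / (2 * T)
s_small_omega = action_closed_form(x1, x2, m, 1e-4, T)
print(s_num, s_exact, s_free, s_small_omega)
```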
3.3. Brachistochrone
We want to minimize
$$T = \int_a^b \sqrt{\frac{1 + \dot{z}^2}{2g(\alpha - z)}}\,dx \tag{A.9}$$
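$$p_r = \frac{\partial L}{\partial\dot{r}} = m\dot{r}, \qquad p_\theta = \frac{\partial L}{\partial\dot{\theta}} = mr^2\dot{\theta}, \qquad p_\phi = \frac{\partial L}{\partial\dot{\phi}} = mr^2\sin^2\theta\,\dot{\phi}.$$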
3. Taking the derivative of (3.84) with respect to time, and taking into account that in
Cartesian coordinates p = mv, one obtains directly the result L z = mr 2 sin2 θφ̇ =
pφ .
4. The conservation of pφ , or L z , corresponds to the invariance under translation in
φ; i.e., rotation invariance around the z axis.
5. If a charged particle is in a magnetic field B parallel to Oz, there is rotational
invariance around the z axis and the component L z is conserved.
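$$\frac{d\Phi}{dx} = z'\frac{\partial\Phi}{\partial z} + z''\frac{\partial\Phi}{\partial z'}.$$
Consequently,
$$\frac{d}{dx}\left(\Phi - z'\frac{\partial\Phi}{\partial z'}\right) = 0,$$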
$$x = L\,\frac{w_0 z - w_1 z_0\ln(1 + (z/z_0))}{w_0 z_1 - w_1 z_0\ln(1 + (z_1/z_0))}, \tag{A.13}$$
If $z_1 \ll L$ and $z_1 \ll z_0$, the velocity of the wind does not vary appreciably over the
whole path, and one has $z' \sim z_1/L \ll 1$.
In the second question, we have seen that the optimal velocity for a constant wind
velocity is attained for $z' = 1$. The present configuration certainly does not
correspond to the best strategy. One must tack at some point $(x_1, Z)$ with $0 < x_1 < L$
and $Z \gg z_1$, as represented in Fig. A.3, in order to benefit fully from the power of
the wind (this possibility was excluded in the text).
The trajectory drawn with an angle of $\theta = 45$ degrees ($|z'| = 1$) and a tacking $\theta \to -\theta$
at $x = L/2$ has a total length $L\sqrt{2}$ and a velocity greater than $(w_0 - w_1)/2$. The
time along this path, $T_v = 2L\sqrt{2}/(w_0 - w_1)$, is obviously shorter than the time
along the path with no tacking, $T \sim 2L(z_1/L)/(w_0 - w_1) = 2z_1/(w_0 - w_1)$.
In realistic cases, for instance the America’s Cup, one can see how subtle the
strategy of a regatta problem is. Skippers must make quick decisive choices between
very different options.
Exercises of Chapter 4
$$\{L_x, L_y\} = L_z, \qquad \{L_y, L_z\} = L_x, \qquad \{L_z, L_x\} = L_y.$$
We obtain
$$\{A_x, A_y\} = -\frac{2}{m}HL_z, \qquad \{A_x, L_x\} = 0, \qquad \{A_x, L_y\} = A_z,$$
and cyclic permutations. Moreover, its Poisson bracket with the Hamiltonian vanishes.
This gives the impression of 7 conserved quantities in the Kepler or hydrogen
atom problems (two vectors and the energy). Actually these reduce to 5, given
that L · A = 0 (the vector A lies in the plane of the orbit) and that there is
a scalar relation between A, the coupling constant and the energy. This is called
“superintegrability”, since there are only 6 physical quantities in the problem: (r, p).
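These bracket relations can be checked numerically at an arbitrary phase-space point. The sketch below makes several assumptions that are not stated in this passage: the Hamiltonian is taken as $H = p^2/2m - k/r$, the Runge-Lenz vector is normalised as $\mathbf{A} = (\mathbf{p}\times\mathbf{L})/m - k\,\mathbf{r}/r$ (the normalisation for which the coefficient $-2H/m$ appears), and the bracket convention is $\{q_i, p_j\} = \delta_{ij}$. Derivatives are central finite differences, and all names and numerical values are hypothetical.

```python
import math

M, K = 1.3, 0.9  # hypothetical mass and coupling constant


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def ang_mom(s):
    """L = r x p for a phase-space point s = (x, y, z, px, py, pz)."""
    return cross(s[:3], s[3:])


def hamiltonian(s):
    r = math.sqrt(sum(q * q for q in s[:3]))
    return sum(p * p for p in s[3:]) / (2 * M) - K / r


def runge_lenz(s):
    """Assumed normalisation: A = (p x L)/M - K r/|r|."""
    q, p = s[:3], s[3:]
    r = math.sqrt(sum(c * c for c in q))
    pxl = cross(p, ang_mom(s))
    return tuple(pxl[i] / M - K * q[i] / r for i in range(3))


def component(vec_fn, i):
    return lambda s: vec_fn(s)[i]


def poisson(f, g, s, h=1e-5):
    """{f, g} = sum_i (df/dq_i dg/dp_i - df/dp_i dg/dq_i), by central differences."""
    def grad(fn):
        out = []
        for i in range(6):
            up, dn = list(s), list(s)
            up[i] += h
            dn[i] -= h
            out.append((fn(up) - fn(dn)) / (2 * h))
        return out
    df, dg = grad(f), grad(g)
    return sum(df[i] * dg[i + 3] - df[i + 3] * dg[i] for i in range(3))


state = [0.7, -0.4, 0.5, 0.3, 0.8, -0.6]   # arbitrary test point
lx, ly, lz = (component(ang_mom, i) for i in range(3))
ax, ay, az = (component(runge_lenz, i) for i in range(3))

# Each entry should vanish up to finite-difference error.
residuals = {
    "{Lx,Ly} - Lz": poisson(lx, ly, state) - lz(state),
    "{Ax,Ay} + (2/M) H Lz": poisson(ax, ay, state) + 2 / M * hamiltonian(state) * lz(state),
    "{Ax,Lx}": poisson(ax, lx, state),
    "{Ax,Ly} - Az": poisson(ax, ly, state) - az(state),
    "{Ax,H}": poisson(ax, hamiltonian, state),
    "L.A": sum(l * a for l, a in zip(ang_mom(state), runge_lenz(state))),
}
for name, value in residuals.items():
    print(name, value)
```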
The Runge-Lenz vector has played an incredible role in the history of mechan-
ics.1 Actually, in Newtonian dynamics, (Potential k/r ), the Lenz vector points to
the perihelion of planets. Einstein remarked that about the planet Mercury, since in
1 H. Goldstein, Prehistory of the Runge-Lenz Vector, American Journal of Physics, 43, 735 (1975).
$$H = \frac{P^2}{2m} + \frac{m\omega^2X^2}{2} + \frac{Q^2}{2m} + \frac{m(\omega^2 + \Omega^2)Y^2}{2}.$$
2. The eigenfrequencies of the system are therefore $\omega_1 = \omega$ and $\omega_2 = \sqrt{\omega^2 + \Omega^2}$.
3. The general form of the motion follows from
$$H = \frac{m}{2}(p_1^2 + p_2^2 + p_3^2) + \frac{m\omega^2}{2}(x_1^2 + x_2^2 + x_3^2) + \frac{3m\Omega^2}{2}(x_1^2 + x_2^2).$$
The canonical transformation (Jacobi variables) gives:
$$X_1 = (x_1 - x_2)/\sqrt{2}, \qquad X_2 = (2x_3 - x_1 - x_2)/\sqrt{6}, \qquad X_3 = (x_1 + x_2 + x_3)/\sqrt{3},$$
$$P_1 = (p_1 - p_2)/\sqrt{2}, \qquad P_2 = (2p_3 - p_1 - p_2)/\sqrt{6}, \qquad P_3 = (p_1 + p_2 + p_3)/\sqrt{3}.$$
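A short numerical check of why the Jacobi variables are convenient. The sketch assumes, as a hypothetical reading of the problem, that the three oscillators are coupled pairwise through a term proportional to $\sum_{i<j}(x_i - x_j)^2$; the transformation is then orthogonal and this sum collapses to $3(X_1^2 + X_2^2)$, leaving the in-phase coordinate $X_3$ untouched by the coupling.

```python
import math

S2, S3, S6 = math.sqrt(2), math.sqrt(3), math.sqrt(6)

# Rows give X1, X2, X3 in terms of (x1, x2, x3), as in the Jacobi variables above.
JACOBI = [
    [1 / S2, -1 / S2, 0.0],
    [-1 / S6, -1 / S6, 2 / S6],
    [1 / S3, 1 / S3, 1 / S3],
]


def to_jacobi(v):
    return [sum(row[i] * v[i] for i in range(3)) for row in JACOBI]


# Orthogonality: the Gram matrix of the rows should be the identity.
gram = [[sum(JACOBI[a][i] * JACOBI[b][i] for i in range(3)) for b in range(3)]
        for a in range(3)]

x = [0.4, -1.1, 2.3]                      # arbitrary test coordinates
X = to_jacobi(x)

# Assumed pairwise coupling (hypothetical): sum over i<j of (x_i - x_j)^2.
pair_sum = sum((x[i] - x[j]) ** 2 for i in range(3) for j in range(i + 1, 3))
reduced = 3 * (X[0] ** 2 + X[1] ** 2)
norm_x = sum(c * c for c in x)
norm_X = sum(c * c for c in X)
print(gram, pair_sum, reduced, norm_x, norm_X)
```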
The “path” in phase space of these three circular motions is not easy to mimic on
a two-dimensional graph. Notice that the mode for which the three initial oscillators
are in phase oscillates at the original angular frequency ω, which is not easily
guessed in the (x, ẋ) formulation.
2. In these variables, which are the same as those used by Dirac in the quantum
harmonic oscillator,
H = ω(a ∗ a).
ȧ = {a, H } = −iωa,
Therefore, we have
ȧ = {a, H } = −iωa − ib sin Ωt.
This is solved by standard techniques. With the condition E(t < 0) = 0, one
obtains
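$$E(t > T) = \omega b^2\left|\frac{e^{-i(\Omega-\omega)T} - 1}{2i(\Omega - \omega)} + \frac{e^{-i(\Omega+\omega)T} - 1}{2i(\Omega + \omega)}\right|^2.$$
$$E(t > T) = \omega b^2\,\frac{\sin^2\left((\Omega - \omega)T/2\right)}{(\Omega - \omega)^2},$$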
4.8. Problem
$$y_k = y_{N-k}^*, \qquad q_k = q_{N-k}^*.$$
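(b) We have
$$\sum_{k=1}^{N} y_k y_k^* = \sum_{k=1}^{N}\left(\frac{1}{\sqrt{N}}\sum_{n=1}^{N} e^{2ikn\pi/N}x_n\right)\left(\frac{1}{\sqrt{N}}\sum_{n'=1}^{N} e^{-2ikn'\pi/N}x_{n'}\right). \tag{A.14}$$
$$\sum_{k=1}^{N} q_k q_k^* = \sum_{n=1}^{N} p_n^2. \tag{A.15}$$
Similarly
$$\sum_{k=1}^{N} q_k q_k^* = \sum_{k=1}^{N}\left(\frac{1}{\sqrt{N}}\sum_{n=1}^{N} e^{-2ikn\pi/N}p_n\right)\left(\frac{1}{\sqrt{N}}\sum_{n'=1}^{N} e^{2ikn'\pi/N}p_{n'}\right). \tag{A.16}$$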
k=1
2m 2
with
$$\Omega_k^2 = \omega^2 + 4\Omega^2\sin^2\frac{k\pi}{N}.$$
(b) We have
$$\{y_j, q_k\} = \delta_{jk}, \quad \{y_j^*, q_k^*\} = \delta_{jk}, \quad \{y_j, q_{N-k}^*\} = \delta_{jk}, \quad \{y_j^*, q_{N-k}\} = \delta_{jk}. \tag{A.18}$$
(c) We obtain
$$\dot{y}_k = \{y_k, H\} = \frac{m}{2}(q_k^* + q_{N-k}) = mq_k^*,$$
$$\dot{y}_k^* = \{y_k^*, H\} = \frac{m}{2}(q_k + q_{N-k}^*) = mq_k,$$
$$\dot{q}_k = \{q_k, H\} = -\frac{m\Omega_k^2}{2}(y_k^* + y_{N-k}) = -m\Omega_k^2\,y_k^*,$$
$$\dot{q}_k^* = \{q_k^*, H\} = -\frac{m\Omega_k^2}{2}(y_k + y_{N-k}^*) = -m\Omega_k^2\,y_k.$$
(d) We therefore have {yk (t)} = ak cos(Ω k t + φk ), and hence {xn (t)}.
3. If, at time $t = 0$, we have $y_N(0) = 1$, $\dot{y}_N(0) = 0$ and $\{y_n(0) = 0, \dot{y}_n(0) = 0\}$,
$\forall n \neq N$, then $y_N(t) = \cos(\omega t)$ and $y_n(t) = 0$, $\forall n \neq N$. Therefore
$x_n(t) = (1/\sqrt{N})\cos(\omega t)$. Oscillators of the same amplitude at a given time are always in phase,
and only the global motion with respect to the plane $x = 0$ with frequency $\omega$
appears.
4. Wave propagation.
If $\omega = 0$, the eigenfrequencies are $\Omega_k = 2\Omega\sin(k\pi/N) \sim 2\Omega(k\pi/N)$ for $k \ll N$.
The boundary conditions give $y_1 = \cos(2\Omega\pi t/N)$, $y_{N-1} = \cos(2\Omega\pi t/N)$, and
$y_n = 0$ otherwise.
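(a) Therefore, we obtain
$$x_n = x_{N-n} = \frac{2}{\sqrt{N}}\cos\frac{2\Omega\pi t}{N}\cos\frac{2n\pi}{N} \tag{A.19}$$
$$= \frac{1}{\sqrt{N}}\left[\cos\frac{2\Omega\pi t + 2n\pi}{N} + \cos\frac{2\Omega\pi t - 2n\pi}{N}\right]. \tag{A.20}$$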
in the notation above. The point xn+m has the same amplitude at time t + m/Ω
as the point xn at time t.
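(c) If we write $x_n(t) = f(t, y = na)$, the function $f$ is
$$f(t, y) = \frac{1}{\sqrt{N}}\left[\cos\frac{2\Omega\pi t + 2y\pi/a}{N} + \cos\frac{2\Omega\pi t - 2y\pi/a}{N}\right]$$
$$\frac{1}{\Omega^2a^2}\frac{\partial^2f}{\partial t^2} - \frac{\partial^2f}{\partial y^2} = 0.$$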
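The eigenfrequencies $\Omega_k^2 = \omega^2 + 4\Omega^2\sin^2(k\pi/N)$ used in this problem can be cross-checked by diagonalizing the coupling matrix directly. The sketch assumes a closed ring of $N$ identical oscillators with nearest-neighbour coupling, i.e. a potential proportional to $\omega^2\sum_n x_n^2 + \Omega^2\sum_n (x_n - x_{n+1})^2$ with $x_{N+1} = x_1$; this periodic model, the function names and the numerical values are assumptions made for the illustration.

```python
import numpy as np


def ring_frequencies_squared(n, omega, big_omega):
    """Eigenvalues of the dynamical matrix of a ring of n coupled oscillators (n >= 3)."""
    shift = np.roll(np.eye(n), 1, axis=1)            # couples site j to site j+1 (periodic)
    dyn = (omega**2 + 2 * big_omega**2) * np.eye(n) - big_omega**2 * (shift + shift.T)
    return np.sort(np.linalg.eigvalsh(dyn))


def formula_frequencies_squared(n, omega, big_omega):
    """Omega_k^2 = omega^2 + 4 Omega^2 sin^2(k pi / n) for k = 1..n."""
    k = np.arange(1, n + 1)
    return np.sort(omega**2 + 4 * big_omega**2 * np.sin(k * np.pi / n) ** 2)


n, omega, big_omega = 8, 0.5, 1.2                    # illustrative values (assumptions)
numeric = ring_frequencies_squared(n, omega, big_omega)
analytic = formula_frequencies_squared(n, omega, big_omega)
print(numeric)
print(analytic)
```

The mode $k = N$ has $\Omega_N = \omega$: it is the global in-phase motion discussed in item 3.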
The proof takes some time, but it is straightforward (follow Sect. (3.3.2)).
$$\frac{dA}{dt} = \{A, H\} = \frac{p^2}{m} - \mathbf{r}\cdot\nabla V.$$
3. If $V = g\,r^n$, we have
$$\mathbf{r}\cdot\nabla V = r\frac{\partial V}{\partial r} = nV.$$
We therefore obtain $2\langle E_c\rangle = n\langle V\rangle$.
4. The total energy is $E = E_c + V$. We therefore obtain
(a) For a harmonic oscillator, $E = 2\langle E_c\rangle = 2\langle V\rangle$.
(b) For a Newtonian potential, $E = -\langle E_c\rangle = (1/2)\langle V\rangle$, which is obvious on a
circular trajectory, but holds for any elliptic trajectory.
5. In general, for an arbitrary potential, the orbits of bound states are not closed. How-
ever, they remain confined in a given region of space at any time. The generalization
of the averaging (5.62) is
$$\langle f\rangle = \lim_{T\to\infty}\frac{1}{T}\int_0^T f(t)\,dt.$$
since A(t) is bounded for any t. With this definition, the result remains true.
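The long-time average defined above lends itself to a direct numerical test of the relation between the averaged kinetic energy and $n$ times the averaged potential. The sketch below assumes a one-dimensional particle in the potential $V = g\,x^4$ (so $n = 4$), integrated with a velocity-Verlet scheme; the potential, the integrator, the step size and the function name are choices made for this illustration and do not come from the text. Because the averaging time is finite, agreement is expected only to within about a percent.

```python
def time_averages(n_power=4, g=1.0, m=1.0, x0=1.0, dt=0.005, steps=200_000):
    """Time averages of kinetic and potential energy for V = g*x**n_power (even n_power)."""
    def force(x):
        return -n_power * g * x ** (n_power - 1)

    x, v = x0, 0.0
    a = force(x) / m
    kin_sum = pot_sum = 0.0
    for _ in range(steps):
        # velocity-Verlet step
        v_half = v + 0.5 * dt * a
        x += dt * v_half
        a = force(x) / m
        v = v_half + 0.5 * dt * a
        kin_sum += 0.5 * m * v * v
        pot_sum += g * x**n_power
    return kin_sum / steps, pot_sum / steps


n_power = 4
kin_avg, pot_avg = time_averages(n_power=n_power)
ratio = 2 * kin_avg / (n_power * pot_avg)   # should approach 1 for long averaging times
print(kin_avg, pot_avg, ratio)
```

With the default values the total energy is 1, so the two averages should come out close to 2/3 and 1/3 respectively.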
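$$\mathcal{L} = \frac{3}{v^2}\frac{\partial\psi}{\partial t}\frac{\partial\psi^*}{\partial t} - \nabla\psi\cdot\nabla\psi^* - \frac{a^2}{2}\left(\psi^*\frac{\partial\psi}{\partial t} - \psi\frac{\partial\psi^*}{\partial t}\right), \tag{A.21}$$
where $\psi^*$ is the “mirror” density which concentrates instead of diffusing. This leads
to the propagation equation
$$\frac{3}{v^2}\frac{\partial^2\psi}{\partial t^2} - \Delta\psi + a^2\frac{\partial\psi}{\partial t} = 0. \tag{A.22}$$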
This equation can be solved by Fourier transformation if the coefficients v and a are
constants. (This is not the case if the medium is inhomogeneous or discontinuous.)
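The step from the Lagrangian density to the propagation equation can be spelled out. The derivation below assumes that (A.21) reads $\mathcal{L} = \frac{3}{v^2}\partial_t\psi\,\partial_t\psi^* - \nabla\psi\cdot\nabla\psi^* - \frac{a^2}{2}(\psi^*\partial_t\psi - \psi\,\partial_t\psi^*)$ and varies with respect to $\psi^*$, treating $\psi$ and $\psi^*$ as independent fields:

```latex
\begin{aligned}
\frac{\partial\mathcal L}{\partial\psi^*} &= -\frac{a^2}{2}\,\frac{\partial\psi}{\partial t},
\qquad
\frac{\partial\mathcal L}{\partial(\partial_t\psi^*)} = \frac{3}{v^2}\,\frac{\partial\psi}{\partial t} + \frac{a^2}{2}\,\psi,
\qquad
\frac{\partial\mathcal L}{\partial(\nabla\psi^*)} = -\nabla\psi,\\[4pt]
0 &= \frac{\partial}{\partial t}\frac{\partial\mathcal L}{\partial(\partial_t\psi^*)}
   + \nabla\cdot\frac{\partial\mathcal L}{\partial(\nabla\psi^*)}
   - \frac{\partial\mathcal L}{\partial\psi^*}
  = \frac{3}{v^2}\,\frac{\partial^2\psi}{\partial t^2} - \Delta\psi + a^2\,\frac{\partial\psi}{\partial t}.
\end{aligned}
```

The two halves of the antisymmetric term each contribute $\frac{a^2}{2}\partial_t\psi$, which is how the first-order time derivative of (A.22) arises from a variational principle.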
Exercises of Chapter 7
7.1. Geodesics
The calculation is similar to previous cases such as (2). We define the parameters
ω and γ as before:
$$\omega^2 = \frac{2E}{mR^2}, \qquad \gamma^2 = \frac{mA^2}{2ER^2}. \tag{A.24}$$
We obtain
$$\rho(t) = R\sqrt{1 + (1 - \gamma^2)\sinh^2\omega(t - t_0)}, \tag{A.25}$$
and
$$\tanh(\phi(t) - \phi_0) = \gamma\tanh\omega(t - t_0). \tag{A.26}$$
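$$ds^2 = \frac{R^2}{\rho^2 + R^2}\,d\rho^2 + \rho^2\,d\theta^2 + \rho^2\sin^2\theta\,d\phi^2, \tag{A.27}$$
Notice that one considers this metric as deriving from a “Lorentzian” metric,
if one changes the sign of $R$
$$ds^2 = dx^2 + dy^2 + dz^2 - dw^2, \tag{A.28}$$
$$w^2 = x^2 + y^2 + z^2 + R^2 = \rho^2 + R^2. \tag{A.29}$$
$$L = \frac{m}{2}\left(\frac{R^2}{\rho^2 + R^2}\,\dot{\rho}^2 + \rho^2\dot{\theta}^2 + \rho^2\sin^2\theta\,\dot{\phi}^2\right). \tag{A.30}$$
$$L = \frac{m}{2}\left(\frac{R^2}{\rho^2 + R^2}\,\dot{\rho}^2 + \rho^2\dot{\phi}^2\right). \tag{A.31}$$
$$\frac{d}{dt}(\rho^2\dot{\phi}) = 0 \implies \dot{\phi} = \frac{A}{\rho^2} \tag{A.32}$$
$$E = \frac{m}{2}\left(\frac{R^2}{\rho^2 + R^2}\,\dot{\rho}^2 + \frac{A^2}{\rho^2}\right). \tag{A.33}$$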
2. The solution of the problem is obtained rather easily. One defines the parameters
ω and γ as:
$$\omega^2 = \frac{2E}{mR^2} \quad \text{and} \quad \gamma^2 = \frac{mA^2}{2ER^2}. \tag{A.34}$$
The result is in a way similar to Example 2, except that hyperbolic functions replace
some trigonometric functions,
$$\rho(t) = R\sqrt{\gamma^2\cosh^2\omega(t - t_0) + \sinh^2\omega(t - t_0)} \tag{A.35}$$
and
$$\tan(\phi(t) - \phi_0) = \gamma\coth\omega(t - t_0). \tag{A.36}$$
We notice that the distance to the origin increases exponentially when $|t| \to \infty$.
The geodesics of the metric (A.27) are hyperbolas
$$x^2 - y^2/\gamma^2 = R^2. \tag{A.37}$$
$$L = \frac{m}{2}\left(\frac{R^2}{R^2 - \rho^2}\,\dot{\rho}^2 + \rho^2\dot{\theta}^2 + \rho^2\sin^2\theta\,\dot{\phi}^2\right). \tag{A.38}$$
The conservation laws of the problem which bring simplifications to the motion
are
1. There is rotational invariance. The angular momentum is conserved, and the motion
occurs on a plane.
2. We can choose the direction of the angular momentum as polar axis; i.e., θ = π/2
and θ̇ = 0.
3. The Lagrangian of the planar motion therefore reduces to
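$$L = \frac{m}{2}\left(\frac{R^2}{R^2 - \rho^2}\,\dot{\rho}^2 + \rho^2\dot{\phi}^2\right). \tag{A.39}$$
$$\frac{d}{dt}(\rho^2\dot{\phi}) = 0 \implies \dot{\phi} = \frac{A}{\rho^2}, \tag{A.40}$$
$$E = \frac{m}{2}\left(\frac{R^2}{R^2 - \rho^2}\,\dot{\rho}^2 + \frac{A^2}{\rho^2}\right). \tag{A.41}$$
$$A^2 \le \frac{2R^2E}{m}, \tag{A.42}$$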
which is a direct consequence of the fact that the energy is greater than the rota-
tional energy m A2 /2ρ2 . This is a consequence of (A.41); i.e., E ≥ m A2 /2ρ2 ≥
m A2 /2R 2 .
The Eqs. (A.40) and (A.41) are first-order differential equations that determine
the motion in terms of the constants of the motion E and A.
The solution is simple. We define parameters ω and γ by
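$$\omega^2 = \frac{2E}{mR^2} \quad \text{and} \quad \gamma^2 = \frac{mA^2}{2ER^2}. \tag{A.43}$$
From (A.42), we have the inequality
$$\gamma^2 \le 1. \tag{A.44}$$
Setting
$$\rho = R\cos(\omega\psi); \quad \text{i.e.,} \quad \dot{\rho} = -\omega\dot{\psi}R\sin(\omega\psi). \tag{A.45}$$
$$\omega^2 = \omega^2\dot{\psi}^2 + \frac{\omega^2\gamma^2}{\cos^2(\omega\psi)}; \tag{A.46}$$
i.e.,
$$\omega^2\dot{\psi}^2\cos^2(\omega\psi) = \omega^2(\cos^2(\omega\psi) - \gamma^2). \tag{A.47}$$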
and to the result, i.e. The expressions of ρ(t), tan(φ(t) − φ0 ) as well as the frequency
ω.
$$\rho(t) = R\sqrt{\cos^2\omega(t - t_0) + \gamma^2\sin^2\omega(t - t_0)}, \tag{A.51}$$
which is periodic and of frequency $\omega$. The calculation of the time evolution of the
azimuthal angle $\phi(t)$ is obtained by this expression and Eq. (A.40),
$$\dot{\phi} = \frac{A}{R^2(\cos^2\omega(t - t_0) + \gamma^2\sin^2\omega(t - t_0))}; \tag{A.52}$$
i.e.,
$$\tan(\phi(t) - \phi_0) = \gamma\tan\omega(t - t_0), \tag{A.53}$$
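The closed-form motion (A.51) and (A.53) can be checked against the two first integrals (A.40) and (A.41), namely $\rho^2\dot{\phi} = A$ and $E = \frac{m}{2}\left[R^2\dot{\rho}^2/(R^2 - \rho^2) + A^2/\rho^2\right]$. The sketch below is a hedged numerical illustration: it sets $t_0 = \phi_0 = 0$, picks arbitrary values of $m$, $R$, $E$ and $A$, and differentiates by central differences. Sample times are kept away from $\omega t = 0$ and $\omega t = \pi$, where $\rho = R$ and the energy expression becomes a $0/0$ limit.

```python
import math

# Illustrative constants (assumptions, not from the text).
M_, R_, E_, A_ = 1.0, 1.0, 0.5, 0.6
OMEGA = math.sqrt(2 * E_ / (M_ * R_**2))                 # (A.43)
GAMMA = math.sqrt(M_ * A_**2 / (2 * E_ * R_**2))         # (A.43), here gamma <= 1


def rho(t):
    """Radial coordinate (A.51) with t0 = 0."""
    u = OMEGA * t
    return R_ * math.sqrt(math.cos(u) ** 2 + GAMMA**2 * math.sin(u) ** 2)


def phi(t):
    """Azimuth from (A.53) with phi0 = 0, written with atan2 to stay continuous."""
    u = OMEGA * t
    return math.atan2(GAMMA * math.sin(u), math.cos(u))


def deriv(f, t, h=1e-5):
    return (f(t + h) - f(t - h)) / (2 * h)


def constants(t):
    """Return (rho^2 * phidot, energy) evaluated on the closed-form motion."""
    r, rdot, phidot = rho(t), deriv(rho, t), deriv(phi, t)
    area = r * r * phidot
    energy = 0.5 * M_ * (R_**2 * rdot**2 / (R_**2 - r**2) + area**2 / r**2)
    return area, energy


samples = [constants(t) for t in (0.4, 1.0, 2.0, 2.7)]
for area, energy in samples:
    print(area, energy)
```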
$$E = \frac{1}{2}m(\dot{x}^2 + \dot{y}^2) + V,$$
where the “effective potential” $V$ is energy dependent:
$$V = \frac{1}{2}m\,\frac{\rho^2\dot{\rho}^2}{R^2 - \rho^2} = \frac{E\rho^2}{R^2}.$$
$$\Omega^2 = \frac{2E}{mR^2}.$$
4. Of course, if the square of the velocity is a constant in the curved four-dimensional
space, this is not the case if one visualizes the phenomenon in a Euclidean plane,
as above.
5. The simplicity of the result is intuitive. Quite obviously, as one can see in the defini-
tion, the symmetry of the problem is much larger than the sole rotation in R3 . There
is a rotation invariance in R4 . The solutions of maximal symmetry correspond to a
238 References
21. S. Weinberg, Gravitation and Cosmology (John Wiley & Sons, New York, 1972)
22. P.A.M. Dirac, General Theory of Relativity (John Wiley & Sons, New York, 1975)
23. C.W. Misner, K.S. Thorne, J.A. Wheeler, Gravitation (W.H. Freeman and Company, New
York, 1973)
24. J. Rich, Fundamentals of Cosmology (Springer-Verlag, Heidelberg, 2001)
25. R.P. Feynman, A.R. Hibbs, Quantum Mechanics and Path Integrals (McGraw-Hill, New York,
1965)
26. L. Brown, Feynman’s Thesis: A New Approach to Quantum Theory (World Scientific, Singapore,
2005); contains Feynman’s Princeton thesis, The Principle of Least Action in Quantum
Mechanics (1942); R.P. Feynman, Space-Time Approach to Non-relativistic Quantum Mechanics,
Rev. Mod. Phys. 20, 367 (1948); and P.A.M. Dirac, The Lagrangian in Quantum Mechanics,
Physikalische Zeitschrift der Sowjetunion (1933)
27. L.S. Schulman, Techniques and Applications of Path Integration (John Wiley & Sons,
New York, 1981)
28. J. Schwinger, Selected Papers on Quantum Electrodynamics (Dover, New York, 1958)
29. S. Albeverio, R. Hoegh-Krohn, S. Mazzuchi, Mathematical Theory of Feynman Path Integrals,
Lecture Notes in Mathematics, vol. 523 (Springer-Verlag, 1976)
Index
F
Fermat, P. de, 3, 11, 39
Fermat principle, 3, 11, 98
Feynman principle, 185
Feynman, R.P., 185
Field equations, 107
Field theory, 105
First integrals, 18
Flow, 77, 89, 97
Flow of a vector field, 75
Fourier equation, 115

G
Galileo, G., 37
Gauge invariance, 53
Gauss, C.F., 119, 123
General relativity, 117
Generalized momentum, 44
Geodesics, 127
Geometrical and wave optics, 89
Gravitational
  deflection, 141
  lens, 148, 150
  microlensing, 151, 152
Gravitational lensing, 141, 145, 146
  by a cluster of galaxies, 145, 148
  time delay, 146
Gravitation and the curvature of space-time, 133
Guichardet, A., 50
GW150914, 164

H
Hamilton
  characteristic function, 89
Hamiltonian, 65, 76
Hamiltonian operator, 205
Hamilton-Jacobi
  equation, 92, 95
Hamilton, W.R., 5, 39, 65
Heat, 32

J
Jacobi
  theorem, 100
Jacobi identity, 67
Jacobi, C.G.J., 68
Jian-Wei Pan, 212

K
Kepler, J., 37
Klein, Felix, 98, 123
Kosmann-Schwartzbach, Y., 47

L
Lagrange–Euler equations, 18, 40
Lagrange function, 17
Lagrange, J.-L., 5, 11, 17, 38, 39
Lagrange multipliers, 27
Lagrangian, 39
Langlois, D., 118
Laplace, P.S. de, 63
Least action principle, 38, 39
Least time principle, 12
Legendre transformation, 65
Leibniz, G. W., 3
Lie, S., 64
Liouville, J., 64
Liouville theorem, 74
Lobatchevsky, N.I., 119, 123
Lorentz force, 38, 51, 52, 57
Lorentz invariance, 39, 54, 55
Lorentz invariant, 55, 57
Lorentz transformation, 54
Lorenz attractor, 80

M
Machos, 151
Magellanic clouds, 151
Maupertuis, P.L. de, 3, 12, 16, 22, 37, 39
Maupertuis principle, 3, 12, 13, 22, 96, 97, 131
R
Reduced action, 96
Refraction, 14
Relativistic particle, 54
Renormalization group, 185

Z
Zeilinger A., 212
Zermelo, E.
  paradox, 80