Relativity Lecture Notes

The document contains course notes for MTH 6132 on Relativity, taught by Prof. Pau Figueras at Queen Mary, University of London. It covers topics including Special and General Relativity, differential geometry, curvature, and gravitational waves, with a focus on understanding the principles and equations governing these theories. The notes are adapted from previous lectures and include discussions on black holes and gravitational waves.

Relativity

MTH 6132 Course notes


Spring 2023

Prof. Pau Figueras


(adapted from notes from Dr. Shabnam Beheshti and Dr. Juan A.
Valiente Kroon)

School of Mathematical Sciences


Queen Mary, University of London
Mile End Road, London E1 4NS

Contents

1 Introduction
1.1 What is Relativity?
1.1.1 Special Relativity?
1.1.2 General Relativity?
1.2 Pre-relativistic Physics
1.2.1 Galilean Relativity
1.2.2 Laws of Newton
1.2.3 Galilean transformations
1.2.4 Galilean transformation formulae for the velocity and acceleration
1.2.5 Invariance of Newton’s laws under Galilean transformations
1.2.6 Electromagnetism

2 Special Relativity
2.1 Einstein’s postulates of Special Relativity
2.2 Spacetime diagrams
2.3 Lorentz transformations (LT)
2.4 Clocks and rods in relativistic motion
2.4.1 Time dilation
2.4.2 Length contraction
2.5 Paradoxes
2.6 Experimental evidence for Special Relativity
2.7 Hyperbolic form of the Lorentz transformations
2.8 Further discussion on the Lorentz Transformations
2.8.1 Transformation formula for the velocity
2.8.2 Transformation formula for the acceleration
2.9 The Minkowski spacetime
2.9.1 Minkowski diagrams
2.9.2 A brief discussion of causality
2.10 4-vectors and tensors in Special Relativity
2.10.1 Index notation
2.10.2 4-vectors
2.10.3 Dual vectors (one-forms)
2.10.4 Tensors
2.11 Proper time
2.12 4-velocity and 4-momentum
2.13 Photons
2.14 Doppler shift
2.15 Relativistic dynamics
2.15.1 Examples of relativistic collisions

3 Prelude to General Relativity
3.1 General remarks
3.2 The Equivalence Principle
3.3 Summary

4 Differential Geometry and tensor calculus
4.1 Manifolds and coordinates
4.2 Vectors
4.3 Tensors
4.4 The metric
4.5 Covariant derivatives
4.6 Parallel transport and geodesics
4.6.1 Metric geodesics
4.6.2 An example of the use of the Euler-Lagrange equations

5 Curvature
5.1 The Riemann curvature tensor
5.2 Geodesic deviation
5.3 Symmetries of the curvature tensor
5.4 Bianchi identities, the Ricci and Einstein tensors

6 General Relativity
6.1 Towards the Einstein equations
6.2 Principles employed in General Relativity
6.2.1 The Einstein equations in vacuum
6.2.2 The (full) Einstein Equations
6.2.3 Newtonian limit
6.3 The Schwarzschild solution
6.4 Geodesics of the Schwarzschild geometry
6.5 Experimental tests of General Relativity
6.5.1 Perihelion precession
6.5.2 Bending of light
6.5.3 Gravitational redshift
6.6 Black holes
6.7 More general black holes

7 Linearised theory and gravitational waves
7.1 Linearised theory
7.2 Gravitational waves
7.2.1 Tidal accelerations and polarisation of gravitational waves
7.2.2 The far field
7.2.3 Energy in gravitational waves
7.2.4 The quadrupole formula

Preface

These are notes for the Relativity course (MTH6132) I am currently lecturing during
the Spring 2022 term at the School of Mathematical Sciences of Queen Mary, University
of London. The material is primarily based on both typeset and handwritten notes I
have inherited from Dr. Juan Valiente-Kroon and Dr Shabnam Beheshti; the primary
addition I am making to these notes is the discussion on black holes and gravitational
waves (not implemented yet).
The present course on Relativity is aimed at mathematics and physics undergraduates
interested in learning the mathematical foundations of Special and General Relativity. In
particular, very little physical background is assumed, so a certain amount of time is spent
presenting underlying assumptions and experimental motivation for such a theory. The
lectures also assume minimal prerequisites on the mathematical side. Necessary ideas
from differential geometry and tensors are self-contained and references are provided for
further study. The course is quite an ambitious one, divided approximately into thirds.
It begins with Special Relativity, then moves to Differential Geometry and finally it
provides an introduction to General Relativity.
Due to time constraints, there are some clear omissions in the choice of topics. In
particular, in the chapter on Special Relativity it would be desirable to have a discussion
of the Maxwell equations. In the chapter on Differential Geometry, it would be elegant to
discuss the Hamilton-Jacobi Equations as motivation for generalizing curvature. In the
chapter on General Relativity, the discussion is restricted to the vacuum field equations
with little mention of the field equations with matter. Also, it would be desirable to
include positive mass theorems, cosmological models and of course, the most recent and
exciting progress on gravitational waves! Thorough mathematical investigation of these
topics would require at the very least a couple of weeks more. Perhaps a re-organisation of
the topics and thoughtful collaboration could result in a two-semester course in Geometric
Analysis and General Relativity. I do not rule out the possibility of carrying out such a
revision in future iterations of the course. In the meantime, please know that the current
lecture notes have been adapted to my particular understanding and appreciation of the
subject.
Corrections, omissions and suggestions for improvements with which the readers of these
notes may favour me will be greatly appreciated.

–P. Figueras

Chapter 1

Introduction

1.1 What is Relativity?


The term Relativity encompasses two physical theories proposed by Einstein¹, namely
Special Relativity and General Relativity. However, as we will see, the word relativity is
also used in reference to Galilean Relativity². The term Theory of Relativity was first
coined by Max Planck³ in 1906 to emphasize how a theory devised by Einstein in 1905
(what we now call Special Relativity) uses the Principle of Relativity.

1.1.1 Special Relativity?


Special Relativity is the physical theory of measurement in inertial frames of reference.
It was proposed in 1905 by Albert Einstein in the article On the Electrodynamics of
Moving Bodies (Zur Elektrodynamik bewegter Körper, Annalen der Physik 17, 891 (1905)).
It generalises Galileo’s Principle of Relativity: all motion is relative and there is
no absolute and well-defined state of rest. Special Relativity incorporates the principle
that the speed of light is the same for all inertial observers, regardless of the state of motion
of the source. The theory is termed special because it only applies to the special case
of inertial reference frames, i.e. frames of reference in uniform relative motion with
respect to each other. Special Relativity predicts the equivalence of mass and energy
as expressed by the formula

\[ E = mc^2. \]

Special Relativity is a fundamental tool to describe the interaction between elementary
particles, and was widely accepted by the Physics community by the 1920’s.

1.1.2 General Relativity?


General Relativity is the geometric theory of gravitation published by Albert Einstein
in 1915 in the article The field equations of Gravitation (Die Feldgleichungen der Gravitation,
Sitzungsberichte der Preussischen Akademie der Wissenschaften zu Berlin 844).
It generalises Special Relativity and Newton’s law of universal gravitation, providing a
unified description of gravity as the manifestation of the curvature of spacetime. The
theory is general because it applies the Principle of Relativity to any frame of reference so
as to handle general coordinate transformations. From General Relativity it follows that
Special Relativity still applies locally. The domain of applicability of General Relativity
is in Astrophysics and Cosmology. More recently, the Global Positioning System (GPS)
has come to require General Relativity to function accurately! In contrast to Special Relativity,
General Relativity was not widely accepted until the 1960’s.

¹ Albert Einstein (1879-1955). Physicist of German origin. Died in the USA.
² Galileo Galilei (1564-1642). Italian physicist, mathematician and astronomer.
³ Max Planck (1858-1947). German physicist.

1.2 Pre-relativistic Physics


1.2.1 Galilean Relativity
In order to study General Relativity one starts by discussing Special Relativity. To this end,
it is important to look briefly at pre-relativistic Physics to see how Special Relativity
arose.
The starting point of Special Relativity is the study of motion. For this one needs
the following ingredients:

• Frames of reference. These consist of an origin in space, 3 orthogonal axes and
a clock.

• Events. This notion denotes a single point in space together with a single point in
time. Thus, events are characterised by 4 real numbers: an ordered triple (x, y, z)
giving the location in space relative to a fixed coordinate system and a real number
giving the Newtonian time. One denotes the event by E = (t, x, y, z).

There are an infinite number of frames of reference. Motion relative to each frame
looks, in principle, different. Hence, it is natural to ask: is there a subset of these frames
which are in some sense simple, preferred or natural? The answer to this question is yes.
These are the so-called inertial frames. In an inertial frame an isolated, non-rotating,
unaccelerated body moves on a straight line and uniformly.
Inertial frames are not unique. There are actually an infinite number of these. This
raises the question: can one tell which inertial frame one is in? It turns out that
within the framework of Newtonian Mechanics this is not possible. More precisely, one
has the following:
Galilean Principle of Relativity. Laws of mechanics cannot distinguish between
inertial frames. This implies that there is no absolute rest. In other words, the laws of
Mechanics retain the same form in different inertial frames.
In this sense, Relativity predates Einstein.

1.2.2 Laws of Newton


The three Laws of Newtonian Mechanics⁴ are:

(1) Any material body continues in its state of rest or uniform motion (in a straight
line) unless it is made to change the state by forces acting on it. This principle is
equivalent to the statement of existence of inertial frames.

(2) The rate of change of momentum is equal to the force.

(3) Action and reaction are equal and opposite.


⁴ Isaac Newton (1643-1727). English physicist and mathematician.

These laws or principles, together with the following fundamental assumptions (some
of which are implicitly assumed in Newton’s laws) amount to the Newtonian framework :

(A1) Space and time are continuous —i.e. not discrete. This is necessary to make use
of the Calculus.

(A2) There is a universal (absolute) time. Different observers in different frames measure
the same time. In fact, Newton also regarded space as absolute.
However, the absoluteness of space is not necessary for the development of the
Newtonian framework, as space intervals turn out to be invariant under Galilean
transformations. Historically, Newton demanded this for subjective reasons.

(A3) Mass remains invariant as viewed from different inertial frames.

(A4) The Geometry of space is Euclidean. For example, the sum of angles in any triangle
equals 180 degrees.

(A5) There is no limit to the accuracy with which quantities such as time and space can
be measured.

As will be seen in the sequel, Assumptions (A2) and (A3) are relaxed in Special Relativity
while Assumption (A4) is relaxed in General Relativity. Assumption (A5) is relaxed in Quantum
Mechanics, which is not discussed in this course. Presumably Assumption (A1) will be relaxed
in Quantum Gravity!

1.2.3 Galilean transformations


Galilean transformations tell us how to transform from one inertial frame to another.
Consider two inertial frames F(x, y, z, t) and F′(x′, y′, z′, t′) moving with velocity v
relative to one another in standard configuration; that is, F′ moves along the x-axis of
the frame F with uniform speed v, and all axes remain parallel.

[Figure: the frames F and F′ in standard configuration.]

Now, suppose that at a given moment of time t, an event E is specified by coordinates
(t, x, y, z) and (t′, x′, y′, z′) relative to the frames F and F′, respectively. Let the origins
O and O′ coincide at t = 0. From the figure one sees that

\[ x' = x - vt, \quad y' = y, \quad z' = z, \quad t' = t. \tag{1.1} \]

In the more general case of inertial frames of reference where the velocity also has
components along the y and z axes one has:

\[ \mathbf{r}' = \mathbf{r} - \mathbf{v}t, \]

where v = (vx, vy, vz) and r = (x, y, z), r′ = (x′, y′, z′) are the position vectors with
respect to the frames F and F′, respectively. Notice that in the case of frames of reference
in standard configuration one has vy = vz = 0.
Remark. It is customary to call the observer associated to the inertial frame F, Joe,
and that of F′, Moe.
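The transformation above can be tried out numerically. The following is a minimal sketch, assuming an event is stored as a tuple (t, x, y, z) and the relative velocity as a tuple (vx, vy, vz); the function name `galilean` and the sample numbers are illustrative and do not come from the notes.

```python
def galilean(event, v):
    """Map an event (t, x, y, z) in F to (t', x', y', z') in F'.

    `v` is the velocity (vx, vy, vz) of F' relative to F. The origins are
    assumed to coincide at t = 0 and the axes are assumed to be parallel.
    """
    t, x, y, z = event
    vx, vy, vz = v
    # Time is absolute in the Newtonian framework, so t' = t.
    return (t, x - vx * t, y - vy * t, z - vz * t)


# Standard configuration: F' moves along the x-axis with speed 3.
event_in_F = (2.0, 10.0, 1.0, -1.0)
event_in_F_prime = galilean(event_in_F, (3.0, 0.0, 0.0))
print(event_in_F_prime)  # (2.0, 4.0, 1.0, -1.0)
```

Only the x coordinate changes in standard configuration, as expected from (1.1).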

1.2.4 Galilean transformation formulae for the velocity and acceleration

To derive these formulae, let the position of a particle P be specified by r = r(t) relative to a frame
F. The motion relative to F′ is then given by r′ = r − vt, the vector form of equation (1.1).
Differentiating both sides once and twice with respect to t (notice that t = t′) gives:

\[ \mathbf{V}' = \mathbf{V} - \mathbf{v}, \tag{1.2a} \]

\[ \mathbf{a}' = \mathbf{a}, \tag{1.2b} \]

where

\[ \mathbf{V} = \frac{d\mathbf{r}}{dt}, \qquad \mathbf{a} = \frac{d^2\mathbf{r}}{dt^2}, \]

are, respectively, the velocity and acceleration of the particle with respect to the frame
F, while

\[ \mathbf{V}' = \frac{d\mathbf{r}'}{dt}, \qquad \mathbf{a}' = \frac{d^2\mathbf{r}'}{dt^2}, \]

are the velocity and acceleration of the particle with respect to F′.
Remark. Notice that as a consequence of the transformation formula for the acceleration
(1.2b), the accelerations of the particle as measured by F and F′ coincide. Thus,
although the position and the velocity are different in each system of reference, both sets
of observers agree on the acceleration. This result is sometimes phrased as: acceleration
is Universal.
Example. The following example will be of relevance in the sequel. Consider a cannonball
moving along the x-axis. If the cannonball has velocity V with respect to the frame
F, then the velocity as measured by the frame F′ (moving with velocity v with respect
to F) is given by V′ = V − v. In what follows, suppose for simplicity that 0 < v < V in the
first case below. Then if V > 0 (i.e. the cannonball moves away from the origin of F) one has
0 < V′ = V − v < V; that is, F′ sees the cannonball moving more slowly. On the other hand
if V < 0 (the cannonball goes towards the origin of F), then |V′| = |V − v| > |V| so that F′
sees the cannonball moving faster.
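The two cases of the cannonball example can be checked with a few lines of arithmetic. This is only an illustrative sketch: the numerical values and the helper name `velocity_in_F_prime` are assumptions chosen for the example.

```python
def velocity_in_F_prime(V, v):
    """Galilean velocity transformation along the x-axis: V' = V - v."""
    return V - v


v = 10.0  # speed of F' relative to F (illustrative value, v > 0)
for V in (30.0, -30.0):
    V_prime = velocity_in_F_prime(V, v)
    # The last column reports whether F' sees a larger speed than F does.
    print(V, V_prime, abs(V_prime) > abs(V))
```

For V = 30 the speed seen by F′ drops to 20, while for V = −30 it grows to 40.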

1.2.5 Invariance of Newton’s laws under Galilean transformations


Important for the sequel is the notion of invariance. Invariance refers to properties of
a system that remain unchanged under a particular type of transformation. In the
previous section we have already seen that two inertial systems of reference measure the
same acceleration of a moving particle. Thus, acceleration is an invariant under Galilean
transformations.
In what follows, we will see that the laws of Mechanics keep the same form as we
go from one inertial frame to another, i.e. under Galilean transformations. The First
and Third Laws are invariant as the former involves inertial frames and the latter involves
accelerations which are invariant. It remains to show that the Second Law (the
fundamental equation of Newtonian Mechanics)

\[ m\frac{d\mathbf{V}}{dt} = m\mathbf{a} = \mathbf{f} \tag{1.3} \]

is invariant as we go from one inertial frame to another.
To show the invariance of (1.3) recall that a′ = a and m remains invariant (by
assumption) so that one only needs to show that f remains invariant as we go from F to
F′. To do this, recall that generally f takes the form f = f(r, v, t) where usually r and
v are the relative distance and the relative velocity between two bodies. One can verify
that the relative distances and velocities remain invariant. That is,

\[ \mathbf{V}'_2 - \mathbf{V}'_1 = \mathbf{V}_2 - \mathbf{V}_1, \qquad \mathbf{r}'_2 - \mathbf{r}'_1 = \mathbf{r}_2 - \mathbf{r}_1. \]

Indeed, both positions are shifted by the same vector −vt and both velocities by the same
vector −v, so the shifts cancel in the differences. This implies that f, and hence the
Second Law, remains invariant under changes of inertial frame.
This discussion amounts to a form of self-consistency, in the sense that Physics, when
confined to Newtonian Mechanics, satisfies the Galilean Principle of Relativity.

1.2.6 Electromagnetism
Special Relativity arises from the tension between Newtonian Mechanics and the other
great physical theory of the 19th century: Electromagnetism. The fundamental laws of
Electromagnetism are the so-called Maxwell equations⁵:

\[ \nabla \cdot \mathbf{D} = \rho, \]
\[ \nabla \times \mathbf{E} = -\frac{\partial \mathbf{B}}{\partial t}, \]
\[ \nabla \cdot \mathbf{B} = 0, \]
\[ \nabla \times \mathbf{H} = \mathbf{j} + \frac{\partial \mathbf{D}}{\partial t}, \]

where B is the magnetic induction, E the electric field, H the magnetic field, D the
electric displacement, j the electric current density and ρ the electric charge density. In vacuum, the
relations between them are D = ε0 E and B = µ0 H, where ε0 is the vacuum permittivity
and µ0 is the vacuum permeability.
It can be shown that these equations predict the existence of electromagnetic waves
for E and H in the form

\[ \nabla^2 \mathbf{E} = \frac{1}{c^2}\frac{\partial^2 \mathbf{E}}{\partial t^2}, \qquad \nabla^2 \mathbf{H} = \frac{1}{c^2}\frac{\partial^2 \mathbf{H}}{\partial t^2}, \]

where c = 1/√(ε0 µ0) is the speed of propagation of the waves. These electromagnetic
waves were soon identified with the propagation of light.
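The relation between c and the vacuum constants is easy to evaluate. The sketch below assumes standard reference values for ε0 and µ0 in SI units; these numbers are not quoted in the notes and are included here only for illustration.

```python
import math

# Reference values in SI units (assumed here; they are not given in the notes).
EPSILON_0 = 8.8541878128e-12  # vacuum permittivity, in F/m
MU_0 = 1.25663706212e-6       # vacuum permeability, in N/A^2

# Propagation speed of the waves predicted by the Maxwell equations.
c = 1.0 / math.sqrt(EPSILON_0 * MU_0)
print(f"c = {c:.6e} m/s")  # close to 3.0e8 m/s
```

The result agrees with the measured speed of light, which is what led to the identification of light with electromagnetic waves.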
We recall that light travels with a speed of c ≈ 3 × 10^8 m/s⁶. This was first measured
by Rømer⁷ in 1675 by studying the delay in the appearance of moons of Jupiter. It is of
interest to notice that the fastest object created by Mankind, a satellite probing the Sun,
had a speed of 70 km/s, which corresponds to only about 0.02% of the speed of light!

⁵ James C. Maxwell (1831-1879). Scottish mathematician.
⁶ The letter c used to denote the speed of light comes from the Latin word celeritas, velocity, speed.
⁷ Ole C. Rømer (1644-1710). Danish astronomer.

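As a quick arithmetic check of the comparison between the probe speed and the speed of light, note that the ratio v/c ≈ 0.0002 corresponds to about 0.02 when written as a percentage. The sketch below uses the rounded value c = 3 × 10^8 m/s from the text; the variable names are illustrative.

```python
C = 3.0e8             # speed of light in m/s (rounded value used in the text)
probe_speed = 70.0e3  # 70 km/s expressed in m/s

fraction = probe_speed / C
print(f"v/c = {fraction:.2e}, i.e. {100 * fraction:.3f}% of c")
```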
Within the Newtonian framework, the Maxwell equations give rise to two problems:

(1) With respect to which system of reference is the speed of light c measured? First,
it was assumed that the absolute space of Newton (the so-called ether) was the
medium in (and relative to) which light moved. However, attempts at detecting
the effects of Earth’s motion on the velocity of light (the so-called terrestrial
ether drift) all failed. The most important of these was the Michelson-Morley
experiment⁸. This gave a null result.

(2) It is easy to show that Maxwell’s equations and the wave equation do not remain
invariant under Galilean transformations.

These problems gave rise to a crisis in 19th-century Physics. Three scenarios were
put forward to resolve the tension. These were:

(i) Maxwell’s equations were incorrect. The correct laws of Electromagnetism would
remain invariant under Galilean transformations.

(ii) Electromagnetism had a preferred frame of reference —that of ether.

(iii) There is a Relativity Principle for the whole of Physics —Mechanics and Electro-
magnetism. In that case the laws of Mechanics need modification.

Now, Electromagnetism was very successful and had very strong predictive power, which
spoke against (i). There was no experimental support for (ii). Hence the point of view (iii)
was adopted by Einstein. His resolution of the tension between Mechanics and
Electromagnetism came to be known as Special Relativity.

Appendix: General Galilean transformations


In general, if the coordinate axes are not in standard configuration and the origins O and
O′ of the coordinate axes do not coincide, then the transformation takes the general form:

\[ \mathbf{r}' = R\mathbf{r} - \mathbf{v}t + \mathbf{d}, \]

where R is the rotation matrix aligning the axes of the frames and d is the displacement
between the origins at t = 0. Note that the general transformation is linear, so that F′
is inertial if F is. The most general transformation would also include

\[ t' = t + \tau, \]

where τ is a real constant.

These transformations form a 10-parameter group (1 for τ, 3 for v, 3 for d, and 3 for
R). The group property implies that the composition of two Galilean transformations is a
Galilean transformation, and that given a Galilean transformation there is always an inverse
transformation. The Galilean transformations restricted to standard configurations
form a 1-parameter subgroup of this group, with v as parameter.
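The group property of the 1-parameter subgroup can be illustrated in one space dimension. The following is a minimal sketch under the assumption that a transformation in standard configuration is represented as a Python function acting on (t, x); the helper names `boost` and `compose` are hypothetical.

```python
def boost(v):
    """Return the standard-configuration Galilean map (t, x) -> (t, x - v t)."""
    return lambda t, x: (t, x - v * t)


def compose(g, f):
    """Return the transformation that applies f first and then g."""
    return lambda t, x: g(*f(t, x))


event = (2.0, 5.0)
g1, g2 = boost(1.5), boost(0.5)

print(compose(g2, g1)(*event))           # (2.0, 1.0)
print(boost(1.5 + 0.5)(*event))          # (2.0, 1.0), the same transformation
print(compose(boost(-1.5), g1)(*event))  # (2.0, 5.0), the inverse undoes g1
```

Composing two such transformations adds their velocities, and the inverse of the transformation with parameter v is the one with parameter −v.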
⁸ Albert Michelson (1852-1931). Edward Morley (1838-1923). American physicists.

Chapter 2

Special Relativity

The contradiction brought about by the development of Electromagnetism gave rise to a
crisis in the 19th century that Special Relativity resolved.

2.1 Einstein’s postulates of Special Relativity


(i) There is no ether (there is no absolute system of reference).
(ii) The laws of Nature have the same form in all inertial frames (Einstein’s principle
of Relativity)
(iii) The velocity of light in empty space is a universal constant, i.e. same for all
observers and light sources, independent of their motion —Michelson & Morley’s
result is promoted to an axiom.

Note that postulate (iii) is clearly incompatible with Galilean transformations which
imply c′ = c − v. Because of this the Galilean transformations need modification. This
leads to the Lorentz transformations.

2.2 Spacetime diagrams


Spacetime is defined as the set of all 4-tuples of reals (t, x, y, z). An event in spacetime is
represented by a point E(t, x, y, z). For simplicity (in order to be able to visualise) we confine
ourselves to 2 dimensions, one space and one time coordinate, so that events are depicted by E(t, x).
Such diagrams are a very useful way to approach problems involving multiple frames of
reference.



The worldline of a particle is defined as the set of all points that the trajectory of the
particle follows in spacetime.

To develop our intuition, we consider a few examples. The worldline of a particle
which is stationary at x = x0 is a vertical line.

[Figure: vertical worldline of a particle at rest at x = x0.]

The worldline of a particle moving with uniform velocity v and passing through O at
t = 0 is a straight line:

\[ x = vt \quad \text{so that} \quad t = \frac{1}{v}x. \]

Therefore the slope of the line is given by 1/v.

[Figure: straight worldline through O with slope 1/v.]

The worldline of a light ray is a straight line with slope equal to 1/c. In practice we
shall usually choose c = 1 so that the slope is equal to 1.

[Figure: worldline of a light ray, with slope 1 when c = 1.]

Note. All uniformly moving particles have worldlines which are straight lines with slopes
bigger (in absolute value) than 1/c, or bigger than 1 if c = 1. Therefore they all lie in the
shaded region of the figure.

The worldlines of accelerating bodies are curved. For example, for a body uniformly
accelerated from rest, the worldline is initially tangent to the t-axis. The upper
bound for v is c, so the slope of the asymptotic motion is 1/c (= 1 if c = 1). This situation will be
analysed in detail later on.

The worldline of instantaneous travel would be a horizontal line; however, this is forbidden
within the framework of Special Relativity.

Some further examples


The following example is based on the notion that every particle in uniform motion (with
velocity less than the speed of light) is an inertial frame of reference. Let F denote the
frame of reference associated to Joe. Then if Moe moves with velocity v with respect to
Joe, one has the following diagram:

   

An important observation is that the t′ -axis coincides with Moe’s worldline.

Now, from the point of view of Moe, Joe moves away with velocity −v. One has the
following diagram:



   



Notice that the t-axis coincides with Joe’s worldline.

As a final example, we consider the following situation: a light ray is shot from
x = 0 at t = −t0 in the positive direction of the x-axis. The light ray reflects at a mirror
located at x = x0 and returns to x = 0 at t = 0. The corresponding diagram is:


 
 

 

 

Notice that from Joe’s point of view (system F), the times the light ray requires to travel
out and back are equal.

Now, we consider the diagram of the situation as seen by the system of reference F ′
(Moe). The required diagram is given by

   



 



 

Notice that the light rays in this diagram are also lines with slope of 45 degrees as
required by the postulates of relativity. Notice that from Moe’s point of view the times
required by the light ray to travel out and back are not equal!

2.3 Lorentz transformations (LT)
In this section we address the following question: what type of transformation does one
need to ensure that the speed of light as measured by two inertial frames of reference F
and F ′ is equal?
In order to explore the consequences of this requirement, let us consider 2 inertial
systems of reference in standard configuration moving with relative velocity v. Suppose
a light ray is fired from x = 0 at t = 0. Furthermore, suppose that this light ray reaches
(t, x). Let (t′, x′) be the coordinates of the event (t, x) as seen by F′. As the speed of
light is c for both systems of reference, one has that

\[ c = x/t, \qquad c = x'/t'. \]

For reasons that will become clear later in the chapter, it is convenient to rewrite these
expressions as

\[ 0 = -c^2t^2 + x^2, \qquad 0 = -c^2t'^2 + x'^2. \]

Thus one has that

\[ -c^2t^2 + x^2 = -c^2t'^2 + x'^2. \tag{2.1} \]

One can readily verify (by direct substitution) that the Galilean transformation t′ = t,
x′ = x − vt cannot satisfy this condition. Thus, one needs to consider a different kind
of transformation.
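The so-called Lorentz transformations are given by:

\[ x' = \gamma(x - vt), \quad t' = \gamma\left(t - \frac{vx}{c^2}\right), \quad y' = y, \quad z' = z, \quad \text{with} \quad \gamma = \frac{1}{\sqrt{1 - \frac{v^2}{c^2}}}. \tag{2.2} \]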

Remark. This is a particular case of a more general transformation with 10 parameters.
These parameters are the 3 components of the velocity, 3 components of a shift of the
origin, 3 parameters of a rotation and a further parameter fixing the origin of the time.
The set of these transformations forms a group. The transformation given by (2.2) is the
1-parameter subgroup of this group called the special Lorentz group.
Remark. One can verify by direct substitution that the Lorentz transformation (2.2)
satisfies

\[ -c^2t^2 + x^2 = -c^2t'^2 + x'^2. \]
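The direct substitution mentioned in these remarks can also be carried out numerically. The sketch below works in the (t, x) plane with units in which c = 1 (an illustrative choice); the function names and the sample event are assumptions made for the example.

```python
import math

C = 1.0  # units with c = 1 (an illustrative choice)


def gamma(v):
    return 1.0 / math.sqrt(1.0 - v**2 / C**2)


def lorentz(t, x, v):
    """Lorentz transformation (2.2) restricted to the (t, x) plane."""
    g = gamma(v)
    return g * (t - v * x / C**2), g * (x - v * t)


def galilean(t, x, v):
    """Galilean transformation (1.1) restricted to the (t, x) plane."""
    return t, x - v * t


def interval(t, x):
    """The quadratic expression -c^2 t^2 + x^2 appearing in (2.1)."""
    return -C**2 * t**2 + x**2


t, x, v = 2.0, 1.0, 0.6
print(interval(t, x))                # -3.0
print(interval(*lorentz(t, x, v)))   # -3.0 up to rounding
print(interval(*galilean(t, x, v)))  # about -3.96, so not preserved
```

The Lorentz transformation preserves the quadratic expression, whereas the Galilean transformation changes it.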

Remark. It is interesting to see what happens with the Lorentz transformations for low
velocities. Using a Taylor expansion about 0, we recall

\[ (1 - x)^{-1/2} = 1 + \tfrac{1}{2}x + O(x^2). \]

It follows that

\[ \gamma \approx 1 + \frac{1}{2}\frac{v^2}{c^2}. \]

Now, if v ≪ c, then v²/c² ≈ 0, so that γ ≈ 1. Hence, from the expressions for the Lorentz
transformation one has that

\[ t' \approx t, \qquad x' \approx x - vt. \]

That is, one recovers the Galilean transformations!
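To get a feeling for how good the low-velocity approximation is, one can compare γ with its first-order Taylor approximation for a few speeds. This is an illustrative sketch; the sample speeds and function names are assumptions.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def gamma_exact(v):
    return 1.0 / math.sqrt(1.0 - (v / C) ** 2)


def gamma_approx(v):
    """First-order Taylor approximation 1 + v^2 / (2 c^2)."""
    return 1.0 + 0.5 * (v / C) ** 2


# From an orbital-scale speed up to roughly half the speed of light.
for v in (3.0e4, 3.0e6, 3.0e7, 1.5e8):
    exact, approx = gamma_exact(v), gamma_approx(v)
    print(f"v/c = {v / C:.4f}  exact = {exact:.8f}  approx = {approx:.8f}")
```

The two values are indistinguishable at everyday speeds and only separate visibly once v is a sizeable fraction of c.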

The inverse Lorentz transformation
We have discussed the Lorentz transformation which, given the coordinates (t, x) of an
event as seen by the system of reference F, allows one to compute the coordinates (t′, x′)
as seen by F′. Now, we are interested in the inverse transformation which, given (t′, x′),
allows one to calculate (t, x). By symmetry, as F and F′ are both inertial systems, the inverse
transformation should have the same functional form. The key observation is then that
if F sees F′ moving with velocity v, then F′ sees F moving with velocity −v. Hence, the
required transformation is given by

\[ t = \gamma\left(t' + \frac{vx'}{c^2}\right), \]
\[ x = \gamma(x' + vt'). \]

Remark. One could also have obtained the required expressions by inverting directly the
original Lorentz transformation formulae. This is, however, a much longer computation!
A similar short argument can be used for the transformation formulae for the velocity and
the acceleration.
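The symmetry argument can be checked by transforming an event to F′ and back. The sketch below assumes units with c = 1 and restricts attention to the (t, x) plane; the function names are illustrative.

```python
import math

C = 1.0  # units with c = 1 (an illustrative choice)


def gamma(v):
    return 1.0 / math.sqrt(1.0 - v**2 / C**2)


def lorentz(t, x, v):
    """Lorentz transformation from F to F' in the (t, x) plane."""
    g = gamma(v)
    return g * (t - v * x / C**2), g * (x - v * t)


def inverse_lorentz(t_prime, x_prime, v):
    """Inverse obtained by the symmetry argument: replace v by -v."""
    return lorentz(t_prime, x_prime, -v)


t, x, v = 3.0, 2.0, 0.8
t_back, x_back = inverse_lorentz(*lorentz(t, x, v), v)
print(t_back, x_back)  # approximately 3.0 and 2.0
```

The round trip returns the original coordinates up to floating-point rounding, consistent with the inverse having the same functional form with v replaced by −v.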

2.4 Clocks and rods in relativistic motion


We now consider the effects of uniform motion on clocks and rods.

2.4.1 Time dilation


Consider F and F′ in standard configuration. Let a standard clock be at rest in F′ (at
x′ = x0) and consider two events at this clock at times t′1 and t′2. Let also

\[ \Delta t' = t'_2 - t'_1. \]

In order to find the interval Δt as measured by F, recall that

\[ \Delta t = \gamma\left(\Delta t' + \frac{v\Delta x'}{c^2}\right). \]

However, Δx′ = 0 as x′2 = x′1 = x0. Hence one obtains

\[ \Delta t = \gamma\Delta t'. \]

Since

\[ \gamma = \frac{1}{\sqrt{1 - v^2/c^2}} > 1 \]

for v ≠ 0, one finds that the interval as measured by F is longer.
There is a symmetry! Both observers say the same thing about each other!

2.4.2 Length contraction


This is also called the Lorentz-Fitzgerald contraction. Consider $F$ and $F'$ in standard
configuration. Let a rod of length $\Delta x'$ be placed at rest along the $x'$-axis of $F'$. To find
the length as measured in $F$, we must measure the distance between the two ends of the
rod simultaneously in $F$. Consider two events occurring simultaneously at the end points
of the rod in $F$. Therefore one has $\Delta t = 0$. Now, using

\[ \Delta x' = \gamma(\Delta x - v\,\Delta t) \]

one finds that

\[ \Delta x' = \gamma\,\Delta x, \qquad \text{or} \qquad \Delta x = \frac{1}{\gamma}\,\Delta x' . \]

Accordingly, the length of the rod in the direction of motion as measured by $F$ is reduced
by a factor of $(1 - v^2/c^2)^{1/2}$.

Geometrically:

F measures the distance between the two ends of the rod at t = 0, i.e. F measures OB,
while F ′ measures OA.




 

 

2.5 Paradoxes
These arise from an incautious view of the situation, and the fact that simultaneity means
different things to different observers.

The twin paradox

Consider a pair of twins $A$ and $B$. Let $A$ be stationary at the origin of $F$ whereas $B$ moves
with speed $v$ for a time $T$ and then with speed $-v$ for an equal time and returns to $A$’s
position. The total elapsed time as measured by $A$ is $2T$. Because of time dilation, the
time as measured by $B$ is
\[ \frac{2T}{\gamma} < 2T . \]

Therefore, when the twins reach the point $(0, 2T)$ in $A$’s frame, $A$ is older than $B$.

The “paradox”: cannot B say with equal right that it was she/he who remained where
she/he was while A went on a round trip and that A should, consequently, be the younger
when they meet?





  

 

Answer: No, since there is no symmetry! The twin A remained in the same inertial
frame, but B has experienced acceleration and deceleration and therefore knows that
she/he has not been in an inertial frame! This solves the paradox.

2.6 Experimental evidence for Special Relativity


Clearly Special Relativity is consistent with Michelson & Morley’s experiment and its
refined versions since.
A well-known test of time dilation comes from the behaviour of muons (elementary
particles formed by the collision of cosmic rays with particles in the upper atmosphere).
The mean life of muons is approximately $2.2 \times 10^{-6}$ s, so that even if they moved at the speed
of light they could only cover a distance of approximately 0.66 km. However, they reach
ground level from heights of about 10 km. To explain this, they must have a dilation
factor of approximately 15. This means they would have a speed of about 0.997c!
From the muon’s point of view, they have a normal lifetime; however, the depth of
the atmosphere is contracted by a factor of 15.
Time dilation can also be observed using accurate atomic clocks on board airplanes,
which are then compared with fixed clocks.
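
Remark (worked numbers). The muon estimate can be reproduced with a few lines of arithmetic. The sketch below is only an order-of-magnitude illustration in the spirit of the text: it assumes that a muon lives exactly one mean lifetime and travels at essentially the speed of light, and the function names are hypothetical.

```python
import math

C = 299_792_458.0   # speed of light in m/s
TAU = 2.2e-6        # mean muon lifetime in s (value quoted in the text)
HEIGHT = 10_000.0   # production height in m (value quoted in the text)


def required_gamma(height, tau=TAU, c=C):
    """Dilation factor needed to cover `height` within one dilated mean lifetime."""
    return height / (c * tau)


def speed_from_gamma(g):
    """Speed, as a fraction of c, corresponding to a Lorentz factor g >= 1."""
    return math.sqrt(1.0 - 1.0 / g**2)


g = required_gamma(HEIGHT)
print(f"range without dilation: {C * TAU / 1000.0:.2f} km")
print(f"required dilation factor: {g:.1f}")
print(f"corresponding speed: {speed_from_gamma(g):.4f} c")
print(f"depth of atmosphere in the muon frame: {HEIGHT / g / 1000.0:.2f} km")
```

The last line illustrates the complementary description: in the muon frame the lifetime is normal, but the atmosphere is contracted to roughly the distance the muon can cover.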

2.7 Hyperbolic form of the Lorentz transformations


This is a convenient representation for showing the group properties of the Lorentz
transformation.
The key idea is to replace the velocity parameter $v$ by a hyperbolic parameter $\alpha$ that
satisfies the following:
\[ \cosh\alpha = \gamma, \qquad \sinh\alpha = \frac{v}{c}\,\gamma, \qquad \tanh\alpha = \frac{v}{c}. \]
We also require $\alpha$ and $v$ to have the same sign, as $\cosh\alpha = \cosh(-\alpha)$.
The Lorentz transformation (2.2) becomes (hyperbolic form of the Lorentz
transformation):

\begin{align}
x' &= x\cosh\alpha - ct\sinh\alpha, \tag{2.3a}\\
ct' &= -x\sinh\alpha + ct\cosh\alpha, \tag{2.3b}\\
y' &= y, \tag{2.3c}\\
z' &= z. \tag{2.3d}
\end{align}

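Adding and subtracting $x'$ and $ct'$ as given by (2.3a) and (2.3b) one obtains

\begin{align}
ct' + x' &= e^{-\alpha}(ct + x), \tag{2.4a}\\
ct' - x' &= e^{\alpha}(ct - x), \tag{2.4b}
\end{align}

where it has been used that

\[ \cosh\alpha = \frac{e^{\alpha} + e^{-\alpha}}{2}, \qquad \sinh\alpha = \frac{e^{\alpha} - e^{-\alpha}}{2}. \]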

To show that the Lorentz transformations form a group one needs to show:

(i) there exists an identity element;

(ii) for every Lorentz transformation there exists an inverse;

(iii) the composition of Lorentz transformations is a Lorentz transformation and that
the composition is associative.

The most convenient way to verify the latter is to use the form given by (2.4a) and (2.4b)
and then check one by one:

(i) One sees that there exists an identity Lorentz transformation corresponding to $v = 0$
($\alpha = 0$).

(ii) There exists an inverse Lorentz transformation obtained by letting $v \to -v$ ($\alpha \to -\alpha$).

(iii) Let F ′′ move with velocity v2 (α2 ) relative to F ′ and F ′ with velocity v1 (α1 )
relative to F —all in standard configuration.

From (2.4a) and (2.4b) one has that

\begin{align*}
ct'' + x'' &= e^{-\alpha_2}(ct' + x'),\\
ct'' - x'' &= e^{\alpha_2}(ct' - x'),\\
y'' &= y', \qquad z'' = z'
\end{align*}

and

\begin{align*}
ct' + x' &= e^{-\alpha_1}(ct + x),\\
ct' - x' &= e^{\alpha_1}(ct - x),\\
y' &= y, \qquad z' = z.
\end{align*}

It follows then that

\begin{align*}
ct'' + x'' &= e^{-(\alpha_1 + \alpha_2)}(ct + x),\\
ct'' - x'' &= e^{(\alpha_1 + \alpha_2)}(ct - x),\\
y'' &= y, \qquad z'' = z,
\end{align*}

which shows that the composition of Lorentz transformations is a Lorentz
transformation and, since the hyperbolic parameters add, one also has the associativity.

The previous discussion also allows us to derive the Special Relativity rule for the
composition of velocities. Since the resultant of two Lorentz transformations with parameters
$\alpha_1$ and $\alpha_2$ is a Lorentz transformation with parameter $\alpha_1 + \alpha_2$, the corresponding relation
between the velocity parameters of the transformations can be easily derived from
\[ \tanh\alpha = \frac{v}{c} \]
by recalling that
\[ \tanh(\alpha_1 + \alpha_2) = \frac{\tanh\alpha_1 + \tanh\alpha_2}{1 + \tanh\alpha_1 \tanh\alpha_2}. \]
Substituting for
\[ \tanh\alpha_1 = \frac{v_1}{c}, \qquad \tanh\alpha_2 = \frac{v_2}{c}, \qquad \tanh(\alpha_1 + \alpha_2) = \frac{v}{c} \]
one obtains
\[ v = \frac{v_1 + v_2}{1 + v_1 v_2/c^2} \tag{2.5} \]
where $v$ is the velocity of $F''$ relative to $F$; it represents the relativistic sum of the collinear
velocities $v_1$ and $v_2$ along the $x$-axis. A generalisation of this rule will be discussed later.
Remark 1. When
\[ \frac{v_1}{c} \ll 1, \qquad \frac{v_2}{c} \ll 1, \]
then equation (2.5) takes the Galilean form

\[ v = v_1 + v_2 . \]

Remark 2. Since $|\tanh\alpha| < 1$, it follows that $v$ always satisfies $|v| < c$.
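
Remark (numerical illustration). The statement that hyperbolic parameters add while velocities combine via (2.5) is easy to check numerically. The following is a small sketch under the assumption of units with $c = 1$; the function names are illustrative and not part of the notes.

```python
import math


def rapidity(v, c=1.0):
    """Hyperbolic parameter alpha defined by tanh(alpha) = v/c."""
    return math.atanh(v / c)


def compose_velocities(v1, v2, c=1.0):
    """Relativistic sum of collinear velocities, equation (2.5)."""
    return (v1 + v2) / (1.0 + v1 * v2 / c**2)


v1, v2 = 0.8, 0.7
via_rapidity = math.tanh(rapidity(v1) + rapidity(v2))
print(compose_velocities(v1, v2), via_rapidity)  # equal up to rounding, and below 1
print(compose_velocities(1e-4, 2e-4))            # very close to the Galilean sum 3e-4
```

Note that the naive sum $0.8 + 0.7 = 1.5$ would exceed the speed of light, whereas the composed velocity stays below it, in agreement with Remark 2.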

Appendix: Derivation of the Lorentz transformations


Consider two frames $F$ and $F'$ moving in standard configuration, i.e. $O'$ moves with
speed $v$ along the $x$-axis relative to $O$. The worldline of $O'$ in the frame $F$ is given as in
the figure:

 
 

Let observers O and O′ carry clocks measuring t and t′ respectively such that when
O′ is at (t, vt) according to O, the clock at O′ registers t′ = βt, where β may be a function
of v —in this sense β carries all the effect that the motion has on t. Note also that β = 1
for Galilean transformations.
Now consider a light ray emitted by O at t = t1 , travelling via O′ , being reflected at
p(t, x) and received by O at t = t4 —i.e. a round trip.


 

 







We want to relate the coordinates of the event at p relative to the frames F and F ′ .

In line with Einstein’s postulates assume that the speed of light is $c$ for both $O$ and
$O'$. From the perspective of $O$ the distance and time may be fixed using the so-called
radar convention:

\[ x = \tfrac{1}{2} c\,(t_4 - t_1), \qquad t = \tfrac{1}{2}(t_4 + t_1) \]

so that

\[ x = c(t - t_1) = c(t_4 - t). \tag{2.6} \]




 



Similarly,

x − vt2 = c(t − t2 ), (2.7)


x − vt3 = c(t3 − t). (2.8)




 









 




Now, equations (2.7) and (2.8) imply, respectively,

\[ t_2 = \frac{ct - x}{c - v}, \qquad t_3 = \frac{ct + x}{c + v}. \]
The corresponding times as measured by $O'$ are:
\begin{align}
t'_2 &= \beta t_2 = \beta\left(\frac{ct - x}{c - v}\right), \tag{2.9a}\\
t'_3 &= \beta t_3 = \beta\left(\frac{ct + x}{c + v}\right), \tag{2.9b}
\end{align}

where it has been used that $t' = \beta t$. Therefore, the time and location of $p(t, x)$ as
measured by $O'$ (using again the radar convention) are given by:

\begin{align}
x' &= \tfrac{1}{2} c\,(t'_3 - t'_2) = \frac{\beta c^2 (x - vt)}{c^2 - v^2}, \tag{2.10a}\\
t' &= \tfrac{1}{2}(t'_3 + t'_2) = \frac{\beta (c^2 t - vx)}{c^2 - v^2}, \tag{2.10b}
\end{align}
where equations (2.9a) and (2.9b) have been used to obtain the second equalities in the
last pair of equations.
Note. The observer O′ is also assuming that the velocity of light is c. This assumption
is inconsistent with the Galilean transformations.
Eliminating $x$ between (2.10a) and (2.10b) one obtains

\[ t = \frac{1}{\beta}\left(t' + \frac{v x'}{c^2}\right). \tag{2.11} \]
Now, the Relativity principle requires that we obtain the same result if we interchange
$x, x'$ and $t, t'$ and let $v \to -v$. Applying this idea to equation (2.10b) and equating to
(2.11):
\[ t = \frac{\beta (c^2 t' + v x')}{c^2 - v^2} = \frac{1}{\beta}\left(t' + \frac{v x'}{c^2}\right), \tag{2.12} \]
so that
\[ \beta = \left(1 - \frac{v^2}{c^2}\right)^{1/2}. \]

Letting $\gamma \equiv 1/\beta$, the transformation for $x'$ can be found from (2.10a):

\[ x' = \gamma(x - vt). \tag{2.13} \]

Similarly, for $t'$ one finds from equation (2.10b):

\[ t' = \gamma\left(t - \frac{vx}{c^2}\right). \]
Finally, the coordinates y and z remain the same as there is no motion in these directions.
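
Remark (numerical check of the derivation). The radar construction can be followed step by step on a computer. The sketch below is only an illustration with hypothetical function names and units with $c = 1$: it computes $t_2$ and $t_3$ from (2.7) and (2.8), converts them to the clock of $O'$ with $\beta = (1 - v^2/c^2)^{1/2}$, applies the radar convention, and compares the outcome with the Lorentz transformation.

```python
import math


def radar_coordinates(t, x, v, c=1.0):
    """Coordinates (t', x') of the event p = (t, x) assigned by O' via the radar convention."""
    beta = math.sqrt(1.0 - v**2 / c**2)  # clock factor in t' = beta * t
    t2 = (c * t - x) / (c - v)           # outgoing ray passes O', from (2.7)
    t3 = (c * t + x) / (c + v)           # returning ray passes O', from (2.8)
    t2p, t3p = beta * t2, beta * t3      # same instants on the clock of O', (2.9a)-(2.9b)
    return 0.5 * (t3p + t2p), 0.5 * c * (t3p - t2p)


def lorentz(t, x, v, c=1.0):
    """Lorentz transformation in standard configuration."""
    g = 1.0 / math.sqrt(1.0 - v**2 / c**2)
    return g * (t - v * x / c**2), g * (x - v * t)


# An event ahead of O' (x > v t), so that the light ray really passes O' on its way out.
print(radar_coordinates(3.0, 2.0, 0.5))
print(lorentz(3.0, 2.0, 0.5))
```

Both lines should agree up to rounding, which is the content of (2.13) and the formula for $t'$.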

2.8 Further discussion on the Lorentz Transformations
2.8.1 Transformation formula for the velocity
Let $F$ and $F'$ be in standard configuration and moving with velocity $v$ along the $x$-axis.
For simplicity, we will restrict our attention to movements along the $x$-axis. Let $V$ be
the velocity of a particle relative to $F$. To find $V'$, the velocity relative to $F'$, recall that:
\begin{align}
V &\equiv \frac{dx}{dt}, \tag{2.14a}\\
V' &\equiv \frac{dx'}{dt'}, \tag{2.14b}
\end{align}
where the increments represent the distances and times between two events for the
particle relative to the two frames. Using the differential form of the Lorentz transformations

\[ dx' = \gamma(dx - v\,dt), \qquad dt' = \gamma\left(dt - \frac{v}{c^2}\,dx\right), \]

in (2.14b) one obtains

\[ V' = \frac{dx'}{dt'} = \frac{\gamma(dx - v\,dt)}{\gamma\left(dt - \frac{v}{c^2}\,dx\right)} = \frac{V - v}{1 - Vv/c^2}. \]

Remark. If $v \ll c$ one finds that $V' \approx V - v$. That is, one recovers the Galilean
transformation. Note also that if we do not restrict our attention to motion in the
$x$-direction, then one computes the other components of the velocity $V'$ in a similar manner
to above using $dy'/dt'$ and $dz'/dt'$. Observe that, in contrast with the Galilean transformations,
the velocity components transverse to the direction of motion of frame $F'$ are affected by
the Lorentz transformation!
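
Remark (numerical illustration). The recipe of taking ratios of transformed increments can be carried out directly. The sketch below is an illustration only (hypothetical function name, units with $c = 1$): it transforms a small displacement $(dt, dx, dy)$ and forms $dx'/dt'$ and $dy'/dt'$.

```python
import math


def transform_velocity(Vx, Vy, v, c=1.0):
    """Velocity components in F' obtained as ratios of transformed increments."""
    g = 1.0 / math.sqrt(1.0 - v**2 / c**2)
    dt = 1.0                        # any nonzero increment works, since the map is linear
    dx, dy = Vx * dt, Vy * dt
    dtp = g * (dt - v * dx / c**2)  # dt' = gamma (dt - v dx / c^2)
    dxp = g * (dx - v * dt)         # dx' = gamma (dx - v dt)
    dyp = dy                        # dy' = dy
    return dxp / dtp, dyp / dtp


print(transform_velocity(0.5, 0.0, 0.3))  # x-component equals (V - v)/(1 - V v/c^2)
print(transform_velocity(0.0, 0.5, 0.3))  # the transverse component is changed as well
```

In the second case the particle moves purely in the $y$-direction in $F$, yet its $y$-velocity in $F'$ differs from 0.5 because $dt' \neq dt$.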

2.8.2 Transformation formula for the acceleration


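A similar transformation for the acceleration can be found. Recall that
\[ a \equiv \frac{dV}{dt}, \qquad a' \equiv \frac{dV'}{dt'}. \]
Starting from
\[ V' = \frac{V - v}{1 - Vv/c^2} \]
and calculating the differential
\[ dV' = \frac{dV}{1 - Vv/c^2} + \frac{V - v}{(1 - Vv/c^2)^2}\,\frac{v}{c^2}\,dV, \]
one concludes that
\[ dV' = \frac{1}{\gamma^2}\,\frac{dV}{(1 - vV/c^2)^2}. \tag{2.15} \]

Also, from the Lorentz transformation

\[ dt' = \gamma\left(dt - \frac{v}{c^2}\,dx\right) = \gamma\left(1 - \frac{vV}{c^2}\right)dt, \]

it follows that
\[ \frac{dV'}{dt'} = \frac{1}{\gamma^3 (1 - vV/c^2)^3}\,\frac{dV}{dt}. \]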

Alternatively, one can write
\[ a' = \frac{1}{\gamma^3 (1 - vV/c^2)^3}\,a. \]
Notice that as a consequence of this formula, although acceleration is not an invariant,
if the acceleration is zero in one inertial frame, then it is zero in all inertial frames. Hence,
acceleration is in a certain sense absolute.
Remark. As before, if v ≪ c, then one finds that a′ ≈ a —the Galilean invariance of
acceleration.
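
Remark (numerical check). The acceleration formula can be tested with finite differences. The following is a rough sketch under stated assumptions (units with $c = 1$, hypothetical function names, a small time step `h`): it compares the ratio $a'/a$ obtained from small increments of $V'$ and $t'$ with the closed formula.

```python
import math


def v_prime(V, v, c=1.0):
    """Velocity transformation V' = (V - v) / (1 - V v / c^2)."""
    return (V - v) / (1.0 - V * v / c**2)


def acceleration_factor(V, v, c=1.0):
    """Factor multiplying a in a' = a / (gamma^3 (1 - v V / c^2)^3)."""
    g = 1.0 / math.sqrt(1.0 - v**2 / c**2)
    return 1.0 / (g**3 * (1.0 - v * V / c**2) ** 3)


def numerical_factor(V, v, a=1.0, h=1e-6, c=1.0):
    """Finite-difference estimate of a'/a for a particle with velocity V and acceleration a in F."""
    g = 1.0 / math.sqrt(1.0 - v**2 / c**2)
    dV = a * h                                  # change of V during dt = h
    dVp = v_prime(V + dV, v, c) - v_prime(V, v, c)
    dtp = g * h * (1.0 - v * V / c**2)          # dt' = gamma (dt - v dx / c^2) with dx = V dt
    return (dVp / dtp) / a


print(acceleration_factor(0.4, 0.6), numerical_factor(0.4, 0.6))  # close to each other
print(acceleration_factor(0.4, 1e-5))                             # close to 1 when v << c
```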

2.9 The Minkowski spacetime


There are many ways to study Special Relativity. Here we take the geometrical approach
developed in 1908 by H. Minkowski. This approach naturally leads (and led Einstein!)
to General Relativity.
To gain some intuition, start with the Euclidean geometry of the 2 dimensional plane
and recall the transformation of coordinates corresponding to the rotation of Cartesian
axes by an angle α in such a plane:
x′ = x cos α + y sin α,
y ′ = −x sin α + y cos α,
where (x, y) and (x′ , y ′ ) correspond to the coordinates of the point p in the two frames.








The transformation can be deduced from the diagram by observing that:


\begin{align*}
x' &= OA + AB = OA + CD\\
   &= OC\cos\alpha + PC\sin\alpha\\
   &= x\cos\alpha + y\sin\alpha,\\
y' &= PB = PD - BD\\
   &= PC\cos\alpha - OC\sin\alpha\\
   &= -x\sin\alpha + y\cos\alpha.
\end{align*}

The rotation parameter $\alpha$ can be eliminated by taking

\begin{align*}
x'^2 + y'^2 &= (x\cos\alpha + y\sin\alpha)^2 + (-x\sin\alpha + y\cos\alpha)^2\\
            &= x^2 + y^2 .
\end{align*}

Letting
\[ (OP)^2 \equiv x^2 + y^2, \tag{2.16} \]
one sees that in Euclidean space rotations leave the distance $(OP)$ invariant. Note
also that the rotation leaves curves of constant distance from the origin (i.e. circles)
invariant.





Analogue for Lorentz transformations. Starting from

\begin{align*}
ct' + x' &= e^{-\alpha}(ct + x),\\
ct' - x' &= e^{\alpha}(ct - x),
\end{align*}

and multiplying both sides one obtains

\[ -c^2 t^2 + x^2 = -c^2 t'^2 + x'^2, \]

where the choice of sign in the previous equation is a convention. Furthermore, since
$y' = y$ and $z' = z$ one obtains

\[ -c^2 t^2 + x^2 + y^2 + z^2 = -c^2 t'^2 + x'^2 + y'^2 + z'^2 . \tag{2.17} \]

Alternatively, one could start from the Lorentz transformations written for increments,
\[ \Delta t' = \gamma\left(\Delta t - \frac{v\,\Delta x}{c^2}\right), \qquad \Delta x' = \gamma(\Delta x - v\,\Delta t), \qquad \Delta y' = \Delta y, \qquad \Delta z' = \Delta z, \]
and, taking the infinitesimal limit in equation (2.17), one obtains

\[ -c^2 dt^2 + dx^2 + dy^2 + dz^2 = -c^2 dt'^2 + dx'^2 + dy'^2 + dz'^2 . \tag{2.18} \]

Therefore
\[ -c^2 dt^2 + dx^2 + dy^2 + dz^2 \]
remains invariant under Lorentz transformations (boosts).
Remark 1. The value of $c$ is unit dependent. Often, relativists choose units (relativistic
units) such that $c = 1$. That is, distance is measured in light seconds (the distance
travelled by light in 1 second). From now on we shall put $c = 1$. Subsequent formulae
may be put “right” dimensionally by putting the missing $c$’s back on dimensional
grounds.
Remark 2. With $c = 1$ one has that the invariant in equation (2.18) reduces to

\[ -dt^2 + dx^2 + dy^2 + dz^2 \]

which, apart from the negative sign, is very similar to the Euclidean distance in 4 dimensions
\[ ds^2 = dx^2 + dy^2 + dz^2 + dw^2 . \]
Furthermore, they both remain invariant under coordinate transformations: Lorentz
transformations and rotations, respectively. This invariant quantity is called the
interval $ds^2$ (or line element) in a new type of geometry called the Minkowski geometry or
spacetime. It is then described by

\[ ds^2 = -dt^2 + dx^2 + dy^2 + dz^2 . \]

The latter measures the “distance” between events $(t, x, y, z)$ and $(t + dt, x + dx, y + dy, z + dz)$
in spacetime.
Note. As opposed to Euclidean geometry, the set of points with equal distances from
the origin defines a hyperbola:

x2 − t2 = D, D a constant.










The set of curves in Minkowski space that are left invariant by Lorentz transformations
are hyperbolae.

The light cone


In the sequel the 3-dimensional surface in 4-dimensional spacetime given by

\[ x^2 + y^2 + z^2 - c^2 t^2 = 0, \tag{2.19} \]

will be of importance. This is said to define a light cone at the origin, because all light
rays emitted at $t = 0$ at the origin lie on the cone $x^2 + y^2 + z^2 = c^2 t^2$. Suppressing one space
dimension one has the following figure:

This figure is obtained by noticing that for constant $t$, equation (2.19) describes a
sphere of radius $c|t|$ (that is, $|t|$ in units with $c = 1$). Notice that the radius of the sphere
increases as $|t|$ increases. Also, notice that if $y = z = 0$ one obtains lines in the $(t, x)$ plane
which, in units with $c = 1$, have a slope of $\pm 45$ degrees.

2.9.1 Minkowski diagrams


The consequences of Special Relativity are best visualised using Minkowski diagrams.
These are pictures in Minkowski spacetime (usually $x$-$t$ pictures). As an example let
us look at the positions of the $x'$ and $t'$ axes relative to the $x$ and $t$ axes.
The $x'$ axis (i.e. $t' = 0$) is given by ($c = 1$):

\[ t' = \gamma(t - vx) = 0, \qquad \text{so that} \qquad t = vx . \]

Similarly, the $t'$ axis (i.e. $x' = 0$) is given by

\[ x' = \gamma(x - vt) = 0, \qquad \text{so that} \qquad t = \frac{1}{v}\,x . \]








One can also ask what is seen in the reference frame F ′ . For this one can use the
inverse Lorentz transformations

t = γ(t′ + vx′ ), x = γ(x′ + vt′ ).

The $x$ and $t$ axes from the point of view of the frame $F'$ are given, respectively, by

\[ t' = -v x', \qquad t' = -\frac{1}{v}\,x' . \]
Thus, the picture from $F'$’s point of view is the following:






This picture is consistent with the Principle of Relativity —all frames of reference
are equivalent and should provide an equivalent picture! We shall see further examples
of this symmetry in the sequel.

2.9.2 A brief discussion of causality
In what follows we discuss some consequences of the x-dependence of the Lorentz trans-
formation of time.










Consequence 1. Any event $E_i$ inside the light cone occurring after $O$ from the perspective
of $F$ will also occur after $O$ from the perspective of $F'$, no matter how fast $F'$ moves
with respect to $F$, so long as $|v| < c$. An event $E_0$ outside the light cone and occurring
after $O$ from the point of view of $F$ could occur before $O$ from the point of view of $F'$.
Therefore, outside the future (and similarly the past) light cone of $O$ there exists no
ordered time sense of events.
Given any point $O$, the spacetime is divided up into the absolute past of $O$ (the past
light cone at $O$), the absolute future of $O$ (the future light cone at $O$) and a (spacelike)
region known as the region of relative simultaneity.


 

 
  




Consequence 2. For invariance of causality, interactions must take place at speeds
less than $c$. To see this, consider a process in which an event $E_1$ causes an event $E_2$ at
super-light speed $u > c$ relative to some frame $F$. Choose coordinates in $F$ such that $E_1$
and $E_2$ occur on the $x$-axis and let their space and time separations satisfy $\Delta x > 0$, $\Delta t > 0$
(i.e. $E_1$ precedes $E_2$). Now, in a frame $F'$ moving with velocity $v$ relative to $F$ we have:
\[ \Delta t' = \gamma\left(\Delta t - \frac{v}{c^2}\,\Delta x\right) = \gamma\,\Delta t\left(1 - \frac{uv}{c^2}\right) \]
where
\[ u = \frac{\Delta x}{\Delta t} \]
is the speed of propagation. Now, for

\[ \frac{c^2}{u} < v < c \]

we would have $\Delta t' < 0$, so that in $F'$ the event $E_2$ precedes $E_1$, i.e. cause and effect are
reversed, or we have information travelling from receiver to transmitter!
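
Remark (numerical illustration). The sign change of $\Delta t'$ is easy to exhibit with numbers. The sketch below uses units with $c = 1$ and a purely hypothetical signal speed $u = 2$; the function name is illustrative.

```python
import math


def delta_t_prime(dt, u, v, c=1.0):
    """Time separation in F' of two events separated by dt and dx = u dt in F."""
    g = 1.0 / math.sqrt(1.0 - v**2 / c**2)
    return g * dt * (1.0 - u * v / c**2)


u = 2.0                                # hypothetical signal speed, twice the speed of light
print(delta_t_prime(1.0, u, 0.4))      # v < c^2/u = 0.5: positive, order preserved
print(delta_t_prime(1.0, u, 0.6))      # c^2/u < v < c: negative, order reversed
print(delta_t_prime(1.0, 0.9, 0.99))   # signal slower than light: positive
```

For a signal with $u < c$ the factor $1 - uv/c^2$ is positive for every $|v| < c$, so the time order of cause and effect is the same in all frames.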

2.10 4-vectors and tensors in Special Relativity
In order to write Newton’s laws in the Minkowski spacetime, we require 4-vectors. In
analogy with 3-vectors (which are invariant under the change of coordinates) we define
4-vectors in the Minkowski 4-dimensional geometry in such a way that the resulting
calculus will have equations invariant under Lorentz transformations (boosts). In order
to accomplish this, we require index notation, discussed in the next section.

2.10.1 Index notation


In what follows let
\[ (t, x, y, z) = (x^0, x^1, x^2, x^3), \]
where the index position is a convention (more about this later). Writing

\[ x^a, \qquad (a = 0, 1, 2, 3) \]

for $x^0, x^1, x^2, x^3$ we may write (2.18) as

\[ ds^2 = \sum_{a=0}^{3} \sum_{b=0}^{3} \eta_{ab}\, dx^a dx^b \tag{2.20} \]

where $\eta_{ab}$ is called the Minkowski metric tensor given by

\[ (\eta_{ab}) \equiv \begin{pmatrix} -1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \]

so that
\[ \eta_{11} = \eta_{22} = \eta_{33} = 1, \qquad \eta_{00} = -1, \]
while all other $\eta_{ab}$’s are zero.
In order to drop clumsy summations hereafter we will use the so-called Einstein
summation convention:

(i) Whenever an index is repeated (appears exactly twice, once upstairs and once
downstairs) in a term, it is understood to imply summation over that index over all
its permissible values. We refer to the repeated indices as being “contracted”. In
this course lower case Latin indices $a, b, \ldots$ take values 0, 1, 2, 3. Hence equation
(2.20) may be written
\[ ds^2 = \eta_{ab}\, dx^a dx^b . \]

(ii) Repeated indices are called dummy indices since they may be replaced by another
index (from the same alphabet!) not already used. For example:

\[ ds^2 = \eta_{ab}\, dx^a dx^b = \eta_{cd}\, dx^c dx^d . \]

(iii) To avoid ambiguity, no index should appear more than twice in the same expression.
So, for example,
\[ x^i y_i z_i \qquad \text{or} \qquad x_a y^a z_a \]
are not allowed!

(iv) Indices that occur only once in an expression (or term of an equation) are called
free indices. In an equation such indices match in every term. For example consider

\[ A_i B^i C^j = D^j . \]

Notice that $i$ is a dummy index and that $j$ is a free index.

Examples
For simplicity in the following examples let the Latin lower case index take values 1, 2.

(1)
\[ A^i B^j = \{A^1 B^1, A^1 B^2, A^2 B^1, A^2 B^2\} \]
as $i, j$ are free indices.

(2)
\[ A^i B_i = \sum_{i=1}^{2} A^i B_i = A^1 B_1 + A^2 B_2, \]

as $i$ is a dummy index.

(3)
\[ g_{ij} = \begin{pmatrix} g_{11} & g_{12} \\ g_{21} & g_{22} \end{pmatrix} \]
as, again, $i, j$ are free indices.

(4) In $\Gamma^i{}_{jk}$ all indices are free. There are 8 terms: $\Gamma^1{}_{11}, \Gamma^1{}_{12}, \ldots$

(5) In $R^i{}_{jkl}$ all indices are free and there are 16 terms: $R^1{}_{111}, R^1{}_{112}, R^1{}_{122}, \ldots$

(6)
\[ \Gamma^i{}_{jk}\,\frac{dx^j}{ds}\frac{dx^k}{ds} = \Gamma^i{}_{lm}\,\frac{dx^l}{ds}\frac{dx^m}{ds} \]
as $j, k$ (and likewise $l, m$) are dummy indices while $i$ is free.

(7) $x^a y_b z^b = x^a y_c z^c$.

(8) $g_{ij}\, dx^i dx^j = g_{mn}\, dx^m dx^n = g_{11} (dx^1)^2 + g_{12}\, dx^1 dx^2 + g_{21}\, dx^2 dx^1 + g_{22} (dx^2)^2$.
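
Remark (the convention as explicit loops). It may help to see the summation convention spelled out as ordinary sums. The following is a minimal sketch with hypothetical function names; note that Python lists are indexed from 0, so the components labelled 1, 2 in the examples above sit at positions 0, 1.

```python
def contract(A, B):
    """A^i B_i: sum over the repeated (dummy) index i."""
    return sum(A[i] * B[i] for i in range(len(A)))


def line_element(g, dx):
    """g_ij dx^i dx^j: double sum over both dummy indices."""
    n = len(dx)
    return sum(g[i][j] * dx[i] * dx[j] for i in range(n) for j in range(n))


A, B = [1.0, 2.0], [3.0, 4.0]
print(contract(A, B))            # A^1 B_1 + A^2 B_2, as in example (2)

eta = [[-1, 0, 0, 0],
       [0, 1, 0, 0],
       [0, 0, 1, 0],
       [0, 0, 0, 1]]
dx = [1.0, 0.5, 0.0, 0.0]        # (dt, dx, dy, dz)
print(line_element(eta, dx))     # -dt^2 + dx^2 + dy^2 + dz^2
```

Renaming the loop variables `i`, `j` does not change the result, which is exactly the statement that dummy indices may be relabelled.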

2.10.2 4-vectors
In spacetime, vectors are four-dimensional and that is why we refer to them as 4-vectors.
The most important point to remember about vectors is that they are located at a given
point in spacetime. You may be used to thinking of vectors as arrows connecting two
points on the plane; moreover, one tends to carelessly slide vectors from one point to
another. These concepts however do not generalise to curved spaces, where there are
no preferred curves connecting two points, or no unique way to move a vector around.
Rather, to each point p in spacetime, we associate the set of all possible vectors located
at that point; this set is known as the tangent space at p and we denote it by Tp . It is
important to think of these vectors as located at a single point p rather than stretching
from one point to another.

The tangent space Tp is an abstract vector space for each point in spacetime. Recall
that a (real) vector space is a collection of vectors that can be added together and
multiplied by real numbers in a linear way. For instance, for any two vectors V and W
and real numbers a and b, we have

(a + b)(W + V ) = a V + b V + a W + b W .

Every vector space has a zero vector that functions as the identity element under vector
addition. In some vector spaces there is an additional operation, namely the inner (dot)
product, but this requires extra structure.
A vector field is defined as a set of vectors with precisely one element at each point in
spacetime. It is useful sometimes to decompose vectors into components with respect to
some set of basis vectors. A basis is any set of vectors which both spans the vector space
and is linearly independent. For any vector space, there are infinitely many bases, but
each basis has the same number of vectors; this number is the dimension of the vector
space. In Minkowski space, the dimension is of course four.
Consider a basis of four vectors $\hat{e}_{(i)}$ with $i \in \{0, 1, 2, 3\}$. In a basis adapted to the
coordinates $x^a$, we would have

\[ \hat{e}_{(0)} = (1, 0, 0, 0), \quad \hat{e}_{(1)} = (0, 1, 0, 0), \quad \hat{e}_{(2)} = (0, 0, 1, 0), \quad \hat{e}_{(3)} = (0, 0, 0, 1). \]

Any abstract vector $A$ can be written as a linear combination of the basis vectors:

\[ A = A^i \hat{e}_{(i)} . \]

The coefficients $A^i$ are the components of the vector $A$. One has to remember that a
vector is an abstract geometrical object, while the components are just the coefficients
of the vector in some convenient basis.
An example of a vector in spacetime is the tangent vector to a curve. A parametrised
curve in spacetime is specified by the spacetime coordinates $x^a$ as a function of the
parameter $\lambda$ along the curve, i.e., $x^a(\lambda)$. Then, the tangent vector $V(\lambda)$ to the curve has
components
\[ V^a = \frac{dx^a}{d\lambda} . \]

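The entire vector is given by $V = V^a \hat{e}_{(a)}$. Under a Lorentz transformation the spacetime
coordinates $x^a$ change according to
\[ x^{a'} = L^{a'}{}_{b}\, x^b \tag{2.21} \]

where $L^{a'}{}_{b}$ is the Lorentz transformation matrix defined as
\[ (L^{a'}{}_{b}) \equiv \begin{pmatrix} L^{0'}{}_{0} & L^{0'}{}_{1} & L^{0'}{}_{2} & L^{0'}{}_{3} \\ L^{1'}{}_{0} & L^{1'}{}_{1} & L^{1'}{}_{2} & L^{1'}{}_{3} \\ L^{2'}{}_{0} & L^{2'}{}_{1} & L^{2'}{}_{2} & L^{2'}{}_{3} \\ L^{3'}{}_{0} & L^{3'}{}_{1} & L^{3'}{}_{2} & L^{3'}{}_{3} \end{pmatrix} = \begin{pmatrix} \gamma & -\gamma v & 0 & 0 \\ -\gamma v & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \]

for the case of two frames in standard configuration. Since the parameter $\lambda$ along the
curve is unchanged by the transformation, we then deduce that the components of the
tangent vector must transform as
\[ V^a \to V^{a'} = L^{a'}{}_{b}\, V^b . \]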

However, the vector $V$ itself (as opposed to its components in some coordinate system) is
invariant under Lorentz transformations. From this, we can deduce the transformation
rule for the basis vectors $\hat{e}_{(a)}$ under Lorentz transformations:
\[ V = V^a \hat{e}_{(a)} = V^{b'} \hat{e}_{(b')} = L^{b'}{}_{a} V^a \hat{e}_{(b')} \quad \Longrightarrow \quad \hat{e}_{(a)} = L^{b'}{}_{a}\, \hat{e}_{(b')} . \]

To get the new basis $\hat{e}_{(b')}$ in terms of the old one $\hat{e}_{(a)}$, we should multiply by the inverse of
the Lorentz transformation $L^{b'}{}_{a}$. But the inverse of a Lorentz transformation is another
Lorentz transformation, and therefore we can write
\[ L^{a}{}_{b'} L^{b'}{}_{c} = \delta^{a}_{c}, \qquad L^{p'}{}_{q} L^{q}{}_{r'} = \delta^{p'}_{r'} . \]

Then, the transformation rule for the basis vectors is

\[ \hat{e}_{(b')} = L^{a}{}_{b'}\, \hat{e}_{(a)} . \]

Therefore, the set of basis vectors transforms via the inverse Lorentz transformation of
the coordinates or vector components.
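
Remark (numerical illustration). The statement that the inverse of a boost is again a boost can be checked with explicit matrices. The sketch below is an illustration only: it uses units with $c = 1$, plain nested lists for the matrix $L^{a'}{}_{b}$ in standard configuration, and hypothetical helper names.

```python
import math


def boost(v):
    """Matrix of a boost with velocity v along x (units with c = 1)."""
    g = 1.0 / math.sqrt(1.0 - v**2)
    return [[g, -g * v, 0.0, 0.0],
            [-g * v, g, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


def matmul(A, B):
    """Product of two 4x4 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def apply(L, V):
    """Components in the new frame: sum over b of L[a][b] * V[b]."""
    return [sum(L[a][b] * V[b] for b in range(4)) for a in range(4)]


product = matmul(boost(-0.6), boost(0.6))        # boost with -v composed with boost with v
print(product)                                   # identity matrix up to rounding
print(apply(boost(0.6), [1.0, 0.0, 0.0, 0.0]))   # components (gamma, -gamma v, 0, 0)
```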

2.10.3 Dual vectors (one-forms)


Once we have defined a vector space Tp , we can define an associated vector space, namely
the dual vector space Tp∗ called the co-tangent space. The dual space is the space of all
linear maps from the original vector space Tp to the real numbers R. If ω ∈ Tp∗ , V, W ∈ Tp
and a, b ∈ R, then
ω(a V + b W ) = a ω(V ) + b ω(W ) ∈ R .
These maps form a vector space themselves, so if ω and η are dual vectors, then

(a ω + b η)(V ) = a ω(V ) + b η(V ) .

We can similarly introduce a set of basis dual vectors $\hat{\theta}^{(b)}$ defined by

\[ \hat{\theta}^{(b)}(\hat{e}_{(a)}) = \delta^{b}_{a} . \]

Then every dual vector can be written in terms of its components, which we label with
lower indices:
\[ \omega = \omega_a \hat{\theta}^{(a)} . \]
We will usually write $\omega_a$, in perfect analogy with vectors, to stand for the entire dual
vector. Sometimes the elements of $T_p$ are called contravariant vectors and the elements
of $T_p^*$ are referred to as covariant vectors or one-forms.
The component notation leads to a very simple way of writing the action of a dual
vector on a vector:

\begin{align}
\omega(V) &= \omega_a \hat{\theta}^{(a)}(V^b \hat{e}_{(b)}) \nonumber\\
          &= \omega_a V^b\, \hat{\theta}^{(a)}(\hat{e}_{(b)}) \nonumber\\
          &= \omega_a V^b\, \delta^{a}_{b} \nonumber\\
          &= \omega_a V^a \in \mathbb{R} . \tag{2.22}
\end{align}

Note that we only need to know the components. This equation also suggests that we
can think of vectors as linear maps on dual vectors by defining

\[ V(\omega) \equiv \omega(V) = \omega_a V^a . \]

Therefore, the dual space to the dual space is the original vector space itself.
In spacetime we will be interested in fields of vectors and dual vectors. In that
case, the action of a dual vector field on a vector field is not a single number but a
scalar (function) on spacetime. A scalar is a quantity that is invariant under Lorentz
transformations; it is a coordinate-independent map from spacetime to the real numbers.
We can use the same arguments that we used earlier for vectors to deduce the
transformation properties of the dual vectors under Lorentz transformations. We find (exercise),
\[ \omega_{a'} = L^{b}{}_{a'}\, \omega_b, \qquad \hat{\theta}^{(c')} = L^{c'}{}_{d}\, \hat{\theta}^{(d)} . \]

These transformation rules ensure that the scalar $\omega(V)$ is indeed invariant under Lorentz
transformations, as it should be.
The simplest example of a dual vector in spacetime is the gradient of a scalar function
$\phi$,
\[ d\phi = \frac{\partial \phi}{\partial x^a}\, \hat{\theta}^{(a)} . \]
The conventional chain rule used to transform partial derivatives amounts in this case to
the transformation rule of the components of dual vectors:

\begin{align*}
\frac{\partial \phi}{\partial x^{a'}} &= \frac{\partial x^b}{\partial x^{a'}} \frac{\partial \phi}{\partial x^b}\\
&= L^{b}{}_{a'}\, \frac{\partial \phi}{\partial x^b},
\end{align*}
where we have used (2.21) for the Lorentz transformation of the coordinates. The fact
that the gradient of a scalar function is a dual vector leads to the following shorthand
notation for the partial derivatives:
\[ \frac{\partial \phi}{\partial x^a} = \partial_a \phi = \phi_{,a} . \]
In these lectures we will usually use $\partial_a$ rather than the comma. Note that the gradient
acts in a natural way on the example of the vector tangent to a curve:
\[ \partial_a \phi\, \frac{dx^a}{d\lambda} = \frac{d\phi}{d\lambda} . \]
The result is just the derivative of the function $\phi$ along the curve $x^a(\lambda)$.

2.10.4 Tensors
Just as a dual vector is a linear map from vectors to $\mathbb{R}$, a tensor $T$ of type (or rank) $(k, l)$
is a multilinear map from a collection of dual vectors and vectors to $\mathbb{R}$:

\[ T : \underbrace{T_p^* \times \cdots \times T_p^*}_{k \text{ times}} \times \underbrace{T_p \times \cdots \times T_p}_{l \text{ times}} \to \mathbb{R} . \]

Here “ × ” denotes the Cartesian product, so for example Tp × Tp is the space of or-
dered pairs of vectors. Multilinearity means that the tensor acts linearly in each of its
arguments; for instance, for a tensor of type (1, 1), we have

T (a ω + b η, c V + d W ) = a c T (ω, V ) + a d T (ω, W ) + b c T (η, V ) + b d T (η, W ) ,

where ω, η ∈ Tp∗ , V, W ∈ Tp and a, b, c, d ∈ R.

The space of all tensors of fixed type $(k, l)$ forms a vector space. To construct a basis
for this space, we first need to define the tensor product, denoted by $\otimes$. If $T$ is a $(k, l)$
tensor and $S$ is an $(m, n)$ tensor, we define a $(k + m, l + n)$ tensor $T \otimes S$ by

\begin{align*}
& T \otimes S(\omega^{(1)}, \ldots, \omega^{(k)}, \ldots, \omega^{(k+m)}, V^{(1)}, \ldots, V^{(l)}, \ldots, V^{(l+n)})\\
& \quad = T(\omega^{(1)}, \ldots, \omega^{(k)}, V^{(1)}, \ldots, V^{(l)}) \times S(\omega^{(k+1)}, \ldots, \omega^{(k+m)}, V^{(l+1)}, \ldots, V^{(l+n)}) .
\end{align*}

Here is how it works: first act $T$ on the appropriate set of dual vectors and vectors, and
then act $S$ on the remainder and multiply the answers. Note that in general $T \otimes S \neq S \otimes T$.
It is now straightforward to construct a basis for the space of all $(k, l)$ tensors by
taking tensor products of basis vectors and dual vectors; this basis will consist of all
tensors of the form
\[ \hat{e}_{(a_1)} \otimes \cdots \otimes \hat{e}_{(a_k)} \otimes \hat{\theta}^{(b_1)} \otimes \cdots \otimes \hat{\theta}^{(b_l)} . \]

In components, we then write an arbitrary $(k, l)$ tensor as

\[ T = T^{a_1 \ldots a_k}{}_{b_1 \ldots b_l}\, \hat{e}_{(a_1)} \otimes \cdots \otimes \hat{e}_{(a_k)} \otimes \hat{\theta}^{(b_1)} \otimes \cdots \otimes \hat{\theta}^{(b_l)} . \]

Alternatively, one can define the components by acting $T$ on the basis of vectors and
dual vectors:
\[ T^{a_1 \ldots a_k}{}_{b_1 \ldots b_l} = T(\hat{\theta}^{(a_1)}, \ldots, \hat{\theta}^{(a_k)}, \hat{e}_{(b_1)}, \ldots, \hat{e}_{(b_l)}) . \]

As with vectors, we will usually denote a tensor $T$ by its components $T^{a_1 \ldots a_k}{}_{b_1 \ldots b_l}$.
The action of a general tensor $T$ on a set of vectors and dual vectors is the expected one:

\[ T(\omega^{(1)}, \ldots, \omega^{(k)}, V^{(1)}, \ldots, V^{(l)}) = T^{a_1 \ldots a_k}{}_{b_1 \ldots b_l}\, \omega^{(1)}_{a_1} \cdots \omega^{(k)}_{a_k}\, V^{(1) b_1} \cdots V^{(l) b_l} . \]

A $(k, l)$ tensor thus has $k$ upper indices and $l$ lower indices. The order is important since
it need not act in the same way on its various arguments.
The transformation of the tensor components under Lorentz transformations follows
from the transformation properties of the basis of vectors and dual vectors, and it is what
one would expect from the placement of the indices:

\[ T^{a'_1 \ldots a'_k}{}_{b'_1 \ldots b'_l} = L^{a'_1}{}_{a_1} \cdots L^{a'_k}{}_{a_k}\, L^{b_1}{}_{b'_1} \cdots L^{b_l}{}_{b'_l}\, T^{a_1 \ldots a_k}{}_{b_1 \ldots b_l} . \]

Although we have defined tensors as linear maps from sets of vectors and dual vectors
to $\mathbb{R}$, nothing forces us to act on all the arguments. Thus, a $(1, 1)$ tensor also acts as a
map from vectors to vectors:
\[ T^{a}{}_{b} : V^b \to T^{a}{}_{b} V^b . \]

Exercise: Check that $T^{a}{}_{b} V^b$ transforms as a vector under Lorentz transformations.

Similarly, one can act one tensor on (all or part of) another tensor to obtain a third
tensor. For example,
\[ U^{a}{}_{b} = T^{ad}{}_{c}\, S^{c}{}_{db} \]

is a $(1, 1)$ tensor.
In spacetime we have already seen examples of tensors. The Minkowski metric $\eta_{ab}$
is a $(0, 2)$ tensor. The metric provides extra structure to the space and, in particular, it
allows us to define an inner product and hence a norm. We have the following operations:

Norm or magnitude of a 4-vector. It is defined by

\[ |A|^2 \equiv \eta_{ab} A^a A^b = -(A^0)^2 + (A^1)^2 + (A^2)^2 + (A^3)^2, \tag{2.23} \]

which, in analogy with the invariance of

\[ -t^2 + x^2 + y^2 + z^2 = |\bar{x}|^2, \qquad x^a = (t, x, y, z), \]

is invariant.
Example: Show by direct substitution that the norm of a 4-vector is invariant. For
simplicity let $c = 1$. One has that

\begin{align*}
A^{0'} &= \gamma(A^0 - vA^1),\\
A^{1'} &= \gamma(A^1 - vA^0),\\
A^{2'} &= A^2,\\
A^{3'} &= A^3 .
\end{align*}

Hence,

\begin{align*}
-(A^{0'})^2 + (A^{1'})^2 &= \gamma^2 (A^1)^2 + \gamma^2 v^2 (A^0)^2 - 2\gamma^2 v A^1 A^0 - \gamma^2 (A^0)^2 - \gamma^2 v^2 (A^1)^2 + 2\gamma^2 v A^0 A^1\\
&= \gamma^2 (A^1)^2 (1 - v^2) - \gamma^2 (A^0)^2 (1 - v^2)\\
&= -(A^0)^2 + (A^1)^2 .
\end{align*}

Remark. Because of the negative sign in (2.23), the norm of a vector does not have to
be positive! A 4-vector $A^a$ is said to be:

• timelike if $|A|^2 < 0$,

• spacelike if $|A|^2 > 0$,

• null if $|A|^2 = 0$.

In Minkowski spacetime a null vector need not be the zero vector (the vector whose components
are all zero)! Only in a space in which the norm is positive definite is it true that $|A|^2 = 0$
implies $A^a = 0$.
Example: Show that $A^a = (1, 1, 0, 0)$ is a null vector. A direct computation gives

\[ |A|^2 = -(A^0)^2 + (A^1)^2 = -1 + 1 = 0 . \]

Similarly for
\[ (1, -1, 0, 0), \quad (1, 0, 1, 0), \quad \left(1, \tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}, 0\right), \quad \text{etc.} \]
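
Remark (numerical illustration). The classification into timelike, spacelike and null vectors only requires the sign of (2.23). The sketch below is illustrative: the function names and the small tolerance used to decide whether a floating-point norm counts as zero are assumptions, and $c = 1$.

```python
import math


def norm_squared(A):
    """|A|^2 = -(A^0)^2 + (A^1)^2 + (A^2)^2 + (A^3)^2, equation (2.23)."""
    return -A[0] ** 2 + A[1] ** 2 + A[2] ** 2 + A[3] ** 2


def classify(A, tol=1e-12):
    """Return 'timelike', 'spacelike' or 'null' according to the sign of |A|^2."""
    n = norm_squared(A)
    if abs(n) <= tol:
        return "null"
    return "timelike" if n < 0 else "spacelike"


s = 1.0 / math.sqrt(2.0)
for A in [(1, 1, 0, 0), (1, -1, 0, 0), (1, s, s, 0), (2, 1, 0, 0), (0, 1, 0, 0)]:
    print(A, classify(A))
```

The first three vectors are null although none of them is the zero vector; the last two are timelike and spacelike, respectively.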

Inner (or scalar) product. The scalar product of two 4-vectors $A^a$, $B^b$ is defined by
\[ A \cdot B = \eta_{ab} A^a B^b = -A^0 B^0 + A^1 B^1 + A^2 B^2 + A^3 B^3 . \]
Notice that as a consequence of this definition $|A|^2 = A \cdot A$.
Example: With the help of a sketch convince yourself that the sum of two timelike or
spacelike vectors or the sum of a timelike and a spacelike vector can be null!

Manipulating tensors
The operation of contraction consists of summing over one upper and one lower index,
thus turning a (k, l) tensor into a (k − 1, l − 1) tensor:
S abc = T adbcd .
One can check (exercise) that the object on the left hand side is a well-defined tensor.
Note that it is only allowed to contract an upper index with a lower one, otherwise the
result would not be a tensor. Also, in general the order of the indices matters, so one
gets different tensors by contracting in different ways:
T abcdb ̸= T acbdb
The metric and inverse metric can be used to raise and lower indices on tensors. That
is, given a tensor $T^{ab}{}_{cd}$ we can use the metric to define new tensors with different
positions of the indices:
\[
T^{abm}{}_{d} = \eta^{mc}\, T^{ab}{}_{cd}, \qquad
T_{m}{}^{b}{}_{cd} = \eta_{ma}\, T^{ab}{}_{cd}, \qquad
T_{mn}{}^{pq} = \eta_{ma}\eta_{nb}\eta^{pc}\eta^{qd}\, T^{ab}{}_{cd},
\]
and so forth. Notice that raising or lowering does not change the position of the index
relative to the other indices, and also that free indices (which are not summed over) must
be the same on both sides of the equation, while dummy indices (which are summed over)
only appear on one side. For example, we can turn vectors and dual vectors into each
other by raising and lowering the indices:
\[ V_a = \eta_{ab} V^b, \qquad \omega^a = \eta^{ab} \omega_b. \]
Because the metric and inverse metric are inverses of each other, we can raise and lower
simultaneously a pair of indices being contracted over:
\[ A^a B_a = \eta^{bc} A_c\, \eta_{bd} B^d = \delta^c_d\, A_c B^d = A_d B^d. \]
Therefore, in spaces with a metric, we don’t make a distinction between vectors and dual
vectors.
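As an illustration of raising and lowering, here is a minimal sketch in plain Python, assuming the Minkowski metric diag(−1, 1, 1, 1) in Cartesian coordinates (for which the components of the metric and of its inverse coincide numerically). The helper names are illustrative assumptions.

```python
# Components of eta_ab; for diag(-1, 1, 1, 1) the inverse eta^ab has the same entries.
ETA = [[-1, 0, 0, 0],
       [0, 1, 0, 0],
       [0, 0, 1, 0],
       [0, 0, 0, 1]]

def lower(V):
    """V_a = eta_ab V^b."""
    return [sum(ETA[a][b] * V[b] for b in range(4)) for a in range(4)]

def raise_index(w):
    """w^a = eta^ab w_b."""
    return [sum(ETA[a][b] * w[b] for b in range(4)) for a in range(4)]

def contract(upper, lower_components):
    """Sum over one upper and one lower index."""
    return sum(u * l for u, l in zip(upper, lower_components))

A = [1.0, 1.0, 0.0, 0.0]   # the null vector from the earlier example
B = [2.0, 0.5, 1.0, 0.0]   # arbitrary sample vector

print(contract(A, lower(B)))   # A^a B_a
print(contract(B, lower(A)))   # A_d B^d, the same number
```

Lowering and then raising an index returns the original components, which is the statement that the metric and inverse metric are inverses of each other.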

Symmetries of tensors
A tensor is said to be symmetric in any of its indices if it is unchanged under the exchange
of those indices. For example, if
Sabc = Sbac
we say that Sabc is symmetric in its first two indices, while if

Sabc = Sacb = Scab = Sbac = Sbca = Scba

we say that Sabc is symmetric in all of its indices. Similarly, a tensor is said to be
anti-symmetric in any of its indices if it changes sign when those indices are exchanged.
Hence,
Aabc = −Acba
means that the tensor Aabc is anti-symmetric in its first and third indices. If a tensor is
(anti-) symmetric in all of its indices, we refer to it as simply (anti-) symmetric (some-
times with the redundant modifier “completely”). Notice that it does not make sense to
exchange upper and lower indices with each other, so for example Kronecker’s delta $\delta^a_b$
is neither symmetric nor anti-symmetric.
Given any tensor, we can symmetrise (or anti-symmetrise) any number of its lower
or upper indices. To symmetrise,

\[
T_{(a_1 a_2 \dots a_n) b}{}^{c} = \frac{1}{n!} \left( T_{a_1 a_2 \dots a_n b}{}^{c} + \text{sum over permutations of indices } a_1 \dots a_n \right),
\]
while anti-symmetrisation comes with the alternating sum:
\[
T_{[a_1 a_2 \dots a_n] b}{}^{c} = \frac{1}{n!} \left( T_{a_1 a_2 \dots a_n b}{}^{c} + \text{alternating sum over permutations of indices } a_1 \dots a_n \right).
\]
By alternating sum we mean that permutations that are the result of an odd number of
exchanges of indices are given a minus sign, for example,
\[
T_{[abc]d} = \frac{1}{3!} \left( T_{abcd} - T_{acbd} + T_{cabd} - T_{bacd} + T_{bcad} - T_{cbad} \right).
\]
The standard notation is to use round/square brackets for symmetrisation/anti-
symmetrisation. Sometimes we may want to (anti-)symmetrise indices that are not next
to each other, in which case we use vertical bars to denote indices that are not included
in the sum:
\[ T_{(a|b|c)} = \frac{1}{2} \left( T_{abc} + T_{cba} \right). \]
If we are contracting over a pair of indices that are symmetric/anti-symmetric on one
tensor, only the symmetric/anti-symmetric part of the other tensor will contribute (exercise):
\[ A^{(ab)} B_{ab} = A^{(ab)} B_{(ab)}, \qquad X^{[cd]} Y_{cd} = X^{[cd]} Y_{[cd]}, \]
regardless of the symmetry properties of B and Y respectively. For any two indices we
can decompose a tensor into its symmetric and anti-symmetric parts,

\[ T_{abcd} = T_{(ab)cd} + T_{[ab]cd}, \]

but this is not generally true for three or more indices:

\[ T_{abcd} \neq T_{(abc)d} + T_{[abc]d}, \]

because there are parts with mixed symmetry that are not specified by either the sym-
metric or anti-symmetric pieces. Note that according to the convention used here, a
symmetric tensor S satisfies
Sa1 ...an = S(a1 ...an ) ,
and likewise for anti-symmetric tensors.
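The definitions of symmetrisation and anti-symmetrisation can be tried out on a small example. The sketch below, in plain Python, (anti-)symmetrises a rank-3 array over all three indices and shows that the two pieces do not add up to the original tensor. The sample tensor, the dimension n = 3 and the function names are arbitrary illustrative choices.

```python
import itertools

def parity(perm):
    """Return +1 for an even permutation of (0, ..., n-1) and -1 for an odd one."""
    perm = list(perm)
    sign = 1
    for i in range(len(perm)):
        while perm[i] != i:          # swap entries into place, counting exchanges
            j = perm[i]
            perm[i], perm[j] = perm[j], perm[i]
            sign = -sign
    return sign

def project(T, alternating):
    """Return T_(abc) (alternating=False) or T_[abc] (alternating=True)."""
    n = len(T)
    perms = list(itertools.permutations(range(3)))
    out = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for idx in itertools.product(range(n), repeat=3):
        total = 0.0
        for p in perms:
            sign = parity(p) if alternating else 1
            total += sign * T[idx[p[0]]][idx[p[1]]][idx[p[2]]]
        out[idx[0]][idx[1]][idx[2]] = total / len(perms)   # the 1/n! factor
    return out

n = 3
# Arbitrary sample tensor with no particular symmetry.
T = [[[(a + 1) * (b + 1) ** 2 * (c + 1) ** 3 for c in range(n)]
      for b in range(n)] for a in range(n)]
S = project(T, alternating=False)
A = project(T, alternating=True)

print(S[0][1][2], S[1][0][2])                   # equal: S is symmetric
print(A[0][1][2], A[1][0][2])                   # opposite signs: A is anti-symmetric
print(T[0][0][1], S[0][0][1] + A[0][0][1])      # differ: mixed-symmetry part is missing
```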
If we think of $X^a{}_b$ as a matrix, we can sum its diagonal components to compute its
trace, $X^a{}_a$. However, we will also want to compute the trace of a (0, 2)
tensor $Y_{ab}$, in which case we first have to raise an index ($Y^a{}_b = \eta^{ac} Y_{cb}$) and then contract:

\[ Y = Y^a{}_a = \eta^{ab} Y_{ab}. \]

Note that the sum of the diagonal components of $Y^a{}_b$ is not the same as the sum of the
diagonal components of $Y_{ab}$. In particular, the trace of the Minkowski metric is given by

\[ \eta^{ab} \eta_{ab} = \delta^a_a = 4. \]

Note that anti-symmetric (0, 2) tensors are always traceless (check!).

2.11 Proper time
In order to develop relativistic dynamics one requires the analogues of
\[ v^a = \frac{dx^a}{dt}, \qquad a^b = \frac{dv^b}{dt}, \qquad F^c = \frac{dp^c}{dt}, \]
etc. The problem is that in Special Relativity, t = x0 is not a scalar, so that we cannot
just carry d/dt over to Special Relativity.
The closest thing to dt which is a scalar is the proper time interval dτ defined by
\[ d\tau^2 \equiv -\frac{ds^2}{c^2} = dt^2 - \frac{dx^2}{c^2} - \frac{dy^2}{c^2} - \frac{dz^2}{c^2}. \]
In the previous definition the minus sign is included so that $d\tau$ and $dt$ have the same
sign! The name proper time comes from the fact that a clock at rest relative to the moving
particle, i.e. in the particle’s rest frame where $dx = dy = dz = 0$, has $d\tau = dt$; that is,
$d\tau$ is equal to the time elapsed on the particle’s clock.
We employ τ as the invariant measure of time for the particle.

2.12 4-velocity and 4-momentum


In order to express Newton’s laws in Special Relativity in an invariant way, we need to
express them in terms of 4-vectors.

4-velocity
The 4-velocity of a particle is defined as a unit tangent to its worldline:
\[ U^a = \frac{dx^a}{d\tau}, \qquad U^i = \frac{dx^i}{d\tau}. \]
Notation: we will reserve the middle Latin indices i, j, k, . . . to denote the spatial com-
ponents, so these indices range from 1 to 3. In what follows, for simplicity we set c = 1.
Remarks:
(1) From the definition of dτ one finds that

ds2 = −dτ 2 = dx̄ · dx̄ = ηab dxa dxb

where dxa = (dt, dx, dy, dz) so that

U a Ua = −1. (2.24)

Thus the 4-velocity, as defined, has unit length.


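(2) From $d\tau^2 = dt^2 - dx^2 - dy^2 - dz^2$ one finds that
\[
\left( \frac{d\tau}{dt} \right)^2 = 1 - \left( \frac{dx}{dt} \right)^2 - \left( \frac{dy}{dt} \right)^2 - \left( \frac{dz}{dt} \right)^2 = 1 - v^2,
\]
where $\mathbf{v}$ denotes the 3-velocity relative to the frame F and $v^2 = \mathbf{v} \cdot \mathbf{v}$. Hence, one
concludes that
\[ \frac{dt}{d\tau} = \frac{1}{\sqrt{1 - v^2}} = \gamma(v) \qquad (c = 1). \]

Now, using
\[ \frac{dx}{d\tau} = \frac{dx}{dt} \frac{dt}{d\tau} = \gamma(v)\, v^1, \quad \text{etc.} \]
one finds that
\[
U^a = \left( \frac{dt}{d\tau}, \frac{dx}{d\tau}, \frac{dy}{d\tau}, \frac{dz}{d\tau} \right) = \gamma(v)(1, v^1, v^2, v^3),
\]
or in short
\[ U^a = \gamma(v)(1, \mathbf{v}). \tag{2.25} \]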
Note that the spatial part of U a is essentially v (with a relativistic correction).
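As a quick numerical check of (2.24) and (2.25), the sketch below builds the 4-velocity from a sample 3-velocity (units with c = 1) and evaluates its norm. The function names and the sample velocity are illustrative assumptions.

```python
import math

def gamma(v3):
    """Lorentz factor for a 3-velocity v3 = (v1, v2, v3), with c = 1."""
    speed_sq = sum(component * component for component in v3)
    return 1.0 / math.sqrt(1.0 - speed_sq)

def four_velocity(v3):
    """U^a = gamma(v) (1, v), as in (2.25)."""
    g = gamma(v3)
    return (g, g * v3[0], g * v3[1], g * v3[2])

def minkowski_dot(A, B):
    """A . B = eta_ab A^a B^b with signature (-, +, +, +)."""
    return -A[0] * B[0] + A[1] * B[1] + A[2] * B[2] + A[3] * B[3]

U = four_velocity((0.3, 0.4, 0.0))   # sample 3-velocity with |v| = 0.5
print(minkowski_dot(U, U))           # close to -1, as in (2.24)
```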

4-momentum
The 4-momentum is the natural analogue of the 3-momentum:
\[ p^a = m_0 U^a, \]
where $m_0$ denotes the mass of the particle. From the definition it follows that
\[ p^a p_a = m_0^2\, U^a U_a = -m_0^2, \]
where it has been used that $U^a U_a = -1$. Also, using (2.25) one has
\[ p^a = m_0 \gamma(v)(1, \mathbf{v}). \tag{2.26} \]
It follows that the space part of (2.26) can be identified with the 3-momentum, where
by analogy $m_0 \gamma$ is called the moving mass, or the apparent mass, and $m_0$ is referred
to as the rest mass. From
\[ m \equiv m_0 \gamma(v) = \frac{m_0}{\sqrt{1 - v^2/c^2}}, \]
we see that m → ∞ as v → c, so it is impossible to accelerate a massive particle to the
speed of light since its mass would effectively become infinite, thus requiring an infinite
amount of energy.
Multiplying the time component of pa by c, we can identify it with the energy
E = m0 c2 γ(v).
One reason for this identification comes from considering the limit for small v/c. For
v/c ≪ 1 one has
\[
E = m_0 c^2 \gamma(v) = m_0 c^2 (1 - v^2/c^2)^{-1/2} = m_0 c^2 + \tfrac{1}{2} m_0 v^2 + O(v^4/c^2),
\]
where the binomial expansion has been used. Now, the second term is just the Newtonian
kinetic energy ($\tfrac{1}{2} m_0 v^2$). The first term ($m_0 c^2$) is then interpreted as the rest mass energy.
This is the famous equation
Erest = m0 c2 .
Note that since c ≈ 300,000 km/s is a very large number, $E_{\rm rest}$ is typically enormous.
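The quality of the low-velocity approximation can be checked numerically. The following sketch, assuming units with m_0 = c = 1 and an arbitrarily chosen small speed, compares the exact energy with the rest energy plus Newtonian kinetic energy; the leading correction from the binomial series is (3/8) m_0 v^4 / c^2.

```python
import math

def energy_exact(m0, v, c=1.0):
    """E = m0 c^2 gamma(v)."""
    return m0 * c ** 2 / math.sqrt(1.0 - (v / c) ** 2)

def energy_newtonian(m0, v, c=1.0):
    """Rest energy plus Newtonian kinetic energy."""
    return m0 * c ** 2 + 0.5 * m0 * v ** 2

m0, v = 1.0, 0.01                     # sample values, v/c << 1
difference = energy_exact(m0, v) - energy_newtonian(m0, v)

# The ratio approaches 3/8, the coefficient of the next term in the expansion.
print(difference / v ** 4)
```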
From the previous discussion one can write (c = 1),
pa = (E, p), (2.27)
with p the 3-momentum and E the energy. From (2.26) one concludes that
pa pa = (E, p) · (E, p) = −E 2 + p · p = −m20 ,
and hence
E 2 − p · p = m20 , (c = 1).

4-acceleration
As one might expect, the 4-acceleration is the natural analogue of the 3-acceleration:

\[ a^b = \frac{dU^b}{d\tau} = \frac{d^2 x^b}{d\tau^2}, \]
where U a is the 4-velocity previously defined. Observe that by differentiating the equation
U a Ua = −1, it follows that
ab Ub = 0,
so that 4-acceleration and 4-velocity are found to be orthogonal.

2.13 Photons
The definition of 4-velocity given in the previous sections breaks down when applied to
particles moving with the speed of light (photons), since for light rays one has
$ds^2 = -d\tau^2 = 0$. In this case one may choose another parameter $\lambda$ and define

\[ k^a = \frac{dx^a}{d\lambda}, \]

which satisfies $k^a k_a = 0$, i.e. $k^a$ is null. This also implies that $p^a p_a = 0$ for photons, as $p^a$
points in the direction of $k^a$. Now, recalling that $p^a p_a = -m_0^2$, it follows that $m_0 = 0$ for
photons. Hence, particles moving with the speed of light must be massless!
Consider a photon with 4-momentum pa = (E, p) defined relative to some frame F .
As seen before pa pa = 0, so that one finds that

\[ E^2 - p^2 = 0, \quad \text{or} \quad E = p, \qquad p \equiv |\mathbf{p}|. \]

Therefore, for photons the magnitude of the spatial 3-momentum and the energy are equal (with c = 1). In particular,
if the photon moves along the x-direction one has that

px = E.

2.14 Doppler shift


Let F and F ′ be in standard configuration. Consider a photon of frequency ν moving in
the x-direction relative to the frame F . Relative to the frame F ′ the energy of the photon
may be obtained using a Lorentz transformation. For this recall that pa is a 4-vector and
its energy is given by its t-component. So, from

pa = (E, px ), py = pz = 0,

one obtains
E ′ = γ(E − vpx ), (c = 1). (2.28)
Also, recall that from Quantum Mechanics, a photon of frequency ν has energy given by
hν where h denotes Planck’s constant:

\[ h = 6.626 \times 10^{-34}\ \text{J s}. \]

Similarly, one has $E' = h\nu'$. Substituting in (2.28) one obtains

\[ h\nu' = \frac{h\nu - v p_x}{\sqrt{1 - v^2}}. \tag{2.29} \]
Furthermore, for such a photon $E = p_x$, so that substituting into (2.29):

\[ h\nu' = \frac{h\nu - v h\nu}{\sqrt{1 - v^2}}, \]
from which we can conclude
\[ \frac{\nu'}{\nu} = \frac{1 - v}{\sqrt{1 - v^2}} = \sqrt{\frac{1 - v}{1 + v}}. \]
Restoring the constant c:
\[ \frac{\nu'}{\nu} = \sqrt{\frac{1 - v/c}{1 + v/c}}. \tag{2.30} \]
This is the relativistic Doppler shift formula. Note that when $v/c \ll 1$, then using the
binomial expansion in (2.30) one obtains

\[ \frac{\nu'}{\nu} \approx 1 - v/c, \]
which is the usual (non-relativistic) formula for the Doppler shift.
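A short numerical sketch of the Doppler formula follows, written in terms of β = v/c. It evaluates (2.30), compares it with the Lorentz-transformed photon energy (2.28), and checks the small-velocity limit. The function names and sample values are illustrative assumptions.

```python
import math

def doppler_ratio(beta):
    """nu'/nu from (2.30) for frames in standard configuration, beta = v/c."""
    return math.sqrt((1.0 - beta) / (1.0 + beta))

def transformed_energy(E, p_x, beta):
    """E' = gamma (E - v p_x) from (2.28), with c = 1."""
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    return gamma * (E - beta * p_x)

beta = 0.6                      # sample relative speed
E = 1.0                         # photon energy h*nu in arbitrary units; p_x = E

print(doppler_ratio(beta))                   # 0.5 up to rounding
print(transformed_energy(E, E, beta) / E)    # the same ratio from (2.28)
print(doppler_ratio(1e-3), 1.0 - 1e-3)       # nearly equal for small beta
```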
Remark. The Doppler shift has been fundamental in Cosmology to establish the ex-
pansion of the Universe.

2.15 Relativistic dynamics


In Special Relativity Newton’s laws become:
First law. Remains unchanged, except that the straight lines referred to are now
worldlines in Minkowski spacetime.
Second law. One has
\[ F^a = \frac{dp^a}{d\tau}. \]

Third law. On the basis of very precise experiments in Particle Physics, this remains
unchanged. That is, 4-momentum is conserved in collisions:
\[ \sum_i p_i^a = \text{constant}, \]

where the sum is over the particles involved in the collision.


Note. The time component of this equation expresses the conservation of energy, with the
rest mass energy included in the balance!

2.15.1 Examples of relativistic collisions


This type of problem can be solved by equating components, squaring and then using
further properties of $p^a$.

Example 1
Consider two particles with rest masses $m_1$ and $m_2$, both moving collinearly with
speeds $u_1$ and $u_2$. The particles collide and coalesce with the resulting particle moving
in the same direction. The question is: what are the mass m and the speed u of the
resulting particle?

Recall that pa = mγ(1, v) for a particle of 3-velocity v. The initial 4-momenta are:

pa1 = m1 γ(u1 )(1, u1 , 0, 0),


pa2 = m2 γ(u2 )(1, u2 , 0, 0).

The final 4-momentum is


pa = mγ(u)(1, u, 0, 0).

The conservation of 4-momentum is expressed by

pa = pa1 + pa2 . (2.31)

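Squaring,
\[ p^2 = p^a p_a = p_1^2 + p_2^2 + 2 p_1 \cdot p_2. \tag{2.32} \]

However,

\[ p^2 = -m^2, \qquad p_1^2 = -m_1^2, \qquad p_2^2 = -m_2^2, \]

\[ p_1 \cdot p_2 = m_1 m_2 \gamma(u_1) \gamma(u_2) (-1 + u_1 u_2). \]

Substituting in (2.32):
\[ m = \sqrt{m_1^2 + m_2^2 + 2 m_1 m_2 \gamma(u_1) \gamma(u_2) (1 - u_1 u_2)}. \tag{2.33} \]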

Taking space and t-components of 4-momenta in equation (2.31)

mγ(u)u = m1 γ(u1 )u1 + m2 γ(u2 )u2 , (2.34a)


mγ(u) = m1 γ(u1 ) + m2 γ(u2 ). (2.34b)

Dividing (2.34a) by (2.34b) one obtains

\[ u = \frac{m_1 \gamma(u_1) u_1 + m_2 \gamma(u_2) u_2}{m_1 \gamma(u_1) + m_2 \gamma(u_2)}. \tag{2.35} \]

Remark. In the limit of $u_1 \ll c$ and $u_2 \ll c$ one has that $\gamma(u_1), \gamma(u_2) \approx 1$ and that
$(1 - u_1 u_2) \approx 1$, so that (2.33) and (2.35) yield

\[ m \approx m_1 + m_2, \qquad u \approx \frac{m_1 u_1 + m_2 u_2}{m_1 + m_2}, \]

which is the classical version of the result.
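The results (2.33) and (2.35) can be cross-checked against the energy balance (2.34b). The sketch below assumes units with c = 1; the function names and the sample masses and speeds are illustrative choices.

```python
import math

def gamma(u):
    """Lorentz factor for speed u along the x-axis, with c = 1."""
    return 1.0 / math.sqrt(1.0 - u * u)

def coalesce(m1, u1, m2, u2):
    """Return (m, u) of the merged particle using (2.33) and (2.35)."""
    g1, g2 = gamma(u1), gamma(u2)
    m = math.sqrt(m1 ** 2 + m2 ** 2 + 2.0 * m1 * m2 * g1 * g2 * (1.0 - u1 * u2))
    u = (m1 * g1 * u1 + m2 * g2 * u2) / (m1 * g1 + m2 * g2)
    return m, u

m1, u1, m2, u2 = 1.0, 0.6, 2.0, -0.3      # sample head-on collision
m, u = coalesce(m1, u1, m2, u2)

# Energy balance (2.34b): both sides should agree.
print(m * gamma(u), m1 * gamma(u1) + m2 * gamma(u2))
# The final rest mass exceeds m1 + m2: kinetic energy has become rest mass.
print(m, m1 + m2)
```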

Example 2
Consider the scattering of a photon of frequency $\nu$, moving in the x-direction, by an
electron of rest mass $m_e$, in a frame in which the electron is initially at rest. Assume that the
subsequent motion remains in the xy plane. After the collision, the photon has frequency
$\nu'$ and is moving at an angle $\alpha$ with the horizontal, while the electron is moving at an
angle $\beta$. Find the angles $\alpha$ and $\beta$ in terms of the frequencies before and after the collision,
i.e., $\nu$ and $\nu'$, and the rest mass of the electron $m_e$.
Before the collision the 4-momenta of the photon and electron are given, respectively,
by

\[ p^a_{p1} = h\nu (1, 1, 0, 0), \qquad p^a_{e1} = m_e \gamma(0) (1, 0, 0, 0), \quad \gamma(0) = 1. \]

After the collision we have that

\[ p^a_{p2} = h\nu' (1, \cos\alpha, \sin\alpha, 0), \qquad p^a_{e2} = m_e \gamma(v) (1, v\cos\beta, -v\sin\beta, 0), \]

where ν ′ is the new photon frequency and α, β are as given in the figure.
The conservation of 4-momentum gives:

pap1 + pae1 = pap2 + pae2 .

Squaring:
(pp1 + pe1 − pp2 ) · (pp1 + pe1 − pp2 ) = pe2 · pe2 . (2.36)
But,
p2e1 = p2e2 = −m2e , p2p1 = p2p2 = 0.
Substituting in (2.36) one obtains

pe1 · pp1 − pe1 · pp2 = pp1 · pp2 ,

from where
−me hν + me hν ′ = h2 νν ′ (cos α − 1),

and
\[ \sin^2\frac{\alpha}{2} = \frac{m_e c^2}{2h} \left( \frac{1}{\nu'} - \frac{1}{\nu} \right). \tag{2.37} \]

Similarly, to find β rewrite (2.36) as

(−pp1 + pe2 + pp2 ) · (−pp1 + pe2 + pp2 ) = pe1 · pe1 .

This example shows that the photon is deflected (or scattered) by an angle given by
(2.37).
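Equation (2.37) can be checked against 4-momentum conservation. The sketch below, assuming units with h = c = 1 and arbitrarily chosen sample values, solves (2.37) for the scattered frequency and verifies that the recoiling electron's 4-momentum, obtained from the conservation law, still has norm −m_e². The function names are illustrative assumptions.

```python
import math

def scattered_frequency(nu, alpha, m_e, h=1.0):
    """Solve (2.37) for nu' given the incoming frequency and scattering angle (c = 1)."""
    inverse = 1.0 / nu + (2.0 * h / m_e) * math.sin(alpha / 2.0) ** 2
    return 1.0 / inverse

def electron_after(nu, alpha, m_e, h=1.0):
    """Electron 4-momentum after the collision, from p_p1 + p_e1 - p_p2."""
    nu_prime = scattered_frequency(nu, alpha, m_e, h)
    total_in = (h * nu + m_e, h * nu, 0.0, 0.0)
    photon_out = (h * nu_prime,
                  h * nu_prime * math.cos(alpha),
                  h * nu_prime * math.sin(alpha),
                  0.0)
    return tuple(a - b for a, b in zip(total_in, photon_out))

def norm_sq(p):
    """Minkowski norm squared with signature (-, +, +, +)."""
    return -p[0] ** 2 + p[1] ** 2 + p[2] ** 2 + p[3] ** 2

m_e, nu, alpha = 1.0, 0.5, 1.0            # sample values in units h = c = 1
p_e2 = electron_after(nu, alpha, m_e)
print(norm_sq(p_e2), -m_e ** 2)           # should agree up to rounding
```

For α = 0 the formula returns ν′ = ν, i.e. an undeflected photon keeps its frequency.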

Chapter 3

Prelude to General Relativity

3.1 General remarks


At the time of the development of Special Relativity, physical interactions were supposed
to be either gravitational or electromagnetic. Electromagnetism was already compatible
with Special Relativity —i.e. invariant under Lorentz transformations. On the other
hand, Newton’s laws were not.
After the development of Special Relativity, what was needed was a relativistic theory
of gravity compatible with Special Relativity. The first attempts to construct such a
theory involved generalisations of Newton’s laws of gravity. For example,
Nordström developed a theory which was Lorentz invariant but which is incompatible
with the observations (it does not produce light bending).
Einstein in 1915 succeeded in constructing a theory which is both Lorentz invariant
and compatible with observations. This theory is called General Relativity. In
order to develop General Relativity, we will require some ingredients of tensor calculus.
To understand why this mathematical tool is required, we take first a look at some of
the principles that underlie the theory.

3.2 The Equivalence Principle


The Equivalence Principle amounts to the following two statements:

(1) The (equation of) motion of a (spherically symmetric) test particle (one whose
own gravitational field may be neglected) in a gravitational field is independent
of its mass and composition. The first verification of this statement is claimed
to be Galileo’s Pisa bell tower experiment (although this particular experiment
probably never took place). More recent experiments, like the one by Roll, Krotkov
and Dicke (1964), have verified it to 1 part in $10^{11}$.

(2) Matter (as well as every form of energy) is acted on by (and is itself a source of)
a gravitational field. In other words, gravity couples to everything.

An immediate consequence of (2) is that it is not possible to eliminate the force of
gravity in the same way that other forces may be eliminated, for example by disconnecting
power sources or by means of shielding as in the case of Faraday cages. The only
other forces that behave in this way are the so-called fictitious forces (i.e. the centrifugal
and Coriolis forces) which arise when non-inertial frames of reference are employed. The
important point about these forces is that like gravity, they are proportional to the mass
of the particle. This led Einstein to suspect that these and the gravitational forces should
enter the theory in the same way.
To get a better feeling for this, recall that the only way one can eliminate the force of
gravity is by choosing a freely falling frame, i.e. a frame comoving with the freely falling
particle. This can be visualised in a thought experiment (Gedankenexperiment),
sometimes referred to as the lift experiment.
The experiment suggests that there are no local experiments which distinguish
non-rotating free fall in a gravitational field from uniform motion in a space free from
gravitational fields. By local, here it is understood that the experiment is performed in a small
region such that the variation of the gravitational field is negligible (observationally).
This is another way of expressing the Equivalence Principle (all particles fall in the same
way). In this sense, Special Relativity is regained locally, in the sense that the laws of
Physics in a freely falling frame are compatible with Special Relativity. Alternatively,
one can say that spacetime is locally Minkowskian. Furthermore, for a global theory in
the presence of gravitation (i.e. GR), the geometry of spacetime must be such that it
is locally Minkowskian. The natural tool to express and implement these ideas is the
so-called tensor calculus.

3.3 Summary
In the presence of gravitational fields there exist, in small regions (locally), preferred inertial
frames (i.e. the non-rotating freely falling frames) in which the special relativistic results
hold. On a large scale, on the other hand, there are no such preferred frames, and hence
one needs to treat all large scale reference frames on the same footing. This suggests
that the laws of nature should be formulated in such a way that they are invariant under
arbitrary transformations of coordinates (i.e. reference frames), and not just the Lorentz
transformations as was the case in Special Relativity.
Interpreted physically, this is called the General Principle of Relativity as opposed to
the Special Principle of Relativity according to which laws of nature have the same form
in inertial frames.
Interpreted mathematically, it is called the principle of General Covariance —the
equations of Physics should have tensorial form.

Chapter 4

Differential Geometry and tensor calculus

In describing spacetime we wish our equations to be valid for any coordinates. Tensorial
equations satisfy this property —hence their significance.

4.1 Manifolds and coordinates


Manifolds are fundamental concepts in mathematics and physics. We are used to the
properties of n-dimensional Euclidean space, Rn , the set of n-tuples (x1 , . . . , xn ) often
equipped with the positive-definite metric δij . There is the well-known theory of analysis
in $\mathbb{R}^n$, e.g., differentiation, integration, etc. However, there are other spaces, e.g., spheres,
tori, which are “curved” or have topology, where we would like to perform analogous
operations. The notion of manifold addresses this issue. A manifold is a space that can
be curved and/or can have a complicated topology but locally it looks like Rn . The entire
manifold is constructed by sewing together these local regions. For example, Rn , an n-
dimensional sphere S n or an n-torus T n are manifolds. While a single cone is another
example of a manifold, two cones intersecting at the vertex are not a manifold because
at the vertex the space does not look like Euclidean space.
To formalise the notions of “looking locally like Rn ” and “smoothly sewn together”
we need some preliminary definitions. Given two sets M and N , a map ϕ : M → N is
a relationship that assigns, to each element of M , exactly one element of N . Hence, a
map is just a generalisation of the concept of a function. Given two maps ϕ : A → B
and ψ : B → C, we define the composition ψ ◦ ϕ : A → C by the operation (ψ ◦ ϕ)(a) =
ψ(ϕ(a)), so a ∈ A, ϕ(a) ∈ B and hence (ψ ◦ ϕ)(a) ∈ C. The order in which the maps are
written is such that the one on the right acts first.
A map is called one-to-one (or injective) if each element of N has at most one element
of M mapped into it, and onto (or surjective) if each element of N has at least one element
of M mapped into it. For example, consider functions ϕ : R → R. Then ϕ(x) = ex is
one-to-one but not onto; ϕ(x) = x3 − x is onto, but not one-to-one; ϕ(x) = x3 is both;
and ϕ(x) = x2 is neither.
The set M is known as the domain of the map ϕ, and the set of elements in N that M
gets mapped into is called the image of ϕ. For any subset U ⊂ N , the set of elements of
M that get mapped to U is called the preimage of U under ϕ or ϕ−1 (U ). A map that is
both one-to-one and onto is known as invertible (or bijective). In this case we can define
the inverse map ϕ−1 : N → M by (ϕ−1 ◦ ϕ)(a) = a.

Figure 4.1: Overlapping coordinate charts. The maps are only defined on the shaded
regions and they should be smooth there.

Consider maps between general Euclidean spaces $\phi : \mathbb{R}^m \to \mathbb{R}^n$. A map from $\mathbb{R}^m$ to
$\mathbb{R}^n$ takes an m-tuple $(x^1, \dots, x^m)$ to an n-tuple $(y^1, \dots, y^n)$ and can therefore be thought
of as a collection of n functions $\phi^i$ of m variables:

\[
\begin{aligned}
y^1 &= \phi^1(x^1, \dots, x^m) \\
y^2 &= \phi^2(x^1, \dots, x^m) \\
&\ \ \vdots \\
y^n &= \phi^n(x^1, \dots, x^m)
\end{aligned}
\tag{4.1}
\]

We will refer to any of these functions as $C^p$ if its pth derivative exists and is continuous,
and refer to the entire map $\phi : \mathbb{R}^m \to \mathbb{R}^n$ as $C^p$ if each of its component functions is at
least $C^p$. A $C^0$ map is continuous but not necessarily differentiable, while a $C^\infty$ map is continuous
and can be differentiated infinitely many times; such a map is also called smooth.
We will call two sets M and N diffeomorphic if there exists a C ∞ map ϕ : M → N
with a C ∞ inverse ϕ−1 : N → M ; the map ϕ is then called a diffeomorphism. This is
the best notion we have to say that two spaces are “the same”.
Now we can proceed to rigorously define a manifold. Consider first an open ball,
namely the set of all points $x$ in $\mathbb{R}^n$ such that $|x - y| < r$ for some fixed $y \in \mathbb{R}^n$ and
$r \in \mathbb{R}$, where $|x - y| = \left[ \sum_{i=1}^{n} (x^i - y^i)^2 \right]^{1/2}$ is the usual Euclidean norm. An open set
in $\mathbb{R}^n$ is a set constructed from an arbitrary union of open balls. A chart or coordinate

Figure 4.2: Stereographic coordinates on the S 2 defined by projecting from the north
pole down to the x3 = −1 plane. This coordinate chart covers the whole sphere except
for the north pole.

system consists of a subset U of a set M along with a one-to-one map ϕ : U → Rn such


that the image ϕ(U ) is open in Rn . We then say that U is an open set in M . A C ∞ atlas
is an indexed collection of charts [(Uα , ϕα )] that satisfies two conditions:

1. The union of the Uα is equal to M , that is, the Uα cover M .

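2. The charts are smoothly sewn together. More precisely, if two charts overlap,
$U_\alpha \cap U_\beta \neq \emptyset$, then the map $(\phi_\alpha \circ \phi_\beta^{-1})$ takes points in $\phi_\beta(U_\alpha \cap U_\beta) \subset \mathbb{R}^n$ onto an open
set $\phi_\alpha(U_\alpha \cap U_\beta) \subset \mathbb{R}^n$, and all these maps must be $C^\infty$ where they are defined.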

See Fig. 4.1. So a chart is what we normally think of as a coordinate system on some
open set, and an atlas is a collection of charts that are smoothly related on their overlaps.
Then a C ∞ n-dimensional manifold is simply a set M along with a maximal atlas, one
that contains every possible compatible chart. We can replace C ∞ by C p in the definition
but in our applications we will always consider smooth manifolds. The requirement that
the atlas be maximal is so that two equivalent spaces equipped with different atlases do
not count as different manifolds.
A non-trivial example of a manifold is the two-dimensional sphere, S 2 . Let’s take the
S 2 of unit radius to be the set of points in R3 defined by (x1 )2 + (x2 )2 + (x3 )2 = 1. We
can construct a chart from the open set U1 defined by the sphere minus the north pole
via the stereographic projection (see Fig. 4.2): draw a straight line from the north pole
to the plane defined by x3 = −1, and assign to the point on the S 2 intercepted by the
line the Cartesian coordinates (y 1 , y 2 ) of the appropriate point on the plane. Explicitly,
the map is given by

\[ \phi_1(x^1, x^2, x^3) \equiv (y^1, y^2) = \left( \frac{2 x^1}{1 - x^3}, \frac{2 x^2}{1 - x^3} \right). \]

Another chart (U2 , ϕ2 ) can be obtained by projecting from the south pole to the plane
x3 = +1. The resulting coordinates cover the sphere minus the south pole, and are given
by:
\[ \phi_2(x^1, x^2, x^3) \equiv (z^1, z^2) = \left( \frac{2 x^1}{1 + x^3}, \frac{2 x^2}{1 + x^3} \right). \]
Together, these two charts cover the entire manifold, and they overlap in the region
$-1 < x^3 < 1$. One can check that the composition $\phi_2 \circ \phi_1^{-1}$ is given by

\[ z^i = \frac{4 y^i}{(y^1)^2 + (y^2)^2}, \qquad i = 1, 2, \]

which is C ∞ in the overlap region.
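The two stereographic charts and their transition map can be checked numerically. The sketch below assumes the transition function $z^i = 4 y^i / [(y^1)^2 + (y^2)^2]$, which follows from the two chart formulas together with $(x^1)^2 + (x^2)^2 + (x^3)^2 = 1$; the function names and the sample point are illustrative choices.

```python
import math

def phi1(x):
    """Projection from the north pole onto the plane x3 = -1."""
    return (2.0 * x[0] / (1.0 - x[2]), 2.0 * x[1] / (1.0 - x[2]))

def phi2(x):
    """Projection from the south pole onto the plane x3 = +1."""
    return (2.0 * x[0] / (1.0 + x[2]), 2.0 * x[1] / (1.0 + x[2]))

def transition(y):
    """phi2 composed with the inverse of phi1, valid where y != (0, 0)."""
    r_sq = y[0] ** 2 + y[1] ** 2
    return (4.0 * y[0] / r_sq, 4.0 * y[1] / r_sq)

# A sample point on the unit sphere, away from both poles.
theta, varphi = 1.1, 0.7
x = (math.sin(theta) * math.cos(varphi),
     math.sin(theta) * math.sin(varphi),
     math.cos(theta))

print(transition(phi1(x)))   # coordinates z obtained via the transition map
print(phi2(x))               # coordinates z obtained directly
```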


Consider the maps f : Rm → Rn and g : Rn → Rl , and the composition map
(g ◦ f ) : Rm → Rl . We can label points on each space in terms of the usual Carte-
sian coordinates: xa on Rm , y b on Rn and z c on Rl , where the indices range over the
appropriate values. The chain rule relates the partial derivatives of the composition to
the partial derivatives of the individual maps:

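\[ \frac{\partial}{\partial x^a} (g \circ f)^c = \sum_b \frac{\partial f^b}{\partial x^a} \frac{\partial g^c}{\partial y^b}. \]

This is usually abbreviated to

\[ \frac{\partial}{\partial x^a} = \sum_b \frac{\partial y^b}{\partial x^a} \frac{\partial}{\partial y^b}. \tag{4.2} \]

When $m = n$, the determinant of the matrix $\partial y^b / \partial x^a$ is called the Jacobian of the map, and the
map is (locally) invertible whenever the Jacobian is non-zero.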

4.2 Vectors
To construct the tangent space Tp using objects that are intrinsic to M we proceed as
follows. Let F be the space of all smooth functions on M , i.e., C ∞ maps f : M → R.
Each curve through p defines an operator on this space, namely the directional derivative,
which maps $f \to df/d\lambda$ at p. Then the tangent space $T_p$ can be identified with the space
of directional derivative operators along curves through p. To see that this is indeed the
case, note that two operators $d/d\lambda$ and $d/d\eta$ representing derivatives along two curves $x^a(\lambda)$
and $x^a(\eta)$ through p can be added and scaled by real numbers to give another operator
$a\, d/d\lambda + b\, d/d\eta$. This new operator clearly acts linearly on functions and it can also be shown
to satisfy the Leibniz rule. Therefore, the set of directional derivatives forms a vector
space.
To identify the vector space of directional derivatives with the tangent space Tp we
need to show the directional derivatives form a suitable basis for this space. To construct
such a basis, consider a coordinate chart with coordinates $x^a$. An obvious set of n
directional derivatives at p are the partial derivatives $\partial_a$ at this point. Note that this is
the definition of partial derivative with respect to xa : the directional derivative along a
curve defined by xb = constant for all b ̸= a, parametrised by xa itself. We are now going
to show that the partial derivative operators {∂a } at p form a basis for the tangent space
Tp . To do this, we are going to show that any directional derivative can be decomposed

Figure 4.3: Decomposing the tangent vector to a curve γ : R → M in terms of partial
derivatives with respect to local coordinates on M .

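into a linear combination of partial derivatives. Consider an n-dimensional manifold M,
a coordinate chart $\phi : M \to \mathbb{R}^n$, a curve $\gamma : \mathbb{R} \to M$, and a function $f : M \to \mathbb{R}$. This
leads to the tangle of maps shown in Fig. 4.3. If $\lambda$ is the parameter along $\gamma$, we want to
express the vector/operator $d/d\lambda$ in terms of partial derivatives $\partial_a$. Using the chain rule,
we have
\[
\begin{aligned}
\frac{d}{d\lambda} f &= \frac{d}{d\lambda} (f \circ \gamma) \\
&= \frac{d}{d\lambda} \left[ (f \circ \phi^{-1}) \circ (\phi \circ \gamma) \right] \\
&= \frac{d(\phi \circ \gamma)^a}{d\lambda} \frac{\partial (f \circ \phi^{-1})}{\partial x^a} \\
&= \frac{dx^a}{d\lambda} \partial_a f.
\end{aligned}
\tag{4.3}
\]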

The first line simply takes the informal expression on the left hand side and writes it as
an honest derivative of the function (f ◦ γ) : R → R. The second line just comes from
the definition of inverse map ϕ−1 . The third line is just the formal chain rule and the last
line is a return to the informal notation of the start. Since the function f is arbitrary,
we have
\[ \frac{d}{d\lambda} = \frac{dx^a}{d\lambda} \partial_a. \]
Thus, the partial derivatives {∂a } do indeed represent a good basis for the vector space
of directional derivatives, and hence we can identify the latter with the tangent space.
This particular basis, i.e., ê(a) = ∂a , is known as a coordinate basis for Tp ; it is the
formalisation of the notion of setting up basis vectors to point along the coordinate axes.
In general, we do not have to limit ourselves to coordinate bases when we consider tangent
vectors. For example, the coordinate basis vectors are typically not normalised to unity

nor orthogonal to each other. In fact, on a curved manifold, the coordinate basis will
never be orthogonal throughout the neighbourhood of a point where the curvature does
not vanish. One can define non-coordinate orthonormal bases by giving their components
in a coordinate basis but we will not use this in these lectures.
One of the advantages of the abstract point of view that we have taken regarding
vectors is that now the transformation law under changes of coordinates is immediate.
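Since the basis vectors are $\hat{e}_{(a)} = \partial_a$, the basis vectors in some new coordinate system
$x^{a'}$ are simply given by the chain rule (4.2):
\[ \partial_{a'} = \frac{\partial x^a}{\partial x^{a'}} \partial_a. \]
Just as in flat space, we can get the transformation law for the components of a vector
V by demanding that $V = V^a \partial_a$ is unchanged by a change of basis:

\[ V^a \partial_a = V^{a'} \partial_{a'} = V^{a'} \frac{\partial x^a}{\partial x^{a'}} \partial_a, \tag{4.4} \]

and hence, since the matrix $\partial x^{a'} / \partial x^a$ is the inverse of the matrix $\partial x^a / \partial x^{a'}$, we have

\[ V^{a'} = \frac{\partial x^{a'}}{\partial x^a} V^a. \tag{4.5} \]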
Since the basis vectors are usually not written explicitly, the rule (4.5) for transforming
components of vectors is what we call the “vector transformation law”. Note that this law
is compatible with the transformation of vector components in Special Relativity under
Lorentz transformations, $V^{a'} = L^{a'}{}_{a} V^a$, since Lorentz transformations are just a very
special kind of coordinate transformation, namely $x^{a'} = L^{a'}{}_{a} x^a$. However, (4.5) is
completely general and it determines the transformation of vectors under arbitrary changes
of coordinates.
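To see the vector transformation law (4.5) at work in a non-linear change of coordinates, the sketch below transforms vector components from Cartesian coordinates (x, y) on the plane to polar coordinates (r, θ). The choice of coordinates, the sample vector field and the function names are illustrative assumptions, not taken from the notes.

```python
import math

def jacobian_polar_from_cartesian(x, y):
    """Matrix of partial derivatives d(r, theta)/d(x, y) at the point (x, y)."""
    r = math.hypot(x, y)
    return [[x / r, y / r],                 # dr/dx,     dr/dy
            [-y / r ** 2, x / r ** 2]]      # dtheta/dx, dtheta/dy

def transform_vector(J, V):
    """V^{a'} = (dx^{a'}/dx^a) V^a, as in (4.5)."""
    return [sum(J[i][j] * V[j] for j in range(2)) for i in range(2)]

x, y = 1.0, 1.0
V_cartesian = [-y, x]       # the rotation field -y d/dx + x d/dy at (x, y)
V_polar = transform_vector(jacobian_polar_from_cartesian(x, y), V_cartesian)

# The rotation field is d/dtheta in polar coordinates: components (0, 1).
print(V_polar)
```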
Since a vector at a point can be thought of as directional derivative along a path
through that point, a vector field defines a map from smooth functions to smooth func-
tions all over the manifold by taking a derivative at each point. Given two vector fields
X and Y , we can define the commutator [X, Y ] by its action on an arbitrary function
f (xa ):
[X, Y ](f ) ≡ X(Y (f )) − Y (X(f )) . (4.6)
Clearly this operator is independent of the coordinates. Moreover, the commutator of
two vector fields is itself a vector field: if f and g are functions and a and b are real
numbers, the commutator is linear:
[X, Y ](af + bg) = a[X, Y ](f ) + b[X, Y ](g) ,
and it obeys the Leibniz rule,
[X, Y ](f g) = f [X, Y ](g) + g[X, Y ](f ) .
Exercise: show that the components of the vector field [X, Y ]a are given by
[X, Y ]a = X b ∂b Y a − Y c ∂c X a .
Exercise: using the result above show that the components $[X, Y]^a$ transform as a
vector under general coordinate transformations.
Remark. The commutator is a special case of the Lie derivative.

4.3 Tensors
Having defined vectors on general manifolds, we can now consider dual vectors (one-
forms). Once again, the co-tangent space Tp∗ can be thought of as the set of linear maps
$\omega : T_p \to \mathbb{R}$. The canonical example of a one-form is the gradient of a function $f$,
denoted by $df$. Its action on a vector $d/d\lambda$ is the directional derivative of the function:
\[ df\left( \frac{d}{d\lambda} \right) = \frac{df}{d\lambda}. \tag{4.7} \]

Like in the case of a vector, a one-form exists only at the point where it is defined and it
does not depend on information at other points in M .
Just as the partial derivatives along the coordinate axes provide a natural basis for
the tangent space, the gradients of the coordinate functions xa provide a natural basis
for the co-tangent space. In flat space we constructed a basis for $T_p^*$ by demanding that
$\hat{\theta}^{(a)}(\hat{e}_{(b)}) = \delta^a_b$. On an arbitrary manifold M we can do the same and (4.7) leads to

\[ dx^a(\partial_b) = \frac{\partial x^a}{\partial x^b} = \delta^a_b. \tag{4.8} \]
Therefore the gradients {dxa } are an appropriate basis of one-forms, and hence an arbi-
trary one-form can be expanded into components as ω = ωa dxa .
The transformation rules of basis dual vectors and components follow from the usual
procedure. For the basis one-forms, we get

\[ dx^{a'} = \frac{\partial x^{a'}}{\partial x^a} dx^a, \tag{4.9} \]
and for the components,
\[ \omega_{a'} = \frac{\partial x^a}{\partial x^{a'}} \omega_a. \tag{4.10} \]
We will usually write the components $\omega_a$ when we refer to a one-form $\omega$.
Just as in flat space, a $(k, l)$ tensor is a multilinear map from $k$ dual vectors and $l$
vectors to R. Its components in a coordinate basis can be obtained by acting the tensor
on the basis of one-forms and vectors,

T a1 ...akb1 ...bl = T (dxa1 , . . . , dxak , ∂b1 , . . . , ∂bl ) . (4.11)

This is equivalent to the expansion

T = T a1 ...akb1 ...bl ∂a1 ⊗ · · · ⊗ ∂ak ⊗ dxb1 ⊗ · · · ⊗ dxbl .

The transformation law for general tensors under general changes of coordinates follows
exactly the same pattern as in flat space, now replacing the Lorentz transformation
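matrix used in flat space with the Jacobian of the general coordinate transformation:
$$ T^{a'_1 \ldots a'_k}{}_{b'_1 \ldots b'_l} = \frac{\partial x^{a'_1}}{\partial x^{a_1}} \cdots \frac{\partial x^{a'_k}}{\partial x^{a_k}} \frac{\partial x^{b_1}}{\partial x^{b'_1}} \cdots \frac{\partial x^{b_l}}{\partial x^{b'_l}}\, T^{a_1 \ldots a_k}{}_{b_1 \ldots b_l} . \tag{4.12} $$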
It is often easier (but entirely equivalent) to transform a tensor under coordinate
transformations by taking the identity of basis vectors and one-forms as partial derivatives
and gradients at face value, and simply substituting in the coordinate transformation.

Example: Consider a symmetric (0, 2) tensor S on a two-dimensional manifold whose
components in a coordinate system $(x^1 = x, x^2 = y)$ are given by:
$$ S_{ab} = \begin{pmatrix} 1 & 0 \\ 0 & x^2 \end{pmatrix} . $$
This can be equivalently written as
$$ S = S_{ab}\, \mathrm{d}x^a \otimes \mathrm{d}x^b = (\mathrm{d}x)^2 + x^2 (\mathrm{d}y)^2 , \tag{4.13} $$
where in the last equality we suppress the tensor product symbols for brevity (as this is
common practice!). Consider the coordinate transformation:
$$ x' = \frac{2x}{y} , \qquad y' = \frac{y}{2} , $$
(valid for example when x > 0 and y > 0). This coordinate transformation can be
straightforwardly inverted:
$$ x = x' y' , \qquad y = 2 y' . \tag{4.14} $$
To calculate the components of S in the new coordinate system, $S_{a'b'}$, we could
compute the matrix $\partial x^a/\partial x^{a'}$ (and its inverse $\partial x^{a'}/\partial x^a$ if needed) and apply the general formula
for the tensor transformation law (4.12). Instead, we will use (4.14) to compute
the differentials of the old coordinates in terms of the new ones, thereby expressing
$\mathrm{d}x^a$ in terms of $\mathrm{d}x^{a'}$:
$$ \mathrm{d}x = y'\, \mathrm{d}x' + x'\, \mathrm{d}y' , \qquad \mathrm{d}y = 2\, \mathrm{d}y' . $$
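Plugging these expressions into (4.13) and remembering that the tensor products do not
commute ($\mathrm{d}x'\, \mathrm{d}y' \neq \mathrm{d}y'\, \mathrm{d}x'$), we obtain,
$$ \begin{aligned} S &= (\mathrm{d}x)^2 + x^2 (\mathrm{d}y)^2 \\ &= (y'\, \mathrm{d}x' + x'\, \mathrm{d}y')(y'\, \mathrm{d}x' + x'\, \mathrm{d}y') + (x' y')^2 (2\, \mathrm{d}y')(2\, \mathrm{d}y') \\ &= (y')^2 (\mathrm{d}x')^2 + x' y' (\mathrm{d}x'\, \mathrm{d}y' + \mathrm{d}y'\, \mathrm{d}x') + [(x')^2 + 4 (x' y')^2] (\mathrm{d}y')^2 , \end{aligned} $$
which is equivalent to
$$ S_{a'b'} = \begin{pmatrix} (y')^2 & x' y' \\ x' y' & (x')^2 + 4 (x' y')^2 \end{pmatrix} . $$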
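
As a cross-check of this example, the general transformation law (4.12) can also be evaluated numerically. The following is a minimal Python sketch and not part of the original notes: the function names `transformed_components` and `closed_form` are illustrative assumptions. It builds the Jacobian $\partial x^a/\partial x^{a'}$ from (4.14) and contracts it with $S_{ab}$, which should reproduce the components obtained by direct substitution.

```python
def transformed_components(xp, yp):
    """Return S_{a'b'} at the point (x', y') using the tensor law (4.12).

    Hypothetical helper for illustration: index 0/1 labels (x, y) for the
    unprimed coordinates and (x', y') for the primed ones.
    """
    # Old coordinate x in terms of the new ones, eq. (4.14).
    x = xp * yp
    # Components S_ab in the old coordinates (S_yy = x^2).
    S = [[1.0, 0.0], [0.0, x ** 2]]
    # Jacobian J[a][ap] = d x^a / d x^{a'}, read off from (4.14).
    J = [[yp, xp],     # dx/dx', dx/dy'
         [0.0, 2.0]]   # dy/dx', dy/dy'
    # S_{a'b'} = (dx^a/dx^{a'}) (dx^b/dx^{b'}) S_ab
    return [[sum(J[a][ap] * S[a][b] * J[b][bp]
                 for a in range(2) for b in range(2))
             for bp in range(2)]
            for ap in range(2)]


def closed_form(xp, yp):
    """Components obtained in the text by substituting the differentials."""
    return [[yp ** 2, xp * yp],
            [xp * yp, xp ** 2 + 4 * (xp * yp) ** 2]]
```

Evaluating both functions at any point with x′ > 0 and y′ > 0 should give the same matrix, which illustrates that the two methods described above are equivalent.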

Most tensor operations defined in flat space, e.g., contraction, symmetrisation, etc.,
are unaltered on a general manifold. However, the partial derivative of a general tensor
is not, in general, a new tensor. The gradient, which is the partial derivative of a scalar is
an honest (0, 1) tensor, but the partial derivative of a higher rank tensor is not a tensor.
To see this, for example, let’s consider the transformation of ∂a Wb , where Wa is a (0, 1)
tensor, under a general coordinate transformation:
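$$ \begin{aligned} \frac{\partial}{\partial x^{a'}} W_{b'} &= \frac{\partial x^a}{\partial x^{a'}} \frac{\partial}{\partial x^a} \left( \frac{\partial x^b}{\partial x^{b'}} W_b \right) \\ &= \frac{\partial x^a}{\partial x^{a'}} \frac{\partial x^b}{\partial x^{b'}} \left( \frac{\partial}{\partial x^a} W_b \right) + W_b\, \frac{\partial x^a}{\partial x^{a'}} \frac{\partial}{\partial x^a} \frac{\partial x^b}{\partial x^{b'}} . \end{aligned} \tag{4.15} $$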
The first term on the right hand side is the expected one for the transformation law of a
(0, 2) tensor but the second one spoils the correct transformation.

4.4 The metric
The metric tensor in a general curved space is denoted by the symbol gab (while ηab is
specifically reserved for the Minkowski metric). The metric tensor gab is a symmetric
(0, 2) tensor and, at least in these lectures, we will take it to be non-degenerate, that is,
its determinant g ≡ |gab | does not vanish. In this case, we can define the inverse metric
g ab via
g ab gbc = gcd g da = δca .
Since gab is symmetric, the inverse metric g ab is also symmetric. Just as in special
relativity, the metric and its inverse can be used to raise and lower indices of tensors.
Here we outline some of the reasons why the metric is such an important object:

1. The metric provides a notion of “past” and “future”.

2. The metric allows us to compute the length of paths and proper time.

3. The metric determines the “shortest” distance between two points (and therefore
the motion of test particles).

4. In general relativity, the metric replaces the Newtonian gravitational field ϕ.

5. The metric provides a notion of locally inertial frames and therefore a sense of “no
rotation”.

6. The metric determines causality, by defining the speed of light faster than which
no signal can travel.

7. The metric replaces the usual Euclidean three-dimensional dot product.

In our discussion of proper time and special relativity we introduced the line element
ds2 = ηab dxa dxb , which we used to calculate the length of the path. Now we know that
dxa is really a basis of dual vectors. In a general curved manifold, the line element is
given by
ds2 = gab (x) dxa dxb . (4.16)
For example, the line element of three-dimensional Euclidean space in Cartesian coordi-
nates is
ds2 = (dx)2 + (dy)2 + (dz)2 .
In spherical coordinates,

$$ x = r \sin\theta \cos\phi , \qquad y = r \sin\theta \sin\phi , \qquad z = r \cos\theta , $$

the line element becomes (exercise),

$$ ds^2 = dr^2 + r^2 d\theta^2 + r^2 \sin^2\theta\, d\phi^2 . $$
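
The spherical line element can be checked numerically before attempting the exercise by hand. The sketch below is an illustration added here, not part of the original notes; the names `embedding` and `induced_metric` and the step size `h` are assumptions. It differentiates the Cartesian coordinates with respect to (r, θ, ϕ) by central differences and forms $g_{ab} = \sum_i (\partial x^i/\partial q^a)(\partial x^i/\partial q^b)$, which should come out as diag(1, r², r² sin²θ).

```python
import math


def embedding(q):
    """Cartesian (x, y, z) of the point with spherical coordinates q = (r, theta, phi)."""
    r, theta, phi = q
    return (r * math.sin(theta) * math.cos(phi),
            r * math.sin(theta) * math.sin(phi),
            r * math.cos(theta))


def induced_metric(q, h=1e-6):
    """Components g_ab of the Euclidean metric in the coordinates q.

    jac[a][i] approximates d x^i / d q^a by a central difference of step h.
    """
    jac = []
    for a in range(3):
        plus, minus = list(q), list(q)
        plus[a] += h
        minus[a] -= h
        xp, xm = embedding(plus), embedding(minus)
        jac.append([(xp[i] - xm[i]) / (2 * h) for i in range(3)])
    # g_ab = sum over Cartesian components of the products of derivatives.
    return [[sum(jac[a][i] * jac[b][i] for i in range(3)) for b in range(3)]
            for a in range(3)]
```

The off-diagonal entries vanish up to finite-difference error, reflecting that the spherical coordinate lines are mutually orthogonal.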

It can be shown that, at some given point p on a manifold M , the metric gab can
always be put into its canonical form, where its components are

gab = diag(−1, −1, . . . , −1, +1, +1, . . . , +1, 0, 0, . . . , 0) .

Furthermore, coordinates can be chosen such that ∂c gab |p = 0, although in general ∂c ∂d gab |p ≠ 0. The signature of the metric is the number
of both positive and negative eigenvalues. If all signs are positive, the metric is called
Euclidean or Riemannian (or just positive definite), while if there is a single minus it is
called Lorentzian or pseudo-Riemannian, and any metric with some +1’s and some −1’s
is called indefinite.
With the definition (4.16) of a general metric gab on a curved space, we can straight-
forwardly generalise many of the notions we had in Minkowski space:

Norm of a vector: Given a vector V a , the norm is defined via

|V |2 ≡ gab V a V b .

If |V |2 > 0 (or |V |2 < 0) for all non-zero vectors V a , the metric is called positive definite (or
negative definite); the positive definite case is the Riemannian one. Otherwise it is called
indefinite; this includes the Lorentzian case.

Inner (scalar) product between two vectors: Given two arbitrary vectors Aa and
B a , their inner (scalar) product is defined as

A · B ≡ gab Aa B b .

If gab Aa B b = 0, then Aa and B b are said to be orthogonal.

Null vectors: For indefinite metrics there are non-zero vectors that have zero norm, i.e.,
they are orthogonal to themselves:

gab Aa Ab = 0.

Such vectors are called null. Note that, for indefinite metrics, this condition does not imply
that Aa is the zero vector (Aa = 0).

4.5 Covariant derivatives


In flat space in inertial coordinates, the partial derivative operator ∂a is a map from
(k, l) tensor fields to (k, l + 1) tensor fields which acts linearly on its arguments and
obeys the Leibniz rule on tensor products. However, we have seen in (4.15) that on
a general curved manifold, it is no longer true that the partial derivative operator ∂a
acting on a tensor produces another tensor. Therefore, we need to define a new
derivative operator, called the covariant derivative and denoted by the symbol ∇, which is
independent of the coordinates and maps (k, l) tensor fields to (k, l + 1) tensor fields.
Given two arbitrary tensor fields T and S, then we demand that ∇ obeys:

1. Linearity: ∇(T + S) = ∇T + ∇S.

2. Leibniz rule: ∇(T ⊗ S) = (∇T ) ⊗ S + T ⊗ (∇S).

It can be shown that if ∇ has to satisfy the Leibniz rule, then it can always be written
as the partial derivative plus some linear transformation. For the case of a vector field
V a , this implies
∇a V b = ∂a V b + Γbac V c , (4.17)

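where the $\Gamma^b{}_{ac}$ are called connection coefficients. We can determine the transformation
rule for the connection coefficients by demanding that the covariant derivative of a vector
(4.17) transforms as a (1, 1) tensor under coordinate transformations, namely
$$ \nabla_{a'} V^{b'} = \frac{\partial x^a}{\partial x^{a'}} \frac{\partial x^{b'}}{\partial x^b}\, \nabla_a V^b . $$
Expanding the left hand side,
$$ \begin{aligned} \nabla_{a'} V^{b'} &= \partial_{a'} V^{b'} + \Gamma^{b'}{}_{a'c'} V^{c'} \\ &= \frac{\partial x^a}{\partial x^{a'}} \frac{\partial x^{b'}}{\partial x^b}\, \partial_a V^b + \frac{\partial x^a}{\partial x^{a'}}\, V^b\, \frac{\partial}{\partial x^a} \left( \frac{\partial x^{b'}}{\partial x^b} \right) + \Gamma^{b'}{}_{a'c'}\, \frac{\partial x^{c'}}{\partial x^c}\, V^c . \end{aligned} $$
Expanding now the right hand side:
$$ \frac{\partial x^a}{\partial x^{a'}} \frac{\partial x^{b'}}{\partial x^b}\, \nabla_a V^b = \frac{\partial x^a}{\partial x^{a'}} \frac{\partial x^{b'}}{\partial x^b}\, \partial_a V^b + \frac{\partial x^a}{\partial x^{a'}} \frac{\partial x^{b'}}{\partial x^b}\, \Gamma^b{}_{ac} V^c . $$
The last two expressions have to be equated; the first terms on each side are the same
and hence they cancel, so we are left with
$$ \Gamma^{b'}{}_{a'c'}\, \frac{\partial x^{c'}}{\partial x^c}\, V^c + \frac{\partial x^a}{\partial x^{a'}}\, V^c\, \frac{\partial}{\partial x^a} \left( \frac{\partial x^{b'}}{\partial x^c} \right) = \frac{\partial x^a}{\partial x^{a'}} \frac{\partial x^{b'}}{\partial x^b}\, \Gamma^b{}_{ac} V^c . $$
Notice that in the second term on the left hand side of this equation we have changed
the dummy index b → c. Since $V^c$ is arbitrary it can be dropped; multiplying the resulting
equation by $\partial x^c/\partial x^{d'}$ and relabelling d′ → c′ we find
$$ \Gamma^{b'}{}_{a'c'} = \frac{\partial x^a}{\partial x^{a'}} \frac{\partial x^c}{\partial x^{c'}} \frac{\partial x^{b'}}{\partial x^b}\, \Gamma^b{}_{ac} - \frac{\partial x^a}{\partial x^{a'}} \frac{\partial x^c}{\partial x^{c'}} \frac{\partial^2 x^{b'}}{\partial x^a \partial x^c} . \tag{4.18} $$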
This is not a tensor transformation law since the second term on the right hand side
spoils it; therefore, the connection coefficients are not the components of
a tensor. They are constructed to be non-tensorial precisely in such a way that the
combination (4.17) transforms as a tensor.
Equation (4.18) does not determine the connection coefficients uniquely. To further
constrain the connection, we impose that the covariant derivative satisfies the following
two properties, in addition to the previous ones:

3. Commutes with contractions: $\nabla_a (T^c{}_{cb}) = (\nabla T)_a{}^c{}_{cb}$.

4. Reduces to the partial derivative when acting on scalars: ∇a ϕ = ∂a ϕ.

Property 3 is equivalent to demanding that the Kronecker delta (the identity map)
is covariantly constant, $\nabla_a \delta^b_c = 0$, which is reasonable to impose since the components of $\delta^b_c$
are constants (zeros and ones).
Let’s see what these new properties imply. Consider an arbitrary one-form ωa and an
arbitrary vector field V a ; we can compute the covariant derivative of the scalar defined
by ωa V a to obtain:

∇a (ωb V b ) = (∇a ωb ) V b + ωb (∇a V b )


= (∇a ωb ) V b + ωb (∂a V b + Γbac V c ) .

But

∇a (ωb V b ) = ∂a (ωb V b )
= V b ∂a ωb + ωb ∂a V b

Equating these two expressions, re-labelling the dummy indices b → c and c → b in the
term with the connection in the first expression and recalling that V a is arbitrary, we
obtain the formula for the covariant derivative of a one-form

∇a ωb = ∂a ωb − Γcab ωc . (4.19)

It is now straightforward to determine the formula for the covariant derivative acting
on an arbitrary (k, l) tensor. We find,

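$$ \begin{aligned} \nabla_c T^{a_1 \ldots a_k}{}_{b_1 \ldots b_l} = {} & \partial_c T^{a_1 \ldots a_k}{}_{b_1 \ldots b_l} \\ & + \Gamma^{a_1}{}_{cd}\, T^{d a_2 \ldots a_k}{}_{b_1 \ldots b_l} + \cdots + \Gamma^{a_k}{}_{cd}\, T^{a_1 \ldots a_{k-1} d}{}_{b_1 \ldots b_l} \\ & - \Gamma^{d}{}_{c b_1}\, T^{a_1 \ldots a_k}{}_{d b_2 \ldots b_l} - \cdots - \Gamma^{d}{}_{c b_l}\, T^{a_1 \ldots a_k}{}_{b_1 \ldots b_{l-1} d} . \end{aligned} \tag{4.20} $$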

Sometimes an alternative notation is used; just as commas are used to denote partial
derivatives, semi-colons are used for the covariant derivatives:

∇c T a1 ...akb1 ...bl = T a1 ...akb1 ...bl ;c .

In these lectures we will mostly use ∇a for the covariant derivative.


Still, we have not fully specified the connection and in fact, one can define many
different connections on a manifold satisfying the previous four requirements. It turns
out though that every metric defines a unique connection, which is the one that is used
in general relativity.
To see this, the first thing to notice is that the difference between two connections,
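say $\Gamma$ and $\tilde\Gamma$, is a tensor. Indeed,
$$ \nabla_a V^b - \tilde\nabla_a V^b = \partial_a V^b + \Gamma^b{}_{ac} V^c - \left( \partial_a V^b + \tilde\Gamma^b{}_{ac} V^c \right) = (\Gamma^b{}_{ac} - \tilde\Gamma^b{}_{ac}) V^c . $$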

Since the left hand side is a tensor by definition of covariant derivative, the right hand
side must also be a tensor. Hence,

Γbac − Γ̃bac = S bac

where S bac is a tensor. Next, notice that given a connection Γcab , one can form an-
other connection by simply permuting the lower indices; namely the coefficients Γcba also
transform as (4.18) and hence they determine a distinct connection. Therefore, to every
connection we can associate a tensor, known as the torsion tensor, given by

T cab = Γcab − Γcba = 2 Γc[ab] . (4.21)

It is clear that the torsion tensor is antisymmetric in its lower indices. A connection
that is symmetric in its lower indices (so that the torsion tensor vanishes) is called
“torsion-free”.
We can now determine the unique connection on a manifold with a metric gab by
introducing two additional properties:

5. Torsion-free: Γcab = Γcba .

6. Metric compatibility: ∇c gab = 0 .

The torsion free condition implies that covariant derivatives acting on a scalar field
commute:
∇a ∇b ϕ = ∇b ∇a ϕ .

A connection is metric compatible if the covariant derivative of the metric with respect to
that connection vanishes everywhere; the connection that is both torsion-free and metric
compatible is known as the Levi-Civita connection.
The metric-compatibility condition implies the following for the covariant derivative
of the inverse metric:

0 = ∇a δcb
= ∇a (g bd gdc )
= gdc ∇a g bd + g bd ∇a gdc
= ∇a (g bd ) gdc
⇒ ∇a g bd = 0 .

In addition, a metric-compatible covariant derivative commutes with raising and lowering


of indices. For example, for an arbitrary vector field V a ,

gac ∇b V c = ∇b (gac V c ) = ∇b Va .

Now we will show both the existence and uniqueness of the metric-compatible con-
nection by explicitly determining the connection coefficients in terms of the metric. To
do so, consider the following three expressions for the expanded metric compatibility
condition obtained by permuting the indices:

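$$ \begin{aligned} \nabla_c g_{ab} &= \partial_c g_{ab} - \Gamma^d{}_{ca}\, g_{db} - \Gamma^d{}_{cb}\, g_{ad} = 0 , \\ \nabla_a g_{bc} &= \partial_a g_{bc} - \Gamma^d{}_{ab}\, g_{dc} - \Gamma^d{}_{ac}\, g_{bd} = 0 , \\ \nabla_b g_{ca} &= \partial_b g_{ca} - \Gamma^d{}_{bc}\, g_{da} - \Gamma^d{}_{ba}\, g_{cd} = 0 . \end{aligned} $$

Subtracting the second and third equations from the first and using the symmetry of
the connection in its lower indices, we obtain

$$ \partial_c g_{ab} - \partial_a g_{bc} - \partial_b g_{ca} + 2\, \Gamma^d{}_{ab}\, g_{dc} = 0 . $$

Multiplying this expression by $g^{ec}$ and re-labelling the indices, we find the final expression
for the Levi-Civita connection:

$$ \Gamma^c{}_{ab} = \frac{1}{2}\, g^{cd} \left( \partial_a g_{bd} + \partial_b g_{ad} - \partial_d g_{ab} \right) . \tag{4.22} $$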

This connection is also known as the Christoffel connection and the symbols in (4.22) are
referred to as the Christoffel symbols.

Example: Christoffel symbols of the flat Euclidean metric in 2 dimensions in polar
coordinates:
ds2 = dr2 + r2 dθ2 .
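The non-zero components of the metric are $g_{rr} = 1$, $g_{\theta\theta} = r^2$, while the non-zero components
of the inverse metric are $g^{rr} = 1$ and $g^{\theta\theta} = \frac{1}{r^2}$. Notice that we use r and θ as
indices in a notation that should be obvious; we will continue to do so in the rest of these
lectures. Using (4.22), we compute
$$ \begin{aligned} \Gamma^r{}_{rr} &= \tfrac{1}{2}\, g^{ra} (\partial_r g_{ra} + \partial_r g_{ra} - \partial_a g_{rr}) \\ &= \tfrac{1}{2}\, g^{rr} (\partial_r g_{rr} + \partial_r g_{rr} - \partial_r g_{rr}) + \tfrac{1}{2}\, g^{r\theta} (\partial_r g_{r\theta} + \partial_r g_{r\theta} - \partial_\theta g_{rr}) \\ &= \tfrac{1}{2} (1)(0 + 0 - 0) + \tfrac{1}{2} (0)(0 + 0 - 0) \\ &= 0 . \end{aligned} $$

Similarly, we compute
$$ \begin{aligned} \Gamma^r{}_{\theta\theta} &= \tfrac{1}{2}\, g^{ra} (\partial_\theta g_{\theta a} + \partial_\theta g_{\theta a} - \partial_a g_{\theta\theta}) \\ &= \tfrac{1}{2}\, g^{rr} (\partial_\theta g_{\theta r} + \partial_\theta g_{\theta r} - \partial_r g_{\theta\theta}) \\ &= -r . \end{aligned} $$

Proceeding in exactly the same manner as above, we find that the remaining components
of the Christoffel symbols are given by

$$ \Gamma^r{}_{\theta r} = \Gamma^r{}_{r\theta} = 0 , \qquad \Gamma^\theta{}_{rr} = 0 , \qquad \Gamma^\theta{}_{r\theta} = \Gamma^\theta{}_{\theta r} = \frac{1}{r} , \qquad \Gamma^\theta{}_{\theta\theta} = 0 . $$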

Remark. The Christoffel symbols of the flat metric in Cartesian coordinates vanish
identically (exercise).
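
Formula (4.22) is mechanical enough to be evaluated numerically. The following Python sketch is an illustration added to these notes, not the author's method: the helper names (`polar_metric`, `inverse_2x2`, `metric_derivative`, `christoffel`) and the finite-difference step `h` are assumptions, and the code is restricted to two-dimensional metrics to keep the matrix inverse explicit. Applied to the polar metric it should reproduce the symbols found in the example above.

```python
def polar_metric(x):
    """Components g_ab of ds^2 = dr^2 + r^2 dtheta^2 at x = (r, theta)."""
    r, _theta = x
    return [[1.0, 0.0], [0.0, r * r]]


def inverse_2x2(g):
    """Inverse metric g^{ab} for a 2x2 matrix of components."""
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    return [[g[1][1] / det, -g[0][1] / det],
            [-g[1][0] / det, g[0][0] / det]]


def metric_derivative(metric, x, d, h=1e-6):
    """Central-difference approximation to the matrix of d g_ab / d x^d."""
    plus, minus = list(x), list(x)
    plus[d] += h
    minus[d] -= h
    gp, gm = metric(plus), metric(minus)
    return [[(gp[a][b] - gm[a][b]) / (2 * h) for b in range(2)]
            for a in range(2)]


def christoffel(metric, x):
    """Return gamma[c][a][b] = Gamma^c_{ab} at x, following eq. (4.22)."""
    g_inv = inverse_2x2(metric(x))
    # dg[d][a][b] = partial_d g_ab
    dg = [metric_derivative(metric, x, d) for d in range(2)]
    return [[[0.5 * sum(g_inv[c][d] * (dg[a][b][d] + dg[b][a][d] - dg[d][a][b])
                        for d in range(2))
              for b in range(2)]
             for a in range(2)]
            for c in range(2)]
```

For instance, `christoffel(polar_metric, [2.0, 0.3])` gives, up to finite-difference error, Γ^r_θθ = −2 and Γ^θ_rθ = Γ^θ_θr = 1/2, with all other components zero.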

4.6 Parallel transport and geodesics


In flat space, parallel transport of a vector along a curve intuitively means “keeping
the vector constant” as we move it along the curve. More precisely, given a curve xb (λ),
imposing that an arbitrary vector V a is constant along this curve in flat space corresponds
to:
$$ \frac{d}{d\lambda} V^a = \frac{dx^b}{d\lambda} \frac{\partial}{\partial x^b} V^a = 0 . $$
The crucial difference between flat and curved spaces is that, in a curved space, the result
of parallel transporting a vector from one point to another will depend on the path taken
between the points, see Fig. 4.4.
The generalisation of this concept to curved manifolds amounts to replacing the partial
derivative by a covariant derivative. Therefore, a vector V a is said to be parallel
transported along W b if
W b ∇b V a = 0.

Figure 4.4: Parallel transport on the sphere. On a curved space, the result of parallel
transporting a vector depends on the path taken.

This concept can be generalised to tensors of any rank. For example, a (k, l) tensor T is
said to be parallel transported along a vector W b if
W b ∇b T a1 a2 ...akb1 b2 ...bl = 0 .
Now, recall that one way of characterising straight lines in Euclidean space is as
curves whose tangent vectors are parallel transported along themselves at every point, i.e. they are
autoparallels. The notion defined above can be used to define the analogue of straight
lines in more general manifolds. Such curves are referred to as affine geodesics, i.e.
curves along which the tangent vector is propagated parallel to itself. We remark that
this definition can also be used to define a notion of shortest distance on a manifold using
the metric, but we shall defer this to a later point in the course.
Letting W b be tangent to a geodesic, one has that
W b ∇b W a = 0,
from where
W b ∂b W a + Γa cb W c W b = 0.
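If the curve is parametrised by λ, then
$$ W^b = \frac{dx^b}{d\lambda} , $$
and since
$$ W^b \frac{\partial}{\partial x^b} = \frac{dx^b}{d\lambda} \frac{\partial}{\partial x^b} \equiv \frac{d}{d\lambda} , $$
one has
$$ \frac{d}{d\lambda} \left( \frac{dx^a}{d\lambda} \right) + \Gamma^a{}_{bc}\, \frac{dx^c}{d\lambda} \frac{dx^b}{d\lambda} = 0 , $$
and finally that
$$ \frac{d^2 x^a}{d\lambda^2} + \Gamma^a{}_{bc}\, \frac{dx^b}{d\lambda} \frac{dx^c}{d\lambda} = 0 . \tag{4.23} $$
This is the famous geodesic equation.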
Note. From the existence and uniqueness theorems for ordinary differential equations,
it follows that corresponding to every direction at a point, there exists a unique geodesic
passing through the point. The initial conditions are
$$ \lambda = \lambda_0 , \qquad x^a_0 = x^a(\lambda_0) , \qquad W^a_0 = \frac{dx^a}{d\lambda}(\lambda_0) . $$

Example. Show that changing the geodesic parameter λ to σ in such a way that σ =
σ(λ), the geodesic equation only keeps its form (4.23) in σ if σ = aλ + b.
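To see this recall that
$$ \frac{dx^a}{d\lambda} = \frac{dx^a}{d\sigma} \frac{d\sigma}{d\lambda} , $$
so that
$$ \frac{d^2 x^a}{d\lambda^2} = \frac{d^2 x^a}{d\sigma^2} \left( \frac{d\sigma}{d\lambda} \right)^2 + \frac{dx^a}{d\sigma} \frac{d^2 \sigma}{d\lambda^2} . $$
Substituting into equation (4.23) one gets
$$ \left( \frac{d^2 x^a}{d\sigma^2} + \Gamma^a{}_{bc}\, \frac{dx^b}{d\sigma} \frac{dx^c}{d\sigma} \right) \left( \frac{d\sigma}{d\lambda} \right)^2 + \frac{dx^a}{d\sigma} \frac{d^2 \sigma}{d\lambda^2} = 0 , $$

which only has the form of (4.23) if

$$ \frac{d^2 \sigma}{d\lambda^2} = 0 . $$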
That is, if
σ = aλ + b.
A parameter of this form is called an affine parameter.

4.6.1 Metric geodesics


In Euclidean geometry, straight lines can be characterised as the curves of shortest length
between any two points. Here we give an analogue of this for a manifold with a metric.
In Lorentzian manifolds, the analogues of straight lines between timelike separated points are
not the curves with the shortest interval, but the longest. The generalisation of a straight
line, a geodesic, turns out to be a curve of extremal length (i.e., maximal or minimal). In order to find
extrema, one needs some elements of the calculus of variations. Let
$$ L = L(x, \dot x, \lambda) , \qquad x = x(\lambda) , \qquad \dot x = \frac{dx}{d\lambda} . $$
That is, L depends on λ both explicitly and through the function x(λ) and its derivative; the
integral of L over λ is then a functional of x. It is assumed that L
is differentiable in x, ẋ, λ.
We are looking for the necessary conditions on the function x such that the integral
$$ \int_{\lambda_1}^{\lambda_2} L(x, \dot x, \lambda)\, d\lambda $$
is stationary (e.g., a maximum or a minimum) with respect to changes in the function x.
The required condition is called the Euler-Lagrange equation and takes the form
$$ \frac{d}{d\lambda} \left( \frac{\partial L}{\partial \dot x} \right) - \frac{\partial L}{\partial x} = 0 . \tag{4.24} $$

This expression can be generalised to the case where L is a function of N independent
functions, $x^i(\lambda)$, i = 1, . . . , N , provided they can be varied independently. In that case
(4.24) becomes
$$ \frac{d}{d\lambda} \left( \frac{\partial L}{\partial \dot x^i} \right) - \frac{\partial L}{\partial x^i} = 0 , \tag{4.25} $$
corresponding to N equations, one for each value of i.

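To deduce the geodesic equation we want the length of the curve, defined
by $\int ds$, to be stationary. Introducing a parameter λ along the curve such that
$$ \int ds = \int \frac{ds}{d\lambda}\, d\lambda , $$
the problem becomes that of finding the extremals of
$$ L = \frac{ds}{d\lambda} = \sqrt{ g_{jk}\, \frac{dx^j}{d\lambda} \frac{dx^k}{d\lambda} } = \sqrt{ g_{jk}\, \dot x^j \dot x^k } . $$
Alternatively, one can find extremals of
$$ L = \left( \frac{ds}{d\lambda} \right)^2 = g_{jk}\, \dot x^j \dot x^k . $$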
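A computation renders

$$ \begin{aligned} \frac{\partial L}{\partial \dot x^c} &= g_{ab}\, \frac{\partial \dot x^a}{\partial \dot x^c}\, \dot x^b + g_{ab}\, \dot x^a\, \frac{\partial \dot x^b}{\partial \dot x^c} \\ &= g_{ab}\, \delta^a{}_c\, \dot x^b + g_{ab}\, \dot x^a\, \delta^b{}_c \\ &= g_{cb}\, \dot x^b + g_{ac}\, \dot x^a = 2 g_{ac}\, \dot x^a . \end{aligned} $$

Now, recall that the chain rule gives

$$ \frac{d}{d\lambda} = \frac{dx^e}{d\lambda} \frac{\partial}{\partial x^e} = \dot x^e \partial_e . $$
Thus,

$$ \begin{aligned} \frac{d}{d\lambda} \left( \frac{\partial L}{\partial \dot x^c} \right) &= 2\, \frac{dg_{ac}}{d\lambda}\, \dot x^a + 2 g_{ac}\, \frac{d\dot x^a}{d\lambda} \\ &= 2\, \partial_e g_{ac}\, \dot x^e \dot x^a + 2 g_{ac}\, \ddot x^a . \end{aligned} $$

Finally,
$$ \frac{\partial L}{\partial x^c} = \partial_c g_{ab}\, \dot x^a \dot x^b . $$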

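Thus, one has that
$$ 0 = \frac{d}{d\lambda} \left( \frac{\partial L}{\partial \dot x^c} \right) - \frac{\partial L}{\partial x^c} = 2 g_{ac}\, \ddot x^a + (\partial_b g_{ac} + \partial_a g_{bc} - \partial_c g_{ab})\, \dot x^a \dot x^b . $$

Multiplying by $\tfrac{1}{2} g^{fc}$ one obtains

$$ \ddot x^f + \Gamma^f{}_{ab}\, \dot x^a \dot x^b = 0 , $$

which can be rewritten as

$$ \frac{d^2 x^f}{d\lambda^2} + \Gamma^f{}_{ab}\, \frac{dx^a}{d\lambda} \frac{dx^b}{d\lambda} = 0 . \tag{4.26} $$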
This is the geodesic equation which we have met already. Thus, “straight lines” are also
extremal.
Remark 1. In Euclidean space in Cartesian coordinates or in Minkowski space in
Minkowski coordinates all the Christoffel symbols vanish and equation (4.26) becomes

$$ \frac{d^2 x^l}{ds^2} = 0 , $$
which is the usual equation for straight motion.
Remark 2. As it stands, the above equation only makes sense for spacelike curves for
which ds2 > 0. For timelike curves one uses dτ instead. Also, starting with ds2 gives
the same geodesic equation.
Remark 3. For null geodesics, i.e. geodesics for which ds = 0, the curve may be
parametrised by an affine parameter u, so that

$$ \frac{d^2 x^l}{du^2} + \Gamma^l{}_{jk}\, \frac{dx^j}{du} \frac{dx^k}{du} = 0 , $$
where
$$ g_{jk}\, \frac{dx^j}{du} \frac{dx^k}{du} = 0 . $$

Remark 4. It can be proved that if gab is Riemannian then sufficiently short segments of the
solutions to equation (4.26) are curves of minimum length. On the other hand, if gab is
Lorentzian, then timelike geodesics locally maximise the interval. Now, recall that in Special
Relativity one defines the proper time as dτ 2 = −ds2 /c2 . Thus, a clock moving along a timelike
geodesic records the longest proper time among nearby timelike curves joining the same two events.

4.6.2 An example of the use of the Euler-Lagrange equations


The Euler-Lagrange equations can be used to compute in a more efficient way the
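Christoffel symbols. As a way of comparison, consider again the line element of the
2-dimensional sphere,
$$ ds^2 = d\theta^2 + \sin^2\theta\, d\varphi^2 . $$
The Lagrangian is in this case given by
$$ L = \left( \frac{ds}{d\lambda} \right)^2 = \left( \frac{d\theta}{d\lambda} \right)^2 + \sin^2\theta \left( \frac{d\varphi}{d\lambda} \right)^2 = \dot\theta^2 + \sin^2\theta\, \dot\varphi^2 . $$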

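The Euler-Lagrange equations are given by
$$ \frac{d}{d\lambda} \left( \frac{\partial L}{\partial \dot x^i} \right) - \frac{\partial L}{\partial x^i} = 0 , $$
with $(x^1, x^2) = (\theta, \varphi)$. Let’s consider the different components of these equations. For
i = 1 one has
$$ \frac{\partial L}{\partial \theta} = 2 \sin\theta \cos\theta\, \dot\varphi^2 , \qquad \frac{\partial L}{\partial \dot\theta} = 2 \dot\theta , \qquad \frac{d}{d\lambda} \left( \frac{\partial L}{\partial \dot\theta} \right) = 2 \ddot\theta , $$
and finally that
$$ \ddot\theta - \sin\theta \cos\theta\, \dot\varphi^2 = 0 . \tag{4.27} $$
The latter is to be compared with the $x^1$ component of the geodesic equation (cf. (4.26)):
$$ \frac{d^2 x^1}{d\lambda^2} + \Gamma^1{}_{jk}\, \frac{dx^j}{d\lambda} \frac{dx^k}{d\lambda} = 0 , $$
or
$$ \frac{d^2 x^1}{d\lambda^2} + \Gamma^1{}_{11}\, \frac{dx^1}{d\lambda} \frac{dx^1}{d\lambda} + \Gamma^1{}_{12}\, \frac{dx^1}{d\lambda} \frac{dx^2}{d\lambda} + \Gamma^1{}_{21}\, \frac{dx^2}{d\lambda} \frac{dx^1}{d\lambda} + \Gamma^1{}_{22}\, \frac{dx^2}{d\lambda} \frac{dx^2}{d\lambda} = 0 . $$
However, in our case one only has $\dot\varphi^2$ terms so the latter becomes
$$ \frac{d^2 x^1}{d\lambda^2} + \Gamma^1{}_{22} \left( \frac{dx^2}{d\lambda} \right)^2 = 0 . $$
Comparing this with (4.27) gives
$$ \Gamma^1{}_{22} = - \sin\theta \cos\theta , \qquad \Gamma^1{}_{11} = \Gamma^1{}_{12} = \Gamma^1{}_{21} = 0 . $$
For i = 2 one finds from
$$ \frac{d}{d\lambda} \left( 2 \dot\varphi \sin^2\theta \right) = 0 , $$
that
$$ \ddot\varphi + 2 \cot\theta\, \dot\theta \dot\varphi = 0 . \tag{4.28} $$
Again, from the equation for the geodesic one has that
$$ \frac{d^2 x^2}{d\lambda^2} + \Gamma^2{}_{12}\, \frac{dx^1}{d\lambda} \frac{dx^2}{d\lambda} + \Gamma^2{}_{21}\, \frac{dx^2}{d\lambda} \frac{dx^1}{d\lambda} = 0 . $$
However,
$$ \Gamma^2{}_{12} = \Gamma^2{}_{21} , $$
and hence
$$ \Gamma^2{}_{12} = \Gamma^2{}_{21} = \cot\theta . $$
Finally,
$$ \Gamma^2{}_{22} = \Gamma^2{}_{11} = 0 . $$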

The solution to the geodesic equations


As a final remark, notice that a solution to the geodesic equations found in the
previous section is given by
θ = λ, φ = constant.
This corresponds to a geodesic starting at the North Pole and moving
towards the Equator along a meridian. Since the sphere looks the same from every point and
in every direction, this shows that geodesics on the sphere move
along the great circles of the sphere, i.e. the circles obtained by intersecting the sphere with
a plane that passes through the centre of the sphere.
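
The great-circle property can also be probed numerically for a geodesic that is not a meridian. The sketch below is an illustration added here under stated assumptions, not part of the original notes: it integrates (4.27) and (4.28) with a classical Runge-Kutta scheme (the integrator, the function names and the step size are all choices made for this example) and measures how far the resulting curve departs from the plane through the centre spanned by the initial position and velocity.

```python
import math


def geodesic_rhs(state):
    """Equations (4.27)-(4.28) as a first-order system in (theta, phi, dtheta, dphi)."""
    theta, phi, dtheta, dphi = state
    return (dtheta,
            dphi,
            math.sin(theta) * math.cos(theta) * dphi ** 2,
            -2.0 * math.cos(theta) / math.sin(theta) * dtheta * dphi)


def rk4_step(state, h):
    """One classical fourth-order Runge-Kutta step of size h."""
    def shift(s, k, f):
        return tuple(si + f * ki for si, ki in zip(s, k))
    k1 = geodesic_rhs(state)
    k2 = geodesic_rhs(shift(state, k1, h / 2))
    k3 = geodesic_rhs(shift(state, k2, h / 2))
    k4 = geodesic_rhs(shift(state, k3, h))
    return tuple(s + h / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def position(theta, phi):
    """Point of the unit sphere embedded in R^3."""
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))


def velocity(theta, phi, dtheta, dphi):
    """Derivative of position() with respect to the curve parameter."""
    return (math.cos(theta) * math.cos(phi) * dtheta - math.sin(theta) * math.sin(phi) * dphi,
            math.cos(theta) * math.sin(phi) * dtheta + math.sin(theta) * math.cos(phi) * dphi,
            -math.sin(theta) * dtheta)


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def great_circle_defect(initial, steps=1000, h=0.01):
    """Largest |n . x| along the integrated curve, where n is normal to the
    plane through the centre containing the initial position and velocity."""
    n = cross(position(initial[0], initial[1]), velocity(*initial))
    state, worst = initial, 0.0
    for _ in range(steps):
        state = rk4_step(state, h)
        x = position(state[0], state[1])
        worst = max(worst, abs(sum(ni * xi for ni, xi in zip(n, x))))
    return worst
```

A defect close to zero, for example from `great_circle_defect((1.0, 0.0, 0.3, 0.5))`, indicates that the integrated geodesic stays on a great circle. The initial data should be chosen so that the curve avoids the poles, where the coordinates (θ, φ) break down.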

Chapter 5

Curvature

A novel feature of General Relativity is that it employs the notion of curved space. Our
intuition of curvature is mainly based on the curvature of 2-dimensional objects in 3-
dimensional space, like spheres, saddles, etc. The notion of curvature whose definition
depends on a space of higher dimension is called extrinsic. In the case of spacetime this
notion is not useful and we require an intrinsic notion, i.e. a definition which is independent
of the embedding space.

5.1 The Riemann curvature tensor


Having studied covariant derivatives and parallel transport, we are now ready to discuss
curvature. The curvature is quantified by the Riemann tensor, which is computed from
the connection. The idea behind this is that we know what we mean by a “flat” connection
– the usual Christoffel connection for the Euclidean or Minkowski metric in Cartesian
coordinates. In addition, in flat space we know that parallel transport around a closed
loop leaves a vector unchanged, the covariant derivatives of tensors commute and initially
parallel geodesics remain parallel. The Riemann tensor appears when we study those
aspects of the connection in curved spaces.
We have already seen in the case of the 2-sphere that the parallel transport of a vector
around a closed loop leads to a transformation of the vector. The resulting transformation
depends on the total curvature enclosed by the loop. However, it is more useful to have
a local notion of curvature. To do so, it is conventional to consider the parallel transport
of a vector along an infinitesimal loop. Consider two infinitesimal vectors Aa and B b and
suppose that we parallel transport an arbitrary vector V a first in the direction of Aa ,
then along B b , then backwards along Aa and B b to return to the starting point, see Fig.
5.1 (left). The change in the vector V a , denoted by δV a , is given by

δV c = Rcdab V d Aa B b . (5.1)

where Rcdab is the Riemann tensor. This tensor is antisymmetric in the last two indices:

Rcdab = −Rcdba .

This has to be the case because interchanging the vectors Aa and B b corresponds to
traversing the loop in the opposite direction and hence it should give the inverse of
the original answer. If (5.1) is taken as the definition of the Riemann tensor, it implies
a certain choice of convention for the ordering of the indices.

Figure 5.1: Left: Infinitesimal loop defined by the vectors Aa and B b . Right: commutator
of two covariant derivatives.

A related (and equivalent) way of defining the Riemann tensor is to consider the
commutator of two covariant derivatives. The commutator of two covariant derivatives
measures the difference between parallel transporting a given tensor first in one direction
and then the other, versus the opposite ordering, see Fig. 5.1 (right). The computation
goes as follows; consider an arbitrary vector field V a , then

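$$ \begin{aligned} [\nabla_a, \nabla_b] V^c &= \nabla_a \nabla_b V^c - \nabla_b \nabla_a V^c \\ &= \partial_a (\nabla_b V^c) - \Gamma^d{}_{ab} \nabla_d V^c + \Gamma^c{}_{ad} \nabla_b V^d - (a \leftrightarrow b) \\ &= \partial_a \partial_b V^c + (\partial_a \Gamma^c{}_{bd}) V^d + \Gamma^c{}_{bd}\, \partial_a V^d - \Gamma^d{}_{ab}\, \partial_d V^c - \Gamma^d{}_{ab} \Gamma^c{}_{de} V^e \\ &\quad + \Gamma^c{}_{ad}\, \partial_b V^d + \Gamma^c{}_{ad} \Gamma^d{}_{be} V^e - (a \leftrightarrow b) \\ &= (\partial_a \Gamma^c{}_{bd} - \partial_b \Gamma^c{}_{ad} + \Gamma^c{}_{ae} \Gamma^e{}_{bd} - \Gamma^c{}_{be} \Gamma^e{}_{ad}) V^d . \end{aligned} \tag{5.2} $$

In the last step we have relabelled some dummy indices and eliminated terms that cancel
by antisymmetry. Since the left hand side of this expression is a tensor, so must be the
right hand side. We write,
$$ [\nabla_a, \nabla_b] V^c = R^c{}_{dab} V^d , \tag{5.3} $$
where the Riemann tensor is defined as

$$ R^c{}_{dab} = \partial_a \Gamma^c{}_{bd} - \partial_b \Gamma^c{}_{ad} + \Gamma^c{}_{ae} \Gamma^e{}_{bd} - \Gamma^c{}_{be} \Gamma^e{}_{ad} . \tag{5.4} $$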

Notice that the Riemann tensor (5.4) is constructed from non-tensorial quantities
(i.e., the Christoffel symbols and their derivatives) but it transforms as a tensor under
general coordinate transformations (check!).
Using the fact that [∇a , ∇b ](Wc V c ) = 0 because Wc V c is a scalar, we find

[∇a , ∇b ]Wc = −Rdcab Wd . (5.5)

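Proceeding by induction, we can compute the action of $[\nabla_c, \nabla_d]$ on a tensor of arbitrary
rank. One finds,
$$ \begin{aligned} [\nabla_c, \nabla_d] X^{a_1 \ldots a_k}{}_{b_1 \ldots b_l} = {} & R^{a_1}{}_{ecd}\, X^{e a_2 \ldots a_k}{}_{b_1 \ldots b_l} + \ldots + R^{a_k}{}_{ecd}\, X^{a_1 \ldots a_{k-1} e}{}_{b_1 \ldots b_l} \\ & - R^{e}{}_{b_1 cd}\, X^{a_1 \ldots a_k}{}_{e b_2 \ldots b_l} - \ldots - R^{e}{}_{b_l cd}\, X^{a_1 \ldots a_k}{}_{b_1 \ldots b_{l-1} e} . \end{aligned} \tag{5.6} $$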

Figure 5.2: A set of geodesics γs (t) with tangent vectors T a . The vector field S a measures
the deviation between nearby geodesics.

5.2 Geodesic deviation


The defining property of Euclidean (flat) geometry is the parallel postulate: initially
parallel lines remain parallel forever. In a curved space this is not true; for instance, on
a sphere we have seen that initially parallel geodesics will eventually cross. Quantifying
this behaviour on an arbitrary curved space is not straightforward because the notion
of “parallel” does not extend naturally from flat to curved spaces. The best one can do
is to consider geodesic curves that are initially parallel and see how they behave as we
travel along the geodesics.
Consider a one parameter family of geodesics γs (t) so that for each s ∈ R, γs is a
geodesic parametrised by the affine parameter t. The collection of such curves defines
a smooth two-dimensional surface; the parameters (s, t) may be chosen as coordinates
on this surface provided that the geodesics do not cross. The entire surface is the set
of points xa (s, t) ∈ M . There are two natural vector fields on this surface; the tangent
vectors to the geodesics,
$$ T^a = \frac{\partial x^a}{\partial t} , $$
and the deviation vectors
$$ S^a = \frac{\partial x^a}{\partial s} . $$
This name comes from the (informal) notion that S a points from one geodesic towards
the neighbouring ones. See Fig. 5.2. This idea leads us to define the “relative velocity of
geodesics”,
$$ V^a = (\nabla_T S)^a = T^b \nabla_b S^a , $$
and the “relative acceleration of geodesics”,

$$ A^a = (\nabla_T V)^a = T^b \nabla_b V^a . $$

Since S and T are basis vectors adapted to a coordinate system, their commutator
vanishes,
[S, T ] = 0 ⇒ S b ∇b T a = T b ∇b S a .
Keeping this in mind, we can explicitly compute the relative acceleration of geodesics:

Aa = T b ∇b V a = T b ∇b (T c ∇c S a )
= T b ∇b (S c ∇c T a )
= (T b ∇b S c )(∇c T a ) + T b S c ∇b ∇c T a
= (T b ∇b S c )(∇c T a ) + T b S c (∇c ∇b T a + Radbc T d )
= (T b ∇b S c )(∇c T a ) + S c ∇c (T b ∇b T a ) − (S c ∇c T b )∇b T a + Radbc T d T b S c
= Radbc T d T b S c . (5.7)

The first line is just the definition of Aa and the second line comes from [S, T ] = 0. The
third line is just the Leibniz rule; the fourth line replaces a double covariant derivative
by derivatives in the opposite order plus the Riemann tensor. The fifth line uses again
the Leibniz rule (in the opposite order than usual), and then we cancel two identical
terms and notice that the term T b ∇b T a vanishes because T a is the tangent vector to a
geodesic. The result,
Aa = ∇T ∇T S a = Radbc T d T b S c , (5.8)
is the geodesic deviation equation. It expresses that the relative acceleration between two
neighbouring geodesics is proportional to the curvature. Physically the acceleration of
neighbouring geodesics is interpreted as a manifestation of the gravitational tidal forces.
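
As an illustration of (5.8), which is not part of the original notes, one can take the family of meridians on the unit 2-sphere, θ = t and φ = s, and use the Christoffel symbols found in Section 4.6.2, namely $\Gamma^\theta{}_{\varphi\varphi} = -\sin\theta\cos\theta$ and $\Gamma^\varphi{}_{\theta\varphi} = \cot\theta$. The following worked computation is a sketch under those assumptions; it evaluates the relative acceleration directly from its definition and compares it with the curvature term.

```latex
% Meridians on the unit 2-sphere: x^a(s,t) = (\theta, \varphi) = (t, s)
T^a = \frac{\partial x^a}{\partial t} = (1, 0) , \qquad
S^a = \frac{\partial x^a}{\partial s} = (0, 1) .

% Relative velocity: V^a = T^b \nabla_b S^a = \partial_\theta S^a + \Gamma^a{}_{\theta c} S^c
V^\theta = \Gamma^\theta{}_{\theta\varphi} = 0 , \qquad
V^\varphi = \Gamma^\varphi{}_{\theta\varphi} = \cot\theta .

% Relative acceleration: A^a = T^b \nabla_b V^a = \partial_\theta V^a + \Gamma^a{}_{\theta c} V^c
A^\theta = 0 , \qquad
A^\varphi = -\frac{1}{\sin^2\theta} + \cot^2\theta = -1 .

% Curvature side of (5.8), with R^\varphi{}_{\theta\theta\varphi} computed from (5.4):
R^\varphi{}_{\theta\theta\varphi}
  = \partial_\theta \Gamma^\varphi{}_{\varphi\theta}
  + \Gamma^\varphi{}_{\theta\varphi} \Gamma^\varphi{}_{\varphi\theta}
  = -\frac{1}{\sin^2\theta} + \cot^2\theta = -1 ,
\qquad
R^\varphi{}_{\theta\theta\varphi}\, T^\theta T^\theta S^\varphi = -1 = A^\varphi .
```

Both sides give $A^a = -S^a$: neighbouring meridians accelerate towards each other. This matches the elementary picture in which the proper separation of two nearby meridians is proportional to sin t, which satisfies $\frac{d^2}{dt^2} \sin t = -\sin t$.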

5.3 Symmetries of the curvature tensor


In general, a tensor of rank 4 has 4^4 = 256 components (in spacetime). Symmetries,
if present, are important because they reduce the number of independent components.
Lowering the index in the definition of the Riemann tensor one obtains

Rabcd = gaf (∂c Γf bd − ∂d Γf bc ) + Γaec Γe bd − Γaed Γe bc ,

where
Rabcd = gaf Rf bcd , Γabd = gaf Γf bd .
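Now, since Rabcd is a tensor, it should have the same symmetries in all frames. Accordingly,
choose a locally inertial frame for which the Christoffel symbols vanish at a given point. For these
coordinates one has then that
$$ R_{\hat a \hat b \hat c \hat d} = g_{\hat a \hat f} \left( \partial_{\hat c} \Gamma^{\hat f}{}_{\hat b \hat d} - \partial_{\hat d} \Gamma^{\hat f}{}_{\hat b \hat c} \right) , $$
where we use hatted indices $\hat a, \ldots$ to denote that these expressions are only valid in locally
inertial coordinates. Recalling that
$$ \Gamma_{abc} = \tfrac{1}{2} \left( \partial_b g_{ca} + \partial_c g_{ba} - \partial_a g_{bc} \right) $$
one obtains
$$ R_{\hat a \hat b \hat c \hat d} = \tfrac{1}{2} \left( \partial_{\hat b} \partial_{\hat c} g_{\hat a \hat d} + \partial_{\hat a} \partial_{\hat d} g_{\hat b \hat c} - \partial_{\hat a} \partial_{\hat c} g_{\hat b \hat d} - \partial_{\hat b} \partial_{\hat d} g_{\hat a \hat c} \right) , $$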
from where it is easy to read the symmetries of the tensor. It can be checked that

Rabcd = −Rbacd , Rabcd = −Rabdc , Rabcd = Rcdab .

Furthermore,
Rabcd + Radbc + Racdb = 0 ⇒ Ra[bcd] = 0.
These symmetries amount to 236 constraints, so Rabcd has only 20 independent components (in four dimensions).
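
The count can be made explicit in n dimensions. The following short derivation is added here as a sketch, using only the symmetries listed above; the symbols m and N(n) are introduced for this illustration.

```latex
% Antisymmetry in each index pair: each pair takes m independent values
m = \frac{n(n-1)}{2} .

% Symmetry under exchange of the two pairs: a symmetric m x m array
\frac{m(m+1)}{2} .

% The cyclic identity R_{a[bcd]} = 0 gives new constraints only when all four
% indices are distinct, i.e. one constraint per choice of four distinct values
\binom{n}{4} = \frac{n(n-1)(n-2)(n-3)}{24} .

% Number of independent components
N(n) = \frac{m(m+1)}{2} - \binom{n}{4} = \frac{n^2 (n^2 - 1)}{12} ,
\qquad N(2) = 1 , \quad N(3) = 6 , \quad N(4) = 20 .
```

For n = 4 this gives 21 − 1 = 20, and 256 − 20 = 236 is the number of constraints quoted above.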

5.4 Bianchi identities, the Ricci and Einstein tensors


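Recall that in a locally inertial frame one had that
$$ R_{\hat c \hat d \hat a \hat b} = \tfrac{1}{2} \left( \partial_{\hat a} \partial_{\hat d} g_{\hat c \hat b} - \partial_{\hat a} \partial_{\hat c} g_{\hat b \hat d} - \partial_{\hat b} \partial_{\hat d} g_{\hat c \hat a} + \partial_{\hat b} \partial_{\hat c} g_{\hat a \hat d} \right) . $$
Differentiating with respect to $x^{\hat e}$ one obtains
$$ \partial_{\hat e} R_{\hat c \hat d \hat a \hat b} = \tfrac{1}{2}\, \partial_{\hat e} \left( \partial_{\hat a} \partial_{\hat d} g_{\hat c \hat b} - \partial_{\hat a} \partial_{\hat c} g_{\hat b \hat d} - \partial_{\hat b} \partial_{\hat d} g_{\hat c \hat a} + \partial_{\hat b} \partial_{\hat c} g_{\hat a \hat d} \right) . $$
Now consider the sum of the cyclic permutations of the first three indices:
$$ \begin{aligned} & \partial_{\hat e} R_{\hat c \hat d \hat a \hat b} + \partial_{\hat c} R_{\hat d \hat e \hat a \hat b} + \partial_{\hat d} R_{\hat e \hat c \hat a \hat b} \\ &= \tfrac{1}{2} \big( \partial_{\hat e} \partial_{\hat a} \partial_{\hat d} g_{\hat c \hat b} - \partial_{\hat e} \partial_{\hat a} \partial_{\hat c} g_{\hat b \hat d} - \partial_{\hat e} \partial_{\hat b} \partial_{\hat d} g_{\hat c \hat a} + \partial_{\hat e} \partial_{\hat b} \partial_{\hat c} g_{\hat a \hat d} \\ &\qquad + \partial_{\hat c} \partial_{\hat a} \partial_{\hat e} g_{\hat d \hat b} - \partial_{\hat c} \partial_{\hat a} \partial_{\hat d} g_{\hat b \hat e} - \partial_{\hat c} \partial_{\hat b} \partial_{\hat e} g_{\hat d \hat a} + \partial_{\hat c} \partial_{\hat b} \partial_{\hat d} g_{\hat a \hat e} \\ &\qquad + \partial_{\hat d} \partial_{\hat a} \partial_{\hat c} g_{\hat e \hat b} - \partial_{\hat d} \partial_{\hat a} \partial_{\hat e} g_{\hat b \hat c} - \partial_{\hat d} \partial_{\hat b} \partial_{\hat c} g_{\hat e \hat a} + \partial_{\hat d} \partial_{\hat b} \partial_{\hat e} g_{\hat a \hat c} \big) \\ &= 0 . \end{aligned} \tag{5.9} $$
In locally inertial coordinates the Christoffel symbols vanish at the point, so partial derivatives
there coincide with covariant derivatives and (5.9) is the coordinate expression of an equation
between tensors; it is therefore true in any coordinate system, even though
we derived it in a particular one. By the antisymmetry property $R_{cdab} = -R_{dcab}$, we can
re-write this equation as
$$ \nabla_e R_{cdab} + \nabla_d R_{ecab} + \nabla_c R_{deab} = 0 \quad \Rightarrow \quad \nabla_{[e} R_{cd]ab} = 0 . \tag{5.10} $$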
This tensorial equation is valid in all frames and is called the Bianchi identity. One could
have derived it by directly taking the covariant derivative of the Riemann tensor.

The Ricci tensor


The Ricci tensor is obtained by contracting the first and third indices of the Riemann
tensor:
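\[ R_{ab} \equiv g^{cd} R_{cadb} = R^c{}_{acb} = \partial_c\Gamma^c{}_{ab} - \partial_a\Gamma^c{}_{cb} + \Gamma^d{}_{ab}\Gamma^c{}_{cd} - \Gamma^d{}_{ca}\Gamma^c{}_{db}. \tag{5.11} \]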
Remark 1. Because of the symmetries of the Riemann tensor one has that the Ricci
tensor is symmetric. That is,
Rab = Rba .

Remark 2. Other contractions of the Riemann tensor vanish or give ±Rab . For example
Rc cab = 0 since Rcdab is anti-symmetric in c and d. Also,
Rc abc = −Rc acb = −Rab ,
and so on.
Remark 3. One can show that
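\[ \Gamma^a{}_{ab} = \partial_b \ln\sqrt{|g|}, \]
where $g = \det(g_{ab})$. Therefore, we have the following formula for the Ricci tensor:
\[ R_{ab} = \partial_c\Gamma^c{}_{ab} - \partial_a\partial_b\ln\sqrt{|g|} + \Gamma^c{}_{ab}\,\partial_c\ln\sqrt{|g|} - \Gamma^d{}_{ca}\Gamma^c{}_{db}. \tag{5.12} \]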

The Ricci scalar
The Ricci scalar is defined as the contraction of the indices of the Ricci tensor:
R ≡ g ab Rab = g ac g bd Rabcd .

The Einstein tensor


In the next computations recall that ∇c gab = 0 and ∇c g ab = 0 since the Christoffel
connection is metric compatible. Contract twice the Bianchi identity (5.10),
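\[ 0 = g^{bd} g^{ae}\left(\nabla_e R_{cdab} + \nabla_c R_{deab} + \nabla_d R_{ecab}\right) = \nabla^a R_{ca} - \nabla_c R + \nabla^b R_{cb}, \tag{5.13} \]
or
\[ \nabla^a R_{ac} = \tfrac{1}{2}\nabla_c R. \tag{5.14} \]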
Note that, unlike the partial derivative, it makes sense to raise an index on the covariant
derivative of a tensor because it is another tensor and due to the metric compatibility.
We define the Einstein tensor as
\[ G_{ab} \equiv R_{ab} - \tfrac{1}{2} R\, g_{ab}. \tag{5.15} \]
We then see that the twice-contracted Bianchi identity (5.14) is equivalent to
\[ \nabla^a G_{ab} = 0. \tag{5.16} \]

Remark 1. The Einstein tensor, which is symmetric due to the symmetry of the Ricci
tensor and the metric, has 10 independent components and it will play a crucial role in
general relativity.
Remark 2. By construction, the Einstein tensor is divergence free.

The Weyl tensor


The Ricci tensor and the Ricci scalar contain all the information about the possible
contractions of the Riemann tensor. The remaining information, namely the trace-free
part, is captured by the Weyl tensor. This tensor is defined as the Riemann tensor
with all the contractions removed. In an n-dimensional manifold, one has
\[ C_{abcd} = R_{abcd} - \frac{2}{n-2}\left(g_{a[c}R_{d]b} - g_{b[c}R_{d]a}\right) + \frac{2}{(n-1)(n-2)}\, g_{a[c}\,g_{d]b}\, R, \]
and hence
\[ C^a{}_{bac} = 0. \]

Remark 1. By construction, the Weyl tensor has the same symmetries as the Riemann
tensor:
Cabcd = C[ab][cd] , Cabcd = Ccdab , Ca[bcd] = 0 .

Remark 2. The Weyl tensor is only defined in three or more dimensions; in three
dimensions it vanishes identically.
Remark 3. A very important property of the Weyl tensor is that it is invariant under
conformal transformations of the metric, gab → Ω(x)2 gab , where Ω(x) is an arbitrary
function of the spacetime coordinates.

Example: curvature tensors of the 2-sphere. Consider a round 2-sphere of radius
a with metric
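\[ ds^2 = a^2\left(d\theta^2 + \sin^2\theta\, d\phi^2\right). \]
The non-zero Christoffel symbols are given by
\[ \Gamma^\theta{}_{\phi\phi} = -\sin\theta\cos\theta, \qquad \Gamma^\phi{}_{\theta\phi} = \Gamma^\phi{}_{\phi\theta} = \cot\theta. \]
Given the symmetries of the Riemann tensor, the only non-trivial component (up to
symmetries) is:
\[
\begin{aligned}
R^\theta{}_{\phi\theta\phi} &= \partial_\theta\Gamma^\theta{}_{\phi\phi} - \partial_\phi\Gamma^\theta{}_{\theta\phi} + \Gamma^\theta{}_{\theta b}\Gamma^b{}_{\phi\phi} - \Gamma^\theta{}_{\phi b}\Gamma^b{}_{\theta\phi} \\
&= (\sin^2\theta - \cos^2\theta) - (0) + (0) - (-\sin\theta\cos\theta)(\cot\theta) \\
&= \sin^2\theta.
\end{aligned}
\]
Lowering the first index gives
\[ R_{\theta\phi\theta\phi} = g_{\theta c} R^c{}_{\phi\theta\phi} = g_{\theta\theta} R^\theta{}_{\phi\theta\phi} = a^2\sin^2\theta. \]
The Ricci tensor is then computed from $R_{ab} = g^{cd}R_{cadb}$, which gives
\[ R_{\theta\theta} = g^{\phi\phi}R_{\phi\theta\phi\theta} = 1, \qquad R_{\theta\phi} = R_{\phi\theta} = 0, \qquad R_{\phi\phi} = g^{\theta\theta}R_{\theta\phi\theta\phi} = \sin^2\theta. \]
Finally, the Ricci scalar is given by
\[ R = g^{ab}R_{ab} = g^{\theta\theta}R_{\theta\theta} + g^{\phi\phi}R_{\phi\phi} = \frac{2}{a^2}. \]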
Note that the scalar of curvature, i.e., the Ricci scalar, decreases as the radius of the
sphere increases. In more general cases, we will sometimes refer to the “radius of curva-
ture” of a manifold as providing a length scale over which the curvature varies; the larger
the radius of curvature, the smaller the curvature is.
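The 2-sphere computation can be reproduced numerically. The following is a minimal sketch, assuming only the Christoffel symbols quoted in the example; the function name, the sample angle and the finite-difference step are illustrative choices, not part of the notes.

```python
import math


def sphere_ricci_scalar(theta: float, a: float = 1.0, h: float = 1e-5) -> float:
    """Ricci scalar of the round 2-sphere of radius a at polar angle theta."""

    # Non-zero Christoffel symbols quoted in the example.
    def gamma_theta_phiphi(th: float) -> float:
        return -math.sin(th) * math.cos(th)

    def gamma_phi_thetaphi(th: float) -> float:
        return math.cos(th) / math.sin(th)

    # d/dtheta of Gamma^theta_{phi phi}, by a central finite difference.
    d_gamma = (gamma_theta_phiphi(theta + h) - gamma_theta_phiphi(theta - h)) / (2 * h)

    # R^theta_{phi theta phi}: only the derivative term and the last
    # quadratic term are non-zero for this metric.
    riemann_up = d_gamma - gamma_theta_phiphi(theta) * gamma_phi_thetaphi(theta)

    # Lower the first index with g_{theta theta} = a^2.
    riemann_down = a**2 * riemann_up

    # Inverse metric components used in the contractions.
    g_inv_thth = 1.0 / a**2
    g_inv_phph = 1.0 / (a**2 * math.sin(theta) ** 2)

    ricci_thth = g_inv_phph * riemann_down
    ricci_phph = g_inv_thth * riemann_down
    return g_inv_thth * ricci_thth + g_inv_phph * ricci_phph


for radius in (1.0, 2.0, 10.0):
    print(radius, sphere_ricci_scalar(theta=0.7, a=radius))
```

Up to finite-difference error, the printed values agree with $2/a^2$ and do not depend on the angle chosen, as expected for a space of constant curvature.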

Chapter 6

General Relativity

6.1 Towards the Einstein equations


There are several ways of motivating the Einstein equations. The most natural is perhaps
through considerations involving the Equivalence Principle. In gravitational fields there
exist local inertial frames in which Special Relativity is recovered. The equation of motion
of a free particle in such frames is:

\[ \frac{d^2x^a}{d\tau^2} = 0. \tag{6.1} \]
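Relative to an arbitrary (accelerating) frame specified by $x'^a = x'^a(x^b)$, the latter becomes:
\[ \frac{d^2x'^a}{d\tau^2} + \gamma^a{}_{bc}\,\frac{dx'^b}{d\tau}\frac{dx'^c}{d\tau} = 0, \]
where
\[ \gamma^a{}_{bc} = \frac{\partial x'^a}{\partial x^d}\,\frac{\partial^2 x^d}{\partial x'^b\,\partial x'^c}. \]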
Here the $\gamma^a{}_{bc}$ are the “fictitious” terms that arise due to the non-inertial nature of the
frame. By the Equivalence Principle, gravity is locally equivalent to acceleration, and
acceleration in turn gives rise to non-inertial frames. The main
idea of General Relativity is to argue that gravitation as well as inertial forces should be
described by appropriate $\gamma^a{}_{bc}$'s!
Clearly (6.1) is not a tensorial equation since it is not left invariant upon changing
frame: although $dx^a/d\tau$ is a well-defined vector, $d^2x^a/d\tau^2$ is not. Note that we can use the chain
rule $\frac{d}{d\tau} = \frac{dx^b}{d\tau}\frac{\partial}{\partial x^b}$ to write
\[ \frac{d^2x^a}{d\tau^2} = \frac{dx^b}{d\tau}\,\partial_b\!\left(\frac{dx^a}{d\tau}\right). \]
Now it is clear how we can generalise equation (6.1) to curved space: we simply replace
the partial derivative by a covariant derivative:

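\[ \frac{dx^b}{d\tau}\,\partial_b\!\left(\frac{dx^a}{d\tau}\right) \;\to\; \frac{dx^b}{d\tau}\,\nabla_b\!\left(\frac{dx^a}{d\tau}\right) = \frac{d^2x^a}{d\tau^2} + \Gamma^a{}_{bc}\,\frac{dx^b}{d\tau}\frac{dx^c}{d\tau}. \]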

Therefore, we conclude that the generalisation of (6.1) to curved spaces is

\[ \frac{d^2x^a}{d\tau^2} + \Gamma^a{}_{bc}\,\frac{dx^b}{d\tau}\frac{dx^c}{d\tau} = 0. \tag{6.2} \]

To see that (6.2) indeed describes the motion of test particles in gravitational fields,
we can consider the Newtonian limit of this equation. More precisely, in the Newtonian limit
we assume that particles are moving slowly compared to the speed of light, that gravitational
fields are weak (so they can be considered as a perturbation of flat space) and that the
gravitational field is static. Taking the proper time τ as an affine parameter along the
geodesic, “moving slowly” means
\[ \frac{dx^i}{d\tau} \ll \frac{dt}{d\tau}, \]
where i = 1, 2, 3 denotes the spatial coordinates. In this limit, the geodesic equation
(6.2) becomes
\[ \frac{d^2x^a}{d\tau^2} + \Gamma^a{}_{00}\left(\frac{dt}{d\tau}\right)^2 = 0. \tag{6.3} \]
Since the gravitational field is assumed to be static, all t-derivatives of gab vanish (∂0 gab =
0) and the relevant Christoffel symbols simplify

\[ \Gamma^a{}_{00} = \tfrac{1}{2} g^{ab}\left(\partial_0 g_{b0} + \partial_0 g_{0b} - \partial_b g_{00}\right) = -\tfrac{1}{2} g^{ab}\,\partial_b g_{00}. \tag{6.4} \]

Furthermore, since the field is weak, one may adopt a local coordinate system in which

\[ g_{ab} = \eta_{ab} + h_{ab}, \qquad |h_{ab}| \ll 1. \tag{6.5} \]
From the definition of the inverse metric, $g^{ab}g_{bc} = \delta^a{}_c$, we find that to first order in $h_{ab}$,
\[ g^{ab} = \eta^{ab} - h^{ab}, \]
where $h^{ab} = \eta^{ac}\eta^{bd}h_{cd}$. Substituting this into (6.4) and expanding to first order in $h_{ab}$,
one has that
\[ \Gamma^a{}_{00} = -\tfrac{1}{2}\,\eta^{ad}\,\partial_d h_{00}. \]
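Therefore, in this limit the geodesic equation (6.3) becomes (see footnote 1):
\[ \frac{d^2x^i}{d\tau^2} = \frac{1}{2}\left(\frac{dt}{d\tau}\right)^2 \partial_i h_{00}, \tag{6.6a} \]
\[ \frac{d^2t}{d\tau^2} = 0, \quad \text{as } \partial_0 h_{00} = 0. \tag{6.6b} \]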
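From (6.6b) it follows that $dt/d\tau$ is a constant. Also, from
\[ \frac{dx^i}{d\tau} = \frac{dx^i}{dt}\frac{dt}{d\tau}, \]
it follows that
\[ \frac{d^2x^i}{d\tau^2} = \frac{d^2x^i}{dt^2}\left(\frac{dt}{d\tau}\right)^2 + \frac{dx^i}{dt}\frac{d^2t}{d\tau^2}, \]
which in our case reduces to
\[ \frac{d^2x^i}{d\tau^2} = \frac{d^2x^i}{dt^2}\left(\frac{dt}{d\tau}\right)^2. \]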
Footnote 1: Note that $\eta^{ij} = \delta^{ij}$, so spatial indices upstairs and downstairs are the same.

Combining the latter with (6.6a) we obtain

\[ \frac{d^2x^i}{dt^2} = \frac{1}{2}\,\partial_i h_{00}. \tag{6.7} \]
The corresponding Newtonian result is

\[ \frac{d^2x^i}{dt^2} = -\partial_i \phi, \tag{6.8} \]
where ϕ is the gravitational potential. Far from a central body of mass M at a distance
r, ϕ is given by
\[ \phi = -\frac{GM}{r}, \]
where G is Newton’s constant of gravitation. Comparing (6.7) and (6.8) one finds that

h00 = −2ϕ + constant.

However, at large distances from M one has that ϕ → 0 (gravity becomes negligible)
and gab → ηab (the space becomes flat). Therefore the constant must be zero and we can
conclude that
h00 = −2ϕ. (6.9)
Substituting in (6.5) one finds
g00 = −(1 + 2ϕ). (6.10)
Now, recall that ϕ has dimensions of (velocity)$^2$, $[\phi] = [GM/R] = L^2/T^2$. Therefore one
has that $|\phi|/c^2$ at the surface of the Earth is $\sim 10^{-9}$, at the surface of the Sun is $\sim 10^{-6}$,
at the surface of a white dwarf is $\sim 10^{-4}$, while at the surface of a neutron star it is $\sim 10^{-2}$.
On the other hand, at the horizon of a black hole $|\phi|/c^2 \sim 1$. It follows that in most cases
the distortion produced by gravity in the spacetime metric gab is very small, except near
black holes.
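To get a feeling for these orders of magnitude, the dimensionless potential $GM/(Rc^2)$ can be evaluated directly. The sketch below uses rounded textbook values for the Earth and the Sun, which are assumptions brought in for illustration and not quoted in these notes; the function name is likewise hypothetical.

```python
# Rounded physical values in SI units (assumed for illustration only).
C = 2.998e8          # speed of light, m/s
GM_EARTH = 3.986e14  # gravitational parameter of the Earth, m^3/s^2
R_EARTH = 6.371e6    # mean radius of the Earth, m
GM_SUN = 1.327e20    # gravitational parameter of the Sun, m^3/s^2
R_SUN = 6.957e8      # radius of the Sun, m


def surface_potential(gm: float, radius: float) -> float:
    """Magnitude of the Newtonian potential at the surface, in units of c^2."""
    return gm / (radius * C**2)


def g00_weak_field(gm: float, radius: float) -> float:
    """Weak-field g_00 = -(1 + 2 phi / c^2), with phi = -GM/r, as in (6.10)."""
    return -(1.0 - 2.0 * surface_potential(gm, radius))


phi_earth = surface_potential(GM_EARTH, R_EARTH)
phi_sun = surface_potential(GM_SUN, R_SUN)
print(f"Earth: |phi|/c^2 = {phi_earth:.2e}")
print(f"Sun:   |phi|/c^2 = {phi_sun:.2e}")
print(f"Sun:   g_00 + 1  = {g00_weak_field(GM_SUN, R_SUN) + 1.0:.2e}")
```

With these inputs the Earth comes out just below $10^{-9}$ and the Sun at a few times $10^{-6}$, so the deviation of $g_{00}$ from its flat-space value is indeed tiny in both cases.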
We have argued that free particles (subject only to gravitational forces) move along
geodesics. In the Newtonian limit of the geodesic equation we have shown how the
Christoffel symbols Γa bc are associated with gravitational forces and, in turn, how the
spacetime metric gab can be associated with the gravitational potential. However, we do
not know yet what equation the metric gab has to satisfy. To motivate it, note that the
gravitational potential in the Newtonian theory satisfies

∇2 ϕ = 4πGρ, (6.11)

where ρ is the mass density. The relativistic analogue of this equation should be tenso-
rial and of second order in the derivatives of the metric. To take this analogy further,
consider two neighbouring particles moving in a gravitational field with a potential ϕ
with coordinates xi (t) and xi (t) + ξ i (t) respectively, with ξ i (t) small and i = 1, 2, 3. The
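equations of motion are then given by
\[ \ddot x^i = -\frac{\partial\phi(x)}{\partial x^i} \]
and
\[ \ddot x^i + \ddot\xi^i = -\frac{\partial\phi(x)}{\partial x^i} - \xi^j\,\frac{\partial^2\phi}{\partial x^i\,\partial x^j} + O(\xi^2). \]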

Subtracting the two last equations:

\[ \ddot\xi^i = -\xi^j\,\frac{\partial^2\phi}{\partial x^i\,\partial x^j}. \]
This is the relative acceleration of two test particles separated by a 3-vector ξ i – the
second derivative of the potential gives the tidal forces. This is in analogy to the geodesic
deviation equation:
\[ \nabla_V\nabla_V\,\xi^a = R^a{}_{cdb}\,V^c V^d \xi^b, \]
provided that one identifies
\[ -\xi^j\,\frac{\partial^2\phi}{\partial x^i\,\partial x^j} \quad\text{and}\quad R^a{}_{cdb}\,V^c V^d \xi^b. \]
This identification would make clear the relation between gravity and geometry – note
that the Riemann tensor involves second derivatives of the metric tensor.

6.2 Principles employed in General Relativity


The main idea underlying General Relativity is that matter –including energy– curves
spacetime (assumed to be a smooth Lorentzian manifold). This in turn affects the motion
of particles and light rays, postulated to move on timelike and null geodesics of the man-
ifold, respectively. These ideas are understood in conjunction with the main principles
of General Relativity, listed below.

(1) Equivalence Principle. In small enough regions of spacetime, the laws of physics
reduce to those of Special Relativity; it is impossible to detect the existence of a
gravitational field by means of local experiments.

(2) Principle of General Covariance. This states that laws of Nature should have
the same mathematical form in any reference frame; hence, they should be tensorial.

(3) Principle of minimal gravitational coupling. This is used to derive the Gen-
eral Relativity analogues of Special Relativity results. According to this principle,
one should change
ηab → gab , ∂ → ∇.
For example, in Special Relativity the equations for a perfect fluid are given by (in the signature (−,+,+,+) used in these notes):
\[ T^{ab} = (\rho + p)V^aV^b + p\,\eta^{ab}, \qquad \partial_a T^{ab} = 0. \]
In General Relativity these should be changed to:
\[ T^{ab} = (\rho + p)V^aV^b + p\,g^{ab}, \qquad \nabla_a T^{ab} = 0. \]

(4) Correspondence principle. General relativity must agree with Special Relativity
in absence of gravitation and with Newtonian gravitational theory in the case of
weak gravitational fields and in the non-relativistic limit (slow speed).

6.2.1 The Einstein equations in vacuum
In vacuum, such as in the outside of a body in empty space, one has that the mass density
ρ vanishes and the equation for the Newtonian potential becomes:

∇2 ϕ = 0.
The Laplace equation involves an object with two indices, namely $\partial^2\phi/\partial x^i\partial x^j$. Therefore, one
would guess that the gravitational field equations involve a symmetric geometric object
with two indices, and hence the same number of components as the metric gab , arising
from a contraction of the Riemann tensor (since the Riemann has two derivatives of the
metric). The Ricci tensor is such a tensor and hence one would be tempted to guess that
the gravitational field equations are

Rab = 0. (6.12)

These are indeed the correct equations for gravity in absence of matter fields and they
are known as the Einstein vacuum field equations. The equations (6.12) form a set of ten
nonlinear, second order partial differential equations for the components of the metric
tensor gab . These are hard to solve, except in simple settings with a high degree of symmetry.
Remark 1. One of the simplest solutions to the vacuum equations is the Minkowski
metric. Expressing the metric gab locally as ds2 = −dt2 + dx2 + dy 2 + dz 2 , we see that
all the Christoffel symbols vanish, from which Rab = 0 is trivially satisfied.
Remark 2. The most general form of the vacuum equations which is tensorial and
depends linearly on second derivatives of the metric is:

Rab + Λgab = 0,

where Λ is the so-called cosmological constant. Outside Cosmology, Λ is usually taken to be zero.

6.2.2 The (full) Einstein Equations


Matter in relativity is described by a (0,2) tensor Tab called the energy-momentum tensor.
Therefore, in the presence of matter, one would be tempted to generalise (6.12) to

Rab = κ Tab

for some coupling constant κ. In fact, Einstein did suggest this equation. However, this
equation is problematic for the following reason. Mass-energy is conserved, which is
described by $\nabla^a T_{ab} = 0$; by the minimal coupling principle, this generalises
the conservation equation $\partial^a T_{ab} = 0$ of Special Relativity. However, in general $\nabla^a R_{ab} \neq 0$.
Therefore, consistency with the conservation of mass-energy implies that we have to
equate $T_{ab}$ with a curvature tensor with vanishing divergence. There is only one (0,2)
tensor, constructed from the Ricci tensor, which is automatically conserved: the Einstein
tensor
\[ G_{ab} = R_{ab} - \tfrac{1}{2} R\, g_{ab}, \]
which always satisfies $\nabla^a G_{ab} = 0$. Therefore, one is led to propose

\[ G_{ab} = \kappa\, T_{ab} \]
as the field equation for the spacetime metric $g_{ab}$ in the presence of matter-energy sources.
Note, however, that since $\nabla^a g_{ab} = 0$, we could also have written

Gab + Λgab = κ Tab . (6.13)

These are the complete Einstein field equations for the metric gab of a spacetime.
Note that the Einstein equations are the simplest compatible with the Equivalence
Principle, but they are not the only ones. In general, the Einstein field equations are
an extremely complicated set of non-linear partial differential equations. In some simple
settings, analytic solutions may be found. These include:

(i) The vacuum spherically symmetric static case (the Schwarzschild spacetime).

(ii) The weak field case (gravitational waves).

(iii) The isotropic and homogeneous case (Cosmology).

We will study cases (i) and (ii) in the following sections.

6.2.3 Newtonian limit


To determine the value of the constant κ one needs to make contact with the Newtonian
theory. In this subsection we are going to see how (6.13) (with Λ = 0) reproduces the
Poisson equation for the gravitational potential in the Newtonian limit. From now on,
we will set Λ = 0 unless otherwise stated.
Contracting both sides of (6.13) with $g^{ab}$ we find $R = -\kappa\, T$ (in four dimensions), which allows us to rewrite (6.13)
as
\[ R_{ab} = \kappa\left(T_{ab} - \tfrac{1}{2}\, T\, g_{ab}\right). \tag{6.14} \]
We want to show that this equation reduces to Newtonian gravity in the weak-field,
time-independent and slowly moving limit. For simplicity, we consider dust as the source
of energy-momentum, for which
Tab = ρ Ua Ub ,
where U a is the dust four-velocity, and ρ is the energy density in the rest frame. The
“dust” we are considering is a massive body, such as the Sun. Without loss of generality,
we can work in the dust rest frame, in which

U a = (U 0 , 0, 0, 0) .

We can fix U 0 using the normalisation condition gab U a U b = −1. In the weak field limit,
from (6.9) and (6.10) we can write

\[ g_{00} = -1 + h_{00}, \qquad g^{00} = -1 - h_{00}. \tag{6.15} \]
Then, to first order in $h_{ab}$ we get
\[ U^0 = 1 + \tfrac{1}{2} h_{00}. \]

In fact, we are already assuming that ρ is small. Therefore, the contributions from h00
to Tab coming from the U0 terms will be of higher order, and we can simply take U 0 = 1,
and correspondingly U0 = −1. Then,

T00 = ρ ,

and all the other components of the stress-energy tensor Tab vanish. In this limit, the
rest energy ρ = T00 will be much larger than the other terms in Tab , so we can focus
on the a = b = 0 component of (6.14). To the lowest non-trivial order, the trace of the
energy momentum tensor is

T = g ab Tab = g 00 T00 = −T00 = −ρ .

and hence, the 00-component of (6.14) becomes


\[ R_{00} = \tfrac{1}{2}\,\kappa\rho. \tag{6.16} \]

Now we need to express the lhs of this equation in terms of the metric. To do so, we
have to compute $R_{00} = R^a{}_{0a0} = R^i{}_{0i0}$. We have
\[ R^i{}_{0j0} = \partial_j\Gamma^i{}_{00} - \partial_0\Gamma^i{}_{j0} + \Gamma^i{}_{ja}\Gamma^a{}_{00} - \Gamma^i{}_{0a}\Gamma^a{}_{j0}. \]

Note that the second term in this expression is a time derivative, which vanishes for
static fields. The third and fourth terms are of the form (Γ)2 , and since the Christoffels
Γ are of first order in the metric perturbation hab , these terms are of higher order and
can be neglected. Therefore, to first order in hab we have Ri 0j0 = ∂j Γi 00 . From this, we
compute

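\[
\begin{aligned}
R_{00} &= R^i{}_{0i0} \\
&= \partial_i\left[\tfrac{1}{2}\, g^{ia}\left(\partial_0 g_{a0} + \partial_0 g_{0a} - \partial_a g_{00}\right)\right] \\
&= -\tfrac{1}{2}\,\delta^{ij}\,\partial_i\partial_j h_{00} \\
&= -\tfrac{1}{2}\,\nabla^2 h_{00}.
\end{aligned}
\]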

Then, equation (6.16) becomes


∇2 h00 = −κ ρ . (6.17)
From equation (6.9) we have h00 = −2 ϕ. Comparing with the Poisson equation for New-
tonian gravity (6.11), we see that General Relativity does indeed reproduce Newtonian
gravity if we set κ = 8π G, where G is Newton’s gravitational constant.
Having fixed the normalisation correctly to reproduce the Newtonian limit we arrive
at the final form of Einstein’s equations for general relativity:

\[ R_{ab} - \tfrac{1}{2} R\, g_{ab} = 8\pi G\, T_{ab}. \tag{6.18} \]

6.3 The Schwarzschild solution


In GR, the unique spherically symmetric vacuum solution of the Einstein equations is
the Schwarzschild metric. It is second in importance only to Minkowski space and it
corresponds to the static, spherically symmetric gravitational field in empty space sur-
rounding a (spherically symmetric) source, such as a star. As we shall see later, it also
represents a black hole.
The assumptions of spherical symmetry and staticity severely constrain the form of
the line element. Firstly, assuming that the spacetime is static means that there exists a
timelike Killing vector field which, far away from any sources, is of the form $\partial_t$, the
canonical timelike Killing vector field in Minkowski space. Furthermore, in these
coordinates the line element is invariant under a time inversion $t \to -t$. The assumption
of spherical symmetry implies that coordinates can be chosen such that the
line element contains an explicit round sphere, $d\Omega^2_{(2)} = d\theta^2 + \sin^2\theta\, d\phi^2$, where (θ, ϕ)
are the standard angular coordinates on a unit 2-sphere. Therefore, with our symmetry
assumptions, the most general line element that we can write down is of the following
form:
\[ ds^2 = -e^{2A(r)}\,dt^2 + e^{2B(r)}\,dr^2 + r^2 e^{2C(r)}\left(d\theta^2 + \sin^2\theta\, d\phi^2\right). \tag{6.19} \]
We can use our freedom to choose the coordinates to simplify (6.19) further. Defining a
new radial coordinate,

\[ \bar r = r\, e^{C(r)} \;\Rightarrow\; d\bar r = \left(1 + r\,\frac{dC}{dr}\right) e^{C(r)}\, dr, \tag{6.20} \]

the metric (6.19) becomes


\[ ds^2 = -e^{2A(r)}\,dt^2 + \left(1 + r\,\frac{dC}{dr}\right)^{-2} e^{2(B(r)-C(r))}\, d\bar r^2 + \bar r^2\left(d\theta^2 + \sin^2\theta\, d\phi^2\right), \tag{6.21} \]

where r is now a function of r̄ defined by (6.20). Making the following relabelings,


\[ \bar r \to r, \qquad \left(1 + r\,\frac{dC}{dr}\right)^{-2} e^{2(B(r)-C(r))} \to e^{2B(r)}, \]

the metric (6.21) becomes

ds2 = −e2A(r) dt2 + e2B(r) dr2 + r2 (dθ2 + sin2 θ dϕ2 ) . (6.22)

This is the most general static and spherically symmetric line element. Note that in these
coordinates r has a physical meaning: it is the areal radius of the 2-spheres.
Given the form of the metric (6.22), we are now ready to solve the Einstein vacuum
equations,
\[ R_{ab} = 0. \]
From (6.22), we find that the only non-vanishing components of the Ricci tensor are:
 
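\[ R_{tt} = e^{2(A-B)}\left[A'' + A'^2 - A'B' + \tfrac{2}{r}\,A'\right], \tag{6.23} \]
\[ R_{rr} = -A'' - A'^2 + A'B' + \tfrac{2}{r}\,B', \tag{6.24} \]
\[ R_{\theta\theta} = e^{-2B}\left[r\,(B' - A') - 1\right] + 1, \tag{6.25} \]
\[ R_{\phi\phi} = \sin^2\theta\; R_{\theta\theta}, \tag{6.26} \]
where $'$ denotes $d/dr$. Having calculated the components of the Ricci tensor, we now have to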
equate them to zero. Since all components have to vanish independently, we can consider
the combination
\[ 0 = e^{2(B-A)} R_{tt} + R_{rr} = \frac{2}{r}\,(A' + B'), \]
which implies A(r) = −B(r) + c, where c is a constant. We can set this constant to zero
by rescaling the time coordinate by t → e−c t, after which we have

A(r) = −B(r) . (6.27)

Considering Rθθ = 0, using the previous result this equation now becomes

\[ e^{2A}\,(2\,r A' + 1) = 1, \]
which is equivalent to
\[ \partial_r\!\left(r\, e^{2A}\right) = 1. \]


This equation can be straightforwardly integrated to obtain
\[ e^{2A(r)} = 1 - \frac{R_S}{r}, \tag{6.28} \]
where RS is an undetermined constant. Using the results (6.27) and (6.28), we find that
the spacetime metric that solves the Einstein vacuum equations is

\[ ds^2 = -\left(1 - \frac{R_S}{r}\right) dt^2 + \frac{dr^2}{1 - \frac{R_S}{r}} + r^2\left(d\theta^2 + \sin^2\theta\, d\phi^2\right). \tag{6.29} \]
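As a sanity check on the integration, one can substitute $A(r) = \tfrac{1}{2}\ln(1 - R_S/r)$ and $B = -A$ back into the Ricci components (6.23) to (6.25) and confirm numerically that they vanish. The sketch below does this with finite differences; the function name, the sample radii and the step size are illustrative assumptions.

```python
import math


def ricci_residuals(r: float, r_s: float = 1.0, h: float = 1e-4):
    """Evaluate R_tt, R_rr and R_thetatheta of (6.23)-(6.25) on the solution (6.28)."""

    def A(x: float) -> float:
        # e^{2A} = 1 - R_S / r, valid for x > r_s.
        return 0.5 * math.log(1.0 - r_s / x)

    # Derivatives of A by central finite differences; B = -A by (6.27).
    dA = (A(r + h) - A(r - h)) / (2 * h)
    d2A = (A(r + h) - 2 * A(r) + A(r - h)) / h**2
    B = -A(r)
    dB = -dA

    r_tt = math.exp(2 * (A(r) - B)) * (d2A + dA**2 - dA * dB + 2 * dA / r)
    r_rr = -d2A - dA**2 + dA * dB + 2 * dB / r
    r_thth = math.exp(-2 * B) * (r * (dB - dA) - 1) + 1
    return r_tt, r_rr, r_thth


for radius in (3.0, 10.0, 50.0):
    print(radius, ricci_residuals(radius))
```

All three residuals come out at the level of the finite-difference error, which is consistent with (6.29) solving $R_{ab} = 0$ outside $r = R_S$.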

This metric depends on a single parameter, namely the constant RS , which is called the
Schwarzschild radius. To fix this constant in terms of a physical parameter, recall that
in the weak field limit (i.e., far away from the source), the tt-component of the spacetime
metric sourced by a mass M is given by
 
\[ g_{tt} = -\left(1 - \frac{2GM}{r}\right). \tag{6.30} \]

The metric (6.29) should reduce to the weak field case when r ≫ RS , but for the tt-
component to agree with (6.30) we need to identify

RS = 2GM .

We can now write down the final form of a static, spherically symmetric spacetime metric

\[ ds^2 = -\left(1 - \frac{2GM}{r}\right) dt^2 + \left(1 - \frac{2GM}{r}\right)^{-1} dr^2 + r^2\left(d\theta^2 + \sin^2\theta\, d\phi^2\right). \tag{6.31} \]

This line element is known as the Schwarzschild metric, and it depends on a single
parameter, namely M . This parameter can be interpreted as the mass of the spacetime.
Note that as M → 0, we recover Minkowski space, as expected. Note also that as
r → ∞, the metric (6.31) becomes more like Minkowski space; this property is known as
asymptotic flatness.
Remark 1. This solution demonstrates how the presence of mass curves flat spacetime.
Remark 2. The solution only applies to the exterior of a star, where there is vacuum.
We will see shortly that, in the absence of matter, this solution describes a black hole.
Remark 3. The Birkhoff Theorem: a spherically symmetric solution in vacuum is
necessarily static. That is, there is no time dependence in spherically symmetric vacuum solutions.
Therefore, the assumption of staticity is not necessary.

Singularities
We see from (6.31) that some of the metric coefficients become infinite or zero at
r = 0 and r = 2GM , which suggests that something may be going wrong there. The
metric coefficients are of course coordinate dependent; hence, it is entirely possible that
the apparent problems at those values of the radial coordinate r are simply coordinate
singularities that result in a breakdown of the coordinates rather than a problem with
the spacetime manifold itself. For instance, this is precisely what happens at the origin
of polar coordinates in flat space, where the metric ds2 = dr2 +r2 dθ2 becomes degenerate
and g θθ blows up at r = 0. Of course, we know that there is nothing wrong with flat space

at r = 0: this point is equivalent to any other point of the manifold, and by changing to
Cartesian coordinates we see that both the metric ds2 = dx2 + dy 2 and its inverse are
perfectly well-behaved at x = y = 0 (r = 0).
Therefore, in GR we need to assess singularities in a coordinate independent way.
In general, this is difficult but for our present purposes we will identify singularities as
places where the curvature of spacetime becomes infinite. The curvature is measured
by the Riemann tensor, but to say that the curvature becomes infinite one cannot simply
use the components of this tensor since they are coordinate-dependent. However, from
the curvature one can construct scalars and, since the latter are coordinate independent,
they provide a meaningful way to assess when the curvature becomes infinite. Scalars
involving the Ricci scalar R or the Ricci tensor, e.g., $R_{ab}R^{ab}$, are not useful since they are
fixed by the Einstein equations and, in the vacuum case, they trivially vanish. On the
other hand, scalar quantities such as $R_{abcd}R^{abcd}$ or $R_{abcd}R^{cdef}R_{ef}{}^{ab}$ contain information
about the curvature of the spacetime which is not determined by the Einstein equations
and hence we can use them to detect physical singularities. If any of these scalars (but
not necessarily all of them) blows up as we approach a certain point on the manifold, we
regard that point as a singularity of the curvature. We should also check that this point
is not infinitely far away in physical distance, that is, that it can be reached by observers
or light travelling a finite distance along a curve.
Therefore, we have a sufficient condition for a point to be considered a singularity,
but it is not a necessary condition. For the Schwarzschild metric (6.31), we find that

\[ R_{abcd}R^{abcd} = \frac{48\, G^2 M^2}{r^6}. \tag{6.32} \]
This scalar of the curvature, known as the Kretschmann scalar, blows up at r = 0,
which is sufficient to convince us that r = 0 is a true singularity in the manifold. The
other potentially troublesome point is r = 2GM , the Schwarzschild radius. We see
that the Kretschmann scalar (6.32) (and in fact any other curvature scalar) is perfectly
well-behaved there. This suggests that the singularity at r = 2GM may just be a
coordinate singularity and that the spacetime metric may be perfectly smooth there in
more appropriate coordinates. We will see that this is indeed the case and that it is
possible to find coordinates such that the Schwarzschild metric is smooth at r = 2GM ;
as we shall see, this surface corresponds to the event horizon of a black hole.
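The contrast between the two suspicious radii can be made concrete by evaluating (6.32). This is a small sketch in units with $G = c = 1$, where the argument `gm` stands for $GM$; the function name and the sample radii are illustrative assumptions.

```python
def kretschmann(r: float, gm: float = 1.0) -> float:
    """Kretschmann scalar (6.32) of the Schwarzschild metric, in units with G = c = 1."""
    return 48.0 * gm**2 / r**6


gm = 1.0
horizon_value = kretschmann(2.0 * gm, gm)  # finite: 3 / (4 (GM)^4)
print("at r = 2GM:", horizon_value)

# Approaching r = 0 the scalar grows without bound.
for r in (1.0, 0.1, 0.01):
    print(f"at r = {r}:", kretschmann(r, gm))
```

At $r = 2GM$ the scalar equals $3/(4\,G^4M^4)$, which is finite and becomes smaller for larger masses, while it diverges like $r^{-6}$ as $r \to 0$.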
The Sun, for example, is a body that extends to a radius of $R_\odot \sim 10^6\, GM_\odot$.
Therefore, the surface $r = 2GM_\odot$ is far inside the Sun and hence the Schwarzschild
metric does not apply there. On the other hand, there are compact objects for which the
Schwarzschild metric is valid everywhere; as we will see, these objects are in fact black
holes.
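Restoring factors of c, the Schwarzschild radius is $R_S = 2GM/c^2$. The sketch below evaluates it for the Sun and compares it with the solar radius; the numerical constants are rounded textbook values assumed for illustration, and the function name is hypothetical.

```python
# Rounded physical values in SI units (assumed for illustration only).
C = 2.998e8        # speed of light, m/s
GM_SUN = 1.327e20  # gravitational parameter of the Sun, m^3/s^2
R_SUN = 6.957e8    # radius of the Sun, m


def schwarzschild_radius(gm: float) -> float:
    """Schwarzschild radius R_S = 2GM/c^2 in metres, given GM in m^3/s^2."""
    return 2.0 * gm / C**2


r_s_sun = schwarzschild_radius(GM_SUN)
ratio = R_SUN / (GM_SUN / C**2)  # solar radius in units of GM/c^2
print(f"R_S (Sun)        = {r_s_sun:.0f} m")
print(f"R_sun / (GM/c^2) = {ratio:.2e}")
```

With these inputs the Schwarzschild radius of the Sun is roughly 3 km, and the solar radius is a few times $10^5$ in units of $GM_\odot/c^2$, in line with the order-of-magnitude statement above.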
Remark 4. Uniqueness Theorem (Israel ’67): The Schwarzschild metric (6.31) is the
unique static, topologically spherical, asymptotically flat black hole solution of the Ein-
stein vacuum equations.

6.4 Geodesics of the Schwarzschild geometry


The classical experimental tests of General Relativity are based on the Schwarzschild
solution. They compare the trajectories of freely falling particles
and light rays in the gravitational field of a central body with their counterparts in Newtonian
theory. Therefore, we have to consider geodesics, both timelike and null, in the
Schwarzschild geometry. The range of available tests changed with the recent detection of gravitational waves
by the LIGO/Virgo collaboration, but we will postpone the discussion of gravitational
waves to the next chapter.
In order to derive the geodesics in Schwarzschild spacetime, it is best to use the
Euler-Lagrange equations with

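\[ \mathcal{L} = -\left(1 - \frac{2GM}{r}\right)\dot t^2 + \left(1 - \frac{2GM}{r}\right)^{-1}\dot r^2 + r^2\left(\dot\theta^2 + \sin^2\theta\,\dot\phi^2\right), \tag{6.33} \]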

where a dot denotes differentiation with respect to an affine parameter λ. For timelike
geodesics one has $\mathcal{L} = -1$, while for null geodesics $\mathcal{L} = 0$ (and for spacelike geodesics
$\mathcal{L} = +1$).
The Euler-Lagrange equations for the geodesics are:

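\[ \ddot t + \frac{2GM}{r(r - 2GM)}\,\dot r\,\dot t = 0, \tag{6.34a} \]
\[ \ddot r + \frac{GM}{r^3}(r - 2GM)\,\dot t^2 - \frac{GM}{r(r - 2GM)}\,\dot r^2 - (r - 2GM)\left(\dot\theta^2 + \sin^2\theta\,\dot\phi^2\right) = 0, \tag{6.34b} \]
\[ \ddot\theta + \frac{2}{r}\,\dot\theta\,\dot r - \sin\theta\cos\theta\,\dot\phi^2 = 0, \tag{6.34c} \]
\[ \ddot\phi + \frac{2}{r}\,\dot\phi\,\dot r + 2\cot\theta\,\dot\theta\,\dot\phi = 0. \tag{6.34d} \]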

Note that the Schwarzschild metric possesses 4 Killing vectors: three for the spherical
symmetry, and one for time translations. Each of these Killing vectors leads to a constant
of motion for the free particle (i.e., geodesics). Recall that if K a is a Killing vector and
ẋa is the vector tangent to a geodesic, then

Ka ẋa = constant (6.35)

along the geodesic. In addition, there is always another constant of motion for the
geodesics: the Lagrangian itself. Indeed, the geodesic equation together with metric
compatibility implies that the quantity

L = gab ẋa ẋb

is constant along the geodesic.


Invariance under time translations leads to conservation of the energy, while invari-
ance under spatial rotations leads to conservation of the three components of the angular
momentum. The angular momentum can be thought of as a three vector with a magni-
tude (one component) and a direction (two components). Conservation of the direction
of the angular momentum means that the particle will move in a plane. Because of
the spherical symmetry of the Schwarzschild solution, without loss of generality we can
choose this plane to be the equatorial plane in our coordinate system. Thus, the two
Killing vectors that lead to conservation of the direction of the angular momentum imply
that, for a single particle, we can choose
\[ \theta = \frac{\pi}{2}. \]

The two remaining Killing vectors lead to the conservation of the energy and the magnitude
of the angular momentum. The energy arises from the canonical timelike Killing
vector
\[ T^a = (\partial_t)^a \;\Rightarrow\; T_a = -\left(1 - \frac{2GM}{r}\right)(dt)_a. \]
Similarly, conservation of the angular momentum is associated to the canonical rotational
Killing vector
\[ R^a = (\partial_\phi)^a \;\Rightarrow\; R_a = r^2\sin^2\theta\,(d\phi)_a. \]
Since $\theta = \pi/2$ for the equatorial geodesics, we find that the two conserved quantities are
\[ E = -T_a\,\dot x^a = \left(1 - \frac{2GM}{r}\right)\dot t, \tag{6.36} \]
\[ L = R_a\,\dot x^a = r^2\,\dot\phi. \tag{6.37} \]

Indeed, since the Lagrangian (6.33) does not explicitly depend on either t or ϕ, the
corresponding components of the Euler-Lagrange equations (6.34a) and (6.34d) in fact
are
   
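\[ \frac{d}{d\lambda}\left(\frac{\partial\mathcal{L}}{\partial\dot t}\right) = 0 \;\Rightarrow\; \frac{\partial\mathcal{L}}{\partial\dot t} = -2\left(1 - \frac{2GM}{r}\right)\dot t = \text{constant}, \tag{6.38} \]
\[ \frac{d}{d\lambda}\left(\frac{\partial\mathcal{L}}{\partial\dot\phi}\right) = 0 \;\Rightarrow\; \frac{\partial\mathcal{L}}{\partial\dot\phi} = 2\,r^2\,\dot\phi = \text{constant}, \tag{6.39} \]
which results in the previous definitions of E and L (see footnote 2).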


Having obtained the conserved quantities E and L, we can now get an understanding
of the orbits of particles, both timelike and null, in the Schwarzschild spacetime. To
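do so, let us consider the Lagrangian (6.33) that governs the geodesics specialised to
trajectories on the $\theta = \pi/2$ plane:
\[ -\left(1 - \frac{2GM}{r}\right)\dot t^2 + \left(1 - \frac{2GM}{r}\right)^{-1}\dot r^2 + r^2\,\dot\phi^2 = -\epsilon, \tag{6.40} \]
where $\epsilon = 1, 0, -1$ for timelike, null and spacelike geodesics respectively. Multiplying by
$1 - \frac{2GM}{r}$ and using the expressions for E and L, equations (6.36) and (6.37), we obtain
\[ -E^2 + \dot r^2 + \left(1 - \frac{2GM}{r}\right)\left(\frac{L^2}{r^2} + \epsilon\right) = 0. \tag{6.41} \]
Multiplying this equation by $\tfrac{1}{2}$, we arrive at the following expression
\[ \tfrac{1}{2}\,\dot r^2 + V(r) = \mathcal{E}, \tag{6.42} \]
where
\[ V(r) = \frac{1}{2}\,\epsilon - \epsilon\,\frac{GM}{r} + \frac{L^2}{2r^2} - \frac{GM L^2}{r^3}, \tag{6.43} \]
and
\[ \mathcal{E} = \tfrac{1}{2}\,E^2. \tag{6.44} \]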
Footnote 2: Note that E is not the energy of a particle with 4-momentum $p^a$ as measured by an observer with
4-velocity $U^a$; the latter would be $-p_a U^a$. The reason why they do not agree is that $U^a$ is normalised to
$U_a U^a = -1$, while $T^a = (\partial_t)^a$ is not; only this form of $T^a$ corresponds to a Killing vector.

Equation (6.42) is precisely the equation of motion for a classical particle of unit mass
and “energy” $\mathcal{E}$ moving in a potential V(r). Note that while the conserved energy is E,
the effective potential for the motion along the radial direction r is sensitive to $\mathcal{E} = E^2/2$.
Our ultimate goal is to obtain the full trajectory of the particle, that is r(λ) and t(λ)
and ϕ(λ). Nevertheless, as we shall see, understanding the radial motion provides very
useful intuition about the actual orbits.
A similar analysis of the orbits of a particle of unit mass in Newtonian gravity would
have led to a very similar result: we would have ended up with equation (6.42) but with
a different effective potential; the effective potential for the radial motion in Newtonian
gravity does not contain the last term in (6.43). In the potential (6.43), the first term is
just a constant; the second term corresponds to the Newtonian gravitational potential,
and the third term is the angular momentum contribution, which leads to a centrifugal
repulsion. The last term is the GR contribution and it leads to differences in the motion
compared to the Newtonian case, especially for small r.
The possible orbits can be determined by comparing $\mathcal{E} = \tfrac{1}{2}E^2$ to V(r) in Fig. 6.1 for the
different values of L. This is so because $\mathcal{E}$ is conserved along the trajectory and equation
(6.42) tells us that as the particle moves, both the kinetic energy (i.e., $\tfrac{1}{2}\dot r^2$) and the
potential energy (i.e., V(r)) change in such a way that $\mathcal{E}$ remains constant. Therefore,
starting from a given value of r such that $\dot r \neq 0$, the particle will move until it reaches
a turning point where $V(r) = \mathcal{E}$, when it will start moving in the opposite direction.
Depending on $\mathcal{E}$ and L, there may be no turning point, in which case the particle just
keeps moving. For example, for a fixed L, this happens when $\mathcal{E}$ is larger than the
maximum of V(r). In other cases, the particle may describe a circular orbit at a constant
$r = r_c$ (and hence $\dot r = 0$). This happens at an extremum of the potential, $dV/dr = 0$.
Differentiating (6.43), we find that circular orbits occur when

$$ \epsilon\, GM r_c^2 - L^2 r_c + 3 GM L^2 \gamma = 0 , \tag{6.45} $$

where γ = 0 in Newtonian gravity and γ = 1 in GR. Circular orbits will be stable if
they correspond to a minimum of the potential, and unstable if they correspond to a
maximum. Bound orbits that are not circular will oscillate around the radius of a stable
circular orbit and hence they will correspond to ellipses.
For Newtonian gravity, circular orbits occur at
$$ r_c = \frac{L^2}{\epsilon\, GM} . \tag{6.46} $$
For massless particles, ϵ = 0 and there are no circular orbits. This is consistent with the
top plot in Fig. 6.1, which shows that the dashed curves have no extrema. Therefore, in
Newtonian gravity, a photon with energy E shot from r = ∞ will move towards smaller
values of r until it reaches a turning point and moves back to r = ∞. On the other hand,
for massive particles there are stable circular orbits at the radius (6.46), as well as bound
orbits that oscillate around this radius. If the energy is greater than the asymptotic value E = 1,
the orbits will be unbound and they will describe a particle that approaches a gravitating
body but eventually escapes.
In GR the situation is different, especially at small values of r, where the term
$-GML^2/r^3$ dominates $V(r)$. Indeed, for r → 0 the potential goes to −∞ in GR while
it goes to +∞ in Newtonian gravity. This is clearly seen in Fig. 6.1 by comparing the solid
curves (GR) with the dashed ones (Newtonian gravity). In GR, V(r) = 0 at r = 2GM for
any value of L; inside this radius is the black hole, which we will discuss more thoroughly
later. For massless particles, there is always a barrier (except for L = 0, for which the
potential vanishes), but a sufficiently energetic photon (i.e., with $\mathcal{E}$ greater than
the maximum of V(r)) will be able to overcome the barrier and inevitably fall to r = 0.

Figure 6.1: Effective potential V(r) as a function of r for geodesics with different values of L.
Top: null geodesics (ϵ = 0). Bottom: timelike geodesics, i.e., trajectories of free massive
particles (ϵ = 1). The GR potential corresponds to the solid curves while the Newtonian one
corresponds to the dashed curves. The values shown are L = 1 (blue), 2 (orange), 5 (green),
and 10 (red). We have chosen GM = 1.

At the top of the potential barrier there are unstable circular orbits. For ϵ = 0 and γ = 1
equation (6.45) gives
$$ r_c = 3GM . \tag{6.47} $$
A photon can orbit forever in a circle precisely at this radius, but any small perturbation
will drive it to r = 0 or r = ∞. This radius is also known as the photon sphere, and
whilst it is unstable for Schwarzschild black holes, rotating black holes can have stable
photon spheres, which play an important role in the recent images of supermassive black
holes by the Event Horizon Telescope.³

³ See https://eventhorizontelescope.org/.

For massive particles, equation (6.45) tells us that there are circular orbits at

$$ r_c = \frac{L^2 \pm \sqrt{L^4 - 12 G^2 M^2 L^2}}{2GM} . \tag{6.48} $$

For $L > \sqrt{12}\,GM$, there will be two circular orbits, one stable and one unstable. In the
L → ∞ limit, their radii are

$$ r_c = \frac{L^2 \pm L^2\left(1 - 6G^2M^2/L^2\right)}{2GM} = \left( \frac{L^2}{GM} ,\; 3GM \right) . $$

In this limit the stable circular orbit occurs far away, while the unstable one approaches
3GM, similar to the massless case. As we decrease L, the two orbits come closer together
and they coincide when the discriminant in (6.48) vanishes, which happens for

$$ L = \sqrt{12}\, GM , $$

for which
$$ r_c = 6GM , \tag{6.49} $$
and they disappear for smaller values of L. Therefore, 6GM is the smallest possible
radius of a stable circular orbit for a massive particle in the Schwarzschild metric. There
are also unbound orbits, which come from infinity and turn around, and bound but
non-circular orbits, which oscillate around the stable circular orbit. These orbits, which
describe exact ellipses in Newtonian gravity, no longer do so in GR. Finally, there are
orbits which come from r = ∞ and continue all the way to r = 0; this can happen if the
energy is higher than the barrier, or for $L < \sqrt{12}\,GM$, when there is no barrier at all.
To summarise, we have found that for the Schwarzschild spacetime, there are stable
circular orbits for r > 6GM and unstable circular orbits for 3GM < r < 6GM . It
is important to remember that these orbits correspond to geodesics, i.e., free particles.
There is nothing that prevents an accelerating particle from dipping below 3GM and
escaping to infinity, as long as it stays above r = 2GM .
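
The classification of circular orbits summarised above can be checked numerically. The sketch below evaluates the two roots of (6.48) and tests their stability through the sign of the second derivative of the potential, again assuming the standard form of (6.43) for a massive particle; the function names and sample values of L are illustrative only.

```python
import math


def circular_orbit_radii(L, GM=1.0):
    """Radii (outer, inner) of circular timelike orbits from (6.48), or None if there are none."""
    disc = L**4 - 12.0 * GM**2 * L**2
    if disc < 0.0:
        return None                        # L < sqrt(12) GM: no circular orbits
    root = math.sqrt(disc)
    return (L**2 + root) / (2.0 * GM), (L**2 - root) / (2.0 * GM)


def d2V_dr2(r, L, GM=1.0):
    """Second derivative of V = 1/2 - GM/r + L^2/(2 r^2) - GM L^2/r^3 (assumed form of (6.43))."""
    return -2.0 * GM / r**3 + 3.0 * L**2 / r**4 - 12.0 * GM * L**2 / r**5


if __name__ == "__main__":
    for L in (3.0, 4.0, 10.0, 1000.0):
        radii = circular_orbit_radii(L)
        if radii is None:
            print(f"L = {L}: no circular orbits")
            continue
        for r in radii:
            kind = "stable" if d2V_dr2(r, L) > 0 else "unstable"
            print(f"L = {L}: r_c = {r:.6g} GM ({kind})")
```

For large L the inner root approaches 3GM and the outer root grows like L²/GM, in line with the limit quoted above.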

6.5 Experimental tests of General Relativity


Up until the detection of gravitational waves, most experimental tests of GR involved the
motion of test particles in the solar system, and hence geodesics of the Schwarzschild
metric.⁴ Einstein suggested three tests: precession of perihelia, deflection of light and
gravitational redshift.

6.5.1 Perihelion precession


The perihelion of an elliptical orbit is the point of closest approach to the source of the
gravitational field, which sits at a focus of the ellipse; in our case, the source is the
Sun. The precession of perihelia reflects the fact that non-circular orbits in GR are not
perfect ellipses; to a good approximation, they are ellipses that precess. The idea is to
obtain the radius of the orbit r as a function of the angular coordinate ϕ; for a perfect
ellipse, r(ϕ) is periodic with period 2π, which reflects the fact that the perihelion occurs
at the same position each orbit. In this subsection, we will see how GR introduces a
small correction to r(ϕ) such that the orbit no longer closes, giving rise to a precession.

⁴ The fact that the Sun is rotating can be neglected because its rotational velocity is very small compared
to the speed of light.

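We start by considering the radial equation of motion for a massive particle in the
Schwarzschild geometry (6.42). To get an equation for $dr/d\phi$, we multiply it by $2/\dot\phi^2$, where

$$ r^2 \dot\phi = L \quad\Rightarrow\quad \frac{1}{\dot\phi^2} = \frac{r^4}{L^2} , $$

giving
$$ \left(\frac{dr}{d\phi}\right)^2 + \frac{r^4}{L^2} - \frac{2GM}{L^2}\, r^3 + r^2 - 2GM r = \frac{2\mathcal{E}}{L^2}\, r^4 . \tag{6.50} $$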
To solve this equation, we define a new (dimensionless) variable

$$ u = \frac{L^2}{GM r} . \tag{6.51} $$
From (6.46) we see that a Newtonian circular orbit corresponds to u = 1. In terms of
the new variable u, the equation of motion (6.50) becomes
$$ \left(\frac{du}{d\phi}\right)^2 + \frac{L^2}{G^2M^2} - 2u + u^2 - \frac{2G^2M^2}{L^2}\, u^3 = \frac{2\mathcal{E}L^2}{G^2M^2} . \tag{6.52} $$

Differentiating again with respect to ϕ, we obtain a second order equation for u(ϕ):

$$ \frac{d^2u}{d\phi^2} - 1 + u = \frac{3G^2M^2}{L^2}\, u^2 . \tag{6.53} $$
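
The differentiation step can be spelled out as follows. This is only a short worked sketch using (6.52); primes denote derivatives with respect to ϕ, and the right-hand side of (6.52) drops out because the energy and L are constants of the motion.

```latex
\begin{align*}
\frac{d}{d\phi}\left[ (u')^2 + \frac{L^2}{G^2M^2} - 2u + u^2
      - \frac{2G^2M^2}{L^2}\,u^3 \right] &= 0 \\
2u'u'' - 2u' + 2uu' - \frac{6G^2M^2}{L^2}\,u^2 u' &= 0 \\
u'' - 1 + u &= \frac{3G^2M^2}{L^2}\,u^2 \qquad (u' \neq 0)
\end{align*}
```

The last line follows after dividing by 2u′, which is allowed for non-circular orbits away from the turning points.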
In Newtonian gravity, the term on the right hand side would not be present, and the
equation can be solved exactly for u. Here we will treat this term perturbatively. To do
so, we expand the solution u as a Newtonian solution u0 plus a small perturbation u1 ,

$$ u = u_0 + u_1 , \qquad \text{with } u_1 \ll 1 . $$

The zeroth-order part of (6.53) is then

$$ \frac{d^2u_0}{d\phi^2} - 1 + u_0 = 0 , \tag{6.54} $$
while the first-order part is

$$ \frac{d^2u_1}{d\phi^2} + u_1 = \frac{3G^2M^2}{L^2}\, u_0^2 . \tag{6.55} $$
The solution of the zeroth-order equation is given by

$$ u_0 = 1 + e\cos\phi \quad\Rightarrow\quad r = \frac{L^2}{GM(1 + e\cos\phi)} . \tag{6.56} $$

This is the well-known result for Newtonian gravity (first found by Kepler): it describes
an ellipse with eccentricity e.⁵

⁵ An ellipse can be specified by the semi-major axis a, namely the distance from the centre to the
furthest point on the ellipse, and the semi-minor axis b, which is the distance from the centre to the
closest point. The eccentricity is then defined as $e^2 = 1 - b^2/a^2$.

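Plugging the Newtonian solution into the first-order equation (6.55) yields

$$ \begin{aligned} \frac{d^2u_1}{d\phi^2} + u_1 &= \frac{3G^2M^2}{L^2}\,(1 + e\cos\phi)^2 \\ &= \frac{3G^2M^2}{L^2}\left[ \left(1 + \tfrac{1}{2}e^2\right) + 2e\cos\phi + \tfrac{1}{2}e^2\cos 2\phi \right] . \end{aligned} \tag{6.57} $$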
To solve this equation, note that

$$ \begin{aligned} \frac{d^2}{d\phi^2}(\phi\sin\phi) + \phi\sin\phi &= 2\cos\phi , \\ \frac{d^2}{d\phi^2}(\cos 2\phi) + \cos 2\phi &= -3\cos 2\phi . \end{aligned} \tag{6.58} $$

Comparing with (6.57), we see that a solution to this equation is

$$ u_1 = \frac{3G^2M^2}{L^2}\left[ \left(1 + \tfrac{1}{2}e^2\right) + e\,\phi\sin\phi - \tfrac{1}{6}e^2\cos 2\phi \right] . \tag{6.59} $$
The first term corresponds to a constant shift, whilst the third term oscillates around zero.
Therefore, neither of these two terms significantly alters the Newtonian solution, at least
qualitatively. On the other hand, the second term can have an important effect because it
accumulates over successive orbits. This type of term is often known as a “secular” term.
Combining this term with the zeroth-order solution we find
$$ u = 1 + e\cos\phi + \frac{3G^2M^2 e}{L^2}\,\phi\sin\phi . \tag{6.60} $$
This is not the full solution, even to the perturbed equation, but it captures the key
aspect of the modifications introduced by GR. In particular, note that this equation can
be written as the equation for an ellipse but with an angular period which is not 2π:

u = 1 + e cos[(1 − α)ϕ] , (6.61)

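where we have defined
$$ \alpha = \frac{3G^2M^2}{L^2} . \tag{6.62} $$
The equivalence of (6.60) and (6.61) can be checked by expanding cos[(1 − α)ϕ] in a
Taylor series for small α:

$$ \begin{aligned} \cos[(1-\alpha)\phi] &= \cos\phi + \alpha \left. \frac{d}{d\alpha}\cos[(1-\alpha)\phi] \right|_{\alpha=0} + O(\alpha^2) \\ &= \cos\phi + \alpha\,\phi\sin\phi + O(\alpha^2) . \end{aligned} $$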

Equation (6.61) implies that, during each orbit, the perihelion shifts by a small
angle. Indeed, from (6.61) we see that the periodicity of the solution is

$$ \Delta\phi = \frac{2\pi}{1-\alpha} \approx 2\pi(1+\alpha) . \tag{6.63} $$
Therefore, during each orbit the perihelion advances by an angle

$$ \delta\phi = 2\pi\alpha = \frac{6\pi G^2M^2}{L^2} . \tag{6.64} $$
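
The perturbative result can be compared with a direct numerical integration of (6.53), written as u″ = 1 − u + αu² with α = 3G²M²/L². The following is a minimal sketch, not part of the notes: it uses a hand-written fourth-order Runge-Kutta stepper, starts at a perihelion (u = 1 + e, u′ = 0) and finds the angle at which the next perihelion occurs. The function name, step size and sample values of α and e are illustrative assumptions, and a bound orbit is assumed.

```python
import math


def perihelion_period(alpha, e, h=1e-3, max_phi=100.0):
    """Angle between successive perihelia for u'' = 1 - u + alpha*u**2, i.e. (6.53).

    Starts at a perihelion (u maximal, u' = 0) and returns the angle phi at which
    u' next crosses zero from above. Assumes a bound orbit.
    """
    def rhs(u, w):
        # first-order system: u' = w, w' = 1 - u + alpha*u^2
        return w, 1.0 - u + alpha * u * u

    phi, u, w = 0.0, 1.0 + e, 0.0
    while phi < max_phi:
        # one classical fourth-order Runge-Kutta step
        k1u, k1w = rhs(u, w)
        k2u, k2w = rhs(u + 0.5 * h * k1u, w + 0.5 * h * k1w)
        k3u, k3w = rhs(u + 0.5 * h * k2u, w + 0.5 * h * k2w)
        k4u, k4w = rhs(u + h * k3u, w + h * k3w)
        u_new = u + h * (k1u + 2.0 * k2u + 2.0 * k3u + k4u) / 6.0
        w_new = w + h * (k1w + 2.0 * k2w + 2.0 * k3w + k4w) / 6.0
        if phi > math.pi and w > 0.0 and w_new <= 0.0:
            # linear interpolation of the zero of u' inside this step
            return phi + h * w / (w - w_new)
        phi, u, w = phi + h, u_new, w_new
    raise RuntimeError("no second perihelion found; the orbit may be unbound")


if __name__ == "__main__":
    alpha, e = 1.0e-3, 0.2
    shift = perihelion_period(alpha, e) - 2.0 * math.pi
    print(f"numerical shift per orbit : {shift:.6e} rad")
    print(f"first-order prediction    : {2.0 * math.pi * alpha:.6e} rad")
```

For small α the two numbers should agree up to corrections of relative order α, which is the accuracy expected of a first-order calculation.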

We can convert L to more familiar quantities using the expressions valid for Newtonian
orbits, since the quantity we are considering is a small perturbation. The equation for an
ellipse can be written as
$$ r = \frac{(1 - e^2)\,a}{1 + e\cos\phi} , \tag{6.65} $$
where a is the semi-major axis. Comparing to the zeroth-order solution (6.56) we see
that
$$ L^2 \approx GM(1 - e^2)\,a . $$
Plugging this into (6.64) and restoring explicit factors of the speed of light, we get

$$ \delta\phi = \frac{6\pi GM}{c^2(1 - e^2)\,a} . \tag{6.66} $$

Historically, the precession of Mercury was the first test of GR. In fact, Mercury’s
precession had been known since the mid-XIXth century and it was in contradiction
with the predictions of Newtonian gravity. Einstein knew of this discrepancy, and one
of the first things that he did after formulating his theory of gravity was to calculate
the modifications to Mercury’s orbit introduced by GR. To his delight, he found out
that Mercury’s precession was accounted for in GR.⁶ In the case of Mercury, the shift is
δϕ = 43.0′′/century, where ′′ stands for arcseconds.
Remark 1. The effect is largest for Mercury because of its high eccentricity and short
period, which result in a large shift.
Remark 2. For Venus one has a predicted shift of 8.6′′ and an observed one of 8.4′′ ± 4.8′′.
For the Earth one has 3.8′′ and 5.0′′ ± 1.2′′. For the asteroid Icarus, 10.3′′ and 9.8′′ ± 0.8′′.
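
The numbers quoted above can be reproduced approximately from (6.66). The sketch below is illustrative only: the solar gravitational parameter and the orbital elements are approximate values assumed here, not taken from the notes, and the result is converted to arcseconds per Julian century.

```python
import math

GM_SUN = 1.32712440018e20      # m^3 s^-2, gravitational parameter of the Sun (assumed value)
C = 299_792_458.0              # m/s, speed of light
ARCSEC_PER_RAD = 180.0 * 3600.0 / math.pi


def precession_per_orbit(a, e, GM=GM_SUN, c=C):
    """Perihelion advance per orbit in radians, equation (6.66)."""
    return 6.0 * math.pi * GM / (c**2 * (1.0 - e**2) * a)


def precession_per_century(a, e, period_days):
    """Perihelion advance in arcseconds per Julian century (36525 days)."""
    orbits = 36525.0 / period_days
    return precession_per_orbit(a, e) * orbits * ARCSEC_PER_RAD


if __name__ == "__main__":
    # Approximate orbital elements (assumed): semi-major axis [m], eccentricity, period [days]
    planets = {
        "Mercury": (5.7909e10, 0.2056, 87.969),
        "Venus": (1.0821e11, 0.0068, 224.701),
        "Earth": (1.49598e11, 0.0167, 365.256),
    }
    for name, (a, e, period) in planets.items():
        print(f"{name:8s} {precession_per_century(a, e, period):5.1f} arcsec/century")
```

With these inputs the Mercury value comes out close to 43 arcseconds per century, and the Venus and Earth values land near the predictions quoted in Remark 2.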

6.5.2 Bending of light


In this subsection we consider the deflection of light rays that travel near
a massive body, e.g., the Sun. Therefore, we consider null geodesics. We
proceed as in the previous subsection by re-writing the equation of motion (6.42) (with
ϵ = 0 in V(r)) as an equation for $dr/d\phi$. Defining a new variable u = 1/r, we get

$$ \frac{d^2u}{d\phi^2} + u = \frac{3GM}{c^2}\, u^2 . \tag{6.67} $$

As before, the term on the right hand side is the small correction introduced by GR;
that is, δ ≡ GM/c² is small compared to r = 1/u. We expand u into a Newtonian solution plus
a small correction
$$ u = u_0 + u_1 $$
so that the zeroth-order part of (6.67) is

$$ \frac{d^2u_0}{d\phi^2} + u_0 = 0 , \tag{6.68} $$

and the first-order part is

$$ \frac{d^2u_1}{d\phi^2} + u_1 = 3\,\delta\, u_0^2 . \tag{6.69} $$

⁶ Note that Mercury, being the closest planet to the Sun, is where the GR effects are largest.

Figure 6.2: Deflection of light in a gravitational field.

The solution to the zeroth-order equation (6.68) is

$$ u_0 = \frac{1}{b}\sin\phi \quad\Rightarrow\quad r\sin\phi = b , \tag{6.70} $$

where b is a constant known as the impact parameter. This solution represents a straight
line corresponding to a photon sent from r = ∞ (for ϕ = 0) and that returns to r = ∞
(ϕ = π) and whose distance of closest approach to the source is given by b. Therefore,
the change in the angle ϕ along the trajectory is ∆ϕ = π.
Plugging the zeroth-order solution (6.70) into the first-order equation (6.69), we get

$$ \frac{d^2u_1}{d\phi^2} + u_1 = \frac{3\delta}{b^2}\sin^2\phi . \tag{6.71} $$

This equation has the solution

$$ u_1 = \frac{\delta}{b^2}\left(1 + C\cos\phi + \cos^2\phi\right) , \tag{6.72} $$
where C is an arbitrary integration constant. Therefore, the full solution up to first-order
corrections is
$$ u = \frac{1}{b}\sin\phi + \frac{\delta}{b^2}\left(1 + C\cos\phi + \cos^2\phi\right) . \tag{6.73} $$
We see from this solution that the effect of the GR corrections (the terms proportional to δ
in this expression) is to make light deflect from a straight line. We are interested in determining
the deflection angle δϕ for a light ray in the presence of a spherically symmetric source,
e.g., the Sun. Far away from the source, r → ∞ and hence u → 0, which requires the
right hand side of (6.73) to vanish. Without loss of generality, let us take values of ϕ such
that for r → ∞ this angle asymptotes to −ε1 and π + ε2 respectively, see Fig. 6.2.
Expanding (6.73) for small ε1 and ε2, we find

$$ -\frac{\varepsilon_1}{b} + \frac{\delta}{b^2}(2 + C) = 0 , \qquad -\frac{\varepsilon_2}{b} + \frac{\delta}{b^2}(2 - C) = 0 . \tag{6.74} $$

Adding these two equations we find the total deflection angle:

$$ \delta\phi = \varepsilon_1 + \varepsilon_2 = \frac{4\delta}{b} = \frac{4GM}{c^2 b} . \tag{6.75} $$
For a light ray just grazing the Sun this predicts a deflection of 1.75′′, which compares
well with some recent radio observations yielding δϕ = 1.73′′ ± 0.05′′.
Remark. This is the second famous test of General Relativity, more generally referred
to as the bending of light. A first measurement was carried out by Eddington and
collaborators in 1919.
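
As a quick numerical check of (6.75), the sketch below evaluates the deflection for a ray grazing the Sun. The solar gravitational parameter and the nominal solar radius used as the impact parameter are assumed values, not taken from the notes.

```python
import math

GM_SUN = 1.32712440018e20   # m^3 s^-2, gravitational parameter of the Sun (assumed value)
C = 299_792_458.0           # m/s, speed of light
R_SUN = 6.957e8             # m, nominal solar radius, used here as the impact parameter b


def deflection_angle(b, GM=GM_SUN, c=C):
    """Total deflection angle in radians from (6.75): 4 GM / (c^2 b)."""
    return 4.0 * GM / (c**2 * b)


if __name__ == "__main__":
    arcsec = math.degrees(deflection_angle(R_SUN)) * 3600.0
    print(f"deflection for a grazing ray: {arcsec:.2f} arcsec")
```

Since the angle scales as 1/b, a ray passing at twice the solar radius is deflected by half as much.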

6.5.3 Gravitational redshift


Consider an observer with four-velocity $U^a$ who is stationary in the Schwarzschild
coordinates, $U^i = 0$. We could allow the observer to be moving, but this would merely
superimpose a conventional Doppler shift on the gravitational redshift. Since the four-velocity of
any observer satisfies $U_a U^a = -1$, for a stationary observer in the Schwarzschild geometry
this implies
$$ U^0 = \left(1 - \frac{2GM}{r}\right)^{-1/2} . \tag{6.76} $$
Such an observer measures the frequency of a photon following a null geodesic $x^a(\lambda)$ to
be
$$ \omega = -g_{ab}\, U^a \dot x^b , $$
since $\dot x^a$ is the null vector tangent to the geodesic. This relation defines the normalisation
of the affine parameter λ. Therefore, we have
$$ \begin{aligned} \omega &= \left(1 - \frac{2GM}{r}\right)^{1/2} \dot t \\ &= \left(1 - \frac{2GM}{r}\right)^{-1/2} E , \end{aligned} \tag{6.77} $$

where E has been defined in (6.36). Since E is conserved along the geodesic, ω will
take different values when measured at different values of the radial coordinate r. For a
photon emitted at r1 and observed at r2, the measured frequencies are related by
$$ \frac{\omega_2}{\omega_1} = \left(\frac{1 - 2GM/r_1}{1 - 2GM/r_2}\right)^{1/2} . \tag{6.78} $$

This is an exact result for the frequency shift; in the limit r ≫ 2GM we have

$$ \begin{aligned} \frac{\omega_2}{\omega_1} &= 1 - \frac{GM}{r_1} + \frac{GM}{r_2} \\ &= 1 + \Phi_1 - \Phi_2 , \end{aligned} \tag{6.79} $$

where Φ = −GM/r is the Newtonian potential. This result shows that the frequency
decreases as Φ increases, which is what happens as the photon climbs out of a gravitational
field; hence the photon is redshifted.
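
To get a feeling for how good the weak-field approximation is, the exact ratio (6.78) can be compared with (6.79). The sketch below works in units with c = 1 and uses GM = 1 by default; the function names and sample radii are illustrative assumptions.

```python
import math


def frequency_ratio_exact(r1, r2, GM=1.0):
    """omega_2 / omega_1 from (6.78); r2 may be math.inf for a distant observer."""
    return math.sqrt((1.0 - 2.0 * GM / r1) / (1.0 - 2.0 * GM / r2))


def frequency_ratio_weak_field(r1, r2, GM=1.0):
    """Weak-field approximation (6.79): 1 + Phi_1 - Phi_2 with Phi = -GM/r."""
    phi1, phi2 = -GM / r1, -GM / r2
    return 1.0 + phi1 - phi2


if __name__ == "__main__":
    r2 = math.inf   # observer far away
    for r1 in (3.0, 10.0, 100.0, 1.0e4):
        exact = frequency_ratio_exact(r1, r2)
        approx = frequency_ratio_weak_field(r1, r2)
        print(f"r1 = {r1:8.1f} GM   exact = {exact:.8f}   weak-field = {approx:.8f}")
```

The two expressions agree well for r1 ≫ 2GM and drift apart as the emitter approaches r = 2GM, where the exact ratio goes to zero.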

Remark. It used to be thought that gravitational redshift also constituted a test of General
Relativity, but it turns out that any other theory compatible with the Equivalence
Principle will predict a redshift.
The gravitational redshift was first detected in 1960 by Pound and Rebka, using
gamma rays travelling upward a distance of 22 meters (the height of the physics building
at Harvard). Subsequent tests have become increasingly precise, often using aircraft
and atomic clocks. In all cases, agreement with Einstein’s theory has been found.
These are the three classic tests of GR proposed by Einstein. Since then, other tests
of GR have been proposed, including the binary pulsar (to be discussed in the context of
gravitational waves) and the gravitational time delay discovered and observed by Shapiro.

6.6 Black holes


In this section we will study objects that are described by the Schwarzschild metric for
all radii, even r < 2GM . As we will see, this spacetime describes a black hole.
One way to understand the geometry of a spacetime is to determine its causal structure,
and for that we consider the light cones. It is sufficient to consider radial null
geodesics, i.e., light rays moving in the radial direction. For such null geodesics, θ and ϕ
are constant and we are left with
$$ 0 = ds^2 = -\left(1 - \frac{2GM}{r}\right) dt^2 + \left(1 - \frac{2GM}{r}\right)^{-1} dr^2 , \tag{6.80} $$
from which it follows that
$$ \frac{dt}{dr} = \pm\left(1 - \frac{2GM}{r}\right)^{-1} . \tag{6.81} $$
This expression corresponds to the slopes of the light cones in the (t, r) plane. For large
r, the slopes are ±1, just as in flat space. However, as we approach r = 2GM, we have
$dt/dr \to \pm\infty$, which implies that, in these coordinates, the light cones close up. Therefore,
a light ray that approaches r = 2GM never seems to get there. As we will see, this
apparent inability to get to r = 2GM is just an illusion caused by the fact that there is
a coordinate singularity at r = 2GM; a light ray (or a massive particle) has no trouble
reaching this radius and continuing to smaller radii. However, a far away observer would
not be able to tell. In other words, if an observer falls towards smaller radii and sends
signals as she progresses, external observers far away would receive the signals more and
more slowly, see Fig. 6.3.
The fact that an observer at r > 2GM never sees the infalling observer reach r = 2GM
is a physical statement. However, to determine whether the infalling observer can reach
r = 2GM (and beyond) in finite proper time, we need to change to coordinates that are
well-behaved at r = 2GM. The problem with the current coordinates is that $dt/dr \to \infty$
along radial null geodesics that approach r = 2GM, so progress in the r direction becomes
slower and slower compared to the time coordinate. We can fix this problem as follows.
We can integrate the equation (6.81) characterising radial null geodesics to find
$$ t = \pm r_* + \text{constant} , \tag{6.82} $$
where r∗ is the so-called tortoise coordinate, given by⁷
$$ dr_* = \frac{dr}{1 - \frac{2GM}{r}} \quad\Rightarrow\quad r_* = r + 2GM \ln\left(\frac{r}{2GM} - 1\right) . \tag{6.83} $$

⁷ The tortoise coordinate r∗ is only well-defined for r > 2GM.

Figure 6.3: An observer falling into a black hole sends signals at intervals of proper time
∆τ1 . Another observer at a radius r > 2GM receives the signals at successively longer
intervals ∆τ2′ > ∆τ2 .

In terms of the tortoise coordinate the Schwarzschild metric becomes

$$ ds^2 = \left(1 - \frac{2GM}{r}\right)\left(-dt^2 + dr_*^2\right) + r^2\left(d\theta^2 + \sin^2\theta\, d\phi^2\right) , \tag{6.84} $$

where r should be regarded as a function of r∗. Notice that in these coordinates, the
light cones no longer close up but the metric is still singular at r = 2GM. In fact, in
these coordinates, the surface r = 2GM has been pushed to r∗ = −∞.
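
The statement that r = 2GM is pushed to r∗ = −∞ can be seen numerically. The following sketch evaluates the closed form in (6.83), prints r∗ for radii approaching 2GM, and checks the differential relation in (6.83) with a centred finite difference; the function name, step size and sample radii are illustrative assumptions.

```python
import math


def tortoise(r, GM=1.0):
    """Tortoise coordinate r_* from (6.83), defined for r > 2GM."""
    if r <= 2.0 * GM:
        raise ValueError("r_* is only defined for r > 2GM")
    return r + 2.0 * GM * math.log(r / (2.0 * GM) - 1.0)


if __name__ == "__main__":
    GM = 1.0
    # r_* decreases without bound (logarithmically) as r approaches 2GM from above
    for k in (1, 3, 6, 9, 12):
        r = 2.0 * GM + 10.0 ** (-k)
        print(f"r - 2GM = 1e-{k:<2d}   r_* = {tortoise(r, GM):9.3f}")

    # finite-difference check of dr_*/dr = 1 / (1 - 2GM/r) at r = 3GM
    r, h = 3.0, 1.0e-6
    numeric = (tortoise(r + h) - tortoise(r - h)) / (2.0 * h)
    print(f"dr_*/dr at r = 3GM: numeric = {numeric:.6f}, exact = {1.0 / (1.0 - 2.0 * GM / r):.6f}")
```

The divergence is only logarithmic, so r∗ becomes large and negative rather slowly as r approaches 2GM.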
To proceed, we define coordinates that are adapted to (radial) null geodesics. Defining

$$ v = t + r_* , \tag{6.85} $$
$$ u = t - r_* , \tag{6.86} $$

infalling radial null geodesics are characterised by v = constant, while outgoing null
geodesics are given by u = constant. Considering the Schwarzschild metric in the original
coordinates (6.31) and changing coordinates as

$$ t = v - r_* \quad\Rightarrow\quad dt = dv - \frac{dr}{1 - \frac{2GM}{r}} , $$

we get
$$ ds^2 = -\left(1 - \frac{2GM}{r}\right) dv^2 + 2\, dv\, dr + r^2\left(d\theta^2 + \sin^2\theta\, d\phi^2\right) . \tag{6.87} $$
r
These coordinates are known as ingoing Eddington-Finkelstein coordinates and, in these
coordinates, the metric (6.87) is manifestly regular (and invertible) at r = 2GM. Therefore
we conclude that the apparent singularity in (6.31) at r = 2GM is a mere coordinate
singularity.

Figure 6.4: Light cones in the Schwarzschild geometry in ingoing Eddington-Finkelstein
coordinates (v, r). For r > 2GM, outgoing null rays point towards larger values of r.
On the other hand, for r < 2GM, both ingoing and outgoing light rays point towards
smaller values of r.

In the ingoing Eddington-Finkelstein coordinates (6.87), radial null geodesics
are given by
$$ \text{ingoing:}\quad \frac{dv}{dr} = 0 , \qquad \text{outgoing:}\quad \frac{dv}{dr} = \frac{2}{1 - \frac{2GM}{r}} . \tag{6.88} $$
In these coordinates, light cones are well-behaved at r = 2GM, and this surface is at a
finite coordinate value. In fact, both ingoing null and timelike geodesics can go past this
surface. Note that for null geodesics this is straightforward to see: along an infalling null
geodesic v = constant, the radial coordinate varies from, say, r = ∞ to r = 0. However,
light cones tilt over at r = 2GM since $dv/dr$ for the outgoing null geodesics changes sign
there; inside this surface, all future-directed paths go towards smaller values of r, see
Fig. 6.4. Some null cones and radial null geodesics are indicated in Fig. 6.5. Surfaces of
t = constant are also shown; one sees that t becomes infinite on the surface r = 2GM.
The surface r = 2GM is a point of no return: once a test particle dips below it, it
can never escape. A surface past which particles can never escape to infinity is defined as
the event horizon of the black hole. In the Schwarzschild spacetime, the event horizon
is located at r = 2GM. The event horizon is a null surface, so it is really the causal
structure of the spacetime that makes it impossible to cross the horizon in an
outward-going direction. Since nothing can escape from inside the event horizon, the region it
encloses is called a black hole.
A black hole is simply a region of spacetime separated from infinity by an event horizon.
The notion of an event horizon is a global one; the location of the horizon is a statement
about the spacetime as a whole and cannot be determined by the local geometry.
Consider a general static (and spherically symmetric) spacetime of the form

$$ ds^2 = -f(r)\, dt^2 + \frac{dr^2}{f(r)} + r^2\left(d\theta^2 + \sin^2\theta\, d\phi^2\right) , \tag{6.89} $$

where f(r) vanishes at some $r = r_+$ with $r_+ > 0$, so $f(r_+) = 0$.⁸ In this case one can
show that $r = r_+$ is just a coordinate singularity of (6.89) corresponding to the location
of a horizon. We can find regular coordinates there by considering the transformation to
general ingoing Eddington-Finkelstein coordinates (v, r) as
$$ dt = dv - \frac{dr}{f(r)} . \tag{6.90} $$
Then (6.89) becomes
$$ ds^2 = -f(r)\, dv^2 + 2\, dv\, dr + r^2\left(d\theta^2 + \sin^2\theta\, d\phi^2\right) . \tag{6.91} $$
This line element is regular and invertible at $r = r_+$.

⁸ For simplicity we further assume that $f'(r_+) \neq 0$.

Figure 6.5: Finkelstein diagram in the (v, r) coordinates. Lines at 45° are lines of constant
v. The surface r = 2GM is a null surface on which t = ∞.
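
The step from (6.89) to (6.91) is a one-line substitution. As a worked sketch, writing dΩ² for dθ² + sin²θ dϕ² (a shorthand introduced here only for brevity) and inserting (6.90) gives:

```latex
\begin{align*}
ds^2 &= -f\left(dv - \frac{dr}{f}\right)^2 + \frac{dr^2}{f} + r^2\, d\Omega^2 \\
     &= -f\, dv^2 + 2\, dv\, dr - \frac{dr^2}{f} + \frac{dr^2}{f} + r^2\, d\Omega^2 \\
     &= -f\, dv^2 + 2\, dv\, dr + r^2\, d\Omega^2 .
\end{align*}
```

The singular dr²/f terms cancel, and the (v, r) block of the metric has determinant (−f)(0) − (1)² = −1, which does not vanish when f = 0; this is the sense in which the line element remains invertible at the horizon.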

Maximally extended Schwarzschild spacetime


In the previous section we have shown that there exist coordinates (v, r) adapted to the
infalling null geodesics such that we can go through the event horizon without encountering
any problems. In fact, a local observer crossing the horizon may not even notice
it since the local geometry at the horizon is no different from anywhere else. Therefore,
r = 2GM is a coordinate singularity in the original metric (6.31), and the region
r ≤ 2GM should be included in our spacetime since physical particles can reach this
region.
We will now see that we can extend the spacetime in other directions. In the (v, r)
coordinates, we can cross the horizon along future-directed curves, but not along
past-directed ones. This looks suspicious since we started off with a spacetime that is symmetric
under t ↔ −t, and so reversing time should be a symmetry. We can see that this is indeed
the case if we choose the coordinate u adapted to the outgoing null geodesics, (6.86), to
write down the metric:
$$ ds^2 = -\left(1 - \frac{2GM}{r}\right) du^2 - 2\, du\, dr + r^2\left(d\theta^2 + \sin^2\theta\, d\phi^2\right) . \tag{6.92} $$

In these coordinates, we can pass through the event horizon, but this time only along
past-directed null curves.
This may seem confusing: we can follow either future-directed or past-directed curves
through the event horizon at r = 2GM , but we arrive at different places. To see this, note
that from (6.85)–(6.86), if we keep v = constant and decrease r we must have t → +∞,
while if we keep u constant and decrease r we must have t → −∞ (since r∗ → −∞ as
r → 2GM ). Therefore, we have extended the original Schwarzschild spacetime (6.31) in
two different directions: one to the future and one to the past.
At this stage, one may suspect that the spacetime can be further extended. Indeed,
we can use (u, v) as our coordinates, which leads to
$$ ds^2 = -\left(1 - \frac{2GM}{r}\right) du\, dv + r^2\left(d\theta^2 + \sin^2\theta\, d\phi^2\right) , \tag{6.93} $$

where r is defined implicitly in terms of u and v as

$$ \tfrac{1}{2}(v - u) = r + 2GM \ln\left(\frac{r}{2GM} - 1\right) . \tag{6.94} $$
In these coordinates, however, r = 2GM is “infinitely far away” at either v → −∞
or u → +∞. To bring this surface to a finite coordinate value, we can define new
coordinates (U, V) as
$$ U = -e^{-u/(4GM)} , \qquad V = e^{v/(4GM)} , \tag{6.95} $$
which in terms of the original (t, r) coordinates are given by
$$ U = -\left(\frac{r}{2GM} - 1\right)^{1/2} e^{-(t-r)/(4GM)} , \qquad V = \left(\frac{r}{2GM} - 1\right)^{1/2} e^{(t+r)/(4GM)} . \tag{6.96} $$
In these coordinates, the Schwarzschild metric becomes

$$ ds^2 = -\frac{32\, G^3M^3}{r}\, e^{-r/(2GM)}\, dU\, dV + r^2\left(d\theta^2 + \sin^2\theta\, d\phi^2\right) , \tag{6.97} $$
where r is defined implicitly by
$$ UV = -\left(\frac{r}{2GM} - 1\right) e^{r/(2GM)} . \tag{6.98} $$
In the form of the metric (6.97), it is clear that nothing special happens at r = 2GM ,
and the metric is manifestly regular there.
The coordinates (U, V) are null coordinates in the sense that $\partial/\partial U$ and $\partial/\partial V$ are null
vectors. We may get further intuition about the spacetime by defining new coordinates such
that one is timelike and the other one is spacelike. Therefore, we define

$$ T = \tfrac{1}{2}(U + V) = \left(\frac{r}{2GM} - 1\right)^{1/2} e^{r/(4GM)} \sinh\left(\frac{t}{4GM}\right) , \tag{6.99} $$
$$ R = \tfrac{1}{2}(V - U) = \left(\frac{r}{2GM} - 1\right)^{1/2} e^{r/(4GM)} \cosh\left(\frac{t}{4GM}\right) , \tag{6.100} $$
in terms of which the metric becomes

$$ ds^2 = \frac{32\, G^3M^3}{r}\, e^{-r/(2GM)} \left(-dT^2 + dR^2\right) + r^2\left(d\theta^2 + \sin^2\theta\, d\phi^2\right) , \tag{6.101} $$

where r is implicitly defined by
$$ T^2 - R^2 = \left(1 - \frac{r}{2GM}\right) e^{r/(2GM)} . \tag{6.102} $$
The coordinates (T, R, θ, ϕ) are known as Kruskal-Szekeres coordinates or Kruskal
coordinates for short.
From (6.101), it is clear that in Kruskal coordinates, radial null geodesics look like
they do in flat space:
$$ T = \pm R + \text{constant} . \tag{6.103} $$
Furthermore, from (6.102) one can see that the event horizon r = 2GM corresponds to

$$ T = \pm R , \tag{6.104} $$

consistent with it being a null surface. More generally, from (6.102) we see that surfaces
of constant r appear as hyperbolae in the T-R plane,

$$ T^2 - R^2 = \text{constant} . \tag{6.105} $$

On the other hand, from (6.99)–(6.100) we see that surfaces of constant t are given by
$$ \frac{T}{R} = \tanh\left(\frac{t}{4GM}\right) , \tag{6.106} $$
which defines straight lines through the origin with slope $\tanh(t/4GM)$. Note that for
t → ±∞ (6.106) reduces to (6.104); therefore, t = ±∞ represents the same surface as
r = 2GM.
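
The relations between the Schwarzschild and Kruskal coordinates can be checked numerically in region I. The sketch below implements (6.99)–(6.100) and verifies (6.102) and (6.106) at a sample point; the function name and the sample values are illustrative assumptions, and the map is only used for r > 2GM.

```python
import math


def kruskal_from_schwarzschild(t, r, GM=1.0):
    """Map (t, r) with r > 2GM to Kruskal (T, R) using (6.99)-(6.100)."""
    if r <= 2.0 * GM:
        raise ValueError("(6.99)-(6.100) as written hold only for r > 2GM")
    amp = math.sqrt(r / (2.0 * GM) - 1.0) * math.exp(r / (4.0 * GM))
    T = amp * math.sinh(t / (4.0 * GM))
    R = amp * math.cosh(t / (4.0 * GM))
    return T, R


if __name__ == "__main__":
    GM, t, r = 1.0, 1.5, 3.0
    T, R = kruskal_from_schwarzschild(t, r, GM)
    # (6.102): T^2 - R^2 depends on r only
    print(T**2 - R**2, (1.0 - r / (2.0 * GM)) * math.exp(r / (2.0 * GM)))
    # (6.106): T/R depends on t only
    print(T / R, math.tanh(t / (4.0 * GM)))
```

Each printed pair should agree to rounding error, reflecting that curves of constant r are hyperbolae and curves of constant t are straight lines through the origin.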
The coordinates (T, R) should be allowed to range over every value they can take
without hitting the real singularity at r = 0. Therefore, the allowed range of these
coordinates is
$$ -\infty \le R \le \infty , \qquad T^2 < R^2 + 1 . \tag{6.107} $$
From (6.99) and (6.100) it may seem that T and R become imaginary for r < 2GM,
but this is just an illusion caused by the fact that the (t, r) coordinates are not valid
in this region. We can now draw a spacetime diagram in the T-R plane (suppressing
the angles on the two-sphere θ and ϕ), known as a Kruskal diagram, see Fig. 6.6.
Each point on this diagram is a two-sphere of radius r. This diagram represents the
maximal analytic extension of the Schwarzschild spacetime; the coordinates cover the
entire manifold described by this solution.
The original (t, r) coordinates were only valid for r > 2GM , which is only a portion
of the Kruskal diagram. It is convenient to divide the diagram into four regions, as
shown in Fig. 6.6. Region I corresponds to r > 2GM and it is the region covered by
the original (t, r) coordinates. By following future-directed null rays we reach region II,
and by following past-directed null rays we reach region III. If we had studied spacelike
geodesics we would have discovered region IV. The definitions (6.99) and (6.100) that
relate the (T, R) coordinates to the original (t, r) ones are only valid in region I; in the
other regions, one has to introduce the appropriate signs so that the coordinates remain
real.
Now we describe the physical significance of the various regions in the Kruskal diagram.
Region II is the black hole. Any causal curve (i.e., timelike or null) that travels
from region I into II cannot go back. In fact, every future-directed causal curve that
enters region II will reach the singularity r = 0 in finite proper time. Therefore, any
observer that falls into the black hole is doomed. That is, not only can an observer that
falls into the black hole not escape, but he or she also inevitably has to move towards
smaller r, since this is a timelike direction. Indeed, in the original (t, r) coordinates, we
see that for r < 2GM, t becomes spacelike and r becomes timelike. Thus, one cannot
stop moving towards the singularity, just as one cannot stop the flow of time. Since
timelike geodesics maximise the proper time, the observers that do not struggle
against the inevitable fate (i.e., free-falling observers) will live the longest, but they too
will inevitably hit the singularity. The approach to the singularity is not particularly
pleasant since the tidal forces become infinite. As the observer falls towards the singularity, the
feet and the head will be pulled apart from each other, and the torso will be squeezed to
infinitesimal size. In fact, all the atoms in the body of the unfortunate observer will
be torn apart by the infinite tidal forces. To summarise, ultimate death and destruction
is what awaits the observer that falls into a black hole.

Figure 6.6: Kruskal diagram of the Schwarzschild spacetime. Null geodesics are straight
lines at ±45°. Each point on this diagram is a round sphere of radius r. Regions I and
IV are asymptotically flat regions, while region II is a black hole and region III is a white
hole. r = 0 is a physical singularity and the manifold ends there.

Region III is simply the time-reverse of region II, a part of the spacetime from which
things can escape but to which they can never return. This region is known as the white
hole. There is a singularity in the past at r = 0, out of which the universe springs.
The boundary of region III is the past event horizon, while the boundary of region II
is the future event horizon. Region IV cannot be reached from region I by any causal
curve; likewise, no causal curve from region IV can reach region I. Region IV is another
asymptotically flat region, which is a mirror image of region I. It can be thought of as
being connected to region I by a wormhole (also known as an Einstein-Rosen bridge), a
neck-like configuration joining two asymptotically flat regions.

Figure 6.7: Penrose diagram for the Schwarzschild spacetime.


In order to better understand the causal structure of the spacetime, it is convenient
to collapse the whole Kruskal diagram into a finite region by constructing the conformal
diagram, also known as a Penrose diagram. Starting with the null version of the Kruskal
coordinates (U, V) defined in (6.96), one can bring infinity to finite coordinate values by
defining
$$ \tilde U = \arctan\left(\frac{U}{\sqrt{2GM}}\right) , \qquad \tilde V = \arctan\left(\frac{V}{\sqrt{2GM}}\right) , \tag{6.108} $$
with ranges
$$ -\frac{\pi}{2} < \tilde U < \frac{\pi}{2} , \qquad -\frac{\pi}{2} < \tilde V < \frac{\pi}{2} , \qquad -\frac{\pi}{2} < \tilde U + \tilde V < \frac{\pi}{2} . \tag{6.109} $$
In these coordinates, the metric in (6.97) becomes

$$ ds^2 = -\frac{64\, G^4M^4}{r}\, e^{-r/(2GM)}\, \frac{1}{\cos^2\tilde U \cos^2\tilde V}\, d\tilde U\, d\tilde V + r^2\left(d\theta^2 + \sin^2\theta\, d\phi^2\right) . \tag{6.110} $$

In these coordinates, the (Ũ , Ṽ ) part of the metric is conformally related to Minkowski
space. In the new coordinates, the singularities at r = 0 are straight lines that stretch
from timelike infinity in one region to timelike infinity in the other.
The conformal diagram of the maximally extended Schwarzschild solution is shown
in Fig. 6.7. The only real subtlety about this diagram is the necessity to understand
that i− and i+ are distinct from r = 0, since there are plenty of timelike curves that
do not hit the singularity. As in the Kruskal diagram, light cones in the conformal
diagram are straight lines at 45◦ ; the major difference is that the entire spacetime is now
represented in a finite region. Notice that the structure of conformal infinity is just like
that of Minkowski space, consistent with the fact that the Schwarzschild spacetime is
asymptotically flat.

6.7 More general black holes


In this section we will discuss the properties of more general black holes. Most astro-
physical objects such as stars, galaxies, etc., rotate and hence, if black holes form in
natural astrophysical processes one would expect that they also rotate. As we have seen
in the previous sections, the Schwarzschild solution describes a static and spherically
symmetric black hole and hence, from the astrophysical point of view, it is not generic. Whilst
the Schwarzschild solution was found weeks after Einstein published his theory of general
relativity, an analytic solution of the Einstein equations describing an equilibrium rotat-
ing black hole was not found until 1963 by Kerr. The techniques needed to derive this
spacetime are beyond the scope of this course; here we will simply give the Kerr metric:

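\[
\begin{aligned}
ds^2 = {} & -\left( 1 - \frac{2GMr}{\Sigma} \right) dt^2 - \frac{4GMa\, r \sin^2\theta}{\Sigma}\, dt\, d\phi + \frac{\Sigma}{\Delta}\, dr^2 \\
& + \Sigma\, d\theta^2 + \frac{\sin^2\theta}{\Sigma} \left[ (r^2 + a^2)^2 - a^2 \Delta \sin^2\theta \right] d\phi^2 ,
\end{aligned} \tag{6.111}
\]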
where ∆ = r2 − 2GM r + a2 and Σ = r2 + a2 cos2 θ. In (6.111) M corresponds to the
mass of the spacetime and a is the angular momentum per unit mass,

a = J/M .

The coordinates (t, r, θ, ϕ) are known as Boyer-Lindquist coordinates and it is straightforward
to see that, in these coordinates, the limit a → 0 reduces (6.111) to the Schwarzschild spacetime.
Note that if we keep a fixed and send M → 0 we recover flat space in the so-called
ellipsoidal coordinates. These coordinates are related to the usual Cartesian coordinates
by
\[
\begin{aligned}
x &= (r^2 + a^2)^{1/2} \sin\theta \cos\phi , \\
y &= (r^2 + a^2)^{1/2} \sin\theta \sin\phi , \\
z &= r \cos\theta .
\end{aligned} \tag{6.112}
\]
One can show that the event horizons of the Kerr metric occur when

∆ = r2 − 2GM r + a2 = 0 . (6.113)

There are three possibilities: GM > a, GM = a and GM < a. The last case corresponds
to a naked singularity, while GM = a is the extremal case, which is believed to be
unstable. Both of these cases are of less physical interest and hence we will concentrate
on the GM > a case. Then, the function ∆ vanishes at two values of the radial coordinate
r, giving
\[
r_\pm = GM \pm \sqrt{(GM)^2 - a^2} . \tag{6.114}
\]
Both radii are null surfaces corresponding to the inner (r = r− ) and outer (r = r+ ) event
horizons. Furthermore, one can show that the Kerr spacetime (6.111) has a physical
curvature singularity at Σ = r2 + a2 cos2 θ = 0. Since this is the sum of two manifestly
non-negative quantities, it can only vanish when both quantities are zero:
\[
r = 0 , \qquad \theta = \frac{\pi}{2} .
\]
In these coordinates, r = 0 is not a point but a disc, see (6.112). Therefore, the set of
points r = 0, θ = π/2 is the ring at the edge of this disc. The Penrose diagram for the
subextremal (i.e., GM > a) Kerr black hole is shown in Fig. 6.8.
The Kerr metric (6.111) is manifestly independent of the coordinates t and ϕ, and
hence it possesses two Killing vector fields, namely K = ∂t and R = ∂ϕ . Note however that
K a is not orthogonal to the t = constant hypersurfaces. This metric is stationary but not
static, which makes perfect sense: the black hole is rotating, so it is not static, but it is
spinning in the same way at all times, so it is stationary. Alternatively, the metric cannot
be static because it is not invariant under time reversal; to leave the metric invariant
when going back in time t → −t, one also has to reverse the direction of the rotation
ϕ → −ϕ.

Figure 6.8: Penrose diagram of the subextremal Kerr black hole.

Because the Kerr metric is stationary but not static, the event horizons r± are not
Killing horizons of the asymptotic time-translation Killing vector K = ∂t . The norm of
K a is given by
\[
K^a K_a = -\frac{1}{\Sigma} \left( \Delta - a^2 \sin^2\theta \right) . \tag{6.115}
\]
Note that at the outer horizon, r = r+ (where ∆ = 0), we have K a Ka ≥ 0 so K a is
already spacelike there, except at the poles (θ = 0, π), where it is null. The points where
K^a K_a = 0 form a stationary limit surface known as the ergosphere, which lies outside the outer
event horizon. The region between the ergosphere and the outer event horizon is known
as the ergoregion. Inside the ergoregion, observers must move in the direction of rotation of
the black hole (ϕ direction). This effect is known as frame dragging. However, observers
can still move towards or away from the event horizon, and can exit the ergoregion.
The closest analogue of a family of static observers outside the black hole in the Kerr
geometry is provided by the “locally non-rotating observers” (see Coursework 11), whose 4-velocity
is given by $u^a = -\nabla^a t / (-\nabla^b t\, \nabla_b t)^{1/2}$. These observers rotate with coordinate angular
velocity
\[
\Omega = \frac{d\phi}{dt} = -\frac{g_{t\phi}}{g_{\phi\phi}} = \frac{a\, (r^2 + a^2 - \Delta)}{(r^2 + a^2)^2 - \Delta\, a^2 \sin^2\theta} . \tag{6.116}
\]

In the limit as one approaches the black hole outer event horizon, r → r+ , this coordinate
angular velocity becomes
\[
\Omega_H = \frac{a}{r_+^2 + a^2} . \tag{6.117}
\]

This is related to the fact that it is the Killing vector field

\[
\chi = \frac{\partial}{\partial t} + \Omega_H \frac{\partial}{\partial \phi} \tag{6.118}
\]

(rather than ∂t ) which is tangent to the null geodesic generators of the horizon of the
Kerr black hole. Equation (6.118) can be interpreted as saying that the event horizon of
the Kerr black hole rotates with angular velocity ΩH .
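
As a numerical illustration of (6.113)–(6.117), the following Python sketch evaluates the horizon radii, the norm of K, the angular velocity Ω and the location of the stationary limit surface in geometric units (c = 1). The function names and the sample values GM = 1, a = 0.6 are illustrative choices and not part of the notes; the stationary limit radius is obtained here by solving K^a K_a = 0 in (6.115), i.e. ∆ = a² sin²θ, for its outer root.

```python
import math

def delta(r, GM, a):
    """Delta = r^2 - 2 G M r + a^2, as defined below eq. (6.111)."""
    return r**2 - 2.0 * GM * r + a**2

def sigma(r, theta, a):
    """Sigma = r^2 + a^2 cos^2(theta), as defined below eq. (6.111)."""
    return r**2 + a**2 * math.cos(theta)**2

def horizons(GM, a):
    """Outer and inner horizon radii r_+ and r_-, eq. (6.114)."""
    if not 0.0 <= a < GM:
        raise ValueError("this sketch only covers the subextremal case 0 <= a < GM")
    root = math.sqrt(GM**2 - a**2)
    return GM + root, GM - root

def norm_K(r, theta, GM, a):
    """Norm K^a K_a of the Killing vector K = d/dt, eq. (6.115)."""
    return -(delta(r, GM, a) - a**2 * math.sin(theta)**2) / sigma(r, theta, a)

def stationary_limit_radius(theta, GM, a):
    """Outer root of K^a K_a = 0, i.e. of r^2 - 2 G M r + a^2 cos^2(theta) = 0."""
    return GM + math.sqrt(GM**2 - a**2 * math.cos(theta)**2)

def angular_velocity(r, theta, GM, a):
    """Coordinate angular velocity of locally non-rotating observers, eq. (6.116)."""
    d = delta(r, GM, a)
    return a * (r**2 + a**2 - d) / ((r**2 + a**2)**2 - d * a**2 * math.sin(theta)**2)

def horizon_angular_velocity(GM, a):
    """Omega_H = a / (r_+^2 + a^2), eq. (6.117)."""
    r_plus, _ = horizons(GM, a)
    return a / (r_plus**2 + a**2)

if __name__ == "__main__":
    GM, a = 1.0, 0.6  # illustrative values only
    r_plus, r_minus = horizons(GM, a)
    print("r_+ =", r_plus, " r_- =", r_minus)
    print("Omega_H =", horizon_angular_velocity(GM, a))
    print("stationary limit radius at the equator =",
          stationary_limit_radius(math.pi / 2, GM, a))
    print("K^a K_a on the outer horizon at the equator =",
          norm_K(r_plus, math.pi / 2, GM, a))
```

With these sample values the stationary limit surface touches the outer horizon at the poles and lies at r = 2GM on the equator, so the ergoregion is widest in the equatorial plane.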

Killing horizons

In the Schwarzschild metric, the Killing vector K = ∂t goes from being timelike to null at
the event horizon. In general, if a Killing vector field χa is null along some null hypersurface
Σ, we say that Σ is a Killing horizon of χa . Note that the vector field χa will be normal
to Σ, since a null surface cannot have two linearly independent null tangent vectors.
The notion of a Killing horizon is independent from that of an event horizon, but in
spacetimes with time-translation symmetry the two are closely related. Under certain
reasonable conditions, we have the following classification:

• Every event horizon H in a stationary, asymptotically flat spacetime is a Killing
horizon for some Killing vector field χ^a.

• If the spacetime is static, χ^a = K^a = (∂t)^a, the time-translation Killing vector
field at infinity.

• If the spacetime is stationary but not static, then it must be axisymmetric with
a rotational Killing vector field R^a = (∂ϕ)^a, and χ^a will be a linear combination
K^a + ΩH R^a for some constant ΩH.

To every Killing horizon we can associate a quantity called the surface gravity. Con-
sider a Killing vector χa with Killing horizon Σ. Because χa is a normal vector to Σ,
along the Killing horizon it obeys the geodesic equation,

\[
\chi^a \nabla_a \chi^b = \kappa\, \chi^b , \tag{6.119}
\]

where the right-hand side arises because the integral curves of χa may not be affinely
parametrized. The parameter κ is the surface gravity and it will be constant on the
horizon.

Uniqueness theorems and astrophysical relevance of the Kerr solution
Black holes can only be astrophysically relevant to describe certain compact objects in
the Universe if they are stable. It has been shown that there are no unstable linear
perturbations around the Kerr black hole. Furthermore, it is believed that the Kerr
solution is dynamically stable at the full non-linear level and the evidence from the
numerical simulations supports this expectation. This dynamical stability is a necessary
condition for the Kerr black hole to be astrophysically relevant. In fact, the celebrated
Uniqueness Theorem (Robinson,...) states that:

Stationary, asymptotically flat black hole solutions to the Einstein vacuum
equations are uniquely characterised by their mass and angular momentum
and they are given by the Kerr family of solutions.

This theorem is also known as the no-hair theorem because it states that black holes
are fully characterised by a small number of parameters, rather than potentially an infinite
number of them as in the case of, for instance, a star. This is a very profound result
since black holes are macroscopic objects and yet the theory of general relativity provides
a complete mathematical description of them in terms of just a handful of parameters.
Therefore, in some sense black holes are like elementary particles. Moreover, according
to this result, in general relativity all black holes in the Universe should be described by
the Kerr solution.

Singularity theorems and cosmic censorship


Event horizons and black holes are important in general relativity because they are
thought to be nearly inevitable. This conclusion follows from the celebrated singularity
theorems and the cosmic censorship conjecture.
The ubiquity of singularities in general relativity is a consequence of the singularity
theorems of Penrose (and Hawking and Penrose in the cosmological context).9 Before
these theorems were proven in the 60s, it was possible to hope that gravitational col-
lapse to the Schwarzschild singularity was an accident of spherical symmetry and general
spacetimes would be non-singular. The singularity theorems ensure that, under very gen-
eral conditions, once gravitational collapse has reached a certain point, the formation of
a singularity is inevitable. Therefore, according to general relativity, singularities should
occur in Nature.
Singularities, however, can be problematic because general relativity breaks down
near them and hence we cannot describe them within the theory itself. Therefore, the
existence of singularities signals the incompleteness of the theory. The hope is that a
would-be theory of quantum gravity resolves the singularities that appear in general
relativity. It has been conjectured that even if singularities form generically according to
the singularity theorems, they are always hidden behind event horizons. This is essentially
the content of the cosmic censorship conjecture:

Naked singularities cannot form in gravitational collapse from generic, initially
non-singular states in an asymptotically flat spacetime with reasonable
classical matter.

If this conjecture is true, then our ignorance regarding the physical description of
singularities is irrelevant as far as the physics of the Universe outside black holes is
concerned. Even though a general mathematical proof of the cosmic censorship conjecture
is not available yet, there is very strong evidence that, in four spacetime dimensions (as in
the presently observed Universe) this conjecture is true. However, in dimensions higher
than four the cosmic censorship conjecture is false.

9 Thanks to the singularity theorems, Penrose was awarded the 2020 Nobel Prize in Physics.

One consequence of the cosmic censorship conjecture is that classical black holes never
shrink, they only grow bigger. Since the size of a black hole is measured by the area of
the event horizon, we have Hawking’s famous area theorem:
Assuming the presence of reasonable classical matter10 and cosmic censorship,
the area of the future event horizon of a black hole in an asymptotically flat
spacetime is non-decreasing.

Black hole thermodynamics


It was observed that, as a consequence of the Einstein equations, perturbing a black hole
results in a change of its physical parameters given by
\[
\delta M = \frac{\kappa}{8\pi G}\, \delta A + \Omega_H\, \delta J , \tag{6.120}
\]
where A denotes the area of the event horizon. This equation, known as the first law of
black hole mechanics, looks surprisingly similar to the first law of thermodynamics,
dE = T dS − p dV (6.121)
where E is the energy, T the temperature, S the entropy, p is the pressure and V is the
volume. Together with Hawking’s area theorem, this suggested that there might exist an
analogy between black holes and thermodynamics:
E↔M
S ↔ A/(4G) (6.122)
T ↔ κ/(2π)
and the ΩH δJ term is the work that one does on the black hole when we throw matter into
it. The second law of thermodynamics, which states that the entropy never decreases, is
completely analogous to Hawking’s area theorem with the above identification between
the area of the event horizon and the entropy. At the classical level however, black holes
do not have a temperature and their entropy would appear to be rather small since they
are fully characterised in terms of a few parameters.
In equating T dS to (κ/(8πG)) dA we cheated a bit since we do not know how to separately
normalise S/A or T/κ, only their combination. However, Hawking famously showed
that when one considers quantum fields in the background of a black hole, the black
hole evaporates emitting radiation at a temperature of T = κ/(2π). This result shows
that black holes do have a real temperature of T = κ/(2π), and a real entropy given
by S = A/(4G); in astrophysical units, the entropy of a black hole is $S \sim 10^{77}\, (M/M_\odot)^2$,
which is huge.11 Restoring the units, these expressions become12
\[
T = \frac{\hbar\, \kappa}{2\pi\, k_B} , \qquad S = \frac{k_B\, c^3}{4 \hbar G}\, A , \tag{6.123}
\]
where kB is Boltzmann’s constant. These expressions contain the fundamental constants
of Nature, such as kB , c, ℏ and G, and they show that black holes not only link the
fundamental laws of Nature, but they also provide a window into quantum gravity. In
other words, any theory of quantum gravity must be able to reproduce this formula for
the entropy of a black hole.

10 By reasonable matter we mean matter that satisfies the so-called weak energy condition: ρ ≥ 0 and
ρ + p ≥ 0, where ρ is the energy density and p is any of the pressures.
11 For matter fields in the Universe, the entropy is approximately equal to the number of relativistic
particles; within the observable universe this number is $S_{\rm matter} \sim 10^{88}$.
12 You can see this formula for the temperature on Stephen Hawking’s tombstone at Westminster Abbey,
where his ashes lie between the graves of Newton and Darwin.
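
To get a feeling for the magnitudes in (6.123), the following Python sketch evaluates the temperature and entropy of a Schwarzschild black hole in SI units. It assumes the standard Schwarzschild value of the surface gravity, κ = c³/(4GM) (in units of inverse time, so that T = ℏκ/(2π kB) as in (6.123)), and the horizon area A = 4π (2GM/c²)²; neither is derived in this section. The numerical constants are rounded reference values and the function names are illustrative.

```python
import math

# Rounded SI values of the constants entering eq. (6.123); assumptions of this sketch.
G = 6.67430e-11         # Newton's constant, m^3 kg^-1 s^-2
C = 2.99792458e8        # speed of light, m s^-1
HBAR = 1.054571817e-34  # reduced Planck constant, J s
K_B = 1.380649e-23      # Boltzmann constant, J K^-1
M_SUN = 1.98847e30      # approximate solar mass, kg

def surface_gravity(mass):
    """Schwarzschild surface gravity kappa = c^3 / (4 G M), in s^-1 (assumed, not derived here)."""
    return C**3 / (4.0 * G * mass)

def hawking_temperature(mass):
    """T = hbar kappa / (2 pi k_B), eq. (6.123), in kelvin."""
    return HBAR * surface_gravity(mass) / (2.0 * math.pi * K_B)

def horizon_area(mass):
    """Area of the Schwarzschild event horizon, A = 4 pi (2 G M / c^2)^2, in m^2."""
    r_s = 2.0 * G * mass / C**2
    return 4.0 * math.pi * r_s**2

def entropy_in_units_of_kB(mass):
    """S / k_B = c^3 A / (4 hbar G), eq. (6.123), a pure number."""
    return C**3 * horizon_area(mass) / (4.0 * HBAR * G)

if __name__ == "__main__":
    print("T for one solar mass [K]:", hawking_temperature(M_SUN))
    print("S / k_B for one solar mass:", entropy_in_units_of_kB(M_SUN))
    print("S / k_B for 10^6 solar masses:", entropy_in_units_of_kB(1.0e6 * M_SUN))
```

Since T scales as 1/M and S as M², heavier black holes are colder and carry more entropy; under the assumptions above, a black hole of about a million solar masses already has an entropy comparable to or larger than the matter entropy quoted in footnote 11.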
Given that black holes evaporate, their area decreases with time, which would violate
the 2nd Law of Thermodynamics. To address this issue, Bekenstein proposed a gener-
alised 2nd Law, which states that the combined entropy of matter and black holes never
decreases:
\[
\delta \left( S_{\rm matter} + \frac{A}{4G} \right) \geq 0 . \tag{6.124}
\]
We have seen that the entropy of a black hole is huge, but from the point of view of statistical
mechanics the entropy measures the number of accessible microscopic states compatible
with a macroscopic equilibrium state. Given that a classical black hole is characterised
by a small number of parameters (e.g., mass and angular momentum), it is hard to know
what these states could be. Classically, this is not a big problem since any information
about the microscopic state of a black hole could be hidden behind the horizon. Including
quantum mechanics into this picture leads to the famous black hole information paradox.
The reason is that, as Hawking showed, quantum mechanically black holes evaporate and
hence, in a large but finite amount of time, there won’t be any horizon left to hide the
microscopic states. Furthermore, Hawking argued that the radiation emitted by the black
hole (known as Hawking radiation) is precisely thermal, and hence there should not be
correlations between the emitted particles. Thus, all the information about the system
before it collapsed into a black hole seems to have been erased by the time the black
hole has evaporated. This is the black hole information loss paradox, and it suggests a
potential incompatibility between quantum mechanics and general relativity.
Recent progress inspired by string theory (but independent of string theory) has
shown that if one correctly accounts for the entanglement entropy of the Hawking par-
ticles emitted during the evaporation process, then there is no information loss. In fact,
string theory is the only theory of quantum gravity that has been able to produce a
complete microscopic description of certain black holes. By counting the microstates of
the constituents of certain black holes in string theory, it has been possible to reproduce
the Bekenstein-Hawking entropy of these black holes, including the subleading quantum
corrections to the classical formula. This is regarded as one of the greatest triumphs
of string theory. For other types of black holes, namely black holes in asymptotically
anti-de Sitter spacetimes (not covered in the lectures), string theory provides a complete
description in terms of a (non-gravitational) quantum field theory in one dimension less;
because this quantum field theory obeys the standard laws of quantum mechanics, and
hence its evolution is unitary, necessarily the evolution of the dual black hole, including
the Hawking evaporation process, must also be unitary and therefore there cannot be
information loss. The details of how this happens however, are still under investigation.

Chapter 7

Linearised theory and gravitational waves

7.1 Linearised theory


Solving the Einstein equation in general is tremendously difficult because it is a set of
non-linear PDEs. However, in some circumstances the gravitational field is weak and the
spacetime can be regarded as a perturbation of (flat) Minkowski space. In this context,
we shall assume that the spacetime manifold is M = R4 and that there exist globally
defined “almost inertial” coordinates xa for which the metric can be written as

gab = ηab + hab , ηab = diag{−1, 1, 1, 1} , (7.1)

where the components of hab are small compared to 1 to reflect the weakness of the
gravitational field. Note that the spacetime metric is gab , so free particles move on
geodesics of this metric. In the linearised theory, we regard hab as a tensor field in the
sense of special relativity, i.e., it transforms as a tensor under Lorentz transformations.
To leading order in the perturbations around Minkowski space, the inverse of the
spacetime metric is
\[
g^{ab} = \eta^{ab} - h^{ab} , \tag{7.2}
\]
where $h^{ab} = \eta^{ac} \eta^{bd} h_{cd}$. Therefore, from now on, we will raise and lower indices using the
Minkowski metric $\eta_{ab}$. This agrees, to leading order, with raising and lowering indices
with the full spacetime metric $g_{ab}$. Indeed, one can show that $g^{ac} g_{cb} = \delta^a{}_b + O(h^2)$.
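
The statement that (7.2) inverts the metric up to terms of order h² can be checked numerically. The Python sketch below builds g_ab = η_ab + h_ab for a small symmetric perturbation, forms the approximate inverse of (7.2), and measures how far their product is from the identity. The particular matrix of perturbation components and the helper names are arbitrary illustrative choices.

```python
# Minkowski metric eta_ab = diag(-1, 1, 1, 1); numerically it is its own inverse.
ETA = [[-1.0, 0.0, 0.0, 0.0],
       [0.0, 1.0, 0.0, 0.0],
       [0.0, 0.0, 1.0, 0.0],
       [0.0, 0.0, 0.0, 1.0]]

# Arbitrary symmetric pattern of O(1) numbers; h_ab = eps * PATTERN (illustrative only).
PATTERN = [[1.0, 0.5, -0.3, 0.2],
           [0.5, -0.7, 0.4, 0.1],
           [-0.3, 0.4, 0.9, -0.6],
           [0.2, 0.1, -0.6, 0.3]]

def matmul(A, B):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def inverse_error(eps):
    """Largest entry of |g^{ac} g_{cb} - delta^a_b| when g^{ab} is taken from eq. (7.2)."""
    h_down = [[eps * PATTERN[i][j] for j in range(4)] for i in range(4)]
    # h^{ab} = eta^{ac} eta^{bd} h_{cd}, i.e. the matrix product eta . h . eta
    h_up = matmul(matmul(ETA, h_down), ETA)
    g_down = [[ETA[i][j] + h_down[i][j] for j in range(4)] for i in range(4)]
    g_up = [[ETA[i][j] - h_up[i][j] for j in range(4)] for i in range(4)]
    product = matmul(g_up, g_down)
    return max(abs(product[i][j] - (1.0 if i == j else 0.0))
               for i in range(4) for j in range(4))

if __name__ == "__main__":
    for eps in (1e-2, 1e-3, 1e-4):
        print(eps, inverse_error(eps))
```

Reducing the size of the perturbation by a factor of 10 should reduce the error by a factor of about 100, which is the O(h²) behaviour claimed in the text.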
Now we are going to write down Einstein’s equation to first order in the metric
perturbation hab . Recall that the structure of the Einstein equation is of the form,

G ∼ ∂Γ + ∂Γ + Γ Γ + ΓΓ , (7.3)

since the Einstein tensor is obtained from contractions of the Riemann tensor. Therefore,
to leading order we only have to worry about the linear terms in the Christoffel symbols.
To first order, the Christoffel symbols are
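\[
\Gamma^a{}_{bc} = \frac{1}{2}\, \eta^{ad} \left( \partial_b h_{cd} + \partial_c h_{bd} - \partial_d h_{bc} \right) , \tag{7.4}
\]
and the Riemann tensor is
\[
\begin{aligned}
R_{abcd} &= \eta_{ae} \left( \partial_c \Gamma^e{}_{bd} - \partial_d \Gamma^e{}_{bc} \right) \\
&= \frac{1}{2} \left( \partial_b \partial_c h_{ad} + \partial_a \partial_d h_{bc} - \partial_a \partial_c h_{bd} - \partial_b \partial_d h_{ac} \right) ,
\end{aligned} \tag{7.5}
\]
and the linearised Ricci tensor is
\[
R_{ab} = \partial^c \partial_{(a} h_{b)c} - \frac{1}{2}\, \partial^c \partial_c h_{ab} - \frac{1}{2}\, \partial_a \partial_b h , \tag{7.6}
\]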
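where $h = h^a{}_a = \eta^{ab} h_{ab}$. Finally, we obtain the linearised Einstein tensor:
\[
G_{ab} = \partial^c \partial_{(a} h_{b)c} - \frac{1}{2}\, \partial^c \partial_c h_{ab} - \frac{1}{2}\, \partial_a \partial_b h - \frac{1}{2}\, \eta_{ab} \left( \partial^c \partial^d h_{cd} - \partial^c \partial_c h \right) . \tag{7.7}
\]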
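The RHS of the Einstein equation is $8\pi G\, T_{ab}$; therefore, by consistency, we must assume
that $T_{ab}$ is also small. To proceed, it is convenient to define
\[
\bar h_{ab} = h_{ab} - \frac{1}{2}\, h\, \eta_{ab} , \tag{7.8}
\]
with the inverse $h_{ab} = \bar h_{ab} - \frac{1}{2}\, \bar h\, \eta_{ab}$ and $\bar h = \bar h^a{}_a = -h$. In terms of the new metric
perturbation $\bar h_{ab}$, the linearised Einstein equation becomes
\[
-\frac{1}{2}\, \partial^c \partial_c \bar h_{ab} + \partial^c \partial_{(a} \bar h_{b)c} - \frac{1}{2}\, \eta_{ab}\, \partial^c \partial^d \bar h_{cd} = 8\pi G\, T_{ab} . \tag{7.9}
\]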
Note that the first term in this equation looks like the typical wave operator acting
on the tensor field h̄ab . However, there are two extra terms that obscure the wave-like
nature of the equation. As we shall see now, these terms can be removed using the
freedom to choose coordinates in the theory. This freedom is also known as gauge freedom.
Consider an infinitesimal coordinate transformation xa → xa + ξ a , where ξ a = ξ a (x) is
of the same order as h. Then, since the line element is invariant under coordinate
transformations, we have
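\[
\begin{aligned}
ds^2 &= (\eta_{ab} + h_{ab})\, dx^a dx^b \\
&= (\eta_{ab} + h_{ab})\, d(x^a + \xi^a)\, d(x^b + \xi^b) \\
&= (\eta_{ab} + h_{ab})\, dx^a dx^b + \eta_{ab}\, \partial_c \xi^a\, dx^c dx^b + \eta_{ab}\, \partial_c \xi^b\, dx^a dx^c + O(\epsilon^2) \\
&= (\eta_{ab} + h_{ab} + \partial_a \xi_b + \partial_b \xi_a)\, dx^a dx^b + O(\epsilon^2) .
\end{aligned} \tag{7.10}
\]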
Therefore, under an infinitesimal coordinate transformation, the metric perturbation
transforms as
\[
h_{ab} \to h_{ab} + \partial_a \xi_b + \partial_b \xi_a . \tag{7.11}
\]
This implies that $h_{ab}$ and $h_{ab} + \partial_a \xi_b + \partial_b \xi_a$ are physically equivalent. From the transformation
of $h_{ab}$ under infinitesimal coordinate transformations, it follows that
\[
\bar h_{ab} \to \bar h_{ab} + \partial_a \xi_b + \partial_b \xi_a - \eta_{ab}\, \partial^c \xi_c . \tag{7.12}
\]
We can use this symmetry to choose $\xi^a$ to simplify the equations as much as possible. In
particular, it is always possible, and convenient, to choose $\xi^a$ such that $\bar h_{ab}$ satisfies
\[
\partial^a \bar h_{ab} = 0 . \tag{7.13}
\]
To see this, note that under (7.12) we have
\[
\partial^a \bar h_{ab} \to \partial^a \bar h_{ab} + \partial^a \partial_a \xi_b . \tag{7.14}
\]
So, if (7.13) is not obeyed, we can choose $\xi_b$ to satisfy the wave equation $\partial^a \partial_a \xi_b = -\partial^a \bar h_{ab}$,
which always has a solution. The gauge (7.13) is known as the Lorenz, de Donder or
harmonic gauge, and in these coordinates the linearised Einstein equation becomes
\[
\partial^c \partial_c \bar h_{ab} = -16\pi G\, T_{ab} . \tag{7.15}
\]
This is a standard wave equation, which has a unique solution given appropriate boundary
conditions.

7.2 Gravitational waves
The linearised Einstein equations in vacuum can be reduced to the following source-free
wave equation for the metric perturbation h̄ab ,

∂ c ∂c h̄ab = 0 . (7.16)

This shows that the theory admits wave solutions that propagate at the speed of light.
As usual for the (linear) wave equation, we can construct any solution by superposing
plane wave solutions. The latter are of the form,
\[
\bar h_{ab}(x) = \mathrm{Re} \left( H_{ab}\, e^{i k_c x^c} \right) , \tag{7.17}
\]

where Hab is a constant symmetric complex matrix describing the polarisation of the wave
and k a is the wave vector. From now on, we shall suppress ‘Re’ from all the equations.
Substituting (7.17) into the wave equation (7.16) gives

ka k a = 0 . (7.18)

Therefore, the wave vector k a must be null, in accordance with the fact that gravitational
waves propagate at the speed of light. The gauge condition (7.13) implies

k a Hab = 0 , (7.19)

which shows that the waves are transverse (i.e., the polarisation vectors lie on the plane
orthogonal to the direction of propagation).
The gauge condition (7.13) does not eliminate all gauge freedom. It follows from
(7.14) that we can still perform a gauge transformation (7.12) that preserves the gauge
condition (7.13) if ξ a obeys a source-free wave equation,

\[
\partial^c \partial_c \xi_a = 0 . \tag{7.20}
\]

We can use this residual gauge freedom to simplify the solution further. Consider,
\[
\xi_a(x) = X_a\, e^{i k_c x^c} , \tag{7.21}
\]

that satisfies (7.20) since k a is null. Using

h̄ab → h̄ab + ∂a ξb + ∂b ξa − ηab ∂ c ξc , (7.22)

under a gauge transformation, we see that the residual gauge freedom in our case is

\[
H_{ab} \to H_{ab} + i \left( k_a X_b + k_b X_a - \eta_{ab}\, k^c X_c \right) . \tag{7.23}
\]

This residual gauge freedom can be used to achieve a “longitudinal” gauge,

H0a = 0 . (7.24)

This does not fully determine Xa . There is some remaining freedom that can be used to
impose the additional “trace-free” condition,

\[
H^a{}_a = 0 . \tag{7.25}
\]

In this “transverse-traceless” gauge we have

\[
h_{ab} = \bar h_{ab} . \tag{7.26}
\]

Example: Consider a wave travelling along the z-direction, $k^a = \omega\, (1, 0, 0, 1)$. The longitudinal
gauge condition (7.24) together with the transversality condition (7.19) implies
$H_{3a} = 0$. Using the trace-free condition gives
\[
H_{ab} = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & H_+ & H_\times & 0 \\ 0 & H_\times & -H_+ & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} . \tag{7.27}
\]

This solution is specified by two constants, H+ and H× , corresponding to the two inde-
pendent polarisations of the gravitational waves. This shows that the gravitational field
has two independent degrees of freedom per spacetime point.
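
The conditions used in this example can be verified directly. The following Python sketch builds the polarisation matrix (7.27) for a wave vector along z and evaluates the left-hand sides of the null condition (7.18), the transversality condition (7.19), the longitudinal condition (7.24) and the trace-free condition (7.25). The sample amplitudes and the function names are illustrative assumptions.

```python
# Diagonal entries of the Minkowski metric eta_ab = diag(-1, 1, 1, 1).
ETA_DIAG = [-1.0, 1.0, 1.0, 1.0]

def tt_polarisation(h_plus, h_cross):
    """Polarisation matrix H_ab of eq. (7.27) for a wave travelling along z."""
    return [[0.0, 0.0, 0.0, 0.0],
            [0.0, h_plus, h_cross, 0.0],
            [0.0, h_cross, -h_plus, 0.0],
            [0.0, 0.0, 0.0, 0.0]]

def gauge_checks(k_up, H):
    """Left-hand sides of eqs. (7.18), (7.19), (7.24) and (7.25)."""
    k_down = [ETA_DIAG[a] * k_up[a] for a in range(4)]
    null_norm = sum(k_up[a] * k_down[a] for a in range(4))        # k_a k^a
    transverse = [sum(k_up[a] * H[a][b] for a in range(4))        # k^a H_ab
                  for b in range(4)]
    longitudinal = list(H[0])                                     # H_0a
    trace = sum(ETA_DIAG[a] * H[a][a] for a in range(4))          # H^a_a
    return null_norm, transverse, longitudinal, trace

if __name__ == "__main__":
    omega = 2.0                         # illustrative frequency
    k_up = [omega, 0.0, 0.0, omega]     # k^a = omega (1, 0, 0, 1)
    H = tt_polarisation(0.3, -0.1)      # illustrative amplitudes H_+ and H_x
    print(gauge_checks(k_up, H))
```

A symmetric 4×4 matrix has 10 independent components; the four conditions (7.19) and the four residual gauge parameters Xa remove eight of them, which is consistent with the two remaining constants H+ and H×.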

7.2.1 Tidal accelerations and polarisation of gravitational waves


To measure the effect of gravitational waves on free particles we consider the geodesic
deviation equation (5.8). Consider two nearby freely falling particles, which therefore
move along two nearby geodesics. In addition, consider a local inertial frame at the point
of the first geodesic, where the connecting vector ξ a between the two geodesics originates.
Let U a denote the vector tangent to the geodesic. Hence, in this situation the geodesic
deviation equation becomes

\[
\nabla_U \nabla_U \xi^a = R^a{}_{bcd}\, U^b U^c \xi^d . \tag{7.28}
\]

In this frame, coordinate distances are proper distances, as long as we can neglect
quadratic terms in the coordinates. This means that in these coordinates the com-
ponents of ξ a correspond to measurable distances if the geodesics are near enough to one
another. Furthermore, in this frame the second covariant derivative in (7.28) simplifies.
The first derivative acting on $\xi^a$ gives $d\xi^a/d\tau$, but the second one is a covariant one, which
contains $d/d\tau$ but also a term with the Christoffel symbols. However, in a local inertial
frame the Christoffel symbols all vanish at this point and therefore the second covariant
derivative is just an ordinary derivative with respect to the proper time τ . Hence, in a
locally inertial frame (7.28) becomes
\[
\frac{d^2 \xi^a}{d\tau^2} = R^a{}_{bcd}\, U^b U^c \xi^d , \tag{7.29}
\]
where $U^a = dx^a/d\tau$ is the four-velocity of the two particles. In these coordinates the components
of $U^a$ are needed only to the lowest order around flat space since any corrections to $U^a$
will depend on $h_{ab}$ and hence they will give rise to terms that are second order in $h_{ab}$ in
the equation above (because $R^a{}_{bcd}$ is already first order in $h_{ab}$). Therefore, under these
approximations we can write, without loss of generality, $U^a = (1, 0, 0, 0)$ and, initially,
$\xi^a = (0, \varepsilon, 0, 0)$. Then, to first order in $h_{ab}$, equation (7.29) reduces to
\[
\frac{d^2 \xi^a}{d\tau^2} = \frac{\partial^2 \xi^a}{\partial t^2} = \varepsilon\, R^a{}_{ttx} = -\varepsilon\, R^a{}_{txt} . \tag{7.30}
\]
This equation shows that the Riemann tensor is locally measurable by simply watching
the changes in the proper distance between nearby freely falling particles.

Figure 7.1: (a) Circle of free particles before a gravitational wave travelling in the z
direction reaches them. (b) Distortions of the circle produced by a wave with the ‘+’
polarisation. The two pictures show the same wave at phases separated by 180°. (c) As
in (b) but now for the ‘×’ polarisation.

The non-vanishing of the Riemann tensor is gauge invariant so the left-hand side of
(7.30) must have an interpretation that is independent of the coordinates. We identify
ξ a as the proper lengths of the components of the connecting vector, that is, the proper
distances along the four coordinate directions over the coordinate intervals spanned by
the vector.
We can now write (7.30) in the transverse traceless (TT) gauge. For a wave travelling
in the z direction, the components of the Riemann tensor are
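\[
\begin{aligned}
R^x{}_{txt} &= R_{xtxt} = -\frac{1}{2}\, \partial_t^2 h^{\rm TT}_{xx} , \\
R^y{}_{txt} &= R_{ytxt} = -\frac{1}{2}\, \partial_t^2 h^{\rm TT}_{xy} , \\
R^y{}_{tyt} &= R_{ytyt} = -\frac{1}{2}\, \partial_t^2 h^{\rm TT}_{yy} = -R^x{}_{txt} ,
\end{aligned} \tag{7.31}
\]
with all the other independent components vanishing. This means that two particles
initially separated by ε in the x direction have a separation vector $\xi^a$ whose components’
proper lengths obey
\[
\frac{\partial^2 \xi^x}{\partial t^2} = \frac{1}{2}\, \varepsilon\, \partial_t^2 h^{\rm TT}_{xx} , \qquad
\frac{\partial^2 \xi^y}{\partial t^2} = \frac{1}{2}\, \varepsilon\, \partial_t^2 h^{\rm TT}_{xy} . \tag{7.32}
\]
Similarly, two particles initially separated by ε in the y direction obey
\[
\frac{\partial^2 \xi^y}{\partial t^2} = \frac{1}{2}\, \varepsilon\, \partial_t^2 h^{\rm TT}_{yy} = -\frac{1}{2}\, \varepsilon\, \partial_t^2 h^{\rm TT}_{xx} , \qquad
\frac{\partial^2 \xi^x}{\partial t^2} = \frac{1}{2}\, \varepsilon\, \partial_t^2 h^{\rm TT}_{xy} , \tag{7.33}
\]
since $h^{\rm TT}_{yy} = -h^{\rm TT}_{xx}$ from the tracelessness condition.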
Equations (7.32) and (7.33) define the polarisation of a gravitational wave. Consider
a ring of particles initially at rest in the x–y plane as in Fig. 7.1 (a). Suppose that at
the time the wave reaches the particles it has $h^{\rm TT}_{xx} \neq 0$, $h^{\rm TT}_{xy} = 0$. Then the particles
will be moved (in terms of the proper distance relative to the particle at the centre) in
the way shown in Fig. 7.1 (b), as the wave oscillates and $h^{\rm TT}_{xx} = -h^{\rm TT}_{yy}$ changes sign. If,
instead, the wave had $h^{\rm TT}_{xy} \neq 0$ and $h^{\rm TT}_{xx} = h^{\rm TT}_{yy} = 0$, then the particles would be distorted
as in Fig. 7.1 (c). Since $h^{\rm TT}_{xx}$ and $h^{\rm TT}_{xy}$ are independent, (b) and (c) provide a pictorial
representation for the two physically distinct linear polarisations of the gravitational
waves. Notice that the two polarisations are simply rotated 45° relative to one another.
This contrasts with the two polarisation states of an electromagnetic wave, which are at
90° to each other.
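
The distortions of the ring can be reproduced with a few lines of Python. Integrating (7.32) and (7.33) twice in time, and assuming that the particles are at rest with their unperturbed separations before the wave arrives, a particle at proper position (x0, y0) relative to the centre is moved to x = x0 + ½ (h+ x0 + h× y0), y = y0 + ½ (h× x0 − h+ y0), where h+ and h× are shorthand for the instantaneous values of the xx and xy components of the TT perturbation. The function names and the amplitude 0.2 (far larger than any realistic wave, chosen only for visibility) are illustrative assumptions.

```python
import math

def displaced_position(x0, y0, h_plus, h_cross):
    """Proper position of a free particle initially at (x0, y0), to first order in h.

    Obtained by integrating eqs. (7.32) and (7.33) with the particles initially at rest.
    """
    x = x0 + 0.5 * (h_plus * x0 + h_cross * y0)
    y = y0 + 0.5 * (h_cross * x0 - h_plus * y0)
    return x, y

def distorted_ring(n_particles, h_plus, h_cross, radius=1.0):
    """Positions of n_particles equally spaced on a ring after the wave has acted on them."""
    positions = []
    for n in range(n_particles):
        angle = 2.0 * math.pi * n / n_particles
        positions.append(displaced_position(radius * math.cos(angle),
                                            radius * math.sin(angle),
                                            h_plus, h_cross))
    return positions

if __name__ == "__main__":
    for label, h_plus, h_cross in (("plus", 0.2, 0.0), ("cross", 0.0, 0.2)):
        print(label)
        for x, y in distorted_ring(8, h_plus, h_cross):
            print(f"  ({x:+.3f}, {y:+.3f})")
```

For the ‘+’ polarisation the ring is stretched along x and squeezed along y, while for the ‘×’ polarisation the stretching occurs along the diagonal x = y, in line with the 45° rotation between the two patterns.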

7.2.2 The far field


As we shall see, gravitational waves are produced by moving masses. To see this, let us
return to the linearised Einstein equation with a source:

∂ c ∂c h̄ab = −16π G Tab . (7.34)

This is a standard wave equation, and hence we can write down the solution using the
retarded Green’s function:
\[
\bar h_{ab}(t, \mathbf{x}) = 4 G \int d^3x'\, \frac{T_{ab}(t - |\mathbf{x} - \mathbf{x}'|, \mathbf{x}')}{|\mathbf{x} - \mathbf{x}'|} , \tag{7.35}
\]

where |x − x′ | is calculated using the Euclidean metric. Assume that matter is confined
to a compact region near the origin of size d. Then, far from the source we have r ≡
|x| ≫ |x′ | ∼ d, and hence we can expand

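\[
|\mathbf{x} - \mathbf{x}'|^2 = r^2 - 2\, \mathbf{x} \cdot \mathbf{x}' + \mathbf{x}'^2 = r^2 \left( 1 - \frac{2\, \hat{\mathbf{x}} \cdot \mathbf{x}'}{r} + O\!\left( \frac{d^2}{r^2} \right) \right) , \tag{7.36}
\]

where x̂ ≡ x/r. Therefore,

\[
|\mathbf{x} - \mathbf{x}'| = r - \hat{\mathbf{x}} \cdot \mathbf{x}' + O(d^2/r) , \tag{7.37}
\]
\[
T_{ab}(t - |\mathbf{x} - \mathbf{x}'|, \mathbf{x}') = T_{ab}(t', \mathbf{x}') + \hat{\mathbf{x}} \cdot \mathbf{x}'\, (\partial_0 T_{ab})(t', \mathbf{x}') + \ldots \tag{7.38}
\]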

where t′ = t − r. Now let τ denote the time scale over which Tab is varying, so ∂0 Tab ∼
Tab /τ . For example, if the source is a binary black hole system, then τ would correspond
to the orbital period. Then, the second term in (7.38) is of order (d/τ )Tab . Note that
d is the time that it takes light to cross the region containing the source. Therefore,
d/τ ≪ 1 if the matter is moving non-relativistically. For most systems, including black
hole binaries, this is indeed the case for most of the time. Therefore, we will assume this
from now on. This assumption implies that the second term is negligible compared to
the first and hence
\[
\bar h_{ij}(t, \mathbf{x}) \approx \frac{4G}{r} \int d^3x'\, T_{ij}(t', \mathbf{x}') , \qquad t' = t - r . \tag{7.39}
\]

Note that we only need to consider the spatial components of h̄ab ; the other components
can be obtained from the gauge condition (7.13), which gives

\[
\partial_0 \bar h_{0i} = \partial_j \bar h_{ji} , \qquad \partial_0 \bar h_{00} = \partial_i \bar h_{0i} . \tag{7.40}
\]

So, given h̄ij , the first equation can be integrated to give h̄0i , and then the second can
be integrated to get h̄00 .
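
The quality of the far-field expansion (7.37), on which the above approximations rest, is easy to check numerically. The Python sketch below compares the exact Euclidean distance |x − x′| with r − x̂ · x′ for a source point x′ of size d ∼ 1 and increasingly distant field points; the specific vectors and function names are illustrative assumptions.

```python
import math

def norm(v):
    """Euclidean length of a 3-vector."""
    return math.sqrt(sum(c * c for c in v))

def exact_distance(x, x_src):
    """|x - x'| computed with the Euclidean metric."""
    return norm([a - b for a, b in zip(x, x_src)])

def far_field_distance(x, x_src):
    """Leading terms r - xhat . x' of eq. (7.37)."""
    r = norm(x)
    return r - sum(a * b for a, b in zip(x, x_src)) / r

def expansion_error(r, direction, x_src):
    """Difference between the exact distance and eq. (7.37) for a field point at distance r."""
    length = norm(direction)
    x = [r * c / length for c in direction]
    return exact_distance(x, x_src) - far_field_distance(x, x_src)

if __name__ == "__main__":
    x_src = (0.3, -0.5, 0.2)       # a point inside the source, |x'| ~ d (illustrative)
    direction = (1.0, 2.0, 2.0)    # direction of the field point (illustrative)
    for r in (10.0, 100.0, 1000.0):
        print(r, expansion_error(r, direction, x_src))
```

The error should fall off like 1/r, consistent with the O(d²/r) remainder in (7.37).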

To obtain h̄ij we have to evaluate the integral on the RHS of (7.39). This can be
done as follows. Since matter is compactly supported by assumption, we can integrate
by parts and discard surface terms. We can also use the fact that the energy-momentum
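tensor is conserved, $\partial_a T^{ab} = 0$. Then,
\[
\begin{aligned}
\int d^3x\, T^{ij} &= \int d^3x \left[ \partial_k (T^{ik} x^j) - (\partial_k T^{ik})\, x^j \right] \\
&= -\int d^3x\, (\partial_k T^{ik})\, x^j && \text{(drop surface term)} \\
&= \int d^3x\, (\partial_0 T^{0i})\, x^j && \text{(use conservation law)} \\
&= \partial_0 \int d^3x\, T^{0i} x^j .
\end{aligned} \tag{7.41}
\]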
Note that the LHS of this equation is symmetric in ij, and hence we have to symmetrise
the RHS too:
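\[
\begin{aligned}
\int d^3x\, T^{ij} &= \partial_0 \int d^3x\, T^{0(i} x^{j)} \\
&= \partial_0 \int d^3x \left[ \frac{1}{2}\, \partial_k (T^{0k} x^i x^j) - \frac{1}{2}\, (\partial_k T^{0k})\, x^i x^j \right] \\
&= -\frac{1}{2}\, \partial_0 \int d^3x\, (\partial_k T^{0k})\, x^i x^j && \text{(drop surface term)} \\
&= \frac{1}{2}\, \partial_0 \int d^3x\, (\partial_0 T^{00})\, x^i x^j && \text{(use conservation law)} \\
&= \frac{1}{2}\, \partial_0 \partial_0 \int d^3x\, T^{00} x^i x^j \\
&= \frac{1}{2}\, \ddot I_{ij}(t) ,
\end{aligned} \tag{7.42}
\]

where
\[
I_{ij}(t) = \int d^3x\, T_{00}(t, \mathbf{x})\, x^i x^j \tag{7.43}
\]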

is the quadrupole moment tensor, also known as the second moment of the energy density
(note that to leading order, T00 = T 00 and Tij = T ij ). This object is a proper tensor in
the Cartesian sense, i.e., it transforms in the usual way under rotations of the Cartesian
coordinates xi . Hence, we have

\[
\bar h_{ij}(t, \mathbf{x}) \approx \frac{2G}{r}\, \ddot I_{ij}(t - r) . \tag{7.44}
\]

This result is valid for r ≫ d and τ ≫ d, and it describes the propagation of a disturbance
moving away from the source at the speed of light. If the source is undergoing an
oscillatory motion, e.g., a binary black hole system, then h̄ij will describe waves with the
same period τ as the motion of the source.
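
As a concrete illustration of (7.43) and (7.44), the Python sketch below models a binary as two point masses m on a common circular orbit of radius R and angular frequency Ω in the x–y plane, and evaluates the far-field perturbation by taking the second time derivative of the quadrupole tensor with a central finite difference. This is a minimal sketch under stated assumptions: the point-mass model, the approximation of T00 by the rest-mass density (appropriate for slow motion, RΩ ≪ 1 in units with c = 1), the finite-difference step and all names and sample values are illustrative and not taken from the notes.

```python
import math

def quadrupole(t, m, R, Omega):
    """I_ij(t) of eq. (7.43) for point masses m at +/- R (cos(Omega t), sin(Omega t), 0)."""
    c, s = math.cos(Omega * t), math.sin(Omega * t)
    positions = [(R * c, R * s, 0.0), (-R * c, -R * s, 0.0)]
    return [[sum(m * p[i] * p[j] for p in positions) for j in range(3)]
            for i in range(3)]

def far_field_perturbation(t, r, G, m, R, Omega, dt=1e-3):
    """Spatial components of eq. (7.44): (2 G / r) times the second time derivative of I_ij
    evaluated at the retarded time t - r (central finite difference with step dt)."""
    t_ret = t - r
    I_plus = quadrupole(t_ret + dt, m, R, Omega)
    I_zero = quadrupole(t_ret, m, R, Omega)
    I_minus = quadrupole(t_ret - dt, m, R, Omega)
    return [[2.0 * G / r * (I_plus[i][j] - 2.0 * I_zero[i][j] + I_minus[i][j]) / dt**2
             for j in range(3)] for i in range(3)]

if __name__ == "__main__":
    # Units with G = c = 1; orbital speed R * Omega = 0.1 (illustrative values).
    G, m, R, Omega, r = 1.0, 1.0, 1.0, 0.1, 1000.0
    for t in (1000.0, 1005.0, 1010.0):
        h = far_field_perturbation(t, r, G, m, R, Omega)
        print(t, h[0][0], h[0][1], h[1][1])
```

For this symmetric configuration I_xx = m R² (1 + cos 2Ωt), so the xx component of (7.44) is −(8 G m R² Ω²/r) cos 2Ω(t − r); the finite-difference result can be compared against this closed form.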
To obtain the remaining components of the gravitational field, we go back to (7.40).
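The first equation gives
\[
\partial_0 \bar h_{0i} \approx \partial_j \left( \frac{2G}{r}\, \ddot I_{ij}(t - r) \right) . \tag{7.45}
\]
Integrating with respect to time and using that $\partial_i r = x_i / r = \hat x_i$, gives
\[
\begin{aligned}
\bar h_{0i} &\approx \partial_j \left( \frac{2G}{r}\, \dot I_{ij}(t - r) \right) = -\frac{2 G\, \hat x_j}{r^2}\, \dot I_{ij}(t - r) - \frac{2 G\, \hat x_j}{r}\, \ddot I_{ij}(t - r) \\
&\approx -\frac{2 G\, \hat x_j}{r}\, \ddot I_{ij}(t - r) ,
\end{aligned} \tag{7.46}
\]
where in the final line we have assumed that in the radiation zone r ≫ τ . This allows us
to neglect the term proportional to $\dot I_{ij}$ because it is smaller than the term that we have
kept by a factor of τ /r. In the radiation zone, space and time derivatives are of the same
order of magnitude.
Similarly, integrating the second equation in (7.40) we get
\[
\bar h_{00} \approx \partial_i \left( -\frac{2 G\, \hat x_j}{r}\, \dot I_{ij}(t - r) \right) \approx \frac{2 G\, \hat x_i \hat x_j}{r}\, \ddot I_{ij}(t - r) . \tag{7.47}
\]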

These expressions are not quite right because when integrating (7.40) we should have
included an arbitrary time-independent term in h̄0i , which would lead to a term in h̄00
linear in time, as well as an arbitrary time-indepedent term in h̄00 . The latter can be
determined from (7.35), which to leading order gives
$$ \bar{h}_{00} \approx \frac{4GE}{r} , \qquad E = \int d^3x' \, T_{00}(t', \mathbf{x}') , \tag{7.48} $$
where E is the total energy of the matter. Note that the leading-order time-dependent
piece (7.47) is smaller than the time-independent part by a factor of $d^2/\tau^2$, so in order
to get this term from (7.35) we would have to go to a higher order. Similarly, we find
$$ \bar{h}_{0i} \approx -\frac{4G P_i}{r} , \qquad P_i = -\int d^3x' \, T_{0i}(t', \mathbf{x}') , \tag{7.49} $$

where Pi is the total 3-momentum of the matter. Note that the term in h̄00 that is linear
in t is proportional to Pi .
Remark. We will see shortly that gravitational waves carry away energy, so why is the
total energy of matter constant? In fact, the total energy of matter is not constant, but
to see this one has to go beyond the linearised theory.
A final simplification is possible: we are free to choose our almost inertial coordinates
to correspond to the “centre of momentum” frame, i.e., Pi = 0. If we do this, then E is
the total mass of the matter, which we denote by M . Then, in this frame we have
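$$ \bar{h}_{00}(t, \mathbf{x}) \approx \frac{4GM}{r} + \frac{2G \hat{x}_i \hat{x}_j}{r} \, \ddot{I}_{ij}(t - r) , \qquad \bar{h}_{0i}(t, \mathbf{x}) \approx -\frac{2G \hat{x}_j}{r} \, \ddot{I}_{ij}(t - r) . \tag{7.50} $$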

7.2.3 Energy in gravitational waves


We have seen that gravitational waves arise whenever there is a non-trivial quadrupole
moment Iij that varies in time. To calculate the energy that gravitational waves carry
we have to go to second order in perturbation theory:
$$ g_{ab} = \eta_{ab} + h_{ab} + h^{(2)}_{ab} . \tag{7.51} $$
So, if the components of $h_{ab}$ are $O(\epsilon)$, then the components of $h^{(2)}_{ab}$ are $O(\epsilon^2)$. Now we
have to calculate the Einstein tensor to second order. We have calculated the first order
term in (7.7); let's call this piece $G^{(1)}_{ab}[h]$. The second order piece will contain terms that
are linear in the second order perturbation, $h^{(2)}$, and terms that are quadratic in the
first order perturbation, $h$. The terms linear in $h^{(2)}$ are simply given by (7.7) replacing
$h \to h^{(2)}$; we denote them by $G^{(1)}_{ab}[h^{(2)}]$. Therefore, to second order we have
$$ G_{ab}[g] = G^{(1)}_{ab}[h] + G^{(1)}_{ab}[h^{(2)}] + G^{(2)}_{ab}[h] , \tag{7.52} $$
where $G^{(2)}_{ab}[h]$ is the term in $G_{ab}$ that is quadratic in $h$. This is given by

$$ G^{(2)}_{ab}[h] = R^{(2)}_{ab}[h] - \frac{1}{2} R^{(1)}[h] \, h_{ab} - \frac{1}{2} R^{(2)}[h] \, \eta_{ab} , \tag{7.53} $$
where $R^{(2)}_{ab}[h]$ is the term in the Ricci tensor that is quadratic in $h$, and $R^{(1)}$ and $R^{(2)}$
are the terms in the Ricci scalar that are linear and quadratic in $h$ respectively. We can
write the latter as
$$ R^{(2)}[h] = \eta^{ab} R^{(2)}_{ab}[h] - h^{ab} R^{(1)}_{ab}[h] . \tag{7.54} $$
A rather lengthy calculation gives,
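$$ \begin{aligned}
R^{(2)}_{ab}[h] ={}& \frac{1}{2} h^{cd} \partial_a \partial_b h_{cd} - h^{cd} \partial_c \partial_{(a} h_{b)d} + \frac{1}{4} (\partial_a h_{cd})(\partial_b h^{cd}) + (\partial^c h^{d}{}_{b}) \, \partial_{[c} h_{d]a} \\
& + \frac{1}{2} \partial_c \left( h^{cd} \partial_d h_{ab} \right) - \frac{1}{4} (\partial^c h)(\partial_c h_{ab}) - \left( \partial_c h^{cd} - \frac{1}{2} \partial^d h \right) \partial_{(a} h_{b)d} .
\end{aligned} \tag{7.55} $$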
For simplicity, let's assume that no matter is present. Then, at first order the Einstein
equation is $G^{(1)}_{ab}[h] = 0$. At second order, we have

$$ G^{(1)}_{ab}[h^{(2)}] = 8\pi G \, t_{ab}[h] , \qquad t_{ab}[h] \equiv -\frac{1}{8\pi G} \, G^{(2)}_{ab}[h] . \tag{7.56} $$
This is the equation of motion for $h^{(2)}$. Since $h$ satisfies the linearised Einstein equation,
we have $R^{(1)}_{ab}[h] = 0$ and hence
$$ t_{ab}[h] = -\frac{1}{8\pi G} \left( R^{(2)}_{ab}[h] - \frac{1}{2} \eta^{cd} R^{(2)}_{cd}[h] \, \eta_{ab} \right) . \tag{7.57} $$
This object can be interpreted as the energy-momentum tensor of the gravitational field
itself. As we shall show now, at this order in perturbation theory, it is conserved. To
see this, consider the contracted Bianchi identity, $g^{bc} \nabla_c G_{ba} = 0$, valid for any metric $g$.
Expanding this to first order we get,
$$ \partial^a G^{(1)}_{ab}[h] = 0 , \tag{7.58} $$

for an arbitrary first order perturbation h. At second order, one finds


 
$$ \partial^a \left( G^{(1)}_{ab}[h^{(2)}] + G^{(2)}_{ab}[h] \right) + h \, G^{(1)}[h] = 0 , \tag{7.59} $$

where the last term denotes schematically the terms that arise from a first order perturbation
in the inverse metric and the Christoffel symbols. Now, since (7.58) holds for an
arbitrary $h$, it also holds if we replace $h$ by $h^{(2)}$ and hence $\partial^a G^{(1)}_{ab}[h^{(2)}] = 0$. Furthermore,
if we assume that $h$ satisfies the first order equation of motion then $G^{(1)}_{ab}[h] = 0$ and the
last term in (7.59) vanishes. Therefore, this equation reduces to

$$ \partial^a t_{ab} = 0 . \tag{7.60} $$

Hence, tab is a symmetric tensor that is (i) quadratic in the metric perturbation h, (ii)
conserved if h satisfies the linear equation of motion, and (iii) appears on the RHS of the
second order Einstein equation (7.56). Therefore, it seems rather natural to interpret this
object as the energy-momentum tensor of the gravitational field, as we claimed earlier.
However, there is a problem: tab is not invariant under a gauge transformation (7.11).
This is how the impossibility of localising gravitational energy arises in the linearised
theory, which is ultimately related to the fact that any spacetime is locally flat. Never-
theless it can be shown that the integral of t00 over a surface of constant time t is gauge
invariant provided that one considers metric perturbations hab that decay sufficiently
fast at infinity and restricts to gauge transformations that preserve this property. This
integral provides a satisfactory notion of total energy in the linearised gravitational field.
Therefore, gravitational energy exists but it is non-local.
One can use the second order Einstein equation (7.56) to convert the integral defining
the energy, which is quadratic in h, into a surface integral at infinity which is linear in h(2) .
In fact, the latter can be made fully non-linear in the sense that these integrals are valid
in any asymptotically flat space, irrespective of whether the linearised approximation
holds in the interior of the spacetime. This notion of energy is known as ADM energy.
We can proceed by following a more intuitive route and convert tab into a gauge-
invariant quantity by averaging. This can be done as follows. For any point p, consider
some region $R$ of $\mathbb{R}^4$ of typical coordinate size ℓ centred on p. Define the average of a
tensor $X_{ab}$ at p by
$$ \langle X_{ab} \rangle = \int_R d^4x \, W(x) \, X_{ab}(x) , \tag{7.61} $$
where the averaging function $W(x)$ is positive, satisfies $\int_R d^4x \, W = 1$, and tends smoothly
to zero on ∂R. Note that it makes sense to integrate Xab because we are treating it as a
tensor in Minkowski space, and we can add tensors at different points.
We are interested in averaging in a region far from the source, in which the grav-
itational radiation has some typical wavelength λ (and hence τ ∼ λ). Assume that
the components of $X_{ab}$ have typical magnitude $x$. Since the wavelength of the
radiation is λ, $\partial_a X_{bc}$ will have components of typical size $x/\lambda$. But the average is
$$ \langle \partial_a X_{bc} \rangle = -\int_R d^4x \, \bigl( \partial_a W(x) \bigr) \, X_{bc}(x) , \tag{7.62} $$

where we have integrated by parts and used that W = 0 on ∂R. Now, ∂a W has com-
ponents of order W/ℓ, so the RHS has components of order x/ℓ. Hence, if we choose
ℓ ≫ λ then the averaging has the effect of reducing ∂a Xbc by a factor of λ/ℓ ≪ 1. So if
we choose ℓ ≫ λ then we can neglect total derivatives inside averages. This implies that
we can freely integrate by parts inside averages:
⟨A∂B⟩ = ⟨∂(AB)⟩ − ⟨(∂A)B⟩ ≈ −⟨(∂A)B⟩ , (7.63)
because ⟨∂(AB)⟩ is a factor λ/ℓ smaller than ⟨(∂A)B⟩. From now on we assume ℓ ≫ λ.
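The suppression of total derivatives under averaging can be illustrated with a one-dimensional toy version of (7.61) and (7.62). The sketch below is only an illustration under stated assumptions: the bump-shaped window, the test function X(x) = sin(2πx/λ), and the names `bump` and `averaged_derivative` are all hypothetical choices, not part of the notes.

```python
import math

def bump(u):
    """Smooth positive window on |u| < 1 that tends smoothly to zero at |u| = 1."""
    return math.exp(-1.0 / (1.0 - u * u)) if abs(u) < 1.0 else 0.0

def averaged_derivative(ell, lam, n=20001):
    """<dX/dx> at p = 0 for X(x) = sin(2 pi x / lam), averaged over |x| < ell."""
    dx = 2.0 * ell / (n - 1)
    xs = [-ell + k * dx for k in range(n)]
    w = [bump(x / ell) for x in xs]
    norm = sum(w) * dx  # rescale so that the discrete integral of W equals 1
    k_wave = 2.0 * math.pi / lam
    # dX/dx = k_wave * cos(k_wave * x); weight it with W and integrate
    total = sum(wi * k_wave * math.cos(k_wave * x) for wi, x in zip(w, xs))
    return total * dx / norm

lam = 1.0
local_size = 2.0 * math.pi / lam               # amplitude of dX/dx before averaging
narrow = averaged_derivative(0.01 * lam, lam)  # ell << lambda: averaging changes little
wide = averaged_derivative(10.0 * lam, lam)    # ell >> lambda: derivative averages away
print(local_size, narrow, wide)
```

For a window this smooth the averaged derivative falls off faster than the λ/ℓ estimate in the text, which should be read as an upper bound on its size.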
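Using the linearised Einstein equation one can show that, in vacuum,
$$ \langle \eta^{ab} R^{(2)}_{ab}[h] \rangle = 0 , \tag{7.64} $$
and hence the second term in $t_{ab}[h]$ averages to zero. Using this result, one finds
$$ \langle t_{ab} \rangle = \frac{1}{32\pi G} \left\langle (\partial_a \bar{h}_{cd}) \, \partial_b \bar{h}^{cd} - \frac{1}{2} (\partial_a \bar{h}) \, \partial_b \bar{h} - 2 \, (\partial_c \bar{h}^{cd}) \, \partial_{(a} \bar{h}_{b)d} \right\rangle . \tag{7.65} $$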
Furthermore, one can show that ⟨tab ⟩ is gauge invariant.

7.2.4 The quadrupole formula
Now we are ready to calculate the energy loss from a compact source due to the emission
of gravitational waves. The averaged energy flux 3-vector is −⟨t0i ⟩. Consider a large
sphere of radius r far away from the source. The unit normal to such a sphere (in a surface
of constant t) is $\hat{x}_i$. Hence, the average total energy flux across this sphere, i.e., the
average power radiated across the sphere, is
$$ \langle P \rangle = -\int d\Omega \, r^2 \langle t_{0i} \rangle \, \hat{x}_i , \tag{7.66} $$

where $d\Omega$ is the standard volume element on a unit round $S^2$.


We can now substitute the results in §7.2.2 into (7.65), remembering that those results
were derived in harmonic gauge. We get
 
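$$ \begin{aligned}
\langle t_{0i} \rangle &= \frac{1}{32\pi G} \left\langle (\partial_0 \bar{h}_{cd}) \, \partial_i \bar{h}^{cd} - \frac{1}{2} (\partial_0 \bar{h}) \, \partial_i \bar{h} \right\rangle \\
&= \frac{1}{32\pi G} \left\langle (\partial_0 \bar{h}_{jk}) \, \partial_i \bar{h}_{jk} - 2 (\partial_0 \bar{h}_{0j}) \, \partial_i \bar{h}_{0j} + (\partial_0 \bar{h}_{00}) \, \partial_i \bar{h}_{00} - \frac{1}{2} (\partial_0 \bar{h}) \, \partial_i \bar{h} \right\rangle .
\end{aligned} \tag{7.67} $$
Since $\bar{h}_{jk}(t, \mathbf{x}) = \frac{2G}{r} \ddot{I}_{jk}(t - r)$ we have
$$ \partial_0 \bar{h}_{jk} = \frac{2G}{r} \, \dddot{I}_{jk}(t - r) , \tag{7.68} $$
$$ \partial_i \bar{h}_{jk} = \left( -\frac{2G}{r} \, \dddot{I}_{jk}(t - r) - \frac{2G}{r^2} \, \ddot{I}_{jk}(t - r) \right) \hat{x}_i . \tag{7.69} $$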
The second term is smaller than the first by a factor of τ /r ≪ 1, and hence negligible
for large enough r. Therefore,
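$$ -\frac{1}{32\pi G} \int d\Omega \, r^2 \langle (\partial_0 \bar{h}_{jk}) \, \partial_i \bar{h}_{jk} \rangle \, \hat{x}_i = \frac{G}{2} \langle \dddot{I}_{ij} \dddot{I}_{ij} \rangle_{t-r} . \tag{7.70} $$
On the RHS, the average is a time average, taken over an interval a ≫ λ ∼ τ centred on
the retarded time t − r.
Next we have $\bar{h}_{0j} = -(2G \hat{x}_k / r) \, \ddot{I}_{jk}(t - r)$. Hence,
$$ \partial_0 \bar{h}_{0j} = -\frac{2G \hat{x}_k}{r} \, \dddot{I}_{jk}(t - r) , \qquad \partial_i \bar{h}_{0j} \approx \frac{2G \hat{x}_k}{r} \, \dddot{I}_{jk}(t - r) \, \hat{x}_i , \tag{7.71} $$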
where in the second expression we have used that τ /r ≪ 1 to neglect the terms that arise
from differentiating x̂k /r. Hence,
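$$ -\frac{1}{32\pi G} \int d\Omega \, r^2 \langle -2 (\partial_0 \bar{h}_{0j}) \, \partial_i \bar{h}_{0j} \rangle \, \hat{x}_i = -\frac{G}{4\pi} \langle \dddot{I}_{jk} \dddot{I}_{jl} \rangle_{t-r} \int d\Omega \, \hat{x}_k \hat{x}_l . \tag{7.72} $$
Now recall that $\int d\Omega \, \hat{x}_k \hat{x}_l$ is isotropic (i.e., rotationally invariant) and hence it must be
equal to $\kappa \, \delta_{kl}$ for some constant κ. Taking the trace fixes $\kappa = \frac{4\pi}{3}$. Hence, the RHS above
is
$$ -\frac{G}{3} \langle \dddot{I}_{ij} \dddot{I}_{ij} \rangle_{t-r} . \tag{7.73} $$
Next we use that $\bar{h}_{00} = \frac{4GM}{r} + \frac{2G \hat{x}_j \hat{x}_k}{r} \ddot{I}_{jk}(t - r)$ to obtain
$$ \partial_0 \bar{h}_{00} = \frac{2G \hat{x}_j \hat{x}_k}{r} \, \dddot{I}_{jk}(t - r) , \tag{7.74} $$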
and
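$$ \partial_i \bar{h}_{00} \approx \left( -\frac{4GM}{r^2} - \frac{2G \hat{x}_j \hat{x}_k}{r} \, \dddot{I}_{jk}(t - r) \right) \hat{x}_i \approx -\frac{2G \hat{x}_j \hat{x}_k}{r} \, \dddot{I}_{jk}(t - r) \, \hat{x}_i , \tag{7.75} $$
where we have neglected terms arising from differentiating $\hat{x}_j \hat{x}_k / r$ in the first equality
because in the radiation zone (τ/r ≪ 1) they are negligible with respect to the second
term that we have kept. In the second equality we have neglected the first term in
brackets because this leads to a term in the integral proportional to $\langle \dddot{I}_{jk} \rangle$, which is the
average of a derivative and thus negligible. Hence we have
$$ -\frac{1}{32\pi G} \int d\Omega \, r^2 \langle (\partial_0 \bar{h}_{00}) \, \partial_i \bar{h}_{00} \rangle \, \hat{x}_i = \frac{G}{8\pi} \langle \dddot{I}_{ij} \dddot{I}_{kl} \rangle_{t-r} \, X_{ijkl} , \tag{7.76} $$
where
$$ X_{ijkl} = \int d\Omega \, \hat{x}_i \hat{x}_j \hat{x}_k \hat{x}_l , \tag{7.77} $$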

is another isotropic integral which we will evaluate shortly.


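Next we use $\bar{h} = \bar{h}_{jj} - \bar{h}_{00}$ and the above results to obtain
$$ \partial_0 \bar{h} = \frac{2G}{r} \, \dddot{I}_{jj}(t - r) - \frac{2G \hat{x}_j \hat{x}_k}{r} \, \dddot{I}_{jk}(t - r) , \tag{7.78} $$
$$ \partial_i \bar{h} = \left( -\frac{2G}{r} \, \dddot{I}_{jj}(t - r) + \frac{2G \hat{x}_j \hat{x}_k}{r} \, \dddot{I}_{jk}(t - r) \right) \hat{x}_i , \tag{7.79} $$
and hence
$$ -\frac{1}{32\pi G} \int d\Omega \, r^2 \left\langle -\frac{1}{2} (\partial_0 \bar{h}) \, \partial_i \bar{h} \right\rangle \hat{x}_i = G \left\langle -\frac{1}{4} \dddot{I}_{jj} \dddot{I}_{kk} + \frac{1}{6} \dddot{I}_{jj} \dddot{I}_{kk} - \frac{1}{16\pi} \dddot{I}_{ij} \dddot{I}_{kl} \, X_{ijkl} \right\rangle_{t-r} . \tag{7.80} $$
Putting everything together we have
$$ \langle P \rangle_t = G \left\langle \frac{1}{6} \dddot{I}_{ij} \dddot{I}_{ij} - \frac{1}{12} \dddot{I}_{ii} \dddot{I}_{jj} + \frac{1}{16\pi} \dddot{I}_{ij} \dddot{I}_{kl} \, X_{ijkl} \right\rangle_{t-r} . \tag{7.81} $$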

To evaluate Xijkl , we use the fact that any isotropic Cartesian tensor must be a
product of Kronecker’s delta factors, δij , and Levi-Civita totally anti-symmetric tensor
factors, ϵijk . Since Xijkl has rank 4, it can only be a product of δ’s, so it must be of the
form Xijkl = α δij δkl + β δik δjl + γ δil δjk , for some constants α, β and γ. The symmetry
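of $X_{ijkl}$ implies that α = β = γ. Taking the trace on the ij and on the kl indices fixes
$\alpha = \frac{4\pi}{15}$. Therefore, the final term above gives
$$ \frac{1}{60} \langle \dddot{I}_{ii} \dddot{I}_{jj} + 2 \, \dddot{I}_{ij} \dddot{I}_{ij} \rangle , \tag{7.82} $$
and hence
$$ \langle P \rangle_t = \frac{G}{5} \left\langle \dddot{I}_{ij} \dddot{I}_{ij} - \frac{1}{3} \dddot{I}_{ii} \dddot{I}_{jj} \right\rangle_{t-r} . \tag{7.83} $$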
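The two isotropic integrals used in this derivation, $\int d\Omega \, \hat{x}_k \hat{x}_l = \frac{4\pi}{3} \delta_{kl}$ and $X_{ijkl}$ with $\alpha = \frac{4\pi}{15}$, can be checked numerically. The following is a minimal sketch using a midpoint rule in spherical angles; the function name `sphere_integral` and the grid sizes are illustrative assumptions.

```python
import math

def sphere_integral(indices, n_theta=100, n_phi=200):
    """Approximate the integral over the unit sphere of n_{i1} n_{i2} ... n_{ik}.

    `indices` is a tuple of Cartesian indices (0 = x, 1 = y, 2 = z).
    """
    d_theta = math.pi / n_theta
    d_phi = 2.0 * math.pi / n_phi
    total = 0.0
    for a in range(n_theta):
        theta = (a + 0.5) * d_theta
        for b in range(n_phi):
            phi = (b + 0.5) * d_phi
            n = (math.sin(theta) * math.cos(phi),
                 math.sin(theta) * math.sin(phi),
                 math.cos(theta))
            term = math.sin(theta) * d_theta * d_phi  # area element dOmega
            for i in indices:
                term *= n[i]
            total += term
    return total

# Rank 2: kappa * delta_kl with kappa = 4 pi / 3
print(sphere_integral((0, 0)), 4.0 * math.pi / 3.0)
# Rank 4: X_xxyy = alpha and X_xxxx = 3 alpha, with alpha = 4 pi / 15
print(sphere_integral((0, 0, 1, 1)), 4.0 * math.pi / 15.0)
print(sphere_integral((0, 0, 0, 0)), 4.0 * math.pi / 5.0)
```

Components with an odd number of any single index, such as `(0, 1)`, should come out as zero up to rounding, as isotropy requires.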
Finally, considering the traceless part of the mass/energy quadrupole moment tensor $I_{ij}$,
$$ Q_{ij} = I_{ij} - \frac{1}{3} I_{kk} \, \delta_{ij} , \tag{7.84} $$
we have
$$ \langle P \rangle_t = \frac{G}{5} \langle \dddot{Q}_{ij} \dddot{Q}_{ij} \rangle_{t-r} . \tag{7.85} $$

Figure 7.2: Decrease of the period of the famous Hulse-Taylor binary pulsar over the
years due to the emission of gravitational waves. The solid curve is the GR prediction
calculated using the quadrupole formula (7.85). This was the first (indirect) experimental
evidence of the existence of gravitational waves.

This is the celebrated quadrupole formula for the power (i.e., energy loss) radiated via
gravitational wave emission. It is valid in the radiation zone far from a non-relativistic
source, i.e., for r ≫ τ ≫ d.
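The step from (7.83) to (7.85) follows from a short contraction, written out here for completeness. It uses only the definition (7.84) and $\delta_{ij} \delta_{ij} = 3$:

```latex
\begin{aligned}
\dddot{Q}_{ij} \dddot{Q}_{ij}
&= \Bigl( \dddot{I}_{ij} - \tfrac{1}{3} \dddot{I}_{kk} \, \delta_{ij} \Bigr)
   \Bigl( \dddot{I}_{ij} - \tfrac{1}{3} \dddot{I}_{ll} \, \delta_{ij} \Bigr) \\
&= \dddot{I}_{ij} \dddot{I}_{ij} - \tfrac{2}{3} \, \dddot{I}_{kk} \dddot{I}_{ll}
   + \tfrac{1}{9} \, \delta_{ij} \delta_{ij} \, \dddot{I}_{kk} \dddot{I}_{ll} \\
&= \dddot{I}_{ij} \dddot{I}_{ij} - \tfrac{1}{3} \, \dddot{I}_{ii} \dddot{I}_{jj} .
\end{aligned}
```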
We conclude that a body whose quadrupole tensor is varying in time will emit gravitational
radiation. A spherically symmetric body has $Q_{ij} = 0$ and hence it cannot
radiate. This is in agreement with Birkhoff's theorem, which asserts that the unique
spherically symmetric solution of the vacuum Einstein equation is the Schwarzschild solution.
Hence, the spacetime outside a spherically symmetric body is time independent
because it is described by the Schwarzschild solution.
Example: A circular binary system. Consider a binary system consisting of two black
holes of masses $m_1$ and $m_2$ moving around each other in a circular orbit of separation R on
the xy-plane, see Fig. 7.3. Because black holes are very small, we can approximate them
by point particles. Then, the position of the black holes as a function of time is
$$ \begin{aligned}
x_1^i &= \frac{\mu}{m_1} R \, \{ \cos(\omega t), \sin(\omega t), 0 \} , \\
x_2^i &= \frac{\mu}{m_2} R \, \{ -\cos(\omega t), -\sin(\omega t), 0 \} ,
\end{aligned} \tag{7.86} $$
where $R = |\vec{x}_1 - \vec{x}_2|$, $\mu = \frac{m_1 m_2}{m_1 + m_2}$ is the reduced mass, and ω is the angular frequency of the orbit.
In Newtonian gravity, the frequency ω of the orbit is given by
$$ GM = \omega^2 R^3 \quad \Rightarrow \quad \omega = \sqrt{\frac{GM}{R^3}} , \tag{7.87} $$
where $M = m_1 + m_2$ is the total mass.

Figure 7.3: Unequal mass black hole binary.

Since we are treating the black holes as point particles, we can straightforwardly write
down the mass density in terms of delta functions that localise it onto the black holes:

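$$ \rho = \left[ m_1 \, \delta(x - x_1) \, \delta(y - y_1) + m_2 \, \delta(x - x_2) \, \delta(y - y_2) \right] \delta(z) . \tag{7.88} $$

Now we can evaluate the components of the quadrupole moment tensor of the energy
density:
$$ \begin{aligned}
I_{xx} &= \int d^3x \, T_{00} \, x^2 \\
&= \int d^3x \, \rho \, x^2 = m_1 x_1^2 + m_2 x_2^2 \\
&= \mu^2 R^2 \left( \frac{1}{m_1} + \frac{1}{m_2} \right) \cos^2(\omega t) \\
&= \frac{\mu R^2}{2} \left( 1 + \cos(2 \omega t) \right) .
\end{aligned} \tag{7.89} $$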
Note that this result shows that the frequency of the gravitational wave is twice the orbital
frequency. In other words, for each cycle made by the binary motion, the gravitational
wave goes through two cycles and hence there are two maxima and two minima per orbit.
For this reason, gravitational waves are called quadrupolar waves.
The other non-vanishing components of the quadrupole moment tensor are

$$ I_{yy} = \frac{\mu R^2}{2} \left( 1 - \cos(2 \omega t) \right) , \qquad I_{xy} = \frac{\mu R^2}{2} \sin(2 \omega t) , \tag{7.90} $$

and the trace is

$$ I_{ii} = \mu R^2 . \tag{7.91} $$
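The closed forms (7.89)-(7.91) can be cross-checked by summing $m \, x^i x^j$ over the two point masses directly, with positions taken from (7.86). This is a sketch only; the function names and the sample values (arbitrary units) are illustrative assumptions.

```python
import math

def binary_quadrupole(m1, m2, R, omega, t):
    """I_ij as a direct sum of m x^i x^j over both point masses, positions from (7.86)."""
    mu = m1 * m2 / (m1 + m2)
    c, s = math.cos(omega * t), math.sin(omega * t)
    x1 = (mu / m1 * R * c, mu / m1 * R * s, 0.0)
    x2 = (-mu / m2 * R * c, -mu / m2 * R * s, 0.0)
    return [[m1 * x1[i] * x1[j] + m2 * x2[i] * x2[j] for j in range(3)]
            for i in range(3)]

def closed_form(m1, m2, R, omega, t):
    """(I_xx, I_yy, I_xy, trace) from the closed forms (7.89)-(7.91)."""
    mu = m1 * m2 / (m1 + m2)
    a = 0.5 * mu * R * R
    return (a * (1.0 + math.cos(2.0 * omega * t)),
            a * (1.0 - math.cos(2.0 * omega * t)),
            a * math.sin(2.0 * omega * t),
            mu * R * R)

# Illustrative values in arbitrary units (not taken from the notes).
args = (3.0, 1.0, 2.0, 0.7, 1.3)
I = binary_quadrupole(*args)
direct = (I[0][0], I[1][1], I[0][1], I[0][0] + I[1][1] + I[2][2])
print(direct)
print(closed_form(*args))
```

Both lines should agree to rounding error, and the trace is independent of t.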

Therefore, we can now write down the traceless part of the full quadrupole moment
tensor:
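$$ Q_{ij} = I_{ij} - \frac{1}{3} I_{kk} \, \delta_{ij} = \frac{\mu R^2}{2} \begin{pmatrix} \cos(2 \omega t) + \frac{1}{3} & \sin(2 \omega t) & 0 \\ \sin(2 \omega t) & -\cos(2 \omega t) + \frac{1}{3} & 0 \\ 0 & 0 & -\frac{2}{3} \end{pmatrix} . \tag{7.92} $$

The third time derivative of this object is

$$ \dddot{Q}_{ij} = 4 \mu R^2 \omega^3 \begin{pmatrix} \sin(2 \omega t) & -\cos(2 \omega t) & 0 \\ -\cos(2 \omega t) & -\sin(2 \omega t) & 0 \\ 0 & 0 & 0 \end{pmatrix} . \tag{7.93} $$

Plugging this expression into (7.85) and averaging over one period we get that the power
radiated in gravitational waves is

$$ \langle P \rangle = \frac{32}{5} \frac{G^4 m_1^2 m_2^2 (m_1 + m_2)}{c^5 R^5} , \tag{7.94} $$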
where we have reinstated all the constants. For $m_1 = 36 \, M_\odot$ and $m_2 = 29 \, M_\odot$ ($M_\odot = 2 \times 10^{30}$ kg is the mass of the Sun), and $R = 10^6$ light years, is ⟨P⟩ a large number?
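To put numbers into (7.94), one can evaluate it directly and, as a consistency check, rebuild it from (7.85), (7.87) and (7.93). The sketch below assumes SI units and rounded constants; the function names are hypothetical, and the separation used in the usage line is an arbitrary illustrative value, so no particular output is claimed.

```python
import math

G = 6.674e-11    # gravitational constant, m^3 kg^-1 s^-2 (rounded)
C = 2.998e8      # speed of light, m s^-1 (rounded)
M_SUN = 2.0e30   # solar mass in kg, the rounded value quoted in the text

def gw_power(m1, m2, R):
    """Average radiated power (7.94) in watts; masses in kg, separation R in metres."""
    return 32.0 * G**4 * m1**2 * m2**2 * (m1 + m2) / (5.0 * C**5 * R**5)

def gw_power_from_quadrupole(m1, m2, R):
    """Same quantity from (7.85), (7.87) and (7.93), with the factor 1/c^5 restored."""
    mu = m1 * m2 / (m1 + m2)
    omega = math.sqrt(G * (m1 + m2) / R**3)
    # From (7.93): the full contraction of the third derivative with itself is
    # 16 mu^2 R^4 omega^6 * (2 sin^2 + 2 cos^2) = 32 mu^2 R^4 omega^6, constant in time.
    q3_squared = 32.0 * mu**2 * R**4 * omega**6
    return G / (5.0 * C**5) * q3_squared

m1, m2 = 36.0 * M_SUN, 29.0 * M_SUN
R_example = 1.0e6  # metres; hypothetical separation chosen only for illustration
print(gw_power(m1, m2, R_example), gw_power_from_quadrupole(m1, m2, R_example))
```

Because the power scales as $R^{-5}$, doubling the separation reduces it by a factor of 32, which is why the answer to the question above depends so strongly on R.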
The loss of energy results in the two black holes coming closer together, a process
that is called inspiral. Using the virial theorem, the total energy of the binary is given
by,
$$ E = -\frac{G m_1 m_2}{2R} . \tag{7.95} $$
Since the quadrupole formula (7.94) gives the rate at which the binary loses energy,
$\frac{dE}{dt} = -\langle P \rangle$, using the chain rule, we can calculate the rate of change of the
separation between the two black holes:
$$ \frac{dR}{dt} = \frac{dR}{dE} \frac{dE}{dt} = -\frac{2R^2}{G m_1 m_2} \left( \frac{32 G^4 (m_1 m_2)^2 (m_1 + m_2)}{5 c^5 R^5} \right) = -\frac{64 G^3 m_1 m_2 (m_1 + m_2)}{5 c^5 R^3} . \tag{7.96} $$
Note that the negative sign indicates that the orbit is shrinking, so eventually the two
black holes will merge. Given some initial separation R0 , we can calculate the time to
merger by integrating (7.96):
$$ \Delta t_{\rm merger} = -\frac{5 c^5}{64 G^3 m_1 m_2 (m_1 + m_2)} \int_{R_0}^{0} dR \, R^3 = \frac{5 c^5 R_0^4}{256 G^3 m_1 m_2 (m_1 + m_2)} . \tag{7.97} $$
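The integration leading to (7.97) can be checked numerically by summing dR/(dR/dt) from $R_0$ down to 0 with the rate (7.96). This is a sketch under the same Newtonian, quasi-circular assumptions as the text, in SI units with rounded constants; the function names and the sample values are illustrative assumptions.

```python
import math

G = 6.674e-11    # gravitational constant, m^3 kg^-1 s^-2 (rounded)
C = 2.998e8      # speed of light, m s^-1 (rounded)
M_SUN = 2.0e30   # solar mass in kg, the rounded value quoted in the text

def inspiral_rate(m1, m2, R):
    """dR/dt from (7.96) in m/s; negative because the orbit shrinks."""
    return -64.0 * G**3 * m1 * m2 * (m1 + m2) / (5.0 * C**5 * R**3)

def merger_time(m1, m2, R0):
    """Closed form (7.97), in seconds."""
    return 5.0 * C**5 * R0**4 / (256.0 * G**3 * m1 * m2 * (m1 + m2))

def merger_time_numeric(m1, m2, R0, steps=1000):
    """Midpoint-rule estimate of the integral of dR / (dR/dt) from R0 down to 0."""
    dR = R0 / steps
    return sum(-dR / inspiral_rate(m1, m2, (k + 0.5) * dR) for k in range(steps))

m1, m2 = 36.0 * M_SUN, 29.0 * M_SUN
R0_example = 1.0e6  # metres; hypothetical initial separation for illustration only
print(merger_time(m1, m2, R0_example), merger_time_numeric(m1, m2, R0_example))
```

The two values should agree closely, and the $R_0^4$ dependence means that doubling the initial separation lengthens the time to merger by a factor of 16.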
