Classical Fields

This document provides an introduction to relativistic covariance through classical fields. It begins by defining inertial coordinate systems and Lorentz transformations between observers in relative motion. Vectors are introduced as having four components with the zeroth being ct. Lorentz transformations mix space and time components, preserving the length of four-vectors. Covariant and contravariant vectors are defined based on their transformation properties, with the Minkowski metric relating the two. Exercises are provided to work through key concepts.

Uploaded by

Hari Bilasi
Copyright
© Attribution Non-Commercial (BY-NC)

Classical Fields

Part I:
Relativistic Covariance
Prof. J.J. Binney
Oxford University
Michaelmas Term 2005

Books: (i) Introduction to Einstein's Relativity, Ray d'Inverno (OUP); (ii) The Classical Theory of Fields, L.D. Landau & E.M. Lifshitz (Pergamon); (iii) Gravitation and Cosmology, S. Weinberg (Wiley)

Vacation work: Study §1 Relativistic Covariance and work the eight embedded Exercises.
1 Relativistic Covariance
Observers who move relative to one another do not always agree about the values of
quantities, such as speed, mass, energy etc, associated with the same physical system.
The special theory of relativity tells us how we may predict the values measured by
any observer once we know the values assigned by one particular observer, for example
ourselves.
Special relativity teaches us to think of experience as being made up of ‘events’,
each with a definite location in the four-dimensional continuum of spacetime. Any
given observer assigns to each event a unique 4-tuple of numbers (t, x, y, z). Of course
he can do this in many, many ways. But special relativity claims that there are certain
specially favoured systems for assigning coordinates to events, the so-called inertial
coordinate systems. O chooses one inertial system and another observer, O′, sets up a different one. But according to special relativity the coordinates $(t', x', y', z')$ that O′ assigns to any event can be related to O's coordinates $(t, x, y, z)$ of the same event by
$$\begin{pmatrix} ct' \\ x' \\ y' \\ z' \end{pmatrix} = \begin{pmatrix} ct_0 \\ x_0 \\ y_0 \\ z_0 \end{pmatrix} + L \cdot \begin{pmatrix} ct \\ x \\ y \\ z \end{pmatrix}, \tag{1.1}$$

where c is the speed of light and $(t_0, x_0, y_0, z_0)$ is a set of numbers characteristic of the two observers, as is the 4 × 4 matrix L.
Clearly, $(t_0, x_0, y_0, z_0)$ are the coordinates O′ assigns to the event that marks the origin of O's coordinates. For simplicity we shall assume that $(t_0, x_0, y_0, z_0) = 0$. In general L can be represented as the product of matrices generating a rotation, a boost parallel to a coordinate direction and a second rotation: $L = R' \cdot L_0 \cdot R$, where R rotates the coordinate axes so as to align the boost direction with a coordinate direction, $L_0$ effects the boost along the given axis and R′ rotates the coordinates to any desired final orientation. If R is chosen such that the x-axis becomes the boost direction, $L_0$ has the form
$$L_0 = \begin{pmatrix} \gamma & -\beta\gamma & 0 & 0 \\ -\beta\gamma & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \quad\text{where}\quad \beta \equiv v/c, \quad \gamma \equiv 1/\sqrt{1-\beta^2}. \tag{1.2}$$

For simplicity we confine ourselves to observers whose spatial coordinate systems are aligned, and whose relative motion lies along their (mutually parallel) x-axes. Then in (1.1) $L = L_0$ and we get the familiar equations of a Lorentz transformation:
$$\begin{aligned} t' &= \gamma t - \gamma v x/c^2 \\ x' &= \gamma x - \gamma v t \\ y' &= y \\ z' &= z \end{aligned} \tag{1.3}$$
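As a quick numerical check of (1.2) and (1.3), the following minimal sketch applies the boost matrix to one event and compares $-(ct)^2 + x^2 + y^2 + z^2$ before and after. The function names and the sample event are illustrative choices, not part of the notes.

```python
import math

def boost(beta):
    """Boost matrix of eq. (1.2) for relative speed beta = v/c along x."""
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    return [
        [gamma, -beta * gamma, 0.0, 0.0],
        [-beta * gamma, gamma, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]

def apply(matrix, vector):
    """Multiply a 4x4 matrix into the column (ct, x, y, z)."""
    return [sum(matrix[i][j] * vector[j] for j in range(4)) for i in range(4)]

def interval(vector):
    """Return -(ct)^2 + x^2 + y^2 + z^2 for the column (ct, x, y, z)."""
    ct, x, y, z = vector
    return -ct * ct + x * x + y * y + z * z

event = [5.0, 3.0, 1.0, -2.0]            # (ct, x, y, z), arbitrary length units
event_primed = apply(boost(0.6), event)  # coordinates assigned by the moving observer

print(event_primed)
print(interval(event), interval(event_primed))
```

The two printed interval values should agree to rounding error, whatever value of beta (with |beta| < 1) is used.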

4-vectors. Lorentz transformations mix up space and time, so it is useful to define new coordinates which all have dimensions of length. We write $x^0 \equiv ct$, $x^1 \equiv x$, $x^2 \equiv y$, $x^3 \equiv z$, and refer to a general component of the 4-vector $(x^0, x^1, x^2, x^3)$ as $x^\mu$. (The reason for labelling the components with superscripts rather than subscripts will emerge shortly.) Then we write a Lorentz transformation as
$$x'^\mu = \Lambda^\mu{}_\nu x^\nu, \tag{1.4a}$$
where
$$\Lambda \equiv \begin{pmatrix} \gamma & -\beta\gamma & 0 & 0 \\ -\beta\gamma & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}. \tag{1.4b}$$
In (1.4a) the Einstein summation convention is being used in that the summation sign $\sum_{\nu=0}^{3}$ has been omitted for brevity. You know it's really there because ν appears twice on the right-hand side of the equation, once up and once down.
Why do we write the row index of Λ as a superscript and the column index as a
subscript?
A key property of a Lorentz transformation is that $-(ct')^2 + x'^2 + y'^2 + z'^2 = -(ct)^2 + x^2 + y^2 + z^2$. This is analogous to the fact that if two vectors $\boldsymbol{a}$ and $\boldsymbol{a}'$ are related by a rotation matrix, then $a_x'^2 + a_y'^2 + a_z'^2 = a_x^2 + a_y^2 + a_z^2$. So a Lorentz transformation is a sort of modified, four-dimensional rotation. When we rotate a vector $\boldsymbol{a}$ we like to say that the length $|\boldsymbol{a}|$ is invariant (i.e., stays constant). Analogously we define the length of the 4-vector x to be
$$|x| \equiv -(x^0)^2 + (x^1)^2 + (x^2)^2 + (x^3)^2. \tag{1.5}$$
Notes:
(i) We don’t extract a square root because we have no guarantee that |x| ≥ 0.
(ii) 4-vectors that have negative lengths are called time-like, while those with positive
lengths are space-like. Vectors with zero length are said to be null.
(iii) Every book on relativity uses a different convention. The sign of the lengths of
space-like vectors is called the “signature of the metric”.
The lengths of 4-vectors are sufficiently important for it to be useful to have a
way of writing them that does not involve writing out all the components explicitly.
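To achieve this we introduce this matrix, called the Minkowski metric:
$$\eta \equiv \begin{pmatrix} -1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}. \tag{1.6}$$
Then we have
$$|x| = x \cdot \eta \cdot x, \tag{1.7a}$$
or in component form
$$|x| = x^\mu \eta_{\mu\nu} x^\nu. \tag{1.7b}$$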
The Einstein convention is here being used to drop two summation signs. We write
both of η’s indices as subscripts so that each sum is over one up and one down index.
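The double sum in (1.7b) and the time-like / space-like / null classification from the Notes can be sketched in a few lines. This is only an illustration under the sign convention of (1.6); the names `ETA`, `length` and `classify` and the tolerance are assumptions made for the example.

```python
# Minkowski metric of eq. (1.6), signature (-, +, +, +)
ETA = [
    [-1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

def length(x):
    """Length |x| = x^mu eta_{mu nu} x^nu, with both sums written out."""
    return sum(x[m] * ETA[m][n] * x[n] for m in range(4) for n in range(4))

def classify(x, tol=1e-12):
    """Label an up-vector by the sign of its length (no square root is taken)."""
    s = length(x)
    if abs(s) <= tol:
        return "null"
    return "time-like" if s < 0 else "space-like"

print(length([2.0, 1.0, 0.0, 0.0]), classify([2.0, 1.0, 0.0, 0.0]))
print(length([1.0, 1.0, 0.0, 0.0]), classify([1.0, 1.0, 0.0, 0.0]))
print(length([1.0, 2.0, 0.0, 0.0]), classify([1.0, 2.0, 0.0, 0.0]))
```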

Covariant and contravariant vectors. We write the result of matrix multiplication of x by η as
$$x_\mu \equiv \eta_{\mu\nu} x^\nu.$$
We have $x_0 = -x^0 = -ct$, $x_1 = x^1$, $x_2 = x^2$ and $x_3 = x^3$. Thus the length of x is
$$x_\mu x^\mu = -c^2 t^2 + x^2 + y^2 + z^2.$$
Notice that here as everywhere else, we are summing over one up and one down index. In order to stick rigidly to this rule, we define
$$\eta^{\mu\nu} \equiv \eta_{\mu\nu} \equiv \begin{pmatrix} -1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}. \tag{1.8}$$

Note:
We have $\eta^{\mu\gamma}\eta_{\gamma\nu} = \delta^\mu_\nu$, or in matrix form η · η = I, where I and $\delta^\mu_\nu$ are two ways of writing the 4 × 4 identity matrix. Also $\eta^{\mu\nu} = \eta^{\mu\gamma}\delta_\gamma^\nu$, so in a sense η is merely the up-up and down-down forms of the identity matrix.
From $x_\mu$ we can recover $x^\mu$:
$$x^\mu = \eta^{\mu\nu} x_\nu. \tag{1.9}$$

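$x_\mu$ is a 4-vector, but of a slightly different type than $x^\mu$, because under a Lorentz transformation we have
$$\begin{aligned} x'_\mu &= \eta_{\mu\nu} x'^\nu = \eta_{\mu\nu}\Lambda^\nu{}_\kappa x^\kappa = \eta_{\mu\nu}\Lambda^\nu{}_\kappa \eta^{\kappa\lambda} x_\lambda \\ &= \begin{pmatrix} -1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \gamma & -\beta\gamma & 0 & 0 \\ -\beta\gamma & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} -1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \end{pmatrix} \\ &= \begin{pmatrix} \gamma & \beta\gamma & 0 & 0 \\ \beta\gamma & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \end{pmatrix} \equiv \Lambda_\mu{}^\nu x_\nu, \end{aligned} \tag{1.10}$$
where we have defined a new matrix
$$\Lambda_\mu{}^\lambda \equiv \eta_{\mu\nu}\Lambda^\nu{}_\kappa \eta^{\kappa\lambda}. \tag{1.11}$$
Notice that the transpose of $\Lambda_\mu{}^\nu$ is the inverse of $\Lambda^\mu{}_\nu$:
$$\Lambda_\mu{}^\kappa \Lambda^\mu{}_\nu = \delta^\kappa_\nu, \tag{1.12}$$
where we have again written the 4 × 4 identity matrix as $\delta^\kappa_\nu$.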
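The relations (1.11) and (1.12) are easy to confirm numerically for the boost (1.4b). The sketch below builds $\Lambda_\mu{}^\nu = \eta\,\Lambda\,\eta$ and contracts it with $\Lambda^\mu{}_\nu$ over the first index; the helper names and the value β = 0.8 are arbitrary choices for illustration.

```python
import math

ETA = [[-1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0],
       [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]

def boost(beta):
    """Lambda^mu_nu of eq. (1.4b): row index mu, column index nu."""
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    return [[gamma, -beta * gamma, 0.0, 0.0],
            [-beta * gamma, gamma, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def matmul(a, b):
    """Ordinary 4x4 matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

lam_up = boost(0.8)                           # Lambda^mu_nu
lam_down = matmul(matmul(ETA, lam_up), ETA)   # Lambda_mu^nu, eq. (1.11)

# Eq. (1.12): sum over the FIRST index of both matrices (transpose times matrix)
product = [[sum(lam_down[m][k] * lam_up[m][n] for m in range(4))
            for n in range(4)] for k in range(4)]

for row in product:
    print([round(entry, 12) for entry in row])
```

The printed matrix should be the 4 × 4 identity to rounding error.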



Exercise (1):
Obtain (1.12) from the requirement that for any two vectors x, y, we have $x'_\mu y'^\mu = x_\mu y^\mu$.
Vectors with their indices below are called covariant ($x_\mu$). Vectors with indices above are called contravariant ($x^\mu$). I shall call them down and up vectors. The operation of setting two indices equal and summing from 0 to 3 is called contraction. In a contraction one index must be up and one down. Quantities like $\sum_\mu x^\mu x^\mu$ have nothing to do with physics. An important motivation for writing $x^\mu$ rather than x is to distinguish the up from the down form of x. Often an expression is equally valid for up or down vectors provided the basic rules are obeyed, and then it is neater to use conventional vector notation than to stick in indices. For example, if a and b are vectors and M is a matrix, we can interpret a = M · b as $a^\mu = M^{\mu\nu} b_\nu$, as $a_\mu = M_{\mu\nu} b^\nu$, or in yet other ways. But if you ever express a 4-vector in component form, you must come clean and say whether you're giving the up or the down vector, as in $x^\mu = (ct, x, y, z)$.

According to special relativity, all quantities of physical interest can be grouped into n-tuples.

1.1 1-tuples (4-scalars)

On some things all observers agree, for example the charge and total spin of an electron. These quantities are called 4-scalars or relativistic invariants. The length of a 4-vector is a 4-scalar.

1.2 4-tuples (4-vectors)

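If O measures the wave-vector and frequency of a photon to be $\boldsymbol{k}$ and ω, then an observer O′ who moves at speed v along O's x-axis measures wave-vector $\boldsymbol{k}'$ and frequency ω′ given by
$$\begin{pmatrix} \omega'/c \\ k'_x \\ k'_y \\ k'_z \end{pmatrix} = \begin{pmatrix} \gamma & -\beta\gamma & 0 & 0 \\ -\beta\gamma & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \omega/c \\ k_x \\ k_y \\ k_z \end{pmatrix}. \tag{1.13a}$$
The matrix form of this equation is
$$\begin{pmatrix} \omega'/c \\ \boldsymbol{k}' \end{pmatrix} = \Lambda \cdot \begin{pmatrix} \omega/c \\ \boldsymbol{k} \end{pmatrix} \quad\text{where}\quad \Lambda \equiv \begin{pmatrix} \gamma & -\beta\gamma & 0 & 0 \\ -\beta\gamma & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}. \tag{1.13b}$$

Notes:
(i) The Lorentz transformation matrix Λ is dimensionless, so ω has to be divided by c to give the same dimensions as $\boldsymbol{k}$ before being put into the last place of a 4-vector with $\boldsymbol{k}$.
(ii) Vectors written in italic boldface ($\boldsymbol{k}$) are 3-vectors, while those written in Roman boldface ($\mathbf{k}$) are 4-vectors.
If we define $k^0 \equiv \omega/c$, then
$$\mathbf{k}' = \Lambda \cdot \mathbf{k} \quad\text{i.e.,}\quad k'^\mu = \Lambda^\mu{}_\nu k^\nu. \tag{1.14}$$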

Exercise (2):
Determine whether the photon is blue or red shifted between its emission by O and its detection by O′. Relate this to the question of whether O′ is approaching or receding from O.
The length of a photon's 4-vector is the scalar
$$|\mathbf{k}| \equiv -(k^0)^2 + (k^1)^2 + (k^2)^2 + (k^3)^2 = -\frac{\omega^2}{c^2} + |\boldsymbol{k}|^2 = 0.$$
One can prove that this really is a scalar by brute force:
$$\begin{aligned} |\mathbf{k}'| &= -(k'^0)^2 + (k'^1)^2 + (k'^2)^2 + (k'^3)^2 \\ &= -\Big(\gamma\frac{\omega}{c} - \beta\gamma k^1\Big)^2 + \Big(-\beta\gamma\frac{\omega}{c} + \gamma k^1\Big)^2 + (k^2)^2 + (k^3)^2 \\ &= -\gamma^2\big(1-\beta^2\big)\frac{\omega^2}{c^2} + \gamma^2\big(1-\beta^2\big)(k^1)^2 + (k^2)^2 + (k^3)^2 \\ &= -(k^0)^2 + (k^1)^2 + (k^2)^2 + (k^3)^2. \end{aligned}$$
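The brute-force argument above can also be checked numerically. The sketch below boosts a photon 4-vector with (1.13a) and confirms that its length stays zero; units with ω/c = 1, the angle of 60° and β = 0.5 are arbitrary assumptions for the example.

```python
import math

def boost(beta):
    """Boost matrix of eq. (1.13b) for speed beta = v/c along x."""
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    return [[gamma, -beta * gamma, 0.0, 0.0],
            [-beta * gamma, gamma, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def apply(matrix, vector):
    """Multiply a 4x4 matrix into a 4-component column."""
    return [sum(matrix[i][j] * vector[j] for j in range(4)) for i in range(4)]

def length(k):
    """-(k^0)^2 + (k^1)^2 + (k^2)^2 + (k^3)^2 for an up-vector k."""
    return -k[0] ** 2 + k[1] ** 2 + k[2] ** 2 + k[3] ** 2

BETA = 0.5
THETA = math.radians(60.0)   # angle between the photon direction and the x-axis

# Up-vector (omega/c, kx, ky, kz) with |k| = omega/c, so the length is zero
photon = [1.0, math.cos(THETA), math.sin(THETA), 0.0]
photon_primed = apply(boost(BETA), photon)

print(length(photon), length(photon_primed))
print(photon_primed[0] / photon[0])   # ratio omega'/omega seen by the moving observer
```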

Another familiar 4-tuple: if observer O measures energy E and momentum $\boldsymbol{p}$ for some particle, then O′ will measure E′ and $\boldsymbol{p}'$ given by
$$\begin{pmatrix} E'/c \\ \boldsymbol{p}' \end{pmatrix} = \Lambda \cdot \begin{pmatrix} E/c \\ \boldsymbol{p} \end{pmatrix}, \tag{1.15}$$
or setting $p^0 \equiv E/c$, we have $p'^\mu = \Lambda^\mu{}_\nu p^\nu$.

The length of the momentum-energy 4-vector of a particle of rest mass $m_0 \neq 0$ is just $-c^2$ times the square of its rest mass $m_0$. We show this by arguing that it doesn't matter in whose frame we evaluate a scalar. We choose the particle's rest frame. Then $\boldsymbol{p} = 0$ and $E = cp^0 = m_0 c^2$, so
$$-(p^0)^2 + (p^1)^2 + (p^2)^2 + (p^3)^2 = -m_0^2 c^2.$$

1.3 6-tuples (antisymmetric 2nd rank tensors)


If the electric and magnetic fields measured by O are arranged into the antisymmetric matrix F,
$$F^{\mu\nu} \equiv \begin{pmatrix} 0 & E_x/c & E_y/c & E_z/c \\ -E_x/c & 0 & B_z & -B_y \\ -E_y/c & -B_z & 0 & B_x \\ -E_z/c & B_y & -B_x & 0 \end{pmatrix} \quad\text{(SI units)}, \tag{1.16}$$
then O′ will measure $\boldsymbol{E}'$ and $\boldsymbol{B}'$ as
$$\begin{pmatrix} 0 & E'_x/c & E'_y/c & E'_z/c \\ -E'_x/c & 0 & B'_z & -B'_y \\ -E'_y/c & -B'_z & 0 & B'_x \\ -E'_z/c & B'_y & -B'_x & 0 \end{pmatrix} \equiv F'^{\mu\nu} = \Lambda^\mu{}_\kappa \Lambda^\nu{}_\lambda F^{\kappa\lambda}. \tag{1.17}$$

Note that $F^{\mu\nu}$ transforms as if it were the product $p^\mu p^\nu$ of two up-vectors (which it isn't). Objects that transform in this way are called second-rank tensors. F is called the Maxwell field tensor.

Exercise (3):
Transform $F^{\kappa\lambda}$ with the matrix $\Lambda^\mu{}_\nu$ defined by (1.13b) to show that an observer who moves at speed v down the x-axis of an observer who sees fields $\boldsymbol{E} = (E_x, E_y, 0)$ and $\boldsymbol{B} = 0$, perceives fields $\boldsymbol{E}' = (E_x, \gamma E_y, 0)$ and $\boldsymbol{B}' = (0, 0, \gamma v E_y/c^2)$. [Hint: since Λ is symmetric, we can write F′ = Λ·F·Λ.] Hence deduce the general rules $\boldsymbol{E}'_\parallel = \boldsymbol{E}_\parallel$, $\boldsymbol{E}'_\perp = \gamma(\boldsymbol{E}_\perp + \boldsymbol{v} \times \boldsymbol{B})$, $\boldsymbol{B}'_\parallel = \boldsymbol{B}_\parallel$, $\boldsymbol{B}'_\perp = \gamma(\boldsymbol{B}_\perp - \boldsymbol{v} \times \boldsymbol{E}/c^2)$. Verify that $(B^2 - E^2/c^2) = (B'^2 - E'^2/c^2)$.
Some 6-tuples correspond to elements of area. This correspondence works as
follows. With any two displacements, say u and v, we associate the parallelogram
bounded by u and v. Information about the size and orientation of this parallelogram
is conveyed by the antisymmetric tensor $S^{\alpha\beta} \equiv u^\alpha v^\beta - u^\beta v^\alpha$; in particular, if u = v,
then S = 0. S has fewer degrees of freedom than the eight numbers involved in u and
v because we can add to u any multiple of v without affecting S, and vice versa for v
and u.
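The two properties just stated, that S vanishes when u = v and that S is unchanged when a multiple of v is added to u, can be seen directly in a small sketch. The component values below are arbitrary, and `area_tensor` is a name chosen only for this illustration.

```python
def area_tensor(u, v):
    """Antisymmetric tensor S^{alpha beta} = u^alpha v^beta - u^beta v^alpha."""
    return [[u[a] * v[b] - u[b] * v[a] for b in range(4)] for a in range(4)]

u = [1.0, 2.0, 0.5, -1.0]
v = [0.0, 1.0, 3.0, 2.0]

s_original = area_tensor(u, v)

# Add a multiple of v to u: the parallelogram is sheared but S is unchanged
u_sheared = [u[a] + 2.5 * v[a] for a in range(4)]
s_sheared = area_tensor(u_sheared, v)

# Degenerate parallelogram: both sides equal, so S vanishes identically
s_degenerate = area_tensor(u, u)

print(s_original == s_sheared)
print(all(entry == 0.0 for row in s_degenerate for entry in row))
```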

Exercise (4):
Consider the transformation $u \to u' = au + bv$, $v \to v' = cu + dv$ with the corresponding mapping S → S′. Show that the equation S′ = S imposes one constraint on the numbers a, b, c, d. Hence only 8 − 3 = 5 numbers are needed to specify S.
Give a geometrical interpretation of this result.
In three-space the size and orientation of a parallelogram may be specified by
giving the magnitude and direction of the normal. Hence in three-space full infor-
mation about an antisymmetric 2nd rank tensor can be packed into the three com-
ponents of the 3-vector which we call the cross-product of the parallelogram’s sides.
In four-dimensional spacetime each parallelogram has a magnitude and two mutually
perpendicular normals, requiring five numbers for its full specification. Consequently
there is no direct analogue of the cross product and we must represent areas directly
with antisymmetric tensors.

Exercise (5):
Relate the above statements to the number of independent components of an
antisymmetric n × n matrix for n = 2, 3, 4.

A physically interesting 6-tuple that describes an area is the tensor $(x^\mu p^\nu - x^\nu p^\mu)$ formed from the space-time coordinate vector $x^\mu = (ct, x, y, z)$ and the 4-momentum of a particle. If the angular momentum about the origin is $\boldsymbol{L}$, we have
$$H^{\mu\nu} \equiv (x^\mu p^\nu - x^\nu p^\mu) = \begin{pmatrix} 0 & \cdot & \cdot & \cdot \\ c(xE/c^2 - tp_x) & 0 & \cdot & \cdot \\ c(yE/c^2 - tp_y) & -L_z & 0 & \cdot \\ c(zE/c^2 - tp_z) & L_y & -L_x & 0 \end{pmatrix}, \tag{1.18}$$
where the dots above the diagonal stand for minus the quantities in the lower left triangle of the matrix. The numbers in the first column of this matrix give mc times the particle's initial position vector.
With every 6-tuple we get two free scalars. If the 6-tuple is of the form $(u^\alpha v^\beta - u^\beta v^\alpha)$, then one of these is twice the squared magnitude of the corresponding parallelogram:
$$\begin{aligned} S^{\mu\nu}(\eta_{\mu\kappa}\eta_{\nu\lambda}S^{\kappa\lambda}) \equiv S^{\mu\nu}S_{\mu\nu} &= -\mathrm{Tr}\, S \cdot S \\ &= (u^\mu v^\nu - u^\nu v^\mu)(u_\mu v_\nu - u_\nu v_\mu) = 2[|u||v| - (u \cdot v)^2]. \end{aligned}$$

Note:
Here by Tr M we mean $M_\alpha{}^\alpha = M^\alpha{}_\alpha$. That is, the sum implied by Tr must always be over one up and one down index.
Evaluation in the particle's rest frame shows that the scalar $\tfrac{1}{2}H_{\mu\nu}H^{\mu\nu} = [|x||p| - (x \cdot p)^2] = -(m_0 c r_0)^2$, where $r_0$ is the distance (in the rest frame) between the particle and the origin at t = 0.
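It is interesting to evaluate this same scalar for the Maxwell field tensor. Straightforward matrix multiplication shows that the down-down shadow of $F^{\mu\nu}$ is¹
$$F_{\mu\nu} \equiv \begin{pmatrix} 0 & -E_x/c & -E_y/c & -E_z/c \\ E_x/c & 0 & B_z & -B_y \\ E_y/c & -B_z & 0 & B_x \\ E_z/c & B_y & -B_x & 0 \end{pmatrix} \quad\text{(SI units)}. \tag{1.19}$$
Multiplying each element of $F_{\mu\nu}$ by the corresponding element of $F^{\mu\nu}$ we find
$$\begin{aligned} m \equiv \tfrac{1}{2}F_{\mu\nu}F^{\mu\nu} &= -\tfrac{1}{2}\mathrm{Tr}\, F \cdot F \\ &= \tfrac{1}{2}\sum (\text{each element of } F_{\mu\nu}) \times (\text{corresponding element of } F^{\mu\nu}) \\ &= (B^2 - E^2/c^2). \end{aligned} \tag{1.20}$$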

To extract another scalar from a 6-tuple we need to introduce the Levi-Civita symbol:
$$\epsilon^{\alpha\beta\gamma\delta} = \begin{cases} +1 & \text{if } \alpha\beta\gamma\delta \text{ is an even permutation of } 0123 \\ -1 & \text{if } \alpha\beta\gamma\delta \text{ is an odd permutation of } 0123 \\ 0 & \text{otherwise.} \end{cases} \tag{1.21}$$

¹ It is worth remembering that in special relativity the lowering operation only changes the sign of the mixed space-time components.

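Note:
Whereas when n is odd, the cyclic interchange $i_1 \to i_2 \to \ldots \to i_{n-1} \to i_n \to i_1$ is an even permutation of the $i_k$, when n is even, this permutation is odd. (To prove this exchange $i_1$ and $i_n$ and then make n − 2 exchanges to work $i_1$ back to the second place.) So whereas for 3-dimensional tensors $\epsilon_{jki} = \epsilon_{ijk}$, we now have $\epsilon^{\beta\gamma\delta\alpha} = -\epsilon^{\alpha\beta\gamma\delta}$.
$\epsilon^{\alpha\beta\gamma\delta}$ allows us to form the dual $\bar{F}$ of F:
$$\bar{F}^{\alpha\beta} \equiv \tfrac{1}{2}\epsilon^{\alpha\beta\gamma\delta}F_{\gamma\delta} = \begin{pmatrix} 0 & B_x & B_y & B_z \\ -B_x & 0 & -E_z/c & E_y/c \\ -B_y & E_z/c & 0 & -E_x/c \\ -B_z & -E_y/c & E_x/c & 0 \end{pmatrix}. \tag{1.22}$$

$\bar{F}$ can be obtained from F by the transformation $\boldsymbol{E}/c \to \boldsymbol{B}$, $\boldsymbol{B} \to -\boldsymbol{E}/c$. The other scalar is the trace of the product of F with its dual:
$$\begin{aligned} f \equiv \mathrm{Tr}\, F \cdot \bar{F} &= -\sum (\text{each element of } F_{\alpha\beta}) \times (\text{corresponding element of } \bar{F}^{\alpha\beta}) \\ &= \frac{4}{c}\, \boldsymbol{E} \cdot \boldsymbol{B}. \end{aligned} \tag{1.23}$$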
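Both field scalars can be reproduced numerically from (1.16), (1.19), (1.21) and (1.22). The sketch below takes E/c and B as plain 3-component lists (so no value of c is needed), builds the dual with an explicit permutation-parity routine, and evaluates (1.20) and (1.23). All function names and the sample field values are assumptions made for the illustration.

```python
from itertools import permutations

ETA_DIAG = [-1.0, 1.0, 1.0, 1.0]

def field_tensor(e_over_c, b):
    """F^{mu nu} of eq. (1.16), built from E/c and B."""
    ex, ey, ez = e_over_c
    bx, by, bz = b
    return [[0.0, ex, ey, ez],
            [-ex, 0.0, bz, -by],
            [-ey, -bz, 0.0, bx],
            [-ez, by, -bx, 0.0]]

def lower_both(f_up):
    """F_{mu nu}: only the mixed space-time components change sign."""
    return [[ETA_DIAG[m] * ETA_DIAG[n] * f_up[m][n] for n in range(4)]
            for m in range(4)]

def parity(perm):
    """+1 for an even permutation of (0, 1, 2, 3), -1 for an odd one."""
    p = list(perm)
    sign = 1
    for i in range(len(p)):
        while p[i] != i:          # sort by transpositions, counting each swap
            j = p[i]
            p[i], p[j] = p[j], p[i]
            sign = -sign
    return sign

def dual(f_down):
    """Dual tensor of eq. (1.22): (1/2) epsilon^{abgd} F_{gd}."""
    result = [[0.0] * 4 for _ in range(4)]
    for perm in permutations(range(4)):
        a, b, g, d = perm
        result[a][b] += 0.5 * parity(perm) * f_down[g][d]
    return result

e_over_c = [1.0, 2.0, 3.0]
b_field = [0.5, -1.0, 2.0]

f_up = field_tensor(e_over_c, b_field)
f_down = lower_both(f_up)
f_dual = dual(f_down)

m_scalar = 0.5 * sum(f_down[m][n] * f_up[m][n] for m in range(4) for n in range(4))
f_scalar = -sum(f_down[a][b] * f_dual[a][b] for a in range(4) for b in range(4))

print(m_scalar)   # compare with B^2 - (E/c)^2
print(f_scalar)   # compare with 4 (E/c) . B
```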

Exercise (6):
Show that with $S_{\mu\nu} = u_\mu v_\nu - u_\nu v_\mu$, $\mathrm{Tr}\, S \cdot \bar{S} = 0$. This result explains why S has only 5 degrees of freedom (Exercise 4).

1.4 10-tuples (symmetric 2nd rank tensors)


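Imagine that we move some charges around. Then the rate at which we do work on the e.m. field is
$$\begin{aligned} \dot{E} &= -\int \boldsymbol{E} \cdot \boldsymbol{j}\, d^3x \\ &= -\frac{1}{\mu_0}\int \boldsymbol{E} \cdot \Big(\nabla \times \boldsymbol{B} - \frac{1}{c^2}\frac{\partial \boldsymbol{E}}{\partial t}\Big)\, d^3x. \end{aligned} \tag{1.24}$$
But $\nabla \cdot (\boldsymbol{E} \times \boldsymbol{B}) = \boldsymbol{B} \cdot (\nabla \times \boldsymbol{E}) - \boldsymbol{E} \cdot (\nabla \times \boldsymbol{B})$, so (1.24) can be rewritten
$$\begin{aligned} \dot{E} &= \frac{1}{\mu_0}\int \nabla \cdot (\boldsymbol{E} \times \boldsymbol{B})\, d^3x + \frac{1}{\mu_0}\int \Big(-\boldsymbol{B} \cdot (\nabla \times \boldsymbol{E}) + \frac{1}{c^2}\boldsymbol{E} \cdot \frac{\partial \boldsymbol{E}}{\partial t}\Big)\, d^3x \\ &= \frac{1}{\mu_0}\oint (\boldsymbol{E} \times \boldsymbol{B}) \cdot d^2\boldsymbol{S} + \int \frac{1}{2\mu_0}\frac{\partial}{\partial t}\big(B^2 + E^2/c^2\big)\, d^3x. \end{aligned} \tag{1.25}$$
If energy is to be conserved, the energy we deploy moving the charges has to go somewhere. According to (1.25) energy will be conserved if we interpret the Poynting vector
$$\boldsymbol{N} \equiv \frac{1}{\mu_0}\boldsymbol{E} \times \boldsymbol{B} \tag{1.26}$$
as the flux of e.m. energy, and
$$\frac{1}{2\mu_0}\big(B^2 + E^2/c^2\big) \tag{1.27}$$
as the density of e.m. energy.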
How do the Poynting vector and the e.m. energy-density fit into the scheme of n-tuples? From F we can construct the following important tensor:
$$\begin{aligned} T^{\mu\nu} &= \frac{1}{\mu_0}\big[-\tfrac{1}{4}(F_{\delta\gamma}F^{\delta\gamma})\eta^{\mu\nu} - F^\mu{}_\gamma F^{\gamma\nu}\big]; \\ T &= \frac{1}{\mu_0}\big[\tfrac{1}{4}\mathrm{Tr}(F \cdot F)\,\eta - F \cdot F\big], \end{aligned} \tag{1.28}$$
where F is, as usual, the Maxwell field tensor (1.16). It's easy to see that Tr T = 0. A little slog shows that in terms of $\boldsymbol{E}$ and $\boldsymbol{B}$ the tensor T is
$$T^{\mu\nu} = \begin{pmatrix} \frac{1}{2\mu_0}(B^2 + E^2/c^2) & N_x/c & N_y/c & N_z/c \\ N_x/c & & & \\ N_y/c & & P_{ij} & \\ N_z/c & & & \end{pmatrix}, \tag{1.29}$$
where
$$P_{ij} \equiv \frac{1}{\mu_0}\Big[\tfrac{1}{2}\delta_{ij}\Big(B^2 + \frac{E^2}{c^2}\Big) - \Big(B_i B_j + \frac{E_i E_j}{c^2}\Big)\Big] \quad (i, j = 1, 2, 3). \tag{1.30}$$
Thus the energy density in the e.m. field is the 00 component of T and the Poynting vector occupies the mixed space-time components of T. It turns out that the 3 × 3 matrix $P_{ij}$ describes the flux of the three kinds of momentum: $P_{ix}$ = flux of x-momentum etc.

Exercise (7):
Show that a uniform magnetic field parallel to the z-axis is associated with tension
(negative pressure) along the axis, and pressure in the perpendicular directions.
As an example of T consider a plane e.m. wave running along î polarized parallel
to ĵ. Then
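$$\boldsymbol{E} = (0, E, 0)\cos(\omega t - kx), \qquad \boldsymbol{B} = (0, 0, B)\cos(\omega t - kx).$$
$\boldsymbol{E}$ and $\boldsymbol{B}$ are related by $-\partial\boldsymbol{B}/\partial t = \nabla \times \boldsymbol{E} \Rightarrow B = kE/\omega = E/c$. Hence
$$\boldsymbol{N} = (E^2/\mu_0 c, 0, 0)\cos^2(\omega t - kx).$$
The first term in our expression (1.30) is non-zero only on the diagonal. The second term is non-zero only in the yy and zz slots and there cancels the first term. So P is
$$P_{ij} = \frac{E^2}{\mu_0 c^2}\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}\cos^2(\omega t - kx),$$
and finally
$$T^{\mu\nu} = \begin{pmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}\frac{E^2}{\mu_0 c^2}\cos^2(\omega t - kx). \tag{1.31}$$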
The stress tensor P has only an entry in the xx slot because our wave is engaged in the
business of carrying x-type momentum in the x-direction; the wave would push back
a mirror placed in a plane x = constant. Clearly the Poynting vector is also directed
along the x axis, which accounts for the off-diagonal units in T. In proper relativistic
units the wave employs unit energy density (“capital employed”) to carry unit fluxes of
energy and momentum (“turnover”). Notice that the wave’s phase is the scalar −k · x.
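The plane-wave result can be reproduced by evaluating the definition (1.28) directly. The sketch below works in units where $\mu_0 = 1$ and takes E/c as input, with the wave sampled at a point where the cosine equals one; these unit choices and all function names are assumptions for the illustration.

```python
ETA_DIAG = [-1.0, 1.0, 1.0, 1.0]

def field_tensor(e_over_c, b):
    """F^{mu nu} of eq. (1.16), built from E/c and B."""
    ex, ey, ez = e_over_c
    bx, by, bz = b
    return [[0.0, ex, ey, ez],
            [-ex, 0.0, bz, -by],
            [-ey, -bz, 0.0, bx],
            [-ez, by, -bx, 0.0]]

def energy_momentum(e_over_c, b, mu0=1.0):
    """T^{mu nu} of eq. (1.28) for given E/c and B."""
    f_up = field_tensor(e_over_c, b)
    f_down = [[ETA_DIAG[m] * ETA_DIAG[n] * f_up[m][n] for n in range(4)]
              for m in range(4)]
    # F^mu_gamma: lower the second index only
    f_mixed = [[f_up[m][g] * ETA_DIAG[g] for g in range(4)] for m in range(4)]
    invariant = sum(f_down[d][g] * f_up[d][g] for d in range(4) for g in range(4))
    tensor = []
    for m in range(4):
        row = []
        for n in range(4):
            eta_mn = ETA_DIAG[m] if m == n else 0.0
            contraction = sum(f_mixed[m][g] * f_up[g][n] for g in range(4))
            row.append((-0.25 * invariant * eta_mn - contraction) / mu0)
        tensor.append(row)
    return tensor

# Plane wave along x, polarized along y, with B = E/c (amplitude set to 1)
t_wave = energy_momentum([0.0, 1.0, 0.0], [0.0, 0.0, 1.0])
for row in t_wave:
    print(row)

# Trace T^mu_mu for arbitrary fields, which the text says vanishes
t_general = energy_momentum([1.0, 2.0, 3.0], [0.5, -1.0, 2.0])
print(sum(ETA_DIAG[m] * t_general[m][m] for m in range(4)))
```

The first printout should show ones in the 00, 01, 10 and 11 slots and zeros elsewhere, matching the pattern of (1.31).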
When we do cosmology we'll need $T^{\mu\nu}$ for a fluid. At each event a fluid has a streaming motion that's characterized by the 4-velocity $u^\alpha$ and an associated rest frame. In this rest frame there's an energy density $\rho c^2$ and a pressure P. If the fluid is "perfect" there are no other stresses (such as viscous shear) and we'll only consider perfect fluids. $T^{\mu\nu}$ has to be a symmetric second-rank tensor made from the scalars ρ and P, the vector $u^\mu$ and the tensor $\eta^{\mu\nu}$. A candidate is
$$T^{\mu\nu} = (\rho + P/c^2)u^\mu u^\nu + P\eta^{\mu\nu}. \tag{1.32}$$
It's the tensor we want because in the fluid's rest frame it becomes
$$\begin{pmatrix} \rho c^2 & 0 & 0 & 0 \\ 0 & P & 0 & 0 \\ 0 & 0 & P & 0 \\ 0 & 0 & 0 & P \end{pmatrix}.$$
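The rest-frame form follows from (1.32) by inserting $u^\mu = (c, 0, 0, 0)$, which a short sketch makes explicit. The numerical values of ρ, P and c below are arbitrary illustrative choices, not physical data.

```python
ETA_DIAG = [-1.0, 1.0, 1.0, 1.0]

def fluid_tensor(rho, pressure, u, c):
    """T^{mu nu} = (rho + P/c^2) u^mu u^nu + P eta^{mu nu}, eq. (1.32)."""
    return [[(rho + pressure / c ** 2) * u[m] * u[n]
             + (pressure * ETA_DIAG[m] if m == n else 0.0)
             for n in range(4)] for m in range(4)]

C = 2.0          # speed of light in arbitrary units
RHO = 3.0        # rest-frame mass density
PRESSURE = 0.5   # rest-frame pressure

u_rest = [C, 0.0, 0.0, 0.0]   # 4-velocity in the fluid's rest frame
t_rest = fluid_tensor(RHO, PRESSURE, u_rest, C)

for row in t_rest:
    print(row)   # expect diag(rho c^2, P, P, P)
```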

1.5 Derivatives of tensors


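Derivatives with respect to any system of coordinates can be expressed in terms of derivatives w.r.t. any other system by use of the chain rule:
$$\frac{\partial}{\partial x'^\mu} = \frac{\partial x^\nu}{\partial x'^\mu}\frac{\partial}{\partial x^\nu}. \tag{1.33}$$
If the primed and unprimed systems are linked by a Lorentz transformation,
$$x'^\nu = \Lambda^\nu{}_\mu x^\mu, \tag{1.34}$$
we have on multiplying by $\Lambda_\nu{}^\kappa$ and summing over ν,
$$\Lambda_\nu{}^\kappa x'^\nu = \Lambda_\nu{}^\kappa \Lambda^\nu{}_\mu x^\mu = x^\kappa,$$
where the last step follows by (1.12). Differentiating we get
$$\frac{\partial x^\kappa}{\partial x'^\nu} = \Lambda_\nu{}^\kappa. \tag{1.35}$$
Thus
$$\frac{\partial}{\partial x'^\mu} = \Lambda_\mu{}^\nu \frac{\partial}{\partial x^\nu}, \tag{1.36}$$
and we see that
$$\partial_\mu \equiv \partial/\partial x^\mu = \Big(\frac{1}{c}\frac{\partial}{\partial t}, \frac{\partial}{\partial x}, \frac{\partial}{\partial y}, \frac{\partial}{\partial z}\Big) \tag{1.37}$$
transforms like a down vector.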

Notes:
(i) $\dfrac{\partial}{\partial x^\mu}$ operates on scalars to produce vectors: $G_\mu \equiv \dfrac{\partial\phi}{\partial x^\mu} \equiv \partial_\mu\phi \equiv \phi_{,\mu}$

$\dfrac{\partial}{\partial x^\mu}$ operates on vectors to produce 2nd rank tensors: $G_{\mu\nu} \equiv \dfrac{\partial A_\nu}{\partial x^\mu} \equiv \partial_\mu A_\nu \equiv A_{\nu,\mu}$

$\dfrac{\partial}{\partial x^\mu}$ operates on tensors to produce higher-rank tensors: $G_{\mu\lambda\nu} \equiv \dfrac{\partial B_{\lambda\nu}}{\partial x^\mu} \equiv \partial_\mu B_{\lambda\nu} \equiv B_{\lambda\nu,\mu}$

The operand's indices can be either up or down: $G_\mu{}^\nu = \partial_\mu A^\nu$.
(ii) If we contract the tensor produced by operating on a vector, we get a scalar, the 4-divergence $\psi = \partial_\mu A^\mu$.
(iii) We can reduce the number of indices on a higher-rank tensor by contraction: $A^\nu = \partial_\mu G^{\mu\nu}$.
(iv) The 4-analogue of taking the curl of a vector is to antisymmetrize the tensor formed by operating on a vector: $F_{\mu\nu} = (\partial_\mu A_\nu - \partial_\nu A_\mu)$. If $A_\nu = \partial_\nu\phi$, then $F_{\mu\nu} = 0$ because partial derivatives commute.
(v) A natural generalization of the divergence theorem reads
$$\int_V d^4x\, \frac{\partial T_{\alpha\ldots}}{\partial x^\mu} = \oint_S (d^3x)_\mu\, T_{\alpha\ldots}, \tag{1.38}$$
where S is the boundary of the 4-d region V. Notice that T may have as many indices as it pleases and that one of them may be contracted with µ if you wish.

Example:
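In e.m. the usual vector potential $\boldsymbol{A}$ and the electrostatic potential φ form the four components of an up vector
$$A^\mu = (\phi/c, A_x, A_y, A_z) \quad [\Rightarrow A_\mu = (-\phi/c, A_x, A_y, A_z)]. \tag{1.39}$$
Our old friend the Maxwell field tensor F is then
$$F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu. \tag{1.40}$$
Thus $F_{12} = \dfrac{\partial A_y}{\partial x} - \dfrac{\partial A_x}{\partial y} = B_z$ and $F_{01} = \dfrac{\dot{A}_x}{c} + \dfrac{1}{c}\dfrac{\partial\phi}{\partial x} = -E_x/c$.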

Derivatives with respect to proper time. The history of a particle defines a curve in space-time. Let λ be a parameter which labels points on the curve in a continuous way. Then the coordinates $x^\mu$ of points on the curve are continuous functions $x^\mu(\lambda)$. For δλ ≪ 1 the small vector
$$\delta\mathbf{x} \equiv \frac{d\mathbf{x}}{d\lambda}\,\delta\lambda$$
almost joins two points on the curve. Hence it is time-like and $|\delta\mathbf{x}| < 0$. For any two points A and B on the curve, we define
$$\tau \equiv \frac{1}{c}\int_A^B \sqrt{-\Big|\frac{d\mathbf{x}}{d\lambda}\Big|}\; d\lambda \tag{1.41}$$
to be the proper time difference between A and B along the curve. If the curve is a straight line, we may transform to the coordinate system in which $x^\mu = (ct, 0, 0, 0)$ at all points on the curve, and then
$$\tau = \frac{1}{c}\int_A^B \sqrt{-\frac{d\,ct}{d\lambda}\frac{d(-ct)}{d\lambda}}\; d\lambda = [t_B - t_A]. \tag{1.42}$$
Hence the name. We regard the coordinates $x^\mu$ of events along the trajectory as functions $x^\mu(\tau)$ of the proper time. Differentiating w.r.t. τ and multiplying through by the rest mass $m_0$ we obtain a 4-vector, the momentum
$$\mathbf{p} \equiv m_0\frac{d\mathbf{x}}{d\tau}. \tag{1.43}$$
From the zeroth component of the up version of this equation we have dt = γ dτ; the hearts of passengers on a fast train (they mark off units of τ) appear to beat slowly to a medic on the station platform (whose watch keeps t).
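The definition (1.41) can be evaluated numerically by summing $\sqrt{-|\delta\mathbf{x}|}/c$ over many short chords of the curve. The sketch below does this for a particle on a circular orbit at constant speed, for which dt = γ dτ gives τ = t/γ. The worldline, the units (c = 1) and the step count are illustrative assumptions.

```python
import math

def length(x):
    """-(x^0)^2 + (x^1)^2 + (x^2)^2 + (x^3)^2 for an up-vector x."""
    return -x[0] ** 2 + x[1] ** 2 + x[2] ** 2 + x[3] ** 2

def proper_time(worldline, lam_a, lam_b, c, steps=2000):
    """Approximate eq. (1.41) by summing sqrt(-|delta x|) over short chords."""
    h = (lam_b - lam_a) / steps
    total = 0.0
    for i in range(steps):
        start = worldline(lam_a + i * h)
        end = worldline(lam_a + (i + 1) * h)
        delta = [e - s for s, e in zip(start, end)]   # small time-like vector
        total += math.sqrt(-length(delta))
    return total / c

C = 1.0        # speed of light
RADIUS = 1.0   # orbit radius
OMEGA = 0.5    # angular frequency, so the speed is RADIUS * OMEGA = 0.5 c

def circular(lam):
    """Worldline x^mu(lambda) with lambda equal to coordinate time t."""
    return [C * lam, RADIUS * math.cos(OMEGA * lam),
            RADIUS * math.sin(OMEGA * lam), 0.0]

tau = proper_time(circular, 0.0, 10.0, C)
expected = 10.0 * math.sqrt(1.0 - (RADIUS * OMEGA / C) ** 2)   # t / gamma

print(tau, expected)
```

The two numbers should agree closely, with the small residual coming from replacing the curve by straight chords.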

1.6 Laws of e.m. and mechanics in tensor form


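The relativistic generalization of Newton's second law is
$$m_0\frac{d^2\mathbf{x}}{d\tau^2} = m_0\frac{d}{d\tau}\Big(\frac{d\mathbf{x}}{d\tau}\Big) = \frac{d\mathbf{p}}{d\tau} = \mathbf{f}, \tag{1.44}$$
where $\mathbf{f}$ is the 4-force. The last three components of $f^\mu$ are just the Newtonian force components $f_i$. With µ = 0 equation (1.44) states that the zeroth component of $f^\mu$ is 1/c times the rate of change of the particle's energy $cp^0$; hence physically $f^0$ is 1/c times the rate of working of the force w. In summary
$$f^\mu = (w/c, f_x, f_y, f_z). \tag{1.45}$$

The divergence of (1.16) consists of these four equations:
$$F^{\mu\nu}{}_{,\nu} = \begin{pmatrix} \dfrac{1}{c}\dfrac{\partial E_x}{\partial x} + \dfrac{1}{c}\dfrac{\partial E_y}{\partial y} + \dfrac{1}{c}\dfrac{\partial E_z}{\partial z} \\ \partial B_z/\partial y - \partial B_y/\partial z - \frac{1}{c^2}\partial E_x/\partial t \\ -\partial B_z/\partial x + \partial B_x/\partial z - \frac{1}{c^2}\partial E_y/\partial t \\ \partial B_y/\partial x - \partial B_x/\partial y - \frac{1}{c^2}\partial E_z/\partial t \end{pmatrix} = \begin{pmatrix} \dfrac{1}{c}\nabla \cdot \boldsymbol{E} \\ \nabla \times \boldsymbol{B} - \dfrac{1}{c^2}\dfrac{\partial \boldsymbol{E}}{\partial t} \end{pmatrix}. \tag{1.46}$$

The zeroth component is by Poisson's equation equal to $\rho/(c\epsilon_0) = c\mu_0\rho$, where ρ is the charge density. By Ampere's law, the last three of these equations are equal to $\mu_0\boldsymbol{j}$, where $\boldsymbol{j}$ is the current density. Hence if we form a 4-vector
$$j^\mu = (c\rho, j_x, j_y, j_z), \tag{1.47}$$
we may write four of Maxwell's equations as
$$F^{\mu\nu}{}_{,\nu} = \mu_0 j^\mu. \tag{1.48}$$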

It is straightforward to verify that Maxwell's other four equations can be written
$$F_{\mu\nu,\lambda} + F_{\lambda\mu,\nu} + F_{\nu\lambda,\mu} = 0 \quad (\mu \neq \nu \neq \lambda). \tag{1.49}$$

Exercises (8):
(i) Show that when λ, µ and ν equal 1, 2 and 3 respectively, (1.49) becomes $\nabla \cdot \boldsymbol{B} = 0$.
(ii) Show that with equation (1.22) equation (1.49) may also be written $\bar{F}^{\mu\nu}{}_{,\nu} = 0$.
Charge conservation is expressed as
$$\mu_0\,\partial \cdot \mathbf{j} = \mu_0 j^\mu{}_{,\mu} = F^{\mu\nu}{}_{,\nu\mu} = 0, \tag{1.50}$$
where the last step follows by the antisymmetry of F.


The natural definition of the 4-current associated with a particle of charge q is
$$\mathbf{J} = q\frac{d\mathbf{x}}{d\tau}. \tag{1.51}$$
Since the force exerted on a charged particle by an e.m. field has to be linear in q, the fields represented by F, and the particle's velocity vector, a suitable 4-vector to try as the force is
$$\mathbf{f} = F \cdot \mathbf{J}. \tag{1.52}$$
Tentatively inserting this into (1.44) and multiplying through by dτ/dt = 1/γ to obtain the acceleration as measured in the laboratory frame, we get
$$\frac{d\mathbf{p}}{dt} = qF \cdot \frac{d\mathbf{x}}{dt}. \tag{1.53}$$
It is straightforward to check that the last three components of the up form of this vector are
$$m_0\frac{d}{dt}\Big(\gamma\frac{d\boldsymbol{x}}{dt}\Big) = q(\boldsymbol{v} \times \boldsymbol{B} + \boldsymbol{E}),$$
while the zeroth component is
$$\frac{d(m_0 c\gamma)}{dt} = \frac{q}{c}\boldsymbol{E} \cdot \boldsymbol{v},$$
or, in words, "the rate of change of the particle's energy $mc^2$ is equal to the rate of working of the Lorentz force."
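The component check described above can be sketched numerically. The example reads $\mathbf{f} = F \cdot \mathbf{J}$ as $f^\mu = F^{\mu\nu}J_\nu$, which is an assumption about the index placement (it is the contraction that pairs one up with one down index), and compares the result with $q\gamma(\boldsymbol{E} + \boldsymbol{v} \times \boldsymbol{B})$ and $q\gamma\,\boldsymbol{E} \cdot \boldsymbol{v}/c$. Field values, charge and velocity are arbitrary.

```python
import math

ETA_DIAG = [-1.0, 1.0, 1.0, 1.0]

def field_tensor(e_field, b_field, c):
    """F^{mu nu} of eq. (1.16) in SI-like form, with E divided by c."""
    ex, ey, ez = (component / c for component in e_field)
    bx, by, bz = b_field
    return [[0.0, ex, ey, ez],
            [-ex, 0.0, bz, -by],
            [-ey, -bz, 0.0, bx],
            [-ez, by, -bx, 0.0]]

def cross(a, b):
    """Ordinary 3-vector cross product."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

C = 2.0                      # speed of light in arbitrary units
CHARGE = 1.5
E_FIELD = [1.0, 2.0, 3.0]
B_FIELD = [0.5, -1.0, 2.0]
VELOCITY = [0.3, -0.4, 0.5]  # 3-velocity, |v| < C

gamma = 1.0 / math.sqrt(1.0 - sum(v * v for v in VELOCITY) / C ** 2)

# 4-current J^mu = q dx^mu/dtau = q gamma (c, v), then lowered with eta
j_up = [CHARGE * gamma * C] + [CHARGE * gamma * v for v in VELOCITY]
j_down = [ETA_DIAG[n] * j_up[n] for n in range(4)]

f_tensor = field_tensor(E_FIELD, B_FIELD, C)
force = [sum(f_tensor[m][n] * j_down[n] for n in range(4)) for m in range(4)]

v_cross_b = cross(VELOCITY, B_FIELD)
lorentz = [CHARGE * gamma * (E_FIELD[i] + v_cross_b[i]) for i in range(3)]
power_over_c = CHARGE * gamma * sum(e * v for e, v in zip(E_FIELD, VELOCITY)) / C

print(force[1:], lorentz)
print(force[0], power_over_c)
```

Dividing every component by γ, as in the step from (1.52) to (1.53), gives the laboratory-frame expressions quoted in the text.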

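Gauge invariance. At a classical (i.e. non-quantum) level only $\boldsymbol{E}$ and $\boldsymbol{B}$ are physically meaningful; $\mathbf{A}$ is just an abstraction from which $\boldsymbol{E}$ and $\boldsymbol{B}$ can be calculated via $F_{\mu\nu} = (\partial_\mu A_\nu - \partial_\nu A_\mu)$. So nothing physical changes if we replace $\mathbf{A}$ by
$$\mathbf{A}' \equiv \mathbf{A} + \partial\Lambda, \tag{1.54}$$
where Λ(x) is any scalar-valued function of space-time coordinates. The change (1.54) in $\mathbf{A}$ is called a gauge transformation.
Gauge transformations can be used to ensure that $\mathbf{A}$ satisfies an additional equation. In particular, given $\mathbf{A}$ we can choose Λ s.t. $\mathbf{A}'$ satisfies one of these gauge conditions:
(i) Lorentz gauge:²
$$\partial \cdot \mathbf{A}' = 0 \quad\Rightarrow\quad \Box\Lambda = -\partial \cdot \mathbf{A} \tag{1.55}$$
The Lorentz condition (1.55) does not uniquely specify $\mathbf{A}'$ since many non-trivial functions satisfy $\Box\phi = 0$ and so given one Λ satisfying the 2nd of eqs (1.55), we can construct many others Λ′ = Λ + φ.
(ii) Coulomb or radiation or transverse gauge
$$\nabla \cdot \boldsymbol{A}' = 0 \quad\Rightarrow\quad \nabla^2\Lambda = -\nabla \cdot \boldsymbol{A} \tag{1.56}$$
In this gauge the 0th eqn of the set $\partial^\nu F_{\mu\nu} = \mu_0 j_\mu$ reads
$$\begin{aligned} \frac{\rho}{c\epsilon_0} = -\mu_0 j_0 &= -\partial^\nu(\partial_0 A_\nu - \partial_\nu A_0) \\ &= -\partial_0\partial^\nu A_\nu + \partial^\nu\partial_\nu A_0 \\ &= -\partial_0\partial^0 A_0 + \partial^\nu\partial_\nu A_0 \\ &= \partial^i\partial_i A_0 \\ &= -\nabla^2\phi/c \end{aligned} \tag{1.57}$$
i.e., in this gauge the electrostatic potential satisfies Poisson's eqn, which explains the gauge's name.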

1.7 Summary
The special theory of relativity requires that any physical quantity must fit into an
n-tuple of numbers, where n = 1, 4, 6, 10, . . .. Physical laws must be expressed as
equations connecting the n-tuples associated with different physical quantities. These
equations must be constructed in accordance with the rules of tensor calculus, which
permit only:
(i) the multiplication of n-tuples to form either higher-rank n-tuples (as in $H^{\mu\nu} = x^\mu p^\nu - x^\nu p^\mu$) or lower-rank n-tuples (as in $f_\mu = F_\mu{}^\nu J_\nu$), or
(ii) the addition of n-tuples of the same rank.

In particular, both sides of every acceptable equation always form valid n-tuples of the same kind.

² We denote the d'Alembertian operator by $\Box \equiv \partial_\mu\partial^\mu$ by analogy with the notation $\triangle \equiv \nabla^2 = \partial_i\partial^i$ for the Laplacian operator.
Rest-mass, electric charge and total spin are scalars (1-tuples). The most impor-
tant 4-vectors (4-tuples) include any particle’s energy-momentum p, e.m. current J or
acceleration dp/dτ , and the potential A of the e.m. field. Important 6-tuples include
any particle’s angular momentum H and the Maxwell field tensor F. An important
10-tuple is the density T of the energy-momentum due to the e.m. field.
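In 4-vector notation the key equations of mechanics and e.m. are
$$\mathbf{v} = \frac{d\mathbf{x}}{d\tau}\ ; \qquad \mathbf{p} = m_0\mathbf{v}\ ; \qquad \mathbf{J} = q\mathbf{v}$$
$$\mathbf{f} = F \cdot \mathbf{J}\ ; \qquad \frac{d\mathbf{p}}{d\tau} = \mathbf{f}$$
$$F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu\ ; \qquad F^{\mu\nu}{}_{,\nu} = \mu_0 j^\mu\ ; \qquad \bar{F}^{\mu\nu}{}_{,\nu} = 0,$$
where $F^{\mu\nu} \equiv \eta^{\mu\gamma}\eta^{\nu\delta}F_{\gamma\delta}$ and $\bar{F}^{\mu\nu} \equiv \tfrac{1}{2}\epsilon^{\mu\nu\gamma\delta}F_{\gamma\delta}$. The energy-momentum tensor of the e.m. field is
$$T^{\mu\nu} = \frac{1}{\mu_0}\big[\tfrac{1}{4}\mathrm{Tr}(F \cdot F)\,\eta^{\mu\nu} - F^\mu{}_\gamma F^{\gamma\nu}\big].$$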

2 Groups & their representations


Rotations (and Lorentz transformations) form what mathematicians call a group be-
cause:
i) If you follow one rotation by another, the result could be achieved by a single
rotation; in mathematical language, the product of two group members is itself a
member of the group.
ii) Doing nothing can be considered to be a rotation about zero angle; in mathe-
matical language there is an identity element I such that IR = R for all group
members R.
iii) Any rotation can be reversed, that is each rotation R has an inverse $R^{-1}$ such that $R^{-1}R = I$.
The group of rotations is called the three-dimensional special orthogonal group or SO(3).
If we are concerned with the effect of rotations on vectors, we associate each rotation with an orthogonal matrix such as
$$M(\hat{k}, \psi) = \begin{pmatrix} \cos\psi & -\sin\psi & 0 \\ \sin\psi & \cos\psi & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{2.1}$$
When these matrices are multiplied, we get the matrix associated with the product of the two rotations:
$$R_3 = R_2 R_1 \quad\leftrightarrow\quad M_3 = M_2 M_1 \tag{2.2}$$
Matrices that are associated with all group members such that this relation holds, are
said to form a representation of the group.
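For rotations about the z-axis the representation property (2.2) reduces to the statement that the matrices (2.1) for angles $\psi_1$ and $\psi_2$ multiply to give the matrix for $\psi_1 + \psi_2$. A minimal sketch of that check, with arbitrarily chosen angles and illustrative function names:

```python
import math

def rotation_z(psi):
    """Orthogonal matrix of eq. (2.1): rotation by angle psi about the z-axis."""
    return [[math.cos(psi), -math.sin(psi), 0.0],
            [math.sin(psi), math.cos(psi), 0.0],
            [0.0, 0.0, 1.0]]

def matmul(a, b):
    """Ordinary 3x3 matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(a):
    return [[a[j][i] for j in range(3)] for i in range(3)]

m1 = rotation_z(0.5)
m2 = rotation_z(0.3)
m3 = matmul(m2, m1)          # rotation 1 followed by rotation 2
direct = rotation_z(0.8)     # single rotation through the combined angle

# Inverse rotation: for an orthogonal matrix the transpose undoes the rotation
identity = matmul(transpose(m1), m1)

print(m3)
print(direct)
```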
Arbitrarily many different representations of a group like SO(3) are possible. To
widen our horizons away from 3 × 3 rotation matrices, consider the following scheme.
Each rotation moves points around spheres. Consider the sphere of radius R.
Positions on this sphere are elegantly described by stereographically projecting points
(x, y, z) on the sphere to points (X, Y, 0) in the plane z = 0 as shown in the figure.
(Stereographic projections are much used by crystallographers.)

The upper hemisphere is mapped to $X^2 + Y^2 < R^2$, while the lower hemisphere is mapped to the rest of the XY plane. Suppose y = Y = 0. Then from the triangles $x = X(R + z)/R$. Using this to eliminate x from the equation of the circle we get a quadratic in z with solution $z = R(R^2 - X^2)/(R^2 + X^2)$. Back-substituting we then get $x = 2XR^2/(R^2 + X^2)$.

We define $\zeta \equiv (X + iY)/R$. It's clear that the phase of ζ will be the same as the phase of x + iy. So from $X^2 + Y^2 = R^2\zeta\zeta^*$ and the results we already have, it follows that
$$x + iy = 2R\frac{\zeta}{1 + \zeta\zeta^*}\ ; \qquad z = R\frac{1 - \zeta\zeta^*}{1 + \zeta\zeta^*}. \tag{2.3}$$
Writing $\zeta = \eta_2/\eta_1$, we have
$$x + iy = 2R\frac{\eta_2\eta_1^*}{|\eta_1|^2 + |\eta_2|^2}\ ; \qquad z = R\frac{|\eta_1|^2 - |\eta_2|^2}{|\eta_1|^2 + |\eta_2|^2}. \tag{2.4}$$
We fix the length of the complex 2-vector (Pauli spinor) $\eta \equiv (\eta_1, \eta_2)$ by setting $R = |\eta_1|^2 + |\eta_2|^2$ so we have simply
$$x + iy = 2\eta_2\eta_1^*\ ; \qquad z = |\eta_1|^2 - |\eta_2|^2. \tag{2.5}$$

A unitary transformation $\eta\to\eta'\equiv U\cdot\eta$ leaves the normalization invariant
and through equations (2.5) generates a new point on the sphere. We can show (see
Problems) that a given unitary transformation leaves invariant the distance between
different points on the sphere, so the transformation is a rotation, potentially plus an
inversion. Conversely, any rotation of the sphere transforms $\eta$ into some other spinor
$\eta'$ in a unitary way. Thus a rotation is associated with each 2 × 2 unitary matrix U,
and any rotation is generated by some U.
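
The map (2.5) from spinors to points on the sphere can be checked numerically. The following is a minimal sketch, not part of the original notes: the spinor `eta` and the unitary matrix `U` are arbitrary illustrative choices, and the helper names are hypothetical.

```python
import cmath
import math

def spinor_to_point(eta1, eta2):
    """Map a Pauli spinor (eta1, eta2) to (x, y, z, R) using eq. (2.5)."""
    w = 2 * eta2 * eta1.conjugate()          # x + iy
    z = abs(eta1) ** 2 - abs(eta2) ** 2
    R = abs(eta1) ** 2 + abs(eta2) ** 2      # normalization chosen just before (2.5)
    return w.real, w.imag, z, R

def apply(U, eta):
    """Multiply a 2x2 matrix (nested tuples) into a 2-component spinor."""
    return (U[0][0] * eta[0] + U[0][1] * eta[1],
            U[1][0] * eta[0] + U[1][1] * eta[1])

# Illustrative values only (assumptions, not taken from the notes).
eta = (0.6 + 0.2j, -0.3 + 0.7j)
a = cmath.exp(0.4j) * math.cos(0.9)
b = cmath.exp(-1.1j) * math.sin(0.9)
U = ((a, -b.conjugate()), (b, a.conjugate()))   # |a|^2 + |b|^2 = 1, so U is unitary

x, y, z, R = spinor_to_point(*eta)
xp, yp, zp, Rp = spinor_to_point(*apply(U, eta))

on_sphere = x * x + y * y + z * z - R * R            # should vanish
still_on_sphere = xp * xp + yp * yp + zp * zp - Rp * Rp
```

Both residuals vanish to rounding error and `Rp` equals `R`, so the unitary matrix moves the point around the same sphere.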

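Exercise (9):
Show that
$$x=\eta^\dagger\sigma_x\eta\qquad y=\eta^\dagger\sigma_y\eta\qquad z=\eta^\dagger\sigma_z\eta, \tag{2.6a}$$
where $\eta^\dagger$ is the complex-conjugate-transpose of $\eta$ and
$$\sigma_x\equiv\begin{pmatrix}0&1\\1&0\end{pmatrix};\qquad \sigma_y\equiv\begin{pmatrix}0&-i\\i&0\end{pmatrix};\qquad \sigma_z\equiv\begin{pmatrix}1&0\\0&-1\end{pmatrix} \tag{2.6b}$$
are the Pauli spin matrices. Notice that they are Hermitian and that
$[\sigma_i,\sigma_j]=2i\epsilon_{ijk}\sigma_k$.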
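Bearing in mind that $|\eta_2|^2=R-|\eta_1|^2$, let’s arrange the original coordinates into
a matrix:
$$X\equiv\frac12\begin{pmatrix}z&x-iy\\x+iy&-z\end{pmatrix}=\begin{pmatrix}|\eta_1|^2-\tfrac12R&\eta_1\eta_2^*\\ \eta_2\eta_1^*&|\eta_2|^2-\tfrac12R\end{pmatrix}, \tag{2.7}$$
which can also be written
$$X_{ij}=\eta_i\eta_j^*-\tfrac12R\,\delta_{ij}. \tag{2.8}$$
The transformation $\eta\to\tilde\eta\equiv U\cdot\eta$ maps $X\to\tilde X$ where
$$\tilde X_{ij}=U_{ik}\eta_k(U_{jl}\eta_l)^*-\tfrac12R\,\delta_{ij}=U_{ik}\bigl(\eta_k\eta_l^*-\tfrac12R\,\delta_{kl}\bigr)U^\dagger_{lj}\quad\text{i.e.}\quad \tilde X=UXU^\dagger. \tag{2.9}$$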

To this point we have confined ourselves to unitary matrices in order to preserve the
normalization $|\eta|^2=R$. However, a general linear transformation $\eta\to\tilde\eta=M\eta$
induces the transformation
$$\eta_i\eta_j^*=X_{ij}+\tfrac12R\,\delta_{ij}\ \to\ \tilde\eta_i\tilde\eta_j^*=\frac12\,M\begin{pmatrix}R+z&x-iy\\x+iy&R-z\end{pmatrix}M^\dagger=\frac12\begin{pmatrix}R'+z'&x'-iy'\\x'+iy'&R'-z'\end{pmatrix}. \tag{2.10}$$
If we impose the restriction det(M) = ±1, we will be making a transformation such that
$R'^2-x'^2-y'^2-z'^2=R^2-x^2-y^2-z^2$. Hence, if we set $R=ct$, we will be performing
a Lorentz transformation. The 2 × 2 complex matrices with unit determinant are
considered to form the group SL(2,C) (SL = special linear).
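
The invariance of $R^2-x^2-y^2-z^2$ follows because this quantity is the determinant of the Hermitian matrix in (2.10). A minimal numerical sketch, assuming an arbitrary complex matrix `M0` rescaled to unit determinant (all names and values are illustrative, not from the notes):

```python
def matmul(A, B):
    """Product of two 2x2 matrices stored as nested tuples."""
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))

def dagger(A):
    """Hermitian conjugate of a 2x2 matrix."""
    return tuple(tuple(A[j][i].conjugate() for j in range(2)) for i in range(2))

def det(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

def event_matrix(R, x, y, z):
    """The Hermitian matrix appearing in eq. (2.10), without the factor 1/2."""
    return ((complex(R + z), complex(x, -y)), (complex(x, y), complex(R - z)))

def event_from_matrix(H):
    R = (H[0][0] + H[1][1]).real / 2
    z = (H[0][0] - H[1][1]).real / 2
    return R, H[1][0].real, H[1][0].imag, z

def interval(R, x, y, z):
    return R * R - x * x - y * y - z * z

# Arbitrary complex matrix (an assumption), rescaled so that det(M) = 1.
M0 = ((1.3 + 0.2j, 0.4 - 0.7j), (-0.5 + 0.1j, 0.8 + 0.6j))
scale = det(M0) ** 0.5
M = tuple(tuple(entry / scale for entry in row) for row in M0)

event = (2.0, 0.3, -1.1, 0.7)                       # (R = ct, x, y, z)
H_new = matmul(matmul(M, event_matrix(*event)), dagger(M))
new_event = event_from_matrix(H_new)
```

Here `interval(*new_event)` agrees with `interval(*event)` to rounding error even though `M` is not unitary.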

Exercise (10):
Show that with $R=ct$ we can complement equations (2.6a) with

$$ct=\eta^\dagger I\eta. \tag{2.11}$$

The rotations are the sub-group of the Lorentz group that are obtained by requir-
ing M to be not merely of unit determinant, but unitary. The 2 × 2 unitary matrices
with unit determinant form the group SU(2). Thus we have shown that SL(2,C) can
be mapped into the Lorentz group, and SU(2) can be mapped onto SO(3).
Notice that these transformations cannot change the sign of $R=ct$, so they do
not include reversals of time. It turns out that they do not include inversions of space
either. The mappings are not 1-1 because $-M$ induces the same transformation of
space-time as does $M$. So we have found a representation of the subgroup of proper
orthochronous Lorentz transformations or proper Lorentz group for short.
In classical physics spinors are no more than mathematical devices. But the
amplitudes a± for a spin-half particle to have its spin up or down along any chosen
axis transform under Lorentz transformations like the components of a spinor.

2.1 Generators

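It’s easy to show that the (Hermitian) Pauli matrices [eq. (2.6b)] all square up to the
identity matrix: $\sigma_i^2=I$. Let n be a unit vector, then this property applies equally to
the matrix
$$\sigma_n\equiv\mathbf n\cdot\boldsymbol\sigma=\begin{pmatrix}n_z&n_x-in_y\\n_x+in_y&-n_z\end{pmatrix}. \tag{2.12}$$
We define the exponential of $i\theta\sigma_n$ through the power series

$$\begin{aligned}
e^{i\theta\sigma_n}&=I+i\theta\sigma_n-\frac{\theta^2}{2!}\sigma_n^2-i\frac{\theta^3}{3!}\sigma_n^3+\cdots\\
&=\Bigl(1-\frac{\theta^2}{2!}+\cdots\Bigr)I+i\Bigl(\theta-\frac{\theta^3}{3!}+\cdots\Bigr)\sigma_n\\
&=\cos\theta\,I+i\sin\theta\,\sigma_n.
\end{aligned} \tag{2.13}$$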

Now for any θ, $e^{i\theta\sigma_n}$ is a unitary matrix:

$$\bigl(e^{i\theta\sigma_n}\bigr)^\dagger e^{i\theta\sigma_n}=\bigl(\cos\theta\,I-i\sin\theta\,\sigma_n\bigr)\bigl(\cos\theta\,I+i\sin\theta\,\sigma_n\bigr)=\bigl(\cos^2\theta+\sin^2\theta\bigr)I. \tag{2.14}$$

Moreover, $e^{i\theta\sigma_n}$ contains three free parameters (θ and the two angles required to
specify the direction n). Given that any rotation can be specified by three parameters
(for example the Euler angles), we might suspect that the unitary matrix required to
generate any rotation can be obtained as $e^{i\theta\sigma_n}$ for appropriate θ and n. In fact,
$e^{i\theta\sigma_n}$ is the matrix that rotates the coordinates by angle $-2\theta$ about the axis n, as one
may easily verify when n is one of the coordinate vectors i, j or k. (For n = k, equation
(2.5) gives $x+iy\to e^{-2i\theta}(x+iy)$.)
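
As a numerical cross-check of (2.13) and of the rotation angle, the sketch below sums the power series directly and compares it with $\cos\theta\,I+i\sin\theta\,\sigma_n$; it then applies $e^{i\theta\sigma_z}$ to a spinor and reads off the phase acquired by $x+iy$. The values of `theta`, `n` and `eta` are illustrative assumptions.

```python
import cmath
import math

I2 = ((1 + 0j, 0j), (0j, 1 + 0j))

def matmul(A, B):
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))

def scale(c, A):
    return tuple(tuple(c * e for e in row) for row in A)

def add(A, B):
    return tuple(tuple(a + b for a, b in zip(ra, rb)) for ra, rb in zip(A, B))

def sigma_n(n):
    """The matrix n . sigma of eq. (2.12) for a unit vector n = (nx, ny, nz)."""
    nx, ny, nz = n
    return ((complex(nz), complex(nx, -ny)), (complex(nx, ny), complex(-nz)))

def exp_series(A, terms=40):
    """Truncated power series I + A + A^2/2! + ... for a 2x2 matrix."""
    total, term = I2, I2
    for k in range(1, terms):
        term = scale(1.0 / k, matmul(term, A))
        total = add(total, term)
    return total

theta = 0.7
n = (2 / 7, 3 / 7, 6 / 7)                       # unit vector: 4 + 9 + 36 = 49
series = exp_series(scale(1j * theta, sigma_n(n)))
closed = add(scale(math.cos(theta), I2), scale(1j * math.sin(theta), sigma_n(n)))
max_diff = max(abs(series[i][j] - closed[i][j]) for i in range(2) for j in range(2))

# Rotation about z: e^{i theta sigma_z} = diag(e^{i theta}, e^{-i theta}).
eta = (0.6 + 0.2j, -0.3 + 0.7j)
eta_rot = (cmath.exp(1j * theta) * eta[0], cmath.exp(-1j * theta) * eta[1])
w_before = 2 * eta[1] * eta[0].conjugate()      # x + iy from eq. (2.5)
w_after = 2 * eta_rot[1] * eta_rot[0].conjugate()
phase = w_after / w_before                       # equals exp(-2i theta)
```

The ratio `phase` equals $e^{-2i\theta}$, i.e. the $(x,y)$ coordinates turn through $-2\theta$ about the z axis.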

Exercise (11):
Show that “rotating” η with the matrix
$$s_z(\phi)\equiv\begin{pmatrix}e^{-i\phi/2}&0\\0&e^{i\phi/2}\end{pmatrix} \tag{2.15}$$
has the effect of rotating the (x, y, z) coordinates through φ about the z axis.
What happens to η when the (x, y, z) axes are rotated through 2π?
Since the Pauli matrices enable us to generate any member of SU(2) through this
mechanism, we refer to them as the generators of SU(2). (To be pedantic, the
generators are $\tfrac12\sigma_i$.)
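Exponentiating $\theta\sigma_n$ we obtain
$$\begin{aligned}
e^{\theta\sigma_n}&=\Bigl(1+\frac{\theta^2}{2!}+\cdots\Bigr)I+\Bigl(\theta+\frac{\theta^3}{3!}+\cdots\Bigr)\sigma_n\\
&=\cosh\theta\,I+\sinh\theta\,\sigma_n.
\end{aligned} \tag{2.16}$$

The determinant of this matrix is 1:

$$\begin{aligned}
|\cosh\theta\,I+\sinh\theta\,\sigma_n|&=\begin{vmatrix}\cosh\theta+n_z\sinh\theta&(n_x-in_y)\sinh\theta\\(n_x+in_y)\sinh\theta&\cosh\theta-n_z\sinh\theta\end{vmatrix}\\
&=\cosh^2\theta-n_z^2\sinh^2\theta-(n_x^2+n_y^2)\sinh^2\theta=1.
\end{aligned} \tag{2.17}$$

Hence through (2.10) $e^{\theta\sigma_n}$ generates a Lorentz transformation. To see which
transformation we align the z axis with n. Then $e^{\theta\sigma_n}$ is a diagonal matrix and
$$\begin{aligned}
\begin{pmatrix}ct'+z'&x'-iy'\\x'+iy'&ct'-z'\end{pmatrix}
&=e^{\theta\sigma_n}\begin{pmatrix}ct+z&x-iy\\x+iy&ct-z\end{pmatrix}\begin{pmatrix}\cosh\theta+\sinh\theta&0\\0&\cosh\theta-\sinh\theta\end{pmatrix}\\
&=\begin{pmatrix}\cosh\theta+\sinh\theta&0\\0&\cosh\theta-\sinh\theta\end{pmatrix}\begin{pmatrix}(ct+z)(\cosh\theta+\sinh\theta)&(x-iy)(\cosh\theta-\sinh\theta)\\(x+iy)(\cosh\theta+\sinh\theta)&(ct-z)(\cosh\theta-\sinh\theta)\end{pmatrix}\\
&=\begin{pmatrix}(ct+z)(\cosh\theta+\sinh\theta)^2&(x-iy)\\(x+iy)&(ct-z)(\cosh\theta-\sinh\theta)^2\end{pmatrix}.
\end{aligned} \tag{2.18}$$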

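From the off-diagonal components of this equation, $x'=x$, $y'=y$. Adding and
subtracting the diagonal components we learn that
$$\begin{aligned}ct'&=ct(\cosh^2\theta+\sinh^2\theta)+z\,2\sinh\theta\cosh\theta\\ z'&=ct\,2\sinh\theta\cosh\theta+z(\cosh^2\theta+\sinh^2\theta)\end{aligned}
\quad\text{i.e.}\quad
\begin{pmatrix}ct'\\z'\end{pmatrix}=\begin{pmatrix}\cosh2\theta&\sinh2\theta\\ \sinh2\theta&\cosh2\theta\end{pmatrix}\begin{pmatrix}ct\\z\end{pmatrix}. \tag{2.19}$$

Thus $e^{\theta\sigma_n}$ generates the boost along n with Lorentz factor $\gamma=\cosh2\theta$ and speed
$\beta=\tanh2\theta$. We say that $i\tfrac12\sigma_n$ is the generator of this Lorentz transformation.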
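
The doubling of the parameter in (2.19) can be confirmed numerically for a boost along an arbitrary unit vector. This is a sketch under stated assumptions: `theta`, `n` and the event are illustrative values, and the helper names are hypothetical.

```python
import math

def matmul(A, B):
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))

def boost_matrix(theta, n):
    """cosh(theta) I + sinh(theta) sigma_n, the closed form of eq. (2.16)."""
    nx, ny, nz = n
    c, s = math.cosh(theta), math.sinh(theta)
    return ((complex(c + s * nz), s * complex(nx, -ny)),
            (s * complex(nx, ny), complex(c - s * nz)))

def event_matrix(ct, x, y, z):
    return ((complex(ct + z), complex(x, -y)), (complex(x, y), complex(ct - z)))

theta = 0.35
n = (2 / 7, 3 / 7, 6 / 7)                  # unit vector
ct, x, y, z = 1.5, 0.2, -0.4, 0.9

M = boost_matrix(theta, n)                 # Hermitian, so its adjoint is M itself
H = matmul(matmul(M, event_matrix(ct, x, y, z)), M)
ct_new = (H[0][0] + H[1][1]).real / 2

along_n = n[0] * x + n[1] * y + n[2] * z   # component of position along n
ct_expected = math.cosh(2 * theta) * ct + math.sinh(2 * theta) * along_n

gamma = math.cosh(2 * theta)
beta = math.tanh(2 * theta)
```

`ct_new` matches `ct_expected`, and `gamma` and `beta` satisfy $\gamma=1/\sqrt{1-\beta^2}$ as they must for a boost.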
The boosts taken on their own do not form a group because the product of boosts
along two non-parallel axes cannot always be expressed as a boost along a third axis:
in general a rotation is required in addition to a boost.³ As a specific example, consider
the product
$$A\equiv e^{-\theta\sigma_y}e^{-\phi\sigma_x}e^{\theta\sigma_y}e^{\phi\sigma_x}, \tag{2.20}$$
which effects a boost along the x axis, followed by one along the y axis, followed by
inverse boosts along the x and then the y axes. For infinitesimal θ, φ we have
$$\begin{aligned}
B_\pm\equiv e^{\pm\theta\sigma_y}e^{\pm\phi\sigma_x}&=(I\pm\theta\sigma_y+\tfrac12\theta^2I+\cdots)(I\pm\phi\sigma_x+\tfrac12\phi^2I+\cdots)\\
&=[1+\tfrac12(\theta^2+\phi^2)]I\pm[\theta\sigma_y+\phi\sigma_x]+\theta\phi\,\sigma_y\sigma_x+\cdots
\end{aligned} \tag{2.21}$$
Hence
$$\begin{aligned}
A=B_-B_+&\simeq\{[I+\tfrac12(\theta^2+\phi^2)I+\theta\phi\,\sigma_y\sigma_x]-[\theta\sigma_y+\phi\sigma_x]\}\\
&\quad\times\{[I+\tfrac12(\theta^2+\phi^2)I+\theta\phi\,\sigma_y\sigma_x]+[\theta\sigma_y+\phi\sigma_x]\}+\cdots\\
&=[I+\tfrac12(\theta^2+\phi^2)I+\theta\phi\,\sigma_y\sigma_x]^2-[\theta\sigma_y+\phi\sigma_x]^2+O(\theta^3)\\
&=I+\theta\phi[\sigma_y,\sigma_x]+O(\theta^3)=I-2i\theta\phi\,\sigma_z+O(\theta^3).
\end{aligned} \tag{2.22}$$
Thus this sequence of boosts effects a rotation by angle $4\theta\phi$ around z. Consequently,
boosts are inextricably intertwined with rotations, and we must consider the form
taken by a general Lorentz transformation, that is, a transformation that combines a
boost with a rotation. The natural object to consider is
$$M\equiv e^{(i\theta\mathbf n+\phi\mathbf m)\cdot\boldsymbol\sigma}, \tag{2.23}$$
which combines a boost along m with a rotation around n. A 2 × 2 complex matrix
is defined by eight real numbers, and when we require the matrix to have unit deter-
minant, we impose two restrictions on these numbers, leaving six degrees of freedom.
Equation (2.23) for M has six parameters, so any matrix with unit determinant should
be of this form. Consequently, the product of two objects of this type will be a third
object of the same type, so these objects provide a representation of the proper Lorentz
group.
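
The result (2.22), that a closed loop of small boosts leaves behind a rotation, can be checked numerically. The sketch below builds each boost from the closed form $\cosh\theta\,I+\sinh\theta\,\sigma$ and compares the product (2.20) with $I-2i\theta\phi\,\sigma_z$; the small parameters are illustrative assumptions.

```python
import math

SIGMA = {
    "x": ((0, 1), (1, 0)),
    "y": ((0, -1j), (1j, 0)),
    "z": ((1, 0), (0, -1)),
}

def matmul(A, B):
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))

def boost(theta, axis):
    """exp(theta * sigma_axis) = cosh(theta) I + sinh(theta) sigma_axis."""
    s = SIGMA[axis]
    c, sh = math.cosh(theta), math.sinh(theta)
    return tuple(tuple(c * (i == j) + sh * s[i][j] for j in range(2)) for i in range(2))

theta, phi = 1e-3, 1e-3

# A = e^{-theta sigma_y} e^{-phi sigma_x} e^{theta sigma_y} e^{phi sigma_x}, eq. (2.20)
A = matmul(matmul(boost(-theta, "y"), boost(-phi, "x")),
           matmul(boost(theta, "y"), boost(phi, "x")))

# Leading-order prediction of eq. (2.22): I - 2i theta phi sigma_z
predicted = ((1 - 2j * theta * phi, 0), (0, 1 + 2j * theta * phi))
error = max(abs(A[i][j] - predicted[i][j]) for i in range(2) for j in range(2))
departure_from_identity = abs(A[0][0] - 1)
```

With these values the second-order rotation term is of order $10^{-6}$ while the residual `error` is of order $10^{-9}$, consistent with the $O(\theta^3)$ remainder in (2.22).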
Remarkably, (2.23) combines the pseudo-vector n with the polar vector m. If we
transform to axes that are mirror images of our original axes, n won’t change sign, but
m will, and M will change into
$$M'\equiv e^{(i\theta\mathbf n-\phi\mathbf m)\cdot\boldsymbol\sigma}. \tag{2.24}$$
³ This phenomenon is the origin of Thomas precession in the theory of spin-orbit coupling.

It follows that the objects $M'$ must also provide a representation of the proper Lorentz
group. The representations provided by $M$ and $M'$ are inequivalent in the sense that
there is no matrix S such that $M'=SMS^{-1}$ for all $M$.
There are two types of Pauli spinors. A right-handed Pauli spinor $\eta_R$ is
transformed by $M$ under a Lorentz transformation, while a left-handed one $\eta_L$ is
transformed by $M'$:

$$\eta_R\to\tilde\eta_R=e^{(i\theta\mathbf n+\phi\mathbf m)\cdot\boldsymbol\sigma}\eta_R\ ;\qquad \eta_L\to\tilde\eta_L=e^{(i\theta\mathbf n-\phi\mathbf m)\cdot\boldsymbol\sigma}\eta_L. \tag{2.25}$$

Under a coordinate inversion a right-handed spinor transforms into a left-handed
one, and vice versa. Consequently, the Pauli spinors of one type do not support a
representation of the full Lorentz group (the group you get by adding inversion through
the origin and time reversal to the proper Lorentz group). A Dirac spinor is a pair
of spinors, one of each type:
$$\psi=(\eta_R,\eta_L). \tag{2.26}$$
It has four components, the first two being the components of $\eta_R$, etc. We represent a
coordinate inversion by the operation of swapping $\eta_R$ with $\eta_L$. This convention makes
sense because after a coordinate inversion $\eta_R$ remains a right-handed Pauli spinor, but
because we are now using a left-handed coordinate system, its transformation rule is
the one we previously associated with a left-handed spinor. By moving $\eta_R$ to the lower
slot in ψ we arrange that we don’t have to change the transformation rules we apply
to the top & bottom slots. In summary, a coordinate inversion is represented by
$$\psi\to\tilde\psi=\begin{pmatrix}0&I\\I&0\end{pmatrix}\begin{pmatrix}\eta_R\\ \eta_L\end{pmatrix}\quad\text{or}\quad\tilde\psi=\gamma^0\psi\quad\text{where}\quad\gamma^0\equiv\begin{pmatrix}0&I\\I&0\end{pmatrix}, \tag{2.27}$$

with I the 2 × 2 identity matrix. In this way Dirac spinors support a representation
of the full Lorentz group.

2.2 Spinor invariants

When we do Lagrangian field theory, we’ll be interested in Lorentz invariants. So now
we ask what invariants we can make out of spinors. If the Lorentz transformation
matrices $M$ and $M'$ were unitary (as they are for a pure rotation; φ = 0), $\eta^\dagger\cdot\eta$ would
be a Lorentz invariant. But in the presence of a non-zero boost, $M$ and $M'$ are not
unitary. Taking the Hermitian adjoints of equations (2.25) we find

$$\eta_R^\dagger\to\tilde\eta_R^\dagger=\eta_R^\dagger e^{(-i\theta\mathbf n+\phi\mathbf m)\cdot\boldsymbol\sigma}\ ;\qquad \eta_L^\dagger\to\tilde\eta_L^\dagger=\eta_L^\dagger e^{(-i\theta\mathbf n-\phi\mathbf m)\cdot\boldsymbol\sigma}. \tag{2.28}$$

From equations (2.25) and (2.28) we see that under proper Lorentz transformations
both $\eta_L^\dagger\cdot\eta_R$ and $\eta_R^\dagger\cdot\eta_L$ are invariant. To obtain a quantity that’s still invariant when
inversions are included, we add these two invariants. In terms of the adjoint spinor
$$\bar\psi\equiv\psi^\dagger\gamma^0=(\eta_L^\dagger,\eta_R^\dagger) \tag{2.29}$$

our invariant is
$$\bar\psi\cdot\psi=\eta_L^\dagger\cdot\eta_R+\eta_R^\dagger\cdot\eta_L. \tag{2.30}$$

We’ll also find it useful to know how to construct a 4-vector from a Dirac spinor.
Equations (2.6a) and (2.11) imply that under rotations $\eta^\dagger I\eta$, $\eta^\dagger\sigma_x\eta$, $\eta^\dagger\sigma_y\eta$, and $\eta^\dagger\sigma_z\eta$
transform like the components of a four vector. How should we generalize these
expressions to the case in which right- and left-handed spinors are distinguishable because
boosts occur? A component of a vector should not be invariant, so contrary to what
happens in equation (2.30), the left and right spinors should be of the same handedness.
But both halves of the Dirac spinor must be used. Moreover, under interchange of $\eta_L$
and $\eta_R$ the time component should stay the same, while the space components should
change sign. This suggests that the time component is $\eta_R^\dagger\cdot\eta_R+\eta_L^\dagger\cdot\eta_L$ while the
space components are $\eta_R^\dagger\sigma_i\eta_R-\eta_L^\dagger\sigma_i\eta_L$. To achieve this result in an elegant notation
we define three new matrices
$$\gamma^1\equiv\begin{pmatrix}0&-\sigma_x\\ \sigma_x&0\end{pmatrix};\qquad \gamma^2\equiv\begin{pmatrix}0&-\sigma_y\\ \sigma_y&0\end{pmatrix};\qquad \gamma^3\equiv\begin{pmatrix}0&-\sigma_z\\ \sigma_z&0\end{pmatrix}. \tag{2.31}$$

Then bearing in mind the definitions (2.27) and (2.29) of $\gamma^0$ and $\bar\psi$, we have that

$$\bar\psi\gamma^0\psi=(\eta_L^\dagger,\eta_R^\dagger)(\eta_L,\eta_R)\qquad \bar\psi\gamma^i\psi=(\eta_L^\dagger,\eta_R^\dagger)(-\sigma_i\eta_L,\sigma_i\eta_R) \tag{2.32}$$

as required, so $\bar\psi\gamma^\mu\psi$ is a 4-vector.
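
A short numerical illustration of (2.30) and (2.32), assuming a pure boost along z so that $M=e^{\phi\sigma_z}$ and $M'=e^{-\phi\sigma_z}$ are diagonal. The spinor components and the value of `phi` are arbitrary choices for the sketch, not values from the notes.

```python
import math

def inner(a, b):
    """Hermitian inner product a† b of two 2-component spinors."""
    return a[0].conjugate() * b[0] + a[1].conjugate() * b[1]

def boost_z(eta, phi):
    """Apply exp(phi * sigma_z) = diag(e^phi, e^-phi) to a Pauli spinor."""
    return (math.exp(phi) * eta[0], math.exp(-phi) * eta[1])

def scalar(R, L):
    """psi-bar . psi of eq. (2.30)."""
    return (inner(L, R) + inner(R, L)).real

def time_component(R, L):
    return (inner(R, R) + inner(L, L)).real

def z_component(R, L):
    def sigma_z(e):
        return (e[0], -e[1])
    return (inner(R, sigma_z(R)) - inner(L, sigma_z(L))).real

eta_R = (0.8 + 0.1j, -0.2 + 0.5j)
eta_L = (0.3 - 0.6j, 0.7 + 0.4j)
phi = 0.45

new_R = boost_z(eta_R, phi)       # right-handed spinor transforms with M
new_L = boost_z(eta_L, -phi)      # left-handed spinor transforms with M'

t0, z0 = time_component(eta_R, eta_L), z_component(eta_R, eta_L)
t1, z1 = time_component(new_R, new_L), z_component(new_R, new_L)
```

`scalar(new_R, new_L)` equals `scalar(eta_R, eta_L)`, whereas `(t1, z1)` is related to `(t0, z0)` by the boost matrix of (2.19) with θ replaced by φ.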

Exercise (12):
Show that $\gamma^0\gamma^i=-\gamma^i\gamma^0$ and that $\gamma^i\gamma^j=-\gamma^j\gamma^i$ for $i\ne j$. (This anticommutation
property is often written $\{\gamma^\mu,\gamma^\nu\}=0$ for $\mu\ne\nu$.)
The spinor representation of the Lorentz group is fundamental in the sense that
every other representation can be constructed from it. We started by studying an
example of this phenomenon: the components of a second-rank tensor in spinor space
transform like the combinations ct + z, x − iy, etc, of the components of a 4-vector.
From the rule for transforming third-rank tensors on spinor space, we could extract
the spin-$\tfrac32$ representation of the Lorentz group, and so on. This corner of group theory
is taught in quantum-mechanics courses under the heading of ‘addition of angular
momenta’. The total spin angular momentum of two spin-half particles can be zero
(spin-0 representation of the LG) or one (spin-1 rep.). With three spin-half particles
the possible spin angular momenta are $\tfrac32$, 1 and 0 because a third-rank tensor in spinor
space contains the components of a 4-vector (which comes with a free scalar) as well
as the 4 components of a spin-$\tfrac32$ object.
The spin-n representations of the Lorentz group have a special property: they
are irreducible (or an irrep for short) in the sense that no linear subspace of the
representing space is invariant under the action of the matrices of the representation.

3 Lagrangian Dynamics

Box 1: Functionals and the Euler–Lagrange Equations

Let y(t) be a function of the scalar parameter t. Then a functional F[y(t)] is some
rule that assigns to each function y a single number. For example F might be
$F_1\equiv\int_{t_1}^{t_2}dt\,y(t)$ or $F_2\equiv\int_{t_1}^{t_2}dt\,y(t)\dot y(t)$ or $F_K\equiv\int_{t_1}^{t_2}dt\,K(t)y(t)$, where K(t) is any
given function, or $F_{ab}\equiv y(a)-y(b)$, where a and b are any two given values of
t. The function y(t) may be scalar-, vector- or even tensor-valued. Vector-valued
functions y(t) can be thought of as paths.
Physicists are particularly interested in extremizing functionals of the type
$$F[y(t)]=\int dt\,f(y,\dot y), \tag{B1.1}$$

where f is a known function of two variables. That is, they wish to find the
function y(t) for which F[y(t)] takes a larger/smaller value than it does for all nearby functions.
The calculus of variations shows that the extremizing function is the one that
satisfies the Euler–Lagrange (EL) equation:
$$\frac{d}{dt}\Bigl(\frac{\partial f}{\partial\dot y}\Bigr)-\frac{\partial f}{\partial y}=0. \tag{B1.2}$$
For given f this is an o.d.e. for y(t).
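
To illustrate Box 1, the sketch below discretizes the functional with $f=\tfrac12\dot y^2-\tfrac12\omega^2y^2$, whose EL equation is $\ddot y+\omega^2y=0$, and perturbs two paths with the same end points by a small bump that vanishes at both ends. The choice of $f$, the grid size and the bump are assumptions made purely for illustration.

```python
import math

OMEGA, T, N = 2.0, 1.0, 2000

def action(path):
    """Discretized F[y] = integral of (ydot^2/2 - omega^2 y^2/2) dt on a uniform grid."""
    dt = T / N
    total = 0.0
    for i in range(N):
        ydot = (path[i + 1] - path[i]) / dt
        ymid = 0.5 * (path[i + 1] + path[i])
        total += dt * (0.5 * ydot ** 2 - 0.5 * OMEGA ** 2 * ymid ** 2)
    return total

times = [T * i / N for i in range(N + 1)]
bump = [math.sin(math.pi * t / T) for t in times]           # vanishes at both ends
solution = [math.sin(OMEGA * t) for t in times]             # satisfies the EL equation
straight = [t / T * math.sin(OMEGA * T) for t in times]     # same end points, not a solution

def change(path, eps):
    """F[y + eps * bump] - F[y]."""
    varied = [y + eps * b for y, b in zip(path, bump)]
    return action(varied) - action(path)

eps = 1e-2
d_solution = change(solution, eps)
d_solution_double = change(solution, 2 * eps)
d_straight = change(straight, eps)
ratio = d_solution_double / d_solution
```

For the EL solution the change in F is second order in `eps` (doubling `eps` multiplies it by about 4), whereas for the straight-line path the change is first order and much larger.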

The sharp predictions that are characteristic of classical physics arise because destruc-
tive quantum interference excludes practically every future configuration of a system:
a shell will blast through one spot on the roof of a dugout because it is at this spot
alone that the quantum amplitudes for the shell’s presence interfere constructively.
Even in classical physics the most elegant way to do dynamics is to write down an
expression for the phase of this amplitude for each path by which the system might
travel between initial and final configuration, and find for what path it is stationary
and constructive interference is possible.
This phase times $\hbar$ is called the action S. It is a scalar and is obtained by
integrating along the prospective path the rate of change of phase with proper time, s:
$$S=\int d\tau\,s. \tag{3.1}$$

Since S and τ are scalars, s must be too.

3.1 Single charged particle with given e.m. field


To determine s we have only to ask what scalars can be constructed from the world-line
x(τ) and quantities such as A, F associated with the e.m. field.
First we note that S shouldn’t depend on our choice of origin, so only derivatives
$\dot x$, $\ddot x$ etc should occur in s, not x itself. Furthermore, the EL eqn (Box 1) involves
differentiation with respect to the variable that parameterizes position along the
extremal path, in this case τ. So we will get a 2nd-order eqn of motion if s depends on
$\dot x$, but not on higher derivatives of x(τ). Similarly, the EL eqn involves differentiation
w.r.t. the general position vector x, so if the eqn of motion is to depend on F and not
its derivatives, s should depend on A but not F. So the invariants to consider are
(i) $|\dot x|^2=-c^2$, (ii) $\dot x\cdot A$ and (iii) $|A|^2$. We further require that any gauge-dependent
contribution to S should be path-independent. $\dot x\cdot A$ satisfies this requirement, while
$|A|^2$ does not.

Exercise (13):
Show that the gauge-dependent contribution to S from $\dot x\cdot A$ is path-independent,
while the gauge-dependent contribution from a term proportional to $|A|^2$ would
not be path-independent.
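So the simplest thing to try is
$$S=\int d\tau\,(-m_0c^2+q\,\dot x\cdot A), \tag{3.2}$$

where we’ve included the rest mass $m_0$ for future convenience and q is some constant.
Unfortunately we cannot apply the EL eqn (Box 1) to (3.2) as it stands because
we want to hold constant the events of arrival and departure, $x_1$ and $x_2$, rather than
the proper-time elapse between these events. So we have first to eliminate τ from (3.2)
in favour of some parameter λ that always runs over the same range, say, 0 to 1. Using
$$\frac{d\tau}{d\lambda}=\frac1c\sqrt{-\Bigl|\frac{dx}{d\lambda}\Bigr|^2}, \tag{3.3}$$

we have
$$S=\int_0^1d\lambda\left(-m_0c\sqrt{-\eta_{\mu\nu}\frac{dx^\mu}{d\lambda}\frac{dx^\nu}{d\lambda}}+q\frac{dx^\mu}{d\lambda}A_\mu\right), \tag{3.4}$$

which is now in a form to which we can apply the EL eqn. Since (with dots denoting
derivatives w.r.t. λ)

$$\frac{\partial}{\partial\dot x^\beta}\sqrt{-\eta_{\mu\nu}\dot x^\mu\dot x^\nu}=-\frac{\eta_{\beta\nu}\dot x^\nu+\eta_{\mu\beta}\dot x^\mu}{2\sqrt{-\eta_{\mu\nu}\dot x^\mu\dot x^\nu}}=-\frac{\dot x_\beta}{\sqrt{-\eta_{\mu\nu}\dot x^\mu\dot x^\nu}}=-\frac{dx_\beta/d\lambda}{c\,d\tau/d\lambda},$$

the EL eqn yields
$$\frac{d}{d\lambda}\Bigl(m_0\frac{dx_\beta}{d\tau}+qA_\beta\Bigr)-q\frac{dx^\mu}{d\lambda}\frac{\partial A_\mu}{\partial x^\beta}=0. \tag{3.5}$$
Multiplying through by dλ/dτ this becomes
$$\begin{aligned}
0&=\frac{d}{d\tau}\Bigl(m_0\frac{dx_\beta}{d\tau}+qA_\beta\Bigr)-q\frac{dx^\mu}{d\tau}\frac{\partial A_\mu}{\partial x^\beta}\\
&=m_0\frac{dv_\beta}{d\tau}+q\frac{dx^\mu}{d\tau}\Bigl(\frac{\partial A_\beta}{\partial x^\mu}-\frac{\partial A_\mu}{\partial x^\beta}\Bigr)\\
&=m_0\frac{dv_\beta}{d\tau}+q\frac{dx^\mu}{d\tau}F_{\mu\beta}.
\end{aligned} \tag{3.6}$$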

Thus our action gives the required equation of motion.

Since $A_\mu=(-\phi/c,\mathbf A)$, with λ = t, (3.4) can be written
$$S=\int dt\Bigl[-m_0c^2\sqrt{1-v^2/c^2}+q(-\phi+\mathbf v\cdot\mathbf A)\Bigr].$$

If the field is electrostatic (A = 0) and the motion is non-relativistic, the action is

$$S\simeq\int dt\,\bigl[-m_0c^2+L(\mathbf x,\dot{\mathbf x},t)\bigr],\quad\text{where}\quad L(\mathbf x,\dot{\mathbf x},t)\equiv\tfrac12m_0\dot{\mathbf x}^2-q\phi(\mathbf x,t). \tag{3.7}$$
Since $\int dt\,m_0c^2$ is the same for all paths that start and finish at the given events, it
plays no role in picking out the true path. So it can be dropped, and we obtain the
familiar principle of least action:
$$\delta S=0\quad\text{where}\quad S\equiv\int dt\,L(\mathbf x,\dot{\mathbf x},t). \tag{3.8}$$

The function L is called the Lagrangian. By (3.7) it is in this case the difference
between the particle’s kinetic and potential energies.
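
The step from the relativistic integrand to (3.7) is a Taylor expansion in $v^2/c^2$. A minimal numerical sketch, in units with $m_0=c=1$ (an assumption made to keep rounding error negligible):

```python
import math

def relativistic_term(v, m0=1.0, c=1.0):
    """The free-particle integrand -m0 c^2 sqrt(1 - v^2/c^2)."""
    return -m0 * c ** 2 * math.sqrt(1 - (v / c) ** 2)

def newtonian_term(v, m0=1.0, c=1.0):
    """Rest-energy constant plus the kinetic energy that appears in eq. (3.7)."""
    return -m0 * c ** 2 + 0.5 * m0 * v ** 2

v = 1e-3                                   # speed in units of c
difference = relativistic_term(v) - newtonian_term(v)
leading_correction = v ** 4 / 8            # next term of the expansion for m0 = c = 1
```

The two expressions differ only at order $v^4/c^2$, which is why the familiar kinetic energy emerges for slow particles.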
Starting with an action has many advantages:
• Since L is a scalar, transforming to new coordinates is easy;
• It’s easy to ensure that the eqns of motion are Lorentz invariant (or Galilean
invariant as appropriate) by imposing the desired invariance on L;
• Given the required invariance and the basic form of the desired eqns (second-order,
linear, say) only a few simple expressions are candidates for Lagrangians;
• Certain constants of motion can be readily derived from evident symmetries of L
(Noether’s theorem).

3.2 Principles of Lagrangian field theory


How do we obtain partial differential eqns such as the wave eqn or Maxwell’s eqns from
Lagrangians? Specimen problem: derive the wave eqn

$$\frac{1}{c^2}\frac{\partial^2\phi}{\partial t^2}-\frac{\partial^2\phi}{\partial x^2}=0. \tag{3.9}$$

Regard φ(t, x) as a set of ∞-dimensional vectors $\phi_x(t)$, where x labels components
of φ. The Lagrangian has to be a scalar, so φ’s indices have to be ‘soaked up’ somehow.
We make a scalar out of an ordinary vector by dotting it with another vector; this
soaks up the indices of both vectors by introducing a sum over that index. Analogously,
we soak up indices x with generalizations of dot products; that is, one sums over x by
means of an integral:
$$s=\mathbf a\cdot\mathbf b=\sum_ia_ib_i\quad\leftrightarrow\quad s=(\psi,\phi)=\int dx\,\psi(x)\phi(x). \tag{3.10}$$

This leads one to expect that many (but not all) actions for partial differential equations
are evaluated by integrating a Lagrangian density $\mathcal L$ over space before performing the
usual integral over time:
$$S[\phi]=\int dt\int dx\,\mathcal L(\phi,\dot\phi). \tag{3.11}$$

In Lagrangian mechanics, S is a functional of the particle’s history x(t). Now S is
a functional of the field’s history φ(t, x). So φ has stepped into x’s place, and x has
become an independent variable with a similar standing to that of t. Consequently, in
(3.11) we’re integrating over both space and time.
In order to make the symmetry between x and t complete we henceforth allow $\mathcal L$
to involve derivatives w.r.t. x as well as w.r.t. t; then $\mathcal L=\mathcal L(\phi,\partial_\mu\phi)$ and
$$S[\phi]=\int dt\,dx\,\mathcal L(\phi,\partial_\mu\phi). \tag{3.12}$$

Finally, it doesn’t make things significantly more complicated to allow space to be fully
three-dimensional. So x becomes the 3-vector $\mathbf x$ and $(ct,\mathbf x)$ becomes the usual 4-vector
x. Since $d^4x=c\,dt\,d^3x$, we write simply
$$S[\phi]=\frac1c\int d^4x\,\mathcal L(\phi,\partial_\mu\phi). \tag{3.13}$$

At each t between ti and tf the field’s configuration φ(t, x) is chosen such that
the integral (3.13) through the space-time volume bounded by t = ti and t = tf is
extremized:

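As in Lagrangian mechanics we are specifying a solution to the 2nd-order equations
of motion by giving values of the ‘coordinates’ at two times, $t_i$ and $t_f$, rather than the
coordinates and velocities at a single time. In this case specifying the ‘coordinates’
involves giving the functional dependence of φ on x at some fixed t.
Here’s how we extremize S:

$$\begin{aligned}
0=\delta S&=S[\phi+\psi]-S[\phi]\qquad\text{where }|\psi(t,\mathbf x)|\ll|\phi(t,\mathbf x)|\\
&\simeq\frac1c\int d^4x\,\Bigl(\frac{\partial\mathcal L}{\partial\phi}\psi+\frac{\partial\mathcal L}{\partial(\partial_\mu\phi)}\partial_\mu\psi\Bigr)\\
&=\frac1c\int d^4x\,\Bigl(\frac{\partial\mathcal L}{\partial\phi}-\frac{\partial}{\partial x^\mu}\frac{\partial\mathcal L}{\partial(\partial_\mu\phi)}\Bigr)\psi+\frac1c\oint d^3x_\mu\,\frac{\partial\mathcal L}{\partial(\partial_\mu\phi)}\psi.
\end{aligned} \tag{3.14}$$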
Here the final integral $\oint$ is over the closed 3-surface that bounds the 4-dimensional
region of space-time through which L is integrated. The surface consists of the initial
and final hypersurfaces, and the 3-surface swept out by a 2-surface at spatial ∞ as t
varies from ti to tf . This integral vanishes because ψ is zero throughout the domain
integrated over: the variation ψ vanishes on the initial and final hypersurfaces by
hypothesis, and we force it to vanish at spatial ∞ also in order to ensure that the
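varied field φ + ψ satisfies the same bdy condition as the unvaried field φ. Thus
$$\delta S=\frac1c\int d^4x\,\Bigl(\frac{\partial\mathcal L}{\partial\phi}-\frac{\partial}{\partial x^\mu}\frac{\partial\mathcal L}{\partial(\partial_\mu\phi)}\Bigr)\psi. \tag{3.15}$$

If δS = 0 is to hold for any ψ(t, x) that vanishes on the initial and final hypersurfaces, we
clearly require that
$$\frac{\partial\mathcal L}{\partial\phi}-\frac{\partial}{\partial x^\mu}\Bigl(\frac{\partial\mathcal L}{\partial(\partial_\mu\phi)}\Bigr)=0. \tag{3.16}$$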
This p.d.e. is the Euler-Lagrange equation for a field. It is the field equation that
follows from the Lagrangian density L.

3.3 Real scalar field


What p.d.e.s can we derive from a Lagrangian density for a real scalar field φ? The
scalars to consider are φ itself and powers of φ. The only way to make a scalar out of
the gradient $\partial_\mu\phi$ is to contract it on itself. Consider therefore
$$\mathcal L=\tfrac12\bigl(-|\partial\phi|^2-K^2\phi^2\bigr)=\tfrac12\bigl(-\eta^{\alpha\beta}\partial_\alpha\phi\,\partial_\beta\phi-K^2\phi^2\bigr), \tag{3.17}$$

where the sign of $|\partial\phi|^2$ has been chosen so that its contributions to $\mathcal L$ are k.e. − p.e.
and the term with the constant K is the field’s self-energy. Then $\partial\mathcal L/\partial\phi=-K^2\phi$ and
$\partial\mathcal L/\partial(\partial_\mu\phi)=-\tfrac12(\eta^{\mu\beta}\partial_\beta\phi+\eta^{\alpha\mu}\partial_\alpha\phi)=-\partial^\mu\phi$, so (3.16) yields

$$\begin{aligned}
0=-\partial_\mu\partial^\mu\phi+K^2\phi&=\frac{\partial^2\phi}{\partial(x^0)^2}-\nabla^2\phi+K^2\phi\\
&=\frac{1}{c^2}\frac{\partial^2\phi}{\partial t^2}-\nabla^2\phi+K^2\phi.
\end{aligned}$$
Thus the wave equation emerges with K = 0 from the Lagrangian density which is the
simplest possible function of $\partial_\mu\phi$ only. If $K\ne0$ waves are evanescent (complex k) if
ω < Kc, just as electromagnetic waves are evanescent in a plasma below the plasma
frequency.
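
Substituting a plane wave $\phi\propto e^{i(kx-\omega t)}$ into the field equation gives the dispersion relation $k^2=\omega^2/c^2-K^2$. The helper below is a hypothetical illustration (its name and default units are assumptions) that returns the wavenumber for given ω and K.

```python
import cmath

def wavenumber(omega, K, c=1.0):
    """Solve k^2 = omega^2/c^2 - K^2; the result is imaginary when omega < K c."""
    return cmath.sqrt((omega / c) ** 2 - K ** 2)

k_propagating = wavenumber(2.0, 1.0)     # omega > K c: real k
k_evanescent = wavenumber(0.5, 1.0)      # omega < K c: imaginary k
```

A purely imaginary k turns the oscillating factor $e^{ikx}$ into an exponential decay, which is what is meant by an evanescent wave.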

3.4 Klein-Gordon equation


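What p.d.e.s can we derive for a complex-valued scalar field ψ? Minor generalization
of our work on the real scalar field leads us to
$$\mathcal L(\psi,\partial_\mu\psi)=-\tfrac12\bigl(|\partial\psi|^2+K^2|\psi|^2\bigr). \tag{3.18}$$

By $|\partial\psi|^2$ we mean
$$|\partial\psi|^2=-\frac{1}{c^2}\frac{\partial\psi^*}{\partial t}\frac{\partial\psi}{\partial t}+\nabla\psi^*\cdot\nabla\psi. \tag{3.19}$$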

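Differentiating w.r.t. ψ is slightly tricky because $\psi^*$ is a function $\psi^*(\psi)$ of ψ. We
handle this by writing ψ = u + iv and treating the real and imaginary parts u and
v as independent real fields:

$$\frac{\partial|\psi|^2}{\partial u}=\frac{\partial}{\partial u}(u^2+v^2)=2u,\qquad \frac{\partial|\psi|^2}{\partial v}=2v. \tag{3.20}$$

Further
$$|\partial\psi|^2=\partial(u-iv)\cdot\partial(u+iv)=|\partial u|^2+|\partial v|^2.$$

So
$$\frac{\partial|\partial\psi|^2}{\partial(\partial_\mu u)}=2\partial^\mu u\ ;\qquad \frac{\partial|\partial\psi|^2}{\partial(\partial_\mu v)}=2\partial^\mu v. \tag{3.21}$$

Hence the field eqns are

$$\left.\begin{aligned}\frac{\partial}{\partial x^\mu}\partial^\mu u-K^2u&=0\\ \frac{\partial}{\partial x^\mu}\partial^\mu v-K^2v&=0\end{aligned}\right\}\quad\Rightarrow\quad\partial_\mu\partial^\mu\psi-K^2\psi=0. \tag{3.22}$$

Spin-0 particles of mass $m_0$ are excitations of a scalar field that satisfies $\hat p^2\psi=-m_0^2c^2\psi$.
Substituting $\hat E=i\hbar\partial_t$ and $\hat p_i=-i\hbar\partial_i$ this becomes the Klein-Gordon eqn

$$\partial_\mu\partial^\mu\psi=\frac{m_0^2c^2}{\hbar^2}\psi. \tag{3.23}$$

The K–G eqn is obtained from equations (3.22) by setting $K=m_0c/\hbar$.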


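The following result simplifies the variation of an action that depends on a complex
field ψ. Suppose $\delta f(\psi,\psi^*)=0$. We have

$$0=\delta f=\frac{\partial f}{\partial\psi}(\delta u+i\delta v)+\frac{\partial f}{\partial\psi^*}(\delta u-i\delta v).$$

Since δu and δv are arbitrary, we conclude

$$\left.\begin{aligned}0&=\frac{\partial f}{\partial\psi}+\frac{\partial f}{\partial\psi^*}\\ 0&=\frac{\partial f}{\partial\psi}-\frac{\partial f}{\partial\psi^*}\end{aligned}\right\}\quad\Rightarrow\quad\left\{\begin{aligned}0&=\frac{\partial f}{\partial\psi}\\ 0&=\frac{\partial f}{\partial\psi^*}.\end{aligned}\right.$$

Thus we can proceed as though δψ and $\delta\psi^*$ were independent, though they are not.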

3.5 Dirac equation

In §2.2 we saw that from a Dirac spinor we can construct the scalar $\bar\psi\cdot\psi$ and the
vector $\bar\psi\gamma^\mu\psi$. In light of our discussion of the Klein–Gordon equation it is natural
to take the potential energy density of a Dirac field to be proportional to $\bar\psi\cdot\psi$. For
the kinetic term we could choose $(\partial^\mu\bar\psi)(\partial_\mu\psi)$ but a simpler choice is $i\bar\psi\gamma^\mu\partial_\mu\psi$, where
the factor i is inserted for later convenience. Consider therefore the field equation that
follows from
$$\mathcal L=\bar\psi\,i\gamma^\mu\partial_\mu\psi-\frac{m_0c}{\hbar}\,\bar\psi\cdot\psi. \tag{3.24}$$

A variation of ψ induces a corresponding variation in $\bar\psi$ and thus causes $\mathcal L$ to change
by
$$\delta\mathcal L=\delta\bar\psi\Bigl(i\gamma^\mu\partial_\mu-\frac{m_0c}{\hbar}\Bigr)\psi+\bar\psi\Bigl(i\gamma^\mu\partial_\mu-\frac{m_0c}{\hbar}\Bigr)\delta\psi. \tag{3.25}$$
Suppose we choose to vary only the first component of ψ, that is we choose
$\delta\psi=(a+ib,0,0,0)$, where a and b are real functions on space-time. Then $\delta\bar\psi=(a-ib,0,0,0)\gamma^0$.
We consider two variations, one with b = 0 and then one with a = 0 and b set equal
to the function a that we used in the first case. From the stationarity of the action it
follows that
$$\begin{aligned}
0&=\int d^4x\,\Bigl\{(a,0,0,0)\gamma^0\Bigl(i\gamma^\mu\partial_\mu-\frac{m_0c}{\hbar}\Bigr)\psi+\bar\psi\Bigl(i\gamma^\mu\partial_\mu-\frac{m_0c}{\hbar}\Bigr)(a,0,0,0)\Bigr\}\\
0&=\int d^4x\,\Bigl\{(-a,0,0,0)\gamma^0\Bigl(i\gamma^\mu\partial_\mu-\frac{m_0c}{\hbar}\Bigr)\psi+\bar\psi\Bigl(i\gamma^\mu\partial_\mu-\frac{m_0c}{\hbar}\Bigr)(a,0,0,0)\Bigr\}.
\end{aligned} \tag{3.26}$$

Subtracting the equations and exploiting the arbitrariness of a(x), we obtain

$$0=(a,0,0,0)\gamma^0\Bigl(i\gamma^\mu\partial_\mu-\frac{m_0c}{\hbar}\Bigr)\psi. \tag{3.27}$$

Repeating this exercise for each component of ψ, we obtain the Dirac equation
$$0=\Bigl(i\gamma^\mu\partial_\mu-\frac{m_0c}{\hbar}\Bigr)\psi. \tag{3.28}$$

As in the case of the Klein–Gordon action, the equation we get at the end is the one
we would have obtained if we had (incorrectly) argued that ψ and $\bar\psi$ are independent
variables.

Exercise (14):
Show that when we add equations (3.26) we obtain

$$0=-\partial_\mu\bar\psi\,i\gamma^\mu-\frac{m_0c}{\hbar}\bar\psi$$

and show that this is just the adjoint of the Dirac equation. [Hint: $(\gamma^0\gamma^\mu)^\dagger=\gamma^0\gamma^\mu$.]

3.6 Maxwell’s equations


What about Maxwell’s equations? These are 2nd order in A, so we look for a
Lagrangian density $\mathcal L$ that depends on A and its derivatives, $\partial_\mu A$. Moreover, Maxwell’s
eqns are linear in the fields, and thus in A. So $\mathcal L$ should be quadratic in A and $\partial_\mu A$.
Finally, $\mathcal L$ should be invariant under gauge transformations $A\to A+\partial\Lambda$, and should
involve $\partial_\mu A$ only in the combination contained in F. The shortlist of functions
satisfying these criteria contains (up to an unimportant normalization) only one candidate:

$$\begin{aligned}
\mathcal L_{\rm vac}(A,\partial_\mu A)&=\frac{1}{4\mu_0}\,\mathrm{Tr}\,F\cdot F\\
&=-\frac{1}{4\mu_0}F_{\mu\nu}F^{\mu\nu}\\
&=\frac{1}{2\mu_0}(E^2/c^2-B^2),
\end{aligned} \tag{3.29}$$

where the last equality is from (1.20). (Notice that if we associate E with kinetic
energy ($E=-\dot A/c+\cdots$) and B with potential energy, $\mathcal L_{\rm vac}$ is of the form k.e. − p.e..)
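
The last equality in (3.29) can be verified numerically. The sketch below assumes the component assignment $F_{0i}=-E_i/c$, $F_{ij}=\epsilon_{ijk}B_k$, which is what $F_{\mu\nu}=\partial_\mu A_\nu-\partial_\nu A_\mu$ gives with $A_\mu=(-\phi/c,\mathbf A)$ and metric signature $(-,+,+,+)$; equation (1.20) itself lies outside this section, so treat that assignment as an assumption. The factor $1/\mu_0$ is common to both sides and is dropped.

```python
ETA = [[-1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

def field_tensor(E, B, c=1.0):
    """Covariant F_{mu nu}, assuming F_{0i} = -E_i/c and F_{ij} = eps_{ijk} B_k."""
    F = [[0.0] * 4 for _ in range(4)]
    for i in range(3):
        F[0][i + 1] = -E[i] / c
        F[i + 1][0] = E[i] / c
    F[1][2], F[2][1] = B[2], -B[2]
    F[2][3], F[3][2] = B[0], -B[0]
    F[3][1], F[1][3] = B[1], -B[1]
    return F

def contraction(F):
    """F_{mu nu} F^{mu nu}; with a diagonal metric each index is raised by eta_mm."""
    return sum(F[m][n] * ETA[m][m] * ETA[n][n] * F[m][n]
               for m in range(4) for n in range(4))

E = (0.3, -1.2, 0.5)          # illustrative field values
B = (0.7, 0.1, -0.4)
c = 2.0

lhs = -0.25 * contraction(field_tensor(E, B, c))
rhs = 0.5 * (sum(e * e for e in E) / c ** 2 - sum(b * b for b in B))
```

`lhs` and `rhs` agree, confirming that $-\tfrac14F_{\mu\nu}F^{\mu\nu}=\tfrac12(E^2/c^2-B^2)$ under the stated conventions.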
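The field equations associated with the Lagrangian density (3.29) are

$$\frac{\partial}{\partial x^\beta}\Bigl(\frac{\partial\mathcal L_{\rm vac}}{\partial(\partial_\beta A_\mu)}\Bigr)=0.$$

Now
$$\begin{aligned}
\frac{\partial F_{\mu\nu}}{\partial(\partial_\beta A_\alpha)}&=\frac{\partial}{\partial(\partial_\beta A_\alpha)}\bigl(\partial_\mu A_\nu-\partial_\nu A_\mu\bigr)\\
&=\delta^\beta_\mu\delta^\alpha_\nu-\delta^\beta_\nu\delta^\alpha_\mu,
\end{aligned} \tag{3.30}$$
so
$$\begin{aligned}
\frac{\partial\mathcal L_{\rm vac}}{\partial(\partial_\beta A_\alpha)}&=-\frac{1}{4\mu_0}\frac{\partial(F_{\mu\nu}F^{\mu\nu})}{\partial(\partial_\beta A_\alpha)}=-\frac{1}{4\mu_0}\frac{\partial(F_{\mu\nu}\eta^{\mu\kappa}\eta^{\nu\lambda}F_{\kappa\lambda})}{\partial(\partial_\beta A_\alpha)}\\
&=-\frac{1}{2\mu_0}(\delta^\beta_\mu\delta^\alpha_\nu-\delta^\beta_\nu\delta^\alpha_\mu)F^{\mu\nu}\\
&=-\frac{1}{2\mu_0}(F^{\beta\alpha}-F^{\alpha\beta})\\
&=\frac{1}{\mu_0}F^{\alpha\beta}.
\end{aligned} \tag{3.31}$$
The field equations are therefore
$$\frac{\partial F^{\alpha\beta}}{\partial x^\beta}=0, \tag{3.32}$$
that is, 4 of Maxwell’s 8 field eqns for an e.m. field in vacuo.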
To get Maxwell’s eqns in the presence of charges we need to add to the action S
obtained by integrating (3.29) over spacetime, the action of particles in a given e.m.
field. For a single charged particle the latter is given by (3.2). What does this suggest
for the action associated with a swarm of particles of charge q, mass $m_0$ that are
moving with 4-velocity v(x) and in their rest-frame have number density n(x)? Well,
the form of (3.2) suggests that the part of $\mathcal L$ which depends on both the e.m. field
and the particles (the ‘interaction term’), is proportional to the dot product of A with
the current density $j=qn_0v$ associated with the particles. So we speculate that the
interaction term is j · A. The current density contributed by a particle of charge q that
moves on the world-line X(τ), is
$$j(x)=qc\int d\tau\,\dot X\,\delta(x-X)=qc\int dX\,\delta(x-X). \tag{3.33}$$

Exercise (15):
Check the validity of (3.33) by (i) showing that it is dimensionally correct, (ii)
showing that $\int d^3x\,\mathbf j=q(d\mathbf X/dt)$, i.e., the total current is just q times the
Newtonian velocity, and (iii) showing similarly that the total charge in any spatial slice
is always q.
Using this result, the contribution to the action from our conjectured term is
$$\begin{aligned}
S_{\rm interaction}&=\frac1c\int d^4x\,(j\cdot A)\big|_x\\
&=q\int d^4x\,d\tau\,\dot X\cdot A(x)\,\delta(x-X)\\
&=q\int d\tau\,\dot X\cdot A(X)
\end{aligned} \tag{3.34}$$

which agrees with (3.2).

So long as we are only interested in getting the field eqns, which are obtained
by varying A, we don’t need to bother with the contribution to S from matter alone
(which is independent of A). So let’s see whether this action begets Maxwell’s eqns
with sources:
$$S=\frac1c\int d^4x\,\Bigl(j\cdot A+\frac{1}{4\mu_0}\mathrm{Tr}\,F\cdot F\Bigr). \tag{3.35}$$
Varying A with the aid of previous results, the field eqns are found to be

$$j^\mu-\frac{1}{\mu_0}\frac{\partial F^{\mu\nu}}{\partial x^\nu}=0 \tag{3.36}$$

in agreement with (1.48). The other four Maxwell’s eqns don’t come from minimizing
the action but from the fact that F is the 4-curl of A. So they are geometrical rather
than dynamical in nature.

3.7 Noether’s theorem for internal symmetries

Does Noether’s theorem for the Lagrangians of particle motion extend to Lagrangian
densities for fields? Actually it yields two closely related results: one for internal
symmetries and one for external symmetries, such as Lorentz invariance. We deal with
internal symmetries first.
Often $\mathcal L(A,\partial_\mu A)$ is invariant under some transformation of the field A. For
example, in the case of e.m. $\mathcal L$ is invariant under $A\to A+\partial\Lambda$ where Λ(x) is any scalar
function.⁴ Whenever there is a point-by-point invariance of this type, we can write
$$\begin{aligned}
0=\delta\mathcal L&=\frac{\partial\mathcal L}{\partial A}\cdot\delta A+\frac{\partial\mathcal L}{\partial(\partial_\mu A)}\cdot\delta(\partial_\mu A)\\
&=\frac{\partial}{\partial x^\mu}\Bigl(\frac{\partial\mathcal L}{\partial(\partial_\mu A)}\Bigr)\cdot\delta A+\frac{\partial\mathcal L}{\partial(\partial_\mu A)}\cdot\partial_\mu(\delta A)\\
&=\frac{\partial}{\partial x^\mu}\Bigl(\delta A\cdot\frac{\partial\mathcal L}{\partial(\partial_\mu A)}\Bigr),
\end{aligned} \tag{3.37}$$
where the field eqns (3.16) have been used. The final line states that the current
density $j^\mu$ has vanishing divergence, where
$$j^\mu\equiv\delta A\cdot\frac{\partial\mathcal L}{\partial(\partial_\mu A)}. \tag{3.38}$$
The vanishing of ∂ · j implies that the integral $J\equiv\int d^3x_\mu\,j^\mu\equiv\int dx^\alpha dx^\beta dx^\gamma\,j^\mu\epsilon_{\mu\alpha\beta\gamma}$
is the same for any two large 3-dimensional spatial slices: Given two such slices we
can extend these into the closed surface bounding a spacetime volume by adding the
3-surface formed by a very large spherical shell as it propagates in time from one spatial
slice to the other [see fig. above (3.14)]. ∂ · j = 0 implies that the flux into this volume
has to equal that out of it, so provided j vanishes on the shell, the flux in through the
earlier spatial slice has to equal the flux out through the later slice. Thus the internal
symmetry of $\mathcal L$ has generated a conserved flux J.

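E.m. charge conservation How does this idea work out in e.m? Setting δA =
∂Λ, we have
$$\begin{aligned}
j^\mu&=\bigl(\partial_\alpha\Lambda\bigr)\frac{\partial\mathcal L_{\rm vac}}{\partial(\partial_\mu A_\alpha)}\\
&=\frac{1}{\mu_0}\bigl(\partial_\alpha\Lambda\bigr)F^{\alpha\mu},
\end{aligned} \tag{3.39}$$
where use has been made of (3.31). Equating to zero the divergence of this we find
that
$$\begin{aligned}
0&=\frac{\partial^2\Lambda}{\partial x^\mu\partial x^\alpha}F^{\alpha\mu}+\frac{\partial\Lambda}{\partial x^\alpha}\frac{\partial F^{\alpha\mu}}{\partial x^\mu}\\
&=\frac{\partial\Lambda}{\partial x^\alpha}\partial_\mu F^{\alpha\mu},
\end{aligned}$$
where the first term on the right has been eliminated by virtue of F’s antisymmetry.
Since we can arrange for ∂Λ to be any vector at a given point, (3.39) implies that
$\partial_\mu F^{\alpha\mu}=0$. This is just (3.32), the standard field eqn for e.m. in vacuo.
To obtain a more interesting Noether invariant one has to start from $\mathcal L$ for the
e.m. field plus a matter field, say ψ.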
⁴ Notice the difference with the least-action principle, which states that $0=c\,\delta S=\delta\int d^4x\,\mathcal L$ for
any variation δA; for most variations, $\mathcal L$ changes at each point, it is just its integral which is invariant.

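Klein-Gordon current The Klein-Gordon L (3.18) is invariant under changes in the phase of ψ, i.e., $\psi \to e^{i\theta}\psi$. When θ is small we have $\delta u + i\,\delta v = \delta\psi \simeq i\theta\psi$, so the changes in the real and imaginary parts of ψ are
$$ \delta u = -\theta v\ ;\qquad \delta v = \theta u. \tag{3.40} $$
Since we are considering L to be a function of (u, v) and their derivatives, the dot in (3.38) has to be interpreted as a sum over u and v. Using our results (3.21) we find that the conserved current is
$$ j_\mu = \delta u\,\partial_\mu u + \delta v\,\partial_\mu v = \theta\Big(-v\frac{\partial u}{\partial x^\mu} + u\frac{\partial v}{\partial x^\mu}\Big) = \frac{\theta}{2i}\Big(\psi^*\frac{\partial\psi}{\partial x^\mu} - \psi\frac{\partial\psi^*}{\partial x^\mu}\Big). \tag{3.41} $$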
It is simple to verify ∂ · j = 0 by taking the divergence and using the Klein-Gordon equation and its complex conjugate to eliminate $\Box\psi$ and $\Box\psi^*$.
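The verification can be sketched in a few lines. This is a sketch only: it assumes the Klein-Gordon equation can be written as $\Box\psi = \kappa^2\psi$ with $\kappa$ real, where the symbol $\kappa$ and the sign convention are assumptions made here for illustration. The argument needs only that the coefficient is real.

```latex
\begin{align*}
\partial^\mu j_\mu
  &= \frac{\theta}{2i}\,\partial^\mu\big(\psi^*\partial_\mu\psi - \psi\,\partial_\mu\psi^*\big)\\
  &= \frac{\theta}{2i}\big(\partial^\mu\psi^*\,\partial_\mu\psi + \psi^*\Box\psi
       - \partial^\mu\psi\,\partial_\mu\psi^* - \psi\,\Box\psi^*\big)\\
  &= \frac{\theta}{2i}\big(\psi^*\kappa^2\psi - \psi\,\kappa^2\psi^*\big) = 0 .
\end{align*}
```

The cross terms cancel identically; the field equation is needed only for the two terms containing $\Box$.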
Consider the particle flux through some small region W of spacetime. To the past and future W is bounded by the 3-dimensional sets of events that occur in some physical container (an empty beer can?) at the times $t_0$ and $t_1 > t_0$ in the can’s rest frame. In spacetime these sets are represented by 4-vectors $V^{(0)}_\mu$ and $V^{(1)}_\mu$. We orientate $V^{(0)}_\mu$ so that it points into the past, while $V^{(1)}_\mu$ looks to the future. Since the contents of the can may not be uniform, we decompose both $V^{(0)}$ and $V^{(1)}$ into a large number of small pieces dV, each centred on a different position within the can. The balance of W’s boundary comprises the 3-dimensional set of events that occur on the can’s surface at times between $t_0$ and $t_1$. We represent this part of W’s boundary by elements $dV^{(s)}_\mu(x)$, each of which points out of the can.
The number of particles in the can at $t_0$ is $N^{(0)} = -\int_{{\rm can},t_0} dV^{(0)}_\nu\, j^\nu$, while the number present at $t_1$ is $N^{(1)} = \int_{{\rm can},t_1} dV^{(1)}_\nu\, j^\nu$. If particle number is to be conserved, the difference $N^{(1)} - N^{(0)}$ must represent the number of particles that flow into the can between $t_0$ and $t_1$. Thus particle conservation requires that
$$ \iiint_{{\rm can},t_1} dV^{(1)}_\nu\, j^\nu + \iiint_{{\rm can},t_0} dV^{(0)}_\nu\, j^\nu = -\iiint_{{\rm surface},\ t_0<t<t_1} dV^{(s)}_\nu\, j^\nu. $$

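Thus in a natural notation we have
$$ \oint dV_\nu\, j^\nu = 0 \quad\Leftrightarrow\quad \partial_\nu j^\nu = 0. \tag{3.42} $$
This discussion and equation (3.41) show that
$$ j_0 = \frac{\theta}{2ic}\Big(\psi^*\frac{\partial\psi}{\partial t} - \psi\frac{\partial\psi^*}{\partial t}\Big) \tag{3.43} $$
is proportional to the particle density in the coordinate rest-frame, and because in that frame $dV^{(s)} = d^2x_i\,c\,dt$, the flux of particles in the coordinate rest frame is proportional to
$$ j_i = \frac{\theta c}{2i}\Big(\psi^*\frac{\partial\psi}{\partial x^i} - \psi\frac{\partial\psi^*}{\partial x^i}\Big). \tag{3.44} $$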
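By comparison, non-relativistic quantum mechanics yields for Hamiltonian $H = p^2/2m$
$$ \begin{aligned}
\frac{d}{dt}\int d^3x\,|\psi|^2 &= \int d^3x\,\Big(\frac{\partial\psi^*}{\partial t}\psi + \psi^*\frac{\partial\psi}{\partial t}\Big)\\
&= \int d^3x\,\Big(\frac{H\psi^*}{-i\hbar}\psi + \psi^*\frac{H\psi}{i\hbar}\Big)\\
&= \frac{\hbar}{2im}\int d^3x\,\Big((\nabla^2\psi^*)\psi - \psi^*\nabla^2\psi\Big)\\
&= \frac{\hbar}{2im}\oint d^2x_i\,\Big((\nabla_i\psi^*)\psi - \psi^*\nabla_i\psi\Big).
\end{aligned} \tag{3.45} $$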
Hence the Klein-Gordon expression for the particle flux is essentially identical with the
non-relativistic one, but the expressions for the particle density are rather different in
the two cases.

3.8 Noether’s theorem and Lorentz invariance

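The Lagrangian density L of a Lorentz-covariant theory depends on x only through the field A and its derivatives, i.e., it has no explicit space-time dependence. Consider an infinitesimal shift in the coordinate origin which changes the coordinates of the point x to $x' \equiv x + a$, where a is very small. Then the difference in the value of L at x and at the point x + a whose coordinates in the unprimed frame coincide with x’s coordinates in the primed frame is
$$ \begin{aligned}
\delta L &= \Big(\frac{\partial L}{\partial A}\cdot\frac{\partial A}{\partial x^\alpha} + \frac{\partial L}{\partial(\partial_\nu A)}\cdot\frac{\partial(\partial_\nu A)}{\partial x^\alpha}\Big)a^\alpha\\
&= \bigg(\frac{\partial}{\partial x^\nu}\Big(\frac{\partial L}{\partial(\partial_\nu A)}\Big)\cdot\frac{\partial A}{\partial x^\alpha} + \frac{\partial L}{\partial(\partial_\nu A)}\cdot\frac{\partial^2 A}{\partial x^\alpha\,\partial x^\nu}\bigg)a^\alpha\\
&= \frac{\partial}{\partial x^\nu}\Big(\frac{\partial L}{\partial(\partial_\nu A)}\cdot\frac{\partial A}{\partial x^\alpha}\Big)a^\alpha,
\end{aligned} \tag{3.46} $$
where the second line follows from the Euler-Lagrange equations and the third from the product rule.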

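On the other hand, if we simply regard L as a function of x through the fields, we have
$$ \delta L = a^\alpha\frac{\partial L}{\partial x^\alpha} = \frac{\partial}{\partial x^\nu}\big(L\,\delta^\nu_\alpha a^\alpha\big). \tag{3.47} $$
Equating these two expressions for δL we have
$$ 0 = \frac{\partial}{\partial x^\nu}\Big(\frac{\partial L}{\partial(\partial_\nu A)}\cdot\frac{\partial A}{\partial x^\alpha} - L\,\delta^\nu_\alpha\Big)a^\alpha. \tag{3.48} $$
Furthermore, a is an arbitrary small vector so its coefficient in (3.48) must vanish. Thus from the fact that L depends on x only through the fields we can conclude that the tensor
$$ \hat T^\nu{}_\mu \equiv -\Big(\frac{\partial L}{\partial(\partial_\nu A)}\cdot\frac{\partial A}{\partial x^\mu} - L\,\delta^\nu_\mu\Big) \quad\Rightarrow\quad \hat T^{00} = \frac{\partial L}{\partial\dot A}\cdot\dot A - L \tag{3.49} $$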

has vanishing divergence: $\partial_\nu\hat T^\nu{}_\mu = 0$. $\hat T$ is the canonical energy-momentum tensor. The vanishing of its divergence expresses energy-momentum conservation in the same way that $\partial_\nu j^\nu = 0$ implies conservation of particles; the difference between the two cases is that $\partial_\nu\hat T^\nu{}_\mu = 0$ implies conservation of four quantities: energy and x, y and z momentum. Notice the similarity of (3.49) to the conventional definition of a system’s Hamiltonian: $H = p_\mu\dot q^\mu - L$.
Again using (3.31), we find for the canonical energy-momentum tensor of the e.m. field
$$ \hat T^\nu{}_\mu = -\frac{1}{\mu_0}\Big(F^{\alpha\nu}\frac{\partial A_\alpha}{\partial x^\mu} + \tfrac14 F^{\alpha\beta}F_{\alpha\beta}\,\delta^\nu_\mu\Big). \tag{3.50} $$
Even when we lower $\hat T$’s first index by premultiplying by $\eta_{\kappa\nu}$, this isn’t symmetric like the T of §1.4. We’d very much like $\hat T$ to be symmetric, if only because Einstein’s equations require it to be so. Also we’d like the energy-momentum tensor to depend on A only through F. We can attain both goals by adding into $\hat T$ what’s necessary to upgrade the derivative of A in the first term into an F. The required item is
$$ \Delta^\nu{}_\mu = \frac{1}{\mu_0}F^{\alpha\nu}\frac{\partial A_\mu}{\partial x^\alpha}. \tag{3.51} $$
In the absence of sources (which is when we would expect the energy-momentum tensor to be divergence-free) ∆ is itself divergence free:
$$ \partial_\nu\Delta^\nu{}_\mu = \frac{1}{\mu_0}\frac{\partial^2(F^{\alpha\nu}A_\mu)}{\partial x^\nu\,\partial x^\alpha} = 0. \tag{3.52} $$
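The two steps hidden in (3.52) can be spelled out as follows. This is a sketch that assumes only the vacuum field equation $\partial_\alpha F^{\alpha\nu} = 0$ (equivalent to (3.32) by the antisymmetry of F) and the symmetry of second partial derivatives.

```latex
\begin{align*}
\partial_\nu\Delta^\nu{}_\mu
  &= \frac{1}{\mu_0}\,\partial_\nu\big(F^{\alpha\nu}\,\partial_\alpha A_\mu\big)\\
  &= \frac{1}{\mu_0}\,\partial_\nu\big[\partial_\alpha(F^{\alpha\nu}A_\mu)
       - (\partial_\alpha F^{\alpha\nu})A_\mu\big]
   = \frac{1}{\mu_0}\,\partial_\nu\partial_\alpha\big(F^{\alpha\nu}A_\mu\big)
   && (\partial_\alpha F^{\alpha\nu}=0 \text{ in vacuo})\\
  &= 0
   && (\partial_\nu\partial_\alpha \text{ symmetric},\ F^{\alpha\nu} \text{ antisymmetric}).
\end{align*}
```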

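So if we define $T \equiv \hat T + \Delta$, T will be symmetric and divergence-free in vacuo. The energy-momentum tensor of the e.m. field is then
$$ \begin{aligned}
T_\mu{}^\nu &= -\frac{1}{\mu_0}\Big(F^{\alpha\nu}\frac{\partial A_\alpha}{\partial x^\mu} - F^{\alpha\nu}\frac{\partial A_\mu}{\partial x^\alpha} + \tfrac14 F^{\alpha\beta}F_{\alpha\beta}\,\delta^\nu_\mu\Big)\\
&= -\frac{1}{\mu_0}\Big(F_{\mu\alpha}F^{\alpha\nu} + \tfrac14 F^{\alpha\beta}F_{\alpha\beta}\,\delta^\nu_\mu\Big)
\end{aligned} \tag{3.53} $$
in agreement with (1.28).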


When charges are present, the field Lagrangian density includes a term j · A which breaks translational invariance if j is regarded as fixed. Consequently, the energy-momentum tensor made out of $j\cdot A + L_{\rm vac}$ does not have vanishing divergence. Physically, this is because the e.m. field is exchanging energy and momentum with the charges. If we add a term to the Lagrangian that describes the dynamics of the charges, the entire Lagrangian (charges plus interaction plus vacuum field) will be translationally invariant and give rise to an energy-momentum tensor that has vanishing divergence. In fact, we cannot regard a system as isolated until it has been expanded to the point that its Lagrangian is translationally invariant, and gives rise to a conserved energy-momentum tensor.
We shall see below that one of the strange features of gravity is that a system that
interacts with other systems only gravitationally has a conserved energy-momentum
tensor even though, physically, it is exchanging energy and momentum with other
systems – for example by emitting gravitational radiation. This singular feature of
gravity makes it very hard to pin energy down in G.R.
4 Newton’s Theory & the Principle of Equivalence

4.1 Newton’s Theory


According to Newton, every body attracts every other body with a force that is proportional to the product of the masses of the two bodies and inversely proportional to the square of the distance between them. Hence the force on a unit mass at x that is generated by a distribution of matter of density $\rho(x')$ is
$$ f(x) = G\int\frac{x' - x}{|x' - x|^3}\,\rho(x')\,d^3x', \tag{4.1} $$
where $G = 6.672(4)\times10^{-11}\,{\rm m^3\,kg^{-1}\,sec^{-2}}$ is Newton’s constant. If we define the gravitational potential Φ(x) by
$$ \Phi(x) = -G\int\frac{\rho(x')}{|x' - x|}\,d^3x', $$
and notice that
$$ \nabla_x\Big(\frac{1}{|x' - x|}\Big) = \frac{x' - x}{|x' - x|^3}, $$
we find that we may write f as
$$ f(x) = \nabla_x\int\frac{G\rho(x')}{|x' - x|}\,d^3x' = -\nabla\Phi. \tag{4.2} $$

If we take the divergence of equation (4.1), we find
$$ \nabla\cdot f(x) = G\int\nabla_x\cdot\Big(\frac{x' - x}{|x' - x|^3}\Big)\rho(x')\,d^3x'. \tag{4.3} $$
But
$$ \nabla_x\cdot\Big(\frac{x' - x}{|x' - x|^3}\Big) = -4\pi\delta(x' - x) \quad\text{(where δ is the Dirac δ-function)} \tag{4.4} $$
as one may show, on the one hand by evaluating the derivative at $x \neq x'$, and on the other hand by using the divergence theorem to integrate the left side through a small sphere centred on $x = x'$. Combining equations (4.2), (4.3) and (4.4) we obtain Poisson’s equation
$$ 4\pi G\rho = \nabla^2\Phi = -\nabla\cdot f. \tag{4.5} $$
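As a quick numerical sanity check of (4.5), the sketch below evaluates a finite-difference Laplacian of the standard interior potential of a uniform sphere, $\Phi = -\tfrac23\pi G\rho\,(3R^2 - r^2)$, and compares it with $4\pi G\rho$. The density, radius, sample point and helper names are illustrative assumptions, not values taken from these notes.

```python
import math

G = 6.672e-11      # Newton's constant in SI units, as quoted in the text
RHO = 5500.0       # assumed uniform density in kg/m^3 (illustrative)
R = 6.4e6          # assumed sphere radius in m (illustrative)

def phi(x, y, z):
    """Potential inside a uniform sphere: Phi = -(2/3) pi G rho (3 R^2 - r^2)."""
    r2 = x * x + y * y + z * z
    return -(2.0 / 3.0) * math.pi * G * RHO * (3.0 * R * R - r2)

def laplacian(f, x, y, z, h):
    """Second-order central-difference estimate of the Laplacian of f."""
    total = (f(x + h, y, z) + f(x - h, y, z)
             + f(x, y + h, z) + f(x, y - h, z)
             + f(x, y, z + h) + f(x, y, z - h)
             - 6.0 * f(x, y, z))
    return total / (h * h)

point = (1.0e6, 2.0e6, -0.5e6)            # a point well inside the sphere
lhs = 4.0 * math.pi * G * RHO             # left side of Poisson's equation
rhs = laplacian(phi, *point, h=1.0e3)     # numerical Laplacian of Phi
relative_error = abs(rhs - lhs) / lhs
print(lhs, rhs, relative_error)
```

Because the interior potential is quadratic in the coordinates, the central difference is exact up to rounding, so the two numbers should agree closely.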

Elegant though it is, this equation cannot represent the whole truth about gravitational physics since it is not constructed according to the rules of tensor calculus summarized in §2.7; if the right side of equation (4.5) is to form an n-tuple, it must form a scalar since it has only one component. On the other hand, since mass is just a manifestation of energy, we expect the quantity ρ appearing on the left side of equation (4.5) to represent energy density, and this we know to form the 00-component of the 10-tuple T. So we either have to think of some scalar thing to put on the left in the place of ρ, or we have to augment Φ with a whole bunch of extra potentials, its companions in some new 10-tuple g, and somehow extend the single equation (4.5) to a set of ten equations from which the whole set of potentials can be determined.
Consideration of the predicament of a physicist who knows about relativity and electrostatics but not about magnetism will clarify this point. This person looks at the electrostatic form of Poisson’s equation
$$ \nabla^2\phi = -q/\epsilon_0, \quad\text{where q is charge density,} $$
and thinks
“q isn’t a scalar because of the Lorentz-Fitzgerald contraction (in fact, q is the 0th component of the current density j),5 so φ can’t be a scalar either. Seems I’ll have to augment φ with three other potentials, say $A_x$, $A_y$ and $A_z$. Then that $\nabla^2$ won’t do either, because it’s no kind of n-tuple. I’ll replace it with the d’Alembertian, which is a scalar. Then I’ll have
$$ \Big(\nabla^2 - \frac{1}{c^2}\frac{\partial^2}{\partial t^2}\Big)\phi = -\frac{q}{\epsilon_0} \quad\text{and}\quad \Big(\nabla^2 - \frac{1}{c^2}\frac{\partial^2}{\partial t^2}\Big)A_i = -\frac{j_i}{\epsilon_0}.\text{”} $$
By this point our friend would be well on the way to a Nobel prize.
We shall see that the natural generalization of this argument to the case of gravity yields
$$ \Big(\nabla^2 - \frac{1}{c^2}\frac{\partial^2}{\partial t^2}\Big)g = \text{constant}\times T. $$
However, Einstein showed that the way forward is not to tinker thus with Newtonian
gravity, but to assign to the gravitational force a unique position as the force generated
by the very dynamics of spacetime itself. The stimulus for this remarkable intellectual
leap was the modern form of Galileo’s famous observation that all bodies fall at the
same speed.

4.2 The Principle of Equivalence

Inertial & gravitational mass As conventionally stated Newton’s laws of mo-


tion are part definition and part empirical law. The purely empirical content can be
summed up by the statements:
(i) the more carefully one isolates a body from external influences, the more nearly
does its velocity v remain constant;
5 See equation (1.47).
(ii) when several otherwise isolated bodies α = 1, . . . , N interact with one another, it is possible to assign a number $m_\alpha$ to each body such that the quantity $p \equiv \sum_\alpha m_\alpha v_\alpha$ remains constant.
We call $m_\alpha$ the inertial mass of body α. When bodies are interacting, and therefore have changing individual momenta $p_\alpha \equiv m_\alpha v_\alpha$, it is convenient to imagine that they are acting on one another with a quantity “force”, $f_\alpha \equiv dp_\alpha/dt$. By statement (ii), $\sum_\alpha f_\alpha = 0$.
Again according to Newton, the gravitational force between bodies α and β is
$$ f_{\alpha\beta} = F\,\frac{x_\alpha - x_\beta}{|x_\alpha - x_\beta|^3}, $$
where the constant $F = GM_\alpha M_\beta$ is proportional to the product of two numbers $M_\alpha$ and $M_\beta$ characteristic of the bodies; we call these numbers the gravitational masses of the bodies. If we place two bodies β and γ at the same distance from α, their accelerations will be in the ratio
$$ \frac{|dv_\beta/dt|}{|dv_\gamma/dt|} = \frac{M_\beta\,m_\gamma}{m_\beta\,M_\gamma} = \frac{\Gamma_\beta}{\Gamma_\gamma}, \quad\text{where}\quad \Gamma_\nu \equiv \frac{M_\nu}{m_\nu}. $$

Thus β and γ will fall towards α at the same rate only if Γβ = Γγ . Newton followed
Galileo in thinking that all bodies fall at the same rate, and therefore assumed (with
a suitable choice of G) that Γ = 1 for all particles. But in the 17th century the
experimental basis of this step was not strong.
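The ratio above is easy to illustrate numerically. The following sketch uses made-up masses (all numbers and names are hypothetical) for two bodies at the same distance from a source, one with Γ = 1 and one with Γ slightly larger, and confirms that the ratio of accelerations equals the ratio of the Γ values.

```python
G = 6.672e-11  # Newton's constant (SI), as quoted in the text

def acceleration(M_source, M_grav, m_inertial, distance):
    """Magnitude of the acceleration of a body with gravitational mass M_grav
    and inertial mass m_inertial at the given distance from M_source."""
    force = G * M_source * M_grav / distance ** 2
    return force / m_inertial

# Hypothetical bodies: beta has Gamma = 1, gamma has Gamma = 1.001.
M_alpha = 5.0e24                     # gravitational mass of the source (kg)
beta = {"M": 2.0, "m": 2.0}          # gravitational and inertial mass (kg)
gamma = {"M": 3.003, "m": 3.0}
d = 7.0e6                            # common distance from alpha (m)

a_beta = acceleration(M_alpha, beta["M"], beta["m"], d)
a_gamma = acceleration(M_alpha, gamma["M"], gamma["m"], d)

ratio = a_beta / a_gamma
Gamma_ratio = (beta["M"] / beta["m"]) / (gamma["M"] / gamma["m"])
print(ratio, Gamma_ratio)
```

Here body γ falls slightly faster than body β, which is the kind of difference the experiments of the next section are designed to detect.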

4.3 Dicke–Eötvös Experiments


The most straightforward way to check whether Γ is the same for all masses is to
compare the periods of pendulums made of different materials but having the same
lengths. However, the impossibility of eliminating frictional resistance to the motion
of a pendulum severely restricts the accuracy that can be attained in experiments of
this kind.
In 1890 a Hungarian, Baron Roland v. Eötvös carried out a much more sensitive
test of the proportionality of inertial and gravitational mass. A modified form of this
experiment was repeated with greater accuracy by Robert Dicke and his students in
Princeton in the 1960’s.
Fig. 3 shows a schematic apparatus for the Dicke experiment. Two balls of ap-
proximately equal weight are attached to the ends of a short rod. This is attached to
a wire so that it can execute torsional oscillations about a vertical axis. For simplicity
we assume that a new moon is nearly eclipsing the Sun at the time of the experiment,
which begins at dusk. Then in the lower panels the acceleration of the balls on account
of the Earth’s spin lies in the plane of the paper, while that due to the Earth’s rotation
about the Sun and Moon is perpendicular to the paper. Hence we may forget about
the spin of the Earth as we balance the books as regards forces perpendicular to the
paper. The bar is aligned North-South and released. If Γ is identical for both balls and
equal to Γ for the Earth as a whole, the gravitational force towards the Sun and Moon
exactly equals the acceleration due to their instantaneous motion transverse to the
Earth-Sun line, and there is no tendency for the wire to twist. But if Γ is abnormally
large for one of the balls, say that to the South, this ball will start to fall towards the
Sun faster than the other ball, and the rod will start to twist in the direction indicated.
Consequently, the bar (which has a period of about one hour) will oscillate about an
equilibrium position that is skewed with respect to the N-S line.

Schematic of the Dicke experiment to determine Γ.


During the evening, the torque on the wire due to the extra gravitational force
on the southern ball diminishes. After midnight the torque starts to grow again, but
with reversed sign. By dawn its displacement of the centre of oscillation is exactly
opposite to that operating at dusk. By looking for a component in the motion of the
bar with period 24 hrs and the expected phase with respect to solar time, Dicke and
his collaborators were able to establish the limit $|\Gamma - 1| < 10^{-11}$.
What material should be used for the balls? Various things were tried but it is
most interesting to compare heavy with light atoms, for example aluminium with gold,
because:
(i) the nuclei of such atoms have very different proton/neutron numbers (Al = 13/14, Au = 79/118).
(ii) such atoms have very different contributions to their mass from:
(a) electrostatic energy [$\tfrac35 (Ze)^2 r^{-1}/mc^2 \simeq 0.003$ (Al) or 0.008 (Au)];
(b) overall binding energy [Mass defect/$mc^2$ = 0.0089 (Al) or 0.0084 (Au)];
(c) virtual positrons [$m_{e^+}/mc^2 \simeq 3\times10^{-7}$ (Al) or $2\times10^{-6}$ (Au); see p. 33 of Gravitation & Relativity by M. G. Bowler for details].
Hence from these experiments we may conclude that $|\Gamma - 1| \ll 1$ for all forms of mass-energy, with the exceptions of energy associated with weak and gravitational interactions.6
Extrapolating wildly from these experiments we hypothesize:
Strong Principle of Equivalence: No experiment could distinguish between a
homogeneous gravitational field and an accelerating frame of reference. In particular,
in any frame which falls freely through such a field all the laws of physics are the same
as if no field were present.
Real gravitational fields are never homogeneous, so they can be distinguished from
an accelerating frame of reference. For example, consider a star-warrior who regains
consciousness in a closed cabin some time after being taken prisoner. He reaches for
his watch and knocks it to the floor. Fortunately it falls only slowly, so it continues
to tick. Is he in a (possibly elastic) accelerating spaceship, or is he on an asteroid?
By now fully alert he determines that plumb bobs on either side of the cabin point
towards a spot some ten miles away. He instantly concludes that he is either on an
asteroid or that opposite sides of his cabin are accelerating away from one another.
Moments later he verifies that his bobs have not moved apart. Hence he must be in
the gravitational field of an asteroid.

Exercise (16):
What would he have concluded if he had found that his bobs pointed away from
a spot thirty yards distant?
This example shows that a gravitational field is generally not equivalent to an
accelerating frame of reference. From the Principle of Equivalence we merely conclude
that physics in an accelerating frame of reference must look like physics in a particular
type of gravitational field. However, this observation suggests a strategy for discovering
how things behave in a strong gravitational field: we first work out the equations
governing motion in the absence of a gravitational field (which we understand) when
referred to a non-inertial frame of reference. This is a purely mathematical exercise.
The equations we derive will contain terms associated with pseudo-forces generated by
our accelerating frame of reference. Since there is really no gravitational field present,
these pseudo-force terms will be restricted in form. The plan is to obtain equations for
physics in the presence of a true gravitational field by lifting these restrictions.

5 Tensors in General Relativity


We start by discovering what the laws of e.m. and mechanics look like in a non-inertial frame. Let $x'^\mu$ be such a non-inertial frame and $x^\mu$ an inertial frame. Then each primed coordinate is a smooth function $x'^\mu(x^\nu)$ of the four inertial coordinates. Let $x^\mu(\tau)$ be an arbitrary trajectory through space-time and $\psi(x^\mu)$ an arbitrary scalar function of the inertial coordinates $x^\mu$. Then the rate of change of ψ as perceived by an observer who moves along the trajectory $x^\mu(\tau)$ is
$$ \frac{d\psi}{d\tau} = \frac{dx^\mu}{d\tau}\frac{\partial\psi}{\partial x^\mu} \equiv v^\mu\frac{\partial\psi}{\partial x^\mu}, $$
6 These contribute negligibly to the masses of atoms. However, since weak interactions are known
to be intimately connected with electromagnetism, it is extremely unlikely that the value of Γ associ-
ated with weak-interaction energy differs from that associated with e.m. energy.
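where we have defined the observer’s 4-velocity $v^\mu \equiv dx^\mu/d\tau$. Since by the chain rule
$$ \frac{\partial}{\partial x^\mu} = \frac{\partial x'^\nu}{\partial x^\mu}\frac{\partial}{\partial x'^\nu} \tag{5.1} $$
we have
$$ \frac{d\psi}{d\tau} = v^\mu\frac{\partial x'^\nu}{\partial x^\mu}\frac{\partial\psi}{\partial x'^\nu}. $$
If we define the observer’s 4-velocity in the non-inertial primed frame to be
$$ v'^\nu \equiv \frac{\partial x'^\nu}{\partial x^\mu}v^\mu, \tag{5.2} $$
then we may write
$$ \frac{d\psi}{d\tau} = v'^\nu\frac{\partial\psi}{\partial x'^\nu}. $$
A natural extension of this argument leads us to define the primed components of any up vector $A^\mu$ as given in terms of the un-primed components by
$$ A'^\nu \equiv \frac{\partial x'^\nu}{\partial x^\mu}A^\mu. \tag{5.3} $$
Note that if the primed frame were inertial, we would have $x'^\nu = x_0^\nu + \Lambda^\nu{}_\mu x^\mu$ ($x_0^\nu$ a constant 4-vector), so that $\partial x'^\nu/\partial x^\mu = \Lambda^\nu{}_\mu$ and the transformation (5.3) would reduce to a standard Lorentz transformation of an up vector.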
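If $v^\mu$ and $u^\mu$ are two up vectors, all inertial observers will agree on the value of the scalar
$$ s \equiv \eta_{\mu\nu}u^\mu v^\nu. \tag{5.4} $$
How can we recover this number from the primed components $v'^\mu$ and $u'^\mu$? First we express $v^\mu$ in terms of $v'^\mu$. We use the chain rule to express $dx'^\mu$ as
$$ dx'^\mu = \frac{\partial x'^\mu}{\partial x^\nu}dx^\nu. \tag{5.5} $$
Dividing by $dx'^\kappa$ and proceeding to the limit $dx'^\kappa \to 0$ at fixed values of all the other coordinates, we get
$$ \delta^\mu_\kappa = \frac{\partial x'^\mu}{\partial x'^\kappa} = \frac{\partial x'^\mu}{\partial x^\nu}\frac{\partial x^\nu}{\partial x'^\kappa}. \tag{5.6} $$
Thus the matrix $\partial x^\nu/\partial x'^\kappa$ is the inverse of the matrix $\partial x'^\mu/\partial x^\nu$. Premultiplying equation (5.2) by this matrix we solve for $v^\mu$:
$$ v^\mu = \frac{\partial x^\mu}{\partial x'^\nu}v'^\nu. \tag{5.7} $$
Using this relation to eliminate the unprimed components from (5.4) we get
$$ s = \eta_{\mu\nu}\Big(\frac{\partial x^\mu}{\partial x'^\kappa}\frac{\partial x^\nu}{\partial x'^\lambda}\Big)u'^\kappa v'^\lambda. $$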
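If we define
$$ g'_{\kappa\lambda} \equiv \eta_{\mu\nu}\frac{\partial x^\mu}{\partial x'^\kappa}\frac{\partial x^\nu}{\partial x'^\lambda}, \tag{5.8} $$
we have
$$ s = g'_{\kappa\lambda}u'^\kappa v'^\lambda. \tag{5.9} $$
Like $\eta_{\kappa\lambda}$ the general metric tensor $g'_{\kappa\lambda}$ is symmetric; $g'_{\kappa\lambda} = g'_{\lambda\kappa}$. However, it is not necessarily diagonal. It is called the metric tensor because it allows us to calculate the lengths of vectors such as $v'^\lambda$.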
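To make (5.8) concrete, here is a small sketch that builds the metric of a uniformly rotating frame from the Jacobian of the coordinate transformation. The choice of frame, the reduction to coordinates (ct, x, y), the signature (−,+,+) and every name in the code are assumptions made for illustration; the rotation rate Omega is measured per unit ct.

```python
import math

# Minkowski metric for coordinates (ct, x, y); signature (-,+,+) is assumed here.
ETA = [[-1.0, 0.0, 0.0],
       [0.0, 1.0, 0.0],
       [0.0, 0.0, 1.0]]

def jacobian(T, xp, yp, Omega):
    """J[mu][kappa] = d x^mu / d x'^kappa for the rotating frame
    x = x' cos(Omega T) - y' sin(Omega T), y = x' sin(Omega T) + y' cos(Omega T),
    with T = ct shared by both frames."""
    c, s = math.cos(Omega * T), math.sin(Omega * T)
    return [
        [1.0, 0.0, 0.0],
        [Omega * (-xp * s - yp * c), c, -s],
        [Omega * (xp * c - yp * s), s, c],
    ]

def metric(T, xp, yp, Omega):
    """g'_{kappa lambda} = eta_{mu nu} J[mu][kappa] J[nu][lambda], as in (5.8)."""
    J = jacobian(T, xp, yp, Omega)
    n = 3
    return [[sum(ETA[m][v] * J[m][k] * J[v][l]
                 for m in range(n) for v in range(n))
             for l in range(n)]
            for k in range(n)]

g = metric(T=0.7, xp=2.0, yp=-1.0, Omega=0.3)
for row in g:
    print(row)
```

The result is symmetric but not diagonal: the time-space components come out as $-\Omega y'$ and $\Omega x'$, and the time-time component as $-1 + \Omega^2(x'^2 + y'^2)$.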
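We may use $g'_{\kappa\lambda}$ to lower indices;
$$ v'_\kappa \equiv g'_{\kappa\lambda}v'^\lambda. \tag{5.10} $$
Let $g'^{\mu\nu}$ be the tensor which raises indices. Then in order that the operations of raising and lowering be mutual inverses we require that for all $v'^\mu$
$$ \delta^\mu_\lambda v'^\lambda = v'^\mu = g'^{\mu\kappa}g'_{\kappa\lambda}v'^\lambda, $$
i.e. that $g'^{\mu\kappa}g'_{\kappa\lambda} = \delta^\mu_\lambda$ and hence that $g'^{\mu\kappa}$ is the inverse of $g'_{\kappa\lambda}$.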

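Exercise (17):
Show that this definition of $g'^{\mu\kappa}$ is equivalent to the definition
$$ g'^{\kappa\lambda} = \frac{\partial x'^\kappa}{\partial x^\mu}\frac{\partial x'^\lambda}{\partial x^\nu}\eta^{\mu\nu}. \tag{5.11} $$

Similarly, if for any tensors F and G we define
$$ F'^{\kappa\lambda} \equiv \frac{\partial x'^\kappa}{\partial x^\mu}\frac{\partial x'^\lambda}{\partial x^\nu}F^{\mu\nu} \quad\text{and}\quad G'_{\kappa\lambda} \equiv G_{\mu\nu}\frac{\partial x^\mu}{\partial x'^\kappa}\frac{\partial x^\nu}{\partial x'^\lambda}, \tag{5.12} $$
we ensure that the primed observer will be able to calculate the scalar quantities $F^{\mu\nu}v_\mu u_\nu$ and $G_{\mu\nu}v^\mu u^\nu$ from primed quantities. The generalization to tensors of arbitrary rank is obvious.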

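Exercise (18):
Show that if $x'^\mu$ and $x''^\mu$ are two non-inertial frames, the transformation rules
$$ v''^\mu = \frac{\partial x''^\mu}{\partial x'^\nu}v'^\nu\ ;\qquad v''_\mu = \frac{\partial x'^\nu}{\partial x''^\mu}v'_\nu \tag{5.13a} $$
$$ F''^{\mu\nu} = \frac{\partial x''^\mu}{\partial x'^\kappa}\frac{\partial x''^\nu}{\partial x'^\lambda}F'^{\kappa\lambda}\quad\text{etc} \tag{5.13b} $$
apply.
[Hint: divide (5.5) by $dx''^\kappa$ to obtain a relation equivalent to $\dfrac{\partial x''^\mu}{\partial x^\kappa}\dfrac{\partial x^\kappa}{\partial x'^\nu} = \dfrac{\partial x''^\mu}{\partial x'^\nu}$.]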
Notice that there is an easy way to figure out whether to multiply by $\partial x^\mu/\partial x'^\nu$ or by $\partial x'^\mu/\partial x^\nu$ when transforming an object $G^{\mu\ldots}$ or $G_{\mu\ldots}$: If the primes are up on the left, put them up on the right by using $\partial x'^\mu/\partial x^\nu$; if the unprimes are up on the left put them on top on the right with $\partial x^\mu/\partial x'^\nu$. The other kind of index in the equation will “cancel out” just as in ordinary multiplication of fractions. These rules extend in the obvious way to down vectors.
The metric tensor $g'_{\mu\nu}$ enables us to calculate the length s of any curve $x'^\mu(\lambda)$ in space-time:
$$ s \equiv \int_a^b d\lambda\,\sqrt{\Big|g'_{\mu\nu}\frac{dx'^\mu}{d\lambda}\frac{dx'^\nu}{d\lambda}\Big|}. \tag{5.14} $$
If the curve is time-like, s is just c times the elapse ∆τ of time on the watch of the observer whose trajectory is $x'^\mu(\lambda)$. If there is an inertial frame in which all the points on the curve have the same value of $x^0$, s coincides with the length of the curve as measured with meter rules etc by an observer who is stationary in that privileged frame. We shall call s the affine parameter along the curve and use it to characterize points on the curve; hence we write $x'^\mu(s)$.
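Formula (5.14) can be tried out on a case where the answer is known. The sketch below is a purely spatial, Euclidean illustration, which is an assumption made for simplicity: it measures the straight segment from Cartesian (1, 0) to (0, 1) using plane polar coordinates and the metric diag(1, r²), and should return √2. All function names are hypothetical.

```python
import math

def metric_polar(r, phi):
    """Metric of the Euclidean plane in polar coordinates (r, phi)."""
    return [[1.0, 0.0], [0.0, r * r]]

def curve(lam):
    """Straight segment from Cartesian (1, 0) to (0, 1), expressed in polar
    coordinates as a function of the parameter lam in [0, 1]."""
    x, y = 1.0 - lam, lam
    return (math.hypot(x, y), math.atan2(y, x))

def curve_length(path, metric, a=0.0, b=1.0, steps=2000):
    """Midpoint-rule estimate of the integral (5.14)."""
    h = (b - a) / steps
    total = 0.0
    for i in range(steps):
        lam0, lam1 = a + i * h, a + (i + 1) * h
        p0, p1 = path(lam0), path(lam1)
        mid = path(0.5 * (lam0 + lam1))
        # Finite-difference estimate of dx'^mu / d lambda on this step.
        dx = [(p1[k] - p0[k]) / h for k in range(2)]
        g = metric(*mid)
        integrand = sum(g[m][n] * dx[m] * dx[n]
                        for m in range(2) for n in range(2))
        total += math.sqrt(abs(integrand)) * h
    return total

length = curve_length(curve, metric_polar)
print(length, math.sqrt(2.0))
```

The absolute value under the square root plays no role in this Euclidean example; it is what lets the same formula serve for time-like curves, where the quadratic form is negative.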

5.1 Equation of motion in a non-inertial frame


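We now use the principle of least action to obtain the equation of motion of a charged particle in a crazy coordinate system. In this frame the action (3.4) reads
$$ S = \int_0^1 d\lambda\,\Bigg(-m_0c\sqrt{-g'_{\mu\nu}\frac{dx'^\mu}{d\lambda}\frac{dx'^\nu}{d\lambda}} + q\frac{dx'^\mu}{d\lambda}A'_\mu\Bigg), \tag{5.15} $$
and the EL eqn to which it gives rise is
$$ 0 = \frac{d}{d\lambda}\Big(m_0g'_{\beta\mu}\frac{dx'^\mu}{d\tau} + qA'_\beta\Big) - \frac{m_0c\,\dot x'^\mu\dot x'^\nu}{2\sqrt{-g'_{\kappa\lambda}\dot x'^\kappa\dot x'^\lambda}}\frac{\partial g'_{\mu\nu}}{\partial x'^\beta} - q\frac{dx'^\mu}{d\lambda}\frac{\partial A'_\mu}{\partial x'^\beta}, \tag{5.16} $$
where a dot denotes differentiation w.r.t. λ. As in §3.1, we multiply through by dλ/dτ, obtaining
$$ \begin{aligned}
0 &= \frac{d}{d\tau}\Big(m_0g'_{\beta\mu}\frac{dx'^\mu}{d\tau} + qA'_\beta\Big) - \tfrac12 m_0\frac{dx'^\mu}{d\tau}\frac{dx'^\nu}{d\tau}\frac{\partial g'_{\mu\nu}}{\partial x'^\beta} - q\frac{dx'^\mu}{d\tau}\frac{\partial A'_\mu}{\partial x'^\beta}\\
&= m_0\bigg\{g'_{\beta\mu}\frac{d^2x'^\mu}{d\tau^2} + \frac{dx'^\mu}{d\tau}\frac{dx'^\nu}{d\tau}\Big(\frac{\partial g'_{\beta\mu}}{\partial x'^\nu} - \tfrac12\frac{\partial g'_{\mu\nu}}{\partial x'^\beta}\Big)\bigg\} + q\frac{dx'^\mu}{d\tau}\Big(\frac{\partial A'_\beta}{\partial x'^\mu} - \frac{\partial A'_\mu}{\partial x'^\beta}\Big).
\end{aligned} \tag{5.17} $$
If we now define
$$ F'_{\mu\beta} \equiv \frac{\partial A'_\beta}{\partial x'^\mu} - \frac{\partial A'_\mu}{\partial x'^\beta}\ ;\qquad \Gamma'_{\mu\nu,\beta} \equiv \tfrac12\Big(\frac{\partial g'_{\beta\mu}}{\partial x'^\nu} + \frac{\partial g'_{\beta\nu}}{\partial x'^\mu} - \frac{\partial g'_{\mu\nu}}{\partial x'^\beta}\Big), \tag{5.18} $$
then (5.17) takes the suggestive form
$$ 0 = g'_{\beta\mu}\frac{d^2x'^\mu}{d\tau^2} + \frac{dx'^\mu}{d\tau}\frac{dx'^\nu}{d\tau}\Gamma'_{\mu\nu,\beta} + \frac{q}{m_0}\frac{dx'^\mu}{d\tau}F'_{\mu\beta}. \tag{5.19} $$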
Box 2: Calculating Christoffel Symbols

In the case A = 0, the first line of eq (5.17) is, up to a constant factor, exactly what we would get if we applied the EL equation to $\int d\tau\, g'_{\mu\nu}(dx'^\mu/d\tau)(dx'^\nu/d\tau)$. This fact is worth remembering as it often provides the easiest way to calculate the Christoffel symbols, which are the coefficients of products of velocity components when the derivatives in (5.17) are worked through. Note, however, that we have no a priori justification for applying the EL eqn to this integral; the procedure is just an algebraic trick that is justified by our derivation of (5.17).
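As an illustration of the trick, here is a worked example for the Euclidean plane in polar coordinates $(r,\phi)$, where the metric is $\mathrm{diag}(1, r^2)$. The choice of example is an assumption made for illustration; dots denote $d/d\tau$, and the symbols are read off by comparing with $\ddot x^\alpha + \Gamma^\alpha{}_{\mu\nu}\dot x^\mu\dot x^\nu = 0$.

```latex
\begin{align*}
&\text{Integrand:}\quad \dot r^2 + r^2\dot\phi^2 .\\[2pt]
&\text{EL equation for } r:\quad
  \frac{d}{d\tau}(2\dot r) - 2r\dot\phi^2 = 0
  \;\Rightarrow\; \ddot r - r\,\dot\phi^2 = 0
  \;\Rightarrow\; \Gamma^r{}_{\phi\phi} = -r .\\[2pt]
&\text{EL equation for } \phi:\quad
  \frac{d}{d\tau}(2r^2\dot\phi) = 0
  \;\Rightarrow\; \ddot\phi + \frac{2}{r}\,\dot r\,\dot\phi = 0
  \;\Rightarrow\; \Gamma^\phi{}_{r\phi} = \Gamma^\phi{}_{\phi r} = \frac{1}{r} .
\end{align*}
```

The factor 2 in the cross term is shared between the two equal symbols $\Gamma^\phi{}_{r\phi}$ and $\Gamma^\phi{}_{\phi r}$; all other components vanish.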

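To clean up our act, we define Γ with an index up as
$$ \Gamma'^\mu{}_{\alpha\beta} \equiv g'^{\mu\nu}\Gamma'_{\alpha\beta,\nu} = \tfrac12 g'^{\mu\nu}\Big(\frac{\partial g'_{\nu\alpha}}{\partial x'^\beta} + \frac{\partial g'_{\beta\nu}}{\partial x'^\alpha} - \frac{\partial g'_{\alpha\beta}}{\partial x'^\nu}\Big). \tag{5.20} $$
Notice the pattern of this important formula: the three terms in (. . .) are just the first derivative of g with the indices cyclically permuted. The minus sign attaches to the term which groups the indices in the same way as Γ. Now multiplying equation (5.19) through by $g'^{\alpha\beta}$ and writing $v'^\mu \equiv dx'^\mu/d\tau$, we can write it
$$ \frac{dv'^\alpha}{d\tau} = -\Gamma'^\alpha{}_{\mu\nu}v'^\mu v'^\nu + \frac{q}{m_0}F'^\alpha{}_\mu v'^\mu. \tag{5.21} $$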

This equation relates the apparent acceleration in our non-inertial frame to the e.m.
force given by the second term on the right, and a pseudo-force given by the first term.
The principle of equivalence suggests that gravitational forces will take the same form
as pseudo-forces. Thus Γ should play the same role for the gravitational field as F
does for the e.m. field. Notice that where the e.m. force is obtained by contracting F
with v, the gravitational force is obtained by contracting Γ with two copies of v: in
quantum mechanics it follows from this that whereas photons are spin-one particles,
gravitons (likely to be detected within 5 years!) are spin-two particles. As is required
by the principle of equivalence, the charge-to-mass ratio for gravity is unity.
Γ is called the Christoffel symbol. From its definition (5.20) it is symmetric
in its first two indices. Γ cannot be a tensor since all its components are zero in an
inertial frame, so if it transformed like a third-rank tensor, all its components would
be zero in any coordinate system. Notice from (5.20) that the relationship between Γ
and g mirrors the relationship between F and A; that is, g is the potential for gravity
in the same way that A is the potential for electromagnetism.
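Since Γ is built from first derivatives of g exactly as (5.20) prescribes, it can be evaluated numerically for any metric supplied as a function. The sketch below does this by central differences for the Euclidean plane in polar coordinates, where the metric is diag(1, r²); the example metric and all function names are assumptions chosen for illustration.

```python
def metric(x):
    """Euclidean plane in polar coordinates x = (r, phi): g = diag(1, r^2)."""
    r = x[0]
    return [[1.0, 0.0], [0.0, r * r]]

def inverse_2x2(g):
    """Inverse of a 2x2 matrix, used for the up-up form of the metric."""
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    return [[g[1][1] / det, -g[0][1] / det],
            [-g[1][0] / det, g[0][0] / det]]

def dmetric(x, h=1e-5):
    """dg[k][a][b] = d g_ab / d x^k, estimated by central differences."""
    n = len(x)
    out = []
    for k in range(n):
        xp, xm = list(x), list(x)
        xp[k] += h
        xm[k] -= h
        gp, gm = metric(xp), metric(xm)
        out.append([[(gp[a][b] - gm[a][b]) / (2.0 * h) for b in range(n)]
                    for a in range(n)])
    return out

def christoffel(x):
    """Gamma[mu][a][b] = 1/2 g^{mu nu} (d_b g_{nu a} + d_a g_{b nu} - d_nu g_{a b}),
    following the index pattern of (5.20)."""
    n = len(x)
    ginv = inverse_2x2(metric(x))
    dg = dmetric(x)
    return [[[0.5 * sum(ginv[mu][nu]
                        * (dg[b][nu][a] + dg[a][b][nu] - dg[nu][a][b])
                        for nu in range(n))
              for b in range(n)]
             for a in range(n)]
            for mu in range(n)]

Gamma = christoffel([2.0, 0.3])   # evaluate at r = 2, phi = 0.3
print(Gamma[0][1][1], Gamma[1][0][1], Gamma[1][1][0])
```

At r = 2 the non-zero components should come out close to −r = −2 for the (r; φφ) component and 1/r = 0.5 for the (φ; rφ) and (φ; φr) components, and the result is symmetric in the two lower indices.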
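Below we will find it useful to have an expression for Γ in terms of double derivatives of the inertial coordinates with respect to the non-inertial ones. From (5.8) and (5.18) we have
$$ 2\Gamma'_{\mu\nu,\beta} = \eta_{\kappa\lambda}\bigg\{\frac{\partial}{\partial x'^\nu}\Big(\frac{\partial x^\kappa}{\partial x'^\beta}\frac{\partial x^\lambda}{\partial x'^\mu}\Big) + \frac{\partial}{\partial x'^\mu}\Big(\frac{\partial x^\kappa}{\partial x'^\beta}\frac{\partial x^\lambda}{\partial x'^\nu}\Big) - \frac{\partial}{\partial x'^\beta}\Big(\frac{\partial x^\kappa}{\partial x'^\mu}\frac{\partial x^\lambda}{\partial x'^\nu}\Big)\bigg\}. $$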
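When we differentiate these products, the two terms generated by the last product will each be cancelled by a term generated by one of the first two products. The two remaining terms will be identical. Thus we have
$$ \Gamma'_{\mu\nu,\beta} = \eta_{\kappa\lambda}\frac{\partial x^\kappa}{\partial x'^\beta}\frac{\partial^2x^\lambda}{\partial x'^\mu\,\partial x'^\nu}. \tag{5.22} $$
Raising the last index, we obtain
$$ \Gamma'^\alpha{}_{\mu\nu} \equiv g'^{\alpha\beta}\Gamma'_{\mu\nu,\beta} = \eta^{\delta\epsilon}\frac{\partial x'^\alpha}{\partial x^\delta}\frac{\partial x'^\beta}{\partial x^\epsilon}\,\eta_{\kappa\lambda}\frac{\partial x^\kappa}{\partial x'^\beta}\frac{\partial^2x^\lambda}{\partial x'^\mu\,\partial x'^\nu} = \frac{\partial x'^\alpha}{\partial x^\lambda}\frac{\partial^2x^\lambda}{\partial x'^\mu\,\partial x'^\nu}. \tag{5.23} $$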

5.2 Covariant differentiation


We shall need to compare vectors at different points on the curve. In an inertial frame
this is easy: two vectors are the same iff all their components are the same. But
in passing from an inertial to a non-inertial frame by equations (5.3), we change the
components of vectors in a position-dependent way. So two vectors that are equal in
the sense that in an inertial frame all their components are equal, can have different
components in a non-inertial frame. We need a way of diagnosing this condition of
hidden equality.
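Suppose that in an inertial frame we have a vector field A(x). By (5.3) this gives rise to a vector field A′(x′) in a non-inertial frame. As we go along a curve x(s) the rate of change in the vector of the field is
$$ \dot A \equiv \frac{d}{ds}A = \frac{dx'^\kappa}{ds}\frac{\partial A}{\partial x'^\kappa}, \tag{5.24} $$
where the affine parameter s is defined by (5.14). Using (5.7) we move the A on the right into the primed system and get
$$ \begin{aligned}
\dot A^\mu &= \frac{dx'^\kappa}{ds}\frac{\partial}{\partial x'^\kappa}\Big(\frac{\partial x^\mu}{\partial x'^\alpha}A'^\alpha\Big)\\
&= \frac{dx'^\kappa}{ds}\Big(\frac{\partial x^\mu}{\partial x'^\alpha}\frac{\partial A'^\alpha}{\partial x'^\kappa} + \frac{\partial^2x^\mu}{\partial x'^\kappa\,\partial x'^\alpha}A'^\alpha\Big).
\end{aligned} $$
Finally, premultiplying by $\partial x'^\nu/\partial x^\mu$ and using (5.23) we get
$$ \dot A'^\nu \equiv \frac{\partial x'^\nu}{\partial x^\mu}\dot A^\mu = \frac{dx'^\kappa}{ds}\Big(\frac{\partial A'^\nu}{\partial x'^\kappa} + \Gamma'^\nu{}_{\kappa\alpha}A'^\alpha\Big). \tag{5.25} $$
(Notice that $\dot A'^\nu$, the νth component in the primed system of the vector $\dot A$, is defined by this equation. It must not be confused with the rate of change with s of the νth component of A′. In (5.21), by contrast, $dv'^\alpha/d\tau$ is just the rate of change of the number $v'^\alpha$.) If we define a new type of derivative, the covariant derivative, by
$$ A'^\nu{}_{;\kappa} \equiv \nabla_\kappa A'^\nu \equiv \frac{\partial A'^\nu}{\partial x'^\kappa} + \Gamma'^\nu{}_{\kappa\alpha}A'^\alpha, \tag{5.26} $$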
then equation (5.25) can be written
$$ \dot A'^\nu = \frac{dx'^\kappa}{ds}\nabla_\kappa A'^\nu. \tag{5.27} $$
The second term in the definition (5.26) of the covariant derivative has the following physical interpretation. For each value of κ, say κ = 1, we have a matrix $\Gamma'^\nu{}_{1\alpha}$. When we multiply this matrix by $\delta x^1$ we obtain the Lorentz transformation matrix Λ which tells us by how much the speed and orientation of the frame used at x differs from that used at $(x^0, x^1 + \delta x^1, x^2, x^3)$.7
If A is really the same all along the curve, and only seems to change because we are using a non-inertial coordinate system, we have $\dot A'^\nu = 0$, and thus that the “gradient” $\nabla_\kappa A'^\nu$ of $A'^\nu$ either vanishes or is “perpendicular” to the direction $dx'^\kappa/ds$ in which we are moving.
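How does ∇ operate on down vectors? Consider
$$ \begin{aligned}
\frac{d}{ds}(A'^\mu B'_\mu) &= \frac{dx'^\kappa}{ds}\frac{\partial}{\partial x'^\kappa}(A'^\mu B'_\mu)\\
&= \frac{dx'^\kappa}{ds}\Big[\Big(\frac{\partial A'^\mu}{\partial x'^\kappa}\Big)B'_\mu + A'^\mu\Big(\frac{\partial B'_\mu}{\partial x'^\kappa}\Big)\Big]\\
&= \frac{dx'^\kappa}{ds}\Big[\big(\nabla_\kappa A'^\mu\big)B'_\mu - \Gamma'^\mu{}_{\kappa\alpha}A'^\alpha B'_\mu + A'^\mu\frac{\partial B'_\mu}{\partial x'^\kappa}\Big]\\
&= \frac{dx'^\kappa}{ds}\Big[\big(\nabla_\kappa A'^\mu\big)B'_\mu + A'^\mu\frac{\partial B'_\mu}{\partial x'^\kappa} - \Gamma'^\alpha{}_{\kappa\mu}A'^\mu B'_\alpha\Big].
\end{aligned} $$
This suggests that we define
$$ \nabla_\kappa B'_\mu \equiv \frac{\partial B'_\mu}{\partial x'^\kappa} - \Gamma'^\alpha{}_{\kappa\mu}B'_\alpha \tag{5.28} $$
for then we will have $\nabla_\kappa(A'^\mu B'_\mu) = B'_\mu\nabla_\kappa A'^\mu + A'^\mu\nabla_\kappa B'_\mu$ and ∇ will operate on such products like any other derivative operator.
The same argument applied to quantities like $G'_{\mu\nu}A'^\mu B'^\nu$ leads to the rules
$$ G'_{\mu\nu;\kappa} \equiv \nabla_\kappa G'_{\mu\nu} \equiv \frac{\partial G'_{\mu\nu}}{\partial x'^\kappa} - \Gamma'^\alpha{}_{\kappa\mu}G'_{\alpha\nu} - \Gamma'^\alpha{}_{\kappa\nu}G'_{\mu\alpha} \tag{5.29a} $$
$$ G'^{\mu\nu}{}_{;\kappa} \equiv \nabla_\kappa G'^{\mu\nu} \equiv \frac{\partial G'^{\mu\nu}}{\partial x'^\kappa} + \Gamma'^\mu{}_{\kappa\alpha}G'^{\alpha\nu} + \Gamma'^\nu{}_{\kappa\alpha}G'^{\mu\alpha}. \tag{5.29b} $$
Notice that each index requires a Γ-symbol, with a plus or a minus sign according as the index is up or down.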
7 In “gauge field theories” this idea is generalized to define covariant derivatives for objects ψ
that live in spaces other than space-time. In the simplest case ψ lives in the two-dimensional space
of complex numbers, for which the analogue of a Lorentz transformation is multiplication by another
complex number, say iqA1 . The covariant derivative is now Dµ ≡ ∂µ + iqAµ . If ψ is the wavefunction
of a spin-zero particle of charge q, Aµ proves to be the regular e.m. potential.
In the same spirit we define the operation of ∇ on scalars to be identical with partial differentiation:
$$ \nabla_\kappa\psi = \frac{\partial\psi}{\partial x'^\kappa}. $$
What action does ∇ have on the metric tensor? Suppose that A and B are two vector fields that everywhere have the same components in an inertial frame. Then $\nabla_\kappa A'^\mu = \nabla_\kappa B'^\mu = 0$. Also $A^\mu B_\mu = g'_{\mu\nu}A'^\mu B'^\nu$ is everywhere the same. Hence for all curves x′(s)
$$ 0 = \frac{d}{ds}\big(g'_{\mu\nu}A'^\mu B'^\nu\big). $$
Replacing d/ds with $(dx'^\kappa/ds)\nabla_\kappa$ and differentiating each item in the bracket, we get
$$ \begin{aligned}
0 &= \frac{dx'^\kappa}{ds}\Big\{\big(\nabla_\kappa g'_{\mu\nu}\big)A'^\mu B'^\nu + g'_{\mu\nu}\Big[\big(\nabla_\kappa A'^\mu\big)B'^\nu + A'^\mu\big(\nabla_\kappa B'^\nu\big)\Big]\Big\}\\
&= \frac{dx'^\kappa}{ds}A'^\mu B'^\nu\,\nabla_\kappa g'_{\mu\nu}.
\end{aligned} $$
Since $dx'^\kappa/ds$, $A'^\mu$ and $B'^\nu$ are all arbitrary, it follows that
$$ \nabla_\kappa g'_{\mu\nu} = 0. \tag{5.30} $$
In words, the covariant derivative of the metric tensor is always zero.
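The same result can be checked directly from the definitions, which is a useful test of the index conventions. The sketch below drops the primes for brevity and uses the rule (5.29a) together with the definitions (5.18) and (5.20), in particular $\Gamma^\alpha{}_{\kappa\mu}\,g_{\alpha\nu} = \Gamma_{\kappa\mu,\nu}$.

```latex
\begin{align*}
\nabla_\kappa g_{\mu\nu}
  &= \partial_\kappa g_{\mu\nu} - \Gamma^\alpha{}_{\kappa\mu}\,g_{\alpha\nu}
       - \Gamma^\alpha{}_{\kappa\nu}\,g_{\mu\alpha}
   = \partial_\kappa g_{\mu\nu} - \Gamma_{\kappa\mu,\nu} - \Gamma_{\kappa\nu,\mu}\\
  &= \partial_\kappa g_{\mu\nu}
     - \tfrac12\big(\partial_\mu g_{\nu\kappa} + \partial_\kappa g_{\nu\mu} - \partial_\nu g_{\kappa\mu}\big)
     - \tfrac12\big(\partial_\nu g_{\mu\kappa} + \partial_\kappa g_{\mu\nu} - \partial_\mu g_{\kappa\nu}\big)\\
  &= \partial_\kappa g_{\mu\nu} - \partial_\kappa g_{\mu\nu} = 0 .
\end{align*}
```

The terms in $\partial_\mu g$ and $\partial_\nu g$ cancel in pairs because g is symmetric.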
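If $x^\mu(s)$ is a straight line, all components of the “tangent vector” $dx^\mu/ds$ are constant in an inertial frame. Hence in any coordinate system the tangent vector of a straight line $x'^\mu(s)$ satisfies the o.d.e.
$$ 0 = \frac{dx'^\kappa}{ds}\nabla_\kappa\frac{dx'^\mu}{ds}. \tag{5.31} $$
Substituting for $\nabla_\kappa$ this becomes
$$ \begin{aligned}
0 &= \frac{dx'^\kappa}{ds}\Big(\frac{\partial}{\partial x'^\kappa}\frac{dx'^\mu}{ds} + \Gamma'^\mu{}_{\kappa\alpha}\frac{dx'^\alpha}{ds}\Big)\\
&= \frac{d^2x'^\mu}{ds^2} + \Gamma'^\mu{}_{\kappa\alpha}\frac{dx'^\kappa}{ds}\frac{dx'^\alpha}{ds} \qquad (x'^\mu(s)\ \text{a straight line}).
\end{aligned} \tag{5.32} $$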
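Equation (5.32) can be integrated numerically to confirm that its solutions really are straight lines, however curved they look in the chosen coordinates. The sketch below is an illustration under stated assumptions: it works in the Euclidean plane with polar coordinates, whose non-zero Christoffel symbols are −r for the (r; φφ) component and 1/r for the (φ; rφ) and (φ; φr) components, and it uses a hand-written fourth-order Runge-Kutta stepper. All names are hypothetical.

```python
import math

def derivative(state):
    """First-order form of (5.32) in plane polar coordinates.
    state = (r, phi, dr/ds, dphi/ds)."""
    r, phi, dr, dphi = state
    ddr = r * dphi * dphi              # minus Gamma^r_{phi phi} (dphi/ds)^2, with Gamma = -r
    ddphi = -2.0 * dr * dphi / r       # minus 2 Gamma^phi_{r phi} (dr/ds)(dphi/ds), with Gamma = 1/r
    return (dr, dphi, ddr, ddphi)

def rk4_step(state, h):
    """One classical fourth-order Runge-Kutta step of size h."""
    def shifted(s, k, f):
        return tuple(si + f * ki for si, ki in zip(s, k))
    k1 = derivative(state)
    k2 = derivative(shifted(state, k1, 0.5 * h))
    k3 = derivative(shifted(state, k2, 0.5 * h))
    k4 = derivative(shifted(state, k3, h))
    return tuple(s + h * (a + 2.0 * b + 2.0 * c + d) / 6.0
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

# Start at Cartesian (1, 0) moving with unit speed in the +y direction:
# r = 1, phi = 0, dr/ds = 0, dphi/ds = 1.
state = (1.0, 0.0, 0.0, 1.0)
steps = 1000
h = 1.0 / steps
for _ in range(steps):
    state = rk4_step(state, h)

r, phi = state[0], state[1]
x, y = r * math.cos(phi), r * math.sin(phi)
print(x, y)
```

After unit path length the point should have moved along the straight line x = 1 to approximately (1, 1), even though both r and φ change non-linearly along the way.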
Exercise (19):
Obtain (5.32) by extremizing the integral (5.14) with respect to variations of the path $x'^\mu(s)$; a straight line is the least distance between two points.
In terms of covariant derivatives, Newton’s law of motion (1.44) and the Maxwell equations (1.49) become
$$ m_0v'^\kappa\nabla_\kappa v'^\mu = f'^\mu, \tag{5.33a} $$
$$ F'^{\mu\nu}{}_{;\nu} = \mu_0j'^\mu. \tag{5.33b} $$
The other laws of e.m. (3.32) (1.49) remain unchanged because the Christoffel symbols introduced in going over from partial to covariant derivatives magically cancel.

Exercise (20):
Prove that $A'_{\mu;\nu} - A'_{\nu;\mu} = A'_{\mu,\nu} - A'_{\nu,\mu}$.

5.3 Summary
The rules for transforming between non-inertial frames are the same as those for making
regular Lorentz transformation with the substitutions

ν ∂x0ν µ ∂x00µ ∂x0ν 0


Λµ → ; Λ ν → . Thus A00µ = A .
∂x00µ ∂x0ν ∂x00µ ν

The Minkowski metric η is replaced by the metric tensor g, which remains symmetric
but is no longer its own inverse; consequently the up-up and down-down forms of g
are in general distinct.
In a non-inertial frame x the partial derivative operator ∂µ ≡ ∂/∂xµ should be
replaced with the covariant derivative operator ∇µ :

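$$\nabla_\mu \psi = \partial_\mu \psi$$
$$A^\nu{}_{;\mu} \equiv \nabla_\mu A^\nu = \partial_\mu A^\nu + \Gamma^\nu_{\mu\alpha} A^\alpha; \qquad \nabla_\mu B^{\nu\lambda} = \partial_\mu B^{\nu\lambda} + \Gamma^\nu_{\mu\alpha} B^{\alpha\lambda} + \Gamma^\lambda_{\mu\alpha} B^{\nu\alpha}$$
$$\nabla_\mu A_\nu = \partial_\mu A_\nu - \Gamma^\alpha_{\mu\nu} A_\alpha; \qquad \nabla_\mu B_{\nu\lambda} = \partial_\mu B_{\nu\lambda} - \Gamma^\alpha_{\mu\nu} B_{\alpha\lambda} - \Gamma^\alpha_{\mu\lambda} B_{\nu\alpha}$$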

The Christoffel symbol Γ is

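$$\Gamma^\mu_{\alpha\beta} = \tfrac12 g^{\mu\nu}\left(\frac{\partial g_{\nu\alpha}}{\partial x^\beta} + \frac{\partial g_{\beta\nu}}{\partial x^\alpha} - \frac{\partial g_{\alpha\beta}}{\partial x^\nu}\right).$$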

The covariant derivative of g always vanishes: ∇g = 0
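The formula for the Christoffel symbol can be evaluated numerically for any metric supplied as a function of the coordinates. The sketch below does this with central differences for a two-dimensional metric; the flat plane in polar coordinates is used as the test case. The function names, the restriction to two dimensions and the step size `eps` are illustrative assumptions, not part of the notes.

```python
def polar_metric(x):
    """Metric of the flat plane in polar coordinates x = (r, phi)."""
    r, _phi = x
    return [[1.0, 0.0], [0.0, r * r]]

def inverse_2x2(g):
    """Inverse of a 2x2 matrix (gives the up-up form of the metric)."""
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    return [[g[1][1] / det, -g[0][1] / det],
            [-g[1][0] / det, g[0][0] / det]]

def metric_derivative(metric, x, k, eps=1e-5):
    """Central-difference estimate of d g_{ab} / d x^k at the point x."""
    xp, xm = list(x), list(x)
    xp[k] += eps
    xm[k] -= eps
    gp, gm = metric(xp), metric(xm)
    n = len(x)
    return [[(gp[a][b] - gm[a][b]) / (2 * eps) for b in range(n)]
            for a in range(n)]

def christoffel(metric, x):
    """Return gamma[mu][a][b] = Gamma^mu_{ab} for a 2-dimensional metric."""
    n = len(x)
    ginv = inverse_2x2(metric(x))
    dg = [metric_derivative(metric, x, k) for k in range(n)]  # dg[k][a][b] = d_k g_ab
    gamma = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for mu in range(n):
        for a in range(n):
            for b in range(n):
                gamma[mu][a][b] = 0.5 * sum(
                    ginv[mu][nu] * (dg[b][nu][a] + dg[a][b][nu] - dg[nu][a][b])
                    for nu in range(n))
    return gamma

if __name__ == "__main__":
    gamma = christoffel(polar_metric, [2.0, 0.3])
    print(gamma[0][1][1], gamma[1][0][1])  # Gamma^r_{phi phi} and Gamma^phi_{r phi}
```

For this metric the analytic values are $\Gamma^r_{\phi\phi} = -r$ and $\Gamma^\phi_{r\phi} = 1/r$, which gives a simple check on the index bookkeeping.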

6 Gravity, Geometry & the Einstein Field Equations
Now that we have completed our programme for discovering what physics looks like in
a non-inertial frame, it is a good idea to take a rest from all these acres of indices and
summarize the physical content of our formulae.
We have defined quantities $g'_{\mu\nu}$, $p'_\mu$, $F'_{\mu\nu}$, $j'_\mu$, $\Gamma'^\mu_{\alpha\beta}$ etc. which enable us to use a
non-inertial coordinate system $x'$ to find the space-time trajectory of a charged particle in
an e.m. field. We defined these quantities in terms of the momenta, e.m. field tensor etc
in an underlying inertial coordinate system $x$ and the coordinate transformation $x'(x)$
that couples the inertial and non-inertial systems. But we have found formulae (5.13)
and (5.20) which enable us to calculate the values $g''_{\mu\nu}$ etc. of all needful quantities in a
second non-inertial coordinate system without reference back to the inertial system x.
Since we shall no longer need to refer constantly to an inertial system, we now
drop the convention that the unprimed system x is inertial; from here on all systems
are to be assumed to be non-inertial unless explicitly specified as inertial.
The principle of equivalence suggests that a gravitational field will look very much
like a pseudo-force in an accelerating frame of reference. The Christoffel symbol Γ gen-
erates the pseudo-force associated with an accelerating frame, so when a gravitational
field is present Γ will play the role of the Newtonian force f . We have identified the
metric g as the relativistic generalization of the Newtonian potential Φ on the ground
that Γ can be written in terms of derivatives of g just as f = −∇Φ.
In Newton’s theory f and Φ are related to the density ρ of gravitating matter via
Poisson’s equation (4.5). The relativistic generalization of (4.5) should be a second-
order p.d.e. in g, or equivalently, a first-order p.d.e. in Γ. What is this equation?
Since we can make Γ as large as we like simply by choosing a perverse coordinate
system, it is clear that the trick in finding suitable field equations is to find a differential
operator on Γ which differentiates away all the contribution to Γ that is caused by
mere perversity of the coordinate system. The key to finding this operator proves to
be an examination of the geometrical relationships between the lengths of lines and
the magnitudes of angles between lines.
We have seen that the metric tensor enables us to define the length of any curve
in space-time, and in particular to determine through (5.32) which curves $x'(s)$ are
straight. Now suppose we draw a straight line in a portion of space in which there is
no gravitational field and then draw a unit circle around some point on this line. Then
no matter what coordinate system we use for the calculations, we shall find that the
length s of the circumference is exactly π = 3.14159 . . . times the length of the circle’s
diameter. How come? By changing the coordinate system we can change g at any given
point to almost any value [see (5.8)]. So how come that when we evaluate the integral
(5.14) over two completely different sets of points, we always get answers in the same
ratio? It must be that g at one point is not independent of g at neighbouring points:
g must satisfy some differential equation. Einstein’s idea, and it was pure magic, was
that it is this differential equation which tells us that there is no gravitational field
present, only a perverse coordinate system. Let us find this differential equation.
There are many geometrical relationships in addition to the one just discussed
which g must furnish if there is no gravitational field present. For example, there
are 180° in a triangle. But the key to the equation we are seeking turns out to be
something slightly odd. It is to consider what happens when we slide a vector around
a closed curve while being careful not to rotate the vector. If we do this on a table, the
vector (a pencil, say) will be back in its old configuration at the end of the experiment:

But on a sphere things go differently:

In fact, on a sphere of radius $r$, the angle through which a pencil rotates on being
“parallel-transported” around a curve is equal to the area enclosed by this curve divided
by $r^2$.
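The area rule can be checked numerically. The sketch below parallel-transports a vector once around a circle of constant colatitude on a unit sphere and compares the resulting rotation with the area of the enclosed polar cap, $2\pi(1 - \cos\theta_0)$. The coordinates $(\theta, \phi)$, the Christoffel symbols of the unit sphere used in the comments, and the function name are assumptions made for this example.

```python
import math

def transport_around_latitude(theta0, n_steps=2000):
    """Parallel-transport a vector once around theta = theta0 on a unit sphere.

    Solves dA^mu/dphi = -Gamma^mu_{phi beta} A^beta with
    Gamma^theta_{phi phi} = -sin(theta) cos(theta) and
    Gamma^phi_{phi theta} = cot(theta), starting from A = (1, 0).
    Returns the rotation angle (mod 2 pi) measured in the orthonormal
    basis, whose components are (A^theta, sin(theta0) A^phi).
    """
    s, c = math.sin(theta0), math.cos(theta0)

    def rhs(a_theta, a_phi):
        return (s * c * a_phi, -(c / s) * a_theta)

    a_theta, a_phi = 1.0, 0.0
    h = 2.0 * math.pi / n_steps
    for _ in range(n_steps):
        # One classical Runge-Kutta step in phi.
        k1 = rhs(a_theta, a_phi)
        k2 = rhs(a_theta + h / 2 * k1[0], a_phi + h / 2 * k1[1])
        k3 = rhs(a_theta + h / 2 * k2[0], a_phi + h / 2 * k2[1])
        k4 = rhs(a_theta + h * k3[0], a_phi + h * k3[1])
        a_theta += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        a_phi += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return math.atan2(s * a_phi, a_theta) % (2.0 * math.pi)

if __name__ == "__main__":
    theta0 = 0.5
    print(transport_around_latitude(theta0))
    print(2.0 * math.pi * (1.0 - math.cos(theta0)))  # area of the enclosed cap
```

The two printed numbers should agree to within the integration error; the sign convention for the rotation is a choice made here.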

6.1 The curvature tensor

If we parallel-transport a vector $A$ around a closed curve $x(s)$ in space-time, we have
that at each point on the curve $\dot{x}\cdot\nabla A = 0$ (this is just a statement of the invariance
along the curve of $A$’s components in an inertial frame):
$$0 = \frac{dx^\alpha}{ds}\nabla_\alpha A^\mu = \frac{dx^\alpha}{ds}\left(\frac{\partial A^\mu}{\partial x^\alpha} + \Gamma^\mu_{\alpha\beta} A^\beta\right). \tag{6.1}$$

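Consequently, the total change in each component $A^\mu$ on going around is
$$\Delta A^\mu = \oint \frac{\partial A^\mu}{\partial x^\alpha}\frac{dx^\alpha}{ds}\,ds = -\oint \Gamma^\mu_{\alpha\beta} A^\beta \frac{dx^\alpha}{ds}\,ds. \tag{6.2}$$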

In this integral both $\Gamma^\mu_{\alpha\beta}$ and $A^\beta$ are functions of $s$ through $x(s)$. However, if we
consider only infinitesimal loops we may expand each component of $\Gamma$ and $A$ in power
series about some point, say $X$, on the loop:
$$\begin{aligned}
\Gamma^\mu_{\alpha\beta}(x) &= \Gamma^\mu_{\alpha\beta}(X) + (x^\nu - X^\nu)\frac{\partial \Gamma^\mu_{\alpha\beta}}{\partial x^\nu} + \cdots \\
A^\mu(x) &= A^\mu(X) + (x^\nu - X^\nu)\frac{\partial A^\mu}{\partial x^\nu} + \cdots
\end{aligned} \tag{6.3}$$
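Multiplying these two expansions together and substituting the result into (6.2), we
get
$$\Delta A^\mu = -\oint\left\{\left[\Gamma^\mu_{\alpha\beta}A^\beta\right]_X + \left[\Gamma^\mu_{\alpha\beta}\frac{\partial A^\beta}{\partial x^\nu} + A^\beta\frac{\partial\Gamma^\mu_{\alpha\beta}}{\partial x^\nu}\right]_X (x^\nu - X^\nu) + \cdots\right\}\frac{dx^\alpha}{ds}\,ds.$$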
Since the first square bracket is constant, it can be taken outside the integral sign.
Integrating its coefficient $dx^\alpha/ds$ around our closed contour we then obtain zero. The
second square bracket may also be taken outside the integral sign. Integrating (6.1)
along our contour we find
$$\sum_\alpha \left.\frac{\partial A^\mu}{\partial x^\alpha}\right|_X \left(x^\alpha - X^\alpha\right) = -\left(A^\beta \Gamma^\mu_{\alpha\beta}\right)_X \left(x^\alpha - X^\alpha\right) + O(s^2), \tag{6.4}$$
so we may eliminate $(\partial A^\mu/\partial x^\nu)$ from (6.3) and get


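$$\begin{aligned}
\Delta A^\mu &= \oint\left[\Gamma^\mu_{\alpha\beta}\Gamma^\beta_{\nu\gamma}A^\gamma - \frac{\partial\Gamma^\mu_{\alpha\beta}}{\partial x^\nu}A^\beta\right]_X (x^\nu - X^\nu)\,dx^\alpha + \cdots \\
&= \left[\Gamma^\mu_{\alpha\beta}\Gamma^\beta_{\nu\gamma} - \frac{\partial\Gamma^\mu_{\alpha\gamma}}{\partial x^\nu}\right]_X A^\gamma \oint x^\nu\,dx^\alpha + \cdots
\end{aligned} \tag{6.5}$$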
The integrals in (6.5) for which $\nu = \alpha$ vanish because each such integral is simply the
change in $\tfrac12 (x^\alpha)^2$ on going around the loop. Furthermore, when $\alpha \neq \nu$, the integral
$\oint x^\nu\,dx^\alpha$ is equal in magnitude and opposite in sign to the integral $\oint x^\alpha\,dx^\nu$ as this
picture of the $(x^\alpha, x^\nu)$ plane shows:

We define the directed area enclosed by the loop to be the antisymmetric tensor
$$\Delta S^{\nu\alpha} \equiv \oint x^\nu\,dx^\alpha. \tag{6.6}$$

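This done we may write
$$\Delta A^\mu = \left[\Gamma^\mu_{\alpha\beta}\Gamma^\beta_{\nu\gamma} - \frac{\partial\Gamma^\mu_{\alpha\gamma}}{\partial x^\nu}\right]_X A^\gamma\,\Delta S^{\nu\alpha} + \cdots \tag{6.7}$$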

In the absence of a gravitational field, $\Delta A^\mu = 0$ for any $A^\mu$. Furthermore, by an
appropriate choice of loop $\Delta S^{\nu\alpha}$ can be set equal to any given antisymmetric tensor.$^{8}$
So it is tempting to conclude that the square bracket in the last equation vanishes.
However, when we contract an antisymmetric tensor with a tensor of mixed symmetry,
only the antisymmetric portion of the mixed tensor contributes to the sums. Hence
from the vanishing of $\Delta A^\mu$ for arbitrary $A^\mu$ and $\Delta S^{\nu\alpha}$ we can infer only the vanishing
of the portion of the square bracket that is antisymmetric on exchange of $\nu$ and $\alpha$. We
therefore define the curvature tensor as minus twice this part of the square bracket
in (6.7)
$$R^\mu{}_{\gamma\alpha\nu} \equiv \frac{\partial\Gamma^\mu_{\alpha\gamma}}{\partial x^\nu} - \frac{\partial\Gamma^\mu_{\nu\gamma}}{\partial x^\alpha} + \Gamma^\mu_{\nu\beta}\Gamma^\beta_{\alpha\gamma} - \Gamma^\mu_{\alpha\beta}\Gamma^\beta_{\nu\gamma}, \tag{6.8}$$
and rewrite (6.7) as
$$\Delta A^\mu = -\tfrac12 R^\mu{}_{\gamma\alpha\nu} A^\gamma\,\Delta S^{\nu\alpha} + \cdots \tag{6.9}$$
Since $\Delta A^\mu$ is the difference between two vectors that are based at the same point, it
is itself a vector. Furthermore, both $A^\gamma$ and $\Delta S^{\nu\alpha}$ are tensors. Hence $R^\mu{}_{\gamma\alpha\nu}$ must also
be a tensor as its name implies. In the absence of a gravitational field we have
$$R^\mu{}_{\gamma\alpha\nu} = 0. \tag{6.10}$$

This is the relativistic generalization of the Newtonian equation $\partial^2\Phi/\partial x^\alpha\partial x^\beta = 0$ of
which Laplace’s equation is the trace. As promised, it is first-order in $\Gamma$ and
second-order in $g$. Notice that it is non-linear in both these quantities; this is highly significant
(and very inconvenient!).
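Definition (6.8) can be evaluated directly once the Christoffel symbols are known as functions of position. The sketch below does so with central differences and contrasts two cases: the flat plane in polar coordinates, where every component should vanish despite the non-zero $\Gamma$, and the unit sphere, where $R^\theta{}_{\phi\theta\phi} = -\sin^2\theta$ in the index convention of (6.8). Both sets of Christoffel symbols, the function names and the step size are assumptions introduced for this example.

```python
import math

def _empty():
    """A 2x2x2 array of zeros: gamma[mu][a][b] = Gamma^mu_{ab}."""
    return [[[0.0] * 2 for _ in range(2)] for _ in range(2)]

def gamma_polar(x):
    """Christoffel symbols of the flat plane, coordinates x = (r, phi)."""
    r, _ = x
    g = _empty()
    g[0][1][1] = -r
    g[1][0][1] = g[1][1][0] = 1.0 / r
    return g

def gamma_sphere(x):
    """Christoffel symbols of the unit sphere, coordinates x = (theta, phi)."""
    theta, _ = x
    s, c = math.sin(theta), math.cos(theta)
    g = _empty()
    g[0][1][1] = -s * c
    g[1][0][1] = g[1][1][0] = c / s
    return g

def riemann(gamma, x, eps=1e-5):
    """Return R[mu][gam][al][nu] = R^mu_{gam al nu} as defined in (6.8)."""
    n = len(x)
    G = gamma(x)
    dG = []  # dG[k][mu][a][b] = d Gamma^mu_{ab} / d x^k (central difference)
    for k in range(n):
        xp, xm = list(x), list(x)
        xp[k] += eps
        xm[k] -= eps
        Gp, Gm = gamma(xp), gamma(xm)
        dG.append([[[(Gp[m][a][b] - Gm[m][a][b]) / (2 * eps)
                     for b in range(n)] for a in range(n)] for m in range(n)])
    R = [[[[0.0] * n for _ in range(n)] for _ in range(n)] for _ in range(n)]
    for mu in range(n):
        for gam in range(n):
            for al in range(n):
                for nu in range(n):
                    R[mu][gam][al][nu] = (
                        dG[nu][mu][al][gam] - dG[al][mu][nu][gam]
                        + sum(G[mu][nu][b] * G[b][al][gam]
                              - G[mu][al][b] * G[b][nu][gam] for b in range(n)))
    return R

if __name__ == "__main__":
    print(riemann(gamma_polar, [2.0, 0.2])[0][1][0][1])   # flat plane
    print(riemann(gamma_sphere, [1.0, 0.2])[0][1][0][1])  # unit sphere
```

The flat-plane case illustrates the point of the construction: the operator in (6.8) differentiates away the part of $\Gamma$ that is due only to the choice of coordinates.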

6.2 Derivation of the Field Equations

If we are to upgrade (6.10) into the relativistic generalization of Poisson’s equation
(4.5), we must replace the zero on the right with something that involves the density
of mass-energy. We have seen [equations (1.29)] that the mass-energy density forms
one component of a symmetric second-rank tensor T. If we want a covariant theory
of gravity we are going to have to allow the mass-energy density to bring along all its
friends in T into the field equations. So consider replacing the zero in (6.10) with
$$\text{constant} \times T_{\alpha\beta}.$$

This has only two indices, whereas the left of (6.10) has four indices. Hence we must
either use g (which is the only generally available tensor) to add two more indices on
the right, or we must contract away two indices on the left. It is not hard to see that
these two courses are equivalent. We do it the second way.

8 This is a lie, as the discussion of 6-tuples in §2.5 shows. However, the argument can be fixed up
by adding the changes $\Delta A$ around two non-coplanar paths.

Which two indices should we contract? Well, from the defining expression (6.8)
one may show that $R_{\mu\nu\alpha\beta}$ has the following symmetries:
$$R_{\mu\nu\alpha\beta} = R_{\alpha\beta\mu\nu}; \qquad R_{\mu\nu\beta\alpha} = -R_{\mu\nu\alpha\beta} = R_{\nu\mu\alpha\beta}. \tag{6.11}$$
In words: R is symmetric on interchange of the first pair of indices with the second
pair, and antisymmetric under interchange of the indices within each of these pairs.
Thus we get zero if we contract within any pair, and the same answer (to within a
sign) if we contract between pairs. It is conventional to define the Ricci tensor by
$$R_{\alpha\beta} \equiv R^\mu{}_{\alpha\mu\beta}. \tag{6.12}$$

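Note:
In terms of $\Gamma$, $R_{\alpha\beta}$ is by (6.8)
$$R_{\alpha\beta} = \frac{\partial\Gamma^\mu_{\mu\alpha}}{\partial x^\beta} - \frac{\partial\Gamma^\mu_{\alpha\beta}}{\partial x^\mu} + \Gamma^\lambda_{\alpha\mu}\Gamma^\mu_{\beta\lambda} - \Gamma^\mu_{\lambda\mu}\Gamma^\lambda_{\alpha\beta}. \tag{6.13a}$$
Furthermore, by (5.20)
$$\Gamma^\mu_{\alpha\mu} = \Gamma^\mu_{\mu\alpha} = \tfrac12 g^{\mu\nu}\frac{\partial g_{\mu\nu}}{\partial x^\alpha}. \tag{6.13b}$$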

While $R_{\alpha\beta}$ has the right number of indices to go on the left of our field equations,
the law we seek is not $R_{\alpha\beta} = T_{\alpha\beta}$ because mass-energy conservation is expressed by
the vanishing of the covariant divergence of T. Hence whatever goes on the left of our
field equations must have zero divergence. Unfortunately, the divergence of $R_{\alpha\beta}$ is not
always zero. However, it turns out that (see Appendix B)
$$R^{\alpha\beta}{}_{;\beta} = \tfrac12 R^{;\alpha}, \tag{6.14}$$
where the Ricci scalar $R$ is defined by
$$R \equiv R^\beta{}_\beta. \tag{6.15}$$
From (6.14) it follows that a tensor made out of $R^\mu{}_{\nu\alpha\beta}$ which has zero divergence is
$$G^{\alpha\beta} \equiv (R^{\alpha\beta} - \tfrac12 g^{\alpha\beta} R). \tag{6.16}$$
G is called the Einstein tensor because the p.d.e.’s which describe the generation of
a gravitational field by matter are
$$G^{\alpha\beta} = -\frac{8\pi G}{c^4} T^{\alpha\beta}. \tag{6.17}$$

Here $G$ is Newton’s gravitational constant, as we shall shortly show. An alternative,
and often handier, form of (6.17) is obtained by contracting both sides of the equation
to obtain
$$G^\alpha{}_\alpha = (R^\alpha{}_\alpha - \tfrac12 \delta^\alpha_\alpha R) = -R = -\frac{8\pi G}{c^4} T^\alpha{}_\alpha.$$
Substituting this value of $R$ into (6.17) we get
$$R^{\alpha\beta} = -\frac{8\pi G}{c^4}\left(T^{\alpha\beta} - \tfrac12 g^{\alpha\beta} T^\gamma{}_\gamma\right). \tag{6.18}$$
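Equations (6.17) and (6.18) are the relativistic equivalents of Poisson’s equation $\nabla^2\Phi = 4\pi G\rho$.
As expected, these equations are second-order in the ten potentials $g_{\mu\nu}$ and
involve all the energy-density’s friends in T.
There is a close analogy between (6.18) and its e.m. counterpart $F^{\mu\nu}{}_{;\nu} = \mu_0 j^\mu$ as
may be seen by substituting for R from (6.13)
$$\frac{\partial\Gamma^\mu_{\mu\alpha}}{\partial x^\beta} - \frac{\partial\Gamma^\mu_{\alpha\beta}}{\partial x^\mu} + \Gamma^\lambda_{\alpha\mu}\Gamma^\mu_{\beta\lambda} - \Gamma^\mu_{\lambda\mu}\Gamma^\lambda_{\alpha\beta} = -\frac{8\pi G}{c^4}\left(T_{\alpha\beta} - \tfrac12 g_{\alpha\beta} T^\gamma{}_\gamma\right). \tag{6.19}$$
Worse still, the relationship (5.20) between $\Gamma$ and the tensor potential $g$ is a good deal
more complex than the corresponding e.m. relation $F_{\mu\nu} = A_{\nu,\mu} - A_{\mu,\nu}$. So it is hardly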
surprising that not many exact solutions of the Einstein equations are known! But we
shall be able to deduce some extremely interesting solutions nevertheless.

6.3 The Newtonian Limit


We now check that Einstein’s theory agrees with Newton’s when (i) the field is very
weak and (ii) the field is generated by slowly-moving matter. The prototype of
slowly-moving matter is ‘dust’: the matter at each event $x$ has a well defined proper velocity
$v(x)$, and in the rest frame defined by $v$ the matter density is $\rho_0$. Physically it is clear
that in this rest frame the only non-zero component of T is $T^{00} = \rho_0 c^2$, and from this
it follows easily that in a general frame a dust has
$$T^{\mu\nu} = \rho_0 v^\mu v^\nu. \tag{6.20}$$
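A small sketch can make (6.20) concrete in an inertial frame. It builds $T^{\mu\nu}$ for dust moving with a given 3-velocity and evaluates the trace $T^\gamma{}_\gamma = \eta_{\mu\nu}T^{\mu\nu}$, which should equal $-\rho_0 c^2$ whatever the velocity. The metric signature $(-,+,+,+)$ is inferred from the relation $u^\alpha u_\alpha = -c^2$ used later in the notes; the function names and sample values are assumptions.

```python
import math

C = 299_792_458.0                 # speed of light in m/s
ETA = [-1.0, 1.0, 1.0, 1.0]       # diagonal of the Minkowski metric

def four_velocity(v3):
    """4-velocity (gamma c, gamma v) for an ordinary 3-velocity v3 in m/s."""
    speed_sq = sum(v * v for v in v3)
    gamma = 1.0 / math.sqrt(1.0 - speed_sq / C**2)
    return [gamma * C] + [gamma * v for v in v3]

def dust_tensor(rho0, v3):
    """Energy-momentum tensor T^{mu nu} = rho0 v^mu v^nu of (6.20)."""
    u = four_velocity(v3)
    return [[rho0 * u[m] * u[n] for n in range(4)] for m in range(4)]

def trace(T):
    """T^gamma_gamma = eta_{mu nu} T^{mu nu} for a diagonal metric."""
    return sum(ETA[m] * T[m][m] for m in range(4))

if __name__ == "__main__":
    T = dust_tensor(1.0, (0.6 * C, 0.0, 0.0))
    print(T[0][0] / C**2, trace(T) / C**2)
```

In the rest frame the only non-zero component is $T^{00} = \rho_0 c^2$; in the moving frame the energy density $T^{00}$ grows by $\gamma^2$ while the trace is unchanged.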

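Since the gravitational field is assumed very weak, we can find a nearly inertial
coordinate system. In this system
$$g_{\alpha\beta} = \eta_{\alpha\beta} + h_{\alpha\beta} \quad\text{where}\quad |h_{\alpha\beta}| \ll 1. \tag{6.21}$$
We neglect squares and higher powers of $h$. By (5.20) we then have
$$\Gamma^\mu_{\alpha\beta} = \tfrac12 \eta^{\mu\nu}\left(\frac{\partial h_{\nu\alpha}}{\partial x^\beta} + \frac{\partial h_{\nu\beta}}{\partial x^\alpha} - \frac{\partial h_{\alpha\beta}}{\partial x^\nu}\right). \tag{6.22}$$
Consider the equation of motion (5.21) to which this gives rise for a non-relativistic
free particle ($a'^\mu = 0$). The motion is governed by a gravitational force
$$f^\mu = -\Gamma^\mu_{\alpha\beta} v^\alpha v^\beta, \tag{6.23}$$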

where $v$ is the particle’s 4-velocity. Since the zeroth component $v^0 = \gamma c$ of the
4-velocity of a non-relativistic particle is very much larger than any of $v$’s spatial
components, we expect the dominant term in the implied sum of (6.23) to be that for which
$\alpha = \beta = 0$. Thus we expect
$$f^\mu \simeq -\gamma^2 c^2 \Gamma^\mu_{00}. \tag{6.24}$$
A typical spatial component of the equations of motion is then
$$\gamma\frac{d}{dt}\left(\gamma\frac{dx^j}{dt}\right) = -\gamma^2 c^2 \Gamma^j_{00} \simeq -c^2\,\tfrac12\left(2\frac{\partial h_{j0}}{\partial x^0} - \frac{\partial h_{00}}{\partial x^j}\right).$$
If the field is stationary in our chosen coordinate system (and we are free to boost until
it is), then $\partial h_{j0}/\partial x^0 = 0$ and to leading order in $v/c$
$$\frac{d^2x^j}{dt^2} = \frac{\partial}{\partial x^j}\left(\tfrac12 c^2 h_{00}\right). \tag{6.25}$$
If this is to agree with Newton’s theory, we require
$$\Phi = -\tfrac12 c^2 h_{00}, \tag{6.26}$$

where Φ is the Newtonian gravitational potential.


We now check whether Einstein’s field equations (6.18) reduce in the same
weak-field limit to Poisson’s equation for $\Phi$. We expect the source of $\Phi$ to be the energy
density $\rho c^2 = T^{00}$, where T is the energy-momentum tensor, so we concentrate on the
00-component of (6.18).
From (6.13a,b), (6.21) and (6.22), $R_{\alpha\beta}$ is to first order in $h$
$$R_{\alpha\beta} = \tfrac12 \eta^{\mu\nu}\left(\frac{\partial^2 h_{\mu\nu}}{\partial x^\alpha\partial x^\beta} - \frac{\partial^2 h_{\alpha\nu}}{\partial x^\mu\partial x^\beta} - \frac{\partial^2 h_{\nu\beta}}{\partial x^\mu\partial x^\alpha} + \frac{\partial^2 h_{\alpha\beta}}{\partial x^\nu\partial x^\mu}\right). \tag{6.27}$$
In particular, for a time-independent field
$$R^{00} = R_{00} = \tfrac12 \nabla^2 h_{00} = -\frac{1}{c^2}\nabla^2\Phi.$$
If the only contributor to the energy-momentum tensor is a dust of stationary particles,
T is given by (6.20) with $v = (c, 0, 0, 0)$. Hence $T^\gamma{}_\gamma = -\rho_0 c^2$ and the 00-component of
(6.18) is
$$R^{00} = -\frac{1}{c^2}\nabla^2\Phi = -\frac{4\pi G}{c^2}\rho_0$$
as expected.

6.4 Summary
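The curvature tensor $R^\mu{}_{\nu\alpha\beta}$ tells us by how much a vector changes on being parallel
transported around a small circuit. Hence we detect the use of a crazy coordinate
system for flat space-time by seeing if the curvature tensor $R = 0$. If $R \neq 0$ there is a
true gravitational field.
The presence of matter at $x$ is signalled by $R_{\alpha\beta}(x) \equiv R^\mu{}_{\alpha\mu\beta}(x) \neq 0$.
Formally, there is a far-reaching analogy between g.r. and e.m.:

Parallelism of e.m. and g.r.
$$\begin{array}{rcl}
A_\mu & \leftrightarrow & g_{\mu\nu} \\
F_{\mu\nu} = -(A_{\mu,\nu} - A_{\nu,\mu}) & \leftrightarrow & \Gamma_{\mu,\alpha\beta} = \tfrac12 (g_{\mu\alpha,\beta} + g_{\mu\beta,\alpha} - g_{\alpha\beta,\mu}) \\
f^\mu = \dfrac{q}{m_0} F^\mu{}_\alpha v^\alpha & \leftrightarrow & f^\mu = -\Gamma^\mu_{\alpha\beta} v^\alpha v^\beta \\
F^{\mu\nu}{}_{,\nu} = \mu_0 j^\mu & \leftrightarrow & \text{eq. (6.19)} \\
F^{\mu\nu,\rho} + F^{\nu\rho,\mu} + F^{\rho\mu,\nu} = 0 & \leftrightarrow & R^\kappa{}_{\lambda\mu\nu;\rho} + R^\kappa{}_{\lambda\nu\rho;\mu} + R^\kappa{}_{\lambda\rho\mu;\nu} = 0 \quad \text{(Bianchi identity)}
\end{array}$$

The parallel between Newton’s theory and g.r. is less tight: $\Phi \leftrightarrow g$, $f \leftrightarrow \Gamma$,
$\nabla^2\Phi \leftrightarrow R_{\alpha\beta}$.
In a weak gravitational field we can have $g \simeq \eta$ with $-2\Phi/c^2$ as an estimate of
$(g_{00} - \eta_{00})$.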

7 Weak-field gravity
The simplest applications of GR are to weak gravitational fields, which are ubiquitous
in the Universe at large as here on Earth.

7.1 Gravitational Redshift


We have just seen that in a weak gravitational field $g_{00} \simeq \eta_{00} - 2\Phi/c^2$ is closely related
to the Newtonian gravitational potential. This conclusion has interesting physical
consequences. Consider an observer at rest in a weak gravitational field. We choose
spatial coordinates so that the field and the observer are stationary. Let the observer
be at potential $\Phi_o$ and observe a stationary atom at potential $\Phi_a$. Setting $\lambda = x^0$ in
(5.14) and differentiating both sides of this equation we find that the observer’s proper
time elapses at a rate
$$\frac{d\tau_o}{dt} = \sqrt{-g_{\mu\nu}\frac{dx^\mu}{dx^0}\frac{dx^\nu}{dx^0}} = \sqrt{-g_{00}} \simeq \sqrt{1 + 2\frac{\Phi_o}{c^2}} \simeq 1 - \frac{|\Phi_o|}{c^2} \quad (\text{because } \Phi_o < 0). \tag{7.1}$$

Similarly, the atom’s proper time elapses at a rate
$$\frac{d\tau_a}{dt} = 1 - \frac{|\Phi_a|}{c^2}. \tag{7.2}$$

If the atom is emitting e.m. radiation of frequency $\nu$, then during an interval $\Delta\tau_o$
on the observer’s clock it will emit $(\nu\Delta\tau_o) \times (d\tau_a/d\tau_o)$ wave fronts. Of course, these
wavefronts will take some time (as measured by either clock) to reach the observer,
but because our situation is static, the delay before each front reaches the observer is
always the same. Hence the fronts will be received in time $\Delta\tau_o$ on the observer’s clock
and the observer measures frequency
$$\nu_o = \left(\frac{d\tau_a}{d\tau_o}\right)\nu = \frac{1 - |\Phi_a|/c^2}{1 - |\Phi_o|/c^2}\,\nu \simeq \left(1 - \frac{|\Phi_a - \Phi_o|}{c^2}\right)\nu. \tag{7.3}$$

In words: radiation that comes up out of a gravitational well is redshifted.

Exercise (21):
Consider a machine which lowers boxes full of excited atoms on ropes down a well,
deexcites the atoms at the bottom, pulls the atoms back up and then reexcites the
atoms with the photons released at the bottom and beamed up to the top. Show
that this machine will violate energy conservation unless the photons’ frequencies
at top and bottom of the well satisfy (7.3).

7.2 Hydrodynamics

In GR the energy-momentum tensor of a perfect fluid (1.32) clearly becomes
$$T^{\mu\nu} = (\rho + P/c^2)u^\mu u^\nu + P g^{\mu\nu}. \tag{7.4}$$
We now show how the equations of hydrodynamics emerge from $\nabla_\mu T^{\mu\nu} = 0$. We have
$$0 = u^\nu u^\mu \nabla_\mu(\rho + P/c^2) + (\rho + P/c^2)\nabla_\mu(u^\mu u^\nu) + g^{\mu\nu}\frac{\partial P}{\partial x^\mu}, \tag{7.5}$$
where we’ve used (5.30). To recover the familiar equations of hydrodynamics we assume
that $c \simeq u^0 \gg u^i$ and that $g^{\mu\nu} = \eta^{\mu\nu} + h^{\mu\nu}$ with $|h^{\mu\nu}| \ll 1$. Then the $\nu = 0$ equation
may be approximated by
$$0 = c u^\mu \nabla_\mu(\rho + P/c^2) + (\rho + P/c^2)\nabla_\mu(c u^\mu) - \frac{\partial P}{\partial x^0}. \tag{7.6}$$
We multiply this equation by $u^i/c$ and subtract it from the $i$ equation of the set (7.5)
to find
$$0 = (\rho + P/c^2)u^\mu \nabla_\mu u^i + \frac{\partial P}{\partial x^i} + \frac{u^i}{c}\frac{\partial P}{\partial x^0}. \tag{7.7}$$

Now in view of (6.25) we can write
$$\begin{aligned}
u^\mu \nabla_\mu u^i &= u^\mu\frac{\partial u^i}{\partial x^\mu} + \Gamma^i_{\mu\alpha}u^\mu u^\alpha \simeq \frac{\partial u^i}{\partial t} + u^j\frac{\partial u^i}{\partial x^j} + \Gamma^i_{00}u^0 u^0 \\
&\simeq \frac{du^i}{dt} + c^2\,\tfrac12\left(2\frac{\partial h_{i0}}{\partial x^0} - \frac{\partial h_{00}}{\partial x^i}\right),
\end{aligned} \tag{7.8}$$
where the derivative of $u^i$ is the Eulerian derivative of hydrodynamics. We neglect the
derivative of $h_{i0}$ w.r.t. $x^0$ on the grounds that the gravitational field is nearly static,
and in (7.7) we neglect the derivative of $P$ w.r.t. time on the grounds that it is smaller
than the derivative w.r.t. $x^i$ by a factor of order $c_s/c$, where $c_s$ is the sound speed.
Then substituting (7.8) into (7.7) and using $h_{00} = -2\Phi/c^2$ we obtain
$$(\rho + P/c^2)\frac{du^i}{dt} = -\frac{\partial P}{\partial x^i} - (\rho + P/c^2)\frac{\partial\Phi}{\partial x^i}. \tag{7.9}$$
In the limit $P \ll \rho c^2$ this agrees with Euler’s equation of hydrodynamics.
Equation (7.6) is a statement of energy conservation: $\rho c^2$ contains both the fluid’s
rest-mass energy $\rho_0 c^2$ and its thermodynamic internal energy $U$. Since $\rho_0$ is
contributed by a conserved number of baryons, we have an additional conservation
equation $\nabla_\mu(\rho_0 u^\mu) = 0$, and this equation reduces in the Newtonian limit to the familiar
equation of continuity
$$\frac{d\rho_0}{dt} + \rho_0\frac{\partial u^j}{\partial x^j} = 0. \tag{7.10}$$

7.3 Harmonic coordinates & Gravitational Waves


We formulated the equations of physics in arbitrary coordinates as a mathematical ruse
to extract the implications of the principle of equivalence. But in GR, as in every other
branch of physics, it’s politic, even vital, to use the best coordinates for the job. So we
don’t really want the freedom to use any old coordinates; we need a way of choosing
sensible coordinates. When discussing black holes and cosmology we’ll be guided to
a coordinate system by the symmetries of the problem. But generic problems don’t
have much symmetry and then we should use coordinates that satisfy the harmonic
gauge condition
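$$g^{\mu\nu}\Gamma^\alpha_{\mu\nu} = 0. \tag{7.11}$$
In Problem set 2 you can show that the harmonic gauge condition is satisfied when the
coordinates share with Cartesian coordinates the property that they satisfy the wave
equation: $\Box x^\alpha = 0$. To first order in $h_{\mu\nu}$ the gauge condition reads
$$0 \simeq \eta^{\mu\nu}\eta^{\alpha\beta}\left(\frac{\partial h_{\beta\nu}}{\partial x^\mu} + \frac{\partial h_{\mu\beta}}{\partial x^\nu} - \frac{\partial h_{\mu\nu}}{\partial x^\beta}\right) = 2\partial_\mu h^{\alpha\mu} - \partial^\alpha h \quad\text{where}\quad h \equiv h^\beta{}_\beta. \tag{7.12}$$
Equation (6.27) can be rewritten
$$R_{\alpha\beta} = \tfrac12\left[\frac{\partial^2 h}{\partial x^\alpha\partial x^\beta} - \frac{\partial}{\partial x^\mu}\left(\frac{\partial h^\mu{}_\alpha}{\partial x^\beta} + \frac{\partial h^\mu{}_\beta}{\partial x^\alpha}\right) + \Box h_{\alpha\beta}\right]. \tag{7.13}$$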

When (7.12) is used three of the terms cancel and we have
$$R_{\alpha\beta} = \tfrac12 \Box h_{\alpha\beta}. \tag{7.14}$$
Let’s see what happens when we use this form of the Ricci tensor in (6.18) when
$T^{\mu\nu}$ on the right is for a perfect fluid [eq. (7.4)]. Since $u^\alpha u_\alpha = -c^2$ we have
$T^\gamma{}_\gamma = 3P - \rho c^2$, so (6.18) reads
$$\Box h_{\alpha\beta} = -\frac{16\pi G}{c^4}\left[(\rho + P/c^2)u_\alpha u_\beta + \tfrac12(\rho c^2 - P)\eta_{\alpha\beta}\right]. \tag{7.15}$$
From the occurrence of $\Box$ on the left it follows that GR predicts the existence of
gravitational waves that propagate at the speed of light. The right side of this
equation provides a source for these waves in the same way that the e.m. current $j^\mu$
provides the source for electromagnetic waves through the analogous equation
$\Box A^\mu = \mu_0 j^\mu$.
If the gravitational field is static in the rest frame of the fluid, in this frame, and
using $h_{00} = -2\Phi/c^2$, the 00 component of equation (7.15) reads
$$\nabla^2\Phi = 4\pi G(\rho + 3P/c^2). \tag{7.16}$$
From this equation we see that GR predicts that pressure is a source of the gravitational
field independently of the energy density that’s associated with it. In the early Universe
and inside very massive stars the energy density is dominated by black-body radiation,
for which $P = \tfrac13 \rho c^2$. So for this fluid Poisson’s equation reads
$$\nabla^2\Phi = 8\pi G\rho. \tag{7.17}$$

The factor 8 on the right implies that black body radiation is twice as powerful as a
source of gravity as rest-mass energy.
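The factor of two can be read off by evaluating the source term of (7.16) for different equations of state. The following minimal sketch does this; the function name `active_density` is a hypothetical label for the combination $\rho + 3P/c^2$, not terminology from the notes.

```python
C = 299_792_458.0  # speed of light in m/s

def active_density(rho, pressure):
    """Source term rho + 3P/c^2 on the right-hand side of (7.16)."""
    return rho + 3.0 * pressure / C**2

if __name__ == "__main__":
    rho = 1.0                                   # any density, in kg m^-3
    print(active_density(rho, 0.0))             # pressureless dust
    print(active_density(rho, rho * C**2 / 3))  # black-body radiation, P = rho c^2 / 3
```

For dust the source is just $\rho$, while for radiation it is $2\rho$, which is the content of (7.17).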

7.4 Deflection of Light by Gravity


Naive treatment A simple back-of-the-envelope argument based on the Strong
Principle of Equivalence shows that light must be deflected by the Sun and allows
us to obtain a quick order-of-magnitude estimate of the magnitude of this effect: the
S.P. of E. implies that the path of a photon beam must be approached by a particle
beam in the limit as the particles’ speed v → c. So let’s calculate the deflection of fast
(but non-relativistic) particles by the Sun.

Since the beam is fast, its deflection will be small, and we can estimate the net
gravitational impulse delivered to each particle by integrating the gravitational force
along a straight line. We neglect variations in the particle’s speed parallel to this line,
so $z \simeq vt$. Hence after a fly-by to within distance $b$ of the Sun, the particle has a
component of velocity perpendicular to the original line of magnitude
$$v_\perp \simeq \frac{1}{m}\int_{-\infty}^{\infty} F_\perp\,dt = 2\int_0^\infty \frac{GM_\odot}{r^2}\,\frac{b}{r}\,\frac{dz}{v} = \frac{c^2 r_s(\odot)}{bv}\int_0^\infty \frac{d\zeta}{(1+\zeta^2)^{3/2}},$$
where Pythagoras’ useful result has been pressed into service. The substitution
$\zeta = \sinh\theta$ enables one to show that the integral equals 1. So the beam is deflected through
the small angle
$$\alpha \simeq \frac{v_\perp}{v} \simeq \frac{r_s c^2}{v^2 b}.$$
In the limit $v \to c$, this tends to $r_s/b \simeq 0.875''$ for $b = R_\odot$.
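Both steps of the naive estimate can be checked numerically. The sketch below evaluates the integral with the substitution $\zeta = \sinh\theta$, which turns the integrand into $1/\cosh^2\theta$, and then computes $r_s/R_\odot$ in arcseconds. It assumes $r_s = 2GM/c^2$, as implied by the last equality in the expression for $v_\perp$, and uses standard solar values that are not quoted in the notes; the function names are hypothetical.

```python
import math

GM_SUN = 1.32712440018e20    # G * M_sun in m^3 s^-2 (assumed standard value)
C = 299_792_458.0            # speed of light in m/s
R_SUN = 6.957e8              # solar radius in m (assumed standard value)
RAD_TO_ARCSEC = 180.0 * 3600.0 / math.pi

def impulse_integral(theta_max=20.0, n=20000):
    """Midpoint-rule estimate of the integral of (1 + zeta^2)^(-3/2) over zeta >= 0.

    With zeta = sinh(theta) the integrand becomes 1 / cosh(theta)^2, which
    is negligible beyond theta_max.
    """
    h = theta_max / n
    return sum(h / math.cosh((i + 0.5) * h) ** 2 for i in range(n))

def naive_deflection(b, v=C):
    """Newtonian deflection angle r_s c^2 / (v^2 b) in radians, for the Sun."""
    r_s = 2.0 * GM_SUN / C**2
    return impulse_integral() * r_s * C**2 / (v**2 * b)

if __name__ == "__main__":
    print(impulse_integral())
    print(naive_deflection(R_SUN) * RAD_TO_ARCSEC)  # arcseconds at the solar limb
```

With these inputs $r_s$ is just under 3 km, so the angle is a few millionths of a radian, consistent with the straight-line approximation used for the impulse.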

Relativistic treatment A proper calculation will show that our neglect of rela-
tivity has cost us a factor of 2, and Murphy’s law notwithstanding, the true deflection
is larger than our naive estimate predicts. In 1919 general relativity hit the headlines
when its prediction for α was confirmed by measurements made during a solar eclipse.
Since 1979 observations of the gravitational deflection of light have become impor-
tant astronomical tools for determining not only the structure of the Milky Way, but
even the scale and the density of the Universe. In these applications the gravitational
field is always weak ($|\Phi|/c^2 \ll 1$), and for this case we can derive a general formula for
deflection by an arbitrary weak gravitational field.
By imposing the harmonic gauge condition (7.11) the metric of a weak, static
gravitational field can be put into the form (see Problem Set 2)
$$d\tau^2 = \left(1 + 2\frac{\Phi}{c^2}\right)dt^2 - c^{-2}\left(1 - 2\frac{\Phi}{c^2}\right)(dx^2 + dy^2 + dz^2). \tag{7.18}$$
Let $(dx, dy, dz)$ be the change in the spatial coordinates of a photon in time $dt$, then
since $d\tau$ has to vanish along a photon’s world line, we have from (7.18) that
$$\begin{aligned}
dt &= \frac{1}{c}\left(\frac{1 - 2\Phi/c^2}{1 + 2\Phi/c^2}\right)^{1/2} ds \\
&\simeq \frac{1}{c}\left(1 - 2\frac{\Phi}{c^2}\right) ds,
\end{aligned} \tag{7.19}$$
where $ds \equiv \sqrt{dx^2 + dy^2 + dz^2}$ is the coordinate distance between two points on the
ray. Since $\Phi \le 0$, equation (7.19) states that in our coordinate system light propagates
precisely as it would if there were no gravitational field but space were filled by a
medium of refractive index
$$n = 1 - 2\frac{\Phi}{c^2} \ge 1. \tag{7.20}$$
Thus looking through a region that is permeated by a gravitational field should be
like looking through a sheet of bobbly glass: the images of background light sources
will be shifted in position as well as distorted in shape and changed in brightness by
refraction.
These effects are most readily quantified by use of Fermat’s principle, which states
that the paths taken by light rays between a source and an observer extremize the elapse
of coordinate time as photons pass between source and observer.$^{9}$ Thus, we determine
the paths for which the light-travel time
$$t = \frac{1}{c}\int ds\, n \tag{7.21}$$
is stationary with respect to small changes in the path. Since we are interested in light
paths that are nearly rectilinear, we may orient our coordinate system such that one
coordinate, say $z$, increases monotonically along the path. When we employ $z$ rather
than $s$ as the integration variable, (7.21) becomes
$$ct = \int dz\, n(\mathbf{x})\left[\left(\frac{dx}{dz}\right)^2 + \left(\frac{dy}{dz}\right)^2 + 1\right]^{1/2} \equiv \int dz\, f(x, y, x', y', z). \tag{7.22}$$
Finding the path $\mathbf{x}(z)$ that extremizes this integral is a standard problem in the calculus
of variations. The desired path satisfies the Euler–Lagrange equation
$$\frac{d}{dz}\left(n(\mathbf{x})\left[\left(\frac{dx}{dz}\right)^2 + \left(\frac{dy}{dz}\right)^2 + 1\right]^{-1/2}\frac{dx}{dz}\right) = \frac{\partial n}{\partial x}\left[\left(\frac{dx}{dz}\right)^2 + \left(\frac{dy}{dz}\right)^2 + 1\right]^{1/2}. \tag{7.23}$$
We integrate both sides of this differential equation with respect to $z$ between the
source and the observer. Since
$$ds = \left[\left(\frac{dx}{dz}\right)^2 + \left(\frac{dy}{dz}\right)^2 + 1\right]^{1/2} dz,$$
we find
$$\left[n(\mathbf{x})\frac{dx}{ds}\right]_{\rm source}^{\rm obs} = \int_S^O ds\,\frac{\partial n}{\partial x}. \tag{7.24}$$
Both the source and the observer are assumed to lie far from the deflecting mass, in
regions within which $n = 1$, so
$$\left.\frac{dx}{ds}\right|_{\rm obs} - \left.\frac{dx}{ds}\right|_{\rm source} = \int_S^O ds\,\frac{\partial n}{\partial x}. \tag{7.25}$$

9 Physically Fermat’s principle applies because when the time elapse is not stationary, neighbouring
paths allow the observer to ‘see’ the source at different times when it is in different phases of oscillation.
These different ‘views’ average to zero.

The figure shows that the left-hand side of this equation is the angle through which
the projection onto the $xy$-plane of the ray from source to observer is bent. We define
$\alpha_x$ to be this angle and note that equivalent relations hold for the $yz$-plane. Hence the
angle between the direction of the ray at the source S and at the observer O is given
by
$$\boldsymbol{\alpha} = -\int_S^O \nabla_\perp n\, ds, \tag{7.26}$$
where the integral is along the ray’s path and $\nabla_\perp$ denotes the derivative perpendicular
to the path. When we substitute from (7.20) for $n$, we have
$$\boldsymbol{\alpha} = \frac{2}{c^2}\int_S^O \nabla_\perp\Phi\, ds = \frac{4}{c^2}\nabla_\perp\Phi_2, \tag{7.27a}$$
where
$$\Phi_2 \equiv \frac12\int_S^O \Phi\, ds. \tag{7.27b}$$

The potential $\Phi$ is related to the lens’s mass-density $\rho$ by Poisson’s equation,
$\nabla^2\Phi = 4\pi G\rho$. We orient our coordinate system so that the $z$ axis passes through the
observer and is tangent to the light path near the latter’s point of closest approach to
the deflector. Then we integrate Poisson’s equation with respect to $z$:
$$\begin{aligned}
4\pi\Sigma \equiv 4\pi G\int_{z_1}^{z_2} \rho\, dz &= \int_{z_1}^{z_2}\left(\nabla_\perp^2\Phi + \frac{\partial^2\Phi}{\partial z^2}\right) dz \\
&= \int_{z_1}^{z_2} \nabla_\perp^2\Phi\, dz + \left[\frac{\partial\Phi}{\partial z}\right]_{z_1}^{z_2}.
\end{aligned} \tag{7.28}$$
On account of the smallness of the deflection angle, the integral over $z$ of $\Phi$ in the first
term on the right side of this equation differs insignificantly from twice the quantity
$\Phi_2$ that is defined by (7.27b). Moreover, the square bracket in (7.28) represents the
gravitational accelerations that the lensing object generates at source and observer. In
practical applications these can be neglected because source and observer are extremely
remote from the lens. Hence (7.28) yields
$$2\pi\Sigma \simeq \nabla_\perp^2\Phi_2, \tag{7.29}$$
which is identical with the two-dimensional Poisson equation. Consequently, the
potential $\Phi_2$ that governs the deflection through equation (7.27a) is the gravitational
potential that would be generated in a 2-dimensional world by the lens’s projected
density $\Sigma$.
An important special case is that in which the matter distribution is effectively
that of a point mass $M$ – that is, the deflecting matter distribution is confined well
inside the impact parameter $x_\perp$ of every ray of interest. Then $\Phi_2(x_\perp) = GM \ln|x_\perp|$
and
$$\alpha = \frac{4GM}{c^2 x_\perp}. \tag{7.30}$$

In particular, when light from a star that lies behind the Sun just grazes the Sun’s limb,
$x_\perp = R_\odot$ so $\alpha = 2r_s/R_\odot = 1.75$ arcsec is exactly twice our non-relativistic estimate
(7.18). In the early 1990s the Hipparcos satellite measured the positions of a few $\times 10^5$
stars at various times of the year. Since the positions were accurate to of order one
milliarcsec, the effects of light deflection by the Sun were apparent to large distances
from the Sun.
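The point-mass result (7.30) can be recovered by integrating (7.27a) numerically along an undeflected ray. The sketch below works in scaled units $GM = c = b = 1$, where (7.30) predicts $\alpha = 4$, and then evaluates the formula for the Sun. The central-difference gradient, the $\sinh$ spacing of the sample points along the ray, the function names and the solar constants are all assumptions made for this illustration.

```python
import math

def point_mass_potential(x, z, gm=1.0):
    """Newtonian potential of a point mass at the origin (x transverse, z along the ray)."""
    return -gm / math.hypot(x, z)

def deflection(phi, b, c=1.0, u_max=12.0, n=4000):
    """Estimate alpha = (2/c^2) * integral of dPhi/dx along the straight ray x = b.

    The ray coordinate is sampled as z = b sinh(u) so that the distant,
    weakly contributing parts of the path are covered with few points.
    """
    eps = 1e-6 * b
    h = 2.0 * u_max / n
    total = 0.0
    for i in range(n):
        u = -u_max + (i + 0.5) * h
        z = b * math.sinh(u)
        dphi_dx = (phi(b + eps, z) - phi(b - eps, z)) / (2.0 * eps)
        total += dphi_dx * b * math.cosh(u) * h   # dz = b cosh(u) du
    return 2.0 * total / c**2

if __name__ == "__main__":
    print(deflection(point_mass_potential, 1.0))  # compare with 4 GM / (c^2 b) = 4

    GM_SUN = 1.32712440018e20   # m^3 s^-2 (assumed standard value)
    C = 299_792_458.0           # m/s
    R_SUN = 6.957e8             # m (assumed standard value)
    alpha = 4.0 * GM_SUN / (C**2 * R_SUN)
    print(alpha * 180.0 * 3600.0 / math.pi)       # arcseconds at the solar limb
```

Passing a different potential function to `deflection` gives the bending by any weak, static lens, provided the straight-ray approximation holds.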

The situation when the source, mass and observer all lie on a straight line is
described by the figure: rays that encounter the mass at small impact parameter cross
the source–observer line in front of the observer, while rays that pass the mass at
large impact parameters cross behind her. The ray that passes the mass with impact
parameter $r_E$ reaches the observer. Since, in the notation of the figure, $\theta_S \simeq r_E/D_{SL}$,
$\theta_O \simeq r_E/D_L$ and $\alpha = \theta_S + \theta_O$, it follows with a little algebra from (7.30) that
$$r_E = \sqrt{\frac{4GM}{c^2}}\sqrt{\frac{D_{SL} D_L}{D_{SL} + D_L}}.$$
Although it depends on the relative positions of source, observer and deflector, $r_E$
is usually called the Einstein radius of the deflector. If the source, deflector and
observer are collinear as in the figure, the observer sees a bright ring of radius $r_E$
around the deflector. The angular radius of this ring is the Einstein angle
$$\theta_E \equiv \sqrt{\frac{4GM}{c^2}}\sqrt{\frac{D_{SL}}{D_L(D_{SL} + D_L)}}. \tag{7.31}$$
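The two formulae are easy to evaluate for any configuration. The sketch below codes the Einstein radius and the Einstein angle (7.31) and checks that $\theta_E = r_E/D_L$. The distances in the example (a solar-mass lens with $D_L = D_{SL} = 4$ kpc) are purely illustrative, and the constants and function names are assumptions rather than values from the notes.

```python
import math

GM_SUN = 1.32712440018e20   # G * M_sun in m^3 s^-2 (assumed standard value)
C = 299_792_458.0           # speed of light in m/s
KPC = 3.0857e19             # one kiloparsec in metres
RAD_TO_MAS = 180.0 * 3600.0 * 1000.0 / math.pi

def einstein_radius(gm, d_l, d_sl):
    """r_E = sqrt(4GM/c^2) * sqrt(D_SL D_L / (D_SL + D_L)), in metres."""
    return math.sqrt(4.0 * gm / C**2) * math.sqrt(d_sl * d_l / (d_sl + d_l))

def einstein_angle(gm, d_l, d_sl):
    """theta_E of (7.31), in radians."""
    return math.sqrt(4.0 * gm / C**2) * math.sqrt(d_sl / (d_l * (d_sl + d_l)))

if __name__ == "__main__":
    d = 4.0 * KPC   # hypothetical lens and source-lens distances
    print(einstein_radius(GM_SUN, d, d))              # metres
    print(einstein_angle(GM_SUN, d, d) * RAD_TO_MAS)  # milliarcseconds
```

Because $\theta_E \propto M^{1/2}$, a lens one hundred times more massive produces a ring only ten times larger on the sky.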

When the source lies to one side of the observer–deflector line, the Einstein ring
degenerates into one or more arcs. An image of the cluster of galaxies Abell 2218 that
was obtained by the Hubble Space Telescope provides several spectacular examples of
this phenomenon.

When the deflecting mass distribution is not axisymmetric, several images of the
source may form. The Einstein Cross consists of four images of a background quasar
QSO 2237+0305 that happens to lie almost exactly behind a spiral galaxy at redshift
$z = 0.04$.$^{10}$

In general the time required for photons to pass from the source to the observer
is different for each image of the source. Since the luminosity of a quasar typically
varies on timescales of a year and even less, the differences between the times of flight
to each image can be measured by cross-correlating as functions of time the measured
brightnesses of each image. These time differences enable one to constrain the scale of
the Universe, since they are clearly proportional to the distance to, and therefore the
linear scale of, the deflector. For example a delay of $12 \pm 3$ d between two images in B
0218+357 yields Hubble constant $H_0 \simeq 60$ km s$^{-1}$ Mpc$^{-1}$.$^{11}$
The Einstein radius is a dimensionally important quantity because lensing
significantly modifies the appearance of a source that lies within about $r_E$ of the
deflector–observer line, while a source that lies further than $r_E$ from this line will be seen very
much as it would be if the deflector were not present. It is conventional to say that a
source is lensed if it lies within rE of a deflector.
10 See Ostensen et al., 1996, Astron. Astrophys, 309, 59.
11 Corbett et al., 1996, in proc. IAU Symp. 173; see also Saha & Williams 2003, Astron. J., 125,
2769.

Consider the case in which both the source and the deflector are stars that lie
within the Milky Way:

$$D_{SL} = D_L = 10\,\mathrm{kpc} = 3.08\times10^{19}\,\mathrm{m} \;\Rightarrow\; \theta_E = 0.9\,(M/M_\odot)^{1/2}\ \mathrm{mas}.$$

This angle is too small to be measured even with the Hubble Space Telescope. But
it is easy to show that the relative motion of the source, deflector and the Sun will
cause the amount by which deflection magnifies the background star to change within
several weeks. So by monitoring the brightnesses of millions of stars lensing events can
be detected even though their constituent images cannot be resolved. This technique
has proved to be a powerful way of detecting faint objects in the Milky Way – the
objects themselves are too faint to be seen, but they are detected by the effect they
have on luminous background stars.¹²
The effects of gravitational deflection can also be important well outside $r_E$.
Specifically, when light from an extended source such as a galaxy passes to one side of
a large mass concentration, differences in the deflections suffered by rays that come to
the observer from different points on the source will distort the observer’s image of the
source. In particular, the image will tend to be stretched in the direction perpendicular
to the line on the sky that runs from the source to the mass concentration. This effect
is called weak lensing. By measuring the shapes of galaxy images in the vicinity of
a cluster of galaxies, one can constrain the cluster’s gravitational field.¹³

7.5 Summary
GR predicts that a gravitational field makes clocks run slow by a factor $1 - |\Phi|/c^2$ that
manifests itself in gravitational redshifts. The equations of hydrodynamics, recovered
from $\nabla_\mu T^{\mu\nu} = 0$, predict that pressure augments the inertia of matter: in Euler’s
equation $\rho$ is replaced by $\rho + P/c^2$. The harmonic gauge condition $g^{\mu\nu}\Gamma^\alpha_{\mu\nu} = 0$ simplifies
the equations by guiding us to sensible “near Cartesian” coordinates. With its help we
see that GR predicts the existence of gravitational waves, and predicts that pressure
is a source of gravity in its own right, so Poisson’s equation has to be modified to
$\nabla^2\Phi = 4\pi G(\rho + 3P/c^2)$. In the case of ultrarelativistic matter such as black-body
radiation, $P = \frac13\rho c^2$, so the strength of gravity is effectively doubled. Gravity appears
to endow the vacuum with a non-trivial refractive index $n = 1 - 2\Phi/c^2 > 1$; this
phenomenon is an aspect of the slowing of clocks that are in gravitational potential wells.
The distortion of the images of distant objects to which $n \neq 1$ gives rise now provides
a crucial probe of the Universe.

8 The Schwarzschild Solution


Now that we have the field equations (6.18) it is natural to seek the solution g that de-
scribes the gravitational field in the solar system. A useful step in this direction would
be to find the metric associated with a point mass in an otherwise empty universe.
12 See Popowski et al., 2004, (astro-ph/0410319)
13 See Kaiser & Squires, 1993, Astrophys. J., 404 441; also Cypriano et al. 2004, Astrophys. J. 613,
95.

The way we derive most solutions to Einstein’s equations is at root the same
as that by which we are accustomed to solve other partial differential equations, for
example Maxwell’s equations. If we want to find the electrostatic potential inside
a charged spherical surface, we start by looking for potentials of the special form
$\Phi(r, \theta, \phi) = R(r)\Theta(\theta)e^{im\phi}$. We are not initially certain that such solutions exist, but
we try the idea out anyway in the knowledge that if there are no such solutions we
shall derive inconsistent conditions on R and Θ and thus discover our mistake, but if
no inconsistencies arise, we shall get a valid solution and it will not matter that we
found it by leaping into the dark.
Proceeding in this spirit towards the metric outside a point mass, we first argue
that we should be able to find coordinates in which the metric is diagonal. To see why
this is so, suppose we are given a metric tensor g for some two-dimensional space. Then
from simple matrix algebra we know that at any point in the space we can find two
mutually perpendicular directions, the eigenvectors u and v of g, such that g would be
a diagonal matrix if our coordinate directions coincided with u and v. Now imagine
marking the directions u, v as small crosses on a grid of points in the space. Since
g is a smoothly varying function of position, the orientation of neighbouring crosses
will be similar. Hence we may draw smooth curves through neighbouring crosses, thus
covering the space with a curvilinear grid. Finally, if we are able to label each curve of
this doubly infinite family of curves with numbers (a, b), these numbers will constitute
a valid coordinate system for the space and g will be diagonal in this coordinate system.
If we start from the metric tensor of a 4-space, the situation is fundamentally the
same as in our two-dimensional example; the only difference is that there are now four
special directions at each point. So it is reasonable to conjecture that we can find
coordinates in which the metric of any simple spacetime is everywhere diagonal.
Furthermore, since the gravitational field we seek to describe is time-independent,
we should be able to choose coordinates in such a way that none of the metric coeffi-
cients depends on time. Also the gravitational field will be spherically symmetric, so
there must be closed 2-surfaces on which the geometry is that of a sphere. If we label
these surfaces with the coordinates (r, t) and indicate position on each surface with the
angle variables (θ, φ), we have

$$ds^2 \equiv g_{\mu\nu}\,dx^\mu dx^\nu = -D(r)c^2dt^2 + A(r)(d\theta^2 + \sin^2\theta\,d\phi^2) + B(r)\,dr^2. \tag{8.1}$$

We next fix the meaning of $r$ by determining that the sphere with labels $(r, t)$ should
have area $4\pi r^2$. This yields

$$ds^2 = -D(r)c^2dt^2 + r^2(d\theta^2 + \sin^2\theta\,d\phi^2) + B(r)\,dr^2. \tag{8.2}$$

The metric now takes the form


 
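$$g_{\mu\nu} = \begin{pmatrix} -c^2D & 0 & 0 & 0\\ 0 & B & 0 & 0\\ 0 & 0 & r^2 & 0\\ 0 & 0 & 0 & r^2\sin^2\theta \end{pmatrix}, \tag{8.3}$$
with rows and columns ordered $(t, r, \theta, \phi)$.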

Exercise (22):
By making an appropriate coordinate transformation $x'(x)$ show that when, as
here, one uses $t$ rather than $ct$ for the 0th coordinate, the 4-vector of a photon
becomes $k^\mu = (\omega/c^2, \mathbf{k})$.
We next calculate the Christoffel symbols. We could proceed directly from (5.20),
but when one wants to calculate large numbers of Christoffel symbols it is generally
more cost-effective to use the procedure described in Box 2. We apply the EL eqn to
the Lagrangian
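$$L \equiv -c^2D\Big(\frac{dt}{d\tau}\Big)^2 + B\Big(\frac{dr}{d\tau}\Big)^2 + r^2\Big[\Big(\frac{d\theta}{d\tau}\Big)^2 + \sin^2\theta\Big(\frac{d\phi}{d\tau}\Big)^2\Big] \tag{8.4}$$
finding
$$\begin{aligned}
0 &= \frac{d}{d\tau}\Big(D\frac{dt}{d\tau}\Big)\\
0 &= \frac{d}{d\tau}\Big(B\frac{dr}{d\tau}\Big) + \tfrac12c^2D'\Big(\frac{dt}{d\tau}\Big)^2 - \tfrac12B'\Big(\frac{dr}{d\tau}\Big)^2 - r\Big[\Big(\frac{d\theta}{d\tau}\Big)^2 + \sin^2\theta\Big(\frac{d\phi}{d\tau}\Big)^2\Big]\\
0 &= \frac{d}{d\tau}\Big(r^2\frac{d\theta}{d\tau}\Big) - r^2\sin\theta\cos\theta\Big(\frac{d\phi}{d\tau}\Big)^2\\
0 &= \frac{d}{d\tau}\Big(r^2\sin^2\theta\,\frac{d\phi}{d\tau}\Big).
\end{aligned} \tag{8.5}$$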
After differentiating the products in these equations we can read off the Christoffel
symbols by comparing the resulting equations of motion with (5.32):
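$$\begin{aligned}
\Gamma^t_{tr} &= \Gamma^t_{rt} = \frac{D'}{2D}\\
\Gamma^r_{rr} &= \frac{B'}{2B}, \quad \Gamma^r_{\theta\theta} = -\frac{r}{B}, \quad \Gamma^r_{\phi\phi} = -\frac{r\sin^2\theta}{B}, \quad \Gamma^r_{tt} = \frac{c^2D'}{2B}\\
\Gamma^\theta_{\phi\phi} &= -\sin\theta\cos\theta, \quad \Gamma^\theta_{\theta r} = \Gamma^\theta_{r\theta} = \frac1r\\
\Gamma^\phi_{\phi r} &= \Gamma^\phi_{r\phi} = \frac1r, \quad \Gamma^\phi_{\phi\theta} = \Gamma^\phi_{\theta\phi} = \cot\theta.
\end{aligned} \tag{8.6}$$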
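As an illustration of the read-off, here is the $t$-equation worked through. This is only a sketch: it assumes that (5.32) is the geodesic equation in the form $\ddot x^\mu + \Gamma^\mu_{\alpha\beta}\dot x^\alpha\dot x^\beta = 0$, with dots denoting $d/d\tau$ (a notation introduced here for brevity).

$$\begin{aligned}
0 &= \frac{d}{d\tau}\big(D\,\dot t\,\big) = D\,\ddot t + D'\,\dot r\,\dot t\\
\Rightarrow\quad 0 &= \ddot t + \frac{D'}{D}\,\dot r\,\dot t = \ddot t + \big(\Gamma^t_{tr} + \Gamma^t_{rt}\big)\dot r\,\dot t .
\end{aligned}$$

Because the Christoffel symbol is symmetric in its lower indices, each member of the pair carries half of the coefficient, giving $\Gamma^t_{tr} = \Gamma^t_{rt} = D'/2D$. The other symbols follow from the remaining three equations in the same way.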
Hence
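$$\Gamma^\mu_{r\mu} = \frac{B'}{2B} + \frac2r + \frac{D'}{2D}, \quad \Gamma^\mu_{\theta\mu} = \cot\theta, \quad \Gamma^\mu_{\phi\mu} = 0, \quad \Gamma^\mu_{t\mu} = 0. \tag{8.7}$$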
By hard slog and (6.13) one can now obtain
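$$R_{tt} = -\frac{c^2D''}{2B} + \frac{c^2D'}{4B}\Big(\frac{B'}{B} + \frac{D'}{D}\Big) - \frac{c^2D'}{rB} \tag{8.8a}$$
$$R_{rr} = \frac{D''}{2D} - \frac{D'}{4D}\Big(\frac{B'}{B} + \frac{D'}{D}\Big) - \frac{B'}{rB} \tag{8.8b}$$
$$R_{\theta\theta} = -1 + \frac{r}{2B}\Big(-\frac{B'}{B} + \frac{D'}{D}\Big) + \frac1B \tag{8.8c}$$
$$R_{\phi\phi} = \sin^2\theta\,R_{\theta\theta} \tag{8.8d}$$
$$R_{\mu\nu} = 0, \quad \mu \neq \nu. \tag{8.8e}$$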

We require Rµν = 0 everywhere except at r = 0 (where these expressions fail anyway).


Multiplying (8.8a) by $B/(c^2D)$ and adding the result to (8.8b) yields

$$\frac{B'}{B} = -\frac{D'}{D} \;\Rightarrow\; BD = \text{constant}. \tag{8.9}$$

As $r \to \infty$ the metric should become that of flat spacetime, for which $B = D = 1$.
Thus
$$B(r) = \frac{1}{D(r)} \quad \forall\; r > 0. \tag{8.10}$$
By (8.8c) the equation $R_{\theta\theta} = 0$ now becomes

$$0 = R_{\theta\theta} = -1 + rD' + D \;\Rightarrow\; D = 1 + \text{constant}/r. \tag{8.11}$$

By (6.26) we know that as $r \to \infty$ and the field becomes weak, $D \to 1 + 2\Phi/c^2 =
1 - r_s/r$, where $M$ is the mass at the centre and the Schwarzschild radius $r_s$ is
defined by
$$r_s \equiv \frac{2GM}{c^2}. \tag{8.12}$$
Hence we may identify the constant in (8.11) as $-r_s$, giving

$$D = 1 - \frac{r_s}{r}. \tag{8.13}$$
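As a quick numerical sanity check (a sketch, not part of the derivation), one can substitute $D = 1 - r_s/r$ and $B = 1/D$ into the expressions (8.8a) to (8.8c) and confirm that they vanish for $r > r_s$. The function name `ricci_components` and the choice of units with $c = 1$ are illustrative assumptions.

```python
def ricci_components(r, rs, c=1.0):
    """Evaluate R_tt, R_rr, R_thetatheta of (8.8a)-(8.8c) for D = 1 - rs/r, B = 1/D."""
    D = 1.0 - rs / r
    D1 = rs / r**2             # D'  (derivative with respect to r)
    D2 = -2.0 * rs / r**3      # D''
    B = 1.0 / D
    B1 = -D1 / D**2            # B' obtained by differentiating B = 1/D
    bracket = B1 / B + D1 / D  # the combination B'/B + D'/D
    R_tt = (-c**2 * D2 / (2.0 * B)
            + c**2 * D1 / (4.0 * B) * bracket
            - c**2 * D1 / (r * B))
    R_rr = D2 / (2.0 * D) - D1 / (4.0 * D) * bracket - B1 / (r * B)
    R_thth = -1.0 + r / (2.0 * B) * (-B1 / B + D1 / D) + 1.0 / B
    return R_tt, R_rr, R_thth


# All three components should be zero to rounding error outside r = rs.
for radius in (1.5, 3.0, 10.0, 250.0):
    print(radius, ricci_components(radius, rs=1.0))
```

Any other choice of the integration constants (for example $D = 1 - r_s/r$ with $B = 2/D$) leaves a non-zero $R_{\theta\theta}$, which is a simple way to see that the boundary condition at infinity matters.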

Collecting everything together we have the Schwarzschild metric

$$g_{\mu\nu} = \mathrm{diag}\big(-c^2D,\; D^{-1},\; r^2,\; r^2\sin^2\theta\big) = \mathrm{diag}\big(-c^2(1 - r_s/r),\; (1 - r_s/r)^{-1},\; r^2,\; r^2\sin^2\theta\big), \tag{8.14}$$
with rows and columns ordered $(t, r, \theta, \phi)$.
The metric (8.14) deviates markedly from the metric associated with spherical polar
coordinates (which has $g_{tt} = -c^2$ and $g_{rr} = 1$) for values of $r$ up to a few times
larger than $r_s$. If $M$ equals the mass of the Sun, $M_\odot = 1.99 \times 10^{30}$ kg, we find
$r_s = 2.95$ km.
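The arithmetic behind this number can be sketched in a few lines. The values of $G$ and $c$ below are standard rounded constants supplied here as assumptions; only the solar mass is taken from the text, so the last digit of the result depends on the rounding used.

```python
G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2 (rounded, assumed)
C = 2.998e8          # speed of light, m/s (rounded, assumed)
M_SUN = 1.99e30      # solar mass in kg, as quoted in the text


def schwarzschild_radius(mass_kg):
    """Schwarzschild radius r_s = 2GM/c^2 of equation (8.12), in metres."""
    return 2.0 * G * mass_kg / C**2


print(f"r_s(Sun) = {schwarzschild_radius(M_SUN) / 1e3:.2f} km")
```

Because $r_s$ is linear in $M$, the same function gives roughly 3 km per solar mass for any other object.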

8.1 Constants of Motion

It is clear that a possible solution to the θ-equation of the set (8.5) is $\theta = \pi/2$; that
is, a particle can move always in the equatorial plane of the coordinate system. We
shall assume that our coordinate system has been oriented to ensure $\theta = \pi/2$. The $t$
equation of the set (8.5) implies that $dt/d\tau = \text{constant}/D$. In special relativity, $dt/d\tau$
is constant and we call this constant $\gamma$. So let’s call the constant of integration that
arises here $\gamma$ too. Then we have
$$\frac{dt}{d\tau} = \frac{\gamma}{D}. \tag{8.15}$$

Similarly, the $\phi$ equation of the set (8.5) implies that $r^2(d\phi/d\tau) = \text{constant}$. Calling
this constant $\gamma L$, we obtain the statement of angular-momentum conservation

$$r^2\frac{d\phi}{d\tau} = \gamma L. \tag{8.16}$$

The physical interpretation of $\gamma$ is clarified by going back to the definition
$-c^2d\tau^2 = -c^2D\,dt^2 + D^{-1}dr^2 + r^2d\phi^2$ of proper time: dividing both sides by $d\tau^2$
and using equations (8.15) and (8.16) we obtain

$$-c^2 = -\frac{c^2\gamma^2}{D} + \frac1D\Big(\frac{dr}{d\tau}\Big)^2 + \frac{\gamma^2L^2}{r^2}. \tag{8.17}$$
Rearranging, we have
$$\frac{c^2}{\gamma^2} = \frac{c^2}{D} - \frac{1}{D^3}\Big(\frac{dr}{dt}\Big)^2 - \frac{L^2}{r^2}. \tag{8.18}$$
Expanding the r.h.s. in powers of $r_s/r$ and then using the binomial theorem to take a
square root, we find
$$\gamma = 1 + c^{-2}\bigg(-\frac{c^2r_s}{2r} + \frac12\Big(\frac{dr}{dt}\Big)^2\Big[1 + 3\frac{r_s}{r} + \cdots\Big] + \frac{L^2}{2r^2} + \cdots\bigg). \tag{8.19}$$
Since $-c^2r_s/2r$ is the Newtonian potential energy $-GM/r$, it is clear that $\gamma c^2$ is just
the energy per unit mass of the orbiting particle, as we might have anticipated by
analogy with the special-relativistic case.
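With $\theta = \pi/2$ the $r$-equation of motion is
$$0 = \frac{d^2r}{d\tau^2} + \frac12\frac{c^2D'}{B}\Big(\frac{dt}{d\tau}\Big)^2 + \frac12\frac{B'}{B}\Big(\frac{dr}{d\tau}\Big)^2 - \frac{r}{B}\Big(\frac{d\phi}{d\tau}\Big)^2.$$
With (8.10), (8.15) and (8.16) this becomes

$$0 = \frac{d^2r}{d\tau^2} + \tfrac12\gamma^2c^2\frac{D'}{D} - \frac{D\gamma^2L^2}{r^3} - \frac12\frac{D'}{D}\Big(\frac{dr}{d\tau}\Big)^2. \tag{8.20}$$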
We shall see that in Newton’s theory slightly modified forms of the first, second and
third terms occur. The last represents a new, speed dependent force.

Exercise (23):
From (8.16) and (8.18) show that $L^2 = r^3c^2D'/(2D^2)$ and hence that the angular
frequency of a circular orbit as seen by an observer at infinity is
$$\frac{d\phi}{dt} = \sqrt{\frac{GM}{r^3}}$$
exactly as in Newton’s theory.

Exercise (24):
Multiply (8.20) by $\dfrac{2}{D}\dfrac{dr}{d\tau}$ and integrate the result to rederive the energy equation
(8.18).

8.2 The Perihelion of Mercury


When Einstein introduced g.r. in 1916, the only significant discrepancy between Newto-
nian dynamics and solar system observations was the rate of advance of the perihelion
of Mercury. One of g.r.’s early triumphs was to account for this discrepancy. We start
by reviewing Newton’s results for motion in the gravitational field of a point mass.

Newtonian motion around a point mass   The equation of motion of a particle
in the Newtonian field of a mass $M$ located at the origin is $\ddot{\mathbf r} = -GM\mathbf r/r^3 =
-\frac12c^2r_s\mathbf r/r^3$. On crossing this equation through by $\mathbf r$ we obtain $\dot{\mathbf L} = 0$, where $\mathbf L$ is the
angular momentum vector $\mathbf L \equiv \mathbf r \times \dot{\mathbf r}$. From the constancy of $\mathbf L$ we deduce that the
motion is confined to the plane $\mathbf L \cdot \mathbf r = 0$ perpendicular to the angular momentum
vector $\mathbf L$. Let $r$ and $\phi$ be polar coordinates for this plane. Conservation of angular
momentum requires $r^2\dot\phi = L$, while the equation of motion of $r$ is $\ddot r - r\dot\phi^2 = -\frac12c^2r_s/r^2$.
Eliminating $\dot\phi$ in favour of $L$ the latter reads
$$0 = \frac{d^2r}{dt^2} + \frac{c^2r_s}{2r^2} - \frac{L^2}{r^3}. \tag{8.21}$$
This is the Newtonian analogue of (8.20): to see this recall that $D = 1 - r_s/r$ and
$D'/D \simeq r_s/r^2$.
We obtain the shape of Newtonian orbits by eliminating $t$ from (8.21) through the
substitution $dt = (r^2/L)\,d\phi$, and eliminating $r$ in favour of a new variable $u \equiv 1/r$. We
then find
$$\frac{d^2u}{d\phi^2} + u = \frac{c^2r_s}{2L^2}. \tag{8.22}$$
This is just the equation of motion of a simple harmonic oscillator. So the orbit is
given by
$$r(\phi) = \frac1u = \frac{1}{A\cos(\phi - \phi_0) + \frac12c^2r_s/L^2}, \tag{8.23}$$
where A and φ0 are suitable constants of integration. This is actually the equation of
an ellipse with one focus at the origin. But the most important point is that since the
right side of (8.23) is periodic in φ with period 2π, r(φ + 2π) = r(φ) for any φ and thus
(8.23) defines a closed curve. Consequently, a planet in undisturbed orbit around the
Sun would always come closest to the Sun (in the jargon, “move through perihelion”)
at the same value of φ. Actually the perihelia of all the planets precess, that is, they
move very slowly around the plane of the planet’s orbit.
The planet with the most rapidly precessing perihelion is Mercury because it is
the planet with the shortest year. Its perihelion precesses by 576 seconds of arc (576″)
per century. Most of this precession is caused by the gravitational field of Jupiter.¹⁴
14 One may understand how Jupiter causes Mercury’s perihelion to precess by imagining Jupiter’s
mass to be uniformly distributed in an annulus centred on Jupiter’s orbit. This material pulls Mercury
outwards. Hence Mercury’s net acceleration towards the Sun falls off with $r$ more steeply than as
$r^{-2}$. This in turn slightly depresses the frequency at which Mercury’s radius oscillates around its
mean value, and these radial oscillations gradually get out of phase with the overall rotation about
the Sun.

In the late 19th century Bessel showed that disturbance of Mercury’s orbit by all the
planets gives rise to a net precession of 532″ per century. Thus Bessel was able to
account for all but 44″ per century of Mercury’s precession. Since Mercury’s year is
0.24 sidereal years long, 44″ per century corresponds to 0.106″ per Mercury year.

Relativistic precession   Working from (8.20) in close analogy with our Newtonian
calculation, we eliminate $\tau$ between (8.16) and (8.20) to obtain

$$0 = \frac{\gamma L}{r^2}\frac{d}{d\phi}\Big(\frac{\gamma L}{r^2}\frac{dr}{d\phi}\Big) + \tfrac12\gamma^2c^2\frac{D'}{D} - \frac{D\gamma^2L^2}{r^3} - \frac12\frac{D'}{D}\frac{\gamma^2L^2}{r^4}\Big(\frac{dr}{d\phi}\Big)^2.$$

We define $u \equiv 1/r$, substitute for $D$ and divide through by $-\gamma^2L^2u^2$ to obtain

$$\frac{d^2u}{d\phi^2} + u(1 - r_su) + \frac12\frac{r_s}{1 - r_su}\Big(\frac{du}{d\phi}\Big)^2 = \frac{c^2r_s}{(1 - r_su)\,2L^2}. \tag{8.24}$$

The Newtonian equivalent of (8.24) is equation (8.22). Clearly the former is much
harder to solve than (8.22): on the left the coefficient of $u$ has changed from 1 to
$(1 - r_su)$ and a term proportional to $(du/d\phi)^2$ has appeared, while on the right $L^2$
has been replaced by $L^2(1 - r_su)$. But it is immediately apparent that solutions to
(8.24) are unlikely to be periodic with period $2\pi$ and thus we do not expect relativistic
orbits around a point mass to be closed. Let us calculate the angle between successive
perihelia and compare it with Bessel’s discrepancy of 0.106″.
We first obtain the “energy equation” associated with (8.24) by multiplying
through by $\dfrac{2}{(1 - r_su)}\dfrac{du}{d\phi}$ and integrating:

$$\frac{1}{(1 - r_su)}\Big(\frac{du}{d\phi}\Big)^2 + u^2 = \frac{c^2}{L^2(1 - r_su)} + K, \tag{8.25}$$

where $K$ is a constant. The angle $\Delta\phi$ between apo- and perihelion is therefore

$$\Delta\phi = \int_{u_1}^{u_2}\frac{du}{\sqrt{c^2/L^2 + K(1 - r_su) - u^2(1 - r_su)}}, \tag{8.26}$$

where $u_1$ and $u_2$ are the smallest and largest values of $u$ along the orbit. The denominator
in (8.26) involves a cubic in $u$. Two roots of the cubic are $u_1$ and $u_2$, so if the
third root is $u_3$ the cubic may be written

$$H(u - u_1)(u_2 - u)(1 - u/u_3), \tag{8.27}$$

where $H$ is a constant to be determined. Comparing coefficients of $u^2$ and $u^3$ in (8.27)
and the denominator of (8.26) we find
$$u^2:\ -H\Big(1 + \frac{u_1 + u_2}{u_3}\Big) = -1, \qquad u^3:\ \frac{H}{u_3} = r_s,$$

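so
$$u_3 = \frac{1}{r_s} - (u_1 + u_2) \simeq \frac{1}{r_s} \quad\text{and}\quad H = 1 - r_s(u_1 + u_2). \tag{8.28}$$
Thus $u_3 \gg \max(u_1, u_2)$ and with equations (8.27) and (8.28) we can rewrite equation
(8.26) as
$$\begin{aligned}
\Delta\phi &= \frac{1}{\sqrt H}\int_{u_1}^{u_2}\frac{du}{\sqrt{(u - u_1)(u_2 - u)}}\Big(1 + \frac12\frac{u}{u_3} + \cdots\Big)\\
&\simeq \big[1 + \tfrac12r_s(u_1 + u_2)\big]\int_{u_1}^{u_2}\frac{du}{\sqrt{(u - u_1)(u_2 - u)}}\big(1 + \tfrac12ur_s\big)\\
&\simeq \pi\big[1 + \tfrac32r_s\,\tfrac12(u_2 + u_1)\big].
\end{aligned} \tag{8.29}$$
Successive perihelia are separated by $2\Delta\phi$, so the perihelion advances by $2\Delta\phi - 2\pi$ per orbit.
For Mercury $\frac12(u_1 + u_2) \simeq 1/r_{\rm Merc} = 1/(5.83 \times 10^7\,\mathrm{km})$, so the perihelion of Mercury
should advance in one Mercury year by
$$3\pi\frac{r_s}{r_{\rm Merc}} \simeq 0.0983''$$
in excellent agreement with Bessel’s discrepancy.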
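The final number can be reproduced with a short calculation. The sketch below uses the values of $r_s$, $r_{\rm Merc}$ and Mercury’s year quoted in the text; the function and constant names are illustrative.

```python
import math

R_S_SUN_KM = 2.95         # Schwarzschild radius of the Sun, as quoted in the text
R_MERC_KM = 5.83e7        # value of r_Merc used in the text
MERCURY_YEAR_YR = 0.24    # Mercury's orbital period in years, as quoted in the text
ARCSEC_PER_RAD = 180.0 * 3600.0 / math.pi


def advance_per_orbit_arcsec(rs_km, r_km):
    """Perihelion advance per orbit, 2*(delta_phi - pi) = 3*pi*rs/r, in arcseconds."""
    return 3.0 * math.pi * rs_km / r_km * ARCSEC_PER_RAD


per_orbit = advance_per_orbit_arcsec(R_S_SUN_KM, R_MERC_KM)
per_century = per_orbit * 100.0 / MERCURY_YEAR_YR
print(f"{per_orbit:.4f} arcsec per orbit, {per_century:.1f} arcsec per century")
```

Converting to a rate per century makes the comparison with the observed residual of about 44″ per century direct; the few-arcsecond gap reflects the rounded input values rather than the theory.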
In 1975 Hulse & Taylor discovered a pulsar, PSR 1913+16, that proved to be
one component of a tight and eccentric binary: the binary period is 7¾ h and the
eccentricity is $e = 0.617$. The periastron of this orbit has been shown to precess by
$4.22^\circ\,\mathrm{yr}^{-1}$. Both components have mass close to $M = 1.42\,M_\odot$ and are presumably
neutron stars, although pulses are detected from only one of them. Thus PSR 1913+16
is a system in which general relativity is of prime importance rather than a marginal
correction.

Exercise (25):
Show that the semi-major axis of the orbit of PSR 1913+16 is $a = 1.9 \times 10^9$ m,
about three times the radius of the Sun, and that each neutron star moves
with a speed of order $220\ \mathrm{km\,s^{-1}}$.
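The orders of magnitude in this exercise can be checked with Kepler’s third law. The sketch below is a Newtonian estimate under stated assumptions: two equal masses of $1.42\,M_\odot$, a period of 7¾ h, and a characteristic speed taken from a circular orbit of radius $a/2$ about the centre of mass (the eccentricity is ignored). The constants are rounded values supplied here.

```python
import math

G = 6.674e-11       # gravitational constant, m^3 kg^-1 s^-2 (rounded, assumed)
M_SUN = 1.99e30     # solar mass in kg


def semi_major_axis(total_mass_kg, period_s):
    """Kepler's third law: a^3 = G * M_total * P^2 / (4 * pi^2)."""
    return (G * total_mass_kg * period_s**2 / (4.0 * math.pi**2)) ** (1.0 / 3.0)


period = 7.75 * 3600.0                      # binary period in seconds
a = semi_major_axis(2 * 1.42 * M_SUN, period)

# For equal masses each star moves about the centre of mass at mean distance a/2.
speed = 2.0 * math.pi * (a / 2.0) / period

print(f"a = {a:.2e} m, characteristic speed = {speed / 1e3:.0f} km/s")
```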

8.3 Tests based on Planetary and Pulsar Dynamics


If one claims to know the orbits of the planets and g in the intervening space, one can
calculate the time for a signal to pass from one planet to another or the time required
for an e.m. signal to reach us from a specified point outside the Solar System. G.R.
can be tested by comparing these calculated times with observed delays. There are
two main types of experiment to consider: (i) a signal goes out from Earth, bounces
off a planet or satellite within the Solar System and returns to us; (ii) a steady stream
of signals reaches us from a pulsar after traversing the Solar System.
In each case the time ∆τ (t) required for the signal to reach us is a complex function
of the parameters (“orbital elements”) that define planetary orbits and any relevant
pulsar orbits. In practice these parameters have to be adjusted to optimize the fit
between the calculated and observed values of ∆τ . Thus these experiments not only
test g.r.; they also refine our knowledge of the structure of the Solar System and certain
pulsars.

Bouncing signals within the solar system The earliest work involved bounc-
ing radar signals off the inner planets. One measures the delay before the first signals
return. This gives $\Delta\tau$.

There are two important difficulties:


(i) The reflecting planetary surface is not a smooth mirror. Hence the returning pulse
has a complex shape. One looks for the leading edge of the pulse and tries to use
frequency information.

(ii) The most interesting lines of sight pass close to the Sun. Free electrons near the
Sun cause the refractive index to differ from unity.
Later experiments concentrated on timing signals sent to artificial satellites. Since
a satellite is too small to give a detectable radar reflection, one programmes the satellite
to respond to a pulse from Earth by emitting a similar pulse after a known small delay.
With this technique one does not have to worry about planetary topography. By
sending signals at several frequencies one can eliminate the effect of dispersion by free
electrons along the line of sight.
Analysis of these data has to proceed via a computer program which adjusts
orbital elements, the masses of the planets and asteroids, the oblateness of the Sun,
the orientation of an inertial coordinate system, etc., until the fit of the predicted ∆τ ’s
to the observed ∆τ ’s is optimized. One finds that the agreement with g.r. is excellent.

The quality of the fit is normally judged by calculating predictions from the metric¹⁵

$$ds^2 = -\Big[1 - \alpha\frac{r_s}{\rho} + \tfrac12\beta\Big(\frac{r_s}{\rho}\Big)^2\Big]c^2dt^2 + \Big(1 + \gamma\frac{r_s}{\rho}\Big)\big[d\rho^2 + \rho^2(d\theta^2 + \sin^2\theta\,d\phi^2)\big], \tag{8.30}$$

where $\alpha$, $\beta$ and $\gamma$ are dimensionless parameters to be determined by fitting the
calculated to the observed $\Delta\tau$’s. If we identify $\rho$ with
$$\rho \equiv \tfrac12\big[r - \tfrac12r_s + \sqrt{r(r - r_s)}\big], \tag{8.31}$$

this metric agrees with the Schwarzschild metric (8.14) up to order $r_s/r$ in space and
$(r_s/r)^2$ in time when $\alpha = \beta = \gamma = 1$. (In the equations of motion the $tt$-component
of $g_{\mu\nu}$ is multiplied by the largest components of $v^\mu$.) Hence if Einstein was right, the
observations should lead to $\alpha \simeq 1$ etc. Data from missions to Mercury & Mars give

$$\begin{aligned}
\alpha - 1 &= (2.1 \pm 1.9) \times 10^{-4}\\
\beta - 1 &= (-2.9 \pm 3.1) \times 10^{-3}\\
\gamma - 1 &= (-0.7 \pm 1.7) \times 10^{-3}\\
J_2 &= (-1.4 \pm 1.5) \times 10^{-6}
\end{aligned}$$

where J2 is a parameter describing the oblateness of the Sun.
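The claim that (8.30) with $\alpha = \beta = 1$ reproduces the Schwarzschild time coefficient up to order $(r_s/r)^2$ can be checked numerically. In the sketch below (function names are illustrative) the mismatch between $1 - r_s/\rho + \frac12(r_s/\rho)^2$ and $1 - r_s/r$ is evaluated at two radii; if the agreement holds through second order, the mismatch should shrink by about a factor $10^3$ when $r_s/r$ shrinks by a factor 10.

```python
import math


def rho_isotropic(r, rs):
    """The radial coordinate rho of equation (8.31)."""
    return 0.5 * (r - 0.5 * rs + math.sqrt(r * (r - rs)))


def gtt_mismatch(r, rs):
    """Time coefficient of (8.30) with alpha = beta = 1, minus the Schwarzschild 1 - rs/r."""
    x = rs / rho_isotropic(r, rs)
    return (1.0 - x + 0.5 * x**2) - (1.0 - rs / r)


near = gtt_mismatch(100.0, 1.0)     # rs/r = 1e-2
far = gtt_mismatch(1000.0, 1.0)     # rs/r = 1e-3
print(near, far, near / far)
```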


It is interesting that the precision of these measurements is such that
(i) they determine the inertial frame of reference as accurately as can be done by
looking right across the Universe at quasars with redshift z = 2 (see below);
(ii) they furnish the best estimates of the mass of the asteroid Ceres (the old value
proved to be in error by 15%);
(iii) Dirac speculated that Newton’s “constant” might decrease as the Universe expands.
These measurements yield $\dot G/G = (0.2 \pm 0.4) \times 10^{-11}\,\mathrm{yr}^{-1}$.

Pulsar timing The discovery of PSR 1913+16 in 1975 facilitated a dramatic


extension and refinement of results based on solar-system dynamics. By virtue of its
spin, the pulsar is an accurate clock that is carried around a fast and eccentric orbit
in a strong gravitational field. The time taken for the electromagnetic pulses it emits
to reach Earth is affected by
(i) the positions as functions of time of PSR 1913+16 and the Earth – g.r. has to be
used to calculate these to the required accuracy;
(ii) variations in the gravitational redshift of the pulsar as it moves closer to and
further from its companion;
(iii) variations in the effective refractive index of the vacuum along the line of sight
from Earth to the pulsar – the moving gravitational fields of PSR 1913+16’s
companion and objects in the solar system all make non-negligible contributions
to the measured delays;
(iv) evolution of the pulsar orbit that is driven by the radiation of gravitational waves.
15 This may be thought of as generated by expanding the functions $B$ and $D$ of (8.2) in powers of
$r_s/r$.
The evolution of the orbit is predicted by calculating the energy and angular mo-
mentum that the waves should carry away in a given time, and then adjusting the
orbit to ensure global conservation of E and L. Different variants of g.r. predict differ-
ent rates of E and L loss. Only Einstein’s original (and simplest) theory successfully
predicts the observed evolution of the period $\dot P = -2.4 \times 10^{-12}$.

8.4 The Schwarzschild Singularity


For $r = r_s \equiv 2GM/c^2$, the component $g_{tt}$ of the Schwarzschild metric (8.14) vanishes.
Hence the trajectory r = rs is null rather than time-like. Furthermore, since gtt changes
sign at r = rs , the trajectory r = constant < rs is space-like. Consequently an explorer
who penetrates to r < rs is doomed: no matter how hard he fires his rockets, his
trajectory must remain time-like. Hence he cannot pass from the condition dr/dτ < 0
through the condition dr/dτ = 0 as he must if he is to escape. He is carried down to
r = 0 as surely as you and I are carried into next year.
It is interesting to investigate this predicament more closely. Suppose for simplicity
that our explorer’s angular momentum L is zero and that at t = τ = 0 he is falling
towards the centre at radius r0 with the speed he would have picked up had he fallen
all the way from rest at infinity. Then evaluating (8.17) at infinity we find that the
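constant $\gamma$ is one. Hence, by (8.17) the elapse of time on his watch as he falls to $r_s$ is
$$\Delta\tau = \int_{r_0}^{r_s}\frac{d\tau}{dr}\,dr = \frac1c\int_{r_s}^{r_0}\frac{dr}{\sqrt{1 - D}} = \frac{1}{c\sqrt{r_s}}\int_{r_s}^{r_0}\sqrt r\,dr = \frac{2}{3c\sqrt{r_s}}\big(r_0^{3/2} - r_s^{3/2}\big), \tag{8.32}$$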

which is perfectly finite. Furthermore, he clearly reaches r = rs with dr/dτ < 0. Hence
he would be well advised to fire his rockets before he reaches rs .
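To get a feel for the size of this proper time, the sketch below evaluates the closed form (8.32) and cross-checks it against a direct midpoint-rule integration of $dr/(c\sqrt{1 - D})$ with $D = 1 - r_s/r$. The example values (a solar-mass hole with $r_s = 2950$ m and a start at $r_0 = 10\,r_s$) and the function names are illustrative assumptions.

```python
import math

C = 2.998e8   # speed of light in m/s (rounded, assumed)


def infall_proper_time(r0, rs):
    """Closed form (8.32) for the proper time to fall from r0 to rs."""
    return 2.0 / (3.0 * C * math.sqrt(rs)) * (r0**1.5 - rs**1.5)


def infall_proper_time_numeric(r0, rs, steps=20000):
    """Midpoint-rule estimate of (1/c) * integral_{rs}^{r0} dr / sqrt(1 - D), D = 1 - rs/r."""
    h = (r0 - rs) / steps
    total = 0.0
    for i in range(steps):
        r = rs + (i + 0.5) * h
        total += h / math.sqrt(rs / r)   # 1 - D = rs / r
    return total / C


rs = 2950.0
r0 = 10.0 * rs
print(infall_proper_time(r0, rs), infall_proper_time_numeric(r0, rs))
```

For these values the explorer’s watch advances by only a fraction of a millisecond, which underlines how little warning he gets.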
Why does $g_{rr}$ diverge at $r = r_s$? Is this divergence caused by gravity or our
choice of coordinates? It is straightforward, if tedious, to check that no components of
the curvature tensor $R^\mu{}_{\nu\alpha\beta}$ diverge at $r_s$. So our explorer can endure the tidal forces
he experiences if he is stocky enough. The reason $g_{rr}$ diverges at $r_s$ turns out to be
that Schwarzschild’s coordinate system assigns to all events that occur at $r_s$ the time
coordinate $t = \infty$. As a specific example, let us calculate the time coordinate at which
our explorer crosses $r = r_s$:
$$t = \int_0^\tau\frac{dt}{d\tau}\,d\tau = \int_{r_0}^{r_s}\frac{dt}{d\tau}\frac{d\tau}{dr}\,dr.$$

With (8.15) and (8.32) this becomes

$$t = \frac1c\int_{r_s}^{r_0}\frac{dr}{D\sqrt{1 - D}} = \frac{1}{c\sqrt{r_s}}\int_{r_s}^{r_0}\frac{r^{3/2}\,dr}{r - r_s} = \infty. \tag{8.33}$$

Thus no matter when our explorer sets off, an observer who uses Schwarzschild’s coordinates
always assigns $t = \infty$ to the event at which the explorer crosses $r = r_s$. We
should not be surprised that such a foolish convention leads to a singular metric; if
we choose coordinates $q_i$ in ordinary space in such a way that all points on the edge
of a ruler are assigned the same three numbers $q_i$, an expression for the length of the
ruler in terms of the coordinates of the ruler’s ends is going to involve multiplication
by some awfully big numbers!
To bring this problem under control we need to choose a new coordinate system.
In 1960 M. Kruskal showed that when new coordinates $(r', t')$ are defined through
$$\begin{aligned}
r'^2 - t'^2 &= r_s^2\Big(\frac{r}{r_s} - 1\Big)e^{r/r_s}\\
t' &= r'\,\frac{\cosh(ct/r_s) - 1}{\sinh(ct/r_s)} = r'\tanh\Big(\frac{ct}{2r_s}\Big)
\end{aligned} \tag{8.34a}$$
the metric takes the non-singular form
$$ds^2 = r^2(d\theta^2 + \sin^2\theta\,d\phi^2) + 4\big(dr'^2 - dt'^2\big)\frac{r_s}{r}e^{-r/r_s}. \tag{8.34b}$$
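For the exterior region $r > r_s$ the two conditions in (8.34a) can be solved explicitly. The sketch below assumes the branch $r' > |t'|$, for which $r' = A\cosh(ct/2r_s)$ and $t' = A\sinh(ct/2r_s)$ with $A = r_s\sqrt{r/r_s - 1}\,e^{r/2r_s}$; this explicit form and the function name are not taken from the text, and units with $c = 1$ are used by default.

```python
import math


def kruskal_exterior(r, t, rs, c=1.0):
    """Return (r', t') for r > rs, choosing the branch r' > |t'| of (8.34a)."""
    if r <= rs:
        raise ValueError("this sketch covers only the exterior region r > rs")
    amplitude = rs * math.sqrt(r / rs - 1.0) * math.exp(r / (2.0 * rs))
    half_angle = c * t / (2.0 * rs)
    return amplitude * math.cosh(half_angle), amplitude * math.sinh(half_angle)


# Check both relations of (8.34a) at an arbitrary exterior event.
r_prime, t_prime = kruskal_exterior(r=3.0, t=1.7, rs=1.0)
lhs = r_prime**2 - t_prime**2
rhs = 1.0**2 * (3.0 / 1.0 - 1.0) * math.exp(3.0 / 1.0)
print(lhs, rhs, t_prime / r_prime, math.tanh(1.7 / 2.0))
```

As $t \to \pm\infty$ at fixed $r$ the ratio $t'/r'$ tends to $\pm1$, so every exterior event stays inside the wedge bounded by the null lines $r' = \pm t'$.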
The lines $r' = \text{constant}$ are always timelike. Radially directed photons move along the
45° lines $dr' = \pm dt'$ in the $(r', t')$ plane. In particular, the null line $r = r_s$ becomes
$r' = t'$. If we plot curves of constant $r$ and $t$ in the $(r', t')$ plane, we get a picture like
this:

[Figure: curves of constant $r$ and $t$ in the $(r', t')$ plane]

It is now obvious that Schwarzschild’s coordinates $(r, t)$ break down as $r' = t'$ is
approached. To first order in $ct/r_s$ (8.34a) becomes $t' \simeq \frac12ct\,r'/r_s$, so $t'$ may be considered
a stretched form of $t$ at $r = \infty$. Near $r = r_s$, $t' \simeq r'$ and by (8.34a) all events
correspond to large $t$ as expected. The region $t' > r'$ corresponds to $r < r_s$. At $r = 0$,
corresponding to $t'^2 - r'^2 = r_s^2$, there is a bona-fide singularity in the gravitational
field.
The Schwarzschild radius $r_s$ corresponding to the mass of the Sun is 2.95 km. The
black holes that power quasars and other very active galactic nuclei have Schwarzschild
radii between the radius of the Sun and that of the Earth’s orbit.

Exercise (26):
Show that a cubic light-year of water (supposed incompressible) would be con-
tained within its Schwarzschild radius.

8.5 Summary
The metric outside a point mass can be written to look like that of ordinary spherical
polar coordinates with $1 \to (1 - r_s/r)$ in the $tt$ slot and $1 \to 1/(1 - r_s/r)$ in the $rr$ slot.
The singularity of these correction factors when $r = r_s = 2GM/c^2$ is not physically
interesting. However, the geometry of spacetime is singular at $r = 0$, and $r = r_s$ is
special in that an “outward” running photon on this sphere would actually not move
away from the centre.
The Schwarzschild metric accounts for the last 10% of the precession of Mercury’s
perihelion and for the measured bending of light by the Sun. The magnitude of both
these effects is of order $n \times r_s/r$, where $n \sim 4$ and $r$ is the smallest distance of the test
body from the Sun. Detailed studies of the Solar System’s dynamics show that any
errors in g.r.’s corrections to Newtonian dynamics are smaller than ∼ 0.1%.

9 Cosmology

9.1 Empirical Basis


Between 1920 and 1928 it became clear that the Universe is populated by countless
galaxies like the Milky Way, and that these are receding from one another with veloc-
ities that are proportional to separation. If we follow the trajectories of these galaxies
back in time, we find that some $10^{10}$ yr ago the mean density of the Universe must
have been extremely high. Indeed, a naive extrapolation leads to the conclusion that
a finite time in the past any density was reached, no matter how great.
In 1946 G. Gamow at Cornell, and 20 years later R. Dicke in Princeton, argued
that the large abundance (about 25% by weight) of He in the present Universe could
have been generated some minutes after the formation of the Universe if a black-body
radiation field fills the present Universe. The first estimate of the current temperature
of this radiation field was 25 K, but this later fell to ≈ 3 K. In 1964 A. Penzias & R.
Wilson at Bell Labs discovered this cosmic background serendipitously. This triumph
of the big-bang theory quickly killed all interest in attempts to construct a steady-state
cosmology.
It is now known that the spectrum of the cosmic background is accurately Planck-
ian with T = 2.7 ± 0.1 K. An observer who moves with respect to the centre of our
Galaxy at $\approx 400\ \mathrm{km\,s^{-1}}$ in a certain direction would see the same spectrum in all
directions, to within a few parts in $10^5$. At any point in the Universe a natural standard
of rest is defined as that of an observer whose cosmic background is isotropic. Such
observers are called fundamental observers. Any two fundamental observers recede
from one another with a speed $v \approx D/13.6\,\mathrm{Gyr}$, where $D$ is their separation.¹⁶
16 Astronomers write $v = H_0D$ with $H_0 = 72 \pm 5\ \mathrm{km\,s^{-1}\,Mpc^{-1}}$.
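The two ways of writing the expansion rate are related by a unit conversion: the timescale in $v \approx D/13.6\,\mathrm{Gyr}$ is just $1/H_0$. A minimal sketch, with rounded conversion factors supplied here as assumptions:

```python
KM_PER_MPC = 3.0857e19        # kilometres in one megaparsec (rounded)
SECONDS_PER_GYR = 3.156e16    # seconds in 10^9 years (rounded)


def hubble_time_gyr(h0_km_s_mpc):
    """Return 1/H0 in Gyr for H0 given in km/s/Mpc."""
    return KM_PER_MPC / h0_km_s_mpc / SECONDS_PER_GYR


print(f"1/H0 = {hubble_time_gyr(72.0):.1f} Gyr")
```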

Constructing the unit n-sphere

1-sphere: $(x_1, x_2) = (\sin\phi, \cos\phi)$
2-sphere: $(x_1, x_2, x_3) = (\sin\phi\sin\theta, \cos\phi\sin\theta, \cos\theta)$
3-sphere: $(x_1, x_2, x_3, x_4) = (\sin\phi\sin\theta\sin\eta, \cos\phi\sin\theta\sin\eta, \cos\theta\sin\eta, \cos\eta)$
···
n-sphere: $(x_1, \ldots, x_{n+1}) = (\sin\theta_1\sin\theta_2\cdots\sin\theta_n, \ldots, \cos\theta_{n-1}\sin\theta_n, \cos\theta_n)$
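The pattern in the box is recursive: each new angle multiplies the existing coordinates by its sine and appends its cosine. A minimal sketch of that construction follows; the function name `unit_sphere_point` is illustrative.

```python
import math


def unit_sphere_point(angles):
    """Cartesian coordinates of a point on the unit n-sphere, where n = len(angles).

    Start from the 1-sphere (sin, cos); for each further angle, scale the
    coordinates found so far by sin(angle) and append cos(angle).
    """
    coords = [math.sin(angles[0]), math.cos(angles[0])]
    for angle in angles[1:]:
        coords = [x * math.sin(angle) for x in coords] + [math.cos(angle)]
    return coords


point = unit_sphere_point([0.3, 1.1, 0.7])      # a point on the 3-sphere
print(point, sum(x * x for x in point))          # the squared length should be 1
```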

As the Universe expands, the photons of the cosmic background are Doppler shifted
to lower frequencies and the temperature characterizing their distribution falls.

9.2 Friedmann Metrics


The first step towards finding a solution of Einstein’s equations to describe the expand-
ing Universe is to choose a good coordinate system. The cosmic radiation background
is a great help in this: we may say that two events occur at the same place if they
occur on the world-line of a single fundamental observer. Similarly, two events that
occur at different places may be said to occur simultaneously if the background temperatures
measured by fundamental observers local to those events are the same. With
this natural division into space and time we would expect $ds^2$ to be of the form
$$ds^2 = -c^2dt^2 + g_{ij}\,dx^i dx^j, \tag{9.1}$$
where $g$ is the metric of a 3-space of simultaneous events.
The structure of g is strongly restricted by the fact that fundamental observers
observe the cosmic background to be highly isotropic: the photons they receive were
last scattered at a point several thousands of millions of light years away, at a time
when the mean density of the Universe was about $10^9$ times its present value. In fact,
until these photons collide with an observer’s telescope they have been flying freely
through space since the Universe was a mere $10^{-4}$ of its present age. Consequently,
when a fundamental observer compares the temperature he sees in the forward and
backward directions, he is comparing physical conditions in the early Universe at points
that are now separated by thousands of millions of light years. Since these conditions
are found to be identical to within a few parts in 10,000 we conclude that the Universe
is extremely homogeneous on any time-slice t = constant. Hence the geometry of such
a space, which is described by g, should be extremely homogeneous too.
A theorem in differential geometry states that any homogeneous and isotropic
3-space must be a scaled version of one of three basic models:

(i) Flat space Obviously this admits spherical polar coordinates in which the
line element can be written
$$ds^2 = dr^2 + r^2(d\theta^2 + \sin^2\theta\,d\phi^2). \tag{9.2}$$

(ii) The 3-sphere Suppose we parameterize the coordinates of points $x$ in a
4-dimensional Euclidean space (nothing to do with spacetime) by
$$(x_1, x_2, x_3, x_4) = a(\sin\psi\sin\theta\cos\phi,\ \sin\psi\sin\theta\sin\phi,\ \sin\psi\cos\theta,\ \cos\psi).$$
Then it is easy to show that $\sum_\mu x_\mu^2 = a^2$. Hence as we vary the three angles $(\psi, \theta, \phi)$
the point $x$ moves over a 3-sphere. The small vector $\Delta^{(\phi)}$ that joins two points whose
coordinates differ only by a small change $\delta\phi$ in $\phi$ is
$$\Delta^{(\phi)} = \frac{\partial x}{\partial\phi}\,\delta\phi
= a(-\sin\psi\sin\theta\sin\phi,\ \sin\psi\sin\theta\cos\phi,\ 0,\ 0)\,\delta\phi.$$
Similarly,
$$\Delta^{(\theta)} = a(\sin\psi\cos\theta\cos\phi,\ \sin\psi\cos\theta\sin\phi,\ -\sin\psi\sin\theta,\ 0)\,\delta\theta$$
$$\Delta^{(\psi)} = a(\cos\psi\sin\theta\cos\phi,\ \cos\psi\sin\theta\sin\phi,\ \cos\psi\cos\theta,\ -\sin\psi)\,\delta\psi.$$
It is straightforward to check that these three small vectors are mutually perpendicular.
Hence when we move by arbitrary small amounts $(\delta\psi, \delta\theta, \delta\phi)$ over the sphere, the
distance traversed $\delta s$ is given by
$$\delta s^2 = |\Delta^{(\psi)}|^2 + |\Delta^{(\theta)}|^2 + |\Delta^{(\phi)}|^2
= a^2(\delta\psi^2 + \sin^2\psi\,\delta\theta^2 + \sin^2\psi\sin^2\theta\,\delta\phi^2). \tag{9.3}$$
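The perpendicularity of the three small vectors and the coefficients in (9.3) can be checked numerically. The following is a minimal sketch, assuming only the embedding given above; the function names (`embed`, `tangent`, `induced_metric`) and the finite-difference step `h` are illustrative choices and not part of the notes.

```python
import math


def embed(a, psi, theta, phi):
    """Point on the 3-sphere of radius a in 4-dimensional Euclidean space."""
    return (
        a * math.sin(psi) * math.sin(theta) * math.cos(phi),
        a * math.sin(psi) * math.sin(theta) * math.sin(phi),
        a * math.sin(psi) * math.cos(theta),
        a * math.cos(psi),
    )


def tangent(a, angles, index, h=1e-6):
    """Central-difference estimate of dx/d(angle), i.e. Delta divided by the small angle change."""
    up = list(angles)
    down = list(angles)
    up[index] += h
    down[index] -= h
    x_up = embed(a, *up)
    x_down = embed(a, *down)
    return [(u - d) / (2 * h) for u, d in zip(x_up, x_down)]


def dot(u, v):
    """Ordinary Euclidean dot product in four dimensions."""
    return sum(p * q for p, q in zip(u, v))


def induced_metric(a, psi, theta, phi):
    """3x3 matrix of dot products of the tangent vectors, ordered (psi, theta, phi)."""
    vectors = [tangent(a, (psi, theta, phi), i) for i in range(3)]
    return [[dot(vectors[i], vectors[j]) for j in range(3)] for i in range(3)]
```

The off-diagonal entries of `induced_metric` should vanish to within the finite-difference error, and the diagonal entries should reproduce $a^2$, $a^2\sin^2\psi$ and $a^2\sin^2\psi\sin^2\theta$.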
If we introduce a new coordinate in place of $\psi$,
$$r \equiv a\sin\psi \quad\Rightarrow\quad dr^2 = (a^2 - r^2)\,d\psi^2, \tag{9.4}$$
and define the curvature $K$ of the sphere as
$$K \equiv \frac{1}{a^2}, \tag{9.5}$$
then (9.3) becomes
$$ds^2 = \frac{dr^2}{1 - Kr^2} + r^2(d\theta^2 + \sin^2\theta\,d\phi^2). \tag{9.6}$$

Notice that the 2-sphere with area $4\pi r^2$ has radius $a\psi > r$. Thus within the 3-sphere
the areas of the members of a nested sequence of 2-spheres increase more slowly than
they would in Euclidean space. (Similarly, for concentric small circles on a 2-sphere,
circumference/$2\pi$ increases more slowly than radius.)

(iii) Hyperbolic space If we set K = 0, the line element (9.6) of the 3-sphere
becomes the line-element (9.2) of flat Euclidean space. The line element of the only
other homogeneous, isotropic 3-space is given by (9.6) with K set equal to a negative
number. This space is called hyperbolic space, and is harder to visualize than the
3-sphere. The characteristic property of hyperbolic space is that in it a 2-sphere with
area $4\pi r^2$ has radius
$$R = \int_0^r \frac{dr}{\sqrt{1 + |K|r^2}} = \frac{1}{\sqrt{|K|}}\sinh^{-1}\!\big(r\sqrt{|K|}\big) < r.$$

That is, in this space the areas of a sequence of nested 2-spheres increase faster than
in Euclidean space.
In summary, a spatial section of simultaneous events must form either a 3-sphere,
flat space or hyperbolic space. In each case the line element may be expressed in the
form (9.6) with an appropriate value of K.
We want to use coordinates on these spatial sections such that the coordinates of
each fundamental observer are constant. These are called comoving coordinates.
Since fundamental observers are receding from one another, it follows that our desired
coordinates cannot at all times coincide with those in which the line element takes the
form (9.6). However, if at one time, for example now, the comoving coordinates (r, θ, φ)
are such that the line element is of this form, then at an earlier time, when fundamental
observers were closer to one another, the separation δs between neighbouring observers
was some fraction $a(t)$ of their current separation. Hence at all times the metric of
spacetime can be written
$$ds^2 = -c^2dt^2 + a^2\left[\frac{dr^2}{1 - Kr^2} + r^2(d\theta^2 + \sin^2\theta\,d\phi^2)\right], \tag{9.7}$$
where $K$ is the curvature of the current time-slice $t = t_0$ and $a(t_0) = 1$.


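Using the trick of Box 2 we obtain the equations of motion by applying the Euler-
Lagrange equations to
$$L = -c^2\dot t^2 + a^2\left[\frac{\dot r^2}{1 - Kr^2} + r^2\big(\dot\theta^2 + \sin^2\theta\,\dot\phi^2\big)\right],$$
where a dot denotes $d/d\tau$. Using the convention that a prime denotes $d/dt$ the equations
of motion are
$$\begin{aligned}
0 &= \frac{d}{d\tau}(-c^2\dot t) - aa'\left[\frac{\dot r^2}{1 - Kr^2} + r^2\big(\dot\theta^2 + \sin^2\theta\,\dot\phi^2\big)\right]\\
0 &= \frac{d}{d\tau}\left(\frac{a^2\dot r}{1 - Kr^2}\right) - a^2\left[\frac{\dot r^2Kr}{(1 - Kr^2)^2} + r\big(\dot\theta^2 + \sin^2\theta\,\dot\phi^2\big)\right]\\
0 &= \frac{d}{d\tau}(a^2r^2\dot\theta) - a^2r^2\sin\theta\cos\theta\,\dot\phi^2\\
0 &= \frac{d}{d\tau}\big(a^2r^2\sin^2\theta\,\dot\phi\big).
\end{aligned}\tag{9.8}$$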

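From the first equation we read off the non-vanishing $\Gamma$s with top index $t$:
$$\Gamma^t_{rr} = \frac{aa'}{c^2(1 - Kr^2)}; \qquad \Gamma^t_{\theta\theta} = \frac{aa'r^2}{c^2}; \qquad \Gamma^t_{\phi\phi} = \frac{aa'r^2\sin^2\theta}{c^2}. \tag{9.9a}$$
The equation of motion for $r$ cleans up to
$$0 = \ddot r + \frac{2a'}{a}\dot t\dot r + \frac{Kr}{1 - Kr^2}\dot r^2 - r(1 - Kr^2)(\dot\theta^2 + \sin^2\theta\,\dot\phi^2)$$
from which we read off the non-vanishing $\Gamma$s with top index $r$:
$$\Gamma^r_{tr} = \frac{a'}{a}; \qquad \Gamma^r_{rr} = \frac{Kr}{1 - Kr^2}; \qquad \Gamma^r_{\theta\theta} = -r(1 - Kr^2); \qquad \Gamma^r_{\phi\phi} = -r(1 - Kr^2)\sin^2\theta. \tag{9.9b}$$
The angular equations of motion are
$$0 = \ddot\theta + \frac{2a'}{a}\dot t\dot\theta + \frac{2}{r}\dot r\dot\theta - \sin\theta\cos\theta\,\dot\phi^2; \qquad
0 = \ddot\phi + \frac{2a'}{a}\dot t\dot\phi + \frac{2}{r}\dot r\dot\phi + 2\cot\theta\,\dot\theta\dot\phi$$
so the remaining non-vanishing $\Gamma$s are
$$\begin{aligned}
\Gamma^\theta_{t\theta} &= \frac{a'}{a}; & \Gamma^\theta_{r\theta} &= \frac{1}{r}; & \Gamma^\theta_{\phi\phi} &= -\sin\theta\cos\theta\\
\Gamma^\phi_{t\phi} &= \frac{a'}{a}; & \Gamma^\phi_{r\phi} &= \frac{1}{r}; & \Gamma^\phi_{\theta\phi} &= \cot\theta.
\end{aligned}\tag{9.9c}$$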
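As a cross-check on the symbols read off above, they can be compared with the general formula $\Gamma^a_{bc} = \tfrac12 g^{ad}(\partial_b g_{dc} + \partial_c g_{db} - \partial_d g_{bc})$ evaluated by finite differences. The sketch below is only illustrative: it assumes units with $c = 1$, a curvature $K = 0.5$ and a scale factor $a(t) = t^{2/3}$, none of which are fixed by the text, and the function names are hypothetical.

```python
import math

C_LIGHT = 1.0    # assumption: units in which c = 1
CURVATURE = 0.5  # assumption: an arbitrary positive K for the test


def scale_factor(t):
    """Hypothetical scale factor a(t); any smooth positive function would do."""
    return t ** (2.0 / 3.0)


def metric_diag(x):
    """Diagonal components (g_tt, g_rr, g_theta_theta, g_phi_phi) of the metric (9.7)."""
    t, r, theta, _phi = x
    a2 = scale_factor(t) ** 2
    return [
        -C_LIGHT ** 2,
        a2 / (1.0 - CURVATURE * r * r),
        a2 * r * r,
        a2 * r * r * math.sin(theta) ** 2,
    ]


def d_metric(x, d, h=1e-5):
    """Central-difference derivative of the diagonal components with respect to x^d."""
    up = list(x)
    down = list(x)
    up[d] += h
    down[d] -= h
    return [(u - v) / (2 * h) for u, v in zip(metric_diag(up), metric_diag(down))]


def christoffel(x, a, b, c):
    """Gamma^a_{bc} for a diagonal metric; indices 0..3 stand for t, r, theta, phi."""
    g = metric_diag(x)
    dg = [d_metric(x, d) for d in range(4)]  # dg[d][m] = d g_mm / d x^d
    term = 0.0
    if a == c:
        term += dg[b][a]  # partial_b g_{ac}
    if a == b:
        term += dg[c][a]  # partial_c g_{ab}
    if b == c:
        term -= dg[a][b]  # partial_a g_{bc}
    return 0.5 * term / g[a]
```

For example, `christoffel(x, 1, 0, 1)` should agree with $a'/a$ and `christoffel(x, 3, 2, 3)` with $\cot\theta$ at the chosen point `x = (t, r, theta, phi)`.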

9.3 The Cosmological Redshift


We know that the Universe is expanding because we observe the frequencies of spectral
lines from distant galaxies to be shifted towards lower frequencies. It turns out that
the magnitude of this spectral shift is related in a remarkably simple way to the scale
of the Universe when the light by which we see galaxies set out towards us.
The redshift z is defined by
$$1 + z \equiv \frac{\omega_{\rm emit}}{\omega_{\rm obs}}.$$

If we elevate our status to that of a fundamental observer, and suppose that the atoms
that emit the radiation we receive were stationary with respect to a local fundamental
observer, then $k^0 = \omega_{\rm emit}/c^2$ on emission of a photon and $k^0 = \omega_{\rm obs}/c^2$ on its
observation.[17] The definition (5.14) of the affine parameter $s$ fails when applied to a trajectory
$x^\mu(\lambda)$ of a photon. Instead we define $s$ by requiring that
$$\frac{dx^\mu}{ds} = k^\mu(s), \tag{9.10}$$
where $k^\mu$ is the wavevector $(\omega/c^2, \mathbf{k})$ of the photon. The equation of motion of the
photon is $0 = k^\mu\nabla_\mu k^\nu$. Multiplying this equation through by $ds/dt$, we find for the
time component of the resulting equation
$$\begin{aligned}
0 &= \frac{ds}{dt}\frac{dx^\mu}{ds}\left[\frac{\partial(\omega/c^2)}{\partial x^\mu} + \Gamma^t_{\mu\gamma}k^\gamma\right]\\
&= \frac{d(\omega/c^2)}{dt} + \frac{ds}{dt}\Gamma^t_{\mu\gamma}k^\mu k^\gamma.
\end{aligned}\tag{9.11}$$
[17] See Exercise (22).

We evaluate this for a radially propagating photon. Henceforth using the convention
that $\dot a = da/dt$, (9.9a) states that $\Gamma^t_{rr} = \dot a g_{rr}/(ac^2)$ while (9.10) gives $ds/dt = 1/k^0 =
c^2/\omega$, so (9.11) yields
$$\frac{d\omega}{dt} = -\frac{\dot a}{a}\big(g_{rr}k^rk^r\big)\frac{c^2}{\omega} = -\frac{\dot a}{a}\omega,$$
where we have used the null property of $k^\mu$ in the form $g_{rr}k^rk^r + g_{tt}(\omega/c^2)^2 = 0$.
Integrating we get
$$1 + z = \frac{\omega_{\rm emit}}{\omega_{\rm obs}} = \frac{a(t_{\rm obs})}{a(t_{\rm emit})}.$$
In words, 1 + z gives the factor by which the Universe has expanded since the photons
we receive were emitted. Notice that this result has been obtained without using
Einstein’s equations to determine the dynamics of the Universe.
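The integration step can be illustrated numerically. The sketch below integrates $d\omega/dt = -(\dot a/a)\omega$ with a classical Runge-Kutta scheme and compares the resulting redshift with the ratio of scale factors. The choice $a(t) = t^{2/3}$ is purely an assumption made to have a concrete expansion history; the relation being checked does not depend on it, and all function names are illustrative.

```python
def scale_factor(t):
    """Hypothetical scale factor a(t) = t**(2/3), chosen only for illustration."""
    return t ** (2.0 / 3.0)


def expansion_rate(t):
    """adot/a for the scale factor above."""
    return 2.0 / (3.0 * t)


def propagate_frequency(omega_emit, t_emit, t_obs, steps=10000):
    """Integrate d(omega)/dt = -(adot/a) * omega from t_emit to t_obs with RK4."""
    h = (t_obs - t_emit) / steps
    omega, t = omega_emit, t_emit

    def rhs(time, freq):
        return -expansion_rate(time) * freq

    for _ in range(steps):
        k1 = rhs(t, omega)
        k2 = rhs(t + h / 2, omega + h * k1 / 2)
        k3 = rhs(t + h / 2, omega + h * k2 / 2)
        k4 = rhs(t + h, omega + h * k3)
        omega += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return omega


def redshift_from_integration(t_emit, t_obs):
    """z obtained by following the photon frequency along its path."""
    return 1.0 / propagate_frequency(1.0, t_emit, t_obs) - 1.0


def redshift_from_scale(t_emit, t_obs):
    """z obtained directly from 1 + z = a(t_obs) / a(t_emit)."""
    return scale_factor(t_obs) / scale_factor(t_emit) - 1.0
```

The two functions `redshift_from_integration` and `redshift_from_scale` should agree to within the integration error for any pair of times.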

9.4 Field Equations for Friedmann Cosmologies

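When using equations (9.9) in (6.13) to calculate $R_{\alpha\beta}$, it is helpful to isolate all terms
that involve a $t$ index. One finds
$$R_{it} = R_{ti} = 0, \qquad R_{tt} = \frac{\partial\Gamma^\mu_{t\mu}}{\partial t} + \Gamma^j_{tk}\Gamma^k_{tj} = 3\frac{\ddot a}{a},$$
$$\begin{aligned}
R_{ij} &= \tilde R_{ij} - \frac{\partial\Gamma^t_{ij}}{\partial t} + 2\Gamma^t_{ik}\Gamma^k_{jt} - \Gamma^t_{ij}\Gamma^k_{tk}\\
&= \tilde R_{ij} - \left[\frac{\ddot a}{a} + 2\left(\frac{\dot a}{a}\right)^2\right]\frac{g_{ij}}{c^2},
\end{aligned}$$
where $\tilde R_{ij}$ is the Ricci tensor of the 3-space whose metric is
$$g_{ij} = a^2\,\mathrm{diag}\left(\frac{1}{1 - Kr^2},\ r^2,\ r^2\sin^2\theta\right).$$
Since the 3-space is homogeneous and isotropic, it is obvious that $\tilde R \propto g$. Hence
it is only necessary to calculate one non-zero component of $\tilde R$, say $\tilde R_{rr}$. A tedious
calculation yields
$$\tilde R_{ij} = -\frac{2K}{a^2}g_{ij}. \tag{9.12}$$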

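Hence
$$R_{\alpha\beta} = \begin{pmatrix}
3\ddot a/a & & & \\
 & -f(t)g_{rr}/c^2 & & \\
 & & -f(t)g_{\theta\theta}/c^2 & \\
 & & & -f(t)g_{\phi\phi}/c^2
\end{pmatrix}, \tag{9.13a}$$
where
$$f(t) \equiv \frac{2Kc^2}{a^2} + \frac{\ddot a}{a} + 2\left(\frac{\dot a}{a}\right)^2. \tag{9.13b}$$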
We now turn our attention to the right side of the Einstein equations (6.18). We
take T to be the energy-momentum tensor (7.4) of a fluid that is at rest in the frame
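of the local fundamental observer. With $T$ of the form (7.4), $T^\alpha{}_\alpha = 3P - \rho c^2$. With
our $(t, r, \theta, \phi)$ coordinates, $u^\alpha = (1, 0, 0, 0)$, $u_\alpha = (-c^2, 0, 0, 0)$, and the $tt$-equation of
the set (6.18) reads
$$\frac{3\ddot a}{a} = -\frac{8\pi G}{c^2}\left(\tfrac32P + \tfrac12\rho c^2\right). \tag{9.14a}$$
The $rr$-equation reads
$$-\left[\frac{2Kc^2}{a^2} + \frac{\ddot a}{a} + 2\left(\frac{\dot a}{a}\right)^2\right]\frac{g_{rr}}{c^2} = -\frac{8\pi G}{c^4}\,\tfrac12(\rho c^2 - P)\,g_{rr}. \tag{9.14b}$$
Eliminating $\ddot a$ between these equations yields the cosmic energy equation
$$\dot a^2 + Kc^2 = \tfrac83\pi G\rho a^2. \tag{9.15}$$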
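We also have the equation of mass-energy conservation
$$0 = \nabla_\beta T^{\alpha\beta} = u^\alpha u^\beta\nabla_\beta\rho + \left(g^{\alpha\beta} + \frac{u^\alpha u^\beta}{c^2}\right)\nabla_\beta P + (\rho + P/c^2)\nabla_\beta(u^\alpha u^\beta), \tag{9.16}$$
where we’ve used (5.30). Now
$$\nabla_\beta(u^\alpha u^\beta) = \partial_\beta(u^\alpha u^\beta) + \Gamma^\alpha_{\gamma\beta}u^\gamma u^\beta + \Gamma^\beta_{\gamma\beta}u^\alpha u^\gamma.$$
With $\alpha = t$ this yields $\nabla_\beta(u^tu^\beta) = \Gamma^\beta_{\beta t} = 3\dot a/a$. For $\alpha = t$ the first term in (9.16) is $\dot\rho$
and the second vanishes, so we find
$$0 = \dot\rho + (\rho + P/c^2)\frac{3\dot a}{a} \quad\Rightarrow\quad \frac{d(\rho a^3)}{da} = -\frac{3a^2P}{c^2}. \tag{9.17}$$
There are three possible contributors to the cosmic energy density.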

Rest-mass energy The random motions of galaxies with respect to the cosmic
background radiation are $\lesssim 1000\,\mathrm{km\,s^{-1}} \ll c$, as are the random motions of particles
within galaxies. So in the frame of the local Fundamental Observer, the energy of
such matter is dominated by its rest mass and we may adopt for $T$ the formula (6.20)
for dust, or equivalently (7.4) for a perfect fluid with $P = 0$. Numerically, $\rho_{\rm dust} \gtrsim
10^{-27}\,\mathrm{kg\,m^{-3}} = 5.6\times10^8\,\mathrm{eV\,m^{-3}}$. Given that $P = 0$, equation (9.17) implies $\rho_{\rm dust} \sim
1/a^3$ as we would expect naively.

Relativistic matter At early times the Universe was so hot that its constituent
particles had thermal velocities near c. Moreover, even at the present time photons of
the cosmic background radiation form such a relativistic gas. We know from thermodynamics
that in its rest frame the pressure of a photon gas is one third of its energy
density. Hence, in the frame of a Fundamental Observer the energy-momentum tensor
of such a relativistic gas is given by (7.4) with $\rho c^2 = 3P$ and $\rho = \rho_{\rm rad}$. Eliminating $P$ from
(9.17) we find that $\rho_{\rm rad} \sim 1/a^4$.

Exercise (27):
Recover $\rho_{\rm rad} \sim 1/a^4$ by considering the adiabatic expansion of a gas with ratio of
principal specific heats $\gamma = \tfrac43$.
At the present epoch the energy density contributed by the cosmic background is
$a_{\rm s}(2.7)^4 \simeq 1.9\times10^5\,\mathrm{eV\,m^{-3}}$, which is significantly smaller than the rest-mass energy
density of dust. However, since $\rho_{\rm rad}/\rho_{\rm dust} \sim 1/a$, for $a \lesssim 10^{-4}$ radiation will have been
dominant.
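The two scaling laws make it easy to estimate when radiation and dust contributed equally. The sketch below uses the present-day figures quoted in the text ($5.6\times10^8$ and $1.9\times10^5\,\mathrm{eV\,m^{-3}}$); since the dust figure is only a lower bound, the scale factor it returns should be read as an upper estimate of the epoch of equality. The function names are illustrative.

```python
RHO_DUST_NOW = 5.6e8  # eV m^-3, lower bound on the dust energy density quoted in the text
RHO_RAD_NOW = 1.9e5   # eV m^-3, cosmic background energy density quoted in the text


def rho_dust(a):
    """Dust energy density at scale factor a (a = 1 today), scaling as 1/a^3."""
    return RHO_DUST_NOW / a ** 3


def rho_rad(a):
    """Radiation energy density at scale factor a, scaling as 1/a^4."""
    return RHO_RAD_NOW / a ** 4


def equality_scale():
    """Scale factor at which rho_rad equals rho_dust, from rho_rad/rho_dust ~ 1/a."""
    return RHO_RAD_NOW / RHO_DUST_NOW
```

With these inputs `equality_scale()` is a few times $10^{-4}$; a larger present dust density would move equality to smaller $a$.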

Vacuum energy The vacuum is a complex, non-linear dynamical system: it
carries fields (electro-magnetic, electron, muon, quark, ...) that obey field equations
that are sometimes non-linear and are always coupled to one another by non-linear
terms. According to quantum-field theory, even in the ground state the fields have
non-zero mean-square values by virtue of zero-point fluctuations. When you calculate
the energy-density to which these fluctuations give rise, you obtain the answer infinity.
While this result is not encouraging, it does lead to a valid experimental prediction: if
you calculate the zero-point energy per unit volume in the space between two grounded
capacitor plates, you obtain a (formally infinite) expression that depends on the sep-
aration between the plates, s. The differential of this energy w.r.t. s is finite and
positive. Thus the energy density between the plates rises as the plates move apart.
By conservation of energy, you have to work on the plates to get them apart – the
plates attract one another (the Casimir effect). This prediction has been confirmed
experimentally.
The Casimir effect suggests that differences in zero-point vacuum energy are phys-
ical, even if baseline values are not, and we conjecture that when the energy-density
of the vacuum is for any reason greater than its minimum value, the excess energy is
classically manifest. A vacuum with excess energy is called a ‘false vacuum’. Obvi-
ously a vacuum must be Lorentz invariant, so the energy-momentum tensor of a false
vacuum must be a multiple of the metric tensor. Thus

$$T_{\mu\nu} = -\lambda g_{\mu\nu} \qquad (\lambda\ \text{a constant}). \tag{9.18}$$

In a locally freely-falling frame $g_{\mu\nu} = \eta_{\mu\nu}$, so a positive energy density corresponds
to $\lambda > 0$. It follows that a false vacuum exerts a negative pressure, $P = -\lambda$. When
we plug $P = -\rho c^2 = -\lambda$ into (9.17) we find $\rho$ = constant, so the energy-density of
the vacuum is unchanged by cosmic expansion. A simple physical argument shows
the connection between negative pressure and constant energy-density: Imagine what
happens when we increase by $dV$ the volume of a cylinder containing a false vacuum.
The false vacuum’s mass increases by $\rho_{\rm vac}\,dV$, so its energy increases by $\rho_{\rm vac}c^2\,dV$. The
latter increase must equal the work done on the piston, $-P\,dV$. Thus the pressure of
the false vacuum is $P = -\rho_{\rm vac}c^2$.

Note:
The constant $\lambda$ has units of energy density. In 1917, from a desire to construct a
static universe, Einstein replaced $G_{\mu\nu}$ in the field equations by $G_{\mu\nu} - \Lambda g_{\mu\nu}$. He
called $\Lambda$, which has units of length$^{-2}$, the cosmological constant. It is easy to
see that $\Lambda = 8\pi G\lambda/c^4$.
We now return to equation (9.15) and replace $\rho(t)$ by $\rho(t_0)$ times $a(t)^{-n}$, where
$a(t_0) = 1$ and $n = 3, 4, 0$ for the cases of dust, radiation and vacuum energy, respectively.
We find
$$\dot a^2 = \begin{cases}
\dfrac{8\pi G}{3a}\rho(t_0) - Kc^2 & \text{(dust)}\\[1ex]
\dfrac{8\pi G}{3a^2}\rho(t_0) - Kc^2 & \text{(radiation)}\\[1ex]
\dfrac{8\pi G}{3}a^2\rho(t_0) - Kc^2 & \text{(vacuum).}
\end{cases}\tag{9.19}$$
Currently the Universe is expanding, so ȧ > 0. Equation (9.19) states that, if it
is matter dominated, it will expand for ever if K ≤ 0. But if K > 0 (the case in which
spatial sections are 3-spheres), the expansion will cease when
$$a = \frac{8\pi G\rho(t_0)}{3c^2K} = \frac{1}{(4.2\times10^{10}\ \text{light yr})^2K}\times\frac{\rho(t_0)}{10^{-27}\,\mathrm{kg\,m^{-3}}}.$$
Thus our longevity hangs ultimately on how the radius of curvature of the Universe
compares with some tens of billions of light years.

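Exercise (28):
Integrate (9.19) in the case of dust to show
$$\frac{c\sqrt{|K|}}{a_m}\,t(a) = \begin{cases}
\theta - \tfrac12\sin2\theta & \text{when } K > 0 \quad [\theta \equiv \arcsin(\sqrt{a/a_m})]\\[0.5ex]
\tfrac12\sinh2\theta - \theta & \text{when } K < 0 \quad [\theta \equiv \operatorname{arcsinh}(\sqrt{a/a_m})],
\end{cases}$$
where $a_m \equiv 8\pi G\rho(t_0)/(3c^2|K|)$.
Sketch $a(t)$ in the two cases.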
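Before attempting the integration analytically, the quoted closed forms can be sanity-checked numerically. The sketch below works in the dimensionless variables $x = a/a_m$ and $\tau = c\sqrt{|K|}\,t/a_m$ (a normalization assumed here, with $a_m = 8\pi G\rho(t_0)/(3c^2|K|)$), in which the dust case of (9.19) becomes $d\tau/dx = \sqrt{x/(1 \mp x)}$. The midpoint rule and the function names are illustrative choices.

```python
import math


def tau_numeric(x_end, sign, n=20000):
    """Midpoint-rule integral of sqrt(x / (1 - sign*x)) from 0 to x_end.

    sign = +1 corresponds to K > 0 (requires x_end < 1), sign = -1 to K < 0.
    """
    h = x_end / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        total += math.sqrt(x / (1.0 - sign * x))
    return total * h


def tau_closed(x, sign):
    """Closed forms quoted in Exercise (28), in the same dimensionless units."""
    if sign > 0:
        theta = math.asin(math.sqrt(x))
        return theta - 0.5 * math.sin(2 * theta)
    theta = math.asinh(math.sqrt(x))
    return 0.5 * math.sinh(2 * theta) - theta
```

Comparing `tau_numeric` with `tau_closed` at a few values of `x` is a quick way to catch sign or factor-of-two slips in a hand derivation.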


The special case $K = 0$ divides a doom-laden future from one of ultimate boredom. In
this case the present density is given by
$$\rho_{\rm crit}(t_0) = \left.\frac{3\dot a^2}{8\pi Ga^2}\right|_{t_0}. \tag{9.20}$$
The distance between nearby fundamental observers, $\Delta s \simeq a(t)\Delta r$, increases at a
rate $\dot a\Delta r = (\dot a/a)\Delta s$. Thus $(\dot a/a)$ is the quantity $H$ in Hubble’s relation $v = Hs$. Its
current value lies near $75\,\mathrm{km\,s^{-1}\,Mpc^{-1}}$ in idiotic astronomical units; this translates
to $2.43\times10^{-18}\,\mathrm{s^{-1}}$, so
$$\rho_{\rm crit}(t_0) = 1.06\times10^{-26}\,\mathrm{kg\,m^{-3}}. \tag{9.21}$$
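The unit conversion and the value of $\rho_{\rm crit}$ can be reproduced in a few lines. This is a small sketch; the numerical values of $G$ and of the megaparsec in metres are standard constants supplied here, not taken from the notes, and the function names are illustrative.

```python
import math

G_NEWTON = 6.674e-11     # m^3 kg^-1 s^-2
METRES_PER_MPC = 3.0857e22


def hubble_si(h_km_s_mpc):
    """Convert a Hubble constant from km/s/Mpc to s^-1."""
    return h_km_s_mpc * 1.0e3 / METRES_PER_MPC


def rho_crit(hubble):
    """Critical density 3 H^2 / (8 pi G) in kg m^-3, from (9.20) with H = adot/a."""
    return 3.0 * hubble ** 2 / (8.0 * math.pi * G_NEWTON)


H0 = hubble_si(75.0)
print(f"H0 = {H0:.3e} s^-1, rho_crit = {rho_crit(H0):.3e} kg m^-3")
```

With $H = 75\,\mathrm{km\,s^{-1}\,Mpc^{-1}}$ this reproduces the values $2.43\times10^{-18}\,\mathrm{s^{-1}}$ and $1.06\times10^{-26}\,\mathrm{kg\,m^{-3}}$ quoted above to the stated precision.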

The best observational evidence suggests that the actual density of matter is a factor
of several lower than this: unless vacuum energy is significant, the future is more likely
to be boring than otherwise. Note that if ρ ≤ ρcrit , the Universe is spatially infinite
and contains infinite mass, while if ρ > ρcrit the total mass is finite.

Exercises (29):
(i) Show for a dust-dominated universe with $K = 0$ that $a = (t/t_0)^{2/3}$. Hence
estimate the age of the Universe if $\rho(t_0) = \rho_{\rm crit}(t_0)$.
(ii) Show for a radiation-dominated universe with $K = 0$ that $a = \sqrt{t/t_0}$.
(iii) Show that in Newton’s theory the radial coordinate $a(t)$ of a particle embedded in
a homogeneous spherical cloud of mutually gravitating particles which are initially
receding from the origin with speeds proportional to radius, obeys (9.15). Identify
the analogue of $K$ in this case.

9.5 Inflation

When a thermodynamic system is rapidly expanded and therefore adiabatically cooled,
it is liable to ‘supercool’ when it encounters the temperature at which a phase transition
would occur if it were slowly cooled. A classic example of this phenomenon is water
vapour in a Wilson cloud chamber: a sudden expansion supercools the vapour just
before debris from a collision flies through, and water droplets rapidly condense along
the tracks of the debris.
Since the vacuum is a complex, non-linear dynamical system, it is expected to
exhibit phase transitions. In 1981 Alan Guth of M.I.T. pointed out[18] that supercooling
at the temperature of a transition could have caused the vacuum to stumble temporarily
into a false vacuum. Then the cosmic scale factor would obey the third option in
equation (9.19) and we have
$$\ddot a = \frac{8\pi G\lambda}{3c^2}a \quad\Rightarrow\quad a(t) = a(0)\exp\left(\sqrt{\frac{8\pi G\lambda}{3c^2}}\;t\right). \tag{9.22}$$
Grand unified theories of the strong, weak and electromagnetic force suggest that the
time constant associated with this exponential growth is $\approx 10^{-34}\,\mathrm{s}$.

Exercise (30):
Let the present age of the Universe be $t_H$ and the distance over the current time-
slice $t = t_H$ to the most distant fundamental observer it is in principle possible to
see be $D_H$. Show that if the Universe had inflated from $t = 0$ to the present day
we would have $D_H = ct_H$, while we would have $D_H = 2ct_H$ if the Universe had
been always flat and radiation-dominated. The furthest fundamental observer we
can see is said to be on the particle horizon. [Hint: use $0 = g_{rr}\,dr^2 + g_{tt}\,dt^2$.]
Guth’s inflationary conjecture has two very seductive properties:

[18] Phys. Rev., D23, 347.



(i) It offers an explanation of why the Universe is so homogeneous on a large scale by
suggesting that everything we see may have emerged from the explosive expansion
of a single causally-connected fluctuation in the preinflationary Universe.
(ii) It offers an explanation of why $\rho(t_0)/\rho_{\rm crit}(t_0) \simeq 1$: with the definition (9.20) of
$\rho_{\rm crit}$ the cosmic energy equation (9.15) can be written
$$\frac{\rho(t)}{\rho_{\rm crit}(t)} = 1 + \frac{Kc^2}{\dot a^2}. \tag{9.23}$$
Whatever the initial value of $K$, after a sufficient number of e-folding times $\dot a$
becomes enormous and the deviation of each side of (9.23) from unity becomes
extremely small.
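The size of this effect is easy to quantify. During exponential growth $\dot a \propto a$, so the term $Kc^2/\dot a^2$ in (9.23) falls by a factor $e^{-2}$ per e-folding time. The sketch below simply evaluates that factor; the function name and the example of 60 e-foldings are illustrative assumptions, not figures from the notes.

```python
import math


def flatness_deviation(initial_deviation, n_efolds):
    """|rho/rho_crit - 1| after n_efolds of exponential growth.

    Uses the fact that the deviation is proportional to 1/adot^2 and that
    adot grows by a factor e per e-folding time when a(t) is exponential.
    """
    return initial_deviation * math.exp(-2.0 * n_efolds)


# Example: a deviation of order unity before inflation, followed by 60 e-foldings.
print(flatness_deviation(1.0, 60))
```

Even an order-unity initial deviation is driven far below any measurable level after a few tens of e-folding times.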
The inflationary period is supposed to have ended when the vacuum finally made the
phase transition into the lower-energy configuration, releasing its former energy density
as normal thermal radiation.
Extraordinarily, several astronomical phenomena are easier to explain if we live
in a universe that is now mildly dominated by vacuum energy-density.[19] If vacuum
energy-density is indeed significant now, it will soon become dominant and we must
be at the start of a new inflationary episode. This proposition is harder to believe
than that the long chain of astronomical inference upon which it rests is somewhere
defective.

9.6 Cosmic Strings


It is thought that when the vacuum changed its phase from a symmetric high-
temperature form to a less symmetrical low-temperature form, discontinuities may
have arisen that would have persisted to the present day. The general idea is illus-
trated by what happens when a lump of iron cools in zero magnetic field through the
Curie temperature Tc (at which iron becomes ferromagnetic). At Tc groups of atoms
here and there in the lump decide to align their spins in some common direction. Since
the direction is chosen at random, widely separated groups choose different directions.
So long as the groups remain isolated they can all grow by convincing adjacent un-
committed atoms to align with them. But eventually the swelling groups touch each
other – the lump has become a mass of interlocking domains. Between the domains
are regions of high B and therefore of large magnetic energy. So it is energetically
desirable for each domain boundary to shrink. But usually the boundary around one
domain can shrink only if the boundaries of adjacent domains grow. So the domains
are effectively locked into place.
When the Universe cools, two-dimensional domain boundaries may form, but the
most important discontinuities are one-dimensional – strings. The complex field $\psi$
associated with charged particles such as electrons can give rise to a string like this.[20]
Imagine that it is decided that the field shall everywhere have amplitude $|\psi| = 1$ and
you are told to specify its phase $0 \le \arg(\psi) \le 2\pi$ throughout space. You decide to
set $\arg[\psi(\mathbf{x})] = \phi(\mathbf{x})$, where $\phi$ is the usual cylindrical-polar coordinate of the point
$\mathbf{x}$. This assignment works fine everywhere except at your coordinate origin, $r = 0$.
Here $\nabla\arg(\psi)$ diverges since any phase can be reached arbitrarily close to $r = 0$. It
is not hard to persuade oneself that by adjusting the values of $\psi$ in any finite volume
you can move but not eliminate this singularity, which is associated with a line of
energy-momentum. This is a cosmic string.
[19] e.g., Efstathiou et al., Mon. Not. R. Astr. Soc., 303, L47.
[20] The treatment here is a little oversimplified inasmuch as it neglects the fact that for electrons $\psi$
is a Dirac spinor rather than a scalar.
What does the energy-momentum tensor $T$ look like in the narrow tube around
$r = 0$ in which $T \neq 0$? We’d expect $T$ to be Lorentz invariant with respect to boosts
parallel to the string’s line. So in the $(t, z)$ plane $T$ has to be proportional to the
Minkowski metric. Also it’s hard to see how the string could be carrying anything in
the $x$ or $y$ directions. So
$$T_{\mu\nu} = -\rho c^2\begin{pmatrix}
-c^2 & 0 & 0 & 0\\
0 & 0 & 0 & 0\\
0 & 0 & 0 & 0\\
0 & 0 & 0 & 1
\end{pmatrix}, \tag{9.24}$$
where $\rho$ is a constant.
Now consider the line element

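$$ds^2 = -c^2dt^2 + r_0^2(d\theta^2 + \sin^2\theta\,d\phi^2) + dz^2, \tag{9.25}$$
where $r_0$ is a constant. This is almost the line element $ds^2 = -c^2dt^2 + dr^2 + r^2d\phi^2 + dz^2$
of flat spacetime in cylindrical polars; $r_0\theta$ is a kind of radial variable. The only non-zero
Christoffel symbols generated by (9.25) are
$$\Gamma^\theta_{\phi\phi} = -\tfrac12\sin2\theta; \qquad \Gamma^\phi_{\phi\theta} = \Gamma^\phi_{\theta\phi} = \cot\theta.$$
The only non-zero components of the Ricci tensor are
$$R_\theta{}^\theta = R_\phi{}^\phi = -r_0^{-2}.$$
Thus $R = -2r_0^{-2}$ and the Einstein equations (6.17) read
$$R_\alpha{}^\beta - \tfrac12\delta_\alpha^\beta R = \begin{pmatrix}
r_0^{-2} & 0 & 0 & 0\\
0 & 0 & 0 & 0\\
0 & 0 & 0 & 0\\
0 & 0 & 0 & r_0^{-2}
\end{pmatrix} = -\frac{8\pi G}{c^4}T_\alpha{}^\beta
= \frac{8\pi G\rho}{c^2}\begin{pmatrix}
1 & 0 & 0 & 0\\
0 & 0 & 0 & 0\\
0 & 0 & 0 & 0\\
0 & 0 & 0 & 1
\end{pmatrix}. \tag{9.26}$$
Hence with $\rho > 0$ (which corresponds to a positive energy density and tension in the
string) the metric (9.25) solves Einstein’s equations inside the string, provided that
$r_0^{-2} = 8\pi G\rho/c^2$.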

What we really need is the metric outside the string, where we live. Let the outer
surface of the string be $\theta = \theta_m$. Then the exterior metric is
$$ds^2 = -c^2dt^2 + r_0^2\left(\frac{\cos^2\theta}{\cos^2\theta_m}\,d\theta^2 + \sin^2\theta\,d\phi^2\right) + dz^2. \tag{9.27}$$
This metric obviously joins smoothly to the interior metric (9.25) on $\theta = \theta_m$. To show
that it is a vacuum solution of Einstein’s equations, we transform to a new coordinate
set $(t, r', \phi', z)$, where the $t$ and $z$ coordinates are the old ones and
$$r' \equiv r_0\frac{\sin\theta}{\cos\theta_m}; \qquad \phi' \equiv \cos(\theta_m)\,\phi. \tag{9.28}$$
The metric (9.27) now becomes
$$ds^2 = -c^2dt^2 + dr'^2 + r'^2d\phi'^2 + dz^2, \tag{9.29}$$
which is just the cylindrical-polar metric of flat spacetime. But on a large scale the
spacetime outside the string is very odd because the range of $\phi'$ is $(0, 2\pi\cos\theta_m)$. [This
follows from (9.28) and the fact that $\phi$ is in $(0, 2\pi)$.] Consider for example a large circle
$r' = a \gg r_0$. The radius of this circle is
$$R = \int_0^a\sqrt{g_{r'r'}}\,dr' \simeq a, \tag{9.30a}$$
while its circumference is
$$C = \int\sqrt{g_{\phi'\phi'}}\,d\phi' = 2\pi a\cos(\theta_m). \tag{9.30b}$$

So the usual flat-space relation C = 2πR does not apply. Thinking about a cone may
help to clarify this strange state of affairs. At each point a cone is flat in the sense
that it can be made out of a piece of paper without stretching the paper (you can’t
make a paper sphere as easily), but circles distance a from the cone’s apex have a
circumference smaller than 2πa.
How could we detect a cosmic string? Our best bet is to look for lines of grav-
itationally lensed objects. To understand how a string lenses an object, think of the
exterior space as a piece of paper with a wedge of angle
$$\theta_{\rm def} \equiv 2\pi(1 - \cos\theta_m) \tag{9.31}$$
cut out and corresponding points along the cuts identified. Place the object to be
lensed at radius $r' = a_q$ on the cut and yourself directly opposite at $r' = a_o$.

Rays travel over the paper in straight lines, so you can see the object along two
lines of sight separated by $2\alpha_s$, where
$$\frac{\sin(\pi - \tfrac12\theta_{\rm def})}{\sqrt{a_o^2 + a_q^2 - 2a_oa_q\cos(\pi - \tfrac12\theta_{\rm def})}} = \frac{\sin\alpha_s}{a_q}.$$
The largest possible value of $\alpha_s$ is clearly $\tfrac12\theta_{\rm def}$. It should be possible to detect a
cosmic string by looking for a line in the sky either side of which lie members of pairs
of similar objects.
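The lensing geometry can be explored with a short calculation. The sketch below treats the string, the observer and one image of the object as a flat triangle whose angle at the string is $\pi - \tfrac12\theta_{\rm def}$, takes the observer-object distance from the law of cosines, $d^2 = a_o^2 + a_q^2 - 2a_oa_q\cos(\pi - \tfrac12\theta_{\rm def})$, and then applies the law of sines. It assumes $\theta_{\rm def} < \pi$ so that $\alpha_s$ is acute; the function names are illustrative.

```python
import math


def deficit_angle(theta_m):
    """Deficit angle (9.31) for a string whose surface lies at theta = theta_m."""
    return 2.0 * math.pi * (1.0 - math.cos(theta_m))


def half_separation(theta_def, a_o, a_q):
    """Angle alpha_s between the direction to the string and to one image.

    a_o and a_q are the distances of the observer and the object from the string.
    Assumes theta_def < pi, so that the angle at the observer is acute.
    """
    apex = math.pi - 0.5 * theta_def
    distance = math.sqrt(a_o ** 2 + a_q ** 2 - 2.0 * a_o * a_q * math.cos(apex))
    return math.asin(a_q * math.sin(apex) / distance)
```

For a small deficit angle this reduces to $\alpha_s \simeq \tfrac12\theta_{\rm def}\,a_q/(a_o + a_q)$, and for a very distant object $\alpha_s$ approaches its maximum $\tfrac12\theta_{\rm def}$.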
The mass per unit length $\mu$ of the string would follow immediately from $\theta_{\rm def}$: from
the interior metric (9.25) it follows that the string’s cross-sectional area is
$$A = \int_0^{\theta_m}r_0\,d\theta\int_0^{2\pi}r_0\sin\theta\,d\phi = 2\pi r_0^2(1 - \cos\theta_m).$$
Hence using (9.26) we have that the string’s mass per unit length is $\mu = \rho A = c^2(1 -
\cos\theta_m)/(4G) = c^2\theta_{\rm def}/(8\pi G)$ independently of the string’s physical width $r_0$. There
won’t be room outside the string for the Universe as we know it unless $\mu < \tfrac14c^2/G =
3.37\times10^{26}\,\mathrm{kg\,m^{-1}}$. Particle theorists think strings may exist with line densities of
order a thousandth of this.
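The relation between line density and deficit angle is simple enough to evaluate directly. The following sketch computes $\mu = c^2\theta_{\rm def}/(8\pi G)$ and its inverse; the SI values of $c$ and $G$ are standard constants supplied here, and the function names are illustrative.

```python
import math

G_NEWTON = 6.674e-11  # m^3 kg^-1 s^-2
C_LIGHT = 2.998e8     # m s^-1


def mass_per_length(theta_def):
    """String mass per unit length (kg/m) for a given deficit angle in radians."""
    return C_LIGHT ** 2 * theta_def / (8.0 * math.pi * G_NEWTON)


def deficit_from_mass(mu):
    """Deficit angle in radians produced by a string of line density mu (kg/m)."""
    return 8.0 * math.pi * G_NEWTON * mu / C_LIGHT ** 2


MU_MAX = mass_per_length(2.0 * math.pi)  # deficit angle of a full turn
print(f"mu_max = {MU_MAX:.3e} kg/m")
print(f"deficit for mu_max/1000: {deficit_from_mass(MU_MAX / 1000.0):.3e} rad")
```

A deficit angle of $2\pi$ gives the limiting value $c^2/(4G)$ quoted above, and a string a thousand times lighter produces a deficit of $2\pi/1000$ radians.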

9.7 Summary

The cosmic microwave background defines a natural coordinate system for cosmology.
On large scales the Universe appears to be strikingly homogeneous and isotropic. This
implies that equal-time hypersurfaces must have the geometry of either (i) the 3-sphere,
(ii) flat space, or (iii) hyperbolic space according as the mean cosmic density $\rho$ is greater
than, equal to, or less than $\rho_{\rm crit} \simeq 10^{-26}\,\mathrm{kg\,m^{-3}}$. It is widely believed that $\rho = \rho_{\rm crit}$
although measurements suggest a smaller value.
The cosmic scale when the light we detect from a distant object was emitted
can be deduced from the redshift $z$ of the object’s spectrum: $1 + z = \omega_{\rm emit}/\omega_{\rm obs} =
a(t_{\rm obs})/a(t_{\rm emit})$. The most distant objects are seen at an epoch when $a$ was smaller
than now by more than a factor 5.
The expansion of the Universe will cease only if $\rho > \rho_{\rm crit}$. At early times we
always have $\rho \simeq \rho_{\rm crit}$ and the cosmic scale grows as $a \propto t^{2/3}$. If the wild speculations
of high-energy physicists are to be believed, very early on there may have been an
inflationary phase in which $a \propto e^{\gamma t}$ and the entire observable Universe grew out of a
single quantum fluctuation. If the calibrations of astronomers are to be believed, about
two thirds of the energy density in the Universe is currently contributed by vacuum
energy, and the Universe is just starting on a second inflationary episode.

A Appendices

A Matrix Manipulation

Many calculations in relativity are best performed by matrix multiplication. Conventionally
the first index $i$ on a matrix $A_{ij}$ labels a row and the second, $j$, a column.
Then we form the product $A\cdot B$ by summing over adjacent indices:
$$(A\cdot B)_{ik} = \sum_jA_{ij}B_{jk}.$$
Thus to evaluate $A^\lambda{}_\nu \equiv g_{\mu\nu}B^{\lambda\mu}$ we first rearrange to ensure that we are summing over
adjacent indices:
$$A^\lambda{}_\nu \equiv g_{\mu\nu}B^{\lambda\mu} = B^{\lambda\mu}g_{\mu\nu} = (B\cdot g)^\lambda{}_\nu.$$
We may have to transpose a tensor to do this:
$$A_\nu{}^\lambda \equiv B_{\mu\nu}C^{\mu\lambda} = (B^T)_{\nu\mu}C^{\mu\lambda} = (B^T\cdot C)_\nu{}^\lambda.$$
In particular:
(i) to raise/lower first index, premultiply by $g$ – in special relativity this just changes
the sign of the top row;
(ii) to raise/lower second index, postmultiply by $g$ – in special relativity this just
changes the sign of the left column.
Doubly contracted 2nd rank tensors are just the trace of a product matrix:
$$A^{\mu\nu}B_{\mu\nu} = \sum_\mu(A\cdot B^T)^\mu{}_\mu = \mathrm{trace}(A\cdot B^T).$$
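These rules translate directly into code. The sketch below uses plain nested lists and, as an assumption made only for the illustration, the special-relativistic metric in units with $c = 1$, $\eta = \mathrm{diag}(-1, 1, 1, 1)$; the helper names are hypothetical.

```python
def matmul(a, b):
    """Matrix product: sum over the adjacent (inner) indices."""
    return [
        [sum(a[i][j] * b[j][k] for j in range(len(b))) for k in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a):
    """Swap the row and column index."""
    return [list(row) for row in zip(*a)]


def trace(a):
    """Sum of the diagonal elements."""
    return sum(a[i][i] for i in range(len(a)))


# Assumption: special relativity with c = 1, so the metric is diag(-1, 1, 1, 1).
ETA = [[-1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]


def lower_first(b):
    """Lower (or raise) the first index: premultiply by the metric."""
    return matmul(ETA, b)


def lower_second(b):
    """Lower (or raise) the second index: postmultiply by the metric."""
    return matmul(b, ETA)


def double_contract(a_up, b_down):
    """A^{mu nu} B_{mu nu} evaluated as trace(A . B^T)."""
    return trace(matmul(a_up, transpose(b_down)))
```

Applying `lower_first` to any 4×4 array flips the sign of its top row, and `lower_second` flips the sign of its left column, as stated in items (i) and (ii).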

The epsilon symbol In special relativity we define the Levi-Civita symbol by
$$\epsilon_{\alpha\beta\gamma\delta} = \begin{cases}
0 & \text{if any two indices equal}\\
+1 & \text{if } \alpha, \beta, \gamma, \delta \text{ is an even permutation of } 0,1,2,3\\
-1 & \text{if } \alpha, \beta, \gamma, \delta \text{ is an odd permutation of } 0,1,2,3
\end{cases}\qquad\text{(special rel. only).}$$
It’s easy to see that on raising each index with an $\eta$ we get the same pattern for the
up symbol.
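The $\epsilon$ symbols are useful for taking determinants:
$$\epsilon_{\alpha\beta\gamma\delta}|A| = \epsilon^{\kappa\lambda\mu\nu}A_{\kappa\alpha}A_{\lambda\beta}A_{\mu\gamma}A_{\nu\delta}$$
$$\epsilon^{\alpha\beta\gamma\delta}|B| = \epsilon_{\kappa\lambda\mu\nu}B^{\kappa\alpha}B^{\lambda\beta}B^{\mu\gamma}B^{\nu\delta}$$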

etc.
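The determinant formula can be illustrated by summing over permutations, with the sign of each term given by the parity of the permutation. Note that with four indices a cyclic shift such as $(1,2,3,0)$ is an odd permutation, so it is parity rather than cyclicity that fixes the sign. The sketch below treats $\epsilon$ purely as a numerical array, ignoring index position; the function names are illustrative.

```python
from itertools import permutations


def parity(perm):
    """Return +1 for an even permutation of 0..n-1 and -1 for an odd one."""
    p = list(perm)
    sign = 1
    for i in range(len(p)):
        # Swap elements into place; each swap flips the sign.
        while p[i] != i:
            j = p[i]
            p[i], p[j] = p[j], p[i]
            sign = -sign
    return sign


def det_via_epsilon(a):
    """Determinant as the sum over kappa..nu of eps[kappa..nu] * A[kappa][0] * A[lambda][1] * ..."""
    n = len(a)
    total = 0
    for perm in permutations(range(n)):
        term = parity(perm)
        for column, row in enumerate(perm):
            term *= a[row][column]
        total += term
    return total
```

Only the $4! = 24$ index combinations with all indices distinct contribute, which is why the sum runs over permutations rather than over all $4^4$ index values.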
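Transforming to a curvilinear coordinate system we find
$$\epsilon'^{\alpha\beta\gamma\delta} = \frac{\partial x'^\alpha}{\partial x^\kappa}\frac{\partial x'^\beta}{\partial x^\lambda}\frac{\partial x'^\gamma}{\partial x^\mu}\frac{\partial x'^\delta}{\partial x^\nu}\,\epsilon^{\kappa\lambda\mu\nu}
= \frac{\partial(x')}{\partial(x)}\,\epsilon^{\alpha\beta\gamma\delta}. \tag{A.1}$$
But
$$\big|g'^{\alpha\beta}\big| = \left|\frac{\partial x'^\alpha}{\partial x^\kappa}\frac{\partial x'^\beta}{\partial x^\lambda}\eta^{\kappa\lambda}\right| = -\left(\frac{\partial(x')}{\partial(x)}\right)^2.$$
So we can write (A.1) as
$$\epsilon'^{\alpha\beta\gamma\delta} = \sqrt{-|g^{\mu\nu}|}\;\epsilon^{\alpha\beta\gamma\delta}. \tag{A.2}$$
Similarly, $\epsilon'_{0123} = \sqrt{-|g_{\mu\nu}|} = 1/\sqrt{-|g^{\mu\nu}|}$. Hence in g.r. the $\epsilon$ symbols are not made
up of nought and one, but of nought and $\sqrt{-|g^{\mu\nu}|}$. Also in g.r. the up and down forms
of $\epsilon$ are distinct.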
In general ² has two jobs: (i) it extracts the totally antisymmetric parts of tensors;
(ii) it maps one-to-one totally antisymmetric $n$th rank tensors into totally antisymmetric
tensors of rank $(4 - n)$. The correspondence between $F$ and its dual is an example of this map at
work.

B Derivation of $R_\alpha{}^\beta{}_{;\beta} = \tfrac12R_{;\alpha}$

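We can calculate $R_\alpha{}^\beta{}_{;\beta}$ at a point $X$ most cheaply as follows. We adopt a locally freely
falling coordinate system at $X$. In this system there are no pseudo-forces at $X$, so $\Gamma$
(but not its derivatives) vanishes there. Consequently, at $X$ covariant derivatives are
equivalent to partial derivatives, and we obtain from (6.13a,b) and (5.20)
$$\begin{aligned}
R_\alpha{}^\beta{}_{;\beta} &= \frac{\partial}{\partial x^\beta}\left[g^{\beta\gamma}\left(\frac{\partial\Gamma^\mu_{\alpha\mu}}{\partial x^\gamma} - \frac{\partial\Gamma^\mu_{\alpha\gamma}}{\partial x^\mu}\right)\right]\\
&= \tfrac12\frac{\partial}{\partial x^\beta}\left[g^{\beta\gamma}\frac{\partial}{\partial x^\gamma}\left(g^{\mu\nu}\frac{\partial g_{\mu\nu}}{\partial x^\alpha}\right)\right]\\
&\quad - \tfrac12\frac{\partial}{\partial x^\beta}\left\{g^{\beta\gamma}\frac{\partial}{\partial x^\mu}\left[g^{\mu\nu}\left(\frac{\partial g_{\alpha\nu}}{\partial x^\gamma} + \frac{\partial g_{\nu\gamma}}{\partial x^\alpha} - \frac{\partial g_{\gamma\alpha}}{\partial x^\nu}\right)\right]\right\}.
\end{aligned}\tag{B.3}$$
Since the covariant derivative of $g$ always vanishes, and $\Gamma = 0$ at $X$, all first derivatives
of $g$ must vanish at $X$. Dropping from $R_\alpha{}^\beta{}_{;\beta}$ all terms which contain a first derivative
of $g$, we find
$$R_\alpha{}^\beta{}_{;\beta} = \tfrac12g^{\beta\gamma}g^{\mu\nu}\left(\frac{\partial^3g_{\mu\nu}}{\partial x^\alpha\partial x^\beta\partial x^\gamma} - \frac{\partial^3g_{\alpha\nu}}{\partial x^\beta\partial x^\gamma\partial x^\mu} - \frac{\partial^3g_{\nu\gamma}}{\partial x^\alpha\partial x^\beta\partial x^\mu} + \frac{\partial^3g_{\gamma\alpha}}{\partial x^\beta\partial x^\mu\partial x^\nu}\right).$$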

The second and fourth terms cancel because in each case two of the partial derivatives
are contracted together and the third is contracted with an index of the component of
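$g$ being differentiated. Hence
$$R_\alpha{}^\beta{}_{;\beta} = \tfrac12\frac{\partial}{\partial x^\alpha}\left[g^{\beta\gamma}g^{\mu\nu}\left(\frac{\partial^2g_{\mu\nu}}{\partial x^\beta\partial x^\gamma} - \frac{\partial^2g_{\nu\gamma}}{\partial x^\beta\partial x^\mu}\right)\right]. \tag{B.4}$$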

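From the definition (6.15) of the Ricci scalar $R$ we have in our special coordinate
system
$$\begin{aligned}
R &= g^{\beta\gamma}\left(\frac{\partial\Gamma^\mu_{\beta\mu}}{\partial x^\gamma} - \frac{\partial\Gamma^\mu_{\beta\gamma}}{\partial x^\mu}\right)\\
&= \tfrac12g^{\beta\gamma}g^{\mu\nu}\left(\frac{\partial^2g_{\mu\nu}}{\partial x^\gamma\partial x^\beta} - \frac{\partial^2g_{\nu\beta}}{\partial x^\gamma\partial x^\mu} - \frac{\partial^2g_{\nu\gamma}}{\partial x^\mu\partial x^\beta} + \frac{\partial^2g_{\gamma\beta}}{\partial x^\mu\partial x^\nu}\right).
\end{aligned}\tag{B.5}$$
The first and fourth terms are equal, as are the second and third. Comparing this
expression with (B.4) we obtain the desired relation.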
