Classical Fields
Part I:
Relativistic Covariance
Prof. J.J. Binney
Oxford University
Michaelmas Term 2005
Vacation work: Study §1 Relativistic Covariance and work the eight embedded
Exercises
1 Relativistic Covariance
Observers who move relative to one another do not always agree about the values of
quantities, such as speed, mass, energy etc, associated with the same physical system.
The special theory of relativity tells us how we may predict the values measured by
any observer once we know the values assigned by one particular observer, for example
ourselves.
Special relativity teaches us to think of experience as being made up of ‘events’,
each with a definite location in the four-dimensional continuum of spacetime. Any
given observer assigns to each event a unique 4-tuple of numbers (t, x, y, z). Of course
he can do this in many, many ways. But special relativity claims that there are certain
specially favoured systems for assigning coordinates to events, the so-called inertial
coordinate systems. O chooses one inertial system and another observer, O′, sets up
a different one. But according to special relativity the coordinates (t′, x′, y′, z′) that O′
assigns to any event can be related to O’s coordinates (t, x, y, z) of the same event by
\[
\begin{pmatrix} ct' \\ x' \\ y' \\ z' \end{pmatrix}
= \begin{pmatrix} ct_0 \\ x_0 \\ y_0 \\ z_0 \end{pmatrix}
+ \mathbf{L} \cdot \begin{pmatrix} ct \\ x \\ y \\ z \end{pmatrix}, \tag{1.1}
\]
where c is the speed of light and $(t_0, x_0, y_0, z_0)$ is a set of numbers characteristic of the
two observers, as is the 4 × 4 matrix L.
Clearly, $(t_0, x_0, y_0, z_0)$ are the coordinates O′ assigns to the event that marks the
origin of O’s coordinates. For simplicity we shall assume that $(t_0, x_0, y_0, z_0) = 0$. In
general L can be represented as the product of matrices generating a rotation, a boost
parallel to a coordinate direction and a second rotation: $\mathbf{L} = \mathbf{R}' \cdot \mathbf{L}_0 \cdot \mathbf{R}$, where R rotates
the coordinate axes so as to align the boost direction with a coordinate direction, $\mathbf{L}_0$
effects the boost along the given axis and $\mathbf{R}'$ rotates the coordinates to any desired
final orientation. If R is chosen such that the x-axis becomes the boost direction, $\mathbf{L}_0$
has the form
\[
\mathbf{L}_0 = \begin{pmatrix} \gamma & -\beta\gamma & 0 & 0 \\ -\beta\gamma & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\quad\text{where}\quad \beta \equiv v/c, \quad \gamma \equiv 1/\sqrt{1-\beta^2}. \tag{1.2}
\]
\[
\begin{aligned}
t' &= \gamma t - \gamma v x/c^2 \\
x' &= \gamma x - \gamma v t \\
y' &= y \\
z' &= z
\end{aligned} \tag{1.3}
\]
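As a minimal numerical sketch of (1.3), the following applies the boost to one event and compares the quantity −c²t² + x² + y² + z² before and after. The function names `boost_x` and `interval` and the sample event are illustrative choices, not part of the notes.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def boost_x(t, x, y, z, v):
    """Apply the boost (1.3) for relative speed v along the x-axis."""
    gamma = 1.0 / math.sqrt(1.0 - (v / C) ** 2)
    t_prime = gamma * t - gamma * v * x / C**2
    x_prime = gamma * x - gamma * v * t
    return t_prime, x_prime, y, z


def interval(t, x, y, z):
    """Return -c^2 t^2 + x^2 + y^2 + z^2 for an event."""
    return -(C * t) ** 2 + x**2 + y**2 + z**2


event = (1.0e-6, 150.0, 20.0, -5.0)    # (t, x, y, z) in seconds and metres
boosted = boost_x(*event, v=0.6 * C)   # gamma = 1.25 for v = 0.6 c

print(interval(*event), interval(*boosted))  # the two values should agree
```

The two printed numbers agree to rounding error, which is the invariance that the index notation of the following pages is designed to make manifest.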
\[ x_\mu x^\mu = -c^2t^2 + x^2 + y^2 + z^2. \]
Notice that here as everywhere else, we are summing over one up and one down index.
In order to stick rigidly to this rule, we define
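\[
\eta^{\mu\nu} \equiv \eta_{\mu\nu} \equiv \begin{pmatrix} -1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}. \tag{1.8}
\]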
Note:
We have $\eta^{\mu\gamma}\eta_{\gamma\nu} = \delta^\mu_\nu$, or in matrix form η · η = I, where I and $\delta^\mu_\nu$ are two ways
of writing the 4 × 4 identity matrix. Also $\eta^{\mu\nu} = \eta^{\mu\gamma}\delta_\gamma^\nu$, so in a sense η is merely
the up-up and down-down forms of the identity matrix.
From $x_\mu$ we can recover $x^\mu$:
\[ x^\mu = \eta^{\mu\nu} x_\nu. \tag{1.9} \]
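\[ \Lambda_\mu{}^\lambda \equiv \eta_{\mu\nu}\,\Lambda^\nu{}_\kappa\,\eta^{\kappa\lambda}. \tag{1.11} \]
\[ \Lambda_\mu{}^\kappa\,\Lambda^\mu{}_\nu = \delta^\kappa_\nu, \tag{1.12} \]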
Exercise (1):
Obtain (1.12) from the requirement that for any two vectors x, y, we have $x'_\mu y'^\mu =
x_\mu y^\mu$.
Vectors with their indices below are called covariant ($x_\mu$). Vectors with indices
above are called contravariant ($x^\mu$). I shall call them down and up vectors. The
operation of setting two indices equal and summing from 0 to 3 is called
contraction.
In a contraction one index must be up and one down. Quantities like $\sum_\mu x^\mu x^\mu$ have
nothing to do with physics. An important motivation for writing $x^\mu$ rather than x
is to distinguish the up from the down form of x. Often an expression is equally
valid for up or down vectors provided the basic rules are obeyed, and then it is neater
to use conventional vector notation than to stick in indices. For example, if a and
b are vectors and M is a matrix, we can interpret a = M · b as $a^\mu = M^\mu{}_\nu b^\nu$, as
$a_\mu = M_{\mu\nu} b^\nu$, or in yet other ways. But if you ever express a 4-vector in component
form, you must come clean and say whether you’re giving the up or the down vector,
as in $x^\mu = (ct, x, y, z)$.
On some things all observers agree, for example the charge and total spin of an
electron. These quantities are called 4-scalars or relativistic invariants. The length
of a 4-vector is a 4-scalar.
Notes:
(i) The Lorentz transformation matrix Λ is dimensionless, so ω has to be divided by c
to give the same dimensions as k before being put into the last place of a 4-vector
with k.
1.3 6-tuples (antisymmetric 2nd rank tensors) 5
(ii) Vectors written in italic boldface (k) are 3-vectors, while those written in Roman
boldface (k) are 4-vectors.
If we define $k^0 \equiv \omega/c$, then
\[ \mathbf{k}' = \mathbf{\Lambda} \cdot \mathbf{k} \quad\text{i.e.,}\quad k'^\mu = \Lambda^\mu{}_\nu k^\nu. \tag{1.14} \]
Exercise (2):
Determine whether the photon is blue or red shifted between its emission by O
and its detection by O′. Relate this to the question of whether O′ is approaching
or receding from O.
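The length of a photon’s 4-vector is the scalar
\[ |\mathbf{k}| \equiv -(k^0)^2 + (k^1)^2 + (k^2)^2 + (k^3)^2 = -\frac{\omega^2}{c^2} + |\boldsymbol{k}|^2 = 0. \]
One can prove that this really is a scalar by brute force:
\[
\begin{aligned}
|\mathbf{k}'| &= -(k'^0)^2 + (k'^1)^2 + (k'^2)^2 + (k'^3)^2 \\
&= -\Bigl(\gamma\frac{\omega}{c} - \beta\gamma k^1\Bigr)^2 + \Bigl(-\beta\gamma\frac{\omega}{c} + \gamma k^1\Bigr)^2 + (k^2)^2 + (k^3)^2 \\
&= -\gamma^2\bigl(1-\beta^2\bigr)\frac{\omega^2}{c^2} + \gamma^2\bigl(1-\beta^2\bigr)(k^1)^2 + (k^2)^2 + (k^3)^2 \\
&= -(k^0)^2 + (k^1)^2 + (k^2)^2 + (k^3)^2.
\end{aligned}
\]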
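The brute-force calculation above can be repeated numerically. This is a small sketch, assuming the boost matrix of (1.2) plays the role of Λ. The helper names `boost_matrix`, `apply` and `length` and the sample wave vector are illustrative.

```python
import math


def boost_matrix(beta):
    """Boost along x, laid out as in (1.2)."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return [[gamma, -beta * gamma, 0.0, 0.0],
            [-beta * gamma, gamma, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


def apply(matrix, vec):
    """k'^mu = Lambda^mu_nu k^nu."""
    return [sum(matrix[m][n] * vec[n] for n in range(4)) for m in range(4)]


def length(vec):
    """-(k^0)^2 + (k^1)^2 + (k^2)^2 + (k^3)^2."""
    return -vec[0] ** 2 + vec[1] ** 2 + vec[2] ** 2 + vec[3] ** 2


k3 = (3.0, 4.0, 12.0)                        # spatial wave vector, |k| = 13
k = [math.sqrt(sum(c * c for c in k3)), *k3]  # k^0 = omega/c = |k| for a photon
k_prime = apply(boost_matrix(0.5), k)

print(length(k), length(k_prime))  # both vanish up to rounding
```

The ratio `k_prime[0] / k[0]` is the factor by which the observed frequency changes, which is the quantity Exercise (2) asks about.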
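\[
\begin{pmatrix} 0 & E'_x/c & E'_y/c & E'_z/c \\ -E'_x/c & 0 & B'_z & -B'_y \\ -E'_y/c & -B'_z & 0 & B'_x \\ -E'_z/c & B'_y & -B'_x & 0 \end{pmatrix}
\equiv F'^{\mu\nu} = \Lambda^\mu{}_\kappa \Lambda^\nu{}_\lambda F^{\kappa\lambda}. \tag{1.17}
\]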
Exercise (3):
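Transform $F^{\kappa\lambda}$ with the matrix $\Lambda^\mu{}_\nu$ defined by (1.13b) to show that an observer
who moves at speed v down the x-axis of an observer who sees fields $\boldsymbol{E} =
(E_x, E_y, 0)$ and $\boldsymbol{B} = 0$, perceives fields $\boldsymbol{E}' = (E_x, \gamma E_y, 0)$ and $\boldsymbol{B}' = (0, 0, \gamma v E_y/c^2)$.
[Hint: since Λ is symmetric, we can write F′ = Λ·F·Λ.] Hence deduce the general
rules $E'_\parallel = E_\parallel$, $\boldsymbol{E}'_\perp = \gamma(\boldsymbol{E}_\perp + \boldsymbol{v} \times \boldsymbol{B})$, $B'_\parallel = B_\parallel$, $\boldsymbol{B}'_\perp = \gamma(\boldsymbol{B}_\perp - \boldsymbol{v} \times \boldsymbol{E}/c^2)$. Verify
that $(B^2 - E^2/c^2) = (B'^2 - E'^2/c^2)$.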
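A numerical spot check of the last part of this exercise can be sketched as follows. It assumes the boost matrix (1.2) stands in for Λ (the notes' (1.13b) is not reproduced here), and the helper names and field values are illustrative. The direction in which the observer moves fixes the sign of the induced magnetic field, so only E′ and the invariant are checked.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def field_tensor(E, B):
    """Up-up field tensor F^{mu nu}, laid out as in (1.17) (SI units)."""
    ex, ey, ez = (component / C for component in E)
    bx, by, bz = B
    return [[0.0, ex, ey, ez],
            [-ex, 0.0, bz, -by],
            [-ey, -bz, 0.0, bx],
            [-ez, by, -bx, 0.0]]


def fields_from_tensor(F):
    """Read E and B back out of F^{mu nu}."""
    E = (F[0][1] * C, F[0][2] * C, F[0][3] * C)
    B = (F[2][3], F[3][1], F[1][2])
    return E, B


def boost_matrix(beta):
    """Boost along x, as in (1.2); assumed here to stand in for Lambda."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return [[gamma, -beta * gamma, 0.0, 0.0],
            [-beta * gamma, gamma, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def invariant(E, B):
    """B^2 - E^2/c^2."""
    return sum(b * b for b in B) - sum(e * e for e in E) / C**2


E, B = (100.0, 250.0, 0.0), (0.0, 0.0, 0.0)
L = boost_matrix(0.8)
F_prime = matmul(matmul(L, field_tensor(E, B)), L)  # F' = L.F.L since L is symmetric
E_prime, B_prime = fields_from_tensor(F_prime)

print(E_prime, B_prime)
print(invariant(E, B), invariant(E_prime, B_prime))
```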
Some 6-tuples correspond to elements of area. This correspondence works as
follows. With any two displacements, say u and v, we associate the parallelogram
bounded by u and v. Information about the size and orientation of this parallelogram
is conveyed by the antisymmetric tensor $S^{\alpha\beta} \equiv u^\alpha v^\beta - u^\beta v^\alpha$; in particular, if u = v,
then S = 0. S has fewer degrees of freedom than the eight numbers involved in u and
v because we can add to u any multiple of v without affecting S, and vice versa for v
and u.
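The statement that S is unchanged when a multiple of v is added to u can be checked directly. This is a minimal sketch with arbitrarily chosen displacement components; `area_tensor` is an illustrative name.

```python
def area_tensor(u, v):
    """S^{alpha beta} = u^alpha v^beta - u^beta v^alpha."""
    return [[u[a] * v[b] - u[b] * v[a] for b in range(4)] for a in range(4)]


u = [1.0, 2.0, 0.5, -1.0]
v = [0.0, 1.0, 3.0, 2.0]

S = area_tensor(u, v)
u_shifted = [ua + 2.5 * va for ua, va in zip(u, v)]  # u + 2.5 v
S_shifted = area_tensor(u_shifted, v)

# Largest difference between the two tensors; it should vanish.
max_change = max(abs(S[a][b] - S_shifted[a][b]) for a in range(4) for b in range(4))
print(max_change)
```

The same function returns the zero tensor for `area_tensor(u, u)`, which is the case u = v mentioned above.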
Exercise (4):
Consider the transformation u → u′ = au + bv, v → v′ = cu + dv with the corresponding
mapping S → S′. Show that the equation S′ = S imposes one constraint
on the numbers a, b, c, d. Hence only 8 − 3 = 5 numbers are needed to specify S.
Give a geometrical interpretation of this result.
In three-space the size and orientation of a parallelogram may be specified by
giving the magnitude and direction of the normal. Hence in three-space full information
about an antisymmetric 2nd rank tensor can be packed into the three components
of the 3-vector which we call the cross-product of the parallelogram’s sides.
In four-dimensional spacetime each parallelogram has a magnitude and two mutually
perpendicular normals, requiring five numbers for its full specification. Consequently
there is no direct analogue of the cross product and we must represent areas directly
with antisymmetric tensors.
Exercise (5):
Relate the above statements to the number of independent components of an
antisymmetric n × n matrix for n = 2, 3, 4.
parallelogram:
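\[
\begin{aligned}
S^{\mu\nu}(\eta_{\mu\kappa}\eta_{\nu\lambda}S^{\kappa\lambda}) \equiv S^{\mu\nu}S_{\mu\nu} &= -\mathrm{Tr}\,\mathbf{S}\cdot\mathbf{S} \\
&= (u^\mu v^\nu - u^\nu v^\mu)(u_\mu v_\nu - u_\nu v_\mu) = 2[|u||v| - (u\cdot v)^2].
\end{aligned}
\]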
Note:
Here by Tr M we mean $M_\alpha{}^\alpha = M^\alpha{}_\alpha$. That is, the sum implied by Tr must always
be over one up and one down index.
Evaluation in the particle’s rest frame shows that the scalar $\tfrac12 H_{\mu\nu}H^{\mu\nu} = [|x||p| -
(x\cdot p)^2] = -(m_0 c r_0)^2$, where $r_0$ is the distance (in the rest frame) between the particle
and the origin at t = 0.
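It is interesting to evaluate this same scalar for the Maxwell field tensor. Straightforward
matrix multiplication shows that the down-down shadow of $F^{\mu\nu}$ is
\[
F_{\mu\nu} \equiv \begin{pmatrix} 0 & -E_x/c & -E_y/c & -E_z/c \\ E_x/c & 0 & B_z & -B_y \\ E_y/c & -B_z & 0 & B_x \\ E_z/c & B_y & -B_x & 0 \end{pmatrix} \quad\text{(SI units)}, \tag{1.19}
\]
Multiplying each element of $F_{\mu\nu}$ by the corresponding element of $F^{\mu\nu}$ we find
\[
\begin{aligned}
m \equiv \tfrac12 F_{\mu\nu}F^{\mu\nu} &= -\tfrac12 \mathrm{Tr}\,\mathbf{F}\cdot\mathbf{F} \\
&= \tfrac12 \sum (\text{each element of } F_{\mu\nu}) \times (\text{corresponding element of } F^{\mu\nu}) \\
&= (B^2 - E^2/c^2).
\end{aligned} \tag{1.20}
\]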
Note:
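Whereas when n is odd, the cyclic interchange $i_1 \to i_2 \to \ldots \to i_{n-1} \to i_n \to i_1$
is an even permutation of the $i_k$, when n is even, this permutation is odd. (To
prove this exchange $i_1$ and $i_n$ and then make n − 2 exchanges to work $i_1$ back to
the second place.) So whereas for 3-dimensional tensors $\epsilon_{jki} = \epsilon_{ijk}$, we now have
$\epsilon^{\beta\gamma\delta\alpha} = -\epsilon^{\alpha\beta\gamma\delta}$.
$\epsilon^{\alpha\beta\gamma\delta}$ allows us to form the dual $\bar{\mathbf{F}}$ of F:
\[
\bar F^{\alpha\beta} \equiv \tfrac12 \epsilon^{\alpha\beta\gamma\delta} F_{\gamma\delta}
= \begin{pmatrix} 0 & B_x & B_y & B_z \\ -B_x & 0 & -E_z/c & E_y/c \\ -B_y & E_z/c & 0 & -E_x/c \\ -B_z & -E_y/c & E_x/c & 0 \end{pmatrix}. \tag{1.22}
\]
\[
\begin{aligned}
f \equiv \mathrm{Tr}\,\mathbf{F}\cdot\bar{\mathbf{F}}
&= -\sum (\text{each element of } F_{\alpha\beta}) \times (\text{corresponding element of } \bar F^{\alpha\beta}) \\
&= \frac{4}{c}\,\boldsymbol{E}\cdot\boldsymbol{B}.
\end{aligned} \tag{1.23}
\]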
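Both invariants can be evaluated element by element, exactly as the text describes. The sketch below assumes the sign convention $\epsilon^{0123} = +1$, which is the choice that reproduces the matrix in (1.22). The function names and field values are illustrative.

```python
C = 299_792_458.0  # speed of light in m/s


def levi_civita(*indices):
    """Sign of `indices` as a permutation of (0, 1, 2, 3); 0 if an index repeats."""
    if len(set(indices)) < len(indices):
        return 0
    inversions = sum(1 for i in range(len(indices))
                     for j in range(i + 1, len(indices))
                     if indices[i] > indices[j])
    return -1 if inversions % 2 else 1


def f_up(E, B):
    """F^{mu nu} as in (1.17)."""
    ex, ey, ez = (e / C for e in E)
    bx, by, bz = B
    return [[0.0, ex, ey, ez], [-ex, 0.0, bz, -by],
            [-ey, -bz, 0.0, bx], [-ez, by, -bx, 0.0]]


def f_down(E, B):
    """F_{mu nu} as in (1.19)."""
    ex, ey, ez = (e / C for e in E)
    bx, by, bz = B
    return [[0.0, -ex, -ey, -ez], [ex, 0.0, bz, -by],
            [ey, -bz, 0.0, bx], [ez, by, -bx, 0.0]]


def dual(F_down):
    """Dual tensor (1/2) eps^{alpha beta gamma delta} F_{gamma delta}."""
    return [[0.5 * sum(levi_civita(a, b, g, d) * F_down[g][d]
                       for g in range(4) for d in range(4))
             for b in range(4)] for a in range(4)]


def contract(P, Q):
    """Sum of products of corresponding elements."""
    return sum(P[m][n] * Q[m][n] for m in range(4) for n in range(4))


E = (1.0e5, -2.0e5, 0.5e5)     # V/m
B = (1.0e-3, 2.0e-3, -1.0e-3)  # T

m = 0.5 * contract(f_down(E, B), f_up(E, B))      # (1.20)
f = -contract(f_down(E, B), dual(f_down(E, B)))   # (1.23)
print(m, f)
```

With these inputs `m` should equal B² − E²/c² and `f` should equal 4E·B/c.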
Exercise (6):
Show that with $S_{\mu\nu} = u_\mu v_\nu - u_\nu v_\mu$, $\mathrm{Tr}\,\mathbf{S}\cdot\bar{\mathbf{S}} = 0$. This result explains why S has
only 5 degrees of freedom (Exercise 4).
Exercise (7):
Show that a uniform magnetic field parallel to the z-axis is associated with tension
(negative pressure) along the axis, and pressure in the perpendicular directions.
As an example of T consider a plane e.m. wave running along î polarized parallel
to ĵ. Then
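\[ \boldsymbol{E} = (0, E, 0)\cos(\omega t - kx), \qquad \boldsymbol{B} = (0, 0, B)\cos(\omega t - kx). \]
E and B are related by $-\partial\boldsymbol{B}/\partial t = \nabla\times\boldsymbol{E} \Rightarrow B = kE/\omega = E/c$. Hence
\[ \boldsymbol{N} = (E^2/\mu_0 c, 0, 0)\cos^2(\omega t - kx). \]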
The first term in our expression (1.30) is non-zero only on the diagonal. The second
term is non-zero only in the yy and zz slots and there cancels the first term. So P is
\[
P_{ij} = \frac{E^2}{\mu_0 c^2}\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}\cos^2(\omega t - kx),
\]
and finally
\[
T^{\mu\nu} = \frac{E^2}{\mu_0 c^2}\begin{pmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}\cos^2(\omega t - kx). \tag{1.31}
\]
The stress tensor P has only an entry in the xx slot because our wave is engaged in the
business of carrying x-type momentum in the x-direction; the wave would push back
a mirror placed in a plane x = constant. Clearly the Poynting vector is also directed
along the x axis, which accounts for the off-diagonal units in T. In proper relativistic
units the wave employs unit energy density (“capital employed”) to carry unit fluxes of
energy and momentum (“turnover”). Notice that the wave’s phase is the scalar −k · x.
When we do cosmology we’ll need $T^{\mu\nu}$ for a fluid. At each event a fluid has
a streaming motion that’s characterized by the 4-velocity $u^\alpha$ and an associated rest
frame. In this rest frame there’s an energy density $\rho c^2$ and a pressure P. If the fluid
is “perfect” there are no other stresses (such as viscous shear) and we’ll only consider
perfect fluids. $T^{\mu\nu}$ has to be a symmetric second-rank tensor made from the scalars ρ
and P, the vector $u^\mu$ and the tensor $\eta^{\mu\nu}$. A candidate is
\[ T^{\mu\nu} = (\rho + P/c^2)u^\mu u^\nu + P\eta^{\mu\nu}. \tag{1.32} \]
It’s the tensor we want because in the fluid’s rest frame it becomes
\[
\begin{pmatrix} \rho c^2 & 0 & 0 & 0 \\ 0 & P & 0 & 0 \\ 0 & 0 & P & 0 \\ 0 & 0 & 0 & P \end{pmatrix}.
\]
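The rest-frame form can be confirmed by evaluating (1.32) with $u^\mu = (c, 0, 0, 0)$. This is a short sketch in which the density and pressure values are arbitrary and `perfect_fluid_tensor` is an illustrative name.

```python
C = 299_792_458.0  # speed of light in m/s
ETA = [[-1.0, 0.0, 0.0, 0.0],
       [0.0, 1.0, 0.0, 0.0],
       [0.0, 0.0, 1.0, 0.0],
       [0.0, 0.0, 0.0, 1.0]]


def perfect_fluid_tensor(rho, pressure, u):
    """T^{mu nu} = (rho + P/c^2) u^mu u^nu + P eta^{mu nu}, as in (1.32)."""
    return [[(rho + pressure / C**2) * u[m] * u[n] + pressure * ETA[m][n]
             for n in range(4)] for m in range(4)]


rho, pressure = 1000.0, 101_325.0       # kg/m^3 and Pa, chosen arbitrarily
u_rest = [C, 0.0, 0.0, 0.0]             # 4-velocity in the fluid's rest frame
T_rest = perfect_fluid_tensor(rho, pressure, u_rest)

for row in T_rest:
    print(row)
```

The printed matrix is diagonal, with ρc² in the time-time slot and P in the three space-space slots.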
Notes:
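(i) $\partial/\partial x^\mu$ operates on scalars to produce vectors:
\[ G_\mu \equiv \frac{\partial\phi}{\partial x^\mu} \equiv \partial_\mu\phi \equiv \phi_{,\mu}. \]
$\partial/\partial x^\mu$ operates on vectors to produce 2nd rank tensors:
\[ G_{\mu\nu} \equiv \frac{\partial A_\nu}{\partial x^\mu} \equiv \partial_\mu A_\nu \equiv A_{\nu,\mu}. \]
$\partial/\partial x^\mu$ operates on tensors to produce higher-rank tensors:
\[ G_{\mu\lambda\nu} \equiv \frac{\partial B_{\lambda\nu}}{\partial x^\mu} \equiv \partial_\mu B_{\lambda\nu} \equiv B_{\lambda\nu,\mu}. \]
The operand’s indices can be either up or down: $G_\mu{}^\nu = \partial_\mu A^\nu$.
(ii) If we contract the tensor produced by operating on a vector, we get a scalar, the
4-divergence $\psi = \partial_\mu A^\mu$.
(iii) We can reduce the number of indices on a higher-rank tensor by contraction:
$A^\nu = \partial_\mu G^{\mu\nu}$.
(iv) The 4-analogue of taking the curl of a vector is to antisymmetrize the tensor
formed by operating on a vector: $F_{\mu\nu} = (\partial_\mu A_\nu - \partial_\nu A_\mu)$. If $A_\nu = \partial_\nu\phi$, then
$F_{\mu\nu} = 0$ because partial derivatives commute.
(v) A natural generalization of the divergence theorem reads
\[ \int_V d^4x\, \frac{\partial T_{\alpha\ldots}}{\partial x^\mu} = \oint_S (d^3x)_\mu\, T_{\alpha\ldots}, \tag{1.38} \]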
where S is the boundary of the 4-d region V . Notice that T may have as many
indices as it pleases and that one of them may be contracted with µ if you wish.
Example:
In e.m. the usual vector potential A and the electrostatic potential φ form the
four components of an up vector
\[ F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu. \tag{1.40} \]
a continuous way. Then the coordinates $x^\mu$ of points on the curve are continuous
functions $x^\mu(\lambda)$. For δλ ≪ 1 the small vector
\[ \delta\mathbf{x} \equiv \frac{d\mathbf{x}}{d\lambda}\,\delta\lambda \]
almost joins two points on the curve. Hence it is time-like and |δx| < 0. For any two
points A and B on the curve, we define
\[ \tau \equiv \frac{1}{c}\int_A^B \sqrt{-\left|\frac{d\mathbf{x}}{d\lambda}\right|}\; d\lambda \tag{1.41} \]
to be the proper time difference between A and B along the curve. If the curve is a
straight line, we may transform to the coordinate system in which $x^\mu = (ct, 0, 0, 0)$ at
all points on the curve, and then
\[ \tau = \frac{1}{c}\int_A^B \sqrt{-\frac{d\,ct}{d\lambda}\,\frac{d(-ct)}{d\lambda}}\; d\lambda = [t_B - t_A]. \tag{1.42} \]
Hence the name. We regard the coordinates $x^\mu$ of events along the trajectory as
functions $x^\mu(\tau)$ of the proper time. Differentiating w.r.t. τ and multiplying through
by the rest mass $m_0$ we obtain a 4-vector, the momentum
\[ \mathbf{p} \equiv m_0 \frac{d\mathbf{x}}{d\tau}. \tag{1.43} \]
From the zeroth component of the up version of this equation we have dt = γdτ ; the
hearts of passengers on a fast train (they mark off units of τ ) appear to beat slowly to
a medic on the station platform (whose watch keeps t).
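The definition (1.41) and the relation dt = γdτ can be illustrated numerically. The sketch below sums √(−|δx|)/c over small steps along a world line. The uniformly moving train at 0.6c, the step count, and the name `proper_time` are assumptions made for illustration.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def proper_time(worldline, lam_start, lam_end, steps=20_000):
    """Approximate (1.41) by summing sqrt(-|delta x|)/c over small steps."""
    h = (lam_end - lam_start) / steps
    total = 0.0
    previous = worldline(lam_start)
    for i in range(1, steps + 1):
        current = worldline(lam_start + i * h)
        d = [b - a for a, b in zip(previous, current)]
        length = -d[0] ** 2 + d[1] ** 2 + d[2] ** 2 + d[3] ** 2  # negative if time-like
        total += math.sqrt(-length)
        previous = current
    return total / C


def train(t):
    """World line (ct, x, y, z) of a train moving at 0.6 c, parametrized by platform time t."""
    return (C * t, 0.6 * C * t, 0.0, 0.0)


tau = proper_time(train, 0.0, 10.0)  # ten seconds of platform time
print(tau, 10.0 / tau)               # proper time, and the ratio t / tau
```

For this world line the ratio t/τ should come out as γ = 1.25, so ten platform seconds correspond to eight seconds for the passengers.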
The zeroth component is by Poisson’s equation equal to $\rho/(c\epsilon_0) = c\mu_0\rho$, where ρ is the
charge density. By Ampere’s law, the last three of these equations are equal to $\mu_0\boldsymbol{j}$,
where j is the current density. Hence if we form a 4-vector
\[ j^\mu = (c\rho, j_x, j_y, j_z), \tag{1.47} \]
\[ F^{\mu\nu}{}_{,\nu} = \mu_0 j^\mu. \tag{1.48} \]
Exercises (8):
(i) Show that when λ, µ and ν equal 1, 2 and 3 respectively, (1.49) becomes ∇·B = 0.
(ii) Show that with equation (1.22) equation (1.49) may also be written $\bar F^{\mu\nu}{}_{,\nu} = 0$.
Charge conservation is expressed as
\[ \mu_0\,\partial\cdot j = \mu_0 j^\mu{}_{,\mu} = F^{\mu\nu}{}_{,\nu\mu} = 0, \tag{1.50} \]
\[ \mathbf{A}' \equiv \mathbf{A} + \partial\Lambda, \tag{1.54} \]
where Λ(x) is any scalar-valued function of space-time coordinates. The change (1.54)
in A is called a gauge transformation.
Gauge transformations can be used to ensure that A satisfies an additional equation.
In particular, given A we can choose Λ s.t. A′ satisfies one of these gauge
conditions:
(i) Lorentz gauge:²
\[ \partial\cdot\mathbf{A}' = 0 \;\Rightarrow\; \Box\Lambda = -\partial\cdot\mathbf{A} \tag{1.55} \]
The Lorentz condition (1.55) does not uniquely specify A′ since many non-trivial
functions satisfy $\Box\phi = 0$ and so given one Λ satisfying the 2nd of eqs (1.55), we
can construct many others Λ′ = Λ + φ.
(ii) Coulomb or radiation or transverse gauge
\[ \nabla\cdot\boldsymbol{A}' = 0 \;\Rightarrow\; \nabla^2\Lambda = -\nabla\cdot\boldsymbol{A} \tag{1.56} \]
i.e., in this gauge the electrostatic potential satisfies Poisson’s eqn, which explains
the gauge’s name.
1.7 Summary
The special theory of relativity requires that any physical quantity must fit into an
n-tuple of numbers, where n = 1, 4, 6, 10, . . .. Physical laws must be expressed as
equations connecting the n-tuples associated with different physical quantities. These
equations must be constructed in accordance with the rules of tensor calculus, which
permit only:
(i) the multiplication of n-tuples to form either higher-rank n-tuples (as in $H^{\mu\nu} =
x^\mu p^\nu - x^\nu p^\mu$) or lower-rank n-tuples (as in $f_\mu = F_\mu{}^\nu J_\nu$), or
² We denote the d’Alembertian operator by $\Box \equiv \partial_\mu\partial^\mu$ by analogy with the notation $\triangle \equiv \nabla^2 = \partial_i\partial^i$
for the Laplacian operator.
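\[ \mathbf{v} = \frac{d\mathbf{x}}{d\tau}; \qquad \mathbf{p} = m_0\mathbf{v}; \qquad \mathbf{J} = q\mathbf{v} \]
\[ \mathbf{f} = \mathbf{F}\cdot\mathbf{J}; \qquad \frac{d\mathbf{p}}{d\tau} = \mathbf{f} \]
\[ F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu; \qquad F^{\mu\nu}{}_{,\nu} = \mu_0 j^\mu; \qquad \bar F^{\mu\nu}{}_{,\nu} = 0, \]
where $F^{\mu\nu} \equiv \eta^{\mu\gamma}\eta^{\nu\delta}F_{\gamma\delta}$ and $\bar F^{\mu\nu} \equiv \tfrac12\epsilon^{\mu\nu\gamma\delta}F_{\gamma\delta}$. The energy-momentum tensor of
the e.m. field is
\[ T^{\mu\nu} = \frac{1}{\mu_0}\Bigl[\tfrac14 \mathrm{Tr}(\mathbf{F}\cdot\mathbf{F})\,\eta^{\mu\nu} - F^\mu{}_\gamma F^{\gamma\nu}\Bigr]. \]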
2 Groups & their representations
We define ζ ≡ (X + iY )/R. It’s clear that the phase of ζ will be the same as the phase
of x + iy. So from $X^2 + Y^2 = R^2\zeta\zeta^*$ and the results we already have, it follows that
\[ x + iy = 2R\,\frac{\zeta}{1 + \zeta\zeta^*}; \qquad z = R\,\frac{1 - \zeta\zeta^*}{1 + \zeta\zeta^*}. \tag{2.3} \]
We fix the length of the complex 2-vector (Pauli spinor) $\eta \equiv (\eta_1, \eta_2)$ by setting
$R = |\eta_1|^2 + |\eta_2|^2$ so we have simply
Exercise (9):
Show that
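\[ x = \eta^\dagger\sigma_x\eta, \qquad y = \eta^\dagger\sigma_y\eta, \qquad z = \eta^\dagger\sigma_z\eta, \tag{2.6a} \]
where $\eta^\dagger$ is the complex-conjugate-transpose of η and
\[
\sigma_x \equiv \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}; \qquad
\sigma_y \equiv \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}; \qquad
\sigma_z \equiv \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \tag{2.6b}
\]
are the Pauli spin matrices. Notice that they are Hermitian and that $[\sigma_i, \sigma_j] =
2i\epsilon_{ijk}\sigma_k$.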
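The algebraic properties quoted in this exercise are easy to confirm by direct multiplication. The sketch below checks one commutator and evaluates (2.6a) for an arbitrarily chosen spinor. The helper names are illustrative.

```python
SIGMA = {
    "x": [[0, 1], [1, 0]],
    "y": [[0, -1j], [1j, 0]],
    "z": [[1, 0], [0, -1]],
}


def matmul2(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def commutator(a, b):
    ab, ba = matmul2(a, b), matmul2(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(2)] for i in range(2)]


def bilinear(eta, sigma):
    """eta^dagger sigma eta, which is real because sigma is Hermitian."""
    value = sum(eta[i].conjugate() * sigma[i][j] * eta[j]
                for i in range(2) for j in range(2))
    return value.real


# [sigma_x, sigma_y] should equal 2i sigma_z.
comm_xy = commutator(SIGMA["x"], SIGMA["y"])
two_i_sigma_z = [[2j * entry for entry in row] for row in SIGMA["z"]]

eta = (1 + 2j, 0.5 - 1j)                       # arbitrary Pauli spinor
R = abs(eta[0]) ** 2 + abs(eta[1]) ** 2
x, y, z = (bilinear(eta, SIGMA[axis]) for axis in "xyz")
print(comm_xy == two_i_sigma_z, (x, y, z), R)
```

For this spinor (x, y, z) = (−3, −4, 3.75) and R = 6.25, so x² + y² + z² = R² as the construction requires.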
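Bearing in mind that $|\eta_2|^2 = R - |\eta_1|^2$, let’s arrange the original coordinates into
a matrix:
\[
\mathbf{X} \equiv \tfrac12\begin{pmatrix} z & x - iy \\ x + iy & -z \end{pmatrix}
= \begin{pmatrix} |\eta_1|^2 - \tfrac12 R & \eta_1\eta_2^* \\ \eta_2\eta_1^* & |\eta_2|^2 - \tfrac12 R \end{pmatrix}, \tag{2.7}
\]
which can also be written
\[ X_{ij} = \eta_i\eta_j^* - \tfrac12 R\,\delta_{ij}. \tag{2.8} \]
The transformation $\eta \to \tilde\eta \equiv \mathbf{U}\cdot\eta$ maps $\mathbf{X} \to \tilde{\mathbf{X}}$ where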
To this point we have confined ourselves to unitary matrices in order to preserve the
normalization $|\eta|^2 = R$. However, a general linear transformation $\eta \to \tilde\eta = \mathbf{M}\eta$
induces the transformation
\[
\eta_i\eta_j^* = X_{ij} + \tfrac12 R\,\delta_{ij} \;\to\; \tilde\eta_i\tilde\eta_j^*
= \tfrac12\,\mathbf{M}\begin{pmatrix} R + z & x - iy \\ x + iy & R - z \end{pmatrix}\mathbf{M}^\dagger
= \tfrac12\begin{pmatrix} R' + z' & x' - iy' \\ x' + iy' & R' - z' \end{pmatrix}. \tag{2.10}
\]
If we impose the restriction det(M) = ±1, we will be making a transformation such that
$R'^2 - x'^2 - y'^2 - z'^2 = R^2 - x^2 - y^2 - z^2$. Hence, if we set R = ct, we will be performing
a Lorentz transformation. The 2 × 2 complex matrices with unit determinant are
considered to form the group SL(2,C) (SL = special linear).
Exercise (10):
Show that with R = ct we can complement equations (2.6a) with
ct = η † Iη. (2.11)
The rotations are the sub-group of the Lorentz group that are obtained by requiring
M to be not merely of unit determinant, but unitary. The 2 × 2 unitary matrices
with unit determinant form the group SU(2). Thus we have shown that SL(2,C) can
be mapped into the Lorentz group, and SU(2) can be mapped onto SO(3).
Notice that these transformations cannot change the sign of R = ct, so they do
not include reversals of time. It turns out that they do not include inversions of space
either. The mappings are not 1-1 because −M induces the same transformation of
space-time as does M. So we have found a representation of the subgroup of proper
orthochronous Lorentz transformations or proper Lorentz group for short.
In classical physics spinors are no more than mathematical devices. But the
amplitudes a± for a spin-half particle to have its spin up or down along any chosen
axis transform under Lorentz transformations like the components of a spinor.
2.1 Generators
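It’s easy to show that the (Hermitian) Pauli matrices [eq. (2.6b)] all square up to the
identity matrix: $\sigma_i^2 = I$. Let n be a unit vector, then this property applies equally to
the matrix
\[ \sigma_n \equiv \boldsymbol{n}\cdot\boldsymbol{\sigma} = \begin{pmatrix} n_z & n_x - in_y \\ n_x + in_y & -n_z \end{pmatrix}. \tag{2.12} \]
We define the exponential of $i\theta\sigma_n$ through the power series
\[
\begin{aligned}
e^{i\theta\sigma_n} &= I + i\theta\sigma_n - \frac{\theta^2}{2!}\sigma_n^2 - i\frac{\theta^3}{3!}\sigma_n^3 + \cdots \\
&= \Bigl(1 - \frac{\theta^2}{2!} + \cdots\Bigr)I + i\Bigl(\theta - \frac{\theta^3}{3!} + \cdots\Bigr)\sigma_n \\
&= \cos\theta\, I + i\sin\theta\, \sigma_n.
\end{aligned} \tag{2.13}
\]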
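The resummation in (2.13) can be checked by summing the power series numerically and comparing with the closed form. This is a sketch with an arbitrarily chosen angle and unit vector. The function names and the number of terms kept are illustrative.

```python
import math


def matmul2(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def sigma_n(n):
    """n . sigma, as in (2.12)."""
    nx, ny, nz = n
    return [[nz, nx - 1j * ny], [nx + 1j * ny, -nz]]


def exp_series(m, terms=30):
    """Sum I + m + m^2/2! + ... keeping `terms` terms."""
    result = [[1.0 + 0j, 0j], [0j, 1.0 + 0j]]
    term = [[1.0 + 0j, 0j], [0j, 1.0 + 0j]]
    for k in range(1, terms):
        term = [[entry / k for entry in row] for row in matmul2(term, m)]
        result = [[result[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return result


theta = 0.7
n = (1 / 3, 2 / 3, 2 / 3)  # a unit vector
s = sigma_n(n)

series = exp_series([[1j * theta * entry for entry in row] for row in s])
closed = [[math.cos(theta) * (1.0 if i == j else 0.0) + 1j * math.sin(theta) * s[i][j]
           for j in range(2)] for i in range(2)]

max_error = max(abs(series[i][j] - closed[i][j]) for i in range(2) for j in range(2))
print(max_error)  # rounding-level difference
```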
Moreover, $e^{i\theta\sigma_n}$ contains three free parameters (θ and the two angles required to specify
the direction n). Given that any rotation can be specified by three parameters
(for example the Euler angles), we might suspect that the unitary matrix required to
generate any rotation can be obtained as $e^{i\theta\sigma_n}$ for appropriate θ and n. In fact, $e^{i\theta\sigma_n}$
is the matrix that rotates the coordinates by angle −2θ about the axis n, as one
may easily verify when n is one of the coordinate vectors i, j or k.
Exercise (11):
Show that “rotating” η with the matrix
\[ s_z(\phi) \equiv \begin{pmatrix} e^{-i\phi/2} & 0 \\ 0 & e^{i\phi/2} \end{pmatrix} \tag{2.15} \]
has the effect of rotating the (x, y, z) coordinates through φ about the z axis.
What happens to η when the (x, y, z) axes are rotated through 2π?
Since the Pauli matrices enable us to generate any member of SU(2) through this
mechanism, we refer to them as the generators of SU(2). (To be pedantic, the
generators are $\tfrac12\sigma_i$.)
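Exponentiating $\theta\sigma_n$ we obtain
\[
\begin{aligned}
e^{\theta\sigma_n} &= \Bigl(1 + \frac{\theta^2}{2!} + \cdots\Bigr)I + \Bigl(\theta + \frac{\theta^3}{3!} + \cdots\Bigr)\sigma_n \\
&= \cosh\theta\, I + \sinh\theta\, \sigma_n.
\end{aligned} \tag{2.16}
\]
Hence through (2.10) $e^{\theta\sigma_n}$ generates a Lorentz transformation. To see which transformation
we align the z axis with n. Then $e^{\theta\sigma_n}$ is a diagonal matrix and
\[
\begin{aligned}
\begin{pmatrix} ct' + z' & x' - iy' \\ x' + iy' & ct' - z' \end{pmatrix}
&= e^{\theta\sigma_n}\begin{pmatrix} ct + z & x - iy \\ x + iy & ct - z \end{pmatrix}\begin{pmatrix} \cosh\theta + \sinh\theta & 0 \\ 0 & \cosh\theta - \sinh\theta \end{pmatrix} \\
&= \begin{pmatrix} \cosh\theta + \sinh\theta & 0 \\ 0 & \cosh\theta - \sinh\theta \end{pmatrix}\begin{pmatrix} (ct + z)(\cosh\theta + \sinh\theta) & (x - iy)(\cosh\theta - \sinh\theta) \\ (x + iy)(\cosh\theta + \sinh\theta) & (ct - z)(\cosh\theta - \sinh\theta) \end{pmatrix} \\
&= \begin{pmatrix} (ct + z)(\cosh\theta + \sinh\theta)^2 & (x - iy) \\ (x + iy) & (ct - z)(\cosh\theta - \sinh\theta)^2 \end{pmatrix}
\end{aligned} \tag{2.18}
\]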
Thus $e^{\theta\sigma_n}$ generates the boost along n with Lorentz factor γ = cosh 2θ and speed
β = tanh 2θ. We say that $i\tfrac12\sigma_n$ is the generator of this Lorentz transformation.
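The identification γ = cosh 2θ can be made concrete by carrying out the matrix product for n along z and reading the new coordinates back out of the 2 × 2 matrix. This is a sketch only, with an arbitrary sample event and θ. `boost_z` is an illustrative name.

```python
import math


def boost_z(theta, ct, x, y, z):
    """Transform (ct, x, y, z) with M = exp(theta sigma_z) acting as M X M^dagger."""
    c, s = math.cosh(theta), math.sinh(theta)
    m = [c + s, c - s]                       # diagonal of M, which is real
    X = [[ct + z, x - 1j * y], [x + 1j * y, ct - z]]
    Xp = [[m[i] * X[i][j] * m[j] for j in range(2)] for i in range(2)]
    ct_new = (Xp[0][0] + Xp[1][1]).real / 2
    z_new = (Xp[0][0] - Xp[1][1]).real / 2
    x_new, y_new = Xp[1][0].real, Xp[1][0].imag
    return ct_new, x_new, y_new, z_new


theta = 0.3
event = (2.0, 1.0, -1.0, 0.5)                # (ct, x, y, z)
ct_p, x_p, y_p, z_p = boost_z(theta, *event)

gamma, beta_gamma = math.cosh(2 * theta), math.sinh(2 * theta)
print(ct_p, gamma * event[0] + beta_gamma * event[3])   # the two should agree
print(z_p, gamma * event[3] + beta_gamma * event[0])    # likewise
```

The transverse coordinates x and y come back unchanged, and c²t² − x² − y² − z² takes the same value before and after.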
The boosts taken on their own do not form a group because the product of boosts
along two non-parallel axes cannot always be expressed as a boost along a third axis:
in general a rotation is required in addition to a boost.3 As a specific example, consider
the product
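\[ A \equiv e^{-\theta\sigma_y}e^{-\phi\sigma_x}e^{\theta\sigma_y}e^{\phi\sigma_x}, \tag{2.20} \]
which effects a boost along the x axis, followed by one along the y axis, followed by
inverse boosts along the x and then the y axes. For infinitesimal θ, φ we have
\[
\begin{aligned}
B_\pm \equiv e^{\pm\theta\sigma_y}e^{\pm\phi\sigma_x} &= (I \pm \theta\sigma_y + \tfrac12\theta^2 I + \cdots)(I \pm \phi\sigma_x + \tfrac12\phi^2 I + \cdots) \\
&= [1 + \tfrac12(\theta^2 + \phi^2)]I \pm [\theta\sigma_y + \phi\sigma_x] + \theta\phi\,\sigma_y\sigma_x + \cdots
\end{aligned} \tag{2.21}
\]
Hence
\[
\begin{aligned}
A = B_-B_+ &\simeq \{[I + \tfrac12(\theta^2 + \phi^2)I + \theta\phi\,\sigma_y\sigma_x] - [\theta\sigma_y + \phi\sigma_x]\} \\
&\quad\times \{[I + \tfrac12(\theta^2 + \phi^2)I + \theta\phi\,\sigma_y\sigma_x] + [\theta\sigma_y + \phi\sigma_x]\} + \cdots \\
&= [I + \tfrac12(\theta^2 + \phi^2)I + \theta\phi\,\sigma_y\sigma_x]^2 - [\theta\sigma_y + \phi\sigma_x]^2 + O(\theta^3) \\
&= I + \theta\phi[\sigma_y, \sigma_x] + O(\theta^3) = I - 2i\theta\phi\,\sigma_z + O(\theta^3)
\end{aligned} \tag{2.22}
\]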
Thus this sequence of boosts effects a rotation by angle 4θφ around z. Consequently,
boosts are inextricably intertwined with rotations, and we must consider the form
taken by a general Lorentz transformation, that is, a transformation that combines a
boost with a rotation. The natural object to consider is
\[ \mathbf{M} \equiv e^{(i\theta\boldsymbol{n} + \phi\boldsymbol{m})\cdot\boldsymbol{\sigma}}, \tag{2.23} \]
which combines a boost along m with a rotation around n. A 2 × 2 complex matrix
is defined by eight real numbers, and when we require the matrix to have unit determinant,
we impose two restrictions on these numbers, leaving six degrees of freedom.
Equation (2.23) for M has six parameters, so any matrix with unit determinant should
be of this form. Consequently, the product of two objects of this type will be a third
object of the same type, so these objects provide a representation of the proper Lorentz
group.
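The expansion (2.22) can be tested numerically by multiplying out the four boosts of (2.20) for small but finite parameters, using the closed form (2.16) for each factor. The parameter values and helper names below are illustrative.

```python
import math

SIGMA = {"x": [[0, 1], [1, 0]], "y": [[0, -1j], [1j, 0]]}


def boost(theta, axis):
    """exp(theta sigma_axis) = cosh(theta) I + sinh(theta) sigma_axis, from (2.16)."""
    c, s = math.cosh(theta), math.sinh(theta)
    sigma = SIGMA[axis]
    return [[(c if i == j else 0.0) + s * sigma[i][j] for j in range(2)]
            for i in range(2)]


def matmul2(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


theta, phi = 1.0e-3, 2.0e-3
A = boost(-theta, "y")
for factor in (boost(-phi, "x"), boost(theta, "y"), boost(phi, "x")):
    A = matmul2(A, factor)            # builds the product in the order of (2.20)

# Leading-order prediction of (2.22): I - 2i theta phi sigma_z.
predicted = [[1 - 2j * theta * phi, 0], [0, 1 + 2j * theta * phi]]
residual = max(abs(A[i][j] - predicted[i][j]) for i in range(2) for j in range(2))
print(A[0][0], residual)
```

The residual is of third order in the small parameters, far below the second-order term 2θφ = 4 × 10⁻⁶ that carries the rotation.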
Remarkably, (2.23) combines the pseudo-vector n with the polar vector m. If we
transform to axes that are mirror images of our original axes, n won’t change sign, but
m will, and M will change into
\[ \mathbf{M}' \equiv e^{(i\theta\boldsymbol{n} - \phi\boldsymbol{m})\cdot\boldsymbol{\sigma}}. \tag{2.24} \]
3 This phenomenon is the origin of Thomas precession in the theory of spin-orbit coupling.
2.2 Spinor invariants 21
It follows that the objects M′ must also provide a representation of the proper Lorentz
group. The representations provided by M and M′ are inequivalent in the sense that
there is no matrix S such that $\mathbf{M}' = \mathbf{S}\mathbf{M}\mathbf{S}^{-1}$ for all M.
There are two types of Pauli spinors. A right-handed Pauli spinor $\eta_R$ is transformed
by M under a Lorentz transformation, while a left-handed one $\eta_L$ is transformed
by M′:
with I the 2 × 2 identity matrix. In this way Dirac spinors support a representation
of the full Lorentz group.
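\[
\eta_R^\dagger \to \tilde\eta_R^\dagger = \eta_R^\dagger\, e^{(-i\theta\boldsymbol{n} + \phi\boldsymbol{m})\cdot\boldsymbol{\sigma}}; \qquad
\eta_L^\dagger \to \tilde\eta_L^\dagger = \eta_L^\dagger\, e^{(-i\theta\boldsymbol{n} - \phi\boldsymbol{m})\cdot\boldsymbol{\sigma}} \tag{2.28}
\]
From equations (2.25) and (2.28) we see that under proper Lorentz transformations
both $\eta_L^\dagger\cdot\eta_R$ and $\eta_R^\dagger\cdot\eta_L$ are invariant. To obtain a quantity that’s still invariant when
inversions are included, we add these two invariants. In terms of the adjoint spinor
\[ \bar\psi \equiv \psi^\dagger\gamma^0 = (\eta_L^\dagger, \eta_R^\dagger) \tag{2.29} \]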
our invariant is
\[ \bar\psi\cdot\psi = \eta_L^\dagger\cdot\eta_R + \eta_R^\dagger\cdot\eta_L. \tag{2.30} \]
We’ll also find it useful to know how to construct a 4-vector from a Dirac spinor.
Equations (2.6a) and (2.11) imply that under rotations $\eta^\dagger I\eta$, $\eta^\dagger\sigma_x\eta$, $\eta^\dagger\sigma_y\eta$, and $\eta^\dagger\sigma_z\eta$
transform like the components of a four vector. How should we generalize these expressions
to the case in which right- and left-handed spinors are distinguishable because
boosts occur? A component of a vector should not be invariant, so contrary to what
happens in equation (2.30), the left and right spinors should be of the same handedness.
But both halves of the Dirac spinor must be used. Moreover, under interchange of $\eta_L$
and $\eta_R$ the time component should stay the same, while the space components should
change sign. This suggests that the time component is $\eta_R^\dagger\cdot\eta_R + \eta_L^\dagger\cdot\eta_L$ while the
space components are $\eta_R^\dagger\sigma_i\eta_R - \eta_L^\dagger\sigma_i\eta_L$. To achieve this result in an elegant notation
we define three new matrices
\[
\gamma^1 \equiv \begin{pmatrix} 0 & -\sigma_x \\ \sigma_x & 0 \end{pmatrix}; \qquad
\gamma^2 \equiv \begin{pmatrix} 0 & -\sigma_y \\ \sigma_y & 0 \end{pmatrix}; \qquad
\gamma^3 \equiv \begin{pmatrix} 0 & -\sigma_z \\ \sigma_z & 0 \end{pmatrix}. \tag{2.31}
\]
Then bearing in mind the definitions (2.27) and (2.29) of $\gamma^0$ and $\bar\psi$, we have that
\[
\bar\psi\gamma^0\psi = (\eta_L^\dagger, \eta_R^\dagger)(\eta_L, \eta_R); \qquad
\bar\psi\gamma^i\psi = (\eta_L^\dagger, \eta_R^\dagger)(-\sigma_i\eta_L, \sigma_i\eta_R) \tag{2.32}
\]
as required, so $\bar\psi\gamma^\mu\psi$ is a 4-vector.
Exercise (12):
Show that $\gamma^0\gamma^i = -\gamma^i\gamma^0$ and that $\gamma^i\gamma^j = -\gamma^j\gamma^i$ for i ≠ j. (This anticommutation property
is often written $\{\gamma^\mu, \gamma^\nu\} = 0$ for µ ≠ ν.)
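The anticommutation relations can be verified by explicit 4 × 4 multiplication. The sketch below takes the γⁱ from (2.31). The form of γ⁰ (identity blocks off the diagonal) is an assumption, since (2.27) is not reproduced here, but it is the form implied by (2.29) and (2.32) when ψ = (η_R, η_L). All helper names are illustrative.

```python
SIGMA = [[[0, 1], [1, 0]], [[0, -1j], [1j, 0]], [[1, 0], [0, -1]]]
I2 = [[1, 0], [0, 1]]
Z2 = [[0, 0], [0, 0]]


def block(a, b, c, d):
    """Assemble the 4x4 matrix [[a, b], [c, d]] from 2x2 blocks."""
    top = [row_a + row_b for row_a, row_b in zip(a, b)]
    bottom = [row_c + row_d for row_c, row_d in zip(c, d)]
    return top + bottom


def negate(m):
    return [[-entry for entry in row] for row in m]


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def anticommutator(a, b):
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] + ba[i][j] for j in range(len(a))] for i in range(len(a))]


def is_zero(m):
    return all(abs(entry) < 1e-12 for row in m for entry in row)


GAMMA0 = block(Z2, I2, I2, Z2)                           # assumed form of (2.27)
GAMMAS = [block(Z2, negate(s), s, Z2) for s in SIGMA]    # gamma^1..3 from (2.31)

time_space = all(is_zero(anticommutator(GAMMA0, g)) for g in GAMMAS)
space_space = all(is_zero(anticommutator(GAMMAS[i], GAMMAS[j]))
                  for i in range(3) for j in range(3) if i != j)
print(time_space, space_space)
```

For equal indices the anticommutator does not vanish: with these matrices (γ⁰)² is the identity and each (γⁱ)² is minus the identity.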
The spinor representation of the Lorentz group is fundamental in the sense that
every other representation can be constructed from it. We started by studying an
example of this phenomenon: the components of a second-rank tensor in spinor space
transform like the combinations ct + z, x − iy, etc, of the components of a 4-vector.
From the rule for transforming third-rank tensors on spinor space, we could extract
the spin-3/2 representation of the Lorentz group, and so on. This corner of group theory
is taught in quantum-mechanics courses under the heading of ‘addition of angular
momenta’. The total spin angular momentum of two spin-half particles can be zero
(spin-0 representation of the LG) or one (spin-1 rep.). With three spin-half particles
the possible spin angular momenta are 3/2, 1 and 0 because a third-rank tensor in spinor
space contains the components of a 4-vector (which comes with a free scalar) as well
as the 4 components of a spin-3/2 object.
The spin-n representations of the Lorentz group have a special property: they
are irreducible (or an irrep for short) in the sense that no linear subspace of the
representing space is invariant under the action of the matrices of the representation.
3 Lagrangian Dynamics
3.1 Single charged particle with given e.m. field
Let y(t) be a function of the scalar parameter t. Then a functional F[y(t)] is some
rule that assigns to each function y a single number. For example F might be
$F_1 \equiv \int_{t_1}^{t_2} dt\, y(t)$ or $F_2 \equiv \int_{t_1}^{t_2} dt\, y(t)\dot y(t)$ or $F_K \equiv \int_{t_1}^{t_2} dt\, K(t)y(t)$, where K(t) is any
given function, or $F_{ab} \equiv y(a) - y(b)$, where a and b are any two given values of
t. The function y(t) may be scalar-, vector- or even tensor-valued. Vector-valued
functions y(t) can be thought of as paths.
Physicists are particularly interested in extremizing functionals of the type
\[ F[y(t)] = \int dt\, f(y, \dot y), \tag{B1.1} \]
where f is a known function of two variables. That is, they wish to find the function
y(t) such that F[y(t)] takes a larger/smaller value than all nearby functions.
The calculus of variations shows that the extremizing function is the one that
satisfies the Euler–Lagrange (EL) equation:
\[ \frac{d}{dt}\Bigl(\frac{\partial f}{\partial\dot y}\Bigr) - \frac{\partial f}{\partial y} = 0. \tag{B1.2} \]
For given f this is an o.d.e. for y(t).
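To see (B1.2) at work, take the illustrative choice f = ½ẏ² − ½y², for which the EL equation reads ÿ = −y. The sketch below estimates the first-order change of F under a small variation that vanishes at the end points, once for the solution y = sin t and once for the straight chord between the same end points. The integrand, interval, step count and function names are all assumptions made for the example.

```python
import math


def action(path, steps=1000, t_end=1.0):
    """Midpoint-rule estimate of F[y] = integral of (ydot^2/2 - y^2/2) dt on [0, t_end]."""
    h = t_end / steps
    total = 0.0
    for i in range(steps):
        y0, y1 = path(i * h), path((i + 1) * h)
        ydot = (y1 - y0) / h
        ymid = 0.5 * (y0 + y1)
        total += h * (0.5 * ydot**2 - 0.5 * ymid**2)
    return total


def first_variation(path, bump, eps=1.0e-3):
    """Central-difference estimate of dF/d(eps) for the varied path y + eps * bump."""
    plus = action(lambda t: path(t) + eps * bump(t))
    minus = action(lambda t: path(t) - eps * bump(t))
    return (plus - minus) / (2 * eps)


def bump(t):
    """A variation that vanishes at t = 0 and t = 1."""
    return math.sin(math.pi * t)


def chord(t):
    """Straight line joining the same end points as sin t on [0, 1]."""
    return t * math.sin(1.0)


on_solution = first_variation(math.sin, bump)   # y = sin t solves the EL equation
on_chord = first_variation(chord, bump)         # the chord does not
print(on_solution, on_chord)
```

The first number vanishes up to discretization error, while the second is close to −sin(1)/π ≈ −0.27, so only the EL solution makes F stationary.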
The sharp predictions that are characteristic of classical physics arise because destructive
quantum interference excludes practically every future configuration of a system:
a shell will blast through one spot on the roof of a dugout because it is at this spot
alone that the quantum amplitudes for the shell’s presence interfere constructively.
Even in classical physics the most elegant way to do dynamics is to write down an
expression for the phase of this amplitude for each path by which the system might
travel between initial and final configuration, and find for what path it is stationary
and constructive interference is possible.
This phase times ħ is called the action S. It is a scalar and is obtained by
integrating along the prospective path the rate of change of phase with proper time, s:
\[ S = \int d\tau\, s. \tag{3.1} \]
ẋ, but not on higher derivatives of x(τ ). Similarly, the EL eqn involves differentiation
w.r.t. the general position vector x, so if the eqn of motion is to depend on F and not
its derivatives, s should depend on A but not F. So the invariants to consider are
(i) $|\dot{\mathbf{x}}|^2 = -c^2$, (ii) $\dot{\mathbf{x}}\cdot\mathbf{A}$ and (iii) $|\mathbf{A}|^2$. We further require that any gauge-dependent
contribution to S should be path-independent. $\dot{\mathbf{x}}\cdot\mathbf{A}$ satisfies this requirement, while
$|\mathbf{A}|^2$ does not.
Exercise (13):
Show that the gauge-dependent contribution to S from ẋ · A is path-independent,
while the gauge-dependent contribution from a term proportional to $|\mathbf{A}|^2$ would
not be path-independent.
So the simplest thing to try is
\[ S = \int d\tau\, (-m_0c^2 + q\,\dot{\mathbf{x}}\cdot\mathbf{A}), \tag{3.2} \]
where we’ve included the rest mass m0 for future convenience and q is some constant.
Unfortunately we cannot apply the EL eqn (Box 1) to (3.2) as it stands because
we want to hold constant the events of arrival and departure, x1 and x2 , rather than
the proper-time elapse between these events. So we have first to eliminate τ from (3.2)
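in favour of some parameter λ that always runs over the same range, say, 0 to 1. Using
\[ \frac{d\tau}{d\lambda} = \frac{1}{c}\sqrt{-\left|\frac{d\mathbf{x}}{d\lambda}\right|}, \tag{3.3} \]
we have
\[ S = \int_0^1 d\lambda \left( -m_0c\sqrt{-\eta_{\mu\nu}\frac{dx^\mu}{d\lambda}\frac{dx^\nu}{d\lambda}} + q\frac{dx^\mu}{d\lambda}A_\mu \right), \tag{3.4} \]
which is now in a form to which we can apply the EL eqn. Since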
The function L is called the Lagrangian. By (3.7) it is in this case the difference
between the particle’s kinetic and potential energies.
Starting with an action has many advantages:
• Since L is a scalar, transforming to new coordinates is easy;
• It’s easy to ensure that the eqns of motion are Lorentz invariant (or Galilean
invariant as appropriate) by imposing the desired invariance on L;
• Given the required invariance and the basic form of the desired eqns (second-order,
linear, say) only a few simple expressions are candidates for Lagrangians;
• Certain constants of motion can be readily derived from evident symmetries of L
(Noether’s theorem).
\[ \frac{1}{c^2}\frac{\partial^2\phi}{\partial t^2} - \frac{\partial^2\phi}{\partial x^2} = 0. \tag{3.9} \]
This leads one to expect that many (but not all) actions for partial differential equations
are evaluated by integrating a Lagrangian density L over space before performing the
usual integral over time:
\[ S[\phi] = \int dt \int dx\, \mathcal{L}(\phi, \dot\phi). \tag{3.11} \]
Finally, it doesn’t make things significantly more complicated to allow space to be fully
three-dimensional. So x becomes the 3-vector x and (ct, x) becomes the usual 4-vector
x. Since $d^4x = c\,dt\,d^3x$, we write simply
\[ S[\phi] = \frac{1}{c}\int d^4x\, \mathcal{L}(\phi, \partial_\mu\phi). \tag{3.13} \]
At each t between ti and tf the field’s configuration φ(t, x) is chosen such that
the integral (3.13) through the space-time volume bounded by t = ti and t = tf is
extremized:
varies from ti to tf . This integral vanishes because ψ is zero throughout the domain
integrated over: the variation ψ vanishes on the initial and final hypersurfaces by
hypothesis, and we force it to vanish at spatial ∞ also in order to ensure that the
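varied field φ + ψ satisfies the same bdy condition as the unvaried field φ. Thus
\[ \delta S = \frac{1}{c}\int d^4x \left( \frac{\partial\mathcal{L}}{\partial\phi} - \frac{\partial}{\partial x^\mu}\frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi)} \right)\psi \tag{3.15} \]
If this is to hold for any ψ(t, x) that vanishes on the initial and final hypersurfaces, we
clearly require that
\[ \frac{\partial\mathcal{L}}{\partial\phi} - \frac{\partial}{\partial x^\mu}\Bigl(\frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi)}\Bigr) = 0. \tag{3.16} \]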
This p.d.e. is the Euler-Lagrange equation for a field. It is the field equation that
follows from the Lagrangian density L.
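where the sign of $|\partial\phi|^2$ has been chosen so that its contributions to $\mathcal{L}$ are k.e. − p.e.
and the term with the constant K is the field’s self-energy. Then $\partial\mathcal{L}/\partial\phi = -K^2\phi$ and
$\partial\mathcal{L}/\partial(\partial_\mu\phi) = -\tfrac12(\eta^{\mu\beta}\partial_\beta\phi + \eta^{\alpha\mu}\partial_\alpha\phi) = -\partial^\mu\phi$, so (3.16) yields
\[
\begin{aligned}
0 = -\partial_\mu\partial^\mu\phi + K^2\phi &= \frac{\partial^2\phi}{\partial (x^0)^2} - \nabla^2\phi + K^2\phi \\
&= \frac{1}{c^2}\frac{\partial^2\phi}{\partial t^2} - \nabla^2\phi + K^2\phi.
\end{aligned}
\]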
Thus the wave equation emerges with K = 0 from the Lagrangian density which is the
simplest possible function of $\partial_\mu\phi$ only. If K ≠ 0 waves are evanescent (complex k) if
ω < Kc, just as electromagnetic waves are evanescent in a plasma below the plasma
frequency.
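Substituting a plane wave φ ∝ exp[i(kx − ωt)] into the field equation above gives k² = ω²/c² − K², which makes the cut-off at ω = Kc explicit. The following sketch evaluates k on either side of the cut-off. The value of K and the function name `wavenumber` are illustrative.

```python
import cmath

C = 299_792_458.0  # speed of light in m/s


def wavenumber(omega, K):
    """Solve k^2 = omega^2/c^2 - K^2 for k; an imaginary k means an evanescent wave."""
    return cmath.sqrt((omega / C) ** 2 - K**2)


K = 2.0                              # inverse metres, chosen arbitrarily
omega_cut = K * C                    # cut-off frequency

k_above = wavenumber(2.0 * omega_cut, K)   # propagating: k is real
k_below = wavenumber(0.5 * omega_cut, K)   # evanescent: k is purely imaginary
print(k_above, k_below)
```

Below the cut-off the amplitude varies as exp(−|k|x) instead of oscillating, the same behaviour as an electromagnetic wave below the plasma frequency.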
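By $|\partial\psi|^2$ we mean
\[ |\partial\psi|^2 = -\frac{1}{c^2}\frac{\partial\psi^*}{\partial t}\frac{\partial\psi}{\partial t} + \nabla\psi^*\cdot\nabla\psi. \tag{3.19} \]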
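\[
\frac{\partial|\psi|^2}{\partial u} = \frac{\partial}{\partial u}(u^2 + v^2) = 2u, \qquad
\frac{\partial|\psi|^2}{\partial v} = 2v. \tag{3.20}
\]
Further
\[ |\partial\psi|^2 = \partial(u - iv)\cdot\partial(u + iv) = |\partial u|^2 + |\partial v|^2. \]
So
\[ \frac{\partial|\partial\psi|^2}{\partial(\partial_\mu u)} = 2\partial^\mu u; \qquad \frac{\partial|\partial\psi|^2}{\partial(\partial_\mu v)} = 2\partial^\mu v. \tag{3.21} \]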
Spin-0 particles of mass $m_0$ are excitations of a scalar field that satisfies $\hat p^2\psi =
-m_0^2c^2\psi$. Substituting $\hat E = i\hbar\partial_t$ and $\hat p_i = -i\hbar\partial_i$ this becomes the Klein-Gordon eqn
\[ \partial_\mu\partial^\mu\psi = \frac{m_0^2c^2}{\hbar^2}\psi. \tag{3.23} \]
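\[ 0 = \delta f = \frac{\partial f}{\partial\psi}(\delta u + i\delta v) + \frac{\partial f}{\partial\psi^*}(\delta u - i\delta v). \]
\[
\left.\begin{aligned}
0 &= \frac{\partial f}{\partial\psi} + \frac{\partial f}{\partial\psi^*} \\
0 &= \frac{\partial f}{\partial\psi} - \frac{\partial f}{\partial\psi^*}
\end{aligned}\right\}
\;\Leftrightarrow\;
\left\{\begin{aligned}
0 &= \frac{\partial f}{\partial\psi} \\
0 &= \frac{\partial f}{\partial\psi^*}.
\end{aligned}\right.
\]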
Thus we can proceed as though δψ and δψ ∗ were independent, though they are not.
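A small numerical check may help here. The sketch below uses a made-up real function $f = \psi^*\psi - a\psi^* - a^*\psi$ (the constant `a` and the test point are arbitrary illustrative values) and compares finite-difference derivatives with respect to $u$ and $v$ against the formal derivatives taken as if $\psi$ and $\psi^*$ were independent.

```python
def f(psi, a):
    """A real test function of psi and its conjugate (illustrative choice)."""
    return (psi.conjugate() * psi - a * psi.conjugate() - a.conjugate() * psi).real

a = 0.3 - 1.2j       # arbitrary complex constant
psi = 0.7 + 0.4j     # arbitrary point psi = u + i v
h = 1e-6

# Genuine derivatives with respect to the real variables u and v.
df_du = (f(psi + h, a) - f(psi - h, a)) / (2 * h)
df_dv = (f(psi + 1j * h, a) - f(psi - 1j * h, a)) / (2 * h)

# Formal derivatives, treating psi and psi* as independent variables.
df_dpsi = psi.conjugate() - a.conjugate()
df_dpsistar = psi - a

# Chain rule: d/du = d/dpsi + d/dpsi*,  d/dv = i (d/dpsi - d/dpsi*).
err_u = abs(df_du - (df_dpsi + df_dpsistar))
err_v = abs(df_dv - 1j * (df_dpsi - df_dpsistar))
print(err_u, err_v)

# At psi = a both formal derivatives vanish, and so do df/du and df/dv.
stationary_u = (f(a + h, a) - f(a - h, a)) / (2 * h)
stationary_v = (f(a + 1j * h, a) - f(a - 1j * h, a)) / (2 * h)
```

Both errors should be at the level of rounding noise, and the last two numbers should be close to zero, which is the content of the equivalence displayed above.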
3.6 Maxwell’s equations 29
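In §2.2 we saw that from a Dirac spinor we can construct the scalar $\bar\psi\cdot\psi$ and the vector $\bar\psi\gamma^\mu\psi$. In light of our discussion of the Klein–Gordon equation it is natural to take the potential energy density of a Dirac field to be proportional to $\bar\psi\cdot\psi$. For the kinetic term we could choose $(\partial^\mu\bar\psi)(\partial_\mu\psi)$ but a simpler choice is $i\bar\psi\gamma^\mu\partial_\mu\psi$, where the factor $i$ is inserted for later convenience. Consider therefore the field equation that follows from
$$\mathcal{L} = \bar\psi\, i\gamma^\mu\partial_\mu\psi - \frac{m_0c}{\hbar}\,\bar\psi\cdot\psi. \tag{3.24}$$
A variation of $\psi$ induces a corresponding variation in $\bar\psi$ and thus causes $\mathcal{L}$ to change by
$$\delta\mathcal{L} = \delta\bar\psi\Bigl(i\gamma^\mu\partial_\mu - \frac{m_0c}{\hbar}\Bigr)\psi + \bar\psi\Bigl(i\gamma^\mu\partial_\mu - \frac{m_0c}{\hbar}\Bigr)\delta\psi. \tag{3.25}$$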
Suppose we choose to vary only the first component of $\psi$, that is we choose $\delta\psi = (a+ib,0,0,0)$, where $a$ and $b$ are real functions on space-time. Then $\delta\bar\psi = (a-ib,0,0,0)\gamma^0$. We consider two variations, one with $b = 0$ and then one with $a = 0$ and $b$ set equal to the function $a$ that we used in the first case. From the stationarity of the action it follows that
$$\begin{aligned}
0 &= \int d^4x\,\Bigl\{(a,0,0,0)\gamma^0\Bigl(i\gamma^\mu\partial_\mu - \frac{m_0c}{\hbar}\Bigr)\psi + \bar\psi\Bigl(i\gamma^\mu\partial_\mu - \frac{m_0c}{\hbar}\Bigr)(a,0,0,0)\Bigr\}\\
0 &= \int d^4x\,\Bigl\{(-a,0,0,0)\gamma^0\Bigl(i\gamma^\mu\partial_\mu - \frac{m_0c}{\hbar}\Bigr)\psi + \bar\psi\Bigl(i\gamma^\mu\partial_\mu - \frac{m_0c}{\hbar}\Bigr)(a,0,0,0)\Bigr\}.
\end{aligned} \tag{3.26}$$
Repeating this exercise for each component of $\psi$, we obtain the Dirac equation
$$0 = \Bigl(i\gamma^\mu\partial_\mu - \frac{m_0c}{\hbar}\Bigr)\psi. \tag{3.28}$$
As in the case of the Klein–Gordon action, the equation we get at the end is the one we would have obtained if we had (incorrectly) argued that $\psi$ and $\bar\psi$ are independent variables.
Exercise (14):
Show that when we add equations (3.26) we obtain
$$0 = -\partial_\mu\bar\psi\, i\gamma^\mu - \frac{m_0c}{\hbar}\,\bar\psi$$
and show that this is just the adjoint of the Dirac equation. [Hint: $(\gamma^0\gamma^\mu)^\dagger = \gamma^0\gamma^\mu$.]
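$$\begin{aligned}
\mathcal{L}_{\rm vac}(A,\partial_\mu A) &= \frac{1}{4\mu_0}\,{\rm Tr}\,F\cdot F\\
&= -\frac{1}{4\mu_0}F_{\mu\nu}F^{\mu\nu}\\
&= \frac{1}{2\mu_0}(E^2/c^2 - B^2),
\end{aligned} \tag{3.29}$$
where the last equality is from (1.20). (Notice that if we associate $E$ with kinetic energy ($E = -\dot A/c + \cdots$) and $B$ with potential energy, $\mathcal{L}_{\rm vac}$ is of the form k.e. − p.e.) The field equations associated with the Lagrangian density (3.29) are
$$\frac{\partial}{\partial x^\beta}\Bigl(\frac{\partial\mathcal{L}_{\rm vac}}{\partial(\partial_\beta A_\mu)}\Bigr) = 0.$$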
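Now
$$\frac{\partial F_{\mu\nu}}{\partial(\partial_\beta A_\alpha)} = \frac{\partial}{\partial(\partial_\beta A_\alpha)}\bigl(\partial_\mu A_\nu - \partial_\nu A_\mu\bigr) = \delta^\beta_\mu\delta^\alpha_\nu - \delta^\beta_\nu\delta^\alpha_\mu, \tag{3.30}$$
so
$$\begin{aligned}
\frac{\partial\mathcal{L}_{\rm vac}}{\partial(\partial_\beta A_\alpha)} &= -\frac{1}{4\mu_0}\frac{\partial(F_{\mu\nu}F^{\mu\nu})}{\partial(\partial_\beta A_\alpha)} = -\frac{1}{4\mu_0}\frac{\partial(F_{\mu\nu}\eta^{\mu\kappa}\eta^{\nu\lambda}F_{\kappa\lambda})}{\partial(\partial_\beta A_\alpha)}\\
&= -\frac{1}{2\mu_0}(\delta^\beta_\mu\delta^\alpha_\nu - \delta^\beta_\nu\delta^\alpha_\mu)F^{\mu\nu}\\
&= -\frac{1}{2\mu_0}(F^{\beta\alpha} - F^{\alpha\beta})\\
&= \frac{1}{\mu_0}F^{\alpha\beta}.
\end{aligned} \tag{3.31}$$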
The field equations are therefore
$$\frac{\partial F^{\alpha\beta}}{\partial x^\beta} = 0, \tag{3.32}$$
that is, 4 of Maxwell’s 8 field eqns for an e.m. field in vacuo.
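The index gymnastics in (3.31) can be checked numerically. The following sketch treats the sixteen numbers $\partial_\beta A_\alpha$ as independent variables, evaluates $\mathcal{L}_{\rm vac} = -F_{\mu\nu}F^{\mu\nu}/4\mu_0$, and compares a finite-difference gradient with $F^{\alpha\beta}/\mu_0$. The choice $\mu_0 = 1$, the signature $(-,+,+,+)$ and the sample matrix `D` are assumptions made only for this illustration.

```python
MU0 = 1.0  # units with mu_0 = 1 (illustrative assumption)
ETA = [[-1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]  # signature (-,+,+,+)

def field_tensor(D):
    """F_{mu nu} = d_mu A_nu - d_nu A_mu, where D[mu][nu] stands for d_mu A_nu."""
    return [[D[m][n] - D[n][m] for n in range(4)] for m in range(4)]

def raise_both(F):
    """F^{mu nu}; ETA is diagonal, so each index just picks up a sign."""
    return [[ETA[m][m] * ETA[n][n] * F[m][n] for n in range(4)] for m in range(4)]

def lagrangian(D):
    """L_vac = -(1 / 4 mu_0) F_{mu nu} F^{mu nu}."""
    F = field_tensor(D)
    F_up = raise_both(F)
    return -sum(F[m][n] * F_up[m][n] for m in range(4) for n in range(4)) / (4 * MU0)

def numerical_gradient(D, beta, alpha, h=1e-6):
    """Central-difference estimate of dL/d(d_beta A_alpha)."""
    plus = [row[:] for row in D]
    minus = [row[:] for row in D]
    plus[beta][alpha] += h
    minus[beta][alpha] -= h
    return (lagrangian(plus) - lagrangian(minus)) / (2 * h)

# Arbitrary sample values for the derivatives d_mu A_nu.
D = [[0.1 * (3 * m - 2 * n) + 0.05 * m * n * n for n in range(4)] for m in range(4)]

F_up = raise_both(field_tensor(D))
max_error = max(
    abs(numerical_gradient(D, beta, alpha) - F_up[alpha][beta] / MU0)
    for alpha in range(4)
    for beta in range(4)
)
print("largest deviation from F^{alpha beta}/mu_0:", max_error)
```

Because $\mathcal{L}_{\rm vac}$ is quadratic in $\partial_\beta A_\alpha$, the central difference is exact up to rounding, so the reported deviation should be tiny.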
To get Maxwell’s eqns in the presence of charges we need to add to the action $S$ obtained by integrating (3.29) over spacetime, the action of particles in a given e.m. field. For a single charged particle the latter is given by (3.2). What does this suggest for the action associated with a swarm of particles of charge $q$, mass $m_0$ that are
3.7 Noether’s theorem for internal symmetries 31
moving with 4-velocity $v(x)$ and in their rest-frame have number density $n(x)$? Well, the form of (3.2) suggests that the part of $\mathcal{L}$ which depends on both the e.m. field and the particles (the ‘interaction term’) is proportional to the dot product of $A$ with the current density $j = qn_0v$ associated with the particles. So we speculate that the interaction term is $j\cdot A$. The current density contributed by a particle of charge $q$ that moves on the world-line $X(\tau)$ is
$$j(x) = qc\int d\tau\,\dot X\,\delta(x - X) = qc\int dX\,\delta(x - X). \tag{3.33}$$
Exercise (15):
Check the validity of (3.33) by (i) showing that it is dimensionally correct, (ii) showing that $\int d^3x\,j = q\,(dX/dt)$, i.e., the total current is just $q$ times the Newtonian velocity, and (iii) showing similarly that the total charge in any spatial slice is always $q$.
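Using this result, the contribution to the action from our conjectured term is
$$\begin{aligned}
S_{\rm interaction} &= \frac{1}{c}\int d^4x\,(j\cdot A)\big|_x\\
&= q\int d^4x\,d\tau\,\dot X\cdot A(x)\,\delta(x - X)\\
&= q\int d\tau\,\dot X\cdot A(X)
\end{aligned} \tag{3.34}$$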
$$j^\mu - \frac{1}{\mu_0}\frac{\partial F^{\mu\nu}}{\partial x^\nu} = 0 \tag{3.36}$$
in agreement with (1.48). The other four of Maxwell’s eqns don’t come from minimizing the action but from the fact that $F$ is the 4-curl of $A$. So they are geometrical rather than dynamical in nature.
Does Noether’s theorem for the Lagrangians of particle motion extend to Lagrangian
densities for fields? Actually it yields two closely related results: one for internal
symmetries and one for external symmetries, such as Lorentz invariance. We deal with
internal symmetries first.
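Often $\mathcal{L}(A,\partial_\mu A)$ is invariant under some transformation of the field $A$. For example, in the case of e.m. $\mathcal{L}$ is invariant under $A \to A + \partial\Lambda$ where $\Lambda(x)$ is any scalar function.⁴ Whenever there is a point-by-point invariance of this type, we can write
$$\begin{aligned}
0 = \delta\mathcal{L} &= \frac{\partial\mathcal{L}}{\partial A}\cdot\delta A + \frac{\partial\mathcal{L}}{\partial(\partial_\mu A)}\cdot\delta(\partial_\mu A)\\
&= \frac{\partial}{\partial x^\mu}\Bigl(\frac{\partial\mathcal{L}}{\partial(\partial_\mu A)}\Bigr)\cdot\delta A + \frac{\partial\mathcal{L}}{\partial(\partial_\mu A)}\cdot\partial_\mu(\delta A)\\
&= \frac{\partial}{\partial x^\mu}\Bigl(\delta A\cdot\frac{\partial\mathcal{L}}{\partial(\partial_\mu A)}\Bigr),
\end{aligned} \tag{3.37}$$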
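where the field eqns (3.16) have been used. The final line states that the current density $j^\mu$ has vanishing divergence, where
$$j^\mu \equiv \delta A\cdot\frac{\partial\mathcal{L}}{\partial(\partial_\mu A)}. \tag{3.38}$$
The vanishing of $\partial\cdot j$ implies that the integral $J \equiv \int d^3x_\mu\,j^\mu \equiv \int dx^\alpha dx^\beta dx^\gamma\,j^\mu\epsilon_{\mu\alpha\beta\gamma}$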
is the same for any two large 3-dimensional spatial slices: Given two such slices we
can extend these into the closed surface bounding a spacetime volume by adding the
3-surface formed by a very large spherical shell as it propagates in time from one spatial
slice to the other [see fig. above (3.14)]. ∂ · j = 0 implies that the flux into this volume
has to equal that out of it, so provided j vanishes on the shell, the flux in through the
earlier spatial slice has to equal the flux out through the later slice. Thus the internal
symmetry of L has generated a conserved flux J.
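E.m. charge conservation How does this idea work out in e.m.? Setting $\delta A = \partial\Lambda$, we have
$$j^\mu = \bigl(\partial_\alpha\Lambda\bigr)\frac{\partial\mathcal{L}_{\rm vac}}{\partial(\partial_\mu A_\alpha)} = \frac{1}{\mu_0}\bigl(\partial_\alpha\Lambda\bigr)F^{\alpha\mu}, \tag{3.39}$$
where use has been made of (3.31). Equating to zero the divergence of this we find that
$$0 = \frac{\partial^2\Lambda}{\partial x^\mu\partial x^\alpha}F^{\alpha\mu} + \frac{\partial\Lambda}{\partial x^\alpha}\frac{\partial F^{\alpha\mu}}{\partial x^\mu} = \frac{\partial\Lambda}{\partial x^\alpha}\,\partial_\mu F^{\alpha\mu},$$
where the first term on the right has been eliminated by virtue of $F$’s antisymmetry. Since we can arrange for $\partial\Lambda$ to be any vector at a given point, (3.39) implies that $\partial_\mu F^{\alpha\mu} = 0$. This is just (3.32), the standard field eqn for e.m. in vacuo.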
To obtain a more interesting Noether invariant one has to start from L for the
e.m. field plus a matter field, say ψ.
4 Notice the difference with the least-action principle, which states that $0 = c\,\delta S = \delta\int d^4x\,\mathcal{L}$ for any variation $\delta A$; for most variations, $\mathcal{L}$ changes at each point; it is just its integral which is invariant.
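$$j_0 = \frac{\theta}{2ic}\Bigl(\psi^*\frac{\partial\psi}{\partial t} - \psi\frac{\partial\psi^*}{\partial t}\Bigr) \tag{3.43}$$
is proportional to the particle density in the coordinate rest-frame, and because in that frame $dV(s) = d^2x_i\,c\,dt$, the flux of particles in the coordinate rest frame is proportional to
$$j_i = \frac{\theta c}{2i}\Bigl(\psi^*\frac{\partial\psi}{\partial x^i} - \psi\frac{\partial\psi^*}{\partial x^i}\Bigr). \tag{3.44}$$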
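By comparison, non-relativistic quantum mechanics yields for Hamiltonian $H = p^2/2m$
$$\begin{aligned}
\frac{d}{dt}\int d^3x\,|\psi|^2 &= \int d^3x\,\Bigl(\frac{\partial\psi^*}{\partial t}\psi + \psi^*\frac{\partial\psi}{\partial t}\Bigr)\\
&= \int d^3x\,\Bigl(\frac{H\psi^*}{-i\hbar}\psi + \psi^*\frac{H\psi}{i\hbar}\Bigr)\\
&= \frac{\hbar}{2im}\int d^3x\,\Bigl((\nabla^2\psi^*)\psi - \psi^*\nabla^2\psi\Bigr)\\
&= \frac{\hbar}{2im}\oint d^2x_i\,\Bigl((\nabla_i\psi^*)\psi - \psi^*\nabla_i\psi\Bigr).
\end{aligned} \tag{3.45}$$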
Hence the Klein-Gordon expression for the particle flux is essentially identical with the
non-relativistic one, but the expressions for the particle density are rather different in
the two cases.
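On the other hand, if we simply regard $\mathcal{L}$ as a function of $x$ through the fields, we have
$$\delta\mathcal{L} = a^\alpha\frac{\partial\mathcal{L}}{\partial x^\alpha} = \frac{\partial}{\partial x^\nu}\bigl(\mathcal{L}\,\delta^\nu_\alpha a^\alpha\bigr). \tag{3.47}$$
Equating these two expressions for $\delta\mathcal{L}$ we have
$$0 = \frac{\partial}{\partial x^\nu}\Bigl(\frac{\partial\mathcal{L}}{\partial(\partial_\nu A)}\cdot\frac{\partial A}{\partial x^\alpha} - \mathcal{L}\,\delta^\nu_\alpha\Bigr)a^\alpha. \tag{3.48}$$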
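Furthermore, $a$ is an arbitrary small vector so its coefficient in (3.48) must vanish. Thus from the fact that $\mathcal{L}$ depends on $x$ only through the fields we can conclude that the tensor
$$\hat T^\nu{}_\mu \equiv -\Bigl(\frac{\partial\mathcal{L}}{\partial(\partial_\nu A)}\cdot\frac{\partial A}{\partial x^\mu} - \mathcal{L}\,\delta^\nu_\mu\Bigr)\quad\Rightarrow\quad \hat T^{00} = \frac{\partial\mathcal{L}}{\partial\dot A}\cdot\dot A - \mathcal{L} \tag{3.49}$$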
$$\partial_\nu\Delta^\nu{}_\mu = \frac{1}{\mu_0}\frac{\partial^2(F^{\alpha\nu}A_\mu)}{\partial x^\nu\partial x^\alpha} = 0. \tag{3.52}$$
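But
$$\nabla_{\mathbf{x}}\cdot\Bigl(\frac{\mathbf{x}' - \mathbf{x}}{|\mathbf{x}' - \mathbf{x}|^3}\Bigr) = -4\pi\delta(\mathbf{x}' - \mathbf{x})\quad\text{(where $\delta$ is the Dirac $\delta$-function)} \tag{4.4}$$
as one may show, on the one hand by evaluating the derivative at $\mathbf{x} \neq \mathbf{x}'$, and on the other hand by using the divergence theorem to integrate the left side through a small sphere centred on $\mathbf{x} = \mathbf{x}'$. Combining equations (4.2), (4.3) and (4.4) we obtain Poisson’s equation
$$4\pi G\rho = \nabla^2\Phi = -\nabla\cdot\mathbf{f}. \tag{4.5}$$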
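The divergence-theorem half of the argument for (4.4) can be illustrated numerically. The sketch below integrates the outward flux of $(\mathbf{x}'-\mathbf{x})/|\mathbf{x}'-\mathbf{x}|^3$ through a sphere with a simple midpoint rule; the function name, the grid size and the two test geometries are illustrative assumptions. A sphere that encloses $\mathbf{x}'$ should give a flux close to $-4\pi$, one that does not should give a flux close to zero.

```python
import math

def flux_through_sphere(x_source, centre, radius, n=200):
    """Outward flux of (x' - x)/|x' - x|^3 through a sphere, by the midpoint rule.

    x_source plays the role of x'; the sphere is |x - centre| = radius.
    """
    d_theta = math.pi / n
    d_phi = 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        theta = (i + 0.5) * d_theta
        for j in range(n):
            phi = (j + 0.5) * d_phi
            normal = (
                math.sin(theta) * math.cos(phi),
                math.sin(theta) * math.sin(phi),
                math.cos(theta),
            )
            x = [centre[k] + radius * normal[k] for k in range(3)]
            diff = [x_source[k] - x[k] for k in range(3)]
            dist_cubed = sum(c * c for c in diff) ** 1.5
            field_dot_normal = sum(diff[k] * normal[k] for k in range(3)) / dist_cubed
            total += field_dot_normal * radius ** 2 * math.sin(theta) * d_theta * d_phi
    return total

# Sphere enclosing x' (deliberately off-centre): flux should be close to -4*pi.
flux_enclosing = flux_through_sphere((0.0, 0.0, 0.0), (0.1, -0.2, 0.05), 1.0)

# Sphere not enclosing x': flux should be close to zero.
flux_excluding = flux_through_sphere((3.0, 0.0, 0.0), (0.0, 0.0, 0.0), 1.0)

print(flux_enclosing, -4.0 * math.pi)
print(flux_excluding)
```

The fact that the enclosing flux does not depend on where inside the sphere $\mathbf{x}'$ sits is what lets the left side of (4.4) be replaced by a δ-function.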
Elegant though it is, this equation cannot represent the whole truth about grav-
itational physics since it is not constructed according to the rules of tensor calculus
38 Chapter 4: Newton’s Theory & the Principle of Equivalence
summarized in §2.7; if the right side of equation (4.5) is to form an n-tuple, it must form a scalar since it has only one component. On the other hand, since mass is just a manifestation of energy, we expect the quantity $\rho$ appearing on the left side of equation (4.5) to represent energy density, and this we know to form the 00-component of the 10-tuple $T$. So we either have to think of some scalar thing to put on the left in the place of $\rho$, or we have to augment $\Phi$ with a whole bunch of extra potentials, its companions in some new 10-tuple $g$, and somehow extend the single equation (4.5) to a set of ten equations from which the whole set of potentials can be determined.
Consideration of the predicament of a physicist who knows about relativity and
electrostatics but not about magnetism will clarify this point. This person looks at the
electrostatic form of Poisson’s equation
and thinks
“$q$ isn’t a scalar because of the Lorentz-Fitzgerald contraction (in fact, $q$ is the 0th component of the current density $j$),⁵ so $\phi$ can’t be a scalar either. Seems I’ll have to augment $\phi$ with three other potentials, say $A_x$, $A_y$ and $A_z$. Then that $\nabla^2$ won’t do either, because it’s no kind of n-tuple. I’ll replace it with the d’Alembertian, which is a scalar. Then I’ll have
$$\Bigl(\nabla^2 - \frac{1}{c^2}\frac{\partial^2}{\partial t^2}\Bigr)\phi = -\frac{q}{\epsilon_0}\quad\text{and}\quad\Bigl(\nabla^2 - \frac{1}{c^2}\frac{\partial^2}{\partial t^2}\Bigr)A_i = -\frac{j_i}{\epsilon_0}.\text{”}$$
By this point our friend would be well on the way to a Nobel prize.
We shall see that the natural generalization of this argument to the case of gravity yields
$$\Bigl(\nabla^2 - \frac{1}{c^2}\frac{\partial^2}{\partial t^2}\Bigr)g = \text{constant}\times T.$$
However, Einstein showed that the way forward is not to tinker thus with Newtonian
gravity, but to assign to the gravitational force a unique position as the force generated
by the very dynamics of spacetime itself. The stimulus for this remarkable intellectual
leap was the modern form of Galileo’s famous observation that all bodies fall at the
same speed.
(ii) when several otherwise isolated bodies $\alpha = 1,\ldots,N$ interact with one another, it is possible to assign a number $m_\alpha$ to each body such that the quantity $\mathbf{p} \equiv \sum_\alpha m_\alpha\mathbf{v}_\alpha$ remains constant.
We call $m_\alpha$ the inertial mass of body $\alpha$. When bodies are interacting, and therefore have changing individual momenta $\mathbf{p}_\alpha \equiv m_\alpha\mathbf{v}_\alpha$, it is convenient to imagine that they are acting on one another with a quantity “force”, $\mathbf{f}_\alpha \equiv d\mathbf{p}_\alpha/dt$. By statement (ii), $\sum_\alpha\mathbf{f}_\alpha = 0$.
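Again according to Newton, the gravitational force between bodies $\alpha$ and $\beta$ is
$$\mathbf{f}_{\alpha\beta} = F\,\frac{\mathbf{x}_\alpha - \mathbf{x}_\beta}{|\mathbf{x}_\alpha - \mathbf{x}_\beta|^3},$$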
Thus $\beta$ and $\gamma$ will fall towards $\alpha$ at the same rate only if $\Gamma_\beta = \Gamma_\gamma$. Newton followed Galileo in thinking that all bodies fall at the same rate, and therefore assumed (with a suitable choice of $G$) that $\Gamma = 1$ for all particles. But in the 17th century the experimental basis of this step was not strong.
exactly equals the acceleration due to their instantaneous motion transverse to the
Earth-Sun line, and there is no tendency for the wire to twist. But if Γ is abnormally
large for one of the balls, say that to the South, this ball will start to fall towards the
Sun faster than the other ball, and the rod will start to twist in the direction indicated.
Consequently, the bar (which has a period of about one hour) will oscillate about an
equilibrium position that is skewed with respect to the N-S line.
interactions.6
Extrapolating wildly from these experiments we hypothesize:
Strong Principle of Equivalence: No experiment could distinguish between a
homogeneous gravitational field and an accelerating frame of reference. In particular,
in any frame which falls freely through such a field all the laws of physics are the same
as if no field were present.
Real gravitational fields are never homogeneous, so they can be distinguished from
an accelerating frame of reference. For example, consider a star-warrior who regains
consciousness in a closed cabin some time after being taken prisoner. He reaches for
his watch and knocks it to the floor. Fortunately it falls only slowly, so it continues
to tick. Is he in a (possibly elastic) accelerating spaceship, or is he on an asteroid?
By now fully alert he determines that plumb bobs on either side of the cabin point
towards a spot some ten miles away. He instantly concludes that he is either on an
asteroid or that opposite sides of his cabin are accelerating away from one another.
Moments later he verifies that his bobs have not moved apart. Hence he must be in
the gravitational field of an asteroid.
Exercise (16):
What would he have concluded if he had found that his bobs pointed away from
a spot thirty yards distant?
This example shows that a gravitational field is generally not equivalent to an
accelerating frame of reference. From the Principle of Equivalence we merely conclude
that physics in an accelerating frame of reference must look like physics in a particular
type of gravitational field. However, this observation suggests a strategy for discovering
how things behave in a strong gravitational field: we first work out the equations
governing motion in the absence of a gravitational field (which we understand) when
referred to a non-inertial frame of reference. This is a purely mathematical exercise.
The equations we derive will contain terms associated with pseudo-forces generated by
our accelerating frame of reference. Since there is really no gravitational field present,
these pseudo-force terms will be restricted in form. The plan is to obtain equations for
physics in the presence of a true gravitational field by lifting these restrictions.
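where we have defined the observer’s 4-velocity $v^\mu \equiv dx^\mu/d\tau$. Since by the chain rule
$$\frac{\partial}{\partial x^\mu} = \frac{\partial x'^\nu}{\partial x^\mu}\frac{\partial}{\partial x'^\nu} \tag{5.1}$$
we have
$$\frac{d\psi}{d\tau} = v^\mu\frac{\partial x'^\nu}{\partial x^\mu}\frac{\partial\psi}{\partial x'^\nu}.$$
If we define the observer’s 4-velocity in the non-inertial primed frame to be
$$v'^\nu \equiv \frac{\partial x'^\nu}{\partial x^\mu}v^\mu, \tag{5.2}$$
then we may write
$$\frac{d\psi}{d\tau} = v'^\nu\frac{\partial\psi}{\partial x'^\nu}.$$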
A natural extension of this argument leads us to define the primed components of any up vector $A^\mu$ as given in terms of the un-primed components by
$$A'^\nu \equiv \frac{\partial x'^\nu}{\partial x^\mu}A^\mu. \tag{5.3}$$
Note that if the primed frame were inertial, we would have $x'^\nu = x_0^\nu + \Lambda^\nu{}_\mu x^\mu$ ($x_0^\nu$ a constant 4-vector), so that $\partial x'^\nu/\partial x^\mu = \Lambda^\nu{}_\mu$ and the transformation (5.3) would reduce to a standard Lorentz transformation of an up vector.
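If $v^\mu$ and $u^\mu$ are two up vectors, all inertial observers will agree on the value of the scalar
$$s \equiv \eta_{\mu\nu}u^\mu v^\nu. \tag{5.4}$$
How can we recover this number from the primed components $v'^\mu$ and $u'^\mu$? First we express $v^\mu$ in terms of $v'^\mu$. We use the chain rule to express $dx'^\mu$ as
$$dx'^\mu = \frac{\partial x'^\mu}{\partial x^\nu}dx^\nu. \tag{5.5}$$
Dividing by $dx'^\kappa$ and proceeding to the limit $dx'^\kappa \to 0$ at fixed values of all the other coordinates, we get
$$\delta^\mu_\kappa = \frac{\partial x'^\mu}{\partial x'^\kappa} = \frac{\partial x'^\mu}{\partial x^\nu}\frac{\partial x^\nu}{\partial x'^\kappa}. \tag{5.6}$$
Thus the matrix $\partial x^\nu/\partial x'^\kappa$ is the inverse of the matrix $\partial x'^\mu/\partial x^\nu$. Premultiplying equation (5.2) by this matrix we solve for $v^\mu$:
$$v^\mu = \frac{\partial x^\mu}{\partial x'^\nu}v'^\nu. \tag{5.7}$$
Using this relation to eliminate the unprimed components from (5.4) we get
$$s = \Bigl(\eta_{\mu\nu}\frac{\partial x^\mu}{\partial x'^\kappa}\frac{\partial x^\nu}{\partial x'^\lambda}\Bigr)u'^\kappa v'^\lambda.$$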
Introduction to Tensors in General Relativity 43
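If we define
$$g'_{\kappa\lambda} \equiv \eta_{\mu\nu}\frac{\partial x^\mu}{\partial x'^\kappa}\frac{\partial x^\nu}{\partial x'^\lambda}, \tag{5.8}$$
we have
$$s = g'_{\kappa\lambda}u'^\kappa v'^\lambda. \tag{5.9}$$
Like $\eta_{\kappa\lambda}$ the general metric tensor $g'_{\kappa\lambda}$ is symmetric; $g'_{\kappa\lambda} = g'_{\lambda\kappa}$. However, it is not necessarily diagonal. It is called the metric tensor because it allows us to calculate the lengths of vectors such as $v'^\lambda$.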
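We may use $g'_{\kappa\lambda}$ to lower indices;
$$v'_\kappa \equiv g'_{\kappa\lambda}v'^\lambda. \tag{5.10}$$
Let $g'^{\mu\nu}$ be the tensor which raises indices. Then in order that the operations of raising and lowering be mutual inverses we require that for all $v'^\mu$
$$\delta^\mu_\lambda v'^\lambda = v'^\mu = g'^{\mu\kappa}g'_{\kappa\lambda}v'^\lambda,$$
i.e. that $g'^{\mu\kappa}g'_{\kappa\lambda} = \delta^\mu_\lambda$ and hence that $g'^{\mu\kappa}$ is the inverse of $g'_{\kappa\lambda}$.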
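Exercise (17):
Show that this definition of $g'^{\mu\kappa}$ is equivalent to the definition
$$g'^{\kappa\lambda} = \frac{\partial x'^\kappa}{\partial x^\mu}\frac{\partial x'^\lambda}{\partial x^\nu}\eta^{\mu\nu}. \tag{5.11}$$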
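The relations (5.8) to (5.11) are easy to try out on a toy frame. The sketch below works in 1+1 dimensions and assumes a hypothetical accelerating frame defined by $x^0 = x'^0$, $x^1 = x'^1 + \tfrac12 k (x'^0)^2$ with $k = a/c^2$; this frame, the helper names and the sample vectors are illustrative choices rather than anything taken from the notes.

```python
ETA = [[-1.0, 0.0], [0.0, 1.0]]  # Minkowski metric in 1+1 dimensions, signature (-,+)

def matmul(A, B):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def transpose(A):
    return [[A[j][i] for j in range(2)] for i in range(2)]

def inverse(A):
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det], [-A[1][0] / det, A[0][0] / det]]

def matvec(A, v):
    return [sum(A[i][k] * v[k] for k in range(2)) for i in range(2)]

def dot(g, u, v):
    """The scalar g_{kappa lambda} u^kappa v^lambda."""
    return sum(g[i][j] * u[i] * v[j] for i in range(2) for j in range(2))

# Hypothetical frame: x^0 = x'^0, x^1 = x'^1 + (k/2) (x'^0)^2, evaluated at one event.
k, x0_primed = 0.3, 2.0
J = [[1.0, 0.0], [k * x0_primed, 1.0]]  # J[mu][kappa] = dx^mu / dx'^kappa
K = inverse(J)                           # K[kappa][mu] = dx'^kappa / dx^mu

g_lower = matmul(transpose(J), matmul(ETA, J))  # equation (5.8)
g_upper = matmul(K, matmul(ETA, transpose(K)))  # equation (5.11); eta^{mu nu} has the same entries as ETA
product = matmul(g_upper, g_lower)               # should be the identity matrix

u, v = [1.0, 0.2], [1.0, -0.5]                   # unprimed components of two up vectors
s_inertial = dot(ETA, u, v)                      # equation (5.4)
s_primed = dot(g_lower, matvec(K, u), matvec(K, v))  # equation (5.9) with (5.3)

print(g_lower)
print(product)
print(s_inertial, s_primed)
```

In this example `g_lower` is symmetric but not diagonal, `product` is the unit matrix to rounding error, and the two values of $s$ agree, which is the numerical counterpart of Exercise (17).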
we ensure that the primed observer will be able to calculate the scalar quantities $F^{\mu\nu}v_\mu u_\nu$ and $G_{\mu\nu}v^\mu u^\nu$ from primed quantities. The generalization to tensors of arbitrary rank is obvious.
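Exercise (18):
Show that if $x'^\mu$ and $x''^\mu$ are two non-inertial frames, the transformation rules
$$v''^\mu = \frac{\partial x''^\mu}{\partial x'^\nu}v'^\nu;\qquad v''_\mu = \frac{\partial x'^\nu}{\partial x''^\mu}v'_\nu \tag{5.13a}$$
$$F''^{\mu\nu} = \frac{\partial x''^\mu}{\partial x'^\kappa}\frac{\partial x''^\nu}{\partial x'^\lambda}F'^{\kappa\lambda}\quad\text{etc} \tag{5.13b}$$
apply.
[Hint: divide (5.5) by $dx''^\kappa$ to obtain a relation equivalent to $\dfrac{\partial x''^\mu}{\partial x^\kappa}\dfrac{\partial x^\kappa}{\partial x'^\nu} = \dfrac{\partial x''^\mu}{\partial x'^\nu}$.]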
44 Chapter 5: Tensors in General Relativity
Notice that there is an easy way to figure out whether to multiply by $\partial x^\mu/\partial x'^\nu$ or by $\partial x'^\mu/\partial x^\nu$ when transforming an object $G^{\mu\ldots}$ or $G_{\mu\ldots}$: if the primes are up on the left, put them up on the right by using $\partial x'^\mu/\partial x^\nu$; if the unprimes are up on the left, put them on top on the right with $\partial x^\mu/\partial x'^\nu$. The other kind of index in the equation will “cancel out” just as in ordinary multiplication of fractions. These rules extend in the obvious way to down vectors.
The metric tensor $g'_{\mu\nu}$ enables us to calculate the length $s$ of any curve $x'^\mu(\lambda)$ in space-time:
$$s \equiv \int_a^b d\lambda\,\sqrt{\Bigl|g'_{\mu\nu}\frac{dx'^\mu}{d\lambda}\frac{dx'^\nu}{d\lambda}\Bigr|}. \tag{5.14}$$
If the curve is time-like, $s$ is just $c$ times the time $\Delta\tau$ that elapses on the watch of the observer whose trajectory is $x'^\mu(\lambda)$. If there is an inertial frame in which all the points on the curve have the same value of $x^0$, $s$ coincides with the length of the curve as measured with meter rules etc by an observer who is stationary in that privileged frame. We shall call $s$ the affine parameter along the curve and use it to characterize points on the curve; hence we write $x'^\mu(s)$.
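As a concrete use of (5.14), the sketch below evaluates $s$ numerically for one time-like world-line described in two coordinate systems. It assumes 1+1 dimensions, the inertial metric ${\rm diag}(-1,1)$, and a hypothetical primed frame with $x^0 = x'^0$, $x^1 = x'^1 + \tfrac12 K (x'^0)^2$, whose metric follows from (5.8); `BETA`, `K` and the function names are illustrative.

```python
import math

def curve_length(metric, curve, a, b, steps=2000, eps=1e-6):
    """Midpoint-rule estimate of equation (5.14).

    metric(x) returns g_{mu nu} at the point x; curve(lam) returns x^mu(lam).
    The tangent dx/dlam is approximated by a central difference.
    """
    h = (b - a) / steps
    total = 0.0
    for i in range(steps):
        lam = a + (i + 0.5) * h
        x = curve(lam)
        ahead, behind = curve(lam + eps), curve(lam - eps)
        dx = [(p - m) / (2 * eps) for p, m in zip(ahead, behind)]
        g = metric(x)
        q = sum(g[m][n] * dx[m] * dx[n] for m in range(len(dx)) for n in range(len(dx)))
        total += math.sqrt(abs(q)) * h
    return total

BETA, K = 0.6, 0.3  # speed in units of c, and a/c^2 for the hypothetical frame

def eta(x):
    return [[-1.0, 0.0], [0.0, 1.0]]

def g_primed(xp):
    # Metric of the primed frame obtained from (5.8) for the assumed transformation.
    w = K * xp[0]
    return [[-1.0 + w * w, w], [w, 1.0]]

def worldline_inertial(lam):
    return [lam, BETA * lam]                          # x^1 = beta x^0

def worldline_primed(lam):
    return [lam, BETA * lam - 0.5 * K * lam * lam]    # same events, primed coordinates

s_inertial = curve_length(eta, worldline_inertial, 0.0, 1.0)
s_primed = curve_length(g_primed, worldline_primed, 0.0, 1.0)
print(s_inertial, s_primed, math.sqrt(1.0 - BETA ** 2))
```

Both estimates should agree with $\sqrt{1-\beta^2} = 0.8$, the familiar time-dilation factor, even though the primed metric is neither constant nor diagonal.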
In the case $A = 0$, the first line of eq (5.17) is exactly what we would get if we applied the EL equation to $\int d\tau\,g'_{\mu\nu}(dx'^\mu/d\tau)(dx'^\nu/d\tau)$. This fact is worth remembering as it often provides the easiest way to calculate the Christoffel symbols, which are the coefficients of products of velocity components when the derivatives in (5.17) are worked through. Note, however, that we have no a priori justification for applying the EL eqn to this integral; the procedure is just an algebraic trick that is justified by our derivation of (5.17).
Notice the pattern of this important formula: the three terms in $(\ldots)$ are just the first derivative of $g$ with the indices cyclically permuted. The minus sign attaches to the term which groups the indices in the same way as $\Gamma$. Now multiplying equation (5.19) through by $g'^{\alpha\beta}$ and writing $v'^\mu \equiv dx'^\mu/d\tau$, we can write it
$$\frac{dv'^\alpha}{d\tau} = -\Gamma'^\alpha_{\mu\nu}v'^\mu v'^\nu + \frac{q}{m_0}F'^\alpha{}_\mu v'^\mu. \tag{5.21}$$
This equation relates the apparent acceleration in our non-inertial frame to the e.m.
force given by the second term on the right, and a pseudo-force given by the first term.
The principle of equivalence suggests that gravitational forces will take the same form
as pseudo-forces. Thus Γ should play the same role for the gravitational field as F
does for the e.m. field. Notice that where the e.m. force is obtained by contracting F
with v, the gravitational force is obtained by contracting Γ with two copies of v: in
quantum mechanics it follows from this that whereas photons are spin-one particles,
gravitons (likely to be detected within 5 years!) are spin-two particles. As is required
by the principle of equivalence, the charge-to-mass ratio for gravity is unity.
Γ is called the Christoffel symbol. From its definition (5.20) it is symmetric
in its first two indices. Γ cannot be a tensor since all its components are zero in an
inertial frame, so if it transformed like a third-rank tensor, all its components would
be zero in any coordinate system. Notice from (5.20) that the relationship between Γ
and g mirrors the relationship between F and A; that is, g is the potential for gravity
in the same way that A is the potential for electromagnetism.
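Below we will find it useful to have an expression for $\Gamma$ in terms of double derivatives of the inertial coordinates with respect to the non-inertial ones. From (5.8) and (5.18) we have
$$2\Gamma'_{\mu\nu,\beta} = \eta_{\kappa\lambda}\Bigl\{\frac{\partial}{\partial x'^\nu}\Bigl(\frac{\partial x^\kappa}{\partial x'^\beta}\frac{\partial x^\lambda}{\partial x'^\mu}\Bigr) + \frac{\partial}{\partial x'^\mu}\Bigl(\frac{\partial x^\kappa}{\partial x'^\beta}\frac{\partial x^\lambda}{\partial x'^\nu}\Bigr) - \frac{\partial}{\partial x'^\beta}\Bigl(\frac{\partial x^\kappa}{\partial x'^\mu}\frac{\partial x^\lambda}{\partial x'^\nu}\Bigr)\Bigr\}.$$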
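When we differentiate these products, the two terms generated by the last product will each be cancelled by a term generated by one of the first two products. The two remaining terms will be identical. Thus we have
$$\Gamma'_{\mu\nu,\beta} = \eta_{\kappa\lambda}\frac{\partial x^\kappa}{\partial x'^\beta}\frac{\partial^2x^\lambda}{\partial x'^\mu\partial x'^\nu}. \tag{5.22}$$
Raising the last index, we obtain
$$\Gamma'^\alpha_{\mu\nu} \equiv g'^{\alpha\beta}\Gamma'_{\mu\nu,\beta} = \eta^{\delta\epsilon}\frac{\partial x'^\alpha}{\partial x^\delta}\frac{\partial x'^\beta}{\partial x^\epsilon}\,\eta_{\kappa\lambda}\frac{\partial x^\kappa}{\partial x'^\beta}\frac{\partial^2x^\lambda}{\partial x'^\mu\partial x'^\nu} = \frac{\partial x'^\alpha}{\partial x^\lambda}\frac{\partial^2x^\lambda}{\partial x'^\mu\partial x'^\nu}. \tag{5.23}$$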
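The final form of (5.23) involves only first and second derivatives of the coordinate transformation, so it can be evaluated directly. As a minimal sketch, the code below uses plane polar coordinates $x'^\mu = (r,\theta)$ on the flat Euclidean plane as a stand-in for a non-inertial system, with Cartesian $(x,y)$ playing the role of the inertial coordinates; this substitution and the function name are assumptions made for illustration.

```python
import math

def christoffel_polar(r, theta):
    """Gamma'^alpha_{mu nu} from (5.23) for x = r cos(theta), y = r sin(theta).

    Index convention: 0 = r, 1 = theta for primed indices; 0 = x, 1 = y otherwise.
    """
    c, s = math.cos(theta), math.sin(theta)
    # K[alpha][lam] = d x'^alpha / d x^lam (inverse of the Jacobian dx/dx').
    K = [[c, s], [-s / r, c / r]]
    # H[lam][mu][nu] = d^2 x^lam / d x'^mu d x'^nu.
    H = [
        [[0.0, -s], [-s, -r * c]],  # second derivatives of x
        [[0.0, c], [c, -r * s]],    # second derivatives of y
    ]
    return [
        [[sum(K[al][lam] * H[lam][mu][nu] for lam in range(2)) for nu in range(2)]
         for mu in range(2)]
        for al in range(2)
    ]

gamma = christoffel_polar(r=2.0, theta=0.7)
print("Gamma^r_{theta theta} =", gamma[0][1][1])   # expected to equal -r
print("Gamma^theta_{r theta} =", gamma[1][0][1])   # expected to equal 1/r
```

The two non-zero symbols reproduce the textbook polar-coordinate values $-r$ and $1/r$, and the result is symmetric in its lower indices because the second derivatives in (5.23) commute.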
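$$\begin{aligned}
\dot A^\mu &= \frac{dx'^\kappa}{ds}\frac{\partial}{\partial x'^\kappa}\Bigl(\frac{\partial x^\mu}{\partial x'^\alpha}A'^\alpha\Bigr)\\
&= \frac{dx'^\kappa}{ds}\Bigl(\frac{\partial x^\mu}{\partial x'^\alpha}\frac{\partial A'^\alpha}{\partial x'^\kappa} + \frac{\partial^2x^\mu}{\partial x'^\kappa\partial x'^\alpha}A'^\alpha\Bigr).
\end{aligned}$$
Finally, premultiplying by $\partial x'^\nu/\partial x^\mu$ and using (5.23) we get
$$\dot A'^\nu \equiv \frac{\partial x'^\nu}{\partial x^\mu}\dot A^\mu = \frac{dx'^\kappa}{ds}\Bigl(\frac{\partial A'^\nu}{\partial x'^\kappa} + \Gamma'^\nu_{\kappa\alpha}A'^\alpha\Bigr). \tag{5.25}$$
(Notice that $\dot A'^\nu$, the $\nu$th component in the primed system of the vector $\dot A$, is defined by this equation. It must not be confused with the rate of change with $s$ of the $\nu$th component of $A'$. In (5.21), by contrast, $dv'^\alpha/d\tau$ is just the rate of change of the number $v'^\alpha$.) If we define a new type of derivative, the covariant derivative by
$$A'^\nu{}_{;\kappa} \equiv \nabla_\kappa A'^\nu \equiv \frac{\partial A'^\nu}{\partial x'^\kappa} + \Gamma'^\nu_{\kappa\alpha}A'^\alpha, \tag{5.26}$$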
5.2 Covariant differentiation 47
The second term in the definition (5.26) of the covariant derivative has the following physical interpretation. For each value of $\kappa$, say $\kappa = 1$, we have a matrix $\Gamma'^\nu_{1\alpha}$. When we multiply this matrix by $\delta x^1$ we obtain the Lorentz transformation matrix $\Lambda$ which tells us by how much the speed and orientation of the frame used at $x$ differs from that used at $(x^0, x^1 + \delta x^1, x^2, x^3)$.⁷
If $A$ is really the same all along the curve, and only seems to change because we are using a non-inertial coordinate system, we have $\dot A'^\nu = 0$, and thus that the “gradient” $\nabla_\kappa A'^\nu$ of $A'^\nu$ either vanishes or is “perpendicular” to the direction $dx'^\kappa/ds$ in which we are moving.
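How does $\nabla$ operate on down vectors? Consider
$$\begin{aligned}
\frac{d}{ds}(A'^\mu B'_\mu) &= \frac{dx'^\kappa}{ds}\frac{\partial}{\partial x'^\kappa}(A'^\mu B'_\mu)\\
&= \frac{dx'^\kappa}{ds}\Bigl[\Bigl(\frac{\partial A'^\mu}{\partial x'^\kappa}\Bigr)B'_\mu + A'^\mu\Bigl(\frac{\partial B'_\mu}{\partial x'^\kappa}\Bigr)\Bigr]\\
&= \frac{dx'^\kappa}{ds}\Bigl[\bigl(\nabla_\kappa A'^\mu\bigr)B'_\mu - \Gamma'^\mu_{\kappa\alpha}A'^\alpha B'_\mu + A'^\mu\frac{\partial B'_\mu}{\partial x'^\kappa}\Bigr]\\
&= \frac{dx'^\kappa}{ds}\Bigl[\bigl(\nabla_\kappa A'^\mu\bigr)B'_\mu + A'^\mu\frac{\partial B'_\mu}{\partial x'^\kappa} - \Gamma'^\alpha_{\kappa\mu}A'^\mu B'_\alpha\Bigr].
\end{aligned}$$
This suggests that we define
$$\nabla_\kappa B'_\mu \equiv \frac{\partial B'_\mu}{\partial x'^\kappa} - \Gamma'^\alpha_{\kappa\mu}B'_\alpha \tag{5.28}$$
for then we will have $\nabla_\kappa(A'^\mu B'_\mu) = B'_\mu\nabla_\kappa A'^\mu + A'^\mu\nabla_\kappa B'_\mu$ and $\nabla$ will operate on such products like any other derivative operator.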
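The same argument applied to quantities like $G'_{\mu\nu}A'^\mu B'^\nu$ leads to the rules
$$G'_{\mu\nu;\kappa} \equiv \nabla_\kappa G'_{\mu\nu} \equiv \frac{\partial G'_{\mu\nu}}{\partial x'^\kappa} - \Gamma'^\alpha_{\kappa\mu}G'_{\alpha\nu} - \Gamma'^\alpha_{\kappa\nu}G'_{\mu\alpha} \tag{5.29a}$$
$$G'^{\mu\nu}{}_{;\kappa} \equiv \nabla_\kappa G'^{\mu\nu} \equiv \frac{\partial G'^{\mu\nu}}{\partial x'^\kappa} + \Gamma'^\mu_{\kappa\alpha}G'^{\alpha\nu} + \Gamma'^\nu_{\kappa\alpha}G'^{\mu\alpha}. \tag{5.29b}$$
Notice that each index requires a $\Gamma$-symbol, with a plus or a minus sign according as the index is up or down.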
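A quick way to see the covariant derivative (5.26) at work is to apply it to a vector field that is genuinely constant. The sketch below takes a constant Cartesian vector in the flat plane, expresses it in polar coordinates $(r,\theta)$ via (5.3), and checks that $\nabla_\kappa A'^\nu$ vanishes even though the partial derivatives alone do not. The polar Christoffel symbols $\Gamma^r_{\theta\theta} = -r$ and $\Gamma^\theta_{r\theta} = \Gamma^\theta_{\theta r} = 1/r$ are standard results quoted here as an assumption; the names and sample values are illustrative.

```python
import math

def polar_components(r, theta, ax=1.5, ay=-0.4):
    """Primed (polar) components of the constant Cartesian vector (ax, ay), from (5.3)."""
    c, s = math.cos(theta), math.sin(theta)
    return [ax * c + ay * s, (-ax * s + ay * c) / r]

def christoffel(r):
    """G[nu][kappa][alpha] for plane polar coordinates; index 0 = r, 1 = theta."""
    G = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
    G[0][1][1] = -r
    G[1][0][1] = G[1][1][0] = 1.0 / r
    return G

def derivatives(r, theta, h=1e-5):
    """Return (partial, covariant), each indexed [kappa][nu], following (5.26)."""
    point = [r, theta]
    G = christoffel(r)
    A = polar_components(r, theta)
    partial = [[0.0, 0.0], [0.0, 0.0]]
    covariant = [[0.0, 0.0], [0.0, 0.0]]
    for kappa in range(2):
        plus, minus = point[:], point[:]
        plus[kappa] += h
        minus[kappa] -= h
        A_plus, A_minus = polar_components(*plus), polar_components(*minus)
        for nu in range(2):
            partial[kappa][nu] = (A_plus[nu] - A_minus[nu]) / (2 * h)
            connection = sum(G[nu][kappa][al] * A[al] for al in range(2))
            covariant[kappa][nu] = partial[kappa][nu] + connection
    return partial, covariant

partial, covariant = derivatives(2.0, 0.7)
max_partial = max(abs(x) for row in partial for x in row)
max_covariant = max(abs(x) for row in covariant for x in row)
print(max_partial, max_covariant)
```

The partial derivatives are of order unity because the polar basis turns from point to point, while the covariant derivative is zero to within the finite-difference error.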
7 In “gauge field theories” this idea is generalized to define covariant derivatives for objects $\psi$ that live in spaces other than space-time. In the simplest case $\psi$ lives in the two-dimensional space of complex numbers, for which the analogue of a Lorentz transformation is multiplication by another complex number, say $iqA_1$. The covariant derivative is now $D_\mu \equiv \partial_\mu + iqA_\mu$. If $\psi$ is the wavefunction of a spin-zero particle of charge $q$, $A_\mu$ proves to be the regular e.m. potential.
Exercise (20):
Prove that $A'_{\mu;\nu} - A'_{\nu;\mu} = A'_{\mu,\nu} - A'_{\nu,\mu}$.
Einstein’s idea 49
5.3 Summary
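The rules for transforming between non-inertial frames are the same as those for making regular Lorentz transformations with the substitutions:
The Minkowski metric $\eta$ is replaced by the metric tensor $g$, which remains symmetric but is no longer its own inverse; consequently the up-up and down-down forms of $g$ are in general distinct.
In a non-inertial frame $x$ the partial derivative operator $\partial_\mu \equiv \partial/\partial x^\mu$ should be replaced with the covariant derivative operator $\nabla_\mu$:
$$\nabla_\mu\psi = \partial_\mu\psi$$
$$A^\nu{}_{;\mu} \equiv \nabla_\mu A^\nu = \partial_\mu A^\nu + \Gamma^\nu_{\mu\alpha}A^\alpha;\qquad \nabla_\mu B^{\nu\lambda} = \partial_\mu B^{\nu\lambda} + \Gamma^\nu_{\mu\alpha}B^{\alpha\lambda} + \Gamma^\lambda_{\mu\alpha}B^{\nu\alpha}$$
$$\nabla_\mu A_\nu = \partial_\mu A_\nu - \Gamma^\alpha_{\mu\nu}A_\alpha;\qquad \nabla_\mu B_{\nu\lambda} = \partial_\mu B_{\nu\lambda} - \Gamma^\alpha_{\mu\nu}B_{\alpha\lambda} - \Gamma^\alpha_{\mu\lambda}B_{\nu\alpha}$$
$$\Gamma^\mu_{\alpha\beta} = \tfrac12 g^{\mu\nu}\Bigl(\frac{\partial g_{\nu\alpha}}{\partial x^\beta} + \frac{\partial g_{\beta\nu}}{\partial x^\alpha} - \frac{\partial g_{\alpha\beta}}{\partial x^\nu}\Bigr).$$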
that couples the inertial and non-inertial systems. But we have found formulae (5.13) and (5.20) which enable us to calculate the values $g''_{\mu\nu}$ etc of all needful quantities in a second non-inertial coordinate system without reference back to the inertial system $x$.
Since we shall no longer need to refer constantly to an inertial system, we now
drop the convention that the unprimed system x is inertial; from here on all systems
are to be assumed to be non-inertial unless explicitly specified as inertial.
The principle of equivalence suggests that a gravitational field will look very much
like a pseudo-force in an accelerating frame of reference. The Christoffel symbol Γ gen-
erates the pseudo-force associated with an accelerating frame, so when a gravitational
field is present Γ will play the role of the Newtonian force f . We have identified the
metric g as the relativistic generalization of the Newtonian potential Φ on the ground
that Γ can be written in terms of derivatives of g just as f = −∇Φ.
In Newton’s theory f and Φ are related to the density ρ of gravitating matter via
Poisson’s equation (4.5). The relativistic generalization of (4.5) should be a second-
order p.d.e. in g, or equivalently, a first-order p.d.e. in Γ. What is this equation?
Since we can make Γ as large as we like simply by choosing a perverse coordinate
system, it is clear that the trick in finding suitable field equations is to find a differential
operator on Γ which differentiates away all the contribution to Γ that is caused by
mere perversity of the coordinate system. The key to finding this operator proves to
be an examination of the geometrical relationships between the lengths of lines and
the magnitudes of angles between lines.
We have seen that the metric tensor enables us to define the length of any curve
in space-time, and in particular to determine through (5.32) which curves $x'(s)$ are
straight. Now suppose we draw a straight line in a portion of space in which there is
no gravitational field and then draw a unit circle around some point on this line. Then
no matter what coordinate system we use for the calculations, we shall find that the
length s of the circumference is exactly π = 3.14159 . . . times the length of the circle’s
diameter. How come? By changing the coordinate system we can change g at any given
point to almost any value [see (5.8)]. So how come that when we evaluate the integral
(5.14) over two completely different sets of points, we always get answers in the same
ratio? It must be that g at one point is not independent of g at neighbouring points:
g must satisfy some differential equation. Einstein’s idea, and it was pure magic, was
that it is this differential equation which tells us that there is no gravitational field
present, only a perverse coordinate system. Let us find this differential equation.
There are many geometrical relationships in addition to the one just discussed
which g must furnish if there is no gravitational field present. For example, there
are 180◦ in a triangle. But the key to the equation we are seeking turns out to be
something slightly odd. It is to consider what happens when we slide a vector around
a closed curve while being careful not to rotate the vector. If we do this on a table, the
vector (a pencil, say) will be back in its old configuration at the end of the experiment:
6.1 The curvature tensor 51
In fact, on a sphere of radius $r$, the angle through which a pencil rotates on being “parallel-transported” around a curve is equal to the area enclosed by this curve divided by $r^2$.
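This statement can be tested numerically on the unit sphere. The sketch below parallel-transports a vector once around a circle of constant polar angle $\theta_0$ by integrating $dA^\nu/d\phi = -\Gamma^\nu_{\phi\alpha}A^\alpha$ with a fourth-order Runge-Kutta step, then compares the resulting rotation angle with the area of the enclosed polar cap. The Christoffel symbols of the sphere, $\Gamma^\theta_{\phi\phi} = -\sin\theta\cos\theta$ and $\Gamma^\phi_{\theta\phi} = \cot\theta$, are standard results assumed here, and the choice $\theta_0 = 0.5$ is arbitrary.

```python
import math

def transport_around_latitude(theta0, steps=2000):
    """Parallel-transport a vector once around the circle theta = theta0 on the unit sphere.

    Starts from the unit vector along the theta direction and returns the final
    vector as components in the orthonormal basis (e_theta, e_phi).
    """
    s, c = math.sin(theta0), math.cos(theta0)

    def rhs(A):
        # dA^theta/dphi = -Gamma^theta_{phi phi} A^phi,  dA^phi/dphi = -Gamma^phi_{phi theta} A^theta
        return (s * c * A[1], -(c / s) * A[0])

    A = (1.0, 0.0)  # coordinate components (A^theta, A^phi)
    h = 2.0 * math.pi / steps
    for _ in range(steps):
        k1 = rhs(A)
        k2 = rhs((A[0] + 0.5 * h * k1[0], A[1] + 0.5 * h * k1[1]))
        k3 = rhs((A[0] + 0.5 * h * k2[0], A[1] + 0.5 * h * k2[1]))
        k4 = rhs((A[0] + h * k3[0], A[1] + h * k3[1]))
        A = (
            A[0] + h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0,
            A[1] + h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0,
        )
    return (A[0], s * A[1])  # convert A^phi to an orthonormal component

theta0 = 0.5
final = transport_around_latitude(theta0)
# The initial orthonormal vector is (1, 0); transport preserves length,
# so the angle between initial and final vectors follows from the first component.
rotation_angle = math.acos(max(-1.0, min(1.0, final[0])))
cap_area = 2.0 * math.pi * (1.0 - math.cos(theta0))  # area enclosed on the unit sphere
print(rotation_angle, cap_area)
```

For this loop the enclosed area is less than $\pi$, so the angle recovered from the arccosine can be compared directly with the cap area; the two numbers should agree closely.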
In this integral both $\Gamma^\mu_{\alpha\beta}$ and $A^\beta$ are functions of $s$ through $x(s)$. However, if we consider only infinitesimal loops we may expand each component of $\Gamma$ and $A$ in power
52 Chapter 6: Gravity, Geometry & the Einstein Field Equations
We define the directed area enclosed by the loop to be the antisymmetric tensor
$$\Delta S^{\nu\alpha} \equiv \oint x^\nu\,dx^\alpha. \tag{6.6}$$
6.2 Derivation of the Field Equations 53
$$R^\mu{}_{\gamma\alpha\nu} = 0. \tag{6.10}$$
$$\text{constant}\times T_{\alpha\beta}.$$
This has only two indices, whereas the left of (6.10) has four indices. Hence we must
either use g (which is the only generally available tensor) to add two more indices on
8 This is a lie, as the discussion of 6-tuples in §2.5 shows. However, the argument can be fixed up
by adding the changes ∆A around two non-coplanar paths.
the right, or we must contract away two indices on the left. It is not hard to see that
these two courses are equivalent. We do it the second way.
Which two indices should we contract? Well, from the defining expression (6.8) one may show that $R_{\mu\nu\alpha\beta}$ has the following symmetries:
In words: $R$ is symmetric on interchange of the first pair of indices with the second pair, and antisymmetric under interchange of the indices within each of these pairs. Thus we get zero if we contract within any pair, and the same answer (to within a sign) if we contract between pairs. It is conventional to define the Ricci tensor by
$$R_{\alpha\beta} \equiv R^\mu{}_{\alpha\mu\beta}. \tag{6.12}$$
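Note:
In terms of $\Gamma$, $R_{\alpha\beta}$ is by (6.8)
$$R_{\alpha\beta} = \frac{\partial\Gamma^\mu_{\mu\alpha}}{\partial x^\beta} - \frac{\partial\Gamma^\mu_{\alpha\beta}}{\partial x^\mu} + \Gamma^\lambda_{\alpha\mu}\Gamma^\mu_{\beta\lambda} - \Gamma^\mu_{\lambda\mu}\Gamma^\lambda_{\alpha\beta}. \tag{6.13a}$$
Furthermore, by (5.20)
$$\Gamma^\mu_{\alpha\mu} = \Gamma^\mu_{\mu\alpha} = \tfrac12 g^{\mu\nu}\frac{\partial g_{\mu\nu}}{\partial x^\alpha}. \tag{6.13b}$$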
While $R_{\alpha\beta}$ has the right number of indices to go on the left of our field equations, the law we seek is not $R_{\alpha\beta} = T_{\alpha\beta}$ because mass-energy conservation is expressed by the vanishing of the covariant divergence of $T$. Hence whatever goes on the left of our field equations must have zero divergence. Unfortunately, the divergence of $R_{\alpha\beta}$ is not always zero. However, it turns out that (see Appendix B)
$$R_\alpha{}^\beta{}_{;\beta} = \tfrac12 R_{;\alpha}, \tag{6.14}$$
$$R \equiv R_\beta{}^\beta. \tag{6.15}$$
From (6.14) it follows that a tensor made out of $R^\mu{}_{\nu\alpha\beta}$ which has zero divergence is
$G$ is called the Einstein tensor because the p.d.e.’s which describe the generation of a gravitational field by matter are
$$G^{\alpha\beta} = -\frac{8\pi G}{c^4}T^{\alpha\beta}. \tag{6.17}$$
6.3 The Newtonian Limit 55
$$T^{\mu\nu} = \rho_0v^\mu v^\nu. \tag{6.20}$$
Since the gravitational field is assumed very weak, we can find a nearly inertial
coordinate system. In this system
Consider the equation of motion (5.21) to which this gives rise for a non-relativistic free particle ($a'^\mu = 0$). The motion is governed by a gravitational force
$$f^\mu = -\Gamma^\mu_{\alpha\beta}v^\alpha v^\beta, \tag{6.23}$$
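$$\gamma\frac{d}{dt}\Bigl(\gamma\frac{dx^j}{dt}\Bigr) = -\gamma^2c^2\Gamma^j_{00} \simeq -c^2\,\tfrac12\Bigl(2\frac{\partial h_{j0}}{\partial x^0} - \frac{\partial h_{00}}{\partial x^j}\Bigr).$$
If the field is stationary in our chosen coordinate system (and we are free to boost until it is), then $\partial h_{j0}/\partial x^0 = 0$ and to leading order in $v/c$
$$\frac{d^2x^j}{dt^2} = \frac{\partial}{\partial x^j}\bigl(\tfrac12c^2h_{00}\bigr). \tag{6.25}$$
$$\Phi = -\tfrac12c^2h_{00}, \tag{6.26}$$
$$R^{00} = R_{00} = \tfrac12\nabla^2h_{00} = -\frac{1}{c^2}\nabla^2\Phi.$$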
6.4 Summary
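The curvature tensor $R^\mu{}_{\nu\alpha\beta}$ tells us by how much a vector changes on being parallel transported around a small circuit. Hence we detect the use of a crazy coordinate system for flat space-time by seeing if the curvature tensor $R = 0$. If $R \neq 0$ there is a true gravitational field.
The presence of matter at $x$ is signalled by $R_{\alpha\beta}(x) \equiv R^\mu{}_{\alpha\mu\beta}(x) \neq 0$.
Formally, there is a far-reaching analogy between g.r. and e.m.:
Parallelism of e.m. and g.r.
$A_\mu$ ↔ $g_{\mu\nu}$
$F_{\mu\nu} = -(A_{\mu,\nu} - A_{\nu,\mu})$ ↔ $\Gamma_{\mu,\alpha\beta} = \tfrac12(g_{\mu\alpha,\beta} + g_{\mu\beta,\alpha} - g_{\alpha\beta,\mu})$
$f^\mu = \dfrac{q}{m_0}F^\mu{}_\alpha v^\alpha$ ↔ $f^\mu = -\Gamma^\mu_{\alpha\beta}v^\alpha v^\beta$
$F^{\mu\nu}{}_{,\nu} = \mu_0j^\mu$ ↔ eq. (6.19)
$F^{\mu\nu,\rho} + F^{\nu\rho,\mu} + F^{\rho\mu,\nu} = 0$ ↔ $R^\kappa{}_{\lambda\mu\nu;\rho} + R^\kappa{}_{\lambda\nu\rho;\mu} + R^\kappa{}_{\lambda\rho\mu;\nu} = 0$ (Bianchi identity)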
7 Weak-field gravity
The simplest applications of GR are to weak gravitational fields, which are ubiquitous in the Universe at large as here on Earth.
$$\frac{d\tau_a}{dt} = 1 - \frac{|\Phi_a|}{c^2}. \tag{7.2}$$
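To get a feel for the size of the effect in (7.2), the sketch below evaluates it at the Earth’s surface. It assumes the point-mass potential $\Phi = -GM/R$, ignores the Earth’s rotation, and uses rounded textbook values for the constants; the function name `clock_rate` is illustrative.

```python
G_NEWTON = 6.674e-11   # m^3 kg^-1 s^-2 (rounded)
M_EARTH = 5.972e24     # kg (rounded)
R_EARTH = 6.371e6      # m, mean radius (rounded)
C_LIGHT = 2.998e8      # m/s (rounded)

def clock_rate(phi, c=C_LIGHT):
    """d(tau)/dt from equation (7.2) for a clock at rest at potential phi."""
    return 1.0 - abs(phi) / c ** 2

# Assumed model: point-mass potential at the surface, rotation neglected.
phi_surface = -G_NEWTON * M_EARTH / R_EARTH

rate = clock_rate(phi_surface)
fractional_slowing = 1.0 - rate
lag_per_day = fractional_slowing * 86400.0  # seconds lost per day relative to a distant clock

print("fractional slowing:", fractional_slowing)
print("lag per day (s):", lag_per_day)
```

The fractional slowing comes out at a little under one part in $10^9$, which corresponds to some tens of microseconds per day.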
Exercise (21):
Consider a machine which lowers boxes full of excited atoms on ropes down a well,
deexcites the atoms at the bottom, pulls the atoms back up and then reexcites the
atoms with the photons released at the bottom and beamed up to the top. Show
that this machine will violate energy conservation unless the photons’ frequencies
at top and bottom of the well satisfy (7.3).
7.2 Hydrodynamics
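\[ 0 = u^\nu u^\mu\nabla_\mu(\rho + P/c^2) + (\rho + P/c^2)\nabla_\mu(u^\mu u^\nu) + g^{\mu\nu}\frac{\partial P}{\partial x^\mu}, \tag{7.5} \]
where we’ve used (5.30). To recover the familiar equations of hydrodynamics we assume that $c \simeq u^0 \gg u^i$ and that $g^{\mu\nu} = \eta^{\mu\nu} + h^{\mu\nu}$ with $|h^{\mu\nu}| \ll 1$. Then the 00 equation may be approximated by
\[ 0 = c\,u^\mu\nabla_\mu(\rho + P/c^2) + (\rho + P/c^2)\nabla_\mu(c\,u^\mu) - \frac{\partial P}{\partial x^0}. \tag{7.6} \]
We multiply this equation by $u^i/c$ and subtract it from the $i$ equation of the set (7.5) to find
\[ 0 = (\rho + P/c^2)\,u^\mu\nabla_\mu u^i + \frac{\partial P}{\partial x^i} + \frac{u^i}{c}\frac{\partial P}{\partial x^0} \tag{7.7} \]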
7.3 Harmonic coordinates & Gravitational Waves
Let’s see what happens when we use this form of the Ricci tensor in (6.18) when $T^{\mu\nu}$ on the right is for a perfect fluid [eq. (7.4)]. Since $u^\alpha u_\alpha = -c^2$ we have $T^\gamma{}_\gamma = 3P - \rho c^2$, so (6.18) reads
\[ \Box h_{\alpha\beta} = -\frac{16\pi G}{c^4}\Bigl[(\rho + P/c^2)\,u_\alpha u_\beta + \tfrac12(\rho c^2 - P)\,\eta_{\alpha\beta}\Bigr]. \tag{7.15} \]
From the occurrence of $\Box$ on the left it follows that GR predicts the existence of gravitational waves that propagate at the speed of light. The right side of this equation provides a source for these waves in the same way that the e.m. current $j^\mu$ provides the source for electromagnetic waves through the analogous equation $\Box A^\mu = \mu_0 j^\mu$.
If the gravitational field is static in the rest frame of the fluid, in this frame, and
using h00 = −2Φ/c2 , the 00 component of equation (7.15) reads
From this equation we see that GR predicts that pressure is a source of the gravitational
field independently of the energy density that’s associated with it. In the early Universe
and inside very massive stars the energy density is dominated by black-body radiation,
for which $P = \tfrac13\rho c^2$. So for this fluid Poisson’s equation reads
\[ \nabla^2\Phi = 8\pi G\rho. \tag{7.17} \]
The factor 8 on the right implies that black body radiation is twice as powerful as a
source of gravity as rest-mass energy.
7.4 Deflection of Light by Gravity
Since the beam is fast, its deflection will be small, and we can estimate the net gravitational impulse delivered to each particle by integrating the gravitational force along a straight line. We neglect variations in the particle’s speed parallel to this line, so $z \simeq vt$. Hence after a fly-by to within distance $b$ of the Sun, the particle has a component of velocity perpendicular to the original line of magnitude
\[ v_\perp \simeq \frac{1}{m}\int_{-\infty}^{\infty} F_\perp\,dt = 2\int_0^\infty \frac{GM_\odot}{r^2}\,\frac{b}{r}\,\frac{dz}{v} = \frac{c^2 r_s(\odot)}{bv}\int_0^\infty \frac{d\zeta}{(1+\zeta^2)^{3/2}}, \]
where Pythagoras’ useful result has been pressed into service. The substitution $\zeta = \sinh\theta$ enables one to show that the integral equals 1. So the beam is deflected through the small angle
\[ \alpha \simeq \frac{v_\perp}{v} \simeq \frac{r_s c^2}{v^2 b}. \]
In the limit $v \to c$, this tends to $r_s/b \simeq 0.875''$ for $b = R_\odot$.
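The Newtonian estimate above can be checked numerically. The following is a minimal sketch, not part of the original notes: the physical constants are assumed round values, and the function names are purely illustrative. It evaluates the impulse integral with the substitution $\zeta = \tan\theta$ (chosen here for numerical convenience instead of $\zeta = \sinh\theta$) and then evaluates $r_s c^2/(v^2 b)$ at the solar limb.

```python
import math

# Assumed constants in SI units (illustrative values, not from the notes)
G = 6.674e-11        # gravitational constant
C = 2.998e8          # speed of light
M_SUN = 1.989e30     # solar mass
R_SUN = 6.957e8      # solar radius
ARCSEC_PER_RAD = 180 * 3600 / math.pi


def impulse_integral(n=20000):
    """Midpoint-rule value of the integral of d(zeta)/(1+zeta^2)^(3/2) on [0, inf).

    With zeta = tan(theta) the integrand becomes cos(theta) d(theta)
    on the finite interval [0, pi/2].
    """
    h = (math.pi / 2) / n
    return math.fsum(math.cos((i + 0.5) * h) for i in range(n)) * h


def newtonian_deflection_arcsec(b, v=C):
    """Deflection alpha = r_s c^2 / (v^2 b) times the impulse integral, in arcsec."""
    r_s = 2 * G * M_SUN / C**2
    return impulse_integral() * r_s * C**2 / (v**2 * b) * ARCSEC_PER_RAD


print(impulse_integral())
print(newtonian_deflection_arcsec(R_SUN))
```

With these assumed constants the second number should come out close to the 0.875 arcsec quoted in the text.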
Relativistic treatment A proper calculation will show that our neglect of rela-
tivity has cost us a factor of 2, and Murphy’s law notwithstanding, the true deflection
is larger than our naive estimate predicts. In 1919 general relativity hit the headlines
when its prediction for α was confirmed by measurements made during a solar eclipse.
Since 1979 observations of the gravitational deflection of light have become impor-
tant astronomical tools for determining not only the structure of the Milky Way, but
even the scale and the density of the Universe. In these applications the gravitational
field is always weak ($|\Phi|/c^2 \ll 1$), and for this case we can derive a general formula for
deflection by an arbitrary weak gravitational field.
By imposing the harmonic gauge condition (7.11) the metric of a weak, static
gravitational field can be put into the form (see Problem Set 2)
\[ d\tau^2 = \Bigl(1 + 2\frac{\Phi}{c^2}\Bigr)dt^2 - c^{-2}\Bigl(1 - 2\frac{\Phi}{c^2}\Bigr)(dx^2 + dy^2 + dz^2). \tag{7.18} \]
Let $(dx, dy, dz)$ be the change in the spatial coordinates of a photon in time $dt$, then since $d\tau$ has to vanish along a photon’s world line, we have from (7.18) that
\[ dt = \frac{1}{c}\Bigl(\frac{1 - 2\Phi/c^2}{1 + 2\Phi/c^2}\Bigr)^{1/2} ds \simeq \frac{1}{c}\Bigl(1 - 2\frac{\Phi}{c^2}\Bigr)ds, \tag{7.19} \]
where $ds \equiv \sqrt{dx^2 + dy^2 + dz^2}$ is the coordinate distance between two points on the ray. Since $\Phi \le 0$, equation (7.19) states that in our coordinate system light propagates precisely as it would if there were no gravitational field but space were filled by a medium of refractive index
\[ n = 1 - 2\frac{\Phi}{c^2} \ge 1. \tag{7.20} \]
Thus looking through a region that is permeated by a gravitational field should be
like looking through a sheet of bobbly glass: the images of background light sources
9 Physically Fermat’s principle applies because when the time elapse is not stationary, neighbouring
paths allow the observer to ‘see’ the source at different times when it is in different phases of oscillation.
These different ‘views’ average to zero.
The figure shows that the left-hand side of this equation is the angle through which
the projection onto the xy-plane of the ray from source to observer is bent. We define
αx to be this angle and note that equivalent relations hold for the yz-plane. Hence the
angle between the direction of the ray at the source S and at the observer O is given
by
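\[ \boldsymbol{\alpha} = -\int_S^O \nabla_\perp n\,ds, \tag{7.26} \]
where the integral is along the ray’s path and $\nabla_\perp$ denotes the derivative perpendicular to the path. When we substitute from (7.20) for $n$, we have
\[ \boldsymbol{\alpha} = \frac{2}{c^2}\int_S^O \nabla_\perp\Phi\,ds = \frac{4}{c^2}\nabla_\perp\Phi_2, \tag{7.27a} \]
where
\[ \Phi_2 \equiv \frac12\int_S^O \Phi\,ds. \tag{7.27b} \]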
On account of the smallness of the deflection angle, the integral over z of Φ in the first
term on the right side of this equation differs insignificantly from twice the quantity
Φ2 that is defined by (7.27b). Moreover, the square bracket in (7.28) represents the
gravitational accelerations that the lensing object generates at source and observer. In
practical applications these can be neglected because source and observer are extremely
remote from the lens. Hence (7.28) yields
which is identical with the two-dimensional Poisson equation. Consequently, the po-
tential Φ2 that governs the deflection through equation (7.27a) is the gravitational
potential that would be generated in a 2-dimensional world by the lens’s projected
density Σ.
An important special case is that in which the matter distribution is effectively
that of a point mass M – that is, the deflecting matter distribution is confined well
inside the impact parameter $x_\perp$ of every ray of interest. Then $\Phi_2(x_\perp) = GM\ln|x_\perp|$ and
\[ \alpha = \frac{4GM}{c^2 x_\perp}. \tag{7.30} \]
In particular, when light from a star that lies behind the Sun just grazes the Sun’s limb, $x_\perp = R_\odot$ so $\alpha = 2r_s/R_\odot = 1.75$ arcsec is exactly twice our non-relativistic estimate (7.18). In the early 1990s the Hipparcos satellite measured the positions of a few $\times 10^5$
stars at various times of the year. Since the positions were accurate to of order one
milliarcsec, the effects of light deflection by the Sun were apparent to large distances
from the Sun.
The situation when the source, mass and observer all lie on a straight line is
described by the figure: rays that encounter the mass at small impact parameter cross
the source–observer line in front of the observer, while rays that pass the mass at
large impact parameters cross behind her. The ray that passes the mass with impact
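parameter $r_E$ reaches the observer. Since, in the notation of the figure, $\theta_S \simeq r_E/D_{SL}$, $\theta_O \simeq r_E/D_L$ and $\alpha = \theta_S + \theta_O$, it follows with a little algebra from (7.30) that
\[ r_E = \sqrt{\frac{4GM}{c^2}}\,\sqrt{\frac{D_{SL}D_L}{D_{SL} + D_L}}. \]
\[ \theta_E \equiv \sqrt{\frac{4GM}{c^2}}\,\sqrt{\frac{D_{SL}}{D_L(D_{SL} + D_L)}}. \tag{7.31} \]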
When the source lies to one side of the observer–deflector line, the Einstein ring
degenerates into one or more arcs. An image of the cluster of galaxies Abell 2218 that
was obtained by the Hubble Space Telescope provides several spectacular examples of
this phenomenon.
When the deflecting mass distribution is not axisymmetric, several images of the
source may form. The Einstein Cross consists of four images of a background quasar
QSO 2237+0305 that happens to lie almost exactly behind a spiral galaxy at redshift
z = 0.04 .10
In general the time required for photons to pass from the source to the observer
is different for each image of the source. Since the luminosity of a quasar typically
varies on timescales of a year and even less, the differences between the times of flight
to each image can be measured by cross-correlating as functions of time the measured
brightnesses of each image. These time differences enable one to constrain the scale of
the Universe, since they are clearly proportional to the distance to, and therefore the
linear scale of, the deflector. For example a delay of 12 ± 3 d between two images in B 0218+357 yields Hubble constant $H_0 \simeq 60\,\mathrm{km\,s^{-1}\,Mpc^{-1}}$.11
The Einstein radius is a dimensionally important quantity because lensing significantly modifies the appearance of a source that lies within about $r_E$ of the deflector–observer line, while a source that lies further than $r_E$ from this line will be seen very much as it would be if the deflector were not present. It is conventional to say that a source is lensed if it lies within $r_E$ of a deflector.
10 See Ostensen et al., 1996, Astron. Astrophys, 309, 59.
11 Corbett et al., 1996, in proc. IAU Symp. 173; see also Saha & Williams 2003, Astron. J., 125,
2769.
Consider the case in which both the source and the deflector are stars that lie
within the Milky Way:
This angle is too small to be measured even with the Hubble Space Telescope. But
it is easy to show that the relative motion of the source, deflector and the Sun will
cause the amount by which deflection magnifies the background star to change within
several weeks. So by monitoring the brightnesses of millions of stars lensing events can
be detected even though their constituent images cannot be resolved. This technique
has proved to be a powerful way of detecting faint objects in the Milky Way – the
objects themselves are too faint to be seen, but they are detected by the effect they
have on luminous background stars.12
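To get a feel for the scales involved in lensing of one Milky Way star by another, equation (7.31) can be evaluated directly. The sketch below is illustrative only: the lens mass (one solar mass) and the distances (4 kpc from observer to lens and from lens to source) are assumed values that do not come from the notes, and the constants are rounded.

```python
import math

# Assumed constants in SI units (illustrative values, not from the notes)
G = 6.674e-11
C = 2.998e8
M_SUN = 1.989e30
KPC = 3.0857e19
MAS_PER_RAD = 180 * 3600 * 1000 / math.pi   # milliarcseconds per radian


def einstein_radius(mass, d_l, d_sl):
    """Einstein radius r_E in metres for lens distance d_l and lens-source distance d_sl."""
    return math.sqrt(4 * G * mass / C**2) * math.sqrt(d_sl * d_l / (d_sl + d_l))


def einstein_angle(mass, d_l, d_sl):
    """Angular Einstein radius theta_E = r_E / d_l of eq. (7.31), in radians."""
    return einstein_radius(mass, d_l, d_sl) / d_l


# Hypothetical configuration: solar-mass lens halfway to a source 8 kpc away
theta_e = einstein_angle(M_SUN, 4 * KPC, 4 * KPC)
print(theta_e * MAS_PER_RAD, "milliarcsec")
```

For this assumed geometry the angle is of order a milliarcsecond, which illustrates why the individual images cannot be resolved and brightness monitoring is used instead.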
The effects of gravitational deflection can also be important well outside r E .
Specifically, when light from an extended source such as a galaxy passes to one side of
a large mass concentration, differences in the deflections suffered by rays that come to
the observer from different points on the source will distort the observer’s image of the
source. In particular, the image will tend to be stretched in the direction perpendicular
to the line on the sky that runs from the source to the mass concentration. This effect
is called weak lensing. By measuring the shapes of galaxy images in the vicinity of
a cluster of galaxies, one can constrain the cluster’s gravitational field. 13
7.5 Summary
GR predicts that a gravitational field makes clocks run slow by a factor $1 - |\Phi|/c^2$ that manifests itself in gravitational redshifts. The equations of hydrodynamics, recovered from $\nabla_\mu T^{\mu\nu} = 0$, predict that pressure augments the inertia of matter: in Euler’s equation $\rho$ is replaced by $\rho + P/c^2$. The harmonic gauge condition $g^{\mu\nu}\Gamma^\alpha_{\mu\nu} = 0$ simplifies the equations by guiding us to sensible “near Cartesian” coordinates. With its help we see that GR predicts the existence of gravitational waves, and predicts that pressure is a source of gravity in its own right, so Poisson’s equation has to be modified to $\nabla^2\Phi = 4\pi G(\rho + 3P/c^2)$. In the case of ultrarelativistic matter such as black-body radiation, $P = \tfrac13\rho c^2$ so the strength of gravity is effectively doubled. Gravity appears to endow the vacuum with a non-trivial refractive index $n = 1 - 2\Phi/c^2 > 1$ – this phenomenon is an aspect of the slowing of clocks that are in gravitational potential wells. The distortion of the images of distant objects to which $n \neq 1$ gives rise now provides a crucial probe of the Universe.
8 The Schwarzschild Solution
The way we derive most solutions to Einstein’s equations is at root the same
as that by which we are accustomed to solve other partial differential equations, for
example Maxwell’s equations. If we want to find the electrostatic potential inside
a charged spherical surface, we start by looking for potentials of the special form
$\Phi(r, \theta, \phi) = R(r)\Theta(\theta)e^{im\phi}$. We are not initially certain that such solutions exist, but
we try the idea out anyway in the knowledge that if there are no such solutions we
shall derive inconsistent conditions on R and Θ and thus discover our mistake, but if
no inconsistencies arise, we shall get a valid solution and it will not matter that we
found it by leaping into the dark.
Proceeding in this spirit towards the metric outside a point mass, we first argue
that we should be able to find coordinates in which the metric is diagonal. To see why
this is so, suppose we are given a metric tensor g for some two-dimensional space. Then
from simple matrix algebra we know that at any point in the space we can find two
mutually perpendicular directions, the eigenvectors u and v of g, such that g would be
a diagonal matrix if our coordinate directions coincided with u and v. Now imagine
marking the directions u, v as small crosses on a grid of points in the space. Since
g is a smoothly varying function of position, the orientation of neighbouring crosses
will be similar. Hence we may draw smooth curves through neighbouring crosses, thus
covering the space with a curvilinear grid. Finally, if we are able to label each curve of
this doubly infinite family of curves with numbers (a, b), these numbers will constitute
a valid coordinate system for the space and g will be diagonal in this coordinate system.
If we start from the metric tensor of a 4-space, the situation is fundamentally the
same as in our two-dimensional example; the only difference is that there are now four
special directions at each point. So it is reasonable to conjecture that we can find
coordinates in which the metric of any simple spacetime is everywhere diagonal.
Furthermore, since the gravitational field we seek to describe is time-independent,
we should be able to choose coordinates in such a way that none of the metric coeffi-
cients depends on time. Also the gravitational field will be spherically symmetric, so
there must be closed 2-surfaces on which the geometry is that of a sphere. If we label
these surfaces with the coordinates (r, t) and indicate position on each surface with the
angle variables (θ, φ), we have
We next fix the meaning of $r$ by determining that the sphere with labels $(r, t)$ should have area $4\pi r^2$. This yields
Exercise (22):
By making an appropriate coordinate transformation $x'(x)$ show that when, as here, one uses $t$ rather than $ct$ for the 0th coordinate, the 4-vector of a photon becomes $k^\mu = (\omega/c^2, \mathbf{k})$.
We next calculate the Christoffel symbols. We could proceed directly from (5.20),
but when one wants to calculate large numbers of Christoffel symbols it is generally
more cost-effective to use the procedure described in Box 2. We apply the EL eqn to
the Lagrangian
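\[ L \equiv -c^2 D\Bigl(\frac{dt}{d\tau}\Bigr)^2 + B\Bigl(\frac{dr}{d\tau}\Bigr)^2 + r^2\Bigl[\Bigl(\frac{d\theta}{d\tau}\Bigr)^2 + \sin^2\theta\Bigl(\frac{d\phi}{d\tau}\Bigr)^2\Bigr] \tag{8.4} \]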
finding
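\[ \begin{aligned}
0 &= \frac{d}{d\tau}\Bigl(D\frac{dt}{d\tau}\Bigr) \\
0 &= \frac{d}{d\tau}\Bigl(B\frac{dr}{d\tau}\Bigr) + \tfrac12 c^2 D'\Bigl(\frac{dt}{d\tau}\Bigr)^2 - \tfrac12 B'\Bigl(\frac{dr}{d\tau}\Bigr)^2 - r\Bigl[\Bigl(\frac{d\theta}{d\tau}\Bigr)^2 + \sin^2\theta\Bigl(\frac{d\phi}{d\tau}\Bigr)^2\Bigr] \\
0 &= \frac{d}{d\tau}\Bigl(r^2\frac{d\theta}{d\tau}\Bigr) - r^2\sin\theta\cos\theta\Bigl(\frac{d\phi}{d\tau}\Bigr)^2 \\
0 &= \frac{d}{d\tau}\Bigl(r^2\sin^2\theta\,\frac{d\phi}{d\tau}\Bigr).
\end{aligned} \tag{8.5} \]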
After differentiating the products in these equations we can read off the Christoffel
symbols by comparing the resulting equations of motion with (5.32):
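\[ \begin{aligned}
&\Gamma^t_{tr} = \Gamma^t_{rt} = \frac{D'}{2D} \\
&\Gamma^r_{rr} = \frac{B'}{2B} \qquad \Gamma^r_{\theta\theta} = -\frac{r}{B} \qquad \Gamma^r_{\phi\phi} = -\frac{r\sin^2\theta}{B} \qquad \Gamma^r_{tt} = \frac{c^2 D'}{2B} \\
&\Gamma^\theta_{\phi\phi} = -\sin\theta\cos\theta \qquad \Gamma^\theta_{\theta r} = \Gamma^\theta_{r\theta} = \frac{1}{r} \\
&\Gamma^\phi_{\phi r} = \Gamma^\phi_{r\phi} = \frac{1}{r} \qquad \Gamma^\phi_{\phi\theta} = \Gamma^\phi_{\theta\phi} = \cot\theta.
\end{aligned} \tag{8.6} \]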
Hence
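\[ \Gamma^\mu_{r\mu} = \frac{B'}{2B} + \frac{2}{r} + \frac{D'}{2D}, \qquad \Gamma^\mu_{\theta\mu} = \cot\theta, \qquad \Gamma^\mu_{\phi\mu} = 0, \qquad \Gamma^\mu_{t\mu} = 0. \tag{8.7} \]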
By hard slog and (6.13) one can now obtain
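\[ R_{tt} = -\frac{c^2 D''}{2B} + \frac{c^2 D'}{4B}\Bigl(\frac{B'}{B} + \frac{D'}{D}\Bigr) - \frac{c^2 D'}{rB} \tag{8.8a} \]
\[ R_{rr} = \frac{D''}{2D} - \frac{D'}{4D}\Bigl(\frac{B'}{B} + \frac{D'}{D}\Bigr) - \frac{B'}{rB} \tag{8.8b} \]
\[ R_{\theta\theta} = -1 + \frac{r}{2B}\Bigl(-\frac{B'}{B} + \frac{D'}{D}\Bigr) + \frac{1}{B} \tag{8.8c} \]
\[ R_{\phi\phi} = \sin^2\theta\,R_{\theta\theta} \tag{8.8d} \]
\[ R_{\mu\nu} = 0 \quad (\mu \neq \nu). \tag{8.8e} \]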
8.1 Constants of Motion
\[ \frac{B'}{B} = -\frac{D'}{D} \quad\Rightarrow\quad BD = \text{constant}. \tag{8.9} \]
\[ D = 1 - \frac{r_s}{r}. \tag{8.13} \]
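As a sanity check on the "hard slog", one can evaluate the expressions (8.8a) to (8.8c) numerically for $D = 1 - r_s/r$. The sketch below is illustrative and rests on one assumption not shown in this excerpt: the constant in (8.9) is taken to be 1, so that $B = 1/D$ (consistent with the $D^{-1}dr^2$ term used later in the notes). Units with $r_s = c = 1$ are chosen for convenience.

```python
def ricci_components(r, r_s=1.0, c=1.0):
    """Evaluate R_tt, R_rr and R_theta_theta of eqs (8.8a)-(8.8c).

    Assumes D = 1 - r_s/r and B = 1/D (i.e. BD = 1), with the
    derivatives D', D'' and B' written out analytically.
    """
    D = 1 - r_s / r
    D1 = r_s / r**2            # D'
    D2 = -2 * r_s / r**3       # D''
    B = 1 / D
    B1 = -D1 / D**2            # B' = d(1/D)/dr
    bracket = B1 / B + D1 / D
    R_tt = -c**2 * D2 / (2 * B) + c**2 * D1 / (4 * B) * bracket - c**2 * D1 / (r * B)
    R_rr = D2 / (2 * D) - D1 / (4 * D) * bracket - B1 / (r * B)
    R_thth = -1 + r / (2 * B) * (-B1 / B + D1 / D) + 1 / B
    return R_tt, R_rr, R_thth


for radius in (1.5, 3.0, 10.0, 100.0):
    print(radius, ricci_components(radius))
```

All three components should vanish to rounding error at every radius outside $r_s$, as required of a vacuum solution.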
It is clear that a possible solution to the θ-equation of the set (8.5) is $\theta = \pi/2$; that is, a particle can move always in the equatorial plane of the coordinate system. We shall assume that our coordinate system has been oriented to ensure $\theta = \pi/2$. The $t$ equation of the set (8.5) implies that $dt/d\tau = \text{constant}/D$. In special relativity, $dt/d\tau$ is constant and we call this constant $\gamma$. So let’s call the constant of integration that arises here $\gamma$ too. Then we have
\[ \frac{dt}{d\tau} = \frac{\gamma}{D}. \tag{8.15} \]
Similarly, the φ equation of the set (8.5) implies that $r^2(d\phi/d\tau) = \text{constant}$. Calling this constant $\gamma L$, we obtain the statement of angular-momentum conservation
\[ r^2\frac{d\phi}{d\tau} = \gamma L. \tag{8.16} \]
The physical interpretation of $\gamma$ is clarified by going back to the definition $-c^2 d\tau^2 = -c^2 D\,dt^2 + D^{-1}dr^2 + r^2 d\phi^2$ of proper time: dividing both sides by $d\tau^2$ and using equations (8.15) and (8.16) we obtain
\[ -c^2 = -\frac{c^2\gamma^2}{D} + \frac{1}{D}\Bigl(\frac{dr}{d\tau}\Bigr)^2 + \frac{\gamma^2 L^2}{r^2}. \tag{8.17} \]
Rearranging, we have
\[ \frac{c^2}{\gamma^2} = \frac{c^2}{D} - \frac{1}{D^3}\Bigl(\frac{dr}{dt}\Bigr)^2 - \frac{L^2}{r^2}. \tag{8.18} \]
Expanding the r.h.s. in powers of $r_s/r$ and then using the binomial theorem to take a square root, we find
\[ \gamma = 1 + c^{-2}\Bigl(-\frac{c^2 r_s}{2r} + \frac12\Bigl(\frac{dr}{dt}\Bigr)^2\Bigl[1 + 3\frac{r_s}{r} + \cdots\Bigr] + \frac{L^2}{2r^2} + \cdots\Bigr). \tag{8.19} \]
Since $-c^2 r_s/2r$ is the Newtonian potential energy $-GM/r$, it is clear that $\gamma c^2$ is just the energy per unit mass of the orbiting particle, as we might have anticipated by analogy with the special-relativistic case.
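With $\theta = \pi/2$ the $r$-equation of motion is
\[ 0 = \frac{d^2 r}{d\tau^2} + \frac12\frac{c^2 D'}{B}\Bigl(\frac{dt}{d\tau}\Bigr)^2 + \frac12\frac{B'}{B}\Bigl(\frac{dr}{d\tau}\Bigr)^2 - \frac{r}{B}\Bigl(\frac{d\phi}{d\tau}\Bigr)^2. \]
With (8.10), (8.18) and (8.19) this becomes
\[ 0 = \frac{d^2 r}{d\tau^2} + \tfrac12\gamma^2 c^2\frac{D'}{D} - \frac{D\gamma^2 L^2}{r^3} - \frac12\frac{D'}{D}\Bigl(\frac{dr}{d\tau}\Bigr)^2. \tag{8.20} \]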
We shall see that in Newton’s theory slightly modified forms of the first, second and
third terms occur. The last represents a new, speed dependent force.
Exercise (23):
From (8.16) and (8.18) show that $L^2 = r^3 c^2 D'/(2D^2)$ and hence that the angular frequency of a circular orbit as seen by an observer at infinity is
\[ \frac{d\phi}{dt} = \sqrt{\frac{GM}{r^3}} \]
exactly as in Newton’s theory.
Exercise (24):
Multiply (8.20) by $\dfrac{2}{D}\dfrac{dr}{d\tau}$ and integrate the result to rederive the energy equation (8.18).
8.2 The Perihelion of Mercury
In the late 19th century Bessel showed that disturbance of Mercury’s orbit by all the planets gives rise to a net precession of 532″ per century. Thus Bessel was able to account for all but 44″ per century of Mercury’s precession. Since Mercury’s year is 0.24 sidereal years long, 44″ per century corresponds to 0.106″ per Mercury year.
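Relativistic precession Working from (8.20) in close analogy with our Newtonian calculation, we eliminate $\tau$ between (8.16) and (8.20) to obtain
\[ 0 = \frac{\gamma L}{r^2}\frac{d}{d\phi}\Bigl(\frac{\gamma L}{r^2}\frac{dr}{d\phi}\Bigr) + \tfrac12\gamma^2 c^2\frac{D'}{D} - \frac{D\gamma^2 L^2}{r^3} - \frac12\frac{D'}{D}\frac{\gamma^2 L^2}{r^4}\Bigl(\frac{dr}{d\phi}\Bigr)^2. \]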
The Newtonian equivalent of (8.24) is equation (8.22). Clearly the former is much harder to solve than (8.22): on the left the coefficient of $u$ has changed from 1 to $(1 - r_s u)$ and a term proportional to $(du/d\phi)^2$ has appeared, while on the right $L^2$ has been replaced by $L^2(1 - r_s u)$. But it is immediately apparent that solutions to (8.24) are unlikely to be periodic with period 2π and thus we do not expect relativistic orbits around a point mass to be closed. Let us calculate the angle between successive perihelia and compare it with Bessel’s discrepancy of 0.106″.
We first obtain the “energy equation” associated with (8.24) by multiplying through by $\dfrac{2}{(1 - r_s u)}\dfrac{du}{d\phi}$ and integrating:
\[ \frac{1}{(1 - r_s u)}\Bigl(\frac{du}{d\phi}\Bigr)^2 + u^2 = \frac{c^2}{L^2(1 - r_s u)} + K, \tag{8.25} \]
where $u_1$ and $u_2$ are the smallest and largest values of $u$ along the orbit. The denominator in (8.26) involves a cubic in $u$. Two roots of the cubic are $u_1$ and $u_2$, so if the third root is $u_3$ the cubic may be written
so
\[ u_3 = \frac{1}{r_s} - (u_1 + u_2) \simeq \frac{1}{r_s} \quad\text{and}\quad H = 1 - r_s(u_1 + u_2). \tag{8.28} \]
Thus $u_3 \gg \max(u_1, u_2)$ and with equations (8.27) and (8.28) we can rewrite equation (8.26) as
\[ \begin{aligned}
\Delta\phi &= \frac{1}{\sqrt{H}}\int_{u_1}^{u_2}\frac{du}{\sqrt{(u - u_1)(u_2 - u)}}\Bigl(1 + \frac12\frac{u}{u_3} + \cdots\Bigr) \\
&\simeq \bigl[1 + \tfrac12 r_s(u_1 + u_2)\bigr]\int_{u_1}^{u_2}\frac{du}{\sqrt{(u - u_1)(u_2 - u)}}\bigl(1 + \tfrac12 u r_s\bigr) \\
&\simeq \pi\bigl[1 + \tfrac32 r_s\,\tfrac12(u_2 + u_1)\bigr].
\end{aligned} \tag{8.29} \]
For Mercury $\tfrac12(u_1 + u_2) \simeq 1/r_{\rm Merc} = 1/(5.83 \times 10^7\,\mathrm{km})$, so the perihelion of Mercury should advance in one Mercury year by
\[ 3\pi\frac{r_s}{r_{\rm Merc}} \simeq 0.0983'' \]
in excellent agreement with Bessel’s discrepancy.
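The last two steps of (8.29) and the final number can be checked numerically. The following sketch is illustrative rather than part of the notes: the Sun's Schwarzschild radius (2.95 km) and Mercury's eccentricity (0.206) are assumed values, and the turning points $u_1$, $u_2$ are chosen so that their mean equals $1/r_{\rm Merc}$ as in the text.

```python
import math

ARCSEC_PER_RAD = 180 * 3600 / math.pi
R_S_SUN_KM = 2.95     # assumed Schwarzschild radius of the Sun, 2GM/c^2
R_MERC_KM = 5.83e7    # value of r_Merc quoted in the text
ECC = 0.206           # assumed eccentricity, used only to pick u1 and u2


def half_orbit_angle(u1, u2, r_s, n=20000):
    """Second line of (8.29), integrated by the midpoint rule.

    Writing u = mean + half*sin(psi) turns du/sqrt((u-u1)(u2-u)) into d(psi),
    with psi running from -pi/2 to pi/2.
    """
    mean, half = 0.5 * (u1 + u2), 0.5 * (u2 - u1)
    h = math.pi / n
    total = math.fsum(
        1 + 0.5 * r_s * (mean + half * math.sin(-math.pi / 2 + (i + 0.5) * h))
        for i in range(n)
    )
    return (1 + 0.5 * r_s * (u1 + u2)) * total * h


u1 = (1 - ECC) / R_MERC_KM
u2 = (1 + ECC) / R_MERC_KM
# Perihelion to perihelion covers two half orbits; subtract the Newtonian 2*pi.
advance_numeric = 2 * (half_orbit_angle(u1, u2, R_S_SUN_KM) - math.pi)
advance_formula = 3 * math.pi * R_S_SUN_KM / R_MERC_KM
print(advance_numeric * ARCSEC_PER_RAD, advance_formula * ARCSEC_PER_RAD)
```

The two printed values should agree closely with each other and lie near the 0.098 arcsec per orbit quoted above; the exact digits depend on the assumed value of $r_s$.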
In 1975 Hulse & Taylor discovered a pulsar, PSR 1913+16, that proved to be one component of a tight and eccentric binary: the binary period is 7¾ h and the eccentricity is $e = 0.617$. The periastron of this orbit has been shown to precess by 4.22° yr⁻¹. Both components have mass close to $M = 1.42\,M_\odot$ and are presumably
neutron stars, although pulses are detected from only one of them. Thus PSR 1913+16
is a system in which general relativity is of prime importance rather than a marginal
correction.
Exercise (25):
Show that the semi-major axis of the orbit of PSR 1913+16 is $a = 1.9 \times 10^9$ m, about three times the radius of the Sun, and that each neutron star moves with a speed of order 220 km s⁻¹.
Bouncing signals within the solar system The earliest work involved bounc-
ing radar signals off the inner planets. One measures the delay before the first signals
return. This gives ∆τ
(ii) The most interesting lines of sight pass close to the Sun. Free electrons near the
Sun cause the refractive index to differ from unity.
Later experiments concentrated on timing signals sent to artificial satellites. Since
a satellite is too small to give a detectable radar reflection, one programmes the satellite
to respond to a pulse from Earth by emitting a similar pulse after a known small delay.
With this technique one does not have to worry about planetary topography. By
sending signals at several frequencies one can eliminate the effect of dispersion by free
electrons along the line of sight.
Analysis of these data has to proceed via a computer program which adjusts
orbital elements, the masses of the planets and asteroids, the oblateness of the Sun,
the orientation of an inertial coordinate system, etc., until the fit of the predicted ∆τ ’s
to the observed ∆τ ’s is optimized. One finds that the agreement with g.r. is excellent.
8.3 Tests based on Planetary and Pulsar Dynamics
The quality of the fit is normally judged by calculating predictions from the metric¹⁵
\[ ds^2 = -\Bigl[1 - \alpha\frac{r_s}{\rho} + \tfrac12\beta\Bigl(\frac{r_s}{\rho}\Bigr)^2\Bigr]c^2 dt^2 + \Bigl(1 + \gamma\frac{r_s}{\rho}\Bigr)\bigl[d\rho^2 + \rho^2(d\theta^2 + \sin^2\theta\,d\phi^2)\bigr]. \tag{8.30} \]
This metric agrees with the Schwarzschild metric (8.14) up to order $r_s/r$ in space and $(r_s/r)^2$ in time when $\alpha = \beta = \gamma = 1$. (In the equations of motion the $tt$-component of $g_{\mu\nu}$ is multiplied by the largest components of $v^\mu$.) Hence if Einstein was right, the observations should lead to $\alpha \simeq 1$ etc. Data from missions to Mercury & Mars give
companion and objects in the solar system all make non-negligible contributions
to the measured delays;
(iv) evolution of the pulsar orbit that is driven by the radiation of gravitational waves.
The evolution of the orbit is predicted by calculating the energy and angular mo-
mentum that the waves should carry away in a given time, and then adjusting the
orbit to ensure global conservation of E and L. Different variants of g.r. predict differ-
ent rates of E and L loss. Only Einstein’s original (and simplest) theory successfully
predicts the observed evolution of the period $\dot P = -2.4 \times 10^{-12}$.
which is perfectly finite. Furthermore, he clearly reaches r = rs with dr/dτ < 0. Hence
he would be well advised to fire his rockets before he reaches rs .
Why does grr diverge at r = rs ? Is this divergence caused by gravity or our
choice of coordinates? It is straightforward, if tedious, to check that no components of
the curvature tensor $R^\mu{}_{\nu\alpha\beta}$ diverge at $r_s$. So our explorer can endure the tidal forces he experiences if he is stocky enough. The reason $g_{rr}$ diverges at $r_s$ turns out to be that Schwarzschild’s coordinate system assigns to all events that occur at $r_s$ the time coordinate $t = \infty$. As a specific example, let us calculate the time coordinate at which our explorer crosses $r = r_s$:
\[ t = \int_0^\tau \frac{dt}{d\tau}\,d\tau = \int_{r_0}^{r_s}\frac{dt}{d\tau}\frac{d\tau}{dr}\,dr. \]
Thus no matter when our explorer sets off, an observer who uses Schwarzschild’s co-
ordinates always assigns t = ∞ to the event at which the explorer crosses r = r s . We
should not be surprised that such a foolish convention leads to a singular metric; if
we choose coordinates qi in ordinary space in such a way that all points on the edge
of a ruler are assigned the same three numbers qi , an expression for the length of the
ruler in terms of the coordinates of the ruler’s ends is going to involve multiplication
by some awfully big numbers!
To bring this problem under control we need to choose a new coordinate system.
In 1960 M. Kruskal showed that when new coordinates $(r', t')$ are defined through
\[ \begin{aligned}
r'^2 - t'^2 &= r_s^2\Bigl(\frac{r}{r_s} - 1\Bigr)e^{r/r_s} \\
t' &= r'\,\frac{\cosh(ct/r_s) - 1}{\sinh(ct/r_s)} = r'\tanh\Bigl(\frac{ct}{2r_s}\Bigr)
\end{aligned} \tag{8.34a} \]
the metric takes the non-singular form
\[ ds^2 = r^2(d\theta^2 + \sin^2\theta\,d\phi^2) + 4(dr'^2 - dt'^2)\frac{r_s}{r}e^{-r/r_s}. \tag{8.34b} \]
The lines $r' = \text{constant}$ are always timelike. Radially directed photons move along the 45° lines $dr' = \pm dt'$ in the $(r', t')$ plane. In particular, the null line $r = r_s$ becomes $r' = t'$. If we plot curves of constant $r$ and $t$ in the $(r', t')$ plane, we get a picture like this
Exercise (26):
Show that a cubic light-year of water (supposed incompressible) would be con-
tained within its Schwarzschild radius.
8.5 Summary
The metric outside a point mass can be written to look like that of ordinary spherical
polar coordinates with 1 → (1 − rs /r) in the tt slot and 1 → 1/(1 − rs /r) in the rr slot.
The singularity of these correction factors when $r = r_s = 2GM/c^2$ is not physically interesting. However the geometry of spacetime is singular at $r = 0$ and $r = r_s$ is special in that an “outward” running photon on this sphere would actually not move away from the centre.
The Schwarzschild metric accounts for the last 10% of the precession of Mercury’s perihelion and for the measured bending of light by the Sun. The magnitude of both these effects is of order $n \times r_s/r$, where $n \sim 4$ and $r$ is the smallest distance of the test body from the Sun. Detailed studies of the Solar System’s dynamics show that any errors in g.r.’s corrections to Newtonian dynamics are smaller than ∼ 0.1%.
9 Cosmology
As the Universe expands, the photons of the cosmic background are Doppler shifted
to lower frequencies and the temperature characterizing their distribution falls.
(i) Flat space Obviously this admits spherical polar coordinates in which the
line element can be written
\[ ds^2 = dr^2 + r^2(d\theta^2 + \sin^2\theta\,d\phi^2). \tag{9.2} \]
Notice that the 2-sphere with area $4\pi r^2$ has radius $a\psi > r$. Thus within the 3-sphere the areas of the members of a nested sequence of 2-spheres increase more slowly than they would in Euclidean space. (Similarly, for concentric small circles on a 2-sphere circumference/2π increases more slowly than radius.)
9.2 Friedmann Metrics
(iii) Hyperbolic space If we set K = 0, the line element (9.6) of the 3-sphere
becomes the line-element (9.2) of flat Euclidean space. The line element of the only
other homogeneous, isotropic 3-space is given by (9.6) with K set equal to a negative
number. This space is called hyperbolic space, and is harder to visualize than the
3-sphere. The characteristic property of hyperbolic space is that in it a 2-sphere with area $4\pi r^2$ has radius
\[ R = \int_0^r \frac{dr}{\sqrt{1 + |K|r^2}} = \frac{1}{\sqrt{|K|}}\sinh^{-1}\bigl(r\sqrt{|K|}\bigr) < r. \]
That is, in this space the areas of a sequence of nested 2-spheres increase faster than in Euclidean space.
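The contrast between the three geometries can be made concrete by computing the proper radius of the 2-sphere with area $4\pi r^2$. The sketch below is illustrative: it assumes the radial part of the line element has the form $dr^2/(1 - Kr^2)$, as in (9.7), integrates it numerically, and compares the result with closed forms. The $K > 0$ closed form ($\sin^{-1}$) is inferred from that assumed line element and is not quoted from the notes.

```python
import math


def proper_radius_numeric(r, K, n=100000):
    """Midpoint-rule integral of dr'/sqrt(1 - K r'^2) from 0 to r."""
    h = r / n
    return math.fsum(h / math.sqrt(1 - K * ((i + 0.5) * h) ** 2) for i in range(n))


def proper_radius_closed(r, K):
    """Closed form: arcsin for K > 0, arcsinh for K < 0, r itself for K = 0."""
    if K > 0:
        return math.asin(r * math.sqrt(K)) / math.sqrt(K)
    if K < 0:
        return math.asinh(r * math.sqrt(-K)) / math.sqrt(-K)
    return r


for K in (1.0, 0.0, -1.0):
    print(K, proper_radius_numeric(0.5, K), proper_radius_closed(0.5, K))
```

For the same area coordinate $r$, the proper radius exceeds $r$ on the 3-sphere, equals $r$ in flat space and is less than $r$ in hyperbolic space, which is the ordering described in the text.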
In summary, a spatial section of simultaneous events must form either a 3-sphere,
flat space or hyperbolic space. In each case the line element may be expressed in the
form (9.6) with an appropriate value of K.
We want to use coordinates on these spatial sections such that the coordinates of
each fundamental observer are constant. These are called comoving coordinates.
Since fundamental observers are receding from one another, it follows that our desired
coordinates cannot at all times coincide with those in which the line element takes the
form (9.6). However, if at one time, for example now, the comoving coordinates (r, θ, φ)
are such that the line element is of this form, then at an earlier time, when fundamental
observers were closer to one another, the separation δs between neighbouring observers
was some fraction a(t) of their current separation. Hence at all times the metric of
spacetime can be written
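\[ ds^2 = -c^2 dt^2 + a^2\Bigl[\frac{dr^2}{1 - Kr^2} + r^2(d\theta^2 + \sin^2\theta\,d\phi^2)\Bigr], \tag{9.7} \]
where a dot denotes $d/d\tau$. Using the convention that a prime denotes $d/dt$ the equations of motion are
\[ \begin{aligned}
0 &= \frac{d}{d\tau}(-c^2\dot t) - aa'\Bigl[\frac{\dot r^2}{1 - Kr^2} + r^2\bigl(\dot\theta^2 + \sin^2\theta\,\dot\phi^2\bigr)\Bigr] \\
0 &= \frac{d}{d\tau}\Bigl(\frac{a^2\dot r}{1 - Kr^2}\Bigr) - a^2\Bigl[\frac{\dot r^2 Kr}{(1 - Kr^2)^2} + r\bigl(\dot\theta^2 + \sin^2\theta\,\dot\phi^2\bigr)\Bigr] \\
0 &= \frac{d}{d\tau}(a^2 r^2\dot\theta) - a^2 r^2\sin\theta\cos\theta\,\dot\phi^2 \\
0 &= \frac{d}{d\tau}\bigl(a^2 r^2\sin^2\theta\,\dot\phi\bigr).
\end{aligned} \tag{9.8} \]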
From the first equation we read off the non-vanishing Γs with top index t:
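\[ 0 = \ddot r + \frac{2a'}{a}\dot t\dot r + \frac{Kr}{1 - Kr^2}\dot r^2 - r(1 - Kr^2)\bigl(\dot\theta^2 + \sin^2\theta\,\dot\phi^2\bigr) \]
from which we read off the non-vanishing Γs with top index $r$:
\[ \Gamma^r_{tr} = \frac{a'}{a}; \quad \Gamma^r_{rr} = \frac{Kr}{1 - Kr^2}; \quad \Gamma^r_{\theta\theta} = -r(1 - Kr^2); \quad \Gamma^r_{\phi\phi} = -r(1 - Kr^2)\sin^2\theta \tag{9.9b} \]
The angular equations of motion are
\[ 0 = \ddot\theta + \frac{2a'}{a}\dot t\dot\theta + \frac{2}{r}\dot r\dot\theta - \sin\theta\cos\theta\,\dot\phi^2; \qquad 0 = \ddot\phi + \frac{2a'}{a}\dot t\dot\phi + \frac{2}{r}\dot r\dot\phi + 2\cot\theta\,\dot\theta\dot\phi \]
so the remaining non-vanishing Γs are
\[ \begin{aligned}
&\Gamma^\theta_{t\theta} = \frac{a'}{a}; \quad \Gamma^\theta_{r\theta} = \frac{1}{r}; \quad \Gamma^\theta_{\phi\phi} = -\sin\theta\cos\theta \\
&\Gamma^\phi_{t\phi} = \frac{a'}{a}; \quad \Gamma^\phi_{r\phi} = \frac{1}{r}; \quad \Gamma^\phi_{\theta\phi} = \cot\theta.
\end{aligned} \tag{9.9c} \]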
If we elevate our status to that of a fundamental observer, and suppose that the atoms that emit the radiation we receive were stationary with respect to a local fundamental observer, then $k^0 = \omega_{\rm emit}/c^2$ on emission of a photon and $k^0 = \omega_{\rm obs}/c^2$ on its observation.17 The definition (5.14) of the affine parameter $s$ fails when applied to a trajectory $x^\mu(\lambda)$ of a photon. Instead we define $s$ by requiring that
\[ \frac{dx^\mu}{ds} = k^\mu(s), \tag{9.10} \]
where $k^\mu$ is the wavevector $(\omega/c^2, \mathbf{k})$ of the photon. The equation of motion of the photon is $0 = k^\mu\nabla_\mu k^\nu$. Multiplying this equation through by $ds/dt$, we find for the time component of the resulting equation
\[ 0 = \frac{ds}{dt}\frac{dx^\mu}{ds}\Bigl[\frac{\partial(\omega/c^2)}{\partial x^\mu} + \Gamma^t_{\mu\gamma}k^\gamma\Bigr] = \frac{d(\omega/c^2)}{dt} + \frac{ds}{dt}\Gamma^t_{\mu\gamma}k^\mu k^\gamma. \tag{9.11} \]
17 See Exercise (22).
We evaluate this for a radially propagating photon. Henceforth using the convention that $\dot a = da/dt$, (9.9a) states that $\Gamma^t_{rr} = \dot a g_{rr}/(ac^2)$ while (9.10) gives $ds/dt = 1/k^0 = c^2/\omega$, so (9.11) yields
\[ \frac{d\omega}{dt} = -\frac{\dot a}{a}\bigl(g_{rr}k^r k^r\bigr)\frac{c^2}{\omega} = -\frac{\dot a}{a}\omega, \]
where we have used the null property of $k^\mu$ in the form $g_{rr}k^r k^r + g_{tt}(\omega/c^2)^2 = 0$. Integrating we get
\[ 1 + z = \frac{\omega_{\rm emit}}{\omega_{\rm obs}} = \frac{a(t_{\rm obs})}{a(t_{\rm emit})}. \]
In words, 1 + z gives the factor by which the Universe has expanded since the photons
we receive were emitted. Notice that this result has been obtained without using
Einstein’s equations to determine the dynamics of the Universe.
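The integration step can be illustrated numerically: for any assumed expansion history, integrating $d\omega/dt = -(\dot a/a)\,\omega$ should reproduce $\omega_{\rm emit}/\omega_{\rm obs} = a(t_{\rm obs})/a(t_{\rm emit})$. The sketch below uses a hypothetical scale factor $a \propto t^{2/3}$ purely as a test case; that choice, the time units, and the function names are assumptions and are not taken from the notes.

```python
def scale_factor(t):
    """Hypothetical scale factor a(t) = t^(2/3), used only as a test case."""
    return t ** (2.0 / 3.0)


def hubble_rate(t):
    """(da/dt)/a for the assumed scale factor above."""
    return 2.0 / (3.0 * t)


def evolve_frequency(omega_emit, t_emit, t_obs, hubble, steps=20000):
    """Integrate d(omega)/dt = -hubble(t) * omega with the midpoint (RK2) rule."""
    h = (t_obs - t_emit) / steps
    omega, t = omega_emit, t_emit
    for _ in range(steps):
        k1 = -hubble(t) * omega
        k2 = -hubble(t + 0.5 * h) * (omega + 0.5 * h * k1)
        omega += h * k2
        t += h
    return omega


omega_obs = evolve_frequency(1.0, 1.0, 8.0, hubble_rate)
print(1.0 / omega_obs, scale_factor(8.0) / scale_factor(1.0))
```

Both printed numbers are estimates of $1 + z$ for the same pair of times, so they should agree to within the integration error; swapping in a different `scale_factor` and matching `hubble_rate` should preserve the agreement, since the result does not depend on the dynamics.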
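When using equations (9.9) in (6.13) to calculate $R_{\alpha\beta}$, it is helpful to isolate all terms that involve a $t$ index. One finds
\[ R_{it} = R_{ti} = 0, \qquad R_{tt} = \frac{\partial\Gamma^\mu_{t\mu}}{\partial t} + \Gamma^j_{tk}\Gamma^k_{tj} = 3\frac{\ddot a}{a} \]
\[ R_{ij} = \tilde R_{ij} - \frac{\partial\Gamma^t_{ij}}{\partial t} + 2\Gamma^t_{ik}\Gamma^k_{jt} - \Gamma^t_{ij}\Gamma^k_{tk} = \tilde R_{ij} - \Bigl[\frac{\ddot a}{a} + 2\Bigl(\frac{\dot a}{a}\Bigr)^2\Bigr]\frac{g_{ij}}{c^2}, \]
Since the 3-space is homogeneous and isotropic, it is obvious that $\tilde R \propto g$. Hence it is only necessary to calculate one non-zero component of $\tilde R$, say $\tilde R_{rr}$. A tedious calculation yields
\[ \tilde R_{ij} = -\frac{2K}{a^2}g_{ij}. \tag{9.12} \]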
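Hence
\[ R_{\alpha\beta} = \begin{pmatrix} 3\ddot a/a & & & \\ & -f(t)g_{rr}/c^2 & & \\ & & -f(t)g_{\theta\theta}/c^2 & \\ & & & -f(t)g_{\phi\phi}/c^2 \end{pmatrix}, \tag{9.13a} \]
where
\[ f(t) \equiv \frac{2Kc^2}{a^2} + \frac{\ddot a}{a} + 2\Bigl(\frac{\dot a}{a}\Bigr)^2. \tag{9.13b} \]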
We now turn our attention to the right side of the Einstein equations (6.18). We
take T to be the energy-momentum tensor (7.4) of a fluid that is at rest in the frame
of the local fundamental observer. With T of the form (7.4), Tαα = 3P − ρc2 . With
our (t, r, θ, φ) coordinates, uα = (1, 0, 0, 0), uα = (−c2 , 0, 0, 0), and the tt-equation of
the set (6.18) reads
3ä 8πG
= − 2 ( 32 P + 21 ρc2 ). (9.14a)
a c
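The $rr$-equation reads
$$
-\left[\frac{2Kc^2}{a^2} + \frac{\ddot a}{a} + 2\left(\frac{\dot a}{a}\right)^2\right]\frac{g_{rr}}{c^2}
 = -\frac{8\pi G}{c^4}\,\tfrac12(\rho c^2 - P)\,g_{rr}. \tag{9.14b}
$$
Eliminating $\ddot a$ between these equations yields the cosmic energy equation
$$
\dot a^2 + Kc^2 = \tfrac83\pi G\rho a^2. \tag{9.15}
$$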
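The elimination can be sanity-checked with numbers. The sketch below is only a check under stated assumptions: it takes ä/a from the tt-equation, substitutes it into the rr-equation (with the common factor g_rr/c² cancelled) to solve for ȧ², and compares the result with the cosmic energy equation. The numerical values of ρ, P, K and a are arbitrary illustrative choices, the constants are rounded SI values, and the function names are hypothetical.

```python
import math

G = 6.674e-11   # m^3 kg^-1 s^-2 (rounded)
c = 2.998e8     # m s^-1 (rounded)

def addot_over_a(rho, P):
    # tt-equation: 3*addot/a = -(8 pi G / c^2) * (3P/2 + rho c^2 / 2)
    return -(8 * math.pi * G / c**2) * (1.5 * P + 0.5 * rho * c**2) / 3.0

def adot_squared_from_rr(a, K, rho, P):
    # rr-equation with g_rr / c^2 cancelled:
    #   2 K c^2 / a^2 + addot/a + 2 (adot/a)^2 = (4 pi G / c^2) (rho c^2 - P)
    right = (4 * math.pi * G / c**2) * (rho * c**2 - P)
    return 0.5 * a**2 * (right - addot_over_a(rho, P) - 2 * K * c**2 / a**2)

def adot_squared_energy(a, K, rho):
    # Cosmic energy equation: adot^2 + K c^2 = (8/3) pi G rho a^2
    return (8.0 / 3.0) * math.pi * G * rho * a**2 - K * c**2

# Arbitrary illustrative inputs (not taken from the notes).
a, K, rho, P = 1.3, 1e-53, 1e-26, 1e-12
lhs = adot_squared_from_rr(a, K, rho, P)
rhs = adot_squared_energy(a, K, rho)
print(lhs, rhs)
```

The two values should agree to rounding error whatever P is chosen, which reflects the fact that the pressure drops out when ä is eliminated.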
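We also have the equation of mass-energy conservation
$$
0 = \nabla_\beta T^{\alpha\beta} = u^\alpha u^\beta\nabla_\beta\rho
  + \left(g^{\alpha\beta} + \frac{u^\alpha u^\beta}{c^2}\right)\nabla_\beta P
  + (\rho + P/c^2)\nabla_\beta(u^\alpha u^\beta), \tag{9.16}
$$
where we've used (5.30). Now
$$
\nabla_\beta(u^\alpha u^\beta) = \partial_\beta(u^\alpha u^\beta)
  + \Gamma^\alpha_{\gamma\beta}u^\gamma u^\beta + \Gamma^\beta_{\gamma\beta}u^\alpha u^\gamma.
$$
With $\alpha = t$ this yields $\nabla_\beta(u^tu^\beta) = \Gamma^\beta_{\beta t} = 3\dot a/a$. For $\alpha = t$ the first term in (9.16) is $\dot\rho$
and the second vanishes, so we find
$$
\dot\rho + 3\frac{\dot a}{a}\left(\rho + \frac{P}{c^2}\right) = 0. \tag{9.17}
$$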
Rest-mass energy  The random motions of galaxies with respect to the cosmic
background radiation are $\lesssim 1000\,\mathrm{km\,s^{-1}} \ll c$, as are the random motions of particles
within galaxies. So in the frame of the local Fundamental Observer, the energy of
such matter is dominated by its rest mass and we may adopt for $T$ the formula (6.20)
for dust, or equivalently (7.4) for a perfect fluid with $P = 0$. Numerically,
$\rho_{\rm dust} \gtrsim 10^{-27}\,\mathrm{kg\,m^{-3}} = 5.6\times10^{8}\,\mathrm{eV\,m^{-3}}$. Given that $P = 0$, equation (9.17) implies
$\rho_{\rm dust} \sim 1/a^3$ as we would expect naively.
Relativistic matter  At early times the Universe was so hot that its constituent
particles had thermal velocities near $c$. Moreover, even at the present time photons of
the cosmic background radiation form such a relativistic gas. We know from thermodynamics
that in its rest frame the pressure of a photon gas is one third of its energy
density. Hence, in the frame of a Fundamental Observer the energy-momentum tensor
of such a relativistic gas is given by (7.4) with $\rho = \rho_{\rm rad}$ and $3P = \rho_{\rm rad}c^2$. Eliminating $P$ from
(9.17) we find that $\rho_{\rm rad} \sim 1/a^4$.
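Both scalings can be reproduced numerically. The sketch below assumes that the conservation law (9.17) has the form dρ/dt = −3(ȧ/a)(ρ + P/c²), which is what the t-component of (9.16) gives when the first term is ρ̇, the second vanishes and ∇_β(u^t u^β) = 3ȧ/a. Writing P = wρc² (the parameter w is introduced here for illustration: w = 0 for dust, w = 1/3 for radiation) and using a as the independent variable turns it into dρ/da = −3(1 + w)ρ/a.

```python
def density_ratio(w, a_end, steps=10000):
    """Integrate d(rho)/da = -3 (1 + w) rho / a from a = 1 (rho = 1) to a_end.

    w is the assumed equation-of-state parameter in P = w * rho * c^2.
    Classical RK4 in the variable a.
    """
    def rhs(a, rho):
        return -3.0 * (1.0 + w) * rho / a

    h = (a_end - 1.0) / steps
    rho, a = 1.0, 1.0
    for _ in range(steps):
        k1 = rhs(a, rho)
        k2 = rhs(a + h / 2, rho + h * k1 / 2)
        k3 = rhs(a + h / 2, rho + h * k2 / 2)
        k4 = rhs(a + h, rho + h * k3)
        rho += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        a += h
    return rho

dust = density_ratio(0.0, 2.0)            # should approach 2**-3
radiation = density_ratio(1.0 / 3.0, 2.0) # should approach 2**-4
print(dust, radiation)
```

Doubling the scale factor should reduce the dust density by a factor 8 and the radiation density by a factor 16, in line with the 1/a³ and 1/a⁴ laws.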
Exercise (27):
Recover $\rho_{\rm rad} \sim 1/a^4$ by considering the adiabatic expansion of a gas with ratio of
principal specific heats $\gamma = \tfrac43$.
At the present epoch the energy density contributed by the cosmic background is
$a_{\rm s}(2.7\,\mathrm{K})^4 \simeq 2.5\times10^5\,\mathrm{eV\,m^{-3}}$, which is significantly smaller than the rest-mass energy
density of dust. However, since $\rho_{\rm rad}/\rho_{\rm dust} \sim 1/a$, for $a \lesssim 10^{-4}$ radiation will have been
dominant.
The false vacuum’s mass increases by ρvac dV , so its energy increases by ρvac c2 dV . The
latter increase must equal the work done on the piston, −P dV . Thus the pressure of
the false vacuum is P = −ρvac c2 .
Note:
The constant $\lambda$ has units of energy density. In 1917, from a desire to construct a
static universe, Einstein replaced $G_{\mu\nu}$ in the field equations by $G_{\mu\nu} - \Lambda g_{\mu\nu}$. He
called $\Lambda$, which has units of length$^{-2}$, the cosmological constant. It is easy to
see that $\Lambda = 8\pi G\lambda/c^4$.
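We now return to equation (9.15) and replace $\rho(t)$ by $\rho(t_0)$ times $a(t)^{-n}$, where
$a(t_0) = 1$ and $n = 3, 4, 0$ for the cases of dust, radiation and vacuum energy, respectively.
We find
$$
\dot a^2 = \begin{cases}
\dfrac{8\pi G}{3a}\,\rho(t_0) - Kc^2 & \text{(dust)} \\[1ex]
\dfrac{8\pi G}{3a^2}\,\rho(t_0) - Kc^2 & \text{(radiation)} \\[1ex]
\dfrac{8\pi G}{3}\,a^2\rho(t_0) - Kc^2 & \text{(vacuum).}
\end{cases} \tag{9.19}
$$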
Currently the Universe is expanding, so ȧ > 0. Equation (9.19) states that, if it
is matter dominated, it will expand for ever if K ≤ 0. But if K > 0 (the case in which
spatial sections are 3-spheres), the expansion will cease when
$$
a = \frac{8\pi G\rho(t_0)}{3c^2K}
  = \frac{1}{(4.2\times10^{10}\,\text{light yr})^2K}\times\frac{\rho(t_0)}{10^{-27}\,\mathrm{kg\,m^{-3}}}.
$$
Thus our longevity hangs ultimately on how the radius of curvature of the Universe
compares with some tens of billions of light years.
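The three cases of (9.19) and the turn-around condition can be put into a few lines of code. This is a minimal sketch in arbitrary units: the shorthand A = 8πGρ(t0)/3 and the function names are introduced here for illustration and do not appear in the notes.

```python
import math

def adot_squared(a, n, A, Kc2):
    """Right side of (9.19): A * a**(2 - n) - K c^2, with A = 8 pi G rho(t0) / 3.

    n = 3 (dust), 4 (radiation) or 0 (vacuum); Kc2 stands for K * c**2.
    """
    return A * a ** (2 - n) - Kc2

def turnaround_scale(A, Kc2):
    """Scale factor at which a dust universe (n = 3) stops expanding.

    For K <= 0 the right side of (9.19) never reaches zero, so the
    expansion continues for ever and infinity is returned.
    """
    if Kc2 <= 0:
        return math.inf
    return A / Kc2

a_max = turnaround_scale(1.0, 0.25)          # 4.0 in these units
residual = adot_squared(a_max, 3, 1.0, 0.25) # vanishes at turn-around
print(a_max, residual)
```

The same function shows why the vacuum case behaves differently: with n = 0 the first term grows as a², so once ȧ² is positive it stays positive.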
Exercise (28):
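Integrate (9.19) in the case of dust to show
$$
\frac{c\sqrt{|K|}}{a_m}\,t(a) = \begin{cases}
\theta - \tfrac12\sin2\theta & \text{when } K > 0 \quad [\theta \equiv \arcsin(\sqrt{a/a_m})] \\[0.5ex]
\tfrac12\sinh2\theta - \theta & \text{when } K < 0 \quad [\theta \equiv \operatorname{arcsinh}(\sqrt{a/a_m})]
\end{cases}
$$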
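The stated result can be checked numerically before attempting the integration by hand. The sketch below is a check, not a derivation. It assumes a_m = 8πGρ(t0)/(3c²|K|), so that the dust case of (9.19) reads ȧ² = |K|c²(a_m/a ∓ 1), and it works with the dimensionless variable x = a/a_m; the function names are hypothetical.

```python
import math

def tau_numeric(x, sign, n=200000):
    """Midpoint-rule estimate of c*sqrt|K|*t/a_m = integral_0^x sqrt(u / (1 - sign*u)) du.

    sign = +1 for K > 0 and -1 for K < 0; x = a / a_m.
    """
    h = x / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * h
        total += math.sqrt(u / (1.0 - sign * u))
    return total * h

def tau_closed(x, sign):
    """Closed forms quoted in the exercise."""
    if sign > 0:
        theta = math.asin(math.sqrt(x))
        return theta - 0.5 * math.sin(2 * theta)
    theta = math.asinh(math.sqrt(x))
    return 0.5 * math.sinh(2 * theta) - theta

closed_check = tau_numeric(0.5, +1) - tau_closed(0.5, +1)
open_check = tau_numeric(0.5, -1) - tau_closed(0.5, -1)
print(closed_check, open_check)
```

Both differences should be small compared with the values themselves, which are of order 0.2 to 0.3 at x = 0.5.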
The best observational evidence suggests that the actual density of matter is a factor
of several lower than this: unless vacuum energy is significant, the future is more likely
to be boring than otherwise. Note that if ρ ≤ ρcrit , the Universe is spatially infinite
and contains infinite mass, while if ρ > ρcrit the total mass is finite.
Exercises (29):
(i) Show for a dust-dominated universe with $K = 0$ that $a = (t/t_0)^{2/3}$. Hence
estimate the age of the Universe if $\rho(t_0) = \rho_{\rm crit}(t_0)$.
(ii) Show for a radiation-dominated universe with $K = 0$ that $a = \sqrt{t/t_0}$.
(iii) Show that in Newton’s theory the radial coordinate a(t) of a particle embedded in
a homogeneous spherical cloud of mutually gravitating particles which are initially
receding from the origin with speeds proportional to radius, obeys (9.15). Identify
the analogue of K in this case.
9.5 Inflation
Grand unified theories of the strong, weak and electromagnetic forces suggest that the
time constant associated with this exponential growth is $\approx 10^{-34}\,$s.
Exercise (30):
Let the present age of the Universe be tH and the distance over the current time-
slice t = tH to the most distant fundamental observer it is in principle possible to
see be DH . Show that if the Universe had inflated from t = 0 to the present day
we would have DH = ctH , while we would have DH = 2ctH if the Universe had
been always flat and radiation-dominated. The furthest fundamental observer we
can see is said to be on the particle horizon. [Hint: use $0 = g_{rr}dr^2 + g_{tt}dt^2$.]
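The radiation-dominated case can be checked numerically. The sketch below assumes a flat metric with g_rr = a² and g_tt = −c² (so the hint gives c dt = a dr for a radial photon) and a power law a = (t/t_H)^p; then D_H/(ct_H) is the integral of x^(−p) over 0 < x < 1 with x = t/t_H. The power-law family and the function name are illustrative assumptions.

```python
def horizon_distance(p, n=100000):
    """D_H / (c t_H) for a flat universe with a = (t / t_H)**p, 0 <= p < 1.

    Evaluates integral_0^1 x**(-p) dx using the substitution x = s**2,
    which removes the endpoint singularity when p = 1/2; midpoint rule in s.
    """
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        s = (i + 0.5) * h
        total += 2.0 * s ** (1.0 - 2.0 * p)
    return total * h

radiation_horizon = horizon_distance(0.5)      # a = sqrt(t / t_H)
dust_horizon = horizon_distance(2.0 / 3.0)     # a = (t / t_H)**(2/3)
print(radiation_horizon, dust_horizon)
```

The first value should reproduce D_H = 2ct_H; the second is included only for comparison and should lie close to 3, the value 1/(1 − p) of the same integral for p = 2/3.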
Guth’s inflationary conjecture has two very seductive properties:
$$
\frac{\rho(t)}{\rho_{\rm crit}(t)} = 1 + \frac{Kc^2}{\dot a^2}. \tag{9.23}
$$
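Equation (9.23) makes it easy to see how quickly exponential growth drives the density towards the critical value. The sketch below assumes a(t) = exp(t/τ) with the time constant τ quoted above; the value of Kc² and the choice of 60 e-foldings are arbitrary illustrative assumptions, not figures from the notes.

```python
import math

def flatness_deviation(t, tau, K_c2):
    """rho / rho_crit - 1 = K c^2 / adot^2 from (9.23), assuming a(t) = exp(t / tau)."""
    a = math.exp(t / tau)
    adot = a / tau
    return K_c2 / adot**2

tau = 1e-34    # s, time constant of the exponential growth
K_c2 = 1.0     # arbitrary illustrative value of K c^2 (units s^-2)
before = flatness_deviation(0.0, tau, K_c2)
after = flatness_deviation(60 * tau, tau, K_c2)   # 60 e-foldings (assumed duration)
suppression = after / before
print(suppression)
```

Since ȧ grows in proportion to a, the deviation falls as exp(−2t/τ), so after N e-foldings it is smaller by a factor exp(−2N) whatever the value of K.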
you are told to specify its phase 0 ≤ arg(ψ) ≤ 2π throughout space. You decide to
set arg[ψ(x)] = φ(x), where φ is the usual cylindrical-polar coordinate of the point
x. This assignment works fine everywhere except at your coordinate origin, r = 0.
Here ∇ arg(ψ) diverges since any phase can be reached arbitrarily close to r = 0. It
is not hard to persuade oneself that by adjusting the values of ψ in any finite volume
you can move but not eliminate this singularity, which is associated with a line of
energy-momentum. This is a cosmic string.
What does the energy-momentum tensor $T$ look like in the narrow tube around
$r = 0$ in which $T \neq 0$? We'd expect $T$ to be Lorentz invariant with respect to boosts
parallel to the string’s line. So in the (t, z) plane T has to be proportional to the
Minkowski metric. Also it’s hard to see how the string could be carrying anything in
the x or y directions. So
$$
T_{\mu\nu} = -\rho c^2 \begin{pmatrix}
-c^2 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}, \tag{9.24}
$$
where ρ is a constant.
Now consider the line element
where $r_0$ is a constant. This is almost the line element $ds^2 = -c^2dt^2 + dr^2 + r^2d\phi^2 + dz^2$
of flat spacetime in cylindrical polars; $r_0\theta$ is a kind of radial variable. The only non-zero
Christoffel symbols generated by (9.25) are
Hence with ρ > 0 (which corresponds to a positive energy density and tension in the
string) the metric (9.25) solves Einstein’s equations inside the string.
What we really need is the metric outside the string, where we live. Let the outer
surface of the string be θ = θm . Then the exterior metric is
$$
ds^2 = -c^2dt^2 + r_0^2\left(\frac{\cos^2\theta}{\cos^2\theta_m}\,d\theta^2 + \sin^2\theta\,d\phi^2\right) + dz^2. \tag{9.27}
$$
This metric obviously joins smoothly to the interior metric (9.25) on $\theta = \theta_m$. To show
that it is a vacuum solution of Einstein's equations, we transform to a new coordinate
set $(t, r', \phi', z)$, where the $t$ and $z$ coordinates are the old ones and
$$
r' \equiv r_0\frac{\sin\theta}{\cos\theta_m}; \qquad \phi' \equiv \cos(\theta_m)\,\phi. \tag{9.28}
$$
The metric (9.27) now becomes
$$
ds^2 = -c^2dt^2 + dr'^2 + r'^2d\phi'^2 + dz^2, \tag{9.29}
$$
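The change of variables can be verified numerically for the two components that differ between (9.27) and (9.29). The sketch below pulls the flat form dr'² + r'²dφ'² back to the (θ, φ) coordinates with the transformation (9.28), using a central finite difference for dr'/dθ, and compares the result with g_θθ and g_φφ read off from (9.27). The sample values of r0, θ_m and θ are arbitrary and the function names are hypothetical.

```python
import math

def exterior_metric(theta, r0, theta_m):
    """(g_theta_theta, g_phi_phi) read off from the exterior metric (9.27)."""
    g_tt = (r0 * math.cos(theta) / math.cos(theta_m)) ** 2
    g_pp = (r0 * math.sin(theta)) ** 2
    return g_tt, g_pp

def r_prime(theta, r0, theta_m):
    """Radial coordinate defined in (9.28)."""
    return r0 * math.sin(theta) / math.cos(theta_m)

def pulled_back_flat_metric(theta, r0, theta_m, eps=1e-6):
    """Components of dr'^2 + r'^2 dphi'^2 expressed in (theta, phi)."""
    # Central difference for dr'/dtheta; dphi'/dphi = cos(theta_m) is exact.
    drp_dtheta = (r_prime(theta + eps, r0, theta_m)
                  - r_prime(theta - eps, r0, theta_m)) / (2 * eps)
    dphip_dphi = math.cos(theta_m)
    return drp_dtheta ** 2, (r_prime(theta, r0, theta_m) * dphip_dphi) ** 2

r0, theta_m, theta = 2.0, 0.3, 0.7     # arbitrary illustrative values
expected = exterior_metric(theta, r0, theta_m)
obtained = pulled_back_flat_metric(theta, r0, theta_m)
print(expected, obtained)
```

The two pairs should agree to within the finite-difference error; the t and z parts of the metric are untouched by the transformation and need no check.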
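which is just the cylindrical-polar metric of flat spacetime. But on a large scale the
spacetime outside the string is very odd because the range of $\phi'$ is $(0, 2\pi\cos\theta_m)$. [This
follows from (9.28) and the fact that $\phi$ is in $(0, 2\pi)$.] Consider for example a large circle
$r' = a \gg r_0$. The radius of this circle is
$$
R = \int_0^a \sqrt{g_{r'r'}}\,dr' \simeq a, \tag{9.30a}
$$
while its circumference is
$$
C = \int_0^{2\pi\cos\theta_m} \sqrt{g_{\phi'\phi'}}\,d\phi' = 2\pi a\cos\theta_m. \tag{9.30b}
$$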
So the usual flat-space relation C = 2πR does not apply. Thinking about a cone may
help to clarify this strange state of affairs. At each point a cone is flat in the sense
that it can be made out of a piece of paper without stretching the paper (you can’t
make a paper sphere as easily), but circles distance a from the cone’s apex have a
circumference smaller than 2πa.
How could we detect a cosmic string? Our best bet is to look for lines of gravitationally
lensed objects. To understand how a string lenses an object, think of the
exterior space as a piece of paper with a wedge of angle
$$
\theta_{\rm def} \equiv 2\pi(1 - \cos\theta_m) \tag{9.31}
$$
cut out and corresponding points along the cuts identified. Place the object to be
lensed at radius $r' = a_q$ on the cut and yourself directly opposite at $r' = a_o$.
Rays travel over the paper in straight lines, so you can see the object along two
lines of sight separated by 2αs , where
Hence using (9.26) we have that the string's mass per unit length is
$\mu = \rho A = c^2(1 - \cos\theta_m)/(4G) = c^2\theta_{\rm def}/(8\pi G)$ independently of the string's physical width $r_0$. There
won't be room outside the string for the Universe as we know it unless
$\mu < \tfrac14c^2/G = 3.37\times10^{26}\,\mathrm{kg\,m^{-1}}$. Particle theorists think strings may exist with line densities of
order a thousandth of this.
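The relation between line density and deficit angle is simple enough to evaluate directly. The sketch below inverts μ = c²θ_def/(8πG) using rounded SI constants; the function name is hypothetical, and the factor 10⁻³ simply encodes the "thousandth" mentioned above.

```python
import math

G = 6.674e-11   # m^3 kg^-1 s^-2 (rounded)
c = 2.998e8     # m s^-1 (rounded)

def deficit_angle(mu):
    """theta_def in radians from mu = c^2 theta_def / (8 pi G)."""
    return 8 * math.pi * G * mu / c**2

mu_max = c**2 / (4 * G)                   # upper limit on the line density, kg/m
theta_at_limit = deficit_angle(mu_max)    # the wedge then removes the whole plane
theta_light = deficit_angle(1e-3 * mu_max)
print(mu_max, theta_at_limit, math.degrees(theta_light))
```

At the limiting line density the deficit angle reaches 2π, which is the sense in which no room is left outside the string; a string a thousand times lighter removes a wedge of 2π × 10⁻³ radians, roughly a third of a degree.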
9.7 Summary
The cosmic microwave background defines a natural coordinate system for cosmology.
On large scales the Universe appears to be strikingly homogeneous and isotropic. This
implies that equal-time hypersurfaces must have the geometry of either (i) the 3-sphere,
(ii) flat space, or (iii) hyperbolic space according as the mean cosmic density ρ is greater
than, equal to, or less than $\rho_{\rm crit} \simeq 10^{-26}\,\mathrm{kg\,m^{-3}}$. It is widely believed that $\rho = \rho_{\rm crit}$
although measurements suggest a smaller value.
The cosmic scale when the light we detect from a distant object was emitted
can be deduced from the redshift $z$ of the object's spectrum: $1 + z = \omega_{\rm emit}/\omega_{\rm obs} =
a(t_{\rm obs})/a(t_{\rm emit})$. The most distant objects are seen at an epoch when $a$ was smaller
than now by more than a factor 5.
The expansion of the Universe will cease only if $\rho > \rho_{\rm crit}$. At early times we
always have $\rho \simeq \rho_{\rm crit}$ and the cosmic scale grows as $a \propto t^{2/3}$. If the wild speculations
of high-energy physicists are to be believed, very early on there may have been an
inflationary phase in which $a \propto e^{\gamma t}$ and the entire observable Universe grew out of a
single quantum fluctuation. If the calibrations of astronomers are to be believed, about
two thirds of the energy density in the Universe is currently contributed by vacuum
energy, and the Universe is just starting on a second inflationary episode.
A Appendices
A Matrix Manipulation
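Thus to evaluate $A^\lambda{}_\nu \equiv g_{\mu\nu}B^{\lambda\mu}$ we first rearrange to ensure that we are summing over
adjacent indices:
$$
A^\lambda{}_\nu \equiv g_{\mu\nu}B^{\lambda\mu} = B^{\lambda\mu}g_{\mu\nu} = (\mathbf{B}\cdot\mathbf{g})^\lambda{}_\nu.
$$
We may have to transpose a tensor to do this:
$$
A_\nu{}^\lambda \equiv B_{\mu\nu}C^{\mu\lambda} = (B^T)_{\nu\mu}C^{\mu\lambda} = (\mathbf{B}^T\cdot\mathbf{C})_\nu{}^\lambda.
$$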
In particular:
(i) to raise/lower first index, premultiply by g – in special relativity this just changes
the sign of the top row;
(ii) to raise/lower second index, postmultiply by g – in special relativity this just
changes the sign of the left column.
Doubly contracted 2nd rank tensors are just the trace of a product matrix:
$$
A^{\mu\nu}B_{\mu\nu} = \sum_\mu\left(\mathbf{A}\cdot\mathbf{B}^T\right)^\mu{}_\mu = \operatorname{trace}(\mathbf{A}\cdot\mathbf{B}^T).
$$
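The rules above can be tried out on small integer matrices. The sketch below uses plain nested lists and assumes the special-relativistic metric η = diag(−1, 1, 1, 1) in units with c = 1; the matrices B and C are arbitrary test data and the helper names are hypothetical.

```python
def matmul(X, Y):
    """Matrix product: sums over the adjacent (inner) indices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def trace(X):
    return sum(X[i][i] for i in range(len(X)))

eta = [[-1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]  # assumed metric
B = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
C = [[2, 0, 1, 3], [1, 1, 0, 2], [0, 4, 1, 1], [3, 2, 2, 0]]
idx = range(4)

# A^lam_nu = g_{mu nu} B^{lam mu}: explicit index sum versus the product B.g
lowered_sum = [[sum(eta[mu][nu] * B[lam][mu] for mu in idx) for nu in idx] for lam in idx]
lowered_matrix = matmul(B, eta)

# A_nu^lam = B_{mu nu} C^{mu lam}: the sum is over B's first index, so transpose B
contracted_sum = [[sum(B[mu][nu] * C[mu][lam] for mu in idx) for lam in idx] for nu in idx]
contracted_matrix = matmul(transpose(B), C)

# Double contraction as a trace
double_sum = sum(C[mu][nu] * B[mu][nu] for mu in idx for nu in idx)
double_trace = trace(matmul(C, transpose(B)))

print(lowered_sum == lowered_matrix, contracted_sum == contracted_matrix,
      double_sum == double_trace)
```

With this η, postmultiplying B by the metric flips the sign of its left column and premultiplying flips the sign of its top row, as stated in (i) and (ii).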
It’s easy to see that on raising each index with an η we get the same pattern for the
up symbol.
The $\epsilon$ symbols are useful for taking determinants:
etc.
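One standard way in which the permutation symbol yields a determinant is the identity det M = Σ ε_{ijkl} M_{0i} M_{1j} M_{2k} M_{3l}. The sketch below illustrates that identity only; it is not necessarily the form the notes had in mind. It assumes the convention ε_{0123} = +1, and the function names are hypothetical.

```python
from itertools import permutations

def levi_civita(indices):
    """Permutation symbol with epsilon(0, 1, 2, 3) = +1 (assumed convention).

    Returns 0 if any index repeats, otherwise the sign of the permutation,
    found by counting the swaps needed to sort it.
    """
    if len(set(indices)) != len(indices):
        return 0
    idx = list(indices)
    sign = 1
    for i in range(len(idx)):
        while idx[i] != i:
            j = idx[i]
            idx[i], idx[j] = idx[j], idx[i]
            sign = -sign
    return sign

def det4(M):
    """det M = sum over permutations p of epsilon(p) * M[0][p0] M[1][p1] M[2][p2] M[3][p3]."""
    total = 0
    for p in permutations(range(4)):
        term = levi_civita(p)
        for row in range(4):
            term *= M[row][p[row]]
        total += term
    return total

eta = [[-1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
upper = [[2, 5, 1, 7], [0, 3, 4, 6], [0, 0, 1, 1], [0, 0, 0, 4]]
print(det4(eta), det4(upper))
```

Only terms with four distinct column indices survive, which is why the sum can run over permutations rather than over all 4⁴ index combinations.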
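Transforming to a curvilinear coordinate system we find
$$
\begin{aligned}
\epsilon'^{\alpha\beta\gamma\delta}
 &= \frac{\partial x'^\alpha}{\partial x^\kappa}\frac{\partial x'^\beta}{\partial x^\lambda}
    \frac{\partial x'^\gamma}{\partial x^\mu}\frac{\partial x'^\delta}{\partial x^\nu}\,\epsilon^{\kappa\lambda\mu\nu} \\
 &= \frac{\partial(x')}{\partial(x)}\,\epsilon^{\alpha\beta\gamma\delta}.
\end{aligned}
\tag{9.1}
$$
But
$$
\left|g'^{\alpha\beta}\right|
 = \left|\frac{\partial x'^\alpha}{\partial x^\kappa}\frac{\partial x'^\beta}{\partial x^\lambda}\,\eta^{\kappa\lambda}\right|
 = -\left(\frac{\partial(x')}{\partial(x)}\right)^2.
$$
So we can write (9.1) as
$$
\epsilon'^{\alpha\beta\gamma\delta} = \sqrt{-|g^{\mu\nu}|}\;\epsilon^{\alpha\beta\gamma\delta}. \tag{9.2}
$$
Similarly, $\epsilon'_{0123} = \sqrt{-|g_{\mu\nu}|} = 1/\sqrt{-|g^{\mu\nu}|}$. Hence in g.r. the $\epsilon$ symbols are not made
up of nought and one, but of nought and $\sqrt{-|g^{\mu\nu}|}$. Also in g.r. the up and down forms
of $\epsilon$ are distinct.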
In general $\epsilon$ has two jobs: (i) it extracts the totally antisymmetric parts of tensors;
(ii) it maps one-to-one totally antisymmetric $n$th rank tensors into totally antisymmetric
tensors of rank $(4 - n)$. The correspondence F ↔ F is an example of this map at
work.
B Derivation of $R_\alpha{}^\beta{}_{;\beta} = \tfrac12 R_{;\alpha}$
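$$
\begin{aligned}
R_\alpha{}^\beta{}_{;\beta}
 &= \frac{\partial}{\partial x^\beta}\left[g^{\beta\gamma}\left(\frac{\partial\Gamma^\mu_{\alpha\mu}}{\partial x^\gamma}
    - \frac{\partial\Gamma^\mu_{\alpha\gamma}}{\partial x^\mu}\right)\right] \\
 &= \tfrac12\frac{\partial}{\partial x^\beta}\left[g^{\beta\gamma}\frac{\partial}{\partial x^\gamma}
    \left(g^{\mu\nu}\frac{\partial g_{\mu\nu}}{\partial x^\alpha}\right)\right] \\
 &\quad - \tfrac12\frac{\partial}{\partial x^\beta}\left\{g^{\beta\gamma}\frac{\partial}{\partial x^\mu}
    \left[g^{\mu\nu}\left(\frac{\partial g_{\alpha\nu}}{\partial x^\gamma} + \frac{\partial g_{\nu\gamma}}{\partial x^\alpha}
    - \frac{\partial g_{\gamma\alpha}}{\partial x^\nu}\right)\right]\right\}.
\end{aligned}
\tag{B.3}
$$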
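Since the covariant derivative of $g$ always vanishes, and $\Gamma = 0$ at X, all first derivatives
of $g$ must vanish at X. Dropping from $R_\alpha{}^\beta{}_{;\beta}$ all terms which contain a first derivative
of $g$, we find
$$
R_\alpha{}^\beta{}_{;\beta} = \tfrac12 g^{\beta\gamma}g^{\mu\nu}\left(
   \frac{\partial^3 g_{\mu\nu}}{\partial x^\alpha\partial x^\beta\partial x^\gamma}
 - \frac{\partial^3 g_{\alpha\nu}}{\partial x^\beta\partial x^\gamma\partial x^\mu}
 - \frac{\partial^3 g_{\nu\gamma}}{\partial x^\alpha\partial x^\beta\partial x^\mu}
 + \frac{\partial^3 g_{\gamma\alpha}}{\partial x^\beta\partial x^\mu\partial x^\nu}\right).
$$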
The second and fourth terms cancel because in each case two of the partial derivatives
are contracted together and the third is contracted with an index of the component of
g being differentiated. Hence
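$$
R_\alpha{}^\beta{}_{;\beta} = \tfrac12\frac{\partial}{\partial x^\alpha}\left[g^{\beta\gamma}g^{\mu\nu}\left(
   \frac{\partial^2 g_{\mu\nu}}{\partial x^\beta\partial x^\gamma}
 - \frac{\partial^2 g_{\nu\gamma}}{\partial x^\beta\partial x^\mu}\right)\right]. \tag{B.4}
$$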
From the definition (6.15) of the Ricci scalar R we have in our special coordinate
system
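$$
\begin{aligned}
R &= g^{\beta\gamma}\left(\frac{\partial\Gamma^\mu_{\beta\mu}}{\partial x^\gamma}
     - \frac{\partial\Gamma^\mu_{\beta\gamma}}{\partial x^\mu}\right) \\
  &= \tfrac12 g^{\beta\gamma}g^{\mu\nu}\left(
     \frac{\partial^2 g_{\mu\nu}}{\partial x^\gamma\partial x^\beta}
   - \frac{\partial^2 g_{\nu\beta}}{\partial x^\gamma\partial x^\mu}
   - \frac{\partial^2 g_{\nu\gamma}}{\partial x^\mu\partial x^\beta}
   + \frac{\partial^2 g_{\gamma\beta}}{\partial x^\mu\partial x^\nu}\right).
\end{aligned}
\tag{B.5}
$$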
The first and fourth terms are equal, as are the second and third. Comparing this
expression with (B.4) we obtain the desired relation.
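The final comparison can be spelled out as follows. This is a sketch of the last step only, valid in the special coordinates at X, where first derivatives of $g$ vanish so that $\partial/\partial x^\alpha$ acts only on the second derivatives; it uses nothing beyond (B.4) and (B.5).

```latex
% Combine equal terms in (B.5): first = fourth, second = third.
R = g^{\beta\gamma} g^{\mu\nu}\left(
      \frac{\partial^2 g_{\mu\nu}}{\partial x^\beta \partial x^\gamma}
    - \frac{\partial^2 g_{\nu\gamma}}{\partial x^\beta \partial x^\mu}\right)

% Differentiate and compare with (B.4); R is a scalar, so R_{;\alpha} = \partial R / \partial x^\alpha.
\tfrac12 R_{;\alpha}
  = \tfrac12 \frac{\partial}{\partial x^\alpha}\left[ g^{\beta\gamma} g^{\mu\nu}\left(
      \frac{\partial^2 g_{\mu\nu}}{\partial x^\beta \partial x^\gamma}
    - \frac{\partial^2 g_{\nu\gamma}}{\partial x^\beta \partial x^\mu}\right)\right]
  = R_\alpha{}^\beta{}_{;\beta}
```

Because both sides are tensors, an equality established at X in one coordinate system holds in every coordinate system.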