Differential Geometry
Wulf Rossmann
This is a collection of lecture notes which I put together while teaching courses
on manifolds, tensor analysis, and differential geometry. I offer them to you in
the hope that they may help you, and to complement the lectures. The style
is uneven, sometimes pedantic, sometimes sloppy, sometimes telegram style,
sometimes long–winded, etc., depending on my mood when I was writing those
particular lines. At least this set of notes is visibly finite. There are a great
many meticulous and voluminous books written on the subject of these notes
and there is no point in writing another one of that kind. After all, we are
talking about some fairly old mathematics, still useful, even essential, as a tool
and still fun, I think, at least some parts of it.
A comment about the nature of the subject (elementary differential geometry
and tensor calculus) as presented in these notes. I see it as a natural continuation
of analytic geometry and calculus. It provides some basic equipment, which is
indispensable in many areas of mathematics (e.g. analysis, topology, differential
equations, Lie groups) and physics (e.g. classical mechanics, general relativity,
all kinds of field theories).
If you want to have another view of the subject you should by all means look
around, but I suggest that you don’t attempt to use other sources to straighten
out problems you might have with the material here. It would probably take
you much longer to familiarize yourself sufficiently with another book to get
your question answered than to work out your problem on your own. Even
though these notes are brief, they should be understandable to anybody who
knows calculus and linear algebra to the extent usually seen in second-year
courses. There are no difficult theorems here; it is rather a matter of providing
a framework for various known concepts and theorems in a more general and
more natural setting. Unfortunately, this requires a large number of definitions
and constructions which may be hard to swallow and even harder to digest.
(In this subject the definitions are much harder than the theorems.) In any
case, just by randomly leafing through the notes you will see many complicated
looking expressions. Don’t be intimidated: this stuff is easy. When you looked
at a calculus text for the first time in your life it probably looked complicated as
well. Let me quote a piece of advice by Hermann Weyl from his classic Raum–
Zeit–Materie of 1918 (my translation). Many will be horrified by the flood of
formulas and indices which here drown the main idea of differential geometry
(in spite of the author’s honest effort for conceptual clarity). It is certainly
regrettable that we have to enter into purely formal matters in such detail and
give them so much space; but this cannot be avoided. Just as we have to spend
laborious hours learning language and writing to freely express our thoughts, so
the only way that we can lessen the burden of formulas here is to master the tool
of tensor analysis to such a degree that we can turn to the real problems that
concern us without being bothered by formal matters.
W. R.
Flow chart
1. Manifolds
1.1 Review of linear algebra and calculus· · · 9
1.2 Manifolds: definitions and examples· · · 25
1.3 Vectors and differentials· · · 39
1.4 Submanifolds· · · 53
1.5 Riemann metrics· · · 62
1.6 Tensors· · · 77
2. Connections and curvature
2.1 Connections· · · 87
2.2 Geodesics· · · 102
2.3 Riemann curvature· · · 108
2.4 Gauss curvature· · · 113
2.5 Levi-Civita’s connection· · · 123
2.6 Curvature identities· · · 132
3. Calculus on manifolds
3.1 Differential forms· · · 136
3.2 Differential calculus· · · 144
3.3 Integral calculus· · · 150
3.4 Lie derivatives· · · 160
4. Special topics
4.1 General Relativity· · · 173
4.2 The Schwarzschild metric· · · 179
4.3 The rotation group SO(3)· · · 188
4.4 Cartan’s mobile frame· · · 197
4.5 Weyl’s gauge theory paper of 1929· · · 203
Contents
1 Manifolds 9
1.1 Review of linear algebra and calculus . . . . . . . . . . . . . . . . 9
1.2 Manifolds: definitions and examples . . . . . . . . . . . . . . . . 25
1.3 Vectors and differentials . . . . . . . . . . . . . . . . . . . . . . . 39
1.4 Submanifolds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
1.5 Riemann metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
1.6 Tensors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
Index 222
Chapter 1
Manifolds
where ε_{i_1 ··· i_n} = ±1 is the sign of the permutation (i_1, ··· , i_n) of (1, ··· , n). This seems to depend on the basis we use to write A as a matrix (a_ij), but in fact it doesn't. Recall the following theorem.
1.1.2 Theorem. A : V → V is invertible if and only if det(A) ≠ 0.
There is a formula for A^{-1} (Cramer's Formula) which says that

    A^{-1} = (1/det(A)) Ã,   ã^j_i = (−1)^{i+j} det[a^l_k | k ≠ j, l ≠ i].
The ij entry ãji of à is called the ji cofactor of A, as you can look up in your
linear algebra text. This formula is rarely practical for the actual calculation of
A−1 for a particular A, but it is sometimes useful for theoretical considerations
or for matrices with variable entries.
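Cramer's formula is easy to check in a small case. The following is an illustrative sketch (not from the notes; the helper names are mine), specializing the cofactor formula to n = 2:

```python
# Illustrative sketch (not from the notes): Cramer's formula A^{-1} = (1/det A) Ã,
# specialized to 2x2 matrices.

def det2(a):
    # determinant of a 2x2 matrix given as nested lists
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

def inverse2(a):
    # Cramer's formula for n = 2: each entry of A^{-1} is a signed
    # complementary 1x1 minor (cofactor) divided by det(A)
    d = det2(a)
    return [[ a[1][1] / d, -a[0][1] / d],
            [-a[1][0] / d,  a[0][0] / d]]

A = [[2.0, 1.0], [5.0, 3.0]]          # det A = 1
Ainv = inverse2(A)
prod = [[sum(A[i][k] * Ainv[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)]            # should be the identity matrix
```

Multiplying A by the computed inverse should reproduce the identity matrix, which is exactly what the formula promises.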
The rank of a linear transformation A : V → W is the dimension of the image
of A, i.e. of
im(A) = {w ∈ W : w = Av for some v ∈ V }.
This rank is equal to the maximal number of linearly independent columns of the matrix (a_ij), and equals the maximal number of linearly independent rows as well. The linear map A : V → W is surjective (i.e. onto) iff rank(A) = m and injective (i.e. one–to–one) iff rank(A) = n.
A linear functional on V is a scalar–valued linear function ϕ : V → R. In terms of components with respect to a basis {v_1, ··· , v_n} we can write v = Σ x^i v_i and ϕ(v) = Σ ξ_i x^i. For example, if we take (ξ_1, ξ_2, ··· , ξ_n) = (0, ··· , 1, ··· , 0) with the 1 in the i–th position, then we get the linear functional v → x^i which picks out the i–th component of v relative to the basis {v_1, ··· , v_n}. This functional is denoted v^i (index upstairs). Thus v^i(Σ x^j v_j) = x^i. This means that v^i(v_j) = δ^i_j. The set of all linear functionals on V is called the dual space of V.
From this it follows that (a^*)_i^j = a^j_i. So as long as you write everything in terms of components you never need to mention (a^*)_i^j at all. (This may surprise you: in your linear algebra book you will see a definition which says that the transpose [(a^*)_{ij}] of a matrix [a_{ij}] satisfies (a^*)_{ij} = a_{ji}. What happened?)
1.1.3 Examples. The only example of a vector space we have seen so far is
the space of n–tuples Rn . In a way, this is the only example there is, since any
n–dimensional vector space can be identified with Rn by means of a basis. But
one must remember that this identification depends on the choice of a basis!
(a) The set Hom(V, W ) consisting of all linear maps A : V → W between two
vector spaces is again a vector space with the natural operations of addition
and scalar multiplication. If we choose bases for V and W then we can identify A with its matrix (a^j_i), and so Hom(V, W ) gets identified with the matrix space R^{m×n}.
(b) A function U × V → R, (u, v) → f (u, v) is bilinear if f (u, v) is linear in u and in v separately. These functions form again a vector space. In terms of bases we can write u = ξ^i u_i, v = η^j v_j and then f (u, v) = c_{ij} ξ^i η^j. Thus after choice of bases we may identify the space of such f's again with R^{n×m}. We can also consider multilinear functions U × ··· × V → R of any finite number of vector variables, which are linear in each variable separately. Then we get f (u, ··· , v) = c_{i···j} ξ^i ··· η^j. A similar construction applies to maps U × ··· × V → W with values in another vector space.
(c) Let S be a sphere in 3–space. The tangent space V to S at a point po ∈ S
consists of all vectors v which are orthogonal to the radial line through po .
Fig. 1: the tangent plane V to S at po, with a tangent vector v.
It does not matter how you think of these vectors: whether geometrically, as
arrows, which you can imagine as attached to po , or algebraically, as 3–tuples
(ξ, η, ζ) in R3 satisfying xo ξ + yo η + zo ζ = 0; in any case, this tangent space is
a 2–dimensional vector space associated to S and po . But note: if you think of
V as the points of R3 lying in the plane xo ξ + yo η + zo ζ =const. through po
tangential to S, then V is not a vector space with the operations of addition and
scalar multiplication in the surrounding space R3 . One should rather think of V
as vector space of its own, and its embedding in R3 as something secondary, a
point of view which one may also take of the sphere S itself. You may remember
a similar construction of the tangent space V to any surface S = {p = (x, y, z) |
f (x, y, z) = 0} at a point po : it consists of all vectors orthogonal to the gradient
of f at po and is defined only if this gradient is not zero. We shall return to
this example in a more general context later.
B. Differential calculus. The essence of calculus is local linear approximation.
What it means is this. Consider a map f : Rn → Rm and a point xo ∈ Rn in
its domain. f admits a local linear approximation at xo if there is a linear map A : R^n → R^m so that

    f (xo + v) = f (xo) + A(v) + o(v),   (2)

where o(v) denotes a function with o(v)/‖v‖ → 0 as v → 0. The linear map A, if it exists, is unique; it is called the differential of f at xo and is denoted df_xo.
(a) Suppose f is itself linear. Then f (xo + v) = f (xo) + f (v), so (2) holds with A(v) = f (v) and o(v) ≡ 0. Thus for a linear map df_x(v) = f (v) for all x and v.
(b) Suppose f (x, y) is bilinear, linear in x and y separately. Then

    f (xo + v, yo + w) = f (xo, yo) + f (xo, w) + f (v, yo) + f (v, w).

It follows that (2) holds with x replaced by (x, y) and v by (v, w) if we take A(v, w) = f (xo, w) + f (v, yo) and o((v, w)) = f (v, w). Thus for all x, y and v, w

    df_(x,y)(v, w) = f (x, w) + f (v, y).
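In the bilinear case the differential is thus df_(x,y)(v, w) = f (x, w) + f (v, y). As a sketch (the example f is my own choice, not from the notes), one can test this numerically for the dot product on R², using a difference quotient along t → (x + tv, y + tw):

```python
# Check df_(x,y)(v,w) = f(x,w) + f(v,y) numerically for a bilinear map.

def f(u, v):
    # a bilinear map R^2 x R^2 -> R: the standard dot product
    return u[0] * v[0] + u[1] * v[1]

x, y = (1.0, 2.0), (3.0, -1.0)
v, w = (0.5, -0.5), (2.0, 1.0)

exact = f(x, w) + f(v, y)             # the differential applied to (v, w)

t = 1e-6
xt = tuple(x[i] + t * v[i] for i in range(2))
yt = tuple(y[i] + t * w[i] for i in range(2))
numeric = (f(xt, yt) - f(x, y)) / t   # difference quotient; error is t*f(v,w)
```

The discrepancy between the two values is exactly the little-o term t·f(v, w), which vanishes as t → 0.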
The following theorem is a basic criterion for deciding when a map f is differ-
entiable.
1.1.5 Theorem. Let f : R^n ··· → R^m be a map. If the partial derivatives ∂f^j/∂x^i exist and are continuous in a neighbourhood of xo, then f is differentiable at xo. Furthermore, the matrix of its differential df_xo : R^n → R^m is the Jacobian matrix (∂f^j/∂x^i)_xo.
(Proof omitted)
Thus if we write y = f (x) as y^j = f^j(x) where x = (x^i) ∈ R^n and y = (y^j) ∈ R^m, then we have

    df_xo(v) = (∂f^j/∂x^i)_xo v^i.
We shall often suppress the subscript xo , when the point xo is understood or
unimportant, and then simply write df . A function f as in the theorem is said
to be of class C 1 ; more generally, f is of class C k if all partials of order ≤ k exist
and are continuous; f is of class C ∞ if it has continuous partials of all orders.
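As a numerical sketch of Theorem 1.1.5 (the map is my own example, not from the notes): compute df_xo(v) from the hand-computed Jacobian matrix and compare it with the difference quotient (f (xo + tv) − f (xo))/t:

```python
import math

# Example map (my choice): f(x1, x2) = (x1*x2, sin x1), with Jacobian
# [[x2, x1], [cos x1, 0]] computed by hand.

def f(x):
    return (x[0] * x[1], math.sin(x[0]))

def jacobian(x):
    return [[x[1], x[0]],
            [math.cos(x[0]), 0.0]]

xo = (1.0, 2.0)
v = (0.3, -0.7)

# df_xo(v): the Jacobian matrix applied to the vector v
J = jacobian(xo)
df_v = tuple(sum(J[i][j] * v[j] for j in range(2)) for i in range(2))

# difference quotient (f(xo + t v) - f(xo)) / t for small t
t = 1e-6
fxo = f(xo)
fxt = f(tuple(xo[j] + t * v[j] for j in range(2)))
quotient = tuple((fxt[i] - fxo[i]) / t for i in range(2))
```

The two vectors agree up to terms of order t, as the definition of the differential requires.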
1.1.6 Example. (a) Consider a function² f : R^n ··· → R, y = f (x^1, ··· , x^n), of n variables x^i with scalar values. The formula for df (v) becomes

    df (v) = (∂f/∂x^1) v^1 + ··· + (∂f/∂x^n) v^n.

This can be written as a matrix product: the row vector [∂f/∂x^1 ··· ∂f/∂x^n] times the column vector with entries v^1, ··· , v^n. Thus df can be represented by the row vector with components ∂f/∂x^i.
(b) Consider a function f : R ··· → R^n, (x^1, ··· , x^n) = (f^1(t), ··· , f^n(t)), of one variable t with values in R^n. (The n here plays the role of the m above.) The formula for df (v) becomes

    df^i(v) = (df^i/dt) v,   i = 1, ··· , n.

In matrix notation we could represent df by the column vector with components df^i/dt (v is now a scalar). Geometrically such a function represents a parametrized curve in R^n and we can think of p = f (t) as the position of a moving point at time t. We shall usually omit the function symbol f and simply write p = p(t) or x^i = x^i(t). Instead of df^i/dt we write ẋ^i(t); instead of df we write ṗ(t), which we think of as a vector with components ẋ^i(t), the velocity vector of the curve.
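A small sketch (the curve is chosen by me for illustration): for the circle p(t) = (cos t, sin t) the velocity vector is ṗ(t) = (−sin t, cos t), which a difference quotient reproduces; its length is 1, as it must be for this unit-speed parametrization:

```python
import math

def p(t):
    # a parametrized curve in R^2: the unit circle
    return (math.cos(t), math.sin(t))

def pdot(t):
    # its velocity vector, computed by hand
    return (-math.sin(t), math.cos(t))

t0, h = 0.8, 1e-6
numeric = tuple((p(t0 + h)[i] - p(t0)[i]) / h for i in range(2))
exact = pdot(t0)
speed = math.hypot(exact[0], exact[1])   # length of the velocity vector
```
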
2 The dotted arrow indicates a partially defined map.
1.1.7 Theorem (Chain Rule). Let f : R^n ··· → R^m and g : R^m ··· → R^l be differentiable maps. Then the composite g ◦ f is also differentiable and

    d(g ◦ f)_x = dg_f(x) ◦ df_x

(whenever defined).
The proof needs some preliminary remarks. Define the norm ‖A‖ of a linear transformation A : R^n → R^m by the formula ‖A‖ = max_{‖x‖=1} ‖A(x)‖. This max is finite since the sphere ‖x‖ = 1 is compact (closed and bounded). Then ‖A(x)‖ ≤ ‖A‖ ‖x‖ for any x ∈ R^n: this is clear from the definition if ‖x‖ = 1 and follows for any x since A(x) = ‖x‖ A(x/‖x‖) if x ≠ 0.
Proof. Fix x and set k(v) = f (x + v) − f (x). Then by the differentiability of g,

    g(f (x + v)) − g(f (x)) = dg_f(x)(k(v)) + o(k(v))

and by the differentiability of f,

    k(v) = f (x + v) − f (x) = df_x(v) + o(v).

We use the notation O(v) for any function satisfying O(v) → 0 as v → 0. (The letter O can stand for a different such function each time it occurs.) Then o(v) = ‖v‖O(v) and similarly o(k(v)) = ‖k(v)‖O(k(v)) = ‖k(v)‖O(v). Substituting the expression for k(v) into the preceding equation gives

    g(f (x + v)) − g(f (x)) = dg_f(x)(df_x(v)) + ‖v‖ dg_f(x)(O(v)) + ‖k(v)‖O(v).

We have ‖k(v)‖ ≤ ‖df_x(v)‖ + ‖o(v)‖ ≤ ‖df_x‖ ‖v‖ + C‖v‖ ≤ C′‖v‖. It follows that

    g(f (x + v)) − g(f (x)) = dg_f(x)(df_x(v)) + ‖v‖O(v)

for yet another O.
Of particular importance is the special case of the Chain Rule for curves. A
curve in Rn is a function t → x(t), R · · · → Rn , defined on some interval. In that
case we also write ẋ(t) or dx(t)/dt for its derivative vector (dxi (t)/dt) (which is
just the differential of t → x(t) considered as a vector).
1.1.8 Corollary (Chain Rule: special case). Let f : R^n ··· → R^m be a differentiable map and p(t) a differentiable curve in R^n. Then f (p(t)) is also differentiable (where defined) and

    df (p(t))/dt = df_p(t)(dp(t)/dt).

Geometrically, this equation says that the tangent vector df (p(t))/dt to the image f (p(t)) of the curve p(t) under the map f is the image of its tangent vector dp(t)/dt under the differential df_p(t).
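A sketch of Corollary 1.1.8 with data of my own choosing: take f (x, y) = x² + y² and the curve p(t) = (cos t, t). Then df_p(t)(ṗ(t)) is the gradient of f at p(t) applied to the velocity vector, and it matches the direct t-derivative of f (p(t)):

```python
import math

def f(x, y):
    return x * x + y * y

t0 = 0.5
pt = (math.cos(t0), t0)           # p(t0)
pdot = (-math.sin(t0), 1.0)       # velocity p'(t0), computed by hand

# right-hand side: df_p(p') = 2x * p'_x + 2y * p'_y (gradient dotted with velocity)
rhs = 2 * pt[0] * pdot[0] + 2 * pt[1] * pdot[1]

# left-hand side: difference quotient of t -> f(p(t))
h = 1e-6
lhs = (f(math.cos(t0 + h), t0 + h) - f(pt[0], pt[1])) / h
```
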
1.1.9 Corollary. Let f : R^n ··· → R^m be a differentiable map, po ∈ R^n a point and v ∈ R^n any vector. Then

    df_po(v) = (d/dt)|_{t=to} f (p(t))

for any differentiable curve p(t) with p(to) = po and ṗ(to) = v.
Fig. 2: polar coordinates (r, θ) of a point in the xy-plane.
(c) Let z = f (r, θ) be a function given in polar coordinates. Then ∂z/∂x and ∂z/∂y are found by solving the equations

    ∂z/∂r = (∂z/∂x)(∂x/∂r) + (∂z/∂y)(∂y/∂r) = (∂z/∂x) cos θ + (∂z/∂y) sin θ
    ∂z/∂θ = (∂z/∂x)(∂x/∂θ) + (∂z/∂y)(∂y/∂θ) = (∂z/∂x)(−r sin θ) + (∂z/∂y)(r cos θ)

which gives (e.g. using Cramer's Rule)

    ∂z/∂x = cos θ (∂z/∂r) − (sin θ/r)(∂z/∂θ)
    ∂z/∂y = sin θ (∂z/∂r) + (cos θ/r)(∂z/∂θ)
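These formulas can be spot-checked numerically. A sketch (the function is my own choice): for z = f (r, θ) = r² cos θ, which in Cartesian coordinates is z = x·√(x² + y²), the formula for ∂z/∂x should agree with a derivative computed directly in x:

```python
import math

# Function chosen for illustration: z = r^2 cos(theta) = x * sqrt(x^2 + y^2).

r, theta = 2.0, 0.7
x, y = r * math.cos(theta), r * math.sin(theta)

dz_dr = 2 * r * math.cos(theta)          # polar partials, computed by hand
dz_dtheta = -r * r * math.sin(theta)

# the formula from the text: dz/dx = cos(theta) dz/dr - (sin(theta)/r) dz/dtheta
dz_dx_formula = math.cos(theta) * dz_dr - (math.sin(theta) / r) * dz_dtheta

def z(x, y):
    # the same function written in Cartesian coordinates
    return x * math.sqrt(x * x + y * y)

h = 1e-6
dz_dx_numeric = (z(x + h, y) - z(x, y)) / h
```

For this function both routes give r(1 + cos²θ), which the assertions below confirm.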
x = r cos θ, y = r sin θ.
Thus det[∂(x, y)/∂(r, θ)] = r ≠ 0 except when r = 0, i.e. (x, y) = (0, 0). Hence
for any point (ro , θo ) with ro 6= 0 one can find a neighbourhood U of (ro , θo )
in the rθ-plane and a neighbourhood V of (xo , yo ) = (ro cos θo , ro sin θo ) in the
xy-plane so that F maps U one-to-one onto V and F : U → V has a C ∞ inverse
G : V → U . It is obtained by solving x = r cos θ, y = r sin θ for (r, θ), subject
to the restriction (r, θ) ∈ U, (x, y) ∈ V. For x ≠ 0 the solution can be written as

    r = ±√(x² + y²),  θ = arctan(y/x),

where the sign ± must be chosen so that (r, θ) lies in U. But this formula does not work when x = 0, even though a local inverse exists as long as (x, y) ≠ (0, 0).
For example, for any point off the negative x-axis one can take for U the region in the rθ-plane described by r > 0, −π < θ < π. V is then the corresponding region in the xy-plane, which is just the points off the negative x-axis. It is geometrically clear that the map (r, θ) → (x, y) maps U one-to-one onto V. For any point off the positive x-axis one can take for U the region in the rθ-plane described by r > 0, 0 < θ < 2π. V is then the corresponding region in the xy-plane, which is just the points off the positive x-axis. It is again geometrically clear that the map (r, θ) → (x, y) maps U one-to-one onto V.
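As a practical aside (mine, not in the notes): the two-argument arctangent packages these case distinctions into one formula and gives the local inverse on the domain r > 0, −π < θ < π, including the troublesome points with x = 0:

```python
import math

def F(r, theta):
    # the polar coordinate map (r, theta) -> (x, y)
    return (r * math.cos(theta), r * math.sin(theta))

def G(x, y):
    # its inverse on the points off the non-positive x-axis,
    # using the two-argument arctangent instead of arctan(y/x)
    return (math.hypot(x, y), math.atan2(y, x))

x, y = 0.0, 3.0              # a point where theta = arctan(y/x) breaks down
r, theta = G(x, y)           # here r = 3, theta = pi/2
x2, y2 = F(r, theta)         # should recover (x, y)
```
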
For the proof of the Inverse Function Theorem we need a lemma, which is itself
very useful.
1.1.16 Contraction Lemma. Let U be an open subset of R^n and F : U → U be any map of U into itself which is a contraction, i.e. there is a positive constant K < 1 so that for all x, y ∈ U, ‖F (y) − F (x)‖ ≤ K‖y − x‖. Then F has a unique fixed point xo in U, i.e. a point xo ∈ U so that F (xo) = xo. Moreover, for any x ∈ U,

    xo = lim_{m→∞} F^m(x),

where F^m denotes the m-fold composite F ◦ ··· ◦ F.
y + x − f (x) = x,
which says that x is a fixed point of the map h(x) := y + x − f (x), y being
given. So we’ll try to show that this h is a contraction map. The proof itself
takes several steps.
(1) Preliminaries. We may assume xo = 0, by composing f with the map x → x − xo. Next we may assume that df_0 = 1, the identity map, by replacing f by (df_0)^{-1} ◦ f.
g(x) = x − f (x).
    f^{-1}(y) − f^{-1}(y_1) − (df_{x_1})^{-1}(y − y_1) = x − x_1 − (df_{x_1})^{-1}(f (x) − f (x_1)).   (9)

Since f is differentiable,

    f (x) − f (x_1) = df_{x_1}(x − x_1) + o(x − x_1).   (10)

Since the linear map df_{x_1} depends continuously on x_1 ∈ B_r and det df_{x_1} ≠ 0 there, its inverse (df_{x_1})^{-1} is continuous there as well. (Think of Cramer's formula for A^{-1}.) Hence we can apply the argument of step (2) to (10) and find that

    ‖(df_{x_1})^{-1}(o(x − x_1))‖ ≤ C o(x − x_1)   (11)

for x, x_1 ∈ B_r. We also know that

    ‖x − x_1‖ ≤ 2‖y − y_1‖.

Combining (9), (10), and (11) gives

    f^{-1}(y) − f^{-1}(y_1) − (df_{x_1})^{-1}(y − y_1) = o(y − y_1)

for another little o. This proves that f^{-1} is differentiable at y_1 with (df^{-1})_{y_1} = (df_{x_1})^{-1}.
(7) Conclusion of the proof. Since f is of class C^1 the matrix entries ∂f^i/∂x^j of df_x are continuous functions of x, hence the matrix entries of

    (df^{-1})_y = (df_{f^{-1}(y)})^{-1}   (12)

are continuous functions of y. (Think again of Cramer's formula for A^{-1} and recall that x = f^{-1}(y) is a continuous function of y.) This proves the theorem for k = 1. For k = 2, we have to show that (12) is still C^1 as a function of y. As a function of y, the RHS of (12) is a composite of the maps y → x = f^{-1}(y), x → df_x, and A → A^{-1}. The first we just proved to be C^1; the second is C^1 because f is C^2; the third is C^1 by Cramer's formula for A^{-1}. This proves the theorem for k = 2, and the proof for any k follows from the same argument by induction.
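The iteration in the Contraction Lemma is easy to watch numerically. A sketch with an example map of my own choosing: F(x) = cos x maps the interval (0, 1) into itself and is a contraction there (one may take K = sin 1 < 1), so iterating it converges to the unique fixed point:

```python
import math

# F(x) = cos(x) maps (0, 1) into itself and |F'(x)| = |sin x| <= sin(1) < 1
# there, so iterating F from any starting point converges to the unique
# fixed point x_o with cos(x_o) = x_o.

x = 0.5
for _ in range(100):
    x = math.cos(x)               # x approaches x_o = lim F^m(x)

residual = abs(math.cos(x) - x)   # should be ~0 at the fixed point
```
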
EXERCISES 1.1
1. Let V be a vector space, {v_i} and {ṽ_i} two bases for V. Prove (summation convention throughout):
that for any linear transformation A of R3 , V (Au, Av, Aw) = | det A|V (u, v, w).
Show that this formula remains valid for the matrix of components with respect
to any right–handed orthonormal basis u1 , u2 , u3 of R3 . [Right–handed means
det[u1 , u2 , u3 ] = +1.]
9. Let f : (ρ, θ, φ) → (x, y, z) be the spherical coordinate map. [See problem 6.] For a given point (ρo, θo, φo) in ρθφ-space, let vρ, vθ, vφ be the vectors at the point (xo, yo, zo) in xyz-space which correspond to the three standard basis vectors eρ, eθ, eφ = (1, 0, 0), (0, 1, 0), (0, 0, 1) in ρθφ-space under the differential df of f.
(a) Show that vρ is the tangent vector of the ρ–coordinate curve in R^3, i.e. the curve with parametric equations ρ = arbitrary (parameter), θ = θo, φ = φo in spherical coordinates. Similarly for vθ and vφ. Sketch.
(b) Find the volume of the parallelepiped spanned by the three vectors vρ , vθ , vφ
at (x, y, z). [See problem 8.]
10. Let f, g be two differentiable functions Rn → R. Show from the definition
(1.4) of df that
d(f g) = (df )g + f (dg).
11. Suppose f : R^n → R is C^2. Show that the second partials are symmetric:

    (∂/∂x^i)(∂/∂x^j) f = (∂/∂x^j)(∂/∂x^i) f for all i, j.
[You may consult your calculus text.]
12. Use the definition of df to calculate the differential df_x(v) for the following functions f (x). [Notation: x = (x^i), v = (v^i).]
(a) Σ_i c_i x^i (c_i = constant)  (b) x^1 x^2 ··· x^n  (c) Σ_i c_i (x^i)^2.
x̃^i = f^i(x^1, ··· , x^n), i = 1, ··· , n.
Fig. 1: the coordinate transformation f from the coordinates x to the coordinates x̃.
and we shall take “coordinate system” to mean this map, in order to have
something definite. But one should be a bit flexible with this interpretation: for
example, in another convention one could equally well take “coordinate system“
to mean the inverse map Rn · · · → M, x → p(x) and when convenient we shall
also use the notation p(x) for the point of M corresponding to the coordinate
point x.
It will be noted that we use the same symbol x = (xi ) for both the coordinate
map p → x(p) and a general coordinate point x = (x1 , · · · , xn ), just as one
does with the xyz-coordinates in calculus. This is premeditated confusion: it
leads to a notation which tells one what to do (just like the ∂y i /∂xj in calculus)
and suppresses extra symbols for the coordinate map and its domain, which
are usually not needed. If it is necessary to have a name for the coordinate
domain we can write (U, x) for the coordinate system and if there is any danger
of confusion because of the double meaning of the symbol x (coordinate map
and coordinate point) we can write (U, φ) instead. With the notation (U, φ)
comes the term chart for coordinate system and the term atlas for any collection
{(Uα , φα )} of charts satisfying MAN 1–3. In any case, one should always keep
in mind that a manifold is not just a set of points, but a set of points together
with an atlas, so that one should strictly speaking consider a manifold as a
pair (M, {(Uα , φα )}). But we usually call M itself a manifold, and speak of
its manifold structure when it is necessary to remind ourselves of the atlas
{(Uα , φα )}. Here are some examples to illustrate the definition.
1.2.2 Example: the Cartesian plane. As the set of points we take R2 = {p =
(x, y) | x, y ∈ R}. As it stands, this set of points is not a manifold: we have to
specify a collection of coordinate systems satisfying the axioms MAN 1–3.
Each coordinate system will be a map p → (x1 (p), x2 (p)), which maps a domain
of points p, to be specified, one–to–one onto an open set of pairs (x1 , x2 ). We
list some possibilities.
Cartesian coordinates (x, y): x(p) = x, y(p) = y if p = (x, y). Domain: all p.
Note again that x, y are used to denote the coordinates of a general point p as well as to denote the coordinate functions p → x(p), y(p).
Polar coordinates (r, θ): x = r cos θ, y = r sin θ. Domain: (r, θ) may be used as coordinates in a neighbourhood of any point where ∂(x, y)/∂(r, θ) = r ≠ 0, i.e. in a neighbourhood of any point except the origin. A possible domain is the set of points p for which r > 0, 0 < θ < 2π, i.e. the p = (x, y) given by the above equations for these values of r, θ; these are just the p's off the positive x-axis.
Other domains are possible and sometimes more convenient.
Hyperbolic coordinates (u, ψ): x = u cosh ψ, y = u sinh ψ. Domain: the pairs (x, y) representable in this form are those satisfying x² − y² = u² ≥ 0. (u, ψ) may be used as coordinates in a neighbourhood of any point (x, y) in this region for which det ∂(x, y)/∂(u, ψ) = u ≠ 0. The whole region {(x, y) | x² − y² > 0} corresponds one-to-one to {(u, ψ) | u ≠ 0} and can serve as coordinate domain. In the region {(x, y) | x² − y² < 0} one can use x = u sinh ψ, y = u cosh ψ as coordinates.
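A quick numerical sketch of the claims in this paragraph: the hyperbolic parametrization satisfies x² − y² = u², and the Jacobian determinant of (u, ψ) → (x, y) is u, so the map gives good coordinates wherever u ≠ 0:

```python
import math

# Hyperbolic coordinates: x = u cosh(psi), y = u sinh(psi).

u, psi = 1.5, 0.8
x = u * math.cosh(psi)
y = u * math.sinh(psi)

invariant = x * x - y * y          # should equal u^2

# Jacobian matrix of (u, psi) -> (x, y), computed by hand
J = [[math.cosh(psi), u * math.sinh(psi)],
     [math.sinh(psi), u * math.cosh(psi)]]
detJ = J[0][0] * J[1][1] - J[0][1] * J[1][0]   # = u (cosh^2 - sinh^2) = u
```
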
Linear coordinates (u, v): u = ax + by, v = cx + dy, where [a b; c d] is any invertible 2 × 2 matrix (i.e. ad − bc ≠ 0). A special case are orthogonal linear coordinates, when the matrix is orthogonal. Domain: all p.
Affine coordinates (u, v): u = xo + ax + by, v = yo + cx + dy, where xo, yo are arbitrary and [a b; c d] is any invertible 2 × 2 matrix (i.e. ad − bc ≠ 0). A special case are Euclidean coordinates, when the matrix is orthogonal. Domain: all p.
As we mentioned, these are some possible choices for coordinate systems on
R2 . The question is which to choose for the collection of coordinate systems
required in MAN 1–3. For example, the Cartesian coordinate system {(x, y)}
by itself evidently satisfies the axioms MAN 1–3, hence this single coordinate
system suffices to make R2 into a manifold. On the other hand, we can add
the polar coordinates (r, θ), with domain r > 0, 0 < θ < 2π, say, so that we take as our collection of coordinates {(x, y), (r, θ)}. We now have to check
that the axioms are satisfied. The only thing which is not entirely clear is that
the coordinate transformation (x, y) → (r, θ) = (f 1 (x, y), f 2 (x, y)) is C ∞ on its
domain, as specified by MAN 2. First of all, the domain of this map consists of the (x, y) off the positive x–axis (the set {(x, y) | y = 0, x ≥ 0}), which corresponds to {(r, θ) | r > 0, 0 < θ < 2π}. Thus this domain is open, and it is clear geometrically that (x, y) → (r, θ) is a one–to–one correspondence between these sets. However, the map (x, y) → (r, θ) is not given explicitly, but is defined implicitly as the inverse of the map (r, θ) → (x, y) = (r cos θ, r sin θ).
We might try to simply write down the inverse map as

    r = f^1(x, y) = √(x² + y²),  θ = f^2(x, y) = arctan(y/x).

But this is not sufficient: as it stands, f^2(x, y) is not defined when x = 0, but some of these points belong to the domain of (x, y) → (r, θ). It is better to argue like this. As already mentioned, we know that the inverse map f exists on the domain indicated. Since the inverse of a map between two given sets is unique (if it exists), this map f must coincide with the local inverse F guaranteed by the Inverse Function Theorem wherever both are defined as maps between the same open sets. But the Inverse Function Theorem says also that F is C^∞ where defined. Hence f is C^∞ as well at points where some such F can be found, i.e. at all points of its domain where the Jacobian determinant ∂(x, y)/∂(r, θ) ≠ 0, i.e. r ≠ 0, and this includes all points in the domain of f.
Hence all three axioms are satisfied for M = R2 together with {(x, y), (r, θ)}.
Perhaps I have belabored the point of verifying MAN 2 for the implicitly defined map (x, y) → (r, θ), but it illustrates a typical situation and a typical argument. In the future I shall omit these details. There is another important
point. Instead of specifying the domain of the coordinates (r, θ) by r > 0, 0 <
θ < 2π, it is usually sufficient to know that the equations x = r cos θ, y = r sin θ
can be used to specify coordinates in some neighbourhood of any point (x, y)
where r 6= 0, as guaranteed by the Inverse Function Theorem. For example,
we could admit all of these local specifications of (r, θ) among our collection
of coordinates (without specifying explicitly the domains), and then all of the
axioms are evidently satisfied. This procedure is actually most appropriate,
since it brings out the flexibility in the choice of the coordinates. It is also
the procedure implicitly followed in calculus, where one ignores the restrictions
r > 0, 0 < θ < 2π as soon as one discusses a curve like r = sin 2θ.
We could add some or all of the other coordinates defined above and check again that the axioms still hold; we face again the question of which to choose.
One naturally has the feeling that it should not matter, but strictly speaking,
according to the definition, we would get a different manifold structure on the
point set R2 for each choice. We come back to this in a moment, but first
I briefly look at some more examples, without going through all the detailed
verifications, however.
Fig.: a point p on the sphere with the angles θ, φ.
Similarly with the negative root for z < 0 and with (x, y) replaced by (x, z) or by (y, z). This gives 6 coordinate systems, corresponding to the parallel projections onto the 3 coordinate planes and the 2 possible choices of sign ± in each case.
Geographical coordinates (θ, φ): x = cos θ sin φ, y = sin θ sin φ, z = cos φ. Domain: 0 < θ < 2π, 0 < φ < π. (Other domains are possible.)
Central projection coordinates (u, v):

    x = u/√(1 + u² + v²),  y = v/√(1 + u² + v²),  z = 1/√(1 + u² + v²).

Domain: z > 0.
Fig.: central projection of a point p of the upper hemisphere to the point (u, v) in the plane z = 1.
This is the central projection of the upper hemisphere z > 0 onto the plane
z = 1. One could also take the lower hemisphere or replace the plane z = 1
by x = 1 or by y = 1. This gives again 6 coordinate systems. The 6 parallel
projection coordinates by themselves suffice to make S 2 into a manifold, as do
the 6 central projection coordinates. The geographical coordinates do not, even if one takes all possible domains for (θ, φ): the north pole (0, 0, 1) never lies in a coordinate domain. However, if one defines another geographical coordinate system on S² with a different north pole (e.g. by replacing (x, y, z) by (z, y, x) in the formula) one obtains enough geographical coordinates to cover all of S².
All of the above coordinates could be defined with (x, y, z) replaced by any
orthogonal linear coordinate system (u, v, w) on R3 .
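A sketch checking the central projection formulas numerically: the parametrized point lies on the unit sphere with z > 0, and the inverse map is simply u = x/z, v = y/z (projection from the origin onto the plane z = 1):

```python
import math

# Central projection coordinates on the hemisphere z > 0 of S^2.

u, v = 0.6, -1.2
w = math.sqrt(1 + u * u + v * v)
x, y, z = u / w, v / w, 1 / w           # the parametrized point

on_sphere = x * x + y * y + z * z       # should be 1
u_back, v_back = x / z, y / z           # should recover (u, v)
```
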
Let's now go back to the definition of manifold and try to understand what it is trying to say. Compare the situation in analytic geometry. There one starts with some set of points which comes equipped with a distinguished Cartesian coordinate system (x^1, ··· , x^n), which associates to each point p of n–space a coordinate n–tuple (x^1(p), ··· , x^n(p)). Such a coordinate system is assumed fixed once and for all, so that we may as well take the points to be the n–tuples, which means that our "Cartesian space" becomes R^n, or perhaps some subset of R^n, which had better be open if we want to make sense out of differentiable functions. Nevertheless, one may introduce other curvilinear coordinates (e.g. polar coordinates in the plane) by giving the coordinate transformation to the Cartesian coordinates (e.g. x = r cos θ, y = r sin θ). Curvilinear coordinates are not necessarily defined for all points (e.g. polar coordinates are not defined at the origin) and need not take on all possible values (e.g. polar coordinates are restricted by r ≥ 0, 0 ≤ θ < 2π). The requirement of a distinguished Cartesian
coordinate system is of course often undesirable or physically unrealistic, and
the notion of a manifold is designed to do away with this requirement. However,
the axioms still require that there be some collection of coordinates. We shall
return in a moment to the question in how far this collection of coordinates
should be thought of as intrinsically distinguished.
The essential axiom is MAN 2, that any two coordinate systems should be
related by a differentiable coordinate transformation (in fact even C ∞ , but this
is a comparatively minor, technical point). This means that manifolds must
locally look like Rn as far as a differentiable map can “see“: all local properties
of Rn which are preserved by differentiable maps must apply to manifolds as
well. For example, some sets which one should expect to turn out to be manifolds
are the following. First and foremost, any open subset of some Rn , of course;
smooth curves and surfaces (e.g. a circle or a sphere); the set of all rotations of a Euclidean space (e.g. in two dimensions, a rotation is given by an angle, and this set looks just like a circle). Some sets one should not expect to be manifolds
are: a half–line (with an endpoint) or a closed disk in the plane (because of
the boundary points); a figure–8 type of curve or a cone with a vertex; the
set of the possible rotations of a steering wheel (because of the “singularity”
when it doesn’t turn any further. But if we omit the offending singular points
from these non–manifolds, the remaining sets should still be manifolds.) These examples should give some idea of what a manifold is supposed to be, but they
may be misleading, because they carry structure in addition to their manifold structure. For example, for a smooth surface in space, such as a
sphere, we may consider the length of curves on it, or the variation of its normal
direction, but these concepts (length or normal direction) do not come from
its manifold structure, and do not make sense on an arbitrary manifold. A
manifold (without further structure) is an amorphous thing, not really a space
with a geometry, like Euclid’s. One should also keep in mind that a manifold is
not completely specified by naming a set of points: one also has to specify the
coordinate systems one considers. There may be many natural ways of doing
this (and some unnatural ones as well), but to start out with, the coordinates
have to be specified in some way.
Let’s think a bit more about MAN 2 by recalling the meaning of “differentiable“:
a map is differentiable if it can be approximated by a linear map to first order
around a given point. We shall see later that this imposes a certain kind of
structure on the set of points that make up the manifolds, a structure which
captures the idea that a manifold can in some sense be approximated to first
order by a linear space. “Manifolds are linear in infinitesimal regions” as classical
geometers would have said.
One should remember that the definition of “differentiable“ requires that the
function in question be defined in a whole neighbourhood of the point in ques-
tion, so that one may take limits from any direction: the domain of the function
must be open. As a consequence, the axioms involve “openness” conditions,
which are not always in agreement with the conventions of analytic geometry.
(E.g. polar coordinates must be restricted by r > 0, 0 < θ < 2π: strict
inequalities!) I hope that this lengthy discussion will clarify the definition,
although I realize that it may do just the opposite.
As you can see, the definition of “manifold“ is really very simple, much shorter
than the lengthy discussion around it; but I think you will be surprised at the
amount of structure hidden in the axioms. The first item is this.
1.2.6 Definition. A neighbourhood of a point po in M is any subset of M
containing all p whose coordinate points x(p) in some coordinate system satisfy
‖x(p) − x(po )‖ < ε for some ε > 0. A subset of M is open if it contains a
neighbourhood of each of its points, and closed if its complement is open.
This definition makes a manifold into what is called a topological space: there
is a notion of “neighbourhood“ or (equivalently) of “open set”. An open subset
U of M , together with the coordinate systems obtained by restriction to U , is
again an n–dimensional manifold.
1.2.7 Definition. A map F : M → N between manifolds is of class C k ,
0 ≤ k ≤ ∞, if F maps any sufficiently small neighbourhood of a point of M into
the domain of a coordinate system on N and the equation q = F (p) defines a
C k map when p and q are expressed in terms of coordinates:
y j = F j (x1 , · · · , xn ), j = 1, · · · , m.
and if we use another coordinate system on it, e.g. polar coordinates in the
plane, we would strictly speaking have another manifold. But this is not what
is intended. So we extend the notion of “coordinate system“ as follows.
1.2.8 Definition. A general coordinate system on M is any diffeomorphism
from an open subset U of M onto an open subset of Rn .
These general coordinate systems are admitted on the same basis as the coor-
dinate systems with which M comes equipped in virtue of the axioms and will
just be called “coordinate systems“ as well. In fact we shall identify mani-
fold structures on a set M which give the same general coordinates, even if the
collections of coordinates used to define them via the axioms MAN 1–3 were
different. Equivalently, we can define a manifold to be a set M together with
all general coordinate systems corresponding to some collection of coordinates
satisfying MAN 1–3. These general coordinate systems form an atlas which is
maximal, in the sense that it is not contained in any strictly bigger atlas. Thus
we may say that a manifold is a set together with a maximal atlas; but since
any atlas can always be uniquely extended to a maximal one, consisting of the
general coordinate systems, any atlas determines the manifold structure. That
there are always plenty of coordinate systems follows from the inverse function
theorem:
1.2.9 Theorem. Let x be a coordinate system in a neighbourhood of po and
f : Rn · · · → Rn , x̃i = f i (x1 , · · · , xn ), a C ∞ map defined in a neighbourhood of
xo = x(po ) with det(∂ x̃i /∂xj )xo ≠ 0. Then the equation x̃(p) = f (x(p)) defines
another coordinate system in a neighbourhood of po .
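Theorem 1.2.9 is easy to test numerically: pick a map f, estimate its Jacobian by finite differences, and check that the determinant is nonzero. The following sketch (in Python; the map f is a made-up example, not from the text) does this for f(x1, x2) = (x1 + (x2)^2, x2).

```python
import numpy as np

def f(x):
    # a hypothetical smooth change of coordinates on R^2:
    # (x1, x2) -> (x1 + x2^2, x2)
    return np.array([x[0] + x[1]**2, x[1]])

def jacobian(f, x, h=1e-6):
    """Central-difference estimate of the Jacobian [d f^i / d x^j] at x."""
    n = len(x)
    J = np.zeros((n, n))
    for j in range(n):
        e = np.zeros(n); e[j] = h
        J[:, j] = (f(x + e) - f(x - e)) / (2 * h)
    return J

x0 = np.array([1.0, 2.0])
detJ = np.linalg.det(jacobian(f, x0))
# the exact Jacobian [[1, 2*x2], [0, 1]] is triangular with det = 1,
# so by Theorem 1.2.9 the x~ = f(x) are coordinates near every point
```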
In the future we shall use the expression “a coordinate system around po “ for
“a coordinate system defined in some neighbourhood of po ”.
A map F : M → N is a local diffeomorphism at a point po ∈ M if po has
an open neighbourhood U so that F |U is a diffeomorphism of U onto an open
neighbourhood of F (po ). The inverse function theorem says that this is the case
if and only if det(∂F j /∂xi ) 6= 0 at po . The term local diffeomorphism by itself
means “local diffeomorphism at all points”.
1.2.10 Examples. Consider the map F which wraps the real line R1 around
the unit circle S 1 . If we realize S 1 as the unit circle in the complex plane, S 1 =
{z ∈ C : |z| = 1}, then this map is given by F (x) = eix . In a neighbourhood
of any point zo = eixo of S 1 we can write z ∈ S 1 uniquely as z = eix with x
sufficiently close to xo , namely |x − xo | < π. We can make S 1 into a manifold
by introducing these locally defined maps z → x as coordinate systems. Thus
for each zo ∈ S 1 fix an xo ∈ R with zo = eixo and then define x = x(z) by
z = eix , |x − xo | < π, on the domain of z’s of this form. (Of course one can
cover S 1 with just two such coordinate domains.) In these coordinates
the map x → F (x) is simply given by the formula x → x. But this does not
mean that F is one–to–one; obviously it is not. The point is that the formula
holds only locally, on the coordinate domains. The map F : R1 → S 1 is not a
diffeomorphism, only a local diffeomorphism.
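The situation can be illustrated numerically. The sketch below (my own illustration; the helper `coord` is a hypothetical name for the local coordinate z → x) checks that F(x) = e^{ix} is not injective globally, while the local coordinate does invert it near xo:

```python
import numpy as np

F = lambda x: np.exp(1j * x)   # wraps R onto the unit circle S^1

# F is not injective: points differing by 2*pi map to the same z
x = 1.3
assert abs(F(x) - F(x + 2 * np.pi)) < 1e-12

# but locally it is invertible: near z0 = F(x0) the coordinate
# x(z) = x0 + arg(z / z0) satisfies x(F(x)) = x for x close to x0,
# which is the local formula "x -> x" in the text
x0 = 1.3
coord = lambda z: x0 + np.angle(z / F(x0))
for dx in (-0.5, 0.0, 0.7):
    assert abs(coord(F(x0 + dx)) - (x0 + dx)) < 1e-12
```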
bothered by hits: they can pass through each other, and anything else. Not
useful as a weapon.) Describe the configuration space of this mechanical system
as a direct product of some of the manifolds defined in the text. Describe the
subset of configurations in which the sticks are not in collision and show that it
is open in the configuration space.
(b) Same problem if the joint admits only rotations at right angles to the “grip
stick” to whose tip the “hit stick” is attached. (Wind–mill type joint; good for
lateral blows. Each stick now has a tip and a tail.)
[Translating all of this into something precise is part of the problem. Is the
configuration space a manifold at all, in a reasonable way? If so, is it connected
or what are its connected components? If you find something unclear, add
precision as you find necessary; just explain what you are doing. Use sketches.]
7. Let M be a set with two manifold structures and let M ′ , M ′′ denote the
corresponding manifolds.
(a) Show that M ′ = M ′′ as manifolds if and only if the identity map on M
is C ∞ as a map M ′ → M ′′ and as a map M ′′ → M ′ . [Start by stating the
definition of what it means that “M ′ = M ′′ as manifolds”.]
(b) Suppose M ′ , M ′′ have the same open sets and each open set U carries the
same collection of C ∞ functions f : U → R for M ′ and for M ′′ . Is then
M ′ = M ′′ as manifolds? (Proof or counterexample.)
8. Specify a collection {(Uα , φα )} of partially defined maps φα : Uα ⊂ R → R
as follows.
(a) U = R, φ(t) = t3 , (b) U1 = R−{0}, φ1 (t) = t3 , U2 = (−1, 1), φ2 (t) = 1/(1 − t).
Determine if {(Uα , φα )} is an atlas (i.e. satisfies MAN 1 − 3). If so, determine
if the corresponding manifold structure on R is the same as the usual one.
9. (a) Let S be a subset of Rn for some n. Show that there is at most one
manifold structure on S so that a partially defined map Rk · · · → S (any k)
is C ∞ if and only if the composite Rk · · · → S ⊂ Rn is C ∞ . [Suggestion. Write S ′ , S ′′ for S
equipped with two manifold structures. Show that the identity map S ′ → S ′′ is
C ∞ in both directions.]
(b) Show that (a) holds for the usual manifold structure on S = Rm considered
as subset of Rn (m ≤ n).
10. (a) Let P1 be the set of all one–dimensional subspaces of R2 (lines through
the origin). Write ⟨x⟩ = ⟨x1 , x2 ⟩ for the line through x = (x1 , x2 ). Let U1 =
{⟨x1 , x2 ⟩ : x1 ≠ 0} and define φ1 : U1 → R, ⟨x⟩ → x2 /x1 . Define (U2 , φ2 )
similarly by interchanging x1 and x2 and prove that {(U1 , φ1 ), (U2 , φ2 )} is an
atlas for P1 (i.e. MAN 1–3 are satisfied.) Explain why the map φ1 can be
viewed as taking the intersection with the line x1 = 1. Sketch.
(b) Generalize part (a) for the set Pn of one–dimensional subspaces of Rn+1 .
[Suggestion. Proceed as in part (a): the line x1 = 1 in R2 is now replaced by the
n–plane in Rn+1 given by this equation. Consider the other coordinate n–planes
xi = 1 as well. Sketching will be difficult for n > 2. The manifold Pn is called
(real) projective n–space.]
11. Let Pn be the set of all one–dimensional subspaces of Rn+1 (lines through
the origin). Let F : S n → Pn be the map which associates to each p ∈ S n the
line ⟨p⟩ = Rp.
(a) Show that Pn admits a unique manifold structure so that the map F is a
local diffeomorphism.
(b)Show that the manifold structure on Pn defined in problem 10 is the same
as the one defined in part (a).
12. Generalize problem 10 to the set Gk,n of k–dimensional subspaces of Rn .
[Suggestion. Consider the map φ which intersects a k–dimensional subspace
P ∈ Gk,n with the coordinate (n − k)–plane with equations x1 = x2 = · · · = xk =
1 and similar maps using other coordinate (n − k)–planes of this type. The
manifolds Gk,n are called (real) Grassmannian manifolds.]
13. Let M be a manifold, p ∈ M a point of M . Let Mp be the set of all points
which can be joined to p by a continuous curve. Show that Mp is an open subset
of M .
14. Define an n–dimensional linear manifold to be a set L together with a
maximal atlas of charts L → Rn , p → x(p) which are everywhere defined bijec-
tions and any two of which are related by an invertible linear transformation
x̃j = aji xi . Show that every vector space is in a natural way a linear manifold
and vice versa. [Thus a linear manifold is really the same thing as a vector
space. The point is that vector spaces could be defined in a way analogous to
manifolds.]
15. (a) Define a notion of affine manifold in analogy with the previous problem,
using affine coordinate transformations as defined in 1.2.7. Is every vector space an affine
manifold in a natural way? How about the other way around?
(b) Define an affine space axiomatically as follows.
Definition. An n–dimensional affine space consists of set A of points together
with a set of vectors which form an n–dimensional vector space V .
ASP 1. Any two points p, q in A determine a vector ~pq in V .
ASP 2. Given a point p ∈ A and a vector v ∈ V there is a unique point q ∈ A
so that v = ~pq.
ASP 3. For any three points a, b, c ∈ A, ~ab + ~bc = ~ac.
Show that every affine manifold is in a natural way an affine space and
vice versa. [Thus an affine manifold is really the same thing as an affine space.]
16. Give further examples of some types of “spaces“ one can define in analogy
with C ∞ manifolds, like those in the previous two problems. In each case,
38 CHAPTER 1. MANIFOLDS
vector ∂p(x)/∂xk of the k–th coordinate line through p, i.e. of the curve
p(x1 , x2 , · · · , xn ) parametrized by xk = t, the remaining coordinates being
given their value at p. The (∂/∂xk )p form a basis for Tp M since every vector
v at p can be uniquely written as a linear combination of the (∂/∂xk )p :
v = ξ k (∂/∂xk )p . The components ξ k of a vector at p with respect to this basis are
just the ξ k representing v relative to the coordinate system (xk ) according to
the definition of “vector at p“ and are sometimes called the components of v
relative to the coordinate system (xi ). It is important to remember that ∂/∂x1
(for example) depends on all of the coordinates (x1 , · · · , xn ), not just on the
coordinate function x1 .
[Figure: the coordinate vector fields d/dr and d/dθ at a point, drawn in polar coordinates.]
p = x1 e1 + · · · + xn en ,
~v = ξ 1 e1 + · · · + ξ n en . (2)
∂/∂θ → (∂x/∂θ) ∂/∂x + (∂y/∂θ) ∂/∂y + (∂z/∂θ) ∂/∂z = − sin θ sin φ ∂/∂x + cos θ sin φ ∂/∂y
∂/∂φ → (∂x/∂φ) ∂/∂x + (∂y/∂φ) ∂/∂y + (∂z/∂φ) ∂/∂z = cos θ cos φ ∂/∂x + sin θ cos φ ∂/∂y − sin φ ∂/∂z (4)
In this way Tp S 2 is identified with the subspace of Tp R3 ≈ R3 given by
Tp S 2 ≈ {v ∈ Tp R3 = R3 : v · p = 0}. (5)
(exercise).
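For the skeptical reader, the identification (5) can be checked numerically against the formulas (4): the coordinate vector fields ∂/∂θ and ∂/∂φ, viewed as vectors in R3, are orthogonal to p. A small Python sketch (my own check, not part of the notes):

```python
import numpy as np

def point(theta, phi):
    # p = (cos(theta) sin(phi), sin(theta) sin(phi), cos(phi)) on S^2
    return np.array([np.cos(theta) * np.sin(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(phi)])

def d_dtheta(theta, phi):
    # formula (4): -sin(theta) sin(phi) d/dx + cos(theta) sin(phi) d/dy
    return np.array([-np.sin(theta) * np.sin(phi),
                      np.cos(theta) * np.sin(phi),
                      0.0])

def d_dphi(theta, phi):
    return np.array([np.cos(theta) * np.cos(phi),
                     np.sin(theta) * np.cos(phi),
                     -np.sin(phi)])

theta, phi = 0.8, 1.1
p = point(theta, phi)
# both coordinate vector fields satisfy v . p = 0, i.e. lie in T_p S^2 as in (5)
assert abs(np.dot(d_dtheta(theta, phi), p)) < 1e-12
assert abs(np.dot(d_dphi(theta, phi), p)) < 1e-12
```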
1.3.5 Definition. A vector field X on M associates to each point p in M a
vector X(p) ∈ Tp M . We also admit vector fields defined only on open subsets
of M.
Examples are the basis vector fields ∂/∂xk of a coordinate system: by definition,
∂/∂xk has components v i = δki relative to the coordinate system (xi ). On the
coordinate domain, every vector field can be written as
X = X k ∂/∂xk (6)
1.3. VECTORS AND DIFFERENTIALS 43
the partials being taken at f (p). But this is just the Chain Rule:
η̃ j = (∂ ỹ j /∂ x̃i ) ξ̃ i [definition of η̃ j ]
= (∂ ỹ j /∂ x̃i )(∂ x̃i /∂xk ) ξ k [(ξ k ) is a vector]
= (∂ ỹ j /∂xk ) ξ k [Chain Rule]
= (∂ ỹ j /∂y i )(∂y i /∂xk ) ξ k [Chain Rule]
= (∂ ỹ j /∂y i ) η i [definition of η i ]
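The consistency expressed by this computation can also be seen numerically: transforming components with the Jacobian and back with its inverse recovers the original components. A sketch for the Cartesian-to-polar change x̃ = (r, θ) on R2 (an illustration of mine, not from the text):

```python
import numpy as np

# Cartesian components (a, b) of a vector v at p, and its polar components
# via the rule xi~^i = (d x~^i / d x^k) xi^k, with x~ = (r, theta)
x, y = 1.0, 2.0
a, b = 0.3, -0.5                  # Cartesian components of v at p
r = np.hypot(x, y)
# Jacobian of (r, theta) with respect to (x, y):
J = np.array([[x / r,       y / r],
              [-y / r**2,   x / r**2]])
xi_polar = J @ np.array([a, b])   # components (xi^r, xi^theta)

# transforming back with the inverse Jacobian recovers (a, b), just as
# the chain-rule computation in the text predicts
back = np.linalg.inv(J) @ xi_polar
assert np.allclose(back, [a, b])
```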
df = (∂f /∂xk ) dxk ; (8)
the subscripts “p“ have been omitted to simplify the notation. In general, a map
f : M → N between manifolds, written in coordinates as y j = f j (x1 , · · · , xn )
has its differential dfp : Tp M → Tf (p) N given by the formula dy j = (∂f j /∂xi )dxi ,
as is evident if we think of dxi as the components ξ i = dxi (v) of a general vector
at p, and think of dy j similarly. The transformation property of vectors is built
into this notation.
The differential has the following geometric interpretation in terms of tangent
vectors of curves.
1.3.9 Lemma. Let f : M · · · → N be a C 1 map between manifolds, p = p(t) a
C 1 curve in M . Then df maps the tangent vector of p(t) to the tangent vector
of f (p(t)):
d f (p(t))/dt = dfp(t) (dp(t)/dt).
Proof. Write p = p(t) in coordinates as xi = xi (t) and q = f (p(t)) as y i = y i (t).
Then
dy j /dt = (∂y j /∂xi )(dxi /dt) says d f (p(t))/dt = dfp(t) (dp(t)/dt)
as required.
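One can test the lemma for a concrete map and curve; the sketch below (Python, with a made-up f and p(t) of my own) compares the two sides by finite differences:

```python
import numpy as np

f = lambda p: np.array([p[0]**2 - p[1], p[0] * p[1]])   # a sample C^1 map R^2 -> R^2
p = lambda t: np.array([np.cos(t), np.sin(t)])          # a C^1 curve in M = R^2

t0, h = 0.7, 1e-6
# left side: tangent vector of the image curve f(p(t))
lhs = (f(p(t0 + h)) - f(p(t0 - h))) / (2 * h)
# right side: df applied to the tangent vector dp/dt
dp = (p(t0 + h) - p(t0 - h)) / (2 * h)
x0, y0 = p(t0)
J = np.array([[2 * x0, -1.0],     # Jacobian [dy^j/dx^i] of f at p(t0)
              [y0,      x0]])
rhs = J @ dp
assert np.allclose(lhs, rhs, atol=1e-5)
```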
Remarks. (a) To say that the linear map dfpo is injective (i.e. one–to–one)
means that rank(dfpo ) = dim M ≤ dim N . We then call f an immersion at po .
(b) The theorem says that one can find coordinate systems around po and f (po )
so that on the coordinate domain f : M → N becomes the inclusion map
Rn → Rm (n ≤ m).
Proof. Using coordinates, it suffices to consider a (partially defined) map f :
Rn · · · → Rm (n ≤ m). Suppose we can find a local diffeomorphism ϕ : Rm · · · →
Rm so that f = ϕ ◦ i where i : Rn → Rm is the inclusion.
i
R n → Rm
id ↓ ↓ϕ
R n → Rm
f
Then the equation f (x) = ϕ(i(x)) says that f “becomes” i if we use ϕ as
coordinates on Rm and the identity on Rn .
Now assume rank(dfpo ) = n ≤ m. Write q = f (p) as y j = f j (x1 , · · · , xn ), j ≤
m. At po , the m × n matrix [∂f j /∂xi ],j ≤ m, i ≤ n, has n linearly independent
rows (indexed by some j’s). Relabeling the coordinates (y j ) we may assume
that the n × n matrix [∂f j /∂xi ], i, j ≤ n, has rank n, hence is invertible. Define
ϕ by
ϕ(x1 , · · · , xn , · · · , xm ) = f (x1 , · · · , xn ) + (0, · · · , 0, xn+1 , · · · , xm ),
i.e.
ϕ = (f 1 , · · · , f n , f n+1 + xn+1 , · · · , f m + xm ).
Since i(x1 , · · · , xn ) = (x1 , · · · , xn , 0, · · · , 0) we evidently have f = ϕ ◦ i. The
determinant of the matrix (∂ϕj /∂xi ) has the form
det [ ∂f j /∂xi   0 ; ∗   1 ] = det[∂f j /∂xi ], i, j ≤ n,
hence is nonzero at i(po ). By the Inverse Function Theorem ϕ is a local diffeo-
morphism at po , as required.
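The construction in the proof is easy to carry out for a concrete immersion. The sketch below (my own example f(t) = (t, t²), not from the text) builds ϕ and checks f = ϕ ∘ i and the determinant condition:

```python
import numpy as np

# hypothetical immersion f : R -> R^2, f(t) = (t, t^2); rank(df) = 1 everywhere
f = lambda t: np.array([t, t**2])

# phi as in the proof: phi(x1, x2) = f(x1) + (0, x2) = (x1, x1^2 + x2)
phi = lambda x: np.array([x[0], x[0]**2 + x[1]])

# f = phi o i, where i(t) = (t, 0) is the inclusion R -> R^2
for t in (-1.0, 0.0, 2.5):
    assert np.allclose(phi(np.array([t, 0.0])), f(t))

# the Jacobian of phi is [[1, 0], [2*x1, 1]], lower triangular with det 1,
# so phi is a local diffeomorphism and f "becomes" the inclusion i
x1 = 2.5
J = np.array([[1.0, 0.0], [2 * x1, 1.0]])
assert abs(np.linalg.det(J) - 1.0) < 1e-12
```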
Examples. (a) Take for f : M → N the inclusion map i : S 2 → R3 . If we use
coordinates (θ, φ) on S 2 = {r = 1} and (x, y, z) on R3 then i is given by the
familiar formulas
i : (θ, φ) → (x, y, z) = (cos θ sin φ, sin θ sin φ, cos φ).
A coordinate system (y 1 , y 2 , y 3 ) on R3 as in the theorem is given by the slightly
modified spherical coordinates (θ, φ, ρ − 1), for example: in these coordinates
the inclusion i : S 2 → R3 becomes the standard inclusion map (θ, φ) →
(θ, φ, ρ − 1) = (θ, φ, 0) on the coordinate domains. (That the same labels θ and
φ stand for coordinate functions both on S 2 and R3 should cause no confusion.)
(b) Take for f : M → N a curve R · · · → M, t → p(t). This is an immersion at
t = to if ṗ(to ) 6= 0. The immersion theorem asserts that near such a point p(to )
there are coordinates (xi ) so that the curve p(t) becomes a coordinate line, say
x1 = t, x2 = 0, · · · , xn = 0.
1.3.13 Submersion Theorem. Let f : M → N be a C ∞ map of manifolds,
n = dim M , m = dim N . Suppose po is a point of M at which the differential
dfpo : Tpo M → Tf (po ) N is surjective. Let (y 1 , · · · , y m ) be a coordinate system
around f (po ). There is a coordinate system (x1 , · · · , xn ) around po so that
p → q := f (p)
becomes
(x1 , · · · , xm , · · · , xn ) → (y 1 , · · · , y m ) := (x1 , · · · , xm )
Remarks. (a) To say that the linear map dfpo is surjective (i.e. onto) means
that
rank(dfpo ) = dim N ≤ dim M.
We then call f a submersion at po .
(b) The theorem says that one can find coordinate systems around po and
f (po ) so that on the coordinate domain f : M → N becomes the projection
map Rn → Rm (n ≥ m).
Proof. Using coordinates, it suffices to consider a partially defined map f :
Rn · · · → Rm (n ≥ m). Suppose we can find a local diffeomorphism ϕ : Rn · · · →
Rn so that p ◦ ϕ = f where p : Rn → Rm is the projection.
f
Rn → Rm
ϕ↓ ↓ id
Rn → Rm
p
Then we can use ϕ as a coordinate system, and the equation f (x) = p(ϕ(x))
says that f “becomes” p if we use ϕ as coordinates on Rn and the identity on
Rm .
Now assume rank(dfpo ) = m ≤ n. Write q = f (p) as y j = f j (x1 , · · · , xn ).
At po , the m × n matrix [∂f j /∂xi ],j ≤ m, i ≤ n, has m linearly independent
columns (indexed by some i’s). Relabeling the coordinates (xi ) we may assume
that the m × m matrix [∂f j /∂xi ], i, j ≤ m, has rank m, hence is invertible.
Define ϕ by
which is not the same as for vectors, since the upper indices are being summed
over. (Memory aid: the tilde on the right goes downstairs like the index j on the
left.) Elements of Tp∗ M are also called covectors at p, and this transformation
rule could be used to define “covector“ in a way analogous to the definition of
vector. Any covector at p can be realized as the differential of some C ∞ function
defined near p, just like any vector can be realized as the tangent vector to some C ∞
curve through p. In spite of the similarity of their transformation laws, one
should carefully distinguish between vectors and covectors.
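One way to see where the covector rule comes from (a supplementary derivation, in the spirit of exercise 4 below) is to require that the pairing η_i ξ^i of a covector with a vector be independent of the coordinate system:

```latex
% invariance of the pairing forces the covector transformation rule:
\tilde\eta_j\,\tilde\xi^{\,j}
  = \tilde\eta_j\,\frac{\partial \tilde x^j}{\partial x^i}\,\xi^i
  \overset{!}{=} \eta_i\,\xi^i
  \quad\text{for all } (\xi^i)
\;\Longrightarrow\;
\eta_i = \frac{\partial \tilde x^j}{\partial x^i}\,\tilde\eta_j,
\qquad\text{equivalently}\qquad
\tilde\eta_j = \frac{\partial x^i}{\partial \tilde x^j}\,\eta_i .
```

Note that the Jacobian appears on the opposite side compared with the vector rule, which is exactly the "upper indices summed over" phenomenon described above.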
A differential 1-form (or covector field) ϕ on an open subset of M associates
to each point p in its domain a covector ϕp . Examples are the coordinate
differentials dxk : by definition, dxk has components ηi = δik relative to the
coordinate system (xi ). On the coordinate domain, every differential 1-form ϕ
can be written as
ϕ = ϕk dxk
for certain scalar functions ϕk . ϕ is said to be of class Ck if the ϕk have this
property.
1.3.15 Definition. Tangent bundle and cotangent bundle. The set of all
vectors on M is denoted T M and called the tangent bundle of M ; we make it into
a manifold by using as coordinates (xi , ξ i ) of a vector v at p the coordinates
xi (p) of p together with the components dxi (v) = ξ i of v. (Thus ξ i = dxi
as function on T M , but the notation dxi as coordinate on T M gets to be
confusing in combinations such as ∂/∂ξ i .) As (xi ) runs over a collection of
coordinate systems of M satisfying MAN 1-3, the (xi , ξ i ) do the same for T M .
The tangent bundle comes equipped with a projection map π : T M → M which
sends a vector v ∈ Tp M to the point π(v) = p to which it is attached.
1.3.16 Example. From (1.3.3) and (1.3.4), p 42 we get the identifications
a) T Rn = {(p, v) : p, v ∈ Rn } = Rn × Rn
b) T S 2 = {(p, v) ∈ R3 × R3 : p · v = 0 (dot product)}
The set of all covectors on M is denoted T ∗ M and called the cotangent bundle
of M ; we make it into a manifold by using as coordinates (xi , ξi ) of a covector
w at p the coordinates (xi ) of p together with the components (ξi ) of w. (If one
identifies v ∈ Tp M with the function ϕ → ϕ(v) on Tp∗ M , then ξi = ∂/∂xi as
a function on cotangent vectors.) As (xi ) runs over a collection of coordinate
systems of M satisfying MAN 1-3, p 25 the (xi , ξi ) do the same for T ∗ M . There
is again a projection map π : T ∗ M → M which sends a covector w ∈ Tp∗ M to
the point π(w) = p to which it is attached.
EXERCISES 1.3
1. Show that addition and scalar multiplication of vectors at a point p on a
manifold, as defined in the text, does indeed produce vectors.
2. Prove the two assertions left as exercises in 1.3.3.
3. Prove the two assertions left as exercises in 1.3.4.
4. (a) Prove the transformation rule for the components of a covector w ∈ Tp∗ M :
η̃j = (∂xi /∂ x̃j )p ηi . (*)
(b) Prove that covectors can be defined in analogy with vectors in
the following way. Let w be a quantity which relative to a coordinate system
(xi ) around p is represented by an n–tuple (ηi ) subject to the transformation
rule (*). Then the scalar ηi ξ i depending on the components of w and of a vector
v at p is independent of the coordinate system and defines a linear functional
on Tp M (i.e. a covector at p).
(c) Show that any covector at p can be realized as the differential dfp of some
C ∞ function f defined in a neighbourhood of p.
5. Justify the following rules from the definitions and the usual rules of differ-
entiation.
(a) dx̃k = (∂ x̃k /∂xi ) dxi (b) ∂/∂ x̃k = (∂xi /∂ x̃k ) ∂/∂xi
6. Let (ρ, θ, φ) be spherical coordinates on R3 . Calculate the coordinate vector
fields ∂/∂ρ, ∂/∂θ, ∂/∂φ in terms of ∂/∂x, ∂/∂y, ∂/∂z, and the coordinate
differentials dρ, dθ, dφ in terms of dx, dy, dz. (You can leave their coefficients in
terms of ρ, θ, φ). Sketch some coordinate lines and the coordinate vector fields at
some point. (Start by drawing a sphere ρ =constant and some θ, φ–coordinate
lines on it.)
7. Define coordinates (u, v) on R2 by the formulas
c) Find the coordinate vector fields ∂/∂u, ∂/∂v in terms of ∂/∂x, ∂/∂y.
8. Define coordinates (u, v) on R2 by the formulas
x = (u2 − v 2 )/2, y = uv.
a) Determine all points (x, y) in a neighbourhood of which (u, v) may be used
as coordinates.
b) Sketch the coordinate lines u, v = 0, 1/2, 1, 3/2, 2.
c) Find the coordinate vector fields ∂/∂u, ∂/∂v in terms of ∂/∂x, ∂/∂y.
9. Let (u, θ, ψ) be hyperbolic coordinates on R3 , defined by
a) Sketch some surfaces u =constant (just enough to show the general shape of
these surfaces).
b) Determine all points (x, y, z) in a neighbourhood of which (u, θ, ψ) may be
used as coordinates.
c) Find the coordinate vector fields ∂/∂u, ∂/∂θ, ∂/∂ψ in terms of ∂/∂x, ∂/∂y, ∂/∂z.
10. Show that as (xi ) runs over a collection of coordinate systems for M satis-
fying MAN 1-3, the (xi , ξ i ) defined in the text do the same for T M .
11. Show that as (xi ) runs over a collection of coordinate systems for M satis-
fying MAN 1-3, the (xi , ξi ) defined in the text do the same for T ∗ M .
12. a) Prove that T (M ×M ) is diffeomorphic to (T M )×(T M ) for every manifold
M.
b) Is T (T M ) diffeomorphic to T M × T M for every manifold M ? (Explain.
Give some examples. Prove your answer if you can. Use the notation M ≈ N
to abbreviate “M is diffeomorphic with N “.)
13. Let M be a manifold and let W = T ∗ M . Let π : W → M and ρ : T W → W
be the projection maps. For any z ∈ T W define θ(z) ∈ R by
(f) Rewrite the whole quotation in our language, again with a minimal amount
of change.
1.4. SUBMANIFOLDS 53
1.4 Submanifolds
1.4.1 Definition. Let M be an n−dimensional manifold, S a subset of M . A
point p ∈ S is called a regular point of S if p has an open neighbourhood U
in M that lies in the domain of some coordinate system x1 , · · · , xn on M with
the property that the points of S in U are precisely those points in U whose
coordinates satisfy xm+1 = 0, · · · , xn = 0 for some m. This m is called the
dimension of S at p. Otherwise p is called a singular point of S. S is called
an m−dimensional (regular ) submanifold of M if every point of S is regular of
the same dimension m.
Remarks. a) We shall summarize the definition of “p is a regular point of S”
by saying that S is given by the equations xm+1 = · · · = xn = 0 locally around
p. The number m is independent of the choice of the (xi ): if S is also given by
x̃m̃+1 = · · · = x̃n = 0, then the maps (x1 , · · · , xm ) ↔ (x̃1 , · · · , x̃m̃ ) which relate the
coordinates of the points of S in U ∩ Ũ are inverses of each other and of class
C ∞ . Hence their Jacobian matrices are inverses of each other, in particular
m = m̃.
b) We admit the possibility that m = n or m = 0. An n-dimensional submani-
fold S must be open in M , i.e. every point of S has a neighbourhood in M which is
contained in S. At the other extreme, a 0-dimensional submanifold is discrete,
i.e. every point in S has a neighbourhood in M which contains only this one
point of S.
By definition, a coordinate system on a submanifold S consists of the restrictions
u1 = x1 |S , · · · , um = xm |S to S of the first m coordinates of a coordinate system
x1 , · · · , xn on M of the type mentioned above (for some p in S); as their domain
we take S ∩ U . We have to verify that these coordinate systems satisfy MAN 1–3.
MAN 1. xS (S ∩ U ) consists of the (xi ) in the open subset x(U ) ∩ Rm of Rm .
(Here we identify Rm = {x ∈ Rn : xm+1 = · · · = xn = 0}.)
MAN 2. The map (x1 , · · · , xm ) → (x̃1 , · · · , x̃m ) which relates the coordinates
of the points of S in U ∩ Ũ is of class C ∞ with open domain x(U ∩ Ũ ) ∩ Rm .
MAN 3. Every p ∈ S lies in some domain U ∩ S, by definition.
Let S be a submanifold of M , and i : S → M the inclusion map. The differential
dip : Tp S → Tp M maps the tangent vector of a curve p(t) in S into the tangent
vector of the same curve p(t) = i(p(t)), considered as curve in M . We shall use
the following lemma to identify Tp S with a subspace of Tp M .
1.4.2 Lemma. Let S be a submanifold of M . Suppose S is given by the
equations
xm+1 = 0, · · · , xn = 0
locally around p. The differential of the inclusion S ⊂ M at p ∈ S is a bijection
of Tp S with the subspace of Tp M given by the linear equations
(dxm+1 )p = 0, · · · , (dxn )p = 0.
x1 = u1 , · · · , xm = um , xm+1 = 0, · · · , xn = 0.
Remarks. (1) The second proof is really the same as the first, with the proof of
the Submersion Theorem relegated to a parenthetical comment; it brings out
the nature of that theorem.
(2) The theorem is local; even when the differentials of the f j are linearly
dependent at some points of S, the subset of S where they are linearly independent
is still a submanifold.
1.4.4 Examples. (a) The sphere S = { p ∈ R3 | x2 + y 2 + z 2 = 1}. Let
f = x2 + y 2 + z 2 − 1. Then df = 2(xdx + ydy + zdz) and dfp = 0 iff x = y = z = 0,
i.e. p = (0, 0, 0). In particular, df is everywhere non–zero on S, hence S is a
submanifold, and its tangent space at a general point p = (x, y, z) is given by
xdx + ydy + zdz = 0. This means that, as a subspace of R3 , the tangent space
at the point p = (x, y, z) on S consists of all vectors v = (a, b, c) satisfying
xa + yb + zc = 0, as one would expect.
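As a numerical aside (mine, not from the notes), regularity and tangency for the sphere can be verified directly:

```python
import numpy as np

f = lambda p: p[0]**2 + p[1]**2 + p[2]**2 - 1.0
grad_f = lambda p: 2.0 * p            # df = 2(x dx + y dy + z dz)

p = np.array([1.0, 2.0, 2.0]) / 3.0   # a point on S: |p| = 1
assert abs(f(p)) < 1e-12
assert np.linalg.norm(grad_f(p)) > 0  # df_p != 0, so p is a regular point of S

# a tangent vector v = (a, b, c) must satisfy xa + yb + zc = 0,
# e.g. any v orthogonal to p:
v = np.array([2.0, -1.0, 0.0]) / np.sqrt(5.0)
assert abs(np.dot(grad_f(p), v)) < 1e-9
```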
In spherical coordinates ρ, θ, φ on R3 the sphere S 2 is given by ρ = 1 on the
coordinate domain, as required by the definition of “submanifold“. According
to the definition, θ and φ provide coordinates on S 2 (where defined). If one
identifies the tangent spaces to S 2 with subspaces of R3 then the coordinate
vector fields ∂/∂θ and ∂/∂φ are given by the formulas of (1.3.4).
(b) The circle C = { p ∈ R3 | x2 + y 2 + z 2 = 1, ax + by + cz = d} with fixed a, b, c
not all zero. Let f = x2 + y 2 + z 2 − 1, g = ax + by + cz − d. Then df = 2(xdx + ydy + zdz)
and dg = a dx + b dy + c dz are linearly independent unless (x, y, z) is a multiple
of (a, b, c). This can happen only if the plane ax + by + cz = d is parallel to
the tangent plane of the sphere at p = (x, y, z), in which case C is either empty or
reduces to the point of tangency. Apart from that case, C is a submanifold of
R3 and its tangent space Tp C at any of its points p = (x, y, z) is the subspace of
Tp R3 = R3 given by the two independent linear equations df = 0, dg = 0, hence
has dimension 3 − 2 = 1. Of course, C is also the submanifold of the sphere
S = {p ∈ R3 | f = 0} given by g = 0, and Tp C is the subspace of Tp S given by dg = 0.
1.4.5 Example. The cone S = {p ∈ R3 | x2 + y 2 − z 2 = 0}. We shall show the
following.
(a) The cone S − {(0, 0, 0)} with the origin excluded is a 2-dimensional subman-
ifold of R3 .
(b) The cone S with the origin (0, 0, 0) is not a 2-dimensional submanifold of
R3 .
Fig. 3. p = G(x1 , · · · , xm )
Remarks. The remarks after the preceding theorem apply again, but with
an important modification. The theorem is again local in that g need not be
defined on all of Rm , only on some open set D containing xo . But even if dg
has rank m everywhere on D, the image g(D) need not be a submanifold of M ,
as one can see from curves in R2 with self-intersections.
EXERCISES 1.4
1. Show that the submanifold coordinates on S 2 are also coordinates on S 2
when the manifold structure on S 2 is defined by taking the orthogonal projection
coordinates in axioms MAN 1−3.
2. Let S = {p = (x, y, z) ∈ R3 | x2 − y 2 − z 2 = 1}.
(a) Prove that S is a submanifold of R3 .
(b) Show that for any (ψ, θ) the point p = (x, y, z) given by
[Suggestion. Consider the map F from M3 (R) to the space Sym3 (R) of sym-
metric 3 × 3 matrices defined by F (X) = X ∗ X. Show that the differential dFX
of this map is given by
dFX (V ) = X ∗ V + V ∗ X.
Conclude that dFX is surjective for all X ∈ O(3). Apply theorem 1.4.3.]
x = cos t, y = sin t, z = t.
Let S be the surface swept out by the tangent line of C. [See previous problem].
a) Find a parametric equations x = x(u, v), y = y(u, v), z = z(u, v) for S.
b) Which points of S have to be omitted (if any) so that the rest is a submanifold
of R3 ? Prove your answer. [This is not a special case of the previous problem.
Explain why not.]
In exercises 11–16 a set S in R2 or in R3 is given.
a) Find the regular points of S and specify a coordinate system around each
regular point.
b) Find all singular points of S, if any. (Prove that these points are singular.)
If S depends on parameters a, b, · · · you may have to consider various cases, de-
pending on the values of the parameters. Try to sketch S.
12. The surface with parametric equations
x = 2au2 /(1 + u2 ), y = au(u2 − 1)/(1 + u2 ), z = v.
13. The set of all points P in a plane for which the product of the distances to
two given points F1 , F2 has a constant value a2 . (Denote the distance between
F1 and F2 by b > 0. The set of these points P is called the ovals of Cassini.)
14. The curve with equation r = 2a cos θ + 2b in polar coordinates (a, b ≥ 0).
15. The curve in the plane with parametric equations x = cos3 t, y = sin3 t.
16. The endpoints of a mobile line segment AB of constant length 2a are
constrained to glide on the coordinate axes in the plane. Let OP be the perpen-
dicular to AB from the origin O to a point P on AB. S is the set of all possible
positions of P . [Suggestion. Show that r = a sin 2θ.]
17. The surface obtained by rotating the curve z = sin y in the yz-plane about
the z-axis.
18. Show that T S 2 is diffeomorphic with the submanifold V of R6 defined by
V = {(x, y, z; ξ, η, ζ) : x2 + y 2 + z 2 = 1, ξx + ηy + ζz = 0}.
The integrand is called the element of arc and denoted by ds, but its meaning
remains somewhat mysterious if introduced in this formal, symbolic way. It
actually has a perfectly precise meaning: it is a function on the set of all tan-
gent vectors on R3 , since the coordinate differentials dx, dy, dz are functions on
tangent vectors. But the notation ds for this function is truly objectionable:
ds is not the differential of any function s. The notation is too old to change
and besides gives this simple object a pleasantly old–fashioned flavour. The
function ds on tangent vectors characterizes the Euclidean metric just as well as
the distance functions we started out with. It is convenient to get rid of the
square root by considering the square of ds instead:
ds2 = (dx)2 + (dy)2 + (dz)2 .
This is now a quadratic function on tangent vectors which can be used to
characterize the Euclidean metric on R3 . For our purposes this ds2 is a more
suitable object than the Euclidean distance function, so we shall simply call
ds2 itself the metric. One reason why it is more suitable is that ds2 can be
easily written down in any coordinate system. For example, in cylindrical co-
ordinates we use x = r cos θ, y = r sin θ, z = z to express dx, dy, dz in terms of
dr, dθ, dz and substitute into the above expression for ds2 ; similarly for spherical
coordinates x = ρ cos θ sin φ, y = ρ sin θ sin φ, z = ρ cos φ. This gives:
Cylindrical: (dr)2 + r2 (dθ)2 + (dz)2
Spherical: (dρ)2 + ρ2 (sin2 φ(dθ)2 + (dφ)2 )
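These substitutions are routine but error-prone by hand; a computer algebra system reproduces the cylindrical formula. A sketch using sympy (my own check, not part of the notes), treating the coordinate differentials as formal symbols:

```python
import sympy as sp

r, theta, z = sp.symbols('r theta z', positive=True)
dr, dtheta, dz = sp.symbols('dr dtheta dz')   # formal coordinate differentials

# cylindrical coordinates: x = r cos(theta), y = r sin(theta), z = z
x = r * sp.cos(theta)
y = r * sp.sin(theta)
dx = sp.diff(x, r) * dr + sp.diff(x, theta) * dtheta
dy = sp.diff(y, r) * dr + sp.diff(y, theta) * dtheta

# substitute into ds^2 = dx^2 + dy^2 + dz^2 and simplify
ds2 = sp.simplify(sp.expand(dx**2 + dy**2 + dz**2))
# expect (dr)^2 + r^2 (dtheta)^2 + (dz)^2, as in the text
assert sp.simplify(ds2 - (dr**2 + r**2 * dtheta**2 + dz**2)) == 0
```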
The same discussion applies to the Euclidean metric in Rn in any dimension.
For polar coordinates in the plane R^2 one can draw a picture to illustrate ds^2 in a manner familiar from analytic geometry:
[Fig. 1. ds in polar coordinates: ds^2 = (dr)^2 + r^2 (dθ)^2]
Hence
Σ_i (dx^i)^2 = Σ_kl g_kl dy^k dy^l, where g_kl = Σ_i (∂x^i/∂y^k)(∂x^i/∂y^l).
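The coefficient formula g_kl = Σ_i (∂x^i/∂y^k)(∂x^i/∂y^l) can be checked numerically. The sketch below (plain Python; the helper names `metric_in_coords`, `cylindrical`, `spherical` are mine, and the partials are approximated by central differences) recovers the cylindrical and spherical expressions above.

```python
import math

def metric_in_coords(chart, y, h=1e-6):
    """g_kl = sum_i (dx^i/dy^k)(dx^i/dy^l), with the partial
    derivatives approximated by central differences."""
    n = len(y)
    J = []                                  # J[k][i] ~ dx^i/dy^k
    for k in range(n):
        yp, ym = list(y), list(y)
        yp[k] += h
        ym[k] -= h
        xp, xm = chart(yp), chart(ym)
        J.append([(a - b) / (2 * h) for a, b in zip(xp, xm)])
    m = len(J[0])
    return [[sum(J[k][i] * J[l][i] for i in range(m)) for l in range(n)]
            for k in range(n)]

def cylindrical(y):
    r, theta, z = y
    return (r * math.cos(theta), r * math.sin(theta), z)

def spherical(y):
    rho, theta, phi = y
    return (rho * math.cos(theta) * math.sin(phi),
            rho * math.sin(theta) * math.sin(phi),
            rho * math.cos(phi))

g_cyl = metric_in_coords(cylindrical, [2.0, 0.7, 1.0])
# diagonal (1, r^2, 1) at r = 2: ds^2 = dr^2 + r^2 dθ^2 + dz^2
g_sph = metric_in_coords(spherical, [1.5, 0.4, 1.1])
# diagonal (1, ρ^2 sin^2 φ, ρ^2): ds^2 = dρ^2 + ρ^2(sin^2 φ dθ^2 + dφ^2)
```

The off-diagonal entries come out (numerically) zero, which is why both coordinate systems give a "diagonal" ds^2.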
The whole discussion applies equally when we start with some other quadratic function instead of the Euclidean metric. A case of importance in physics is the Minkowski metric on R^4, which is given by the formula
ds^2 = −(dx^0)^2 + (dx^1)^2 + (dx^2)^2 + (dx^3)^2.
where the coefficients guu , 2guv , gvv are obtained by expanding the squares in
the previous equation and the symbol ds2 denotes the quadratic function on
tangent vectors to S defined by this equation. This function can be thought of
as defining the metric on S, in the sense that it allows one to compute the length
of curves on S, and hence the distance between two points as the infimum over
the length of curves joining them.
1.5. RIEMANN METRICS 65
g(u, v) = g(v, u)
and it is non-degenerate if
Q = ±(ξ 1 )2 ± · · · ± (ξ n )2
66 CHAPTER 1. MANIFOLDS
where ξ i are the component functionals with respect to a suitable basis ei , i.e.
v = ξ^i(v) e_i. Such a basis is called orthonormal for Q. The number of ± signs is independent of the basis and is called the signature of the form.
After this excursion into linear algebra we now return to manifolds.
1.5.3 Definition. A Riemann metric on M associates to each p ∈ M a non–
degenerate symmetric bilinear form gp on Tp M .
The corresponding quadratic form Q is denoted ds2 and determines g uniquely.
Relative to a coordinate system (xi ) we can write
ds^2 = Σ_ij g_ij dx^i dx^j.
As part of the definition we require that the g_ij are C^∞ functions. This requirement is evidently independent of the coordinates. An equivalent condition is that g(X, Y) be a C^∞ function for any two C^∞ vector fields X, Y on M or an open subset of M.
We add some remarks. (1) We emphasize once more that ds^2 is not the differential of some function s^2 on M, nor the square of the differential of a function s (except when dim M = 0, 1); the notation is rather explained by the examples discussed earlier.
(2) The term "Riemann metric" is sometimes reserved for positive definite metrics, and then "pseudo-Riemann" or "semi-Riemann" is used for the possibly indefinite case. A manifold together with a Riemann metric is referred to as a Riemannian manifold, and may again be qualified as "pseudo" or "semi".
(3) At a given point po ∈ M one can find a coordinate system (x̃^i) so that the metric ds^2 = g_ij dx^i dx^j takes on the pseudo-Euclidean form ±δ_ij dx̃^i dx̃^j at po. [Reason: as remarked above, the quadratic form g_ij(po) ξ^i ξ^j can be written as ±δ_ij ξ̃^i ξ̃^j in a suitable basis, i.e. via a linear transformation ξ^i = a^i_j ξ̃^j, which can be used as a coordinate transformation x^i = a^i_j x̃^j.] But it is generally not possible to do this simultaneously at all points in a coordinate domain, not even in arbitrarily small neighbourhoods of a given point.
(4) As a bilinear form on tangent spaces, a Riemann metric is a conceptually very simple piece of structure on a manifold, and the elaborate discussion of arclength etc. may indeed seem superfluous. But a look at some surfaces should be enough to see that the essence of a Riemann metric is not to be found in the algebra of bilinear forms.
For reference we record the transformation rule for the coefficients of the metric,
but we omit the verification.
1.5.4 Lemma. The coefficients gij and g̃kl of the metric with respect to two
coordinate systems (xi ) and (x̃j ) are related by the transformation law
g̃_kl = g_ij (∂x^i/∂x̃^k)(∂x^j/∂x̃^l).
g̃_kl = g_ij (∂x^i/∂x̃^k)(∂x^j/∂x̃^l).
So
det(g̃_kl) = det(g_ij ∂x^i/∂x̃^k ∂x^j/∂x̃^l) = det(g_ij) (det(∂x^i/∂x̃^k))^2,
and
∫ ··· ∫ √|det g̃_ij| dx̃^1 ··· dx̃^n = ∫ ··· ∫ √|det g_ij| |det(∂x^i/∂x̃^k)| dx̃^1 ··· dx̃^n
= ∫ ··· ∫ √|det g_ij| dx^1 ··· dx^n,
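This change-of-variables identity can be seen concretely: the area of the unit disk in R^2 comes out the same whether one integrates √det g = 1 in Cartesian coordinates or √det g̃ = r in polar coordinates. A midpoint-rule sketch (plain Python; the function names are mine):

```python
import math

def disk_area_cartesian(n=400):
    # integrate sqrt(det g_ij) = 1 over the unit disk, midpoint rule
    h = 2.0 / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            x = -1.0 + (i + 0.5) * h
            y = -1.0 + (j + 0.5) * h
            if x * x + y * y <= 1.0:
                total += h * h
    return total

def disk_area_polar(n=400):
    # integrate sqrt(det g~_ij) = r over 0 < r < 1, 0 < θ < 2π
    hr, ht = 1.0 / n, 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * hr
        for j in range(n):
            total += r * hr * ht
    return total
```

Both approximate π; the Cartesian version is only first-order accurate because of the ragged boundary, while the polar version matches exactly.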
1.5.7 Remarks.
(a) If f is a real-valued function one can define the integral of f over R by the formula
∫ ··· ∫ f √|det g_ij| dx^1 ··· dx^n,
Here g_ij is the matrix of the Riemann metric in the coordinate system u, v. The right-hand side is the norm of the cross-product of vectors in R^3. This shows that the "volume" defined above agrees with the usual definition of surface area for a surface in R^3.
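As a concrete instance, for the unit sphere in spherical angles (with φ the polar angle) the induced metric is ds^2 = sin^2 φ dθ^2 + dφ^2, so √det g_ij = sin φ and the recipe reproduces the surface area 4π. A one-variable midpoint-rule sketch (plain Python; the function name is mine):

```python
import math

def sphere_area(n=2000):
    # area = ∫∫ sqrt(det g) dθ dφ with sqrt(det g) = sin φ,
    # θ over (0, 2π), φ over (0, π); midpoint rule in φ
    h = math.pi / n
    total = 0.0
    for i in range(n):
        phi = (i + 0.5) * h
        total += 2.0 * math.pi * math.sin(phi) * h
    return total
```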
We now consider the problem of finding the shortest line between two points
on a Riemannian manifold with a positive definite metric. The problem is this.
Fix two points A, B in M and consider curves p = p(t), a ≤ t ≤ b from p(a) = A
to p(b) = B. Given such a curve, consider its arc–length
∫_a^b ds = ∫_a^b √Q dt
as a function of the curve p(t). The integrand √Q is a function on tangent vectors, evaluated at the velocity vector ṗ(t) of p(t). The integral is independent of the parametrization. We are looking for "the" curve for which this integral is
minimal, but we have no guarantee that such a curve is unique or even exists.
This is what is called a variational problem and it will be best to consider it in
a more general setting.
Consider a function S on the set of paths p(t), a ≤ t ≤ b between two given
points A = p(a) and B = p(b) of form
S = ∫_a^b L(p, ṗ) dt.   (1)
We now use the term "path" to emphasize that in general the parametrization is important: p(t) is to be considered as a function on a given interval [a, b], which we assume to be differentiable. The integrand L is assumed to be a given function L = L(p, v) on the tangent vectors on M, i.e. a function on the tangent bundle T M. We here denote elements of T M as pairs (p, v), p ∈ M being the point at which the vector v ∈ Tp M is located. The problem is to find the path or paths p(t) which make S a maximum, a minimum, or more generally stationary,
in the following sense.
for all such one-parameter variations p(ε, t) of p(t). The converse is not necessarily true, but any path p(t) for which (2) holds for all variations p(ε, t) of p(t) is called a stationary (or critical) path for the path-function S.
We now compute the derivative (2). Choose a coordinate system x = (xi ) on
M . We assume that the curve p(t) under consideration lies in the coordinate
domain, but this is not essential: otherwise we would have to cover the curve by
several coordinate systems. We get a coordinate system (x, ξ) on T M by taking
as coordinates of (p, v) the coordinates xi of p together with the components ξ i
of v. (Thus ξ^i = dx^i as function on tangent vectors.) Let x = x(t, ε) be the coordinate point of p(ε, t), and in (2) set L = L(x, ξ) evaluated at ξ = ẋ. Then
dS/dε = ∫_a^b (∂L/∂ε) dt = ∫_a^b ( (∂L/∂x^k)(∂x^k/∂ε) + (∂L/∂ξ^k)(∂ẋ^k/∂ε) ) dt.
Change the order of the differentiation with respect to t and ε in the second term in parentheses and integrate by parts to find that this
= ∫_a^b ( ∂L/∂x^k − (d/dt)(∂L/∂ξ^k) ) (∂x^k/∂ε) dt + [ (∂L/∂ξ^k)(∂x^k/∂ε) ]_{t=a}^{t=b}.
The term in brackets is zero, because of the boundary conditions x(ε, a) ≡ x(A) and x(ε, b) ≡ x(B). The whole expression has to vanish at ε = 0, for all x = x(t, ε) satisfying the boundary conditions. The partial derivative ∂x^k/∂ε at ε = 0 can be any C^1 function w^k(t) of t which vanishes at t = a, b, since we can take x^k(t, ε) = x^k(t) + ε w^k(t), for example. Thus the initial curve x = x(t) satisfies
∫_a^b ( ∂L/∂x^k − (d/dt)(∂L/∂ξ^k) ) w^k(t) dt = 0
for all such wk (t). From this one can conclude that
∂L/∂x^k − (d/dt)(∂L/∂ξ^k) = 0   (3)
for all t, a ≤ t ≤ b, and all k. For otherwise there is a k so that this expression is non-zero, say positive, at some t_o, hence in some interval about t_o. For this k, choose w^k equal to zero outside such an interval and positive on a subinterval, and take all other w^k equal to zero. The integral will then be positive as well, contrary to the assumption. The equation (3) is called the Euler–Lagrange equation for
the variational problem (2).
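The Euler–Lagrange equation can be watched in action numerically. For L = Q = |ẋ|^2 on the Euclidean plane, (3) reduces to ẍ^k = 0, so straight paths are stationary, and any fixed-endpoint perturbation of a straight path increases S. A discretized sketch (plain Python; the function names and the particular perturbation are mine):

```python
import math

def action(path, a=0.0, b=1.0, n=1000):
    # S = ∫ L dt with L = Q = |dx/dt|^2, velocities by forward differences
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        p0 = path(a + i * h)
        p1 = path(a + (i + 1) * h)
        speed2 = sum(((q1 - q0) / h) ** 2 for q0, q1 in zip(p0, p1))
        total += speed2 * h
    return total

def straight(t):
    # straight path from (0,0) to (1,2); satisfies the E-L equation x'' = 0
    return (t, 2.0 * t)

def wiggly(t):
    # same endpoints, with a sine-bump variation that vanishes at t = 0, 1
    return (t, 2.0 * t + 0.1 * math.sin(math.pi * t))
```

Here action(straight) = 1 + 4 = 5, and action(wiggly) is strictly larger, as the stationarity argument predicts.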
We add some remarks on what is called the principle of conservation of energy
connected with the variational problem (2). The energy E = E(p, v) associated
to L = L(p, v) is the function defined by
E = (∂L/∂ξ^k) ξ^k − L
and is in this case independent of the parametrization of the path p(t). The
derivative (2) becomes
∫_a^b (1/2) Q^{−1/2} (∂Q/∂ε)|_{ε=0} dt.
This means in effect that we can replace L = √Q by L = Q = g_ij ξ^i ξ^j. For this new L the energy E becomes E = 2Q − Q = Q. So the speed √Q is constant along p(t). We now write out (3) explicitly and summarize the result.
for all v ∈ Tp M .
1.5.14 Remarks. (1) In terms of scalar products, this is equivalent to
g(df(v), df(w)) = g(v, w)
for all v, w ∈ Tp M.
(2) Let (x1 , · · · , xn ) and (y 1 , · · · , y m ) be coordinates on M and N respectively.
Write
ds^2_M = g_ij dx^i dx^j,   ds^2_N = h_ab dy^a dy^b
for the metrics and
y^a = f^a(x^1, · · · , x^n),  a = 1, · · · , m   (6)
for the map f. To say that f is an isometry means that ds^2_N becomes ds^2_M if we substitute for the y^a and dy^a in terms of the x^i and dx^i by means of the equation (6).
(5) If f is an isometry then it preserves arclength of curves as well. Conversely,
any C ∞ map preserving arclength of curves is an isometry. (Exercise).
1.5.15 Examples
a) Euclidean space. The linear transformations of R^n which preserve the Euclidean metric are of the form x → Ax where A is an orthogonal real n × n matrix (AA* = I). The set of all such matrices is called the orthogonal group, denoted O(n). If the Euclidean metric is replaced by a pseudo-Euclidean metric with p plus signs and q minus signs, the corresponding set of "pseudo-orthogonal" matrices is denoted O(p, q). It can be shown that any transformation of R^n which preserves a pseudo-Euclidean metric is of the form x → x_o + Ax with A linear orthogonal.
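A minimal numerical illustration of example a) (plain Python; the 2 × 2 rotation below is of course just one element of O(2)): an orthogonal A satisfies A Aᵀ = I, and x → Ax preserves Euclidean length.

```python
import math

t = 0.7
A = [[math.cos(t), -math.sin(t)],
     [math.sin(t),  math.cos(t)]]          # a rotation, so A Aᵀ = I

x = (3.0, 4.0)                             # |x| = 5
Ax = tuple(sum(A[i][j] * x[j] for j in range(2)) for i in range(2))

# check orthogonality explicitly: (A Aᵀ)_ij = Σ_k A_ik A_jk
AAT = [[sum(A[i][k] * A[j][k] for k in range(2)) for j in range(2)]
       for i in range(2)]
```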
b) The sphere. Any orthogonal transformation A ∈ O(3) of R3 gives an isometry
of S 2 onto itself, and it can be shown that all isometries of S 2 onto itself are of
this form. The same holds in higher dimensions and for pseudo-spheres defined
by a pseudo-Euclidean metric.
c) Curves and surfaces. Let C : p = p(σ), σ ∈ R, be a curve in R^3 parametrized by arclength, i.e. ṗ(σ) has length 1. If we assume that the curve is non-singular, i.e. a submanifold of R^3, then C is a 1-dimensional manifold with a Riemann metric and the map σ → p(σ) is an isometry of R onto C. In fact it
can be shown that for any connected 1-dimensional Riemann
manifold C there exists an isometry of an interval on the Euclidean line R onto
C, which one can still call parametrization by arclength. The map σ → p(σ) is
not necessarily 1–1 on R (e.g. for a circle), but it is always locally 1–1. Thus
any 1-dimensional Riemann manifold is locally isometric with the straight line
R. This does not hold in dimension 2 or higher: for example a sphere is not
locally isometric with the plane R^2 (exercise 18); only very special surfaces are, e.g. cylinders (exercise 17); such surfaces are called developable.
The complex coordinates z and w are only used as shorthand for the real coordinates (x, y) and (u, v). The map f : w → z defined by
z = (1 + iw)/(1 − iw)
sends H onto D and is an isometry for these metrics. To verify the latter use
the above formula for z and the formula
dz = 2i dw/(1 − iw)^2
for dz to substitute into the equation for ds2D ; the result is ds2H . (Exercise.)
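The verification suggested here can also be done numerically by pushing a small displacement through f. The metrics themselves were introduced on the preceding page; the sketch below assumes the standard normalizations ds^2_H = (du^2 + dv^2)/v^2 on the half-plane H and ds^2_D = 4(dx^2 + dy^2)/(1 − x^2 − y^2)^2 on the disk D (plain Python with complex arithmetic; the names are mine):

```python
import cmath

def f(w):                              # H -> D, z = (1 + iw)/(1 - iw)
    return (1 + 1j * w) / (1 - 1j * w)

def length_H(w, dw):                   # ds_H = |dw| / Im w
    return abs(dw) / w.imag

def length_D(z, dz):                   # ds_D = 2 |dz| / (1 - |z|^2)
    return 2.0 * abs(dz) / (1.0 - abs(z) ** 2)

w = 0.3 + 0.8j                         # a point of the upper half-plane H
dw = 1e-7 * cmath.exp(0.5j)            # a small tangent displacement at w
z = f(w)
dz = f(w + dw) - f(w)                  # its image, correct to first order
ratio = length_D(z, dz) / length_H(w, dw)   # ~ 1 for an isometry
```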
1.5.16 The set of all invertible isometries of a Riemann manifold M with metric ds^2 is called the isometry group of M, denoted Isom(M) or Isom(M, ds^2). It is a
group in the algebraic sense, which means that the composite of two isometries
is an isometry as is the inverse of any one isometry. In contrast to the above
example, the isometry group of a general Riemann manifold may well consist of
the identity transformation only, as is easy to believe if one thinks of a general
surface in space.
1.5.17 On a connected manifold M with a positive-definite Riemann metric one
can define the distance between two points as the infimum of the lengths of all
curves joining these points. (It can be shown that this makes M into a metric
space in the sense of topology.)
EXERCISES 1.5
1. Verify the formula for the Euclidean metric in spherical coordinates:
ds^2 = (dρ)^2 + ρ^2 (sin^2 φ (dθ)^2 + (dφ)^2).
2. Show that the metric induced on the sphere
S^2 = {p = (x, y, z) | x^2 + y^2 + z^2 = R^2}
by the Euclidean metric of R^3 is given in the coordinates (x, y) by
dx^2 + dy^2 + (x dx + y dy)^2/(R^2 − x^2 − y^2).
Specify a domain for the coordinates (x, y) on S^2.
3. Let M = R^3. Let (x^0, x^1, x^2) be Cartesian coordinates on R^3. Define a metric ds^2 by
ds^2 = −(dx^0)^2 + (dx^1)^2 + (dx^2)^2.
Show that in the coordinates (ρ, ψ, θ) defined by x^0 = ρ cosh ψ, x^1 = ρ sinh ψ cos θ, x^2 = ρ sinh ψ sin θ (ρ > 0) one has
ds^2 = −dρ^2 + ρ^2 [(dψ)^2 + sinh^2 ψ (dθ)^2].
4. Let M = R3 . Let (x0 , x1 , x2 ) be Cartesian coordinates on R3 . Define a
metric ds2 by
ds2 = −(dx0 )2 + (dx1 )2 + (dx2 )2 .
Let S = {p = (x0 , x1 , x2 ) ∈ R3 | (x0 )2 − (x1 )2 − (x2 )2 = 1} with the metric ds2
induced by the metric ds2 on R3 . (With this metric S is called a pseudo-sphere).
(a) Prove that S is a submanifold of R3 .
(b) Show that for any (ψ, θ) the point p = (x0 , x1 , x2 ) given by
z = (aw + b)/(cw + d).
1.6 Tensors
1.6.1 Definition. A tensor T at a point p ∈ M is a quantity which, relative to a coordinate system (x^k) around p, is represented by an indexed system of real numbers (T^{ij···}_{kl···}). The systems (T^{ij···}_{kl···}) and (T̃^{ab···}_{cd···}) representing T in two coordinate systems (x^i) and (x̃^j) are related by the transformation law
T̃^{ab···}_{cd···} = (∂x̃^a/∂x^i)(∂x̃^b/∂x^j) ··· (∂x^k/∂x̃^c)(∂x^l/∂x̃^d) ··· T^{ij···}_{kl···}.
separately (i.e. when all but one of them is kept fixed). The theorem says that
one can think of a tensor at p as a multilinear function T (λ, µ, · · · ; v, w, · · · ) of
r covectors λ, µ, · · · and s vectors v,w, · · · at p, independent of any coordinate
system. Here we write the covectors first. One could of course list the variables
λ, µ, · · · ; v, w · · · in any order convenient. In components, this is indicated by
the position of the indices, e.g.
T(v, λ, w) = T_i^j_k v^i λ_j w^k
(3) Tensor product. The product of two tensors T = (T^i_j) and S = (S^k_l) is the tensor T ⊗ S (also denoted ST) with components
(T ⊗ S)^{ik}_{jl} = T^i_j S^k_l.
1.6. TENSORS 79
One has to verify that the above operations do produce tensors. As an example,
we consider the contraction of a (1, 1) tensor:
T̃^k_k = (∂x̃^k/∂x^i)(∂x^j/∂x̃^k) T^i_j = δ^j_i T^i_j = T^i_i.
The verification for a general tensor is the same, except that one has some more
indices around.
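For a linear change of coordinates x̃ = Ax the Jacobians are constant, a (1,1) tensor transforms by T̃ = A T A⁻¹, and the computation above says the contraction (the trace) is unchanged. A 2 × 2 sketch (plain Python; the helper names are mine):

```python
def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inv2(P):
    # explicit inverse of a 2x2 matrix
    d = P[0][0] * P[1][1] - P[0][1] * P[1][0]
    return [[P[1][1] / d, -P[0][1] / d],
            [-P[1][0] / d, P[0][0] / d]]

def trace(P):
    return P[0][0] + P[1][1]

T = [[1.0, 2.0], [3.0, 4.0]]           # components of a (1,1) tensor
A = [[2.0, 1.0], [1.0, 1.0]]           # Jacobian of the linear change x~ = Ax
T_tilde = matmul(matmul(A, T), inv2(A))
# the individual components change, but trace(T_tilde) == trace(T)
```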
1.6.5 Lemma. Denote by T^{r,s}_p M the set of tensors of type (r, s) at p ∈ M.
a) T^{r,s}_p M is a vector space under the operations of addition and scalar multiplication.
b) Let (x^i) be a coordinate system around p. The (r + s)-fold tensor products
∂/∂x^i ⊗ ∂/∂x^j ⊗ ··· ⊗ dx^k ⊗ dx^l ⊗ ···   (*)
form a basis for T^{r,s}_p M. If T ∈ T^{r,s}_p M has components (T^{ij···}_{kl···}) relative to (x^i), then
T = T^{ij···}_{kl···} ∂/∂x^i ⊗ ∂/∂x^j ⊗ ··· ⊗ dx^k ⊗ dx^l ⊗ ···.   (**)
Proof. a) is clear. For (b), remember that ∂/∂x^i has i-component = 1 and all other components = 0, and similarly for dx^k. The equation (**) is then clear, since both sides have the same components T^{ij···}_{kl···}. This equation also shows that every T ∈ T^{r,s}_p M is uniquely a linear combination of the tensors (*), so that these do indeed form a basis.
1.6.6 Tensors on a Riemannian manifold.
We now assume that M comes equipped with a Riemann metric g, a non-degenerate, symmetric bilinear form on the tangent spaces Tp M which, relative
to a coordinate system (xi ), is given by
g(v, w) = g_ij v^i w^j.
The metric gives a correspondence between vectors and covectors: the covector λ corresponding to the vector v has components λ_j = g_ij v^i ("lowering the index"), and conversely
v^i = g^{ij} λ_j,
where (g^{ij}) denotes the inverse matrix of (g_ij), i.e.
g^{ik} g_kj = δ^i_j.
The scalar product (S, T) of any two tensors of the same type is defined by raising or lowering the appropriate indices and contracting, e.g. for T = (T_ij) and S = (S_ij),
(S, T) = T_ij S^{ij} = T^{ij} S_ij.
T (u, v, w) = τ det[u, v, w]
F (a, b) = v · (a × b) = det[v, a, b]
F_ij = ε_kij v^k.
or alternating
T(v, w) = −T(w, v), i.e. T_ij = −T_ji,
but need not be either. For example, on any manifold a Riemann metric
g(v, w) = g_ij v^i w^j
(B × J) · V = B · (J × V) = det[B, J, V] = ε_kij B^k J^i V^j
w_j = T_ij v^i.
Type (0, 3). The stress tensor S of an elastic medium in R^3 associates to the plane element spanned by two vectors a, b at a point p the force acting on this plane element. If this plane element is displaced with velocity v, then the work done per unit time is a scalar which does not depend on the coordinate system. This scalar is of the form
S(a, b, v) = S_ijk a^i b^j v^k
Then
S(a, b, v) = P_lk ε_kij a^i b^j v^l = P(a × b) · v.
This formula uses the correspondence (b) and remains true in any coordinate
system related to the Cartesian coordinate system by a transformation with
Jacobian determinant 1.
The space V ⊗ W is independent of the bases {e_j} and {f_k} in the following sense. Any other bases {ẽ_j} and {f̃_k} lead to symbols ẽ_j ⊗̃ f̃_k forming a basis of V ⊗̃ W. We may then identify V ⊗̃ W with V ⊗ W by sending the basis ẽ_j ⊗̃ f̃_k of V ⊗̃ W to the vectors ẽ_j ⊗ f̃_k in V ⊗ W, and vice versa.
The space V ⊗ W is called the tensor product of V and W. The vector x ⊗ y ∈ V ⊗ W is called the tensor product of x ∈ V and y ∈ W. (One should keep in mind that a general element of V ⊗ W looks like Σ ζ_jk e_j ⊗ f_k and may not be expressible as a single tensor product x ⊗ y = Σ x_j y_k e_j ⊗ f_k.)
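In terms of the coefficient matrix (ζ_jk) this is a rank condition: an element of V ⊗ W is a single product x ⊗ y exactly when (ζ_jk) has rank ≤ 1, which for dim V = dim W = 2 means its determinant vanishes. A small sketch (plain Python; the function name is mine):

```python
def is_simple_2x2(zeta):
    # rank(zeta) <= 1  <=>  det(zeta) = 0 for a 2x2 coefficient matrix
    return abs(zeta[0][0] * zeta[1][1] - zeta[0][1] * zeta[1][0]) < 1e-12

x = (2.0, 3.0)
y = (5.0, 7.0)
product = [[xi * yj for yj in y] for xi in x]     # components of x ⊗ y
mixed = [[1.0, 0.0], [0.0, 1.0]]                  # e1⊗f1 + e2⊗f2, not simple
```

The matrix of x ⊗ y has entries x_j y_k, so all of its 2 × 2 minors vanish; the "mixed" element e1⊗f1 + e2⊗f2 has determinant 1 and is not a single tensor product.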
The triple tensor products (V1 ⊗ V2 ) ⊗ V3 and V1 ⊗ (V2 ⊗ V3 ) may be identified
in an obvious way and simply denoted V1 ⊗ V2 ⊗ V3 . Similarly one can form the
tensor product V1 ⊗ V2 ⊗ · · · ⊗ VN of any finite number of vector spaces.
This construction applies in particular if one takes for each Vj the tangent space
Tp M or the cotangent space Tp∗ M at a given point p of a manifold M . It should
be clear from lemma 1.6.5 that one obtains in this way the same notion of tensor
at p as in definition 1.6.1
Another message from Hermann Weyl. For edification and moral support contemplate this message from Hermann Weyl's Raum-Zeit-Materie (my translation). Entering into tensor calculus has certainly its conceptual difficulties, apart from the fear of indices, which has to be overcome. But formally this calculus is of extreme simplicity, much simpler, for example, than elementary vector
calculus. Two operations: multiplication and contraction, i.e. juxtaposition of
components of tensors with distinct indices and identification of two indices, one
up one down, with implicit summation. It has often been attempted to introduce
an invariant notation into tensor calculus· · · . But then one needs such a mass
of symbols and such an apparatus of rules of calculation (if one does not want
to go back to components after all) that the net effect is very much negative.
One has to protest strongly against these orgies of formalism, with which one
now starts to bother even engineers. – Elsewhere one can find similar sentiments
expressed in similar words with reversed casting of hero and villain.
EXERCISES 1.6
1. Let (T^{ij···}_{kl···}) be the components of a tensor (field) of type (r, s) with respect to the coordinate system (x^i). Show that the quantities ∂T^{ij···}_{kl···}/∂x^m, depending on one more index m, do not transform like the components of a tensor unless (r, s) = (0, 0). [You may take (r, s) = (1, 1) to simplify the notation.]
2. Let (ϕk ) be the components of a tensor (field) of type (0, 1) (= 1-form) with
respect to the coordinate system (xi ). Show that the quantities (∂ϕi /∂xj ) −
(∂ϕj /∂xi ) depending on two indices ij form the components of a tensor of type
(0, 2).
3. Let (X i ), (Y j ) be the components of two tensors (fields) of type (1, 0)
(=vector fields) with respect to the coordinate system (xi ). Show that the
quantities X j (∂Y i /∂xj ) − Y j (∂X i /∂xj ) depending on one index i form the
components of a tensor of type (1, 0).
4. Let f be a C2 function. (a) Do the quantities ∂ 2 f /∂xi ∂xj depending on the
indices ij form the components of a tensor field? [Prove your answer.]
(b) Suppose p is a point for which dfp = 0. Do the quantities (∂ 2 f /∂xi ∂xj )p
depending on the indices ij form the components of a tensor at p? [Prove your
answer.]
5. (a) Let I be a quantity represented by (δ^i_j) relative to every coordinate system. Is I a (1, 1)-tensor? Prove your answer. [δ^i_j = Kronecker delta := 1 if i = j and = 0 otherwise.]
(b) Let J be a quantity represented by (δ_ij) relative to every coordinate system. Is J a (0, 2)-tensor? Prove your answer. [δ_ij = Kronecker delta := 1 if i = j and = 0 otherwise.]
6. (a) Let (T^i_j) be a quantity depending on a coordinate system around the point p with the property that for every (1, 1)-tensor (S^i_j) at p the scalar T^j_i S^i_j is independent of the coordinate system. Prove that (T^i_j) represents a (1, 1)-tensor at p.
(b) Let (T^i_j) be a quantity depending on a coordinate system around the point p with the property that for every (0, 1)-tensor (S_i) at p the quantity T^i_j S_i, depending on the index j, represents a (0, 1)-tensor at p. Prove that (T^i_j) represents a (1, 1)-tensor at p.
(c) State a general rule for quantities (T^{ij···}_{kl···}) which contains both of the rules (a) and (b) as special cases. [You need not prove this general rule: the proof is the same, just some more indices.]
7. Let p = p(t) be a C^2 curve given by x^i = x^i(t) in a coordinate system (x^i). Is the "acceleration" (ẍ^i(t)) = (d^2 x^i(t)/dt^2) a vector at p(t)? Prove your answer.
8. Let (Si ) and (T i ) be the components in a coordinate system (xi ) of C ∞ tensor
fields of type (0, 1) and (1, 0) respectively. Which of the following quantities are
tensor fields? Prove your answer and indicate the type of those which are tensor
fields.
(a) ∂(S_i T^i)/∂x^k   (b) ∂(S_i T^j)/∂x^k
(c) (∂S_i/∂x^j) − (∂S_j/∂x^i)   (d) (∂T^i/∂x^j) − (∂T^j/∂x^i).
9. Prove that the operations on tensors defined in the text do produce tensors
in the following cases.
a) Addition and scalar multiplication. If Tij and Sij are tensors, so are Sij + Tij
and cSij (c any scalar).
b) Symmetry operations. If T_ij is a tensor, then the equation S_ij = T_ji defines another tensor S. Prove that this tensor may also be defined by the formula S(v, w) = T(w, v).
c) Tensor product. If T_i and S^j are tensors, so is T_i S^j.
10. a) In the equation A^j_i dx^i ⊗ (∂/∂x^j) = Ã^j_i dx̃^i ⊗ (∂/∂x̃^j), transform the left side using the transformation rules for dx^i and ∂/∂x^j to find the transformation rule for A^j_i (thus verifying that the transformation rule is "built into the notation" as asserted in the text).
b) Find the tensor field ydx ⊗ (∂/∂y) − xdy ⊗ (∂/∂x) on R2 in polar coordinates
(r, φ).
11. Let (x1 , x2 ) = (r, θ) be polar coordinates on R2 . Let T be the tensor with
components T11 = tan θ, T12 = 0, T21 = 1 + r, T22 = er in these coordinates.
Find the components T ij when both indices are raised using the Euclidean
metric on R2 .
12. Let V ⊗ W be the tensor product of two finite–dimensional vector spaces as
defined in Appendix 2.
a) Let {ei }, {fj } be bases for V and W , respectively. Let {ẽr } and {f˜s } be two
other bases and write e_i = a^r_i ẽ_r, f_j = b^s_j f̃_s. Suppose T ∈ V ⊗ W has components
T ij with respect to the basis {ei ⊗ fj } and T̃ rs with respect to {ẽr ⊗ f˜s }. Show
that T̃ rs = T ij ari bsj . Generalize this transformation rule to the tensor product
V ⊗ W ⊗ · · · of any finite number of vector spaces. (You need not prove the
generalization in detail. Just indicate the modification in the argument.)
b) Let M be a manifold, p ∈ M a point. Show that the space of tensors of type (r, s) at p, as defined in 1.6.1, is the same as the tensor product of r copies of Tp M and s copies of Tp* M, as defined in Appendix 2.
Chapter 2. Connections and Curvature
2.1 Connections
where p(t) is any differentiable curve on M with p(0) = po and ṗ(0) = v. One
would naturally like to define a directional derivative of an arbitrary differen-
tiable tensor field F on M , but this cannot be done in the same way, because
one cannot subtract tensors at different points on M . In fact, on an arbitrary
manifold this cannot be done at all (in a reasonable way) unless one adds another piece of structure to the manifold, called a "covariant derivative". This
we now do, starting with a covariant derivative of vector fields, which will later
be used to define a covariant derivative of arbitrary tensor fields.
2.1.1 Definition. A covariant derivative on M is an operation which produces
for every “input“ consisting of (1) a tangent vector v ∈ Tp M at some point p
and (2) a C ∞ vector field X defined in a neighbourhood of p a vector in Tp M ,
denoted ∇v X. This operation is subject to the following axioms:
CD1. ∇_{u+v} X = (∇_u X) + (∇_v X)
CD2. ∇_{av} X = a ∇_v X
CD3. ∇_v (X + Y) = (∇_v X) + (∇_v Y)
CD4. ∇_v (f X) = (D_v f) X + f(p) (∇_v X)
for all u, v ∈ Tp M, all C^∞ vector fields X, Y defined around p, all C^∞ functions f defined in a neighbourhood of p, and all scalars a ∈ R.
If X,Y are two vector fields, we define another vector field ∇X Y by (∇X Y )p =
∇Xp Y for all points p where both X and Y are defined. As final axiom we
require:
88 CHAPTER 2. CONNECTIONS AND CURVATURE
D_v X = (D_v X^1) ∂/∂x^1 + · · · + (D_v X^n) ∂/∂x^n.   (1)
This means that
D_v X(p_o) = lim_{t→0} (1/t) [X(p(t)) − X(p_o)]   (2)
where p(t) is any differentiable curve on M with p(0) = p_o and ṗ(0) = v. This defines a covariant derivative on R^n.
∂/∂φ = cos θ cos φ ∂/∂x + sin θ cos φ ∂/∂y − sin φ ∂/∂z,
∂/∂θ = − sin θ sin φ ∂/∂x + cos θ sin φ ∂/∂y.
2.1. CONNECTIONS 89
if X = X^x ∂/∂x + X^y ∂/∂y + X^z ∂/∂z,
then D_v X = (D_v X^x) ∂/∂x + (D_v X^y) ∂/∂y + (D_v X^z) ∂/∂z
for certain C^∞ functions Γ^k_ij on the coordinate domain. Omitting the summation signs we get
DX/dt = (dX^k/dt + Γ^k_ij (du^i/dt) X^j) ∂/∂u^k + a vector orthogonal to S
In particular take X to be the ith coordinate vector field ∂/∂ui on S and p = p(t)
the jth coordinate line through a point with tangent vector ṗ = ∂/∂uj . Then
X = ∂/∂ui is the ith partial ∂p/∂ui of the E–valued function p = p(u1 , · · · , um )
and its covariant derivative DX/dt in E is
∂²p/∂u^j ∂u^i = ∇_{∂/∂u^j} (∂/∂u^i) + a vector orthogonal to S
= Σ_k Γ^k_ij ∂/∂u^k + a vector orthogonal to S
Take the scalar product of both sides with ∂/∂u^l in E to find that
(∂²p/∂u^j ∂u^i) · (∂/∂u^l) = Σ_k Γ^k_ij (∂/∂u^k) · (∂/∂u^l),
since ∂/∂u^l is tangential to S. This may be written as
(∂²p/∂u^j ∂u^i) · (∂/∂u^l) = Γ^k_ij g_kl
where gkl is the coefficient of the induced Riemann metric ds2 = gij dui duj on
S. Solve for Γ^k_ij to find that
Γ^k_ij = g^{lk} (∂²p/∂u^i ∂u^j) · (∂p/∂u^l).
∂²p/∂θ² = − cos θ sin φ ∂/∂x − sin θ sin φ ∂/∂y,
∂²p/∂φ² = − cos θ sin φ ∂/∂x − sin θ sin φ ∂/∂y − cos φ ∂/∂z,
∂²p/∂θ∂φ = − sin θ cos φ ∂/∂x + cos θ cos φ ∂/∂y.
The only nonzero scalar products (∂²p/∂u^j ∂u^i) · (∂p/∂u^l) are
(∂²p/∂θ²) · (∂p/∂φ) = − sin φ cos φ,   (∂²p/∂θ∂φ) · (∂p/∂θ) = sin φ cos φ,
and the Riemann metric is ds^2 = sin^2 φ dθ^2 + dφ^2. Since the matrix (g_ij) is diagonal, the equations for the Γs become Γ^k_ij = (1/g_kk) (∂²p/∂u^i ∂u^j) · (∂p/∂u^k), and the only nonzero Γs are
Γ^φ_θθ = − sin φ cos φ,   Γ^θ_φθ = Γ^θ_θφ = sin φ cos φ / sin^2 φ.
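As a check on this computation, the Γs can be recomputed numerically from Γ^k_ij = g^{lk} (∂²p/∂u^i ∂u^j) · (∂p/∂u^l), approximating all derivatives of the chart by central differences (plain Python; the helper names are mine; note that φ is the polar angle here):

```python
import math

def p(theta, phi):                     # the chart of the unit sphere used above
    return (math.cos(theta) * math.sin(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(phi))

def partial(fun, args, k, h=1e-5):     # central difference in argument k
    up, dn = list(args), list(args)
    up[k] += h
    dn[k] -= h
    fp, fm = fun(*up), fun(*dn)
    return tuple((a - b) / (2.0 * h) for a, b in zip(fp, fm))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

theta, phi = 0.8, 1.2
p_theta = partial(p, (theta, phi), 0)
p_phi = partial(p, (theta, phi), 1)
p_thth = partial(lambda t, f: partial(p, (t, f), 0), (theta, phi), 0)
p_thphi = partial(lambda t, f: partial(p, (t, f), 0), (theta, phi), 1)

# the metric is diagonal, so g^{kk} = 1/g_kk:
Gamma_phi_thth = dot(p_thth, p_phi) / dot(p_phi, p_phi)            # -sin φ cos φ
Gamma_theta_thphi = dot(p_thphi, p_theta) / dot(p_theta, p_theta)  # cos φ / sin φ
```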
2.1.5 Definition. Let p(t) be a curve in M . A vector field X(t) along p(t)
is a rule which associates to each value of t for which p(t) is defined a vector
X(t) ∈ Tp(t) M at p(t).
If the curve p(t) intersects itself, i.e. if p(t_1) = p(t_2) for some t_1 ≠ t_2, a vector field X(t) along p(t) will generally attach two different vectors X(t_1) and X(t_2) to the point p(t_1) = p(t_2).
Relative to a coordinate system we can write
X(t) = Σ_i X^i(t) (∂/∂x^i)_{p(t)}   (9)
at all points p(t) in the coordinate domain. When the curve is C^∞ we say X is C^∞ if the X^i are C^∞ functions of t for all coordinate systems.
If p(t) is a C ∞ curve and X(t) a C ∞ vector field along p(t), we define
∇X/dt = ∇_ṗ X.
The right side makes sense in view of (8) and defines another vector field along
p(t).
In coordinates,
∇X/dt = Σ_k ( dX^k/dt + Σ_ij Γ^k_ij (dx^j/dt) X^i ) ∂_k.   (10)
2.1.6 Lemma (Chain Rule). Let X(t) be a vector field along a curve p(t). Let X̃(t̃) and p̃(t̃) be obtained from X(t), p(t) by a change of parameter t = f(t̃), i.e. p̃(t̃) = p(f(t̃)), X̃(t̃) = X(f(t̃)). Then
∇X̃/dt̃ = (dt/dt̃) ∇X/dt.
Proof. A change of parameter t = f(t̃) in a curve p(t) is understood to be a diffeomorphism between the intervals of definition. Now calculate in coordinates, using (10) and the ordinary chain rule:
∇X̃/dt̃ = ··· = (dt/dt̃) ∇X/dt.
2.1.7 Theorem. Let p(t) be a C^∞ curve on M, po = p(to) a point on p(t). Given any vector v ∈ Tpo M at po there is a unique C^∞ vector field X along p(t) so that
∇X/dt ≡ 0,   X(to) = v.
dX^k/dt + Σ_ij Γ^k_ij (dx^j/dt) X^i = 0,   X^k(to) = v^k.
From the theory of differential equations one knows that these equations do
indeed have a unique solution X 1 , · · · , X n where the X k are C ∞ functions of
t. This proves the theorem when the curve p(t) lies in a single coordinate do-
main. Otherwise one has to cover the curve with several overlapping coordinate
domains and apply this argument successively within each coordinate domain,
taking as initial vector X(tj ) for the j-th coordinate domain the vector at the
point p(tj ) obtained from the previous coordinate domain.
Remark. Actually the existence theorem for systems of differential equations
only guarantees the existence of a solution X^k(t) for t in some interval around t_o. This should be kept in mind, but we shall not bother to state it explicitly.
This caveat applies whenever we deal with solutions of differential equations.
2.1.8 Supplement to the theorem. The vector field X along a curve C : p = p(t) satisfying
∇X/dt ≡ 0,   X(to) = v
does not depend on the parametrization of C, in the following sense. Let t = f(t̃) be a change of parameter with to = f(t̃o), and let X, X̃ be the vector fields along p(t) and p̃(t̃) = p(f(t̃)) satisfying
∇X/dt ≡ 0,   ∇X̃/dt̃ ≡ 0,   X(to) = v,   X̃(t̃o) = v.
Then
X(t) = X̃(t̃) if t = f(t̃).
Proof. Consider X(t) as a function of t̃ by substituting t = f(t̃). As a function of t̃, X is then a vector field along p̃(t̃). It suffices to show that ∇X/dt̃ ≡ 0, which is clear from the Chain Rule above.
2.1.9 Definitions. Let C : p = p(t) be a C^∞ curve on M.
(a) A C^∞ vector field X(t) along C satisfying ∇X/dt ≡ 0 is said to be parallel along p(t).
(b) Let po = p(to) and p1 = p(t1) be two points on C. For each vector vo ∈ Tpo M define a vector v1 ∈ Tp1 M by v1 = X(t1), where X is the unique vector field along p(t) satisfying ∇X/dt ≡ 0, X(to) = vo. The vector v1 ∈ Tp1 M is called the parallel transport of vo ∈ Tpo M from po to p1 along p(t). It is denoted v1 = T(to → t1)vo.
(c) The map T(to → t1) : Tpo M → Tp1 M is called parallel transport from po to p1 along p(t).
It is important to remember that T(to → t1) depends in general on the curve p(t) from po to p1, not just on the endpoints po and p1. To bring this out we may write C : p = p(t) for the curve and T_C(to → t1) for the parallel transport along C.
2.1.10 Theorem. Let p(t) be a C^∞ curve on M, po = p(to) and p1 = p(t1) two points on p(t). The parallel transport along p(t) from po to p1,
T(to → t1) : Tpo M → Tp1 M,
is a linear transformation; this follows from the linearity of the covariant derivative along p(t):
∇/dt (X + Y) = (∇X/dt) + (∇Y/dt),   ∇/dt (aX) = a (∇X/dt)   (a ∈ R).
The map T (to → t1 ) is invertible: its inverse is in fact T (t1 → to ), as follows
from the definitions.
A covariant derivative on a manifold is also called an affine connection, because it leads to a notion of parallel transport along curves which "connects" tangent vectors at different points, a bit like in the "affine space" R^n, where one may transport tangent vectors from one point to another by parallel translation (see the example below). The fundamental difference is that parallel transport for a general covariant derivative depends on a curve connecting the points, while in an affine space it depends on the points only. While the terms "covariant derivative" and "connection" are logically equivalent, they carry different connotations, which can be gleaned from the words themselves.
2.1.11 Example 2.1.2 (continued). Let M = Rn with the covariant derivative
defined earlier. We identify tangent vectors to M at any point with vectors in
Rn . Then parallel transport along any curve keeps the components of vectors
with respect to the standard coordinates constant, i.e. T (to → t1 ) : Tpo M →
Tp1 M is the identity mapping if we identify both Tpo M and Tp1 M with Rn as
just described.
2.1.12 Example 2.1.3 (continued). Let E = R3 with its positive definite
inner product, S = {p ∈ E | kpk = 1} the unit sphere about the origin. S is a
submanifold of E and for any p ∈ S, Tp S is identified with the subspace {v ∈ E
| v · p = 0} of E orthogonal to p. S has a covariant derivative ∇S induced by
the covariant derivative ∇E in E: for v ∈ Tp S and X a vector field on S, ∇S v X
is the component of ∇E v X tangential to S, i.e. its orthogonal projection onto Tp S.
If X(t) is a vector field along a curve p(t) on S, then its covariant derivative on
S is
∇X/dt = DX/dt − (p(t) · dX/dt) p(t)
where DX/dt is the covariant derivative on R3 , i.e. the componentwise deriva-
tive. We shall explicitly determine the parallel transport along great circles on
S. Let po ∈ S be a point of S, e ∈ Tp S a unit vector. Thus
e · po = 0, e · e = 1.
The great circle through po in the direction e is the curve p(t) on S given by
p(t) = (cos t)po + (sin t)e.
Define a vector field E along p(t) by E(t) = ṗ(t) = −(sin t)po + (cos t)e. Then
dE/dt = −p(t) is orthogonal to Tp(t) S, so
∇E/dt ≡ 0.
Next choose a vector f ∈ E with
f · po = 0, f · e = 0, f · f = 1.
Then f · p(t) = 0 for any t, so f ∈ Tp(t) S for any t. Define a second vector field
F along p(t) by setting
F (t) = f
for all t. Then again
∇F/dt ≡ 0.
Thus the two vector fields E, F , are parallel along the great circle p(t) and form
an orthonormal basis of the tangent space to S at any point of p(t).
Any tangent vector v ∈ Tpo S can be uniquely written as
v = ae + bf
with a, b ∈ R; note e = E(0), f = F (0). Set X(t) = aE(t) + bF (t) for all t.
Then ∇X/dt ≡ 0. Thus the parallel transport along p(t) is given by
T (0 → t)(ae + bf ) = aE(t) + bF (t).
In other words,
parallel transport along p(t) leaves the components of vectors with respect to the
bases E, F unchanged.
This may also be described geometrically like this:
w = T (0 →t)v has the same length as v and makes the same angle with the
tangent vector ṗ(t) at p(t) as v makes with the tangent vector ṗ(0) at p(0).
(This follows from the fact that E, F are orthonormal and E(t) = ṗ(t).)
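This behaviour is easy to check numerically. The following sketch is not part of the notes: it picks an assumed orthonormal triple po , e, f and verifies by central differences that E(t) = ṗ(t) and F (t) = f have vanishing covariant derivative ∇X/dt = dX/dt − (p · dX/dt)p along the great circle.

```python
import math

# Numerical check (not part of the notes): for the great circle
# p(t) = (cos t) p0 + (sin t) e on the unit sphere S, the fields
# E(t) = p'(t) and F(t) = f should be parallel, i.e. the covariant
# derivative  ∇X/dt = dX/dt - (p · dX/dt) p  should vanish.
# The orthonormal triple p0, e, f below is an arbitrary assumed choice.

p0 = (1.0, 0.0, 0.0)
e  = (0.0, 1.0, 0.0)
f  = (0.0, 0.0, 1.0)

def p(t):
    return tuple(math.cos(t)*a + math.sin(t)*b for a, b in zip(p0, e))

def E(t):  # E(t) = dp/dt
    return tuple(-math.sin(t)*a + math.cos(t)*b for a, b in zip(p0, e))

def F(t):
    return f

def cov_deriv(X, t, h=1e-6):
    # dX/dt by central differences, then subtract the normal component
    dX = tuple((x1 - x0)/(2*h) for x0, x1 in zip(X(t-h), X(t+h)))
    pt = p(t)
    n = sum(a*b for a, b in zip(pt, dX))
    return tuple(d - n*a for d, a in zip(dX, pt))

for t in (0.3, 1.1, 2.5):
    assert all(abs(c) < 1e-5 for c in cov_deriv(E, t))
    assert all(abs(c) < 1e-5 for c in cov_deriv(F, t))
```

Since E, F stay orthonormal, lengths and angles are preserved under T (0 → t), as stated above.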
2.1.13 Corollary. Let p = p(t) be a C ∞ curve in M , v1 , · · · , vn a basis of
Tp(to ) M . Then there are unique parallel vector fields E1 (t), · · · , En (t) along p(t)
with
E1 (to ) = v1 , · · · , En (to ) = vn .
For any value of t, the vectors E1 (t), · · · , En (t) form a basis for Tp(t) M .
Proof. This follows from theorems 2.1.7, 2.1.10, since an invertible linear trans-
formation maps a basis to a basis.
2.1.14 Definition. Let p = p(t) be a C ∞ curve in M . A parallel frame along
p(t) is an n–tuple of parallel vector fields E1 (t), · · · , En (t) along p(t) which form
a basis for Tp(t) M for any value of t.
2.1.15 Theorem. Let p = p(t) be a C ∞ curve in M , E1 (t), · · · , En (t) a parallel
frame along p(t).
(a) Let X(t) = ξ i (t)Ei (t) be a C ∞ vector field along p(t). Then
∇X/dt = (dξ i /dt)Ei (t).
(b) Let v = ai Ei (to ) ∈ Tp(to ) M be a vector at p(to ). Then for any value of t,
T (to → t)v = ai Ei (t).
Proof. (a) Calculate:
∇X/dt = ∇(ξ i Ei )/dt = (dξ i /dt)Ei + ξ i (∇Ei /dt) = (dξ i /dt)Ei + 0
2.1. CONNECTIONS 97
as required.
(b) Applying T (to → t) to v = ai Ei (to ) we get by linearity
T (to → t)v = ai T (to → t)Ei (to ) = ai Ei (t),
as required.
Remarks. This theorem says that with respect to a parallel frame along p(t),
(a) the covariant derivative along p(t) equals the componentwise derivative,
(b) the parallel transport along p(t) equals the transport by constant compo-
nents.
We now turn to covariant differentiation of arbitrary tensor fields.
2.1.16 Theorem. There is a unique operation which produces for every “input“
consisting of (1) a tangent vector v ∈ Tp M at some point p and (2) a C ∞
tensor field T defined in a neighbourhood of p, an “output“ consisting of a tensor
at p of the same type, denoted ∇v T , subject to the following conditions.
0. If X is a vector field, then ∇v X is its covariant derivative with respect to the
given covariant-derivative operation on M .
1. ∇(u+v) T = (∇u T )+ (∇v T )
2. ∇av T =a(∇v T )
3. ∇v (T + S) = (∇v T )+ (∇v S)
4. ∇v (S · T ) = (∇v S) · T (p)+ S(p) · (∇v T )
for all p ∈ M , all u, v ∈ Tp M , all a ∈ R, and all tensor fields S, T defined in a
neighbourhood of p. The products of tensors like S · T are tensor products S ⊗ T
contracted with respect to any collection of indices (possibly not contracted at
all ).
Remark. If X is a vector field and T a tensor field, then ∇X T is a tensor field
of the same type as T . If X = X k ∂k , then ∇X T = X k ∇k T , by rules (1.) and
(2.). So one only needs to know the components (∇k T )(i) (j) of ∇k T . The explicit
formula is easily worked out, but will not be needed.
We shall indicate the proof after looking at some examples.
2.1.17 Examples.
(a) Covariant derivative of a scalar function. Let f be a C ∞ scalar function on
M . For every vector field X we have
∇v (f X) = (∇v f )X(p) + f (p)(∇v X)
by (4). On the other hand, ∇v (f X) is the given covariant derivative of the vector
field f X, by (0.), so by the axiom CD4 we have
∇v (f X) = (Dv f )X(p) + f (p)(∇v X), where Dv f = dfp (v).
Comparing the two expressions gives
∇v f = Dv f.
98 CHAPTER 2. CONNECTIONS AND CURVATURE
Substitute the right side of (1) for the left side of (2):
Since
∇j X = (∂j X i )∂i + X i (∇j ∂i ) = (∂j X k )∂k + X i Γkij ∂k
we get
(∇j X)k = ∂j X k + X i Γkij (5)
Substitute (5) into (4) after changing an index i to k in (4):
(∇j F )i = ∂j Fi − Fk Γkij .
One still has to verify that if one defines ∇v T in this unique way, then 1.–4.
hold, but we shall omit this verification.
EXERCISES 2.1
1. (a) Verify that the covariant derivatives defined in example 2.1.2 above do
indeed satisfy the axioms CD1 - CD5.
(b) Do the same for example 2.1.3.
2. Supply all details in the proof of Theorem 2.1.10: Let T = TC (to → t1 ) :
Tp(to ) M → Tp(t1 ) M . Prove:
(a) T is linear, i.e.
S = {p ∈ R3 | − x2 − y 2 + z 2 = 1}.
(a) Let p ∈ S. Show that every vector v ∈ R3 can be uniquely written as a sum
of a vector tangential to S at p and a vector orthogonal to S at p:
Let po be a point of S, e ∈ Tpo S a unit vector (i.e. (e, e) = −1). Let p(t) be the
curve on S defined by
Find two vector fields E(t), F (t) along p(t) so that parallel transport along p(t)
is given by
7. Let (r, θ) be polar coordinates on R2 :
x = r cos θ, y = r sin θ.
Calculate all Γkij for the covariant derivative on R2 defined in example 2.1.2.
(Use r, θ as indices instead of 1, 2.)
8. Let (ρ, θ, φ) be spherical coordinates on R3 . Calculate all Γkij for the covariant
derivative on R3 defined in example 2.1.2. (Use ρ, θ, φ as indices instead of
1, 2, 3.)
9. Define coordinates (u, v, z) on R3 by
x = (u2 − v 2 )/2, y = uv, z = z.
Calculate Γkij , i, j, k = u, v, z, for the covariant derivative on R3 defined in example
2.1.2. (Use u, v, z as indices instead of 1, 2, 3.)
10. Let S = {p = (x, y, z) ∈ R3 | z = x2 + y 2 } and let ∇ = ∇S be the induced
covariant derivative. Define a coordinate system (r, θ) on S by x = r cos θ,
y = r sin θ, z = r2 . Calculate ∇r ∂r and ∇θ ∂θ .
11. Let S be a surface of revolution, S = {p = (x, y, z) ∈ R3 | r = f (z)}
where r2 = x2 + y 2 and f a positive differentiable function. Let ∇ = ∇S be the
induced covariant derivative (example 2.1.3). Define a coordinate system (z, θ)
on S by x = f (z) cos θ, y = f (z) sin θ, z = z.
a) Show that the coordinate vector fields ∂z = ∂p/∂z and ∂θ = ∂p/∂θ are
orthogonal at each point of S.
b) Let C: p = p(t) be a meridian on S, i.e. a curve with parametric equations
z = t, θ = θo . Show that the vector fields E = k∂θ k−1 ∂θ and F = k∂z k−1 ∂z
form a parallel frame along C.
[Suggestion. Show first that E is parallel along C. To prove that F is also
parallel along C differentiate the equations (F, F ) = 1 and (F, E) = 0 with
respect to t.]
12. Let S be a surface in R3 with its induced connection ∇. Let C: p = p(t)
be a curve on S.
a) Let E(t), F (t) be two vector fields along C. Show that
d(E, F )/dt = (∇E/dt, F ) + (E, ∇F/dt).
b) Show that parallel transport along C preserves the scalar products of vectors,
i.e.
(TC (to → t1 )u, TC (to → t1 )v) = (u, v)
for all u, v ∈ Tp(to ) S.
c) Show that there exist orthonormal parallel frames E1 , E2 along C.
13. Let T be a tensor field of type (1, 1). Derive a formula for (∇k T )ij .
[Suggestion: Write T = Tba (∂/∂xa ) ⊗ dxb and use the appropriate rules 0.–4.]
2.2 Geodesics
2.2.1 Definition. Let M be a manifold with a connection ∇. A C ∞ curve
p = p(t) on M is said to be a geodesic (of the connection ∇) if
∇ṗ/dt = 0, (1)
i.e. the tangent vector field ṗ(t) is parallel along p(t). We write the equation
(1) also as
∇2 p/dt2 = 0.
In terms of the coordinates (x1 (t), · · · , xn (t)) of p = p(t) relative to a coordinate
system x1 , · · · , xn the equation (1) becomes
d2 xk /dt2 + Σij Γkij (dxi /dt)(dxj /dt) = 0.
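The geodesic equation can be integrated numerically. The sketch below (not from the notes) assumes the polar-coordinate Christoffel symbols of the flat connection on R2 , Γrθθ = −r, Γθrθ = Γθθr = 1/r, integrates the resulting system with a Runge–Kutta step, and checks that the solution is a straight line, as example 2.2.2 predicts.

```python
import math

# Sketch (assumption: flat connection on R^2 in polar coordinates, with
# Γ^r_θθ = -r, Γ^θ_rθ = Γ^θ_θr = 1/r).  The geodesic equation
#   d²x^k/dt² + Σ_ij Γ^k_ij (dx^i/dt)(dx^j/dt) = 0
# becomes  r'' = r θ'²,  θ'' = -(2/r) r' θ'.  Its solutions must be
# straight lines; we integrate one and check that x = r cos θ stays 1.

def rhs(state):
    r, th, dr, dth = state
    return (dr, dth, r * dth**2, -2.0 * dr * dth / r)

def rk4_step(state, h):
    def add(s, k, c):
        return tuple(a + c * b for a, b in zip(s, k))
    k1 = rhs(state)
    k2 = rhs(add(state, k1, h / 2))
    k3 = rhs(add(state, k2, h / 2))
    k4 = rhs(add(state, k3, h))
    return tuple(a + h / 6 * (b + 2*c + 2*d + e)
                 for a, b, c, d, e in zip(state, k1, k2, k3, k4))

state = (1.0, 0.0, 0.0, 1.0)   # start at (r, θ) = (1, 0) with ṙ = 0, θ̇ = 1
h = 1e-3
for _ in range(500):           # integrate to t = 0.5
    state = rk4_step(state, h)

# the exact geodesic is the vertical line x = 1, y = t
assert abs(state[0] * math.cos(state[1]) - 1.0) < 1e-6
```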
2.2.2 Example. Let M = Rn with its usual covariant derivative. For p(t) =
(x1 (t), · · · , xn (t)) the equation
∇2 p/dt2 = 0
says that ẍi (t) = 0 for all t, so xi (t) = ai + tbi and x(t) = a + tb is a straight
line.
2.2.3 Example. Let S = {p ∈ E : kpk = 1} be the unit sphere in a Euclidean
3-space E = R3 . Let c be the great circle through p ∈ S with direction vector
e:
p(t) = (cos t)p + (sin t)e
where
e · p = 0, e · e = 1.
We have shown in §2.1 that ∇2 p/dt2 = 0. Thus each great circle is a geodesic.
The same is true if we change the parametrization by some constant factor, say
α:
p(t) = (cos αt)p + (sin αt)e. (2)
It will follow from the theorem below that every geodesic is of this form.
2.2.4 Theorem. Let M be a manifold with a connection ∇. Let p ∈ M be
a point in M , v ∈ Tp M a vector at p. Then there is a unique geodesic p(t),
defined on some interval around t= 0, so that
p(0) = p, ṗ(0) = v.
Proof. In terms of a coordinate system (xi ) around p the conditions on p(t) read
d2 xk /dt2 + Σij Γkij (dxi /dt)(dxj /dt) = 0 (3)
xk (0) = xk0 , dxk (0)/dt = v k (4)
where (xk0 ) are the coordinates of p and (v k ) the components of v.
The system of differential equations (3) has a unique solution subject to the
initial condition (4).
2.2.5 Definition. Fix po ∈ M . A coordinate system (x̃i ) around po is said to
be a local inertial frame (or geodesic coordinate system) at po if all Γ̃kij vanish
at po , i.e.
∇∂/∂ x̃j ∂/∂ x̃i = 0 at po , for all i, j.
2.2.6 Theorem. There exists a local inertial frame (x̃i ) around po if and only
if the Γkij with respect to an arbitrary coordinate system (xi ) satisfy Γkij = Γkji
at po , i.e.
∇∂/∂xi ∂/∂xj = ∇∂/∂xj ∂/∂xi at po (all i, j).
Proof. We shall need the transformation formula for the Γkij :
∂xi /∂ x̃j = δ ij − (1/2)(γ ijs x̃s + γ irj x̃r )
By the inverse function theorem, the equations (6) define coordinates (x̃i )
around po with x̃i (po ) = 0. By (5), the Γ̃trs at po with respect to (x̃i ) vanish,
as required.
2.2.7 Remark. A local inertial frame (x̃i ) at po is not unique. It is unique only
up to a coordinate transformation of the form
xi − xio = aij (x̃j − x̃jo ) + o(2)
where o(2) denotes some function of the x̃i which vanishes to second order at
(x̃io ), i.e. whose partials of order ≤ 1 all vanish at (x̃io ). (aij ) can be any matrix
with det(aij ) 6= 0 and (xio ) any n–tuple of real numbers.
2.2.8 Definition. The covariant derivative ∇ is said to be symmetric if the
condition of the theorem holds at all points po , i.e. if Γkij ≡ Γkji in some (hence
every) coordinate system (xi ).
2.2.9 Lemma. Fix po ∈ M . For v ∈ Tpo M , let pv (t) be the unique geodesic with
pv (0) = po , p0v (0) = v. Then pv (at) = pav (t) for all a, t ∈ R.
Proof. Calculate:
(d/dt) pv (at) = a (dpv /dt)(at),
(∇/dt)(d/dt) pv (at) = a2 (∇/dt)(dpv /dt)(at) = 0,
(d/dt) pv (at)|t=0 = a (dpv /dt)(0) = av.
Thus t → pv (at) is a geodesic with initial point po and initial velocity av.
Hence pv (at) = pav (t).
2.2.10 Corollary. Fix p ∈ M . There is a unique map Tp M → M which sends
the line tv in Tp M to the geodesic pv (t) with initial velocity v.
Proof. This map is given by v → pv (1): it follows from the lemma that
pv (t) = ptv (1), so this map does send the line tv to the geodesic pv (t).
Remark. This map Tp M → M is of class C ∞ . This follows from a theorem
on differential equations, which guarantees that the solution xk = xk (t) of (3)
depends in a C ∞ fashion on the initial conditions (4).
2.2.11 Definition. The map Tp M → M of the previous corollary is called the
geodesic spray or exponential map of the connection at p, denoted v → expp v.
2.2.12 Theorem. For any p ∈ M the geodesic spray expp : Tp M → M maps a
neighbourhood of 0 in Tp M one-to-one onto a neighbourhood of p in M .
In these coordinates the geodesic pv (t) with initial velocity v = ξ i (∂i )po has
coordinates
xi = tξ i .
Since these xi = xi (t) are the coordinates of the geodesic pv (t) we get
d2 xk /dt2 + Σij Γkij (dxi /dt)(dxj /dt) = 0 (for all k).
This gives
Σij Γkij ξ i ξ j = 0
as required.
EXERCISES 2.2
1. Use theorem 2.2.4 to prove that every geodesic on the sphere S of example
2.2.3 is a great circle as in equation (2).
2. Referring to exercise 7 of §2.1, show that for each constant α the curve
is a geodesic on S.
3. Prove the transformation formula (5) for the Γkij from the definition of the
Γkij .
4. Suppose (x̃i ) is a local inertial frame at po . Let (xi ) be a coordinate system
so that
xi − xio = aij (x̃j − x̃jo ) + o(2)
as in remark 2.2.7. Prove that (xi ) is also a local inertial frame at po . (It can
be shown that any two local inertial frames at po are related by a coordinate
transformation of this type.)
5. Let (x̃i ) be a coordinate system on Rn so that Γ̃trs = 0 at all points of
Rn . Show that (x̃i ) is an affine coordinate system, i.e. (x̃i ) is related to the
Cartesian coordinate system (xi ) by an affine transformation:
2.3.2 Theorem. Let po be a point of M , u, v ∈ Tpo M two vectors at po . Let
p = p(s, t) be a surface so that p(so , to ) = po , (∂p/∂s)(so ,to ) = u, (∂p/∂t)(so ,to ) = v.
Here ∂p/∂s, ∂p/∂t are the tangent vector fields along the surface p = p(s, t), and
the covariant derivatives along these vector fields are denoted ∇/∂s, ∇/∂t.
The vector
T (u, v) = (∇/∂t ∂p/∂s − ∇/∂s ∂p/∂t)(so ,to )
depends only on u and v. Relative to a coordinate system (xi ) around po this
vector is given by
T (u, v) = Tijk ui v j (∂/∂xk )po
where Tijk = Γkij − Γkji is a tensor of type (1, 2), called the torsion tensor of the
covariant derivative ∇.
Proof. Write p = p(s, t) in the coordinates (xi ) as xi = xi (s, t). Calculate:
∂p/∂s = (∂xi /∂s) ∂/∂xi ,
∇/∂t ∂p/∂s = ∇/∂t [(∂xi /∂s) ∂/∂xi ]
= (∂ 2 xi /∂t∂s) ∂/∂xi + (∂xi /∂s) ∇/∂t ∂/∂xi
2.3. RIEMANN CURVATURE 109
= (∂ 2 xi /∂t∂s) ∂/∂xi + (∂xi /∂s)(∂xj /∂t) ∇∂/∂xj ∂/∂xi .
Interchange s and t and subtract to get
∇/∂t ∂p/∂s − ∇/∂s ∂p/∂t = (∂xi /∂s)(∂xj /∂t)(∇∂/∂xj ∂/∂xi − ∇∂/∂xi ∂/∂xj ).
Substitute
∇∂/∂xj ∂/∂xi = Γkij ∂/∂xk
to get
∇/∂t ∂p/∂s − ∇/∂s ∂p/∂t = (∂xi /∂s)(∂xj /∂t)(Γkij − Γkji ) ∂/∂xk .
For s = so , t = to this becomes
(∇/∂t ∂p/∂s − ∇/∂s ∂p/∂t)(so ,to ) = (Γkij − Γkji ) ui v j (∂/∂xk )po .
Read from left to right this equation shows first of all that the left side depends
only on u and v. Read from right to left the equation shows that the right side
is independent of the coordinates, hence defines a tensor Tijk = Γkij − Γkji .
2.3.3 Theorem. Let po be a point of M , u, v, w ∈ Tpo M three vectors at po . Let
p = p(s, t) be a surface so that p(so , to ) = po ,
(∂p/∂s)(so ,to ) = u, (∂p/∂t)(so ,to ) = v,
and let X = X(s, t) be a C ∞ vector field along p(s, t) with X(so , to ) = w. The
vector
R(u, v)w = (∇/∂s ∇X/∂t − ∇/∂t ∇X/∂s)(so ,to )
depends only on u, v, w. Relative to a coordinate system (xi ) around po this vector
is given by
R(u, v)w = Ri mkl uk v l wm (∂i )po
where ∂i = ∂/∂xi and
Ri mkl = ∂l Γimk − ∂k Γiml + Γpmk Γipl − Γpml Γipk . (R)
R = (Ri mkl ) is a tensor of type (1, 3), called the curvature tensor of the covariant
derivative ∇.
Proof. Exercise.
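Formula (R) is easy to evaluate numerically. The following sketch (not from the notes) computes the components by central differences for two assumed sets of Christoffel symbols: the flat connection on R2 in polar coordinates, where all components must vanish, and the unit sphere in coordinates (θ, φ), where |Rθ φθφ | = sin2 θ (the overall sign depends on the index convention in (R)).

```python
import math

# Numeric sketch (not from the notes): evaluate
#   R^i_mkl = ∂_l Γ^i_mk - ∂_k Γ^i_ml + Γ^p_mk Γ^i_pl - Γ^p_ml Γ^i_pk
# by central differences.  Coordinates are indexed 0, 1.
#  * flat R^2 in polar coordinates (r, θ): all components must vanish;
#  * the unit sphere in coordinates (θ, φ), with the standard symbols
#    Γ^θ_φφ = -sin θ cos θ, Γ^φ_θφ = Γ^φ_φθ = cos θ / sin θ,
#    where |R^θ_φθφ| must equal sin²θ.

def curvature(Gamma, i, m, k, l, x, h=1e-5):
    def d(idx, f):  # ∂f/∂x^idx by central differences
        xp, xm = list(x), list(x)
        xp[idx] += h
        xm[idx] -= h
        return (f(xp) - f(xm)) / (2 * h)
    val = d(l, lambda y: Gamma(i, m, k, y)) - d(k, lambda y: Gamma(i, m, l, y))
    for p in range(len(x)):
        val += Gamma(p, m, k, x) * Gamma(i, p, l, x) \
             - Gamma(p, m, l, x) * Gamma(i, p, k, x)
    return val

def G_polar(i, m, k, x):   # flat connection, coordinates (r, θ)
    r = x[0]
    if i == 0 and m == 1 and k == 1:
        return -r
    if i == 1 and {m, k} == {0, 1}:
        return 1.0 / r
    return 0.0

def G_sphere(i, m, k, x):  # unit sphere, coordinates (θ, φ)
    th = x[0]
    if i == 0 and m == 1 and k == 1:
        return -math.sin(th) * math.cos(th)
    if i == 1 and {m, k} == {0, 1}:
        return math.cos(th) / math.sin(th)
    return 0.0

x = [1.3, 0.7]
flat = max(abs(curvature(G_polar, i, m, k, l, x))
           for i in range(2) for m in range(2)
           for k in range(2) for l in range(2))
assert flat < 1e-8
assert abs(abs(curvature(G_sphere, 0, 1, 0, 1, x)) - math.sin(x[0])**2) < 1e-8
```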
2.3.4 Lemma. For any vector field X(t) along any curve p = p(t),
∇X(t)/dt = limε→0 (1/ε)[X(t + ε) − T (t → t + ε)X(t)],
or equivalently
∇X(to )/dt = limt→to (1/(t − to ))[X(t) − T (to → t)X(to )].
Proof. Recall the formula for the covariant derivative in terms of a parallel
frame E1 (t), · · · , En (t) along p(t) (theorem 2.1.15): if X(t) = ξ i (t)Ei (t), then
∇X/dt = (dξ i /dt)Ei (t)
= limε→0 (1/ε)[ξ i (t + ε) − ξ i (t)]Ei (t)
= limε→0 (1/ε)[ξ i (t + ε) − ξ i (t)]Ei (t + ε)
= limε→0 (1/ε)[ξ i (t + ε)Ei (t + ε) − ξ i (t)Ei (t + ε)]
= limε→0 (1/ε)[X(t + ε) − T (t → t + ε)X(t)],
the last equality coming from the fact that the components with respect to a
parallel frame remain unchanged under parallel transport.
We now give a geometric interpretation of R.
2.3.5 Theorem. Let po be a point of M , u, v, w ∈ Tpo M three vectors at po .
Let p = p(s, t) be a surface so that p(so , to ) = po ,
(∂p/∂s)(so ,to ) = u, (∂p/∂t)(so ,to ) = v.
Let T = T (s, t) denote parallel transport around the “parallelogram“ with vertices
p(so , to ), p(s, to ), p(s, t), p(so , t), taken in this order. Then
lim∆s,∆t→0 (1/∆s∆t)(w − T w) = R(u, v)w, ∆s = s − so , ∆t = t − to . (1)
[Figure: the vector w at p(so , to ) and its transport T w around the parallelogram.]
Proof. Write T as the composition of parallel transports along the four sides:
T = T (so , t → to )T (s → so , t)T (s, to → t)T (so → s, to ). (2)
Let X = X(s, t) be a C ∞ vector field along p(s, t) with X(so , to ) = w.
Imagine this expression substituted for T in the limit on the right-hand side of
(1). Factor out the first two terms of (2) from the parentheses in (1) to get
lim∆s,∆t→0 (1/∆s∆t)(w − T w) =
= lim∆s,∆t→0 (1/∆s∆t) T (so , t → to )T (s → so , t) × (3)
× {T (so → s, t)T (so , to → t)w − T (s, to → t)T (so → s, to )w}.
T (so → s, t)[T (so , to → t)X(so , to ) − X(so , t)] + [(T (so → s, t)X(so , t) − X(s, t)]
−T (s, to → t)[T (so → s, to )X(so , to )−X(s, to )]−[T (s, to → t)X(s, to )−X(s, t)].
Multiply through by 1/∆s∆t and take the appropriate limits of the expression
in the brackets using (4) to get:
lim∆s,∆t→0 (1/∆s∆t)(w − T w)
= limit of −(1/∆s) T (so → s, t)(∇X/∂t)(so ,to ) − (1/∆t)(∇X/∂s)(so ,t)
+ (1/∆t) T (s, to → t)(∇X/∂s)(so ,to ) + (1/∆s)(∇X/∂t)(s,to )
= limit of (1/∆s){(∇X/∂t)(s,to ) − T (so → s, t)(∇X/∂t)(so ,to )}
+ (1/∆t){T (s, to → t)(∇X/∂s)(so ,to ) − (∇X/∂s)(so ,t)}
= (∇/∂s ∇X/∂t − ∇/∂t ∇X/∂s)(so ,to )
= R(u, v)w.
x̃j = x̃jo + cji xi
EXERCISES 2.3
1. Let ∇ be a covariant derivative on M , T its torsion tensor.
(a) Show that the equation
′∇v X = ∇v X + (1/2) T (v, X)
defines another covariant derivative ′∇ on M . [Verify at least CD4.]
(b) Show that ′∇ is symmetric, i.e. the torsion tensor ′T of ′∇ is ′T = 0.
(c) Show that ∇ and ′∇ have the same geodesics, i.e. a C ∞ curve p = p(t) is a
geodesic for ∇ if and only if it is a geodesic for ′∇.
2. Prove theorem 2.3.3. [Suggestion: use the proof of theorem 2.3.2 as pattern.]
3. Take for granted the existence of a tensor R satisfying
R(u, v)w = (∇/∂s ∇X/∂t − ∇/∂t ∇X/∂s)(so ,to ) , R(u, v)w = Ri mkl uk v l wm (∂/∂xi )po .
2.4.1 Lemma.
a) As a subspace of R3 ,
Tp S = N (p)⊥ = TN (p) S 2
where N (p)⊥ = {u ∈ R3 | (u, N (p)) = 0} is the subspace of R3 orthogonal
to N (p).
b) If u ∈ Tp S, then
(N (p), dNp (u)) = 0.
c) For any u, v ∈ Tp S, (dNp (u), v) = (u, dNp (v)).
∇/∂t ∇W/∂s = tang. component of ∂ 2 W/∂t∂s − ∂/∂t(∂W/∂s, N ) N − (∂W/∂s, N ) ∂N/∂t.
The second term may be omitted because it is orthogonal to S. The last term
is already tangential to S, because ∂N/∂t is orthogonal to N , by part (b) of the
lemma. Thus
∇/∂t ∇W/∂s = tang. comp. of ∂ 2 W/∂t∂s − (∂W/∂s, N ) ∂N/∂t.
In this equation, interchange s and t and subtract to find
R(∂p/∂s, ∂p/∂t)W = (∂W/∂t, N ) ∂N/∂s − (∂W/∂s, N ) ∂N/∂t.
These formulas can be rewritten as follows. Since (W, N ) = 0 one finds by
differentiation that
(∂W/∂t, N ) + (W, ∂N/∂t) = 0,
and similarly with t replaced by s. Hence
R(∂p/∂s, ∂p/∂t)W = (W, ∂N/∂s) ∂N/∂t − (W, ∂N/∂t) ∂N/∂s.
If we consider p → N (p) as a function on S with values in S 2 ⊂ R3 , and write
u, v, w for three tangent vectors at p on S, this equation becomes
R(u, v)w = (w, dN (u)) dN (v) − (w, dN (v)) dN (u).
Hence, for any fourth tangent vector z,
(R(u, v)w, z) = (w, dN (u))(dN (v), z) − (w, dN (v))(dN (u), z). (3)
Since dN is a linear transformation of the 2-dimensional space Tp S, equation
(3) can be rewritten as
(R(u, v)w, z) = K{(w, u)(v, z) − (w, v)(u, z)}, where K = det dN. (5)
This scalar function K = K(p) is called the Gauss curvature of S. The equation
(5) shows that for a surface S the curvature tensor R is essentially determined
by K. In connection with this formula it is important to remember that N
is to be considered as a map N : S → S 2 and dN as a linear transformation
dN : Tp S → TN (p) S 2 between 2-dimensional spaces, both of which may be
identified with the subspace N (p)⊥ of R3 orthogonal to N (p).
One can give a geometric interpretation of K as follows. In (5), set w = u and
z = v to find
(R(u, v)u, v) = K{(u, u)(v, v) − (u, v)2 }.
Since K = det dN , |K| is the factor by which the Gauss map N : S → S 2 scales
areas: a small region of S around p is mapped by N to a region of S 2 of
approximately |K(p)| times the area. Geometrically,
[Fig. 2: Large curvature, large Gauss image; small curvature, small Gauss image.]
For a unit vector u ∈ Tp S, let p = p(s) be a curve on S through p with unit
tangent vector u = dp/ds, and set
k(u) = (dN/ds, u) = (dNp (u), u).
The right side is the tangential component of the vectorial rate of change of N
along the curve p = p(s), hence k(u) does measure the “curvature of S in the
direction u“.
2.4.2 Lemma. The principal curvatures k1 , k2 are the maximum and the
minimum values of k(u) as u varies over all unit vectors in Tp S.
Proof. For fixed p = po on S, consider k(u) = (dNpo (u), u) as a function on the
unit circle (u, u) = 1 in Tpo S. Suppose u = uo is a minimum or a maximum of
k(u). Then for any parametrization u = u(t) of (u, u) = 1 with u(to ) = uo and
u′ (to ) = vo :
d
0= (dNpo (u), u) = (dNpo (vo ), uo ) + (dNpo (uo ), vo ) = 2(dNpo (uo ), vo ).
dt t=to
Hence dNpo (uo ) must be orthogonal to the tangent vector vo to the circle at uo ,
i.e. dNpo (uo ) = ko uo for some scalar ko .
Let C : p = p(s) be a curve in R3 parametrized by arc length, and define
T = dp/ds, N = (dT /ds)/kdT /dsk, B = T × N.
These three vector fields along p = p(s) form what is called the Frenet frame
along C. They satisfy the Frenet formulas
dT /ds = κN , dN/ds = −κT + τ B, dB/ds = −τ N
where κ, τ are scalar functions, which are uniquely defined by these formulas,
called curvature and torsion of the curve C. (You can consult your calculus
text for more details).
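For a concrete curve, the classical formulas κ = kp′ × p′′ k/kp′ k3 and τ = ((p′ × p′′ ) · p′′′ )/kp′ × p′′ k2 , which are equivalent to the Frenet definitions above, can be evaluated directly. The sketch below (not from the notes) does this for a unit-speed helix, whose curvature and torsion are κ = a/c2 , τ = b/c2 with c2 = a2 + b2 .

```python
import math

# Sketch (not from the notes): curvature and torsion of the helix
#   p(s) = (a cos(s/c), a sin(s/c), b s/c),  c² = a² + b²,
# which is parametrized by arc length, via the classical formulas
#   κ = |p'×p''| / |p'|³,   τ = (p'×p'') · p''' / |p'×p''|².

a, b = 1.0, 0.5
c = math.sqrt(a*a + b*b)

def cross(u, v):
    return (u[1]*v[2]-u[2]*v[1], u[2]*v[0]-u[0]*v[2], u[0]*v[1]-u[1]*v[0])

def dot(u, v):
    return sum(x*y for x, y in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def dp(s):    # p'(s) in closed form
    return (-a/c*math.sin(s/c), a/c*math.cos(s/c), b/c)

def ddp(s):   # p''(s)
    return (-a/c**2*math.cos(s/c), -a/c**2*math.sin(s/c), 0.0)

def dddp(s):  # p'''(s)
    return (a/c**3*math.sin(s/c), -a/c**3*math.cos(s/c), 0.0)

s0 = 0.7
v, w = dp(s0), ddp(s0)
kappa = norm(cross(v, w)) / norm(v)**3
tau = dot(cross(v, w), dddp(s0)) / norm(cross(v, w))**2
assert abs(kappa - a/c**2) < 1e-12
assert abs(tau - b/c**2) < 1e-12
```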
Take in particular the case of a plane curve C : p = p(s) in R2 . In that case
we have dN/ds = −κT , hence κ = ±|dN/ds|. This equation is entirely
analogous to (6’).
From this point of view it is very surprising that there is a fundamental difference
between K and κ: K is independent of the embedding of the surface S in R3 , but
κ is very much dependent on the embedding of the curve in R2 . The curvature
κ of a curve C should therefore not be confused with the Riemann curvature
tensor R of C as 1-dimensional Riemann manifold, which would be zero. For
emphasis we may call κ the space curvature of C, because it does depend on
the embedding of C in R2 or R3 , not just on the Riemann metric of C. In the
same way the torsion τ of a space curve C has nothing to do with the torsion
tensor.
Nevertheless, there is a relation between the Riemann or Gauss curvature of a
surface S and the space curvature of curves C on it. From the above formulas
we find that generally
κ = −(dN/ds, T ).
2.4. GAUSS CURVATURE 119
From this it follows that the curvature k(u) of a surface S in the direction u ∈
Tp S is (up to a sign) the same as the curvature at p of the curve of intersection
of S with the plane through p spanned by u and the normal vector N (p). These
curves are called normal sections of S and for this reason k(u) is also called the
normal curvature of S in the direction u ∈ Tp S.
2.4.4 Remarks on the computation of the Gauss curvature. For the
purpose of computation one can start with the formula
K = det dN = det(dN (vi ), vj )/ det(vi , vj ) (8)
which holds for any two v1 , v2 ∈ Tp S for which the denominator is nonzero. If
M is any nonzero normal vector, then we can write N = µM where µ = kM k−1 .
Since dN = µ(dM ) + (dµ)M formula (8) gives
K = det(dM (vi ), vj )/[(M, M ) det(vi , vj )] (9)
Take M = p1 ×p2 , the cross product. Since kp1 ×p2 k2 = det(pi , pj ), the equation
(11) can be written as
K = det(pij , p1 × p2 )/(det(pi , pj ))2 . (12)
This can be expressed in terms of the fundamental forms as follows. Write
ds2 = gij dxi dxj , Π = Πij dxi dxj .
Then
Πij = Π(pi , pj ) = −(dN (pi ), pj ) = (pij , p1 × p2 )/kp1 × p2 k, gij = (pi , pj ).
Hence (12) says
K = det Πij / det gij .
For the sphere of radius r we have N = p/r, so dN (pi ) = r−1 pi and (8) gives
K = det(r−1 pi , pj )/ det(pi , pj ) = r−2 .
For the paraboloid z = ax2 + by 2 , parametrized as p(x, y) = (x, y, ax2 + by 2 ),
formula (12) gives
K = [1/(4a2 x2 + 4b2 y 2 + 1)] · [4ab − 0]/[(1 + (2ax)2 )(1 + (2by)2 ) − (2ax · 2by)2 ]
= 4ab/(4a2 x2 + 4b2 y 2 + 1)2 .
Concerning the sign of K, note that
K > 0 if ab > 0 (elliptic paraboloid: cup or cap shaped),
K < 0 if ab < 0 (hyperbolic paraboloid: saddle shaped),
K = 0 if ab = 0 (parabolic cylinder, plane, or empty: degenerate cases).
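Formula (12) can be checked numerically against the closed form just computed. The sketch below (not from the notes) evaluates the partial derivatives pi , pij of the paraboloid parametrization by central differences.

```python
# Numeric sketch (not from the notes): evaluate formula (12),
#   K = det(p_ij, p1 × p2) / (det(p_i, p_j))²,
# by central differences for the paraboloid p(x, y) = (x, y, a x² + b y²)
# and compare with the closed form K = 4ab / (4a²x² + 4b²y² + 1)².

a, b = 1.5, -0.5

def p(x, y):
    return (x, y, a*x*x + b*y*y)

def dot(u, v):
    return sum(s*t for s, t in zip(u, v))

def cross(u, v):
    return (u[1]*v[2]-u[2]*v[1], u[2]*v[0]-u[0]*v[2], u[0]*v[1]-u[1]*v[0])

def partial(f, i, x, y, h=1e-4):
    dx, dy = (h, 0) if i == 0 else (0, h)
    return tuple((s - t)/(2*h) for s, t in zip(f(x+dx, y+dy), f(x-dx, y-dy)))

x0, y0 = 0.3, -0.8
p1 = partial(p, 0, x0, y0)
p2 = partial(p, 1, x0, y0)
pij = [[partial(lambda u, v: partial(p, j, u, v), i, x0, y0)
        for j in range(2)] for i in range(2)]

M = cross(p1, p2)
num = (dot(pij[0][0], M)*dot(pij[1][1], M)
       - dot(pij[0][1], M)*dot(pij[1][0], M))      # det(p_ij, p1×p2)
den = dot(p1, p1)*dot(p2, p2) - dot(p1, p2)**2     # det(p_i, p_j)
K = num / den**2
K_exact = 4*a*b / (4*a*a*x0*x0 + 4*b*b*y0*y0 + 1)**2
assert abs(K - K_exact) < 1e-6
```

Here ab < 0, so K comes out negative, consistent with the saddle case above.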
EXERCISES 2.4
In the following problems S is a surface in R3 , (s, t) a coordinate system on S,
N a unit-normal.
1. a) Let (x1 , x2 ) be a coordinate system on S. Show that the only covariant
components Rimkl = gij Rj mkl of the Riemann curvature tensor R are
where κ and τ are the curvature and the torsion of the curve C.
7. Calculate the first fundamental form ds2 , the second fundamental form Π,
and the Gauss curvature K for the following surfaces S: p = p(s, t) using the
given parameters s, t as coordinates. The letters a, b, c denote positive constants.
a) Ellipsoid of revolution: x = a cos s cos t, y = a cos s sin t, z = c sin s.
b) Hyperboloid of revolution of one sheet: x = a cosh s cos t, y = a cosh s sin t,
z = c sinh s.
c) Hyperboloid of revolution of two sheets: x = a sinh s cos t, y = a sinh s sin t,
z = c cosh s.
d) Paraboloid of revolution: x = s cos t, y = s sin t, z = s2 .
e) Circular cylinder: x = R cos t, y = R sin t, z = s.
f) Circular cone without vertex: x = s cos t, y = s sin t, z = as (s 6= 0).
g) Torus: x = (a + b cos s) cos t, y = (a + b cos s) sin t, z = b sin s.
h) Catenoid: x = cosh(s/a) cos t, y = cosh(s/a) sin t, z = s.
i) Helicoid: x = s cos t, y = s sin t, z = at.
8. Find the principal curvatures k1 , k2 at the points (±a, 0, 0) of the hyperboloid
of two sheets
x2 /a2 − y 2 /b2 − z 2 /c2 = 1.
2.5. LEVI-CIVITA’S CONNECTION 123
(d/dt) g(X, Y ) = g(∇X/dt, Y ) + g(X, ∇Y /dt). (2)
Proof. We have to show (1) ⇔ (2).
(⇒) Assume (1) holds. Let E1 (t), · · · , En (t) be a parallel frame along p(t).
Write
g(Ei (to ), Ej (to )) = cij .
Since
Ei (t) = Tc (to → t)Ei (to )
(1) gives
g(Ei (t), Ej (t)) = cij
for all t. Write
X(t) = X i (t)Ei (t), Y (t) = Y i (t)Ei (t).
Then
g(X, Y ) = Σij X i Y j g(Ei , Ej ) = Σij cij X i Y j .
So
(d/dt) g(X, Y ) = Σij cij (dX i /dt)Y j + Σij cij X i (dY j /dt)
= g(∇X/dt, Y ) + g(X, ∇Y /dt).
This proves (2).
(⇐) Assume (2) holds. Let c: p = p(t) be any curve, po = p(to ), and u, v ∈
Tp(to ) M two vectors. Let
X(t) = T (to → t)u, Y (t) = T (to → t)v.
Then
∇X/dt = 0, ∇Y /dt = 0.
Thus by (2)
(d/dt) g(X, Y ) = g(∇X/dt, Y ) + g(X, ∇Y /dt) = 0.
Consequently
g(X, Y ) = constant.
If we compare this constant at t = to and at t = t1 we get
g(T (to → t1 )u, T (to → t1 )v) = g(u, v),
i.e. (1) holds.
Γk,ji = gkl Γlji . (5)
Since ∇ is symmetric
Γlji = Γlij
and therefore
Γk,ji = Γk,ij . (6)
Equations (4′ ) and (6) may be solved for the Γ’s as follows. Write out the
equations corresponding to (4′ ) with the cyclic permutation of the indices: (ijk),
(kij), (jki). Add the last two and subtract the first. This gives (together with
(6)):
gki,j + gij,k − gjk,i = (Γi,kj + Γk,ij ) + (Γj,ik + Γi,jk ) − (Γk,ji + Γj,ki ) = 2Γi,jk .
Therefore
Γi,kj = (1/2)(gki,j + gij,k − gjk,i ). (7)
Use the inverse matrix (g ab ) of (gij ) to solve (5) and (7) for Γljk :
Γljk = (1/2) g li (gki,j + gij,k − gjk,i ). (8)
This proves that the Γ’s, and therefore the connection ∇ are uniquely deter-
mined by the gij , i.e. by the metric g.
(Existence). Given g, choose a coordinate system (xi ) and define ∇ to be the
(symmetric) connection which has Γljk given by (8) relative to this coordinate
system. If one defines Γk,ji by (5) one checks that (4′ ) and therefore (4) holds.
This implies that (3) holds, since any vector field is a linear combination of the
∂i . So ∇ is compatible with the metric g. That ∇ is symmetric is clear from
(8).
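Christoffel's formula (8) lends itself to direct computation. The sketch below (not from the notes) applies it, with the metric derivatives gki,j taken by central differences, to the flat metric ds2 = dr2 + r2 dθ2 on R2 in polar coordinates, where the expected nonzero symbols are Γrθθ = −r and Γθrθ = Γθθr = 1/r.

```python
# Sketch (not from the notes): Christoffel's formula (8),
#   Γ^l_jk = (1/2) g^{li} (g_{ki,j} + g_{ij,k} - g_{jk,i}),
# for the flat metric ds² = dr² + r² dθ² in polar coordinates,
# with indices 0 = r, 1 = θ and derivatives by central differences.

def g(x):      # metric matrix at x = (r, θ)
    r = x[0]
    return [[1.0, 0.0], [0.0, r*r]]

def ginv(x):   # inverse metric
    r = x[0]
    return [[1.0, 0.0], [0.0, 1.0/(r*r)]]

def dg(j, k, i, x, h=1e-6):   # g_{jk,i} = ∂g_jk / ∂x^i
    xp, xm = list(x), list(x)
    xp[i] += h
    xm[i] -= h
    return (g(xp)[j][k] - g(xm)[j][k]) / (2*h)

def Gamma(l, j, k, x):
    return 0.5 * sum(ginv(x)[l][i] *
                     (dg(k, i, j, x) + dg(i, j, k, x) - dg(j, k, i, x))
                     for i in range(2))

x = [2.0, 0.5]
assert abs(Gamma(0, 1, 1, x) - (-x[0])) < 1e-6    # Γ^r_θθ = -r
assert abs(Gamma(1, 0, 1, x) - 1.0/x[0]) < 1e-6   # Γ^θ_rθ = 1/r
```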
2.5.6 Definitions. (a) The unique connection ∇ compatible with a given
Riemann metric g is called the Levi-Civita connection of g.
(b) The Γk,ij and the Γljk defined by (7) and (8) are called Christoffel symbols
of the first and second kind, respectively. Sometimes the following notation is
used (sometimes with other conventions concerning the positions of the entries
in the symbols):
[jk, i] = Γi,kj = (1/2)(gki,j + gij,k − gjk,i ), (9)
{l jk} = Γljk = (1/2) g li (gki,j + gij,k − gjk,i ). (10)
These equations are called Christoffel’s formulas.
for some constants x̃jo , cji with Σlk glk cli ckj = gij .
Proof omitted.
Geodesics of the Levi-Civita connection. We have two notions of geodesic
on M : (1) the geodesics of the Riemann metric g, defined as “shortest lines”,
characterized by the differential equation
d2 xk /dt2 + (1/2)g kl (gjl,i + gli,j − gij,l )(dxi /dt)(dxj /dt) = 0, (11)
and (2) the geodesics of the Levi-Civita connection ∇ of g, characterized by
d2 xk /dt2 + Γkij (dxi /dt)(dxj /dt) = 0 (for all k). (12)
The following theorem says that these two notions of “geodesic“ coincide, so
that we can simply speak of “geodesics”.
2.5.8 Theorem. The geodesics of the Riemann metric are the same as the
geodesics of the Levi-Civita connection.
Proof. We have to show that (11) is equivalent to (12) if we set
Γkij = (1/2)g kl (gjl,i + gli,j − gij,l ).
This follows by direct calculation.
We know from Proposition 5.9 that geodesics of the metric have constant
speed, hence so do the geodesics of the Levi-Civita connection; but this is also
clear directly, since the velocity vector of a geodesic is parallel and parallel
transport preserves length.
Riemannian submanifolds.
2.5.9 Definition. Let S be an m–dimensional submanifold of a manifold M
with a Riemann metric g. Assume the restriction g S of g to tangent vectors
to S is non-degenerate, i.e. if {v1 , · · · , vm } is a basis for Tp S ⊂ Tp M , then
det(g(vi , vj )) 6= 0. Then each tangent space splits as an orthogonal direct sum
Tp M = Tp S ⊕ Tp⊥ S
and S inherits a covariant derivative ∇S : the tangential component of ∇. Then
∇S is symmetric: for a surface p = p(s, t) in S,
∇S /∂s ∂p/∂t − ∇S /∂t ∂p/∂s = (∇/∂s ∂p/∂t − ∇/∂t ∂p/∂s)S = 0
since the Levi-Civita connection ∇ on M is symmetric. It remains to show that
∇S is compatible with g S . For this we use Lemma 2.5.4. Let X, Y, Z be three
vector fields on S. Considered as vector fields on M along S they satisfy
The normal bundle of S is defined as
N S = {v ∈ Tp M : p ∈ S, v ∈ Tp S ⊥ }.
The Riemannian submanifold S will remain fixed from now on and N will denote
its normal bundle.
2.5.13 Lemma. N is an n-dimensional submanifold of the tangent bundle
T M .
Proof. In a neighbourhood of any point of S we can find an orthonormal family
E1 , · · · , En of vector fields on M , so that at points p in S the first m = dim S
of them form a basis for Tp S (with the help of the Gram–Schmidt process,
for example). Set ξ i (v) = g(Ei , v) as function on T M . We may also choose a
coordinate system (x1 , · · · , xn ) so that S is locally given by xm+1 = 0, · · · , xn =
0. The 2n functions (xi , ξ i ) form a coordinate system on T M so that N is given
by xm+1 = 0, · · · , xn = 0, ξ 1 = 0, · · · , ξ m = 0, as required by the definition of
“submanifold“.
The set N comes equipped with a map N → S which maps a normal vector
v ∈ Tp S ⊥ to its base-point p ∈ S. We can think of S as a subset of N , the
zero-section, which consists of the zero-vectors 0p at points p of S.
2.5.14 Proposition. The geodesic spray exp : N → M is locally bijective
around any point of S.
Proof. We use the inverse function theorem. For this we have to calculate
the differential of exp along the zero section S in N . Fix po ∈ S. Then S
and Tpo S ⊥ ⊂ Tpo M are two submanifolds of N whose tangent spaces Tpo S and
Tpo S ⊥ are orthogonal complements in Tpo M . (g is nondegenerate on Tpo S).
We have to calculate d exppo (w). We consider two cases.
(1) w ∈ Tpo S is the tangent vector of a curve p(t) in S
(2) w ∈ Tpo S ⊥ is the tangent vector of the straight line tw in Tpo S ⊥ .
Then we find
(1) d exppo (w) = (d/dt)|t=0 exp 0p(t) = (d/dt)|t=0 p(t) = w ∈ Tpo S,
(2) d exppo (w) = (d/dt)|t=0 exp tw = w ∈ Tpo S ⊥ .
Thus d exppo w = w in either case, hence d exppo has full rank n = dim N =
dim M .
For any c ∈ R, let
Nc = {v ∈ N | g(v, v) = c}.
At a given point p ∈ S, the vectors in Nc at p form a “sphere“ (in the sense of
the metric g) in the normal space Tp S ⊥ . Let Sc = exp Nc be the image of Nc
under the geodesic spray exp, called a normal tube around S in M ; it may be
thought of as the points at constant distance √c from S, at least if the metric
g is positive definite.
We shall need the following observation.
2.5.15 Lemma. Let t → p = p(s, t) be a family of geodesics depending on a
parameter s. Assume they all have the same (constant) speed independent of s,
i.e. g(∂p/∂t, ∂p/∂t) = c is independent of s. Then g(∂p/∂s, ∂p/∂t) is constant
along each geodesic.
Proof. Because of the constant speed,
0 = (∂/∂s) g(∂p/∂t, ∂p/∂t) = 2g(∇/∂s ∂p/∂t, ∂p/∂t).
Hence
(∂/∂t) g(∂p/∂s, ∂p/∂t) = g(∇/∂t ∂p/∂s, ∂p/∂t) + g(∂p/∂s, ∇/∂t ∂p/∂t) = 0 + 0,
the first 0 because of the symmetry of ∇ and the previous equality, the second
0 because t → p(s, t) is a geodesic.
We now return to S and N .
2.5.16 Gauss Lemma (Version 1). The geodesics through a point of S with
initial velocity orthogonal to S meet the normal tubes Sc around S orthogonally.
Proof. A curve in Sc = exp Nc is of the form expp(s) v(s) where p(s) ∈ S
and v(s) ∈ Tp(s) S ⊥ . Let t → p(s, t) = expp(s) tv(s) be the geodesic with
initial velocity v(s). Since v(s) ∈ Nc these geodesics have speed independent
of s: g(∂p/∂t, ∂p/∂t) = g(v(s), v(s)) = c. Hence the lemma applies. Since
(∂p/∂t)t=0 = v(s) ∈ Tp(s) S ⊥ is perpendicular to (∂p/∂s)t=0 = dp(s)/ds ∈
Tp(s) S at t = 0, the lemma says that ∂p/∂t remains perpendicular to ∂p/∂s for
all t. On the other hand, p(s, 1) = exp v(s) lies in Sc , so ∂p/∂s is tangential to
Sc at t = 1. Since all tangent vectors to Sc = exp Nc are of this form we get the
assertion.
2.5.17 Gauss Lemma (Version 2). Let p be any point of M . The geodesics
through p meet the spheres Sc in M centered at p orthogonally.
Proof. This is the special case of the previous lemma when S = {p} reduces to
a single point.
2.5.18 Remark. The spheres around p are by definition the images of the
“spheres“ g(v, v) = c in Tp M under the geodesic spray. If the metric is positive
definite, these are really the points at constant distance from p, as follows from
the definition of the geodesics of a Riemann metric. For example, when M = S 2
and p the north pole, the “spheres” centered at p are the circles of latitude
φ =constant and the geodesics through p the circles of longitude θ =constant.
EXERCISES 2.5
1. Complete the proof of Theorem 2.5.10.
2. Let g be a Riemann metric, ∇ the Levi-Civita connection of g. Consider the
Riemann metric g = (gij ) as a (0, 2)-tensor. Prove that ∇g = 0, i.e. ∇v g =
0 for any vector v. [Suggestion. Let X, Y, Z be vector fields and consider
DZ (g(X, Y )). Use the axioms defining the covariant derivative of arbitrary
tensor fields.]
3. Use Christoffel’s formulas to calculate the Christoffel symbols Γij,k , Γkij in
spherical coordinates (ρ, θ, φ): x = ρ cos θ sin φ, y = ρ sin θ sin φ, z = ρ cos φ.
[Euclidean metric ds2 = dx2 + dy 2 + dz 2 on R3 . Use ρ, θ, φ as indices i, j, k
rather than 1, 2, 3.]
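As a check on exercise 3, the Christoffel symbols can be computed symbolically. The sketch below is not part of the notes; it assumes the sympy library and applies Christoffel's formula Γ^k_ij = (1/2) g^{kl} (∂glj /∂xi + ∂gli /∂xj − ∂gij /∂xl ) to the Euclidean metric written in the coordinates (ρ, θ, φ).

```python
import sympy as sp

rho, theta, phi = sp.symbols('rho theta phi', positive=True)
coords = [rho, theta, phi]
# Cartesian coordinates in terms of spherical ones (phi the polar angle)
x = rho*sp.cos(theta)*sp.sin(phi)
y = rho*sp.sin(theta)*sp.sin(phi)
z = rho*sp.cos(phi)

# induced metric g_ij = sum_a (dx^a/dq^i)(dx^a/dq^j); here it comes out diagonal
J = sp.Matrix([[sp.diff(f, v) for v in coords] for f in (x, y, z)])
g = sp.simplify(J.T * J)
ginv = g.inv()

def Gamma(k, i, j):
    """Christoffel symbol Gamma^k_ij of the Levi-Civita connection of g."""
    return sp.simplify(sum(
        ginv[k, l]*(sp.diff(g[l, i], coords[j]) + sp.diff(g[l, j], coords[i])
                    - sp.diff(g[i, j], coords[l]))/2
        for l in range(3)))

print(g)               # should be diag(1, rho**2*sin(phi)**2, rho**2)
print(Gamma(0, 1, 1))  # Gamma^rho_{theta theta}, should be -rho*sin(phi)**2
print(Gamma(2, 1, 1))  # Gamma^phi_{theta theta}, should be -sin(phi)*cos(phi)
```

The remaining symbols are obtained the same way by looping over all index triples.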
4. Use Christoffel’s formulas to calculate the Γkij for the metric
R(U, V )X = (∇/∂t)(∇X/∂s) − (∇/∂s)(∇X/∂t). (*)
1. This is clear from (*).
2. Let A = A(s, t), B = B(s, t) be C ∞ vector fields along p = p(s, t). Compute
∂/∂s (A, B) = (∇A/∂s, B) + (A, ∇B/∂s),
∂/∂t ∂/∂s (A, B) = ((∇/∂t)(∇/∂s)A, B) + (∇A/∂s, ∇B/∂t) + (∇A/∂t, ∇B/∂s) + (A, (∇/∂t)(∇/∂s)B).
Interchange s and t, subtract, and use (*):
2. R_{abcd} = −R_{bacd}
3. R^a_{bcd} + R^a_{dbc} + R^a_{cdb} = 0
4. R_{abcd} = R_{dcba}
Proof. This follows immediately from the theorem.
2.6. 3 Theorem (Jacobi’s Equation). Let p = ps (t) be a one-parameter
family of geodesics (s = parameter). Then
(∇2/∂t2) ∂p/∂s = R(∂p/∂s, ∂p/∂t) ∂p/∂t.
Proof.
(∇2/∂t2) ∂p/∂s = (∇/∂t)(∇/∂t) ∂p/∂s = (∇/∂t)(∇/∂s) ∂p/∂t [symmetry of ∇]
= (∇/∂t)(∇/∂s) ∂p/∂t − (∇/∂s)(∇/∂t) ∂p/∂t [(∇/∂t) ∂p/∂t = 0 since ps (t) is a geodesic]
= R(∂p/∂s, ∂p/∂t) ∂p/∂t.
2.6. 4 Definition. For any two vectors v, w, Ric(v, w) is the trace of the linear
transformation u→ R(u, v)w.
In coordinates:
R(u, v)w = R^i_{qkl} u^k v^l w^q ∂i ,
Ric(v, w) = R^k_{qkl} v^l w^q .
Thus Ric is a tensor obtained by a contraction of R:
(Ric)_{ql} = R^k_{qkl} .
2.6. 5 Theorem. The Ricci tensor is symmetric, i.e. Ric(v, w) = Ric(w, v) for
all vectors v, w.
Proof. Fix v, w and take the trace of Bianchi’s identity, considered as a linear
transformation of u:
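As an illustration (not part of the notes), both the contraction defining Ric and the symmetry claimed by the theorem can be verified on a concrete example, the round metric on S 2 , with sympy. The component formula for R^i_{qkl} below is the standard one obtained from the convention R(U, V ) = ∇U ∇V − ∇V ∇U ; with it, the unit sphere satisfies Ric = g.

```python
import sympy as sp

th, ph = sp.symbols('theta phi', positive=True)
coords = [th, ph]
g = sp.diag(1, sp.sin(th)**2)       # round metric on S^2 (theta the polar angle)
ginv = g.inv()
n = 2

def Gam(k, i, j):
    # Christoffel symbols of the Levi-Civita connection of g
    return sum(ginv[k, l]*(sp.diff(g[l, i], coords[j]) + sp.diff(g[l, j], coords[i])
                           - sp.diff(g[i, j], coords[l]))/2 for l in range(n))

def Riem(i, q, k, l):
    """R^i_{qkl}, so that R(u, v)w = R^i_{qkl} u^k v^l w^q d/dx^i."""
    expr = sp.diff(Gam(i, l, q), coords[k]) - sp.diff(Gam(i, k, q), coords[l])
    expr += sum(Gam(i, k, m)*Gam(m, l, q) - Gam(i, l, m)*Gam(m, k, q)
                for m in range(n))
    return sp.simplify(expr)

# (Ric)_{ql} = R^k_{qkl}
Ric = sp.simplify(sp.Matrix(n, n, lambda q, l: sum(Riem(k, q, k, l) for k in range(n))))
print(Ric)   # a symmetric matrix; for the unit sphere it equals g
```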
Chapter 3
Calculus on manifolds
136 CHAPTER 3. CALCULUS ON MANIFOLDS
with real coefficients Tij··· and with the differentials dxi being taken at p. The
tensor need not be homogeneous, i.e. the sum may involve (0, k)–tensors for
different values of k. Such expressions are added and multiplied in the natural
way, but one must be careful to observe that multiplication of the dxi is not
commutative, nor does it satisfy any other relation besides the associative law and the
distributive law. Covariant tensors at p can be thought of as purely formal alge-
braic expressions of this type. Differential forms are obtained by the same sort
of construction if one imposes in addition the rule that the dxi anticommute.
This leads to the following definition.
3.1. 1 Definition. A differential form $ at p ∈ M is a quantity which relative
to a coordinate system (xi ) around p is represented by a formal expression
$ = Σ fij··· dxi ∧ dxj ∧ · · · (3.1)
with real coefficients fij··· and with the differentials dxi being taken at p. Such
expressions are added and multiplied in the natural way but subject to the
relation
dxi ∧ dxj = −dxj ∧ dxi . (3.2)
If all the wedge products dxi ∧ dxj ∧ · · · in (1) contain exactly k factors, then $
is said to be homogeneous of degree k and is called a k–form at p. A differential
form on M (also simply called a form) associates to each p ∈ M a form at
p. The form is of class C ∞ if its coefficients fij··· (relative to any coordinate
system) are C ∞ functions of p. This will always be assumed to be the case.
Remarks. (1) The definition means that we consider as identical expressions
(1) which can be obtained from each other using the relations (2), possibly
repeatedly or in conjunction with the other rules of addition and multiplication.
For example, since dxi ∧ dxi = −dxi ∧ dxi (any i) one finds that 2dxi ∧ dxi = 0,
so dxi ∧ dxi = 0. The expressions for $ in two coordinate systems (xi ), (x̃i )
are related by the substitutions xi = f i (x̃1 , · · · , x̃n ), dxi = (∂f i /∂ x̃j )dx̃j on the
intersection of the coordinate domains.
(2) By definition the k-fold wedge product dxi ∧ dxj ∧ · · · transforms like the (0,k)-
tensor dxi ⊗ dxj ⊗ · · · , but is alternating in the differentials dxi , dxj , i.e. changes
sign if two adjacent differentials are interchanged.
(3) Every differential form is uniquely a sum of homogeneous differential forms.
For this reason one can restrict attention to forms of a given degree, except that
the wedge-product of a k-form and an l-form is a (k + l)-form.
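These formal rules are easy to mechanize, which can help in checking computations. Below is a toy Python model (not part of the notes): a form is a dictionary assigning a coefficient to each index tuple, and the wedge product sorts indices while tracking the sign dictated by (3.2).

```python
def sort_sign(idx):
    """Sort an index tuple; return (sorted tuple, sign of the permutation).
    The sign is 0 if an index repeats, since dx^i ∧ dx^i = 0."""
    if len(set(idx)) < len(idx):
        return tuple(idx), 0
    idx, sign = list(idx), 1
    for i in range(len(idx)):
        for j in range(len(idx) - 1 - i):
            if idx[j] > idx[j + 1]:
                idx[j], idx[j + 1] = idx[j + 1], idx[j]
                sign = -sign  # each adjacent interchange flips the sign, as in (3.2)
    return tuple(idx), sign

def wedge(a, b):
    """Wedge product of forms given as {increasing index tuple: coefficient}."""
    out = {}
    for I, fa in a.items():
        for J, fb in b.items():
            K, s = sort_sign(I + J)
            if s != 0:
                out[K] = out.get(K, 0) + s*fa*fb
    return {K: c for K, c in out.items() if c != 0}

dx, dy, dz = {(1,): 1}, {(2,): 1}, {(3,): 1}
print(wedge(dx, dy))             # {(1, 2): 1}
print(wedge(dy, dx))             # {(1, 2): -1}, i.e. dy ∧ dx = -dx ∧ dy
print(wedge(dx, dx))             # {}  (dx ∧ dx = 0)
print(wedge(wedge(dx, dy), dz))  # {(1, 2, 3): 1}
```

Since every interchange of adjacent factors flips the sign, the sign of the sorting permutation is exactly the factor produced by repeated use of (3.2).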
3.1. DIFFERENTIAL FORMS 137
where the sum goes over ordered k-tuples i < j < · · · . It is also clear that
an alternating (0,k)-tensor Tij··· dxi ⊗ dxj ⊗ · · · is uniquely determined by its
components
Tij··· = fij··· if i < j < · · · .
indexed by ordered k-tuples i < j < · · · . Hence the formula
does give a one-to-one correspondence. The fact that the Tij··· defined in this
way transform like a (0,k)-tensor follows from the transformation law of the dxi .
3.1. 7 Example: forms on R3 .
(1) The 1-form Adx + Bdy + Cdz is a covector, as we know.
(2) The differential 2-form P dy ∧ dz + Qdz ∧ dx + Rdx ∧ dy corresponds to
the (0, 2)-tensor T with components Tzy = −Tyz = P , Tzx = −Txz = Q,
Txy = −Tyx = R. This is just the tensor
The sum runs over the group Sk of all permutations (1, · · · , k) → (σ(1), · · · , σ(k));
sgn(σ) = ±1 is the sign of the permutation σ. The multiplication of forms then
takes on the following form. Write k = p + q and let Sp × Sq be the subgroup of
the group Sk which permutes the indices {1, · · · , p} and {p+1, · · · , p+q} among
themselves. Choose a set of coset representatives [Sp+q /Sp ×Sq ] for the quotient
so that every element σ ∈ Sp+q can be uniquely written as σ = τ σ′ σ′′ with
τ ∈ [Sp+q /Sp × Sq ] and (σ′ , σ′′ ) ∈ Sp × Sq . If in (4) one first performs the sum
over (σ′ , σ′′ ) and then over τ one finds the formidable formula
(These τ ’s are called “shuffle permutations”). On the other hand, one can
also let the sum in (5) run over all τ ∈ Sp+q provided one divides by p!q! to
compensate for the redundancy. Finally we note that (4) and (5) remain valid
if the dxi are replaced by arbitrary 1–forms θi , and are then independent of the
coordinates.
The formula (5) gives the multiplication law for differential forms when consid-
ered as tensors via (4). Luckily, the formulas (4) and (5) are rarely needed. The
whole point of differential forms is that both the alternating property and the
transformation law are built into the notation, so that they can be manipulated
“mechanically“.
(b) We now have (at least) three equivalent ways of thinking about k-forms:
(i) formal expressions fij··· dxi ∧ dxj ∧ · · ·
(ii) alternating tensors Tij··· dxi ⊗ dxj ⊗ · · ·
(iii) alternating multilinear functions T (v, w, · · · )
3.1. 9 Theorem. On an n–dimensional manifold, any differential n–form can be
written as
D dx1 ∧ dx2 ∧ · · · ∧ dxn
relative to a coordinate system (xi ). Under a change of coordinates x̃i =
x̃i (x1 , · · · , xn ),
Proof. Exercise.
3.1. 11 Remarks and definitions. If M is connected, then the collection of
all coordinate system on M falls into two classes characterized by the property
that the Jacobian is positive on the intersection of the coordinate domains for
any two coordinate systems in the same class. This follows from the fact that
the various Jacobians det(∂xi /∂ x̃j ) are always non-zero (where defined), hence
cannot change sign along any continuous curve. Singling out one of these two
classes determines what is called an orientation on M . The coordinate systems
in the distinguished class are then called positively oriented. If M has several
connected components, the orientation may be chosen independently on each.
We shall now assume that M is oriented and only use positively oriented coor-
dinate systems. This eliminates the ambiguous sign in the above n-form, which
is then called the volume element of the Riemann metric g on M , denoted volg .
3.1. 12 Example. In R3 with the Euclidean metric dx2 + dy 2 + dz 2 the volume
element is :
Cartesian coordinates (x, y, z): dx ∧ dy ∧ dz
Cylindrical coordinates (r, θ, z): rdr ∧ dθ ∧ dz
Spherical coordinates (ρ, θ, φ): ρ2 sin φ dρ ∧ dθ ∧ dφ
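The spherical entry of this list can be checked symbolically; the sketch below (not part of the notes, assuming sympy) computes det gij for the Euclidean metric in spherical coordinates, so that √det g = ρ2 sin φ as claimed.

```python
import sympy as sp

rho, theta, phi = sp.symbols('rho theta phi', positive=True)
# Cartesian coordinates in terms of spherical ones (phi the polar angle)
x = rho*sp.cos(theta)*sp.sin(phi)
y = rho*sp.sin(theta)*sp.sin(phi)
z = rho*sp.cos(phi)

J = sp.Matrix([[sp.diff(f, v) for v in (rho, theta, phi)] for f in (x, y, z)])
g = sp.simplify(J.T * J)   # Euclidean metric written in spherical coordinates
detg = sp.simplify(g.det())
print(detg)                 # rho**4*sin(phi)**2, so sqrt(det g) = rho**2*sin(phi)
```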
3.1. 13 Example. Let S be a two-dimensional submanifold of R3 (smooth
surface). The Euclidean metric dx2 + dy 2 + dz 2 on R3 gives a Riemann metric
g = ds2 on S by restriction. Let u, v be coordinates on S. Write p = p(u, v) for the
point on S with coordinates (u, v). Then
√| det gij | = ‖(∂p/∂u) × (∂p/∂v)‖.
Here gij is the matrix of the Riemann metric in the coordinate system u,v. The
right-hand side is the norm of the cross-product of vectors in R3 .
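A symbolic check of this formula on a sample surface, the unit sphere with geographical coordinates (u, v) (a sketch assuming sympy; not part of the notes):

```python
import sympy as sp

u, v = sp.symbols('u v')
# sample surface: the unit sphere, u = longitude, v = latitude
p = sp.Matrix([sp.cos(u)*sp.cos(v), sp.sin(u)*sp.cos(v), sp.sin(v)])
pu, pv = p.diff(u), p.diff(v)

# matrix of the induced Riemann metric in the coordinates u, v
g = sp.Matrix([[pu.dot(pu), pu.dot(pv)],
               [pv.dot(pu), pv.dot(pv)]])
lhs = sp.simplify(g.det())                          # det g_ij
rhs = sp.simplify(pu.cross(pv).dot(pu.cross(pv)))   # |p_u x p_v|^2
print(sp.simplify(lhs - rhs))                       # 0, i.e. det g = |p_u x p_v|^2
```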
Recall that in a Riemannian space indices on tensors can be raised and lowered
at will.
3.1. 14 Definition. The scalar product g(α, β) of two k-forms α = Σ aij··· dxi ∧ dxj ∧ · · · and β = Σ bij··· dxi ∧ dxj ∧ · · · is
g(α, β) = a^{ij···} bij··· ,
sum over ordered k-tuples i < j < · · · .
3.1. 15 Theorem. For any k-form α there is a unique (n−k )-form * α so that
EXERCISES 3.1
1. Prove the formulas in Example 3.1.2.
2. Prove the formula in Example 3.1.13.
3. Prove the formulas (0)-(3) in Example 3.1.16.
4. Prove the formulas *(a ∧ b) = a × b and *(a ∧ b ∧ c) = a · (b × c) in Example 3.1.16.
5. Prove the formula for *F in Example 3.1.16.
6. Prove that the Tij··· defined as in Lemma 3.1.5 in terms of a k-form fij··· dxi ∧
dxj · · · do transform like the components of a (0,k)-tensor, as stated in the proof
of the lemma.
7. Verify the formula *(*F ) = (−1)k(n−k) sgn det(gij ) F of Theorem 3.1.15 for
(0,k)–tensors on R3 , k = 0, 1, 2, 3, directly using the formulas of Example 3.1.16.
8. Let ϕ be the following differential form on R3 :
dfij··· ∧ dxi ∧ dxj · · · = (∂fij··· /∂xk ) dxk ∧ dxi ∧ dxj · · ·
is independent of the coordinate system (xi ).
Proof. Consider first the case of a differential 1-form. Thus assume fi dxi =
f˜a dx̃a . This equation gives fi dxi = f˜a (∂ x̃a /∂xi )dxi hence fi = f˜a (∂ x̃a /∂xi ), as
we know. Now compute:
(∂fi /∂xk ) dxk ∧ dxi = ∂/∂xk (f̃a ∂ x̃a /∂xi ) dxk ∧ dxi
= (∂ f̃a /∂xk · ∂ x̃a /∂xi + f̃a ∂ 2 x̃a /∂xk ∂xi ) dxk ∧ dxi
d$ := dfij··· ∧ dxi ∧ dxj · · · = (∂fij··· /∂xk ) dxk ∧ dxi ∧ dxj · · · .
Actually, the above recipe defines a form d$ separately on each coordinate
domain, even if $ is defined on all of M . However, because of 3.2.1 the forms
d$ on any two coordinate domains agree on their intersection, so d$ is really
a single form defined on all of M after all. In the future we shall take this kind
of argument for granted.
3.2. 4 Example: exterior derivative in R3 .
0-forms (functions): $ = f , d$ = (∂f /∂x) dx + (∂f /∂y) dy + (∂f /∂z) dz.
1-forms (covectors): $ = Adx + Bdy + Cdz
3.2. DIFFERENTIAL CALCULUS 145
d$ = (∂C/∂y − ∂B/∂z) dy ∧ dz + (∂A/∂z − ∂C/∂x) dz ∧ dx + (∂B/∂x − ∂A/∂y) dx ∧ dy
2-forms: $ = P dy ∧ dz + Q dz ∧ dx + R dx ∧ dy
d$ = (∂P/∂x + ∂Q/∂y + ∂R/∂z) dx ∧ dy ∧ dz
3-forms: $ = D dx ∧ dy ∧ dz, d$ = 0.
3.2. 5 Theorem (Product Rule). Let α, β be differential forms with α homogeneous of degree |α|. Then
d(α ∧ β) = (dα) ∧ β + (−1)|α| α ∧ (dβ).
Proof. By induction on |α| it suffices to prove this for α a 1-form, say α = ak dxk .
Put β = bij··· dxi ∧ dxj · · · . Then
d(α ∧ β) = d(ak bij··· dxk ∧ dxi ∧ dxj · · · ) = {(dak )bij··· + ak (dbij··· )} ∧ dxk ∧ dxi ∧ dxj · · ·
= (dak ∧ dxk ) ∧ (bij··· dxi ∧ dxj · · · ) − (ak dxk ) ∧ (dbij··· ∧ dxi ∧ dxj · · · ) = (dα) ∧ β − α ∧ (dβ).
This is clear since any differential form fij··· dxi ∧dxj · · · on a coordinate domain
can be built up from scalar functions and their differentials by sums and wedge
products. Note that in this “axiomatic“ characterization we postulate that d
operates also on forms defined only on open subsets. This postulate is natural,
but actually not necessary.
3.2. 6 Theorem. For any differential form $ of class C 2 , d(d$) = 0.
Proof. First consider the case when $ = f is a C 2 function. Then
dd$ = d((∂f /∂xi ) dxi ) = (∂ 2 f /∂xj ∂xi ) dxj ∧ dxi
and this is zero, since the terms ij and ji cancel, because of the symmetry of
the second partials. The general case is obtained by adding some dots, like
$ = f... ... to indicate indices and wedge-factors, which are not affected by the
argument. (Alternatively one can argue by induction.)
3.2. 7 Corollary. If $ is a differential form which can be written as an exterior
derivative $ = dϕ, then d$ = 0.
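In the language of Example 3.2.4, the corollary for 0-forms says curl(grad f ) = 0, and for 1-forms div(curl F ) = 0. A quick symbolic check of the first identity (a sketch assuming sympy; not part of the notes):

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
f = sp.Function('f')(x, y, z)

grad = [sp.diff(f, w) for w in (x, y, z)]   # components of the 1-form df
# d(df) corresponds to curl(grad f) under Example 3.2.4
curl = [sp.diff(grad[2], y) - sp.diff(grad[1], z),
        sp.diff(grad[0], z) - sp.diff(grad[2], x),
        sp.diff(grad[1], x) - sp.diff(grad[0], y)]
print([sp.simplify(c) for c in curl])       # [0, 0, 0] by symmetry of second partials
```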
xi = F i (y 1 , · · · , y m ), i = 1, · · · , n.
F * $ = gkl··· (y 1 , · · · , y m )dy k ∧ dy l · · ·
xi = F i (y 1 , · · · , y m ), dxi = (∂F i /∂y j ) dy j . (*)
Proof. Exercise.
3.2. 13 Examples.
(a) Let $ be the 1-form on R2 given by
$ = xdy
$ = dt
Then
F ∗ $ = F ∗ (dt) = d(x − y)2 = 2(x − y)(dx − dy).
(c) The symmetric (0, 2)-tensor g representing the Euclidean metric dx2 + dy 2 +
dz 2 on R3 has components (δij ) in Cartesian coordinates. It can be written as
g = dx ⊗ dx + dy ⊗ dy + dz ⊗ dz .
i∗ T (u, v, · · · ) = T (u, v, · · · )
δ$ = *^{−1} d * $ = (−1)(n−k+1)(k−1) * d * $.
EXERCISES 3.2
1. Let f be a C 2 function (0-form). Prove that d(df ) = 0.
2. Let $ = Σi<j fij dxi ∧ dxj . Write d$ in the form
d$ = Σi<j<k fijk dxi ∧ dxj ∧ dxk .
(Find fijk .)
3. Let ϕ = fi dxi be a 1-form. Find a formula for δϕ.
4. Prove the assertion of Remark 2, assuming Theorem 3.2.1.
5. Prove parts (a) and (b) of Theorem 3.2.11.
6. Prove part (c) of Theorem 3.2.11.
7. Prove part (d) of Theorem 3.2.11.
8. Prove from the definitions the formula
i∗ T (u, v, · · · ) = T (u, v, · · · )
12. a) Let $ = fi dxi be a C ∞ 1-form. Prove that δ$ = −(1/γ) ∂(γf^i )/∂xi where
γ = √| det gij |. Deduce that
δdf = −(1/γ) ∂/∂xi (γ g^{ij} ∂f /∂xj ).
[Remark. The operator ∆ : f → −δdf is called the Laplace-Beltrami operator
of the Riemann metric.]
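As a sanity check on problem 12 (a sketch assuming sympy; not part of the notes), the formula ∆f = (1/γ) ∂/∂xi (γ g^{ij} ∂f /∂xj ) applied to polar coordinates on R2 reproduces the familiar Laplacian f_rr + f_r /r + f_θθ /r2 :

```python
import sympy as sp

r, th = sp.symbols('r theta', positive=True)
f = sp.Function('f')(r, th)
q = [r, th]
g = sp.diag(1, r**2)        # ds^2 = dr^2 + r^2 dtheta^2 on R^2
ginv = g.inv()
gamma = sp.sqrt(g.det())    # gamma = r

# Delta f = -delta d f = (1/gamma) d_i (gamma g^{ij} d_j f)
lap = sum(sp.diff(gamma*ginv[i, j]*sp.diff(f, q[j]), q[i])
          for i in range(2) for j in range(2))/gamma
usual = sp.diff(f, r, 2) + sp.diff(f, r)/r + sp.diff(f, th, 2)/r**2
print(sp.simplify(lap - usual))   # 0
```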
13. Use problem 12 to write down ∆f = −δdf
(a) in cylindrical coordinates (r, θ, z) on R3 ,
(b) in spherical coordinates (ρ, θ, φ) on R3 ,
(c) in geographical coordinates (θ, φ) on S 2 .
14. Identify vectors and covectors on R3 using the Euclidean metric ds2 =
dx2 + dy 2 + dz 2 , so that the 1-form Fx dx + Fy dy + Fz dz is identified with the
vector field Fx (∂/∂x) + Fy (∂/∂y) + Fz (∂/∂z). Show that
a) ∗dF = curl F b) d ∗ F = ∗div F
for any 1-form F , as stated in Example 3.2.15.
15. Let F = Fr (∂/∂r) + Fθ (∂/∂θ) + Fz (∂/∂z) be a vector field in cylindrical
coordinates (r, θ, z) on R3 . Use problem 14 to find a formula for curlF and
divF .
16. Let F = Fρ (∂/∂ρ) + Fθ (∂/∂θ) + Fφ (∂/∂φ) be a vector field in spherical
coordinates (ρ, θ, φ) on R3 . Use problem 14 to find a formula for curlF and
divF .
17. Find all radial solutions f to Laplace’s equation ∆f = 0 in R2 and in R3 .
(“Radial“ means f (p) = f (r). You may use problem 12 to find a formula for
∆f .)
if one wants the value of the integral to come out independent of the coordinates.
The reason for this is the Change of Variables formula from calculus.
3.3.1 Theorem (Change of Variables Formula). Let x̃j = F j (x1 , · · · , xn )
be a C ∞ mapping Rn → Rn which maps an open set U one-to-one onto an
open set Ũ . Then for any integrable function f on U ,
∫ · · · ∫Ũ f dx̃1 · · · dx̃n = ∫ · · · ∫U f | det(∂ x̃i /∂xj )| dx1 · · · dxn .
det(∂xi /∂ x̃j ) > 0
at all points in the domain of both coordinate systems. Any coordinate sys-
tem (xi ) for which this determinant is positive for all (x̃j ) in the given class
is itself called positively oriented. An orientation on a manifold consists of the
specification of such a class.
3.3.2 Theorem and definition. Let M be an oriented n–dimensional mani-
fold, (xi ) a positively oriented coordinate system defined on an open set U . For
any n–form $ = f dx1 ∧ · · · ∧ dxn on U the integral
∫U $ := ∫ · · · ∫ f dx1 · · · dxn
= ± ∫ · · · ∫ f̃ | det(∂ x̃j /∂xi )| dx1 · · · dxn
= ± ∫ · · · ∫ f̃ dx̃1 · · · dx̃n
The sum need not be finite. The integral exists provided the series converges.
1-forms. A 1-form on R3 may be written as
P dx + Qdy + Rdz.
for the Cartesian coordinates of p(u, v). Thus S can be considered a surface in
R3 parametrized by (u, v) ∈ D. The restriction of Ady ∧dz −Bdx∧dz +Cdx∧dy
to S is
{A(∂y/∂u · ∂z/∂v − ∂z/∂u · ∂y/∂v) − B(∂x/∂u · ∂z/∂v − ∂z/∂u · ∂x/∂v) + C(∂x/∂u · ∂y/∂v − ∂y/∂u · ∂x/∂v)} du ∧ dv.
In calculus notation this expression is V · N where V = Ai + Bj + Ck and
N = ∂p/∂u × ∂p/∂v = (∂y/∂u · ∂z/∂v − ∂z/∂u · ∂y/∂v) i − (∂x/∂u · ∂z/∂v − ∂z/∂u · ∂x/∂v) j + (∂x/∂u · ∂y/∂v − ∂y/∂u · ∂x/∂v) k.
V is the vector field corresponding to the given 2-form under the correspondence
discussed in §5. N is the normal vector corresponding to the parametrization
of the surface. Thus the integral above is the usual surface integral:
∫S A dy ∧ dz − B dx ∧ dz + C dx ∧ dy = ∫∫D V · N du dv.
If the coordinates (u, v) do not cover all of S, then the integral must be defined
by subdivision as remarked earlier.
3-forms. A 3-form on R3 may be written as
f dx ∧ dy ∧ dz
3.3. INTEGRAL CALCULUS 153
Thus ∂ x̃1 /∂x2 = 0, · · · , ∂ x̃1 /∂xk = 0 on ∂R. Expanding det(∂ x̃i /∂xj ) along
the “row“ (∂ x̃1 /∂xj ) one finds that
det(∂ x̃i /∂xj )1≤i,j≤k = (∂ x̃1 /∂x1 ) · det(∂ x̃i /∂xj )2≤i,j≤k on ∂R.
The LHS is > 0, since (xi ) and (x̃i ) are positively oriented. The derivative
∂ x̃1 /∂x1 cannot be < 0 on ∂R, since x̃1 = x̃1 (x1 , x2o , · · · , xko ) is = 0 for x1 = 0
and is < 0 for x1 < 0. It follows that in the above equation the first factor
(∂ x̃1 /∂x1 ) on the right is > 0 on ∂R, and hence so is the second factor, as
required.
3.3.10 Example. Let S be a 2-dimensional submanifold of R3 . For each p ∈ S,
there are two unit vectors ±n(p) orthogonal to Tp S. Suppose we choose one
of them, say n(p), depending continuously on p ∈ S. Then we can specify an
orientation on S by stipulating that a coordinate system (u, v) on S is positive
if
∂p/∂u × ∂p/∂v = D n with D > 0. (*)
Let R be a bounded submanifold of S with boundary C = ∂R. Choose a positive
coordinate system (u, v) on S around a point of C as above, so that u < 0 on
R and u = 0 on C. Then p = p(0, v) defines the positive coordinate v on C
and ∂p/∂v is the tangent vector along C. Along C, the equation (*) amounts
to this: if we walk upright along C (head in the direction of n) in the positive
direction (direction of ∂p/∂v), then R is to our left (∂p/∂u points outward from
R, in the direction of increasing u, since the u-component of ∂p/∂u = ∂/∂u is
+1).
The next theorem is a kind of change of variables formula for integrals of forms.
3.3.11 Theorem. Let F : M → N be a diffeomorphism of n–dimensional
manifolds. For any m–dimensional oriented bounded submanifold R of N and
any m–form $ on N one has
∫R F ∗ $ = ∫F (R) $
Proof. We first prove this for the special case of an n-cube I n in Rn . Let
I n = {(x1 , · · · , xn ) ∈ Rn | 0 ≤ xj ≤ 1}.
We specify the orientation so that the Cartesian coordinates (x1 , · · · , xn ) form
a positively oriented coordinate system. We have ∂I n = ∪nj=1 (Ij0 ∪ Ij1 ) where
Ij0 = {(x1 , · · · , xn ) ∈ I n | xj = 0}, Ij1 = {(x1 , · · · , xn ) ∈ I n | xj = 1}.
[Figure: the square I 2 with its four boundary faces I1^0 , I1^1 , I2^0 , I2^1 and the induced orientations.]
Near Ij0 the points of I n satisfy xj ≥ 0. If one uses xj as first coordinate,
one gets a coordinate system (xj , x1 , · · · [xj ] · · · xn ) on I n whose orientation is
positive or negative according to the sign of (−1)j−1 . The coordinate system
(−xj , x1 , · · · [xj ] · · · xn ) on I n is of the type required in definition 3.3.11 for
R = I n near a point of S = Ij0 . It follows that (x1 , · · · [xj ] · · · xn ) is a coordi-
nate system on Ij0 whose orientation (specified according to 3.3.11) is positive or
negative according to the sign of (−1)j . Similarly, (x1 , · · · [xj ] · · · xn ) is a coor-
dinate system on Ij0 whose orientation (specified according to 3.3.11) is positive
or negative according to the sign of (−1)j−1 . We summarize this by specifying
the required sign as follows
Positive n–form on I n : dx1 ∧ · · · ∧ dxn
Positive (n−1)-form on Ij0 : (−1)j dx1 ∧ · · · [dxj ] · · · ∧ dxn
Positive (n−1)-form on Ij1 : (−1)j−1 dx1 ∧ · · · [dxj ] · · · ∧ dxn
A general (n−1)-form can be written as
ω = Σj fj dx1 ∧ · · · [dxj ] · · · ∧ dxn .
Compute
∫I^n dωj = ∫I^n (∂fj /∂xj ) dxj ∧ dx1 ∧ · · · [dxj ] · · · ∧ dxn [definition of dω]
= ∫I^n (−1)j−1 (∂fj /∂xj ) dx1 ∧ · · · ∧ dxn [move dxj in place]
= ∫0^1 · · · ∫0^1 (−1)j−1 (∂fj /∂xj ) dx1 · · · dxn
= ∫0^1 · · · ∫0^1 fj |xj =1 (−1)j−1 dx1 · · · [dxj ] · · · dxn + ∫0^1 · · · ∫0^1 fj |xj =0 (−1)j dx1 · · · [dxj ] · · · dxn
= ∫Ij^1 fj dx1 ∧ · · · [dxj ] · · · ∧ dxn + ∫Ij^0 fj dx1 ∧ · · · [dxj ] · · · ∧ dxn [definition 3.3.2]
= ∫∂I^n ωj [∫Ik^0 ωj = 0 if k ≠ j, because dxk = 0 on Ik0 , Ik1 ]
This proves the formula for a cube. To prove it in general we need to appeal
to a theorem in topology, which implies that any bounded submanifold can be
subdivided into a finite number of coordinate-cubes, a procedure familiar from
surface integrals and volume integrals in R3 . (A coordinate cube is a subset of
M which becomes a cube in a suitable coordinate system.)
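For n = 2 the cube computation can be replayed symbolically with sympy (a sketch, not part of the notes), for the sample 1-form ω = f dx1 on I 2 ; by the sign table above, only the faces x2 = 0 (with +dx1 ) and x2 = 1 (with −dx1 ) contribute to the boundary integral.

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
f = x1**2*x2 + sp.sin(sp.pi*x1)        # sample coefficient; omega = f dx1
# d(omega) = (df/dx2) dx2 ∧ dx1 = -(df/dx2) dx1 ∧ dx2
lhs = sp.integrate(-sp.diff(f, x2), (x1, 0, 1), (x2, 0, 1))
# boundary: dx1 = 0 on the faces x1 = const, so only x2 = 0 and x2 = 1 contribute,
# with the signs +dx1 on I_2^0 and -dx1 on I_2^1 from the table above
rhs = sp.integrate(f.subs(x2, 0), (x1, 0, 1)) - sp.integrate(f.subs(x2, 1), (x1, 0, 1))
print(sp.simplify(lhs - rhs))           # 0
```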
The advantage of this procedure is that one does not have to worry about
subdivisions into cubes, since this is already built in. The disadvantage is that
it is often much more natural to integrate over bounded submanifolds rather
than over chains. (Think of the surface and volume integrals from calculus, for
example.)
Proof. For each point p∈M one can find a continuous function hp so that hp ≡ 0
Since each gi and g̃j vanishes outside a coordinate ball, the same additivity gives
Σi Σk ∫M gk gi $ = Σj Σk ∫M gk g̃j $.
Since Σk gk = 1 on D this gives
Σi ∫M gi $ = Σj ∫M g̃j $
as required.
Thus the integral ∫M $ is defined whenever $ is locally integrable and vanishes
outside of a compact set.
Remark. Partitions of unity 1 = Σ gk as in the lemma can be found with the
gk even of class C ∞ . But this is not required here.
EXERCISES 3.3
1. Verify in detail the assertion in the proof of Theorem 3.3.11 that “the for-
mula reduces to the usual change of variables formula 3.3.1“. Explain how the
orientation on F (R) is defined.
2. Let C be the parabola with equation y = 2x2 .
(a) Prove that C is a submanifold of R2 .
(b) Let ω be the differential 1-form on R2 defined by
ω = 3xy dx + y 2 dy.
Find ∫U ω where U is the part of C between the points (0, 0) and (1, 2).
3. Let S be the cylinder in R3 with equation x2 + y 2 = 16.
(a) Prove that S is a submanifold of R3 .
(b) Let ω be the differential 1-form on R3 defined by
(c) Show that the area (see Example 3.3.3) of an open subset U of S is given
by the integral
∫U √((∂f /∂x)2 + (∂f /∂y)2 + 1) dx ∧ dy .
6. Let S be the hypersurface in Rn with equation F (x1 , · · · , xn ) = 0 where F
is a C ∞ function satisfying (∂F/∂xn )p 6= 0 at all points p on S.
(a) Show that S is an (n−1)-dimensional submanifold of Rn and that x1 , · · · , xn−1
form a coordinate system in a neighbourhood of any point of S.
(b) Show that the Riemann metric ds2 on S obtained by restriction of the
Euclidean metric in Rn in the coordinates x1 , · · · , xn−1 on S is given by gij =
δij + (Fn )−2 Fi Fj (1 ≤ i, j ≤ n − 1) where Fi = ∂F/∂xi .
(c) Show that the volume (see Example 3.3.3) of an open subset U of S is given
by the integral
∫U (√((∂F/∂x1 )2 + · · · + (∂F/∂xn )2 ) / |∂F/∂xn |) dx1 ∧ · · · ∧ dxn−1 .
(d/dt) exp(tX)p = X(exp(tX)p), exp(tX)p|t=0 = p. (2)
We think of exp(tX) : M · · · → M as a partially defined transformation of M ,
whose domain consists of those p ∈ M for which exp(tX)p exists in virtue of
the above theorem. More precisely, the map (t, p) → exp(tX)p, R × M · · · → M , is a
C ∞ map defined in a neighbourhood of {0} × M in R × M . In the future we
shall not belabor this point: expressions involving exp(tX)p are understood to
be valid whenever defined.
3.4.2 Theorem. (a) For all s, t ∈ R one has exp(sX) exp(tX)p = exp((s + t)X)p, whenever defined.
as desired.
The family {exp(tX) : M · · · → M | t ∈ R} of locally defined transformations
on M is called the one parameter group of (local) transformations, or the flow,
generated by the vector field X.
3.4.3 Example. The family of transformations of M = R2 generated by the
vector field
X = −y ∂/∂x + x ∂/∂y
is the one–parameter family of rotations given by
exp(tX)(x, y) = ((cos t)x − (sin t)y, (sin t)x + (cos t)y).
It is of course defined for all (x, y).
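The flow can also be recovered numerically from the ODE (2): integrating (x′ , y ′ ) = (−y, x) should reproduce the rotation above. A self-contained sketch (not part of the notes):

```python
import math

def flow(t, xy, steps=1000):
    """Approximate exp(tX)(x, y) for X = -y d/dx + x d/dy by applying the
    classical Runge-Kutta (RK4) method to the system x' = -y, y' = x."""
    def F(x, y):
        return (-y, x)
    x, y = xy
    h = t/steps
    for _ in range(steps):
        k1 = F(x, y)
        k2 = F(x + h/2*k1[0], y + h/2*k1[1])
        k3 = F(x + h/2*k2[0], y + h/2*k2[1])
        k4 = F(x + h*k3[0], y + h*k3[1])
        x += h/6*(k1[0] + 2*k2[0] + 2*k3[0] + k4[0])
        y += h/6*(k1[1] + 2*k2[1] + 2*k3[1] + k4[1])
    return x, y

t, (x0, y0) = 1.0, (2.0, 0.5)
xt, yt = flow(t, (x0, y0))
# compare with the closed form exp(tX)(x, y) = (x cos t - y sin t, x sin t + y cos t)
exact = (x0*math.cos(t) - y0*math.sin(t), x0*math.sin(t) + y0*math.cos(t))
print(abs(xt - exact[0]) < 1e-8, abs(yt - exact[1]) < 1e-8)   # True True
```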
3.4.4 Pull–back of tensor fields. Let F : N → M be a C ∞ map of manifolds.
If f is a function on M we denote by F ∗ f the function on N defined by
F ∗ f (p) = f (F (p))
i.e. F ∗ f = f ◦ F . More generally, if ϕ is a covariant tensor field on M , say of
type (0, k), then we can consider ϕ a multilinear function on vector fields on M
and define F ∗ ϕ by the rule
F ∗ ϕ(X, Y, · · · ) = ϕ(dF (X), dF (Y ), · · · )
F ∗ (S + T ) = (F ∗ S) + (F ∗ T ), (6)
F ∗ (S ⊗ T ) = (F ∗ S) ⊗ (F ∗ T ) (7)
for all tensor fields S, T on M . In fact, if we use coordinates (xi ) on M and (y j )
on N , write p = F (q) as xj = xj (y 1 , · · · , y n ), and express tensors as a linear
combination of the tensor products of the dxi , ∂/∂xi and dy a , ∂/∂y b all of this
amounts to the familiar rule
If f is a scalar function and X vector field, then the scalar functions Xf = df (X)
on M and (F ∗ X)(F ∗ f ) satisfy
(G ◦ F )∗ = F ∗ ◦ G∗ (13)
whenever defined. Note the reversal of the order of composition! (The verifica-
tions of all of these rules are left as exercises.)
3.4. LIE DERIVATIVES 163
This is called the Lie derivative of T along X. It is very important to note that
for general tensor fields T the Lie derivative LX T (p) depends not only on the
value of the vector field X at p, as does the directional derivative Xf = df (X)
for scalar functions. (This will be seen from the discussion of special types of
tensor fields below.) We shall momentarily see how to compute it. The following
rules follow immediately from the definition.
3.4.5 Lemma. Let X be a vector field, S, T tensor fields. Then
a) LX (S + T ) = LX (S) + LX (T )
b) LX (S · T ) = LX (S) · T + S · LX (T )
c) LX (ϕ ∧ ψ) = LX (ϕ) ∧ ψ + ϕ ∧ LX (ψ)
d) F ∗ (LX (T )) = LF ∗ X (F ∗ T )
Explanation. In (a) we assume that S and T are of the same type, so that
the sum is defined. In (b) we use the symbol S · T to denote any (partial)
contraction with respect to some components. In (c) ϕ and ψ are differential
forms. (d) requires that F be a diffeomorphism so that F ∗ T and F ∗ X are
defined.
Proof. (a) is clear from the definition. (b) follows directly from the definition
(14) of LX T (p) as a limit, just like the product rule for scalar functions, as
follows. Let f (t) = exp(tX)∗ S and g(t) = exp(tX)∗ T . We have
(1/t)[f (t) · g(t) − f (0) · g(0)] = (1/t)[f (t) − f (0)] · g(t) + f (0) · (1/t)[g(t) − g(0)], (15)
which gives (b) as t → 0. (c) is proved in the same way. To prove (d) it suffices
to take for T scalar functions, vector fields, and covector fields, since any
tensor is built from these using sums and tensor products, to which the rules
(a) and (b) apply. The details are left as an exercise.
Remark. The only property of the “product“ S · T needed to prove a product
rule using (15) is its R–bilinearity.
3.4.6 Lie derivative of scalar functions. Let f be a scalar function. From
(13) we get
LX f (p) = (d/dt)|t=0 exp(tX)∗ f (p) = (d/dt)|t=0 f (exp(tX)p) = dfp (X(p)) = Xf (p),
This is easily computed with the help of the formula (2), which says that
exp(tX)∗ f ∼ Σk (tk /k!) X k f. (17)
One finds that
[exp(tX)∗ ◦ Y ◦ exp(−tX)∗ ]f
= [1 + tX + · · · ]Y [1 − tX + · · · ]f
= [Y + t(XY − Y X) + · · · ]f
Thus (16) gives the basic formula
is a first–order differential operator, so the left side of (18) has order one as
differential operator, while the right side appears to have order two. The
explanation comes from the following lemma.
3.4.8 Lemma. Let X, Y be two vector fields on M . There is a unique vector
field, denoted [X, Y ], so that [X, Y ] = XY − Y X as operators on functions
defined on open subsets of M .
Proof. Write locally in coordinates x1 , x2 , · · · , xn on M :
X = Σk X k ∂/∂xk , Y = Σk Y k ∂/∂xk .
By a simple computation using the symmetry of second partials one sees that
Z = Σk Z k ∂/∂xk satisfies Zf = XY f − Y Xf for all analytic functions f on the
coordinate domain if and only if
Z k = Σj (X j ∂Y k /∂xj − Y j ∂X k /∂xj ). (19)
This formula defines [X, Y ] on the coordinate domain. Because of the unique-
ness, the Z’s defined on the coordinate domains of two coordinate systems agree
on the intersection, from which one sees that Z is defined globally on the whole
manifold (assuming X and Y are).
The bracket operation (19) on vector fields is called the Lie bracket. Using this
the formula (18) becomes
LX Y = [X, Y ]. (20)
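Formula (19) is easy to evaluate symbolically. A small sketch with sympy (not part of the notes), for the rotation field of Example 3.4.3 and the translation field ∂/∂x on R2 :

```python
import sympy as sp

x, y = sp.symbols('x y')
q = [x, y]

def bracket(X, Y):
    """Components Z^k = sum_j (X^j dY^k/dx^j - Y^j dX^k/dx^j), formula (19)."""
    return [sp.simplify(sum(X[j]*sp.diff(Y[k], q[j]) - Y[j]*sp.diff(X[k], q[j])
                            for j in range(len(q))))
            for k in range(len(q))]

X = [-y, x]      # the rotation field -y d/dx + x d/dy
Y = [1, 0]       # the translation field d/dx
print(bracket(X, Y))   # [0, -1], i.e. [X, d/dx] = -d/dy
print(bracket(X, X))   # [0, 0]: skew-symmetry forces [X, X] = 0
```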
The Lie bracket [X, Y ] is evidently bilinear in X and Y and skew–symmetric:
X, Y ∈ L ⇒ aX + bY ∈ L (for all a, b ∈ R)
X, Y ∈ L ⇒ [X, Y ] ∈ L.
The derivative at t = 0 of the right side can again be evaluated as a limit using
(15) if we write it as a symbolic product f (t) · g(t) with f (t) = exp(tX)∗ Y and
g(t) = exp(tX)∗ ϕ. This gives
LX (iY ϕ) = i[X,Y ] ϕ + iY LX ϕ,
Xθ(Y ) = X i ∂/∂xi (θj Y j ) = X i Y j ∂θj /∂xi + X i θj ∂Y j /∂xi .
Interchanging X and Y and subtracting gives
Xθ(Y ) − Y θ(X) = (X i Y j − Y i X j ) ∂θj /∂xi + θj (X i ∂Y j /∂xi − Y i ∂X j /∂xi ) = dθ(X, Y ) + θ([X, Y ])
as desired.
The following formula is often useful.
3.4.13 Lemma (Cartan’s Homotopy Formula). For any vector field X,
LX = d ◦ iX + iX ◦ d (23)
as operators on forms.
Proof. We first verify that the operators A on both sides of this relation satisfy
and similarly
Adding the last two equations one gets the relation (24) for A = d ◦ iX + iX ◦ d.
Since any form can be built from scalar functions f and 1–forms θ using sums
and wedge products one sees from (24) that it suffices to show that (23) is true
for these. For a scalar function f we have:
LX f = df (X) = iX df
[Figure: a car at position (x, y) with steering angle θ.]
Let l = AB be the length of the tie rod (if that is the name of the thing connecting
the front and rear axles). Then CD = l too, since the tie rod does not change
length (in non−relativistic mechanics). It is readily seen that CE = l + o(h),
and since DE = h sin θ + o(h), the angle BCD (which is the increment in φ) is
(h sin θ)/l + o(h) while θ remains the same. Let us choose units so that l = 1.
Then
Drive = cos(φ + θ) ∂/∂x + sin(φ + θ) ∂/∂y + sin θ ∂/∂φ. (28)
By (27) and (28),
[Steer, Drive] = − sin(φ + θ) ∂/∂x + cos(φ + θ) ∂/∂y + cos θ ∂/∂φ. (29)
Let
∂ ∂
Slide = − sin φ + cos φ ,
∂x ∂y
∂
Rotate = .
∂φ
Then the Lie bracket of Steer and Drive is equal to Slide + Rotate at θ = 0, and
generates a flow which is the simultaneous action of sliding and rotating. This
motion is just what is needed to get out of a tight parking spot. By formula (26)
this motion may be approximated arbitrarily closely, even with the restrictions
−θmax < θ < θmax with θmax arbitrarily small, in the following way: steer,
drive, reverse steer, reverse drive, steer, drive, reverse steer, · · · . What makes the
process so laborious is the square roots in (26).
Let us denote the Lie bracket (29) of Steer and Drive by Wriggle. Then further
simple computations show that we have the commutation relations
(30)
and the commutator of Slide with Steer, Drive and Wriggle is zero. Thus the
four vector fields span a four-dimensional Lie algebra over R.
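The relations above can be checked mechanically with formula (19) for the bracket. A sympy sketch (not part of the notes); it assumes, as in the discussion above, that Steer = ∂/∂θ, with each field represented by its components in the coordinates (x, y, φ, θ):

```python
import sympy as sp

x, y, phi, theta = sp.symbols('x y phi theta')
q = [x, y, phi, theta]

def bracket(X, Y):
    # components of [X, Y] by formula (19)
    return [sp.simplify(sum(X[j]*sp.diff(Y[k], q[j]) - Y[j]*sp.diff(X[k], q[j])
                            for j in range(4))) for k in range(4)]

steer = [0, 0, 0, 1]                                          # d/dtheta (assumed)
drive = [sp.cos(phi + theta), sp.sin(phi + theta), sp.sin(theta), 0]
slide = [-sp.sin(phi), sp.cos(phi), 0, 0]
rotate = [0, 0, 1, 0]

wriggle = bracket(steer, drive)
print(wriggle)   # components of formula (29): -sin(phi+theta), cos(phi+theta), cos(theta), 0
# at theta = 0, Wriggle = Slide + Rotate:
print([sp.simplify(w.subs(theta, 0) - s - r)
       for w, s, r in zip(wriggle, slide, rotate)])           # [0, 0, 0, 0]
```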
To get out of an extremely tight parking spot, Wriggle is insufficient because it
may produce too much rotation. The last commutation relation shows, however,
that one may get out of an arbitrarily tight parking spot in the following way:
wriggle, drive, reverse wriggle, (this requires a cool head), reverse drive, wriggle,
drive, · · · .
EXERCISES 3.4
1. Prove (8).
2. Prove (9).
3. Prove (11).
4. Prove (13).
5. Prove the formula for exp(tX) in example 3.4.3.
6. Fix a non–zero vector v ∈ R3 and let X be the vector field on R3 defined by
X(p) = v × p (cross-product).
exp(tX)e3 = e3 .
where c = ‖v‖. [For the purpose of this problem, an ordered orthonormal basis
(e1 , e2 , e3 ) is defined to be right-handed if it satisfies the usual cross–product
relation given by the “right-hand rule“, i.e.
e1 × e2 = e3 , e2 × e3 = e1 , e3 × e1 = e2 .
LX (vol) = (div X) vol
where div X = ∂P/∂x + ∂Q/∂y + ∂R/∂z as usual.
18. Prove the following formula for the exterior derivative dϕ of a (k − 1)–form
ϕ. For any k vector fields X1 , · · · , Xk ,
dϕ(X1 , · · · , Xk ) = Σkj=1 (−1)j+1 Xj ϕ(X1 , · · · , X̂j , · · · , Xk ) + Σi<j (−1)i+j ϕ([Xi , Xj ], X1 , · · · , X̂i , · · · , X̂j , · · · , Xk )
where the terms with a hat are to be omitted. The term Xj ϕ(X1 , · · · , X̂j , · · · , Xk )
is the differential operator Xj applied to the scalar function ϕ(X1 , · · · , X̂j , · · · , Xk ).
[Suggestion. Use induction on k. Calculate the Lie derivative Xϕ(X1 , · · · , Xk )
of ϕ(X1 , · · · , Xk ) by the product rule. Use LX = diX + iX d.]
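For k = 2 the formula reads dϕ(X1, X2) = X1 ϕ(X2) − X2 ϕ(X1) − ϕ([X1, X2]). Here is a quick symbolic check on R3, with an arbitrarily chosen 1–form and vector fields (all particular choices below are illustrative):

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
q = [x, y, z]

def apply_vf(X, f):                     # X(f): directional derivative of f
    return sum(X[j]*sp.diff(f, q[j]) for j in range(3))

def bracket(X, Y):                      # Lie bracket [X, Y]
    return [apply_vf(X, Y[i]) - apply_vf(Y, X[i]) for i in range(3)]

phi  = lambda V: x*V[1]                 # the 1-form x dy, evaluated on a field V
dphi = lambda X, Y: X[0]*Y[1] - X[1]*Y[0]   # d(x dy) = dx ∧ dy

X = [y*z, sp.sin(x), 1]
Y = [x**2, z, y]
lhs = dphi(X, Y)
rhs = apply_vf(X, phi(Y)) - apply_vf(Y, phi(X)) - phi(bracket(X, Y))
assert sp.simplify(lhs - rhs) == 0
```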
3.4. LIE DERIVATIVES 171
[Suggestion. Show first that for any diffeomorphism F one has
∫_S F*$ = ∫_{F(S)} $.]
24. Prove (25).
25. Verify the bracket relation (29).
26. Verify the bracket relations (30).
27. In his Leçons of 1925–26 Élie Cartan introduces the exterior derivative on
differential forms this way (after discussing the formulas of Green, Stokes, and
Gauss in 3–space). The operation which produces all of these formulas can be
given in a very simple form. Take the case of a line integral ∫ $ over a closed
curve C. Let S be a piece of a 2–surface (in a space of n dimensions) bounded by
C. Introduce on S two interchangeable differentiation symbols d1 and d2 and
partition S into the corresponding family of infinitely small parallelograms. If p
is the vertex of one of these parallelograms (Fig. 2) and if p1, p2 are the vertices
obtained from p by the operations d1 and d2, then
[Fig. 2: an infinitesimal parallelogram with vertices p, p1, p2, p3.]
∫_p^{p1} $ = $(d1), ∫_p^{p2} $ = $(d2),
∫_{p1}^{p3} $ = ∫_p^{p2} $ + d1 ∫_p^{p2} $ = $(d2) + d1 $(d2),
∫_{p2}^{p3} $ = $(d1) + d2 $(d1).
Hence the integral ∫ $ over the boundary of the parallelogram equals
$(d1) + [$(d2) + d1 $(d2)] − [$(d1) + d2 $(d1)] − $(d2) = d1 $(d2) − d2 $(d1).
The last expression is the exterior derivative d$.
(a) Rewrite Cartan’s discussion in our language. [Suggestion. Take d1 and
d2 to be the two vector fields ∂/∂u1, ∂/∂u2 on the 2–surface S tangent to a
parametrization p = p(u1, u2).]
(b) Write a formula generalizing Cartan’s d$ = d1 $(d2) − d2 $(d1) in case
d1, d2 are not necessarily interchangeable. [Explain what this means.]
Chapter 4
Special Topics
(∇/ds)(dp/ds) = 0. (1)
GR3. In spacetime regions free of matter the metric g satisfies the field equations
Ric[g] = 0. (2)
Discussion. (1) The axiom GR1 is not peculiar to GR, but is at the basis
of virtually all physical theories since time immemorial. This does not mean
that it is cast in stone, but it is hard to see how mathematics as we know it
could be applied to physics without it. Newton (and everybody else before
Einstein) made further assumptions on how “space–time coordinates” are to
be defined (“inertial coordinates for absolute space and absolute time”). What
is especially striking in Einstein’s theory is that the four-dimensional manifold
spacetime is not further separated into a direct product of a three-dimensional
manifold “space” and a one-dimensional manifold “time”, and that there are no
further restrictions on the space-time coordinates beyond the general manifold
axioms.
(2) In the tangent space Tp M at any point p ∈ M one has a light cone, consisting
of null-vectors (ds2 = 0); it separates the vectors into timelike (ds2 < 0) and
spacelike (ds2 > 0). The set of timelike vectors at p consists of two connected
components, one of which is assumed to be designated as forward in a manner
174 CHAPTER 4. SPECIAL TOPICS
varying continuously with p. The world lines of all material objects are assumed
to have a forward, timelike direction.
(3) In general the parametrization p = p(t) of a world line is immaterial.
For a geodesic, however, the parameter t is determined up to t → at + b with
a, b =constant. In GR2 the parameter s is normalized so that g(dp/ds, dp/ds) =
−1 and is then unique up to s → s + so ; it is called proper time along the world
line. (It corresponds to parametrization by arclength for a positive definite
metric.)
(4) Relative to a coordinate system (x0, x1, x2, x3) on spacetime M the equations
(1) and (2) above read as follows.
d²x^k/ds² + Γ^k_ij (dx^i/ds)(dx^j/ds) = 0, (1′)
−∂Γ^k_ql/∂x^k + ∂Γ^k_qk/∂x^l − Γ^k_pk Γ^p_ql + Γ^k_pl Γ^p_qk = 0, (2′)
where
Γ^l_jk = (1/2) g^li (g_ki,j + g_ij,k − g_jk,i). (3)
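Formula (3) is easy to implement. As a sketch (not from the text), here it is applied to the round metric on S², where the nonzero Christoffel symbols are known in closed form:

```python
import sympy as sp

th, ph = sp.symbols('theta phi')
q = [th, ph]
g = sp.Matrix([[1, 0], [0, sp.sin(th)**2]])   # round metric: dθ² + sin²θ dφ²
ginv = g.inv()

def Gamma(l, j, k):
    # formula (3): Γ^l_jk = (1/2) g^{li} (g_{ki,j} + g_{ij,k} - g_{jk,i})
    return sp.simplify(sp.Rational(1, 2)*sum(
        ginv[l, i]*(sp.diff(g[k, i], q[j]) + sp.diff(g[i, j], q[k]) - sp.diff(g[j, k], q[i]))
        for i in range(2)))

assert sp.simplify(Gamma(0, 1, 1) + sp.sin(th)*sp.cos(th)) == 0   # Γ^θ_φφ = -sinθ cosθ
assert sp.simplify(Gamma(1, 0, 1) - sp.cos(th)/sp.sin(th)) == 0   # Γ^φ_θφ = cot θ
```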
Equation (1), or equivalently (1′), takes the place of Newton’s second law
d²x^α/dt² + (1/m) ∂ϕ/∂x^α = 0. (4)
(We keep the convention that the index α runs over α = 1, 2, 3 only.) Thus one
can think of the metric g as a sort of gravitational potential in spacetime.
The field equations. We wish to understand the relation of the field equations
(2) in Einstein’s theory to the field equation (6) of Newton’s theory. For this
we consider the mathematical description of the same physical situation in the
two theories, namely the world lines of a collection of objects (say stars) in a
given gravitational field. It will suffice to consider a one-parameter family of
objects, say p = p(r, s), where r labels the object and s is the parameter along
its world-line.
4.1. GENERAL RELATIVITY 175
Note that the left-hand side vanishes for j = 0, hence F0i = 0 for all i.
(b) In Newton’s theory we choose a Newtonian inertial frame (x0 , x1 , x2 , x3 ) so
that the equation of motion (4) reads
∂²x^α/∂t² = −(1/m) ∂ϕ/∂x^α. (4)
If we introduce a new parameter s by s = ct + t_o this becomes
c² ∂²x^α/∂s² = −(1/m) ∂ϕ/∂x^α.
By differentiation with respect to r we find
(∂²/∂s²)(∂x^α/∂r) = −(1/mc²) Σ_{β=1}^{3} (∂²ϕ/∂x^β∂x^α)(∂x^β/∂r). (10)
(c) Compare (8) and (10). The x^i = x^i(r, s) in (8) refer to the solutions of Einstein’s
equation of motion (7), the x^α = x^α(r, s) in (10) to solutions of Newton’s
equations of motion (4). Assume now that the local inertial frame for ∇ in (8)
is the same as the Newtonian inertial frame in (10) and assume further that
derivatives on the left-hand side of (8) and (10) agree at p0 . Then we find that
Σ_{j=0}^{3} F_jα (∂x^j/∂r) = −(1/mc²) Σ_{β=1}^{3} (∂²ϕ/∂x^β∂x^α)(∂x^β/∂r) (at p0) (11)
Newton’s field equations (5) say that the right-hand side of (17) is zero. Since
∂/∂x0 is an arbitrary timelike vector at po we find that Ric(g) = 0 at p0 .
In summary, the situation is this. As a manifold (no metric or special coordi-
nates), spacetime is the same in Einstein’s and in Newton’s theory. In Einstein’s
theory, write xi = xi (g; r, s) for a one-parameter family of solutions of the equa-
tions of motion (1) corresponding to a metric g written in a local inertial frame
for ∇ at po . In Newton’s theory write xα = xα (ϕ; r, t) for a one parameter family
of solutions of the equations of motion (4) corresponding to a static potential ϕ
written in a Newtonian inertial frame. Assume the two inertial frames represent
the same coordinate system on spacetime and require that the derivatives on
the left-hand sides of (8) and (10) agree at po . Then the relation (17) between
g and ϕ must hold. In particular, if ϕ satisfies Laplace’s equation (5) at po then
g satisfies Einstein’s field equation (2) at po .
Remark. In Newton’s theory, Laplace’s equation ∆ϕ = 0 gets replaced by
Poisson’s equation ∆ϕ = ρ in the presence of matter, where ρ depends on the
matter. In Einstein’s theory the equation Ric(g) = 0 gets analogously replaced
by Ric(g) = T , where T is a 2–tensor, called the energy-momentum tensor. But
this tensor is not an exact, fundamental representation of matter, only a rough,
macroscopic approximation. Einstein once put it something like this. “The field
equations Ric(g) = T have the aspect of an edifice whose left wing is constructed
from marble and whose right wing is constructed from inferior lumber.”
Thus the observer e would say the object w moves through a space–displacement
d during a time–duration τ relative to himself. Hence e would consider d/τ as
the relative space–velocity of w. For a light ray one has w² = 0, so −τ² + d² = 0
and d²/τ² = 1, i.e. the observer e is using units so that the velocity of light is 1.
Now suppose we have another observer e0 ∈ Tp M at p, again normalized to
(e0 , e0 ) = −1. Then e0 will split w as
w = τ 0 e0 + d0 .
The map (τ, d) → (τ 0 , d0 ) defined by the condition τ e+d = τ 0 e0 +d0 is the Lorentz
transformation which relates the (infinitesimal) space–time displacements w as
observed by e and e0 . (The term Lorentz transformation is also applied to any
linear transformation of W which preserves the inner product.) It is easy to
find a formula for it. Write
e0 = ae + av
so that v is the space–velocity of e0 relative to e. Taking inner products of this
equation gives
−1 = a2 (−1 + v 2 )
where v 2 = (v, v). So
a2 = (1 − v 2 )−1 .
d² = (d′)² − (d, e′)².
Since d, v are both orthogonal to e and lie in the same 2–plane, d/|d| = v/|v|. Substituting
this in the previous equation we get
d² = (d′)² − (v, e′)² d²/v².
Since e′ = ae + av we find (v, e′) = av², hence d² = (d′)² − d²a²v², or
d² = (d′)²(1 − v²).
This gives the desired relation between the relative lengths of the stick:
l = l′ √(1 − v²).
Thus l ≤ l0 and this is known as the Lorentz contraction. From a purely math-
ematical point of view all of this is elementary vector algebra, but its physical
interpretation is startling.
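A small numeric sketch of this vector algebra (units with the velocity of light equal to 1; the particular numbers are illustrative):

```python
import math

def ip(u, w):                 # Minkowski inner product on R^2, signature (-, +)
    return -u[0]*w[0] + u[1]*w[1]

v = 0.6                       # space-velocity of the observer e' relative to e
a = 1/math.sqrt(1 - v**2)     # a^2 = (1 - v^2)^(-1)
e, ep = (1.0, 0.0), (a, a*v)  # e' = a e + a v
assert abs(ip(e, e) + 1) < 1e-12 and abs(ip(ep, ep) + 1) < 1e-12

l0 = 2.0                      # rest length of the stick
l = l0*math.sqrt(1 - v**2)    # Lorentz contraction: l <= l0
assert l < l0 and abs(l - 1.6) < 1e-12
```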
4.2. THE SCHWARZSCHILD METRIC 179
(d) Let S = S1 be the unit circle, ds2S the standard metric on S, and ds2C
the induced metric on C. In a neighbourhood of po there is a local diffeomorphism
C × S → R3 so that
The first three statements are again geometrically obvious. For the last one
recall the expression for the Euclidean metric in cylindrical coordinates:
This example shows that the same group G can be realized as a transformation
group on different spaces M : we say G acts on M and we write a · p for the
action of the transformation a ∈ G on the point p ∈ M. The orbit of a point
p ∈ M under G is the set
G · p = {a · p | a ∈ G}
of all transforms of p by elements of G. In the example G =SO(3), M = R3 the
orbits are the spheres of radius ρ > 0 together with the origin {0}.
4.2.2 Lemma. Let G be a group acting on a space M. Then M is the
disjoint union of the orbits of G.
Proof . Let G · p and G · q be two orbits. We have to show that they are either
disjoint or identical. So suppose they have a point in common, say b · p = c · q
for some b, c ∈ G. Then p = b−1 c · q ∈ G · q, hence a · p = ab−1 c · q ∈ G · q for
any a ∈ G, i.e. G · p ⊂ G · q. Similarly G · q ⊂ G · p, hence G · p = G · q.
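The lemma is easy to see in a finite example. Here a two–element group acts on a three–point set M (a toy illustration, not from the text); the orbits partition M:

```python
# the group G = {identity, s}, where s swaps the points 0 and 2 of M = {0, 1, 2}
G = [lambda p: p, lambda p: {0: 2, 1: 1, 2: 0}[p]]
M = [0, 1, 2]

# orbit of p: G·p = {a·p : a in G}
orbits = {frozenset(g(p) for g in G) for p in M}
assert orbits == {frozenset({0, 2}), frozenset({1})}
# pairwise disjoint with union M, as the lemma asserts
assert sorted(x for orb in orbits for x in orb) == M
```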
From now on M will be space–time, a 4–dimensional manifold with a Riemann
metric g of signature (−, +, +, +).
4.2.3 Definition. The metric g is spherically symmetric if M admits an action
of SO(3) by isometries of g so that every orbit of SO(3) in M is isometric to
a sphere in R3 of some radius r > 0 by an isometry preserving the action of
SO(3).
Remarks. a) We exclude the possibility r = 0, i.e. the spheres cannot de-
generate to points. Let us momentarily omit this restriction to explain where
the definition comes from. The centre of symmetry in our space–time should
consist of a world line L (e.g. the world line of the centre of the sun) where
r = 0. Consider the group G of isometries of the metric g which leaves L
pointwise fixed. For a general metric g this group will consist of the identity
transformation only. In any case, if we fix a point p on L, then G acts on the
3–dimensional space of tangent vectors at p orthogonal to L, so we can think
of G as a subgroup of SO(3). If it is all of SO(3) (for all points p ∈ L) then
we have spherical symmetry as in the definition in the region off L. However
in the definition, we do not postulate the existence of such a world line (and in
fact explicitly exclude it from consideration by the condition r > 0), since the
metric (gravitational field) might not be defined at the centre, in analogy with
Newton’s theory.
[Fig. 2: the curve C(p0) and the orbit sphere S(p0).]
the action of I(po) on M looks locally like its linear action on Tpo M. (*)
It may help to keep this in mind for the proof of the following lemma.
4.2.5 Lemma. C intersects each sphere S(q), q ∈ C, orthogonally.
Proof. Consider the action of the subgroup I(po) of SO(3) fixing po. Since
the action of SO(3) on the orbit S(po) is equivalent to its action on an ordinary
sphere in Euclidean 3–space, the group I(po) is just the rotation group SO(2)
in the 2–dimensional plane Tpo S(po) ≈ R2. It maps C(po) into itself as well as
each orbit S(q), hence also C(po) ∩ S(q) = {q}, i.e. I(po) fixes all q ∈ C(po),
i.e. I(po) ⊂ I(q). Thus C(po) is contained in the set C′(po) of points fixed
by I(po). Conversely C′(po) ⊂ C(po) and C′(po) ∩ S(po) = C(po) ∩ S(po), since
both reduce to {po} near po (which is the only fixed point of I(po) near po).
It follows that C′(po) = C(po). But I(po) ⊂ I(q) implies C′(po) = C′(q),
hence C(po) ⊃ C(q). Interchanging po and q in this argument we find also
C(po) ⊂ C(q). So C(po) = C(q) and it intersects S(q) orthogonally.
In summary, the situation is now the following. Under the local diffeomorphism
C × S → M the metric ds² on M decomposes as an orthogonal sum
ds² = −A⁻²(τ, ρ)dτ² + B²(τ, ρ)dρ² + r²(τ, ρ)(dφ² + sin²φ dθ²) (1)
for some strictly positive functions A(τ, ρ), B(τ, ρ). We record the result as a
theorem.
4.2.6 Theorem. Any spherically symmetric metric is of the form (1) in
suitable coordinates τ, ρ, φ, θ.
So far we have not used Einstein’s field equations, just spherical symmetry.
The field equations in empty space say that Ric[g] = 0. For the metric (1) this
amounts to the following equations (as one can check by some unpleasant but
straightforward computations). Here ′ denotes ∂/∂ρ and · denotes ∂/∂τ.
ṙ′/r − Ḃ r′/(Br) + ṙ A′/(rA) = 0 (2)
1/r² − 2r″/(B²r) + 2B′r′/(B³r) − r′²/(B²r²) + 2AȦ ṙ/r + 2A² r̈/r + A² ṙ²/r² = 0 (3)
1/r² + 2AȦ ṙ/r + 2A² r̈/r + A² ṙ²/r² + (2/B²) r′A′/(rA) − r′²/(B²r²) = 0 (4)
−(1/B)(A′/(AB))′ − AȦ Ḃ/B − 2AȦ ṙ/r − A² B̈/B − 2A² r̈/r + (1/B²)(A′/A)² − (2/B²) r′A′/(rA) = 0 (5)
One has to distinguish three cases, according to the nature of the variable r =
r(p), the radius of the sphere S(p).
(a) r is a space–like variable, i.e. g(grad r, grad r) > 0,
(b) r is a time–like variable, i.e. g(grad r, grad r) < 0,
(c) r is a null variable, i.e. g(grad r, grad r) ≡ 0.
Here grad r = (g^ij ∂r/∂x^j)(∂/∂x^i) is the vector–field which corresponds to the
covector field dr by the Riemann metric g. It is orthogonal to the 3–dimensional
hypersurfaces r = constant. It is understood that we consider the cases where
one of these conditions (a)–(c) holds identically on some open set.
We first dispose of the exceptional case (c). So assume g(grad r, grad r) ≡ 0. This
means that −(Aṙ)² + (B⁻¹r′)² = 0, i.e.
r′/B = Aṙ (6)
up to sign, which may be adjusted by replacing the coordinate τ by −τ, if
necessary. But then one finds that ṙ′ determined by (2) is inconsistent with
(3), so this case is excluded.
Now consider the case (a) when g(grad r, grad r) > 0. In this case we can take
ρ = r as ρ–coordinate on C. Then ṙ = 0 and r′ = 1. The equations (2)–(5)
now simplify as follows.
Ḃ = 0 (2′)
1/r² − (2/B)(1/B)′(1/r) − 1/(B²r²) = 0 (3′)
1/r² + (2/B²) A′/(rA) − 1/(B²r²) = 0 (4′)
−(1/B)(A′/(AB))′ + (1/B²)(A′/A)² − (2/B²) A′/(rA) = 0 (5′)
Equation (2′) shows that Ḃ = 0, i.e. B = B(r). Equation (4′) differentiated
with respect to τ shows that (Ȧ/A)′ = 0, i.e. (log A)˙′ = 0. Hence A =
Ã(r)F(τ). Now replace τ by t = t(τ) so that dt = dτ/F(τ) and then drop the
˜; we get A = A(r). Equation (3′) simplifies to (r/B²)′ = 1.
Remark. The discussion above does not accurately reflect the historical devel-
opment. Schwarzschild assumed from the outset that the metric is of the form
(1) with A and B independent of t, so Birkhoff’s theorem was not immediate
from Schwarzschild’s result. The definition of spherical symmetry used here
came later.
The Schwarzschild metric (7) can be written in many other ways by introducing
other coordinates τ, ρ instead of t, r (there is no point in changing the coordinates
φ, θ on the sphere). For example, one can write the metric (7) in the form
ds² = −(1 − 2m/r) dv dw + r²(dφ² + sin²φ dθ²). (8)
The coordinates v, w are related to Schwarzschild’s t, r by the equations
v = t + r∗ , w = t − r∗
with
r* = ∫ dr/(1 − 2m/r) = r + 2m log(r − 2m).
In terms of these v, w, Schwarzschild’s r is determined by
(1/2)(v − w) = r + 2m log(r − 2m).
Another possibility due to Kruskal (1960) is
ds² = (16m² e^{−r/2m}/r)(−dt̃² + dx̃²) + r²(dφ² + sin²φ dθ²). (9)
The coordinates t̃, x̃ are defined by
t̃ = (1/2)(e^{v/4m} − e^{−w/4m}), x̃ = (1/2)(e^{v/4m} + e^{−w/4m})
and r must satisfy
t̃² − x̃² = −(r − 2m)e^{r/2m}. (10)
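One can check numerically that the coordinates v, w, t̃, x̃ fit the relation (10); the sketch below assumes the conventions t̃ = (1/2)(e^{v/4m} − e^{−w/4m}), x̃ = (1/2)(e^{v/4m} + e^{−w/4m}):

```python
import math

m = 1.0
t, r = 0.3, 3.0                              # an illustrative point with r > 2m
rstar = r + 2*m*math.log(r - 2*m)
v, w = t + rstar, t - rstar                  # null coordinates
tt = 0.5*(math.exp(v/(4*m)) - math.exp(-w/(4*m)))
xx = 0.5*(math.exp(v/(4*m)) + math.exp(-w/(4*m)))
# relation (10): t~^2 - x~^2 = -(r - 2m) e^{r/2m}
assert abs((tt**2 - xx**2) + (r - 2*m)*math.exp(r/(2*m))) < 1e-9
```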
These coordinates t̃, x̃ must be restricted by
1. Generalize the assertion “It follows that C′(po) = C(po).” in the proof of
Lemma 4.2.5 to a more general situation where S(po), C(po) and C′(po) are
replaced by three submanifolds S, C, C′ of an arbitrary manifold M satisfying
suitable hypotheses. (State carefully the hypotheses required.)
‖ap‖ = ‖p‖
‖T(q) − T(p)‖ = ‖q − p‖
T(p) = ap + b
v1 × v2 = v3 , v3 × v1 = v2 , v2 × v3 = v1 .
v1 · (v2 × v3 ) = det[v1 , v2 , v3 ]
which can in fact be used to define the cross–product. We need some facts
about the matrix exponential function.
4.3.2 Theorem. The series
exp X := Σ_{k=0}^{∞} (1/k!) X^k
converges for any X ∈ Mn(R) and defines a C^∞ map exp : Mn(R) → Mn(R)
which maps an open neighbourhood U0 of 0 diffeomorphically onto an open
neighbourhood U1 of 1.
Proof. We have termwise
Σ_{k=0}^{∞} (1/k!)‖X^k‖ ≤ Σ_{k=0}^{∞} (1/k!)‖X‖^k. (1)
It follows that the series for exp X converges in norm for all X and defines a
C ∞ function. We want to apply the Inverse Function Theorem to X → exp X.
From the series we see that
exp X = 1 + X + o(X)
and this implies that the differential of exp at 0 is the identity map X → X.
4.3.3 Notation. We write a → log a for the local inverse of X → exp X. It is
defined for a ∈ U1. It can in fact be written as the log series
log a = Σ_{k=1}^{∞} ((−1)^{k−1}/k)(a − 1)^k
a) d/dt (exp tX) = X(exp tX) = (exp tX)X.
b) If XY = Y X, then exp(X + Y) = exp X exp Y.
c) For all X,
exp X exp(−X) = 1.
d) If a(t) ∈ Mn (R) is a differentiable function of t ∈ R satisfying ȧ(t) = Xa(t)
then a(t) = exp(tX)a(0).
Proof. a) Compute:
d/dt exp tX = d/dt Σ_{k=0}^{∞} (t^k/k!)X^k = Σ_{k=1}^{∞} (kt^{k−1}/k!)X^k = Σ_{k=0}^{∞} (t^k/k!)X^{k+1}
= X Σ_{k=0}^{∞} (t^k/k!)X^k = X(exp tX) = (exp tX)X.
d/dt[(exp(−tX))a(t)] = (d/dt exp(−tX))a(t) + (exp(−tX))(d/dt a(t))
= (exp(−tX))(−X)a(t) + (exp(−tX))(Xa(t)) ≡ 0
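Parts b) and c) are easy to check numerically — a sketch using scipy's matrix exponential, with a random X and Y chosen as a polynomial in X so that XY = YX:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 3))
Y = 0.5*X + 0.3*X @ X          # a polynomial in X, hence XY = YX

# b) XY = YX implies exp(X + Y) = exp X exp Y
assert np.allclose(expm(X + Y), expm(X) @ expm(Y))
# c) exp X exp(-X) = 1
assert np.allclose(expm(X) @ expm(-X), np.eye(3))
```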
4.3. THE ROTATION GROUP SO(3) 191
Note that so(3) is a 3–dimensional vector space. For example, the following
three matrices form a basis (rows separated by semicolons).
E1 = [ 0 0 0 ; 0 0 −1 ; 0 1 0 ], E2 = [ 0 0 1 ; 0 0 0 ; −1 0 0 ], E3 = [ 0 −1 0 ; 1 0 0 ; 0 0 0 ].
au1 = cos αu1 + sin αu2 , au2 = − sin αu1 + cos αu2 , au3 = u3 .
Xv = u × v (4.2)
for all v ∈ R3 .
Proof. The eigenvalues λ of a matrix a satisfying a*a = 1 satisfy |λ| = 1.
For a real 3 × 3 matrix one eigenvalue will be real and the other two complex
conjugates. If in addition det a = 1, then the eigenvalues will have to be of the
form e^{±iα}, 1. The eigenvectors can be chosen to be of the form u2 ± iu1, u3 where
u1, u2, u3 are real. If the eigenvalues are distinct these vectors are automatically
orthogonal and otherwise may be so chosen. They may be assumed to be
normalized. The first relations then follow from
We now set V0 = so(3) ∩ U0 and V1 = SO(3) ∩ U1 and assume U0 chosen so that
X ∈ U0 ⇒ X* ∈ U0.
4.3.7 Corollary. exp maps so(3) onto SO(3) and gives a bijection from V0
onto V1 .
Proof. The first assertion follows from the theorem. If exp X ∈SO(3), then
a = ao exp X, a ∈ ao V1 , X ∈ V0
where
a3 (θ) = exp(θE3 ), a2 (φ) = exp(φE2 )
and 0 ≤ θ, ψ < 2π, 0 ≤ φ ≤ π. Furthermore, (θ, φ, ψ) is unique as long as
φ 6= 0, π.
Proof. Consider the rotation–action of SO(3) on the sphere S2. The geographical
coordinates (θ, φ) satisfy
p = a3(θ)a2(φ)e3.
Thus for any a ∈ SO(3) one has an equation
ae3 = a3(θ)a2(φ)e3
for some (θ, φ) subject to the above inequalities and unique as long as φ ≠ 0.
This equation implies that a = a3 (θ)a2 (φ)b for some b ∈SO(3) with be3 = e3 .
194 CHAPTER 4. SPECIAL TOPICS
θ, φ, ψ are the Euler angles of the element a = a3 (θ)a2 (φ)a3 (ψ). They form a
coordinate system on SO(3) with domain consisting of those a’s whose Euler
angles satisfy 0 < θ, ψ < 2π, 0 < φ < π (strict inequalities). To prove this it
suffices to show that the map R3 → M3 (R), (θ, φ, ψ) → a3 (θ)a2 (φ)a3 (ψ), has
an injective differential for these (θ, φ, ψ). The partials of a = a3 (θ)a2 (φ)a3 (ψ)
are given by
a⁻¹ ∂a/∂θ = −sin φ cos ψ E1 + sin φ sin ψ E2 + cos φ E3
a⁻¹ ∂a/∂φ = sin ψ E1 + cos ψ E2
a⁻¹ ∂a/∂ψ = E3.
The matrix of coefficients of E1, E2, E3 on the right has determinant −sin φ,
hence the three elements of so(3) given by these equations are linearly independent
as long as sin φ ≠ 0. This proves the desired injectivity of the differential
on the domain in question.
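The recovery of (θ, φ) from a = a3(θ)a2(φ)a3(ψ) via p = ae3 can be sketched numerically (the helper names below are mine):

```python
import numpy as np

def a3(t):                                  # rotation exp(t E3), about e3
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0.], [s, c, 0.], [0., 0., 1.]])

def a2(t):                                  # rotation exp(t E2), about e2
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, 0., s], [0., 1., 0.], [-s, 0., c]])

theta, phi, psi = 1.1, 0.8, 2.3             # Euler angles with 0 < phi < pi
a = a3(theta) @ a2(phi) @ a3(psi)
p = a @ np.array([0., 0., 1.])              # p = a e3 = a3(theta) a2(phi) e3
phi_rec = np.arccos(p[2])
theta_rec = np.arctan2(p[1], p[0])
assert np.allclose([theta_rec, phi_rec], [theta, phi])
```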
EXERCISES 4.3
1. Fix po ∈ S 2 . Define F :SO(3) → S 2 by F (a) = apo . Prove that F is a
surjective submersion.
2. Identify T S 2 = {(p, v) ∈ S 2 × R3 : p ∈ S 2 , v ∈ Tp S 2 ⊂ R3 }. The circle
bundle over S 2 is the subset
S = {(p, v) ∈ T S2 : ‖v‖ = 1}
of T S 2 .
a) Show that S is a submanifold of T S 2 .
b) Fix (po , vo ) ∈ S. Show that the map F :SO(3) → S, a → (apo , avo ) is a
diffeomorphism of SO(3) onto S.
3. a) For u ∈ R3 , let Xu be the linear transformation of R3 given by Xu (v) :=
u × v (cross product).
Show that Xu ∈so(3) and that u → Xu is a linear isomorphism R3 →so(3).
b) Show that Xau = aXu a−1 for any u ∈ R3 and a ∈SO(3).
c) Show that exp Xu is the right–handed rotation about u with angle ‖u‖.
[Suggestion. Use a right–handed o.n. basis u1, u2, u3 with u3 = u/‖u‖, assuming
u ≠ 0.]
d) Show that u → exp Xu maps the closed ball {‖u‖ ≤ π} onto SO(3), and is one–to–
one except that it maps antipodal points ±u on the boundary sphere {‖u‖ = π}
into the same point in SO(3). [Suggestion. Argue geometrically, using (c).]
4. Prove the formulas for the partials of a = a3 (θ)a2 (φ)a3 (ψ). [Suggestion. Use
the product rule on matrix products and the differentiation rule for exp(tX).]
(2) 0 = d(dei) = d($i^j ej) = (d$i^j)ej − $i^j ∧ dej = (d$i^j)ej − $i^j ∧ $j^l el = (d$i^j − $i^k ∧ $k^j)ej.
(3) 0 = d(ei, ej) = (dei, ej) + (ei, dej) = ($i^k ek, ej) + (ei, $j^k ek) = $i^j + $j^i.
From now on S is a surface in R3 .
4.4.3 Interpretation. The forms $i and $ij have the following interpretations.
a) For any v ∈ Tp S we have dp(v) = v considered as vector in Tp R3. Thus
v = $1(v)e1 + $2(v)e2.
This shows that $1 , $2 are just the components of a general tangent vector
v ∈ Tp S to S with respect to the basis e1 , e2 of Tp S. We shall call the $i the
component forms of the frame. We note that $1 , $2 depend only on e1 , e2 .
b) For any vector field X = X^i ei along S in R3 and any tangent vector v ∈ Tp S
the (componentwise) directional derivative Dv X (= covariant derivative in R3)
is
Dv X = dX^i(v) ei + X^i dei(v).
Convention. Greek indices α, β, · · · run only over {1, 2}, Roman indices i, j, k
run over {1, 2, 3}.
(This is not important in the above equation, since a vector field X on S has
normal component X3 = 0 anyway.) In particular we have
∇v X = (dX^α(v) + X^β $β^α(v)) eα.
This shows that the $βα determine the connection ∇ on S. We shall call them
the connection forms for the frame.
4.4.4 Lemma. a) The Darboux frame (e1 , e2 , e3 ) satisfies the equations of
motion
dp = $1 e1 + $2 e2 + 0
de1 = 0 + $12 e2 + $13 e3
de2 = −$12 e1 + 0 + $23 e3
de3 = −$13 e1 − $23 e2 + 0
b) The forms $i, $ij satisfy Cartan’s structural equations
d$1 = −$2 ∧ $12, d$2 = $1 ∧ $12
$1 ∧ $13 + $2 ∧ $23 = 0
d$12 = −$13 ∧ $23.
Proof. a) In the first equation, $3 = 0, because dp/dt = dp(ṗ) is tangential to
S.
In the remaining three equations $ij = −$ji by (CS3).
b) This follows from (CS1–3) together with the relation $3 = 0.
4.4.5 Proposition. The Gauss curvature K satisfies d$12 = −K $1 ∧ $2.
Proof. We use N = e3 as unit normal for S. Recall that the Gauss map
N : S → S 2 satisfies
N ∗ (areaS 2 ) = K(area).
4.4. CARTAN’S MOBILE FRAME 199
This gives
N*(area_{S2}) = $13 ∧ $23.
Hence
$13 ∧ $23 = K $1 ∧ $2.
From Cartan’s structural equation d$12 = −$13 ∧ $23 we therefore get
d$12 = −K$1 ∧ $2 .
where X, θ, e1, e2 are all considered functions of t. The angle θ(t) can be thought
of as the polar angle of X(t) relative to the vector e1(p(t)). It is defined only up to
a constant multiple of 2π. To say something about it, we consider the derivative
of the scalar product (X, ei) at values of t where p(t) is differentiable. We have
d/dt (X, ei) = (∇X/dt, ei) + (X, ∇ei/dt).
This gives
d d
(cos θ) = 0 + (sin θ) $12 (ṗ), (sin θ) = 0 − (cos θ) $12 (ṗ)
dt dt
hence
θ̇ = −$12 (ṗ).
Using this relation together with (4) and Stokes’ Theorem we find that the total
variation of θ around C, given by
∆C θ := ∫_{t0}^{t1} θ̇(t) dt (*)
is
∆C θ = −∫_C $12 = ∫_D K $1 ∧ $2.
This relation holds provided D and C are oriented compatibly. We now assume
fixed an orientation on S and write the area element $1 ∧ $2 as dS. With this
notation,
∆C θ = ∫_D K dS. (**)
$12 = λ1 $1 + λ2 $2,
which gives
d$1 = λ1 $1 ∧ $2, d$2 = λ2 $1 ∧ $2, (5)
and we find
−K $1 ∧ $2 = d$12 = d(λ1 $1 + λ2 $2). (6)
We write the equations (5) symbolically as
λ1 = d$1 / ($1 ∧ $2), λ2 = d$2 / ($1 ∧ $2).
(The “quotients” denote functions λ1, λ2 satisfying (5).) Then (6) becomes
−K $1 ∧ $2 = d( (d$1/($1 ∧ $2)) $1 + (d$2/($1 ∧ $2)) $2 ). (7)
e1 = A⁻¹ ∂p/∂s, e2 = B⁻¹ ∂p/∂t.
Then
$1 = A ds, $2 = B dt.
Hence
d$1 = −A_t ds ∧ dt, d$2 = B_s ds ∧ dt,
where the subscripts indicate partial derivatives. The formula (7) becomes
−KAB ds ∧ dt = d( −(A_t/B) ds + (B_s/A) dt ) = [ (A_t/B)_t + (B_s/A)_s ] ds ∧ dt.
Thus
K = −(1/AB)[ (A_t/B)_t + (B_s/A)_s ]. (8)
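Formula (8) can be tested on the unit sphere, where in geographic coordinates A = 1, B = sin s, and K should come out as 1 (a short symbolic sketch):

```python
import sympy as sp

s, t = sp.symbols('s t')
A, B = sp.Integer(1), sp.sin(s)    # unit sphere: ds^2 + sin^2(s) dt^2
# formula (8): K = -(1/AB) [ (A_t/B)_t + (B_s/A)_s ]
K = sp.simplify(-(sp.diff(sp.diff(A, t)/B, t) + sp.diff(sp.diff(B, s)/A, s))/(A*B))
assert K == 1
```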
A footnote. The theory of the mobile frame (repère mobile) on general Riemannian
manifolds is an invention of Élie Cartan, expounded in his wonderful
book Leçons sur la géométrie des espaces de Riemann. An adaptation to modern
tastes can be found in the book by his son Henri Cartan entitled Formes
différentielles.
EXERCISES 4.4
The following problems refer to a surface S in R3 and use the notation explained
above.
1. Show that the second fundamental form of S is given by the formula
Φ = $1 $13 + $2 $23 .
[Suggestion. First write $13 = a$1 + b$2 and $23 = d$1 + c$2. To show that d = b,
use the structural equation $1 ∧ $13 + $2 ∧ $23 = 0.]
b) Show that the normal curvature in the direction of a unit vector u ∈ Tp S
which makes an angle θ with e1 is
Introduction
produces the desired correction. (2) The Dirac field equations for ψ together with
Maxwell’s equations for the four potentials Aα of the electromagnetic field have
an invariance property formally identical to the one I called gauge invariance
in my theory of gravitation of 1918; the field equations remain invariant if one
replaces simultaneously
ψ by e^{iλ}ψ and Aα by Aα − ∂λ/∂x^α,
where λ denotes an arbitrary function of the place in four dimensional spacetime.
A factor e/cℏ is incorporated into Aα (−e is the charge of the electron, c is the
velocity of light, and ℏ is Planck’s constant divided by 2π). The relation of
“gauge invariance” to conservation of charge remains unchanged as well. But
there is one essential difference, crucial for agreement with empirical data in
that the exponent of the factor multiplying ψ is not real but purely imaginary.
ψ now takes on the role played by the ds of Einstein in that old gauge theory.
It appears to me that this new principle of gauge invariance, which is derived
from empirical data not speculation, indicates cogently that the electric field is
a necessary consequence of the electron field ψ and not of the gravitational field.
Since gauge invariance involves an arbitrary function λ it has the character of
“general” relativity and can naturally be understood only in that framework.
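The classical content of this gauge invariance — that Aα and Aα − ∂λ/∂x^α describe the same electromagnetic field strength Fαβ = ∂αAβ − ∂βAα — can be checked symbolically (the particular λ and Aα below are arbitrary illustrations):

```python
import sympy as sp

t, x, y, z = sp.symbols('t x y z')
X = [t, x, y, z]
lam = sp.sin(t)*x*y + z**2                 # an arbitrary gauge function
A  = [t*x, y*z, sp.cos(z), x**2]           # some four-potential A_alpha
A2 = [A[a] - sp.diff(lam, X[a]) for a in range(4)]   # gauge-transformed potential

def F(B):                                  # field strength F_ab
    return [[sp.diff(B[b], X[a]) - sp.diff(B[a], X[b]) for b in range(4)]
            for a in range(4)]

assert all(sp.simplify(F(A)[a][b] - F(A2)[a][b]) == 0
           for a in range(4) for b in range(4))
```

The invariance is just the symmetry of second partial derivatives of λ.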
There are several reasons why I cannot believe in parallelism at a distance.
First of all, it a priori goes against my mathematical feeling to accept such an
artificial geometry; I find it difficult to understand the force by which the local
tetrads at different spacetime points might have been rigidly frozen in their
distorted positions. There are also two important physical reasons, it seems to
me. It is exactly the removal of the relation between the local tetrads which
changes the arbitrary gauge factor eiλ in ψ from a constant to an arbitrary
function; only this freedom explains the gauge invariance which in fact exists in
nature. Secondly, the possibility of rotating the local tetrads independently of
each other is equivalent with the symmetry of the energy–momentum tensor or
with conservation of energy–momentum, as we shall see.
In any attempt to establish field equations one must keep in mind that these
cannot be compared with experiment directly; only after quantization do they
provide a basis for statistical predictions concerning the behaviour of matter
particles and light quanta. The Dirac–Maxwell theory in its present form in-
volves only the electromagnetic potentials Aα and the electron field ψ. No
doubt, the proton field ψ 0 will have to be added. ψ, ψ 0 and Aα will enter as
functions of the same spacetime coordinates into the field equations and one
should not require that ψ be a function of a spacetime point (t, x, y, z) and ψ′ a
function of an independent spacetime point (t′, x′, y′, z′). It is natural to expect
that one of the two pairs of components of Dirac’s field represents the electron,
the other the proton. Furthermore, there will have to be two charge conserva-
tion laws, which (after quantization) will imply that the number of electrons
as well as the number of protons remains constant. To these will correspond a
gauge invariance involving two arbitrary functions.
We first examine the situation in special relativity to see if and to what
extent the increase in the number of components of ψ from two to four is necessary.
4.5. WEYL’S GAUGE THEORY PAPER OF 1929 205
(3) jα = ψ ∗ σα ψ.
ψ is taken as a 2–column, ψ ∗ is its conjugate transpose. σ0 is the identity
matrix; one has the equations
(4) σ1² = 1, σ2 σ3 = iσ1
and those obtained from these by cyclic permutation of the indices 1, 2, 3.
It is formally more convenient to replace the real variable j0 by the imaginary
variable ij0 . The Lorentz transformations Λ then appear as orthogonal trans-
formations of the four variables
j(0) = ij0 , j(α) = jα for α = 1, 2, 3.
Instead of (3) write
(5) j(α) = ψ ∗ σ(α)ψ.
so that σ(0) = iσ0, σ(α) = σα for α = 1, 2, 3. The transformation law of the
components of the ψ field relative to a Lorentz frame in spacetime is characterized
by the requirement that the quantities j(α) in (5) undergo the Lorentz
transformation Λ if the Lorentz frame does. A quantity of this type
represents the field of a matter particle, as follows from the spin phenomenon.
The j(α) are the components of a vector relative to the Lorentz frame e(α);
e(1), e(2), e(3) are real space–like vectors forming a left–handed Cartesian coordinate
system, e(0)/i is a real, time–like vector directed toward the future. The
transformation Λ describes the transition from one such Lorentz frame another,
and will be referred to as a rotation of the Lorentz frame. We get the same co-
efficients Λ(αβ) whether we make Λ act on the basis vectors e(α) of the tetrad
or on the components j(α):
j = Σ_α j(α)e(α) = Σ_α j′(α)e′(α),
if
e′(α) = Σ_β Λ(αβ)e(β), j′(α) = Σ_β Λ(αβ)j(β);
this follows from the orthogonality of Λ.
For what follows it is necessary to compute the infinitesimal transformation
(6) dψ = dL.ψ
which corresponds to an infinitesimal rotation dj = dΛ.j under j = j(ψ). The
transformation (6) is assumed to be normalized so that the trace of dL is 0. The
matrix dL depends linearly on thePdΛ; so we write
dL = 21 αβ dΛ(αβ)σ(αβ) = αβ dΛ(αβ)σ(αβ)
P
for certain complex 2 × 2 matrices σ(αβ) of trace 0 depending skew-symetrically
on (αβ) defined by this equation. The last sum runs only over the index pairs
(αβ) = (01), (02), (03); (23), (31), (12).
One must not forget that the skew-symmetric coefficients dΛ(αβ) are purely
imaginary for the first three pairs, real for the last three, but arbitrary otherwise.
One finds
(7) σ(23) = −(1/2i) σ(1), σ(01) = (1/2i) σ(1)
and two analogous pairs of equations resulting from cyclic permutation of the
indices 1, 2, 3. To verify this assertion one only has to check that the two
infinitesimal transformations dψ = dLψ given by
dψ = (1/2i) σ(1)ψ and dψ = (1/2) σ(1)ψ
correspond to the infinitesimal rotations dj = dΛ j given by
correspond to the infinitesimal rotations dj = dΛj given by
[Footnote: Sitzungsber. Preuß. Ak. Wissensch. 1928, p. 217, 224; 1920, p. 2. Einstein uses the letter h instead of e.]
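The correspondence (7) between spin transformations and rotations of j can be spot-checked numerically: exponentiating dψ = (1/2i)σ(1)ψ to a finite angle must leave j(1) = ψ*σ(1)ψ fixed and rotate (j(2), j(3)) within their plane. A sketch (numpy; the finite angle θ and the test spinor are illustrative choices, not Weyl's):

```python
import numpy as np

s1 = np.array([[0, 1], [1, 0]], dtype=complex)
s2 = np.array([[0, -1j], [1j, 0]])
s3 = np.array([[1, 0], [0, -1]], dtype=complex)

theta = 0.7   # illustrative finite rotation angle
# Exponentiate dpsi = (1/2i) sigma(1) psi:  U = exp(theta sigma(1)/2i).
U = np.cos(theta/2)*np.eye(2) - 1j*np.sin(theta/2)*s1

psi = np.array([0.3 + 0.4j, -0.8 + 0.1j])   # arbitrary test spinor
j  = [np.vdot(psi, s @ psi).real for s in (s1, s2, s3)]
psi2 = U @ psi
j2 = [np.vdot(psi2, s @ psi2).real for s in (s1, s2, s3)]

assert np.isclose(j[0], j2[0])                             # j(1) is fixed
assert np.isclose(j[1]**2 + j[2]**2, j2[1]**2 + j2[2]**2)  # (j(2), j(3)) rotates
```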
208 CHAPTER 4. SPECIAL TOPICS
The theory of gravitation must now be recast in this new analytical form. I
start with the formulas for the infinitesimal parallel transport determined by
the metric. Let the vector e(α) at the point x go to the vector e′(α) at the
infinitely close point x′ by parallel transport. The e′(α) form a tetrad at x′
arising from the tetrad e(α) at x by an infinitesimal rotation ω:
(8) ∇e(β) = Σ_γ ω(βγ).e(γ), [∇e(β) = e′(β) − e(β; x′)].
∇e(β) depends linearly on the vector v from x to x′; if its components dx^µ
equal v^µ = e^µ(α)v(α) then ω(βγ) = ω_µ(βγ)dx^µ equals
(9) ω_µ(βγ)v^µ = ω(α; βγ)v(α).
The infinitesimal parallel transport of a vector w along v is given by the
well-known equations
∇w = −Γ(v).w, i.e. ∇w^µ = −Γ^µ_ρ(v)w^ρ, Γ^µ_ρ(v) = Γ^µ_ρν v^ν;
the quantities Γ^µ_ρν are symmetric in ρν and independent of w and v. We
therefore have
e′(β) − e(β) = −Γ(v).e(β)
in addition to (8). Subtracting the two differences on the left-hand sides gives
the differential de(β) = e(β, x′) − e(β, x):
de^µ(β) + Γ^µ_ρ(v)e^ρ(β) = −ω(βγ).e^µ(γ),
or
(∂e^µ(β)/∂x^ν) e^ν(α) + Γ^µ_ρν e^ρ(β)e^ν(α) = −ω(α; βγ).e^µ(γ).
Taking into account that the ω(α; βγ) are skew-symmetric in β and γ, one can
eliminate the ω(α; βγ) and find the well-known equations for the Γ^µ_ρν. Taking
into account that the Γ^µ_ρν, and hence the Γ^µ(β, α) = Γ^µ_ρν e^ρ(β)e^ν(α),
are symmetric (in ρ, ν and in β, α respectively), one can eliminate the Γ^µ_ρν
and finds
(10) (∂e^µ(α)/∂x^ν) e^ν(β) − (∂e^µ(β)/∂x^ν) e^ν(α) = (ω(α; βγ) − ω(β; αγ)) e^µ(γ).
The left-hand side is a component of the Lie bracket of the two vector fields
e(α), e(β), which plays a fundamental role in Lie's theory of infinitesimal
transformations, denoted [e(α), e(β)]. Since ω(β; αγ) is skew-symmetric in α
and γ one has
[e(α), e(β)]^µ = (ω(α; βγ) + ω(β; γα)) e^µ(γ),
or
(11) ω(α; βγ) + ω(β; γα) = [e(α), e(β)](γ).
If one takes the three cyclic permutations of αβγ in these equations and adds
the resulting equations with the signs + − + then one obtains
2ω(α; βγ) = [e(α), e(β)](γ) − [e(β), e(γ)](α) + [e(γ), e(α)](β).
ω(α; βγ) is therefore indeed uniquely determined. The expression so found
satisfies all requirements, being skew-symmetric in β and γ, as is easily seen.
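That the + − + cyclic combination recovers ω from the bracket components in (11) is a purely algebraic identity; a quick numerical check with random coefficients (numpy; the array w below is a stand-in for ω(α; βγ)):

```python
import numpy as np

rng = np.random.default_rng(1)

# Random coefficients w[a, b, c] for omega(a; bc), made skew-symmetric
# in the last two indices as the text requires.
w = rng.normal(size=(4, 4, 4))
w = w - w.transpose(0, 2, 1)

# B[a, b, c] plays the role of the bracket component [e(a), e(b)](c) in (11).
B = w + np.einsum('bca->abc', w)

# The + - + cyclic combination of B recovers 2*omega:
lhs = B - np.einsum('bca->abc', B) + np.einsum('cab->abc', B)
assert np.allclose(lhs, 2*w)
```

Writing the three cyclic terms out, each unwanted ω cancels in pairs and ω(α; βγ) survives twice, which is exactly what the assertion verifies.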
In what follows we need in particular the contraction (sum over ρ)
ω(ρ; ρα) = [e(α), e(ρ)](ρ) = ∂e^µ(α)/∂x^µ − (∂e^µ(ρ)/∂x^ν) e^ν(α) e_µ(ρ).
Since 𝔢 = |det e^µ(α)| satisfies
−𝔢 d(1/𝔢) = d𝔢/𝔢 = e_ν(ρ) de^ν(ρ),
one finds
(12) ω(ρ; ρα) = 𝔢 ∂𝐞^µ(α)/∂x^µ,
where 𝐞(α) = e(α)/𝔢.
§3. The matter action. With the help of the parallel transport one can
compute not only the covariant derivative of vector and tensor fields, but also
that of the ψ field. Let ψ_a(x) and ψ_a(x′) [a = 1, 2] denote the components
relative to a local tetrad e(α) at the point x and at an infinitely close point
x′. The difference ψ_a(x′) − ψ_a(x) = dψ_a is the usual differential. On the
other hand, we parallel transport the tetrad e(α) at x to a tetrad e′(α) at x′.
Let ψ′_a denote the components of ψ at x′ relative to the tetrad e′(α). Both
ψ_a and ψ′_a depend only on the choice of the tetrad e(α) at x; they have
nothing to do with the tetrad at x′. Under a rotation of the tetrad at x the
ψ′_a transform like the ψ_a and the same holds for the differences ∇ψ_a =
ψ′_a − ψ_a. These are the components of the covariant differential ∇ψ of ψ.
The tetrad e′(α) arises from the local tetrad e(α) = e(α, x′) at x′ by the
infinitesimal rotation ω of §2. The corresponding infinitesimal transformation
θ is therefore of the form
θ = ½ ω(βγ)σ(βγ)
and transforms ψ_a(x′) into ψ′_a, i.e. ψ′ − ψ(x′) = θ.ψ. Adding dψ = ψ(x′) − ψ(x)
to this one obtains
(13) ∇ψ = dψ + θ.ψ.
Everything depends linearly on the vector v from x to x′; if its components
dx^µ equal v^µ = e^µ(α)v(α), then ∇ψ = ∇ψ_µ dx^µ equals
∇ψ_µ v^µ = ∇ψ(α)v(α), θ = θ_µ v^µ = θ(α)v(α).
We find
∇ψ_µ = (∂/∂x^µ + θ_µ)ψ or ∇ψ(α) = (e^µ(α) ∂/∂x^µ + θ(α))ψ
where
θ(α) = ½ ω(α; βγ)σ(βγ).
Generally, if ψ′ is a field of the same type as ψ then the quantities
ψ*σ(α)ψ′
are the components of a vector relative to the local tetrad. Hence
v′(α) = ψ*σ(α)∇ψ(β)v(β)
defines a linear transformation v ↦ v′ of the vector space at x which is
independent of the tetrad. Its trace
ψ*σ(α)∇ψ(α)
is therefore a scalar and the equation
(14) i m = ψ*σ(α)∇ψ(α)
defines a scalar density m whose integral
∫ m dx [dx = dx⁰dx¹dx²dx³]
can be used as the action of the matter field ψ in the gravitational field
represented by the metric defining the parallel transport. To find an explicit
expression for m we need to compute
(15) σ(α)θ(α) = ½ σ(α)σ(βγ).ω(α; βγ).
From (7) and (4) it follows that for α ≠ β
σ(β)σ(βα) = ½ σ(α) [no sum over β]
and for any odd permutation αβγδ of the indices 0 1 2 3,
σ(β)σ(γδ) = ½ σ(α).
These two kinds of terms give to the sum (15) the contributions
½ ω(ρ; ρα) = ½ ∂e^µ(α)/∂x^µ
resp.
(i/2) ϕ(α) := ω(β; γδ) + ω(γ; δβ) + ω(δ; βγ).
If αβγδ is an odd permutation of the indices 0 1 2 3, then according to (11),
(16) (i/2) ϕ(α) = [e(β), e(γ)](δ) + (cycl. perm. of βγδ) = ± Σ (∂e^µ(β)/∂x^ν) e^ν(γ) e_µ(δ).
The sum runs over the six permutations of βγδ with the appropriate signs (and
of course also over µ and ν). With this notation,
(17) m = (1/i){ψ* e^µ(α)σ(α) ∂ψ/∂x^µ + ½ (∂e^µ(α)/∂x^µ) ψ*σ(α)ψ} + ¼ ϕ(α)j(α).
The second part is
(1/4i) det[e_µ(α), e_ν(α), ∂e^µ(α)/∂x^ν, j(α)]
(sum over µ and ν). Each term of this sum is a 4 × 4 determinant whose rows
are obtained from the one written by setting α = 0, 1, 2, 3. The quantity j(α) is
(18) j(α) = ψ*σ(α)ψ.
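For a two-component ψ the current (18) is automatically a null vector, j(0)² = j(1)² + j(2)² + j(3)². A numerical spot-check (numpy; taking σ(0) to be the identity matrix is our assumption, since this excerpt does not display Weyl's σ(0)):

```python
import numpy as np

# sigma(0) = identity is an assumption of this sketch; the excerpt does not
# display Weyl's sigma(0) explicitly.
s = [np.eye(2, dtype=complex),
     np.array([[0, 1], [1, 0]], dtype=complex),
     np.array([[0, -1j], [1j, 0]]),
     np.array([[1, 0], [0, -1]], dtype=complex)]

rng = np.random.default_rng(2)
psi = rng.normal(size=2) + 1j*rng.normal(size=2)

j = np.array([np.vdot(psi, m @ psi).real for m in s])

# For a 2-component psi the current is a null vector:
assert np.isclose(j[0]**2, j[1]**2 + j[2]**2 + j[3]**2)
```

Writing ψ = (a, b), both sides of the assertion equal (|a|² + |b|²)², which is the standard Fierz-type identity for 2-spinors.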
Generally, it is not an action integral
(19) ∫ h dx
itself which is of significance for the laws of nature, but only its variation
δ ∫ h dx. Hence it is not necessary that h itself be real, but it is sufficient
that h̄ − h be a divergence. In that case we say that h is practically real. We
have to check this for m. e^µ(α) is real for α = 1, 2, 3 and purely imaginary
for α = 0. So σ^µ = e^µ(α)σ(α) is a Hermitian matrix. ϕ(α) is also real for
α = 1, 2, 3 and purely imaginary for α = 0. Thus
m̄ = −(1/i){(∂ψ*/∂x^µ) σ^µ ψ + ½ (∂e^µ(α)/∂x^µ) ψ*σ(α)ψ} + ¼ ϕ(α)j(α),
i(m − m̄) = ψ* σ^µ ∂ψ/∂x^µ + (∂ψ*/∂x^µ) σ^µ ψ + (∂e^µ(α)/∂x^µ) ψ*σ(α)ψ
= ∂(ψ*σ^µψ)/∂x^µ = ∂j^µ/∂x^µ.
Thus m is indeed practically real. We return to special relativity if we set
e^0(0) = −i, e^1(1) = e^2(2) = e^3(3) = 1,
and all other e^µ(α) = 0.
§4. Energy. Let (19) be the action of matter in an extended sense, represented
by the ψ field and by the electromagnetic potentials A_µ. The laws of nature
say that the variation
δ ∫ h dx = 0
when the ψ and A_µ undergo arbitrary infinitesimal variations which vanish
outside of a finite region in spacetime. The variation of the ψ gives the
equations of matter in the restricted sense, the variation of the A_µ the
electromagnetic equations. If the e^µ(α), which were kept fixed up to now,
undergo an analogous infinitesimal variation δe^µ(α), then there will be an
equation of the form
(20) δ ∫ h dx = ∫ T_µ(α)δe^µ(α) dx,
the induced variations δψ_a and δA_µ being absent as a consequence of the
preceding laws. The tensor density T_µ(α) defined in this way is the
energy-momentum. Because of the invariance of the action density h, the
variation of (20) must vanish when the variation δe^µ(α) is produced
(1) by infinitesimal rotations of the local tetrads e(α), the coordinates x^µ
being kept fixed.
[Footnote: RZM), Berlin 1923. According to a reference there, the argument is due to F. Klein. A sentence explaining the argument has been added and an infinitesimal calculation, amounting to a derivation of the formula for the Lie bracket, has been omitted. [WR]]
[Footnote 5: Cf. RZM §41. [WR]]
The integrals are independent of t.⁶ Using the symmetry of T one finds further
the divergence equations
∂(x_2 T^ν_3 − x_3 T^ν_2)/∂x^ν = 0, · · · ,
∂(x_0 T^ν_1 − x_1 T^ν_0)/∂x^ν = 0, · · · .
The three equations of the first type show that the angular momentum
(J_1, J_2, J_3) is constant in time:
J_1 = ∫_{x_0 = t} (x_2 T^0_3 − x_3 T^0_2) dx, · · · .
The equations of the second type contain the law of inertia of energy.⁷
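The step from conservation and symmetry of T to these divergence equations is just the product rule: ∂/∂x^ν applied to x_2 T^ν_3 − x_3 T^ν_2 produces the divergences of T plus the antisymmetric combination T²₃ − T³₂. A symbolic check (sympy; the component functions Tnm are hypothetical placeholders with no symmetry assumed, so the identity isolates exactly what symmetry and conservation must kill):

```python
import sympy as sp

xs = sp.symbols('x0:4', real=True)
# Generic component functions T[nu][mu](x); neither symmetry nor
# conservation is assumed at this point.
T = [[sp.Function(f'T{n}{m}')(*xs) for m in range(4)] for n in range(4)]

# Left-hand side of the first divergence equation, expanded blindly.
div = sum(sp.diff(xs[2]*T[n][3] - xs[3]*T[n][2], xs[n]) for n in range(4))

# Product rule: divergences of T plus the antisymmetric part T23 - T32.
expected = (xs[2]*sum(sp.diff(T[n][3], xs[n]) for n in range(4))
            - xs[3]*sum(sp.diff(T[n][2], xs[n]) for n in range(4))
            + T[2][3] - T[3][2])

# The divergence vanishes exactly when T is conserved and symmetric.
assert sp.expand(div - expected) == 0
```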
We compute the energy-momentum density for the matter action m defined
above; we treat separately the two parts of m appearing in (17). For the first
part we obtain after an integration by parts
∫ δm dx = ∫ u_µ(α)δe^µ(α) dx
where
i u_µ(α) = ψ*σ(α) ∂ψ/∂x^µ − ½ ∂(ψ*σ(α)ψ)/∂x^µ
= ½ (ψ*σ(α) ∂ψ/∂x^µ − (∂ψ*/∂x^µ)σ(α)ψ).
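The second equality for u_µ(α) is pure Leibniz rule: subtracting half the derivative of ψ*σ(α)ψ symmetrizes the derivative between ψ and ψ*. A symbolic check (sympy; the component functions f, g and the Hermitian matrix entries are hypothetical stand-ins):

```python
import sympy as sp

x = sp.symbols('x', real=True)
a, d, br, bi = sp.symbols('a d b_r b_i', real=True)
sigma = sp.Matrix([[a, br + sp.I*bi],
                   [br - sp.I*bi, d]])            # constant Hermitian 2x2 matrix

# psi with explicit real and imaginary parts, so psi* can be written by hand.
f1, f2, g1, g2 = (sp.Function(n)(x) for n in ('f1', 'f2', 'g1', 'g2'))
psi  = sp.Matrix([f1 + sp.I*g1, f2 + sp.I*g2])
psiS = sp.Matrix([[f1 - sp.I*g1, f2 - sp.I*g2]])  # row vector psi*

half = sp.Rational(1, 2)
lhs = (psiS*sigma*psi.diff(x))[0] - half*sp.diff((psiS*sigma*psi)[0], x)
rhs = half*((psiS*sigma*psi.diff(x))[0] - (psiS.diff(x)*sigma*psi)[0])
assert sp.expand(lhs - rhs) == 0
```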
The part of the energy-momentum arising from the first part of m is therefore
T_µ(α) = u_µ(α) − e_µ(α)u, T^ν_µ = u^ν_µ − δ^ν_µ u,
where u denotes the contraction e^µ(α)u_µ(α). These formulas also hold in
general relativity, for non-constant e^µ(α). To treat the second part of m we
restrict ourselves to special relativity for the sake of simplicity. For this
second part in (17) one has
∫ δm dx = (1/4i) ∫ det[e_µ(α), e_ν(α), ∂(δe^µ(α))/∂x^ν, j(α)] dx
= (1/4i) ∫ det[δe^µ(α), e_µ(α), e_ν(α), ∂j(α)/∂x^ν] dx.
The expression (23) does not contribute to the integral. The momentum
(P_1, P_2, P_3) will therefore be represented by the operators
P_1, P_2, P_3 := (1/i) ∂/∂x^1, (1/i) ∂/∂x^2, (1/i) ∂/∂x^3
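On a plane wave these operators return the wave number, which is the usual sanity check. A sketch (sympy; the plane wave and the symbol k are our illustration, not Weyl's):

```python
import sympy as sp

x1, k = sp.symbols('x1 k', real=True)
psi = sp.exp(sp.I*k*x1)               # illustrative plane wave along x^1

P1 = lambda f: sp.diff(f, x1)/sp.I    # P1 = (1/i) d/dx^1
assert sp.expand(P1(psi) - k*psi) == 0   # P1 returns the wave number k
```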
for an arbitrary function λ of the point. Exactly in the way described by (26)
does the electromagnetic potential act on matter empirically. We are therefore
justified in identifying the quantities Aµ defined here with the components of
the electromagnetic potential. The proof is complete if we show that the Aµ
field is conversely acted on by matter in the way empirically known for the
electromagnetic field.
F_µν = ∂A_ν/∂x^µ − ∂A_µ/∂x^ν
because of the additional term (26). In special relativity, this leads one to
represent the energy by the operator
H = Σ_{i=1}³ σ^i ((1/i) ∂/∂x^i + A_i),
and gravitation, i.e. ψ and e^µ(α), will by themselves be sufficient to explain
the electromagnetic phenomena when one takes the ϕ(α) as electromagnetic
potentials. These quantities ϕ(α) depend on the e^µ(α) and on their first derivatives
in such a manner that there is invariance under arbitrary transformations of the
coordinates. Under rotations of the tetrads, however, the ϕ(α) transform as the
components of a vector only if all tetrads undergo the same rotation. If one ig-
nores the matter field and considers only the relation between electromagnetism
and gravitation, then one arrives in this way at a theory of electromagnetism of
exactly the kind Einstein recently tried to establish. Parallelism at a distance
would only be simulated, however.
I convinced myself that this Ansatz, tempting as it may be at first sight, does not
lead to Maxwell’s equations. Besides, gauge invariance would remain completely
mysterious; the electromagnetic potential itself would have physical significance,
and not only the field strength. I believe, therefore, that this idea leads astray
and that we should take the hint given by gauge invariance: electromagnetism
is a byproduct of the matter field, not of gravitation.
***
¹³ This form had shown up long before Dirac and Weyl, in the context of projective geometry,
in Klein's work on automorphic functions (cf. the Fricke and Klein book of 1897, vol. 1, §12),
and probably even earlier, e.g. in the work of Möbius.
Time chart
Annotated bibliography
Riemann's On the hypotheses which lie at the bases of geometry, his
Habilitationsschrift of 1854, translated by William Kingdon Clifford. Available
on the web thanks to D. R. Wilkins. [Riemann's only attempt to explain his revolutionary
ideas on geometry, and that in non-technical terms. Explanations of his explanations are
available; one may also just listen to him.]
Élie Cartan’s Leçons sur la géométrie des espaces de Riemann dating from 1925–
1926, published in 1951 by Gauthier-Villars. [An exposition of the subject by the
greatest differential geometer of the 20th century, based on his own methods and on some of
his own creations.]
Index
connected, 34
connection, 94
curvature, 109
curve, 33
differential, 12, 43
differential form, 136
geodesic (connection), 102
geodesic (metric), 71
Lie bracket, 165
Lie derivative, 163
manifold, 25
Riemann metric, 66
submanifold, 53
tensor, 77
vector, 39