
Lectures on Differential Geometry

Wulf Rossmann


(Updated October 2003)


To the student

This is a collection of lecture notes which I put together while teaching courses
on manifolds, tensor analysis, and differential geometry. I offer them to you in
the hope that they may help you, and to complement the lectures. The style
is uneven, sometimes pedantic, sometimes sloppy, sometimes telegram style,
sometimes long–winded, etc., depending on my mood when I was writing those
particular lines. At least this set of notes is visibly finite. There are a great
many meticulous and voluminous books written on the subject of these notes
and there is no point in writing another one of that kind. After all, we are
talking about some fairly old mathematics, still useful, even essential, as a tool
and still fun, I think, at least some parts of it.
A comment about the nature of the subject (elementary differential geometry
and tensor calculus) as presented in these notes. I see it as a natural continuation
of analytic geometry and calculus. It provides some basic equipment, which is
indispensable in many areas of mathematics (e.g. analysis, topology, differential
equations, Lie groups) and physics (e.g. classical mechanics, general relativity,
all kinds of field theories).
If you want to have another view of the subject you should by all means look
around, but I suggest that you don’t attempt to use other sources to straighten
out problems you might have with the material here. It would probably take
you much longer to familiarize yourself sufficiently with another book to get
your question answered than to work out your problem on your own. Even
though these notes are brief, they should be understandable to anybody who
knows calculus and linear algebra to the extent usually seen in second-year
courses. There are no difficult theorems here; it is rather a matter of providing
a framework for various known concepts and theorems in a more general and
more natural setting. Unfortunately, this requires a large number of definitions
and constructions which may be hard to swallow and even harder to digest.
(In this subject the definitions are much harder than the theorems.) In any
case, just by randomly leafing through the notes you will see many complicated
looking expressions. Don’t be intimidated: this stuff is easy. When you looked
at a calculus text for the first time in your life it probably looked complicated as
well. Let me quote a piece of advice by Hermann Weyl from his classic Raum–
Zeit–Materie of 1918 (my translation): “Many will be horrified by the flood of
formulas and indices which here drown the main idea of differential geometry
(in spite of the author’s honest effort for conceptual clarity). It is certainly
regrettable that we have to enter into purely formal matters in such detail and
give them so much space; but this cannot be avoided. Just as we have to spend
laborious hours learning language and writing to freely express our thoughts, so
the only way that we can lessen the burden of formulas here is to master the tool
of tensor analysis to such a degree that we can turn to the real problems that
concern us without being bothered by formal matters.”
W. R.
Flow chart

1. Manifolds
1.1 Review of linear algebra and calculus· · · 9
1.2 Manifolds: definitions and examples· · · 25
1.3 Vectors and differentials· · · 39
1.4 Submanifolds· · · 53
1.5 Riemann metrics· · · 62
1.6 Tensors· · · 77
2. Connections and curvature
2.1 Connections· · · 87
2.2 Geodesics· · · 102
2.3 Riemann curvature· · · 108
2.4 Gauss curvature· · · 113
2.5 Levi-Civita’s connection· · · 123
2.6 Curvature identities· · · 132
3. Calculus on manifolds
3.1 Differential forms· · · 136
3.2 Differential calculus· · · 144
3.3 Integral calculus· · · 150
3.4 Lie derivatives· · · 160
4. Special topics
4.1 General Relativity· · · 173
4.2 The Schwarzschild metric· · · 179
4.3 The rotation group SO(3)· · · 188
4.4 Cartan’s mobile frame· · · 197
4.5 Weyl’s gauge theory paper of 1929· · · 203

Chapter 3 is independent of chapter 2 and is used only in section 4.3.

Contents

1 Manifolds 9
1.1 Review of linear algebra and calculus . . . . . . . . . . . . . . . . 9
1.2 Manifolds: definitions and examples . . . . . . . . . . . . . . . . 25
1.3 Vectors and differentials . . . . . . . . . . . . . . . . . . . . . . . 39
1.4 Submanifolds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
1.5 Riemann metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
1.6 Tensors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

2 Connections and curvature 87


2.1 Connections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
2.2 Geodesics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
2.3 Riemann curvature . . . . . . . . . . . . . . . . . . . . . . . . . 108
2.4 Gauss curvature . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
2.5 Levi-Civita’s connection . . . . . . . . . . . . . . . . . . . . . . . 123
2.6 Curvature identities . . . . . . . . . . . . . . . . . . . . . . . . . 132

3 Calculus on manifolds 135


3.1 Differential forms . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
3.2 Differential calculus . . . . . . . . . . . . . . . . . . . . . . . . . 144
3.3 Integral calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
3.4 Lie derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160

4 Special Topics 173


4.1 General Relativity . . . . . . . . . . . . . . . . . . . . . . . . . . 173
4.2 The Schwarzschild metric . . . . . . . . . . . . . . . . . . . . . . 179
4.3 The rotation group SO(3) . . . . . . . . . . . . . . . . . . . . . . 188
4.4 Cartan’s mobile frame . . . . . . . . . . . . . . . . . . . . . . . . 197
4.5 Weyl’s gauge theory paper of 1929 . . . . . . . . . . . . . . . . . 203

Time chart 219

Annotated bibliography 221

Index 222

Chapter 1

Manifolds

1.1 Review of linear algebra and calculus


A. Linear algebra. A (real) vector space is a set V together with two opera-
tions, vector addition u + v (u, v ∈ V ) and scalar multiplication αv (α ∈ R, v ∈
V ). These operations have to satisfy those axioms you know (and can find
spelled out in your linear algebra text). Example: Rn is the vector space of real
n–tuples (x1 , · · · , xn ), xi ∈ R with componentwise vector addition and scalar
multiplication. The basic fact is that every vector space has a basis, meaning
a set of vectors {v_i} so that any other vector v can be written uniquely as a
linear combination Σ α^i v_i of the v_i's. We shall always assume that our space V is
finite–dimensional, which means that it admits a finite basis, consisting of say
n elements. In that case any other basis also has n elements and n is called
the dimension of V . For example, Rn comes equipped with a standard basis
e1 , · · · , en characterized by the property that (x1 , · · · , xn ) = x1 e1 + · · · + xn en .
We may say that we can “identify” V with R^n after we fix an ordered basis
{v_1, · · · , v_n}, since the vectors v ∈ V correspond one–to–one to their n–tuples of
components (x^1, · · · , x^n) ∈ R^n. But note(!): this identification depends on the choice
of the basis {v_i} which “becomes” the standard basis {e_i} of R^n. The indices
on α^i and v_i are placed the way they are with the following rule in mind.
1.1.1 Summation convention. Any index occurring twice, once up, once
down, is summed over. For example x^i e_i = Σ_i x^i e_i = x^1 e_1 + · · · + x^n e_n. We
may still keep the Σ's if we want to remind ourselves of the summation.
A linear transformation (or linear map) A : V → W between two vector spaces
is a map which respects the two operations, i.e. A(u+v) = Au+Av and A(αv) =
αAv. One often writes Av instead of A(v) for linear maps. In terms of a basis
{v_1, · · · , v_n} for V and {w_1, · · · , w_m} for W this implies that Av = Σ_{ij} α^i a^j_i w_j
if v = Σ_i α^i v_i, for some indexed system of scalars (a^j_i) called the matrix of
A with respect to the bases {v_i}, {w_j}. With the summation convention the
equation w = Av becomes β^j = a^j_i α^i. Example: the matrix of the identity


transformation 1 : V → V (with respect to any basis) is the Kronecker delta δ^i_j
defined by

    δ^i_j = 1 if i = j,  0 if i ≠ j.
The inverse (if it exists) of a linear transformation A with matrix (a^j_i) is the
linear transformation B whose matrix (b^j_i) satisfies

    a^i_k b^k_j = δ^i_j

If A : V → V is a linear transformation of V into itself, then the determinant


of A is defined by the formula

    det(A) = Σ ε_{i_1···i_n} a^{i_1}_1 · · · a^{i_n}_n    (1)

where ε_{i_1···i_n} = ±1 is the sign of the permutation (i_1, · · · , i_n) of (1, · · · , n). This
seems to depend on the basis we use to write A as a matrix (a^i_j), but in fact it
doesn't. Recall the following theorem.
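Formula (1) can be checked numerically. The following short Python sketch (my illustration, not part of the notes) implements the sum over permutations literally:

```python
from itertools import permutations
from math import prod

def sign(p):
    """Sign of a permutation, computed by counting inversions."""
    return (-1) ** sum(p[i] > p[j]
                       for i in range(len(p)) for j in range(i + 1, len(p)))

def det(a):
    """det(A) = sum of sign(p) * a[p(1)][1] ... a[p(n)][n] over permutations p."""
    n = len(a)
    return sum(sign(p) * prod(a[p[k]][k] for k in range(n))
               for p in permutations(range(n)))

A = [[2, 1, 0], [1, 3, 1], [0, 1, 2]]
assert det(A) == 8   # agrees with cofactor expansion done by hand
```

The n! terms make this hopeless as a practical algorithm for large n, which is the point of the remark below about Cramer's formula being mainly of theoretical use.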
1.1.2 Theorem. A : V → V is invertible if and only if det(A) ≠ 0.
There is a formula for A^{−1} (Cramer's Formula) which says that

    A^{−1} = (1/det(A)) Ã,   ã^j_i = (−1)^{i+j} det[a^l_k : l ≠ i, k ≠ j].

The ij entry ã^j_i of Ã is called the ji cofactor of A, as you can look up in your
linear algebra text. This formula is rarely practical for the actual calculation of
A^{−1} for a particular A, but it is sometimes useful for theoretical considerations
or for matrices with variable entries.
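As a sanity check (mine, not the author's), here is a minimal Python sketch of Cramer's formula via cofactors, using numpy only for the minors:

```python
import numpy as np

def cofactor_inverse(A):
    """Cramer's formula: A^{-1} = (1/det A) * adjugate(A).

    adj[j, i] is the (i, j) cofactor -- note the transpose."""
    n = A.shape[0]
    adj = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            adj[j, i] = (-1) ** (i + j) * np.linalg.det(minor)
    return adj / np.linalg.det(A)

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
assert np.allclose(cofactor_inverse(A) @ A, np.eye(3))
```

For an actual computation one would of course use a factorization-based solver; the cofactor form is the one that matters for matrices with variable entries, as the text says.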
The rank of a linear transformation A : V → W is the dimension of the image
of A, i.e. of
im(A) = {w ∈ W : w = Av for some v ∈ V }.
This rank is equal to the maximal number of linearly independent columns of the
matrix (a^j_i), and equals the maximal number of linearly independent rows as well.
The linear map A : V → W is surjective (i.e. onto) iff¹ rank(A) = m and
injective (i.e. one–to–one) iff rank(A) = n. A linear functional on V is a scalar–
valued linear function ϕ : V → R. In terms of components with respect to a
basis {v_1, · · · , v_n} we can write v = Σ x^i v_i and ϕ(v) = Σ ξ_i x^i. For example, if
we take (ξ_1, ξ_2, · · · , ξ_n) = (0, · · · , 1, · · · , 0) with the 1 in the i–th position, then we
get the linear functional v → x^i which picks out the i–th component of v relative
to the basis {v_1, · · · , v_n}. This functional is denoted v^i (index upstairs). Thus
v^i(Σ x^j v_j) = x^i. This means that v^i(v_j) = δ^i_j. The set of all linear functionals

on V is again a vector space, called the dual space to V and denoted V ∗ . If


{v_i} is a basis for V , then {v^i} is a basis for V*, called the dual basis. If we
write v = Σ x^i v_i and ϕ = Σ ξ_i v^i, then ϕ(v) = Σ ξ_i x^i as above. If A : V → W
is a linear transformation we get a linear transformation A* : W* → V* of the
dual spaces in the reverse direction, defined by (A*ψ)(v) = ψ(Av). A* is called
the transpose of A. In terms of components with respect to bases {v_i} for V
¹ “iff” means “if and only if”.

and {w_j} for W we write ψ = Σ η_j w^j, v = Σ x^i v_i, Av_i = a^j_i w_j, A*w^j = (a*)^j_i v^i, and
then the above equation reads

    Σ (a*)^j_i η_j x^i = Σ η_j a^j_i x^i.

From this it follows that (a*)^j_i = a^j_i. So as long as you write everything in
terms of components you never need to mention (a*)^j_i at all. (This may surprise
you: in your linear algebra book you will see a definition which says that the
transpose [(a*)_{ij}] of a matrix [a_{ij}] satisfies (a*)_{ij} = a_{ji}. What happened?)
1.1.3 Examples. The only example of a vector space we have seen so far is
the space of n–tuples Rn . In a way, this is the only example there is, since any
n–dimensional vector space can be identified with Rn by means of a basis. But
one must remember that this identification depends on the choice of a basis!
(a) The set Hom(V, W ) consisting of all linear maps A : V → W between two
vector spaces is again a vector space with the natural operations of addition
and scalar multiplication. If we choose bases for V and W then we can identify
A with its matrix (a^j_i), and so Hom(V, W ) gets identified with the matrix space
R^{m×n}.
(b) A function U × V → R, (u, v) → f (u, v), is bilinear if f (u, v) is linear in
u and in v separately. These functions form again a vector space. In terms of
bases we can write u = ξ^i u_i, v = η^j v_j and then f (u, v) = c_{ij} ξ^i η^j. Thus after
choice of bases we may identify the space of such f's again with R^{n×m}. We
can also consider multilinear functions U × · · · × V → R of any finite number
of vector variables, which are linear in each variable separately. Then we get
f (u, · · · , v) = ci···j ξ i · · · η j . A similar construction applies to maps U ×· · ·×V →
W with values in another vector space.
(c) Let S be a sphere in 3–space. The tangent space V to S at a point po ∈ S
consists of all vectors v which are orthogonal to the radial line through po .

[Fig. 1: the tangent plane V to S at p_o, with a tangent vector v]

It does not matter how you think of these vectors: whether geometrically, as
arrows, which you can imagine as attached to po , or algebraically, as 3–tuples
(ξ, η, ζ) in R3 satisfying xo ξ + yo η + zo ζ = 0; in any case, this tangent space is
a 2–dimensional vector space associated to S and po . But note: if you think of
V as the points of R3 lying in the plane xo ξ + yo η + zo ζ =const. through po
tangential to S, then V is not a vector space with the operations of addition and
scalar multiplication in the surrounding space R3 . One should rather think of V
as vector space of its own, and its embedding in R3 as something secondary, a
point of view which one may also take of the sphere S itself. You may remember
a similar construction of the tangent space V to any surface S = {p = (x, y, z) |
f (x, y, z) = 0} at a point po : it consists of all vectors orthogonal to the gradient
of f at p_o and is defined only if this gradient is not zero. We shall return to
this example in a more general context later.
B. Differential calculus. The essence of calculus is local linear approximation.
What it means is this. Consider a map f : Rn → Rm and a point xo ∈ Rn in
its domain. f admits a local linear approximation at xo if there is a linear map
A : Rn → Rm so that

f (xo + v) = f (xo ) + Av + o(v), (2)

with an “error term” o(v) satisfying o(v)/‖v‖ → 0 as v → 0. If so, then
f is said to be differentiable at x_o and the linear transformation A (which
depends on f and on x_o) is called the differential of f at x_o, denoted df_{x_o}. Thus
df_{x_o} : R^n → R^m is a linear transformation depending on f and on x_o. It is
evidently given by the formula

    df_{x_o}(v) = lim_{t→0} (1/t)[f(x_o + tv) − f(x_o)].

In this definition f(x) does not need to be defined for all x ∈ R^n, only in a
neighbourhood of x_o, i.e. for all x satisfying ‖x − x_o‖ < ε for some ε > 0. We
say simply f is differentiable if it is differentiable at every x_o in its domain. The
definition of “differentiable” requires that the domain of f must be an open set,
i.e. it contains a neighbourhood of each of its points, so that f(x_o + v) is defined
for all v near 0. We sometimes write f : R^n · · · → R^m to indicate that f need
only be partially defined, on some open subset.
1.1.4 Example. (a) Suppose f(x) is itself linear. Then

    f(x_o + v) = f(x_o) + f(v)    (3)

so (2) holds with A(v) = f(v) and o(v) ≡ 0. Thus for a linear map df_x(v) = f(v)
for all x and v.
(b) Suppose f (x, y) is bilinear, linear in x and y separately. Then

f (xo + v, yo + w) = f (xo , yo ) + f (xo , w) + f (v, yo ) + f (v, w).

Claim. f (v, w) = o((v, w)).


Check. In terms of components, we have v = (ξ^i), w = (η^j), and
f(v, w) = Σ c_{ij} ξ^i η^j, so

    |f(v, w)| ≤ Σ |c_{ij} ξ^i η^j| ≤ C max{|ξ^i η^j|} ≤ C (max{|ξ^i|, |η^j|})²
             ≤ C (Σ (ξ^i)² + Σ (η^j)²) = C ‖(v, w)‖²

(where C stands for a suitable constant, possibly different at each step), hence
|f(v, w)|/‖(v, w)‖ ≤ C ‖(v, w)‖ → 0 as (v, w) → (0, 0).



It follows that (2) holds with x replaced by (x, y) and v by (v, w) if we take
A(v, w) = f (xo , w)+ f (v, yo ) and o((v, w)) = f (v, w). Thus for all x, y and v, w

df(x,y) (v, w) = f (x, w) + f (v, y). (4)
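Formula (4) is easy to test numerically. The sketch below is my illustration (not the author's), using a sample bilinear form f(x, y) = x·My and comparing (4) against a difference quotient:

```python
import numpy as np

M = np.array([[1.0, 2.0], [3.0, 4.0]])
f = lambda x, y: x @ M @ y                  # a sample bilinear function

x0, y0 = np.array([1.0, -1.0]), np.array([0.5, 2.0])
v, w = np.array([0.3, 0.7]), np.array([-0.2, 0.1])

df = f(x0, w) + f(v, y0)                    # formula (4)

# Difference quotient along the line (x0 + t v, y0 + t w):
t = 1e-6
numeric = (f(x0 + t * v, y0 + t * w) - f(x0, y0)) / t
assert abs(numeric - df) < 1e-4
```

The leftover term f(v, w) in the expansion is quadratic in t, which is exactly the o-estimate proved in the Claim above.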

The following theorem is a basic criterion for deciding when a map f is differ-
entiable.
1.1.5 Theorem. Let f : R^n · · · → R^m be a map. If the partial derivatives
∂f/∂x^i exist and are continuous in a neighbourhood of x_o, then f is differen-
tiable at x_o. Furthermore, the matrix of its differential df_{x_o} : R^n → R^m is the
Jacobian matrix (∂f^j/∂x^i)_{x_o}.
(Proof omitted)
Thus if we write y = f(x) as y^j = f^j(x), where x = (x^i) ∈ R^n and y = (y^j) ∈ R^m,
then we have

    df_{x_o}(v) = (∂f^i/∂x^j)_{x_o} v^j.
We shall often suppress the subscript xo , when the point xo is understood or
unimportant, and then simply write df . A function f as in the theorem is said
to be of class C 1 ; more generally, f is of class C k if all partials of order ≤ k exist
and are continuous; f is of class C ∞ if it has continuous partials of all orders.
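Theorem 1.1.5 can be illustrated numerically. In this sketch (mine, with a sample map chosen for illustration) the Jacobian matrix is built column by column from central difference quotients and compared with the partials computed by hand:

```python
import numpy as np

def jacobian(f, x, h=1e-6):
    """Numerical Jacobian matrix (df^i/dx^j) at x, by central differences."""
    x = np.asarray(x, dtype=float)
    cols = []
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = h
        cols.append((f(x + e) - f(x - e)) / (2 * h))   # j-th column
    return np.column_stack(cols)

f = lambda x: np.array([x[0] * x[1], x[0] ** 2 + x[1]])
J = jacobian(f, [2.0, 3.0])
# By hand: [[y, x], [2x, 1]] at (x, y) = (2, 3)
assert np.allclose(J, [[3.0, 2.0], [4.0, 1.0]], atol=1e-5)
```

Each column of the Jacobian is the image under df of a standard basis vector, which is the picture behind the polar-coordinate examples below.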
1.1.6 Example. (a) Consider a function² f : R^n · · · → R, y = f(x^1, · · · , x^n),
of n variables x^i with scalar values. The formula for df(v) becomes

    df(v) = (∂f/∂x^1) v^1 + · · · + (∂f/∂x^n) v^n.

This can be written as a matrix product

    df(v) = [∂f/∂x^1 · · · ∂f/∂x^n] [v^1, · · · , v^n]^T.

Thus df can be represented by the row vector with components ∂f/∂x^i.
(b) Consider a function f : R · · · → R^n, (x^1, · · · , x^n) = (f^1(t), · · · , f^n(t)), of
one variable t with values in R^n. (The n here plays the role of the m above.)
The formula for df(v) becomes

    (df^1(v), · · · , df^n(v))^T = (df^1/dt, · · · , df^n/dt)^T v.

In matrix notation we could represent df by the column vector with components
df^i/dt. (v is now a scalar.) Geometrically such a function represents a parametrized
curve in R^n and we can think of p = f(t) as the position of a
moving point at time t. We shall usually omit the function symbol f and simply
write p = p(t) or x^i = x^i(t). Instead of df^i/dt we write ẋ^i(t), and instead of df we write
ṗ(t), which we think of as a vector with components ẋ^i(t), the velocity vector
of the curve.
² The dotted arrow indicates a partially defined map.

(c) As a particular case of a scalar-valued function on Rn we can take the i-th


coordinate function
f (x1 , · · · , xn ) = xi .
As is customary in calculus we shall simply write f = xi for this function. Of
course, this is a bit objectionable, because we are using the same symbol xi
for the function and for its general value. But this notation has been around
for hundreds of years and is often very convenient. Rather than introducing
yet another notation it is best to think this through once and for all, so that
one can always interpret what is meant in a logically correct manner. Take a
symbol like dx^i, for example. This is the differential of the function f = x^i we
are considering. Obviously for this function f = x^i we have ∂x^i/∂x^j = δ^i_j. So the
general formula df(v) = (∂f/∂x^j) v^j becomes

    dx^i(v) = v^i.
This means that df = dx^i picks out the i-th component of the vector v, just
like f = x^i picks out the i-th coordinate of the point p. As you can see, there
is nothing mysterious about the differentials dxi , in spite of the often confusing
explanations given in calculus texts. For example, the equation

    df = (∂f/∂x^i) dx^i

for the differential of a scalar-valued function f is literally correct, being just


another way of saying that

    df(v) = (∂f/∂x^i) v^i

for all v. As a further example, the approximation rule


    f(x_o + Δx) ≈ f(x_o) + (∂f/∂x^i) Δx^i

is just another way of writing

    f(x_o + v) = f(x_o) + (∂f/∂x^i)_{x_o} v^i + o(v),
which is the definition of the differential.
(d) Consider the equations defining polar coordinates
    x = r cos θ,  y = r sin θ.
These equations define a map between two copies of R2 , say f : R2rθ → R2xy ,
(r, θ) → (x, y). The differential of this map is given by the equations

    dx = (∂x/∂r) dr + (∂x/∂θ) dθ = cos θ dr − r sin θ dθ
    dy = (∂y/∂r) dr + (∂y/∂θ) dθ = sin θ dr + r cos θ dθ.

This is just the general formula df^i = (∂f^i/∂x^j) dx^j, but this time with
(x^1, x^2) = (r, θ) the coordinates in the rθ-plane R²_{rθ} and (f^1, f^2) = (x, y) the
functions defined above. Again, there is nothing mysterious about these differentials.
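These differentials can be double-checked by differentiating the polar map numerically; this quick Python comparison is mine, not part of the text:

```python
import numpy as np

polar = lambda p: np.array([p[0] * np.cos(p[1]), p[0] * np.sin(p[1])])

r, th = 2.0, 0.7
# Coefficient matrix of dx, dy in terms of dr, dθ:
#   dx = cos θ dr − r sin θ dθ,  dy = sin θ dr + r cos θ dθ
J_exact = np.array([[np.cos(th), -r * np.sin(th)],
                    [np.sin(th),  r * np.cos(th)]])

h = 1e-6
J_num = np.column_stack([
    (polar([r + h, th]) - polar([r - h, th])) / (2 * h),   # ∂/∂r column
    (polar([r, th + h]) - polar([r, th - h])) / (2 * h)])  # ∂/∂θ column
assert np.allclose(J_num, J_exact, atol=1e-5)
```

The same matrix reappears in Example 1.1.15, where its determinant r decides where the map is locally invertible.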

1.1.7 Theorem (Chain Rule: general case). Let f : Rn · · · → Rm and


g : Rm · · · → Rl be differentiable maps. Then g ◦ f : Rn · · · → Rl is also
differentiable (where defined) and

d(g ◦ f )x = (dg)f (x) ◦ (df )x

(whenever defined).

The proof needs some preliminary remarks. Define the norm ‖A‖ of a linear
transformation A : R^n → R^m by the formula ‖A‖ = max_{‖x‖=1} ‖A(x)‖. This
max is finite since the sphere ‖x‖ = 1 is compact (closed and bounded). Then
‖A(x)‖ ≤ ‖A‖‖x‖ for any x ∈ R^n: this is clear from the definition if ‖x‖ = 1
and follows for any x since A(x) = ‖x‖A(x/‖x‖) if x ≠ 0.
Proof. Fix x and set k(v) = f(x + v) − f(x). Then by the differentiability of g,

    g(f(x + v)) − g(f(x)) = dg_{f(x)} k(v) + o(k(v)),

and by the differentiability of f,

    k(v) = f(x + v) − f(x) = df_x(v) + o(v).
We use the notation O(v) for any function satisfying O(v) → 0 as v → 0.
(The letter O can stand for a different such function each time it occurs.)
Then o(v) = ‖v‖O(v) and similarly o(k(v)) = ‖k(v)‖O(k(v)) = ‖k(v)‖O(v).
Substituting the expression for k(v) into the preceding equation gives

    g(f(x + v)) − g(f(x)) = dg_{f(x)}(df_x(v)) + ‖v‖ dg_{f(x)}(O(v)) + ‖k(v)‖O(v).

We have ‖k(v)‖ ≤ ‖df_x(v)‖ + ‖o(v)‖ ≤ ‖df_x‖‖v‖ + C‖v‖ ≤ C′‖v‖. It follows that

    g(f(x + v)) − g(f(x)) = dg_{f(x)}(df_x(v)) + ‖v‖O(v)

for yet another O.
Of particular importance is the special case of the Chain Rule for curves. A
curve in Rn is a function t → x(t), R · · · → Rn , defined on some interval. In that
case we also write ẋ(t) or dx(t)/dt for its derivative vector (dxi (t)/dt) (which is
just the differential of t → x(t) considered as a vector).
1.1.8 Corollary (Chain Rule: special case). Let f : Rn · · · → Rm be a
differentiable map and p(t) a differentiable curve in Rn . Then f (p(t)) is also
differentiable (where defined) and
    df(p(t))/dt = df_{p(t)}(dp(t)/dt).

Geometrically, this equation says that the tangent vector df (p(t))/dt to
the image f (p(t)) of the curve p(t) under the map f is the image of its tangent
vector dp(t)/dt under the differential dfp(t) .
1.1.9 Corollary. Let f : R^n · · · → R^m be a differentiable map, p_o ∈ R^n a point,
and v ∈ R^n any vector. Then

    df_{p_o}(v) = (d/dt) f(p(t)) |_{t=t_o}

for any differentiable curve p(t) in Rn with p(to ) = po and ṗ(to ) = v.


The corollary gives a recipe for calculating the differential df_{x_o}(v) as a derivative
of a function of one variable t: given x_o and v, we can choose any curve x(t) with
x(0) = x_o and ẋ(0) = v and apply the formula. This freedom of choice of the
curve x(t), and in particular its independence of the coordinates x^i, makes this
recipe often much more convenient than the calculation of the Jacobian matrix
∂f^i/∂x^j in Theorem 1.1.5.
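The recipe can be tried out numerically. In the sketch below (my example, with f(x, y) = x²y chosen for illustration), two different curves with the same initial point and velocity give the same value of df_{p_o}(v):

```python
import numpy as np

f = lambda p: p[0] ** 2 * p[1]            # sample function f(x, y) = x^2 y
p0, v = np.array([1.0, 2.0]), np.array([3.0, -1.0])

t = 1e-6
# Curve 1: the straight line p(t) = p0 + t v.
df_line = (f(p0 + t * v) - f(p0 - t * v)) / (2 * t)
# Curve 2: a genuinely curved choice with the same p(0) and p'(0).
curve = lambda s: p0 + s * v + s ** 2 * np.array([1.0, 1.0])
df_curve = (f(curve(t)) - f(curve(-t))) / (2 * t)

assert abs(df_line - df_curve) < 1e-4
# Both agree with the Jacobian formula (2xy)v^1 + (x^2)v^2 = 4*3 + 1*(-1) = 11.
assert abs(df_line - 11.0) < 1e-4
```

The quadratic term of the second curve contributes nothing at t = 0, which is exactly why the recipe is independent of the curve chosen.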

1.1.10 Example. Return to Example 1.1.4, p. 12.


(a) If f(x) is linear in x, then

    (d/dt) f(x(t)) = f(dx(t)/dt).

(b) If f(x, y) is bilinear in (x, y), then

    (d/dt) f(x(t), y(t)) = f(dx(t)/dt, y(t)) + f(x(t), dy(t)/dt),

i.e. df_{(x,y)}(u, v) = f(u, y) + f(x, v). We call this the product rule for bilinear
maps f(x, y). For instance, let R^{n×n} be the space of all real n × n matrices and

    f : R^{n×n} × R^{n×n} → R^{n×n},  f(X, Y) = XY

the multiplication map. This map is bilinear, so the product rule for bilinear
maps applies and gives

    (d/dt) f(X(t), Y(t)) = (d/dt)(X(t)Y(t)) = Ẋ(t)Y(t) + X(t)Ẏ(t).
The Chain Rule says that this equals df_{(X(t),Y(t))}(Ẋ(t), Ẏ(t)). Thus
df_{(X,Y)}(U, V) = UY + XV. This formula for the differential of the matrix product XY is more
simply written as the Product Rule

    d(XY) = (dX)Y + X(dY).
You should think about this formula until you see that it is perfectly correct: the
differentials in it have a precise meaning, the products are defined, and the
equation is true. The method of proof gives a way of disentangling the meaning
of many formulas involving differentials: just think of X and Y as depending
on a parameter t and rewrite the formula in terms of derivatives:

    d(XY)/dt = (dX/dt)Y + X(dY/dt).
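The Product Rule can be checked on matrices by the same device of introducing a parameter t. The following sketch (mine) compares a difference quotient of X(t)Y(t) with ẊY + XẎ:

```python
import numpy as np

rng = np.random.default_rng(0)
A, B, C, D = (rng.standard_normal((3, 3)) for _ in range(4))
X = lambda t: A + t * B          # X(t) with Ẋ(t) = B
Y = lambda t: C + t * D          # Y(t) with Ẏ(t) = D

h = 1e-6
num = (X(h) @ Y(h) - X(-h) @ Y(-h)) / (2 * h)   # d(XY)/dt at t = 0
exact = B @ C + A @ D                            # Ẋ Y + X Ẏ at t = 0
assert np.allclose(num, exact, atol=1e-5)
```

Note the order of the factors: because matrix multiplication is not commutative, ẊY + XẎ cannot be rearranged into 2XẊ-style formulas familiar from scalar calculus.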

1.1.11 Coordinate version of the Chain Rule. In terms of coordinates the


above theorems read like this (using the summation convention).
(a) Special case. In coordinates write y = f(x) and x = x(t) as

    y^j = f^j(x^1, · · · , x^n),  x^i = x^i(t)

respectively. Then

    dy^j/dt = (∂y^j/∂x^i)(dx^i/dt).

(b) General case. In coordinates, write z = g(y), y = f(x) as

    z^k = g^k(y^1, · · · , y^m),  y^j = f^j(x^1, · · · , x^n)

respectively. Then

    ∂z^k/∂x^i = (∂z^k/∂y^j)(∂y^j/∂x^i).
In the last equation some of the variables are considered as functions
of others by means of the preceding equations, without indicating so explicitly.
For example, the z^k on the left is considered a function of the x^i via z = g(f(x));
on the right z^k is first considered as a function of the y^j via z = g(y), and after the
differentiation the partial ∂z^k/∂y^j is considered a function of x via y = f(x).
In older terminology one would say that some of the variables are dependent,
some are independent. This kind of notation is perhaps not entirely logical, but
is very convenient for calculation, because the notation itself tells you what to
do mechanically, as you know very well from elementary calculus. This is just
what a good notation should do, but one should always be able to go beyond
the mechanics and understand what is going on.
1.1.12 Examples.

(a) Consider the tangent vectors to curves p(t) on the sphere x² + y² + z² = R²
in R³. The function f(x, y, z) = x² + y² + z² is constant on such a curve p(t), i.e.
f(p(t)) = const, and hence

    0 = (d/dt) f(p(t))|_{t=t_o} = df_{p(t_o)}(ṗ(t_o)) = 2(x_o ξ + y_o η + z_o ζ)
where p(to ) = (xo , yo , zo ) and ṗ(to ) = (ξ, η, ζ). Thus we see that tangent vectors
at po to curves on the sphere do indeed satisfy the equation xo ξ + yo η + zo ζ = 0,
as we already mentioned.
(b) Let r = r(t), θ = θ(t) be the parametric equation of a curve in polar
coordinates x = r cos θ, y = r sin θ. The tangent vector of this curve has
components ẋ, ẏ given by

    dx/dt = (∂x/∂r)(dr/dt) + (∂x/∂θ)(dθ/dt) = cos θ (dr/dt) − r sin θ (dθ/dt)
    dy/dt = (∂y/∂r)(dr/dt) + (∂y/∂θ)(dθ/dt) = sin θ (dr/dt) + r cos θ (dθ/dt).
In particular, consider a θ-coordinate line r = r_o (constant), θ = θ_o + t (vari-
able). As a curve in the rθ-plane this is indeed a straight line through (r_o, θ_o);
its image in the xy-plane is a circle through the corresponding point (x_o, y_o).
The equations above become

    dx/dt = (∂x/∂r)(dr/dt) + (∂x/∂θ)(dθ/dt) = (cos θ) 0 − (r sin θ) 1
    dy/dt = (∂y/∂r)(dr/dt) + (∂y/∂θ)(dθ/dt) = (sin θ) 0 + (r cos θ) 1.

They show that the tangent vector to the image in the xy-plane of the θ-coordinate
line is the image of the θ-basis vector e_θ = (0, 1) under the differential of the
polar coordinate map x = r cos θ, y = r sin θ. The tangent vector to the image
of the r-coordinate line is similarly the image of the r-basis vector e_r = (1, 0).

[Fig. 2: the r- and θ-coordinate lines in the rθ-plane and their images in the xy-plane]

(c) Let z = f(r, θ) be a function given in polar coordinates. Then ∂z/∂x and
∂z/∂y are found by solving the equations

    ∂z/∂r = (∂z/∂x)(∂x/∂r) + (∂z/∂y)(∂y/∂r) = (∂z/∂x) cos θ + (∂z/∂y) sin θ
    ∂z/∂θ = (∂z/∂x)(∂x/∂θ) + (∂z/∂y)(∂y/∂θ) = (∂z/∂x)(−r sin θ) + (∂z/∂y) r cos θ,

which gives (e.g. using Cramer's Rule)

    ∂z/∂x = cos θ (∂z/∂r) − (sin θ / r)(∂z/∂θ)
    ∂z/∂y = sin θ (∂z/∂r) + (cos θ / r)(∂z/∂θ).
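The formulas for ∂z/∂x and ∂z/∂y can be tested numerically. This sketch (my illustration, with z = r² cos θ as a sample function) compares them with direct difference quotients in x and y:

```python
import numpy as np

z = lambda r, th: r ** 2 * np.cos(th)      # a sample z = f(r, θ)

r, th = 1.5, 0.4
h = 1e-6
dz_dr = (z(r + h, th) - z(r - h, th)) / (2 * h)
dz_dth = (z(r, th + h) - z(r, th - h)) / (2 * h)

# Formulas from the text:
dz_dx = np.cos(th) * dz_dr - np.sin(th) / r * dz_dth
dz_dy = np.sin(th) * dz_dr + np.cos(th) / r * dz_dth

# Direct check: view z as a function of (x, y) and differentiate numerically.
def z_xy(x, y):
    return z(np.hypot(x, y), np.arctan2(y, x))

x, y = r * np.cos(th), r * np.sin(th)
assert abs(dz_dx - (z_xy(x + h, y) - z_xy(x - h, y)) / (2 * h)) < 1e-4
assert abs(dz_dy - (z_xy(x, y + h) - z_xy(x, y - h)) / (2 * h)) < 1e-4
```

The 1/r factors are the ones produced by Cramer's Rule from the determinant r of the coefficient matrix.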

1.1.13 Corollary. Suppose f : Rn · · · → Rm and g : Rm · · · → Rn are inverse


functions, i.e. g ◦ f (x) = x and f ◦ g(y) = y whenever defined. Then dfx and
dgy are inverse linear transformations when y = f (x), i.e. x = g(y):

dgf (x) ◦ dfx = 1, dfg(y) ◦ dgy = 1.

This implies in particular that dfx : Rn → Rm is an invertible linear trans-


formation, and in particular n = m. If we write in coordinates y = f (x) as
y j = y j (x) and x = g(y) as xi = xi (y) this means that we have
    (∂y^i/∂x^j)(∂x^j/∂y^k) = δ^i_k  and  (∂x^i/∂y^j)(∂y^j/∂x^k) = δ^i_k.
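The relation dg_{f(x)} ∘ df_x = 1 can be tested on the polar coordinate map (anticipating Example 1.1.15); the following numerical sketch is mine, not the author's:

```python
import numpy as np

# f: (r, θ) → (x, y) and its local inverse g on a region with x > 0.
f = lambda p: np.array([p[0] * np.cos(p[1]), p[0] * np.sin(p[1])])
g = lambda q: np.array([np.hypot(q[0], q[1]), np.arctan2(q[1], q[0])])

def num_jac(F, p, h=1e-6):
    """Numerical Jacobian of F at p by central differences."""
    p = np.asarray(p, dtype=float)
    return np.column_stack([
        (F(p + h * e) - F(p - h * e)) / (2 * h)
        for e in np.eye(p.size)])

p = [2.0, 0.3]
q = f(np.array(p))
# dg at f(p) composed with df at p should be the identity matrix.
assert np.allclose(num_jac(g, q) @ num_jac(f, p), np.eye(2), atol=1e-4)
```

The two Jacobian matrices here are inverse to each other, which is the coordinate statement of the corollary.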
For us, the most important theorem of differential calculus is the following.
1.1.14 Theorem (Inverse Function Theorem). Let f : R^n · · · → R^n be a
C^k (k ≥ 1) map and x_o a point in its domain. If df_{x_o} : R^n → R^n is an
invertible linear transformation, then f maps some open neighbourhood U of
x_o one-to-one onto an open neighbourhood V of f(x_o) and the inverse map
g : V → U is also C^k.
We paraphrase the conclusion by saying that f is locally C^k-invertible at x_o. If
we write again y = f(x) as y^j = y^j(x), then the condition that df_{x_o} be invertible
means that det(∂y^i/∂x^j)_{x_o} ≠ 0. We shall give the proof of the Inverse Function
Theorem after we have looked at an example.
1.1.15 Example: polar coordinates in the plane. Consider the map
F(r, θ) = (x, y) from the rθ-plane to the xy-plane given by the equations

    x = r cos θ,  y = r sin θ.

The Jacobian determinant of this map is

    det | cos θ  −r sin θ |
        | sin θ   r cos θ |  =  r.

Thus det[∂(x, y)/∂(r, θ)] ≠ 0 except when r = 0, i.e. (x, y) = (0, 0). Hence
for any point (r_o, θ_o) with r_o ≠ 0 one can find a neighbourhood U of (r_o, θ_o)
in the rθ-plane and a neighbourhood V of (x_o, y_o) = (r_o cos θ_o, r_o sin θ_o) in the
xy-plane so that F maps U one-to-one onto V and F : U → V has a C^∞ inverse
G : V → U. It is obtained by solving x = r cos θ, y = r sin θ for (r, θ), subject
to the restriction (r, θ) ∈ U, (x, y) ∈ V. For x ≠ 0 the solution can be written
as

    r = ±√(x² + y²),  θ = arctan(y/x)

where the sign ± must be chosen so that (r, θ) lies in U. But this formula does
not work when x = 0, even though a local inverse exists as long as (x, y) ≠ (0, 0).
For example, for any point off the negative x-axis one can take for U the region
in the rθ-plane described by r > 0, −π < θ < π; V is then the corresponding
region in the xy-plane, which is just the set of points off the negative x-axis.
For any point off the positive x-axis one can take for U the region in the rθ-plane
described by r > 0, 0 < θ < 2π; V is then the corresponding region in the
xy-plane, which is just the set of points off the positive x-axis. In either case it
is geometrically clear that the map (r, θ) → (x, y) maps U one-to-one onto V.
For the proof of the Inverse Function Theorem we need a lemma, which is itself
very useful.
1.1.16 Contraction Lemma. Let U be an open subset of Rn and F : U → U
be any map of U into itself which is a contraction, i.e. there is K is a positive
constant ≤ 1 so that for all x, y ∈ U, kF (y) − F (x)k ≤ Kky − xk. Then F has a
unique fixed point xo in U , i.e. a point xo ∈ U so that F (xo ) = xo . Moreover,
for any x ∈ U ,

xo = lim_{n→∞} F^n(x)   (5)

where F^n = F ◦ ··· ◦ F is the n-th iterate of F.
Proof. Let x ∈ U. We show that {F^n(x)} converges. By Cauchy's convergence
criterion, it suffices to show that for any ε > 0 there is an N so that ‖F^n(x) −
F^m(x)‖ ≤ ε for n, m ≥ N. For this we may assume n ≥ m, so that we can write
n − m = r ≥ 0. Then

‖F^n(x) − F^m(x)‖ ≤ K‖F^{n−1}(x) − F^{m−1}(x)‖ ≤ ··· ≤ K^m ‖F^r(x) − x‖.   (6)

From this and the Triangle Inequality we get

‖F^r(x) − x‖ ≤ ‖F^r(x) − F^{r−1}(x)‖ + ‖F^{r−1}(x) − F^{r−2}(x)‖ + ··· + ‖F(x) − x‖
≤ (K^{r−1} + K^{r−2} + ··· + 1)‖F(x) − x‖
= ((K^r − 1)/(K − 1))‖F(x) − x‖.   (7)

Since 0 < K < 1 the fraction is bounded by 1/(1 − K), so the RHS of (6) tends
to 0 as m → ∞, as desired. Thus the limit xo in (5) exists for any given x ∈ U.
To see that xo is a fixed point, consider

‖F(F^n(x)) − F(xo)‖ ≤ K‖F^n(x) − xo‖.

As n → ∞, the LHS becomes ‖xo − F(xo)‖ while the RHS becomes K‖xo − xo‖ = 0,
leaving us with xo − F(xo) = 0. This proves the existence of a fixed point.
To prove uniqueness, suppose we also have F(x1) = x1. Then

‖xo − x1‖ = ‖F(xo) − F(x1)‖ ≤ K‖xo − x1‖,

which implies ‖xo − x1‖ = 0, since K < 1.
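The iteration (5) is easy to watch in practice. A minimal numerical sketch (the choice F = cos on [0, 1] is ours, for illustration; it is a contraction there since |F′(x)| = |sin x| ≤ sin 1 < 1):

```python
import math

# Iterating a contraction F from an arbitrary starting point converges to
# the unique fixed point x_o = lim F^n(x), as in the Contraction Lemma.

def iterate(F, x, n):
    for _ in range(n):
        x = F(x)
    return x

x_o = iterate(math.cos, 0.5, 100)   # F = cos maps [0, 1] into itself
print(x_o)                          # the fixed point, approximately 0.739085
print(abs(math.cos(x_o) - x_o))     # essentially 0: F(x_o) = x_o
```

By (7), the error after n steps is bounded by K^n/(1 − K) times the first step, so convergence is geometric.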
Proof of the Inverse Function Theorem. The idea of the proof is this. To
find an inverse for f we have to solve the equation y = f (x) for x if y is given:
x = f −1 (y). We rewrite this equation as

y + x − f (x) = x,

which says that x is a fixed point of the map h(x) := y + x − f (x), y being
given. So we’ll try to show that this h is a contraction map. The proof itself
takes several steps.
(1) Preliminaries. We may assume xo = 0 and f(xo) = 0, by composing f with
the translations x → x − xo and y → y − f(xo). Next we may assume that df_0 = 1, the identity map, by replacing
20 CHAPTER 1. MANIFOLDS

f by (df_0)^{-1} ◦ f and using 1.1.12. We set

g(x) = x − f(x).

Then dg_0 = 1 − df_0 = 0.


(2) Claim. There is r > 0 so that for ‖x‖ ≤ 2r we have

‖dg_x(v)‖ ≤ (1/2)‖v‖   (8)

for all v. Check. ‖dg_x(v)‖ ≤ ‖dg_x‖ ‖v‖. For x = 0 we have ‖dg_x‖ = 0, so
by continuity we can make ‖dg_x‖ arbitrarily small, and in particular ≤ 1/2, by
making ‖x‖ sufficiently small, which we may write as ‖x‖ ≤ 2r.
(3) Claim. g(x) = x − f(x) maps the ball B_r = {x | ‖x‖ ≤ r} into B_{r/2}. Check.
Since g(0) = 0, the Fundamental Theorem of Calculus gives

g(x) = ∫₀¹ (d/dt) g(tx) dt = ∫₀¹ dg_{tx}(x) dt,

hence from the triangle inequality for integrals and (8),

‖g(x)‖ ≤ ∫₀¹ ‖dg_{tx}(x)‖ dt ≤ (1/2)‖x‖ ≤ r/2.

(4) Claim. Given y ∈ B_{r/2} there is a unique x ∈ B_r so that f(x) = y. Check.
Fix y ∈ B_{r/2} and consider

h(x) = y + g(x)

with x ∈ B_r. Since ‖y‖ ≤ r/2 and ‖x‖ ≤ r we have ‖h(x)‖ ≤ ‖y‖ + ‖g(x)‖ ≤ r.
So h maps B_r into itself. Let x(t) = x1 + t(x2 − x1). By the Fundamental
Theorem of Calculus we have

h(x2) − h(x1) = ∫₀¹ (d/dt) g(x(t)) dt = ∫₀¹ dg_{x(t)}(x2 − x1) dt.

Together with (8) this gives

‖h(x2) − h(x1)‖ ≤ ∫₀¹ ‖dg_{x(t)}(x2 − x1)‖ dt ≤ ∫₀¹ (1/2)‖x2 − x1‖ dt = (1/2)‖x2 − x1‖.

Thus h : B_r → B_r is a contraction map with K = 1/2 in the Contraction
Lemma. It follows that h has a unique fixed point, i.e. for any y ∈ B_{r/2} there
is a unique x ∈ B_r so that y + g(x) = x, which amounts to y = f(x). This
proves the claim. Given y ∈ B_{r/2} write x = f^{-1}(y) for the unique x ∈ B_r
satisfying f(x) = y. Thus f^{-1} : B_{r/2} → f^{-1}(B_{r/2}) ⊂ B_r is an inverse of
f : f^{-1}(B_{r/2}) → B_{r/2}.
(5) Claim. f^{-1} is continuous. Check. Since g(x) = x − f(x) we have x =
g(x) + f(x). Thus

‖x1 − x2‖ = ‖f(x1) − f(x2) + g(x1) − g(x2)‖
≤ ‖f(x1) − f(x2)‖ + ‖g(x1) − g(x2)‖
≤ ‖f(x1) − f(x2)‖ + (1/2)‖x1 − x2‖,

so ‖x1 − x2‖ ≤ 2‖f(x1) − f(x2)‖, i.e. ‖f^{-1}(y1) − f^{-1}(y2)‖ ≤ 2‖y1 − y2‖. Hence
f^{-1} is continuous.
(6) Claim. f^{-1} is differentiable on the open ball U = {y | ‖y‖ < r/2}. Check.
Let y, y1 ∈ U. Then y = f(x), y1 = f(x1) with x, x1 ∈ B_r. We have

f^{-1}(y) − f^{-1}(y1) − (df_{x1})^{-1}(y − y1) = x − x1 − (df_{x1})^{-1}(f(x) − f(x1)).   (9)

Since f is differentiable,

f(x) = f(x1) + df_{x1}(x − x1) + o(x − x1).

Substituting this into the RHS of (9) we find

(df_{x1})^{-1}(o(x − x1)).   (10)

Since the linear map df_{x1} depends continuously on x1 ∈ B_r and det df_{x1} ≠ 0
there, its inverse (df_{x1})^{-1} is continuous there as well. (Think of Cramer's
formula for A^{-1}.) Hence we can apply the argument of step (2) to (10) and
find that

‖(df_{x1})^{-1}(o(x − x1))‖ ≤ C‖o(x − x1)‖   (11)

for x, x1 ∈ B_r. We also know that

‖x − x1‖ ≤ 2‖y − y1‖.

Putting these inequalities together we find from (9) that

f^{-1}(y) − f^{-1}(y1) − (df_{x1})^{-1}(y − y1) = o(y − y1)

for another little o. This proves that f^{-1} is differentiable at y1 with (df^{-1})_{y1} =
(df_{x1})^{-1}.
(7) Conclusion of the proof. Since f is of class C^1 the matrix entries ∂f^i/∂x^j
of df_x are continuous functions of x, hence the matrix entries of

(df^{-1})_y = (df_x)^{-1}   (12)

are continuous functions of y. (Think again of Cramer's formula for A^{-1} and
recall that x = f^{-1}(y) is a continuous function of y.) This proves the theorem
for k = 1. For k = 2, we have to show that (12) is still C^1 as a function of y.
As a function of y, the RHS of (12) is a composite of the maps y → x = f^{-1}(y),
x → df_x, and A → A^{-1}. The first we just proved to be C^1; the second is C^1
because f is C^2; the third is C^1 by Cramer's formula for A^{-1}. This proves the
theorem for k = 2, and the proof for any k follows from the same argument by
induction.
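The proof is constructive: to compute f^{-1}(y) one simply iterates the contraction h(x) = y + x − f(x). A numerical sketch (the particular f, with df_0 close to the identity so that no normalization step is needed, is our choice for illustration):

```python
import math

# Solve y = f(x) by iterating h(x) = y + x - f(x), as in the proof above.
# Here g(x) = x - f(x) = -0.1*sin(x), so h is a contraction with K = 0.1.

def f(x):
    return x + 0.1 * math.sin(x)

def f_inverse(y, n=50):
    x = y                     # starting point; any point of the ball works
    for _ in range(n):
        x = y + x - f(x)      # the map h(x) = y + g(x)
    return x

y = 0.3
x = f_inverse(y)
print(f(x))   # reproduces y = 0.3
```

Each iteration shrinks the error by the factor K, so a few dozen steps already reach machine precision here.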
EXERCISES 1.1
1. Let V be a vector space, {v_i} and {ṽ_i} two bases for V. Prove (summation
convention throughout):

(a) There is an invertible matrix (a^i_j) so that ṽ_j = a^i_j v_i.


(b) The components (x^i) and (x̃^j) of a vector v ∈ V are related by the equation
x^i = a^i_j x̃^j. [Make sure your proof is complete: you must prove that the x^i have
to satisfy this relation.]
(c) The components (ξ_i) and (ξ̃_j) of a linear functional ϕ ∈ V* with
respect to the dual bases are related by the equation ξ̃_j = a^i_j ξ_i. [Same advice.]
2. Suppose A, B : Rn → Rn are linear transformations of Rn which are inverses
of each other. Prove from the definitions that their matrices satisfy a^j_k b^k_i = δ^j_i.
3. Let A : V → W be a linear map. (a) Suppose A is injective. Show that there
is a basis of V and a basis of W so that in terms of components x → y := A(x)
becomes (x1 , · · · , xn ) → (y 1 , · · · , y n , · · · , y m ) := (x1 , · · · , xn , 0, · · · , 0).
(b) Suppose A is surjective. Show that there is a basis of V and a basis of W so
that in terms of components x → y := A(x) becomes (x1 , · · · , xm , · · · , xn ) →
(y 1 , · · · , y m ) := (x1 , · · · , xm ).
Note. Use these bases to identify V ≈ Rn and W ≈ Rm . Then (a) says that
A becomes the inclusion Rn → Rm (n ≤ m) and (b) says that A becomes the
projection Rn → Rm (n ≥ m). Suggestion. In (a) choose a basis {v1, · · · , vn}
for V and {w1, · · · , wn, wn+1, · · · , wm} for W so that wj = A(vj) for j ≤ n.
Explain how. In (b) choose a basis {v1, · · · , vm, vm+1, · · · , vn} for V so that
A(vj) = 0 iff j ≥ m + 1. Show that {w1 = A(v1), · · · , wm = A(vm)} is a basis
for W . You may wish to use the fact that any linearly independent subset of a
vector space can be completed to a basis, which you can look up in your linear
algebra text.
4. Give a detailed proof of the equation (a∗ )ji = aji stated in the text. Explain
carefully how it is related to the equation (a∗ )ij = aji mentioned there. Start
by writing out carefully the definitions of the four quantities (a∗ )ji , aji , (a∗ )ij ,
aji .
5. Deduce the corollary 1.1.8 from the Chain Rule.
6. Let f : (ρ, θ, φ) → (x, y, z) be the spherical coordinate map:
x = ρ cos θ sin φ, y = ρ sin θ sin φ, z = ρ cos φ.
(a) Find dx, dy, dz in terms of dρ, dθ, dφ. (b) Find ∂f/∂ρ, ∂f/∂θ, ∂f/∂φ in
terms of ∂f/∂x, ∂f/∂y, ∂f/∂z.

7. Let f : (ρ, θ, φ) → (x, y, z) be the spherical coordinate map as above.


(a) Determine all points (ρ, θ, φ) at which f is locally invertible. (b) For each
such point specify an open neighbourhood, as large as possible, on which f is
one-to-one. (c) Prove that f is not locally invertible at the remaining point(s).
8. (a) Show that the volume V (u, v, w) of the parallelepiped spanned by three
vectors u, v, w in R3 is the absolute value of u · (v × w). [Use a sketch and the
geometric properties of the dot product and the cross product you know from
your first encounter with vectors as little arrows.]
(b) Show that u · (v × w) = |u, v, w|, the determinant of the matrix of the
components of u, v, w with respect to the standard basis e1, e2, e3 of R3. Deduce

that for any linear transformation A of R3 , V (Au, Av, Aw) = | det A|V (u, v, w).
Show that this formula remains valid for the matrix of components with respect
to any right–handed orthonormal basis u1 , u2 , u3 of R3 . [Right–handed means
det[u1 , u2 , u3 ] = +1.]
9. Let f : (ρ, θ, φ) → (x, y, z) be the spherical coordinate map. [See problem
6.] For a given point (ρo, θo, φo) in ρθφ-space, let v_ρ, v_θ, v_φ be the vectors at
the point (xo, yo, zo) in xyz-space which correspond to the three standard basis
vectors e_ρ, e_θ, e_φ = (1, 0, 0), (0, 1, 0), (0, 0, 1) in ρθφ-space under the differential
df of f.
(a) Show that v_ρ is the tangent vector of the ρ–coordinate curve in R3, i.e. the
curve with parametric equation ρ = arbitrary (parameter), θ = θo, φ = φo in
spherical coordinates. Similarly for v_θ and v_φ. Sketch.
(b) Find the volume of the parallelepiped spanned by the three vectors vρ , vθ , vφ
at (x, y, z). [See problem 8.]
10. Let f, g be two differentiable functions Rn → R. Show from the definition
(1.4) of df that
d(f g) = (df )g + f (dg).
11. Suppose f : Rn → R is C². Show that the second partials are symmetric:
∂²f/∂x^i∂x^j = ∂²f/∂x^j∂x^i for all i, j.
[You may consult your calculus text.]
12. Use the definition of df to calculate the differential df_x(v) for the following
functions f(x). [Notation: x = (x^i), v = (v^i).]
(a) Σ_i c_i x^i (c_i = constant)  (b) x^1 x^2 ··· x^n  (c) Σ_i c_i (x^i)².

13. Consider det(X) as a function of X = (X_ij) ∈ R^{n×n}.


a) Show that the differential of det at X = 1 (identity matrix) is the trace, i.e.
d det_1(A) = tr A. [The trace of A is defined as tr A = Σ_i A_ii. Suggestion. Expand
det(1 + tA) using the definition (1) of det to first order in t, i.e. omitting terms
involving higher powers of t. Explain why this solves the problem. If you can't
do it in general, work it out at least for n = 2, 3.]
b) Show that for any invertible matrix X ∈ R^{n×n}, d det_X(A) = det(X) tr(X^{-1}A).
[Suggestion. Consider a curve X(t) and write det(X(t)) = det(X(to)) det(X(to)^{-1}X(t)).
Use the Chain Rule 1.1.7.]
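Part (a) can be checked numerically before proving it: for small t, det(1 + tA) ≈ 1 + t tr A. A sketch for n = 2 (the matrix A is an arbitrary example of ours):

```python
# Finite-difference check of d det_1(A) = tr A for 2x2 matrices.

def det2(m):
    (a, b), (c, d) = m
    return a * d - b * c

A = [[1.0, 2.0], [3.0, -4.0]]
t = 1e-6
one_plus_tA = [[1 + t * A[0][0], t * A[0][1]],
               [t * A[1][0], 1 + t * A[1][1]]]
# directional derivative of det at the identity in the direction A
directional = (det2(one_plus_tA) - det2([[1.0, 0.0], [0.0, 1.0]])) / t
trace = A[0][0] + A[1][1]
print(directional, trace)   # both approximately -3.0
```

For n = 2 one even sees the exact expansion det(1 + tA) = 1 + t tr A + t² det A, so the difference quotient differs from tr A only by t det A.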
14. Let f(x^1, · · · , x^n) be a differentiable function satisfying f(tx^1, · · · , tx^n) =
t^N f(x^1, · · · , x^n) for all t > 0 and some (fixed) N. Show that Σ_i x^i ∂f/∂x^i = N f.
[Suggestion. Chain Rule.]


15. Calculate the Jacobian matrices for the polar coordinate map (r, θ) →
(x, y) and its inverse (x, y) → (r, θ), given by x = r cos θ, y = r sin θ and
r = √(x² + y²), θ = arctan(y/x). Verify by direct calculation that these matrices
are inverses of each other, as asserted in corollary (1.1.13).
16. Consider an equation of the form f (x, y) = 0 where f (x, y) is a C ∞
function of two variables. Let (xo , yo ) be a particular solution of this equation
for which (∂f/∂y)_{(xo,yo)} ≠ 0. Show that the given equation has a unique solution
y = y(x) for y in terms of x, provided (x, y) is restricted to lie in a suitable
neighbourhood of (xo, yo). [Suggestion. Apply the Inverse Function Theorem
to F(x, y) := (x, f(x, y)).]
17. Consider a system of k equations in n = m+k variables, F 1 (x1 , · · · , xn ) = 0,
· · · , F^k(x^1, · · · , x^n) = 0 where the F^i are C∞ functions. Suppose det[∂F^i/∂x^{m+j} |
1 ≤ i, j ≤ k] ≠ 0 at a point po ∈ Rn. Show that there is a neighbourhood U of
po in Rn so that if (x1 , · · · , xn ) is restricted to lie in U then the equations admit
a unique solution for xm+1 , · · · , xn in terms of x1 , · · · , xm . [This is called the
Implicit Function Theorem.]
18. (a) Let t → X(t) be a differentiable map from R into the space R^{n×n} of
n × n matrices. Suppose that X(t) is invertible for all t. Show that t → X(t)^{-1}
is also differentiable and that

dX^{-1}/dt = −X^{-1} (dX/dt) X^{-1},

where we omitted the variable t from the notation. [Suggestion. Consider the
equation XX^{-1} = 1.]
(b) Let f be the inversion map of R^{n×n} to itself, given by f(X) = X^{-1}. Show
that f is differentiable and that df_X(Y) = −X^{-1} Y X^{-1}.
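The formula in 18(b) can likewise be tested by a finite difference: (f(X + tY) − f(X))/t should approach −X^{-1}YX^{-1} as t → 0. A 2×2 sketch (the matrices X, Y are arbitrary choices of ours):

```python
# Compare ((X + tY)^{-1} - X^{-1})/t against -X^{-1} Y X^{-1} for small t.

def inv2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def mul2(m, n):
    return [[sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

X = [[2.0, 1.0], [0.0, 3.0]]
Y = [[1.0, -1.0], [2.0, 0.5]]
t = 1e-6

X_plus_tY = [[X[i][j] + t * Y[i][j] for j in range(2)] for i in range(2)]
numeric = [[(inv2(X_plus_tY)[i][j] - inv2(X)[i][j]) / t for j in range(2)]
           for i in range(2)]
exact = [[-mul2(inv2(X), mul2(Y, inv2(X)))[i][j] for j in range(2)]
         for i in range(2)]
print(numeric)
print(exact)   # the two matrices agree to roughly 6 decimal places
```

The suggestion XX^{-1} = 1 explains the sign: differentiating gives X'X^{-1} + X(X^{-1})' = 0, hence (X^{-1})' = −X^{-1}X'X^{-1}.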

1.2 Manifolds: definitions and examples


The notion of “manifold” took a long time to crystallize, certainly hundreds of
years. What a manifold should be is clear: a kind of space to which one can
apply the methods of differential calculus, based on local linear approximations
for functions. Riemann struggled with the concept in his lecture of 1854.
Weyl is credited with its final formulation, worked out in his book on Riemann
surfaces of 1913, in a form adapted to that context. In his Leçons of 1925–1926
Cartan still finds that "la notion générale de variété est assez difficile à définir
avec précision" (the general notion of a manifold is rather difficult to define
precisely) and goes on to give essentially the definition offered below. As
is often the case, all seems obvious once the problem is solved. We give the
definition in three axioms, à la Euclid.
1.2.1 Definition. An n–dimensional C ∞ manifold M consists of points together
with coordinate systems. These are subject to the following axioms.
MAN 1. Each coordinate system provides a one-to-one correspondence between
the points p in a certain subset U of M and the points x = x(p) in an open
subset of Rn .
MAN 2. The coordinate transformation x̃(p) = f (x(p)) relating the coordinates
of the same point p ∈ U ∩ Ũ in two different coordinate systems (xi ) and (x̃i )
is given by a C ∞ map

x̃i = f i (x1 , · · · , xn ), i = 1, · · · , n.

The domain {x(p) | p ∈ U ∩ Ũ } of f is required to be open in Rn .


MAN 3. Every point of M lies in the domain of some coordinate system.
Fig. 1 shows the maps p → x(p), p → x̃(p), and x → x̃ = f(x).

Fig. 1

In this axiomatic definition of “manifold“, points and coordinate systems func-


tion as primitive notions, about which we assume nothing but the axioms MAN
1–3. The axiom MAN 1 says that a coordinate system is entirely specified by a
certain partially defined map M · · · →Rn ,

p → x(p) = (x1 (p), · · · , xn (p))



and we shall take “coordinate system” to mean this map, in order to have
something definite. But one should be a bit flexible with this interpretation: for
example, in another convention one could equally well take “coordinate system“
to mean the inverse map Rn · · · → M, x → p(x) and when convenient we shall
also use the notation p(x) for the point of M corresponding to the coordinate
point x.
It will be noted that we use the same symbol x = (xi ) for both the coordinate
map p → x(p) and a general coordinate point x = (x1 , · · · , xn ), just as one
does with the xyz-coordinates in calculus. This is premeditated confusion: it
leads to a notation which tells one what to do (just like the ∂y i /∂xj in calculus)
and suppresses extra symbols for the coordinate map and its domain, which
are usually not needed. If it is necessary to have a name for the coordinate
domain we can write (U, x) for the coordinate system and if there is any danger
of confusion because of the double meaning of the symbol x (coordinate map
and coordinate point) we can write (U, φ) instead. With the notation (U, φ)
comes the term chart for coordinate system and the term atlas for any collection
{(Uα , φα )} of charts satisfying MAN 1–3. In any case, one should always keep
in mind that a manifold is not just a set of points, but a set of points together
with an atlas, so that one should strictly speaking consider a manifold as a
pair (M, {(Uα , φα )}). But we usually call M itself a manifold, and speak of
its manifold structure when it is necessary to remind ourselves of the atlas
{(Uα , φα )}. Here are some examples to illustrate the definition.
1.2.2 Example: the Cartesian plane. As the set of points we take R2 = {p =
(x, y) | x, y ∈ R}. As it stands, this set of points is not a manifold: we have to
specify a collection of the coordinate systems satisfying the axioms MAN 1–3.
Each coordinate system will be a map p → (x1 (p), x2 (p)), which maps a domain
of points p, to be specified, one–to–one onto an open set of pairs (x1 , x2 ). We
list some possibilities.
Cartesian coordinates (x, y): x(p) = x, y(p) = y if p = (x, y). Domain: all p.
Note again that x, y are used to denote the coordinates of a general point p as
well as to denote the coordinate functions p → x(p), y(p).
Polar coordinates (r, θ): x = r cos θ, y = r sin θ. Domain: (r, θ) may be used as
coordinates in a neighbourhood of any point where ∂(x, y)/∂(r, θ) = r ≠ 0, i.e.
in a neighbourhood of any point except the origin. A possible domain is the
set of points p for which r > 0, 0 < θ < 2π, the p = (x, y) given by the above
equations for these values of r, θ; these are just the p's off the positive x-axis.
Other domains are possible and sometimes more convenient.
Hyperbolic coordinates (u, ψ): x = u cosh ψ, y = u sinh ψ. Domain: The pairs
(x, y) representable in this form are those satisfying x² − y² = u² ≥ 0. (u, ψ)
may be used as coordinates in a neighbourhood of any point (x, y) in this region
for which det ∂(x, y)/∂(u, ψ) = u ≠ 0. The whole region {(x, y) | x² − y² > 0}
corresponds one-to-one to {(u, ψ) | u ≠ 0} and can serve as coordinate domain.
In the region {(x, y) | x2 − y 2 < 0} one can use x = u sinh ψ, y = u cosh ψ as
coordinates.
 
Linear coordinates (u, v): u = ax + by, v = cx + dy, where
[ a  b ]
[ c  d ]
is any invertible 2 × 2 matrix (i.e. ad − bc ≠ 0). A special case are orthogonal
linear coordinates, when the matrix is orthogonal. Domain: all p.

Affine coordinates (u, v): u = xo + ax + by, v = yo + cx + dy, where xo, yo are
arbitrary and
[ a  b ]
[ c  d ]
is any invertible 2 × 2 matrix (i.e. ad − bc ≠ 0). A special case are Euclidean
coordinates, when the matrix is orthogonal. Domain: all p.
As we mentioned, these are some possible choices for coordinate systems on
R2 . The question is which to choose for the collection of coordinate systems
required in MAN 1–3. For example, the Cartesian coordinate system {(x, y)}
by itself evidently satisfies the axioms MAN 1–3, hence this single coordinate
system suffices to make R2 into a manifold. On the other hand, we can add
the polar coordinates (r, θ), with domain r > 0, 0 < θ < 2π, say, so that
we take as our collection of coordinates {(x, y), (r, θ)}. We now have to check
that the axioms are satisfied. The only thing which is not entirely clear is that
the coordinate transformation (x, y) → (r, θ) = (f 1 (x, y), f 2 (x, y)) is C ∞ on its
domain, as specified by MAN 2. First of all, the domain of this map consists
of the (x, y) off the positive x–axis {(x, y) | y = 0, x ≥ 0}, which corresponds
to {(r, θ) | r > 0, 0 < θ < 2π}. Thus this domain is open, and it is clear
geometrically that (x, y) → (r, θ) is a one–to–one correspondence between these
sets. However, the map (x, y) → (r, θ) is not given explicitly; it is
defined implicitly as the inverse of the map (r, θ) → (x, y) = (r cos θ, r sin θ).
We might try to simply write down the inverse map as

r = f^1(x, y) = √(x² + y²), θ = f^2(x, y) = arctan(y/x).

But this is not sufficient: as it stands, f 2 (x, y) is not defined when x = 0, but
some of these points belong to the domain of (x, y) → (r, θ). It is better to argue
like this. As already mentioned, we know that the inverse map f exists on the
domain indicated. Since the inverse of a map between two given sets is unique
(if it exists) this map f must coincide with the local inverse F guaranteed by
the Inverse Function Theorem wherever both are defined as maps
between the same open sets. But the Inverse Function Theorem says also that
F is C∞ where defined. Hence f is then C∞ as well at points where some such
F can be found, i.e. at all points of its domain where the Jacobian determinant
∂(x, y)/∂(r, θ) ≠ 0, i.e. r ≠ 0, and this includes all points in the domain of f.
Hence all three axioms are satisfied for M = R2 together with {(x, y), (r, θ)}.
Perhaps I belabored too much the point of verifying MAN 2 for the implicitly
defined map (x, y) → (r, θ), but it illustrates a typical situation and a typical
argument. In the future I shall omit these details. There is another important
point. Instead of specifying the domain of the coordinates (r, θ) by r > 0, 0 <
θ < 2π, it is usually sufficient to know that the equations x = r cos θ, y = r sin θ
can be used to specify coordinates in some neighbourhood of any point (x, y)
where r ≠ 0, as guaranteed by the Inverse Function Theorem. For example,

we could admit all of these local specifications of (r, θ) among our collection
of coordinates (without specifying explicitly the domains), and then all of the
axioms are evidently satisfied. This procedure is actually most appropriate,
since it brings out the flexibility in the choice of the coordinates. It is also
the procedure implicitly followed in calculus, where one ignores the restrictions
r > 0, 0 < θ < 2π as soon as one discusses a curve like r = sin 2θ.
We could add some or all of the other coordinates defined above and check
again that the axioms still hold and we face again the question which to choose.
One naturally has the feeling that it should not matter, but strictly speaking,
according to the definition, we would get a different manifold structure on the
point set R2 for each choice. We come back to this in a moment, but first
I briefly look at some more examples, without going through all the detailed
verifications, however.

1.2.3 Example: Cartesian 3-space R3 = {(x, y, z) | x, y, z ∈ R}. I only


write down the formulas and leave the discussion to you, in particular the de-
termination of the coordinate domains.
Cartesian coordinates (x, y, z): x(p) = x, y(p) = y, z(p) = z.
Cylindrical coordinates (r, θ, z): x = r cos θ, y = r sin θ, z = z.
Spherical coordinates (ρ, θ, φ): x = ρ cos θ sin φ, y = ρ sin θ sin φ, z = ρ cos φ.
Hyperbolic coordinates (u, θ, ψ): x = u cos θ sinh ψ, y = u sin θ sinh ψ, z =
u cosh ψ.
Linear coordinates (u,v, w): (u,v, w) = (x, y, z)A, A any invertible 3×3 matrix.
Special case: orthogonal linear coordinates: AAT = I.
Affine coordinates: (u,v, w) = (xo , yo , zo ) + (x, y, z)A, (xo , yo , zo ) any point, A
any invertible 3×3 matrix. Special case: Euclidean coordinates: AAT = I.


Fig. 2. Spherical coordinates

1.2.4 Cartesian n-space Rn = {p = (x1 , · · · , xn ) | xi ∈ R}.


Linear coordinates (y^1, · · · , y^n): y^j = a^j_i x^i, det(a^j_i) ≠ 0.
Affine coordinates (y^1, · · · , y^n): y^j = x^j_o + a^j_i x^i, det(a^j_i) ≠ 0.
Euclidean coordinates (y^1, · · · , y^n): y^j = x^j_o + a^j_i x^i, Σ_k a^i_k a^j_k = δ^{ij}.

It can be shown that the transformations (xi ) → (y i ) defining the Euclidean


coordinates are the only ones which preserve Euclidean distance. They are
called Euclidean transformations. Instead of Rn one can start out with any

n–dimensional real vector space V . Given a basis e1 , · · · , en of V , define linear


coordinates (x^i) on V by using the components x^i = x^i(p) of p =
x^1 e_1 + · · · + x^n e_n as coordinates. This single coordinate system makes V into a manifold. This
manifold structure on V does not depend on the basis in a sense to be explained
after definition 1.2.8.
1.2.5 The 2-sphere S² = {p = (x, y, z) | x² + y² + z² = 1}. Note first of all
that the triple (x, y, z) itself cannot be used as a coordinate system on S 2 . So
we have to do something else.
Parallel projection coordinates (x, y): x = x, y = y, z = √(1 − x² − y²). Domain: z > 0.

Similar with the negative root for z < 0 and with (x, y) replaced by (x, z) or by
(y, z). This gives 6 coordinate systems, corresponding to the parallel projections
onto the 3 coordinate planes and the 2 possible choices of sign ± in each case.
Geographical coordinates (θ, φ): x = cos θ sin φ, y = sin θ sin φ, z = cos φ. Do-
main: 0 < θ < 2π, 0 < φ < π. (Other domains are possible.)
Central projection coordinates (u, v):

x = u/√(1 + u² + v²), y = v/√(1 + u² + v²), z = 1/√(1 + u² + v²). Domain: z > 0.


Fig. 3. Central projection

This is the central projection of the upper hemisphere z > 0 onto the plane
z = 1. One could also take the lower hemisphere or replace the plane z = 1
by x = 1 or by y = 1. This gives again 6 coordinate systems. The 6 parallel
projection coordinates by themselves suffice to make S 2 into a manifold, as do
the 6 central projection coordinates. The geographical coordinates do not, even
if one takes all possible domains for (θ, φ): the north pole (0, 0, 1) never lies in
a coordinate domain. However, if one defines another geographical coordinate
system on S 2 with a different north pole (e.g. by replacing (x, y, z) by (z, y, x)
in the formula) one obtains enough geographical coordinates to cover all of S 2 .
All of the above coordinates could be defined with (x, y, z) replaced by any
orthogonal linear coordinate system (u, v, w) on R3 .
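The compatibility these charts must satisfy (axiom MAN 2 below) can be made concrete: on the upper hemisphere, the transition from parallel-projection coordinates (x, y) to central-projection coordinates (u, v) is (u, v) = (x/z, y/z) with z = √(1 − x² − y²), a C∞ map on the open disk x² + y² < 1. A numerical round-trip sketch (function names are ours):

```python
import math

# Transition maps between two charts on the upper hemisphere of S^2:
# parallel projection (x, y) and central projection (u, v).

def parallel_to_central(x, y):
    z = math.sqrt(1 - x * x - y * y)   # the point is (x, y, z) with z > 0
    return (x / z, y / z)

def central_to_parallel(u, v):
    w = math.sqrt(1 + u * u + v * v)   # so z = 1/w on this chart
    return (u / w, v / w)

x, y = 0.3, -0.4
u, v = parallel_to_central(x, y)
x2, y2 = central_to_parallel(u, v)
print((x2, y2))   # recovers (0.3, -0.4) up to rounding
```

Both transition maps are built from square roots of positive quantities and divisions by nonzero quantities, which is why they are C∞ on their open domains.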

Let’s now go back to the definition of manifold and try understand what it is
trying to say. Compare the situation in analytic geometry. There one starts
with some set of points which comes equipped with a distinguished Cartesian
coordinate system (x1 , · · · , xn ), which associates to each point p of n–space a
coordinate n–tuple (x1 (p), · · · , xn (p)). Such a coordinate system is assumed
fixed once and for all, so that we may as well the points to be the n–tuples,
which means that our “Cartesian space“ becomes Rn , or perhaps some subset
of Rn , which better be open if we want to make sense out differentiable func-
tions. Nevertheless, but one may introduce other curvilinear coordinates (e.g.
polar coordinates in the plane) by giving the coordinate transformation to the
Cartesian coordinates (e.g x =rcos θ, y =rsin θ). Curvilinear coordinates are
not necessarily defined for all points (e.g. polar coordinates are not defined at
the origin) and need not take on all possible values (e.g. polar coordinates are
restricted by r ≥ 0, 0 ≤ θ < 2π). The requirement of a distinguished Cartesian
coordinate system is of course often undesirable or physically unrealistic, and
the notion of a manifold is designed to do away with this requirement. However,
the axioms still require that there be some collection of coordinates. We shall
return in a moment to the question in how far this collection of coordinates
should be thought of as intrinsically distinguished.
The essential axiom is MAN 2, that any two coordinate systems should be
related by a differentiable coordinate transformation (in fact even C ∞ , but this
is a comparatively minor, technical point). This means that manifolds must
locally look like Rn as far as a differentiable map can “see“: all local properties
of Rn which are preserved by differentiable maps must apply to manifolds as
well. For example, some sets which one should expect to turn out to be manifolds
are the following. First and foremost, any open subset of some Rn , of course;
smooth curves and surfaces (e.g. a circle or a sphere); the set of all rotations of
a Euclidean space (e.g. in two dimensions, a rotation is just by an angle and this
set looks just like a circle). Some sets one should not expect to be manifolds
are: a half–line (with an endpoint) or a closed disk in the plane (because of
the boundary points); a figure–8 type of curve or a cone with a vertex; the
set of the possible rotations of a steering wheel (because of the “singularity”
when it doesn’t turn any further. But if we omit the offending singular points
from these non–manifolds, the remaining sets should still be manifolds.) These
examples should give some idea of what a manifold is supposed to be, but they
may be misleading, because they carry additional structure, in addition to their
manifold structure. For example, for a smooth surface in space, such as a
sphere, we may consider the length of curves on it, or the variation of its normal
direction, but these concepts (length or normal direction) do not come from
its manifold structure, and do not make sense on an arbitrary manifold. A
manifold (without further structure) is an amorphous thing, not really a space
with a geometry, like Euclid’s. One should also keep in mind that a manifold is
not completely specified by naming a set of points: one also has to specify the
coordinate systems one considers. There may be many natural ways of doing
this (and some unnatural ones as well), but to start out with, the coordinates
have to be specified in some way.

Let’s think a bit more about MAN 2 by recalling the meaning of “differentiable“:
a map is differentiable if it can be approximated by a linear map to first order
around a given point. We shall see later that this imposes a certain kind of
structure on the set of points that make up the manifolds, a structure which
captures the idea that a manifold can in some sense be approximated to first
order by a linear space. “Manifolds are linear in infinitesimal regions” as classical
geometers would have said.
One should remember that the definition of “differentiable“ requires that the
function in question be defined in a whole neighbourhood of the point in ques-
tion, so that one may take limits from any direction: the domain of the function
must be open. As a consequence, the axioms involve “openness” conditions,
which are not always in agreement with the conventions of analytic geometry.
(E.g. polar coordinates must be restricted by r > 0, 0 < θ < 2π; strict
inequalities!). I hope that this lengthy discussion will clarify the definition,
although I realize that it may do just the opposite.
As you can see, the definition of “manifold“ is really very simple, much shorter
than the lengthy discussion around it; but I think you will be surprised at the
amount of structure hidden in the axioms. The first item is this.
1.2.6 Definition. A neighbourhood of a point po in M is any subset of M
containing all p whose coordinate points x(p) in some coordinate system satisfy
‖x(p) − x(po)‖ < ε for some ε > 0. A subset of M is open if it contains a
neighbourhood of each of its points, and closed if its complement is open.
This definition makes a manifold into what is called a topological space: there
is a notion of “neighbourhood“ or (equivalently) of “open set”. An open subset
U of M , together with the coordinate systems obtained by restriction to U , is
again an n–dimensional manifold.
1.2.7 Definition. A map F : M → N between manifolds is of class C^k,
0 ≤ k ≤ ∞, if F maps any sufficiently small neighbourhood of a point of M into
the domain of a coordinate system on N and the equation q = F(p) defines a
C^k map when p and q are expressed in terms of coordinates:

y j = F j (x1 , · · · , xn ), j = 1, · · · , m.

We can summarize the situation in a commutative diagram like this:


            F
    M     →     N
 (x^i) ↓         ↓ (y^j)
   R^n    →     R^m
           (F^j)
This definition is independent of the coordinates chosen (by axiom MAN 2) and
applies equally to a map F : M · · · → N defined only on an open subset of M.
A C ∞ map F : M → N which is bijective and whose inverse is also C ∞ is called
a diffeomorphism of M onto N .
Think again about the question whether the axioms capture the idea of a manifold
as discussed above. For example, an open subset of Rn together with its single
Cartesian coordinate system is a manifold in the sense of the definition above,

and if we use another coordinate system on it, e.g. polar coordinates in the
plane, we would strictly speaking have another manifold. But this is not what
is intended. So we extend the notion of “coordinate system“ as follows.
1.2.8 Definition. A general coordinate system on M is any diffeomorphism
from an open subset U of M onto an open subset of Rn .
These general coordinate systems are admitted on the same basis as the coor-
dinate systems with which M comes equipped in virtue of the axioms and will
just be called “coordinate systems“ as well. In fact we shall identify mani-
fold structures on a set M which give the same general coordinates, even if the
collections of coordinates used to define them via the axioms MAN 1–3 were
different. Equivalently, we can define a manifold to be a set M together with
all general coordinate systems corresponding to some collection of coordinates
satisfying MAN 1–3. These general coordinate systems form an atlas which is
maximal, in the sense that it is not contained in any strictly bigger atlas. Thus
we may say that a manifold is a set together with a maximal atlas; but since
any atlas can always be uniquely extended to a maximal one, consisting of the
general coordinate systems, any atlas determines the manifold structure. That
there are always plenty of coordinate systems follows from the inverse function
theorem:
1.2.9 Theorem. Let x be a coordinate system in a neighbourhood of po and
f : R^n · · · → R^n, x̃^i = f^i(x^1, · · · , x^n), a C^∞ map defined in a neighbourhood of
xo = x(po) with det(∂x̃^i/∂x^j)_xo ≠ 0. Then the equation x̃(p) = f(x(p)) defines
another coordinate system in a neighbourhood of po.
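As a quick illustration of how 1.2.9 is applied, the Jacobian condition can be checked symbolically. This is a sketch in Python with sympy; the particular map f is our own example, not from the text:

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')

# A candidate change of coordinates x~ = f(x) on R^2 (our own example):
f = sp.Matrix([x1 + x2**3, x2])

# Jacobian matrix (partial x~^i / partial x^j) and its determinant:
J = f.jacobian([x1, x2])
det = sp.simplify(J.det())

# det = 1 != 0 at every point, so by Theorem 1.2.9 the equations
# x~1 = x1 + x2^3, x~2 = x2 define a coordinate system around any po.
assert det == 1
```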
In the future we shall use the expression “a coordinate system around po “ for
“a coordinate system defined in some neighbourhood of po ”.
A map F : M → N is a local diffeomorphism at a point po ∈ M if po has
an open neighbourhood U so that F |U is a diffeomorphism of U onto an open
neighbourhood of F (po ). The inverse function theorem says that this is the case
if and only if det(∂F j /∂xi ) 6= 0 at po . The term local diffeomorphism by itself
means “local diffeomorphism at all points”.
1.2.10 Examples. Consider the map F which wraps the real line R1 around
the unit circle S 1 . If we realize S 1 as the unit circle in the complex plane, S 1 =
{z ∈ C : |z| = 1}, then this map is given by F(x) = e^{ix}. In a neighbourhood
of any point zo = e^{ixo} of S^1 we can write z ∈ S^1 uniquely as z = e^{ix} with x
sufficiently close to xo, namely |x − xo| < π. We can make S^1 into a manifold
by introducing these locally defined maps z → x as coordinate systems. Thus
for each zo ∈ S^1 fix an xo ∈ R with zo = e^{ixo} and then define x = x(z) by
z = e^{ix}, |x − xo| < π, on the domain of z's of this form. (Of course one can
already cover S^1 with two such coordinate domains.) In these coordinates
the map x → F(x) is simply given by the formula x → x. But this does not
mean that F is one–to–one; obviously it is not. The point is that the formula
holds only locally, on the coordinate domains. The map F : R^1 → S^1 is not a
diffeomorphism, only a local diffeomorphism.
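The behaviour of F can also be checked numerically. The following sketch in Python is ours, not part of the notes; the helper x_of_z mimics the charts just described. It shows that F is not injective globally, while in each chart x → F(x) reads as the identity:

```python
import cmath
import math

def F(x):
    # wrap the real line around the unit circle: F(x) = e^{ix}
    return cmath.exp(1j * x)

def x_of_z(z, xo):
    # the chart around zo = e^{i xo}: the unique x with z = e^{ix}, |x - xo| < pi
    return xo + cmath.phase(z * cmath.exp(-1j * xo))

# F is not one-to-one: arguments differing by 2*pi have the same image ...
assert abs(F(0.5) - F(0.5 + 2 * math.pi)) < 1e-12
# ... yet in the chart around zo = e^{i*0.5}, x -> F(x) reads as x -> x:
assert abs(x_of_z(F(0.7), 0.5) - 0.7) < 1e-12
```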
1.2. MANIFOLDS: DEFINITIONS AND EXAMPLES 33

Fig.4. The map R1 → S 1

1.2.11 Definition. If M and N are manifolds, the Cartesian product M ×
N = {(p, q) | p ∈ M, q ∈ N} becomes a manifold with the coordinate sys-
tems (x^1, · · · , x^n, y^1, · · · , y^m) where (x^1, · · · , x^n) is a coordinate system of M,
(y^1, · · · , y^m) one for N.
The definition extends in an obvious way to products of more than two
manifolds. For example, the product R^1 × · · · × R^1 of n copies of R^1 is R^n; the
product S 1 × · · · × S 1 of n copies of S 1 is denoted Tn and is called the n–torus.
For n = 2 it can be realized as the doughnut–shaped surface by this name. The
maps R1 → S 1 combine to give a local diffeomorphism of the product manifolds
F : R1 × · · · × R1 → S 1 × · · · × S 1 .
1.2.12 Example: Nelson’s car. A rich source of examples of manifolds are
configuration spaces from mechanics. Take the configuration space of a car, for
example. According to Nelson, (Tensor Analysis, 1967, p.33), “the configuration
space of a car is the four dimensional manifold M= R2 ×T2 parametrized by
(x, y, θ, φ), where (x, y) are the Cartesian coordinates of the center of the front
axle, the angle θ measures the direction in which the car is headed, and φ
is the angle made by the front wheels with the car. (More realistically, the
configuration space is the open subset −φmax < φ < φmax.)” (Note the original
design of the steering mechanism.)

(x,y)
θ

Fig.5. Nelson’s car

A curve in M is an M–valued function R · · · → M, written p = p(t), defined
on some interval. Exceptionally, the domain of a curve is not always required
to be open. The curve is of class C^k if it extends to such a function also in a
neighbourhood of any endpoint of the interval that belongs to its domain. In


coordinates (xi ) a curve p = p(t) is given by equations xi = xi (t). A manifold
is connected if any two of its points can be joined by a continuous curve.
1.2.13 Lemma and definition. Any n-dimensional manifold M is the disjoint
union of n-dimensional connected subsets, called the connected components of M.
Proof. Fix a point p ∈ M . Let Mp be the set of all points which can be joined
to p by a continuous curve. Then Mp is an open subset of M (exercise). Two
such Mp ’s are either disjoint or identical. Hence M is the disjoint union of the
distinct Mp ’s.
In general, a topological space is called connected if it cannot be decomposed
into a disjoint union of two non–empty open subsets. The lemma implies that
for manifolds this is the same as the notion of connectedness defined in terms
of curves, as above.
1.2.14 Example. The sphere in R3 is connected. Two parallel lines in R2
constitute a manifold in a natural way, the disjoint union of two copies of R,
which are its connected components.
Generally, given any collection of manifolds of the same dimension n one can
make their disjoint union again into an n–dimensional manifold in an obvious
way. If the individual manifolds happen not to be disjoint as sets, one must
keep them disjoint, e.g. by imagining the point p in the i–th manifold to come
with a label telling to which set it belongs, like (p, i) for example. If the individ-
ual manifolds are connected, then they form the connected components of the
disjoint union so constructed. Every manifold is obtained from its connected
components by this process. The restriction that the individual manifolds have
the same dimension n is required only because the definition of “n–dimensional
manifold “ requires a definite dimension.
We conclude this section with some remarks on the definition of manifolds. First
of all, instead of C ∞ functions one could take C k functions for any k ≥ 1, or
real analytic functions (convergent Taylor series) without significant changes.
One could also take C^0 functions (continuous functions) or holomorphic
functions (complex analytic functions), but then there are significant changes in the
further development of the theory.
There are many other equivalent definitions of manifolds. For example, the
definition is often given in two steps, first one defines (or assumes known) the
notion of topological space, and then defines manifolds. One can then require
the charts to be homeomorphisms from the outset, which simplifies the state-
ment of the axioms slightly. But this advantage is more than offset by the
inconvenience of having to specify separately topology and charts, and verify
the homeomorphism property, whenever one wants to define a particular mani-
fold. It is also inappropriate logically, since the atlas automatically determines
the topology. Two more technical axioms are usually added to the definition of
manifold.
MAN 4 (Hausdorff Axiom). Any two points of M have disjoint neighbourhoods.
MAN 5 (Countability Axiom). M can be covered by countably many coordinate
balls.
The purpose of these axioms is to exclude some pathologies; all examples we
have seen or shall see satisfy these axioms (although it is easy to artificially
construct examples which do not). We shall not need them for most of what we
do, and if they are required we shall say so explicitly.
As a minor notational point, some insist on specifying explicitly the domains of
functions, e.g. write (U, φ) for a coordinate system or f : U → N for a partially
defined map; but the domain is rarely needed and f : M · · · → N is usually
more informative.
EXERCISES 1.2
1. (a) Specify domains for the coordinates on R3 listed in example 1.2.3.
(b) Specify the domain of the coordinate transformation (x, y, z) → (ρ, θ, φ) and
prove that this map is C ∞ .
2. Is it possible to specify domains for the following coordinates on R3 so that
the axioms MAN 1–3 are satisfied?
(a) The coordinates are cylindrical and spherical coordinates as defined in exam-
ple 1.2.3. You may use several domains with the same formula, e.g. use several
restrictions on r and/or θ to define several cylindrical coordinates (r, θ, z) with
different domains.
(b) The coordinates are spherical coordinates (ρ, θ, φ) but with the origin trans-
lated to an arbitrary point (xo , yo , zo ). (The transformation equations between
(x, y, z) and (ρ, θ, φ) are obtained from those of example 1.2.3 by replacing x, y, z
by x − xo , y − yo , z − zo , respectively.) You may use several such coordinate sys-
tems with different (xo , yo , zo ).
3. Verify that the parallel projection coordinates on S 2 satisfy the axioms MAN
1–3. [See example 1.2.5. Use the notation (U, φ), (Ũ, φ̃) for the coordinate systems
in order not to get confused with the x–coordinate in R^3. Specify the domains
U, Ũ, U ∩ Ũ etc. as needed to verify the axioms.]
4. (a) Find a map F : T^2 → R^3 which realizes the 2–torus T^2 = S^1 × S^1 as the
doughnut–shaped surface in 3–space R^3 referred to in example 1.2.11. Prove
that F is a C^∞ map. (b) Is T^2 connected? (Prove your answer.) (c) What
is the minimum number of charts in an atlas for T2 ? (Prove your answer and
exhibit such an atlas. You may use the fact that T2 is compact and that the
image of a compact set under a continuous map is again compact; see your
several–variables calculus text, e.g. Marsden–Tromba.)
5. Show that as (x^i) and (y^j) run over collections of coordinate systems for
M and N satisfying MAN 1–3, the (xi , y j ) do the same for M × N .
6. (a) Imagine two sticks flexibly joined so that each can rotate freely about
the joint. (Like the nunchucks you know from the Japanese martial arts.)
The whole contraption can move about in space. (These ethereal sticks are not
bothered by hits: they can pass through each other, and anything else. Not
useful as a weapon.) Describe the configuration space of this mechanical system
as a direct product of some of the manifolds defined in the text. Describe the
subset of configurations in which the sticks are not in collision and show that it
is open in the configuration space.
(b) Same problem if the joint admits only rotations at right angles to the “grip
stick” to whose tip the “hit stick” is attached. (Wind–mill type joint; good for
lateral blows. Each stick now has a tip and a tail.)
[Translating all of this into something precise is part of the problem. Is the
configuration space a manifold at all, in a reasonable way? If so, is it connected
or what are its connected components? If you find something unclear, add
precision as you find necessary; just explain what you are doing. Use sketches.]

7. Let M be a set with two manifold structures and let M 0 , M 00 denote the
corresponding manifolds.
(a) Show that M 0 = M 00 as manifolds if and only if the identity map on M
is C ∞ as a map M 0 → M 00 and as a map M 00 → M 0 . [Start by stating the
definition what it means that “M 0 = M 00 as manifolds “.]
(b) Suppose M 0 , M 00 have the same open sets and each open set U carries the
same collection of C ∞ functions f : U → R for M 0 and for M 00 . Is then
M 0 = M 00 as manifold? (Proof or counterexample.)
8. Specify a collection {(Uα , φα )} of partially defined maps φα : Uα ⊂ R → R
as follows.
(a) U = R, φ(t) = t^3, (b) U1 = R − {0}, φ1(t) = t^3, U2 = (−1, 1), φ2(t) = 1/(1 − t).
Determine if {(Uα, φα)} is an atlas (i.e. satisfies MAN 1–3). If so, determine
if the corresponding manifold structure on R is the same as the usual one.
9. (a) Let S be a subset of R^n for some n. Show that there is at most one
manifold structure on S so that a partially defined map R^k · · · → S (any k)
is C^∞ if and only if the composite R^k · · · → S → R^n is C^∞. [Suggestion.
Write S′, S″ for S equipped with two manifold structures. Show that the
identity map S′ → S″ is C^∞ in both directions.]
(b) Show that (a) holds for the usual manifold structure on S = R^m considered
as subset of R^n (m ≤ n).
10. (a) Let P1 be the set of all one–dimensional subspaces of R^2 (lines through
the origin). Write ⟨x⟩ = ⟨x^1, x^2⟩ for the line through x = (x^1, x^2). Let U1 =
{⟨x^1, x^2⟩ : x^1 ≠ 0} and define φ1 : U1 → R, ⟨x⟩ → x^2/x^1. Define (U2, φ2)
similarly by interchanging x^1 and x^2 and prove that {(U1, φ1), (U2, φ2)} is an
atlas for P1 (i.e. MAN 1–3 are satisfied.) Explain why the map φ1 can be
viewed as taking the intersection with the line x1 = 1. Sketch.
(b) Generalize part (a) for the set of one–dimensional subspaces Pn of Rn+1 .
[Suggestion. Proceed as in part (a): the line x1 = 1 in R2 is now replaced by the
n–plane in Rn+1 given by this equation. Consider the other coordinate n–planes
xi = 0 as well. Sketching will be difficult for n > 2. The manifold Pn is called
(real) projective n–space.]
11. Let Pn be the set of all one–dimensional subspaces of R^{n+1} (lines through
the origin). Let F : S^n → Pn be the map which associates to each p ∈ S^n the
line ⟨p⟩ = Rp.
(a) Show that Pn admits a unique manifold structure so that the map F is a
local diffeomorphism.
(b) Show that the manifold structure on Pn defined in problem 10 is the same
as the one defined in part (a).
12. Generalize problem 10 to the set Gk,n of k–dimensional subspaces of Rn .
[Suggestion. Consider the map φ which intersects a k–dimensional subspace
P ∈ Gk,n with the coordinate (n − k)–plane with equation x^1 = x^2 = · · · = x^k =
1 and similar maps using other coordinate (n − k)–planes of this type. The
manifolds Gk,n are called (real) Grassmannian manifolds.]
13. Let M be a manifold, p ∈ M a point of M . Let Mp be the set of all points
which can be joined to p by a continuous curve. Show that Mp is an open subset
of M .
14. Define an n–dimensional linear manifold to be a set L together with a
maximal atlas of charts L → R^n, p → x(p), which are everywhere defined
bijections and any two of which are related by an invertible linear transformation
x̃^j = a^j_i x^i. Show that every vector space is in a natural way a linear manifold
and vice versa. [Thus a linear manifold is really the same thing as a vector
space. The point is that vector spaces could be defined in a way analogous to
manifolds.]
15. (a) Define a notion of affine manifold in analogy with problem 14, using affine
coordinate transformations as defined in 1.2.7. Is every vector space an affine
manifold in a natural way? How about the other way around?
(b) Define an affine space axiomatically as follows.
Definition. An n–dimensional affine space consists of set A of points together
with a set of vectors which form an n–dimensional vector space V .
ASP 1. Any two points p, q in A determine a vector $\vec{pq}$ in V.
ASP 2. Given a point p ∈ A and a vector v ∈ V there is a unique point q ∈ A
so that v = $\vec{pq}$.
ASP 3. For any three points a, b, c ∈ A, $\vec{ab} + \vec{bc} = \vec{ac}$.
Show that every affine manifold is in a natural way an affine space and
vice versa. [Thus an affine manifold is really the same thing as an affine space.]
16. Give further examples of some types of “spaces” one can define in analogy
with C^∞ manifolds, like those in the previous two problems. In each case,
discuss if this type of space is or is not a C^∞ manifold in a natural way. [Give
at least one example that is and one that isn’t. You do know very natural
examples of such “spaces”.]
1.3 Vectors and differentials


The notion of a “vector“ on a manifold presents some conceptual difficulties.
We start with some preliminary remarks. We may paraphrase the definition of
“manifold ” by saying that a manifold can locally be identified with Rn , but
this identification is determined only up to diffeomorphism. Put this way, it is
evident that anything which can be defined for Rn by a definition which is local
and preserved under diffeomorphism can also be defined for manifolds. The
definition of “C ∞ function“ is a first instance of this principle, but there are
many others, as we shall see. One of them is a notion of “vector”, but there
are already several notions of “vector“ connected with Rn , and we should first
make clear which one we want to generalize to manifolds: it is the notion of a
tangent vector to a differentiable curve at a given point.
Consider a differentiable curve p(t) on a manifold M . Fix momentarily a co-
ordinate system (xi ) defined around the points p(t) for t in some open interval
around some t = to . For each t in this interval the coordinates xi (t) of p(t)
are differentiable functions of t and we can consider the n–tuple of derivatives
(ẋi (t)). It depends not only on the curve p(t) but also on the coordinate system
(xi ) and to bring this out we call the n–tuple (ξ i ) = (ẋi (to )) the coordinate
tangent vector of p(t) at po (more accurately one should say “at to “) relative to
the coordinate system (xi ). If x̃j = x̃j (p) is another coordinate system defined
around p(to ), then we can write x̃(p) = f (x(p)) by the coordinate transforma-
tion as in MAN 2, p 25 and by the chain rule the two coordinate tangent vectors
(ξ˜i ) and (ξ i ) are related by
ξ̃^j = (∂x̃^j/∂x^i)_po ξ^i.
From a logical point of view, this transformation law is the only thing that
matters and we shall just take it as an axiomatic definition of “vector at po ”.
But one might still wonder, what a vector “really“ is: this is a bit similar to
the question what a number (say a real number or even a natural number)
“really” is. One can answer this question by giving a constructive definition,
which defines the concept in terms of others (ultimately in terms of concepts
defined axiomatically, in set theory, for example), but they are artificial in the
sense that these definitions are virtually never used directly. What matters are
certain basic properties, for vectors just as for numbers; the construction is of
value only in so far as it guarantees that the properties postulated are consistent,
which is not in question in the simple situation we are dealing with here. Thus
we ignore the metaphysics and simply give an axiomatic definition.
1.3.1 Definition. A tangent vector (or simply vector ) v at p ∈ M is a quantity
which relative to a coordinate system (xi ) around p is represented by an n–
tuple (ξ 1 , · · · , ξ n ). The n–tuples (ξ i ) and (ξ˜j ) representing v relative to two
different coordinate systems (x^i) and (x̃^j) are related by the transformation law

ξ̃^j = (∂x̃^j/∂x^i)_p ξ^i.   (1)
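The transformation law (1) can be verified symbolically for the polar/Cartesian pair on R^2: the Cartesian components of the tangent vector of a curve are the Jacobian matrix applied to its polar components. This is a sketch in Python with sympy; the particular curve is our own choice:

```python
import sympy as sp

r, th, t = sp.symbols('r theta t')

# coordinate transformation (x, y) = (r cos(theta), r sin(theta))
x = r * sp.cos(th)
y = r * sp.sin(th)
J = sp.Matrix([[x.diff(r), x.diff(th)],
               [y.diff(r), y.diff(th)]])  # (partial x~^j / partial x^i)

# a curve given in polar coordinates (arbitrary choice):
rt, tht = 1 + t**2, sp.sin(t)
polar = {r: rt, th: tht}

# its coordinate tangent vectors in polar and in Cartesian coordinates:
xi_polar = sp.Matrix([rt.diff(t), tht.diff(t)])
xi_cart = sp.Matrix([x.subs(polar).diff(t), y.subs(polar).diff(t)])

# law (1): Cartesian components = Jacobian (at the point) * polar components
diff = sp.simplify(xi_cart - J.subs(polar) * xi_polar)
assert diff == sp.zeros(2, 1)
```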
Explanation. To say that a vector is represented by an n–tuple relative to a


coordinate system means that a vector associates to a coordinate system an n–
tuple and two vectors are considered identical if they associate the same n–tuple
to any one coordinate system. If one still feels uneasy about the word “quantity“
(which just stands for “something”) and prefers that the objects one deals with
should be defined constructively in terms of other objects, then one can simply
take a vector at po to be a rule (or “function“) which to (po , (xi )) associates (ξ i )
subject to (1). This is certainly acceptable logically, but awkward conceptu-
ally. There are several other constructive definitions designed to produce more
palatable substitutes, but the axiomatic definition by the transformation rule,
though the oldest, exhibits most clearly the general principle about manifolds
and Rn mentioned and is really quite intuitive because of its relation to tangent
vectors of curves. In any case, it is essential to remember that the n–tuple (ξ i )
depends not only on the vector v at p, but also on the coordinate system (xi ).
Furthermore, by definition, a vector is “tied to a point”: vectors at different
points which happen to be represented by the same n–tuple in one coordinate
system will generally be represented by different n–tuples in another coordinate
system, because the partials in (1) depend on the point po .
Any differentiable curve p(t) in M has a tangent vector ṗ(to ) at any of its
points po = p(to ) : this is the vector represented by its coordinate tangent
vector (ẋi (to )) relative to the coordinate system (xi ). In fact, every vector v
at a point po is the tangent vector ṗ(to ) to some curve; for example, if v is
represented by (ξ i ) relative to the coordinate system (xi ) then we can take for
p(t) the curve defined by xi (t) = xio + tξ i and to = 0. There are of course many
curves with the same tangent vector v at a given point po , since we only require
xi (t) = xio + tξ i + o(t).

Summary. Let po ∈ M . Any differentiable curve p(t) through po = p(to )


determines a vector v = ṗ(to ) at po and every vector at po is of this form.
Two such curves determine the same vector if and only if they have the same
coordinate tangent vector at po in some (and hence in any) coordinate system
around po .
One could evidently define the notion of a vector at po in terms of differentiable
curves: a vector at po can be taken as an equivalence class of differentiable
curves through po , two curves being considered “equivalent“ if they have the
same coordinate tangent vector at po in any coordinate system.
Vectors at the same point p may be added and scalar multiplied by adding and
scalar multiplying their coordinate vectors relative to coordinate systems. In
this way the vectors at p ∈ M form a vector space, called the tangent space of M
at p, denoted Tp M . The fact that the set of all tangent vectors of differentiable
curves through po forms a vector space gives a precise meaning to the idea that
manifolds are linear in “infinitesimal regions“.
Relative to a coordinate system (xi ) around p ∈ M , vectors v at p are repre-
sented by their coordinate vectors (ξ i ). The vector with ξ k = 1 and all other
components = 0 relative to the coordinate system (xi ) is denoted (∂/∂xk )p
and will be called the k–th coordinate basis vector at p. It is the tangent
vector ∂p(x)/∂xk of the k–th coordinate line through p, i.e. of the curve
p(x1 , x2 , · · · , xn ) parametrized by xk = t, the remaining coordinates being
given their value at p. The (∂/∂xk )p form a basis for Tp M since every vec-
tor v at p can be uniquely written as a linear combination of the (∂/∂x^k)_p:
v = ξ^k (∂/∂x^k)_p. The components ξ^k of a vector at p with respect to this basis are
just the ξ k representing v relative to the coordinate system (xk ) according to
the definition of “vector at p“ and are sometimes called the components of v
relative to the coordinate system (xi ). It is important to remember that ∂/∂x1
(for example) depends on all of the coordinates (x1 , · · · , xn ), not just on the
coordinate function x1 .

1.3.2 Example: vectors on R2 . Let M = R2 with Cartesian coordinates x, y.


We can use this coordinate system to represent any vector v ∈ Tp R2 at any
point p ∈ R2 by its pair of components (a, b) with respect to this coordinate
∂ ∂
system: v = a ∂x + b ∂y . The distinguished Cartesian coordinate system (x, y)
on R therefore allows us to identify all tangent spaces Tp R2 again with R2 ,
2

which explains the somewhat confusing interpretation of pairs of numbers as


points or as vectors which one encounters in analytic geometry.
Now consider polar coordinates r, θ, defined by x = r cos θ, y = r sin θ. The
coordinate basis vectors ∂/∂r, ∂/∂θ are ∂p/∂r, ∂p/∂θ where we think of p =
p(r, θ) as a function of r, θ and use the partial to denote the tangent vectors to
the coordinate lines r → p(r, θ) and θ → p(r, θ). We find
∂/∂r = (∂x/∂r) ∂/∂x + (∂y/∂r) ∂/∂y = cos θ ∂/∂x + sin θ ∂/∂y,
∂/∂θ = (∂x/∂θ) ∂/∂x + (∂y/∂θ) ∂/∂y = −r sin θ ∂/∂x + r cos θ ∂/∂y.
The components of the vectors on the right may be expressed in terms of x, y
as well, e.g. using r = √(x² + y²) and θ = arctan(y/x), on the domain where
these formulas are valid.
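One can confirm these expressions by letting both sides act on a test function, since ∂/∂r applied to f(x, y) along the coordinate line is d/dr of f(r cos θ, r sin θ). This is a sketch in Python with sympy; the test function is an arbitrary choice of ours:

```python
import sympy as sp

r, th = sp.symbols('r theta')
x, y = sp.symbols('x y')

f = x**2 * y + sp.sin(y)          # an arbitrary smooth test function
polar = {x: r * sp.cos(th), y: r * sp.sin(th)}

# d/dr along the r-coordinate line versus cos(th) d/dx + sin(th) d/dy:
lhs_r = sp.diff(f.subs(polar), r)
rhs_r = (sp.cos(th) * f.diff(x) + sp.sin(th) * f.diff(y)).subs(polar)
assert sp.simplify(lhs_r - rhs_r) == 0

# d/dtheta versus -r sin(th) d/dx + r cos(th) d/dy:
lhs_th = sp.diff(f.subs(polar), th)
rhs_th = (-r * sp.sin(th) * f.diff(x) + r * sp.cos(th) * f.diff(y)).subs(polar)
assert sp.simplify(lhs_th - rhs_th) == 0
```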

__
d

__
d
0
dr
0

Fig. 1. Coordinate vectors in polar coordinates.

1.3.3 Example: tangent vectors to a vector space as manifold. In Rn


can one identify vectors at different points, namely those vectors that have the
same components in the Cartesian coordinate system. Such vectors will also
have the same components in any affine coordinate system (exercise). Thus
we can identify Tp Rn = Rn for any p ∈ Rn . More generally, let V be any
n–dimensional real vector space. Choose a basis e1 , · · · , en for V and introduce


linear coordinates (xi ) in V by using the components xi of p ∈ V as coordinates:

p = x1 e1 + · · · + xn en .

With any tangent vector v ∈ Tp V to V as manifold, represented by an n–tuple


(ξ i ) relative to the coordinate system (xi ) according to definition 1.3.1, associate
an element ~v ∈ V by the formula

~v = ξ 1 e1 + · · · + ξ n en . (2)

Thus we have a map Tp V → V , v → ~v , defined in terms of the linear coordinates


(xi ). This map is in fact independent of these coordinates (exercise). So for any
n–dimensional real vector space V , considered as a manifold, we may identify
Tp V = V for any p ∈ V .
1.3.4 Example: tangent vectors to a sphere. Consider the 2–sphere S^2 as a
manifold, as in §2. Let p = p(t) be a curve on S 2 . It may also be considered
as a curve in R3 which happens to lie on S 2 . Thus the tangent vector ṗ(to )
at po = p(to ) can be considered as an element in Tpo S 2 as well as of Tpo R3 .
This means that we can think of Tpo S 2 as a subspace of Tpo R3 ≈ R3 . If we use
geographical coordinates (θ, φ) on S^2, so that the inclusion S^2 → R^3 is given by

(θ, φ) → (x, y, z) = (cos θ sin φ, sin θ sin φ, cos φ),   (3)

then the inclusion map Tp S^2 → Tp R^3 at a general point p = p(θ, φ) ∈ S^2 is
found by differentiation along the θ, φ–coordinate lines (exercise):
∂/∂θ → (∂x/∂θ) ∂/∂x + (∂y/∂θ) ∂/∂y + (∂z/∂θ) ∂/∂z = − sin θ sin φ ∂/∂x + cos θ sin φ ∂/∂y,
∂/∂φ → (∂x/∂φ) ∂/∂x + (∂y/∂φ) ∂/∂y + (∂z/∂φ) ∂/∂z = cos θ cos φ ∂/∂x + sin θ cos φ ∂/∂y − sin φ ∂/∂z.   (4)
In this way Tp S^2 is identified with the subspace of Tp R^3 ≈ R^3 given by

Tp S^2 ≈ {v ∈ Tp R^3 = R^3 : v · p = 0}.   (5)

(exercise).
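The identification (5) can be checked symbolically: the images (4) of ∂/∂θ and ∂/∂φ are indeed orthogonal to the point p itself. A sketch in Python with sympy:

```python
import sympy as sp

th, ph = sp.symbols('theta phi')

# the point p(theta, phi) on S^2, as in (3)
p = sp.Matrix([sp.cos(th) * sp.sin(ph),
               sp.sin(th) * sp.sin(ph),
               sp.cos(ph)])

# images of d/dtheta and d/dphi in T_p R^3, as in (4)
d_th = p.diff(th)
d_ph = p.diff(ph)

# both satisfy v . p = 0, consistent with (5)
assert sp.simplify(d_th.dot(p)) == 0
assert sp.simplify(d_ph.dot(p)) == 0
```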
1.3.5 Definition. A vector field X on M associates to each point p in M a
vector X(p) ∈ Tp M . We also admit vector fields defined only on open subsets
of M.
Examples are the basis vector fields ∂/∂xk of a coordinate system: by definition,
∂/∂x^k has components v^i = δ^i_k relative to the coordinate system (x^i). On the
coordinate domain, every vector field can be written as

X = X^k ∂/∂x^k   (6)
for certain scalar functions X k . X is said to be of class Ck if the X k have this


property. This notation (6) for vector fields is especially appropriate, because
X gives an operator on C ∞ functions f : M → R, denoted f → Xf and defined
by Xf := df(X). In coordinates (x^i) this says Xf = X^i (∂f/∂x^i), i.e. the
operator f → Xf is (6) interpreted as a differential operator.
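For instance, the rotation field X = y ∂/∂x − x ∂/∂y on R^2 annihilates the function f = x² + y², since the sum X^i ∂f/∂x^i cancels. A sketch in Python with sympy; the field and function are our own example:

```python
import sympy as sp

x, y = sp.symbols('x y')

# the rotation vector field X = y d/dx - x d/dy (components X^k, our example)
X_components = [y, -x]
f = x**2 + y**2

# Xf = X^i (partial f / partial x^i): formula (6) acting as a differential operator
Xf = sum(Xk * f.diff(var) for Xk, var in zip(X_components, [x, y]))
assert sp.simplify(Xf) == 0   # the rotation field annihilates the squared radius
```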
We now consider a differentiable map f : M → N between two manifolds. It
should be clear how to define the differential of f in accordance with the
principle that any notion for R^n which is local and preserved under diffeomorphism
will make sense on a manifold. To spell this out in detail, we need a lemma.
Relative to a coordinate system (x1 , · · · , xn ) on M and (y 1 , · · · , y m ) on N , we
may write f (p) as y j = f j (x1 , · · · , xn ) and consider the partials ∂y j /∂xi .
1.3.6 Lemma. Let f : M → N be a differentiable map between manifolds, po a
point in M .
In coordinates (x^i) around po and (y^j) around f(po) consider q = f(p) as a map
y^j = f^j(x^1, · · · , x^n) from R^n to R^m with differential given by

η^j = (∂y^j/∂x^i)_po ξ^i.

The map from vectors at po with components (ξ^i) relative to (x^i) to vectors at
f(po) with components (η^j) relative to (y^j) defined by this equation is independent
of these coordinate systems.
Proof. Fix a point p on M and a vector v at p. Suppose we take two pairs of
coordinate systems (xi ), (y j ) and (x̃i ), (ỹ j ) to construct the quantities (η i ) and
(η̃^i) from the components (ξ^i) and (ξ̃^i) of v at p with respect to (x^i) and (x̃^i):

η^j := (∂y^j/∂x^i) ξ^i,   η̃^j := (∂ỹ^j/∂x̃^i) ξ̃^i.

We have to show that these (η^i) and (η̃^i) are the components of the same vector
w at f(p) with respect to (y^j) and (ỹ^j), i.e. that

η̃^j = (∂ỹ^j/∂y^i) η^i  (?)
the partials being taken at f (p). But this is just the Chain Rule:
η̃^j = (∂ỹ^j/∂x̃^i) ξ̃^i   [definition of η̃^j]
    = (∂ỹ^j/∂x̃^i)(∂x̃^i/∂x^k) ξ^k   [(ξ^k) is a vector]
    = (∂ỹ^j/∂x^k) ξ^k   [Chain Rule]
    = (∂ỹ^j/∂y^i)(∂y^i/∂x^k) ξ^k   [Chain Rule]
    = (∂ỹ^j/∂y^i) η^i   [definition of η^i]

The lemma allows us to define the differential of a map f : M → N as the


linear map dfp : Tp → Tf (p) N which corresponds to the differential of the map
y j = f j (x) representing f with respect to coordinate systems.
1.3.7 Definition. With the notation of the previous lemma, the vector at f(p)
with components η^j = (∂y^j/∂x^i)_p ξ^i is denoted dfp(v). The linear transformation
dfp : Tp M → Tf(p) N, v → dfp(v), is called the differential of f at p.
This definition applies as well to functions defined only in a neighbourhood
of the point p. As a special case, consider the differential dfp of a scalar–valued
function f : M → R. In that case dfp : Tp M → Tf(p) R = R is a linear
functional on Tp M . According to the definition, it is given by the formula
dfp (v) = (∂f /∂xi )p ξ i in coordinates (xi ) around p. Thus dfp looks like a gradient
in these coordinates, but the n–tuple (∂f /∂xi )p does not represent a vector at
p : dfp does not belong to Tp M but to the dual space Tp∗ M of linear functionals
on Tp M .
The differentials dx^k_p of the coordinate functions p → x^k(p) of a coordinate
system M · · · → R^n, p → (x^i(p)), are of particular importance. It follows from
the definition of dfp that (dxk )p has k–th component = 1, all others = 0. This
implies that
dx^k(v) = ξ^k,   (7)
the k–th component of v. We can therefore think of the dx^k as the components
of a general vector v at a general point p, just as we can think of the x^k as the
coordinates of a general point p: the dx^k pick out the components of a vector
just like the x^k pick out the coordinates of a point.
1.3.8 Example: coordinate differentials for polar coordinates. We have
x = r cos θ, y = r sin θ,
dx = (∂x/∂r) dr + (∂x/∂θ) dθ = cos θ dr − r sin θ dθ,
dy = (∂y/∂r) dr + (∂y/∂θ) dθ = sin θ dr + r cos θ dθ.
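Reading dr, dθ as the components of a general tangent vector (cf. (7)), these formulas can be generated mechanically. A sketch in Python with sympy; the symbols dr, dth stand for the components dr(v), dθ(v):

```python
import sympy as sp

r, th = sp.symbols('r theta')
dr, dth = sp.symbols('dr dtheta')   # components dr(v), dtheta(v) of a vector v

x = r * sp.cos(th)
y = r * sp.sin(th)

# total differentials dx = (dx/dr) dr + (dx/dtheta) dtheta, etc.
dx = x.diff(r) * dr + x.diff(th) * dth
dy = y.diff(r) * dr + y.diff(th) * dth

assert sp.expand(dx - (sp.cos(th) * dr - r * sp.sin(th) * dth)) == 0
assert sp.expand(dy - (sp.sin(th) * dr + r * sp.cos(th) * dth)) == 0
```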
In terms of the coordinate differentials the differential df of a scalar–valued
function f can be expressed as

∂f
df = dxk ; (8)
∂xk
the subscripts “p“ have been omitted to simplify the notation. In general, a map
f : M → N between manifolds, written in coordinates as y j = f j (x1 , · · · , xn )
has its differential dfp : Tp M → Tf (p) N given by the formula dy j = (∂f j /∂xi )dxi ,
as is evident if we think of dxi as the components ξ i = dxi (v) of a general vector
at p, and think of dy j similarly. The transformation property of vectors is built
into this notation.
The differential has the following geometric interpretation in terms of tangent
vectors of curves.
1.3.9 Lemma. Let f : M · · · → N be a C 1 map between manifolds, p = p(t) a
C 1 curve in M . Then df maps the tangent vector of p(t) to the tangent vector
of f (p(t)):
df (p(t))/dt = dfp(t) (dp(t)/dt).
Proof. Write p = p(t) in coordinates as xi = xi (t) and q = f (p(t)) as y i = y i (t).
Then
dy j /dt = (∂y j /∂xi )(dxi /dt) says df (p(t))/dt = dfp(t) (dp(t)/dt)
as required.

Note that the lemma gives a way of calculating differentials:


dfpo (v) = (d/dt) f (p(t))|t=to
for any differentiable curve p(t) with p(to ) = po and ṗ(to ) = v.
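This recipe can be tried out concretely. The sketch below (sympy; the function f, the base point and the straight-line curve are my choices, not from the notes) computes dfpo (v) by differentiating f along a curve through po and compares it with the coordinate formula dfp (v) = (∂f /∂xi )p ξ i :

```python
import sympy as sp

t, a, b = sp.symbols('t a b')
x, y = sp.symbols('x y')

f = x**2 + y              # sample scalar function on R^2 (my choice)
p0 = {x: 1, y: 2}         # base point p_o

# straight-line curve p(t) = p_o + t v with velocity v = (a, b)
curve = {x: 1 + t*a, y: 2 + t*b}

# left side: (d/dt) f(p(t)) at t = 0
lhs = sp.diff(f.subs(curve), t).subs(t, 0)

# right side: df_{p_o}(v) = (df/dx) a + (df/dy) b evaluated at p_o
rhs = (sp.diff(f, x)*a + sp.diff(f, y)*b).subs(p0)

assert sp.simplify(lhs - rhs) == 0   # both equal 2a + b
```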
1.3.10 Example. Let f : S 2 → R3 , (θ, φ) → (x, y, z) be the inclusion map,
given by
x = cos θ sin φ, y = sin θ sin φ, z = cos φ (9)
Then dfp : Tp S 2 → Tp R3 ≈ R3 is given by (4). If you think of dθ, dφ as the θφ–
components of a general vector in Tp S 2 and of dx, dy, dz as the xyz–components
of a general vector in Tp R3 (see (7)), then this map dfp : Tp S 2 → Tp R3 , is given
by differentiation of (9), i.e.
dx = (∂x/∂θ) dθ + (∂x/∂φ) dφ = − sin θ sin φ dθ + cos θ cos φ dφ,
dy = (∂y/∂θ) dθ + (∂y/∂φ) dφ = cos θ sin φ dθ + sin θ cos φ dφ,
dz = (∂z/∂θ) dθ + (∂z/∂φ) dφ = − sin φ dφ.
The Inverse Function Theorem, its statement being local and invariant under
diffeomorphisms, generalizes immediately from Rn to manifolds:
1.3.11 Inverse Function Theorem. Let f : M → N be a C ∞ map of
manifolds. Suppose po is a point of M at which the differential dfpo : Tpo M →
Tf (po ) N is bijective. Then f is a diffeomorphism of a neighbourhood of po onto
a neighbourhood of f (po ).
Remarks. (a) To say that the linear map dfpo is bijective means that rank(dfpo ) =
dim M = dim N .
(b) The theorem implies that one can find coordinate systems around po and
f (po ) so that on the coordinate domain f : M → N becomes the identity map
Rn → Rn .
The Inverse Function Theorem has two important corollaries which we state as
theorems in their own right.
1.3.12 Immersion Theorem. Let f : M → N be a C ∞ map of manifolds,
n = dim M, m = dim N . Suppose po is a point of M at which the differential
dfpo : Tpo M → Tf (po ) N is injective. Let (x1 , · · · , xn ) be a coordinate system
around po . There is a coordinate system (y 1 , · · · , y m ) around f (po ) so that
p → q := f (p) becomes

(x1 , · · · , xn ) → (y 1 , · · · , y n , · · · , y m ) := (x1 , · · · , xn , 0, · · · , 0).

Remarks. (a) To say that the linear map dfpo is injective (i.e. one–to–one)
means that rank(dfpo ) = dim M ≤ dim N . We then call f an immersion at po .
(b) The theorem says that one can find coordinate systems around po and f (po )
so that on the coordinate domain f : M → N becomes the inclusion map
Rn → Rm (n ≤ m).
Proof. Using coordinates, it suffices to consider a (partially defined) map f :
Rn · · · → Rm (n ≤ m). Suppose we can find a local diffeomorphism ϕ : Rm · · · →
Rm so that f = ϕ ◦ i where i : Rn → Rm is the inclusion.

        i
  Rn  →  Rm
id ↓       ↓ ϕ
  Rn  →  Rm
        f
Then the equation f (x) = ϕ(i(x)), says that f “becomes“ i if we use ϕ as
coordinates on Rm and the identity on Rn .
Now assume rank(dfpo ) = n ≤ m. Write q = f (p) as y j = f j (x1 , · · · , xn ), j ≤
m. At po , the m × n matrix [∂f j /∂xi ],j ≤ m, i ≤ n, has n linearly independent
rows (indexed by some j’s). Relabeling the coordinates (y j ) we may assume
that the n × n matrix [∂f j /∂xi ], i, j ≤ n, has rank n, hence is invertible. Define
ϕ by
ϕ(x1 , · · · , xn , · · · , xm ) = f (x1 , · · · , xn ) + (0, · · · , 0, xn+1 , · · · , xm ),
i.e.
ϕ = (f 1 , · · · , f n , f n+1 + xn+1 , · · · , f m + xm ).
Since i(x1 , · · · , xn ) = (x1 , · · · , xn , 0, · · · , 0) we evidently have f = ϕ ◦ i. The
determinant of the matrix (∂ϕj /∂xi ) has the form
det [ ∂f j /∂xi  0 ; ∗  1 ] = det[∂f j /∂xi ], i, j ≤ n
hence is nonzero at i(po ). By the Inverse Function Theorem ϕ is a local
diffeomorphism at i(po ), as required.
Example. a) Take for f : M → N the inclusion map i : S 2 → R3 . If we use
coordinates (θ, φ) on S 2 = {r = 1} and (x, y, z) on R3 then i is given by the
familiar formulas
i : (θ, φ) → (x, y, z) = (cos θ sin φ, sin θ sin φ, cos φ).
A coordinate system (y 1 , y 2 , y 3 ) on R3 as in the theorem is given by the slightly
modified spherical coordinates (θ, φ, ρ − 1) on R3 , for example: in these coor-
dinates the inclusion i : S 2 → R3 becomes the standard inclusion map (θ, φ) →
(θ, φ, ρ − 1) = (θ, φ, 0) on the coordinate domains. (That the same labels θ and
φ stand for coordinate functions both on S 2 and R3 should cause no confusion.)
b) Take for f : M → N a curve R · · · → M, t → p(t). This is an immersion at
t = to if ṗ(to ) 6= 0. The immersion theorem asserts that near such a point p(to )
there are coordinates (xi ) so that the curve p(t) becomes a coordinate line, say
x1 = t, x2 = 0, · · · , xn = 0.
1.3.13 Submersion Theorem. Let f : M → N be a C ∞ map of manifolds,
n = dim M , m = dim N . Suppose po is a point of M at which the differential
dfpo : Tpo M → Tf (po ) N is surjective. Let (y 1 , · · · , y m ) be a coordinate system
around f (po ). There is a coordinate system (x1 , · · · , xn ) around po so that
p → q := f (p)
becomes
(x1 , · · · , xm , · · · , xn ) → (y 1 , · · · , y m ) := (x1 , · · · , xm )

Remarks. (a) To say that the linear map dfpo is surjective (i.e. onto) means
that
rank(dfpo ) = dim N ≤ dim M.
We then call f a submersion at po .
(b) The theorem says that one can find coordinates systems around po and
f (po ) so that on the coordinate domain f : M → N becomes the projection
map Rn → Rm (n ≥ m).
Proof. Using coordinates, it suffices to consider a partially defined map f :
Rn · · · → Rm (n ≥ m). Suppose we can find a local diffeomorphism ϕ : Rn · · · →
Rn so that p ◦ ϕ = f where p : Rn → Rm is the projection.
        f
  Rn  →  Rm
ϕ ↓       ↓ id
  Rn  →  Rm
        p
Then we can use ϕ as a coordinate system and the equation f (x) = p(ϕ(x))
says that f “becomes“ p if we use ϕ as coordinates on Rn and the identity on
Rm .
Now assume rank(dfpo ) = m ≤ n. Write q = f (p) as y j = f j (x1 , · · · , xn ).
At po , the m × n matrix [∂f j /∂xi ],j ≤ m, i ≤ n, has m linearly independent
columns (indexed by some i’s). Relabeling the coordinates (xi ) we may assume
that the m × m matrix [∂f j /∂xi ], i, j ≤ m, has rank m, hence is invertible.
Define ϕ by

ϕ(x1 , · · · , xn ) = (f (x1 , · · · , xn ), xm+1 , · · · , xn ).

Since p(x1 , · · · , xm , · · · , xn ) = (x1 , · · · , xm ), we have f = p◦ϕ. The determinant


of the matrix (∂ϕj /∂xi ) has the form
det [ ∂f j /∂xi  ∗ ; 0  1 ] = det[∂f j /∂xi ], i, j ≤ m

hence is nonzero at po . By the Inverse Function Theorem ϕ is a local diffeomor-


phism at po , as required.
Example. a) Take for f : M → N the radial projection map π : R3 − {0} → S 2 ,
p → p/∥p∥. If we use the spherical coordinates (θ, φ, ρ) on R3 − {0} and (θ, φ)
on S 2 then π becomes the standard projection map (θ, φ, ρ) → (θ, φ) on the
coordinate domains.
b) Take for f : M → N a scalar function f : M → R. This is a submersion at
po if dfpo 6= 0. The submersion theorem asserts that near such a point there are
coordinates (xi ) so that f becomes a coordinate function, say f (p) = x1 (p).
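For the radial projection of example a) the rank condition can be confirmed by machine. A sympy sketch (the sample point p = (1, 2, 2) is my choice):

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
norm = sp.sqrt(x**2 + y**2 + z**2)
# radial projection pi : R^3 - {0} -> S^2, p -> p/|p|
pi_map = sp.Matrix([x/norm, y/norm, z/norm])

J = pi_map.jacobian([x, y, z])
# evaluate at the sample point p = (1, 2, 2), |p| = 3
J_at_p = J.subs({x: 1, y: 2, z: 2})

assert J_at_p.rank() == 2   # d(pi)_p has rank 2: pi is a submersion there
# the kernel of d(pi)_p is the radial direction through p
assert J_at_p * sp.Matrix([1, 2, 2]) == sp.Matrix([0, 0, 0])
```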
Remark. The immersion theorem says that an immersion f : M → N locally
becomes a linear map Rn → Rm , just like its tangent map dfpo : Tpo M →
Tf (po) N . The submersion theorem can be interpreted similarly. These are special
cases of a more general theorem (Rank Theorem), which says that f : M → N
becomes a linear map Rn → Rm , hence also like its tangent map dfpo : Tpo M →
Tf (po) N , provided dfp has constant rank for all p in a neighbourhood of po .

This is automatic for immersions and submersions (exercise). A proof of this


theorem can be found in Spivak, p.52, for example.
There remain a few things to be taken care of in connection with vectors and
differentials.
1.3.14 Covectors and 1–forms. We already remarked that the differential
dfp of a scalar function f : M → R is a linear functional on Tp M . The space of
all linear functionals on Tp M is called the cotangent space at p, denoted Tp∗ M .
The coordinate differentials dxi at p satisfy dxi (∂/∂xj ) = δji , which means that
the dxi form the dual basis to the ∂/∂xj . Any element w ∈ Tp∗ M is therefore
of the form w = ηi dxi ; its components ηi satisfy the transformation rule
η̃j = (∂xi /∂ x̃j )p ηi (10)

which is not the same as for vectors, since the upper indices are being summed
over. (Memory aid: the tilde on the right goes downstairs like the index j on the
left.) Elements of Tp∗ M are also called covectors at p, and this transformation
rule could be used to define “covector“ in a way analogous to the definition of
vector. Any covector at p can be realized as the differential of some C ∞ function defined
near p, just like any vector can be realized as the tangent vector to some C ∞
curve through p. In spite of the similarity of their transformation laws, one
should carefully distinguish between vectors and covectors.
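The transformation rule (10) is nothing but the chain rule. A sympy sketch (the function f and the polar change of coordinates are my choices) confirming it:

```python
import sympy as sp

x, y = sp.symbols('x y')
r, theta = sp.symbols('r theta', positive=True)

f = x**2 * y                           # a sample scalar function on R^2
eta = [sp.diff(f, x), sp.diff(f, y)]   # Cartesian components of df

# polar coordinates x = r cos(theta), y = r sin(theta)
x_expr, y_expr = r*sp.cos(theta), r*sp.sin(theta)
subs = {x: x_expr, y: y_expr}

# rule (10): eta~_j = (dx^i/dx~^j) eta_i, summed over i
eta_tilde = [sp.diff(x_expr, c)*eta[0].subs(subs) + sp.diff(y_expr, c)*eta[1].subs(subs)
             for c in (r, theta)]

# cross-check: differentiate f written out in polar coordinates directly
f_polar = f.subs(subs)
assert sp.simplify(eta_tilde[0] - sp.diff(f_polar, r)) == 0
assert sp.simplify(eta_tilde[1] - sp.diff(f_polar, theta)) == 0
```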
A differential 1-form (or covector field) ϕ on an open subset of M associates
to each point p in its domain a covector ϕp . Examples are the coordinate
differentials dxk : by definition, dxk has components ηi = δik relative to the
coordinate system (xi ). On the coordinate domain, every differential 1-form ϕ
can be written as
ϕ = ϕk dxk
for certain scalar functions ϕk . ϕ is said to be of class C k if the ϕk have this
property.
1.3.15 Definition. Tangent bundle and cotangent bundle. The set of all
vectors on M is denoted T M and called the tangent bundle of M ; we make it into
a manifold by using as coordinates (xi , ξ i ) of a vector v at p the coordinates
xi (p) of p together with the components dxi (v) = ξ i of v. (Thus ξ i = dxi
as function on T M , but the notation dxi as coordinate on T M gets to be
confusing in combinations such as ∂/∂ξ i .) As (xi ) runs over a collection of
coordinate systems of M satisfying MAN 1-3, the (xi , ξ i ) do the same for T M .
The tangent bundle comes equipped with a projection map π : T M → M which
sends a vector v ∈ Tp M to the point π(v) = p to which it is attached.
1.3.16 Example. From (1.3.3) and (1.3.4), p 42 we get the identifications
a) T Rn = {(p, v) : p, v ∈ Rn } = Rn × Rn
b) T S 2 = {(p, v) ∈ R3 × R3 : p · v = 0 (dot product)}
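The description of T S 2 in b) can be checked on any concrete curve: if p(t) stays on the sphere then p · ṗ = 0, so (p, ṗ) ∈ T S 2 . A sketch (the particular curve is my invention):

```python
import sympy as sp

t = sp.symbols('t', real=True)
# a sample curve on S^2, with spherical angles theta = t, phi = t**2 + 1
p = sp.Matrix([sp.cos(t)*sp.sin(t**2 + 1),
               sp.sin(t)*sp.sin(t**2 + 1),
               sp.cos(t**2 + 1)])
assert sp.simplify(p.dot(p)) == 1    # the curve stays on the unit sphere

v = p.diff(t)                        # its tangent vector
assert sp.simplify(p.dot(v)) == 0    # p . v = 0, so (p, v) lies in T S^2
```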
The set of all covectors on M is denoted T ∗ M and called the cotangent bundle
of M ; we make it into a manifold by using as coordinates (xi , ξi ) of a covector

w at p the coordinates (xi ) of p together with the components (ξi ) of w. (If one
identifies v ∈ Tp M with the function ϕ → ϕ(v) on Tp∗ M , then ξi = ∂/∂xi as
a function on cotangent vectors.) As (xi ) runs over a collection of coordinate
systems of M satisfying MAN 1-3, p 25 the (xi , ξi ) do the same for T ∗ M . There
is again a projection map π : T ∗ M → M which sends a covector w ∈ Tp∗ M to
the point π(w) = p to which it is attached.

EXERCISES 1.3
1. Show that addition and scalar multiplication of vectors at a point p on a
manifold, as defined in the text, does indeed produce vectors.
2. Prove the two assertions left as exercises in 1.3.3.
3. Prove the two assertions left as exercises in 1.3.4.
4. (a) Prove the transformation rule for the components of a covector w ∈ Tp∗ M :
η̃j = (∂xi /∂ x̃j )p ηi . (*)

(b) Prove that covectors can be defined in analogy with vectors in
the following way. Let w be a quantity which relative to a coordinate system
(xi ) around p is represented by an n–tuple (ηi ) subject to the transformation
rule (*). Then the scalar ηi ξ i depending on the components of w and of a vector
v at p is independent of the coordinate system and defines a linear functional
on Tp M (i.e. a covector at p).
(c)Show that any covector at p can be realized as the differential dfp of some
C ∞ function f defined in a neighbourhood of p.
5. Justify the following rules from the definitions and the usual rules of differ-
entiation.
(a) dx̃k = (∂ x̃k /∂xi ) dxi        (b) ∂/∂ x̃k = (∂xi /∂ x̃k ) ∂/∂xi
6. Let (ρ, θ, φ) be spherical coordinates on R3 . Calculate the coordinate vector
fields ∂/∂ρ, ∂/∂θ, ∂/∂φ in terms of ∂/∂x, ∂/∂y, ∂/∂z, and the coordinate
differentials dρ, dθ, dφ in terms of dx, dy, dz. (You can leave their coefficients in
terms of ρ, θ, φ). Sketch some coordinate lines and the coordinate vector fields at
some point. (Start by drawing a sphere ρ =constant and some θ, φ–coordinate
lines on it.)
7. Define coordinates (u, v) on R2 by the formulas

x = cosh u cos v, y = sinh u sin v.

a) Determine all points (x, y) in a neighbourhood of which (u, v) may be used


as coordinates.
b) Sketch the coordinate lines u = 0, 1, 3/2, 2 and v = 0, π/6, π/4, π/3, π/2,
2π/3, 3π/4, 5π/6.

c) Find the coordinate vector fields ∂/∂u, ∂/∂v in terms of ∂/∂x, ∂/∂y.
8. Define coordinates (u, v) on R2 by the formulas

x = (1/2)(u2 − v 2 ), y = uv.
a) Determine all points (x, y) in a neighbourhood of which (u, v) may be used
as coordinates.
b) Sketch the coordinate lines u,v = 0, 1/2, 1, 3/2, 2 .
c) Find the coordinate vector fields ∂/∂u, ∂/∂v in terms of ∂/∂x, ∂/∂y.
9. Let (u, θ, ψ) be hyperbolic coordinates on R3 , defined by

x = u cos θ sinh ψ, y = u sin θ sinh ψ, z = u cosh ψ.

a) Sketch some surfaces u =constant (just enough to show the general shape of
these surfaces).
b) Determine all points (x, y, z) in a neighbourhood of which (u, θ, ψ) may be
used as coordinates.
c) Find the coordinate vector fields ∂/∂u, ∂/∂θ, ∂/∂ψ in terms of ∂/∂x, ∂/∂y, ∂/∂z.
10. Show that as (xi ) runs over a collection of coordinate systems for M satis-
fying MAN 1-3, the (xi , ξ i ) defined in the text do the same for T M .
11. Show that as (xi ) runs over a collection of coordinate systems for M satis-
fying MAN 1-3, the (xi , ξi ) defined in the text do the same for T ∗ M .
12. a) Prove that T (M ×M ) is diffeomorphic to (T M )×(T M ) for every manifold
M.
b) Is T (T M ) diffeomorphic to T M × T M for every manifold M ? (Explain.
Give some examples. Prove your answer if you can. Use the notation M ≈ N
to abbreviate “M is diffeomorphic with N “.)
13. Let M be a manifold and let W = T ∗ M . Let π : W → M and ρ : T W → W
be the projection maps. For any z ∈ T W define θ(z) ∈ R by

θ(z) := ρ(z)(dπρ(z) (z)).

(θ is called the canonical 1–form on the cotangent bundle T ∗ M .)


a) Explain why this makes sense and defines a 1–form θ on W .
b) Let (xi ) be a coordinate system on M , (xi , ξi ) the corresponding coordinates
on W , as in 3.17. Show that θ = ξi dxi . [Suggestion. Show first that dπ(z) =
dxi (z)∂/∂xi .]
14. Let M be a manifold, p ∈ M a point. For any v ∈ Tp M define a linear
functional f → Dv (f ) on C ∞ functions f : M · · · → R defined near p by the
rule Dv (f ) = dfp (v). Then D = Dv evidently satisfies

D(f g) = D(f )g(p) + f (p)D(g). (∗)



Show that, conversely, any linear functional f → D(f ) on C ∞ functions f :


M · · · → R defined near p which satisfies (*) is of the form D = Dv for a unique
vector v ∈ Tp M .
[Remark. Such linear functionals D are called point derivations at p. Hence
there is a one–to–one correspondence between vectors at p and point–derivations
at p. ]
15. Let M be a manifold, p ∈ M a point. Consider the set Cp of all C ∞
curves p(t) through p in M , with p(0) = p. Call two such curves p1 (t) and
p2 (t) tangential at p if they have the same coordinate tangent vector at t = 0
relative to some coordinate system. Prove in detail that there is a one–to–one
correspondence between equivalence classes for the relation “being tangential
at p“ and tangent vectors at p. (Show first that being tangential at p is an
equivalence relation on Cp .)
16. Suppose f : M → N is C ∞ and an immersion at po ∈ M . Prove that f
is an immersion at all points p in a neighbourhood of po . Prove the same
result for submersions. [Suggestion. Consider rank(df ). Review the proof of
the immersion (submersion) theorem.]
17. Let M be a manifold, T M its tangent bundle, and π : T M → M the natural
projection map. Let (xi ) be a coordinate system on M, (xi , ξ i ) the corresponding
coordinate system on T M (defined by ξ i = dxi as function on T M ). Prove that
the differential of π is given by
dπ(z) = dxi (z) (∂/∂xi ). (*)
Determine if π is an immersion, a submersion, a diffeomorphism, or none of
these. [The formula (*) needs explanation: specify how it is to be interpreted.
What exactly is the symbol dxi on the right?]
18. Consider the following quotation from Élie Cartan’s Leçons of 1925–1926
(p33). “To each point p with coordinates (x1 , · · · , xn ) one can attach a system
of Cartesian coordinates with the point p as origin for which the basis vectors
e1 , · · · , en are chosen in such a way that the coordinates of the infinitesimally close
point p0 (x1 + dx1 , · · · , xn + dxn ) are exactly (dx1 , · · · , dxn ). For this it suffices
that the vector ei be tangent to the ith coordinate curve (obtained by varying
only the coordinate xi ) and which, to be precise, represents the velocity of a point
traversing this curve when one considers the variable coordinate xi as time.”
(a) What is our notation for ei ?
(b) Find page and line in this section which corresponds to the sentence “For
this it suffices· · · ”.
(c) Is it strictly true that dx1 , · · · , dxn are Cartesian coordinates? Explain.
(d) What must be meant by “the infinitesimally close point p0 (x1 + dx1 , · · · , xn +
dxn )” for the sentence to be correct?
(e) Be picky and explain why the sentence “To each point p with coordinates
(x1 , · · · , xn ) one can attach · · · ” might be misleading to less astute readers.
Rephrase it correctly with a minimal amount of change.

(f) Rewrite the whole quotation in our language, again with a minimal amount of
change.

1.4 Submanifolds
1.4.1 Definition. Let M be an n−dimensional manifold, S a subset of M . A
point p ∈ S is called a regular point of S if p has an open neighbourhood U
in M that lies in the domain of some coordinate system x1 , · · · , xn on M with
the property that the points of S in U are precisely those points in U whose
coordinates satisfy xm+1 = 0, · · · , xn = 0 for some m. This m is called the
dimension of S at p. Otherwise p is called a singular point of S. S is called
an m−dimensional (regular ) submanifold of M if every point of S is regular of
the same dimension m.
Remarks. a) We shall summarize the definition of “p is a regular point of S“
by saying that S is given by the equations xm+1 = 0, · · · , xn = 0 locally around
p. The number m is independent of the choice of the (xi ): if S is also given by
x̃m̃+1 = · · · = x̃n = 0, then the maps (x1 , · · · , xm ) ↔ (x̃1 , · · · , x̃m̃ ) which relate
the coordinates of the points of S in U ∩ Ũ are inverses of each other and of class
C ∞ . Hence their Jacobian matrices are inverses of each other, in particular
m = m̃.
b) We admit the possibility that m = n or m = 0. An n-dimensional submani-
fold S must be open in M , i.e. every point of S has a neighbourhood in M which is
contained in S. At the other extreme, a 0-dimensional submanifold is discrete,
i.e. every point in S has a neighbourhood in M which contains only this one
point of S.
By definition, a coordinate system on a submanifold S consists of the restrictions
u1 = x1 |S , · · · , um = xm |S to S of the first m coordinates of a coordinate system
x1 , · · · , xn on M of the type mentioned above (for some p in S); as their domain
we take S ∩ U . We have to verify that these coordinate systems satisfy MAN 1–3.
MAN 1. xS (S ∩ U ) consists of the (xi ) in the open subset x(U ) ∩ Rm of Rm .
(Here we identify Rm = {x ∈ Rn : xm+1 = · · · = xn = 0}.)
MAN 2. The map (x1 , · · · , xm ) → (x̃1 , · · · , x̃m ) which relates the coordinates
of the points of S in U ∩ Ũ is of class C ∞ with open domain x(U ∩ Ũ ) ∩ Rm .
MAN 3. Every p ∈ S lies in some domain S ∩ U , by definition.
Let S be a submanifold of M , and i : S → M the inclusion map. The differential
dip : Tp S → Tp M maps the tangent vector of a curve p(t) in S into the tangent
vector of the same curve p(t) = i(p(t)), considered as curve in M . We shall use
the following lemma to identify Tp S with a subspace of Tp M .
1.4.2 Lemma. Let S be a submanifold of M . Suppose S is given by the
equations
xm+1 = 0, · · · , xn = 0

locally around p. The differential of the inclusion S → M at p ∈ S is a bijection
of Tp S with the subspace of Tp M given by the linear equations

(dxm+1 )p = 0, · · · , (dxn )p = 0.

This subspace consists of tangent vectors at p of differentiable curves in M that


lie in S.

Proof. In the coordinates (x1 , · · · , xn ) on M and (u1 , · · · , um ) on S the


inclusion map S → M is given by

x1 = u1 , · · · , xm = um , xm+1 = 0, · · · , xn = 0.

In the corresponding linear coordinates (dx1 , · · · , dxn ) on Tp M and (du1 , · · · , dum )


on Tp S its differential is given by

dx1 = du1 , · · · , dxm = dum , dxm+1 = 0, · · · , dxn = 0.

This implies the assertion.


This lemma may be generalized:

1.4.3 Theorem. Let M be a manifold, f 1 , · · · , f k C ∞ functions on M . Let S


be the set of points p ∈ M satisfying f 1 (p) = 0, · · · , f k (p) = 0. Suppose the
differentials df 1 , · · · , df k are linearly independent at every point of S.
(a)S is a submanifold of M of dimension m = n − k and its tangent space at
p ∈ S is the subspace of vectors v ∈ Tp M satisfying the linear equations

(df 1 )p (v) = 0, · · · , (df k )p (v) = 0.

(b) If f k+1 , · · · , f n are any m = n − k additional C ∞ functions on M so that all


n differentials df 1 , · · · , df n are linearly independent at p, then their restrictions
to S, denoted
u1 := f k+1 |S , · · · , um := f k+m |S ,

form a coordinate system around p on S.

Proof 1. Let f = (f 1 , · · · , f k ). Since rank(dfp ) = k, the Submersion Theo-


rem implies there are coordinates (x1 , · · · , xn ) on M so that f locally becomes
f (x1 , · · · , xn ) = (xm+1 , · · · , xm+k ), where m + k = n. So f has component
functions f 1 = xm+1 , · · · , f k = xm+k . Thus we are back in the situation
of the lemma, proving part (a). For part (b) we only have to note that the
n-tuple F = (f 1 , · · · , f n ) can be used as a coordinate system on M around p
(Inverse Function Theorem), which is just of the kind required by the definition
of “submanifold“.
Proof 2. Supplement the k functions f 1 , · · · , f k , whose differentials are linearly
independent at p, by m = n − k additional functions f k+1 , · · · , f n , defined and
C ∞ near p, so that all n differentials df 1 , · · · , df n are linearly independent at
p. (This is possible, because df 1 , · · · , df k are linearly independent at p, by
hypothesis.) By the Inverse Function Theorem, the equation (x1 , · · · , xn ) =
(f 1 (p), · · · , f n (p)) defines a coordinate system x1 , · · · , xn around po
on M so that S is locally given by the equations xm+1 = · · · = xn = 0, and the
theorem follows from the lemma.

Fig 1. F (p) = (x1 , · · · , xn ).

Remarks. (1) The second proof is really the same as the first, with the proof of
the Submersion Theorem relegated to a parenthetical comment; it brings out
the nature of that theorem.
(2) The theorem is local; even when the differentials of the f j are linearly depen-
dent at some points of S, the subset of S where they are linearly independent
is still a submanifold.
1.4.4 Examples. (a) The sphere S = { p ∈ R3 | x2 + y 2 + z 2 = 1}. Let
f = x2 + y 2 + z 2 − 1. Then df = 2(xdx + ydy + zdz) and dfp = 0 iff x = y = z = 0
i.e. p = (0, 0, 0). In particular, df is everywhere non–zero on S, hence S is a
submanifold, and its tangent space at a general point p = (x, y, z) is given by
xdx + ydy + zdz = 0. This means that, as a subspace of R3 , the tangent space
at the point p = (x, y, z) on S consists of all vectors v = (a, b, c) satisfying
xa + yb + zc = 0, as one would expect.
In spherical coordinates ρ, θ, φ on R3 the sphere S 2 is given by ρ = 1 on the
coordinate domain, as required by the definition of “submanifold“. According
to the definition, θ and φ provide coordinates on S 2 (where defined). If one
identifies the tangent spaces to S 2 with subspaces of R3 then the coordinate
vector fields ∂/∂θ and ∂/∂φ are given by the formulas of (1.3.4).
(b) The circle C = { p ∈ R3 | x2 + y 2 + z 2 = 1, ax + by + cz = d} with fixed a, b, c
not all zero. Let f = x2 + y 2 + z 2 − 1, g = ax + by + cz − d. Then df = 2(xdx + ydy + zdz)
and dg = adx + bdy + cdz are linearly independent unless (x, y, z) is a multiple
of (a, b, c). This can happen only if the plane ax + by + cz = d is parallel to
the tangent plane of the sphere at p = (x, y, z), in which case C is either empty or
reduces to the point of tangency. Apart from that case, C is a submanifold of
R3 and its tangent space Tp C at any of its points p = (x, y, z) is the subspace of
Tp R3 = R3 given by the two independent linear equations df = 0, dg = 0, hence
has dimension 3 − 2 = 1. Of course, C is also the submanifold of the sphere
S = {p ∈ R3 | f = 0} given by g = 0, and Tp C is the subspace of Tp S given by dg = 0.
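For a concrete choice of plane (say z = 0, i.e. (a, b, c, d) = (0, 0, 1, 0), my choice) the independence of df and dg and the dimension count for Tp C can be verified directly; a sympy sketch:

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
f = x**2 + y**2 + z**2 - 1   # the sphere
g = z                        # the plane z = 0, i.e. (a, b, c, d) = (0, 0, 1, 0)

# rows are the components of df and dg
A = sp.Matrix([[sp.diff(h, w) for w in (x, y, z)] for h in (f, g)])
A = A.subs({x: 1, y: 0, z: 0})     # evaluate at p = (1, 0, 0), a point of C

assert A.rank() == 2               # df, dg linearly independent at p
assert len(A.nullspace()) == 1     # so Tp C = {df = 0, dg = 0} is 1-dimensional
assert A.nullspace()[0] == sp.Matrix([0, 1, 0])   # the tangent line at p
```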
1.4.5 Example. The cone S = {p ∈ R3 | x2 + y 2 − z 2 = 0}. We shall show the
following.
(a) The cone S − {(0, 0, 0)} with the origin excluded is a 2-dimensional subman-
ifold of R3 .
(b) The cone S with the origin (0, 0, 0) is not a 2-dimensional submanifold of
R3 .

Proof of (a). Let f = x2 + y 2 − z 2 . Then df = 2(xdx + ydy − zdz) and dfp = 0


iff x = y = z = 0, i.e. p = (0, 0, 0). Hence the differential d(x2 + y 2 − z 2 ) =
2(xdx + ydy − zdz) is everywhere non–zero on S − {(0, 0, 0)}, which is therefore
a (3 − 1)-dimensional submanifold.

Proof of (b) (by contradiction). Suppose (!) S were a 2-dimensional subman-


ifold of R3 . Then the tangent vectors at (0, 0, 0) of curves in R3 which lie on
S would form a 2-dimensional subspace of R3 . But this is not the case: for
example, we can find three curves of the form p(t) = (ta, tb, tc) on S whose
tangent vectors at t = 0 are linearly independent.

Fig. 2. The cone
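The three curves used in the proof of (b) can be exhibited explicitly; a sketch (the three directions are my choice):

```python
import sympy as sp

# three lines p(t) = t (a, b, c) through the origin, all lying on the cone
directions = sp.Matrix([[1, 0, 1], [0, 1, 1], [-1, 0, 1]])
for a, b, c in directions.tolist():
    assert a**2 + b**2 - c**2 == 0    # (ta)^2 + (tb)^2 - (tc)^2 = 0 for all t

# their velocity vectors at t = 0 are linearly independent ...
assert directions.rank() == 3
# ... so they cannot all lie in one 2-dimensional tangent plane at the origin
```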

1.4.6 Theorem. Let M be an n-dimensional manifold, g : Rm → M , (u1 , · · · , um ) →


g(u1 , · · · , um ), a C ∞ map. Suppose the differential dguo has rank m at some
point uo .
a) There is a neighbourhood U of uo in Rm whose image S = g(U ) is an
m-dimensional submanifold of M .
b) The equation p = g(u1 , · · · , um ) defines a coordinate system (u1 , · · · , um )
around po = g(uo ) on S.
c) The tangent space of S at po = g(uo ) is the image of dguo .

Proof 1. Since rank(dguo ) = m, the Immersion Theorem implies there


are coordinates (x1 , · · · , xn ) on M so that g locally becomes g(u1 , · · · , um ) =
(u1 , · · · , um , 0, · · · , 0). In these coordinates, the image of a neighbourhood of
uo in Rm is given by xm+1 = 0, · · · , xn = 0, as required for a submanifold. This
proves (a) and (b). Part (c) is also clear, since in the above coordinates the
image of dguo is given by dxm+1 = · · · = dxn = 0.
Proof 2. Extend the map p = g(u1 , · · · , um ) to a map p = G(x1 , · · · , xm , xm+1 , · · · , xn )
of n variables, defined and C ∞ near the given point xo = (uo , 0), whose dif-
ferential dG has rank n at xo . (This is possible, because dg has rank m
at uo by hypothesis.) By the Inverse Function Theorem, the equation p =
G(x1 , · · · , xm , xm+1 , · · · , xn ) then defines a coordinate system x1 , · · · , xn around
po = G(xo ) = g(uo ) on M so that S is locally given by the equations xm+1 =
· · · = xn = 0, and the theorem follows from the lemma.

Fig. 3. p = G(x1 , · · · , xn )

Remarks. The remarks after the preceding theorem apply again, but with
an important modification. The theorem is again local in that g need not be
defined on all of Rm , only on some open set D containing xo . But even if dg
has rank m everywhere on D, the image g(D) need not be a submanifold of M ,
as one can see from curves in R2 with self-intersections.


Fig.4. A curve with a self–intersection

1.4.7 Example. a) Parametrized curves in R3 . Let p = p(t) be a parametrized


curve in R3 . Then the differential of g : t → p(t) has rank 1 at to iff dp/dt 6= 0
at t = to.
b) Parametrized surfaces in R3 . Let p = p(u, v) be a parametrized surface in
R3 . Then the differential of g : (u, v) → p(u, v) has rank 2 at (uo , vo ) iff the 3×2
matrix [∂p/∂u, ∂p/∂v] has rank 2 at (u, v) = (uo , vo ) iff the cross product
∂p/∂u × ∂p/∂v 6= 0 at (u, v) = (uo , vo ).
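The cross-product criterion in b) applied to a sample surface (a paraboloid of my choosing), as a sympy sketch:

```python
import sympy as sp

u, v = sp.symbols('u v')
p = sp.Matrix([u, v, u**2 + v**2])   # a parametrized paraboloid

pu, pv = p.diff(u), p.diff(v)
n = pu.cross(pv)                     # the cross product dp/du x dp/dv

assert n == sp.Matrix([-2*u, -2*v, 1])
assert n.dot(n) == 4*u**2 + 4*v**2 + 1   # never zero, so rank dg = 2 everywhere
```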

1.4.8 Example: lemniscate of Bernoulli. Let C be the curve in R2 given by


r2 = cos 2θ in polar coordinates. It is a figure 8 with the origin as point of
self-intersection.


Fig. 5. The lemniscate of Bernoulli



We claim: (a) C is a one-dimensional submanifold of R2 if the origin is excluded.


(b) The equations
x = cos t/(1 + sin2 t), y = sin t cos t/(1 + sin2 t)
define a map R → R2 , t → p(t), of R onto C with ṗ(t) 6= 0 everywhere.
(c) The above equations define a coordinate t around every point of C except
the origin.
(d) C is not a one-dimensional submanifold of R2 if the origin is included.
Verification. a) Let f = r2 − cos 2θ. Then df = 2rdr + 2 sin 2θdθ 6= 0 unless
r = 0 and sin 2θ = 0. Since (r, θ) can be used as coordinates in a neighbourhood
of every point except the origin, the assertion follows from theorem 1.4.3.
b) In Cartesian coordinates the equation r2 = cos 2θ = cos2 θ − sin2 θ
becomes
(x2 + y 2 )2 = x2 − y 2 .
By substitution one sees that x = x(t) and y = y(t) satisfy this equation and by
differentiation one sees that dx/dt and dy/dt never vanish simultaneously. That
g maps R onto C is seen as follows. The origin corresponds to t = (2n + 1)π/2,
n ∈ Z; as t runs through an interval between two successive such points, p(t)
runs over one loop of the figure 8.
c) This follows from (b) and Theorem 1.4.6. Note that it follows from the
discussion above that we can in fact choose as coordinate domain all of C −
{origin} if we restrict t to the union of two successive open intervals, e.g.
(−π/2, π/2) ∪ (π/2, 3π/2).
d) One computes that there are two linearly independent tangent vectors ṗ(t)
for two successive values of t of the form t = (2n + 1)π/2 (as expected). Hence
C cannot be a submanifold of R2 of dimension one, which is the only possible
dimension by (a).
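The computations behind (b) can be delegated to sympy; a sketch (not part of the original verification):

```python
import sympy as sp

t = sp.symbols('t', real=True)
x = sp.cos(t) / (1 + sp.sin(t)**2)
y = sp.sin(t) * sp.cos(t) / (1 + sp.sin(t)**2)

# p(t) satisfies the Cartesian equation of the lemniscate
assert sp.simplify((x**2 + y**2)**2 - (x**2 - y**2)) == 0

# dx/dt and dy/dt never vanish simultaneously
speed2 = sp.simplify(sp.diff(x, t)**2 + sp.diff(y, t)**2)
for t0 in (0, sp.pi/2, 3*sp.pi/2):   # including the self-intersection parameters
    assert sp.simplify(speed2.subs(t, t0)) != 0
```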

EXERCISES 1.4
1. Show that the submanifold coordinates on S 2 are also coordinates on S 2
when the manifold structure on S 2 is defined by taking the orthogonal projection
coordinates in axioms MAN 1−3.
2. Let S = {p = (x, y, z) ∈ R3 | x2 − y 2 − z 2 = 1}.
(a) Prove that S is a submanifold of R3 .
(b) Show that for any (ψ, θ) the point p = (x, y, z) given by

x = cosh ψ, y = sinh ψ cos θ, z = sinh ψ sin θ

lies on S and prove that (ψ, θ) defines a coordinate system on S. [Suggestion.


Use Theorem 1.4.6.]
(c) Describe Tp S as a subspace of R3 . (Compare Example 1.4.4.)
(d) Find the formulas for the coordinate vector fields ∂/∂ψ and ∂/∂θ in terms
of the coordinate vector fields ∂/∂x, ∂/∂y, ∂/∂z, on R3 . (Compare Example
1.4.4.)
3. Let C be the curve R3 with equations y − 2x2 = 0, z − x3 = 0.

a) Show that C is a one-dimensional submanifold of R3 .


b) Show that t = x can be used as coordinate on C with coordinate domain all
of C.
c) In a neighbourhood of which points of C can one use t = z as coordinate?
[Suggestion. For (a) use Theorem 1.4.3. For (b) use the description of Tp C in
Theorem 1.4.3 to show that the Inverse Function Theorem applies to C → R,
p → x.]
4. Let M =M3 (R) = R3×3 be the set of all real 3 × 3 matrices regarded as
a manifold. (We can take the matrix entries Xij of X ∈ M as a coordinate
system defined on all of M .) Let O(3) be the set of orthogonal 3 × 3 matrices:
O(3) = {X ∈M3 (R) | X ∗ = X −1 }, where X ∗ is the transpose of X. Show that
O(3) is a 3-dimensional submanifold of M3 (R) with tangent space at X ∈ O(3)
given by

TX O(3) = {V ∈ M3 (R) | (X −1 V )∗ = −X −1 V i.e. X −1 V is skew-symmetric}.

[Suggestion. Consider the map F from M3 (R) to the space Sym3 (R) of sym-
metric 3 × 3 matrices defined by F (X) = X ∗ X. Show that the differential dFX
of this map is given by
dFX (V ) = V ∗ X + X ∗ V.
Conclude that dFX is surjective for all X ∈ O(3). Apply theorem 1.4.3.]

5. Continue with the setup of the previous problem. Let exp : M3 (R) → M3 (R) be
the matrix exponential defined by exp V = Σ∞ k=0 V k /k!. If V ∈ Skew3 (R) =
{V ∈ M3 (R) | V ∗ = −V }, then exp V ∈ O(3) because (exp V )∗ (exp V ) =
(exp −V )(exp V ) = I. Show that the equation

X = exp V , V = (Vij ) ∈ Skew3 (R)

defines a coordinate system (Vij ) in a neighbourhood of I in O(3). [Suggestion.
Use Theorem 1.4.6.]
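The claim that exp maps skew-symmetric matrices into O(3) is easy to test numerically before proving it. The sketch below (not part of the exercise's proof) sums the exponential series for one sample skew-symmetric V, an arbitrary choice, and checks that (exp V )∗ (exp V ) = I:

```python
# Sketch: sum exp V = I + V + V^2/2! + ... for a sample skew-symmetric
# 3x3 matrix V and check that exp V lies in O(3), i.e. (exp V)^T (exp V) = I.
# The particular matrix V below is an arbitrary choice.

def mat_mul(A, B):
    return [[sum(A[i][k]*B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def mat_exp(V, terms=30):
    E = [[float(i == j) for j in range(3)] for i in range(3)]  # running sum, starts at I
    P = [row[:] for row in E]                                  # current power V^k
    fact = 1.0
    for k in range(1, terms):
        P = mat_mul(P, V)
        fact *= k
        for i in range(3):
            for j in range(3):
                E[i][j] += P[i][j]/fact
    return E

V = [[ 0.0,  0.3, -0.1],
     [-0.3,  0.0,  0.5],
     [ 0.1, -0.5,  0.0]]        # V* = -V

X = mat_exp(V)
Xt = [[X[j][i] for j in range(3)] for i in range(3)]
prod = mat_mul(Xt, X)
err = max(abs(prod[i][j] - (i == j)) for i in range(3) for j in range(3))
assert err < 1e-9               # X is orthogonal up to truncation error
```

Thirty terms of the series are ample here because the entries of V are small; for large V one would scale and square instead.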
6. Let S = {p = (x, y, z) ∈ R3 | x2 + y 2 − z 2 = 1, x = c}. For which values of c
is S a submanifold of R3 ? For those c for which it is not, which points have to
be excluded so that the remaining set is a submanifold? Sketch.
7. a) Let S be the subset of R3 given by an equation z = f (x, y) where f is a
C ∞ function. (I.e. S = {p = (x, y, z) ∈ R3 | z = f (x, y)} is the graph of a C ∞
function f of two variables.) Show that S is a 2-dimensional submanifold of R3 .
b) Let S be a 2-dimensional submanifold of R3 . Show that S can locally be
given by an equation z = f (x, y) with (x, y, z) the Cartesian coordinates in
a suitable order. (I.e. for every point po ∈ S one can find a C ∞ function f
defined in some neighbourhood U of po in R3 so that S ∩ U consists of all points
p = (x, y, z) in U satisfying z = f (x, y).) [Suggestion. Use the Implicit Function
Theorem. See §1.1, Exercise 16 for the statement.]
8. State and prove a generalization of the previous problem, parts (a) and (b),
for m-dimensional submanifolds S of Rn .
60 CHAPTER 1. MANIFOLDS

9. Let S be the helicoid in R3 with equation z = θ in cylindrical coordinates,


i.e.

S = {p = (x, y, z) ∈ R3 | z = θ, x = r cos θ, y = r sin θ, r > 0, −∞ < θ < ∞}.

a) Sketch S. Find an equation for S in Cartesian coordinates.


b) Show that S is a 2-dimensional submanifold of R3 .
c) Find all points (if any) in a neighbourhood of which one can use (r, θ) as
coordinates on S.
d) Same for (r, z).
e) Same for (x, y).
f) Same for (θ, z).
10. Let C: p = p(t), −∞ < t < ∞, be a C ∞ curve in R3 . The tangential
developable is the surface S in R3 swept out by the tangent line of C, i.e.

S = {p = p(u) + v ṗ(u) | −∞ < u, v < ∞}.

a) Sketch S for some C of your choice illustrating the general idea.


b) Which parameter points (uo , vo ),if any, have a neighbourhood U so that the
part SU of S parametrized by U is a submanifold of R3 ? Explain what this
means geometrically. Give an example.
c) Determine the tangent space to SU at a point po with parameters (uo , vo ) as
in (b) as a subspace of R3 . [Use Theorem 1.4.6 for parts (b) and (c)]
11. Let C be the helix in R3 with parametric equations

x = cos t, y = sin t, z = t.

Let S be the surface swept out by the tangent line of C. [See previous problem].
a) Find parametric equations x = x(u, v), y = y(u, v), z = z(u, v) for S.
b) Which points of S have to be omitted (if any) so that the rest is a submanifold
of R3 ? Prove your answer. [This is not a special case of the previous problem.
Explain why not.]
In exercises 12−17 a set S in R2 or in R3 is given.
a) Find the regular points of S and specify a coordinate system around each
regular point.
b) Find all singular points of S, if any. (Prove that these points are singular.)
If S depends on parameters a, b, · · · , you may have to consider various cases,
depending on the values of the parameters. Try to sketch S.
12. The surface with parametric equations

x = 2au2 /(1 + u2 ), y = au(u2 − 1)/(1 + u2 ), z = v.

13. The set of all points P in a plane for which the product of the distances to
two given points F1 , F2 has a constant value a2 . (Denote the distance between
F1 and F2 by b > 0. The set of these points P is called the ovals of Cassini.)

14. The curve with equation r = 2a cos θ + 2b in polar coordinates (a, b ≥ 0).
15. The curve in the plane with parametric equations x = cos3 t, y = sin3 t.
16. The endpoints of a mobile line segment AB of constant length 2a are
constrained to glide on the coordinate axes in the plane. Let OP be the
perpendicular to AB from the origin O to a point P on AB. S is the set of all
possible positions of P . [Suggestion. Show that r = a sin 2θ.]
17. The surface obtained by rotating the curve z = sin y in the yz-plane about
the z-axis.
18. Show that T S 2 is diffeomorphic with the submanifold V of R6 defined by

V = {(x, y, z; ξ, η, ζ) : x2 + y 2 + z 2 = 1, ξx + ηy + ζz = 0}.

Use the following steps.


(a) Prove that V is a submanifold of R3 × R3 = R6 .
(b) Let i : S 2 → R3 be the inclusion map and F = di : T S 2 → T R3 = R3 ×R3 its
differential. Show that F (T S 2 ) ⊂ V and that F : T S 2 → V is a diffeomorphism.
[Suggestion. Start by specifying coordinates on S 2 and on R3 and write out a
formula for i : S 2 → R3 in these coordinates.]
19. Let M be a manifold, S a submanifold of M .
a) Show that a subset V of S is open in S if and only if V = S ∩ U for some
open subset U of M .
b) Show that a partially defined function g : S · · · → R defined on an open
subset of S is C ∞ (for the manifold structure on S) if and only if it extends to a
C ∞ function f : M · · · → R locally, i.e. in a neighbourhood in M of each point
where g is defined.
c) Show that the properties (a) and (b) characterize the manifold structure of
S uniquely.
20. Let M be an n–dimensional manifold, S an m–dimensional submanifold of
M . For p ∈ S, let
Tp⊥ S = {w ∈ Tp∗ M : w(Tp S) = 0}

and let T ⊥ S ⊂ T ∗ M be the union of the Tp⊥ S. (T ⊥ S is called the conormal


bundle of S in M ).
a) Show that T ⊥ S is an n–dimensional submanifold of T ∗ M .
b) Show that the canonical 1–form θ on T ∗ M (previous problem) is zero on
tangent vectors to T ⊥ S.
21. Let S be the surface in R3 obtained by rotating the circle (y − a)2 + z 2 = b2
in the yz–plane about the z–axis. Find the regular points of S. Explain why
the remaining points (if any) are not regular. Determine a subset of S, as
large as possible, on which one can use (x, y) as coordinates. [You may have
to distinguish various cases, depending on the values of a and b. Cover all
possibilities.]

22. Let M be a manifold, S a submanifold of M, i : S → M the inclusion map,


di : T S → T M its differential. Prove that di(T S) is a submanifold of T M .
What is its dimension? [Suggestion. Use the definition of submanifold.]
23. Let S be the surface in R3 obtained by rotating the circle (y − a)2 + z 2 = b2
(a > b > 0) in the yz–plane about the z–axis.
(a) Find an equation for S and show that S is a submanifold of R3 .
(b) Find a diffeomorphism F : S 1 × S 1 → S, expressed in the form
F : x = x(φ, θ), y = y(φ, θ), z = z(φ, θ)
where φ and θ are the usual angular coordinates on the circles. [Explain why
F is C ∞ . Argue geometrically that it is one–to–one and onto. Then prove its
inverse is also C ∞ .]
24. Let S be the surface in R3 obtained by rotating the circle (y − a)2 + z 2 = b2
(a > b > 0) in the yz–plane about the z–axis and let Cs be the intersection of
S with the plane z = s. Sketch. Determine all values of s ∈ R for which Cs is a
submanifold of R3 and specify its dimension. (Prove your answer.)
25. Under the hypothesis of part (a) of Theorem 1.4.3, prove that in part (b)
the linear independence of all n differentials (df 1 , · · · , df n ) at p is necessary for
f 1 |S , · · · f m |S to form a coordinate system around p on S.
26. (a) Prove the parenthetical assertion in proof 1 of Theorem 1.4.3. (b) Same
for Theorem 1.4.6.

1.5 Riemann metrics


A manifold does not come equipped with a notion of “metric“ just in
virtue of its definition, and there is no natural way to define such a notion using
only the manifold structure. The reason is that the definition of “metric” on
Rn (in terms of Euclidean distance) is neither local nor invariant under diffeo-
morphisms, hence cannot be transferred to manifolds, in contrast to notions
like “differentiable function“ or “tangent vector”. To introduce a metric on a
manifold one has to add a new piece of structure, in addition to the manifold
structure with which it comes equipped by virtue of the axioms. Nevertheless, a
notion of metric on a manifold can be introduced by generalizing the notion of
metric one has in a Euclidean space and on surfaces therein. So
we look at some examples of these first.
1.5.1 Some preliminary examples. The Euclidean metric in R3 is characterized
by the distance function
√( (x1 − x0 )2 + (y1 − y0 )2 + (z1 − z0 )2 ).
One then uses this metric to define the length of a curve p(t) = (x(t), y(t), z(t))
between t = a and t = b by a limiting procedure, which leads to the expression

∫_a^b √( (dx/dt)2 + (dy/dt)2 + (dz/dt)2 ) dt.

As a piece of notation, this expression often is written as


∫_a^b √( (dx)2 + (dy)2 + (dz)2 ).

The integrand is called the element of arc and denoted by ds, but its meaning
remains somewhat mysterious if introduced in this formal, symbolic way. It
actually has a perfectly precise meaning: it is a function on the set of all tan-
gent vectors on R3 , since the coordinate differentials dx, dy, dz are functions on
tangent vectors. But the notation ds for this function is truly objectionable:
ds is not the differential of any function s. The notation is too old to change
and besides gives this simple object a pleasantly old fashioned flavour. The
function ds on tangent vectors characterizes the Euclidean metric just as well as
the distance function we started out with. It is convenient to get rid of the
square root by considering the square of ds instead:
ds2 = (dx)2 + (dy)2 + (dz)2 .
This is now a quadratic function on tangent vectors which can be used to
characterize the Euclidean metric on R3 . For our purposes this ds2 is a more
suitable object than the Euclidean distance function, so we shall simply call
ds2 itself the metric. One reason why it is more suitable is that ds2 can be
easily written down in any coordinate system. For example, in cylindrical co-
ordinates we use x = r cos θ, y = r sin θ, z = z to express dx, dy, dz in terms of
dr, dθ, dz and substitute into the above expression for ds2 ; similarly for spherical
coordinates x = ρ cos θ sin φ, y = ρ sin θ sin φ, z = ρ cos φ. This gives:
Cylindrical: (dr)2 + r2 (dθ)2 + (dz)2
Spherical: (dρ)2 + ρ2 (sin2 φ(dθ)2 + (dφ)2 )
The same discussion applies to the Euclidean metric in Rn in any dimension.
For polar coordinates in the plane R2 one can draw a picture to illustrate the
ds2 in a manner familiar from analytic geometry:

Fig. 1. ds in polar coordinates: ds2 = (dr)2 + r2 (dθ)2

In Cartesian coordinates (xi ) the metric in Rn is ds2 = Σi (dxi )2 . To find its
expression in arbitrary coordinates (y i ) one uses the coordinate transformation
xi = xi (y 1 , · · · , y n ) to express the dxi in terms of the dy i :

dxi = Σk (∂xi /∂y k ) dy k gives Σi (dxi )2 = Σi ( Σk (∂xi /∂y k ) dy k )2 = Σi Σkl (∂xi /∂y k )(∂xi /∂y l ) dy k dy l .

Hence
Σi (dxi )2 = Σkl gkl dy k dy l ,  where gkl = Σi (∂xi /∂y k )(∂xi /∂y l ).
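This computation is entirely mechanical, so it is easy to automate. The following sketch (assuming the sympy library is available) recovers the cylindrical-coordinate expression (dr)2 + r2 (dθ)2 + (dz)2 quoted above from the formula for gkl :

```python
import sympy as sp

r, theta, z = sp.symbols('r theta z', positive=True)
# coordinate change x^i = x^i(y^1, y^2, y^3): Cartesian in terms of cylindrical
x = [r*sp.cos(theta), r*sp.sin(theta), z]
y = [r, theta, z]

# g_kl = sum_i (dx^i/dy^k)(dx^i/dy^l)
g = [[sp.simplify(sum(sp.diff(xi, y[k])*sp.diff(xi, y[l]) for xi in x))
      for l in range(3)] for k in range(3)]

# expect the diagonal coefficients of ds^2 = dr^2 + r^2 dtheta^2 + dz^2
assert g[0][0] == 1 and g[1][1] == r**2 and g[2][2] == 1
```

The same loop with x = (ρ cos θ sin φ, ρ sin θ sin φ, ρ cos φ) reproduces the spherical-coordinate formula.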

The whole discussion applies equally to the case when we start with some other
quadratic function instead of the Euclidean metric. A case of importance in
physics is the Minkowski metric on R4 , which is given by the formula

(dx0 )2 − (dx1 )2 − (dx2 )2 − (dx3 )2 .

More generally, one can consider a pseudo-Euclidean metric on Rn given by

±(dx1 )2 ± (dx2 )2 ± · · · ± (dxn )2

for some choice of the signs.


A generalization in another direction comes from the consideration of surfaces
in R3 , a subject which goes back to Gauss (at least) and which motivated
Riemann in his investigation of the metrics now named after him. Consider a
smooth surface S (2–dimensional submanifold) in R3 . A curve p(t) on S can be
considered as a special kind of curve in R3 , so we can define its length between
t = a and t = b by the same formula as before:
∫_a^b √( (dx)2 + (dy)2 + (dz)2 ).

In this formulas we consider dx, dy, dz as functions of tangent vectors to S,


namely the differentials of the restrictions to S of the Cartesian coordinate
functions x, y, z on R3 . The integral is understood in the same sense as before:
we evaluate dx, dy, dz on the tangent vector ṗ(t) of p(t) and integrate with
respect to t. (Equivalently, we may substitute directly x(t), y(t), z(t) for x, y, z.)
In terms of a coordinate system (u, v) of S, we can write

x = x(u, v), y = y(u, v), z = z(u, v)

for the Cartesian coordinates of the point p(u, v) on S . Then on S we have


(dx)2 + (dy)2 + (dz)2 = ( ∂x/∂u du + ∂x/∂v dv )2 + ( ∂y/∂u du + ∂y/∂v dv )2 + ( ∂z/∂u du + ∂z/∂v dv )2
which we can write as

ds2 = guu (du)2 + 2guv dudv + gvv (dv)2

where the coefficients guu , 2guv , gvv are obtained by expanding the squares in
the previous equation and the symbol ds2 denotes the quadratic function on
tangent vectors to S defined by this equation. This function can be thought of
as defining the metric on S, in the sense that it allows one to compute the length
of curves on S, and hence the distance between two points as the infimum over
the length of curves joining them.

As a specific example, take for S the sphere x2 + y 2 + z 2 = R2 of radius R.


(We don’t take R = 1 here in order to see the dependence of the metric on the
radius.) By definition, the metric on S is obtained from the metric on R3 by
restriction. Since the coordinates (θ, φ) on S are the restrictions of the spherical
coordinates (ρ, θ, φ) on R3 , we immediately obtain the ds2 on S from that on
R3 :
ds2 = R2 (sin2 φ(dθ)2 + (dφ)2 ).
(Since ρ ≡ R on S, dρ ≡ 0 on T S).
The examples make it clear how a metric on a manifold M should be defined: it
should be a function on tangent vectors to M which looks like Σ gij dxi dxj in
a coordinate system (xi ), perhaps with further properties to be specified. (We
don’t want gij = 0 for all ij, for example.) Such a function is called a quadratic
form, a concept which makes sense on any vector space, as we shall now discuss.
1.5.2 Definitions. Bilinear forms and quadratic forms. Let V be an n–
dimensional real vector space. A bilinear form on V is a function g : V ×V → R,
(u, v) → g(u, v), which is linear in each variable separately, i.e. satisfies

g(u, v + w) = g(u, v) + g(u, w); g(u + v, w) = g(u, w) + g(v, w)

g(au, v) = ag(u, v); g(u, av) = ag(u, v)


for all u, v, w ∈ V , a ∈ R. It is symmetric if

g(u, v) = g(v, u)

and it is non-degenerate if

g(u, v) = 0 for all v ∈ V implies u = 0.

A bilinear form g on V gives a linear map V → V ∗ , v → g(v, ·) and to say that g


is non–degenerate means that V → V ∗ has kernel {0}, hence is an isomorphism,
since dim V ∗ = dim V . The quadratic form associated to a bilinear form g is
the function Q : V → R defined by

Q(v) = g(v, v).

If g is symmetric then Q determines g by the formula

Q(u + v) = Q(u) + Q(v) + 2g(u, v).

Hence a symmetric bilinear form g is essentially “the same thing“ as a quadratic


form Q. One says that g or Q is positive definite if Q(v) > 0 for v 6= 0. Then g
is necessarily non–degenerate. If (ei ) is a basis of V and if one writes v = v i ei ,
then one has g(v, w) = gij v i wj where gij = g(ei , ej ).
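The polarization formula above can be made concrete. In the sketch below, Q is a sample non-degenerate quadratic form on R3 (an arbitrary choice, of signature + + −) and g is recovered from it:

```python
# Recover the symmetric bilinear form g from its quadratic form Q via
# Q(u + v) = Q(u) + Q(v) + 2 g(u, v).  The particular Q is an arbitrary choice.

def Q(v):
    return v[0]**2 + 2*v[1]**2 - v[2]**2   # sample quadratic form on R^3

def g(u, v):
    s = [ui + vi for ui, vi in zip(u, v)]
    return (Q(s) - Q(u) - Q(v))/2

u, v = [1.0, 2.0, 3.0], [-1.0, 0.5, 2.0]
# g is symmetric and agrees with u1 v1 + 2 u2 v2 - u3 v3, as it must
assert g(u, v) == g(v, u)
assert g(u, v) == u[0]*v[0] + 2*u[1]*v[1] - u[2]*v[2]
```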
Remark. Any non-degenerate quadratic form Q can be expressed as

Q = ±(ξ 1 )2 ± · · · ± (ξ n )2

where ξ i are the component functionals with respect to a suitable basis ei , i.e.
v = ξ i (v)ei . Such a basis is called orthonormal for Q. The numbers of + and
− signs are independent of the basis and are called the signature of the form.
After this excursion into linear algebra we now return to manifolds.
1.5.3 Definition. A Riemann metric on M associates to each p ∈ M a non–
degenerate symmetric bilinear form gp on Tp M .
The corresponding quadratic form Q is denoted ds2 and determines g uniquely.
Relative to a coordinate system (xi ) we can write
ds2 = Σ gij dxi dxj .

The coefficients gij are given by

gij = g(∂/∂xi , ∂/∂xj ).

As part of the definition we require that the gij are C ∞ functions. This
requirement is evidently independent of the coordinates. An equivalent condition
is that g(X, Y ) be a C ∞ function for any two C ∞ vector fields X, Y on M or
an open subset of M .
We add some remarks. (1) We emphasize once more that ds2 is not the
differential of some function s2 on M , nor the square of the differential of a function s
(except when dim M = 0, 1); the notation is rather explained by the examples
discussed earlier.
(2) The term “Riemann metric“ is sometimes reserved for positive definite met-
rics and then “pseudo-Riemann” or “semi-Riemann“ is used for the possibly
indefinite case. A manifold together with a Riemann metric is referred to as a
Riemannian manifold, and may again be qualified as “pseudo” or “semi”.
(3) At a given point po ∈ M one can find a coordinate system (x̃i ) so that the
metric ds2 = gij dxi dxj takes on the pseudo-Euclidean form ±δij dx̃i dx̃j . [Reason:
as remarked above, the quadratic form gij (po )ξ i ξ j can be written as ±δij ξ̃i ξ̃j
in a suitable basis, i.e. by a linear transformation ξ i = aij ξ̃j , which can be used
as a coordinate transformation xi = aij x̃j .] But it is generally not possible to do
this simultaneously at all points in a coordinate domain, not even in arbitrarily
small neighbourhoods of a given point.
(4)As a bilinear form on tangent spaces, a Riemann metric is a conceptually
very simple piece of structure on a manifold, and the elaborate discussion of
arclength etc. may indeed seem superfluous. But a look at some surfaces
should be enough to see that the essence of a Riemann metric is not to be found in
the algebra of bilinear forms.
For reference we record the transformation rule for the coefficients of the metric,
but we omit the verification.
1.5.4 Lemma. The coefficients gij and g̃kl of the metric with respect to two
coordinate systems (xi ) and (x̃j ) are related by the transformation law
g̃kl = gij (∂xi /∂ x̃k )(∂xj /∂ x̃l ).

We now consider a submanifold S of a manifold M equipped with a Riemann


metric g. Since the tangent spaces of S are subspaces to those of M we can
restrict g to a bilinear form gS on the tangent spaces of S. If the metric on
M is positive definite, then so is gS . In general, however, it can happen that this
form gS is degenerate; but if it is non–degenerate at every point of S, then gS
is a Riemann metric on S, called the Riemann metric on S induced
by the Riemann metric g on M . For example, the metric on a surface S in R3
discussed earlier is induced from the Euclidean metric in R3 in this sense.
A Riemann metric on a manifold makes it possible to define a number of geo-
metric concepts familiar from calculus, for example volume.
1.5.5 Definition. Let R be a bounded region contained in the domain of a
coordinate system (xi ). The volume of R (with respect to the Riemann metric
g) is
∫ · · · ∫ √| det gij | dx1 · · · dxn

where the integral is over the coordinate-region {(xi (p)) | p ∈ R} corresponding


to the points in R. If M is two-dimensional one says area instead of volume, if
M is one-dimensional one says arclength.
1.5.6 Proposition. The above definition of volume is independent of the coor-
dinate system.
Proof. Let (x̃i ) be another coordinate system. Then

g̃kl = gij (∂xi /∂ x̃k )(∂xj /∂ x̃l ).

So

det g̃kl = det( gij (∂xi /∂ x̃k )(∂xj /∂ x̃l ) ) = det gij ( det(∂xi /∂ x̃k ) )2 ,

and

∫ · · · ∫ √| det g̃ij | dx̃1 · · · dx̃n = ∫ · · · ∫ √| det gij | | det(∂xi /∂ x̃k )| dx̃1 · · · dx̃n = ∫ · · · ∫ √| det gij | dx1 · · · dxn ,

by the change of variables formula.

1.5.7 Remarks.
(a) If f is a real-valued function one can define the integral of f over R by the
formula
∫ · · · ∫ f √| det gij | dx1 · · · dxn ,

provided f (p) is an integrable function of the coordinates (xi ) of p. The same


proof as above shows that this definition is independent of the coordinate system.
(b) If the region R is not contained in the domain of a single coordinate system
the integral over R may be defined by subdividing R into smaller regions, just as
one does in calculus.
1.5.8 Example. Let S be a two-dimensional submanifold of R3 (smooth surface).
The Euclidean metric dx2 + dy 2 + dz 2 on R3 gives a Riemann metric g = ds2 on
S by restriction. Let u, v be coordinates on S. Write p = p(u, v) for the point on
S with coordinates (u, v). Then

√| det gij | = ‖∂p/∂u × ∂p/∂v‖.

Here gij is the matrix of the Riemann metric in the coordinate system u, v. The
right-hand side is the norm of the cross-product of vectors in R3 . This shows
that the “volume“ defined above agrees with the usual definition of surface area
for a surface in R3 .
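As a further check on Definition 1.5.5, the sketch below (assuming the sympy library is available) applies the volume formula to the round sphere of radius R with the metric ds2 = R2 (sin2 φ(dθ)2 + (dφ)2 ) found in 1.5.1, and recovers the familiar area 4πR2 :

```python
import sympy as sp

R, theta, phi = sp.symbols('R theta phi', positive=True)

# metric coefficients of ds^2 = R^2 (sin^2(phi) dtheta^2 + dphi^2)
g = sp.Matrix([[R**2*sp.sin(phi)**2, 0],
               [0, R**2]])

# sqrt(det g) = R^2 sin(phi), using that sin(phi) >= 0 for 0 <= phi <= pi
vol_elem = R**2*sp.sin(phi)

# area = double integral of the volume element over the coordinate region
area = sp.integrate(vol_elem, (theta, 0, 2*sp.pi), (phi, 0, sp.pi))
assert sp.simplify(area - 4*sp.pi*R**2) == 0
```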
We now consider the problem of finding the shortest line between two points
on a Riemannian manifold with a positive definite metric. The problem is this.
Fix two points A, B in M and consider curves p = p(t), a ≤ t ≤ b from p(a) = A
to p(b) = B. Given such a curve, consider its arc–length
∫_a^b ds = ∫_a^b √Q dt

as a function of the curve p(t). The integrand √Q is a function on tangent
vectors evaluated at the velocity vector ṗ(t) of p(t). The integral is independent
of the parametrization. We are looking for “the“ curve for which this integral is
minimal, but we have no guarantee that such a curve is unique or even exists.
This is what is called a variational problem and it will be best to consider it in
a more general setting.
Consider a function S on the set of paths p(t), a ≤ t ≤ b between two given
points A = p(a) and B = p(b) of form
S = ∫_a^b L(p, ṗ) dt.    (1)

We now use the term “path“ to emphasize that in general the parametrization is
important: p(t) is to be considered as a function on a given interval [a, b], which we
assume to be differentiable. The integrand L is assumed to be a given function
L = L(p, v) on the tangent vectors on M , i.e. a function on the tangent bundle
T M . We here denote elements of T M as pairs (p, v), p ∈ M being the point at
which the vector v ∈ Tp M is located. The problem is to find the path or paths
p(t) which make S a maximum, a minimum, or more generally stationary,
in the following sense.

Consider a one–parameter family of paths p = p(t, ε), a ≤ t ≤ b, −α ≤ ε ≤ α,
from A = p(a, ε) to B = p(b, ε) which agrees with a given path p = p(t) for
ε = 0. (We assume p(t, ε) is at least of class C 2 in (t, ε).)

Fig. 2. The paths p(t, ε)

Then S evaluated at the path p(t, ε) becomes a function S(ε) of ε, and if p(t) =
p(t, 0) makes S a maximum or a minimum then

( dS/dε )ε=0 = 0    (2)

for all such one–parameter variations p(t, ε) of p(t). The converse is not
necessarily true, but any path p(t) for which (2) holds for all variations p(t, ε)
of p(t) is called a stationary (or critical ) path for the path–function S.
We now compute the derivative (2). Choose a coordinate system x = (xi ) on
M . We assume that the curve p(t) under consideration lies in the coordinate
domain, but this is not essential: otherwise we would have to cover the curve by
several coordinate systems. We get a coordinate system (x, ξ) on T M by taking
as coordinates of (p, v) the coordinates xi of p together with the components ξ i
of v. (Thus ξ i = dxi as a function on tangent vectors.) Let x = x(t, ε) be the
coordinate point of p(t, ε), and in (2) set L = L(x, ξ) evaluated at ξ = ẋ. Then
dS/dε = ∫_a^b (∂L/∂ε) dt = ∫_a^b ( (∂L/∂xk )(∂xk /∂ε) + (∂L/∂ξ k )(∂ ẋk /∂ε) ) dt.

Change the order of the differentiation with respect to t and ε in the second
term in parentheses and integrate by parts to find that this

= ∫_a^b ( ∂L/∂xk − d/dt (∂L/∂ξ k ) ) (∂xk /∂ε) dt + [ (∂L/∂ξ k )(∂xk /∂ε) ]_{t=a}^{t=b}

The term in brackets is zero, because of the boundary conditions x(a, ε) ≡ x(A)
and x(b, ε) ≡ x(B). The whole expression has to vanish at ε = 0, for all
x = x(t, ε) satisfying the boundary conditions. The partial derivative ∂xk /∂ε
at ε = 0 can be any C 1 function wk (t) of t which vanishes at t = a, b, since we
can take xk (t, ε) = xk (t) + εwk (t), for example. Thus the initial curve x = x(t)
satisfies

∫_a^b ( ∂L/∂xk − d/dt (∂L/∂ξ k ) ) wk (t) dt = 0

for all such wk (t). From this one can conclude that

∂L/∂xk − d/dt (∂L/∂ξ k ) = 0.    (3)
for all t, a ≤ t ≤ b and all k. For otherwise there is a k so that this expression is
non-zero, say positive, at some to , hence in some interval about to . For this k,
choose wk equal to zero outside such an interval and positive on a subinterval
and take all other wk equal to zero. The integral will be positive as well, contrary
to the assumption. The equation (3) is called the Euler–Lagrange equation for
the variational problem (2).
We add some remarks on what is called the principle of conservation of energy
connected with the variational problem (2). The energy E = E(p, v) associated
to L = L(p, v) is the function defined by

E = (∂L/∂ξ k ) ξ k − L

in a coordinate system (x, ξ) on T M as above. (It is actually independent of


the choice of the coordinate system x on M .) If x = x(t) satisfies the Euler–
Lagrange equation (3) and we take ξ = ẋ, then E = E(x, ξ) satisfies

dE/dt = d/dt ( (∂L/∂ξ k ) ẋk − L ) = ( d/dt (∂L/∂ξ k ) ) ẋk + (∂L/∂ξ k ) ẍk − (∂L/∂xk ) ẋk − (∂L/∂ξ k ) ẍk
= ( d/dt (∂L/∂ξ k ) − ∂L/∂xk ) ẋk = 0.    (4)

Thus E = constant along the curve p(t).


We now return to the particular problem of arc length. Thus we have to take
L = √Q in (1). The integral (1) becomes

∫_a^b √Q dt

and is in this case independent of the parametrization of the path p(t). The
derivative (2) becomes

∫_a^b (1/2) ( Q−1/2 ∂Q/∂ε )ε=0 dt.

Because of the independence of parametrization, we may assume that the initial


curve is parametrized by arclength, i.e. Q ≡ 1 for ε = 0. Then we get

(1/2) ∫_a^b ( ∂Q/∂ε )ε=0 dt

This means in effect that we can replace L = √Q by L = Q = gij ξ i ξ j . For this
new L the energy E becomes E = 2Q − Q = Q. So the speed √Q is constant
along p(t). We now write out (3) explicitly and summarize the result.

1.5.9 Theorem. Let M be a Riemannian manifold. A path p = p(t) makes
∫_a^b Q dt stationary if and only if its parametric equations xi = xi (t) in any
coordinate system (xi ) satisfy

d/dt ( gik dxi /dt ) − (1/2) (∂gij /∂xk )(dxi /dt)(dxj /dt) = 0.    (5)

This holds also for a curve of minimal arc–length ∫_a^b √Q dt between two given
points if traversed at constant speed. Any curve satisfying (5) has constant
speed, i.e. g(ṗ, ṗ) = constant.
1.5.10 Definition. Any curve on a Riemannian manifold M satisfying (5) is
called a geodesic (of the Riemann metric g).
For this definition the metric need not be positive definite, but even if it is, a
geodesic need not be the shortest curve between any two of its points. (Think of
a great circle on a sphere; see the example below.) But if the metric is positive
definite it can be shown that for p sufficiently close to q the geodesic from p to
q is the unique shortest line.
1.5.11 Proposition. Given a point po ∈ M and a vector vo at po there is a
unique geodesic p = p(t), defined in some interval about t = 0, so that p(0) = po
and ṗ(0) = vo .
Proof. The system of second-order differential equations (5) has a unique
solution (xi (t)) with a given initial condition (xi (0)), (ẋi (0)).
1.5.12 Examples.
A. Geodesics in Euclidean space. For the Euclidean metric ds2 = (dx1 )2 + · · · +
(dxn )2 the equations (5) read d2 xi /dt2 = 0, so the geodesics are the straight
lines xi (t) = ci t + xio traversed at constant speed. (No great surprise, but good
to know that (5) works as expected.)
B. Geodesics on the sphere. The metric is ds2 = sin2 φdθ2 +dφ2 (we take R = 1).
The equations (5) become
d/dt ( sin2 φ dθ/dt ) = 0,    d2 φ/dt2 − sin φ cos φ (dθ/dt)2 = 0
These equations are obviously satisfied if we take θ = α, φ = νt. This is a great
circle through the north-pole (x, y, z) = (0, 0, 1) traversed with constant speed.
Since any vector at po is the tangent vector of some such curve, all geodesics
starting at this point are of this type (by the above proposition). Furthermore,
given any point po on the sphere, we can always choose an orthogonal coordinate
system in which po has coordinates (0, 0, 1). Hence the geodesics starting at any
point (hence all geodesics on the sphere) are great circles traversed at constant
speed.
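The geodesic equations for the sphere can also be solved numerically, which gives an independent check on the constant-speed assertion of Theorem 1.5.9. The sketch below rewrites them as θ'' = −2(cos φ/ sin φ) θ' φ' and φ'' = sin φ cos φ (θ')2 and integrates with a standard Runge–Kutta step; the initial data are an arbitrary choice kept away from the poles, where the (θ, φ) chart breaks down:

```python
import math

# First-order form of the geodesic equations on the unit sphere,
# ds^2 = sin^2(phi) dtheta^2 + dphi^2, state = (theta, phi, theta', phi').
def rhs(state):
    th, ph, dth, dph = state
    return (dth,
            dph,
            -2.0*math.cos(ph)/math.sin(ph)*dth*dph,
            math.sin(ph)*math.cos(ph)*dth**2)

def rk4_step(state, h):
    def add(s, k, c):
        return tuple(si + c*ki for si, ki in zip(s, k))
    k1 = rhs(state)
    k2 = rhs(add(state, k1, h/2))
    k3 = rhs(add(state, k2, h/2))
    k4 = rhs(add(state, k3, h))
    return tuple(s + h/6*(a + 2*b + 2*c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def speed2(state):
    # Q = g(p', p') = sin^2(phi) theta'^2 + phi'^2
    th, ph, dth, dph = state
    return math.sin(ph)**2*dth**2 + dph**2

state = (0.0, 1.0, 0.7, 0.3)   # arbitrary initial point and velocity
Q0 = speed2(state)
for _ in range(2000):          # integrate up to t = 2
    state = rk4_step(state, 0.001)

assert abs(speed2(state) - Q0) < 1e-9   # speed is conserved
```

Up to the integration error, Q = g(ṗ, ṗ) is conserved along the computed curve, as it must be along any solution of (5).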
1.5.13 Definition. A map f : M → N between Riemannian manifolds is called
a (local) isometry if it is a (local) diffeomorphism and preserves the metric, i.e.
for all p ∈ M , the differential dfp : Tp M → Tf (p) N satisfies
ds2M (v) = ds2N (dfp (v))

for all v ∈ Tp M .
1.5.14 Remarks. (1) In terms of scalar products, this is equivalent to

gM (v, w) = gN (dfp (v), dfp (w))

for all v, w ∈ Tp M .
(2) Let (x1 , · · · , xn ) and (y 1 , · · · , y m ) be coordinates on M and N respectively.
Write
ds2M = gij dxi dxj ,   ds2N = hab dy a dy b
for the metrics and

y a = f a (x1 , · · · , xn ), a = 1, · · · , m (6)

for the map f . To say that f is an isometry means that ds2N becomes ds2M if we
substitute for the y a and dy a in terms of xi and dxi by means of the equation
(6).
(3) If f is an isometry then it preserves arclength of curves as well. Conversely,
any C ∞ map preserving arclength of curves is an isometry. (Exercise).
1.5.15 Examples
a) Euclidean space. The linear transformations of Rn which preserve the Eu-
clidean metric are of the form x → Ax where A is an orthogonal real n×n matrix
(AA∗ = I). The set of all such matrices is called the orthogonal group, denoted
O(n). If the Euclidean metric is replaced by a pseudo-Euclidean metric with
p plus signs and q minus signs the corresponding set of “pseudo-orthogonal“
matrices is denoted O(p, q). It can be shown that any transformation of Rn
which preserves a pseudo-Euclidean metric is of the form x → xo + Ax with A
linear orthogonal.
b) The sphere. Any orthogonal transformation A ∈ O(3) of R3 gives an isometry
of S 2 onto itself, and it can be shown that all isometries of S 2 onto itself are of
this form. The same holds in higher dimensions and for pseudo-spheres defined
by a pseudo-Euclidean metric.
c) Curves and surfaces. Let C : p = p(σ), σ ∈ R, be a curve in R3 parametrized
by arclength, i.e. ṗ(σ) has length = 1. If we assume that the curve is non-
singular, i.e. a submanifold of R3 , then C is a 1-dimensional manifold with a
Riemann metric and the map σ → p(σ) is an isometry of R onto C. In fact it
can be shown that for any connected 1-dimensional Riemann
manifold C there exists an isometry of an interval on the Euclidean line R onto
C, which one can still call parametrization by arclength. The map σ → p(σ) is
not necessarily 1–1 on R (e.g. for a circle), but it is always locally 1–1. Thus
any 1-dimensional Riemann manifold is locally isometric with the straight line
R. This does not hold in dimension 2 or higher: for example a sphere is not
locally isometric with the plane R2 (exercise 18); only very special surfaces are,
e.g. cylinders (exercise 17; such surfaces are called developable.)

Fig. 3. A cylinder “developed“ into a plane

d) Poincaré disc and Klein upper half-plane. Let D = {z = x + iy | x2 + y 2 < 1}
be the unit disc and H = {w = u + iv ∈ C | v > 0} be the upper halfplane. The
Poincaré metric on D and the Klein metric on H are defined by

ds2D = 4dzdz̄/(1 − z z̄)2 = 4(dx2 + dy 2 )/(1 − x2 − y 2 )2 ,   ds2H = −4dwdw̄/(w − w̄)2 = (du2 + dv 2 )/v 2 .

The complex coordinates z and w are only used as shorthand for the real coor-
dinates (x, y) and (u, v). The map f : w → z defined by

z = (1 + iw)/(1 − iw)
sends H onto D and is an isometry for these metrics. To verify the latter use
the above formula for z and the formula
dz = 2i dw/(1 − iw)2

for dz to substitute into the equation for ds2D ; the result is ds2H . (Exercise.)
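The verification can also be spot-checked numerically with complex arithmetic: push a tangent vector dw at w forward by dz = 2i dw/(1 − iw)2 and compare the two metrics on the pair. The sample point (with v > 0) and vector below are arbitrary choices:

```python
def to_disc(w):
    # z = (1 + i w)/(1 - i w) maps the upper half-plane H onto the disc D
    return (1 + 1j*w)/(1 - 1j*w)

def push(w, dw):
    # differential of the map: dz = 2i dw/(1 - i w)^2
    return 2j*dw/(1 - 1j*w)**2

def ds2_D(z, dz):
    # Poincare metric: 4 |dz|^2 / (1 - |z|^2)^2
    return 4*abs(dz)**2/(1 - abs(z)**2)**2

def ds2_H(w, dw):
    # Klein metric: |dw|^2 / v^2
    return abs(dw)**2/w.imag**2

w, dw = 0.3 + 0.8j, 0.5 - 0.2j     # sample point in H and tangent vector
z, dz = to_disc(w), push(w, dw)
assert abs(z) < 1                   # the image point lies in D
assert abs(ds2_D(z, dz) - ds2_H(w, dw)) < 1e-12
```

Agreement at one point and one vector proves nothing by itself, but a failed check would immediately expose an algebra error in the exercise.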
1.5.16 The set of all invertible isometries of a manifold M with Riemann metric
ds2 is called the isometry group of M , denoted Isom(M ) or Isom(M, ds2 ). It is a
group in the algebraic sense, which means that the composite of two isometries
is an isometry as is the inverse of any one isometry. In contrast to the above
example, the isometry group of a general Riemann manifold may well consist of
the identity transformation only, as is easy to believe if one thinks of a general
surface in space.
1.5.17 On a connected manifold M with a positive-definite Riemann metric one
can define the distance between two points as the infimum of the lengths of all
curves joining these points. (It can be shown that this makes M into a metric
space in the sense of topology.)

EXERCISES 1.5
1. Verify the formula for the Euclidean metric in spherical coordinates:

ds2 = (dρ)2 + ρ2 (sin2 φ(dθ)2 + (dφ)2 )

2. Using (x, y) as coordinates on S 2 , show that the metric on



S 2 = {p = (x, y, z) | x2 + y 2 + z 2 = R2 }

is given by

dx^2 + dy^2 + (x dx + y dy)^2/(R^2 − x^2 − y^2).
Specify a domain for the coordinates (x, y) on S 2 .
3. Let M = R3 . Let (x0 , x1 , x2 ) be Cartesian coordinates on R3 . Define a
metric ds2 by

ds2 = −(dx0 )2 + (dx1 )2 + (dx2 )2 .


Pseudo-spherical coordinates (ρ, ψ, θ) on R3 are defined by the formulas

x0 = ρ cosh ψ, x1 = ρ sinh ψ cos θ, x2 = ρ sinh ψ sin θ.

Show that
ds2 = −dρ2 + ρ2 [(dψ)2 + sinh2 ψ(dθ)2 ]
4. Let M = R3 . Let (x0 , x1 , x2 ) be Cartesian coordinates on R3 . Define a
metric ds2 by
ds2 = −(dx0 )2 + (dx1 )2 + (dx2 )2 .
Let S = {p = (x0 , x1 , x2 ) ∈ R3 | (x0 )2 − (x1 )2 − (x2 )2 = 1} with the metric ds2
induced by the metric ds2 on R3 . (With this metric S is called a pseudo-sphere).
(a) Prove that S is a submanifold of R3 .
(b) Show that for any (ψ, θ) the point p = (x0 , x1 , x2 ) given by

x0 = cosh ψ, x1 = sinh ψ cos θ, x2 = sinh ψ sin θ

lies on S. Use (ψ, θ) as coordinates on S and show that

ds2 = (dψ)2 + sinh2 ψ(dθ)2 .

(This shows that the induced metric ds2 on S is positive definite.)


5. Prove the transformation rule g̃_kl = g_ij (∂x^i/∂x̃^k)(∂x^j/∂x̃^l).

6. Prove the formula √|det g_ij| = ‖∂p/∂u × ∂p/∂v‖ for surfaces as stated in the text. [Suggestion. Use the fact that g_ij = g(∂p/∂x^i, ∂p/∂x^j). Remember that ‖A × B‖^2 = ‖A‖^2 ‖B‖^2 − (A · B)^2 for two vectors A, B in R3.]
7. Find the expression for the Euclidean metric dx2 + dy 2 on R2 using the
following coordinates (u, v).
a) x = cosh u cos v, y = sinh u sin v  b) x = (u^2 − v^2)/2, y = uv.
8. Let S be a surface in R3 with equation z = f (x, y) where f is a C ∞ function.

a) Show that S is a submanifold of R3 .


b) Describe the tangent space Tp S at a point p = (x, y, z) of S as a subspace of
T p R3 = R3 .
c) Show that (x, y) define a coordinate system on S and that the metric ds2 on
S induced by the Euclidean metric dx2 + dy 2 + dz 2 on R3 is given by

ds2 = (1 + fx2 )dx2 + 2fx fy dxdy + (1 + fy2 )dy 2 .

d) Show that the area of a region R on S is given by the integral


∬ √(1 + f_x^2 + f_y^2) dx dy

over the coordinate region corresponding to R. (Use definition 1.5.5.)


9. Let S be the surface in R3 obtained by rotating the curve y = f (z) about
the z–axis.
a) Show that S is given by the equation x2 + y 2 = f (z)2 and prove that S is a
submanifold of R3. (Assume f (z) ≠ 0 for all z.)
b) Show that the equations x = f (z) cos θ, y = f (z) sin θ, z = z define a
coordinate system (z, θ) on S. [Suggestion. Use Theorem 1.4.6]
c) Find a formula for the Riemann metric ds2 on S induced by the Euclidean
metric dx2 + dy 2 + dz 2 on R3 .
d) Prove the coordinate vector fields ∂/∂z and ∂/∂θ on S are everywhere or-
thogonal with respect to the inner product g(u, v) of the metric on S. Sketch.
10. Show that the generating curves θ =const. on a surface of rotation are
geodesics when parametrized by arclength. Is every geodesic of this type? [See
the preceding problem. “Parametrized by arclength“ means that the tangent
vector has length one.]
11. Determine the geodesics on the pseudo-sphere of problem 4. [Suggestion.
Imitate the discussion for the sphere.]
12. Let S = {p = (x, y, z) | x2 + y 2 = 1} be a right circular cylinder in R3 .
Show that the helix
x = cos t, y = sin t, z = ct
is a geodesic on S. Find all geodesics on S.
13. Prove remark 1.5.14 (2).
14. Prove that the map f defined in example 1.5.15(d) does map D onto H
and complete the verification that it is an isometry.
 
15. With any complex 2×2 matrix A = [a b; c d] associate the linear fractional transformation f : w → z defined by

z = (aw + b)/(cw + d).

a) Show that a composite of two such transformations corresponds to the product of the two matrices. Deduce that such a transformation is invertible if det A ≠ 0, and if so, one can arrange det A = 1 without changing f.
b) Suppose A = [a b; b̄ ā], det A = 1. Show that f gives an isometry of the Poincaré
disc D onto itself. [The convention of using the same notation (xi ) for a coordi-
nate system as for the coordinates of a general points may cause some confusion
here: in the formula for f , both z = x + iy and w = u + iv will belong to D; you
can think of (x, y) and (u, v) as two different coordinate systems on D, although
they are really the same.]
c) Suppose A is real and det A = 1. Show that f gives an isometry of Klein
upper halfplane H onto itself.
16. Let P be the pseudosphere x2 − y 2 − z 2 = 1 in R3 with the metric ds2P
induced by the Pseudo-Euclidean metric −dx2 + dy 2 + dz 2 in R3 .
a) For any point p = (x, y, z) on the upper pseudosphere x > 0 let (0, u, v) be the point where the line from (−1, 0, 0) to (x, y, z) intersects the yz-plane. Sketch. Show that

x = 2/(1 − u^2 − v^2) − 1,  y = 2u/(1 − u^2 − v^2),  z = 2v/(1 − u^2 − v^2).
b) Show that the map (x, y, z) → (u, v) is an isometry from the upper pseudo-
sphere onto the Poincaré disc. [Warning: the calculations in this problem are a
bit lengthy.]
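A quick numerical sanity check for part (a) (my own, not part of the exercise; it assumes the conventions above, with x the coordinate satisfying x > 0): the parametrization by a disc point (u, v) does land on x^2 − y^2 − z^2 = 1.

```python
# Check (not part of the exercise): the parametrization of the upper
# pseudosphere x^2 - y^2 - z^2 = 1 by the disc point (u, v),
#   x = 2/(1-u^2-v^2) - 1,  y = 2u/(1-u^2-v^2),  z = 2v/(1-u^2-v^2),
# satisfies the defining equation whenever u^2 + v^2 < 1, with x > 0.
def pseudosphere(u, v):
    s = u * u + v * v
    assert s < 1                       # (u, v) must lie in the open unit disc
    return (2 / (1 - s) - 1, 2 * u / (1 - s), 2 * v / (1 - s))

for (u, v) in [(0.0, 0.0), (0.3, 0.4), (-0.7, 0.1), (0.05, -0.9)]:
    x, y, z = pseudosphere(u, v)
    assert x > 0
    assert abs(x * x - y * y - z * z - 1) < 1e-9
```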
17. Let S be the cylinder in R3 with base a curve in the xy-plane x = x(σ), y =
y(σ), defined for all −∞ < σ < ∞ and parametrized by arclength σ. Assume
that S is a submanifold of R3 with (σ, z) as coordinate system. Show that there
is an isometry f : R2 → S from the Euclidean plane onto S. Use this fact to
find the geodesics on S.
18. a) Calculate the circumference L and the area A of a disc of radius ρ about
a point of the sphere S 2 , e.g. the points with 0 ≤ φ ≤ ρ. [Answer: L = 2π sin ρ,
A = 2π(1 − cos ρ)].
b) Prove that the sphere is not locally isometric with the plane R2 .

1.6 Tensors
1.6.1 Definition. A tensor T at a point p ∈ M is a quantity which, relative to a coordinate system (x^k) around p, is represented by an indexed system of real numbers (T^{ij···}_{kl···}). The systems (T^{ij···}_{kl···}) and (T̃^{ab···}_{cd···}) representing T in two coordinate systems (x^i) and (x̃^j) are related by the transformation law

T̃^{ab···}_{cd···} = T^{ij···}_{kl···} (∂x̃^a/∂x^i)(∂x̃^b/∂x^j) ··· (∂x^k/∂x̃^c)(∂x^l/∂x̃^d) ··· (1)

where the partials are taken at p.
The T^{ij···}_{kl···} are called the components of T relative to the coordinate system (x^i). If there are r upper indices ij··· and s lower indices kl··· the tensor is said to be of type (r, s). In spite of its frightful appearance, the rule (1) is very easily remembered: the indices on the ∂x̃^a/∂x^i and ∂x^k/∂x̃^c etc. on the right “cancel” against those on T^{ij···}_{kl···} to produce the indices on the left, and the tildes on the right are placed up or down like the corresponding indices on the left.
If the undefined term “quantity” in this definition is found objectionable, one can simply take it to be a rule which associates to every coordinate system (x^i) around p a system of numbers (T^{ij···}_{kl···}). But then a statement like “a magnetic field is a (0, 2)–tensor” requires some conceptual or linguistic acrobatics of another sort.
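The law (1) can be tried out numerically. The sketch below is my own illustration (the sample tensor and the point (1, 2) are arbitrary choices): for a (1, 1)-tensor the law reads T̃^a_c = (∂x̃^a/∂x^i) T^i_k (∂x^k/∂x̃^c), which is matrix conjugation by the Jacobian of the coordinate change, here Cartesian (x, y) → polar (r, θ).

```python
import math

# Transformation law (1) for a (1,1)-tensor under Cartesian -> polar coordinates,
# written once with explicit index sums and once as the matrix product J T J^(-1).
x, y = 1.0, 2.0
r = math.hypot(x, y)
J = [[x / r, y / r],            # dr/dx,     dr/dy
     [-y / r**2, x / r**2]]     # dtheta/dx, dtheta/dy
Jinv = [[x / r, -y],            # dx/dr, dx/dtheta
        [y / r, x]]             # dy/dr, dy/dtheta
T = [[1.0, 2.0], [3.0, 4.0]]    # sample components T^i_k in Cartesian coordinates

# law (1), with the sums over i and k written out index by index
Tt = [[sum(J[a][i] * T[i][k] * Jinv[k][c] for i in range(2) for k in range(2))
       for c in range(2)] for a in range(2)]

# the same computation as the matrix product J T J^(-1)
JT = [[sum(J[a][i] * T[i][k] for i in range(2)) for k in range(2)] for a in range(2)]
Tt2 = [[sum(JT[a][k] * Jinv[k][c] for k in range(2)) for c in range(2)] for a in range(2)]
assert all(abs(Tt[a][c] - Tt2[a][c]) < 1e-12 for a in range(2) for c in range(2))
```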
1.6.2 Theorem. An indexed system of numbers (T^{ij···}_{kl···}) depending on a coordinate system (x^k) around p defines a tensor at p if and only if the function

T(λ, µ, ···; v, w, ···) = T^{ij···}_{kl···} λ_i µ_j ··· v^k w^l ···

of r covariant vectors λ = (λ_i), µ = (µ_j), ··· and s contravariant vectors v = (v^k), w = (w^l), ··· is independent of the coordinate system (x^k). This establishes a one-to-one correspondence between tensors and functions of the type indicated.
Explanation. A function of this type is called multilinear ; it is linear in each
of the (co)vectors

λ = (λi ), µ = (µj ), · · · , v = (v k ), w = (wl ), · · ·

separately (i.e. when all but one of them is kept fixed). The theorem says that
one can think of a tensor at p as a multilinear function T (λ, µ, · · · ; v, w, · · · ) of
r covectors λ, µ, · · · and s vectors v,w, · · · at p, independent of any coordinate
system. Here we write the covectors first. One could of course list the variables
λ, µ, · · · ; v, w · · · in any order convenient. In components, this is indicated by
the position of the indices, e.g.
T(v, λ, w) = T_i{}^j{}_k v^i λ_j w^k

for a tensor of type (1, 2).



Proof. The function is independent of the coordinate system if and only if it


satisfies
T̃^{ab···}_{cd···} λ̃_a µ̃_b ··· ṽ^c w̃^d ··· = T^{ij···}_{kl···} λ_i µ_j ··· v^k w^l ··· (2)
In view of the transformation rule for vectors and covectors this means that

T̃^{ab···}_{cd···} λ̃_a µ̃_b ··· ṽ^c w̃^d ··· = T^{ij···}_{kl···} (∂x̃^a/∂x^i)(∂x̃^b/∂x^j) ··· (∂x^k/∂x̃^c)(∂x^l/∂x̃^d) ··· λ̃_a µ̃_b ··· ṽ^c w̃^d ···
Fix a set of indices a, b, ···, c, d, ···. Choose the corresponding components λ̃_a, µ̃_b, ···, ṽ^c, w̃^d, ··· to be = 1 and all other components = 0. This gives the transformation law (1). Conversely, if (1) holds, then this argument shows that (2) holds as well. That the given rule is a one-to-one correspondence is clear, since relative to a coordinate system both the tensor and the multilinear function are uniquely specified by the system of numbers (T^{ij···}_{kl···}).
As mentioned, one can think of a tensor as being “the same thing” as a multilinear function, and this is in fact often taken as the definition of “tensor”, instead of the transformation law (1). But that point of view is sometimes quite awkward. For example, a linear transformation A of Tp M defines a (1, 1) tensor (A^j_i) at p, namely its matrix with respect to the basis ∂/∂x^i, and it would make little sense to insist that we should rather think of A as a bilinear form on Tp M × Tp∗ M. It seems more reasonable to consider the multilinear-form interpretation of tensors as just one of many.
1.6.3 Definition. A tensor field T associates to each point p of M a tensor
T (p) at p. The tensor field is of class Ck if its components with respect to any
coordinate system are of class Ck .
Tensor fields are often simply called tensors as well, especially in the physics
literature.
1.6.4 Operations on tensors at a given point p.
(1) Addition and scalar multiplication. Componentwise. Only tensors of the
same type can be added. E.g. If T has components (Tij ), and S has components
(Sij ), then S + T has components Sij + Tij .
(2) Symmetry operations. If the upper or lower indices of a tensor are permuted
among themselves the result is again a tensor. E.g if (Tij ) is a tensor, then the
equation
Sij = Tji
defines another tensor S. This tensor may also be defined by the formula

S(v, w) = T (w, v).

(3) Tensor product. The product of two tensors T = (T^i_j) and S = (S^k_l) is the tensor T ⊗ S (also denoted ST) with components

(T ⊗ S)^{ik}_{jl} = T^i_j S^k_l.

The indices i = (i1 , i2 · · · ) etc. can be multi-indices, so that this definition


applies to tensors of any type.
(4) Contraction. This operation consists of setting an upper index equal to a lower index and summing over it (summation convention). For example, from a (1, 2) tensor (T^k_{ij}) one can form two (0, 1) tensors U, V by contractions:

U_j = T^k_{kj}, V_i = T^k_{ik}. [Sum over k]

One has to verify that the above operations do produce tensors. As an example, we consider the contraction of a (1, 1) tensor:

T̃^k_k = T^i_j (∂x̃^k/∂x^i)(∂x^j/∂x̃^k) = T^i_j δ^j_i = T^i_i.

The verification for a general tensor is the same, except that one has some more indices around.
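As a numerical companion to this computation (my own illustration, not from the text): a (1, 1)-tensor transforms by conjugation with the Jacobian J, so its contraction is the matrix trace, which is invariant under conjugation. Any invertible matrix can occur as the Jacobian of a coordinate change at a point, so a sample J suffices.

```python
# Contraction of a (1,1)-tensor is coordinate-independent: T~^k_k = T^i_i.
J = [[2.0, 1.0], [1.0, 1.0]]       # sample Jacobian, det J = 1
Jinv = [[1.0, -1.0], [-1.0, 2.0]]  # its inverse
T = [[5.0, -2.0], [7.0, 3.0]]      # sample components T^i_j, trace = 8

# T~^k_l = (dx~^k/dx^i) T^i_j (dx^j/dx~^l), summed over i and j
Tt = [[sum(J[k][i] * T[i][j] * Jinv[j][l] for i in range(2) for j in range(2))
       for l in range(2)] for k in range(2)]

trace_new = Tt[0][0] + Tt[1][1]
trace_old = T[0][0] + T[1][1]
assert abs(trace_new - trace_old) < 1e-12
```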
1.6.5 Lemma. Denote by T^{r,s}_p M the set of tensors of type (r, s) at p ∈ M.
a) T^{r,s}_p M is a vector space under the operations of addition and scalar multiplication.
b) Let (x^i) be a coordinate system around p. The tensor products

∂/∂x^i ⊗ ∂/∂x^j ⊗ ··· ⊗ dx^k ⊗ dx^l ⊗ ··· (*)

(r factors ∂/∂x^i and s factors dx^k) form a basis for T^{r,s}_p M. If T ∈ T^{r,s}_p M has components (T^{ij···}_{kl···}) relative to (x^i), then

T = T^{ij···}_{kl···} ∂/∂x^i ⊗ ∂/∂x^j ⊗ ··· ⊗ dx^k ⊗ dx^l ⊗ ···. (**)
Proof. a) is clear. For (b), remember that ∂/∂x^i has i-component = 1 and all other components = 0, and similarly for dx^k. The equation (**) is then clear, since both sides have the same components T^{ij···}_{kl···}. This equation shows also that every T ∈ T^{r,s}_p M is uniquely a linear combination of the tensors (*), so that these do indeed form a basis.
1.6.6 Tensors on a Riemannian manifold.
We now assume that M comes equipped with a Riemann metric g, a non–
degenerate, symmetric bilinear form on the tangent spaces Tp M which, relative
to a coordinate system (xi ), is given by

g(v, w) = gij v i wj .

1.6.7 Lemma. For any v ∈ Tp M there is a unique element gv ∈ Tp∗ M so that


gv(w) = g(v, w) for all w ∈ Tp M . The map v → gv is a linear isomorphism
g : Tp M → Tp∗ M .
Proof. In coordinates, the equation λ(w) = g(v, w) says λ_i w^i = g_ij v^i w^j. This holds for all w if and only if λ_j = g_ij v^i and thus determines λ = gv uniquely.

The linear map v → gv is an isomorphism, because its matrix (gij ) is invertible.

In terms of components the map v → g(v) is usually denoted (v i ) → (vi ) and is


given by
vi = gij v j .

This shows that the operation of lowering indices on a vector is independent


of the coordinate system (but dependent on the metric g). The inverse map
Tp∗ M → Tp M, λ → v = g −1 λ, is given by

v i = g ij λj

where (g ij ) is the inverse matrix of (gij ):

g ik gkj = δji .

In components the map λ → g −1 λ is expressed by raising the indices: (λi ) →


(λi ). Since the operation v i → vi is independent of the coordinate system, so is
its inverse operation λi → λi . We use this fact to prove the following.
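A short numerical sketch of index lowering and raising (my own illustration; the metric and the vector are arbitrary sample values): lowering with (g_ij) and then raising with the inverse matrix (g^ij) must return the original components.

```python
# Lowering v^i -> v_i = g_ij v^j and raising back with g^ij is the identity.
g = [[2.0, 1.0], [1.0, 3.0]]                 # sample symmetric non-degenerate metric
det = g[0][0] * g[1][1] - g[0][1] * g[1][0]  # = 5
ginv = [[ g[1][1] / det, -g[0][1] / det],    # inverse matrix (g^ij)
        [-g[1][0] / det,  g[0][0] / det]]

v = [1.0, -2.0]                              # sample vector components v^i
v_low = [sum(g[i][j] * v[j] for j in range(2)) for i in range(2)]     # v_i
v_up = [sum(ginv[i][j] * v_low[j] for j in range(2)) for i in range(2)]
assert all(abs(v_up[i] - v[i]) < 1e-12 for i in range(2))
```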

1.6.8 Lemma. (g ij ) is a (2, 0)-tensor.

Proof. For any two covectors (vi ), (wj ), the scalar

g^{ij} v_i w_j = g^{ij} g_{ia} v^a g_{jb} w^b = δ^j_a g_{jb} v^a w^b = g_{ab} v^a w^b (4)

is independent of the coordinate system. Hence the theorem applies.

1.6.9 Definition. The scalar product of covectors λ = (λi ), µ = (µj ) is defined


by
g −1 (λ, µ) = g ij λi µj

where (g ij ) is the inverse matrix of (gij ).


This means that we transfer the scalar product in Tp M to Tp∗ M by the isomor-
phism Tp M → Tp∗ M, v → gv, i.e. g −1 (λ, µ) = g(g −1 λ, g −1 µ).
The tensors (gij ) and (g ij ) may be used to raise and lower the indices of arbitrary
tensors in a Riemannian space, for example

T_i{}^j = g^{jk} T_{ik},  S_{ij} = g_{ik} S^k{}_j.

The scalar product (S, T ) of any two tensors of the same type is defined by
raising or lowering the appropriate indices and contracting, e.g for T = (Tij )
and S = (Sij ),
(S, T ) = Tij S ij = T ij Sij .

Appendix 1: Tensors in Euclidean 3-space R3


We consider R3 as a Euclidean space, equipped with its standard scalar product
g(v, w) = v·w. This allows one to first of all to “identify“ vectors with covectors.
In Cartesian coordinates (x1 , x2 , x3 ) this identification v ↔ λ is simply given by
λi = v i , since gij = δij . This extends to arbitrary tensors, as a special case of
raising and lowering indices in a Riemannian manifold. There are further special
identifications which are peculiar to three dimensions, concerning alternating
(0, s)–tensors, i.e. tensors whose components change sign when two indices are
exchanged. The alternating condition is automatic for s = 0, 1; for s = 2
it says T_{ij} = −T_{ji}; for s = 3 it means that under a permutation (ijk) of (123) the components change according to the sign ε_{ijk} of the permutation: T_{ijk} = ε_{ijk} T_{123}. If two indices are equal, the component is zero, since it must change sign under the exchange of the equal indices. Thus an alternating (0, 3) tensor T on R3 is of the form T_{ijk} = τ ε_{ijk} where τ = T_{123} is a real number and
the other components are zero. This means that as alternating 3–linear form
T (u, v, w) of three vectors one has

T (u, v, w) = τ det[u, v, w]

where det[u, v, w] is the determinant of the matrix of components of u, v, w with


respect to the standard basis e1 , e2 , e3 . Thus one can “identify” alternating
(0, 3)–tensors T on R3 with scalars τ . This depends only on the “oriented
volume element“ of the basis (e1 , e2 , e3 ), meaning we can replace (e1 , e2 , e3 ) by
(Ae1 , Ae2 , Ae3 ) as long as det A = 1. There are no non–zero alternating (0, s)–
tensors with s > 3 on R3 , since at least two of the s > 3 indices 1, 2, 3 on a
component of such a tensor must be equal. Alternating (0, 2)–tensors on R3 can
be described with the help of the cross–product as follows.
Lemma. In R3 there is a one-to-one correspondence between alternating (0, 2)-
tensors F and vectors v so that F ↔ v if

F (a, b) = v · (a × b) = det[v, a, b]

as alternating bilinear form of two vector variables a, b.


Proof. We note that

det[v, a, b] = F(a, b) means ε_{kij} v^k a^i b^j = F_{ij} a^i b^j. (*)

Given (v^k), (*) can be solved for F_{ij}:

F_{ij} = ε_{kij} v^k.

Conversely, given (F_{ij}), (*) can be solved for v^k:

v^k = (1/2) ε_{kij} det[v, e_i, e_j] = (1/2) ε_{kij} F(e_i, e_j) = (1/2) ε_{kij} F_{ij}.
Here ei = (δij ), the i-th standard basis vector.
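The lemma can be checked numerically. In the sketch below (my own; the vectors are arbitrary samples), ε is the permutation sign, F_ij = ε_kij v^k is built from v, v is recovered as v^k = ½ ε_kij F_ij, and F(a, b) is compared with v · (a × b).

```python
# Correspondence between vectors v and alternating (0,2)-tensors F in R^3.
def eps(i, j, k):
    # sign of the permutation (i, j, k) of (0, 1, 2); 0 if two indices coincide
    return (j - i) * (k - i) * (k - j) // 2

v = [1.0, 2.0, 3.0]                                   # sample vector
F = [[sum(eps(k, i, j) * v[k] for k in range(3))      # F_ij = eps_kij v^k
      for j in range(3)] for i in range(3)]

# recover v:  v^k = (1/2) eps_kij F_ij
v_rec = [0.5 * sum(eps(k, i, j) * F[i][j] for i in range(3) for j in range(3))
         for k in range(3)]
assert all(abs(v_rec[k] - v[k]) < 1e-12 for k in range(3))

# F(a, b) agrees with v . (a x b) for sample vectors a, b
a, b = [1.0, 0.0, 2.0], [0.0, 1.0, -1.0]
axb = [a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0]]
Fab = sum(F[i][j] * a[i] * b[j] for i in range(3) for j in range(3))
assert abs(Fab - sum(v[k] * axb[k] for k in range(3))) < 1e-12
```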

Geometrically one can think of an alternating (0, 2)–tensor F on R3 as a function


which associates to the “plane element“ of two vectors a, b the scalar F (a, b).
The “plane element” of a, b is to be thought of as represented by the cross–
product a×b, which determines the plane and oriented area of the parallelogram
spanned by a and b. We now discuss the physical interpretation of some types
of tensors.
Type (0, 0). A tensor of type (0, 0) is a scalar f (p) depending on p. E. g. the
electric potential of
a stationary electric field.
Type (1, 0). A tensor of type (1, 0) is a (contravariant) vector. E. g. the velocity
vector ṗ(t) of a curve p(t), or the velocity v(p) at p of a fluid in motion, the
electric current J(p) at p.
Type (0, 1). A tensor of type (0, 1) is a covariant vector. E. g. the differential
df = (∂f /∂xi )dxi at p of a scalar function. A force F = (Fi ) acting at a point p
should be considered a covector: for if v = (v i ) is the velocity vector of an object
passing through p, then the scalar F (v) = Fi v i is the work per unit time done
by F on the object, hence independent of the coordinate system. This agrees
with the fact that a static electric field E can be represented as the differential
of a scalar function ϕ, its potential: E = dϕ, i.e. Ei = ∂ϕ/∂xi .
Type (2, 0). In Cartesian coordinates in R3 , the cross-product v × w of two
vectors v = (v i ) and w = (wj ) has components
(v 2 w3 − v 3 w2 , −v 1 w3 + v 3 w1 , v 1 w2 − v 2 w1 ).
On any manifold, if v = (v i ) and w = (wj ) are vectors at a point p then the
quantity T ij = v i wj − v j wi is a tensor of type (2, 0) at p. Note that this tensor
is alternating i.e. satisfies the symmetry condition
T ij = −T ji .
Thus the cross-product v × w of two vectors in R3 can be thought of as an
alternating (2, 0) tensor. (But this is not always appropriate: e.g. the velocity
vector of a point p performing a uniform rotation about the point o is r × w
were r is the direction vector from o to p and w is a vector along the axis of
rotation of length equal to the angular speed. Such a rotation cannot be defined
on a general manifold.)
On any manifold, the alternating (2, 0)-tensor T ij = v i wj −v j wi can be thought
of as specifying the two-dimensional “plane element“ spanned by v = (v i ) and
w = (wj ) at p.
Type (0, 2). A tensor T of type (0, 2) can be thought of as a bilinear function
T (v, w) = Tij v i wj
of two vector variables v = (v^i), w = (w^j) at the same point p. Such a function can be symmetric, i.e.

T(v, w) = T(w, v), i.e. T_{ij} = T_{ji},

or alternating,

T(v, w) = −T(w, v), i.e. T_{ij} = −T_{ji},
but need not be either. For example, on any manifold a Riemann metric

g(v, w) = gij v i wj

is a symmetric (0, 2)-tensor field (which in addition must be nondegenerate:


det(g_ij) ≠ 0). A magnetic field B on R3 should be considered an alternating (0, 2) tensor field in R3: for if J is the current in a conductor moving with velocity V, then the scalar (B × J) · V is the work done by the magnetic field on the conductor, hence must be independent of coordinates. But this scalar

(B × J) · V = B · (J × V) = det[B, J, V] = ε_{kij} B^k J^i V^j

is bilinear and alternating in the variables J, V, hence defines a tensor (B_{ij}) of type (0, 2) given by

B_{ij} = ε_{kij} B^k

in Cartesian coordinates. ε_{kij} = ±1 is the sign of the permutation (kij) of (123). The scalar is

B(J, V) = B_{ij} J^i V^j.

It is clear that in any coordinate system

B(J, V) = −B(V, J), i.e. B_{ij} = −B_{ji},

since this equation holds in the Cartesian coordinates.


As noted above, in R3 , any alternating (0, 2) tensor T can be written as T (a, b) =
v · (a × b) for a unique vector v. Thus T (v, w) depends only on the vector v × w
which characterizes the “2-plane element“ spanned by v and w. This is true
on any manifold: an alternating (0, 2)-tensor T can be thought of as a function
T (a, b) of two vector variables a, b at the same point p which depends only on
the “2-plane element” spanned by a and b.
Type (1, 1). A tensor T of type (1, 1) can be thought of as a linear transformation v → T(v) of the space of vectors at a point p: T transforms a vector v = (v^i) at p into the vector T(v) = (w^j) given by

w^j = T^j_i v^i.

Type (0, 3). The stress tensor S of an elastic medium in R3 associates to the plane element spanned by two vectors a, b at a point p the force acting on this plane element. If this plane element is displaced with velocity v, then the work done per unit time is a scalar which does not depend on the coordinate system. This scalar is of the form

S(a, b, v) = S_{ijk} a^i b^j v^k

if a = (ai ), b = (bj ), v = (v k ). If this formula is to remain true in any coordi-


nate system, S must be a tensor of type (0, 3). Furthermore, since this scalar
depends only on the plane-element spanned by a,b, it must be alternating in
these variables:

S(a, b, v) = −S(b, a, v), i.e. Sijk = −Sjik .

Use the correspondence (b) to write in Cartesian coordinates

S_{ijk} = ε_{ijl} P^l_k.

Then

S(a, b, v) = P^l_k ε_{ijl} a^i b^j v^k = P(a × b) · v.
This formula uses the correspondence (b) and remains true in any coordinate
system related to the Cartesian coordinate system by a transformation with
Jacobian determinant 1.

Appendix 2: Algebraic Definition of Tensors


The tensor product V ⊗ W of two finite dimensional vector spaces (over any field) may be defined as follows. Take a basis {e_j | j = 1, ···, m} for V and a basis {f_k | k = 1, ···, n} for W. Create a vector space, denoted V ⊗ W, with a basis consisting of nm symbols e_j ⊗ f_k. Generally, for x = Σ_j x_j e_j ∈ V and y = Σ_k y_k f_k ∈ W, let

x ⊗ y = Σ_{jk} x_j y_k e_j ⊗ f_k ∈ V ⊗ W.

The space V ⊗ W is independent of the bases {e_j} and {f_k} in the following sense. Any other bases {ẽ_j} and {f̃_k} lead to symbols ẽ_j ⊗̃ f̃_k forming a basis of a space V ⊗̃ W. We may then identify V ⊗̃ W with V ⊗ W by sending the basis ẽ_j ⊗̃ f̃_k of V ⊗̃ W to the vectors ẽ_j ⊗ f̃_k in V ⊗ W, and vice versa.
The space V ⊗ W is called the tensor product of V and W. The vector x ⊗ y ∈ V ⊗ W is called the tensor product of x ∈ V and y ∈ W. (One should keep in mind that an element of V ⊗ W looks like Σ_{jk} ζ_{jk} e_j ⊗ f_k and may not be expressible as a single tensor product x ⊗ y = Σ_{jk} x_j y_k e_j ⊗ f_k.)
The triple tensor products (V1 ⊗ V2 ) ⊗ V3 and V1 ⊗ (V2 ⊗ V3 ) may be identified
in an obvious way and simply denoted V1 ⊗ V2 ⊗ V3 . Similarly one can form the
tensor product V1 ⊗ V2 ⊗ · · · ⊗ VN of any finite number of vector spaces.
This construction applies in particular if one takes for each Vj the tangent space
Tp M or the cotangent space Tp∗ M at a given point p of a manifold M . It should
be clear from lemma 1.6.5 that one obtains in this way the same notion of tensor
at p as in definition 1.6.1.
Another message from Herman Weyl. For edification and moral support
contemplate this message from Hermann Weyl’s Raum-Zeit-Materie (my trans-
lation). Entering into tensor calculus has certainly its conceptual difficulties,

apart from the fear of indices, which has to be overcome. But formally this calcu-
lus is of extreme simplicity, much simpler, for example, than elementary vector
calculus. Two operations: multiplication and contraction, i.e. juxtaposition of
components of tensors with distinct indices and identification of two indices, one
up one down, with implicit summation. It has often been attempted to introduce
an invariant notation into tensor calculus· · · . But then one needs such a mass
of symbols and such an apparatus of rules of calculation (if one does not want
to go back to components after all) that the net effect is very much negative.
One has to protest strongly against these orgies of formalism, with which one
now starts to bother even engineers. – Elsewhere one can find similar sentiments
expressed in similar words with reversed casting of hero and villain.

EXERCISES 1.6
1. Let (T^{ij···}_{kl···}) be the components of a tensor (field) of type (r, s) with respect to the coordinate system (x^i). Show that the quantities ∂T^{ij···}_{kl···}/∂x^m, depending on one more index m, do not transform like the components of a tensor unless (r, s) = (0, 0). [You may take (r, s) = (1, 1) to simplify the notation.]
2. Let (ϕk ) be the components of a tensor (field) of type (0, 1) (= 1-form) with
respect to the coordinate system (xi ). Show that the quantities (∂ϕi /∂xj ) −
(∂ϕj /∂xi ) depending on two indices ij form the components of a tensor of type
(0, 2).
3. Let (X i ), (Y j ) be the components of two tensors (fields) of type (1, 0)
(=vector fields) with respect to the coordinate system (xi ). Show that the
quantities X j (∂Y i /∂xj ) − Y j (∂X i /∂xj ) depending on one index i form the
components of a tensor of type (1, 0).
4. Let f be a C2 function. (a) Do the quantities ∂ 2 f /∂xi ∂xj depending on the
indices ij form the components of a tensor field? [Prove your answer.]
(b) Suppose p is a point for which dfp = 0. Do the quantities (∂ 2 f /∂xi ∂xj )p
depending on the indices ij form the components of a tensor at p? [Prove your
answer.]
5. (a) Let I be a quantity represented by (δij ) relative to every coordinate
system. Is I a (1, 1)-tensor? Prove your answer. [δij =Kronecker delta:= 1 if
i =j and = 0 otherwise.]
(b) Let J be a quantity represented by (δij ) relative to every coordinate system.
Is J a (0, 2)-tensor? Prove your answer. [δij =Kronecker delta := 1 if i =j and
= 0 otherwise.]
6. (a) Let (T^i_j) be a quantity depending on a coordinate system around the point p with the property that for every (1, 1)-tensor (S^i_j) at p the scalar T^i_j S^j_i is independent of the coordinate system. Prove that (T^i_j) represents a (1, 1)-tensor at p.
(b) Let (T^i_j) be a quantity depending on a coordinate system around the point p with the property that for every (0, 1)-tensor (S_i) at p the quantity T^i_j S_i, depending on the index j, represents a (0, 1)-tensor at p. Prove that (T^i_j) represents a (1, 1)-tensor at p.
(c) State a general rule for quantities (T^{ij···}_{kl···}) which contains both of the rules
(a) and (b) as special cases. [You need not prove this general rule: the proof is
the same, just some more indices.]
7. Let p = p(t) be a C2 curve given by xi = xi (t) in a coordinate system (xi ). Is
the “acceleration“ (ẍi (t)) = (d2 xi (t)/dt2 ) a vector at p(t)? Prove your answer.
8. Let (Si ) and (T i ) be the components in a coordinate system (xi ) of C ∞ tensor
fields of type (0, 1) and (1, 0) respectively. Which of the following quantities are
tensor fields? Prove your answer and indicate the type of those which are tensor
fields.
(a) ∂(S_i T^i)/∂x^k  (b) ∂(S_i T^j)/∂x^k  (c) (∂S_i/∂x^j) − (∂S_j/∂x^i)  (d) (∂T^i/∂x^j) − (∂T^j/∂x^i).
9. Prove that the operations on tensors defined in the text do produce tensors
in the following cases.
a) Addition and scalar multiplication. If Tij and Sij are tensors, so are Sij + Tij
and cSij (c any scalar).
b) Symmetry operations. If Tij is a tensor, then the equation Sij = Tji de-
fines another tensor S. Prove that this tensor may also defined by the formula
S(v, w) = T (w, v).
c) Tensor product. If Ti and S j are tensors, so is Ti S j .
10. a) In the equation A^j_i dx^i ⊗ (∂/∂x^j) = Ã^j_i dx̃^i ⊗ (∂/∂x̃^j), transform the left side using the transformation rules for dx^i and ∂/∂x^j to find the transformation rule for A^j_i (thus verifying that the transformation rule is “built into the notation” as asserted in the text).
b) Find the tensor field ydx ⊗ (∂/∂y) − xdy ⊗ (∂/∂x) on R2 in polar coordinates
(r, φ).
11. Let (x1 , x2 ) = (r, θ) be polar coordinates on R2 . Let T be the tensor with
components T11 = tan θ, T12 = 0, T21 = 1 + r, T22 = er in these coordinates.
Find the components T ij when both indices are raised using the Euclidean
metric on R2 .
12. Let V ⊗ W be the tensor product of two finite–dimensional vector spaces as
defined in Appendix 2.
a) Let {e_i}, {f_j} be bases for V and W, respectively. Let {ẽ_r} and {f̃_s} be two other bases and write e_i = a^r_i ẽ_r, f_j = b^s_j f̃_s. Suppose T ∈ V ⊗ W has components T^{ij} with respect to the basis {e_i ⊗ f_j} and T̃^{rs} with respect to {ẽ_r ⊗ f̃_s}. Show that T̃^{rs} = T^{ij} a^r_i b^s_j. Generalize this transformation rule to the tensor product
V ⊗ W ⊗ · · · of any finite number of vector spaces. (You need not prove the
generalization in detail. Just indicate the modification in the argument.)
b) Let M be a manifold, p ∈ M a point. Show that the space of tensors of type (r, s) at p, as defined in 1.6.1, is the same as the tensor product of r copies of Tp M and s copies of Tp∗ M, as defined in Appendix 2.
Chapter 2

Connections and curvature

2.1 Connections

For a scalar-valued, differentiable function f on a manifold M defined in a neighborhood of a point p_o ∈ M we can define the derivative D_v f = df_{p_o}(v) of f along a tangent vector v ∈ Tp M by the formula

D_v f = (d/dt) f(p(t))|_{t=0} = lim_{t→0} (1/t)[f(p(t)) − f(p_o)]

where p(t) is any differentiable curve on M with p(0) = po and ṗ(0) = v. One
would naturally like to define a directional derivative of an arbitrary differen-
tiable tensor field F on M , but this cannot be done in the same way, because
one cannot subtract tensors at different points on M . In fact, on an arbitrary
manifold this cannot be done at all (in a reasonable way) unless one adds an-
other piece of structure to the manifold, called a “covariant derivative“. This
we now do, starting with a covariant derivative of vector fields, which will later
be used to define a covariant derivative of arbitrary tensor fields.
2.1.1 Definition. A covariant derivative on M is an operation which produces, for every “input” consisting of (1) a tangent vector v ∈ Tp M at some point p and (2) a C∞ vector field X defined in a neighbourhood of p, an “output” vector in Tp M, denoted ∇v X. This operation is subject to the following axioms:
CD1. ∇(u+v) X = (∇u X)+ (∇v X)
CD2. ∇av X = a∇v X
CD3. ∇v (X+ Y ) = (∇v X)+ (∇v Y )
CD4. ∇v (f X) = (Dv f )X+ f (p)(∇v X)
for all u,v ∈ Tp M , all C ∞ vector fields X,Y defined around p, all C ∞ functions
f defined in a neighbourhood of p, and all scalars a∈ R.
If X,Y are two vector fields, we define another vector field ∇X Y by (∇X Y )p =
∇Xp Y for all points p where both X and Y are defined. As final axiom we
require:


CD5. If X and Y are C ∞ , so is ∇X Y

2.1.2 Example. Let M = R^n. A vector field X on R^n can be viewed as an n-tuple of functions, its components with respect to the standard Cartesian coordinates on R^n: X = X^1 ∂/∂x^1 + ··· + X^n ∂/∂x^n. Take for ∇_v X the componentwise directional derivative D_v X, i.e.

D_v X = (D_v X^1) ∂/∂x^1 + ··· + (D_v X^n) ∂/∂x^n. (1)
This means that

D_v X(p_o) = lim_{t→0} (1/t)[X(p(t)) − X(p_o)] (2)

where p(t) is any differentiable curve on M with p(0) = p_o and ṗ(0) = v. This defines a covariant derivative on R^n.
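A finite-difference sketch of this flat covariant derivative (my own illustration; the field X, the function f, the point p_o and the vector v are sample choices). It computes D_v along the curve p(t) = p_o + tv as in (2) and checks the Leibniz rule CD4 numerically.

```python
# Componentwise directional derivative on R^2 and a numerical check of
# CD4:  D_v(f X) = (D_v f) X(p_o) + f(p_o) D_v X.
def X(p):                      # sample vector field on R^2
    x, y = p
    return [x * y, x - y ** 2]

def f(p):                      # sample scalar function
    x, y = p
    return x ** 2 + 3 * y

def directional(F, p0, v, h=1e-6):
    # central-difference approximation of d/dt F(p0 + t v) at t = 0
    up = F([p0[i] + h * v[i] for i in range(2)])
    dn = F([p0[i] - h * v[i] for i in range(2)])
    if isinstance(up, list):
        return [(up[i] - dn[i]) / (2 * h) for i in range(2)]
    return (up - dn) / (2 * h)

p0, v = [1.0, 2.0], [0.5, -1.0]
lhs = directional(lambda p: [f(p) * X(p)[0], f(p) * X(p)[1]], p0, v)
Dvf = directional(f, p0, v)
DvX = directional(X, p0, v)
rhs = [Dvf * X(p0)[i] + f(p0) * DvX[i] for i in range(2)]
assert all(abs(lhs[i] - rhs[i]) < 1e-4 for i in range(2))
```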

Now let S be a submanifold of the Euclidean space E = Rn with its positive
definite inner product. For any p ∈ S the tangent space Tp S is a subspace of
Tp E = Rn . A vector field X on S is an E-valued function on S whose value Xp
at p ∈ S lies in the subspace Tp S of E. Define a covariant derivative on S by
the formula

∇Sv X = component of Dv X tangential to S,   (4)

where D is the covariant derivative on E defined in example 2.1.2. This definition
means that

Dv X = ∇Sv X + a vector orthogonal to Tp S.   (5)

Note that for v ∈ Tp S the covariant derivative Dv X may be defined by equation
(2), even though X is only defined on S: one chooses for p(t) a curve which
lies in S. In fact, to define ∇v X at po it evidently suffices to know X along
a curve p(t) with p(0) = po and ṗ(0) = v. (We shall see later that this is true
for any covariant derivative on a manifold.) ∇S is a covariant derivative on
the submanifold S of Rn , called the covariant derivative induced by the inner
product on Rn . (The inner product is needed for the orthogonal splitting.)

2.1.3 Example. Let S be the sphere S = {p ∈ R3 | kpk = 1} in E = R3 . Let
x, y, z be the Cartesian coordinates on R3 . Define coordinates (θ, φ) on S by

x = cos θ sin φ,   y = sin θ sin φ,   z = cos φ.

As vectors on R3 tangential to S the coordinate vector fields are

∂/∂φ = cos θ cos φ ∂/∂x + sin θ cos φ ∂/∂y − sin φ ∂/∂z
∂/∂θ = − sin θ sin φ ∂/∂x + cos θ sin φ ∂/∂y.
2.1. CONNECTIONS 89

Dv X is the componentwise derivative in the coordinate system x, y, z:

if X = X x ∂/∂x + X y ∂/∂y + X z ∂/∂z
then Dv X = (Dv X x ) ∂/∂x + (Dv X y ) ∂/∂y + (Dv X z ) ∂/∂z

where Dv f = df (v) is the directional derivative of a function f as above. For
example, if we take X = ∂/∂φ as well as v = ∂/∂φ and write D/∂φ for Dv when
v = ∂/∂φ, then we find

(D/∂φ)(∂/∂φ) = − sin φ cos θ ∂/∂x − sin φ sin θ ∂/∂y − cos φ ∂/∂z
             = −x ∂/∂x − y ∂/∂y − z ∂/∂z.

This vector is obviously orthogonal to Tp S = {a ∂/∂x + b ∂/∂y + c ∂/∂z |
ax + by + cz = 0}. Therefore its tangential component is zero: ∇S_{∂/∂φ} (∂/∂φ) ≡ 0.
Note that we can calculate DY X even if X is only defined on S 2 , provided Y is
tangential to S 2 , e.g. in the calculation of (D/∂φ)(∂/∂φ) above.

From now on we assume that M has a covariant derivative ∇. Let x1 , · · · , xn
be a coordinate system on M . To simplify notation, write ∂j = ∂/∂xj , and
∇j X = ∇∂j X for the covariant derivative of X along ∂j . On the coordinate
domain, vector fields X, Y can be written as X = Σj X j ∂j and Y = Σj Y j ∂j .
Calculate:

∇Y X = Σij ∇Y j ∂j (X i ∂i )   [by CD1 and CD3]
     = Σij [ Y j (∂j X i )∂i + Y j X i (∇j ∂i ) ]   [by CD2 and CD4]

Since ∇j ∂i is itself a C ∞ vector field [CD5], it can be written as

∇j ∂i = Σk Γkij ∂k   (6)

for certain C ∞ functions Γkij on the coordinate domain. Omitting the summation
signs we get

∇Y X = (Y j ∂j X k + Γkij Y j X i )∂k .   (7)

One sees from this equation that one can calculate any covariant derivative as
soon as one knows the Γkij . In spite of their appearance, the Γkij are not the
components of a tensor field on M . The reason can be traced to the fact that
to calculate ∇Y X at a point p ∈ M it is not sufficient to know the vector Xp
at p.

As just remarked, the value of ∇Y X at p depends not only on the value of X
at p; it depends also on the partial derivatives at p of the components of X
relative to a coordinate system. On the other hand, it is not necessary to know
all of these partial derivatives to calculate ∇v X for a given v ∈ Tp M . In fact it
suffices to know the directional derivatives of these components along the vector
v only. To see this, let p(t) be a differentiable curve on M . Using equation (7)
with Y replaced by the tangent vector ṗ(t) of p(t) one gets

∇ṗ X = ( (dxj /dt) ∂j X k + Γkij (dxj /dt) X i ) ∂k
     = ( dX k /dt + Γkij (dxj /dt) X i ) ∂k .

For a given value of t, this expression depends only on the components X i of
X at p = p(t) and their derivatives dX k /dt along the tangent vector v = ṗ(t) of
the curve p(t). If one omits even the curve p = p(t) from the notation the last
equation reads

∇X = (dX k + ωik X i ) ∂k ,   ωik = Γkij dxj .   (8)

This equation evidently gives the covariant derivative ∇v X as a linear function
of a tangent vector v with values in the tangent space. Its components dX k +
ωik X i are linear functions of v, i.e. differential forms. The differential forms
ωik = Γkij dxj are called the connection forms with respect to the coordinate
frame {∂k } and the Γkij the connection coefficients.
2.1.4 Example. Let S be an m-dimensional submanifold of the Euclidean
space E with the induced covariant derivative ∇ = ∇S . Let X be a vector field
on S and p = p(t) a curve on S. Then X may also be considered as a vector field
on E defined along S, hence as a function on S with values in E = Tp E. By
definition, the covariant derivative DX/dt in E is the derivative dX/dt of this
vector-valued function along p = p(t), and the covariant derivative ∇X/dt on S
is its tangential component. Thus

DX/dt = ∇X/dt + a vector orthogonal to S.

In terms of a coordinate system (u1 , · · · , um ) on S, write X = X i ∂/∂ui and then

DX/dt = ( dX k /dt + Γkij (duj /dt) X i ) ∂/∂uk + a vector orthogonal to S.

In particular take X to be the ith coordinate vector field ∂/∂ui on S and p = p(t)
the jth coordinate line through a point with tangent vector ṗ = ∂/∂uj . Then
X = ∂/∂ui is the ith partial ∂p/∂ui of the E–valued function p = p(u1 , · · · , um )
and its covariant derivative DX/dt in E is

∂ 2 p/∂uj ∂ui = ∇∂/∂uj (∂/∂ui ) + a vector orthogonal to S
             = Σk Γkij ∂/∂uk + a vector orthogonal to S.
Take the scalar product of both sides with ∂p/∂ul in E to find that

(∂ 2 p/∂uj ∂ui ) · (∂p/∂ul ) = Σk Γkij (∂p/∂uk ) · (∂p/∂ul )

since ∂p/∂ul is tangential to S. This may be written as

(∂ 2 p/∂uj ∂ui ) · (∂p/∂ul ) = Σk Γkij gkl

where gkl is the coefficient of the induced Riemann metric ds2 = gij dui duj on
S. Solve for Γkij to find that

Γkij = Σl g lk (∂ 2 p/∂ui ∂uj ) · (∂p/∂ul ),

where g kl glj = δjk .


As a specific case, take S to be the sphere x2 + y 2 + z 2 = 1 in E = R3 with
coordinates p = p(θ, φ) defined by x = cos θ sin φ, y = sin θ sin φ, z = cos φ. As
vector fields in E along S the coordinate vector fields are

∂p/∂θ = − sin θ sin φ ∂/∂x + cos θ sin φ ∂/∂y
∂p/∂φ = cos θ cos φ ∂/∂x + sin θ cos φ ∂/∂y − sin φ ∂/∂z.

The componentwise derivatives in E are

∂ 2 p/∂θ2 = − cos θ sin φ ∂/∂x − sin θ sin φ ∂/∂y
∂ 2 p/∂φ2 = − cos θ sin φ ∂/∂x − sin θ sin φ ∂/∂y − cos φ ∂/∂z
∂ 2 p/∂θ∂φ = − sin θ cos φ ∂/∂x + cos θ cos φ ∂/∂y.

The only nonzero scalar products (∂ 2 p/∂uj ∂ui ) · (∂p/∂ul ) are

(∂ 2 p/∂θ2 ) · (∂p/∂φ) = − sin φ cos φ,   (∂ 2 p/∂θ∂φ) · (∂p/∂θ) = sin φ cos φ

and the Riemann metric is ds2 = sin2 φ dθ2 + dφ2 . Since the matrix gij is
diagonal the equations for the Γs become Γkij = (1/gkk ) (∂ 2 p/∂ui ∂uj ) · (∂p/∂uk )
(no sum over k) and the only nonzero Γs are

Γφθθ = − sin φ cos φ,   Γθφθ = Γθθφ = (sin φ cos φ)/ sin2 φ = cos φ/ sin φ.
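These Γs can be checked by machine. The following sympy sketch computes Γkij = g lk (∂ 2 p/∂ui ∂uj ) · (∂p/∂ul ) directly from the spherical embedding used above (index 0 = θ, index 1 = φ):

```python
import sympy as sp

theta, phi = sp.symbols('theta phi')
u = [theta, phi]
# Embedding p(theta, phi) of the unit sphere in R^3, as in the text.
p = sp.Matrix([sp.cos(theta) * sp.sin(phi),
               sp.sin(theta) * sp.sin(phi),
               sp.cos(phi)])

dp = [p.diff(w) for w in u]                          # coordinate fields dp/du^i
g = sp.simplify(sp.Matrix(2, 2, lambda i, j: dp[i].dot(dp[j])))  # metric g_ij
ginv = g.inv()                                       # inverse metric g^kl

def Gamma(k, i, j):
    # Gamma^k_{ij} = g^{lk} (d^2 p / du^i du^j) . (dp/du^l)
    return sp.simplify(sum(ginv[l, k] * p.diff(u[i], u[j]).dot(dp[l])
                           for l in range(2)))

# Gamma^phi_{theta theta} should be -sin(phi)cos(phi), and
# Gamma^theta_{theta phi} should be cos(phi)/sin(phi):
print(sp.simplify(Gamma(1, 0, 0) + sp.sin(phi) * sp.cos(phi)))   # 0
print(sp.simplify(Gamma(0, 0, 1) - sp.cos(phi) / sp.sin(phi)))   # 0
```

The same script works verbatim for any other embedded surface by changing the matrix p, e.g. the torus of exercise 6 below.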
2.1.5 Definition. Let p(t) be a curve in M . A vector field X(t) along p(t)
is a rule which associates to each value of t for which p(t) is defined a vector
X(t) ∈ Tp(t) M at p(t).
If the curve p(t) intersects itself, i.e. if p(t1 ) = p(t2 ) for some t1 ≠ t2 , a vector
field X(t) along p(t) will generally attach two different vectors X(t1 ) and X(t2 )
to the point p(t1 ) = p(t2 ).
Relative to a coordinate system we can write

X(t) = Σi X i (t) (∂/∂xi )p(t)   (9)

at all points p(t) in the coordinate domain. When the curve is C ∞ we say
X is C ∞ if the X i are C ∞ functions of t for all coordinate systems.
If p(t) is a C ∞ curve and X(t) a C ∞ vector field along p(t), we define

∇X/dt = ∇ṗ X.

The right side makes sense in view of (8) and defines another vector field along
p(t). In coordinates,

∇X/dt = Σk ( dX k /dt + Σij Γkij (dxj /dt) X i ) ∂k .   (10)

2.1.6 Lemma (Chain Rule). Let X(t) be a vector field along a curve p(t). Let
X̃(t̃) and p̃(t̃) be obtained from X(t), p(t) by a change of parameter t = f (t̃),
i.e. p̃(t̃) = p(f (t̃)), X̃(t̃) = X(f (t̃)). Then

∇X̃/dt̃ = (dt/dt̃) ∇X/dt.

Proof. A change of parameter t = f (t̃) in a curve p(t) is understood to be a
diffeomorphism between the intervals of definition. Now calculate:

∇X̃/dt̃ = Σk ( dX̃ k /dt̃ + Σij Γkij (dxj /dt̃) X̃ i ) ∂/∂xk
       = Σk ( (dX k /dt)(dt/dt̃) + Σij Γkij (dxj /dt)(dt/dt̃) X i ) ∂/∂xk
       = (dt/dt̃) ∇X/dt.
2.1.7 Theorem. Let p(t) be a C ∞ curve on M , po = p(to ) a point on p(t).
Given any vector v ∈ Tpo M at po there is a unique C ∞ vector field X along
p(t) so that
∇X/dt ≡ 0,   X(to ) = v.

Proof. In coordinates, the conditions on X are

dX k /dt + Σij Γkij (dxj /dt) X i = 0,   X k (to ) = v k .

From the theory of differential equations one knows that these equations do
indeed have a unique solution X 1 , · · · , X n where the X k are C ∞ functions of
t. This proves the theorem when the curve p(t) lies in a single coordinate
domain. Otherwise one has to cover the curve with several overlapping coordinate
domains and apply this argument successively within each coordinate domain,
taking as initial vector X(tj ) for the j-th coordinate domain the vector at the
point p(tj ) obtained from the previous coordinate domain.
Remark. Actually the existence theorem for systems of differential equations
only guarantees the existence of a solution X k (t) for t in some interval around
to . This should be kept in mind, but we shall not bother to state it explicitly.
This caveat applies whenever we deal with solutions of differential equations.
2.1.8 Supplement to the theorem. The vector field X along a curve C : p =
p(t) satisfying
∇X/dt ≡ 0,   X(to ) = v

is independent of the parametrization in the following sense.
Let p̃(t̃) be a curve obtained from p(t) by a change of parameter t = f (t̃), i.e.
p(t) = p̃(t̃) if t = f (t̃). For po = p(to ) = p̃(t̃o ) and v ∈ Tpo M , let X and X̃
be the vector fields along p(t) and p̃(t̃) satisfying

∇X/dt ≡ 0,   ∇X̃/dt̃ ≡ 0,
X(to ) = v,   X̃(t̃o ) = v.

Then
X(t) = X̃(t̃) if t = f (t̃).

Proof. Consider X(t) as a function of t̃ by substituting t = f (t̃). As a function
of t̃, X is then a vector field along p̃(t̃). It suffices to show that ∇X/dt̃ ≡ 0,
which is clear from the Chain Rule above.
2.1.9 Definitions. Let C : p = p(t) be a C ∞ curve on M .
(a) A C ∞ vector field X(t) along C satisfying ∇X/dt ≡ 0 is said to be
parallel along p(t).
(b) Let po = p(to ) and p1 = p(t1 ) be two points on C. For each vector
vo ∈ Tpo M define a vector v1 ∈ Tp1 M by v1 = X(t1 ) where X is the unique
vector field along p(t) satisfying ∇X/dt ≡ 0, X(to ) = vo . The vector v1 ∈ Tp1 M
is called the parallel transport of vo ∈ Tpo M from po to p1 along p(t). It is
denoted

v1 = T (to → t1 )vo .   (12)



(c) The map T (to → t1 ) : Tpo M → Tp1 M is called parallel transport from
po to p1 along p(t).
It is important to remember that T (to → t1 ) depends in general on the curve
p(t) from po to p1 , not just on the endpoints po and p1 . To bring this out we
may write C : p = p(t) for the curve and TC (to → t1 ) for the parallel transport
along C.
2.1.10 Theorem. Let p(t) be a C ∞ curve on M , po = p(to ) and p1 = p(t1 )
two points on p(t). The parallel transport along p(t) from po to p1 ,

T (to → t1 ) : Tpo M → Tp1 M,

is an invertible linear transformation.
Proof. The linearity of T (to → t1 ) comes from the linearity of the covariant
derivative along p(t):

∇(X + Y )/dt = (∇X/dt) + (∇Y /dt),   ∇(aX)/dt = a(∇X/dt)   (a ∈ R).

The map T (to → t1 ) is invertible: its inverse is in fact T (t1 → to ), as follows
from the definitions.
A covariant derivative on a manifold is also called an affine connection, because
it leads to a notion of parallel transport along curves which “connects” tangent
vectors at different points, a bit like in the “affine space” Rn , where one may
transport tangent vectors from one point to another by parallel translation
(see the example below). The fundamental difference is that parallel transport
for a general covariant derivative depends on a curve connecting the points,
while in an affine space it depends on the points only. While the terms “covariant
derivative” and “connection” are logically equivalent, they carry different
connotations, which can be gleaned from the words themselves.
2.1.11 Example 2.1.2 (continued). Let M = Rn with the covariant derivative
defined earlier. We identify tangent vectors to M at any point with vectors in
Rn . Then parallel transport along any curve keeps the components of vectors
with respect to the standard coordinates constant, i.e. T (to → t1 ) : Tpo M →
Tp1 M is the identity mapping if we identify both Tpo M and Tp1 M with Rn as
just described.
2.1.12 Example 2.1.3 (continued). Let E = R3 with its positive definite
inner product, S = {p ∈ E | kpk = 1} the unit sphere about the origin. S is a
submanifold of E and for any p ∈ S, Tp S is identified with the subspace {v ∈ E
| v · p = 0} of E orthogonal to p. S has a covariant derivative ∇S induced by
the covariant derivative ∇E in E: for v ∈ Tp S and X a vector field on S,

∇Sv X = component of ∇E v X tangential to S
      = ∇E v X − ( p · (∇E v X) ) p.

If X(t) is a vector field along a curve p(t) on S, then its covariant derivative on
S is
∇X/dt = DX/dt − ( p(t) · (dX/dt) ) p(t)

where DX/dt is the covariant derivative on R3 , i.e. the componentwise derivative.
We shall explicitly determine the parallel transport along great circles on
S. Let po ∈ S be a point of S, e ∈ Tpo S a unit vector. Thus

e · po = 0,   e · e = 1.

The great circle through po in the direction e is the curve p(t) on S given by

p(t) = (cos t)po + (sin t)e.

A vector field X(t) along p(t) is parallel if

dX/dt − ( p(t) · (dX/dt) ) p(t) ≡ 0.

Take in particular for X the vector field E(t) = ṗ(t) along p(t). This means
that
E(t) = (− sin t)po + (cos t)e.

Then DE/dt = p̈(t) = −p(t), which is orthogonal to Tp(t) S, and it follows that

∇E/dt ≡ 0.

Let f ∈ Tpo S be one of the two unit vectors orthogonal to e:

f · po = 0,   f · e = 0,   f · f = 1.
Then f · p(t) = 0 for any t, so f ∈ Tp(t) S for any t. Define a second vector field
F along p(t) by setting
F (t) = f
for all t. Then again
∇F/dt ≡ 0.
Thus the two vector fields E, F are parallel along the great circle p(t) and form
an orthonormal basis of the tangent space to S at any point of p(t).
Any tangent vector v ∈ Tpo S can be uniquely written as

v = ae + bf

with a, b ∈ R. Let X(t) be the vector field along p(t) defined by

X(t) = aE(t) + bF (t)

for all t. Then ∇X/dt ≡ 0. Thus the parallel transport along p(t) is given by

T (0 → t) : Tpo S → Tp(t) S,   ae + bf → aE(t) + bF (t).

In other words,
parallel transport along p(t) leaves the components of vectors with respect to the
basis E, F unchanged.
This may also be described geometrically like this:
w = T (0 → t)v has the same length as v and makes the same angle with the
tangent vector ṗ(t) at p(t) as v makes with the tangent vector ṗ(0) at p(0).
(This follows from the fact that E, F are orthonormal and E(t) = ṗ(t).)

Fig. 1. Parallel transport along a great circle on S 2
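Parallel transport can also be computed numerically from the equations of theorem 2.1.7. The sketch below transports a vector once around the latitude circle φ = φ0 (a curve not treated above; the great-circle case corresponds to φ0 = π/2), using the sphere's connection coefficients Γφθθ = − sin φ cos φ and Γθθφ = cos φ/ sin φ from example 2.1.4. It illustrates concretely that T depends on the curve: going once around this closed curve rotates the vector by the angle 2π cos φ0.

```python
import numpy as np

# Along the curve theta = t, phi = phi0 (0 <= t <= 2*pi) the equations
# dX^k/dt + Gamma^k_{ij} (dx^j/dt) X^i = 0 reduce to the linear system
#   dX^theta/dt = -(cos(phi0)/sin(phi0)) X^phi
#   dX^phi/dt   =  sin(phi0) cos(phi0)  X^theta
phi0 = np.pi / 3
A = np.array([[0.0, -np.cos(phi0) / np.sin(phi0)],
              [np.sin(phi0) * np.cos(phi0), 0.0]])

def transport(X, steps=20000):
    """Classical fourth-order Runge-Kutta for dX/dt = A X, t from 0 to 2*pi."""
    h = 2 * np.pi / steps
    for _ in range(steps):
        k1 = A @ X
        k2 = A @ (X + h / 2 * k1)
        k3 = A @ (X + h / 2 * k2)
        k4 = A @ (X + h * k3)
        X = X + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return X

X1 = transport(np.array([1.0, 0.0]))   # start with X(0) = d/dtheta
print(X1)
# With respect to the orthonormal frame (1/sin(phi0)) d/dtheta, d/dphi the
# transport is a rotation by 2*pi*cos(phi0); for phi0 = pi/3 that angle is pi,
# so the vector comes back reversed: X1 is approximately (-1, 0).
```

The same two endpoints joined by a great-circle arc would give a different transport, which is the point of the warning after definition 2.1.9.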

2.1.13 Theorem. Let p = p(t) be a C ∞ curve in M , po = p(to ) a point on p(t),
v1 , · · · , vn a basis for the tangent space Tpo M at po . There are unique parallel
vector fields E1 (t), · · · , En (t) along p(t) so that

E1 (to ) = v1 , · · · , En (to ) = vn .

For any value of t, the vectors E1 (t), · · · , En (t) form a basis for Tp(t) M .
Proof. This follows from theorems 2.1.7 and 2.1.10, since an invertible linear
transformation maps a basis to a basis.
2.1.14 Definition. Let p = p(t) be a C ∞ curve in M . A parallel frame along
p(t) is an n–tuple of parallel vector fields E1 (t), · · · , En (t) along p(t) which form
a basis for Tp(t) M for any value of t.
2.1.15 Theorem. Let p = p(t) be a C ∞ curve in M , E1 (t), · · · , En (t) a parallel
frame along p(t).
(a) Let X(t) = ξ i (t) Ei (t) be a C ∞ vector field along p(t). Then

∇X/dt = (dξ i /dt) Ei (t).

(b) Let v = a i Ei (to ) ∈ Tp(to ) M be a vector at p(to ). Then for any value of t,

T (to → t)v = a i Ei (t).

Proof. (a) Since ∇Ei /dt = 0 by hypothesis, we find

∇X/dt = ∇(ξ i Ei )/dt = (dξ i /dt) Ei + ξ i (∇Ei /dt) = (dξ i /dt) Ei + 0

as required.
(b) Applying T (to → t) to v = ai Ei (to ) we get, by linearity,

T (to → t)v = ai T (to → t)Ei (to ) = ai Ei (t)

as required.
Remarks. This theorem says that with respect to a parallel frame along p(t),
(a) the covariant derivative along p(t) equals the componentwise derivative,
(b) the parallel transport along p(t) equals the transport by constant components.
We now turn to covariant differentiation of arbitrary tensor fields.
2.1.16 Theorem. There is a unique operation which produces for every “input”
consisting of (1) a tangent vector v ∈ Tp M at some point p and (2) a C ∞
tensor field T defined in a neighbourhood of p a tensor at p of the same type,
denoted ∇v T , subject to the following conditions.
0. If X is a vector field, then ∇v X is its covariant derivative with respect to the
given covariant-derivative operation on M .
1. ∇(u+v) T = (∇u T ) + (∇v T )
2. ∇av T = a(∇v T )
3. ∇v (T + S) = (∇v T ) + (∇v S)
4. ∇v (S · T ) = (∇v S) · T (p) + S(p) · (∇v T )
for all p ∈ M , all u, v ∈ Tp M , all a ∈ R, and all tensor fields S, T defined in a
neighbourhood of p. The products of tensors like S · T are tensor products S ⊗ T
contracted with respect to any collection of indices (possibly not contracted at
all).
Remark. If X is a vector field and T a tensor field, then ∇X T is a tensor field
of the same type as T . If X = X k ∂k , then ∇X T = X k ∇k T , by rules (1.) and
(2.). So one only needs to know the components (∇k T )(i)(j) of ∇k T . The explicit
formula is easily worked out, but will not be needed.
We shall indicate the proof after looking at some examples.
2.1.17 Examples.
(a) Covariant derivative of a scalar function. Let f be a C ∞ scalar function on
M . For every vector field X we have

∇v (f X) = (∇v f )X(p) + f (p)(∇v X)

by (4). On the other hand, since ∇v (f X) is the given covariant derivative of
vector fields, by (0.) we have

∇v (f X) = (Dv f )X(p) + f (p)(∇v X),

where
Dv f = dfp (v),

by the axiom CD4. Thus
∇v f = Dv f.

(b) Covariant derivative of a covector field. Let F = Fi dxi be a C ∞ covector
field (differential 1-form) and let X = X i ∂i be any vector field. Take F · X =
Fi X i . By 4.,
∇j (F · X) = (∇j F ) · X + F · (∇j X).
In terms of components:

∇j (Fi X i ) = (∇j F )i X i + Fi (∇j X)i .   (1)

By part (a), the left side is:

∂j (Fi X i ) = (∂j Fi )X i + Fi (∂j X i ).   (2)

Substitute the right side of (1) for the left side of (2):

(∇j F )i X i + Fi (∇j X)i = (∂j Fi )X i + Fi (∂j X i ).   (3)

Rearrange terms in (3):

[(∇j F )i − ∂j Fi ]X i = Fi [−(∇j X)i + ∂j X i ].   (4)

Since
∇j X = (∂j X i )∂i + X i (∇j ∂i ) = (∂j X k )∂k + X i Γkij ∂k
we get
(∇j X)k = ∂j X k + X i Γkij .   (5)

Substitute (5) into (4) after changing the index i to k on the right side of (4):

[(∇j F )i − ∂j Fi ]X i = Fk [−(∇j X)k + ∂j X k ] = Fk [−X i Γkij ].

Since X is arbitrary, we may compare coefficients of X i :

(∇j F )i − ∂j Fi = −Fk Γkij .

We arrive at the formula

(∇j F )i = ∂j Fi − Fk Γkij ,

which is also written as Fi,j = ∂j Fi − Fk Γkij .


In particular, for the coordinate differential F = dxa , i.e. Fi = δia , we find

∇j (dxa ) = −Γaij dxi ,

which should be compared with

∇j (∂/∂xa ) = Γkaj ∂/∂xk .

The last two formulas together with the product rule 4. evidently suffice to
compute ∇v T for any tensor field T = T a···b··· dxb ⊗ ∂/∂xa ⊗ · · · . This proves
the uniqueness part of the theorem. To prove the existence part one has to show
that if one defines ∇v T in this unique way, then 1.–4. hold, but we shall omit
this verification.
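The way the Γ-terms of the covector formula cancel against those of (∇j X)k can be verified mechanically. The sketch below checks the product rule 4. for F · X = Fi X i on the sphere's coordinate domain, using the Γs of example 2.1.4; the particular fields F and X are arbitrary illustrative choices:

```python
import sympy as sp

theta, phi = sp.symbols('theta phi')
x = [theta, phi]
n = 2
# Connection coefficients of the sphere from example 2.1.4: G[k][i][j] = Gamma^k_{ij}.
G = [[[sp.Integer(0)] * n for _ in range(n)] for _ in range(n)]
G[1][0][0] = -sp.sin(phi) * sp.cos(phi)               # Gamma^phi_{theta theta}
G[0][0][1] = G[0][1][0] = sp.cos(phi) / sp.sin(phi)   # Gamma^theta_{theta phi}

# Arbitrary illustrative fields on the coordinate domain.
F = [sp.sin(theta) * sp.cos(phi), theta * phi]        # covector components F_i
X = [phi, sp.cos(theta)]                              # vector components X^i

def cov_vec(j, X):    # (nabla_j X)^k = d_j X^k + Gamma^k_{ij} X^i, eq. (5)
    return [sp.diff(X[k], x[j]) + sum(G[k][i][j] * X[i] for i in range(n))
            for k in range(n)]

def cov_covec(j, F):  # (nabla_j F)_i = d_j F_i - F_k Gamma^k_{ij}
    return [sp.diff(F[i], x[j]) - sum(F[k] * G[k][i][j] for k in range(n))
            for i in range(n)]

# Product rule 4.: d_j (F_i X^i) = (nabla_j F)_i X^i + F_i (nabla_j X)^i.
for j in range(n):
    lhs = sp.diff(sum(F[i] * X[i] for i in range(n)), x[j])
    rhs = sum(cov_covec(j, F)[i] * X[i] + F[i] * cov_vec(j, X)[i]
              for i in range(n))
    assert sp.simplify(lhs - rhs) == 0
print("product rule verified")
```

Since F · X is a scalar, the left side needs no Γs at all; the cancellation of the Γ-terms on the right is exactly what forces the minus sign in the covector formula.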

EXERCISES 2.1
1. (a) Verify that the covariant derivatives defined in example 2.1.2 above do
indeed satisfy the axioms CD1 - CD5.
(b) Do the same for example 2.1.3.
2. Supply all details in the proof of Theorem 2.1.10: Let T = TC (to → t1 ) :
Tp(to ) M → Tp(t1 ) M . Prove:
(a) T is linear, i.e.

T (u + v) = T (u) + T (v), T (av) = aT (v)

for all u, v ∈ Tp(to ) M, a ∈ R.


(b) The map TC (to → t1 ) is invertible with inverse TC (t1 → to ).
3. Verify the assertions concerning parallel transport in Rn in example 2.1.11.
4. Prove theorem 2.1.16.
5. Let M be the 3-dimensional vector space R3 with an indefinite inner product

(v, v) = −x2 − y 2 + z 2 if v = xe1 + ye2 + ze3 .

and corresponding metric


−dx2 − dy 2 + dz 2 .
Let S be the pseudo-sphere

S = {p ∈ R3 | − x2 − y 2 + z 2 = 1}.

(a) Let p ∈ S. Show that every vector v ∈ R3 can be uniquely written as a sum
of a vector tangential to S at p and a vector orthogonal to S at p:

v = a + b,   a ∈ Tp S,   (b, w) = 0 for all w ∈ Tp S.

[Suggestion: Let e, f be an orthonormal basis for Tp S:

(e, e) = −1,   (f, f ) = −1,   (e, f ) = 0.

Then any a ∈ Tp S can be written as a = αe + βf with α, β ∈ R. Try to
determine α, β so that b = v − a is orthogonal to e, f .]
(b) Define a covariant derivative on S as in the positive definite case:

∇Sv X = component of ∇v X tangential to Tp S.

Let po be a point of S, e ∈ Tpo S a unit vector (i.e. (e, e) = −1). Let p(t) be the
curve on S defined by

p(t) = (cosh t)po + (sinh t)e.


100 CHAPTER 2. CONNECTIONS AND CURVATURE

Find two vector fields E(t), F (t) along p(t) so that parallel transport along p(t)
is given by

T (0 → t): Tpo S → Tp(t) S, aE(0) + bF (0) → aE(t) + bF (t).

6. Let S be the surface (torus) in Euclidean 3–space R3 with the equation

(√(x2 + y 2 ) − a)2 + z 2 = b2 ,   a > b > 0.

Let (θ, φ) be the coordinates on S defined by

x = (a + b cos φ) cos θ,   y = (a + b cos φ) sin θ,   z = b sin φ.

Let C: p = p(t) be the curve with parametric equations φ = t, θ = 0 (constant).


a) Show that the vector fields E = k∂p/∂φk−1 ∂p/∂φ and F = k∂p/∂θk−1 ∂p/∂θ
form a parallel orthonormal frame along C.
b) Consider the parallel transport along C from po = (a+b, 0, 0) to p1 = (a, 0, b).
Find the parallel transport v1 = (α1 , β1 , γ1 ) of a vector vo = (αo , βo , γo ) ∈ R3
tangent to S at po .
7. Let (θ, r) be polar coordinates on R2 :

x = r cos θ, y = r sin θ.

Calculate all Γkij for the covariant derivative on R2 defined in example 2.1.2.
(Use r, θ as indices instead of 1, 2.)
8. Let (ρ, θ, φ) be spherical coordinates on R3 :

x = ρ sin φ cos θ, y = ρ sin φ sin θ, z = ρ cos φ.

Calculate Γkφρ , k = ρ, φ, θ, for the covariant derivative on R3 defined in example


2.1.2. (Use ρ, φ, θ as indices instead of 1, 2, 3.)
9. Define coordinates (u, v, z) on R3 by the formula

1 2
x= (u − v 2 ), y = uv, z = z.
2
Calculate Γkuz , k = u, v, z, for the covariant derivative on R3 defined in example
2.1.2. (Use u, v, z as indices instead of 1, 2, 3.)
10. Let S = {p = (x, y, z) ∈ R3 | z = x2 + y 2 } and let ∇ = ∇S be the induced
covariant derivative. Define a coordinate system (r, θ) on S by x = r cos θ,
y = r sin θ, z = r2 . Calculate ∇r ∂r and ∇θ ∂θ .
11. Let S be a surface of revolution, S = {p = (x, y, z) ∈ R3 | r = f (z)}
where r2 = x2 + y 2 and f a positive differentiable function. Let ∇ = ∇S be the
induced covariant derivative (Example 11.3). Define a coordinate system (z, θ)
on S by x = f (z) cos θ, y = f (z) sin θ, z = z.

a) Show that the coordinate vector fields ∂z = ∂p/∂z and ∂θ = ∂p/∂θ are
orthogonal at each point of S.
b) Let C: p = p(t) be a meridian on S, i.e. a curve with parametric equations
z = t, θ = θo . Show that the vector fields E = k∂θ k−1 ∂θ and F = k∂z k−1 ∂z
form a parallel frame along C.
[Suggestion. Show first that E is parallel along C. To prove that F is also
parallel along C differentiate the equations (F, F ) = 1 and (F, E) = 0 with
respect to t.]
12. Let S be a surface in R3 with its induced connection ∇. Let C: p = p(t)
be a curve on S.
a) Let E(t), F (t) be two vector fields along C. Show that

d/dt (E, F ) = ( ∇E/dt, F ) + ( E, ∇F/dt ).

b) Show that parallel transport along C preserves the scalar products of vectors,
i.e.
(TC (to → t1 )u, TC (to → t1 )v) = (u, v)
for all u, v ∈ Tp(to ) S.
c) Show that there exist orthonormal parallel frames E1 , E2 along C.
13. Let T be a tensor field of type (1, 1). Derive a formula for (∇k T )ij .
[Suggestion: Write T = Tba ∂/∂xa ⊗ dxb and use the appropriate rules 0.–4.]

2.2 Geodesics
2.2.1 Definition. Let M be a manifold with a connection ∇. A C ∞ curve
p = p(t) on M is said to be a geodesic (of the connection ∇) if

∇ṗ/dt = 0,   (1)

i.e. the tangent vector field ṗ(t) is parallel along p(t). We write the equation
(1) also as
∇2 p/dt2 = 0.

In terms of the coordinates (x1 (t), · · · , xn (t)) of p = p(t) relative to a coordinate
system x1 , · · · , xn the equation (1) becomes

d2 xk /dt2 + Σij Γkij (dxi /dt)(dxj /dt) = 0.

2.2.2 Example. Let M = Rn with its usual covariant derivative. For p(t) =
(x1 (t), · · · , xn (t)) the equation
∇2 p/dt2 = 0
says that ẍi (t) = 0 for all t, so xi (t) = ai + tbi and x(t) = a + tb is a straight
line.
2.2.3 Example. Let S = {p ∈ E : kpk = 1} be the unit sphere in the Euclidean
3-space E = R3 . Let p(t) be the great circle through po ∈ S with direction
vector e:
p(t) = (cos t)po + (sin t)e
where
e · po = 0,   e · e = 1.
We have shown in example 2.1.12 that ∇2 p/dt2 = 0. Thus each great circle is
a geodesic. The same is true if we change the parametrization by some constant
factor, say α:
p(t) = (cos αt)po + (sin αt)e.   (2)
It will follow from the theorem below that every geodesic is of this form.
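The geodesic equation can also be integrated numerically in the (θ, φ) coordinates, using the Γs of example 2.1.4. A great circle is the intersection of S with a plane through the origin, so the sketch below checks that a numerically integrated geodesic stays in the plane spanned by its initial position and velocity; the initial data are arbitrary illustrative choices:

```python
import numpy as np

# Geodesic equations on the unit sphere in (theta, phi) coordinates, with
# Gamma^theta_{theta phi} = cos(phi)/sin(phi), Gamma^phi_{theta theta} =
# -sin(phi)cos(phi) (example 2.1.4):
#   theta'' = -2 (cos(phi)/sin(phi)) theta' phi'
#   phi''   =  sin(phi) cos(phi) (theta')**2
def rhs(s):
    th, ph, dth, dph = s
    return np.array([dth, dph,
                     -2.0 * (np.cos(ph) / np.sin(ph)) * dth * dph,
                     np.sin(ph) * np.cos(ph) * dth ** 2])

def rk4(s, h, steps):
    # Classical fourth-order Runge-Kutta for the first-order system s' = rhs(s).
    for _ in range(steps):
        k1 = rhs(s)
        k2 = rhs(s + h / 2 * k1)
        k3 = rhs(s + h / 2 * k2)
        k4 = rhs(s + h * k3)
        s = s + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return s

def embed(th, ph):
    return np.array([np.cos(th) * np.sin(ph),
                     np.sin(th) * np.sin(ph),
                     np.cos(ph)])

th, ph, dth, dph = 0.3, 1.1, 0.7, -0.4      # arbitrary initial data
p0 = embed(th, ph)
# Initial velocity in R^3: dth * dp/dtheta + dph * dp/dphi.
v0 = (dth * np.array([-np.sin(th) * np.sin(ph), np.cos(th) * np.sin(ph), 0.0])
      + dph * np.array([np.cos(th) * np.cos(ph), np.sin(th) * np.cos(ph),
                        -np.sin(ph)]))
normal = np.cross(p0, v0)
normal /= np.linalg.norm(normal)

s1 = rk4(np.array([th, ph, dth, dph]), 0.001, 2000)   # integrate to t = 2
p1 = embed(s1[0], s1[1])
print(abs(np.dot(p1, normal)))   # distance from the plane through 0: ~ 0
```

The initial point is kept away from the coordinate singularities φ = 0, π, where the coefficient cos φ/ sin φ blows up even though the geodesics themselves are perfectly regular there.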
2.2.4 Theorem. Let M be a manifold with a connection ∇. Let p ∈ M be
a point in M , v ∈ Tp M a vector at p. Then there is a unique geodesic p(t),
defined on some interval around t = 0, so that

p(0) = p,   ṗ(0) = v.

Proof. In terms of a coordinate system around p, the condition on p is:

d2 xk /dt2 + Σij Γkij (dxi /dt)(dxj /dt) = 0   (3)

xk (0) = xk0 ,   dxk (0)/dt = v k .   (4)

The system of differential equations (3) has a unique solution subject to the
initial condition (4).
2.2.5 Definition. Fix po ∈ M . A coordinate system (x̃i ) around po is said to
be a local inertial frame (or geodesic coordinate system) at po if all Γ̃kij vanish
at po , i.e.
( ∇∂/∂ x̃j ∂/∂ x̃i )po = 0 for all i, j.

2.2.6 Theorem. There exists a local inertial frame (x̃i ) around po if and only
if the Γkij with respect to an arbitrary coordinate system (xi ) satisfy

Γkij = Γkji at po (all i, j, k),

i.e.
∇∂/∂xi ∂/∂xj = ∇∂/∂xj ∂/∂xi at po (all i, j).
Proof. We shall need the transformation formula for the Γkij :

Γ̃trs = (∂ x̃t /∂xk )[ ∂ 2 xk /∂ x̃r ∂ x̃s + (∂xi /∂ x̃r )(∂xj /∂ x̃s ) Γkij ]   (5)

(see exercise 3).
(⇒) Assume there exists a local inertial frame (xi ) around po . Then Γkij = 0
at po for all i, j, k. Then the Γ̃trs with respect to an arbitrary coordinate system
(x̃i ) are given by

Γ̃trs = (∂ x̃t /∂xk ) ∂ 2 xk /∂ x̃r ∂ x̃s at po .

This expression is obviously symmetric in r, s.
(⇐) Assume for some coordinate system (xi )

Γkij = Γkji at po (all i, j, k).

Let γkij be the value of Γkij = Γkji at po . Let (xio ) be the (xi )-coordinates of po .
Consider the equations

xi − xio = x̃i − (1/2) γirs x̃r x̃s   (6)

which define xi = fi (x̃1 , · · · , x̃n ). We have

∂xi /∂ x̃j = δji − (1/2)(γijs x̃s + γirj x̃r )

hence at the point (x̃i ) = (0),

( ∂xi /∂ x̃j )(0) = δji .

By the inverse function theorem, the equations (6) define coordinates (x̃i )
around po with x̃i (po ) = 0. By (5), the Γ̃trs at po with respect to (x̃i ) are

Γ̃trs = δkt (−γkrs + δri δsj γkij ) = −γtrs + γtrs = 0.

2.2.7 Remark. A local inertial frame (x̃i ) at po is not unique. It is unique only
up to a coordinate transformation of the form

xi − xio = aij (x̃j − x̃jo ) + o(2)

where o(2) denotes some function of the x̃i which vanishes to second order at
(x̃io ), i.e. whose partials of order ≤ 2 all vanish at (x̃io ). (aij ) can be any matrix
with det(aij ) ≠ 0 and (xio ) any n–tuple of real numbers.
2.2.8 Definition. The covariant derivative ∇ is said to be symmetric if the
condition of the theorem holds for all points po , i.e. if Γkij ≡ Γkji for some (hence
every) coordinate system (xi ).
2.2.9 Lemma. Fix po ∈ M . For v ∈ Tpo M , let pv (t) be the unique geodesic with
pv (0) = po , ṗv (0) = v. Then pv (at) = pav (t) for all a, t ∈ R.
Proof. Calculate:

(d/dt)[pv (at)] = a (dpv /dt)(at),

(∇/dt)(d/dt)[pv (at)] = a2 [(∇/dt)(dpv /dt)](at) = 0,

(d/dt)[pv (at)] t=0 = a (dpv /dt)(0) = av.

Thus t → pv (at) is a geodesic with initial point po and initial velocity av.
Hence pv (at) = pav (t).
2.2.10 Corollary. Fix p ∈ M . There is a unique map Tp M → M which sends
the line tv in Tp M to the geodesic pv (t) with initial velocity v.
Proof. This map is given by v → pv (1): it follows from the lemma that
pv (t) = ptv (1), so this map does send the line tv to the geodesic pv (t).
Remark. This map Tp M → M is of class C ∞ . This follows from a theorem
on differential equations, which guarantees that the solution xk = xk (t) of (3)
depends in a C ∞ fashion on the initial conditions (4).
2.2.11 Definition. The map Tp M → M of the previous corollary is called the
geodesic spray or exponential map of the connection at p, denoted v → expp v.
2.2.12 Theorem. For any p ∈ M the geodesic spray expp : Tp M → M maps a
neighbourhood of 0 in Tp M one-to-one onto a neighbourhood of p in M .
Proof. We apply the inverse function theorem. The differential of expp at 0 is
given by

v → (d/dt) expp (tv) t=0 = v,

hence is the identity on Tp M .
This theorem allows us to introduce a coordinate system (xi ) in a neighbourhood
of po by using the components of v ∈ Tpo M with respect to some fixed basis
e1 , · · · , en as the coordinates of the point p = exppo v. In more detail, this
means that the coordinates (xi ) are defined by the equation

p = exppo (x1 e1 + · · · + xn en ).   (7)

Fig. 1. The geodesic spray on S 2
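For the sphere the exponential map can be written down explicitly from example 2.2.3: for v ∈ Tp S, expp v = cos kvk p + sin kvk v/kvk. The sketch below checks numerically the property used in the proof of theorem 2.2.12, that the differential of expp at 0 is the identity; the point and tangent vector are illustrative choices:

```python
import numpy as np

# Explicit exponential map of the unit sphere: for v in T_p S (v . p = 0) the
# geodesic through p with initial velocity v is the great circle
#   p_v(t) = cos(t|v|) p + sin(t|v|) v/|v|   (example 2.2.3),
# so exp_p(v) = p_v(1):
def exp_p(p, v):
    r = np.linalg.norm(v)
    if r == 0.0:
        return p.copy()
    return np.cos(r) * p + np.sin(r) * (v / r)

p = np.array([0.0, 0.0, 1.0])       # north pole
v = np.array([0.3, -0.4, 0.0])      # an illustrative tangent vector at p

# The differential of exp_p at 0 is the identity: a central difference
# quotient of t -> exp_p(t v) at t = 0 recovers v.
h = 1e-6
deriv = (exp_p(p, h * v) - exp_p(p, -h * v)) / (2 * h)
print(deriv)   # approximately (0.3, -0.4, 0)
```

Note also that expp v lands on S for every v, in accordance with geodesics of the induced connection staying on the submanifold.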

2.2.13 Theorem. Assume ∇ is symmetric. Fix po ∈ M , and a basis (e1 , · · · , en )
for Tpo M . Then the equation (7) defines a local inertial frame (xi ) at po .
2.2.14 Definition. Such a coordinate system (xi ) around po is called a normal
inertial frame at po (or a normal geodesic coordinate system at po ). It is unique
up to a linear change of coordinates x̃j = aji xi , corresponding to a change of
basis ẽi = aji ej .
Proof of the theorem. We have to show that all Γkij = 0 at po . Fix v = ξ i ei .
The (xi )-coordinates of p = exppo (tv) are

xi = tξ i .

Since these xi = xi (t) are the coordinates of the geodesic pv (t) we get

d2 xk /dt2 + Σij Γkij (dxi /dt)(dxj /dt) = 0 (for all k).

This gives
Σij Γkij ξ i ξ j = 0.

Here Γkij = Γkij (tξ 1 , · · · , tξ n ). Set t = 0 in this equation and then take the
second-order partial ∂ 2 /∂ξ i ∂ξ j to find that

Γkij + Γkji = 0 at po (for all i, j, k).

Since ∇ is symmetric, this gives

Γkij = 0 at po (for all ijk)

as required.

EXERCISES 2.2
1. Use theorem 2.2.4 to prove that every geodesic on the sphere S of example
2.2.3 is a great circle as in equation (2).
2. Referring to exercise 5 of EXERCISES 2.1, show that for each constant α the
curve
c(t) = (cosh αt)p + (sinh αt)e
is a geodesic on S.
3. Prove the transformation formula (5) for the Γkij from the definition of the
Γkij .
4. Suppose (x̃i ) is a local inertial frame at po . Let (xi ) be a coordinate system
so that
xi − xio = aij (x̃j − x̃jo ) + o(2)
as in remark 2.2.7. Prove that (xi ) is also a local inertial frame at po . (It can
be shown that any two local inertial frames at po are related by a coordinate
transformation of this type.)
5. Let (x̃i ) be a coordinate system on Rn so that Γ̃trs = 0 at all points of
Rn . Show that (x̃i ) is an affine coordinate system, i.e. (x̃i ) is related to the
Cartesian coordinate system (xi ) by an affine transformation:

x̃i = x̃io + aij (xj − xjo ).

6. Let S be the cylinder x2 + y 2 = 1 in R3 with the induced connection. Show


that for any constants a,b,c,d the curve p = p(t) with parametric equations

x = cos(at + b), y = sin(at + b), z = ct + d

is a geodesic and that every geodesic is of this form.


7. Let p = p(t) be a geodesic on some manifold M with a connection ∇. Let
p = p(t(s)) be the curve obtained by a change of parameter t = t(s). Show that
p(t(s)) is a geodesic if and only if t = as + b for some constants a, b.
8. Let p = p(t) be a geodesic on some manifold M with a connection ∇. Suppose
there are to ≠ t1 so that p(to ) = p(t1 ) and ṗ(to ) = ṗ(t1 ). Show that the geodesic
is periodic, i.e. there is a constant c so that p(t + c) = p(t) for all t.
9. Let S be a surface of revolution, S = {p = (x, y, z) ∈ R3 | r = f (z)} where
r2 = x2 + y 2 and f a positive differentiable function. Let ∇ = ∇S be the
induced covariant derivative (Example 2.5.3). Define a coordinate system (z, θ)

on S by x = f(z) cos θ, y = f(z) sin θ, z = z. Let p = p(t) be a curve of the
form θ = θo parametrized by arclength, i.e. with parametric equations θ = θo,
z = z(t) and (p′(t), p′(t)) ≡ 1. Show that p(t) is a geodesic. [Suggestion. Show
that the vector p″(t) in R3 is orthogonal to both coordinate vector fields ∂z and
∂θ on S at p(t). Note that p′(t) = (∂p/∂z)z′(t) is parallel to ∂z while ∂z and ∂θ
are orthogonal (§5 exercise 9). Differentiate the equations (p′(t), p′(t)) ≡ 1 and
(∂z, ∂θ) ≡ 0 with respect to t.]
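The orthogonality criterion behind exercise 6 can be checked by computer algebra. The sketch below is an added illustration, not part of the notes; it assumes sympy is available and verifies that the helix's acceleration is everywhere normal to the cylinder, which by the definition of the induced connection is exactly the geodesic condition.

```python
import sympy as sp

# On the cylinder x^2 + y^2 = 1 the induced covariant derivative is the
# tangential part of the ordinary derivative in R^3, so p(t) is a geodesic
# iff p''(t) is normal to the cylinder along the curve.
t, a, b, c, d = sp.symbols('t a b c d', real=True)
p = sp.Matrix([sp.cos(a*t + b), sp.sin(a*t + b), c*t + d])
acc = p.diff(t, 2)

# The tangent plane at p is spanned by the ruling direction (0, 0, 1)
# and the rotational direction (-y, x, 0).
tangents = [sp.Matrix([0, 0, 1]), sp.Matrix([-p[1], p[0], 0])]
residues = [sp.simplify(acc.dot(v)) for v in tangents]
print(residues)   # [0, 0]: the helix is a geodesic
```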

2.3 Riemann curvature


Throughout, M is a manifold with a covariant derivative ∇.
2.3. 1 Definition. A (parametrized ) surface in M is a C ∞ function R2 → M ,
(s, t) → p = p(s, t), defined (at least) on some rectangle {(s, t) : |s − so | ≤a,
|t − to | ≤b}.
Given a surface p = p(s, t), we define
$$\frac{\partial p}{\partial s}\Big|_{(s_o,t_o)} = \frac{dp(s, t_o)}{ds}\Big|_{s=s_o}, \qquad \frac{\partial p}{\partial t}\Big|_{(s_o,t_o)} = \frac{dp(s_o, t)}{dt}\Big|_{t=t_o}.$$

∂p/∂s, ∂p/∂t are the tangent vector fields along the surface p = p(s, t). The
covariant derivatives along these vector fields are denoted ∇/∂s, ∇/∂t.

Fig. 1. A parametrized surface

2.3. 2 Theorem. Let po be a point of M , u, v ∈ Tpo M two vectors at po . Let


p = p(s, t) be a surface so that p(so, to) = po,
$$\frac{\partial p}{\partial s}\Big|_{(s_o,t_o)} = u, \qquad \frac{\partial p}{\partial t}\Big|_{(s_o,t_o)} = v.$$
The vector
$$T(u, v) = \Big(\frac{\nabla}{\partial t}\frac{\partial p}{\partial s} - \frac{\nabla}{\partial s}\frac{\partial p}{\partial t}\Big)_{(s_o,t_o)}$$
depends only on u and v. Relative to a coordinate system (xi) around po this
vector is given by
$$T(u, v) = T^k_{ij}\, u^i v^j\, \Big(\frac{\partial}{\partial x^k}\Big)_{p_o}$$
where $T^k_{ij} = \Gamma^k_{ij} - \Gamma^k_{ji}$ is a tensor of type (1, 2), called the torsion tensor of the
covariant derivative ∇.
Proof. Write p = p(s, t) in the coordinates (xi) as xi = xi(s, t). Calculate:
$$\frac{\partial p}{\partial s} = \frac{\partial x^i}{\partial s}\,\frac{\partial}{\partial x^i},$$
$$\frac{\nabla}{\partial t}\frac{\partial p}{\partial s} = \frac{\partial^2 x^i}{\partial t\,\partial s}\,\frac{\partial}{\partial x^i} + \frac{\partial x^i}{\partial s}\,\frac{\nabla}{\partial t}\frac{\partial}{\partial x^i} = \frac{\partial^2 x^i}{\partial t\,\partial s}\,\frac{\partial}{\partial x^i} + \frac{\partial x^i}{\partial s}\frac{\partial x^j}{\partial t}\,\frac{\nabla}{\partial x^j}\frac{\partial}{\partial x^i}.$$
Interchange s and t and subtract to get
$$\frac{\nabla}{\partial t}\frac{\partial p}{\partial s} - \frac{\nabla}{\partial s}\frac{\partial p}{\partial t} = \frac{\partial x^i}{\partial s}\frac{\partial x^j}{\partial t}\Big(\frac{\nabla}{\partial x^j}\frac{\partial}{\partial x^i} - \frac{\nabla}{\partial x^i}\frac{\partial}{\partial x^j}\Big).$$
Substitute
$$\frac{\nabla}{\partial x^j}\frac{\partial}{\partial x^i} = \Gamma^k_{ij}\,\frac{\partial}{\partial x^k}$$
to get
$$\frac{\nabla}{\partial t}\frac{\partial p}{\partial s} - \frac{\nabla}{\partial s}\frac{\partial p}{\partial t} = \frac{\partial x^i}{\partial s}\frac{\partial x^j}{\partial t}\,\big(\Gamma^k_{ij} - \Gamma^k_{ji}\big)\,\frac{\partial}{\partial x^k}.$$
For s = so, t = to this becomes
$$\Big(\frac{\nabla}{\partial t}\frac{\partial p}{\partial s} - \frac{\nabla}{\partial s}\frac{\partial p}{\partial t}\Big)_{(s_o,t_o)} = \big(\Gamma^k_{ij} - \Gamma^k_{ji}\big)\, u^i v^j\, \Big(\frac{\partial}{\partial x^k}\Big)_{p_o}.$$
Read from left to right this equation shows first of all that the left side depends
only on u and v. Read from right to left the equation shows that the right side
is independent of the coordinates, hence defines a tensor Tijk = Γkij − Γkji .
2.3. 3 Theorem. Let po be a point of M , u, v, w ∈ Tpo M three vectors at po . Let
p = p(s, t) be a surface so that p(so, to) = po,
$$\frac{\partial p}{\partial s}\Big|_{(s_o,t_o)} = u, \qquad \frac{\partial p}{\partial t}\Big|_{(s_o,t_o)} = v,$$
and X a C∞ vector field defined in a neighbourhood of po with X(po) = w. The
vector
$$R(u, v)w = \Big(\frac{\nabla}{\partial t}\frac{\nabla X}{\partial s} - \frac{\nabla}{\partial s}\frac{\nabla X}{\partial t}\Big)_{(s_o,t_o)}$$
depends only on u, v, w. Relative to a coordinate system (xi) around po this
vector is given by
$$R(u, v)w = R^i_{mkl}\, u^k v^l w^m\, (\partial_i)_{p_o}$$
where ∂i = ∂/∂xi and
$$R^i_{mkl} = \partial_l\Gamma^i_{mk} - \partial_k\Gamma^i_{ml} + \Gamma^p_{mk}\Gamma^i_{pl} - \Gamma^p_{ml}\Gamma^i_{pk}. \tag{R}$$
$R = (R^i_{mkl})$ is a tensor of type (1, 3), called the curvature tensor of the covariant
derivative ∇.
Proof. Exercise.
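Formula (R) is straightforward to evaluate by computer algebra. The sketch below is an added illustration, not part of the notes: it assumes sympy, hard-codes the Christoffel symbols of the round metric on the unit sphere in coordinates (x1, x2) = (φ, θ) (namely Γ^1_22 = −sin φ cos φ and Γ^2_12 = Γ^2_21 = cot φ, derived from the metric as in §2.5), and evaluates one curvature component; the value R^2_112 = 1 makes g22 R^2_112 = sin²φ = det(gij), as forced by the sphere's Gauss curvature K = 1 — the signs of the other components depend on the index conventions used.

```python
import sympy as sp

phi, theta = sp.symbols('phi theta')
x = [phi, theta]
# Christoffel symbols of the unit sphere; G[i][m][k] stands for
# Gamma^i_{mk} with 0-based indices.
cot = sp.cos(phi) / sp.sin(phi)
G = [[[0, 0], [0, -sp.sin(phi) * sp.cos(phi)]],
     [[0, cot], [cot, 0]]]

def R(i, m, k, l):
    """Formula (R): R^i_{mkl} = d_l Gamma^i_{mk} - d_k Gamma^i_{ml}
                              + Gamma^p_{mk} Gamma^i_{pl} - Gamma^p_{ml} Gamma^i_{pk}."""
    val = sp.diff(G[i][m][k], x[l]) - sp.diff(G[i][m][l], x[k])
    val += sum(G[p][m][k] * G[i][p][l] - G[p][m][l] * G[i][p][k]
               for p in range(2))
    return sp.simplify(val)

print(R(1, 0, 0, 1))   # 1, i.e. R^2_{112} = 1 in 1-based indices
```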
2.3. 4 Lemma. For any vector field X(t) and any curve p = p(t)

$$\frac{\nabla X(t)}{dt} = \lim_{\varepsilon\to 0}\frac{1}{\varepsilon}\big(X(t+\varepsilon) - T(t \to t+\varepsilon)X(t)\big),$$
or equivalently
$$\frac{\nabla X(t_o)}{dt} = \lim_{t\to t_o}\frac{1}{t-t_o}\big(X(t) - T(t_o \to t)X(t_o)\big).$$

Proof. Recall the formula for the covariant derivative in terms of a parallel
frame E1 (t), · · · , En (t) along c(t) (Theorem 11.15): if X(t) = ξ i (t)Ei (t), then
 
$$\frac{\nabla X}{dt} = \frac{d\xi^i}{dt}\,E_i(t) = \lim_{\varepsilon\to 0}\frac{1}{\varepsilon}\big(\xi^i(t+\varepsilon) - \xi^i(t)\big)\,E_i(t)$$
$$= \lim_{\varepsilon\to 0}\frac{1}{\varepsilon}\big(\xi^i(t+\varepsilon) - \xi^i(t)\big)\,E_i(t+\varepsilon)$$
$$= \lim_{\varepsilon\to 0}\frac{1}{\varepsilon}\big(\xi^i(t+\varepsilon)E_i(t+\varepsilon) - \xi^i(t)E_i(t+\varepsilon)\big)$$
$$= \lim_{\varepsilon\to 0}\frac{1}{\varepsilon}\big(X(t+\varepsilon) - T(t \to t+\varepsilon)X(t)\big),$$
the last equality coming from the fact that the components with respect to a parallel
frame remain unchanged under parallel transport.
We now give a geometric interpretation of R.
2.3. 5 Theorem. Let po be a point of M , u, v, w ∈ Tpo M three vectors at po .
Let p = p(s, t) be a surface so that p(so, to) = po,
$$\frac{\partial p}{\partial s}\Big|_{(s_o,t_o)} = u, \qquad \frac{\partial p}{\partial t}\Big|_{(s_o,t_o)} = v,$$
and X a C∞ vector field defined in a neighbourhood of po with X(po) = w. Define a linear
transformation T = T(so, to; ∆s, ∆t) of Tpo M by
Tw = parallel transport of w ∈ Tpo M along the boundary of the "rectangle"
$$\{p(s, t) \mid s_o \le s \le s_o + \Delta s,\ t_o \le t \le t_o + \Delta t\}.$$
Then
$$R(u, v)w = \lim_{\Delta s, \Delta t \to 0}\frac{1}{\Delta s\,\Delta t}\,(w - Tw). \tag{1}$$

Fig. 2. Parallel transport around a parallelogram.

Proof. Set s = so + ∆s, t = to + ∆t. Decompose T into the parallel transports
along the four sides of the "rectangle":
$$Tw = T(s_o, t \to t_o)\,T(s \to s_o, t)\,T(s, t_o \to t)\,T(s_o \to s, t_o)\,w. \tag{2}$$
Imagine this expression substituted for T in the limit on the right-hand side of
(1). Factor out the first two terms of (2) from the parentheses in (1) to get
$$\lim_{\Delta s, \Delta t \to 0}\frac{1}{\Delta s\,\Delta t}(w - Tw) = \lim_{\Delta s, \Delta t \to 0}\frac{1}{\Delta s\,\Delta t}\,\Big\{T(s_o, t \to t_o)T(s \to s_o, t)\Big\} \times$$
$$\times\Big\{T(s_o \to s, t)T(s_o, t_o \to t)w - T(s, t_o \to t)T(s_o \to s, t_o)w\Big\}. \tag{3}$$
[Here we use T(s → so, t)T(so → s, t) = I (= identity) etc.] The expression in
the first braces in (3) approaches I. It may be omitted from the limit (3). To
calculate the limit of the expression in the second braces in (3) we write the
formula in the Lemma as
$$\Big(\frac{\nabla X}{dt}\Big)_{t_o} = \lim_{\Delta t \to 0}\frac{1}{\Delta t}\big(X(t) - T(t_o \to t)X(t_o)\big). \tag{4}$$
Choose a C∞ vector field X(s, t) along the surface p(s, t) so that w = X(so, to).
With (4) in mind we rewrite (3) by adding and subtracting some terms:
$$\lim_{\Delta s, \Delta t \to 0}\frac{1}{\Delta s\,\Delta t}(w - Tw) = \lim_{\Delta s, \Delta t \to 0}\frac{1}{\Delta s\,\Delta t}\Big( T(s_o \to s, t)\big[T(s_o, t_o \to t)X(s_o, t_o) - X(s_o, t)\big]$$
$$+ \big[T(s_o \to s, t)X(s_o, t) - X(s, t)\big] - T(s, t_o \to t)\big[T(s_o \to s, t_o)X(s_o, t_o) - X(s, t_o)\big] - \big[T(s, t_o \to t)X(s, t_o) - X(s, t)\big]\Big). \tag{5}$$
Take the appropriate limits of the expressions in the brackets using (4) to get:
$$\lim_{\Delta s, \Delta t \to 0}\frac{1}{\Delta s\,\Delta t}(w - Tw) = \text{limit of}\ -\frac{1}{\Delta s}\,T(s_o \to s, t)\Big(\frac{\nabla X}{\partial t}\Big)_{(s_o, t_o)} - \frac{1}{\Delta t}\Big(\frac{\nabla X}{\partial s}\Big)_{(s_o, t)} + \frac{1}{\Delta t}\,T(s, t_o \to t)\Big(\frac{\nabla X}{\partial s}\Big)_{(s_o, t_o)} + \frac{1}{\Delta s}\Big(\frac{\nabla X}{\partial t}\Big)_{(s, t_o)}$$
$$= \text{limit of}\ \frac{1}{\Delta s}\Big\{\Big(\frac{\nabla X}{\partial t}\Big)_{(s, t_o)} - T(s_o \to s, t)\Big(\frac{\nabla X}{\partial t}\Big)_{(s_o, t_o)}\Big\} + \frac{1}{\Delta t}\Big\{T(s, t_o \to t)\Big(\frac{\nabla X}{\partial s}\Big)_{(s_o, t_o)} - \Big(\frac{\nabla X}{\partial s}\Big)_{(s_o, t)}\Big\}$$
$$= \Big(\frac{\nabla}{\partial s}\frac{\nabla X}{\partial t} - \frac{\nabla}{\partial t}\frac{\nabla X}{\partial s}\Big)_{(s_o, t_o)} = R(u, v)w.$$

2.3. 6 Theorem (Riemann). If R = 0, then locally around any point there is a


coordinate system (xi ) on M such that Γkij ≡ 0, i.e. with respect to this coordi-
nate system (xi ), the covariant derivative ∇ is the componentwise derivative:
$$\nabla_Y\Big(\sum_i X^i\,\frac{\partial}{\partial x^i}\Big) = \sum_i (D_Y X^i)\,\frac{\partial}{\partial x^i}.$$
The coordinate system is unique up to an affine transformation: any other such
coordinate system (x̃j) is of the form
$$\tilde x^j = \tilde x^j_o + c^j_i\, x^i$$
for some constants $\tilde x^j_o$, $c^j_i$ with $\det(c^j_i) \ne 0$.


(Proof omitted.)
2.3. 7 Remark. If R = 0 the covariant derivative ∇ is said to be flat. A
coordinate system (xi ) as in the theorem is called affine. With respect to such
an affine coordinate system (xi ) parallel transport Tpo M → Tp1 M along any
curve leaves unchanged the components with respect to (xi ). In particular, for
a flat connection parallel transport is independent of the path.

EXERCISES 2.3
1. Let ∇ be a covariant derivative on M , T its torsion tensor.
(a) Show that the equation
$$\nabla'_v X = \nabla_v X + \tfrac{1}{2}\,T(v, X)$$
defines another covariant derivative ∇′ on M. [Verify at least CD4.]
(b) Show that ∇′ is symmetric, i.e. the torsion tensor T′ of ∇′ is T′ = 0.
(c) Show that ∇ and ∇′ have the same geodesics, i.e. a C∞ curve p = p(t) is a
geodesic for ∇ if and only if it is a geodesic for ∇′.

2. Prove theorem 2.3.3. [Suggestion: use the proof of theorem 2.3.2 as pattern.]
3. Take for granted the existence of a tensor R satisfying
$$R(u, v)w = \Big(\frac{\nabla}{\partial s}\frac{\nabla X}{\partial t} - \frac{\nabla}{\partial t}\frac{\nabla X}{\partial s}\Big)_{(s_o, t_o)}, \qquad R(u, v)w = R^i_{mkl}\, u^k v^l w^m\, \Big(\frac{\partial}{\partial x^i}\Big)_{p_o},$$
as asserted in theorem 2.3.3. Prove that
$$(\nabla_k\nabla_l - \nabla_l\nabla_k)\,\partial_m = -R^i_{mkl}\,\partial_i.$$

[Do not use formula (R).]


4. Suppose parallel transport Tpo M → Tp1 M is independent of the path from
po to p1 (for all po , p1 ∈ M ). Prove that R = 0.

2.4 Gauss curvature

Let S be a smooth surface in Euclidean three-space E = R3 , i.e. a two-


dimensional submanifold. For p ∈ S, let N = N (p) be one of the two unit
normal vectors to Tp S. (It does not matter which one, but we do assume that
the choice is consistent, so that N (p) depends continuously on p ∈ S. This is
always possible locally, i.e. in sufficiently small regions of S, but not necessarily
globally.) Since N (p) is a unit vector, one can view p → N (p) as a map from
the surface S to the unit sphere S 2 . This is the Gauss map of the surface. To
visualize it, imagine you move on the surface holding a copy of S 2 which keeps
its orientation with respect to the surrounding space (no rotation, pure translation), equipped with a compass needle N(p) always pointing perpendicular to
the surface. The variation of the needle on the sphere reflects the curvature of
the surface, the unevenness in the terrain (but can only offer spiritual guidance
if you are lost on the surface.)
Fig. 1. The Gauss map

2.4. 1 Lemma.
a) As subspace of R3 ,
Tp S = N (p)⊥ = TN (p) S 2
where N(p)⊥ = {u ∈ R3 | (u, N(p)) = 0} is the subspace of R3 orthogonal
to N (p).
b) If u ∈ Tp S, then
(N (p), dNp (u)) = 0.
c) For any u, v ∈ Tp S,

(dNp (u), v) = (u, dNp (v)).

Proof. (a) is clear from the definition of N .


(b) This just says that the differential of N : S → S 2 maps Tp S into TN (p) S 2 ,
which is clear.

(c) Let p = p(s, t) be any C ∞ map R2 → S; thus p = p(s, t) is a parametrized


surface in R3 , namely a parametrization of S. Since (∂p/∂s, N ) = 0 one gets
by differentiation that
$$\Big(\frac{\partial^2 p}{\partial t\,\partial s}, N\Big) + \Big(\frac{\partial p}{\partial s}, \frac{\partial N}{\partial t}\Big) = 0,$$
which gives
$$\Big(\frac{\partial p}{\partial s},\, dN\Big(\frac{\partial p}{\partial t}\Big)\Big) = -\Big(\frac{\partial^2 p}{\partial t\,\partial s}, N\Big). \tag{1}$$
Since the right side is symmetric in s, t, this shows that (c) holds for p = p(s, t),
u = ∂p/∂s, v = ∂p/∂t. Since any triple p, u, v with p ∈ S and u, v ∈ Tp S can
be realized in this way one gets (c).

Since Tp S = TN (p) S 2 as subspace of R3 we can consider dNp : Tp S → TN (p) S 2


also as a linear transformation of the tangent plane Tp S, called the principal
operator of the surface. Part (c) shows that this linear transformation is self-adjoint,
i.e. the bilinear form (u, dNp(v)) on Tp S is symmetric. The negative
−(u, dNp(v)) of this bilinear form is called the second fundamental form of S,
a term also applied to the corresponding quadratic form −(dN(u), u). We shall
denote it Π. (The traditional symbol II looks strange when equipped with
indices.)
$$\Pi(u, v) = -(u, dN_p(v)).$$
Incidentally, the first fundamental form of S is just another name for the inner product
(u, v) or the corresponding quadratic form (u, u), which is just the Riemann
metric ds2 on S induced by the Euclidean metric in R3 .
The connection ∇S = ∇ on S induced by the standard connection ∇E = D on
E is given by
∇Y X = tangential component of DY X,
∇Y X = DY X − (DY X, N )N.
We shall calculate the torsion and the curvature of this connection. For this
purpose, let p = p(s, t) be a parametrization of S, ∂p/∂s and ∂p/∂t the corre-
sponding tangent vector fields, and W = W (s, t) any other vector field on S.
We have
$$T\Big(\frac{\partial p}{\partial s}, \frac{\partial p}{\partial t}\Big) = \frac{\nabla}{\partial t}\frac{\partial p}{\partial s} - \frac{\nabla}{\partial s}\frac{\partial p}{\partial t}, \qquad R\Big(\frac{\partial p}{\partial s}, \frac{\partial p}{\partial t}\Big)W = \frac{\nabla}{\partial t}\frac{\nabla W}{\partial s} - \frac{\nabla}{\partial s}\frac{\nabla W}{\partial t}.$$
The first equation gives
$$T\Big(\frac{\partial p}{\partial s}, \frac{\partial p}{\partial t}\Big) = \text{tangential component of}\ \Big(\frac{\partial}{\partial t}\frac{\partial p}{\partial s} - \frac{\partial}{\partial s}\frac{\partial p}{\partial t}\Big) = 0.$$
To work out the second equation, calculate:
$$\frac{\nabla}{\partial t}\frac{\nabla W}{\partial s} = \frac{\nabla}{\partial t}\Big(\frac{\partial W}{\partial s} - \Big(\frac{\partial W}{\partial s}, N\Big)N\Big)$$
$$= \text{tang. component of}\ \frac{\partial^2 W}{\partial t\,\partial s} - \frac{\partial}{\partial t}\Big(\frac{\partial W}{\partial s}, N\Big)\,N - \Big(\frac{\partial W}{\partial s}, N\Big)\frac{\partial N}{\partial t}.$$
The second term may be omitted because it is orthogonal to S. The last term
is already tangential to S, because ∂N/∂t is orthogonal to N, by part (b) of the
lemma. Thus
$$\frac{\nabla}{\partial t}\frac{\nabla W}{\partial s} = \text{tang. comp. of}\ \frac{\partial^2 W}{\partial t\,\partial s} - \Big(\frac{\partial W}{\partial s}, N\Big)\frac{\partial N}{\partial t}.$$
In this equation, interchange s and t and subtract to find
$$R\Big(\frac{\partial p}{\partial s}, \frac{\partial p}{\partial t}\Big)W = \Big(\frac{\partial W}{\partial t}, N\Big)\frac{\partial N}{\partial s} - \Big(\frac{\partial W}{\partial s}, N\Big)\frac{\partial N}{\partial t}.$$
These formulas can be rewritten as follows. Since (W, N) = 0 one finds by
differentiation that
$$\Big(\frac{\partial W}{\partial t}, N\Big) + \Big(W, \frac{\partial N}{\partial t}\Big) = 0,$$
and similarly with t replaced by s. Hence
$$R\Big(\frac{\partial p}{\partial s}, \frac{\partial p}{\partial t}\Big)W = \Big(W, \frac{\partial N}{\partial s}\Big)\frac{\partial N}{\partial t} - \Big(W, \frac{\partial N}{\partial t}\Big)\frac{\partial N}{\partial s}.$$
If we consider p → N (p) as a function on S with values in S 2 ⊂ R3 , and write
u, v, w for three tangent vectors at p on S, this equation becomes

R(u, v)w = (w, dN (u))dN (v) − (w, dN (v))dN (u).

Hence

(R(u, v)w, z) = (w, dN (u))(dN (v), z) − (w, dN (v))(dN (u), z). (3)

At a given point p ∈ S the differential dN of N : S → S 2 is a linear trans-


formation dN : Tp S → TN (p) S 2 between 2-dimensional spaces, both of which
may be identified with the subspace N (p)⊥ of R3 orthogonal to N (p). Use an
orthonormal basis for N (p)⊥ to write its elements v as column-two-vectors [v],
dN as a 2×2 matrix [dN ], and scalar products (v, w) as matrix products [v]∗ [w]
with the transpose [v]∗. Then (3) can be written as
$$(R(u, v)w, z) = \det\begin{pmatrix} (w, dN(u)) & (w, dN(v)) \\ (z, dN(u)) & (z, dN(v)) \end{pmatrix} = \det\{[w, z]^*\,[dN(u), dN(v)]\}$$
$$= \det\{[w, z]^*\,[dN]\,[u, v]\} = \det[dN]\,\det\{[w, z]^*\,[u, v]\} = \det dN\,\det\begin{pmatrix} (w, u) & (w, v) \\ (z, u) & (z, v) \end{pmatrix}.$$
Thus
$$(R(u, v)w, z) = K\,\det\begin{pmatrix} (w, u) & (w, v) \\ (z, u) & (z, v) \end{pmatrix} \tag{5}$$

where
$$K = \det dN. \tag{6}$$

This scalar function K = K(p) is called the Gauss curvature of S. The equation
(5) shows that for a surface S the curvature tensor R is essentially determined
by K. In connection with this formula it is important to remember that N
is to be considered as a map N : S → S 2 and dN as a linear transformation
dN : Tp S → TN (p) S 2 between 2-dimensional spaces, both of which may be
identified with the subspace N (p)⊥ of R3 orthogonal to N (p).
One can give a geometric interpretation of K as follows. In (5), set w = u and
z = v to find
$$(R(u, v)u, v) = K\,A(u, v)^2$$
where
$$A(u, v) = |\det([u, v])|,$$
i.e.
$$A(u, v)^2 = (u, u)(v, v) - (u, v)^2.$$
Geometrically, A(u, v) = (area of parallelogram spanned by u, v).

By the multiplication rule for determinants,
$$\det[dN(u), dN(v)] = \det[dN]\,\det[u, v]$$
where the brackets indicate matrices. Thus
$$K = \frac{\det[dN(u), dN(v)]}{\det[u, v]} = \pm\,\frac{A(dN(u), dN(v))}{A(u, v)}. \tag{6'}$$

Think of N as the Gauss map N: S → S2. Then K(p) is, up to sign, the ratio of
the area of the image on S2 of an "infinitesimal parallelogram" on S spanned by
u, v to its own area. Thus K is the "amount" of variation of the normal on S
per unit area. The sign ± in (6') indicates whether the Gauss map
normal on S per unit area. The sign ± in (6’) indicates whether the Gauss map
preserves or reverses the sense of rotation around the common normal direction.
(The normal directions at point on S and at its image on S 2 are specified by
the same vector N , so that the sense of rotation around it can be compared, for
example by considering a small loop on S and its image on S 2 .)

Fig. 2. Large curvature, large Gauss image; small curvature, small Gauss image.

Now consider dN = dNp again as linear transformation of Tp S. Recall that


dN is a self-adjoint linear transformation of the two dimensional vector space
Tp S, hence dN has two real eigenvalues, say k1 ,k2 . These are called the principal
curvatures of S at p. The corresponding eigenvectors (which are tangent vectors
to S at p) give the principal directions of S at p. They are always orthogonal to
each other, being eigenvectors of a symmetric linear transformation (assuming
the two eigenvalues are distinct). For any unit vector u∈ Tp S, k(u) = (dN (u),u)
is the curvature of S at p in the direction u. When u is an eigenvector of dN ,
this is just the corresponding principal curvature. In general, if u = p′(s) is the
unit tangent of a curve p = p(s), parametrized by arclength, then

$$k(u) = \Big(\frac{dN}{ds}, u\Big).$$
The right side is the tangential component of the vectorial rate of change of N
along the curve p = p(s), hence k(u) does measure the “curvature of S in the
direction u“.
2.4. 2 Lemma. The principal curvatures k1, k2 are the maximum and the
minimum values of k(u) as u varies over all unit vectors in Tp S.
Proof: For fixed p = po on S, consider k(u) = (dNpo(u), u) as a function on the
unit circle (u, u) = 1 in Tpo S. Suppose u = uo is a minimum or a maximum of
k(u). Then for any parametrization u = u(t) of (u, u) = 1 with u(to) = uo and
u′(to) = vo:
$$0 = \frac{d}{dt}\Big|_{t=t_o}\,(dN_{p_o}(u), u) = (dN_{p_o}(v_o), u_o) + (dN_{p_o}(u_o), v_o) = 2\,(dN_{p_o}(u_o), v_o).$$

Hence dNpo (uo ) must be orthogonal to the tangent vector vo to the circle at uo ,
i.e. dNpo (uo ) =ko uo for some scalar ko .

2.4.3 Remarks on the curvature of curves and the curvature of sur-


faces. Let C be a (smooth) space-curve, i.e. a one-dimensional submanifold
of R3 . To each point of C one can attach an orthonormal triple of vectors
T, N, B called the tangent, principal normal, binormal and defined as follows.
Let p = p(s) be a parametrization of C by arclength. Then

dp 1 dT
T = ,N = , B = T × N.
ds kdT /dsk ds

These three vector fields along p = p(s) form what is called the Frenet frame
along C. They satisfy the Frenet formulas

$$\frac{dT}{ds} = \kappa N, \qquad \frac{dN}{ds} = -\kappa T + \tau B, \qquad \frac{dB}{ds} = -\tau N,$$
where κ, τ are scalar functions, which are uniquely defined by these formulas,
called curvature and torsion of the curve C. (You can consult your calculus
text for more details).
Take in particular the case of a plane curve C: p = p(s) in R2. In that case
we have dN/ds = −κT, hence κ = ±|dN/ds|. This equation is entirely
analogous to (6').

Fig.4. Curvature of a plane curve

From this point of view it is very surprising that there is a fundamental difference
between K and κ: K is independent of the embedding of the surface S in R3 , but
κ is very much dependent on the embedding of the curve in R2 . The curvature
κ of a curve C should therefore not be confused with the Riemann curvature
tensor R of C as 1-dimensional Riemann manifold, which would be zero. For
emphasis we may call κ the space curvature of C, because it does depend on
the embedding of C in R2 or R3 , not just on the Riemann metric of C. In the
same way the torsion τ of a space curve C has nothing to do with the torsion
tensor.
Nevertheless, there is a relation between the Riemann or Gauss curvature of a
surface S and the space curvature of curves C on it. From the above formulas
we find that generally
$$\kappa = -\Big(\frac{dN}{ds}, T\Big).$$

From this it follows that the curvature k(u) of a surface S in the direction u ∈
Tp S is (up to a sign) the same as the curvature at p of the curve of intersection
of S with the plane through p spanned by u and the normal vector N (p). These
curves are called normal sections of S and for this reason k(u) is also called the
normal curvature of S in the direction u ∈ Tp S.
2.4.4 Remarks on the computation of the Gauss curvature. For the
purpose of computation one can start with the formula
$$K = \det dN = \frac{\det(dN(v_i), v_j)}{\det(v_i, v_j)} \tag{8}$$
which holds for any two v1, v2 ∈ Tp S for which the denominator is nonzero. If
M is any nonzero normal vector, then we can write N = µM where µ = ‖M‖−1.
Since dN = µ(dM) + (dµ)M, formula (8) gives
$$K = \frac{\det(dM(v_i), v_j)}{(M, M)\,\det(v_i, v_j)}. \tag{9}$$

Suppose S is given in parametrized form S: p = p(x1 , x2 ). Take for vi the vector


field pi = ∂p/∂xi and write Mi = ∂M/∂xi for the componentwise derivative of
M in R3. Then (9) becomes
$$K = \frac{\det(M_i, p_j)}{(M, M)\,\det(p_i, p_j)}. \tag{10}$$
For another formula for K, use the formula (1) in the form

(dN (pi ), pj ) = −(pij , N )

where pij = ∂ 2 p/∂xi ∂xj together with (9) to get


$$K = \frac{\det(p_{ij}, M)}{(M, M)\,\det(p_i, p_j)}. \tag{11}$$

Take M = p1 ×p2 , the cross product. Since kp1 ×p2 k2 = det(pi , pj ), the equation
(11) can be written as
$$K = \frac{\det(p_{ij},\, p_1 \times p_2)}{(\det(p_i, p_j))^2}. \tag{12}$$
This can be expressed in terms of the fundamental forms as follows. Write
$$ds^2 = g_{ij}\,dx^i dx^j, \qquad \Pi = \Pi_{ij}\,dx^i dx^j.$$
Then
$$\Pi_{ij} = \Pi(p_i, p_j) = -(dN(p_i), p_j) = \frac{(p_{ij},\, p_1 \times p_2)}{\|p_1 \times p_2\|}, \qquad g_{ij} = (p_i, p_j).$$
Hence (11) says
$$K = \frac{\det \Pi_{ij}}{\det g_{ij}}.$$
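These determinant formulas lend themselves to symbolic computation. As an added illustration (assuming sympy; the torus is not among the worked examples of this section), the sketch below applies formula (12) to the torus x = (a + b cos s) cos t, y = (a + b cos s) sin t, z = b sin s and obtains the classical value K = cos s / (b (a + b cos s)).

```python
import sympy as sp

# Formula (12): K = det(p_ij, p1 x p2) / det(p_i, p_j)^2, for the torus.
s, t, a, b = sp.symbols('s t a b', positive=True)
p = sp.Matrix([(a + b * sp.cos(s)) * sp.cos(t),
               (a + b * sp.cos(s)) * sp.sin(t),
               b * sp.sin(s)])
u = [s, t]
pi = [p.diff(v) for v in u]        # tangent fields p_1, p_2
M = pi[0].cross(pi[1])             # normal vector p_1 x p_2

num = sp.Matrix(2, 2, lambda i, j: p.diff(u[i]).diff(u[j]).dot(M)).det()
den = sp.Matrix(2, 2, lambda i, j: pi[i].dot(pi[j])).det() ** 2
K = sp.simplify(num / den)

expected = sp.cos(s) / (b * (a + b * sp.cos(s)))
print(sp.simplify(K - expected))   # 0
```

The result is positive on the outer equator (s = 0) and negative on the inner one (s = π), as the Gauss-map picture predicts.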

2.4. 5 Example: the sphere. Let S = {p ∈ R3 | x2 + y 2 + z 2 = r2 } be the


sphere of radius r. (We do not take r = 1 in order to see how the curvature
depends on r.) We may take N = r−1 p when p is considered a vector. For any
coordinate system on S, the formula (10) becomes
$$K = \frac{\det(r^{-1}p_i, p_j)}{\det(p_i, p_j)} = r^{-2}.$$

So the sphere of radius r has constant Gauss-curvature K = 1/r2 .


2.4. 6 Example: paraboloids. Let S be the surface with equation ax2 +
by 2 − z = 0. A normal vector is the gradient of the defining function, i.e. M =
∇(ax2 + by 2 − z) = (2ax, 2by, −1). As coordinates on S we take (x1 , x2 ) = (x, y)
and write p = p(x, y) = (x, y, ax2 + by2). By formula (10),
$$K = \frac{1}{(M, M)}\cdot\frac{(M_x, p_x)(M_y, p_y) - (M_x, p_y)(M_y, p_x)}{(p_x, p_x)(p_y, p_y) - (p_x, p_y)(p_y, p_x)}$$
$$= \frac{1}{4a^2x^2 + 4b^2y^2 + 1}\cdot\frac{4ab - 0}{(1 + (2ax)^2)(1 + (2by)^2) - (2ax \cdot 2by)^2} = \frac{4ab}{(4a^2x^2 + 4b^2y^2 + 1)^2}.$$
Concerning the sign of K, note that
K > 0 if ab > 0 (elliptic paraboloid: cup or cap shaped),
K < 0 if ab < 0 (hyperbolic paraboloid: saddle shaped),
K = 0 if ab = 0 (parabolic cylinder, plane, or empty: degenerate cases).
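The computation in this example is easy to reproduce by computer algebra. The sketch below is an added check, not part of the notes; it assumes sympy, evaluates K for the paraboloid via formula (12), and compares with the closed form just derived.

```python
import sympy as sp

# Formula (12) for the paraboloid p(x, y) = (x, y, a x^2 + b y^2).
x, y, a, b = sp.symbols('x y a b', real=True)
p = sp.Matrix([x, y, a * x**2 + b * y**2])
u = [x, y]
pi = [p.diff(v) for v in u]        # tangent fields p_1, p_2
M = pi[0].cross(pi[1])             # normal vector p_1 x p_2

num = sp.Matrix(2, 2, lambda i, j: p.diff(u[i]).diff(u[j]).dot(M)).det()
den = sp.Matrix(2, 2, lambda i, j: pi[i].dot(pi[j])).det() ** 2
K = sp.simplify(num / den)

expected = 4 * a * b / (4 * a**2 * x**2 + 4 * b**2 * y**2 + 1) ** 2
print(sp.simplify(K - expected))   # 0
```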

EXERCISES 2.4
In the following problems S is a surface in R3 , (s, t) a coordinate system on S,
N a unit-normal.
1. a) Let (x1, x2) be a coordinate system on S. Show that the only nonzero
covariant components $R_{imkl} = g_{ij}R^j_{mkl}$ of the Riemann curvature tensor R are
$$R_{1212} = -R_{2112} = -R_{1221} = R_{2121}$$
and are given by
$$R_{1212} = K\,(g_{11}g_{22} - g_{12}^2)$$
in terms of the Gauss curvature K and the metric gij. [Suggestion. Use formula
(5).]
b) Write out a formula for the $R^j_{mkl}$ (one contravariant component).
c) Write out the $R^i_{mkl}$ for the sphere S = {p = (x, y, z) ∈ R3 | x2 + y2 + z2 = r2}
using the coordinates (φ, θ) on S given by x = r sin φ cos θ, y = r sin φ sin θ,
z = r cos φ. [You may use Example 2.4.5.]
2. Determine if the following statements are true or false. (Provide a proof or
a counterexample).

a) If Π = 0 everywhere, then S is a plane or a part of a plane.


b) If K = 0 everywhere, then S is a plane or a part of a plane.
3. Suppose S is given as a graph z = f (x, y).
a) Show that the second fundamental form is
$$\Pi = \frac{1}{\sqrt{1 + f_x^2 + f_y^2}}\,\big(f_{xx}\,dx^2 + f_{yy}\,dy^2 + 2f_{xy}\,dx\,dy\big).$$
b) Show that the Gauss curvature is
$$K = \frac{f_{xx}f_{yy} - f_{xy}^2}{(1 + f_x^2 + f_y^2)^2}.$$

4. Let S be a surface of revolution about the z-axis, with equation r = f (z) in


cylindrical coordinates (r, θ, z). (Assume f > 0 everywhere.) Using (θ, z) as
coordinates on S, show that the second fundamental form is
$$\Pi = \frac{1}{\sqrt{f_z^2 + 1}}\,\big(f\,d\theta^2 - f_{zz}\,dz^2\big)$$

(The subscripts z indicate derivatives with respect to z).


5. Let S be given by an equation f (x, y, z) = 0. Show that the Gauss curvature
of S is
$$K = \frac{-1}{(f_x^2 + f_y^2 + f_z^2)^2}\,\det\begin{pmatrix} f_{xx} & f_{xy} & f_{xz} & f_x \\ f_{yx} & f_{yy} & f_{yz} & f_y \\ f_{zx} & f_{zy} & f_{zz} & f_z \\ f_x & f_y & f_z & 0 \end{pmatrix}.$$
The subscripts denote partial derivatives. [Suggestion. Let M = (fx , fy , fz ),
µ = (fx2 + fy2 + fz2 )−1/2 , N = µM . Consider M as a column and dM as a 3 × 3
matrix. Show that the equation to be proved is equivalent to
 
$$K = -\det\begin{pmatrix} \mu\,dM & N \\ N^* & 0 \end{pmatrix}.$$

Split the vectors v ∈ R3 on which µdM operates as v = t + bN with t ∈ Tp S


and decompose µdM accordingly as a block matrix. You will need to remember
elementary column operations on determinants.]
6. Let C: p = p(s) be a curve in R3. The tangential developable is the surface S in
R3 swept out by the tangent lines of C, i.e. S consists of the points p = p(s, t)
given by
$$p = p(s) + t\,\dot p(s).$$
Show that the principal curvatures of S are
$$k_1 = 0, \qquad k_2 = \frac{\tau}{t\kappa}$$

where κ and τ are the curvature and the torsion of the curve C.
7. Calculate the first fundamental form ds2 , the second fundamental form Π,
and the Gauss curvature K for the following surfaces S: p = p(s, t) using the
given parameters s, t as coordinates. The letters a, b, c denote positive constants.
a) Ellipsoid of revolution: x = a cos s cos t, y = a cos s sin t, z = c sin s.
b) Hyperboloid of revolution of one sheet: x = a cosh s cos t, y = a cosh s sin t,
z = c sinh s.
c) Hyperboloid of revolution of two sheets: x = a sinh s cos t, y = a sinh s sin t,
z = c cosh s.
d) Paraboloid of revolution: x = s cos t, y = s sin t, z = s2 .
e) Circular cylinder: x = R cos t, y = R sin t, z = s.
f) Circular cone without vertex: x = s cos t, y = s sin t, z = as (s 6= 0).
g) Torus: x = (a + b cos s) cos t, y = (a + b cos s) sin t, z = b sin s.
h) Catenoid: x = cosh(s/a) cos t, y = cosh(s/a) sin t, z = s.
i) Helicoid: x = s cos t, y = s sin t, z = at.
8. Find the principal curvatures k1 , k2 at the points (±a, 0, 0) of the hyperboloid
of two sheets
$$\frac{x^2}{a^2} - \frac{y^2}{b^2} - \frac{z^2}{c^2} = 1.$$

2.5 Levi-Civita’s connection


Throughout, M denotes a manifold with a Riemann metric denoted g or ds2 .
We shall now often use the term connection for covariant derivative.
2.5. 1 Definition. A connection ∇ is said to be compatible with the Riemann
metric g if parallel transport (defined by ∇) along any curve preserves scalar
products (defined by g): for any curve p = p(t)

g(T(to → t1)u, T(to → t1)v) = g(u, v)   (1)

for all u, v ∈ Tp(to ) M , any to , t1 .


Actually it suffices that parallel transport preserve the quadratic form (square-
length) ds2 , because of the following lemma.
2.5. 2 Lemma. A linear transformation T : Tpo M → Tp1 M which preserves
the quadratic form ds2 (u) = g(u, u) also preserves the scalar product g(u, v).
Proof. This follows from the equation
$$g(u, v) = \tfrac{1}{2}\,\big(g(u + v, u + v) - g(u, u) - g(v, v)\big).$$
2.5. 3 Theorem. A connection ∇ is compatible with the Riemann metric g if
and only if for any two vector fields X(t), Y (t) along a curve c: p = p(t)

$$\frac{d}{dt}\,g(X, Y) = g\Big(\frac{\nabla X}{dt}, Y\Big) + g\Big(X, \frac{\nabla Y}{dt}\Big). \tag{2}$$
Proof. We have to show (1) ⇔ (2).
(⇒) Assume (1) holds. Let E1 (t), · · · , En (t) be a parallel frame along p(t).
Write
g(Ei (to ), Ej (to )) = cij .
Since
Ei (t) = Tc (to → t)Ei (to )
(1) gives
g(Ei (t), Ej (t)) = cij
for all t. Write
X(t) = X i (t)Ei (t), Y (t) = Y i (t)Ei (t).
Then
$$g(X, Y) = \sum_{ij} X^i Y^j\, g(E_i, E_j) = \sum_{ij} c_{ij}\, X^i Y^j.$$
So
$$\frac{d}{dt}\,g(X, Y) = \sum_{ij} c_{ij}\,\frac{dX^i}{dt}\,Y^j + \sum_{ij} c_{ij}\, X^i\,\frac{dY^j}{dt} = g\Big(\frac{\nabla X}{dt}, Y\Big) + g\Big(X, \frac{\nabla Y}{dt}\Big).$$
This proves (2).
(⇐) Assume (2) holds. Let c: p = p(t) be any curve, po = p(to ), and u, v ∈
Tp(to ) M two vectors. Let

X(t) = T (to → t)u, Y (t) = T (to → t)v.

Then
$$\frac{\nabla X}{dt} = 0, \qquad \frac{\nabla Y}{dt} = 0.$$
Thus by (2)
$$\frac{d}{dt}\,g(X, Y) = g\Big(\frac{\nabla X}{dt}, Y\Big) + g\Big(X, \frac{\nabla Y}{dt}\Big) = 0.$$
Consequently
g(X, Y ) = constant.
If we compare this constant at t = to and at t = t1 we get

g(X(to ), Y (to )) = g(X(t1 ), Y (t1 )).

This is the desired equation (1).


2.5. 4 Corollary. A connection ∇ is compatible with the Riemann metric g if
and only if for any three C ∞ vector fields X, Y, Z

DZ g(X, Y ) = g(∇Z X, Y ) + g(X, ∇Z Y ) (3)

where generally DZ f = df (Z).


Proof. At a given point po , both sides of (3) depend only on the value Z(po ) of
Z at po and on values of X and Y along some curve c: p = p(t) with p(to ) = po ,
p0 (to ) = Z(po ). If we apply (2) to such a curve and set t = to we find that (3)
holds at po . Since po is arbitrary, (3) holds at all points.
2.5. 5 Theorem (Levi-Civita). For any Riemannian metric g there is a
unique symmetric connection ∇ that is compatible with g.
Proof. (uniqueness) Assume ∇ is symmetric and compatible with g. Fix a
coordinate system (xi ). From (3):

∂i g(∂j , ∂k ) = g(∇i ∂j , ∂k ) + g(∂j , ∇i ∂k ). (4)

This may be written as
$$g_{jk,i} = \Gamma_{k,ji} + \Gamma_{j,ki} \tag{4'}$$
where
gjk,i = ∂i gjk = ∂i g(∂j , ∂k )
and
Γk,ji = g(∇i ∂j , ∂k ) = g(Γlji ∂l , ∂k )

i.e.
Γk,ji = gkl Γlji . (5)
Since ∇ is symmetric
Γlji = Γlij
and therefore
Γk,ji = Γk,ij . (6)
Equations (4′) and (6) may be solved for the Γ's as follows. Write out the
equations corresponding to (4′) with the cyclic permutations of the indices: (ijk),
(kij), (jki). Add the last two and subtract the first. This gives (together with
(6)):

gki,j + gij,k − gjk,i = (Γi,kj + Γk,ij ) + (Γj,ik + Γi,jk ) − (Γk,ji + Γj,ki ) = 2Γi,jk .

Therefore
$$\Gamma_{i,kj} = \tfrac{1}{2}\,\big(g_{ki,j} + g_{ij,k} - g_{jk,i}\big). \tag{7}$$
Use the inverse matrix (g ab ) of (gij ) to solve (5) and (7) for Γljk :

$$\Gamma^l_{jk} = \tfrac{1}{2}\,g^{li}\,\big(g_{ki,j} + g_{ij,k} - g_{jk,i}\big). \tag{8}$$
This proves that the Γ’s, and therefore the connection ∇ are uniquely deter-
mined by the gij , i.e. by the metric g.
(Existence). Given g, choose a coordinate system (xi ) and define ∇ to be the
(symmetric) connection which has Γljk given by (8) relative to this coordinate
system. If one defines Γk,ji by (5) one checks that (4′) and therefore (4) holds.
This implies that (3) holds, since any vector field is a linear combination of the
∂i . So ∇ is compatible with the metric g. That ∇ is symmetric is clear from
(8).
2.5. 6 Definitions. (a) The unique connection ∇ compatible with a given
Riemann metric g is called the Levi-Civita connection of g.
(b) The Γk,ij and the Γljk defined by (7) and (8) are called Christoffel symbols
of the first and second kind, respectively. Sometimes the following notation is
used (sometimes with other conventions concerning the positions of the entries
in the symbols):
$$[jk, i] = \Gamma_{i,kj} = \tfrac{1}{2}\,\big(g_{ki,j} + g_{ij,k} - g_{jk,i}\big), \tag{9}$$
$$\Big\{{l \atop jk}\Big\} = \Gamma^l_{jk} = \tfrac{1}{2}\,g^{li}\,\big(g_{ki,j} + g_{ij,k} - g_{jk,i}\big). \tag{10}$$
These equations are called Christoffel’s formulas.
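Christoffel's formula (10) translates directly into a few lines of computer algebra. The sketch below is an added illustration, not part of the notes; it assumes sympy and computes the symbols for the round sphere of radius r, ds² = r²dφ² + r²sin²φ dθ², recovering Γ^1_22 = −sin φ cos φ and Γ^2_12 = cot φ.

```python
import sympy as sp

# Christoffel's formula (10) for ds^2 = r^2 dphi^2 + r^2 sin^2(phi) dtheta^2.
phi, theta, r = sp.symbols('phi theta r', positive=True)
x = [phi, theta]
g = sp.diag(r**2, r**2 * sp.sin(phi)**2)
ginv = g.inv()

def christoffel(l, j, k):
    """Gamma^l_{jk} = (1/2) g^{li} (g_{ki,j} + g_{ij,k} - g_{jk,i})."""
    return sp.simplify(sp.Rational(1, 2) * sum(
        ginv[l, i] * (g[k, i].diff(x[j]) + g[i, j].diff(x[k]) - g[j, k].diff(x[i]))
        for i in range(2)))

print(sp.simplify(christoffel(0, 1, 1) + sp.sin(phi) * sp.cos(phi)))   # 0
print(sp.simplify(christoffel(1, 0, 1) - sp.cos(phi) / sp.sin(phi)))   # 0
```

Note that r drops out, as it must: rescaling a metric by a constant leaves the Christoffel symbols unchanged.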



Riemann metric and curvature. The basic fact is this.

2.5. 7 Theorem (Riemann). If the curvature tensor R of the Levi-Civita


connection of a Riemann metric is R = 0, then locally there is a coordinate
system (xi ) on M such that gij = ±δij , i.e. with respect to this coordinate
system (xi ), ds2 becomes a pseudo-Euclidean metric
$$ds^2 = \sum_i \pm\,(dx^i)^2.$$

The coordinate system is unique up to a pseudo-Euclidean transformation: any


other such coordinate system (x̃j) is of the form
$$\tilde x^j = \tilde x^j_o + c^j_i\, x^i$$
for some constants $\tilde x^j_o$, $c^j_i$ with $\sum_{lk} g_{lk}\, c^l_i c^k_j = g_{ij}$.

Proof omitted.
Geodesics of the Levi-Civita connection. We have two notions of geodesic
on M : (1) the geodesics of the Riemann metric g, defined as “shortest lines”,
characterized by the differential equation

$$\frac{d}{dt}\Big(g_{br}\,\frac{dx^b}{dt}\Big) - \frac{1}{2}\,g_{cd,r}\,\frac{dx^c}{dt}\,\frac{dx^d}{dt} = 0 \quad \text{(for all } r\text{)}. \tag{11}$$
(2) the geodesics of the Levi-Civita connection ∇, defined as "straightest lines",
characterized by the differential equation

$$\frac{d^2x^k}{dt^2} + \Gamma^k_{ij}\,\frac{dx^i}{dt}\,\frac{dx^j}{dt} = 0 \quad \text{(for all } k\text{)}. \tag{12}$$
The following theorem says that these two notions of “geodesic“ coincide, so
that we can simply speak of “geodesics”.
2.5. 8 Theorem. The geodesics of the Riemann metric are the same as the
geodesics of the Levi-Civita connection.
Proof. We have to show that (11) is equivalent to (12) if we set
$$\Gamma^k_{ij} = \tfrac{1}{2}\,g^{kl}\,\big(g_{jl,i} + g_{li,j} - g_{ij,l}\big).$$
This follows by direct calculation.
We know from Proposition 5.9 that geodesics of the metric have constant
speed, hence so do the geodesics of the Levi-Civita connection; but this is also
clear directly, since the velocity vector of a geodesic is parallel and parallel
transport preserves length.
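The "direct calculation" behind Theorem 2.5.8 can be carried out symbolically in an example. For the unit sphere with ds² = dφ² + sin²φ dθ², the sketch below (an added illustration assuming sympy, not part of the notes) checks that the left-hand side of (11) is exactly the g-contraction of the left-hand side of (12), so the two equations have the same solutions.

```python
import sympy as sp

t = sp.symbols('t')
phi = sp.Function('phi')(t)
theta = sp.Function('theta')(t)
x = [phi, theta]
g = sp.diag(1, sp.sin(phi)**2)   # unit sphere metric

# Left-hand sides of (12): x''^k + Gamma^k_ij x'^i x'^j.
E = [phi.diff(t, 2) - sp.sin(phi) * sp.cos(phi) * theta.diff(t)**2,
     theta.diff(t, 2) + 2 * sp.cos(phi) / sp.sin(phi) * phi.diff(t) * theta.diff(t)]

# Left-hand sides of (11): d/dt(g_br x'^b) - (1/2) g_cd,r x'^c x'^d.
def lhs11(r):
    term = sum(g[b, r] * x[b].diff(t) for b in range(2)).diff(t)
    term -= sp.Rational(1, 2) * sum(
        g[c, d].diff(x[r]) * x[c].diff(t) * x[d].diff(t)
        for c in range(2) for d in range(2))
    return term

checks = [sp.simplify(lhs11(r) - sum(g[b, r] * E[b] for b in range(2)))
          for r in range(2)]
print(checks)   # [0, 0]
```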
Riemannian submanifolds.
2.5. 9 Definition. Let S be an m–dimensional submanifold of a manifold M
with a Riemann metric g. Assume the restriction g S of g to tangent vectors
to S is non-degenerate, i.e. if {v1 , · · · , vm } is a basis for Tp S ⊂ Tp M , then

det g(vi, vj) ≠ 0. Then gS is called the induced Riemannian metric on S and


S is called a Riemannian submanifold of M, g. This is always assumed if we
speak of an induced Riemann metric gS on S. It is automatic if the metric g is
positive definite.
2.5. 10 Lemma. Let S be an m-dimensional Riemannian submanifold of M and p ∈ S. Every vector v ∈ T_pM can be uniquely written in the form v = v^S + v^N where v^S is tangential to S and v^N is orthogonal to S, i.e.

T_pM = T_pS ⊕ T_p^⊥S

where T_p^⊥S = {v ∈ T_pM : g(w, v) = 0 for all w ∈ T_pS}.
Proof. Let (e_1, · · · , e_m, · · · , e_n) be a basis for T_pM whose first m vectors form a basis for T_pS. Write v = x^i e_i. Then v ∈ T_p^⊥S iff g(e_1, e_i)x^i = 0, · · · , g(e_m, e_i)x^i = 0. This is a system of m equations in the n unknowns x^i which is of rank m, because det_{i,j≤m} g(e_i, e_j) ≠ 0. Hence it has n − m linearly independent solutions and no nonzero solution belongs to span{e_i : i ≤ m} = T_pS. Hence dim T_p^⊥S = n − m, dim T_pS = m, and T_p^⊥S ∩ T_pS = 0. This implies the assertion.
The components in the splitting v = v^S + v^N of the lemma are called the tangential and normal components of v, respectively.
2.5. 11 Theorem. Let S be an m-dimensional submanifold of a manifold M with a Riemann metric g, g_S the induced Riemann metric on S. Let ∇ be the Levi-Civita connection of g, ∇^S the Levi-Civita connection of g_S. Then for any two vector fields X, Y on S,

∇^S_Y X = tangential component of ∇_Y X   (9)

Proof. First define a connection on S by (9). To show that this connection is symmetric we use Theorem 2.3.2 and the notation introduced there:

(∇^S/∂s)(∂p/∂t) − (∇^S/∂t)(∂p/∂s) = ( (∇/∂s)(∂p/∂t) − (∇/∂t)(∂p/∂s) )^S = 0

since the Levi-Civita connection ∇ on M is symmetric. It remains to show that ∇^S is compatible with g_S. For this we use Lemma 2.5.4. Let X, Y, Z be three vector fields on S. Considered as vector fields on M along S they satisfy

D_Z g(X, Y) = g(∇_Z X, Y) + g(X, ∇_Z Y).

Since X, Y are already tangential to S, this equation may be written as

D_Z g_S(X, Y) = g_S(∇^S_Z X, Y) + g_S(X, ∇^S_Z Y).

It follows that ∇^S is indeed the Levi-Civita connection of g_S.


Example: surfaces in R³. Let S be a surface in R³. Recall that the Gauss curvature K is defined by K = det dN where N is a unit normal vector field and dN is considered as a linear transformation of T_pS. It is related to the Riemann curvature R of the connection ∇^S by the formula

(R(u, v)w, z) = K det [ (w,u) (w,v) ; (z,u) (z,v) ].

The theorem implies that ∇^S, and hence R, is completely determined by the Riemann metric g_S on S, which is Gauss's theorema egregium.
We shall now discuss some further properties of geodesics on a Riemann manifold. The main point is what is called Gauss's Lemma, which can take any one of several forms to be given below, but we start with some preliminaries.
2.5. 12 Definition. Let S be a non-degenerate submanifold of M. The set NS ⊂ TM of all normal vectors to S is called the normal bundle of S:

NS = {v ∈ T_pM : p ∈ S, v ∈ T_pS^⊥}

where T_pS^⊥ is the orthogonal complement of T_pS in T_pM with respect to the metric on M.

The Riemannian submanifold S will remain fixed from now on and N will denote
its normal bundle.
2.5. 13 Lemma. N is an n-dimensional submanifold of the tangent bundle TM.
Proof. In a neighbourhood of any point of S we can find an orthonormal family E_1, · · · , E_n of vector fields on M, so that at points p in S the first m = dim S of them form a basis for T_pS (with the help of the Gram–Schmidt process, for example). Set ξ^i(v) = g(E_i, v), a function on TM. We may also choose a coordinate system (x^1, · · · , x^n) so that S is locally given by x^{m+1} = 0, · · · , x^n = 0. The 2n functions (x^i, ξ^i) form a coordinate system on TM so that N is given by x^{m+1} = 0, · · · , x^n = 0, ξ^1 = 0, · · · , ξ^m = 0, as required by the definition of "submanifold".
The set N comes equipped with a map N → S which maps a normal vector v ∈ T_pS^⊥ to its base-point p ∈ S. We can think of S as a subset of N, the zero-section, which consists of the zero-vectors 0_p at points p of S.
2.5. 14 Proposition. The geodesic spray exp : N → M is locally bijective around any point of S.
Proof. We use the inverse function theorem. For this we have to calculate the differential of exp along the zero section S in N. Fix p_o ∈ S. Then S and T_{p_o}S^⊥ ⊂ T_{p_o}M are two submanifolds of N whose tangent spaces T_{p_o}S and T_{p_o}S^⊥ are orthogonal complements in T_{p_o}M (g is non-degenerate on T_{p_o}S). We have to calculate d exp_{p_o}(w). We consider two cases.
(1) w ∈ T_{p_o}S is the tangent vector of a curve p(t) in S;
(2) w ∈ T_{p_o}S^⊥ is the tangent vector of the straight line tw in T_{p_o}S^⊥.
Then we find

(1) d exp_{p_o}(w) = (d/dt)|_{t=0} exp(0_{p(t)}) = (d/dt)|_{t=0} p(t) = w ∈ T_{p_o}S,
(2) d exp_{p_o}(w) = (d/dt)|_{t=0} exp(tw) = w ∈ T_{p_o}S^⊥.

Thus d exp_{p_o} w = w in either case, hence d exp_{p_o} has full rank n = dim N = dim M.
For any c ∈ R, let

N_c = {v ∈ N | g(v, v) = c}.

At a given point p ∈ S, the vectors in N_c at p form a "sphere" (in the sense of the metric g) in the normal space T_pS^⊥. Let S_c = exp N_c be the image of N_c under the geodesic spray exp, called a normal tube around S in M; it may be thought of as the points at constant distance √c from S, at least if the metric g is positive definite.
We shall need the following observation.
2.5. 15 Lemma. Let t → p(s, t) be a family of geodesics depending on a parameter s. Assume they all have the same (constant) speed independent of s, i.e. g(∂p/∂t, ∂p/∂t) = c is independent of s. Then g(∂p/∂s, ∂p/∂t) is constant along each geodesic.
Proof. Because of the constant speed,

0 = (∂/∂s) g(∂p/∂t, ∂p/∂t) = 2 g( (∇/∂s)(∂p/∂t), ∂p/∂t ).

Hence

(∂/∂t) g(∂p/∂s, ∂p/∂t) = g( (∇/∂t)(∂p/∂s), ∂p/∂t ) + g( ∂p/∂s, (∇/∂t)(∂p/∂t) ) = 0 + 0,

the first 0 because of the symmetry of ∇ and the previous equality, the second 0 because t → p(s, t) is a geodesic.
We now return to S and N .
2.5. 16 Gauss Lemma (Version 1). The geodesics through a point of S with initial velocity orthogonal to S meet the normal tubes S_c around S orthogonally.
Proof. A curve in S_c = exp N_c is of the form exp_{p(s)} v(s) where p(s) ∈ S and v(s) ∈ T_{p(s)}S^⊥. Let t → p(s, t) = exp_{p(s)} tv(s) be the geodesic with initial velocity v(s). Since v(s) ∈ N_c these geodesics have speed independent of s: g(∂p/∂t, ∂p/∂t) = g(v(s), v(s)) = c. Hence the lemma applies. Since (∂p/∂t)_{t=0} = v(s) ∈ T_{p(s)}S^⊥ is perpendicular to (∂p/∂s)_{t=0} = dp(s)/ds ∈ T_{p(s)}S at t = 0, the lemma says that ∂p/∂t remains perpendicular to ∂p/∂s for all t. On the other hand, p(s, 1) = exp v(s) lies in S_c, so ∂p/∂s is tangential to S_c at t = 1. Since all tangent vectors to S_c = exp N_c are of this form we get the assertion.
2.5. 17 Gauss Lemma (Version 2). Let p be any point of M. The geodesics through p meet the spheres S_c in M centered at p orthogonally.
Proof. This is the special case of the previous lemma when S = {p} reduces to
a single point.

Fig. 1. Gauss’s Lemma for a sphere

2.5. 18 Remark. The spheres around p are by definition the images of the "spheres" g(v, v) = c in T_pM under the geodesic spray. If the metric is positive definite, these are really the points at constant distance from p, as follows from the definition of the geodesics of a Riemann metric. For example, when M = S² and p is the north pole, the "spheres" centered at p are the circles of latitude φ = constant and the geodesics through p are the circles of longitude θ = constant.

EXERCISES 2.5
1. Complete the proof of Theorem 2.5.10.
2. Let g be a Riemann metric, ∇ the Levi-Civita connection of g. Consider the Riemann metric g = (g_{ij}) as a (0, 2)-tensor. Prove that ∇g = 0, i.e. ∇_v g = 0 for any vector v. [Suggestion. Let X, Y, Z be vector fields and consider D_Z(g(X, Y)). Use the axioms defining the covariant derivative of arbitrary tensor fields.]
3. Use Christoffel's formulas to calculate the Christoffel symbols Γ_{ij,k}, Γ^k_{ij} in spherical coordinates (ρ, θ, φ): x = ρ cos θ sin φ, y = ρ sin θ sin φ, z = ρ cos φ. [Euclidean metric ds² = dx² + dy² + dz² on R³. Use ρ, θ, φ as indices i, j, k rather than 1, 2, 3.]
4. Use Christoffel’s formulas to calculate the Γkij for the metric

ds2 = (dx1 )2 + [(x2 )2 − (x1 )2 ](dx2 )2 .

5. Let S be an m-dimensional submanifold of a Riemannian manifold M. Let (x^1, · · · , x^m) be a coordinate system on S and write points of S as p = p(x^1, · · · , x^m). Use Christoffel's formula (10) to show that the Christoffel symbols Γ^k_{ij} of the induced Riemann metric ds² = g_{ij} dx^i dx^j on S are given by

Γ^k_{ij} = g^{kl} g( (∇/∂x^i)(∂p/∂x^j), ∂p/∂x^l ).

[The inner product g and the covariant derivative ∇ on the right are taken on M, and g^{kl} g_{lj} = δ^k_j.]
6. a) Let M, g be a manifold with a Riemann metric, f : M → M an isometry. Suppose there is a p_o ∈ M and a basis (v_1, · · · , v_n) of T_{p_o}M so that f(p_o) = p_o and df_{p_o}(v_i) = v_i, i = 1, · · · , n. Show that f(p) = p for all points p in a neighbourhood of p_o. [Suggestion: consider geodesics.]
b) Show that every isometry of the sphere S² is given by an orthogonal linear transformation of R³. [Suggestion. Fix p_o ∈ S² and an orthonormal basis v_1, v_2 for T_{p_o}S². Let A be an isometry of S². Apply (a) to f = B⁻¹A where B is a suitably chosen orthogonal transformation.]
7. Let S be a 2–dimensional manifold with a positive definite Riemannian
metric ds2 . Show that around any point of S one can introduce an orthogonal
coordinate system, i.e. a coordinate system (u, v) so that the metric takes the
form
ds2 = a(u, v)du2 + b(u, v)dv 2 .
[Suggestion: use Version 2 of Gauss’s Lemma.]
8. Prove that (4) implies (3).
The following problems deal with some aspects of the question to what extent
the connection determines the Riemann metric by the requirement that they be
compatible (as in Definition 2.5.1).
9. Show that a Riemann metric g on Rn is compatible with the standard
connection D with zero components Γkij = 0 in the Cartesian coordinates (xi ),
if and only if g has constant components gij = cij in the Cartesian coordinates
(xi ).
10. Start with a connection ∇ on M. Suppose g and g′ are two Riemann metrics compatible with ∇. (a) Show that if g and g′ agree at a single point p_o then they agree in a whole neighbourhood of p_o. [Roughly: the connection determines the metric in terms of its value at a single point.]
(b) Suppose g and g′ have the same signature, i.e. they have the same number of ± signs if expressed in terms of orthonormal bases (not necessarily the same for both) at a given point p_o (see remark (3) after Definition 5.3). Show that in any coordinate system (x^i), the form g_{ij} dx^i dx^j can be (locally) transformed into g′_{ij} dx^i dx^j by a linear transformation with constant coefficients x^i → c^i_j x^j. [Roughly: the connection determines the metric up to a linear coordinate transformation.]

2.6 Curvature identities


Let M be a manifold with Riemannian metric g, ∇ its Levi-Civita connection.
For brevity, write (u, v) = g(u, v) for the scalar-product.
2.6. 1 Theorem (Curvature Identities). For any vectors u, v, w, a, b at a point p ∈ M:
1. R(u, v) = −R(v, u)
2. (R(u, v)a, b) = −(a, R(u, v)b)   [R(u, v) is skew-adjoint]
3. R(u, v)w + R(w, u)v + R(v, w)u = 0   [Bianchi's Identity]
4. (R(u, v)a, b) = (R(a, b)u, v)
Proof. Let p = p(s, t) be a surface in M, U = ∂p/∂s, V = ∂p/∂t (for any values of s, t). Let X = X(s, t) be a C^∞ vector field along p = p(s, t). Then

R(U, V)X = (∇/∂t)(∇X/∂s) − (∇/∂s)(∇X/∂t).   (*)
1. This is clear from (*).
2. Let A = A(s, t), B = B(s, t) be C^∞ vector fields along p = p(s, t). Compute

(∂/∂s)(A, B) = (∇A/∂s, B) + (A, ∇B/∂s),

(∂/∂t)(∂/∂s)(A, B) = ( (∇/∂t)(∇A/∂s), B ) + ( ∇A/∂s, ∇B/∂t ) + ( ∇A/∂t, ∇B/∂s ) + ( A, (∇/∂t)(∇B/∂s) ).

Interchange s and t, subtract, and use (*):

0 = (R(U, V)A, B) + (A, R(U, V)B).

3. This time take p = p(s, t, r) to be a C^∞ function of three scalar variables (s, t, r). Let

U = ∂p/∂s,  V = ∂p/∂t,  W = ∂p/∂r.

Then

R(U, V)W = (∇/∂t)(∇/∂s)(∂p/∂r) − (∇/∂s)(∇/∂t)(∂p/∂r).

In this equation, permute (s, t, r) cyclically and add (using (∇/∂t)(∂p/∂r) = (∇/∂r)(∂p/∂t) etc.) to find the desired formula.
4. This follows from 1–3 by a somewhat tedious calculation.
By definition

R(u, v)w = R^i_{qkl} u^k v^l w^q ∂_i.

The components of R with respect to a coordinate system (x^i) are defined by the equation

R(∂_k, ∂_l)∂_q = R^i_{qkl} ∂_i.

We also set

(R(∂_k, ∂_l)∂_q, ∂_j) = R^i_{qkl} g_{ij} = R_{jqkl}.

2.6. 2 Corollary. The components of R with respect to any coordinate system satisfy the following identities.
1. R^a_{bcd} = −R^a_{bdc}
2. R_{abcd} = −R_{bacd}
3. R^a_{bcd} + R^a_{dbc} + R^a_{cdb} = 0
4. R_{abcd} = R_{dcba}
Proof. This follows immediately from the theorem.
2.6. 3 Theorem (Jacobi's Equation). Let p = p_s(t) be a one-parameter family of geodesics (s = parameter). Then

(∇²/∂t²)(∂p/∂s) = R(∂p/∂s, ∂p/∂t)(∂p/∂t).

Proof.

(∇²/∂t²)(∂p/∂s) = (∇/∂t)(∇/∂s)(∂p/∂t)   [symmetry of ∇]
= (∇/∂t)(∇/∂s)(∂p/∂t) − (∇/∂s)(∇/∂t)(∂p/∂t)   [(∇/∂t)(∂p/∂t) = 0 since p_s(t) is a geodesic]
= R(∂p/∂s, ∂p/∂t)(∂p/∂t).
Fig. 1. A family of geodesics

2.6. 4 Definition. For any two vectors v, w, Ric(v, w) is the trace of the linear transformation u → R(u, v)w.
In coordinates:

R(u, v)w = R^i_{qkl} u^k v^l w^q ∂_i,
Ric(v, w) = R^k_{qkl} v^l w^q.

Thus Ric is a tensor obtained by a contraction of R:

(Ric)_{ql} = R^k_{qkl}.

One also writes (Ric)_{ql} = R_{ql}.



2.6. 5 Theorem. The Ricci tensor is symmetric, i.e. Ric(v, w) = Ric(w, v) for all vectors v, w.
Proof. Fix v, w and take the trace of Bianchi's identity, considered as a linear transformation of u:

tr{u → R(u, v)w} + tr{u → R(v, w)u} + tr{u → R(w, u)v} = 0.

Since u → R(v, w)u is a skew-adjoint transformation, tr{u → R(v, w)u} = 0. Since R(w, u) = −R(u, w), tr{u → R(w, u)v} = −tr{u → R(u, w)v}. Thus tr{u → R(u, v)w} = tr{u → R(u, w)v}, i.e. Ric(v, w) = Ric(w, v).
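The contraction and the role of the curvature identities can be checked numerically. The sketch below (my own construction; g is taken to be the Euclidean metric so that raising an index does nothing) builds a random tensor with identities 1, 2 and 4 of Corollary 2.6.2 and verifies that its contraction is symmetric.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n, n, n))

# impose identities 2 and 1: antisymmetry in the first and in the last index pair
R = A - A.transpose(1, 0, 2, 3)
R = R - R.transpose(0, 1, 3, 2)
# impose the pair symmetry of identity 4 (here in the equivalent form R_abcd = R_cdab)
R = R + R.transpose(2, 3, 0, 1)

# contract: (Ric)_ql = R^k_qkl; with g = identity, raising an index does nothing
Ric = np.einsum('kqkl->ql', R)
assert np.allclose(Ric, Ric.T)   # the Ricci tensor is symmetric
```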
Chapter 3

Calculus on manifolds


3.1 Differential forms

By definition, a covariant tensor T at p ∈ M is a quantity which relative to a coordinate system (x^i) around p is represented by an expression

T = Σ T_{ij···} dx^i ⊗ dx^j ⊗ · · ·

with real coefficients T_{ij···} and with the differentials dx^i being taken at p. The tensor need not be homogeneous, i.e. the sum may involve (0, k)-tensors for different values of k. Such expressions are added and multiplied in the natural way, but one must be careful to observe that multiplication of the dx^i is not commutative, nor does it satisfy any other relation besides the associative law and the distributive law. Covariant tensors at p can be thought of as purely formal algebraic expressions of this type. Differential forms are obtained by the same sort of construction if one imposes in addition the rule that the dx^i anticommute. This leads to the following definition.
3.1. 1 Definition. A differential form ω at p ∈ M is a quantity which relative to a coordinate system (x^i) around p is represented by a formal expression

ω = Σ f_{ij···} dx^i ∧ dx^j ∧ · · ·   (1)

with real coefficients f_{ij···} and with the differentials dx^i being taken at p. Such expressions are added and multiplied in the natural way but subject to the relation

dx^i ∧ dx^j = −dx^j ∧ dx^i.   (2)

If all the wedge products dx^i ∧ dx^j ∧ · · · in (1) contain exactly k factors, then ω is said to be homogeneous of degree k and is called a k-form at p. A differential form on M (also simply called a form) associates to each p ∈ M a form at p. The form is of class C^∞ if its coefficients f_{ij···} (relative to any coordinate system) are C^∞ functions of p. This will always be assumed to be the case.
Remarks. (1) The definition means that we consider as identical expressions (1) which can be obtained from each other using the relation (2), possibly repeatedly or in conjunction with the other rules of addition and multiplication. For example, since dx^i ∧ dx^i = −dx^i ∧ dx^i (any i) one finds that 2dx^i ∧ dx^i = 0, so dx^i ∧ dx^i = 0. The expressions for ω in two coordinate systems (x^i), (x̃^i) are related by the substitutions x^i = f^i(x̃^1, · · · , x̃^n), dx^i = (∂f^i/∂x̃^j)dx̃^j on the intersection of the coordinate domains.
(2) By definition, the k-fold wedge product dx^i ∧ dx^j ∧ · · · transforms like the (0, k)-tensor dx^i ⊗ dx^j ⊗ · · · , but is alternating in the differentials dx^i, dx^j, i.e. changes sign if two adjacent differentials are interchanged.
(3) Every differential form is uniquely a sum of homogeneous differential forms. For this reason one can restrict attention to forms of a given degree, except that the wedge product of a k-form and an l-form is a (k + l)-form.

3.1. 2 Example. In polar coordinates x = r cos θ, y = r sin θ,

dx = cos θ dr − r sin θ dθ,  dy = sin θ dr + r cos θ dθ,
dx ∧ dy = (cos θ dr − r sin θ dθ) ∧ (sin θ dr + r cos θ dθ)
= cos θ sin θ dr ∧ dr + r cos²θ dr ∧ dθ − r sin²θ dθ ∧ dr − r² sin θ cos θ dθ ∧ dθ
= r dr ∧ dθ.
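The coefficient r appearing here is just the Jacobian determinant of the coordinate change. A quick numerical check (a sketch, with a sample point of my own choosing):

```python
import numpy as np

def xy(r, t):
    """Cartesian coordinates from polar ones."""
    return np.array([r*np.cos(t), r*np.sin(t)])

r0, t0, h = 1.7, 0.6, 1e-6
# numerical Jacobian matrix d(x,y)/d(r,theta) at the sample point
J = np.column_stack([(xy(r0 + h, t0) - xy(r0 - h, t0)) / (2*h),
                     (xy(r0, t0 + h) - xy(r0, t0 - h)) / (2*h)])
assert abs(np.linalg.det(J) - r0) < 1e-8   # so dx ∧ dy = r dr ∧ dθ
```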
3.1. 3 Lemma. Let ω be a differential k-form, (x^i) a coordinate system. Then ω can be uniquely represented in the form

Σ f_{ij···} dx^i ∧ dx^j ∧ · · ·   (i < j < · · · )   (3)

with indices in increasing order.
Proof. It is clear that ω can be represented in this form, since the differentials dx^i may be reordered at will at the expense of introducing some changes of sign. Uniqueness is equally obvious: if two expressions (1) can be transformed into each other using (2), then the terms involving a given set of indices {i, j, · · · } must be equal, since no new indices are introduced using (2). Since the indices are ordered i < j < · · · there is only one such term in either expression, and these must then be equal.
3.1. 4 Definition. A (0, k)-tensor T_{ij···} dx^i ⊗ dx^j ⊗ · · · is alternating if its components change sign when two adjacent indices are interchanged: T_{···ij···} = −T_{···ji···} for all k-tuples of indices (· · · , i, j, · · · ). Equivalently,

T(· · · , v, w, · · · ) = −T(· · · , w, v, · · · )

for all k-tuples of vectors (· · · , v, w, · · · ).
3.1. 5 Remark. Under a general permutation σ of the indices, the components of an alternating tensor are multiplied by the sign ±1 of the permutation: T_{σ(i)σ(j)···} = sgn(σ) T_{ij···}.
3.1. 6 Lemma. There is a one-to-one correspondence between differential k-forms and alternating (0, k)-tensors so that the form f_{ij···} dx^i ∧ dx^j ∧ · · · (i < j < · · · ) corresponds to the tensor T_{ij···} dx^i ⊗ dx^j ⊗ · · · defined by
(1) T_{ij···} = f_{ij···} if i < j < · · · , and
(2) T_{ij···} changes sign when two adjacent indices are interchanged.
Proof. As noted above, every k-form can be written uniquely as

f_{ij···} dx^i ∧ dx^j ∧ · · ·   (i < j < · · · )

where the sum goes over ordered k-tuples i < j < · · · . It is also clear that an alternating (0, k)-tensor T_{ij···} dx^i ⊗ dx^j ⊗ · · · is uniquely determined by its components

T_{ij···} = f_{ij···},   i < j < · · · ,

indexed by ordered k-tuples. Hence this formula does give a one-to-one correspondence. The fact that the T_{ij···} defined in this way transform like a (0, k)-tensor follows from the transformation law of the dx^i.
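The correspondence of the lemma is easy to implement: given the coefficients f_{ij···} on increasing k-tuples, fill in all components with the appropriate signs. A sketch (the helper names are my own):

```python
from itertools import permutations
import numpy as np

def perm_sign(p):
    """Sign of a permutation, computed by counting inversions."""
    return (-1) ** sum(p[i] > p[j] for i in range(len(p)) for j in range(i + 1, len(p)))

def alternating_tensor(f, k, n):
    """f maps increasing k-tuples to coefficients f_{ij...};
    returns the corresponding alternating (0,k)-tensor as an array."""
    T = np.zeros((n,) * k)
    for idx, coeff in f.items():
        for p in permutations(range(k)):
            T[tuple(idx[q] for q in p)] = perm_sign(p) * coeff
    return T

# the 2-form 2 dx^0 ∧ dx^1 + 3 dx^0 ∧ dx^2 on R^3:
T = alternating_tensor({(0, 1): 2.0, (0, 2): 3.0}, k=2, n=3)
assert T[0, 1] == 2.0 and T[1, 0] == -2.0   # components change sign under index swap
assert T[0, 2] == 3.0 and T[2, 0] == -3.0
```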
3.1. 7 Example: forms on R³.
(1) The 1-form A dx + B dy + C dz is a covector, as we know.
(2) The differential 2-form P dy ∧ dz + Q dz ∧ dx + R dx ∧ dy corresponds to the (0, 2)-tensor T with components T_{yz} = −T_{zy} = P, T_{zx} = −T_{xz} = Q, T_{xy} = −T_{yx} = R. This is just the tensor

P(dy ⊗ dz − dz ⊗ dy) + Q(dz ⊗ dx − dx ⊗ dz) + R(dx ⊗ dy − dy ⊗ dx).

If we write v = (P, Q, R), then T(a, b) = v · (a × b) for all vectors a, b.
(3) The differential 3-form D dx ∧ dy ∧ dz corresponds to the (0, 3)-tensor T with components D ε_{ijk} where ε_{ijk} = sign of the permutation (ijk) of (xyz). (We use (xyz) as indices rather than (123).) Thus every 3-form in R³ is a multiple of the form dx ∧ dy ∧ dz. This form we know: for three vectors u = (u^i), v = (v^j), w = (w^k), ε_{ijk} u^i v^j w^k = det[u, v, w]. Thus T(u, v, w) = D det[u, v, w].
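The identity T(a, b) = v · (a × b) in (2) is easy to verify numerically; a sketch with sample values of my own choosing (indices x, y, z = 0, 1, 2):

```python
import numpy as np

P, Q, R = 2.0, -1.0, 0.5
v = np.array([P, Q, R])

# the alternating (0,2)-tensor of P dy∧dz + Q dz∧dx + R dx∧dy
T = np.zeros((3, 3))
T[1, 2], T[2, 1] = P, -P
T[2, 0], T[0, 2] = Q, -Q
T[0, 1], T[1, 0] = R, -R

a = np.array([1.0, 2.0, 3.0])
b = np.array([-1.0, 0.5, 4.0])
assert np.isclose(a @ T @ b, v @ np.cross(a, b))   # T(a, b) = v . (a x b)
```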
3.1. 8 Remark. (a) We shall now identify k-forms with alternating (0, k)-tensors. This identification can be expressed by the formula

dx^{i_1} ∧ · · · ∧ dx^{i_k} = Σ_{σ∈S_k} sgn(σ) dx^{i_{σ(1)}} ⊗ · · · ⊗ dx^{i_{σ(k)}}   (4)

The sum runs over the group S_k of all permutations (1, · · · , k) → (σ(1), · · · , σ(k)); sgn(σ) = ±1 is the sign of the permutation σ. The multiplication of forms then takes on the following form. Write k = p + q and let S_p × S_q be the subgroup of the group S_k which permutes the indices {1, · · · , p} and {p+1, · · · , p+q} among themselves. Choose a set of coset representatives [S_{p+q}/S_p × S_q] for the quotient so that every element σ ∈ S_{p+q} can be uniquely written as σ = τσ′σ″ with τ ∈ [S_{p+q}/S_p × S_q] and (σ′, σ″) ∈ S_p × S_q. If in (4) one first performs the sum over (σ′, σ″) and then over τ one finds the formidable formula

(dx^{i_1} ∧ · · · ∧ dx^{i_p}) ∧ (dx^{i_{p+1}} ∧ · · · ∧ dx^{i_{p+q}})
= Σ_{τ∈[S_{p+q}/S_p×S_q]} sgn(τ)(dx^{i_{τ(1)}} ∧ · · · ∧ dx^{i_{τ(p)}}) ⊗ (dx^{i_{τ(p+1)}} ∧ · · · ∧ dx^{i_{τ(p+q)}})   (5)

For [S_{p+q}/S_p × S_q] one can take (for example) the τ ∈ S_{p+q} satisfying

τ(1) < · · · < τ(p) and τ(p + 1) < · · · < τ(p + q).

(These τ's are called "shuffle permutations".) On the other hand, one can also let the sum in (5) run over all τ ∈ S_{p+q} provided one divides by p!q! to compensate for the redundancy. Finally we note that (4) and (5) remain valid if the dx^i are replaced by arbitrary 1-forms θ^i, and are then independent of the coordinates.

The formula (5) gives the multiplication law for differential forms when consid-
ered as tensors via (4). Luckily, the formulas (4) and (5) are rarely needed. The
whole point of differential forms is that both the alternating property and the
transformation law is built into the notation, so that they can be manipulated
“mechanically“.
(b) We now have (at least) three equivalent ways of thinking about k-forms:
(i) formal expressions fij··· dxi ∧ dxj ∧ · · ·
(ii) alternating tensors Tij··· dxi ⊗ dxj ⊗ · · ·
(iii) alternating multilinear functions T (v, w, · · · )
3.1. 9 Theorem. On an n-dimensional manifold, any differential n-form can be written as

D dx^1 ∧ dx^2 ∧ · · · ∧ dx^n

relative to a coordinate system (x^i). Under a change of coordinates x̃^i = x̃^i(x^1, · · · , x^n),

D dx^1 ∧ dx^2 ∧ · · · ∧ dx^n = D̃ dx̃^1 ∧ dx̃^2 ∧ · · · ∧ dx̃^n

where D = D̃ det(∂x̃^i/∂x^j).
Proof. Any n-form is a linear combination of terms of the type dx^{i_1} ∧ · · · ∧ dx^{i_n}. Such a term is 0 if an index occurs twice. Hence any n-form is a multiple of the form dx^1 ∧ · · · ∧ dx^n. This proves the first assertion. To prove the transformation rule, compute:

dx̃^1 ∧ · · · ∧ dx̃^n = (∂x̃^1/∂x^{i_1}) dx^{i_1} ∧ · · · ∧ (∂x̃^n/∂x^{i_n}) dx^{i_n}
= ε_{i_1···i_n} (∂x̃^1/∂x^{i_1}) · · · (∂x̃^n/∂x^{i_n}) dx^1 ∧ · · · ∧ dx^n
= det(∂x̃^i/∂x^j) dx^1 ∧ · · · ∧ dx^n.

Hence

D̃ dx̃^1 ∧ · · · ∧ dx̃^n = D dx^1 ∧ · · · ∧ dx^n

where D = D̃ det(∂x̃^i/∂x^j).

Differential forms on a Riemannian manifold

From now on assume that there is given a Riemann metric g on M. In coordinates (x^i) we write ds² = g_{ij} dx^i dx^j as usual.
3.1. 10 Theorem. The n-form

√|det(g_{ij})| dx^1 ∧ · · · ∧ dx^n   (6)

is independent of the coordinate system (x^i) up to a ± sign.
More precisely, the forms (6) corresponding to two coordinate systems (x^i), (x̃^j) agree on the intersection of the coordinate domains up to the sign of det(∂x^i/∂x̃^j).

Proof. Exercise.
3.1. 11 Remarks and definitions. If M is connected, then the collection of all coordinate systems on M falls into two classes, characterized by the property that the Jacobian is positive on the intersection of the coordinate domains for any two coordinate systems in the same class. This follows from the fact that the various Jacobians det(∂x^i/∂x̃^j) are always non-zero (where defined), hence cannot change sign along any continuous curve. Singling out one of these two classes determines what is called an orientation on M. The coordinate systems in the distinguished class are then called positively oriented. If M has several connected components, the orientation may be chosen independently on each.
We shall now assume that M is oriented and only use positively oriented coordinate systems. This eliminates the ambiguous sign in the above n-form, which is then called the volume element of the Riemann metric g on M, denoted vol_g.
3.1. 12 Example. In R³ with the Euclidean metric dx² + dy² + dz² the volume element is:
Cartesian coordinates (x, y, z): dx ∧ dy ∧ dz
Cylindrical coordinates (r, θ, z): r dr ∧ dθ ∧ dz
Spherical coordinates (ρ, θ, φ): ρ² sin φ dρ ∧ dθ ∧ dφ
3.1. 13 Example. Let S be a two-dimensional submanifold of R³ (a smooth surface). The Euclidean metric dx² + dy² + dz² on R³ gives a Riemann metric g = ds² on S by restriction. Let u, v be coordinates on S. Write p = p(u, v) for the point on S with coordinates (u, v). Then

√|det g_{ij}| = |∂p/∂u × ∂p/∂v|.

Here (g_{ij}) is the matrix of the Riemann metric in the coordinate system u, v. The right-hand side is the norm of the cross product of vectors in R³.
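A concrete numerical check of this formula (a sketch, using the standard angular parametrization of the unit sphere, which is my own choice and not in the text):

```python
import numpy as np

def p(u, v):
    """Unit sphere patch; u = polar angle, v = azimuthal angle."""
    return np.array([np.sin(u)*np.cos(v), np.sin(u)*np.sin(v), np.cos(u)])

u0, v0, h = 0.7, 1.2, 1e-6
pu = (p(u0 + h, v0) - p(u0 - h, v0)) / (2*h)   # ∂p/∂u, by central differences
pv = (p(u0, v0 + h) - p(u0, v0 - h)) / (2*h)   # ∂p/∂v

g = np.array([[pu @ pu, pu @ pv],
              [pv @ pu, pv @ pv]])             # induced metric g_ij
assert np.isclose(np.sqrt(np.linalg.det(g)), np.linalg.norm(np.cross(pu, pv)))
assert np.isclose(np.sqrt(np.linalg.det(g)), np.sin(u0), atol=1e-6)  # = sin u here
```

The first assertion is Lagrange's identity |a × b|² = |a|²|b|² − (a · b)², which is exactly det(g_{ij}) for the induced metric.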
Recall that in a Riemannian space indices on tensors can be raised and lowered
at will.
3.1. 14 Definition. The scalar product g(α, β) of two k-forms

α = a_{ij···} dx^i ∧ dx^j ∧ · · · ,  β = b_{ij···} dx^i ∧ dx^j ∧ · · ·   (i < j < · · · ),

is

g(α, β) = a_{ij···} b^{ij···},

the sum going over ordered k-tuples i < j < · · · .
3.1. 15 Theorem. For any k-form α there is a unique (n−k)-form *α so that

β ∧ *α = g(β, α) vol_g   (7)

for all k-forms β. The explicit formula for this *-operator is

(*a)_{i_{k+1}···i_n} = √|det(g_{ij})| ε_{i_1···i_n} a^{i_1···i_k},   (8)

where the sum is extended over ordered k-tuples i_1 < · · · < i_k. Furthermore

*(*α) = (−1)^{k(n−k)} sgn(det(g_{ij})) α.   (9)
Proof. The statement concerns only tensors at a given point p_o. So we fix p_o and we choose a coordinate system (x^i) around p_o so that the metric at the point p_o takes on the pseudo-Euclidean form g_{ij}(p_o) = g_i δ_{ij} with g_i = ±1. To verify the first assertion it suffices to take for α a fixed basis k-form α = dx^{i_1} ∧ · · · ∧ dx^{i_k} with i_1 < · · · < i_k. Consider the equation (7) for all basis forms β = dx^{j_1} ∧ · · · ∧ dx^{j_k}, j_1 < · · · < j_k. Since g_{ij} = g_i δ_{ij} at p_o, distinct basis k-forms are orthogonal, and one finds

(dx^{i_1} ∧ · · · ∧ dx^{i_k}) ∧ *(dx^{i_1} ∧ · · · ∧ dx^{i_k}) = g_{i_1} · · · g_{i_k} dx^1 ∧ · · · ∧ dx^n

for β = dx^{i_1} ∧ · · · ∧ dx^{i_k} and zero for all other basis forms β. One sees that

*(dx^{i_1} ∧ · · · ∧ dx^{i_k}) = ε_{i_1···i_n} g_{i_1} · · · g_{i_k} dx^{i_{k+1}} ∧ · · · ∧ dx^{i_n}   (10)

where i_{k+1}, · · · , i_n are the remaining indices in ascending order. This proves the existence and uniqueness of *α satisfying (7). The formula (10) also implies (8) for the components of the basis forms α in the particular coordinate system (x^i), and hence for the components of any k-form α in this coordinate system. But the √|det(g_{ij})| ε_{i_1···i_n} are the components of the (0, n)-tensor corresponding to the n-form in Theorem 3.1.10. It follows that (8) is an equation between components of tensors of the same type, hence holds in all coordinate systems as soon as it holds in one. The formula (9) can be seen in the same way: it suffices to take the special α and (x^i) above.

3.1. 16 Example: Euclidean 3-space. Metric: dx² + dy² + dz².
The *-operator is given by:
(0) For 0-forms: *f = f dx ∧ dy ∧ dz
(1) For 1-forms: *(P dx + Q dy + R dz) = P dy ∧ dz − Q dx ∧ dz + R dx ∧ dy
(2) For 2-forms: *(A dy ∧ dz + B dx ∧ dz + C dx ∧ dy) = A dx − B dy + C dz
(3) For 3-forms: *(D dx ∧ dy ∧ dz) = D
If we identify vectors with covectors by the Euclidean metric, then we have the formulas

*(a ∧ b) = a × b,  *(a ∧ b ∧ c) = a · (b × c).
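The formula *(a ∧ b) = a × b can be checked directly from formula (8): with the Euclidean metric, √|det g| = 1 and the sum over ordered pairs i < j equals half the sum over all i, j for an alternating tensor. A quick numerical sketch (sample vectors of my own choosing):

```python
from itertools import permutations
import numpy as np

# Levi-Civita symbol on R^3
eps = np.zeros((3, 3, 3))
for p in permutations(range(3)):
    inv = sum(p[i] > p[j] for i in range(3) for j in range(i + 1, 3))
    eps[p] = (-1) ** inv

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, -1.0, 0.5])

W = np.outer(a, b) - np.outer(b, a)   # the 2-form a∧b as an alternating tensor
# formula (8): (*W)_k = sum_{i<j} eps_ijk W_ij = (1/2) eps_ijk W_ij
star_W = 0.5 * np.einsum('ijk,ij->k', eps, W)
assert np.allclose(star_W, np.cross(a, b))   # *(a ∧ b) = a × b
```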
3.1. 17 Example: Minkowski space. Metric: (dx^0)² − (dx^1)² − (dx^2)² − (dx^3)².
A 2-form F = F_{ij} dx^i ∧ dx^j is written as

F = Σ_α E_α dx^0 ∧ dx^α + B^1 dx^2 ∧ dx^3 + B^2 dx^3 ∧ dx^1 + B^3 dx^1 ∧ dx^2,

where α = 1, 2, 3. Then

*F = −Σ_α B^α dx^0 ∧ dx^α + E_1 dx^2 ∧ dx^3 + E_2 dx^3 ∧ dx^1 + E_3 dx^1 ∧ dx^2.

Appendix: algebraic definition of alternating tensors

Let V be a finite-dimensional vector space (over any field). Let T = T(V) be the space of all tensor products of elements of V. If we fix a basis {e_1, · · · , e_n} for V then the elements of T can be uniquely written as finite sums Σ a_{ij···} e_i ⊗ e_j ⊗ · · · . Tensor multiplication a ⊗ b makes T into an algebra, called the tensor algebra of V. (It is the algebra freely generated by a basis e_1, · · · , e_n, i.e. without any relations.) Let I be the ideal of T generated by tensors of the form x ⊗ y + y ⊗ x with x, y ∈ V. Let Λ = Λ(V) be the quotient algebra T/I. The multiplication in Λ is denoted a ∧ b. In terms of a basis {e_i}, this means that Λ is the algebra generated by the basis elements subject to the relations e_i ∧ e_j = −e_j ∧ e_i. It is called the exterior algebra of V. As a vector space Λ(V) is isomorphic with the subspace of all alternating tensors in T(V), as in 3.1.3, 3.1.4. (But the formula analogous to (4), i.e.

Σ_σ sgn(σ) e_{i_{σ(1)}} ⊗ · · · ⊗ e_{i_{σ(k)}} → e_{i_1} ∧ · · · ∧ e_{i_k},

is not quite the natural map T → Λ = T/I restricted to alternating tensors; the latter would have an additional factor k! on the right.) It is of course possible to realize Λ as the space of alternating tensors in the first place, without introducing the quotient T/I; but that makes the multiplication in Λ look rather mysterious (as in (5)).
When this construction is applied with V the space T_p*M of covectors at a given point p of a manifold M we evidently get the algebra Λ(T_p*M) of differential forms at the point p. A differential form on all of M, as defined in 3.1.1, associates to each p ∈ M an element of Λ(T_p*M) whose coefficients (with respect to a basis dx^i ∧ dx^j ∧ · · · (i < j < · · · ) coming from a local coordinate system (x^i)) are C^∞ functions of the x^i.

EXERCISES 3.1
1. Prove the formulas in Example 3.1.2.
2. Prove the formula in Example 3.1.13.
3. Prove the formulas (0)-(3) in Example 3.1.16.
4. Prove the formulas *(a ∧ b) = a × b and *(a ∧ b ∧ c) = a · (b × c) in Example 3.1.16.
5. Prove the formula for *F in Example 3.1.17.
6. Prove that the T_{ij···} defined as in Lemma 3.1.6 in terms of a k-form f_{ij···} dx^i ∧ dx^j ∧ · · · do transform like the components of a (0, k)-tensor, as stated in the proof of the lemma.
7. Verify the formula *(*F) = (−1)^{k(n−k)} sgn(det(g_{ij})) F of Theorem 3.1.15 for k-forms on R³, k = 0, 1, 2, 3, directly using the formulas of Example 3.1.16.
8. Let ϕ be the following differential form on R³:
a) (x² + y² + z²) dx + y dy + dz  b) x dy ∧ dz + y dx ∧ dz  c) (x² + y² + z²) dx ∧ dy ∧ dz
Find the expression for ϕ in spherical coordinates (ρ, θ, φ).


9. Let ϕ be the following differential form on R³:
a) sin θ dρ + ρ cos φ dθ + ρ² dφ  b) ρ cos θ dρ ∧ dθ − dρ ∧ dφ  c) ρ² dρ ∧ dθ ∧ dφ
(ρ, θ, φ are spherical coordinates.)
(1) Use the formula (7) of Theorem 3.1.15 to find *ϕ.
(2) Use the formula (8) of Theorem 3.1.15 to find *ϕ.
10. Give a direct proof of Theorem 3.1.15, parts (7) and (8), using an arbitrary coordinate system (x^i).
11. Prove (4).
12. Prove (5).
13. Let Λ = T/I be the exterior algebra of V as defined in the appendix. Show that the natural map T → Λ = T/I restricted to alternating k-tensors is given by (4) with an additional factor k! on the right.

3.2 Differential calculus


On an arbitrary manifold it is not possible to define a derivative operation on
arbitrary tensors in a general and natural way. But for differential forms (i.e.
alternating covariant tensors) this is possible, as will now be discussed.
3.2. 1 Theorem. Let ω = f_{ij···} dx^i ∧ dx^j ∧ · · · be a differential k-form. Then the differential (k+1)-form

df_{ij···} ∧ dx^i ∧ dx^j ∧ · · · = (∂f_{ij···}/∂x^k) dx^k ∧ dx^i ∧ dx^j ∧ · · ·

is independent of the coordinate system (x^i).
Proof. Consider first the case of a differential 1-form. Thus assume f_i dx^i = f̃_a dx̃^a. This equation gives f_i dx^i = f̃_a (∂x̃^a/∂x^i) dx^i, hence f_i = f̃_a (∂x̃^a/∂x^i), as we know. Now compute:

(∂f_i/∂x^k) dx^k ∧ dx^i = (∂/∂x^k)( f̃_a ∂x̃^a/∂x^i ) dx^k ∧ dx^i
= ( (∂f̃_a/∂x^k)(∂x̃^a/∂x^i) + f̃_a ∂²x̃^a/∂x^k∂x^i ) dx^k ∧ dx^i
= (∂f̃_a/∂x̃^b)(∂x̃^b/∂x^k)(∂x̃^a/∂x^i) dx^k ∧ dx^i + f̃_a (∂²x̃^a/∂x^k∂x^i) dx^k ∧ dx^i
= (∂f̃_a/∂x̃^b) dx̃^b ∧ dx̃^a + 0,

because the terms ki and ik cancel in view of the symmetry of the second partials. The proof for a general differential k-form is obtained by adding some dots · · · to indicate the remaining indices and wedge factors, which are not affected by the argument.
3.2. 2 Remark. If T = (T_{i_1···i_k}) is the alternating (0, k)-tensor corresponding to ω then the alternating (0, k+1)-tensor dT corresponding to dω is given by the formula

(dT)_{j_1···j_{k+1}} = Σ_{q=1}^{k+1} (−1)^{q−1} ∂T_{j_1···j_{q−1}j_{q+1}···j_{k+1}}/∂x^{j_q}.

3.2.3 Definition. Let ω = f_{ij···} dx^i ∧ dx^j ∧ ··· be a differential k-form. Define a differential (k+1)-form dω, called the exterior derivative of ω, by the formula

dω := df_{ij···} ∧ dx^i ∧ dx^j ∧ ··· = (∂f_{ij···}/∂x^k) dx^k ∧ dx^i ∧ dx^j ∧ ··· .
Actually, the above recipe defines a form dω separately on each coordinate domain, even if ω is defined on all of M. However, because of 3.2.1 the forms dω on any two coordinate domains agree on their intersection, so dω is really a single form defined on all of M after all. In the future we shall take this kind of argument for granted.
3.2.4 Example: exterior derivative in R³.
0-forms (functions): ω = f, dω = (∂f/∂x)dx + (∂f/∂y)dy + (∂f/∂z)dz.
1-forms (covectors): ω = A dx + B dy + C dz,
dω = (∂C/∂y − ∂B/∂z) dy ∧ dz − (∂C/∂x − ∂A/∂z) dz ∧ dx + (∂B/∂x − ∂A/∂y) dx ∧ dy.
2-forms: ω = P dy ∧ dz + Q dz ∧ dx + R dx ∧ dy,
dω = (∂P/∂x + ∂Q/∂y + ∂R/∂z) dx ∧ dy ∧ dz.
3-forms: ω = D dx ∧ dy ∧ dz, dω = 0.
3.2.5 Theorem (Product Rule). Let α, β be differential forms with α homogeneous of degree |α|. Then

d(α ∧ β) = (dα) ∧ β + (−1)^{|α|} α ∧ (dβ).

Proof. By induction on |α| it suffices to prove this for α a 1-form, say α = a_k dx^k. Put β = b_{ij···} dx^i ∧ dx^j ∧ ···. Then

d(α ∧ β) = d(a_k b_{ij···} dx^k ∧ dx^i ∧ dx^j ∧ ···) = {(da_k) b_{ij···} + a_k (db_{ij···})} ∧ dx^k ∧ dx^i ∧ dx^j ∧ ···

= (da_k ∧ dx^k) ∧ (b_{ij···} dx^i ∧ dx^j ∧ ···) − (a_k dx^k) ∧ (db_{ij···} ∧ dx^i ∧ dx^j ∧ ···) = (dα) ∧ β − α ∧ (dβ).

Remark. The exterior derivative operation ω → dω on differential forms ω defined on open subsets of M is uniquely characterized by the following properties.
a) If f is a scalar function, then df is the usual differential of f.
b) For any two forms α, β, one has d(α + β) = dα + dβ.
c) For any two forms α, β with α homogeneous of degree p, one has

d(α ∧ β) = (dα) ∧ β + (−1)^p α ∧ (dβ).

This is clear since any differential form f_{ij···} dx^i ∧ dx^j ∧ ··· on a coordinate domain can be built up from scalar functions and their differentials by sums and wedge products. Note that in this "axiomatic" characterization we postulate that d operates also on forms defined only on open subsets. This postulate is natural, but actually not necessary.
3.2.6 Theorem. For any differential form ω of class C², d(dω) = 0.
Proof. First consider the case when ω = f is a C² function. Then

d(dω) = d( (∂f/∂x^i) dx^i ) = (∂²f/∂x^j∂x^i) dx^j ∧ dx^i

and this is zero, since the terms ij and ji cancel, because of the symmetry of the second partials. The general case is obtained by adding some dots, like ω = f_{···} dx^{···} ∧ ···, to indicate indices and wedge-factors, which are not affected by the argument. (Alternatively one can argue by induction.)
3.2.7 Corollary. If ω is a differential form which can be written as an exterior derivative ω = dϕ, then dω = 0.
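In the language of vector analysis on R³, the 0-form case of this theorem says that the curl of a gradient vanishes, which is easy to illustrate numerically. The sketch below is not from the notes; the function f is an arbitrary example, and both derivative operations are approximated by finite differences.

```python
# Numerical illustration of d(df) = 0 on R^3: the coefficients of d(df)
# are the components of curl(grad f), which vanish by equality of mixed
# partials.  The function f is just a sample.
def f(x, y, z):
    return x * x * y + y * z ** 3

def grad(f, p, h=1e-5):
    g = []
    for i in range(3):
        q, r = list(p), list(p)
        q[i] += h
        r[i] -= h
        g.append((f(*q) - f(*r)) / (2 * h))
    return g

def curl(F, p, h=1e-4):
    def comp(i, j):  # dF_j/dx_i - dF_i/dx_j
        q1, q2 = list(p), list(p); q1[i] += h; q2[i] -= h
        r1, r2 = list(p), list(p); r1[j] += h; r2[j] -= h
        dFj_dxi = (F(q1)[j] - F(q2)[j]) / (2 * h)
        dFi_dxj = (F(r1)[i] - F(r2)[i]) / (2 * h)
        return dFj_dxi - dFi_dxj
    return (comp(1, 2), comp(2, 0), comp(0, 1))

gradient = lambda p: grad(f, p)
print(curl(gradient, [0.5, -0.3, 1.1]))  # all components near 0
```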

3.2.8 Remark. This corollary has a local converse:

Poincaré's Lemma. If ω is a C∞ differential form satisfying dω = 0, then every point has a neighbourhood U so that ω = dϕ on U for some C∞ form ϕ defined on U.

However there may not be one single form ϕ so that ω = dϕ on the whole manifold M. (For 1-forms this is true provided M is simply connected, which means intuitively that every closed path in M can be continuously deformed into a point.)
3.2. 9 Definition (pullback of forms). Let F : N → M be a C ∞ map-
ping between manifolds. Let (y 1 , · · · , y m ) be a coordinate system on N and
(x1 , · · · , xn ) a coordinate system on M . Then F is given by equations

xi = F i (y 1 , · · · , y m ), i = 1, · · · , n.

Let ω be a differential form on M:

ω = f_{ij···}(x¹, ···, xⁿ) dx^i ∧ dx^j ∧ ···

F*ω is the differential form on N,

F*ω = g_{kl···}(y¹, ···, y^m) dy^k ∧ dy^l ∧ ···

obtained from ω by the substitution

x^i = F^i(y¹, ···, y^m),  dx^i = (∂F^i/∂y^j) dy^j. (*)

F*ω is called the pull-back of ω by F.


3.2.10 Lemma. The differential form F*ω is independent of the coordinate systems (x^i), (y^j).
Proof. The equations (*) are those defining dF. So in terms of the alternating multilinear function ω(v, w, ···) the definition of F*ω amounts to

(F*ω)(v, w, ···) = ω(dF(v), dF(w), ···),

from which the independence of coordinates is evident.


3.2. 11 Remark. Any (0,k)-tensor T on M (not necessarily alternating) can
be written as a linear combination of tensor products dxi ⊗ dxj ⊗ · · · of the
coordinate differentials dxi . By the same procedure just used one can define a
(0,k)-tensor F ∗ T on N .
3.2.12 Theorem. The pull-back operation has the following properties:
(a) F*(ω₁ + ω₂) = F*ω₁ + F*ω₂
(b) F*(ω₁ ∧ ω₂) = (F*ω₁) ∧ (F*ω₂)
(c) d(F*ω) = F*(dω)
(d) (G ◦ F)*ω = F*(G*ω) (if the composite G ◦ F makes sense.)

Proof. Exercise.

3.2. 13 Examples.
(a) Let ω be the 1-form on R² given by

ω = x dy

in Cartesian coordinates x, y on R². Let F : R → R², t → F(t) = (x, y) be the map given by

x = sin t,  y = cos t.

Then F*ω is the 1-form on R given by

F*ω = F*(x dy) = sin t d(cos t) = − sin²t dt.

(b) Let ω be the 1-form on R given by

ω = dt

in the Cartesian coordinate t on R. Let F : R² → R, (x, y) → F(x, y) = t be the map given by

t = (x − y)².

Then

F*ω = F*(dt) = d(x − y)² = 2(x − y)(dx − dy).

(c) The symmetric (0,2)-tensor g representing the Euclidean metric dx² + dy² + dz² on R³ has components (δ_{ij}) in Cartesian coordinates. It can be written as

g = dx ⊗ dx + dy ⊗ dy + dz ⊗ dz.

Let S² = {(x, y, z) ∈ R³ | x² + y² + z² = R²} be the sphere of radius R (a submanifold of R³) and F : S² → R³ the inclusion. Then F*g is the symmetric (0,2)-tensor on S² which represents the Riemann metric on S² obtained from the Euclidean metric on R³ by restriction to S². (F*g is defined in accordance with Remark 3.2.11.)
(d) Let T be any (0,k)-tensor on M. T can be thought of as a multilinear function T(u, v, ···) on vectors on M. Let S be a submanifold of M and i : S → M the inclusion mapping of S. The pull-back i*T of T by i is just the restriction of T to tangent vectors to S. This means that

i*T(u, v, ···) = T(u, v, ···)

when u, v, ··· are tangent vectors to S. The tensor i*T is also denoted T|_S, called the restriction of T to S. (This works only for tensors of type (0,k)!)
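Example (a) above can be verified numerically: pulling back ω = x dy along F(t) = (sin t, cos t) substitutes x = sin t and dy = y′(t) dt, so the coefficient of dt should be −sin²t. A small sketch, not from the notes, approximating y′(t) by a central difference:

```python
import math

# Numerical check of Example 3.2.13(a): the coefficient of dt in
# F*(x dy), for F(t) = (sin t, cos t), is x(t) * y'(t) = -sin(t)^2.
def pullback_coeff(t, h=1e-6):
    x = math.sin(t)
    dy_dt = (math.cos(t + h) - math.cos(t - h)) / (2 * h)  # derivative of y(t)
    return x * dy_dt

for t in (0.0, 0.4, 1.3, 2.5):
    print(pullback_coeff(t), -math.sin(t) ** 2)  # the two columns agree
```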

Differential calculus on a Riemannian manifold


From now on assume given a Riemann metric ds2 = gij dxi dxj . Recall that the
Riemann metric gives the *-operator on differential forms.
3.2.14 Definition. Let ω be a k-form. Then δω is the (k−1)-form defined by

*δω = d*ω.

Remark. One can also write

δω = *⁻¹ d*ω = (−1)^{(n−k+1)(k−1)} *d*ω.

3.2. 15 Example: Vector analysis in R3 . Identify vectors and covectors


on R3 using the Euclidean metric ds2 = dx2 + dy 2 + dz 2 , so that the 1-form
F = Fx dx+Fy dy+Fz dz is identified with the vector field Fx (∂/∂x)+Fy (∂/∂y)+
Fz (∂/∂z). Then
∗dF = curlF and d ∗ F = ∗divF.
The second equation says that δF =divF . Summary: if 1-forms and 2-forms on
R3 are identified with vector fields and 3-forms with scalar functions (using the
metric and the star operator) then d and δ become curl and div, respectively.
If f is a scalar function, then

δdf = div gradf = ∆f ,

the Laplacian of f . These formulas may be used to do vector analysis in arbi-


trary coordinates (see the exercises at the end).
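The identity δdf = div grad f = ∆f is easy to sanity-check numerically on Euclidean R³. The sketch below is not from the notes; the sample function f = x² + y² + z² (with ∆f = 6) is arbitrary, and the Laplacian is approximated by second central differences.

```python
# Numerical sketch of delta(df) = div grad f = (Laplacian of f) on
# Euclidean R^3, for the sample function f = x^2 + y^2 + z^2 (Laplacian 6).
def f(x, y, z):
    return x * x + y * y + z * z

def laplacian(f, p, h=1e-4):
    # Sum of second central differences in each coordinate direction.
    total = 0.0
    for i in range(3):
        q, r = list(p), list(p)
        q[i] += h
        r[i] -= h
        total += (f(*q) - 2 * f(*p) + f(*r)) / (h * h)
    return total

print(laplacian(f, [0.2, -0.7, 1.5]))  # close to 6
```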

EXERCISES 3.2
1. Let f be a C² function (0-form). Prove that d(df) = 0.
2. Let ω = Σ_{i<j} f_{ij} dx^i ∧ dx^j. Write dω in the form

dω = Σ_{i<j<k} f_{ijk} dx^i ∧ dx^j ∧ dx^k.

(Find f_{ijk}.)
3. Let ϕ = f_i dx^i be a 1-form. Find a formula for δϕ.
4. Prove the assertion of Remark 3.2.2, assuming Theorem 3.2.1.
5. Prove parts (a) and (b) of Theorem 3.2.12.
6. Prove part (c) of Theorem 3.2.12.
7. Prove part (d) of Theorem 3.2.12.
8. Prove from the definitions the formula

i*T(u, v, ···) = T(u, v, ···)

of Example 3.2.13 (d).


9. Let ω = P dx + Q dy + R dz be a 1-form on R³.
(a) Find a formula for ω in cylindrical coordinates (r, θ, z).
(b) Find a formula for dω in cylindrical coordinates (r, θ, z).
10. Let u¹, u², u³ be an orthogonal coordinate system in R³, i.e. a coordinate system so that the Euclidean metric ds² becomes "diagonal":

ds² = (h₁du¹)² + (h₂du²)² + (h₃du³)²

for certain positive functions h₁, h₂, h₃. Let ω = A₁du¹ + A₂du² + A₃du³ be a 1-form on R³.
(a) Find *dω. (b) Find d*ω.
11. Let ω = Σ_{i<j} f_{ij} dx^i ∧ dx^j. Use Definition 3.2.3 to prove that

dω = Σ_{i<j<k} ( ∂f_{jk}/∂x^i + ∂f_{ki}/∂x^j + ∂f_{ij}/∂x^k ) dx^i ∧ dx^j ∧ dx^k.
12. a) Let ω = f_i dx^i be a C∞ 1-form. Prove that δω = −(1/γ) ∂(γf^i)/∂x^i, where γ = √|det g_{ij}|. Deduce that

δdf = −(1/γ) ∂/∂x^i ( γ g^{ij} ∂f/∂x^j ).

[Remark. The operator ∆ : f → −δdf is called the Laplace-Beltrami operator of the Riemann metric.]
13. Use problem 12 to write down ∆f = −δdf
(a) in cylindrical coordinates (r, θ, z) on R³,
(b) in spherical coordinates (ρ, θ, φ) on R³,
(c) in geographical coordinates (θ, φ) on S².
14. Identify vectors and covectors on R3 using the Euclidean metric ds2 =
dx2 + dy 2 + dz 2 , so that the 1-form Fx dx + Fy dy + Fz dz is identified with the
vector field Fx (∂/∂x) + Fy (∂/∂y) + Fz (∂/∂z). Show that
a) ∗dF = curl F   b) d∗F = ∗div F
for any 1-form F, as stated in Example 3.2.15.
15. Let F = Fr (∂/∂r) + Fθ (∂/∂θ) + Fz (∂/∂z) be a vector field in cylindrical
coordinates (r, θ, z) on R3 . Use problem 14 to find a formula for curlF and
divF .
16. Let F = Fρ (∂/∂ρ) + Fθ (∂/∂θ) + Fφ (∂/∂φ) be a vector field in spherical
coordinates (ρ, θ, φ) on R3 . Use problem 14 to find a formula for curlF and
divF .
17. Find all radial solutions f to Laplace's equation ∆f = 0 in R² and in R³. ("Radial" means f(p) = f(r), a function of r = |p| only. You may use problem 12 to find a formula for ∆f.)

3.3 Integral calculus


It is not possible to define the integral of a scalar function f on a manifold in coordinates (x^i) as

∫ ··· ∫ f(x¹, ···, xⁿ) dx¹ ··· dxⁿ

if one wants the value of the integral to come out independent of the coordinates. The reason for this is the Change of Variables Formula from calculus.
3.3.1 Theorem (Change of Variables Formula). Let x̃^j = F^j(x¹, ···, xⁿ) be a C∞ mapping Rⁿ → Rⁿ which maps an open set U one-to-one onto an open set Ũ. Then for any integrable function f on U,

∫ ··· ∫ f dx̃¹ ··· dx̃ⁿ = ∫ ··· ∫ f |det(∂x̃^i/∂x^j)| dx¹ ··· dxⁿ.

Explanation. The domain of integration Ũ on the left corresponds to the


domain of integration U on the right under the mapping F : Ũ = F (U ). On
the left, the function f is expressed in terms of the x̃j by means of the mapping
x̃i = F i (x1 , · · · , xn ).
Proof. See your Calculus text (e.g. Marsden-Tromba), at least for the case n = 2, 3.
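The change of variables formula can be illustrated by a Riemann-sum computation. The example below is not from the notes: it integrates f(x, y) = x² over the unit disk once in Cartesian coordinates and once in polar coordinates, where the Jacobian factor |det| is r; both sums should approach the exact value π/4.

```python
import math

# Riemann-sum illustration of the change of variables formula:
# integral of x^2 over the unit disk, Cartesian vs. polar coordinates.
def f(x, y):
    return x * x

# Cartesian midpoint sum over a grid covering [-1,1]^2, keeping cells in the disk.
n = 800
h = 2.0 / n
cartesian = 0.0
for i in range(n):
    for j in range(n):
        x = -1 + (i + 0.5) * h
        y = -1 + (j + 0.5) * h
        if x * x + y * y <= 1:
            cartesian += f(x, y) * h * h

# Polar midpoint sum, with the Jacobian |det d(x,y)/d(r,t)| = r.
m = 400
dr = 1.0 / m
dt = 2 * math.pi / m
polar = 0.0
for i in range(m):
    for j in range(m):
        r = (i + 0.5) * dr
        t = (j + 0.5) * dt
        polar += f(r * math.cos(t), r * math.sin(t)) * r * dr * dt

print(cartesian, polar, math.pi / 4)  # all three approximately equal
```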
From now on (x¹, ···, xⁿ) denotes again a coordinate system on an n-dimensional manifold M, as usual. Recall that an orientation on M is a class of coordinate systems any two of which, say (x^i), (x̃^j), satisfy

det(∂x^i/∂x̃^j) > 0

at all points in the domain of both coordinate systems. Any coordinate system (x^i) for which this determinant is positive for all (x̃^j) in the given class is itself called positively oriented. An orientation on a manifold consists of the specification of such a class.
3.3.2 Theorem and definition. Let M be an oriented n-dimensional manifold, (x^i) a positively oriented coordinate system defined on an open set U. For any n-form ω = f dx¹ ∧ ··· ∧ dxⁿ on U the integral

∫_U ω := ∫ ··· ∫ f dx¹ ··· dxⁿ

is independent of the coordinates (x^i).
Proof. Let (x̃^j) be another coordinate system. Write f dx¹ ∧ ··· ∧ dxⁿ = f̃ dx̃¹ ∧ ··· ∧ dx̃ⁿ. Then f = f̃ det(∂x̃^j/∂x^i). Therefore

∫ ··· ∫ f dx¹ ··· dxⁿ = ∫ ··· ∫ f̃ det(∂x̃^j/∂x^i) dx¹ ··· dxⁿ

= ± ∫ ··· ∫ f̃ |det(∂x̃^j/∂x^i)| dx¹ ··· dxⁿ

= ± ∫ ··· ∫ f̃ dx̃¹ ··· dx̃ⁿ.

Remarks. This theorem–definition presupposes that the integral on the right


exists. This is certainly the case if f is continuous and the (xi (p)), p ∈ U , lie in
a bounded subset of Rn . It also presupposes that the region of integration lies
in the domain of the coordinate system (xi ). If this is not the case the integral
is defined by subdividing the region of integration into sufficiently small pieces
each of which lies in the domain of a coordinate system, as one does in calculus.
(See the appendix for an alternative procedure.)
3.3.3 Example. Suppose M has a Riemann metric g. The volume element is the n-form

vol_g = |det g_{ij}|^{1/2} dx¹ ∧ ··· ∧ dxⁿ.

It is independent of the coordinates up to sign. Therefore the positive number

∫_U vol_g = ∫ ··· ∫ |det g_{ij}|^{1/2} dx¹ ··· dxⁿ

is independent of the coordinate system. It is called the volume of the region U. (When dim M = 1 it is naturally called length, when dim M = 2 it is called area.)
More generally, if f is a scalar valued function, then f vol_g is an n-form, and

∫_U f vol_g = ∫ ··· ∫ f |det g_{ij}|^{1/2} dx¹ ··· dxⁿ

is independent of the coordinates up to sign. It is called the integral of f with respect to the volume element vol_g.
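As a concrete instance, one can compute the area of a sphere of radius R from vol_g by a Riemann sum. The sketch below is not from the notes, and it assumes the standard induced metric on the sphere in geographical coordinates (θ, φ), namely ds² = R²dθ² + R²sin²θ dφ², so that |det g|^{1/2} = R² sin θ; the sum should approach 4πR².

```python
import math

# Riemann-sum sketch of Example 3.3.3: area of a sphere of radius R from
# vol_g = |det g_ij|^(1/2) dtheta ^ dphi, assuming the induced metric
# ds^2 = R^2 dtheta^2 + R^2 sin^2(theta) dphi^2 in geographical coordinates.
R = 2.0
n = 500
dtheta = math.pi / n
dphi = 2 * math.pi / n
area = 0.0
for i in range(n):
    theta = (i + 0.5) * dtheta
    for j in range(n):
        area += R * R * math.sin(theta) * dtheta * dphi  # |det g|^(1/2)

print(area, 4 * math.pi * R * R)  # both approximately 50.27
```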
3.3.4 Definition. Let ω be an m-form on M (m ≤ dim M), S an m-dimensional submanifold of M. Then

∫_S ω = ∫_S ω|_S

where ω|_S is the restriction of ω to S.


Explanation. Recall that ω|_S is an m-form on S (defined in §6). Since m = dim S, the integral on the right is defined by Definition 3.3.2 above.
3.3.5 Example: integration of forms on R³.
0-forms. A 0-form is a function f. A zero dimensional manifold is a discrete set of points {p_i}. The "integral" of f over {p_i} is the sum

Σ_i f(p_i).

The sum need not be finite. The integral exists provided the series converges.
1-forms. A 1-form on R3 may be written as

P dx + Qdy + Rdz.

Let C be a 1-dimensional submanifold of R3 . Let t be a coordinate on C. Let


p = p(t) be the point on C with coordinate t and write

x = x(t), y = y(t), z = z(t)

for the Cartesian coordinates of p(t). Thus C can be considered a curve in R³ parametrized by t. The restriction of P dx + Q dy + R dz to C is

(P dx/dt + Q dy/dt + R dz/dt) dt,

where P, Q, R are evaluated at p(t). Thus

∫_C P dx + Q dy + R dz = ∫_C (P dx/dt + Q dy/dt + R dz/dt) dt,
the usual line integral. If the coordinate t does not cover all of C, then the
integral must be defined by subdivision as remarked earlier.
2-forms. A 2-form on R3 may be written as

Ady ∧ dz − Bdx ∧ dz + Cdx ∧ dy.

Let S be a 2-dimensional submanifold of R3 . Let (u, v) be coordinates on S.


Let p = p(u, v) be the point on S with coordinates (u, v) and write

x = x(u, v), y = y(u, v), z = z(u, v)

for the Cartesian coordinates of p(u, v). Thus S can be considered a surface in
R³ parametrized by (u, v) ∈ D. The restriction of A dy ∧ dz − B dx ∧ dz + C dx ∧ dy to S is

{ A( ∂y/∂u ∂z/∂v − ∂z/∂u ∂y/∂v ) − B( ∂x/∂u ∂z/∂v − ∂z/∂u ∂x/∂v ) + C( ∂x/∂u ∂y/∂v − ∂y/∂u ∂x/∂v ) } du ∧ dv.

In calculus notation this expression is V · N du ∧ dv, where V = Ai + Bj + Ck and

N = ∂p/∂u × ∂p/∂v = ( ∂y/∂u ∂z/∂v − ∂z/∂u ∂y/∂v ) i − ( ∂x/∂u ∂z/∂v − ∂z/∂u ∂x/∂v ) j + ( ∂x/∂u ∂y/∂v − ∂y/∂u ∂x/∂v ) k.

V is the vector field corresponding to the given 2-form under the correspondence discussed in §5. N is the normal vector corresponding to the parametrization of the surface. Thus the integral above is the usual surface integral:

∫_S A dy ∧ dz − B dx ∧ dz + C dx ∧ dy = ∫∫_D V · N du dv.
If the coordinates (u, v) do not cover all of S, then the integral must be defined
by subdivision as remarked earlier.
3-forms. A 3-form on R³ may be written as

f dx ∧ dy ∧ dz

and its integral is the usual triple integral

∫_U f dx ∧ dy ∧ dz = ∫∫∫_U f dx dy dz.
3.3.6 Definitions. Let S be a k-dimensional manifold, R a subset of S with the following property: R can be covered by finitely many open coordinate cubes Q, consisting of all points p ∈ S whose coordinates in a given coordinate system (x¹, ···, x^k) satisfy −1 < x^i < 1, in such a way that each Q is either completely contained in R or else intersects R in a half-cube

R ∩ Q = { all points in Q satisfying x¹ ≤ 0 }. (*)

The boundary of R consists of all points in the sets (*) satisfying x¹ = 0. It is denoted ∂R. If S is a submanifold of M, then R is also called a bounded k-dimensional submanifold of M.
Remarks. a) It may happen that ∂R is empty. In that case R is itself a k-dimensional submanifold and we say that R is without boundary.
b) Instead of cubes and half-cubes one can also use balls and half-balls to define "bounded submanifold".
3.3.7 Lemma. The boundary ∂R of a bounded k-dimensional submanifold is a bounded (k−1)-dimensional submanifold without boundary, with coordinates the restrictions to ∂R of the k−1 last coordinates x², ···, x^k of coordinates x¹, ···, x^k on S of the type entering into (*).
Proof. Similar to the proof for submanifolds (without boundary).
3.3.8 Conventions. (a) On an oriented n-dimensional manifold only positively oriented coordinates are admitted in the definition of the integral of an n-form. This specifies the ambiguous sign ± in the definition of the integral.
(b) Any n-form ω is called positive if on the domain of any positive coordinate system (x^i)

ω = D dx¹ ∧ ··· ∧ dxⁿ with D > 0.

Note that one can also specify an orientation by saying which n-forms are positive.
3.3.9 Lemma and definition. Let R ⊂ S be a bounded k-dimensional submanifold. Assume that S is oriented. As (x¹, ···, x^k) runs over the positively oriented coordinate systems on S of the type referred to in Lemma 3.3.7, the corresponding coordinate systems (x², ···, x^k) on ∂R define an orientation on ∂R, called the induced orientation.
Proof. Let (x^i) and (x̃^i) be two positively oriented coordinate systems of this type. Then on ∂R we have x¹ = 0 and x̃¹ = 0, so if the x̃^i are considered as functions of the x^i by the coordinate transformation, then

0 ≡ x̃¹ = x̃¹(0, x², ···, x^k) on ∂R.

Thus ∂x̃¹/∂x² = 0, ···, ∂x̃¹/∂x^k = 0 on ∂R. Expanding det(∂x̃^i/∂x^j) along the row (∂x̃¹/∂x^j) one finds that

det( ∂x̃^i/∂x^j )_{1≤i,j≤k} = ( ∂x̃¹/∂x¹ ) det( ∂x̃^i/∂x^j )_{2≤i,j≤k} on ∂R.

The LHS is > 0, since (x^i) and (x̃^i) are positively oriented. The derivative ∂x̃¹/∂x¹ cannot be < 0 on ∂R, since x̃¹ = x̃¹(x¹, x²_o, ···, x^k_o) is = 0 for x¹ = 0 and is < 0 for x¹ < 0. It follows that in the above equation the first factor (∂x̃¹/∂x¹) on the right is > 0 on ∂R, and hence so is the second factor, as required.
3.3.10 Example. Let S be a 2-dimensional submanifold of R³. For each p ∈ S, there are two unit vectors ±n(p) orthogonal to T_pS. Suppose we choose one of them, say n(p), depending continuously on p ∈ S. Then we can specify an orientation on S by stipulating that a coordinate system (u, v) on S is positive if

∂p/∂u × ∂p/∂v = Dn with D > 0. (*)
Let R be a bounded submanifold of S with boundary C = ∂R. Choose a positive coordinate system (u, v) on S around a point of C as above, so that u < 0 on R and u = 0 on C. Then p = p(0, v) defines the positive coordinate v on C and ∂p/∂v is the tangent vector along C. Along C, the equation (*) amounts to this: if we walk upright along C (head in the direction of n) in the positive direction (direction of ∂p/∂v), then R is to our left (∂p/∂u points outward from R, in the direction of increasing u, since the u-component of ∂p/∂u = ∂/∂u is +1).
The next theorem is a kind of change of variables formula for integrals of forms.
3.3.11 Theorem. Let F : M → N be a diffeomorphism of n-dimensional manifolds. For any m-dimensional oriented bounded submanifold R of M and any m-form ω on N one has

∫_R F*ω = ∫_{F(R)} ω

if F(R) is given the orientation corresponding to the orientation on R under the diffeomorphism F.
Proof. Replacing M by the m-dimensional submanifold S containing R, and N by F(S), we may assume that m = n in the first place. By subdivision we may assume that R is contained in the domain of coordinates (x^i) on M and F(R) in the domain of coordinates (x̃^j) on N. Then we can write ω = f dx¹ ∧ ··· ∧ dxⁿ and x̃^j = F^j(x¹, ···, xⁿ). The formula then reduces to the usual change of variables formula 3.3.1.
3.3.12 Theorem (Stokes's Formula). Let ω be a C∞ (k−1)-form on M, R a k-dimensional oriented bounded submanifold of M. Then

∫_R dω = ∫_{∂R} ω.

Proof. We first prove this for the special case of an n-cube Iⁿ in Rⁿ. Let

Iⁿ = {(x¹, ···, xⁿ) ∈ Rⁿ | 0 ≤ x^j ≤ 1}.

We specify the orientation so that the Cartesian coordinates (x¹, ···, xⁿ) form a positively oriented coordinate system. We have ∂Iⁿ = ⋃_{j=1}^{n} (I_j⁰ ∪ I_j¹) where

I_j⁰ = {(x¹, ···, xⁿ) ∈ Iⁿ | x^j = 0},  I_j¹ = {(x¹, ···, xⁿ) ∈ Iⁿ | x^j = 1}.

(Figure: the case n = 2, a square whose faces are the left and right edges I₁⁰, I₁¹ and the bottom and top edges I₂⁰, I₂¹, with arrows indicating the induced orientation of the boundary.)
Near I_j⁰ the points of Iⁿ satisfy x^j ≥ 0. If one uses x^j as first coordinate, one gets a coordinate system (x^j, x¹, ···[x^j]···, xⁿ) on Iⁿ whose orientation is positive or negative according to the sign of (−1)^{j−1}. The coordinate system (−x^j, x¹, ···[x^j]···, xⁿ) on Iⁿ is of the type required in Definition 3.3.6 for R = Iⁿ near a point of I_j⁰. It follows that (x¹, ···[x^j]···, xⁿ) is a coordinate system on I_j⁰ whose orientation (specified according to 3.3.9) is positive or negative according to the sign of (−1)^j. Similarly, (x¹, ···[x^j]···, xⁿ) is a coordinate system on I_j¹ whose orientation (specified according to 3.3.9) is positive or negative according to the sign of (−1)^{j−1}. We summarize this by specifying the required sign as follows:
Positive n-form on Iⁿ: dx¹ ∧ ··· ∧ dxⁿ
Positive (n−1)-form on I_j⁰: (−1)^j dx¹ ∧ ···[dx^j]··· ∧ dxⁿ
Positive (n−1)-form on I_j¹: (−1)^{j−1} dx¹ ∧ ···[dx^j]··· ∧ dxⁿ
A general (n−1)-form can be written as

ω = Σ_j f_j dx¹ ∧ ···[dx^j]··· ∧ dxⁿ.

To prove Stokes's formula it therefore suffices to consider

ω_j = f_j dx¹ ∧ ···[dx^j]··· ∧ dxⁿ.

Compute:

∫_{Iⁿ} dω_j = ∫_{Iⁿ} (∂f_j/∂x^j) dx^j ∧ dx¹ ∧ ···[dx^j]··· ∧ dxⁿ
  [definition of dω]
= ∫_{Iⁿ} (−1)^{j−1} (∂f_j/∂x^j) dx¹ ∧ ··· ∧ dxⁿ
  [move dx^j into place]
= ∫₀¹ ··· ∫₀¹ (−1)^{j−1} (∂f_j/∂x^j) dx¹ ··· dxⁿ
  [repeated integral: Definition 3.3.2]
= ∫₀¹ ··· ∫₀¹ { ∫₀¹ (∂f_j/∂x^j) dx^j } (−1)^{j−1} dx¹ ···[dx^j]··· dxⁿ
  [integrate first over x^j]
= ∫₀¹ ··· ∫₀¹ [ f_j ]_{x^j=0}^{x^j=1} (−1)^{j−1} dx¹ ···[dx^j]··· dxⁿ
  [Fundamental Theorem of Calculus]
= ∫₀¹ ··· ∫₀¹ f_j|_{x^j=1} (−1)^{j−1} dx¹ ···[dx^j]··· dxⁿ + ∫₀¹ ··· ∫₀¹ f_j|_{x^j=0} (−1)^j dx¹ ···[dx^j]··· dxⁿ
= ∫_{I_j¹} f_j dx¹ ∧ ···[dx^j]··· ∧ dxⁿ + ∫_{I_j⁰} f_j dx¹ ∧ ···[dx^j]··· ∧ dxⁿ
  [Definition 3.3.2 and the signs specified above]
= ∫_{∂Iⁿ} ω_j
  [∫_{I_k⁰} ω_j = ∫_{I_k¹} ω_j = 0 if k ≠ j, because dx^k = 0 on I_k⁰, I_k¹]
This proves the formula for a cube. To prove it in general we need to appeal to a theorem in topology, which implies that any bounded submanifold can be subdivided into a finite number of coordinate cubes, a procedure familiar from surface integrals and volume integrals in R³. (A coordinate cube is a subset of M which becomes a cube in a suitable coordinate system.)

Remarks. (a) Actually, the solid cube I n is not a bounded submanifold of Rn ,


because of its edges and corners (intersections of two or more faces Ij0,1 ). One
way to remedy this sort of situation is to argue that R can be approximated by
bounded submanifolds (by rounding off edges and corners) in such a way that
both sides of Stokes’s formula approach the desired limit.
(b) There is another approach to integration theory in which integrals over bounded m-dimensional submanifolds are replaced by integrals over formal linear combinations (called "chains") of m-cubes, γ = Σ_k c_k γ_k. Each γ_k is a map γ_k : Q → M of the standard cube Q in R^m and the integral of an m-form ω over γ is simply defined as

∫_γ ω = Σ_k c_k ∫_Q γ_k* ω.

The advantage of this procedure is that one does not have to worry about subdivisions into cubes, since this is already built in. The disadvantage is that it is often much more natural to integrate over bounded submanifolds rather than over chains. (Think of the surface and volume integrals from calculus, for example.)

Appendix: partition of unity


We fix an n-dimensional manifold M. The definition of the integral by a subdivision of the domain of integration into pieces contained in coordinate domains is hard to use for proofs. There is an alternative procedure, which cuts up the form rather than the domain, and which is more suitable for theoretical considerations. It is based on the following lemma.
Lemma. Let D be a compact subset of M. There are continuous non-negative functions g₁, ···, g_l on M so that

g₁ + ··· + g_l ≡ 1 on D,  g_k ≡ 0 outside of some coordinate ball B_k.

Proof. For each point p ∈ M one can find a continuous function h_p so that h_p ≡ 0 outside a coordinate ball B_p around p while h_p > 0 on a smaller coordinate ball C_p ⊂ B_p. Since D is compact, it is covered by finitely many of the C_p, say C₁, ···, C_l. Let h₁, ···, h_l be the corresponding functions h_p, and B₁, ···, B_l the corresponding coordinate balls B_p. The function Σ h_k is > 0 on D, hence one can find a strictly positive continuous function h on M so that h ≡ Σ h_k on D (e.g. h = max{ε, Σ h_k} for sufficiently small ε > 0). The functions g_k = h_k/h have all the required properties.
Such a family of functions {g_k} will be called a (continuous) partition of unity on D and we write 1 = Σ g_k (on D).
We now define integrals ∫_D ω without cutting up D. Let ω be an n-form on M. Exceptionally, we do not require that it be C∞, only that locally ω = f dx¹ ∧ ··· ∧ dxⁿ with f integrable (in the sense of Riemann or Lebesgue; it does not matter here). We only need to define integrals over all of M, since we can always make ω ≡ 0 outside of some domain D, and the integral ∫_M ω is already defined if ω ≡ 0 outside of some coordinate domain.
Proposition and definition. Let ω be an n-form on M vanishing outside of a compact set D. Let 1 = Σ g_k (on D) be a continuous partition of unity on D as in the lemma. Then the number

∫_{B₁} g₁ω + ··· + ∫_{B_l} g_l ω

is independent of the partition of unity chosen and is called the integral of ω over M, denoted ∫_M ω.
Proof. Let {g_i} and {g̃_j} be two partitions of unity for D. Then on D,

ω = Σ_i g_i ω = Σ_j g̃_j ω.

Multiply through by g_k for a fixed k and integrate to get:

∫_M g_k Σ_i g_i ω = ∫_M g_k Σ_j g̃_j ω.

Since g_k vanishes outside of a coordinate ball, the additivity of the integral on Rⁿ gives

Σ_i ∫_M g_k g_i ω = Σ_j ∫_M g_k g̃_j ω.

Now add over k:

Σ_{i,k} ∫_M g_k g_i ω = Σ_{j,k} ∫_M g_k g̃_j ω.

Since each g_i and g̃_j vanishes outside a coordinate ball, the same additivity gives

Σ_i ∫_M (Σ_k g_k) g_i ω = Σ_j ∫_M (Σ_k g_k) g̃_j ω.

Since Σ_k g_k = 1 on D this gives

Σ_i ∫_M g_i ω = Σ_j ∫_M g̃_j ω
as required.
Thus the integral ∫_M ω is defined whenever ω is locally integrable and vanishes outside of a compact set.
Remark. Partitions of unity 1 = Σ g_k as in the lemma can be found with the g_k even of class C∞. But this is not required here.
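The construction in the lemma is easy to carry out explicitly in one dimension. The sketch below is not from the notes: on M = R with D = [0, 1], the h_k are hat functions supported in overlapping intervals (playing the role of the coordinate balls B_k), and dividing by h = max{ε, Σ h_k} produces continuous g_k with Σ g_k ≡ 1 on D.

```python
# Sketch of the partition-of-unity lemma on M = R with D = [0, 1].
# The h_k are hat functions supported on overlapping intervals; the
# g_k = h_k / max(eps, sum h_k) then sum to 1 on D.
def hat(center, radius):
    return lambda x: max(0.0, 1.0 - abs(x - center) / radius)

centers = [0.0, 0.25, 0.5, 0.75, 1.0]
hs = [hat(c, 0.3) for c in centers]   # each vanishes outside its interval

eps = 1e-3
def h(x):
    return max(eps, sum(hk(x) for hk in hs))

gs = [lambda x, hk=hk: hk(x) / h(x) for hk in hs]

# On D the g_k sum to 1; each g_k vanishes outside its interval.
samples = [i / 100 for i in range(101)]
print(max(abs(sum(g(x) for g in gs) - 1.0) for x in samples))  # 0 up to rounding
```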

EXERCISES 3.3
1. Verify in detail the assertion in the proof of Theorem 3.3.11 that “the for-
mula reduces to the usual change of variables formula 3.3.1“. Explain how the
orientation on F (R) is defined.
2. Let C be the parabola with equation y = 2x2 .
(a) Prove that C is a submanifold of R2 .
(b) Let ω be the differential 1-form on R² defined by

ω = 3xy dx + y² dy.

Find ∫_U ω where U is the part of C between the points (0, 0) and (1, 2).
3. Let S be the cylinder in R3 with equation x2 + y 2 = 16.
(a) Prove that S is a submanifold of R3 .
(b) Let ω be the differential 2-form on R³ defined by

ω = z dy ∧ dz − x dx ∧ dz − 3y²z dx ∧ dy.

Find ∫_U ω where U is the part of S between the planes z = 2 and z = 5.
4. Let ω be the 1-form in R3 defined by

ω = (2x − y)dx − yz 2 dy − y 2 zdz.

Let S be the sphere in R3 with equation x2 + y 2 + z 2 = 1 and C the circle


x² + y² = 1, z = 0 in the xy-plane. Calculate the following integrals from the definitions given in this section. Explain the result in view of Stokes's Theorem.
(a) ∫_C ω.
(b) ∫_{S⁺} dω, where S⁺ is the half of S in z ≥ 0.

5. Let S be the surface in R3 with equation z = f (x, y) where f is a C ∞


function.
(a) Show that S is a submanifold of R3 .
(b) Find a formula for the Riemann metric ds2 on S obtained by restriction of
the Euclidean metric in R3 .

(c) Show that the area (see Example 3.3.3) of an open subset U of S is given by the integral

∫_U √( (∂f/∂x)² + (∂f/∂y)² + 1 ) dx ∧ dy.
6. Let S be the hypersurface in Rn with equation F (x1 , · · · , xn ) = 0 where F
is a C ∞ function satisfying (∂F/∂xn )p 6= 0 at all points p on S.
(a) Show that S is an (n−1)-dimensional submanifold of Rn and that x1 , · · · , xn−1
form a coordinate system in a neighbourhood of any point of S.
(b) Show that the Riemann metric ds² on S obtained by restriction of the Euclidean metric in Rⁿ in the coordinates x¹, ···, xⁿ⁻¹ on S is given by g_{ij} = δ_{ij} + F_n^{−2} F_i F_j (1 ≤ i, j ≤ n−1), where F_i = ∂F/∂x^i.
(c) Show that the volume (see Example 3.3.3) of an open subset U of S is given by the integral

∫_U [ √( (∂F/∂x¹)² + ··· + (∂F/∂xⁿ)² ) / |∂F/∂xⁿ| ] dx¹ ∧ ··· ∧ dxⁿ⁻¹.

[Suggestion. To calculate det[g_{ij}] use an orthonormal basis e₁, ···, e_n for Rⁿ with first vector

e₁ = (F₁, ···, F_n)/√(F₁² + ··· + F_n²). ]
7. Let B = {p = (x, y, z) ∈ R3 | x2 + y 2 + z 2 ≤ 1} be the solid unit ball in R3 .
a) Show that B is a bounded submanifold of R3 with boundary the unit sphere
S2.
b) Give R3 the usual orientation, so that the standard coordinate system (x, y, z)
is positive. Show that the induced orientation on S 2 (Definition 3.3.9) corre-
sponds to the outward normal on S 2 .
8. Prove Lemma 3.3.7

3.4 Lie derivatives


Throughout this section we fix a C ∞ manifold M . All curves, functions, vector
fields, etc. are understood to be C ∞ . The following theorem is a consequence
of the existence and uniqueness theorem for systems of ordinary differential
equations, which we take for granted.
3.4.1 Theorem. Let X be a C∞ vector field on M. For any p_o ∈ M the differential equation

p′(t) = X(p(t)) (1)

with the initial condition p(0) = p_o has a unique C∞ solution p(t) defined in some interval about t = 0.
Write momentarily p(t) = p(X, t) to bring out its dependence on X. For any a ∈ R one has

p(aX, t) = p(X, at)

because, as a function of t, both sides of this equation satisfy p′(t) = aX(p(t)), p(0) = p_o. Hence p(X, t) depends only on tX, namely p(X, t) = p(tX, 1), whenever defined. It also depends on p_o, of course; we shall now denote it by the symbol exp(tX)p_o. Thus (with p_o replaced by p) the defining property of exp(tX)p is

d/dt exp(tX)p = X(exp(tX)p),  exp(tX)p|_{t=0} = p. (2)
We think of exp(tX) : M ··· → M as a partially defined transformation of M, whose domain consists of those p ∈ M for which exp(tX)p exists in virtue of the above theorem. More precisely, the map (t, p) → exp(tX)p, R × M ··· → M, is a C∞ map defined in a neighbourhood of {0} × M in R × M. In the future we shall not belabor this point: expressions involving exp(tX)p are understood to be valid whenever defined.
3.4.2 Theorem. (a) For all s, t ∈ R one has

exp((s + t)X)p = exp(sX) exp(tX)p, (3)

exp(−tX) exp(tX)p = p, (4)


where defined.
(b) For every C∞ function f : M ··· → R defined in a neighbourhood of p on M,

f(exp(tX)p) ∼ Σ_{k=0}^{∞} (t^k/k!) X^k f(p), (5)

as Taylor series at t = 0 in the variable t.



Comment. In (b) the vector field X is considered a differential operator, given by X = Σ X^i ∂/∂x^i in coordinates. The formula may be taken as justification for the notation exp(tX)p.
Proof. a) Fix p and t and consider both sides of equation (13) as function of
s only. Assuming t is within the interval of definition of exp(tX)p, both sides
of the equation are defined for s in an interval about 0 and C ∞ there. We
check that the right side satisfies the differential equation and initial condition
defining the left side:

(d/ds) exp((s + t)X)p = [(d/dr) exp(rX)p]r=s+t = X(exp((s + t)X)p),

exp((s + t)X)p|s=0 = exp(tX)p.


This proves (3), and (4) is a consequence thereof.
(b) It follows from (2) that
d
f (exp(tX)p) = Xf (exp(tX)p).
dt
Using this equation repeatedly to calculate the higher-order derivatives one finds
that the Taylor series of f (p(t)) at t = 0 may be written as
f (exp(tX)p) ∼ ∑_k (t^k /k!) X^k f (p),

as desired.
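The Taylor-series formula (5) can be tried numerically for the rotation field X = −y ∂/∂x + x ∂/∂y (the field of Example 3.4.3 below), whose flow is known in closed form. The following sketch represents polynomials in (x, y) exactly, so the iterates X^k f are computed without differentiation error; all helper names are ad hoc, not from the text.

```python
import math

# A polynomial in (x, y) is a dict {(i, j): coeff} representing sum c * x^i * y^j.
def padd(p, q):
    r = dict(p)
    for k, v in q.items():
        r[k] = r.get(k, 0.0) + v
    return r

def pmul(p, q):
    r = {}
    for (i, j), c in p.items():
        for (k, l), d in q.items():
            r[(i + k, j + l)] = r.get((i + k, j + l), 0.0) + c * d
    return r

def pdiff(p, var):  # var = 0 for d/dx, var = 1 for d/dy
    r = {}
    for (i, j), c in p.items():
        e = (i, j)[var]
        if e:
            k = (i - 1, j) if var == 0 else (i, j - 1)
            r[k] = r.get(k, 0.0) + c * e
    return r

def peval(p, x, y):
    return sum(c * x**i * y**j for (i, j), c in p.items())

# X = -y d/dx + x d/dy acting as a differential operator: Xf = -y f_x + x f_y
X_COMP = ({(0, 1): -1.0}, {(1, 0): 1.0})

def apply_X(f):
    return padd(pmul(X_COMP[0], pdiff(f, 0)), pmul(X_COMP[1], pdiff(f, 1)))

def taylor_flow(f, t, p, terms=25):
    # partial sum of (5): sum_k (t^k / k!) (X^k f)(p)
    total, g = 0.0, f
    for k in range(terms):
        total += t**k / math.factorial(k) * peval(g, *p)
        g = apply_X(g)
    return total

f = {(1, 0): 1.0}        # f(x, y) = x
t, p = 0.7, (2.0, 3.0)
series = taylor_flow(f, t, p)
exact = p[0]*math.cos(t) - p[1]*math.sin(t)   # f(exp(tX)p) for the rotation flow
print(abs(series - exact) < 1e-10)
```

For f(x, y) = x the iterates cycle through −y, −x, y, x, and the partial sums reproduce x cos t − y sin t, i.e. the first component of the rotation flow.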
The family {exp(tX) : M · · · → M | t ∈ R} of locally defined transformations
on M is called the one parameter group of (local) transformations, or the flow,
generated by the vector field X.
3.4.3 Example. The family of transformations of M = R2 generated by the
vector field
∂ ∂
X = −y +x
∂x ∂y
is the one–parameter family of rotations given by
exp(tX)(x, y) = ((cos t)x − (sin t)y, (sin t)x + (cos t)y).
It is of course defined for all (x, y).
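As a sanity check on Theorem 3.4.1 and the defining property (2), one can integrate p′(t) = X(p(t)) numerically for this example and compare with the closed-form flow. A Python sketch (the Runge–Kutta integrator and helper names are ad hoc, not from the text):

```python
import math

def X(p):
    # X = -y d/dx + x d/dy, viewed as a map p -> X(p)
    x, y = p
    return (-y, x)

def flow_rk4(p0, t, steps=1000):
    # integrate p'(t) = X(p(t)), p(0) = p0, by the classical Runge-Kutta method
    h = t / steps
    x = p0
    for _ in range(steps):
        k1 = X(x)
        k2 = X((x[0] + h/2*k1[0], x[1] + h/2*k1[1]))
        k3 = X((x[0] + h/2*k2[0], x[1] + h/2*k2[1]))
        k4 = X((x[0] + h*k3[0], x[1] + h*k3[1]))
        x = (x[0] + h/6*(k1[0] + 2*k2[0] + 2*k3[0] + k4[0]),
             x[1] + h/6*(k1[1] + 2*k2[1] + 2*k3[1] + k4[1]))
    return x

def exp_tX(p, t):
    # the closed form of Example 3.4.3
    x, y = p
    return (math.cos(t)*x - math.sin(t)*y, math.sin(t)*x + math.cos(t)*y)

p0, t = (1.0, 2.0), 1.5
num, exact = flow_rk4(p0, t), exp_tX(p0, t)
err = max(abs(num[0]-exact[0]), abs(num[1]-exact[1]))
print(err < 1e-8)
```

The numerical solution agrees with the rotation formula to roughly the accuracy of the integrator, as the uniqueness part of Theorem 3.4.1 predicts.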
3.4.4 Pull–back of tensor fields. Let F : N → M be a C ∞ map of manifolds.
If f is a function on M we denote by F ∗ f the function on N defined by
F ∗ f (p) = f (F (p))
i.e. F ∗ f = f ◦ F . More generally, if ϕ is a covariant tensor field on M , say of
type (0, k), then we can consider ϕ a multilinear function on vector fields on M
and define F ∗ ϕ by the rule
F ∗ ϕ(X, Y, · · · ) = ϕ(dF (X), dF (Y ), · · · )
162 CHAPTER 3. CALCULUS ON MANIFOLDS

for all vector fields X, Y, · · · on N , i.e. F ∗ ϕ = ϕ ◦ dF as multilinear function


on vector fields. For general tensor fields one cannot define such a pull–back
operation in a natural way for all maps F . Suppose however that F is a
diffeomorphism from N onto M . Then with every vector field X on M we can
associate a vector field F ∗ X on N by the rule

(F ∗ X)(p) = dFp−1 (X(F (p))).

The pull–back operation F ∗ can now be uniquely extended to arbitrary tensor


fields so as to respect tensor addition and tensor multiplication, i.e. so that

F ∗ (S + T ) = (F ∗ S) + (F ∗ T ), (6)

F ∗ (S ⊗ T ) = (F ∗ S) ⊗ (F ∗ T ) (7)
for all tensor fields S, T on M . In fact, if we use coordinates (xi ) on M and (y j )
on N , write p = F (q) as xj = xj (y 1 , · · · , y n ), and express tensors as a linear
combination of the tensor products of the dxi , ∂/∂xi and dy a , ∂/∂y b all of this
amounts to the familiar rule

(F ∗ T )^{ab···}_{cd···} = T^{ij···}_{kl···} (∂y^a /∂x^i )(∂y^b /∂x^j ) · · · (∂x^k /∂y^c )(∂x^l /∂y^d ) · · · (8)
as one may check. Still another way to characterize F ∗ T is obtained by con-
sidering a tensor T of type (r, s) as a multilinear function of r covectors and s
vectors. Then one has

(F ∗ T )(F ∗ ϕ, F ∗ ψ, · · · , F ∗ X, F ∗ Y, · · · ) = T (ϕ, ψ, · · · , X, Y, · · · ) (9)

for any r covector fields ϕ, ψ, · · · on M and any s vector fields X, Y, · · · on M .


If f is a scalar function on M , then the chain rule says that

d(F ∗ f ) = F ∗ (df ). (10)

If f is a scalar function and X a vector field, then the scalar functions Xf = df (X)
on M and (F ∗ X)(F ∗ f ) satisfy

F ∗ (Xf ) = (F ∗ X)(F ∗ f ). (11)

If we write g = F ∗ f this equation says that

(F ∗ X)g = [F ∗ ◦ X ◦ (F ∗ )−1 ]g (12)

as operators on scalar functions g on N , which is sometimes useful.


The pull–back operation for a composite of maps satisfies

(G ◦ F )∗ = F ∗ ◦ G∗ (13)

whenever defined. Note the reversal of the order of composition! (The verifica-
tions of all of these rules are left as exercises.)
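Formula (8) can be tried out concretely. In the following sketch (helper names ad hoc), F maps polar coordinates (r, θ) on N to Cartesian coordinates (x, y) on M , and the Euclidean metric, a (0, 2) tensor with components δkl , is pulled back by (8); the result should be the familiar polar metric diag(1, r^2 ). The partials ∂x^k /∂y^c are taken by central differences.

```python
import math

def F(y):
    # F : N -> M, (r, theta) |-> (r cos theta, r sin theta)
    r, th = y
    return (r*math.cos(th), r*math.sin(th))

def jacobian(F, y, h=1e-6):
    # J[k][c] = dx^k / dy^c by central differences
    n = len(y)
    J = [[0.0]*n for _ in range(n)]
    for c in range(n):
        yp = list(y); ym = list(y)
        yp[c] += h; ym[c] -= h
        xp, xm = F(yp), F(ym)
        for k in range(n):
            J[k][c] = (xp[k] - xm[k]) / (2*h)
    return J

# Euclidean metric on M as a (0,2) tensor: T_kl = delta_kl
T = [[1.0, 0.0], [0.0, 1.0]]

def pullback_02(T, F, y):
    # (F*T)_cd = T_kl (dx^k/dy^c)(dx^l/dy^d)  -- formula (8) for a (0,2) tensor
    J = jacobian(F, y)
    n = len(y)
    return [[sum(T[k][l]*J[k][c]*J[l][d] for k in range(n) for l in range(n))
             for d in range(n)] for c in range(n)]

r, th = 2.0, 0.8
G = pullback_02(T, F, (r, th))
# expect the polar-coordinate metric dr^2 + r^2 dtheta^2, i.e. diag(1, r^2)
print(abs(G[0][0]-1) < 1e-6, abs(G[1][1]-r*r) < 1e-6, abs(G[0][1]) < 1e-6)
```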

We shall apply these definitions with F = exp(tX) : M · · · → M . Of course,


this is only partially defined, so we need to take for N a suitably small open
set in M and then take for M its image. As usual, we shall not mention this
explicitly. Thus if T is a tensor field on M then exp(tX)∗ T will be a tensor field
of the same type. They need not be defined on all of M , but if T is defined at
a given p ∈ M , so is exp(tX)∗ T for all t sufficiently close to t = 0. In particular
we can define

LX T (p) := [(d/dt) exp(tX)∗ T (p)]t=0 = lim t→0 (1/t)[exp(tX)∗ T (p) − T (p)]. (14)

This is called the Lie derivative of T along X. It is very important to note that
for general tensor fields T the Lie derivative LX T (p) depends on more than just
the value of the vector field X at p, unlike the directional derivative Xf = df (X)
for scalar functions. (This will be seen from the discussion of special types of
tensor fields below.) We shall momentarily see how to compute it. The following
rules follow immediately from the definition.
3.4.5 Lemma. Let X be a vector field, S, T tensor fields. Then
a) LX (S + T ) = LX (S) + LX (T )
b) LX (S · T ) = LX (S) · T + S · LX (T )
c) LX (ϕ ∧ ψ) = LX (ϕ) ∧ ψ + ϕ ∧ LX (ψ)
d) F ∗ (LX (T )) = LF ∗ X (F ∗ T )
Explanation. In (a) we assume that S and T are of the same type, so that
the sum is defined. In (b) we use the symbol S · T to denote any (partial)
contraction with respect to some components. In (c) ϕ and ψ are differential
forms. (d) requires that F be a diffeomorphism so that F ∗ T and F ∗ X are
defined.
Proof. (a) is clear from the definition. (b) follows directly from the definition
(14) of LX T (p) as a limit, just like the product rule for scalar functions, as
follows. Let f (t) = exp(tX)∗ S and g(t) = exp(tX)∗ T . We have
1 1 1
[f (t) · g(t) − f (0) · g(0)] = [f (t) − f (0)] · g(t) + f (0) · [g(t) − g(0)], (15)
t t t
which gives (b) as t → 0. (c) is proved in the same way. To prove (d) it suffices
to consider for T scalar functions, vector fields, and covector fields, since any
tensor is built from these using sums and tensor products, to which the rules
(a) and (b) apply. The details are left as an exercise.
Remark. The only property of the “product“ S · T needed to prove a product
rule using (15) is its R–bilinearity.
3.4.6 Lie derivative of scalar functions. Let f be a scalar function. From
(14) we get

LX f (p) = [(d/dt) exp(tX)∗ f (p)]t=0 = [(d/dt) f (exp(tX)p)]t=0 = dfp (X(p)) = Xf (p),

the usual directional derivative along X.


3.4.7 Lie derivative of vector fields. Let Y be another vector field. From
(12) we get
(exp(tX)∗ Y )f = [exp(tX)∗ ◦ Y ◦ exp(−tX)∗ ]f
as operators on scalar functions f . Differentiating at t = 0, we get

(LX Y )f = [(d/dt) [exp(tX)∗ ◦ Y ◦ exp(−tX)∗ ]f ]t=0 (16)

This is easily computed with the help of the formula (5), which says that

exp(tX)∗ f ∼ ∑_k (t^k /k!) X^k f. (17)
One finds that
[exp(tX)∗ ◦ Y ◦ exp(−tX)∗ ]f
= [1 + tX + · · · ]Y [1 − tX + · · · ]f
= [Y + t(XY − Y X) + · · · ]f
Thus (16) gives the basic formula

(LX Y )f = (XY − Y X)f (18)

as operators on functions. At first sight this formula looks strange, since a


vector field Z = ∑_i Z^i (∂/∂x^i )

is a first–order differential operator, so the left side of (18) has order one as
differential operator, while the right side appears to have order two. The
explanation comes from the following lemma.
3.4.8 Lemma. Let X, Y be two vector fields on M . There is a unique vector
field, denoted [X, Y ], so that [X, Y ] = XY − Y X as operators on functions
defined on open subsets of M .
Proof. Write locally in coordinates x1 , x2 , · · · , xn on M :
X = ∑_k X^k ∂/∂x^k ,    Y = ∑_k Y^k ∂/∂x^k .

By a simple computation using the symmetry of second partials one sees that
Z = ∑_k Z^k ∂/∂x^k satisfies Zf = XY f − Y Xf for all analytic functions f on the
coordinate domain if and only if
Z^k = ∑_j ( X^j ∂Y^k /∂x^j − Y^j ∂X^k /∂x^j ). (19)

This formula defines [X, Y ] on the coordinate domain. Because of the unique-
ness, the Z’s defined on the coordinate domains of two coordinate systems agree
on the intersection, from which one sees that Z is defined globally on the whole
manifold (assuming X and Y are).
The bracket operation (19) on vector fields is called the Lie bracket. Using this
the formula (18) becomes
LX Y = [X, Y ]. (20)
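The coordinate formula (19) lends itself to a direct numerical check. The sketch below (helper names ad hoc) evaluates (19) with central-difference partials for the rotation field X = −y ∂/∂x + x ∂/∂y and Y = ∂/∂x, for which one computes by hand [X, Y ] = −∂/∂y.

```python
def bracket(X, Y, p, h=1e-5):
    # [X,Y]^k = sum_j ( X^j dY^k/dx^j - Y^j dX^k/dx^j ),  formula (19),
    # with the partial derivatives taken by central differences at p
    n = len(p)
    def partial(V, k, j):
        pp = list(p); pm = list(p)
        pp[j] += h; pm[j] -= h
        return (V(pp)[k] - V(pm)[k]) / (2*h)
    return [sum(X(p)[j]*partial(Y, k, j) - Y(p)[j]*partial(X, k, j)
                for j in range(n)) for k in range(n)]

# X = -y d/dx + x d/dy (the rotation field), Y = d/dx
X = lambda p: (-p[1], p[0])
Y = lambda p: (1.0, 0.0)

print(bracket(X, Y, [0.3, -1.2]))   # expect approximately [0, -1], i.e. [X,Y] = -d/dy
```

Since Y is constant, only the term −Y^j ∂X^k /∂x^j survives, giving (0, −1) at every point; the finite differences reproduce this up to round-off.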
The Lie bracket [X, Y ] is evidently bilinear in X and Y , and skew–symmetric:

[X, Y ] = −[Y, X].

In addition it satisfies the Jacobi identity, whose proof is left as an exercise:

[[X, Y ], Z] + [[Z, X], Y ] + [[Y, Z], X] = 0. (21)

Remark. A Lie algebra of vector fields is a family L of vector fields which is


closed under R–linear combinations and Lie brackets, i.e.

X, Y ∈ L ⇒ aX + bY ∈ L (for all a, b ∈ R)

X, Y ∈ L ⇒ [X, Y ] ∈ L.

3.4.9 Lie derivative of differential forms. We first discuss another operation


on forms.
3.4.10 Lemma. Let X be a vector field on M . There is a unique operation
ϕ → iX ϕ which associates to each k–form ϕ a (k − 1)–form iX ϕ so that
a) iX θ = θ(X) if θ is a 1–form
b) iX (ϕ + ψ) = iX (ϕ) + iX (ψ) for any forms ϕ, ψ
c) iX (α ∧ β) = (iX α) ∧ β + (−1)|α| α ∧ (iX β) if α is homogeneous of degree
|α|.
This operator iX is given by
d) (iX ϕ)(X1 , · · · , Xk−1 ) = ϕ(X, X1 , · · · , Xk−1 )
for any vector fields X1 , · · · , Xk−1 . It satisfies the following rules
e) iX ◦ iX = 0
f ) iX+Y = iX + iY for any two vector fields X, Y
g) if X = f iX for any scalar function f
h) F ∗ ◦ iX = iF ∗ X ◦ F ∗ for any diffeomorphism F .
Remark. If f is a scalar function it is understood that iX f = 0, since there
are no (−1)–forms.
Outline of proof. It is clear that there is at most one operation satisfying
(a)−(c), since any form can be built from 1–forms by wedge–products and sums.
Property (d) does define some operation ϕ → iX ϕ and one can check that it
satisfies (a) − (c). The rest consists of easy verifications.
We now turn to the Lie derivative on forms.

3.4.11 Lemma. For all vector fields X, Y we have


a) d ◦ LX = LX ◦ d
b) i[X,Y ] = LX ◦ iY − iY ◦ LX
as operators on forms.
Proof. a) Since generally F ∗ (dϕ) = d(F ∗ ϕ), we have d(exp(tX)∗ ϕ) = exp(tX)∗ (dϕ).
Differentiation at t = 0 gives d(LX ϕ) = LX (dϕ), hence (a).
b) By 3.4.10, (h)

exp(tX)∗ (iY ϕ) = iexp(tX)∗ Y (exp(tX)∗ ϕ).

The derivative at t = 0 of the right side can again be evaluated as a limit using
(15) if we write it as a symbolic product f (t) · g(t) with f (t) = exp(tX)∗ Y and
g(t) = exp(tX)∗ ϕ. This gives

LX (iY ϕ) = i[X,Y ] ϕ + iY LX ϕ,

which is equivalent to (b).


3.4.12 Lemma. Let θ be a 1–form and X, Y two vector fields. Then

Xθ(Y ) − Y θ(X) = dθ(X, Y ) + θ([X, Y ]) (22)

Proof. We use coordinates. The first term on the left is

Xθ(Y ) = X^i (∂/∂x^i )(θj Y^j ) = X^i Y^j ∂θj /∂x^i + X^i θj ∂Y^j /∂x^i .

Interchanging X and Y and subtracting gives

Xθ(Y ) − Y θ(X) = (X^i Y^j − Y^i X^j ) ∂θj /∂x^i + θj (X^i ∂Y^j /∂x^i − Y^i ∂X^j /∂x^i ) = dθ(X, Y ) + θ([X, Y ])
as desired.
The following formula is often useful.
3.4.13 Lemma (Cartan’s Homotopy Formula). For any vector field X,

LX = d ◦ iX + iX ◦ d (23)

as operators on forms.
Proof. We first verify that the operators A on both sides of this relation satisfy

A(ϕ + ψ) = (Aϕ) + (Aψ), A(ϕ ∧ ψ) = (Aϕ) ∧ ψ + ϕ ∧ (Aψ). (24)

For A = LX this follows from Lemma 3.4.5 (c). For A = d ◦ iX + iX ◦ d,


compute:
iX (α ∧ β) = (iX α) ∧ β + (−1)|α| α ∧ (iX β)
d◦iX (α∧β) = (diX α)∧β+(−1)|α|−1 (iX α)∧(dβ)+(−1)|α| (dα)∧iX β+α∧(diX β)

and similarly

iX ◦d(α∧β) = (iX dα)∧β+(−1)|α|−1 (dα)∧(iX β)+(−1)|α| (iX α)∧dβ+α∧(iX dβ).

Adding the last two equations one gets the relation (24) for A = d ◦ iX + iX ◦ d.
Since any form can be built form scalar functions f and 1–forms θ using sums
and wedge products one sees from (24) that it suffices to show that (23) is true
for these. For a scalar function f we have:

LX f = df (X) = iX df

For a 1–form θ we have

(LX θ)(Y ) = Xθ(Y ) − θ([X, Y ]) by 3.4.11 (b)

= Y θ(X) + dθ(X, Y ) by (22)


= (diX θ)(Y ) + (iX dθ)(Y )
as required.
3.4.14 Corollary. For any vector field X, LX = (d + iX )2 .
3.4.15 Example: Nelson’s parking problem. This example is intended to
give some feeling for the Lie bracket [X, Y ]. We start with the formula

exp(tX) exp(tY ) exp(−tX) exp(−tY ) = exp(t^2 [X, Y ] + o(t^2 )) (25)

which is proved like (16). In this formula, replace t by √(t/k), raise both sides
to the power k = 1, 2, · · · , and take the limit as k → ∞. There results

lim k→∞ [ exp(√(t/k) X) exp(√(t/k) Y ) exp(−√(t/k) X) exp(−√(t/k) Y ) ]^k = exp(t[X, Y ]),
(26)
which we interpret as an iterative formula for the one−parameter group gener-
ated by [X, Y ]. We now join Nelson (Tensor Analysis, 1967, p.33).
Consider a car. The configuration space of a car is the four dimensional mani-
fold M = R2 ×S 1 ×S 1 parametrized by (x,y, φ, θ), where (x,y) are the Cartesian
coordinates of the center of the front axle, the angle φ measures the direction
in which the car is headed, and θ is the angle made by the front wheels with
the car. (More realistically, the configuration space is the open submanifold
−θmax < θ < θmax .) See Figure 1.
There are two distinguished vector fields, called Steer and Drive, on M corre-
sponding to the two ways in which we can change the configuration of a car.
Clearly

Steer = ∂/∂θ (27)
since in the corresponding flow θ changes at a uniform rate while x, y and
φ remain the same. To compute Drive, suppose that the car, starting in the

configuration (x, y, φ, θ) moves an infinitesimal distance h in the direction in


which the front wheels are pointing.
In the notation of Figure 1,

D = (x + h cos(φ + θ) + o(h), y + h sin(φ + θ) + o(h)).

Fig. 1. Nelson’s car

Let l = AB be the length of the tie rod (if that is the name of the thing connecting
the front and rear axles). Then CD = l too, since the tie rod does not change
length (in non–relativistic mechanics). It is readily seen that CE = l + o(h),
and since DE = h sin θ + o(h), the angle BCD (which is the increment in φ) is
(h sin θ/l) + o(h) while θ remains the same. Let us choose units so that l = 1.
Then
∂ ∂ ∂
Drive = cos(φ + θ) + sin(φ + θ) + sin θ . (28)
∂x ∂y ∂φ
By (27) and (28),
∂ ∂ ∂
[Steer, Drive] = − sin(φ + θ) + cos(φ + θ) + cos(θ) . (29)
∂x ∂y ∂φ
Let
∂ ∂
Slide = − sin φ + cos φ ,
∂x ∂y

Rotate = .
∂φ
Then the Lie bracket of Steer and Drive is equal to Slide+ Rotate on θ = 0, and
generates a flow which is the simultaneous action of sliding and rotating. This
motion is just what is needed to get out of a tight parking spot. By formula (26)
this motion may be approximated arbitrarily closely, even with the restrictions
−θmax < θ < θmax with θmax arbitrarily small, in the following way: steer,
drive, reverse steer, reverse drive, steer, drive, reverse steer, · · · . What makes the
process so laborious is the square roots in (26).
Let us denote the Lie bracket (29) of Steer and Drive by Wriggle. Then further
simple computations show that we have the commutation relations

[Steer, Drive] = Wriggle,
[Steer, Wriggle] = −Drive, (30)
[Wriggle, Drive] = Slide,
and the commutator of Slide with Steer, Drive and Wriggle is zero. Thus the
four vector fields span a four-dimensional Lie algebra over R.
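The relations (29) and (30) can be verified numerically with the coordinate formula (19) for the Lie bracket, in the spirit of exercises 25 and 26 below. A Python sketch (names ad hoc); the partial derivatives are taken by central differences at a sample configuration.

```python
import math

# configuration (x, y, phi, theta); the fields of Nelson's example
def steer(p):   return (0.0, 0.0, 0.0, 1.0)
def drive(p):   x, y, phi, th = p; return (math.cos(phi+th), math.sin(phi+th), math.sin(th), 0.0)
def slide(p):   x, y, phi, th = p; return (-math.sin(phi), math.cos(phi), 0.0, 0.0)
def wriggle(p): x, y, phi, th = p; return (-math.sin(phi+th), math.cos(phi+th), math.cos(th), 0.0)

def bracket(X, Y, p, h=1e-5):
    # coordinate formula (19) with numerical partial derivatives
    n = len(p)
    def partial(V, k, j):
        pp = list(p); pm = list(p); pp[j] += h; pm[j] -= h
        return (V(pp)[k] - V(pm)[k]) / (2*h)
    return [sum(X(p)[j]*partial(Y, k, j) - Y(p)[j]*partial(X, k, j) for j in range(n))
            for k in range(n)]

def close(u, v, tol=1e-7):
    return all(abs(a-b) < tol for a, b in zip(u, v))

p = [0.2, -0.5, 0.9, 0.3]
print(close(bracket(steer, drive, p), wriggle(p)))                # [Steer, Drive] = Wriggle
print(close(bracket(steer, wriggle, p), [-c for c in drive(p)]))  # [Steer, Wriggle] = -Drive
print(close(bracket(wriggle, drive, p), slide(p)))                # [Wriggle, Drive] = Slide
```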
To get out of an extremely tight parking spot, Wriggle is insufficient because it
may produce too much rotation. The last commutation relation shows, however,
that one may get out of an arbitrarily tight parking spot in the following way:
wriggle, drive, reverse wriggle, (this requires a cool head), reverse drive, wriggle,
drive, · · · .

EXERCISES 3.4
1. Prove (8).
2. Prove (9).
3. Prove (11).
4. Prove (13).
5. Prove the formula for exp(tX) in example 3.4.3.
6. Fix a non–zero vector v ∈ R3 and let X be the vector field on R3 defined by

X(p) = v × p (cross-product).

(We identify Tp R3 with R3 itself.) Choose a right-handed orthonormal basis


(e1 , e2 , e3 ) for R3 with e3 a unit vector parallel to v. Show that X generates the
one–parameter group of rotations around v given by the formula

exp(tX)e1 = (cos ct)e1 + (sin ct)e2
exp(tX)e2 = (− sin ct)e1 + (cos ct)e2
exp(tX)e3 = e3 ,
where c = |v|. [For the purpose of this problem, an ordered orthonormal basis
(e1 , e2 , e3 ) is defined to be right-handed if it satisfies the usual cross–product
relation given by the “right-hand rule“, i.e.

e1 × e2 = e3 , e2 × e3 = e1 , e3 × e1 = e2 .

Any orthonormal basis can be ordered so that it becomes right–handed.]


7. Let X be the coordinate vector ∂/∂θ for the geographical coordinate system
(θ, φ) on S 2 (see example 3.2.8). Find a formula for exp(tX)p in terms of the
Cartesian coordinates (x, y, z) of p ∈ S 2 ⊂ R3 .
8. Fix a vector v ∈ Rn and let X be the constant vector field X(p) = v ∈
Tp Rn = Rn . Find a formula for the one–parameter group exp(tX) generated by
X.

9. Let X be a linear transformation of Rn (n × n matrix) considered as a vector


field p → X(p) ∈ Tp Rn = Rn . Show that the one–parameter group exp(tX)
generated by X is given by the matrix exponential

exp(tX)p = ∑_{k=0}^∞ (t^k /k!) X^k p

where X k p is the k–th matrix power of X operating on p ∈ Rn .
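Exercises 6 and 9 fit together: the linear vector field X(p) = v × p with v = c e3 generates the rotation group around e3 . The following Python sketch (helper names ad hoc) sums the series of exercise 9 term by term and compares with the formula of exercise 6.

```python
import math

def mat_vec(A, v):
    return [sum(A[i][j]*v[j] for j in range(len(v))) for i in range(len(A))]

def exp_tX(A, t, v, terms=30):
    # exp(tX)p = sum_k (t^k/k!) X^k p, summed term by term
    out = [0.0]*len(v)
    w = list(v)                       # w = X^k v
    for k in range(terms):
        c = t**k / math.factorial(k)
        out = [o + c*x for o, x in zip(out, w)]
        w = mat_vec(A, w)
    return out

# X(p) = v x p with v = c*e3: as a matrix, rows [0,-c,0], [c,0,0], [0,0,0]
c = 2.0
A = [[0.0, -c, 0.0],
     [c, 0.0, 0.0],
     [0.0, 0.0, 0.0]]
t = 0.6
e1 = [1.0, 0.0, 0.0]
got = exp_tX(A, t, e1)
expected = [math.cos(c*t), math.sin(c*t), 0.0]   # (cos ct)e1 + (sin ct)e2
print(all(abs(a-b) < 1e-10 for a, b in zip(got, expected)))
```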


10. Supply all details in the proof of Lemma 3.4.5.
11. Prove the Jacobi identity (21). [Suggestion. Use the operator formula
[X, Y ] = XY − Y X.]
12. Prove that the operation iX defined by Lemma 3.4.10 (d) satisfies (a)–
(c). [Suggestion. (a) and (b) are clear. For (c), argue that it suffices to take
X = ∂/∂xi , α = dxj1 , β = dxj2 ∧ · · · ∧ dxjk . Argue further that it suffices
to consider i ∈ {j1 , · · · , jk }. Relabel (j1 , · · · , jk ) = (1, · · · , k) so that i = 1 or
i = 2. Write dx1 ∧ dx2 ∧ · · · ∧ dxk = (dx1 ⊗ dx2 − dx2 ⊗ dx2 ) ∧ dx3 ∧ · · · ∧ dxk
and complete the proof. ]
13. Prove Lemma 3.4.10, (e)–(h).
14. Supply all details in the proof of Lemma 3.4.11 (b).
15. Prove in detail that A = d ◦ iX + iX ◦ d satisfies (24), as indicated.
16. Let X be a vector field. Prove that for any scalar function f , 1–form θ,
and vector field Y ,
a) Lf X θ = θ(X)df + f (LX θ)
b) Lf X Y = f ([X, Y ]) − df (Y )X
17. Let X = P (∂/∂x) + Q(∂/∂y) + R(∂/∂z) be a vector field on R3 , vol =
dx ∧ dy ∧ dz the usual volume form. Show that

LX (vol) = (div X) vol

where div X = ∂P/∂x + ∂Q/∂y + ∂R/∂z as usual.
18. Prove the following formula for the exterior derivative dϕ of a (k − 1)–form
ϕ. For any k vector fields X1 , · · · , Xk ,
dϕ(X1 , · · · , Xk ) = ∑_{j=1}^{k} (−1)^{j+1} Xj ϕ(X1 , · · · , X̂j , · · · , Xk )
+ ∑_{i<j} (−1)^{i+j} ϕ([Xi , Xj ], X1 , · · · , X̂i , · · · , X̂j , · · · , Xk )

where the terms with a hat are to be omitted. The term Xj ϕ(X1 , · · · , X̂j , · · · , Xk )
is the differential operator Xj applied to the scalar function ϕ(X1 , · · · , X̂j , · · · , Xk ).
[Suggestion. Use induction on k. Calculate the Lie derivative Xϕ(X1 , · · · , Xk )
of ϕ(X1 , · · · , Xk ) by the product rule. Use LX = diX + iX d.]

19. Let M be an n–dimensional manifold, ϖ an n–form, and X a vector field.
In coordinates (x^i ), write

ϖ = a dx^1 ∧ · · · ∧ dx^n ,    X = X^i ∂/∂x^i .

Show that
LX ϖ = ( ∑_i ∂(aX^i )/∂x^i ) dx^1 ∧ · · · ∧ dx^n .

20. Let g be a Riemann metric on M , considered as a (0, 2) tensor g = gij dxi ⊗


dx^j , and X = X^k (∂/∂x^k ) a vector field. Show that LX g is the (0, 2) tensor
with components

(LX g)ij = X^l ∂gij /∂x^l + gkj ∂X^k /∂x^i + gik ∂X^k /∂x^j .
[Suggestion. Consider the scalar function
gij = g(∂/∂x^i , ∂/∂x^j )
as a contraction of the three tensors g, ∂/∂xi ,and ∂/∂xj . Calculate its Lie
derivative along X using the product rule 3.4.5 (b), extended to triple products.
Solve for (LX g)ij .]
21. Let g be a Riemann metric on M , considered as a (0, 2) tensor g = gij dxi ⊗
dxj , and X = X k (∂/∂xk ) a vector field. Show that the 1–parameter group
exp(tX) generated by X consists of isometries of the metric g if and only if
LX g = 0. Find the explicit conditions on the components X k that this be
the case. [Such a vector field is called a Killing vector field or an infinitesimal
isometry of the metric g. See 5.22 for the definition of “isometry“. You may
assume that exp(tX) : M → M is defined on all of M and for all t, but this is
not necessary if the result is understood to hold locally. For the last part, use
the previous problem.]
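The component formula of exercise 20 makes the Killing condition LX g = 0 of exercise 21 easy to test numerically. In the following Python sketch (names ad hoc), the rotation field X = −y ∂/∂x + x ∂/∂y is checked against the Euclidean metric on R^2 , for which it generates isometries.

```python
def lie_g(g, X, p, h=1e-5):
    # (L_X g)_ij = X^l d_l g_ij + g_kj d_i X^k + g_ik d_j X^k  (exercise 20)
    n = len(p)
    def dg(i, j, l):
        pp = list(p); pm = list(p); pp[l] += h; pm[l] -= h
        return (g(pp)[i][j] - g(pm)[i][j]) / (2*h)
    def dX(k, i):
        pp = list(p); pm = list(p); pp[i] += h; pm[i] -= h
        return (X(pp)[k] - X(pm)[k]) / (2*h)
    G = g(p)
    return [[sum(X(p)[l]*dg(i, j, l) for l in range(n))
             + sum(G[k][j]*dX(k, i) + G[i][k]*dX(k, j) for k in range(n))
             for j in range(n)] for i in range(n)]

# Euclidean metric on R^2 and the rotation field X = -y d/dx + x d/dy
g = lambda p: [[1.0, 0.0], [0.0, 1.0]]
X = lambda p: (-p[1], p[0])

L = lie_g(g, X, [1.3, -0.7])
print(all(abs(L[i][j]) < 1e-8 for i in range(2) for j in range(2)))
```

Here the first sum vanishes because g is constant, and the remaining terms cancel by the antisymmetry of the Jacobian of X, so X is a Killing field, matching the fact that its flow consists of rotations.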
22. Let M be an n–dimensional manifold, S a bounded, m–dimensional, ori-
ented submanifold of M . Let X be a vector field on M which is tangential
to S, i.e. X(p) ∈ Tp S for all p ∈ S. Show:
a) (iX ϕ)|S = 0 for any C ∞ (m + 1)–form ϕ on M ,
b) ∫S LX ϖ = ∫∂S iX ϖ for any C ∞ m–form ϖ on M .
[The restriction ϖ|S of a form on M to S is the pull–back by the inclusion map
S → M .]
23. Let M be an n–dimensional manifold, S a bounded, m–dimensional, ori-
ented submanifold of M . Let X be a vector field on M . Show that for any C ∞
m–form ϖ on M

(d/dt) ∫exp(tX)S ϖ = ∫∂S iX (exp(tX)∗ ϖ).

[Suggestion. Show first that for any diffeomorphism F one has ∫S F ∗ ϖ = ∫F (S) ϖ.]
24. Prove (25).
25. Verify the bracket relation (29).
26. Verify the bracket relations (30).
27. In his Leçons of 1925-26 Élie Cartan introduces the exterior derivative on
differential forms this way (after discussing the formulas of Green, Stokes, and
Gauss in 3-space). The operation which produces all of these formulas can be
given in a very simple form. Take the case of a line integral ∫ ϖ over a closed
curve C. Let S be a piece of a 2–surface (in a space of n dimensions) bounded by
C. Introduce on S two interchangeable differentiation symbols d1 and d2 and
partition S into the corresponding family of infinitely small parallelograms. If p
is the vertex of one of these parallelograms (Fig. 2) and if p1 , p2 are the vertices
obtained from p by the operations d1 and d2 , then

Fig. 2.
∫p^p1 ϖ = ϖ(d1 ),    ∫p^p2 ϖ = ϖ(d2 ),
∫p1^p3 ϖ = ∫p^p2 ϖ + d1 ∫p^p2 ϖ = ϖ(d2 ) + d1 ϖ(d2 ),    ∫p2^p3 ϖ = ϖ(d1 ) + d2 ϖ(d1 ).

Hence the integral ∫ ϖ over the boundary of the parallelogram equals

ϖ(d1 ) + [ϖ(d2 ) + d1 ϖ(d2 )] − [ϖ(d1 ) + d2 ϖ(d1 )] − ϖ(d2 ) = d1 ϖ(d2 ) − d2 ϖ(d1 ).
The last expression is the exterior derivative dϖ.
(a) Rewrite Cartan’s discussion in our language. [Suggestion. Take d1 and
d2 to be the two vector fields ∂/∂u^1 , ∂/∂u^2 on the 2-surface S tangent to a
parametrization p = p(u^1 , u^2 ).]
(b) Write a formula generalizing Cartan’s dϖ = d1 ϖ(d2 ) − d2 ϖ(d1 ) in case
d1 , d2 are not necessarily interchangeable. [Explain what this means.]
Chapter 4

Special Topics

4.1 General Relativity


The postulates of General Relativity. The physical ideas underlying Ein-
stein’s relativity theory are deep, but its mathematical postulates are simple:
GR1. Spacetime is a four-dimensional manifold M .
GR2. M has a Riemannian metric g of signature (− + ++); the world line
p = p(s) of a freely-falling object is a geodesic:

(∇/ds)(dp/ds) = 0. (1)
GR3. In spacetime regions free of matter the metric g satisfies the field equations

Ric[g] = 0. (2)

Discussion. (1) The axiom GR1 is not peculiar to GR, but is at the basis
of virtually all physical theories since time immemorial. This does not mean
that it is cast in stone, but it is hard to see how mathematics as we know it
could be applied to physics without it. Newton (and everybody else before
Einstein) made further assumptions on how “space-time coordinates” are to
be defined (“inertial coordinates for absolute space and absolute time”). What
is especially striking in Einstein’s theory is that the four-dimensional manifold
spacetime is not further separated into a direct product of a three-dimensional
manifold “space“ and a one-dimensional manifold “time“, and that there are no
further restrictions on the space-time coordinates beyond the general manifold
axioms.
(2) In the tangent space Tp M at any point p ∈ M one has a light cone, consisting
of null-vectors (ds2 = 0); it separates the vectors into timelike (ds2 < 0) and
spacelike (ds2 > 0). The set of timelike vectors at p consist of two connected
components, one of which is assumed to be designated as forward in a manner

174 CHAPTER 4. SPECIAL TOPICS

varying continuously with p. The world lines of all material objects are assumed
to have a forward, timelike direction.
(3) In general the parametrization p = p(t) of a world line is immaterial.
For a geodesic however the parameter t is determined up to t → at + b with
a, b =constant. In GR2 the parameter s is normalized so that g(dp/ds, dp/ds) =
−1 and is then unique up to s → s + so ; it is called proper time along the world
line. (It corresponds to parametrization by arclength for a positive definite
metric.)
(4) Relative to a coordinate system (x0 , x1 , x2 , x3 ) on spacetime M the equations
(1) and (2) above read as follows.

d^2 x^k /ds^2 + Γ^k_{ij} (dx^i /ds)(dx^j /ds) = 0, (1′)

−∂Γ^k_{ql} /∂x^k + ∂Γ^k_{qk} /∂x^l − Γ^k_{pk} Γ^p_{ql} + Γ^k_{pl} Γ^p_{qk} = 0, (2′)
where
Γ^l_{jk} = (1/2) g^{li} (g_{ki,j} + g_{ij,k} − g_{jk,i} ). (3)
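Formula (3) can be evaluated numerically for a simple metric. The Python sketch below (names ad hoc) computes the Γ^l_{jk} for the flat plane in polar coordinates, g = diag(1, r^2 ), where the nonzero symbols Γ^r_{θθ} = −r and Γ^θ_{rθ} = 1/r are known by hand.

```python
def christoffel(g, p, h=1e-5):
    # Gamma^l_jk = (1/2) g^{li} (g_ki,j + g_ij,k - g_jk,i), formula (3),
    # for a 2x2 metric, with numerical partial derivatives at p
    n = len(p)
    G = g(p)
    det = G[0][0]*G[1][1] - G[0][1]*G[1][0]
    Ginv = [[G[1][1]/det, -G[0][1]/det], [-G[1][0]/det, G[0][0]/det]]
    def dg(i, j, k):   # g_ij,k
        pp = list(p); pm = list(p); pp[k] += h; pm[k] -= h
        return (g(pp)[i][j] - g(pm)[i][j]) / (2*h)
    return [[[0.5*sum(Ginv[l][i]*(dg(k, i, j) + dg(i, j, k) - dg(j, k, i))
                      for i in range(n))
              for k in range(n)] for j in range(n)] for l in range(n)]

# flat plane in polar coordinates (r, theta): g = diag(1, r^2)
g = lambda p: [[1.0, 0.0], [0.0, p[0]**2]]
r = 1.7
Gamma = christoffel(g, [r, 0.4])
# known values: Gamma^r_thth = -r, Gamma^th_rth = Gamma^th_thr = 1/r
print(abs(Gamma[0][1][1] + r) < 1e-6, abs(Gamma[1][0][1] - 1/r) < 1e-6)
```

Straight lines in the plane then solve (1′) with these symbols, which is a convenient consistency check on the sign conventions in (3).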
2
Equation (1), or equivalently (1′), takes the place of Newton’s second law

d^2 x^α /dt^2 + (1/m) ∂ϕ/∂x^α = 0, (4)

where m is the mass of the object and ϕ = ϕ(x^1 , x^2 , x^3 ) is a static gravitational


potential. This is to be understood in the sense that there are special kinds of co-
ordinates (x0 , x1 , x2 , x3 ), which we may call Newtonian inertial frames, for which
the equations (4) hold for α = 1, 2, 3 provided the world line is parametrized so
that x0 = at + b.
Einstein’s field equations (2) represent a system of second-order, partial dif-
ferential equations for the metric gij analogous to Laplace’s equations for the
gravitational potential of a static field in Newton’s theory, i.e.
∑_α ∂^2 ϕ/∂x^α ∂x^α = 0 (5)

(We keep the convention that the index α runs over α = 1, 2, 3 only.) Thus one
can think of the metric g as a sort of gravitational potential in spacetime.
The field equations. We wish to understand the relation of the field equations
(2) in Einstein’s theory to the field equation (5) of Newton’s theory. For this
we consider the mathematical description of the same physical situation in the
two theories, namely the world lines of a collection of objects (say stars) in a
given gravitational field. It will suffice to consider a one-parameter family of
objects, say p = p(r, s), where r labels the object and s is the parameter along
its world-line.
4.1. GENERAL RELATIVITY 175

(a) In Einstein’s theory p = p(r, s) is a family of geodesics depending on a


parameter r:

(∇/∂s)(∂p/∂s) = 0. (6)
For small ∆r, the vector (∂p/∂r)∆r at p(r, s) can be thought of as the relative
position at proper time s of the object r + ∆r as seen from the object r.
Remark. Let us assume that at proper time s = 0 the object r sees the object
r + ∆r as being contemporary. This means that the relative position vector
(∂p/∂r)∆r is orthogonal to the spacetime direction ∂p/∂s of the object r. By
Gauss’s lemma, this remains true for all s, so r + ∆r remains a contemporary
of r. So it makes good sense to think of ∂p/∂r as a relative position vector.
The motion of the position vector (∂p/∂r)∆r of r + ∆r as seen from r is governed
by Jacobi’s equation

(∇^2 /∂s^2 )(∂p/∂r) = R(∂p/∂r, ∂p/∂s) ∂p/∂s. (7)
This equation expresses its second covariant derivative (the “relative accelera-
tion“) in terms of the position vector ∂p/∂r of r+∆r and the spacetime direction
∂p/∂s of r.
Introduce a local inertial frame (x0 , x1 , x2 , x3 ) for ∇ at po = p(ro , so ) so that
∂/∂x0 = ∂p/∂s at po . (This is a “local rest frame“ for the object ro at po in
the sense that there ∂xα /∂s = 0 for α = 1, 2, 3 and ∂x0 /∂s = 1.) At po and in
these coordinates (7) becomes
(∂^2 /∂s^2 )(∂x^i /∂r) = ∑_{j=0}^{3} F^i_j ∂x^j /∂r   (at p0 ) (8)

where F^i_j is the matrix defined by the equation

R(∂/∂x^j , ∂/∂x^0 ) ∂/∂x^0 = ∑_{i=1}^{3} F^i_j ∂/∂x^i   (at p0 ). (9)

Note that the left-hand side vanishes for j = 0, hence F^i_0 = 0 for all i.
(b) In Newton’s theory we choose a Newtonian inertial frame (x^0 , x^1 , x^2 , x^3 ) so
that the equation of motion (4) reads

∂^2 x^α /∂t^2 = −(1/m) ∂ϕ/∂x^α (4)
If we introduce a new parameter s by s = ct + to this becomes

c^2 ∂^2 x^α /∂s^2 = −(1/m) ∂ϕ/∂x^α .
By differentiation with respect to r we find
(∂^2 /∂s^2 )(∂x^α /∂r) = −(1/mc^2 ) ∑_{β=1}^{3} (∂^2 ϕ/∂x^β ∂x^α ) ∂x^β /∂r (10)

(c) Compare (8) and (10). The xi = xi (r, s) in (8) refer to the solutions of Ein-
stein’s equation of motion (7), the xα = xα (r, s) in (10) to solutions of Newton’s
equations of motion (4). Assume now that the local inertial frame for ∇ in (8)
is the same as the Newtonian inertial frame in (10) and assume further that
derivatives on the left-hand side of (8) and (10) agree at p0 . Then we find that
∑_{j=0}^{3} F^α_j ∂x^j /∂r = −(1/mc^2 ) ∑_{β=1}^{3} (∂^2 ϕ/∂x^β ∂x^α ) ∂x^β /∂r   (at p0 ) (11)

for all α = 1, 2, 3. Since ∂p/∂r is an arbitrary spacelike vector at the point po


it follows from (11) that
F^α_β = −(1/mc^2 ) ∂^2 ϕ/∂x^β ∂x^α   (at p0 ) (15)
for α, β = 1, 2, 3. Hence
∑_{i=0}^{3} F^i_i = ∑_{α=1}^{3} F^α_α = −(1/mc^2 ) ∑_{α=1}^{3} ∂^2 ϕ/∂x^α ∂x^α   (at p0 ) (16)

By the definition (9) of F^i_j this says that

Ric(g)(∂/∂x^0 , ∂/∂x^0 ) = −(1/mc^2 ) ∑_{α=1}^{3} ∂^2 ϕ/∂x^α ∂x^α   (at p0 ) (17)

Newton’s field equations (5) say that the right-hand side of (17) is zero. Since
∂/∂x0 is an arbitrary timelike vector at po we find that Ric(g) = 0 at p0 .
In summary, the situation is this. As a manifold (no metric or special coordi-
nates), spacetime is the same in Einstein’s and in Newton’s theory. In Einstein’s
theory, write xi = xi (g; r, s) for a one-parameter family of solutions of the equa-
tions of motion (1) corresponding to a metric g written in a local inertial frame
for ∇ at po . In Newton’s theory write xα = xα (ϕ; r, t) for a one parameter family
of solutions of the equations of motion (4) corresponding to a static potential ϕ
written in a Newtonian inertial frame. Assume the two inertial frames represent
the same coordinate system on spacetime and require that the derivatives on
the left-hand sides of (8) and (10) agree at po . Then the relation (17) between
g and ϕ must hold. In particular, if ϕ satisfies Laplace’s equation (5) at po then
g satisfies Einstein’s field equation (2) at po .
Remark. In Newton’s theory, Laplace’s equation ∆ϕ = 0 gets replaced by
Poisson’s equation ∆ϕ = ρ in the presence of matter, where ρ depends on the
matter. In Einstein’s theory the equation Ric(g) = 0 gets analogously replaced
by Ric(g) = T , where T is a 2–tensor, called the energy-momentum tensor. But
this tensor is not an exact, fundamental representation of matter, only a rough,
macroscopic approximation. Einstein once put it something like this. “The field
equations Ric(g) = T have the aspect of an edifice whose left wing is constructed
from marble and whose right wing is constructed from inferior lumber.“

Appendix: some special Relativity


Special relativity refers to the geometry in a four–dimensional vector space W
with an inner product of type (− + ++), called Minkowski space. In terms
of components with respect to an appropriate basis (ei ) we can write w ∈ W as
w = ξ0 e0 + ξ1 e1 + ξ2 e2 + ξ3 e3 and

w2 := (w, w) = −ξ02 + ξ12 + ξ22 + ξ32 .

We shall think of this vector space as the tangent space W = Tp M at some


point (event) in the space–time M of general relativity, but this is not necessary
and historically special relativity came first.
Let e ∈ Tp M be a time–like space–time vector. Think of e as the tangent
vector at p to the world–line of some observer passing through p (i.e. present
at the event p). Let w ∈ Tp M be another space–time vector at p, interpreted as
the tangent vector to the world–line of some other object. Think of w as an
infinitesimal space–time displacement from p. The observer e splits w as an
orthogonal sum
w = τe + d
and interprets τ as the relative time–duration of w and d as the relative space–
displacement of w, where relative refers to e. These components τ e and d depend
only on the space–time direction of e, i.e. on the line Re. So we assume that e
is normalized so that (e, e) = −1. Then evidently

τ = −(w, e), d = w + (w, e)e.

Thus the observer e would say the object w moves through a space–displacement
d during a time–duration τ relative to himself. Hence e would consider d/τ as
the relative space–velocity of w. For a light ray one has w2 = 0, so −τ 2 + d2 = 0
and d2 /τ 2 = 1, i.e. the observer e is using units so that velocity of light is 1.
Now suppose we have another observer e′ ∈ Tp M at p, again normalized to
(e′ , e′ ) = −1. Then e′ will split w as

w = τ ′ e′ + d′ .

The map (τ, d) → (τ ′ , d′ ) defined by the condition τ e + d = τ ′ e′ + d′ is the Lorentz
transformation which relates the (infinitesimal) space–time displacements w as
observed by e and e′ . (The term Lorentz transformation is also applied to any
linear transformation of W which preserves the inner product.) It is easy to
find a formula for it. Write
e0 = ae + av
so that v is the space–velocity of e0 relative to e. Taking inner products of this
equation gives
−1 = a2 (−1 + v 2 )
where v 2 = (v, v). So
a2 = (1 − v 2 )−1 .
178 CHAPTER 4. SPECIAL TOPICS

The equation τ e + d = τ ′ e′ + d′ gives τ e + d = aτ ′ e + (aτ ′ v + d′ ), so the Lorentz
transformation is
τ = aτ ′ , d = d′ + aτ ′ v.
In particular
τ = τ ′ /√(1 − v 2 ).
Since √(1 − v 2 ) ≤ 1, we have τ ≥ τ ′ ; this phenomenon is called time dilation.
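These formulas are easy to check numerically. The sketch below (Python, with hypothetical values for the relative velocity and the displacement w) verifies the normalization of e′, the orthogonality of the splitting w = τ e + d, and the dilation factor:

```python
# Numerical illustration of the splitting w = tau*e + d and time dilation.
# All numbers are hypothetical; the inner product has type (- + + +).
import math

def minkowski(u, w):
    # Minkowski inner product of type (- + + +)
    return -u[0]*w[0] + u[1]*w[1] + u[2]*w[2] + u[3]*w[3]

e = (1.0, 0.0, 0.0, 0.0)                   # observer at rest, (e, e) = -1
v = (0.0, 0.6, 0.0, 0.0)                   # relative space-velocity, |v| = 0.6
a = 1.0 / math.sqrt(1.0 - minkowski(v, v))           # a^2 = (1 - v^2)^{-1}
e2 = tuple(a*(ei + vi) for ei, vi in zip(e, v))      # e' = a e + a v
assert abs(minkowski(e2, e2) + 1.0) < 1e-12          # (e', e') = -1

w = (2.0, 1.0, 0.5, 0.0)                   # some space-time displacement
tau = -minkowski(w, e)                     # tau = -(w, e)
d = tuple(wi + minkowski(w, e)*ei for wi, ei in zip(w, e))  # d = w + (w, e)e
assert abs(minkowski(d, e)) < 1e-12        # d is orthogonal to e
# For the world-line of e' itself (w = e', i.e. tau' = 1): tau = a * tau'
assert abs(-minkowski(e2, e) - a) < 1e-12
```

Units are chosen so that the velocity of light is 1, as in the text.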
Consider now another situation, where e and e0 observe some three–dimensional
object. Such an object will not have a world–line but a world–tube, which we
can picture as three–parameter family of world–lines representing the point–
particles which make up the object. If the object is one–dimensional (a stick)
we get a world–band, as we shall now assume. Locally around p we approximate
the world–band by the piece of a 2–plane between two parallel lines in Tp M .
We shall assume that e0 is parallel to these lines, which means that the stick is
at rest relative to e′ . We assume further that the stick as seen by e′ points into
the direction of e, which implies that e, e0 lie also in this 2–plane. The band
B intersects the 3–spaces orthogonal to e, e0 (the rest–spaces of e, e0 ) in line–
segments, which we represent by vectors d, d0 . The length of the stick as seen by
e, e′ is then l = |d|, l′ = |d′ | respectively. The relation between these can be found
by simple vector algebra in the 2–plane in question. (The metric in this plane
has type (−, +), but the reasoning becomes more transparent if one draws upon
one’s intuition in a Euclidean space as a guide to the vector algebra.) As above
we have
e2 = −1, (e′ )2 = −1, e′ = ae + av, a2 = (1 − v 2 )−1 .
Since d′ is the component of d orthogonal to e′ , we have d = d′ − (d, e′ )e′ and

d2 = (d′ )2 − (d, e′ )2 .

Since d, v are both orthogonal to e in the same 2–plane, d/|d| = v/|v|. Substituting
d = (|d|/|v|)v in the previous equation we get

d2 = (d′ )2 − (d2 /v 2 )(v, e′ )2 .
Since e0 = ae + av we find d2 = (d0 )2 − d2 a2 v 2 , or

(d0 )2 = d2 (1 + a2 v 2 ) = d2 (1 + (1 − v 2 )−1 v 2 ) = d2 (1 − v 2 )−1 .

This gives the desired relation between the relative lengths of the stick:
l = l′ √(1 − v 2 ).

Thus l ≤ l0 and this is known as the Lorentz contraction. From a purely math-
ematical point of view all of this is elementary vector algebra, but its physical
interpretation is startling.
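The contraction formula can likewise be spot-checked by elementary vector algebra in the 2–plane with inner product of type (−, +); the velocity v = 0.6 below is a hypothetical value:

```python
# Check of the Lorentz contraction l = l' * sqrt(1 - v^2) in the 2-plane
# of type (-, +).  All numbers are hypothetical.
import math

def ip(u, w):                      # 2D Minkowski inner product (-, +)
    return -u[0]*w[0] + u[1]*w[1]

vmag = 0.6
a = 1.0 / math.sqrt(1.0 - vmag**2)          # a^2 = (1 - v^2)^{-1}
e  = (1.0, 0.0)                    # observer e, (e, e) = -1
e2 = (a, a*vmag)                   # e' = a e + a v, (e', e') = -1
assert abs(ip(e2, e2) + 1.0) < 1e-12

d  = (0.0, 1.0)                    # stick of length l = 1 in the rest space of e
d2 = tuple(di + ip(d, e2)*ei for di, ei in zip(d, e2))  # component orthogonal to e'
assert abs(ip(d2, e2)) < 1e-12
l, l2 = math.sqrt(ip(d, d)), math.sqrt(ip(d2, d2))
assert abs(l - l2*math.sqrt(1.0 - vmag**2)) < 1e-12     # l = l' sqrt(1 - v^2)
assert l <= l2                                          # Lorentz contraction
```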

4.2 The Schwarzschild metric


We now turn to metrics with spherical symmetry. This is not an entirely obvious
concept in general relativity. To explain it, we start with two examples of
symmetry in the familiar setting of R3 with the Euclidean metric ds2 = dx2 +
dy 2 + dz 2 . (Any unfamiliar terms in these examples should be self-explanatory,
but will be defined precisely later.)
Example 1: spherical symmetry in R3 . Consider the action of the group
SO(3) of rotations about the origin on the space R3 , written p → a · p with
a ∈ SO(3) and p ∈ R3 . We record the following facts.
(a) The group SO(3) acts on R3 by isometries.
(b) The “orbit“ S(po ) := {p = a · po | a ∈ SO(3)} of SO(3) through a point po
is the sphere about the origin through po .
(c) Fix po and let C = C(po ) ≈ R be the line through po orthogonal to S(po ).
In a neighbourhood of po the line C intersects each sphere S(p) orthogonally in
a single point.
(d) Let S = S 2 be the unit sphere, ds2S the standard metric on S = S 2 , and
ds2C the induced metric on C. In a neighbourhood of po there is a local diffeo-
morphism C × S → R3 so that

ds2 = ds2C + ρ2 ds2S

where ρ = ρ(p) is the radius of the sphere S(p) through p.


The first three statements are again geometrically obvious. For the last one
recall the expression for the Euclidean metric in spherical coordinates:

ds2 = dρ2 + ρ2 (dφ2 + sin2 φ dθ2 )

If we take ρ as Cartesian coordinate on C ≈ R and (θ, φ) as coordinates on S =


S 2 , then the map C × S → R3 is just the spherical coordinate map (ρ; θ, φ) →
(x, y, z).
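Statement (d) can be spot-checked numerically: pulling the Euclidean metric back through the spherical coordinate map by finite differences (a sketch, with hypothetical sample values of ρ, φ, θ) recovers diag(1, ρ2 , ρ2 sin2 φ):

```python
# Numerical check that the Euclidean metric pulls back to
# ds^2 = d rho^2 + rho^2 (d phi^2 + sin^2 phi d theta^2) under the
# spherical coordinate map, with phi measured from the z-axis.
import math

def chart(rho, phi, theta):
    return (rho*math.sin(phi)*math.cos(theta),
            rho*math.sin(phi)*math.sin(theta),
            rho*math.cos(phi))

def metric_matrix(rho, phi, theta, h=1e-6):
    # G_ij = (d chart/d u_i) . (d chart/d u_j), by central differences
    u = [rho, phi, theta]
    cols = []
    for i in range(3):
        up, um = u[:], u[:]
        up[i] += h; um[i] -= h
        p, m = chart(*up), chart(*um)
        cols.append([(pk - mk)/(2*h) for pk, mk in zip(p, m)])
    return [[sum(cols[i][k]*cols[j][k] for k in range(3)) for j in range(3)]
            for i in range(3)]

rho, phi, theta = 2.0, 0.7, 1.1            # hypothetical sample point
G = metric_matrix(rho, phi, theta)
expected = [1.0, rho**2, (rho*math.sin(phi))**2]  # diag(1, rho^2, rho^2 sin^2 phi)
for i in range(3):
    for j in range(3):
        target = expected[i] if i == j else 0.0
        assert abs(G[i][j] - target) < 1e-4
```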
Example 2: cylindrical symmetry in R3 . Consider the group SO(2) of
rotations about the z-axis on R3 . We note that the statements (a)-(b) of the
previous example have obvious analogs.
(a) The group SO(2) acts on R3 by isometries.
(b) The “orbit“ S(po ) := {p = a · po | a ∈ SO(2)} of SO(2) through a point po
is the circle about the z-axis through po .
(c) Fix po and let C = C(po ) ≈ R be the plane through po orthogonal to S(po ).
In a neighbourhood of po the plane C intersects each circle S(p) orthogonally
in a single point.

(d) Let S = S 1 be the unit circle, ds2S the standard metric on S = S 1 , and ds2C
the induced metric on C. In a neighbourhood of po there is a local diffeomor-
phism C × S → R3 so that

ds2 = ds2C + r2 ds2S

where r = r(p) is the radius of the circle S(p) through p.

The first three statements are again geometrically obvious. For the last one
recall the expression for the Euclidean metric in cylindrical coordinates:

ds2 = (dr2 + dz 2 ) + r2 dθ2

If we take (r, z) as Cartesian coordinates on C ≈ R2 and θ as coordinate on


S = S 1 , then the map C × S → R3 is just the cylindrical coordinate map
(r, z; θ) → (x, y, z).
We now turn to some generalities. We take the view that symmetry (whatever
that may mean) is characterized by a symmetry group. In general a transfor-
mation group of a space M (any set, in our case a manifold) is a family G of
transformations of M (mappings of M into itself) satisfying

(1) If a, b ∈ G, then the composite ab ∈ G.


(2) If a ∈ G, then a has an inverse a−1 and a−1 ∈ G.

4.2. 1 Example: rotation groups. The rotation group in R3 , denoted SO(3),

consists of all orthogonal linear transformations of R3 of determinant +1:

SO(3) = {a ∈ M3 (R) | aa∗ = 1, det a = +1}

SO stands for “special orthogonal“. It can be realized geometrically as the group


of transformations of Euclidean 3–space which can be obtained by a rotation by
some angle about some axis through a given point (the origin): hence we can
take G =SO(3), M = R3 in the above definition. Alternatively, SO(3) can be
realized as the group of transformations of a 2–sphere S 2 which can be obtained
by such rotations: hence we may also take G =SO(3), M = S 2 .
The rotation group SO(2) in R2 consists of linear transformations of R2 repre-
sented by matrices of the form

( cos θ  − sin θ )
( sin θ    cos θ )

It acts naturally on R2 by rotations about the origin, but it also acts on R3 by
rotations about the z−axis.
In this way SO(2) can be thought of as the subgroup of SO(3) which leaves the
points on the z-axis fixed.

Fig. 1: SO(2) orbits in R2 ; SO(3) orbits in R3

This example shows that the same group G can be realized as a transformation
group on different spaces M : we say G acts on M and we write a · p for the
action of the transformation a ∈ G on the point p ∈ M . The orbit of a point
p ∈ M by G is the set
G · p = {a · p | a ∈ G}
of all transforms of p by elements of G. In the example G =SO(3), M = R3 the
orbits are the spheres of radius ρ > 0 together with the origin {0}.
4.2. 2 Lemma. Let G be a group acting on a space M . Then M is the
disjoint union of the orbits of G.
Proof . Let G · p and G · q be two orbits. We have to show that they are either
disjoint or identical. So suppose they have a point in common, say b · p = c · q
for some b, c ∈ G. Then p = b−1 c · q ∈ G · q, hence a · p = ab−1 c · q ∈ G · q for
any a ∈ G, i.e. G · p ⊂ G · q. Similarly G · q ⊂ G · p, hence G · p = G · q.
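As a toy illustration of the lemma (not from the text), one can let the four rotations by multiples of 90 degrees act on a symmetric grid in the plane and check that the orbits are pairwise disjoint and cover the whole set:

```python
# Toy illustration of Lemma 4.2.2: the orbits of a group action partition M.
# G is the group of rotations by 0, 90, 180, 270 degrees acting on a grid.
def rot(p):                     # 90-degree rotation (x, y) -> (-y, x)
    return (-p[1], p[0])

def orbit(p):
    o, q = set(), p
    for _ in range(4):          # apply all four group elements
        o.add(q); q = rot(q)
    return frozenset(o)

M = [(x, y) for x in range(-2, 3) for y in range(-2, 3)]
orbits = {orbit(p) for p in M}
# The orbits are pairwise disjoint and cover M:
assert sum(len(o) for o in orbits) == len(M)
assert set().union(*orbits) == set(M)
# Here: the fixed point {0} plus six orbits of size 4.
assert len(orbits) == 7
```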
From now on M will be space–time, a 4–dimensional manifold with a Riemann
metric g of signature (−, +, +, +).
4.2. 3 Definition. The metric g is spherically symmetric if M admits an action
of SO(3) by isometries of g so that every orbit of SO(3) in M is isometric to
a sphere in R3 of some radius r > 0 by an isometry preserving the action of
SO(3).
Remarks. a) We exclude the possibility r = 0, i.e. the spheres cannot de-
generate to points. Let us momentarily omit this restriction to explain where
the definition comes from. The centre of symmetry in our space–time should
consist of a world line L (e.g. the world line of the centre of the sun) where
r = 0. Consider the group G of isometries of the metric g which leaves L
pointwise fixed. For a general metric g this group will consist of the identity
transformation only. In any case, if we fix a point p on L, then G acts on the
3–dimensional space of tangent vectors at p orthogonal to L, so we can think
of G as a subgroup of SO(3). If it is all of SO(3) (for all points p ∈ L) then
we have spherical symmetry as in the definition in the region off L. However
in the definition, we do not postulate the existence of such a world line (and in
fact explicitly exclude it from consideration by the condition r > 0), since the
metric (gravitational field) might not be defined at the centre, in analogy with
Newton’s theory.

b) It would suffice to require that the orbits of SO(3) in M are 2–dimensional.


This is because any 2–dimensional manifold on which SO(3) acts with a single
orbit can be mapped bijectively onto a 2–sphere S 2 (possibly with antipodal
points identified) so that the action becomes the rotation action. Furthermore,
any Riemann metric on such a sphere which is invariant under SO(3) is iso-
metric to the usual metric on a sphere of some radius r > 0, or the negative
thereof. (The latter is excluded here, because there is only one negative direction
available in a space–time with signature − + ++).
We now assume given a spherically symmetric space–time M, g and fix an action
of SO(3) on M as in the definition.
For any p ∈ M we denote by S(p) the SO(3) orbit through p; it is isometric to
a 2–sphere of some radius r(p) > 0. Thus we have a radius function r(p) on M .
(r(p) is intrinsically defined, for example by the fact that the area of S(p) with
respect to the metric g is 4πr2 .)
For any p ∈ M , let C(p) = expp (Tp S(p)⊥ ) be the union of the geodesics
through p orthogonal to S(p). It is a 2–dimensional submanifold of M which
intersects S(p) orthogonally in the single point p, at least as long as we restrict
attention to a sufficiently small neighbourhood of p, as we shall do throughout this
discussion. Now fix po ∈ M and let C = C(po ).

Fig. 2: the sphere S(p0 ) and the orthogonal surface C(p0 )

Let S be a sphere about the origin in R3 . The geographic coordinates (θ, φ) on


S can be defined by the formula p = az (θ)ay (φ)ez where az (θ) and ay (φ) are
rotations about the z and y axis and ez is the north-pole of S, i.e. its point
of intersection with the positive z-axis. We transfer these coordinates to the
orbit S(po ) as follows. By assumption, S(po ) is an isometric copy of a sphere
S in R3 . We may choose this isometry so that it maps po to ez . It then maps
a · p to a · ez for any a ∈ SO(3), since the isometry preserves the action of
SO(3). In particular, we can transfer (θ, φ) to coordinates on S(po ) by setting
p = az (θ)ay (φ)ez . We can extend these from S(po ) to all of M as follows.
4.2. 4 Lemma. The map f : C × S → M , given by p = az (θ)ay (φ) · q, is a local
diffeomorphism near (po , ez ) which maps {q} × S to S(q).
Proof. The second assertion is clear. To prove the first it suffices to show that
the differential of f at the point (po , ez ) is invertible. Since f maps {po } × S
diffeomorphically onto S(po ) and C × {ez } identically onto C, its differential
restricts to a linear isomorphism of the corresponding tangent spaces and hence
is a linear isomorphism onto Tpo M = Tpo C ⊕ Tpo S(po ).

Corollary. Near po , C intersects each sphere S(q) in a single point.


Proof. This is clear, because the corresponding statement holds on C × S.
Remark. The group I(po ) of elements a ∈ SO(3) fixing po acts also on the
tangent space Tpo M by the differential of the map p → a · p at p = po . This
means that for a curve p(t) with p(0) = po and ṗ(0) = v we have

a · v = (d/dt)|t=0 a · p(t).

Since a ∈ I(po ) operates on M by isometries fixing po it maps geodesics through


po into geodesics through po , i.e.

exppo : Tpo M → M satisfies a · (exppo v) = exppo (a · v).

So if we identify vectors v with points p via p = exppo (v), as we can near po ,


then the action of I(po ) on Tpo M turns into its action on M . We record this
fact as follows

the action of I(po ) on M looks locally like its linear action on Tpo M (*)

It may help to keep this in mind for the proof of the following lemma.
4.2. 5 Lemma. C intersects each sphere S(q), q ∈ C, orthogonally.
Proof. Consider the action of the subgroup I(po ) of SO(3) fixing po . Since
the action of SO(3) on the orbit S(po ) is equivalent to its action on an ordinary
sphere in Euclidean 3-space, the group I(po ) is just the rotation group SO(2)
in the 2–dimensional plane Tpo S(po ) ≈ R2 . It maps C(po ) into itself as well as
each orbit S(q), hence also C(po ) ∩ S(q) = {q}, i.e. I(po ) fixes all q ∈ C(po ),
i.e. I(po ) ⊂ I(q). Thus C(po ) is contained in the set C ′ (po ) of points fixed
by I(po ). Conversely, by (*) the fixed point set C ′ (po ) corresponds near po to
the subspace of Tpo M fixed by I(po ) ≈ SO(2), which is the plane orthogonal
to Tpo S(po ), i.e. Tpo C(po ). It follows that C ′ (po ) = C(po ) near po . But
I(po ) ⊂ I(q) implies C ′ (q) ⊂ C ′ (po ), hence C(q) ⊂ C(po ). Interchanging po
and q in this argument we find also C(po ) ⊂ C(q). So C(po ) = C(q), and this
set intersects S(q) orthogonally by the definition of C(q).
In summary, the situation is now the following. Under the local diffeomorphism
C × S → M the metric ds2 on M decomposes as an orthogonal sum

ds2 = ds2C + r2 ds2S (0)

where ds2C is the induced metric on C, ds2S the standard metric on S = S 2 ,


and r = r(q) is the radius function. To be specific, choose any orthogonal
coordinates τ, ρ on C with τ timelike and ρ spacelike. (That this is possible
follows from Gauss’s Lemma, for example.) Let φ, θ be the usual coordinates
on S ≈ S 2 . Then τ, ρ, φ, θ provide coordinates on M and

ds2 = −A−2 (τ, ρ)dτ 2 + B 2 (τ, ρ)dρ2 + r2 (τ, ρ)(dφ2 + sin2 φdθ2 ) (1)

for some strictly positive functions A(τ, ρ), B(τ, ρ). We record the result as a
theorem.
4.2. 6 Theorem. Any spherically symmetric metric is of the form (1) in
suitable coordinates τ, ρ, φ, θ.
So far we have not used Einstein’s field equations, just spherical symmetry.
The field equations in empty space say that Ric[g] = 0. For the metric (1) this
amounts to the following equations (as one can check by some unpleasant but
straightforward computations). Here ′ denotes ∂/∂ρ and · denotes ∂/∂τ .

(2B/A) ( ṙ′ /r − Ḃr′ /(Br) + ṙA′ /(rA) ) = 0                                (2)

1/r2 − (2/B) ( r′ /(Br) )′ − 3 ( r′ /(Br) )2 + 2A2 Ḃṙ/(Br) + A2 (ṙ/r)2 = 0  (3)

1/r2 + 2A ( Aṙ/r )· + 3A2 (ṙ/r)2 + (2/B 2 ) r′ A′ /(rA) − ( r′ /(Br) )2 = 0  (4)

−(1/B) ( A′ /(AB) )′ − A ( AḂ/B )· − 2A ( Aṙ/r )· − A2 (Ḃ/B)2 − 2A2 (ṙ/r)2
    + (1/B 2 ) (A′ /A)2 − (2/B 2 ) r′ A′ /(rA) = 0                           (5)
One has to distinguish three cases, according to the nature of the variable r =
r(p), the radius of the sphere S(p).
(a) r is a space-like variable, i.e. g(gradr,gradr) > 0,
(b) r is a time-like variable, i.e. g(gradr,gradr) < 0,
(c) r is a null variable, i.e. g(gradr,gradr) ≡ 0.
Here gradr = (g ij ∂r/∂xj )(∂/∂xi ) is the vector-field which corresponds to the
covector field dr by the Riemann metric g. It is orthogonal to the 3–dimensional
hypersurfaces r =constant. It is understood that we consider the cases where
one of these conditions (a)–(c) holds identically on some open set.
We first dispose of the exceptional case (c). So assume g(grad r, grad r) ≡ 0.
This means that −(Aṙ)2 + (B −1 r′ )2 = 0, i.e.

r′ /B = Aṙ                                                                   (6)
up to sign, which may be adjusted by replacing the coordinate τ by −τ , if
necessary. But then one finds that ṙ′ as determined by (2) is inconsistent with
(3), so this case is excluded.
Now consider the case (a) when g(gradr,gradr) > 0. In this case we can take
ρ = r as ρ-coordinate on C. Then ṙ = 0 and r′ = 1. The equations (2)–(5)
now simplify as follows.
Ḃ = 0                                                                        (2′)

1/r2 − (2/B) ( 1/(Br) )′ − 3 ( 1/(Br) )2 = 0                                 (3′)

1/r2 + (2/B 2 ) A′ /(rA) − ( 1/(Br) )2 = 0                                   (4′)

−(1/B) ( A′ /(AB) )′ + (1/B 2 ) (A′ /A)2 − (2/B 2 ) A′ /(rA) = 0             (5′)
Equation (2′) shows that Ḃ = 0, i.e. B = B(r). Equation (4′) differentiated
with respect to τ shows that (Ȧ/A)′ = 0, i.e. (log A)·′ = 0. Hence A =
Ã(r)F (τ ). Now replace τ by t = t(τ ) so that dt = dτ /F (τ ) and then drop the
tilde; we get A = A(r). Equation (3′) simplifies to

−2rB −3 B ′ + B −2 = 1, i.e. (rB −2 )′ = 1.

By integration we get rB −2 = r − 2m where −2m is a constant of integration,
i.e. B 2 = (1 − 2m/r)−1 . Equation (4′) has the solution A = B and this
solution is unique up to a non-zero multiplicative constant γ, which one can
take to be 1 after replacing t by γ −1 t. The metric (1) now becomes

ds2 = −(1 − 2m/r) dt2 + (1 − 2m/r)−1 dr2 + r2 (dφ2 + sin2 φ dθ2 ).           (7)
It gives a solution for all r ≠ 0, 2m. We shall however keep the restriction r > 0.
Furthermore, only for r > 2m is the radius function r space-like, as we assumed.
In the region where r < 2m the coordinate r becomes time-like, and to make the
signs in (7) agree with those of (1) one can then set ρ = t, τ = r in (7). This
gives at the same time the solution in the case (b), when r is timelike, i.e. in
the region r < 2m: it is still given by (7), only one has to set τ = r, ρ = t
if one compares (7) with (1). We note that t is uniquely determined up to
t → t + to , because these are the only substitutions t → f (t) which leave the
metric invariant. Thus all four Schwarzschild coordinates t, r, φ, θ are essentially
uniquely determined. We record the result meticulously as a theorem.
4.2. 7 Theorem (Schwarzschild). Let g be a spherically symmetric metric,
which satisfies Einstein’s field equations Ric[g] = 0. In a neighbourhood of any
point where the differential dr of the radius function r of g is non–zero, the
metric is of the form (7) where φ, θ are the usual coordinates on a sphere, r is the
radius function, and t is uniquely determined up to t → t + to . Such coordinates
exist in a neighbourhood of any point where r > 0, r ≠ 2m.
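As a sanity check, one can verify numerically that A = B = (1 − 2m/r)−1/2 makes the left-hand side of the equation labelled (4′) above vanish; m = 1 is a hypothetical value, and the equation is restated in the comment:

```python
# Numerical check that A = B = (1 - 2m/r)^{-1/2} satisfies the vacuum
# equation (4'):  1/r^2 + (2/B^2) A'/(r A) - 1/(B r)^2 = 0.
# m = 1 is a hypothetical value.
import math

m = 1.0
def A(r): return (1.0 - 2.0*m/r)**-0.5
B = A                                   # the solution has A = B

def deriv(f, r, h=1e-6):                # central difference derivative
    return (f(r + h) - f(r - h)) / (2*h)

for r in (3.0, 5.0, 10.0):              # sample radii in the region r > 2m
    lhs = (1.0/r**2
           + (2.0/B(r)**2) * deriv(A, r)/(r*A(r))
           - 1.0/(B(r)*r)**2)
    assert abs(lhs) < 1e-8
```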
The Schwarzschild coordinates cannot make sense at the Schwarzschild radius
r = 2m, because the coefficients of the metric (7) in these coordinates become
singular there. We momentarily suspend judgment as to whether the metric
itself becomes singular there (if such points exist on the spacetime manifold)
or if the singularity is an artifact of the coordinates (as is the case with polar
coordinates in the plane at the origin).
The time translations (t, r, φ, θ) → (t + to , r, φ, θ) evidently leave the metric
invariant, i.e. define isometries of the metric, in addition to the rotations from
SO(3). One says that the metric is static. That this is automatic is known as
Birkhoff’s Theorem:
4.2. 8 Theorem (Birkhoff ). Any spherically symmetric metric, which sat-
isfies Einstein’s field equations Ric[g] = 0 is static, i.e. one can choose the
coordinates τ, ρ in (1) so that A and B are independent of τ .

Remark. The discussion above does not accurately reflect the historical devel-
opment. Schwarzschild assumed from the outset that the metric is of the form
(1) with A and B independent of t, so Birkhoff’s theorem was not immediate
from Schwarzschild’s result. The definition of spherical symmetry used here
came later.
The Schwarzschild metric (7) can be written in many other ways by introducing
other coordinates τ, ρ instead of t, r (there is no point in changing the coordinates
φ, θ on the sphere). For example, one can write the metric (7) in the form
ds2 = −(1 − 2m/r) dv dw + r2 (dφ2 + sin2 φ dθ2 ).                            (8)
The coordinates v, w are related to Schwarzschild’s t, r by the equations

v = t + r∗ , w = t − r∗

with
r∗ = ∫ dr/(1 − 2m/r) = r + 2m log(r − 2m).
In terms of these v, w, Schwarzschild’s r is determined by
(v − w)/2 = r + 2m log(r − 2m).
Another possibility due to Kruskal (1960) is

ds2 = (16m2 e−r/2m /r)(−dt̃2 + dx̃2 ) + r2 (dφ2 + sin2 φ dθ2 ).              (9)
The coordinates t̃, x̃ are defined by

t̃ = (1/2)(ev/4m − e−w/4m ), x̃ = (1/2)(ev/4m + e−w/4m )
and r must satisfy
t̃2 − x̃2 = −(r − 2m)er/2m . (10)
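The relation (10) can be verified numerically from v, w and the tortoise coordinate r∗ = r + 2m log(r − 2m), taking t̃ as the odd and x̃ as the even combination of ev/4m and e−w/4m (a sketch with a hypothetical m = 1):

```python
# Numerical check of (10): t~^2 - x~^2 = -(r - 2m) e^{r/2m}.
# m = 1 is a hypothetical value; formulas are valid in the region r > 2m.
import math

m = 1.0
def kruskal(t, r):
    rstar = r + 2*m*math.log(r - 2*m)        # tortoise coordinate r*
    v, w = t + rstar, t - rstar
    tt = 0.5*(math.exp(v/(4*m)) - math.exp(-w/(4*m)))
    xx = 0.5*(math.exp(v/(4*m)) + math.exp(-w/(4*m)))
    return tt, xx

for t, r in ((0.0, 3.0), (1.5, 2.5), (-2.0, 6.0)):
    tt, xx = kruskal(t, r)
    assert abs(tt**2 - xx**2 + (r - 2*m)*math.exp(r/(2*m))) < 1e-9
```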
These coordinates t̃, x̃ must be restricted by

t̃2 − x̃2 < 2m (11)

so that there is a solution of (10) with r > 0. What is remarkable is that


the metric (9) is then defined for all (t̃, x̃, φ, θ) satisfying (11) and the radius
function r can take all values r > 0. So the singularity at r = 2m in the
Schwarzschild metric has disappeared. This is explained by the fact that the
coordinate transformation between the Schwarzschild coordinates t, r and the
Kruskal coordinates t̃, x̃ is singular along r = 2m. One can now take the point
of view that the whole region (11) belongs to M with the metric given by (9).
This means that the manifold M is (by definition) the product C × S where S is
the sphere with coordinates (φ, θ) and C the region in the (t̃, x̃) plane described

by (11). As in Schwarzschild’s case, this region C is composed of a region in


which r is spacelike (r > 2m) and a region in which r is timelike (r < 2m), but now
the metric stays regular at r = 2m. The Schwarzschild solution (7) now appears
as the local expression of the metric in a subregion of M . One may now wonder
whether the manifold M can be still enlarged in a non-trivial way. But this is
not the case: the Kruskal spacetime is the unique locally inextendible extension
of the Schwarzschild metric, in a mathematically precise sense explained in the
book by Hawking and Ellis (1973).
To get some feeling for this spacetime, consider again the situation in the t̃x̃–
plane. The metric (9) has the agreeable property that the light cones in this
plane are simply given by −dt̃2 + dx̃2 = 0. This means that along any timelike
curve one must remain inside the cones −dt̃2 + dx̃2 < 0. This implies that
anything that passes from the region r > 2m into the region r < 2m will never
get back out, and this includes light. So we get the famous black hole effect.
The hypersurface r = 2m therefore still acts like a one–way barrier, even though
the metric stays regular there. In the figure, the black region is the fatal zone
r < 2m. Once inside, you can count the time left to you by the “distance“ r
from the singularity r = 0. In the white region there is hope: some timelike
curves go on forever (but others are headed for the fatal zone; whether there
is hope for timelike geodesics is another matter). The cross–hatched area
is out of this world. The radial lines represent t =const. for Schwarzschild’s
t–coordinate. You will note that the singularity r = 0 consists of two pieces,
one in the past, the other in the future. I don’t know what physicists make of
that.
If one compares the equations §14-(10) and §14-(4) one comes to the conclusion
that c2 m is the mass of the centre, if c is velocity of light, which was taken to
be 1 in (7), but not in §14-(4). For a star like the sun the Schwarzschild radius
r = 2m = 2(mass)/c2 is about 3 km, hence rather irrelevant, because it would
lie deep inside, where one would not expect to apply the vacuum field equations
anyway.
Reference. The discussion of spherical symmetry is taken from Hawking and
Ellis The large Scale Structure of Space–Time (1973), who refer to original
papers of Schmidt (1967) and Bergman, Cahen and Komar (1965).
EXERCISES 4.2

1. Generalize the assertion “It follows that C 0 (po ) = C(po ).“ in the proof of
Lemma 4.2.5 to a more general situation where S(po ), C(po ) and C 0 (po ) are
replaced by three submanifolds S, C, C 0 of an arbitrary manifold M satisfying
suitable hypotheses. (State carefully the hypothesis required.)

4.3 The rotation group SO(3)


What should we mean by a rotation in Euclidean 3–space R3 ? It should cer-
tainly be a transformation a : R3 → R3 which fixes a point (say the origin) and
preserves the Euclidean distance. You may recall from linear algebra that for a
linear transformation a ∈ M3 (R) of R3 the following properties are equivalent
are equivalent.
a) The transformation p → ap preserves the Euclidean length ‖p‖ = √(x2 + y 2 + z 2 ):

‖ap‖ = ‖p‖

for all p = (x, y, z) ∈ R3 .


b) The transformation p → ap preserves the scalar product (p · p0 ) = xx0 + yy 0 +
zz 0 :
(ap · ap0 ) = (p · p0 )
for all p, p0 ∈ R3 .
c) The transformation p → ap is orthogonal: aa∗ = 1 where a∗ is the adjoint
(transpose) of a, i.e.
(ap · p0 ) = (p · a∗ p0 )
for all p, p0 ∈ R3 , and 1 ∈ M3 (R) is the identity transformation.
The set of all of these orthogonal transformations is denoted O(3). Let’s now
consider any transformation of R3 which preserves distance. We shall prove that
it is a composite of a linear transformation p → ap and a translation p → p + b,
as follows.
4.3.1 Theorem. Let T : R3 → R3 be any map preserving Euclidean distance:

‖T (q) − T (p)‖ = ‖q − p‖

for all q, p ∈ R3 . Then T is of the form

T (p) = ap + b

where a ∈O(3) and b ∈ R3 .


Proof. Let po , p1 be two distinct points. The point ps on the straight line from
po to p1 at distance s from po is uniquely characterized by the equations

‖p1 − po ‖ = ‖p1 − ps ‖ + ‖ps − po ‖, ‖ps − po ‖ = s.

We have ps = po + t(p1 − po ) where t = s/‖p1 − po ‖. These data are preserved


by T . Thus

T (po + t(p1 − po )) = T (po ) + t(T (p1 ) − T (po )).

This means that for all u, v ∈ R3 and all s, t ≥ 0 satisfying s + t = 1 we have

T (su + tv) = sT (u) + tT (v). (4.1)



(Set u = po , v = p1 , s = 1 − t.) We may assume that T (0) = 0, after replacing


p → T (p) by p → T (p) − T (0). The above equation with u = 0 then gives
T (tv) = tT (v). But then we can replace s, t by cs, ct in (4.1) for any c ≥ 0, so
(4.1) holds for all s, t ≥ 0. Since 0 = T (v + (−v)) = T (v) + T (−v), we have
T (−v) = −T (v), hence (4.1) holds for all s, t ∈ R. Thus T (0) = 0 implies that
T = a is linear, and generally T is of the form T (p) = a(p) + b where b = T (0).
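The converse direction — that every map p → ap + b with a ∈ O(3) preserves distance — is immediate, and easy to confirm numerically; the rotation a and translation b below are hypothetical examples:

```python
# Illustration of Theorem 4.3.1 (converse direction): p -> a p + b with a
# orthogonal preserves Euclidean distance.  a is a rotation about the z-axis.
import math

th = 0.8
a = [[math.cos(th), -math.sin(th), 0.0],
     [math.sin(th),  math.cos(th), 0.0],
     [0.0,           0.0,          1.0]]
b = (1.0, -2.0, 0.5)

def T(p):
    return tuple(sum(a[i][j]*p[j] for j in range(3)) + b[i] for i in range(3))

def dist(p, q):
    return math.sqrt(sum((pi - qi)**2 for pi, qi in zip(p, q)))

p, q = (1.0, 2.0, 3.0), (-0.5, 0.0, 4.0)
assert abs(dist(T(p), T(q)) - dist(p, q)) < 1e-12
```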

The set O(3) of orthogonal transformations is a group, meaning the composite


(matrix product) ab of any two elements a, b ∈O(3) belongs again to O(3), as
does the inverse a−1 of any element a ∈O(3). These transformations a ∈O(3)
have determinant ±1 because det(a)2 = det(aa∗ ) = 1. Those a ∈O(3) with
det(a) = +1 form a subgroup, called the special orthogonal group or rotation
group, which is denoted SO(3). Geometrically, the condition det(a) = +1
means that a preserves orientation, i.e. maps a right–handed basis (v1 , v2 , v3 )
to a right–handed basis (av1 , av2 , av3 ). Right–handedness means that the triple
scalar product v1 ·(v2 ×v3 ), which equals the determinant of the matrix [v1 , v2 , v3 ]
of the matrix of components of the vi with respect to the standard basis should
be positive. For an orthonormal basis this means that (v1 , v2 , v3 ) satisfies the
same cross–product relations (given by the right hand rule) as (e1 , e2 , e3 ), i.e.

v1 × v2 = v3 , v3 × v1 = v2 , v2 × v3 = v1 .

For reference we record the formula

v1 · (v2 × v3 ) = det[v1 , v2 , v3 ]

which can in fact be used to define the cross–product. We need some facts
about the matrix exponential function.
4.3.2 Theorem. The series

exp X := ∑k≥0 X k /k!

converges for any X ∈ Mn (R) and defines a C ∞ map exp : Mn (R) → Mn (R)
which maps an open neighbourhood U0 of 0 diffeomorphically onto an open
neighbourhood U1 of 1.
Proof. We have termwise ‖X k ‖ ≤ ‖X‖k , so

∑k≥0 ‖X k ‖/k! ≤ ∑k≥0 ‖X‖k /k!.                                              (1)

It follows that the series for exp X converges in norm for all X and defines a
C ∞ function. We want to apply the Inverse Function Theorem to X → exp X.
From the series we see that

exp X = 1 + X + o(X)

and this implies that the differential of exp at 0 is the identity map X → X.
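A minimal series implementation of exp (a sketch, not part of the text) makes it easy to check identities such as exp X exp(−X) = 1 numerically:

```python
# A minimal series implementation of exp X for 3x3 matrices (hypothetical
# helper), used to check exp(X) exp(-X) = 1 numerically.
def mat_mul(X, Y):
    return [[sum(X[i][k]*Y[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def mat_exp(X, terms=30):
    E = [[float(i == j) for j in range(3)] for i in range(3)]   # identity
    P = [row[:] for row in E]
    for k in range(1, terms):
        P = [[v/k for v in row] for row in mat_mul(P, X)]       # P = X^k / k!
        E = [[E[i][j] + P[i][j] for j in range(3)] for i in range(3)]
    return E

X = [[0.0, -0.3, 0.1], [0.3, 0.0, -0.2], [-0.1, 0.2, 0.0]]     # hypothetical X
negX = [[-v for v in row] for row in X]
prod = mat_mul(mat_exp(X), mat_exp(negX))
for i in range(3):
    for j in range(3):
        assert abs(prod[i][j] - float(i == j)) < 1e-12
```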
4.3.3 Notation. We write a → log a for the local inverse of X → exp X. It is
defined for a ∈ U1 . It can in fact be written as the log series

log a = ∑k≥1 ((−1)k−1 /k) (a − 1)k

but we don’t need this.


We record some properties of the matrix exponential.
4.3.4 Proposition. a) For all X,

(d/dt) exp tX = X (exp tX) = (exp tX)X
b) If XY = Y X, then

exp(X + Y ) = exp X exp Y.

c) For all X,
exp X exp(−X) = 1.
d) If a(t) ∈ Mn (R) is a differentiable function of t ∈ R satisfying ȧ(t) = Xa(t)
then a(t) = exp(tX)a(0).
Proof. a) Compute:

(d/dt) exp tX = (d/dt) ∑k≥0 (tk /k!)X k = ∑k≥1 (ktk−1 /k!)X k = ∑k≥0 (tk /k!)X k+1
= X ∑k≥0 (tk /k!)X k = X (exp tX) = (exp tX)X.

b) If XY = Y X, then (X + Y )n can be multiplied out and rearranged, so that
the binomial formula applies. Then

eX+Y = ∑k (1/k!)(X + Y )k = ∑k (1/k!) ∑i+j=k (k!/(i!j!)) X i Y j
= (∑i X i /i!)(∑j Y j /j!) = eX eY .

These series calculations are permissible because of (1).


c) Follows from (b): exp X exp(−X) = exp(X − X) = exp 0 = 1 + 0 + · · · = 1.
d) Assume ȧ(t) = Xa(t). Then

(d/dt)[(exp(−tX))a(t)] = ((d/dt) exp(−tX)) a(t) + (exp(−tX)) ((d/dt) a(t))
= (exp(−tX))(−X)a(t) + (exp(−tX))(Xa(t)) ≡ 0

hence (exp(−tX))a(t) =constant= a(0) and a(t) = exp(tX)a(0).


We now return to SO(3). We set

so(3) = {X ∈ M3 (R) : X ∗ = −X}.

Note that so(3) is a 3–dimensional vector space. For example, the following
three matrices form a basis.
     
     ( 0 0  0 )        (  0 0 1 )        ( 0 −1 0 )
E1 = ( 0 0 −1 ) , E2 = (  0 0 0 ) , E3 = ( 1  0 0 ) .
     ( 0 1  0 )        ( −1 0 0 )        ( 0  0 0 )
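One can check by direct multiplication that these matrices are skew and satisfy the commutation relations [E1 , E2 ] = E3 , [E2 , E3 ] = E1 , [E3 , E1 ] = E2 , mirroring the cross products of e1 , e2 , e3 (a quick script, not part of the text):

```python
# Check that E1, E2, E3 are skew and satisfy cross-product-like
# commutation relations [E1, E2] = E3, [E2, E3] = E1, [E3, E1] = E2.
E1 = [[0, 0, 0], [0, 0, -1], [0, 1, 0]]
E2 = [[0, 0, 1], [0, 0, 0], [-1, 0, 0]]
E3 = [[0, -1, 0], [1, 0, 0], [0, 0, 0]]

def mul(X, Y):
    return [[sum(X[i][k]*Y[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def brak(X, Y):              # commutator [X, Y] = XY - YX
    P, Q = mul(X, Y), mul(Y, X)
    return [[P[i][j] - Q[i][j] for j in range(3)] for i in range(3)]

for X in (E1, E2, E3):       # X* = -X
    assert all(X[i][j] == -X[j][i] for i in range(3) for j in range(3))
assert brak(E1, E2) == E3 and brak(E2, E3) == E1 and brak(E3, E1) == E2
```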

4.3.5 Theorem. For any a ∈SO(3) there is a right–handed orthonormal basis


(u1 , u2 , u3 ) of R3 so that

au1 = cos αu1 + sin αu2 , au2 = − sin αu1 + cos αu2 , au3 = u3 .

Then a = exp X where X ∈so(3) is defined by

Xu1 = αu2 , Xu2 = −αu1 , Xu3 = 0.

4.3.6 Remark. If we set u = αu3 , then these equations say that

Xv = u × v (4.2)

for all v ∈ R3 .
Proof. The eigenvalues λ of a matrix a satisfying a∗ a = 1 satisfy |λ| = 1.
For a real 3 × 3 matrix one eigenvalue will be real and the other two complex
conjugates. If in addition det a = 1, then the eigenvalues will have to be of the
form e±iα , 1. The eigenvectors can be chosen to of the form u2 ± iu1 , u3 where
u1 , u2 , u3 are real. If the eigenvalues are distinct these vectors are automati-
cally orthogonal and otherwise may be so chosen. They may be assumed to be
normalized. The first relations then follow from

a(u2 ± iu1 ) = e±iα (u2 ± iu1 ), au3 = u3

since eiα = cos α + i sin α. The definition of X implies

X(u2 ± iu1 ) = ±iα(u2 ± iu1 ), Xu3 = 0.

Generally, if Xv = λv, then


(exp X)v = ∑k (1/k!)X k v = ∑k (λk /k!) v = eλ v.

The relation a = exp X follows.


In the situation of the theorem, we call u3 the axis of rotation of a ∈SO(3), α
the angle of rotation, and a the right–handed rotation about u3 with angle α.
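The correspondence between axis–angle data and exp can be illustrated numerically: exponentiating the matrix X with Xv = u × v for u = αu3 (here u3 = e3 and α = 0.9 are hypothetical values) gives the right–handed rotation about u3 with angle α:

```python
# Sketch of Remark 4.3.6: for u = alpha*u3, the matrix X with Xv = u x v
# exponentiates (via the series) to the rotation about u3 by angle alpha.
import math

def cross_matrix(u):         # matrix of v -> u x v
    return [[0.0, -u[2], u[1]], [u[2], 0.0, -u[0]], [-u[1], u[0], 0.0]]

def mat_exp(X, terms=40):    # series exponential for 3x3 matrices
    E = [[float(i == j) for j in range(3)] for i in range(3)]
    P = [row[:] for row in E]
    for k in range(1, terms):
        P = [[sum(P[i][l]*X[l][j] for l in range(3))/k for j in range(3)]
             for i in range(3)]
        E = [[E[i][j] + P[i][j] for j in range(3)] for i in range(3)]
    return E

alpha, u3 = 0.9, (0.0, 0.0, 1.0)
a = mat_exp(cross_matrix([alpha*c for c in u3]))
# a is the right-handed rotation about the z-axis with angle alpha:
assert abs(a[0][0] - math.cos(alpha)) < 1e-12
assert abs(a[1][0] - math.sin(alpha)) < 1e-12
assert all(abs(a[i][2] - u3[i]) < 1e-12 for i in range(3))  # axis is fixed
```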

We now set V0 = so(3) ∩ U0 and V1 = SO(3) ∩ U1 and assume U0 chosen so that
X ∈ U0 ⇒ X ∗ ∈ U0 .
4.3.7 Corollary. exp maps so(3) onto SO(3) and gives a bijection from V0
onto V1 .
Proof. The first assertion follows from the theorem. If exp X ∈SO(3), then

exp(X ∗ ) = (exp X)∗ = (exp X)−1 = exp(−X).

If exp X ∈ U1 , this implies that X ∗ = −X, i.e. X ∈ so(3), since exp : U0 → U1
is bijective.
We use the local inverse log :SO(3)· · · →so(3) of exp :so(3) →SO(3) to define
coordinates on SO(3) as follows. For a ∈ V1 we write a = exp X with X ∈ V0 and
use this X ∈so(3) as the coordinate point for a ∈SO(3). If we want a coordinate
point in R3 we use a basis X1 , X2 , X3 for so(3), write X = x1 X1 + x2 X2 + x3 X3
and use (x1 , x2 , x3 ) as coordinate point. This gives coordinates around 1 in
SO(3). To define coordinates around a general point ao ∈SO(3) we write

a = ao exp X, a ∈ ao V1 , X ∈ V0

and use this X ∈ V0 as coordinate point for a ∈ ao V1 . Thus X = log(ao −1 a).
We check the manifold axioms.
MAN 1. a → log(ao −1 a) maps ao V1 one–to–one onto V0 .

MAN 2. The coordinate transformation X → X̃ between two such coordinate


systems is defined by
ao exp X = ão exp X̃.
Its domain consists of the X ∈ V0 for which ão −1 ao exp X ∈ U1 , and then it is
given by
X̃ = log(ão −1 ao exp X).

If follows that its domain is open and the map is C ∞ .


MAN 3. ao ∈SO(3) lies in the coordinate domain ao V1 .
So the exponential coordinate systems make SO(3) into a manifold, directly
from the axiom. But we could also use the following result.
4.3.8 Theorem. SO(3) is a submanifold of M3 (R). Its tangent space at 1 is
so(3) and its tangent space at any a ∈SO(3) is aso(3) =so(3)a.
Proof. We first consider O(3)= {a ∈ M3 (R) : aa∗ = 1}. Consider the map f
from M3 (R)into the vector space Sym3 (R) of symmetric 3 × 3 matrices given
by f (a) = aa∗ − 1. Its differential can be calculated by writing a = a(t) and taking
the derivative:
dfa (ȧ) = (d/dt) f (a) = (d/dt)(aa∗ − 1) = ȧa∗ + aȧ∗ = (ȧa∗ ) + (ȧa∗ )∗ .

As long as a is invertible, any element of Sym3 (R) is of the form Y a∗ + (Y a∗ )∗


for some Y , so dfa is surjective at such a. Since this is true for all a ∈O(3) it
follows that O(3) is a submanifold of M3 (R). Furthermore, its tangent space at
a ∈ O(3) consists of the Y ∈ M3 (R) satisfying Y a∗ + (Y a∗ )∗ = 0. This means
that X = Y a∗ = Y a⁻¹ must belong to so(3) and so Y ∈ so(3)a. The equality
a so(3) = so(3)a is equivalent to a so(3)a⁻¹ = so(3), which is clear, since a⁻¹ = a∗ .
This proves the theorem with SO(3) replaced by O(3). But since SO(3) is the
intersection of O(3) with the open set {a ∈ M3 (R) : det a > 0} it holds for
SO(3) as well.
4.3.9 Remark. O(3) is the disjoint union of the two open subsets O± (3)=
{a ∈O(3) : ± det a > 0}. O+ (3)=SO(3) and is connected, because SO(3) is
the image of the connected set so(3) under the continuous map exp. O− (3)=
cO+ (3) for any c ∈O(3) with det c = −1, e.g. c = −1, and therefore is con-
nected as well. It follows that O(3) has two connected components: the connected
component of the identity element 1 is O+ (3) =SO(3) and the other one is
O− (3) = cSO(3). This means that SO(3) can be characterized as the set of ele-
ments O(3) which can be connected to the identity by a continuous curve a(t)
in O(3), i.e. transformations which can be obtained by a continuous motion
starting from rest, preserving distance, and leaving the origin fixed, in perfect
agreement with the notion of a “rotation“.
We should verify that the submanifold structure on SO(3) is the same as the
one defined by exponential coordinates. For this we have to verify that the
exponential coordinates ao V1 → V0 , a → X = log(a−1 o a), are also submanifold
coordinates, which is clear, since the inverse map is V0 → a0 V1 , X → ao exp X .
There is a classical coordinate system on SO(3) that goes back to Euler (1775).
(It was lost in Euler’s voluminous writings until Jacobi (1827) called attention
to it because of its use in mechanics.) It is based on the following lemma.
4.3.10 Lemma. Every a ∈ SO(3) can be written in the form

a = a3 (θ)a2 (φ)a3 (ψ)

where
a3 (θ) = exp(θE3 ), a2 (φ) = exp(φE2 )
and 0 ≤ θ, ψ < 2π, 0 ≤ φ ≤ π. Furthermore, (θ, φ, ψ) is unique as long as
φ ≠ 0, π.
Proof. Consider the rotation-action of SO(3) on the sphere S 2 . The geographical
coordinates (θ, φ) of a point p ∈ S 2 satisfy
p = a3 (θ)a2 (φ)e3 .
Thus for any a ∈ SO(3) one has an equation

ae3 = a3 (θ)a2 (φ)e3 .

for some (θ, φ) subject to the above inequalities and unique as long as φ ≠ 0.
This equation implies that a = a3 (θ)a2 (φ)b for some b ∈SO(3) with be3 = e3 .

Such a b ∈SO(3) is necessarily of the form b = a3 (ψ) for a unique ψ, 0 ≤ ψ < 2π.

θ, φ, ψ are the Euler angles of the element a = a3 (θ)a2 (φ)a3 (ψ). They form a
coordinate system on SO(3) with domain consisting of those a’s whose Euler
angles satisfy 0 < θ, ψ < 2π, 0 < φ < π (strict inequalities). To prove this it
suffices to show that the map R3 → M3 (R), (θ, φ, ψ) → a3 (θ)a2 (φ)a3 (ψ), has
an injective differential for these (θ, φ, ψ). The partials of a = a3 (θ)a2 (φ)a3 (ψ)
are given by
a⁻¹ ∂a/∂θ = − sin φ cos ψ E1 − sin φ sin ψ E2 + cos φ E3
a⁻¹ ∂a/∂φ = sin ψ E1 + cos ψ E2
a⁻¹ ∂a/∂ψ = E3 .
The matrix of coefficients of E1 , E2 , E3 on the right has determinant − sin φ,
hence the three elements of so(3) given by these equations are linearly indepen-
dent as long as sin φ ≠ 0. This proves the desired injectivity of the differential
on the domain in question.
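The proof of the lemma is effectively an algorithm for computing the Euler angles of a given rotation. A numeric sketch in Python (the function and variable names are mine): build a from chosen angles, read (θ, φ) off ae3 as geographic coordinates, then peel off a3(θ) and a2(φ) to recover ψ.

```python
import math

def rot3(t):  # a3(t) = exp(t E3): rotation about the x3-axis
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rot2(t):  # a2(t) = exp(t E2): rotation about the x2-axis
    c, s = math.cos(t), math.sin(t)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

theta, phi, psi = 0.9, 1.1, 2.3
a = mul(mul(rot3(theta), rot2(phi)), rot3(psi))

# a e3 determines (theta, phi) as the geographic coordinates of the image of e3
p = [a[i][2] for i in range(3)]            # third column of a = a e3
phi_rec = math.acos(p[2])
theta_rec = math.atan2(p[1], p[0])

# b = a2(phi)^-1 a3(theta)^-1 a fixes e3, so it equals a3(psi) for some psi
b = mul(mul(rot2(-phi_rec), rot3(-theta_rec)), a)
psi_rec = math.atan2(b[1][0], b[0][0])

assert abs(theta_rec - theta) < 1e-12
assert abs(phi_rec - phi) < 1e-12
assert abs(psi_rec - psi) < 1e-12
```

The recovery fails, as the lemma predicts, exactly when sin φ = 0, where θ and ψ are no longer individually determined.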

EXERCISES 4.3
1. Fix po ∈ S 2 . Define F :SO(3) → S 2 by F (a) = apo . Prove that F is a
surjective submersion.
2. Identify T S 2 = {(p, v) ∈ S 2 × R3 : p ∈ S 2 , v ∈ Tp S 2 ⊂ R3 }. The circle
bundle over S 2 is the subset

S = {(p, v) ∈ T S 2 : ‖v‖ = 1}

of T S 2 .
a) Show that S is a submanifold of T S 2 .
b) Fix (po , vo ) ∈ S. Show that the map F :SO(3) → S, a → (apo , avo ) is a
diffeomorphism of SO(3) onto S.
3. a) For u ∈ R3 , let Xu be the linear transformation of R3 given by Xu (v) :=
u × v (cross product).
Show that Xu ∈so(3) and that u → Xu is a linear isomorphism R3 →so(3).
b) Show that Xau = aXu a−1 for any u ∈ R3 and a ∈SO(3).
c) Show that exp Xu is the right–handed rotation about u with angle ‖u‖.
[Suggestion. Use a right–handed o.n. basis u1 , u2 , u3 with u3 = u/‖u‖, assuming
u ≠ 0.]
d) Show that u → exp Xu maps the closed ball {‖u‖ ≤ π} onto SO(3) and is one–to–
one except that it maps antipodal points ±u on the boundary sphere {‖u‖ = π}
into the same point in SO(3). [Suggestion. Argue geometrically, using (c).]
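Parts (a) and (c) of exercise 3 lend themselves to a quick numeric check (a Python sketch with names of my own choosing): exp Xu should fix the axis u and rotate by the angle ‖u‖, which can be read off the trace.

```python
import math

def cross_matrix(u):
    # the matrix of X_u(v) = u x v
    return [[0.0, -u[2], u[1]],
            [u[2], 0.0, -u[0]],
            [-u[1], u[0], 0.0]]

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def mat_exp(X, n_terms=30):
    result = [[float(i == j) for j in range(3)] for i in range(3)]
    term = [[float(i == j) for j in range(3)] for i in range(3)]
    for k in range(1, n_terms):
        term = [[v / k for v in row] for row in mat_mul(term, X)]
        result = [[result[i][j] + term[i][j] for j in range(3)] for i in range(3)]
    return result

u = (0.3, -0.4, 1.2)
norm = math.sqrt(sum(c * c for c in u))    # ||u|| = 1.3 here
a = mat_exp(cross_matrix(u))

# exp X_u fixes its axis u ...
au = [sum(a[i][j] * u[j] for j in range(3)) for i in range(3)]
assert all(abs(au[i] - u[i]) < 1e-10 for i in range(3))
# ... and rotates by angle ||u||: trace a = 1 + 2 cos ||u||
trace = a[0][0] + a[1][1] + a[2][2]
assert abs(trace - (1 + 2 * math.cos(norm))) < 1e-10
```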
4. Prove the formulas for the partials of a = a3 (θ)a2 (φ)a3 (ψ). [Suggestion. Use
the product rule on matrix products and the differentiation rule for exp(tX).]

5. Define an indefinite scalar product on R3 by the formula (p · p′ ) = xx′ + yy ′ − zz ′ .
Let O(2, 1) be the group of linear transformations preserving this scalar
product and SO(2, 1) the subgroup of elements of determinant 1.
a) Make O(2, 1) and SO(2, 1) into manifolds using exponential coordinates.
b) Show that O(2, 1) and SO(2, 1) are submanifolds of M3 (R) and determine
the tangent spaces.
[Define now a∗ by (ap · p′ ) = (p · a∗ p′ ). Introduce so(2, 1) as before. In this case
the exponential map exp :so(2, 1) →SO(2, 1) is not surjective and SO(2, 1)
is not connected, but you need not prove this and it does not matter for this
problem.]
6. Continue with the setup of the preceding problem.
a) Show that for any two vectors v1 , v2 ∈ R3 there is a unique vector, denoted
v1 × v2 satisfying
((v1 × v2 ) · v3 ) = det [v1 , v2 , v3 ]
for all v3 ∈ R3 . [Suggestion. For fixed v1 , v2 the right side is linear in v3 . Any linear functional
on R3 is of the form v → (u · v) for a unique u ∈ R3 . Why?]
b) For u ∈ R3 , define Xu ∈ M3 (R) by Xu (v) = u×v. Show that exp Xu ∈SO(2, 1)
for all u ∈ R3 and that
Xcu = cXu c−1
for all u ∈ R3 and all c ∈SO(2, 1).
*c) Determine when exp Xu = exp Xu′ .
7. Let U(2)= {a ∈ M2 (C) : aa∗ = 1} and SU(2)= {a ∈U(2): det a = 1}.
[For a ∈ M2 (C), a∗ denotes the adjoint of a with respect to the usual Hermitian
scalar product. M2 (C) is considered as a real vector space in order to make it
into a manifold. ]
a) Show that U(2) and SU(2) are submanifolds of M2 (C) and determine their
tangent spaces u(2) and su(2) at the identity 1.
b) Determine the dimension of U(2) and SU(2).
c) Let V = {Y ∈ M2 (C) : Y ∗ = Y,trY = 0}. Show that V is a 3–dimensional
real vector space with basis
     
F1 = [ 0   i ]    F2 = [ 0  1 ]    F3 = [ 1   0 ]
     [ −i  0 ]         [ 1  0 ]         [ 0  −1 ]

and that the formula


(Y · Y ′ ) = (1/2) tr(Y Y ′ )
defines a positive definite inner product on V .
d) For a ∈U(2) define Ta : V → V by the formula Ta (Y ) = aY a−1 . Show
that Ta preserves the inner product on V, i.e. Ta ∈O(V ), the group of linear
transformations of V preserving the inner product. Show further that a → Ta
is a group homomorphism, i.e. Taa′ = Ta Ta′ and T1 = 1.
e) Let Y ∈ V . Show that a(t) := exp itY ∈SU(2) for all t ∈ R and that the trans-
formation Ta(t) of V is the right–handed rotation around Y with angle 2t‖Y ‖.
196 CHAPTER 4. SPECIAL TOPICS

[“Right–handed“ refers to the orientation specified by the basis (F1 , F2 , F3 ) of


V . Suggestion. Any Y ∈ V is of the form Y = γcF3 c−1 for some real γ ≥ 0 and
some c ∈U(2). (Why?) This reduces the problem to Y = F3 . (How?)]
f) Show that a → Ta maps U(2) onto SO(3) and that Ta = 1 iff a is a scalar
matrix. Deduce that already SU(2) gets mapped onto SO(3) and that a ∈SU(2)
satisfies Ta = 1 iff a = ±1. Deduce further that a, a′ ∈SU(2) satisfy Ta = Ta′
iff a′ = ±a. [These facts are summarized by saying that SU(2)→SO(3) is a
double covering of SO(3) by SU(2). Suggestion. If a ∈ M2 (C) commutes with
all matrices Z, then a is a scalar matrix. Any matrix is of the form Z = X + iY
where X, Y are Hermitian. (Why?)]
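The "doubled angle" in part (e) of exercise 7 can be seen numerically. The sketch below (Python; the helper names are mine) applies Ta with a = exp(itF3) to F1 and checks that the result stays in the (F1, F2)–plane with F1–component cos 2t; only the magnitude of the F2–component is tested, to stay clear of the orientation convention.

```python
import cmath, math

F1 = [[0, 1j], [-1j, 0]]
F2 = [[0, 1], [1, 0]]
F3 = [[1, 0], [0, -1]]

def mul2(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inner(Y, Z):
    # (Y . Z) = (1/2) tr(Y Z), the inner product of part (c)
    P = mul2(Y, Z)
    return (P[0][0] + P[1][1]) / 2

t = 0.4
a = [[cmath.exp(1j * t), 0], [0, cmath.exp(-1j * t)]]   # a = exp(i t F3), in SU(2)
ai = [[cmath.exp(-1j * t), 0], [0, cmath.exp(1j * t)]]  # a^-1
TaF1 = mul2(mul2(a, F1), ai)                            # T_a(F1) = a F1 a^-1

# T_a fixes F3 and rotates F1 in the (F1, F2)-plane by the doubled angle 2t
assert abs(inner(TaF1, F1) - math.cos(2 * t)) < 1e-12
assert abs(abs(inner(TaF1, F2)) - math.sin(2 * t)) < 1e-12
assert abs(inner(TaF1, F3)) < 1e-12
```

Note that t and t + π give the same Ta even though exp(iπF3) = −1 ≠ 1, which is the double covering of part (f) in miniature.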
8. Let SL(2, R) = {a ∈ M2 (R) : det a = 1}.
a) Show that SL(2, R) is a submanifold of M2 (R) and determine its tangent space
sl(2, R) = T1 SL(2, R) at the identity element 1. Show that the three matrices
     
E1 = [ 1   0 ]    E2 = [ 0  1 ]    E3 = [ 0  −1 ]
     [ 0  −1 ]         [ 1  0 ]         [ 1   0 ]
form a basis for sl(2, R).
b) Show that
exp(τ E1 ) = [ e^τ    0    ]    exp(θE3 ) = [ cos θ  − sin θ ]
             [ 0     e^−τ  ]                [ sin θ    cos θ ]
c) Show that the equation
a = exp(φE1 ) exp(τ E2 ) exp(θE3 )
defines a coordinate system (φ, τ, θ) in some neighbourhood of 1 in SL(2, R).
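The closed-form exponentials asserted in part (b) of exercise 8 can be verified directly from the power series (a Python sketch; the names are my own):

```python
import math

def mul2(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def exp2(X, n_terms=40):
    # 2x2 matrix exponential by its power series
    result = [[float(i == j) for j in range(2)] for i in range(2)]
    term = [[float(i == j) for j in range(2)] for i in range(2)]
    for k in range(1, n_terms):
        term = [[v / k for v in row] for row in mul2(term, X)]
        result = [[result[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return result

tau, theta = 0.6, 1.2
a = exp2([[tau, 0.0], [0.0, -tau]])        # exp(tau E1): should be diag(e^tau, e^-tau)
r = exp2([[0.0, -theta], [theta, 0.0]])    # exp(theta E3): a rotation matrix

assert abs(a[0][0] - math.exp(tau)) < 1e-12 and abs(a[1][1] - math.exp(-tau)) < 1e-12
assert abs(a[0][1]) < 1e-12 and abs(a[1][0]) < 1e-12
assert abs(r[0][0] - math.cos(theta)) < 1e-12 and abs(r[1][0] - math.sin(theta)) < 1e-12
```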
9. Let G be one of the following groups of linear transformations of Rn
a) GL(n, R)= {a ∈ Mn (R) : det a ≠ 0}
b) SL(n, R)= {a ∈GLn (R) : det a = 1}
c) O(n, R)= {a ∈ Mn (R) : aa∗ = 1}
d) SO(n)= {a ∈O(n) : det a = +1}
(1) Show that G is a submanifold of Mn (R) and determine its tangent space g
at 1 ∈ G.
(2) Determine the dimension of G.
(3) Show that exp maps g into G and is a diffeomorphism from a neighbourhood
of 0 in g onto a neighbourhood of 1 in G.
[For (b) you may want to use the differentiation formula for det:
(d/dt) det a = (det a) tr(a⁻¹ da/dt)
if a = a(t). Another possibility is to use the relation
det exp X = etrX
which can be verified by choosing a basis so that X ∈ Mn (C) becomes triangular,
as is always possible.]

4.4 Cartan’s mobile frame


To begin with, let S be any m-dimensional submanifold of Rn . An orthonormal
frame along S associates to each point p ∈ S an orthonormal n-tuple of vectors
(e1 , · · · , en ). The ei form a basis of Rn and (ei , ej ) = δij . We can think of the
ei as functions on S with values in Rn and these are required to be C ∞ . It will
also be convenient to think of the variable point p of S as a function, namely the
inclusion function S → Rn which associates to each point of S the same point
as point of Rn . So we have n + 1 functions p, e1 , · · · , en on S with values in
Rn ; for any such function f : S → Rn , the differentials dp, de1 , · · · , den are linear
functions dfp : Tp S → Tf (p) Rn = Rn . Since e1 , e2 , · · · , en form a basis for Rn ,
we can write the vector dfp (v) ∈ Rn as a linear combination df (v) = $i (v)ei .
Everything depends also on p ∈ S, but we do not indicate this in the notation.
The $i are 1-forms on S and we simply write df = $i ei . In particular, when
we take for f the functions p, e1 , · · · , en we get 1-forms $i and $ji on S so that

dp = $i ei , dei = $ij ej . (1)

4.4.1 Lemma. The 1-forms $i , $ji on S satisfy Cartan’s structural equations:

(CS1) d$i = $j ∧ $ji
(CS2) d$ij = $ik ∧ $kj
(CS3) $ij = −$ji
Proof. Recall the differentiation rules d(dϕ) = 0, d(α ∧ β) = (dα) ∧ β +
(−1)deg α α ∧ (dβ).
(1) 0 = d(dp) = d($i ei ) = (d$i )ei − $i ∧ (dei ) = (d$i )ei − $i ∧ $ij ej =
(d$i − $j ∧ $ji )ei .
(2) 0 = d(dei ) = d($ij ej ) = (d$ij )ej − $ij ∧ (dej ) = (d$ij )ej − $ij ∧ $jk ek =
(d$ij − $ik ∧ $kj )ej .
(3) 0 = d(ei , ej ) = (dei , ej ) + (ei , dej ) = ($ik ek , ej ) + (ei , $jk ek ) = $ij + $ji .
From now on S is a surface in R3 .

4.4.2 Definition. A Darboux frame along S is an orthonormal frame (e1 , e2 , e3 )


along S so that at any point p ∈ S,
(1) e1 , e2 are tangential to S, e3 is orthogonal to S
(2) (e1 , e2 , e3 ) is positively oriented for the standard orientation on R3 .
The second condition means that the determinant of the coefficient matrix [e1 , e2 , e3 ]
of the ei with respect to the standard basis for R3 is positive. From now on
(e1 , e2 , e3 ) will denote a Darboux frame along S and we use the above notation
$i , $ij .

4.4.3 Interpretation. The forms $i and $ij have the following interpretations.
a) For any v ∈ Tp S we have dp(v) = v considered as vector in Tp R3 . Thus

v = $1 (v)e1 + $2 (v)e2 . (2)



This shows that $1 , $2 are just the components of a general tangent vector
v ∈ Tp S to S with respect to the basis e1 , e2 of Tp S. We shall call the $i the
component forms of the frame. We note that $1 , $2 depend only on e1 , e2 .
b) For any vector field X = X i ei along S in R3 and any tangent vector v ∈ Tp S
the (componentwise) directional derivative Dv X (=covariant derivative in R3 )
is

Dv X = Dv (X i ei ) = dX i (v)ei + X i dei (v) = dX i (v)ei + X i $ij (v)ej .

Hence if X = X 1 e1 + X 2 e2 is a vector field on S, then the covariant derivative


∇v X =tangential component of Dv X is
∇v X = Σα=1,2 (dX α (v) + $βα (v)X β )eα

Convention. Greek indices α, β, · · · run only over {1, 2}, Roman indices i, j, k
run over {1, 2, 3}.
(This is not important in the above equation, since a vector field X on S has
normal component X 3 = 0 anyway.) In particular we have

∇v eβ = $βα (v)eα (3)

This shows that the $βα determine the connection ∇ on S. We shall call them
the connection forms for the frame.
4.4.4 Lemma. a) The Darboux frame (e1 , e2 , e3 ) satisfies the equations of
motion
dp = $1 e1 + $2 e2 + 0
de1 = 0 + $12 e2 + $13 e3
de2 = −$12 e1 + 0 + $23 e3
de3 = −$13 e1 − $23 e2 + 0
b) The forms $i , $ij satisfy Cartan’s structural equations
d$1 = −$2 ∧ $12 , d$2 = $1 ∧ $12
$1 ∧ $13 + $2 ∧ $23 = 0
d$12 = −$13 ∧ $23 .
Proof. a) In the first equation, $3 = 0, because dp/dt = dp(ṗ) is tangential to
S.
In the remaining three equations $ij = −$ji by (CS3).
b) This follows from (CS1–3) together with the relation $3 = 0.
4.4.5 Proposition. The Gauss curvature K satisfies

d$12 = −K$1 ∧ $2 . (4)

Proof. We use N = e3 as unit normal for S. Recall that the Gauss map
N : S → S 2 satisfies
N ∗ (areaS 2 ) = K(area).

Since e1 , e2 form an orthonormal basis for Tp S = TN (p) S 2 the area elements at


p on S and at N (p) on S 2 are both equal to the 2–form $1 ∧ $2 . Thus we get
the relation
N ∗ ($1 ∧ $2 ) = K($1 ∧ $2 ).
From the equations of motion,

dN = de3 = −$13 e1 − $23 e2 .

This gives

N ∗ ($1 ∧ $2 ) = ($1 ◦ dN ) ∧ ($2 ◦ dN ) = (−$13 ) ∧ (−$23 ) = $13 ∧ $23 .

Hence
$13 ∧ $23 = K$1 ∧ $2 .
From Cartan’s structural equation d$12 = −$13 ∧ $23 we therefore get

d$12 = −K$1 ∧ $2 .

The Gauss–Bonnet Theorem. Let D be a bounded region on the surface S


and C its boundary. We assume that C is parametrized as a C ∞ curve p(t),
t ∈ [to , t1 ], with p(to ) = p(t1 ). Fix a unit vector v at p(to ) and let X(t) be
the vector at p(t) obtained from v by parallel transport to → t along C. This is a C ∞
vector field along C. Assume further that there is a C ∞ orthonormal frame
e1 , e2 defined in some open set of S containing D. At the point p(t), write

X = (cos θ)e1 + (sin θ)e2

where X, θ, e1 , e2 are all considered functions of t. The angle θ(t) can be thought
of as the polar angle of X(t) relative to the vector e1 (p(t)). It is defined only up to
an integer multiple of 2π. To say something about it, we consider the derivative
of the scalar product (X, ei ) at values of t where p(t) is differentiable. We have

(d/dt)(X, ei ) = (∇X/dt , ei ) + (X, ∇ei /dt).
This gives
(d/dt)(cos θ) = 0 + (sin θ) $12 (ṗ),   (d/dt)(sin θ) = 0 − (cos θ) $12 (ṗ)
hence
θ̇ = −$12 (ṗ).
Using this relation together with (4) and Stokes’ Theorem we find that the total
variation of θ around C, given by
∆C θ := ∫_{to}^{t1} θ̇(t)dt   (*)

is
∆C θ = ∫_C −$12 = ∫_D K$1 ∧ $2 .

This relation holds provided D and C are oriented compatibly. We now assume
fixed an orientation on S and write the area element $1 ∧ $2 as dS. With this
notation,
∆C θ = ∫_D K dS.   (**)

The relation (**) is actually valid in much greater generality. Up to now we


have assumed that the boundary C of D is a C ∞ curve. The equation (**) can
be extended to any bounded region D with a piecewise C ∞ boundary C since
we can approximate D by “rounding off” the corners. The right side of (**)
then approaches the integral of KdS over D under this approximation, the left
side the sum of the integrals of θ̇(t)dt = −$12 |C over the C ∞ pieces of C. All
of this still presupposes, however, that D carries some C ∞ frame field e1 , e2 ,
although the value of (**) is independent of its particular choice (since the right
side evidently is). One can paraphrase (**) by saying that the rotation angle
of the parallel transport around a simple loop equals the surface integral of the
Gauss–curvature over its interior. This is a local version of the classical Gauss–
Bonnet theorem. (“Local” because it still assumes the existence of the frame
field on D, which can be guaranteed only if D is sufficiently small.)
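The statement (**) can be tested on the unit sphere, where parallel transport around the latitude circle φ = φ0 should rotate vectors by the cap integral ∫ K dS = 2π(1 − cos φ0). The sketch below (Python; the crude explicit integration scheme and all names are mine) integrates the parallel-transport equation Ẋ = −(X · Ṅ)N, with N the unit normal, and compares the holonomy angle via its cosine so that no sign convention is needed.

```python
import math

# parallel transport around the latitude circle phi = phi0 on the unit sphere;
# covariant constancy means dX/dt = -(X . N') N, with unit normal N(t) = p(t)
phi0 = 1.0
n = 200000
dt = 2 * math.pi / n

def p(t):   # point on the latitude circle at longitude t
    return (math.sin(phi0) * math.cos(t), math.sin(phi0) * math.sin(t), math.cos(phi0))

def dN(t):  # derivative of the unit normal along the circle
    return (-math.sin(phi0) * math.sin(t), math.sin(phi0) * math.cos(t), 0.0)

X0 = (math.cos(phi0), 0.0, -math.sin(phi0))   # unit tangent vector at t = 0
X = X0
for i in range(n):
    t = (i + 0.5) * dt
    N = p(t)
    Nd = dN(t)
    s = X[0] * Nd[0] + X[1] * Nd[1] + X[2] * Nd[2]
    X = (X[0] - dt * s * N[0], X[1] - dt * s * N[1], X[2] - dt * s * N[2])

# the holonomy angle should equal the cap integral of K: 2*pi*(1 - cos(phi0))
nrm = math.sqrt(X[0] ** 2 + X[1] ** 2 + X[2] ** 2)
cosangle = (X0[0] * X[0] + X0[1] * X[1] + X0[2] * X[2]) / nrm
expected = 2 * math.pi * (1 - math.cos(phi0))
assert abs(cosangle - math.cos(expected)) < 1e-3
```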
Calculation of the Gauss curvature. The component forms $1 , $2 form a
basis of the 1–forms on S. In particular, one can write

$12 = λ1 $1 + λ2 $2

for certain scalar functions λ1 , λ2 . These can be calculated from Cartan’s


structural equations

d$1 = −$2 ∧ $12 , d$2 = $1 ∧ $12

which give
d$1 = λ1 $1 ∧ $2 , d$2 = λ2 $1 ∧ $2 . (5)
and find
−K$1 ∧ $2 = d$12 = d(λ1 $1 + λ2 $2 ). (6)
We write the equations (5) symbolically as
λ1 = d$1 / ($1 ∧ $2),   λ2 = d$2 / ($1 ∧ $2)
(The “quotients” denote the functions λ1 , λ2 satisfying (5).) Then (6) becomes
−K$1 ∧ $2 = d( (d$1 /($1 ∧ $2)) $1 + (d$2 /($1 ∧ $2)) $2 ).   (7)

The main point is that K is uniquely determined by $1 and $2 , which in turn


are determined by the Riemann metric on S: for any vector v = $1 (v)e1 +
$2 (v)e2 we have (v, v) = $1 (v)2 + $2 (v)2 , so

ds2 = ($1 )2 + ($2 )2 .

Hence K depends only on the metric in S, not on the realization of S as subset


in the Euclidean space R3 . One says that the Gauss curvature is an intrinsic
property of the surface, which is what Gauss called the theorema egregium. It is
a very surprising result if one thinks of the definition K = det dN , which does
use the realization of S as a subset of R3 ; it is especially surprising when one
considers that the space-curvature κ of a curve C: p = p(s) in R3 is not an intrin-
sic property of the curve, even though the definition κ = ±kdT /dsk = (scalar
rate of change of tangential direction T per unit length) looks superficially very
similar to K = det dN =(scalar rate of change of normal direction N per unit
area).
4.4.6 Example: Gauss curvature in orthogonal coordinates. Assume
that the surface S is given parametrically as p = p(s, t) so that the vector fields
∂p/∂s and ∂p/∂t are orthogonal. Then the metric is of the form

ds2 = A2 ds2 + B 2 dt2

for some scalar functions A, B. We may take

∂p ∂p
e1 = A−1 , e2 = B −1 .
∂s ∂t
Then
$1 = Ads, $2 = Bdt.
Hence

d$1 = −At ds ∧ dt, d$2 = Bs ds ∧ dt, $1 ∧ $2 = ABds ∧ dt

where the subscripts indicate partial derivatives. The formula (7) becomes
−KAB ds ∧ dt = d( −(At /B) ds + (Bs /A) dt ) = [ (At /B)t + (Bs /A)s ] ds ∧ dt
Thus
K = −(1/AB) [ (At /B)t + (Bs /A)s ]   (8)
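Formula (8) is easy to test numerically. A sketch in Python (names mine), approximating the partial derivatives by central differences and checking that the unit sphere, with A = 1 and B = sin s in geographic coordinates, comes out with K = 1:

```python
import math

def K_orthogonal(A, B, s, t, h=1e-5):
    # K = -(1/(A B)) [ (A_t/B)_t + (B_s/A)_s ], derivatives by central differences
    def d_t(f, s, t):
        return (f(s, t + h) - f(s, t - h)) / (2 * h)
    def d_s(f, s, t):
        return (f(s + h, t) - f(s - h, t)) / (2 * h)
    term1 = d_t(lambda s, t: d_t(A, s, t) / B(s, t), s, t)
    term2 = d_s(lambda s, t: d_s(B, s, t) / A(s, t), s, t)
    return -(term1 + term2) / (A(s, t) * B(s, t))

# unit sphere in geographic coordinates: ds^2 = d(phi)^2 + sin^2(phi) d(theta)^2,
# i.e. A = 1, B = sin s; the Gauss curvature should be constant 1
K = K_orthogonal(lambda s, t: 1.0, lambda s, t: math.sin(s), 0.8, 0.3)
assert abs(K - 1.0) < 1e-4
```

Replacing B by sinh s (a metric of the hyperbolic plane) gives K ≈ −1 by the same computation, illustrating that (8) depends only on the metric coefficients.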
A footnote. The theory of the mobile frame (repère mobile) on general Rie-
mannian manifolds is an invention of Élie Cartan, expounded in his wonderful
book Leçons sur la géométrie des espaces de Riemann. An adaptation to mod-
ern tastes can be found in the book by his son Henri Cartan entitled Formes
différentielles.

EXERCISES 4.4
The following problems refer to a surface S in R3 and use the notation explained
above.
1. Show that the second fundamental form of S is given by the formula

Φ = $1 $13 + $2 $23 .

2. a) Show that there are scalar functions a, b, c so that

$13 = a$1 + b$2 , $23 = b$1 + c$2

[Suggestion. First write $23 = d$1 + c$2 . To show that d = b, show first that
$1 ∧ $2 ∧ $23 = 0.]
b) Show that the normal curvature in the direction of a unit vector u ∈ Tp S
which makes an angle θ with e1 is

k(u) = a cos2 θ + 2b sin θ cos θ + c sin2 θ

[Suggestion. Use the previous problem.]



4.5 Weyl’s gauge theory paper of 1929


The idea of gauge invariance is fundamental to present-day physics. It has a
colourful history, starting with Weyl’s 1918 attempt at a unification of Einstein’s
theory of gravitation and Maxwell’s theory of electromagnetism –a false start
as it turned out– then Weyl’s second attempt in 1929 –another false start in
a sense, because a quarter of a century ahead of Yang and Mills. The story
is told thoroughly in O’Raifereartaigh’s “The Dawning of Gauge Theory” and
there is no point in retelling it here. Instead I suggest that Weyl’s paper of
1929 may be read with profit and pleasure for what it has to say on issues very
much of current interest. The paper should speak for itself and the purpose of
the translation is to give it a chance to do so. This meant that I had to take
a few liberties as translator, which I would not have taken with a document
of historical interest only, translating not only into another language but also
into another time. Thus I updated Weyl’s notation to conform with what has
become standard in the meantime and in a few places I updated the formulation
as well. Everything of substance, however, I have kept the way Weyl said it,
including his delightful handling of infinitesimals, limiting the liberties (flagged
in footnotes) to what I thought was the minimum compatible with the purpose
of the translation.
WR

ELECTRON AND GRAVITATION


Hermann Weyl
Zeitschrift für Physik 56, 330–352 (1929)

Introduction

In this paper I develop in detailed form a theory comprising gravitation, electromagnetism and matter, a short sketch of which appeared in the Proc. Nat.
Acad., April 1929. Several authors have noticed a connection between Einstein’s
theory of parallelism at a distance and the spinor theory of the electron1 . My
approach is radically different, in spite of certain formal analogies, in that I
reject parallelism at a distance and retain Einstein’s classical relativity theory
of gravitation.
The adaptation of the Pauli-Dirac theory of the electron spin promises to lead
to physically fruitful results for two reasons. (1)The Dirac theory, in which
the electron field is represented by a potential ψ with four components, gives
twice to many energy levels; one should therefore *be able to return to the
two component field of Pauli’s theory without sacrificing relativistic invariance.
This is prevented by the term in Dirac’s action which contains the mass m of
the electron as a factor. Mass, however, is a gravitational effect; so there is
hope of finding a replacement for this term in the theory of gravitation which
1 E. Wigner, ZS. f. Phys. 53, 592, 1929 and others.

produces the desired correction. (2)The Dirac field equations for ψ together with
Maxwell’s equations for the four potentials Aα for the electromagnetic field have
an invariance property formally identical to the one I called gauge invariance
in my theory of gravitation of 1918; the field equations remain invariant if one
replaces simultaneously
ψ by eiλ ψ  and  Aα by Aα − ∂λ/∂xα ,
where λ denotes an arbitrary function of the place in four dimensional spacetime.
A factor e/cℏ is incorporated into Aα (−e is the charge of the electron, c is the
velocity of light, and ~ is Planck’s constant divided by 2π). The relation of
“gauge invariance” to conservation of charge remains unchanged as well. But
there is one essential difference, crucial for agreement with empirical data in
that the exponent of the factor multiplying ψ is not real but purely imaginary.
ψ now takes on the role played by the ds of Einstein in that old gauge theory.
It appears to me that this new principle of gauge invariance, which is derived
from empirical data not speculation, indicates cogently that the electric field is
a necessary consequence of the electron field ψ and not of the gravitational field.
Since gauge invariance involves an arbitrary function λ it has the character of
“general” relativity and can naturally be understood only in that framework.
There are several reasons why I cannot believe in parallelism at a distance.
First of all, it a priori goes against my mathematical feeling to accept such an
artificial geometry; I find it difficult to understand the force by which the local
tetrads at different spacetime points might have been rigidly frozen in their
distorted positions. There are also two important physical reasons, it seems to
me. It is exactly the removal of the relation between the local tetrads which
changes the arbitrary gauge factor eiλ in ψ from a constant to an arbitrary
function; only this freedom explains the gauge invariance which in fact exists in
nature. Secondly, the possibility of rotating the local tetrads independently of
each other is equivalent with the symmetry of the energy–momentum tensor or
with conservation of energy–momentum, as we shall see.
In any attempt to establish field equations one must keep in mind that these
cannot be compared with experiment directly; only after quantization do they
provide a basis for statistical predictions concerning the behaviour of matter
particles and light quanta. The Dirac–Maxwell theory in its present form in-
volves only the electromagnetic potentials Aα and the electron field ψ. No
doubt, the proton field ψ ′ will have to be added. ψ, ψ ′ and Aα will enter as
functions of the same spacetime coordinates into the field equations and one
should not require that ψ be a function of a spacetime point (t, xyz) and ψ ′ a
function of an independent spacetime point (t′ , x′ y ′ z ′ ). It is natural to expect
that one of the two pairs of components of Dirac’s field represents the electron,
the other the proton. Furthermore, there will have to be two charge conserva-
tion laws, which (after quantization) will imply that the number of electrons
as well as the number of protons remains constant. To these will correspond a
gauge invariance involving two arbitrary functions.
We first examine the situation in special relativity to see if and to what extent
the increase in the number of components of ψ from two to four is necessary
because of the formal requirements of group theory, quite independently
of dynamical differential equations linked to experiment. We shall see that two
components suffice if the symmetry of left and right is given up.

The two–component theory


§1. The transformation law of ψ. Homogeneous coordinates j0 , · · · , j3 in
a 3 space with Cartesian coordinates x, y, z are defined by the equations
x = j1 /j0 , y = j2 /j0 , z = j3 /j0 .
The equation of the unit sphere x2 + y 2 + z 2 = 1 reads
(1) −j02 + j12 + j22 + j32 = 0.
If it is projected from the south pole onto the equator plane z = 0, equipped
with the complex variable
x + iy = ζ = ψ1 /ψ2

then one has the equations


(2)  j0 = ψ̄1 ψ1 + ψ̄2 ψ2 ,  j1 = ψ̄1 ψ2 + ψ̄2 ψ1 ,
     j2 = i(−ψ̄1 ψ2 + ψ̄2 ψ1 ),  j3 = ψ̄1 ψ1 − ψ̄2 ψ2 .
The jα are Hermitian forms of ψ1 , ψ2 . Only the ratios of the variables ψ1 , ψ2 and
of the coordinates jα are relevant. A linear transformation of the variables
ψ1 , ψ2 (with complex coefficients) produces a real linear transformation of the
jα preserving the unit sphere (1) and its orientation. It is easy to show, and
well-known, that one obtains any such linear transformation of the jα once and
only once in this way.
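One can confirm by direct computation that (2) always lands on the cone (1). A quick numeric sketch (Python; my own check, not part of Weyl's text):

```python
import random

# j0, ..., j3 built from the Hermitian forms (2); the quadratic form (1) vanishes
random.seed(1)
for _ in range(5):
    p1 = complex(random.uniform(-1, 1), random.uniform(-1, 1))
    p2 = complex(random.uniform(-1, 1), random.uniform(-1, 1))
    j0 = (p1.conjugate() * p1 + p2.conjugate() * p2).real
    j1 = (p1.conjugate() * p2 + p2.conjugate() * p1).real
    j2 = (1j * (-p1.conjugate() * p2 + p2.conjugate() * p1)).real
    j3 = (p1.conjugate() * p1 - p2.conjugate() * p2).real
    assert abs(-j0 ** 2 + j1 ** 2 + j2 ** 2 + j3 ** 2) < 1e-12
```

Algebraically this is the identity (|ψ1|² + |ψ2|²)² = (|ψ1|² − |ψ2|²)² + 4|ψ̄1ψ2|².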
Instead of considering the jα as homogenous coordinates in 3 dimensional space,
we now interpret them as coordinates in 4 dimensional spacetime and (1) as the
equation of the light cone; we restrict the complex linear transformations L of
ψ1 , ψ2 to those with determinant of absolute value 1. L produces a Lorentz
transformation Λ = Λ(L) on the jα , i.e. a real linear transformation preserving
the quadratic form
−j02 + j12 + j22 + j32 .
The formulas for the jα and the remarks on preservation of orientation imply
that the Lorentz transformations thus obtained (1) do not interchange past and
future and (2) have determinant +1, not −1. These transformations form a con-
nected, closed continuum and are all so obtained, without exception. However,
the linear transformation L of the ψ is not uniquely determined by Λ, but only
up to an arbitrary factor eiλ of absolute value 1. L can be normalized by the
requirement that its determinant be equal to 1, but remains double valued
even then. The condition (1) is to be retained; it is one of the most promising
aspects of the ψ–theory that it can account for the essential difference between
past and future. The condition (2) removes the symmetry between left and right.
It is only this symmetry, which actually exists in nature, that will force us (Part
II2 ) to introduce a second pair of ψ–components.
Let σα , α = 0, · · · , 3, denote the coefficient matrix of the Hermitian form of the
variables ψ1 , ψ2 which represents jα in (2):
2 Never written, as far as I know. [WR]

(3) jα = ψ ∗ σα ψ.
ψ is taken as a 2–column, ψ ∗ is its conjugate transpose. σ0 is the identity
matrix; one has the equations
(4) σ12 = 1, σ2 σ3 = iσ1
and those obtained from these by cyclic permutation of the indices 1, 2, 3.
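With σ1, σ2, σ3 read off from (2) these are the familiar Pauli matrices, and the relations (4) can be checked directly (a sketch in Python; the variable names are mine):

```python
# coefficient matrices of the Hermitian forms (2): the Pauli matrices
s1 = [[0, 1], [1, 0]]
s2 = [[0, -1j], [1j, 0]]
s3 = [[1, 0], [0, -1]]

def mul2(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# the relations (4): sigma_1^2 = 1 and sigma_2 sigma_3 = i sigma_1
assert mul2(s1, s1) == [[1, 0], [0, 1]]
assert mul2(s2, s3) == [[1j * s1[i][j] for j in range(2)] for i in range(2)]
```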
It is formally more convenient to replace the real variable j0 by the imaginary
variable ij0 . The Lorentz transformations Λ then appear as orthogonal trans-
formations of the four variables
j(0) = ij0 , j(α) = jα for α = 1, 2, 3.
Instead of (3) write
(5) j(α) = ψ ∗ σ(α)ψ.
so that σ(0) = iσ0 , σ(α) = σα for α = 1, 2, 3. The transformation law of the
components of the ψ field relative to a Lorentz frame in spacetime is char-
acterized by the requirement that quantities j(α) in (5) undergo the Lorentz
transformation Λ if the Lorentz frame does. A quantity of this type
represents the field of a matter particle, as follows from the spin phenomenon.
The j(α) are the components of a vector relative to the Lorentz frame e(α);
e(1), e(2), e(3) are real space-like vectors forming a left-handed Cartesian coordi-
nate system, e(0)/i is a real, time-like vector directed toward the future. The
transformation Λ describes the transition from one such Lorentz frame to another,
and will be referred to as a rotation of the Lorentz frame. We get the same co-
efficients Λ(αβ) whether we make Λ act on the basis vectors e(α) of the tetrad
or on the components j(α):
j = Σα j(α)e(α) = Σα j ′ (α)e′ (α),
if
e′ (α) = Σβ Λ(αβ)e(β),  j ′ (α) = Σβ Λ(αβ)j(β);
this follows from the orthogonality of Λ.
For what follows it is necessary to compute the infinitesimal transformation
(6) dψ = dL.ψ
which corresponds to an infinitesimal rotation dj = dΛ.j under j = j(ψ). The
transformation (6) is assumed to be normalized so that the trace of dL is 0. The
matrix dL depends linearly on thePdΛ; so we write
dL = 21 αβ dΛ(αβ)σ(αβ) = αβ dΛ(αβ)σ(αβ)
P
for certain complex 2 × 2 matrices σ(αβ) of trace 0 depending skew-symetrically
on (αβ) defined by this equation. The last sum runs only over the index pairs
(αβ) = (01), (02), (03); (23), (31), (12).
One must not forget that the skew-symmetric coefficients dΛ(αβ) are purely
imaginary for the first three pairs, real for the last three, but arbitrary otherwise.
One finds
(7) σ(23) = −(1/2i) σ(1),  σ(01) = (1/2i) σ(1)
and two analogous pairs of equations resulting from cyclic permutation of the
indices 1, 2, 3. To verify this assertion one only has to check that the two in-
finitesimal transformations dψ = dLψ given by
dψ = (1/2i) σ(1)ψ  and  dψ = (1/2) σ(1)ψ,
correspond to the infinitesimal rotations dj = dΛj given by

dj(0) = 0, dj(1) = 0, dj(2) = −j(3), dj(3) = j(2),


resp.
dj(0) = ij(1), dj(1) = −ij(0), dj(2) = 0, dj(3) = 0.
§2. Metric and parallel transport. We turn to general relativity. We char-
acterize the metric at a point x in spacetime by a local Lorentz frame or tetrad
e(α). Only the class of tetrads related by rotations Λ is determined by the met-
ric; one tetrad is chosen arbitrarily from this class. Physical laws are invariant
under arbitrary rotations of the tetrads and these can be chosen independently
of each other at different points. Let ψ1 (x), ψ2 (x) be the components of the
matter field at the point x relative to the local tetrad e(α) chosen there. A
vector v at x can be written in the form
v = Σα v(α)e(α);
For the analytic representation of spacetime we shall a coordinate system xµ , µ =
0, · · · , 3; the xµ are four arbitrary, differentiable functions on spacetime distin-
guishing the spacetime points. The physical laws are therefore invariant under
arbitrary coordinate transformations. Let eµ (α) denote the coordinates of the
vector e(α) relative to the coordinate system xµ . These 4 × 4 quantities eµ (α)
characterize the gravitational field. The contravariant components v µ of a vec-
tor v with respect to the coordinate system are related to its components v(α)
with respect to the tetrad by the equations
v^µ = ∑_α v(α)e^µ(α).
On the other hand, the v(α) can be calculated from the covariant components v_µ by means of
v(α) = ∑_µ v_µ e^µ(α).
These equations govern the change of indices. The indices α referring to the
tetrad I have written as arguments because, for these, upper and lower positions
are indistinguishable. The change of indices in the reverse sense is accomplished
by the matrix (e_µ(α)) inverse to (e^µ(α)):
∑_α e_µ(α)e^ν(α) = δ^ν_µ and ∑_µ e_µ(α)e^µ(β) = δ(α, β).
The symbol δ is 1 or 0 depending on whether its indices are the same or not. The
convention of omitting the summation sign applies from now on to all repeated
indices. Let e denote the absolute value of the determinant det(e^µ(α)). Division of a quantity by e will be indicated by a change to bold type, e.g.
𝐞^µ(α) = e^µ(α)/e.
A vector or a tensor can be represented by its components relative to the coordi-
nate system as well as by its components relative to the tetrad; but the quantity
ψ can only be represented by its components relative to the tetrad. For its trans-
formation law is governed by a representation of the Lorentz group which does
not extend to the group of all linear transformations. In order to accommodate
the matter field ψ it is therefore necessary to represent the gravitational field in
the way described³ rather than in the form
∑_{µν} g_{µν} dx^µ dx^ν.
³ In formal agreement with Einstein's recent papers on gravitation and electromagnetism, Sitzungsber. Preuß. Ak. Wissensch. 1928, p. 217, 224; 1929, p. 2. Einstein uses the letter h instead of e.
208 CHAPTER 4. SPECIAL TOPICS
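The inverse-matrix relations and the change of indices can be illustrated numerically with an arbitrary invertible 4 × 4 array standing in for e^µ(α); in this sketch (numpy, no particular metric assumed) e_µ(α) is realized as the inverse transpose.

```python
import numpy as np

rng = np.random.default_rng(1)
# e^mu(alpha): rows mu, columns alpha; any invertible array will do
Eu = rng.normal(size=(4, 4))
# e_mu(alpha): the inverse matrix in the sense of the text
Ed = np.linalg.inv(Eu).T

# sum over alpha: e_mu(alpha) e^nu(alpha) = delta_mu^nu
assert np.allclose(Ed @ Eu.T, np.eye(4))
# sum over mu: e_mu(alpha) e^mu(beta) = delta(alpha, beta)
assert np.allclose(Ed.T @ Eu, np.eye(4))

# change of indices: v^mu = e^mu(alpha) v(alpha), and back again
va = rng.normal(size=4)              # tetrad components v(alpha)
vmu = Eu @ va                        # coordinate components v^mu
assert np.allclose(Ed.T @ vmu, va)   # v(alpha) recovered via e_mu(alpha) v^mu
```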
The theory of gravitation must now be recast in this new analytical form. I
start with the formulas for the infinitesimal parallel transport determined by
the metric. Let the vector e(α) at the point x go to the vector e′(α) at the infinitely close point x′ by parallel transport. The e′(α) form a tetrad at x′
arising from the tetrad e(α) at x by an infinitesimal rotation ω:
(8) ∇e(β) = ∑_γ ω(βγ).e(γ), [∇e(β) = e′(β) − e(β; x′)].
∇e(β) depends linearly on the vector v from x to x′; if its components dx^µ equal v^µ = e^µ(α)v(α), then ω(βγ) = ω_µ(βγ)dx^µ equals
(9) ω_µ(βγ)v^µ = ω(α; βγ)v(α).
The infinitesimal parallel transport of a vector w along v is given by the well-
known equations
∇w = −Γ(v).w, i.e. ∇w^µ = −Γ^µ_ρ(v)w^ρ, Γ^µ_ρ(v) = Γ^µ_{ρν}v^ν;
the quantities Γ^µ_{ρν} are symmetric in ρ and ν and independent of w and v. We therefore
have
e′(β) − e(β) = −Γ(v).e(β)
in addition to (8). Subtracting the two differences on the left-hand sides gives
the differential de(β) = e(β, x′) − e(β, x):
de^µ(β) + Γ^µ_ρ(v)e^ρ(β) = −ω(βγ).e^µ(γ),
or
∂e^µ(β)/∂x^ν e^ν(α) + Γ^µ_{ρν} e^ρ(β)e^ν(α) = −ω(α; βγ).e^µ(γ).
Taking into account that the ω(α; βγ) are skew-symmetric in β and γ, one can eliminate the ω(α; βγ) and find the well-known equations for the Γ^µ_{ρν}. Taking into account that the Γ^µ_{ρν}, and hence the Γ^µ(β, α) = Γ^µ_{ρν}e^ρ(β)e^ν(α), are symmetric (in ρ, ν resp. β, α), one can eliminate the Γ^µ_{ρν} and finds
(10) ∂e^µ(α)/∂x^ν e^ν(β) − ∂e^µ(β)/∂x^ν e^ν(α) = (ω(α; βγ) − ω(β; αγ)) e^µ(γ).
The left-hand side is a component of the Lie bracket of the two vector fields
e(α), e(β), which plays a fundamental role in Lie’s theory of infinitesimal trans-
formations, denoted [e(α), e(β)]. Since ω(β; αγ) is skew-symmetric in α and γ
one has
[e(α), e(β)]^µ = (ω(α; βγ) + ω(β; γα)) e^µ(γ),
or
(11) ω(α; βγ) + ω(β; γα) = [e(α), e(β)](γ).
If one takes the three cyclic permutations of αβγ in these equations and adds the resulting equations with the signs + − +, then one obtains
2ω(α; βγ) = [e(α), e(β)](γ) − [e(β), e(γ)](α) + [e(γ), e(α)](β).
ω(α; βγ) is therefore indeed uniquely determined. The expression found for it satisfies all requirements, being skew-symmetric in β and γ, as is easily seen.
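The uniqueness argument is pure index algebra and is easy to verify directly: for arbitrary quantities ω(α; βγ), skew-symmetric in the last two indices, forming the bracket values on the right of (11) and recombining the three cyclic permutations with the signs + − + returns 2ω(α; βγ). A numpy sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
w = rng.normal(size=(4, 4, 4))
w = w - w.transpose(0, 2, 1)   # omega(alpha; beta gamma), skew in beta, gamma

# bracket values B[alpha, beta, gamma] = omega(alpha; beta gamma) + omega(beta; gamma alpha),
# i.e. the right-hand side pattern of (11)
B = w + w.transpose(1, 2, 0)

# the cyclic combination with signs + - + recovers omega
lhs = 2 * w
rhs = B - B.transpose(1, 2, 0) + B.transpose(2, 0, 1)
assert np.allclose(lhs, rhs)
```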
In what follows we need in particular the contraction (sum over ρ)
ω(ρ; ρα) = [e(α), e(ρ)](ρ) = ∂e^µ(α)/∂x^ν e^ν(ρ)e_µ(ρ) − ∂e^µ(ρ)/∂x^ν e^ν(α)e_µ(ρ).
Since e = |det e^µ(α)| satisfies
−e d(1/e) = de/e = e_ν(ρ)de^ν(ρ),
one finds
(12) ω(ρ; ρα) = e ∂𝐞^µ(α)/∂x^µ,
where 𝐞(α) = e(α)/e.
§3. The matter action. With the help of the parallel transport one can
compute not only the covariant derivative of vector and tensor fields, but also
that of the ψ field. Let ψ_a(x) and ψ_a(x′) [a = 1, 2] denote the components relative to a local tetrad e(α) at the point x and at an infinitely close point x′. The difference ψ_a(x′) − ψ_a(x) = dψ_a is the usual differential. On the other hand, we parallel transport the tetrad e(α) at x to a tetrad e′(α) at x′. Let
ψ′_a denote the components of ψ at x′ relative to the tetrad e′(α). Both ψ_a and ψ′_a depend only on the choice of the tetrad e(α) at x; they have nothing to do with the tetrad at x′. Under a rotation of the tetrad at x the ψ′_a transform like
the ψ_a, and the same holds for the differences ∇ψ_a = ψ′_a − ψ_a. These are the components of the covariant differential ∇ψ of ψ. The tetrad e′(α) arises from the local tetrad e(α) = e(α, x′) at x′ by the infinitesimal rotation ω of §2. The
corresponding infinitesimal transformation θ is therefore of the form
θ = ½ ω(βγ)σ(βγ)
and transforms ψ_a(x′) into ψ′_a, i.e. ψ′ − ψ(x′) = θ.ψ. Adding dψ = ψ(x′) − ψ(x)
to this one obtains
(13) ∇ψ = dψ + θ.ψ.
Everything depends linearly on the vector v from x to x′; if its components dx^µ equal v^µ = e^µ(α)v(α), then ∇ψ = ∇ψ_µ dx^µ equals
∇ψ_µ v^µ = ∇ψ(α)v(α), θ = θ_µ v^µ = θ(α)v(α).
We find
∇ψ_µ = ( ∂/∂x^µ + θ_µ )ψ or ∇ψ(α) = ( e^µ(α) ∂/∂x^µ + θ(α) )ψ
where
θ(α) = ½ ω(α; βγ)σ(βγ).
Generally, if ψ′ is a field of the same type as ψ then the quantities
ψ*σ(α)ψ′
are the components of a vector relative to the local tetrad. Hence
v′(α) = ψ*σ(α)∇ψ(β)v(β)
defines a linear transformation v ↦ v′ of the vector space at x which is independent of the tetrad. Its trace
ψ*σ(α)∇ψ(α)
is therefore a scalar and the equation
(14) i𝐦 = ψ*σ(α)∇ψ(α)
defines a scalar density 𝐦 whose integral
∫ 𝐦 dx [dx = dx⁰dx¹dx²dx³]
can be used as the action of the matter field ψ in the gravitational field repre-
sented by the metric defining the parallel transport. To find an explicit expres-
sion for m we need to compute
(15) σ(α)θ(α) = ½ σ(α)σ(βγ).ω(α; βγ).
From (7) and (4) it follows that for α ≠ β
σ(β)σ(βα) = ½ σ(α) [no sum over β]
and for any odd permutation αβγδ of the indices 0 1 2 3,
σ(β)σ(γδ) = ½ σ(α).
These two kinds of terms give to the sum (15) the contributions
½ ω(ρ; ρα) = (e/2) ∂𝐞^µ(α)/∂x^µ
resp.
(i/2) ϕ(α) := ω(β; γδ) + ω(γ; δβ) + ω(δ; βγ),
if αβγδ is an odd permutation of the indices 0 1 2 3. Then, according to (11),
(16) i ϕ(α) = [e(β), e(γ)](δ) + (cycl. perm. of βγδ) = ± ∑ ∂e^µ(β)/∂x^ν e^ν(γ)e_µ(δ).
The sum runs over the six permutations of βγδ with the appropriate signs (and of course also over µ and ν). With this notation,
(17) 𝐦 = (1/i) [ ψ* 𝐞^µ(α)σ(α) ∂ψ/∂x^µ + ½ ∂𝐞^µ(α)/∂x^µ ψ*σ(α)ψ ] + ¼ ϕ(α)j(α).
The second part is
(1/4i) det[ e_µ(α), e^ν(α), ∂e^µ(α)/∂x^ν, j(α) ]
(sum over µ and ν). Each term of this sum is a 4 × 4 determinant whose rows
are obtained from the one written by setting α = 0, 1, 2, 3. The quantity j(α) is
(18) j(α) = ψ ∗ σ(α)ψ.
Generally, it is not an action integral
(19) ∫ 𝐡 dx
itself which is of significance for the laws of nature, but only its variation δ∫𝐡 dx. Hence it is not necessary that 𝐡 itself be real, but it is sufficient that 𝐡̄ − 𝐡 be a divergence. In that case we say that 𝐡 is practically real. We have to check this for 𝐦. 𝐞^µ(α) is real for α = 1, 2, 3 and purely imaginary for α = 0. So 𝐞^µ(α)σ(α) is a Hermitian matrix. ϕ(α) is also real for α = 1, 2, 3 and purely imaginary for α = 0. Thus
𝐦̄ = −(1/i) [ ∂ψ*/∂x^µ σ^µ ψ + ½ ∂𝐞^µ(α)/∂x^µ ψ*σ(α)ψ ] + ¼ ϕ(α)j(α),
i(𝐦 − 𝐦̄) = ψ*σ^µ ∂ψ/∂x^µ + ∂ψ*/∂x^µ σ^µ ψ + ∂𝐞^µ(α)/∂x^µ ψ*σ(α)ψ
= ∂(ψ*σ^µψ)/∂x^µ = ∂𝐣^µ/∂x^µ
(σ^µ = 𝐞^µ(α)σ(α)). Thus 𝐦 is indeed practically real. We return to special relativity if we set
e0 (0) = −i, e1 (1) = e2 (2) = e3 (3) = 1,
and all other eµ (α) = 0.
§4. Energy. Let (19) be the action of matter in an extended sense, represented
by the ψ field and by the electromagnetic potentials Aµ . The laws of nature say
that the variation
δ∫𝐡 dx = 0
when the ψ and Aµ undergo arbitrary infinitesimal variations which vanish
outside of a finite region in spacetime. The variation of the ψ gives the equations
of matter in the restricted sense, the variation of the Aµ the electromagnetic
equations. If the eµ (α), which were kept fixed up to now, undergo an analogous
infinitesimal variation δe^µ(α), then there will be an equation of the form
(20) δ∫𝐡 dx = ∫ 𝐓_µ(α) δe^µ(α) dx,
the induced variations δψ_a and δA_µ being absent as a consequence of the preceding laws. The tensor density 𝐓_µ(α) defined in this way is the energy–momentum. Because of the invariance of the action density 𝐡, the variation (20) must vanish when the variation δe^µ(α) is produced
(1) by infinitesimal rotations of the local tetrads e(α), the coordinates x^µ being kept fixed.
(2) by an infinitesimal transformation of the coordinates x^µ, the tetrads e(α) being kept fixed.
The first process is represented by the equations
δeµ (α) = δΛ(αβ).eµ (β)
where δΛ is a skew symmetric (infinitesimal) rotation depending arbitrarily on
the point x. The vanishing of (20) says that
𝐓(β, α) = 𝐓_µ(α)e^µ(β)
is symmetric in αβ. The symmetry of the energy-momentum tensor is therefore
equivalent with the first invariance property. This symmetry law is however not
satisfied identically, but as a consequence of the law of matter and electromag-
netism. For the components of a given ψ-field will change as a result of the
rotation of the tetrads!
The computation of the variation δeµ (α) produced by the second process is
somewhat more cumbersome. But the considerations are familiar from the the-
ory of relativity in its previous analytical formulation⁴. Consider an infinitesimal
transformation of spacetime δx = ξ(x). It induces an infinitesimal transforma-
tion of the vector field e(α), denoted δe(α). The transformed vector fields form a tetrad for the isometrically transformed metric and if the fields ψ and A are
taken along with unchanged components relative to the transformed tetrad,
then the action integral remains unchanged as well, due to its invariant defini-
tion, which depends only on the metric. The infinitesimal transformation δe of
any vector field e induced by an infinitesimal point transformation δx = ξ is
the difference between the vector e(x) at x and the image of e(x − δx) under
δx = ξ. It is given by the Lie bracket
δe^µ = [ξ, e]^µ = ∂ξ^µ/∂x^ν e^ν − ∂e^µ/∂x^ν ξ^ν.
Thus (20) will vanish if one substitutes
δe^µ(α) = [ξ, e(α)]^µ = ∂ξ^µ/∂x^ν e^ν(α) − ∂e^µ(α)/∂x^ν ξ^ν.
After an integration by parts one obtains
∫ { ∂𝐓^ν_µ/∂x^ν + 𝐓_ν(α) ∂e^ν(α)/∂x^µ } ξ^µ dx = 0.
Since the ξ^µ are arbitrary functions vanishing outside of a finite region of spacetime, this gives the quasi-conservation law of energy–momentum in the form
(21) ∂𝐓^ν_µ/∂x^ν + ∂e^ν(α)/∂x^µ 𝐓_ν(α) = 0.
Because of the second term it is a true conservation law only in special relativity.
In general relativity it becomes one if the energy-momentum of the gravitational
field is added⁵.
In special relativity one obtains the components P_µ of the 4–momentum as the integral
P_µ = ∫_{x⁰=t} T⁰_µ dx [dx = dx¹dx²dx³]
over a 3–space section
(22) x⁰ = t = const.
⁴ At this point Weyl refers to his book Raum, Zeit, Materie, 5th ed., p. 233ff (quoted as RZM), Berlin 1923. According to a reference there, the argument is due to F. Klein. A sentence explaining the argument has been added and an infinitesimal calculation, amounting to a derivation of the formula for the Lie bracket, has been omitted. [WR]
⁵ Cf. RZM §41. [WR]
The integrals are independent of t⁶. Using the symmetry of T one finds further the divergence equations
∂/∂x^ν ( x₂T^ν₃ − x₃T^ν₂ ) = 0, ···,
∂/∂x^ν ( x₀T^ν₁ − x₁T^ν₀ ) = 0, ···.
The three equations of the first type show that the angular momentum (J₁, J₂, J₃) is constant in time:
J₁ = ∫_{x⁰=t} ( x₂T⁰₃ − x₃T⁰₂ ) dx, ···.
The equations of the second type contain the law of inertia of energy⁷.
We compute the energy–momentum density for the matter action 𝐦 defined above; we treat separately the two parts of 𝐦 appearing in (17). For the first part we obtain after an integration by parts
δ∫𝐦 dx = ∫ 𝐮_µ(α) δe^µ(α) dx,
where
i𝐮_µ(α) = ψ*σ(α) ∂ψ/∂x^µ − ½ ∂(ψ*σ(α)ψ)/∂x^µ = ½ ( ψ*σ(α) ∂ψ/∂x^µ − ∂ψ*/∂x^µ σ(α)ψ ).
The part of the energy–momentum arising from the first part of 𝐦 is therefore
𝐓_µ(α) = 𝐮_µ(α) − e_µ(α)𝐮, 𝐓^ν_µ = 𝐮^ν_µ − δ^ν_µ 𝐮,
where 𝐮 denotes the contraction e^µ(α)𝐮_µ(α). These formulas also hold in general relativity, for non-constant e^µ(α). To treat the second part of 𝐦 we restrict ourselves to special relativity for the sake of simplicity. For this second part in (17)
one has
δ∫𝐦 dx = (1/4i) ∫ det[ e_µ(α), e^ν(α), ∂(δe^µ(α))/∂x^ν, j(α) ] dx
= (1/4i) ∫ det[ δe^µ(α), e^ν(α), ∂j(α)/∂x^ν, j(α) ] dx.
The part of the energy–momentum arising from the second part of 𝐦 is therefore
𝐓_µ(0) = −(1/4i) det[ e_µ(i), e^ν(i), ∂j(i)/∂x^ν, j(i) ], i = 1, 2, 3.
T⁰_µ arises from this by multiplication by −i; hence T⁰₀ = 0 and
(23) T⁰₁ = ¼ ( ∂j(3)/∂x² − ∂j(2)/∂x³ ), ···.
We combine both parts to determine the total energy, momentum, and angular momentum. The equation
T⁰₀ = −(1/2i) ∑_{i=1}^{3} ( ψ*σ^i ∂ψ/∂x^i − ∂ψ*/∂x^i σ^i ψ )
gives, after an integration by parts on the subtracted term,
P₀ = ∫ T⁰₀ dx = −(1/i) ∫ ψ* ∑_{i=1}^{3} σ^i ∂ψ/∂x^i · dx.
This leads to the operator
−P₀ := (1/i) ∑_{i=1}^{3} σ^i ∂/∂x^i
as quantum-mechanical representative for the energy −P₀ of a free particle.
Further,
P₁ = ∫ T⁰₁ dx = (1/2i) ∫ ( ψ* ∂ψ/∂x¹ − ∂ψ*/∂x¹ ψ ) · dx = (1/i) ∫ ψ* ∂ψ/∂x¹ · dx.
⁶ Cf. RZM p.206. [WR]
⁷ Cf. RZM p.201.
The expression (23) does not contribute to the integral. The momentum (P1 , P2 , P3 )
will therefore be represented by the operators
P₁, P₂, P₃ := (1/i) ∂/∂x¹, (1/i) ∂/∂x², (1/i) ∂/∂x³,
as it must be, according to Schrödinger. From the complete expression for x₂T⁰₃ − x₃T⁰₂ one obtains, by a suitable integration by parts, the angular momentum
J₁ = ∫ [ (1/i) ψ* ( x₂ ∂ψ/∂x³ − x₃ ∂ψ/∂x² ) + ½ j(1) ] · dx, [j(1) = ψ*σ(1)ψ].
The angular momentum (J₁, J₂, J₃) will therefore be represented by the operators
J₁ := (1/i) ( x₂ ∂/∂x³ − x₃ ∂/∂x² ) + ½ σ(1), ···
in agreement with well-known formulas.
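The spin parts ½σ(1), ½σ(2), ½σ(3) of these operators can be checked on their own to satisfy the angular momentum commutation relations [J₁, J₂] = iJ₃ (and cyclic permutations); a minimal numpy sketch:

```python
import numpy as np

s1 = np.array([[0, 1], [1, 0]], dtype=complex)
s2 = np.array([[0, -1j], [1j, 0]], dtype=complex)
s3 = np.array([[1, 0], [0, -1]], dtype=complex)
S = [s1 / 2, s2 / 2, s3 / 2]   # spin parts sigma(i)/2 of the operators J_i

def comm(A, B):
    return A @ B - B @ A

# angular momentum commutation relations [J_a, J_b] = i J_c, cyclically
for a, b, c in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
    assert np.allclose(comm(S[a], S[b]), 1j * S[c])
```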
Spin –having been put into the theory from the beginning– must naturally
reemerge here; but the manner in which this happens is rather surprising and
instructive. From this point of view the fundamental assumptions of quantum
theory seem less basic than one may have thought, since they are a consequence
of the particular action density m. On the other hand, these interrelations
confirm the irreplaceable role of m for the matter part of the action. Only
general relativity, which –because of the free mobility of the eµ (α)– leads to
a definition of the energy–momentum free of any arbitrariness, allows one to
complete the circle of quantum theory in the manner described.
§5. Gravitation. We return to the transcription of Einstein’s classical theory
of gravitation and determine first of all the Riemann curvature tensor⁸. Consider a small parallelogram x(s, t), 0 ≤ s ≤ s*, 0 ≤ t ≤ t*, and let d_s, d_t be the differentials with respect to the parameters⁹. The tetrad e(α) at the vertex x = x(0, 0) is parallel-transported to the opposite vertex x* = x(s*, t*) along the edges, once via x(s*, 0) and once via x(0, t*). The two tetrads at x* obtained in this way are related by an infinitesimal rotation Ω(d_s x, d_t x) depending on the line elements d_s x, d_t x at x in the form
Ω(d_s x, d_t x) = Ω_{µν} d_s x^µ d_t x^ν = ½ Ω_{µν} (d_s x ∧ d_t x)^{µν},
where Ω_{µν} is skew-symmetric in µ and ν and (d_s x ∧ d_t x)^{µν} = d_s x^µ d_t x^ν − d_t x^µ d_s x^ν are the components of the surface element spanned by d_s x and d_t x.
As infinitesimal rotation, Ωµν is itself a skew-symmetric matrix (Ωµν (αβ)); this
is the Riemann curvature tensor.
The tetrad at x* arising from the tetrad e(α) at x by parallel transport via x(0, t*) is
(1 + d_s ω)(1 + d_t ω) e(α)
in a notation easily understood. The difference of this expression and the one arising from it by interchange of d_s and d_t is
{d_s(d_t ω) − d_t(d_s ω)} + {d_s ω . d_t ω − d_t ω . d_s ω}.
One has
ω = ω_µ dx^µ, d_t(d_s ω) = ∂ω_µ/∂x^ν d_t x^ν d_s x^µ + ω_µ d_t d_s x^µ,
and d_t d_s x^µ = d_s d_t x^µ. Thus
⁸ Cf. RZM, p.119f.
⁹ Weyl's infinitesimal argument has been rewritten in terms of parameters.
Ω_{µν} = ∂ω_ν/∂x^µ − ∂ω_µ/∂x^ν + (ω_µ ω_ν − ω_ν ω_µ).
The scalar curvature is
ρ = e^µ(α)e^ν(β) Ω_{µν}(αβ).
The first, differentiated term in Ω_{µν} gives the contribution
( e^ν(α)e^µ(β) − e^ν(β)e^µ(α) ) ∂ω_µ(αβ)/∂x^ν.
Its contribution to 𝛒 = ρ/e consists of the two terms
−2ω(β; αβ) ∂𝐞^ν(α)/∂x^ν and ω_µ(αβ) ( ∂e^µ(α)/∂x^ν 𝐞^ν(β) − ∂e^µ(β)/∂x^ν 𝐞^ν(α) ),
after omission of a complete divergence. According to (12) and (10) these terms are
−2ω(β; ρβ)ω(α; αρ) and 2ω(α; βγ)ω(γ; αβ).
The result is the following expression for the action density 𝐠 of gravitation¹⁰
(24) 𝐠 = ω(α; βγ)ω(γ; αβ) + ω(α; αγ)ω(β; βγ).
The integral ∫𝐠 dx is not truly invariant, but it is practically invariant: 𝐠 differs from the true scalar density 𝛒 by a divergence. Variation of the e^µ(α) in the total action
∫ ( 𝐠 + κ𝐡 ) dx
gives the gravitational equations. (κ is a numerical constant.) The gravitational energy–momentum tensor 𝐕^ν_µ is obtained from 𝐠 if one applies an infinitesimal translation δx^µ = ξ^µ with constant coefficients ξ^µ in the coordinate space. The induced variation of the tetrad is
δe^µ(α) = − ∂e^µ(α)/∂x^ν ξ^ν.
𝐠 is a function of e^µ(α) and its derivatives e^µ_ν(α) = ∂e^µ(α)/∂x^ν; let δ𝐠 be the total differential of 𝐠 with respect to these variables:
δ𝐠 = 𝐠_µ(α)δe^µ(α) + 𝐠^ν_µ(α)δe^µ_ν(α).
The variation of the action caused by the infinitesimal translation δx^µ = ξ^µ in the coordinate space must vanish:
(25) ∫ δ𝐠 dx + ∫ ∂𝐠/∂x^µ ξ^µ dx = 0.
The integral is taken over an arbitrary piece of spacetime. One has
∫ δ𝐠 dx = ∫ ( 𝐠_µ(α) − ∂𝐠^ν_µ(α)/∂x^ν ) δe^µ(α) dx + ∫ ∂( 𝐠^ν_µ(α)δe^µ(α) )/∂x^ν dx.
The expression in parenthesis in the first term equals −κ𝐓_µ(α), in virtue of the gravitational equations, and the integral itself equals
−κ ∫ 𝐓_ν(α) ∂e^ν(α)/∂x^µ ξ^µ dx.
Introduce the quantity
𝐕^ν_µ = δ^ν_µ 𝐠 − ∂e^ρ(α)/∂x^µ 𝐠^ν_ρ(α).
The equation (25) says that
∫ ( ∂𝐕^ν_µ/∂x^ν − κ𝐓_ν(α) ∂e^ν(α)/∂x^µ ) ξ^µ dx = 0,
the integral being taken over any piece of spacetime. The integrand must therefore vanish. Since the ξ^µ are arbitrary constants, the factors in front of them must all be zero. Substituting from the resulting expression into (21) produces the pure divergence equation
∂( 𝐕^ν_µ + κ𝐓^ν_µ )/∂x^ν = 0,
¹⁰ Cf. RZM p.231. [WR]
the conservation law for the total energy–momentum 𝐕^ν_µ/κ + 𝐓^ν_µ, of which 𝐕^ν_µ/κ proves to be the gravitational part. In order to formulate a true differential
proves to be the gravitational part. In order to formulate a true differential
conservation law for the angular momentum in general relativity one has to
specialize the coordinates so that a simultaneous rotation of all local tetrads
appears as an orthogonal transformation of the coordinates. This is certainly
possible, but I shall not discuss it here.
§6. The electromagnetic field. We now come to the critical part of the
theory. In my opinion, the origin and necessity of the electromagnetic field is
this. The components ψ1 , ψ2 are in reality not uniquely specified by the tetrad,
but only in so far as they can still be multiplied by an arbitrary “gauge factor”
eiλ of absolute value 1. Only up to such a factor is the transformation specified
which ψ suffers due to a rotation of the tetrad. In special relativity this gauge
factor has to be considered as a constant, because here we have a single frame,
not tied to a point. Not so in general relativity: every point has its own tetrad
and hence its own arbitrary gauge factor; after the loosening of the ties of the
tetrads to the points the gauge factor becomes necessarily an arbitrary function
of the point. The infinitesimal linear transformation dL of the ψ corresponding to an infinitesimal rotation ω is then not completely specified either, but can be increased by an arbitrary purely imaginary multiple i dA of the identity matrix. In order to specify the covariant differential ∇ψ completely one needs, at each point x, in addition to the metric, one such dA for each vector v = (dx). In order that ∇ψ remain a linear function of (dx), dA = A_µ dx^µ has to be a linear function of the components dx^µ. If ψ is replaced by e^{iλ}ψ, then dA must simultaneously be replaced by dA − dλ, as follows from the formula for the covariant differential. As a consequence, one has to add the term
(26) (1/e) A(α)j(α) = (1/e) A(α)ψ*σ(α)ψ = A(α)𝐣(α)
to the action density m. From now on m shall denote the action density with
this addition, still given by (14), but now with
∇ψ(α) = ( e^µ(α) ∂/∂x^µ + θ(α) + iA(α) )ψ.
It is necessarily gauge invariant, in the sense that this action density remains
unchanged under the replacements
ψ ↦ e^{iλ}ψ, A_µ ↦ A_µ − ∂λ/∂x^µ
for an arbitrary function λ of the point. Exactly in the way described by (26)
does the electromagnetic potential act on matter empirically. We are therefore
justified in identifying the quantities Aµ defined here with the components of
the electromagnetic potential. The proof is complete if we show that the Aµ
field is conversely acted on by matter in the way empirically known for the
electromagnetic field.
F_{µν} = ∂A_ν/∂x^µ − ∂A_µ/∂x^ν
is a gauge invariant skew-symmetric tensor and
(27) 𝐥 = ¼ F_{µν}F^{µν}
is the scalar action density characteristic of Maxwell's theory. The Ansatz
(28) 𝐡 = 𝐦 + a𝐥
(a a numerical constant) gives Maxwell's equations by variation of the A_µ, with
(29) −𝐣^µ = −ψ*σ^µψ
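The gauge invariance of F_{µν} holds for either sign convention in its definition, resting only on the equality of mixed second partials; a sympy sketch (the symbols λ and A_µ are generic placeholder functions):

```python
import sympy as sp

x = sp.symbols('x0:4')                      # coordinates x^0 ... x^3
lam = sp.Function('lam')(*x)                # gauge function lambda
A = [sp.Function(f'A{m}')(*x) for m in range(4)]

def F(A):
    # F_mu_nu = dA_nu/dx^mu - dA_mu/dx^nu (the opposite convention works too)
    return [[sp.diff(A[n], x[m]) - sp.diff(A[m], x[n]) for n in range(4)]
            for m in range(4)]

Ag = [A[m] - sp.diff(lam, x[m]) for m in range(4)]   # A_mu -> A_mu - dlam/dx^mu

before, after = F(A), F(Ag)
assert all(sp.simplify(before[m][n] - after[m][n]) == 0
           for m in range(4) for n in range(4))
```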
as density of the electric four-current.
Gauge invariance is closely related to the conservation law for the electric charge. Because of the gauge invariance of 𝐡, the variation δ∫𝐡 dx must vanish identically when the ψ and A_µ are varied according to
δψ = iλ.ψ, δA_µ = −∂λ/∂x^µ,
while the e^µ(α) are kept fixed; λ is an arbitrary function of the point. This
while the e (α) are kept fixed; λ is an arbitrary function of the point. This
gives an identical relation between matter equations and the electromagnetic
equations. If the matter equations (in the restricted sense) hold, then it follows
that
δ∫𝐡 dx = 0,
provided only the A_µ are varied, in accordance with the equations δA_µ = −∂λ/∂x^µ. On the other hand, the electromagnetic equations imply the same if only the ψ is varied according to δψ = iλ.ψ. For 𝐡 = 𝐦 + a𝐥 one obtains in both cases
∫ δ𝐡 dx = ± ∫ ψ*σ^µψ ∂λ/∂x^µ dx = ∓ ∫ λ ∂𝐣^µ/∂x^µ dx.
We had an analogous situation for the conservation laws for energy–momentum
and for angular momentum. These relate the matter equations in the extended
sense with the gravitational equations and the corresponding invariance under
coordinate transformations, resp. invariance under arbitrary independent rotations of the local tetrads at different spacetime points.
It follows from
(30) ∂𝐣^µ/∂x^µ = 0
that the flux
(31) (ψ, ψ) = ∫ 𝐣⁰ dx = ∫ 𝐞⁰(α)ψ*σ(α)ψ dx
of the vector density j µ through a three dimensional section of spacetime, in
particular through the section x0 = t =const. of (22), is independent of t. Not
only does this integral have an invariant significance, but also the individual
element of integration; however, the sign depends on which direction of
traversal of the three dimensional section is taken as positive. The Hermitian
form
(32) 𝐣⁰ = 𝐞⁰(α)ψ*σ(α)ψ
must be positive definite for 𝐣⁰ dx to be taken as probability density in three space. It is easy to see that this is the case if x⁰ = const. is indeed a spacelike section through x, i.e. when its tangent vectors at x are spacelike. The sections x⁰ = const. must be ordered so that x⁰ increases in the direction of the timelike vector e(0)/i for (32) to be positive. The sign of the flux is specified by these
natural restrictions on the coordinate systems; the invariant quantity (31) shall
be normalized in the usual way by the condition
(33) (ψ, ψ) = ∫ 𝐣⁰ dx = 1.
The coupling constant a which combines 𝐦 and 𝐥 is then a pure number = cℏ/e², the reciprocal fine structure constant.
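That cℏ/e² is a pure number close to 137 is easy to check in Gaussian-cgs units; the constants below are modern approximate values, inserted here only for illustration.

```python
hbar = 1.0546e-27   # erg s  (approximate modern value)
c = 2.9979e10       # cm / s
e = 4.8032e-10      # esu

a = hbar * c / e**2           # the pure number combining m and l
assert abs(a - 137.04) < 0.1  # the reciprocal fine structure constant
```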
We treat ψ₁, ψ₂; A_µ; e^µ(α) as the independent fields for the variation of the action. To the energy–momentum density 𝐓^ν_µ arising from 𝐦 one must add
A_µ𝐣^ν − δ^ν_µ ( A_ρ𝐣^ρ )
because of the additional term (26). In special relativity, this leads one to represent the energy by the operator
H = ∑_{i=1}^{3} σ^i ( (1/i) ∂/∂x^i + A_i ),
because the energy value is
−P₀ = ∫ ψ*Hψ dx = (ψ, Hψ).
The matter equations then read
( (1/i) ∂/∂x⁰ + A₀ )ψ + Hψ = 0,
and not (1/i) ∂ψ/∂x⁰ + Hψ = 0, as had been assumed in quantum mechanics up to now.
Of course, to this matter energy one has to add the electromagnetic energy for
which the classical expressions of Maxwell remain valid.
As far as physical dimensions are concerned, in general relativity it is natural to take the coordinates x^µ as pure numbers. The quantities under consideration are then invariant not only under change of scale, but under arbitrary transformations of the x^µ. If all e(α) are transformed into b·e(α) by multiplication with an arbitrary factor b, then ψ has to be replaced by b^{3/2}ψ if the normalization (33) is to remain intact. 𝐦 and 𝐥 remain thereby unchanged, hence are pure numbers.
But 𝐠 takes on the factor 1/b², so that κ becomes the square of a length d. κ is not identical with Einstein's constant of gravitation, but arises from it by multiplication with 2ℏ/e. d lies far below the atomic scale, being ∼ 10⁻³² cm.
So gravity will be relevant for astronomical problems only.
If we ignore the gravitational term, then the atomic constants in the field equations are dimensionless. In the two–component theory there is no place for a term involving mass as a factor, as there is in Dirac's theory¹¹. But one knows how one can introduce mass using the conservation laws¹². One assumes that
the 𝐓^ν_µ vanish in “empty spacetime” surrounding the particle, i.e. outside of a channel in spacetime, whose sections x⁰ = const. are of finite extent. There the e^µ(α) may be taken to be constant, as in special relativity. Then
P_µ = ∫ ( T⁰_µ + (1/κ)V⁰_µ ) dx
are the components of a four-vector in empty spacetime, which is independent
of t and independent of the arbitrary choice of the local tetrads (within the
channel). The coordinate system in empty spacetime can be normalized further by the requirement that the 3–momentum (P₁, P₂, P₃) should vanish; −P₀ is then the invariant and time-independent mass of the particle. One then requires that the value m of this
The theory of the electromagnetic field discussed here I consider to be the cor-
rect one, because it arises so naturally from the arbitrary nature of the gauge
factor in ψ and hence explains the empirically observed gauge invariance on the
basis of the conservation law for electric charge. But another theory relating
electromagnetism and gravitation is presents itself. The term (26) has the same
form as the section part of m in formula (17); ϕ(α) plays the same role for the
for the latter as A(α) does for the former. One may therefore expect that matter
¹¹ Proc. Roy. Soc. (A) 117, 610.
¹² Cf. RZM, p.278ff. [WR]
and gravitation, i.e. ψ and eµ (α), will by themselves be sufficient to explain the
electromagnetic phenomena when one takes the ϕ(α) as electromagnetic poten-
tials. These quantities ϕ(α) depend on the eµ (α) and on their first derivatives
in such a manner that there is invariance under arbitrary transformations of the
coordinates. Under rotations of the tetrads, however, the ϕ(α) transform as the
components of a vector only if all tetrads undergo the same rotation. If one ig-
nores the matter field and considers only the relation between electromagnetism
and gravitation, then one arrives in this way at a theory of electromagnetism of
exactly the kind Einstein recently tried to establish. Parallelism at a distance
would only be simulated, however.
I convinced myself that this Ansatz, tempting as it may be at first sight, does not
lead to Maxwell’s equations. Besides, gauge invariance would remain completely
mysterious; the electromagnetic potential itself would have physical significance,
and not only the field strength. I believe, therefore, that this idea leads us astray
and that we should take the hint given by gauge invariance: electromagnetism
is a byproduct of the matter field, not of gravitation.
***
A comment. No doubt, the protagonist of Weyl's paper is the vector–valued Hermitian form¹³
j(α) = ψ*σ(α)ψ,
with which it opens. Weyl's whole theory is based on the fact that the most
general connection on ψ whose parallel transport is compatible with the metric
parallel transport of j(ψ) differs by a differential 1-form iA from the connection
on ψ induced by the metric itself. Attached to the action through the cubic Aψ
interaction A(α)ψ*σ(α)ψ, the vector field j(ψ) then reemerges as the Noether current corresponding to gauge transformations of the spinor field ψ.
The current j(ψ) is a very special object indeed, peculiar to four dimensions in this form. (Related objects do exist in a few other dimensions, e.g. in eight, depending somewhat on what one is willing to accept as “analog”.) No such canonical current is present in the setting of general gauge theory. The cutting of
this link between ψ and j by Yang and Mills made room for other fields, but
at the same time destroyed the beautifully rigid construction of Weyl’s theory.
This may be one reason why Weyl never wrote the promised second part of
the paper. Anderson’s discovery of the positron in 1932 must have dashed
Weyl’s hope for a theory of everything. (One is strangely reminded of Kepler’s
planetary model based on the five perfect solids; happily Kepler was spared the
discovery of Uranus, Neptune, and Pluto.)
WR

¹³ This form had shown up long before Dirac and Weyl, in the context of projective geometry, in Klein's work on automorphic functions (cf. the book of Fricke and Klein of 1897, vol. 1, §12), and probably even earlier, e.g. in the work of Möbius.
Time chart
Bernhard Riemann (1826–1866) C. F. Gauss (1777–1855)
Élie Cartan (1869–1951) Hermann Weyl (1885–1955)
Pictures selected from MacTutor History of Mathematics.
Annotated bibliography

Modern Geometry–Methods and Applications, vol. 1–2, by B. A. Dubrovin, A. T. Fomenko, S. P. Novikov. Springer Verlag, 1984. [A textbook which could supplement –or replace– the presentation here. Volume 1 is at the undergraduate level and has a similar point of view, but does not cover manifolds explicitly. Volume 2 is much more advanced.]

A Comprehensive Introduction to Differential Geometry, vol. 1–5, by M. Spivak. Publish or Perish, Inc., 1979. [Comprehensive indeed: a resource for its wealth of information. Volume 1 goes at manifolds from the abstract side, starting from topological spaces. At the graduate level.]

Geometry and the Imagination by D. Hilbert and S. Cohn-Vossen. German original published in 1932. Republished by the AMS in 1999. [A visual tour for the mind's eye, revealing an unexpected side of Hilbert himself. The original title “Anschauliche Geometrie” means “Visual Geometry”.]

Riemann’s On the hypotheses which lie at the bases of geometry , his Habilita-
tionschrift of 1854, translated by William Kingdon Clifford. Available on the
web thanks to D.R.Wilkins. [Riemann’s only attempt to explain his revolutionary ideas
on geometry, and that in non-technical terms. Explanations of his explanations are available;
one may also just listen to him.]

Weyl’s Raum–Zeit–Materie of 1918, republished by Springer Verlag in 1984.


[Weyl’s youthful effort to communicate his insights into geometry and physics. Not meant for
the practical man. Its translation “Space–Time-Matter” (Dover,1984) casts Weyl’s poetry in
leaden prose.]

Élie Cartan’s Leçons sur la géométrie des espaces de Riemann dating from 1925–
1926, published in 1951 by Gauthier-Villars. [An exposition of the subject by the
greatest differential geometer of the 20th century, based on his own methods and on some of
his own creations.]

Index

connected, 34
connection, 94
curvature, 109
curve, 33
differential, 12, 43
differential form, 136
geodesic (connection), 102
geodesic (metric), 71
Lie bracket, 165
Lie derivative, 163
manifold, 25
Riemann metric, 66
submanifold, 53
tensor, 77
vector, 39
