Differential Geometry and General Relativity 1 9789819900213 9789819900220 Compress
Canbin Liang
Bin Zhou
Differential Geometry and General Relativity
Volume 1
Translated and Revised by
Weizhen Jia and Bin Zhou
Graduate Texts in Physics
Series Editors
Kurt H. Becker, NYU Polytechnic School of Engineering, Brooklyn, NY, USA
Jean-Marc Di Meglio, Matière et Systèmes Complexes, Bâtiment Condorcet,
Université Paris Diderot, Paris, France
Sadri Hassani, Department of Physics, Illinois State University, Normal, IL,
USA
Morten Hjorth-Jensen, Department of Physics, Blindern, University of Oslo, Oslo,
Norway
Bill Munro, NTT Basic Research Laboratories, Atsugi, Japan
Richard Needs, Cavendish Laboratory, University of Cambridge, Cambridge, UK
William T. Rhodes, Department of Computer and Electrical Engineering and
Computer Science, Florida Atlantic University, Boca Raton, FL, USA
Susan Scott, Australian National University, Acton, Australia
H. Eugene Stanley, Center for Polymer Studies, Physics Department, Boston
University, Boston, MA, USA
Martin Stutzmann, Walter Schottky Institute, Technical University of Munich,
Garching, Germany
Andreas Wipf, Institute of Theoretical Physics, Friedrich-Schiller-University Jena,
Jena, Germany
Graduate Texts in Physics publishes core learning/teaching material for graduate-
and advanced-level undergraduate courses on topics of current and emerging fields
within physics, both pure and applied. These textbooks serve students at the MS- or
PhD-level and their instructors as comprehensive sources of principles, definitions,
derivations, experiments and applications (as relevant) for their mastery and teaching,
respectively. International in scope and relevance, the textbooks correspond to course
syllabi sufficiently to serve as required reading. Their didactic style, comprehensiveness
and coverage of fundamental material also make them suitable as introductions
or references for scientists entering, or requiring timely knowledge of, a research
field.
Canbin Liang, Department of Physics, Beijing Normal University, Beijing, China
Bin Zhou, Department of Physics, Beijing Normal University, Beijing, China
Translated by
Weizhen Jia, Department of Physics, University of Illinois Urbana-Champaign, Urbana, IL, USA
Bin Zhou, Department of Physics, Beijing Normal University, Beijing, China
This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
Professor Canbin Liang (1938–2022) giving a lecture in 2012, photograph by Gui-Rong Liang
Foreword
I first met Canbin Liang during a visit I made to China in 1980. Liang had just
received authorization from the Chinese government to come to the United States
for two years as a visiting scholar, and he asked if I could serve as his advisor. I
agreed, and Liang spent 1981–83 in my group at the University of Chicago.
At that time, China had just emerged from the Cultural Revolution, so, as might
be expected, Liang had very little knowledge of modern developments in general
relativity when he arrived. But he had an enormous interest in acquiring a deep
understanding of the foundations of general relativity and a tremendous dedication
to doing so. At the time, I was in the midst of writing my General Relativity book and
was giving considerable thought as to how the subject should be presented. Liang
carefully read drafts of my book and we had many discussions of issues in general
relativity. Liang also had many interactions with Bob Geroch during his years in
Chicago.
After he returned to China, Liang and I maintained our friendship and we
continued to correspond on many issues in general relativity. Liang devoted himself
to teaching the modern approach to general relativity to students in China and he
published his book Differential Geometry and General Relativity in Chinese. I learned
from a number of students who came from China to our Ph.D. program at the University
of Chicago that this book, as well as Liang’s renowned lecture course in general
relativity, was instrumental in introducing generations of Chinese students to the
modern approach to general relativity.
Liang’s book is quite similar to my General Relativity book in its coverage and
presentation of the core material. However, while my treatment of the mathematical
material focuses on presenting the general, abstract results in as concise a way as
possible, Liang’s book provides many simple examples that illustrate these results.
Liang also makes considerable effort to warn readers of possible pitfalls that can
arise in their understanding of the material. I believe that many readers will find these
examples and additional discussion quite helpful for working through the meaning
of the general concepts.
I am very pleased that Differential Geometry and General Relativity has been
translated into English, so that students outside of China now also can benefit from
Liang’s insights and pedagogy.
Preface to the English Edition
The present book is translated and revised based on the second edition of the original
Chinese text under the guidance of the first author, the late Prof. Canbin Liang. As
one of the most popular Chinese books on theoretical physics, this work has already
influenced generations of Chinese physicists since it first came out in 2000. Now
we are so pleased that this work is translated into English and can be accessed by
a broader group of readers. As Chinese readers can attest, the writing style
of Prof. Liang is quite distinctive, as he sometimes uses idioms, or even self-created
expressions, in order to make the text more vivid and intuitive to readers. Although
this brings certain difficulties to the translation, we have made our best attempt to
keep the style of the original text, so that this translation can be not only faithful to
but also as expressive as the original work.
Apart from translation, we also implemented a sizable amount of revision to this
work. From minor typo fixes to major content updates, every single chapter has been
improved to some extent. Some revision was based on the personal notes of Prof.
Liang himself, as he has been organizing possible improvements to this work based on
his teaching experience over the years. Besides this, the parts that have been heavily
modified are primarily those on gravitational radiation (Sect. 7.9) and cosmology
(Chap. 10), due to the rapid developments in these areas over the past two decades.
For Sect. 7.9, we expanded the discussions on the gauge conditions and gravitational
plane waves substantially. We also replaced the out-of-date introduction on the detection
of gravitational waves with a new one, which includes the discussions on the
interferometric detection and the recent progress since the first direct observation by
LIGO. For Chap. 10, we decided to revise and upgrade the content of the original
chapter comprehensively, and it is now divided into two parts. The first half is the
new Chap. 10, which sets up the geometric foundation for cosmology, and focuses
on the standard cosmological model. Particularly, we enhanced our mathematical
descriptions for the spatial geometries of the universe and updated the observational
data to the latest ones. The second half will introduce the (currently in development)
“new standard cosmological model”, which includes inflation, dark matter and dark
energy. This part will be presented in Volume II along with other advanced topics.
As has been mentioned in the prefaces of the Chinese editions, this work was
influenced by many other textbooks, especially the classic text General Relativity by
Robert Wald. Many discussions in this work can be viewed as extensions of those
concise and incisive lines in Wald’s book, making them more accessible to beginners.
However, the aim of this work is in no way to be a replacement for Wald’s book.
In fact, we encourage readers to refer to his book after reading this text. As Prof.
Liang said, just like Wald’s book does a great job of paving the way for readers to
understand The Large Scale Structure of Space-Time, the masterpiece by Stephen
Hawking and George Ellis, one of the goals of this work is to pave the way for
reading Wald’s book (or one at that level). In addition, the material in these books
is also complementary in many ways. We hope this work, especially Volume I, can
be an initiation for the beginners to differential geometry and general relativity, and
can open up the world for them to absorb further knowledge from other great works.
We would like to thank those who helped us and contributed to this work. First,
we express special thanks to Prof. James Nester, who has supported the preparation
of this English edition since day one. He read the translated manuscript of this
entire book carefully, and provided not just English refinements, but also plenty of
professional comments, which notably improved the quality of this book. We also
benefited a lot from discussions with him. The translator Weizhen Jia would like to
thank his friend Brandon Buncher for his constant support. He read a large portion
of the translated manuscript and always offered nice suggestions when Jia consulted
him. In particular, he helped a lot with those expressions used by Prof. Liang that are
hard to translate, which makes this work more accessible to readers from Western
(and other non-Chinese) backgrounds. We would also like to sincerely thank Prof.
Zhoujian Cao, who provided valuable suggestions on the manuscript of Sect. 7.9,
and has always been very supportive to us during the preparation of this work. We
also appreciate Prof. Bin Hu for his comments and suggestions on the manuscript of
Chap. 10.
We would like to thank Nick Abboud and Marcus Rosales for their helpful
comments, and Jinhuan He for providing a translation draft for part of Chap. 9. In
addition, we thank Prof. Robert Wald for his support and the lovely foreword to the
English edition. We would also like to express thanks to Prof. Jerzy Lewandowski,
Prof. Zheng Zhao, Prof. Sijie Gao, Prof. Yongge Ma and Prof. Tong-Jie Zhang
for their support of the publication of this work, and to Dr. Mengchu Huang from
Springer Nature for his assistance. We are also grateful to Ms. Jinxing Zhang for her
generous help.
Prof. Liang was an extraordinary educator who dedicated his entire life to passing
on the knowledge of physics to the next generation. At age 84, he was still giving
lectures even just a few days before he became critically ill. He left us forever soon
after that, just when this volume was about to be finalized; it is unfortunate that he
could not be here in person to see it being published. We hope that the outcome of
this effort can benefit more readers around the world, fulfilling a wish of our beloved
and highly esteemed professor.
Preface to the Second Edition
Since its publication in 2000, the first edition of this book (the first volume) has
attracted attention and received praise in the field of theoretical physics, especially
in general relativity, in China. Due to a small print run, it was sold out in only 2 years.
After its publication, I have been using this book as a textbook; thus far, I have used
it five times to teach graduates and undergraduates. In addition to the teaching at
Beijing Normal University, I was also invited to give lectures to both undergraduates
in the Fundamental Science Class of Tsinghua University and graduate students
from the Academy of Mathematics and Systems Science at the Chinese Academy of
Sciences (CAS). The classes at the CAS have also attracted dozens of graduates and
undergraduates from other departments of the CAS and from another 11 institutions
of higher learning, including Peking University and Tsinghua University. I realized
the importance of this work in promoting the subject, and at the same time, I also found
that there are some mistakes and deficiencies in this work. Needing to improve the
content and to supplement the work with more details, I set out to write a draft of the
second edition. While creating the new draft, I consulted with many of my colleagues
and students, including (in alphabetical order by their last names) Zhoujian Cao,
Muxin Han, Zhiquan Kuang, Yongge Ma, Zhi Wang, Xiaoning Wu, Xuejun Yang,
Hao Zhang, Hongbao Zhang, Bin Zhou and Meike Zhou. Among them, Zhoujian Cao,
Zhiquan Kuang, Hongbao Zhang and Bin Zhou have made outstanding contributions.
Through many discussions with Dr. Bin Zhou, I found that he not only has a vast
amount of knowledge of mathematics and physics but also a clear and logical way
of thinking. He always has a relatively clear, deep and accurate understanding of the
mathematical and physical problems within and beyond this work. In this way, he is
truly a rare and outstanding physicist. In order to further improve the writing quality,
I decided to invite him to revise this work as the second author, and he agreed to my
request. The close collaboration in the past 5 months has proved that this was the
correct decision. I think Dr. Bin Zhou has indeed made remarkable contributions to
the revision.
I would like to express special thanks to two other friends who helped me with and
contributed to the writing of this work. The first one is Dr. Robert Wald, a professor
at the University of Chicago and a member of the National Academy of Sciences
1 The second volume of the second edition was further divided into two volumes when it was
published, which makes the entire work into three volumes.
Preface to the First Edition
Beginning in 1981, I was a visiting scholar at the relativity group of the University of
Chicago for two years. Before going abroad, for various reasons, I only knew a little
about general relativity, and even less about its essential mathematical tool, modern
differential geometry. Thanks to the strong academic atmosphere of the relativity
group of the University of Chicago, and thanks to the careful guidance of both
Professor Robert Wald (my advisor) and Professor Robert Geroch, I soon became
very interested in this research field. As a teacher, before I returned to China, I had a
strong urge to teach my students what I had learned in the past two years as much as
possible. As soon as I got back to China, I taught a series of graduate-level courses,
starting with “Differential Geometry and General Relativity”. I was also invited to
give lectures outside of Beijing. The past decade’s lecture notes have become the
main source for writing this work. Over the past decade, I taught and learned to further
deepen my understanding of the material I had been teaching. When confronted with
difficulties, I would write to my mentors, Professor Wald and Professor Geroch, for
help, and each time they gave me warm replies. Their insights would never fail to
enlighten me. Physicists often feel that modern differential geometry is abstract and
arcane at first pass, and fail to grasp its essence immediately. I think maybe I can help
them with this issue. As I was once a first-time learner, I can empathize with how
difficult it is to begin learning differential geometry. In addition, my past teaching
experience may help reduce the subject’s difficulty. Reducing difficulty has not only
become an aim of my past decade’s teaching, but also has become a major principle
in the writing of this work. In order to reduce the difficulty, I spared no effort to
elaborate, which dramatically increased this work’s length.
Modern differential geometry is not only crucial to the study of general relativity,
but also has important applications in many sub-disciplines of physics (and even
engineering). Many physicists have realized that modern differential geometry will
play an increasingly important role in their further study based on results from international
conferences and a substantial volume of literature, but find it difficult to learn
the material properly. The heads of the Department of Physics of Beijing Normal
University recognized the importance of modern differential geometry to physicists
much earlier. They encouraged and supported me to transfer my first graduate
2 In the English edition, each optional reading is indented and typed in a smaller font size.
are offered on those difficult questions. If time does not allow, the reader may choose
to complete some of the questions that are marked with a tilde. It is okay to read the
material without doing any exercises, but it is likely that you will find it difficult to
understand the later chapters due to the lack of a strong foundation.
Owing to my limited knowledge and understanding, there may exist mistakes and
deficiencies in this book. As an important way to reduce mistakes and deficiencies,
I invited a large number of experts, colleagues and students to read part of the
manuscript of the first volume. In alphabetical order by their last names, they are Bin
Ao, Zhoujian Cao, Luru Dai, Xianxin Dai, Changjun Gao, Sijie Gao, Han He, Bo Hu,
*Chao-Guang Huang, Zhiquan Kuang, **Liao Liu, Xiaoqin Li, Yongge Ma, Junjie
Nan, **Shouyong Pei, **Wen-Chao Qiang, Hua Shen, *Qingjun Tian, *Xiaocen
Tian, Bo-Bo Wang, Jinshan Wu, Xiaoning Wu, **Kongqing Yang, **Yun-Qiang Yu,
*Xuejun Yang, Hongbao Zhang, Peng Zhang, Bin Zhou and Zong-Hong Zhu. (Those
marked with ** are professors or researchers, and those marked with * are associate
professors or associate researchers.) Those mentioned above have put forward many
valuable suggestions on the chapters they read. I would like to express special thanks
to two friends who helped me and contributed substantially to the writing of this
work. The first one is Professor Robert Wald of the University of Chicago. He is
an excellent advisor who enlightened me and provided me immense help with my
teaching and writing after I returned to China. His masterpiece General Relativity is
one of the major references for these volumes. The other one is Researcher Zhiquan
Kuang from the Institute of Mathematics, CAS. He has reviewed many chapters of
this book and put forward many important suggestions. Besides that, his profound
thinking and deep understanding always benefited me in our discussions.
I would also like to express thanks to Professor Liao Liu from the Department of
Physics of Beijing Normal University, and Professor Yuanxing Gui from the Department
of Physics of Dalian University of Technology. Because of their recommendations,
this work was included in the publishing plan of Beijing Normal University
Press and received financial support from the press as well. I would like to
sincerely thank Professor Zheng Zhao and Professor Yongcheng Wang for their
care and support during the writing and publication. Besides that, I want to express
thanks to Guifu Li, an editor of Beijing Normal University Press, for his support and
contribution. I am grateful to the Beijing Municipal Education Commission for their
approval and funding. I also appreciate the financial support provided by Beijing
Normal University Press.
Outline of Volume II
15 Cosmology II
15.1 Finding a Way Out of the Difficulties of the Standard
Cosmological Model—the Inflationary Model
15.2 Dark Matter
15.3 The Cosmological Constant and the Vacuum Energy Problem
15.4 Dark Energy and the “New Standard Cosmological Model”
Exercises
X = {1, 4, 5.6}
represents the set that consists of the real numbers 1, 4 and 5.6. The other expression
is to point out the general character of elements in a set; for example,
X = {x | x is a real number}
represents the set of all real numbers (the common notation of this specific set is R),
while
X = {x ∈ R | x > 9}
represents the set of all real numbers that are greater than 9.
The set that has no elements is called the empty set, denoted by ∅.
Definition 1 If each element of a set A belongs to a set X , then we say A is a subset
of X . We also say that A is contained in X , or X contains A, denoted by A ⊂ X
or X ⊃ A. We stipulate that ∅ is a subset of any set. Two sets X and Y are said to be
equal (denoted by X = Y) if X ⊂ Y and Y ⊂ X. A is called a proper subset of X
(denoted by A ⊊ X) if A ⊂ X and A ≠ X.
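The relations in Definition 1 can be tried out on small finite sets. The following is a minimal Python sketch; the particular sets are illustrative choices, and Python's `<=` plays the role of the text's ⊂ (which allows equality), while `<` corresponds to a proper subset.

```python
# Illustrative finite sets (chosen for this sketch, not taken from the text).
A = {1, 4}
X = {1, 4, 5.6}
empty = set()

is_subset = A <= X                           # A ⊂ X in the text's sense (equality allowed)
is_proper = A < X                            # A ⊂ X and A ≠ X
equal_by_inclusion = (A <= X) and (X <= A)   # the text's criterion for A = X
empty_is_subset = empty <= X                 # ∅ is a subset of any set

print(is_subset, is_proper, equal_by_inclusion, empty_is_subset)
```

Here mutual inclusion fails because 5.6 belongs to X but not to A, so the two sets are not equal.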
for convenience’s sake, all of the terms “if” and “when” in the definition are implied
to indicate “if and only if”.
This text will use := to represent “is defined as” and use ≡ to represent “identical
to” or “denoted by”. For example, C ≡ A − B means “denote A − B by C”. The
adoption of these two symbols is simply for clarity; they may be replaced by the
equal sign as well.
Definition 2 The union, intersection, difference and complement of two sets A and
B are defined as follows:
Union A ∪ B := {x | x ∈ A or x ∈ B}.
Intersection A ∩ B := {x | x ∈ A, x ∈ B}. (The condition “x ∈ A, x ∈ B” is
short for “x ∈ A and x ∈ B”, the same applies below.)
Difference A − B := {x | x ∈ A, x ∉ B}. (Mathematics books usually denote the
difference by A\B or A ∼ B; this text will denote all of them by A − B.)
If A is a subset of X then −A, the complement of A, is defined as −A := X − A.
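The four operations of Definition 2 map directly onto Python's built-in set operators. This is a small sketch with made-up finite sets; the ambient set `X` is an assumption needed only so that the complement has something to be taken relative to.

```python
# Illustrative finite sets; the names and values are chosen for this sketch.
X = set(range(10))        # ambient set containing A and B
A = {1, 2, 3, 4}
B = {3, 4, 5}

union = A | B             # {x | x ∈ A or x ∈ B}
intersection = A & B      # {x | x ∈ A, x ∈ B}
difference = A - B        # {x | x ∈ A, x ∉ B}
complement_A = X - A      # −A := X − A, always relative to X

print(union, intersection, difference, complement_A)
```

Note that the complement only makes sense once the ambient set X has been fixed; changing X changes −A.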
Proof As an example, we prove the second equation of De Morgan’s Law (the reader
should verify the rest). All we have to do for this is to show that both sides of the
equation contain one another.
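(A) Suppose x ∈ A − (B ∩ C); then x ∈ A and x ∉ B ∩ C. The latter leads to x ∉ B
or x ∉ C. Combining x ∈ A and x ∉ B, we have x ∈ A − B; combining x ∈ A and
x ∉ C, we have x ∈ A − C. In either case x ∈ (A − B) ∪ (A − C). Therefore,
A − (B ∩ C) ⊂ (A − B) ∪ (A − C).
(B) Conversely, suppose x ∈ (A − B) ∪ (A − C); then x ∈ A − B or x ∈ A − C. In either
case x ∈ A, and x ∉ B or x ∉ C, so x ∉ B ∩ C. Hence x ∈ A − (B ∩ C). Therefore,
(A − B) ∪ (A − C) ⊂ A − (B ∩ C).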
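The identity proved here, A − (B ∩ C) = (A − B) ∪ (A − C), can also be checked by brute force on a small example. The sketch below runs over all subsets of a four-element set; the helper `subsets` and the set `U` are illustrative names, and a finite check of this kind is only a sanity test, not a substitute for the proof.

```python
from itertools import combinations

def subsets(universe):
    """Return every subset of a finite set as a Python set."""
    items = sorted(universe)
    return [set(c) for r in range(len(items) + 1) for c in combinations(items, r)]

U = {0, 1, 2, 3}
all_subsets = subsets(U)          # 2**4 = 16 subsets

# Test the identity for every triple (A, B, C) of subsets of U.
second_law_holds = all(
    A - (B & C) == (A - B) | (A - C)
    for A in all_subsets
    for B in all_subsets
    for C in all_subsets
)
print(second_law_holds)
```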
That is, X × Y is a set such that each of its elements is an ordered pair (x, y) formed
by one element x from X and one element y from Y . The Cartesian product of a
X × Y × Z := {(x, y, z) | x ∈ X, y ∈ Y, z ∈ Z } .
We also stipulate that the Cartesian product satisfies the associative law, i.e., (X ×
Y ) × Z = X × (Y × Z ).
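For finite sets, the Cartesian product can be built with `itertools.product`. The sketch below uses made-up sets and also illustrates why the associative law has to be stipulated: the nested pairs ((x, y), z) and (x, (y, z)) are different objects until both are identified with the triple (x, y, z).

```python
from itertools import product

# Illustrative finite sets (chosen for this sketch).
X = {1, 2}
Y = {"a", "b"}
Z = {0.5}

XY = set(product(X, Y))        # {(x, y) | x ∈ X, y ∈ Y}
XYZ = set(product(X, Y, Z))    # {(x, y, z) | x ∈ X, y ∈ Y, z ∈ Z}

# (X × Y) × Z yields ((x, y), z) and X × (Y × Z) yields (x, (y, z));
# flattening both to (x, y, z) implements the stipulated identification.
left = {(x, y, z) for ((x, y), z) in product(product(X, Y), Z)}
right = {(x, y, z) for (x, (y, z)) in product(X, product(Y, Z))}

print(len(XY), left == right == XYZ)
```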
Example 1 R² := R × R, Rⁿ := R × ⋯ × R (n factors in total). Since an element
of R² is an ordered pair formed by two real numbers, these two real numbers are called
the natural coordinates of this element. Similarly, each element of Rⁿ has n natural
coordinates. It follows that Rⁿ is intrinsically endowed with coordinates, though
this is not necessarily true for other sets. Using natural coordinates, the concept of
distance between any two elements of Rⁿ can be defined.
Definition 4 The distance, denoted by |y − x|, between any two elements x =
(x¹, …, xⁿ) and y = (y¹, …, yⁿ) of Rⁿ is defined as
\[ |y - x| := \sqrt{\sum_{i=1}^{n} (y^i - x^i)^2} . \]
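The distance of Definition 4 is straightforward to compute from natural coordinates. A minimal sketch, in which points of Rⁿ are represented as tuples of floats and the function name `distance` is an illustrative choice:

```python
import math

def distance(x, y):
    """Distance |y - x| between two points of R^n given by their natural coordinates."""
    if len(x) != len(y):
        raise ValueError("points must have the same number of coordinates")
    # Square root of the sum of squared coordinate differences.
    return math.sqrt(sum((yi - xi) ** 2 for xi, yi in zip(x, y)))

print(distance((0.0, 0.0), (3.0, 4.0)))
```

For n = 1 the same function reduces to the absolute value of the difference of two real numbers.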
Starting from the next paragraph, this text will use the mathematical symbols ∀
(stands for “for all” or “for any”) and ∃ (stands for “there exists”) frequently; please
become familiar with them.
Definition 5 Suppose X and Y are non-empty sets. A map from X to Y (denoted
by f : X → Y ) is a rule that associates each element of X with a unique element
in Y . If y ∈ Y is the corresponding element of x ∈ X , we write y = f (x); we also
call y the image of x under the map f and call x the preimage (or inverse image)
of y. X is called the domain of the map f , and the set of images of all elements of
X (denoted by f[X]) is called the range of the map f : X → Y. Maps f : X → Y
and f′ : X → Y are said to be equal if f(x) = f′(x) ∀x ∈ X.
Remark 2 Usually, y = f(x) is also written as f : x ↦ y. Please note the difference
between → and ↦: → in f : X → Y means that f is a map from X to Y (a set to
a set), while ↦ in f : x ↦ y means that the image of x ∈ X under the map f is y
(an element to an element).
Remark 3 Suppose A ⊂ X , then the subset formed by the images of elements of A
under the map f is denoted by f [A], i.e.,
1 The Cartesian product of an infinite number of sets can also be defined, but this is outside the
scope of this text.
Remark 4 A map from R² to R gives a function of two variables, because each point
in R² is described by two real numbers (natural coordinates). Similarly, a map from
Rⁿ to Rᵐ gives m functions of n variables.
Remark 5 ① A necessary and sufficient condition for f to be onto is that the range
f[X] = Y. ② If f is a one-to-one map, then there exists an inverse map f⁻¹ :
f[X] → X. However, whether f has an inverse map or not, we can always define
f⁻¹[B], the “inverse image” of any subset B ⊂ Y under f, as
f⁻¹[B] := {x ∈ X | f(x) ∈ B} ⊂ X.
Note that the “inverse” here is a subset (rather than an element) of X. For example, if
X has (and only has) two elements x and x′, whose images under f are
both y ∈ Y, then although the inverse map f⁻¹ : Y → X does not exist, f⁻¹[{y}]
is still meaningful when considering y as a one-point subset (i.e., {y}) of Y, and the
meaning is f⁻¹[{y}] = {x, x′} ⊂ X.
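The two-element example in Remark 5 can be reproduced with a map stored as a Python dictionary. This is a sketch under the assumption that X and Y are finite; the helper names `image` and `inverse_image`, and the extra element `"z"` of Y, are illustrative.

```python
def image(f, A):
    """f[A]: the set of images of the elements of A under the map f."""
    return {f[x] for x in A}

def inverse_image(f, B):
    """f^{-1}[B] := {x in X | f(x) in B}; defined whether or not f is invertible."""
    return {x for x in f if f[x] in B}

# X has exactly two elements, both sent to y; Y also contains an unused element z.
f = {"x": "y", "x'": "y"}
X = set(f)
Y = {"y", "z"}

is_one_to_one = len(image(f, X)) == len(X)   # distinct elements have distinct images
is_onto = image(f, X) == Y                   # the range f[X] equals Y

print(inverse_image(f, {"y"}), inverse_image(f, {"z"}), is_one_to_one, is_onto)
```

Although this f is neither one-to-one nor onto, the inverse image of every subset of Y is still a well-defined subset of X (possibly empty).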
If X and Y are just two sets (with no additional structure), “one-to-one” and
“onto” are the only two requirements that we can impose on a map between X and
Y ; however, if some kinds of additional structures are assigned to X and Y , then
sometimes we can impose additional requirements on f : X → Y . For example, we
2 Many mathematics books call the one-to-one and onto maps used in this text injections and
surjections, respectively, and call maps that are both injective and surjective one-to-one maps (also
called bijections). Thus, their one-to-one maps are stronger than the one-to-one maps in this text.
Fig. 1.2 Expressing continuity in terms of open intervals, the curve represents the map f : X → Y
one can also define many concepts and prove many theorems, which develop into a
complete and fruitful branch of mathematics—point-set topology. Sections 1.2 and
1.3 will give an introduction to the basics of topological spaces.
As we mentioned at the end of Sect. 1.1, subsets of R can be divided into two
categories: open subsets and non-open subsets. (Any subset is either open or non-
open. Do not refer to a non-open subset as a closed subset. According to the definition
of a closed subset that we will introduce later, a subset can be open, closed, both, or
neither.) The collection of open subsets has the above three properties (a), (b), (c).
For any non-empty set X , we can also assign some subsets to be open and others to
be non-open in an appropriate manner. To make this assignment useful, we require
that any method of assignment should satisfy three requirements: (a) X itself and the
empty set ∅ are open subsets; (b) the intersection of a finite number of open subsets is
an open subset; (c) the union of any number (which may be finite or infinite) of open
subsets is an open subset. For a given set, there are usually many ways of assigning
openness that meet these three requirements. For example, suppose X is a set, we can
assign X and ∅ as open subsets, and all others as non-open. This certainly satisfies
the above three requirements, with the feature that it has the lowest number of open
subsets (only two). However, we can also have another extreme assignment, namely
to assign any subset of X to be an open subset. It is not hard to see that this method
of assigning openness also satisfies the three requirements above. Although the two
assignments above do not necessarily have much use, they at least can indicate that
there is more than one way of assigning openness to meet the three requirements
above. We say that each of the assignments which satisfies the three requirements
above gives an additional structure to the set X , called the topological structure. For
a set with a topological structure defined, we can point at any subset of it and ask: “Is
this an open subset?” The answer will be either “yes” or “no”, with no middle ground.
Conversely, for a set without any topological structure defined, this kind of question
is meaningless. If X is a set having a topological structure, then the collection of its
open subsets also forms a set, called a topology on X, denoted by T. Let P represent
the collection of all subsets of X (as shown in Fig. 1.3); then any open subset O and
any non-open subset V are both elements of P. All of the open subsets of X form a
subset T of P (note that it is not a subset of X), which is the topology on X. Please
notice the difference between the symbols ⊂ and ∈: O ⊂ X only indicates that O is
a subset of X, while O ∈ T indicates that O is an open subset of X. The discussion
above will pave the way for understanding the definitions expressed by the following
mathematical language.
Fig. 1.3 P is the collection of all subsets of X . Any subset of X (e.g., O, V ) is an element of P .
T is a subset of P such that each element of it (e.g., O) is an open subset of X
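(b) If $O_i \in T$, $i = 1, 2, \ldots, n$, then $\bigcap_{i=1}^{n} O_i \in T$ (where $\bigcap_{i=1}^{n} O_i$ stands for the
intersection of these $O_i$);
(c) If $O_\alpha \in T$ $\forall\alpha$, then $\bigcup_{\alpha} O_\alpha \in T$. (Adding $\forall\alpha$ after $O_\alpha \in T$ indicates that
each $O_\alpha$ belongs to $T$, with no restriction on the number of $O_\alpha$. $\bigcup_{\alpha} O_\alpha \in T$ indicates
that the union of all $O_\alpha$ belongs to $T$.)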
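When X is finite, the three requirements (a), (b), (c) can be verified mechanically. The following Python sketch does so; the function name `is_topology` and the sample collections are illustrative assumptions. For a finite collection, closure under pairwise intersections and unions implies closure under all finite ones by induction, and every union is a finite union, so the pairwise test is sufficient here (it would not be for an infinite collection).

```python
from itertools import combinations

def is_topology(X, T):
    """Check requirements (a), (b), (c) for a finite collection T of subsets of X."""
    X = frozenset(X)
    opens = {frozenset(O) for O in T}
    if any(not O <= X for O in opens):
        return False                      # every member of T must be a subset of X
    if X not in opens or frozenset() not in opens:
        return False                      # (a): X and the empty set are open
    for A, B in combinations(opens, 2):
        if A & B not in opens:            # (b): closed under finite intersections
            return False
        if A | B not in opens:            # (c): closed under unions (finite case)
            return False
    return True

def all_subsets(X):
    items = list(X)
    return [set(c) for r in range(len(items) + 1) for c in combinations(items, r)]

X = {1, 2, 3}
coarsest = [X, set()]                     # only X and the empty set are open
finest = all_subsets(X)                   # every subset is open
chain = [set(), {1}, {1, 2}, X]           # a nested family
not_closed = [set(), {1}, {2}, X]         # {1} ∪ {2} = {1, 2} is missing

print(is_topology(X, coarsest), is_topology(X, finest),
      is_topology(X, chain), is_topology(X, not_closed))
```

The two extreme assignments of openness both pass, which illustrates that one set can carry more than one topology.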
For the same set X we can define different topologies (there may be many T that
satisfy Definition 1). Suppose T₁ and T₂ are both topologies on X; then a subset A
of X may satisfy both A ∈ T₁ and A ∉ T₂; that is, A is an open set for T₁ (measured
by T₁), but not an open set for T₂. We thus see that T₁ and T₂ define X as two
different topological spaces. In order to clarify the choice of the topology, we use
(X, T) to represent a topological space. As a result, (X, T₁) and (X, T₂) represent
two different topological spaces, even though the “base set” of both is X. After a
topology is specified, one can also just use X to represent a topological space.
Which topology should we choose for a given set X to make it a topological
space? This depends on the properties of X itself, as well as what kind of problem
we are considering. For example, we may choose the so-called “usual topology” as
the topology for the set R in most of the problems we are usually concerned with
(see Example 3 below).
Example 2 Suppose X is an arbitrary non-empty set, and let T = {X, ∅}, then it
obviously satisfies the three requirements in Definition 1 and hence forms a topology
on X , which is called the indiscrete topology. The indiscrete topology is the topology
that has the lowest number of elements.
Example 3 (1) Suppose X = R, then Tᵤ := {the empty set or subsets of R that can
be expressed as a union of open intervals} is called the usual topology on R.
(2) Suppose X = Rⁿ, then Tᵤ := {the empty set or subsets of Rⁿ that can be
expressed as a union of open balls} is called the usual topology on Rⁿ, where an
open ball is defined as B(x₀, r) := {x ∈ Rⁿ | |x − x₀| < r}, x₀ is called the center
and r > 0 is called the radius. An open ball in R² is also called an open disk; an
open ball in R is just an open interval.
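Membership in an open ball B(x₀, r) is a strict inequality on the distance, which is easy to test numerically. A minimal sketch, assuming points are given as coordinate tuples and using `math.dist` (available in Python 3.8 and later); the function name `in_open_ball` is an illustrative choice.

```python
import math

def in_open_ball(x, center, r):
    """Return True if x lies in B(center, r) = {x | |x - center| < r}."""
    if r <= 0:
        raise ValueError("the radius must be positive")
    return math.dist(x, center) < r      # strict inequality: boundary points are excluded

inside = in_open_ball((0.5, 0.5), (0.0, 0.0), 1.0)     # a point of an open disk in R^2
on_boundary = in_open_ball((1.0, 0.0), (0.0, 0.0), 1.0)
in_interval = in_open_ball((0.9,), (0.0,), 1.0)        # in R the ball is the interval (-1, 1)

print(inside, on_boundary, in_interval)
```

The strict inequality is what makes the ball open: a point at distance exactly r from the center is not a member.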
It is not difficult to check that the Tᵤ in (1) and (2) satisfy the three requirements
in Definition 1. According to the definition above, any open interval of R is an open
set measured by Tᵤ. However, in principle we can also choose other topologies to
make R a topological space different from (R, Tᵤ). For example, if measured by the
indiscrete topology, then no subset is an open set other than R and ∅. In contrast, if
measured by the discrete topology, then any subset of R (including any closed interval
or half-closed interval) is an open set. From now on, we will consider (Rⁿ, Tᵤ) when
we treat Rⁿ as a topological space unless stated otherwise.
It can be proved that T˜ = T = T ; that is, these three definitions for the product topology
on X = X 1 × X 2 × X 3 are different routes to the same end, so there is no ambiguity. The
conclusion for the case of a finite number of topological spaces is also similar. A simple
example is Rn = R × · · · × R, where one can also prove that the product topology defined
in this way agrees with the topology defined in Example 3 in terms of open balls.
[The End of Optional Reading 1.2.1]
3 Motivated readers can try to find examples of continuous maps that are both one-to-one and onto
whose inverse is discontinuous (hint: think about discrete and indiscrete topologies).
Example 8 Consider a circle and an ellipse on a Euclidean plane. From the viewpoint
of Euclidean geometry they are certainly different: Euclidean geometry has a
concept of distance, with respect to which circles and ellipses are defined. However,
from the aspect of pure topology, (R2 , Tu ) is a topological space and a circle S 1
as well as an ellipse E are two subsets of R2 : S 1 , E ⊂ R2 . One can make S 1 and
E topological spaces (S 1 , S S 1 ) and (E, S E ), where S S 1 and S E are topologies
induced by Tu . It can be proved (and it is intuitively easy to believe) that there
exists a homeomorphism f : (S 1 , S S 1 ) → (E, S E ); thus, from the perspective of
1.2 Topological Spaces 11
Fig. 1.6 From the perspective of circuits or topology, a and b are identical, while b and c are
different. From the viewpoint of geometry, all three are different
pure topology they are exactly the same. In contrast, if we cut a gap on S 1 , the result
will be homeomorphic to R, and thus has a different topology from that of S 1 and
E. If we imagine R2 as a rubber sheet and manipulate it by deformation, then the
shape of a curve on the sheet will change with it. However, as long as there is no
cutting or gluing, the curves before and after the deformation are homeomorphic
to each other. Hence, topology is also colloquially called “rubber sheet geometry”.
The major difference between topology and Euclidean geometry is that
the former does not have a concept of distance. At first glance, it may seem that a
geometry without a concept of distance would not be useful, but this is not the case.
A simple example is the electric circuit problem. Although there is a big difference
between Fig. 1.6a, b from the Euclidean point of view, they are identical as circuits.
In contrast, if we cut one of the branches in (b) [turning it into (c)], it will be very
different from the viewpoint of circuits. This is the same as the viewpoint of topology. In fact,
topology is very useful in the study of complex circuits (networks), which forms an
applied branch called “network topology”.
(B) Suppose A is a neighborhood of x ∀x ∈ A, and let $O = \bigcup_{x \in A} O_x$ (where $O_x \in \mathscr{T}$
satisfies $x \in O_x \subset A$ as in Definition 5), then O = A (the reader should complete the
proof of this). Also, from Definition 1 (c) we know that O ∈ T . Hence, A ∈ T , i.e.,
A is an open set.
Definition 6 C ⊂ X is called a closed set if −C ∈ T , where −C ≡ X − C denotes the complement of C in X .
Theorem 1.2.2 Closed sets have the following properties:
(a) The intersection of any number of closed sets is a closed set;
(b) The union of a finite number of closed sets is a closed set;
(c) X and ∅ are closed sets.
Proof They can be easily proved using Definitions 1, 6 and De Morgan’s Law.
Thus, any topological space (X, T ) has two subsets that are both open and closed,
namely X and ∅.
Definition 7 A topological space (X, T ) is said to be connected if it does not
contain a subset that is both open and closed other than X and ∅.
Example 9 Suppose A and B are open intervals of R, and A ∩ B = ∅ (draw a picture
of this). If we use T to represent the topology induced on the subset X ≡ A ∪ B
by the usual topology of R, then, in addition to X and ∅, the topological space
(X, T ) also has subsets A and B that are both open and closed ( A and B are open
under the induced topology, and they are also closed since they are complements of
one another.). Thus, (X, T ) is not connected, which coincides with the fact that the
picture of A and B you drew is intuitively not connected.4
Suppose (X, T ) is a topological space, and let A ⊂ X . The closure, interior and
boundary of A are defined as follows:
Definition 8 The closure of A, denoted by Ā, is the intersection of all of the closed
sets that contain A, i.e.,
$$\bar{A} := \bigcap_{\alpha} C_\alpha, \quad A \subset C_\alpha, \text{ and } C_\alpha \text{ is closed.} \tag{1.2.3}$$
Definition 9 The interior of A, denoted by i(A), is the union of all open sets that
are contained in A, i.e.,
$$\mathrm{i}(A) := \bigcup_{\alpha} O_\alpha, \quad O_\alpha \subset A, \ O_\alpha \in \mathscr{T}. \tag{1.2.4}$$
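On a finite topological space, Definitions 8 and 9 can be evaluated directly by running over all open sets. The sketch below does this, and also forms the boundary as the closure minus the interior. The topology T on X = {1, 2, 3} and the function names are hypothetical choices made only for illustration.

```python
def closure(A, X, T):
    """Intersection of all closed sets that contain A (Definition 8)."""
    X, A = frozenset(X), frozenset(A)
    result = X
    for O in T:
        C = X - frozenset(O)      # closed sets are complements of open sets
        if A <= C:
            result &= C
    return result

def interior(A, T):
    """Union of all open sets that are contained in A (Definition 9)."""
    A = frozenset(A)
    result = frozenset()
    for O in T:
        O = frozenset(O)
        if O <= A:
            result |= O
    return result

X = {1, 2, 3}
T = [set(), {1}, {1, 2}, X]       # a hypothetical topology on X
A = {2}

A_closure = closure(A, X, T)      # closed sets here: X, {2, 3}, {3}, empty set
A_interior = interior(A, T)       # only the empty set is an open subset of {2}
A_boundary = A_closure - A_interior
print(sorted(A_closure), sorted(A_interior), sorted(A_boundary))
```

In this example the closure of {2} is {2, 3} and its interior is empty, so the set is neither open nor closed with respect to T.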
4 What is more consistent with our intuition is the concept called “arcwise connected”. There are
subtle differences between this and the concept of connected (see the first footnote of Sect. 5.2).
1.3 Compactness [Optional Reading] 13
Proof (a), (b) are easy to prove. (c) can be proved as follows: X − Ȧ = X − [ Ā −
i(A)] = (X − Ā) ∪ i(A), where we used the conclusion of Exercise 1.2 in the last
step. Since Ā is closed, X − Ā is open. In addition, i(A) is open; hence
X − Ȧ is open. Therefore, Ȧ is closed.
The definition below will be used in the whole of Sect. 1.3 and the beginning of
Chap. 2:
Definition 2 A ⊂ X is said to be compact if any of its open covers has a finite subcover.
Proof Suppose {Oα } is an arbitrary open cover of A, then there exists at least one element
of {Oα } (denoted by Oα1 ) that satisfies x ∈ Oα1 . Hence, {Oα1 } (as a subset of {Oα }) is
an open cover of A ≡ {x}, and thus {Oα } has a finite subcover.
Proof Let N represent the set of natural numbers, then {(1/n, 2) | n ∈ N} is an open cover
of A that does not have a finite subcover.
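The argument can be made concrete under one loudly stated assumption: the statement being proved is not reproduced here, and the sketch below assumes that A is the half-open interval (0, 1]. Under that assumption every point of A lies in some interval (1/n, 2), but any finite subfamily misses the point 1/m, where m is the largest index used. The function names are illustrative.

```python
from fractions import Fraction

# ASSUMPTION: A = (0, 1]; the text above does not state what A is.

def covering_index(x):
    """For x in (0, 1], return an n such that x lies in (1/n, 2)."""
    x = Fraction(x)
    n = int(1 / x) + 1            # n > 1/x, hence 1/n < x
    assert Fraction(1, n) < x < 2
    return n

def uncovered_point(indices):
    """Return a point of (0, 1] that is missed by the finite family
    {(1/n, 2) : n in indices}, or None if no such point is found."""
    m = max(indices)
    x = Fraction(1, m)            # x lies in (0, 1]
    covered = any(Fraction(1, n) < x < 2 for n in indices)
    return None if covered else x

print(covering_index(Fraction(1, 3)))
print(uncovered_point([1, 5, 12]))
```

Exact rational arithmetic is used so that the comparison 1/n < x is not affected by rounding.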
Proof Omitted.
Remark 1 Do not think that closed sets are necessarily compact (even R has noncompact
closed subsets; the reader should try to find an example). Compactness and closedness are
closely related, but not equivalent. Their relationship is shown in the following two theorems.
Remark 2 Almost all of the common topological spaces (such as Rn ) are T2 spaces. The
indiscrete topological space is an example of a non-T2 space. Hawking and Ellis (1973)
(pp. 13–14) provided an example that is “closer to practical use”.
Proof The theorem obviously holds when A = ∅, and thus we suppose A ≠ ∅ below. All
we have to prove is that X − A ∈ T ; to prove this we only have to show that ∀x ∈ X − A,
∃O ∈ T such that x ∈ O ⊂ X − A (see Theorem 1.2.1). Since X is a T2 space, when x is
given, ∀y ∈ A, ∃O y , G y ∈ T such that x ∈ O y , y ∈ G y and O y ∩ G y = ∅ (see Fig. 1.7).
Varying y over A yields two sets of subsets {G y | y ∈ A} and {O y | y ∈ A}. It is easy to see that
{G y | y ∈ A} is an open cover of A. The compactness of A assures that it must contain a finite
subcover {G y1 , . . . , G yn }. Let O ≡ O y1 ∩ · · · ∩ O yn , then we have: ① O ∈ T ; ② x ∈ O; ③
O ∩ A = ∅ (the proof is left as an exercise), i.e., O ⊂ X − A. Thus, from Theorem 1.2.1
we know that X − A ∈ T , and hence A is closed.
Proof (A) Suppose A is compact. (a) Since R is a T2 space, from Theorem 1.3.2 we know that
A is a closed set. (b) {(−n, n) | n ∈ N} is an open cover of A; the compactness of A assures
that for this open cover there exists a finite subcover {(−1, 1), (−2, 2), . . . , (−m, m)}, i.e.,
A ⊂ (−1, 1) ∪ (−2, 2) ∪ · · · ∪ (−m, m) = (−m, m), and thus A is bounded.
(B) Suppose A is a bounded closed set. The boundedness assures that ∃M ∈ R such
that A ⊂ [−M, M]. From Theorem 1.3.1 we know that [−M, M], being a subset
of (R, Tu ), is compact. Let C ≡ [−M, M] and use S to represent the topology induced on
C by Tu , then it can be proved that (C, S ) is also compact (exercise). Regarding (C, S )
Proof Suppose {Oα } is an arbitrary open cover of f [A]. The continuity of f assures that
f −1 [Oα ] is open, and thus { f −1 [Oα ]} is an open cover of A. Since A is compact, there
exists a finite subcover { f −1 [O1 ], . . . , f −1 [On ]}; thus, {O1 , . . . , On } is a finite subcover
of {Oα } that covers f [A]. Therefore, f [A] ⊂ Y is compact.
From Theorem 1.3.5 we can obtain a corollary: homeomorphisms preserve the compactness
of subsets.
Example 4 Compactness, connectedness, and the property of T2 are all topological properties.
Boundedness is not a topological property; for example, although an interval (a, b) is
homeomorphic to R, the former is bounded while the latter is unbounded. From this it can
also be seen that length is not a topological property either.
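One explicit homeomorphism between (a, b) and R can be written with the tangent function. The sketch below is only one possible choice (the text does not specify a particular map); the names `to_line` and `to_interval` are illustrative.

```python
import math

def to_line(x, a, b):
    """Map x in the open interval (a, b) to R. The map is continuous and
    strictly increasing, and its inverse is to_interval."""
    return math.tan(math.pi * ((x - a) / (b - a) - 0.5))

def to_interval(y, a, b):
    """Inverse map from R back to (a, b), also continuous."""
    return a + (b - a) * (math.atan(y) / math.pi + 0.5)

a, b = 0.0, 3.0
samples = [0.1, 1.5, 2.9]
images = [to_line(x, a, b) for x in samples]
round_trip = [to_interval(y, a, b) for y in images]
print(images)
print(round_trip)
```

Points near the ends of (a, b) are sent arbitrarily far out on the real line, which is why boundedness and length are not preserved even though the topology is.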
Proof Omitted.
Proof This is a corollary of Theorem 1.3.7 and previous theorems (Rn is the Cartesian
product of n copies of R).
5 Strictly speaking, to get the conclusion that A is compact from Theorem 1.3.3 we should generalize
this theorem slightly as follows (its proof is similar to the original theorem): suppose C is a compact
subset of a topological space (X, T ), A ⊂ C and A is a closed subset of (X, T ), then A must be
compact.
Definition 7 x ∈ X is called the limit of a sequence {xn } if for any open neighborhood O
of x there exists N ∈ N such that xn ∈ O ∀n > N . If x is the limit of {xn }, then we say {xn }
converges to x.
Remark 4 x is the limit of {xn } ⇒ x is an accumulation point of {xn }, but not vice versa.
One of the conditions in the following theorem involves the concept “second countable”. A
set with a finite number of elements is called a finite set; otherwise, it is called an infinite set.
For a finite set one can always number its elements and count them one by one, so a finite set
must be a countable set. However, an infinite set is not necessarily uncountable; for example,
N is a countable infinite set. Finite sets are simpler than infinite sets, and countable infinite
sets are simpler than uncountable infinite sets. A topological space (X, T ) is said to be
second countable if T has a countable subset, {O1 , . . . , O K } ⊂ T or {O1 , . . .} ⊂ T ,
such that any O ∈ T can be expressed as a union of elements from {O1 , . . . , O K } [or
{O1 , . . .}]. For example, (Rn , Tu ) is second countable since Tu has a countable subset such
that any O ∈ Tu can be expressed as a union of elements from this subset. (This countable
subset consists of the open balls whose centers have rational natural coordinates and whose
radii are rational.)
Proof Omitted.
Exercises
˜1.4. Determine whether each of the following statements is true or false, and give
a brief explanation:
Reference
Hawking, S. W. and Ellis, G. F. R. (1973), The Large Scale Structure of Space-Time, Cambridge
University Press, Cambridge.
Chapter 2
Manifolds and Tensor Fields
Physics cannot be done without a background space. For example, classical mechan-
ics and electrodynamics study the time evolution of matter and electromagnetic fields
in R3 , statistical physics and Hamiltonian theory often use phase spaces, special rel-
ativity has R4 as its spacetime background, etc. Colloquially, these spaces are all
“continuous” rather than consisting of discrete points. The spacetime of general rel-
ativity is also a “continuous 4-dimensional space”, which locally looks like R4 , yet
is not necessarily R4 . However, the meaning of the word “continuous” is not yet
clear. “Differentiable manifold” (or “manifold” for short) is the accurate term used
for all kinds of “continuous spaces” with differential structures. Rn is the simplest
n-dimensional manifold. Roughly speaking, differentiable manifolds are topological
spaces with differential structures, which look locally like Rn , but globally may be
different from Rn . The precise definition is as follows:
Definition 1 A topological space M is called an n-dimensional differentiable
manifold, or n-dimensional manifold for short, if M has an open cover {Oα }, i.e.,
$M = \bigcup_{\alpha} O_\alpha$ (see Definition 11 of Sect. 1.2), satisfying
(a) for each Oα ∃ a homeomorphism ψα : Oα → Vα (Vα is an open subset of Rn
measured by the usual topology);
(b) If Oα ∩ Oβ ≠ ∅, then the composite map $\psi_\beta \circ \psi_\alpha^{-1}$ (see Fig. 2.1) is C ∞
(smooth).1
1 Definition 1 is the general definition of a smooth manifold. In this text, and usually in physics,
manifolds also satisfy the following additional conditions: as a topological space, M is Hausdorff and
second countable (for both see Sect. 1.3). From now on, our manifolds will satisfy these conditions.
© Science Press 2023 19
C. Liang and B. Zhou, Differential Geometry and General Relativity,
Graduate Texts in Physics, https://doi.org/10.1007/978-981-99-0022-0_2
20 2 Manifolds and Tensor Fields
$$x'^1 = \phi^1(x^1, \dots, x^n), \ \dots, \ x'^n = \phi^n(x^1, \dots, x^n).$$
2 The manifold in Definition 1 is short for smooth manifold. If C ∞ in condition (b) is changed to C r (r
is a natural number), it becomes the definition of a C r manifold.
2.1 Differentiable Manifolds 21
has been chosen as the differential structure, so that we can perform any coordinate
transformation.
A significant difference between differentiable manifolds and topological spaces
is that the former has differential structures in addition to topological structures.
Therefore, for a map between two manifolds we can not only talk about whether
it is continuous, but also whether it is differentiable, or even if it is C ∞ . Suppose
M and M′ are two manifolds whose dimensions are n and n′ and whose atlases are
{(Oα , ψα )} and {(O′β , ψ′β )}, respectively, and f : M → M′ is a continuous map
(see Fig. 2.2). ∀ p ∈ M, choose any coordinate system (Oα , ψα ) such that p ∈ Oα ,
and any coordinate system (O′β , ψ′β ) such that f ( p) ∈ O′β , then $\psi'_\beta \circ f \circ \psi_\alpha^{-1}$ is a map
from Vα ≡ ψα [Oα ] (or an open subset of Vα ) to $\mathbb{R}^{n'}$. Thus, this map corresponds to
n′ functions of n variables, and their C r -differentiability can be used to define the
C r -differentiability of f : M → M′ .
Definition 3 f : M → M′ is called a C r map if ∀ p ∈ M, the n′ functions of n
variables corresponding to the map $\psi'_\beta \circ f \circ \psi_\alpha^{-1}$ are of class C r .
Remark 2 Since charts in the same atlas are compatible, the definition above is
independent of the choice of (Oα , ψα ) and (O′β , ψ′β ).
Definition 4 Differentiable manifolds M and M′ are said to be diffeomorphic to each
other if ∃ f : M → M′ satisfying (a) f is one-to-one and onto; (b) f and f −1 are
C ∞ . Such an f is called a diffeomorphism from M to M′ .
Remark 3 ① Being a diffeomorphism is the highest requirement one can impose on a
map of manifolds (if there are additional structures imposed on these manifolds then
it is another matter); manifolds that are diffeomorphic to each other can be considered
to be equivalent. ② A necessary condition for two manifolds to be diffeomorphic to
each other is that they have the same dimension. ③ In Definition 1, ψ : Oα → Vα was
required to be a homeomorphism instead of a diffeomorphism since a diffeomorphism
is a relationship between manifolds, and we did not have the concept of manifold
yet. But now that Definition 4 has been introduced, one may naturally ask: if we
treat Oα and Vα in Definition 1 as manifolds, is ψα a diffeomorphism? The answer
is affirmative, and motivated readers should try to verify this. From this, one can
further their understanding of the statement “a manifold M looks locally like Rn ”.
2.2 Tangent Vectors and Tangent Vector Fields 23
First we review the definition of a vector space (i.e., linear space) in linear algebra.
Definition 1 A vector space over the field of real numbers is a set V together with
two maps, namely V × V → V (called addition) and R × V → V (called scalar
multiplication), satisfying the following conditions:
(a) v1 + v2 = v2 + v1 , ∀v1 , v2 ∈ V ;
(b) (v1 + v2 ) + v3 = v1 + (v2 + v3 ) , ∀v1 , v2 , v3 ∈ V ;
(c) ∃ a zero element 0 ∈ V such that 0 + v = v , ∀v ∈ V ;
(d) α1 (α2 v) = (α1 α2 )v , ∀v ∈ V, α1 , α2 ∈ R;
(e) (α1 + α2 )v = α1 v + α2 v , ∀v ∈ V, α1 , α2 ∈ R;
(f) α(v1 + v2 ) = αv1 + αv2 , ∀v1 , v2 ∈ V, α ∈ R;
(g) 1 · v = v , 0 · v = 0 , ∀v ∈ V .
We will also often denote the zero element of V as 0; that is, the symbol 0 stands
for both 0 ∈ R and 0 ∈ V . The reader should be able to identify its meaning by the
context or equation.
Algebraically speaking, any set that satisfies Definition 1 is called a vector space,
and any element of it is called a vector. Suppose p is a point in 3-dimensional
Euclidean space, and V p is the collection of straight line segments (or arrows v) that
start at p with all possible directions and lengths. Define the addition of two arrows
as adding them by the parallelogram law, and define the scalar multiplication α v
(∀α ∈ R, v ∈ V p ) as the manipulation that preserves the direction of the arrow (or
turns into the opposite direction when α < 0) while multiplying its length by |α|,
then V p is a vector space according to Definition 1, and hence each arrow starting at
p is a vector. We want to generalize this kind of concept of vectors to an arbitrary
manifold M; that is, we want to define infinitely many vectors at each point p of
M, the collection of which forms a vector space at p. Since “straight line segment”,
“direction” and “length” are not defined on a general manifold (yet), the way we
defined vectors in terms of arrows cannot be carried over to general manifolds. To
generalize, we should pick the essential property of an arrow that is
easiest to generalize. Suppose v is an arrow at an arbitrary point p in R3 , then
we can take the directional derivative of an arbitrary C ∞ function f on R3 along v,
and the value of this derivative function at p is a real number. Thus, v is a map that
turns f into a real number. Let FR3 represent the collection of all smooth functions
on R3 , then f ∈ FR3 , and hence v is a map from FR3 to R, i.e., v : FR3 → R.
Since the manipulation of taking the directional derivative is linear and satisfies the
Leibniz rule, we finally have found the essential property of an arrow v that can be
easily generalized: it is a linear map from FR3 to R that satisfies the Leibniz rule.
Generalizing this to an arbitrary point p of an arbitrary manifold M, we arrive at the
following definition:
Definition 2 A map v : F M → R is called a vector at a point p ∈ M if ∀ f, g ∈
F M , α, β ∈ R we have
(a) (Linearity) v(α f + βg) = αv( f ) + βv(g);
(b) (Leibniz rule) v( f g) = f | p v(g) + g| p v( f ), where f | p stands for the value
of the function f at p, which can also be denoted by f ( p).
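The two conditions of Definition 2 can be checked numerically for the motivating example, the directional derivative along an arrow in R3. The sketch below approximates the derivative by central differences, so the identities hold only up to a small numerical error; the point p, the arrow a and the test functions f and g are arbitrary illustrative choices.

```python
import math

def directional_derivative(f, p, a, h=1e-5):
    """Central-difference approximation of d/ds f(p + s*a) at s = 0."""
    plus = [pi + h * ai for pi, ai in zip(p, a)]
    minus = [pi - h * ai for pi, ai in zip(p, a)]
    return (f(plus) - f(minus)) / (2 * h)

p = [1.0, 2.0, 0.5]        # a point of R^3 (illustrative)
a = [0.3, -1.0, 2.0]       # an arrow at p (illustrative)

def v(f):
    """The map f -> real number defined by the arrow a at p."""
    return directional_derivative(f, p, a)

f = lambda x: x[0] * x[1] + math.sin(x[2])
g = lambda x: math.exp(x[0]) * x[2] ** 2
alpha, beta = 2.0, -3.0

# (a) linearity: v(alpha f + beta g) = alpha v(f) + beta v(g)
lin_lhs = v(lambda x: alpha * f(x) + beta * g(x))
lin_rhs = alpha * v(f) + beta * v(g)

# (b) Leibniz rule: v(f g) = f(p) v(g) + g(p) v(f)
leib_lhs = v(lambda x: f(x) * g(x))
leib_rhs = f(p) * v(g) + g(p) * v(f)

print(lin_lhs, lin_rhs)
print(leib_lhs, leib_rhs)
```

Nothing in the check uses the length or direction of the arrow as geometric notions; only the action on functions enters, which is the feature that carries over to a general manifold.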
v(g) = v( f + g) = v( f ) + v(g) ,
and hence v( f ) = 0.
stands for a function of n variables F(x 1 , . . . , x n ) rather than a scalar field f . Thus,
(2.2.1) can be shortened as
$$X_\mu(f) := \left.\frac{\partial f(x)}{\partial x^\mu}\right|_p, \quad \forall f \in \mathscr{F}_M, \tag{2.2.1'}$$
Proof (A) Define the addition, scalar multiplication and zero element according to
the following three equations; it is not hard to verify that V p satisfies Definition 1,
and hence is a vector space.
(1) (v1 + v2 )( f ) := v1 ( f ) + v2 ( f ), ∀ f ∈ F M , v1 , v2 ∈ V p ;
(2) (αv)( f ) := α · v( f ), ∀ f ∈ F M , v ∈ V p , α ∈ R;
(3) Define the zero element 0 ∈ V p to satisfy 0( f ) = 0, ∀ f ∈ F M .
(B) Choose an arbitrary coordinate system whose coordinate patch contains p,
then (2.2.1) defines n vectors X μ at p, where μ = 1, . . . , n. We want to show that they
are linearly independent. Suppose n real numbers $\alpha^\mu$ (μ = 1, . . . , n) are such that
$\alpha^\mu X_\mu = 0$. (In this text we adopt the Einstein summation convention; that is, repeated
indices are assumed to be summed over; here $\alpha^\mu X_\mu$ is short for $\sum_{\mu=1}^{n} \alpha^\mu X_\mu$.) Since
the coordinates $x^\nu$ (ν = 1, . . . , n) can be treated as functions on the coordinate patch,
both sides of this equation should give us the same result when applied to $x^\nu$. According
to the definition of the zero element 0 [(3) of (A) in this proof], the action of the
right-hand side yields
$$0(x^\nu) = 0, \tag{2.2.2a}$$
while the action of the left-hand side yields
$$\alpha^\mu X_\mu(x^\nu) = \alpha^\mu \left.\frac{\partial x^\nu}{\partial x^\mu}\right|_p = \alpha^\mu \delta^\nu{}_\mu = \alpha^\nu, \tag{2.2.2b}$$
where we used (2.2.1) in the first step, and $\delta^\nu{}_\mu$ is defined as
$$\delta^\nu{}_\mu \equiv \begin{cases} 1, & \mu = \nu; \\ 0, & \mu \neq \nu. \end{cases}$$
Comparing (2.2.2a) and (2.2.2b), we can see that $\alpha^\nu = 0$, ν = 1, . . . , n. Therefore,
$X_1, \dots, X_n$ are linearly independent, and thus $\dim V_p \geq n$.
(C) It remains to show that ∀v ∈ V p , we have
$$v = v^\mu X_\mu, \tag{2.2.3}$$
where
$$v^\mu = v(x^\mu) \tag{2.2.3'}$$
[this is the tricky step; see Wald (1984) p. 16 for a proof]. Equation (2.2.3) indicates
that any element of V p can be expressed linearly in terms of these $X_\mu$, and (2.2.3′)
says that its coefficients are the real numbers given by the action of v on $x^\mu$. The
combination of (B) and (C) indicates that $\{X_1, \dots, X_n\}$ is a basis of V p , and therefore
$\dim V_p = n$.
Theorem 2.2.3 Suppose $\{x^\mu\}$ and $\{x'^\nu\}$ are two coordinate systems, the intersection
of their coordinate patches is non-empty, p is a point in the intersection, v ∈ V p , and $\{v^\mu\}$
and $\{v'^\nu\}$ are the coordinate components of v in these two systems, then
$$v'^\nu = \left.\frac{\partial x'^\nu}{\partial x^\mu}\right|_p v^\mu. \tag{2.2.4}$$
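Proof We first derive the relationship between the two coordinate bases $\{X_\mu\}$ and $\{X'_\nu\}$
at p. From the definition of $\{X_\mu\}$ we can see that ∀ f ∈ F M ,
$$X_\mu(f) = \left.\frac{\partial f(x)}{\partial x^\mu}\right|_p, \qquad X'_\nu(f) = \left.\frac{\partial f'(x')}{\partial x'^\nu}\right|_p,$$
where f (x) and f ′(x′) denote the functions of n variables obtained by combining f with the
two coordinate systems. By the chain rule,
$$X_\mu(f) = \left.\frac{\partial f'(x')}{\partial x'^\nu}\right|_p \left.\frac{\partial x'^\nu}{\partial x^\mu}\right|_p = \left.\frac{\partial x'^\nu}{\partial x^\mu}\right|_p X'_\nu(f).$$
The equation above indicates that the maps $X_\mu$ and $(\partial x'^\nu/\partial x^\mu)|_p X'_\nu$ are equivalent,
i.e.,
$$X_\mu = \left.\frac{\partial x'^\nu}{\partial x^\mu}\right|_p X'_\nu. \tag{2.2.5}$$
Substituting this into $v = v^\mu X_\mu = v'^\nu X'_\nu$ gives
$$v^\mu \left.\frac{\partial x'^\nu}{\partial x^\mu}\right|_p X'_\nu = v'^\nu X'_\nu,$$
and (2.2.4) follows since the $X'_\nu$ are linearly independent.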
Equation (2.2.4) is called the vector (components) transformation law; many books
use this equation as the definition of a vector.
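As an illustration of the transformation law (2.2.4), take the natural coordinates (x, y) of R2 as the unprimed system and polar coordinates (r, θ) as the primed system. This pair of systems, the point p and the vector v are illustrative choices, not an example from the text. The sketch applies the Jacobian to the components and cross-checks the result against (2.2.3′), i.e., against the action of v on the coordinate functions r and θ, approximated by central differences.

```python
import math

def jacobian_polar(x, y):
    """Partial derivatives of (r, theta) with respect to (x, y)."""
    r2 = x * x + y * y
    r = math.sqrt(r2)
    return [[x / r, y / r],
            [-y / r2, x / r2]]

def to_polar_components(v, p):
    """Apply (2.2.4): v'^nu = (d x'^nu / d x^mu)|_p v^mu."""
    J = jacobian_polar(*p)
    return [sum(J[nu][mu] * v[mu] for mu in range(2)) for nu in range(2)]

def act(v, p, f, h=1e-6):
    """v(f) at p for v = v^mu d/dx^mu, by central differences."""
    total = 0.0
    for mu in range(2):
        plus, minus = list(p), list(p)
        plus[mu] += h
        minus[mu] -= h
        total += v[mu] * (f(*plus) - f(*minus)) / (2 * h)
    return total

p = (1.0, 1.0)
v = [-1.0, 1.0]                       # Cartesian components at p

v_polar = to_polar_components(v, p)
r = lambda x, y: math.hypot(x, y)
theta = lambda x, y: math.atan2(y, x)
v_polar_check = [act(v, p, r), act(v, p, theta)]   # v(r), v(theta)

print(v_polar)
print(v_polar_check)
```

The chosen v is tangent to the circle through p, so its r-component vanishes and only the θ-component survives.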
Next, we introduce the definitions of a curve and its tangent vector.
Remark 4 The curve described here is closely related to the intuitive concept of a
curve, but there is also a difference. An intuitive curve usually refers to the image of
the map C : I → M above, namely a subset C[I ] of M (see Fig. 2.3), without any
parameter having been mentioned. The curve defined above refers to the map itself,
which is a “curve with a parameter”.3 Suppose the images of the maps C : I → M
and C′ : I′ → M coincide (see Fig. 2.4), then one would intuitively regard them
as the same curve; however, as long as C and C′ are different maps, according to
Definition 4, they are different curves. Nevertheless, we can say in most instances
that C and C′ are two parametrizations of “the same curve”. To be accurate, the curve
C′ : I′ → M is called a reparametrization of the curve C : I → M if ∃ an onto
map α : I → I′ satisfying (a) C = C′ ◦ α, (b) the function t′ = α(t) induced by α
has a nonvanishing derivative. Here is the explanation: from C = C′ ◦ α we have
C(t) = C′(α(t)) = C′(t′) , ∀t ∈ I .
The map α being onto assures C′[I′] = C[I ], i.e., the maps of these two curves have
the same image.4
3 However, there also exists a curve C : I → M such that its image covers the whole M, which
seems quite far from an intuitive curve.
4 The fact that α satisfies condition (b) assures that α has the property of one-to-one; by adding the
Remark 5 ① The image of a curve C is also often denoted by C(t) (instead of C[I ])
in order to indicate that the parameter of the curve is t. Note that if t is one specific
value (“dead”), then C(t) only stands for a point in the image of the curve; only
when we consider t “can run all over I ” (“alive”) does C(t) stand for the image
of the curve. We also usually refer to the image of a curve simply as a curve; the
reader should recognize by the context whether the word “curve” means the map
or its image. ② The definition of a curve is independent of the coordinate system,
and hence is absolute. However, it is convenient to express a curve explicitly with
the help of a coordinate system. Suppose (O, ψ) is a coordinate system, C[I ] ⊂ O,
then ψ ◦ C is a map from I ⊂ R to Rn , which amounts to n functions of one variable
x μ = x μ (t), μ = 1, . . . , n. These n equations are called the parametric equations,
or a parametric representation of the curve. A simple example is as follows: let
M = R2 and let {x 1 , x 2 } be the natural coordinate system of R2 , then x 1 = cos t, x 2 = sin t
are the parametric equations of a curve C : R → R2 , whose image is the unit circle in R2
centered at the origin.
{ p ∈ O | x 2 ( p) = constant, . . . , x n ( p) = constant}
can be regarded as (the image of) a curve with the parameter x 1 (changing the
constant value of x 2 , . . . , x n gives another curve), called the x 1 -coordinate line.
The x µ -coordinate line can be defined likewise.
Now let us discuss the tangent vector of a curve. Intuitively, one may think there
are infinitely many tangent vectors parallel to each other at one point of a curve.
However, if we define a curve as a map (“curve with a parameter”), then there is only
one tangent vector at one point of a curve. The definition is as follows:
Example 2 Since the x μ -coordinate line is a curve with x μ as the parameter, the
coordinate basis vector X μ at p defined in (2.2.1) is a tangent vector of the x μ -
coordinate line passing through p. Hence, it is also usually denoted by ∂/∂ x μ | p , and
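therefore (2.2.1′) can also be expressed as
$$\left.\frac{\partial}{\partial x^\mu}\right|_p (f) := \left.\frac{\partial f(x)}{\partial x^\mu}\right|_p, \quad \forall f \in \mathscr{F}_M. \tag{2.2.6'}$$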
Theorem 2.2.4 Suppose the parametric equations of a curve C(t) in a given coordinate
system are x μ = x μ (t), then the expansion of the tangent vector at an arbitrary
point on the curve in this coordinate basis gives
$$\frac{\partial}{\partial t} = \frac{\mathrm{d}x^\mu(t)}{\mathrm{d}t} \frac{\partial}{\partial x^\mu}. \tag{2.2.7}$$
That is, the coordinate components of the tangent vector ∂/∂t of the curve C(t) are the
derivatives of the parametric representation x μ (t) of C(t) in this system with respect
to t.
Proof Exercise.
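A numerical check of (2.2.7) for the unit circle x 1 = cos t, x 2 = sin t is sketched below. It assumes the usual definition of the tangent vector of a curve as the map f ↦ d( f ◦ C)/dt, which is not reproduced in this excerpt; the test function f and the parameter value t0 are arbitrary illustrative choices.

```python
import math

def C(t):
    """The curve C : R -> R^2 with parametric equations (cos t, sin t)."""
    return (math.cos(t), math.sin(t))

def f(x, y):
    """An arbitrary smooth test function on R^2 (illustrative)."""
    return x * x * y + y

def tangent_on_f(t, h=1e-6):
    """Left-hand side: d/dt of f(C(t)), by central differences."""
    return (f(*C(t + h)) - f(*C(t - h))) / (2 * h)

def expansion_on_f(t):
    """Right-hand side of (2.2.7) applied to f:
    (dx^mu/dt) * (df/dx^mu), evaluated at C(t)."""
    x, y = C(t)
    dx_dt, dy_dt = -math.sin(t), math.cos(t)
    df_dx, df_dy = 2 * x * y, x * x + 1
    return dx_dt * df_dx + dy_dt * df_dy

t0 = 0.7
print(tangent_on_f(t0), expansion_on_f(t0))
```

The agreement of the two numbers is just the chain rule, which is also the content of the exercise.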
Definition 7 Two nonzero vectors v, u ∈ V p are said to be parallel if ∃α ∈ R such
that v = αu.
From Definition 6 we can see that the tangent vectors of a curve depend on
the parametrization of the curve; there is only one vector at each point C(t0 ) of a
curve C(t) that is tangent to C(t). The reason why it intuitively seems that there
are infinitely many (parallel) tangent vectors at one point of a curve is that, in that
case, we understand a curve as being the image of the map rather than the map itself
[thereby collapsing the infinitely many curves (maps) with the same image into
one curve]. The theorem below indicates that if two curves C and C′ have the same
image, then their tangent vectors at any point are parallel.
Theorem 2.2.5 Suppose a curve C′ : I′ → M is a reparametrization of C : I →
M, then their tangent vectors at any image point have the following relation:
$$\frac{\partial}{\partial t} = \frac{\mathrm{d}t'(t)}{\mathrm{d}t} \frac{\partial}{\partial t'}. \tag{2.2.8}$$
pp. 36–37], which is a map that assigns to each t ∈ I a v ∈ VC(t) and whose domain is I
rather than the subset C[I ] of M. Thus, this vector field can be denoted by v(t). In the case
of Fig. 2.8, both “the tangent vector field on C[I ]” and “the tangent vector of C[I ] at p” are
meaningless, but “the tangent vector field T (t) along C” is meaningful. Also, we can talk
about the tangent vector T (t1 ) of C at t1 (T1 in the figure) and the tangent vector T (t2 ) of C
at t2 (T2 in the figure). In this text, we often generally use “a vector field on a curve C(t)”;
for a self-intersecting curve, this actually means the vector field along C.
[The End of Optional Reading 2.2.2]
Proof Exercise.
Remark 7 The equation above is the definition of the commutator [u, v] (as a vector
field), the definition of its value [u, v]| p at each point should be understood as
To be convinced that [u, v]| p defined by the equation above is a vector at p, one
should also show (Exercise 2.8) that it satisfies the two conditions in Definition 2.
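The commutator can be explored numerically. The sketch below assumes the standard definition [u, v]( f ) := u(v( f )) − v(u( f )), since the defining equation is not reproduced in this excerpt, and represents a vector field on R2 by its list of component functions in the natural coordinates. The fields u = ∂/∂x 1 and v = x 1 ∂/∂x 2 and the test function f are illustrative choices; derivatives are approximated by central differences.

```python
import math

def partial(f, i, h=1e-3):
    """Return the function x -> (df/dx^i)(x), by central differences."""
    def df(x):
        plus, minus = list(x), list(x)
        plus[i] += h
        minus[i] -= h
        return (f(plus) - f(minus)) / (2 * h)
    return df

def apply_field(components, f):
    """Return the function v(f) = v^mu * df/dx^mu for the vector field
    whose component functions are given in `components`."""
    def vf(x):
        return sum(c(x) * partial(f, i)(x) for i, c in enumerate(components))
    return vf

def commutator(u, v, f):
    """Return the function [u, v](f) = u(v(f)) - v(u(f))  (assumed definition)."""
    uvf = apply_field(u, apply_field(v, f))
    vuf = apply_field(v, apply_field(u, f))
    return lambda x: uvf(x) - vuf(x)

u = [lambda x: 1.0, lambda x: 0.0]      # u = d/dx^1
v = [lambda x: 0.0, lambda x: x[0]]     # v = x^1 d/dx^2
e1 = [lambda x: 1.0, lambda x: 0.0]     # coordinate basis field d/dx^1
e2 = [lambda x: 0.0, lambda x: 1.0]     # coordinate basis field d/dx^2

f = lambda x: math.sin(x[0]) * x[1] ** 2 + x[0] * x[1]
point = [0.4, 1.3]

# For these u and v one finds [u, v] = d/dx^2 by hand.
value = commutator(u, v, f)(point)
expected = 2 * math.sin(point[0]) * point[1] + point[0]   # df/dx^2 at point
basis_value = commutator(e1, e2, f)(point)

print(value, expected, basis_value)
```

The second-order derivative terms cancel in the difference, which is why the result acts on f as a first-order operator; the coordinate basis fields commute because mixed partial derivatives are equal.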
Proof Suppose f (x) is the function of n variables that comes from the combination
of f and this coordinate system. From calculus we know that
$$\frac{\partial}{\partial x^\mu} \frac{\partial}{\partial x^\nu} f(x) = \frac{\partial}{\partial x^\nu} \frac{\partial}{\partial x^\mu} f(x)$$
$$\frac{\mathrm{d}x^\mu(t)}{\mathrm{d}t} = v^\mu(x^1(t), \dots, x^n(t)), \quad \mu = 1, \dots, n,$$
where v μ is the μth component of v in this coordinate basis field, which is a given
function of x 1 , . . . , x n . From calculus we know that there exists a unique solution for
this system of equations under given initial conditions x μ (0) (μ = 1, . . . , n). When
a point p is given, a set of initial conditions is given, namely x μ (0) = x μ | p ; hence,
there must be a unique solution x 1 (t), . . . , x n (t), and the curve described by these
n functions is the integral curve we want. It should also be verified that the curve
obtained in this way is independent of the coordinate system; the proof is left to the
reader.
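The system of ordinary differential equations above can also be integrated numerically. The following sketch uses a classical fourth-order Runge–Kutta step (a numerical technique that is not part of the text) for the illustrative vector field v = −x 2 ∂/∂x 1 + x 1 ∂/∂x 2 on R2 , whose integral curve through p = (1, 0) is known in closed form to be (cos t, sin t).

```python
import math

def rk4_step(v, x, dt):
    """One classical Runge-Kutta step for dx^mu/dt = v^mu(x)."""
    k1 = v(x)
    k2 = v([xi + 0.5 * dt * ki for xi, ki in zip(x, k1)])
    k3 = v([xi + 0.5 * dt * ki for xi, ki in zip(x, k2)])
    k4 = v([xi + dt * ki for xi, ki in zip(x, k3)])
    return [xi + dt * (a + 2 * b + 2 * c + d) / 6
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]

def integral_curve(v, p, t_end, steps=1000):
    """Approximate C(t_end) for the integral curve of v with C(0) = p."""
    x = list(p)
    dt = t_end / steps
    for _ in range(steps):
        x = rk4_step(v, x, dt)
    return x

# components of the illustrative field in the natural coordinates
v = lambda x: [-x[1], x[0]]
p = [1.0, 0.0]                      # initial condition x^mu(0) = x^mu|_p

end = integral_curve(v, p, 1.0)
print(end, [math.cos(1.0), math.sin(1.0)])
```

Changing the initial point p changes the initial conditions and hence selects a different integral curve of the same field.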
[Optional Reading 2.2.3]
The word “unique” in Theorem 2.2.8 should be understood as “locally unique”. Suppose you
have found an integral curve C : (a, b) → M of v, and 0 ∈ (a, b), C(0) = p. Your friend
can always choose a smaller interval (a′, b′) ⊂ (a, b) that contains 0 and define a new curve
C′ : (a′, b′) → M as C′(t) = C(t) ∀t ∈ (a′, b′). The domains of the maps C and C′ are not
the same, and hence C′ ≠ C. In this sense, the integral curves which pass through p are
not unique. However, C is nothing but an extension of C′ ; they are locally the same, and
Theorem 2.2.8 holds as long as we interpret “unique” as “locally unique”.
Speaking of extension, one may naturally ask: Is it always possible to extend the domain
of an integral curve C from (a, b) to the entirety of R? The answer is negative. The following
simple example is helpful for understanding this. Let x1 and x2 be the natural coordinates of
R2 , define a curve C : R → R2 as C(t) := (0, t) ∈ R2 ∀t ∈ R. The image of this curve is
[X μ , X ν ] = 0, μ, ν = 1, . . . , n ,
then there must exist a coordinate system {x μ } whose coordinate patch O ⊂ N contains p, and on
O we have X μ = ∂/∂ x μ , μ = 1, . . . , n. See Wald (1984) Exercise 5 in Chap. 2 for a hint of the
proof of this theorem. For a complete proof see Spivak (1970) Vol. I, pp. 219–220.
the x 2 -coordinate axis in R2 . It is not difficult to see that this is the integral curve of the vector
field ∂/∂ x 2 passing through (0, 0). If we cut out the “upper half” of R2 and regard the rest
of it as a manifold M, or more precisely, define M as M := {(x 1 , x 2 ) ∈ R2 | x 2 < 1}, then
the map C has no image for t ≥ 1, and its domain is just the open interval (−∞, 1) instead of
R. This domain cannot be extended anymore, and thus the map C : (−∞, 1) → M is called
an inextensible integral curve of the vector field ∂/∂ x 2 on M. Therefore, Theorem 2.2.8 can
also be expressed as:
We will use some basic knowledge of group theory below, so we introduce the
following definition as a supplement (see Appendix G in Volume II for a detailed
introduction of the theory of Lie groups and Lie algebras):
Definition 12 A group is a set G together with a map G × G → G (called the
group multiplication, the product of elements g1 and g2 is denoted by g1 g2 ), that
satisfies the following conditions:
(a) (g1 g2 )g3 = g1 (g2 g3 ), ∀g1 , g2 , g3 ∈ G;
(b) ∃ an identity element e such that eg = ge = g, ∀g ∈ G;
(c) ∀g ∈ G, ∃ an inverse element g −1 ∈ G such that g −1 g = gg −1 = e.
Symmetry has a great significance for physics, and group theory is a powerful tool
for the study of symmetry. If an object is invariant under a certain transformation, then
we say it has a symmetry under this transformation. Take Fig. 2.9 as an example;
consider a moving point on a charged plane that is translating along the x- (or
y-) axis. Since the surface charge density σ at the moving point is invariant under
translation, we say σ has translational symmetry along the x- (or y-) axis. More
precisely, the translational symmetry of σ along the x-axis means that the function
σ (x, y, z) satisfies
σ (x, y, z) = σ (x + a, y, z) , ∀a ∈ R , (2.2.10)
x → x + a , y → y , z → z (2.2.11)
is called a translation along the x-axis. Suppose G is the collection of all the
translations along the x-axis, then an element in G can be characterized by a real
number a, denoted by φa ∈ G. Consider p ≡ (x, y, z) and q ≡ (x + a, y, z) as two
points in R3 , then the transformation (2.2.11) corresponds to the map φa : R3 →
R3 [satisfying φa ( p) = q], which is a diffeomorphism. Moreover, if we define the
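multiplication for G as the composition of maps, $\phi_a \phi_b := \phi_a \circ \phi_b$ (so that $\phi_a \phi_b = \phi_{a+b}$),
then G forms a group (φ0 is the identity, and φ−a is the inverse of φa ). Each of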
the infinitely many elements of this group can be characterized by a real number a,
which is therefore called a parameter, and G is called a one-parameter group.
Also, since each group element φa ∈ G is a diffeomorphism on R3 , we also call G a
one-parameter group of diffeomorphisms on R3 . To help the reader understand the
definition of a one-parameter group of diffeomorphisms, let us set the stage for it first.
Suppose M is a manifold, then R × M is a manifold that has one more dimension
than M (see the last paragraph of Sect. 2.1). Suppose φ is a map from R × M to M
(i.e., φ : R × M → M), then it can turn a real number t ∈ R and a point p ∈ M into
a point φ(t, p) ∈ M. We can also visualize φ as a machine with two slots, denoted
by φ(•, •); in order to produce an “end product” φ(t, p) ∈ M, one has to input two
“raw materials”, namely, a real number t ∈ R and a point p ∈ M. If we input t ∈ R
alone, then what it can produce is only a semi-manufacture φ(t, •), which is also a
machine that gives an “end product” after we input p ∈ M. φ(t, •) is usually denoted
by φt , i.e., φt : M → M. On the other hand, if we input p ∈ M to φ(•, •) first, we get
a semi-manufacture φ(•, p), which is also a machine waiting for t ∈ R to be input.
φ(•, p) is usually denoted by φ p , i.e., φ p : R → M.
Definition 13 A C ∞ map φ : R × M → M is called a one-parameter group of
diffeomorphisms on M if
(a) φt : M → M is a diffeomorphism ∀t ∈ R;
(b) φt ◦ φs = φt+s , ∀t, s ∈ R.
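The "machine with two slots" picture and the two conditions of Definition 13 can be tried out numerically. The sketch below is only an illustration under an assumed example that does not come from the text: M is taken to be the plane R², and φ(t, p) rotates the point p by the angle t. The helper names `phi_t`, `phi_p` and `close` are likewise hypothetical.

```python
import math
from functools import partial

def phi(t, p):
    """Assumed example map phi: R x M -> M, rotating p = (x, y) in R^2 by the angle t."""
    x, y = p
    return (x * math.cos(t) - y * math.sin(t),
            x * math.sin(t) + y * math.cos(t))

def phi_t(t):
    """Fill only the first slot: phi(t, .) is a map M -> M."""
    return partial(phi, t)

def phi_p(p):
    """Fill only the second slot: phi(., p) is a curve R -> M."""
    return lambda t: phi(t, p)

def close(p, q, tol=1e-12):
    """Compare two points of R^2 up to rounding error."""
    return all(abs(a - b) < tol for a, b in zip(p, q))

p = (1.0, 2.0)
t, s = 0.7, -1.9

# Condition (b): phi_t o phi_s = phi_{t+s}
group_law_holds = close(phi_t(t)(phi_t(s)(p)), phi_t(t + s)(p))

# phi_0 is the identity and phi_{-t} undoes phi_t, so every phi_t is invertible
identity_holds = close(phi_t(0.0)(p), p)
inverse_holds = close(phi_t(-t)(phi_t(t)(p)), p)
```

Here `phi_p(p)` traces out the circle through p, which is the orbit of p under this particular group.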
that it is located on the integral curve that passes through p, the difference of whose
parameter and the parameter of p is t.] So it looks like we can obtain a one-parameter
group of diffeomorphisms. However, the following problem may occur: the image
point does not exist for certain parameters of a curve (cutting out a region M could
make this situation happen); therefore, we can only say that a smooth vector field
on M gives rise to a one-parameter local group of diffeomorphisms, see Optional
Reading 2.2.4.
[Optional Reading 2.2.4]
Suppose the integral curve C of a vector field v that passes through p has a range of parameters
that cannot reach all of R (see the second paragraph of Optional Reading 2.2.3), namely
∃t ∈ R such that C(t) is not a point of M, then φt defined above is not even a map from M to
M [at least the image point φt ( p) does not exist], so it is clearly not a diffeomorphism from M
to M. However, it can be proved that ∀ p0 ∈ M, one can always find an open neighborhood
U of p0 and an open interval I in R that contains 0 to make the map φ in the text above
meaningful when restricted to I × U (i.e., there exists a map φ : I × U → M). The precise
definition is ∀t ∈ I , φt : U → M is a map that maps any p ∈ U to a point on the integral
curve that passes through p, with t the difference between the parameters of p and this point
(the reader may understand this with the help of the simple example in the second paragraph
of Optional Reading 2.2.3). Moreover, it can also be proved that φ : I × U → M has the
following properties:
(a) ∀t ∈ I , φt : U → φt [U ] is a diffeomorphism;
(b) If t, s, t + s ∈ I (real numbers t, s and t + s are all in the open interval I ), then φt ◦ φs =
φt+s .
Such a {φt |t ∈ I } is called a one-parameter local group of diffeomorphisms or a one-
parameter family of diffeomorphisms.
A vector field is said to be complete if the range of the parameter of every (inextensible) inte-
gral curve is R. Obviously, each complete smooth vector field can produce a one-parameter
group of diffeomorphisms. It can be proved that any smooth vector field on a compact manifold is
complete. [References for this optional reading: Hawking and Ellis (1973) p. 27; Straumann
(1984) pp. 21–22.]
[The End of Optional Reading 2.2.4]
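As a hedged illustration of an incomplete vector field, consider the assumed example v = x² d/dx on M = R (this example is not from the text). Its integral curve through x₀ is x(t) = x₀/(1 − x₀t), which exists only while 1 − x₀t > 0; for x₀ = 1 the parameter range is (−∞, 1). The hypothetical function `local_flow` below returns `None` whenever the image point does not exist, and the composition rule is checked only where both sides are defined.

```python
def local_flow(t, x):
    """phi_t(x) for the assumed field v = x^2 d/dx on R, or None if the
    integral curve through x does not reach parameter value t."""
    denom = 1.0 - t * x
    if denom <= 0.0:
        return None          # the curve has run off to infinity before t
    return x / denom

# The integral curve through x = 1 is defined only for t < 1.
inside = local_flow(0.75, 1.0)     # defined
outside = local_flow(1.0, 1.0)     # not defined

# Local group property: phi_t o phi_s = phi_{t+s} when t, s, t+s are all admissible.
step_by_step = local_flow(0.5, local_flow(0.25, 1.0))
in_one_go = local_flow(0.75, 1.0)
```

Because some integral curves have a parameter range smaller than R, this field is not complete and only yields a one-parameter local group.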
6 When talking about dual vectors (and tensors, see Sect. 2.4) in the future, unless stated otherwise,
we will always assume V is a finite dimensional vector space on R.
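Proof Define addition, scalar multiplication and the zero element for V ∗ as follows:
$$(\omega_1 + \omega_2)(v) := \omega_1(v) + \omega_2(v), \qquad (\alpha\omega)(v) := \alpha\,\omega(v), \qquad \underline{0}(v) := 0, \qquad \forall v \in V,\ \alpha \in \mathbb{R}.$$
It is not difficult to see that such a V ∗ is a vector space. Suppose {eμ } is a basis of V ,
we can define n special elements e1∗ , . . . , en∗ in V ∗ using the following equation:
$$e^{\mu *}(e_\nu) := \delta^\mu{}_\nu, \qquad \mu, \nu = 1, \dots, n. \tag{2.3.2}$$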
The equation above only defines the action of eμ∗ on the basis vectors in V , but since
the action of eμ∗ is linear, it actually defines the action of eμ∗ on an arbitrary element
in V . Now we only have to show that {eμ∗ } is a basis of V ∗ . It is easy to show that
e1∗ , . . . , en∗ are linearly independent (exercise). ∀ω ∈ V ∗ , let
$$\omega_\mu \equiv \omega(e_\mu), \qquad \mu = 1, \dots, n, \tag{2.3.3}$$
then
$$\omega = \omega_\mu e^{\mu *}. \tag{2.3.4}$$
(Hint: the equation above is an equality of dual vectors. Since the action of a dual vector on
a vector is linear, to prove this equation we only have to verify that both sides of it
acting on any basis vector eν give the same real number.) Equation (2.3.4) indicates
that any element in V ∗ can be expressed linearly in terms of {eμ∗ }, and thus {eμ∗ }
is a basis of V ∗ , called the dual basis to the basis {eμ }, from which we find that
dim V ∗ = dim V .
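The construction in this proof can be checked on a small case. The following sketch assumes V = R² with a non-orthogonal basis chosen purely for illustration; dual vectors are stored as rows of components in the standard basis, and the helper names are hypothetical. The rows of the inverse of the matrix whose columns are the basis vectors give the dual basis, since that is exactly the requirement that the μth dual basis vector sends the νth basis vector to δ^μ_ν.

```python
def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("the vectors do not form a basis")
    return [[d / det, -b / det], [-c / det, a / det]]

def pair(omega, v):
    """Action of a dual vector (row of components) on a vector."""
    return sum(o * x for o, x in zip(omega, v))

basis = [(1.0, 0.0), (1.0, 1.0)]                              # e_1, e_2 (assumed example)
E = [[basis[nu][i] for nu in range(2)] for i in range(2)]     # columns are e_nu
dual_basis = inverse_2x2(E)                                   # row mu holds e^{mu*}

# Defining property of the dual basis: e^{mu*}(e_nu) = delta^mu_nu
delta = [[pair(dual_basis[mu], basis[nu]) for nu in range(2)] for mu in range(2)]

# Expansion (2.3.3)-(2.3.4): omega_mu = omega(e_mu), omega = omega_mu e^{mu*}
omega = (3.0, -2.0)
omega_components = [pair(omega, e) for e in basis]
rebuilt = tuple(sum(omega_components[mu] * dual_basis[mu][i] for mu in range(2))
                for i in range(2))
```

Rebuilding ω from its components returns the original dual vector, which is the content of (2.3.4) in this example.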
Review. Two vector spaces are said to be isomorphic if there exists a one-to-one and
onto linear map between them (this map is called an isomorphism). A necessary
and sufficient condition for two finite-dimensional vector spaces to be isomorphic is
that they have the same dimension.
2.3 Dual Vector Fields 39
Remark 2 The reader may be used to writing matrix elements as Aνμ , here we write
them as Aν μ . The reason for distinguishing the upper and lower indices is to make the
summation crystal clear (an upper ν together with a lower ν implies the summation
over ν) and to distinguish the type of a tensor (see Sect. 2.4 for details). However, what
is important in matrix operations is just distinguishing the left and right indices.
Therefore, if you want, you may change all the upper indices to lower indices for
now; for instance, (2.3.6) may be written as $e'^{\mu *} = (\tilde{A}^{-1})_{\nu\mu}\, e^{\nu *}$.
Proof All we have to prove is that both sides of the equation give the same result
when applied to eα . The proof is as follows:
where we used the linearity of the action by a dual vector on a vector in the second
equality, the definitions of a transpose matrix and inverse matrix in the third and
fifth equality respectively, and the definition of dual vector basis (2.3.2) in the sixth
equality.
The discussions above all pertain to algebras; now we get back to a manifold M.
Since p ∈ M has a vector space V p , it also has a V p∗ . If we assign a dual vector at
each point of M (or A ⊂ M), we obtain a dual vector field on M (or A). A dual
vector field ω on M is said to be smooth if ω(v) ∈ F M ∀ smooth vector fields v.
Suppose f ∈ F M , let us show that f naturally induces a dual vector field on M,
denoted by d f . (The d f that our readers are familiar with stands for the differential
of a function f . From the perspective of differential geometry, the differential of f is
essentially a dual vector field. Optional Reading 2.3.1 will introduce the connection
between this brand new understanding and classical calculus.) To define d f we only
have to give the definition of its value d f | p ∈ V p∗ at any point p of M, and to define
d f | p we only have to specify the real number that comes from its action on an
arbitrary vector v ∈ V p at p. This number should be related to both f and v, and the
most natural (simplest) real number that can be constructed from f and v is v( f );
therefore, we define d f | p as
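$$\mathrm{d}f|_p(v) := v(f), \qquad \forall v \in V_p. \tag{2.3.7}$$
Now suppose {x μ } is a coordinate system with coordinate patch O and p ∈ O. Taking f to be the coordinate function x μ and v to be the coordinate basis vector ∂/∂ x ν | p in (2.3.7) gives
$$\mathrm{d}x^\mu|_p\!\left(\left.\frac{\partial}{\partial x^\nu}\right|_p\right) = \left.\frac{\partial x^\mu}{\partial x^\nu}\right|_p = \delta^\mu{}_\nu .$$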
Comparing this with (2.3.2) we can see that {dx μ | p } is exactly the dual coordinate
basis that corresponds to the coordinate basis {∂/∂ x ν | p }. The equation above holds
at any point of O. Therefore, just like ∂/∂ x ν is the νth coordinate basis vector field
on O, dx μ is the μth dual coordinate basis vector field on O, and {dx μ } is a dual
coordinate basis field on O. Any dual vector field ω on O can be expanded in terms
of {dx μ }:
ω = ωμ dx μ , (2.3.9)
where ωμ are called the coordinate components of ω in this coordinate system whose
expression can be obtained from (2.3.3) as
ωμ = ω(∂/∂ x μ ) . (2.3.10)
$$\mathrm{d}f = \frac{\partial f(x)}{\partial x^\mu}\,\mathrm{d}x^\mu, \qquad \forall f \in \mathscr{F}_O. \tag{2.3.11}$$
Proof All we have to prove is that we obtain the same result after applying
both sides of this equation to any coordinate basis vector ∂/∂ x ν , which is very
straightforward.
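The definition df|_p(v) := v(f) and the expansion (2.3.11) can be compared numerically. The sketch below assumes M = R² with natural coordinates, the sample function f = x²y, and a central-difference approximation of derivatives; all of these choices and the helper names are illustrative assumptions rather than part of the text.

```python
def f(x, y):
    """Sample scalar field (an assumption for illustration)."""
    return x * x * y

def v_of_f(func, p, v, h=1e-6):
    """v(f): derivative of f along the curve t -> p + t v at t = 0,
    approximated by a central difference."""
    fwd = func(p[0] + h * v[0], p[1] + h * v[1])
    bwd = func(p[0] - h * v[0], p[1] - h * v[1])
    return (fwd - bwd) / (2 * h)

def df_components(func, p, h=1e-6):
    """(df)_mu = df(d/dx^mu) = partial f / partial x^mu, cf. (2.3.10) and (2.3.11)."""
    return [v_of_f(func, p, e, h) for e in ((1.0, 0.0), (0.0, 1.0))]

p = (1.0, 2.0)
v = (3.0, -1.0)

direct = v_of_f(f, p, v)                                              # df|_p(v) by definition
expanded = sum(c * vi for c, vi in zip(df_components(f, p), v))       # (df)_mu v^mu
```

Both numbers approximate 2xy·3 + x²·(−1) = 11 at p = (1, 2).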
Theorem 2.3.4 Suppose the coordinate patches of the coordinate systems $\{x^\mu\}$ and
$\{x'^\nu\}$ have an intersection, and ω is a dual vector at an arbitrary point p in the
intersection, then the transformation relation between $\omega_\mu$ and $\omega'_\nu$, the components
of ω in these two coordinate systems, is
$$\omega'_\nu = \left.\frac{\partial x^\mu}{\partial x'^\nu}\right|_p \omega_\mu. \tag{2.3.12}$$
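The transformation law (2.3.12) can be tested on a familiar pair of coordinate systems. This sketch assumes the unprimed system is the Cartesian {x, y} on the plane, the primed system is polar {r, θ} with x = r cos θ, y = r sin θ, and ω = df for f = x² + y²; these choices and the function names are illustrative assumptions.

```python
import math

def cartesian_components(x, y):
    """Components (omega_x, omega_y) of omega = df for f = x^2 + y^2."""
    return (2.0 * x, 2.0 * y)

def to_polar_components(omega_xy, r, theta):
    """Apply omega'_nu = (partial x^mu / partial x'^nu) omega_mu with x' = (r, theta)."""
    dx_dr, dy_dr = math.cos(theta), math.sin(theta)
    dx_dtheta, dy_dtheta = -r * math.sin(theta), r * math.cos(theta)
    omega_r = dx_dr * omega_xy[0] + dy_dr * omega_xy[1]
    omega_theta = dx_dtheta * omega_xy[0] + dy_dtheta * omega_xy[1]
    return (omega_r, omega_theta)

r, theta = 2.0, 0.6
x, y = r * math.cos(theta), r * math.sin(theta)
omega_polar = to_polar_components(cartesian_components(x, y), r, theta)
# In polar coordinates f = r^2, so df = 2r dr and the components should be (2r, 0).
```

The result agrees with computing df directly in polar coordinates, as the theorem requires.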
v at p shows exactly “how far along which direction” a moving point would potentially
“move” from p, we can let d f | p actually “become” a real number (increment) by assigning
a v ∈ V p . And since d f | p evaluates to a real number when v is given, d f | p is actually a map
from the tangent space V p of p to R. To ensure that d f | p has the properties of differential
from classical calculus, this map is also required to be linear. Thus, d f | p is a dual vector on
V p while d f is a dual vector field on O. This is the most concrete and precise interpretation
of d f .
Physicists usually do not make any distinction between d f and Δf , and like to say
“d f | p equals f (q) − f ( p), where q is a point infinitely close to p.” They may even sketch
two points p and q on paper. In fact, p and q cannot be infinitely close as long as they have
been assigned (marked out in a picture), which means f (q) − f ( p) is not an infinitesimal
quantity, and hence it can only be Δf instead of d f . However, since certain approximations
are always allowed in physics, treating Δf as being small enough that it approximates d f
is not only allowed, but often quite useful. In fact, suppose a curve C(t) satisfies C(0) = p,
(∂/∂t)| p = v, and q = C(α) with α small enough, then from (2.3.7) and (2.2.6′) we can see
that the result of d f | p acting on αv is
$$\mathrm{d}f|_p(\alpha v) = \alpha\, v(f) = \alpha \lim_{\Delta t \to 0} \frac{1}{\Delta t}\{ f[C(\Delta t)] - f[C(0)] \} \cong \alpha\, \frac{1}{\alpha}\,[ f(q) - f(p) ] = f(q) - f(p) \equiv \Delta f,$$
and we see that (after acting on αv) d f | p really gives us Δf approximately. Albert Einstein
once said: “As far as the laws of mathematics refer to reality, they are not certain; and as far as
they are certain, they do not refer to reality.” As a physics book, this text also approximates
d f as Δf in multiple places.
[The End of Optional Reading 2.3.1]
$$T : \underbrace{V^* \times \cdots \times V^*}_{k\ \text{terms}} \times \underbrace{V \times \cdots \times V}_{l\ \text{terms}} \to \mathbb{R}.$$
Remark 1 T can be likened to a machine with k “upper slots” and l “lower slots”.
So long as we input k dual vectors and l vectors into the upper and lower slots,
respectively, this machine produces a real number which depends linearly on
each of the inputs (this is the meaning of a “multilinear map”).
Example 1 (1) A dual vector on V is a tensor of type (0, 1) on V . (2) An element of
V can be regarded as a tensor of type (1, 0) on V . (This is because v can be identified
as v ∗∗ , and v ∗∗ is a linear map from V ∗ to R.)
From now on, we will use TV (k, l) to represent the collection of all tensors of
type (k, l) on V ; thus, V = TV (1, 0), V ∗ = TV (0, 1).
Suppose T ∈ TV (1, 1), then T : V ∗ × V → R. However, T can also be viewed
as another type of map. Since ∀ω ∈ V ∗ , v ∈ V , we have T (ω; v) ∈ R, so T (ω; •) is
a machine with only a lower slot that can turn a vector linearly into a real number,
which means that T (ω; •) is a dual vector on V , i.e., T (ω; •) ∈ V ∗ . After T is given,
we can create T (ω; •) with one ω ∈ V ∗ ; hence, T can also be viewed as a map (and it
is linear) that turns a dual vector ω into a dual vector T (ω; •), i.e., $T : V^* \xrightarrow{\text{linearly}} V^*$.
Similarly, we can also view T as $T : V \xrightarrow{\text{linearly}} V$. These three viewpoints for the
same T ∈ TV (1, 1) are equivalent. For expositional convenience, we call this way of
viewing the same tensor as different maps “the multifaceted view of tensors”. Being
able to have a “multifaceted view” is one of the advantages of defining tensors as
maps. We will use this frequently in the future.
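The three viewpoints can be made concrete once components are chosen. The sketch below assumes dim V = 2 and stores a type (1, 1) tensor through its components T^μ_ν in some basis (the numbers are arbitrary); the function names are hypothetical.

```python
T = [[1, 2], [3, 4]]   # components T^mu_nu (row: upper index, column: lower index)

def pair(omega, v):
    """Action of a dual vector on a vector: omega_mu v^mu."""
    return sum(o * x for o, x in zip(omega, v))

def T_full(omega, v):
    """Viewpoint 1: T as a map V* x V -> R, T(omega; v) = omega_mu T^mu_nu v^nu."""
    return sum(omega[m] * T[m][n] * v[n] for m in range(2) for n in range(2))

def T_on_dual(omega):
    """Viewpoint 2: T as a map V* -> V*, omega -> T(omega; .), components omega_mu T^mu_nu."""
    return [sum(omega[m] * T[m][n] for m in range(2)) for n in range(2)]

def T_on_vector(v):
    """Viewpoint 3: T as a map V -> V, v -> T(. ; v), components T^mu_nu v^nu."""
    return [sum(T[m][n] * v[n] for n in range(2)) for m in range(2)]

omega, v = (1, -1), (2, 5)
number = T_full(omega, v)
via_dual = pair(T_on_dual(omega), v)       # fill the upper slot first, then the lower one
via_vector = pair(omega, T_on_vector(v))   # fill the lower slot first, then the upper one
```

All three routes produce the same real number, which is the sense in which the viewpoints are equivalent.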
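Definition 2 The tensor product T ⊗ T′ of a tensor T of type (k, l) and a tensor
T′ of type (k′, l′) on V is a tensor of type (k + k′, l + l′) defined as follows:
$$T \otimes T'(\omega^1, \dots, \omega^k, \omega^{k+1}, \dots, \omega^{k+k'};\ v_1, \dots, v_l, v_{l+1}, \dots, v_{l+l'}) := T(\omega^1, \dots, \omega^k;\ v_1, \dots, v_l)\; T'(\omega^{k+1}, \dots, \omega^{k+k'};\ v_{l+1}, \dots, v_{l+l'}).$$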
In Euclidean vector field theory, a dyadic v u is actually the tensor product of two
vectors v and u simply with the symbol ⊗ being omitted.7
Do tensor products satisfy the commutative law? Suppose ω ∈ V ∗ , v ∈ V ≡ V ∗∗ ,
then v ⊗ ω ∈ TV (1, 1), ω ⊗ v ∈ TV (1, 1). It follows from Definition 2 that ∀μ ∈ V ∗
and u ∈ V we have v ⊗ ω(μ; u) = v(μ)ω(u) = ω(u)v(μ) = ω ⊗ v(μ; u) [where
v(μ) should be interpreted as v ∗∗ (μ)], and hence v ⊗ ω = ω ⊗ v. However, the
tensor product of two vectors (or two dual vectors) usually becomes another tensor
after exchanging the order, i.e., v ⊗ u ≠ u ⊗ v, ω ⊗ μ ≠ μ ⊗ ω. For instance, a
dyadic in Euclidean space does not satisfy the commutative law.
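A small component computation shows the non-commutativity for two vectors. The sketch assumes dim V = 2 and arbitrary sample components; `tensor_product` and `evaluate` are hypothetical helper names.

```python
def tensor_product(a, b):
    """Components of a ⊗ b for two vectors: (a ⊗ b)^{mu nu} = a^mu b^nu."""
    return [[x * y for y in b] for x in a]

def evaluate(components, first, second):
    """Feed two dual vectors into the two upper slots:
    sum over mu, nu of C^{mu nu} first_mu second_nu."""
    return sum(components[m][n] * first[m] * second[n]
               for m in range(len(first)) for n in range(len(second)))

v, u = (1, 2), (3, 4)
vu = tensor_product(v, u)   # [[3, 4], [6, 8]]
uv = tensor_product(u, v)   # [[3, 6], [4, 8]]

omega, mu = (1, 0), (0, 1)  # two dual vectors used as test inputs
```

Since both slots of v ⊗ u are of the same kind, swapping the factors changes which input each factor acts on; swapping the inputs as well restores the original value.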
Theorem 2.4.1 TV (k, l) is a vector space, with $\dim \mathscr{T}_V(k, l) = n^{k+l}$.
Proof (A) We define the addition, scalar multiplication and zero element in a natural
way and make TV (k, l) a vector space (see the first part of the proof of Theorem 2.3.1).
(B) Show that there are $n^{k+l}$ basis vectors. Take n = 2, k = 2, l = 1 as an example
(it is not difficult to prove this in the general case). Suppose {e1 , e2 } is a basis of V ,
and {e1∗ , e2∗ } is its dual basis. All we have to prove is that the following 8 elements
form a basis of TV (2, 1):
$$e_\mu \otimes e_\nu \otimes e^{\sigma *}, \qquad \mu, \nu, \sigma = 1, 2.$$
One can first show that they are linearly independent (left as an exercise), and then
show that any T ∈ TV (2, 1) can be expressed as
7 Similarly, |ψ⟩|φ⟩ in quantum mechanics is also a tensor product of |ψ⟩ and |φ⟩ simply with the
symbol ⊗ being omitted. However, in quantum mechanics the vector space of |ψ⟩ is an infinite
dimensional vector space on C, which is more complicated than the finite dimensional vector spaces
on R which we are discussing. For details see Appendix B in Volume II.
T = T μν σ eμ ⊗ eν ⊗ eσ ∗ , (2.4.1)
where
T μν σ = T (eμ∗ , eν∗ ; eσ ) . (2.4.2)
The proof is left as an exercise. [NB: The equation to be proved, i.e., (2.4.1), is a
tensor equation of type (2, 1).]
where in the first and fourth steps we used (2.4.2), in the second step we used Theorem 2.3.2, and in the third step we used the linearity of T . As a result, we have
the matrix equation $T' = A^{-1} T A$ (where T′, A and T all represent matrices. T
sometimes represents a tensor and sometimes represents a matrix; the reader should
interpret it by the context). Thus we can see that T′ and T are two similar matrices.
Using $T'^\mu{}_\mu$ (short for $\sum_{\mu=1}^{n} T'^\mu{}_\mu$) and $T^\rho{}_\rho$ to represent the traces of T′ and T , then
from (2.3.4) we get
$$T'^\mu{}_\mu = (A^{-1})^\mu{}_\rho\, T^\rho{}_\sigma\, A^\sigma{}_\mu = A^\sigma{}_\mu\, (A^{-1})^\mu{}_\rho\, T^\rho{}_\sigma = \delta^\sigma{}_\rho\, T^\rho{}_\sigma = T^\rho{}_\rho .$$
This shows that a tensor of type (1, 1) has the same trace in different bases. When
we are considering tensors we should pay attention to the features that do not depend
on the basis, and the trace of a tensor of type (1, 1) is exactly one of these features,
which is usually called the contraction of T , denoted by CT for now; namely,
CT := T μ μ = T (eμ∗ ; eμ ) . (2.4.4)
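The basis independence of the trace is easy to confirm with exact arithmetic. The sketch below assumes n = 2 and uses arbitrary sample matrices for the components of T and for the change of basis A; the helper names are hypothetical.

```python
from fractions import Fraction

def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def inverse_2x2(m):
    """Exact inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("A must be invertible")
    return [[d / det, -b / det], [-c / det, a / det]]

def trace(m):
    """Sum of the diagonal components, i.e. T^mu_mu."""
    return sum(m[i][i] for i in range(len(m)))

T = [[1, 2], [3, 4]]     # components T^rho_sigma in the old basis
A = [[2, 1], [1, 1]]     # an arbitrary invertible change-of-basis matrix
T_new = matmul(inverse_2x2(A), matmul(T, A))   # T' = A^{-1} T A
```

The individual components change under the similarity transformation, while the contraction T^μ_μ does not.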
And now we discuss the contraction of a tensor T of type (2, 1). T can be denoted
by T (• , • ; •); it has two upper slots and one lower slot, and thus there are two
possible contractions: ① The contraction on the first upper slot and the lower slot
$C^1_1 T := T(e^{\mu *}, \bullet\,; e_\mu)$; ② The contraction on the second upper slot and the lower
slot $C^2_1 T := T(\bullet, e^{\mu *}; e_\mu)$. If we define these two contractions using another basis
$\{e'_\rho\}$ and denote them by $(C^1_1 T)'$ and $(C^2_1 T)'$, respectively, then it is easy to show
that (Exercise 2.14) $(C^1_1 T)' = C^1_1 T$, $(C^2_1 T)' = C^2_1 T$. From the “multifaceted view
of tensors” we can see that both $C^1_1 T$ and $C^2_1 T$ are tensors of type (1, 0), whose
components in any basis can be expressed in terms of the components of T in this basis
as $(C^1_1 T)^\nu = T(e^{\mu *}, e^{\nu *}; e_\mu) = T^{\mu\nu}{}_\mu$ and $(C^2_1 T)^\nu = T^{\nu\mu}{}_\mu$ (the summation symbol
has been omitted). It is not difficult to generalize the discussion above and give a
definition for the contraction of a tensor of type (k, l) as follows:
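Definition 3 The contraction on the ith upper index (i ≤ k) and the jth lower index
( j ≤ l) of T ∈ TV (k, l) is defined as
$$C^i_j T := T(\bullet, \dots, e^{\mu *}, \bullet, \dots;\ \bullet, \dots, e_\mu, \bullet, \dots), \tag{2.4.5}$$
where $e^{\mu *}$ occupies the ith upper slot and $e_\mu$ the jth lower slot (summation over μ is implied).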
Remark 3 ① $C^i_j T$ does not depend on the choice of a basis. ② It can be easily
seen from (2.4.5) that any contraction of a tensor of type (k, l) is a tensor of type
(k − 1, l − 1). ③ One can construct all kinds of new tensors using tensor products
in conjunction with contractions. For example, suppose v ∈ V , ω ∈ V ∗ , then v ⊗ ω
is a tensor of type (1, 1), while C(v ⊗ ω) is a tensor of type (0, 0) (a scalar).
Later, we will frequently encounter the operation of contracting after taking the tensor
product, whose result can be considered as the action of a tensor on
a vector (or a dual vector). As examples, here we write out three equations and then
prove them.
We will only give the proof of (2.4.7), and the other two equations are left as exercises.
T ⊗ v on the left-hand side of (2.4.7) is a tensor of type (1, 2), which is a machine
with 1 upper slot and 2 lower slots, and can be expressed as T ⊗ v(• ; • , •); hence,
$$C^1_2(T \otimes v) = T \otimes v(e^{\mu *}; \bullet, e_\mu).$$
Thus (2.4.7) is equivalent to
$$T \otimes v(e^{\mu *}; \bullet, e_\mu) = T(\bullet, v). \tag{2.4.7'}$$
Seeing that this is an equality of dual vectors, we only have to show that both sides
give the same real number when applied to any u ∈ V :
(where we used the result of Exercise 2.11 in the fourth equality), and thus we have
(2.4.7).
Apart from the three equalities above, there are many similar ones. Those equal-
ities represent the following rule: “The action of T on ω (or v) is contracting after
taking the tensor product of T and ω (or v)”, or roughly speaking, “the action is
contracting after taking product”. The manipulation of contracting after taking the
tensor product of two tensors is also usually called contraction for short, and thus
the expression above can even be simplified as “action means contraction”.
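The rule "action means contraction" can be verified in components for the case treated in (2.4.7), where T is of type (0, 2) and v is a vector. The sketch assumes dim V = 2 and arbitrary sample components; the function names are hypothetical.

```python
def tensor_product_T_v(T, v):
    """Components of T ⊗ v with one upper index c and two lower indices a, b:
    S[c][a][b] = v^c T_ab."""
    n = len(v)
    return [[[v[c] * T[a][b] for b in range(n)] for a in range(n)] for c in range(n)]

def contract_upper1_lower2(S):
    """C^1_2: sum over mu of S[mu][a][mu], leaving one lower index a."""
    n = len(S)
    return [sum(S[m][a][m] for m in range(n)) for a in range(n)]

def act_second_slot(T, v):
    """Components of the dual vector T(., v): T_ab v^b."""
    n = len(v)
    return [sum(T[a][b] * v[b] for b in range(n)) for a in range(n)]

T = [[1, 2], [3, 4]]   # components T_ab
v = [5, 6]             # components v^b

via_contraction = contract_upper1_lower2(tensor_product_T_v(T, v))
via_action = act_second_slot(T, v)
```

Both computations return the components of the same dual vector T(•, v).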
Now we return to a manifold M. The collection of all tensors of type (k, l) on
the tangent space V p of an arbitrary point p in M is denoted by TV p (k, l). Suppose
{eμ } and {eν∗ } are an arbitrary basis of V p and its dual basis, respectively, then
T ∈ TV p (2, 1) can also be written in an expanded form similar to (2.4.1). If we
choose a coordinate system such that the coordinate patch contains p, then we can
choose the coordinate basis vectors ∂/∂ x μ and dual basis vectors dx μ to be eμ and
eμ∗ ; namely, we rewrite (2.4.1) as
$$T = T^{\mu\nu}{}_\sigma\, \frac{\partial}{\partial x^\mu} \otimes \frac{\partial}{\partial x^\nu} \otimes \mathrm{d}x^\sigma, \tag{2.4.1'}$$
where the coordinate components T μν σ can be expressed following (2.4.2) as
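$$T'^{\mu_1 \dots \mu_k}{}_{\nu_1 \dots \nu_l} = \frac{\partial x'^{\mu_1}}{\partial x^{\rho_1}} \cdots \frac{\partial x'^{\mu_k}}{\partial x^{\rho_k}}\, \frac{\partial x^{\sigma_1}}{\partial x'^{\nu_1}} \cdots \frac{\partial x^{\sigma_l}}{\partial x'^{\nu_l}}\; T^{\rho_1 \dots \rho_k}{}_{\sigma_1 \dots \sigma_l}.$$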
Proof Exercise.
Remark 4 Many textbooks adopt the above equation as the definition of a tensor.
2.5 Metric Tensor Fields 47
Remark 3 For Lorentzian metrics, there are two conventions in the literature. Definition 3 presents the first convention, in which the diagonal elements of a 4-dimensional
Lorentzian metric are (−1, 1, 1, 1) (up to a trivial reordering⁸), and the signature is
+2. In the other convention a Lorentzian metric is defined as a metric whose diagonal elements have only one +1, and thus the diagonal elements of a 4-dimensional
Lorentzian metric read (1, −1, −1, −1), and the signature is −2. This text adopts
the convention with the +2 signature.
Definition 4 There are three types of vectors in a vector space V with a Lorentzian
metric g: ① any v that satisfies g(v, v) > 0 is called a spacelike vector; ② any v that
satisfies g(v, v) < 0 is called a timelike vector; ③ any v that satisfies g(v, v) = 0
is called a lightlike vector or a null vector.
Remark 4 ① In the convention with the −2 signature, the definitions of spacelike
vectors and timelike vectors are the exact opposite: a spacelike vector is defined as
g(v, v) < 0, while a timelike vector is defined as g(v, v) > 0. Nonetheless, there is
no essential difference: a vector that is timelike in the −2 signature is also timelike
in the +2 signature, and vice versa. ② The zero vector is certainly a null vector, but
not vice versa. Many readers may only be familiar with positive definite metrics, and
may think v = 0 (the zero element) whenever g(v, v) = 0. However, if a metric is
Lorentzian, then g(v, v) = 0 does not necessarily lead to v = 0 (the zero element
is unique, while there are infinitely many null vectors). Nonzero 4-dimensional null
vectors play a significant role in relativity. For instance, it is convenient to use them
to describe the propagation of electromagnetic waves and gravitational waves.
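Definition 4 and Remark 4 translate directly into a short classification routine. The sketch assumes a 4-dimensional V and components taken in an orthonormal basis, so that g has diagonal elements (−1, 1, 1, 1); the function names are hypothetical, and `classify_minus2` mimics the other sign convention by flipping the sign of g.

```python
def g(u, v):
    """Lorentzian metric in an orthonormal basis with diagonal (-1, 1, 1, 1)."""
    return -u[0] * v[0] + sum(a * b for a, b in zip(u[1:], v[1:]))

def classify(v):
    """Classification of Definition 4 (signature +2)."""
    norm = g(v, v)
    if norm > 0:
        return "spacelike"
    if norm < 0:
        return "timelike"
    return "null"

def classify_minus2(v):
    """Same classification in the -2 convention, where the metric is -g
    and the two inequalities are interchanged."""
    norm = -g(v, v)
    if norm < 0:
        return "spacelike"
    if norm > 0:
        return "timelike"
    return "null"

samples = [(1, 0, 0, 0), (0, 1, 0, 0), (1, 1, 0, 0), (0, 0, 0, 0), (2, 1, 1, 1)]
labels = [classify(v) for v in samples]
```

The vector (1, 1, 0, 0) is a nonzero null vector, which has no counterpart for a positive definite metric.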
A metric g is a tensor of type (0, 2), which is a bilinear map from V × V to R,
so ∀v, u ∈ V we have g(v, u) ∈ R, and thus g(v, •) ∈ V ∗ . Given g, we can create
g(v, •) ∈ V ∗ for any v ∈ V , and hence g can be viewed as a linear map from V to V ∗ ,
i.e., $g : V \xrightarrow{\text{linearly}} V^*$, which is an isomorphism (the proof is left as Exercise 2.15).
Therefore, V acquires a natural, distinguished isomorphism from V to V ∗ after a
metric is assigned to it, using which we can naturally identify V and V ∗ . Summary:
V is naturally identified with V ∗∗ whether or not there is a metric; if there is a metric,
then V can also be identified with V ∗ .
Now we return to a manifold M.
Definition 5 A symmetric, everywhere non-degenerate tensor field of type (0, 2) is
called a metric tensor field.
Remark 5 In this text, we only care about metric fields each of which has a signature
that is the same everywhere.
One of the uses of a metric field is to define the arc length of a curve. First, we
discuss a 2-dimensional Euclidean space. Suppose the parametric equation of a curve
C(t) in the natural coordinate system {x, y} is x = x(t), y = y(t), then the square
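of the length of a curve segment $\mathrm{d}l^2$ [short for $(\mathrm{d}l)^2$] is
$$\mathrm{d}l^2 = \mathrm{d}x^2 + \mathrm{d}y^2 = \left[ \left( \frac{\mathrm{d}x}{\mathrm{d}t} \right)^2 + \left( \frac{\mathrm{d}y}{\mathrm{d}t} \right)^2 \right] \mathrm{d}t^2 .$$
Since dx/dt and dy/dt are the components of the tangent vector T ≡ ∂/∂t of the curve in this coordinate system, the bracket equals |T |², so that
$$\mathrm{d}l = |T|\,\mathrm{d}t, \tag{2.5.2}$$
and the length of the curve is
$$l = \int |T|\,\mathrm{d}t. \tag{2.5.3}$$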
The equation above can be generalized to any manifold M with a positive definite
metric field g. Suppose C(t) is an arbitrary $C^1$ curve on M and T is its tangent
vector, i.e., T ≡ ∂/∂t, then $|T| = \sqrt{g(T, T)}$, and hence the arc length of C(t) can
be naturally written as
$$l := \int \sqrt{g(T, T)}\,\mathrm{d}t. \tag{2.5.4}$$
For a manifold M with a Lorentzian metric field g one should pay attention to the
type of a curve before defining its arc length. If the tangent vector at each point of a
C 1 curve C(t) is spacelike, then C(t) is called a spacelike curve. Similarly, we can
define a timelike curve and a null curve. The arc lengths of spacelike and null curves
are also defined by (2.5.4) (and thus the arc length of a null curve is always zero).
Note that for a timelike curve we have g(T, T ) < 0, so the length of a segment of
the curve is defined as $\mathrm{d}l := \sqrt{-g(T, T)}\,\mathrm{d}t$. Thus, we have the following definition:
Definition 6 Suppose a manifold M has a Lorentzian metric field g, then the arc
length of a spacelike, null or timelike curve C(t) can be defined as
$$l := \int \sqrt{|g(T, T)|}\,\mathrm{d}t, \qquad \text{where } T \equiv \frac{\partial}{\partial t}. \tag{2.5.5}$$
As for the arc length of an outlandish curve that can turn from spacelike into
timelike (or the other way round), we will leave it undefined. Although the following
discussion about arc length is for the Lorentzian metrics, it also applies to positive
definite metrics (if we consider all curves as spacelike curves).
It is not difficult to show that (Exercise 2.16) the arc length of a curve is inde-
pendent of its parametrization; that is, the reparametrization (which keeps the image
unchanged and adjusts the parameter) of a curve does not change the arc length of the
curve. In addition, since the definition of arc length (Definition 6) does not involve a
coordinate system, the arc length is certainly independent of the coordinate system.
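The reparametrization independence of the arc length can be observed numerically. The sketch below assumes flat metrics in natural coordinates, a midpoint rule for the integral in (2.5.5), and central differences for the tangent vector; the curves (a quarter of the unit circle, and a straight timelike line in 2-dimensional Minkowski space) and all function names are illustrative assumptions.

```python
import math

def arc_length(curve, a, b, metric, n=20000, h=1e-6):
    """Approximate l = integral over [a, b] of sqrt(|g(T, T)|) dt (midpoint rule)."""
    dt = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * dt
        fwd, bwd = curve(t + h), curve(t - h)
        T = [(p - q) / (2 * h) for p, q in zip(fwd, bwd)]   # T^mu = dx^mu/dt
        total += math.sqrt(abs(metric(T, T))) * dt
    return total

def euclid(u, v):
    """Euclidean metric in natural coordinates."""
    return sum(a * b for a, b in zip(u, v))

def minkowski_2d(u, v):
    """2-dimensional Minkowski metric in natural coordinates (t, x), signature +... with -1 first."""
    return -u[0] * v[0] + u[1] * v[1]

def quarter_circle(t):
    return (math.cos(t), math.sin(t))

def reparametrized(s):
    """Same image as quarter_circle, with the new parameter s related by t = s^2."""
    return quarter_circle(s * s)

l1 = arc_length(quarter_circle, 0.0, math.pi / 2, euclid)
l2 = arc_length(reparametrized, 0.0, math.sqrt(math.pi / 2), euclid)

# Timelike straight line x = 0.6 t: sqrt(|-1 + 0.36|) = 0.8 per unit of t
l3 = arc_length(lambda t: (t, 0.6 * t), 0.0, 5.0, minkowski_2d)
```

Both parametrizations of the quarter circle give approximately π/2, and the timelike line has length 0.8 × 5 = 4.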
However, if the curve lies inside the coordinate patch of a coordinate system {x μ },
the arc length can also be calculated with the help of the coordinate system. Since
[In the last step, we used the fact that “the coordinate components of a tangent
vector of a curve are equal to the derivative of the parametric equation of the curve
in this system with respect to the parameter” (Theorem 2.2.4), i.e., $T^\mu = \mathrm{d}x^\mu/\mathrm{d}t$.]
the length of a line segment is
$$\mathrm{d}l = \sqrt{|g_{\mu\nu}\,\mathrm{d}x^\mu \mathrm{d}x^\nu|}. \tag{2.5.6}$$
then we can read off the components of g in this system as $g_{tt} = -x$, $g_{xx} = 1$, $g_{tx} = g_{xt} = 2$. Thus, we can see that a given line element (expression) is equivalent to the
given metric field.
Suppose C : I → M is a spacelike or timelike curve, then |T |, the length of the
tangent vector T at an arbitrary point C(t), is a function of t denoted by |T |(t). If we
assign a point C(t0 ) on the curve arbitrarily as the starting point for measuring length,
then the curve segment between C(t0 ) and C(t) has the length $l(t) = \int_{t_0}^{t} |T|(t')\,\mathrm{d}t'$,
which is an increasing function of t. Hence, l can also act as the parameter of this
curve, called the arc length parameter. From $\mathrm{d}l \equiv \sqrt{|g(T, T)|}\,\mathrm{d}t$ we can see that a
tangent vector of a curve with the arc length as its parameter satisfies |g(T, T )| = 1,
namely it has a unit length.
Definition 7 Suppose a metric field g is given on a manifold M, then (M, g) is called
a generalized Riemannian space. (If g is positive definite, it is called a Riemannian
then (Rn , δ) is called the n-dimensional Euclidean space, and δ is called the
Euclidean metric.
The equation above indicates that the components of δ in a dual coordinate basis
of the natural coordinate system are
$$\delta_{\mu\nu} = \begin{cases} 0, & \mu \neq \nu \\ +1, & \mu = \nu \end{cases} .$$
Therefore, according to (2.5.7), the expression for the line element of the Euclidean
metric in the natural coordinate system should be $\mathrm{d}s^2 = \delta_{\mu\nu}\,\mathrm{d}x^\mu \mathrm{d}x^\nu$. If n = 2, then
we have $\mathrm{d}s^2 = (\mathrm{d}x^1)^2 + (\mathrm{d}x^2)^2$. This is exactly the well-known expression for the
line element of the 2-dimensional Euclidean space. It follows from (2.5.11) that the
natural coordinate basis is orthonormal measured by the Euclidean metric, since from
However, a coordinate system that satisfies (2.5.12) is not necessarily the natural
coordinate system. For example, for 2-dimensional Euclidean space, the coordinate
system defined based on the natural coordinate system {x, y} as follows
has a basis {∂/∂ x′, ∂/∂ y′} that satisfies (2.5.12) (and thus it is orthonormal). Furthermore, it is not difficult to show that (Exercise 2.17) the coordinate bases
{∂/∂ x′, ∂/∂ y′} of {x′, y′} defined by the following three equations also satisfy
(2.5.12):
η := ημν dx μ ⊗ dx ν , (2.5.17)
then (Rn , η) is called the n-dimensional Minkowski space (also known as the n-
dimensional Minkowski spacetime in physics), and η is called the Minkowski
metric.
From Definition 10 we can see that the expression for the line element of
Minkowski space in the natural coordinate system is $\mathrm{d}s^2 = \eta_{\mu\nu}\,\mathrm{d}x^\mu \mathrm{d}x^\nu$. Take n = 4,
for example; we have $\mathrm{d}s^2 = -(\mathrm{d}x^0)^2 + (\mathrm{d}x^1)^2 + (\mathrm{d}x^2)^2 + (\mathrm{d}x^3)^2$. This is exactly
the well-known expression for the line element of 4-dimensional Minkowski space-
time. It is easy to show that
and thus the natural coordinate basis {∂/∂ x μ } is also orthonormal as measured by
the Minkowski metric. (The 0th coordinate basis vector is normalized to −1, the
others are normalized to 1). However, a basis satisfying (2.5.18) is not necessarily
the basis of the natural coordinate system. For instance, suppose t and x are the
natural coordinates for 2-dimensional Minkowski space, then the coordinate basis
{∂/∂t′, ∂/∂ x′} of
also satisfies (2.5.18). It is not difficult to verify that (Exercise 2.18) the coordinate
bases {∂/∂t′, ∂/∂ x′} of {t′, x′} defined by the following three equations also satisfy
(2.5.18):
$$\mathrm{d}l^2 = \mathrm{d}x^2 + \mathrm{d}y^2. \tag{2.5.23}$$
If a curve is a straight line, the essence of (2.5.23) is then
$$\mathrm{d}s^2 = g_{\mu\nu}\,\mathrm{d}x^\mu \mathrm{d}x^\nu. \tag{2.5.25}$$
Since dx μ and dx ν are both dual vectors, their “product” dx μ dx ν can only be the tensor
product “dx μ ⊗ dx ν ”; thus, the right-hand side of (2.5.25) is actually an abbreviation for
gμν dx μ ⊗ dx ν . However, gμν dx μ ⊗ dx ν is nothing but the expansion of the metric tensor
g in the dual coordinate basis, i.e.,
g = gμν dx μ ⊗ dx ν . (2.5.26)
On the other hand, in differential geometry, one cannot find any other interpretation for ds 2
on the left-hand side of (2.5.25); it is actually nothing but another notation for g! Therefore,
we can see that the precise meaning of (2.5.25) turns out to be the tensor equation (2.5.26).
This interpretation is accurate, but also sounds pedantic, and is hard to popularize. In
contrast, one of the important reasons why (2.5.25) is commonly used is that when using
approximations, $\mathrm{d}l^2$ can be viewed as the square of the length of a line segment, and $\mathrm{d}s^2$ is
nothing but a notation for $\mathrm{d}l^2$ (for spacelike segments) or $-\mathrm{d}l^2$ (for timelike segments). Many
equations in this section can only be understood with this interpretation of approximation.
For instance, if we insist that we use the true, precise definition from differential geometry,
then (2.5.8) should be rewritten as
$$l = \int \sqrt{\left| g_{\mu\nu}\, \frac{\mathrm{d}x^\mu}{\mathrm{d}t} \frac{\mathrm{d}x^\nu}{\mathrm{d}t} \right|}\;\mathrm{d}t, \tag{2.5.8'}$$
where t is the parameter of the curve we are talking about. Unlike (2.5.8), each symbol in
this equation has a precise meaning; for example, dx μ /dt is the μth coordinate component
of a tangent vector of the curve, while dt together with the integral sign indicate that the
variable of integration is t.
[The End of Optional Reading 2.5.2]
There are two common ways to express a tensor. The first one is using a letter without
any index (such as T ) to represent a tensor, though this has two drawbacks: ① one
cannot tell the type of a tensor; ② it is not easy to state which upper slot and which
lower slot a contraction is taken between. (The symbol $C^i_j T$ we used before is only
temporary; it is not convenient to use in computations.) The second notation is to use
the components (such as $T^{\mu\nu}{}_\rho$) to represent a tensor, and to use the equalities obeyed
by the components to represent the equalities obeyed by tensors. The equalities of
components are the equalities of numbers, and thus all of the tensor equations in the
literature using this notation are equalities of numbers. This notation can overcome
the two difficulties of the first notation; however, it has a serious disadvantage of its own:
sometimes, one can choose a special basis and obtain a relatively simple equation
relating its components, but this equation only holds for this basis, and cannot be
used to represent the tensor equation in general. We want to know which equations
can and which cannot represent tensor equations, yet this is difficult to tell in this
component notation. To overcome this problem, Roger Penrose created the “abstract
index notation”. The main points are as follows:
1. A tensor of type (k, l) is represented by a letter with k upper indices and l lower
indices. All the indices are lower-case Latin letters, which only indicate the type of
a tensor, and thus are called abstract indices. For example, $v^a$ stands for a vector,
in which the upper index a plays the same role as the arrow in $\vec{v}$ (and hence one cannot
say a = 1 or a = 2), $\omega_a$ stands for a dual vector, $T^{ab}{}_c$ stands for a tensor of type
(2, 1), and so on. $v^b$ and $v^a$ stand for the same vector (i.e., v); however, we should
pay attention to the “balance of indices” when writing an equation. For example, one
can write $\alpha u^a + v^a = w^a$ or $\alpha u^b + v^b = w^b$, but not $\alpha u^a + v^b = w^a$.
2. Repeated upper and lower indices represent the contraction between these two
indices; for example,
3. The tensor product symbol is omitted. For instance, suppose T ∈ TV (2, 1),
S ∈ TV (1, 1), then T ⊗ S can be written as T ab c S d e . In the notation without indices,
generally, ω ⊗ μ ≠ μ ⊗ ω, because when acting on (v, u), whether ω acts on u or v
depends on the order of these letters [the first letter in ω ⊗ μ acts on the first letter
in (v, u), i.e., ω acts on v]. In the abstract index notation, since repeated upper and
lower indices are assumed to be contracted, ω ⊗ μ(v, u) can be written as either
ωa μb v a u b or μb ωa v a u b [both stand for ω(v)μ(u)]. Since the acting target of both
ωa μb and μb ωa is the same v a u b , we have ωa μb = μb ωa . That is, the letters that
represent tensors can be interchanged assuming their indices travel with them. The
non-commutativity of the order of a tensor product is now manifested by ωa μb ≠
ωb μa .
4. When we are talking about the components of a tensor, the corresponding
indices are labeled by lower-case Greek letters, such as μ, ν, α, β, etc. (as we
used before). These indices are called component indices or concrete indices, and
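we can ask whether μ = 1 or μ = 2. A basis expansion of a tensor T =
T μν σ eμ ⊗ eν ⊗ eσ ∗ can now be written as
$$T^{ab}{}_{c} = T^{\mu\nu}{}_{\sigma}\,(e_\mu)^a (e_\nu)^b (e^\sigma)_c \tag{2.6.1}$$
[the lower index c of (eσ )c has already indicated that it is a dual basis vector, so there
is no need to write (eσ ∗ )c ] while T μν σ = T (eμ∗ , eν∗ ; eσ ) can now be written as
$$T^{\mu\nu}{}_{\sigma} = T^{ab}{}_{c}\,(e^\mu)_a (e^\nu)_b (e_\sigma)^c . \tag{2.6.2}$$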
Note that the indices of both (2.6.1) and (2.6.2) (whether abstract or concrete) are
“balanced”. Suppose T ∈ TV (0, 2), then T should be denoted by Tab . Let eμ be the
μth basis vector of a basis, then from (2.4.7) we can see that T (• , eμ ) = C12 (T ⊗ eμ ),
and since T ⊗ eμ should be denoted by Tab (eμ )c using the abstract index notation,
T (• , eμ ) should be denoted by Tab (eμ )b , also abbreviated as Taμ , i.e.,
$$T_{a\mu} \equiv T_{ab}\,(e_\mu)^b . \tag{2.6.3}$$
This is an expression with both abstract and component indices; we may consider
Ta1 , . . . , Tan as n dual vectors, where Taμ stands for “the μth dual vector”.
5. From the “multifaceted view of tensors”, we can see that a tensor of type
(1, 1) T a b on V can be viewed either as a linear map from V to V or a linear
map from V ∗ to V ∗ . That is, T a b acting on a vector v b ∈ V still returns a vector,
denoted by u a ≡ T a b v b ∈ V , while T a b acting on a dual vector ωa ∈ V ∗ still returns
a dual vector, denoted by μb ≡ T a b ωa ∈ V ∗ . Actually, it can be seen at a glance
from the abstract index notation that T a b v b and T a b ωa are a vector and a dual vector,
respectively. Thus, the abstract index notation is a simple and intuitive representation
of the “multifaceted view of tensors”. Using δ a b to represent the identity map from
V to V , i.e., δ a b v b := v a ∀v b ∈ V , we can easily see that it is also an identity map
from V ∗ to V ∗ , i.e., δ a b ωa = ωb ∀ωa ∈ V ∗ . It is not difficult to further show that
(exercise) the result of δ a b contracting with any tensor is substituting the upper index
b of that tensor with a (or substituting the lower index a with b), such as δ a b Tac = Tbc ,
$\delta^a{}_b T^{cb}{}_e = T^{ca}{}_e$. Suppose $\{(e_\mu)^a\}$ is a basis of V and $\{(e^\mu)_a\}$ is the dual basis; then
$$\delta^a{}_b = (e_\mu)^a (e^\mu)_b . \tag{2.6.4}$$
Both sides are tensors of type (1, 1); to prove the equality, we only need to verify that the result of each
side acting on an arbitrary vector $v^b$ is the same (exercise). Moreover, the components of $\delta^a{}_b$ in this basis,
$\delta^\mu{}_\nu \equiv \delta^a{}_b (e^\mu)_a (e_\nu)^b$, satisfy
$$\delta^\mu{}_\nu = \begin{cases} +1, & \mu = \nu , \\ 0, & \mu \neq \nu . \end{cases}$$
The proof is very simple: taking $\delta^1{}_1$
as an example, $\delta^1{}_1 = \delta^a{}_b (e^1)_a (e_1)^b = (e^1)_a (e_1)^a = 1$. Note that $\delta^0{}_0 = +1$ even for
the Lorentzian signature.
6. Since a metric g ∈ TV (0, 2), it should be denoted by gab . Suppose v ∈ V , then
g(• , v) ∈ V ∗ (see the paragraph after Example 1 in Sect. 2.4). Regarding g as the T
in (2.4.7), we get g(• , v) = C12 (g ⊗ v) = C12 (gab v c ) = gab v b ; hence, g(• , v) should
be denoted by gab v b . Further, when there is a metric g, V is identified with V ∗ under
the isomorphism g : V → V ∗ , and gab v b ≡ g(• , v) is exactly the image of v a under
this map. Hence gab v b should be identified with v a , and may just simply be denoted
by va (which can be taken as a definition of va ). That is, although mathematically
speaking v a and va are two different types of objects (a vector and a dual vector), in
application they represent the same thing (and thus both are denoted by v). Thus, we
usually write
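$$v_a = g_{ab} v^b . \tag{2.6.5}$$
Similarly, the inverse map of g : V → V ∗ is a tensor of type (2, 0), denoted by g ab , and the image of a dual vector ωb under it is denoted by
$$\omega^a = g^{ab} \omega_b . \tag{2.6.6}$$
Equations (2.6.5) and (2.6.6) indicate that one can use gab to “lower” an upper index and g ab to
“raise” a lower index. These operations of raising and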
lowering indices are applicable for any abstract index in any tensor. For instance, a
tensor T of type (1, 1) can be denoted by T a b in abstract index notation, and lowering
the index using the metric is actually performing the tensor product and contraction
between g and T to obtain a tensor of type (0, 2), g(•, eμ ) ⊗ T (eμ∗ ; •), which is
denoted by Tab in abstract index notation, i.e., Tab ≡ gac T c b .
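Point 3 above can be checked directly on components. The following is a minimal sketch in plain Python, assuming a 2-dimensional V and a fixed basis; the component values and the helper names `outer` and `act` are illustrative choices, not taken from the text. It shows that moving the letters together with their indices changes nothing, while exchanging the indices gives the components of μ ⊗ ω instead of ω ⊗ μ.

```python
# Components of two dual vectors and two vectors in some fixed basis
# (arbitrary illustrative values, not from the text).
omega = [1.0, 2.0]
mu = [3.0, 5.0]
v = [7.0, -1.0]
u = [0.5, 4.0]


def outer(p, q):
    """Components p_a q_b of the tensor product p ⊗ q."""
    return [[pa * qb for qb in q] for pa in p]


def act(t, x, y):
    """Evaluate t(x, y) = t_{ab} x^a y^b by contracting both index pairs."""
    return sum(t[a][b] * x[a] * y[b]
               for a in range(len(x)) for b in range(len(y)))


omega_mu = outer(omega, mu)   # omega_a mu_b, i.e. omega ⊗ mu
mu_omega = outer(mu, omega)   # mu_a omega_b, i.e. mu ⊗ omega

# Letters interchanged, indices travelling with them: mu_b omega_a.
letters_swapped = [[mu[b] * omega[a] for b in range(2)] for a in range(2)]

# Indices interchanged: omega_b mu_a, stored at position [a][b].
indices_swapped = [[omega[b] * mu[a] for b in range(2)] for a in range(2)]

print(letters_swapped == omega_mu)   # same tensor
print(indices_swapped == mu_omega)   # the other tensor product
print(act(omega_mu, v, u), act(mu_omega, v, u))
```

Here `act(omega_mu, v, u)` is ω(v)μ(u), whereas `act(mu_omega, v, u)` is μ(v)ω(u); the two differ for generic components.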
Using (2.6.6) and (2.6.5) in turn we have
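$$\omega^a = g^{ab}\omega_b = g^{ab}(g_{bc}\,\omega^c) , \quad \forall \omega^a \in V ,$$
and hence
$$g^{ab} g_{bc} = \delta^a{}_c . \tag{2.6.7}$$
Expanding both factors in a basis gives $g^{ab} g_{bc} = g^{\mu\nu}(e_\mu)^a (e_\nu)^b\, g_{\rho\sigma}(e^\rho)_b (e^\sigma)_c = g^{\mu\nu} g_{\nu\sigma} (e_\mu)^a (e^\sigma)_c$,
where we used $(e_\nu)^b g_{\rho\sigma} (e^\rho)_b = \delta^\rho{}_\nu\, g_{\rho\sigma} = g_{\nu\sigma}$. Comparing with $\delta^a{}_c = \delta^\mu{}_\sigma (e_\mu)^a (e^\sigma)_c$, we obtain
$$g^{\mu\nu} g_{\nu\sigma} = \delta^\mu{}_\sigma . \tag{2.6.8}$$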
The above equation indicates that the matrix formed by the components gμν of
the metric gab in any basis is invertible (whose inverse is the matrix formed by
the components g μν of the inverse metric g ab in the same basis), and thus is non-
degenerate. Therefore, the non-degeneracy of gab assures the non-degeneracy of its
matrix (gμν ) in any basis. Conversely, suppose there exist a basis {(eμ )a } and its dual
basis {(eμ )a } such that (gμν ) is non-degenerate, then (gμν ) has an inverse matrix
(g μν ). Let g ab ≡ g μν (eμ )a (eν )b , then it is easy to prove from g μν gνσ = δ μ σ that
g ab gbc = δ a c , and thus gab : V → V ∗ is non-degenerate since it has an inverse map
g ab . (The proof of “the inverse exists ⇒ non-degenerate” is left as an exercise. Hint:
gab : V → V ∗ having an inverse indicates that it is a one-to-one map, while if gab
is degenerate, then, besides the zero element, there would be a v a ≠ 0 in V whose
image is also 0 ∈ V ∗ , which contradicts the fact that gab is one-to-one.)
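The relation between the matrix (gμν ) and its inverse can be illustrated numerically. The sketch below assumes a 2-dimensional V and a hypothetical symmetric, non-degenerate component matrix; the helper names `inverse_2x2` and `contract` are made up for this example. It computes g μν , checks g μν gνσ = δ μ σ as in (2.6.8), and confirms that lowering an index and then raising it returns the original components.

```python
from fractions import Fraction

# Hypothetical components g_{mu nu} of a metric in some basis of a
# 2-dimensional V (symmetric, with nonzero determinant).
g = [[Fraction(2), Fraction(1)],
     [Fraction(1), Fraction(3)]]


def inverse_2x2(m):
    """Inverse of a 2x2 matrix; raises if the matrix is degenerate."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    if det == 0:
        raise ValueError("degenerate matrix: no inverse metric exists")
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def contract(m, x):
    """Sum over the second index of m against the single index of x."""
    return [sum(m[i][j] * x[j] for j in range(len(x))) for i in range(len(m))]


g_inv = inverse_2x2(g)                 # components g^{mu nu}

# g^{mu nu} g_{nu sigma} should be the identity matrix, cf. (2.6.8).
delta = [[sum(g_inv[i][k] * g[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]

v_upper = [Fraction(1), Fraction(2)]   # components v^mu
v_lower = contract(g, v_upper)         # v_mu = g_{mu nu} v^nu
v_back = contract(g_inv, v_lower)      # raising the index again

print(delta, v_lower, v_back)
```

Exact rational arithmetic is used so that the identity matrix is reproduced without rounding error; a degenerate matrix would trigger the `ValueError`, mirroring the statement that no inverse metric exists in that case.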
It is not difficult to see that the upper and lower indices of the components of a
tensor can be raised and lowered using the components of a metric gμν and its inverse
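g μν . For instance, we can write gμν v ν as vμ because
$$v_\mu \equiv v_a (e_\mu)^a = g_{ab} v^b (e_\mu)^a = g_{ab}\, v^\nu (e_\nu)^b (e_\mu)^a = g_{\mu\nu} v^\nu .$$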
As an example of the abstract index notation, here we introduce the abstract index
expression for the 4-dimensional Minkowski metric ηab .
The definition of the Minkowski metric (2.5.17) can be expressed in abstract index
notation as
$$\eta_{ab} := \eta_{\mu\nu} (dx^\mu)_a (dx^\nu)_b ,$$
where {(dx μ )a } is the dual basis of the Lorentzian coordinate system. If we use
{t, x, y, z} to represent {x 0 , x 1 , x 2 , x 3 }, then since the only nonzero ημν are η00 = −1
and η11 = η22 = η33 = 1, the equation above can be expressed as
$$\eta_{ab} = -(dt)_a (dt)_b + (dx)_a (dx)_b + (dy)_a (dy)_b + (dz)_a (dz)_b , \tag{2.6.9a}$$
while in a spherical coordinate system {t, r, θ, ϕ} it reads
$$\eta_{ab} = -(dt)_a (dt)_b + (dr)_a (dr)_b + r^2 (d\theta)_a (d\theta)_b + r^2 \sin^2\theta\, (d\phi)_a (d\phi)_b , \tag{2.6.9b}$$
which corresponds to the line element $ds^2 = -dt^2 + dr^2 + r^2 (d\theta^2 + \sin^2\theta\, d\phi^2)$.
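The spherical form of the Minkowski metric can be cross-checked with the tensor transformation law for components of a type (0, 2) tensor. The sketch below assumes the usual relation x = r sin θ cos ϕ, y = r sin θ sin ϕ, z = r cos θ between the two coordinate systems; the function names and the sample point are illustrative. It builds the Jacobian ∂x ρ /∂x′ μ analytically and contracts it twice with ημν .

```python
import math

# Components eta_{mu nu} in the Lorentzian system {t, x, y, z}.
ETA = [[-1.0, 0.0, 0.0, 0.0],
       [0.0, 1.0, 0.0, 0.0],
       [0.0, 0.0, 1.0, 0.0],
       [0.0, 0.0, 0.0, 1.0]]


def jacobian(r, theta, phi):
    """J[rho][mu] = d x^rho / d x'^mu with x = (t, x, y, z), x' = (t, r, theta, phi)."""
    st, ct = math.sin(theta), math.cos(theta)
    sp, cp = math.sin(phi), math.cos(phi)
    return [
        [1.0, 0.0, 0.0, 0.0],
        [0.0, st * cp, r * ct * cp, -r * st * sp],
        [0.0, st * sp, r * ct * sp, r * st * cp],
        [0.0, ct, -r * st, 0.0],
    ]


def metric_in_spherical(r, theta, phi):
    """Apply the (0, 2) transformation law: g'_{mu nu} = J^rho_mu J^sigma_nu eta_{rho sigma}."""
    J = jacobian(r, theta, phi)
    return [[sum(J[rho][mu] * J[sig][nu] * ETA[rho][sig]
                 for rho in range(4) for sig in range(4))
             for nu in range(4)] for mu in range(4)]


r, theta, phi = 2.0, 0.7, 1.1          # arbitrary sample point
g_sph = metric_in_spherical(r, theta, phi)
for row in g_sph:
    print([round(x, 12) for x in row])
```

Up to rounding, the printed matrix is diagonal with entries −1, 1, r 2 and r 2 sin2 θ, in agreement with (2.6.9b).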
In much of the literature that does not use the abstract index notation, the compo-
nent indices in a 4-dimensional spacetime and a 3-dimensional Riemannian space are
denoted by Greek letters μ, ν, . . . (each can be 0, 1, 2, 3) and Latin letters i, j, k, . . .
(each can be 1, 2, 3), respectively. According to what we mentioned previously, the
Latin indices in this text are supposed to represent abstract indices. However, in order
to distinguish the component indices of 4 dimensions and 3 dimensions, we allow
one exception: whenever we discuss a 3-dimensional Riemannian space, the Latin
letters that start from i (i, j, k, . . . ) are component indices (each can be 1, 2, 3), and
the other Latin letters (such as a, b, c, etc.) are still abstract indices. For example, a
3-dimensional vector v can be expressed as v a = v i (∂/∂ x i )a (i is summed from 1 to
3).
In the abstract index notation, a coordinate basis vector is denoted by (∂/∂ x μ )a ,
and a dual coordinate basis vector is denoted by (dx μ )a . Using a metric gab and its
inverse g ab to raise and lower their indices, respectively, we obtain a dual vector
gab (∂/∂ x μ )b and a vector g ab (dx μ )b . Denote the gab (∂/∂ x μ )b by ωa for short and
expand it using the dual coordinate basis as gab (∂/∂ x μ )b = ων (dx ν )a . Applying both
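sides to $(\partial/\partial x^\sigma)^a$ yields $g_{\sigma\mu} = \omega_\sigma$; hence,
$$g_{ab} (\partial/\partial x^\mu)^b = g_{\mu\nu} (dx^\nu)_a ,$$
and similarly
$$g^{ab} (dx^\mu)_b = g^{\mu\nu} (\partial/\partial x^\nu)^a .$$
When gab = δab (Euclidean metric) and {x μ } is a Cartesian coordinate system, these
two equations above can be simplified as
$$\delta_{ab} (\partial/\partial x^\mu)^b = (dx^\mu)_a , \qquad \delta^{ab} (dx^\mu)_b = (\partial/\partial x^\mu)^a .$$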
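$$T_{(ab)} := \frac{1}{2} (T_{ab} + T_{ba}) , \qquad T_{[ab]} := \frac{1}{2} (T_{ab} - T_{ba}) ;$$
generally, the symmetric and antisymmetric parts of a tensor $T_{a_1 \cdots a_l}$ of type (0, l) are
defined as
$$T_{(a_1 \cdots a_l)} := \frac{1}{l!} \sum_\pi T_{a_{\pi(1)} \cdots a_{\pi(l)}} , \tag{2.6.13}$$
$$T_{[a_1 \cdots a_l]} := \frac{1}{l!} \sum_\pi \delta_\pi\, T_{a_{\pi(1)} \cdots a_{\pi(l)}} , \tag{2.6.14}$$
where π represents a permutation of (1, . . . , l), π(1) stands for the first number in
the permutation described by π, $\sum_\pi$ represents the summation over all permutations,
and $\delta_\pi$ equals +1 for an even permutation and −1 for an odd permutation. For example,
$$T_{(a_1 a_2 a_3)} := \frac{1}{6} (T_{a_1 a_2 a_3} + T_{a_3 a_1 a_2} + T_{a_2 a_3 a_1} + T_{a_1 a_3 a_2} + T_{a_3 a_2 a_1} + T_{a_2 a_1 a_3}) ,$$
$$T_{[a_1 a_2 a_3]} := \frac{1}{6} (T_{a_1 a_2 a_3} + T_{a_3 a_1 a_2} + T_{a_2 a_3 a_1} - T_{a_1 a_3 a_2} - T_{a_3 a_2 a_1} - T_{a_2 a_1 a_3}) .$$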
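i.e., any term in the expansion of T(a1 ...al ) (l! terms in total) equals Ta1 ...al ; for example,
$$T_{abc} = T_{(abc)} \;\Rightarrow\; T_{abc} = T_{acb} = T_{cab} = T_{cba} = T_{bca} = T_{bac} , \tag{2.6.16}$$
i.e., any even permutation term in the expansion of T[a1 ...al ] equals Ta1 ...al , and any
odd permutation term equals −Ta1 ...al ; for example,
$$T_{abc} = T_{[abc]} \;\Rightarrow\; T_{abc} = -T_{acb} = T_{cab} = -T_{cba} = T_{bca} = -T_{bac} . \tag{2.6.18}$$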
Similar conclusions also hold for the tensors of type (k, 0) such that (the upper
indices) are totally symmetric and totally antisymmetric.
Proof We only take the case of l = 3 as an example. For the other integer values of
l, one can prove it in the same manner.
(a) From Tabc = T(abc) we have Tacb = T(acb) (the latter equation is nothing but a
result of changing the abstract indices for both sides of the former one), and since
T(acb) = T(abc) (which is manifest from the definition of T(abc) ), we have Tacb =
T(acb) = Tabc . The other equalities of the right-hand side of (2.6.16) can be proved
likewise.
(b) From Tabc = T[abc] we have Tacb = T[acb] = −T[abc] = −Tabc . The other
equalities of the right-hand side of (2.6.18) can be proved likewise.
In the future we will often deal with the computations that involve parentheses
and square brackets, and the theorem below will bring great convenience for many
computations.
Theorem 2.6.2 (a) The brackets are “contagious” in a contraction process, i.e.,
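$$T_{[a_1 \cdots a_l]} S^{a_1 \cdots a_l} = T_{[a_1 \cdots a_l]} S^{[a_1 \cdots a_l]} = T_{a_1 \cdots a_l} S^{[a_1 \cdots a_l]} , \tag{2.6.19}$$
(b) A pair of brackets inside another pair of the same kind can be dropped; for example,
$$T_{[[ab]c]} = T_{[abc]} , \quad \text{where } T_{[[ab]c]} \equiv \frac{1}{2} (T_{[abc]} - T_{[bac]}) . \tag{2.6.20}$$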
(c) A pair of brackets inside a pair of the other kind of brackets yields zero; for
example,
T[(ab)c] = 0 , T(a[bcd]) = 0 . (2.6.21)
(d) The contraction of different kinds of brackets yields zero; for example,
(e)
Similar conclusions also hold for the tensors of type (k, 0) such that the upper indices
are totally symmetric or totally antisymmetric.
Proof The proofs of (a), (b), and (c) are left as exercises. (d) is a corollary of (a) and (c),
and (e) is a corollary of (c).
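Definitions (2.6.13) and (2.6.14) and several parts of Theorem 2.6.2 can be tested on explicit components. The sketch below is only a numerical check, not a proof: it assumes a 3-dimensional V, tensors with three indices, and two arbitrary integer-valued component arrays; the names `sign`, `bracket` and `contract` are illustrative. Components are treated purely as arrays, so the upper or lower position of the indices plays no role in the arithmetic.

```python
from fractions import Fraction
from itertools import permutations, product
from math import factorial

DIM = 3    # dimension of V (illustrative choice)
RANK = 3   # number of indices l


def sign(perm):
    """Return +1 for an even permutation of (0, ..., l-1) and -1 for an odd one."""
    perm = list(perm)
    result = 1
    for i in range(len(perm)):
        while perm[i] != i:
            j = perm[i]
            perm[i], perm[j] = perm[j], perm[i]   # one transposition
            result = -result
    return result


def bracket(T, antisymmetric):
    """Totally symmetric (2.6.13) or totally antisymmetric (2.6.14) part of T."""
    out = {}
    for idx in product(range(DIM), repeat=RANK):
        total = Fraction(0)
        for perm in permutations(range(RANK)):
            term = T[tuple(idx[p] for p in perm)]
            total += sign(perm) * term if antisymmetric else term
        out[idx] = total / factorial(RANK)
    return out


def contract(T, S):
    """Full contraction of all three index pairs."""
    return sum(T[idx] * S[idx] for idx in T)


# Two generic component arrays (hypothetical data).
T = {idx: Fraction((idx[0] + 1) * (idx[1] + 2) ** 2 * (idx[2] + 3) ** 3)
     for idx in product(range(DIM), repeat=RANK)}
S = {idx: Fraction(idx[0] ** 2 - 3 * idx[1] * idx[2] + 5 * idx[2]
                   + idx[0] * idx[1] ** 2)
     for idx in product(range(DIM), repeat=RANK)}

T_sym, T_anti = bracket(T, False), bracket(T, True)
S_sym, S_anti = bracket(S, False), bracket(S, True)

# (a) brackets are "contagious" in a contraction, cf. (2.6.19)
print(contract(T_anti, S), contract(T_anti, S_anti), contract(T, S_anti))
# (d) contraction of different kinds of brackets vanishes
print(contract(T_anti, S_sym))
# antisymmetrizing a totally symmetric tensor gives zero
print(all(x == 0 for x in bracket(T_sym, True).values()))
```

Rational arithmetic keeps the comparisons exact, so the three contractions in (a) agree identically rather than only up to rounding.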
Exercises
˜2.1. Show that the homeomorphism ψi± defined in Example 2 of Sect. 2.1 sat-
isfies the compatibility condition on all the overlap regions of Oi± , which
verifies that S 1 is indeed a 1-dimensional manifold.
2.2. Deduce that an n-dimensional vector space can be regarded as an n-
dimensional trivial manifold.
2.3. Suppose X and Y are topological spaces, f : X → Y is a homeomorphism.
If X is also a manifold, define a differential structure for Y such that f :
X → Y is upgraded to a diffeomorphism.
˜2.4. Suppose x, y are the natural coordinates of R2 , C(t) is a curve whose para-
metric equations are x = cos t and y = sin t, t ∈ (0, π ). If p = C(π/3),
write down the components of the tangent vector of the curve at p in the
natural coordinate basis, and sketch this curve as well as this tangent vector.
2.5. Suppose the tangent vectors of two curves C(t) and C′(t) = C(2t0 − t) at
C(t0 ) = C′(t0 ) are v and v′, respectively. Show that v + v′ = 0.
˜2.6. Suppose O is the coordinate patch of the coordinate system {x μ }, p ∈ O,
v ∈ V p , v μ are the coordinate components of v. Regarding x μ as a C ∞
function on O, show that v μ = v(x μ ). Hint: act both sides of v = v ν X ν on
a function x μ .
2.7. Suppose M is a 2-dimensional manifold, (O, ψ) and (O , ψ ) are two coor-
dinate systems on M whose coordinates are x, y and x , y , respectively,
and the coordinate transformation on O ∩ O is x = x, y = y − x ( =
constant). Write down the expression for the expansion of ∂/∂ x and ∂/∂ y
in terms of ∂/∂ x and ∂/∂ y .
˜2.8. (a) Show that [u, v] in (2.2.9) pointwisely satisfies the two conditions in
the definition of a vector (Definition 2 in Sect. 2.2), and thus is a vector
field. (b) Suppose u, v, w are smooth vector fields on M. Show that
[[u, v], w] + [[w, u], v] + [[v, w], u] = 0 (this is called the Jacobi identity) .
˜2.9. Suppose r, ϕ are the polar coordinates on an open set (the coordinate patch)
in R2 , x and y are natural coordinates.
(a) Write down the expression for the expansion of the polar coordinate
basis ∂/∂r and ∂/∂ϕ (as vector fields on the coordinate patch) in terms of
∂/∂ x and ∂/∂ y.
(b) Derive the expression for the expansion of a vector [∂/∂r, ∂/∂ x] in
terms of ∂/∂ x and ∂/∂ y.
(c) Set êr ≡ ∂/∂r , êϕ ≡ r −1 ∂/∂ϕ. Derive the expression for the expansion
of [êr , êϕ ] in terms of ∂/∂ x and ∂/∂ y.
˜2.10. Suppose u, v are vector fields on M. Show that the components of [u, v]
in any coordinate basis satisfy
$$[u, v]^\mu = u^\nu \frac{\partial v^\mu}{\partial x^\nu} - v^\nu \frac{\partial u^\mu}{\partial x^\nu} .$$
Hint: use (2.2.3′) and (2.2.3).
˜2.14. Suppose $C^1_1 T$ and $(C^1_1 T)'$ are contractions of a tensor T of type (2, 1)
defined in two different bases {eμ } and {e′μ }. Show that $(C^1_1 T)' = C^1_1 T$.
*˜2.15. Suppose g is a metric of V . Show that g : V → V ∗ is an isomorphism (see
the hint for Exercise 2.13).
˜2.16. Show that the arc length of a curve does not depend on the parametrization.
2.17. Suppose {x, y} is a Cartesian coordinate system of 2-dimensional Euclidean
space. Show that {x′, y′} defined by (2.5.14) is also a Cartesian system.
2.18. Suppose {t, x} is a Lorentzian coordinate system of 2-dimensional
Minkowski space. Show that {t′, x′} defined by (2.5.20) is also a Lorentzian
system.
˜2.19. (a) Using the tensor transformation law, derive all the components gμν of the
3-dimensional Euclidean metric in a spherical coordinate system. (b) Given
the expression for the line element of the 4-dimensional Minkowski metric
in a Lorentzian system $ds^2 = -dt^2 + dx^2 + dy^2 + dz^2$, derive all the
components of g and its inverse g −1 in a new coordinate system {t′, x′, y′, z′},
denoted by gμν and g μν . This new coordinate system is defined as follows:
2.24. Suppose Tab is a tensor of type (0, 2) on a vector space V . Show that
Tab v a v b = 0, ∀v a ∈ V ⇒ Tab = T[ab] . Hint: express v a as the sum of two
arbitrary vectors u a and wa .
2.25. Show that Tabcd = Ta[bc]d = Tab[cd] ⇒ Tabcd = Ta[bcd] .
Remark (1) The above claim has the following generalization:
The premise above only contains two equal signs, the key point is that
the index b from both T···[a···b]···c··· and T···a···[b···c]··· are inside the square
brackets.
(2) Both the original and generalized claims will still hold when changing
the square brackets in the premise and conclusion to parentheses.
References
Hawking, S. W. and Ellis, G. F. R. (1973), The Large Scale Structure of Space-Time, Cambridge
University Press, Cambridge.
Kline, M. (1980), Mathematics: The Loss of Certainty, Oxford University Press, New York.
Sachs, R. K. and Wu, H. (1977), General Relativity for Mathematicians, Springer-Verlag, New York.
Schutz, B. F. (1980), Geometrical Methods of Mathematical Physics, Cambridge University Press,
Cambridge.
Spivak, M. (1970), A Comprehensive Introduction to Differential Geometry, Vol. I, II, Publish or
Perish INC, Berkeley.
Straumann, N. (1984), General Relativity and Relativistic Astrophysics, Springer-Verlag, Berlin.
Wald, R. M. (1984), General Relativity, The University of Chicago Press, Chicago.
Chapter 3
The Riemann (Intrinsic) Curvature
Tensor
In Euclidean space there is a familiar derivative operator ∇, the action of which on,
for example, a function (scalar field) f yields a vector field ∇ f (gradient) and on
a vector field v (with contraction) it yields a scalar field ∇ · v (divergence). Since
there exists a Euclidean metric $\delta_{ab}$, a vector $v^a$ can be naturally identified with a dual
vector $v_a = \delta_{ab} v^b$. Now we want to generalize ∇ to an arbitrary manifold that may
not have a metric, so we need to distinguish vectors and dual vectors. It has been
shown that ∇ behaves more like a dual vector after being generalized, and hence
should be denoted by ∇a . Actually, ∇ itself is an operator, which is neither a vector
nor a dual vector; by regarding ∇ as a dual vector, we mean that the result of it acting
on a function f is a dual vector ∇a f . More generally, the result of ∇ acting on a
tensor field of type (k, l) is a tensor field of type (k, l + 1). Therefore, we have the
following definition:
Definition 1 Use F M (k, l) to represent the collection of all C ∞ tensor fields of type
(k, l) on a manifold M. [A function f can be viewed as a tensor field of type (0, 0)
(scalar field), and hence F M (0, 0) ≡ F M .] A map ∇ : F M (k, l) → F M (k, l + 1)
is called a derivative operator1 on M if it satisfies the following conditions:
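(a) Linearity:
$$\nabla_a (\alpha T^{b_1 \cdots b_k}{}_{c_1 \cdots c_l} + \beta S^{b_1 \cdots b_k}{}_{c_1 \cdots c_l}) = \alpha \nabla_a T^{b_1 \cdots b_k}{}_{c_1 \cdots c_l} + \beta \nabla_a S^{b_1 \cdots b_k}{}_{c_1 \cdots c_l} ,$$
for all tensor fields T, S of type (k, l) and all real numbers α, β;
(b) Leibniz rule:
$$\nabla_a (T^{b_1 \cdots b_k}{}_{c_1 \cdots c_l}\, S^{d_1 \cdots d_{k'}}{}_{e_1 \cdots e_{l'}}) = T^{b_1 \cdots b_k}{}_{c_1 \cdots c_l} \nabla_a S^{d_1 \cdots d_{k'}}{}_{e_1 \cdots e_{l'}} + S^{d_1 \cdots d_{k'}}{}_{e_1 \cdots e_{l'}} \nabla_a T^{b_1 \cdots b_k}{}_{c_1 \cdots c_l} ;$$
(c) Commutativity with contraction: $\nabla \circ C = C \circ \nabla$, where C stands for a contraction;
(d) $v(f) = v^a \nabla_a f$ for every function f and every vector field v on M.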
1F (k, l) can be relaxed to the collection of all C 1 tensor fields of type (k, l); that is, ∇a can act on
an arbitrary tensor field of class C 1 .
© Science Press 2023
C. Liang and B. Zhou, Differential Geometry and General Relativity,
Graduate Texts in Physics, https://doi.org/10.1007/978-981-99-0022-0_3
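$$\nabla_a (v^b \omega_b) = v^b \nabla_a \omega_b + \omega_b \nabla_a v^b ,$$
which requires condition (c) since the derivation of this equation reads
$$\nabla_a (v^b \omega_b) = \nabla_a [C(v^b \omega_c)] = C[\nabla_a (v^b \omega_c)] = C(v^b \nabla_a \omega_c + \omega_c \nabla_a v^b) = v^b \nabla_a \omega_b + \omega_b \nabla_a v^b ,$$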
where we used condition (c) in the second step (see Sect. 2.4 for a refresher on the
operation of C).
(2) The function v( f ) on the left-hand side of condition (d) should not be denoted
by va ( f ) since it may be easily mistaken for a vector field. This is one of the few cases
where we should but we do not put on an abstract index. To understand condition
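(d), one can use ∇ in Euclidean space as an example. Suppose $v^a$ is a vector field in
Euclidean space whose expansion in the Cartesian coordinate basis is
$v^a = v^1 (\partial/\partial x)^a + v^2 (\partial/\partial y)^a + v^3 (\partial/\partial z)^a$; then
$$v(f) = v^1 \frac{\partial f}{\partial x} + v^2 \frac{\partial f}{\partial y} + v^3 \frac{\partial f}{\partial z} = \vec{v} \cdot \vec{\nabla} f = v^a \nabla_a f .$$
(3) Condition (d) implies
$$\nabla_a f = (df)_a , \quad \forall f \in \mathscr{F}_M , \tag{3.1.1}$$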
where (d f )a is the abstract index expression for a dual vector field d f generated by
a function f [see (2.3.7)].
(4) In general relativity, a derivative operator also satisfies ∇a ∇b f = ∇b ∇a f ,
∀ f ∈ F M . From Definition 1 in Sect. 2.6 we can see that this is essentially the
abstract index expression for
which means ∇∇ f is a symmetric tensor of type (0, 2). A derivative operator that
satisfies this additional condition is called a torsion-free derivative operator. Unless
stated otherwise, all ∇a in this text will stand for torsion-free derivative operators.
[Optional Reading 3.1.1]
This optional reading has the same spirit as Optional Reading 2.2.1. For the sake of concise-
ness, here we abbreviate a tensor field T b1 ···bk c1 ···cl as T .
Proof The proof is similar to that of Theorem 2.2.1, and should be carried out by the
reader.
For any manifold, there always exists a derivative operator that satisfies Defini-
tion 1 [see Theorem 1.1 in Chap. 4 of Chern et al. (1999)]. In fact, derivative operators
on a manifold not only exist, but also they are numerous. Now we will discuss how
many there can be. From (3.1.1) we know that two different derivative operators ∇a
and ∇˜ a acting on the same function give the same result, i.e.,
∇a f = ∇˜ a f = (d f )a , ∀ f ∈ FM . (3.1.2)
Thus, the difference between ∇a and ∇˜ a can only be manifested by the action on a
tensor field not of type (0, 0). First we discuss the action on a tensor field of type
(0, 1) (a dual vector field). Suppose a dual vector μb ∈ V p∗ is given at a point p ∈ M,
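and consider two arbitrary dual vector fields $\omega_b, \omega'_b \in \mathscr{F}_M(0, 1)$ on M that satisfy
$\omega'_b|_p = \omega_b|_p = \mu_b$ ($\omega_b$ and $\omega'_b$ are called two extensions of $\mu_b$ on M). Suppose $\nabla_a$ is
a derivative operator on M, then $\nabla_a \omega'_b|_p$ and $\nabla_a \omega_b|_p$ are not the same in general. This
is similar to the fact that two functions f(x) and f′(x) (the prime labels a second function here, not a derivative) that have the same value at x0
[i.e., f′(x0 ) = f(x0 )] are not assured to have (df′/dx)|x0 = (df/dx)|x0 . However,
we are about to show that for any two derivative operators $\nabla_a$ and $\tilde\nabla_a$ on M, as long
as $\omega'_b|_p = \omega_b|_p$, we have
$$[(\tilde\nabla_a - \nabla_a)\, \omega'_b]_p = [(\tilde\nabla_a - \nabla_a)\, \omega_b]_p .$$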
Since p is chosen arbitrarily, the difference between two derivative operators ∇a and
∇˜ a on M is manifested by a tensor C c ab of type (1, 2); that is:
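Theorem 3.1.3
$$\nabla_a \omega_b = \tilde\nabla_a \omega_b - C^c{}_{ab}\, \omega_c , \quad \forall \omega_b \in \mathscr{F}_M(0, 1) . \tag{3.1.6}$$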
∇a being torsion free will give rise to the following symmetry of the tensor field
C c ab :
Theorem 3.1.4 C c ab = C c ba .
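Theorem 3.1.5
$$\nabla_a v^b = \tilde\nabla_a v^b + C^b{}_{ac}\, v^c , \quad \forall v^b \in \mathscr{F}_M(1, 0) . \tag{3.1.7}$$
Proof For any $\omega_b \in \mathscr{F}_M(0, 1)$ we have
$$\nabla_a (\omega_b v^b) = \omega_b \nabla_a v^b + v^b \nabla_a \omega_b = \omega_b \nabla_a v^b + v^b (\tilde\nabla_a \omega_b - C^c{}_{ab}\, \omega_c) ,$$
where we used (3.1.6) in the last step. On the other hand, $\tilde\nabla_a (\omega_b v^b) = \omega_b \tilde\nabla_a v^b + v^b \tilde\nabla_a \omega_b$.
Since $\omega_b v^b$ is a scalar field, it follows from (3.1.2) that $\nabla_a (\omega_b v^b) = \tilde\nabla_a (\omega_b v^b)$,
and hence the right-hand sides of the two equations above are equal.
Therefore, we obtain
$$\omega_b \nabla_a v^b = \omega_b \tilde\nabla_a v^b + C^c{}_{ab}\, v^b \omega_c = \omega_b \tilde\nabla_a v^b + C^b{}_{ac}\, v^c \omega_b , \quad \forall \omega_b \in \mathscr{F}_M(0, 1) ,$$
and hence (3.1.7) holds.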
By a similar analysis, one can also show that the difference between the result of
∇a and ∇˜ a acting on a tensor field T b1 ···bk c1 ···cl of type (k, l), i.e., ∇a T b1 ···bk c1 ···cl −
∇˜ a T b1 ···bk c1 ···cl , can be expressed in k + l terms, each of which has a C c ab . In front
of each term there is a + sign if it contracts with an upper index of T , and a − sign
if it contracts with a lower index of T ; for example,
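$$\nabla_a T^b{}_c = \tilde\nabla_a T^b{}_c + C^b{}_{ad}\, T^d{}_c - C^d{}_{ac}\, T^b{}_d ,$$
and, in general,
$$\nabla_a T^{b_1 \cdots b_k}{}_{c_1 \cdots c_l} = \tilde\nabla_a T^{b_1 \cdots b_k}{}_{c_1 \cdots c_l} + \sum_{i=1}^{k} C^{b_i}{}_{ad}\, T^{b_1 \cdots d \cdots b_k}{}_{c_1 \cdots c_l} - \sum_{j=1}^{l} C^{d}{}_{a c_j}\, T^{b_1 \cdots b_k}{}_{c_1 \cdots d \cdots c_l} , \quad \forall T \in \mathscr{F}_M(k, l) , \tag{3.1.8}$$
where in the i-th term of the first sum d replaces $b_i$, and in the j-th term of the second sum d replaces $c_j$.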
Proof Exercise.
Theorem 3.1.6 indicates that the difference between two arbitrary derivative oper-
ators is only manifested by a tensor field C c ab . Conversely, it is not difficult to verify
that given an arbitrary derivative operator ∇˜ a and a smooth tensor field C c ab with
symmetric lower indices, ∇a defined by (3.1.8) satisfies all of the conditions in
Definition 1, and thus this ∇a is also a derivative operator. Therefore, there exist
numerous derivative operators on a manifold as long as there is one. A manifold with
a chosen derivative operator can be denoted by (M, ∇a ), and this combination has
more structure than M itself (∇a provides additional structure); for instance, we can
now talk about the parallel transport of a vector along a curve (see Sect. 3.2) and the
curvature of (M, ∇a ) (see Sect. 3.4).
Suppose {x μ } is a coordinate system of M, the coordinate basis and dual basis of
which are {(∂/∂ x μ )a } and {(dx μ )a }. Define a map ∂a : F O (k, l) → F O (k, l + 1) on
the coordinate patch of O as follows [we only write down the case T b c ∈ F O (1, 1)
as an example]:
$$\partial_a T^b{}_c := (dx^\mu)_a (\partial/\partial x^\nu)^b (dx^\sigma)_c\, \partial_\mu T^\nu{}_\sigma , \tag{3.1.9}$$
where T ν σ are the components of T b c in this coordinate system, and ∂μ is short for
∂/∂ x μ , the partial derivative with respect to a coordinate x μ . It is not difficult to verify
that ∂a satisfies all of the conditions in Definition 1 plus the torsion-free condition,
and thus ∂a is a torsion-free derivative operator on O. This is a derivative operator
that by definition depends on the coordinate system, and it is only defined in the
coordinate patch of this coordinate system, called the ordinary derivative operator
of this coordinate system. Equation (3.1.9) indicates that ∂μ T ν σ are the components
of ∂a T b c in this coordinate system, and therefore the definition of ∂a can also be
formulated as: the coordinate components of the ordinary derivative $\partial_a T^{b_1 \cdots b_k}{}_{c_1 \cdots c_l}$
of a tensor field $T^{b_1 \cdots b_k}{}_{c_1 \cdots c_l}$ are equal to $\partial (T^{\nu_1 \cdots \nu_k}{}_{\sigma_1 \cdots \sigma_l}) / \partial x^\mu$, the derivatives of the
coordinate components of this tensor field with respect to the coordinates. Thus, we
can easily see that:
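(1) ∂a of any coordinate system acting on a coordinate basis vector and a dual
coordinate basis vector of this system yields zero, i.e.,
$$\partial_a (\partial/\partial x^\nu)^b = 0 , \qquad \partial_a (dx^\nu)_b = 0 .$$
(2) ∂a satisfies a much stronger condition than the torsion-free condition, i.e.,
$$\partial_a \partial_b T^{\cdots}{}_{\cdots} = \partial_b \partial_a T^{\cdots}{}_{\cdots}$$
for a tensor field $T^{\cdots}{}_{\cdots}$ of any type, because partial derivatives of the components commute.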
tensor transformation law under a coordinate transformation, and hence do not con-
stitute a tensor. From the very beginning, however, we define a Christoffel symbol as
a tensor, which is a multilinear map, but since it corresponds to ∂a which depends on
the coordinate system, a Christoffel symbol is a tensor associated with the coordinate
system (the tensor itself will change under a coordinate transformation). Suppose $\nabla_a$
is a derivative operator assigned on M, $\{x^\mu\}$ and $\{x'^\mu\}$ are two coordinate systems
on M, the intersection of their coordinate patches is U, and the Christoffel symbols
of $\nabla_a$ in these two systems are $\Gamma^c{}_{ab}$ and $\bar\Gamma^c{}_{ab}$, respectively. As tensors, they can be
expressed as components (in U) using the $\{x^\mu\}$ system or the $\{x'^\mu\}$ system. Suppose
the components of $\Gamma^c{}_{ab}$ in the $\{x^\mu\}$ and $\{x'^\mu\}$ systems are $\{\Gamma^\sigma{}_{\mu\nu}\}$ and $\{\Gamma'^\sigma{}_{\mu\nu}\}$ (these
two arrays of numbers certainly satisfy the tensor transformation law), and the
components of $\bar\Gamma^c{}_{ab}$ in the $\{x^\mu\}$ and $\{x'^\mu\}$ systems are $\{\bar\Gamma^\sigma{}_{\mu\nu}\}$ and $\{\bar\Gamma'^\sigma{}_{\mu\nu}\}$ (which also
satisfy the tensor transformation law); however, $\{\Gamma^\sigma{}_{\mu\nu}\}$ and $\{\bar\Gamma'^\sigma{}_{\mu\nu}\}$ do not satisfy
the tensor transformation law. Nevertheless, textbooks normally just define $\{\Gamma^\sigma{}_{\mu\nu}\}$
and $\{\bar\Gamma'^\sigma{}_{\mu\nu}\}$ to be the Christoffel symbols in the coordinate systems $\{x^\mu\}$ and $\{x'^\mu\}$,
respectively, and therefore there is no doubt that they do not constitute a tensor. It
is right for those books to emphasize “a Christoffel symbol is not a tensor”, but we
instead emphasize that “a Christoffel symbol is a tensor associated with a coordinate
system”. The reader may ask: why do you have to describe a Christoffel symbol as
a tensor? The answer is: as long as we use the abstract index notation and follow
the above reasoning (including the elegant argument from the “multifaceted view of
tensor”), surely we need to admit that C c ab is a tensor that reflects the difference
between ∇a and ∇˜ a . Under the premise that a derivative operator has been assigned
to M, for a given coordinate system there is a derivative ∂a , and if we regard ∂a as ∇˜ a ,
then $C^c{}_{ab}$ (which is now denoted by $\Gamma^c{}_{ab}$) is, of course, a tensor. It would be
inconsistent not to admit that $\Gamma^c{}_{ab}$ is a tensor. However, at the same time, we should
emphasize that $\Gamma^c{}_{ab}$ is a tensor associated with a coordinate system. (There are as
many $\partial_a$, and thus as many $\Gamma^c{}_{ab}$, as there are coordinate systems.) This emphasis is
essentially the same as the emphasis in many books that say “a Christoffel symbol is
not a tensor”. They are just two ways of wording the same issue. What is important
is not how it is worded but the substance of it, i.e., we should keep in mind that
the tensor transformation law does not hold between $\{\Gamma^\sigma{}_{\mu\nu}\}$ and $\{\bar\Gamma'^\sigma{}_{\mu\nu}\}$.
where $v^\nu{}_{,\mu} \equiv \partial_\mu v^\nu \equiv \partial v^\nu / \partial x^\mu$ (the comma stands for the partial derivative). Again,
textbooks often emphasize that $v^\nu{}_{,\mu}$ does not constitute a tensor, while we say that
$\partial_a v^b$ is a tensor field associated with the coordinate system; they are also just two
ways of wording the same issue. More specifically, suppose $\partial_a$ and $\partial'_a$ are the ordinary
derivative operators of two coordinate systems $\{x^\mu\}$ and $\{x'^\mu\}$, respectively, then
usually $\partial_a v^b \neq \partial'_a v^b$ (that is why $\partial_a v^b$ is a tensor field associated with the coordinate
system). If we expand $\partial_a v^b$ and $\partial'_a v^b$ in terms of their own coordinate basis:
$$v^\nu{}_{;\mu} = v^\nu{}_{,\mu} + \Gamma^\nu{}_{\mu\sigma}\, v^\sigma , \qquad \omega_{\nu;\mu} = \omega_{\nu,\mu} - \Gamma^\sigma{}_{\mu\nu}\, \omega_\sigma , \tag{3.1.11}$$
where $v^\nu$ and $\omega_\nu$ are the components of any vector field and dual vector field in an
arbitrary coordinate basis, and $\Gamma^\nu{}_{\mu\sigma}$ are the components of the Christoffel symbol of this
system in this basis. (Many books may say “$\Gamma^\nu{}_{\mu\sigma}$ is the Christoffel symbol of this
system”; later on, we will also say this for simplicity.)
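The first formula in (3.1.11) can be tried out on a simple case. The sketch below makes several assumptions that are not part of this passage: the manifold is the Euclidean plane with polar coordinates (r, ϕ), ∇a is the ordinary derivative operator of the Cartesian system, and its Christoffel symbols in polar coordinates take the standard values Γ r ϕϕ = −r and Γ ϕ rϕ = Γ ϕ ϕr = 1/r. Partial derivatives are approximated by central differences, and all function names are illustrative. The Cartesian basis field ∂/∂x, whose polar components are (cos ϕ, −sin ϕ/r), should then have vanishing covariant derivative even though its polar components are not constant.

```python
import math


def christoffel(r, phi):
    """G[nu][mu][sigma] = Gamma^nu_{mu sigma} for polar coordinates (assumed values)."""
    G = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
    G[0][1][1] = -r                     # Gamma^r_{phi phi}
    G[1][0][1] = G[1][1][0] = 1.0 / r   # Gamma^phi_{r phi} = Gamma^phi_{phi r}
    return G


def covariant_derivative(field, r, phi, h=1e-5):
    """Return D[mu][nu] = v^nu_{;mu} = v^nu_{,mu} + Gamma^nu_{mu sigma} v^sigma."""
    x = [r, phi]
    G = christoffel(r, phi)
    v0 = field(r, phi)
    D = [[0.0, 0.0], [0.0, 0.0]]
    for mu in range(2):
        plus, minus = list(x), list(x)
        plus[mu] += h
        minus[mu] -= h
        vp, vm = field(*plus), field(*minus)
        for nu in range(2):
            partial = (vp[nu] - vm[nu]) / (2 * h)   # central difference
            D[mu][nu] = partial + sum(G[nu][mu][s] * v0[s] for s in range(2))
    return D


def cartesian_x_field(r, phi):
    """Polar components (v^r, v^phi) of the Cartesian basis field d/dx."""
    return [math.cos(phi), -math.sin(phi) / r]


def radial_field(r, phi):
    """Polar components of the coordinate basis field d/dr."""
    return [1.0, 0.0]


D_x = covariant_derivative(cartesian_x_field, 1.5, 0.8)
D_r = covariant_derivative(radial_field, 1.5, 0.8)
print(D_x)   # all four entries close to zero
print(D_r)   # only the (phi, phi) entry is nonzero, close to 1/r
```

The contrast between the two fields shows why the Γ term is needed: the ordinary derivatives of the polar components alone would wrongly suggest that ∂/∂x varies and that ∂/∂r is constant.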
∇a δ b c = 0 , (3.1.12)
where δ b c is a tensor field of type (1, 1), whose definition at each point p ∈ M is
δ b c vc = vb , ∀vc ∈ V p .
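$$\begin{aligned} \nabla_a v^b &= \nabla_a (\delta^b{}_c v^c) = \nabla_a [C(\delta^b{}_c v^d)] = C[\nabla_a (\delta^b{}_c v^d)] \\ &= C(v^d \nabla_a \delta^b{}_c + \delta^b{}_c \nabla_a v^d) = v^c \nabla_a \delta^b{}_c + \delta^b{}_c \nabla_a v^c = v^c \nabla_a \delta^b{}_c + \nabla_a v^b , \end{aligned}$$
where C stands for the contraction of indices c and d; in the third equality we used condition
(c) and in the last step we used $\delta^b{}_c T_a{}^c = T_a{}^b$ $\forall T_a{}^c$. The above equation indicates that
$v^c \nabla_a \delta^b{}_c = 0$ $\forall v^c \in \mathscr{F}_M(1, 0)$, and therefore $\nabla_a \delta^b{}_c = 0$.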
(B) Suppose ∇˜ a satisfies conditions (a), (b), (d) in Definition 1 and (3.1.12). We would like
to show that it also satisfies condition (c). For this purpose, suppose ∇a satisfies all of the
conditions in Definition 1. Since the proof of Theorem 3.1.2 does not need condition (c),
(3.1.6) is satisfied. We cannot use Theorem 3.1.4 directly since the proof of it needs condition
(c); however, one can still prove it from the properties of ∇a and ∇˜ a (motivated readers may
try it as a challenging exercise), and therefore we have (3.1.8). From this, using the fact that
∇a satisfies (c) we can show that ∇˜ a satisfies condition (c).
3.2 Derivative and Parallel Transport of a Vector Field Along a Curve 75
The commutator $[u, v]^a$ of two vector fields on M does not require M to have any
additional structure [see (2.2.9)]; however, the inconvenience of that definition is that
the commutator cannot be separated from the object it acts on (a scalar field f ). Now, after we have the
concept of a derivative operator, we can write the explicit expression of a commutator
of vector fields [u, v]a by means of an arbitrary torsion-free derivative operator, as
shown in the following theorem:
Theorem 3.1.9
$$[u, v]^a = u^b \nabla_b v^a - v^b \nabla_b u^a , \tag{3.1.13}$$
where in the second step we used condition (d) of a derivative operator, in the third
step we used conditions (b) and (c), and in the fourth step we used the torsion-
free condition. Finally, using (d) again, namely [u, v]( f ) = [u, v]a ∇a f , we arrive at
(3.1.13).
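Remark 4 Choose the ordinary derivative operator $\partial_b$ of an arbitrary coordinate system $\{x^\mu\}$
as $\nabla_b$ in (3.1.13), then we have
$$[u, v]^\mu = u^\nu \partial_\nu v^\mu - v^\nu \partial_\nu u^\mu .$$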
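With the ordinary derivative operator of a coordinate system, (3.1.13) reduces to the component formula [u, v] μ = u ν ∂ν v μ − v ν ∂ν u μ , which is easy to evaluate numerically. The sketch below assumes the natural coordinates (x, y) of R2 and three simple linear vector fields chosen for illustration; derivatives are taken by central differences, which are exact for linear components up to rounding. The function names are hypothetical.

```python
def commutator(u, v, point, h=1e-4):
    """Components [u, v]^mu = u^nu d_nu v^mu - v^nu d_nu u^mu at the given point."""
    n = len(point)
    u0, v0 = u(point), v(point)
    result = [0.0] * n
    for nu in range(n):
        plus, minus = list(point), list(point)
        plus[nu] += h
        minus[nu] -= h
        # central-difference approximations of d_nu u^mu and d_nu v^mu
        du = [(a - b) / (2 * h) for a, b in zip(u(plus), u(minus))]
        dv = [(a - b) / (2 * h) for a, b in zip(v(plus), v(minus))]
        for mu in range(n):
            result[mu] += u0[nu] * dv[mu] - v0[nu] * du[mu]
    return result


def rotation(p):
    """-y d/dx + x d/dy"""
    x, y = p
    return [-y, x]


def translation_x(p):
    """d/dx"""
    return [1.0, 0.0]


def dilation(p):
    """x d/dx + y d/dy"""
    x, y = p
    return [x, y]


p = [0.3, -1.2]                                  # arbitrary sample point
print(commutator(rotation, translation_x, p))    # close to [0, -1], i.e. -d/dy
print(commutator(translation_x, rotation, p))    # opposite sign
print(commutator(rotation, dilation, p))         # close to [0, 0]
```

The second line illustrates the antisymmetry [v, u] = −[u, v], and the third shows two fields that commute.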
Sect. 3.2.3 for details). Thus, Definition 1 can also be interpreted as: a necessary and
sufficient condition for va to be parallelly transported along C(t) is that the derivative
of it along T b vanishes.
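Theorem 3.2.1 Suppose a curve C(t) is in the coordinate patch of a coordinate system
$\{x^\mu\}$ and the parametric representation of the curve is $x^\mu(t)$. Let $T^a \equiv (\partial/\partial t)^a$,
then a vector field $v^a$ along C(t) satisfies
$$T^b \nabla_b v^a = (\partial/\partial x^\mu)^a \left( \frac{dv^\mu}{dt} + \Gamma^\mu{}_{\nu\sigma}\, T^\nu v^\sigma \right) . \tag{3.2.1}$$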
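Proof Let $\partial_a$ be the ordinary derivative operator of the coordinate system $\{x^\mu\}$, then
it follows from (3.1.7) that
$$\begin{aligned} T^b \nabla_b v^a &= T^b (\partial_b v^a + \Gamma^a{}_{bc}\, v^c) = T^b [(dx^\nu)_b (\partial/\partial x^\mu)^a\, \partial_\nu v^\mu + \Gamma^a{}_{bc}\, v^c] \\ &= T^\nu (\partial/\partial x^\mu)^a (\partial v^\mu / \partial x^\nu) + \Gamma^a{}_{bc}\, T^b v^c = (\partial/\partial x^\mu)^a [T^\nu (\partial v^\mu / \partial x^\nu) + \Gamma^\mu{}_{\nu\sigma}\, T^\nu v^\sigma] , \end{aligned} \tag{3.2.2}$$
where $T^\nu$ are the coordinate components of the tangent vector $T^b$ of that curve. From
(2.2.7) we know that $T^\nu = dx^\nu(t)/dt$, and hence $T^\nu (\partial v^\mu / \partial x^\nu) = dv^\mu / dt$, which turns (3.2.2) into (3.2.1).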
where the key step (the third equality) is because v̄ μ (t) [the function of one variable that
comes from the combination of a vector v̄a on U and C(t)] is equal to v μ (t). In conclusion,
∇b va | p is meaningless, but T b ∇b va | p is meaningful.
[The End of Optional Reading 3.2.1]
Theorem 3.2.2 A point C(t0 ) on a curve and a vector at this point uniquely determine
a vector field that is parallelly transported along the curve.
Proof If there exists a coordinate system whose coordinate patch contains the whole
curve, then it follows from (3.2.1) that T b ∇b va = 0, the definition of parallel trans-
port, is equivalent to
\[ \frac{dv^\mu}{dt} + \Gamma^\mu{}_{\nu\sigma} T^\nu v^\sigma = 0 , \quad \mu = 1, \ldots, n . \tag{3.2.5} \]
Suppose p, q ∈ M, then V p and Vq are two vector spaces, and their elements cannot
be compared. However, if there is a curve C(t) that connects p and q, we can define
a map from V p to Vq in the following way: ∀va ∈ V p , from Theorem 3.2.2 we know
that there is a unique parallelly transported vector field on C(t) (whose value at p
is va ), and its value at q can be defined as the image of va . Note that this map
depends on the curve: the image of va may be different for another curve that
connects p and q. Nevertheless, the existence of ∇a does, in a curve-dependent
way, connect the two vector spaces V p and Vq , which were completely
unrelated before. Therefore, ∇a is also called a connection.2
Beginners often raise questions like: why do we call ∇a a derivative operator?
In other words, why is this ∇a some kind of generalization of the familiar ∇ in 3-
dimensional Euclidean space on a general manifold? Why do we interpret T b ∇b va
as the derivative of va along T b ? Why do we call va that satisfies T b ∇b va = 0 a
vector field that is parallelly transported along the curve? In order to answer these
questions, we need Sect. 3.2.2 first.
2For the formal definition of a connection given in terms of the language of fiber bundles, see
Appendix I in Volume III.
78 3 The Riemann (Intrinsic) Curvature Tensor
Until now, no metric has been involved in this chapter, rather, we only assumed a
connection (i.e., a derivative operator) ∇a is assigned to M. If a metric gab is also
assigned to M, then one can talk about the inner product between two vectors. To
make the concept of parallel transport agree with the familiar parallel transport in
Euclidean space, we should add the following requirement: suppose u a and va are
vector fields parallelly transported along C(t), then u a va (≡ gab u a vb ) is a constant on
C(t); that is, the “inner product” of two vectors is invariant under parallel transport.
Suppose T a is the tangent vector field of C(t), then this requirement is equivalent to
A necessary and sufficient condition for the above equation to hold for any curve and
any two vector fields that are parallelly transported along the curve is
∇c gab = 0 . (3.2.6)
When there is no metric, the choice of ∇c is very arbitrary. After a metric is assigned,
we may choose a ∇c that satisfies the additional requirement ∇c gab = 0. Now we
will prove that this requirement determines a unique ∇a .
Theorem 3.2.3 After assigning a metric gab to a manifold M, there exists a unique
∇a such that ∇a gbc = 0.
Proof Suppose ∇˜ a is an arbitrary derivative operator. We want an appropriate C c ab
such that the ∇a determined by it and ∇˜ a satisfies ∇a gbc = 0. From (3.1.8) we have
Similarly, we have
Adding (3.2.7) to (3.2.8) and subtracting (3.2.9), we obtain by using Ccab = Ccba
that
or
3.2 Derivative and Parallel Transport of a Vector Field Along a Curve 79
\[ C^c{}_{ab} = \frac{1}{2} g^{cd} (\tilde\nabla_a g_{bd} + \tilde\nabla_b g_{ad} - \tilde\nabla_d g_{ab}) . \tag{3.2.10} \]
The combination of this C c ab and ∇̃a , namely ∇a , is then the solution to the equation
∇a gbc = 0. This must be the unique solution since if ∇′a also satisfies ∇′a gbc = 0,
treating ∇′a as ∇̃a we can see that C c ab vanishes, which means there is no difference
between ∇′a and ∇a .
The ∇a that satisfies ∇a gbc = 0 is called the derivative operator associated (or
compatible) with gbc . From now on, unless stated otherwise, when we talk about
∇a when there is a gab , we will choose it to be the derivative operator associated
with gab . It can be proved (exercise) that \(\nabla_a g_{bc} = 0\) assures that \(\nabla_a g^{bc} = 0\) (and
vice versa), which is exceptionally convenient for performing calculations.
Example 1 In Euclidean space, there exist infinitely many derivative operators that
satisfy Definition 1 in Sect. 3.1. However, there is only one derivative operator
that is associated with the Euclidean metric δab , namely the ordinary derivative
operator ∂a of a Cartesian coordinate system {x μ } (all Cartesian systems have
the same ∂a ), since it follows from the definition of δab (2.5.11) that ∂c δab =
(dx σ )c (dx μ )a (dx ν )b ∂σ δμν = 0. For the 3-dimensional Euclidean space, the ∂a of
a Cartesian coordinate system is the familiar ∇ in the standard vector field theory.
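\[ \Gamma^\sigma{}_{\mu\nu} = \frac{1}{2} g^{\sigma\rho} (g_{\rho\mu,\nu} + g_{\nu\rho,\mu} - g_{\mu\nu,\rho}) . \tag{3.2.10$'$} \]
\[ \Gamma^c{}_{ab} = \frac{1}{2} g^{cd} (\partial_a g_{bd} + \partial_b g_{ad} - \partial_d g_{ab}) , \]
\[
\begin{aligned}
\Gamma^\sigma{}_{\mu\nu} &= \Gamma^c{}_{ab} (dx^\sigma)_c (\partial/\partial x^\mu)^a (\partial/\partial x^\nu)^b \\
&= \frac{1}{2} (dx^\sigma)_c (\partial/\partial x^\mu)^a (\partial/\partial x^\nu)^b g^{cd} (\partial_a g_{bd} + \partial_b g_{ad} - \partial_d g_{ab}) \\
&= \frac{1}{2} g^{\sigma\rho} (\partial_\mu g_{\nu\rho} + \partial_\nu g_{\mu\rho} - \partial_\rho g_{\mu\nu}) = \frac{1}{2} g^{\sigma\rho} (g_{\rho\mu,\nu} + g_{\nu\rho,\mu} - g_{\mu\nu,\rho}) .
\end{aligned}
\]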
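As an illustration of how the Christoffel symbols follow from the metric components, here is a minimal sketch using SymPy. The helper name `christoffel`, the nested-list layout `Gamma[σ][μ][ν]`, and the choice of the unit 2-sphere with coordinates (θ, φ) are our own assumptions for the example, not part of the text.

```python
import sympy as sp

def christoffel(g, coords):
    """Return Gamma[s][m][k] = Γ^σ_{μν} (σ=s, μ=m, ν=k) from the metric matrix g.

    Implements Γ^σ_{μν} = (1/2) g^{σρ} (g_{ρμ,ν} + g_{νρ,μ} - g_{μν,ρ}).
    """
    n = len(coords)
    g_inv = g.inv()
    return [[[sp.simplify(sum(
                g_inv[s, r] * (sp.diff(g[r, m], coords[k])
                               + sp.diff(g[k, r], coords[m])
                               - sp.diff(g[m, k], coords[r]))
                for r in range(n)) / 2)
              for k in range(n)]
             for m in range(n)]
            for s in range(n)]

# Example (an assumption for illustration): unit 2-sphere, ds^2 = dθ^2 + sin^2θ dφ^2.
theta, phi = sp.symbols('theta phi', positive=True)
g_sphere = sp.diag(1, sp.sin(theta)**2)
Gamma = christoffel(g_sphere, [theta, phi])

print(Gamma[0][1][1])  # Γ^θ_{φφ}
print(Gamma[1][0][1])  # Γ^φ_{θφ}
```

For this metric the only nonvanishing symbols are Γ^θ_{φφ} = −sin θ cos θ and Γ^φ_{θφ} = Γ^φ_{φθ} = cot θ (SymPy may print them in an equivalent trigonometric form).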
First we talk about the simplest case, i.e., Euclidean space. There is one type of
special coordinate system (Cartesian system) in Euclidean space, using which we
can define the absolute (curve-independent) parallel transport of a vector.
Definition 2 A vector ṽ at p in Euclidean space is referred to as the result of a vector
v at q parallelly transported to p if their components in the same Cartesian system
are the same. (NB: The parallel transport for one Cartesian system is the parallel
transport for all Cartesian systems.)
Definition 3 In Euclidean space, the derivative of a vector field v on a curve C(t)
along the curve, denoted by dv/dt, is defined as
\[ \left. \frac{d\vec v}{dt} \right|_p := \lim_{\Delta t \to 0} \frac{1}{\Delta t} (\tilde{\vec v}|_p - \vec v|_p) \quad \forall p \in C(t) , \tag{3.2.11} \]
\[ \text{the $i$th component of } T^b \partial_b v^a = (dx^i)_a T^b \partial_b v^a = T^b \partial_b [(dx^i)_a v^a] = T^b \partial_b v^i = T(v^i) = \frac{dv^i}{dt} . \]
[where we used condition (d) of a derivative operator in the fourth equality, and the
definition of a tangent vector, (2.2.6 ), in the fifth equality.] On the other hand, from
(3.2.11) we can see that
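\[ \text{the $i$th component of } \left. \frac{d\vec v}{dt} \right|_p = \lim_{\Delta t \to 0} \frac{1}{\Delta t} (\tilde v^i|_p - v^i|_p) = \lim_{\Delta t \to 0} \frac{1}{\Delta t} (v^i|_q - v^i|_p) = \left. \frac{dv^i}{dt} \right|_p , \]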
[where we used Definition 2 in the second equality, and the third equality is nothing
but the definition of the derivative of a function vi (t).] Comparing the two equations
we can see that
\[ \frac{d\vec v}{dt} = T^b \partial_b v^a . \tag{3.2.12} \]
Generalizing to any curve C(t) on any manifold M with any ∇a , we can naturally
call T b ∇b va the derivative of va along T b [or along C(t)]. Sometimes this derivative
is also denoted by Dva /dt, i.e.,
\[ \frac{Dv^a}{dt} \equiv T^b \nabla_b v^a . \tag{3.2.13} \]
where T b ∇b va |t and va (s) are short for T b ∇b va |C(t) and va (C(s)), respectively, and ψs,t is
the translation map from vector space VC(s) to VC(t) (see Fig. 3.2). It is not difficult to show
that (Exercise 3.8) ψs,t : VC(s) → VC(t) is an isomorphism.
Suppose ṽa is a vector field parallelly transported along C(t) that is determined by va (s),
then
ṽa (t) = [ψs,t v(s)]a , (3.2.16)
T b ∇b ṽa = 0 . (3.2.17)
The coordinate component expression for (3.2.16) is
On the other hand, by definition, ψt,s is the inverse map of ψs,t , i.e., (ψs,t )μ ρ (ψt,s )ρ ν = δ μ ν ,
and hence
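\[
\begin{aligned}
0 &= \left[ \frac{d(\psi_{s,t})^\mu{}_\rho}{ds} (\psi_{t,s})^\rho{}_\nu + (\psi_{s,t})^\mu{}_\rho \frac{d(\psi_{t,s})^\rho{}_\nu}{ds} \right]_{s=t} \\
&= \left. \frac{d(\psi_{s,t})^\mu{}_\nu}{ds} \right|_{s=t} + \left. \frac{d(\psi_{t,s})^\mu{}_\nu}{ds} \right|_{s=t} .
\end{aligned} \tag{3.2.19}
\]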
Now let us prove (3.2.15). The μth component of the right-hand side of this equation is
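\[
\begin{aligned}
\left. \frac{d}{ds} \right|_{s=t} [\psi_{s,t} v(s)]^\mu &= \left. \frac{d}{ds} \right|_{s=t} [(\psi_{s,t})^\mu{}_\nu v^\nu(s)] \\
&= \left. \frac{d}{ds} (\psi_{s,t})^\mu{}_\nu \right|_{s=t} v^\nu(t) + (\psi_{s,t})^\mu{}_\nu \Big|_{s=t} \left. \frac{dv^\nu(s)}{ds} \right|_{s=t} \\
&= - \left. \frac{d}{ds} (\psi_{t,s})^\mu{}_\nu \right|_{s=t} v^\nu(t) + \delta^\mu{}_\nu \left. \frac{dv^\nu(s)}{ds} \right|_{s=t} \\
&= (\Gamma^\mu{}_{\nu\sigma} T^\sigma)|_t \, v^\nu(t) + \left. \frac{dv^\mu(s)}{ds} \right|_{s=t} \\
&= (\Gamma^\mu{}_{\nu\sigma} T^\sigma v^\nu)|_t + \left. \frac{dv^\mu(s)}{ds} \right|_{s=t} = (T^b \nabla_b v^a)^\mu |_t ,
\end{aligned}
\]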
where we used (3.2.19) in the third step and (3.2.18) in the fourth step. The right-hand side
of the above equation is the μth component of the right-hand side of (3.2.15), and (3.2.14)
is therefore proved.
3.3 Geodesics
Definition 1 A curve γ (t) on (M, ∇a ) is called a geodesic if its tangent vector field
T a satisfies T b ∇b T a = 0.
Remark 1 ① We can see that a necessary and sufficient condition for a curve to be
a geodesic is that its tangent vector field is parallelly transported along the curve. ②
T b ∇b T a = 0 is called a geodesic equation. ③ Suppose there is a metric field gab
on a manifold M, then the geodesics of (M, gab ) refer to the geodesics of (M, ∇a ),
where ∇a is associated with gab .
Suppose a geodesic γ (t) is located in the coordinate patch of a coordinate system,
then substituting T a for the va in (3.2.5) yields
\[ \frac{dT^\mu}{dt} + \Gamma^\mu{}_{\nu\sigma} T^\nu T^\sigma = 0 , \quad \mu = 1, \ldots, n . \]
Suppose x ν = x ν (t) are the parametric equations of γ (t), then T μ = dx μ /dt. Hence,
the equation above can be rewritten as
\[ \frac{d^2 x^\mu}{dt^2} + \Gamma^\mu{}_{\nu\sigma} \frac{dx^\nu}{dt} \frac{dx^\sigma}{dt} = 0 , \quad \mu = 1, \ldots, n . \tag{3.3.1} \]
This is the coordinate component expression for a geodesic equation.
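To see the component form of the geodesic equation in action, the following sketch integrates it numerically on the unit 2-sphere with a hand-written fourth-order Runge–Kutta step. The sphere, its Christoffel symbols (Γ^θ_{φφ} = −sin θ cos θ, Γ^φ_{θφ} = cot θ), the function names, and the initial data are assumptions chosen for illustration only.

```python
import numpy as np

def geodesic_rhs(state):
    """First-order form of the geodesic equation on the unit 2-sphere.

    state = (θ, φ, dθ/dt, dφ/dt); the accelerations are -Γ^μ_{νσ} ẋ^ν ẋ^σ.
    """
    theta, phi, dtheta, dphi = state
    ddtheta = np.sin(theta) * np.cos(theta) * dphi**2                 # -Γ^θ_{φφ} φ'^2
    ddphi = -2.0 * (np.cos(theta) / np.sin(theta)) * dtheta * dphi    # -2 Γ^φ_{θφ} θ' φ'
    return np.array([dtheta, dphi, ddtheta, ddphi])

def integrate_geodesic(state0, t_end, steps=2000):
    """Integrate from t = 0 to t = t_end with classical RK4; returns the final state."""
    state = np.array(state0, dtype=float)
    h = t_end / steps
    for _ in range(steps):
        k1 = geodesic_rhs(state)
        k2 = geodesic_rhs(state + 0.5 * h * k1)
        k3 = geodesic_rhs(state + 0.5 * h * k2)
        k4 = geodesic_rhs(state + h * k3)
        state = state + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return state

def speed_squared(state):
    """g_{μν} T^μ T^ν for the unit 2-sphere metric."""
    theta, _, dtheta, dphi = state
    return dtheta**2 + np.sin(theta)**2 * dphi**2

start = [np.pi / 3, 0.0, 0.3, 0.8]
end = integrate_geodesic(start, t_end=2.0)
print(speed_squared(start), speed_squared(end))
```

Up to integration error, the two printed values agree: the parameter t here is affine, so the norm of the tangent vector stays constant along the curve.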
Theorem 3.3.1 Suppose γ (t) is a geodesic, then the tangent vector field T′a of its
reparametrization γ′(t′) [= γ (t)] satisfies
\[ T'^b \nabla_b T'^a = \alpha T'^a \quad [\alpha \text{ is a function defined on } \gamma(t)] . \tag{3.3.2} \]
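Proof
\[ T^a = \left( \frac{\partial}{\partial t} \right)^a = \frac{dt'}{dt} \left( \frac{\partial}{\partial t'} \right)^a = \frac{dt'}{dt} T'^a , \]
\[
\begin{aligned}
0 = T^b \nabla_b T^a &= \frac{dt'}{dt} T'^b \nabla_b \left( \frac{dt'}{dt} T'^a \right) = \left( \frac{dt'}{dt} \right)^2 T'^b \nabla_b T'^a + T'^a \frac{dt'}{dt} T'^b \nabla_b \left( \frac{dt'}{dt} \right) \\
&= \left( \frac{dt'}{dt} \right)^2 T'^b \nabla_b T'^a + T'^a \frac{dt'}{dt} \frac{d}{dt'} \left( \frac{dt'}{dt} \right) = \left( \frac{dt'}{dt} \right)^2 T'^b \nabla_b T'^a + T'^a \frac{d^2 t'}{dt^2} ,
\end{aligned}
\]
and hence \( T'^b \nabla_b T'^a = - \left( \frac{dt}{dt'} \right)^2 \frac{d^2 t'}{dt^2} T'^a \). Set \( \alpha \equiv - \left( \frac{dt}{dt'} \right)^2 \frac{d^2 t'}{dt^2} \), then (3.3.2) is satisfied.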
Theorem 3.3.2 Suppose the tangent vector field T a of a curve γ (t) satisfies
T b ∇b T a = αT a [α is a function on γ (t)], then there exists a t′ = t′(t) such that
γ′(t′) [= γ (t)] is a geodesic.
Remark 3 Similar to Theorem 2.2.8, the word “unique” in Theorem 3.3.4 should
also be understood as “locally unique”.
The discussions above do not involve a metric. From now on, we will suppose there
is a metric field gab on M. Since a tangent vector T a is parallelly transported along
a geodesic, and since the self “inner product” gab T a T b of a parallelly transported
vector is a constant, the sign of gab T a T b does not change along the geodesic, which
indicates that geodesics can always be classified as three types: timelike, spacelike
and null (there is no “outlandish” geodesic that can turn from one type into another).
Theorem 3.3.5 The arc length parameter of a (nonnull) geodesic is an affine param-
eter.
Proof Exercise 3.9. Hint: First show that a tangent vector of an affinely parametrized
geodesic has a constant magnitude along the curve.
As we all know, a straight line (segment) is the shortest path between two points in
Euclidean space. Now we will discuss to what extent this conclusion can be applied
to a manifold with a Lorentzian metric (a spacetime).
Remark 4 ① This theorem also holds for any case where gab is positive definite [in
this case the modifier “spacelike (timelike)” is omitted]. ② The meaning of extrem-
izing the arc length is as follows: suppose C is a spacelike (timelike) curve between
p and q, then one can add a small modification to it and obtain many spacelike
(timelike) curves that are “infinitely close” to C. Theorem 3.3.6 claims that, a nec-
essary and sufficient condition for a curve C to be a geodesic is that the length of the
curve is an extremum among the lengths of all possible spacelike (timelike) curves.
The condition for a function f (x) of one variable to take an extremum is that its
first order derivative is zero. However, the “argument” corresponding to the length l
(which can be seen as the “function value”) in Theorem 3.3.6 is not a real number
but a curve. Here we are concerned about the change of l when a curve turns into
another curve, and thus l is not a function but a functional. According to the theory
of variations, the necessary and sufficient condition for l to be extremized is that its
variation δl vanishes.
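\[ \delta g_{\mu\nu} \equiv g_{\mu\nu}[x^\sigma(t) + \delta x^\sigma(t)] - g_{\mu\nu}[x^\sigma(t)] = \frac{\partial g_{\mu\nu}}{\partial x^\sigma} \delta x^\sigma(t) \]
and
\[ \delta \left( \frac{dx^\mu}{dt} \right) \equiv \frac{d(x^\mu + \delta x^\mu)}{dt} - \frac{dx^\mu}{dt} = \frac{d(\delta x^\mu)}{dt} , \]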
which, through (3.3.3), give rise to the following variation of l:
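\[
\delta l = \frac{1}{2} \int_{t_1}^{t_2} \left( g_{\mu\nu} \frac{dx^\mu}{dt} \frac{dx^\nu}{dt} \right)^{-1/2} \left[ g_{\mu\nu} \frac{dx^\mu}{dt} \frac{d}{dt}(\delta x^\nu) + g_{\mu\nu} \frac{dx^\nu}{dt} \frac{d}{dt}(\delta x^\mu) + \frac{\partial g_{\mu\nu}}{\partial x^\sigma} (\delta x^\sigma) \frac{dx^\mu}{dt} \frac{dx^\nu}{dt} \right] dt .
\]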
Since arc length is independent of the parametrization of a curve, one can choose the most
convenient parameter for the calculation. Theorem 3.3.5 indicates that no matter what the
old parameter is (denoted by t̃ for now), we can always choose a new parameter t = t(t̃)
such that the length of the tangent vector at each point of the curve is normalized, i.e.,
\( g_{\mu\nu} \frac{dx^\mu}{dt} \frac{dx^\nu}{dt} = 1 \) (namely the arc length parameter). Also, noticing the symmetry of gμν , the
equation above can then be simplified as
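\[
\begin{aligned}
\delta l &= \int_{t_1}^{t_2} \left[ g_{\mu\nu} \frac{dx^\mu}{dt} \frac{d}{dt}(\delta x^\nu) + \frac{1}{2} \frac{\partial g_{\mu\nu}}{\partial x^\sigma} (\delta x^\sigma) \frac{dx^\mu}{dt} \frac{dx^\nu}{dt} \right] dt \\
&= \int_{t_1}^{t_2} \left[ \frac{d}{dt} \left( g_{\mu\nu} \frac{dx^\mu}{dt} \delta x^\nu \right) - \frac{d}{dt} \left( g_{\mu\nu} \frac{dx^\mu}{dt} \right) \delta x^\nu + \frac{1}{2} \frac{\partial g_{\mu\nu}}{\partial x^\sigma} (\delta x^\sigma) \frac{dx^\mu}{dt} \frac{dx^\nu}{dt} \right] dt \\
&= \int_{t_1}^{t_2} \left[ - \frac{d}{dt} \left( g_{\mu\sigma} \frac{dx^\mu}{dt} \right) + \frac{1}{2} \frac{\partial g_{\mu\nu}}{\partial x^\sigma} \frac{dx^\mu}{dt} \frac{dx^\nu}{dt} \right] (\delta x^\sigma) \, dt ,
\end{aligned}
\]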
where in the last step we used the premise that δx σ vanishes at C(t1 ) and C(t2 ). The equation
above indicates that the necessary and sufficient condition for δl to vanish for any δx σ is
that
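\[
\begin{aligned}
0 &= - \frac{d}{dt} \left( g_{\mu\sigma} \frac{dx^\mu}{dt} \right) + \frac{1}{2} \frac{\partial g_{\mu\nu}}{\partial x^\sigma} \frac{dx^\mu}{dt} \frac{dx^\nu}{dt} \\
&= - g_{\mu\sigma} \frac{d^2 x^\mu}{dt^2} - \frac{\partial g_{\mu\sigma}}{\partial x^\nu} \frac{dx^\nu}{dt} \frac{dx^\mu}{dt} + \frac{1}{2} \frac{\partial g_{\mu\nu}}{\partial x^\sigma} \frac{dx^\mu}{dt} \frac{dx^\nu}{dt} .
\end{aligned}
\]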
Contracting this equation with g ρσ yields
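\[
\begin{aligned}
0 &= - \frac{d^2 x^\rho}{dt^2} - g^{\rho\sigma} \left( g_{\mu\sigma,\nu} - \frac{1}{2} g_{\mu\nu,\sigma} \right) \frac{dx^\mu}{dt} \frac{dx^\nu}{dt} \\
&= - \frac{d^2 x^\rho}{dt^2} - \frac{1}{2} g^{\rho\sigma} (g_{\sigma\mu,\nu} + g_{\nu\sigma,\mu} - g_{\mu\nu,\sigma}) \frac{dx^\mu}{dt} \frac{dx^\nu}{dt} \\
&= - \frac{d^2 x^\rho}{dt^2} - \Gamma^\rho{}_{\mu\nu} \frac{dx^\mu}{dt} \frac{dx^\nu}{dt} .
\end{aligned}
\]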
This is exactly the coordinate expression for the geodesic equation (3.3.1).
As a problem for thinking, the reader may consider what result it leads to if we do not set
\( g_{\mu\nu} \frac{dx^\mu}{dt} \frac{dx^\nu}{dt} = 1 \).
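The variational result can be cross-checked symbolically. The sketch below is not the calculation of the text: instead of the length functional it uses the quadratic Lagrangian L = ½ g_{μν} ẋ^μ ẋ^ν, whose variation has the same integrand as the simplified δl once the arc length parameter is chosen. The unit 2-sphere and all variable names are assumptions made for the example.

```python
import sympy as sp
from sympy.calculus.euler import euler_equations

t = sp.symbols('t')
theta = sp.Function('theta')(t)
phi = sp.Function('phi')(t)

# Quadratic Lagrangian (1/2) g_{μν} x'^μ x'^ν for the unit 2-sphere (swapped in
# for the length functional; see the lead-in above).
L = (theta.diff(t)**2 + sp.sin(theta)**2 * phi.diff(t)**2) / 2

# euler_equations returns Eq(∂L/∂x - d/dt(∂L/∂x'), 0), one per function, in order.
eqs = euler_equations(L, [theta, phi], t)

# Left-hand sides of the geodesic equation (3.3.1) with the sphere's Christoffel symbols.
geo_theta = theta.diff(t, 2) - sp.sin(theta) * sp.cos(theta) * phi.diff(t)**2
geo_phi = phi.diff(t, 2) + 2 * sp.cos(theta) / sp.sin(theta) * theta.diff(t) * phi.diff(t)

# The Euler-Lagrange expressions equal minus g_{μσ} times the geodesic expressions.
residual_theta = sp.simplify(eqs[0].lhs + geo_theta)
residual_phi = sp.simplify(eqs[1].lhs + sp.sin(theta)**2 * geo_phi)
print(residual_theta, residual_phi)
```

Both residuals simplify to zero, which mirrors the step in the text where contracting with g^{ρσ} turns the Euler–Lagrange expression into the geodesic equation.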
the curve. There are certainly no conjugate points in Euclidean space, and therefore a
straight line (segment) is the shortest between two points.
And then we discuss the case where gab is a Lorentzian metric. We first look at
Minkowski spacetime as the simplest example. We have said that straight lines and
geodesics are synonymous in Minkowski spacetime. Suppose p and q are connected
by a timelike geodesic γ . Is it the shortest curve between p and q? No. Since the
length of a null curve is zero, no timelike curve C is the shortest: one can always
modify it slightly into a timelike curve C′ that is close enough to null that its
length is less than that of C (see Fig. 3.4). In fact, not only is a timelike geodesic γ not
the shortest, but it is also the longest curve between p and q. Here we show it in
2-dimensional Minkowski spacetime as an example (it can be easily carried over to
an arbitrary dimensional Minkowski spacetime). Since the parametric representation
x μ (t) of γ are linear functions, by performing a translation and a boost [(2.5.19),
(2.5.20)] of the Lorentzian coordinates, we can choose a Lorentzian system {x 0 , x 1 }
that can make the coordinate line of x 0 coincide with γ . Suppose C is an arbitrary
timelike non-geodesic between p and q, we can use a lot of constant-x 0 lines to divide
γ into many line segments (see Fig. 3.5). From the expression for a Minkowski line
element we can see that the arc length of the line segments pa and pb, respectively,
are
\[ dl_{pa} = \sqrt{-ds^2} = \sqrt{-[-(dx^0)^2 + 0]} = dx^0 , \]
\[ dl_{pb} = \sqrt{-[-(dx^0)^2 + (dx^1)^2]} < dx^0 = dl_{pa} . \]
This result can also be applied to any other line segment, and thus lγ > lC , i.e., a
timelike geodesic is the longest timelike curve between two points in Minkowski
spacetime. In other words, a (timelike) straight line (segment) is the longest between
two points in Minkowski spacetime. And since the longest curve must be a geodesic,
the necessary and sufficient condition for a timelike curve between two points in a
Minkowski spacetime to be the longest is that it is a geodesic. Now let us talk about
a general spacetime. Suppose C is the timelike curve between p and q that has the
greatest length, then it follows from Theorem 3.3.6 that it is a geodesic. However,
the converse is not necessarily true, because Theorem 3.3.6 only assures that the
length of a geodesic between p and q is an extremum, but does not guarantee that it
is a maximum. (Of course, it is definitely not a minimum either since the length of
a null curve is zero.) It can be proved that the necessary and sufficient condition for
the length of a geodesic in an arbitrary spacetime to be a maximum is that there is
no pair of conjugate points on the curve. Summary: for two points that are timelike
related in any spacetime: ① the longest curve between them is a timelike geodesic;
② a timelike geodesic between them is not necessarily the longest curve (though for
Minkowski spacetime it certainly is); ③ there is no shortest timelike curve between
them.
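The statements in this summary can be checked numerically in 2-dimensional Minkowski spacetime. The following sketch approximates timelike curves by piecewise-straight paths and sums the segment lengths \(\sqrt{(\Delta x^0)^2 - (\Delta x^1)^2}\); the function name and the three sample paths are hypothetical choices for illustration.

```python
import math

def timelike_length(points):
    """Length of a piecewise-straight timelike curve given as (x0, x1) vertices.

    Each segment contributes sqrt((Δx0)^2 - (Δx1)^2), using signature (-, +).
    """
    total = 0.0
    for (t0, x0), (t1, x1) in zip(points, points[1:]):
        dt, dx = t1 - t0, x1 - x0
        interval = dt * dt - dx * dx
        if interval <= 0:
            raise ValueError("segment is not timelike")
        total += math.sqrt(interval)
    return total

# Three timelike curves from p = (0, 0) to q = (2, 0).
straight = [(0.0, 0.0), (2.0, 0.0)]                    # the geodesic
bent = [(0.0, 0.0), (1.0, 0.6), (2.0, 0.0)]            # a moderate detour
nearly_null = [(0.0, 0.0), (1.0, 0.999), (2.0, 0.0)]   # close to a null path

for name, path in [("straight", straight), ("bent", bent), ("nearly null", nearly_null)]:
    print(name, timelike_length(path))
```

The straight path has the greatest length, and the length can be pushed arbitrarily close to zero by bending the path toward the null directions, so there is no shortest timelike curve.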
[Optional Reading 3.3.1]
Using geodesics we can define two useful concepts; namely, the exponential map of a gen-
eralized Riemannian space (M, gab ) and Riemannian normal coordinates.
The exponential map of p ∈ M is a map from V p (or a subset of it) to a manifold M,
denoted by
exp p : V p (or a subset of it) → M ,
defined as follows: ∀va ∈ V p , ( p, va ) determines a unique geodesic γ (t). If we set the affine
parameter t as zero at p, then the image of va under the map exp p is defined as the point
with t = 1 on the geodesic, i.e., exp p (va ) := γ (1). Suppose 0 is the zero element of V p .
Since the unique geodesic determined by ( p, 0) maps all the points of R (or an interval of
it) to p, we have exp p (0) = p. However, if we remove the point γ (1) from M, i.e., we use
M − {γ (1)} as the background manifold (see Fig. 3.6), then va has no image under the map
exp p . Therefore, the domain of the exponential map can only be a subset of V p , denoted by
V̂ p , i.e., exp p : V̂ p → M. Figure 3.7 indicates that two geodesics γ (t) and γ′(t) determined
by ( p, va ) and ( p, v′a ) intersect at q. Choosing the magnitude of va and v′a appropriately,
one can make q = γ (1) = γ′(1), so that
Thus, in this case exp p is not a one-to-one map. Since we have removed a point as shown in
Fig. 3.6, there is no u a ∈ V p for q such that q = exp p (u a ); thus, in this case exp p is not an
onto map. However, it can be proved that as long as we add proper constraints on the domain
and the range of exp p , it will be not only one-to-one and onto, but also a diffeomorphism.
See the following theorem:
Theorem 3.3.7 ∀ p ∈ M, one can always find an open subset V̂ p that contains the zero
element in the tangent space V p of p (regarded as an n dimensional manifold), and find an
open subset N of M that contains p such that exp p : V̂ p → N is a diffeomorphism (see
Fig. 3.8).
Proof Without loss of generality, we may consider p = γ (0). Denote v1a ≡ (∂/∂t)a | p , q1 ≡
γ (1), then q1 = exp p (v1a ). Suppose q is an arbitrary point on γ (t), q ≡ γ (tq ). Performing
a reparametrization to γ (t) by choosing a new parameter t′ = α⁻¹ t (α = constant) yields
a geodesic γ′(t′) = γ (t). Choosing an appropriate constant α we can make γ′(1) = q, and
hence q = exp p (va ), where
Theorem 3.3.9 The Christoffel symbol of the connection ∇a of (M, gab ) in the Riemannian
normal coordinate system at p satisfies \( \Gamma^c{}_{ab}|_p = 0 \).
Proof Any geodesic γ (t) that passes through p can be expressed using the Riemannian
normal coordinate system at p as
\[ \frac{d^2 x^\mu}{dt^2} + \Gamma^\mu{}_{\nu\sigma} \frac{dx^\nu}{dt} \frac{dx^\sigma}{dt} = 0 , \quad \mu = 1, \ldots, n . \]
Since a Riemannian normal coordinate system (N , ψ) maps γ (t) into a straight line in Rn ,
we have d2 x μ /dt 2 = 0. Thus,
\[ \Gamma^\mu{}_{\nu\sigma} \frac{dx^\nu}{dt} \frac{dx^\sigma}{dt} = 0 , \quad \mu = 1, \ldots, n . \]
Any geodesic γ (t) that passes through p can be expressed by the above equation. Using T a
to represent the tangent vector of the geodesic at p, then the above equation gives
\[ \Gamma^\mu{}_{\nu\sigma}|_p \, T^\nu T^\sigma = 0 , \quad \mu = 1, \ldots, n . \]
For each μ, the left-hand side of the above equation is a quadratic polynomial with respect
to n variables T ν , and the fact that it vanishes for any T ν forces all the coefficients (which
are symmetric in ν and σ) to be zero, i.e., \( \Gamma^\mu{}_{\nu\sigma}|_p = 0 \), ν, σ = 1, . . . , n, and therefore \( \Gamma^a{}_{bc}|_p = 0 \).
Theorem 3.4.2 indicates that (∇a ∇b − ∇b ∇a ) is a linear map that turns a dual
vector ωc | p at p into a tensor [(∇a ∇b − ∇b ∇a )ωc ]| p of type (0, 3). The way of doing
this is: extend ωc | p arbitrarily into a dual vector field ωc defined on a neighborhood
of p, evaluate (∇a ∇b − ∇b ∇a )ωc , and then taking the value of it at p we obtain
the image of the map. Theorem 3.4.2 assures that this image does not depend on
the choice of extension. Therefore, (∇a ∇b − ∇b ∇a ) corresponds to a tensor of type
(1, 3) at p, called the Riemann curvature tensor, denoted by Rabc d . Since p is
arbitrary, Rabc d is also a tensor field. Hence, we have:
3.4 The Riemann Curvature Tensor 93
Therefore, Euclidean space and Minkowski space are called flat spaces. In fact,
Minkowski space is similar to Euclidean space in many ways, and thus is also called
a pseudo-Euclidean space.
Equation (3.4.3) reflects the non-commutativity of a derivative operator acting on
a dual vector field. From this we can deduce the non-commutativity of a derivative
operator acting on a tensor field of an arbitrary type T c1 ···ck d1 ···dl , i.e., express (∇a ∇b −
∇b ∇a )T c1 ···ck d1 ···dl in terms of Rabc d . We have the following theorems:
Theorem 3.4.4
Proof ∀ωc ∈ F (0, 1), we have vc ωc ∈ F ; hence, it follows from the torsion-free
condition that
Thus, ωc (∇a ∇b − ∇b ∇a )vc = −vc (∇a ∇b − ∇b ∇a )ωc = −vc Rabc d ωd = −ωc Rabd c
vd , and therefore we get (3.4.4).
\[
(\nabla_a \nabla_b - \nabla_b \nabla_a) T^{c_1 \cdots c_k}{}_{d_1 \cdots d_l} = - \sum_{i=1}^{k} R_{abe}{}^{c_i} T^{c_1 \cdots e \cdots c_k}{}_{d_1 \cdots d_l} + \sum_{j=1}^{l} R_{abd_j}{}^{e} T^{c_1 \cdots c_k}{}_{d_1 \cdots e \cdots d_l} . \tag{3.4.5}
\]
Proof Omitted.
Theorem 3.4.6 A Riemann curvature tensor has the following properties [NB: (1)
and (4) are general, (2), (3) and (5) require the torsion-free condition] :
if there is a metric field gab on M and ∇a gbc = 0, then we can define Rabcd ≡
gde Rabc e , which also satisfies
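\[
\begin{aligned}
\nabla_a (\nabla_b \omega_c) &= \partial_a (\nabla_b \omega_c) - \Gamma^d{}_{ab} \nabla_d \omega_c - \Gamma^d{}_{ac} \nabla_b \omega_d \\
&= \partial_a (\partial_b \omega_c - \Gamma^e{}_{bc} \omega_e) - \Gamma^d{}_{ab} \nabla_d \omega_c - \Gamma^d{}_{ac} \nabla_b \omega_d \\
&= (\partial_a \partial_b \omega_c - \Gamma^e{}_{bc} \partial_a \omega_e - \omega_e \partial_a \Gamma^e{}_{bc}) - \Gamma^d{}_{ab} \nabla_d \omega_c - \Gamma^d{}_{ac} \nabla_b \omega_d ,
\end{aligned} \tag{3.4.12}
\]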
and hence
where |d| in the lower indices [ab|d|c] indicates that d does not participate in the
antisymmetrization. Noticing that ∂a ∂b ωc = ∂b ∂a ωc and \( \Gamma^e{}_{bc} = \Gamma^e{}_{cb} \), we see from
Theorem 2.6.2 (c) that each term on the right-hand side of the above equation vanishes.
(3) To prove (3.4.8), we only have to show that ωe ∇[a Rbc]d e = 0 ∀ωe ∈ F (0, 1).
Since
one has
To derive the sum of the first two terms on the right, first we write out the expression
without the square bracket
where in the second equality we used (3.4.5). Antisymmetrizing the lower indices
a, b, c, and noticing (3.4.7), then we have
which indicates that the right-hand side of (3.4.13) vanishes. Therefore, ωe ∇[a Rbc]d e
= 0.
(4) Applying (3.4.5) to gcd , it follows from ∇a gcd = 0 that
Remark 1 Suppose dim M = n, then Rabcd has in total n 4 components Rμνσρ . How-
ever, since the algebraic equations (3.4.6), (3.4.7), (3.4.9) and (3.4.10) are satisfied,
the number of independent components is only [for a proof, see Bergmann (1976)
pp. 172–174]
\[ N = \frac{n^2 (n^2 - 1)}{12} . \]
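For quick reference, the count N can be tabulated for low dimensions. This is a small sketch; the function name is our own.

```python
def riemann_independent_components(n: int) -> int:
    """Number of independent components of R_abcd in dimension n: n^2 (n^2 - 1) / 12."""
    # n^2 (n^2 - 1) is always divisible by 12, so integer division is exact.
    return n * n * (n * n - 1) // 12

for n in (2, 3, 4):
    print(n, riemann_independent_components(n))
```

This gives N = 1, 6 and 20 for n = 2, 3 and 4, to be compared with the n⁴ = 16, 81 and 256 components in total.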
After a metric is chosen, each tensor Tab of type (0, 2) corresponds to a tensor
T a b ≡ g ac Tcb of type (1, 1), which is nothing but a linear transformation on a vector
space. The components of this linear transformation in an arbitrary basis form a
matrix, and the matrices in different bases are similar to each other; hence, they have
the same trace, whose value is T a a = g ac Tac , called the trace of the tensor T a b , also
called the trace of Tab . Similarly, for a given tensor Rabcd of type (0, 4), we can
in principle obtain the following six “traces” through contraction [each “trace” is a
tensor of type (0, 2)]: g ab Rabcd , g ac Rabcd , g ad Rabcd , g bc Rabcd , g bd Rabcd , g cd Rabcd .
However, due to the properties of Rabcd which come from lowering the upper index
of the Riemann tensor Rabc d [(1), (4), (5) of Theorem 3.4.6] and the symmetry of
g ac , it is easy to see from (d) of Theorem 2.6.2 that the first and the sixth contractions
above vanish; the second and the fifth are equal (reason: g ac Rabcd = g ac Rbadc , which
is essentially the same as g bd Rabcd ; we do not write g ac Rabcd = g bd Rabcd only because
we need to take care of the balance of indices); the third and the fourth are equal,
and they are the negative of the second and the fifth ones. Hence, among these
six contractions there is only a single independent one; we can take, for example,
g bd Rabcd , denoted by Rac , called the Ricci tensor. What should be emphasized is
that we do not need a metric to define the Ricci tensor since Rac ≡ Rabc b is endowed
with a clear meaning. We can also take the trace of Rac using the metric, i.e., g ac Rac ,
denoted by R, called the scalar curvature. From (3.4.10), it is easy to show that
Rac = Rca . Besides, one should also be acquainted with the traceless part of Rabc d ,
which is called the Weyl tensor, defined as follows:
Definition 2 For a generalized Riemannian space of dimension n ≥ 3, the Weyl
tensor Cabcd is defined by the following expression:
\[ C_{abcd} := R_{abcd} - \frac{2}{n-2} (g_{a[c} R_{d]b} - g_{b[c} R_{d]a}) + \frac{2}{(n-1)(n-2)} R \, g_{a[c} g_{d]b} . \tag{3.4.14} \]
Proof Exercise.
Remark 2 Equation (3.4.14) indicates that Rabcd is the summation of its traceless
part Cabcd and its trace part
\[ \frac{2}{n-2} (g_{a[c} R_{d]b} - g_{b[c} R_{d]a}) - \frac{2}{(n-1)(n-2)} R \, g_{a[c} g_{d]b} . \]
\[ G_{ab} := R_{ab} - \frac{1}{2} R g_{ab} . \tag{3.4.16} \]
Theorem 3.4.8
\[ \nabla^a G_{ab} = 0 \quad (\text{where } \nabla^a G_{ab} \equiv g^{ac} \nabla_c G_{ab}) . \tag{3.4.17} \]
Proof From the Bianchi identity (3.4.8) and (3.4.6) we have 0 = ∇a Rbcd e + ∇c Rabd e
+ ∇b Rcad e . Contracting indices a and e yields 0 = ∇a Rbcd a + ∇c Rabd a + ∇b Rcad a
= ∇a Rbcd a − ∇c Rbd + ∇b Rcd . Acting g bd on it we get
Equation (3.4.17), which the Einstein tensor satisfies, is significant for establishing
Einstein’s equation of general relativity; for details, see Sect. 7.7.
Suppose M has a given metric gab , from ∇a gbc = 0 a unique connection ∇a is deter-
mined, and thus we have a Riemann tensor Rabc d . A common problem is to compute
Rabc d from the given gab . Computing a tensor means deriving its components in
a certain basis. There are two types of basis: coordinate basis and non-coordinate
basis. In this section, we only talk about the method of computing curvature using
a coordinate basis; the methods using non-coordinate bases are introduced in Sects.
5.7 and 8.7.
After we choose an arbitrary coordinate system, the components gμν of the metric
are then known, and in this coordinate system the connection ∇a satisfying ∇a gbc = 0
can be characterized by its Christoffel symbol in this system:
\[ \Gamma^\sigma{}_{\mu\nu} = \frac{1}{2} g^{\sigma\rho} (g_{\rho\mu,\nu} + g_{\nu\rho,\mu} - g_{\mu\nu,\rho}) \quad [\text{i.e., (3.2.10$'$)}] . \tag{3.4.19} \]
\( \Gamma^\sigma{}_{\mu\nu} \) has three component indices, and thus \( \{ \Gamma^\sigma{}_{\mu\nu} \} \) contains n³ numbers. The
symmetry \( \Gamma^\sigma{}_{\mu\nu} = \Gamma^\sigma{}_{\nu\mu} \) means that only n²(n + 1)/2 among the n³ numbers are
independent (when n = 4 there are 40 independent numbers). The first step for the
calculation is to derive all the nonvanishing \( \Gamma^\sigma{}_{\mu\nu} \) from the given gμν .
From the definition of the Riemann tensor we have Rabc d ωd = 2∇[a ∇b] ωc , where
∇a ∇b ωc can be expressed in six terms using (3.4.12) (there are five terms in this
equation, and the fifth term can be expanded into two terms, i.e., \( \nabla_b \omega_d = \partial_b \omega_d - \Gamma^e{}_{bd} \omega_e \)).
Antisymmetrizing the indices a, b in each term, and noting that ∂[a ∂b] ωc = 0,
\( \Gamma^d{}_{[ab]} = \Gamma^d{}_{[(ab)]} = 0 \), we obtain
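\[
\begin{aligned}
R_{abc}{}^d \omega_d &= 2 ( - \Gamma^e{}_{c[b} \partial_{a]} \omega_e - \omega_e \partial_{[a} \Gamma^e{}_{b]c} - \Gamma^d{}_{c[a} \partial_{b]} \omega_d + \Gamma^d{}_{c[a} \Gamma^e{}_{b]d} \omega_e ) \\
&= -2 \omega_d \partial_{[a} \Gamma^d{}_{b]c} + 2 \Gamma^e{}_{c[a} \Gamma^d{}_{b]e} \omega_d , \quad \forall \omega_d \in \mathscr{F}(0, 1) .
\end{aligned}
\]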
Hence,
\[ R_{abc}{}^d = -2 \partial_{[a} \Gamma^d{}_{b]c} + 2 \Gamma^e{}_{c[a} \Gamma^d{}_{b]e} , \tag{3.4.20} \]
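\[ R_{\mu\nu\sigma}{}^\rho = \Gamma^\rho{}_{\mu\sigma,\nu} - \Gamma^\rho{}_{\nu\sigma,\mu} + \Gamma^\lambda{}_{\sigma\mu} \Gamma^\rho{}_{\nu\lambda} - \Gamma^\lambda{}_{\sigma\nu} \Gamma^\rho{}_{\mu\lambda} , \tag{3.4.20$'$} \]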
where \( \Gamma^\rho{}_{\mu\sigma,\nu} \equiv \partial \Gamma^\rho{}_{\mu\sigma} / \partial x^\nu \). From the equation above we can also obtain the expres-
sion for the coordinate components of the Ricci tensor
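\[ R_{\mu\sigma} = R_{\mu\nu\sigma}{}^\nu = \Gamma^\nu{}_{\mu\sigma,\nu} - \Gamma^\nu{}_{\nu\sigma,\mu} + \Gamma^\lambda{}_{\mu\sigma} \Gamma^\nu{}_{\lambda\nu} - \Gamma^\lambda{}_{\nu\sigma} \Gamma^\nu{}_{\lambda\mu} . \tag{3.4.21} \]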
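The coordinate-basis procedure (metric components, then Christoffel symbols, then curvature components) can be sketched with SymPy as follows. The unit 2-sphere, the function names `Gamma`, `riemann`, `ricci`, and the index ordering of their arguments are assumptions made for this example; the formulas coded are the component expressions for the Christoffel symbol, the Riemann tensor and the Ricci tensor given above.

```python
import sympy as sp

theta, phi = sp.symbols('theta phi', positive=True)
x = [theta, phi]
g = sp.diag(1, sp.sin(theta)**2)   # unit 2-sphere (assumed example metric)
g_inv = g.inv()
n = len(x)

def Gamma(s, m, k):
    """Γ^σ_{μν} with σ=s, μ=m, ν=k."""
    return sum(g_inv[s, r] * (sp.diff(g[r, m], x[k]) + sp.diff(g[k, r], x[m])
                              - sp.diff(g[m, k], x[r])) for r in range(n)) / 2

def riemann(m, v, s, r):
    """R_{μνσ}^ρ = Γ^ρ_{μσ,ν} - Γ^ρ_{νσ,μ} + Γ^λ_{σμ} Γ^ρ_{νλ} - Γ^λ_{σν} Γ^ρ_{μλ}."""
    expr = sp.diff(Gamma(r, m, s), x[v]) - sp.diff(Gamma(r, v, s), x[m])
    expr += sum(Gamma(l, s, m) * Gamma(r, v, l) - Gamma(l, s, v) * Gamma(r, m, l)
                for l in range(n))
    return sp.simplify(expr)

def ricci(m, s):
    """R_{μσ} = R_{μνσ}^ν (contraction of the second and fourth indices)."""
    return sp.simplify(sum(riemann(m, v, s, v) for v in range(n)))

scalar_curvature = sp.simplify(sum(g_inv[m, s] * ricci(m, s)
                                   for m in range(n) for s in range(n)))
print(ricci(0, 0), ricci(1, 1), scalar_curvature)
```

For the unit sphere this yields R_θθ = 1, R_φφ = sin²θ and scalar curvature R = 2, i.e., R_ab = g_ab in this example.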
Theorem 3.4.9 A metric field gab is (locally) flat (i.e., Rabc d = 0) if and only if there exists
a coordinate system such that the coordinate components of gab are all constants.
Proof The proof of this theorem requires techniques that we have not covered yet, see
Appendix J of Volume III.
where we used the fact that g [μλ] = 0 in the last step. This equation can be rewritten as
\[ \Gamma^\mu{}_{\mu\sigma} = \frac{1}{2} g^{\mu\lambda} \frac{\partial g_{\mu\lambda}}{\partial x^\sigma} . \tag{3.4.22} \]
On the other hand, the determinant g of the matrix constituted by gμλ can be expanded with
respect to the μth row as g = gμλ Aμλ (where Aμλ is the cofactor of gμλ , and the sum is
only taken over λ); hence, ∂g/∂gμλ = Aμλ . Thus, from the expression for the inverse matrix
elements g μλ = Aλμ /g we have
\[ \frac{\partial g}{\partial g_{\mu\lambda}} = g g^{\mu\lambda} . \tag{3.4.23} \]
Since gμλ are functions of the coordinates x σ , g is a function of x σ as well, and
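\[ \frac{\partial g}{\partial x^\sigma} = \frac{\partial g}{\partial g_{\mu\lambda}} \frac{\partial g_{\mu\lambda}}{\partial x^\sigma} = g g^{\mu\lambda} \frac{\partial g_{\mu\lambda}}{\partial x^\sigma} , \tag{3.4.24} \]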
where (3.4.23) is used in the last step. Combining (3.4.22) and (3.4.24) yields
3.5 The Intrinsic Curvature and the Extrinsic Curvature 99
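\[ \Gamma^\mu{}_{\mu\sigma} = \frac{1}{2g} \frac{\partial g}{\partial x^\sigma} = \frac{1}{\sqrt{|g|}} \frac{\partial \sqrt{|g|}}{\partial x^\sigma} . \tag{3.4.25} \]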
This is the expression for the “contracted Christoffel symbol”. The divergence ∇a va (as a
scalar field) can be derived by means of an arbitrary basis. Using the coordinate basis, it is
easy to derive from (3.4.25) and \( \nabla_a v^a = \partial_a v^a + \Gamma^a{}_{ab} v^b \) that
\[ \nabla_a v^a = \frac{1}{\sqrt{|g|}} \frac{\partial}{\partial x^\sigma} ( \sqrt{|g|} \, v^\sigma ) . \tag{3.4.26} \]
As an example of the application, now we derive the expression for the divergence ∇ · v of
a vector field v in the 3-dimensional Euclidean space in both the Cartesian and the spherical
coordinate systems. First, we rewrite the above equation as
\[ \vec\nabla \cdot \vec v = \nabla_a v^a = \frac{1}{\sqrt{|g|}} \frac{\partial}{\partial x^i} ( \sqrt{|g|} \, v^i ) . \tag{3.4.27} \]
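(1) For a Cartesian coordinate system, g = 1, and
\[ \vec\nabla \cdot \vec v = \frac{\partial v^i}{\partial x^i} = \frac{\partial v^1}{\partial x^1} + \frac{\partial v^2}{\partial x^2} + \frac{\partial v^3}{\partial x^3} ; \]
this is the familiar formula for divergence.
(2) For a spherical coordinate system, \( \sqrt{g} = r^2 \sin\theta \), and
\[
\begin{aligned}
\vec\nabla \cdot \vec v &= \frac{1}{r^2 \sin\theta} \frac{\partial}{\partial x^i} (v^i r^2 \sin\theta) \\
&= \frac{1}{r^2 \sin\theta} \left[ \frac{\partial (v^1 r^2 \sin\theta)}{\partial r} + \frac{\partial (v^2 r^2 \sin\theta)}{\partial \theta} + \frac{\partial (v^3 r^2 \sin\theta)}{\partial \varphi} \right] ,
\end{aligned} \tag{3.4.28}
\]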
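The divergence formula in terms of √|g| is easy to test symbolically. In the sketch below the components v^i are coordinate-basis components in spherical coordinates (r, θ, φ), not components in an orthonormal basis; the function name and the sample vector fields are assumptions for illustration.

```python
import sympy as sp

r, theta, phi = sp.symbols('r theta phi', positive=True)
coords = [r, theta, phi]

# Euclidean metric in spherical coordinates and sqrt|g| (taking 0 < θ < π, so sin θ > 0).
g = sp.diag(1, r**2, r**2 * sp.sin(theta)**2)
sqrt_abs_g = r**2 * sp.sin(theta)

def divergence(v):
    """∇_a v^a = (1/sqrt|g|) ∂_i (sqrt|g| v^i) for coordinate components v = [v^r, v^θ, v^φ]."""
    return sp.simplify(sum(sp.diff(sqrt_abs_g * v[i], coords[i]) for i in range(3))
                       / sqrt_abs_g)

print(divergence([r, 0, 0]))   # position vector field r ∂/∂r
print(divergence([0, 0, 1]))   # rotation generator ∂/∂φ
```

The position vector field r ∂/∂r has divergence 3, the same value obtained in Cartesian coordinates for x ∂/∂x + y ∂/∂y + z ∂/∂z, while the rotation field ∂/∂φ is divergence-free.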
According to our intuition, a plane is flat while a curved surface is not. More precisely,
these “flat” and “curved” surfaces in our mind are all 2-dimensional surfaces (such as
spherical and cylindrical surfaces) embedded in the 3-dimensional Euclidean space.
Now we ask: given an n-dimensional manifold, can we talk about if it is curved by
following the same idea? As long as it can be embedded into an (n + 1)-dimensional
manifold, the answer will be yes. The curvature defined by embedding a manifold in
a manifold with one extra dimension is called the “extrinsic curvature”, which has
a precise definition (for details see Chap. 14). According to this definition, both
a sphere and a cylindrical surface in 3-dimensional Euclidean space have a nonzero
extrinsic curvature, which tallies with our intuition. However, the Riemann curvature we
introduced in this chapter is the intrinsic curvature, which reflects the “intrinsic warping”
of a manifold M after a connection ∇a is assigned. Unlike the extrinsic curvature,
there is no need to embed M in a one-higher dimensional manifold to tell the intrinsic
curvature. [Generally speaking, any property of (M, gab ) that can be determined by
just gab (without having to embed the manifold in a higher dimensional manifold)
is called an intrinsic property of (M, gab ).] The term “intrinsic curvature” actu-
ally just reflects the following three equivalent properties; a generalized Riemannian
space with these properties is called a curved space.
(1) The non-commutativity of the derivative operator, i.e., (∇a ∇b − ∇b ∇a )ωc =
Rabc d ωd , ∀ωc ∈ F (0, 1), where the nonvanishing tensor field Rabc d is used as the
definition of the intrinsic (Riemann) curvature, see Sect. 3.4.
(2) The curve-dependence of the parallel transport of a vector.
As we have discussed in Sect. 3.2, for two points p and q in (M, ∇a ), there
exists a curve-dependent translation map between their tangent spaces V p and Vq ;
that is, for a curve between p and q, any vector $v^a$ at p determines a vector field
$\tilde v^a$ (satisfying $\tilde v^a|_p = v^a$) parallelly transported along the curve, whose value at q can
be defined as the image of $v^a$. In other words, $\tilde v^a|_q$ is the result of $v^a$ parallelly
transported to q. For Euclidean, Minkowski and any other flat space, this parallel
transport is curve-independent; thus, there is no need to specify a curve when we
talk about “parallelly transporting a vector at p to q”. This simplicity is called the
absoluteness of the parallel transport, which is pretty familiar to us. (Do you specify a
curve when you parallel transport a vector from a point to another point in Euclidean
space?) However, it is not as simple for a curved space. It can be proved that [see
Wald (1984) pp. 37–38; Straumann (1984) Theorem 5.7] a necessary and sufficient
condition for the intrinsic curvature Rabc d to be nonvanishing is that there exists a
closed curve such that a vector at a point on the curve will not return to itself when
parallelly transported along the curve; therefore, the parallel transport depends on
a curve (there is only a curve-dependent concept of parallel transport). Spherical
geometry provides a simple but intuitive example of this phenomenon:
(3) There exist geodesics that are initially parallel but do not remain parallel.
The two meridians in Fig. 3.10 give an intuitive example. For the precise meaning
see Sect. 7.6.
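Property (2) can be illustrated numerically. The following is a minimal sketch, not taken from the text, assuming the unit sphere with metric $ds^2 = d\theta^2 + \sin^2\theta\, d\varphi^2$ and a closed curve given by the latitude circle $\theta = \theta_0$. In the orthonormal frame $(e_\theta, e_\varphi)$ the parallel transport equations along this circle reduce to $da/d\varphi = \cos\theta_0\, b$, $db/d\varphi = -\cos\theta_0\, a$; the function name and step count are illustrative choices.

```python
import math

def transport_around_latitude(theta0, steps=2000):
    """Parallel-transport the unit vector e_theta once around the latitude
    circle theta = theta0 of the unit sphere and return its final components
    (a, b) in the orthonormal frame (e_theta, e_phi)."""
    c = math.cos(theta0)
    a, b = 1.0, 0.0                      # initial vector: e_theta
    h = 2.0 * math.pi / steps            # step in the curve parameter phi
    for _ in range(steps):
        # classical RK4 step for the linear system (a, b)' = (c*b, -c*a)
        k1a, k1b = c * b, -c * a
        k2a, k2b = c * (b + 0.5 * h * k1b), -c * (a + 0.5 * h * k1a)
        k3a, k3b = c * (b + 0.5 * h * k2b), -c * (a + 0.5 * h * k2a)
        k4a, k4b = c * (b + h * k3b), -c * (a + h * k3a)
        a += h * (k1a + 2 * k2a + 2 * k3a + k4a) / 6.0
        b += h * (k1b + 2 * k2b + 2 * k3b + k4b) / 6.0
    return a, b

# Latitude theta0 = pi/3: the vector comes back rotated by 2*pi*cos(theta0) = pi.
print(transport_around_latitude(math.pi / 3))
# Equator (a geodesic): the vector returns to itself.
print(transport_around_latitude(math.pi / 2))
```

On a flat plane the analogous computation would return the initial vector for every closed curve, which is the absoluteness of parallel transport described above.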
The curvature tensor field Rabc d of a flat space vanishes, and thus it does not
have any of the three properties above. Specifically, ① the derivative operator $\partial_a$
associated with the flat metric (i.e., the ordinary derivative operator of a Cartesian
or Lorentzian system) is commutative; ② the parallel transport
of a vector does not depend on the curve, and thus one can talk about the “absolute
parallel transport” of a vector; ③ parallel lines will never intersect.
The intrinsic curvature and the extrinsic curvature are two different concepts. For
instance, a 2-dimensional cylindrical surface in 3-dimensional Euclidean space has
a nonzero extrinsic curvature but a zero intrinsic curvature. A cylindrical surface can
be viewed as the part between two parallel lines l1 and l2 on a plane after identifying
(gluing together) these two lines (see Fig. 3.11). Since the computation of Rabc d
at p only involves a neighborhood of p, it would not become nonzero due to the
identification of l1 and l2 .
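The vanishing of the intrinsic curvature of the cylinder can also be seen from a short computation. The sketch below assumes a cylinder of radius $a$ (a symbol introduced here only for illustration) with local coordinates $\{\varphi, z\}$.

```latex
% Induced line element on a cylinder of radius a (a is an illustrative symbol):
ds^2 = a^2\, d\varphi^2 + dz^2 .
% All metric components are constant in these coordinates, so
\Gamma^{\sigma}{}_{\mu\nu}
  = \tfrac{1}{2}\, g^{\sigma\rho}\left(\partial_\mu g_{\rho\nu}
    + \partial_\nu g_{\rho\mu} - \partial_\rho g_{\mu\nu}\right) = 0 ,
% and hence every component of the Riemann tensor vanishes:
R_{\mu\nu\sigma}{}^{\rho} = 0 .
```

The coordinate $\varphi$ is only defined locally, but since the curvature at a point depends only on a neighborhood of that point, this local computation suffices.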
Exercises
$\nabla_a\nabla_b f - \nabla_b\nabla_a f = -T^{c}{}_{ab}\nabla_c f$, $\forall f \in \mathscr{F}$.
~3.13. Derive all of the components of the Riemann tensor of the spherical metric (see
Exercise 3.10) in the {θ, ϕ} coordinate system.
3.14. Derive all of the components of the Riemann tensor of the metric
$ds^2 = \Omega^2(t, x)(-dt^2 + dx^2)$ in the {t, x} coordinate system (use $\dot\Omega$ and $\Omega'$ to represent
the partial derivatives of the function $\Omega$ with respect to t and x, respectively).
3.15. Derive all of the components of the Riemann tensor of the metric
$ds^2 = z^{-1/2}(-dt^2 + dz^2) + z(dx^2 + dy^2)$ in the {t, x, y, z} coordinate system.
3.16. Suppose α(z), β(z), γ(z) are three arbitrary functions, and $h = t + \alpha(z)x + \beta(z)y + \gamma(z)$. Derive all of the components of the Riemann tensor of the metric
$ds^2 = -dt^2 + dx^2 + dy^2 + h^2 dz^2$.
References
Bergmann, P. G. (1976), Introduction to the Theory of Relativity, Dover Publications INC, New
York.
Chern, S. S., Chen, W. & Lam, K. S. (1999), Lectures on Differential Geometry, World Scientific
Publishing Company, Singapore.
Hawking, S. W. & Ellis, G. F. R. (1973), The Large Scale Structure of Space-Time, Cambridge
University Press, Cambridge.
Straumann, N. (1984), General Relativity and Relativistic Astrophysics, Springer-Verlag, Berlin.
Wald, R. M. (1984), General Relativity, The University of Chicago Press, Chicago.
Chapter 4
Lie Derivatives, Killing Fields
and Hypersurfaces
4.1 Maps of Manifolds
$(\phi^* f)|_p := f|_{\phi(p)}$, $\forall f \in \mathscr{F}_N$, $p \in M$,
Definition 2 For any point in M one can define the pushforward map φ∗ : V p →
Vφ( p) as follows: ∀va ∈ V p , define its image φ∗ va ∈ Vφ( p) as
It should also be verified (Exercise 4.1) that the φ∗ va defined in this manner satisfies
the two conditions for a vector in Definition 2 of Sect. 2.2 and is thus indeed a vector
at φ( p). Many works refer to φ∗ as the tangent map of φ.
Theorem 4.1.1 φ∗ : V p → Vφ( p) is a linear map, i.e.,
Theorem 4.1.2 Suppose C(t) is a curve in M and T a is the tangent vector of the
curve at a point C(t0 ), then φ∗ T a ∈ Vφ(C(t0 )) is the tangent vector of the curve φ(C(t))
at φ(C(t0 )) (the image of the tangent vector of a curve is the tangent vector of the
image of the curve).
Proof Exercise 4.2. Hint: use the definition of the tangent vector of a curve
[see (2.2.6)].
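Theorem 4.1.2 can be checked numerically in a simple case. The sketch below is an illustration under stated assumptions: $M = N = \mathbb{R}^2$ with Cartesian coordinates, the map `phi` (squaring in the complex plane) and the curve `curve` (the unit circle) are arbitrary choices, and the pushforward is computed as the directional derivative of `phi`, i.e. the Jacobian acting on the components of the vector.

```python
import math

def phi(p):
    """An illustrative smooth map R^2 -> R^2: (x, y) -> (x^2 - y^2, 2xy)."""
    x, y = p
    return (x * x - y * y, 2.0 * x * y)

def curve(t):
    """An illustrative curve C(t) in M: the unit circle."""
    return (math.cos(t), math.sin(t))

def derivative(f, t, h=1e-6):
    """Central-difference derivative of a tuple-valued function of one variable."""
    fp, fm = f(t + h), f(t - h)
    return tuple((a - b) / (2.0 * h) for a, b in zip(fp, fm))

def pushforward(mapping, p, v):
    """Components of phi_* v at phi(p): derivative of phi along v at p."""
    line = lambda s: mapping(tuple(pi + s * vi for pi, vi in zip(p, v)))
    return derivative(line, 0.0)

t0 = 0.4
tangent = derivative(curve, t0)                          # T^a at C(t0)
image_of_tangent = pushforward(phi, curve(t0), tangent)  # phi_* T^a
tangent_of_image = derivative(lambda t: phi(curve(t)), t0)
print(image_of_tangent, tangent_of_image)                # agree up to rounding
```

The two printed tuples agree to within the finite-difference error, which is the content of the theorem: the image of the tangent vector is the tangent vector of the image curve.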
$$(\phi^* T)_{a_1\cdots a_l}|_p\,(v_1)^{a_1}\cdots(v_l)^{a_l} := T_{a_1\cdots a_l}|_{\phi(p)}\,(\phi_* v_1)^{a_1}\cdots(\phi_* v_l)^{a_l}, \quad \forall p \in M,\ v_1,\dots,v_l \in V_p. \tag{4.1.3}$$
$$(\phi_* T)^{a_1\cdots a_k}(\omega^1)_{a_1}\cdots(\omega^k)_{a_k} := T^{a_1\cdots a_k}(\phi^*\omega^1)_{a_1}\cdots(\phi^*\omega^k)_{a_k}, \quad \forall \omega^1,\dots,\omega^k \in V^*_{\phi(p)},$$
where $(\phi^* v)^b$ should be understood as $(\phi_*^{-1} v)^b$. Similarly, the pullback map can also
be generalized as $\phi^*: \mathscr{F}_N(k,l) \to \mathscr{F}_M(k,l)$. The generalized $\phi_*$ and $\phi^*$ are still
linear maps and are the inverse of each other.
viewpoints seem quite at odds with each other, they are equivalent for all practical
purposes. The theorem below can be seen as some kind of manifestation of this
equivalence.
Theorem 4.1.3
$$(\phi_* T)^{\mu_1\cdots\mu_k}{}_{\nu_1\cdots\nu_l}|_{\phi(p)} = T'^{\mu_1\cdots\mu_k}{}_{\nu_1\cdots\nu_l}|_p, \quad \forall T \in \mathscr{F}_M(k,l), \tag{4.1.6}$$
where the left-hand side are the components of the new tensor $\phi_* T$ at the new point
φ(p) in the old coordinate system $\{y^\mu\}$, and the right-hand side are the components
of the old tensor T at the old point p in the new coordinate system $\{x'^\mu\}$.
Remark 2 Equation (4.1.6) is an equality of real numbers, the left-hand side is the
number coming from the active viewpoint (which regards the point and the tensor as
changed but the coordinate system as unchanged), while the right-hand side is the
number coming from the passive viewpoint (which regards the point and the tensor as
unchanged but the coordinate system as changed). Both sides being equal indicates
that these two viewpoints are equivalent for all practical purposes.
$u^\mu = v^\nu (\partial x'^\mu/\partial x^\nu)|_p$. (4.1.7)
general. (Here we mean that the expressions for $T_{\mu\nu}$ and $T'_{\mu\nu}$ are different, though the symbols
for the argument do not matter.) If we want to obtain another set of functions $T'_{\mu\nu}$ from
the function set $T_{\mu\nu}$, we just need to perform the coordinate transformation, but not the
transformation of points and tensors on the manifold; that is, there is no need to employ the
map between manifolds and the map of tensors induced by it. This can be called the “passive
approach” to acquiring a new set of functions $T'_{\mu\nu}$. However, the same effect can also be
obtained by adopting the following “active approach”. Suppose N is another manifold and
there exists a diffeomorphism φ : M → N; then $\tilde T_{ab} \equiv \phi_* T_{ab}$ is a tensor field on N, the
components of which in a coordinate system $\{y^\sigma\}$ are also a set of functions $\tilde T_{\mu\nu}(y^\sigma)$ that,
in general, have a different form than $T_{\mu\nu}(x^\sigma)$. This approach involves the transformation
of points (φ : M → N) and the transformation of tensor fields ($\phi_*: T_{ab} \mapsto \tilde T_{ab}$) but not any
coordinate transformation, which is exactly what the active viewpoint means. To make sure
that the two approaches lead to the same end (that is, that the new function sets $\tilde T_{\mu\nu}$ and
$T'_{\mu\nu}$ coming from the active and passive approaches are the same), we only need to take the
coordinate transformation on M induced by the diffeomorphism φ : M → N in the active
approach as the coordinate transformation $\{x^\sigma\} \to \{x'^\sigma\}$ in the passive approach. In fact, if
we suppose p ∈ M and q ≡ φ(p) ∈ N, then
where Theorem 4.1.3 and the requirement of “setting the coordinate transformation induced
by φ : M → N as $\{x^\sigma\} \to \{x'^\sigma\}$” are applied in the third and the fifth equality, respectively.
The equation above indicates that $\tilde T_{\mu\nu}(y^\sigma) = T'_{\mu\nu}(y^\sigma)$, i.e., the functions $\tilde T_{\mu\nu}$ and $T'_{\mu\nu}$ are
equivalent.
This is only an example that shows the equivalence of the active and passive viewpoints
for practical purposes. The fact that Theorem 4.1.3 was used in a key step of the proof
indicates once again that this theorem is some kind of manifestation of this equivalence.
[The End of Optional Reading 4.1.1]
Proof The reader should add abstract indices to the equation and carry out the proof.
Proof The reader should add abstract indices to the equation and carry out the proof.
Remark 3 ① The above equation is an equality of tensor fields on N , while (4.1.9) is just an
equality of tensors at a point φ( p) ∈ N . ② The above equation will still hold if we substitute
$\phi^*$ for $\phi_*$; however, T and T′ in this case should be tensor fields on N, and the new equation
should be viewed as an equality of tensor fields on M.
Proof Exercise.
and hence
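$$C(\phi_* T) = (\phi_* T^{\mu}{}_{\nu})\,[\phi_* (e_\mu)^a]\,[\phi_* (e^\nu)_a] .$$
Taking $(\partial/\partial x^\mu)^a$ from (4.1.4) and $(dx^\mu)_a$ from (4.1.5) as $(e_\mu)^a$ and $(e^\mu)_a$, respectively,
yields
$$[\phi_* (e_\mu)^a]\,[\phi_* (e^\nu)_a] = (\partial/\partial y^\mu)^a (dy^\nu)_a = \delta^{\nu}{}_{\mu} .$$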
(Actually it can be proved that the above equation holds for any $\{(e_\mu)^a\}$ and $\{(e^\mu)_a\}$ at p.)
Therefore,
$$C(\phi_* T) = (\phi_* T^{\mu}{}_{\nu})\,\delta^{\nu}{}_{\mu} = \phi_*(T^{\mu}{}_{\nu}\,\delta^{\nu}{}_{\mu}) = \phi_*(T^{\mu}{}_{\mu}) = \phi_*(C T) .$$
The reader may generalize this proof to a tensor field of arbitrary type on M.
4.2 Lie Derivatives
As we have discussed at the end of Sect. 2.2, a smooth vector field $v^a$ on M gives rise
to a one-parameter group of diffeomorphisms φ.¹ Suppose $T^{\cdots}{}_{\cdots}$ is a smooth tensor
field on M; then $\phi_t^* T^{\cdots}{}_{\cdots}$ is also a smooth tensor field of the same type, where $\phi_t$ is a
group element of the one-parameter group of diffeomorphisms φ. The difference
of these two tensor fields at p ∈ M, namely $\phi_t^* T^{\cdots}{}_{\cdots}|_p - T^{\cdots}{}_{\cdots}|_p$, is a tensor at p,
and the limit of the quotient $(\phi_t^* T^{\cdots}{}_{\cdots}|_p - T^{\cdots}{}_{\cdots}|_p)/t$ as t approaches zero can be
viewed as some kind of derivative of the tensor field $T^{\cdots}{}_{\cdots}$ at p. Therefore, we have
the following definition:
Definition 1
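$$\mathscr{L}_v T^{a_1\cdots a_k}{}_{b_1\cdots b_l} := \lim_{t\to 0}\frac{1}{t}\left(\phi_t^* T^{a_1\cdots a_k}{}_{b_1\cdots b_l} - T^{a_1\cdots a_k}{}_{b_1\cdots b_l}\right) \tag{4.2.1}$$
is called the Lie derivative of a tensor field $T^{a_1\cdots a_k}{}_{b_1\cdots b_l}$ along a vector field $v^a$. (To
avoid confusion, the v in $\mathscr{L}_v$ is not written as $v^a$.)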
Remark 1 Since φt∗ is a linear map, the Lie derivative is a linear map from F M (k, l)
to F M (k, l). From (4.2.1) and Theorem 4.1.7 we can also see that Lv commutes with
contractions.
Theorem 4.2.1
$$\mathscr{L}_v f = v(f), \quad \forall f \in \mathscr{F}. \tag{4.2.2}$$
Proof ∀ p ∈ M, suppose C(t) is the orbit of φ that passes through p. Set p = C(0),
then φt ( p) = C(t), and va | p ≡ (∂/∂t)a | p is the tangent vector of C(t) at p (see
Fig. 4.2). Hence,
$$\mathscr{L}_v f|_p = \lim_{t\to 0}\frac{1}{t}(\phi_t^* f - f)|_p = \lim_{t\to 0}\frac{1}{t}[f(\phi_t(p)) - f(p)] = \lim_{t\to 0}\frac{1}{t}[f(C(t)) - f(C(0))] = \frac{d}{dt}(f\circ C)\Big|_{t=0} = v(f)|_p .$$
¹ If $v^a$ is incomplete, then it can only give rise to a one-parameter local group of diffeomorphisms.
This section only involves local properties, so there is no need to distinguish local and global.
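Theorem 4.2.2 The components of the Lie derivative of a tensor field $T^{a_1\cdots a_k}{}_{b_1\cdots b_l}$
along $v^a$ in a coordinate system adapted to $v^a$ are
$$(\mathscr{L}_v T)^{\mu_1\cdots\mu_k}{}_{\nu_1\cdots\nu_l} = \frac{\partial T^{\mu_1\cdots\mu_k}{}_{\nu_1\cdots\nu_l}}{\partial x^1} . \tag{4.2.3}$$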
Remark 2 The left-hand side of the above equation satisfies the tensor transformation
law under a coordinate transformation while the right-hand side does not. Hence, this
equation cannot be written as an equality of tensors.
$$(\mathscr{L}_v T)^{\mu}{}_{\nu}|_p = \lim_{t\to 0}\frac{1}{t}\left[(\phi_{-t*} T)^{\mu}{}_{\nu}|_p - T^{\mu}{}_{\nu}|_p\right], \quad \forall p \in M . \tag{4.2.4}$$
Let q ≡ φt ( p). Since (4.2.4) only involves the points near p, one can always consider
p and q as being in the same adapted coordinate patch. For φ−t , q is the old point
and p is the new point, and hence it follows from (4.1.6) that
² As long as $v^a \neq 0$ at a point, one can always define a coordinate system adapted to $v^a$ in a
neighborhood of the point.
$$(\phi_{-t*} T)^{\mu}{}_{\nu}|_p = T'^{\mu}{}_{\nu}|_q = \left[\frac{\partial x'^{\mu}}{\partial x^{\rho}}\frac{\partial x^{\sigma}}{\partial x'^{\nu}}\, T^{\rho}{}_{\sigma}\right]_q , \tag{4.2.5}$$
where $x^\sigma$ are the adapted coordinates (the old coordinates), while $x'^\mu$ are the new
coordinates induced by $\phi_{-t}$. The right-hand side of the above equation involves the
values at q of the partial derivatives between the new and old coordinates; to
calculate them, we need to find the coordinate transformation in a small neighborhood N of
q. $\forall \bar q \in N$, denote $\bar p \equiv \phi_{-t}(\bar q)$. From the definition of adapted coordinates we know
that $x^1(\bar q) = x^1(\bar p) + t$, $x^2(\bar q) = x^2(\bar p)$; also, by definition, the new coordinates at $\bar q$
induced by $\phi_{-t}$ are $x'^1(\bar q) \equiv x^1(\bar p)$, $x'^2(\bar q) \equiv x^2(\bar p)$, and hence $x'^1(\bar q) = x^1(\bar q) - t$,
$x'^2(\bar q) = x^2(\bar q)$. Since $\bar q$ is an arbitrary point in N, on N we have $x'^1 = x^1 - t$,
$x'^2 = x^2$, and taking the derivatives we get $(\partial x'^\mu/\partial x^\rho)|_q = \delta^{\mu}{}_{\rho}$, $(\partial x^\sigma/\partial x'^\nu)|_q = \delta^{\sigma}{}_{\nu}$. Therefore, (4.2.5) becomes $(\phi_{-t*} T)^{\mu}{}_{\nu}|_p = T^{\mu}{}_{\nu}|_q$, and plugging this into (4.2.4)
yields $(\mathscr{L}_v T)^{\mu}{}_{\nu}|_p = \partial T^{\mu}{}_{\nu}/\partial x^1|_p$.
Theorem 4.2.3
$$\mathscr{L}_v u^a = [v, u]^a, \quad \forall u^a, v^a \in \mathscr{F}(1, 0), \tag{4.2.6}$$
$$\mathscr{L}_v u^a = v^b\nabla_b u^a - u^b\nabla_b v^a , \tag{4.2.6'}$$
Proof The claim we are about to prove is an equality of vectors, all we have to show
is that the corresponding equality of components in a coordinate system holds. The
most convenient one to use is certainly the adapted coordinate system. Suppose the
ordinary derivative operator of a coordinate system {x μ } adapted to va is ∂a , then
where the third equality comes from the fact that va = (∂/∂ x 1 )a leads to ∂b va = 0,
condition (d) in the definition of a derivative operator is used in the fourth equality,
and (4.2.3) is used in the last step.
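Theorem 4.2.3 lends itself to a quick numerical check. The sketch below is an illustration, not part of the text: it evaluates the coordinate form of (4.2.6′) with the ordinary derivative operator, $[v, u]^\mu = v^\nu\partial_\nu u^\mu - u^\nu\partial_\nu v^\mu$, on $\mathbb{R}^2$ using central differences. The helper names and the sample fields are assumptions chosen for the example.

```python
def partial_derivatives(field, p, h=1e-5):
    """Return d[nu][mu] = d(field^mu)/dx^nu at p, by central differences."""
    n = len(p)
    d = []
    for nu in range(n):
        plus, minus = list(p), list(p)
        plus[nu] += h
        minus[nu] -= h
        fp, fm = field(plus), field(minus)
        d.append([(fp[mu] - fm[mu]) / (2.0 * h) for mu in range(n)])
    return d

def lie_bracket(v, u, p):
    """Components of L_v u = [v, u] at p: v^nu d_nu u^mu - u^nu d_nu v^mu."""
    n = len(p)
    dv, du = partial_derivatives(v, p), partial_derivatives(u, p)
    vp, up = v(p), u(p)
    return [sum(vp[nu] * du[nu][mu] - up[nu] * dv[nu][mu] for nu in range(n))
            for mu in range(n)]

# Sample vector fields on R^2 in Cartesian coordinates (x, y)
rotation = lambda p: (-p[1], p[0])   # -y d/dx + x d/dy
d_dx = lambda p: (1.0, 0.0)
d_dy = lambda p: (0.0, 1.0)

point = (0.3, -1.2)
print(lie_bracket(rotation, d_dx, point))   # components of -d/dy
print(lie_bracket(d_dx, d_dy, point))       # coordinate basis fields commute
```

For these fields a hand computation gives $[-y\,\partial_x + x\,\partial_y,\ \partial_x] = -\partial_y$, which the first call reproduces up to rounding.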
Theorem 4.2.4
Proof Exercise 4.7. Hint: use Theorems 4.2.3 and 4.2.1, the latter of which gives
$\mathscr{L}_v(\omega_a u^a) = v^b\nabla_b(\omega_a u^a)$.
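Theorem 4.2.5
$$\mathscr{L}_v T^{a_1\cdots a_k}{}_{b_1\cdots b_l} = v^c\nabla_c T^{a_1\cdots a_k}{}_{b_1\cdots b_l} - \sum_{i=1}^{k} T^{a_1\cdots c\cdots a_k}{}_{b_1\cdots b_l}\nabla_c v^{a_i} + \sum_{j=1}^{l} T^{a_1\cdots a_k}{}_{b_1\cdots c\cdots b_l}\nabla_{b_j} v^c , \tag{4.2.8}$$
$\forall T \in \mathscr{F}(k, l)$, $v \in \mathscr{F}(1, 0)$, where $\nabla_a$ is an arbitrary torsion-free derivative operator.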
Proof Exercise.
4.3 Killing Vector Fields
Up to this point, this chapter has not yet mentioned any metric or any derivative
operator associated with a metric since the definition of a Lie derivative does not
require any additional structure on the manifold M. However, if a metric field gab is
assigned to M, then one can also impose a higher requirement on a diffeomorphism
φ : M → M, i.e., φ ∗ gab = gab . Therefore, we have the following definition:
Among all the vector fields on a manifold M there is a special class of vector
fields, namely the smooth vector fields. Each smooth vector field gives rise to a
one-parameter group of diffeomorphisms.3 If a metric field gab is assigned to M,
then we can also pick a special subclass among all the smooth vector fields, in
which the one-parameter group of diffeomorphisms given by each vector field is a
one-parameter group of isometries; that is, each group element φt : M → M is an
isometry. Therefore, we have the following definition:
Definition 2 A vector field ξ a on (M, gab ) is called a Killing vector field if its
one-parameter (local) group of diffeomorphisms is a one-parameter (local) group of
isometries. Equivalently (motivated readers should verify this), ξ a is called a Killing
vector field if Lξ gab = 0.
3 We do not require the vector field to be complete. When talking about an incomplete vector field,
the one-parameter group of diffeomorphisms refers to its one-parameter local group of diffeomor-
phisms.
Theorem 4.3.1 The necessary and sufficient condition for $\xi^a$ to be a Killing vector
field on $(M, g_{ab})$ is that $\xi^a$ satisfies the following Killing equation:
$$\nabla_a\xi_b + \nabla_b\xi_a = 0 , \tag{4.3.1}$$
where $\nabla_a$ is the torsion-free derivative operator associated with $g_{bc}$ ($\nabla_a g_{bc} = 0$).
Proof For any vector field ξ a , it follows from (4.2.8) that
$$\mathscr{L}_\xi g_{ab} = \nabla_a\xi_b + \nabla_b\xi_a , \tag{4.3.1'}$$
where we used the fact that ∇a gbc = 0. By definition, ξ a being a Killing vector field
is equivalent to Lξ gab = 0. Hence, the necessary and sufficient condition for ξ a to
be a Killing vector field is that it satisfies the Killing equation ∇a ξb + ∇b ξa = 0.
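For readers who want the intermediate step, the following sketch spells out how (4.2.8) reduces to the expression for $\mathscr{L}_\xi g_{ab}$ used in the proof; it only uses $\nabla_c g_{ab} = 0$ and the fact that lowering an index commutes with a metric-compatible derivative operator.

```latex
% (4.2.8) applied to the type-(0,2) tensor g_{ab}, with v^a replaced by \xi^a:
\mathscr{L}_\xi g_{ab}
  = \xi^c \nabla_c g_{ab} + g_{cb}\nabla_a \xi^c + g_{ac}\nabla_b \xi^c
% the first term vanishes because \nabla_c g_{ab} = 0, and the metric can be
% moved inside the derivative in the remaining two terms:
  = \nabla_a (g_{cb}\xi^c) + \nabla_b (g_{ac}\xi^c)
  = \nabla_a \xi_b + \nabla_b \xi_a .
```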
Theorem 4.3.2 If there exists a coordinate system $\{x^\mu\}$ such that all the
components of $g_{ab}$ satisfy $\partial g_{\mu\nu}/\partial x^1 = 0$, then $(\partial/\partial x^1)^a$ is a Killing vector field on the
coordinate patch.
Proof {x μ } is a coordinate system adapted to (∂/∂ x 1 )a . From (4.2.3) we can see that
(L∂/∂ x 1 g)μν = ∂gμν /∂ x 1 = 0, and hence L∂/∂ x 1 gab = 0, i.e., (∂/∂ x 1 )a is a Killing
vector field.
Theorem 4.3.3 Suppose ξ a is a Killing vector field, and T a is the tangent of a
geodesic, then T a ∇a (T b ξb ) = 0, i.e., T b ξb is a constant along the geodesic.
Proof T a ∇a (T b ξb ) = ξb T a ∇a T b + T b T a ∇a ξb = T b T a ∇a ξb = 0, where the defi-
nition of a geodesic is used in the second equality, and Theorem 4.3.1 (i.e.,
∇a ξb = ∇[a ξb] ) and Theorem 2.6.2 (d) are used in the third equality.
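Theorem 4.3.3 can be observed numerically. The sketch below is an illustration under stated assumptions: the unit sphere with metric $ds^2 = d\theta^2 + \sin^2\theta\, d\varphi^2$, the Killing field $\xi^a = (\partial/\partial\varphi)^a$ (Killing by Theorem 4.3.2), and an arbitrarily chosen initial point and tangent. The geodesic equations are integrated with a hand-written RK4 step, and $T^b\xi_b = \sin^2\theta\, d\varphi/d\lambda$ is recorded along the way.

```python
import math

def geodesic_rhs(state):
    """Geodesic equations on the unit sphere, state = (theta, phi, theta', phi')."""
    th, ph, dth, dph = state
    return (dth,
            dph,
            math.sin(th) * math.cos(th) * dph * dph,
            -2.0 * math.cos(th) / math.sin(th) * dth * dph)

def rk4_step(f, s, h):
    """One classical Runge-Kutta step for s' = f(s)."""
    k1 = f(s)
    k2 = f(tuple(x + 0.5 * h * k for x, k in zip(s, k1)))
    k3 = f(tuple(x + 0.5 * h * k for x, k in zip(s, k2)))
    k4 = f(tuple(x + h * k for x, k in zip(s, k3)))
    return tuple(x + h * (a + 2 * b + 2 * c + d) / 6.0
                 for x, a, b, c, d in zip(s, k1, k2, k3, k4))

def killing_charge(state):
    """T^b xi_b for xi = d/dphi: g_{phi phi} * phi' = sin^2(theta) * phi'."""
    th, _, _, dph = state
    return math.sin(th) ** 2 * dph

def integrate(state, h=1e-3, steps=5000):
    """Integrate the geodesic and record the Killing charge at every step."""
    charges = [killing_charge(state)]
    for _ in range(steps):
        state = rk4_step(geodesic_rhs, state, h)
        charges.append(killing_charge(state))
    return state, charges

# Illustrative initial data, chosen so the geodesic stays away from the poles
final_state, charges = integrate((1.0, 0.0, 0.3, 0.7))
print(max(charges) - min(charges))   # stays at rounding/truncation level
```

The coordinates θ and ϕ both change appreciably along the curve, while the recorded quantity stays constant up to integration error.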
Suppose ξ a and ηa are Killing vector fields, α and β are real constants, then from
the linearity of the Killing equation we know that αξ a + βηa is also a Killing vector
field. It is not difficult to see that the collection of all the Killing vector fields on M
is a vector space. It can also be proved (Exercise 4.13) that the commutator [ξ, η]a
is also a Killing vector field.
Theorem 4.3.4 There are at most n(n + 1)/2 independent Killing vector fields
(n ≡ dim M) on (M, gab ). That is, the dimension of the collection of all the Killing
vector fields on M (as a vector space) is less than or equal to n(n + 1)/2.
Proof See Wald (1984) pp. 442–443.
Remark 2 ① Isometries can be viewed as some kind of symmetry transformations
that “preserve the metric”, and thus a Killing vector field represents a symmetry of
(M, gab ). A generalized Riemannian space that has n(n + 1)/2 independent Killing
vector fields is called a maximally symmetric space. ② The general method of finding
all the Killing vector fields on (M, gab ) is to find the general solution of the Killing
equation. However, for some (M, gab ) that are relatively simple, there also exist
methods that are a lot easier. We provide several examples below.
Example 1 Find all the independent Killing vector fields of the following generalized
Riemannian spaces.
(1) 2-dimensional Euclidean space (R2 , δab ). Suppose {x, y} is a Cartesian coor-
dinate system, then ds 2 = dx 2 + dy 2 , i.e., all the components of the Euclidean metric
δab in this coordinate system are constant. Hence, it follows from Theorem 4.3.2 that
$(\partial/\partial x)^a$ and $(\partial/\partial y)^a$ are Killing vector fields. We believe that a Euclidean space
is maximally symmetric, and it follows from Theorem 4.3.4 that there should be
three independent Killing fields when n = 2. As expected, if we change to a polar
coordinate system, then $ds^2 = dr^2 + r^2 d\varphi^2$, and thus all of the components of $\delta_{ab}$
in this coordinate system are independent of ϕ. Therefore, it follows from Theorem
4.3.2 that $(\partial/\partial\varphi)^a$ is a Killing vector field. Its expansion in the coordinate basis
of a Cartesian coordinate system is $(\partial/\partial\varphi)^a = -y(\partial/\partial x)^a + x(\partial/\partial y)^a$.
The coefficients of the expansion depend on the coordinates, from which it is not
difficult to show that $(\partial/\partial\varphi)^a$ is independent of the first two Killing fields. $(\partial/\partial x)^a$
and (∂/∂ y)a being Killing reflects the translational invariance of the 2-dimensional
Euclidean metric along the x- and y-axes, while (∂/∂ϕ)a being Killing manifests the
rotational invariance of this metric.
(2) 3-dimensional Euclidean space (R3 , δab ). Since n = 3, there are six inde-
pendent Killing fields, namely (∂/∂ x)a , (∂/∂ y)a , (∂/∂z)a , −y(∂/∂ x)a + x(∂/∂ y)a ,
−z(∂/∂ y)a + y(∂/∂z)a and −x(∂/∂z)a + z(∂/∂ x)a . The first three reflect the trans-
lational invariance of the 3-dimensional Euclidean metric along the x-, y- and z-axes,
while the last three reflect the rotational invariance of the metric about the z-, x- and
y-axes, respectively.
(3) 2-dimensional Minkowski space (R2 , ηab ). In a Lorentzian coordinate system
{t, x} we have ds 2 = −dt 2 + dx 2 , and thus we see that (∂/∂t)a and (∂/∂ x)a are
Killing fields. To find the third one, we define new coordinates ψ and η as follows:
$$x = \psi\cosh\eta , \quad t = \psi\sinh\eta . \tag{4.3.2}$$
The Minkowski line element can be expressed in terms of the new coordinates as
ds 2 = dψ 2 − ψ 2 dη2 . This expression indicates that all the components of ηab in the
new coordinate system are independent of the coordinate η, and hence (∂/∂η)a is
also a Killing vector field (whose integral curves are hyperbolas). Its expansion
in the coordinate basis of a Lorentzian coordinate system is
$$(\partial/\partial\eta)^a = t(\partial/\partial x)^a + x(\partial/\partial t)^a . \tag{4.3.3}$$
From the fact that the coefficients of the expansion are coordinate dependent we
can see that (∂/∂η)a is independent of the first two Killing fields. The coordinate
patch of η and ψ defined by (4.3.2) is just an open subset of R2 which is restricted
by x > |t| (see region A in Fig. 4.3). However, (4.3.3) is defined on the whole R2 ,
and it is not difficult to verify that (∂/∂η)a is a Killing field on R2 . It is timelike
in the regions A and B in Fig. 4.3, spacelike in the regions C and D, and null on
the two lines with 45◦ tilt. t (∂/∂ x)a + x(∂/∂t)a is called the boost Killing vector
field, which indicates that the Minkowski metric has the invariance under a boost,
corresponding to the Lorentz transformation (for details, see Theorem 4.3.5).
(4) 4-dimensional Minkowski space (R4 , ηab ). Since n = 4, there are in total 10
independent Killing fields, divided into three groups:
(a) 4 translations (∂/∂t)a , (∂/∂ x)a , (∂/∂ y)a , (∂/∂z)a ;
(b) 3 spatial rotations
−y(∂/∂ x)a + x(∂/∂ y)a , −z(∂/∂ y)a + y(∂/∂z)a , −x(∂/∂z)a + z(∂/∂ x)a ;
(c) 3 boosts t (∂/∂ x)a + x(∂/∂t)a , t (∂/∂ y)a + y(∂/∂t)a , t (∂/∂z)a + z(∂/∂t)a .
Group (a) reflects the translational invariance of the Minkowski metric along the
t-, x-, y-, z-axes; group (b) reflects the spatial rotational invariance with respect to
z-, x-, y-axes, respectively; group (c) reflects the invariance under the boosts within
the t x-, t y-, t z-planes.
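The Killing fields listed in Example 1 can be checked against the condition $\mathscr{L}_\xi g_{ab} = 0$ directly in coordinates. The sketch below is illustrative and makes its own assumptions: it uses the coordinate form of (4.2.8) with the ordinary derivative operator, $(\mathscr{L}_\xi g)_{\mu\nu} = \xi^\sigma\partial_\sigma g_{\mu\nu} + g_{\sigma\nu}\partial_\mu\xi^\sigma + g_{\mu\sigma}\partial_\nu\xi^\sigma$, evaluates derivatives by central differences, and tests a single sample point, so it is a spot check rather than a proof.

```python
def shifted(p, i, d):
    """Copy of the point p with its i-th coordinate shifted by d."""
    q = list(p)
    q[i] += d
    return tuple(q)

def lie_derivative_of_metric(g, xi, p, h=1e-5):
    """Components (L_xi g)_{mu nu} at p, from
    xi^s d_s g_{mu nu} + g_{s nu} d_mu xi^s + g_{mu s} d_nu xi^s."""
    n = len(p)
    gp, xp = g(p), xi(p)
    dg, dxi = [], []          # dg[s][m][k] = d_s g_{mk};  dxi[m][s] = d_m xi^s
    for s in range(n):
        g_plus, g_minus = g(shifted(p, s, h)), g(shifted(p, s, -h))
        dg.append([[(g_plus[m][k] - g_minus[m][k]) / (2.0 * h)
                    for k in range(n)] for m in range(n)])
        x_plus, x_minus = xi(shifted(p, s, h)), xi(shifted(p, s, -h))
        dxi.append([(x_plus[c] - x_minus[c]) / (2.0 * h) for c in range(n)])
    return [[sum(xp[s] * dg[s][m][k] + gp[s][k] * dxi[m][s] + gp[m][s] * dxi[k][s]
                 for s in range(n)) for k in range(n)] for m in range(n)]

def is_killing(g, xi, p, tol=1e-6):
    """True if every component of L_xi g vanishes (within tol) at the point p."""
    return all(abs(c) < tol for row in lie_derivative_of_metric(g, xi, p) for c in row)

euclid = lambda p: [[1.0, 0.0], [0.0, 1.0]]          # Cartesian (x, y)
minkowski = lambda p: [[-1.0, 0.0], [0.0, 1.0]]      # Lorentzian (t, x)
polar = lambda p: [[1.0, 0.0], [0.0, p[0] ** 2]]     # polar (r, phi)

rotation = lambda p: (-p[1], p[0])   # -y d/dx + x d/dy
boost = lambda p: (p[1], p[0])       # x d/dt + t d/dx, components (xi^t, xi^x)
dilation = lambda p: (p[0], p[1])    # x d/dx + y d/dy, not a Killing field

print(is_killing(euclid, rotation, (0.7, -0.4)))
print(is_killing(minkowski, boost, (0.2, 1.5)))
print(is_killing(euclid, dilation, (0.7, -0.4)))
```

The same helper applied to `polar` with the field `(0, 1)` reproduces the argument of Theorem 4.3.2 for $(\partial/\partial\varphi)^a$, while the field `(1, 0)` fails because $g_{\varphi\varphi} = r^2$ depends on r.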
In Sect. 4.1 we already introduced the active and passive viewpoints of a diffeo-
morphism (in which the former is a transformation of points and tensor fields while
the latter is a coordinate transformation) and their relationship (a transformation of
points induces a coordinate transformation). Now that we know an isometry is a spe-
cial diffeomorphism, we can expect that the coordinate transformation induced by an
isometry is also a special coordinate transformation. In fact, this is true! First, we will
use the 2-dimensional Euclidean space (R2 , δab ) as an example. Each Killing vector
field will give rise to a one-parameter group of isometries {φλ : R2 → R2 | λ ∈ R}.
From the active viewpoint, there are three kinds of isometries in this group, i.e., three
independent Killing vector fields:
① Translational Killing vector field $(\partial/\partial x)^a$. It induces the translation along the
x-direction, which can be expressed as x′ = x + λ, y′ = y. (This is not difficult to prove
by following the proof of Theorem 4.3.5; the expressions in ② and ③ can be proved
similarly.)
② Translational Killing vector field $(\partial/\partial y)^a$. It induces the translation along the
y-direction, which can be expressed as x′ = x, y′ = y + λ;
③ Rotational Killing vector field $(\partial/\partial\varphi)^a = -y(\partial/\partial x)^a + x(\partial/\partial y)^a$. It induces
the rotation about the origin, which can be expressed using polar coordinates
as r′ = r, ϕ′ = ϕ + λ, or using Cartesian coordinates as x′ = x cos λ −
y sin λ, y′ = x sin λ + y cos λ.
Now we look at the 2-dimensional Minkowski space (R2 , ηab ). It also contains
three kinds of isometries, i.e., three independent Killing vector fields:
① Time translational Killing vector field $(\partial/\partial t)^a$. It induces the time translation
along the t-direction, which can be expressed as t′ = t + λ, x′ = x (where x and t
are Lorentzian coordinates);
② Spatial translational Killing vector field $(\partial/\partial x)^a$. It induces the spatial translation
along the x-direction, which can be expressed as t′ = t, x′ = x + λ;
③ Boost Killing vector field (∂/∂η)a = t (∂/∂ x)a + x(∂/∂t)a . The coordinate
transformation it induces is the well-known Lorentz transformation, see the following
theorem:
Remark 3 This theorem indicates that a boost and a Lorentz transformation are two
different wordings (active and passive) of the same transformation.
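Proof The parametric equations $x^\mu(\eta)$ of an integral curve of the vector field
$\xi^a \equiv (\partial/\partial\eta)^a$ satisfy $dx^\mu(\eta)/d\eta = \xi^\mu$ (μ = 0, 1). Noticing that $\xi^a = t(\partial/\partial x)^a + x(\partial/\partial t)^a$ [see (4.3.3)], we have
$$\frac{dx(\eta)}{d\eta} = t(\eta) , \quad \frac{dt(\eta)}{d\eta} = x(\eta) . \tag{4.3.4}$$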
∀ p ∈ R2 , suppose C(η) is the integral curve that satisfies p = C(0), i.e., x(0) = x p ,
t (0) = t p , then it is not difficult to prove that (4.3.4) have the particular solutions
(i.e., the parametric equations of the curve)
Suppose q ≡ φλ ( p), then q is the point on C(η) that has the parameter η = λ, i.e.,
q = C(λ). Hence, the new coordinates t′ and x′ induced by $\phi_\lambda$ satisfy
This is exactly the well-known Lorentz transformation. (Note that we have applied
the system of geometrized units, where the speed of light c = 1.)
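The flow of the boost Killing field can be explored numerically. The sketch below is an illustration under stated assumptions: it integrates (4.3.4) with a hand-written RK4 scheme and compares the result with the standard hyperbolic solution $t(\eta) = t_p\cosh\eta + x_p\sinh\eta$, $x(\eta) = x_p\cosh\eta + t_p\sinh\eta$, which is written here as an assumption about the form of the particular solution rather than a quotation of the text. It also checks that the interval $-t^2 + x^2$ is unchanged along the flow.

```python
import math

def boost_flow(t_p, x_p, lam, steps=1000):
    """Move the point (t_p, x_p) a parameter distance lam along the integral
    curve of the boost field, i.e. integrate dt/d(eta) = x, dx/d(eta) = t."""
    h = lam / steps
    t, x = t_p, x_p
    for _ in range(steps):
        # classical RK4 step; the right-hand side is (dt, dx) = (x, t)
        k1t, k1x = x, t
        k2t, k2x = x + 0.5 * h * k1x, t + 0.5 * h * k1t
        k3t, k3x = x + 0.5 * h * k2x, t + 0.5 * h * k2t
        k4t, k4x = x + h * k3x, t + h * k3t
        t += h * (k1t + 2 * k2t + 2 * k3t + k4t) / 6.0
        x += h * (k1x + 2 * k2x + 2 * k3x + k4x) / 6.0
    return t, x

def hyperbolic_solution(t_p, x_p, lam):
    """Closed-form solution of the same linear system (assumed standard form)."""
    return (t_p * math.cosh(lam) + x_p * math.sinh(lam),
            x_p * math.cosh(lam) + t_p * math.sinh(lam))

t_q, x_q = boost_flow(0.5, 2.0, 0.8)
print((t_q, x_q), hyperbolic_solution(0.5, 2.0, 0.8))
print(-t_q ** 2 + x_q ** 2, -0.5 ** 2 + 2.0 ** 2)   # interval is preserved
```

With $\cosh\lambda$ playing the role of the Lorentz factor and $\tanh\lambda$ that of the relative speed (in units with c = 1), the hyperbolic form is the familiar shape of a Lorentz transformation.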
[Optional Reading 4.3.1]
For any point p in R2 , C(η) in the proof above is a complete curve, i.e., η ∈ (−∞, ∞).
If p is in the region A or B, then C(η) is timelike; if p is in the region C or D, then C(η) is
spacelike; if p is on the lines with 45◦ tilt, then C(η) is null. The most special case is that
p = (0, 0), i.e., p is the origin of the {t, x} system, where C(η) = p (a single-point curve).
Thus, each line with 45◦ tilt is not one integral curve but the union of 3 integral curves,
in which the first and second ones are the upper and lower halves (excluding the origin),
respectively, and the third one is the single-point curve { p}. The parameter range of
each of these 3 curves is (−∞, ∞).
[The End of Optional Reading 4.3.1]
Proof Denote $\eta_{ab}$ by $g_{ab}$, and denote its components in coordinate systems $\{x^\mu\}$ and
$\{x'^\mu\}$ as $g_{\mu\nu}$ and $g'_{\mu\nu}$, respectively.
(A) Suppose $\phi: \mathbb{R}^n \to \mathbb{R}^n$ is an isometry (i.e., $\phi^* g_{ab} = g_{ab}$), and $\{x'^\mu\}$ is the
coordinate system induced by the Lorentzian system $\{x^\mu\}$ through φ; then $\forall p \in \mathbb{R}^n$
we have $g'_{\mu\nu}|_p = (\phi_* g)_{\mu\nu}|_{\phi(p)} = (\phi^{-1*} g)_{\mu\nu}|_{\phi(p)} = g_{\mu\nu}|_{\phi(p)} = \eta_{\mu\nu}$, where (4.1.6) is
used in the first equality, the third equality comes from the fact that φ being an
isometry makes $\phi^{-1}$ an isometry, and the fourth equality comes from the fact that
$\{x^\mu\}$ is Lorentzian. This equation shows that the components of $g_{ab}$ at p in the system
$\{x'^\mu\}$ are $\eta_{\mu\nu}$, and hence $\{x'^\mu\}$ is a Lorentzian system.
(B) Suppose $\{x^\mu\}$ and $\{x'^\mu\}$ are both Lorentzian coordinate systems, and $\phi: \mathbb{R}^n \to \mathbb{R}^n$
is the diffeomorphism that corresponds to the coordinate transformation $\{x^\mu\} \to \{x'^\mu\}$; then $\forall p \in \mathbb{R}^n$ we have $(\phi^{-1*} g)_{\mu\nu}|_p = (\phi_* g)_{\mu\nu}|_p = g'_{\mu\nu}|_{\phi^{-1}(p)} = \eta_{\mu\nu} = g_{\mu\nu}|_p$,
where (4.1.6) is used in the second equality, while the third and fourth equalities come
from the fact that $\{x'^\mu\}$ and $\{x^\mu\}$ are Lorentzian. This indicates that $\phi^{-1*} g_{ab} = g_{ab}$,
and hence $\phi^{-1}$ is (which means φ is also) an isometry.
Remark 4 This theorem can also be applied to Euclidean space, where one only
needs to change the Lorentzian system to a Cartesian system. We therefore can
say that under an isometry a Lorentzian (or Cartesian) coordinate system remains
Lorentzian (or Cartesian).
4.4 Hypersurfaces
Remark 1 The above conditions for an embedding allow the topology and manifold
structure of S to be naturally carried over to φ[S], and hence make φ : S → φ[S]
a diffeomorphism.
Example 2 Suppose S is the unit sphere S 2 in R3 (viewed as M), then the identity
map φ : S 2 → R3 gives rise to an embedded submanifold of R3 . Noticing that S 2
has one lower dimension than R3 , we conclude that S 2 is a hypersurface of R3 .
[Optional Reading 4.4.1]
An embedded submanifold φ[S] has two topologies, one is the topology that comes
naturally from the embedding (see Remark 1), and the other is the topology on φ[S] (as
a subset of M) induced by M (see Example 5 in Sect. 1.2). These two topologies are not
necessarily the same. However, if we further require them to be the same, then we impose a
stricter requirement on the embedding. An embedding satisfying this additional requirement
is called a regular embedding [see Chern et al. (1999)]. The term “embedding” in some
works [e.g., Hawking and Ellis (1973)] actually refers to a regular embedding. Suppose
S = R, and M = R2 , then an embedding φ : S → M is a smooth curve in R2 . The one-
to-one condition of φ in the definition does not allow the embedded submanifold to be a
self-intersecting curve (such as the figure-eight shaped curve in Fig. 4.4). Is the curve that
is “arbitrarily close to self-intersecting” but not self-intersecting in Fig. 4.5 an embedded
submanifold? The answer is: it is an embedded submanifold but not a regular embedded
submanifold. From now on, most of the cases in this text where we talk about an embedded
submanifold will refer to a regular embedded submanifold.
[The End of Optional Reading 4.4.1]
[Fig. 4.5 A curve in $\mathbb{R}^2$ that is “arbitrarily close to self-intersecting” is an embedded submanifold but not a regular embedded submanifold]
then a normal vector $n^a$ at q should be defined as a vector that is orthogonal to all the
vectors tangent to φ[S]. However, orthogonality is only meaningful after a metric is
assigned. When there is no metric on M, one cannot define a normal vector $n^a$, but
can instead define a “normal covector” $n_a$. (Covector is another name for dual vector.)
Since a dual vector gives a real number when acting on a vector (with no need for a
metric), a normal covector can be defined as follows:
Theorem 4.4.2 Let φ[S] represent the hypersurface defined by f = constant. Suppose
q ∈ φ[S] and $\nabla_a f|_q \neq 0$; then $\nabla_a f|_q$ is a normal covector of φ[S] at q.
Proof All we have to prove is that, for any q ∈ φ[S], we have $w^a\nabla_a f = 0$, $\forall w^a \in W_q$.
Since $w^a$ is always tangent to a curve C(t) lying on φ[S] and passing through q, we
get $w^a\nabla_a f = \frac{\partial}{\partial t}(f) = 0$, $\forall w^a \in W_q$, where the last step holds because f is constant
on C(t).
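Theorem 4.4.2 can be illustrated with the unit sphere in $\mathbb{R}^3$. The sketch below is an example chosen here, not taken from the text: the hypersurface is $f = x^2 + y^2 + z^2 = 1$, the components of $\nabla_a f$ in Cartesian coordinates are approximated by central differences, and the sample point and tangent vectors are arbitrary choices. No metric is used in the contraction of the covector with the vectors.

```python
def f(p):
    """Function whose level set f = 1 is the unit sphere."""
    x, y, z = p
    return x * x + y * y + z * z

def gradient(func, p, h=1e-6):
    """Cartesian components of the covector grad(func) at p (central differences)."""
    grad = []
    for i in range(len(p)):
        plus, minus = list(p), list(p)
        plus[i] += h
        minus[i] -= h
        grad.append((func(plus) - func(minus)) / (2.0 * h))
    return grad

def contract(covector, vector):
    """Action of a covector on a vector, n_a w^a (no metric involved)."""
    return sum(n * w for n, w in zip(covector, vector))

q = (1 / 3, 2 / 3, 2 / 3)                           # a point with f(q) = 1
tangents = [(2 / 3, -2 / 3, 1 / 3), (2 / 3, 1 / 3, -2 / 3)]  # tangent to the sphere at q
n = gradient(f, q)

print([contract(n, w) for w in tangents])   # both vanish up to rounding
print(contract(n, q))                       # nonzero on a vector not in W_q
```

The covector annihilates every tangent vector of the sphere at q but not the radial vector, which is what makes it a normal covector.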
Proof
(A) Suppose $n^a \in W_q$. Since $n_a$ is a normal covector of φ[S], regarding the $w^a$
in Definition 3 as the $n^a$ in the present expression $n_a n^a$, we have $n_a n^a = 0$.
(B) From the proof of Theorem 4.4.1 we know that for any normal covector $n_a$
there exists a basis $\{(e_\mu)^a\}$ such that $(e_2)^a, \dots, (e_n)^a \in W_q$ and $n_a = (e^1)_a$; hence,
for the first component of $n^a$ in this basis we have $n^1 = n^a (e^1)_a = n^a n_a$. Therefore,
$$n_a n^a = 0 \ \Rightarrow\ n^1 = 0 \ \Rightarrow\ n^a = \sum_{\tau=2}^{n} n^\tau (e_\tau)^a \in W_q .$$
[Fig. 4.6 Three cases of embedding $\mathbb{R}$ into $\mathbb{R}^2$ (t-axis points vertically upwards, x-axis points horizontally to the right): (a) $n^a n_a < 0$ (spacelike hypersurface); (b) $n^a n_a > 0$ (timelike hypersurface); (c) $n^a n_a = 0$ (null hypersurface)]
Theorem 4.4.1 we can see that $(e^1)_a$ is a normal covector $n_a$ whose corresponding
normal vector is $n^a = \alpha^{-1} g^{ab}(dt)_b = -\alpha^{-1}(\partial/\partial t)^a$, satisfying $n^a \notin W_q$ and $n^a n_a < 0$
(i.e., $n^a$ is timelike).
(2) φ[R] is parallel to the t-axis [see Fig. 4.6b]. ∀q ∈ φ[R], let $(e_2)^a = (\partial/\partial t)^a$,
and choose $(e_1)^a = \alpha(\partial/\partial t)^a + \beta(\partial/\partial x)^a$ (α, β can be arbitrary real numbers, but
β ≠ 0); then $(e^1)_a = \beta^{-1}(dx)_a$. Take $(e^1)_a$ to be the normal covector $n_a$, whose
corresponding normal vector is $n^a = \beta^{-1}(\partial/\partial x)^a$, satisfying $n^a \notin W_q$ and $n^a n_a > 0$
(i.e., $n^a$ is spacelike).
(3) φ[R] makes an angle of 45° with the x-axis (in the Euclidean sense) [see Fig. 4.6c]. ∀q ∈
φ[R], let $(e_2)^a = (\partial/\partial t)^a + (\partial/\partial x)^a$, and choose $(e_1)^a = \alpha(\partial/\partial t)^a + \beta(\partial/\partial x)^a$,
α ≠ β; then $(e^1)_a = (\alpha - \beta)^{-1}[(dt)_a - (dx)_a]$. Take $(e^1)_a$ to be the normal covector
$n_a$, whose corresponding normal vector is
$$n^a = (\alpha - \beta)^{-1} g^{ab}[(dt)_b - (dx)_b] = -(\alpha - \beta)^{-1}[(\partial/\partial t)^a + (\partial/\partial x)^a] = -(\alpha - \beta)^{-1}(e_2)^a ,$$
The induced metric $h_{ab}$ is essentially the result of restricting the domain on which
$g_{ab}$ acts from $V_q$ to $W_q$. Since $h_{ab}$ is defined pointwise on $\phi[S]$, it gives rise to an
induced metric field on $\phi[S]$. When $\phi[S]$ is a timelike or spacelike hypersurface,
the induced metric can be conveniently expressed by the normalized normal vector
($n^a n_a = \pm 1$) as
$h_{ab} \equiv g_{ab} \mp n_a n_b$. (4.4.2)
It is easy to see that $\forall w_1^a, w_2^b \in W_q$ we have $h_{ab} w_1^a w_2^b = g_{ab} w_1^a w_2^b \mp n_a w_1^a n_b w_2^b =
g_{ab} w_1^a w_2^b$ (the last step uses $n_a w^a = 0$ for every $w^a \in W_q$), which satisfies (4.4.1). However, there are actually many $h_{ab}$ that satisfy
(4.4.1), so why do we only use the one defined by (4.4.2)? For the reason, see Optional
Reading 4.4.3.
[Optional Reading 4.4.3]
For convenience, we suppose Vq to be 4-dimensional (and thus Wq is 3-dimensional). As
an induced metric (a metric on Wq ), h ab in (4.4.1) is a tensor on Wq (a 3-dimensional tensor),
i.e., h ab ∈ TWq (0, 2) (which cannot act on elements in Vq − Wq ). However, for the conve-
nience of performing the 4-dimensional calculation, we want to find a 4-dimensional tensor
of type (0, 2) [i.e., an element of TVq (0, 2)], which can represent the 3-dimensional tensor
h ab . h ab ≡ gab ∓ n a n b is such a 4-dimensional tensor (note that both terms on the right-
hand side are 4-dimensional tensors). To distinguish from the h ab in (4.4.1), we temporarily
denote the $h_{ab}$ in $h_{ab} \equiv g_{ab} \mp n_a n_b$ as $\bar{h}_{ab}$. It can be proved that $\mathcal{T}_{V_q}(0, 2)$ has a subset
$\mathcal{S}_{V_q}(0, 2) \equiv \{T_{ab} \in \mathcal{T}_{V_q}(0, 2) \mid T_{ab} n^a = 0,\ T_{ab} n^b = 0\}$ that is naturally isomorphic to
$\mathcal{T}_{W_q}(0, 2)$, and thus $\mathcal{S}_{V_q}(0, 2)$ and $\mathcal{T}_{W_q}(0, 2)$ can be naturally identified (for details see Chap.
14). It is easy to see that $g_{ab} \notin \mathcal{S}_{V_q}(0, 2)$ while $\bar{h}_{ab} \in \mathcal{S}_{V_q}(0, 2)$, and $\bar{h}_{ab} w_1^a w_2^b = g_{ab} w_1^a w_2^b$
$\forall w_1^a, w_2^b \in W_q$; thus, one can identify $\bar{h}_{ab}$ with $h_{ab}$. It can also be proved (left to the reader)
that the only element in $\mathcal{S}_{V_q}(0, 2)$ that satisfies (4.4.1) (and thus can serve as $h_{ab}$) is $\bar{h}_{ab}$;
this is the reason why we regard the 4-dimensional tensor $\bar{h}_{ab} \equiv g_{ab} \mp n_a n_b$ as the induced
metric. From now on, we will not distinguish between the notations $\bar{h}_{ab}$ and $h_{ab}$.
The above conclusion about tensors of type (0, 2) can also be generalized as follows:
a special subset of TVq (0, l), namely {Ta1 ···al ∈ TVq (0, l) | the contraction on n a and any
index of Ta1 ···al vanishes}, is naturally isomorphic to TWq (0, l), and thus they can be natu-
rally identified. This identification makes it possible to substitute the elements of the former
one for the elements of the latter one when discussing and writing equations, which brings
us great convenience.
[The End of Optional Reading 4.4.3]
Remark 3 Equation (4.4.2) also holds when $g_{ab}$ is positive definite (just change the
sign ∓ to −). As an exercise, the reader should expand the 3-dimensional Euclidean
metric in the dual coordinate basis of a spherical coordinate system, and verify that
the induced metric $h_{ab} = g_{ab} - n_a n_b$ on the sphere
is the same as the induced metric $\hat{g}_{ab}$ defined in Example 2 of Sect. 3.3. [Hint: the
normalized normal covector on a sphere is $n_a = (dr)_a$.]
Fig. 4.7 $v^a \in V_q$ is decomposed into the normal component $\pm n^a (n_b v^b)$ and the tangential component $h^a{}_b v^b \in W_q$
$v^a = h^a{}_b v^b \pm n^a (n_b v^b)$. (4.4.4)
The above equation represents a decomposition of the vector $v^a$ (Fig. 4.7), where
$\pm n^a (n_b v^b)$ is parallel to $n^a$, called the normal component, and $h^a{}_b v^b$ is perpendicular
to $n^a$ [since $n_a (h^a{}_b v^b) = 0$], called the tangential component (the component tangent
to $\phi[S]$). $h^a{}_b$ is called the projection map from $V_q$ to $W_q$.
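The decomposition (4.4.4) can be checked on a concrete case. The following is a minimal sketch, assuming 4-dimensional Minkowski space and a hypothetical unit timelike normal $n^a = (5/4, 3/4, 0, 0)$; the helper names `lower` and `decompose` are introduced here for illustration only. The projection map is built as $h^a{}_b = \delta^a{}_b \mp n^a n_b$, which is (4.4.2) with one index raised.

```python
from fractions import Fraction as F

def lower(g, v):
    """Lower an index: v_a = g_ab v^b."""
    n = len(v)
    return [sum(g[a][b] * v[b] for b in range(n)) for a in range(n)]

def decompose(g, n_up, v):
    """Split v^a into (tangential, normal) parts relative to a unit normal n^a.

    Assumes n^a n_a = +1 or -1, i.e. a timelike or spacelike hypersurface.
    """
    dim = len(v)
    n_dn = lower(g, n_up)
    s = sum(n_dn[a] * n_up[a] for a in range(dim))      # n^a n_a
    assert s in (1, -1), "the normal must be normalized"
    # Projection map h^a_b = delta^a_b - s n^a n_b (the sign "-+" of the text).
    h = [[(1 if a == b else 0) - s * n_up[a] * n_dn[b] for b in range(dim)]
         for a in range(dim)]
    tangential = [sum(h[a][b] * v[b] for b in range(dim)) for a in range(dim)]
    n_dot_v = sum(n_dn[b] * v[b] for b in range(dim))
    normal = [s * n_up[a] * n_dot_v for a in range(dim)]  # +- n^a (n_b v^b)
    return tangential, normal

# Assumed example data (not from the text).
eta = [[-1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
n_up = [F(5, 4), F(3, 4), 0, 0]          # n^a n_a = -25/16 + 9/16 = -1
v = [F(2), F(1), F(-3), F(7)]

tangential, normal = decompose(eta, n_up, v)
n_dot_tangential = sum(a * b for a, b in zip(lower(eta, n_up), tangential))
```

Here `tangential` and `normal` add up to `v`, and `n_dot_tangential` vanishes, in line with the bracketed remark $n_a (h^a{}_b v^b) = 0$.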
Theorem 4.4.4 The induced “metric” on a null hypersurface is degenerate (and
thus there is no induced metric).
Proof Let $h_{ab}$ represent the induced “metric”. The hypersurface being null implies
that $n^a \in W_q$ (see Theorem 4.4.3), and hence there is a nonzero element
$n^a$ in $W_q$ such that $h_{ab} n^a w^b = g_{ab} n^a w^b = 0$, $\forall w^a \in W_q$. Thus, $h_{ab}$ is a degenerate
tensor on $W_q$.
Example 4 Suppose t, x, y, z are Lorentzian coordinates of the 4-dimensional Minkowski
space $(\mathbb{R}^4, \eta_{ab})$, and r, θ, ϕ are the spherical coordinates corresponding to x, y, z, then
$\eta_{ab}$ can be expressed in terms of the dual coordinate basis vectors as
$\eta_{ab} = -(dt)_a (dt)_b + (dr)_a (dr)_b + r^2 (d\theta)_a (d\theta)_b + r^2 \sin^2\theta\, (d\varphi)_a (d\varphi)_b$. (4.4.5)
where (4.4.1) is used in the second equality and (4.4.5) is used in the third equality.
Similarly, we have $h_{\varphi\varphi} = r^2 \sin^2\theta$, and the third diagonal element of $h_{\mu\nu}$ (denoted
by $h_{nn}$) is
Also, it is easy to verify that all of the non-diagonal elements vanish. Hence,
$(h_{\mu\nu}) = \begin{bmatrix} r^2 & 0 & 0 \\ 0 & r^2 \sin^2\theta & 0 \\ 0 & 0 & 0 \end{bmatrix}$,
and therefore $h_{ab}$ is degenerate [we also say its “signature” is (+, +, 0)]. Thus,
$\eta_{ab}$ does not have an induced metric on the null hypersurface $\mathscr{S}$. However, the
intersection $S$ of $\mathscr{S}$ and an arbitrary constant-t surface (t > 0) is a 2-dimensional
sphere with a radius r = t. Let $\hat{W}_q \subset W_q$ represent the subspace formed by all the
elements in $W_q$ that are tangent to $S$ (see Fig. 4.8), then $\eta_{ab}$ does have an induced
metric on $\hat{W}_q$, denoted by $\hat{h}_{ab}$. Also, it is not difficult to verify that
It is not difficult for the reader to discuss the null hypersurface in (R4 , ηab ) defined
by t − z = 0 in a similar manner.
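The degenerate matrix in Example 4 can be reproduced numerically. The sketch below assumes that the null hypersurface of the example is the cone t = r (consistent with the statement that its constant-t sections are spheres of radius r = t), and that the basis of $W_q$ is taken as $(\partial/\partial\theta)^a$, $(\partial/\partial\varphi)^a$ and the null vector $(\partial/\partial t)^a + (\partial/\partial r)^a$. The function names and the sample point are hypothetical.

```python
import math

def eta_spherical(r, theta):
    """Components of eta_ab in the coordinates (t, r, theta, phi), cf. (4.4.5)."""
    return [[-1, 0, 0, 0],
            [0, 1, 0, 0],
            [0, 0, r * r, 0],
            [0, 0, 0, (r * math.sin(theta)) ** 2]]

def induced_components(g, basis):
    """h_{mu nu} = g_ab (e_mu)^a (e_nu)^b for a basis of the tangent subspace W_q."""
    n, k = len(g), len(basis)
    return [[sum(g[a][b] * basis[i][a] * basis[j][b]
                 for a in range(n) for b in range(n))
             for j in range(k)] for i in range(k)]

r, theta = 2.0, 0.7                      # an arbitrary sample point on the cone
basis = [[0, 0, 1, 0],                   # d/dtheta
         [0, 0, 0, 1],                   # d/dphi
         [1, 1, 0, 0]]                   # d/dt + d/dr, null and tangent to the cone
h = induced_components(eta_spherical(r, theta), basis)
```

The last row and column of `h` vanish, so the matrix has zero determinant, while its upper-left 2×2 block is the non-degenerate metric of a sphere of radius r.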
In the discussion so far, we have considered an embedded submanifold φ[S] as
the image of the embedding map φ : S → M for convenience. However, sometimes
it is useful to regard the map φ itself as a submanifold, as it was originally defined
in Definition 2. In this case, the induced metric in Definition 5 can be equivalently
defined as follows:
$h_{ab} w_1^a w_2^b = g_{ab} (\phi_* w_1^a)(\phi_* w_2^b)$, $\forall w_1^a, w_2^b \in W_q$. (4.4.7)
Since, by definition, $g_{ab} (\phi_* w_1^a)(\phi_* w_2^b) = (\phi^* g_{ab}) w_1^a w_2^b$, and since $w_1^a$ and $w_2^a$ are
arbitrary, the above equation can also be written simply as
$h_{ab} = \phi^* g_{ab}$. (4.4.8)
Note that the above definition is valid at any point q ∈ S. Since q is arbitrary, the
induced metric as a tensor field on S is essentially the pullback of gab on M.
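In components, the pullback (4.4.8) reads $h_{ij} = g_{\mu\nu} J^\mu{}_i J^\nu{}_j$, where $J^\mu{}_i$ is the Jacobian matrix of the embedding. As a sketch under stated assumptions (the embedding of a sphere of radius R into Euclidean $\mathbb{R}^3$ by the usual angles; all function names are illustrative), one can recover the familiar metric of the sphere:

```python
import math

def jacobian(R, th, ph):
    """J[mu][i] = d x^mu / d y^i for (x, y, z) = R (sin th cos ph, sin th sin ph, cos th)."""
    return [[R * math.cos(th) * math.cos(ph), -R * math.sin(th) * math.sin(ph)],
            [R * math.cos(th) * math.sin(ph),  R * math.sin(th) * math.cos(ph)],
            [-R * math.sin(th),                0.0]]

def pullback_metric(g, J):
    """h_ij = g_{mu nu} J^mu_i J^nu_j, the component form of h_ab = phi^* g_ab."""
    n, k = len(g), len(J[0])
    return [[sum(g[m][v] * J[m][i] * J[v][j] for m in range(n) for v in range(n))
             for j in range(k)] for i in range(k)]

delta = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]     # Euclidean metric in Cartesian coordinates
R, th, ph = 3.0, 0.9, 2.1                     # arbitrary sample values
h = pullback_metric(delta, jacobian(R, th, ph))
```

Up to rounding, `h` equals diag($R^2$, $R^2 \sin^2\theta$), the result that Remark 3 asks the reader to obtain from $h_{ab} = g_{ab} - n_a n_b$.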
Exercises
˜4.1. Show that (φ∗ v)a defined by (4.1.2) satisfies the two conditions for a vector
in Definition 2 of Sect. 2.2.
˜4.2. Prove Theorems 4.1.1, 4.1.2, 4.1.3.
4.3. Suppose φ : M → N is a smooth map, p ∈ M, and y μ are the coordinates
in a neighborhood of φ( p). Show that
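$(\mathscr{L}_v \omega)_\mu = v^\nu \frac{\partial \omega_\mu}{\partial x^\nu} + \omega_\nu \frac{\partial v^\nu}{\partial x^\mu}$. Hint: use (4.2.7) and set the $\nabla_a$ to be $\partial_a$.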
˜4.9. Suppose $u^a, v^a \in \mathscr{F}_M(1, 0)$, then the following equality holds when both
sides act on a tensor field of any type:
Prove the cases where the objects acted on are, respectively, $f \in \mathscr{F}_M$ and $w^a \in
\mathscr{F}_M(1, 0)$. Hint: when the object acted on is $w^a$, one can use the Jacobi identity
(Exercise 2.8).
4.10. Suppose $F_{ab}$ is an antisymmetric tensor field on 4-dimensional Minkowski
space, whose components in a Lorentzian coordinate system {t, x, y, z}
are $F_{01} = -F_{13} = x\rho^{-1}$, $F_{02} = -F_{23} = y\rho^{-1}$, $F_{03} = F_{12} = 0$, where $\rho \equiv
(x^2 + y^2)^{1/2}$. Show that $F_{ab}$ has rotational symmetry, i.e., $\mathscr{L}_v F_{ab} = 0$, where
$v^a = -y(\partial/\partial x)^a + x(\partial/\partial y)^a$.
4.11. Suppose $\xi^a$ is a Killing vector field on $(M, g_{ab})$, and $\nabla_a$ is associated with
$g_{ab}$. Show that $\nabla_a \xi^a = 0$.
4.12. Suppose ξ a is a Killing vector field on (M, gab ), φ : M → M is an isome-
try. Show that φ∗ ξ a is also a Killing vector field on (M, gab ). Hint: use the
conclusion in Exercise 4.5(c).
4.13. Suppose ξ a and ηa are Killing vector fields on (M, gab ). Show that their
commutator [ξ, η]a is also a Killing vector field. NB: This conclusion makes
the collection of all Killing vector fields on M not only a vector space, but
also a Lie algebra (for details, see Appendix G in Volume II).
4.14. Suppose $\xi^a$ is a Killing vector field of a generalized Riemannian space
$(M, g_{ab})$, and $R_{abc}{}^d$ is the Riemann curvature tensor of $g_{ab}$.
(a) Show that $\nabla_a \nabla_b \xi_c = -R_{bca}{}^d \xi_d$. NB: This equation is significant for
proving Theorem 4.3.4. Hint: from the definition of $R_{abc}{}^d$ and the Killing
equation (4.3.1) we can see that $\nabla_a \nabla_b \xi_c + \nabla_b \nabla_c \xi_a = R_{abc}{}^d \xi_d$. Refer to this
as the first equation. By substituting the indices a → b, b → c, c → a
we get the second equation, and by substituting twice we get the third
equation. Adding the first equation to the second equation and subtracting
the third equation, and using (3.4.7), one can prove the claim.
(b) Use the conclusion of (a) to show that $\nabla^a \nabla_a \xi_c = -R_{cd} \xi^d$, where $R_{cd}$ is
the Ricci tensor.
˜4.15. Verify that $(\partial/\partial\eta)^a$ in (4.3.3) indeed satisfies the Killing equation (4.3.1).
˜4.16. Find the coordinate transformation induced by an arbitrary element $\phi_\alpha$
of the one-parameter group of isometries generated by $R^a = x(\partial/\partial y)^a -
y(\partial/\partial x)^a$ in the 2-dimensional Euclidean space.
*4.17. Suppose each point of a hypersurface φ[S] in a spacetime (M, gab ) has a null
tangent vector while it does not have any timelike tangent vector (“tangent
vector” means the vector is tangent to φ[S]). Show that this is a null hyper-
surface. Hints: ① show that any vector orthogonal to a timelike vector t a must
be spacelike [choose an orthonormal basis {(eμ )a } such that (e0 )a = t a ]; ②
show that each point on a timelike hypersurface has a timelike tangent vector;
③ prove the original claim from these two lemmas.
References
Chern, S. S., Chen, W. and Lam, K. S. (1999), Lectures on Differential Geometry, World Scientific
Publishing Company, Singapore.
Chillingworth, D. (1976), Differential Topology with a View to Applications, Pitman Publishing,
London.
Hawking, S. W. and Ellis, G. F. R. (1973), The Large Scale Structure of Space-Time, Cambridge
University Press, Cambridge.
Wald, R. M. (1984), General Relativity, The University of Chicago Press, Chicago.
Chapter 5
Differential Forms and Their Integrals
For convenience in writing, we will sometimes drop the lower indices and write an
l-form as ω.
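Theorem 5.1.1 (a) $\omega_{a_1 \cdots a_l} = \omega_{[a_1 \cdots a_l]}$ ⇒ for any basis we have $\omega_{\mu_1 \cdots \mu_l} = \omega_{[\mu_1 \cdots \mu_l]}$;
(b) ∃ a basis such that $\omega_{\mu_1 \cdots \mu_l} = \omega_{[\mu_1 \cdots \mu_l]}$ ⇒ $\omega_{a_1 \cdots a_l} = \omega_{[a_1 \cdots a_l]}$.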
Proof Exercise.
[See the explanation after (2.6.14) for the meaning of δπ , aπ(1) , . . . , aπ(l) .] For exam-
ple, ωab = −ωba , ωabc = −ωacb = ωcab = · · · ;
It follows from (5.1.1 ) that any component ωμ1 ···μl of an l-form with repeated
indices must vanish, e.g.,
Denote the collection of all the l-forms on V by $\Lambda(l)$. A 1-form is actually a dual
vector on V, and hence $\Lambda(1) = V^*$. We stipulate that any real number is called a
0-form on V, then $\Lambda(0) = \mathbb{R}$. Since an l-form is a tensor of type (0, l), we naturally
have $\Lambda(l) \subset \mathcal{T}_V(0, l)$. Moreover, it is easy to show that $\Lambda(l)$ is a linear subspace
of $\mathcal{T}_V(0, l)$. The computation of the dimension of $\Lambda(l)$ can be inspired by the
computation of the dimension of $\mathcal{T}_V(0, l)$ in Theorem 2.4.1: to find the dimension of
$\mathcal{T}_V(0, l)$, one finds a basis first, and to do so one needs to define the tensor
product. However, the tensor product of two differential forms (as two tensors) is not
totally antisymmetric, and hence is no longer a differential form. Nonetheless, one
can totally antisymmetrize all its indices and make it a differential form. Thus, we
have the following definition:
Definition 2 Suppose ω and μ are respectively an l-form and an m-form, then their
wedge product is an (l + m)-form defined by the following equation:
$(\omega \wedge \mu)_{a_1 \cdots a_l b_1 \cdots b_m} := \frac{(l + m)!}{l!\, m!}\, \omega_{[a_1 \cdots a_l} \mu_{b_1 \cdots b_m]}$. (5.1.2)
In other words, the wedge product is a map $\wedge : \Lambda(l) \times \Lambda(m) \to \Lambda(l + m)$ which
satisfies (5.1.2).
The wedge product $(\omega \wedge \mu)_{a_1 \cdots a_l b_1 \cdots b_m}$ can also be denoted by $\omega_{a_1 \cdots a_l} \wedge \mu_{b_1 \cdots b_m}$, or
$\omega \wedge \mu$ for short.
It follows from the definition that the wedge product satisfies both the associative
law and distributive law, i.e., (ω ∧ μ) ∧ ν = ω ∧ (μ ∧ ν) (and thus ω ∧ μ ∧ ν has
a clear meaning) and ω ∧ (μ + ν) = ω ∧ μ + ω ∧ ν. However, the wedge product
does not in general obey the commutative law. For instance, for 1-forms ω and μ we
have
and thus for the wedge product of any two 1-forms we have $\omega \wedge \mu = -\mu \wedge \omega$.
Carrying over to the general case, suppose ω and μ are an l-form and an m-form, respectively,
then
$\omega \wedge \mu = (-1)^{lm} \mu \wedge \omega$. (5.1.3)
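Definition (5.1.2) and the sign rule (5.1.3) can be tried out on components. The following is a small sketch, assuming forms are stored as dictionaries that map every index tuple (indices running over 0, ..., n−1) to a component; the names `wedge`, `antisymmetrize`, and `one_form`, and the sample 1-forms, are introduced here for illustration only.

```python
from fractions import Fraction
from itertools import permutations, product
from math import factorial

def sign(perm):
    """Sign of a permutation of 0..k-1, found by sorting it with transpositions."""
    perm, s = list(perm), 1
    for i in range(len(perm)):
        while perm[i] != i:
            j = perm[i]
            perm[i], perm[j] = perm[j], perm[i]
            s = -s
    return s

def antisymmetrize(T, rank, n):
    """Components of T_[a1...ak] for a rank-k tensor on an n-dimensional space."""
    out = {}
    for idx in product(range(n), repeat=rank):
        total = sum(sign(p) * T[tuple(idx[i] for i in p)]
                    for p in permutations(range(rank)))
        out[idx] = Fraction(total, 1) / factorial(rank)
    return out

def wedge(w, l, mu, m, n):
    """(w ^ mu) = (l+m)!/(l! m!) * antisymmetrized tensor product, Eq. (5.1.2)."""
    tensor = {idx: w[idx[:l]] * mu[idx[l:]]
              for idx in product(range(n), repeat=l + m)}
    coeff = Fraction(factorial(l + m), factorial(l) * factorial(m))
    return {idx: coeff * val
            for idx, val in antisymmetrize(tensor, l + m, n).items()}

def one_form(components):
    return {(i,): Fraction(c) for i, c in enumerate(components)}

# Hypothetical 1-forms on a 3-dimensional vector space.
a, b, c = one_form([1, 2, 3]), one_form([4, 5, 6]), one_form([7, 8, 10])
ab = wedge(a, 1, b, 1, 3)
ba = wedge(b, 1, a, 1, 3)
abc = wedge(ab, 2, c, 1, 3)
cab = wedge(c, 1, ab, 2, 3)
```

For these inputs `ab` is the negative of `ba` (l = m = 1), whereas `abc` equals `cab` (l = 2, m = 1, so $(-1)^{lm} = +1$); the component `abc[(0, 1, 2)]` is the determinant of the matrix whose rows are the three 1-forms.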
$\dim \Lambda(l) = \frac{n!}{l!(n - l)!}$, if $l \le n$; (5.1.4)
$\Lambda(l) = \{0\}$ (only contains the zero element), if $l > n$.
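$\omega_{ab} = \omega_{11} (e^1)_a (e^1)_b + \omega_{12} (e^1)_a (e^2)_b + \omega_{13} (e^1)_a (e^3)_b$
$\quad + \omega_{21} (e^2)_a (e^1)_b + \omega_{22} (e^2)_a (e^2)_b + \omega_{23} (e^2)_a (e^3)_b$
$\quad + \omega_{31} (e^3)_a (e^1)_b + \omega_{32} (e^3)_a (e^2)_b + \omega_{33} (e^3)_a (e^3)_b$.
Noticing that $\omega_{11} = \omega_{22} = \omega_{33} = 0$, $\omega_{21} = -\omega_{12}$, $\omega_{32} = -\omega_{23}$, $\omega_{13} = -\omega_{31}$, the
above equation becomes
$\omega_{ab} = \omega_{12} [(e^1)_a (e^2)_b - (e^2)_a (e^1)_b] + \omega_{23} [(e^2)_a (e^3)_b - (e^3)_a (e^2)_b]$
$\quad + \omega_{31} [(e^3)_a (e^1)_b - (e^1)_a (e^3)_b]$
$\quad = \omega_{12} (e^1)_a \wedge (e^2)_b + \omega_{23} (e^2)_a \wedge (e^3)_b + \omega_{31} (e^3)_a \wedge (e^1)_b$. (5.1.5)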
Thus, any $\omega_{ab} \in \Lambda(2)$ can be expressed linearly in terms of $\{(e^1)_a \wedge (e^2)_b, (e^2)_a \wedge
(e^3)_b, (e^3)_a \wedge (e^1)_b\}$. It is not difficult to show that the three 2-forms in the curly
brackets are linearly independent (Exercise 5.1), and hence they comprise a set
of basis vectors. Therefore, $\dim \Lambda(2) = 3$. The reader may generalize the above
discussion to the case where l, n are arbitrary positive integers and $l \le n$, and show
that any l-form ω can be expanded as
$\omega_{a_1 \cdots a_l} = \sum_C \omega_{\mu_1 \cdots \mu_l} (e^{\mu_1})_{a_1} \wedge \cdots \wedge (e^{\mu_l})_{a_l}$, (5.1.6)
where $\{(e^1)_a, \ldots, (e^n)_a\}$ is an arbitrary basis of $V^*$, and $\omega_{\mu_1 \cdots \mu_l}$ are the components
of ω in the basis of $\mathcal{T}_V(0, l)$ constituted by $\{(e^1)_a, \ldots, (e^n)_a\}$, i.e.,
where each component is determined by (5.1.7), e.g., $\omega_{134} = \omega_{abc} (e_1)^a (e_3)^b (e_4)^c$.
$\omega_{a_1 \cdots a_l} = \frac{1}{l!} \sum_{\mu_1, \ldots, \mu_l}^{n} \omega_{\mu_1 \cdots \mu_l} (e^{\mu_1})_{a_1} \wedge \cdots \wedge (e^{\mu_l})_{a_l}$ (the symbol $\sum$ is omitted by convention). (5.1.6′)
The number of nonzero terms on the right-hand side is equal to the number of permutations
of l numbers taken from n numbers, i.e., $P_n^l = n!/(n - l)!$, which can be divided into
$C_n^l = n!/[l!(n - l)!]$ groups, each containing l! terms. All the terms in each group are the
same, so dividing by l! yields $C_n^l = n!/[l!(n - l)!]$ terms, which is in agreement with (5.1.6).
[The End of Optional Reading 5.1.1]
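The counting argument above is easy to confirm by brute-force enumeration. The sketch below uses a hypothetical helper `count_terms` and simply lists the ordered index choices (the terms of (5.1.6′)) and the increasing ones (the terms of (5.1.6)).

```python
from itertools import combinations, permutations
from math import comb, factorial

def count_terms(n, l):
    """Return (ordered, increasing): the numbers of index tuples without
    repetition, and of strictly increasing index tuples, drawn from 1..n."""
    ordered = sum(1 for _ in permutations(range(1, n + 1), l))
    increasing = sum(1 for _ in combinations(range(1, n + 1), l))
    return ordered, increasing

ordered, increasing = count_terms(4, 3)   # 3-forms on a 4-dimensional space
```

With n = 4 and l = 3 this gives 24 ordered terms in 4 groups of 3! = 6 each, so that dim Λ(3) = 4, as (5.1.4) states.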
where
$\omega_{\mu_1 \cdots \mu_l} = \omega_{a_1 \cdots a_l} (\partial/\partial x^{\mu_1})^{a_1} \cdots (\partial/\partial x^{\mu_l})^{a_l}$ (5.1.9)
$\omega = \omega_{1 \cdots n}\, dx^1 \wedge \cdots \wedge dx^n$. (5.1.10′)
The equation above can be interpreted like this: the collection of all the n-forms at any
point p in M is a 1-dimensional vector space, which only has one independent basis
vector. Take the basis vector to be $dx^1 \wedge \cdots \wedge dx^n|_p$, then (5.1.10′) is the expansion
of $\omega|_p$ in this basis. Note that the coefficient $\omega_{1 \cdots n}$ can be different from point to
point, and thus is a function on the coordinate patch, which can be expressed as a
function of n variables, namely $\omega_{1 \cdots n}(x^1, \ldots, x^n)$.
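We will use $\Lambda_M(l)$ to represent the collection of all the l-forms on M.
Definition 3 The exterior differentiation operator on a manifold M is the map
$d : \Lambda_M(l) \to \Lambda_M(l + 1)$, which can be defined as
$(d\omega)_{b a_1 \cdots a_l} := (l + 1) \nabla_{[b} \omega_{a_1 \cdots a_l]}$, (5.1.11)
where $\nabla_b$ is an arbitrary torsion-free derivative operator.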
Proof Exercise 5.4. Hint: choose the ordinary derivative operator ∂a of this coordinate
system as the ∇b in (5.1.11).
Theorem 5.1.5 d ◦ d = 0.
Proof Choosing the ordinary derivative operator ∂a of an arbitrary coordinate system
as the ∇b in (5.1.11) yields
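$[d(d\omega)]_{c b a_1 \cdots a_l} = (l + 2)(l + 1) \partial_{[c} \partial_{[b} \omega_{a_1 \cdots a_l]]} = (l + 2)(l + 1) \partial_{[[c} \partial_{b]} \omega_{a_1 \cdots a_l]} = 0$,
where Theorem 2.6.2 (b) is used in the second equality, and $\partial_{[a} \partial_{b]} T^{\cdots}{}_{\cdots} = 0$ in
Sect. 3.1 is used in the third equality.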
Definition 4 Suppose ω is an l-form field on M. ω is said to be closed if dω = 0;
ω is said to be exact if there exists an (l − 1)-form field μ such that ω = dμ.
Remark 1 Theorem 5.1.5 can be expressed alternatively as follows: if ω is exact,
then ω is closed. However, for the converse to hold one has to impose an
additional requirement on M. The requirement is omitted here; what the reader has
to know is that the trivial manifold $\mathbb{R}^n$ satisfies this requirement. Since any manifold
is locally trivial, one concludes that a closed l-form field on any manifold must be
at least locally exact. That is, suppose ω is a closed l-form field on a manifold M,
then for any point p of M there must be a neighborhood N on which there exists an
(l − 1)-form field μ such that ω = dμ.
Corollary 5.1.6 When $M = \mathbb{R}^2$, Theorem 5.1.5 and its converse give the following
proposition in standard calculus: given functions X(x, y) and Y(x, y), a necessary
and sufficient condition for the existence of a function f(x, y) such that df = X dx +
Y dy is ∂X/∂y = ∂Y/∂x.
1 This definition is sufficient for this text, but the general definition of the exterior differentiation
does not require the torsion-free condition, see, e.g., Warner (1983); Chern et al. (1999).
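Proof It follows from Theorem 5.1.4 that the exterior differentiation of the 1-form
field X dx + Y dy is
$d(X dx + Y dy) = dX \wedge dx + dY \wedge dy$
$\quad = \left( \frac{\partial X}{\partial x} dx + \frac{\partial X}{\partial y} dy \right) \wedge dx + \left( \frac{\partial Y}{\partial x} dx + \frac{\partial Y}{\partial y} dy \right) \wedge dy$
$\quad = \frac{\partial X}{\partial y} dy \wedge dx + \frac{\partial Y}{\partial x} dx \wedge dy = \left( \frac{\partial Y}{\partial x} - \frac{\partial X}{\partial y} \right) dx \wedge dy$. (5.1.13)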
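As a quick illustration of (5.1.13), consider two 1-form fields on $\mathbb{R}^2$; both are chosen here purely as examples and do not appear in the text. The first is closed, and a function f with df = X dx + Y dy can be written down directly; the second is not closed, so no such f exists.

```latex
% Closed case: X = 2xy, Y = x^2.
\frac{\partial X}{\partial y} = 2x = \frac{\partial Y}{\partial x}
\;\Rightarrow\;
\mathrm{d}\bigl(2xy\,\mathrm{d}x + x^{2}\,\mathrm{d}y\bigr) = 0 ,
\qquad
f = x^{2}y , \quad \mathrm{d}f = 2xy\,\mathrm{d}x + x^{2}\,\mathrm{d}y .

% Non-closed case: X = -y, Y = x.
\mathrm{d}\bigl(-y\,\mathrm{d}x + x\,\mathrm{d}y\bigr)
= \left(\frac{\partial Y}{\partial x} - \frac{\partial X}{\partial y}\right)
  \mathrm{d}x \wedge \mathrm{d}y
= 2\,\mathrm{d}x \wedge \mathrm{d}y \neq 0 .
```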
First, we take the 3-dimensional Euclidean space $(\mathbb{R}^3, \delta_{ab})$ as an example. Suppose
v is a vector field, L is a smooth curve, and S is a smooth surface. Before we specify
the direction of L (the arrow in Fig. 5.1) and the normal direction of S (the arrow n
in Fig. 5.2), both the integrals $\int_L v \cdot dl$ and $\iint_S v \cdot dS$ can only be determined up to
a minus sign. By extension, one should assign an “orientation” to a manifold before
calculating the integral on it. However, not all manifolds are orientable.
Remark 1 From the orientation point of view, the $\varepsilon_1$ and $\varepsilon_2$ that satisfy $\varepsilon_1 = h \varepsilon_2$
(h > 0) are equivalent. Since the collection of all the n-forms at each point on an
n-dimensional manifold M is a 1-dimensional vector space (see (5.1.4)), for any
two n-form fields $\varepsilon_1$ and $\varepsilon_2$ there must be $\varepsilon_1 = h \varepsilon_2$, where h is a (not necessarily
positive) function on M. If $\varepsilon_1$ and $\varepsilon_2$ are nowhere vanishing, then h is nowhere
Thus, each n-form field ω gives rise to a function of n variables, i.e., $\omega_{1 \cdots n}(x^1, \ldots, x^n)$,
in the coordinate patch. We call the n-fold integral of this function of n variables
the integral of the n-form field ω; the precise definition is as follows:
Definition 4 Suppose (O, ψ) is a right-handed coordinate system on an n-dimensional
oriented manifold M, ω is a continuous n-form field on an open subset
G ⊂ O, then the integral of ω on G is defined as
$\int_G \omega := \int_{\psi[G]} \omega_{1 \cdots n}(x^1, \ldots, x^n)\, dx^1 \cdots dx^n$. (5.2.2)
The right-hand side of the above equation is just the standard integral³ of a function
of n variables on an open subset ψ[G] of $\mathbb{R}^n$, which is already well-defined.
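Definition 4 can be made concrete with a crude numerical experiment. The sketch below takes, as an assumed example, the 2-form ω = dx ∧ dy on the open quarter annulus G = {1 < r < 2, 0 < θ < π/2} in $\mathbb{R}^2$, and evaluates the right-hand side of (5.2.2) in two right-handed coordinate systems: Cartesian (x, y), where $\omega_{12} = 1$, and polar (r, θ), where ω = r dr ∧ dθ. The midpoint rule and all names are illustrative choices.

```python
import math

def midpoint_2d(component, u_range, w_range, steps=400):
    """Midpoint-rule approximation of the double integral of `component`."""
    (u0, u1), (w0, w1) = u_range, w_range
    du, dw = (u1 - u0) / steps, (w1 - w0) / steps
    total = 0.0
    for i in range(steps):
        u = u0 + (i + 0.5) * du
        for j in range(steps):
            w = w0 + (j + 0.5) * dw
            total += component(u, w)
    return total * du * dw

def omega_12_cartesian(x, y):
    """Component of dx ^ dy in (x, y); set to 0 outside psi[G] so that we can
    integrate over the bounding square [0, 2] x [0, 2]."""
    return 1.0 if 1.0 < x * x + y * y < 4.0 else 0.0

def omega_12_polar(r, theta):
    """Component of the same 2-form in (r, theta): dx ^ dy = r dr ^ dtheta."""
    return r

cartesian = midpoint_2d(omega_12_cartesian, (0.0, 2.0), (0.0, 2.0))
polar = midpoint_2d(omega_12_polar, (1.0, 2.0), (0.0, math.pi / 2))
```

Both numbers approximate 3π/4, the polar one essentially exactly and the Cartesian one only up to the discretization error at the curved boundary. That the two agree is the coordinate independence which Remark 2 sets out to prove.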
Remark 2 (1) To show the validity of Definition 4, one should also prove that the
integral of ω on G does not depend on the choice of the right-handed system. We
only prove the case n = 2 below as an example; the reader should carry over the
proof to the general case.
Suppose (O, ψ) and (O′, ψ′) are right-handed coordinate systems that satisfy
G ⊂ O ∩ O′. The coordinates of these two systems are denoted by $x^1, x^2$ and $x'^1, x'^2$,
respectively, then
$\omega = \omega_{12}\, dx^1 \wedge dx^2 = \omega'_{12}\, dx'^1 \wedge dx'^2$.
Let $\int_G \omega \equiv \int_{\psi[G]} \omega_{12}\, dx^1 dx^2$ and $(\int_G \omega)' \equiv \int_{\psi'[G]} \omega'_{12}\, dx'^1 dx'^2$. We want to prove
$(\int_G \omega)' = \int_G \omega$. (5.2.3)
2 A topological space (X, T) is said to be connected if it only has two subsets that are both open
and closed (Definition 7 of Sect. 1.2), and is said to be arcwise connected if any two points in
X can be joined by a continuous curve in X. A manifold is said to be connected (or arcwise
connected) if its base topological space is connected (or arcwise connected). A topological
space that is arcwise connected must be connected, but a connected space is not necessarily arcwise connected
(there exist “sideswipe” counterexamples). For a manifold, arcwise connected is equivalent to
connected [see Abraham and Marsden (1978) Proposition 1.1.33].
3 Namely, the Riemann or Lebesgue integral.
Suppose S and M are manifolds with dimensions l and n(> l), respectively,
and φ : S → M is an embedding (see Sect. 4.4). Since φ[S] is an l-dimensional
submanifold, of course we can talk about the integral of an l-form field μ on it
(Definition 4 applies). However, the fact that “φ[S] is embedded in M” leads to two
possible meanings of “an l-form field on φ[S]”. Just like “a vector field on φ[S]” can
be tangent or not tangent to φ[S], “an l-form field on φ[S]” can also be classified
as “tangent to” and not “tangent to” φ[S]. Precisely speaking, an l-form field μ on
φ[S] is said to be “tangent to” φ[S] if ∀q ∈ φ[S], μ|q is an l-form on Wq (rather than
Vq ); that is, μ|q is a linear map that can turn l arbitrary elements of Wq into a real
number. An “l-form field on φ[S]” can either be tangent to φ[S] or not “tangent to”
φ[S]. Since we consider the φ[S] as an independent manifold when we talk about
the integral of an l-form on φ[S] (and do not care about the “outside” situation), only
an l-form μ that is “tangent to” φ[S] is meaningful. Nevertheless, since an l-form
field μ on φ[S] that is not “tangent to” φ[S] is a linear map that can turn l arbitrary
elements in $V_q$ (rather than only $W_q$) of each point q ∈ φ[S] into a real number, and
$W_q$ is nothing but a subspace of $V_q$, we can obtain an l-form that is “tangent to”
φ[S] by just restricting the domain on which μ acts to $W_q$. We denote it by $\tilde{\mu}$ and call it the
restriction of μ. Precisely, we have the following definition:
Definition 5 Suppose $\mu_{a_1 \cdots a_l}$ is an l-form field on an l-dimensional submanifold
φ[S] ⊂ M. An l-form field $\tilde{\mu}_{a_1 \cdots a_l}$ on φ[S] (viewed as a manifold independent of M)
is called the restriction of the l-form field $\mu_{a_1 \cdots a_l}$ on φ[S] if
$\tilde{\mu}_{a_1 \cdots a_l}|_q (w_1)^{a_1} \cdots (w_l)^{a_l} = \mu_{a_1 \cdots a_l}|_q (w_1)^{a_1} \cdots (w_l)^{a_l}$,
$\forall q \in \phi[S]$, $(w_1)^a, \ldots, (w_l)^a \in W_q$. (5.2.7)
Similar to the induced metric (see Definition 5 of Sect. 4.4), in the perspective that
a submanifold is the embedding map φ : S → M itself, the restriction of a form μ is
essentially the pullback $\phi^* \mu$ on S. In particular, one can show that the integral of the
$\tilde{\mu}$ in Definition 5 satisfies
$\int_{\phi[S]} \tilde{\mu} = \int_S \phi^* \mu$.
Later on, whenever we talk about the integral of an l-form field μ over an l-dimensional
submanifold φ[S], one should always interpret it as the integral of the
restriction of μ, i.e., always interpret $\int_{\phi[S]} \mu$ as $\int_{\phi[S]} \tilde{\mu}$ or $\int_S \phi^* \mu$.
$\mathbb{R}^{n-} := \{(x^1, \ldots, x^n) \in \mathbb{R}^n \mid x^1 \le 0\}$,
where $x^1, \ldots, x^n$ are natural coordinates, the subset formed by all the points on
$x^1 = 0$ is called the boundary of $\mathbb{R}^{n-}$, which by itself is an (n − 1)-dimensional
manifold (in fact it is just $\mathbb{R}^{n-1}$). Carrying over to the general case, an n-dimensional
manifold N with boundary is defined in a way similar to an n-dimensional manifold,
except the $\mathbb{R}^n$ in that definition is changed to $\mathbb{R}^{n-}$. That is, each element in the open
cover $\{O_\alpha\}$ of N should be homeomorphic to an open subset of $\mathbb{R}^{n-}$; all the points
in N that are mapped to $x^1 = 0$ (such as p in Fig. 5.4) form the boundary of N,
denoted by ∂N. Note that ∂N is an (n − 1)-dimensional manifold; i(N) ≡ N − ∂N
is an n-dimensional manifold. For instance, a solid ball B in $\mathbb{R}^3$ is a 3-dimensional
manifold with boundary, whose boundary (a 2-sphere) is a 2-dimensional manifold,
while i(B) is a 3-dimensional manifold.
Theorem 5.3.1 (Stokes’s Theorem) Suppose a compact subset N of an n-dimensional
oriented manifold M is an n-dimensional manifold with boundary, and
ω is an (n − 1)-form field (whose differentiability is at least $C^1$) on M, then
$\int_{i(N)} d\omega = \int_{\partial N} \omega$. (5.3.1)
Now we will show that the equation above is a special case of Theorem 5.3.1. Let
$M = \mathbb{R}^2$, then S ∪ L can be treated as the N in Theorem 5.3.1, where S and L serve
as i(N) and ∂N, respectively. If we turn $A^a$ into a 1-form field using the Euclidean
metric $\delta_{ab}$, then $A_a$ can be treated as the ω in Theorem 5.3.1. Expand $A_a$ using the
dual coordinate basis vectors of the Cartesian system: $\omega = A_a = A_\mu (dx^\mu)_a$, then
$d\omega = dA_\mu \wedge dx^\mu = \frac{\partial A_\mu}{\partial x^\nu} dx^\nu \wedge dx^\mu = \frac{\partial A_1}{\partial x^2} dx^2 \wedge dx^1 + \frac{\partial A_2}{\partial x^1} dx^1 \wedge dx^2$
$\quad = \left( \frac{\partial A_2}{\partial x^1} - \frac{\partial A_1}{\partial x^2} \right) dx^1 \wedge dx^2$.
Thus, the left-hand side of (5.3.2) can be expressed as $\int_{i(N)} d\omega$, which means it is
a special case of the left-hand side of (5.3.1). On the other hand, the right-hand
side of (5.3.1) is $\int_{\partial N} \omega = \int_{\partial N} \tilde{\omega}$. Setting the arc length l as the local coordinate
of L, expanding $\tilde{\omega}$ using the dual coordinate basis vector as $\tilde{\omega}_a = \tilde{\omega}_1(l) (dl)_a$, and
contracting both sides with $(\partial/\partial l)^a$, we have
and hence $\tilde{\omega} = A_l\, dl$. Therefore, the right-hand side of (5.3.2) can be written as
$\int_L A_l\, dl = \int_{\partial N} \omega$. (5.3.3)
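The two sides of this special case of Stokes's theorem can be compared numerically. The sketch below is an illustration under assumptions: N is taken to be the closed unit disk, the 1-form has the hypothetical components $(A_1, A_2) = (-y^3, x^3)$, the boundary circle is traversed counter-clockwise, and both integrals are approximated by the midpoint rule.

```python
import math

def A(x, y):
    """Hypothetical 1-form components (A_1, A_2) = (-y^3, x^3)."""
    return -y ** 3, x ** 3

def curl_A(x, y):
    """dA_2/dx^1 - dA_1/dx^2 for the field above, the coefficient of dx^1 ^ dx^2."""
    return 3 * x ** 2 + 3 * y ** 2

def interior_integral(steps=400):
    """Integral of d(omega) over the open unit disk, computed in polar coordinates."""
    dr, dt = 1.0 / steps, 2 * math.pi / steps
    total = 0.0
    for i in range(steps):
        r = (i + 0.5) * dr
        for j in range(steps):
            t = (j + 0.5) * dt
            total += curl_A(r * math.cos(t), r * math.sin(t)) * r
    return total * dr * dt

def boundary_integral(steps=2000):
    """Integral of A_l dl over the unit circle; the arc length l equals the angle t."""
    dt = 2 * math.pi / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt
        a1, a2 = A(math.cos(t), math.sin(t))
        # A_l = A_a (d/dl)^a with unit tangent (d/dl)^a = (-sin t, cos t)
        total += a1 * (-math.sin(t)) + a2 * math.cos(t)
    return total * dt

lhs, rhs = interior_integral(), boundary_integral()
```

Both `lhs` and `rhs` come out close to 3π/2, which is the exact value of either side for this choice of $A_a$.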
Now we have introduced the integral of a differential form on a manifold and some
related theorems. To talk about the integral of a function on a manifold, we first
introduce the concept of a volume element in Sect. 5.4.
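$\varepsilon^{a_1 a_2} \varepsilon_{a_1 a_2} = \eta^{11} \eta^{22} \varepsilon_{12} \varepsilon_{12} + \eta^{22} \eta^{11} \varepsilon_{21} \varepsilon_{21} = -2 (\varepsilon_{12})^2$.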
where $\varepsilon_{1 \cdots n}$ is a component of $\varepsilon_{a_1 \cdots a_n}$ in the orthonormal basis, and s is the number
of −1 among the components of $g_{ab}$ in the orthonormal basis; for instance, s = 0
for positive definite metrics, and s = 1 for Lorentzian metrics. To choose a specific
volume element using the given metric, one just needs to impose the following simple
requirement on the components of the volume element $\varepsilon_{a_1 \cdots a_n}$ in the orthonormal basis
$\{(e_\mu)^a\}$:
$\varepsilon_{1 \cdots n} = \pm 1$, (5.4.1)
i.e.,
$\varepsilon_{a_1 \cdots a_n} = \pm (e^1)_{a_1} \wedge \cdots \wedge (e^n)_{a_n}$ (for an orthonormal basis), (5.4.2)
An $\varepsilon_{a_1 \cdots a_n}$ that satisfies the above equation is called the volume element associated
(or compatible) with the metric $g_{ab}$. The above equation can only determine the
volume element up to a minus sign; only together with the requirement “the volume
element is compatible with the orientation” can the volume element be uniquely
determined. Thus, the + and − signs on the right-hand side of (5.4.2) correspond to
right-handed and left-handed orthonormal bases, respectively.
Summary. When dealing with an integral, we are only concerned here with orientable
manifolds.⁴ First, one should choose an orientation and make M an oriented
manifold. Whether a basis is right-handed or left-handed is stipulated by the orientation we choose.
When there is no metric field $g_{ab}$ (or any other available geometric structure), the
volume element is quite arbitrary apart from being required to be compatible with
the orientation. After $g_{ab}$ is assigned, $\varepsilon_{a_1 \cdots a_n}$ is uniquely determined by $g_{ab}$ and the
requirement of it being compatible with the orientation, and is called the associated
volume element for short. Later on, unless stated otherwise, all the volume elements we
mention when there is a metric will refer to this unique associated volume element.
Choose any right-handed Cartesian system {x, y, z} in the 3-dimensional
Euclidean space $(\mathbb{R}^3, \delta_{ab})$ by intuition and assign the orientation using the 3-form
field ε = dx ∧ dy ∧ dz, then according to Definition 3 of Sect. 5.2, {x, y, z} is a
right-handed system measured by ε. Comparing ε = dx ∧ dy ∧ dz and (5.4.2) we
can see that ε is an associated volume element. Suppose G is an open subset of $\mathbb{R}^3$
and the integral $\iiint_G dx\,dy\,dz$ exists, then this integral naturally stands for the volume
of G (by the definition of volume in standard calculus). On the other hand, it follows
from Definition 4 of Sect. 5.2 that the integral $\int_G \varepsilon$ of the 3-form field ε on $G \subset \mathbb{R}^3$
is exactly $\iiint_G dx\,dy\,dz$, and thus $\int_G \varepsilon$ is the volume of G. Generalize to any oriented
manifold N with a positive definite metric $g_{ab}$: suppose ε is the associated volume
element, if $\int_N \varepsilon$ exists, then we call it the volume (or length and area for 1- and
2-dimensional manifolds, respectively) of N (measured by $g_{ab}$). This is the reason
why ε is called a volume element.
4 Integration can also be defined on non-orientable manifolds. In this case, one needs the concept
of a “twisted” (also called “odd” or “pseudo”) form, which is outside the scope of this text.
Theorem 5.4.1 Suppose ε is an associated volume element, $\{(e_\mu)^a\}$ and $\{(e^\mu)_a\}$ are
a basis and its dual basis, g is the determinant of the components of $g_{ab}$ in this basis,
|g| is the absolute value of g, then (+ for right-handed basis and − for left-handed
basis)
$\varepsilon_{a_1 \cdots a_n} = \pm \sqrt{|g|}\, (e^1)_{a_1} \wedge \cdots \wedge (e^n)_{a_n}$. (5.4.4)
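$(-1)^s n! = \varepsilon^{\mu_1 \cdots \mu_n} \varepsilon_{\mu_1 \cdots \mu_n} = g^{\mu_1 \nu_1} \cdots g^{\mu_n \nu_n} \varepsilon_{\nu_1 \cdots \nu_n} \varepsilon_{\mu_1 \cdots \mu_n}$. (5.4.5)
The right-hand side of this equation should be interpreted as summing over each of $\mu_1 \cdots \mu_n$
and $\nu_1 \cdots \nu_n$ from 1 to n. Considering the total antisymmetry of $\varepsilon_{\nu_1 \cdots \nu_n}$ and $\varepsilon_{\mu_1 \cdots \mu_n}$, one
can simplify the summation above into a sum over the permutations. More precisely, let
$\sum_{\pi(\nu_1 \cdots \nu_n)}$ represent summing over all the permutations of 1, 2, . . . , n, then
r.h.s. of (5.4.5) $= \sum_{\pi(\mu_1 \cdots \mu_n)} \sum_{\pi(\nu_1 \cdots \nu_n)} g^{\mu_1 \nu_1} \cdots g^{\mu_n \nu_n} \varepsilon_{\nu_1 \cdots \nu_n} \varepsilon_{\mu_1 \cdots \mu_n}$
$\quad = \sum_{\pi(\mu_1 \cdots \mu_n)} g^{\mu_1 1} g^{\mu_2 2} g^{\mu_3 3} \cdots g^{\mu_n n} \varepsilon_{123 \cdots n} \varepsilon_{\mu_1 \cdots \mu_n}$
$\quad + \sum_{\pi(\mu_1 \cdots \mu_n)} g^{\mu_1 2} g^{\mu_2 1} g^{\mu_3 3} \cdots g^{\mu_n n} \varepsilon_{213 \cdots n} \varepsilon_{\mu_1 \cdots \mu_n} + \cdots$. (5.4.5′)
There are n! terms on the right-hand side of this equation. Using $\hat{\varepsilon}_{\mu_1 \cdots \mu_n}$ to represent the
Levi-Civita symbol, i.e.,
$\hat{\varepsilon}_{\mu_1 \cdots \mu_n} = \begin{cases} +1, & \text{(when } \mu_1 \cdots \mu_n \text{ is an even permutation of } 1, 2, \ldots, n) , \\ -1, & \text{(when } \mu_1 \cdots \mu_n \text{ is an odd permutation of } 1, 2, \ldots, n) , \\ 0, & \text{(when two of } \mu_1, \ldots, \mu_n \text{ are equal)} , \end{cases}$
we have $\varepsilon_{\mu_1 \cdots \mu_n} = \varepsilon_{123 \cdots n} \hat{\varepsilon}_{\mu_1 \cdots \mu_n}$. Denote $\sum_{\pi(\mu_1 \cdots \mu_n)}$ as $\sum_\pi$ for short, then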
where $\det(g^{\mu\nu})$ stands for the determinant of the matrix $(g^{\mu\nu})$ (the definition of the determinant
is used in the last step). Also,
Similarly one can prove that each term on the right-hand side of (5.4.5′) equals $(\varepsilon_{1 \cdots n})^2
\det(g^{\mu\nu})$. Noticing that there are n! terms on the right-hand side of (5.4.5′),
plugging them back into (5.4.5) yields $(-1)^s n! = (n!) (\varepsilon_{1 \cdots n})^2 \det(g^{\mu\nu})$, or $(-1)^s =
(\varepsilon_{1 \cdots n})^2 \det(g^{\mu\nu})$. The fact that the matrix $(g^{\mu\nu})$ is the inverse of $(g_{\mu\nu})$ gives that $\det(g^{\mu\nu}) =
1/\det(g_{\mu\nu}) \equiv 1/g$. Plugging into the previous equation, we obtain
Remark 2 For an orthonormal basis we have |g| = 1, and hence (5.4.4) goes back
to (5.4.2).
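Theorem 5.4.1 can be cross-checked against the normalization (5.4.5) in a basis that is not orthonormal. The following sketch assumes a hypothetical basis of 3-dimensional Minkowski space whose components in an orthonormal basis are the rows of `E`, treats that basis as right-handed, builds $\varepsilon_{\mu_1 \cdots \mu_n} = \sqrt{|g|}\, \hat{\varepsilon}_{\mu_1 \cdots \mu_n}$, and evaluates the full contraction by brute force.

```python
import math
from fractions import Fraction
from itertools import permutations, product

def perm_sign(p):
    """Levi-Civita symbol: +1 / -1 for even / odd permutations, 0 on repeats."""
    if len(set(p)) < len(p):
        return 0
    inversions = sum(1 for i in range(len(p))
                     for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inversions % 2 else 1

def det(m):
    """Determinant from its definition as a signed sum over permutations."""
    n = len(m)
    return sum(perm_sign(p) * math.prod(m[i][p[i]] for i in range(n))
               for p in permutations(range(n)))

def inverse(m):
    """Inverse matrix via cofactors (exact for integer entries)."""
    n, d = len(m), det(m)
    def minor(i, j):
        return [[m[r][c] for c in range(n) if c != j]
                for r in range(n) if r != i]
    return [[Fraction((-1) ** (i + j) * det(minor(j, i)), d) for j in range(n)]
            for i in range(n)]

# Assumed example data: eta has s = 1 negative entry; E[mu][a] = (e_mu)^a.
eta = [[-1, 0, 0], [0, 1, 0], [0, 0, 1]]
E = [[2, 1, 0], [0, 1, 2], [0, 0, 1]]
n, s = 3, 1
g = [[sum(eta[a][b] * E[mu][a] * E[nu][b] for a in range(n) for b in range(n))
      for nu in range(n)] for mu in range(n)]
g_inv = inverse(g)
sqrt_abs_g = math.sqrt(abs(det(g)))

def eps(idx):
    """Components of the associated volume element, Eq. (5.4.4), with the + sign."""
    return sqrt_abs_g * perm_sign(idx)

# eps^{mu1...mun} eps_{mu1...mun}, with indices raised by g^{mu nu}
contraction = sum(
    math.prod(g_inv[mu[i]][nu[i]] for i in range(n)) * eps(nu) * eps(mu)
    for mu in product(range(n), repeat=n)
    for nu in product(range(n), repeat=n))
```

Here g = −4, so $\sqrt{|g|} = 2$, and `contraction` evaluates to −6, i.e. $(-1)^s n!$ with s = 1 and n = 3.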
Theorem 5.4.2 Suppose ∇a and ε are respectively the derivative operator and the
volume element associated with the metric, then
Proof It follows from $\nabla_b g_{ac} = 0$ and (5.4.3) that $\varepsilon^{a_1 \cdots a_n} \nabla_b \varepsilon_{a_1 \cdots a_n} = 0$, and thus for
any vector field $v^b$ we have
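$\delta^{[a_2}{}_{a_2} \delta^{a_3}{}_{b_3} \cdots \delta^{a_n]}{}_{b_n} = \frac{2}{n - 1}\, \delta^{[a_3}{}_{b_3} \cdots \delta^{a_n]}{}_{b_n}$,
and carrying over to the general case,
$\delta^{[a_j}{}_{a_j} \delta^{a_{j+1}}{}_{b_{j+1}} \cdots \delta^{a_n]}{}_{b_n} = \frac{j}{n - (j - 1)}\, \delta^{[a_{j+1}}{}_{b_{j+1}} \cdots \delta^{a_n]}{}_{b_n}$.
Therefore, it can be proved that
$\delta^{[a_1}{}_{a_1} \cdots \delta^{a_j}{}_{a_j} \delta^{a_{j+1}}{}_{b_{j+1}} \cdots \delta^{a_n]}{}_{b_n} = \frac{1}{n} \frac{2}{n - 1} \frac{3}{n - 2} \cdots \frac{j}{n - j + 1}\, \delta^{[a_{j+1}}{}_{b_{j+1}} \cdots \delta^{a_n]}{}_{b_n}$
$\quad = \frac{(n - j)!\, j!}{n!}\, \delta^{[a_{j+1}}{}_{b_{j+1}} \cdots \delta^{a_n]}{}_{b_n}$.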
Theorem 5.4.4
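Proof $\varepsilon^{a_1 \cdots a_n} \varepsilon_{b_1 \cdots b_n} = \varepsilon^{[a_1 \cdots a_n]} \varepsilon_{[b_1 \cdots b_n]}$ indicates that all the upper indices and all the
lower indices of $\varepsilon^{a_1 \cdots a_n} \varepsilon_{b_1 \cdots b_n}$ are antisymmetric. It is not difficult to prove that the
collection of all tensors of type (n, n) satisfying this condition is a 1-dimensional
vector space, and since $\delta^{[a_1}{}_{b_1} \cdots \delta^{a_n]}{}_{b_n}$ belongs to this collection (it is not difficult to
show that $\delta^{[a_1}{}_{b_1} \cdots \delta^{a_n]}{}_{b_n} = \delta^{[a_1}{}_{[b_1} \cdots \delta^{a_n]}{}_{b_n]}$), any two tensors in this collection can only
differ by a multiplicative factor. Thus, $\varepsilon^{a_1 \cdots a_n} \varepsilon_{b_1 \cdots b_n} = K \delta^{[a_1}{}_{b_1} \cdots \delta^{a_n]}{}_{b_n}$. Contracting
with $\varepsilon_{a_1 \cdots a_n} \varepsilon^{b_1 \cdots b_n}$, the left-hand side yields $(-1)^s n! (-1)^s n!$, and the right-hand side
yields $K \varepsilon_{b_1 \cdots b_n} \varepsilon^{b_1 \cdots b_n} = K (-1)^s n!$, and hence $K = (-1)^s n!$, which gives (5.4.9).
Contracting the first j upper and lower indices on both sides gives
$\varepsilon^{a_1 \cdots a_j a_{j+1} \cdots a_n} \varepsilon_{a_1 \cdots a_j b_{j+1} \cdots b_n} = (-1)^s n!\, \delta^{[a_1}{}_{a_1} \cdots \delta^{a_j}{}_{a_j} \delta^{a_{j+1}}{}_{b_{j+1}} \cdots \delta^{a_n]}{}_{b_n}$
$\quad = (-1)^s (n - j)!\, j!\, \delta^{[a_{j+1}}{}_{b_{j+1}} \cdots \delta^{a_n]}{}_{b_n}$.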
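For n = 3 and a positive definite metric (s = 0), the last formula reduces to familiar identities for the Levi-Civita symbol, which can be verified by exhaustive enumeration. The sketch below assumes an orthonormal basis of 3-dimensional Euclidean space, so that upper and lower components coincide; the function names are illustrative.

```python
from itertools import product

def levi_civita(*idx):
    """Components of the volume element in a right-handed orthonormal basis."""
    if len(set(idx)) < len(idx):
        return 0
    inversions = sum(1 for i in range(len(idx))
                     for j in range(i + 1, len(idx)) if idx[i] > idx[j])
    return -1 if inversions % 2 else 1

def delta(a, b):
    return 1 if a == b else 0

N = range(3)

def check_j1():
    """j = 1: eps^{abc} eps_{ade} = (n-j)! j! delta^{[b}_d delta^{c]}_e
    = delta^b_d delta^c_e - delta^b_e delta^c_d."""
    return all(
        sum(levi_civita(a, b, c) * levi_civita(a, d, e) for a in N)
        == delta(b, d) * delta(c, e) - delta(b, e) * delta(c, d)
        for b, c, d, e in product(N, repeat=4))

def check_j2():
    """j = 2: eps^{abc} eps_{abd} = 2 delta^c_d."""
    return all(
        sum(levi_civita(a, b, c) * levi_civita(a, b, d) for a in N for b in N)
        == 2 * delta(c, d)
        for c, d in product(N, repeat=2))

# j = 3: the full contraction eps^{abc} eps_{abc} = 3!
full_contraction = sum(levi_civita(a, b, c) ** 2 for a, b, c in product(N, repeat=3))
```

Both checks return `True` and `full_contraction` equals 6, matching the factors $(n - j)!\, j!$ = 2, 2, and 6 for j = 1, 2, 3.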
From Definition 1 we see that the integral of a function depends on the choice
of a volume element. As long as a metric is given on the manifold, we
stipulate that the integral of a function is always defined using the associated volume
element. In this way, for an oriented manifold with a metric, the integral of
a given function is determined. Take the 3-dimensional Euclidean space $(\mathbb{R}^3, \delta_{ab})$
as an example. Suppose {x, y, z} is a right-handed Cartesian coordinate system,
then ε = dx ∧ dy ∧ dz is an associated volume element, and hence the integral of
a function $f : \mathbb{R}^3 \to \mathbb{R}$ on $(\mathbb{R}^3, \delta_{ab})$ is, by definition, $\int_{\mathbb{R}^3} f = \int_{\mathbb{R}^3} f \varepsilon$. The right-hand
side is nothing but an integral of a 3-form field ω ≡ f ε, and according
to its definition (Definition 4 of Sect. 5.2), one should express ω in the form of
(5.2.1) using the dual basis of the right-handed system. Let F(x, y, z) be the function
of 3 variables coming from combining f with the Cartesian system {x, y, z},
then
$\omega = F(x, y, z)\, dx \wedge dy \wedge dz$.
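If you like, you can also compute it using the (right-handed) spherical coordinate
system {r, θ, ϕ}. It follows from the line element $\mathrm{d}s^2 = \mathrm{d}r^2 + r^2(\mathrm{d}\theta^2 + \sin^2\theta\,\mathrm{d}\varphi^2)$
that $g = r^4\sin^2\theta$, and thus from (5.4.4) we know that $\varepsilon = r^2\sin\theta\,\mathrm{d}r\wedge\mathrm{d}\theta\wedge\mathrm{d}\varphi$.
Therefore, (5.2.1) in the present case is $\omega \equiv f\varepsilon = \hat F(r,\theta,\varphi)\,r^2\sin\theta\,\mathrm{d}r\wedge\mathrm{d}\theta\wedge\mathrm{d}\varphi$
[where $\hat F(r,\theta,\varphi)$ comes from combining f with {r, θ, ϕ}]. Hence,
\[
\int f = \int f\varepsilon = \int\omega = \int \hat F(r,\theta,\varphi)\,r^2\sin\theta\,\mathrm{d}r\wedge\mathrm{d}\theta\wedge\mathrm{d}\varphi\,.
\]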
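As a numerical illustration that the two coordinate expressions of the volume element give the same integral, here is a small sketch using a midpoint rule. The integrand $z^2 e^{-(x^2+y^2+z^2)}$, the cutoffs and the grid sizes are assumptions chosen only for this example; its exact integral over $\mathbb{R}^3$ is $\pi^{3/2}/2$.

```python
import math


def f_cartesian(x, y, z):
    """Sample integrand F(x, y, z) = z^2 exp(-(x^2 + y^2 + z^2))."""
    return z * z * math.exp(-(x * x + y * y + z * z))


def f_spherical(r, theta, phi):
    """The same function expressed in {r, theta, phi}: F-hat(r, theta, phi)."""
    return (r * math.cos(theta)) ** 2 * math.exp(-r * r)


def midpoints(lo, hi, count):
    """Midpoints of `count` equal cells on [lo, hi], plus the cell width."""
    h = (hi - lo) / count
    return [lo + (i + 0.5) * h for i in range(count)], h


def integrate_cartesian(cutoff=5.0, count=40):
    pts, h = midpoints(-cutoff, cutoff, count)
    total = sum(f_cartesian(x, y, z) for x in pts for y in pts for z in pts)
    return total * h ** 3          # epsilon = dx ^ dy ^ dz, sqrt(g) = 1


def integrate_spherical(cutoff=6.0, n_r=200, n_theta=100, n_phi=8):
    rs, hr = midpoints(0.0, cutoff, n_r)
    thetas, ht = midpoints(0.0, math.pi, n_theta)
    phis, hp = midpoints(0.0, 2.0 * math.pi, n_phi)
    total = 0.0
    for r in rs:
        for theta in thetas:
            sqrt_g = r * r * math.sin(theta)   # from g = r^4 sin^2(theta)
            for phi in phis:
                total += f_spherical(r, theta, phi) * sqrt_g
    return total * hr * ht * hp


exact = math.pi ** 1.5 / 2.0
print(integrate_cartesian(), integrate_spherical(), exact)
```

The only difference between the two routines is the factor $\sqrt{g}$: it equals 1 in the Cartesian system and $r^2\sin\theta$ in the spherical system.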
Now we will introduce the general form of Gauss’s theorem. The form of Gauss’s
theorem that is familiar to readers is
\[
\int_V (\vec\nabla\cdot\vec A)\,\mathrm{d}V = \int_S \vec A\cdot\vec n\,\mathrm{d}S\,. \tag{5.5.2}
\]
Respectively, the two sides of the above equation can be colloquially described as “the
integral of the product of the function ∇ · A and the volume element dV ” and “the
integral of the product of the function A · n and the area element (2-dimensional
volume element) dS”. Now we will show in two steps that the Stokes theorem
(5.3.1) leads to a formula which includes (5.5.2) as a special case. The first step
is to derive Theorem 5.5.1, the left-hand side of which can be seen as a generaliza-
tion of (5.5.2).
Remark 1 The left-hand side of the equation above can be seen as a generalization
of the left-hand side of (5.5.2).
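Proof The exterior derivative of the (n − 1)-form field $\omega_{a_1\cdots a_{n-1}} \equiv v^b\varepsilon_{ba_1\cdots a_{n-1}}$ is the
n-form field $(\mathrm{d}\omega)_{ca_1\cdots a_{n-1}} = n\nabla_{[c}(v^b\varepsilon_{|b|a_1\cdots a_{n-1}]})$, in which $\nabla_c$ can be any torsion-free
derivative operator. The collection of all n-forms at any point in N is a 1-dimensional
vector space. Hence, the two n-forms dω and ε only differ by a multiplicative factor, i.e.,
\[
n\nabla_{[c}(v^b\varepsilon_{|b|a_1\cdots a_{n-1}]}) = h\,\varepsilon_{ca_1\cdots a_{n-1}}\,,
\]
where h is a function on N that can be found as follows: contracting both sides with
$\varepsilon^{ca_1\cdots a_{n-1}}$, the right-hand side yields $(-1)^s h\,n!$, and the left-hand side yields
\[
n\varepsilon^{ca_1\cdots a_{n-1}}\nabla_{[c}(v^b\varepsilon_{|b|a_1\cdots a_{n-1}]})
 = n\varepsilon^{[ca_1\cdots a_{n-1}]}\nabla_c(v^b\varepsilon_{ba_1\cdots a_{n-1}})
 = n\varepsilon^{ca_1\cdots a_{n-1}}\varepsilon_{ba_1\cdots a_{n-1}}\nabla_c v^b
 = n(-1)^s(n-1)!\,\delta^c{}_b\nabla_c v^b = (-1)^s n!\,\nabla_b v^b\,,
\]
where $\nabla_c$ has been chosen as the derivative operator associated with the metric, so that $\nabla_c\varepsilon_{ba_1\cdots a_{n-1}} = 0$. Comparing the two sides gives $h = \nabla_b v^b$.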
Now we go one step further and rewrite the right-hand side of (5.5.3) into a form
like the right-hand side of (5.5.2). Since the latter involves the volume element dS on
the boundary S, let us start with the volume element of ∂N. Here we only talk about the
case where ∂N is not a null hypersurface, and thus we can talk about the normalized
normal vector $n^a$ of ∂N that satisfies $n^a n_a = \pm 1$ (see Sect. 4.4). The metric induced
on ∂N by $g_{ab}$ is $h_{ab} = g_{ab} \mp n_a n_b$ [see (4.4.2)]. Regarding ∂N as an
(n − 1)-dimensional manifold with the metric $h_{ab}$, its volume element (denoted by
$\hat\varepsilon_{a_1\cdots a_{n-1}}$) should satisfy two conditions: ① compatible with the induced orientation
of ∂N (denoted by $\bar\varepsilon_{a_1\cdots a_{n-1}}$, see Remark 1 of Sect. 5.3); ② associated with $h_{ab}$, i.e.,
\[
\hat\varepsilon^{a_1\cdots a_{n-1}}\hat\varepsilon_{a_1\cdots a_{n-1}} = (-1)^{\hat s}(n-1)!\,, \tag{5.5.5}
\]
where $\hat\varepsilon^{a_1\cdots a_{n-1}}$ is the result of raising the indices of $\hat\varepsilon_{a_1\cdots a_{n-1}}$ using $h^{ab}$, and $\hat s$ is the
number of negative diagonal elements of $h_{ab}$. The volume element
$\hat\varepsilon_{a_1\cdots a_{n-1}}$ on ∂N that satisfies these two conditions is called the induced volume
element. Suppose $n^b$ is the outgoing unit normal vector of ∂N [with i(N) being the
interior, there is a clear meaning for “outgoing”]; then the induced volume element
$\hat\varepsilon_{a_1\cdots a_{n-1}}$ and the volume element $\varepsilon_{ba_1\cdots a_{n-1}}$ on N have the following relation (for a
proof, see Optional Reading 5.5.1):
\[
\hat\varepsilon_{a_1\cdots a_{n-1}} = n^b\varepsilon_{ba_1\cdots a_{n-1}}\,. \tag{5.5.6}
\]
From the spirit of Remark 1 in Sect. 5.3 [see Wald (1984) p. 431 for details] we know that
$(e^2\wedge\cdots\wedge e^n)_{a_2\cdots a_n}$ serves as the induced orientation $\bar\varepsilon_{a_2\cdots a_n}$ at q ∈ ∂N, and hence
$\varepsilon_{a_1\cdots a_n} = \pm n_{a_1}\wedge\bar\varepsilon_{a_2\cdots a_n}$, also written as $\varepsilon_{ba_1\cdots a_{n-1}} = \pm n_b\wedge\bar\varepsilon_{a_1\cdots a_{n-1}}$.
Using this, one can easily show that $\bar\varepsilon_{a_1\cdots a_{n-1}} = n^b\varepsilon_{ba_1\cdots a_{n-1}}$, and then it follows from (5.5.6)
that $\hat\varepsilon_{a_1\cdots a_{n-1}} = +1\cdot\bar\varepsilon_{a_1\cdots a_{n-1}}$. Thus, $\hat\varepsilon_{a_1\cdots a_{n-1}}$ is compatible with the induced orientation
$\bar\varepsilon_{a_1\cdots a_{n-1}}$, i.e., condition ① is satisfied. As an exercise (Exercise 5.10), the reader should
verify that $\hat\varepsilon_{a_1\cdots a_{n-1}} = n^b\varepsilon_{ba_1\cdots a_{n-1}}$ also satisfies condition ②, i.e., (5.5.5). Note that condition
② can only determine $\hat\varepsilon_{a_1\cdots a_{n-1}}$ up to a sign [i.e., $\hat\varepsilon_{a_1\cdots a_{n-1}} = -n^b\varepsilon_{ba_1\cdots a_{n-1}}$ also
satisfies (5.5.5)]. Only when taken together with condition ① can $\hat\varepsilon_{a_1\cdots a_{n-1}}$ be determined as
$n^b\varepsilon_{ba_1\cdots a_{n-1}}$.
[The End of Optional Reading 5.5.1]
The theorem below is the general version of Gauss’s theorem that contains (5.5.2)
as a special case.
Theorem 5.5.2 (Gauss’s Theorem) Suppose M is an n-dimensional oriented manifold,
N is an n-dimensional compact submanifold with boundary in M, $g_{ab}$ is a
metric on M, ε and $\nabla_a$ are, respectively, the associated volume element and the
associated derivative operator, $\hat\varepsilon$ is the induced volume element on ∂N, $n^a$ is the
outgoing normal vector of ∂N satisfying $n^a n_a = \pm 1$, and $v^a$ is a $C^1$ vector field on
M. Then,
\[
\int_{i(N)} (\nabla_a v^a)\,\varepsilon = \pm\int_{\partial N} v^a n_a\,\hat\varepsilon\,.
\quad (+\ \text{for}\ n^a n_a = +1,\ -\ \text{for}\ n^a n_a = -1.) \tag{5.5.7}
\]
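Proof
From Theorem 5.5.1 we know that all we have to prove is $\int_{\partial N} v^b\varepsilon_{ba_1\cdots a_{n-1}} = \pm\int_{\partial N} v^a n_a\hat\varepsilon$. Let
$\omega_{a_1\cdots a_{n-1}} = v^b\varepsilon_{ba_1\cdots a_{n-1}}$. Noticing the discussion at the end of
Sect. 5.2 about $\int_{\phi[S]}\omega \equiv \int_{\phi[S]}\tilde\omega$, we can see that here $\int_{\partial N} v^b\varepsilon_{ba_1\cdots a_{n-1}}$ is $\int_{\partial N}\tilde\omega$. Hence,
all we have to prove is that
\[
\tilde\omega_{a_1\cdots a_{n-1}} = \pm v^b n_b\,\hat\varepsilon_{a_1\cdots a_{n-1}}\,, \tag{5.5.8}
\]
where $n^a$ is the outgoing unit normal vector of ∂N. Both sides of the above equation
are (n − 1)-forms on $W_q$, and hence there exists a K such that
\[
\tilde\omega_{a_1\cdots a_{n-1}} = K v^b n_b\,\hat\varepsilon_{a_1\cdots a_{n-1}}\,, \tag{5.5.9}
\]
and thus all we have to prove is that K = ±1. Suppose $\{(e_0)^a = n^a, (e_1)^a, \ldots, (e_{n-1})^a\}$
is a right-handed orthonormal basis of $V_q$. Contracting $(e_1)^{a_1}\cdots(e_{n-1})^{a_{n-1}}$
with the right-hand side of (5.5.9) yields
\[
K v^b n_b\,\hat\varepsilon_{a_1\cdots a_{n-1}}(e_1)^{a_1}\cdots(e_{n-1})^{a_{n-1}}
 = \pm K v^b (e^0)_b\,\hat\varepsilon_{12\cdots(n-1)} = \pm K v^0\,, \tag{5.5.10}
\]
where we used $n_b = \pm(e^0)_b$ in the first equality; in the second equality we used
the following fact: it can be shown from the definition of the induced orientation
$\bar\varepsilon$ that the right-handedness of $\{(e_0)^a = n^a, (e_1)^a, \ldots, (e_{n-1})^a\}$ (measured by the
orientation ε) assures the right-handedness of $\{(e_1)^a, \ldots, (e_{n-1})^a\}$ (measured by $\bar\varepsilon$),
and thus $\hat\varepsilon_{12\cdots(n-1)} = 1$. On the other hand, the left-hand side of (5.5.9) after the
contraction yields
\[
\tilde\omega_{a_1\cdots a_{n-1}}(e_1)^{a_1}\cdots(e_{n-1})^{a_{n-1}}
 = \omega_{a_1\cdots a_{n-1}}(e_1)^{a_1}\cdots(e_{n-1})^{a_{n-1}}
 = v^b\varepsilon_{ba_1\cdots a_{n-1}}(e_1)^{a_1}\cdots(e_{n-1})^{a_{n-1}}
 = v^\mu\varepsilon_{\mu 12\cdots(n-1)} = v^0\varepsilon_{012\cdots(n-1)} = v^0\,, \tag{5.5.11}
\]
where (5.2.7) is used in the first equality, and the right-handedness of $\{(e_0)^a = n^a, (e_1)^a, \ldots, (e_{n-1})^a\}$
is used in the fifth equality. Comparing (5.5.10) and (5.5.11) we
obtain K = ±1.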
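For the familiar Euclidean special case (5.5.2), both sides of Gauss’s theorem can be compared numerically. The sketch below assumes N is the unit ball in $(\mathbb{R}^3, \delta_{ab})$ and uses the sample field $v^a = (x^3, y^3, z^3)$; the field, the grid sizes and the function names are choices made for this illustration only. Both sides should approach $12\pi/5$.

```python
import math


def midpoints(lo, hi, count):
    """Midpoints of `count` equal cells on [lo, hi], plus the cell width."""
    h = (hi - lo) / count
    return [lo + (i + 0.5) * h for i in range(count)], h


def to_cartesian(r, theta, phi):
    st = math.sin(theta)
    return r * st * math.cos(phi), r * st * math.sin(phi), r * math.cos(theta)


def v(x, y, z):
    """Sample vector field v^a = (x^3, y^3, z^3)."""
    return x ** 3, y ** 3, z ** 3


def div_v(x, y, z):
    """Its divergence, computed by hand: 3 (x^2 + y^2 + z^2)."""
    return 3.0 * (x * x + y * y + z * z)


def volume_side(radius=1.0, n_r=40, n_theta=80, n_phi=80):
    """Integral of (div v) times the volume element over the ball."""
    rs, hr = midpoints(0.0, radius, n_r)
    thetas, ht = midpoints(0.0, math.pi, n_theta)
    phis, hp = midpoints(0.0, 2.0 * math.pi, n_phi)
    total = 0.0
    for r in rs:
        for theta in thetas:
            weight = r * r * math.sin(theta)       # component of epsilon
            for phi in phis:
                total += div_v(*to_cartesian(r, theta, phi)) * weight
    return total * hr * ht * hp


def boundary_side(radius=1.0, n_theta=80, n_phi=80):
    """Integral of v^a n_a times the induced volume element over the sphere."""
    thetas, ht = midpoints(0.0, math.pi, n_theta)
    phis, hp = midpoints(0.0, 2.0 * math.pi, n_phi)
    total = 0.0
    for theta in thetas:
        weight = radius ** 2 * math.sin(theta)     # induced volume element
        for phi in phis:
            x, y, z = to_cartesian(radius, theta, phi)
            n = (x / radius, y / radius, z / radius)   # outgoing unit normal
            vx, vy, vz = v(x, y, z)
            total += (vx * n[0] + vy * n[1] + vz * n[2]) * weight
    return total * ht * hp


print(volume_side(), boundary_side(), 12.0 * math.pi / 5.0)
```

Since the metric is positive definite, $n^a n_a = +1$ and the sign in (5.5.7) is +.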
Remark 2 One of the conditions for (5.5.7) is that $n^a$ is the outgoing unit normal
vector of ∂N. If we change the stipulation to “$n^a$ is outgoing when $n^a n_a = +1$, $n^a$ is
ingoing [pointing towards i(N)] when $n^a n_a = -1$”, then the ± sign on the right-hand
side of (5.5.7) disappears, and Gauss’s theorem turns into the following form:
\[
\int_{i(N)} (\nabla_a v^a)\,\varepsilon = \int_{\partial N} v^a n_a\,\hat\varepsilon\,. \tag{5.5.7′}
\]
\[
\frac{1}{n}\,\varepsilon_{a_1\cdots a_n} = n_{[a_1}\hat\varepsilon_{a_2\cdots a_n]}\,.
\]
If M is an oriented manifold with a metric $g_{ab}$ and ε is the associated volume element,
then we can define an isomorphism between $\Lambda_M(l)$ and $\Lambda_M(n-l)$ using ε and $g_{ab}$
as follows:
\[
{}^*\omega_{a_1\cdots a_{n-l}} := \frac{1}{l!}\,\omega^{b_1\cdots b_l}\varepsilon_{b_1\cdots b_l a_1\cdots a_{n-l}}\,, \tag{5.6.1}
\]
where
\[
\omega^{b_1\cdots b_l} = g^{b_1c_1}\cdots g^{b_lc_l}\,\omega_{c_1\cdots c_l}\,.
\]
Remark 1 The ∗ operator we defined above is called the Hodge star, and ${}^*\omega$ is also
called the Hodge dual of the form ω. It is not difficult to see that: ① ${}^*: \Lambda_M(l)\to\Lambda_M(n-l)$
is an isomorphism; ② for a 0-form field $f\in\mathcal{F}_M$, its dual form field by
definition is
\[
{}^*f_{a_1\cdots a_n} = \frac{1}{0!}\,f\varepsilon_{a_1\cdots a_n} = f\varepsilon_{a_1\cdots a_n}\,,
\]
i.e., ${}^*f$ equals f times the volume element ε associated with the metric. Therefore,
one can say that the integral of a function f is defined as the integral of its dual form
field. Applying ∗ to the above equation again we have
\[
{}^*({}^*f) = {}^*(f\varepsilon) = \frac{1}{n!}\,f\varepsilon^{b_1\cdots b_n}\varepsilon_{b_1\cdots b_n} = (-1)^s f\,.
\]
[Equation (5.4.3) is used in the third equality.] This result can be generalized into
the following theorem:
Theorem 5.6.1
\[
{}^{**}\omega = (-1)^{s+l(n-l)}\,\omega\,. \tag{5.6.2}
\]
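Theorem 5.6.1 can be spot-checked in components. The sketch below implements definition (5.6.1) literally, assuming a right-handed orthonormal basis with a diagonal metric of entries ±1, and applies it twice to a randomly generated l-form; the function names and the random test form are illustrative assumptions, not part of the text.

```python
from fractions import Fraction
from itertools import combinations, permutations, product
from math import factorial
import random


def perm_sign(p):
    """Sign of a sequence of distinct integers, via its inversion count."""
    sign = 1
    for i in range(len(p)):
        for k in range(i + 1, len(p)):
            if p[i] > p[k]:
                sign = -sign
    return sign


def levi_civita(idx):
    """Component of epsilon in a right-handed orthonormal basis."""
    return 0 if len(set(idx)) < len(idx) else perm_sign(idx)


def random_form(n, l, rng):
    """Random l-form: a dict over all index tuples, totally antisymmetric."""
    comps = {idx: 0 for idx in product(range(n), repeat=l)}
    for base in combinations(range(n), l):
        value = rng.randint(-9, 9)
        for p in permutations(base):
            comps[p] = perm_sign(p) * value
    return comps


def hodge(omega, l, signs):
    """(*omega)_a = (1/l!) omega^b eps_{b a}, indices raised with g = diag(signs)."""
    n = len(signs)
    result = {}
    for a in product(range(n), repeat=n - l):
        total = 0
        for b in product(range(n), repeat=l):
            raised = omega[b]
            for mu in b:
                raised *= signs[mu]
            total += raised * levi_civita(b + a)
        result[a] = Fraction(total, factorial(l))
    return result


def check_double_dual(signs, l, seed=0):
    """True if **omega equals (-1)^(s + l(n-l)) omega for a random l-form."""
    n = len(signs)
    s = sum(1 for g in signs if g < 0)
    omega = random_form(n, l, random.Random(seed))
    twice = hodge(hodge(omega, l, signs), n - l, signs)
    sign = (-1) ** (s + l * (n - l))
    return all(twice[idx] == sign * omega[idx] for idx in omega)


print(all(check_double_dual((1, 1, 1), l) for l in range(4)))
print(all(check_double_dual((-1, 1, 1, 1), l) for l in range(5)))
```

A check on one random form per degree is of course not a proof; it only confirms that the sign factor in (5.6.2) is consistent with definition (5.6.1) for these signatures.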
Now, from the differential geometry point of view, let us revisit the vector algebra and
vector field theory on 3-dimensional Euclidean space $(\mathbb{R}^3, \delta_{ab})$ that we are already
familiar with (where M is $\mathbb{R}^3$).
(1) Why have we never heard of 1-, 2- and 3-form fields before? First, using
the Euclidean metric $\delta_{ab}$, one can turn a dual vector field $\omega_a$ into a vector field
$\omega^a = \delta^{ab}\omega_b$, which eliminates the need to use a 1-form field. Later on, we will
not distinguish the upper and lower indices strictly when we are dealing only with
$(\mathbb{R}^3, \delta_{ab})$. Second, since n = 3, $\Lambda_M(2)$ and $\Lambda_M(1)$ have the same dimension, and
$\omega\in\Lambda_M(2)$ and ${}^*\omega\in\Lambda_M(1)$ can be identified using the isomorphism ${}^*: \Lambda_M(2)\to\Lambda_M(1)$,
which eliminates the need to use a 2-form field. Similarly, $\Lambda_M(3)$ and $\Lambda_M(0)$
have the same dimension, and using the isomorphism ${}^*: \Lambda_M(3)\to\Lambda_M(0)$ one can
identify $\omega\in\Lambda_M(3)$ and ${}^*\omega\in\Lambda_M(0)$, the latter of which is exactly a function on
$\mathbb{R}^3$ (a 0-form field). Therefore, any differential form on the 3-dimensional Euclidean
space can be represented by a function or a vector field.
(2) Now we discuss the dot product and cross product operations of the vector
algebra. Denote the vectors $\vec A$ and $\vec B$ as $A^a$ and $B^a$, respectively. Naturally, the dot
then
\[
{}^*\omega_c = \frac{1}{2}\,\omega^{ab}\varepsilon_{abc} = \varepsilon_{abc}A^{[a}B^{b]} = \varepsilon_{abc}A^aB^b\,, \tag{5.6.3}
\]
where $\varepsilon_{abc}$ is the volume element associated with the Euclidean metric. Suppose
{x, y, z} is a right-handed Cartesian coordinate system; then its coordinate basis is
orthonormal. It follows from (5.4.2) that the nonzero components $\varepsilon_{ijk}$ of $\varepsilon_{abc}$ in this
system are
\[
\varepsilon_{123} = \varepsilon_{312} = \varepsilon_{231} = -\varepsilon_{132} = -\varepsilon_{321} = -\varepsilon_{213} = 1\,,
\]
and thus $\varepsilon_{ijk}$ is the familiar Levi-Civita symbol. Therefore, the kth component of ${}^*\omega_c$
in this Cartesian system is
\[
{}^*\omega_k = \varepsilon_{ijk}A^iB^j\,, \quad k = 1, 2, 3\,. \tag{5.6.4}
\]
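Formula (5.6.4) translates directly into code. In this short sketch the indices run over 0, 1, 2 instead of 1, 2, 3, and the closed-form expression used for the Levi-Civita symbol is a convenience valid only for three indices in that range.

```python
from itertools import product


def levi_civita(i, j, k):
    """Levi-Civita symbol for indices in {0, 1, 2} (0-based version of 1, 2, 3)."""
    return (i - j) * (j - k) * (k - i) // 2


def cross(A, B):
    """k-th component: eps_{ijk} A^i B^j, as in (5.6.4)."""
    return tuple(
        sum(levi_civita(i, j, k) * A[i] * B[j]
            for i, j in product(range(3), repeat=2))
        for k in range(3)
    )


print(cross((1, 0, 0), (0, 1, 0)))
```

With the unit vectors along x and y as input, the result is the unit vector along z, in agreement with the right-handedness of the system.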
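(a) $\vec\nabla f = \partial_a f$ ;
(b) $\vec\nabla\cdot\vec A = \partial_a A^a$ ;
(c) $\vec\nabla\times\vec A = \varepsilon^{abc}\partial_a A_b$ [the derivation is similar to (5.6.3)] ;
(d) $\vec\nabla\cdot(\vec A\vec B) = \partial_a(A^aB^b)$ ;
(e) $\vec\nabla\vec A = \partial^a A^b$ ;
(f) $\nabla^2 f = \partial_a\partial^a f$ ;
(g) $\nabla^2\vec A = \partial_a\partial^a A^b$ . (5.6.5)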
By means of ∂a and the abstract index notation, one can also simplify the derivation
of some useful formulas and make the reasoning clearer. Here we give only two
examples.
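\[
\vec\nabla\cdot(\vec A\times\vec B) = \vec B\cdot(\vec\nabla\times\vec A) - \vec A\cdot(\vec\nabla\times\vec B)\,. \tag{5.6.6}
\]
Proof
\[
\vec\nabla\cdot(\vec A\times\vec B) = \partial_c(\varepsilon^{cab}A_aB_b) = \varepsilon^{cab}(A_a\partial_cB_b + B_b\partial_cA_a)\,, \tag{5.6.7}
\]
while
\[
\vec B\cdot(\vec\nabla\times\vec A) = B_b(\vec\nabla\times\vec A)^b = B_b\varepsilon^{bca}\partial_cA_a = \varepsilon^{cab}B_b\partial_cA_a\,,
\]
\[
-\vec A\cdot(\vec\nabla\times\vec B) = -A_a(\vec\nabla\times\vec B)^a = -A_a\varepsilon^{acb}\partial_cB_b = \varepsilon^{cab}A_a\partial_cB_b\,.
\]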
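Identity (5.6.6) can also be spot-checked numerically with central differences. The two vector fields below are arbitrary smooth examples, and the step size H and the evaluation point are assumptions of this sketch; the curl is coded exactly as in item (c) of (5.6.5).

```python
import math

H = 1e-4  # finite-difference step (an arbitrary small value for this sketch)


def levi_civita(i, j, k):
    """Levi-Civita symbol for indices in {0, 1, 2}."""
    return (i - j) * (j - k) * (k - i) // 2


def A(p):
    x, y, z = p
    return (y * z, x * x, math.sin(z))


def B(p):
    x, y, z = p
    return (math.cos(y), x * z, x + y * y)


def partial(field, c, p):
    """Central-difference estimate of d_c(field) at p, all components at once."""
    fwd, bwd = list(p), list(p)
    fwd[c] += H
    bwd[c] -= H
    return tuple((u - w) / (2 * H) for u, w in zip(field(fwd), field(bwd)))


def cross_field(p):
    """(A x B)^k = eps_{ijk} A^i B^j evaluated at p."""
    a, b = A(p), B(p)
    return tuple(sum(levi_civita(i, j, k) * a[i] * b[j]
                     for i in range(3) for j in range(3)) for k in range(3))


def divergence(field, p):
    return sum(partial(field, c, p)[c] for c in range(3))


def curl(field, p):
    """(curl F)^c = eps^{abc} d_a F_b."""
    d = [partial(field, a, p) for a in range(3)]   # d[a][b] = d_a F_b
    return tuple(sum(levi_civita(a, b, c) * d[a][b]
                     for a in range(3) for b in range(3)) for c in range(3))


def both_sides(p):
    """Left- and right-hand sides of (5.6.6) at the point p."""
    lhs = divergence(cross_field, p)
    curl_a, curl_b = curl(A, p), curl(B, p)
    a, b = A(p), B(p)
    rhs = sum(b[i] * curl_a[i] - a[i] * curl_b[i] for i in range(3))
    return lhs, rhs


print(both_sides((0.3, -1.2, 0.7)))
```

The two printed numbers should agree up to the finite-difference error, which is of order H squared.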
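\[
\vec\nabla(\vec A\cdot\vec B) = (\vec A\cdot\vec\nabla)\vec B + (\vec B\cdot\vec\nabla)\vec A + \vec A\times(\vec\nabla\times\vec B) + \vec B\times(\vec\nabla\times\vec A)\,. \tag{5.6.8}
\]
and similarly,
Hence,
\[
\text{the r.h.s. of (5.6.8)} = A_a\partial^bB^a + B_a\partial^bA^a = \partial^b(A_aB^a) = \vec\nabla(\vec A\cdot\vec B)\,.
\]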
(4) The gradient, curl and divergence in the 3-dimensional Euclidean space can be
simply expressed using the exterior differentiation as follows:
Theorem 5.6.2 Suppose f and A are respectively a function and a vector field on
the 3-dimensional Euclidean space, then
The fact that R3 is a trivial manifold assures that a closed form field on R3 is exact
(see Remark 1 of Sect. 5.1). Combining this with (5.6.9), one can easily prove (Exer-
cise 5.15) the following well-known propositions which are not so straightforward
to prove by the standard vector analysis of 3-dimensional Euclidean space:
(1) a vector with no curl must be a gradient field, i.e.,
There are two major methods for computing the Riemann curvature $R_{abc}{}^d$ of a derivative
operator $\nabla_a$. The first one uses a coordinate basis field; the second one uses a non-coordinate
basis field. In Sect. 3.4.2 we have already introduced the first method, in which the key step
is to find the manifestation of $\nabla_a$ in a coordinate basis field, namely the Christoffel symbol
$\Gamma^\sigma{}_{\mu\tau}$. This section will discuss how to compute $R_{abc}{}^d$ using a non-coordinate basis field.
First, we need to find the manifestation of $\nabla_a$ in this non-coordinate basis field. For a given
derivative operator $\nabla_a$, suppose $\{(e_\mu)^a\}$ is an arbitrary basis field whose domain is U ⊂ M.
The derivative of the μth basis field $(e_\mu)^a$ along the τth basis field $(e_\tau)^a$, i.e., $(e_\tau)^b\nabla_b(e_\mu)^a$,
is also a vector field on U, and thus can be expanded in terms of the basis field $\{(e_\sigma)^a\}$:
The contraction of (5.7.1) and the dual basis $(e^\nu)_a$ gives the explicit expression for $\gamma^\sigma{}_{\mu\tau}$:
\[
\omega_\mu{}^\nu{}_a := -\gamma^\nu{}_{\mu\tau}(e^\tau)_a\,. \tag{5.7.4}
\]
Note that the lower index a of $\omega_\mu{}^\nu{}_a$ is an abstract index, indicating it is a 1-form; μ and
ν are the indices numbering the connection 1-forms. It is easy to derive from the equation
above and (5.7.3) that
Proof
Now we discuss how to calculate the curvature tensor $R_{abc}{}^d$ from $\omega_\mu{}^\nu$. Let
\[
R_\mu{}^\nu = \mathrm{d}\omega_\mu{}^\nu + \omega_\mu{}^\lambda\wedge\omega_\lambda{}^\nu\,. \tag{5.7.8}
\]
Also,
Hence,
Remark 1 As we mentioned, (5.7.6) only holds for torsion-free connections. When torsion
exists, one should add an additional torsion term, and the complete first equation of structure
can be written as (also see Appendix I in Volume III):
\[
T^\nu = \mathrm{d}e^\nu + e^\mu\wedge\omega_\mu{}^\nu\,,
\]
where $T^\nu$ is the torsion 2-form, which relates to the torsion tensor defined in Exercise 3.1
as $T^\nu{}_{ab} \equiv T^c{}_{ab}(e^\nu)_c$. Note that the definition of the exterior differentiation (5.1.11) holds
only without torsion, so the last step in the proof of Theorem 5.7.1 will not hold in this case.
Remark 2 Equation (5.7.8) is equivalent to (3.4.20′); they are the component expressions for
the definition of the curvature (namely the relation between the connection and the curvature)
in a frame and in a coordinate basis, respectively.
When $\omega_\mu{}^\nu$ are already obtained, we can conveniently derive $R_\mu{}^\nu$ using the second equation
of structure; all we have to do is to take the exterior derivative and the wedge
product of $\omega_\mu{}^\nu$. To find all the components $R_{\rho\sigma\mu}{}^\nu$ of $R_{abc}{}^d$ in the chosen frame, all we have
to do is to take the contraction using the following formula:
A frame with $g_{\mu\nu}$ as constants (i.e., $\nabla_a g_{\mu\nu} = 0$) is called a rigid frame. An orthonormal
frame is the simplest rigid frame. For a Lorentzian metric, an orthonormal frame satisfies
$g_{\mu\nu} = \eta_{\mu\nu}$, which brings a huge convenience to the calculations (for details, see the example
at the end of this chapter). There is another kind of rigid frame that is frequently used in
general relativity, the complex null frame, which will be discussed in detail in Sects. 8.7
and 8.8. It is easy to see from (5.7.16) and $\nabla_a g_{\mu\nu} = 0$ that the following relation holds for
rigid frames:
\[
\omega_{\mu\nu a} = (e_\mu)^b\nabla_a(e_\nu)_b\,. \tag{5.7.17}
\]
\[
\omega_{\mu\nu a} = \nabla_a[(e_\mu)^b(e_\nu)_b] - (e_\nu)_b\nabla_a(e_\mu)^b
 = \nabla_a[g_{bc}(e_\mu)^c(e_\nu)^b] - (e_\nu)_b\nabla_a(e_\mu)^b
 = \nabla_a g_{\nu\mu} - (e_\nu)^b\nabla_a(e_\mu)_b = -(e_\nu)^b\nabla_a(e_\mu)_b = -\omega_{\nu\mu a}\,,
\]
where the fourth equality comes from the fact that $\nabla_a g_{\nu\mu} = 0$.
Equation (5.7.18) indicates that, for a rigid frame, the $\omega_{\mu\nu a}$ are antisymmetric with respect
to μ and ν, which reduces the number of the independent connection 1-forms from $n^2$ (where
n is the dimension of M) to n(n − 1)/2 (there are 6 of them when n = 4). In a chosen
basis, the components $\omega_{\mu\nu\rho} \equiv \omega_{\mu\nu a}(e_\rho)^a$ play a similar role in the computation as the
Christoffel symbols $\Gamma^\sigma{}_{\mu\tau}$ in a coordinate basis; there are likewise $n^3$ of the former,
but only $n^2(n-1)/2$ of them are independent (24 when n = 4). Hence, there are
fewer independent $\omega_{\mu\nu\rho}$ than independent $\Gamma^\sigma{}_{\mu\tau}$. [It follows from the symmetry of
its lower indices that there are $n^2(n+1)/2$ independent $\Gamma^\sigma{}_{\mu\tau}$.] The $\omega_{\mu\nu\rho}$ are called the Ricci
rotation coefficients.
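The three counts quoted above can be confirmed by direct enumeration. The sketch below simply counts index combinations subject to the stated symmetries (antisymmetry in the first two indices of $\omega_{\mu\nu\rho}$, symmetry in the lower indices of $\Gamma^\sigma{}_{\mu\tau}$); the function name is an illustrative choice.

```python
from itertools import product


def count_independent(n):
    """Return (connection 1-forms, Ricci rotation coefficients, Christoffel symbols)."""
    idx = range(n)
    # omega_{mu nu a} = -omega_{nu mu a}: one representative per pair mu < nu
    one_forms = sum(1 for mu, nu in product(idx, repeat=2) if mu < nu)
    # omega_{mu nu rho}: antisymmetric in (mu, nu), rho unrestricted
    ricci = sum(1 for mu, nu, rho in product(idx, repeat=3) if mu < nu)
    # Gamma^sigma_{mu tau}: symmetric in (mu, tau), sigma unrestricted
    christoffel = sum(1 for sigma, mu, tau in product(idx, repeat=3) if mu <= tau)
    return one_forms, ricci, christoffel


print(count_independent(4))
```

For n = 4 this reproduces the numbers 6, 24 and 40 that follow from n(n − 1)/2, n²(n − 1)/2 and n²(n + 1)/2.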
The “tetrad method” of computing the curvature tensor includes the following three steps:
(a) choosing a tetrad; (b) computing all the connection 1-forms $\omega_\mu{}^\nu$; (c) using Cartan’s second
equation of structure (5.7.8) to compute all the curvature 2-forms $R_\mu{}^\nu$ from $\omega_\mu{}^\nu$. Among
them, step (b) needs to be further elaborated. Since rigid tetrads are the most commonly
used, here we only introduce the method of computing $\omega_\mu{}^\nu$ using a rigid tetrad. Choosing
an arbitrary coordinate system $\{x^\mu\}$, in which we define
and $(e_\nu)_{\lambda,\tau}$ is an abbreviation for $\partial(e_\nu)_\lambda/\partial x^\tau$. It can be easily seen that $\Lambda_{\mu\nu\rho} = -\Lambda_{\rho\nu\mu}$;
hence, there are only $n^2(n-1)/2$ independent $\Lambda_{\mu\nu\rho}$. After obtaining all the $\Lambda_{\mu\nu\rho}$ using
(5.7.19), one can compute all the $\omega_{\mu\nu\rho}$ using the following theorem.
Theorem 5.7.4
\[
\omega_{\mu\nu\rho} = \frac{1}{2}(\Lambda_{\mu\nu\rho} + \Lambda_{\rho\mu\nu} - \Lambda_{\nu\rho\mu})\,. \tag{5.7.20}
\]
Proof It follows from the torsion-free condition of $\nabla_a$ that the lower indices of the Christoffel
symbol are symmetric, i.e., $\Gamma^\mu{}_{\nu\sigma} = \Gamma^\mu{}_{\sigma\nu}$. Hence,
Equation (5.7.20) is the explicit expression for $\omega_{\mu\nu\rho}$, which is convenient for calculating
$\omega_{\mu\nu\rho}$ directly. However, the drawback is that this formula involves too many equations. If
the metric has some symmetries, it is usually faster to find the $\omega_\mu{}^\nu$ for a rigid tetrad using
Cartan’s first equation of structure (see the method given after the solution of Example 1).
Now we give a specific example of the calculation.
Example 1 Given the expression for the line element of a spacetime metric gab in the
{t, r, θ, ϕ} coordinate system:
Solution (a) Choose an orthonormal tetrad. It follows from (5.7.21) that the coordinate basis
vectors are orthogonal but not normalized; therefore, to make from them an orthonormal
basis, one may choose
(b) Compute $\Lambda_{\mu\nu\rho}$ using (5.7.19). In the calculation we need a coordinate system, and
naturally we choose the given system {t, r, θ, ϕ}. Noticing the antisymmetric relation $\Lambda_{\mu\nu\rho} = -\Lambda_{\rho\nu\mu}$,
one can first find all (six) independent $\Lambda_{\mu 0\rho}$ (namely, $\Lambda_{001}$, $\Lambda_{002}$, $\Lambda_{003}$, $\Lambda_{102}$, $\Lambda_{103}$,
$\Lambda_{203}$), and then find all the independent $\Lambda_{\mu 1\rho}$, ···. Equation (5.7.24) indicates that the only
nonvanishing component of $(e_0)_\lambda$ is $(e_0)_0 = -e^A$, which is only a function of r, and hence
the only nonvanishing term of $(e_0)_{0,\tau}$ is $(e_0)_{0,1} = -A'e^A$ (where ′ stands for the derivative
with respect to r). Thus,
Also, $(e_\mu)^0$ and $(e_\rho)^1$ vanish unless μ = 0 and ρ = 1; hence, the only nonvanishing
$\Lambda_{\mu 0\rho}$ is
Plugging into (5.7.20) yields the nonvanishing $\omega_{\mu\nu\rho}$ (note that $\omega_{\mu\nu\rho} = -\omega_{\nu\mu\rho}$):
(c) Derive the curvature 2-forms using Cartan’s second equation of structure. To compute the
exterior derivatives more conveniently, we rewrite the nonvanishing $\omega_{\mu\nu}$ in terms of the
dual coordinate basis vectors:
It follows from $\omega_\mu{}^\nu = g^{\nu\sigma}\omega_{\mu\sigma} = \eta^{\nu\sigma}\omega_{\mu\sigma}$ that $\omega_0{}^i = \omega_{0i}$, $\omega_i{}^j = \omega_{ij}$ (i, j = 1, 2, 3). Plugging
into (5.7.8), it is not difficult to find
\[
R_1{}^3 = \mathrm{d}\omega_1{}^3 + \omega_1{}^\lambda\wedge\omega_\lambda{}^3 = r^{-1}B'e^{-2B}\,e^1\wedge e^3\,.
\]
\[
0 = -\alpha_1\,e^0\wedge e^1 - e^2\wedge\omega_2{}^1 - e^3\wedge\omega_3{}^1\,.
\]
The last two terms in this equation do not contain $e^0\wedge e^1$, and hence $\alpha_1 = 0$. It seems that one
could guess $\omega_2{}^1 = \omega_3{}^1 = 0$; however, $\omega_2{}^1 = 0$ cannot satisfy (5.7.25c) and $\omega_3{}^1 = 0$ cannot
satisfy (5.7.25d). From (5.7.25c) one can guess that $\omega_1{}^2 = -r^{-1}e^{-B}e^2$, and from (5.7.25d)
one can guess that $\omega_1{}^3 = -r^{-1}e^{-B}e^3$ and $\omega_2{}^3 = -r^{-1}\cot\theta\,e^3$. It can be easily seen that
these guesses also satisfy (5.7.25b) and (5.7.25c). Thus, the solution we just guessed, i.e.,
\[
\omega_1{}^0 = -A'e^{-B}e^0\,, \quad \omega_2{}^0 = \omega_3{}^0 = 0\,,
\]
\[
\omega_1{}^2 = -r^{-1}e^{-B}e^2\,, \quad \omega_1{}^3 = -r^{-1}e^{-B}e^3\,, \quad \omega_2{}^3 = -r^{-1}\cot\theta\,e^3\,,
\]
satisfies Cartan’s equation, and therefore is the correct answer [which is the same as the
result of step (b) in the solution of Example 1].
So far we have introduced two methods for computing the Riemann tensor Rabc d : the
coordinate basis method and the tetrad method (especially the orthonormal tetrad method).
Each of these two methods has advantages and disadvantages; one can choose which one
to use based on the specific problem and one’s own proficiency. Someone might wish that
there is a method that combines the coordinate basis method and the orthonormal tetrad
method, namely wish that there is an orthonormal coordinate basis. However, this is impos-
sible unless gab is a flat metric. The reason is simple: the coordinate basis being orthonormal
indicates that gab = ημν (∂/∂ x μ )a (∂/∂ x ν )b . Suppose ∂a is the ordinary derivative operator
of the coordinate system, then ∂a gbc = 0, and hence ∂a is the derivative operator associated
with gab . Since ∂[a ∂b] ωc = 0 ∀ωc , we see that the Rabc d for gab vanishes, i.e., gab is flat.
Exercises
~5.1. Complete the proof of Theorem 5.1.3 by showing that the 2-forms $(e^1)_a\wedge(e^2)_b$,
$(e^2)_a\wedge(e^3)_b$ and $(e^3)_a\wedge(e^1)_b$ are linearly independent.
~5.2. Suppose V is a vector space and $\{(e^1)_a, (e^2)_a, (e^3)_a, (e^4)_a\}$ is a basis of
$V^*$. Find the expansion of $\omega_a\in\Lambda(1)$, $\omega_{ab}\in\Lambda(2)$, $\omega_{abc}\in\Lambda(3)$ and $\omega_{abcd}\in\Lambda(4)$
in this basis and explain the definition of the coefficients (e.g., $\omega_{12}$).
~5.3. Using mathematical induction, show that $(\omega^1)_{a_1}\wedge\cdots\wedge(\omega^l)_{a_l} = l!\,(\omega^1)_{[a_1}\cdots(\omega^l)_{a_l]}$,
where $(\omega^1)_a, \ldots, (\omega^l)_a$ are arbitrary dual vectors.
˜5.4. Prove Theorem 5.1.4.
~5.5. Suppose ω is a 1-form field and u and v are vector fields. Show that dω(u, v) =
u(ω(v)) − v(ω(u)) − ω([u, v]). The left-hand side represents the result of dω
acting on u and v, i.e., $(\mathrm{d}\omega)_{ab}u^av^b$.
~5.6. Suppose $v^b$ and $\omega_{a_1\cdots a_l}$ are a vector field and an l-form field, respectively, on
a manifold M. Show that
(a) $\mathscr{L}_v\omega_{a_1\cdots a_l} = \mathrm{d}_{a_1}(v^b\omega_{ba_2\cdots a_l}) + (\mathrm{d}\omega)_{ba_1\cdots a_l}v^b$.
NB: Let $\mu_{a_2\cdots a_l} \equiv v^b\omega_{ba_2\cdots a_l}$; then $\mathrm{d}_{a_1}\mu_{a_2\cdots a_l}$ means $(\mathrm{d}\mu)_{a_1a_2\cdots a_l}$.
(b) $\mathscr{L}_v\mathrm{d}\omega = \mathrm{d}\mathscr{L}_v\omega$ (this is actually a very useful identity).
Hints: (1) One can first prove the special case of (a) where l = 2, and
then it is not difficult to generalize it after getting the feeling.
(2) The result of (a) can make the proof of (b) quite simple.
5.7. Suppose O is the coordinate patch of the coordinate system {x μ } on an n-
dimensional manifold M (and O is homeomorphic to Rn ) and that ωa is a
1-form field on O. Show that
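\[
\frac{\partial\omega_\mu}{\partial x^\nu} = \frac{\partial\omega_\nu}{\partial x^\mu} \quad (\mu, \nu = 1, \ldots, n)
\]
\[
\frac{1}{2}\left(F_{ac}F_b{}^c + {}^*F_{ac}\,{}^*F_b{}^c\right) = F_{ac}F_b{}^c - \frac{1}{4}\,g_{ab}F_{cd}F^{cd}\,,
\]
where ${}^*F_{ac} \equiv ({}^*F)_{ac}$, ${}^*F_b{}^c = g^{ac}\,{}^*F_{ba}$ (this identity is helpful for studying
electromagnetic fields).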
*5.10. Show that ε̂a1 ···an−1 ≡ ±n b ε̂ba1 ···an−1 is the volume element on ∂ N associated
with the induced metric field h ab .
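\[
\vec\nabla\times(\vec A\times\vec B) = (\vec B\cdot\vec\nabla)\vec A + (\vec\nabla\cdot\vec B)\vec A - (\vec A\cdot\vec\nabla)\vec B - (\vec\nabla\cdot\vec A)\vec B\,.
\]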
5.15. Using differential forms, prove the following well-known propositions that
are not so easy to prove by the vector analysis of the 3-dimensional Euclidean
space (see the end of Sect. 5.6):
(1) a vector with no curl must be a gradient field;
(2) a vector with no divergence must be a curl field.
5.16. Suppose $\nabla_a$ is the associated derivative operator on a generalized Riemannian
space $(M, g_{ab})$ (i.e., $\nabla_a g_{bc} = 0$), ε is the associated volume element
(i.e., $\nabla_a\varepsilon_{b_1\cdots b_n} = 0$), $v^a$ is a vector field on M, $v_a \equiv g_{ab}v^b$ is the 1-form corresponding
to $v^a$, and ${}^*v$ is the dual form field of $v_a$. Show that $(\nabla_a v^a)\varepsilon = \mathrm{d}\,{}^*v$.
NB: This conclusion can be generalized as follows: suppose $F_{a_1\cdots a_k}$ is a k-form
field (k ≤ n), denoted by F for short, and denote the (k − 1)-form
field $\nabla^{a_k}F_{a_1\cdots a_k}$ as divF; then ${}^*(\mathrm{div}F) = \mathrm{d}\,{}^*F$. The Maxwell equations of an
electromagnetic field (see Sect. 12.6.1) provide an example.
5.17. Show that the $\Gamma^\sigma{}_{\mu\tau}$ defined by (5.7.2) are exactly the components of the
Christoffel symbol defined in Sect. 3.1 with respect to the given coordinate
basis in (5.7.2).
*5.18. Using the orthonormal tetrad method, find all the tetrad components of the
curvature tensors of the metrics in Exercises 14–16 of Chap. 3, and verify
that the results are the same as those of the curvature tensors derived from
the coordinate basis method. To distinguish from the coordinate components
$R_{\mu\nu\sigma}{}^\rho$ of $R_{abc}{}^d$, one may change the notation of the tetrad components to
$R_{(\mu)(\nu)(\sigma)}{}^{(\rho)}$ after obtaining all the tetrad components of $R_{abc}{}^d$.
References
6.1.1 Preliminaries
Physics studies the evolution of physical objects. For the convenience of study, people
usually use physical models to describe physical objects. Models are the idealized
version of objects, such as point masses, point charges, charged surfaces, etc.
Now let us introduce a few fundamental concepts that will later be frequently
encountered using the language of models.
An “event” is supposed to be a very intuitive concept. A bomb explosion, a car
crash, a cough are all events, each of which occupies a certain part of space and
lasts for a certain period of time. The concept of an event in physics, however, is
the modeling of a real event, i.e., we regard every event as happening at a point in
space and an instant in time. No matter what is happening, the combination of a point
in space and an instant in time is called an event. The collection of all the events
is called a spacetime, and thus each event is a spacetime point. According to our
direct measurements of the events which happen on its own world line. In order to
observe any event in the whole spacetime (or in an open subset of it), one needs
to set observers everywhere (like a “patrol”), and these ubiquitous observers form a
reference frame. More precisely, the set R of an infinite number of observers is called
a frame of reference, or a reference frame, if it satisfies the following condition:
any point in spacetime (or in an open subset of spacetime) is passed through by
one and only one observer in R. This abstract definition is actually the specification
and generalization of the often used concept of a reference frame. Take the familiar
example of a moving train. Imagine the train being filled with passengers (observers),
each of whom carries a standard clock and is labeled by three real numbers (the spatial
coordinates). Any event which happens inside the train must happen to an observer,
who can record the spacetime coordinates t, x, y, z of this event (where t is the
reading of the standard clock and x, y, z are the spatial coordinates of the observer).
Although a train has only a limited size (length, width and height), when we talk
about the “train frame”, i.e., the reference frame of the train, as a modeled concept,
we have already assumed that the whole space is filled with observers. To be specific,
each spatial point is occupied by an observer in the train frame; these observers move
along with the train, which means they are motionless with respect to the observers
inside the train. On the other hand, the observers in the “ground frame” also fill up
the whole space, but they have a relative velocity with respect to the observers in
the train frame. If we use vertical lines to represent the world lines of the ground
frame observers in a spacetime diagram, the world lines of the train frame observers
will be parallel oblique lines (the reader should draw a picture). The specification
and generalization of this understanding (allowing two world lines in a frame to be
non-parallel, i.e., allowing the distance between two observers to change with time)
lead us to the preceding definition of a reference frame.
The so-called “geometric formulation” of special relativity actually refers to the con-
struction of a 4-dimensional (rather than 3-dimensional) model using the language
of differential geometry. The conclusions we derive will certainly agree with the
3-dimensional formulation of special relativity. To construct this geometric formu-
lation, the first problem is: what manifold, together with what additional structure,
should we use as the background spacetime? Physically speaking, any event in spe-
cial relativity can be described by the coordinates of an inertial frame. The ranges
for the coordinates t, x, y, z of any inertial frame are all from −∞ to ∞. Suppose
p and q are two neighboring points (see Fig. 6.2), which represent two neighboring
events in physics. According to special relativity, the important physical quantity
that describes the relationship between p and q is the infinitesimal interval, which
can be defined by means of an inertial coordinate system {t, x, y, z} as
\[
\mathrm{d}s^2 = -\mathrm{d}t^2 + \mathrm{d}x^2 + \mathrm{d}y^2 + \mathrm{d}z^2\,. \tag{6.1.1}
\]
[This book adopts the geometrized unit system, in which c = 1 (for details, see
Appendix A).] An important property of an infinitesimal interval is that it preserves
its form when transformed from one inertial frame to another inertial frame, i.e.,
\[
-\mathrm{d}t^2 + \mathrm{d}x^2 + \mathrm{d}y^2 + \mathrm{d}z^2 = -\mathrm{d}t'^2 + \mathrm{d}x'^2 + \mathrm{d}y'^2 + \mathrm{d}z'^2\,.
\]
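The invariance of the interval can be illustrated with a boost along the x-axis. This is a minimal sketch in units with c = 1; since a boost is a linear map, the check is done on a finite coordinate separation between two sample events, both of which are arbitrary choices for this example.

```python
import math


def boost_x(event, v):
    """Lorentz boost along x with speed v (|v| < 1, units with c = 1)."""
    t, x, y, z = event
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return (gamma * (t - v * x), gamma * (x - v * t), y, z)


def interval(p, q):
    """-dt^2 + dx^2 + dy^2 + dz^2 for the coordinate differences of p and q."""
    dt, dx, dy, dz = (b - a for a, b in zip(p, q))
    return -dt * dt + dx * dx + dy * dy + dz * dz


p = (0.0, 0.0, 0.0, 0.0)
q = (2.0, 1.0, 0.5, -0.3)
v = 0.6
print(interval(p, q), interval(boost_x(p, v), boost_x(q, v)))
```

The two printed values agree up to rounding, which is the statement that the interval keeps its form in the boosted inertial coordinates.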
This equation has the same form as (6.1.1), and it preserves this form when trans-
formed from one Lorentzian system to another Lorentzian system. Hence, one can
see that an infinitesimal interval in physics corresponds to a Minkowski line element
in mathematics, an inertial coordinate system in physics corresponds to a Lorentzian
coordinate system in mathematics, and the background spacetime of special relativ-
ity corresponds to the Minkowski space (R4 , ηab ). (Thus, a Minkowski space is also
called a Minkowski spacetime. We may regard Minkowski space as an expression
leaning towards the mathematics side, and Minkowski spacetime as leaning towards
the physics side). Even further, by changing “corresponds to” to “is identical to”, one
can say that the background spacetime of special relativity is Minkowski spacetime.
That is, special relativity is the study regarding the evolution of physical objects in
Minkowski spacetime. Any physical phenomenon happening in Minkowski space-
time belongs to the scope of special relativity.
Using an inertial coordinate system, one can define the speed of any particle.
Suppose L is the world line of a particle, p and q are two neighboring points on L
(see Fig. 6.2), and (t1 , x1 , y1 , z 1 ) and (t2 , x2 , y2 , z 2 ) are the coordinates of p and q
in an inertial frame R. Let
dt ≡ t2 − t1 , dx ≡ x2 − x1 , dy ≡ y2 − y1 , dz ≡ z 2 − z 1 ,
The fundamental postulates of special relativity are: the principle of invariant light
speed; and the special principle of relativity. The latter further contains the following
two aspects.
① Among all observers (i.e., point masses), there exists a special kind of observer,
called inertial observers, which are essentially distinguished from all the other
observers (non-inertial observers); that is, one can choose a special subset from the
collection of all the observers, in which each element is an inertial observer.
② All inertial observers are on an equal footing, i.e., no inertial observer is pre-
ferred over any other; that is, one cannot choose a special element (or several) from
the subset formed by inertial observers. For example, one cannot ask which inertial
observer is at absolute rest.
Now we discuss the mathematical counterpart for an inertial observer. According
to the 3-dimensional formulation of special relativity, the speed of an inertial observer
relative to its own inertial coordinate system {t, x, y, z} is u = 0, and thus its world
line coincides with a t-coordinate line in this system. Suppose ∂a is the ordinary
derivative operator of this system, then ∂a (∂/∂t)b = 0. Hence,
\[
\left(\frac{\partial}{\partial t}\right)^a \partial_a \left(\frac{\partial}{\partial t}\right)^b = 0\,. \tag{6.1.4}
\]
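4 independent Killing vector fields: $(\partial/\partial t)^a$, $(\partial/\partial x)^a$, $(\partial/\partial y)^a$, $(\partial/\partial z)^a$; (b) spatial
rotations, represented by 3 independent Killing vector fields: $-y(\partial/\partial x)^a + x(\partial/\partial y)^a$,
$-z(\partial/\partial y)^a + y(\partial/\partial z)^a$, $-x(\partial/\partial z)^a + z(\partial/\partial x)^a$; and (c) boosts, represented
by 3 independent Killing vector fields: $t(\partial/\partial x)^a + x(\partial/\partial t)^a$, $t(\partial/\partial y)^a + y(\partial/\partial t)^a$,
$t(\partial/\partial z)^a + z(\partial/\partial t)^a$. Now we interpret the physical meaning of these
three types of transformations by providing an example for each of them.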
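That the rotation and boost fields listed above are Killing fields can be verified mechanically. In inertial coordinates the Killing equation reads $\partial_\nu\xi_\mu + \partial_\mu\xi_\nu = 0$, and for a field whose components are linear in the coordinates, $\xi^\mu = M^\mu{}_\nu x^\nu$, this reduces to an algebraic condition on the constant matrix M. The sketch below uses this reduction; the matrix encoding and the function names are assumptions of the example. (The four translation fields have constant components, so they satisfy the equation trivially.)

```python
ETA = (-1, 1, 1, 1)          # diagonal of eta_{mu nu}, coordinates (t, x, y, z)
T, X, Y, Z = range(4)


def linear_field(terms):
    """Matrix M with xi^mu = M[mu][nu] x^nu, built from {(mu, nu): coefficient}."""
    m = [[0] * 4 for _ in range(4)]
    for (mu, nu), coeff in terms.items():
        m[mu][nu] = coeff
    return m


def is_killing(m):
    """Killing equation d_nu xi_mu + d_mu xi_nu = 0 for a linear field."""
    return all(ETA[mu] * m[mu][nu] + ETA[nu] * m[nu][mu] == 0
               for mu in range(4) for nu in range(4))


rotations = [
    linear_field({(X, Y): -1, (Y, X): 1}),   # -y d/dx + x d/dy
    linear_field({(Y, Z): -1, (Z, Y): 1}),   # -z d/dy + y d/dz
    linear_field({(Z, X): -1, (X, Z): 1}),   # -x d/dz + z d/dx
]
boosts = [
    linear_field({(X, T): 1, (T, X): 1}),    # t d/dx + x d/dt
    linear_field({(Y, T): 1, (T, Y): 1}),    # t d/dy + y d/dt
    linear_field({(Z, T): 1, (T, Z): 1}),    # t d/dz + z d/dt
]

print(all(is_killing(m) for m in rotations + boosts))
# A field with the "wrong" relative sign, t d/dx - x d/dt, fails the test:
print(is_killing(linear_field({(X, T): 1, (T, X): -1})))
```

The relative plus sign in the boost fields, as opposed to the minus sign in the rotation fields, is exactly what the −1 in $\eta_{ab}$ requires.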
(a) Without loss of generality, consider time translation. In this case, the coordinate
transformation induced by the one-parameter group of isometries corresponding to
the Killing field $(\partial/\partial t)^a$ is
\[
t' = t + a\,, \quad x' = x\,, \quad y' = y\,, \quad z' = z\,,
\]
where a serves as the parameter for this one-parameter group. Physically, this trans-
formation corresponds to adding a value a to the initial setting of the standard clocks
of all the observers in the inertial frame R. Daylight saving time is an example of it,
where a = 1 (hour).
(b) Consider a rotation in the xy-plane. The coordinate transformation induced
by the one-parameter group of isometries corresponding to the Killing field
$-y(\partial/\partial x)^a + x(\partial/\partial y)^a$ is
The proper time of an observer (a point mass) is the reading of his standard clock.
However, what exactly is a standard clock? We will need to add the following defi-
nition:
Remark 1 If we do not take c = 1, then the right-hand side of the above equation
should be multiplied by 1/c.
Remark 2 One should distinguish two concepts related to clocks—rate and (initial)
setting. A standard clock only has a requirement on its rate (i.e., the difference of
the readings at any two points on the world line equals the arc length), while the
synchronization problem in a reference frame only involves the initial (zero) setting.
Many regions in the world use daylight saving time, which stipulates that clocks be set
“one hour faster” on a certain date each year. The word “faster” may be misunderstood
as raising the rate, but it actually just means changing the setting.
Remark 3 According to Definition 1, the proper time of an observer is equal to the arc
length of its world line. The zero of τ on the world line only depends on the setting,
which is arbitrary when there is only one observer (or a few observers). However,
if we consider a reference frame, then the zero of the proper time of each observer
needs to satisfy a certain kind of requirement. For instance, suppose R is an inertial
reference frame, and G is one of its observers. Let any p0 ∈ G be the zero of
the proper time of G and Σ0 represent the hypersurface passing through p0 that is
orthogonal to the world lines of all the observers; then any observer G′ in R must
choose the intersection of Σ0 and their world line as the zero of their proper time.
This kind of requirement is called clock synchronization in an inertial frame. At
first sight, it seems that this could be realized as follows: Alice (observer G) sets her
clock to zero, denoted by event p0, and simultaneously tells Bob (observer G′) “set
your clock to zero right now.” However, since it takes time for a signal to propagate,
if we use q to represent the event of Bob receiving the notice, then q cannot be on the
hypersurface Σ0. If Bob follows the order, i.e., sets his clock to zero at event q, then
it certainly cannot satisfy the requirement of clock synchronization. Thus, we can
see that clock synchronization is a nontrivial process in relativity. Here we introduce
a method of synchronization. First, Alice should tell Bob beforehand, “take a mirror
with you and zero your clock when you see the light signal I send.” At a point (event)
p1 , Alice would send a light signal to Bob; the light will be reflected when it arrives
at Bob’s mirror (event p′) and Alice will see this reflected light when she is at point
p2 (see Fig. 6.3). To synchronize her clock with that of Bob, Alice just needs to zero
her clock at p0 , namely the midpoint of p1 p2 (measured by the arc length). Note that
in this method we have used the fact that the speed of light does not depend on the
direction (the path of a photon is a null geodesic).
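The synchronization procedure above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming one spatial dimension, c = 1, Alice at x = 0 and Bob at rest at x = d in the same inertial frame; the function name `sync_events` and the numerical values are illustrative choices, not part of the text.

```python
# Radar-style clock synchronization in one spatial dimension, with c = 1.
# Assumption: Alice sits at x = 0 and Bob at x = d in the same inertial frame.

def sync_events(t1, d):
    """Return inertial coordinates (t, x) of the emission, reflection,
    reception and midpoint events of the light signal."""
    p1 = (t1, 0.0)                # Alice emits the light signal
    p_reflect = (t1 + d, d)       # the light reaches Bob's mirror after a time d
    p2 = (t1 + 2.0 * d, 0.0)      # the reflected light returns to Alice
    # Alice's world line is a t-coordinate line, so arc length along it equals
    # the coordinate-time difference; the midpoint of p1 p2 is the average.
    p0 = (0.5 * (p1[0] + p2[0]), 0.0)
    return p1, p_reflect, p2, p0

p1, p_reflect, p2, p0 = sync_events(t1=3.0, d=2.0)
# p0 and the reflection event share the same t, i.e. they lie on one
# hypersurface orthogonal to the observers' world lines.
print(p0, p_reflect)
```

Because the outgoing and returning light rays take the same time, the midpoint event on Alice's world line and the reflection event on Bob's world line have equal inertial time, which is the synchronization requirement.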
Remark 4 A standard clock is also a model. What kind of real clock can be regarded
as a standard clock? Experiments show that, in most cases, atomic clocks can be
treated as standard clocks to a high degree of accuracy, and even the clocks in our
daily life provide a good approximation. However, any real clock will deviate sub-
stantially from a standard clock in certain special cases [see Misner et al. (1973)
pp. 393–395; Rindler (1982) p. 31]. For example, a pendulum clock, the mechanism
of which highly depends on the gravitational acceleration of Earth, will become
completely useless in a spaceship far away from the Earth. Nonetheless, this only
affects the choice of a clock in an experiment and is thus irrelevant to our theoretical
discussions. In theory, all we need is the concept of a standard clock.
Remark 5 Later, when we talk about a world line, we always assume that we are using
the proper time τ as the parameter. Since the proper time is equal to the arc length of
the world line, the length of its tangent vector (∂/∂τ)^a is 1 (see the paragraph before
Definition 7 in Sect. 2.5). Thus, one should interpret an observer as a timelike curve
with a unit tangent vector field.
Remark 6 Photons do not have the notion of proper time (the length of a null curve
vanishes), and therefore cannot serve as observers.
Suppose x^0 is the timelike coordinate of a coordinate system [i.e., η_ab (∂/∂x^0)^a
(∂/∂x^0)^b < 0], and x^1, x^2, x^3 are spacelike coordinates [i.e., η_ab (∂/∂x^i)^a (∂/∂x^i)^b >
0, i = 1, 2, 3]; then the value of x^0 for any point p in the coordinate patch is called
the coordinate time of an event p in this system. The coordinate time for an inertial
reference frame is called an inertial coordinate time, whose domain is the whole
R4 . One should pay close attention to the following two differences between the
coordinate time and the proper time:
① Proper time only makes sense in relation to the points on the world line, and
so without a world line one cannot talk about proper time. If two world lines L 1 and
L 2 intersect at p, then p’s proper time on L 1 can be different from its proper time on
$$\frac{dt}{d\tau} = \gamma_u , \tag{6.1.7}$$
where γ_u ≡ (1 − u^2)^(−1/2), with u being the speed of the point mass relative to R.
Proof Again, we use Fig. 6.2. It follows from dτ = √(−ds^2) and (6.1.3) that dτ^2 =
(1 − u^2) dt^2, and hence we have (6.1.7).
If L(τ ) is a t-coordinate line in the inertial frame R, then from (6.1.2) we see that
u = 0, and hence it follows from (6.1.7) that dt = dτ . Thus, the coordinate time for
an inertial observer in their inertial frame is equal to their proper time.
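As a numerical illustration of (6.1.7), the sketch below sums dτ = √(1 − u²) dt along a world line sampled at equal coordinate-time steps and compares the ratio t/τ with γ_u. The helper names `gamma` and `proper_time` and the sample values are assumptions made for this example; units with c = 1 are used.

```python
import math

def gamma(u):
    """Return gamma_u = (1 - u^2)^(-1/2) for a speed u with |u| < 1 (c = 1)."""
    if abs(u) >= 1.0:
        raise ValueError("speed must satisfy |u| < 1 in units with c = 1")
    return 1.0 / math.sqrt(1.0 - u * u)

def proper_time(speeds, dt):
    """Approximate the proper time by summing d(tau) = sqrt(1 - u^2) dt
    over equal coordinate-time steps, one speed sample per step."""
    return sum(math.sqrt(1.0 - u * u) * dt for u in speeds)

u = 0.6                       # constant speed relative to the frame R
n_steps, dt = 1000, 0.01      # total coordinate time t = 10
tau = proper_time([u] * n_steps, dt)
t = n_steps * dt

# For constant u the ratio t / tau reproduces gamma_u; for u = 0 it is 1,
# i.e. coordinate time equals proper time for an observer at rest in R.
print(t / tau, gamma(u))
```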
Diagrams are used frequently in the study of motion. The diagrams people usually use
to show spatial trajectories are spatial diagrams. For example, the spatial trajectory
for a projectile is a parabola. This kind of diagram does not have time involved, and
cannot reflect at which point the object is located at a certain moment of time. A
spacetime diagram, however, can overcome this drawback. It uses points to represent
events, and curves to represent the motion (evolution) of a particle in spacetime,
etc. If we only consider 1-dimensional motion, then we only need to draw a 2-
dimensional spacetime diagram. When drawing the diagram, one should choose an
arbitrary inertial frame R, and then draw a vertical axis pointing upward as its t-axis
(this axis represents the flow of time), and a horizontal axis as its x-axis (see Fig. 6.4).
All kinds of particles moving along the x-axis can be represented by the curves in
the figure. For example, the t-axis represents the world line of the observer G_0 at
x = 0 in the frame R, another vertical line in the picture represents the observer G_1
at x = x_1 in the frame R (vertical indicates that the observer is at rest relative to
frame R), while the dashed line in the figure represents the world line of a photon.
For any given moment t̂, we have a point (t̂, x̂) on the line, whose spatial coordinate
x̂ reflects the position of the photon at t̂. What does the tilted line G′_0 in Fig. 6.4 stand
for? Since it is tilted, its x-coordinate will change linearly with the t-coordinate. It
follows from (6.1.2) that its speed relative to R is a constant less than 1, and thus it
is a point mass experiencing inertial motion.
Actually, based on the fact that G′_0 is a timelike straight line (geodesic), we can
directly tell that G′_0 is an inertial observer. Also, from the fact that G′_0 passes through
the origin we can see that it is the observer at x′ = 0 in the frame R′ obtained by
the transformation (6.1.5), i.e., G′_0 is the t′-axis of R′. This conclusion can
also be verified another way: plugging x′ = 0 into the Lorentz transformation (6.1.5)
yields t = x/v, and thus the t′-axis is a straight line passing through the origin, with
a slope 1/v. How do we draw the x′-axis of the frame R′? The x′-axis satisfies
t′ = 0, and plugging this into (6.1.5) yields t = vx, and thus the x′-axis is a straight
line passing through the origin that has a slope v; the dashed line bisects the angle
between the x′-axis and the t′-axis (see Fig. 6.5). One may ask: does this indicate
that the x′-axis and t′-axis are not orthogonal, and therefore R is preferred over
R′? This actually is the “deception of the spacetime diagram” that comes from our
Euclidean way of thinking. In fact, noticing that {t′, x′, y′, z′} is also a Lorentzian
system, naturally we have η_ab (∂/∂t′)^a (∂/∂x′)^b = 0, i.e., (∂/∂t′)^a and (∂/∂x′)^b are
orthogonal as measured by the Minkowski metric. So R and R′ are still on an equal
footing. Indeed, when drawing a picture, we usually choose a reference frame first,
and set its t-axis and x-axis to be, respectively, vertical and horizontal; however,
the choice of this reference frame is totally arbitrary. For instance, if we choose R′
first, then the spacetime diagram will look like Fig. 6.6, which seems to be different
from Fig. 6.5, but essentially they are the same.
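The orthogonality claim is easy to check in components. The sketch below, assuming a 2-dimensional diagram with components ordered as (t, x), signature (−, +) and an illustrative boost speed v = 0.5, compares the Minkowski and Euclidean inner products of vectors pointing along the tilted time axis (x = vt) and the tilted space axis (t = vx). The function names are chosen for this example only.

```python
def minkowski_dot(a, b):
    """eta_ab a^a b^b for 2-dimensional vectors (t, x), signature (-, +), c = 1."""
    return -a[0] * b[0] + a[1] * b[1]

def euclidean_dot(a, b):
    """Ordinary Euclidean inner product, i.e. what the eye reads off the page."""
    return a[0] * b[0] + a[1] * b[1]

v = 0.5                        # illustrative relative speed of the two frames
time_axis_dir = (1.0, v)       # along the tilted time axis, x = v t
space_axis_dir = (v, 1.0)      # along the tilted space axis, t = v x
light_dir = (1.0, 1.0)         # along the dashed photon world line

# The Minkowski product vanishes although the Euclidean one does not.
print(minkowski_dot(time_axis_dir, space_axis_dir))
print(euclidean_dot(time_axis_dir, space_axis_dir))
```

The two tilted axes also make equal Euclidean angles with the photon line, since `euclidean_dot(time_axis_dir, light_dir)` and `euclidean_dot(space_axis_dir, light_dir)` coincide for vectors of equal Euclidean norm.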
The “deception” of a spacetime diagram is manifested not only in orthogonality,
but also in the judgement of length. Suppose p = (t, x) is an arbitrary spacetime
point, and op is the straight line segment between o and p (see Fig. 6.7), whose length
measured by the Minkowski metric is l_op = √|−t^2 + x^2|. Thus, the straight line
segment between o and each point on the hyperbola −t^2 + x^2 = K (constant) has the
same length, e.g., l_op = l_oq, even though intuitively (i.e., according to the Euclidean
metric) their lengths are not the same in the diagram. The hyperbola in Fig. 6.7 is
called a calibration curve.
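A short numerical sketch of the calibration curve follows. It parametrizes points on the hyperbola −t² + x² = K (with K > 0) by a rapidity-like parameter and confirms that their Minkowski distance from the origin is constant while the Euclidean distance grows. The parametrization and the function names are assumptions made for this illustration.

```python
import math

def minkowski_length(t, x):
    """Length of the straight segment from the origin o to p = (t, x)."""
    return math.sqrt(abs(-t * t + x * x))

def point_on_hyperbola(K, s):
    """Return a point (t, x) on -t^2 + x^2 = K (K > 0), labelled by s."""
    r = math.sqrt(K)
    return (r * math.sinh(s), r * math.cosh(s))

K = 4.0
for s in (0.0, 0.5, 1.5):
    t, x = point_on_hyperbola(K, s)
    # Same Minkowski length for every s, different Euclidean length.
    print(s, minkowski_length(t, x), math.hypot(t, x))
```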
If the physical phenomenon also involves the second and the third spatial dimen-
sions, then the 2-dimensional spacetime diagram will not be enough. However, even
if we draw in perspective we can only represent three dimensions on a piece of paper,
and since we need one dimension to represent time, there are only two dimensions
left; therefore, one spatial dimension cannot be represented on paper (and has to be
“suppressed” in the diagram).
Luckily, in many cases, there will be one dimension (or even two) that is not
important, or there exists some kind of symmetry that allows us to suppress one
dimension without losing anything useful. Take an artificial satellite orbiting
the Earth as an example (see Fig. 6.8). The surface of the Earth is a 2-dimensional
In pre-relativity physics, space and time are the most primary concepts that everyone
knows. From the historical perspective, the concept of space and time came first, and
then, after the birth of relativity, the concept of spacetime was gradually developed.
Many people would think spacetime is not difficult to understand since “it is nothing
but space and time”. However, in relativity spacetime itself is the most primary con-
cept, while space and time are relative notions derived from spacetime. By “derived”
we mean the notions of space and time only come from applying a “3 + 1” decom-
position to spacetime using a reference frame, and by “relative” we mean there exist
many different ways of 3 + 1 decomposition for a spacetime (Fig. 6.5 represents two
different ways of decomposition for Minkowski spacetime using the reference frames
R and R′). From the viewpoint of 4-dimensional geometry, the difference between
the concepts of space and time in relativity and pre-relativity physics comes from the
difference between their spacetime structures. Pre-relativity physics assumes that
the spacetime manifold is ℝ⁴, equipped with some intrinsic additional structures.
The first one is a smooth function t : ℝ⁴ → ℝ, called the absolute time, such that
ℝ⁴ can be foliated into infinitely many slices. Each slice is a constant-t surface Σ_t
(a hypersurface in ℝ⁴, see Fig. 6.10), called a surface of absolute simultaneity,
with a 3-dimensional Euclidean metric, which represents the “whole 3-dimensional
space” at t (see Optional Reading 6.1.6 for details). All the points on the same Σ_t
represent the events happening simultaneously at different places, and points on
different Σ_t represent the events happening at different times. The so-called “absolute
simultaneity” means that simultaneity is the same in every reference frame, which is
obviously different from relativity. In special relativity, two simultaneous events in
one reference frame can be non-simultaneous in another reference frame. There are
only surfaces of relative simultaneity in special relativity. If we compare surfaces of
simultaneity to playing cards, then pre-relativity physics only contains one deck of
cards (which is independent of the reference frame, and thus is absolute) while spe-
cial relativity has infinitely many decks of cards (which depend on reference frames,
and thus are relative; each individual card represents the whole space at a given time
in a given reference frame). This is a significant difference between the spacetime
structures of these two kinds of theory.
Now we discuss the difference between these two kinds of spacetime structure
from the perspective of causality. Given an event p ∈ ℝ⁴, one can always write
ℝ⁴ − {p} as the union of three nonintersecting subsets M1, M2 and M3, i.e., ℝ⁴ −
{p} = M1 ∪ M2 ∪ M3, where
Pre-relativity physics assumes that the subset M3 is the surface Σ_t of absolute
simultaneity (with p being removed) passing through p, while M2 and M1 are respectively
the “upper half of ℝ⁴” and the “lower half of ℝ⁴” on different sides of Σ_t
(see Fig. 6.11). Their physical meanings are: if q ∈ M2, then we say that the event
q happens in the future of p; if q ∈ M1, then we say that q happens in the past of p.
However, in special relativity, since the world lines of observers are timelike curves,
M2 and M1 are respectively the subsets enclosed by the future light cone surface (a
null hypersurface) of p and the past light cone surface of p (excluding the points
on the surface), while M3 will be a lot “bigger” than the 3-dimensional submanifold
Σ_t in Fig. 6.11, as it contains all the points that are not contained in M1 and M2
(including the points on the light cone surfaces), see Fig. 6.12.
Fig. 6.11 The spacetime structure of pre-relativity physics. The surface of absolute simultaneity
passing through p is a 3-dimensional surface, above and below which are the future and the past of
p
Fig. 6.12 The spacetime structure of special relativity. There is no surface of absolute simultaneity.
The future and past of p are much smaller than the corresponding subsets in Fig. 6.11, while the
subset M3 that has no causal relation with p is much larger than the M3 in Fig. 6.11
Suppose q ∈ M2; then the tangent vector T^a at p of the geodesic from p to q
must be timelike. Similarly, if q ∈ M1, then the tangent vector T′^a at p of the geodesic
from p to q must also be timelike. However, physically T^a and T′^a are quite
different after all: T^a is future-directed while T′^a is past-directed (see Fig. 6.13).
In relativity, T^a and T′^a are called a future-directed timelike vector and a
past-directed timelike vector, respectively. A timelike vector at p is either future-directed
or past-directed. The (nonvanishing) future-directed and past-directed null vectors
can be defined similarly.
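The classification described here can be written as a small helper. This is a sketch under stated assumptions: the vector is given by its components (t, x, y, z) in an inertial coordinate system whose t increases toward the future, the signature is (−, +, +, +), and exact comparison with zero is acceptable only because the sample inputs are simple numbers. The function name `classify` is hypothetical.

```python
def classify(vec):
    """Classify a 4-vector (t, x, y, z) in Minkowski spacetime, c = 1."""
    t, x, y, z = vec
    if t == 0 and x == 0 and y == 0 and z == 0:
        return "zero"
    norm = -t * t + x * x + y * y + z * z
    if norm > 0:
        return "spacelike"          # no future/past distinction is made here
    kind = "timelike" if norm < 0 else "null"
    # A timelike or nonvanishing null vector always has t != 0,
    # so the sign of t decides the time orientation.
    direction = "future-directed" if t > 0 else "past-directed"
    return direction + " " + kind

print(classify((2.0, 1.0, 0.0, 0.0)))    # tangent of a geodesic from p into M2
print(classify((-2.0, 1.0, 0.0, 0.0)))   # tangent of a geodesic from p into M1
print(classify((1.0, 1.0, 0.0, 0.0)))    # along the future light cone
```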
[Optional Reading 6.1.1]
It is instructive to compare the spacetime structures of special relativity and general
relativity with that of pre-relativity physics. According to general relativity, gravity in essence
is the curvature of 4-dimensional spacetime (see Sect. 7.1). Special relativity is about the
physics when gravity is not present (or can be ignored), and thus the background spacetime
is (ℝ⁴, η_ab). General relativity is about the physics when there is gravity, the background
spacetime of which is an arbitrary (connected) 4-dimensional manifold M together with a
curved metric field g_ab, i.e., (M, g_ab). The background spacetime of pre-relativity physics
can be revisited by formulating Newton’s theory of gravity using the 4-dimensional geometric
language. Based on Newton’s theory of gravity, the gravitational field of the space can be
described by the gravitational potential φ, whose relation with the mass density μ satisfies
the Poisson equation
$$\nabla^2 \phi = 4\pi\mu . \tag{6.1.8}$$
[We have set the gravitational constant G = 1, as we adopt the geometrized unit system.] A
point mass that is not subjected to any forces other than gravity is called a free point mass.
A free point mass with unit mass obeys the following equation of motion:
$$\frac{d^2 x^i}{dt^2} = -\frac{\partial \phi}{\partial x^i} , \quad i = 1, 2, 3 , \tag{6.1.9}$$
where t is the Newtonian absolute time, and x^i are the spatial Galilean coordinates (i.e., the
Cartesian coordinates in mathematics). After the initial conditions are given, the solutions
x^i(t) of (6.1.9) can be viewed as the parametric representation of a curve in space with t
as the parameter, which represents the spatial trajectory of the point mass. For instance, the
trajectory of a point mass projectile near the ground is a parabola. Cartan et al. reformulated
the facts above using the geometric language; the key points are as follows [also see Misner
et al. (1973), Chap. 12]:
The background spacetime of Newton’s theory of gravity is called Newtonian spacetime,
which is formed by a manifold ℝ⁴ and the following additional structures: (a) there exists
a smooth function t : ℝ⁴ → ℝ, called the absolute time, satisfying certain conditions; (b)
there exists a derivative operator ∇_a on ℝ⁴, whose Christoffel symbols in a given coordinate
system {x^μ} satisfy
$$\Gamma^{i}{}_{00} = \frac{\partial f}{\partial x^i} , \ i = 1, 2, 3 \ (f \text{ is a function on } \mathbb{R}^4), \quad \text{the other } \Gamma^{\mu}{}_{\nu\sigma} = 0 . \tag{6.1.10}$$
From these two points, one finds the following:
(1) The existence of the absolute time provides an absolute “stratification” to the spacetime
manifold ℝ⁴: ∀ p ∈ ℝ⁴, there exists a constant-t surface Σ_t (a hypersurface in ℝ⁴) such
that p ∈ Σ_t (see Fig. 6.10), which represents the “whole 3-dimensional space” at t, called a
surface of absolute simultaneity. Events p and q are said to be simultaneous if t(p) = t(q).
(2) Suppose γ(λ) is an arbitrary geodesic in Newtonian spacetime (λ is an affine parameter);
then its parametric representation x^μ(λ) in a coordinate system satisfying (6.1.10) obeys
the following equations:
$$\frac{d^2 x^\mu}{d\lambda^2} + \Gamma^{\mu}{}_{\nu\sigma} \frac{dx^\nu}{d\lambda} \frac{dx^\sigma}{d\lambda} = 0 , \quad \mu = 0, 1, 2, 3 . \tag{6.1.11}$$
Let μ = 0; it follows from Γ^0_νσ = 0 (ν, σ = 0, 1, 2, 3) that 0 = d^2x^0/dλ^2 = d^2t/dλ^2, and
hence
$$t = \alpha\lambda + \beta , \quad \alpha, \beta \text{ are constants.} \tag{6.1.12}$$
This equation indicates that the absolute time t can serve as the affine parameter of any
geodesic whose α ≠ 0. Now let μ = i in (6.1.11); it follows from (6.1.12) and Γ^i_00 =
∂f/∂x^i, Γ^i_jk = 0 (i, j, k = 1, 2, 3) that
$$\frac{d^2 x^i}{dt^2} + \frac{\partial f}{\partial x^i} = 0 , \quad i = 1, 2, 3 . \tag{6.1.9'}$$
Comparing this with (6.1.9) we know that, as long as we interpret f as the gravitational
potential φ and interpret x i as the Galilean coordinates, then a geodesic with the absolute
time t as its affine parameter in Newtonian spacetime corresponds to the world line of a free
point mass.
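The correspondence between geodesics of Newtonian spacetime and free point masses can be illustrated numerically. The sketch below integrates the spatial part of (6.1.11) with the only nonvanishing Christoffel symbols taken from (6.1.10), using t = αλ + β with α ≠ 0, and compares the result with the Newtonian parabola. The uniform potential φ = g z, the velocity-Verlet stepping, the value of g and all function names are assumptions chosen for this example.

```python
def integrate_geodesic(grad_phi, x0, dx_dlam0, alpha, beta, lam_max, n_steps):
    """Integrate d2x^i/dlam2 = -Gamma^i_00 (dt/dlam)^2 = -alpha^2 dphi/dx^i
    from lam = 0 to lam_max with velocity-Verlet steps.
    Returns the absolute time t = alpha * lam_max + beta and the final x^i."""
    h = lam_max / n_steps
    x = list(x0)
    v = list(dx_dlam0)
    acc = [-alpha ** 2 * g for g in grad_phi(x)]
    for _ in range(n_steps):
        x = [xi + vi * h + 0.5 * ai * h * h for xi, vi, ai in zip(x, v, acc)]
        new_acc = [-alpha ** 2 * g for g in grad_phi(x)]
        v = [vi + 0.5 * (ai + bi) * h for vi, ai, bi in zip(v, acc, new_acc)]
        acc = new_acc
    return alpha * lam_max + beta, x

G_FIELD = 9.8   # illustrative strength of a uniform field, phi = G_FIELD * z

def uniform_grad_phi(x):
    """Gradient of phi = G_FIELD * z; independent of position."""
    return (0.0, 0.0, G_FIELD)

# alpha = 2 means dx/dt = (dx/dlam) / 2, so the initial velocity is (3, 0, 4).
t_end, x_end = integrate_geodesic(
    uniform_grad_phi, (0.0, 0.0, 0.0), (6.0, 0.0, 8.0),
    alpha=2.0, beta=0.0, lam_max=0.25, n_steps=100)

# Newtonian projectile with the same initial data: a parabola in t.
z_newton = 4.0 * t_end - 0.5 * G_FIELD * t_end ** 2
print(t_end, x_end, z_newton)
```

Choosing α different from 1 shows that the affine parameter need not equal t; only the linear relation (6.1.12) matters.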
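(3) Plugging (6.1.10) into (3.4.20′) and (3.4.21), it is not difficult to find the components of
the Riemann tensor and Ricci tensor of ∇_a as follows (f has been changed to φ):
$$R_{0i0}{}^{j} = -R_{i00}{}^{j} = \frac{\partial^2 \phi}{\partial x^i \partial x^j} , \quad \text{all the other } R_{\mu\nu\rho}{}^{\sigma} = 0 , \tag{6.1.13}$$
$$R_{00} = \sum_{i=1}^{3} \frac{\partial}{\partial x^i} \frac{\partial \phi}{\partial x^i} = \nabla^2 \phi = 4\pi\mu , \quad \text{all the other } R_{\mu\nu} = 0 . \tag{6.1.14}$$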
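The intermediate steps can be spelled out as follows. This is a sketch that assumes the component formula referred to as (3.4.20′) has the standard coordinate form shown in the first line (the sign convention in which the result agrees with (6.1.13)), and that the Ricci tensor is obtained by contracting the second and fourth indices of the Riemann tensor.

```latex
% Assumed coordinate expression of the Riemann tensor of \nabla_a:
R_{\mu\nu\rho}{}^{\sigma}
  = \partial_\nu \Gamma^{\sigma}{}_{\mu\rho}
  - \partial_\mu \Gamma^{\sigma}{}_{\nu\rho}
  + \Gamma^{\alpha}{}_{\mu\rho}\,\Gamma^{\sigma}{}_{\alpha\nu}
  - \Gamma^{\alpha}{}_{\nu\rho}\,\Gamma^{\sigma}{}_{\alpha\mu} .

% By (6.1.10) only \Gamma^{i}{}_{00} = \partial\phi/\partial x^i is nonzero.
% In each quadratic term the first factor forces \alpha to be spatial, and the
% second factor then carries a spatial lower index, so every product vanishes.
% Hence
R_{0i0}{}^{j}
  = \partial_i \Gamma^{j}{}_{00} - \partial_0 \Gamma^{j}{}_{i0}
  = \frac{\partial^2 \phi}{\partial x^i \partial x^j} ,
\qquad
R_{i00}{}^{j}
  = \partial_0 \Gamma^{j}{}_{i0} - \partial_i \Gamma^{j}{}_{00}
  = -\frac{\partial^2 \phi}{\partial x^i \partial x^j} .

% Contracting the second and fourth indices:
R_{00} = R_{0\nu 0}{}^{\nu} = \sum_{i=1}^{3} R_{0i0}{}^{i}
       = \sum_{i=1}^{3} \frac{\partial^2 \phi}{\partial x^i \partial x^i}
       = \nabla^2 \phi .
```

The last equality combined with the Poisson equation (6.1.8) gives R_00 = 4πμ.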
Equation (6.1.13) indicates that Newtonian spacetime is not flat. (In comparison, in Einstein’s
theory a spacetime with gravity is not flat either.) However, the derivative operator ∇̂_a
induced by ∇_a on each surface Σ_t of simultaneity is flat. [(6.1.10) indicates that Γ^i_jk = 0,
i, j, k = 1, 2, 3, and the corresponding 3-dimensional Riemann tensor vanishes.] This can
also be verified from another point of view: when the α in (6.1.12) vanishes, the geodesic
γ(λ) lies on the surface Σ_β of simultaneity at t = β; from (6.1.11), Γ^i_jk = 0 and t = β
we get d^2x^i/dλ^2 = 0, and thus
then the ordinary derivative operator ∂_a of the system {x^i} will naturally satisfy ∂_a δ_bc = 0,
and the Christoffel symbols of ∂_a in {x^i} will certainly vanish, i.e.,
$$\Gamma^{i}{}_{jk} = 0 , \quad i, j, k = 1, 2, 3 .$$
not surprising that different 1-dimensional rulers have different lengths. In this way,
“length contraction” is just like the classic parable of the blind men and an elephant.1
Since the frame R treats the ruler as at rest while R′ treats the ruler as moving,
the lengths of the line segments oa and ob are the rest length and “moving” length,
respectively. Now the only problem left is to compare l_oa and l_ob. Intuitively we
have l_ob > l_oa, so it seems that the moving ruler is longer! However, this is the
“deception” of the spacetime diagram again. From the calibration curve passing
through a we see that l_ob < l_oa, and thus the moving ruler is shorter. To find the
quantitative relation between them, we just need to compute the length of both
line segments. The arc length of a spacetime curve is an absolute quantity and is
independent of the coordinate system. For the convenience of comparison, we choose
the same coordinate system (the inertial frame that corresponds to R) to compute
both of them. Noticing that the coordinates of o in this system are (0, 0, 0, 0),
from the expression of the Minkowski line element in this system we obtain
l_oa = √(x_a^2 − 0) = x_a, and l_ob = √(x_b^2 − t_b^2). Also, it follows from (6.1.5)
that the equation for the x′-axis is t = vx, and hence t_b = v x_b. From Fig. 6.14 we
can see that x_b = x_a, and plugging these into the equations above yields
l_ob = γ^(−1) x_b = γ^(−1) x_a = γ^(−1) l_oa. This is the
well-known quantitative relation for length contraction.
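The computation can be repeated numerically. The sketch below follows the steps of the text in the coordinates of R: event a lies on the x-axis, event b has the same x-coordinate and lies on the line t = vx. The function name `ruler_lengths` and the sample values v = 0.6 and x_a = 5 are illustrative assumptions.

```python
import math

def ruler_lengths(x_a, v):
    """Return (l_oa, l_ob, gamma) for the segments oa and ob, with c = 1."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    l_oa = math.sqrt(x_a ** 2 - 0.0)        # a = (0, x_a) lies on the x-axis
    x_b = x_a                               # read off from the diagram
    t_b = v * x_b                           # b lies on the line t = v x
    l_ob = math.sqrt(x_b ** 2 - t_b ** 2)   # Minkowski length of ob
    return l_oa, l_ob, gamma

l_oa, l_ob, gamma = ruler_lengths(x_a=5.0, v=0.6)
# l_ob equals l_oa / gamma even though ob looks longer on the page.
print(l_oa, l_ob, l_oa / gamma)
```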
1 This is the story of a group of blind men who have never come across an elephant trying to
conceptualize the elephant by touching it; however, their understandings of an elephant turn out to
be completely different, since each of them touches a different part of the elephant’s body.
Consider two standard clocks C1 and C2 in an inertial frame R and a standard clock
C′ in another inertial frame R′. The world lines of these three clocks are shown in
Fig. 6.15. From the viewpoint of R, the clocks C1 and C2 are at rest, while C′ is
moving. At the beginning, C1 and C′ coincide at event o, at which both clocks are
zeroed. After a while, C′ will coincide with C2 at event b. From the fact that proper
time is equal to the arc length of the world line, we can see that the reading of the
clock C′ at b is equal to l_ob. Both C2 and C1 belong to the same frame R, and the
x-axis is a line of simultaneity of R, so since the reading of C1 at o is zero, the reading
of C2 at c should also be zero. Hence, the reading of C2 at b equals l_cb = l_oa. Plotting
the calibration curve through p we see that l_ob < l_oa = l_cb, and thus the frame R
regards C′ (the moving clock) as running slower. However, from the viewpoint of
R′, the event o happens simultaneously with d (see Fig. 6.16) rather than c. Since
the reading of C2 is zero at c, it must have a reading δ > 0 at d. A short time later, C2
coincides with C′ at b. Although the reading l_ob of C′ is smaller than the reading l_cb of
C2 (which is admitted by both frames), the observer in R′ does not admit that C′ runs
slower, because when C′ is zeroed, judged by the line of simultaneity of R′, C2
already has a reading δ (C2 “jumped the gun”). Hence, one should subtract δ from
the reading l_cb of C2 at b, and then compare with l_ob, i.e., the observer in R′ thinks
that one should compare l_db and l_ob. From the calibration curve passing through b we
know that l_ob > l_oe = l_db, and hence R′ regards C2 as running slower, and it is still
the moving clock that runs slower. Figure 6.17 is a 3-dimensional demonstration of
the discussion above, where (a) and (b) are the 3-dimensional perspectives of R and
R′, respectively. Thus, we can see again that the 3-dimensional perspective depends
on the reference frame; only spacetime diagrams and the 4-dimensional formulation
can be independent of the reference frame.
Following the derivation of the quantitative relation between the lengths of a rest
and a moving ruler, and noticing x_b = v t_b, it is not difficult to find from Fig. 6.15
that the quantitative relation between the time intervals of a rest and a moving clock
is l_ob = γ^(−1) l_oa.
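Both viewpoints in the three-clock comparison can be reproduced with coordinates in the frame of the two synchronized clocks. In the sketch below the moving clock follows x = vt, the second rest clock sits at x = x2, and the line of simultaneity of the moving frame through o is t = vx; events are pairs (t, x) with c = 1. The function names and the values v = 0.6, x2 = 3 are assumptions for this illustration.

```python
import math

def interval_length(p, q):
    """Minkowski length of the straight segment between events p and q."""
    dt, dx = q[0] - p[0], q[1] - p[1]
    return math.sqrt(abs(-dt * dt + dx * dx))

def three_clock_readings(v, x2):
    o = (0.0, 0.0)          # the moving clock and the first rest clock are zeroed
    b = (x2 / v, x2)        # the moving clock meets the second rest clock
    c = (0.0, x2)           # simultaneous with o in the rest frame
    d = (v * x2, x2)        # simultaneous with o in the moving frame (t = v x)
    return {
        "l_ob": interval_length(o, b),   # reading of the moving clock at b
        "l_cb": interval_length(c, b),   # reading of the second rest clock at b
        "delta": interval_length(c, d),  # head start seen from the moving frame
        "l_db": interval_length(d, b),   # l_cb minus the head start
    }

v = 0.6
gamma = 1.0 / math.sqrt(1.0 - v * v)
r = three_clock_readings(v, x2=3.0)
# Rest-frame view: l_ob = l_cb / gamma. Moving-frame view: l_db = l_ob / gamma.
print(r, gamma)
```

The same factor γ appears in both comparisons, which is the symmetry between the two inertial frames described above.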
The discussion above clearly indicates that, just like nothing is really contracting
in the phenomenon of “length contraction”, none of the clocks really runs slower
in the phenomenon of “time dilation”. (They all have the rate of a standard clock,
i.e., the difference of the readings equals the arc length). It should be emphasized
that there are a variety of methods for “clock comparison” (comparing the readings
of standard clocks in different inertial frames), which can lead to different results.
Therefore, when talking about clock comparison, one must stipulate all the details
of the method beforehand. The method we just used in the phenomenon of “time
dilation” is standard, but it is only one of the various methods for clock comparison.
A feature of this method is that it involves three clocks C1, C2 and C′, two of which
are in the same inertial frame and have been synchronized beforehand. Without C2,
one can still get l_ob < l_oa from Fig. 6.15; however, one cannot conclude that “the
observer who carries C1 measures that C′ runs slower”, because there is no way
for that observer to measure it directly since the event b is not on the world line of
C1. The only way for C1 to observe b is to receive a light signal (or other signals)
coming from b, which leads to the problem that the propagation of light takes time.
(One can certainly apply this method, but to do so this problem needs to be taken
into account.) Actually, when we draw the conclusion that “C′ is slower” by means
of C2, we have already cleverly let the light signal play the role of a “messenger”,
since a light signal has been used when synchronizing C1 and C2 (see Sect. 6.1.4).
In summary, without C2, one cannot compare C1 and C′ using the method above; in
other words, this method of clock comparison will not have any physical meaning
without the third clock.
When there are only two clocks C and C′, there are still methods of clock comparison
that are physically meaningful. For example, as shown in Fig. 6.18, the observer
G that carries the clock C can compare the clocks by looking at the two clocks using the
left and right eyes respectively at a time a. “Looking at C′ using the right eye” means
that the light signal sent from C′ at e is received by the right eye at a (the photon
goes from e to a along a future-directed null geodesic). If both clocks are zeroed
at o, then the reading of C at a equals l_oa, and the reading of C′ at e equals l_oe. Since
l_oe < l_oa, the observer G will also conclude that “the moving clock runs slower”,
but the difference is that this method will make the moving clock even slower when
compared with the method in Fig. 6.15. To compute how much the clock slows down,
one can draw a line of simultaneity of C passing through e, which intersects the world
line of C at f (see Fig. 6.19). Let τ ≡ l_oa, τ′ ≡ l_oe, p ≡ l_of, q ≡ l_fa; then l_ef = q. From
p = γτ′ (the quantitative relation for the regular time dilation) and u = q/p, the
geometric expression for the relative speed of the two clocks (C regards C′ as having moved
a distance q during a time p), we can easily find that²
$$\tau' = \sqrt{\frac{1 - u}{1 + u}}\, \tau . \tag{6.2.1}$$
2 This is the relativistic Doppler relation. It is valid for both positive and negative u, giving respec-
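The derivation of (6.2.1) can be traced step by step in code. The sketch below assumes the observer's clock is at rest at x = 0, the other clock recedes along x = ut with 0 ≤ u < 1, both are zeroed at o, and the observer looks at coordinate time τ; the light ray from the emission event is null, so its travel time equals the distance covered. The function names are hypothetical, and the restriction to non-negative u is a simplification made here.

```python
import math

def two_eye_comparison(u, tau):
    """Return (p, q, tau_prime) for the construction with events o, e, f, a."""
    if not 0.0 <= u < 1.0:
        raise ValueError("this sketch assumes a receding clock with 0 <= u < 1")
    p = tau / (1.0 + u)                       # time of e and f: tau - p = u * p
    q = u * p                                 # distance of the moving clock at e
    tau_prime = p * math.sqrt(1.0 - u * u)    # regular time dilation, p = gamma * tau'
    return p, q, tau_prime

def doppler_formula(u, tau):
    """Right-hand side of the relation (6.2.1)."""
    return math.sqrt((1.0 - u) / (1.0 + u)) * tau

u, tau = 0.6, 10.0
p, q, tau_prime = two_eye_comparison(u, tau)
gamma = 1.0 / math.sqrt(1.0 - u * u)
# The seen reading tau_prime is smaller than tau / gamma, the value obtained
# with the three-clock method, so the moving clock appears even slower.
print(tau_prime, doppler_formula(u, tau), tau / gamma)
```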
One can even come up with a method of clock comparison (see Fig. 6.20) that
leads to the result that “the moving clock runs faster”! Suppose C and C′ are both
zeroed at o; then an observer G who uses two eyes to observe C and C′ respectively
at a will get negative readings from both clocks. From the figure it is easy to see
that l_oa < l_oe, and hence the reading of C′ is even more negative than that of C.
Thus, G will think “the moving clock runs faster”. Though it sounds ridiculous, this
conclusion is beyond reproach: it is nothing but a result coming from the specific
method of clock comparison in Fig. 6.20. Therefore, to compare clocks we must
indicate all the details of the method beforehand, for which a spacetime diagram
would be very helpful.
Solution Denote the inertial frame at rest relative to the liquid as R′. Draw a
3-dimensional spacetime diagram based on the frame R (see Fig. 6.21), where events
p1 and p2 represent a photon coming out of S and arriving at G, respectively, and
hence the line segment p1p2 represents the propagation of that photon. Viewed from
R and R′, the times this propagation takes are, respectively, the Minkowski lengths
of the line segments p3p2 and p4p2 (the world line S can be used as the t-axis for
the frame R). The length of p1p3 is obviously l. Using σ and α to represent the
lengths of p1p4 and p3p4, we can easily see that
Figure 6.22a is the spacetime diagram for the twin “paradox”, where the two curves
are the world lines of the twin brothers A and B. The curve A is a vertical line
indicating that A stays at home (as an inertial observer), while the curve B is a
non-geodesic indicating that B goes for a journey in space and returns. Suppose
the brothers are of the same age when they separate, would they be the same age
when they meet again? If not, then who is older? This is nothing but a question of
comparing the proper times of A and B between p and q, i.e., a question of comparing
the arc lengths l A and l B between p and q. Since the timelike geodesic is the longest
timelike curve between two points in Minkowski spacetime (see the paragraph above
Optional Reading 3.3), we have l A > l B , and thus B is younger than A when they
meet again. Figure 6.22b is the simplest example of the twin “paradox” (where the
world line of B is a broken line composed of two timelike geodesics); using the
quantitative relation for time dilation, it is easy to see that l_A = γ l_B > l_B.
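The simplest case in Fig. 6.22b can be checked by summing Minkowski lengths of straight timelike segments. In the sketch below events are pairs (t, x) in the rest frame of A with c = 1; B travels out at speed v for half of the total coordinate time and returns at the same speed. The function name `proper_time` and the values v = 0.8, T = 10 are illustrative assumptions.

```python
import math

def proper_time(events):
    """Sum the Minkowski lengths of the straight timelike segments joining
    consecutive events (t, x) of a broken world line."""
    total = 0.0
    for (t0, x0), (t1, x1) in zip(events, events[1:]):
        dt, dx = t1 - t0, x1 - x0
        if dt <= abs(dx):
            raise ValueError("segment is not future-directed timelike")
        total += math.sqrt(dt * dt - dx * dx)
    return total

v, T = 0.8, 10.0
world_line_A = [(0.0, 0.0), (T, 0.0)]                          # stays at home
world_line_B = [(0.0, 0.0), (T / 2, v * T / 2), (T, 0.0)]      # out and back

l_A = proper_time(world_line_A)
l_B = proper_time(world_line_B)
gamma = 1.0 / math.sqrt(1.0 - v * v)
# The geodesic A is the longer timelike curve, and l_A = gamma * l_B.
print(l_A, l_B, gamma * l_B)
```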
These are the essentials of the twin “paradox”, and the problem itself is just that
simple. However, due to a lack of deep understanding at the early stage of relativity
study, people used to consider this problem as a paradox. The argument regarding
the twin “paradox” even had an upsurge in 1957–1958 (though most physicists had
agreed that the problem had been solved long ago), and some papers were even
published in journals like Nature and Science. The representatives for the two sides
were physicist W. H. McCrea and physicist and natural philosopher H. Dingle. Dingle
claimed that according to relativity everything is relative, and thus the twins should
be the same age when they meet again. McCrea, however, pointed out shrewdly that
it is not true that everything is relative in relativity; it is the fact that the twin brother
B has an acceleration while A does not have one which leads to the result of the age
difference. As the study went deeper, especially after the geometric language became
widely used, physicists reached a consensus on the twin “paradox”,
which is exactly what we have shown at the beginning of this subsection [see, for
example, Sachs and Wu (1977) pp. 42–43; Wald (1977) pp. 25–26; Misner et al. (1973)
p. 167]. It should be particularly emphasized that one should not have the idea that
“everything is relative in relativity” based solely on the name of the theory, as this is
a critical misunderstanding!
The twin “paradox” was verified experimentally in 1971 using cesium atomic
clocks, not humans, of course; the reader may refer to Hafele and Keating (1972a;
1972b) for more information and look at Exercise 6.10.
Now, we answer a few frequently asked questions regarding the twin “paradox”.
Q: In the phenomenon of time dilation, the two observers are on an equal footing:
A thinks B’s clock runs slower, and B thinks A’s clock runs slower. Why in the twin
“paradox” are A and B not on an equal footing (everyone thinks B is younger than
A)?
A: The premises of these two phenomena are different. In the phenomenon of
time dilation, both observers are undergoing inertial motion; since inertial frames
are on an equal footing, the results for both of them are certainly the same. However,
in the twin “paradox”, one of the brothers is experiencing a non-inertial motion (the
world line is not a geodesic), otherwise they would not meet again after they have
separated. The premise implies an inequality between the two observers, and thus
the conclusion is also one-sided.
Q: The conclusion of the twin “paradox” is that the accelerating brother is younger.
However, since acceleration is relative, B accelerating relative to A means that A is
accelerating relative to B. In this way, wouldn’t B also think A is younger?
A: One needs to distinguish the 3-dimensional and 4-dimensional accelerations
(namely 3-acceleration and 4-acceleration, see Sect. 6.3), the former of which is relative,
while the latter of which is absolute (independent of the choice of the observer,
reference frame, coordinate system, etc.). On the other hand, the concepts of inertial
motion and non-inertial motion are both absolute: a point mass undergoes inertial
motion if and only if its world line is a geodesic (independent of the reference
frame!). When we consider the term “accelerated motion” as a synonym of “non-
inertial motion”, it is supposed to be understood as the 4-acceleration. If one uses
the word “acceleration” to describe the twin “paradox”, then one should say “the
brother with a 4-acceleration is younger”. No observer would say that A has a 4-
acceleration, and there is no issue anymore. It is already a convention for physicists
that in the 3-dimensional language, when talking about acceleration without spec-
ifying a reference frame, it is always assumed to be relative to an inertial frame.
Under this agreement, expressions like “the brother experiencing accelerated motion
is younger” and “an electric charge only radiates under accelerated motion” are both
correct.
Q: It is sometimes heard that the twin “paradox” is inside the scope of general
relativity, and thus cannot be interpreted by just using special relativity. Is that right?
A: No. Didn’t we just interpret it clearly in the first paragraph of this subsection?
The misconception of some people that the twin “paradox” is related to general
relativity may occur when they choose the coordinate system corresponding to the
reference frame of B for calculating the time experienced by B. This is a non-
inertial frame, and some people may think general relativity is involved as long
as we talk about non-inertial frames. The explanations of this misunderstanding
are the following: ① The time experienced by B is the length of his world line,
which is a geometric quantity independent of the coordinate system, and hence it
is not necessary at all to trouble yourself by choosing a non-inertial frame for the
188 6 Special Relativity
calculation. ② At the very least, even if you insist on using a non-inertial frame for
the calculation, there is no need to use general relativity at all. We should specify
the division criteria for special versus general relativity. At first, people thought
they could use coordinate systems as the criterion, and it would be considered to
be in the scope of general relativity as long as a non-inertial frame is involved.
Later, however, people realized that it is much more natural (and elegant) to use the
absolute spacetime geometry (which is independent of any human choice) as the
criterion. Therefore, now we shall have the following standard: any physics problem
that has Minkowski spacetime as its background is in the scope of special relativity,
while general relativity must be used when spacetime is curved (see Chap. 7). When
discussing any physics problem, an important but often ignored step is to identify
the spacetime background beforehand, i.e., to specify in what spacetime the physical
phenomenon happens. The premise for the twin “paradox” is that the whole process
happens in Minkowski spacetime, and thus is in the realm of special relativity (unless
one stipulates that the background spacetime is not Minkowski, which means the
gravitational field is not negligible, see Chap. 7). Unfortunately, some people went
even further and mistakenly thought that accelerated motion could lead to curved
spacetime and so general relativity must be involved. (Some may conclude that the
spacetime is curved just based on the fact that the Christoffel symbols $\Gamma^\mu{}_{\nu\sigma}$ do not
all vanish in a non-inertial coordinate system, but do not realize that it is absolutely
normal to find the $\Gamma^\mu{}_{\nu\sigma}$ of a Minkowski metric to be nonvanishing in a non-inertial
frame.) Another famous example similar to this is Einstein’s rotating disk, which is
sometimes also misunderstood as a problem involving general relativity. The premise
of this problem is actually also that the whole phenomenon (including the motions of
the disk and the observers on it) takes place in Minkowski spacetime, and therefore
it is also within the scope of special relativity. The best way to clearly analyze the
problem of Einstein’s rotating disk is also by using the geometric language; however,
it is much more complex than the twin “paradox”, see Sect. 14.2 (Volume II) for
details.
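As a concrete illustration of the remark on Christoffel symbols (a standard example, not taken from the discussion above), consider the $t$-$x$ part of Minkowski spacetime in Rindler coordinates $(t, x)$ with $x > 0$, related to inertial coordinates $(T, X)$ by $T = x\sinh t$, $X = x\cosh t$. The coordinate names are chosen here only for illustration. Some Christoffel symbols are nonzero, yet the curvature vanishes because the metric is still the Minkowski metric:

```latex
\begin{aligned}
ds^2 &= -dT^2 + dX^2 = -x^2\,dt^2 + dx^2 , \\
\Gamma^{x}{}_{tt} &= x , \qquad
\Gamma^{t}{}_{tx} = \Gamma^{t}{}_{xt} = \frac{1}{x} , \qquad
R_{\mu\nu\sigma}{}^{\rho} = 0 .
\end{aligned}
```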
Suppose a car has the same rest length as a garage. When driving the car into the
garage, the driver thinks: “the moving garage becomes shorter, which will not be large
enough to fit the car.” However, the doorman of the garage thinks: “the moving car has
shrunk, and the garage will be more than enough to fit the car.” Is the driver correct? Is
the doorman correct? It will be crystal clear in the 4-dimensional geometric language.
To make the problem simple and more specific, we assume that the garage does not
have a back wall (its “back wall” will be just a line on the ground). Figure 6.23 is the
spacetime diagram of the car coming into the garage at a uniform speed (one can use
a calibration curve to make sure the car and the garage have the same rest length in
the diagram). It is easy to see from the diagram that, measured by the inertial frame
of the doorman, the garage is longer than the car and has enough room for the car;
6.3 Kinematics and Dynamics of a Point Mass 189
measured by the inertial frame of the driver, however, the car is longer than the garage
and cannot be fit into the garage. Both viewpoints are correct, and the
divergence of their conclusions comes from the relativity of simultaneity. Actually,
a question like “can the car be fit in or not?” is not well-defined and should not be
raised. Since the conclusion is relative, this absolute type of question is meaningless,
just like in the phenomenon of “length contraction” one cannot ask “which ruler is
longer?” The case where the garage has a hard back wall is a little more complicated.
The basic principle is that, in relativity, no information can propagate faster than
the speed of light, and so the information that the head of the car crashes into the wall
(and stops moving) takes time to propagate to the tail of the car; the tail of the car will
start to decelerate and come to a stop after “receiving” this information. Therefore,
the car would be physically compressed into a length that can be fit into the garage
(no matter from whose perspective). Motivated readers should draw a spacetime
diagram that roughly describes the whole process and finish Exercise 6.11.
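For the case without a back wall, the two viewpoints can be checked numerically. The sketch below is only an illustration under assumed values (common rest length `L0`, speed `v`, units $c = 1$); all variable names are hypothetical. It computes the lengths measured in each frame and then Lorentz-transforms two events, "the tail passes the door" and "the head reaches the back line", to show that their time order differs between the two frames.

```python
import math

L0 = 5.0   # assumed common rest length of the car and the garage (c = 1)
v = 0.8    # assumed speed of the car relative to the garage
gamma = 1.0 / math.sqrt(1.0 - v * v)

# Lengths measured in each inertial frame (length contraction).
car_in_doorman_frame, garage_in_doorman_frame = L0 / gamma, L0
car_in_driver_frame, garage_in_driver_frame = L0, L0 / gamma

# Two events in the doorman's coordinates (t, x); the door is at x = 0 and
# the line marking the "back wall" is at x = L0.
tail_at_door = (0.0, 0.0)
head_at_back = ((L0 - L0 / gamma) / v, L0)

def time_in_driver_frame(event):
    """Lorentz-transformed time coordinate t' = gamma (t - v x)."""
    t, x = event
    return gamma * (t - v * x)

t_tail_driver = time_in_driver_frame(tail_at_door)
t_head_driver = time_in_driver_frame(head_at_back)

# Doorman: the tail enters before the head reaches the back line.
# Driver: the head reaches the back line before the tail enters.
print(head_at_back[0] > tail_at_door[0], t_head_driver < t_tail_driver)
```

The reversed time order of the two events is exactly the relativity of simultaneity that reconciles the two statements.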
Due to the importance and the subtleties of concepts like momentum, energy and
mass in special relativity, it is necessary to first review some relevant issues.
The principle of relativity requires that the laws of physics have the same form
in all inertial frames. The transformation between reference frames in Newtonian
mechanics is a Galilean transformation, while in special relativity it is a Lorentz
transformation. Hence, in Newtonian mechanics the principle of relativity requires
the mathematical expressions for the laws of physics to be invariant under Galilean
transformations (called Galilean covariance), while in special relativity it requires
the mathematical expressions for the laws of physics to be invariant under Lorentz
transformations (called Lorentz covariance). This principle is very powerful in that
it is a “law of laws”, which means any law that does not have Lorentz covariance
must be modified in order to be fit into special relativity. One notable example is
the law of conservation of momentum. In Newtonian mechanics, the momentum
of a point mass is defined as the product of the mass m and the velocity u, i.e.,
p := m u, and the corresponding force is defined by the time rate of change of the
\[ u = \frac{v + v}{1 + v^2/c^2} = \frac{2v}{1 + v^2/c^2} . \tag{6.3.1} \]
Suppose the Newtonian mass for both of the balls is m, then the total momenta
of the balls in R before and after the collision are, respectively,
\[ \text{initial total momentum (magnitude)} = mu + 0 = \frac{2mv}{1 + v^2/c^2} , \]
\[ \text{final total momentum (magnitude)} = 2mv . \]
(The conservation of Newtonian mass is used in the second line). The total momenta
before and after the collision are not the same, and thus the momentum is not con-
served in the frame R. This indicates that now the conservation of momentum is not
Lorentz covariant, and hence is not a law. Now we have two choices: either give up
on the conservation of momentum or render the conservation of momentum Lorentz
covariant by modifying the definitions of mass and momentum. In consideration of
the significance of the laws of conservation in physics, we certainly would like to
choose the latter one. To get an idea how to modify them, let us consider the fol-
lowing: suppose a point mass is accelerated by a constant force. Based on Newton’s
second law, its speed must eventually exceed the speed of light at some point in
time, which contradicts special relativity. In order to get rid of this inconsistency, it
would be reasonable to suspect that the mass of a point mass increases with its speed
in relativity. In this way, the acceleration of the point mass under a constant force
would become smaller and smaller, and so it is possible that its speed will never
reach the speed of light. Therefore, we can suggest the following modification: we
still define momentum as mass times velocity; however, the mass, now denoted by
$m_u$ (called the relativistic mass), is no longer a constant but depends on the speed
$u$. Now based on this idea let us reconsider the conservation of momentum in the
frame R in Fig. 6.24. Since ball 2 is at rest before the collision while ball 1 moves
with velocity $u$, their relativistic masses are, respectively, $m_0$ (called the rest mass)
and $m_u$. Hence,
\[ \text{initial total momentum (magnitude)} = m_u u + 0 = \frac{2 m_u v}{1 + v^2/c^2} , \tag{6.3.2} \]
\[ \text{final total momentum (magnitude)} = M_v v , \tag{6.3.3} \]
where $M_v$ represents the total mass of the combined body after the collision. Assume
that the total mass is conserved in the collision, i.e., $m_u + m_0 = M_v$. (This is a very
natural assumption, the meaning of which will be explained later.) Then (6.3.3)
becomes
\[ \text{final total momentum (magnitude)} = (m_u + m_0) v . \tag{6.3.4} \]
Comparing (6.3.2) and (6.3.4) we can see that, in order to let the conservation of
momentum hold in the frame R, we only have to require that
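\[ m_u = m_0 \, \frac{1 + v^2/c^2}{1 - v^2/c^2} . \tag{6.3.5} \]
On the other hand, it follows from (6.3.1) that
\[ \sqrt{1 - u^2/c^2} = \frac{1 - v^2/c^2}{1 + v^2/c^2} , \tag{6.3.6} \]
and hence
\[ m_u = \frac{m_0}{\sqrt{1 - u^2/c^2}} . \tag{6.3.7} \]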
Thus, for the collision shown in Fig. 6.24, we can only guarantee that momentum
is conserved in R if we allow m u to change with the speed u according to (6.3.7).
From this, we extrapolate that the momentum of a point mass in special relativity
should be defined as
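\[ \mathbf{p} := m_u \mathbf{u} \quad [\text{where } m_u \text{ is given by (6.3.7)}] . \tag{6.3.8} \]
Usually we denote $\gamma_u \equiv (1 - u^2/c^2)^{-1/2}$, and hence the momentum can also be
expressed as
\[ \mathbf{p} = \gamma_u m_0 \mathbf{u} , \quad \text{or, for short,} \quad \mathbf{p} = \gamma m_0 \mathbf{u} = m_u \mathbf{u} . \tag{6.3.9} \]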
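The argument leading to (6.3.7) can be checked numerically. The sketch below is a minimal illustration under assumed values: it takes the speed `v` appearing in (6.3.1), computes `u`, and compares the momentum bookkeeping in the frame R with the Newtonian mass and with the relativistic mass $m_u = \gamma_u m_0$. The variable names and the sample numbers are hypothetical.

```python
import math

c = 1.0    # units with c = 1 (an assumption of this sketch)
m0 = 1.0   # assumed rest mass of each ball
v = 0.5    # assumed value of the speed v appearing in (6.3.1)

u = 2 * v / (1 + v * v / c**2)                  # (6.3.1)

# Newtonian bookkeeping in the frame R: the two totals disagree.
newton_initial = m0 * u
newton_final = 2 * m0 * v

# Relativistic mass m_u = gamma_u * m0, see (6.3.7) and (6.3.9).
gamma_u = 1.0 / math.sqrt(1.0 - u * u / c**2)
m_u = gamma_u * m0
rel_initial = m_u * u                           # (6.3.2)
rel_final = (m_u + m0) * v                      # (6.3.4)

print(newton_initial, newton_final)
print(rel_initial, rel_final)
```

With the relativistic mass the two totals agree, while the Newtonian totals differ, as claimed in the text.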
Now that we have the definition of momentum, we can define force. In relativity, the
force f acting on a point mass is still defined by the time rate of change of the point
mass’s momentum:
\[ \mathbf{f} := \frac{d\mathbf{p}}{dt} . \tag{6.3.10} \]
The principle of relativity requires the above equation to be Lorentz covariant, which
determines the transformation law of forces between inertial frames (see textbooks
on special relativity for details).
Now we introduce the definition of energy. First, following Newtonian mechanics,
we define the kinetic energy $E_k$ of a point mass using the following two requirements:
① $E_k = 0$ when the point mass is at rest ($u = 0$), ② the time rate of change of the
kinetic energy equals the power $\mathbf{f} \cdot \mathbf{u}$, from which we obtain
\[ \frac{dE_k}{dt} = \mathbf{f} \cdot \mathbf{u} = \frac{d\mathbf{p}}{dt} \cdot \mathbf{u} = \mathbf{u} \cdot \frac{d(m_u \mathbf{u})}{dt} = m_u \mathbf{u} \cdot \frac{d\mathbf{u}}{dt} + \mathbf{u} \cdot \mathbf{u} \, \frac{dm_u}{dt} = m_u u \frac{du}{dt} + u^2 \frac{dm_u}{dt} , \tag{6.3.11} \]
where $dm_u/dt$ can also be expressed using (6.3.7) as
\[ \frac{dm_u}{dt} = \frac{d}{dt} \left( \frac{c\, m_0}{\sqrt{c^2 - u^2}} \right) = \frac{m_u u}{c^2 - u^2} \frac{du}{dt} . \tag{6.3.12} \]
Plugging this into (6.3.11) yields
\[ \frac{dE_k}{dt} = (c^2 - u^2) \frac{dm_u}{dt} + u^2 \frac{dm_u}{dt} = c^2 \frac{dm_u}{dt} . \tag{6.3.13} \]
Noticing that $m_u = m_0$ and $E_k = 0$ when $u = 0$, by integrating the above equation
we find the kinetic energy at the speed $u$ is
\[ E_k(u) = \int_{m_0}^{m_u} c^2 \, dm = m_u c^2 - m_0 c^2 . \tag{6.3.14} \]
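As a consistency check (added here as an illustration, using only the binomial expansion of $\gamma_u$ for $u \ll c$), the kinetic energy (6.3.14) reduces to the Newtonian expression at low speed:

```latex
E_k = (\gamma_u - 1)\, m_0 c^2
    = m_0 c^2 \left( \frac{1}{2}\frac{u^2}{c^2} + \frac{3}{8}\frac{u^4}{c^4} + \cdots \right)
    = \frac{1}{2} m_0 u^2 + \frac{3}{8} m_0 \frac{u^4}{c^2} + \cdots .
```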
Albert Einstein boldly claimed that the $m_u c^2$ on the right-hand side of (6.3.14)
is the (total) energy of the point mass at the speed $u$ (denoted by $E = mc^2$,
where $m$ is short for $m_u$). Thus, $m_0 c^2$ is the energy when the point mass is at rest
(denoted by $E_0 = m_0 c^2$, called the rest energy of the point mass), and the kinetic
energy is the difference between the total energy and the rest energy. $E = mc^2$ indicates that the
energy $E$ is proportional to the mass $m$ (by which we mean the relativistic mass $m_u$),
which is called the equivalence of mass and energy. In the geometrized unit system $c = 1$, and
hence $E = m$, i.e., energy is equal to mass, and $E_0 = m_0$ indicates that an object has
an amount of energy equal to its rest mass even at rest. This is an incredibly huge
amount of energy: the energy of an object with $m_0 = 1$ g (which is about 1% of a
bag of instant noodles) is
\[ M c^2 = m_1 c^2 + m_2 c^2 . \tag{6.3.15} \]
Before the fission the nucleus is at rest, the relativistic mass of which is equal to the
rest mass $M_0$. Use $m_{01}$, $m_{02}$, $u_1$ and $u_2$ to respectively represent the rest masses and
the velocities of the two pieces. Let $\gamma_1 \equiv (1 - u_1^2/c^2)^{-1/2}$, $\gamma_2 \equiv (1 - u_2^2/c^2)^{-1/2}$,
then $m_1 = \gamma_1 m_{01}$, $m_2 = \gamma_2 m_{02}$. Hence, it follows from (6.3.15) that
\[ M_0 = \gamma_1 m_{01} + \gamma_2 m_{02} > m_{01} + m_{02} . \tag{6.3.16} \]
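The inequality (6.3.16) can be made concrete with a small numerical sketch. The values below (two equal pieces flying apart at equal speeds, units $c = 1$) are assumptions chosen so that the total 3-momentum stays zero; the variable names are hypothetical.

```python
import math

def gamma(u, c=1.0):
    """Lorentz factor for speed u."""
    return 1.0 / math.sqrt(1.0 - (u / c) ** 2)

m01 = m02 = 1.0   # assumed rest masses of the two pieces
u1 = u2 = 0.6     # assumed speeds, in opposite directions (c = 1)

M0 = gamma(u1) * m01 + gamma(u2) * m02            # (6.3.16)
rest_mass_decrease = M0 - (m01 + m02)
kinetic_energy = (gamma(u1) - 1) * m01 + (gamma(u2) - 1) * m02
total_momentum = gamma(u1) * m01 * u1 - gamma(u2) * m02 * u2

print(M0, rest_mass_decrease, kinetic_energy, total_momentum)
```

In this sketch the decrease of the total rest mass equals the kinetic energy carried by the pieces, in units with $c = 1$.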
3 However, make sure not to regard this “law of the conservation of mass” as the same as the one
in Newtonian mechanics. The former is a conservation law of a physical quantity (the relativistic
mass), while the latter reflects the following tenet of Newton: matter can neither be created nor
destroyed. From today’s vantage point, this tenet is not quite true, since matter can be “destroyed”
— it can be turned into radiation, even though the energy does not change. Thus, energy is conserved
while matter is not.
where q is the electric charge of the point mass, u is the 3-velocity, E and B are the
electric field and magnetic field, respectively.
Remark 1 ① The $\gamma$ here is short for $\gamma_u \equiv (1 - u^2)^{-1/2}$, while the $\gamma$ in the Lorentz
transformation (6.1.5) stands for $(1 - v^2)^{-1/2}$, where $v$ is the relative speed between
two inertial frames, and u is the velocity of a particle with respect to the chosen
inertial frame. ② The transformations of coordinate systems are frequently involved
in relativity, and therefore people often use the term “invariant”. Note that “invariant”
and “conserved quantity” are two different concepts. A conserved quantity is a
quantity whose value remains a constant (does not change with time) in a physical
process; an invariant is a quantity that does not change with human factors such as a
coordinate system, reference frame, or observer. The former emphasizes the physical
process, and the latter emphasizes the transformation of the coordinate system, etc.
Energy is a conserved quantity rather than an invariant; (rest) mass is an invariant
rather than a conserved quantity; the electric charge of a charged particle is both an
invariant and a conserved quantity.
This is the 3-dimensional formulation based on a specific inertial frame. Now we
will introduce the 4-dimensional formulation, as well as the relationship between the
3- and 4-dimensional languages.
Proof The proper time is the arc length parameter of a timelike curve, and the tangent
vector of a curve whose parameter is the arc length has unit length (see Sect. 2.5).
discussion a little complicated. (On the theoretical side, the shape of an object moving
at high speed is a problem in this category; on the practical side, all astronomical
observations are indirect measurements). The simplest, clearest, and most basic kind
of measurement is a direct measurement, i.e., the measurement of an event happening
on the observer’s world line, also called a local measurement. Luckily, a reference
frame is formed by ubiquitous observers, and the events happening elsewhere can
just be measured by another observer. ② When you measure an event happening at a
point p on your world line, in many cases what is important is just the 4-velocity at
p but not the whole world line. Then, there is no need to emphasize the world line of
the observer, and one only needs to know the tangent vector Z a of this world line at
p. Hence, we can extract a more abstract concept, called an instantaneous observer
[see Sachs and Wu (1977)], which contains two key elements, namely the point p and
a (future-directed) timelike unit vector Z a at p, together denoted by ( p, Z a ). ③ You,
as an observer, have a sense of spatial direction besides a sense of time (from your
standard clock). Assume you hold an arrow in your hand, and any direction it points
to represents a spatial direction you can perceive. The collection of all the directions
you can perceive at $p$ (a point on your world line $G$) is of course a 3-dimensional set
$W_p$, while for $p$ as a point in $\mathbb{R}^4$, its tangent space $V_p$ is 4-dimensional. What is the
relationship between $W_p$ and $V_p$? First let us consider the simplest case. Suppose
you are an inertial observer in an inertial frame R. The surface of simultaneity of R
is the 3-dimensional space of R at a certain time, which is orthogonal to the world
lines of all the inertial observers in this frame, and thus all the spatial vectors you
have at $p$ are orthogonal to your 4-velocity $Z^a$ at $p$. Therefore, $W_p$ corresponds to
the 3-dimensional subspace of $V_p$ orthogonal to $Z^a$, i.e.,
\[ W_p = \{ w^a \in V_p \mid \eta_{ab} w^a Z^b = 0 \} . \]
This correspondence also applies to non-inertial observers, since we only care about
the situation at one point p on the world line of the observer.
In Fig. 6.26, $W_p$ is represented as a small plane, but it is actually an “infinitesimal”
plane. The most precise interpretation of $W_p$ is a subspace of the tangent space
at $p$, which in the figure can only be drawn as a small plane. Suppose $w^a \in V_p$; when
$w^a \in W_p$, we say that $w^a$ is a spatial vector for the observer $G$. A (nonzero) spatial
vector must be a spacelike vector, but the converse is not true. From the definition
we can see that a spacelike vector is absolute (does not depend on factors such as
the observer, reference frame or coordinate system), while a spatial vector is
relative (depends on the 4-velocity $Z^a$ of the observer). It follows from (4.4.2) that
the induced metric of $\eta_{ab}$ on $W_p$ at $p$ is $h_{ab} = \eta_{ab} + Z_a Z_b$, and from the paragraph
below (4.4.4) we know that $h^a{}_b = \delta^a{}_b + Z^a Z_b$ is the projection map from $V_p$ to $W_p$,
i.e., $h^a{}_b v^b \in W_p$ is the projection of $v^a \in V_p$ onto $W_p$.
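The difference between "spacelike" and "spatial" can be illustrated numerically. The sketch below is a minimal illustration in inertial components with signature $(-,+,+,+)$; the helper names `dot` and `project` and the sample observer are assumptions of this example. It applies the projection map $h^a{}_b v^b = v^a + Z^a (Z_b v^b)$ to a vector that is spacelike but not spatial for the chosen observer.

```python
import math

def dot(a, b):
    """Minkowski inner product eta_ab a^a b^b with signature (-,+,+,+)."""
    return -a[0] * b[0] + sum(x * y for x, y in zip(a[1:], b[1:]))

def project(v, Z):
    """Spatial projection h^a_b v^b = v^a + Z^a (Z_b v^b) for a unit timelike Z^a."""
    zv = dot(Z, v)
    return [vi + zi * zv for vi, zi in zip(v, Z)]

beta = 0.6                                  # assumed speed of the observer
gamma = 1.0 / math.sqrt(1.0 - beta**2)
Z = [gamma, gamma * beta, 0.0, 0.0]         # 4-velocity with dot(Z, Z) = -1

w = [0.0, 1.0, 0.0, 0.0]                    # spacelike, but not spatial for Z
w_spatial = project(w, Z)                   # spatial for Z

print(dot(Z, w), dot(Z, w_spatial))
```

Projecting a second time leaves `w_spatial` unchanged, as expected of a projection map.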
Suppose the world line L of a point mass and the world line G of an observer
intersect at p, let us discuss the 3-velocity of L relative to G at p. First we discuss
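the case where $L(\tau)$ and $G$ are both geodesics. Let $U^a$ and $Z^a$ be, respectively, the
4-velocities of $L(\tau)$ and $G$ at $p$ (see Fig. 6.27), and $\{t, x^i\}$ be the coordinates of the
inertial frame that the inertial observer $G$ belongs to. Then,
\[ U^a = \left( \frac{\partial}{\partial \tau} \right)^a = \left( \frac{\partial}{\partial t} \right)^a \frac{dt}{d\tau} + \left( \frac{\partial}{\partial x^i} \right)^a \frac{dx^i}{d\tau} , \tag{6.3.25} \]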
Suppose $p = L(\tau_1)$, and let $q \equiv L(\tau_1 + d\tau)$, then the geodesic segment $pq$ represents
the “infinitesimal” process of the point mass from the proper time $\tau_1$ to $\tau_1 + d\tau$.
For the observer $G$, the time of this process would be $dt$ in (6.3.26), and the corresponding
spatial displacement is $(\partial/\partial x^i)^a \, dx^i$. Hence, the 3-velocity of a point mass
$L$ relative to $G$ (also called the 3-velocity of $L$ measured by $G$) should be defined
as
\[ u^a := \left( \frac{\partial}{\partial x^i} \right)^a \frac{dx^i}{dt} = \left( \frac{\partial}{\partial x^i} \right)^a \frac{dx^i/d\tau}{dt/d\tau} . \tag{6.3.27} \]
It follows from (6.3.25) that $(\partial/\partial x^i)^a \, dx^i/d\tau$ is the spatial projection of $U^a$, i.e., $h^a{}_b U^b$.
Now if we set $\gamma \equiv dt/d\tau$, then (6.3.27) can be rewritten as
\[ u^a := \frac{h^a{}_b U^b}{\gamma} . \tag{6.3.28} \]
\[ \gamma = -U^a Z_a , \tag{6.3.29} \]
Remark 3 ① It is easy to see that the 3-velocity $u^a$ is a spatial vector of the observer
$G$ at $p$ (and thus can be denoted by $\mathbf{u}$). This is the most basic requirement for $u^a$:
since the 3-velocity is a vector in the 3-dimensional language (called a 3-vector), of
course it should be a spatial vector. ② Although we have used a coordinate system in
the discussion above, the definition (6.3.28) of $u^a$ is independent of the coordinate
system. ③ Suppose R is an inertial reference frame that the inertial observer $G$
belongs to, then the $u^a$ in (6.3.28) is also called the 3-velocity of the point mass $L$
at $p$ relative to R. Suppose $\{t, x^i\}$ is an arbitrary inertial coordinate system in R,
then it follows from (6.3.27) that the components of the 3-velocity in this system are
$u^i = dx^i/dt$. Note that the components of $\mathbf{u}$ defined by (6.3.17) are also $u^i = dx^i/dt$,
and thus this agrees with the definition of $u^a$ in (6.3.28). ④ A 3-vector (e.g., $u^a$) at
any point $p$ in the 4-dimensional spacetime is an element of $V_p$, and hence is also a
4-vector, only with a vanishing time component $u^0$.
Since (6.3.28) only involves the tangent space of p (only involves an “infinites-
imal” neighborhood of p), it also applies to the cases where L(τ ) and G are not
geodesics, and therefore we have the following definitions:
Definition 2 Suppose $L(\tau)$ is the world line of an arbitrary point mass, and $p \in L$, then the 3-velocity
$u^a$ of the point mass relative to any instantaneous observer $(p, Z^a)$ is defined by
(6.3.28), where $h_{ab} = \eta_{ab} + Z_a Z_b$, and $\gamma \equiv -U^a Z_a$.
Definition 3 The magnitude $u = \sqrt{u^a u_a}$ of the 3-velocity vector $u^a$ of a point mass
with respect to an instantaneous observer is called the 3-speed of the point mass with
respect to this instantaneous observer, where $u_a := \eta_{ab} u^b = h_{ab} u^b$.
the timelike and spacelike cases, $\tau$ represents the arc length, and for the null case, $\tau$
represents an arbitrary parameter. Let $U^a \equiv (\partial/\partial\tau)^a$, we still use (6.3.28) to define
$u^a$. Then,
\[ \begin{aligned} u^2 = h_{ab} u^a u^b &= h_{ab} \frac{(h^a{}_c U^c)(h^b{}_d U^d)}{\gamma^2} = h_{cd} \frac{U^c U^d}{\gamma^2} \\ &= \frac{\eta_{cd} U^c U^d + Z_c Z_d U^c U^d}{\gamma^2} = \frac{\eta_{cd} U^c U^d + \gamma^2}{\gamma^2} . \end{aligned} \]
\[ U^a = \gamma (Z^a + u^a) , \tag{6.3.30} \]
where $u^a$ is the 3-velocity of the point mass relative to the instantaneous observer,
and $\gamma \equiv -Z^a U_a$.
\[ \gamma u^a = h^a{}_b U^b = (\delta^a{}_b + Z^a Z_b) U^b = U^a - \gamma Z^a , \]
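The definition (6.3.28) and the decomposition (6.3.30) can be verified numerically for an observer that is not at rest in the chosen coordinates. The sketch below is only an illustration: the helper names and the two sample speeds (both along the $x$-axis of one inertial coordinate system, with $c = 1$) are assumptions.

```python
import math

def dot(a, b):
    """Minkowski inner product with signature (-,+,+,+)."""
    return -a[0] * b[0] + sum(x * y for x, y in zip(a[1:], b[1:]))

def four_velocity(beta):
    """Unit timelike vector for motion along x with speed beta (c = 1)."""
    g = 1.0 / math.sqrt(1.0 - beta**2)
    return [g, g * beta, 0.0, 0.0]

Z = four_velocity(0.5)    # instantaneous observer (assumed speed 0.5)
U = four_velocity(0.8)    # point mass (assumed speed 0.8)

gamma = -dot(U, Z)                                  # gamma = -U^a Z_a
hU = [Ui + Zi * dot(Z, U) for Ui, Zi in zip(U, Z)]  # h^a_b U^b
u = [x / gamma for x in hU]                         # (6.3.28)
speed = math.sqrt(dot(u, u))                        # 3-speed (Definition 3)

U_rebuilt = [gamma * (Zi + ui) for Zi, ui in zip(Z, u)]   # (6.3.30)
print(gamma, speed)
```

For these sample values the 3-speed agrees with the familiar velocity-addition result $(0.8 - 0.5)/(1 - 0.8 \times 0.5) = 0.5$.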
Definition 4 Suppose the (rest) mass of a point mass is $m$, and the 4-velocity is $U^a$,
then the 4-momentum $P^a$ of the point mass is defined as
\[ P^a := m U^a . \tag{6.3.31} \]
\[ P^a = E Z^a + p^a , \tag{6.3.32} \]
where the energy $E$ and the 3-momentum $p^a$ are defined by (6.3.20) and (6.3.19).
\[ P^a = m U^a = m(\gamma Z^a + \gamma u^a) = E Z^a + p^a . \]
Remark 7 Equation (6.3.32) indicates that the 3-momentum $p^a$ and the energy $E$ are
respectively the spatial and time components of the 4-momentum $P^a$, the latter of
which can be expressed as
\[ E = -P^a Z_a . \tag{6.3.33} \]
[This can be easily seen by contracting $Z_a$ with (6.3.32).] The concept of the
4-momentum of a point mass unifies two different concepts, the energy and momentum
of a point mass, organically into one physical quantity, which is independent
of the observer ($P^a$ is absolute). However, the way of decomposing $P^a$ into time
and spatial components depends on the observer, and thus is relative. If there is no
observer making a local measurement, then the 4-momentum still exists objectively,
but the energy and 3-momentum are meaningless. Now we can further understand
why most modern literature only uses the (rest) mass m and (total) energy E—they
are two fundamentally different types of quantity. The mass m of a point mass (e.g.,
an electron) is an invariant (just like its electric charge), which reflects an intrinsic
property of a point mass. The energy E of a point mass depends on the observer (and
thus is not an invariant). The energy measured by an instantaneous rest observer is
the rest energy; although it has the same value as the mass, they are not quantities
of the same type [mass is an invariant, while the rest energy is a special case of an
observer dependent quantity (energy)].
Remark 8 It is easy to derive the relation of mass, energy and 3-momentum from
(6.3.32) as follows:
\[ P^a P_a = (E Z^a + p^a)(E Z_a + p_a) = -E^2 + p^2 , \]
where $p$ stands for the magnitude of the 3-momentum. On the other hand, $P^a P_a = m U^a \, m U_a = -m^2$, and therefore
\[ E^2 = m^2 + p^2 , \tag{6.3.34} \]
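The observer dependence of $E$ and $p^a$, together with the invariance of $m$, can be illustrated numerically. In the sketch below the helper names `dot` and `measure` and the sample values are assumptions made for this example; the same 4-momentum is decomposed by two different instantaneous observers according to (6.3.32) and (6.3.33).

```python
import math

def dot(a, b):
    """Minkowski inner product with signature (-,+,+,+)."""
    return -a[0] * b[0] + sum(x * y for x, y in zip(a[1:], b[1:]))

def measure(P, Z):
    """Return (E, p^a) with E = -P^a Z_a and p^a = P^a - E Z^a."""
    E = -dot(P, Z)
    p = [Pi - E * Zi for Pi, Zi in zip(P, Z)]
    return E, p

m = 2.0                                   # assumed rest mass (c = 1)
U = [1.0, 0.0, 0.0, 0.0]                  # particle at rest in these coordinates
P = [m * x for x in U]                    # (6.3.31)

beta = 0.6                                # assumed speed of the second observer
gamma = 1.0 / math.sqrt(1.0 - beta**2)
Z_rest = [1.0, 0.0, 0.0, 0.0]
Z_moving = [gamma, gamma * beta, 0.0, 0.0]

E_rest, p_rest = measure(P, Z_rest)
E_moving, p_moving = measure(P, Z_moving)
print(E_rest, E_moving, dot(p_moving, p_moving))
```

Both observers obtain the same $P^a P_a = -m^2$, while each finds an energy and a 3-momentum satisfying (6.3.34).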
\[ A^a := U^b \partial_b U^a , \tag{6.3.35} \]
where $U^a$ is the 4-velocity of the point mass, and $\partial_b$ is the derivative operator associated
with $\eta_{ab}$ (i.e., $\partial_a \eta_{bc} = 0$).
Proposition 6.3.4 The 4-acceleration $A^a$ at each point on the world line of a point
mass is orthogonal to the 4-velocity $U^a$, i.e., $A^a U_a = \eta_{ab} A^a U^b = 0$.
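One standard way to see this (a sketch added for illustration, which need not coincide with the proof given in the book) uses only the normalization $U^a U_a = -1$ of the 4-velocity and $\partial_a \eta_{bc} = 0$:

```latex
0 = U^b \partial_b (U^a U_a)
  = U^b \partial_b (\eta_{ac} U^a U^c)
  = 2\, \eta_{ac} U^c \, U^b \partial_b U^a
  = 2\, U_a A^a .
```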
Definition 6 Suppose the parametric equations of the world line $L(\tau)$ of a point
mass in an inertial coordinate system $\{t, x^i\}$ are $t = t(\tau)$, $x^i = x^i(\tau)$, then its
3-acceleration relative to this system is defined as
\[ a^a := \frac{d^2 x^i(t)}{dt^2} \left( \frac{\partial}{\partial x^i} \right)^a , \tag{6.3.36} \]
where $\mathbf{u}$ and $\mathbf{a}$ are respectively the 3-velocity and 3-acceleration of the point mass
relative to R, $\gamma \equiv (1 - u^2)^{-1/2}$, and $u \equiv (\mathbf{u} \cdot \mathbf{u})^{1/2}$.
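Proof Suppose $\{(dx^\mu)_a\}$ is the dual coordinate basis of the frame R, then it follows
from the definition of $A^a$ that
\[ A^0 = \gamma \frac{dU^0}{dt} = \gamma \frac{d\gamma}{dt} , \]
\[ A^i = \gamma \frac{dU^i}{dt} = \gamma \frac{d(\gamma u^i)}{dt} = \gamma^2 \frac{du^i}{dt} + u^i \gamma \frac{d\gamma}{dt} = \gamma^2 a^i + u^i \gamma \frac{d\gamma}{dt} . \]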
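The component formulas above can be checked on a world line whose 4-acceleration is known in closed form. The sketch below uses hyperbolic motion, $t = \sinh(g\tau)/g$, $x = \cosh(g\tau)/g$, as an assumed example (it is not taken from the text), with $c = 1$ and hypothetical sample values for `g` and `tau`. It compares the 4-acceleration obtained directly from $dU^a/d\tau$ with the one assembled from the 3-dimensional quantities $\gamma$, $u$ and $a$.

```python
import math

g = 2.0      # assumed constant proper acceleration (c = 1)
tau = 0.3    # assumed proper time at which the check is made

# Hyperbolic world line t = sinh(g tau)/g, x = cosh(g tau)/g.
U = (math.cosh(g * tau), math.sinh(g * tau))           # (U^0, U^1)
A = (g * math.sinh(g * tau), g * math.cosh(g * tau))   # (A^0, A^1) = dU/dtau

def dot(a, b):
    """Minkowski inner product in 1+1 dimensions, signature (-,+)."""
    return -a[0] * b[0] + a[1] * b[1]

# 3-dimensional quantities relative to the inertial frame {t, x}.
gamma = U[0]                                 # gamma = dt/dtau
u = U[1] / U[0]                              # u = dx/dt = tanh(g tau)
a = g / gamma**3                             # a = du/dt for this world line
dgamma_dt = g * math.sinh(g * tau) / gamma   # (dgamma/dtau) / (dt/dtau)

A0_from_3d = gamma * dgamma_dt
A1_from_3d = gamma**2 * a + u * gamma * dgamma_dt
print(A, (A0_from_3d, A1_from_3d), dot(A, U))
```

The same numbers also illustrate Proposition 6.3.4, since the inner product of `A` and `U` vanishes.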
Remark 11 For a free point mass we have $A^a = 0$, and from (6.3.37) we can see that
its 3-acceleration relative to any inertial frame is $a^a = 0$.
\[ F^a := U^b \partial_b P^a , \tag{6.3.38} \]
where $U^a$ and $P^a$ are the 4-velocity and 4-momentum of the point mass, respectively.
Equation (6.3.38) is also called (the 4-dimensional expression of) the relativistic
equation of motion for a point mass, but actually it is just the definition of the 4-force.
To obtain the real physical laws, one also needs to combine (6.3.38) with the specific
expression of the 4-force in each specific case.
Remark 12 In this section, we only care about the case where the (rest) mass $m$ of
the point mass remains a constant ($dm/d\tau = 0$). In this case plugging $P^a = mU^a$
into (6.3.38) we obtain $F^a = m A^a$. However, if $m$ is changing during the motion
($dm/d\tau \neq 0$), then this conclusion does not hold, see Optional Reading 6.3.
\[ F^i = \gamma f^i , \qquad F^0 = \gamma \, \mathbf{f} \cdot \mathbf{u} , \tag{6.3.39} \]
where $\gamma \equiv (1 - u^2)^{-1/2}$, and $u$ is the magnitude of the 3-velocity $\mathbf{u}$ of the point mass
with respect to this system.
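Take the $i$th component. It follows from (6.3.32) and (6.3.19) that
\[ F^i = U^b \partial_b P^i = \frac{dp^i}{d\tau} = \frac{dp^i}{dt} \frac{dt}{d\tau} = \gamma f^i . \]
Now take the 0th component. It follows from (6.3.32) and (6.3.20) that
\[ F^0 = U^b \partial_b P^0 = U^b \partial_b E = \frac{dE}{d\tau} = \frac{dE}{dt} \frac{dt}{d\tau} = \gamma \frac{dE}{dt} = \gamma \, \mathbf{f} \cdot \mathbf{u} . \]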
[Optional Reading 6.3.2]
So far we only discussed the case where the (rest) mass $m$ of the point mass is a constant
($dm/d\tau = 0$), but more generally, $m$ may change in the motion, i.e., $dm/d\tau \neq 0$. For instance,
consider a resistor in a DC circuit at rest in an inertial frame R. The Joule heat (which is
also a form of energy) caused by the current makes the rest energy $mc^2$ of the resistor
increase, and thus $dm/d\tau > 0$. In the case where $dm/d\tau \neq 0$, some previous conclusions
need to be modified. For example,
(1) Although f · u can still be called the power of the 3-force (there are also people who
think it is improper to call it so), it is not equal to the rate of change of the total energy any
more. The relation between them is now
\[ \mathbf{f} \cdot \mathbf{u} = \frac{dE}{dt} - \frac{c^2}{\gamma} \frac{dm}{dt} . \tag{6.3.40} \]
(2) The kinetic energy should be defined as the difference between the total energy $\gamma m c^2$
and the rest energy $m c^2$, i.e., $E_k = (\gamma - 1) m c^2$, which does not satisfy $\mathbf{f} \cdot \mathbf{u} = dE_k/dt$. In
fact, if $dm/d\tau \neq 0$ then $\mathbf{f} \cdot \mathbf{u}$ is equal to neither $dE/dt$ nor $dE_k/dt$.
(3) The 4-force is still defined as $U^b \partial_b P^a$; however, $F^a \neq m A^a$.
(4) Proposition 6.3.7 now should be stated as
\[ F^i = \gamma f^i , \qquad F^0 = \gamma \frac{dE}{dt} \ (\neq \gamma \, \mathbf{f} \cdot \mathbf{u}) . \tag{6.3.41} \]
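Relation (6.3.40) in item (1) can be checked directly. The following is a short derivation sketch added for illustration, assuming only $E = \gamma m c^2$, $\mathbf{p} = \gamma m \mathbf{u}$, $\mathbf{f} = d\mathbf{p}/dt$ and the identity $d\gamma/dt = (\gamma^3/c^2)\, \mathbf{u} \cdot d\mathbf{u}/dt$:

```latex
\begin{aligned}
\mathbf{f} \cdot \mathbf{u}
  &= \mathbf{u} \cdot \frac{d(\gamma m \mathbf{u})}{dt}
   = u^2 \frac{d(\gamma m)}{dt} + \gamma m \, \mathbf{u} \cdot \frac{d\mathbf{u}}{dt}
   = u^2 \frac{d(\gamma m)}{dt} + \frac{c^2}{\gamma^2} \, m \frac{d\gamma}{dt} \\
  &= u^2 \frac{d(\gamma m)}{dt} + \frac{c^2}{\gamma^2} \left( \frac{d(\gamma m)}{dt} - \gamma \frac{dm}{dt} \right)
   = \left( u^2 + \frac{c^2}{\gamma^2} \right) \frac{d(\gamma m)}{dt} - \frac{c^2}{\gamma} \frac{dm}{dt} \\
  &= c^2 \frac{d(\gamma m)}{dt} - \frac{c^2}{\gamma} \frac{dm}{dt}
   = \frac{dE}{dt} - \frac{c^2}{\gamma} \frac{dm}{dt} .
\end{aligned}
```

The last line uses $u^2 + c^2/\gamma^2 = c^2$. When $dm/dt = 0$ the second term drops out and the constant-mass result $\mathbf{f} \cdot \mathbf{u} = dE/dt$ is recovered.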
When discussing media that are continuously distributed (gases, liquids, solids,
plasma, etc.), we care not about the behavior of any specific particle, but about the
statistical average over all of the particles. We are interested in the energy/momentum
density and energy/momentum flux density, etc. at each point of space rather than the
energy and momentum of any individual particle. Thus, a continuous medium is similar
to an electromagnetic field in many aspects, and we call it a matter field. Suppose
$m$ is the rest mass in a macroscopically small volume $V$, the content of which has a
3-velocity $\mathbf{u}$ relative to an inertial frame, then its 3-momentum is $\mathbf{p} = \gamma m \mathbf{u} = (E/c^2)\mathbf{u}$,
where $E$ is its energy and the meaning of $\gamma$ is self-evident. Dividing the whole equation
by $V$ yields
\[ \text{3-momentum density} = \frac{1}{c^2} \, \text{energy density} \times \mathbf{u} = \frac{1}{c^2} \, \text{energy flux density} . \tag{6.4.1} \]
6.4 The Energy-Momentum Tensor of Continuous Media 207
and thus T̂ ab should be interpreted as the 3-stress tensor. On the other hand,
and hence
Also, a force is nothing but the rate of change of the 3-momentum of the object which the
force acts on, and the interaction between them is nothing but exchanging their 3-momenta.
Thus,
The (ei )a in the equation above can be the unit vector of any spatial direction, and so
this equation indicates that the 3-momentum flux density along any spatial direction can
be obtained by contracting T̂ ab with the unit vector of this direction. Therefore, T̂ ab ≡
T i j (ei )a (e j )b can be interpreted (called) as the 3-momentum flux density tensor.
[The End of Optional Reading 6.4.1]
\[ W^a = \mu Z^a + w^a , \tag{6.4.4} \]
where $\mu$ and $w^a \equiv w^i (e_i)^a$ are respectively the energy density and 3-momentum
density measured by this observer, the latter of which is a spatial vector of this
observer.
Remark 1 Equations (6.4.4) and (6.3.32) are very similar: the left-hand side of the
latter is the 4-momentum $P^a$, and the left-hand side of the former is the 4-momentum
density $W^a$. Both equations are the 3 + 1 decomposition of a 4-vector. However, one
should notice a difference: the 4-momentum $P^a$ is independent of the observer, while
the 4-momentum density $W^a$ depends on the observer (from Definition 1 one can
see that $W^a$ is a 4-vector that depends on the observer).
Proof Suppose $t, x, y, z$ are the coordinates for an inertial frame R, and let $Z^a \equiv (\partial/\partial t)^a$.
Then taking the derivative of $W^a \equiv -T^a{}_b Z^b$ yields
\[ \partial_a W^a = \partial_a (-T^a{}_b Z^b) = -Z^b \partial^a T_{ab} - T^a{}_b \partial_a Z^b . \]
The first term on the right-hand side of the above equation vanishes (since $\partial^a T_{ab} = 0$),
as does the second term [since $\partial_a Z^b = \partial_a (\partial/\partial t)^b = 0$], and hence
\[ \partial_a W^a = 0 . \tag{6.4.5} \]
Therefore,
\[ 0 = \partial_\mu W^\mu = \partial_0 W^0 + \partial_i W^i = \partial_0 \mu + \partial_i w^i = \frac{\partial \mu}{\partial t} + \nabla \cdot \mathbf{w} . \tag{6.4.6} \]
Since $\mu$ and $w^a$ are respectively the energy density and energy flux density measured
by the frame R, the equation above looks quite like the continuity equation $(\partial\rho/\partial t) + \nabla \cdot \mathbf{j} = 0$
in electrodynamics. Following the reasoning of the conservation of the
electric charge from the latter, one can deduce that (6.4.6) leads to the conservation
of energy.
Remark 2 One can also derive the conservation of 3-dimensional momentum and
angular momentum from $\partial^a T_{ab} = 0$, and thus $\partial^a T_{ab} = 0$ is also called the
conservation equation.
[Optional Reading 6.4.2]
The conservation of energy can also be derived directly from (6.4.5) using the
4-dimensional version of Gauss’s theorem as follows: let $\Omega$ be the 4-dimensional “cuboid”
bounded by several hypersurfaces (3-dimensional!) in $\mathbb{R}^4$ (see Fig. 6.33; one dimension is suppressed
in the figure), i.e., (a segment of) the world tube of the 3-dimensional rectangular box $\omega$
(shown in Fig. 6.34). It follows from Gauss’s theorem and (6.4.5) that
$$0 = \int_{\partial\Omega} W^a n_a = \int_{\sigma_1} W^a n_a + \int_{\sigma_2} W^a n_a + \int_{\Sigma} W^a n_a. \tag{6.4.7}$$
$\sigma_1$ and $\sigma_2$ are the “upper and lower bases” of $\Omega$, and $\Sigma$ represents all of the “sides” of $\Omega$.
Noticing the requirement on the direction of the normal vector in (5.5.7′), we can see that
the normal vectors of $\sigma_1$, $\sigma_2$ and $\Sigma_1$ (one of the side surfaces) are in the directions shown in
Fig. 6.33. Thus,
210 6 Special Relativity
$$\int_{\sigma_1} W^a n_a = \int_{\sigma_1}(\mu Z^a + w^a)n_a = \int_{\sigma_1}\mu Z^a Z_a = -\int_{\sigma_1}\mu = -E_1 = -(\text{the energy of the 3d box } \omega \text{ at } t_1),$$
where $\hat{\varepsilon}$ is the 3-dimensional volume element induced by the 4-dimensional volume element
$\varepsilon = dt\wedge dx\wedge dy\wedge dz$ on $\Sigma_1$, i.e.,
$$\hat{\varepsilon}_{abc} = (\partial/\partial x)^d\,(dt)_d\wedge(dx)_a\wedge(dy)_b\wedge(dz)_c = -(dt)_a\wedge(dy)_b\wedge(dz)_c.$$
Hence,
$$\int_{\Sigma_1} W^a n_a = -\int_{\Sigma_1} w^1\,dt\wedge dy\wedge dz = \int_{\Sigma_1} w^1\,dt\,dy\,dz = \int_{t_1}^{t_2}\left(\int_{y_1}^{y_2}\!\!\int_{z_1}^{z_2} w^1\,dy\,dz\right)dt, \tag{6.4.8}$$
where the minus sign is dropped in the second equality because $\{t, y, z\}$ is a left-handed
coordinate system as measured by $\hat{\varepsilon} = -dt\wedge dy\wedge dz$. Recalling that $w^1$ is the energy flux
density along the direction of $(\partial/\partial x)^a$, we can see that $\left(\int_{y_1}^{y_2}\!\int_{z_1}^{z_2} w^1\,dy\,dz\right)dt$ is the energy
flowing out of the side wall $S_1$ of $\omega$ within a time $dt$, and hence the right-hand side of
(6.4.8) is the energy flowing out of $\omega$ from the side wall $S_1$ (see Fig. 6.34) in a time $t_2 - t_1$,
and $-\int_{\Sigma} W^a n_a$ is the energy flowing into $\omega$ from all the side walls in $t_2 - t_1$, i.e., the energy
increase in this period of time. Therefore, (6.4.7) indicates that:
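which, it seems, should be expressed according to (5.5.6) as $\hat{\varepsilon}_{abc} = n^d\varepsilon_{dabc}$. However, the $n^a$
in (5.5.6) is an outgoing unit normal vector, which differs from the $n^a$ here (see Fig. 6.33)
by a minus sign, and thus $\hat{\varepsilon}$ should be expressed using the $n^a$ here as
$$\hat{\varepsilon}_{abc} = -n^d\varepsilon_{dabc} = -(dx)_a\wedge(dy)_b\wedge(dz)_c.$$
This indicates that the coordinate system $\{x, y, z\}$ on $\sigma_1$ is left-handed as measured by $\hat{\varepsilon}$.
Therefore,
$$\int_{\sigma_1}(W^a n_a)\hat{\varepsilon} = -\int_{\sigma_1}\mu\hat{\varepsilon} = \int_{\sigma_1}\mu\,(dx)_a\wedge(dy)_b\wedge(dz)_c = -\int_{\sigma_1}\mu\,dx\,dy\,dz = -E_1.$$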
6.5 Perfect Fluid Dynamics 211
[Since $\{x, y, z\}$ is a left-handed system, we used (5.2.6) in the third equality.] Although the
conclusion is still $\int_{\sigma_1} W^a n_a = -E_1$, one should note that two minus signs show up
and cancel each other, which is what ensures the same result.
[The End of Optional Reading 6.4.2]
where $\mu$ and $p$ are functions (scalar fields), and $U^a$ is a future-directed timelike vector
field which satisfies $U^a U_a = -1$, called the 4-velocity field of the perfect fluid.
A fluid itself can be viewed as a reference frame. Suppose the 4-velocity $(e_0)^a$
of an instantaneous observer $(p, (e_\mu)^a)$ satisfies $(e_0)^a = U^a|_p$, then this observer
is at rest relative to the fluid reference frame, and thus is called an instantaneous
rest observer. However, to another reference frame, this observer moves with the
fluid, and hence $(p, U^a|_p)$ is also called an instantaneous comoving observer. For
a comoving observer,
Thus, the 3-dimensional stress tensor measured by a comoving observer has the
matrix form
$$\begin{pmatrix} p & 0 & 0 \\ 0 & p & 0 \\ 0 & 0 & p \end{pmatrix},$$
i.e., there is only pressure but no shear stress (which is exactly an important property
of a perfect fluid⁴). From $T_{11} = T_{22} = T_{33} = p$ and the arbitrariness of the triad of a
comoving observer we can see that a perfect fluid is isotropic.⁵ Also, $T_{ab}(e_0)^a(e_i)^b = 0$
indicates that the energy flux density measured by a comoving observer is zero, and
thus there is no thermal conduction. All of these are important properties of a perfect
fluid.
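The properties just listed can be checked numerically. The sketch below assumes that (6.5.1), which is not reproduced in this passage, has the standard form $T_{ab} = \mu U_a U_b + p(\eta_{ab} + U_a U_b)$, and uses the convention $w_i = -T_{i0}$ for the energy flux density; the function name `perfect_fluid_tensor` and the numerical values are illustrative only.

```python
import math

# Minkowski metric eta_ab with signature (-, +, +, +), units with c = 1.
ETA = [[-1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

def perfect_fluid_tensor(mu, p, U):
    """T_ab = (mu + p) U_a U_b + p eta_ab, the ASSUMED standard form of (6.5.1)."""
    U_low = [sum(ETA[a][b] * U[b] for b in range(4)) for a in range(4)]  # U_a
    return [[(mu + p) * U_low[a] * U_low[b] + p * ETA[a][b] for b in range(4)]
            for a in range(4)]

mu, p = 2.0, 0.5  # illustrative values, not from the text

# Comoving observer: the fluid 4-velocity coincides with (e_0)^a.
T_rest = perfect_fluid_tensor(mu, p, [1.0, 0.0, 0.0, 0.0])
energy_density = T_rest[0][0]
energy_flux = [-T_rest[i][0] for i in (1, 2, 3)]           # w_i = -T_i0
stress = [[T_rest[i][j] for j in (1, 2, 3)] for i in (1, 2, 3)]

# Observer who sees the fluid move with 3-velocity v along x.
v = 0.6
gamma = 1.0 / math.sqrt(1.0 - v * v)
T_moving = perfect_fluid_tensor(mu, p, [gamma, gamma * v, 0.0, 0.0])
flux_x_moving = -T_moving[1][0]
expected_flux_x = (mu + p) * gamma ** 2 * v

print(energy_density, energy_flux, stress, flux_x_moving)
```

For the comoving observer the stress block is $p$ times the identity and the energy flux vanishes, while a non-comoving observer measures a nonzero flux $(\mu + p)\gamma^2 v$ under the same assumed form.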
It is necessary to give an explanation of the physical meaning of the 4-velocity
field $U^a$. A perfect fluid is a continuous medium, which is a model obtained from the
statistical average over the microscopic discrete structure of the particles. Usually,
a fluid volume element that is large enough microscopically while small enough
macroscopically is called a fluid particle or fluid point mass [see Landau and
Lifshitz (1987) p. 1; Zhou et al. (2000) pp. 15–17]. The $U^a$ in (6.5.1) is the vector
field formed by the 4-velocity of all fluid particles. A comoving observer is the
observer at rest relative to a fluid particle, and a comoving reference frame (rest
reference frame) is the reference frame of the observers whose 4-velocity field is $U^a$.
One should note the conceptual difference between fluid particles and the microscopic
particles that form a fluid. This difference is especially prominent for an ideal gas
(which is an example of a perfect fluid). Due to frequent collisions, the world lines
of the gas molecules intersect a lot. Since the 4-velocity of a molecule has a sudden
change during a collision, the world lines of the molecules are significantly distinct
from Fig. 6.35, and so one should not treat the $U^a$ in (6.5.1) as the 4-velocity of a specific
molecule. In fact, we have already taken the statistical average over the microscopic
motion of the molecules when we regard an ideal gas as a perfect fluid, and $U^a$
is the 4-velocity field after the average. Consider a box at rest in an inertial frame
$\{t, x, y, z\}$, which contains an ideal gas in thermal equilibrium. Since there is no
special direction, the average 3-velocity of the gas molecules is zero, and hence
$U^a = (\partial/\partial t)^a$, whose integral curves are the $t$-coordinate lines as shown in Fig. 6.36.
Thus, a comoving observer is not an observer moving with a gas molecule, but is the
inertial observer at rest relative to the box.
The pressure p and the mass density μ of a perfect fluid have the following
well-known relation:
preference, then the fluid is said to be isotropic. We have shown that a comoving frame meets this
requirement, so a perfect fluid is isotropic.
$$p = \frac{\mu\,\overline{u^2}}{3}, \tag{6.5.2}$$
where $\overline{u^2}$ is the average of the square of the random motion velocity of each molecule.
Since $\overline{u^2} \ll c^2$, we have $(p/c^2) \ll \mu$, which in the unit system with $c = 1$ is $p \ll \mu$,
i.e., the pressure is much less than the density. This conclusion holds for any
non-relativistic fluid; for example, $p/\mu \sim 10^{-12}$ in a hurricane and $p/\mu \sim 10^{-10}$ in the
Earth’s core. However, for relativistic fluids it will be quite different. The electromagnetic
radiation that reaches thermal equilibrium in an isothermal box (which is called
blackbody radiation) can be viewed as an example of an extreme relativistic perfect
fluid, where the reference frame at rest relative to the box is the rest frame (comoving
frame) of the fluid. The radiation inside the box is isotropic in this frame, and thus
this frame is also called the isotropic reference frame of blackbody radiation. The
electromagnetic radiation in the box has many similarities with an ideal gas, and can
be called a photon gas. The relation between the pressure $p$ and energy density $\mu$
of a photon gas also satisfies (6.5.2) (of course the derivation is different), but now
$\overline{u^2} = 1$, and hence
$$p = \frac{\mu}{3}. \tag{6.5.3}$$
The key point for why blackbody radiation can be regarded as a perfect fluid is that,
relative to the isotropic reference frame, its photons have random motions in all
directions that are sufficiently disordered (see Appendix D in Volume II for details).
In contrast, the light rays coming from a searchlight cannot be regarded as a perfect
fluid, since there does not exist a reference frame in which these light rays are
isotropic.
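To get a feeling for the orders of magnitude in (6.5.2) and (6.5.3), the following sketch evaluates the dimensionless ratio $p/(\mu c^2) = \overline{u^2}/(3c^2)$ in SI units. The thermal speed of 500 m/s is an assumed, illustrative value (roughly that of air molecules at room temperature), not a figure taken from the text, and the function name is hypothetical.

```python
C = 299_792_458.0  # speed of light in m/s

def pressure_to_density_ratio(u_rms):
    """Return p / (mu c^2) = <u^2> / (3 c^2) from (6.5.2), with u_rms in m/s."""
    return (u_rms / C) ** 2 / 3.0

# Non-relativistic gas with an ASSUMED root-mean-square speed of 500 m/s.
ratio_gas = pressure_to_density_ratio(500.0)

# Photon gas: every "molecule" moves at the speed of light, which gives (6.5.3).
ratio_photon = pressure_to_density_ratio(C)

print(ratio_gas, ratio_photon)
```

The first ratio is many orders of magnitude below one, in line with the non-relativistic estimates quoted above, while the second reproduces $p = \mu/3$.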
A perfect fluid in Newtonian mechanics obeys two important laws, namely the
continuity equation that describes the rate of change of the mass density $\mu$,
$$\frac{\partial\mu}{\partial t} + \nabla\cdot(\mu\vec{u}) = 0 \quad (\text{reflects the conservation of mass}), \tag{6.5.4}$$
and the Euler equation that describes the rate of change of the 3-velocity $\vec{u}$ (see
Optional Reading 6.5.1 for a derivation),
$$-\nabla p = \mu\left[\frac{\partial\vec{u}}{\partial t} + (\vec{u}\cdot\nabla)\vec{u}\right]. \tag{6.5.5}$$
Now we will introduce the generalization of these two laws in relativistic perfect
fluid mechanics. Suppose a perfect fluid has no interaction with the outside, then its
energy-momentum tensor satisfies $\partial^a T_{ab} = 0$. It follows from (6.5.1) that
This is an equality of 4-vectors, which can be projected onto the spatial and time
directions of a comoving observer. Contracting $U^b$ with the equation above yields
Noticing
$$U^b U^a\partial_a U_b = \frac{1}{2}U^a\partial_a(U^b U_b) = 0 \quad (\text{since } U^b U_b = -1 = \text{constant}),$$
we have
$$U^a\partial_a\mu + (\mu + p)\partial_a U^a = 0. \tag{6.5.7}$$
This is the projection of (6.5.6) in the time direction. To find the spatial projection,
we contract the projection map $h_c{}^b = \delta_c{}^b + U_c U^b$ with (6.5.6) and obtain
$$(\mu + p)U^a\partial_a U_c + \partial_c p + U_c U^b\partial_b p = 0. \tag{6.5.8}$$
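The role of the projection map can be illustrated numerically. The sketch below builds the matrix $h_c{}^b = \delta_c{}^b + U_c U^b$ for one sample 4-velocity (the value $v = 0.6$ is arbitrary) and checks two properties that make it a spatial projector: it annihilates $U_b$, and applying it twice equals applying it once. The helper name `projector` is illustrative.

```python
import math

ETA = [[-1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]  # eta_ab

def projector(U):
    """Matrix h[c][b] = delta_c^b + U_c U^b for a unit timelike U^a."""
    U_low = [sum(ETA[c][a] * U[a] for a in range(4)) for c in range(4)]
    return [[(1.0 if c == b else 0.0) + U_low[c] * U[b] for b in range(4)]
            for c in range(4)]

v = 0.6                                            # arbitrary sample speed
gamma = 1.0 / math.sqrt(1.0 - v * v)
U = [gamma, gamma * v, 0.0, 0.0]                   # U^a with U^a U_a = -1
U_low = [sum(ETA[c][a] * U[a] for a in range(4)) for c in range(4)]
h = projector(U)

# h_c^b U_b: the time direction of the comoving observer is projected out.
h_on_U = [sum(h[c][b] * U_low[b] for b in range(4)) for c in range(4)]

# h_c^b h_b^d: projecting twice is the same as projecting once.
h_squared = [[sum(h[c][b] * h[b][d] for b in range(4)) for d in range(4)]
             for c in range(4)]

print(h_on_U)
```

Because $h_c{}^b U_b = 0$, contracting (6.5.8) with $U^c$ gives no new information, which is consistent with (6.5.8) being purely the spatial part of the conservation equation.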
Equations (6.5.7) and (6.5.8) are the relativistic equations of motion for a perfect
fluid. A perfect fluid with zero pressure is called a dust. For a dust, (6.5.8) can be
simplified as $U^a\partial_a U_c = 0$, and thus the world line of a dust particle is a geodesic. This
is quite natural since $p = 0$ indicates that there is no force exerted on the particle. To
find the non-relativistic approximation of (6.5.7) and (6.5.8), we choose an arbitrary
inertial frame $\{t, x^i\}$ and make the 3 + 1 decomposition for $U^a$ [see (6.3.21)]:
$$U^a = \gamma[(\partial/\partial t)^a + u^a] \cong (\partial/\partial t)^a + u^a, \tag{6.5.9}$$
where $u^a$ is the 3-velocity of the fluid in this system, and $\gamma = -(\partial/\partial t)^a U_a$ is
approximated as 1 in the non-relativistic limit. Plugging (6.5.9) into (6.5.7) and noticing
that $p \ll \mu$, we get (the approximation symbol is omitted from now on)
$$0 = \left(\frac{\partial}{\partial t}\right)^a\partial_a\mu + u^a\partial_a\mu + \mu\,\partial_a u^a = \frac{\partial\mu}{\partial t} + \partial_a(\mu u^a).$$
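Since $u^a$ is a spatial vector in the inertial frame we are using, $\partial_a(\mu u^a) = \partial_i(\mu u^i) =
\nabla\cdot(\mu\vec{u})$, and hence the equation above is exactly the continuity equation (6.5.4).
Contracting $(\partial/\partial x^i)^c$ with (6.5.8) and noticing (6.5.9) and $p \ll \mu$, we get
$$\begin{aligned}
0 &= \mu\left[\left(\frac{\partial}{\partial t}\right)^a\partial_a u_i + u^a\partial_a u_i\right] + \left(\frac{\partial}{\partial x^i}\right)^c\partial_c p + u_i\left[\left(\frac{\partial}{\partial t}\right)^b + u^b\right]\partial_b p \\
&= \mu\left[\frac{\partial u_i}{\partial t} + u^a\partial_a u_i\right] + \frac{\partial p}{\partial x^i} + u_i\frac{\partial p}{\partial t} + u_i u^j\frac{\partial p}{\partial x^j}.
\end{aligned}$$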
$$-\nabla p = \frac{\vec{f}}{V} = \frac{m\,d\vec{u}/dt}{V} = \mu\frac{d\vec{u}}{dt}, \tag{6.5.10}$$
where $\vec{u}$ is the 3-velocity of the fluid particle. There are two reasons that $\vec{u}$ changes with
time: ① the 3-velocity $\vec{u}$ at each spatial point can change with time ($p$ and $p'$ in Fig. 6.37
can have different $u^a$); ② a fluid particle can move from one spatial point to another
(the mass point $L$ in Fig. 6.37 moves from the spatial point $P$ to $Q$), and the way it
moves is described by the parametric equations $x^i = x^i(t)$ of its trajectory. Let $\vec{u}(t, x^i(t))$
represent the dependence of $\vec{u}$ on $t$ due to these two factors. Then, (6.5.10) can be expressed
as
$$-\nabla p = \mu\frac{d\vec{u}}{dt} = \mu\left[\frac{\partial\vec{u}}{\partial t} + \frac{\partial\vec{u}}{\partial x^i}\frac{dx^i(t)}{dt}\right] = \mu\left[\frac{\partial\vec{u}}{\partial t} + (\vec{u}\cdot\nabla)\vec{u}\right],$$
which is Euler’s equation (6.5.5).
[The End of Optional Reading 6.5.1]
6.6 Electrodynamics
Definition 1 The electric field $E_a$ and the magnetic field $B_a$ measured by an
instantaneous observer $(p, Z^a)$ are defined by the following equations
$$E_a := F_{ab}Z^b, \qquad B_a := -{}^*F_{ab}Z^b,$$
where ${}^*F_{ab}$ is the dual differential form of $F_{ab}$ (see Sect. 5.6), which is also a 2-form
field.
Proposition 6.6.1 E a and B a are spatial vector fields of the instantaneous observer
( p, (eμ )a ), (e0 )a = Z a , and
$$E_a Z^a = F_{ab}Z^a Z^b = 0, \qquad B_a Z^a = -{}^*F_{ab}Z^a Z^b = 0,$$
and thus $E^a$ and $B^a$ are spatial vectors of the instantaneous observer $(p, Z^a)$. Since
$$B_i = B_a(e_i)^a = -{}^*F_{ab}Z^b(e_i)^a = -\frac{1}{2}\varepsilon_{abcd}F^{cd}(e_0)^b(e_i)^a = \frac{1}{2}\varepsilon_{0icd}F^{cd} = \frac{1}{2}\varepsilon_{0ijk}F^{jk},$$
From Proposition 6.6.1 we can see that the matrix constituted by the components of
$F_{ab}$ in terms of the observer’s tetrad $(e_\mu)^a$ is
$$(F_{\mu\nu}) = \begin{bmatrix} 0 & -E_1 & -E_2 & -E_3 \\ E_1 & 0 & B_3 & -B_2 \\ E_2 & -B_3 & 0 & B_1 \\ E_3 & B_2 & -B_1 & 0 \end{bmatrix}. \tag{6.6.3}$$
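The bookkeeping in (6.6.3) is easy to get wrong by a sign, so a small round-trip check can help. The sketch below packs sample tetrad components of the electric and magnetic fields into the matrix $(F_{\mu\nu})$ and reads them back using $E_i = F_{i0}$ and $B_i = \frac{1}{2}\varepsilon_{0ijk}F^{jk}$ (i.e., $B_1 = F_{23}$, $B_2 = F_{31}$, $B_3 = F_{12}$). The function names and numerical values are illustrative.

```python
def field_tensor(E, B):
    """Matrix (F_mu_nu) of (6.6.3) built from tetrad components of E and B."""
    E1, E2, E3 = E
    B1, B2, B3 = B
    return [[0.0, -E1, -E2, -E3],
            [E1,  0.0,  B3, -B2],
            [E2, -B3,  0.0,  B1],
            [E3,  B2, -B1,  0.0]]

def fields_from_tensor(F):
    """Recover E_i = F_i0 and B_i = (1/2) eps_0ijk F^jk from the matrix."""
    E = [F[i][0] for i in (1, 2, 3)]
    B = [F[2][3], F[3][1], F[1][2]]   # B_1 = F_23, B_2 = F_31, B_3 = F_12
    return E, B

E, B = [1.0, 2.0, 3.0], [-1.0, 0.5, 2.0]   # arbitrary sample values
F = field_tensor(E, B)
E_back, B_back = fields_from_tensor(F)
is_antisymmetric = all(F[m][n] == -F[n][m] for m in range(4) for n in range(4))

print(E_back, B_back, is_antisymmetric)
```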
Proposition 6.6.2 Suppose two inertial frames $R$ and $R'$ are related by the Lorentz
transformation
$$t = \gamma(t' + vx'), \quad x = \gamma(x' + vt'), \quad y = y', \quad z = z'. \tag{6.6.4}$$
Then, the values $(\vec{E}, \vec{B})$ and $(\vec{E}', \vec{B}')$ of the same electromagnetic field $F_{ab}$ measured
by two observers in these two frames have the following relationship:
$$\begin{aligned}
E'_1 &= E_1, & E'_2 &= \gamma(E_2 - vB_3), & E'_3 &= \gamma(E_3 + vB_2); \\
B'_1 &= B_1, & B'_2 &= \gamma(B_2 + vE_3), & B'_3 &= \gamma(B_3 - vE_2).
\end{aligned} \tag{6.6.5}$$
Proof This proposition is only about the local measurement at $p$ and does not involve
any derivative. Choose the inertial frame $R$ such that the 4-velocity of the observer
whose world line passes through $p$ is $(e_0)^a$, and choose another inertial frame $R'$ such that
the 4-velocity of the observer whose world line passes through $p$ is $(e'_0)^a$. Then, the relation
between $R$ and $R'$ will be (6.6.4). Hence, we have (6.6.5).
Proposition 6.6.3 indicates that (6.6.5) holds for any two instantaneous observers at
any spacetime point $p$ that satisfy $(e'_2)^a = (e_2)^a$ and $(e'_3)^a = (e_3)^a$, which clarifies
the misunderstanding that “(6.6.5) only holds for an inertial frame.”
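The transformation law (6.6.5) can be cross-checked by transforming the components of $F_{ab}$ directly. The sketch below writes the second observer's quantities with primes, reads the primed tetrad off (6.6.4) as $(e'_0)^a = \gamma[(e_0)^a + v(e_1)^a]$, $(e'_1)^a = \gamma[v(e_0)^a + (e_1)^a]$, $(e'_2)^a = (e_2)^a$, $(e'_3)^a = (e_3)^a$, computes $F'_{\mu\nu} = F_{ab}(e'_\mu)^a(e'_\nu)^b$, and compares with the closed-form expressions. Function names and sample values are illustrative.

```python
import math

def field_tensor(E, B):
    """Matrix (F_mu_nu) in the layout of (6.6.3)."""
    E1, E2, E3 = E
    B1, B2, B3 = B
    return [[0.0, -E1, -E2, -E3],
            [E1,  0.0,  B3, -B2],
            [E2, -B3,  0.0,  B1],
            [E3,  B2, -B1,  0.0]]

def boosted_fields(E, B, v):
    """E', B' obtained from F'_{mu nu} = F_ab (e'_mu)^a (e'_nu)^b."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    F = field_tensor(E, B)
    # Row m holds the components of (e'_m)^a in the unprimed frame.
    tetrad = [[gamma, gamma * v, 0.0, 0.0],
              [gamma * v, gamma, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]]
    Fp = [[sum(F[a][b] * tetrad[m][a] * tetrad[n][b]
               for a in range(4) for b in range(4))
           for n in range(4)] for m in range(4)]
    return [Fp[i][0] for i in (1, 2, 3)], [Fp[2][3], Fp[3][1], Fp[1][2]]

def formula_6_6_5(E, B, v):
    """Closed-form transformation (6.6.5)."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    E_new = [E[0], gamma * (E[1] - v * B[2]), gamma * (E[2] + v * B[1])]
    B_new = [B[0], gamma * (B[1] + v * E[2]), gamma * (B[2] - v * E[1])]
    return E_new, B_new

E, B, v = [1.0, 2.0, 3.0], [-1.0, 0.5, 2.0], 0.6   # arbitrary sample values
E_tensor, B_tensor = boosted_fields(E, B, v)
E_formula, B_formula = formula_6_6_5(E, B, v)
max_difference = max(abs(x - y) for x, y in
                     zip(E_tensor + B_tensor, E_formula + B_formula))

print(max_difference)
```

Only the tetrad at a single point enters the computation, which reflects the remark that the result is a statement about instantaneous observers rather than about global inertial frames.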
[Optional Reading 6.6.1]
Propositions 6.6.2 and 6.6.3 can also be proved using the orthonormal frame
transformation (see Fig. 6.38). According to (6.3.30), the 3 + 1 decomposition of the 4-velocity
$U^a \equiv (e'_0)^a$ of the instantaneous observer $(p, (e'_0)^a)$ relative to the instantaneous observer
$(p, (e_0)^a)$ gives
$$(e'_0)^a = \gamma(e_0)^a + \gamma u^a.$$
Since the 3-velocity $u^a$ is in the same direction as $(e_1)^a$, and $(e_1)^a$ is normalized, we have
$u^a = u(e_1)^a$, and thus the above equation becomes
$$(e'_0)^a = \gamma(e_0)^a + \gamma u(e_1)^a.$$
For instance,
$$E'_2 = F'_{20} = F_{ab}(e'_2)^a(e'_0)^b = F_{ab}(e_2)^a[\gamma(e_0)^b + \gamma u(e_1)^b] = \gamma(F_{20} + uF_{21}) = \gamma(E_2 - uB_3).$$
The sources of the electromagnetic field are electric charges and electric currents.
In the 4-dimensional language, the continuously distributed electric charges and
currents can be viewed as a dust formed by a large number of charged particles [see
Synge (1956), Chap. VIII Sect. 10, Chap. X Sect. 7]. To simplify the question, we
only talk about the case where all the charged particles are of the same kind (e.g.,
they are all electrons), whose electric charge is $e$.⁶ Let $U^a$ represent the 4-velocity
field of this charged dust, then $(p, U^a)$ is the instantaneous comoving observer at $p$.
Suppose there are $N$ charged particles in the small volume $V_0$ of the local surface
of simultaneity perpendicular to $U^a$, then $\eta_0 = N/V_0$ is the particle number
density measured by the comoving observer (called the proper number density). Let
$(p, Z^a)$ be an arbitrary instantaneous observer at $p$. This observer will regard the
particles as in motion, i.e., will see a current, as long as it is not a comoving observer
(as long as $Z^a \neq U^a$). Suppose the $N$ particles above take a volume $V$ in the local
surface of simultaneity of $(p, Z^a)$ perpendicular to $Z^a$ (see Fig. 6.39), then from the
Lorentz contraction we know that $V_0 = \gamma V$, where $\gamma \equiv -Z^a U_a$, and thus the particle
number density measured by the observer $(p, Z^a)$ is $\eta = N/V = \gamma N/V_0 = \gamma\eta_0$.
Therefore, $\rho_0 \equiv e\eta_0$ and $\rho \equiv e\eta$ are, respectively, the charge density observed by
the comoving observer $(p, U^a)$ and that observed by an arbitrary observer $(p, Z^a)$,
which have the relation $\rho = \gamma\rho_0$. Suppose $u^a$ is the 3-velocity of the charged particle
relative to $(p, Z^a)$, then $j^a := \rho u^a$ is the 3-current density measured by $(p, Z^a)$.
The 3-current density measured by the comoving observer is zero.
$$J^a := \rho_0 U^a. \tag{6.6.8}$$
6 This simplification does not affect the essence of the problem. What is important is that they form
a stream of particles, and unlike gas molecules which move randomly in all the directions, the value
of its 4-velocity field U a at each spacetime point is the 4-velocity of the dust particle whose world
line passes through this point.
$$J^a = \rho Z^a + j^a. \tag{6.6.9}$$
Proof $J^a = \rho_0 U^a = \rho_0\gamma(Z^a + u^a) = \rho Z^a + \rho u^a = \rho Z^a + j^a$.
Thus, the charge density $\rho$ and 3-current density $j^a$ are respectively the time
component $J^0$ and spatial projection $h^a{}_b J^b$ of the 4-current density. The equation above
can also be expressed as
$$\rho = -Z_a J^a, \qquad j^i = J^i.$$
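As a small worked example of the decomposition (6.6.9), the sketch below takes a charged dust with proper charge density $\rho_0$ moving with 3-velocity $u^a$ relative to an observer $Z^a$, forms $J^a = \rho_0 U^a$ with $U^a = \gamma(Z^a + u^a)$, and reads off $\rho$ and $j^a$. Units with $c = 1$ are used; the numbers and the function name are illustrative.

```python
import math

def four_current(rho0, u):
    """Components of J^a = rho0 U^a in the frame of the observer Z^a,
    where U^a = gamma (Z^a + u^a) and u is the 3-velocity (c = 1)."""
    gamma = 1.0 / math.sqrt(1.0 - sum(x * x for x in u))
    return [rho0 * gamma] + [rho0 * gamma * x for x in u]

rho0 = 2.0                  # proper charge density (sample value)
u = [0.6, 0.0, 0.0]         # 3-velocity of the dust relative to the observer
J = four_current(rho0, u)

rho = J[0]                  # rho = -Z_a J^a, the time component
j = J[1:]                   # j^i = J^i, the spatial projection
norm = -J[0] ** 2 + sum(x * x for x in j)   # J^a J_a, equal to -rho0^2

print(rho, j, norm)
```

With $u = 0.6$ one has $\gamma = 1.25$, so the observer measures a larger charge density $\rho = \gamma\rho_0$ and a current $j = \rho u$, while $J^a J_a = -\rho_0^2$ is the same for every observer.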
Like mass, electric charge is also a physical quantity that describes an intrinsic
property of a charged particle. The charged particles and electric charges remain the
same when they are not involved in any interaction. When they are interacting with
other particles, the total charge must be the same before and after the interaction.
This is the law of conservation of charge, which is a result confirmed by all the
experiments so far. In the 3-dimensional language of electrodynamics, this law is
expressed as the continuity equation: (∂ρ/∂t) + ∇ · j = 0 (for any inertial frame).
It is not difficult to see that the corresponding 4-dimensional expression is $\partial_a J^a = 0$.
In our current framework, we will treat the above two equations as the starting point,
i.e., we will assume the electromagnetic field tensor obeys (6.6.10) and (6.6.11).
Note that (6.6.10) already contains the law of conservation of charge, since from it
we get
$$\partial^b J_b = -(4\pi)^{-1}\partial^b\partial^a F_{ab} = -(4\pi)^{-1}\partial^{(b}\partial^{a)}F_{[ab]} = 0,$$
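$$\begin{aligned}
&\text{(a)}\ \nabla\cdot\vec{E} = 4\pi\rho, &\qquad &\text{(b)}\ \nabla\times\vec{E} = -\frac{\partial\vec{B}}{\partial t}, \\
&\text{(c)}\ \nabla\cdot\vec{B} = 0, &\qquad &\text{(d)}\ \nabla\times\vec{B} = 4\pi\vec{j} + \frac{\partial\vec{E}}{\partial t}.
\end{aligned} \tag{6.6.12}$$
The first and fourth equations here correspond to (6.6.10), and the second and third
equations correspond to (6.6.11).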
Remark 1 Here we adopt the geometrized Gaussian unit system (see Appendix A), in
which the coefficients of the 3-dimensional Maxwell equations are slightly different
from the common form.
Proof Let $\delta_{ab}$ represent the (induced) Euclidean metric on a constant-$t$ surface of the
chosen inertial frame, and let $\hat{\partial}_a$ and $\partial_a$ represent the derivative operators associated
with the metrics $\delta_{ab}$ and $\eta_{ab}$, respectively. Setting $Z^a \equiv (\partial/\partial t)^a$, and noticing that
the spatial vector $E^a$ satisfies $E^0 = 0$, we have
$$\nabla\cdot\vec{E} = \hat{\partial}^a E_a = \frac{\partial E_i}{\partial x^i} = \partial^a E_a = \partial^a(F_{ab}Z^b) = Z^b(-4\pi J_b) = 4\pi\rho.$$
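This is (6.6.12)(a). Now we prove (6.6.12)(b). Suppose $\hat{\varepsilon}_{abc}$ is the volume element
associated with $\delta_{ab}$ on the constant-$t$ surface, then from (c) of (5.6.5) we know that
Comparing the projection of the above equation on the constant-$t$ surface with
(6.6.14), and noticing that the projection of $(dx^0)_a$ vanishes and the projections of
$(dx^i)_a$ are themselves, we have
$$\hat{\partial}_a E_b = h_a{}^d h_b{}^e\,\partial_d E_e. \tag{6.6.15}$$
Since $\hat{\varepsilon}^{ab}{}_c$ is a spatial tensor, its projection is equal to itself. Plugging (6.6.15) into
(6.6.13) yields
$$(\nabla\times\vec{E})_c = \hat{\varepsilon}^{ab}{}_c\,h_a{}^d h_b{}^e\,\partial_d E_e = \hat{\varepsilon}^{de}{}_c\,\partial_d E_e,$$
and hence
$$(\nabla\times\vec{E})_c = \hat{\varepsilon}^{ab}{}_c\,\partial_a E_b = \hat{\varepsilon}^{ab}{}_c\,\partial_a(F_{be}Z^e) = Z^e\hat{\varepsilon}^{ab}{}_c\,\partial_a F_{be} = -Z^e\hat{\varepsilon}^{ab}{}_c\,\partial_e F_{ab} - Z^e\hat{\varepsilon}^{ab}{}_c\,\partial_b F_{ea},$$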
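where in the last step we used (6.6.11) and the antisymmetry of $F_{ab}$. Also, the second
term on the right-hand side of this equation is equal to $-\hat{\varepsilon}^{ab}{}_c\,\partial_a(F_{be}Z^e)$, i.e., is equal
to $-(\nabla\times\vec{E})_c$, and hence
Suppose $\varepsilon_{abcd}$ is the volume element associated with $\eta_{ab}$, then it follows from (5.5.6)
that
$$\hat{\varepsilon}_{cab} = Z^d\varepsilon_{dcab}, \tag{6.6.17}$$
Thus,
$$(\nabla\times\vec{E})_i = \left(\frac{\partial}{\partial x^i}\right)^c(\nabla\times\vec{E})_c = -Z^e\partial_e B_i = -\frac{\partial B_i}{\partial t},$$
and therefore
$$\nabla\times\vec{E} = -\frac{\partial\vec{B}}{\partial t}.$$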
The derivations of the other two Maxwell equations are left to the reader in
Exercise 6.16.
Remark 2 The 4-dimensional formulation of Maxwell’s equations is explicitly
Lorentz covariant, and is independent of the reference frame. The 3-dimensional
formulation of Maxwell’s equations is also Lorentz covariant, but this is not obvious.
Also, the 3-dimensional formulation only holds for inertial frames; for a
non-inertial frame, the equations derived from (6.6.10) and (6.6.11) will be different from
the regular 3-dimensional Maxwell equations.
[Optional Reading 6.6.2]
As a volume element associated with the induced metric h ab = ηab + Z a Z b on the
constant-t surface, ε̂cab can only be determined up to a minus sign (see the end of Optional
Reading 5.5.1), i.e., −Z d εdcab can also be taken as ε̂cab . Only after we take the orientation of
the constant-t surface into consideration can ε̂cab be uniquely determined as Z d εdcab . Unlike
the situation when we discuss Gauss’s theorem, here there does not naturally exist a manifold
N with boundary such that the constant-t surface can be treated as the boundary ∂ N , and
thus one cannot say whether its normal vector Z a is ingoing or outgoing. Equivalently, the
constant-t surface now does not have any induced orientation. The reason we write ε̂cab as
Z d εdcab rather than −Z d εdcab is based on the following consideration: the 3-dimensional
formulation of Maxwell’s equation ∇ × E = −∂ B/∂t involves curl, and the condition for
it to hold is that the chosen Cartesian coordinate system {x, y, z} is right-handed (otherwise
we have ∇ × E = ∂ B/∂t), i.e., the spatial orientation needs to be compatible with dx ∧
dy ∧ dz. Noting that εdcab = (dt)d ∧ (dx)a ∧ (dy)b ∧ (dz)c and Z d = (∂/∂t)d , we know
that the volume element ε̂cab = (dx)a ∧ (dy)b ∧ (dz)c , which is compatible with the needed
orientation.
[The End of Optional Reading 6.6.2]
As we have pointed out previously, charged particles are the sources of the
electromagnetic field (manifested by $J^a$), whose effect on the electromagnetic field $F_{ab}$ is
reflected by (6.6.10). Conversely, there are also forces exerted by the electromagnetic
field on the charged particles, namely the Lorentz force
$$\vec{f} = q(\vec{E} + \vec{u}\times\vec{B}), \tag{6.6.18}$$
where $q$ and $\vec{u}$ represent respectively the electric charge and 3-velocity of the point
mass. Combining the above equation and the definition of the 3-force $\vec{f} = d\vec{p}/dt$
yields the equation of motion of a charged particle in an electromagnetic field
(assuming no other force)
$$\frac{d\vec{p}}{dt} = q(\vec{E} + \vec{u}\times\vec{B}). \tag{6.6.19}$$
It should be pointed out that the equation above is Lorentz covariant (although it is
hard to see explicitly), which is also a manifestation of the conclusion “Maxwell’s
theory of electromagnetism is endowed with Lorentz covariance”. That is, for another
inertial frame $R'$, the equation of motion of the same point mass will have the same
form as (6.6.19); only the quantities that depend on the reference frame need to be
labeled by a prime, i.e.,
$$\frac{d\vec{p}\,'}{dt'} = q(\vec{E}' + \vec{u}'\times\vec{B}'). \tag{6.6.19'}$$
Note that $q$ does not need to be primed, since the electric charge of a point mass is
an invariant.
Proposition 6.6.6 Suppose a point mass has electric charge $q$, 4-velocity $U^a$ and
4-momentum $P^a$, then the force from the electromagnetic field $F_{ab}$ on it (called the
Lorentz 4-force) is
$$F^a = qF^a{}_b U^b. \tag{6.6.20}$$
Thus, the 4-dimensional equation of motion for a point mass that only experiences
the electromagnetic force is
$$qF^a{}_b U^b = U^b\partial_b P^a. \tag{6.6.21}$$
$$F^i = \gamma f^i, \tag{6.6.22}$$
$$F^0 = \gamma\,\vec{f}\cdot\vec{u}, \tag{6.6.23}$$
$$F^a = \gamma qF^a{}_b(Z^b + u^b) = \gamma q(E^a + F^a{}_b u^b), \tag{6.6.24}$$
or
$$F_a = \gamma q(E_a + F_{ab}u^b).$$
Hence,
$$F_i = (e_i)^a F_a = \gamma q(E_i + F_{ij}u^j). \tag{6.6.25}$$
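$$\begin{aligned}
(\vec{u}\times\vec{B})_c &= \hat{\varepsilon}_c{}^{ab}u_a B_b = \hat{\varepsilon}_c{}^{ab}u_a(-{}^*F_{bd}Z^d) = \hat{\varepsilon}_c{}^{ab}u_a\left(-\frac{1}{2}\varepsilon_{bd}{}^{ef}F_{ef}Z^d\right) = -\frac{1}{2}u^a\hat{\varepsilon}_{cab}\,\varepsilon^{bdef}F_{ef}Z_d \\
&= \frac{1}{2}u^a Z^g\varepsilon_{gcab}\,\varepsilon^{defb}F_{ef}Z_d = \frac{1}{2}(-3!)\,u^a Z^g\,\delta^{[d}{}_g\delta^{e}{}_c\delta^{f]}{}_a\,Z_d F_{ef} = -3u^a Z^g Z_{[g}F_{ca]} \\
&= -u^a Z^g(Z_g F_{ca} + Z_a F_{gc} + Z_c F_{ag}) = F_{ca}u^a - Z_c u^a E_a,
\end{aligned} \tag{6.6.27}$$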
where in the second last equality we used $F_{ca} = -F_{ac}$, and in the last equality we
used $Z^g Z_g = -1$, $F_{ag}Z^g = E_a$ and $u^a Z_a = 0$. It follows from (6.6.27) that
which is exactly (6.6.26). The second term on the right-hand side of (6.6.27) is
necessary, otherwise the time component of the right-hand side would be nonvanishing,
which contradicts the fact that $(\vec{u}\times\vec{B})_c$ on the left-hand side is spatial. Now we will
prove (6.6.23).
which is (6.6.23). In the second equality we used (6.6.24), in the third equality we
used $(e^0)_a E^a = 0$ and $(e^0)_a = -(e_0)_a$, in the sixth equality we used the orthogonality
between $\vec{u}\times\vec{B}$ and $\vec{u}$, and in the seventh equality we used (6.6.18).
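The relations (6.6.22) and (6.6.23) between the Lorentz 4-force and the 3-force (6.6.18) can be verified numerically. The sketch below assumes the 4-force has the form $F^a = qF^a{}_bU^b$, which is the left-hand side of (6.6.21), uses the matrix layout of (6.6.3) with $\eta_{ab} = \mathrm{diag}(-1, 1, 1, 1)$, and compares its components with $\gamma f^i$ and $\gamma\,\vec{f}\cdot\vec{u}$. All function names and sample values are illustrative.

```python
import math

def field_tensor(E, B):
    """Matrix (F_mu_nu) in the layout of (6.6.3)."""
    E1, E2, E3 = E
    B1, B2, B3 = B
    return [[0.0, -E1, -E2, -E3],
            [E1,  0.0,  B3, -B2],
            [E2, -B3,  0.0,  B1],
            [E3,  B2, -B1,  0.0]]

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def three_force(q, E, B, u):
    """Lorentz 3-force f = q (E + u x B), equation (6.6.18)."""
    uxB = cross(u, B)
    return [q * (E[i] + uxB[i]) for i in range(3)]

def four_force(q, E, B, u):
    """Tetrad components of F^a = q F^a_b U^b (ASSUMED form), plus gamma."""
    gamma = 1.0 / math.sqrt(1.0 - sum(x * x for x in u))
    U = [gamma] + [gamma * x for x in u]               # U^a = gamma (Z^a + u^a)
    F = field_tensor(E, B)
    lowered = [q * sum(F[m][n] * U[n] for n in range(4)) for m in range(4)]  # F_a
    return [-lowered[0]] + lowered[1:], gamma           # raise index with eta^ab

q, E, B, u = 2.0, [1.0, 2.0, 3.0], [-1.0, 0.5, 2.0], [0.3, -0.2, 0.4]
f = three_force(q, E, B, u)
F4, gamma = four_force(q, E, B, u)

spatial_error = max(abs(F4[i + 1] - gamma * f[i]) for i in range(3))
power = sum(f[i] * u[i] for i in range(3))              # f . u
time_error = abs(F4[0] - gamma * power)

print(spatial_error, time_error)
```

The time component reduces to $\gamma q\,\vec{E}\cdot\vec{u}$ because $\vec{u}\times\vec{B}$ is orthogonal to $\vec{u}$, so the magnetic field does no work on the charge.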
$$T_{ab} = \frac{1}{4\pi}\left(F_{ac}F_b{}^c - \frac{1}{4}\eta_{ab}F_{cd}F^{cd}\right), \tag{6.6.28}$$
where $F_{ac}$ is the electromagnetic field tensor. Using the result in Exercise 5.9, one
can also rewrite the equation above into a more symmetric form:
$$T_{ab} = \frac{1}{8\pi}\left(F_{ac}F_b{}^c + {}^*F_{ac}\,{}^*F_b{}^c\right), \tag{6.6.28'}$$
where ${}^*F_{ac}$ is the dual form of $F_{ac}$ and ${}^*F_b{}^c = \eta^{ac}\,{}^*F_{ba}$. It is not difficult to verify
that this tensor has the properties 1 and 3 of an energy-momentum tensor described
in Sect. 6.4. In particular, after choosing an arbitrary inertial frame, from (6.6.28′) one
can easily obtain that
$$T_{00} = \frac{1}{8\pi}(E^2 + B^2),$$
and from (6.6.28) one can easily obtain that (see Exercise 6.17)
$$w_i = -T_{i0} = \frac{1}{4\pi}(\vec{E}\times\vec{B})_i, \quad i = 1, 2, 3,$$
which are exactly the energy density and energy flux density (which also equals the
momentum density) of the electromagnetic field measured by this inertial observer.
However, the property 2 of an energy-momentum tensor in Sect. 6.4 (i.e., $\partial^a T_{ab} = 0$)
needs to be clarified here. When $J^a = 0$ (source free), one can show that $\partial^a T_{ab} = 0$
from the 4-dimensional formulation of Maxwell’s equations, i.e., a source-free
electromagnetic field obeys the conservation laws of energy, momentum and angular
momentum. However, if $J^a \neq 0$, then the $T_{ab}$ in (6.6.28) does not satisfy $\partial^a T_{ab} = 0$
[Exercise 6.18(a)]. This is quite natural, since then there are interactions between
the electromagnetic field and the charged particles, which involve the exchange of
energy, momentum and angular momentum [Exercise 6.18(b)]. Nevertheless, the
total energy-momentum tensor of the electromagnetic field and charged particles is
still conserved.
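The expressions for the energy density and energy flux density quoted above can be reproduced numerically from (6.6.28). The sketch below evaluates $T_{ab} = \frac{1}{4\pi}(F_{ac}F_b{}^c - \frac{1}{4}\eta_{ab}F_{cd}F^{cd})$ in an inertial frame, using the matrix layout of (6.6.3), and compares $T_{00}$ and $-T_{i0}$ with $(E^2 + B^2)/8\pi$ and $(\vec{E}\times\vec{B})_i/4\pi$. It also evaluates the trace $\eta^{ab}T_{ab}$, which vanishes identically for this tensor. Function names and sample field values are illustrative.

```python
import math

SIGN = [-1.0, 1.0, 1.0, 1.0]   # diagonal of eta_ab (and of eta^ab)

def field_tensor(E, B):
    """Matrix (F_mu_nu) in the layout of (6.6.3)."""
    E1, E2, E3 = E
    B1, B2, B3 = B
    return [[0.0, -E1, -E2, -E3],
            [E1,  0.0,  B3, -B2],
            [E2, -B3,  0.0,  B1],
            [E3,  B2, -B1,  0.0]]

def em_energy_momentum(E, B):
    """Components T_ab of (6.6.28) in an inertial frame."""
    F = field_tensor(E, B)
    # Invariant F_cd F^cd, obtained by raising both indices with eta.
    F2 = sum(SIGN[c] * SIGN[d] * F[c][d] ** 2 for c in range(4) for d in range(4))

    def component(a, b):
        contraction = sum(SIGN[c] * F[a][c] * F[b][c] for c in range(4))  # F_ac F_b^c
        eta_ab = SIGN[a] if a == b else 0.0
        return (contraction - 0.25 * eta_ab * F2) / (4.0 * math.pi)

    return [[component(a, b) for b in range(4)] for a in range(4)]

E, B = [1.0, 2.0, 3.0], [-1.0, 0.5, 2.0]   # arbitrary sample values
T = em_energy_momentum(E, B)

energy_density = T[0][0]
expected_density = (sum(x * x for x in E) + sum(x * x for x in B)) / (8.0 * math.pi)

flux = [-T[i][0] for i in (1, 2, 3)]        # w_i = -T_i0
ExB = [E[1] * B[2] - E[2] * B[1], E[2] * B[0] - E[0] * B[2], E[0] * B[1] - E[1] * B[0]]
expected_flux = [x / (4.0 * math.pi) for x in ExB]

trace = sum(SIGN[a] * T[a][a] for a in range(4))

print(energy_density, flux, trace)
```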
Since $F_{ab}$ is a 2-form, one can rewrite Maxwell’s equation (6.6.11) using the notion
of exterior differentiation as $dF = 0$, i.e., $F$ is a closed form. Since the background
manifold is $\mathbb{R}^4$, from Remark 1 of Sect. 5.1 we can see that $F$ is exact, i.e., there
exists a 1-form field $A_a$ on $\mathbb{R}^4$ such that $F = dA$, or
$$F_{ab} = \partial_a A_b - \partial_b A_a.$$
then it is not difficult to show that $\phi$ and $a_a$ are respectively the scalar potential and
the 3-vector potential of the electromagnetic field $F$ (Exercise 6.19).
When $F$ is given, the 4-potential will not be unique. Suppose $A$ is a 4-potential
of $F$, and $\chi$ is an arbitrary $C^2$ function on $\mathbb{R}^4$, then $\tilde{A} \equiv A + d\chi$ is also a 4-potential
of $F$ since $dd\chi = 0$. This is known as the gauge freedom of the electromagnetic
4-potential. One can impose an additional condition $\partial^a A_a = 0$ called the Lorenz⁷
gauge condition. An $A_a$ that satisfies this condition always exists: if
$\partial^a A_a \neq 0$, then one can always choose a function $\chi$ such that $\tilde{A} \equiv A + d\chi$ satisfies
$\partial^a\tilde{A}_a = 0$, and to do so $\chi$ only has to satisfy $\partial^a\partial_a\chi = -\partial^a A_a$. Noticing that
$$\partial^a\partial_a\chi = \eta^{ab}\partial_b\partial_a\chi = -\frac{\partial^2\chi}{\partial t^2} + \frac{\partial^2\chi}{\partial x^2} + \frac{\partial^2\chi}{\partial y^2} + \frac{\partial^2\chi}{\partial z^2},$$
we can see that nonzero solutions of $\partial^a\partial_a\chi = -\partial^a A_a$ not only exist, but are also
numerous.
Using the 4-potential we can reformulate Maxwell’s equations. $F = dA$ satisfies
(6.6.11) automatically, and (6.6.10) can be expressed as
$$-4\pi J_b = \partial^a(\partial_a A_b - \partial_b A_a) = \partial^a\partial_a A_b - \partial_b\partial^a A_a. \tag{6.6.30}$$
7 Named after the Danish physicist Ludwig Lorenz, not to be confused with H. A. Lorentz.
The equation above is equivalent to the d’Alembert equation for the scalar potential
$\phi$ and the vector potential $\vec{a}$ in the 3-dimensional formulation of electrodynamics.
For a source-free electromagnetic field, this will become a wave equation
$$\partial^a\partial_a A_b = 0. \tag{6.6.32}$$
We want to find the wave solutions of the form $A_b = C_b\cos\theta$ for (6.6.32), where
$\theta$ is a real scalar field called the phase; $C^b$ is a nonvanishing constant vector field
(“constant” means $\partial_a C^b = 0$) called the polarization vector. Plugging these into
(6.6.32) yields
$$\cos\theta\,(\partial^a\theta)\partial_a\theta + \sin\theta\,\partial^a\partial_a\theta = 0. \tag{6.6.33}$$
Hence, those $A_b = C_b\cos\theta$ whose phase $\theta$ satisfies both
$$(\partial^a\theta)\partial_a\theta = 0 \tag{6.6.34}$$
and
$$\partial^a\partial_a\theta = 0 \tag{6.6.35}$$
are solutions to the wave equation (6.6.32). Now we will discuss this important kind
of solution in detail.
Let $K^a \equiv \partial^a\theta$. We can expand $K_a$ in terms of the dual coordinate basis of an
inertial coordinate system:
$$(d\theta)_a = \partial_a\theta = K_a = K_\mu(dx^\mu)_a.$$
In the following, we only consider the simplest (which is also the most important)
case where $K^a$ is a constant vector field ($\partial_b K^a = 0$). In this case $K_\mu$ is a constant,
and integrating the above equation yields
$$\theta = K_\mu x^\mu + \theta_0 \quad (\theta_0\ \text{constant}). \tag{6.6.36}$$
$$\theta = -\omega t + k_i x^i, \tag{6.6.38}$$
$$A_b = C_b\cos(\omega t - k_i x^i). \tag{6.6.39}$$
This solution agrees with the familiar expression for a monochromatic plane wave,
and therefore can be called a monochromatic electromagnetic plane wave. “Plane”
means that the surface $S_0$ of constant phase at a given time $t_0$, i.e., a wavefront,
described by $\omega t_0 - k_i x^i = \varphi_0$ (constant), is a 2-dimensional plane in $\mathbb{R}^3$. Since
we see that $k^a$ is the normal vector of $S_0$. Physically, $k^a$ is called the wave 3-vector,
which represents the direction of wave propagation, and $\omega$ is called the angular
frequency of the wave. Therefore, $K^a$ is called the wave 4-vector.
Now we will discuss $K^a$ in the 4-dimensional language. Consider a hypersurface
$S$ of constant phase in spacetime, i.e., $S \equiv \{p \in \mathbb{R}^4 \mid \theta|_p = \text{constant}\}$. We can easily
see that $K_a$ is the normal covector of $S$ (Theorem 4.4.2), and thus $K^a$ is the normal
vector of $S$. On the other hand, (6.6.34) indicates that $K^a K_a = 0$, and hence $K^a$ is
a null vector field and $S$ is a null hypersurface. In addition, $K^a K_a = 0$ also gives
$$0 = \partial_b(K^a K_a) = 2K^a\partial_b K_a = 2K^a\partial_b\partial_a\theta = 2K^a\partial_a\partial_b\theta = 2K^a\partial_a K_b, \tag{6.6.40}$$
and thus the integral curves of $K^a$ are null geodesics lying on $S$. Also, from (6.6.35)
we can see that $\partial^a K_a = 0$.
Suppose $\Sigma_0$ is the surface of simultaneity of $\{t, x^i\}$ at $t_0$. Let $S_0 \equiv S\cap\Sigma_0$ (see
Fig. 6.40), then $S_0$ is the set of all the points in $\Sigma_0$ that have the same phase, namely
a wavefront at $t_0$ in the 3-dimensional language. When $K_\mu$ is a constant, $S$ is a
3-dimensional plane (a null hyperplane) and $S_0$ is a 2-dimensional plane, and thus
once again we see that (6.6.39) represents a plane wave. $S$ can be interpreted as the
world sheet of a 2-dimensional wavefront, which describes the time evolution of the
wavefront (the propagation of the wave). Suppose $\Sigma_1$ is the surface of simultaneity
at $t_1$ ($> t_0$), then after a time $t_1 - t_0$, $S_0$ will propagate to a new plane $S_1 \equiv S\cap\Sigma_1$.
The direction of the propagation is the direction orthogonal to $S_0$ in $\Sigma_0$, and the
speed of the propagation is exactly the speed of light (which is a consequence of the
fact that $S$ is a null hypersurface). The integral curves of the projection of $K^a$ onto
$\Sigma_0$, i.e., the wave 3-vector $k^a$, are orthogonal to $S_0$, which represent the direction
of the wave propagation, and thus in the 3-dimensional language can be regarded
as light rays. Therefore, the integral curves of $K^a$ can be regarded as light rays in
the 4-dimensional language. In this perspective, we can also naturally see that $K^a$
deserves the name wave 4-vector.
Fig. 6.40 A monochromatic electromagnetic plane wave. The world sheet of a wavefront $S_0$ in the
3-language is a null hypersurface $S$. The integral curve of a normal vector $K^a$ of $S$ represents the
world line of a photon
Given a monochromatic plane wave, its wave 4-vector is also naturally given
(which is a constant null vector field in $\mathbb{R}^4$); however, we can see from (6.6.37) that
its angular frequency $\omega$ and wave 3-vector $k^a$ will depend on the inertial frame we
choose. That is, $K^a$ is absolute, while $\omega$ and $k^a$ are relative. Similarly, the $K^a$ at
any point $p$ can also be decomposed in terms of an arbitrary instantaneous observer
$(p, Z^a)$ as
$$K^a = \omega Z^a + k^a, \tag{6.6.41}$$
where
$$\omega = -K^a Z_a \tag{6.6.42}$$
and $k^a$ can be interpreted as the angular frequency and the wave 3-vector measured
by this observer, respectively. From the fact that the wave 4-vector $K^a$ is null, i.e.,
$K^a K_a = 0$, we can easily see the following relation between $\omega$ and $k^a$:
$$\omega^2 = k^a k_a = k^2. \tag{6.6.43}$$
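The observer dependence of $\omega$ and $k^a$ in (6.6.41) to (6.6.43) can be made concrete with a short computation. The sketch below decomposes one fixed null wave 4-vector relative to two observers, one at rest and one moving along the propagation direction with an arbitrary sample speed $v = 0.6$, and checks that $\omega^2 = k^a k_a$ and $k^a Z_a = 0$ for the moving observer. Function names and values are illustrative.

```python
import math

def minkowski_dot(X, Y):
    """eta_ab X^a Y^b with signature (-, +, +, +)."""
    return -X[0] * Y[0] + sum(x * y for x, y in zip(X[1:], Y[1:]))

def decompose_wave_vector(K, Z):
    """Return (omega, k^a) with omega = -K^a Z_a and k^a = K^a - omega Z^a."""
    omega = -minkowski_dot(K, Z)
    k = [K[a] - omega * Z[a] for a in range(4)]
    return omega, k

omega0 = 2.0
K = [omega0, omega0, 0.0, 0.0]      # null wave 4-vector, wave travelling along +x

# Observer at rest in the chosen inertial frame.
omega_rest, k_rest = decompose_wave_vector(K, [1.0, 0.0, 0.0, 0.0])

# Observer moving along +x with speed v (receding from the source).
v = 0.6
gamma = 1.0 / math.sqrt(1.0 - v * v)
Z_moving = [gamma, gamma * v, 0.0, 0.0]
omega_moving, k_moving = decompose_wave_vector(K, Z_moving)

k_squared = minkowski_dot(k_moving, k_moving)   # k^a k_a
k_dot_Z = minkowski_dot(k_moving, Z_moving)     # vanishes: k^a is spatial for Z^a

print(omega_rest, omega_moving, k_squared, k_dot_Z)
```

For this configuration $\omega' = \gamma(1 - v)\,\omega_0 = \omega_0\sqrt{(1 - v)/(1 + v)}$, i.e., half the rest-frame value when $v = 0.6$, which is the familiar longitudinal Doppler shift.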
and stipulate that the world line of the photon is a null geodesic such that its affine
parameter $\beta$ satisfies
$$P^a = (\partial/\partial\beta)^a. \tag{6.6.45}$$
Therefore, the world lines of the photons coincide with the integral curves of the
wave 4-vector of the corresponding electromagnetic wave. In terms of the 3 + 1
decomposition, we can follow that of a massive particle and define the time and
spatial components of a photon’s 4-momentum as the energy $E$ and the 3-momentum
$p^a$ of the photon, respectively, i.e.,
$$P^a = EZ^a + p^a. \tag{6.6.46}$$
Noticing (6.6.44), we can compare the above equation with (6.6.41) and obtain
$$E = \hbar\omega, \qquad p^a = \hbar k^a, \tag{6.6.47}$$
i.e., the energy $E$ and the 3-momentum $p^a$ of a photon are respectively proportional
to the angular frequency $\omega$ and the wave 3-vector $k^a$ of the corresponding
electromagnetic wave, with a coefficient $\hbar$. From $P^a P_a = 0$ one can easily see that the
energy $E$ and the magnitude $p$ of the 3-momentum $p^a$ have the following simple
relation:
$$E^2 = p^a p_a = p^2. \tag{6.6.48}$$
8 Note that a “photon” in the geometric optics approximation is still a classical concept since there
is no procedure of quantization. A key difference between a QED photon and this classical limit
is that the QED photon is not localizable, whereas the classical counterpart follows a specific ray
path.
$K^a C_a = 0$ . (6.6.50)
This is in fact an equivalent formulation for $A_a$ satisfying the Lorenz gauge condition. Now
let
$C'_a = C_a + \alpha K_a$ ($\alpha$ = constant) , (6.6.51)
then from $K^a K_a = 0$ and (6.6.49) we can easily see that the electromagnetic field $F'_{ab}$
corresponding to $C'_a$ satisfies $F'_{ab} = F_{ab}$, and thus (6.6.51) is just a gauge transformation.
[It follows from $K^a C_a = 0$ that (6.6.51) guarantees $K^a C'_a = 0$, and so it is also a gauge
transformation within the Lorenz gauge condition]. Using the fact that the time component
$K_0$ of $K_a$ is nonvanishing, we can choose $\alpha = -C_0/K_0$ so that $C'_0 = 0$. Thus, one can
always choose a proper gauge and render the polarization vector $C^a$ a spatial vector. Later
on we will assume that $C^a$ is a spatial vector.
Let $Z^a = (\partial/\partial t)^a$ represent the zeroth coordinate basis vector of an inertial frame, then from
$E_a = F_{ab} Z^b$ and $B_a = -{}^*F_{ab} Z^b$ we can derive from (6.6.49) that
[where in the second equality we used the facts that $C^a$ is spatial ($Z^b C_b = 0$) and
$\omega = -Z^b K_b$] and also
$B_a = -{}^*F_{ab} Z^b = -\frac{1}{2} Z^b \varepsilon_{abcd} F^{cd} = \frac{1}{2} \hat\varepsilon_{acd}\, 2 C^{[c} K^{d]} \sin\theta = \hat\varepsilon_{acd} C^c K^d \sin\theta$ ,
where $\hat\varepsilon$ is the volume element associated with the spatial Euclidean metric. The above two
equations can be expressed in terms of “arrows” as
$A_b = \mathrm{Re}(C_b e^{i\theta})$ (Re stands for “take the real part”) , (6.6.55)
and then generalize $C^a$ to a constant complex vector field. This method will provide
us with even richer physics. The previous proof of $K^a C_a = 0$ and the argument that $C^a$ can be
chosen to be a spatial vector field are still valid when $C^a$ is complex, and thus the discussions
and conclusions based on them still hold (including the transverse property of $\vec E$ and $\vec B$).
The key consequence of $C^a$ being complex is that the linearly polarized light is generalized
to elliptically polarized light. Here we will only discuss the electric field $\vec E$ as an example.
Now (6.6.52) should be expressed as
$\vec E = \mathrm{Re}[i\omega \vec C e^{-i(\omega t - k_i x^i)}]$ , (6.6.52′)
$\tan 2\beta = \dfrac{2\vec\mu \cdot \vec\nu}{\mu^2 - \nu^2}$ , (6.6.61)
$\dfrac{E_1^2}{m^2} + \dfrac{E_2^2}{n^2} = 1$ . (6.6.64)
Thus, as time goes on, the end point of the vector $\vec E$ will draw an ellipse in the $xy$-plane, and
therefore $\vec E$ indeed represents elliptically polarized light. When $m = n$, it becomes circularly
polarized light, and when $m$ or $n$ is zero, it goes back to linearly polarized light.
[The End of Optional Reading 6.6.3]
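The ellipse (6.6.64) can be illustrated with a short numerical sketch. It evaluates (6.6.52′) at the spatial origin for one hypothetical complex amplitude $\vec C$ whose real and imaginary parts already lie along the $x$- and $y$-axes, so that no rotation by the angle $\beta$ is needed; the values of `m`, `n` and `omega` and the function name are assumptions made only for this example.

```python
import cmath
import math

def electric_field(C, omega, t):
    """Componentwise E = Re[i*omega*C*exp(-i*omega*t)] at the spatial origin."""
    phase = cmath.exp(-1j * omega * t)
    return tuple((1j * omega * c * phase).real for c in C)

# Hypothetical semi-axes and angular frequency (arbitrary units).
m, n, omega = 2.0, 1.0, 3.0

# Complex amplitude chosen so that E_1 = m*sin(omega*t), E_2 = -n*cos(omega*t).
C = (m / omega + 0j, 1j * n / omega)

period = 2.0 * math.pi / omega
samples = [electric_field(C, omega, period * j / 8) for j in range(8)]

# Left-hand side of (6.6.64) at each sampled time; each value should be 1.
ellipse_values = [(E1 / m) ** 2 + (E2 / n) ** 2 for E1, E2 in samples]
print(ellipse_values)
```

Setting `n = m` in this sketch makes the tip of the field vector trace a circle, and replacing the imaginary component of `C` by zero collapses the curve to a line segment, matching the circular and linear limits described above.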
Let
$\gamma \equiv -V^a U_a$ ,
then
$U^a = \gamma V^a + \gamma u^a$ ,
$\omega' = -(\omega V^a + k^a)(\gamma V_a + \gamma u_a) = \gamma(\omega - k^a u_a)$ .
Suppose the angle between the spatial vectors $k^a$ and $u^a$ is $\theta$. It follows from (6.6.43)
that
$\omega' = \gamma\omega(1 - u\cos\theta)$ . (6.6.65)
This is the quantitative relation of the Doppler effect. If $\theta = 0$, i.e., the observer
moves away from the light source, then (6.6.65) gives
$\omega' = \gamma\omega(1 - u) = \sqrt{\dfrac{1-u}{1+u}}\,\omega < \omega$ , (6.6.66a)
which represents a redshift; if $\theta = \pi$, i.e., the observer moves towards the light
source, then
$\omega' = \gamma\omega(1 + u) = \sqrt{\dfrac{1+u}{1-u}}\,\omega > \omega$ , (6.6.66b)
which represents a blueshift; if $\theta = \pi/2$, i.e., the observer moves transversely, then
the relation of the frequencies is
$\omega' = \gamma\omega$ , (6.6.66c)
which is called the transverse Doppler effect. The above are all the Doppler effects
for a rest light source, following which one can also discuss the Doppler effects for
a rest observer (Exercise 6.20).
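As a quick numerical illustration of (6.6.65) and its three special cases, here is a minimal sketch in units with c = 1. The function name `doppler_shift` and the sample speed u = 0.6 are choices made for this example only.

```python
import math

def doppler_shift(omega, u, theta):
    """Observed angular frequency omega' = gamma*omega*(1 - u*cos(theta)), c = 1.

    omega : angular frequency measured in the rest frame of the light source
    u     : speed of the observer relative to the source, 0 <= u < 1
    theta : angle between the wave 3-vector k^a and the observer's 3-velocity u^a
    """
    if not 0.0 <= u < 1.0:
        raise ValueError("the speed u must satisfy 0 <= u < 1 in units with c = 1")
    gamma = 1.0 / math.sqrt(1.0 - u * u)
    return gamma * omega * (1.0 - u * math.cos(theta))

u = 0.6
receding = doppler_shift(1.0, u, 0.0)            # (6.6.66a), redshift
approaching = doppler_shift(1.0, u, math.pi)     # (6.6.66b), blueshift
transverse = doppler_shift(1.0, u, math.pi / 2)  # (6.6.66c), transverse effect

print(receding, math.sqrt((1 - u) / (1 + u)))
print(approaching, math.sqrt((1 + u) / (1 - u)))
print(transverse, 1.0 / math.sqrt(1.0 - u * u))
```

For u = 0.6 the three printed pairs agree (about 0.5, 2.0 and 1.25 times the emitted frequency, respectively), confirming that the closed forms in (6.6.66a) to (6.6.66c) are just (6.6.65) evaluated at θ = 0, π and π/2.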
Exercises
˜6.1. The relative speed between two inertial observers is u = 0.6c. Both of their
clocks C and $C'$ are zeroed when they meet each other. Use a spacetime
diagram to discuss the following questions: (a) In the inertial reference frame
of C (according to its judgement of simultaneity), what is the reading of $C'$
when the reading of C is 5 µs? (b) When the reading of C is 5 µs, what is the
actual reading of $C'$ seen by the observer carrying C?
˜6.2. A celestial object is moving away from us with a constant speed 0.8c straight
forward. The light flash it radiates has a period of 5 days when detected by
us. Using a spacetime diagram, find the period of the light flash measured by
an observer on that celestial object.
˜6.3. Denote the arc length of the segments oa and oe in Fig. 6.20 as τ and τ ,
respectively. (a) Express τ /τ in terms of the relative speed of the two clocks.
(b) Find the value of τ /τ in the cases where u = 0.6c and u = 0.8c.
6.4. Three inertial point masses A, B and C are aligned and moving along a
straight line (see Fig. 6.42) with relative speeds $u_{BA} = 0.6c$ and $u_{CA} = 0.8c$.
Suppose B thinks (measures) that C moves 60 m. Make a spacetime diagram
and find the time of this process measured by A.
˜6.5. A and B are two inertial observers in the same inertial frame that are emitting
neutrons toward each other. Each neutron leaves its neutron source at a relative
speed of 0.6c. Suppose the emission rate of the source B measured by B is
$10^4\ \mathrm{s}^{-1}$ (i.e., $10^4$ per second). Using a spacetime diagram, find the emission
rate of the source B measured in the reference frame of a neutron emitted by
A (according to the neutron’s standard clock).
˜6.6. The mean lifetime of rest muons is $\tau_0 = 2 \times 10^{-6}$ s. A muon produced by
cosmic rays is traveling down with a constant speed 0.995c relative to the
Earth. Using a spacetime diagram, find (a) the mean lifetime of the muon
measured by an Earth observer; (b) the distance that muon travels within its
lifetime measured by an Earth observer.
6.7. From the perspective of an inertial frame R, two standard clocks $C_1$ and $C_2$
at a place A start to move together with a constant speed v = 0.6c after being
zeroed. Both of the clocks arrive at another place B when their reading is 1 s.
$C_1$ turns back to A with a constant speed v right after it arrives at B, while
$C_2$ stays at B for 1 s (according to its reading) and then gets back to A with a
constant speed v. There is another clock $C_3$ staying at A all the time, which
is also zeroed at the time when $C_1$ and $C_2$ leave A. (a) Sketch the world lines
of $C_1$, $C_2$ and $C_3$. (b) Find the readings $\tau_1$, $\tau_2$ and $\tau_3$ of these three clocks
when $C_2$ gets back to A.
˜6.8. (Multiple choice). A pair of twins A and B stand still at the same spatial point
in an inertial frame R. At some moment when A and B are the same age, A
starts to move eastward in inertial motion with a speed u relative to
the frame R. A while later, B also moves eastward and catches up with A at a
speed v > u. When they meet each other again, A will be
(1) older than B, (2) younger than B, (3) the same age as B.
˜6.9. Two standard clocks A and B stand still at the same spatial point in an
inertial frame. At some moment, A starts to move in a straight line with a
speed u = 0.6c. 2 s later (according to the clock A), A turns around and
moves back with a speed u = 0.6c. Both of the clocks are zeroed when they
are separated. (1) Find the readings of both clocks when they meet again. (2)
What is the reading of B viewed by A when A’s reading is 3 s?
˜6.10. The equatorial speed of the Earth’s rotation is about 1600 km/h. A and B are
twins standing on the equator. A flies westward by plane along the equator
for one lap at a speed of 1600 km/h and meets B again when he gets back.
(Ignore the effects of the gravitational fields of the Earth and the Sun. We
will see in Chap. 7 that the existence of gravitational fields corresponds to
a curved spacetime). (a) Sketch the world sheet of the Earth’s surface and
the world lines of A and B (note that the motion of A cancels the Earth’s
rotation, and thus A is the inertial observer). (b) Which one of A and B is
younger? (c) What is their age difference? (Answer: about $10^{-7}$ s). NB: This
experiment was done in 1971 using cesium atomic clocks, not humans,
of course. See Hafele and Keating (1972a; 1972b).
˜6.11. A car whose rest length is l = 5 m moves into a garage with a constant
speed u = 0.6c. The garage has a solid back wall. To simplify the problem,
we assume the information of the car’s front hitting the wall propagates
at the speed of light, and each part of the car will stop once receiving this
information. (a) Suppose the doorman of the garage measures that the reading
of a clock C at the back of the car is zero, find the reading of C when the
back of the car “learns” that the front hits the wall. (b) Find the rest length $\hat{l}$
of the car after it comes to a complete stop. (c) Express the ratio $\hat{l}/l$ in terms
of u.
6.12. Prove Proposition 6.3.4.
˜6.13. Suppose the world line of an observer is a hyperbola G in the $tx$-plane
(see Fig. 6.43), which satisfies $x > 0$ and $x^2 - t^2 = K^2$ ($K$ is a constant).
Find $A^a A_a$, i.e., the squared magnitude of the observer’s 4-acceleration $A^a$.
(The result is a constant, and thus G is called an observer undergoing constant
acceleration motion. Note that the acceleration here refers to the
4-acceleration).
˜6.14. Prove Proposition 6.6.2.
*6.15. Suppose the electric field and the magnetic field measured from $F_{ab}$ by an
instantaneous observer are respectively $E^a$ and $B^a$ (also denoted by $\vec E$ and
$\vec B$). Show that:
(1) $F_{ab} F^{ab} = 2(B^2 - E^2)$,
(2) $F_{ab}\,{}^*F^{ab} = 4\vec E \cdot \vec B$. Hint: one may write $F_{ab}\,{}^*F^{ab}$ as the expression for
the components in terms of an inertial coordinate system.
NB: this problem indicates that, although $\vec E$ and $\vec B$ are observer-dependent,
$B^2 - E^2$ and $\vec E \cdot \vec B$ are independent of the observer. In fact, these are the
only two independent invariants one can construct from $F_{ab}$.
˜6.16. Prove Proposition 6.6.5 (one only needs to prove the last two Maxwell’s
equations).
˜6.17. Show that the energy density and the 3-momentum density of an electromag-
netic field measured by an instantaneous observer are respectively T00 =
References
The principle of relativity requires that the laws of physics have the same mathemat-
ical expression in all inertial coordinate systems. When applied to special relativity,
this “law of laws” requires that the mathematical expressions for the laws of physics
be Lorentz covariant. Therefore, when formulating physics in the framework of
special relativity, all the known laws of physics should be inspected; those that sat-
isfy this requirement remain laws, while those that do not must be reformed until
they meet this criterion. First, we inspect Maxwell’s theory of electromagnetism.
Maxwell’s equations are endowed with Lorentz covariance (which can be seen more
explicitly in its 4-dimensional formulation, see Sect. 6.6), and thus can be integrated
into the framework of special relativity without being reformed. This is in fact not
strange at all, since one of the important reasons special relativity came about is that
Maxwell’s theory contradicts the notion of pre-relativity spacetime. Next, we will
inspect Newton’s laws of motion. As an example, consider the law of conservation
of momentum. As we pointed out at the beginning of Sect. 6.3, if the definition of
momentum $\vec p = m\vec u$ is still used, then conservation of momentum violates Lorentz
covariance and must be modified. By redefining momentum as $\vec p = m\vec u\,(1 - u^2)^{-1/2}$,
the law of conservation of momentum is now Lorentz covariant, making it a valid law
in the framework of special relativity. Thirdly, let us inspect Newton’s theory of universal
gravity. The basic equation in Newton’s theory of gravity is Poisson’s equation
$\nabla^2 \phi = 4\pi\rho$, which indicates the relation between the gravitational potential $\phi$ and
the mass density $\rho$.¹ This equation has Galilean covariance but not Lorentz covariance,
and hence should be modified. From another perspective, Poisson’s equation
$\nabla^2 \phi = 4\pi\rho$ has a solution of the following form:
1 In Chap. 6, we used ρ and μ to represent the charge density and mass density, respectively. From
this chapter on, since the charge density will show up less frequently, we will follow the convention
of the majority and use ρ to represent the mass density.
© Science Press 2023
C. Liang and B. Zhou, Differential Geometry and General Relativity,
Graduate Texts in Physics, https://doi.org/10.1007/978-981-99-0022-0_7
240 7 Foundations of General Relativity
$\phi(\vec r, t) = -\displaystyle\int \dfrac{\rho(\vec r\,', t)}{|\vec r - \vec r\,'|}\, \mathrm{d}V'$ ,
which indicates that the gravitational potential $\phi$ at a point $\vec r$ and a time $t$ is determined
by the mass density $\rho$ at all spatial points at $t$. This means that the gravitational field
has an infinite speed of propagation, which obviously contradicts special relativity.
Thus, Newton’s theory of gravity must be modified.
The form of Newton’s law of universal gravity is quite similar to Coulomb’s law
of electrostatics. Since James Clerk Maxwell reformulated and generalized electro-
statics to such a beautiful theory of electromagnetism, it does not seem that it should
be difficult to reformulate Newton’s theory of gravity into a theory that fits in the
framework of special relativity. However, the situation is much more complicated
than this. The key point is, although the law of universal gravity and Coulomb’s law
are similar, there exists a “sign difference”. There are two types of electric charge
(positive and negative; like charges repel, while opposite charges attract); however,
masses can only be positive, and hence can only attract other masses. Following the
theory of electromagnetism, one might construct a gravitational theory within the
framework of special relativity, and according to this theory there will be gravita-
tional waves similar to electromagnetic waves when the gravitational field changes,
which also propagate at the speed of light. Unfortunately, due to the sign difference
we just mentioned, the energy carried away by such a gravitational wave has to be
negative. This means that the energy of a system will increase when radiating gravi-
tational waves, which will result in the intensity of the radiation increasing, bringing
more energy into the system. This cycle inevitably leads to physically absurd consequences.
Although this difficulty can be overcome by modifying the theory, new
difficulties will show up. In fact, there exists far more than one gravitational theory in the
framework of special relativity; however, each theory has its own problems. Although
one cannot completely rule out the possibility of building a satisfying gravitational
theory in the framework of special relativity, Albert Einstein struck out on his own
and successfully created a revolutionary gravitational theory independent of special
relativity; this brand new theory is named general relativity. Interestingly, after having
tried to modify a gravitational theory in the framework of special relativity in
order to overcome its difficulties, what people obtained at last is a theory exactly the
same as Einstein’s general relativity!
There are two important factors that motivated Einstein to set up general relativ-
ity: the “universality” of gravity and Mach’s principle. Here we will only introduce
the former one. The meaning of the “universality” of Newtonian gravity is twofold:
① Every massive object exerts forces on other massive objects as a source of the
gravitational field, and any massive object in a gravitational field will in return expe-
rience a gravitational force. (A neutral object in an electrostatic field neither exerts
nor experiences any electric force, and hence the electric force is not universal). ②
Any two objects with the same initial position and velocity experiencing only a grav-
itational force must have the same position and velocity as one another at any given
moment, regardless of their mass and composition. This conclusion has been verified
by numerous increasingly precise experiments, which can be expressed as: any two
7.1 Gravity and Spacetime Geometry 241
point masses at the same point in a gravitational field have the same gravitational
acceleration. Although this is not a surprising conclusion at all, why is this so? Two
point charges in an electrostatic field are not like this. Suppose the mass of a point
charge $q$ is $m$, located at a place where the electric field is $\vec E$; then the electric force
acting on it is $\vec f = q\vec E$, and the acceleration it acquires is
$\vec a = \dfrac{\vec f}{m} = \dfrac{q}{m}\vec E$ . (7.1.1)
If we place another point charge $q'$ with a mass $m'$ at this same point, then its
acceleration will be $\vec a\,' = (q'/m')\vec E$. $\vec a$ and $\vec a\,'$ are not equal unless they have the same
charge-to-mass ratio. When having a similar discussion about gravity, we may also
distinguish the “mass” and the “charge”. The “charge” of a point mass is a measure
of the amount of matter it contains, which determines the force it experiences in
a gravitational field, and thus can be called the gravitational mass, denoted by
$m_G$; the “mass” of a point mass is a measure of its inertia, which determines its
acceleration when a force is applied, and thus can be called the inertial mass, denoted
by $m_I$ [i.e., the $m$ in (7.1.1)].² Following the discussion above it is not difficult
to determine that the gravitational acceleration of a point mass in a gravitational
field is $\vec a = (m_G/m_I)\vec g$, where $\vec g$ is the gravitational field strength at this point. If
different point masses have different mass-to-charge ratios, then they cannot have the
same gravitational acceleration at the same point in a gravitational field. However,
countless experiments, each one more precise than the last, have shown that the ratio
$m_G/m_I$ is the same for any point mass; by adjusting the gravitational constant $G$ one
can even set the ratio to 1 and make it as simple as $m_G = m_I$. This fact is usually
called the equivalence principle (see Sect. 7.5 for details). This is an extremely
unusual experimental fact which deserves serious consideration. The “charge” and
“mass” for gravity are two completely different concepts, so how could they be
equal? This question cannot be answered by Newton’s theory of gravity. In Newton’s
theory of gravity, this is admitted as an experimental fact (it is an axiom in Newton’s
formalism). Is $m_G = m_I$ just a coincidence? Could there be any deeper reason hiding
underneath this fact? Could there exist a theory that is more beautiful, in which
$m_G = m_I$ can be proved by reasoning? Pondering over the equivalence principle,
in addition to the inspiration from Mach’s principle, led Einstein to the creation of
general relativity.
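The contrast between the two force laws can be made concrete with a small sketch. It treats a single field component in arbitrary but consistent units; the function names and all numerical values are hypothetical and serve only to illustrate (7.1.1) and the relation a = (m_G/m_I) g.

```python
def acceleration_in_electric_field(q, m, E):
    """a = (q/m) E, cf. Eq. (7.1.1); one field component, consistent units."""
    return q / m * E

def acceleration_in_gravitational_field(m_G, m_I, g):
    """a = (m_G/m_I) g for a point mass with gravitational mass m_G, inertial mass m_I."""
    return m_G / m_I * g

# Two charges with different charge-to-mass ratios in the same electric field.
a_charge_1 = acceleration_in_electric_field(q=1.0, m=1.0, E=2.0)
a_charge_2 = acceleration_in_electric_field(q=1.0, m=4.0, E=2.0)

# Two point masses of different size, both with m_G = m_I, in the same field g.
a_mass_1 = acceleration_in_gravitational_field(m_G=1.0, m_I=1.0, g=9.8)
a_mass_2 = acceleration_in_gravitational_field(m_G=4.0, m_I=4.0, g=9.8)

print(a_charge_1, a_charge_2)  # the two charges accelerate differently
print(a_mass_1, a_mass_2)      # the two masses share one acceleration
```

The electric accelerations differ because q/m differs between the two charges, whereas the gravitational accelerations coincide as soon as the ratio m_G/m_I is universal, whatever the individual masses are.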
The fact that $m_G = m_I$ is equivalent to the fact that all the objects in a gravitational
field that experience no force other than gravity and have the same position and
velocity will “march together”. This kind of characterless collective behavior strongly
implies that gravity is an intrinsic property of the whole spacetime background, which
2 Up to now, we have been discussing in terms of Newtonian gravity. In Newton’s theory of gravity,
there are two types of gravitational mass: active and passive. The former refers to the mass of an
object as a source of its gravitational field, which determines the strength of the gravitational field
it produces; the latter refers to the gravitational mass of the object as a test point mass in an external
gravitational field, which determines the strength of the gravitational force it experiences in a given
gravitational field. The gravitational mass in the main text refers to the passive gravitational mass.
is substantially different from all the other forces. Physics is the study of the motion
(evolution) of physical objects. Physical objects can be compared to actors. Just like
the performance of actors cannot be done without a stage, the evolution of physical
objects also always happens on some kind of stage (or background), and this stage
(background) is spacetime. Before general relativity came out, people used to assume
that the background spacetime of relativity is Minkowski spacetime. Minkowski
spacetime is so simple that people often forgot that it exists. The “marching together”
phenomenon in the gravitational field attracted Einstein’s attention to the spacetime
background. Just like the actors on a lifting stage can be raised simultaneously without
any effort due to the behavior of the stage itself, this “marching together” phenomenon
under the gravitational force rather strongly implies that gravity is purely an effect
of spacetime background. One may speculate as follows: when gravity is negligible,
then the spacetime is flat; when gravity is non-negligible (e.g., when the gravitational
field of the Earth or the Sun must be considered), the spacetime becomes curved,
and how it is curved depends on the distribution of the matter which produces the
gravitational field. According to this hypothesis, gravity is so distinct from other
forces that in the 4-dimensional language it is not even a force, but the effect of
curved spacetime! Therefore, a point mass that experiences no force other than gravity
should be called a free point mass. Recalling that the world line of a free point mass in
Minkowski spacetime is a geodesic, one can further assume naturally that the world
line of a free point mass in curved spacetime is also a geodesic (of that spacetime).3
A free point mass is the simplest point mass and a geodesic is the simplest world
line, and thus this assumption is also in conformity with aesthetic principles. Instead
of a 4-dimensional force called “gravity” being exerted on a point mass, the existence of
gravity is manifested by a curved spacetime, which changes the motion of a point
mass by changing its geodesic. This is the most basic postulate of general relativity.
Based on this postulate, one can deduce $m_G = m_I$ as a logical consequence. (Here
we come to the decisive step). Suppose two free point masses have the same initial
velocity and position, i.e., their world lines intersect and their tangent vectors are
equal at the intersection. Since the world line of a free point mass is a geodesic,
which is uniquely determined by the initial conditions, i.e., the starting point of the
geodesic and the tangent vector there (see Theorem 3.3.4), these two world lines must
coincide. Translating to the language of physics, this is to say that the states of two
free point masses with the same initial condition in a gravitational field must be the
same at any later time, which is exactly an equivalent expression for $m_G = m_I$. Thus,
once one realizes that gravity in essence is the curvature of 4-dimensional spacetime,
the experimental fact $m_G = m_I$, which was long mysterious in origin, is now a very
natural conclusion. In its unique and elegant way, general relativity interprets gravity
as a geometric effect of a 4-dimensional spacetime for the first time (which also
unifies gravity and geometry for the first time), and the key to success is adding
the time dimension. Using 3-dimensional space alone, one cannot interpret
gravity as a geometric effect.
3 The gravitational field produced by the point mass is ignored (similar to the treatment of a test
charge in electromagnetism).
Remark 1 Optional Reading 8.3.2 will provide a more specific interpretation for the
statement “gravity is an effect of curved spacetime” in detail.
$F^a = U^b \partial_b P^a$ , (7.1.2)
where $\partial_b$ is the derivative operator associated with the Minkowski metric $\eta_{ab}$. When
gravity exists, a natural assumption is to change the $\partial_b$ in the above equation to
the derivative operator $\nabla_b$ associated with the curved metric $g_{ab}$, and to regard the
4-force on a free point mass as vanishing. Hence, its equation of motion is
$0 = U^b \nabla_b (mU^a) = mU^b \nabla_b U^a$ , (7.1.3)
and thus a free point mass moves along a geodesic. This proposition is very similar to
the corresponding proposition without gravity; the only difference is that when gravity
does not exist, the world line of a free point mass is a geodesic of Minkowski space,
whereas when gravity exists, the world line of a free point mass is a geodesic in curved
spacetime. This is exactly a manifestation of the fact that general relativity is independent
of special relativity. In general relativity, gravity is not represented by a 4-force on
the left-hand side of the equation of motion (7.1.2), but its effect on the motion of
a point mass is manifested by making the spacetime curved and requiring the point
mass to move along a geodesic in the curved spacetime. In other words, the effect of
gravity is substituting $\nabla_b$ for $\partial_b$ on the right-hand side of (7.1.2).
(c) The way that the spacetime is curved is affected by the matter distribution. The
specific relation is described by Einstein’s equations. [For details, see Sect. 7.7; once
we have Einstein’s equations, it will be clear that (b) is not an independent postulate
any more].
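For readers who prefer coordinates, the equation of motion (7.1.3) of a free point mass can be written out in components. The following block is a sketch in standard notation that is assumed here rather than quoted from the text above: $x^\mu(\tau)$ are the coordinates of the world line in an arbitrary coordinate system, $\tau$ is the proper time, and $\Gamma^\mu{}_{\nu\sigma}$ are the Christoffel symbols of $g_{ab}$ in that coordinate system.

```latex
U^\mu = \frac{\mathrm{d}x^\mu}{\mathrm{d}\tau}, \qquad
U^b \nabla_b U^a = 0
\;\Longleftrightarrow\;
\frac{\mathrm{d}^2 x^\mu}{\mathrm{d}\tau^2}
+ \Gamma^\mu{}_{\nu\sigma}\,
  \frac{\mathrm{d}x^\nu}{\mathrm{d}\tau}\,
  \frac{\mathrm{d}x^\sigma}{\mathrm{d}\tau} = 0 .
```

In an inertial coordinate system of Minkowski spacetime all the $\Gamma^\mu{}_{\nu\sigma}$ vanish and the equation reduces to $\mathrm{d}^2 x^\mu/\mathrm{d}\tau^2 = 0$, i.e., a straight world line; in curved spacetime the Christoffel terms carry the entire effect of gravity on the motion.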
It can be proved that when gravity is weak enough, and the velocity of the point
mass is low enough, the calculation results of general relativity agree with those of
Newtonian mechanics approximately. Thus, Newtonian mechanics can be regarded as
the weak-field and low-speed approximation of general relativistic mechanics (see Sect.
7.8.2). Nonetheless, we should point out that, although the results are approximately the
same, the viewpoints are explicitly different. Take the free fall of an apple as an example.
According to Newtonian mechanics, this apple acquires an acceleration because
it experiences the Earth’s gravity, and thus undergoes a non-inertial motion. However,
according to general relativity, the apple does not experience a 4-force, and thus is a
free point mass. The effect from the Earth is that the spacetime becomes curved, and
the world line of the apple is a geodesic in this curved spacetime, whose 4-acceleration
(defined as $A^a \equiv U^b \nabla_b U^a$, where $U^a$ is the 4-velocity and $\nabla_b$ is the derivative operator
associated with the metric of the curved spacetime) is zero. That is, for the same motion
of the apple’s free fall, in Newton’s theory it has a (3-dimensional) acceleration (relative
to an inertial frame), while in general relativity it does not have any (4-dimensional)
acceleration. Conversely, now suppose the apple is at rest on the ground. In Newton’s
theory, the Earth’s gravity is canceled by the normal force from the ground, and thus the
apple remains at rest with zero (3-dimensional) acceleration and undergoes an inertial
motion; while in general relativity, the apple only experiences one 4-dimensional
force (the normal force from the ground), and thus its world line is not a geodesic and
its (4-dimensional) acceleration is nonzero. Have you realized that while you sit cosily
reading this book, your 4-dimensional acceleration is not zero due to the curved spacetime
caused by the Earth?
Attributing gravity to curved spacetime is a great triumph of human wisdom.
Bernhard Riemann presented the concept of the intrinsic curvature as well as how
to compute it when he was only 28 years old (in 1854). Before his early death (at
age 40), Riemann had attempted to find a theory that unifies electromagnetism and
gravity. The most important reason that it did not work out is that he focused on
space and the spatial curvature rather than spacetime and the spacetime curvature. It
was not until 1905 when special relativity came out that space and time were treated
equally (in fact, it was not until 1908 when Hermann Minkowski brought up the
absolute concept of spacetime, see Chap. 14 in Volume II). Finally, a few years after
that, the groundbreaking idea that “gravity in essence is the curvature of spacetime”
was gradually established along with Einstein’s conception of general relativity.
In the view of general relativity, every physical phenomenon is nothing but the evolution
of physical objects in some curved spacetime background $(M, g_{ab})$. Therefore, to
study physics from the viewpoint of general relativity, one first needs to find the evolution
equations of those physical objects on the given curved spacetime background.
Since the gravitational field in practical life or in a laboratory is too weak, the difference
between general relativity and Newton’s theory of gravity is normally hard to
7.2 Physical Laws in Curved Spacetime 245
measure, and it is hopeless to deduce the physical laws in curved spacetime from
observations or experiments. Therefore, one can only “guess” these laws by making
hypotheses based on some fundamental principles, and the validity of the hypotheses
can be verified by the consistency of the conclusions derived from them as well as,
if possible, the results of the experiments. Of course, this “guess” is warranted, and
one of the important bases is the principle of general covariance. When formulating
general relativity, Einstein proposed the following principle of general covariance:
the mathematical expressions for all physical laws do not change under an arbitrary
coordinate transformation. However, an article by E. Kretschmann in 1917 argued
that this formulation for the principle of general covariance imposes no restriction
on the laws of physics. Even Newton’s equation of motion can be made generally
covariant by a non-substantive reformulation [see Ohanian and Ruffini (1994)]. This
criticism triggered a heated discussion among physicists (including Einstein him-
self), and thus many different formulations for the principle of general covariance
were raised. Here we introduce a formulation as follows that not only grasps the
essence but is also convenient to apply [see Wald (1984) pp. 57, 68]:
the formulation for the principle of general covariance by Einstein). In contrast, the
Christoffel symbols $\Gamma^\sigma{}_{\mu\nu}$ do not obey the tensor transformation law, which means
an equation that contains Christoffel symbols is not an equality of tensors, and thus
is not generally covariant. However, in textbooks that use abstract indices, even the
Christoffel symbol $\Gamma^c{}_{ab}$ is regarded as a tensor (associated with a coordinate system);
the same holds for $\partial_a v^b$, the result of $\partial_a$ of a coordinate system acting on a vector
field $v^a$. The equations that contain $\Gamma^c{}_{ab}$ and $\partial_a v^b$ are still to be viewed as equalities
of tensors. The reason why they are not generally covariant is that they do not
satisfy the formulation for the principle of general covariance we introduced above,
since they contain quantities not derivable from $g_{ab}$, i.e., $\Gamma^c{}_{ab}$ and $\partial_a v^b$, which puts
the coordinate system corresponding to $\Gamma^c{}_{ab}$ and $\partial_a v^b$ in a special position. In a word,
both kinds of textbooks say that an equation containing Christoffel symbols is not
generally covariant, but their reasons are different (due to different formulations of
the principle of general covariance).
Based on the discussion above, we can put forward two principles that the physical
laws in curved spacetime must obey: (a) the principle of general covariance; (b) when
gab equals the Minkowski metric ηab , they should go back to the physical laws in
special relativity.4 Although these two necessary criteria cannot uniquely determine
the physical laws in curved spacetime, one can use them as guidance, together with
physical and aesthetic considerations, to acquire the physical laws naturally in many
cases. Since the difference between general relativity and special relativity is nothing
but the difference between the spacetime background [i.e., between (M, gab ) and
(R4 , ηab )], the 4-dimensional description of physical objects in special relativity
can be naturally generalized to general relativity. For instance, the world lines of
point masses and photons are still timelike and null curves, respectively (of course,
this actually already generalizes the connotation of “the principle of invariant light
speed” and “point masses must move slower than light” to general relativity); the
proper time of a point mass is still the length of its world line, the 4-velocity $U^a$
of a point mass is still defined as the unit tangent vector of its world line, and the 4-momentum
is still defined as $P^a := mU^a$ ($m$ is the rest mass); the energy of a point
mass relative to an instantaneous observer $(p, Z^a)$ is still defined as $E := -P^a Z_a$,
and an electromagnetic field is still described by a 2-form field $F_{ab}$, etc. In order to
find the physical laws obeyed by these physical quantities, in most of the cases one
only needs to substitute all the ηab and ∂a in the expressions for the corresponding
laws in special relativity with gab and ∇a . This method may be dubbed the “minimal
substitution rule”. It is easy to see that a formula obtained in this manner obeys the
two principles we stated above. Here are some examples of applying this rule: the
4-acceleration of a point mass in curved spacetime is defined as
$A^a := U^b \nabla_b U^a$ , (7.2.1)
and the 4-force on the point mass is defined as
$F^a := U^b \nabla_b P^a$ . (7.2.2)
4 Principle (a) is put in the same way in all textbooks (although the formulation for the principle of
general covariance may be different); however, there are at least two ways of stating the principle
(b) in different books. The other one is: (b) the equivalence principle. With regard to the effects of
the physical laws being derived, these two ways are equivalent. For details, see Sect. 7.5.
For a free point mass, F a = 0 (gravity is not a 4-force!), and the equation above
becomes U b ∇b U a = 0, i.e., the geodesic equation, which agrees with the basic pos-
tulate (b) of general relativity (see Sect. 7.1). For a point mass in an electromagnetic
field, its equation of motion is then
q F a b U b = U b ∇b P a . (7.2.3)
Note that the effect from the electromagnetic field Fab on the point mass is manifested
on the left-hand side of the equation (as a 4-force q F a b U b ), while the effect from
gravity on the point mass is manifested on the right-hand side of the equation (by
the derivative ∇a not being ∂a ). The equations of motion of the electromagnetic field
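$F_{ab}$ (Maxwell’s equations in curved spacetime) should be
$\nabla^a F_{ab} = -4\pi J_b$ , (7.2.4)
$\nabla_{[a} F_{bc]} = 0$ , (7.2.5)
and the energy-momentum tensor of the electromagnetic field should be
$T_{ab} = \dfrac{1}{4\pi}\left(F_{ac}F_b{}^{c} - \dfrac{1}{4}\, g_{ab}F_{cd}F^{cd}\right)$ . (7.2.6)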
Another important basis for this equation holding in curved spacetime is that it
satisfies ∇ a Tab = −Fbc J c [see Exercise 6.18 (a)], which indicates that the total
energy, momentum and angular momentum of the electromagnetic field and charged
particle field are all conserved (see the end of Sect. 6.6.4). The reader should verify
this equation.
Since (7.2.5) can be expressed as dF = 0, we can at least locally introduce an
electromagnetic 4-potential A such that F = d A, and hence (7.2.4) can be expressed
in terms of A as
− 4π Jb = ∇ a (∇a Ab − ∇b Aa ) = ∇ a ∇a Ab − ∇ a ∇b Aa . (7.2.7)
In special relativity, the second term on the right-hand side of the equation above
is −∂ a ∂b Aa , which can be easily rewritten as −∂b ∂ a Aa , and then using the Lorenz
gauge condition we can express (7.2.7) in special relativity as
$\partial^a \partial_a A_b = -4\pi J_b$ . (6.6.31)
However, now $\nabla_a$ and $\nabla_b$ do not commute. If we want to use the Lorenz condition
$\nabla^a A_a = 0$, we need to rewrite the second term on the right-hand side of (7.2.7) using
(3.4.4) as $-\nabla^a \nabla_b A_a = -\nabla_b \nabla^a A_a - R_b{}^{d} A_d = -R_b{}^{d} A_d$, which turns (7.2.7) into
$\nabla^a \nabla_a A_b - R_b{}^{d} A_d = -4\pi J_b$ . (7.2.8)
Interestingly, if we apply the minimal substitution rule directly to equation (6.6.31)
in special relativity, we have
$\nabla^a \nabla_a A_b = -4\pi J_b$ , (7.2.9)
which is obviously different from (7.2.8). This example indicates that the minimal
substitution rule does not uniquely determine the physical laws in some circum-
stances. More consideration needs to be taken when cases like this are encountered.
For this example, it can be shown that (7.2.8) leads to the law of charge conservation
∇a J a = 0 (Exercise 7.1) while (7.2.9) does not. From this physical consideration,
we choose (7.2.8) as the equation of motion of the 4-potential A. The ambiguity of
this example comes from the non-commutativity of the derivative operators, which is
a problem that all the equations containing second or higher derivatives (with two or
more ∇a acting successively) will encounter when transferred from special relativity
to general relativity. The reader may compare this with the following fact: When
transferred from classical mechanics to quantum mechanics, the non-commutativity
of the operators is also the source of ambiguity.
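The step from (7.2.7) to (7.2.8), and the reason the curvature term is compatible with charge conservation, can be spelled out as follows. This is only a sketch: the first line is the commutation identity quoted above from (3.4.4), with the sign stated there, and the last line is an outline of the idea behind Exercise 7.1 (assuming the Lorenz condition is imposed together with (7.2.8)), not the authors' solution.

```latex
\begin{align*}
\nabla^a\nabla_b A_a
  &= \nabla_b\nabla^a A_a + R_b{}^{d}A_d
  && \text{(non-commutativity of } \nabla_a \text{ and } \nabla_b\text{)} \\
-4\pi J_b
  &= \nabla^a\nabla_a A_b - \nabla_b\nabla^a A_a - R_b{}^{d}A_d
  && \text{(substituting into (7.2.7))} \\
  &= \nabla^a\nabla_a A_b - R_b{}^{d}A_d
  && \text{(Lorenz condition } \nabla^a A_a = 0\text{)} \\
-4\pi\,\nabla^b J_b
  &= \nabla^b\nabla^a F_{ab}
   = \nabla^{[b}\nabla^{a]} F_{ab}
  && \text{(}F_{ab} = \nabla_a A_b - \nabla_b A_a \text{ is antisymmetric)}
\end{align*}
```

The commutator in the last line reduces to contractions of the symmetric Ricci tensor with the antisymmetric $F_{ab}$, each of which vanishes, so $\nabla^b J_b = 0$. Dropping the term $R_b{}^{d}A_d$, as in (7.2.9), breaks the second line and with it this argument.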
[Optional Reading 7.2.1]
For a source-free electromagnetic field, (7.2.8) becomes
$\nabla^a \nabla_a A_b - R_b{}^{d} A_d = 0$ . (7.2.8′)
Inspired by the discussion at the end of Sect. 6.6.5 (before Optional Reading 6.6.5), we
want to consider a wave solution Ab = Cb cos θ of the equation above, which is a product
of the “slowly changing” amplitude Cb and the “rapidly changing” phase factor, and look
for the possibility of applying the geometric optics approximation. The difference between
(7.2.8′) and the corresponding equation $\partial^a \partial_a A_b = 0$ in Minkowski spacetime is that the
former contains the curvature term $R_b{}^{d} A_d$, which needs to be negligible if we want to apply
the geometric optics approximation. Consider three length scales as follows:
(1) The characteristic length $\tilde L$ above which the change of $C_b$ or $K^a \equiv \nabla^a \theta$ is notable;
(2) The length that describes the “magnitude” of the spacetime curvature
$\tilde R \equiv |R_{\mu\nu\sigma\rho}|^{-1/2}$ ,
where $R_{\mu\nu\sigma\rho}$ is a typical component of $R_{abcd}$ in a typical local inertial frame (see Sect. 7.5
for details);
(3) The wavelength $\lambda$ ($\lambda \equiv 2\pi/\omega$, $\omega \equiv -Z^a K_a$) of $A_b$ relative to the local inertial frame
we mentioned above.
If these three satisfy $\lambda \ll \tilde L$ and $\lambda \ll \tilde R$, then both the derivative term $\nabla^a \nabla_a C_b$ and the
curvature term $R_b{}^{d} A_d$ can be neglected, and thus we have approximately
(∇ a θ)∇a θ = 0 . (7.2.10)
Hence, $K^a \equiv \nabla^a \theta$ is still the null normal vector of the null hypersurface $S = \{p \in \mathbb{R}^4 \mid \theta|_p = C\}$
($C$ = constant), the integral curves of $K^a$ are still null geodesics (the proof is similar to
Sect. 6.6.5; note that $\nabla_a$ being torsion free assures that $\nabla_a \nabla_b \theta = \nabla_b \nabla_a \theta$), a light signal still
propagates along a null geodesic, and the angular frequency of the electromagnetic wave
(photon) relative to an observer with a 4-velocity $Z^a$ is still
$\omega = -K^a Z_a$ , (7.2.11)
and so on. Thus, geometric optics approximately holds when $\lambda \ll \tilde L$ and $\lambda \ll \tilde R$. This
approximation is used in many places in this text (such as Sect. 9.2.1 and Sect. 10.2.2).
References for the geometric optics in curved spacetime are: Wald (1984) p. 71; Misner et al.
(1973) Sect. 22.5; Straumann (1984) pp. 100–103.
[The End of Optional Reading 7.2.1]
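To get a feeling for the two conditions of Optional Reading 7.2.1 (wavelength much smaller than both the amplitude scale and the curvature scale), the following sketch estimates the curvature length scale near the Earth. It rests on assumptions that are not in the text: a typical orthonormal-frame component of the Riemann tensor outside a spherical mass M at distance r is taken to be of order GM/(c²r³), the constants are rounded SI values, and the function names and the safety factor `margin` are purely illustrative.

```python
# Physical constants in SI units (rounded values).
G_NEWTON = 6.674e-11      # m^3 kg^-1 s^-2
C_LIGHT = 2.998e8         # m s^-1


def curvature_length_scale(mass_kg, radius_m):
    """Estimate R~ = |R_{mu nu sigma rho}|^(-1/2) outside a spherical mass.

    Assumption of this sketch: a typical component of the Riemann tensor at
    distance r from a mass M is of order G M / (c^2 r^3), in units of m^-2.
    """
    typical_component = G_NEWTON * mass_kg / (C_LIGHT ** 2 * radius_m ** 3)
    return typical_component ** -0.5


def geometric_optics_ok(wavelength_m, amplitude_scale_m, curvature_scale_m,
                        margin=1e3):
    """Return True if lambda << L~ and lambda << R~, each by a factor `margin`."""
    return (wavelength_m * margin < amplitude_scale_m
            and wavelength_m * margin < curvature_scale_m)


# Illustrative numbers (assumptions): visible light near the Earth's surface,
# with an amplitude that varies on a scale of about one metre.
R_tilde = curvature_length_scale(mass_kg=5.972e24, radius_m=6.371e6)
print(f"curvature length scale near the Earth: {R_tilde:.2e} m")
print("geometric optics applicable:", geometric_optics_ok(5.0e-7, 1.0, R_tilde))
```

With these inputs the curvature scale comes out many orders of magnitude larger than optical wavelengths, so in such a situation the condition on the amplitude scale is the more restrictive of the two.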
[Optional Reading 7.2.2]
In the language of differential forms, Maxwell’s equations (7.2.4) and (7.2.5) can be expressed as
$\mathrm{d}\,{}^{*}F = 4\pi\,{}^{*}J$ , (7.2.4′)
$\mathrm{d}F = 0$ , (7.2.5′)
where ${}^{*}F$ is the dual form of $F \equiv F_{ab}$ (see Sect. 5.6), which is still a 2-form, and ${}^{*}J$ is the
dual 3-form of the 1-form $J_a$. The equivalence of (7.2.5′) and (7.2.5) can be seen directly
from the definition of exterior differentiation, while the equivalence of (7.2.4′) and (7.2.4) is a
bit trickier to show. By definition, $(\mathrm{d}\,{}^{*}F)_{fab} = \mathrm{d}_f(\varepsilon_{abcd}F^{cd}/2) = 3\nabla_{[f}(\varepsilon_{ab]cd}F^{cd})/2$.
Contracting the right-hand side with $\varepsilon^{efab}$ yields $3\varepsilon^{efab}\varepsilon_{cdab}(\nabla_f F^{cd})/2 = -3\times 4\,\delta_c{}^{e}\delta_d{}^{f}(\nabla_f F^{cd})/2 = -6\nabla_f F^{ef}$,
and so $\varepsilon^{efab}(\mathrm{d}\,{}^{*}F)_{fab} = 6\nabla_f F^{fe}$. Contracting this equation
again with $\varepsilon_{egcd}$ yields $-(\mathrm{d}\,{}^{*}F)_{gcd} = \varepsilon_{egcd}\nabla_f F^{fe}$. It is not difficult to see from the
definition ${}^{*}J_{gcd} \equiv J^{e}\varepsilon_{egcd}$ that the above equation can be expressed as (7.2.4′) if and only
if (7.2.4) holds. Thus, (7.2.4′) and (7.2.4) are equivalent.
[The End of Optional Reading 7.2.2]
7.3 Fermi-Walker Transport and Non-Rotating Observers
After reading Sect. 7.1, many readers may want to learn more about topics like the
equivalence principles, Einstein’s elevator, local inertial frames, and the relationship
between gravity and an inertial force. To have a precise understanding of these topics
some basic concepts will be necessary. This section will introduce an important one,
namely the concept of a non-rotating observer. (The observer in an Einstein elevator
is not only free-falling, but also non-rotating).
Imagine you are traveling around the world on an airplane. A small arrow is fixed
in front of you, perpendicular to your chest and pointing away from you. At a proper
time τ1 you take a nap; when you wake up at τ2 , the arrow will of course still be
perpendicular to your chest, but the spatial direction in which it points can be different from
that at τ1 , since the motion of the airplane is arbitrary. If the direction has
changed, it is natural to say that the arrow “changed its direction” in Δτ ≡ τ2 − τ1 ,
or that it rotated in Δτ . However, what does “changing its direction” mean? How do
we judge whether the direction has changed or not? This is actually to ask: what is a rotation?
How do we determine whether a rotation occurs? The answer is clear in Newtonian mechanics:
the axis of a gyroscope flywheel (or “a gyroscope axis” for short) represents a fixed
direction [see Sachs and Wu (1977) pp. 50, 52]. If you have a gyroscope in your hand, and
the arrow and the gyroscope axis are parallel at τ1 but not parallel at τ2 , then we can
conclude that the arrow has rotated within Δτ . This criterion can be generalized to
general relativity.
Now we translate this criterion into the 4-dimensional language. Let G(τ ) repre-
sent your world line, then at the time τ1 the arrow is represented by a spatial vector wa
at a point p1 ≡ G(τ1 ) (“spatial” means it is perpendicular to your 4-velocity Z a | p1
at τ1 ). For convenience’s sake, we set the magnitude of wa to 1. As your proper time
flows, the arrow corresponds to a spatial vector field with unit length on the curve
G(τ ). Similarly, if we also represent the direction of the gyroscope axis at each time
using a unit vector, then the gyroscope axis corresponds to another spatial vector field
X a with unit length on G(τ ). The 3-dimensional description we mentioned before
indicates that wa and X a coincide at p1 ≡ G(τ1 ) but do not coincide at p2 ≡ G(τ2 )
(see Fig. 7.1). Since we stipulate X a to represent the non-rotating direction, we say
that wa rotated in Δτ ≡ τ2 − τ1 . To describe the rotating vector field wa on the world
line G(τ ), we should first describe the non-rotating vector field X a , since it is the
criterion for measuring the rotation of wa . As a non-rotating spatial vector field on
G(τ ), what mathematical property does X a have? A natural guess is: X a is a vec-
tor field parallelly transported along G(τ ). However, except for special cases, this
is not a correct guess. The key point is that the vector field parallelly transported
along $G(\tau)$ determined by a spatial vector $X^a|_{p_1}$ at $p_1 \equiv G(\tau_1)$ is not a spatial vector
field in general. [Proof: suppose $X^a$ is parallelly transported along $G(\tau)$, then
$Z^b \nabla_b (X^a Z_a) = X^a Z^b \nabla_b Z_a = X^a A_a$, where $\nabla_a$ is the derivative operator associated
with the spacetime metric $g_{ab}$, and $A^a$ is the 4-acceleration of $G(\tau)$. As long
as $G(\tau)$ is not a geodesic and $X^a$ is not orthogonal to $A^a$, the right-hand side
of the above equation is nonzero. Hence, $X^a Z_a$ is not a constant along $G(\tau)$, and
cannot be everywhere vanishing on $G(\tau)$.] To describe the motion of a non-rotating
spatial vector field X a along G(τ ), E. Fermi (in 1922) and A. G. Walker (in 1923)
introduced a derivative notion along a curve, which is of physical importance and
closely related to, but different from, a covariant derivative. This derivative, dubbed
the Fermi-Walker derivative, is defined as follows:
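Definition 1 Suppose $G(\tau)$ is a timelike curve⁵ (where $\tau$ is the proper time) in the
spacetime $(M, g_{ab})$, and $\mathcal{F}_G(k, l)$⁶ represents the collection of all smooth tensor
fields of type $(k, l)$ along $G(\tau)$. A map $\mathrm{D}_{\mathrm{F}}/\mathrm{d}\tau : \mathcal{F}_G(k, l) \to \mathcal{F}_G(k, l)$ is called a
Fermi-Walker derivative operator (or Fermi derivative for short) if it satisfies the
following conditions:
(a) Linearity;
(b) Leibniz rule;
(c) Commutativity with contraction;
(d) $\dfrac{\mathrm{D}_{\mathrm{F}} f}{\mathrm{d}\tau} = \dfrac{\mathrm{d}f}{\mathrm{d}\tau} \quad \forall f \in \mathcal{F}_G(0, 0)$ ; (7.3.1)
(e) $\dfrac{\mathrm{D}_{\mathrm{F}} v^a}{\mathrm{d}\tau} = \dfrac{\mathrm{D}v^a}{\mathrm{d}\tau} + (A^a Z^b - Z^a A^b)\,v_b \quad \forall v^a \in \mathcal{F}_G(1, 0)$ , (7.3.2)
where $Z^a$ and $A^a$ are the 4-velocity and 4-acceleration of $G(\tau)$, respectively, and $\mathrm{D}v^a/\mathrm{d}\tau \equiv Z^b \nabla_b v^a$.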
Remark 1 Condition (e) stipulates the expression for the Fermi derivative of a vector
field, and combining it with the other conditions yields the results of D F /dτ acting
on an arbitrary tensor field.
5 We only discuss the case where G(τ ) is a non-self-intersecting curve, otherwise one will encounter
causal difficulties (see Chap. 11 in Volume II). In fact, the timelike curves representing observers
in this text are all assumed to be non-self-intersecting curves.
6 Note that we are abusing the notation here, since $\mathcal{F}_M(k, l)$ technically denotes the collection of
all the tensor fields of type $(k, l)$ on the manifold $M$ but some fields in $\mathcal{F}_G(k, l)$ here do not lie on
the curve $G$.
Proof Property (1) can be easily seen from (7.3.2). Property (2) can be easily proved
from (7.3.2) and the definition of Aa (using Aa Z a = 0). The proof for property (3)
is left as Exercise 7.3. The proof for property (4) is as follows:
Remark 2 Property (1) of the Fermi derivative indicates that Fermi transport along
a geodesic is parallel transport; property (2) indicates that the 4-velocity of G(τ ) is
always Fermi transported along G(τ ); from property (4) we can see that DF va /dτ =
0 = DF u a /dτ ⇒ d(gab va u b )/dτ = 0, which can be abbreviated as “Fermi transport
preserves the inner product”, similar to “parallel transport preserves the inner prod-
uct”.
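The statements in Remark 2 can be checked numerically in a simple case. The sketch below is an illustration under assumptions that are not in the text: the spacetime is Minkowski spacetime in Lorentzian coordinates (so that D/dτ reduces to d/dτ on components), the observer is uniformly accelerated with world line (sinh(gτ)/g, cosh(gτ)/g, 0, 0), and the transport equation obtained from (7.3.2) is integrated with a hand-written classical Runge-Kutta scheme. All function names are illustrative.

```python
import math


def eta_dot(u, v):
    """Minkowski inner product with signature (-, +, +, +)."""
    return -u[0] * v[0] + u[1] * v[1] + u[2] * v[2] + u[3] * v[3]


def four_velocity(tau, g):
    return (math.cosh(g * tau), math.sinh(g * tau), 0.0, 0.0)


def four_acceleration(tau, g):
    return (g * math.sinh(g * tau), g * math.cosh(g * tau), 0.0, 0.0)


def fermi_rhs(tau, v, g):
    """dv^a/dtau for a Fermi transported vector, i.e. D_F v^a/dtau = 0.

    From (7.3.2): dv^a/dtau = -(A^a Z^b - Z^a A^b) v_b = -A^a (Z.v) + Z^a (A.v).
    """
    Z, A = four_velocity(tau, g), four_acceleration(tau, g)
    zv, av = eta_dot(Z, v), eta_dot(A, v)
    return tuple(-A[i] * zv + Z[i] * av for i in range(4))


def fermi_transport(v0, g, tau_end, steps=2000):
    """Integrate the Fermi transport equation from tau = 0 with classical RK4."""
    h = tau_end / steps
    tau, v = 0.0, tuple(v0)
    for _ in range(steps):
        k1 = fermi_rhs(tau, v, g)
        k2 = fermi_rhs(tau + h / 2, tuple(v[i] + h / 2 * k1[i] for i in range(4)), g)
        k3 = fermi_rhs(tau + h / 2, tuple(v[i] + h / 2 * k2[i] for i in range(4)), g)
        k4 = fermi_rhs(tau + h, tuple(v[i] + h * k3[i] for i in range(4)), g)
        v = tuple(v[i] + h / 6 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i])
                  for i in range(4))
        tau += h
    return v


g, tau_end = 1.0, 1.5
v0 = (0.0, 1.0, 0.0, 0.0)       # spatial unit vector at tau = 0
v_fermi = fermi_transport(v0, g, tau_end)
v_parallel = v0                 # parallel transport in flat spacetime: constant components
Z_end = four_velocity(tau_end, g)

print("Fermi:    Z.v =", eta_dot(Z_end, v_fermi), " v.v =", eta_dot(v_fermi, v_fermi))
print("Parallel: Z.v =", eta_dot(Z_end, v_parallel))
```

In this setup the Fermi transported vector stays orthogonal to the 4-velocity and keeps unit length, whereas the parallelly transported one acquires a nonzero component along the 4-velocity, in line with the bracketed proof given earlier in this section.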
Proof Omitted. [The reader may refer to Sachs and Wu (1977) p. 51 and the reference
therein].
Remark 3 ① From the fact that Z a is Fermi transported along G(τ ) and that Fermi
transport preserves the inner product, we can see that the vector field va Fermi
transported along G(τ ) determined by a spatial vector va | p ∈ V p is everywhere per-
pendicular to Z a , and hence is a spatial vector field. ② Each basis vector of an
orthonormal tetrad (whose zeroth basis vector equals Z a | p ) at p ∈ G determines a
vector field Fermi transported along G(τ ) based on Proposition 7.3.2, and from the
fact that Fermi transport preserves the inner product we can see that these four vector
fields are orthonormal at each point of the curve. Thus, an orthonormal tetrad at p
uniquely determines an orthonormal tetrad field Fermi transported along G(τ ), in
which the zeroth basis vector field is the tangent vector field Z a along G(τ ).
Fermi transport has an important physical meaning: the necessary and sufficient
condition for a spatial vector field wa with a constant magnitude on a world line
G(τ ) to have no spatial rotation is that wa is Fermi transported along G(τ ), i.e.,
DF wa /dτ = 0 (for the reason see Proposition 7.3.6). Therefore, a gyroscope axis
(which can be viewed as a unit vector) is a spatial vector field Fermi transported along
the world line of the gyroscope. For instance, suppose {t, x, y, z} is a Lorentzian
system of Minkowski spacetime, and G(τ ) is a t-coordinate line of this system,
then the coordinate basis vectors (∂/∂t)a , (∂/∂ x)a , (∂/∂ y)a , (∂/∂z)a are all Fermi
transported along G(τ ), and thus the latter three are non-rotating spatial vector fields
on G(τ ), which physically represent the three axes of the gyroscope (orthogonal to
each other). Conversely, if a spatial vector wa (with a constant magnitude) is not
Fermi transported along G(τ ), then it has a spatial rotation.
In order to introduce Proposition 7.3.6, we first talk about the definition of a spatial
rotation. In Newtonian mechanics, any motion of a rigid body can be decomposed
into a translation and a rotation. Figure 7.2 represents the motion of a rigid body
from a configuration $C_1$ to another configuration $C_2$. This can be done in two steps:
first move to a configuration $C'_2$ by a translation, and then arrive at $C_2$ by a rotation
with respect to a fixed point o (called the “base point”). To describe this rotation,
one can choose another point of the rigid body, whose position turns from $a'$ to $a$
during the rotation. Just as the motion of the base point represents the translation of
the body, the motion of the point $a$ (from $a'$ to $a$) represents the rotation of the body.
Let $\vec{w}(t)$ be the position vector of $a$ relative to $o$, then the rotation of the rigid body is
manifested by $\mathrm{d}\vec{w}(t)/\mathrm{d}t \neq 0$, and thus can be described by the rotation of the vector
$\vec{w}(t)$. More precisely, the vector $\vec{w}(t)$ with one end fixed at $o$ is said to be rotating if
there exists a vector $\vec{\omega}(t)$ such that
$\dfrac{\mathrm{d}\vec{w}(t)}{\mathrm{d}t} = \vec{\omega}(t) \times \vec{w}(t)$ , (7.3.5)
where $\vec{\omega}(t)$ is called the (instantaneous) angular velocity of the rotation. Noticing
that $\mathrm{d}(\vec{w}\cdot\vec{w})/\mathrm{d}t = 2\vec{w}\cdot\mathrm{d}\vec{w}/\mathrm{d}t = 2\vec{w}\cdot(\vec{\omega}\times\vec{w}) = 0$, we can see that a rotation
preserves the magnitude of a vector. From the above definition of a vector’s rotation,
one can prove using Newtonian mechanics that a gyroscope axis (as a unit vector)
is non-rotating, i.e., its $\vec{\omega} = 0$. Hence, a gyroscope axis represents a non-rotating
direction.
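The Newtonian rotation law (7.3.5) and the statement that a rotation preserves the magnitude of a vector can be illustrated with a short numerical integration. This is a minimal sketch: the constant angular velocity, the initial vector and the Runge-Kutta integrator are choices made here for illustration and are not part of the text.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def rotate(w0, omega, t_end, steps=4000):
    """Integrate dw/dt = omega x w, Eq. (7.3.5), with RK4 for constant omega."""
    h = t_end / steps
    w = tuple(w0)
    for _ in range(steps):
        k1 = cross(omega, w)
        k2 = cross(omega, tuple(w[i] + h / 2 * k1[i] for i in range(3)))
        k3 = cross(omega, tuple(w[i] + h / 2 * k2[i] for i in range(3)))
        k4 = cross(omega, tuple(w[i] + h * k3[i] for i in range(3)))
        w = tuple(w[i] + h / 6 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i])
                  for i in range(3))
    return w


omega = (0.0, 0.0, 2.0)     # rotation about the z-axis (illustrative value)
w0 = (1.0, 0.0, 0.0)
# After t = pi/4 the rotation angle |omega| * t equals pi/2.
w_quarter = rotate(w0, omega, t_end=math.pi / 4)
print("w after a quarter turn:", w_quarter)
print("squared magnitude:", dot(w_quarter, w_quarter))
```

The vector initially along the x-axis ends up along the y-axis, and its squared magnitude stays equal to 1 up to integration error.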
To generalize the Newtonian definition above for a rotation of a vector to special
relativity (and then to general relativity), we first rewrite (7.3.5) in terms of the
components in a Cartesian system (or physically called a Galilean system) as
$\dfrac{\mathrm{d}w^i(t)}{\mathrm{d}t} = \varepsilon^{i}{}_{jk}\,\omega^j w^k$ , (7.3.5′)
and imagine that there is an observer G at the base point o (the end of ω). Since o
is at rest relative to an inertial frame, the world line G(τ ) of G should be a geodesic
when carried over to special relativity, and $\vec{w}$ is a spatial vector field $w^a$ on the curve.
Let $\{t, x^i\}$ represent the coordinates of the observer $G$’s inertial frame, then on $G(\tau)$
$\dfrac{\mathrm{d}w^i}{\mathrm{d}\tau} = \varepsilon^{i}{}_{jk}\,\omega^j w^k$ , (7.3.6)
where $w^i$ and $\omega^j$ are the $i$th and $j$th components of $w^a$ and $\omega^a$, respectively, in
the system $\{t, x^i\}$. For any point $p$ on $G(\tau)$, if we lower the index of the angular
velocity vector $\omega^a$ and make it an angular velocity 1-form $\omega_a$ using the induced
metric $h_{ab}$ of $W_p$, and use $\Omega_{ab}$ to represent the dual differential form of $\omega_a$ in $W_p$,
i.e., $\Omega_{ab} \equiv ({}^{*}\omega)_{ab} = \omega^c \varepsilon_{cab}$ (where $\varepsilon_{cab}$ is the volume element associated with $h_{ab}$),
then $\Omega_{ab}$ is called the angular velocity 2-form, using which one can rewrite (7.3.6)
as
$\dfrac{\mathrm{d}w^i}{\mathrm{d}\tau} = -\Omega^{i}{}_{j}\, w^j$ . (7.3.7)
Take an orthonormal spatial triad field $\{(e_i)^a\}$ on the world line such that $(e_3)^a$
is parallel to $\omega^a$, then $\omega^1 = \omega^2 = 0$, $\omega^3 \neq 0$, and so we can say that $w^a$ is rotating
with respect to the axis $(e_3)^a$. On the other hand, from $\Omega_{ab} = \omega^c \varepsilon_{cab}$ we know
that $\{\omega^1 = \omega^2 = 0,\ \omega^3 \neq 0\}$ corresponds to $\{\Omega_{23} = \Omega_{31} = 0,\ \Omega_{12} \neq 0\}$, and hence
one can also say that $w^a$ is rotating in the (1, 2)-plane (generally, a rotation in the
$(i, j)$-plane means that the nonzero components of $\Omega_{ab}$ are $\Omega_{ij}$ and $\Omega_{ji}$). These two
statements are equivalent for a 3-dimensional vector space $W_p$, but the latter one is
more convenient to carry over to 4 dimensions. Now, there is no need to restrict ourselves to
the spatial rotation of a spatial vector field on a geodesic in Minkowski spacetime.
Here we will generalize the definition for the “spacetime rotation” of an arbitrary
vector field on an arbitrary timelike curve in any spacetime.
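The equivalence of the two descriptions of a spatial rotation, by an angular velocity vector and by an angular velocity 2-form, can be checked componentwise in an orthonormal triad. The sketch below assumes a right-handed orthonormal triad in a 3-dimensional Euclidean space, so that upper and lower spatial indices need not be distinguished, and it uses 0-based indices; the helper names are illustrative and not from the text.

```python
def levi_civita(i, j, k):
    """epsilon_{ijk} for indices in {0, 1, 2} (right-handed orthonormal triad)."""
    return (i - j) * (j - k) * (k - i) // 2


def angular_velocity_2form(omega):
    """Omega_{ab} = omega^c epsilon_{cab}, returned as a 3x3 nested list."""
    return [[sum(omega[c] * levi_civita(c, a, b) for c in range(3))
             for b in range(3)]
            for a in range(3)]


def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


omega = (0.0, 0.0, 3.0)     # angular velocity along the third triad vector
w = (1.0, 2.0, 0.5)         # an arbitrary spatial vector (illustrative values)

Omega = angular_velocity_2form(omega)
lhs = tuple(-sum(Omega[i][j] * w[j] for j in range(3)) for i in range(3))  # as in (7.3.7)
rhs = cross(omega, w)                                                      # as in (7.3.5)

print("Omega =", Omega)
print("-Omega_ij w^j =", lhs)
print("omega x w     =", rhs)
```

With the angular velocity along the third axis, the only nonzero components of the 2-form are the (1, 2) and (2, 1) ones, matching the statement that the rotation takes place in the (1, 2)-plane, and the two right-hand sides agree.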
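Suppose $v^a$ is a vector field on a timelike curve $G(\tau)$. If there exists a 2-form $\Omega_{ab}$ on $G(\tau)$ such that
$\dfrac{\mathrm{D}v^a}{\mathrm{d}\tau} = -\Omega^{ab} v_b$ , (7.3.8)
then we say that $v^a$ undergoes a spacetime rotation with an angular velocity $\Omega_{ab}$.
In other words, the angular velocity 2-form for the spacetime rotation of $v^a$ is $\Omega_{ab}$.
If $\mathrm{D}v^a/\mathrm{d}\tau = 0$, then we say $v^a$ has no spacetime rotation.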
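Proposition 7.3.3 Suppose two vector fields $v^a$ and $u^a$ on $G(\tau)$ undergo the same
spacetime rotation $\Omega_{ab}$, then $v^a u_a$ is a constant on $G(\tau)$.
Proof
$\dfrac{\mathrm{D}}{\mathrm{d}\tau}(v^a u_a) = u_a \dfrac{\mathrm{D}v^a}{\mathrm{d}\tau} + v_a \dfrac{\mathrm{D}u^a}{\mathrm{d}\tau} = u_a(-\Omega^{ab} v_b) + v_a(-\Omega^{ab} u_b) = -2\,\Omega^{ab} v_{(a} u_{b)} = 0$ ,
where the last equality follows from the antisymmetry of $\Omega^{ab}$.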
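[Figure: the vectors $Z^a|_p$, $\tilde{Z}^a|_p$ and $A^a|_p$ at the point $p$]
$A^a|_p = \lim\limits_{\Delta\tau \to 0} \dfrac{1}{\Delta\tau}\left(\tilde{Z}^a|_p - Z^a|_p\right)$ ,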
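Proposition 7.3.6 The necessary and sufficient condition for a spatial vector field
$w^a$ with a constant magnitude on the world line $G(\tau)$ of an observer to have no
spatial rotation is that $w^a$ is Fermi transported along $G(\tau)$, i.e., $\mathrm{D}_{\mathrm{F}} w^a/\mathrm{d}\tau = 0$.
Proof Since $w^a$ has a constant magnitude, from the paragraph above Remark 4 we
know that $w^a$ undergoes a spacetime rotation, i.e., there exists an $\Omega_{ab}$ such that
$\mathrm{D}w^a/\mathrm{d}\tau = -\Omega^{ab} w_b$. Combining this with $\hat{\Omega}_{ab} \equiv \Omega_{ab} - \tilde{\Omega}_{ab}$ yields
$\dfrac{\mathrm{D}w^a}{\mathrm{d}\tau} + \tilde{\Omega}^{ab} w_b = -\hat{\Omega}^{ab} w_b$ . (7.3.9)
By (7.3.2), the left-hand side is nothing but $\mathrm{D}_{\mathrm{F}} w^a/\mathrm{d}\tau$ (with $\tilde{\Omega}^{ab} = A^a Z^b - Z^a A^b$), and hence
$\dfrac{\mathrm{D}_{\mathrm{F}} w^a}{\mathrm{d}\tau} = -\hat{\Omega}^{ab} w_b$ . (7.3.10)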
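Since $\hat{\Omega}_{ab}$ represents the spatial rotation of $w^a$, the necessary and sufficient condition
for a spatial vector field $w^a$ with a constant magnitude to have no spatial rotation is
that $\mathrm{D}_{\mathrm{F}} w^a/\mathrm{d}\tau = 0$.
Conversely, suppose $w^a$ has a spatial rotation, let $\omega^a$ be the dual form of $\hat{\Omega}_{ab}$ (the
Hodge dual in the 3-dimensional space $W_p$ of $p \in G$), i.e.,
$\hat{\Omega}_{ab} = \omega^c \varepsilon_{cab}$ . (7.3.11)
Then (7.3.10) can be written as
$\dfrac{\mathrm{D}_{\mathrm{F}} w^a}{\mathrm{d}\tau} = -\varepsilon^{a}{}_{bc}\, w^b \omega^c$ . (7.3.12)
Or, let $\varepsilon_{abcd}$ represent the volume element associated with $g_{ab}$, then (7.3.12) can also
be written using $\varepsilon_{bcd} = Z^a \varepsilon_{abcd}$ as
$g_{ab}\dfrac{\mathrm{D}_{\mathrm{F}} w^b}{\mathrm{d}\tau} = \varepsilon_{abcd}\, Z^b w^c \omega^d$ . (7.3.12′)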
The ωa defined by (7.3.11) is called the spatial angular velocity (or angular velocity
for short) of the spatial vector field wa . That is, a non-Fermi transported spatial vector
field wa can be described by a nonzero spatial angular velocity ωa .
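The passage from (7.3.12) to its 4-dimensional form uses only the relation between the spatial and spacetime volume elements and the total antisymmetry of the latter. A step-by-step sketch, in the notation of the text, reads:

```latex
\begin{align*}
g_{ab}\frac{\mathrm{D}_{\mathrm{F}} w^b}{\mathrm{d}\tau}
  &= -\varepsilon_{abc}\, w^b \omega^c
  && \text{((7.3.12) with the index lowered)} \\
  &= -Z^d \varepsilon_{dabc}\, w^b \omega^c
  && (\varepsilon_{abc} = Z^d \varepsilon_{dabc}) \\
  &= \varepsilon_{adbc}\, Z^d w^b \omega^c
  && \text{(one transposition of indices)} \\
  &= \varepsilon_{abcd}\, Z^b w^c \omega^d
  && \text{(relabeling } d \to b,\ b \to c,\ c \to d\text{)}
\end{align*}
```

Only the dummy indices are relabeled in the last step, so no further sign appears.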
Suppose {(ei )a } is an orthonormal spatial triad field on G(τ ). Since any two basis
vectors are orthogonal, they have a “rigid relationship”, and one can expect that these
three basis vectors have the same spacetime angular velocity $\Omega_{ab}$, and thus have the
same spatial angular velocity $\hat{\Omega}_{ab}$. See the following proposition:
Proposition 7.3.7 The three basis vector fields in any orthonormal spatial triad
field $\{(e_i)^a\}$ on $G(\tau)$ have the same spatial angular velocity $\hat{\Omega}_{ab}$ (no more gauge
freedom).
Proof See Optional Reading 7.3.1.
Remark 5 ① This $\hat{\Omega}_{ab}$ shared by each $(e_i)^a$ is called the angular velocity 2-form
for the spatial rotation of this triad field, and the corresponding $\omega^a$ (satisfying
$\hat{\Omega}_{ab} = \omega^c \varepsilon_{cab}$) is called the spatial angular velocity vector of this triad field. ②
One may ask: suppose $(e_1)^a$ and $(e_2)^a$ rotate with respect to $(e_3)^a$ with an angular
velocity $\omega^a$ [parallel to $(e_3)^a$], then $(e_3)^a$ is non-rotating, and hence has zero
angular velocity. How can one say that these three vectors have the same angular
velocity? The answer is: using the “gauge freedom” (see Remark 4), one can say
that the angular velocity of $(e_3)^a$ is also $\omega^a$ (since a rotation with respect to itself is
equivalent to no rotation), and so there is no contradiction. Thus, we can also see that
the proof of Proposition 7.3.7 requires the use of the gauge freedom. It should be
emphasized that when one finds that a basis vector in a spatial triad field [e.g., $(e_3)^a$]
is non-rotating along a curve, one cannot assert based on Proposition 7.3.7 that the
other two basis vectors are also non-rotating, since they can rotate with respect to $(e_3)^a$.
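$\dfrac{\mathrm{D}w^a}{\mathrm{d}\tau} = -\Omega^{ab} w_b$ . (7.3.13)
Choose an orthonormal tetrad field on $G(\tau)$ such that $(e_0)^a = Z^a$, $(e_1)^a = \alpha w^a$ (where $\alpha$
is the normalization factor), then a necessary and sufficient condition for $\Omega'_{ab} \equiv \Omega_{ab} + \Delta\Omega_{ab}$
to satisfy (7.3.13) is that $\Delta\Omega_{ab}(e_1)^b = 0$. Thus,
$0 = \Delta\Omega_{ab}(e_1)^b = \Delta\Omega_{\mu\nu}(e^\mu)_a (e^\nu)_b (e_1)^b = \Delta\Omega_{\mu 1}(e^\mu)_a = \Delta\Omega_{01}(e^0)_a + \Delta\Omega_{21}(e^2)_a + \Delta\Omega_{31}(e^3)_a$ ,
and hence $\Delta\Omega_{01} = \Delta\Omega_{21} = \Delta\Omega_{31} = 0$. Since $\Delta\Omega_{ab}(e_1)^b = 0$ is the only restriction on $\Delta\Omega_{ab}$, and
there is no restriction on the other 3 components $\Delta\Omega_{02}$, $\Delta\Omega_{03}$ and $\Delta\Omega_{23}$, one can choose $\Delta\Omega_{02}$,
$\Delta\Omega_{03}$ and $\Delta\Omega_{23}$ arbitrarily. This is the gauge freedom of the spacetime angular velocity $\Omega_{ab}$ of
$w^a$.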
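Proof of Proposition 7.3.5 Choose an orthonormal tetrad field such that $(e_0)^a = Z^a$, $(e_1)^a =
\alpha w^a$ (where $\alpha$ is the normalization factor). It follows from
$0 = \dfrac{\mathrm{D}}{\mathrm{d}\tau}(Z^a w_a) = w_a \dfrac{\mathrm{D}Z^a}{\mathrm{d}\tau} + Z_a \dfrac{\mathrm{D}w^a}{\mathrm{d}\tau} = -w_a \tilde{\Omega}^{ab} Z_b - Z_a \Omega^{ab} w_b$
$\quad = (\tilde{\Omega}_{ab} - \Omega_{ab}) Z^a w^b = (\tilde{\Omega}_{ab} - \Omega_{ab})(e_0)^a (e_1)^b \alpha^{-1} = (\tilde{\Omega}_{01} - \Omega_{01})\alpha^{-1}$
that $\Omega_{01} = \tilde{\Omega}_{01}$. Using the gauge freedom of $\Omega_{ab}$ we can let $\Omega_{02} = \tilde{\Omega}_{02}$ and $\Omega_{03} = \tilde{\Omega}_{03}$.
Noticing that $\tilde{\Omega}_{ij} = 0$, we see that $\hat{\Omega}_{ab} \equiv \Omega_{ab} - \tilde{\Omega}_{ab} = \Omega_{ij}(e^i)_a (e^j)_b$ is a pure spatial
rotation.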
7.4 The Proper Coordinate System of an Arbitrary Observer
The tetrad of an observer is only defined on the world line of the observer. In order to
record the events (experimental results) near the world line, one needs to extend this
tetrad in some way and form a coordinate system. We certainly want the coordinate
basis of this system on the world line to coincide with the tetrad of the observer. This
section will introduce a coordinate system which satisfies this requirement and is
quite convenient, called the proper coordinate system of an observer. This system
should be determined by two ingredients of the observer—the world line G(τ ) and the
orthonormal tetrad field on G(τ ). Since we will talk about general observers, G(τ )
is not necessarily a geodesic, and it can have an arbitrary 4-acceleration Âa (the hat
stands for the 4-acceleration of the observer, as distinguished from the 4-acceleration
of a point mass being measured). Also, the orthonormal triad field {(ei )a } is not
necessarily Fermi transported along $G(\tau)$, but can have an arbitrary angular velocity
$\omega^a$. Of course, both $\omega^a$ and $\hat{A}^a$ are spatial vector fields on $G(\tau)$, i.e., $\omega^a Z_a = 0$,
$\hat{A}^a Z_a = 0$. Suppose $\mu(s)$ is an arbitrary spacelike geodesic that starts from $p$ on
G(τ ) and is orthogonal to G(τ ) at p, where s is the affine parameter that is equal
to the arc length, i.e., T a ≡ (∂/∂s)a is the unit tangent vector. Let q be a point near
$G(\tau)$, then there exists a unique spacelike geodesic $\mu(s)$ passing through $q$. [See
Fig. 7.4. If $q$ is far from $G(\tau)$, then there may be more than one such geodesic, or
there may not be any such geodesic. Luckily, the observer $G$ only cares about events
close to themselves.] Suppose the spacelike geodesic $\mu(s)$ passing through $q$ starts
from a point $p = \mu(0)$ on $G$; we would like to define four coordinates (called proper
coordinates) $t, x^1, x^2, x^3$ for $q$ using this geodesic $\mu(s)$. Suppose $V_p$ is the tangent
space of $p$, and $W_p$ is the 3-dimensional subspace in $V_p$ that is orthogonal to $Z^a|_p$,
then $T^a|_p \in W_p$. Denote $T^a|_p$ as $w^a$ for short, and denote its components in $(e_i)^a$ as
$w^i$, then the four proper coordinates of $q$ are defined as
$t(q) := \tau_p$ , $\quad x^i(q) := s_q w^i$ , $\quad i = 1, 2, 3$ , (7.4.1)
where $\tau_p$ is the proper time of $p$ (as a point on $G$), and $s_q$ is the parameter value of
$\mu(s)$ at $q$, namely the arc length of the segment $pq$ on $\mu(s)$. As long as $q$ is near $G(\tau)$,
we can use (7.4.1) to define the coordinates, and thus we obtain the proper coordinate
system {t, x i } of the observer G, whose coordinate patch is an open neighborhood
of G(τ ) [or of a segment of G(τ )]. As the simplest example, we point out that any
Lorentzian coordinate system in 4-dimensional Minkowski spacetime can be viewed
as the proper coordinate system of the inertial observer whose world line is an x 0 -
coordinate line of this system. (Note that the word “inertial” has already required
the triad to be Fermi transported along the curve, which is parallelly transported here).
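As a less trivial illustration of the construction above, the sketch below builds the proper coordinates of a uniformly accelerated observer in Minkowski spacetime. Everything specific in it is an assumption made for this example and not taken from the text: the world line (sinh(gτ)/g, cosh(gτ)/g, 0, 0) in Lorentzian coordinates (T, X, Y, Z), the triad e_1 = (sinh gτ, cosh gτ, 0, 0), e_2 = ∂/∂Y, e_3 = ∂/∂Z, and the function names. Because spacelike geodesics in flat spacetime are straight lines, the event with proper coordinates (t, x^i) is simply G(t) + x^i e_i(t).

```python
import math


def proper_to_lorentz(t, x1, x2, x3, g=1.0):
    """Map proper coordinates (t, x^i) to Lorentzian coordinates (T, X, Y, Z).

    Assumed setup: world line G(tau) = (sinh(g tau)/g, cosh(g tau)/g, 0, 0)
    with triad e_1 = (sinh(g tau), cosh(g tau), 0, 0), e_2 = d/dY, e_3 = d/dZ.
    The event q is reached from p = G(t) along the straight line with tangent
    w^i e_i, so q = G(t) + s w^i e_i = G(t) + x^i e_i.
    """
    rho = 1.0 / g + x1
    return (rho * math.sinh(g * t), rho * math.cosh(g * t), x2, x3)


def lorentz_to_proper(T, X, Y, Z, g=1.0):
    """Inverse map, valid only in the region X > |T| covered by these coordinates."""
    if X <= abs(T):
        raise ValueError("event lies outside the coordinate patch X > |T|")
    t = math.atanh(T / X) / g
    x1 = math.sqrt(X * X - T * T) - 1.0 / g
    return (t, x1, Y, Z)


# Round trip for an event near the world line (illustrative values).
event = proper_to_lorentz(0.7, 0.2, -0.1, 0.3)
back = lorentz_to_proper(*event)
print("Lorentzian coordinates:", event)
print("recovered proper coordinates:", back)

# On the world line itself (x^i = 0) the proper coordinate t is the proper time.
on_line = proper_to_lorentz(0.7, 0.0, 0.0, 0.0)
print("point of the world line at tau = 0.7:", on_line)
```

The restriction X > |T| in the inverse map reflects the remark in the text that the coordinate patch is only a neighborhood of the world line: events too far from the observer are not reached by a unique orthogonal spacelike geodesic.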
Proof Let (e1 )a represent the first basis vector of the orthonormal tetrad at p, and treat
it as the wa we mentioned above. The proper coordinates of each point on the spacelike
geodesic μ1 (s) determined by (e1 )a satisfy x 2 = x 3 = 0, t = τ p , and thus μ1 (s) is
an x 1 -coordinate line. For this curve, w1 = 1 in x 1 (q) = sq w1 , and hence x 1 = s for
each point on the curve. Thus, the coordinate basis (∂/∂ x 1 )a | p = (∂/∂s)a | p = wa =
(e1 )a . In a similar manner we have (∂/∂ x 2 )a | p = (e2 )a and (∂/∂ x 3 )a | p = (e3 )a .
Moreover, it is not difficult to see that G(τ ) is the coordinate line for the proper
coordinate t, and t = τ on this curve, and hence Z a | p = (∂/∂t)a | p . This indicates
that the proper coordinate basis {(∂/∂ x μ )a } coincides with the orthonormal tetrad
{Z a | p , (ei )a | p }. Therefore, the components of gab | p in the proper coordinate system
are gμν | p = ημν .
gμν | p = ημν is a major feature of the proper coordinate system. Of course, this
simple result does not necessarily hold for a point outside G(τ ).
A proper coordinate system has many uses. For example, by means of it one can
define the 3-velocity and 3-acceleration for a point mass.
where x i (t) are the parametric representations for L with t as the parameter in the
proper coordinate system.
$u^a := \dfrac{h^{a}{}_{b}\, U^b}{\gamma}$ , (7.4.4)
Using the proper coordinate system, one can also find another expression for γ ≡
−Z a Ua :
where Proposition 7.4.1 is used in the third equality. It follows from (7.4.5) and
(7.4.6) that h a b U b /γ = (∂/∂ x i )a dx i /dt, and thus (7.4.4) is equivalent to (7.4.2).
The 3-velocity defined above can help deepen the understanding of inertial forces
and Coriolis forces in Newtonian mechanics (and their generalizations in curved
spacetime). According to Newtonian mechanics, Newton’s second law does not hold
when a non-inertial observer G measures the motion of a point mass. To preserve
the form of this law, people introduced the concept of a fictitious force. Suppose the
3-acceleration of $G$ relative to an inertial frame is $\hat{\vec{a}}$. (The hat is added to represent
the 3-acceleration of the observer, in order to distinguish it from the 3-acceleration $\vec{a}$
of the point mass being measured.) When $G$ makes a measurement, if they regard
any point mass $L$ being measured as experiencing an imaginary inertial force $-m\hat{\vec{a}}$
(where $m$ is the mass of the point mass), then the equation of motion of a free point
mass after the inertial force is taken into account is $-m\hat{\vec{a}} = m\vec{a}$, and thus the
3-acceleration of $L$ relative to $G$ is $\vec{a} = -\hat{\vec{a}}$. This can be called the inertial acceleration
of L relative to G which, when multiplied by m, is the inertial force. (We stipulate
that the observer and the world line of the point mass intersect, and the measurement
is made at the intersection). When G is rotating, however, a Coriolis force must be
introduced in addition to the inertial force to preserve the form of Newton’s second
law. Since the phrase “the observer is rotating” may sometimes cause confusion,
it is necessary to discuss this in greater detail.
Consider a large rigid disk which rotates around its own axis. A swivel chair is put
on the edge of the disk, and the chair base is fixed on the disk (but the chair can rotate
around the axis fixed on its base). Due to the rotation of the disk, the observer in the
swivel chair undergoes a circular motion (the world line is a helix), which is a special
case of orbital motion. Of course, the observer in the swivel chair can also rotate with
respect to their own axis. (This motion is unrelated to the shape of the world line; it is
described by the motion of the orthonormal frame attached to the observer along the
world line). Since the observer has been regarded as a point mass, and the motion of
a point mass cannot be decomposed into a rotation and a translation, “the orbital motion
of an observer on a rotating disk is circular motion” is the most accurate way to refer
to this type of motion. However, in our daily life we also often refer to the circular
motion of a point mass as a rotation, which can be easily confused with the rotation of
its frame. Unfortunately, distinguishing orbital motion and a frame rotation happens
to be the key for distinguishing inertial forces and Coriolis forces. Therefore, we refer
to the circular motion (a special case of orbital motion) of the observer caused by
the rotation of the disk and the rotation of the frame realized using a swivel chair as
revolution and rotation, respectively. This is similar to calling the Earth’s (viewed as
a point mass) circular motion around the Sun a revolution, while calling the Earth’s
(now treated as a rigid body) turning around its axis a rotation. Certainly, the word
revolution is not as appropriate as the term orbital motion when the world line of
the observer is not a helix. Later we will see that inertial forces and Coriolis forces
originate from the orbital motion and the rotation of the observer, respectively. Now
let us have a quantitative discussion with an arbitrary spacetime as the background;
in the low speed approximation, the conclusions for Minkowski spacetime agree
7.4 The Proper Coordinate System of an Arbitrary Observer 263
where $(e_i)^a$ is the orthonormal spatial triad of the observer at p, $\varepsilon_{abc} \equiv Z^d \varepsilon_{dabc}$,
$Z^d$ is the 4-velocity of G at p, and $\varepsilon_{abcd}$ is the volume element associated with the
spacetime metric $g_{ab}$.
Now let us discuss the physical meaning of each term on the right-hand side of
(7.4.7). If G is a freely falling non-rotating observer (for Minkowski spacetime this
is an inertial observer), i.e., $\hat A^a = 0$, $\omega^a = 0$, then from (7.4.7) we can see that the
3-acceleration of L as measured by G is $a^a = 0$. Taking Minkowski spacetime as an
example, this indicates nothing but the simple fact that there is only a relative velocity
but no relative acceleration between two point masses undergoing inertial motion. In
contrast, if G is not a freely falling non-rotating observer, then there are the following
three possibilities:
(a) The world line of G is not a geodesic ($\hat A^a \neq 0$), but G is still a non-rotating
observer ($\omega^a = 0$, i.e., its tetrad is Fermi transported along the world line). Now
(7.4.7) becomes
$$a^a = -\hat A^a + 2(\hat A_b u^b)\,u^a . \tag{7.4.8}$$
Let $\hat A$ and $u$ represent the magnitudes of the spatial vectors $\hat A^a$ and $u^a$, and let $\theta$ be
the angle between them. Then the magnitude of the second term on the right-hand
side of the above equation is $2\hat A u^2 |\cos\theta| \le 2\hat A u^2$, and hence the second term can be
neglected under the non-relativistic approximation $u \ll 1$. For Minkowski spacetime,
suppose $G_I$ is the instantaneous rest inertial observer of G at p (see Fig. 7.5), and
$\hat a^a$ is the 3-acceleration of G relative to $G_I$; then it follows from Proposition 6.3.6
that $\hat a^a = \hat A^a$. Since in Newtonian mechanics $-\hat a^a$ is exactly the inertial acceleration
added for a point mass when observed by a non-inertial observer, the first term $-\hat A^a$
on the right-hand side of (7.4.8) can be interpreted as an inertial acceleration, and
the second term is the relativistic correction term for the inertial acceleration (which
vanishes under the Newtonian approximation $u \ll 1$). For curved spacetime, it can be
264 7 Foundations of General Relativity
proved (using Lemma 7.4.3; left as Exercise 7.6) that, as long as we interpret $G_I$ as
the freely falling observer that is at rest relative to G at p, we still have $\hat a^a = \hat A^a$
($\hat a^a$ is the 3-acceleration of G relative to $G_I$), and hence the first and second terms on
the right-hand side of (7.4.8) can still be interpreted as the inertial acceleration and the
corresponding correction term, respectively. In conclusion, the inertial acceleration
is caused by the 4-acceleration $\hat A^a$ of the observer (which depends on its orbital
motion).
(b) The world line of G is a geodesic ($\hat A^a = 0$), but G has a rotation ($\omega^a \neq 0$),
such as a rotating observer in the swivel chair fixed on the floor of a freely falling
spaceship. Now (7.4.7) becomes
$$a^a = -2\,\varepsilon^a{}_{bc}\,\omega^b u^c = 2\,\vec u \times \vec\omega . \tag{7.4.9}$$
This 3-acceleration of the free point mass L relative to G comes completely from
the rotation of the observer ($\omega^a \neq 0$). The right-hand side of the equation above is
the same as the expression for the Coriolis acceleration in Newtonian mechanics,
and hence in curved spacetime is also called the Coriolis acceleration. This clearly
indicates the difference between an inertial acceleration and a Coriolis acceleration:
the former originates from the non-geodesic motion of the observer, while the latter
comes from the rotation of the observer. In the case of a rotating disk, many textbooks
on mechanics assume that the observer on the rotating disk must have a corresponding
rotation due to the revolution, and attribute Coriolis forces to the revolution of the
observer. Actually, the rotation and revolution of the observer on a disk are in principle
independent. Suppose an observer is holding a gyroscope, sitting in a swivel chair
whose base is fixed on the edge of the disk. Then the observer can adjust (“rotate”)
the swivel chair properly and always face the direction indicated by the gyroscope,
and thus is non-rotating while revolving with the disk. In this case, a point mass
being measured will only have an inertial acceleration but no Coriolis acceleration!
(c) The world line of G is not a geodesic ($\hat A^a \neq 0$), and G has a rotation ($\omega^a \neq 0$).
A free point mass observed by G will have both an inertial acceleration and a Coriolis
acceleration.
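Cases (a) to (c) can be illustrated numerically. The following is a minimal sketch, not the text's own method: it assumes that (7.4.7) has the combined form a = −Â − 2 ω × u + 2(Â · u)u (in units with c = 1), which is the form that reduces to (7.4.8) and (7.4.9) and that follows from (7.4.18). All components are taken with respect to the observer's orthonormal spatial triad, and the function names and sample values are hypothetical.

```python
# Sketch only. Assumed form of the 3-acceleration of a free point mass
# measured by the observer G (units with c = 1):
#     a = -A_hat - 2 (omega x u) + 2 (A_hat . u) u
# A_hat: spatial components of the observer's 4-acceleration
# omega: spatial angular velocity of the observer's triad
# u:     3-velocity of the point mass relative to the observer
# Names and numbers below are illustrative, not taken from the text.

def dot(a, b):
    """Euclidean inner product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    """Cross product a x b of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def three_acceleration(A_hat, omega, u):
    """Return the tuple (a_x, a_y, a_z) under the assumed formula."""
    inertial = tuple(-A for A in A_hat)                       # -A_hat
    coriolis = tuple(-2.0 * c for c in cross(omega, u))       # -2 omega x u
    correction = tuple(2.0 * dot(A_hat, u) * ui for ui in u)  # 2 (A_hat.u) u
    return tuple(i + c + r for i, c, r in zip(inertial, coriolis, correction))

u = (1e-3, 0.0, 0.0)  # slow point mass moving along x
# (a) accelerated, non-rotating observer: inertial acceleration only
case_a = three_acceleration((0.0, 0.0, 1e-3), (0.0, 0.0, 0.0), u)
# (b) freely falling, rotating observer: Coriolis acceleration only
case_b = three_acceleration((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), u)
# (c) accelerated and rotating observer: both contributions
case_c = three_acceleration((0.0, 0.0, 1e-3), (0.0, 0.0, 1.0), u)
print(case_a, case_b, case_c)
```

In case (a) the velocity is perpendicular to Â, so the relativistic correction term vanishes identically; choosing u with a component along Â shows how small that term is when u ≪ 1.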
Many authors regard the Coriolis force as a type of inertial force. This is merely
a matter of naming, which is totally fine. However, in order to distinguish the
orbital motion and rotation of an observer, this text prefers the name used by some
other authors [e.g., Misner et al. (1973)], i.e., to call the fictitious forces caused by
the orbital motion and rotation of an observer as inertial forces and Coriolis forces,
respectively.
[Optional Reading 7.4.1]
To prove Proposition 7.4.2, we first prove the following Lemma.
Lemma 7.4.3 The Christoffel symbols of the spacetime metric $g_{ab}$ in the proper coordinate
system of G(τ) have the following simple forms:
$$\Gamma^0{}_{00} = \Gamma^\sigma{}_{ij} = 0 , \qquad \Gamma^0{}_{0i} = \Gamma^0{}_{i0} = \Gamma^i{}_{00} = \hat A^i , \qquad \Gamma^i{}_{0j} = \Gamma^i{}_{j0} = -\omega^k \varepsilon_{0kij} , \tag{7.4.10}$$
$$\sigma = 0, 1, 2, 3 , \qquad i, j, k = 1, 2, 3 ,$$
where $\hat A^a$ and $\omega^a$ are the 4-acceleration and spatial angular velocity of the observer G,
respectively, and $\varepsilon_{0kij}$ are the components of the volume element associated with $g_{ab}$ in the
proper coordinate system.
Proof Since the orthonormal triad $\{(e_i)^a\}$ of the observer G has a spatial rotation with an
angular velocity $\omega^a$, from Sect. 7.3 we know that
where in the last step we used $Z^i = 0$ and $Z_0 = -1$. Using also $Z^0 = 1$, $\hat A^0 = 0 = \hat A_0$, we
have
$x^0 \equiv t = \tau_p = \text{constant}$, $x^i = sT^i$, $T^i = \text{constant}$, $i = 1, 2, 3$.
$$0 = \frac{d^2 x^\sigma}{ds^2} + \Gamma^\sigma{}_{\mu\nu}\frac{dx^\mu}{ds}\frac{dx^\nu}{ds} = \Gamma^\sigma{}_{ij}\frac{dx^i}{ds}\frac{dx^j}{ds} , \qquad \sigma = 0, 1, 2, 3 .$$
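Proof of Proposition 7.4.2 The world line of a free point mass is a geodesic, and its equation
in the proper coordinate system of G(τ) is
$$\frac{d^2 x^\mu}{d\tau_L^2} + \Gamma^\mu{}_{\nu\sigma}\frac{dx^\nu}{d\tau_L}\frac{dx^\sigma}{d\tau_L} = 0 , \tag{7.4.15}$$
where the affine parameter $\tau_L$ of the geodesic is the proper time of the point mass L. Choose
$t \equiv x^0$ [the coordinate time of the proper coordinate system of G(τ)] as another parameter
of L, and denote $dt/d\tau_L$ by $\gamma$. Then $\frac{dx^\mu}{d\tau_L} = \frac{dx^\mu}{dt}\frac{dt}{d\tau_L} = \gamma\frac{dx^\mu}{dt}$, and hence
$$\frac{d^2 x^\mu}{d\tau_L^2} = \gamma\frac{d}{dt}\left(\frac{dx^\mu}{d\tau_L}\right) = \gamma\frac{d}{dt}\left(\gamma\frac{dx^\mu}{dt}\right) = \gamma\left(\gamma\frac{d^2 x^\mu}{dt^2} + \frac{d\gamma}{dt}\frac{dx^\mu}{dt}\right) . \tag{7.4.16}$$
Hence,
$$\begin{aligned}
a^i &= -\gamma^{-1}u^i\,d\gamma/dt - (\Gamma^i{}_{00} + 2\Gamma^i{}_{0j}u^j + \Gamma^i{}_{jk}u^j u^k) \\
    &= -\gamma^{-1}u^i\,d\gamma/dt - (\hat A^i - 2\omega^k\varepsilon_{0kij}u^j) = -\gamma^{-1}u^i\,d\gamma/dt - \hat A^i - 2\varepsilon^i{}_{jk}\omega^j u^k ,
\end{aligned} \tag{7.4.18}$$
where in the second equality we used Lemma 7.4.3, and in the third equality we used
$\varepsilon_{0kij} = \varepsilon_{kij}$. To derive $\gamma^{-1}d\gamma/dt$, we set μ = 0 in (7.4.16), and find $d^2 t/d\tau_L^2 = \gamma\,d\gamma/dt$.
Then setting μ = 0 in (7.4.15) yields
$$0 = \frac{d^2 t}{d\tau_L^2} + \Gamma^0{}_{\nu\sigma}\frac{dx^\nu}{d\tau_L}\frac{dx^\sigma}{d\tau_L} = \gamma\frac{d\gamma}{dt} + 2\Gamma^0{}_{0i}\frac{dt}{dt}\frac{dx^i}{dt}\gamma^2 = \gamma\frac{d\gamma}{dt} + 2\hat A_i u^i\gamma^2 ,$$
where Lemma 7.4.3 is used in both the second and third equalities. From the above equation
we get $-\gamma^{-1}\,d\gamma/dt = 2\hat A_b u^b$. Plugging this into (7.4.18) and rewriting it using the abstract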
Proof $g_{\mu\nu}|_p = \eta_{\mu\nu}$ is the conclusion of Proposition 7.4.1 (which holds for the
proper coordinate system of any observer). Lemma 7.4.3 gives $\Gamma^\sigma{}_{\mu\nu}|_p = 0$ ($\sigma, \mu, \nu =
0, 1, 2, 3$) when G(τ) is a geodesic and the corresponding observer is
non-rotating.
$$\text{(a)}\ \nabla^a F_{ab} = -4\pi J_b , \qquad \text{(b)}\ \nabla_{[a}F_{bc]} = 0 , \qquad \text{(c)}\ q F^a{}_b U^b = U^b\nabla_b P^a \equiv \frac{DP^a}{d\tau} . \tag{7.5.2}$$
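Suppose $\{x^\mu\}$ is an arbitrary local coordinate system; we want to write down the
expressions for the components of (7.5.2) in this system. First we look at (a). Recall
that the coordinate components of $\nabla_a v^b$ are denoted by $v^\nu{}_{;\mu}$ (see Sect. 3.1), i.e., $v^\nu{}_{;\mu} \equiv
(dx^\nu)_b(\partial/\partial x^\mu)^a\nabla_a v^b$. Similarly, one should denote the coordinate components of
$\nabla_a F^c{}_b$ as $F^\sigma{}_{\nu;\mu}$, i.e., $F^\sigma{}_{\nu;\mu} \equiv (dx^\sigma)_c(\partial/\partial x^\mu)^a(\partial/\partial x^\nu)^b\nabla_a F^c{}_b$. Hence,
$$F^\mu{}_{\nu;\mu} = (dx^\mu)_c\left(\frac{\partial}{\partial x^\mu}\right)^a\left(\frac{\partial}{\partial x^\nu}\right)^b\nabla_a F^c{}_b = \delta^a{}_c\left(\frac{\partial}{\partial x^\nu}\right)^b\nabla_a F^c{}_b = \left(\frac{\partial}{\partial x^\nu}\right)^b\nabla_a F^a{}_b$$
are the coordinate components of $\nabla_a F^a{}_b$, and the component expression for (7.5.2)(a)
is
$$F^\mu{}_{\nu;\mu} = -4\pi J_\nu . \tag{7.5.3a}$$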
Similarly, the coordinate components of $\nabla_a F_{bc}$ are denoted by $F_{\nu\sigma;\mu}$, and the
coordinate component expression for (7.5.2)(b) is
$$F_{[\nu\sigma;\mu]} = 0 . \tag{7.5.3b}$$
Finally, for (7.5.2)(c), the coordinate components of the left-hand side are obviously
$q F^\mu{}_\nu U^\nu$. Using $DP^\mu/d\tau$ to represent the coordinate components of $DP^a/d\tau$, we
have
$$q F^\mu{}_\nu U^\nu = \frac{DP^\mu}{d\tau} . \tag{7.5.3c}$$
7.5 Equivalence Principles and Local Inertial Frames 269
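Note that in general $DP^\mu/d\tau \neq dP^\mu/d\tau$, because it is not difficult to show that (see
Exercise 3.6) $DP^\mu/d\tau = dP^\mu/d\tau + \Gamma^\mu{}_{\nu\sigma}U^\nu P^\sigma$. Since $\forall p \in G$ we have $\Gamma^\mu{}_{\nu\sigma}|_p = 0$
for the proper coordinate system of G, the above equations can be written as
$$\text{(a)}\ F^\mu{}_{\nu;\mu} = -4\pi J_\nu , \qquad \text{(b)}\ F_{[\nu\sigma;\mu]} = 0 , \qquad \text{(c)}\ q F^\mu{}_\nu U^\nu = \frac{dP^\mu}{d\tau} . \tag{7.5.4}$$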
These are exactly the expressions for the corresponding laws in (7.5.2) in a global
inertial (Lorentzian) coordinate system in Minkowski spacetime. The discussion
above can be generalized to other physical laws. Thus, the proper coordinate system
of a freely falling non-rotating observer is similar to a global inertial (Lorentzian)
coordinate system, and therefore is called a local inertial frame, also called a local
Lorentz system or local Lorentz frame.
People often say: The laws of physics are the same in any local Lorentz system of
curved spacetime as in an inertial coordinate system in Minkowski spacetime [Misner
et al. (1973) p. 207], and thus all the physical experiments done by a freely falling
non-rotating observer G have the same (equivalent) results as the corresponding
experiments done by an inertial observer in flat spacetime. This is the conclusion
required by the Einstein equivalence principle. However, the statement above is not
quite precise, since all we can be certain about is $\Gamma^\sigma{}_{\mu\nu}|_p = 0$, $\forall p \in G$, and once
one deviates from G(τ), we cannot guarantee that $\Gamma^\sigma{}_{\mu\nu} = 0$. In fact, if $\Gamma^\sigma{}_{\mu\nu}$ really
vanished in a neighborhood of G(τ), then $\forall p \in G$ we would have
i.e., the curvature at each point on G(τ ) vanishes, which is inconsistent with the
curved spacetime we supposed. The heart of the problem is that, by choosing a
coordinate system one can only make the $\Gamma^\sigma{}_{\mu\nu}$ on G(τ) vanish but not the curvature
(curvature is independent of the coordinate system). Thus, the statement “the laws of
physics are the same in any local inertial frame of curved spacetime as in an inertial
frame in Minkowski spacetime” is not necessarily true for a point in the coordinate
patch but outside the curve G(τ). Nevertheless, when the observer G is doing an
experiment, a “finitely small” spacetime neighborhood U of the world line is usually
involved (e.g., an elevator is involved for an observer in the elevator, see Fig. 7.6),
and thus the problem is not that simple. Luckily, the effect of spacetime
curvature can only be made manifest (detected by experiments) in a sufficiently
large spacetime region. Hence, as long as the spacetime neighborhood that is involved
in an experiment is sufficiently small (for an elevator, as long as its spatial scale
and the falling time are sufficiently small), the result of the experiment will be
virtually indistinguishable from the corresponding experiment in flat spacetime. [This
is similar to the following simple example: at each point on a 2-dimensional sphere,
$R_{abc}{}^d$ is nonvanishing; however, if one only cares about a small piece of the sphere
S in the vicinity of a point, then S can be substituted approximately by a small
region S of the tangent plane of this point (see Fig. 7.7). For instance, in order to
measure the angle between two meridians of the Earth at the North Pole, one may
reason that leads to the incorrect conclusion above is that the word “gravity” is used
twice in the deduction, while they have different meanings. The “gravity” felt by the
astronaut is only a fictitious apparent gravity, which is not produced by matter and
does not correspond to curved spacetime; the name only comes from the feeling of
the astronaut.
Physicists have varied opinions about the meaning and the value of the equivalence
principles; in particular, differing views on their meaning lead to differing views
on their value. Some consider that they are of
great significance. For example, Misner et al. (1973) p. 386 said that “The principle
of equivalence has great power. With it one can generalize all the special relativistic
laws of physics to curved spacetime.” They also said (p. 207) “The vehicle that
carries one from classical mechanics to quantum mechanics is the correspondence
principle. Similarly, the vehicle between flat spacetime and curved spacetime is the
equivalence principle.” Some others, however, take a completely opposite view. For
example, J. L. Synge wrote in the preface of Synge (1960) that: “I have never been able
to understand this principle. ...... Does it mean that the effects of a gravitational field
are indistinguishable from the effects of an observer’s acceleration? If so, it is false.
In Einstein’s theory, either there is a gravitational field or there is none, according
as the Riemann tensor does not or does vanish. This is an absolute property; it has
nothing to do with any observer’s world line. Spacetime is either flat or curved, and
in several places in this book I have been at considerable pains to separate truly
gravitational effects due to curvature of spacetime from those due to curvature of the
observer’s world line (in most ordinary cases the latter predominate). The principle
of equivalence performed the essential office of midwife at the birth of general
relativity, ...... I suggest that the midwife be now buried with appropriate honors
and the facts of absolute spacetime be faced.” This view on equivalence principles
might be somewhat extreme, but some statements in the quotation above can yet
be regarded as a sobering pill which prevents us from misconstruing concepts. For
instance, his warning on distinguishing the real gravity caused by the spacetime
curvature from the apparent (fake) gravity caused by the observer’s world line being
curved (non-geodesic) is extremely necessary.
Here, we talk briefly about our humble understanding of equivalence principles.
Firstly, the Einstein equivalence principle is a hypothetical generalization of the
weak equivalence principle posed by Einstein during the conception of general rel-
ativity, which is very important as a midwife at the birth of general relativity. Even
Synge agreed with this.
Secondly, as mentioned in Sect. 7.2, physical laws in curved spacetime must
obey two principles: (a) the principle of general covariance, and (b) when $g_{ab}$ equals
$\eta_{ab}$, they can go back to the corresponding laws in special relativity. This is
how this text and some other textbooks state them. More textbooks, however, state
principle (b) in another way: (b′) the Einstein equivalence principle. From (a) and
(b′) one can obtain the minimal substitution rule: “the equation of a physical law
in a local Lorentz system of curved spacetime can be obtained by changing the
commas in the equation of the corresponding physical law in a Lorentzian coordinate
system of Minkowski spacetime to semicolons (i.e., changing partial derivatives to
covariant derivatives).” Thus, using the Einstein equivalence principle (together with
the principle of general covariance) we can obtain the laws of physics in general
relativity from the corresponding laws of physics in special relativity, and therefore
it can be said to be the “bridge that brings us from special relativity to general
relativity”. However, just like what we did in Sect. 7.2, one can also get the physical
laws of curved spacetime not by mentioning equivalence principles but by saying (in
addition to the principle of general covariance) that “the physical laws should go back
to the corresponding laws in special relativity when $g_{ab}$ equals $\eta_{ab}$”. (Either way, we
obtain the minimal substitution rule).⁸ Once the physical laws in curved spacetime are
accepted (and thus general relativity is formulated), one can discuss physics
problems entirely without using equivalence principles (although many authors like to use
equivalence principles in many problems). Therefore, from this perspective, “burying
the midwife” seems to have no influence on general relativity.
Thirdly, for some complicated situations (such as the question of whether a charged
particle moving along a geodesic in curved spacetime emits electromagnetic radiation),
“whether or not the principle of equivalence is violated” has been a controversial issue
for a long time. We think the point is that the precise meaning of the “principle of
equivalence” in these situations has yet to be clarified (another important problem is
the definition of radiation). In this sense, maybe it is not excessive at all when Synge
said “I have never been able to understand this principle.”
Fourthly, besides general relativity, there exist many other gravitational theories
[see, for example, Will (2018)]. All the gravitational theories can be classified
into two major kinds, namely metric theories (which require the spacetime to have a
metric, and the world line of a free point mass to be a geodesic of this metric, etc.) and
non-metric theories. General relativity is of course a metric theory. There are also
many other metric theories out there. For example, another famous and competitive
metric theory is the Brans-Dicke theory, in which the quantities describing
gravity include a scalar field φ in addition to the metric field $g_{ab}$. The criterion
for judging which gravitational theory is the correct one is of course experiments.
To this end, we need a theory about gravitational experiments. R. H. Dicke had been
working on this kind of theory since the 1960s. His pioneering works have gradu-
ally deepened people’s understanding of equivalence principles and their meaning.
At last, people realized that one should put equivalence principles at the important
position of inspecting the foundation of gravitational theories (not just for general
relativity). There are three levels of equivalence principles, namely the weak equiv-
alence principle (WEP), the Einstein equivalence principle (EEP), and the strong
equivalence principle (SEP). The difference between the SEP and the EEP is that:
the EEP (and WEP) only consider the external gravitational field of a system (e.g.,
an elevator) but do not consider the self-gravitational field generated by the objects
in the system, i.e., they only consider the passive aspects of gravity but ignore the
8 However, this rule will lead to an ambiguity in the order of the operators when two derivative
operators act successively; other considerations need to be taken into account to overcome this issue
(See Sect. 7.2). Therefore, the claim that equivalence principles “can carry all the special relativistic
laws of physics to curved spacetime” seems to be too strong. Misner et al. (1973) pp. 390–391 has
a specific discussion on this.
7.6 Tidal Forces and the Geodesic Deviation Equation 273
active aspects; however, the SEP considers both the active and passive aspects, and
talks about “self-gravitational systems”, which range from the self-gravity of stars
all the way to the gravity between two lead balls in the Cavendish experiment. The
EEP can be regarded as the special case of the SEP when self-gravity is negligible.
The experimental verification of these three equivalence principles is significant
for choosing the gravitational theory. Any gravitational theory will satisfy the WEP
(because the WEP has been verified by experiments which are more and more precise,
no one would like to create a theory that violates the WEP), but this is not true
for the EEP and the SEP. Studies show [see Will (1995) and Will (2014)] that, if the
EEP is true, then only metric theories can be correct. This indicates that if experiments
that are more and more precise can verify the EEP, then there will be less and
less room for non-metric theories. Further discussion also shows [still see Will
(1995) and Will (2018)] that general relativity satisfies the SEP while none of the other
known theories (including the Brans-Dicke theory) does. (Unfortunately, this discussion
is not a rigorous proof, and thus till now the conclusion above is technically still
a conjecture). Therefore, if experiments that are more and more precise can verify
the SEP, then general relativity is very likely to be the only correct gravitational
theory. Thus, we can see that the experimental verification of the three equivalence
principles has very significant theoretical meaning, and these experiments are now
under way with higher and higher precision.
Proposition 7.5.1 only shows that the components $\Gamma^\sigma{}_{\mu\nu}$ of the Christoffel symbol
in the proper coordinate system of a freely falling non-rotating observer vanish on
the world line of this observer. Once off the world line, $\Gamma^\sigma{}_{\mu\nu}$ can be nonvanishing.
To see the physical effect of this statement, let us consider the following thought
experiment. Put eight balls in an elevator into a circular pattern (the plane of the
circle is perpendicular to the ground), as shown in the left part of Fig. 7.8. First we
will discuss what happens using Newtonian mechanics. Suppose the line that goes
through balls 1 and 2 happens to pass through the Earth’s center, and each ball is
at rest relative to the other at the beginning. Since the gravitational field at ball 1 is
slightly stronger than that at ball 2, the gravitational acceleration of ball 1 is slightly
greater than ball 2, and thus the distance between them will gradually increase. A
while later, the whole system will look like what is shown in the right part of Fig. 7.8,
which is not round anymore. If we imagine that ball 1 is an observer, they will find that the
distance between them and ball 2 increases with time. However, if the eight balls are
arranged in a circle in an inertial spaceship in a region without gravity (and relatively
at rest at the beginning), then ball 1 (as an observer) will not find that the distance
between them and ball 2 has any change. Thus, even for mechanical experiments,
an elevator on the Earth’s surface is not completely equivalent to a spaceship in a
region without gravity.
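[Fig. 7.8: eight balls arranged in a circle in an elevator above the ground, with g the gravitational acceleration; left: the initial configuration, right: the configuration a while later.]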
Although these are thought experiments, phenomena with a similar principle can
also be found in daily life. One example is the changing of the tides. Now we will
have a simplified analysis of this phenomenon using Newtonian mechanics, in order
to highlight the essence of this concept. The leading cause of the tidal phenomenon
is the Moon, while the Sun gives a secondary contribution. Ignoring the effect of the
Sun can simplify the problem a lot without changing the essence of it. The Earth, as
an object, is located in the gravitational field of the Moon. Assume that the Earth’s
surface is covered by a layer of sea water. Consider two points A and B on the
water’s surface, such that the line going through them passes through the Earth’s
center. Suppose at a certain moment A is the closest to the Moon, then B is the
furthest from the Moon. The gravitational forces from the Moon on A and B are
different, so the two points will move away from each other; thus, the sea’s surface
near A and B will bulge outwards (left, Fig. 7.9).⁹ As the Earth rotates, A will not
face the Moon, and the sea level will drop. After the Earth rotates half a cycle, A is
the furthest from the Moon (right, Fig. 7.9), and the sea water will rise again. For
someone who is freely falling near the ground, the distances from the Earth’s center to
their head and feet are different, so there also exists a force that stretches their body
(if one only considers the Earth’s gravitational field), although this “tidal force” is
so small that they will not be able to feel it. If you are freely falling at the surface of
a neutron star, the tidal force can be as large as $10^{11}$ N, and you will be torn
apart. Remark: ① A neutron star is a celestial object that is composed mainly of
neutrons, whose density can be as high as $10^{14}$ times that of water! The high density
causes an extremely high gradient of the surface gravitational field, see Sect. 9.3. ②
According to the usual estimation, the critical pressure or tension that a human body
can tolerate (above which the body will be torn apart) is about $10^{7}$ N/m$^2$.
The discussion above indicates that any object in the gravitational fields of the
Earth and the Moon experiences a tidal force. In fact, the tidal phenomenon is a
9 From the viewpoint of an observer on the Earth, the reason for A to bulge is the combination of
two forces: (a) the Moon’s gravitational force, and (b) the centrifugal force caused by the Earth’s
circular motion around the barycenter of the Earth and the Moon. The net force of these two forces
is called the tide-generating force (or tide-raising force).
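[Fig. 7.9: the Earth and the Moon, with points A and B on the sea surface; left: A closest to the Moon, right: A furthest from the Moon.]
[Fig. 7.10: balls in an Einstein elevator; ball 1 has position vector r(t) relative to the origin o, and g is the gravitational acceleration.]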
universal feature of gravitational fields. We will discuss the tidal phenomenon quan-
titatively using Newton’s theory of gravity and general relativity, respectively.
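First, we use Newton’s theory of gravity. Without loss of generality, we still take
the example of an Einstein elevator close to the Earth’s surface. Suppose there are
small balls everywhere inside the elevator (see Fig. 7.10). Let $\vec r(t)$ and $\vec r(t) + \vec\lambda(t)$
represent the position vectors of two balls 1 and 2 next to each other relative to the
origin o in a Cartesian coordinate system; then $\vec\lambda(t)$ is the position vector of ball 2
relative to ball 1, and hence $d^2\vec\lambda/dt^2$ is the acceleration (tidal acceleration) of ball
2 relative to ball 1. To calculate the tidal acceleration, one can use the gravitational
potential φ and Newton’s second law to write
$$\frac{d^2 x^i}{dt^2} = -\left.\frac{\partial\phi}{\partial x^i}\right|_{\vec r} ,$$
$$\frac{d^2 (x^i + \lambda^i)}{dt^2} = -\left.\frac{\partial\phi}{\partial x^i}\right|_{\vec r + \vec\lambda} \cong -\left.\frac{\partial\phi}{\partial x^i}\right|_{\vec r} - \left.\frac{\partial}{\partial x^j}\frac{\partial\phi}{\partial x^i}\right|_{\vec r}\lambda^j ,$$
where $x^i$ and $\lambda^i$ are the Cartesian components of $\vec r$ and $\vec\lambda$. Subtracting the first
equation from the second gives
$$\frac{d^2\lambda^i}{dt^2} = -\left.\frac{\partial^2\phi}{\partial x^j\,\partial x^i}\right|_{\vec r}\lambda^j , \tag{7.6.1}$$
which is the expression for the tidal acceleration in Newton’s theory of gravity.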
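The Newtonian tidal acceleration can be checked numerically for a point-mass potential. The sketch below is illustrative only: it assumes φ = −GM/r with GM = 1 in arbitrary units, approximates the second derivatives of φ by central differences, and uses hypothetical function names. On the radial axis the tidal tensor −∂²φ/∂xⁱ∂xʲ should come out close to diag(−1, −1, 2)·GM/r³, that is, stretching along the radial direction and compression in the transverse directions.

```python
import math

# Sketch only: point-mass potential phi = -GM/r, with GM = 1 by default.
# Function names and the step size h are illustrative choices.

def phi(x, y, z, GM=1.0):
    """Newtonian potential of a point mass located at the origin."""
    return -GM / math.sqrt(x * x + y * y + z * z)

def tidal_tensor(point, h=1e-3, GM=1.0):
    """Return K with K[i][j] = -d^2 phi / dx^i dx^j at `point`,
    approximated by central finite differences with step h."""
    def shifted(i, di, j, dj):
        p = list(point)
        p[i] += di
        p[j] += dj
        return phi(p[0], p[1], p[2], GM=GM)

    K = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(3):
            second = (shifted(i, h, j, h) - shifted(i, h, j, -h)
                      - shifted(i, -h, j, h) + shifted(i, -h, j, -h)) / (4 * h * h)
            K[i][j] = -second
    return K

def tidal_acceleration(K, lam):
    """d^2 lambda^i / dt^2 = K[i][j] * lambda^j for a small separation lam."""
    return [sum(K[i][j] * lam[j] for j in range(3)) for i in range(3)]

K = tidal_tensor((0.0, 0.0, 1.0))                    # ball 1 on the z-axis at r = 1
radial = tidal_acceleration(K, (0.0, 0.0, 1e-3))     # ball 2 displaced radially
transverse = tidal_acceleration(K, (1e-3, 0.0, 0.0)) # ball 2 displaced sideways
print(K[0][0], K[1][1], K[2][2])
print(radial, transverse)
```

The trace of the computed tensor is close to zero, as expected outside the source, where the potential satisfies Laplace's equation.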
Using (7.6.1) one can have a clear idea about the change of the distance between
the two balls in Fig. 7.8. Choose a coordinate system {x, y, z} such that the z-axis is
pointing straight upward; then the z-component of the relative acceleration between
balls 1 and 2 is
$$\tilde a^z \equiv \frac{d^2\lambda^z}{dt^2} = -\frac{d^2\phi}{dz^2}\lambda^z = -\frac{d^2\phi}{dr^2}\lambda^z , \tag{7.6.2}$$
where r is the distance between ball 1 and the Earth’s center. The Earth’s gravitational
potential is $\phi_\oplus = -G M_\oplus/r$, and hence at the Earth’s surface ($r = r_\oplus$) we have
$\tilde a^z = 2 G M_\oplus \lambda^z / r_\oplus^3$. Suppose the
initial distance between the two balls is $\lambda^z = 1$ m. Plugging in the following
numerical values in SI units: $G = 6.67 \times 10^{-11}$, $M_\oplus = 6 \times 10^{24}$, $r_\oplus = 6.37 \times 10^{6}$ yields
$\tilde a^z = 0.31 \times 10^{-5}\ \mathrm{m\cdot s^{-2}}$. Suppose the two balls are initially at rest with respect to
each other; then the increment of their distance after $\Delta t = 5$ s will be
$$\Delta\lambda^z = \frac{1}{2}\tilde a^z(\Delta t)^2 = \frac{1}{2} \times 0.31 \times 10^{-5} \times 5^2 \cong 4 \times 10^{-5}\ \mathrm{m} . \tag{7.6.3}$$
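The arithmetic behind (7.6.2) and (7.6.3) can be reproduced in a few lines. This is a minimal sketch using the rounded SI values quoted in the text; the function names are hypothetical, and the tidal acceleration is treated as constant over the 5 s interval, as the text does.

```python
# Sketch only: reproduces the order-of-magnitude estimate in the text.
# Constants are the rounded SI values quoted there; names are illustrative.

G = 6.67e-11        # gravitational constant, m^3 kg^-1 s^-2
M_EARTH = 6e24      # mass of the Earth, kg
R_EARTH = 6.37e6    # radius of the Earth, m

def radial_tidal_acceleration(separation, r=R_EARTH, GM=G * M_EARTH):
    """Relative acceleration 2 GM lambda / r^3 of two balls separated
    radially by `separation` metres at distance r from the centre."""
    return 2.0 * GM * separation / r ** 3

def separation_increase(a_tidal, dt):
    """Increase in separation after time dt, starting from relative rest
    and treating the tidal acceleration as constant."""
    return 0.5 * a_tidal * dt ** 2

a_z = radial_tidal_acceleration(1.0)        # balls initially 1 m apart
delta_lambda = separation_increase(a_z, 5.0)
print(f"tidal acceleration: {a_z:.2e} m/s^2")
print(f"increase after 5 s: {delta_lambda:.1e} m")
```

The result is a few times 10⁻⁶ m/s² for the acceleration and a few hundredths of a millimetre for the displacement, which is why the effect goes unnoticed in an ordinary elevator.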
Now we will investigate the tidal phenomenon from the perspective of general relativity.
What we will show is that the tidal phenomenon is an inevitable outcome
of the intrinsic curvature of spacetime. Again we will take Fig. 7.10 as an example.
Each ball can be viewed as a freely falling observer, whose world line is a timelike
geodesic with the proper time τ as the affine parameter. These geodesics form a
geodesic congruence in an open subset U of the spacetime¹⁰ (physically it corresponds
to a freely falling reference frame), and the tangent vectors $Z^a \equiv (\partial/\partial\tau)^a$ of
the geodesics form a timelike vector field on U. Let $\mu_0(s)$ be a smooth transverse
curve¹¹ [transverse means that the tangent vector at any point of $\mu_0(s)$ is not tangent
to the geodesic passing through this point]; then each geodesic γ(τ) in the congruence
that intersects $\mu_0(s)$ can be labeled by s, i.e., it can be denoted by $\gamma_s(\tau)$, in
which s is the value of s at the intersection of this geodesic and $\mu_0(s)$. Choose the
initial setting of the proper time of each $\gamma_s(\tau)$ such that the τ at the intersection of
$\mu_0(s)$ and each $\gamma_s(\tau)$ is zero. Suppose $\phi_\tau$ is an element of the one-parameter (local)
group of diffeomorphisms corresponding to the vector field $Z^a$ and $\mu_\tau(s)$ represents
the image of the curve $\mu_0(s)$ under the map $\phi_\tau$ (see Fig. 7.11). All the curves $\mu_\tau(s)$
with different values of τ cover a subset S, on which each point is determined by
two real numbers (coordinates) τ and s, and therefore S is a two-dimensional manifold.
All the geodesics on S form a subset of the geodesic congruence, where each
geodesic can be labeled using a parameter s, and thus this subset is also called a
one-parameter family of geodesics. (The geodesics in the congruence fill a 4-dimensional
open subset U of the spacetime, while this one-parameter family of geodesics only
covers a 2-dimensional surface S). In conclusion, a given transverse curve $\mu_0(s)$
picks a one-parameter family of geodesics $\{\gamma_s(\tau)\}$. Let $\eta^a \equiv (\partial/\partial s)^a$; then $Z^a$ and $\eta^a$
are the coordinate basis vector fields of S, and hence they commute:
10 A congruence of curves in U is a family of curves, such that for each p ∈ U there is a unique
[Fig. 7.11: the transverse curve μ0(s) at τ = 0 and its images μτ(s) on the surface S.]
$$0 = [Z, \eta]^a = Z^b\nabla_b\eta^a - \eta^b\nabla_b Z^a , \tag{7.6.4}$$
where $\nabla_a$ can be any torsion-free derivative operator. Choose the $\nabla_a$ associated with
the spacetime metric, then
$$Z^b\nabla_b(\eta_a Z^a) = \eta_a Z^b\nabla_b Z^a + Z_a Z^b\nabla_b\eta^a = Z_a Z^b\nabla_b\eta^a = Z_a\eta^b\nabla_b Z^a = \frac{1}{2}\eta^b\nabla_b(Z_a Z^a) = 0 , \tag{7.6.5}$$
where the second equality used the fact that $Z^a$ is the tangent vector of the geodesic,
the third equality used (7.6.4) and the fifth equality used the fact that $Z^a Z_a = -1$
at each point. Equation (7.6.5) indicates that $\eta_a Z^a$ is a constant along any geodesic
$\gamma_s(\tau)$. Therefore, as long as we choose $\mu_0(s)$ in the first place such that it is orthogonal
to all $\gamma_s(\tau)$ (which is always possible), then any $\mu_\tau(s)$ will be orthogonal to $\gamma_s(\tau)$.
After this choice, the $\eta^a$ at each point of S can be viewed as a spatial vector of
the geodesic observer $\gamma_s(\tau)$ passing through this point, and thus from now on we
will denote $\eta^a$ by $w^a$ in this text. Suppose $\Delta s$ is small; then $\gamma_0(\tau)$ and $\gamma_{\Delta s}(\tau)$ can
be viewed as the world lines of ball 1 and ball 2 in Fig. 7.10, respectively. Now we
call $\gamma_0(\tau)$ the fiducial observer, and set $\lambda^a \equiv w^a\Delta s$; then $\lambda^a$ can be regarded as the
$\vec\lambda$ in Fig. 7.10, namely the position vector of ball 2 relative to the fiducial observer
(ball 1). Hence, $\tilde u^b \equiv Z^a\nabla_a\lambda^b$ can now be interpreted as the 3-velocity of ball 2
relative to the fiducial observer. [Note that it is a spatial vector field on the world
line $\gamma_0(\tau)$ of ball 1 since $Z_b(Z^a\nabla_a\lambda^b) = Z^a\nabla_a(Z_b\lambda^b) - \lambda^b Z^a\nabla_a Z_b = 0$, where in
the second equality we used the geodesic equation $Z^a\nabla_a Z_b = 0$ and the fact that
$\lambda^b$ is spatial (i.e., $Z_b\lambda^b = 0$)]. Similarly, $\tilde a^c \equiv Z^a\nabla_a(Z^b\nabla_b\lambda^c)$ can be interpreted as
the 3-acceleration of ball 2 relative to ball 1 [which is also a spatial vector field on
$\gamma_0(\tau)$]. Consider a third geodesic $\gamma_{\Delta\bar s}(\tau)$ in the one-parameter family of geodesics
(which corresponds to a ball $\bar 2$ next to ball 2 on the line passing through 1 and 2). The
position vector $\bar\lambda^a$ of it relative to ball 1 is naturally $\bar\lambda^a = w^a\Delta\bar s$, and hence the ratio
of the tidal accelerations of ball $\bar 2$ and ball 2 is a constant $\Delta\bar s/\Delta s$. Thus, instead of
considering specific balls 2, $\bar 2$, etc. (i.e., using $\lambda^a$), we can directly use $w^a$ to define
the following universal quantities which apply to all the balls close to ball 1 in the
one-parameter family of geodesics:
$$u^b := Z^a\nabla_a w^b , \tag{7.6.6}$$
$$a^c := Z^a\nabla_a u^c = Z^a\nabla_a(Z^b\nabla_b w^c) , \tag{7.6.7}$$
both of which are spatial vector fields living on the fiducial geodesic γ0 (τ ). In fact, wa
plays the role of a measuring unit of the position vectors of this family: the position
vector of any γs (τ ) is equal to wa times s. Note that wa has different names in
different works; we refer to it as the separation vector, in agreement with
Misner et al. (1973) and Hawking and Ellis (1973). Similarly, u b and a c also play the
roles of measuring units for the 3-velocity and 3-acceleration of this family, which are
called the 3-velocity and the 3-acceleration (tidal acceleration) measured by ball 1,
respectively. Given a one-parameter family of geodesics and a fiducial geodesic γ0 (τ )
in the family, a 3-velocity field u b and a 3-acceleration field a c will be determined. Our
mission is to reveal the close relationship between a c and the spacetime curvature,
see the following proposition:
Proposition 7.6.1 The tidal acceleration measured by an arbitrary fiducial geodesic
γ0 (τ ) in any one-parameter family of timelike geodesics has the following relation
with the spacetime curvature tensor [called the geodesic deviation equation] :
$$a^c = -R_{abd}{}^{c}\,Z^a w^b Z^d . \tag{7.6.8}$$
Proof
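$$a^c = Z^a\nabla_a(Z^b\nabla_b w^c) = Z^a\nabla_a(w^b\nabla_b Z^c) = w^b Z^a\nabla_a\nabla_b Z^c + (Z^a\nabla_a w^b)\nabla_b Z^c = p^c + q^c , \tag{7.6.9}$$
[in the second step we used (7.6.4), i.e., $[Z, w]^b = 0$] where $p^c \equiv w^b Z^a\nabla_a\nabla_b Z^c$,
and $q^c \equiv (Z^a\nabla_a w^b)\nabla_b Z^c$. Also,
$$p^c = w^b Z^a\nabla_b\nabla_a Z^c - R_{abd}{}^{c}\,Z^a w^b Z^d = w^b\nabla_b(Z^a\nabla_a Z^c) - (w^b\nabla_b Z^a)\nabla_a Z^c - R_{abd}{}^{c}\,Z^a w^b Z^d = -q^c - R_{abd}{}^{c}\,Z^a w^b Z^d ,$$
where in the third equality we used the geodesic equation and (7.6.4). Plugging the
above equation into (7.6.9) yields (7.6.8).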
Now we will make a few more comments on the geodesic deviation equation.
(1) The geodesic deviation equation (7.6.8) is an equation that describes the rel-
ative acceleration a c between two neighboring (“infinitesimally nearby”) geodesics,
and a c is the second order derivative of the separation vector wa that describes the
separation of the two curves. Surely there will be a separation between the two curves
($w^a \neq 0$), and the separation vector may change with time ($u^a \neq 0$), but there is not
necessarily a deviation ($a^c$ is not necessarily nonvanishing).12
12 There exist geodesic families in flat spacetime for which $u^b = 0$ and $a^c = 0$ on a
fiducial geodesic $\gamma_0(\tau)$ (such as a family of parallel geodesics). There also exist geodesic families
in flat spacetime for which $u^b \neq 0$ on $\gamma_0(\tau)$ [one can just let $\gamma_0(\tau)$ and the nearby geodesics
be non-parallel]. However, there does not exist a geodesic family with $a^c \neq 0$ on $\gamma_0(\tau)$
unless the spacetime is not flat.
(2) Equation (7.6.8) reflects the close relationship between $a^c$ and the spacetime
curvature tensor $R_{abc}{}^{d}$: for flat spacetime ($R_{abc}{}^{d} = 0$), $a^c$ must vanish, and thus
geodesics that are initially parallel will always remain parallel [see the footnote after (1)].
However, as long as $R_{abc}{}^{d} \neq 0$, there will exist a geodesic family whose geodesic
deviation (characterized by $a^c$) is nonvanishing; this is reflected by the fact that
geodesics that are initially parallel will eventually no longer be parallel. The precise
meaning of “initially parallel” is that $u^b|_{\tau=0} \equiv Z^a\nabla_a w^b|_{\tau=0} = 0$. By the physical
meaning of $u^b$ for a timelike geodesic family, this equation indicates that the relative
3-velocity between two neighboring geodesics is zero at the beginning ($\tau = 0$), and
hence the geodesics are said to be “initially parallel”. However, as
long as $a^c|_{\tau=0} \equiv Z^b\nabla_b u^c|_{\tau=0} \neq 0$, after a while $u^b$ will no longer be zero, i.e.,
the two geodesics will “become not parallel”. Just as we said in Sect. 3.5, one of
the equivalent formulations of the curvature tensor being nonvanishing is that there
exist geodesics which are parallel at first and later become non-parallel.
(3) The Christoffel symbols $\Gamma^{\sigma}{}_{\mu\nu}$ depend on the coordinate system. By choosing
the proper coordinate system of a freely falling non-rotating observer one can
make the Christoffel symbols vanish on the world line of the observer (see Proposi-
tion 7.5.1), and this can account for the weightlessness of the observer in Einstein’s
elevator. However, the tidal acceleration a c is directly related to the Riemann tensor
Rabc d (7.6.8), and as a tensor, the latter cannot be made to vanish by choosing any
coordinate system. Thus, the tidal acceleration cannot be eliminated by a coordinate
transformation. Although the observer in Einstein’s elevator cannot feel gravity (the
“gravitational field strength” at this observer is zero), they can still feel the tidal force.
This is an interpretation of Fig. 7.8 from general relativity. On the other hand, at least
indirectly, the λz in (7.6.3) being small verifies the statement that “the effect of
spacetime curvature is only manifested in a spacetime region which is large enough.”
(4) So far we only focused on timelike geodesic families. We choose the proper
time as the affine parameter, and choose μτ (s) to be orthogonal to the geodesics.
This is for no reason except to emphasize the physical meaning that a c is the tidal
acceleration (in order to have a better correspondence with Fig. 7.10). From the
pure mathematical perspective, the geodesic deviation equation (7.6.8) also holds
for spacelike and null geodesic families, one just needs to interpret τ as the affine
parameter of a geodesic. In this case a c no longer has the physical interpretation
of the tidal acceleration, and the separation vector ηa does not need to be orthogo-
nal to Z a . Actually, the geodesic deviation equation also holds for a metric with a
non-Lorentzian signature. Furthermore, one can even talk about geodesics on a man-
ifold without a metric as long as there is a derivative operator; although orthogonality
is not defined, there is still a geodesic deviation equation, i.e., we have the following
proposition:
Proposition 7.6.1′ The geodesic deviation equation of an arbitrary one-parameter
family of geodesics $\{\gamma_s(\lambda)\}$ in $(M, \nabla_a)$ is
$$a^c = -R_{abd}{}^{c}\,T^a\eta^b T^d , \tag{7.6.8$'$}$$
where $R_{abd}{}^{c}$ is the Riemann tensor, $T^a \equiv (\partial/\partial\lambda)^a$ is the tangent vector of the fiducial
geodesic $\gamma_0(\lambda)$, $\eta^a$ is the separation vector on $\gamma_0(\lambda)$ (as defined before), and
$a^c \equiv T^a\nabla_a(T^b\nabla_b\eta^c)$.
Proof The same as the proof of Proposition 7.6.1.
[Optional Reading 7.6.1]
The tidal acceleration $a^c$ in (7.6.8) is defined in terms of $w^a$ (7.6.7). To compare with
Newton’s theory of gravity, we introduced $\lambda^a \equiv w^a s$ and considered it as corresponding
to the relative position vector $\vec\lambda$. Why can $\lambda^a$ be interpreted as the position vector of ball
2 relative to ball 1? Suppose $p$ and $q$ are two arbitrary points in flat space, $w'^a$ is the unit
tangent vector at $p$ of the line between $p$ and $q$, and $s'$ is the length of the line between
the two points; then $\lambda^a \equiv w'^a s'$ can be referred to as the position vector of $q$ relative to
$p$ (note that $|\lambda^a| = s'$). Back to the problem of geodesic deviation in curved space.
Take any $\mu_\tau(s)$ in the family of transverse curves, and let $p \equiv \mu_\tau(0)$, $q \equiv \mu_\tau(s)$ ($p$ and $q$
represent the fiducial observer and the point mass being measured at a time $\tau$, respectively).
Use the arc length $s'$ to reparametrize $\mu_\tau(s)$, i.e., $\mu'_\tau(s') = \mu_\tau(s)$, and let $w^a$ and $w'^a$ be the
tangent vectors of $\mu_\tau(s)$ and $\mu'_\tau(s')$ at $p$, respectively; then $w^a = w'^a\,ds'/ds$. Hence, if we set
$\lambda^a \equiv w^a s$, then we have $\lambda^a \equiv w^a s \cong w'^a s'$ when $s$ is small. Noticing that $|w'^a| = 1$,
we see that $|\lambda^a| \cong s'$; comparing with the position vector in flat space, we may say that
$\lambda^a$ is the position vector of ball 2 relative to ball 1. In the main text one does not need to
introduce the arc length parameter $s'$, and thus there is no $w'^a$, so one just needs to care about
$w^a$, whose length changes with $\tau$ (note that $s$ does not change with $\tau$). The change of the
“distance” between balls 1 and 2 is completely manifested by the change of $|w^a|$ with $\tau$,
and so defining the relative 3-velocity and relative 3-acceleration using $\lambda^a \equiv w^a s$ has a
perfect correspondence with Fig. 7.10.
[The End of Optional Reading 7.6.1]
[Optional Reading 7.6.2]
If we add a constant related to $s$ to the $\tau$ of each $\gamma_s(\tau)$, then $\mu_\tau(s)$ will become
non-orthogonal to the geodesics, and thus whether or not $\eta^a$ and $Z^a$ are orthogonal depends on the
zero setting of the proper time of each geodesic. Further, what if we take an arbitrary affine
parameter $\tau'$ to substitute for $\tau$? Since $\tau$ is an affine parameter, it follows from Theorem
3.3.3 that $\tau'$ is an affine parameter if and only if $\tau' = \alpha\tau + \beta$. $\alpha$ and $\beta$ should of course be
constants on each geodesic (and $\alpha \neq 0$), but they can be different for different geodesics, i.e.,
$\alpha$ and $\beta$ can be functions of $s$: $\tau' = \alpha(s)\tau + \beta(s)$. This change of the affine parameter can
be viewed as a coordinate transformation $\{\tau, s\} \to \{\tau', s'\}$ on the 2-dimensional manifold
$S$, where
$$s' = s , \qquad \tau' = \alpha(s)\tau + \beta(s) . \tag{7.6.10}$$
Let $Z'^a$ and $\eta'^a$ represent the new coordinate basis vectors, i.e., $Z'^a \equiv (\partial/\partial\tau')^a$ and
$\eta'^a \equiv (\partial/\partial s')^a$, then it is not difficult to show that
$$Z'^a = \alpha^{-1} Z^a , \qquad \eta'^a = \eta^a + \nu Z'^a , \tag{7.6.11}$$
where $\nu(\tau, s) \equiv -(\tau\,d\alpha/ds + d\beta/ds)$ can be viewed as a function on $S$. Since we only care
about the separation between the fiducial geodesic $\gamma_0(\tau)$ and a geodesic $\gamma_s(\tau)$ next to it,
$\eta^a$ and $\eta'^a$ can be viewed as vectors describing the same separation (Fig. 7.12). That is, if
the separation vectors $\eta^a$ and $\eta'^a$ differ only by a multiple of the tangent vector, they describe the
same separation. Thus, there exists a “gauge arbitrariness” in the choice of the separation
vector. If one insists on using the proper time, but allows each geodesic to have an arbitrary zero
setting, this is equivalent to setting $\alpha = 1$ in (7.6.10) while letting $\beta(s)$ be arbitrary. Then
$Z'^a = Z^a$, $\eta'^a = \eta^a + \nu Z^a$, and $\nu = -d\beta/ds$. Equation (7.6.8) can be expressed as
$$a'^c = -R_{abd}{}^{c}\,Z'^a\eta'^b Z'^d = \alpha^{-2} a^c .$$
This is natural since substituting $\tau'$ for the proper time $\tau$ is equivalent to substituting a
“coordinate clock” for the standard clock. The rate of this coordinate clock is $\alpha^{-1}$ times the
rate of the standard clock, and the “tidal acceleration” measured using this clock is naturally
$\alpha^{-2}$ times the result measured by the standard clock.
[The End of Optional Reading 7.6.2]
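The relations (7.6.11) and the factor $\alpha^{-2}$ quoted in Optional Reading 7.6.2 can be checked with the chain rule. The following is only a sketch of that check: primes denote quantities built from the new parameter $\tau'$, and the one property of the curvature that is assumed is the antisymmetry of $R_{abd}{}^{c}$ in its first two indices.

```latex
% From (7.6.10): s = s', \tau = [\tau' - \beta(s')] / \alpha(s').
\begin{gather*}
Z'^a = \left(\frac{\partial}{\partial\tau'}\right)^a
     = \frac{\partial\tau}{\partial\tau'}\left(\frac{\partial}{\partial\tau}\right)^a
     = \alpha^{-1} Z^a , \\
\eta'^a = \left(\frac{\partial}{\partial s'}\right)^a
     = \eta^a + \frac{\partial\tau}{\partial s'}\,Z^a
     = \eta^a - \alpha^{-1}\left(\tau\frac{d\alpha}{ds} + \frac{d\beta}{ds}\right) Z^a
     = \eta^a + \nu Z'^a , \\
-R_{abd}{}^{c}\,Z'^a\eta'^b Z'^d
     = -\alpha^{-2} R_{abd}{}^{c}\,Z^a\left(\eta^b + \nu\alpha^{-1} Z^b\right) Z^d
     = -\alpha^{-2} R_{abd}{}^{c}\,Z^a\eta^b Z^d
     = \alpha^{-2} a^c .
\end{gather*}
% The term proportional to \nu drops out because R_{abd}{}^{c} Z^a Z^b = 0
% (R_{abd}{}^{c} is antisymmetric in a and b).
```

The last line also makes the “gauge arbitrariness” explicit: adding any multiple of the tangent vector to the separation vector leaves the right-hand side of the geodesic deviation equation unchanged.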
[Optional Reading 7.6.3]
A solution $\eta^b$ to the geodesic deviation equation (7.6.8′) is called a Jacobi field on the
geodesic $\gamma(\lambda)$ being considered. Two points $p, q \in \gamma(\lambda)$ are said to be conjugate if there
exists a non-vanishing Jacobi field $\eta^b$ on $\gamma(\lambda)$ which vanishes at $p$ and $q$. In this case, we
also say that $p$ and $q$ are a pair of conjugate points on the geodesic $\gamma(\lambda)$. For instance, the
south and north poles $s$ and $n$ on the 2-dimensional sphere shown in Fig. 7.13 are a pair of
conjugate points on the geodesic $\gamma$ from $s$ to $n$ (half of a great circle). It is not difficult
to accept the following intuitive statement: $p, q \in \gamma$ are a pair of conjugate points if there
exists a geodesic from $p$ to $q$ that is infinitesimally close to but different from $\gamma$ (such as the
$\gamma'$ in the figure). The precise meaning of the condition after the word “if” is: there exists a
one-parameter family of geodesics from $p$ to $q$ which includes $\gamma$. The logic above can be
formulated as:
There exists a geodesic from $p$ to $q$ that is infinitesimally close to but different from $\gamma$
⇔ there exists a one-parameter family of geodesics from $p$ to $q$ which includes $\gamma$
⇒ $p, q \in \gamma$ are a pair of conjugate points ⇔ there exists a non-vanishing Jacobi field $\eta^b$ on
$\gamma(\lambda)$ which vanishes at $p$ and $q$.
This logic can help us clarify two subtle problems, which will be introduced as follows
in the manner of Q&A (we stipulate that $\gamma$ is a geodesic):
Q: Suppose $p, q \in \gamma$ are a pair of conjugate points; does there exist a geodesic from $p$
to $q$ that is different from but infinitesimally close to $\gamma$?
A: Not necessarily, because the ⇒ in the relation above cannot be changed to ⇔. There
do exist situations in which $p, q \in \gamma$ are conjugate but one cannot find a geodesic
from $p$ to $q$ that is different from but infinitesimally close to $\gamma$ (details omitted).
Q: Suppose there exists a geodesic $\gamma''$ that passes through $p, q \in \gamma$ and is different from
$\gamma$; can we say that $p$ and $q$ are conjugate?
A: No, because the existence of $\gamma''$ alone cannot guarantee that there exists a one-parameter family
of geodesics between $p$ and $q$ which includes $\gamma$. A counterexample: extend the great
arc $\gamma$ to $d$, and denote this major arc by $\tilde\gamma$; then $s, d \in \tilde\gamma$ and there exists a geodesic $\gamma''$
(the minor arc of the great circle) which passes through $s$ and $d$ and is different from $\tilde\gamma$. However,
$s, d \in \tilde\gamma$ are not a pair of conjugate points since (intuitively speaking) there does not exist a
geodesic connecting $s$ and $d$ that is “infinitesimally close” to $\tilde\gamma$ ($\gamma''$ is certainly not close to
it), or (precisely speaking) there does not exist a nonvanishing Jacobi field $\eta^b$ that
vanishes at both end points.
For the significance of conjugate points on the arc length problem, see Sect. 3.3; for the
use of it in the proofs of the singularity theorems, see Wald (1984) pp. 223–233.
[The End of Optional Reading 7.6.3]
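The conjugate-point example on the sphere can be illustrated numerically. The sketch below assumes the standard reduction of the geodesic deviation equation along a geodesic of a 2-dimensional surface to the scalar equation $\eta'' = -K\eta$, where $\eta$ is the component of the Jacobi field orthogonal to the geodesic and $K$ is the Gaussian curvature ($K = 1$ for the unit sphere); this reduction is not derived in the text. The function name `jacobi_field` and the step count are illustrative choices.

```python
import math


def jacobi_field(curvature, s_end, steps=2000):
    """Integrate eta'' = -curvature * eta with eta(0) = 0, eta'(0) = 1.

    Uses the classical fourth-order Runge-Kutta scheme and returns
    (eta, d_eta) at arc length s_end.
    """
    h = s_end / steps
    eta, d_eta = 0.0, 1.0

    def rhs(y, dy):
        # First-order system: eta' = d_eta, d_eta' = -K * eta
        return dy, -curvature * y

    for _ in range(steps):
        k1 = rhs(eta, d_eta)
        k2 = rhs(eta + 0.5 * h * k1[0], d_eta + 0.5 * h * k1[1])
        k3 = rhs(eta + 0.5 * h * k2[0], d_eta + 0.5 * h * k2[1])
        k4 = rhs(eta + h * k3[0], d_eta + h * k3[1])
        eta += h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
        d_eta += h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
    return eta, d_eta


if __name__ == "__main__":
    # Unit sphere: the field leaving the south pole vanishes again at the north pole.
    print("sphere, s = pi:", jacobi_field(1.0, math.pi)[0])
    # Flat plane: the field grows linearly and never returns to zero.
    print("plane,  s = pi:", jacobi_field(0.0, math.pi)[0])
```

With $K = 1$ the field behaves like $\sin s$ and returns to zero at arc length $\pi$, which is the statement that the two poles are conjugate along half of a great circle. With $K = 0$ it grows like $s$, so a nonvanishing Jacobi field that vanishes at the starting point never vanishes again.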
7.7 The Einstein Field Equation

Since the distribution of matter produces gravity, and gravity is manifested by the
spacetime curvature, a natural hypothesis is that the spacetime curvature is affected by
the matter distribution. The matter distribution is described by the energy-momentum
tensor Tab , and hence there should exist an equation that relates Tab and the spacetime
curvature. Considering that Newton’s theory of gravity should be the weak-field and
low-speed approximation of general relativity, the comparison between the geodesic
deviation equation (7.6.8) and the tidal force acceleration (7.6.1) in Newton’s theory
of gravity provides important clues for seeking (guessing) this equation. Since the a c
in (7.6.8) is defined in terms of wa instead of λa , for convenience’s sake, we should
change the λi in (7.6.1) to wi . Suppose {x i } is a Cartesian system of the 3-dimensional
Euclidean space, then (7.6.1) can be written as
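$$a^c = \left(\frac{\partial}{\partial x^i}\right)^c a^i = \left(\frac{\partial}{\partial x^i}\right)^c \frac{d^2 w^i}{dt^2} = -\left(\frac{\partial}{\partial x^i}\right)^c w^j \frac{\partial}{\partial x^j}\frac{\partial\phi}{\partial x^i}$$
$$\phantom{a^c} = -\left(\frac{\partial}{\partial x^i}\right)^c w^b\partial_b \frac{\partial\phi}{\partial x^i} = -w^b\partial_b\left[\left(\frac{\partial}{\partial x^i}\right)^c \frac{\partial\phi}{\partial x^i}\right] = -w^b\partial_b\partial^c\phi .$$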
This is the tidal acceleration derived from Newton’s theory of gravity, which should be
an approximation of the a c derived from general relativity. Therefore, the comparison
between the above equation and (7.6.8) implies the following correspondence:
$$R_{abd}{}^{c}\,Z^a Z^d \leftrightarrow \partial_b\partial^c\phi . \tag{7.7.1}$$
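The Newtonian tidal acceleration $a^c = -w^b\partial_b\partial^c\phi$ can be evaluated numerically for a concrete potential. The sketch below assumes a point mass with $\phi = -M/r$ in units where $G = 1$, and approximates the second derivatives by central finite differences; the function names, the step size and the sample point are illustrative choices, not taken from the text.

```python
import math


def newtonian_potential(x, y, z, mass=1.0):
    """phi = -M / r for a point mass at the origin (units with G = 1)."""
    return -mass / math.sqrt(x * x + y * y + z * z)


def hessian(f, point, h=1e-3):
    """Central finite-difference approximation of d^2 f / dx^i dx^j at `point`."""
    hess = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(3):
            def shifted(di, dj):
                p = list(point)
                p[i] += di * h
                p[j] += dj * h
                return f(*p)
            hess[i][j] = (shifted(1, 1) - shifted(1, -1)
                          - shifted(-1, 1) + shifted(-1, -1)) / (4 * h * h)
    return hess


def tidal_acceleration(w, point, f=newtonian_potential):
    """Components a^i = -w^j d_j d_i phi in Cartesian coordinates."""
    hess = hessian(f, point)
    return [-sum(hess[i][j] * w[j] for j in range(3)) for i in range(3)]


if __name__ == "__main__":
    point = (0.0, 0.0, 2.0)  # fiducial particle on the z-axis at r = 2
    print("radial separation:    ", tidal_acceleration([0.0, 0.0, 1.0], point))
    print("transverse separation:", tidal_acceleration([1.0, 0.0, 0.0], point))
```

For a separation along the radial direction the acceleration is $+2M/r^3$ (stretching), while for a transverse separation it is $-M/r^3$ (squeezing). The trace of $\partial_b\partial^c\phi$ vanishes outside the source, in agreement with Laplace's equation.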
In fact, this is what Einstein assumed and published initially. However, from Sect.
6.4 we can see that the energy-momentum tensor Tab satisfies ∂ a Tab = 0; using the
minimal substitution rule in Sect. 7.2 we have ∇ a Tab = 0, and hence (7.7.3) leads to
$$\nabla^a R_{ab} = 0 , \tag{7.7.3$'$}$$
Raising the index d using the metric and contracting it with the lower index b yields
$$0 = \nabla_a R_c{}^{a} - \nabla_c R + \nabla_b R_c{}^{b} = 2\nabla^a R_{ca} - \nabla_c R ,$$
$$T = T_a{}^{a} = \rho U_a U^a + p(\delta_a{}^{a} + U_a U^a) = -\rho + 3p .$$
$$0 = \nabla^a R_{ab} - \frac{1}{2}\nabla_b R = \nabla^a R_{ab} - \frac{1}{2} g_{ab}\nabla^a R = \nabla^a\left(R_{ab} - \frac{1}{2} R g_{ab}\right) ,$$
what is inside the parenthesis on the right-hand side of this equation can be taken as
G ab . Therefore, we can define G ab as
$$G_{ab} \equiv R_{ab} - \frac{1}{2} R g_{ab} , \qquad \nabla^a G_{ab} = 0 , \tag{7.7.5}$$
(G ab is called the Einstein tensor, see Definition 3 and Theorem 3.4.8 in Sect. 3.4),
and substitute (7.7.3) by the equation G ab = 8π Tab , namely we assume
$$R_{ab} - \frac{1}{2} R g_{ab} = 8\pi T_{ab} . \tag{7.7.6}$$
$$R_{ab} = 8\pi T_{ab} + \frac{1}{2} g_{ab} R = 8\pi T_{ab} + \frac{1}{2} g_{ab}(-8\pi T) = 8\pi\left(T_{ab} - \frac{1}{2} g_{ab} T\right) . \tag{7.7.6$'$}$$
Thus,
$$R_{ab} Z^a Z^b = 8\pi\left(T_{ab} Z^a Z^b - \frac{1}{2} g_{ab} Z^a Z^b\,T\right) = 8\pi\left(\rho + \frac{1}{2} T\right) \cong 8\pi\left(\rho - \frac{1}{2}\rho\right) = 4\pi\rho = 4\pi T_{ab} Z^a Z^b ,$$
which is exactly (7.7.2). Therefore, one should take (7.7.6) as the equation describing
the relation between the spacetime curvature and a matter field. This equation is
dubbed the Einstein field equation, which is a basic postulate of general relativity.13
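The step from (7.7.6′) to $R_{ab}Z^aZ^b \cong 4\pi\rho$ can be checked with explicit components. The sketch below assumes a local Lorentz frame in which $g_{ab}$ has components $\mathrm{diag}(-1, 1, 1, 1)$, a comoving observer $Z^a = U^a$, and the perfect-fluid form $T_{ab} = (\rho + p)U_aU_b + p\,g_{ab}$ (consistent with the trace $T = -\rho + 3p$ quoted in this section). All function names are illustrative.

```python
import math

# Components of the metric in a local Lorentz frame (assumption of this sketch).
ETA = [[-1.0, 0.0, 0.0, 0.0],
       [0.0, 1.0, 0.0, 0.0],
       [0.0, 0.0, 1.0, 0.0],
       [0.0, 0.0, 0.0, 1.0]]


def lower(v):
    """Lower the index of a vector: v_a = eta_{ab} v^b."""
    return [sum(ETA[a][b] * v[b] for b in range(4)) for a in range(4)]


def perfect_fluid_T(rho, p, U):
    """T_{ab} = (rho + p) U_a U_b + p eta_{ab}."""
    U_low = lower(U)
    return [[(rho + p) * U_low[a] * U_low[b] + p * ETA[a][b] for b in range(4)]
            for a in range(4)]


def trace(T):
    """T = eta^{ab} T_{ab}; for diag(-1,1,1,1) the inverse has the same components."""
    return sum(ETA[a][b] * T[a][b] for a in range(4) for b in range(4))


def ricci_ZZ(rho, p, Z):
    """R_{ab} Z^a Z^b = 8 pi (T_{ab} Z^a Z^b - g_{ab} Z^a Z^b T / 2), from (7.7.6')."""
    T = perfect_fluid_T(rho, p, Z)  # comoving observer: Z^a = U^a
    T_ZZ = sum(T[a][b] * Z[a] * Z[b] for a in range(4) for b in range(4))
    g_ZZ = sum(ETA[a][b] * Z[a] * Z[b] for a in range(4) for b in range(4))
    return 8 * math.pi * (T_ZZ - 0.5 * g_ZZ * trace(T))


if __name__ == "__main__":
    Z = [1.0, 0.0, 0.0, 0.0]
    print("dust (p = 0):  ", ricci_ZZ(1.0, 0.0, Z), "vs 4*pi*rho =", 4 * math.pi)
    print("fluid, p = 0.1:", ricci_ZZ(1.0, 0.1, Z))
```

Under these assumptions the exact value is $4\pi(\rho + 3p)$, which reduces to $4\pi\rho$ when the pressure is negligible compared with the density, as in the approximation used above.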
In Minkowski space Rabc d = 0 everywhere; hence, G ab = 0, and from Einstein’s
equation we know that Tab = 0. However, is there any physics if there is no matter?
In fact, special relativity studies the motion of physical objects and their interactions,
but the gravitational interaction between them is ignored, i.e., the gravitational fields
produced by the physical objects are ignored, and therefore the spacetime is approx-
imately flat. Thus, special relativity is the approximation of general relativity when
gravity (spacetime curvature) can be ignored. As long as gravity is not negligible,
the spacetime cannot be treated as flat, and in principle special relativity cannot be
applied.
An important special case is Tab = 0, in which Einstein’s equation becomes
$$R_{ab} - \frac{1}{2} R g_{ab} = 0 , \tag{7.7.8}$$
called the vacuum Einstein equation. Given a coordinate system, the components
Rμν of the Ricci tensor can be expressed by the components gμν of the metric and its
partial derivatives (up to the second order) [see (3.4.21)], and the dependence of Rμν
on gμν is highly nonlinear.14 Therefore, (7.7.8) can be viewed as a set of nonlinear
2nd-order partial differential equations for the unknown functions gμν , each solution
gab is a vacuum metric. The Minkowski metric is naturally a solution to the equation
(7.7.8), while a solution to (7.7.8) can be a curved metric. An important example
is the vacuum solution found by Karl Schwarzschild within two months after the
publication of Einstein’s equation, see Sect. 8.3 and Chap. 9 for details.
It is not difficult to show that the scalar curvature R vanishes when Tab = 0, and
thus the vacuum Einstein equation (7.7.8) can be simplified as
$$R_{ab} = 0 . \tag{7.7.8$'$}$$
This indicates that the Riemann tensor of a vacuum metric (i.e., a solution to the
vacuum Einstein equation) gab is equal to its Weyl tensor (see Definition 2 of Sect.
3.4), which is usually nonvanishing.
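The claim that the scalar curvature vanishes in vacuum follows from taking the trace of Einstein's equation. A short sketch of this step, assuming a 4-dimensional spacetime so that $\delta^a{}_a = 4$:

```latex
\begin{gather*}
g^{ab}\left(R_{ab} - \tfrac{1}{2} R\, g_{ab}\right)
   = R - \tfrac{1}{2} R\,\delta^a{}_a = R - 2R = -R , \\
g^{ab}\,(8\pi T_{ab}) = 8\pi T
   \quad\Longrightarrow\quad R = -8\pi T .
\end{gather*}
% If T_{ab} = 0 then T = 0, hence R = 0, and R_{ab} - (1/2) R g_{ab} = 0
% reduces to R_{ab} = 0.
```

The same relation $R = -8\pi T$ is what converts (7.7.6) into the trace-reversed form (7.7.6′).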
Equation (7.7.6) with $T_{ab} \neq 0$ is called Einstein’s equation with source, which
is similar to Maxwell’s equations with source in Minkowski spacetime [see (6.6.10)],
except there is an important difference. For Maxwell’s equations, one can solve for
the unknown Fab when the source (4-current density J a ) is assigned. It seems that
for Einstein’s equation one can also assign Tab (as a given quantity) and then solve
13 The story being told here is a cleaned-up version of the much more convoluted path which Einstein
actually followed. In fact, Einstein did not define the Einstein tensor first, and the form
of his equation published in November 1915 was (7.7.6′) instead of (7.7.6).
14 Specifically, the dependence of $G_{\mu\nu}$ on the second-order derivatives of $g_{\mu\nu}$ is linear, while the
dependence on the first-order derivatives is quadratic. What is worse, $G_{\mu\nu}$ also contains the inverse
$g^{\mu\nu}$ of $g_{\mu\nu}$ (for raising the indices), which is very complex when expressed as a function of $g_{\mu\nu}$.
for the unknown quantity gab ; however, there is an issue: Tab is not meaningful when
gab is undetermined. Take a perfect fluid with zero pressure (dust) as an example. To
define a dust as a matter field, we mean to assign a 4-velocity field U a and a proper
density field ρ to it. The energy-momentum tensor of the dust is Tab = ρUa Ub ,
where Ua ≡ gac U c . Therefore, as long as gac is undetermined, the value of Tab is not
known. Moreover, the 4-velocity field U a should be timelike and normalized, and
both of these concepts involve the metric gab , and so one can hardly view U a as a given
quantity when gab is unknown. Thus, it is improper to treat gab and Tab as respectively
unknown and given quantities. The source of this difference between Einstein’s
equation and Maxwell’s equations is that: the spacetime background (Minkowski
spacetime) is already stipulated in Maxwell’s theory, and the right-hand side of
the equation ∂ a Fab = −4π Jb will be a given quantity −4π ηbc J c when a 4-current
vector J a is given; for Einstein’s equation, however, gab that describes the spacetime
background is yet to be determined, and unfortunately, it appears on both sides of
the equation, and thus one cannot simply consider the right-hand side as being given
beforehand. When solving Einstein’s equation, one should treat gab and the quantities
describing matter fields (e.g., for a dust they are U a and ρ) together as unknown
quantities and solve for them simultaneously. We will provide an example of solving
Einstein’s equation in Sect. 8.4, where the “matter field” will be an electromagnetic
field.15
The non-linearity of Einstein’s equation means that it does not satisfy the superposition
principle, which leads to many consequences. For instance, the sum of
two solutions of Einstein’s equation is in general not a solution. This is another significant difference
between Einstein’s equation and Maxwell’s equations.
The Einstein tensor satisfies ∇ a G ab = 0 [see (7.7.5)], and therefore Einstein’s
equation contains ∇ a Tab = 0, which includes a lot of information about the motion
of matter. In fact, for a perfect fluid, this is the equation of motion for the matter
field (see Sect. 6.5). For a perfect fluid with zero pressure, i.e., a dust, it follows
from ∇ a Tab = 0 that the world line of a dust particle is a geodesic [see (6.5.8) and
a few sentences after that]. This conclusion can also be generalized to any object
whose self-gravity is weak enough [Fock (1939); Geroch and Jang (1975)]. Thus,
the postulate in Sect. 7.1 about the world lines of free particles being geodesics is no
longer an independent postulate.
Another completely different approach to obtain Einstein’s field equation is
through the Lagrangian formulation of general relativity, which will be introduced
in Chap. 16 (Volume III). Since it does not involve any knowledge that has not been
covered so far, readers who want to learn about deriving Einstein’s equation through
the variational principle may refer to Sect. 16.1 (except for the optional reading)
directly after reading this section.
15 Conventionally, an electromagnetic field is not classified as a matter field, but as the source of a
gravitational field we will later on refer to it as a matter field for convenience.
7.8 Linear Approximation and the Newtonian Limit

The non-linearity of the Einstein field equation brings many difficulties to the task of
solving the equation as well as the study of general relativity in general. In most of the
cases the gravitational field is weak, and one can approximate the field equation as a
linear equation, which will significantly simplify the problem. In the 4-dimensional
language, a weak gravitational field means that the spacetime metric gab is close to
the Minkowski metric $\eta_{ab}$.16 Define $\gamma_{ab}$ using the following equation:
$$g_{ab} = \eta_{ab} + \gamma_{ab} , \tag{7.8.1}$$
then $\gamma_{ab}$ is “small”, which means that the components of $\gamma_{ab}$ in a Lorentzian coordinate
system of $\eta_{ab}$ satisfy $|\gamma_{\mu\nu}| \ll 1$, so that the second and higher order terms can all
be neglected. Under this approximation, $\gamma_{ab}$ can be treated as some kind of physical
field (similar to the electromagnetic field) in Minkowski spacetime. The difference
between $\gamma_{ab}$ and an ordinary physical field is that the sum of $\gamma_{ab}$ and $\eta_{ab}$ gives the
spacetime metric. From this perspective (plus the fact that $\gamma_{ab}$ is “small”), $\gamma_{ab}$ can be
viewed as a perturbation of $\eta_{ab}$. For convenience and to avoid confusion, we stipulate
that the tensor indices are all raised and lowered by $\eta^{ab}$ and $\eta_{ab}$ (instead of $g^{ab}$ and
$g_{ab}$), with only one exception, which is $g^{ab}$: $g^{ab}$ will still represent the inverse of $g_{ab}$
rather than $\eta^{ac}\eta^{bd} g_{cd}$. Under the linear approximation, it is not difficult to see from
(7.8.1) that
$$g^{ab} = \eta^{ab} - \gamma^{ab} . \tag{7.8.2}$$
The Christoffel symbol of $g_{ab}$ in a Lorentzian coordinate system of $\eta_{ab}$ is
$$\Gamma^{c}{}_{ab} = \frac{1}{2} g^{cd}(\partial_a g_{bd} + \partial_b g_{ad} - \partial_d g_{ab}) . \tag{7.8.3}$$
Plugging (7.8.1) and (7.8.2) into the above equation and only keeping the first-order
terms in γab , we have
$$\Gamma^{(1)c}{}_{ab} = \frac{1}{2}\eta^{cd}(\partial_a\gamma_{bd} + \partial_b\gamma_{ad} - \partial_d\gamma_{ab}) . \tag{7.8.4}$$
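The first-order inverse metric (7.8.2) can be checked numerically: the error of $\eta^{ab} - \gamma^{ab}$ as an approximation to the true inverse of $\eta_{ab} + \gamma_{ab}$ should scale like the square of the perturbation. The sketch below works with components in a Lorentzian coordinate system; the symmetric matrix `PATTERN` is a hypothetical choice for the shape of $\gamma_{\mu\nu}$, and the helper names are illustrative.

```python
ETA = [[-1.0, 0.0, 0.0, 0.0],
       [0.0, 1.0, 0.0, 0.0],
       [0.0, 0.0, 1.0, 0.0],
       [0.0, 0.0, 0.0, 1.0]]

# Hypothetical symmetric pattern for gamma_{mu nu}; any symmetric matrix would do.
PATTERN = [[1.0, 2.0, 0.0, 1.0],
           [2.0, 0.0, 1.0, 0.0],
           [0.0, 1.0, 3.0, 1.0],
           [1.0, 0.0, 1.0, 2.0]]


def inverse(matrix):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def first_order_inverse(gamma):
    """eta^{ab} - gamma^{ab}, with indices of gamma raised by eta = diag(-1,1,1,1)."""
    return [[ETA[a][b] - ETA[a][a] * ETA[b][b] * gamma[a][b] for b in range(4)]
            for a in range(4)]


def max_error(eps):
    """Largest component of (exact inverse) - (first-order inverse) for gamma = eps * PATTERN."""
    gamma = [[eps * PATTERN[a][b] for b in range(4)] for a in range(4)]
    g = [[ETA[a][b] + gamma[a][b] for b in range(4)] for a in range(4)]
    exact = inverse(g)
    approx = first_order_inverse(gamma)
    return max(abs(exact[a][b] - approx[a][b]) for a in range(4) for b in range(4))


if __name__ == "__main__":
    for eps in (1e-2, 1e-3, 1e-4):
        print(eps, max_error(eps))
```

Reducing the perturbation by a factor of 10 reduces the error by a factor of about 100, which is the sense in which (7.8.2) holds “under the linear approximation”.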
16 In the linearized theory of gravity, people usually discuss the spacetime with the background
manifold R4 , or a spacetime region where a flat Lorentzian metric η̃ab can be defined. In the former
case the Minkowski metric ηab is globally defined, and in the latter case it is convenient to denote
the (locally) flat metric η̃ab as ηab .
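Using the property that $\Gamma^{(1)c}{}_{ab}$ itself is a first-order small term, plugging the above
equation into (3.4.20) yields the first-order approximation of the Riemann tensor
(with lower indices) of $g_{ab}$ (called the linearized Riemann tensor)
$$R^{(1)}_{acbd} = \partial_d\partial_{[a}\gamma_{c]b} - \partial_b\partial_{[a}\gamma_{c]d} . \tag{7.8.5}$$
Using $\eta^{cd}$ to raise and contract the indices, we obtain the first-order approximation
of the Ricci tensor of $g_{ab}$ (the linearized Ricci tensor)
$$R^{(1)}_{ab} = \partial^c\partial_{(a}\gamma_{b)c} - \frac{1}{2}\partial^c\partial_c\gamma_{ab} - \frac{1}{2}\partial_a\partial_b\gamma , \tag{7.8.6}$$
where $\gamma \equiv \gamma^a{}_a = \eta^{ab}\gamma_{ab}$. From this one can easily get the first-order approximation
of the Einstein tensor (called the linearized Einstein tensor)
$$G^{(1)}_{ab} = R^{(1)}_{ab} - \frac{1}{2}\eta_{ab} R^{(1)} = \partial^c\partial_{(b}\gamma_{a)c} - \frac{1}{2}\partial^c\partial_c\gamma_{ab} - \frac{1}{2}\partial_a\partial_b\gamma - \frac{1}{2}\eta_{ab}(\partial^c\partial^d\gamma_{cd} - \partial^c\partial_c\gamma) . \tag{7.8.7}$$
Therefore,
$$\partial^c\partial_{(a}\gamma_{b)c} - \frac{1}{2}\partial^c\partial_c\gamma_{ab} - \frac{1}{2}\partial_a\partial_b\gamma - \frac{1}{2}\eta_{ab}(\partial^c\partial^d\gamma_{cd} - \partial^c\partial_c\gamma) = 8\pi T_{ab} \tag{7.8.8}$$
is called the linearized Einstein equation. Let
$$\bar\gamma_{ab} \equiv \gamma_{ab} - \frac{1}{2}\eta_{ab}\gamma , \tag{7.8.9}$$
then the linearized Einstein equation can be further simplified as
$$-\frac{1}{2}\partial^c\partial_c\bar\gamma_{ab} + \partial^c\partial_{(a}\bar\gamma_{b)c} - \frac{1}{2}\eta_{ab}\partial^c\partial^d\bar\gamma_{cd} = 8\pi T_{ab} . \tag{7.8.8$'$}$$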
The left-hand side of this equation vanishes when ∂ b ≡ ηbc ∂c acts on it, and thus
the equation above assures ∂ b Tab = 0. This has an important physical meaning: it
indicates that the divergence of the energy-momentum tensor vanishes in the lin-
earized theory of gravity, and hence assures that the laws of conservation of energy,
momentum and angular momentum also hold in the linearized theory of gravity (as
a physical theory).
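The statement that the left-hand side of the linearized Einstein equation in terms of $\bar\gamma_{ab}$ is annihilated by $\partial^b$ can be verified term by term. A sketch, using only the fact that the flat derivatives $\partial_a$ commute and writing out the symmetrization $\partial^c\partial_{(a}\bar\gamma_{b)c} = \tfrac{1}{2}(\partial^c\partial_a\bar\gamma_{bc} + \partial^c\partial_b\bar\gamma_{ac})$:

```latex
\begin{gather*}
\partial^b\left(-\tfrac{1}{2}\partial^c\partial_c\bar\gamma_{ab}\right)
   = -\tfrac{1}{2}\,\partial^c\partial_c\,\partial^b\bar\gamma_{ab} , \\
\partial^b\left(\partial^c\partial_{(a}\bar\gamma_{b)c}\right)
   = \tfrac{1}{2}\,\partial_a\partial^b\partial^c\bar\gamma_{bc}
   + \tfrac{1}{2}\,\partial^b\partial_b\,\partial^c\bar\gamma_{ac} , \\
\partial^b\left(-\tfrac{1}{2}\eta_{ab}\,\partial^c\partial^d\bar\gamma_{cd}\right)
   = -\tfrac{1}{2}\,\partial_a\partial^c\partial^d\bar\gamma_{cd} .
\end{gather*}
% Adding the three lines: the two terms containing the wave operator cancel,
% and the two terms containing \partial_a \partial^c \partial^d \bar\gamma_{cd} cancel,
% so the total is zero and the equation implies \partial^b T_{ab} = 0.
```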
Equation (7.8.8′) can also be further simplified. In order to do this, we first review
a heuristic example. Maxwell’s equation $\partial^a F_{ab} = -4\pi J_b$ in Minkowski spacetime
can be expressed using the electromagnetic 4-potential $A_a$ as [see (6.6.30)]
$$\partial^a\partial_a A_b - \partial_b\partial^a A_a = -4\pi J_b . \tag{7.8.10}$$
The transformation
$$\tilde A_a = A_a + \partial_a\chi \tag{7.8.11}$$
is called a gauge transformation since $\tilde A_a$ and $A_a$ correspond to the same $F_{ab}$. One
can always choose $\chi$ so that the 4-potential satisfies the Lorenz gauge:
$$\partial^a A_a = 0 , \tag{7.8.12}$$
under which (7.8.10) simplifies to
$$\partial^a\partial_a A_b = -4\pi J_b . \tag{7.8.13}$$
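Why a suitable $\chi$ always exists can be made explicit in two lines. The sketch below assumes the convention $F_{ab} = \partial_a A_b - \partial_b A_a$, which is the one consistent with (7.8.10):

```latex
\begin{gather*}
\tilde F_{ab} = \partial_a\tilde A_b - \partial_b\tilde A_a
   = F_{ab} + (\partial_a\partial_b - \partial_b\partial_a)\chi = F_{ab} , \\
\partial^a\tilde A_a = \partial^a A_a + \partial^a\partial_a\chi = 0
   \quad\Longleftrightarrow\quad
   \partial^a\partial_a\chi = -\partial^a A_a .
\end{gather*}
% The second line is an inhomogeneous wave equation for \chi with a given source,
% so solutions exist; the new potential then satisfies the Lorenz gauge.
```

This is the pattern that the linearized theory of gravity repeats, with the scalar $\chi$ replaced by a vector field.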
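In the linearized theory of gravity, there exists a very similar gauge freedom. Suppose
$\xi^a$ is an infinitesimal vector field (“infinitesimal” means that the components $\xi^\mu$ of $\xi^a$
are small enough so that the product with $\gamma_{\alpha\beta}$ or itself can be regarded as second-order
terms and neglected); the following transformation of $\gamma_{ab}$:
$$\tilde\gamma_{ab} = \gamma_{ab} + \partial_a\xi_b + \partial_b\xi_a \tag{7.8.14}$$
is called a gauge transformation of the linearized theory of gravity. By a suitable choice of $\xi^a$
one can impose the condition
$$\partial^b\bar\gamma_{ab} = 0 , \tag{7.8.15}$$
called the Lorenz gauge condition of the linearized theory of gravity.17 From the
equation above we can see that the second and third terms on the left-hand side
of the linearized Einstein equation (7.8.8′) of this type of $\bar\gamma_{ab}$ vanish, and hence the
equation can be simplified as
$$\partial^c\partial_c\bar\gamma_{ab} = -16\pi T_{ab} , \tag{7.8.16}$$
which is very similar to (7.8.13)! Now we will show that (7.8.15) can always be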
satisfied by choosing $\xi^a$. Suppose that $\bar\gamma_{ab}$ does not satisfy (7.8.15). We want to
choose $\xi^a$ such that the $\tilde\gamma_{ab}$ determined by (7.8.14) has a corresponding
$$\bar{\tilde\gamma}_{ab} = \tilde\gamma_{ab} - \frac{1}{2}\eta_{ab}\tilde\gamma \qquad (\tilde\gamma \equiv \eta^{ab}\tilde\gamma_{ab})$$
that satisfies (7.8.15). A simple calculation starting from (7.8.14) shows that
$\partial^b\bar{\tilde\gamma}_{ab} = \partial^b\bar\gamma_{ab} + \partial^b\partial_b\xi_a$, and hence as long as we choose a $\xi^a$ satisfying
$$\partial^b\partial_b\xi_a = -\partial^b\bar\gamma_{ab} , \tag{7.8.17}$$
then $\partial^b\bar{\tilde\gamma}_{ab} = 0$ is guaranteed. A $\xi^a$ that satisfies (7.8.17) must exist, since the component
form of this equation in an inertial coordinate system is the following
familiar equation:
$$-\frac{\partial^2\xi_\mu}{\partial t^2} + \frac{\partial^2\xi_\mu}{\partial x^2} + \frac{\partial^2\xi_\mu}{\partial y^2} + \frac{\partial^2\xi_\mu}{\partial z^2} = -\partial^\nu\bar\gamma_{\mu\nu} .$$
When $\bar\gamma_{\mu\nu}$ is given, solutions to it not only exist but are also numerous.
17 Also called the de Donder gauge condition or harmonic gauge condition of the linearized theory
of gravity.
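The “simple calculation” behind $\partial^b\bar{\tilde\gamma}_{ab} = \partial^b\bar\gamma_{ab} + \partial^b\partial_b\xi_a$ can be sketched as follows. It assumes that the gauge transformation (7.8.14) has the form $\tilde\gamma_{ab} = \gamma_{ab} + \partial_a\xi_b + \partial_b\xi_a$, which is the abstract-index version of the component relation $\gamma'_{\mu\nu} = \gamma_{\mu\nu} + \xi_{\mu,\nu} + \xi_{\nu,\mu}$ appearing in Optional Reading 7.8.2:

```latex
\begin{gather*}
\tilde\gamma = \eta^{ab}\tilde\gamma_{ab} = \gamma + 2\,\partial^c\xi_c , \\
\bar{\tilde\gamma}_{ab} = \tilde\gamma_{ab} - \tfrac{1}{2}\eta_{ab}\tilde\gamma
   = \bar\gamma_{ab} + \partial_a\xi_b + \partial_b\xi_a - \eta_{ab}\,\partial^c\xi_c , \\
\partial^b\bar{\tilde\gamma}_{ab}
   = \partial^b\bar\gamma_{ab} + \partial_a\partial^b\xi_b + \partial^b\partial_b\xi_a
     - \partial_a\partial^c\xi_c
   = \partial^b\bar\gamma_{ab} + \partial^b\partial_b\xi_a .
\end{gather*}
% The two terms containing \partial_a (\partial^c \xi_c) cancel, leaving only the
% wave operator acting on \xi_a.
```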
[Optional Reading 7.8.1]
There is a subtlety in the derivation from (7.8.3) to (7.8.4) that we should specify. Take
the term g cd ∂a gbd as an example, it can be expressed as
Plugging the two equations above into G ab (s) = 8π Tab (s), and ignoring all the O(s 2 )
and higher order terms, what we obtain will be the linear (first-order) approximation of the
Einstein equation, namely (7.8.8). And the derivation from (7.8.3) to (7.8.4) is one of the
steps in this procedure. Since neither γ cd nor ∂a γbd in (7.8.18) contains a zeroth-order term
of s, γ cd ∂a γbd is at least a second-order term. Thus, this term can be neglected and we have
(7.8.4).
[The End of Optional Reading 7.8.1]
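[Optional Reading 7.8.2]
Consider the infinitesimal coordinate transformation
$$x'^\mu = x^\mu - \xi^\mu(x) , \tag{7.8.19}$$
(the $x$ in the parentheses is an abbreviation for $x^\sigma$) where $\xi^\mu(x)$ are four arbitrary infinitesimal
functions of the same order as $\gamma_{ab}$. [See Misner et al. (1973) pp. 439–440]. Consider the
coordinate components $g_{\rho\sigma} = \eta_{\rho\sigma} + \gamma_{\rho\sigma}$; under the above coordinate transformation the
tensor transformation law
$$g'_{\mu\nu}(x') = \frac{\partial x^\rho}{\partial x'^\mu}\frac{\partial x^\sigma}{\partial x'^\nu}\,g_{\rho\sigma}(x) \tag{7.8.20}$$
can be reduced to
$$g'_{\mu\nu}(x') = \left(\delta^\rho{}_\mu + \frac{\partial\xi^\rho}{\partial x^\mu}\right)\left(\delta^\sigma{}_\nu + \frac{\partial\xi^\sigma}{\partial x^\nu}\right) g_{\rho\sigma}(x)$$
$$\phantom{g'_{\mu\nu}(x')} = g_{\mu\nu}(x) + \frac{\partial\xi^\sigma}{\partial x^\nu}\,g_{\mu\sigma}(x) + \frac{\partial\xi^\rho}{\partial x^\mu}\,g_{\rho\nu}(x)$$
$$\phantom{g'_{\mu\nu}(x')} = \eta_{\mu\nu} + \gamma_{\mu\nu} + \frac{\partial\xi_\nu}{\partial x^\mu} + \frac{\partial\xi_\mu}{\partial x^\nu} , \quad (\text{where } \xi_\mu = \eta_{\mu\rho}\xi^\rho) ,$$
up to terms of higher order than $\gamma_{ab}$ and $\xi^a$. Define $\gamma'_{\mu\nu} = g'_{\mu\nu} - \eta_{\mu\nu}$. Then, up to higher
order terms,
$$\gamma'_{\mu\nu} = \gamma_{\mu\nu} + \xi_{\mu,\nu} + \xi_{\nu,\mu} .$$
On the other hand, $\gamma'_{\mu\nu}(x) = g'_{\mu\nu}(x') - \eta_{\mu\nu} = [g'_{\mu\nu}(x') - g_{\mu\nu}(x)] + [g_{\mu\nu}(x) - \eta_{\mu\nu}]$ turns
out to be
$$\gamma'_{\mu\nu}(x) = [g'_{\mu\nu}(x') - g_{\mu\nu}(x)] + \gamma_{\mu\nu} . \tag{7.8.21}$$
Hence, up to terms of higher order than $\gamma_{ab}$ and $\xi^a$, we have $g'_{\mu\nu}(x') - g_{\mu\nu}(x) = \xi_{\mu,\nu} + \xi_{\nu,\mu}$.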
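To see how the above coordinate description is related to the gauge transformation (7.8.14) in
the active language, we consider the one-parameter local group of diffeomorphisms generated
by a vector field $X^a$, denoted by $\phi_\lambda$ (see Optional Reading 2.2.2), with $\lambda$ as the parameter.
Here we choose $X^a$ such that the infinitesimal vector field $\xi^a$ in the gauge transformation
is $\xi^a = \varepsilon X^a$, where $\varepsilon$ is an infinitesimal number. For both $g_{ab}$ and $\tilde g_{ab}(\lambda) \equiv \phi^*_\lambda g_{ab}$ we can
split them as
$$g_{ab} = \eta_{ab} + \gamma_{ab} , \qquad \tilde g_{ab}(\lambda) = \eta_{ab} + \tilde\gamma_{ab}(\lambda) ,$$
and obtain that $\eta_{ab} + \tilde\gamma_{ab}(\lambda) = \tilde g_{ab}(\lambda) = \phi^*_\lambda g_{ab} = \phi^*_\lambda\eta_{ab} + \phi^*_\lambda\gamma_{ab}$, i.e.,
$$\tilde\gamma_{ab}(\lambda) = \phi^*_\lambda\gamma_{ab} + \phi^*_\lambda\eta_{ab} - \eta_{ab} .$$
When $\lambda$ is small, one can rewrite the above equation by means of Lie derivatives as
$$\tilde\gamma_{ab}(\lambda) = \gamma_{ab} + \lambda\mathcal{L}_X\gamma_{ab} + \lambda\mathcal{L}_X\eta_{ab} + O(\lambda^2) = \gamma_{ab} + \mathcal{L}_{\lambda X}\gamma_{ab} + \mathcal{L}_{\lambda X}\eta_{ab} + O(\lambda^2) ,$$
where the last step can be easily seen from (4.2.8). Ignoring the higher order terms, we have
$$\tilde\gamma_{ab} = \gamma_{ab} + \mathcal{L}_\xi\eta_{ab} = \gamma_{ab} + \partial_a\xi_b + \partial_b\xi_a ,$$
where we have set $\lambda = \varepsilon$, and (4.3.1′) is used in the last step. Therefore, the gauge
transformation (7.8.14) can be obtained from changing the metric $g_{ab}$ to $\tilde g_{ab}(\varepsilon)$ by a one-parameter
local group of diffeomorphisms, with the perturbation background $\eta_{ab}$ being unchanged.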
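Suppose $\lambda$ is so small that both the domain $U$ and the range $\phi_\lambda[U]$ of the diffeomorphism
$\phi_\lambda : U \to \phi_\lambda[U]$ are contained in the coordinate patch of $\{x^\mu\}$. Then the four functions
$y^\mu(\lambda) \equiv \phi^*_{-\lambda} x^\mu$ form a coordinate system $\{y^\mu(\lambda)\}$ on $\phi_\lambda[U]$; in particular, for $\lambda = \varepsilon$,
$$x'^\mu = y^\mu(\varepsilon) = \phi^*_{-\varepsilon} x^\mu = x^\mu - \varepsilon\mathcal{L}_X x^\mu = x^\mu - \mathcal{L}_\xi x^\mu = x^\mu - \xi^\mu ,$$
where in the last step we used (4.2.2) and (2.2.3′), and $\xi^\mu$ are the components of $\xi^a$ in $\{x^\mu\}$.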
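This is exactly the infinitesimal coordinate transformation (7.8.19). Noticing that
$g_{ab} = (\phi_\lambda)_*\tilde g_{ab}(\lambda)$, according to Theorem 4.1.3, $\forall q \in U$ the coordinate components of $g_{ab}|_{\phi_\lambda(q)}$
in $\{y^\mu(\lambda)\}$ equal the corresponding coordinate components of $\tilde g_{ab}(\lambda)|_q$ in $\{x^\mu\}$. Especially,
for $\lambda = \varepsilon$, this yields
$$\tilde g_{cd}(\varepsilon)|_q \left.\left(\frac{\partial}{\partial x^\mu}\right)^{c}\right|_q \left.\left(\frac{\partial}{\partial x^\nu}\right)^{d}\right|_q = g_{cd}|_{\phi_\varepsilon(q)} \left.\left(\frac{\partial}{\partial x'^\mu}\right)^{c}\right|_{\phi_\varepsilon(q)} \left.\left(\frac{\partial}{\partial x'^\nu}\right)^{d}\right|_{\phi_\varepsilon(q)} = g'_{\mu\nu}(x')|_{\phi_\varepsilon(q)} .$$
Since $g'_{\mu\nu}(x')|_{\phi_\varepsilon(q)} = g'_{\mu\nu}(x'|_{\phi_\varepsilon(q)}) = g'_{\mu\nu}(x|_q)$, it turns out that
$\tilde g_{ab}(\varepsilon) = g'_{\mu\nu}(x)\,(dx^\mu)_a (dx^\nu)_b$. Then, precisely to the order of $\varepsilon$, we have
$$\mathcal{L}_\xi g_{ab} = \varepsilon\mathcal{L}_X g_{ab} = \tilde g_{ab}(\varepsilon) - g_{ab} = [g'_{\mu\nu}(x) - g_{\mu\nu}(x)]\,(dx^\mu)_a (dx^\nu)_b .$$
This means that the coordinate components of $\mathcal{L}_\xi g_{ab}$ in $\{x^\mu\}$ are actually $g'_{\mu\nu}(x) - g_{\mu\nu}(x)$,
not the $g'_{\mu\nu}(x') - g_{\mu\nu}(x)$ in (7.8.21). However, their difference
$$[g'_{\mu\nu}(x') - g_{\mu\nu}(x)] - [g'_{\mu\nu}(x) - g_{\mu\nu}(x)] = g'_{\mu\nu}(x') - g'_{\mu\nu}(x) = -\xi^\rho\frac{\partial g'_{\mu\nu}}{\partial x^\rho} + O(\varepsilon^2) = -\xi^\rho\frac{\partial\gamma'_{\mu\nu}}{\partial x^\rho} + O(\varepsilon^2) = O(\varepsilon^2)$$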
is negligible on the order of ξ a . Therefore, the gauge transformation (7.8.14) and the infinites-
imal coordinate transformation (7.8.19) are equivalent up to terms of higher order than ξ a .
In fact, what we just saw is a special case of gauge transformations in general relativity. We
will come back to a more general discussion in Sect. 8.10.
[The End of Optional Reading 7.8.2]
In this subsection we will show that Newton's theory of gravity can be regarded as
the limit of general relativity under the weak-field and low-speed condition. First,
let us interpret the "weak-field and low-speed condition". Take the
gravitational field around the Earth as an example: it corresponds to a slightly curved
metric field $g_{ab} = \eta_{ab} + \gamma_{ab}$, where $\gamma_{ab}$ is "small". In Fig. 7.14, E and D represent,
respectively, the world lines of the Earth and of a shell shot from a cannon on the ground
(their relative speed $u_{ED} \ll 1$), and μ represents the world line of a "high speed"
muon from a cosmic ray. Its "high speed" is from the perspective of an observer
on the Earth; the muon regards itself as being at rest while E is moving at a high
speed. Either way, their relative speed is close to the speed of light ($u_{\mu E} \cong 1$). As a
flat metric field, $\eta_{ab}$ has many inertial coordinate systems, such as the inertial frame
$\{t, x^i\}$, which uses the world line of E as a $t$-coordinate line, and the inertial frame
$\{t', x'^i\}$, which uses the world line of μ as a $t'$-coordinate line; these two systems
differ by a boost. The 3-speeds of the Earth, the shell, cars, airplanes, etc.
relative to the system $\{t, x^i\}$ are all very small, while their 3-speeds relative
to $\{t', x'^i\}$ are large. The "weak-field and low-speed limit" should be interpreted as
follows: there exists an inertial coordinate system of $\eta_{ab}$ (in the example above it
is $\{t, x^i\}$) in which all the objects we are concerned with have coordinate speeds
much less than 1 (and thus one cannot use the Newtonian theory in $\{t, x^i\}$ to discuss
a problem that involves a muon), and $|\gamma_{\mu\nu}| \equiv |g_{\mu\nu} - \eta_{\mu\nu}| \ll 1$.
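To get a feeling for how small "small" is, the following sketch estimates the dimensionless Newtonian potential at the Earth's surface. It anticipates the identification $\gamma_{00} = -2\phi$ of (7.8.35); the numerical constants are standard rounded values and are not taken from the text, and the function name is introduced here only for illustration.

```python
# Rough size of the metric perturbation at the Earth's surface (illustrative values).
G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2 (rounded)
C = 2.998e8          # speed of light, m/s (rounded)
M_EARTH = 5.972e24   # mass of the Earth, kg (rounded)
R_EARTH = 6.371e6    # mean radius of the Earth, m (rounded)


def dimensionless_potential(mass, radius):
    """Newtonian potential -GM/r divided by c^2, i.e. phi in units with c = 1."""
    return -G * mass / (radius * C ** 2)


phi_surface = dimensionless_potential(M_EARTH, R_EARTH)
gamma_00 = -2.0 * phi_surface   # uses gamma_00 = -2 phi, cf. (7.8.35)

print(f"phi      ~ {phi_surface:.2e}")
print(f"gamma_00 ~ {gamma_00:.2e}")
```

Both numbers come out at the order of $10^{-9}$, which is the sense in which the Earth's field is "weak".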
Specifically speaking, the “weak-field and low-speed” condition guarantees that
there exists an ηab such that γab = gab − ηab is “small”, and there exists an inertial
coordinate system {t, x i } of ηab which satisfies:
(1) The energy-momentum tensor $T_{ab}$ of the source of the gravitational field can
be expressed in this system as
\[
T_{ab} \cong \rho\,(dt)_a (dt)_b\,. \tag{7.8.22}
\]
That is, only $T_{00}$, the time-time component of $T_{ab}$, is nonvanishing in this system.
The space-time components $T_{0i}$ vanish since the small velocity of the source leads
to a small momentum density; the vanishing of the space-space components $T_{ij}$ indicates
that, compared with the mass density, the 3-dimensional stress can be ignored (for
instance, the pressure $p$ at the Earth's center is only $10^{-10}$ times the density $\rho$). Thus,
although in general relativity each component of the energy-momentum tensor $T_{ab}$ of
the matter field contributes to the spacetime curvature, in Newton's theory of gravity
(as is well known) only the mass density $\rho$ contributes to the gravitational field.
(2) (a) The spacetime geometry changes slowly due to the low-speed motion of
the source, and hence $\partial\bar\gamma_{\mu\nu}/\partial t$ can be ignored; (b) the low-speed motion of an object
in the gravitational field implies that its 4-velocity $U^a$ is approximately equal
to the 4-velocity $Z^a \equiv (\partial/\partial t)^a$ of an observer in the $\{t, x^i\}$ system, i.e., $U^a \cong Z^a$.
The linearized Einstein equation under the Lorenz gauge condition can be sim-
plified under the above approximations:
where we used the approximation condition (2) in the third equality, and $\nabla^2$ is the
square of the derivative operator $\vec\nabla$ in the 3-dimensional coordinate system $\{x^i\}$. On
the other hand, from the approximation condition (1) we can see that the 00-component of
the right-hand side of (7.8.16) is $\cong -16\pi\rho$, while the other components
vanish, i.e.,
\[
\nabla^2\bar\gamma_{00} = -16\pi\rho\,, \tag{7.8.23}
\]
\[
\nabla^2\bar\gamma_{0i} = 0\,, \tag{7.8.24}
\]
\[
\nabla^2\bar\gamma_{ij} = 0\,. \tag{7.8.24$'$}
\]
The unique solutions $\bar\gamma_{0i}$ and $\bar\gamma_{ij}$ of equations (7.8.24) and (7.8.24′) that are well-behaved
at infinity are constants, which can be set to zero by means of a gauge
transformation. Thus, the only nonzero component of $\bar\gamma_{\mu\nu}$ is $\bar\gamma_{00}$, which satisfies
equation (7.8.23). Let
\[
\phi \equiv -\frac{1}{4}\bar\gamma_{00}\,, \tag{7.8.25}
\]
and interpret $\phi$ as the Newtonian gravitational potential; then equation (7.8.23)
becomes the well-known Poisson equation in Newton's theory of gravity:
\[
\nabla^2\phi = 4\pi\rho\,. \tag{7.8.26}
\]
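As a sanity check of the Poisson equation (7.8.26), the sketch below evaluates a finite-difference Laplacian of the interior potential of a uniform ball. The ball, its potential $\phi = \frac{2\pi\rho}{3}(r^2 - 3R^2)$ in units with $G = 1$, and all function names are assumptions chosen for illustration; they do not appear in the text.

```python
import math


def phi_interior(x, y, z, rho=1.0, radius=1.0):
    """Newtonian potential inside a uniform ball (G = 1), an assumed test case."""
    r2 = x * x + y * y + z * z
    return (2.0 * math.pi * rho / 3.0) * (r2 - 3.0 * radius ** 2)


def laplacian(f, x, y, z, step=1e-3):
    """Second-order central-difference estimate of the flat 3-dimensional Laplacian."""
    centre = f(x, y, z)
    total = (f(x + step, y, z) + f(x - step, y, z)
             + f(x, y + step, z) + f(x, y - step, z)
             + f(x, y, z + step) + f(x, y, z - step) - 6.0 * centre)
    return total / (step * step)


rho = 1.0
lhs = laplacian(phi_interior, 0.2, -0.1, 0.3)   # left-hand side of (7.8.26)
rhs = 4.0 * math.pi * rho                        # right-hand side of (7.8.26)
print(lhs, rhs)
```

The two printed numbers should agree up to rounding, since the central difference is exact for a quadratic potential.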
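The conclusion that the only nonzero component of $\bar\gamma_{\mu\nu}$ is $\bar\gamma_{00}$ can also be expressed
in terms of a tensor equation as
\[
\bar\gamma_{ab} = \bar\gamma_{00}\,(dt)_a (dt)_b = -4\phi\,(dt)_a (dt)_b\,. \tag{7.8.27}
\]
Hence,
\[
\bar\gamma \equiv \eta^{ab}\bar\gamma_{ab} = \bar\gamma_{00}\,\eta^{ab}(dt)_a (dt)_b = -\bar\gamma_{00} = 4\phi\,. \tag{7.8.28}
\]
Also, from $\gamma_{ab} = \bar\gamma_{ab} + \eta_{ab}\gamma/2$ we get $\gamma = \eta^{ab}\gamma_{ab} = \eta^{ab}\bar\gamma_{ab} + \eta^{ab}\eta_{ab}\gamma/2 = \bar\gamma + 2\gamma$, and thus $\gamma = -\bar\gamma$. Therefore,
\[
\gamma_{ab} = \bar\gamma_{ab} - \frac{1}{2}\eta_{ab}\bar\gamma\,. \tag{7.8.29}
\]
By means of (7.8.27) and (7.8.28), the above equation can be rewritten as
\[
\gamma_{ab} = -2\phi\,[\eta_{ab} + 2(dt)_a (dt)_b]\,. \tag{7.8.30}
\]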
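The trace reversal in (7.8.29) can be checked on components. The sketch below starts from a $\bar\gamma_{\mu\nu}$ whose only nonzero component is $\bar\gamma_{00} = -4\phi$, as follows from (7.8.25), and applies (7.8.29); the value of $\phi$ and the helper names are hypothetical and serve only the illustration.

```python
# Components eta_{mu nu} = diag(-1, 1, 1, 1); the inverse eta^{mu nu} has the same entries.
ETA = [[-1.0, 0.0, 0.0, 0.0],
       [0.0, 1.0, 0.0, 0.0],
       [0.0, 0.0, 1.0, 0.0],
       [0.0, 0.0, 0.0, 1.0]]


def trace(h):
    """eta^{mu nu} h_{mu nu} for a 4x4 component matrix h."""
    return sum(ETA[m][n] * h[m][n] for m in range(4) for n in range(4))


def trace_reverse(h):
    """h_{mu nu} - (1/2) eta_{mu nu} h, the map used in (7.8.29)."""
    t = trace(h)
    return [[h[m][n] - 0.5 * ETA[m][n] * t for n in range(4)] for m in range(4)]


phi = -0.125  # hypothetical value, chosen only so that the arithmetic is exact
gamma_bar = [[0.0] * 4 for _ in range(4)]
gamma_bar[0][0] = -4.0 * phi          # only nonzero component, from (7.8.25)

gamma = trace_reverse(gamma_bar)
for row in gamma:
    print(row)
```

Under these assumptions every diagonal component of $\gamma_{\mu\nu}$ equals $-2\phi$, so the spatial part of the metric is perturbed as much as the time-time part even though only $\bar\gamma_{00}$ is nonzero. Applying `trace_reverse` a second time returns `gamma_bar`, reflecting that trace reversal is an involution in four dimensions.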
Based on the discussion above, we can derive the equation of motion for a point mass
under the Newtonian approximation. Suppose there is no force acting on the point
mass other than gravity, then from the viewpoint of general relativity its world line
should be a geodesic, whose equation in the inertial coordinate system of ηab is
\[
\frac{d^2 x^\mu}{d\tau^2} + \Gamma^\mu{}_{\nu\sigma}\frac{dx^\nu}{d\tau}\frac{dx^\sigma}{d\tau} = 0\,, \tag{7.8.31}
\]
where $\tau$ is the proper time of the point mass. Under the Newtonian approximation, the
condition $U^a \cong Z^a$ satisfied by the 4-velocity $U^a$ of the point mass ensures that $\tau \cong t$
(the proper time is approximately equal to the coordinate time) and $u^i \equiv dx^i/dt \cong 0$
(the 3-velocity is approximately zero), and hence $U^\nu \equiv dx^\nu/d\tau$ is approximately
$(1, 0, 0, 0)$. Therefore, (7.8.31) can be expressed approximately as
\[
\frac{d^2 x^\mu}{dt^2} = -\Gamma^\mu{}_{00}\,. \tag{7.8.32}
\]
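It follows from (7.8.4) that [the superscript (1) of $\Gamma$ is omitted]
\[
\Gamma^0{}_{00} = \frac{1}{2}\eta^{00}(\gamma_{00,0} + \gamma_{00,0} - \gamma_{00,0}) = -\frac{1}{2}\frac{\partial\gamma_{00}}{\partial t} \cong 0\,,
\]
\[
\Gamma^i{}_{00} = \frac{1}{2}\eta^{ij}(\gamma_{j0,0} + \gamma_{0j,0} - \gamma_{00,j}) \cong -\frac{1}{2}\delta^{ij}\gamma_{00,j} = -\frac{1}{2}\frac{\partial\gamma_{00}}{\partial x^i}\,, \quad i = 1, 2, 3\,. \tag{7.8.33}
\]
Substituting (7.8.33) into (7.8.32) and using $\gamma_{00} = -2\phi$ [cf. (7.8.35)], we obtain
\[
\frac{d^2 x^i}{dt^2} = -\frac{\partial\phi}{\partial x^i}\,, \quad i = 1, 2, 3\,. \tag{7.8.34}
\]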
This is exactly the equation of motion for a point mass subject only to the gravitational
force in Newton's theory of gravity. Equations (7.8.26) and (7.8.34) are the
basic equations of Newton's theory of gravity; thus, Newton's theory of gravity can
be regarded as the weak-field and low-speed limit of general relativity. From
\[
\phi \equiv -\frac{1}{4}\bar\gamma_{00} = -\frac{1}{2}\gamma_{00} \tag{7.8.35}
\]
we have $g_{00} = \eta_{00} + \gamma_{00} = -(1 + 2\phi)$, or
\[
\phi = -\frac{1}{2}(1 + g_{00})\,. \tag{7.8.36}
\]
This reflects the close relation between the metric component g00 and the Newtonian
gravitational potential under the Newtonian approximation. Again, take the balls 1
and 2 in Fig. 7.10 as an example. Choose the inertial coordinate system {t, x, y, z}
of ηab such that the z-axis is vertically upwards, then Z a = (∂/∂t)a in (7.6.8).
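The passage from the geodesic equation (7.8.32) to Newton's equation of motion can be illustrated numerically. The sketch below assumes a point-mass potential $\phi = -M/r$ (in units with $G = c = 1$), builds $\gamma_{00} = -2\phi$ as in (7.8.35), estimates $\Gamma^i{}_{00}$ from (7.8.33) by central differences, and compares $-\Gamma^i{}_{00}$ with the Newtonian acceleration $-\vec\nabla\phi$. The potential, the sample point and the function names are illustrative assumptions.

```python
import math


def phi(x, y, z, mass=1.0):
    """Newtonian potential of a point mass (G = c = 1); an assumed example."""
    return -mass / math.sqrt(x * x + y * y + z * z)


def gamma_00(x, y, z):
    """Metric perturbation component gamma_00 = -2 phi, cf. (7.8.35)."""
    return -2.0 * phi(x, y, z)


def christoffel_i00(point, step=1e-5):
    """Gamma^i_00 = -(1/2) d(gamma_00)/dx^i from (7.8.33), by central differences."""
    result = []
    for i in range(3):
        forward = list(point)
        backward = list(point)
        forward[i] += step
        backward[i] -= step
        derivative = (gamma_00(*forward) - gamma_00(*backward)) / (2.0 * step)
        result.append(-0.5 * derivative)
    return result


def newtonian_acceleration(point, mass=1.0):
    """-grad(phi) for the point-mass potential above."""
    r = math.sqrt(sum(c * c for c in point))
    return [-mass * c / r ** 3 for c in point]


point = (3.0, 4.0, 0.0)
geodesic_acceleration = [-g for g in christoffel_i00(point)]   # (7.8.32)
print(geodesic_acceleration)
print(newtonian_acceleration(point))
```

The two lists should agree up to discretization error, which is the numerical counterpart of the statement that the geodesic equation reduces to Newton's equation of motion in this limit.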
The resemblance between the gravitational field and the electromagnetic field makes
people expect that there exists gravitational radiation in general relativity similar to
the electromagnetic radiation. Actually, the fact that there exists a wave solution
to Einstein’s equation which propagates at the speed of light was already well-
known soon after general relativity was published. Nevertheless, for quite a while the
authenticity of gravitational waves was in doubt. A. S. Eddington suggested in 1922
that a gravitational wave solution only represents the wave motion of the spacetime
coordinates, and thus has no observational effect. The situation has turned around
since the 1950s. Using a coordinate independent method, H. Bondi and collaborators
showed that gravitational waves indeed carry energy and momentum, and the mass of
the system must decrease when it emits gravitational waves. This led to the physical
authenticity and observability of gravitational radiation being gradually accepted.
First we will discuss the gravitational waves under the approximation of linearized
gravity. Before introducing the wave solutions to the linearized Einstein equation,
let us first discuss some useful gauge conditions in the linearized theory of gravity.
As we have seen in Sect. 7.8.1, the Lorenz gauge condition
\[
\partial^b\bar\gamma_{ab} = \partial^b\gamma_{ab} - \frac{1}{2}\partial_a\gamma = 0 \tag{7.9.1}
\]
in linearized gravity is inspired by the Lorenz gauge condition $\partial^a A_a = 0$ of the
electromagnetic field. However, in electrodynamics, $\partial^a A_a = 0$ and the wave equation
(with source) $\partial^b\partial_b A_a = -4\pi J_a$ cannot determine the 4-potential $A_a$ completely,
since another 4-potential
\[
A'_a = A_a + \partial_a\chi \tag{7.9.2}
\]
7.9 Gravitational Radiation 297
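\[
\partial^a\partial_a\chi = 0\,. \tag{7.9.3}
\]
\[
\frac{\partial\chi}{\partial t} = -A_0\,. \tag{7.9.4}
\]
\[
\frac{\partial}{\partial t}(\partial^a\partial_a\chi) = \partial^a\partial_a\frac{\partial\chi}{\partial t} = -\partial^a\partial_a A_0 = 0\,,
\]
\[
\nabla^2\chi_0 = \partial^a\partial_a\chi\,. \tag{7.9.5}
\]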
18 Here we only require ηab to be flat on U , and the same for Proposition 7.9.2 (see the first footnote
in Sect. 7.8). It follows from Theorem 3.4.9 that for any (locally) flat metric there exists a coordinate
system such that the metric components are constant. For Lorentzian signature, one can further find
a coordinate transformation and turn them into ημν .
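The situation of linearized gravity is very similar: the linearized Einstein equation
and the Lorenz gauge condition $\partial^a\bar\gamma_{ab} = 0$ cannot determine $\gamma_{ab}$ completely, since
if we set
\[
\gamma'_{ab} = \gamma_{ab} + \partial_a\xi_b + \partial_b\xi_a\,, \tag{7.9.6}
\]
then $\gamma'_{ab}$ also satisfies (7.8.16) and $\partial^a\bar\gamma'_{ab} = 0$ as long as $\xi_a$ satisfies
\[
\partial^b\partial_b\xi_a = 0\,. \tag{7.9.7}
\]
\[
\partial^c\partial_c\gamma = 0\,, \quad \partial^c\partial_c\gamma_{0\nu} = 0\,, \quad \nu = 0, 1, 2, 3\,. \tag{7.9.9}
\]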
Note that in the above proposition, $\gamma_{ab}$ is not necessarily a solution to the linearized
Einstein equation. Now we consider $\gamma_{ab}$ as a solution to the source-free linearized
Einstein equation in the Lorenz gauge; then (7.8.16) with $T_{ab} = 0$ reduces to
\[
\partial^c\partial_c\gamma_{ab} - \frac{1}{2}\eta_{ab}\,\partial^c\partial_c\gamma = 0\,. \tag{7.9.10}
\]
Contracting both sides of the above equation with $\eta^{ab}$ yields $\partial^c\partial_c\gamma = 0$, and (7.9.10)
becomes
\[
\partial^c\partial_c\gamma_{ab} = 0\,. \tag{7.9.11}
\]
In this case, the conditions in (7.9.9) are both satisfied. In fact, (7.9.10) and (7.9.11)
are equivalent to each other: (7.9.10) implies (7.9.11) as just shown, and conversely
contracting (7.9.11) with $\eta^{ab}$ gives $\partial^c\partial_c\gamma = 0$, so that (7.9.10) holds as well.
From the Lorenz gauge condition (7.9.1) we also see that $\partial^a\partial^b\gamma_{ab} = \frac{1}{2}\partial^a\partial_a\gamma$,
and hence $\partial^c\partial_c\gamma = 0$ also leads to
\[
\partial^a\partial^b\gamma_{ab} = 0\,. \tag{7.9.12}
\]
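For readers who want the contraction step spelled out, the short computation below shows how (7.9.10) forces $\partial^c\partial_c\gamma = 0$. It assumes only that the components of $\eta_{ab}$ are constant, so that $\eta^{ab}$ commutes with $\partial_c$, and that $\eta^{ab}\eta_{ab} = 4$ in four dimensions.

```latex
\begin{align*}
\eta^{ab}\Bigl(\partial^c\partial_c\gamma_{ab} - \tfrac12\,\eta_{ab}\,\partial^c\partial_c\gamma\Bigr)
  &= \partial^c\partial_c\gamma - \tfrac12\,(\eta^{ab}\eta_{ab})\,\partial^c\partial_c\gamma \\
  &= \partial^c\partial_c\gamma - 2\,\partial^c\partial_c\gamma
   = -\,\partial^c\partial_c\gamma = 0 .
\end{align*}
```

Substituting $\partial^c\partial_c\gamma = 0$ back into (7.9.10) removes the trace term and leaves (7.9.11).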
Corollary 7.9.3 Suppose a smooth symmetric tensor field $\gamma_{ab}$ is a solution of the
source-free linearized Einstein equation satisfying the Lorenz gauge condition. Then,
for each point $p$ in the domain $U$ of $\gamma_{ab}$, there exists $\gamma'_{ab} = \gamma_{ab} + \partial_a\xi_b + \partial_b\xi_a$ in an
open neighborhood $U' \subset U$ of $p$, which is a solution of the source-free linearized
Einstein equation satisfying the transverse-traceless gauge condition.
Now let us count the degrees of freedom of $\gamma_{ab}$ in the TT gauge. It follows from
$\gamma_{\mu\nu} = \gamma_{\nu\mu}$ that $\gamma_{ab}$ has at most 10 independent components, which are further
constrained by (7.9.8). The conditions in (7.9.8) contain in total 4 + 4 + 1 = 9 equations,
but $\partial^\nu\gamma_{0\nu} = 0$ is also a consequence of $\gamma_{0\nu} = 0$, and so among these 9 equations only
8 are independent. Therefore, $\gamma_{ab}$ has only 10 − 8 = 2 independent components.$^{19}$
Later we will see that in physics they correspond to the two independent polarization
states (modes) of gravitational plane waves; see Sect. 7.9.2.
For the linearized Einstein equation (not necessarily source-free), there are also
some other common gauge conditions, such as the transverse gauge condition,
which requires
\[
\partial_i\gamma^{0i} = 0\,, \quad \partial_i s^{ij} = 0 \quad \Bigl(\text{where } s_{ij} = \gamma_{ij} - \frac{1}{3}\delta^{kl}\gamma_{kl}\,\delta_{ij}\Bigr)\,, \quad i, j = 1, 2, 3\,, \tag{7.9.13}
\]
\[
\gamma_{0\mu} = 0\,, \quad \mu = 0, 1, 2, 3\,. \tag{7.9.14}
\]
The reader may refer to Carroll (2019) for more discussions about these gauge
conditions.
19 Note that this is a handwaving discussion, since constraint counting is actually very subtle
when it comes to partial differential equations. For example, the second equation in (7.9.15) can be
regarded as a constraint for $\xi_0$ in the first equation, but this does not mean that $\xi_0$ has no degree of
freedom! For another example, the 1-dimensional wave equation $\partial_t^2 u - c^2\partial_x^2 u = 0$ has the general
solution $u = f_+(x - ct) + f_-(x + ct)$, with $f_\pm$ being arbitrary $C^2$ functions of one variable. If
the wave equation is considered to be a constraint, is the number of constraints 1 or −1?
[Optional Reading 7.9.1]
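Proof of Proposition 7.9.2 (1) According to Proposition 7.9.1, there exists a function $\xi_0$ on
$U$ such that $\forall p \in U$,
\[
\partial_c\partial^c\xi_0 = 0\,, \quad \frac{\partial\xi_0}{\partial t} = -\frac{1}{2}\gamma_{00} \tag{7.9.15}
\]
are both satisfied on a neighborhood $U_0$ of $p$.
(2) For each of $i = 1, 2, 3$, there is obviously a smooth function $\xi_i$ on $U$ satisfying
\[
\frac{\partial\xi_i}{\partial t} = -\gamma_{0i} - \frac{\partial\xi_0}{\partial x^i}\,. \tag{7.9.16}
\]
Then, using (7.9.16) and the second equation of (7.9.15) we can derive that
\[
\partial_c\partial^c\xi_i = \frac{\partial\gamma_{0i}}{\partial t} - \frac{1}{2}\frac{\partial\gamma_{00}}{\partial x^i} + \nabla^2\xi_i\,. \tag{7.9.17}
\]
Moreover,
\[
\frac{\partial}{\partial t}\,\partial_c\partial^c\xi_i = \partial_c\partial^c\frac{\partial\xi_i}{\partial t} = -\partial_c\partial^c\gamma_{0i} - \frac{\partial}{\partial x^i}\,\partial_c\partial^c\xi_0 = 0\,,
\]
where (7.9.9) and the first equation in (7.9.15) are used in the last step. Thus, the right-hand side
of (7.9.17) is independent of $t$ on $U_0$, and hence there exist smooth functions $X_i$ ($i = 1, 2, 3$)
that satisfy on $U_0$
\[
\frac{\partial X_i}{\partial t} = 0\,, \quad \nabla^2 X_i = -\frac{\partial\gamma_{0i}}{\partial t} + \frac{1}{2}\frac{\partial\gamma_{00}}{\partial x^i} - \nabla^2\xi_i\,. \tag{7.9.18}
\]
Combining (7.9.16), (7.9.17) and (7.9.18), we can see that each function $\xi_i + X_i$ on $U_0$
satisfies
\[
\frac{\partial(\xi_i + X_i)}{\partial t} + \frac{\partial\xi_0}{\partial x^i} = -\gamma_{0i}\,, \quad \partial_c\partial^c(\xi_i + X_i) = 0\,. \tag{7.9.19}
\]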
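(3) For convenience, denote $\vec\xi \equiv (\xi_1, \xi_2, \xi_3)$ and $\vec X \equiv (X_1, X_2, X_3)$. For example, the notation
$\vec\nabla\cdot\vec\xi$ can be regarded as an abbreviation of $\delta^{ij}\,\partial\xi_i/\partial x^j$. From the first equation in (7.9.19),
we obtain on $U_0$
\[
\vec\nabla\cdot\frac{\partial\vec\xi}{\partial t} + \vec\nabla\cdot\frac{\partial\vec X}{\partial t} + \nabla^2\xi_0 = -\delta^{ij}\frac{\partial\gamma_{0j}}{\partial x^i}\,. \tag{7.9.20}
\]
Then, one can find on $U_0$ that
\[
\frac{\partial}{\partial t}\Bigl[-\frac{1}{2}(\gamma_{00} + \gamma) - \vec\nabla\cdot\vec\xi - \vec\nabla\cdot\vec X\Bigr] = 0\,.
\]
[The reader should complete the proof. Hint: use (7.9.20), (7.9.1) and (7.9.15).] Thus,
$-\frac{1}{2}(\gamma_{00} + \gamma) - \vec\nabla\cdot\vec\xi - \vec\nabla\cdot\vec X$ is independent of $t$ when restricted to $U_0$. This allows us to
find a function $\phi$ defined on an open neighborhood $U_\phi \subset U_0$ of $p$ such that
\[
\frac{\partial\phi}{\partial t} = 0\,, \quad \nabla^2\phi = \frac{1}{2}(\gamma_{00} + \gamma) + \vec\nabla\cdot\vec\xi + \vec\nabla\cdot\vec X\,. \tag{7.9.21}
\]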
(4) Applying $\nabla^2$ to both sides of the second equation in (7.9.21), one finds on $U_\phi$ that
\[
\nabla^2\nabla^2\phi = 0\,. \tag{7.9.22}
\]
[The reader should complete the proof. Hint: use (7.9.18), (7.9.1) and (7.9.9).] This is
equivalent to saying that $\vec\nabla\cdot\vec\nabla\nabla^2\phi = 0$, namely that the 3-vector field $\vec\nabla\nabla^2\phi$ is divergence-free.
Thus, there exists an open neighborhood $U'_\phi \subset U_\phi$ of $p$ diffeomorphic to $\mathbb{R}^4$ such that
$\vec\nabla\nabla^2\phi = \vec\nabla\times\vec Y$ is satisfied on $U'_\phi$ for some 3-vector field $\vec Y$ defined on $U'_\phi$. Since $\phi$ does
not depend on $t$ when restricted to $U_\phi$, we can require that $\vec Y$ does not depend on $t$ on $U'_\phi$.
Thus, there exists a 3-vector field $\vec X'$, independent of $t$, such that $\nabla^2\vec X' = \vec Y$ is
satisfied on an open neighborhood $U' \subset U'_\phi$ of $p$. Then, we have on $U'$ that
\[
\nabla^2(\vec\nabla\times\vec X' - \vec\nabla\phi) = 0\,. \tag{7.9.23}
\]
(5) So far we have introduced a series of functions and 3-vector fields, whose domains
are open neighborhoods of $p$. Since we do not care about their behavior outside these
neighborhoods, they can be extended arbitrarily to smooth functions or 3-vector fields on $U$.
Thus, from now on, all functions and 3-vector fields concerned are defined on $U$, while the
equations they satisfy are valid on $U' \subset U$.
Now we define a 3-vector field $\vec\xi' = (\xi'_1, \xi'_2, \xi'_3)$ on $U$ as follows:
\[
\vec\xi' = \vec\xi + \vec X - \vec\nabla\phi + \vec\nabla\times\vec X'\,. \tag{7.9.24}
\]
When restricted to $U' \subset U'_\phi \subset U_\phi \subset U_0 \subset U$, both $\phi$ and $\vec X'$ are independent of $t$, and so
(7.9.19) gives
\[
\frac{\partial\xi'_i}{\partial t} + \frac{\partial\xi_0}{\partial x^i} = -\gamma_{0i}\,, \tag{7.9.25}
\]
\[
\partial_c\partial^c\vec\xi' = \partial_c\partial^c(\vec\nabla\times\vec X' - \vec\nabla\phi) = \nabla^2(\vec\nabla\times\vec X' - \vec\nabla\phi) = 0\,, \tag{7.9.26}
\]
where (7.9.23) is used in the last step of (7.9.26). Using the second equation in (7.9.21), we
have
\[
\vec\nabla\cdot\vec\xi' = \vec\nabla\cdot\vec\xi + \vec\nabla\cdot\vec X - \nabla^2\phi = -\frac{1}{2}(\gamma_{00} + \gamma)\,.
\]
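Together with the second equation in (7.9.15), and writing $\xi'_0 \equiv \xi_0$, this gives
\[
\partial_a\xi'^a = \eta^{\mu\nu}\frac{\partial\xi'_\nu}{\partial x^\mu} = -\frac{\partial\xi_0}{\partial t} + \vec\nabla\cdot\vec\xi' = -\frac{1}{2}\gamma\,. \tag{7.9.27}
\]
Similarly, the wave equations (7.9.15) and (7.9.26) for $\xi_0$ and $\vec\xi'$ can be combined into
\[
\partial_c\partial^c\xi'_a = 0\,. \tag{7.9.28}
\]
Finally, the second equation in (7.9.15) can be combined with (7.9.25) into
\[
\frac{\partial\xi'_\nu}{\partial t} + \frac{\partial\xi'_0}{\partial x^\nu} = -\gamma_{0\nu}\,. \tag{7.9.29}
\]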
(6) Now let us consider the tensor field $\gamma'_{ab} = \gamma_{ab} + \partial_a\xi'_b + \partial_b\xi'_a$ on $U$. It follows that
$\gamma' = \eta^{ab}\gamma'_{ab} = \gamma + 2\,\partial_a\xi'^a$. From now on the equations will be restricted to $U'$. First, we
can see that $\gamma' = 0$ due to (7.9.27). From (7.9.27) and (7.9.28) we obtain that $\partial^b\gamma'_{ab} =
\partial^b\gamma_{ab} - \frac{1}{2}\partial_a\gamma = 0$, i.e., $\gamma'_{ab}$ satisfies the Lorenz gauge condition. Also, it follows from
(7.9.29) that
\[
\gamma'_{0\nu} = \gamma_{0\nu} + \frac{\partial\xi'_\nu}{\partial t} + \frac{\partial\xi'_0}{\partial x^\nu} = 0\,, \quad \nu = 0, 1, 2, 3\,.
\]
Having these, we have proved the existence of a gauge transformation such that $\gamma'_{ab}$ satisfies
the TT gauge condition on $U'$.
[The End of Optional Reading 7.9.1]
The source-free linearized Einstein equation is a good description for the gravitational
waves emitted by a source far away from an observer. In Sect. 7.8, we have seen
that under a gauge transformation γab satisfies the Lorenz gauge condition. Then,
according to Corollary 7.9.3, a further gauge transformation can make it satisfy the
transverse-traceless (TT) gauge condition at least in an open neighborhood of the
observer. Under the TT gauge condition, now we will investigate wave solutions of
the source-free linearized Einstein equation.
In the TT gauge, γab satisfies (7.9.8). The traceless condition reduces the Lorenz
gauge condition to ∂ b γab = 0, and the source-free linearized Einstein equation
becomes (7.9.11). From now on, all the equations in this subsection are valid on
an open neighborhood U of the observer’s world line, on which a flat Lorentzian
metric ηab is defined, whose components in a coordinate system {x μ } are ημν . Then,
the ordinary derivative operator ∂a of the coordinate system {x μ } is the derivative
operator associated with ηab .
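As an ansatz, let us consider a solution to (7.9.11) of the following form:
\[
\gamma_{ab} = f(K_\mu x^\mu)\,H_{ab}\,, \tag{7.9.30}
\]
where $f$ is a function of one variable, and $K^a$ and $H_{ab}$ satisfy
\[
\partial_b K^a = 0\,, \quad \partial_c H_{ab} = 0\,. \tag{7.9.31}
\]
In other words, all the components of $K^a$ and $H_{ab}$ in $\{x^\mu\}$ are constants. Note that
$\gamma_{ab}$ in (7.9.30) remains unchanged if we replace $f$ by $Cf$ and $H_{ab}$ by $H_{ab}/C$ for any
nonzero constant $C$. Hence, if the range of $f$ is bounded, we can assume that
$-1 \leq f \leq 1$ with $|f(K_\mu x^\mu)| = 1$ at some spacetime point. In this way, $H_{ab}$ represents the
amplitude of the wave solution (7.9.30), called the polarization tensor, and $K^a$ will
be the wave 4-vector for a gravitational wave.
Noticing that
\[
\partial_c(K_\mu x^\mu) = K_\mu\partial_c x^\mu = K_\mu(dx^\mu)_c = K_c\,, \tag{7.9.32}
\]
we have
\[
\partial_c\gamma_{ab} = K_c H_{ab}\,f'(K_\mu x^\mu)\,, \quad \partial_c\partial_d\gamma_{ab} = K_c K_d H_{ab}\,f''(K_\mu x^\mu)\,, \tag{7.9.33}
\]
where $f'$ and $f''$ are the first and the second order derivatives of $f$, respectively.
Hence, to obtain a solution that is nonzero and non-constant, we should consider
$f \neq 0$, $f' \neq 0$ and $H_{ab} \neq 0$ (meaning that they are not identically zero, though they may
have zeros). Then, the TT gauge condition is now equivalent to
\[
H_{ab}K^b = 0\,, \quad \eta^{ab}H_{ab} = 0\,, \quad H_{0\nu} = 0\,, \tag{7.9.34}
\]
while the source-free linearized Einstein equation (7.9.11) becomes
\[
K^c K_c\,H_{ab}\,f''(K_\mu x^\mu) = 0\,. \tag{7.9.35}
\]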
Since $\partial_c\partial_d\gamma_{ab} = 0$, it follows from (7.8.5) that the first-order Riemann curvature of
$g_{ab} = \eta_{ab} + \gamma_{ab}$ vanishes, i.e., $g_{ab}$ is a flat metric in the linear (first-order) approximation.
Then, there exists a coordinate system $\{x'^\mu\}$ such that $g_{ab} = \eta_{\mu\nu}(dx'^\mu)_a (dx'^\nu)_b$
(see the first footnote in Sect. 7.9.1) in the linear approximation. (Note that the above
coordinate transformation does not correspond to a gauge transformation described
in Optional Reading 7.8.1.) Therefore, a solution of the form (7.9.36) turns out to be
a trivial solution at least in the linear approximation, and hence it is not regarded as
having any physical effect.
From now on we will assume $f'' \neq 0$. In this case, since $H_{ab} \neq 0$, (7.9.35) implies
\[
K^a K_a = 0\,. \tag{7.9.37}
\]
where (7.9.34) is used in the third equality. This indicates that in the linear approx-
imation, a wavefront S is a null surface with respect to either ηab or gab . In
other words, gravitational waves described by (7.9.30) propagate at the speed of
light in vacuum just in a way similar to electromagnetic waves in Sect. 6.6.5, see
Fig. 7.15. (gab = ηab + γab and its curvature R a bcd correspond to the electromagnetic
4-potential Aa and the electromagnetic field Fab , respectively).
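The role of the null condition (7.9.37) can be illustrated numerically: a profile $f(K_\mu x^\mu)$ with phase $-\omega t + kz$ satisfies the flat wave equation only if $\omega^2 = k^2$. The sketch below takes $f = \sin$ as an illustrative choice and estimates $(-\partial_t^2 + \partial_z^2)f$ by central differences; the sample point, step size and function names are assumptions made for the example.

```python
import math


def profile(t, z, omega, k):
    """f(K_mu x^mu) with f = sin and phase -omega t + k z (an illustrative choice)."""
    return math.sin(-omega * t + k * z)


def dalembertian(t, z, omega, k, step=1e-3):
    """Central-difference estimate of (-d^2/dt^2 + d^2/dz^2) applied to the profile."""
    centre = profile(t, z, omega, k)
    d2_dt2 = (profile(t + step, z, omega, k) - 2.0 * centre
              + profile(t - step, z, omega, k)) / step ** 2
    d2_dz2 = (profile(t, z + step, omega, k) - 2.0 * centre
              + profile(t, z - step, omega, k)) / step ** 2
    return -d2_dt2 + d2_dz2


null_case = dalembertian(0.3, 0.7, omega=2.0, k=2.0)       # K^a K_a = -omega^2 + k^2 = 0
non_null_case = dalembertian(0.3, 0.7, omega=2.0, k=1.0)   # K^a K_a != 0
print(null_case, non_null_case)
```

In the null case the result vanishes up to rounding, while in the non-null case it approximates $(\omega^2 - k^2)\sin(-\omega t + kz)$, which is nonzero at the chosen point.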
To demonstrate an important property of this plane wave solution, let us define
$\tilde K^a = g^{ab}K_b$ (note that in linearized gravity we stipulate that $K^a = \eta^{ab}K_b$). In the
linear approximation we have $g^{ab} = \eta^{ab} - f(K_\mu x^\mu)H^{ab}$, and thus up to higher order
terms,
\[
\tilde K^a = \eta^{ab}K_b - f(K_\mu x^\mu)H^{ab}K_b = K^a\,,
\]
where (7.9.34) is used in the second equality. By means of the linearity and the
Leibniz rule of $\mathcal{L}_{\tilde K}$, the Lie derivative of $g_{ab} = \eta_{ab} + f(K_\mu x^\mu)H_{ab}$ can be written
as
Using (4.2.8) and setting the $\nabla_a$ therein to the ordinary derivative $\partial_a$ in $\{x^\mu\}$, we have
where (7.9.31) is used in the last step. Similarly one finds $\mathcal{L}_K\eta_{ab} = 0$. Then,
\[
\nabla_a K_b = 0\,, \quad \text{i.e.,} \quad \nabla_a\tilde K^b = 0\,. \tag{7.9.39}
\]
This indicates that the rays of these gravitational plane waves are parallel to each
other. Therefore, they are called plane-fronted gravitational waves with parallel
rays, or pp-waves for short.20 Generally speaking, any spacetime that admits a
nonzero null vector field K̃ a satisfying ∇a K̃ b = 0 is called a pp-wave, see Stephani
et al. (2003).
Now we will find this wave solution explicitly. Since $K^a$ is nonzero, in a Lorentzian
coordinate system $\{x^\mu\}$ (with $t \equiv x^0$) it can be decomposed as
\[
K^a = \omega(\partial/\partial t)^a + k^a\,, \tag{7.9.40}
\]
where $\omega$ and $k^a$ can be interpreted as the angular frequency and the wave 3-vector,
respectively. Also, $K^a$ being null indicates that $\omega^2 = k^a k_a \equiv k^2$. One can further
choose $\{x^\mu\}$ such that $k^a$ is in the $z$-direction ($z \equiv x^3$), i.e., the wavefront at each
time $t$ is a constant-$z$ plane (the phase $K_\mu x^\mu = -\omega t + kz$ at $t$ is only a function of
$z$). Then, $K^a$ can be expressed as
\[
K^a = \omega(\partial/\partial t)^a + k(\partial/\partial z)^a\,. \tag{7.9.41}
\]
Thus, among the components $H_{\mu\nu}$ of $H_{ab}$, the nonvanishing ones can only be $H_{11} =
-H_{22}$ and $H_{12} = H_{21}$. Therefore, $H_{ab}$ can be written as
\[
H_{ab} = H_{11}H^{(+)}_{ab} + H_{12}H^{(\times)}_{ab}\,, \tag{7.9.43}
\]
20 Notice that gab = ηab + f (K μ x μ )Hab is a pp-wave only in the linear approximation.
where
\[
H^{(+)}_{ab} = (dx^1)_a (dx^1)_b - (dx^2)_a (dx^2)_b\,, \tag{7.9.44}
\]
\[
H^{(\times)}_{ab} = (dx^1)_a (dx^2)_b + (dx^2)_a (dx^1)_b\,. \tag{7.9.45}
\]
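The two basis tensors (7.9.44) and (7.9.45) can be checked against the algebraic content of the TT gauge. The sketch below works with components in the coordinate order $(t, x^1, x^2, z)$ and takes $K^a$ along $z$ with $k = \omega$; the numerical value of $\omega$, the helper name and the counterexample tensor are assumptions made for illustration.

```python
OMEGA = 1.0  # illustrative angular frequency; k = omega for a null K^a along z

# Coordinate order (t, x^1, x^2, z).
ETA_INV = [[-1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
K_UP = [OMEGA, 0, 0, OMEGA]          # K^a = omega[(d/dt)^a + (d/dz)^a]

H_PLUS = [[0, 0, 0, 0], [0, 1, 0, 0], [0, 0, -1, 0], [0, 0, 0, 0]]    # (7.9.44)
H_CROSS = [[0, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 0]]    # (7.9.45)
H_LONGITUDINAL = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 1]]  # counterexample


def is_transverse_traceless(h):
    """Check H_ab K^b = 0, eta^ab H_ab = 0 and H_0nu = 0 for a component matrix h."""
    transverse = all(sum(h[m][n] * K_UP[n] for n in range(4)) == 0 for m in range(4))
    traceless = sum(ETA_INV[m][n] * h[m][n] for m in range(4) for n in range(4)) == 0
    no_time_components = all(h[0][n] == 0 for n in range(4))
    return transverse and traceless and no_time_components


print(is_transverse_traceless(H_PLUS), is_transverse_traceless(H_CROSS))
print(is_transverse_traceless(H_LONGITUDINAL))
```

A tensor with a component along the propagation direction, such as the hypothetical `H_LONGITUDINAL`, fails the check, which illustrates why only the $x^1$-$x^2$ block survives.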
In (7.9.43), $H_{11}$ and $H_{12}$ are arbitrary real numbers, corresponding to the two degrees
of freedom found by the counting in Sect. 7.9.1. Correspondingly, if we define
\[
\gamma^{(+)}_{ab} = f(K_\mu x^\mu)H^{(+)}_{ab} = f(-\omega t + kz)\,[(dx^1)_a (dx^1)_b - (dx^2)_a (dx^2)_b]\,, \tag{7.9.46}
\]
\[
\gamma^{(\times)}_{ab} = f(K_\mu x^\mu)H^{(\times)}_{ab} = f(-\omega t + kz)\,[(dx^1)_a (dx^2)_b + (dx^2)_a (dx^1)_b]\,, \tag{7.9.47}
\]
Plugging the above solution into (7.8.5) yields the linearized Riemann curvature
tensor
\[
R^{(1)}_{acbd} = (K_d K_{[a}H_{c]b} - K_b K_{[a}H_{c]d})\,f''(K_\mu x^\mu)\,.
\]
To verify that the curvature is indeed nonzero, we can decompose it into two terms:
\[
R^{(1)}_{acbd} = H_{11}R^{(1)(+)}_{acbd} + H_{12}R^{(1)(\times)}_{acbd}\,, \tag{7.9.49}
\]
where
\[
R^{(1)(+)}_{acbd} = \bigl(K_d K_{[a}H^{(+)}_{c]b} - K_b K_{[a}H^{(+)}_{c]d}\bigr)\,f''(K_\mu x^\mu)\,, \tag{7.9.50}
\]
\[
R^{(1)(\times)}_{acbd} = \bigl(K_d K_{[a}H^{(\times)}_{c]b} - K_b K_{[a}H^{(\times)}_{c]d}\bigr)\,f''(K_\mu x^\mu)\,. \tag{7.9.51}
\]
It is easy to see that $R^{(1)(+)}_{acbd} \neq 0$ since, for example,
\[
R^{(1)(+)}_{acbd}\left(\frac{\partial}{\partial x^1}\right)^{d} = K_b\,[K_c (dx^1)_a - K_a (dx^1)_c]\,\frac{f''(-\omega t + kz)}{2}\,,
\]
have an arbitrary polarized mode with $H_{ab} = \alpha H^{(+)}_{ab} + \beta H^{(\times)}_{ab}$ satisfying $\alpha^2 + \beta^2 =
1$. All these polarization modes are on an equal footing. In fact, if we set
\[
x'^0 = x^0 = t\,, \quad x'^1 = \frac{1}{\sqrt{2}}(x^1 + x^2)\,, \quad x'^2 = \frac{1}{\sqrt{2}}(-x^1 + x^2)\,, \quad x'^3 = x^3 = z\,, \tag{7.9.52}
\]
Thus, in the new coordinate system $\{x'^\mu\}$, the gravitational waves $\gamma^{(+)}_{ab}$ and $\gamma^{(\times)}_{ab}$ are now
cross-polarized and plus-polarized, respectively, which shows that the plus-polarized
and cross-polarized modes are equivalent up to a choice of the Lorentzian coordinate
system.
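The exchange of the two modes under the rotation (7.9.52) can be verified on components. The sketch below restricts attention to the $(x^1, x^2)$ block, where the transformation is an orthogonal matrix $R$, so that the components in the primed system are $H'_{\mu\nu} = R_{\mu\alpha}R_{\nu\beta}H_{\alpha\beta}$; the helper name is introduced only for this example.

```python
import math

S = 1.0 / math.sqrt(2.0)
# Rows of R give the new coordinates (7.9.52): x'^1 = S(x^1 + x^2), x'^2 = S(-x^1 + x^2).
R = [[S, S], [-S, S]]

H_PLUS = [[1.0, 0.0], [0.0, -1.0]]    # (x^1, x^2) block of (7.9.44)
H_CROSS = [[0.0, 1.0], [1.0, 0.0]]    # (x^1, x^2) block of (7.9.45)


def to_primed_components(h):
    """Components in {x'}: H'_{mu nu} = R_{mu alpha} R_{nu beta} H_{alpha beta}."""
    return [[sum(R[m][a] * R[n][b] * h[a][b] for a in range(2) for b in range(2))
             for n in range(2)] for m in range(2)]


print(to_primed_components(H_PLUS))
print(to_primed_components(H_CROSS))
```

Up to rounding, the plus tensor acquires the components of minus the cross tensor in the primed system, and the cross tensor acquires those of the plus tensor, in line with the statement that the two modes differ only by a 45° rotation of the transverse axes.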
[Optional Reading 7.9.2]
To see more precisely that all the polarization modes of a gravitational wave are on an
equal footing, we define the following vector fields:
\[
(e_0)^a = (\partial/\partial t)^a\,, \quad (e^{(+)}_1)^a = (\partial/\partial x^1)^a\,, \quad (e^{(+)}_2)^a = (\partial/\partial x^2)^a\,,
\]
\[
(e^{(\times)}_1)^a = (\partial/\partial x'^1)^a = \frac{1}{\sqrt{2}}\bigl[(e^{(+)}_1)^a + (e^{(+)}_2)^a\bigr]\,, \quad (e^{(\times)}_2)^a = (\partial/\partial x'^2)^a = \frac{1}{\sqrt{2}}\bigl[-(e^{(+)}_1)^a + (e^{(+)}_2)^a\bigr]\,,
\]
We can see that ① $(e_0)^a$, $K^a$ and their linear combinations, such as $(e_3)^a = \frac{1}{\omega}K^a - (e_0)^a$,
are all eigenvectors of both $(H^{(+)})^a{}_b$ and $(H^{(\times)})^a{}_b$ with eigenvalue 0; ② $(e^{(+)}_1)^a$ and $(e^{(+)}_2)^a$
are eigenvectors of $(H^{(+)})^a{}_b$ with eigenvalues ±1, respectively; ③ $(e^{(\times)}_1)^a$ and $(e^{(\times)}_2)^a$ are
eigenvectors of $(H^{(\times)})^a{}_b$ with eigenvalues ±1, respectively.
For the polarization tensor $H_{ab}$ expressed in (7.9.43), we can set $H \equiv \sqrt{(H_{11})^2 + (H_{12})^2}$
and $0 \leq \psi < \pi$ such that $H_{11} = H\cos 2\psi$ and $H_{12} = H\sin 2\psi$. Then, (7.9.43) can be
written as
\[
H_{ab} = H\,H^{(\psi)}_{ab}\,, \quad \text{where} \quad H^{(\psi)}_{ab} = H^{(+)}_{ab}\cos 2\psi + H^{(\times)}_{ab}\sin 2\psi\,. \tag{7.9.55}
\]
It is easy to see that $K^a$, $(e_0)^a$ and their linear combinations are all eigenvectors of $(H^{(\psi)})^a{}_b$
with eigenvalue 0. Moreover,
Now let us discuss the physical effect of polarized gravitational waves. Consider
the following monochromatic gravitational plane wave solution:
\[
t = t(\tau)\,, \quad x^1 = x^1(\tau) = a\cos\varphi\,, \quad x^2 = x^2(\tau) = a\sin\varphi\,, \quad z = z(\tau) = 0\,, \tag{7.9.60}
\]
where $a > 0$ is a constant. When there is no gravitational wave, these particles are
located along a circle of radius $a$ at rest in the reference frame of $\{x^\mu\}$. When the
gravitational wave of the form (7.9.59) passes through this region, the metric becomes
It can be proved that for the particles on the circle described by (7.9.60), their world
lines are still geodesics with respect to $g_{ab}$. [See Exercise 7.10. In fact, its result
shows that the $t$-coordinate lines for any gravitational wave of the form
(7.9.30) are geodesics.] However, the coordinates $x^1$ and $x^2$ are no longer the spatial
Cartesian coordinates of these particles at a time $t$. Instead, their spatial Cartesian
coordinates at $t$ are now
\[
y^1 = x^1\sqrt{1 + h\cos\omega t}\,, \quad y^2 = x^2\sqrt{1 - h\cos\omega t}
\]
and $z$. From the parametric equations of the world lines of these particles, we can
see that they are located along an ellipse at $t$, described by
\[
\left(\frac{y^1}{a\sqrt{1 + h\cos\omega t}}\right)^{2} + \left(\frac{y^2}{a\sqrt{1 - h\cos\omega t}}\right)^{2} = 1\,, \quad z = 0\,. \tag{7.9.61}
\]
For any integer $n$, when $2n\pi - \frac{\pi}{2} \leq \omega t \leq 2n\pi + \frac{\pi}{2}$, the major and the minor axes
of the ellipse are along the $x^1$-axis and the $x^2$-axis, respectively; when $2n\pi + \frac{\pi}{2} \leq
\omega t \leq 2n\pi + \frac{3\pi}{2}$, the major and the minor axes of the ellipse are exchanged, lying along the
$x^2$-axis and the $x^1$-axis, respectively. Therefore, as the gravitational wave passes through,
these particles are located along an oscillating ellipse, as shown in Fig. 7.16. The
eccentricity of the ellipse at $t$ can be calculated as
\[
e_{\rm grav}(t) = \sqrt{\frac{2h|\cos\omega t|}{1 + h|\cos\omega t|}} \cong \sqrt{2h|\cos\omega t|}\,. \tag{7.9.62}
\]
[Fig. 7.16 The effect of a linearly polarized gravitational plane wave on a circle in one period. Panels: $\omega t = 0$, $\frac{1}{2}\pi$, $\pi$, $\frac{3}{2}\pi$; axes $x^1$ and $x^2$.]
Hence, the maximum value of $e_{\rm grav}(t)$ is $\sqrt{2h/(1+h)} \cong \sqrt{2h}$, which only depends on the
amplitude h. It is important that the directions of the major and the minor axes are
eigenvectors of the polarization tensor, which can be referred to as the polarization
directions of the gravitational wave. From the viewpoint of continuum mechanics,
the effect of a weak gravitational wave can be regarded as a strain.
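The geometry of the oscillating ellipse can be reproduced with a few lines of arithmetic. The sketch below computes the semi-axes of (7.9.61), evaluates the eccentricity from its usual definition $\sqrt{1 - (\text{minor}/\text{major})^2}$, and compares it with the closed form (7.9.62). The amplitude `h` is a hypothetical value chosen only for numerical visibility, and the function names are introduced for this example.

```python
import math


def semi_axes(radius, h, omega_t):
    """Semi-axes along x^1 and x^2 of the ellipse (7.9.61)."""
    c = math.cos(omega_t)
    return radius * math.sqrt(1.0 + h * c), radius * math.sqrt(1.0 - h * c)


def eccentricity_from_axes(radius, h, omega_t):
    """sqrt(1 - (minor/major)^2), the usual definition of eccentricity."""
    axis_1, axis_2 = semi_axes(radius, h, omega_t)
    major, minor = max(axis_1, axis_2), min(axis_1, axis_2)
    return math.sqrt(1.0 - (minor / major) ** 2)


def eccentricity_formula(h, omega_t):
    """Closed form (7.9.62)."""
    c = abs(math.cos(omega_t))
    return math.sqrt(2.0 * h * c / (1.0 + h * c))


h = 1e-3  # hypothetical amplitude, chosen only for numerical visibility
for omega_t in (0.0, math.pi / 3, math.pi, 4 * math.pi / 3):
    print(omega_t, eccentricity_from_axes(1.0, h, omega_t), eccentricity_formula(h, omega_t))
```

The two columns should coincide, and the largest value occurs where $|\cos\omega t| = 1$, matching the maximum $\sqrt{2h/(1+h)}$ quoted above. The longer axis lies along $x^1$ when $\cos\omega t > 0$ and along $x^2$ when $\cos\omega t < 0$.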
[Optional Reading 7.9.3]
The polarization modes of gravitational waves we discussed above are analogous to
the linear polarization modes of electromagnetic waves, whose polarization directions are
fixed. As we know, electromagnetic waves can be circularly/elliptically polarized. Similarly,
gravitational waves can also be circularly/elliptically polarized. For example, given two
nonzero constants $h^{(+)}$ and $h^{(\times)}$, the gravitational wave described by
\[
\gamma_{ab} = h^{(+)}H^{(+)}_{ab}\cos(-\omega t + kz) + h^{(\times)}H^{(\times)}_{ab}\sin(-\omega t + kz) \tag{7.9.63}
\]
is elliptically polarized. It can be seen from (7.9.55) and (7.9.56) that when $h^{(+)}h^{(\times)} > 0$,
the angular velocity of the polarization (i.e., the angular velocity of the eigenvector with
eigenvalue +1) along the propagation direction is $-\omega/2$; when $h^{(+)}h^{(\times)} < 0$, the angular
velocity of the polarization is $\omega/2$.
Notice that the metric gab = ηab + γab with the γab given in (7.9.63) does not abide by the
ansatz (7.9.30), and thus the conclusions for (7.9.30) may not be applicable to it. However,
one can show that (exercise) in the linear approximation, the spacetime corresponding to
(7.9.63) is still a pp-wave, and the t-coordinate lines are still geodesics.
[The End of Optional Reading 7.9.3]
Minkowski spacetime $(\mathbb{R}^4, \eta_{ab})$. Let $u \equiv t - z$, and let $f(u)$ and $g(u)$ be two arbitrary smooth
functions of $u$ with the only requirement being that $f^2 + g^2$ is nonvanishing. Suppose $P$ is
a function of the coordinates $x$, $y$ and $u$ defined as follows:
\[
P(x, y, u) = \frac{1}{2}f(u)(x^2 - y^2) + g(u)\,xy\,. \tag{7.9.64}
\]
It is not difficult to verify that
\[
g_{ab} := \eta_{ab} + 2P\,(du)_a (du)_b = \eta_{ab} + 2P\,[(dt)_a - (dz)_a][(dt)_b - (dz)_b] \tag{7.9.65}
\]
is a Lorentzian metric field on $\mathbb{R}^4$. Firstly, it can be easily seen from the above equation that
$g_{ab}$ is symmetric. Secondly, let
The matrix being invertible indicates that $g_{ab}$ is non-degenerate, and thus is a metric tensor
field. It is not difficult to see that it has the Lorentzian signature. The above discussion
indicates that $(\mathbb{R}^4, g_{ab})$ is a spacetime, which has the same base manifold $\mathbb{R}^4$ as Minkowski
spacetime but a different metric field. By calculating the curvature tensor we can see that this is a curved
spacetime (see Proposition 7.9.4). The inverse matrix of (7.9.68) equipped with the basis
vectors in (7.9.67) gives
\[
\begin{aligned}
g^{ab} ={}& (\partial/\partial x)^a (\partial/\partial x)^b + (\partial/\partial y)^a (\partial/\partial y)^b - (1 + 2P)(\partial/\partial t)^a (\partial/\partial t)^b \\
& + (1 - 2P)(\partial/\partial z)^a (\partial/\partial z)^b - 2P\,[(\partial/\partial t)^a (\partial/\partial z)^b + (\partial/\partial z)^a (\partial/\partial t)^b]\,.
\end{aligned} \tag{7.9.69}
\]
We will use $g^{ab}$ and $g_{ab}$ to raise and lower indices.
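That (7.9.69) really is the inverse of (7.9.65) can be confirmed with exact rational arithmetic. The sketch below writes both tensors in the coordinate basis $(t, x, y, z)$, using $(du)_a = (dt)_a - (dz)_a$, and multiplies the component matrices; the sample value of $P$ and the function names are arbitrary choices made for the illustration.

```python
from fractions import Fraction


def metric(P):
    """Components of g_ab in (7.9.65), coordinate order (t, x, y, z), with du = dt - dz."""
    du = [1, 0, 0, -1]
    eta = [[-1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
    return [[eta[m][n] + 2 * P * du[m] * du[n] for n in range(4)] for m in range(4)]


def inverse_metric(P):
    """Components of g^ab as written in (7.9.69), same coordinate order."""
    return [[-(1 + 2 * P), 0, 0, -2 * P],
            [0, 1, 0, 0],
            [0, 0, 1, 0],
            [-2 * P, 0, 0, 1 - 2 * P]]


def matmul(a, b):
    """Plain 4x4 matrix product."""
    return [[sum(a[m][k] * b[k][n] for k in range(4)) for n in range(4)] for m in range(4)]


P = Fraction(3, 7)  # arbitrary sample value of P(x, y, u) at some point
product = matmul(metric(P), inverse_metric(P))
identity = [[1 if m == n else 0 for n in range(4)] for m in range(4)]
print(product == identity)
```

Because the terms quadratic in $P$ cancel in the $t$-$z$ block, the product is the identity for every value of $P$, not only for small ones; this is consistent with $g_{ab}$ being an exact metric rather than a first-order approximation.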
Proposition 7.9.4 The gab defined by (7.9.65) is a non-flat solution to the vacuum Einstein
equation.
Proof First we compute the Riemann tensor $R_{abc}{}^d$ of $g_{ab}$ using the tetrad method introduced
in Sect. 5.7. Step one: choose the tetrad in (7.9.67). It follows from (7.9.68) that this is a
rigid tetrad (although not orthonormal). It is easy to verify that its dual tetrad reads
\[
(e^1)_a = (dx)_a\,, \quad (e^2)_a = (dy)_a\,, \quad (e^3)_a = \frac{1}{2}[(dt)_a + (dz)_a] - P\,(du)_a\,, \quad (e^4)_a = (du)_a\,. \tag{7.9.70}
\]
Step two: compute the connection 1-forms using Theorem 5.7.4. One finds that there are
only four nonvanishing $\omega_{\mu\nu}$:
From the inverse of (7.9.68) we can see that the components $g^{\mu\nu}$ of $g^{ab}$ in the dual basis
can also be arranged into the matrix on the right-hand side of (7.9.68), and hence it follows
from $\omega_\mu{}^\rho = g^{\rho\nu}\omega_{\mu\nu}$ that the nonvanishing $\omega_\mu{}^\rho$ are
\[
\begin{aligned}
R_4{}^1 = R_1{}^3 &= f\,dx\wedge du + g\,dy\wedge du = f\,e^1\wedge e^4 + g\,e^2\wedge e^4\,, \\
R_4{}^2 = R_2{}^3 &= g\,dx\wedge du - f\,dy\wedge du = g\,e^1\wedge e^4 - f\,e^2\wedge e^4\,.
\end{aligned} \tag{7.9.73}
\]
\[
\begin{aligned}
R_{abc}{}^d ={}& R_{ab1}{}^3 (e^1)_c (e_3)^d + R_{ab2}{}^3 (e^2)_c (e_3)^d + R_{ab4}{}^1 (e^4)_c (e_1)^d + R_{ab4}{}^2 (e^4)_c (e_2)^d \\
={}& [f\,(e^1)_a\wedge(e^4)_b + g\,(e^2)_a\wedge(e^4)_b]\,[(e^1)_c (e_3)^d + (e^4)_c (e_1)^d] \\
& + [g\,(e^1)_a\wedge(e^4)_b - f\,(e^2)_a\wedge(e^4)_b]\,[(e^2)_c (e_3)^d + (e^4)_c (e_2)^d]\,.
\end{aligned} \tag{7.9.74}
\]
This is a nonvanishing tensor, since at least one of the following components is nonvanishing
(the requirement for $f$ and $g$ is that $f^2 + g^2$ is nonvanishing):
\[
R_{414}{}^1 = R_{abc}{}^d (e_4)^a (e_1)^b (e_4)^c (e^1)_d = -f\,, \quad R_{424}{}^1 = R_{abc}{}^d (e_4)^a (e_2)^b (e_4)^c (e^1)_d = -g\,.
\]
This indicates that $(\mathbb{R}^4, g_{ab})$ is not a flat spacetime. It is easy to find the Ricci tensor from
(7.9.74):
\[
R_{ac} = R_{abc}{}^b = (f - f)\,(e^4)_a (e^4)_c = 0\,,
\]
and thus $g_{ab}$ is a solution to the vacuum$^{21}$ Einstein equation.
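According to footnote 21, the Ricci tensor is proportional to $\partial_1\partial_1 P + \partial_2\partial_2 P$, so the vacuum condition amounts to $P$ being harmonic in the transverse coordinates. The sketch below checks this for the profile (7.9.64) by central differences at fixed $u$; the sample values of $f(u)$ and $g(u)$, the evaluation point and the helper names are hypothetical.

```python
def P(x, y, f, g):
    """The profile (7.9.64) at fixed u, with f = f(u) and g = g(u) passed as numbers."""
    return 0.5 * f * (x * x - y * y) + g * x * y


def second_x_derivative(x, y, f, g, step=1e-2):
    """Central-difference estimate of d^2P/dx^2."""
    return (P(x + step, y, f, g) - 2.0 * P(x, y, f, g) + P(x - step, y, f, g)) / step ** 2


def second_y_derivative(x, y, f, g, step=1e-2):
    """Central-difference estimate of d^2P/dy^2."""
    return (P(x, y + step, f, g) - 2.0 * P(x, y, f, g) + P(x, y - step, f, g)) / step ** 2


f_value, g_value = 0.8, -1.3   # hypothetical values of f(u) and g(u) at some u
d2x = second_x_derivative(0.4, -0.7, f_value, g_value)
d2y = second_y_derivative(0.4, -0.7, f_value, g_value)
print(d2x, d2y, d2x + d2y)
```

The first number reproduces $f$, the second $-f$, and their sum vanishes up to rounding, which is the cancellation $(f - f)$ appearing in the Ricci tensor above. The $g(u)\,xy$ term drops out of both second derivatives.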
For later use, we can also derive Rabcd from (7.9.74), see the following proposition:
Proposition 7.9.5
Proof Exercise 7.11. Hint: use $R_{abcd} = g_{de}R_{abc}{}^e$, and notice that
\[
g_{de}(e_3)^e \equiv (e_3)_d = g_{3\mu}(e^\mu)_d = g_{34}(e^4)_d = -(e^4)_d\,, \quad g_{de}(e_1)^e \equiv (e_1)_d = g_{11}(e^1)_d = (e^1)_d\,.
\]
Given the importance of the null vector $K^a$ in the propagation of gravitational waves, let us
prove the following proposition:
21 In fact, this equation is $R_{ac} = -(\partial_1\partial_1 P + \partial_2\partial_2 P)(e^4)_a (e^4)_c$. $P$ taking the specific form in
(7.9.64) makes $\partial_1\partial_1 P = f = -\partial_2\partial_2 P$, which assures $R_{ac} = 0$.
Proposition 7.9.6 Suppose $\nabla_b$ is the torsion-free derivative operator associated with the
$g_{ab}$ in (7.9.65), then $\nabla_b K^a = 0$.
Proof Adopt the tetrad in (7.9.67) as well as its dual tetrad (7.9.70) and notice that
$K^a = (e_3)^a$. It follows from (5.7.4) that $\omega_3{}^\nu{}_a = -\gamma^\nu{}_{3\tau}(e^\tau)_a$, $\nu = 1, 2, 3, 4$. Since the
nonvanishing $\omega_\mu{}^\nu{}_a$ are shown in (7.9.72), we have $\omega_3{}^\nu{}_a = 0$, $\nu = 1, 2, 3, 4$. Thus, from the
above equation we get $\gamma^\nu{}_{3\tau} = 0$, $\nu, \tau = 1, 2, 3, 4$, and hence it follows from (5.7.1) that
Since $(e_\tau)^b$ is an arbitrary basis vector, the above equation indicates that $\nabla_b (e_3)^a = 0$.
Noticing that $(e_3)^a = K^a$, we have $\nabla_b K^a = 0$.
From Proposition 7.9.6 we obtain $K^b\nabla_b K^a = 0$ and $\nabla_{(a}K_{b)} = 0$, and thus ① the integral
curves of $K^a$ are (null) geodesics; ② $K^a$ is a Killing vector field.
The above discussions are purely mathematical. Physically speaking, the curved space-
time (R4 , gab ) represents a gravitational plane wave. It follows from (7.9.65) that P is
the only available quantity that determines (R4 , gab ), and thus the first thing we should
investigate when studying gravitational waves is the function P(x, y, u). To facilitate under-
standing, we first look at a simple example. Suppose f (u) and g(u) can be expressed as
The allure of the above equation is that it looks like some kind of monochromatic plane
wave. However, notice that although (∂/∂t)a and (∂/∂z)a are respectively timelike and
spacelike vector fields when measured by ηab , this is not necessarily true when measured by
gab . If (∂/∂t)a were not timelike or (∂/∂z)a were not spacelike, one could not treat t and z as
time and spatial coordinates, and the wave interpretation of (7.9.77) would become unclear.
Fortunately, it can be proved that there indeed exist certain spacetime regions in (R4 , gab ),
where (∂/∂t)a and (∂/∂z)a are timelike and spacelike when measured by gab , and thus at
least in these regions we can interpret (7.9.77) as a monochromatic gravitational plane wave
propagating along the z-direction at the speed of light c = 1. The product of K a defined by
(7.9.66) and ω can be interpreted as the wave 4-vector ωK a , since (7.9.66) indicates that
the time and spatial components of ωK a in the coordinate system {t, x, y, z} are the angular
frequency ω and the wave 3-vector k measured in this system:
$$\omega K^0 = \omega\,,\qquad \omega K^1 = \omega K^2 = 0\,,\qquad \omega K^3 = k = \omega\,.$$
K a (and hence ωK a ) being null reflects the fact that the phase ωu of the above gravitational
wave propagates at the speed of light, see Fig. 7.15 (in which K a should be substituted
by ωK a ). Suppose G 1 and G 2 are two inertial observers (measured by ηab ), whose spatial
coordinates are (x, y, z 1 ) and (x, y, z 2 ), respectively. They have different phases at the time
t1 , which are ωt1 − kz 1 and ωt1 − kz 2 . Suppose after some amount of time t2 − t1 , G 2
“acquires” the phase of G 1 at t1 , i.e.,
ωt2 − kz 2 = ωt1 − kz 1 ,
then we say that the value of the phase ωt1 − kz 1 propagates from G 1 to G 2 in a time interval
t2 − t1 , and so the speed of the propagation is
314 7 Foundations of General Relativity
$$v = \frac{z_2 - z_1}{t_2 - t_1} = \frac{\omega}{k} = 1\,.$$
[Figure residue; recovered labels: $K^a$, $p_1 = (t_1, x, y, z_1)$]
Thus, the speed of the propagation of gravitational waves is the speed of light. (This is
only the coordinate speed; what is more meaningful in the geometric language is the phase
velocity. The wavefront being null in the 4-dimensional language assures that this phase
velocity is the speed of light.) Figure 7.17 is a 4-dimensional illustration of this discussion,
in which γ is an integral curve of the null 4-vector K a (a null geodesic), and p1 and p2 are
the intersections of γ and the world lines of G 1 and G 2 . The phase value ωt1 − kz 1 at p1 is
“acquired” by G 2 at p2 : the phase propagates from p1 to p2 along the null geodesic. Note
that the physical interpretation above applies only to certain regions of (R4 , gab ),
where (∂/∂t)a is timelike and (∂/∂z)a is spacelike. However, we can now strip away the
non-intrinsic factors such as observers and coordinates and keep only the null geodesic γ and
two arbitrary points p1 and p2 on it. In this way, the wave interpretation can be carried
over to the whole spacetime. In fact, K a represents the direction of the propagation of all
the information (not only the phase) of the gravitational wave. The reason is as follows: as
K a is a Killing vector field, its corresponding one-parameter group of diffeomorphisms is
a one-parameter group of isometries, and the integral curves of K a are exactly the orbits
of this isometry group. Suppose U2 is an arbitrary neighborhood of p2 (see Fig. 7.18),
then there must exist a neighborhood U1 of p1 and an isometry φ : U1 → U2 such that
p2 = φ( p1 ). Therefore, any information about the gravitational wave in U2 is completely
contained in U1 (due to the isometry). In this sense, we can say that all the information
of the gravitational wave propagates along K a (at the speed of light). This interpretation
based on the isometries can be applied to not only the special case in (7.9.76), but also
the gab defined by (7.9.64) [in which f (u) and g(u) are arbitrary] and (7.9.65). Hence,
we say that there exists a gravitational plane wave in the spacetime (R4 , gab ), or refer to
(R4 , gab ) as a gravitational plane wave spacetime. Sachs and Wu (1977) also provide
a deeper argument for this gravitational plane wave interpretation from the perspective of
group theory by comparing it with the electromagnetic plane waves in Minkowski spacetime.
Furthermore, Proposition 7.9.6 indicates that gab is a pp-wave (see footnote 22). When f and g
are linearly dependent, (R4 , gab ) is called a monochromatic gravitational plane wave spacetime.
22 In fact, the metric for any pp-wave can be expressed in the Brinkmann coordinate system in the
following general form:
where P is an arbitrary smooth function. It is not difficult to see that this is equivalent to (7.9.65)
(by setting $v = \frac{t - z}{2}$), and taking P to be of the form (7.9.64) is just a special case.
[Fig. 7.18; recovered labels: $p_1$, $U_1$]
To further understand the gravitational wave of (R4 , gab ), we supplement the above with
the following propositions and remarks. For generality, we do not put any constraint on
the form of the function P(x, y, u) in the following two propositions.
Proposition 7.9.7 Let $\nabla_a$ represent the derivative operator associated with $g_{ab}$ in (7.9.65),
then
$$\nabla^a \nabla_a P = \frac{\partial^2 P}{\partial x^2} + \frac{\partial^2 P}{\partial y^2}\,. \tag{7.9.78}$$
$$\frac{\partial^2 P}{\partial x^2} + \frac{\partial^2 P}{\partial y^2} = 0\,,$$
and hence ∇ a ∇a P = 0, i.e., the P(x, y, u) in (7.9.64) is a solution to the source-free wave
equation in curved spacetime. Together with Rac = 0 (i.e., gab satisfies the vacuum Einstein
equation), we can see the legitimacy of the statement “the curved spacetime (R4 , gab ) rep-
resents a gravitational wave in vacuum”. This also shows (at least partially) the motivation
for taking P to be of the form in (7.9.64).
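A quick consistency check, under an assumption: (7.9.64) is not reproduced in this excerpt, so the form of $P$ below is inferred from footnote 21 and from the coefficients in (7.9.73), and should be read as illustrative rather than as the text's own definition.

```latex
% Assumed form, inferred from \partial_1\partial_1 P = f = -\partial_2\partial_2 P
P(x,y,u) = \tfrac{1}{2}\, f(u)\,(x^2 - y^2) + g(u)\, x y
\;\Longrightarrow\;
\frac{\partial^2 P}{\partial x^2} = f(u), \quad
\frac{\partial^2 P}{\partial y^2} = -f(u), \quad
\frac{\partial^2 P}{\partial x\,\partial y} = g(u),
\qquad
\frac{\partial^2 P}{\partial x^2} + \frac{\partial^2 P}{\partial y^2} = 0 .
```

With this form, the Hessian of $P$ in the $(x, y)$ plane is traceless, which is what makes both $\nabla^a \nabla_a P$ and $R_{ac}$ vanish.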
Proposition 7.9.8 The constant-u surfaces in (R4 , gab ) are null hypersurfaces.
Proof It follows from (7.9.67) that $K_a = g_{ab} K^b = g_{ab} (e_3)^b$. Following the derivation in
(2.6.10a) we get $g_{ab} (e_3)^b = g_{3\mu} (e^\mu)_a$, and hence
$$K_a = g_{3\mu}(e^\mu)_a = g_{34}(e^4)_a = -(e^4)_a = -\nabla_a u\,,$$
where in the last step we used (7.9.70). Noticing that $\nabla_a u$ is a normal covector of a constant-u
surface, we can see that its normal vector $\nabla^a u = -K^a$ is null.
Remark 2 In the special case of (7.9.76), ωu = ωt − kz represents the phase of the wave,
while ω is a constant, and hence a constant-u surface is a 3-dimensional wavefront S in
the 4-dimensional language. S being a null hypersurface indicates that the gravitational
wave in (7.9.76) propagates at the speed of light. Proposition 7.9.8 guarantees that the
constant-u surfaces are still null hypersurfaces (still have K a as the normal vector) for
general P = P(x, y, u). Therefore, one may regard u as some kind of (generalized) phase,
and the constant-u surfaces being null hypersurfaces indicates that the phase velocity of the
gravitational wave represented by this general P(x, y, u) is still the speed of light.
Now we introduce the emission of gravitational waves. First, let us make a comparison
with electromagnetic waves. If a charged particle in a system moves with non-uniform
velocity (relative to an inertial frame), it will emit electromagnetic waves.
As is well-known, the major contribution to the radiation field comes from electric
dipole radiation, which is much stronger than the magnetic dipole radiation and
electric quadrupole radiation (these two are of the same order). Similarly, under the
Newtonian approximation, if a point mass in a system moves with non-uniform
velocity, it will emit gravitational waves. What corresponds to the electric dipole
moment is the mass dipole moment
$$\boldsymbol{D} = \sum_P m_P\, \boldsymbol{r}_P\,, \tag{7.9.80}$$
where m P and rP are the mass and position vector of the point mass P, and the right-
hand side of the above equation is summed over all the point masses in the system.
Since the intensity of electric dipole radiation is proportional to the square of the
second order time derivative of the electric dipole moment, one may expect that the
contribution from the mass dipole moment to the intensity of gravitational radiation
is proportional to $|\ddot{\boldsymbol{D}}|^2$. However, from (7.9.80) we can see that
$\dot{\boldsymbol{D}} = \sum_P m_P \dot{\boldsymbol{r}}_P$ is equal to the total momentum $\boldsymbol{p}$ of the system; it follows from
the conservation of momentum that $\dot{\boldsymbol{p}} = 0$, and thus $\ddot{\boldsymbol{D}} = 0$, i.e., gravitational
waves do not include gravitational dipole radiation corresponding to electric dipole radiation. According to
the theory of electromagnetic radiation, the intensity of magnetic dipole radiation is
proportional to the square of the second order time derivative of the magnetic dipole
moment. The quantity in a gravitational system corresponding to the magnetic dipole
moment is
$$\boldsymbol{\mu} = \sum_P \boldsymbol{r}_P \times (m_P \boldsymbol{u}_P)\,,$$
where $\boldsymbol{u}_P$ is the velocity of the point mass P, and $m_P \boldsymbol{u}_P$ is the current contribution
of P. The right-hand side of the above equation is nothing but the total angular
momentum of the system. It follows from the conservation law of angular momentum
that $\dot{\boldsymbol{\mu}} = 0$, and hence gravitational waves do not include gravitational dipole
radiation corresponding to magnetic dipole radiation either. In short, there does not
exist any dipole radiation in gravitational waves. One can only get a nonvanishing
result when studying quadrupole radiation [see Misner et al. (1973) pp. 974–978 for
details]. Since the order of quadrupole radiation is higher than dipole radiation, the
gravitational waves emitted from a gravitational system are weaker than the electro-
magnetic waves emitted by an electromagnetic system in a similar condition.
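The vanishing of the mass dipole contribution can be checked numerically. The sketch below is illustrative only: it assumes purely Newtonian point masses interacting through mutual gravity, and the three-body configuration and function names are hypothetical, chosen just to show that the second time derivative of D (the sum of m_P times the acceleration of P) cancels by Newton's third law.

```python
import math

G = 6.674e-11  # Newton's constant, SI units


def accelerations(masses, positions):
    """Newtonian acceleration of each point mass due to all the others."""
    n = len(masses)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = [positions[j][k] - positions[i][k] for k in range(3)]
            r = math.sqrt(sum(c * c for c in d))
            for k in range(3):
                pull = G * d[k] / r**3
                acc[i][k] += masses[j] * pull  # i is pulled towards j
                acc[j][k] -= masses[i] * pull  # j is pulled towards i
    return acc


def dipole_second_derivative(masses, positions):
    """Second time derivative of D = sum_P m_P r_P, i.e. sum_P m_P a_P."""
    acc = accelerations(masses, positions)
    return [sum(m * a[k] for m, a in zip(masses, acc)) for k in range(3)]


# Hypothetical three-body configuration (kg, m), chosen only for illustration.
masses = [2.0e30, 1.0e30, 5.0e29]
positions = [(0.0, 0.0, 0.0), (1.0e11, 0.0, 0.0), (3.0e10, 8.0e10, 2.0e10)]

ddD = dipole_second_derivative(masses, positions)
# Largest single force component, used to judge how small the total is.
scale = max(
    m * abs(a[k])
    for m, a in zip(masses, accelerations(masses, positions))
    for k in range(3)
)
print([c / scale for c in ddD])  # each entry is zero up to rounding error
```

The individual forces are large, but their sum is zero to floating-point precision for any isolated configuration, which is the statement that the total momentum is conserved.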
The source emitting a strong gravitational wave is usually considered to be related
to a dramatic change of an astrophysical or cosmological process, such as the collapse
of a star that is not spherically symmetric (see footnote 23), a supernova explosion (see Sect.
9.3.2), the dramatic disturbance inside an active galactic nucleus, the merger of
a pair of black holes or neutron stars, cosmic inflation (see Chap. 15), etc. [See
Cai et al. (2017) for a review of different sources of gravitational waves]. In these
cases the gravitational field is not weak, and thus the linear approximation is not
applicable. The rigorous analysis of these processes must involve the arduous task of
solving the nonlinear Einstein equation in a non-spherically symmetric case. The
emission of gravitational waves is still a problem that has not been fully compre-
hended. Nowadays, the understanding of this problem has been furthered with the
help of numerical analysis and computational simulation, which has developed into
an important branch called numerical relativity.
Since general relativity predicts the physical existence of gravitational radiation, the
detection of gravitational waves becomes a significant subject. As we have discussed,
sources for the gravitational waves that reach the solar system are all very far away.
Hence, the gravitational waves being detected can be regarded as plane waves,
and they are so weak that the linear approximation is applicable. Unfortunately, this
also makes it very difficult to directly detect a gravitational wave on or near the Earth.
(The currently observed gravitational waves have amplitudes as small as h ∼ 10^−21.)
Because of this difficulty, there were no direct observations of gravitational waves until
2015, although Joseph Weber initiated the detection of gravitational waves as early as
the 1960s. In the 20th century, evidence for the existence of gravitational waves
came only from indirect detections, among which the most important one is the
observation of binary pulsars.
23 According to Birkhoff’s theorem (see Sect. 8.3.3), the spherical evolution of any spherically
symmetric star (such as collapse and oscillation) will not emit a gravitational wave no matter
how dramatic it is, just like there does not exist a spherically symmetric electromagnetic wave in
Maxwell’s theory. (The spherical wave of an oscillating electric dipole in a distant region is not a
spherically symmetric electromagnetic wave, since the fields E and B are not spherically symmetric.
In fact, a spherically symmetric electromagnetic wave corresponds to the radiation of an electric
monopole, but this kind of radiation does not exist in Maxwell’s theory).
A pulsar is a rapidly rotating neutron star (see Sect. 9.3.2), which has a mechanism
of emitting electromagnetic waves. If the Earth lies in the sweeping range of a beam
of radiation, then one can receive radio pulse signals with a precise period. An
approximately isolated gravitational system formed by two stars orbiting around
their center of mass is called a binary star, which emits gravitational waves due to
the accelerating motion of the two stars. Like electromagnetic waves, gravitational
waves carry energy and momentum as well as angular momentum when they are
emitted. As a consequence, the radii of the orbits of the stars become smaller and
smaller, and the period becomes shorter and shorter. However, unlike many other
astrophysical processes, the emission of gravitational waves from a binary system
is very weak, and so the linearized theory of gravity can be applied to calculate the
loss of energy and the change of the orbital period. In order to be detectable, these
effects need to satisfy at least two conditions: ① the orbit is sufficiently small (i.e.,
the two stars are close enough), such that the effect of general relativity is evident;
② a method for measuring the orbital period with rather high precision is available.
The binary pulsar PSR 1913+16, discovered by R. A. Hulse and J. H. Taylor in 1974,
happens to satisfy these two conditions. [A binary pulsar is a binary that contains
a pulsar; PSR is the identifier for pulsars, while 1913 and +16 stand for its right
ascension and declination (angular coordinates).] The maximum distance between
the two stars in this binary is only about 3 × 10^9 m (about 4.8 solar radii), which
satisfies condition ①; the pulsar in the binary makes it satisfy condition ②:
since the period of the radio signal emitted from a pulsar is reputed to be “as precise
as the tick of a clock”, one can use this to record how its orbital period changes,
and compare with the result calculated from general relativity. If the observation
agrees with the calculation on account of gravitational waves, it will be evidence
for the existence of gravitational waves. Taylor and collaborators carried out this
observation with extraordinarily high accuracy and obtained the rate of change of
the orbital period. After thousands of observations, their results were announced in
1978; they agree very well with the predictions calculated from the quadrupole
radiation formula in the linearized theory of gravity. This was the first quantitative
evidence for gravitational waves since they were first proposed, even
though it was indirect evidence. Hulse and Taylor were awarded the 1993 Nobel
Prize in Physics for this discovery.
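To give a sense of the size of the effect, the following sketch evaluates the standard Peters-Mathews quadrupole expression for the orbital period decay of an eccentric binary. Neither the formula nor the numbers are taken from this text: the expression is the commonly quoted one, and the masses, period, and eccentricity are approximate published values for PSR 1913+16, used here as assumptions.

```python
import math

T_SUN = 4.925490947e-6  # G * M_sun / c^3 in seconds


def orbital_period_derivative(m1, m2, period_s, ecc):
    """Peters-Mathews dP/dt (dimensionless, s/s); masses in solar masses."""
    enhancement = (1 + 73 / 24 * ecc**2 + 37 / 96 * ecc**4) / (1 - ecc**2) ** 3.5
    return (
        -192 * math.pi / 5
        * (2 * math.pi * T_SUN / period_s) ** (5 / 3)
        * enhancement
        * m1 * m2 / (m1 + m2) ** (1 / 3)
    )


# Approximate parameters for PSR 1913+16 (assumptions, not from this text).
pdot = orbital_period_derivative(m1=1.4414, m2=1.3867, period_s=27906.98, ecc=0.6171)
print(f"dP/dt = {pdot:.2e} s/s")
print(f"period change per year = {pdot * 3.156e7 * 1e6:.1f} microseconds")
```

The result is of order −2 × 10^−12 seconds per second, i.e., the orbital period shortens by some tens of microseconds per year, which explains why a clock as stable as a pulsar is needed to measure it.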
The first attempt to directly detect gravitational waves was started by Joseph Weber
at the University of Maryland in 1966. He designed a resonant mass antenna for
detecting gravitational waves, called the Weber bar. It is a suspended aluminum
cylinder with length 153 cm and diameter 66 cm, which has a resonance frequency
of 1660 Hz. When a gravitational wave near the resonant frequency passes through
the Weber bar in a proper direction, the resonance of the bar will be excited, which will
amplify the vibration and could potentially be detected by piezoelectric sensors if the
change of the bar’s length is large enough. After years of effort, Weber announced
that evidence of gravitational waves had been observed by detectors at two
different locations. Unfortunately, Weber’s observation could not be confirmed by
the experiments of any other group [Ohanian and Ruffini (1994); Liu and Zhao
(2004)].
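The quoted resonance frequency can be checked against a rough estimate. The sketch below assumes the fundamental longitudinal mode of a thin free bar, f = v/(2L) with v = sqrt(E/ρ); the Young's modulus and density are typical handbook values for aluminum and are assumptions, not values given in the text.

```python
import math


def fundamental_longitudinal_frequency(length_m, youngs_modulus_pa, density_kg_m3):
    """Lowest longitudinal mode of a thin free-free bar: f = v / (2 L)."""
    sound_speed = math.sqrt(youngs_modulus_pa / density_kg_m3)
    return sound_speed / (2 * length_m)


# Bar length from the text; material constants are assumed typical values.
f0 = fundamental_longitudinal_frequency(
    length_m=1.53, youngs_modulus_pa=69e9, density_kg_m3=2700.0
)
print(f"estimated resonance = {f0:.0f} Hz")
```

The estimate lands within about one percent of the 1660 Hz quoted above, which is as close as this thin-bar idealization can be expected to get for a cylinder of this aspect ratio.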
In the 1970s, there appeared another important type of gravitational wave detector,
namely a laser interferometer. An interferometer has two long orthogonal arms,
and the idea is similar to that of the resonant mass antenna, i.e., to detect the length
perturbations of its arms due to gravitational waves. However, a laser interferometer is
sensitive over a much wider range of frequencies, rather than only near a resonance
frequency. Here we briefly review the principle of interferometers. An interferometer
consists of two mirrors and a beam splitter (see Fig. 7.19). When a laser beam is directed
at the beam splitter through the vertical arm in Fig. 7.19, part of it will be transmitted
while the remaining part will be reflected, and thus the laser will be divided into two
beams which propagate along the two arms of the interferometer. Each of the two
beams hits the mirror placed at the end of each arm and gets bounced back to the beam
splitter. After that the beams recombine and propagate towards the right through the
horizontal arm in Fig. 7.19, where they are received by the sensor at the end of the
horizontal arm. When there is no gravitational wave, the recombined beams are tuned
to have opposite phases (a crest meets a trough) by applying a waveplate, so that
the composite signal vanishes. However, when a gravitational wave comes by, the
lengths of the arms will change slightly (similar to the effect shown in Fig. 7.16) and
so the light paths of the two beams will change slightly, causing a nonvanishing com-
posite signal to be received by the sensor, called an interference signal. Therefore,
interference will be present when there is a gravitational wave passing by.
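The dependence of the output on the arm lengths can be sketched for an idealized Michelson interferometer. The assumptions here are a lossless 50/50 beam splitter, perfect contrast, no additional optical cavities, and a laser wavelength of 1064 nm used purely as an illustrative parameter; the function name is hypothetical.

```python
import math


def dark_port_fraction(delta_arm_m, wavelength_m):
    """Fraction of the input power reaching the sensor.

    delta_arm_m is the change in the difference of the two arm lengths,
    measured from the operating point where the beams cancel exactly.
    """
    # Light travels each arm twice, so the path difference is 2 * delta_arm_m.
    phase = 4 * math.pi * delta_arm_m / wavelength_m
    return math.sin(phase / 2) ** 2


WAVELENGTH_M = 1064e-9  # assumed laser wavelength

print(dark_port_fraction(0.0, WAVELENGTH_M))               # no signal at the operating point
print(dark_port_fraction(WAVELENGTH_M / 4, WAVELENGTH_M))  # fully bright a quarter wavelength away
print(dark_port_fraction(1e-12, WAVELENGTH_M))             # tiny but nonzero for a small change
```

Near the dark operating point the response is quadratic in the length change, which is one reason real detectors use more elaborate readout schemes than this idealized picture.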
Based on the above idea, research groups at MIT and Caltech began jointly
building the Laser Interferometer Gravitational-Wave Observatory (LIGO) in
the 1980s (early discussions and attempts on interferometric detectors began in the
late 1960s). After decades of preparation, LIGO started its first operation in 2002.
However, it was not sensitive enough to detect any gravitational wave.
In 2010, LIGO was shut down and upgraded into an improved version, Advanced
LIGO, whose sensitivity is about ten times that of its predecessor. The operation of
LIGO restarted in 2015. At 09:50:45 UTC on 14 September 2015, LIGO made the
first direct observation of gravitational waves [Abbott et al. (2016)]. The signal of
this event was named GW150914; it comes from a merger of two black holes
that occurred 1.3 billion light-years away, with the amplitude of γμν (the components of
γab in a Lorentzian coordinate system) being so small that it is equivalent to changing
a length of 4 km by a thousandth of the width of a proton. Due to this unprecedented
observation, three leaders of LIGO, Rainer Weiss, Barry Barish and Kip Thorne,
were awarded the 2017 Nobel Prize in Physics.
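The comparison with the size of a proton can be turned into an order-of-magnitude strain estimate. In this sketch the proton diameter of about 1.7 fm is an assumption (roughly twice the proton charge radius), and factors of order unity between the wave amplitude and the fractional length change are ignored.

```python
ARM_LENGTH_M = 4.0e3          # arm length quoted in the text
PROTON_DIAMETER_M = 1.7e-15   # assumed value, about twice the charge radius

delta_length_m = PROTON_DIAMETER_M / 1000   # "a thousandth of the width of a proton"
fractional_change = delta_length_m / ARM_LENGTH_M

print(f"length change     = {delta_length_m:.1e} m")
print(f"fractional change = {fractional_change:.1e}")
```

The fractional change comes out at a few times 10^−22, consistent in order of magnitude with the amplitudes h ∼ 10^−21 mentioned earlier in this section.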
To make precise detections, the LIGO observatory consists of two identical inter-
ferometers, located in Washington state and Louisiana state, USA, respectively. The
distance between them is about 3030 km over the Earth’s surface (the straight line
distance is about 3002 km). Besides making independent measurements, an impor-
tant utility of having two detectors far apart is to determine the location of the source
of the gravitational wave. Since the gravitational wave travels at the speed of light,
it would take at most about 10 ms to propagate from one LIGO interferometer to the other. In the
GW150914 event, the time delay between the two detectors was 7 ms. Using this time
delay, the source of the signal can be located through triangulation. This is exactly
the principle of how human ears identify the location of the source of a sound wave.
Interestingly, the signal of GW150914 has a frequency varying between 35 Hz and
250 Hz, which happens to be inside the human audible range. In 2017, the Virgo
interferometer in Italy started to detect gravitational waves, which provides “a third
ear” for locating the source of the gravitational wave more precisely. Furthermore,
having two identical LIGO detectors also helps to extract the actual gravitational
wave signal from the noise. Since the detectors are extremely sensitive, any vibration
from the local environment will be recorded, and one of the challenges of the
detection is to remove this noise. By comparing the signals obtained by the two
detectors located far apart, one can filter out the random vibrations that do not happen
at both places, so that only the gravitational wave signals common to both remain. To
minimize the noise, LIGO also applies a series of mechanisms to isolate the vibrations,
including optics suspensions and seismic isolation, and many techniques in
the data analysis, such as matched filtering. The reader may refer to Saulson (2017)
for more technical details of noise reduction.
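The time-delay argument earlier in this paragraph can be made quantitative with a small calculation. This is a geometric sketch only: it treats the wave as a plane wave, uses the straight-line baseline quoted above, and ignores all other information (amplitudes, phases, further detectors) that real localization uses.

```python
import math

C = 299_792_458.0     # speed of light in m/s
BASELINE_M = 3.002e6  # straight-line distance between the two LIGO sites

# Largest possible arrival-time difference: wave travelling along the baseline.
max_delay_s = BASELINE_M / C


def cone_half_angle_deg(delay_s):
    """Angle between the propagation direction and the baseline for a given delay."""
    return math.degrees(math.acos(C * delay_s / BASELINE_M))


angle_deg = cone_half_angle_deg(7e-3)
print(f"maximum delay = {max_delay_s * 1e3:.2f} ms")
print(f"angle for a 7 ms delay = {angle_deg:.1f} degrees")
```

With only two detectors, a single delay constrains the propagation direction to a cone around the baseline, i.e., a ring on the sky; this is why a third detector such as Virgo sharpens the localization so much.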
Since the first direct observation in 2015, there have already been numerous events
of direct observation of gravitational waves, mainly detected by LIGO and Virgo.
Moreover, more and more gravitational wave detectors are becoming
available or are under preparation. For example, KAGRA (Kamioka Gravitational
Wave Detector) started its observation in 2020. Also, third-generation interferometric
detectors with longer arms and a greater sensitivity, such as the Einstein Telescope
and Cosmic Explorer, have been proposed and are expected to be available in the
2030s. Besides the ground-based interferometers, there are also multiple on-going
projects for space-based interferometric detectors, such as LISA (Laser Interferom-
eter Space Antenna), TianQin, Taiji, and DECIGO (Deci-hertz Interferometer Grav-
itational wave Observatory), where the long arms are replaced by the laser beams
between spacecraft. Once available, they will be used to detect low-frequency
gravitational waves. In addition to interferometric detectors, there are also other methods
of detecting gravitational waves: by using pulsar timing arrays [e.g., IPTA
(International Pulsar Timing Array)] one can detect very-low-frequency gravitational
waves, and by measuring the polarization pattern of the cosmic microwave
background (CMB) [e.g., BICEP (Background Imaging of Cosmic Extragalactic
Polarization), AliCPT (Ali CMB Polarization Telescope)] one can detect
extremely-low-frequency gravitational waves, including the primordial gravitational waves
generated in the early universe (see Sect. 10.3). For a detailed introduction to the methods
of the pulsar timing array and CMB polarization, see, for example, Maggiore (2018).
The above-mentioned detecting methods and their corresponding frequency bands
are summarized in Table 7.1 [see also Chen et al. (2017)].

Table 7.1 Methods of gravitational wave detection and their frequency bands

Frequency band          | Frequency range     | Detection method            | Current and future observatories
------------------------|---------------------|-----------------------------|---------------------------------
High-frequency          | 10 Hz–10^6 Hz       | Ground-based interferometer | LIGO, Virgo, KAGRA, Einstein Telescope, Cosmic Explorer
Low-frequency           | 10^−7 Hz–10 Hz      | Space-based interferometer  | LISA, TianQin, Taiji, DECIGO
Very-low-frequency      | 10^−10 Hz–10^−7 Hz  | Pulsar timing array         | IPTA
Extremely-low-frequency | 10^−18 Hz–10^−14 Hz | CMB polarization            | BICEP, AliCPT
The observation of gravitational waves is significant not only because it con-
firmed the last undetected prediction of general relativity, but more importantly, it
also opened up a brand new window for observing the universe. Traditionally, people
could only make astronomical observations by detecting the electromagnetic waves in
different frequency bands. Now that gravitational waves can also be directly detected,
it enables more possibilities for astronomical observation. For example, since the
electromagnetic field interacts with matter, the electromagnetic waves from a distant
celestial object can be easily scattered or absorbed during the propagation. However,
the interaction between gravitational waves and matter is much weaker, so it is
possible to observe celestial events we could not observe before (like the binary black
hole merger of GW150914). Furthermore, the earliest electromagnetic radiation we
can observe is the cosmic microwave background radiation when photon decou-
pling occurred (see Sect. 10.3), but through gravitational waves it is now possible to
make observations of the early universe. With these prospects, gravitational-wave
astronomy is currently emerging, and hopefully it will lead to more revolutionary
discoveries of the universe in the near future.
[Optional Reading 7.9.5]
Using the example of gravitational plane waves in Optional Reading 7.9.2, we will now
introduce the mechanism of receiving gravitational waves in a geodesic reference frame
(where the world lines of the observers are geodesics) [see also Sachs and Wu (1977)]. In a
vibrating mechanical detector like a Weber bar, each molecule of the aluminum bar can be
considered as an observer, and the bar can be viewed as a reference frame in a sub-spacetime
of (R4 , gab ). Since there also exist non-gravitational interactions between the molecules,
the world lines of the molecules are not geodesics. However, in practice one can still use a
geodesic reference frame (which is the simplest choice). This is because the response to the
gravitational waves in the reference frame of the bar can be derived from the response in the
geodesic frame through Newtonian mechanics and solid state physics [see Weber (1961)].
The relative acceleration of two neighboring observers in a geodesic reference frame
under the action of the spacetime curvature is the tidal acceleration (see Sect. 7.6). Under
the action of the gravitational wave in (7.9.77), the magnitude and direction of the tidal
acceleration will change periodically, which leads to a relative oscillation between two
neighboring observers. Taking a geodesic $\gamma(\tau)$ as the fiducial observer, let us compute the
tidal 3-acceleration $a^c$ of the neighboring observers around this observer. Suppose $p \in \gamma$,
$Z^a$ is the 4-velocity of $\gamma$ at $p$ (namely the unit tangent vector of $\gamma$), and $W_p$ is the
3-dimensional subspace in the tangent space $V_p$ of $p$ which is orthogonal to $Z^a$ (in a picture
it would be a small plane orthogonal to $Z^a$), then a spatial separation vector $w^a$ represents a
neighboring observer (Sect. 7.6; see footnote 24). The tidal acceleration $a^c$ of the observer corresponding to
$w^a$ relative to the fiducial observer $\gamma(\tau)$ is given by the geodesic deviation equation (7.6.8):
$$a^c = -R_{abd}{}^c Z^a w^b Z^d\,. \tag{7.9.81}$$
$\forall\, w^b \in W_p$, the above equation determines an $a^c \in W_p$, and thus the above equation defines
a linear map $\psi : W_p \to W_p$. From the “multifaceted view of tensors” (see Sect. 2.4) we can
see that $\psi$ can be viewed as a tensor of type (1, 1) on $W_p$, denoted by $\psi^c{}_b$, i.e.,
$$a^c = \psi^c{}_b\, w^b\,. \tag{7.9.82}$$
Comparing with (7.9.81) yields
$$\psi^c{}_b = -R_{abd}{}^c Z^a Z^d\,. \tag{7.9.83}$$
In order to compute $\psi^c{}_b$, one can first choose a convenient orthonormal triad $\{(E_i)^a\}$:
$$\begin{aligned}
(E_1)^a &= (\partial/\partial x)^a + E^{-1} Z_1 K^a\,,\\
(E_2)^a &= (\partial/\partial y)^a + E^{-1} Z_2 K^a\,,\\
(E_3)^a &= E^{-1} K^a - Z^a\,,
\end{aligned}\tag{7.9.84}$$
where $E \equiv -g_{ab} Z^a K^b > 0$, $Z_1 \equiv g_{ab} Z^a (\partial/\partial x)^b = Z_b (\partial/\partial x)^b$ (and hence $Z_1$ is a
coordinate component of $Z_b$ instead of a frame component), and $Z_2 \equiv g_{ab} Z^a (\partial/\partial y)^b =
Z_b (\partial/\partial y)^b$. The reader should verify that: ① $\{(E_i)^a\}$ is indeed orthonormal measured by
$g_{ab}$; ② $(E_3)^a$ is the result of normalizing $h^a{}_b K^b = K^a + Z^a Z_b K^b$, namely the projection
of $K^a$ at $p$ onto $W_p$; ③ $\{(E_i)^a\}$ is parallelly transported (and thus is Fermi transported)
along a geodesic. [Hint for the proof: It follows from $\gamma(\tau)$ being geodesic and $\nabla_a K^b = 0$
that $E$ is a constant along the curve, from which one can easily show that $Z^b \nabla_b (E_3)^a = 0$.
Noticing that $\nabla_b (\partial/\partial x)^a = -K^a \omega_1{}^3{}_b$, one can show that $Z^b \nabla_b (E_1)^a = 0$.] Let $S$ be
the wavefront that includes $p \in \gamma$ (see the null hypersurface in Fig. 7.20), let $\hat{S}$ represent
the 3-dimensional subspace formed by all the elements in $V_p$ tangent to $S$, and let
$S_p \equiv \hat{S} \cap W_p = \{w^a \in W_p \mid g_{ab} w^a K^b = 0\}$; then $\{(E_1)^a, (E_2)^a\}$ is a basis of $S_p$. Since
in a picture we always draw a subspace (e.g., $W_p$) as a small plane (draw a subspace of $V_p$ as
a subspace of $M$), there is no difference between $\hat{S}$ and $S$ in Fig. 7.20. The physical meaning
24 More precisely, $w^a$ only gives the direction of the “separation”; it is really $w^a s$ (where $s$ is
small) that determines a neighboring observer in this direction, see Sect. 7.6.
of the mathematical settings above is very clear: in the view of a geodesic observer γ (τ ), the
gravitational wave passes by along the spatial direction (E 3 )a , and the 2-dimensional wave-
front S p is orthogonal to the direction of propagation (E 3 )a (see Fig. 7.20). The components
of ψ c b in the triad {(E i )a } are
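[Figure residue. Recovered tables: components $(w^1, w^2)$ of the separation vectors of eight neighboring observers A to H and their tidal accelerations $(a^1, a^2)$.]

(a) $\begin{pmatrix} a^1 \\ a^2 \end{pmatrix} = \begin{pmatrix} \alpha & 0 \\ 0 & -\alpha \end{pmatrix} \begin{pmatrix} w^1 \\ w^2 \end{pmatrix}$

    | A | B     | C  | D     | E  | F     | G  | H
----|---|-------|----|-------|----|-------|----|------
w^1 | 1 | 1/√2  | 0  | −1/√2 | −1 | −1/√2 | 0  | 1/√2
w^2 | 0 | −1/√2 | −1 | −1/√2 | 0  | 1/√2  | 1  | 1/√2
a^1 | α | α/√2  | 0  | −α/√2 | −α | −α/√2 | 0  | α/√2
a^2 | 0 | α/√2  | α  | α/√2  | 0  | −α/√2 | −α | −α/√2

(b) α = 0, β > 0: $\begin{pmatrix} a^1 \\ a^2 \end{pmatrix} = \begin{pmatrix} 0 & \beta \\ \beta & 0 \end{pmatrix} \begin{pmatrix} w^1 \\ w^2 \end{pmatrix}$

    | A | B     | C  | D     | E  | F     | G  | H
----|---|-------|----|-------|----|-------|----|------
w^1 | 1 | 1/√2  | 0  | −1/√2 | −1 | −1/√2 | 0  | 1/√2
w^2 | 0 | −1/√2 | −1 | −1/√2 | 0  | 1/√2  | 1  | 1/√2
a^1 | 0 | −β/√2 | −β | −β/√2 | 0  | β/√2  | β  | β/√2
a^2 | β | β/√2  | 0  | −β/√2 | −β | −β/√2 | 0  | β/√2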
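The tidal-acceleration patterns tabulated for the observers A to H can be reproduced numerically. The sketch below assumes that the transverse block of ψ in the triad is the traceless symmetric matrix with entries α and β (the form read off from the two tabulated cases) and works in units where the nonzero entry equals 1; the function and variable names are illustrative.

```python
import math

S = 1 / math.sqrt(2)

# Unit separation vectors (w^1, w^2) of the eight neighboring observers A..H.
POINTS = {
    "A": (1.0, 0.0), "B": (S, -S), "C": (0.0, -1.0), "D": (-S, -S),
    "E": (-1.0, 0.0), "F": (-S, S), "G": (0.0, 1.0), "H": (S, S),
}


def tidal_acceleration(alpha, beta, w):
    """Apply psi = [[alpha, beta], [beta, -alpha]] to w = (w^1, w^2)."""
    w1, w2 = w
    return (alpha * w1 + beta * w2, beta * w1 - alpha * w2)


cases = {"(a) beta = 0": (1.0, 0.0), "(b) alpha = 0": (0.0, 1.0)}
for label, (alpha, beta) in cases.items():
    print(label)
    for name, w in POINTS.items():
        a1, a2 = tidal_acceleration(alpha, beta, w)
        print(f"  {name}: a = ({a1:+.3f}, {a2:+.3f})")
```

In case (a) observers on the first axis are pushed outward while those on the second axis are pulled inward (for positive α); case (b) is the same pattern rotated by 45 degrees.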
gravitational wave can be detected by measuring the relative acceleration between a free
particle and another (fiducial) free particle, as we have discussed above. An electromagnetic
wave is the propagation of the oscillation of the electromagnetic field; to detect it, one
just needs to measure the acceleration of a charged particle relative to an inertial frame, whose
expression, namely a = (q/m) E, is much simpler than the tidal acceleration. Suppose the
electromagnetic wave being detected is linearly polarized, then what corresponds to Fig. 7.23
is the much simpler Fig. 7.24. We have shown in Optional Reading 7.9.2 that the polarization
tensor of a gravitational wave will come back to itself after rotating by (an integer times) π
about the z-axis, while the polarization vector of an electromagnetic wave will come back
after rotating by (an integer times) 2π . This difference is also manifested by the polarization
patterns in Figs. 7.23 and 7.24: the pattern in any square in Fig. 7.23 (i.e., at any time) will
come back to itself after rotating by (an integer times) π about the direction of propagation
(the line perpendicular to the page that passes through the centre of symmetry), while the
pattern in any square in Fig. 7.24 will come back after rotating by (an integer times) 2π .
This difference between Figs. 7.23 and 7.24 reflects again the fact that photons are spin-1
while gravitons are spin-2, as mentioned in Optional Reading 7.9.2.
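The rotation behavior can be made explicit in a short calculation. It is a sketch under two assumptions: the transverse tidal matrix has the traceless symmetric form with entries $\alpha$ and $\beta$, and $R(\theta)$ denotes a counterclockwise rotation by $\theta$ about the direction of propagation.

```latex
R(\theta) = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix},
\qquad
\psi = \begin{pmatrix} \alpha & \beta \\ \beta & -\alpha \end{pmatrix},
\\[4pt]
R(\theta)\,\psi\,R(\theta)^{T} = \begin{pmatrix} \alpha' & \beta' \\ \beta' & -\alpha' \end{pmatrix},
\qquad
\alpha' = \alpha\cos 2\theta - \beta\sin 2\theta, \quad
\beta' = \alpha\sin 2\theta + \beta\cos 2\theta ,
\\[4pt]
R(\theta)\,\boldsymbol{E} = \begin{pmatrix} E^1\cos\theta - E^2\sin\theta \\ E^1\sin\theta + E^2\cos\theta \end{pmatrix}.
```

The tidal matrix depends on $2\theta$ and therefore returns to itself at $\theta = \pi$, whereas the electric field vector depends on $\theta$ and returns only at $\theta = 2\pi$. Setting $\theta = \pi/4$ turns the case $(\alpha, 0)$ into $(0, \alpha)$, so the two polarization patterns differ by a 45 degree rotation.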
[The End of Optional Reading 7.9.5]
Exercises
˜7.1. Show that Maxwell’s equation in curved spacetime, ∇ a Fab = −4π Jb , contains
the law of conservation of charge, i.e., ∇a J a = 0. NB: ∇ a Fab = −4π Jb is
equivalent to (7.2.8) rather than (7.2.9), and hence this problem indicates that
(7.2.8) rather than (7.2.9) gives the charge conservation.
˜7.2. Show that $\dfrac{D_F \omega_a}{d\tau} = \dfrac{D\omega_a}{d\tau} + (A_a \wedge Z^b)\,\omega_b$, $\forall\, \omega_a \in \mathscr{F}_G(0, 1)$.
(a) Show that G(τ ) is a timelike hyperbola (i.e., G in Fig. 6.43), τ is the proper
time, and A is the magnitude of the 4-acceleration Aa of G(τ ).
*(b) Show that any ray μ(s) starting from the origin o of the system {T, X, Y, Z }
that intersects G(τ ) is orthogonal to G(τ ).
*(c) Suppose the parameter s of μ(s) in (b) is the arc length of μ, as we collect
all of the rays μ(s) starting from o that intersect G(τ ), we obtain a spatial vec-
tor field wa ≡ (∂/∂s)a on G(τ ). Show that wa is Fermi transported along G(τ ).
*(d) Let Z a ≡ (∂/∂τ )a , and choose {Z a , wa , (∂/∂Y )a , (∂/∂ Z )a } as an orthonor-
mal tetrad field on G(τ ), find the proper coordinate system {t, x, y, z} of G(τ )
and specify its coordinate patch.
Answer: $T = (A^{-1} + x)\sinh At$, $X = (A^{-1} + x)\cosh At$, $Y = y$, $Z = z$.
(e) Write down the expression for the line element of the Minkowski metric
in the above proper coordinate system. Compute the Christoffel symbol of the
Minkowski metric in this system, and verify that it satisfies Lemma 7.4.3, i.e.,
(7.4.10).
7.6. Suppose G is a non-rotating, freely falling, instantaneous rest observer of a
point mass L at a point p ∈ L (i.e., the 4-velocity Z a of G and the 4-velocity
U a of L are tangent at p), Aa is the 4-acceleration of L at p, and a a is the
3-acceleration of L at p relative to G [defined by (7.4.3)]. Show that a a = Aa .
NB: This claim can be viewed as the generalization of Proposition 6.3.6 to
curved spacetime.
˜7.7. A metric gab is said to be Ricci flat if the Ricci tensor of gab vanishes. Show
that a necessary and sufficient condition for a 4-dimensional Lorentzian metric
gab being a solution to the vacuum Einstein equation is that gab is Ricci flat.
˜7.8. Suppose (M, gab ) is a Ricci flat spacetime (see the above problem for the def-
inition), and ξ a is one of the Killing vector fields of the spacetime. Show that
Fab := (dξ )ab satisfies the source-free (Ja = 0) Maxwell equation of (M, gab ).
Hint: use ∇a ξ a = 0 satisfied by any Killing vector field ξ a (the result of Exer-
cise 4.11).
7.9. Suppose γab satisfies (a) ∂ a γ̄ab = 0; (b) γ = 0; (c) γ0i = 0 (i = 1, 2, 3); (d)
γ00 = constant. Find an “infinitesimal” vector field ξ a such that γ̃ab ≡ γab +
∂a ξb + ∂b ξa satisfies the transverse-traceless gauge conditions:
(a) ∂ a γ̃¯ab = 0; (b) γ̃ = 0; (c) γ̃0i = 0 (i = 1, 2, 3); (d) γ̃00 = 0.
7.10. Suppose gab = ηab + γab represents a gravitational wave with γab of the form
(7.9.30), {t, x i } is a Lorentzian coordinate system, ∇a is the derivative operator
associated with gab , and Z a ≡ (∂/∂t)a . Show that Z a ∇a Z b = 0, i.e., the t-
coordinate lines are geodesics. Hint: compute L Z gab by plugging in the ansatz
(7.9.30), contract it with Z a , then use (4.3.1′) and the TT gauge condition.
NB: By a similar proof, this conclusion can also be applied to the elliptically
polarized waves of the form (7.9.63).
7.11. Prove Proposition 7.9.5.
7.12. Verify the properties ①–③ of {E i }a in (7.9.84).
7.13. Prove (7.9.86).
7.14. Prove (7.9.78), i.e., ∇ a ∇a P = (∂ 2 P/∂ x 2 ) + (∂ 2 P/∂ y 2 ).
References
Abbott, B. P. et al. (2016), ‘Observation of Gravitational Waves from a Binary Black Hole Merger’,
Phys. Rev. Lett. 116(6), 061102. arXiv:1602.03837.
Cai, R.-G., Cao, Z., Guo, Z.-K., Wang, S.-J. and Yang, T. (2017), ‘The Gravitational-Wave Physics’,
Natl. Sci. Rev. 4(5), 687–706. arXiv:1703.00187.
Carroll, S. M. (2019), Spacetime and Geometry, Cambridge University Press, Cambridge.
Chen, C.-M., Nester, J. M. and Ni, W.-T. (2017), ‘A brief history of gravitational wave research’,
Chin. J. Phys. 55, 142–169. arXiv:1610.08803.
Fock, V. A. (1939), ‘Sur le mouvement des masses finies d’après la théorie de gravitation
einsteinienne’, J. Phys. U.S.S.R. 1, 81–166.
Geroch, R. P. and Jang, P. S. (1975), ‘Motion of a body in general relativity’, J. Math. Phys.
16, 65–67.
Hawking, S. W. and Ellis, G. F. R. (1973), The Large Scale Structure of Space-Time, Cambridge
University Press, Cambridge.
d’Inverno, R. A. (1992), Introducing Einstein’s Relativity, Clarendon Press, Oxford.
Liu, L. and Zhao, Z. (2004), General Relativity (in Chinese), Higher Education Press, Beijing.
Maggiore, M. (2018), Gravitational Waves: Volume 2: Astrophysics and Cosmology, Oxford Uni-
versity Press, Oxford.
Misner, C., Thorne, K. and Wheeler, J. (1973), Gravitation, W H Freeman and Company, San
Francisco.
Ohanian, H. C. and Ruffini, R. (1994), Gravitation and Spacetime, W W Norton and Company,
Inc., New York.
Sachs, R. K. and Wu, H. (1977), General Relativity for Mathematicians, Springer-Verlag, New York.
Saulson, P. R. (2017), Fundamentals Of Interferometric Gravitational Wave Detectors, World Sci-
entific, Singapore.
Stephani, H., Kramer, D., MacCallum, M. A. H., Hoenselaers, C. and Herlt, E. (2003), Exact
Solutions of Einstein’s Field Equations, Cambridge University Press, Cambridge.
Straumann, N. (1984), General Relativity and Relativistic Astrophysics, Springer-Verlag, Berlin.
Synge, J. L. (1960), Relativity: The General Theory, North-Holland Publishing Company, Amster-
dam.
Wald, R. M. (1984), General Relativity, The University of Chicago Press, Chicago.
Weber, J. (1961), General Relativity and Gravitational Waves, Wiley-Interscience, New York.
Will, C. M. (1995), Stable clocks and general relativity, in ‘30th Rencontres de Moriond: Euro-
conferences: Dark Matter in Cosmology, Clocks and Tests of Fundamental Laws’, pp. 417–428.
arXiv:gr-qc/9504017.
Will, C. M. (2014), ‘The confrontation between general relativity and experiment’, Living Reviews
in Relativity 17(1), 4. arXiv:1403.7377.
Will, C. M. (2018), Theory and Experiment in Gravitational Physics, Cambridge University Press,
Cambridge.
Chapter 8
Solving Einstein’s Equation
Suppose there exists a timelike Killing vector field ξ a in (M, gab ), whose integral
curves have the parameter t, i.e., ξ a = (∂/∂t)a . Choose any coordinate system {x μ }
where t is the zeroth coordinate (i.e., t = x 0 ) and the integral curve of ξ a is the
x 0 -coordinate line (namely the coordinate system adapted to ξ a , see Sect. 4.2). Let
gμν be the components of gab in this coordinate system, then
$$\frac{\partial g_{\mu\nu}}{\partial t} = (\mathscr{L}_\xi g)_{\mu\nu} = 0\,, \tag{8.1.1}$$
1To be precise, within 34 days. Inspired by Einstein’s Mercury perihelion result of November 18,
1915, he looked for an exact solution. He communicated what he found in a letter to Einstein on
December 22, 1915. His solution was published in January 1916. Furthermore, this was in the
middle of World War I, and Schwarzschild was in the army on the Russian front!
© Science Press 2023
C. Liang and B. Zhou, Differential Geometry and General Relativity,
Graduate Texts in Physics, https://doi.org/10.1007/978-981-99-0022-0_8
where we used Theorem 4.2.2 in the first equality, and the second equality is due
to the fact that ξ a is a Killing vector field. Equation (8.1.1) indicates that all of the
components gμν are independent of the time coordinate t, i.e., gμν is “time-translation
invariant”. This is exactly where the term “stationary” comes from.
Conversely, if there exists a local coordinate system {x μ } in (M, gab ) such that
$$\frac{\partial g_{\mu\nu}}{\partial t} = 0 \quad (t \equiv x^0 \text{ is a timelike coordinate})\,, \tag{8.1.2}$$
Example 2 in a way suggests that confusion may arise if one does not take the
geometric perspective. Stationarity is an intrinsic property of the spacetime geometry,
which does not depend on the choice of the coordinate system. Note that both of the
following statements are wrong:
(1) (WRONG!) If some coordinate components gμν of the metric depend on the
timelike coordinate t of this coordinate system, then the spacetime is not stationary.
(2) (WRONG!) The spacetime in Example 2 is a stationary spacetime in the
coordinate system {T, X }, but is not a stationary spacetime in the coordinate system
{t, x}.
and hence the expression for the line element of gab in this system is simplified as
$$= \lim_{\Delta t \to 0} \frac{1}{\Delta t}\,\bigl( f|_s - f|_q \bigr) = u(f)\,,$$
and hence φ∗ v a = u a , i.e., φ∗ [(∂/∂t)a | p ] = −(∂/∂t)a |q . Similarly, one can show that
φ∗ [(∂/∂ x i )a | p ] = (∂/∂ x i )a |q , i = 1, 2, 3 .
Let gμν and (φ ∗ g)μν represent the components of gab and (φ ∗ g)ab , respectively, in
the system {t, x i }, then
where the last step is because of 0 = (Lξ g)μν = ∂gμν /∂t, i.e., gμν are constants along
C(t). Similarly, we have (φ ∗ g)i j | p = gi j | p , but (φ ∗ g)0i | p = −g0i | p . Luckily, g0i =
0 (where the hypersurface orthogonality is used), and hence (φ ∗ g)μν | p = gμν | p .
Noticing that p is arbitrary, we know that (φ ∗ g)ab = gab , and so φ : M → M is an
isometry.
[Optional Reading 8.1.1]
Technically, the definition of a Killing vector field has a strong version and a weak version.
The weak definition only cares about the local properties: any vector field ξ a satisfying the
Killing equation ∇(a ξb) = 0 (equivalent to Lξ gab = 0) is called a Killing vector field. This
ξ a may be incomplete, i.e., the range of its parameter t is not the whole R but an interval of
R. The strong definition, however, requires that ξ a be complete. Accordingly, the definitions
of stationary and static spacetimes also have a weak version and a strong one, depending on
whether or not the timelike Killing vector field is complete. When we are only concerned
with local issues, it is not necessary to emphasize the difference between them; however,
when global issues are involved, some conclusions only hold if the spacetime satisfies the
strong condition. For instance, if a region W is removed from a strong static spacetime
(M, gab ), this spacetime will become a weak static spacetime. Suppose what is shown in
Fig. 8.2 is the Σ_0 in Proposition 8.1.1, then Σ_{t1} = { p ∈ M|t ( p) = t1 } is meaningless when t
is sufficiently large, since the Killing field ξ a is not well-defined at the zero of the parameter
t of each integral curve inside the “shadow region” (and hence t is not well-defined). Thus,
it is possible that Proposition 8.1.1 only holds locally for a static spacetime.
In a word, the key difference between the strong and weak definitions is whether ξ a is
complete or not. ξ a generates a one-parameter group of isometries when it is complete,
while it only generates a one-parameter local group of isometries when it is incomplete. For
convenience’s sake, we usually omit the word “local” in the text.
[The End of Optional Reading 8.1.1]
$$ds^2 = r^2(d\theta^2 + \sin^2\theta\, d\phi^2)\,,$$
where r is the radius of the sphere. Without loss of generality, here we only talk
about the unit sphere (r = 1), whose line element is
$$ds^2 = d\theta^2 + \sin^2\theta\, d\phi^2\,. \tag{8.2.1}$$
and
$$\xi_3^a \equiv [\xi_1, \xi_2]^a = (\partial/\partial\theta)^a \cos\phi - (\partial/\partial\phi)^a \cot\theta\,\sin\phi \tag{8.2.2c}$$
are also Killing fields, and ξ1a , ξ2a , ξ3a are linearly independent. From Sect. 4.3 we
have learned that the one-parameter group of diffeomorphisms corresponding to a
Killing vector field is a one-parameter group of isometries, and hence the collection
of all the isometries on (S 2 , h ab ) is a 3-parameter group, which is isomorphic to the
rotation group SO(3) of the 3-dimensional Euclidean space. Readers who are not
familiar with group theory do not have to worry too much about this; one just needs
to know that SO(3) is such a group, each element of which is a rotation that keeps
the origin in the 3-dimensional Euclidean space fixed (see Appendix G in Volume II
for details).
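As a quick numerical sanity check (a sketch, not part of the original derivation), the Killing property of the field in (8.2.2c) can be tested by evaluating the components of its Lie derivative of the unit-sphere metric (8.2.1) with central differences. The coordinate formula (L_ξ h)_ij = ξ^k ∂_k h_ij + h_kj ∂_i ξ^k + h_ik ∂_j ξ^k is the standard one; the function names, the step size, the sample point and the non-Killing control field `stretch` are illustrative assumptions.

```python
import math

def metric(theta, phi):
    """Components h_ij of the unit-sphere metric (8.2.1) in {theta, phi}."""
    return [[1.0, 0.0], [0.0, math.sin(theta) ** 2]]

def xi3(theta, phi):
    """Components (xi^theta, xi^phi) of the field in (8.2.2c)."""
    return [math.cos(phi), -math.sin(phi) / math.tan(theta)]

def stretch(theta, phi):
    """Hypothetical control field sin(theta) d/dtheta; it is not a Killing field."""
    return [math.sin(theta), 0.0]

def d(f, x, k, h=1e-5):
    """Central difference of a scalar function f(theta, phi) along x^k."""
    xp, xm = list(x), list(x)
    xp[k] += h
    xm[k] -= h
    return (f(*xp) - f(*xm)) / (2.0 * h)

def lie_derivative_of_metric(g, xi, x):
    """(L_xi g)_ij = xi^k d_k g_ij + g_kj d_i xi^k + g_ik d_j xi^k at the point x."""
    n = len(x)
    gx, xix = g(*x), xi(*x)
    result = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            total = 0.0
            for k in range(n):
                total += xix[k] * d(lambda *y: g(*y)[i][j], x, k)
                total += gx[k][j] * d(lambda *y: xi(*y)[k], x, i)
                total += gx[i][k] * d(lambda *y: xi(*y)[k], x, j)
            result[i][j] = total
    return result

if __name__ == "__main__":
    point = (1.0, 0.7)  # arbitrary point away from the poles
    print(lie_derivative_of_metric(metric, xi3, point))      # all entries close to zero
    print(lie_derivative_of_metric(metric, stretch, point))  # theta-theta entry is 2 cos(theta)
```

The control field shows that the check is not vacuous: a generic smooth vector field generates diffeomorphisms of the sphere that are not isometries.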
When talking about spacetime symmetries, one should pay attention to the rela-
tion and difference between isometries and diffeomorphisms. An isometry must be
a diffeomorphism, but not vice versa. Each smooth vector field corresponds to a
one-parameter group of diffeomorphisms (we will omit the term “local” from now
on), and so any manifold M has infinitely many one-parameter groups of diffeo-
morphisms. The collection of all the diffeomorphisms is a group of infinitely many
$$ds^2 = -dt^2 + dr^2 + d\hat{s}^2\,,$$
where
$$d\hat{s}^2 = r^2(d\theta^2 + \sin^2\theta\, d\phi^2)\,.$$
Thus, for the Minkowski metric, the K in (8.2.3) is the square of the radius of
the orbit 2-sphere S which we have been discussing. To figure out the meaning
of K in a non-flat spacetime, a geometric concept will be helpful to us, namely
the area of S . Suppose ε̂ is the area element on S associated with ĝab ; then the
area of S is A = ∫_S ε̂. Also, ε̂ can be expressed using the coordinate system
{θ, ϕ} on S as ε̂ = √ĝ dθ ∧ dϕ, in which ĝ is the determinant of ĝab in the system
{θ, ϕ}. After reading off ĝi j from (8.2.3) we find ĝ = K² sin²θ, and hence
ε̂ = K sin θ dθ ∧ dϕ. Therefore,
$$A = K \int_0^{2\pi} d\phi \int_0^{\pi} \sin\theta\, d\theta = 4\pi K\,.$$
Seemingly, this is the same as the expression (8.2.3) for the dŝ 2 of Minkowski space-
time, and the r in each equation is called the radius. However, the radius r in general
does not necessarily have the meaning of “the distance between the center and each
point on S ”. In fact, the following three cases are all possible: ① There does not
exist a point that can be regarded as the center of S at all. Let us look at a simplified
example: suppose S 1 is a circle in the manifold R × S 1 (a cylindrical surface), then
the center of S 1 will not be on the manifold R × S 1 (Fig. 8.5). Similarly, in R × S 2
there does not exist a point that can be regarded as the center of S 2 either. ② There
exists a point in the spacetime that can be regarded as the center of S , but due to
the curved metric, the distance between S and this point is not equal to the radius r
defined by (8.2.4). ③ There exists more than one center of S .
[Optional Reading 8.2.1]
Before we wrote down (8.2.3) we have assumed the following claim: suppose (S , ĝab )
has the maximal symmetry represented by ξ1a , ξ2a , and ξ3a , then the line element of ĝab can
always be expressed as (8.2.3). Now we briefly introduce how to prove this claim. Suppose
the components of ĝab in the coordinate system {θ, ϕ} are ĝ11 , ĝ22 and ĝ12 , then from
ξ1a = (∂/∂ϕ) we can see that ĝ11 , ĝ22 and ĝ12 are not functions of ϕ. Writing down the
equations of the coordinate components of Lξ2 ĝab = 0 satisfied by ξ2 , and taking ĝ11 (θ),
ĝ22 (θ) and ĝ12 (θ) as functions to be solved for, we obtain ĝ12 = 0, ĝ11 = K (constant) and
ĝ22 = K sin2 θ. It is not difficult to verify that Lξ3 ĝab = 0, which completes the proof.
[The End of Optional Reading 8.2.1]
Proposition 8.3.1 Suppose a static spherically symmetric spacetime (M, gab ) has
only one² hypersurface orthogonal timelike Killing vector field ξ a , and G 3 is the
subgroup of its isometry group that is isomorphic to SO(3), then all of the orbit
spheres of G 3 must be orthogonal to ξ a .
Proof φ ∈ G 3 can be viewed as an isometry from M to M. Since whether or not
a vector field is timelike, Killing and hypersurface orthogonal are all determined
by the metric, one can believe that φ∗ ξ a is also a hypersurface orthogonal timelike
Killing vector field (see Exercise 4.12). Now that we only have one such vector field,
we have φ∗ ξ a = ξ a . Assume that ξ a is not orthogonal to an orbit sphere S of G 3 ,
then there exists a projection ξ̂ a of ξ a which is tangent to S . One can always find a
rotation φ̂ : S → S on the sphere such that ξ̂ a will change under this rotation, i.e.,
φ̂∗ ξ̂ a ≠ ξ̂ a . However, φ̂ : S → S can be regarded as the result of some φ ∈ G 3
(φ : M → M) restricted to S . That is, as long as ξ̂ a is nonvanishing, there exists a
φ ∈ G 3 such that φ∗ ξ̂ a ≠ ξ̂ a , and thus φ∗ ξ a ≠ ξ a , which contradicts φ∗ ξ a = ξ a .
Suppose Σ is a hypersurface orthogonal to ξ a , then according to Proposition 8.3.1,
an orbit surface of G 3 passing through any point of Σ lies on Σ, as shown in Fig. 8.6.
Using this geometric property, we can further simplify the static line element (8.1.3).
To do this we only have to specify how to define the 3-dimensional local coordinate
system {x 1 , x 2 , x 3 } on the constant-t surface Σ. x 1 can be defined using the radius
of the orbit sphere: the x 1 of each point is defined as the radius r of the orbit sphere
on which the point lies. x 2 and x 3 can be defined using the “carry method”: suppose
S is an orbit sphere in Σ, then it is a (2-dimensional) hypersurface in Σ, on which
there exists a unit normal vector field n a tangent to Σ. Since for any point on Σ
there exists an orbit sphere lying on Σ that passes through the point, n a is a vector
field defined on Σ whose integral curves (one of them is shown as the dashed line in
Fig. 8.6) are everywhere orthogonal to the orbit spheres. By choosing any spherical
coordinates θ and ϕ on S , we can “carry” these two coordinates to the other orbit
spheres by means of the integral curves of n a (that is, setting the values of θ and ϕ at
2Of course, ξ a multiplied by an arbitrary constant is also a Killing vector field. Here by “one” we
mean “one linearly independent”.
each point on each integral curve as the values of them at the intersection of S and
this curve), then we get a local coordinate system {r, θ, ϕ} on Σ. In this coordinate
system, gi j dx i dx j in (8.1.3) takes the simplest form. From the above definition of θ
and ϕ we can see that the integral curves of the normal vector field coincide with the
r -coordinate lines (only with different parameters), and thus gab (∂/∂r )a (∂/∂θ )b =
0, gab (∂/∂r )a (∂/∂ϕ)b = 0. Hence, the coefficients of the terms dr dθ and dr dϕ in
gi j dx i dx j vanish. Also considering that the induced metric of gi j dx i dx j on each
orbit sphere is given by (8.2.5), we have
and therefore
$$ds^2 = g_{00}\,dt^2 + g_{11}\,dr^2 + r^2(d\theta^2 + \sin^2\theta\,d\phi^2)\,. \tag{8.3.1}$$
According to (8.1.3), neither g00 nor g11 is a function of t. Considering the spherical
symmetry, we can believe that g00 and g11 are not functions of θ or ϕ either [motivated
readers may prove this using the property that θ and ϕ are constants on the integral
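curves of (∂/∂r )a and (∂/∂t)a ]. Denote g00 and g11 as −e^{2A(r)} and e^{2B(r)}, respectively,
then (8.3.1) becomes
$$ds^2 = -e^{2A(r)}\,dt^2 + e^{2B(r)}\,dr^2 + r^2(d\theta^2 + \sin^2\theta\,d\phi^2)\,. \tag{8.3.2}$$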
This is a quite general line element expression for a spherically symmetric metric
that has a unique static Killing vector field in the above coordinate system {t, r, θ, ϕ}.
We emphasize that {t, r, θ, ϕ} is a local coordinate system in M, by which we mean
that its domain (coordinate patch) cannot be the whole manifold M. Surely, even the
coordinates θ and ϕ on each orbit sphere cannot be defined on the whole sphere (one
cannot use a coordinate system to cover the whole S 2 , see Sect. 2.1). Moreover, for
instance, a point where (dr )a = 0 is not in the coordinate patch of {t, r, θ, ϕ} (the
point at X = T = 0 in Fig. 9.13 is such a point).
The static spherically symmetric metric satisfying the vacuum Einstein equation is
called the vacuum Schwarzschild solution, or Schwarzschild solution for short,
which in physics describes the outer gravitational field of a spherically symmetric
star (e.g., the Sun). We have pointed out in Chap. 7 that the vacuum Einstein equation
is equivalent to (see Exercise 7.7)
Rab = 0 . (8.3.3)
Since the general form of a static spherically symmetric metric (line element) (8.3.2)
only contains two undetermined functions of one variable, namely A(r ) and B(r ),
solving this equation now becomes simple: one can just express the Ricci tensor
Rab in terms of these two functions, set it to zero, and then solve for A(r ) and B(r )
from the resulting differential equations. In Sect. 5.7 we have introduced in detail the
method and outcomes of computing the Riemann tensor of the line element (8.3.2)
using the orthonormal tetrad, from which we can easily obtain the expression of Rab
in terms of A(r ) and B(r ). To help the readers to better understand the coordinate
basis method of computing the curvature, here we compute Rab again directly using
the coordinate basis. First we compute the Christoffel symbols of the line element
(8.3.1). It follows from (3.4.19) that the nonvanishing Christoffel symbols are
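$$\begin{aligned}
&\Gamma^0{}_{01} = \Gamma^0{}_{10} = A'\,, \quad \Gamma^1{}_{00} = A' e^{2(A-B)}\,, \quad \Gamma^1{}_{11} = B'\,,\\
&\Gamma^1{}_{22} = -r e^{-2B}\,, \quad \Gamma^1{}_{33} = -r \sin^2\theta\, e^{-2B}\,, \quad \Gamma^2{}_{12} = \Gamma^2{}_{21} = \frac{1}{r}\,,\\
&\Gamma^2{}_{33} = -\sin\theta\cos\theta\,, \quad \Gamma^3{}_{13} = \Gamma^3{}_{31} = \frac{1}{r}\,, \quad \Gamma^3{}_{23} = \Gamma^3{}_{32} = \cot\theta\,,
\end{aligned} \tag{8.3.4}$$
where ′ stands for the derivative with respect to r . Plugging (8.3.4) into (3.4.21) we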
find that the nonvanishing Rμν are
Thus, Rab = 0 is equivalent to the following three differential equations for the
undetermined functions A(r ) and B(r ) [Equations (8.3.7) and (8.3.8) give the same
equation]:
$$-A'' + A'B' - A'^2 - 2r^{-1}A' = 0\,, \tag{8.3.9}$$
$$-A'' + A'B' - A'^2 + 2r^{-1}B' = 0\,, \tag{8.3.10}$$
$$-e^{-2B}\,[1 + r(A' - B')] + 1 = 0\,. \tag{8.3.11}$$
Subtracting (8.3.10) from (8.3.9) yields
$$A' = -B'\,, \tag{8.3.12}$$
and hence
$$A = -B + \alpha\,, \quad \alpha = \text{constant}. \tag{8.3.13}$$
Noticing (8.3.12), (8.3.11) can be rewritten as an equation with only one undeter-
mined function B(r ):
$$1 - 2rB' = e^{2B}\,, \tag{8.3.14}$$
whose general solution is
$$e^{2B} = \left(1 + \frac{C}{r}\right)^{-1}\,, \tag{8.3.15}$$
where C is a constant of integration. By a direct check we can see that (8.3.13) and
(8.3.15) also satisfy (8.3.9) and (8.3.10), and hence they are the general solutions of
the unsolved equations (8.3.9)–(8.3.11). Plugging the A and B in these two results
into the line element (8.3.2) yields
$$ds^2 = -\left(1 + \frac{C}{r}\right) e^{2\alpha}\, dt^2 + \left(1 + \frac{C}{r}\right)^{-1} dr^2 + r^2(d\theta^2 + \sin^2\theta\,d\phi^2)\,. \tag{8.3.16}$$
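The direct check mentioned above can also be done numerically. The following is a minimal sketch, assuming the solution in the form A = −B + α with e^{−2B} = 1 + C/r (the form that appears in the line element (8.3.16)); it evaluates the left-hand sides of (8.3.9)–(8.3.11) with central differences. The values of C and α, the step size and the function names are illustrative assumptions.

```python
import math

C = -2.0      # integration constant C (illustrative value; requires r > -C below)
ALPHA = 0.3   # integration constant alpha of (8.3.13) (illustrative value)

def B(r):
    """B(r) defined through exp(-2B) = 1 + C/r."""
    return -0.5 * math.log(1.0 + C / r)

def A(r):
    """A(r) = -B(r) + alpha, as in (8.3.13)."""
    return -B(r) + ALPHA

def d1(f, r, h=1e-4):
    """Central-difference first derivative."""
    return (f(r + h) - f(r - h)) / (2.0 * h)

def d2(f, r, h=1e-4):
    """Central-difference second derivative."""
    return (f(r + h) - 2.0 * f(r) + f(r - h)) / (h * h)

def residuals(r):
    """Left-hand sides of (8.3.9), (8.3.10) and (8.3.11) at radius r."""
    Ap, Bp, App = d1(A, r), d1(B, r), d2(A, r)
    eq9 = -App + Ap * Bp - Ap ** 2 - 2.0 * Ap / r
    eq10 = -App + Ap * Bp - Ap ** 2 + 2.0 * Bp / r
    eq11 = -math.exp(-2.0 * B(r)) * (1.0 + r * (Ap - Bp)) + 1.0
    return eq9, eq10, eq11

if __name__ == "__main__":
    for r in (3.0, 5.0, 10.0, 50.0):
        print(r, residuals(r))  # each residual vanishes up to discretization error
```

The residuals do not depend on α, which reflects the fact that α only rescales the time coordinate.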
The fact that α is a constant assures that (∂/∂ tˆ)a is a Killing vector field just like
(∂/∂t)a . One may choose tˆ to be the Killing time coordinate in the first place when
the coordinate system {t, r, θ, ϕ} was defined, then the tˆ in (8.3.17) can be simply
written as t:
$$ds^2 = -\left(1 + \frac{C}{r}\right) dt^2 + \left(1 + \frac{C}{r}\right)^{-1} dr^2 + r^2(d\theta^2 + \sin^2\theta\,d\phi^2)\,, \tag{8.3.17$'$}$$
which shows one of the benefits of choosing tˆ as the time coordinate in the first place.
When r is sufficiently large, the linearized approximation of general relativity
(see Sect. 7.8.1) can be applied. Also, (1 + C/r)^{−1} ≅ 1 − C/r, and hence (8.3.17′)
approximately gives
$$ds^2 = [-dt^2 + dr^2 + r^2(d\theta^2 + \sin^2\theta\,d\phi^2)] - \frac{C}{r}\,(dt^2 + dr^2)\,.$$
The first term on the right-hand side of the above equation is a flat line element,
which can be rewritten as [−dt 2 + dx 2 + dy 2 + dz 2 ] by a coordinate transformation
x = r sin θ cos ϕ, y = r sin θ sin ϕ, z = r cos θ . Thus, the Schwarzschild metric can
be expressed as gab = ηab + γab when r is large, where the 00- (i.e., tt-) component
of the small quantity γab is γ00 = −C/r . Comparing with (7.8.35) we get φ = C/2r ,
and from Newton’s theory of gravity we also know that φ = −M/r (where M is the
mass of the star). Therefore, C = −2M, and hence (8.3.17′) can be expressed as
$$ds^2 = -\left(1 - \frac{2M}{r}\right) dt^2 + \left(1 - \frac{2M}{r}\right)^{-1} dr^2 + r^2(d\theta^2 + \sin^2\theta\,d\phi^2)\,. \tag{8.3.18}$$
This is the most common expression of the vacuum Schwarzschild solution, in which
M is the mass of the star. For a precise understanding of the concept of the “mass of
a star”, see Optional Reading 9.3.1 and Chap. 12 in Volume II.
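To get a feeling for how good the large-r approximation is in a concrete case, the sketch below converts a mass to geometrized units (M → GM/c², a length) and compares (1 − 2M/r)^{−1} with its first-order form 1 + 2M/r at the surface of the Sun. The rounded physical constants and the function names are illustrative assumptions, not values taken from the text.

```python
G_NEWTON = 6.674e-11   # gravitational constant, m^3 kg^-1 s^-2 (rounded)
C_LIGHT = 2.998e8      # speed of light, m s^-1 (rounded)
M_SUN = 1.989e30       # solar mass, kg (rounded)
R_SUN = 6.957e8        # solar radius, m (rounded)

def mass_in_metres(mass_kg):
    """Mass in geometrized units (G = c = 1), expressed as a length in metres."""
    return G_NEWTON * mass_kg / C_LIGHT ** 2

def linearization_error(M, r):
    """(1 - 2M/r)^(-1) minus its first-order approximation 1 + 2M/r."""
    x = 2.0 * M / r
    return 1.0 / (1.0 - x) - (1.0 + x)

if __name__ == "__main__":
    M = mass_in_metres(M_SUN)
    print("2M in metres:", 2.0 * M)               # roughly 3 km
    print("2M/r at the surface:", 2.0 * M / R_SUN)
    print("error of the first-order form:", linearization_error(M, R_SUN))
```

The error is of second order in 2M/r, which is why the linearized theory is adequate for identifying C with −2M.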
Now, let us discuss the Schwarzschild metric in a more “physical” manner, i.e.,
we will discuss the spatial geometry outside a static spherically symmetric star.
The cylindrical surface in Fig. 8.7 represents the world sheet of the surface of a
static spherically symmetric star, and the spacetime geometry outside this surface
is described by the Schwarzschild metric. There exists a static reference frame in
Schwarzschild spacetime, in which each constant-t surface Σ_t can be interpreted
as the space in this reference frame at t. The intersecting surface S of Σ_t and the
cylindrical surface represents the surface of the star at t (which is suppressed as a
1-dimensional circle in the figure). Suppose G 1 and G 2 are two static observers who
have the same values of θ and ϕ, and the intersections p1 and p2 of their world
lines and Σ_t represent the positions of these two observers at t. The spatial geometry
outside of S in Σ_t is described by the induced metric h ab of the Schwarzschild metric;
the corresponding line element is
$$d\hat{s}^2 = \left(1 - \frac{2M}{r}\right)^{-1} dr^2 + r^2(d\theta^2 + \sin^2\theta\,d\phi^2)\,. \tag{8.3.19}$$
Let us compute the spatial distance l between p1 and p2 . The distance between two
points in a Riemannian space (the metric is positive definite) is defined as the arc
length of the shortest curve among all the curves connecting these two points.3 It is
not difficult to show that the curve γ on Σ_t from p1 to p2 with θ and ϕ being constants
is the shortest curve between p1 and p2 , whose length (and thus the distance between
p1 and p2 ) is
3 Technically speaking, the distance between two points in a Riemannian space is defined as the
infimum of the set of the lengths of all the curves between these two points (as a subset of R).
$$l = \int_{\gamma} (h_{ij}\,dx^i dx^j)^{1/2} = \int_{r_1}^{r_2} \left(1 - \frac{2M}{r}\right)^{-1/2} dr > r_2 - r_1\,,$$
where r1 and r2 are the r -coordinates of G 1 and G 2 , respectively. The equation above
indicates that the spatial distance between G 1 and G 2 at any time t is a constant (which
is a property of static observers). l is also called the proper distance between G 1
and G 2 , which is not equal to their coordinate distance r2 − r1 . This is exactly a
reflection of (Σ_t , h ab ) being non-Euclidean.
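The radial integral for l can be evaluated both numerically and in closed form. The sketch below uses a composite Simpson rule and compares it with the antiderivative √(r(r − 2M)) + 2M ln(√r + √(r − 2M)), whose derivative is (1 − 2M/r)^{−1/2}; the function names and the sample values of M, r1 and r2 are illustrative assumptions.

```python
import math

def proper_distance_numeric(r1, r2, M, n=2000):
    """Composite Simpson rule for the integral of (1 - 2M/r)^(-1/2) from r1 to r2 (n even)."""
    f = lambda r: (1.0 - 2.0 * M / r) ** -0.5
    h = (r2 - r1) / n
    total = f(r1) + f(r2)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * f(r1 + k * h)
    return total * h / 3.0

def proper_distance_closed(r1, r2, M):
    """Same integral from an explicit antiderivative, valid for r1, r2 > 2M."""
    def F(r):
        return (math.sqrt(r * (r - 2.0 * M))
                + 2.0 * M * math.log(math.sqrt(r) + math.sqrt(r - 2.0 * M)))
    return F(r2) - F(r1)

if __name__ == "__main__":
    M, r1, r2 = 1.0, 3.0, 10.0   # geometrized units, both radii outside r = 2M
    print("numeric :", proper_distance_numeric(r1, r2, M))
    print("closed  :", proper_distance_closed(r1, r2, M))
    print("r2 - r1 :", r2 - r1)  # the coordinate distance is smaller than l
```

Since the integrand exceeds 1 everywhere outside r = 2M, the proper distance always exceeds the coordinate distance, and the excess grows as r1 approaches 2M.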
In this chapter, the main point regarding the Schwarzschild metric is about find-
ing the solution from Einstein’s equation. We will have a detailed discussion on
Schwarzschild spacetime later in Chap. 9.
To facilitate future lookup, here we list the components of the Christoffel sym-
bol and the Riemann tensor (with lower indices) of the Schwarzschild metric in
the Schwarzschild coordinate system as follows (in which x 0 , x 1 , x 2 , x 3 stand for
t, r, θ, ϕ, respectively):
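$$\left.\begin{aligned}
&\Gamma^0{}_{01} = \Gamma^0{}_{10} = \frac{M}{r^2}\,(1 - 2M/r)^{-1}\,, \quad \Gamma^1{}_{00} = \frac{M}{r^2}\,(1 - 2M/r)\,,\\
&\Gamma^1{}_{11} = -\frac{M}{r^2}\,(1 - 2M/r)^{-1}\,, \quad \Gamma^1{}_{22} = -r\,(1 - 2M/r)\,, \quad \Gamma^1{}_{33} = -r\,(1 - 2M/r)\sin^2\theta\,,\\
&\Gamma^2{}_{12} = \Gamma^2{}_{21} = \frac{1}{r}\,, \quad \Gamma^2{}_{33} = -\sin\theta\cos\theta\,, \quad \Gamma^3{}_{13} = \Gamma^3{}_{31} = \frac{1}{r}\,, \quad \Gamma^3{}_{23} = \Gamma^3{}_{32} = \cot\theta\,,
\end{aligned}\right\} \tag{8.3.20}$$
$$\left.\begin{aligned}
&R_{0101} = -\frac{2M}{r^3}\,, \quad R_{0202} = \frac{M}{r}\,(1 - 2M/r)\,, \quad R_{0303} = \frac{M}{r}\,(1 - 2M/r)\sin^2\theta\,,\\
&R_{1212} = -\frac{M}{r}\,(1 - 2M/r)^{-1}\,, \quad R_{1313} = -\frac{M}{r}\,(1 - 2M/r)^{-1}\sin^2\theta\,, \quad R_{2323} = 2Mr\sin^2\theta\,.
\end{aligned}\right\} \tag{8.3.21}$$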
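The Christoffel symbols listed in (8.3.20) can be cross-checked against the general formula Γ^σ_{μν} = ½ g^{σρ}(∂_μ g_{ρν} + ∂_ν g_{ρμ} − ∂_ρ g_{μν}), which simplifies for a diagonal metric. The sketch below differentiates the components of (8.3.18) with central differences and compares the result with the closed forms; the value of M, the sample point and the function names are illustrative assumptions.

```python
import math

M = 1.0  # mass in geometrized units (illustrative value)

def metric_diag(x):
    """Diagonal components (g_00, g_11, g_22, g_33) of (8.3.18) at x = (t, r, theta, phi)."""
    _, r, th, _ = x
    f = 1.0 - 2.0 * M / r
    return [-f, 1.0 / f, r * r, (r * math.sin(th)) ** 2]

def d_metric(x, k, h=1e-6):
    """Central difference of every diagonal component along the coordinate x^k."""
    xp, xm = list(x), list(x)
    xp[k] += h
    xm[k] -= h
    return [(a - b) / (2.0 * h) for a, b in zip(metric_diag(xp), metric_diag(xm))]

def christoffel(x):
    """G[s][m][n] = Gamma^s_{mn} for a diagonal metric, from finite differences."""
    g = metric_diag(x)
    dg = [d_metric(x, k) for k in range(4)]  # dg[k][a] = d_k g_aa
    G = [[[0.0] * 4 for _ in range(4)] for _ in range(4)]
    for s in range(4):
        for m in range(4):
            for n in range(4):
                term = 0.0
                if s == n:
                    term += dg[m][s]   # d_m g_sn
                if s == m:
                    term += dg[n][s]   # d_n g_sm
                if m == n:
                    term -= dg[s][m]   # d_s g_mn
                G[s][m][n] = 0.5 * term / g[s]
    return G

def christoffel_closed_form(x):
    """Nonvanishing components as listed in (8.3.20), keyed by (upper, lower, lower)."""
    _, r, th, _ = x
    f = 1.0 - 2.0 * M / r
    expected = {
        (0, 0, 1): M / (r * r * f),
        (1, 0, 0): M * f / (r * r),
        (1, 1, 1): -M / (r * r * f),
        (1, 2, 2): -r * f,
        (1, 3, 3): -r * f * math.sin(th) ** 2,
        (2, 1, 2): 1.0 / r,
        (2, 3, 3): -math.sin(th) * math.cos(th),
        (3, 1, 3): 1.0 / r,
        (3, 2, 3): 1.0 / math.tan(th),
    }
    # add the entries related by the symmetry of the lower indices
    for (s, m, n), value in list(expected.items()):
        expected[(s, n, m)] = value
    return expected

if __name__ == "__main__":
    x = (0.0, 5.0, 1.1, 0.4)
    numeric, exact = christoffel(x), christoffel_closed_form(x)
    worst = max(abs(numeric[s][m][n] - exact.get((s, m, n), 0.0))
                for s in range(4) for m in range(4) for n in range(4))
    print("largest deviation:", worst)
```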
Aa = ∇ a ln χ , (8.3.22)
where χ ≡ (−ξa ξ a )1/2 , and ξ a is the timelike Killing vector field of the stationary space-
time. Since the 4-acceleration is orthogonal to the 4-velocity, Aa is a spatial vector field
on the world line of the stationary observer. This is an intrinsic vector field of the station-
ary spacetime geometry itself. The gravitational field strength g in the Newtonian language
must correspond to a certain intrinsic geometric quantity in general relativity. −Aa is exactly
such a quantity, and thus can be called the “gravitational field” (gravitational acceleration
field) in the stationary spacetime. Now we will show that this terminology indeed agrees
with the value of the gravitational field strength |g| = 9.8 m · s−2 that you have in mind.
Consider, approximately, that there is a Schwarzschild metric outside the Earth.
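For the Schwarzschild metric (8.3.18) one has χ = (1 − 2M/r)^{1/2}, so that (8.3.22) gives a radial A^a of magnitude |A| = (M/r²)(1 − 2M/r)^{−1/2}. The sketch below evaluates this at the Earth's surface after restoring G and c; the rounded constants and the function name are illustrative assumptions, not values taken from the text.

```python
import math

G_NEWTON = 6.674e-11   # gravitational constant, m^3 kg^-1 s^-2 (rounded)
C_LIGHT = 2.998e8      # speed of light, m s^-1 (rounded)
M_EARTH = 5.972e24     # mass of the Earth, kg (rounded)
R_EARTH = 6.371e6      # mean radius of the Earth, m (rounded)

def static_observer_acceleration(mass_kg, r_m):
    """|A| = (M/r^2)(1 - 2M/r)^(-1/2) for a static observer, returned in m/s^2."""
    m_geo = G_NEWTON * mass_kg / C_LIGHT ** 2                        # mass as a length
    a_geo = (m_geo / r_m ** 2) / math.sqrt(1.0 - 2.0 * m_geo / r_m)  # units of 1/m
    return a_geo * C_LIGHT ** 2                                      # back to SI units

if __name__ == "__main__":
    a = static_observer_acceleration(M_EARTH, R_EARTH)
    newton = G_NEWTON * M_EARTH / R_EARTH ** 2
    print("|A| at the Earth's surface:", a, "m/s^2")
    print("relative excess over GM/r^2:", a / newton - 1.0)
```

The relativistic factor differs from 1 by less than one part in a billion at the Earth's surface, so |A| is indistinguishable from the Newtonian value GM/r² there.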
Schwarzschild showed that the static spherically symmetric solution to the vacuum
Einstein equation is the Schwarzschild solution, as we have introduced above. Later
it was found that the static condition can actually be removed, because in 1923
G. D. Birkhoff proved the following theorem: a spherically symmetric solution to
the vacuum Einstein equation must be static. Here we briefly sketch the idea of the
proof. The general form of a static spherically symmetric line element is (8.3.2). If
one removes the static condition, the expression for the line element will not be as
simple, for example the coefficient of the cross term dtdr will be nonzero. However,
by an appropriate coordinate transformation, one can change the line element to the
same form as (8.3.2), and the only difference is that the functions of one variable
A(r ) and B(r ) now become functions of two variables A(t, r ) and B(t, r ). Let A′,
B′, Ȧ and Ḃ represent ∂ A/∂r , ∂ B/∂r , ∂ A/∂t and ∂ B/∂t, respectively. Through a
procedure which is slightly more complicated than the computation in Sect. 8.3.2
[see Carmeli (1982); Stephani (1982)], we will still obtain the Schwarzschild line
element (8.3.18).
Birkhoff’s theorem is a powerful theorem, which asserts that as long as a non-static
matter distribution remains spherically symmetric (such as a star that is sharply
contracting, expanding, oscillating, or even exploding in the radial direction), the
external spacetime geometry will still be described by the vacuum Schwarzschild
solution. This provides great convenience for the study of stellar evolution (see Sects.
9.3 and 9.4).
Birkhoff’s theorem is very similar to the following theorem in electrodynamics:
the electromagnetic field of a spherically symmetric charge distribution (i.e., a spher-
ically symmetric solution to the vacuum Maxwell equations) must be an electrostatic
field. An electromagnetic wave is the propagation of a time-dependent electromag-
netic field in space, and “a spherically symmetric electromagnetic field must be an
electrostatic field” indicates that there does not exist any spherically symmetric elec-
tromagnetic wave. (A spherical electromagnetic wave is an electromagnetic wave
whose wavefront is a sphere; its electromagnetic field does not have spherical sym-
metry, and thus it is not a spherically symmetric electromagnetic wave). Similarly,
since a gravitational wave will not appear in a stationary gravitational field (station-
ary means time-independent), Birkhoff’s theorem indicates that there does not exist
any spherically symmetric gravitational wave. Noticing that spherically symmetric
radiation is monopole radiation, an equivalent statement of the conclusion above is:
there does not exist monopole electromagnetic or gravitational radiation. The major
contribution of electromagnetic radiation comes from dipole radiation. In contrast,
from Sect. 7.9 we can see that for gravity there exists neither monopole radiation
nor dipole radiation. The major contribution of gravitational radiation comes from
quadrupole radiation. Table 8.1 provides a comparison between these two kinds of
radiation.
Later, it was found that the original formulation by Birkhoff was not precise
enough. The revised Birkhoff’s theorem can be formulated as follows: a spherically
symmetric solution to the vacuum Einstein equation must be the Schwarzschild
metric. The difference between this revised version and the original version is that
the extended Schwarzschild metric will be non-stationary in some spacetime region,
see Sect. 9.4.3 for details. The original Birkhoff’s theorem was first challenged by
A. Z. Petrov in 1963 [see Stephani et al. (2003) p. 232 and the references therein].
For a proof of the revised Birkhoff’s theorem, see Appendix B of Hawking and Ellis
(1973). Kuang and Liang (1988) further generalized this theorem by weakening the
spherical symmetry condition to “conformally spherical symmetry”. The definition
of the term “conformal” will be introduced in Sect. 12.1 (Volume II).
The Schwarzschild metric describes the curved spacetime (vacuum) outside a static
spherically symmetric star. Many actual stars (or celestial bodies) carry electric
charges, and their exterior spacetime is not vacuum but filled with an electromag-
netic field. A spacetime with only an electromagnetic field but without a matter
field is called an electrovacuum (or electrovac for short) spacetime. The Tab in the
electrovacuum Einstein equation G ab = 8π Tab is the energy-momentum tensor for
some electromagnetic field Fab (we will only talk about source-free electromagnetic
fields), i.e.,
$$T_{ab} = \frac{1}{4\pi}\left(F_{ac}F_b{}^c - \frac{1}{4}\,g_{ab}F_{cd}F^{cd}\right). \tag{8.4.1}$$
Hence, the electrovacuum Einstein equation can also be expressed as
$$G_{ab} \equiv R_{ab} - \frac{1}{2}Rg_{ab} = 2\left(F_{ac}F_b{}^c - \frac{1}{4}\,g_{ab}F_{cd}F^{cd}\right), \tag{8.4.2}$$
Table 8.1 Comparative table for gravitational radiation and electromagnetic radiation

                            Monopole radiation   Dipole radiation   Quadrupole radiation
Electromagnetic radiation   Nonexistent          Exists (major)     Exists
Gravitational radiation     Nonexistent          Nonexistent        Exists (major)
$$\nabla^a F_{ab} = 0 , \tag{8.4.3a}$$
$$\nabla_{[a}F_{bc]} = 0 . \tag{8.4.3b}$$
Here ∇a is the derivative operator associated with the metric gab , and gab must satisfy
(8.4.2). Thus, an electrovacuum spacetime is determined by three ingredients: a
background manifold M, a metric field gab and an electromagnetic field Fab , among
which gab and Fab are the solutions of the simultaneous equations formed by (8.4.2)
and (8.4.3). This system of equations is called the Einstein-Maxwell equations. It
is easy to show from (8.4.1) that (Exercise 8.4) the trace of the energy-momentum
tensor Tab of the electromagnetic field is T ≡ g ab Tab = 0, and hence from Einstein’s
equation Rab − (1/2)Rgab = 8π Tab one can easily see that (Exercise 8.4) the scalar
curvature R = 0. Therefore, the electrovacuum Einstein equation can be simplified
as
Rab = 8π Tab . (8.4.4)
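The two "easy to show" steps of Exercise 8.4 can be written out in a few lines. The following is a sketch only, assuming a 4-dimensional spacetime (so that $\delta^a{}_a = 4$) and using nothing beyond (8.4.1) and Einstein's equation:

```latex
\begin{align*}
T \equiv g^{ab}T_{ab}
  &= \frac{1}{4\pi}\left(F_{ac}F^{ac} - \frac{1}{4}\,\delta^a{}_a\,F_{cd}F^{cd}\right)
   = \frac{1}{4\pi}\left(1 - \frac{4}{4}\right)F_{cd}F^{cd} = 0 , \\
g^{ab}\left(R_{ab} - \frac{1}{2}R\,g_{ab}\right)
  &= R - 2R = -R = 8\pi T = 0
  \;\Longrightarrow\; R = 0 .
\end{align*}
```

With $R = 0$ the term $\frac{1}{2}R\,g_{ab}$ drops out of Einstein's equation, which gives (8.4.4).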
where ∗ Fab is the Hodge dual of Fab . Fab is called a null electromagnetic field if
ab ab = 0 , (8.4.6)
Fab F ab = 0 , (8.4.8a)
and
Fab ∗ F ab = 0 . (8.4.8b)
Thus, although both E and B depend on the observer, B² − E² and E · B are two
invariants (i.e., scalar fields). (In fact, these are the only two independent invariants
that one can construct out of E and B). The two equations above indicate that (8.4.8)
is equivalent to
$$B^2 = E^2 , \tag{8.4.11a}$$
$$E \cdot B = 0 . \tag{8.4.11b}$$
These two equations indicate that the E and B measured by an instantaneous observer
are orthogonal and have the same magnitude, which are exactly the two basic properties
of an electromagnetic plane wave in Minkowski spacetime. It can be proved
(see Appendix D in Volume II) that if an arbitrary spacetime contains a null
electromagnetic field Fab whose energy-momentum tensor is Tab , then the
4-momentum density $W^a \equiv -T^a{}_b Z^b$ of Fab (see Sect. 6.4) measured by an
instantaneous observer ( p, Z a ) is a future-directed null vector.
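The observer dependence of E and B, and the observer independence of B² − E² and E · B, can be illustrated numerically in Minkowski spacetime. The sketch below is an illustration rather than part of the text's derivation: it assumes units with c = 1 and the standard special-relativistic transformation of E and B under a boost along x, and the function names are hypothetical.

```python
import math

def invariants(E, B):
    """Return (B^2 - E^2, E.B) for 3-vectors E and B (units with c = 1)."""
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
    return dot(B, B) - dot(E, E), dot(E, B)

def boost_x(E, B, v):
    """Fields measured by an observer moving with speed v along x.

    Uses the standard Lorentz transformation of E and B (assumed here, c = 1).
    """
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    Ex, Ey, Ez = E
    Bx, By, Bz = B
    E_new = (Ex, gamma * (Ey - v * Bz), gamma * (Ez + v * By))
    B_new = (Bx, gamma * (By + v * Ez), gamma * (Bz - v * Ey))
    return E_new, B_new

def is_null(E, B, tol=1e-12):
    """True if both invariants vanish, i.e. (8.4.11a) and (8.4.11b) hold."""
    i1, i2 = invariants(E, B)
    return abs(i1) < tol and abs(i2) < tol

# Plane-wave-like field: E along y, B along z, equal magnitudes.
E_wave, B_wave = (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)
# A generic (nonnull) field with arbitrary sample components.
E_gen, B_gen = (0.3, -1.2, 0.5), (0.7, 0.4, -0.9)

E_boosted, B_boosted = boost_x(E_gen, B_gen, 0.6)
print(invariants(E_gen, B_gen), invariants(E_boosted, B_boosted))
```

The boosted components differ from the original ones, while the two printed pairs of invariants agree up to rounding; a null field stays null for every such observer.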
[The A(r ) and B(r ) in (8.3.2) may be confused with the 4-potential A and the
magnetic field B, and hence we now denote them by α(r ) and β(r )]. This coordinate
system can not only simplify the line element, but also simplify the components of
the electromagnetic field. The electromagnetic field Fab produced by a charged static
spherically symmetric star is also static and spherically symmetric. The components
Aμ of its electromagnetic 4-potential Aa are independent of the coordinates t, θ, ϕ,
and there is no component tangent to the orbit sphere, i.e., A2 = A3 = 0. Note that Aa
has a gauge freedom: suppose χ is an arbitrary function of r , then Ãa = Aa + ∇a χ
and Aa correspond to the same Fab . From this equation we get
Thus, for any given Aa one can always choose a suitable χ (r ) such that Ã1 = 0, and
hence A0 can be regarded as the only component of Aa . Also from
$$-F_{01} = F_{10} = \partial_1 A_0 = \frac{dA_0}{dr} , \tag{8.4.13}$$
i.e., Fab only has one independent component F01 , whose expression can be obtained
by solving Maxwell’s equations (8.4.3). Equation (8.4.3b) is automatically satisfied
since it follows from F = d A that dF = d(d A) = 0. The coordinate component
form of (8.4.3a) reads
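$$F^{\mu\nu}{}_{;\mu} = 0 , \qquad \nu = 0, 1, 2, 3 . \tag{8.4.14}$$
$$F^{\mu\nu}{}_{;\mu} = \frac{1}{\sqrt{-g}}\frac{\partial}{\partial x^\mu}\left(\sqrt{-g}\,F^{\mu\nu}\right) + \Gamma^{\nu}{}_{\sigma\mu}F^{\mu\sigma} = \frac{1}{\sqrt{-g}}\frac{\partial}{\partial x^\mu}\left(\sqrt{-g}\,F^{\mu\nu}\right) , \tag{8.4.15}$$
and it follows from (8.4.13) and (8.4.12) that the only nonvanishing $\sqrt{-g}\,F^{\mu\nu}$ are
$\sqrt{-g}\,F^{01} = -\sqrt{-g}\,F^{10} = r^2 F_{10}\,e^{-(\alpha+\beta)}\sin\theta$. Hence, when ν = 1, 2, 3, (8.4.14) are
identities, and when ν = 0 it gives
$$\frac{d}{dr}\left[r^2 F_{10}(r)\,e^{-\alpha(r)-\beta(r)}\right] = 0 ,$$
whose general solution is
$$F_{10} = \frac{Q}{r^2}\,e^{\alpha+\beta} , \quad \text{where } Q = \text{constant.} \tag{8.4.16}$$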
So far, an electromagnetic field Fab satisfying Maxwell’s equations has the following
expression:
$$F_{ab} = -\frac{Q}{r^2}\,e^{\alpha+\beta}\,(dt)_a \wedge (dr)_b . \tag{8.4.17}$$
The equation above still contains undetermined functions α(r ) and β(r ), which
should be obtained from Einstein’s equation (8.4.4). From the very beginning, we
have two sets of undetermined functions, namely {Fμν (r )} and {α(r ), β(r )}. Do not
naively think that the former only appears in Maxwell’s equations and the latter only
appears in Einstein’s equation, so that they can be solved independently. In truth, both
of them appear in both sets of equations, and thus the Einstein-Maxwell equations
are coupled: neither set can be solved without the other. Now we
will solve Einstein’s equation Rab = 8π Tab . In order to do so, first we compute the
energy-momentum tensor Tab of Fab . It follows from (8.4.1) and (8.4.12) that the
nonvanishing coordinate components of Tab are
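$$T_{00} = F_{10}^{\,2}\,e^{-2\beta}/8\pi , \qquad T_{11} = -F_{10}^{\,2}\,e^{-2\alpha}/8\pi ,$$
$$T_{22} = r^2 F_{10}^{\,2}\,e^{-2(\alpha+\beta)}/8\pi , \qquad T_{33} = r^2 F_{10}^{\,2}\,e^{-2(\alpha+\beta)}\sin^2\theta/8\pi . \tag{8.4.18}$$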
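The components in (8.4.18) can be cross-checked numerically by evaluating (8.4.1) directly. The sketch below assumes the diagonal line element ds² = −e^{2α}dt² + e^{2β}dr² + r²(dθ² + sin²θ dϕ²), which is the form implied by (8.4.21) to (8.4.26), and uses arbitrary sample values at a single point; the helper name `em_stress_tensor` is hypothetical.

```python
import math

def em_stress_tensor(g_diag, F):
    """T_ab = (1/4pi) (F_ac F_b^c - (1/4) g_ab F_cd F^cd) for a diagonal metric.

    g_diag: the four diagonal components g_00..g_33.
    F: antisymmetric 4x4 nested list of components F_ab.
    """
    g_inv = [1.0 / x for x in g_diag]  # inverse of a diagonal metric
    # Scalar F_cd F^cd, both indices raised with the diagonal inverse metric.
    F2 = sum(F[c][d] ** 2 * g_inv[c] * g_inv[d] for c in range(4) for d in range(4))
    T = [[0.0] * 4 for _ in range(4)]
    for a in range(4):
        for b in range(4):
            FF = sum(F[a][c] * F[b][c] * g_inv[c] for c in range(4))  # F_ac F_b^c
            g_ab = g_diag[a] if a == b else 0.0
            T[a][b] = (FF - 0.25 * g_ab * F2) / (4.0 * math.pi)
    return T

# Arbitrary sample values (assumptions made only for this check).
alpha, beta, r, theta, F10 = 0.3, -0.1, 2.0, 1.1, 0.7
g_diag = [-math.exp(2 * alpha), math.exp(2 * beta), r ** 2, (r * math.sin(theta)) ** 2]
F = [[0.0] * 4 for _ in range(4)]
F[1][0], F[0][1] = F10, -F10  # only F_10 = -F_01 is nonzero

T = em_stress_tensor(g_diag, F)
expected_diag = [
    F10 ** 2 * math.exp(-2 * beta) / (8 * math.pi),
    -F10 ** 2 * math.exp(-2 * alpha) / (8 * math.pi),
    r ** 2 * F10 ** 2 * math.exp(-2 * (alpha + beta)) / (8 * math.pi),
    r ** 2 * F10 ** 2 * math.exp(-2 * (alpha + beta)) * math.sin(theta) ** 2 / (8 * math.pi),
]
trace = sum(T[a][a] / g_diag[a] for a in range(4))  # g^ab T_ab
print([T[a][a] for a in range(4)], trace)
```

The diagonal entries agree with (8.4.18), the off-diagonal entries vanish, and the trace is zero up to rounding, consistent with T = 0 for any electromagnetic field.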
8.4 The Reissner-Nordström Solution 353
On the other hand, the expressions for the nonvanishing coordinate components Rμν
of the Ricci tensor Rab are given by (8.3.5)–(8.3.8), and hence the component equa-
tions for Einstein’s equation (8.4.4), R00 = 8π T00 and R11 = 8π T11 , are equivalent
to
We can easily get from the two equations above that α′ = −β′ (a prime denotes d/dr), which is the same as
(8.3.12) in the process of finding the Schwarzschild solution; hence, here we can also
set α = −β by redefining t. Under this premise, we can see from (8.4.16) that the
remaining two component equations R22 = 8π T22 and R33 = 8π T33 are equivalent
to
$$(r e^{2\alpha})' = 1 - \frac{Q^2}{r^2} .$$
Hence,
$$e^{2\alpha} = 1 + \frac{Q^2}{r^2} + \frac{C}{r} , \tag{8.4.21}$$
and thus
$$e^{2\beta} = \left(1 + \frac{Q^2}{r^2} + \frac{C}{r}\right)^{-1} . \tag{8.4.22}$$
$$F_{10} = \frac{Q}{r^2} . \tag{8.4.24}$$
One can now check that these expressions for α, β and F10 do satisfy (8.4.19) and
(8.4.20). When r is sufficiently large, $Q^2/r^2 \ll |C|/r$, and hence (8.4.23) becomes
approximately
$$ds^2 \cong -\left(1 + \frac{C}{r}\right)dt^2 + \left(1 + \frac{C}{r}\right)^{-1}dr^2 + r^2(d\theta^2 + \sin^2\theta\,d\varphi^2) . \tag{8.4.25}$$
From the physical perspective, when r is sufficiently large, the gravitational field
of a charged spherically symmetric star should approximately obey Newton’s the-
ory of gravity, and the spacetime metric should be approximately the same as the
Schwarzschild metric, and thus C = −2M. On the other hand, the star can be viewed
as a point charge when r is sufficiently large, and the F10 it produces should be equal
to its electric charge divided by r 2 , and hence from (8.4.24) we can see that the phys-
ical meaning of the constant Q is the electric charge of the star. Therefore, ultimately
(8.4.23) can be written as
$$ds^2 = -\left(1 - \frac{2M}{r} + \frac{Q^2}{r^2}\right)dt^2 + \left(1 - \frac{2M}{r} + \frac{Q^2}{r^2}\right)^{-1}dr^2 + r^2(d\theta^2 + \sin^2\theta\,d\varphi^2) , \tag{8.4.26}$$
which is called the Reissner-Nordström line element (or RN line element for
short). It describes the exterior spacetime geometry of a static spherically symmetric
star (object) with a mass M and electric charge Q, whose corresponding electromag-
netic field Fab and 4-potential Aa are
$$F_{ab} = -\frac{Q}{r^2}\,(dt)_a \wedge (dr)_b , \qquad A_a = -\frac{Q}{r}\,(dt)_a . \tag{8.4.27}$$
The metric gab expressed by (8.4.26) together with the electromagnetic field expressed
by (8.4.27) form the RN solution of the Einstein-Maxwell equations.
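Two properties of the RN metric function used in the derivation above can be checked with a few lines of code: that r e^{2α} has derivative 1 − Q²/r², and that the metric approaches the Schwarzschild form at large r. This is a minimal numerical sketch with arbitrary sample values of M and Q; the names `rn_f` and `d_dr` are hypothetical helpers, and the derivative is approximated by a central difference.

```python
def rn_f(r, M, Q):
    """e^{2 alpha} = 1 - 2M/r + Q^2/r^2, the coefficient of -dt^2 in (8.4.26)."""
    return 1.0 - 2.0 * M / r + Q * Q / (r * r)

def d_dr(func, r, h=1e-5):
    """Central-difference approximation of d(func)/dr at r."""
    return (func(r + h) - func(r - h)) / (2.0 * h)

M, Q = 1.0, 0.6  # sample mass and charge (geometrized units assumed)

# Check (r e^{2 alpha})' = 1 - Q^2 / r^2 at a few radii.
residuals = []
for r in (2.5, 5.0, 40.0):
    lhs = d_dr(lambda x: x * rn_f(x, M, Q), r)
    rhs = 1.0 - Q * Q / (r * r)
    residuals.append(abs(lhs - rhs))

# Far from the star the charge term is negligible next to the mass term.
r_far = 1000.0
schwarzschild_gap = abs(rn_f(r_far, M, Q) - (1.0 - 2.0 * M / r_far))
print(residuals, schwarzschild_gap)
```

The gap to the Schwarzschild value equals Q²/r², which is why the asymptotic comparison fixes C = −2M without involving Q.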
Now let us have some discussion on the electromagnetic field of the RN solution.
From (8.4.27) we can easily obtain $F_{ab}F^{ab} = -2Q^2/r^4 \neq 0$, and thus the Fab of RN
spacetime is a nonnull electromagnetic field. It is often said that the electromagnetic
field of RN spacetime is an electrostatic field. To understand this statement, one
should notice that an observer needs to be specified when talking about an electric
field and magnetic field. Now we will show that the electric field and magnetic field
for the Fab of an RN solution measured by a static observer G are, respectively, an
electrostatic field and zero. The 4-velocity of G is
Normalizing the dual coordinate basis vectors (dr )a , (dθ )a , (dϕ)a , we have the
orthonormal spatial triad of G:
It is easy to show that (Exercise 8.5) the electric field $E_a \equiv F_{ab}Z^b$ and magnetic
field $B_a \equiv -{}^*F_{ab}Z^b$ measured by G are $E_a = \frac{Q}{r^2}(e^1)_a$ and $B_a = 0$, or
$$E^a = \frac{Q}{r^2}\,(e_1)^a , \qquad B^a = 0 \qquad [\text{where } (e_1)^a \equiv f^{1/2}(\partial/\partial r)^a] . \tag{8.4.28}$$
r2
Thus, the result of Fab measured by a static observer in RN spacetime is an elec-
trostatic field generated by a point charge Q and with no magnetic field, which also
confirms the fact that Fab is nonnull.4
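Both statements about the RN field, the value of the invariant and the Coulomb form of the measured electric field, can be verified numerically from the components. The sketch below assumes f = 1 − 2M/r + Q²/r² > 0 at the chosen radius and a static observer with 4-velocity Z^a = f^{−1/2}(∂/∂t)^a (the normalized form of the timelike Killing direction); the function name is hypothetical.

```python
import math

def rn_field_check(r, M, Q):
    """Return (F_ab F^ab, |E|) for the RN field at radius r.

    Assumes f(r) > 0, so that a static observer exists at r.
    """
    f = 1.0 - 2.0 * M / r + Q * Q / (r * r)
    g_inv_tt, g_inv_rr = -1.0 / f, f      # inverse metric components g^00, g^11
    F_tr = -Q / (r * r)                   # F_01 read off from (8.4.27)
    # F_ab F^ab = F_01 F^01 + F_10 F^10 = 2 F_01^2 g^00 g^11
    invariant = 2.0 * F_tr ** 2 * g_inv_tt * g_inv_rr
    Z_t = 1.0 / math.sqrt(f)              # Z^0 of the static observer (assumed form)
    E_r = (-F_tr) * Z_t                   # E_1 = F_10 Z^0, the only nonzero component
    E_magnitude = math.sqrt(g_inv_rr * E_r ** 2)
    return invariant, E_magnitude

r, M, Q = 5.0, 1.0, 0.6  # sample values
invariant, E_magnitude = rn_field_check(r, M, Q)
print(invariant, E_magnitude)
```

The factors of f cancel in both results, leaving −2Q²/r⁴ for the invariant and Q/r² for the field strength.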
4 In Volume II we will introduce the electromagnetic duality transformation, which only changes
the formulation but does not change the essence of the physics. For instance, one can either say that
8.5 Axisymmetric Metrics [Optional Reading] 355
If we do not assume that the metric is static, i.e., we change the α(r ) and β(r )
in (8.4.12) to α(t, r ) and β(t, r ), then we will arrive at exactly the same result as
we obtained above. [For details of the derivation, see Carmeli (1982)]. This can be
regarded as a generalization of Birkhoff’s theorem: the electrovacuum spherically
symmetric solution to Einstein’s equation must be the RN solution.
Many celestial bodies also have rotation. Due to the rotation, the symmetry of a spherically
symmetric star will be degraded to axial symmetry. Moreover, an axisymmetric matter dis-
tribution will have axial symmetry whether or not it has any rotation with respect to the
axis. Mathematically speaking, a metric gab is said to be axisymmetric if there exists a one-
parameter group of isometries whose orbits (except for the fixed points) are closed spacelike
curves. Thus, in an axisymmetric spacetime there exists a spatial Killing vector field ψ a
whose integral curves are closed curves. An axisymmetric metric gab is said to be stationary
axisymmetric if it has a timelike Killing field ξ a , and ξ a commutes with the Killing field ψ a
which represents the axial symmetry:
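$$\frac{\partial g_{\mu\nu}}{\partial t} = (\mathcal{L}_\xi g)_{\mu\nu} = 0 , \qquad \frac{\partial g_{\mu\nu}}{\partial \varphi} = (\mathcal{L}_\psi g)_{\mu\nu} = 0 , \tag{8.5.2}$$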
and hence gμν can only be functions of x 2 and x 3 . In order to further simplify the solving
process, here we only discuss the stationary axisymmetric metrics satisfying the following
condition: ∀ p ∈ M, ∃ a 2-dimensional surface S passing through p and orthogonal to both
ξ a | p and ψ a | p . That is, for any vector u a at p that is tangent to S we have gab u a ξ b | p =
gab u a ψ b | p = 0. (Note that since a 2-dimensional surface in a 4-dimensional spacetime is
not a hypersurface, it has more than one linearly independent normal vector, see Fig. 8.10).
Many important stationary axisymmetric metrics satisfy this condition. Choose an arbitrary
coordinate system {x 2 , x 3 } on an orthogonal surface S0 , carry x 2 and x 3 to any point outside
S0 using the integral curves of ξ a and ψ a (i.e., set the x 2 and x 3 on each integral curve
as constants), and set the zeros of the Killing parameters t and ϕ such that t and ϕ are
constants on each orthogonal surface S (from a proposition similar to Proposition 8.1.1
we can see that this is always possible). In this way we obtain a local coordinate system
{x 0 ≡ t, x 1 ≡ ϕ, x 2 , x 3 }, where the coordinate lines of x 0 and x 1 are the integral curves of
ξ a and ψ a , respectively, while the coordinate lines of x 2 and x 3 lie on the orthogonal surface
S. Thus, the components gμν of gab in this system satisfy
a charged static star carries electric charge but no magnetic charge, or say that it carries magnetic
charge but no electric charge, or even say that it has both electric and magnetic charges (and the
amount is flexible, as long as the sum of the squares of them is invariant). When we discuss the RN
solution in this section we adopt the most common formulation, i.e., the star carries only electric
charge but no magnetic charge, and the corresponding electromagnetic field has only an electrostatic
field but no magnetic field.
Fig. 8.10 S is a 2-dimensional surface orthogonal to both ξ a and ψ a (with one dimension suppressed in the figure)
Let V ≡ −g00 = −gab ξ a ξ b , W ≡ g01 = gab ξ a ψ b , X ≡ g11 = gab ψ a ψ b , then the line ele-
ment can be expressed as
From (8.5.2) we know that V, X, W, g22 , g33 , g23 can only be functions of x 2 and x 3 , and thus
solving Einstein’s equation can be boiled down to the problem of finding these 6 functions
of two variables. However, the problem can be further simplified. Define a function ρ using
the following equation:
$$\rho^2 := VX + W^2 . \tag{8.5.4}$$
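The quantity in (8.5.4) is minus the determinant of the (t, ϕ) block of the metric, and a small example shows why the symbol ρ is a natural choice. The sketch below evaluates it for flat spacetime in cylindrical coordinates and, as an additional illustration not taken from the text, for the same spacetime in uniformly rotating coordinates ϕ′ = ϕ − Ωt, where a cross term g_{tϕ′} appears. The function name is hypothetical.

```python
import math

def rho_squared(g_tt, g_tphi, g_phiphi):
    """rho^2 = V X + W^2 with V = -g_tt, W = g_tphi, X = g_phiphi, as in (8.5.4)."""
    V, W, X = -g_tt, g_tphi, g_phiphi
    return V * X + W * W

R = 1.5        # cylindrical radius of the sample point
Omega = 0.2    # angular velocity of the rotating coordinates (Omega * R < 1)

# Minkowski metric in cylindrical coordinates: ds^2 = -dt^2 + R^2 dphi^2 + ...
static_value = rho_squared(-1.0, 0.0, R ** 2)

# Same spacetime with phi = phi' + Omega t:
# ds^2 = -(1 - Omega^2 R^2) dt^2 + 2 Omega R^2 dt dphi' + R^2 dphi'^2 + ...
rotating_value = rho_squared(-(1.0 - Omega ** 2 * R ** 2), Omega * R ** 2, R ** 2)
print(static_value, rotating_value)
```

In both coordinate systems the result is R², so in flat spacetime ρ reduces to the usual cylindrical radius even when W ≠ 0.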
V, X, W are not functions of t and ϕ, which leads to ξ a ∇a ρ = ∂ρ/∂t = 0 and ψ a ∇a ρ =
∂ρ/∂ϕ = 0, i.e., ∇ a ρ is orthogonal to ξ a and ψ a , and thus is tangent to each S. We will do
two things on the surface S0 : ① choose ρ as the second coordinate x 2 , ② take any constant-ρ
line and define arbitrarily a 1-dimensional coordinate z on the line, and then carry z to the
other points on S0 using the integral curves of ∇ a ρ. The coordinate basis vector (∂/∂ρ)a of
the 2-dimensional coordinate system {x 2 ≡ ρ, x 3 ≡ z}5 obtained in this way is orthogonal
to (∂/∂z)a , and hence g23 | S0 = 0. Carry the x 2 and x 3 outside S0 using the integral curves
of ξ a and ψ a as we mentioned above, then we get a coordinate system {x μ }, in which
x 0 ≡ t, x 1 ≡ ϕ, x 2 ≡ ρ, x 3 ≡ z. Two points need to be elucidated: ① ρ is defined by
(8.5.4), while x 2 ≡ ρ is only defined on S0 , and then we carry it outside the surface. Why
do we also have x 2 ≡ ρ outside the surface? This is the outcome of ξ a ∇a ρ = 0, ξ a ∇a x 2 =
0 (requirements of the carry method) [and the corresponding ψ a ∇a ρ = 0, ψ a ∇a x 2 = 0]
together with (x 2 − ρ)| S0 = 0. ② From g23 | S0 = 0, ξ c ∇c g23 = 0 and ψ c ∇c g23 = 0 one can
easily see that g23 = 0 holds on the whole coordinate patch. The proof of the latter two
equations is as follows (here we only take ξ c ∇c g23 = 0 as an example):
5 This definition will be invalid when ∇a ρ = 0, and hence the coordinate patch does not contain
points with ∇a ρ = 0.
8.6 Plane Symmetric Metrics [Optional Reading] 357
$$ds^2 = -V(dt - w\,d\varphi)^2 + V^{-1}\left[\rho^2 d\varphi^2 + e^{2\gamma}(d\rho^2 + dz^2)\right] , \qquad \gamma \equiv \frac{1}{2}\ln(V\Omega^2) . \tag{8.5.6}$$
The equation above indicates that the undetermined functions of two variables are now
reduced from 4 to 3, namely V (ρ, z), w(ρ, z) and γ (ρ, z). In the special case of V = 1, w =
γ = 0, the equation above will turn into the line element expression of the Minkowski metric
in the cylindrical coordinate system
ds 2 = −dt 2 + ρ 2 dϕ 2 + dρ 2 + dz 2 .
Readers interested in the derivation of (8.5.6) may refer to Chap. 20 in Stephani et al. (2003),
while those who only want to see the conclusion and a sketch of the derivation may refer to
Wald (1984) pp. 166–168.
An important example of a stationary axisymmetric solution to the vacuum Einstein equation
is the Kerr solution, which describes the exterior spacetime geometry of a particular kind of
uncharged rotating star,6 see Chap. 13 for details.
If an axisymmetric metric also has translational invariance along the axis of symmetry, then
it is called a cylindrically symmetric metric. Precisely speaking, besides the Killing vector
field reflecting the axial symmetry, for a cylindrically symmetric metric there also exists a
Killing vector field ηa reflecting the “translational invariance along the axis”, which satisfies
① [η, ψ]a = 0; ② the integral curves of ηa are homeomorphic to R.
Readers interested in cylindrically symmetric metrics may refer to Chap. 22 in Stephani et al.
(2003).
Before the definition of a spherically symmetric metric was given in Sect. 8.2, we have
discussed the symmetry of a 2-dimensional surface (S 2 , h ab ) in 3-dimensional Euclidean
space. In a similar sense, we shall go over the symmetry of a 2-dimensional Euclidean plane
(R2 , δab ) before introducing the definition of a plane symmetric metric. In a simple manner,
we have found all 3 independent Killing vector fields of (R2 , δab ) in Example (1) of Sect.
4.3, i.e., ξ1a ≡ (∂/∂ x)a and ξ2a ≡ (∂/∂ y)a reflecting the translational invariance and ξ3a ≡
−y(∂/∂ x)a + x(∂/∂ y)a reflecting the rotational invariance. From the linear combinations
of ξ1a , ξ2a , ξ3a one can have infinitely many Killing vector fields (note that the coefficients
should be constants instead of functions on R2 ), and the corresponding isometries form a
3-parameter group of isometries, called the Euclidean group, denoted by E(2) (see Sect.
G.5.5 in Volume II for details). Following the definition of a spherically symmetric metric
(see Definition 1 of Sect. 8.2), we have the following definition of a plane symmetric metric:
Definition 1 A spacetime metric gab is said to be plane symmetric if its group of isometries
has a subgroup G 3 that is isomorphic to E(2), and all the orbits of G 3 are 2-dimensional
planes.
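The remark that only constant-coefficient combinations of ξ1, ξ2, ξ3 are Killing fields can be tested directly. In Cartesian coordinates on (R², δab) Killing's equation reduces to ∂_i ξ_j + ∂_j ξ_i = 0, which the sketch below evaluates by central differences at a sample point. The helper `killing_defect` and the sample fields are illustrative assumptions.

```python
def killing_defect(xi, x, y, h=1e-5):
    """Largest |d_i xi_j + d_j xi_i| at (x, y) for the flat metric in Cartesian coordinates."""
    def d(i, j):  # partial derivative of component j along coordinate i
        if i == 0:
            return (xi(x + h, y)[j] - xi(x - h, y)[j]) / (2.0 * h)
        return (xi(x, y + h)[j] - xi(x, y - h)[j]) / (2.0 * h)
    return max(abs(d(i, j) + d(j, i)) for i in range(2) for j in range(2))

xi1 = lambda x, y: (1.0, 0.0)    # translation along x
xi2 = lambda x, y: (0.0, 1.0)    # translation along y
xi3 = lambda x, y: (-y, x)       # rotation about the origin

# Constant-coefficient combination 2*xi1 - xi2 + 3*xi3: still a Killing field.
combo = lambda x, y: (2.0 - 3.0 * y, -1.0 + 3.0 * x)
# Coefficient depending on position, x * xi1: not a Killing field.
scaled = lambda x, y: (x, 0.0)

point = (0.7, -1.3)
defects = {name: killing_defect(field, *point)
           for name, field in [("xi1", xi1), ("xi2", xi2), ("xi3", xi3),
                               ("combo", combo), ("scaled", scaled)]}
print(defects)
```

The first four defects vanish up to rounding, while the position-dependent coefficient produces a defect of order one.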
H. Taub proved the following theorem [Taub (1951)]: a plane symmetric solution to the
vacuum Einstein equation must be a static metric, whose line element expression is
$$ds^2 = \frac{1}{\sqrt{1 + kZ}}\,(-dT^2 + dZ^2) + (1 + kZ)(dX^2 + dY^2) , \tag{8.6.1}$$
6 Not all exterior spacetimes of uncharged rotating stars can be described by the Kerr solution;
see Hawking and Ellis (1973) p. 161 for this caveat.
where k is a constant. The coefficient of (−dT 2 + dZ 2 ) being positive indicates that T and
Z are respectively timelike and spacelike coordinates. Since the metric components do not
depend on T , (∂/∂ T )a is a timelike Killing field, and thus the metric is static. At
the beginning, Taub’s paper only required the metric to have the plane symmetry, i.e., it only
required three Killing vector fields (∂/∂ X )a , (∂/∂Y )a and −Y (∂/∂ X )a + X (∂/∂Y )a , based
on which he showed that it must contain the fourth (extra) Killing vector field (∂/∂ T )a .
This is very much like Birkhoff’s theorem. Moreover, Taub’s original theorem has the same
shortcoming as Birkhoff’s theorem: it omitted another possibility when deriving (8.6.1)
which is on an equal footing with it. In fact, it can be proved from the vacuum condition
and the plane symmetry that the metric will have either the form of (8.6.1) or the following
form:
$$ds^2 = -\frac{1}{\sqrt{1 + kZ}}\,(-dT^2 + dZ^2) + (1 + kZ)(dX^2 + dY^2) . \tag{8.6.2}$$
The coefficient of (−dT 2 + dZ 2 ) in the above equation is negative, which means Z is a
timelike coordinate and T is a spacelike coordinate. That the metric components do not depend on
T indicates that (∂/∂ T )a is a spacelike Killing field; together with the other two spatial Killing
fields (∂/∂ X )a and (∂/∂Y )a , this indicates that the spacetime is spatially homogeneous,
since it has the translational invariance in the three spatial directions (represented by the T -,
X - and Y -axes). This metric does not have a timelike Killing vector field, and hence is not
static. Thus, Taub’s theorem should be revised as follows: a plane symmetric solution to the
vacuum Einstein equation is either static or spatially homogeneous.
Another drawback of Taub’s original paper is that (8.6.1) contains an arbitrary constant
k, which may mislead people to think that (8.6.1), just like the Schwarzschild metric, is
a one-parameter family of metrics. (Indeed, the parameter M of the Schwarzschild metric
indicates that it is a one-parameter family). In the case k ≠ 0, we introduce new coordinates
$t = k^{-1/3}T$, $z = k^{-4/3}(1 + kZ)$, $x = k^{2/3}X$ and $y = k^{2/3}Y$, then (8.6.1) and (8.6.2) will
turn into
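The substitution can be carried out explicitly. The following is a sketch of that computation, assuming k ≠ 0 and 1 + kZ > 0; the compact ± notation, with the upper sign for (8.6.1) and the lower sign for (8.6.2), is introduced here only for brevity:

```latex
% From t = k^{-1/3} T, z = k^{-4/3}(1 + kZ), x = k^{2/3} X, y = k^{2/3} Y:
\begin{align*}
dT &= k^{1/3}\,dt , \qquad dZ = k^{1/3}\,dz , \qquad dX = k^{-2/3}\,dx , \qquad dY = k^{-2/3}\,dy , \\
1 + kZ &= k^{4/3} z , \qquad (1 + kZ)^{-1/2} = k^{-2/3} z^{-1/2} , \\
ds^2 &= \pm\, k^{-2/3} z^{-1/2}\, k^{2/3}\,(-dt^2 + dz^2) + k^{4/3} z\, k^{-4/3}\,(dx^2 + dy^2) \\
     &= \pm\, z^{-1/2}\,(-dt^2 + dz^2) + z\,(dx^2 + dy^2) .
\end{align*}
```

The constant k no longer appears in the result, which is the sense in which it is not a genuine parameter of the family.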
C2 4π 2
F12 = C1 , F30 = Y Y −2 , A≡ (C1 + C22 ) , C1 , C2 are constants.
2 C
(8.6.5)
When Fab = 0, the metric (8.6.3) will be simplified to (8.6.1′) [for (∇a Y )∇ a Y < 0] or
(8.6.2′) [for (∇a Y )∇ a Y > 0].
The expression (8.6.3) represents the plane symmetric metric produced by a plane symmetric
electromagnetic field Fab . The so-called plane symmetric electromagnetic field refers to
ξ1a ≡ (∂/∂ x)a , ξ2a ≡ (∂/∂ y)a , ξ3a ≡ −y(∂/∂ x)a + x(∂/∂ y)a . (8.6.7)
It is not difficult to verify that (Exercise 8.9) the Fab in (8.6.5) satisfies (8.6.6). However,
a plane symmetric metric can also be produced by a non-plane symmetric electromagnetic
field. An electromagnetic field with only translational symmetries but no rotational symmetry
[i.e., (8.6.6) only holds for i = 1, 2] is called a semi-plane symmetric electromagnetic
field (“2/3-plane symmetric” may be more appropriate). Some special solutions of a plane
symmetric metric produced by this kind of electromagnetic field are scattered in the literature.
Li and Liang (1985) found the general solutions of plane symmetric metrics produced by
semi-plane symmetric electromagnetic fields, and classified them into two types:
$$\text{Type A} \qquad ds^2 = \pm\frac{J(T+Z)}{\sqrt{T}}\,(-dT^2 + dZ^2) + T\,(dX^2 + dY^2) , \tag{8.6.8a}$$
$$\text{Type B} \qquad ds^2 = \pm\frac{J(T+Z)}{\sqrt{T+Z}}\,(-dT^2 + dZ^2) + (T+Z)(dX^2 + dY^2) , \tag{8.6.8b}$$
where J (T + Z ) is an arbitrary function satisfying J˙/J > 0 ( J˙ ≡ ∂ J/∂ T ).7 The electro-
magnetic field corresponding to (8.6.8a) and (8.6.8b) is a semi- (2/3-) plane symmetric
source-free null electromagnetic field. The general solutions (8.6.3) and (8.6.8) correspond
to a nonnull, plane symmetric and a null, semi-plane symmetric source-free electromagnetic
field, respectively. It is natural to ask: is there any plane symmetric metric produced by an
electromagnetic field (no matter what symmetry it has) other than (8.6.3) and (8.6.8)? Kuang
et al. (1987) proved that: ① The plane symmetric metrics produced by electromagnetic fields
only have three types, namely (8.6.3), (8.6.8a) and (8.6.8b) (and the line elements obtained
from them by coordinate transformations); ② The plane symmetric metric (8.6.3) cannot be
produced by an electromagnetic field with source; ③ Plane symmetric metrics (8.6.8a) and
(8.6.8b) can also be produced by electromagnetic fields with source, i.e., every metric of
type A or B can be interpreted as either being produced by a source-free electromagnetic
field or an electromagnetic field with source. [These two interpretations correspond to the
same energy-momentum tensor Tab , called a dual interpretation.8 ] Both of them are null
electromagnetic fields; the former is semi- (2/3-) plane symmetric (has only translational
7 The line elements given by two different functions J (T + Z ) based on either (8.6.8a) or (8.6.8b)
could differ only by a coordinate transformation (i.e., one can be obtained from the other via a
coordinate transformation). Such two line elements represent the same geometry, and thus such two
functions J (T + Z ) are said to be equivalent. To figure out all the different geometries described by
(8.6.8a) and (8.6.8b), one needs to find the criterion for determining whether two arbitrary functions
J (T + Z ) are equivalent. This necessary and sufficient criterion was found in Kuang et al. (1986).
8 The energy-momentum tensor of the source of the electromagnetic field (dust) should also
appear on the right-hand side of Einstein’s equation, just like the energy-momentum tensor Tab of
the electromagnetic field, which makes the question very complicated. One simplified treatment
is to stipulate that the energy-momentum tensor of the source vanishes; see Tariq and Tupper (1976) for its physical meaning.
symmetries but no rotational symmetry), while the latter one, on the contrary, has only rotational
symmetry but no translational symmetry (i.e., $\mathcal{L}_{\xi_3}F_{ab} = 0$, $\mathcal{L}_{\xi_1}F_{ab} \neq 0$, $\mathcal{L}_{\xi_2}F_{ab} \neq 0$),
which may also be called a semi-plane symmetric electromagnetic field (of another kind), or
more precisely a 1/3-plane symmetric electromagnetic field. With this, the plane symmetric
metrics produced by electromagnetic fields are finally exhausted.
The fact that a plane symmetric metric can be produced by a semi-plane symmetric elec-
tromagnetic field indicates that the symmetry of the electromagnetic field can be weaker
than the symmetry of the metric. It is natural to ask: can the symmetry of the metric be
weaker than the symmetry of the electromagnetic field? For example, does there exist a
semi-plane symmetric metric produced by a plane symmetric electromagnetic field? The
answer is affirmative: Li and Liang (1989) provided a specific example (a special solution).
We mention in passing that the three Killing fields reflecting the spherical symmetry are on an
equal footing, and there does not exist any spherically symmetric metric produced by a semi-
(2/3- or 1/3-) spherically symmetric electromagnetic field. A spherically symmetric metric
produced by an electromagnetic field can only be the RN metric, whose electromagnetic
field can only be a spherically symmetric, source-free nonnull electromagnetic field.
Besides the coordinate basis method and the orthonormal tetrad method, there is also a third
commonly used method of computing curvature, that is the “null tetrad method” proposed
by Newman and Penrose (1962). This method can be viewed as a variant of the rigid tetrad
method: instead of using an orthonormal tetrad, here one uses a complex9 “null tetrad”.
Suppose p is a point of a 4-dimensional spacetime (M, gab ), and {(eμ )a } is an orthonormal
tetrad at p. Define 4 special vectors at p as follows:
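$$m^a := \frac{1}{\sqrt{2}}\left[(e_1)^a - i(e_2)^a\right] , \qquad \bar{m}^a := \frac{1}{\sqrt{2}}\left[(e_1)^a + i(e_2)^a\right] ,$$
$$l^a := \frac{1}{\sqrt{2}}\left[(e_0)^a - (e_3)^a\right] , \qquad k^a := \frac{1}{\sqrt{2}}\left[(e_0)^a + (e_3)^a\right] , \tag{8.7.1}$$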
then gab m a m b = gab m̄ a m̄ b = gab l a l b = gab k a k b = 0, i.e., all 4 of them are null vectors.
Note that m a and m̄ a are both complex vectors conjugate to each other. To distinguish from
other tetrads, this text will use {(εμ )a } to represent a null tetrad, and stipulate the numbering
as [in agreement with Stephani et al. (2003)]
(εμ )a can be regarded as a special case of an arbitrary basis field (eμ )a , which we mentioned
at the beginning of Sect. 5.7; however, one should not confuse this with the (eμ )a in (8.7.1),
which only refers to an orthonormal tetrad. It is not difficult to see that the inner product of
any two basis vectors in a null tetrad has only the following two pairs of nonzero ones:
and thus the matrices constituted by the components gμν and g μν of the metric gab and its
inverse g ab are
$$(g_{\mu\nu}) = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & -1 \\ 0 & 0 & -1 & 0 \end{pmatrix} = (g^{\mu\nu}) . \tag{8.7.3}$$
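The matrix (8.7.3) can be reproduced numerically from the definitions (8.7.1). The sketch below works at a single point with an orthonormal tetrad of signature (−, +, +, +) and assumes the numbering (ε1, ε2, ε3, ε4) = (m, m̄, l, k), which is the ordering implied by the spin-coefficient definitions (8.7.8). Note that the inner product is bilinear, with no complex conjugation.

```python
import math

ETA = [-1.0, 1.0, 1.0, 1.0]  # metric components in the orthonormal tetrad

def inner(u, v):
    """g_ab u^a v^b, extended bilinearly to complex vectors (no conjugation)."""
    return sum(ETA[i] * u[i] * v[i] for i in range(4))

s = 1.0 / math.sqrt(2.0)
e0, e1, e2, e3 = ([1.0 if i == j else 0.0 for i in range(4)] for j in range(4))

m = [s * (a - 1j * b) for a, b in zip(e1, e2)]
m_bar = [s * (a + 1j * b) for a, b in zip(e1, e2)]
l = [s * (a - b) for a, b in zip(e0, e3)]
k = [s * (a + b) for a, b in zip(e0, e3)]

tetrad = [m, m_bar, l, k]  # assumed numbering: eps_1 = m, eps_2 = m_bar, eps_3 = l, eps_4 = k
G = [[inner(u, v) for v in tetrad] for u in tetrad]

EXPECTED = [[0, 1, 0, 0],
            [1, 0, 0, 0],
            [0, 0, 0, -1],
            [0, 0, -1, 0]]
max_error = max(abs(G[i][j] - EXPECTED[i][j]) for i in range(4) for j in range(4))
print(max_error)
```

The vanishing diagonal confirms that all four vectors are null, and the only nonzero products are g(m, m̄) = 1 and g(l, k) = −1.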
Just like in Sect. 5.7, the number indices μ of $(\varepsilon_\mu)^a$ and $(\varepsilon^\mu)_a$ can also be raised and lowered
using g μν and gμν . Applying (5.7.5) to a null tetrad yields
Equation (8.7.3) indicates that (εμ )a is a (complex) rigid tetrad, and hence we have $\omega_{\mu\nu a} = (\varepsilon_\mu)^b \nabla_a (\varepsilon_\nu)_b$
and $\omega_{\mu\nu a} = -\omega_{\nu\mu a}$ (i.e., ωμν = −ωνμ ), and for the corresponding Ricci
rotation coefficients
Proposition 8.7.1 If we exchange all the 1s and 2s in the subscripts of ωμνρ (and keep
all the 3s and 4s unchanged), we obtain its complex conjugate ω̄μνρ , e.g., ω134 = ω̄234 ,
ω342 = ω̄341 , ω421 = ω̄412 , ω122 = ω̄211 , ω344 = ω̄344 .
Proof It follows from (8.7.5) that ω̄μνρ = (ε̄μ )b (ε̄ρ )a ∇a (ε̄ν )b , and it is not difficult to prove
this proposition using this equation. For example,
Proposition 8.7.1 not only holds for ωμνρ , but also holds for all the quantities (including
tensors) that carry null tetrad indices, e.g., ω41 = ω̄42 , ω21 = ω̄12 , R31 = R̄32 , R12 = R̄21 ,
R34 = R̄34 .
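The index rule of Proposition 8.7.1 is mechanical enough to state as a small function: swapping 1 ↔ 2 in the null tetrad indices gives the label of the complex conjugate. The sketch below only manipulates index labels (it does not compute any connection coefficients) and checks the examples quoted in the text; the function names are hypothetical.

```python
def conjugate_label(indices):
    """Swap 1 <-> 2 in a tuple of null-tetrad indices; 3 and 4 stay unchanged."""
    swap = {1: 2, 2: 1, 3: 3, 4: 4}
    return tuple(swap[i] for i in indices)

def is_manifestly_real(indices):
    """A component whose label is unchanged by the swap equals its own conjugate."""
    return conjugate_label(indices) == tuple(indices)

# Examples from the text, written as label -> label of the complex conjugate.
omega_examples = {
    (1, 3, 4): (2, 3, 4),
    (3, 4, 2): (3, 4, 1),
    (4, 2, 1): (4, 1, 2),
    (1, 2, 2): (2, 1, 1),
    (3, 4, 4): (3, 4, 4),
}
two_index_examples = {
    (4, 1): (4, 2),
    (2, 1): (1, 2),
    (3, 1): (3, 2),
    (1, 2): (2, 1),
    (3, 4): (3, 4),
}
checks = [conjugate_label(key) == value
          for table in (omega_examples, two_index_examples)
          for key, value in table.items()]
print(all(checks), is_manifestly_real((3, 4, 4)))
```

Labels that contain only 3s and 4s, such as (3, 4, 4), are mapped to themselves, which is how the real components are identified.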
The process of computing the curvature tensor using the null tetrad method is similar to that
using the orthonormal tetrad method; that is, one finds all the connection 1-forms ωμν of
the chosen null tetrad and then finds all the curvature 2-forms Rμν . The components ωμνρ
of the connection 1-forms can still be computed from (5.7.19) and (5.7.20), in which the
(eμ )a should now be interpreted as (εμ )a . After finding all the ωμν one can still use Cartan’s
second equation of structure to compute all the Rμν .
Proposition 8.7.2 In a null tetrad, Cartan’s second structure equation (5.7.8) reads
Proof When we have a metric gab , Cartan’s second equation (5.7.8) can be written as
where g λτ are the components of g ab in the null tetrad. Noticing that the only nonzero g λτ
are g 12 = g 21 = 1 and g 34 = g 43 = −1, we can write down all 6 independent components
of Rμν as follows:
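$$R_{43} = d\omega_{43} + \omega_{32}\wedge\omega_{41} + \overline{\omega_{32}\wedge\omega_{41}} = d\omega_{43} + 2\,\mathrm{Re}(\omega_{32}\wedge\omega_{41}) , \tag{8.7.7a$'$}$$
$$R_{21} = d\omega_{21} + \omega_{32}\wedge\omega_{41} - \overline{\omega_{32}\wedge\omega_{41}} = d\omega_{21} + 2i\,\mathrm{Im}(\omega_{32}\wedge\omega_{41}) , \tag{8.7.7f$'$}$$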
These two equations together are equivalent to (8.7.6c). Therefore, (8.7.7a)–(8.7.7f) are
equivalent to (8.7.6a)–(8.7.6c).
The whole formalism introduced by Newman and Penrose based on the null tetrad method
is called the Newman-Penrose formalism, or NP formalism for short. The basic idea of
the NP formalism is to separate all kinds of sets of condensed equations [e.g., (8.7.6)] into
multiple component equations, which will certainly lead to the appearance of quantities with
many indices, such as ωμνρ , Rρσ μν , etc. For the sake of making the equations look simpler
(and other purposes), the NP formalism uses many notations which carry fewer or no indices
to represent these quantities with many indices. We will introduce all three kinds of them as
follows:
(1) Due to Proposition 8.7.1, only 12 out of the 24 linear combinations of the complex
ωμνρ are linearly independent. (Comparing with the fact that there are 24 linearly independent
real ωμνρ in an orthonormal tetrad, you will find this is quite natural). Use 12 Greek letters
without indices to represent 12 linearly independent combinations of ωμνρ as follows [(8.7.5)
is used]:
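$$\kappa \equiv -\omega_{144} = -m^a k^b \nabla_b k_a , \tag{8.7.8a}$$
$$\rho \equiv -\omega_{142} = -m^a \bar{m}^b \nabla_b k_a , \tag{8.7.8b}$$
$$\sigma \equiv -\omega_{141} = -m^a m^b \nabla_b k_a , \tag{8.7.8c}$$
$$\tau \equiv -\omega_{143} = -m^a l^b \nabla_b k_a , \tag{8.7.8d}$$
$$\nu \equiv \omega_{233} = \bar{m}^a l^b \nabla_b l_a , \tag{8.7.8e}$$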
8.7 The Newman-Penrose (NP) Formalism [Optional Reading] 363
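$$\mu \equiv \omega_{231} = \bar{m}^a m^b \nabla_b l_a , \tag{8.7.8f}$$
$$\lambda \equiv \omega_{232} = \bar{m}^a \bar{m}^b \nabla_b l_a , \tag{8.7.8g}$$
$$\pi \equiv \omega_{234} = \bar{m}^a k^b \nabla_b l_a , \tag{8.7.8h}$$
$$\varepsilon \equiv \frac{1}{2}(\omega_{214} - \omega_{344}) = \frac{1}{2}\left(\bar{m}^a k^b \nabla_b m_a - l^a k^b \nabla_b k_a\right) , \tag{8.7.8i}$$
$$\beta \equiv \frac{1}{2}(\omega_{211} - \omega_{341}) = \frac{1}{2}\left(\bar{m}^a m^b \nabla_b m_a - l^a m^b \nabla_b k_a\right) , \tag{8.7.8j}$$
$$\gamma \equiv \frac{1}{2}(\omega_{433} - \omega_{123}) = \frac{1}{2}\left(k^a l^b \nabla_b l_a - m^a l^b \nabla_b \bar{m}_a\right) , \tag{8.7.8k}$$
$$\alpha \equiv \frac{1}{2}(\omega_{432} - \omega_{122}) = \frac{1}{2}\left(k^a \bar{m}^b \nabla_b l_a - m^a \bar{m}^b \nabla_b \bar{m}_a\right) . \tag{8.7.8l}$$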
These 12 Greek letters are called the spin coefficients.
Proposition 8.7.3 The 24 ωμνρ can be expressed in terms of the 12 spin coefficients as
follows:
Proof Only 8 out of the 24 equations above need to be checked (the others can be read
directly from the definition of the spin coefficients), the verification is as follows.
Firstly, since (ε3 )a and (ε4 )a are real vectors, ω343 and ω344 are real. Secondly, it follows
from ω213 = −ω123 = −ω̄213 that ω213 + ω̄213 = 0, and hence ω213 is imaginary. Similarly
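we can see that ω214 is also imaginary. Also, $\varepsilon \equiv \frac{1}{2}(\omega_{214} - \omega_{344}) = \frac{1}{2}(\omega_{434} + \omega_{214})$, and
hence $\omega_{434} = 2\,\mathrm{Re}(\varepsilon) = \varepsilon + \bar\varepsilon$, $\omega_{214} = 2i\,\mathrm{Im}(\varepsilon) = \varepsilon - \bar\varepsilon$. Similarly, we have $\omega_{433} = \gamma + \bar\gamma$,
$\omega_{213} = \gamma - \bar\gamma$. Furthermore, from the definitions of α and β we get $\beta = -\frac{1}{2}(\omega_{121} + \omega_{341})$,
$\bar\alpha = \frac{1}{2}(\omega_{121} - \omega_{341})$. Thus, $\omega_{341} = -(\bar\alpha + \beta)$, $\omega_{121} = \bar\alpha - \beta$, from which we can easily get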
ω122 = β̄ − α, ω342 = −(α + β̄).
(2) Since the derivatives of spin coefficients along the 4 basis vectors appear frequently in
all kinds of equations, we introduce the following 4 notations for derivatives:
$$\delta \equiv m^a \nabla_a , \qquad \bar\delta \equiv \bar{m}^a \nabla_a , \qquad \Delta \equiv l^a \nabla_a , \qquad D \equiv k^a \nabla_a . \tag{8.7.9}$$
(3) The components of the Riemann tensor Rabc d have 4 indices. We would like to denote
them using notations with fewer indices. Rabc d is determined by its “traceless part” (Weyl
tensor) Cabc d and “trace part” (Ricci tensor) Rab . Due to various symmetries, the Weyl
tensor has only 10 real independent components, which can be represented by 5 complex
quantities Ψ0 , Ψ1 , Ψ2 , Ψ3 , Ψ4 defined as
$$\Psi_0 := C_{4141} , \quad \Psi_1 := C_{4341} , \quad \Psi_2 := \frac{1}{2}(C_{4343} - C_{4312}) , \quad \Psi_3 := C_{3432} , \quad \Psi_4 := C_{3232} , \tag{8.7.10}$$
where Cμνρσ are the components of Cabcd in the null tetrad. The Ricci tensor Rab only has
10 real independent components due to the symmetry Rab = Rba . In the null tetrad, among
the 10 independent components R44 , R43 , R42 , R41 , R33 , R32 , R31 , R22 , R21 , R11 , 6 are
complex and 4 are real. It is obvious that R44 , R43 , R33 are real, and R21 is also real since
R21 = R12 = R̄21 . In terms of linear combinations of these 4 real numbers, one can define
the following 4 real quantities:
$$\Phi_{00} := \frac{1}{2}R_{44} , \quad \Phi_{11} := \frac{1}{4}(R_{21} + R_{43}) , \quad \Phi_{22} := \frac{1}{2}R_{33} , \quad R := 2(R_{21} - R_{43}) . \tag{8.7.11a}$$
The fourth real quantity R is actually the scalar curvature [it is easy to show that the
scalar curvature indeed equals 2(R21 − R43 )]. In terms of the 6 complex components
R42 , R41 , R32 , R31 , R22 , R11 , one can define 6 complex quantities
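$$\Phi_{01} := \frac{1}{2}R_{41} , \quad \Phi_{10} := \frac{1}{2}R_{42} , \quad \Phi_{02} := \frac{1}{2}R_{11} ,$$
$$\Phi_{20} := \frac{1}{2}R_{22} , \quad \Phi_{12} := \frac{1}{2}R_{31} , \quad \Phi_{21} := \frac{1}{2}R_{32} . \tag{8.7.11b}$$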
The above 10 quantities excluding R can be arranged into a 3 × 3 “conjugate symmetric”
¯ τ λ , λ, τ = 0, 1, 2):
matrix [λτ ] (satisfying λτ =
0 1 2
1 1 1
0 2 R44 2 R41 2 R11
1 1
2 R42
1
4 (R 21 + R43 ) 1
2 R31
1 1 1
2 2 R22 R
2 32 2 R33
The 3 independent off-diagonal elements together with the 3 real diagonal elements and the
real number R represent exactly the 10 real independent components of Rab .
The NP formalism contains 3 equation systems that are very useful, namely (A) the NP
equations; (B) the Bianchi identities; (C) the commutation relations. Here we introduce
them as follows.
(A) NP equations.
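Expressing $R_{41}, R_{32}, R_{21}, R_{43}$ in terms of $\Psi_0, \Psi_1, \Psi_2, \Psi_3, \Psi_4$ as well as the 10 quantities $\Phi_{00}, \cdots, \Phi_{22}$ and $R$, and expressing $\omega_{41}, \omega_{32}, \omega_{21}, \omega_{43}$ in terms of the 12 spin coefficients, one can reformulate (8.7.6) into the following 18 equations, called the NP equations:
\[ \delta\lambda - \bar\delta\mu = \nu(\rho - \bar\rho) + \pi(\mu - \bar\mu) + \mu(\alpha + \bar\beta) + \lambda(\bar\alpha - 3\beta) - \Psi_3 + \Phi_{21}, \tag{8.7.12m} \]
\[ \delta\nu - \Delta\mu = (\mu^2 + \lambda\bar\lambda) + \mu(\gamma + \bar\gamma) - \bar\nu\pi + \nu(\tau - 3\beta - \bar\alpha) + \Phi_{22}, \tag{8.7.12n} \]
\[ \delta\gamma - \Delta\beta = \gamma(\tau - \bar\alpha - \beta) + \mu\tau - \sigma\nu - \varepsilon\bar\nu - \beta(\gamma - \bar\gamma - \mu) + \alpha\bar\lambda + \Phi_{12}, \tag{8.7.12o} \]
\[ \delta\tau - \Delta\sigma = (\mu\sigma + \bar\lambda\rho) + \tau(\tau + \beta - \bar\alpha) - \sigma(3\gamma - \bar\gamma) - \kappa\bar\nu + \Phi_{02}, \tag{8.7.12p} \]
\[ \Delta\rho - \bar\delta\tau = -(\rho\bar\mu + \sigma\lambda) + \tau(\bar\beta - \alpha - \bar\tau) + \rho(\gamma + \bar\gamma) + \nu\kappa - \Psi_2 - R/12, \tag{8.7.12q} \]
\[ \Delta\alpha - \bar\delta\gamma = \nu(\rho + \varepsilon) - \lambda(\tau + \beta) + \alpha(\bar\gamma - \bar\mu) + \gamma(\bar\beta - \bar\tau) - \Psi_3. \tag{8.7.12r} \]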
Now we illustrate the verification of the NP equations with some examples. First take (8.7.12a) as an example; it is in fact a reformulation of the fourth and second components of (8.7.6a). In the null tetrad, the component $R_{4241}$ of $R_{abcd}$ can be expressed as
\[ R_{4241} = (\varepsilon_4)^a(\varepsilon_2)^b R_{ab41} = (\varepsilon_4)^a(\varepsilon_2)^b[(d\omega_{41})_{ab} + \omega_{41a} \wedge (\omega_{21b} + \omega_{43b})], \]
where (8.7.6a) is used in the second step. Since $(\omega_{41})_b = \sigma(\varepsilon^1)_b + \rho(\varepsilon^2)_b + \tau(\varepsilon^3)_b + \kappa(\varepsilon^4)_b$, we have
The last step is tedious but not difficult, which is left as an exercise. The operation of lowering
the index of ωμ ν ρ occurs a lot in the derivation, which relies on the expression (8.7.3) for
the components g νσ of g ab in the null tetrad. Since the matrix in (8.7.3) is quite simple, it is
pretty easy to do the calculation. For instance,
Moreover,
(ε4 )a (ε2 )b [ω41a ∧ (ω21b + ω43b )] = κ(ω212 + ω432 ) − ρ(ω214 + ω434 ) = 2κα − 2ρε ,
and hence
\[ \Phi_{00} \equiv \tfrac{1}{2}R_{44} = \tfrac{1}{2}R_{4\mu 4}{}^{\mu} = \tfrac{1}{2}(R_{414}{}^{1} + R_{424}{}^{2} + R_{434}{}^{3}) = \tfrac{1}{2}(R_{4142} + R_{4241} - R_{4344}) = R_{4241}. \tag{8.7.14} \]
Comparing (8.7.13) and (8.7.14) yields (8.7.12a). Thus, (8.7.12a) is nothing but a component equation of (8.7.6a). This may not be apparent to beginning readers, since $\Phi_{00}$, which represents the curvature component, is written on the right-hand side of the equation. Now we introduce the derivation of a more complicated equation (8.7.12f). This is a reformulation of the 4th and 3rd component equations of (8.7.6c). First,
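\[ R_{4321} + R_{4343} = (\varepsilon_4)^a(\varepsilon_3)^b(R_{ab21} + R_{ab43}) = (\varepsilon_4)^a(\varepsilon_3)^b[(d\omega_{21})_{ab} + (d\omega_{43})_{ab} + 2\omega_{32a} \wedge \omega_{41b}], \]
where (8.7.6c) is used in the second equality. Through a tedious but straightforward computation we get
\[ R_{4321} + R_{4343} = 2[(D\gamma - \Delta\varepsilon) - \alpha(\tau + \bar\pi) - \beta(\bar\tau + \pi) + \gamma(\varepsilon + \bar\varepsilon) + \varepsilon(\gamma + \bar\gamma) - \tau\pi + \nu\kappa]. \tag{8.7.15} \]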
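On the other hand, from the definition (8.7.10) we know that $\Psi_2 = (C_{4343} - C_{4312})/2$. Applying the definition of the Weyl tensor [Equation (3.4.14)] to the $n = 4$ case yields
\[ C_{abcd} = R_{abcd} - \tfrac{1}{2}[(g_{ac}R_{db} - g_{ad}R_{cb}) - (g_{bc}R_{da} - g_{bd}R_{ca})] + \tfrac{1}{6}R(g_{ac}g_{db} - g_{ad}g_{cb}). \]
Noticing (8.7.3), we have $C_{4343} = R_{4343} - R_{34} - R/6$, $C_{4312} = R_{4312}$, and hence
\[ \Psi_2 = \tfrac{1}{2}(R_{4343} - R_{4312}) - \tfrac{1}{2}R_{34} - \tfrac{1}{12}R. \tag{8.7.16} \]
It follows from (8.7.11a) that $\Phi_{11} = (R_{12} + R_{43})/4$ and $R = 2(R_{12} - R_{34})$, so that $R_{34} = 2\Phi_{11} - R/4$, and hence, using $R_{4321} = -R_{4312}$,
\[ R_{4321} + R_{4343} = 2(\Psi_2 + \Phi_{11} - R/24). \]
Comparing this with (8.7.15) yields (8.7.12f).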
− γ σ μν = g σβ ωμβν , (8.7.20)
and hence (8.7.19) in the null tetrad becomes
and hence we obtain (8.7.22a). The other 3 equations can be verified in a similar manner.
In order to help the readers to better understand the method of solving Einstein’s equation
using the NP formalism, this text will provide two specific examples in Sect. 8.8.2 and
Optional Reading 8.9.1.
Due to the antisymmetry, the electromagnetic tensor $F_{ab}$ has at most 6 independent complex components in the null tetrad, which may be chosen as $F_{43}, F_{42}, F_{41}, F_{32}, F_{31}, F_{21}$. Moreover, they also satisfy the following relations:
\[ F_{43} = \bar F_{43}, \quad F_{42} = \bar F_{41}, \quad F_{32} = \bar F_{31}, \quad F_{21} = -F_{12} = -\bar F_{21}, \]
and thus among all 6 of them, $F_{43}$ and $F_{21}$ are respectively real and imaginary (their sum is complex), and the other 4 are equivalent to two independent complex quantities (we may take $F_{41}$ and $F_{23}$). Therefore, they are represented by 3 complex quantities $\Phi_0$, $\Phi_1$ and $\Phi_2$, defined as
∇ a Fab = 0 , (8.8.2a)
∇[a Fbc] = 0 (8.8.2b)
have the following form in the NP formalism:
The first and second terms on the right-hand side of the equation above are respec-
tively
and hence the sum of the first and second terms on the right-hand side of (8.8.4) is $\pi\Phi_0 + \bar\pi\bar\Phi_0 - \kappa\Phi_2 - \bar\kappa\bar\Phi_2$. Similarly, the sum of the fourth and fifth terms on the right-hand side of (8.8.4) is $-\kappa\Phi_2 + \bar\kappa\bar\Phi_2 - \bar\pi\bar\Phi_0 + \pi\Phi_0$, and therefore,
Hence,
\[ D\Phi_1 - \bar\delta\Phi_0 = (\pi - 2\alpha)\Phi_0 + 2\rho\Phi_1 - \kappa\Phi_2 + \tfrac{1}{2}(k^a l^b k^c + \bar m^a m^b k^c - 2k^a m^b \bar m^c)\nabla_c F_{ab}. \tag{8.8.5} \]
Let $G \equiv (k^a l^b k^c + \bar m^a m^b k^c - 2k^a m^b \bar m^c)\nabla_c F_{ab}$; then to verify (8.8.3a) one only has to show that $G = 0$. Maxwell’s equations are certainly involved in verifying this. From (8.7.3) we can see that
\[ g^{ac} = m^a \bar m^c + \bar m^a m^c - l^a k^c - k^a l^c, \tag{8.8.6} \]
and hence contracting Maxwell’s equation $\nabla^a F_{ab} = 0$ with $k^b$ gives
\[ \begin{aligned} 0 &= [m^a k^b \bar m^c + \bar m^a k^b m^c - (l^a k^b k^c + k^a k^b l^c)]\nabla_c F_{ab} \\ &= [m^a k^b \bar m^c - (m^a \bar m^b k^c + k^a m^b \bar m^c) + k^a l^b k^c]\nabla_c F_{ab} \\ &= [-m^b k^a \bar m^c - (-m^b \bar m^a k^c + k^a m^b \bar m^c) + k^a l^b k^c]\nabla_c F_{ab} = G, \end{aligned} \]
where the second equality holds because $\nabla_{[c}F_{ab]} = 0$ leads to $\bar m^{[a} k^b m^{c]}\nabla_c F_{ab} = 0$ and $l^{[a} k^b k^{c]}\nabla_c F_{ab} = 0$, and the third equality comes from the fact that $F_{ab} = -F_{ba}$. The other equations in (8.8.3) can be verified similarly.
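It follows from (7.2.6) that (Exercise 8.11)
\[ \begin{aligned} T_{11} &= \frac{1}{2\pi}\Phi_0\bar\Phi_2, & T_{12} = T_{21} &= \frac{1}{2\pi}\Phi_1\bar\Phi_1, & T_{13} = T_{31} &= \frac{1}{2\pi}\Phi_1\bar\Phi_2, \\ T_{14} = T_{41} &= \frac{1}{2\pi}\Phi_0\bar\Phi_1, & T_{22} &= \frac{1}{2\pi}\Phi_2\bar\Phi_0, & T_{23} = T_{32} &= \frac{1}{2\pi}\Phi_2\bar\Phi_1, \\ T_{24} = T_{42} &= \frac{1}{2\pi}\Phi_1\bar\Phi_0, & T_{33} &= \frac{1}{2\pi}\Phi_2\bar\Phi_2, & T_{34} = T_{43} &= \frac{1}{2\pi}\Phi_1\bar\Phi_1, \\ T_{44} &= \frac{1}{2\pi}\Phi_0\bar\Phi_0. \end{aligned} \tag{8.8.7} \]
Then, from (8.7.11a), (8.7.11b) and the component form of Einstein’s equation $R_{\mu\nu} = 8\pi T_{\mu\nu}$ we obtain the following succinct relations between $\Phi_{00}, \cdots, \Phi_{22}$, which represent the curvature tensor, and $\Phi_0, \Phi_1, \Phi_2$, which represent the electromagnetic field tensor:
\[ \begin{aligned} \Phi_{00} &= 2\Phi_0\bar\Phi_0, & \Phi_{01} &= 2\Phi_0\bar\Phi_1, & \Phi_{02} &= 2\Phi_0\bar\Phi_2, \\ \Phi_{11} &= 2\Phi_1\bar\Phi_1, & \Phi_{12} &= 2\Phi_1\bar\Phi_2, & \Phi_{22} &= 2\Phi_2\bar\Phi_2, \end{aligned} \tag{8.8.8} \]
that is,
\[ \Phi_{\lambda\tau} = 2\Phi_\lambda\bar\Phi_\tau, \qquad \lambda, \tau = 0, 1, 2. \tag{8.8.9} \]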
Hence, the null condition for an electromagnetic field can also be expressed equiva-
lently as
\[ \Phi_0\Phi_2 - \Phi_1^2 = 0. \tag{8.8.11} \]
In this subsection, we will introduce the detailed process of solving the Einstein-
Maxwell equations using the Newman-Penrose formalism by a specific example
[see Liang (1995)]. Suppose the metric to be found has the following line element
expression in a coordinate system {t, z, ϕ, ρ}:
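By means of (8.7.1), starting from the above orthonormal tetrad fields one can conveniently construct the following null tetrad fields
\[ m^a = \frac{1}{\sqrt{2}}\left[e^{-\eta/2}(\partial/\partial z)^a - ie^{-(\eta+\chi)/2}(\partial/\partial\varphi)^a\right], \tag{8.8.15a} \]
\[ \bar m^a = \frac{1}{\sqrt{2}}\left[e^{-\eta/2}(\partial/\partial z)^a + ie^{-(\eta+\chi)/2}(\partial/\partial\varphi)^a\right], \tag{8.8.15b} \]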
8.8 Solving the Einstein-Maxwell Equations Using the NP … 371
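\[ l^a = \frac{1}{\sqrt{2}}e^{-\xi/2}\left[(\partial/\partial t)^a - (\partial/\partial\rho)^a\right] = \sqrt{2}\,e^{-\xi/2}(\partial/\partial u)^a, \tag{8.8.15c} \]
\[ k^a = \frac{1}{\sqrt{2}}e^{-\xi/2}\left[(\partial/\partial t)^a + (\partial/\partial\rho)^a\right] = \sqrt{2}\,e^{-\xi/2}(\partial/\partial v)^a. \tag{8.8.15d} \]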
After computing all the ωρμν using (5.7.19) [in which the (eμ )a should be interpreted
as (εμ )a ] and (5.7.20) or any other method, one can find all (12) complex spin
coefficients from (8.7.8) as follows:
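\[ \kappa = \tau = \nu = \pi = \beta = \alpha = 0, \tag{8.8.16a} \]
\[ \rho = -\frac{\sqrt{2}}{4}e^{-\xi/2}\left(2\frac{\partial\eta}{\partial v} + \frac{\partial\chi}{\partial v}\right), \tag{8.8.16b} \]
\[ \mu = \frac{\sqrt{2}}{4}e^{-\xi/2}\left(2\frac{\partial\eta}{\partial u} + \frac{\partial\chi}{\partial u}\right), \tag{8.8.16c} \]
\[ \varepsilon = \frac{\sqrt{2}}{4}e^{-\xi/2}\frac{\partial\xi}{\partial v}, \tag{8.8.16d} \]
\[ \sigma = \frac{\sqrt{2}}{4}e^{-\xi/2}\frac{\partial\chi}{\partial v}, \tag{8.8.16e} \]
\[ \lambda = -\frac{\sqrt{2}}{4}e^{-\xi/2}\frac{\partial\chi}{\partial u}, \tag{8.8.16f} \]
\[ \gamma = -\frac{\sqrt{2}}{4}e^{-\xi/2}\frac{\partial\xi}{\partial u}. \tag{8.8.16g} \]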
When solving the Einstein-Maxwell equations, we have already assumed that there is only an electromagnetic field but no matter fields (“electrovacuum”). The tracelessness of the energy-momentum tensor $T_{ab}$ of the electromagnetic field leads to the fact that the scalar curvature $R$ vanishes. Noticing (8.8.16a), we can see that the NP equations take the following form:
\[ 0 = \mu\rho - \lambda\sigma - \Psi_2 + \Phi_{11}, \tag{8.8.17l} \]
\[ 0 = -\Psi_3 + \Phi_{21}, \tag{8.8.17m} \]
\[ -\Delta\mu = \mu(\mu + 2\gamma) + \lambda^2 + \Phi_{22}, \tag{8.8.17n} \]
\[ 0 = \Phi_{12}, \tag{8.8.17o} \]
\[ -\Delta\sigma = \sigma(\mu - 2\gamma) + \lambda\rho + \Phi_{02}, \tag{8.8.17p} \]
\[ \Delta\rho = \rho(2\gamma - \mu) - \sigma\lambda - \Psi_2, \tag{8.8.17q} \]
\[ 0 = -\Psi_3. \tag{8.8.17r} \]
Our discussion is limited only to the case of a source-free electromagnetic field, and
hence when (8.8.16a) holds Maxwell’s equations will take the following form:
From Einstein’s equations (8.8.9) we can see that (8.8.17d) and (8.8.17o) will lead to $\Phi_1 = 0$ or $\Phi_0 = \Phi_2 = 0$. It follows from the null condition $\Phi_0\Phi_2 - \Phi_1^2 = 0$ that an electromagnetic field with $\Phi_0 = \Phi_2 = 0$ can only be a nonnull electromagnetic field, while an electromagnetic field with $\Phi_1 = 0$ can be either null or nonnull. Here we only discuss nonnull electromagnetic fields with $\Phi_1 = 0$; that is, we only seek the solutions of nonnull electromagnetic fields with $\Phi_1 = 0$ (which must have $\Phi_0 \neq 0$ and $\Phi_2 \neq 0$). In this case, Maxwell’s equations (8.8.18) will be simplified to
\[ \bar\delta\Phi_0 = 0, \tag{8.8.19a} \]
\[ D\Phi_2 = -\lambda\Phi_0 + (\rho - 2\varepsilon)\Phi_2, \tag{8.8.19b} \]
\[ -\Delta\Phi_0 = (\mu - 2\gamma)\Phi_0 - \sigma\Phi_2, \tag{8.8.19c} \]
\[ \delta\Phi_2 = 0. \tag{8.8.19d} \]
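\[ \frac{\partial\Phi_2}{\partial z} - ie^{-\chi/2}\frac{\partial\Phi_2}{\partial\varphi} = 0. \tag{8.8.20} \]
However, one cannot yet say that $\partial\Phi_2/\partial z = \partial\Phi_2/\partial\varphi = 0$ since $\Phi_2$ is a complex-valued function. Suppose $\Phi_2 = Ce^{i\theta}$, where $C$ and $\theta$ are real-valued functions, then
\[ \Phi_{22} = 2\Phi_2\bar\Phi_2 = 2C^2. \tag{8.8.21} \]
Thus,
\[ \frac{\partial\theta}{\partial z} = \frac{\partial\theta}{\partial\varphi} = 0, \]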
To make the solving process more tractable, we will only discuss the case where $\partial\chi/\partial u = 0$. As long as we have a solution under this condition, we will obtain an exact solution. Of course, we cannot be sure beforehand that there must be a solution in this case, and so this is a tentative approach. Now we only have to care about the case $\partial\chi/\partial v \neq 0$. This is because $\partial\chi/\partial u = \partial\chi/\partial v = 0$ will make the line element (8.8.13) locally the same as a plane symmetric metric, and the plane symmetric metrics generated by “semi-plane symmetric” (which locally looks cylindrically symmetric) electromagnetic fields have been exhausted by Li and Liang (1985). The condition $\partial\chi/\partial u = 0$ brings us many simplifications; for instance, it leads to $\lambda = 0$, and one can now integrate (8.8.22) and get
There is now only one unsolved Maxwell equation remaining, namely (8.8.23), which
can be simplified as
\[ -4\frac{\partial\Phi_0}{\partial u} = 2\left(\frac{\partial\xi}{\partial u} + \frac{\partial\eta}{\partial u}\right)\Phi_0 - \chi' a(u)e^{-(2\xi+2\eta+\chi)/4}, \tag{8.8.26} \]
where the prime represents the derivative of a function of one variable (for the above equation it is $\chi' \equiv d\chi/dv$). The condition $\partial\chi/\partial u = 0$ also simplifies the NP equations; for instance, (8.8.17g) now becomes
\[ -\Phi_{20} = \frac{1}{4}e^{-\xi}\frac{\partial\eta}{\partial u}\chi', \tag{8.8.27} \]
which says that $\Phi_{20}$ is a real number, and thus $\Phi_{02} = \Phi_{20}$. Noticing that $a(u) \neq 0$ (otherwise the electromagnetic field vanishes), by combining (8.8.27) with (8.8.9) and (8.8.24) we get
\[ \Phi_0(u, v) = -\frac{1}{8\bar a(u)}\chi'\frac{\partial\eta}{\partial u}e^{(-2\xi+2\eta+\chi)/4}. \tag{8.8.28} \]
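The algebra behind (8.8.28) can be spot-checked numerically. The Python sketch below is only an illustration: it assumes that (8.8.24) has the form Φ2 = a(u) exp[−(2ξ+2η+χ)/4] (the factor that appears in the last term of (8.8.26)), evaluates Φ0 from (8.8.28), and confirms that Φ20 = 2Φ2Φ̄0 from (8.8.9) reproduces (8.8.27). The function names and sample values are hypothetical.

```python
import math


def phi2(a, xi, eta, chi):
    """Phi_2 = a * exp(-(2 xi + 2 eta + chi)/4); assumed form of (8.8.24)."""
    return a * math.exp(-(2.0 * xi + 2.0 * eta + chi) / 4.0)


def phi0(a, xi, eta, chi, eta_u, chi_prime):
    """Phi_0 as given by (8.8.28); eta_u stands for d(eta)/du, chi_prime for d(chi)/dv."""
    exponent = (-2.0 * xi + 2.0 * eta + chi) / 4.0
    return -chi_prime * eta_u * math.exp(exponent) / (8.0 * a.conjugate())


def check_8_8_27(a, xi, eta, chi, eta_u, chi_prime):
    """Return (-Phi_20, right-hand side of (8.8.27)) with Phi_20 = 2 Phi_2 conj(Phi_0)."""
    p0 = phi0(a, xi, eta, chi, eta_u, chi_prime)
    p2 = phi2(a, xi, eta, chi)
    phi20 = 2.0 * p2 * p0.conjugate()
    rhs = 0.25 * math.exp(-xi) * eta_u * chi_prime
    return -phi20, rhs


# Arbitrary sample values (hypothetical); a must be a nonzero complex number.
lhs, rhs = check_8_8_27(0.7 + 0.4j, 0.3, -0.2, 1.1, 0.5, -1.3)
```

For any nonzero complex `a` the two returned values agree and the imaginary part of the first vanishes up to rounding, in line with the statement that Φ20 is real.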
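Taking the derivative of the above equation with respect to $u$ and plugging into (8.8.26) we obtain
\[ -2|a|^2 = \left[-\bar a^{-1}\bar a'\frac{\partial\eta}{\partial u} + \frac{\partial^2\eta}{\partial u^2} + \left(\frac{\partial\eta}{\partial u}\right)^2\right]e^{\eta+\chi/2}. \tag{8.8.29} \]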
Now we look back at the NP equations (8.8.17). Equation (g) has been used. By means of (8.8.16) and (8.8.27) it is not difficult to verify that (p) is automatically satisfied. The assumption that $\Phi_1 = 0$ leads to $\Phi_{01} = \Phi_{10} = \Phi_{12} = \Phi_{21} = \Phi_{11} = 0$, and so (d) and (o) become identities; also, (c) becomes equivalent to (k) and (e), which states nothing but the fact that the Weyl tensor of the spacetime has its component
\[ \Psi_1 = 0. \tag{8.8.30} \]
\[ \Psi_3 = 0. \tag{8.8.31} \]
\[ \Psi_4 = 0, \tag{8.8.32} \]
\[ \Psi_2 = \mu\rho. \tag{8.8.33} \]
If we leave (l) [i.e., (8.8.33)] and (b) to the end to determine $\Psi_2$ and $\Psi_0$ (no need to solve), then the NP equations (8.8.17) have only 5 unsolved equations remaining, namely (a), (f), (h), (n) and (q). Noticing $\Phi_{11} = 0$, $\lambda = 0$ and (8.8.33), we see that these 5 equations take the following form:
\[ \eta(u, v) = -\frac{1}{2}\chi + \ln[g(v) - f(u)], \tag{8.8.39} \]
where $g(v)$ and $f(u)$ are arbitrary functions. Hence, (8.8.35) becomes
\[ \frac{\partial^2\xi}{\partial u\,\partial v} = -\frac{1}{2}(g - f)^{-2}f'g'; \]
integrating this yields
\[ \xi(u, v) = -\frac{1}{2}\ln(g - f) + F(u) + G(v), \tag{8.8.40} \]
where F(u) and G(v) are arbitrary functions. Plugging (8.8.39) and (8.8.40) into
(8.8.13) yields
Define new coordinates $\tilde u$ and $\tilde v$ as follows: $d\tilde u = e^{F(u)}du$, $d\tilde v = e^{G(v)}dv$, then
Equations (8.8.42) and (8.8.42′) represent the same line element (the only difference is that the coordinate notations $u$ and $v$ are changed to $\tilde u$ and $\tilde v$, which is not essential), and thus when taking $F(u) = G(v) = 0$ we do not lose any solution. Henceforth we will take this choice, i.e., take (8.8.42′) as the line element.
Now, 3 unsolved equations remain, namely (8.8.29), (8.8.34) and (8.8.37), and the undetermined functions are $g(v)$, $f(u)$, $\chi(v)$ and $a(u)$. Equation (8.8.37) is equivalent to
\[ \frac{\partial^2\eta}{\partial u^2} - \frac{\partial\xi}{\partial u}\frac{\partial\eta}{\partial u} + \frac{1}{2}\left(\frac{\partial\eta}{\partial u}\right)^2 + 2|a|^2e^{-(\eta+\chi/2)} = 0. \tag{8.8.43} \]
By means of (8.8.39) and (8.8.40) (where $F = G = 0$) one can rewrite the equation above as
\[ f'' = 2|a(u)|^2. \tag{8.8.44} \]
Similarly, substituting (8.8.39) into (8.8.29) gives
\[ \bar a^{-1}\bar a' f' = f'' - 2|a|^2 = 0, \tag{8.8.45} \]
where (8.8.44) is used in the second equality. The equation above indicates that either $a' = 0$ or $f' = 0$; however, from (8.8.44) we know that the latter leads to $a = 0$, which is not allowed, and hence we have only $a' = 0$, i.e., $a$ = constant. Thus, integrating (8.8.44) yields
\[ f' = 2A^2u + c_1, \qquad f = A^2u^2 + c_1u + c_2, \tag{8.8.46} \]
Plugging this into (8.8.34), by a brief calculation we can see that (8.8.34) is equivalent to
\[ 8g''(v)\chi'^{-2}(v) + g(v) = f(u) - (4|a|^2)^{-1}f'^2(u). \tag{8.8.48} \]
In the above equation, the left-hand side is not a function of $u$ and the right-hand side is not a function of $v$, and thus both sides are equal to a constant, denoted by $K$, i.e.,
where A, c1 and c2 are arbitrary constants, the functions g(v) and χ (v) are quite
arbitrary but are related by (8.8.48), in which the value K on both sides depends on
our choice of the constants A, c1 and c2 .
Conclusion: After choosing the constants $A$, $c_1$ and $c_2$, any real function pair $(g(v), \chi(v))$ satisfying (8.8.49) determines a cylindrically symmetric metric by (8.8.52), whose corresponding source is a cylindrically symmetric nonnull electromagnetic field described by a complex-valued function pair $(\Phi_0, \Phi_2)$ satisfying (8.8.28) and (8.8.24). There are many real function pairs $(g(v), \chi(v))$ that satisfy (8.8.49); for instance, the following 3 function pairs all satisfy (8.8.49) with $c_1$ and $c_2$ chosen to be zero, i.e., $K = 0$:
(1) $g(v) = \sin v$, $\chi(v) = 2\sqrt{2}\,v$.
(2) $g(v) = \ln v$, $\chi(v) = 4\sqrt{2}\,(\ln v)^{1/2}$.
(3) $g(v) = v^{1/\alpha}$, $\chi(v) = (2/\alpha)\sqrt{2(\alpha - 1)}\,\ln v$, where $\alpha \in (1, \infty)$. This example forms a one-parameter subfamily (with the parameter $\alpha$) of the cylindrically symmetric solution family of the Einstein-Maxwell equations, in which the simplest one is the solution characterized by $\alpha = 2$, i.e., $g(v) = v^{1/2}$, $\chi(v) = \sqrt{2}\,\ln v$.
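The three pairs can be checked numerically. The sketch below assumes that (8.8.49) is the left-hand side of (8.8.48) set equal to K, i.e. 8g''(v)/χ'(v)² + g(v) = K with K = 0, and approximates the derivatives by central differences. The helper names, the step size, the sample points and the value α = 3 are illustrative choices, not taken from the text.

```python
import math


def first_derivative(fn, v, h=1e-3):
    """Central-difference estimate of fn'(v)."""
    return (fn(v + h) - fn(v - h)) / (2.0 * h)


def second_derivative(fn, v, h=1e-3):
    """Central-difference estimate of fn''(v)."""
    return (fn(v + h) - 2.0 * fn(v) + fn(v - h)) / (h * h)


def separation_lhs(g, chi, v):
    """8 g''(v) / chi'(v)^2 + g(v), the left-hand side of (8.8.48)."""
    return 8.0 * second_derivative(g, v) / first_derivative(chi, v) ** 2 + g(v)


ALPHA = 3.0  # hypothetical value of the parameter alpha in example (3)

PAIRS = {
    "sin": (math.sin, lambda v: 2.0 * math.sqrt(2.0) * v),
    "log": (math.log, lambda v: 4.0 * math.sqrt(2.0) * math.sqrt(math.log(v))),
    "power": (
        lambda v: v ** (1.0 / ALPHA),
        lambda v: (2.0 / ALPHA) * math.sqrt(2.0 * (ALPHA - 1.0)) * math.log(v),
    ),
}


def max_residual(name, points=(1.5, 2.0, 3.0)):
    """Largest |8 g''/chi'^2 + g| over the sample points (all > 1 so ln v > 0)."""
    g, chi = PAIRS[name]
    return max(abs(separation_lhs(g, chi, v)) for v in points)
```

Calling `max_residual` for each of the three names returns a value that differs from zero only by the finite-difference error, consistent with K = 0.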
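The electromagnetic field $F_{ab}$ described by a complex-valued function pair $(\Phi_0, \Phi_2)$ satisfying (8.8.28) and (8.8.24) can also be expressed in terms of its nonvanishing components in the coordinate basis $\{(\partial/\partial t)^a, (\partial/\partial\rho)^a, (\partial/\partial z)^a, (\partial/\partial\varphi)^a\}$:
\[ F_{tz} = -F_{zt} = -a_1e^{-\chi/4}\left(1 - \tfrac{1}{4}u\chi'\right), \tag{8.8.53a} \]
\[ F_{\rho z} = -F_{z\rho} = a_1e^{-\chi/4}\left(1 + \tfrac{1}{4}u\chi'\right), \tag{8.8.53b} \]
\[ F_{t\varphi} = -F_{\varphi t} = -a_2e^{\chi/4}\left(1 + \tfrac{1}{4}u\chi'\right), \tag{8.8.53c} \]
\[ F_{\rho\varphi} = -F_{\varphi\rho} = a_2e^{\chi/4}\left(1 - \tfrac{1}{4}u\chi'\right), \tag{8.8.53d} \]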
where Ft z ≡ Fab (∂/∂t)a (∂/∂z)b , the others are defined similarly; a1 , a2 ∈ R are
the real and imaginary parts of a, respectively. It is not difficult to verify that Fab
constituted by (8.8.53) satisfies the source-free Maxwell equations ∇ a Fab = 0 and
∇[a Fbc] = 0, and the energy-momentum tensor Tab constituted by Fab according to
(8.4.1) satisfies Einstein’s equation Tab = Rab /8π , where Rab is the Ricci tensor of
the metric (8.8.52).
The line element of the vacuum Schwarzschild solution in the Schwarzschild coor-
dinate system {t, r, θ, ϕ} is given by
\[ ds^2_{\mathrm{Sch}} = -\left(1 - \frac{2M}{r}\right)dt^2 + \left(1 - \frac{2M}{r}\right)^{-1}dr^2 + r^2(d\theta^2 + \sin^2\theta\,d\varphi^2) \qquad (r > 2M). \]
Starting from the Schwarzschild coordinate system, we apply the coordinate transformation $\{t, r, \theta, \varphi\} \to \{u, r, \theta, \varphi\}$, where
\[ u \equiv t - r_*, \qquad r_* \equiv r + 2M\ln\left(\frac{r}{2M} - 1\right) \tag{8.9.1} \]
($r_*$ is called the tortoise coordinate); then the Schwarzschild line element turns into the following form:
\[ \begin{aligned} ds^2_{\mathrm{Sch}} &= -(1 - 2Mr^{-1})du^2 - 2du\,dr + r^2(d\theta^2 + \sin^2\theta\,d\varphi^2) \\ &= [-du^2 - 2du\,dr + r^2(d\theta^2 + \sin^2\theta\,d\varphi^2)] + 2Mr^{-1}du^2. \end{aligned} \tag{8.9.2} \]
The square bracket on the right-hand side of the above equation can also be written as $-dt^2 + dr^2 + r^2(d\theta^2 + \sin^2\theta\,d\varphi^2)$, which is nothing but the flat line element, and hence
\[ ds^2_{\mathrm{Sch}} = ds^2_{\mathrm{flat}} + 2Mr^{-1}du^2. \tag{8.9.3} \]
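The step from the Schwarzschild coordinates to (8.9.2) rests on dr_*/dr = (1 − 2M/r)^(−1). The following Python sketch, offered only as a numerical illustration, differentiates the tortoise coordinate of (8.9.1) by central differences and substitutes dt = du + (dr_*/dr) dr into the (t, r) part of the Schwarzschild line element. The function names and the sample values of M and r are hypothetical.

```python
import math


def tortoise(r, M):
    """r_* = r + 2M ln(r/(2M) - 1) from (8.9.1), valid for r > 2M."""
    return r + 2.0 * M * math.log(r / (2.0 * M) - 1.0)


def tortoise_slope(r, M, h=1e-6):
    """Central-difference estimate of dr_*/dr."""
    return (tortoise(r + h, M) - tortoise(r - h, M)) / (2.0 * h)


def metric_in_u_r(r, M):
    """Components (g_uu, g_ur, g_rr) obtained from -(1-2M/r) dt^2 + (1-2M/r)^(-1) dr^2
    after substituting dt = du + (dr_*/dr) dr; the du dr coefficient is 2 * g_ur."""
    f = 1.0 - 2.0 * M / r
    s = tortoise_slope(r, M)
    g_uu = -f
    g_ur = -f * s
    g_rr = -f * s * s + 1.0 / f
    return g_uu, g_ur, g_rr


# Sample point outside the horizon (hypothetical values).
g_uu, g_ur, g_rr = metric_in_u_r(r=5.0, M=1.0)
```

Up to the finite-difference error one finds g_ur = −1 and g_rr = 0, which are the coefficients appearing in (8.9.2).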
Once we change the constant $M$ in the above equation to a function $m(u)$ of the coordinate $u$, we obtain the following new line element [called the Vaidya line element]:
\[ \begin{aligned} ds^2_{\mathrm{Vai}} &= ds^2_{\mathrm{flat}} + 2m(u)r^{-1}du^2 \\ &= -[1 - 2m(u)r^{-1}]du^2 - 2du\,dr + r^2(d\theta^2 + \sin^2\theta\,d\varphi^2). \end{aligned} \tag{8.9.4} \]
Let gab represent the Vaidya metric, then from the equation above one can read off
all of its nonvanishing components in the system {u, r, θ, ϕ}:
Now that we have the metric we can compute its Einstein tensor $G_{ab} \equiv R_{ab} - Rg_{ab}/2$, and from Einstein’s equation $G_{ab} = 8\pi T_{ab}$ we can find its energy-momentum tensor $T_{ab}$ in order to figure out what source is associated with this metric. The nonvanishing components of the Vaidya metric are already given in (8.9.5), and the corresponding inverse matrix has the nonvanishing components
\[ g^{rr} = 1 - 2m(u)r^{-1}, \quad g^{ur} = g^{ru} = -1, \quad g^{\theta\theta} = r^{-2}, \quad g^{\varphi\varphi} = (r\sin\theta)^{-2}. \tag{8.9.8} \]
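That (8.9.8) is indeed the inverse of the Vaidya metric can be confirmed by a direct matrix product. The sketch below builds the matrix of components read off from the line element (8.9.4) in the coordinate order (u, r, θ, ϕ), multiplies it by the matrix assembled from (8.9.8), and measures the deviation from the identity. Function names and sample values are hypothetical.

```python
import math


def vaidya_metric(m, r, theta):
    """Matrix g_{mu nu} in (u, r, theta, phi), read off from the line element (8.9.4)."""
    h = 1.0 - 2.0 * m / r
    return [
        [-h, -1.0, 0.0, 0.0],
        [-1.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, r * r, 0.0],
        [0.0, 0.0, 0.0, (r * math.sin(theta)) ** 2],
    ]


def vaidya_inverse(m, r, theta):
    """Matrix g^{mu nu} assembled from the components listed in (8.9.8)."""
    h = 1.0 - 2.0 * m / r
    return [
        [0.0, -1.0, 0.0, 0.0],
        [-1.0, h, 0.0, 0.0],
        [0.0, 0.0, r ** -2, 0.0],
        [0.0, 0.0, 0.0, (r * math.sin(theta)) ** -2],
    ]


def matmul(a, b):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def max_deviation_from_identity(m, r, theta):
    """Largest entry of |g g^{-1} - 1| at the given point."""
    product = matmul(vaidya_metric(m, r, theta), vaidya_inverse(m, r, theta))
    return max(
        abs(product[i][j] - (1.0 if i == j else 0.0))
        for i in range(4)
        for j in range(4)
    )
```

Note also that the (r, r) entry of `vaidya_metric` vanishes, so the coordinate vector (∂/∂r)^a is null with respect to this metric.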
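Plugging them into (3.2.10′) yields the nonvanishing Christoffel symbols
\[ \begin{aligned} &\Gamma^u{}_{uu} = -mr^{-2}, \quad \Gamma^u{}_{\theta\theta} = r, \quad \Gamma^u{}_{\varphi\varphi} = r\sin^2\theta, \\ &\Gamma^r{}_{uu} = -\dot m r^{-1} + mr^{-3}(r - 2m), \quad \Gamma^r{}_{ur} = \Gamma^r{}_{ru} = mr^{-2}, \\ &\Gamma^r{}_{\theta\theta} = 2m - r, \quad \Gamma^r{}_{\varphi\varphi} = (2m - r)\sin^2\theta, \\ &\Gamma^\theta{}_{r\theta} = \Gamma^\theta{}_{\theta r} = r^{-1}, \quad \Gamma^\theta{}_{\varphi\varphi} = -\sin\theta\cos\theta, \\ &\Gamma^\varphi{}_{r\varphi} = \Gamma^\varphi{}_{\varphi r} = r^{-1}, \quad \Gamma^\varphi{}_{\theta\varphi} = \Gamma^\varphi{}_{\varphi\theta} = \cot\theta, \end{aligned} \tag{8.9.9} \]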
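Individual entries of (8.9.9) can be cross-checked by evaluating the standard coordinate formula for the Christoffel symbols, Γ^σ_{μν} = ½ g^{σρ}(∂_μ g_{νρ} + ∂_ν g_{μρ} − ∂_ρ g_{μν}), with finite differences. The Python sketch below does this for the Vaidya metric; the linear mass function m(u) = 1 + ṁu with ṁ = −0.1, the step size and all names are assumptions made only for this illustration.

```python
import math

U, R, TH, PH = range(4)  # coordinate order (u, r, theta, phi)
M_DOT = -0.1             # hypothetical constant dm/du, chosen only for this check


def mass(u):
    """Hypothetical mass function m(u) = 1 + M_DOT * u."""
    return 1.0 + M_DOT * u


def metric(x):
    """Vaidya metric components g_{mu nu} read off from (8.9.4)."""
    h = 1.0 - 2.0 * mass(x[U]) / x[R]
    g = [[0.0] * 4 for _ in range(4)]
    g[U][U] = -h
    g[U][R] = g[R][U] = -1.0
    g[TH][TH] = x[R] ** 2
    g[PH][PH] = (x[R] * math.sin(x[TH])) ** 2
    return g


def inverse_metric(x):
    """Inverse components g^{mu nu} as listed in (8.9.8)."""
    h = 1.0 - 2.0 * mass(x[U]) / x[R]
    ginv = [[0.0] * 4 for _ in range(4)]
    ginv[R][R] = h
    ginv[U][R] = ginv[R][U] = -1.0
    ginv[TH][TH] = x[R] ** -2
    ginv[PH][PH] = (x[R] * math.sin(x[TH])) ** -2
    return ginv


def metric_derivative(x, rho, step=1e-5):
    """Central-difference estimate of d g_{mu nu} / d x^rho."""
    xp, xm = list(x), list(x)
    xp[rho] += step
    xm[rho] -= step
    gp, gm = metric(xp), metric(xm)
    return [[(gp[i][j] - gm[i][j]) / (2.0 * step) for j in range(4)] for i in range(4)]


def christoffel(x, sigma, mu, nu):
    """Gamma^sigma_{mu nu} from the metric, its inverse and numerical derivatives."""
    ginv = inverse_metric(x)
    dg = [metric_derivative(x, rho) for rho in range(4)]  # dg[rho][i][j] = d_rho g_ij
    return 0.5 * sum(
        ginv[sigma][rho] * (dg[mu][nu][rho] + dg[nu][mu][rho] - dg[rho][mu][nu])
        for rho in range(4)
    )
```

For example, `christoffel(x, R, U, U)` at a sample point `x = [u, r, theta, phi]` can be compared with −ṁ/r + m(r − 2m)/r³ from (8.9.9).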
where $\dot m \equiv dm(u)/du$. Plugging these into (3.4.21), we see that the Ricci tensor $R_{ab}$ has only one nonvanishing component, i.e.,
\[ R_{uu} = -2\dot m r^{-2}, \]
and hence
\[ R_{ab} = -2\dot m r^{-2}(du)_a(du)_b. \tag{8.9.11} \]
From the above equation we also get $R = g^{uu}R_{uu} = 0$, and hence $G_{ab} = R_{ab}$. Therefore, it follows from Einstein’s equation $G_{ab} = 8\pi T_{ab}$ that
\[ T_{ab} = -\frac{\dot m}{4\pi r^2}(du)_a(du)_b. \tag{8.9.12} \]
Let
\[ k_a \equiv -(du)_a, \qquad k^a \equiv g^{ab}k_b = -g^{ab}(du)_b, \tag{8.9.13} \]
then it follows from (8.9.8) that
\[ k^a = (\partial/\partial r)^a. \tag{8.9.14} \]
Hence, $k^ak_a = 0$, and thus $k^a$ is a null vector field. Now (8.9.12) can also be expressed as
\[ T_{ab} = -\frac{\dot m}{4\pi r^2}k_ak_b. \tag{8.9.12'} \]
When $\dot m < 0$, the above equation can be viewed as a special case of the energy-momentum tensor in the following form:
What kind of field can have an energy-momentum tensor like that shown in the
above equation? It can be proved that (see Appendix D in Volume II) the energy-
momentum tensor of a source-free null electromagnetic field (satisfying Fab F ab = 0)
can be expressed in the form (8.9.15), in which
E2
2 ≡ (E is the electric field measured by an orthonormal tetrad).
2π
(8.9.16)
A null electromagnetic field can be viewed as a “matter field” formed by many pho-
tons propagating along the null direction k a . Moreover, a matter field formed by
other particles with zero rest masses (such as massless scalar particles and neutri-
nos10 ) moving along the k a -direction also has an energy-momentum tensor of the
form (8.9.15). This kind of matter field is called a pure radiation field. In sum-
mary, the matter fields whose energy-momentum tensors can be expressed in terms
of (8.9.15) can be classified into two kinds: ① source-free null electromagnetic fields;
② pure radiation fields. The difference between them is that for the former there exists a 2-form field $F_{ab}$ which satisfies the source-free Maxwell equations and $T_{ab} = F_{ac}F_b{}^{c}/4\pi$. It can be proved (see Optional Reading 8.9.1) that the matter field corresponding to (8.9.12′) does not obey the source-free Maxwell equations, and thus the source of the Vaidya metric is a pure radiation field instead of a null electromagnetic field.
When compared with the Schwarzschild metric, the Vaidya metric has mainly
the following three differences. ① The mass parameter M of the former one is a
constant while the m in the latter is a function of u. ② The former is a solution to
the vacuum Einstein equation G ab = 0 while the latter is a solution of the Einstein
equation with source G ab = 8π Tab , where Tab represents a pure radiation field. ③
By finding the general solution to the Killing equation one can show that the former has four independent Killing vector fields, one of which is timelike, and hence is a stationary metric; the latter has only three independent Killing vector fields (exactly those three reflecting spherical symmetry) with no timelike Killing field, and hence the Vaidya solution is not a stationary metric. The above
three properties of the Vaidya metric are closely related. If we interpret m still as the
10 This is included because neutrinos are massless in the Standard Model of particle physics. How-
ever, now it has been experimentally confirmed that neutrinos have nonzero masses, and thus
technically it should not be included anymore.
8.9 The Vaidya Metric and the Kinnersley Metric 381
mass of a spherically symmetric star, and interpret u as the proper time of the star
(Sect. 8.9.3 will justify this interpretation), then m being a function of u (property
①) indicates that the mass of the star changes with time with a rate ṁ. Why is it
so? Because it keeps emitting massless particles (property ②) (for convenience they
are also called “photons”, although they are not quanta of an electromagnetic field),
which carry away energy ceaselessly. Calculation (see Sect. 8.9.3) shows that the energy flowing to infinity per unit time happens to equal −ṁ, i.e., the rate of decrease of the energy (mass) m of the star (assuming ṁ < 0),11 which agrees with the law of conservation of energy. It is exactly the time dependence of m that renders the Vaidya metric a non-stationary metric (property ③).
In consideration of the above-mentioned properties, P. C. Vaidya himself called this
kind of star a “shining star”, although the “shining” is not caused by photons but
other massless particles. It is natural to ask: does not a static star described by the
Schwarzschild metric shine? Of course a star shines, but the thing is, to simplify
the solving process, Schwarzschild ignored the energy-momentum tensor of the
photons emitted from the star (which also form a bath for the star) and treated its
exterior as a vacuum. This is how we can have the well-known, exceptionally easy,
while extensively used, vacuum Schwarzschild solution. Thus, the familiar physical
interpretation “the vacuum Schwarzschild solution describes the exterior metric field
of a static spherically symmetric star” is only an approximate statement.
11 As a solution to Einstein’s equation, the derivative of the parameter m(u) can either be positive
or negative (also, of course, zero). However, in order to make this solution a metric corresponding
to a matter field which is physically acceptable, we need to require ṁ < 0.
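\[ \begin{aligned} k_a &= -(du)_a, & l_a &= -\tfrac{1}{2}h(du)_a - (dr)_a, \\ m_a &= \frac{r}{\sqrt{2}}[(d\theta)_a - i\sin\theta\,(d\varphi)_a], & \bar m_a &= \frac{r}{\sqrt{2}}[(d\theta)_a + i\sin\theta\,(d\varphi)_a], \end{aligned} \tag{8.9.21} \]
and the corresponding $m^a$, $\bar m^a$, $l^a$ and $k^a$ are
\[ \begin{aligned} k^a &= (\partial/\partial r)^a, & m^a &= \frac{1}{\sqrt{2}\,r}[(\partial/\partial\theta)^a - i\sin^{-1}\theta\,(\partial/\partial\varphi)^a], \\ l^a &= (\partial/\partial u)^a - \tfrac{1}{2}h(\partial/\partial r)^a, & \bar m^a &= \frac{1}{\sqrt{2}\,r}[(\partial/\partial\theta)^a + i\sin^{-1}\theta\,(\partial/\partial\varphi)^a]. \end{aligned} \tag{8.9.21'} \]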
The readers should verify that this null tetrad indeed satisfies
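One way to carry out such a check numerically is to rebuild the metric from the covariant tetrad components of (8.9.21) through g_ab = m_a m̄_b + m̄_a m_b − l_a k_b − k_a l_b, the covariant counterpart of (8.8.6), and compare the result with the Vaidya components of (8.9.4). The sketch below assumes h = 1 − 2m(u)/r, which is not restated in this passage; the function names and sample values are hypothetical.

```python
import math


def vaidya_null_tetrad(mass, r, theta):
    """Covariant components (u, r, theta, phi) of k_a, l_a, m_a, mbar_a from (8.9.21)."""
    h = 1.0 - 2.0 * mass / r  # assumed definition of h
    s = r / math.sqrt(2.0)
    k = [-1.0, 0.0, 0.0, 0.0]
    l = [-0.5 * h, -1.0, 0.0, 0.0]
    m_cov = [0j, 0j, complex(s), -1j * s * math.sin(theta)]
    mbar_cov = [c.conjugate() for c in m_cov]
    return k, l, m_cov, mbar_cov


def metric_from_tetrad(mass, r, theta):
    """g_ab = m_a mbar_b + mbar_a m_b - l_a k_b - k_a l_b."""
    k, l, mc, mb = vaidya_null_tetrad(mass, r, theta)
    return [
        [mc[a] * mb[b] + mb[a] * mc[b] - l[a] * k[b] - k[a] * l[b] for b in range(4)]
        for a in range(4)
    ]


def expected_metric(mass, r, theta):
    """Vaidya components read off from the line element (8.9.4)."""
    h = 1.0 - 2.0 * mass / r
    return [
        [-h, -1.0, 0.0, 0.0],
        [-1.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, r * r, 0.0],
        [0.0, 0.0, 0.0, (r * math.sin(theta)) ** 2],
    ]


def max_mismatch(mass, r, theta):
    """Largest |difference| between the rebuilt and the expected metric components."""
    g = metric_from_tetrad(mass, r, theta)
    e = expected_metric(mass, r, theta)
    return max(abs(g[a][b] - e[a][b]) for a in range(4) for b in range(4))
```

A vanishing mismatch means the tetrad reproduces the Vaidya metric, which is equivalent to the usual null tetrad normalization.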
After computing all of the ωρμν using (5.7.19) [in which (eμ )a should be interpreted as
(εμ )a ] and (5.7.20) or any other method, one can find all of the 12 spin coefficients using
(8.7.8) as follows:
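\[ \kappa = \sigma = \nu = \tau = \lambda = \pi = \varepsilon = 0, \tag{8.9.22a} \]
\[ \rho = -\frac{1}{r}, \quad \mu = -\frac{1}{2r}\left[1 - \frac{2m(u)}{r}\right], \quad \gamma = \frac{m}{2r^2}, \quad \beta = -\alpha = \frac{1}{2\sqrt{2}\,r}\cot\theta. \tag{8.9.22b} \]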
Using (8.9.22a) one can simplify the NP equations into the following form:
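\[ D\rho = \rho^2 + \Phi_{00}, \tag{8.9.23a} \]
\[ 0 = \Psi_0, \tag{8.9.23b} \]
\[ 0 = \Psi_1 + \Phi_{01}, \tag{8.9.23c} \]
\[ D\alpha = \alpha\rho + \Phi_{10}, \tag{8.9.23d} \]
\[ D\beta = \beta\bar\rho + \Psi_1, \tag{8.9.23e} \]
\[ D\gamma = \Psi_2 + \Phi_{11} - R/24, \tag{8.9.23f} \]
\[ 0 = \Phi_{20}, \tag{8.9.23g} \]
\[ D\mu = \bar\rho\mu + \Psi_2 + R/12, \tag{8.9.23h} \]
\[ 0 = \Psi_3 + \Phi_{21}, \tag{8.9.23i} \]
\[ 0 = -\Psi_4, \tag{8.9.23j} \]
\[ \delta\rho = \rho(\bar\alpha + \beta) - \Psi_1 + \Phi_{01}, \tag{8.9.23k} \]
\[ \delta\alpha - \bar\delta\beta = \mu\rho + (\alpha\bar\alpha + \beta\bar\beta - 2\alpha\beta) - \Psi_2 + \Phi_{11} + R/24, \tag{8.9.23l} \]
\[ -\bar\delta\mu = -\Psi_3 + \Phi_{21}, \tag{8.9.23m} \]
\[ -\Delta\mu = \mu^2 + \mu(\gamma + \bar\gamma) + \Phi_{22}, \tag{8.9.23n} \]
\[ -\Delta\beta = \gamma(-\bar\alpha - \beta) - \beta(\gamma - \bar\gamma - \mu) + \Phi_{12}, \tag{8.9.23o} \]
\[ 0 = \Phi_{02}, \tag{8.9.23p} \]
\[ \Delta\rho = -\rho\bar\mu + \rho(\gamma + \bar\gamma) - \Psi_2 - R/12, \tag{8.9.23q} \]
\[ \Delta\alpha = \alpha(\bar\gamma - \bar\mu) + \gamma\bar\beta - \Psi_3. \tag{8.9.23r} \]
Plugging (8.9.22b) into (8.9.23), one can readily find the 5 complex quantities $\Psi_0 \sim \Psi_4$ representing the Weyl tensor and the 4 real quantities $\Phi_{00}, \Phi_{11}, \Phi_{22}, R$ representing the Ricci tensor as well as 3 independent complex quantities $\Phi_{01}, \Phi_{02}, \Phi_{12}$. Among them only two are nonvanishing:
\[ \Psi_2 = -m(u)/r^3, \tag{8.9.24} \]
\[ \Phi_{22} = -\dot m(u)/r^2. \tag{8.9.25} \]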
Noticing (8.7.11a), especially $\Phi_{22} = R_{33}/2$ therein, we can see that the Ricci tensor of the Vaidya metric is
\[ R_{ab} = R_{33}(\varepsilon^3)_a(\varepsilon^3)_b = R_{33}(-k_a)(-k_b) = 2\Phi_{22}k_ak_b = -2\dot m(u)r^{-2}(du)_a(du)_b, \tag{8.9.26} \]
which agrees with the $R_{ab}$ [see (8.9.11)] derived using the coordinate basis method [Equation (3.4.21)].
Using the above result one can now also show that the matter field corresponding to the Vaidya metric is not an electromagnetic field. In the NP formalism, an electromagnetic field $F_{ab}$ is represented by complex quantities $\Phi_0, \Phi_1, \Phi_2$, whose relations with $\Phi_{00}, \cdots, \Phi_{22}$ representing the Ricci tensor are given in (8.8.8). Since $\Phi_{22}$ is the only nonvanishing one among $\Phi_{00}, \cdots, \Phi_{22}$, (8.8.8) gives
\[ \Phi_0 = \Phi_1 = 0, \qquad \Phi_2 = Ae^{i\alpha}, \tag{8.9.27} \]
where $A \equiv \sqrt{-\dot m(u)/2}\;r^{-1}$, and $\alpha$ is a real function of the coordinates. Plugging (8.9.27) into the source-free Maxwell equations (8.8.3), one finds that (a), (c) are identities and (b), (d) lead to, respectively,
\[ \frac{\partial\alpha}{\partial r} = 0, \qquad -\frac{1}{\sin\theta}\frac{\partial\alpha}{\partial\varphi} - i\frac{\partial\alpha}{\partial\theta} = \cot\theta. \tag{8.9.28} \]
The first equation indicates that $\alpha = \alpha(u, \theta, \varphi)$, and the real and imaginary parts of the second equation give $\partial\alpha/\partial\theta = 0$ [and hence $\alpha = \alpha(u, \varphi)$] and $\partial\alpha/\partial\varphi = -\cos\theta$. These two equations contradict each other. Thus, the matter field of the Vaidya metric is not an electromagnetic field, and therefore can only be a pure radiation field.
[The End of Optional Reading 8.9.1]
The Vaidya metric is a generalization of the Schwarzschild metric, and a new met-
ric defined by W. Kinnersley is a generalization of the Vaidya metric [Kinnersley
(1969)]. Now we introduce this metric. Suppose L(u) is an arbitrary smooth time-
like curve (imagine it as the world line of a rocket) in 4-dimensional Minkowski
spacetime (R4 , ηab ), where u is the proper time. (Here we use u instead of τ , the
purpose will be clear later). Following Kinnersley, we will use λa (instead of U a in
the convention of this text) to represent the 4-velocity of L(u), i.e., λa ≡ (∂/∂u)a .
Suppose p is an arbitrary point in R4 , then L and the past light cone surface of p
have exactly one intersection,12 denoted by q (see Fig. 8.11). Let {X μ } be an arbi-
trary inertial coordinate system, λμ be the components of λa in this system, and
ψ a , ξ a be the position vectors of p, q in this system, i.e., ψ a ≡ ψ μ (∂/∂ X μ )a | p ,
ξ a ≡ ξ μ (∂/∂ X μ )a |q , where ψ μ ≡ X μ ( p), ξ μ ≡ X μ (q). Originally, u and λa are
12 There is an exception when L(u) is asymptotically null (e.g., the hyperbola in Exercise 6.13).
only a scalar field and a vector field defined on L(u); however, their domains can be
naturally extended to the whole R4 : ∀ p ∈ R4 , we have a unique q ∈ L, and thus we
can define u( p) := u(q), λμ ( p) := λμ (q). [Define λa | p by defining its coordinate
components λμ ( p), i.e., λa | p := λμ (q)(∂/∂ X μ )a | p ]. Thus, the parametric equations
for each integral curve C(u) of λa in the coordinate system {X μ } are
[Because the tangent of the curve represented by the above parametric equations has
components dX μ (u)/du = dξ μ (u)/du = λμ in the system {X μ }. When σ μ = 0 the
above equation will degenerate to X μ (u) = ξ μ , namely the parametric equations of
L(u)]. This indicates that the λa of any point p satisfies
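\[ \lambda^a\partial_a u = (\partial/\partial u)^a\partial_a u = 1. \tag{8.9.30} \]
\[ \sigma^a = r\lambda^a + \hat\sigma^a. \tag{8.9.31} \]
Contracting both sides of this equation with $\lambda_a \equiv \eta_{ab}\lambda^b$, and noticing that $\lambda^a\lambda_a = -1$ and that $\hat\sigma^a$ is orthogonal to $\lambda^a$, we obtain
\[ r = -\lambda_a\sigma^a. \tag{8.9.32} \]
Also, let
\[ k^a \equiv r^{-1}\sigma^a, \qquad n^a \equiv r^{-1}\hat\sigma^a, \tag{8.9.33} \]
then we have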
where $m(u)$ is a function of $u$. Now that there are two metrics ($\eta_{ab}$ and $g_{ab}$) on $\mathbb{R}^4$ [with $L(u)$ removed], we need to pay additional attention to raising and lowering indices (and other constructions involving a metric). For those quantities defined as vectors (each carries an upper index) in the first place, such as $\lambda^a$, $\sigma^a$ and $k^a$, it is crystal clear. We stipulate that for all the tensors obtained by raising and lowering indices (e.g., $\lambda_a$, $\sigma_a$, $k_a$), the indices are raised and lowered by $\eta_{ab}$. For those tensors whose indices are raised and lowered by $g_{ab}$ we will write out $g_{ab}$ explicitly; for instance, $g_{ab}\lambda^b$ is not equal to $\lambda_a$ ($= \eta_{ab}\lambda^b$).13
First we discuss the simple case where $L(u)$ is a geodesic of $\eta_{ab}$ (we will call it an η-geodesic for brevity). In this case, the Kinnersley metric (8.9.36) comes down to the Vaidya metric (when $\dot m \neq 0$) or the Schwarzschild metric (when $\dot m = 0$). In order to see this, one only needs to write out the line element of $g_{ab}$ in an appropriate coordinate system $\{u, r, \theta, \varphi\}$ and compare it with (8.9.4). Take $u$ and $r$, which are already defined for each point, as the first two coordinates of the system $\{u, r, \theta, \varphi\}$, and leave $\theta$ and $\varphi$ to be defined below. Suppose $\{T, X, Y, Z\}$ is the inertial coordinate system of $\eta_{ab}$ whose origin of the spatial coordinates ($X = Y = Z = 0$) as a world line coincides with the geodesic $L(u)$; then the components of $\lambda^a$ in this coordinate system are $\lambda^\mu = (1, 0, 0, 0)$ (see Fig. 8.13), and hence the $r$ in (8.9.32) satisfies
\[ r = -\eta_{\mu\nu}\sigma^\mu\lambda^\nu = -\eta_{00}\sigma^0\lambda^0 = \sigma^0. \]
On the other hand, the 3-dimensional space $\Sigma_p$ in Fig. 8.13 can be viewed as the whole space at the time of $p$. From the figure we can see that $\sigma^0$ = the length of the line segment $qa$ = the length of the line segment $ap$, and thus $r = \sigma^0$ indicates that the value of $r$ at $p$ is the spatial distance between $p$ and the geodesic $L(u)$. Set up a spherical coordinate system $\{r, \theta, \varphi\}$ on $\Sigma_p$ with $a$ as the origin and $r$ as the radial coordinate, in which $\theta$ and $\varphi$ are defined as follows:
Combining this $\{r, \theta, \varphi\}$ with $u$ yields the 4-dimensional coordinate system we want. The $u$ and $r$ of this system and the $T$ of $\{T, X, Y, Z\}$ have the following relation: $T = u + r$. Hence, the line element of $\eta_{ab}$ in this system is $-du^2 - 2du\,dr + r^2(d\theta^2 + \sin^2\theta\,d\varphi^2)$, and therefore the line element of the Kinnersley metric $g_{ab}$ is
13However, gab k b is equal to ka (=ηab k b ). This is because it follows from (8.9.36) that gab k b =
ηab k b + 2mr −1 (du)a (du)b k b , and k b (du)b = k b ∂b u = 0 (According to the definition of u, it is a
constant on an integral curve of k a ).
which has the same form as (8.9.4). Thus, the Kinnersley metric (8.9.36) comes down to the Vaidya metric (when $\dot m \neq 0$) or the Schwarzschild metric (when $\dot m = 0$) in the case where $L(u)$ is an η-geodesic. The actual generalization by Kinnersley is the case where $L(u)$ is not an η-geodesic, which will be discussed in detail below.
Take the $Z$-axis of this system as the polar axis, and define the coordinates $\theta$ and $\varphi$ on the future light cone surface of $q$ (including $q$) as follows:
Since the direction of the 4-acceleration $\dot\lambda^a$ changes continuously as $q$ moves along $L$, when defining $\theta$ and $\varphi$ we need to keep rotating the direction pointing at the north pole in order to guarantee that it stays aligned with $\dot\lambda^a$.
By a calculation based on the preceding discussion (see Optional Reading 8.9.2
for details) one can find all the nonvanishing components of the Kinnersley metric
gab in the system {u, r, θ, ϕ}:
with
\[ f \equiv a(u)\sin\theta + b(u)\sin\varphi - c(u)\cos\varphi , \qquad g \equiv [b(u)\cos\varphi + c(u)\sin\varphi]\cot\theta , \tag{8.9.39b} \]
where
is the magnitude of the 4-acceleration of L(u),14 and b and c describe the time rate of
change (u as the time) of the direction of λ̇a , see Optional Reading 8.9.2 for details.
If a segment of L(u) is a timelike hyperbola (see Exercise 6.13), then a = constant
and b = c = 0 in this segment.
One can further calculate the Ricci tensor Rab and scalar curvature R of the
Kinnersley metric:
\[ T_{ab} = -\frac{1}{4\pi r^2}(\dot m + 3ma\cos\theta)\,k_a k_b . \tag{8.9.41} \]
Similar to (8.9.12 ), the matter field corresponding to the above expression is also a
pure radiation field rather than an electromagnetic field. Although this matter field
is formed by massless particles which are not photons, we will refer to them as
“photons” for the sake of convenience.
14Note that the a(u) defined in this text has a sign difference compared with Kinnersley (1969) and
Bonnor (1994).
As we have mentioned, the Kinnersley metric comes down to the Vaidya metric when $L(u)$ is an $\eta$-geodesic and $\dot m \neq 0$. By means of $L(u)$ we can provide a more intuitive interpretation for the physical meaning of the Vaidya metric. In this interpretation one should note that there are two metric fields on $\mathbb{R}^4$, namely $\eta_{ab}$ and the Vaidya metric $g^{(\mathrm{Vai})}_{ab}$; the geodesic, 4-acceleration, etc. we mentioned above are all measured by $\eta_{ab}$ and its associated derivative operator $\partial_a$.
One may imagine this: a star is undergoing geodesic motion (inertial motion) in Minkowski space with $L(u)$ as its world line. Since it keeps emitting particles, its mass (energy) keeps decreasing ($\dot m < 0$). The 4-momentum of the star together with the energy-momentum $T_{ab}$ of the surrounding radiation field produce a gravitational field which makes the spacetime curved, and the spacetime is described by $g^{(\mathrm{Vai})}_{ab}$. [However, one cannot ask a question like “is the world line a geodesic measured by $g^{(\mathrm{Vai})}_{ab}$” since $g^{(\mathrm{Vai})}_{ab}$ is not defined on the curve ($r = 0$)]. Since the geodesic $L(u)$ “holds the scales even”, i.e., it is isotropic, $g^{(\mathrm{Vai})}_{ab}$ has spherical symmetry, but $\dot m \neq 0$ makes it lose the stationarity. This intuitive physical interpretation can also be carried over to the Kinnersley metric $g^{(\mathrm{Kin})}_{ab}$. Now $L(u)$ is not an $\eta$-geodesic, and its radiation is not isotropic anymore; hence, it is not appropriate to regard $L(u)$ as the world line of a star. Thus, we now change the star to a rocket, which keeps emitting “photons” outwards in an anisotropic manner (to some extent similar to a real rocket emitting jets), and hence is called a “photon rocket” in the literature. The recoil experienced by this rocket as it emits photons makes its energy and 3-momentum keep changing; the former is manifested by $\dot m < 0$, and the latter renders the time rate of change of the 3-momentum nonvanishing. Formulated in the 4-dimensional language, using $P^a$ to represent the 4-momentum of the rocket, we have $P^a = m\lambda^a$, and hence its time rate of change is $\dot P^a = \dot m\lambda^a + m\dot\lambda^a$, where $\dot P^a \equiv \lambda^b\partial_b P^a$. The first and second terms represent the time rates of change of the energy and 3-momentum, respectively. In the instantaneous rest inertial frame $\{X^\mu\}$ at $q$, the component expression for this equation reads
\[ \dot P^\mu = \dot m\lambda^\mu + m\dot\lambda^\mu . \tag{8.9.42} \]
Since at q we have
Now we will show that the energy and momentum increasing rates of the rocket
caused by this recoil are exactly the energy and momentum carried by the “photons”
emitted by the rocket to infinity per unit time times −1. To do so, we should calculate
390 8 Solving Einstein’s Equation
the energy and momentum flowing out of the sphere $S$ in Fig. 8.14. Suppose $\{X^\mu\}$ is an instantaneous rest inertial frame at $q$, then $\{(e_\mu)^a\} \equiv \{(\partial/\partial X^\mu)^a\}$ is an orthonormal tetrad field on $\mathbb{R}^4$. Sect. 6.4 points out that $T^{0j}$ ($= -T_{0j}$) is the $j$-component of the energy flux density, and hence $T^{0j}(e_j)^a$ is the energy flux density vector. Therefore,
\[ \text{energy flowing out of } S \text{ per unit time} = \iint_S T^{0j}(e_j)^a n_a\, dS = \iint_S T^{0j} n_j\, dS , \tag{8.9.45} \]
where $n_a \equiv \eta_{ab}n^b$, while $n^b$ is the outgoing unit normal vector of the sphere $S$, namely the $n^a$ in (8.9.33). Moreover, Sect. 6.4 also points out that $T^{ij}(e_i)^a(e_j)^b$ is the 3-momentum flux density tensor, whose contraction with any spatial unit vector gives the 3-momentum flux density vector. Therefore,
\[ \text{3-momentum flowing out of } S \text{ per unit time} = \iint_S T^{ij}(e_i)^a(e_j)^b n_b\, dS = \iint_S T^{ij}(e_i)^a n_j\, dS , \tag{8.9.46} \]
\[ i\text{-component of the 3-momentum flowing out of } S \text{ per unit time} = \iint_S T^{ij} n_j\, dS . \tag{8.9.47} \]
It follows from the definition of the instantaneous rest inertial frame {X μ } at q and
the coordinates θ and ϕ that at any point on S we have (see Figs. 8.14 and 8.12)
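Plugging in $k^a = \lambda^a + n^a$ (so that $k^\nu n_\nu = 1$) yields
\[ T^{\mu\nu}n_\nu = -\frac{1}{4\pi r^2}(\dot m + 3ma\cos\theta)\,k^\mu k^\nu n_\nu = -\frac{1}{4\pi r^2}(\dot m + 3ma\cos\theta)\,k^\mu , \tag{8.9.51} \]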
and hence
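\begin{align}
\text{energy flowing out of } S \text{ per unit time} &= -\frac{1}{4\pi r^2}\iint_S (\dot m + 3ma\cos\theta)\,k^0\, dS \nonumber\\
&= -\frac{1}{4\pi r^2}\int_0^{2\pi} d\varphi \int_0^{\pi} (\dot m + 3ma\cos\theta)\, r^2\sin\theta\, d\theta = -\dot m , \tag{8.9.52a}
\end{align}
\begin{align}
\text{the 3rd component of the 3-momentum flowing out of } S \text{ per unit time} &= -\frac{1}{4\pi r^2}\iint_S (\dot m + 3ma\cos\theta)\,k^3\, dS \nonumber\\
&= -\frac{1}{4\pi r^2}\int_0^{2\pi} d\varphi \int_0^{\pi} (\dot m + 3ma\cos\theta)\cos\theta\, r^2\sin\theta\, d\theta = -ma , \tag{8.9.52b}
\end{align}
\[ \text{the 1st and 2nd components of the 3-momentum flowing out of } S \text{ per unit time} = 0 . \tag{8.9.52c} \]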
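The three integrals in (8.9.52) can be reproduced with a short SymPy sketch. It assumes that the components of $k^a$ on $S$ in the instantaneous rest frame at $q$ are $k^\mu = (1, \sin\theta\cos\varphi, \sin\theta\sin\varphi, \cos\theta)$, consistent with the values $k^0 = 1$ and $k^3 = \cos\theta$ used in (8.9.52a) and (8.9.52b); the function name `flux` is illustrative.

```python
import sympy as sp

theta, phi, r = sp.symbols("theta phi r", positive=True)
m, mdot, a = sp.symbols("m mdot a", real=True)

# Assumed components k^mu on the sphere S (instantaneous rest frame at q).
k = [1,
     sp.sin(theta) * sp.cos(phi),
     sp.sin(theta) * sp.sin(phi),
     sp.cos(theta)]

# T^{mu nu} n_nu from (8.9.51), without the factor k^mu.
prefactor = -(mdot + 3 * m * a * sp.cos(theta)) / (4 * sp.pi * r**2)
dS = r**2 * sp.sin(theta)  # area element on S, per d(theta) d(phi)


def flux(mu):
    """Integrate T^{mu nu} n_nu over S for the component mu."""
    integrand = prefactor * k[mu] * dS
    return sp.simplify(
        sp.integrate(integrand, (theta, 0, sp.pi), (phi, 0, 2 * sp.pi))
    )


fluxes = [flux(mu) for mu in range(4)]
print(fluxes)
```

The radius $r$ drops out of every component, which is why the same values hold in the limit $r \to \infty$.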
Equations (8.9.52) also hold when r → ∞. Comparing them with (8.9.44) proves
the conclusion we claimed above, i.e., the increasing rates of the rocket’s energy and
momentum are exactly the energy and momentum carried by the “photons” it emits
to infinity per unit time times −1.
Based on the physical interpretation above, we may refer to the Kinnersley solution as the “solution of an arbitrary accelerating point mass”, or say that the Kinnersley metric represents the “gravitational field of an arbitrary accelerating point mass”. However, one should note that: ① “accelerating point mass” means that the 4-acceleration $\dot\lambda^a \equiv \lambda^b\partial_b\lambda^a$ of the rocket is nonvanishing ($a \neq 0$), and this 4-acceleration is measured by $\eta_{ab}$. Why is it not measured by $g_{ab}$? The answer is: the world line of the rocket has $r = 0$, while $g_{ab}$ is not well-defined (is singular) on this curve, and so it cannot be used to measure any quantity on the rocket’s world line. ② This “gravitational field of an accelerating point mass” is generated by the point mass (rocket) together with the “photons” it emits, and the $T_{ab}$ corresponding to $g_{ab}$ is the energy-momentum tensor of the pure radiation field outside the rocket.
The preceding discussion about the Kinnersley metric also has a few subtleties,
as we will list below:
(1) When computing the energy and momentum flowing out of the sphere $S$, we have tacitly used $\eta_{ab}$ for everything that involves a metric; however, the metric of Kinnersley spacetime is supposed to be the Kinnersley metric, and so the legitimacy of the above calculation should be called into question. Regarding this, Bonnor (1994) provides an answer as follows (gist, not exact words): the difference between $g_{ab}$ and $\eta_{ab}$ is only in the term with $mr^{-1}$. Adding this term will affect the normalization of the $n_\nu$ in (8.9.51), but its contribution to the integral will approach zero when $S$ approaches infinity. Thus, ignoring the term with $mr^{-1}$ does not affect the result.
(2) For any matter field known to physicists, the energy density measured by any observer at any time is non-negative (called the weak energy condition, see Appendix D in Volume II for details). Suppose $(p, Z^a)$ is an arbitrary instantaneous observer, then it follows from (8.9.41) that
\[ T_{00} = T_{ab}Z^aZ^b = -\frac{1}{4\pi r^2}(k_aZ^a)^2(\dot m + 3ma\cos\theta) . \]
When $a = 0$ (Vaidya), we only have to let $\dot m < 0$ to guarantee $T_{00} > 0$. However, the case where $a \neq 0$ is not that simple since $\cos\theta$ can be both positive and negative. Nevertheless, as long as we assume $m > 0$, it is not difficult to see that $T_{00} > 0$ is equivalent to $-\dot m/3m > a\cos\theta$. Therefore, in order to make $T_{00}$ non-negative for any value of $\theta$, besides $\dot m < 0$, we should also require that $a \leq -\dot m/3m$. One can consider this as a constraint, coming from the energy condition, on the relation between the two parameters $m$ and $a$ of the Kinnersley metric.
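The constraint can be illustrated numerically. The sketch below samples $\theta \in [0, \pi]$ and tests the sign of $\dot m + 3ma\cos\theta$, which must be non-positive for $T_{00} \geq 0$ because $(k_aZ^a)^2/4\pi r^2$ is positive. The function name, the sampling density and the parameter values are all illustrative assumptions.

```python
import math


def energy_density_nonnegative(m, mdot, a, samples=1801):
    """Return True if mdot + 3*m*a*cos(theta) <= 0 at every sampled theta.

    This is the sign condition for T_00 >= 0; theta is sampled
    uniformly on [0, pi], endpoints included.
    """
    for i in range(samples):
        theta = math.pi * i / (samples - 1)
        if mdot + 3.0 * m * a * math.cos(theta) > 0.0:
            return False
    return True


# Illustrative values with m = 1 and mdot = -3, so that -mdot/(3m) = 1.
print(energy_density_nonnegative(1.0, -3.0, 0.5))  # a below the bound
print(energy_density_nonnegative(1.0, -3.0, 1.5))  # a above the bound
```

With these values the bound is $-\dot m/3m = 1$, so the first call satisfies the condition and the second violates it near $\theta = 0$.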
(3) Bonnor (1994) points out that, since the rocket undergoes an accelerating motion, it should emit gravitational waves which carry energy and momentum out to infinity. However, we have proved, without taking gravitational waves into account, that the energy and momentum carried to infinity by the “photons” alone already satisfy the balance requirement, i.e., they are exactly the energy and momentum increasing rates times $-1$. This implies that the energy and momentum carried by gravitational waves to infinity vanish. Hence, there is a paradox: does the Kinnersley spacetime have any gravitational radiation at all? Regarding this problem, Damour (1995) and Dain et al. (1996) studied the gravitational radiation of the Kinnersley metric using very different approaches, and the basic conclusion is: both the point-like accelerating rocket and the “photons” surrounding it emit gravitational radiation; the energy and momentum carried by them cancel each other, and so overall there are no gravitational waves in Kinnersley spacetime (the energy and momentum carried by the gravitational waves to infinity vanish).
[Optional Reading 8.9.2]
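Now we provide the detailed derivation of (8.9.39). It follows from (8.9.36) that among all the components of $g_{ab}$ and $\eta_{ab}$ in the system $\{u, r, \theta, \varphi\}$ the only different one is the $uu$-component. Specifically, if we use $g_{uu}, g_{ur}, \cdots, g_{\varphi\varphi}$ and ${}^0g_{uu}, {}^0g_{ur}, \cdots, {}^0g_{\varphi\varphi}$ to represent the components of $g_{ab}$ and $\eta_{ab}$ in the system $\{u, r, \theta, \varphi\}$, then
\begin{align}
& g_{uu} = {}^0g_{uu} + 2mr^{-1} , \quad g_{ur} = {}^0g_{ur} , \quad g_{u\theta} = {}^0g_{u\theta} , \quad g_{u\varphi} = {}^0g_{u\varphi} , \tag{8.9.53}\\
& g_{rr} = {}^0g_{rr} , \quad g_{r\theta} = {}^0g_{r\theta} , \quad g_{r\varphi} = {}^0g_{r\varphi} , \quad g_{\theta\theta} = {}^0g_{\theta\theta} , \quad g_{\theta\varphi} = {}^0g_{\theta\varphi} , \quad g_{\varphi\varphi} = {}^0g_{\varphi\varphi} . \nonumber
\end{align}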
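\begin{align}
{}^0g_{uu} &= \eta_{\mu\nu}\frac{\partial X^\mu}{\partial u}\frac{\partial X^\nu}{\partial u} = \eta_{\mu\nu}(r\dot k^\mu + \dot\xi^\mu)(r\dot k^\nu + \dot\xi^\nu) \nonumber\\
&= r^2\eta_{\mu\nu}\dot k^\mu\dot k^\nu + 2r\eta_{\mu\nu}\dot k^\mu\dot\xi^\nu + \eta_{\mu\nu}\dot\xi^\mu\dot\xi^\nu , \nonumber
\end{align}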
where the dotted quantities stand for the (partial) derivatives with respect to $u$, e.g., $\dot\xi^0 \equiv d\xi^0/du$, $\dot k^1 \equiv \partial k^1/\partial u$. Since the parametric equations of the curve $L(u)$ are $X^\mu(u) = \xi^\mu(u)$, $\dot\xi^\mu \equiv d\xi^\mu/du$ is equal to the components $\lambda^\mu$ of the tangent vector $\lambda^a$ of $L(u)$ in the system $\{X^\mu\}$. From $\eta_{\mu\nu}\lambda^\mu\lambda^\nu = -1$ we obtain
\[ {}^0g_{uu} = -1 + 2r\eta_{\mu\nu}\dot k^\mu\lambda^\nu + r^2\eta_{\mu\nu}\dot k^\mu\dot k^\nu . \tag{8.9.55a} \]
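Second,
\[ {}^0g_{ur} = \eta_{\mu\nu}\frac{\partial X^\mu}{\partial u}\frac{\partial X^\nu}{\partial r} = \eta_{\mu\nu}(r\dot k^\mu + \dot\xi^\mu)k^\nu = r\eta_{\mu\nu}\dot k^\mu k^\nu + \eta_{\mu\nu}\lambda^\mu k^\nu = -1 , \tag{8.9.55b} \]
where $\eta_{\mu\nu}\dot k^\mu k^\nu = 0$ can be derived from $\eta_{\mu\nu}k^\mu k^\nu = 0$, while $\eta_{\mu\nu}\lambda^\mu k^\nu = -1$ comes from $\lambda_a k^a = -1$. In a similar manner one can find the expressions for the other components of $\eta_{ab}$ in $\{u, r, \theta, \varphi\}$: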
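\begin{align}
{}^0g_{u\theta} &= r^2\eta_{\mu\nu}\dot k^\mu k^\nu{}_{,\theta} + r\eta_{\mu\nu}\lambda^\mu k^\nu{}_{,\theta} , \tag{8.9.55c}\\
{}^0g_{u\varphi} &= r^2\eta_{\mu\nu}\dot k^\mu k^\nu{}_{,\varphi} + r\eta_{\mu\nu}\lambda^\mu k^\nu{}_{,\varphi} , \tag{8.9.55d}\\
{}^0g_{rr} &= \eta_{\mu\nu}k^\mu k^\nu = 0 , \tag{8.9.55e}\\
{}^0g_{r\theta} &= r\eta_{\mu\nu}k^\mu k^\nu{}_{,\theta} = 0 , \tag{8.9.55f}\\
{}^0g_{r\varphi} &= r\eta_{\mu\nu}k^\mu k^\nu{}_{,\varphi} = 0 , \tag{8.9.55g}\\
{}^0g_{\theta\theta} &= r^2\eta_{\mu\nu}k^\mu{}_{,\theta}k^\nu{}_{,\theta} , \tag{8.9.55h}\\
{}^0g_{\theta\varphi} &= r^2\eta_{\mu\nu}k^\mu{}_{,\theta}k^\nu{}_{,\varphi} , \tag{8.9.55i}\\
{}^0g_{\varphi\varphi} &= r^2\eta_{\mu\nu}k^\mu{}_{,\varphi}k^\nu{}_{,\varphi} , \tag{8.9.55j}
\end{align}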
where the second equalities in (f) and (g) come from $\eta_{\mu\nu}k^\mu k^\nu = 0$. In order to find the final form of the above expressions, one must compute the partial derivatives of $k^\mu$ with respect to $u$, $\theta$ and $\varphi$, i.e., $\dot k^\mu$, $k^\mu{}_{,\theta}$ and $k^\mu{}_{,\varphi}$. To find $k^\mu{}_{,\theta}$ and $k^\mu{}_{,\varphi}$, one only needs to care about the $k^\mu$ on the future light cone surface with the fixed $q$ being the apex. In this case (8.9.50) holds, and we can again list the following expression (with a new equation number):
and hence
\[ \dot k^\mu|_p = \lim_{du\to 0}\frac{\chi^\mu - k^\mu}{du} , \tag{8.9.57} \]
where $k^\mu$ and $\chi^\mu$ are respectively the components of $k^a$ and $\chi^a$ in the instantaneous rest inertial coordinate system $\{X^\mu\} \equiv \{T, X, Y, Z\}$ at $q$. Now that $k^\mu$ has already been expressed as (8.9.56a), the main thing is how to derive $\chi^\mu$. Let $\{\tilde X^\mu\} \equiv \{\tilde T, \tilde X, \tilde Y, \tilde Z\}$ represent the instantaneous rest inertial coordinate system at $\tilde q$ [according to the definition in the paragraph containing (8.9.38), one only needs to change $q$ to $\tilde q$], then the components of $\chi^a$ in the system $\{\tilde X^\mu\}$ are
\[ \tilde\chi^\mu = (1, \sin\theta\cos\varphi, \sin\theta\sin\varphi, \cos\theta) . \tag{8.9.58} \]
To derive $\chi^\mu$ from $\tilde\chi^\mu$, we should first clarify the relation between the systems $\{\tilde X^\mu\}$ and $\{X^\mu\}$. According to our requirement, the $Z$-axis in $\{X^\mu\}$ should be aligned with the direction of $\dot\lambda^a|_q$, and the $\tilde Z$-axis in $\{\tilde X^\mu\}$ should be aligned with the direction of $\dot\lambda^a|_{\tilde q}$. Note that $\{\tilde X^\mu\}$ and $\{X^\mu\}$ are inertial coordinate systems in two different inertial reference frames $\mathscr{R}_{\tilde q}$ and $\mathscr{R}_q$, since the $T$-coordinate line $G_q$ and the $\tilde T$-coordinate line $G_{\tilde q}$ [the $\eta$-geodesics tangent to $L(u)$ at $q$ and $\tilde q$, respectively] are not parallel in general. However, since both of them are inertial coordinate systems, one can always transform one into the other via an appropriate translation and Lorentz transformation. This transformation can be realized by the following three steps: ① Translate the origin of $\{X^\mu\}$ (namely the point with $T = X = Y = Z = 0$) from $q$ to $\tilde q$ and obtain a coordinate system $\{X'^\mu\}$. ② Use a boost in the $TZ$-plane to transform $\{X'^\mu\}$ into another system $\{\hat X^\mu\}$ (where the $\hat T$-axis is parallel to the $\tilde T$-axis). This is an inertial coordinate system in the inertial reference frame $\mathscr{R}_{\tilde q}$ just like $\{\tilde X^\mu\}$, except that the $\hat Z$-axis is in general not parallel to $\dot\lambda^a|_{\tilde q}$, which is the key difference between $\{\hat X^\mu\}$ and $\{\tilde X^\mu\}$. ③ Apply a spatial rotation $R$ to $\{\hat X^\mu\}$ and turn it into $\{\tilde X^\mu\}$, in which the $\tilde Z$-axis is aligned with $\dot\lambda^a|_{\tilde q}$. This $R$ can be considered as two rotations $R_1$ and $R_2$ acting successively (a composite map). $R_1$ is a rotation around the $\hat X$-axis that turns the $\hat Z$-axis to a new position (denoted by $\tilde{\hat Z}$, see Fig. 8.15), which is the intersection of the $\hat Y\hat Z$-plane and the cone with the $\hat Y$-axis as the axis and $\tilde Z$ as a generatrix; $R_2$ is a rotation around the $\hat Y$-axis that turns the $\tilde{\hat Z}$-axis to the $\tilde Z$-axis. Suppose the angles for $R_1$ and $R_2$ are $b\,du$ and $c\,du$.15 These three steps can be expressed as
15 After these two rotations, the $X$-axis may still not coincide with the $\tilde X$-axis, but this is not a problem since the choice of the $X$-axis of the instantaneous rest inertial frame at each point is flexible. One should “foresee” this and choose the $\tilde X$-axis based on the result of rotating the $X$-axis.
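\[ \begin{bmatrix} \hat\chi^1 \\ \hat\chi^2 \\ \hat\chi^3 \end{bmatrix} = R \begin{bmatrix} \tilde\chi^1 \\ \tilde\chi^2 \\ \tilde\chi^3 \end{bmatrix} , \tag{8.9.60} \]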
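where $R = R_2R_1$ is the $3 \times 3$ matrix described by the rotation angles $b\,du$ and $c\,du$. From Fig. 8.15 and Appendix G in Volume II we have
\begin{align}
R = R_2R_1 &= \begin{bmatrix} \cos(c\,du) & 0 & \sin(c\,du) \\ 0 & 1 & 0 \\ -\sin(c\,du) & 0 & \cos(c\,du) \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos(b\,du) & -\sin(b\,du) \\ 0 & \sin(b\,du) & \cos(b\,du) \end{bmatrix} \nonumber\\
&= \begin{bmatrix} \cos(c\,du) & \sin(b\,du)\sin(c\,du) & \cos(b\,du)\sin(c\,du) \\ 0 & \cos(b\,du) & -\sin(b\,du) \\ -\sin(c\,du) & \sin(b\,du)\cos(c\,du) & \cos(b\,du)\cos(c\,du) \end{bmatrix} . \nonumber
\end{align}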
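Plugging the above equation and the $\tilde\chi^i$ given by (8.9.58) into (8.9.60) yields, to first order in $du$,
\begin{align}
\begin{bmatrix} \hat\chi^1 \\ \hat\chi^2 \\ \hat\chi^3 \end{bmatrix} &= \begin{bmatrix} 1 & 0 & c\,du \\ 0 & 1 & -b\,du \\ -c\,du & b\,du & 1 \end{bmatrix} \begin{bmatrix} \sin\theta\cos\varphi \\ \sin\theta\sin\varphi \\ \cos\theta \end{bmatrix} \nonumber\\
&= \begin{bmatrix} \sin\theta\cos\varphi + c\,du\cos\theta \\ \sin\theta\sin\varphi - b\,du\cos\theta \\ -c\,du\sin\theta\cos\varphi + b\,du\sin\theta\sin\varphi + \cos\theta \end{bmatrix} . \tag{8.9.62a}
\end{align}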
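The matrix product and its first-order form can be checked with SymPy. In this sketch the linearization is taken as the value of $R$ at $du = 0$ plus $du$ times its first derivative there; the variable names are illustrative.

```python
import sympy as sp

b, c, du, th, ph = sp.symbols("b c du theta phi", real=True)

# R1: rotation about the X-hat axis by angle b*du.
R1 = sp.Matrix([[1, 0, 0],
                [0, sp.cos(b * du), -sp.sin(b * du)],
                [0, sp.sin(b * du), sp.cos(b * du)]])

# R2: rotation about the Y-hat axis by angle c*du.
R2 = sp.Matrix([[sp.cos(c * du), 0, sp.sin(c * du)],
                [0, 1, 0],
                [-sp.sin(c * du), 0, sp.cos(c * du)]])

R = R2 * R1

# First-order Taylor expansion of R in du.
R_lin = R.subs(du, 0) + du * R.diff(du).subs(du, 0)

# Spatial components of chi in the tilde system, from (8.9.58).
chi_tilde = sp.Matrix([sp.sin(th) * sp.cos(ph),
                       sp.sin(th) * sp.sin(ph),
                       sp.cos(th)])

chi_hat = (R_lin * chi_tilde).applyfunc(sp.expand)
sp.pprint(R_lin)
sp.pprint(chi_hat)
```

Dropping the $O(du^2)$ terms is what removes the product $\sin(b\,du)\sin(c\,du)$ from the upper-middle entry of $R$.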
Since the spatial rotation does not affect the 0-component of a 4-vector, we have
is a tensor equation. To solve it, one can choose a suitable coordinate system and
write it as a system of component equations
\[ G_{\mu\nu}(x) = 0 , \qquad \mu, \nu = 0, 1, 2, 3 . \tag{8.10.2} \]
Since
\[ G_{\mu\nu}(x) = R_{\mu\nu}(x) - \frac{1}{2}R(x)\,g_{\mu\nu}(x) , \]
where $R_{\mu\nu}(x)$ and $R(x)$ can be expressed in terms of $g_{\mu\nu}(x)$ and its partial derivatives, (8.10.2) can be viewed as a system of partial differential equations for the unknown functions $g_{\mu\nu}(x)$. Also, since $g_{\mu\nu} = g_{\nu\mu}$, $g_{\mu\nu}(x)$ only contains 10 independent undetermined functions. On the other hand, due to the symmetry in $\mu$ and $\nu$, (8.10.2) also contains 10 algebraically independent partial differential equations. Under suitable boundary conditions, it seems reasonable that 10 independent equations could determine 10 independent functions. However, things are not as simple as
8.10 Coordinate Conditions, the Gauge Freedom of General Relativity 397
that. The curvature tensor $R_{abc}{}^d$ satisfies the Bianchi identity $\nabla_{[a}R_{bc]d}{}^e = 0$, from which we have $\nabla_a G^a{}_b = 0$ [Equation (3.4.17)]. Written in terms of components, this corresponds to 4 differential identities satisfied by the functions $g_{\mu\nu}(x)$:
\[ G^\mu{}_{\nu;\mu} = 0 , \tag{8.10.3} \]
and the 10 functions $g'_{\mu\nu}(x')$ representing it are different from those in (8.10.4). For example, the dependence of $g'_{00}(r') = -[(1 - M/2r')/(1 + M/2r')]^2$ on its argument $r'$ is obviously different from the dependence of $g_{00}(r)$ on its argument $r$. However, all $R'_{\mu\nu}$ derived from $g'_{\mu\nu}(x')$ also vanish, and thus (8.10.6) and (8.3.18) are both (spherically symmetric) solutions to the vacuum Einstein equations $G_{\mu\nu} = 0$ that satisfy the same boundary conditions, and therefore they represent the same
The coordinates satisfying this condition are called Gaussian normal coordinates (see
Optional Reading 8.10.1 for details). Another example of a coordinate condition is
requiring the coordinates $x^\sigma$ to satisfy the following 4 equations:
\[ g^{ab}\nabla_a\nabla_b x^\sigma = 0 \qquad (\sigma = 0, 1, 2, 3) . \tag{8.10.8} \]
Calculation shows that [see Weinberg (1972) pp. 161–163] the above equations are equivalent to the following 4 equations:
\[ g^{\mu\nu}\Gamma^\lambda{}_{\mu\nu} = 0 \qquad (\lambda = 0, 1, 2, 3) . \tag{8.10.8$'$} \]
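The equivalence can be seen in one line by treating each coordinate $x^\sigma$ as a scalar function, so that $\nabla_b x^\sigma = \partial_b x^\sigma$ and $\partial_\lambda x^\sigma = \delta^\sigma{}_\lambda$. The following is a sketch of that step in the coordinate basis, not a substitute for the reference cited above:

```latex
\begin{align}
g^{ab}\nabla_a\nabla_b x^\sigma
  &= g^{\mu\nu}\left(\partial_\mu\partial_\nu x^\sigma
     - \Gamma^\lambda{}_{\mu\nu}\,\partial_\lambda x^\sigma\right) \nonumber\\
  &= g^{\mu\nu}\left(0 - \Gamma^\lambda{}_{\mu\nu}\,\delta^\sigma{}_\lambda\right)
   = -\,g^{\mu\nu}\Gamma^\sigma{}_{\mu\nu} . \nonumber
\end{align}
```

Hence the left-hand side vanishes for all four values of $\sigma$ if and only if $g^{\mu\nu}\Gamma^\sigma{}_{\mu\nu} = 0$.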
Also $(\partial/\partial t)^a|_p = n^a|_p$, and hence $g_{00}|_p = (n^a n_a)|_p = -1$. Since the tangent vector $(\partial/\partial t)^a$ is parallel-transported along $\gamma(t)$, and parallel transport preserves the inner product, we
have
Now we discuss the Einstein equations with source $G_{\mu\nu} = 8\pi T_{\mu\nu}$. Suppose the matter field has $N$ components, then usually it needs to satisfy $N$ equations (such as equations of motion). If the equations are independent (see Example 2 for the non-independent case), then combining them with the Einstein equations we obtain $10 + N$ equations. It seems that they can determine $10 + N$ functions. However, the 10 $g_{\mu\nu}$ automatically satisfy $G^\mu{}_{\nu;\mu} = 0$, and the equations of motion of the matter field automatically lead to $T^\mu{}_{\nu;\mu} = 0$, and hence $G^\mu{}_{\nu;\mu} - 8\pi T^\mu{}_{\nu;\mu}$ automatically vanishes. That is, we have the following differential identities
which will “dispose of” 4 equations. Together with 4 coordinate conditions, these
equations determine 10 + N unknown functions exactly.
Example 1 Suppose the matter field is a perfect fluid, whose components contain the proper density $\rho$, pressure $p$ and the 4-velocity components $U^\mu$, and hence $N = 6$. The equations they satisfy are: (a) the equation of state $f(\rho, p) = 0$, where $f$ is a certain function [see above (9.3.20)], (b) the divergence-free condition $\nabla^a T_{ab} = 0$ for the energy-momentum tensor,16 (c) the normalization condition $g_{\mu\nu}U^\mu U^\nu = -1$ for the 4-velocity. In total there are $1 + 4 + 1 = 6$ equations, which agrees with the generic discussion above.
and thus the numbers of the component equations and field components are both 4.
However, among the 4 equations above only 3 are independent, since Aa (or any
1-form) satisfies the following differential identity (which can be proved following
the proof of Exercise 7.1)
which will “dispose of” one equation, and make (8.10.11) one equation short. This
is caused by the gauge freedom of Aa , and so after adding the Lorenz condition
∇ a Aa = 0 (choosing a gauge), we can apply the generic discussion above. Assigning
a gauge condition here is similar to assigning a coordinate condition for gμν . As a
matter of fact, the latter is also some kind of gauge choice, see the next subsection
for details.
Finally, we should point out that for partial differential equations, the claim “given
suitable boundary conditions, there is a unique solution as long as the number of
equations is equal to the number of the undetermined functions” is not as simple as
that for ordinary differential equations. There are many subtleties in this case. One
may view this subsection as a hand-waving discussion (for illustrating the necessity
of coordinate conditions), and should not regard it as a rigorous analysis.
16 From Sect. 6.5 we can see that the divergence-free condition $\partial^a T_{ab} = 0$ for the energy-momentum tensor of a perfect fluid in Minkowski spacetime contains the equations of motion of the fluid, namely (6.5.7) and (6.5.8), which amount to $1 + 3 = 4$ equations in total. For a curved spacetime, the condition $\nabla^a T_{ab} = 0$ also leads to 4 similar equations.
The above discussion can also be formulated using the geometric language, i.e.,
instead of talking about the component equations G μν = 8π Tμν , we can discuss
the tensor equation G ab = 8π Tab . Take the vacuum field equation G ab = 0 as an
example. One can prove the following claim (see later): suppose φ : M → M is a
diffeomorphism, Rab [g] is the Ricci tensor of the metric gab , then
From this we can easily get G ab [g] = 0 ⇔ G ab [φ∗ g] = 0. This indicates that gab is a
solution to G ab = 0 if and only if φ∗ gab is also a solution. Thus, the boundary condi-
tions can only determine a solution gab to Einstein’s equation up to a diffeomorphism.
This is actually the active formulation (see Optional Reading 4.1.1) equivalent to the
passive version above that “boundary conditions can only determine $g_{\mu\nu}$ up to a
coordinate transformation”. In the passive formulation, the components $g_{\mu\nu}$ and $g'_{\mu\nu}$
of the same metric field $g_{ab}$ in different coordinate systems represent the same (local)
geometry; in the active formulation, suppose $\phi : M \to \tilde M$ is a diffeomorphism, then
$g_{ab}$ and $\tilde g_{ab} \equiv \phi_* g_{ab}$ represent the same geometry. In order to avoid confusion, first
we consider two manifolds M and M̃. If there exists a diffeomorphism φ : M → M̃,
then M and M̃ “cannot be more alike”. Then, we consider two spacetimes (or more
generally, two generalized Riemannian spaces) (M, gab ) and ( M̃, g̃ab ). If there exists
a diffeomorphism φ : M → M̃ and φ∗ gab = g̃ab , then these two spacetimes “cannot
be more alike”, i.e., they have the same spacetime geometry, and every phenomenon
that can be described by (M, gab ) can be described equivalently by ( M̃, g̃ab ). For
instance, suppose there are two vectors u a and v b at a point p in M, then there are
two corresponding vectors φ∗ u a and φ∗ v b at the point φ( p) in M̃. In addition, the
inner product of φ∗ u a and φ∗ v b , g̃ab |φ( p) (φ∗ u)a (φ∗ v)b , equals the inner product of
u a and v b , gab | p u a v b , because
One can also show that the tensor product of φ∗ u a and φ∗ v b corresponds to the tensor
product of u a and v b , i.e., (φ∗ u a )(φ∗ v b ) = φ∗ (u a v b ), etc. In short, we have at φ( p)
whatever we have at p, and we can do at φ( p) whatever we can do at p and get the
same result (matched by φ∗ ). If we consider the metric at p as a stage, and consider
manipulating the quantities at p as putting on a play, one can say colloquially that
φ∗ “carries the stage” of (M, gab ) to ( M̃, g̃ab ) so that we can “perform a play in
a different town” (i.e., manipulate the pushforward of the quantities at a different
point).
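The statement that $\phi_*$ “carries the stage” can be illustrated in two dimensions. The sketch below uses a hypothetical diffeomorphism of $\mathbb{R}^2$ (a shear) and a hypothetical metric, neither taken from the text, and checks that the inner product of two vectors at $p$ equals the inner product of their pushforwards at $\phi(p)$ measured by the pushed-forward metric.

```python
import sympy as sp

x, y = sp.symbols("x y", real=True)
u1, u2, v1, v2 = sp.symbols("u1 u2 v1 v2", real=True)

# Hypothetical diffeomorphism phi of R^2: a shear with inverse (X - Y**2, Y).
phi = sp.Matrix([x + y**2, y])
J = phi.jacobian([x, y])  # pushforward map on vectors at p = (x, y)

# Hypothetical positive-definite metric g_ab at p.
g = sp.Matrix([[1 + x**2, 0],
               [0, 1]])

u = sp.Matrix([u1, u2])
v = sp.Matrix([v1, v2])

# Components of the pushed-forward metric at phi(p), written via p's coordinates.
J_inv = J.inv()
g_tilde = J_inv.T * g * J_inv

# Pushforwards of the two vectors.
u_push = J * u
v_push = J * v

inner_at_p = (u.T * g * v)[0, 0]
inner_at_phi_p = (u_push.T * g_tilde * v_push)[0, 0]

print(sp.simplify(inner_at_p - inner_at_phi_p))
```

The difference simplifies to zero for arbitrary $u^a$, $v^a$ and $p$, because the Jacobian factors from the vectors cancel those from the metric.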
This discussion can also be applied to the case where M = M̃. Suppose on M
we have a metric field gab and a diffeomorphism φ : M → M, then based on the
discussion that (M, gab ) and ( M̃, g̃ab ) “cannot be more alike”, we can see that
(M, gab ) and (M, φ∗ gab ) are equivalent geometrically. However, one should notice
that now there are two metrics $g_{ab}|_p$ and $\phi_* g_{ab}|_p$ at a point $p$ in $M$. Suppose $u^a$ and $v^a$ are vectors at $p$. By saying that $(M, g_{ab})$ and $(M, \phi_* g_{ab})$ are equivalent we do not mean that $g_{ab}|_p u^a v^b = (\phi_* g)_{ab}|_p u^a v^b$ (this only holds when $\phi$ is an isometry); instead we mean that $g_{ab}|_p u^a v^b = (\phi_* g)_{ab}|_{\phi(p)}(\phi_* u)^a(\phi_* v)^b$, i.e., we can “carry the whole stage and perform the same play at $\phi(p)$”. Here we give an application example. Let $R_{abc}{}^d$ and $\tilde R_{abc}{}^d$ represent the Riemann tensor fields of $g_{ab}$ and $\tilde g_{ab} \equiv \phi_* g_{ab}$, respectively. Given $R_{abc}{}^d|_p$ we would like to find $\tilde R_{abc}{}^d|_{\phi(p)}$. Knowing that we can “perform the same play in a different town”, all we have to do is to push forward $R_{abc}{}^d|_p$ to $\phi(p)$ using $\phi_*$. More precisely, when calculating $R_{abc}{}^d|_p$ we have done the following manipulation: first find the $\nabla_a$ associated with $g_{ab}$, then find $R_{abc}{}^d|_p$ from $(\nabla_a\nabla_b - \nabla_b\nabla_a)\omega_c = R_{abc}{}^d\omega_d$. This manipulation is just like “performing a play”. In order to find $\tilde R_{abc}{}^d|_{\phi(p)}$, in principle we need to perform a similar manipulation: first find the $\tilde\nabla_a$ associated with $\tilde g_{ab}$, then find $\tilde R_{abc}{}^d|_{\phi(p)}$ from $(\tilde\nabla_a\tilde\nabla_b - \tilde\nabla_b\tilde\nabla_a)\omega_c = \tilde R_{abc}{}^d\omega_d$. Nevertheless, it is in fact not necessary to do it all over again like this, because it is natural to believe that as long as we push forward the result of the manipulation on $g_{ab}$ (and quantities derivable from it) at $p$ to $\phi(p)$ using $\phi_*$, it must be equal to the result of the manipulation on $\tilde g_{ab}$ (and quantities derivable from it) at $\phi(p)$. That is, we can believe that
For all quantities determined by $g_{ab}$ (all geometric quantities), such as $R_{ab}$, $R$, $G_{ab}$, etc., we have similar relations, and thus (8.10.13) holds. If desired, the reader can also verify (8.10.14) by computing it directly; hint: first verify that the $\tilde\nabla_a$ associated with $\tilde g_{ab}$ satisfies
(where ξ a is an “infinitesimal” vector field). The difference between the metric before
and after the transformation, namely gab = ηab + γab and g̃ab = η̃ab + γ̃ab , is
\[ \partial_a\lambda_b + \partial_b\lambda_a \cong \mathscr{L}_\lambda g_{ab} \cong \frac{\phi_t^* g_{ab} - g_{ab}}{t} . \tag{8.10.17} \]
Comparing this with (8.10.16) yields $\phi_t^* g_{ab} - g_{ab} \cong \tilde g_{ab} - g_{ab}$, and hence $\tilde g_{ab} \cong \phi_t^* g_{ab}$. Thus, the original metric $g_{ab}$ and the new metric $\tilde g_{ab}$ after the transformation only differ by a diffeomorphism under the first-order approximation.
Of course, it is not just the spacetime geometry that we care about, but also
physics. Here is a general conclusion: suppose a physical theory is described by a
manifold M and some tensor fields T (i) living on it (for instance, for an electrovac
spacetime, T (i) includes at least gab and Fab ), then (M, T (i) ) and (M, T̃ (i) ) describe
the same physics if and only if there exists a diffeomorphism φ : M → M such that
T̃ (i) = φ∗ T (i) .
Exercises
χ ≡ −gab ξ a ξ b .
(a) Show that χ is a constant along an integral curve of ξ a ;
(b) Show that the 4-acceleration Aa = ∇ a (ln χ ). Hint: use the Killing equation
∇ (a ξ b) = 0 and the result of (a).
˜8.4. Show that: (a) the trace of the energy-momentum tensor of an electromagnetic
field is zero, i.e., T ≡ g ab Tab = 0; (b) the scalar curvature of an electrovac
spacetime is R = 0.
˜8.5. Prove (8.4.7) and (8.4.28).
8.6. Suppose $F_{ab}$ is a 2-form field in an arbitrary spacetime, ${}^*F_{ab}$ is the dual 2-form field of $F_{ab}$, and $\alpha \in [0, 2\pi]$ is a constant real number, then $F'_{ab} \equiv F_{ab}\cos\alpha - {}^*F_{ab}\sin\alpha$ is called a duality rotation of $F_{ab}$ with the angle $\alpha$.
(a) Show that $F'_{ab}$ is a source-free electromagnetic field if and only if $F_{ab}$ is a source-free electromagnetic field. [The proof is straightforward. One can see this directly from the exterior differential expressions (7.2.4$'$) and (7.2.5$'$) of Maxwell’s equations].
(b) Show that the electromagnetic fields $F_{ab}$ and $F'_{ab}$ have the same energy-momentum tensor. Hint: the proof can be simplified by using the symmetric expression (6.6.28$'$).
(c) Let $M \equiv 2F_{ab}F^{ab}$, $N \equiv 2F_{ab}{}^*F^{ab}$, $M' \equiv 2F'_{ab}F'^{ab}$, $N' \equiv 2F'_{ab}{}^*F'^{ab}$.
Show that
(d) Let ab ≡ Fab + i∗ Fab , and ab ≡ Fab + i∗ Fab , then K ≡ ab ab and
K ≡ ab ab are complex scalar fields, and hence the K and K at each
spacetime point correspond to two vectors in the complex plane. Using the
result of (c) show that the vector K is the result of rotating the vector K
counterclockwise by an angle 2α (i.e., |K | = |K |, and the arguments of K
and K differ by 2α).
(e) Suppose $(\vec E, \vec B)$ and $(\vec E', \vec B')$ are the electric and magnetic fields of $F_{ab}$
and $F'_{ab}$ measured by an instantaneous observer, respectively. Show that
NB: For further interpretations of the physical meaning of the duality rotation,
see Volume II and Jackson (1998).
References
Bonnor, W. B. (1994), ‘The photon rocket’, Class. Quant. Grav. 11, 2007–2012.
Carmeli, M. (1982), Classical Fields General Relativity and Gauge Theory, John Wiley & Sons,
New York.
Damour, T. (1995), ‘Photon rockets and gravitational radiation’, Class. Quant. Grav. 12, 725–738.
arXiv:gr-qc/9412063.
Dain, S., Moreschi, O. M. and Gleiser, R. J. (1996), ‘Photon rockets and the Robinson-Trautman
geometries’, Class. Quant. Grav. 13, 1155–1160. arXiv:gr-qc/0203064.
Hawking, S. W. and Ellis, G. F. R. (1973), The Large Scale Structure of Space-Time, Cambridge
University Press, Cambridge.
Jackson, J. D. (1998), Classical Electrodynamics, John Wiley & Sons, Inc., New York.
Kinnersley, W. (1969), ‘Field of an arbitrarily accelerating point mass’, Phys. Rev. 186, 1335–1336.
Kuang, Z. and Liang, C. (1988), ‘Birkhoff and Taub theorems generalized to metrics with conformal
symmetries’, J. Math. Phys. 29, 2475–2478.
Kuang, Z., Li, J. and Liang, C. (1986), ‘Gauge freedom of plane-symmetric line elements with
semi-plane-symmetric null electromagnetic fields’, Phys. Rev. D 34, 2241–2245.
Kuang, Z., Li, J. and Liang, C. (1987), ‘Completion of plane-symmetric metrics yielded by elec-
tromagnetic fields’, Gen. Rela. Grav. 19, 345–350.
Letelier, P. S. and Tabenski, R. R. (1974), ‘The general solution to Einstein-Maxwell equations with
plane symmetry’, J. Math. Phys. 15, 594.
Li, J. and Liang, C. (1985), ‘An extension of the plane-symmetric electrovac general solution to
Einstein equations’, Gen. Rela. Grav. 17, 1001–1013.
Li, J. and Liang, C. (1989), ‘Static semi-plane-symmetric metrics yielded by plane-symmetric
electromagnetic fields’, J. Math. Phys. 30, 2915–2917.
In the first three sections of Chap. 8 we discussed static spherically symmetric metrics and the vacuum Schwarzschild solution, focusing mainly on finding the solution. In view of the importance of the Schwarzschild solution, this chapter will further discuss several closely related problems: Sect. 9.1 discusses the timelike and null geodesics in Schwarzschild spacetime; Sect. 9.2 introduces three experimental tests of general relativity proposed by Einstein in his early years using the vacuum Schwarzschild solution, namely the gravitational redshift, the precession of the perihelion of Mercury and the bending of starlight in the Sun’s gravitational field; Sect. 9.3 discusses the spacetime geometric structure and physical states in the interior of a spherically symmetric star, as well as the evolution of a spherically symmetric star; Sect. 9.4 analyzes the extension of the Schwarzschild spacetime in detail.
Let γ (τ ) be a timelike (or null) geodesic. For a timelike geodesic, τ represents the
proper time; for a null geodesic, τ represents a chosen affine parameter. In order to
find the parametric equations x μ (τ ) of γ (τ ), generally we need to solve the following
differential equations:
\[ \frac{d^2x^\mu}{d\tau^2} + \Gamma^\mu{}_{\nu\sigma}\frac{dx^\nu}{d\tau}\frac{dx^\sigma}{d\tau} = 0 , \qquad \mu = 0, 1, 2, 3 . \tag{9.1.1} \]
Since in these equations the unknown functions $x^\mu(\tau)$ and their derivatives are coupled with each other, solving them is in general not simple. However, if the spacetime has sufficiently many Killing vector fields, one can find $x^\mu(\tau)$ in a clever way using Theorem 4.3.3. Schwarzschild spacetime is an example of this.
© Science Press 2023 407
C. Liang and B. Zhou, Differential Geometry and General Relativity,
Graduate Texts in Physics, https://doi.org/10.1007/978-981-99-0022-0_9
408 9 Schwarzschild Spacetimes
Before applying this theorem, we can also simplify the coordinate representation of
the geodesic γ (τ ) using the spherical symmetry of Schwarzschild spacetime.
Proposition 9.1.1 Suppose γ (τ ) is a timelike or null geodesic in Schwarzschild
spacetime, then one can always choose the Schwarzschild coordinates such that
θ = π/2 along γ (τ ), in other words, such that γ (τ ) lies in the “equatorial plane”.
$$t = t(\tau), \quad r = r(\tau), \quad \theta = \pi/2, \quad \varphi = \varphi(\tau).$$
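and
$$\begin{aligned}
-\kappa &= g_{ab}\left(\frac{\partial}{\partial\tau}\right)^a\left(\frac{\partial}{\partial\tau}\right)^b = g_{00}\left(\frac{dt}{d\tau}\right)^2 + g_{11}\left(\frac{dr}{d\tau}\right)^2 + g_{22}\left(\frac{d\theta}{d\tau}\right)^2 + g_{33}\left(\frac{d\varphi}{d\tau}\right)^2 \\
&= -\left(1 - \frac{2M}{r}\right)\left(\frac{dt}{d\tau}\right)^2 + \left(1 - \frac{2M}{r}\right)^{-1}\left(\frac{dr}{d\tau}\right)^2 + r^2\left(\frac{d\varphi}{d\tau}\right)^2,
\end{aligned} \tag{9.1.3}$$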
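where in the last step we used $\theta = \pi/2$. Noticing that $(\partial/\partial t)^a$ and $(\partial/\partial\varphi)^a$ are Killing vector fields, by means of Theorem 4.3.3, we can define two constants on the geodesic $\gamma(\tau)$:
$$E := -g_{ab}\left(\frac{\partial}{\partial t}\right)^a\left(\frac{\partial}{\partial\tau}\right)^b = -g_{00}\frac{dt}{d\tau} = \left(1 - \frac{2M}{r}\right)\frac{dt}{d\tau}, \tag{9.1.4}$$
$$L := g_{ab}\left(\frac{\partial}{\partial\varphi}\right)^a\left(\frac{\partial}{\partial\tau}\right)^b = g_{33}\frac{d\varphi}{d\tau} = r^2\frac{d\varphi}{d\tau}, \tag{9.1.5}$$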
This equation, which contains only the unknown function r (τ ) and its 1st-order
derivative, is solvable in principle. Plugging the r (τ ) we just obtained into (9.1.4)
and (9.1.5), in principle we can find the unknown functions t (τ ) and ϕ(τ ), and hence
find the parametric equations of γ (τ ).
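As a rough illustration of how the two constants reduce the problem to a single radial equation, the following sketch substitutes (9.1.4) and (9.1.5) into (9.1.3) and solves for $(dr/d\tau)^2$. The function name `radial_speed_squared` and the sample orbit are assumptions introduced here for illustration, and geometrized units ($G = c = 1$) are assumed.

```python
def radial_speed_squared(r, E, L, M, kappa):
    """(dr/dtau)^2 obtained from (9.1.3) after eliminating dt/dtau and
    dphi/dtau with (9.1.4) and (9.1.5).

    kappa = 1 for a timelike geodesic, kappa = 0 for a null geodesic.
    """
    f = 1.0 - 2.0 * M / r  # f = -g_00 = 1/g_11 in Schwarzschild coordinates
    return E**2 - f * (kappa + L**2 / r**2)


# Assumed sample values: the circular timelike orbit at r = 6M has
# L^2 = 12 M^2 and E^2 = 8/9, so dr/dtau should vanish there.
M = 1.0
value = radial_speed_squared(6.0 * M, (8.0 / 9.0) ** 0.5, 12.0 ** 0.5 * M, M, kappa=1)
print(abs(value) < 1e-12)
```

Once $r(\tau)$ is known from this first-order equation, $t(\tau)$ and $\varphi(\tau)$ follow by integrating (9.1.4) and (9.1.5).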
We now discuss the physical meaning of the two constants E and L. Suppose $\gamma(\tau)$ is a timelike geodesic; then it represents the world line of a free point mass. Let m be the mass of the point mass; then $U^a \equiv (\partial/\partial\tau)^a$ and $P^a \equiv mU^a$ are its 4-velocity and 4-momentum, respectively. Suppose p is a point on $\gamma(\tau)$, G is the static observer passing through p, $Z^a$ is the 4-velocity of G at p (see Fig. 9.2), and $\xi^a = (\partial/\partial t)^a$ is the static Killing vector field; then it follows from $Z^a Z_a = -1$ that
$$Z^a = \chi^{-1}\xi^a, \tag{9.1.7}$$
where $\chi \equiv (-\xi^b\xi_b)^{1/2}$. From (6.3.17) we can see that $-Z_a P^a$ is the energy value obtained from the local measurement made by observer G on the point mass, which was denoted by E; to avoid confusion, we will now denote it by $E_{\rm local}$. The E defined by (9.1.4) can be rewritten as
$$E = -\xi_a U^a = -\frac{1}{m}\xi_a P^a = -\frac{\chi}{m} Z_a P^a = \frac{\chi}{m} E_{\rm local}, \tag{9.1.8}$$
and thus $E = \chi E_{\rm local}/m$. If the geodesic $\gamma(\tau)$ reaches infinity, then $\chi \to 1$ and $E \to E_{\rm local}/m$ when $r \to \infty$, and hence E can be interpreted as the energy per unit mass obtained from the local measurement made on the point mass by a static observer at infinity. Since E is a constant on $\gamma(\tau)$ while $\chi$ varies with r, $E_{\rm local}$ is not a constant on it, i.e., it is E that is conserved in the motion of a free point mass instead of $E_{\rm local}$. Therefore, E can be interpreted physically as the total energy (including gravitational potential energy) per unit mass of a free point mass. In contrast, $E_{\rm local}$ is the energy obtained from the local measurement made by a static observer G, which does not include the gravitational potential energy, and is not a conserved quantity along a geodesic. This can be interpreted physically as follows: although a free point mass does not experience any force other than gravity, the gravitational force does work on it when it is moving. Hence, as an energy excluding the gravitational potential energy, $E_{\rm local}$ is not a constant. Similarly, if $\gamma(\tau)$ is a null geodesic, then E can be interpreted as the total energy of the photon times $\hbar^{-1}$.
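The distinction between the conserved E and the locally measured energy can be made concrete with a small sketch. It assumes $\chi = (1 - 2M/r)^{1/2}$, which follows from $g_{00}$ in (9.1.3), uses $E_{\rm local}/m = E/\chi$ from (9.1.8), and identifies $E_{\rm local}/m$ with the special-relativistic factor $\gamma$ seen by the static observer. The function names are illustrative choices, and geometrized units are assumed.

```python
import math


def chi(r, M):
    """Norm of the static Killing field, chi = (1 - 2M/r)^(1/2)."""
    return math.sqrt(1.0 - 2.0 * M / r)


def local_energy_per_unit_mass(E, r, M):
    """E_local / m measured by the static observer at radius r, from (9.1.8)."""
    return E / chi(r, M)


def local_speed(E, r, M):
    """Speed of the particle relative to the static observer at radius r."""
    gamma = local_energy_per_unit_mass(E, r, M)
    return math.sqrt(1.0 - 1.0 / gamma**2)


# A particle released from rest at infinity has E = 1 (assumed example).
for r in (100.0, 10.0, 4.0):
    print(r, local_energy_per_unit_mass(1.0, r, 1.0), local_speed(1.0, r, 1.0))
```

For E = 1 the locally measured speed satisfies $v^2 = 2M/r$: the conserved E stays fixed while $E_{\rm local}$ grows as the particle falls inward.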
9.1 Geodesics in Schwarzschild Spacetimes 411
Einstein’s original motivation for creating general relativity was essentially theoretical. However, any physical theory must face the challenge of experimental verification once it has been put forward. We have seen in Sect. 7.9 that the direct detection of gravitational waves provided a strong confirmation of Einstein’s theory one century after it appeared. Nevertheless, during the formulation of general relativity, Einstein already made three important predictions early on by means of the vacuum Schwarzschild solution which could be compared with experiments (later dubbed the three classical experimental tests). The earliest one (in 1907) is the gravitational redshift of light waves, and the other two are the precession of the perihelion of Mercury and the bending of starlight in the gravitational field of the Sun. The result of the perihelion precession calculation already agreed with the existing observational data, and the prediction of light deflection was also supported by observation very soon. However, due to the lack of experimental techniques for measuring extremely weak general relativistic effects (including the gravitational redshift) with sufficient precision, experimental research on general relativity progressed slowly, or even almost stopped, for 45 years from the late 1910s. Since the 1960s, with the advancement of technology and new discoveries in astronomical observation, the experimental verification of general relativity has entered its heyday; there appeared not only verifications of light deflection and gravitational redshift with higher precision, but also a series of brand new experiments. It is safe to say that general relativity has passed all experimental tests so far, although many experiments with higher precision and difficulty are yet to be conducted. In this section, we will only discuss the three classical experimental tests proposed by Einstein. For the past, present, and future of the experimental tests of the relativistic theory of gravity, the reader may refer to Ni (2005; 2016).
9.2 Classical Experimental Tests of General Relativity 413
The world lines of the stationary observers coincide with the integral curves of the Killing vector field $\xi^a$, and hence $\xi^a = \chi Z^a$, where $\chi$ can be obtained from $Z^a Z_a = -1$ to be $\chi \equiv (-\xi^b\xi_b)^{1/2}$. Thus, (9.2.1) becomes $\omega = [(-K^a\xi_a)\chi^{-1}]|_p$ and $\omega' = [(-K^a\xi_a)\chi^{-1}]|_{p'}$. Since the world line of a photon is a geodesic whose tangent vector is $K^a$, and $\xi^a$ is a Killing vector field, from Theorem 4.3.3 we can see that $K^a\xi_a$ is a constant on the curve, i.e., $(K^a\xi_a)|_p = (K^a\xi_a)|_{p'}$. Thus, it follows from (9.2.1) that¹
$$\frac{\omega'}{\omega} = \frac{\chi}{\chi'} \quad\text{or}\quad \frac{\lambda'}{\lambda} = \frac{\chi'}{\chi}, \tag{9.2.2}$$
where $\lambda$ and $\lambda'$ are the wavelengths corresponding to $\omega$ and $\omega'$, respectively, and $\chi' \equiv (-\xi^b\xi_b)^{1/2}|_{p'}$. Now we will give the quantitative result for a static observer in
1 There can be more than one null geodesic between two points p and p′ in a stationary spacetime [see Sachs and Wu (1977) Exercise 7.3.2]. Equation (9.2.2) indicates that the redshift only depends on the points p and p′ and has nothing to do with the null geodesics.
When $r' > r$ (i.e., the light source is closer to the star than the receiver), we have $\lambda' > \lambda$, and thus the wavelength of the light received by the receiver is longer than that when it was emitted, which is called a redshift. Since there is no relative motion between the two stationary observers G and G′, the redshift can be interpreted as purely an effect of the gravitational field (curved spacetime). Hence, this effect is called a gravitational redshift, and $\chi \equiv (-\xi^b\xi_b)^{1/2}$ is called the (gravitational) redshift factor.
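The ratio in (9.2.2) is easy to evaluate for static observers in Schwarzschild spacetime. The sketch below assumes $\chi = (1 - 2M/r)^{1/2}$ (from $g_{00}$ of the Schwarzschild metric), takes the emitter at the solar surface and the receiver at the Earth's distance, and ignores the Earth's own field and motion. The numerical constants are rounded values supplied here as assumptions, and the function names are illustrative.

```python
import math


def redshift_factor(r, M):
    """chi = (-xi^b xi_b)^(1/2) = (1 - 2M/r)^(1/2) for a static observer."""
    return math.sqrt(1.0 - 2.0 * M / r)


def relative_redshift(r_emit, r_recv, M):
    """lambda'/lambda - 1 = chi'/chi - 1, using (9.2.2)."""
    return redshift_factor(r_recv, M) / redshift_factor(r_emit, M) - 1.0


# Assumed rounded values, all in km.
M_SUN_KM = 1.477      # G M_sun / c^2
R_SUN_KM = 6.96e5     # solar radius
AU_KM = 1.496e8       # Sun-Earth distance

z = relative_redshift(R_SUN_KM, AU_KM, M_SUN_KM)
print(f"relative redshift from the Sun to the Earth: {z:.2e}")
```

The result is of order $2 \times 10^{-6}$, which shows how weak the effect is even for light leaving the surface of the Sun.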
The magnitude of a redshift can be described by the relative redshift parameter (or simply redshift) $z \equiv (\lambda' - \lambda)/\lambda$. Calculation indicates that when the light emitted from the Sun arrives at the Earth (regarding the Sun as the source of the gravitational field), the relative redshift is only about $2 \times 10^{-6}$. In order to enhance the redshift, one can measure the light coming from a white dwarf. A white dwarf is a celestial body which has a much higher density than a normal star (see Sect. 9.3.2 for details). Due to its high density, its surrounding gravitational field is much stronger than that of the Sun. The redshift of the light coming from a white dwarf can be dozens of times larger than the redshift of the light from the Sun. After general relativity was published, people measured the redshift of the light from white dwarfs a few times, but the results were not sufficiently precise to confirm the prediction of the theory. The first successful gravitational redshift experiment with high precision was done by R. V. Pound and G. A. Rebka Jr. using the Mössbauer effect in 1960. In 1958, R. L. Mössbauer discovered that some nuclei (e.g., $^{57}$Fe) can emit γ-rays with very narrow (very sharp) linewidth under certain conditions, and crystals containing this kind of nucleus can have resonance absorption of γ-rays at this frequency with very high selectivity. If the frequency of this kind of γ-ray is changed slightly for some reason, the absorption by the crystal will be significantly reduced. This provides a powerful tool for measuring the extremely weak gravitational redshift caused by the Earth’s gravitational field. Place two pieces of such a crystal at different heights on the surface of the Earth, the lower one (E in Fig. 9.4) as the emitter, and the higher one (A in the figure) as the receiver. Although the redshift calculated based on the height difference between them (12.5 m) is merely $1.36 \times 10^{-15}$, the absorption by A of the γ-rays emitted by E still decreases due to the weak gravitational redshift of the γ-rays. To confirm and measure this decrease, one can let A move towards E at a constant speed, and use the “blueshift” (wavelength decreases) due to the Doppler effect to offset the gravitational redshift. When the speed is adjusted to an appropriate value (only $3 \times 10^{-7}$ m/s), the absorption rate will reach its maximum value. Then the value of the gravitational redshift can be measured. The precision of this experiment is very high (the relative uncertainty is about 1%), and the results obtained agree well with the theoretical values. Since then, experimental tests with higher precision have also been carried out [the reader may refer to Will (2018)].
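The size of the redshift quoted for the two crystals can be reproduced with a weak-field estimate. For two static observers separated by a small height h near the Earth's surface, expanding $\chi'/\chi - 1$ to first order gives $z \approx gh/c^2$, where g is the surface gravitational acceleration. The sketch below is only an order-of-magnitude check under that approximation; the values of g and c are rounded assumptions, and the function name is illustrative.

```python
G_SURFACE = 9.8    # m/s^2, assumed surface gravitational acceleration
C_LIGHT = 2.998e8  # m/s, speed of light (rounded)


def weak_field_redshift(height_m, g=G_SURFACE, c=C_LIGHT):
    """Approximate relative redshift z = g h / c^2 over a small height h."""
    return g * height_m / c**2


z_tower = weak_field_redshift(12.5)
print(f"z over 12.5 m: {z_tower:.2e}")
```

The first-order formula is used here instead of evaluating $\chi'/\chi - 1$ directly, because a shift of order $10^{-15}$ is close to the rounding error of double-precision arithmetic.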
According to Newtonian mechanics, the orbit of a planet is an ellipse with the Sun as a focus. However, the observational results deviate slightly from this. Take Mercury, which is the closest to the Sun, as an example. Although in each period its orbit is very close to an ellipse, the major axes of two “ellipses” in two adjacent periods do not coincide, which is indicated by the slight change of its perihelion. As time goes on, due to the effect of accumulation, the slow rotation of the major axis of the “ellipse” (and thus the perihelion) around the Sun becomes observable. This phenomenon is called the precession of the perihelion. Before the advent of general relativity, the precession rate of Mercury’s perihelion had already been measured as about 5600″ per century (″ stands for arcseconds). People have studied this in depth and discovered many possible causes (including the influence from the other planets). It was found that the precession rate caused by all these factors is 5557″ per century, and there is still 43″ per century that cannot be explained. This is the famous “43-second problem”. Based on general relativity, Einstein took Mercury as a free point mass in a curved spacetime caused by the Sun. His approximate calculation of a timelike geodesic in Schwarzschild spacetime naturally leads to the conclusion that the orbit of Mercury is not a closed curve, and the precession rate of its perihelion is exactly 43″ per century. This result has greatly strengthened people’s confidence in general relativity. Now we will introduce the derivation of the perihelion precession in general relativity.
Suppose there are only the Sun and Mercury in the solar system and the gravitational field of Mercury can be neglected, i.e., we only discuss the motion of Mercury under the action of the Sun’s gravitational field (external gravitational field). First we discuss this using Newton’s theory of gravity. Let the masses of the Sun and Mercury be M and m, respectively; then the gravitational potential energy of Mercury is
$$U(r) = -\frac{Mm}{r}. \tag{9.2.4}$$
Let $u_r$ and $u_\varphi$ be the radial and tangential components of Mercury’s velocity; conservation of energy then gives
$$\frac{1}{2}m\left(u_r^2 + u_\varphi^2\right) + U(r) = A, \tag{9.2.5}$$
where the constant A is the total mechanical energy of Mercury. Suppose the angular momentum of Mercury per unit mass is |L|, then
$$L = r u_\varphi = r^2\frac{d\varphi}{dt}. \tag{9.2.6}$$
From (9.2.4), (9.2.5) and (9.2.6) we can find by calculation that
$$\left(\frac{dr}{d\varphi}\right)^2 + r^2 = \frac{2Mr^3}{L^2} + \frac{2Ar^4}{mL^2}. \tag{9.2.7}$$
$$\mu(\varphi) = \frac{M}{L^2}\left[1 + e\cos(\varphi - \varphi_0)\right], \tag{9.2.10}$$
where e and $\varphi_0$ are constants of integration. Without loss of generality, take $\varphi_0 = 0$; then
$$\mu(\varphi) = \frac{M}{L^2}(1 + e\cos\varphi). \tag{9.2.11}$$
This is the equation of a conic section, with e as the eccentricity. Plugging (9.2.11) and its derivative back into (9.2.8) yields
$$e^2 = 1 + \frac{2AL^2}{mM^2}. \tag{9.2.12}$$
When $0 \le e < 1$ this is an ellipse, and $d\mu/d\varphi = 0$ (circular orbit) has been included in this as the special case of e = 0.
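As a quick consistency check of the Newtonian result, the following sketch evaluates both sides of (9.2.7) along the conic (9.2.11), with $\mu \equiv 1/r$ and the energy A fixed by (9.2.12). The function name and the sample values of M, L and e are assumptions made for illustration; geometrized units are assumed.

```python
import math


def newtonian_orbit_residual(phi, M, L, e, m=1.0):
    """Relative mismatch between the two sides of (9.2.7) on the conic (9.2.11)."""
    A = m * M**2 * (e**2 - 1.0) / (2.0 * L**2)   # total energy, from (9.2.12)
    mu = M / L**2 * (1.0 + e * math.cos(phi))    # mu = 1/r, equation (9.2.11)
    r = 1.0 / mu
    dmu = -M / L**2 * e * math.sin(phi)          # d(mu)/d(phi)
    dr = -dmu / mu**2                            # dr/dphi = -(dmu/dphi) / mu^2
    lhs = dr**2 + r**2
    rhs = 2.0 * M * r**3 / L**2 + 2.0 * A * r**4 / (m * L**2)
    return (lhs - rhs) / lhs


for phi in (0.0, 1.0, 2.5, 4.0):
    print(phi, newtonian_orbit_residual(phi, M=1.0, L=4.0, e=0.3))
```

The residual stays at the level of rounding error for every angle, as expected if (9.2.11) with (9.2.12) solves (9.2.7).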
However, general relativity provides a slightly different result. Let $\kappa = 1$ (timelike geodesic). Dividing (9.1.6) by $(d\varphi/d\tau)^2$, and using (9.1.5), we can find by calculation that
$$\left(\frac{dr}{d\varphi}\right)^2 - \frac{E^2r^4}{L^2} + r^2\left(1 + \frac{r^2}{L^2}\right)\left(1 - \frac{2M}{r}\right) = 0. \tag{9.2.13}$$
$$\frac{d^2\mu}{d\varphi^2} + \mu = \frac{M}{L^2} + 3M\mu^2. \tag{9.2.15}$$
Comparing this with (9.2.9) we find an additional term $3M\mu^2$ (general relativity correction term). Since the r of Mercury is much larger than the M of the Sun,² i.e., $M/r \ll 1$, the correction term satisfies $3M\mu^2 = (3M/r)\mu \ll \mu$, and thus one can find an approximate solution. The solution (9.2.11) in Newton’s theory of gravity can be viewed as the zeroth-order approximation, denoted by $\mu_0(\varphi)$ for clarity, i.e.,
$$\mu_0(\varphi) = \frac{M}{L^2}(1 + e\cos\varphi). \tag{9.2.16}$$
Plugging this zeroth-order approximate solution into the second term on the right-hand side of (9.2.15), we obtain an equation that the first-order approximate solution $\mu_1(\varphi)$ should satisfy, i.e.,
$$\frac{d^2\mu_1}{d\varphi^2} + \mu_1 = \frac{M}{L^2} + 3M\mu_0^2 = \frac{M}{L^2} + \frac{3M^3}{L^4}\left(1 + 2e\cos\varphi + e^2\cos^2\varphi\right). \tag{9.2.17}$$
2 When doing the quantitative calculation, it is better to go back to the International System of Units (SI), i.e., to fill in the physical constants G and c. From Appendix A one can see that M/r is actually $(GM/c^2)/r$. The mass of the Sun M corresponds to $GM/c^2 \cong 1.5$ km, while the distance between the perihelion of Mercury and the Sun is about $5 \times 10^7$ km, and hence $(GM/c^2)/r \ll 1$.
$$\mu_1(\varphi) = \mu_0(\varphi) + \frac{3M^3}{L^4}\left[1 + e\varphi\sin\varphi + e^2\left(\frac{1}{2} - \frac{1}{6}\cos 2\varphi\right)\right]. \tag{9.2.18}$$
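One way to gain confidence in (9.2.18) is to substitute it into (9.2.17) numerically. The sketch below approximates $d^2\mu_1/d\varphi^2$ with a central second difference and reports the mismatch between the two sides of (9.2.17). The step size and the sample values of M, L and e are assumptions; the helper names are illustrative.

```python
import math


def mu0(phi, M, L, e):
    """Zeroth-order (Newtonian) solution (9.2.16)."""
    return M / L**2 * (1.0 + e * math.cos(phi))


def mu1(phi, M, L, e):
    """First-order solution (9.2.18)."""
    bracket = 1.0 + e * phi * math.sin(phi) + e**2 * (0.5 - math.cos(2.0 * phi) / 6.0)
    return mu0(phi, M, L, e) + 3.0 * M**3 / L**4 * bracket


def residual(phi, M, L, e, h=1e-3):
    """Left side minus right side of (9.2.17), with a finite-difference derivative."""
    second = (mu1(phi + h, M, L, e) - 2.0 * mu1(phi, M, L, e) + mu1(phi - h, M, L, e)) / h**2
    lhs = second + mu1(phi, M, L, e)
    rhs = M / L**2 + 3.0 * M * mu0(phi, M, L, e) ** 2
    return lhs - rhs


for phi in (0.5, 1.7, 3.0):
    print(phi, residual(phi, M=1.0, L=4.0, e=0.3))
```

The residual is limited only by the finite-difference truncation error, many orders of magnitude below the size of $\mu_1$ itself.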
What we care about is the perihelion. For $\mu_0(\varphi)$, the values of $\varphi$ at the perihelion are 0, 2π, ···. Although there are many differences between the expressions of $\mu_1(\varphi)$ and $\mu_0(\varphi)$, the values of $\varphi$ at the perihelion would not change if the term $e\varphi\sin\varphi$ were absent. Only this term makes Mercury’s orbit deviate from a closed curve, which leads to the precession of the perihelion, and the precession angle increases as the value of $\varphi$ increases (the effect accumulates). Therefore, when we only care about the perihelion precession, we can neglect all terms inside the square brackets in (9.2.18) except $e\varphi\sin\varphi$ and write it as [where $\mu_0(\varphi)$ has been substituted by (9.2.16)]
$$\mu_1(\varphi) = \frac{M}{L^2}\left[1 + e\left(\cos\varphi + \frac{3M^2}{L^2}\varphi\sin\varphi\right)\right]. \tag{9.2.19}$$
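$$\varepsilon \equiv \frac{3M^2}{L^2}, \tag{9.2.20}$$
then $\cos\varepsilon\varphi \cong 1$, $\sin\varepsilon\varphi \cong \varepsilon\varphi$, and thus it follows from (9.2.19) that
$$\frac{1}{r(\varphi)} = \mu_1(\varphi) \cong \frac{M}{L^2}\left[1 + e\cos(\varphi - \varepsilon\varphi)\right]. \tag{9.2.21}$$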
This indicates that the orbit of Mercury is approximately an ellipse. Although the
right-hand side of (9.2.21) is still a periodic function, the period is not 2π as
in (9.2.16). The perihelion is the point with the smallest r , i.e., the point where
cos (ϕ − εϕ) = 1. ϕ = 0 is certainly a perihelion; however, when ϕ = 2π ,
Suppose $\hat\varphi$ is the value of $\varphi$ satisfying $\cos(\hat\varphi - \varepsilon\hat\varphi) = 1$ that is the closest to 2π; then it is not difficult to show that (neglecting the higher-order term $2\pi\varepsilon^2$)
$$\hat\varphi \cong 2\pi + 2\pi\varepsilon. \tag{9.2.22}$$
Thus, the precession angle of the perihelion of Mercury in each period is (see Fig. 9.5)
$$\Delta\varphi_P \cong 2\pi\varepsilon = \frac{6\pi M^2}{L^2}. \tag{9.2.23}$$
The discussion above is valid for any planet. Plugging in the specific data, one obtains that the precession rate of the perihelion of Mercury is 43″ per century.
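The final number can be reproduced with a few lines of arithmetic. From (9.2.11), the semi-major axis a of the zeroth-order ellipse satisfies $L^2/M = a(1 - e^2)$, so (9.2.23) becomes $6\pi M/[a(1 - e^2)]$ per orbit, with M replaced by $GM/c^2$ in SI-based units. The orbital data below are rounded values supplied here as assumptions, and the function names are illustrative.

```python
import math

# Assumed rounded input data.
GM_SUN_OVER_C2_KM = 1.477   # G M_sun / c^2 in km
A_MERCURY_KM = 5.79e7       # semi-major axis of Mercury's orbit in km
E_MERCURY = 0.2056          # eccentricity of Mercury's orbit
PERIOD_DAYS = 87.97         # orbital period of Mercury in days


def precession_per_orbit(M, a, e):
    """Perihelion advance per orbit in radians, (9.2.23) with L^2/M = a(1 - e^2)."""
    return 6.0 * math.pi * M / (a * (1.0 - e**2))


def arcsec_per_century(M, a, e, period_days):
    """Accumulated perihelion advance over 100 Julian years, in arcseconds."""
    orbits = 36525.0 / period_days
    return math.degrees(precession_per_orbit(M, a, e) * orbits) * 3600.0


rate = arcsec_per_century(GM_SUN_OVER_C2_KM, A_MERCURY_KM, E_MERCURY, PERIOD_DAYS)
print(f"perihelion precession of Mercury: {rate:.1f} arcsec per century")
```

With these inputs the accumulated advance comes out very close to 43″ per century, about $5 \times 10^{-7}$ rad per orbit over roughly 415 orbits.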
When a light ray from a distant star passes by the Sun on its way to the Earth, it is bent by the Sun’s gravitational field. This is an important prediction of general relativity. In this section we introduce the derivation of this prediction. In the 4-dimensional language, the world line of a photon is a null geodesic. Let $\kappa = 0$ in (9.1.6); then using a method similar to the derivation of (9.2.13), it is not difficult to derive that
$$\left(\frac{dr}{d\varphi}\right)^2 - \frac{E^2r^4}{L^2} + r^2\left(1 - \frac{2M}{r}\right) = 0. \tag{9.2.24}$$
$$\frac{d^2\mu}{d\varphi^2} + \mu = 3M\mu^2. \tag{9.2.26}$$
$$\mu(\varphi) = \frac{1}{l}\sin(\varphi + \alpha), \tag{9.2.27}$$
where l and $\alpha$ are constants of integration. Suppose the photon is at infinity when $\varphi = 0$, i.e., $\mu(0) = 1/r(0) = 0$; then $\alpha = 0$, and hence
$$\mu(\varphi) = \frac{1}{l}\sin\varphi. \tag{9.2.28}$$
This is the equation of a straight line in 2-dimensional Euclidean space expressed in a polar coordinate system $\{r, \varphi\}$. To see this, we take r = 0 as the origin of the Cartesian coordinate system $\{x, y\}$; then
$$x = r\cos\varphi, \tag{9.2.29}$$
$$y = r\sin\varphi = \frac{1}{\mu}\sin\varphi = l = \text{constant}, \tag{9.2.30}$$
where in the third equality we used (9.2.28). Thus, the spatial trajectory of a photon is a straight line whose distance from the origin is l (see Fig. 9.6). Note that both r and $\varphi$ are changing along this straight line (y is a constant). Since the range of r is (0, ∞), (9.2.29) indicates that the range of x is (−∞, ∞). To discuss the deflection of starlight, obviously we cannot take M = 0. However, since $M/r \ll 1$, finding the first-order approximate solution is sufficient for us. Taking the $\mu(\varphi)$ in (9.2.28) as the zeroth-order approximate solution $\mu_0(\varphi)$ and plugging it into the right-hand side of (9.2.26), we obtain the differential equation satisfied by $\mu_1(\varphi)$:
$$\frac{d^2\mu_1}{d\varphi^2} + \mu_1(\varphi) = \frac{3M}{l^2}\sin^2\varphi. \tag{9.2.31}$$
$$\mu_1(\varphi) = \frac{1}{l}\sin\varphi + \frac{M}{l^2}(1 - \cos\varphi)^2. \tag{9.2.32}$$
From the above equation we know that $\mu_1(0) = 0$, i.e., $r(0) = \infty$, which indicates that when the $\varphi$-coordinate of a photon is zero, it is infinitely far from the Sun (the r of a distant star can be regarded as ∞). However, the difference between (9.2.32) and (9.2.28) indicates that the photon is “heading” in different directions when it is approaching and leaving the Sun: it follows from (9.2.28) that $\mu_0(\pi) = 0$, which indicates that the $\varphi$-coordinate is π when the photon is going away from the “Sun” (this “Sun” has M = 0); however, it follows from (9.2.32) that $\mu_1(\pi) \neq 0$, and so we expect it to leave the Sun in a direction $\pi + \beta$ which is slightly different from π, i.e., $\mu_1(\pi + \beta) = 0$. To find the deflection angle $\beta$ (see Fig. 9.7), by plugging $\varphi = \pi + \beta$ into (9.2.32) and using $\mu_1(\pi + \beta) = 0$ we obtain
$$0 = \mu_1(\pi + \beta) = \frac{1}{l}\sin(\pi + \beta) + \frac{M}{l^2}\left[1 - \cos(\pi + \beta)\right]^2.$$
Since $\beta \ll 1$, we have $\sin(\pi + \beta) \cong -\beta$ and $\cos(\pi + \beta) \cong -1$, and hence
$$\beta \cong \frac{4M}{l}. \tag{9.2.33}$$
The above equation indicates that the deflection angle $\beta$ increases as l decreases. The minimum value of l is equal to the radius of the Sun. Plugging this into (9.2.33) as the value of l [after restoring the physical constants G and c, (9.2.33) becomes $\beta \cong 4GM/(lc^2)$], we find $\beta$ = 1.75″. This is the quantitative prediction of general relativity for the deflection angle of starlight. In order to verify this prediction by observation, we can try to photograph the apparent position of the star when the starlight is deflected by the Sun, and compare it with the actual position of the star photographed six months later (or earlier), when the Earth has moved to the other side of the Sun. However, it is not easy to observe the apparent position of a star, since the Sun is much closer to the Earth than the star we want to observe, and the starlight cannot be seen at all in the glare of the sunlight. (You cannot “watch the stars in the daytime”!) Then people came up with the idea of using a total solar eclipse. During a total solar eclipse, the sunlight is blocked by the Moon between the Sun and the Earth, but the light from a distant star can “bypass” the Sun and reach the Earth. Soon after World War I, two expedition teams set off from the United Kingdom to Brazil and Africa to observe the total solar eclipse on May 29th, 1919. The observation results of the two teams were, respectively, 1.13 ± 0.07 times and 0.92 ± 0.17 times the theoretical prediction, which is considered an important support of the theory. The announcement in many European and American newspapers attracted the attention of the war-weary public, and also made Einstein famous. However, Einstein responded quite calmly to this. He believed in his own theory so much (based on its elegance and internal self-consistency) that he once replied “Then I would feel sorry for the dear Lord.” when asked what he would have said if the observation outcomes had not agreed with the theory. Note that Newton’s theory of gravity can also predict the bending of starlight by the Sun, but the deflection angle is half of the value predicted by general relativity. The results of the 1919 UK teams indeed favored Einstein over Newton, but they were not of high precision. Although there were a few more observations of total solar eclipses in the next several decades, which continued to give mild support to general relativity, the improvement in accuracy was very small due to various reasons, especially the weather. In modern times, with technological advances, people can test the bending of distant quasar light by Jupiter and also the bending of radio waves, and the measurements are now considerably more precise.³ Therefore, we now have high-precision results for light deflection which provide strong support for general relativity [for details see Will (2018)].
3 For example, an analysis based on the very-long-baseline interferometry (VLBI) database gives a result which is 0.99983 ± 0.00045 times the predicted value of general relativity, where the standard error is reduced to $4.5 \times 10^{-4}$; see Shapiro et al. (2004).
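The deflection angle can also be checked numerically. The sketch below solves $\mu_1(\pi + \beta) = 0$ for the first-order solution (9.2.32) by bisection and compares the root with the approximation (9.2.33). The solar values are rounded assumptions, the bracket [0, 0.1] presumes $M/l \lesssim 0.02$, and the function names are illustrative.

```python
import math


def mu1(phi, M, l):
    """First-order solution (9.2.32) for the photon trajectory."""
    return math.sin(phi) / l + M / l**2 * (1.0 - math.cos(phi)) ** 2


def deflection_angle(M, l):
    """Root beta of mu1(pi + beta) = 0 in (0, 0.1), found by bisection."""
    lo, hi = 0.0, 0.1            # mu1 > 0 at beta = lo and mu1 < 0 at beta = hi
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mu1(math.pi + mid, M, l) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Assumed rounded values in km: G M_sun / c^2 and the solar radius.
M_SUN_KM = 1.477
R_SUN_KM = 6.96e5

beta = deflection_angle(M_SUN_KM, R_SUN_KM)
beta_arcsec = math.degrees(beta) * 3600.0
print(f"beta = {beta_arcsec:.2f} arcsec, 4M/l = {4.0 * M_SUN_KM / R_SUN_KM:.3e} rad")
```

For a ray grazing the Sun the numerical root agrees with $4M/l$ to many digits, which gives about 1.75″.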
In this subsection we discuss the interior spacetime metric and internal states of a
static spherically symmetric star. The matter field inside a star can be regarded rather
precisely as a perfect fluid, whose energy-momentum tensor is
The star being static means that every comoving observer inside the star can be regarded as a static observer, whose 4-velocity $U^a$ is parallel to the static Killing vector field $\xi^a = (\partial/\partial t)^a$. Again choosing the Schwarzschild coordinate system, the line element can still be expressed by (8.3.2). From $U^a U_a = -1$ and $\xi^a\xi_a = g_{00} = -e^{2A}$, we get
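and hence it follows from (9.3.1) that the nonvanishing components of $T_{ab}$ are as follows:
$$T_{00} = T_{ab}(\partial/\partial t)^a(\partial/\partial t)^b = \rho e^{2A}, \qquad T_{11} = T_{ab}(\partial/\partial r)^a(\partial/\partial r)^b = p e^{2B},$$
$$T_{22} = T_{ab}(\partial/\partial\theta)^a(\partial/\partial\theta)^b = p r^2, \qquad T_{33} = T_{ab}(\partial/\partial\varphi)^a(\partial/\partial\varphi)^b = p r^2\sin^2\theta.$$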
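$$R = 2e^{-2B}\left[-A'' + A'B' - A'^2 + 2r^{-1}(B' - A') - r^{-2}\right] + 2r^{-2},$$
$$R^0{}_0 - \frac{1}{2}R\delta^0{}_0 = -e^{-2B}(2B'r^{-1} - r^{-2}) - r^{-2},$$
$$R^1{}_1 - \frac{1}{2}R\delta^1{}_1 = e^{-2B}(2A'r^{-1} + r^{-2}) - r^{-2},$$
$$R^2{}_2 - \frac{1}{2}R\delta^2{}_2 = e^{-2B}\left[A'' - A'B' + A'^2 + (A' - B')r^{-1}\right],$$
$$R^3{}_3 - \frac{1}{2}R\delta^3{}_3 = e^{-2B}\left[A'' - A'B' + A'^2 + (A' - B')r^{-1}\right].$$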
Plugging these together with (9.3.3) into $R^\mu{}_\nu - R\delta^\mu{}_\nu/2 = 8\pi T^\mu{}_\nu$, we see that there are only 3 independent equations as follows:
$$8\pi\rho r^2 = 2re^{-2B}B' - e^{-2B} + 1 = 1 - \frac{d}{dr}\left(re^{-2B}\right),$$
and hence by integration we get
If $C \neq 0$, then from (9.3.7) and (9.3.8) we can see that $e^{-2B} \to \infty$ when $r \to 0$; however, $e^{-2B} = g^{11}$, and it is unreasonable to have $g^{11} = \infty$ at the center (r = 0) of the star, and hence C = 0. Thus, it follows from (9.3.7) that
$$g_{11}(r) = e^{2B(r)} = \left[1 - \frac{2m(r)}{r}\right]^{-1}. \tag{9.3.9}$$
Suppose the radius of the star is R; then when r > R the metric should be the vacuum Schwarzschild solution [see (8.3.18)]. The interior metric and exterior metric should be continuous at the surface (r = R) of the star. Plugging r = R into (9.3.9) yields
$$g_{11}(R) = \left[1 - \frac{2m(R)}{R}\right]^{-1}.$$
On the other hand, it follows from the vacuum Schwarzschild solution that
$$g_{11}(R) = \left(1 - \frac{2M}{R}\right)^{-1}.$$
$$\varepsilon = \sqrt{h}\,dr\wedge d\theta\wedge d\varphi = \left[1 - \frac{2m(r)}{r}\right]^{-1/2} r^2\sin\theta\,dr\wedge d\theta\wedge d\varphi,$$
Therefore, when calculating the integral one cannot use $r^2\sin\theta\,dr\wedge d\theta\wedge d\varphi$ as the volume element as in 3-dimensional Euclidean space. However, the M in (9.3.10) is the result of integrating $\rho(r)$ with $r^2\sin\theta\,dr\wedge d\theta\wedge d\varphi$ as the volume element. From the mathematical perspective, the integral (9.3.10) in the 3-dimensional non-Euclidean space $(\Sigma_t, h_{ab})$ with $r^2\sin\theta\,dr\wedge d\theta\wedge d\varphi$ as the volume element is somewhat strange, but this does not mean that the M in (9.3.10) is a weird quantity. In fact, as the only parameter of the Schwarzschild solution, the physical meaning of M is crystal clear: it is the total mass (total energy) of Schwarzschild spacetime, which includes the gravitational potential energy (see Chap. 12 for details). However, $\rho(r)$ is the energy density obtained from the local measurement made by a static observer inside the star, which contains the static energy density of each particle (mainly the nuclei) and the internal energy (heat, pressure, etc.) density, but not the gravitational potential energy. This is similar to the discussion about the difference between E and $E_{\rm local}$ in Sect. 9.1: the result of a local measurement made by an observer does not contain the energy contribution from the gravitational field. Therefore, the M including gravitational potential energy is surely not equal to the integral $\int\rho(r)\varepsilon$, since the latter does not include the contributions from the gravitational field. Note particularly that
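$$\begin{aligned}
\int\rho(r)\,\varepsilon &= \int\rho(r)\left[1 - \frac{2m(r)}{r}\right]^{-1/2} r^2\sin\theta\,dr\wedge d\theta\wedge d\varphi \\
&= 4\pi\int_0^R\rho(r)\left[1 - \frac{2m(r)}{r}\right]^{-1/2} r^2\,dr \\
&\overset{!!!}{\neq} 4\pi\int_0^R\rho(r)\,r^2\,dr = M.
\end{aligned}$$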
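The inequality above can be illustrated numerically. The sketch below takes a uniform-density star as an assumed example, for which $m(r) = Mr^3/R^3$, and compares the integral of $\rho$ over the proper volume element $\varepsilon$ with the coordinate-volume integral that defines M in (9.3.10). Both integrals are evaluated with a simple midpoint rule; the function name and the sample compactness M/R = 0.2 are assumptions.

```python
import math


def masses_uniform_star(M, R, n=20000):
    """Return (proper-volume integral of rho, coordinate-volume integral of rho)."""
    rho = 3.0 * M / (4.0 * math.pi * R**3)   # constant density giving total mass M
    h = R / n
    proper = 0.0
    coordinate = 0.0
    for i in range(n):
        r = (i + 0.5) * h                    # midpoint of the i-th radial shell
        m_r = M * r**3 / R**3                # mass function for constant density
        shell = 4.0 * math.pi * rho * r**2 * h
        coordinate += shell
        proper += shell / math.sqrt(1.0 - 2.0 * m_r / r)
    return proper, coordinate


proper, coordinate = masses_uniform_star(M=0.2, R=1.0)
print(proper, coordinate)
```

The coordinate-volume integral reproduces M, while the proper-volume integral is larger; the difference can be read as the (negative) gravitational binding energy that is included in M.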
The fact that ρ(r ) does not contain the contribution from the gravitational field is closely
related to another fact, namely the gravitational field energy is non-local. To put it in a simple
way, the so called non-locality of the gravitational field energy means that the energy density
of the gravitational field is meaningless: there does not exist such a quantity, which can be
reasonably interpreted as the energy density of the gravitational field (NB: compare with the
fact that the energy density of an electromagnetic field has a clear meaning and an explicit
9.3 Spherical Stars and Their Evolution 425
expression), see Chap. 12 for details. However, the non-locality of the gravitational field
energy does not indicate that the gravitational field itself has no energy. An important result
people found after a long and tortuous path of study is: for an asymptotically flat spacetime
(physically corresponding to an isolated gravitational system), one can always define the
notion of total energy, which contains all the energy contributions including that of the
gravitational field. Applying this definition to Schwarzschild spacetime with a parameter
M, one finds that M is exactly the total energy of this spacetime (as an asymptotically flat
spacetime).
[The End of Optional Reading 9.3.1]
$$\frac{dA}{dr} = \frac{m(r) + 4\pi pr^3}{r[r - 2m(r)]}. \tag{9.3.11}$$
$$\frac{dA}{dr} \cong \frac{m(r)}{r^2}. \tag{9.3.12}$$
Since the Newtonian gravitational potential $\phi$ with spherical symmetry satisfies
$$\frac{d\phi}{dr} = \frac{m(r)}{r^2}, \tag{9.3.13}$$
we can see that A is, in a sense, the quantity corresponding to the Newtonian gravitational potential in a static spherically symmetric curved spacetime. Equation (9.3.13) is actually a manifestation of the Poisson equation $\nabla^2\phi = 4\pi\rho$ in Newton’s theory of gravity in the spherically symmetric case. When we have spherical symmetry, $\nabla^2\phi = 4\pi\rho$ becomes
$$\frac{1}{r^2}\frac{d}{dr}\left(r^2\frac{d\phi}{dr}\right) = 4\pi\rho.$$
to
$$(\partial/\partial r)^b\,\nabla^a T_{ab} = 0. \tag{9.3.6$'$}$$
Also,
where (5.7.2) (the equivalent definition of Christoffel symbols) is used in the third equality, and (8.3.4) is used in the fifth equality. Plugging the above equation and (9.3.15) into (9.3.14) yields
$$\frac{dp}{dr} = -(p + \rho)\frac{dA}{dr}. \tag{9.3.16}$$
Then, using (9.3.11) we obtain
$$\frac{dp}{dr} = -(p + \rho)\,\frac{m(r) + 4\pi pr^3}{r[r - 2m(r)]}. \tag{9.3.17}$$
$$\frac{dp}{dr} \cong -\frac{\rho\,m(r)}{r^2}, \tag{9.3.18}$$
where the function m(r ) is defined by (9.3.8), and the function A(r ) needs to satisfy
(9.3.11). A necessary and sufficient condition of hydrostatic equilibrium is (9.3.17).
The internal state inside a spherical star is determined by 4 functions A(r), m(r), p(r) and ρ(r), and there are only three equations they have to satisfy, namely (9.3.8), (9.3.11) and (9.3.17). In order to determine the internal state of a star, one also has to assign a fourth equation, called the equation of state. To put it in a simple way, an equation of state is a relation between the energy density ρ and the pressure p represented by f(ρ, p) = 0 (where f is a certain specific function).⁴ After an equation of state is determined, there are only 3 undetermined functions A(r), m(r) and p(r) remaining, which have to satisfy the differential equations (9.3.11), (9.3.17), and
$$\frac{dm(r)}{dr} = 4\pi\rho(r)\,r^2 \tag{9.3.20}$$
coming from (9.3.8). These 3 equations are all first-order differential equations, which can be integrated once the initial conditions A(0), m(0) and p(0) are given. It follows from (9.3.8) that m(0) ≡ 0, and thus m(0) does not need to be (and cannot be arbitrarily) assigned. After A(0) (with adjustment later) and $p_0 \equiv p(0)$ are assigned, we can integrate the above-mentioned 3 differential equations from r = 0 outwards until p = 0. [As long as the equation of state satisfies the following reasonable requirement: for all $p \ge 0$ we have $\rho \ge 0$, the OV equation (9.3.17) assures automatically that the pressure decreases monotonically outwards.] The place where p = 0 is the surface of the star, whose corresponding value of r is the radius R of the
4 Generally speaking, the pressure p is not only a function of the density ρ, but also depends on
the specific entropy (i.e., the average entropy per nucleus) and the chemical components of the star.
Only when the specific entropy and chemical components are the same everywhere inside the star
can p be solely a function of ρ, and the equation of state be expressed as f ( p, ρ) = 0. The specific
entropy of a normal star (including the Sun) is not everywhere the same. However, the specific
entropy inside a white dwarf or neutron star, which will be discussed later, can be considered as
vanishing everywhere. The discussion in the main text is valid for the study of these “abnormal
celestial bodies”.
star, and m(R) is the total mass (energy) M of the star (including the gravitational
potential energy!). After having R we need to come back and modify the value of
A(0) (by adding a constant) in order to have it satisfy the condition at the surface of
the star connecting the vacuum solution outside the sphere, i.e.,
$$e^{2A(R)} = 1 - \frac{2M}{R}. \tag{9.3.21}$$
Therefore, by assigning a value of $p_0$ one can determine a set of functions A(r), m(r) and p(r), and the internal state and metric are then completely determined. For an equation of state in the real world, exact solutions of equations like (9.3.17) are hard to find, and thus a numerical method is used. However, for an idealized equation of state, we can perform the integral analytically. The simplest and most useful idealization is the following equation of state: ρ = constant. This is actually a very special equation of state whose energy density ρ is independent of the pressure. Although this is not a perfect model of a star, it can still be regarded as a first-order approximation of a small star whose pressure is not that high. Then (9.3.8) becomes
$$m(r) = \frac{4\pi\rho r^3}{3}. \tag{9.3.22}$$
The equation above holds for both general relativity and Newton’s theory of gravity. For Newton’s theory of gravity, (9.3.18) can be simplified as $dp/dr = -\frac{4\pi}{3}\rho^2 r$ when ρ is a constant. After the initial value $p_0$ is assigned, the unique solution is $p(r) = -\frac{2}{3}\pi\rho^2r^2 + p_0$, and the radius R of the star can be determined by p(R) = 0:
$$0 = p(R) = -\frac{2}{3}\pi\rho^2R^2 + p_0.$$
Thus, $p_0$ can be expressed in terms of R:
$$p_0 = \frac{2}{3}\pi\rho^2R^2, \tag{9.3.23}$$
and hence p(r) can also be expressed in terms of R as
$$p(r) = \frac{2}{3}\pi\rho^2(R^2 - r^2). \tag{9.3.24}$$
When Newton’s theory of gravity is not a good approximation, one needs to solve
the OV equation (9.3.17). The solution found by Schwarzschild in 1916 is (and thus
the metric inside a star with uniform density is called the interior Schwarzschild
solution)
$$p(r) = \rho\,\frac{(1 - 2M/R)^{1/2} - (1 - 2Mr^2/R^3)^{1/2}}{(1 - 2Mr^2/R^3)^{1/2} - 3(1 - 2M/R)^{1/2}} , \tag{9.3.25}$$
$$p_0 = p(0) = \rho\,\frac{1 - (1 - 2M/R)^{1/2}}{3(1 - 2M/R)^{1/2} - 1} . \tag{9.3.26}$$
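The earlier remark that realistic equations of state call for numerical integration can be illustrated on the one case where the answer is known. The sketch below assumes that (9.3.17) has the standard Oppenheimer-Volkoff form dp/dr = −(ρ + p)(m + 4πr³p)/[r(r − 2m)], which is not reproduced in this passage. It integrates that equation outward with a fourth-order Runge-Kutta step for constant ρ and compares the radius where p reaches zero with the value implied by (9.3.26). Units are geometrized (G = c = 1), and the function names, step size and sample values are illustrative choices.

```python
import math

def ov_rhs_uniform(r, p, rho):
    """dp/dr from the assumed OV equation with m(r) = 4*pi*rho*r^3/3."""
    return (-4.0 * math.pi * r * (rho + p) * (rho / 3.0 + p)
            / (1.0 - 8.0 * math.pi * rho * r * r / 3.0))

def surface_radius_numeric(rho, p0, h=1.0e-4):
    """Integrate outward from the center until the pressure reaches zero."""
    r, p = 0.0, p0
    while True:
        k1 = ov_rhs_uniform(r, p, rho)
        k2 = ov_rhs_uniform(r + h / 2, p + h * k1 / 2, rho)
        k3 = ov_rhs_uniform(r + h / 2, p + h * k2 / 2, rho)
        k4 = ov_rhs_uniform(r + h, p + h * k3, rho)
        p_next = p + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        if p_next <= 0.0:
            # Linear interpolation for the zero crossing inside the last step.
            return r + h * p / (p - p_next)
        r, p = r + h, p_next

def surface_radius_exact(rho, p0):
    """Radius implied by (9.3.26) with M = 4*pi*rho*R^3/3."""
    y = (rho + p0) / (rho + 3.0 * p0)   # y = (1 - 2M/R)^(1/2)
    return math.sqrt(3.0 * (1.0 - y * y) / (8.0 * math.pi * rho))

r_num = surface_radius_numeric(1.0, 0.1)
r_exact = surface_radius_exact(1.0, 0.1)
print(r_num, r_exact)
```

For a realistic equation of state the same loop would also have to integrate dm/dr = 4πr²ρ alongside the pressure, since m(r) is then no longer known in closed form.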
Setting $Y \equiv (1 - 2M/R)^{1/2}$, (9.3.26) can be written as
$$p_0 = \frac{\rho(1 - Y)}{3Y - 1} , \tag{9.3.27}$$
and it follows from d p0 /dY < 0 that p0 increases as M/R increases. This is easy to
understand since if M is larger, the self-gravity will be stronger, and so the pressure
gradient for balancing the self-gravity will be greater, and the central pressure p0
will be greater when R is fixed. In contrast, if M is fixed and R is smaller, then for
the purpose of creating the pressure gradient we need, p0 has to be higher as well.
When M/R is large enough such that Y = 1/3, we have p0 → ∞, which indicates
that equilibrium cannot be maintained no matter how large the central pressure is.
Thus, the M/R of a static star with a uniform density has an upper limit, and from
Y = 1/3 we can see that this upper limit is
$$\left(\frac{M}{R}\right)_{\max} = \frac{4}{9} . \tag{9.3.28}$$
Of course, the M/R of a normal star is far smaller than this upper bound. To make
a numerical evaluation, we should restore the constants G and c, i.e., replace M by
$GM/c^2$. Take the Sun as an example: $GM_\odot/c^2 \cong 1.5$ km, $R_\odot \cong 7 \times 10^5$ km, and
hence
$$\frac{GM_\odot/c^2}{R_\odot} \cong 2 \times 10^{-6} \ll \frac{4}{9} .$$
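The growth of the central pressure with M/R, and the smallness of the solar value, can be made concrete with a short sketch of (9.3.27). The physical constants below are rounded reference values supplied for illustration, not numbers taken from the text, and the function name is hypothetical.

```python
import math

def central_pressure_over_density(compactness):
    """p0/rho from (9.3.27), with Y = (1 - 2M/R)^(1/2) and compactness = M/R."""
    if not 0.0 <= compactness < 4.0 / 9.0:
        raise ValueError("no static uniform-density star for M/R >= 4/9")
    y = math.sqrt(1.0 - 2.0 * compactness)
    return (1.0 - y) / (3.0 * y - 1.0)

# Rounded SI constants (assumed values for illustration).
G, C = 6.674e-11, 2.998e8
M_SUN, R_SUN = 1.989e30, 6.96e8

sun_compactness = G * M_SUN / (C**2 * R_SUN)
print(sun_compactness)

# p0/rho rises without bound as M/R approaches 4/9.
for m_over_r in (1.0e-6, 0.1, 0.3, 0.4, 0.44):
    print(m_over_r, central_pressure_over_density(m_over_r))
```

For small M/R the ratio reduces to M/(2R), which is the Newtonian value that follows from (9.3.23) and (9.3.22).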
It follows from (9.3.22) that $M = 4\pi\rho R^3/3$, and eliminating R by using (9.3.28)
yields
$$M_{\max} = \frac{4}{9}\,\frac{1}{\sqrt{3\pi\rho}} . \tag{9.3.29}$$
This is the maximum allowable mass of a star with uniform density ρ (note that there
is no maximum allowable mass in Newton’s theory of gravity). The existence of the
upper mass limit in general relativity is not a result specifically for a star with uniform
density. It can be proved that as long as one assumes ρ(r) ≥ 0 and dρ/dr ≤ 0, the
mass of any spherically symmetric static star with any radius R cannot exceed 4R/9.
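To get a feeling for the size of (9.3.29), one can restore G and c, which gives M_max = (4/9) c³/(3πG³ρ)^{1/2}. The sketch below evaluates this for a density of 10¹⁷ kg·m⁻³ (the order of nuclear density), chosen purely as an illustrative input. The constants are rounded reference values and the function names are hypothetical.

```python
import math

# Rounded SI constants (assumed values for illustration).
G, C, M_SUN = 6.674e-11, 2.998e8, 1.989e30

def max_mass_uniform_density(rho_si):
    """Eq. (9.3.29) with G and c restored; rho in kg/m^3, result in kg."""
    return (4.0 / 9.0) * C**3 / math.sqrt(3.0 * math.pi * G**3 * rho_si)

def radius_at_max_mass(rho_si):
    """Radius at the limit G*M/(c^2 R) = 4/9, in meters."""
    return 9.0 * G * max_mass_uniform_density(rho_si) / (4.0 * C**2)

rho = 1.0e17  # kg/m^3, illustrative input
m_max = max_mass_uniform_density(rho)
r_max = radius_at_max_mass(rho)
print(m_max / M_SUN, r_max / 1.0e3)  # solar masses, kilometers

# Consistency check: M = 4*pi*rho*R^3/3 must hold at the limit.
print(4.0 * math.pi * rho * r_max**3 / 3.0 / m_max)
```

Because M_max scales as ρ^{-1/2}, a denser uniform star has a smaller maximum mass.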
We mention in passing that, as we have emphasized in Sect. 8.10, when solving
Einstein’s equations, one should solve for the functions reflecting the matter field and
the components of the metric simultaneously. In Example 1 of Sect. 8.10 we have
pointed out that when the matter field is a perfect fluid, there are 16 undetermined
functions gμν (x), ρ(x), p(x), U μ (x) and 16 equations to be solved. The discussion
in this section provides a specific example of that.
the pressure p in the gas cloud rises as T increases. Thus, the outward force on any thin
spherical layer caused by the pressure gradient d p/dr [see (9.3.18)] also increases
with T , and it seems that the contraction may stop when the temperature is high
enough. However, this is not possible without an energy source: since the temperature
of the gas cloud is higher than its surroundings, it keeps radiating energy outwards.
If the contraction stops, the temperature (and thus the pressure) will decrease, and
the pressure difference between two sides of the thin shell cannot counterbalance the
self-gravity. From the perspective of energy it also has to keep contracting, so that (a
part of) the gravitational potential energy keeps being converted into radiant energy.
After the gas cloud contracts slowly for a period of time, the temperature and the
density at the center are finally high enough to ignite a thermonuclear reaction. Near
the center (a central sphere called the stellar core), the hydrogen is transformed into
helium by thermonuclear fusion (which is the same reaction as in a hydrogen bomb
explosion), and at the same time releases a huge amount of energy. This supplements
the energy lost due to radiation (no need to rely on the gravitational potential energy
conversion), and so the gas cloud will reach equilibrium and no longer contract. At
this time, the gas cloud starts to become a star. The pressure gradient d p/dr at any
point in the gas cloud satisfies the stable equilibrium condition (9.3.18). The Sun is
an example of an ordinary star. It has spent about 4.5 billion years in this stable state
maintained by burning hydrogen into helium inside the stellar core, and can maintain
this state for about another 5 billion years. One day, all the hydrogen in the stellar
core will become helium, with only a thin layer of hydrogen around it still burning.
The situation inside the star is roughly sketched in Fig. 9.9.
When the temperature of the stellar core has not reached the level of igniting
helium nuclear fusion, the situation will be similar to the previous situation when it
has not reached the point required to ignite hydrogen: the helium ball contracts again
under the action of self-gravity and becomes hotter at the same time. This intensifies
the burning of hydrogen in the surrounding thin layer, which leads to the expansion
and cooling down of the outer part of the star, and turns it into a red giant. “Red” is
due to the decrease of the surface temperature, while “giant” comes from its inflated
size. The high temperature and density caused by the contraction of the helium sphere
may reach the level of the nuclear fusion reaction that ignites helium (burning helium
into carbon or oxygen), and the energy released will bring the stellar core to a stable
equilibrium again. The duration of this balance maintained by helium combustion is
much shorter than that of hydrogen combustion. When helium is burned into carbon
(or oxygen), the stellar core will contract again. The fate of a star in its later years
varies with its mass. For a star with a smaller mass (including the Sun), the contraction
of the stellar core cannot provide enough temperature for carbon to undergo nuclear
fusion, and thus it is no longer possible to maintain the equilibrium by nuclear energy.
Is there any power strong enough to counterbalance the self-gravity? There does not
exist such a power in classical physics. To prevent the contraction due to self-gravity,
there must be a sufficiently large pressure gradient [which is represented by (9.3.18)
in Newton’s theory of gravity, and (9.3.17) in general relativity].
A star is composed of hydrogen, helium and other elements. The high temperature
in the star puts these atoms in ionized states. According to classical physics, this
combination of ions and electrons can be regarded as an ideal gas. From (9.3.31) we
can see that a high temperature is required in order to obtain a high pressure for a
given density. Since the star keeps radiating energy, except for nuclear reactions, there
does not exist such a mechanism that can provide energy for maintaining the high
temperature. However, according to quantum physics, even a system at absolute zero
temperature may have a considerable pressure. Take an electron gas for example. In
classical physics, the average kinetic energy of the electrons is 3kB T /2; the average
kinetic energy vanishes when T = 0, and so all electrons are in a state with zero
energy. However, according to quantum physics, electrons are subject to the Pauli
exclusion principle, i.e., any energy state can be occupied by at most two electrons
(which have opposite spins and, hence, must be in different states). Therefore, when
T = 0, the electrons on the one hand must “squeeze” into a state with the lowest
possible energy; on the other hand, since each energy state can only be occupied
by two electrons, electrons must fill up all the states with the energy values from
zero all the way up to a certain value $E_F$ (all states with energy greater than $E_F$ are
empty). $E_F$ is called the Fermi energy, whose value increases as the density
increases. This indicates that even at absolute zero, the electrons in the electron gas
are not completely motionless as classical physics claims, they carry kinetic energy
that is not due to thermal motion (but due to the exclusion principle). This kind of
kinetic energy contributes to both pressure and energy density. An electron gas with
T = 0 is called a (completely) degenerate electron gas, and the pressure caused
by the above reasons is called the electron degeneracy pressure. At an ordinary
density, the Fermi energy E F is very small (for instance, the E F of the electron
gas in a common metal is only a few electronvolts), and the corresponding electron
degeneracy pressure is negligible. However, the degeneracy pressure will have a
considerable effect in the high density case. The high density caused by the second
contraction of the stellar core when the hydrogen and helium are burnt up gives the
electrons a rather high Fermi energy E F . Although the temperature T in the stellar
core is very high by the usual standard, due to the large $E_F$ we have $k_B T \ll E_F$, and
thus the contribution of electrons to the pressure p due to the thermal motion is much
smaller than that due to the kinetic energy of the electrons coming from the exclusion
principle and high E F . In this sense, it is not much different from the T = 0 case.
So at this time, the electrons in the star can be regarded as a degenerate electron gas,
whose degeneracy pressure may balance the self-gravity, keeping the star in
equilibrium so that it no longer contracts. This kind of stable star supported by the electron
degeneracy pressure is called a white dwarf. “Dwarf” means that it is much smaller
than an ordinary star, and “white” refers to the high temperature at its surface.
Once an isolated star evolves into a white dwarf, there will be no important further
evolution anymore. Since the temperature is higher than its surroundings, it will
continuously radiate energy. Since there is no energy source, the radiation will cause
the star’s temperature to decrease until it is equal to that of the surroundings, and
so the star will no longer be visible (some literature refers to it as a “black dwarf”).
The existence of white dwarfs has been confirmed by astronomical observations
dim and distant. Sirius B is the first white dwarf discovered by humans. Intuitively,
the more massive a star is, the stronger the self-gravity it has; only a star with a
sufficiently small mass can be supported by electron degeneracy pressure and form
a white dwarf. S. Chandrasekhar first found the upper mass limit of a white dwarf,
$M_{\rm Ch} \cong 1.3\,M_\odot$ [see Chandrasekhar (1939)]. This work along with his extraordinary
contribution to astrophysics earned him the Nobel Prize in Physics in 1983. Optional
Reading 9.3.3 will briefly introduce the derivation of the Chandrasekhar limit.
During its evolution, a star will eject matter which makes its mass decrease. We
say that a white dwarf satisfies $M < M_{\rm Ch}$, where M is the remaining mass. According
to estimates, any star with an initial mass less than $6 \sim 8\,M_\odot$ will go through a red
giant phase, eject a large amount of matter and become a white dwarf with a mass of
around $0.5 \sim 0.6\,M_\odot$.
If M > MCh , then the electron degeneracy pressure is not enough to maintain
the equilibrium of the star, and the nuclear fusion reactions inside the stellar core
will continue stage by stage until the core is burned into iron and nickel. These are the
most tightly bound nuclei (with the maximum average binding energy), so they do
not release energy by nuclear fusion. Hence, the stellar core contracts sharply under
the action of the self-gravity, and the density and temperature increase sharply. At
this time the self-gravity is very strong, the Newtonian approximation (9.3.18) is
no longer applicable, and (9.3.17) in general relativity must be used. For a given
ρ(r ) > 0, the right-hand side (absolute value) of (9.3.17) is always greater than that
of (9.3.18); thus, to achieve an equilibrium in general relativity a greater central
pressure is needed, and so the equilibrium is more difficult to achieve. At such a high
temperature and high density, high-energy photons can break the iron-nickel nuclei
into neutrons, protons, or light nuclei (photofission), and the electrons will also react
with protons (electron capture) and form neutrons and neutrinos (the latter will run
out of the star). Therefore, neutrons account for the vast majority in the stellar core.
Neutrons are also fermions, which also obey the Pauli exclusion principle. When
the nuclear density ($\sim 10^{17}$ kg·m$^{-3}$) is reached, the Fermi energy $E_F$ (divided by the
Boltzmann constant $k_B$) of the neutrons is much higher than the temperature T in
the star,⁵ and so it can be regarded as a degenerate neutron gas (i.e., $T \cong 0$), whose
degeneracy pressure may also counterbalance the self-gravity, making the star reach
a stable equilibrium. This kind of stable star supported by the neutron degeneracy
pressure is called a neutron star. Since the density inside a neutron star reaches or
even exceeds nuclear density, people’s understanding of the equation of state under
this kind of condition is far less accurate than that at lower densities, which makes it
rather difficult to calculate the maximum mass of a neutron star. Different literature
gives different values of this, and one can only roughly say that the upper mass
limit of a neutron star is $2M_\odot$ (or $2 \sim 3\,M_\odot$). Since it reaches nuclear density, one
may consider a neutron star as a “super-large atomic nucleus”. A neutron star is
much smaller than a white dwarf. The typical radius of a neutron star is only on the
order of 10 km, whereas a white dwarf has a radius between about 3,000 and 20,000
5 A more precise statement is: since it releases a large amount of high-energy neutrinos, a few
seconds after the formation of the neutron star it has $E_F \gg k_B T$.
kilometers. A neutron star is a very special (and complex) celestial body, which exhibits
various “extreme” (abnormal) behaviors: a density up to nuclear density, an unusually
strong magnetic field (up to $10^{12}$ gauss), very high-speed rotation (with frequency
from 1 Hz to nearly 1000 Hz), a high speed of sound close to the speed of light,
a superfluid interior, and so on. Even today, it is still difficult to understand thoroughly.
The first theoretical model of a neutron star was published by J. R. Oppenheimer
and G. M. Volkoff in 1939. Since their article did not provide any observable physical
effect, the study of neutron stars was neglected for 28 years. The existence of
neutron stars has been confirmed since the discovery of a pulsar in 1967. A pulsar is
a signal source of periodic electromagnetic pulse signals measured on Earth, with a
period about 1 s or less. The only persuasive explanation is: this is a rotating neutron
star whose strong magnetic field on the surface leads to magnetic dipole radiation,
and the combination of orientation of the radiation and the rotation of the neutron
star lets the Earth receive an electromagnetic pulse signal (the electromagnetic pulse
of the pulsar discovered in 1967 is a radio pulse). Only neutron stars (with small
radius and strong surface gravity) can rotate at such a high angular velocity without
“falling apart”.
The stellar core contracts very sharply before it forms a neutron star, and thus
this process is called gravitational collapse. Once the rapidly collapsing stellar core
reaches a sufficient density and is stopped by the neutron degeneracy pressure, its
high energy will appear as an outward shock wave and blast away the outer material,
producing a highly energetic supernova explosion. Pulsars have been found in
two famous supernova remnants, the Crab Nebula and the Vela supernova remnant,
which provides important support for the above-mentioned theory. Ancient Chinese
documents contain extremely rich records of supernova explosions. For example,
Volume Nine of Zhi (Records) in Song Shi (History of Song), published in 1346,
recorded the supernova (SN1054) observed in 1054 AD (during the Northern Song
Dynasty), a record that is particularly valued by the international astronomical community [a photo
of one page of it can be found at the title page of Misner et al. (1973)]. The Crab
Nebula is exactly the remnant of SN1054. The most recently observed supernova
explosion visible to the naked eye on Earth was in 1987 (SN1987a). This supernova
is located in a neighboring galaxy of the Milky Way—the Large Magellanic Cloud,
which is about 160,000 light-years away from the Earth. The detailed mechanism of
the supernova explosion is still a subject being studied in depth.
If the mass of a spherically symmetric star is still greater than the upper mass
limit of a neutron star ($\sim 2M_\odot$) after ejecting matter, there will be no power to
prevent its gravitational collapse. Then, it will contract without any restriction into
a “singularity” with infinite density and curvature, and form a Schwarzschild black
hole (see Sect. 9.4).
[Optional Reading 9.3.3]
This optional reading introduces the derivation of the formula of electron degeneracy
pressure and the upper mass limit of a white dwarf. First we discuss the electron degeneracy
pressure. Suppose x, y, z are the spatial coordinates of an electron, and $k_x, k_y, k_z$ are the three
coordinate components of the electron’s momentum; then $\{x, y, z; k_x, k_y, k_z\}$ is a coordinate
system of the 6-dimensional phase space. A phase space can be divided into many quantum
phase cells $dx\,dy\,dz\,dk_x\,dk_y\,dk_z$ (each phase cell corresponds to an energy level), and the
volume of each cell is
$$dx\,dy\,dz\,dk_x\,dk_y\,dk_z = h^3 . \tag{9.3.32}$$
Let $k \equiv (k_x^2 + k_y^2 + k_z^2)^{1/2}$; then the points whose values of k are in the range (k, k + dk)
in the momentum space constitute a spherical shell with volume $4\pi k^2\,dk$. Then, the points
in the phase space representing states whose position is in dxdydz and whose value of k is in
(k, k + dk) constitute a shell with volume $4\pi k^2\,dk\,dx\,dy\,dz$. Since the volume of each quantum
phase cell is $h^3$, there will be $4\pi k^2\,dk\,dx\,dy\,dz/h^3$ phase cells in the shell. Since each cell
corresponds to an energy level, and each energy level is occupied by at most two electrons,
the number of electrons in a shell will not exceed $8\pi k^2\,dk\,dx\,dy\,dz/h^3$. For a completely
degenerate electron gas with T = 0, each energy level with $E \le E_F$ has two electrons, and
all the energy levels with $E > E_F$ are empty. Therefore, the number of electrons with their
values of k in (k, k + dk) per unit volume, denoted by f(k)dk, satisfies
$$f(k)\,dk = \begin{cases} 8\pi k^2\,dk/h^3 , & k < k_F \\ 0 , & k > k_F \end{cases} , \tag{9.3.33}$$
$$\rho = \mu n_e m_N , \tag{9.3.35}$$
where n e is given by (9.3.34). To obtain the equation of state, one should also compute the
degeneracy pressure pde . Pressure is the stress per unit area, i.e., the force that the matter
on the left side of an area element exerts on the matter on the right side, or the momentum
exchanged through the area element per unit time (as the definition of force is the rate
of change of momentum dk/dt). This exchange of momentum is caused by the electrons
going across the area from left to right or the other way around (each electron carries some
certain momentum). Therefore, the pressure equals the vector sum of the momenta of the
electrons going through per unit area per unit time. Suppose dσ is an area element in the
internal space of the star, whose normal vector is n (see Fig. 9.10). First, consider an electron
$$f(k)\,u\cos\theta\,d\sigma\,dk\,\frac{d\Omega_k}{4\pi}\,k\cos\theta = f(k)\,u k\cos^2\theta\,d\sigma\,dk\,\frac{d\Omega_k}{4\pi} .$$
Hence, the total momentum of all electrons (regardless of the magnitudes and directions of
k) going through per unit area per unit time, i.e., the degeneracy pressure at dσ is
$$p_{\rm de} = \frac{1}{4\pi}\int_{\rm sphere}\cos^2\theta\,d\Omega_k \int_0^\infty f(k)\,u(k)\,k\,dk = \frac{8\pi}{3h^3}\int_0^{k_F} k^3 u(k)\,dk , \tag{9.3.36}$$
where (9.3.33) is used in the second equality. Using $k = (1 - u^2)^{-1/2} m_e u$, one can rewrite
(9.3.36) as
$$p_{\rm de} = \frac{8\pi}{3h^3}\int_0^{k_F}\frac{k^4\,dk}{(k^2 + m_e^2)^{1/2}} . \tag{9.3.37}$$
Then, using (9.3.34), one can rewrite (9.3.35) as
$$k_F = h\left(\frac{3\rho}{8\pi\mu m_N}\right)^{1/3} . \tag{9.3.38}$$
Plugging this into (9.3.37) yields the explicit expression for the equation of state. This
equation is quite complex, but some useful conclusions can be obtained by analyzing two
extreme cases. When $m_e \gg k_F$, $u_F \ll 1$, the motion of electrons can be described by Newtonian
mechanics, which is called the non-relativistic case; when $m_e \ll k_F$, $u_F \cong 1$, the motion
of electrons must be characterized by special relativity, which is called the ultra-relativistic
case. The non-relativistic condition $m_e \gg k_F$ and the ultra-relativistic condition $m_e \ll k_F$
can also be expressed as $\rho \ll \rho_C$ and $\rho \gg \rho_C$, respectively, where the critical density $\rho_C$
is defined by $m_e = k_F$, which can be found explicitly from (9.3.38) as
$$\rho_C = \frac{8\pi\mu m_N m_e^3}{3h^3} . \tag{9.3.39}$$
Rewriting this in SI (restoring $c^3$) and plugging in the specific values (take μ = 2), we obtain
$$\rho_C = \frac{8\pi\mu m_N m_e^3 c^3}{3h^3} \cong 2 \times 10^9\ {\rm kg\cdot m^{-3}} ,$$
and thus the critical density $\rho_C$ is about $2 \times 10^6$ times the density of water. For $\rho \ll \rho_C$
(non-relativistic case), (9.3.37) gives approximately
$$p_{\rm de} = \frac{8\pi k_F^5}{15h^3 m_e} . \tag{9.3.40}$$
Plugging in (9.3.38) along with the specific values in SI yields
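$$p_{\rm de} = \frac{1}{20}\left(\frac{3}{\pi}\right)^{2/3}\frac{h^2}{m_e m_N^{5/3}}\left(\frac{\rho}{\mu}\right)^{5/3} = 10^7\left(\frac{\rho}{\mu}\right)^{5/3}\ ({\rm SI}) . \tag{9.3.41}$$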
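For $\rho \gg \rho_C$ (ultra-relativistic case), (9.3.37) gives approximately
$$p_{\rm de} = \frac{2\pi k_F^4}{3h^3} . \tag{9.3.42}$$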
Plugging (9.3.38) in the above equation, rewriting it in SI (adding c) and plugging in the
specific values, we obtain
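$$p_{\rm de} = \left(\frac{3}{\pi}\right)^{1/3}\frac{hc}{8m_N^{4/3}}\left(\frac{\rho}{\mu}\right)^{4/3} = 1.24 \times 10^{10}\left(\frac{\rho}{\mu}\right)^{4/3}\ ({\rm SI}) . \tag{9.3.43}$$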
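The numerical coefficients quoted in (9.3.39), (9.3.41) and (9.3.43) can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the constants are CODATA-style reference values supplied here, and the nucleon mass m_N is taken to be the proton mass, which the text does not specify. The function names are hypothetical.

```python
import math

# SI constants (assumed reference values; m_N taken as the proton mass).
H = 6.62607015e-34      # Planck constant, J*s
C = 2.99792458e8        # speed of light, m/s
M_E = 9.1093837e-31     # electron mass, kg
M_N = 1.67262192e-27    # nucleon mass, kg

def critical_density(mu):
    """rho_C of (9.3.39) in SI units (factor c^3 restored)."""
    return 8.0 * math.pi * mu * M_N * (M_E * C)**3 / (3.0 * H**3)

def pressure_nonrelativistic(rho, mu):
    """Degeneracy pressure for rho << rho_C, Eq. (9.3.41)."""
    coeff = (1.0 / 20.0) * (3.0 / math.pi)**(2.0 / 3.0) * H**2 / (M_E * M_N**(5.0 / 3.0))
    return coeff * (rho / mu)**(5.0 / 3.0)

def pressure_ultrarelativistic(rho, mu):
    """Degeneracy pressure for rho >> rho_C, Eq. (9.3.43)."""
    coeff = (3.0 / math.pi)**(1.0 / 3.0) * H * C / (8.0 * M_N**(4.0 / 3.0))
    return coeff * (rho / mu)**(4.0 / 3.0)

rho_c = critical_density(2.0)
print(rho_c)                                  # of order 2e9 kg/m^3
print(pressure_nonrelativistic(1.0, 1.0))     # coefficient of (9.3.41)
print(pressure_ultrarelativistic(1.0, 1.0))   # coefficient of (9.3.43)

# At rho = rho_C the two limiting formulas differ only by the factor 4/5.
print(pressure_nonrelativistic(rho_c, 2.0) / pressure_ultrarelativistic(rho_c, 2.0))
```

The last line shows that the two asymptotic expressions cross over near ρ_C, as expected from the definition m_e = k_F; neither is accurate there, and the full integral (9.3.37) is needed in between.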
$$\frac{d}{dr}\left[\frac{r^2}{\rho(r)}\frac{dp(r)}{dr}\right] = -4\pi\rho(r)\,r^2 . \tag{9.3.45}$$
Starting from this equation, using some calculation techniques [see Weinberg (1972) pp. 308–
310], one finds the dependence of the radius R and the mass M of the star on the central
density ρ0 :
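$$R = a_\gamma\,\rho_0^{(\gamma-2)/2} , \tag{9.3.46}$$
$$M = b_\gamma\,\rho_0^{(3\gamma-4)/2} , \tag{9.3.47}$$
where the constants $a_\gamma$ and $b_\gamma$ are related to γ; for γ = 5/3 and γ = 4/3 they are, respectively,
$$\begin{aligned} a_{5/3} &= 6.3 \times 10^{8}\,\mu^{-5/6} , & b_{5/3} &= 1.7 \times 10^{26}\,\mu^{-5/2} , \\ a_{4/3} &= 5.3 \times 10^{10}\,\mu^{-2/3} , & b_{4/3} &= 11.6 \times 10^{30}\,\mu^{-2} . \end{aligned} \tag{9.3.48}$$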
Based on this we can further discuss white dwarfs. When the mass M of a star is small
enough, $\rho_0 \ll \rho_C$, then (9.3.41) is valid everywhere inside the star, and the electron gas
inside the whole star forms a polytrope with γ = 5/3. When the central degeneracy pressure
equals the central pressure for keeping equilibrium, the star will be in an equilibrium state.
The relation between the radius R and the mass M in equilibrium can be seen from (9.3.46),
(9.3.47) (with γ = 5/3) and (9.3.48) as
$$R \propto M^{-1/3} . \tag{9.3.49}$$
Thus, the radius of a white dwarf with γ = 5/3 decreases as the mass increases. This seems
to contradict our life experience and the experience from the planets; later we will give a
rough explanation of this. If (9.3.49) always holds, then the electron degeneracy pressure
can support stars of any mass, since one can always plug a value of M into (9.3.49) and
find a radius R of a star in equilibrium. However, when the mass M is sufficiently large,
the central pressure will be so large that the (special) relativistic effect of electrons has to
be considered. Then, the star can no longer be regarded as a polytrope with γ = 5/3, and
(9.3.49) no longer holds. In fact, since ρ0 increases as M increases, the electrons near the
center will be the first to reach the ultra-relativistic level. A spherical core which can be
regarded as a polytrope with γ = 4/3 will appear in the star, and then it will gradually
expand to the entire body. From (9.3.47)6 we can see that M is independent of ρ0 when
γ = 4/3, which is quite different from the case where γ = 5/3. When the entire star can
be regarded as a polytrope with γ = 4/3, it follows from (9.3.48) that the value of this M
(denoted by MCh ) which is independent of ρ0 is
$$M_{\rm Ch} = \frac{5.8}{\mu^2}\,M_\odot . \tag{9.3.50}$$
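The scaling relations (9.3.46)–(9.3.48) lend themselves to a short numerical illustration. The sketch below assumes that the constants in (9.3.48) are in SI units (meters, kilograms, kg·m⁻³), which the text does not state explicitly but which the value 5.8 M⊙/μ² suggests. The solar mass is a rounded reference value, and the function names are hypothetical.

```python
M_SUN = 1.989e30  # kg, rounded reference value (assumed)

def chandrasekhar_mass(mu):
    """M for a gamma = 4/3 polytrope, b_{4/3} of (9.3.48); independent of rho_0."""
    return 11.6e30 / mu**2

def white_dwarf_radius(mass, mu):
    """Radius of a gamma = 5/3 polytrope of given mass, from (9.3.46)-(9.3.48)."""
    a = 6.3e8 * mu**(-5.0 / 6.0)
    b = 1.7e26 * mu**(-5.0 / 2.0)
    rho0 = (mass / b)**2             # invert M = b * rho0^(1/2)
    return a * rho0**(-1.0 / 6.0)    # R = a * rho0^(-1/6)

mu = 2.0
print(chandrasekhar_mass(mu) / M_SUN)         # about 1.45 solar masses
print(white_dwarf_radius(1.0e30, mu) / 1e3)   # radius in km for M = 1e30 kg
# Doubling the mass shrinks the radius by 2^(-1/3), as in (9.3.49).
print(white_dwarf_radius(2.0e30, mu) / white_dwarf_radius(1.0e30, mu))
```

The radius obtained for a mass of 10³⁰ kg falls within the range of white dwarf radii quoted earlier in this section, although the γ = 5/3 approximation is only rough once ρ₀ approaches ρ_C.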
Equation (9.3.47) is derived under the condition of hydrostatic equilibrium. If the mass is
larger than MCh , the star cannot be in equilibrium. In fact, from (9.3.49) and (9.3.44) we can
see that the conclusion above can be interpreted as follows: as a rough evaluation, we assume
that the star has a uniform density, then it follows from (9.3.23) that the central pressure for
keeping the equilibrium is
$$p_{\rm grav} \propto M^2 R^{-4} , \tag{9.3.51}$$
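where $p_0$ is now denoted as $p_{\rm grav}$ to emphasize that this is the central pressure for
counterbalancing the self-gravity. It follows from (9.3.44) that the degeneracy pressure provided by
the degenerate electron gas is $p_{\rm de} \propto M^{\gamma} R^{-3\gamma}$, and hence
$$\frac{p_{\rm grav}}{p_{\rm de}} \propto M^{2-\gamma} R^{3\gamma-4} = \begin{cases} M^{1/3} R , & \text{for } \gamma = 5/3 , \quad (9.3.52{\rm a}) \\ M^{2/3} , & \text{for } \gamma = 4/3 . \quad (9.3.52{\rm b}) \end{cases}$$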
Suppose the electron gas in the star is in the non-relativistic case (γ = 5/3) and M < MCh ,
then from (9.3.52a) we know that there exists an R such that pgrav / pde = 1, and the star is in
equilibrium when its radius equals this value of R. If M increases slightly, then pgrav / pde >
1, i.e., the self-gravity is slightly larger than the degeneracy pressure, and the star will
contract to a smaller radius to reach equilibrium again. (This can be considered as a specific
interpretation for the conclusion “a white dwarf with a greater mass has a smaller radius”.)
However, if M is so large that the entire star has γ = 4/3, then it follows from (9.3.52b)
that pgrav / pde is independent of R. Under this extreme circumstance, only when M equals a
suitable value MCh can the star be in equilibrium. If M < MCh , then pgrav / pde < 1, i.e., the
degeneracy pressure is greater than the self-gravity, R will increase until it quits the ultra-
relativistic case. In contrast, if M > MCh , then pgrav / pde > 1, and the star will contract,
which makes γ closer to exactly 4/3. Then pgrav / pde will not change with the decrease of
R, and hence the star can only continue contracting and cannot reach equilibrium under the
support of the electron degeneracy pressure. Thus, MCh is indeed the upper mass limit of a
white dwarf (whose character is that the electron degeneracy pressure keeps the equilibrium).
6 Here we still have $p \ll \rho$ and $m(r) \ll r$, and hence the Newtonian equation (9.3.18) and (9.3.45)–
(9.3.48) derived from it are still applicable.
9.4 The Kruskal Extension and Schwarzschild Black Holes 439
Since the interior of a white dwarf is mostly helium, carbon or oxygen, one can take μ = 2 in
(9.3.50) and obtain $M_{\rm Ch} = 1.45\,M_\odot$. The discussion above is just a simplified version. Some
more precise discussions and calculations provide $M_{\rm Ch}$ slightly smaller than this value, such
as $M_{\rm Ch} = 1.3\,M_\odot$.
[The End of Optional Reading 9.3.3]
The line element of the vacuum Schwarzschild metric in the Schwarzschild coordi-
nate system is
$$ds^2 = -\left(1 - \frac{2M}{r}\right)dt^2 + \left(1 - \frac{2M}{r}\right)^{-1}dr^2 + r^2\left(d\theta^2 + \sin^2\theta\,d\varphi^2\right) . \tag{9.4.1}$$
When r = 2M, g11 = ∞ (singular); when r = 0, both g00 and g11 are singular. These
places where gμν is singular (or degenerate) are called singularities. Note that the
word “singularity” may be used to refer to both the property of being singular and
the place that has the singularity.7 There are two reasons accounting for the appear-
ance of a singularity: ① the metric tensor gab is well-behaved at this place, just
some components are not well-behaved in certain coordinate systems; this is called
a coordinate singularity, which can be removed by choosing a suitable coordinate
system; ② The metric tensor gab itself is ill-behaved (singular) at this place; this is
called a true singularity or spacetime singularity, which is really a thorny prob-
lem in general relativity. Later we will see that the singularity at r = 2M is only
a coordinate singularity, and a spacetime singularity exists only at r = 0. Denoting
$r_S \equiv 2M$ (called the Schwarzschild radius) and restoring the constants G and c, we
get
$$r_S = \frac{2GM}{c^2} \cong 3\,\frac{M}{M_\odot}\ ({\rm km}) .$$
For the Sun, $r_S \cong 3$ km, which is far less than its radius. Since the external
Schwarzschild solution does not apply to the interior of the Sun, there is no sin-
gularity problem for it (or any normal celestial bodies). However, for a spherically
symmetric star which experiences gravitational collapse and turns into a black hole
(Birkhoff’s theorem assures that the external spacetime geometry is described by the
Schwarzschild metric), the singularity problem is of great significance.
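The estimate r_S ≅ 3 (M/M⊙) km is easy to reproduce. The following sketch uses rounded reference values for G, c, the solar mass and the solar radius, all of which are supplied here for illustration rather than taken from the text; the function name is hypothetical.

```python
# Rounded SI constants (assumed values for illustration).
G, C = 6.674e-11, 2.998e8
M_SUN, R_SUN_KM = 1.989e30, 6.96e5

def schwarzschild_radius_km(mass_kg):
    """r_S = 2GM/c^2, returned in kilometers."""
    return 2.0 * G * mass_kg / C**2 / 1000.0

r_s_sun = schwarzschild_radius_km(M_SUN)
print(r_s_sun)               # close to 3 km
print(r_s_sun / R_SUN_KM)    # tiny fraction of the solar radius
print(schwarzschild_radius_km(10.0 * M_SUN))  # scales linearly with mass
```

Since r_S is linear in M, the coefficient of about 3 km per solar mass is all one needs to remember.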
7 Also note that a singularity may not be a point, since in the 4-dimensional language r = 0 (or
r = 2M) represents a hypersurface instead of a point.
attack” is overly broad. A spacetime that should not have been singular, if we remove
a point by hand, would be a singular spacetime according to the preceding definition,
which is not what we want. One way to overcome this flaw is to add a restriction
in the definition: the spacetime we consider must be inextendible, i.e., it cannot be
enlarged by adding some points to it.8 A spacetime with some points removed arti-
ficially is not inextendible, and hence does not meet this definition. Then we inspect
the above definition from the perspective of whether it is physically singular. If there
exists an incomplete timelike geodesic in an inextendible spacetime, physically it is
indeed quite singular: the freely falling observer it represents will actually vanish in
the spacetime within a finite time (according to its own standard clock) or not even
have existed a finite amount of time earlier! Similarly, an incomplete null geodesic
is also physically singular, since it represents the world line of a photon. However, a
spacelike geodesic is not the world line of any particle, and so there is no reason to
consider a spacetime which only has incomplete spacelike geodesics as physically
singular. Hence, we take the following definition [see Hawking and Ellis (1973)]:
Definition 1 If there exists one (or more than one) incomplete timelike or null
geodesic in an inextendible spacetime, we say that it is a singular spacetime, or
it has a spacetime singularity.
However, Definition 1 still has drawbacks. For instance, there exists such a space-
time [see Geroch (1968)] which has no incomplete geodesics, but has a bizarre
non-geodesic timelike curve (which has been maximally extended) whose arc length
is finite and 4-acceleration (magnitude) is bounded. This indicates that the observer
in a spaceship traveling along the curve will vanish in the spacetime after a finite
time! (The finite arc length and bounded 4-acceleration ensure that the spaceship
can finish this curve with a finite amount of fuel, and a spaceship like this exists in
principle.) Such a spacetime is singular enough to be called a singular spacetime,
but unfortunately it is not one according to Definition 1. This indicates that this definition
has the drawback that its “scope of attack” is too narrow. Another drawback
of Definition 1 is that the intuitive statement that the spacetime has a “hole” does
not always correspond to the existence of an incomplete geodesic. For example, there exists
such a geodesically incomplete spacetime (which contains an incomplete timelike,
null or spacelike geodesic) whose background manifold is compact, and hence has
no “holes” (according to Theorem 1.3.9, any point sequence in a compact manifold
has an accumulation point, and thus the manifold has no “holes”), see Wald (1984).
Although Definition 1 has these drawbacks, it may still be considered as the first
choice of the definition of a singularity. The proof of the Penrose-Hawking singu-
larity theorems used exactly this definition (Appendix E of Volume II provides a
brief introduction of singularity theorems). Later we will see that in the maximally
extended Schwarzschild spacetime there still exist many incomplete timelike and
null geodesics (whose existence is related to the elimination of r = 0). Therefore,
8 The precise mathematical definition is: a spacetime $(M, g_{ab})$ is said to be inextendible if there
does not exist a spacetime $(M', g'_{ab})$ such that $(M, g_{ab})$ is isometric to a proper subset
of $(M', g'_{ab})$.
If you can find a coordinate system such that the components of the Schwarzschild
metric in this system behave ordinarily at r = 2M, you can claim that r = 2M is only
a coordinate singularity. This is a sufficient condition for determining a coordinate
singularity. Unfortunately, finding this kind of “good” coordinate system is in general
not easy, and is in no way guaranteed. Luckily, the singularity of the Schwarzschild
metric at r = 2M involves only the first two dimensions of the total 4-dimensional
line element, and finding a “good” coordinate system in a 2-dimensional spacetime
is much easier than doing so in a 4-dimensional spacetime. In this section we will
first introduce a simple but heuristic example. Consider the 2-dimensional Rindler
spacetime, whose metric has the following line element in the coordinate
system {t, x}:
$$ ds^2 = -x^2\,dt^2 + dx^2 . \tag{9.4.2} $$
9.4 The Kruskal Extension and Schwarzschild Black Holes 443
The approach of finding a “good” coordinate system for determining the singularity
at x = 0 as a coordinate singularity is based on the following fact: each point in
a 2-dimensional spacetime has only two null directions (while in 4-dimensional
spacetime there are infinitely many), and hence (locally) there are only two null
geodesics passing through each point, which sorts the null geodesics in the entire
spacetime into two families. If we find that a null geodesic is incomplete, then we
should suspect that certain regions have been eliminated from the given spacetime. If
one can show that these eliminated regions can be mended, i.e., the given spacetime
can be extended, and x = 0 is a point in the extended spacetime, then one can claim
that the singularity at x = 0 is only a coordinate singularity. Here is how it works:
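Suppose η(λ) is a null geodesic in Rindler spacetime, with λ as the affine parameter,
then its tangent vector
$$ \left(\frac{\partial}{\partial\lambda}\right)^a = \frac{dt}{d\lambda}\left(\frac{\partial}{\partial t}\right)^a + \frac{dx}{d\lambda}\left(\frac{\partial}{\partial x}\right)^a $$
satisfies
$$ 0 = g_{ab}\left(\frac{\partial}{\partial\lambda}\right)^a\left(\frac{\partial}{\partial\lambda}\right)^b = -x^2\left(\frac{dt}{d\lambda}\right)^2 + \left(\frac{dx}{d\lambda}\right)^2 , $$
and hence $(dt/dx)^2 = x^{-2}$, whose solutions are
$$ t \pm \ln x = c , $$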
where the positive and negative signs represent the “ingoing” and “outgoing”
families of null geodesics, respectively (the terms “ingoing” and “outgoing” are
introduced simply for the sake of convenience; one can choose either family as
ingoing and the other as outgoing), and different values of c correspond to different
geodesics in the same family. Hence, t + ln x and t − ln x are constant on each
“ingoing” and “outgoing” null geodesic, respectively. Define coordinates v and u as follows:
$$ v := t + \ln x , \qquad u := t - \ln x . \tag{9.4.5} $$
Then
$$ x = e^{\frac{1}{2}(v-u)} , \qquad t = \frac{1}{2}(v + u) , \tag{9.4.6} $$
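and plugging these into (9.4.2) after differentiating them yields
$$ ds^2 = -e^{v-u}\,dv\,du . \tag{9.4.7} $$
Thus, $0 = g_{vv} = g_{ab}(\partial/\partial v)^a(\partial/\partial v)^b$, which indicates that the basis vector $(\partial/\partial v)^a$
is a null vector. Similarly, $(\partial/\partial u)^a$ is also a null vector. Therefore, we refer to v and
u as null coordinates. The ranges of the coordinates t and x [see (9.4.3)] correspond
to the following ranges of v and u (see Fig. 9.11):
$$ -\infty < v < \infty , \qquad -\infty < u < \infty . \tag{9.4.8} $$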
This seems to suggest that all the null geodesics are complete, but actually it does
not since v and u are not affine parameters. The affine parameters can be obtained
by means of the timelike Killing vector field (∂/∂t)a . According to Theorem 4.3.3,
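the E defined as follows is a constant along any null geodesic η(λ):
$$ E := -g_{ab}\left(\frac{\partial}{\partial t}\right)^a\left(\frac{\partial}{\partial\lambda}\right)^b = x^2\,\frac{dt}{d\lambda} . \tag{9.4.9} $$
On an “outgoing” null geodesic u is a constant, so it follows from (9.4.6) that
$E = \frac{1}{2}e^{v-u}\,dv/d\lambda$, whose integration gives
$$ \lambda = \frac{e^{-u}}{2E}\,e^{v} + c_1 . \tag{9.4.10} $$
Define
$$ V := e^{v} . \tag{9.4.11} $$
Since $e^{-u}/2E$ and $c_1$ are constants, and λ is an affine parameter, (9.4.10) indicates
that $V \equiv e^{v}$ is also an affine parameter of the “outgoing” null geodesics (see Theorem
3.3.3). From (9.4.8) and $V \equiv e^{v}$ we can see that the range of V is (0, ∞), and thus the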
“outgoing” null geodesics are incomplete. Similarly, for “ingoing” null geodesics,
$$ U := -e^{-u} \tag{9.4.12} $$
is an affine parameter. From (9.4.8) and (9.4.12) we know that the range of U is
(−∞, 0), and thus the “ingoing” null geodesics are also incomplete. Does this indicate
that Rindler spacetime is a singular spacetime with a spacetime singularity at
x = 0? No, the key is that Rindler spacetime is not an inextendible spacetime, but
is the result of eliminating certain regions from a larger spacetime. To confirm this
conclusion, one can derive from (9.4.11) and (9.4.12) that
$$ dV\,dU = e^{v-u}\,dv\,du , \tag{9.4.13} $$
and hence the line element takes the form
$$ ds^2 = -dV\,dU . \tag{9.4.14} $$
The ranges of V and U,
$$ 0 < V < \infty , \qquad -\infty < U < 0 , \tag{9.4.15} $$
are derived from the range of the original coordinate x, i.e., 0 < x < ∞. However,
now it is not necessary to stick to this range, since it follows from (9.4.14) that
the only nonvanishing component in the coordinate system {V, U } of the metric,
gV U = −1/2, behaves quite normally. Even if V, U take values exceeding the range
of (9.4.15), the line element (9.4.14) still behaves well, with no singularity at all.
If we had presented (9.4.14) to you in the first place without mentioning the previous
discussion, you would naturally assume that the ranges of V and U have no constraints,
i.e., that they can take any value within (−∞, +∞). In this way, the extension of the
domain of the Rindler metric is realized by introducing the new coordinates V, U. The
locus x = 0 consists of points in the extended domain (the positive semi-axis of the V-axis, see
Fig. 9.12). The metric behaves normally at these points; only its components in the
original coordinate system {t, x} behave badly there. This is actually quite natural:
since x = 0 never belongs to the coordinate patch of the original coordinate system
(it only “touches the edge” from the outside), the so-called singularity at x = 0
is nothing but the result of applying the original coordinate system inappropriately
outside its coordinate patch. Therefore, the singularity at x = 0 is only a coordinate singularity.
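The incompleteness argument above can be checked numerically. The following is a minimal Python sketch (the function name `rindler_to_null` is ours, introduced only for illustration). It follows one “outgoing” null geodesic, u = constant, toward x = 0: the coordinate v runs off to −∞, while the affine parameter V = e^v only decreases to the finite value 0. It also confirms the relation x² = −VU.

```python
import math

def rindler_to_null(t, x):
    """Map Rindler coordinates (t, x), x > 0, to (v, u, V, U)."""
    v = t + math.log(x)   # null coordinates, (9.4.5)
    u = t - math.log(x)
    V = math.exp(v)       # affine on "outgoing" null geodesics, (9.4.11)
    U = -math.exp(-u)     # affine on "ingoing" null geodesics, (9.4.12)
    return v, u, V, U

# Follow one "outgoing" null geodesic (u = u0 fixed) toward x = 0.
u0 = 0.0
for x in (1.0, 1e-3, 1e-6):
    t = u0 + math.log(x)  # keeps u = t - ln x equal to u0
    v, u, V, U = rindler_to_null(t, x)
    print(f"x={x:g}  v={v:.3f}  V={V:.3e}  -V*U={-V * U:.3e}  x^2={x * x:.3e}")
```

As x shrinks, v decreases without bound but V tends to 0, so only a finite amount of affine parameter remains before the geodesic leaves the original coordinate patch.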
If we further define coordinates T, X as follows:
$$ T := \frac{V + U}{2} , \qquad X := \frac{V - U}{2} , \tag{9.4.16} $$
then it follows from (9.4.14) that $ds^2 = -dT^2 + dX^2$. Thus, the Rindler metric is
actually a flat metric,9 whose true colors are merely concealed by the original coordinates
t, x. The Rindler spacetime defined in (9.4.2) is nothing but a sub-spacetime of the
2-dimensional Minkowski spacetime [a quadrant defined by (9.4.15), see region R
in Fig. 9.12]. The Minkowski spacetime in Fig. 9.12 is the maximal extension of
the Rindler spacetime in Fig. 9.11. It follows from $x^2 = e^{v-u} = -VU$ that both of
the two lines V = 0 and U = 0 in Fig. 9.12 correspond to x = 0, which is exactly
a specific manifestation of the fact that x = 0 does not belong to the coordinate patch of the
original coordinate system (it only “touches the edge” from the outside). Although
the two families of null geodesics in Fig. 9.11 appear differently from those in region
R of Fig. 9.12, they are essentially the same. This again indicates that, even for the
same spacetime, the spacetime diagram can vary widely due to different choices of
the coordinate system.
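The flatness of the Rindler metric can also be verified numerically. Below is a hedged Python sketch (the function names are ours, not from the text): it maps two nearby Rindler events to (T, X) through V = e^{t + ln x}, U = −e^{−(t − ln x)} and (9.4.16), and compares the Rindler interval −x²dt² + dx² with the Minkowski interval −dT² + dX² between them.

```python
import math

def to_minkowski(t, x):
    """Rindler (t, x), x > 0  ->  Minkowski (T, X) via (9.4.11), (9.4.12), (9.4.16)."""
    V = x * math.exp(t)      # V = e^{t + ln x}
    U = -x * math.exp(-t)    # U = -e^{-(t - ln x)}
    return (V + U) / 2, (V - U) / 2

def interval_rindler(t, x, dt, dx):
    """Line element (9.4.2) evaluated at the midpoint (t, x)."""
    return -x**2 * dt**2 + dx**2

def interval_minkowski(t, x, dt, dx):
    """-dT^2 + dX^2 between the two events centred on (t, x)."""
    T0, X0 = to_minkowski(t - dt / 2, x - dx / 2)
    T1, X1 = to_minkowski(t + dt / 2, x + dx / 2)
    return -(T1 - T0)**2 + (X1 - X0)**2

a = interval_rindler(0.7, 1.5, 1e-4, 2e-4)
b = interval_minkowski(0.7, 1.5, 1e-4, 2e-4)
print(a, b)  # the two values agree up to higher-order terms in dt, dx
```

The two numbers differ only by terms of higher order in the displacement, as expected for the same metric written in two coordinate systems.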
9 It only differs from the Minkowski metric up to a diffeomorphism, and thus they are equivalent
(they have the same geometry, see Sect. 8.10.2).
As a differential equation, Einstein’s equation is local (one can talk about a differential
equation at any given point and its neighborhood on the manifold). Each solution
of the equation represents a metric. As for which manifold the metric is defined on
(this is a global problem), one can only discuss it after solving the equation. Take
the original Schwarzschild line element (9.4.1) as an example. We have pointed out
that this line element has singularities at r = 0 and r = 2M. Since the background
manifold has to be connected, the range of r can either be r > 2M or r < 2M, but
cannot be their union. We may take r > 2M, and then try to show that r = 2M
is a coordinate singularity. The way of proving this is quite similar to that of the
Rindler case. The Rindler line element (9.4.2) is not only 2-dimensional, but also
has a timelike Killing vector field (∂/∂t)a , i.e., the metric components do not contain
t, which greatly simplifies the task of finding a “good” coordinate system. We may
summarize the way of accomplishing this task as the following procedure:
$$ ds^2 = -x^2\,dt^2 + dx^2 = x^2\left(-dt^2 + x^{-2}\,dx^2\right) . $$
Define a function $x_*(x)$ such that $dx_* = x^{-1}dx$, then $ds^2 = x^2(-dt^2 + dx_*^2)$. Let
$v := t + x_*$, $u := t - x_*$, i.e., $t = (v + u)/2$, $x_* = (v - u)/2$, then $-dt^2 + dx_*^2 = -dv\,du$.
Hence
$$ ds^2 = -x^2\,dv\,du = -e^{v-u}\,dv\,du = -dV\,dU , $$
where
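$$ dr_* := (1 - 2M/r)^{-1}\,dr . \tag{9.4.18} $$
Take
$$ r_* := r + 2M \ln\left(\frac{r}{2M} - 1\right) , \tag{9.4.19} $$
which is the tortoise coordinate $r_*$ in (8.9.1). Let
$$ v := t + r_* , \quad u := t - r_* \quad \text{or} \quad t = \frac{v + u}{2} , \quad r_* = \frac{v - u}{2} , \tag{9.4.20} $$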
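then the ranges of v and u are
$$ -\infty < v < \infty , \qquad -\infty < u < \infty . $$
Let
$$ V := e^{\beta v} , \qquad U := -e^{-\beta u} , $$
where β is a positive constant to be determined. Also
$$ dV\,dU = \beta^{2} e^{\beta(v-u)}\,dv\,du , \quad \text{i.e.,} $$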
$$ dv\,du = \beta^{-2} e^{\beta(u-v)}\,dV\,dU , $$
and hence
$$ d\hat{s}^2 = -\frac{r - 2M}{r}\,\beta^{-2} e^{\beta(u-v)}\,dV\,dU . $$
The factor $e^{\beta(u-v)}$ on the right-hand side of the above equation can be expressed using
(9.4.20) as $e^{\beta(u-v)} = e^{-2\beta r_*}$. Using (9.4.19) to express $-2\beta r_*$, we find
$$ e^{\beta(u-v)} = e^{-2\beta r}\left(\frac{2M}{r - 2M}\right)^{4\beta M} . $$
Hence,
$$ d\hat{s}^2 = -\beta^{-2}\,\frac{r - 2M}{r}\,e^{-2\beta r}\left(\frac{2M}{r - 2M}\right)^{4\beta M} dV\,dU . $$
The cases where the above equation may be singular are r = 0 and r − 2M = 0, of
which the latter can be eliminated by choosing
$$ \beta = \frac{1}{4M} , \tag{9.4.25} $$
which gives
$$ d\hat{s}^2 = -\beta^{-2}\,\frac{2M}{r}\,e^{-2\beta r}\,dV\,dU = -\frac{32M^3}{r}\,e^{-r/2M}\,dV\,dU . \tag{9.4.26} $$
This equation indicates that the metric components are no longer singular at r = 2M,
and hence the range of V, U can be extended to the regions where V ≤ 0 and U ≥ 0.
Unlike the Rindler case, (9.4.26) indicates that r = 0 is still a singularity, and thus
the range of r is constrained to r > 0. Thus, the values of V and U are not
arbitrary; together they must satisfy the condition r > 0. Also let
$$ T := \frac{1}{2}(V + U) , \qquad X := \frac{1}{2}(V - U) , \tag{9.4.27} $$
and complete it with the other two dimensions; then we obtain the expression for
the line element of the Schwarzschild metric in the Kruskal coordinate system
{T, X, θ, ϕ}:
$$ ds^2 = \frac{32M^3}{r}\,e^{-r/2M}\left(-dT^2 + dX^2\right) + r^2\left(d\theta^2 + \sin^2\theta\,d\varphi^2\right) . \tag{9.4.28} $$
The above equation indicates that the Schwarzschild metric can be defined on a
manifold much larger than the original domain (r > 2M). Generally speaking, a
spacetime ( M̃, g̃ab ) is called an extension of a spacetime (M, gab ) if M ⊂ M̃ and
g̃ab | p = gab | p , ∀ p ∈ M. The extension of the original Schwarzschild spacetime we
obtained just now is called the Kruskal extension [Kruskal (1960)]. In this extension,
the coordinates T, X can take all the values allowed by r > 0. The r in the line
element (9.4.28) should be regarded as a function of the coordinates T and X , which
is defined as follows (it is not difficult to prove this from the relation of the old and
new coordinates):
$$ \left(\frac{r}{2M} - 1\right) e^{r/2M} = X^2 - T^2 . \tag{9.4.29} $$
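Relation (9.4.29) defines r only implicitly, but since its left-hand side increases monotonically from −1 at r = 0, it is easy to invert numerically. The following Python sketch (the function name `kruskal_r` and the bisection tolerance are our own choices, not from the text) solves (9.4.29) for r at a given point (T, X).

```python
import math

def kruskal_r(T, X, M=1.0, tol=1e-13):
    """Solve (r/2M - 1) e^{r/2M} = X^2 - T^2 for r > 0 by bisection."""
    s = X * X - T * T
    if s <= -1.0:
        raise ValueError("need X^2 - T^2 > -1 (i.e. r > 0)")

    def f(r):
        return (r / (2 * M) - 1.0) * math.exp(r / (2 * M)) - s

    lo, hi = 0.0, 2.0 * M        # f(lo) = -1 - s < 0
    while f(hi) < 0.0:           # enlarge the bracket when r > 2M
        hi *= 2.0
    while hi - lo > tol * hi:    # f is increasing, so bisection is safe
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(kruskal_r(0.0, 0.0))   # the point T = X = 0 has r = 2M
```

With M = 1, a point with X² − T² = 0 returns r = 2M, positive values of X² − T² give r > 2M, and values in (−1, 0) give 0 < r < 2M.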
Due to the spherical symmetry, one can sketch the spacetime diagram with only the
first two dimensions (see Fig. 9.13); imagining each point in the diagram
as an $S^2$ (2-dimensional sphere) then yields the 4-dimensional spacetime. The factor
$-dT^2 + dX^2$ in (9.4.28) indicates that in the 2-dimensional Schwarzschild spacetime
diagram with T and X as the coordinate axes, the (radial) null curves are all lines
with slope ±1. This greatly simplifies our discussion.
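As an illustration, the map from Schwarzschild to Kruskal coordinates in the region r > 2M can be sketched in Python. We assume here the definitions V := e^{βv}, U := −e^{−βu} with β = 1/(4M), which are consistent with the relation dv du = β⁻² e^{β(u−v)} dV dU used above; the function names are ours. The sketch checks (9.4.29) and shows that an ingoing radial null ray, v = constant, becomes a straight line T + X = constant with slope −1.

```python
import math

def tortoise(r, M=1.0):
    """Tortoise coordinate (9.4.19), valid for r > 2M."""
    return r + 2 * M * math.log(r / (2 * M) - 1.0)

def schwarzschild_to_kruskal(t, r, M=1.0):
    """(t, r) with r > 2M  ->  Kruskal (T, X), assuming V = e^{beta v}, U = -e^{-beta u}."""
    beta = 1.0 / (4 * M)                 # the choice (9.4.25)
    rs = tortoise(r, M)
    V = math.exp(beta * (t + rs))        # v = t + r_*
    U = -math.exp(-beta * (t - rs))      # u = t - r_*
    return (V + U) / 2, (V - U) / 2      # (9.4.27)

# Points on one ingoing null ray v = v0 all share the same value of T + X.
v0 = 1.0
for r in (2.5, 3.0, 6.0):
    T, X = schwarzschild_to_kruskal(v0 - tortoise(r), r)
    print(f"r={r}: T={T:.4f}, X={X:.4f}, T+X={T + X:.6f}")
```

The printed values of T + X coincide, while X² − T² reproduces the left-hand side of (9.4.29) at each r.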
From (9.4.29) we can see that r = constant corresponds to $X^2 - T^2$ = constant,
i.e., a hyperbola in the TX-plane (a pair of 45° lines when r = 2M), which becomes
a circular hyperboloid when the other two dimensions are no longer suppressed (a
hypersurface in the 4-dimensional manifold). There are two important special cases:
(1) r = 0 corresponds to $X^2 - T^2 = -1$. Thus, the bound of the Kruskal extension,
r > 0, can be expressed in terms of the coordinates as
$$ X^2 - T^2 > -1 . \tag{9.4.30} $$
It is not difficult to show that any radial null or timelike geodesic with r → 0 is
incomplete. By calculation one also finds that the value of the scalar field $R_{abcd}R^{abcd}$
approaches ∞ when r → 0 along these geodesics (which is obviously distinct
from the fact that $R_{abcd}R^{abcd}$ approaches a finite value when r → 2M), and thus
there exists an s.p. curvature singularity. This implies that the spacetime cannot be
Conversely, we can define the t coordinate in the other three regions in terms of V, U
using the following relations:
where
$$ r_* \equiv r + 2M \ln\left|r/2M - 1\right| . \tag{9.4.32} $$
Applying the line element (9.4.28) to regions B, W, A′ and rewriting the line element
in terms of t, r by means of (9.4.31 ) and (9.4.32), we still get (9.4.1), where the
range of r for regions B, W is 0 < r < 2M, and for A′ is r > 2M. Thus, the metrics
in A, A′ and in B, W are, respectively, the Schwarzschild line element (9.4.1) restricted
to r > 2M and 0 < r < 2M. The relations between the coordinates T, X and t, r in
these 4 regions are as follows:
object (including a photon) in the region A can never return to A (but can only fall
into the singularity) once it enters the region B. Therefore, the region B is called
a black hole, and $N_1^+$ is called the event horizon. Since each point in
Fig. 9.13 represents a 2-dimensional sphere, the black hole is a 4-dimensional
spacetime region, while the event horizon is a (3-dimensional) null hypersurface (the
proof that the event horizon is a null hypersurface is left as Exercise 9.11, see the
hint therein). The region A′ is characterized by X < 0 and $X^2 > T^2$, and it also has
r > 2M. In fact, it has exactly the same properties as the region A, including that
its relationship with the black hole B is similar to the relationship between A and B,
and hence $N_2^+$ is the event horizon of A′. However, A and A′ do not have any causal
relation: any timelike or null curve starting from A cannot enter A′, and vice versa. In
this sense, people also often refer to A and A′ as two (independent) “universes”. The
region W is characterized by T < 0 and $X^2 < T^2$, and it also has r < 2M. W and A
(or A′) are only divided by a “membrane”, which is the null hypersurface $N_2^-$ (or $N_1^-$).
Both $N_2^-$ and $N_1^-$ are “one-way membranes” with no way out. Any future-directed
timelike or null curve in W will cross $N_2^-$ (or $N_1^-$) and enter A (or A′). Since B is
called a black hole, W is naturally called a white hole.
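The division of the Kruskal diagram into regions can be summarized in a few lines of Python. This is only an illustrative sketch: the function name is ours, the conditions for A′ and W are those stated above, and the conditions for A (X > 0, X² > T²) and B (T > 0, X² < T²) are assumed by symmetry.

```python
def kruskal_region(T, X):
    """Classify a point of the (T, X) plane of the Kruskal diagram."""
    s = X * X - T * T
    if s <= -1.0:
        return "outside (r <= 0)"      # violates the bound (9.4.30)
    if s > 0:
        return "A" if X > 0 else "A'"  # r > 2M: the two exterior "universes"
    if s < 0:
        return "B" if T > 0 else "W"   # r < 2M: black hole / white hole
    return "horizon (r = 2M)"          # the null lines X = T and X = -T

for point in [(0.0, 1.0), (0.0, -1.0), (0.5, 0.0), (-0.5, 0.0), (1.0, 1.0)]:
    print(point, "->", kruskal_region(*point))
```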
The above discussion is about the maximal extension of Schwarzschild spacetime
obtained under the premise that the entire spacetime is a vacuum. Although this
extension includes some tempting terms such as black hole, white hole, event
horizon and the two identical “universes”, its physical existence (authenticity)
deserves further discussion. From the perspective of the initial value problem,
the chance for this entire spacetime to exist is very small, while part of it (including
part of A, B and the event horizon in between) is very meaningful, see Sect. 9.4.6
for details.
At the end of this subsection, we would like to discuss the Killing vector fields of
the maximally extended Schwarzschild spacetime. Before the extension, the spacetime
has 4 independent Killing vector fields, of which 3 reflect the spherical
symmetry, see the $\xi_1^a, \xi_2^a, \xi_3^a$ in Sect. 8.2; the fourth one reflects the staticity, namely
$\xi^a = (\partial/\partial t)^a$. For the maximally extended Schwarzschild spacetime, $\xi_1^a, \xi_2^a, \xi_3^a$ still
reflect the spherical symmetry. Since the t coordinate is defined in all four regions
A, A′, B, W, and the line elements in all regions written in terms of t, r are of the
original Schwarzschild form, the $\xi^a = (\partial/\partial t)^a$ in each region is still a Killing field.
Note that in B and W $\xi^a$ is not timelike but spacelike, since it follows from the line
element (9.4.1) that r < 2M leads to $g_{ab}(\partial/\partial t)^a(\partial/\partial t)^b > 0$. There do not exist
other independent Killing vector fields besides $\xi_1^a, \xi_2^a, \xi_3^a$ and $\xi^a$, and hence B and
W are not static spacetime regions. $(\partial/\partial t)^a$ is undefined on the null hypersurfaces
$N_1$ and $N_2$, since the coordinate t is undefined on them (t = ±∞). However, one can
express $\xi^a$ in A using the coordinate basis vectors $(\partial/\partial V)^a$ and $(\partial/\partial U)^a$ as
$$ \xi^a = \left(\frac{\partial}{\partial t}\right)^a = \frac{1}{4M}\left[V\left(\frac{\partial}{\partial V}\right)^a - U\left(\frac{\partial}{\partial U}\right)^a\right] . \tag{9.4.40} $$
Since $(\partial/\partial V)^a$ and $(\partial/\partial U)^a$ are well-defined on $N_1$ and $N_2$, one can define the vector
field $\xi^a$ on $N_1$ and $N_2$ using the above equation, and verify that it is a null Killing
vector field. Hence, on the whole manifold there is a fourth $C^\infty$ Killing vector field
$\xi^a$, which is orthogonal to the other 3 independent Killing vector fields. Thus, the
symmetry of the maximally extended Schwarzschild spacetime is characterized by
4 Killing fields, of which three reflect the spherical symmetry and the fourth (i.e.,
$\xi^a$) is timelike in A and A′, spacelike in B and W, and null on $N_1$ and $N_2$. Thus,
we can see the necessity of changing “static” to “Schwarzschild” in the original
formulation of Birkhoff’s theorem “a spherically symmetric solution of the vacuum
Einstein equation must be a static metric” (see Sect. 8.3.3): the Schwarzschild metric
is not necessarily a static metric. From the geometric perspective, the essence of
Birkhoff’s theorem is: if the metric satisfies the vacuum Einstein equation and has
the three Killing vector fields reflecting the spherical symmetry, then it must have a
fourth (additional, not preassigned) Killing vector field ξ a , which can be timelike,
spacelike, or even null, depending on where the spacetime point is located.
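The changing character of ξ^a can be checked numerically from (9.4.40) and the 2-dimensional line element (9.4.26). In the Python sketch below (function names are ours; r is recovered from (9.4.29) by bisection, using X² − T² = −VU), the norm is computed as 2 g_{VU} ξ^V ξ^U with g_{VU} = −(16M³/r)e^{−r/2M}, ξ^V = V/4M and ξ^U = −U/4M, and is compared with the Schwarzschild-coordinate value −(1 − 2M/r).

```python
import math

def r_from_VU(V, U, M=1.0, tol=1e-13):
    """Solve (r/2M - 1) e^{r/2M} = -V U for r > 0 by bisection."""
    s = -V * U
    if s <= -1.0:
        raise ValueError("need -V U > -1 (i.e. r > 0)")
    f = lambda r: (r / (2 * M) - 1.0) * math.exp(r / (2 * M)) - s
    lo, hi = 0.0, 2.0 * M
    while f(hi) < 0.0:
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def killing_norm_kruskal(V, U, M=1.0):
    """Return (xi^a xi_a, r) at the point (V, U), using (9.4.26) and (9.4.40)."""
    r = r_from_VU(V, U, M)
    g_VU = -(16 * M**3 / r) * math.exp(-r / (2 * M))  # from ds^2 = 2 g_VU dV dU
    xi_V, xi_U = V / (4 * M), -U / (4 * M)            # components in (9.4.40)
    return 2 * g_VU * xi_V * xi_U, r

for V, U in [(2.0, -1.0), (0.5, 0.5), (1.0, 0.0)]:
    norm, r = killing_norm_kruskal(V, U)
    print(f"V={V}, U={U}: r={r:.4f}, norm={norm:+.4f}, -(1-2M/r)={-(1 - 2.0 / r):+.4f}")
```

The norm is negative for V > 0, U < 0 (region A), positive for V, U > 0 (region B) and zero on U = 0, in agreement with the statement that ξ^a is timelike, spacelike and null there, respectively.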
Suppose the radial coordinates of static observers G and G′ outside the event horizon
are r and r′ (> r), respectively. G emits light toward G′. One can derive from (9.2.3)
the redshift z ≡ (λ′ − λ)/λ. If r′ is fixed, then z is a function of r satisfying
dz(r)/dr < 0 and $\lim_{r \to 2M} z(r) = \infty$. Therefore, the hypersurface r = 2M is also
called the surface of infinite redshift. However, one should not say “the light emitted
from the surface of infinite redshift will have an infinite redshift when it reaches
G′”, since any outgoing null geodesic emitted from the hypersurface r = 2M (event
horizon) can only lie on the horizon and can never reach G′.
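The behavior of z(r) can be illustrated with a short Python sketch. We assume the standard result for static observers in Schwarzschild spacetime, 1 + z = [(1 − 2M/r′)/(1 − 2M/r)]^{1/2}, which is what we take (9.2.3) to give in this case; the function name is ours.

```python
import math

def static_redshift(r_emit, r_recv, M=1.0):
    """Redshift z of light sent from a static observer at r_emit to one at r_recv."""
    if not (r_emit > 2 * M and r_recv > 2 * M):
        raise ValueError("static observers exist only for r > 2M")
    return math.sqrt((1 - 2 * M / r_recv) / (1 - 2 * M / r_emit)) - 1.0

# With the receiver fixed at r' = 10M, z grows without bound as r -> 2M.
for r in (5.0, 3.0, 2.1, 2.001, 2.000001):
    print(f"r = {r:<9} z = {static_redshift(r, 10.0):.4f}")
```

The output shows z increasing monotonically as the emitter approaches r = 2M, consistent with dz/dr < 0 and the divergence of z at the horizon.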
The Schwarzschild spacetime (region A) has only one static reference frame (it
has only one hypersurface orthogonal Killing vector field, namely $\xi^a$), but there are
infinitely many stationary reference frames. This is because a linear combination of
$\xi^a$ and the spatial Killing field $(\partial/\partial\varphi)^a$, $\tilde{\xi}^a \equiv \xi^a + \beta(\partial/\partial\varphi)^a$ (where β is a constant),
is also a Killing field, and so $\tilde{\xi}^a$ corresponds to a stationary reference frame in the
region where it is timelike. Equation (9.2.2) can be applied to any stationary reference
frame, where one just needs to interpret the $\xi^a$ as $\tilde{\xi}^a$. The surface of infinite redshift
corresponds to $-\tilde{\xi}^a\tilde{\xi}_a = 0$, and thus depends on the stationary reference frame. In fact, if
one wants, one can even find a stationary reference frame that has a surface of infinite
redshift in Minkowski spacetime. Since the static reference frame in Schwarzschild
spacetime is unique, unless otherwise indicated, the surface of infinite redshift will
refer to the surface $-\xi^a\xi_a = 0$ (which coincides with the event horizon), and the
“redshift factor” will mean
Much of the literature, including textbooks and popular science books, uses embedding
diagrams (see Fig. 9.14) to describe the Schwarzschild black hole intuitively. This subsection
provides an introduction to embedding diagrams. To start with, we discuss the simple
case of the embedding diagram for a static spherically symmetric star. Equation (9.3.19)
represents the metric inside a static spherically symmetric star, whose induced line element
on any constant-t surface $\Sigma_t$ reads
$$ ds^2 = \left[1 - \frac{2m(r)}{r}\right]^{-1} dr^2 + r^2\left(d\theta^2 + \sin^2\theta\,d\varphi^2\right) . \tag{9.4.41} $$
Let R be the radius of the star. If we let m(r) take a constant value M ≡ m(R) when r ≥ R,
then the above equation applies to both the inner and outer parts of the star. This is a curved
line element. Due to the spherical symmetry, we can just consider the cross section with
θ = π/2 in $\Sigma_t$ (denoted by S), whose induced line element is
$$ ds^2 = \left[1 - \frac{2m(r)}{r}\right]^{-1} dr^2 + r^2\,d\varphi^2 . \tag{9.4.42} $$
Let $g_{ab}$ represent the metric corresponding to this line element, then $(S, g_{ab})$ is a
2-dimensional Riemannian space. To manifest its intrinsic warping intuitively, one can embed
it into the one higher dimensional Euclidean space $(\mathbb{R}^3, \delta_{ab})$, i.e., consider the embedding
$\phi : S \to \mathbb{R}^3$, and use the warping of φ[S] in $\mathbb{R}^3$ to reflect the intrinsic warping of
$(S, g_{ab})$. Figure 9.15 is the embedding diagram that embeds $(S, g_{ab})$ into $(\mathbb{R}^3, \delta_{ab})$. From
this figure we can see that the further from the center of the star, the less the space is warped,
and as r → ∞ it approaches flat space. However, how is this diagram drawn? Based on what
principle can we draw a diagram with this kind of effect?
The line element of the 3-dimensional Euclidean metric $\delta_{ab}$ in a cylindrical
coordinate system {z, r, ϕ} reads
$$ ds^2 = dz^2 + dr^2 + r^2\,d\varphi^2 . \tag{9.4.43} $$
Take a radial line segment on S. The difference between the values of r at its ends p and
p′ is dr (see Fig. 9.16 left). If we parallelly transport this segment to somewhere on S with
a different value of r, then although the new segment has the same dr as the old one, the
arc length is in general different [see (9.4.42)]. This is a significant manifestation of the
intrinsic warping of $(S, g_{ab})$. Let q ≡ φ(p) and q′ ≡ φ(p′). As long as we ensure that the
line segments qq′ and pp′ have the same arc length when drawing the diagram, then the
external warping of φ[S] in $\mathbb{R}^3$ reflects the above-mentioned intrinsic warping of $(S, g_{ab})$.
This is the principle of making the embedding diagram. Based on this one can find the
equation of the surface φ[S], and then draw φ[S]. As a hypersurface in $\mathbb{R}^3$, the equation of
φ[S] can be expressed as f(z, r) = 0 (the axial symmetry makes f independent of ϕ).
This corresponds to a function of one variable, z = z(r), which represents the dependence
of the value of z on the value of r at an arbitrary point on φ[S]. Hence, the arc length of any
line segment on φ[S] is
Fig. 9.16 Embedding a surface S in the spacetime of a static spherically symmetric star into
(R3 , δab )
(Due to the asymptotic flatness, it develops from above and below into two surfaces that are
approximately planes.) Since the value of r at any point of $\Sigma_0$ is greater than or equal to
2M, there does not exist a point with r < 2M in the embedding diagram. The whole “space”
is divided by the circle formed by the points of r = 2M (it is actually a sphere, called the
throat) into upper and lower halves, which correspond respectively to the X > 0 and X < 0
parts of the X-axis in Fig. 9.13b. For instance, the points p and p′ in Fig. 9.13b correspond
respectively to the circles (spheres) φ(p) and φ(p′) in Fig. 9.14. It is necessary to reiterate
that only the circular paraboloid represents the “whole space” $\Sigma_0$ at t = 0, while the points
outside the surface do not have any physical meaning.
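The principle of matching arc lengths can be tested numerically. Equating the radial part of (9.4.43) on the surface z = z(r), namely [1 + (dz/dr)²]dr², with the radial part of (9.4.42) for m(r) = M gives (dz/dr)² = 2M/(r − 2M), whose solution with z(2M) = 0 is the paraboloid z² = 8M(r − 2M). The Python sketch below (function names are ours; the closed form is the standard result, assumed here to match the equation of the paraboloid referred to in the text) checks this by finite differences.

```python
import math

def flamm_z(r, M=1.0):
    """Upper half of the embedding surface z(r) = sqrt(8M(r - 2M)), r >= 2M."""
    if r < 2 * M:
        raise ValueError("the embedded surface has no points with r < 2M")
    return math.sqrt(8 * M * (r - 2 * M))

def induced_grr(r, M=1.0, h=1e-6):
    """1 + (dz/dr)^2, the radial metric component induced on the surface in R^3."""
    dz = (flamm_z(r + h, M) - flamm_z(r - h, M)) / (2 * h)  # central difference
    return 1.0 + dz * dz

for r in (2.5, 5.0, 50.0):
    print(f"r={r}: induced={induced_grr(r):.6f}, (1-2M/r)^-1={1 / (1 - 2.0 / r):.6f}")
```

The two columns agree, and both tend to 1 for large r, reflecting the asymptotic flatness mentioned above.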
As we have mentioned in Sect. 9.3.2, if a star in its late stage of evolution wants to
maintain hydrostatic equilibrium in its interior [satisfying (9.3.17)], its mass must be
less than the upper mass limit of a neutron star. If a star whose initial mass is greater
than this upper bound cannot eject enough mass during its evolution and become
a stable white dwarf or neutron star, then it cannot be stable at all but can only
keep contracting until it becomes a black hole. According to Birkhoff’s theorem (see
Sect. 8.3.3), the exterior of the star must have a Schwarzschild metric, and hence
can be described by the spacetime diagram shown in Fig. 9.17. The unshaded
region in the diagram is identical to the corresponding part in Fig. 9.13, while the
shaded region is described by the interior metric (a non-vacuum solution to Einstein’s
equation). Therefore, the spacetime of a collapsing star does not have the white hole
region W at all, and it does not have the region A′ either, while the black hole region
B and part of the region A in this case are of great significance. No matter how solid
the matter constructing the star is, as long as the surface of the star crosses the event
horizon, it has to keep contracting until the entire star is squashed into the singularity.
The reason is simple: the world line of any point on the surface of the star must lie
inside the light cone (must be timelike), and thus the angle between the T -axis and
the world line must be smaller than 45° (note that the lines tilted at 45° in Figs. 9.13
and 9.17 represent radial null geodesics). The Schwarzschild coordinates can only
cover the spacetime region of r > 2M (or 0 < r < 2M), and thus cannot manifest
the whole process of a late-stage star collapsing into a black hole; in particular, they
cannot manifest the most crucial step in the whole process: the surface of the star
contracting through the event horizon. If we want to describe the collapse of a star
using the Schwarzschild coordinates, then we can only draw it as Fig. 9.18. Since the
Schwarzschild coordinate t is undefined at r = 2M, this figure is actually just the
result of combining two diagrams (representing r > 2M and 0 < r < 2M) together.
The right part of the figure (r > 2M) may mislead people into thinking that the
surface of the collapsing star is always outside the event horizon r = 2M. This kind
of misunderstanding comes from confusing t = ∞ with “always” (readers who are
familiar with Zeno’s paradox may notice the similarity between the coordinate time
t and “Achilles time”). From Fig. 9.17 we can see that the intersection of the star’s
surface and r = 2M (see p in Fig. 9.17) corresponds to t = ∞. However, the proper
time τ of an observer at a point p on the star’s surface has a finite value, and they
will enter the black hole and fall into the singularity in a very short time τ. (For a
black hole with $M = M_\odot$, τ is approximately $2 \times 10^{-5}$ s.)
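The order of magnitude quoted above can be reproduced with a short Python sketch. It converts a solar mass to geometrized units of time, GM/c³, and evaluates two standard results that we use here only for orientation (the text does not state which estimate it has in mind): the proper time 4M/3 taken to fall from r = 2M to r = 0 in radial free fall from rest at infinity, and the upper bound πM on the proper time of any observer inside the horizon. The constants are standard values.

```python
import math

G_M_SUN = 1.32712440018e20   # G * M_sun in m^3 s^-2 (heliocentric gravitational constant)
C = 299792458.0              # speed of light in m/s

M_seconds = G_M_SUN / C**3   # one solar mass expressed as a time, G M / c^3

tau_free_fall = 4.0 / 3.0 * M_seconds   # r = 2M -> 0, free fall from rest at infinity
tau_max = math.pi * M_seconds           # longest possible proper time inside the horizon

print(f"GM/c^3        = {M_seconds:.3e} s")
print(f"4M/3 estimate = {tau_free_fall:.3e} s")
print(f"pi*M bound    = {tau_max:.3e} s")
```

Both estimates are of order 10⁻⁵ s, consistent with the value quoted in the text.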
The process of a star collapsing into a black hole can be represented more intu-
itively by another coordinate system—the ingoing Eddington-Finkelstein coordinate
system {v, r, θ, ϕ}. Although it cannot cover the maximally extended Schwarzschild
spacetime like the Kruskal system, it can cover the regions A and B (unlike the
Schwarzschild coordinate system which can only cover any one of the four regions).
The r, θ, ϕ in this system are the same as the corresponding coordinates in the
Schwarzschild system, and v := t + r∗ . The line element of the first two dimen-
sions in the Schwarzschild system
It follows from the equation above that the nonvanishing components $g_{vv} = -(1 - 2M/r)$,
$g_{vr} = 1$ and the determinant g = −1 are all well-behaved at r = 2M, and
hence r = 2M is no longer a singularity. Also, considering that v ∈ (−∞, ∞) corresponds
to V ∈ (0, ∞), we can see that {v, r} can cover the regions A and B in
Fig. 9.13. $g_{rr} = 0$ and $g_{vv} = -(1 - 2M/r)$ also indicate that the coordinate basis
vector $(\partial/\partial r)^a$ of the Eddington-Finkelstein system {v, r, θ, ϕ} is a null vector while
$(\partial/\partial v)^a$ is a timelike (for region A) or spacelike (for region B) vector. Suppose η(λ)
is an arbitrary radial null geodesic in the regions A and B, then it follows from
(9.4.51) that
$$ 0 = -\left(1 - \frac{2M}{r}\right)\left(\frac{dv}{d\lambda}\right)^2 + 2\,\frac{dv}{d\lambda}\frac{dr}{d\lambda} = \frac{dv}{d\lambda}\left[-\left(1 - \frac{2M}{r}\right)\frac{dv}{d\lambda} + 2\,\frac{dr}{d\lambda}\right] . $$
This indicates that the radial null geodesics can be classified into two families, which
are characterized respectively by the following conditions:
$$ \text{(1)} \quad \frac{dv}{d\lambda} = 0 , \ \text{i.e.,} \ v = \text{constant} , \tag{9.4.52} $$
$$ \text{(2)} \quad -\left(1 - \frac{2M}{r}\right)\frac{dv}{d\lambda} + 2\,\frac{dr}{d\lambda} = 0 , \ \text{and hence} \ \frac{dv}{dr} = \frac{2r}{r - 2M} . \tag{9.4.53} $$
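The first family of null geodesics are horizontal lines in the vr-diagram, which is not
intuitive enough. Define $\tilde{t} := v - r$, then it follows from (9.4.51) that the line element
of the first two dimensions becomes
$$ d\hat{s}^2 = -\left(1 - \frac{2M}{r}\right) d\tilde{t}^2 + \frac{4M}{r}\,d\tilde{t}\,dr + \left(1 + \frac{2M}{r}\right) dr^2 . \tag{9.4.54} $$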
The behaviors of the two families of null geodesics in the coordinate system $\{\tilde{t}, r\}$
are shown in Fig. 9.19: the equation for family (1) (ingoing family) is $d\tilde{t}/dr = -1$,
and thus it is a family of parallel lines with slope −1; the equation for family (2)
(outgoing family) is
$$ \frac{d\tilde{t}}{dr} = \frac{r + 2M}{r - 2M} . $$
The behavior of this family is rather special: all of the null geodesics are curves
except for a vertical line (r = 2M); for a curve on the right of the vertical line, the
value of r increases as the affine parameter λ increases (which is truly outgoing),
while for a curve on the left of the vertical line, the value of r decreases as λ increases
(which is actually ingoing, but belongs to the outgoing family). This oddity reflects
an important property of the black hole: r = 2M is the event horizon, and any photon
inside the horizon (r < 2M) cannot cross the horizon and come out of the black hole
(to r > 2M); their values of r can only keep decreasing to zero. Based on the two
families of null geodesics one can easily draw the light cone at each point, which
is helpful for analyzing the motion of a point mass, since the world line of a point
mass is a timelike curve and the tangent vector at each point on the line must lie
inside the light cone at this point. Thus, a point mass outside the event horizon can
cross the horizon and enter the black hole, while once it enters there is no way out,
and it can only fall into the singularity. Revolving Fig. 9.19 about the
$\tilde{t}$-axis we get the 3-dimensional spacetime diagram (see Fig. 9.20), and then adding
the world tube of the surface of the collapsing star (the cannon-shaped surface in
Fig. 9.20) we can intuitively represent the exterior spacetime geometry of a star
collapsing into a black hole. To facilitate understanding this, let us consider the
following thought experiment (ignore the effects of the tidal force). Suppose you are
an observer exploring a black hole on a spaceship with enough fuel. If you do not
turn on the engine, the spaceship will fall freely, and will inevitably cross the event
horizon, enter the black hole and die at the singularity. If you turn around before
reaching the horizon and go full steam ahead (namely let r increase before it reaches
2M), you will be able to return safely and write an exploration report. However, if
you go one step further and reach the horizon (note that you will not feel anything
special when your world line intersects the horizon), then this moment will become
the regret of a lifetime, because from the light cone on the horizon we can see that
you cannot return once you are at the horizon. You cannot even make a wireless phone
call to your distant friend, since the “outgoing” photon on the horizon can
only go upward vertically along the horizon (r remains equal to 2M), see Fig. 9.20.
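The tilting of the light cones described above can be made concrete with a small Python sketch based on the two null families dt̃/dr = −1 and dt̃/dr = (r + 2M)/(r − 2M). The function names are ours. To avoid the infinite slope at r = 2M we work with the inverse, dr/dt̃, for the outgoing family; we also check that t̃ = r + 4M ln|r − 2M| + constant integrates the outgoing equation.

```python
import math

def dr_dt_outgoing(r, M=1.0):
    """dr/dt~ along the "outgoing" family: positive outside, zero on, negative inside the horizon."""
    return (r - 2 * M) / (r + 2 * M)

def dr_dt_ingoing(r, M=1.0):
    """dr/dt~ along the ingoing family (dt~/dr = -1)."""
    return -1.0

def outgoing_curve(r, M=1.0, c=0.0):
    """t~(r) = r + 4M ln|r - 2M| + c, an integral curve of dt~/dr = (r + 2M)/(r - 2M)."""
    return r + 4 * M * math.log(abs(r - 2 * M)) + c

for r in (0.5, 1.5, 2.0, 2.5, 10.0):
    print(f"r={r}: ingoing dr/dt~={dr_dt_ingoing(r):+.3f}, outgoing dr/dt~={dr_dt_outgoing(r):+.3f}")
```

For r < 2M both families have dr/dt̃ < 0, so every future-directed causal curve, whose tangent lies between them, must move toward smaller r; at r = 2M the “outgoing” photon has dr/dt̃ = 0 and stays on the horizon.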
Now we will discuss the appearance of the collapsing star. Figure 9.21 shows
the situation of a photon emitted from the surface of the star reaching an exterior
static observer. Since a photon emitted from the surface of the star once the surface
is inside the horizon cannot reach the outside, it might seem that an exterior observer
will find that the star gets smaller gradually and vanishes suddenly. However, if we
look at Fig. 9.21 more carefully we will see that this is not the case. Since the world
line of an outgoing photon outside the horizon becomes steeper as it comes closer to
the horizon, and becomes completely vertical on the horizon (lies on the horizon), the
exterior observers will always (no matter how large their proper time τ is) receive the
light emitted from the surface of the star outside the horizon. They will observe that
the star contracts slower
and slower, and approaches a certain size,10 i.e., the radius of the star will approach
2M at a smaller and smaller speed and get “frozen” at this size. This phenomenon
is also called the time dilation effect of the gravitational field. We have mentioned
in Chap. 6 that when comparing the rates of clocks, one first needs to stipulate a
specific method for “clock comparison”. In the situation of Fig. 9.21, the world lines
of photons become the key to clock comparison: we take as the objects to compare
the proper time intervals Δτ and Δτ′ that the world lines of two neighboring radial
photons mark off on the world lines of, respectively, an observer on the star’s surface
and an exterior observer. Calculation shows (see Optional Reading 9.4.2) that Δτ′ > Δτ, and that if
Δτ_2 = Δτ_1 then Δτ′_2 > Δτ′_1, i.e., Δτ′/Δτ increases as τ increases. Thus, the exterior
observer finds that a standard clock on the star’s surface is not only slower than their
own clock, but also becomes slower and slower as time goes on (note that this kind of “view”
is the outcome of both the spacetime geometry and the method of clock comparison
we just stipulated). Another manifestation of this time dilation effect is the redshift.
Regard two neighboring null geodesics as the world lines of two neighboring wave
crests, then Δτ and Δτ′ are the periods of the light waves measured by the observer
10 Later we will see that the light waves received by the exterior observers will have stronger and
stronger redshift. Thus, only if we assume (theoretically) that the observer is sensitive to light of
any wavelength and intensity can they observe this phenomenon.
on the star’s surface and the exterior observer, respectively. $\Delta\tau' > \Delta\tau$ indicates that
the light wave received by the exterior observer has a longer wavelength, i.e., it has a
redshift, and $\Delta\tau'/\Delta\tau$ increasing with $\tau$ indicates that the redshift is
getting stronger. However, this kind of redshift is different from the redshift between
stationary observers which we discussed in Sect. 9.2.1, since the observer on the
star’s surface is not a stationary observer; see Optional Reading 9.4.2 for details.
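As a rough numerical illustration of the "freezing" described above, the following sketch evaluates the ratio of received to emitted period, $\chi'(E + \sqrt{E^2 - \chi^2})/\chi^2$, which is the combination quoted as (9.4.58) in Optional Reading 9.4.2, for emission radii approaching $2M$. The function names, the units $M = 1$, the choice $E = 1$ (a surface falling from rest at infinity) and the observer radius $50M$ are illustrative assumptions, not taken from the text.

```python
import math


def chi(r, M=1.0):
    """Static redshift factor (1 - 2M/r)^(1/2) at radius r > 2M."""
    return math.sqrt(1.0 - 2.0 * M / r)


def period_ratio(r_emit, r_obs, E=1.0, M=1.0):
    """Received/emitted period: chi' * (E + sqrt(E^2 - chi^2)) / chi^2.

    chi is evaluated at the emission radius (on the star's surface) and
    chi' at the radius of the exterior static observer. E is the energy
    per unit mass of the infalling surface point (assumed E >= chi).
    """
    c_emit = chi(r_emit, M)
    c_obs = chi(r_obs, M)
    return c_obs * (E + math.sqrt(E * E - c_emit * c_emit)) / (c_emit * c_emit)


if __name__ == "__main__":
    # Emission radii in units of M, moving toward the horizon r = 2M.
    for r in (10.0, 4.0, 2.5, 2.1, 2.01, 2.001):
        print(f"r_emit = {r:6.3f} M   period ratio = {period_ratio(r, 50.0):12.2f}")
```

The printed ratio grows without bound as the emission radius approaches $2M$, which is the numerical counterpart of the statement that the redshift becomes stronger and stronger.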
[Optional Reading 9.4.1]
Beginners often feel confused by the fact that the spacetime diagrams of the same physical
process in different coordinate systems can be so different. The essence of the problem is
actually very simple: a coordinate system by definition is nothing but a map from an open
set $O$ of the manifold to an open set $V$ of $\mathbb{R}^n$, and a spacetime diagram is a diagram in
$V$. The same physical process can certainly have different spacetime diagrams in different
coordinate systems. So we may say that a physical process is absolute, while a spacetime
diagram is relative (since a coordinate system is involved). We pointed this out when
we first introduced spacetime diagrams in Sect. 6.1.5.
It follows from (9.4.54) that the coordinate basis vectors $(\partial/\partial\tilde{t})^a$ and $(\partial/\partial r)^a$ are not
orthogonal. In Fig. 9.19, the $\tilde{t}$-axis and the $r$-axis are drawn to be orthogonal; this is because
the spacetime diagram is a diagram in an open set $V$ of $\mathbb{R}^2$, which does not reflect the
spacetime metric, and thus does not reflect the orthogonality of vectors. All Fig. 9.19 tells
us is: all the vertical lines have $r$ as a constant, all the horizontal lines have $\tilde{t}$ as a constant.
Only in this way can the two families of null geodesics characterized by $d\tilde{t}/dr = -1$ and
$d\tilde{t}/dr = (r + 2M)/(r - 2M)$ be represented as the two families of curves in the diagram.
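Both statements of this paragraph can be checked numerically. The sketch below assumes that the radial part of (9.4.54) has the standard ingoing Eddington form $ds^2 = -(1 - 2M/r)\,d\tilde{t}^2 + (4M/r)\,d\tilde{t}\,dr + (1 + 2M/r)\,dr^2$; this form, the function names and the units $M = 1$ are assumptions made for illustration. It verifies that the two quoted slopes give null directions and that the cross term $g_{\tilde{t}r}$ is nonzero.

```python
def metric_components(r, M=1.0):
    """Radial metric components (g_tt, g_tr, g_rr) in the assumed
    ingoing Eddington form; g_tr is half the dt~ dr coefficient."""
    g_tt = -(1.0 - 2.0 * M / r)
    g_tr = 2.0 * M / r
    g_rr = 1.0 + 2.0 * M / r
    return g_tt, g_tr, g_rr


def norm_of_direction(slope, r, M=1.0):
    """g(k, k) for k = slope * d/dt~ + d/dr, where slope = dt~/dr."""
    g_tt, g_tr, g_rr = metric_components(r, M)
    return g_tt * slope ** 2 + 2.0 * g_tr * slope + g_rr


def ingoing_slope(r, M=1.0):
    """Slope dt~/dr = -1 of the ingoing null family."""
    return -1.0


def outgoing_slope(r, M=1.0):
    """Slope dt~/dr = (r + 2M)/(r - 2M) of the other null family."""
    return (r + 2.0 * M) / (r - 2.0 * M)


if __name__ == "__main__":
    for r in (1.5, 3.0, 10.0):
        print(r,
              norm_of_direction(ingoing_slope(r), r),
              norm_of_direction(outgoing_slope(r), r),
              metric_components(r)[1])
```

Within rounding error the two norms vanish at every radius, while $g_{\tilde{t}r} = 2M/r \neq 0$ shows that the two coordinate basis vectors are not orthogonal even though the axes are drawn at right angles.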
[The End of Optional Reading 9.4.1]
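$$\frac{\lambda'}{\tilde{\lambda}} = \frac{\chi'}{\chi} , \qquad (9.4.55)$$
where
$$\chi \equiv (-\xi^a \xi_a)^{1/2}\big|_{p} = \left(1 - \frac{2M}{r(p)}\right)^{1/2} , \qquad \chi' \equiv (-\xi^a \xi_a)^{1/2}\big|_{p'} = \left(1 - \frac{2M}{r(p')}\right)^{1/2} . \qquad (9.4.56)$$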
However, the redshift corresponding to $\Delta\tau' > \Delta\tau$ we mentioned before refers to $(\lambda' - \lambda)/\lambda$.
Based on (9.4.55), to find $\lambda'/\lambda$ one only has to find $\tilde{\lambda}/\lambda$. In this case, only $p$ is involved,
and we can deal with this by using the same approach as in special relativity (see Sects. 7.2 and
7.5). This is essentially a problem of Doppler frequency shift, and so we can use (6.6.66a)
directly, in which the $\gamma$ can be derived as follows:
$$\gamma = -Z^a \tilde{Z}_a = -Z^a \xi_a/\chi = E/\chi .$$
($E$ is the energy of the timelike geodesic with $Z^a$ as the tangent vector.) Then, from $\gamma =
(1 - u^2)^{-1/2}$ we find the 3-speed of $Z^a$ relative to $\tilde{Z}^a$, $u = \sqrt{E^2 - \chi^2}/E$. Plugging this
into (6.6.66a) yields
$$\frac{\tilde{\lambda}}{\lambda} = \frac{E + \sqrt{E^2 - \chi^2}}{\chi} . \qquad (9.4.57)$$
The above equation indicates that when $p$ approaches the event horizon (so that $\chi \to 0$),
there exists an infinite (Doppler) redshift between the wavelengths measured by the observers
$Z^a$ and $\tilde{Z}^a$. Combining (9.4.55) and (9.4.57) we can find $\lambda'/\lambda$:
$$\frac{\lambda'}{\lambda} = \frac{\chi' \left(E + \sqrt{E^2 - \chi^2}\right)}{\chi^2} . \qquad (9.4.58)$$
The above equation can be viewed as a combination of the Doppler redshift and the
gravitational redshift. By means of this equation we can also give a proof of a conclusion we
claimed before: for Fig. 9.21 we have $\Delta\tau' > \Delta\tau$, and $\Delta\tau'/\Delta\tau$ increases as $\tau$ increases.
Since $\Delta\tau$ and $\Delta\tau'$ can be interpreted as the periods of the light wave when it is emitted and
received, we have
$$\frac{\Delta\tau'}{\Delta\tau} = \frac{\chi' \left(E + \sqrt{E^2 - \chi^2}\right)}{\chi^2} > \frac{\chi'}{\chi^2}\, E > E . \qquad (9.4.59)$$
The $E$ in the above equation is the energy of the world line (geodesic) of a point on the
surface of the collapsing star, i.e.,
Exercises
˜9.1. Consider Taub’s plane symmetric static spacetime, whose line element is
(8.6.1′). By means of the Killing vector fields, write down the decoupled
equations satisfied by the parametrization t (τ ), x(τ ), y(τ ) and z(τ ) of the
timelike geodesic γ (τ ) (the reader may refer to Sect. 9.1).
9.2. In Newton’s theory of gravity, derive (9.3.18) directly using Fig. 9.8.
˜9.3. Show that the OV equation of hydrostatic equilibrium (9.3.17) can be
rewritten as
$$\left[1 - \frac{2m(r)}{r}\right]^{1/2} \frac{dp}{dr} = -(\rho + p)\, g , \qquad (9.4.60)$$
9.10. Show that the maximally extended Schwarzschild spacetime has an s.p.
curvature singularity. Hint: use (8.3.21).
9.11. Show that the N1 in Fig. 9.13 is a null hypersurface. Hint: one only has
to show that its normal vector n a is null. Note that the equation of N1 is
U = 0, and hence its normal covector is n a ≡ ∇a U .
9.12. Derive (9.4.51) from (9.4.50), and then derive (9.4.54).
˜9.13. Write down the expression for the line element of the Schwarzschild metric
in the outgoing Eddington-Finkelstein coordinate system {u, r, θ, ϕ} (u ≡
t − r∗ ).
*9.14. Show that the ξ a defined in terms of (∂/∂ V )a and (∂/∂U )a [see (9.4.40)]
is a null Killing vector field on N1 and N2 .
*9.15. Transform Figs. 9.21, 9.22 and 9.23. Give another derivation for (9.4.58)
by calculating the $\Delta\tau'/\Delta\tau$ in the figure. Hints: (1) $U \equiv -e^{(r_* - t)/4M}$ is
a constant on each outgoing null geodesic. Along the world line of an
exterior static observer and the world line of a freely falling observer on
the star’s surface, derive two expressions for the same $dU$ (with $d\tau'$ and
$d\tau$ respectively). Then equating the two expressions
yields (9.4.58). (2) When writing the expression of $dU$ in terms of $d\tau$ one
needs to use the formulae of $dt/d\tau$ and $dr/d\tau$ expressed by the energy $E$,
which can be obtained using the approach in Sect. 9.1.
References
Ni, W.-T. (2016), ‘Solar-system tests of the relativistic gravity’, Int. J. Mod. Phys. D 25(14), 1630003.
arXiv:1611.06025.
Sachs, R. K. and Wu, H. (1977), General Relativity for Mathematicians, Springer-Verlag, New York.
Shapiro, S. S., Davis, J. L., Lebach, D. E. and Gregory, J. S. (2004), ‘Measurement of the solar
gravitational deflection of radio waves using geodetic very-long-baseline interferometry data,
1979-1999’, Phys. Rev. Lett. 92, 121101.
Wald, R. M. (1984), General Relativity, The University of Chicago Press, Chicago.
Weinberg, S. (1972), Gravitation and Cosmology: Principles and Applications of the General
Theory of Relativity, John Wiley and Sons, New York.
Will, C. M. (2018), Theory and Experiment in Gravitational Physics, Cambridge University Press,
Cambridge.
Chapter 10
Cosmology I
Thoughts about the universe have been with us since the dawn of mankind, full of mystery,
imagination, and wisdom. Almost every sage has thought over, talked about, and
drawn conclusions concerning the universe. However, it is only after the development
of general relativity that cosmology became a genuine science. From the point of view
of general relativity, the universe is the maximal spacetime containing everything in
Nature, with its curvature on large scales and a distribution of matter satisfying the
Einstein field equation.
Among the various branches of physics, cosmology is the most special one in the
following sense: the object it concerns is unique—our universe. There are no other
objects that could be compared with the universe. It is impossible to do experiments
again and again as is done in other branches of physics, because the evolution of the
universe cannot be replayed. The only way to study cosmology is to accumulate data
from observations, to develop cosmological models in order to interpret these data,
to speculate on the unknown past history of the universe, and to predict its future.
There are many cosmological models. In this chapter only the most widely accepted one
is introduced, known as the standard cosmological model1 for its notable success.
There are still various problems in the standard model. Hence, it has been contin-
uously amended ever since its birth. For example, an important amendment to it is
inserting an “inflation” period in the very early universe. Furthermore, observations
in 1998 showed that the universe is currently undergoing an accelerating expansion,
which consequently also requires that the standard model must be amended. A new
standard cosmological model is in development, although there are still open ques-
tions. We will introduce the inflationary model and the new standard cosmological
model in Volume II.
1 We will refer to the standard cosmological model as the “standard model” for short when there is
no confusion with the Standard Model of particle physics.
Fig. 10.1 A foliation of the spacetime, each slice represents the space at a certain time
cosmology. At that time, there was little observational data to support it. He suggested
such a postulate in order to simplify the discussion. [Mach’s principle also contributed
to the development of this postulate, see Peebles (1993) pp. 10–16.]
The cosmological principle concerns the properties of the cosmic space. In pre-
relativity physics, the word “space” is rather simple to understand. In relativity,
however, it is not that simple. The key point is that spacetime is an absolute object
in relativity, while space and time are related to a decomposition by an observer.
For the same spacetime, different space and time are the result of different 3 + 1
decompositions. To provide a precise interpretation of “spatial homogeneity”, one
must first have a clear definition of “space”.
In pre-relativity physics, each surface of absolute simultaneity is the “whole
space” at a certain moment, see Fig. 6.10. In this sense the concept of space (as an
absolute concept) is rather simple. In special relativity, given an inertial reference
frame $\{t, x, y, z\}$, a constant-$t$ surface $\Sigma_t$ is the whole space at the time $t$ relative
to this inertial reference frame. In the above cases, each spacetime is foliated by
constant-$t$ surfaces. (This means that, for each spacetime point $p$, there is exactly
one constant-$t$ surface $\Sigma_t$ such that $p \in \Sigma_t$.) In the former case, the foliation (slicing)
is absolute (i.e., the foliation is unique, as shown in Fig. 10.1a), while in the latter,
the foliation is relative (i.e., the foliations for different reference frames are different,
as shown in Fig. 10.1b).
In special relativity, the foliation relative to a given inertial reference frame has
the following properties:
① Each leaf (slice) is a connected spacelike hypersurface.
② The set $\{\Sigma_t\}$ of all leaves of the foliation is a 1-parameter family. That is, each
real number $t$ corresponds to a unique leaf $\Sigma_t$ in the set, whose $t$ is the coordinate
time of this leaf relative to the given inertial reference frame. Thus a leaf is also
called a specific “time”.
In general relativity, there are no global inertial reference frames in a curved
spacetime. Instead, any foliation similar to the above is acceptable, with each leaf of
the foliation identified with a time. To be more precise, in cosmology, a foliation (or
slicing) $\{\Sigma_t\}$ of a spacetime $(M, g_{ab})$ is characterized by a smooth function $\tau$ on $M$
satisfying the following conditions:
① $(d\tau)_a$ is a timelike 1-form with $(d\tau)_a Z^a > 0$ for any future-directed timelike
vector field $Z^a$. It follows that each constant-$\tau$ surface is a spacelike hypersurface.
② For any real number $t$, the corresponding constant-$\tau$ surface, denoted by $\Sigma_t \equiv
\{p \in M \mid \tau(p) = t\}$, is either empty or connected.
③ For each pair of real numbers $t$ and $t'$, if both $\Sigma_t$ and $\Sigma_{t'}$ are not empty, they
must be diffeomorphic to each other.
It is easy to see that, for each $p \in M$, there exists a unique $t$ such that $p \in \Sigma_t$.
In fact, $p \in \Sigma_{\tau(p)}$. Each surface $\Sigma_t$ is called a “time”, which is a leaf (or slice) of
the foliation, see Fig. 10.1c.
To be specific, the foliation $\{\Sigma_t\}$ as in the above is also called the foliation
associated with $\tau$. Let $\{\Sigma'_{t'}\}$ be a foliation of $M$ associated with a function $\tau'$. Then, since
both $\tau$ and $\tau'$ are smooth maps from the connected manifold $M$ to $\mathbb{R}$, and since both
$(d\tau)_a$ and $(d\tau')_a$ are nonvanishing on $M$, the images $I \equiv \tau[M]$ and $I' \equiv \tau'[M]$ are
both open intervals. In this chapter, we regard $\{\Sigma_t\}$ and $\{\Sigma'_{t'}\}$ as the same (or an
equivalent) foliation if there is a diffeomorphism $f : I \to I'$ such that $\tau' = f \circ \tau$. In other
words, the foliations $\{\Sigma_t\}$ and $\{\Sigma'_{t'}\}$ are equivalent if they differ from each other by
a reparametrization. Later, when we say a spacetime admits a unique homogeneous
foliation, it will be in this sense.
For a more detailed discussion on the topics of foliation as well as 3 + 1 decom-
position, the reader may refer to Sect. 14.4 in Volume II.
In relativity, acceptable foliations for a spacetime satisfying the above conditions
are not unique. This is the sense in which the concepts of space and time in relativity are
arbitrary. If the spacetime has some nontrivial symmetries, foliations adapted to
these symmetries are more convenient. In fact, as a special case with zero curvature,
Minkowski spacetime admits various foliations, among which a foliation consisting
of surfaces of simultaneity relative to an inertial reference frame is the most accepted,
because it is associated with the symmetries of Minkowski spacetime.
Spatial homogeneity and spatial isotropy of a spacetime are closely related to
some intrinsic symmetries of the spacetime. By spatial homogeneity, we refer to the
existence of a foliation of the spacetime such that, at all points in the same arbitrary
leaf of the foliation, the geometric properties and physical properties are the same.
Hence each leaf in this foliation is called a surface of homogeneity. Such a foliation
is adapted to the intrinsic symmetries. Although other foliations are acceptable, they
are not convenient for use in that some of their leaves are not surfaces of homogeneity.
Unless otherwise stated, when we talk about spaces in cosmology they are all surfaces
of homogeneity. Spatial isotropy is relative to a reference frame. If there exists a
reference frame in spacetime where no observer can find any spatial direction (a
direction orthogonal to the world line) distinct from other spatial directions in a local
experiment at a certain time, then we say that the spacetime is spatially isotropic.
In this subsection we will show that on each surface of homogeneity, there are
only three kinds of possible geometry satisfying the cosmological principle. This
conclusion will significantly simplify the subsequent discussions.
The cosmological principle assumes that the universe is spatially homogeneous
and spatially isotropic in both physics and geometry. To be precise, and for
convenience in discussing the spatial geometry of the universe, we shall first define
some geometric concepts, including spatial homogeneity and spatial isotropy, using
mathematical language.
Definition 1 A generalized Riemannian space $(M, g_{ab})$ is said to be homogeneous
if, for any $p, q \in M$, there exists an isometry $\phi : M \to M$ of $g_{ab}$ such that $\phi(p) = q$.
An embedded submanifold $i : S \to M$ is said to be homogeneous if $(S, h_{ab})$
is a homogeneous generalized Riemannian space, where $h_{ab} = i^* g_{ab}$ is the induced
metric. For convenience, we will also refer to the image $i[S]$ as a homogeneous
submanifold of $M$.
A foliation $\{\Sigma_t\}$ of a spacetime $(M, g_{ab})$ is called a homogeneous foliation if
each leaf in it is a homogeneous submanifold.
A spacetime $(M, g_{ab})$ is said to be spatially homogeneous if it admits a
homogeneous foliation $\{\Sigma_t\}$ (see Fig. 10.2). Each $\Sigma_t$ in the foliation is called a surface of
homogeneity.
Proof For clarity, we will distinguish an embedded submanifold (as a map) and its image
in this proof. Consider a generalized Riemannian space $(S, h_{ab})$ so that the map $i : S \to M$
is the embedded submanifold, whose image is $i[S] = \Sigma$. Then $h_{ab} = i^* g_{ab}$. Since $\psi$ is an
isometry, according to Theorem 4.4.5, $i' = \psi \circ i : S \to M$ is also an embedded submanifold
of $M$, and the corresponding induced metric $h'_{ab} = i'^* g_{ab}$ is equal to $h_{ab}$.
The image of $i : S \to M$ being homogeneous means that $(S, h_{ab})$ is a homogeneous
generalized Riemannian space. Now that $h'_{ab} = h_{ab}$, $i' : S \to M$ is also a homogeneous
submanifold. It is easy to see that $\psi[\Sigma]$ is the image of $i' : S \to M$. Therefore, $\psi[\Sigma]$ is also
homogeneous.
Proposition 10.1.3 Suppose $\{\Sigma_t\}$ is a homogeneous foliation of a spacetime $(M, g_{ab})$
associated with a function $\tau$. Let $\psi : M \to M$ be an isometry of $(M, g_{ab})$. If $\psi$ preserves the
future direction (time orientation), then $\{\psi[\Sigma_t]\}$ is a homogeneous foliation of $(M, g_{ab})$
associated with the function $(\psi^{-1})^* \tau$; otherwise, $\{\psi[\Sigma_t]\}$ is a homogeneous foliation of
$(M, g_{ab})$ associated with the function $-(\psi^{-1})^* \tau$.
Proof First, we show that $\{\psi[\Sigma_t]\}$ satisfies the three conditions of a foliation. Let $\tau' =
(\psi^{-1})^* \tau$, then $\tau'$ is a smooth function on $M$. Since $\psi$ is an isometry, $(d\tau')_a = (\psi^{-1})^* (d\tau)_a$
is nonvanishing on $M$. For an arbitrary future-directed timelike vector field $Z^a$ on $M$ and
any $p \in M$,
Proof of Proposition 10.1.1 Suppose $p$ is a point on the world line of an isotropic observer
$G$, and $\Sigma$ is the surface of homogeneity containing $p$ which is not orthogonal to the
4-velocity $Z^a$ of $G$ at $p$ (see Fig. 10.4). Suppose $V_p$ is the 4-dimensional tangent space at $p$,
and $W_p$ is the linear subspace of $V_p$ orthogonal to $Z^a$. Then the elements in $W_p$ are spatial
vectors (with respect to $G$) at $p$. Let $w_1{}^a \in W_p$ be a unit vector tangent to $\Sigma$. Since $\Sigma$ is not
orthogonal to $Z^a$, there also exists a unit vector $w_2{}^a \in W_p$ that is not tangent to $\Sigma$.
Let $\psi$ be an arbitrary isometry of $(M, g_{ab})$ such that $\psi(p) = p$ and $\psi_* Z^a = Z^a$. Then,
according to Proposition 10.1.3, $\{\Sigma_t\}$ being a homogeneous foliation assures that $\{\psi[\Sigma_t]\}$
is also a homogeneous foliation. Due to the uniqueness of homogeneous foliations, we have
$\{\psi[\Sigma_t]\} = \{\Sigma_t\}$. In particular, for the leaf $\Sigma$ containing $p$, $\psi[\Sigma]$ is also a leaf in the same
foliation, which contains $\psi(p) = p$. Hence, $\psi[\Sigma] = \Sigma$. Since $w_1{}^a$ is tangent to $\Sigma$, $\psi_* w_1{}^a$
is tangent to $\psi[\Sigma] = \Sigma$, and thus $\psi_* w_1{}^a \neq w_2{}^a$. Consequently, there is no isometry $\psi$ such
that $\psi(p) = p$ and $\psi_* w_1{}^a = w_2{}^a$, which contradicts the fact that $G$ is an isotropic observer.
that the identity map $I$ on $\Lambda_p(2)$ corresponds to the tensor $\delta_a{}^{[c} \delta_b{}^{d]}$. Thus, (10.1.2)
can be rewritten as an equality of tensors
2 In linear algebra, a linear operator L on a real vector space V is said to be symmetric or self-
adjoint if (u, Lv) = (Lu, v) for arbitrary u, v ∈ V . More discussion on the self-adjointness of
linear operators will be given in Appendix B in Volume II.
$$\hat{R}_{ab}{}^{cd} = 2K \delta_a{}^{[c} \delta_b{}^{d]} . \qquad (10.1.3)$$
It follows from $\hat{R}_{abcd} = \hat{R}_{cdab}$ that $(X, LY) = (LX, Y)$. This indicates that the linear
operator $L$ (i.e., $\hat{R}_{ab}{}^{cd}$) is a symmetric operator on $\Lambda_p(2)$. Therefore, there exists an orthonormal
basis of $\Lambda_p(2)$ formed by the eigenvectors of $L$. In other words, the corresponding matrix of
$L$ in this basis is diagonal, and each diagonal element is the eigenvalue of a corresponding
eigenvector.
Let $I$ be the identity map on $\Lambda_p(2)$. In order to show that $L = 2K I$ with $K \in \mathbb{R}$, we only
have to show that all the eigenvalues of $L$ are equal and then denote them by $2K$, see the
following proposition:
Proof Suppose $(\Sigma_t, h_{ab})$ is a 3-dimensional Riemannian space. Let $W_p$ be the tangent space
of $\Sigma_t$ at $p \in \Sigma_t$. Then $(W_p, h_{ab}|_p)$ is a 3-dimensional vector space together with a positive
definite metric, and $\Lambda_p(2)$ is nothing but the set of all the 2-forms on $W_p$. Denote the dual
form of $Y_{ab} \in \Lambda_p(2)$ (which is a 1-form) by $w_a$. Then, according to (5.6.1), $w_c = Y^{ab} \hat{\varepsilon}_{abc}/2$,
where $\hat{\varepsilon}_{abc}$ is the volume element associated with $h_{ab}$. Raising the index of $w_c$ by $h^{ac}$, we
obtain a spatial vector $w^a = \hat{\varepsilon}^{abc} Y_{bc}/2$. Similarly, for $Y'_{ab}$ there is also $w'^a = \hat{\varepsilon}^{abc} Y'_{bc}/2$. We assume
that $Y_{ab}$ and $Y'_{ab}$ are chosen such that $w^a$ and $w'^a$ have the same magnitude; then it follows
from the fact that the cosmic spacetime is isotropic that there exists an isometry $\psi$ of $(M, g_{ab})$
where we used $\psi^* \hat{R}_{ab}{}^{cd} = \hat{R}_{ab}{}^{cd}$ in the first equality, (10.1.5) in the second equality, and
(10.1.4) in the third equality. Also,
3 As shown in the proof of Proposition 10.1.6, the isotropy and the uniqueness of the homogeneous
foliation together form a sufficient condition for $(\Sigma_t, h_{ab})$ to be a space of constant curvature. In fact, the
uniqueness of the homogeneous foliation is not necessary, since Minkowski spacetime is obviously
a space of constant curvature whose homogeneous foliation is not unique.
if we can list out the $h_{ab}$ corresponding to each real number $K$, we can exhaust all
the possible (local) geometries of $\Sigma_t$.
Speaking of the geometries with maximal symmetry, the first thing one may have
in mind is a flat metric. For a flat metric, its curvature tensor vanishes, which trivially
satisfies (10.1.3) and (10.1.1′) with $K = 0$. Hence, on a surface $\Sigma_t$ of homogeneity,
the line element with $K = 0$ can be expressed by means of Cartesian coordinates as
$$dl^2 = dx^2 + dy^2 + dz^2 . \qquad (10.1.9)$$
$$x^2 + y^2 + z^2 + w^2 = \bar{R}^2 , \qquad (10.1.10)$$
where $\bar{R} > 0$ is the radius of the 3-sphere. Similar to the 3-dimensional Euclidean
space, the spherical coordinates in the 4-dimensional Euclidean space, $R$, $\psi$, $\theta$ and
$\varphi$, are defined by
$$\begin{aligned}
x &= R \sin\psi \sin\theta \cos\varphi , \\
y &= R \sin\psi \sin\theta \sin\varphi , \\
z &= R \sin\psi \cos\theta , \\
w &= R \cos\psi .
\end{aligned} \qquad (10.1.11)$$
Then, the line element of the 4-dimensional Euclidean space can be expressed as
$$ds^2 = dR^2 + R^2 [d\psi^2 + \sin^2\psi\, (d\theta^2 + \sin^2\theta\, d\varphi^2)] .$$
From (10.1.10) and (10.1.11) we can see that, on a 3-sphere $S_{\bar{R}}$ with radius $\bar{R}$, we
have $R = \bar{R}$ and $dR = 0$. Thus, the line element on $S_{\bar{R}}$ induced by the line element
of the 4-dimensional Euclidean space is
$$dl^2 = \bar{R}^2 [d\psi^2 + \sin^2\psi\, (d\theta^2 + \sin^2\theta\, d\varphi^2)] .$$
From the above line element, one can find that the curvature of $S_{\bar{R}}$ reads $\hat{R}_{ab}{}^{cd} =
2 \bar{R}^{-2} \delta_a{}^{[c} \delta_b{}^{d]}$ (Exercise 10.1). Hence, the curvature of the 3-sphere $S_{\bar{R}}$ satisfies
(10.1.3) with $K = \bar{R}^{-2}$. Since $\bar{R}^{-2}$ can be any positive number, the 3-spheres with
various radii exhaust the local geometries of the 3-dimensional spaces of constant
curvature with $K > 0$.
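The induced line element on the 3-sphere can be checked numerically without any symbolic algebra. The sketch below differentiates the embedding (10.1.11) at fixed $R = \bar{R}$ by central differences and compares the resulting components $h_{ij}$ with the standard result $\bar{R}^2\,\mathrm{diag}(1, \sin^2\psi, \sin^2\psi \sin^2\theta)$. The function names, the step size and the sample point are illustrative assumptions.

```python
import math


def embed(R, psi, theta, phi):
    """Embedding (10.1.11) of (psi, theta, phi) into 4-d Euclidean space."""
    return (
        R * math.sin(psi) * math.sin(theta) * math.cos(phi),
        R * math.sin(psi) * math.sin(theta) * math.sin(phi),
        R * math.sin(psi) * math.cos(theta),
        R * math.cos(psi),
    )


def induced_metric(R, q, step=1e-5):
    """Components h_ij at q = (psi, theta, phi), from central differences
    of the embedding and the Euclidean inner product in 4 dimensions."""
    def tangent(i):
        plus, minus = list(q), list(q)
        plus[i] += step
        minus[i] -= step
        xp, xm = embed(R, *plus), embed(R, *minus)
        return [(a - b) / (2.0 * step) for a, b in zip(xp, xm)]

    t = [tangent(i) for i in range(3)]
    return [[sum(a * b for a, b in zip(t[i], t[j])) for j in range(3)]
            for i in range(3)]


def expected_metric(R, q):
    """R^2 diag(1, sin^2 psi, sin^2 psi sin^2 theta)."""
    psi, theta, _ = q
    d = [1.0, math.sin(psi) ** 2, math.sin(psi) ** 2 * math.sin(theta) ** 2]
    return [[R * R * d[i] if i == j else 0.0 for j in range(3)]
            for i in range(3)]


if __name__ == "__main__":
    point = (0.7, 1.1, 2.3)
    for row_num, row_exact in zip(induced_metric(2.0, point),
                                  expected_metric(2.0, point)):
        print(row_num, row_exact)
```

Replacing the trigonometric functions of $\psi$ by hyperbolic ones and the Euclidean inner product by the Minkowski one would give the analogous check for the hyperboloid discussed next.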
For spaces of constant curvature with $K < 0$, let us consider a 3-dimensional
circular hyperboloid (denoted by $S_{\bar{\xi}}$, shown in Fig. 10.5 with two dimensions
suppressed) in 4-dimensional Minkowski spacetime, determined by the following
equation:
$$t^2 - x^2 - y^2 - z^2 = \bar{\xi}^2 , \qquad (10.1.13)$$
Then, the 4-dimensional Minkowski line element can be expressed in the above
region as
$$\begin{aligned}
ds^2 &= -dt^2 + dx^2 + dy^2 + dz^2 \\
&= -d\xi^2 + \xi^2 [d\psi^2 + \sinh^2\psi\, (d\theta^2 + \sin^2\theta\, d\varphi^2)] .
\end{aligned} \qquad (10.1.15)$$
From (10.1.14) we can see that on the 3-dimensional hyperboloid $S_{\bar{\xi}}$ defined by
(10.1.13), we have $\xi = \bar{\xi}$ and $d\xi = 0$, and hence the line element on $S_{\bar{\xi}}$ induced by
the 4-dimensional Minkowski line element reads
$$dl^2 = \bar{\xi}^2 [d\psi^2 + \sinh^2\psi\, (d\theta^2 + \sin^2\theta\, d\varphi^2)] . \qquad (10.1.16)$$
From the above line element, one can find that the curvature of $S_{\bar{\xi}}$ is (left as an
exercise)
$$\hat{R}_{ab}{}^{cd} = -2\bar{\xi}^{-2} \delta_a{}^{[c} \delta_b{}^{d]} . \qquad (10.1.17)$$
Hence, the curvature on the 3-dimensional hyperboloid $S_{\bar{\xi}}$ satisfies (10.1.3) with
$K = -\bar{\xi}^{-2}$. Since $\bar{\xi}^{-2}$ could be any positive number, the 3-dimensional hyperboloid
$S_{\bar{\xi}}$ with various positive $\bar{\xi}$ exhausts the local geometries of the 3-dimensional spaces
of constant curvature with $K < 0$.
Summary. As a consequence of the cosmological principle, at any time of the
universe (i.e., for any surface of homogeneity), there are only three kinds of possible
local spatial geometries, described by the following three kinds of metrics:
(a) 3-dimensional spherical metric, whose line element can be expressed in terms
of the spherical coordinates $\psi$, $\theta$ and $\varphi$ as
$$dl^2 = K^{-1} [d\psi^2 + \sin^2\psi\, (d\theta^2 + \sin^2\theta\, d\varphi^2)] , \quad K > 0 . \qquad (10.1.18)$$
(b) 3-dimensional flat metric, whose line element can be expressed in terms of
the Cartesian coordinates as
$$dl^2 = dx^2 + dy^2 + dz^2 , \quad K = 0 . \qquad (10.1.19)$$
In terms of the spherical coordinates $\psi$, $\theta$ and $\varphi$, the above line element can also be
written in a form similar to (10.1.18)
$$dl^2 = d\psi^2 + \psi^2 (d\theta^2 + \sin^2\theta\, d\varphi^2) , \qquad (10.1.19′)$$
(c) 3-dimensional hyperbolic metric, whose line element can be expressed in terms
of the hyperbolic coordinates $\psi$, $\theta$ and $\varphi$ as
$$dl^2 = -K^{-1} [d\psi^2 + \sinh^2\psi\, (d\theta^2 + \sin^2\theta\, d\varphi^2)] , \quad K < 0 . \qquad (10.1.20)$$
Is the universe spatially finite? Throughout history, this question has been
answered in both the affirmative and the negative. Each of these conceptions has
dominated the mainstream in certain periods, for almost equal amounts of time.
Now, as it is certain that the spatial geometry of the universe can be classified into
the above three cases, with some further requirements on the global topology (see
the paragraph below) of the universe, this question becomes absolutely clear. In
case (a), the space of the universe is a 3-dimensional sphere, resulting in a “closed
universe” whose volume is finite. Although the universe is “finite” in this case, it
is “boundless” since a sphere has no boundaries. In cases (b) and (c), the space of
the universe is respectively a 3-dimensional Euclidean space and a 3-dimensional
hyperboloid, each resulting in an “open universe” with an infinite volume, and hence
we say that the universe is “infinite” in these two cases. However, which case does
our universe really belong to? We will discuss this problem in Sect. 10.3 and Chap.
15 in Volume II.
It is necessary to point out that a space of constant curvature only requires its
metric to satisfy condition (10.1.1′). In particular, there is no condition
on its global topological structure. Take a surface $(\Sigma_t, h_{ab})$ of homogeneity in the
universe as an example. In the case of $K > 0$, (10.1.1) only leads to the conclusion
The cosmic spacetime should be equipped with a metric $g_{ab}$ such that, on each surface
$\Sigma_t$ of homogeneity, the induced metric $h_{ab}$ is one of those we obtained in Sect. 10.1.2
(corresponding to the line element $dl^2$). One can introduce a suitable coordinate
system so that the line element of $g_{ab}$ can be expressed in a simple form. To do that,
we first point out the following conclusion: for two arbitrary isotropic observers $A$
and $B$, and for two arbitrary surfaces $\Sigma_{t_1}$, $\Sigma_{t_2}$ of homogeneity with $t_1 < t_2$, the world
line segments of $A$ and $B$ between $\Sigma_{t_1}$ and $\Sigma_{t_2}$ are of the same length. Physically, it
is easy to accept this statement, since all isotropic observers should be on an equal
footing, and the existence of an exceptional observer is hard to imagine. In Optional
Reading 10.1.4, we will give a rigorous proof for this conclusion.
Now let us introduce the coordinate system. On a surface $\Sigma_0$ of homogeneity,
set (local) coordinates $x^1 \equiv \psi$, $x^2 \equiv \theta$ and $x^3 \equiv \varphi$ (as described in Sect. 10.1.2 for
the different signs of $K$); then the world lines of isotropic observers can carry these
coordinates out of $\Sigma_0$ in the following way: along the world line $\gamma$ of an isotropic
observer, the coordinates $\psi$, $\theta$ and $\varphi$ remain constant, determined by their values
at the intersection point of $\gamma$ and $\Sigma_0$. Next, set the standard clock carried by each
isotropic observer (i.e., the proper time $\tau$) to zero on $\Sigma_0$, and define the coordinate
time $t$ at each spacetime point $p$ to be the proper time $\tau$ of the isotropic observer
passing through $p$. In this way, we have a coordinate system $\{t, x^i\}$ of the cosmic
spacetime, called the Robertson-Walker (RW) coordinate system. This system is
obviously a comoving coordinate system of an isotropic reference frame. Note that
the value of $t$ on a surface of homogeneity can be different from the parameter for
the family $\{\Sigma_t\}$ of the surfaces of homogeneity, since the parameter for the family
in principle can be assigned arbitrarily. In other words, the homogeneous foliation
$\{\Sigma_t\}$ can be associated with a function different from the coordinate $t$. However, to
avoid confusion, we will stipulate that the homogeneous foliation $\{\Sigma_t\}$ is indeed
associated with the coordinate $t$ in the RW system. That is, for an arbitrary surface
$\Sigma_t$ of homogeneity, the time coordinate at every point equals the parameter of $\Sigma_t$ in
the foliation. The RW coordinate system has two major virtues:
① Each constant-$t$ surface is a surface of homogeneity. Therefore a surface $\Sigma_t$ of
homogeneity is also a surface of simultaneity, representing the whole space of the
universe at a time $t$.
② The world line of an isotropic observer is also a $t$-coordinate line, with its
coordinate time being its proper time $\tau$, called the cosmic time. Unless otherwise
stated, the “time” in cosmology refers to the cosmic time.
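Due to the virtue ②, the coordinate basis vector $(\partial/\partial t)^a$ equals the 4-velocity $Z^a$
of isotropic observers. Hence
$$g_{tt} = g_{ab} (\partial/\partial t)^a (\partial/\partial t)^b = g_{ab} Z^a Z^b = -1 .$$
Due to the virtue ①, the three spatial coordinate basis vectors $(\partial/\partial x^i)^a$ are all tangent
to surfaces of homogeneity, and thus are orthogonal to $(\partial/\partial t)^a$. Hence,
$$g_{ti} = 0 , \qquad g_{ij} = g_{ab} (\partial/\partial x^i)^a (\partial/\partial x^j)^b = h_{ab} (\partial/\partial x^i)^a (\partial/\partial x^j)^b \equiv h_{ij} ,$$
where the definition of the induced metric (see Definition 1 in Sect. 4.4) is used in the
second step. Note that, generally speaking, $h_{ij}$ depends on $t$, $x^1$, $x^2$ and $x^3$; we may
denote it by $h_{ij}(t, x)$ (where $x$ stands for $x^1, x^2, x^3$). By means of the uniqueness
of the homogeneous foliation, it can be proved that (see Optional Reading 10.1.5)
$h_{ij}(t, x)$ has the form of “separation of variables”, i.e.,
$$h_{ij}(t, x) = a^2(t)\, \hat{h}_{ij}(x) ,$$
with $a(t)$ depending only on $t$, and $\hat{h}_{ij}(x)$ depending only on $x^i$. Consequently, the
line element of the cosmic metric $g_{ab}$ in the RW coordinate system reads
$$ds^2 = -dt^2 + a^2(t)\, \hat{h}_{ij}(x)\, dx^i dx^j . \qquad (10.1.22)$$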
$$dl^2 = a^2(0)\, \hat{h}_{ij}(x)\, dx^i dx^j ,$$
which belongs to the three cases summarized in Sect. 10.1.2. First we look at the
simplest case, i.e., case (b), where the spatial metric is flat. In terms of the Cartesian
coordinates $x^i$, the induced line element on $\Sigma_0$ is $a^2(0)\, \hat{h}_{ij}(x)\, dx^i dx^j = \delta_{ij}\, dx^i dx^j$.
We may further take $a(0) = 1$ so that $\hat{h}_{ij}(x) = \delta_{ij}$. Then, (10.1.22) in this case
becomes
$$ds^2 = -dt^2 + a^2(t)\, (dx^2 + dy^2 + dz^2) \quad [\text{case (b)}] . \qquad (10.1.23b)$$
The form of the function $a(t)$ is determined by the Einstein field equation (see
Sect. 10.2.3). Similarly, if the line element induced on $\Sigma_0$ has the form of (10.1.18)
or (10.1.20), then we have, respectively
$$dl^2 = a^2(0)\, \hat{h}_{ij}\, dx^i dx^j = \frac{1}{K} [d\psi^2 + \sin^2\psi\, (d\theta^2 + \sin^2\theta\, d\varphi^2)] ,$$
$$dl^2 = a^2(0)\, \hat{h}_{ij}\, dx^i dx^j = -\frac{1}{K} [d\psi^2 + \sinh^2\psi\, (d\theta^2 + \sin^2\theta\, d\varphi^2)] .$$
For these two cases, we may further take $a^2(0) K = 1$ and $a^2(0) K = -1$,
respectively. Then, (10.1.22) in cases (a) and (c) becomes
$$ds^2 = -dt^2 + a^2(t) [d\psi^2 + \sin^2\psi\, (d\theta^2 + \sin^2\theta\, d\varphi^2)] \quad [\text{case (a)}] , \qquad (10.1.23a)$$
$$ds^2 = -dt^2 + a^2(t) [d\psi^2 + \sinh^2\psi\, (d\theta^2 + \sin^2\theta\, d\varphi^2)] \quad [\text{case (c)}] . \qquad (10.1.23c)$$
$$ds^2 = -dt^2 + a^2(t) \left[ \frac{dr^2}{1 - k r^2} + r^2 (d\theta^2 + \sin^2\theta\, d\varphi^2) \right] , \qquad (10.1.25)$$
where
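$$D_{AB}(t) = \int_{l_1}^{l_2} \sqrt{h_{ab} \left(\frac{\partial}{\partial l}\right)^a \left(\frac{\partial}{\partial l}\right)^b}\; dl .$$
Since $h_{ab} = a^2(t)\, \hat{h}_{ab}$, this can be written as
$$D_{AB}(t) = a(t)\, \hat{D}_{AB} , \qquad (10.1.26)$$
where
$$\hat{D}_{AB} = \int_{l_1}^{l_2} \sqrt{\hat{h}_{ab} \left(\frac{\partial}{\partial l}\right)^a \left(\frac{\partial}{\partial l}\right)^b}\; dl . \qquad (10.1.27)$$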
From the above equation and (10.1.25), we can see that $\hat{D}_{AB}$ is determined only by
the galaxies $A$ and $B$, and it is independent of time. Equation (10.1.26) then indicates
that $a(t)$ is the value obtained by measuring the distance between $A$ and $B$ at $t$, with
$\hat{D}_{AB}$ as the unit. Hence, $a(t)$ reflects the time dependence of the distance between
any two galaxies, and thus is called the scale factor of the universe. If the spatial
coordinates of the galaxies $A$ and $B$ are $(r_A, \theta, \varphi)$ and $(r_B, \theta, \varphi)$, then the parameter
$l$ can be chosen to be $r$ along the geodesic. It can be verified that both $\theta$ and $\varphi$ remain
constant along the geodesic $\gamma(l)$. Then (10.1.26) can be expressed as
$$D_{AB}(t) = a(t) \int_{r_A}^{r_B} \frac{dr}{\sqrt{1 - k r^2}} . \qquad (10.1.28)$$
It is easy to perform the above integral for all cases of k = 1, 0 and −1. Note that the
−dt 2 in (10.1.25) is −c2 dt 2 in SI, which has the dimension of length. Thus, when
k = ±1, the coordinate r is dimensionless, and a(t) has the dimension of length;
when k = 0, the dimensions of a(t) and r can be arbitrary, with a(t)r having the
dimension of length.
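The integral in (10.1.28) has the elementary antiderivatives $\arcsin r$, $r$ and $\operatorname{arcsinh} r$ for $k = 1, 0, -1$ respectively. The following sketch evaluates these closed forms and compares them with a direct midpoint-rule quadrature; the function names and the sample values of $r_A$, $r_B$ and $a(t)$ are illustrative assumptions.

```python
import math


def comoving_distance(r_a, r_b, k):
    """Closed form of the integral of dr / sqrt(1 - k r^2) from r_a to r_b."""
    if k == 1:
        antiderivative = math.asin      # requires 0 <= r <= 1
    elif k == 0:
        antiderivative = lambda r: r
    elif k == -1:
        antiderivative = math.asinh
    else:
        raise ValueError("k must be 1, 0 or -1")
    return antiderivative(r_b) - antiderivative(r_a)


def comoving_distance_numeric(r_a, r_b, k, n=2000):
    """Midpoint-rule approximation of the same integral."""
    h = (r_b - r_a) / n
    return sum(h / math.sqrt(1.0 - k * (r_a + (i + 0.5) * h) ** 2)
               for i in range(n))


def proper_distance(a_t, r_a, r_b, k):
    """D_AB(t) = a(t) times the time-independent integral."""
    return a_t * comoving_distance(r_a, r_b, k)


if __name__ == "__main__":
    for k in (1, 0, -1):
        exact = comoving_distance(0.1, 0.8, k)
        approx = comoving_distance_numeric(0.1, 0.8, k)
        print(f"k = {k:2d}: closed form {exact:.8f}, quadrature {approx:.8f}")
```

For the same pair of coordinate values, the result is largest for $k = 1$ and smallest for $k = -1$, and in every case the physical distance scales with $a(t)$ alone.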
When k = 1, the universe is closed. In this case, one can also ask about the volume
of the universe at any time t (as the volume of a 3-sphere), which is obviously related
to a(t). It follows from (10.1.23a) that the volume element associated with the spatial
induced metric h ab is
$$\hat{\varepsilon} = a^3 \sin^2\psi\, \sin\theta\; d\psi \wedge d\theta \wedge d\varphi .$$
Therefore, a 3 (t) is proportional to the volume of the universe at the time t, and a(t)
is exactly the radius of the closed universe at t.
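The proportionality constant can be made explicit: integrating $a^3 \sin^2\psi \sin\theta$ over $0 \le \psi \le \pi$, $0 \le \theta \le \pi$, $0 \le \varphi < 2\pi$ gives the familiar 3-sphere volume $2\pi^2 a^3$. The sketch below confirms this with a simple midpoint rule; the helper names and the number of subintervals are illustrative assumptions.

```python
import math


def midpoint_integral(f, lower, upper, n=2000):
    """Midpoint-rule approximation of the integral of f on [lower, upper]."""
    h = (upper - lower) / n
    return h * sum(f(lower + (i + 0.5) * h) for i in range(n))


def closed_universe_volume(a):
    """Volume of the k = 1 space with scale factor a, obtained by
    integrating a^3 sin^2(psi) sin(theta) over the full coordinate range."""
    psi_part = midpoint_integral(lambda p: math.sin(p) ** 2, 0.0, math.pi)
    theta_part = midpoint_integral(math.sin, 0.0, math.pi)
    phi_part = 2.0 * math.pi
    return a ** 3 * psi_part * theta_part * phi_part


if __name__ == "__main__":
    for a in (1.0, 2.0):
        print(a, closed_universe_volume(a), 2.0 * math.pi ** 2 * a ** 3)
```

The integrand factorizes, so three one-dimensional integrals suffice; the numerical value agrees with $2\pi^2 a^3$ to within the quadrature error.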
[Optional Reading 10.1.4]
Now we will prove a statement we claimed in the beginning of Sect. 10.1.3, which is
summarized as Proposition 10.1.8. But before that, we shall introduce some facts which will
be useful in the proof of Proposition 10.1.8 and the later discussions.
First, there is a theorem which assures that each point in a Riemannian space has a convex
neighborhood, in which any pair of points can be joined by a unique geodesic segment lying
in it [for a proof of this theorem, see Hicks (1965)]. Then, we have the following proposition.
Proposition 10.1.7 Suppose (M, gab ) is a spacetime satisfying the cosmological principle,
which has a unique homogeneous foliation. Let t be a slice in the foliation, and P be
Proposition 10.1.8 Assume that the spacetime $(M, g_{ab})$ satisfies the cosmological principle
and admits a unique homogeneous foliation. Then, for any pair of isotropic observers and
any pair of surfaces $\Sigma_{t_1}$ and $\Sigma_{t_2}$ of homogeneity, the lengths of the world line segments of
these two observers between $\Sigma_{t_1}$ and $\Sigma_{t_2}$ are equal.6
Proof Suppose $A$ and $B$ are two different isotropic observers, and $\Sigma_{t_1}$ and $\Sigma_{t_2}$ are two
different surfaces of homogeneity. Let $a_1$ and $a_2$ be the intersection points of the world line
of $A$ with $\Sigma_{t_1}$ and $\Sigma_{t_2}$, i.e., $a_1 = A \cap \Sigma_{t_1}$ and $a_2 = A \cap \Sigma_{t_2}$. Similarly, let $b_1 = B \cap \Sigma_{t_1}$ and
$b_2 = B \cap \Sigma_{t_2}$ (Fig. 10.7).
Now being mindful that $(\Sigma_{t_1}, h_{ab})$ is a 3-dimensional Riemannian space, let us first
suppose that $b_1$ is in a convex neighborhood of $a_1$. Then, according to Proposition 10.1.7,
there exists an isometry $\psi$ of $(M, g_{ab})$ satisfying $\psi[\Sigma_{t_1}] = \Sigma_{t_1}$, $\psi[\Sigma_{t_2}] = \Sigma_{t_2}$ and $\psi[A] = B$.
Since $a_2 = A \cap \Sigma_{t_2}$ and $b_2 = B \cap \Sigma_{t_2}$, it follows that $\psi(a_2) = b_2$. Hence, for the line
segments $A_{a_1 a_2}$ and $B_{b_1 b_2}$, we have $\psi[A_{a_1 a_2}] = B_{b_1 b_2}$. As a consequence, the arc lengths of
$A_{a_1 a_2}$ and $B_{b_1 b_2}$ are equal.
Since $(\Sigma_{t_1}, h_{ab})$ is connected, and since it can be covered by convex neighborhoods, one
can easily show that the arc lengths of $A_{a_1 a_2}$ and $B_{b_1 b_2}$ are still equal even if $b_1$ is not in a
convex neighborhood of $a_1$.
5 Technically, $P \cap \Sigma_t = \{p\}$ is a set with $p$ as its only element. We will recognize it as a point for
convenience.
6 If the homogeneous foliation is not unique, then the lengths of the world line segments of two
isotropic observers between $\Sigma_{t_1}$ and $\Sigma_{t_2}$ can be different, even if $\Sigma_{t_1}$ and $\Sigma_{t_2}$ are leaves in the same
homogeneous foliation.
486 10 Cosmology I
and $p_2 = (t_2, x_G^i)$. Let $X^a|_{p_1}$ and $Y^a|_{p_1}$ be two spatial vectors at $p_1$ with the same magnitude,
whose coordinate components are $X^i$ and $Y^i$, respectively, i.e., $X^a|_{p_1} = X^i (\partial/\partial x^i)^a|_{p_1}$ and
$Y^a|_{p_1} = Y^i (\partial/\partial x^i)^a|_{p_1}$. Then
\[ t' = t , \quad x'^i = f^i(x) , \quad i = 1, 2, 3, \tag{10.1.31} \]
and $x_G^i = f^i(x_G)$, where the $x$ in the parentheses stands for $x^1$, $x^2$ and $x^3$, and similarly for
$x_G$. It follows from (4.1.7) that
\[ Y^i = X^j \left. \frac{\partial f^i}{\partial x^j} \right|_{x_G} . \tag{10.1.32} \]
Consider the spatial vectors $X^a|_{p_2} = X^i (\partial/\partial x^i)^a|_{p_2}$ and $Y^a|_{p_2} = Y^i (\partial/\partial x^i)^a|_{p_2}$ at $p_2$.
Since the real numbers $X^i$ and $Y^i$ satisfy (10.1.32), we have $\psi_*(X^a|_{p_2}) = Y^a|_{p_2}$. Hence,
$X^a|_{p_2}$ and $Y^a|_{p_2}$ are of the same magnitude. That is,
Next, suppose $X^a|_{p_1}$ and $Y^a|_{p_1}$ do not have the same magnitude (and $Y^a|_{p_1} \neq 0$). Since the
3 × 3 matrix constituted by $h_{ij}$ is positive definite, there exists $\lambda \in \mathbb{R}$ such that
and hence
\[ \frac{h_{ij}(t_1, x_G) X^i X^j}{h_{kl}(t_2, x_G) X^k X^l} = \frac{h_{ij}(t_1, x_G) Y^i Y^j}{h_{kl}(t_2, x_G) Y^k Y^l} . \tag{10.1.34} \]
Note that the indices i, j, k and l in the above equation are all summed over 1, 2 and 3. The
ratio in the above equation does not depend on (X 1 , X 2 , X 3 ) and (Y 1 , Y 2 , Y 3 ), but is only
determined by t1 , t2 and x G . Thus, there exists a real number ω(t1 , t2 , x G ) such that
Finally, we show that $\omega$ actually does not depend on $x$. For an arbitrary isotropic
observer $G$ with $p_1 = G \cap \Sigma_{t_1}$ and $p_2 = G \cap \Sigma_{t_2}$, there is a convex neighborhood $N$ of
$p_1$ in $\Sigma_{t_1}$. For simplicity, we may assume that the coordinate patch of $x^i$ covers $N$. Then,
according to Proposition 10.1.7, for an arbitrary point $p_1' \in N$, there is an isometry $\phi$ of
$(M, g_{ab})$ satisfying ① $\phi$ maps each $\Sigma_t$ to itself; ② $\phi$ maps each isotropic observer to an
isotropic observer, and, especially, ③ $\phi$ maps $G$ to $G'$, the isotropic observer that contains
$p_1' = \phi(p_1) \in G'$. Let $p_2' \in G' \cap \Sigma_{t_2}$, then it follows that $p_2' = \phi(p_2)$. Because of ① and ②
above, the coordinate transformation induced by $\phi$ will have the form of (10.1.31). In terms
of the new coordinates $t'$ and $x'^i$, the line element of the cosmic metric $g_{ab}$ can be expressed
as
\[ ds^2 = -dt'^2 + h_{ij}(t', x')\, dx'^i dx'^j = -dt^2 + h_{ij}(t, x') \frac{\partial f^i}{\partial x^k} \frac{\partial f^j}{\partial x^l}\, dx^k dx^l . \]
Since $\phi$ is an isometry, comparing with the line element in the old coordinate system $ds^2 =
-dt^2 + h_{kl}(t, x)\, dx^k dx^l$, we obtain
\[ h_{ij}(t, x') \frac{\partial f^i}{\partial x^k} \frac{\partial f^j}{\partial x^l} = h_{kl}(t, x) . \tag{10.1.37} \]
Setting $t$ to be $t_1$ and $t_2$ yields
\[ h_{ij}(t_1, x') \frac{\partial f^i}{\partial x^k} \frac{\partial f^j}{\partial x^l} = h_{kl}(t_1, x) , \tag{10.1.38} \]
\[ h_{ij}(t_2, x') \frac{\partial f^i}{\partial x^k} \frac{\partial f^j}{\partial x^l} = h_{kl}(t_2, x) . \tag{10.1.39} \]
Applying (10.1.36) to both sides of (10.1.38), we have
\[ \omega(t_1, t_2, x')\, h_{ij}(t_2, x') \frac{\partial f^i}{\partial x^k} \frac{\partial f^j}{\partial x^l} = \omega(t_1, t_2, x)\, h_{kl}(t_2, x) . \]
Noticing (10.1.39), we obtain $\omega(t_1, t_2, x') = \omega(t_1, t_2, x)$. To see more clearly that $\omega$ does not
depend on $x$, let us go to the active perspective, i.e., viewing the coordinate transformation
$x^i \to x'^i$ under $\phi$ as the map between two observers $G$ and $G'$ in the old coordinates $x^i$.
Then, we have
\[ \omega(t_1, t_2, x_{G'}) = \omega(t_1, t_2, x_G) . \]
Since $G'$ is arbitrary, the above equation shows that $\omega(t_1, t_2, x_G)$ is independent of $x_G$ for
$G \cap N \neq \emptyset$. As $\Sigma_{t_1}$ can be covered by convex neighborhoods, it follows that $\omega(t_1, t_2, x)$
does not depend on $x$, so it can be denoted by $\omega(t_1, t_2)$. Then, (10.1.36) turns out to be
$h_{ij}(t_1, x) = \omega(t_1, t_2)\, h_{ij}(t_2, x)$. Particularly, let $t_2$ be fixed and $t_1$ be arbitrary. Denoting $t_1$
by $t$, we have
\[ h_{ij}(t, x) = \omega(t, t_2)\, h_{ij}(t_2, x) . \tag{10.1.40} \]
Letting $\hat{h}_{ij}(x) \equiv h_{ij}(t_2, x)$ and $a^2(t) \equiv \omega(t, t_2)$, the above equation becomes $h_{ij}(t, x) =
a^2(t)\, \hat{h}_{ij}(x)$, i.e., (10.1.21).
[The End of Optional Reading 10.1.5]
In the early 20th century, the American astronomer V. S. Slipher observed the spectral
lines of 41 galaxies and discovered redshifts in 36 of them. Recall that
a redshift is defined to be z ≡ (λ′ − λ)/λ, where λ and λ′ are the wavelengths of
light when it is emitted and observed, respectively. Attributing the redshifts to the
Doppler effect, Slipher’s discovery shows that these 36 galaxies are moving away
from our galaxy, the Milky Way. In other words, this indicates that our universe is
expanding. (Since the solar system is orbiting around the Galactic Center, i.e., the
center of the Milky Way, the blueshifts of the other 5 galaxies can be interpreted as
being caused by their motion toward the Sun.) So far, spectra from tens of thousands
of galaxies have been measured, and all of them are redshifted except for a few (those
from nearby galaxies). This provides a solid observational basis for the expansion
of the universe. In 1923, the American astronomer Edwin Hubble began to
measure the distances of extragalactic galaxies from us, which is more
difficult than measuring redshifts. He found that the redshift z of a galaxy is
proportional to its distance D from us, and z is equal to the recessional speed u when
the latter is very small. [The Taylor expansion of (6.6.66a) to the first order yields
z ≅ u.] Hence, Hubble published the well-known Hubble law in 1929 stating that,
\[ H(t) := \frac{\dot{a}(t)}{a(t)} , \tag{10.2.3} \]
which is independent of both the galaxies and the distance between the galaxies, so
that
The above equation indicates that, at any time t, the recessional speed (the speed
of separation) between two galaxies is proportional to the proper distance between
them. Let t0 be the present time, and denote H (t0 ) simply by H0 (i.e., the Hubble
constant), then
which is exactly (10.2.1). The result in (10.2.4) was derived by the Belgian physicist
G. Lemaître two years before Hubble’s article, and thus, more properly, Hubble’s law
is also called the Hubble-Lemaître law. Note that the Hubble parameter is different
from the Hubble constant in that the former depends on t, while the latter is
merely the current value of the former. Because observational results indicate that
H0 > 0, it follows from (10.2.1) that u(t0) > 0 whenever D(t0) ≠ 0. That is, on an
arbitrary galaxy, the observation of another galaxy will show that the latter is moving
away. [The measurement by Hubble only reveals that other galaxies are going away
from the Milky Way, while (10.2.1) asserts that any pair of galaxies are going away
from each other.] Thus, the fact that all galaxies are measured to be moving away from us
does not mean that the Milky Way is the center of the expanding universe. As an
analogy, one can imagine a balloon with lots of ants on its surface. When such a
balloon is expanding, each of these ants finds that the others are going away from
it, with no ants being more special than the others. Just like the expanding balloon,
there is no center of expansion in the universe.
According to (10.2.1), u could be greater than the speed of light in vacuum when
D is large enough. This does not contradict relativity. To see this, recall that one
of the principles of relativity states that “the world line of a point mass must be
timelike.” This is an absolute and unambiguous statement, which, by a properly
defined concept of speed, is equivalent to the statement that “the speed of a point
mass is less than the speed of light in vacuum” (which is a relative statement). One
must keep in mind that the “speed” in the latter statement refers to the magnitude
of the 3-velocity u a defined in (6.3.28), i.e., the 3-speed of a point mass obtained
from a local measurement by an instantaneous observer. If the observer is an inertial
observer in Minkowski spacetime, then the speed is nothing but the speed of the
particle relative to the inertial frame the observer belongs to. However, there are
various definitions of speed. For a definition different from the above definition of
speed, a speed greater than the speed of light does not necessarily violate relativity.
The recessional speed of galaxies is such an example. It is defined as the derivative
of the distance between the galaxies with respect to the cosmic time, which is, of course,
physically meaningful and can reasonably be called a speed. However, this is not the
speed obtained from a local measurement by an instantaneous observer, and hence
it does not contradict the principle of relativity. In fact, when deriving the RW
metric, the world line of each galaxy has been recognized to be timelike, which
automatically obeys the principle of relativity stated above. As a consequence, for
any instantaneous observer (not necessarily an isotropic observer), the speed of a
galaxy obtained by a local measurement is certainly less than the speed of light in
vacuum.
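To get a feeling for how large D must be before the recessional speed exceeds c, one can take (10.2.1) in the form u = H0 D, as the surrounding text describes, and solve u = c for D. The sketch below does this under the assumption H0 ≈ 73 (km/s)/Mpc, the value quoted later in this section; the function names are hypothetical.

```python
C_KM_S = 299792.458   # speed of light in km/s (exact by definition of the metre)
H0 = 73.0             # assumed Hubble constant in (km/s)/Mpc


def recession_speed(distance_mpc, hubble=H0):
    """Recessional speed u = H0 * D in km/s for a proper distance D in Mpc."""
    return hubble * distance_mpc


def hubble_distance(hubble=H0):
    """Proper distance in Mpc at which u = H0 * D equals the speed of light."""
    return C_KM_S / hubble


print(hubble_distance())   # roughly 4.1e3 Mpc for the assumed H0
```

Beyond this distance the recessional speed exceeds c, which, as explained above, is compatible with relativity because this speed is not the result of a local measurement.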
Hubble interpreted cosmic redshifts as the Doppler effect in flat spacetime, from
which he obtained the recessional speed of the galaxies. According to general rela-
tivity, the existence of matter in the universe results in the curvature of spacetime,
and cosmic redshifts are actually an effect of the curved spacetime. Compared to
cosmological scales, the galaxies Hubble measured are of very small distances from
us, and the redshift is relatively small. In this case, it is acceptable to treat these
redshifts as the Doppler effect. However, for galaxies at sufficiently large distances
from us, their redshifts have to be interpreted as being due to the curved spacetime
geometry.
Under the geometric optics approximation (see Optional Reading 7.2.1), light
signals are regarded as propagating along null geodesics. Suppose a photon emitted
by a galaxy A at p1 travels along a null geodesic η(β) (where β is an affine parameter),
and this photon is received by another galaxy B at p2 (see Fig. 10.8). Let K a =
(∂/∂β)a be the wave 4-vector of the above photon, then its angular frequency at p1
relative to the observer A is ω1 = −gab Z a K b | p1 , where Z a | p1 is the 4-velocity of A
at p1 . Noticing that Z a is the same as the coordinate basis vector (∂/∂t)a in the RW
coordinate system {t, r, θ, ϕ}, and that
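\[ K^b = \frac{dt}{d\beta} \left(\frac{\partial}{\partial t}\right)^b + \frac{dx^i}{d\beta} \left(\frac{\partial}{\partial x^i}\right)^b , \]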
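[Fig. 10.8 (not reproduced); labels visible in the figure: $p_1$, $p_2$, $\Sigma_{t_1}$, $\Sigma_{t_2}$, $Z^a$, $K^a$, $D$.]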
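\[ \frac{d^2 x^\mu}{d\beta^2} + \Gamma^\mu{}_{\nu\sigma} \frac{dx^\nu}{d\beta} \frac{dx^\sigma}{d\beta} = 0 , \quad (\mu = 0, 1, 2, 3) , \tag{10.2.5} \]
\[ \Gamma^0{}_{11} = \frac{a\dot{a}}{1 - kr^2} , \quad \Gamma^0{}_{22} = a\dot{a} r^2 , \quad \Gamma^0{}_{33} = a\dot{a} r^2 \sin^2\theta , \]
\[ \Gamma^1{}_{01} = \Gamma^1{}_{10} = \frac{\dot{a}}{a} , \quad \Gamma^1{}_{11} = \frac{kr}{1 - kr^2} , \]
\[ \Gamma^1{}_{22} = -r(1 - kr^2) , \quad \Gamma^1{}_{33} = -r(1 - kr^2) \sin^2\theta , \]
\[ \Gamma^2{}_{02} = \Gamma^2{}_{20} = \Gamma^3{}_{03} = \Gamma^3{}_{30} = \frac{\dot{a}}{a} , \quad \Gamma^2{}_{12} = \Gamma^2{}_{21} = \Gamma^3{}_{13} = \Gamma^3{}_{31} = \frac{1}{r} , \]
\[ \Gamma^2{}_{33} = -\sin\theta \cos\theta , \quad \Gamma^3{}_{23} = \Gamma^3{}_{32} = \cot\theta . \]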
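The Christoffel symbols listed above can be spot-checked numerically. The following sketch differentiates the diagonal RW metric (10.1.25) by central differences and evaluates the general formula for the symbols of a diagonal metric. The scale factor a(t) = 1 + t²/2, the sample point and all function names are assumptions made purely for this check.

```python
import math

K = 1  # curvature index k in (10.1.25); may be +1, 0 or -1


def a(t):
    # Hypothetical smooth scale factor, chosen only for this check.
    return 1.0 + 0.5 * t * t


def a_dot(t):
    # Exact time derivative of the hypothetical a(t) above.
    return t


def metric(x):
    """Diagonal components (g_00, g_11, g_22, g_33) of (10.1.25) at x = (t, r, theta, phi)."""
    t, r, th, _ = x
    s = a(t) ** 2
    return [-1.0, s / (1.0 - K * r * r), s * r * r, s * (r * math.sin(th)) ** 2]


def d_metric(x, sigma, eps=1e-5):
    """Central-difference derivative of the diagonal components along coordinate sigma."""
    xp, xm = list(x), list(x)
    xp[sigma] += eps
    xm[sigma] -= eps
    return [(p - m) / (2.0 * eps) for p, m in zip(metric(xp), metric(xm))]


def christoffel(x, mu, nu, sigma):
    """Gamma^mu_{nu sigma} for a diagonal metric (off-diagonal terms vanish)."""
    g = metric(x)
    total = 0.0
    if mu == nu:
        total += d_metric(x, sigma)[mu]   # d_sigma g_{mu nu}
    if mu == sigma:
        total += d_metric(x, nu)[mu]      # d_nu g_{mu sigma}
    if nu == sigma:
        total -= d_metric(x, mu)[nu]      # d_mu g_{nu sigma}
    return total / (2.0 * g[mu])


def listed_values(x):
    """A few of the closed-form symbols listed above, keyed by (mu, nu, sigma)."""
    t, r, th, _ = x
    return {
        (0, 1, 1): a(t) * a_dot(t) / (1.0 - K * r * r),
        (1, 0, 1): a_dot(t) / a(t),
        (1, 1, 1): K * r / (1.0 - K * r * r),
        (1, 2, 2): -r * (1.0 - K * r * r),
        (2, 1, 2): 1.0 / r,
        (2, 3, 3): -math.sin(th) * math.cos(th),
        (3, 2, 3): math.cos(th) / math.sin(th),
    }


point = (0.7, 0.4, 1.1, 0.3)   # arbitrary sample point (t, r, theta, phi)
max_error = max(abs(christoffel(point, *idx) - val)
                for idx, val in listed_values(point).items())
print(max_error)
```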
It is easy to verify that the world line of an isotropic observer is a geodesic. We leave
this as Exercise 10.2. [From the above expressions for the Christoffel symbols as
well as (5.7.2), this is in fact almost obvious.] Setting μ = 2, 3 in (10.2.5), we have
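\[ \frac{d^2\theta}{d\beta^2} + \frac{2\dot{a}}{a} \frac{dt}{d\beta} \frac{d\theta}{d\beta} + \frac{2}{r} \frac{dr}{d\beta} \frac{d\theta}{d\beta} - \sin\theta \cos\theta \left(\frac{d\phi}{d\beta}\right)^2 = 0 , \]
\[ \frac{d^2\phi}{d\beta^2} + \frac{2\dot{a}}{a} \frac{dt}{d\beta} \frac{d\phi}{d\beta} + \frac{2}{r} \frac{dr}{d\beta} \frac{d\phi}{d\beta} + 2\cot\theta\, \frac{d\theta}{d\beta} \frac{d\phi}{d\beta} = 0 . \tag{10.2.6} \]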
been given, functions t (β), r (β), θ (β) and ϕ(β) in its parametric equation are all
determined in a given coordinate system. To show that θ (β) = θ0 and ϕ(β) = ϕ0 , one
only needs to notice that (10.2.6) is a system of 2nd-order equations for two unknown
functions θ (β) and ϕ(β), while θ (β) = θ0 and ϕ(β) = ϕ0 give the unique solution
satisfying the initial conditions $\theta(p_1) = \theta_0$, $\phi(p_1) = \phi_0$, $\left.\frac{d\theta}{d\beta}\right|_{p_1} = 0$ and $\left.\frac{d\phi}{d\beta}\right|_{p_1} = 0$.]
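Furthermore, setting μ = 0 in (10.2.5), we have
\[ \frac{d^2 t}{d\beta^2} + \frac{a\dot{a}}{1 - kr^2} \left(\frac{dr}{d\beta}\right)^2 = 0 . \]
Since $K^a$ is null and θ and ϕ are constant along η(β), the line element (10.1.25) gives
$(dt/d\beta)^2 = \frac{a^2}{1 - kr^2} (dr/d\beta)^2$, so the equation above reduces to
$\frac{d^2 t}{d\beta^2} + \frac{\dot{a}}{a} \left(\frac{dt}{d\beta}\right)^2 = 0$.
Define ω = dt/dβ; then, because $da/d\beta = \dot{a}\, dt/d\beta$, this equation becomes
$\frac{d\omega}{d\beta} + \frac{\omega}{a} \frac{da}{d\beta} = 0$. Its general
solution gives
\[ \omega = \frac{\omega_0}{a} , \tag{10.2.8} \]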
where ω0 is a constant of integration. In the same manner as discussed above, we
see that for any value of β, the corresponding value of ω is the angular frequency of
the photon measured by the isotropic observer that passes through the point η(β).
Thus, the above equation can be interpreted as follows: as the universe is expanding,
the wavelength of each photon in the universe (with respect to an isotropic observer) is
stretched proportionally, which leads to the redshift. Applying (10.2.8) to points p1
and p2, respectively, we obtain
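\[ \frac{\omega_2}{\omega_1} = \frac{a(t_1)}{a(t_2)} , \tag{10.2.9} \]
\[ z = \frac{\lambda_2 - \lambda_1}{\lambda_1} = \frac{\omega_1}{\omega_2} - 1 = \frac{a(t_2)}{a(t_1)} - 1 . \tag{10.2.10} \]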
\[ a(t_2) \cong a(t_1) + \dot{a}(t_1)\,(t_2 - t_1) \cong a(t_1) + \dot{a}(t_1)\, D(t_1) , \]
Hence,
\[ z = \frac{\dot{a}(t_1)}{a(t_1)}\, D(t_1) = H(t_1)\, D(t_1) , \tag{10.2.11} \]
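The relation between the exact redshift (10.2.10) and its first-order form (10.2.11) can be illustrated numerically. The sketch below assumes, purely for illustration, a scale factor a(t) = t^(2/3) and units with c = 1, so that the light travel time t2 − t1 plays the role of D(t1) for nearby sources; neither assumption is made at this point of the text.

```python
def scale_factor(t):
    # Hypothetical choice a(t) = t**(2/3), used only for illustration.
    return t ** (2.0 / 3.0)


def hubble_parameter(t):
    # H = a_dot / a = 2 / (3 t) for the hypothetical a(t) above, cf. (10.2.3).
    return 2.0 / (3.0 * t)


def redshift(t_emit, t_obs):
    """Exact redshift z = a(t_obs) / a(t_emit) - 1, as in (10.2.10)."""
    return scale_factor(t_obs) / scale_factor(t_emit) - 1.0


def redshift_linear(t_emit, t_obs):
    """First-order estimate z = H(t_emit) * (t_obs - t_emit), as in (10.2.11) with c = 1."""
    return hubble_parameter(t_emit) * (t_obs - t_emit)


print(redshift(1.0, 1.01), redshift_linear(1.0, 1.01))   # close for a nearby source
print(redshift(1.0, 8.0), redshift_linear(1.0, 8.0))     # far apart for a distant source
```

The two values agree for a nearby source and differ strongly for a distant one, in line with the remark that the Doppler-like linear law holds only at small distances.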
Remark 1 In Exercise 10.3 we will see another approach for deriving (10.2.8), which
takes advantage of the geodesic equation in the form of K a ∇a K b = 0, instead
of using the component form (10.2.5). Alternatively, (10.2.8) can also be derived
in a purely geometric fashion (which makes use of the fact that the contraction of
the tangent vector field of a geodesic and a Killing field remains constant along the
geodesic). For details, see Wald (1984), pp. 103–104.
The Einstein tensor G ab for the Robertson-Walker metric can be expressed in terms
of a(t). When Tab , the energy-momentum tensor of all the content in the universe, is
also expressed in terms of a(t), Einstein’s equation G ab = 8π Tab will give rise to a
set of differential equations for a(t), from which we can solve for the time evolution
of the universe.
The contents of the universe can be classified into two types: those consisting of
particles with nonzero rest masses are called matter; those consisting of particles
with zero rest mass are called radiation. Matter is mainly concentrated in galaxies,
while the main contribution to radiation is the cosmic microwave background
radiation (CMB, or CMBR), which consists of microwaves distributed
throughout the whole universe and was discovered in 1965 (for details, see Sect. 10.3.1). On
a cosmological scale, each galaxy can be treated as a point mass (like a drop in the
ocean), and all the galaxies are regarded as forming a perfect fluid. The pressure of
such a perfect fluid is negligible (namely the random motions of the galaxies can
be neglected), and thus such a perfect fluid can be approximated as a dust, with
each galaxy being a particle in this dust. Hence, the world line of each galaxy is a
geodesic (see Sect. 6.5). Furthermore, since a perfect fluid is isotropic, each galaxy
can be approximately regarded as an isotropic observer [as we have seen below
(10.2.5), the world lines of isotropic observers are indeed geodesics]. The energy-
momentum tensor of all the matter (i.e., the dust) in the universe can be expressed
as
Tab (matter) = ρM Ua Ub ,
where U a is the 4-velocity field of the isotropic observers, and ρM is the energy
density of matter measured by the isotropic observers. On the other hand, all the
radiation in the universe may also be treated as a special kind of perfect fluid, whose
4-velocity is the same as U a . Then, the energy-momentum tensor of all the radiation
in the universe reads
where the energy density ρR and the pressure p of the radiation are both measured by
the isotropic observers and satisfy p = ρR /3 [see (6.5.3)]. Combining these two con-
tributions, the total energy-momentum tensor of the universe can be approximately
written as
Tab = ρUa Ub + p (gab + Ua Ub ) , (10.2.12)
where ρ = ρM + ρR is the sum of the energy densities of the dust (galaxies) and the
radiation. In the actual universe, there are also other kinds of matter, in addition to
galaxies. However, according to the cosmological principle, one can expect that their
4-velocities on average are still U a , and so their energy-momentum tensors will also
have the form of (10.2.12). In other words, the Tab in (10.2.12) can be regarded as
including the contributions from all kinds of matter in the universe (ρ and p
include the contributions from all of them). In summary, in the standard model,
there are only two types of content in the universe, matter with the characteristic
p ≅ 0, and radiation with the characteristic p = ρ/3. The contributions from both
of them have been included in (10.2.12), where ρ and p are independent of the spatial
coordinates due to the spatial homogeneity of the universe.
In the RW line element (10.1.25), t, r , θ and ϕ are the comoving coordinates. The
nonvanishing components of Tab obtained from (10.2.12) in this system are
\[ g_{11} = \frac{a^2}{1 - kr^2} , \quad g_{22} = a^2 r^2 , \quad g_{33} = a^2 r^2 \sin^2\theta . \]
On the other hand, from (10.1.25), one can find the nonvanishing components of the
Einstein tensor G ab (the calculation is left as an exercise), which can be expressed
in terms of a(t) as
\[ G_{00} = \frac{3(\dot{a}^2 + k)}{a^2} , \tag{10.2.14} \]
\[ G_{ij} = -\left( \frac{2\ddot{a}}{a} + \frac{\dot{a}^2 + k}{a^2} \right) g_{ij} . \tag{10.2.15} \]
10.2 Dynamics of the Universe 495
\[ \frac{3(\dot{a}^2 + k)}{a^2} = 8\pi\rho , \tag{10.2.16} \]
\[ \frac{2\ddot{a}}{a} + \frac{\dot{a}^2 + k}{a^2} = -8\pi p . \tag{10.2.17} \]
Equations (10.2.16) and (10.2.17) are the fundamental equations that determine the
scale factor a(t). From these two equations we can easily get
\[ \frac{\ddot{a}}{a} = -\frac{4\pi}{3} (\rho + 3p) . \tag{10.2.18} \]
Equation (10.2.16) is called the Friedmann equation. Equations (10.2.16) and
(10.2.18) are also called the first Friedmann equation and the second Friedmann
equation, respectively. Differentiating (10.2.16) yields
\[ \frac{\dot{a}}{a} \left( \frac{\ddot{a}}{a} - \frac{\dot{a}^2 + k}{a^2} \right) = \frac{4\pi\dot{\rho}}{3} . \tag{10.2.19} \]
Then, using the Friedmann equations (10.2.16) and (10.2.18) to eliminate ä/a and
(ȧ 2 + k)/a 2 in the above equation, we obtain
\[ \dot{\rho} + 3(\rho + p) \frac{\dot{a}}{a} = 0 . \tag{10.2.20} \]
On the other hand, once we have (10.2.16) and (10.2.20), we can apply both of them
to (10.2.19) and get
\[ \frac{\dot{a}}{a} \left[ \frac{\ddot{a}}{a} + \frac{4\pi}{3} (\rho + 3p) \right] = 0 . \]
Therefore, the Friedmann equations are equivalent to (10.2.16) and (10.2.20) when
ȧ ≠ 0.
For ρ > 0 and p ≥ 0, (10.2.18) indicates that ä < 0. This means that the universe
is either expanding (ȧ > 0) or contracting (ȧ < 0), but cannot be static, since ȧ = 0
can at most happen at a special moment when ȧ > 0 is turning into ȧ < 0. Since
the observation results indicate that the universe is expanding in the present day,
i.e., ȧ(t0 ) > 0 (t0 is the current time coordinate), it follows from ä < 0 that ȧ(t) >
ȧ(t0 ) > 0 for arbitrary t < t0 , and, the smaller t is, the greater ȧ(t) is. Hence, when
we trace backward in time, the universe shrinks more and more rapidly, and finally,
at a certain time (set as t = 0), the value of a becomes zero. At this time, the density
becomes infinite, and so we say that the universe expanded out of a singularity, called
the big bang singularity. In fact, it is not so appropriate to refer to the origin of the
universe as a “big bang”. The word “bang” usually means a hit striking violently
with a loud noise, which occurs as an event in a regular spacetime background, with
no singularity at the spacetime point where the hit occurs. Furthermore, there exists
something (e.g., a bomb before its explosion) whose world line ends in the future
direction at the spacetime point where the bang occurs (each bomb fragment has its
own world line after the explosion). The big bang of the universe is quite different.
First, it corresponds to a spacetime singularity. All timelike geodesics are incomplete
in the past direction, and all of them approach the singularity as t tends to zero: for
any pair of such geodesics, γ1 (t) and γ2 (t), the distance between the spacetime points
γ1 (t) and γ2 (t) tends to zero as t > 0 tends to zero. On the other hand, there does
not exist a timelike geodesic that approaches the big bang singularity in the future
direction. Intuitively, one may imagine that at the beginning of time, all particles in
the universe are jammed in a spatial volume that “cannot be smaller”. During the
expansion of the universe, each particle runs away from the others.
Before solving (10.2.16) and (10.2.20) in the general cases, let us discuss two
extreme cases: ① the dust-only universe, whose contribution to Tab comes com-
pletely from matter (dust); and ② the radiation-only universe, whose contribution
to Tab comes completely from radiation. For the dust-only universe, p = 0, and
integration of (10.2.20) gives
ρM a 3 = constant . (10.2.21)
ρR a 4 = constant . (10.2.22)
This is because the number of photons within a comoving volume is a constant, while
the frequency (energy) of every photon is proportional to a −1 due to the redshift
[see (10.2.8)]. Thus, the energy density of radiation is proportional to a −4 . Our
present universe is matter-dominated, which is closer to a dust-only universe than
to a radiation-only universe. However, when t is sufficiently small, the universe is
radiation-dominated (although there were no galaxies yet in the early universe, only
particles). In the following, we will solve (10.2.16) and (10.2.20) for these two
extreme cases.
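The scalings (10.2.21) and (10.2.22) can be checked by integrating (10.2.20) numerically. When ȧ ≠ 0, (10.2.20) can be rewritten as dρ/da = −3(ρ + p)/a. The sketch below assumes an equation of state p = wρ (w = 0 for dust, w = 1/3 for radiation) and integrates this equation with a classical Runge-Kutta scheme; the function name, the parameter w and the step count are assumptions of this example.

```python
def integrate_density(w, rho_start=1.0, a_start=1.0, a_end=2.0, steps=1000):
    """RK4 integration of d(rho)/da = -3 (1 + w) rho / a, i.e. (10.2.20) with p = w rho."""
    def slope(a, rho):
        return -3.0 * (1.0 + w) * rho / a

    h = (a_end - a_start) / steps
    a, rho = a_start, rho_start
    for _ in range(steps):
        k1 = slope(a, rho)
        k2 = slope(a + h / 2.0, rho + h * k1 / 2.0)
        k3 = slope(a + h / 2.0, rho + h * k2 / 2.0)
        k4 = slope(a + h, rho + h * k3)
        rho += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        a += h
    return rho


# Doubling a should divide rho by 2**3 for dust and by 2**4 for radiation.
print(integrate_density(0.0) * 2 ** 3)        # close to 1, cf. (10.2.21)
print(integrate_density(1.0 / 3.0) * 2 ** 4)  # close to 1, cf. (10.2.22)
```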
For the radiation-only universe, we write (10.2.22) as
\[ B^2 = \frac{8\pi}{3} \rho a^4 , \tag{10.2.23} \]
where B > 0 is a constant, and ρR is denoted by ρ. Then (10.2.16) can be rewritten
as
\[ \dot{a}^2 = \frac{B^2}{a^2} - k . \tag{10.2.24} \]
[Fig. 10.9 (not reproduced): curves of a versus t for k = −1, k = 0 and k = +1.]
By setting b(t) ≡ a 2 (t), under the condition a(t) → 0 as t → 0, we can find that
a special solution of the above equation is a 2 (t) = 2Bt − kt 2 . Thus, for the three
cases of k, we have
The diagrams for a(t) in these cases are shown in Fig. 10.9. Since radiation is
dominant when t is sufficiently small, the behaviors of the three curves of a(t) are of
significance near the origin. It follows from (10.2.25) or Fig. 10.9 that a = 0 when
t = 0, which corresponds to the big bang singularity. In (10.2.24), k can be neglected
when a is sufficiently small. Therefore, the three curves are approximately the same
near the origin.
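As a quick consistency check, the special solution a²(t) = 2Bt − kt² can be substituted into (10.2.24) numerically. The sketch below estimates ȧ by a central difference and evaluates the residual ȧ² − (B²/a² − k); the function names and the sample values of B and t are assumptions of this example.

```python
import math


def a_radiation(t, B, k):
    """Scale factor of the radiation-only universe, a = sqrt(2 B t - k t^2)."""
    return math.sqrt(2.0 * B * t - k * t * t)


def friedmann_residual(t, B, k, eps=1e-6):
    """a_dot^2 - (B^2 / a^2 - k), with a_dot estimated by a central difference."""
    a = a_radiation(t, B, k)
    a_dot = (a_radiation(t + eps, B, k) - a_radiation(t - eps, B, k)) / (2.0 * eps)
    return a_dot ** 2 - (B * B / (a * a) - k)


for k in (1, 0, -1):
    print(k, friedmann_residual(0.5, 1.0, k))   # each residual should be close to zero
```

For k = +1 the formula also shows that a reaches its maximum value B at t = B and returns to zero at t = 2B, so the closed radiation-only universe recollapses.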
For the matter (dust)-only universe, we write (10.2.21) as
\[ A = \frac{8\pi}{3} \rho a^3 , \tag{10.2.26} \]
where A > 0 is a constant, and ρM in (10.2.21) is replaced by ρ. Then, (10.2.16) can
be rewritten as
\[ \dot{a}^2 = \frac{A}{a} - k . \tag{10.2.27} \]
In order to solve it, we introduce a new variable
\[ \hat{t}(t) \equiv \int_0^t \frac{dt'}{a(t')} . \tag{10.2.28} \]
\[ \left( \frac{da}{d\hat{t}} \right)^2 = Aa - ka^2 . \tag{10.2.29} \]
Note that tˆ = 0 only when t = 0. Then, the special solutions to the above equation
satisfying the initial condition a(0) = 0 can be listed case by case as follows:
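\[ \text{case (a)}\ (k = +1), \quad a = \frac{A}{2} (1 - \cos\hat{t}) , \quad t = \frac{A}{2} (\hat{t} - \sin\hat{t}) , \tag{10.2.30a} \]
\[ \text{case (b)}\ (k = 0), \quad a = \left( \frac{9A}{4} \right)^{1/3} t^{2/3} , \tag{10.2.30b} \]
\[ \text{case (c)}\ (k = -1), \quad a = \frac{A}{2} (\cosh\hat{t} - 1) , \quad t = \frac{A}{2} (\sinh\hat{t} - \hat{t}) . \tag{10.2.30c} \]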
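The three solutions (10.2.30) can be checked against (10.2.27) in a few lines. In the parametric cases, ȧ is obtained as (da/dt̂)/(dt/dt̂); note that dt/dt̂ equals a, in agreement with the definition (10.2.28). The function names and sample values in this sketch are assumptions made for illustration only.

```python
import math


def dust_parametric(eta, A, k):
    """Return (t, a, a_dot) at parameter value eta for k = +1 or k = -1, from (10.2.30a, c)."""
    if k == 1:
        a = 0.5 * A * (1.0 - math.cos(eta))
        t = 0.5 * A * (eta - math.sin(eta))
        da_deta = 0.5 * A * math.sin(eta)
    elif k == -1:
        a = 0.5 * A * (math.cosh(eta) - 1.0)
        t = 0.5 * A * (math.sinh(eta) - eta)
        da_deta = 0.5 * A * math.sinh(eta)
    else:
        raise ValueError("use dust_flat for k = 0")
    dt_deta = a   # dt / d(eta) = a, consistent with (10.2.28)
    return t, a, da_deta / dt_deta


def dust_flat(t, A):
    """Return (a, a_dot) for k = 0, from (10.2.30b)."""
    a = (9.0 * A / 4.0) ** (1.0 / 3.0) * t ** (2.0 / 3.0)
    return a, 2.0 * a / (3.0 * t)


def friedmann_residual(a, a_dot, A, k):
    """a_dot^2 - (A / a - k), which vanishes when (10.2.27) holds."""
    return a_dot ** 2 - (A / a - k)


A_SAMPLE = 2.0   # hypothetical value of the constant A
for k in (1, -1):
    _, a_val, a_dot_val = dust_parametric(1.2, A_SAMPLE, k)
    print(k, friedmann_residual(a_val, a_dot_val, A_SAMPLE, k))
a_val, a_dot_val = dust_flat(0.7, A_SAMPLE)
print(0, friedmann_residual(a_val, a_dot_val, A_SAMPLE, 0))
```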
For each of these cases, the graph of a(t) is similar to that in Fig. 10.9, so it is not
shown here separately. The solution for the dust-only universe was first obtained by
the Soviet physicist and mathematician Alexander Friedmann in 1922, and then inde-
pendently by Georges Lemaître in 1927, much earlier than the discoveries by Howard
P. Robertson and Arthur G. Walker in 1935. Therefore, the standard cosmological
model is also often referred to as the Friedmann-Lemaître-Robertson-Walker
(FLRW) model.
So far we have discussed the two extreme cases. The actual universe contains
both matter and radiation. In this case, it is very difficult to solve the Friedmann
equations quantitatively. However, a qualitative discussion is still possible. Firstly,
observations indicate that our universe is presently in expansion, i.e., ȧ(t0 ) > 0, with
t0 the present value of t. According to (10.2.18), ä is negative, and hence the smaller
t > 0 is, the greater ȧ(t) is. Thus, for 0 < t < t0 , the curve of a(t) is convex upwards.
Thus, the curve of a(t) intersects the t-axis at a certain time before t0 (which has
been stipulated to be t = 0), similar to the curves in Fig. 10.9. Secondly, for t > t0 ,
we can write (10.2.20) as
\[ \frac{d(\rho a^3)}{da} = -3pa^2 . \tag{10.2.31} \]
Then, as a increases, its right-hand side decreases not slower than a −1 . The above
equation indicates that the behavior of ȧ depends on k. For k = 0, we have $\dot{a}^2 = \frac{8}{3}\pi\rho a^2$,
which implies that ȧ 2 decreases as a increases, and that ȧ approaches zero as
a goes to infinity. Noticing that ȧ(t0 ) > 0, we see that ȧ(t) is positive for any t > t0 .
Hence, a increases as t increases, but the slope of the curve a(t) keeps decreasing.
Note that this does not ensure that a approaches infinity when t goes to infinity. In
this case, the curve of a(t) qualitatively behaves the same as the curve for k = 0 in
Fig. 10.9. For k = −1, (10.2.32) results in $\dot{a}^2 = \frac{8}{3}\pi\rho a^2 + 1$, which is similar to the
case of k = 0, only that the slope of the curve a(t) tends to 1, instead of 0, as a → ∞
(which corresponds to t → ∞). This qualitatively behaves the same as the curve
for k = −1 in Fig. 10.9. For k = +1, (10.2.32) results in $\dot{a}^2 = \frac{8}{3}\pi\rho a^2 - 1$, which
indicates that ȧ 2 decreases as a increases. When $\frac{8}{3}\pi\rho a^2$ decreases to 1, ȧ decreases
to zero. Let aC be the value of a when $\frac{8}{3}\pi\rho a^2 = 1$, then it represents the critical
state: in the process of a increasing from a(t0 ) to aC , the value of ȧ decreases from
ȧ(t0 ) > 0, and becomes zero when a increases to aC (with the corresponding value
of t denoted by tC ). Since ä is always negative, according to (10.2.18), ȧ decreases
as t increases. Consequently, ȧ becomes negative when t > tC . That is, for t > tC , a
decreases as t increases until it becomes zero, and so aC is obviously a maximum of
a, which qualitatively behaves the same as the curve for k = 1 in Fig. 10.9. Thus, as
we discussed case by case, the behavior of a for the actual universe can still roughly
be described by Fig. 10.9.
[Optional Reading 10.2.1]
Rigorously speaking, it should also be proved that both values of aC and tC are finite. Let
$f(a) \equiv a\dot{a}^2$. Then, due to $\dot{a}^2 = \frac{8}{3}\pi\rho a^2 - 1$, one has
\[ f(a) = \frac{8}{3}\pi\rho a^3 - a = f_1(a) - f_2(a) , \]
where $f_1(a) \equiv \frac{8}{3}\pi\rho a^3$ and $f_2(a) \equiv a$. As shown in (10.2.31), the curve of f1 is a curve
with negative slope and positive function value. Thus, it is located in the first quadrant, and
has an intersection with the graph of f2, a straight line with slope 1. It follows from $a\dot{a}^2 =
f_1(a) - f_2(a)$ that the value of a at the intersection is exactly aC , and thus aC is finite. Let
ρC be the value of ρ corresponding to aC , then $\dot{a}^2 = \frac{8}{3}\pi\rho a^2 - 1$ results in $\rho_C = \frac{3}{8\pi a_C^2} > 0$.
Since p ≥ 0 in (10.2.18), we have ä(tC ) < 0, or, more precisely, $\lim_{t \to t_C} \ddot{a}(t) < 0$, which
guarantees that tC is finite. In other words, it is impossible that a increases to aC only when
t → ∞.
[The End of Optional Reading 10.2.1]
Since the universe has an initial time (t = 0), we can talk about its age in the
present day, which is t0 − 0 = t0 . Suppose D(t) is the distance between two arbi-
trarily chosen galaxies, and denote D(t0 ) by D0 for short. Roughly, assume that
the universe expands at a constant rate, which is the presently observed rate u 0 .
Then $t_0 \cong D_0/u_0 = D_0/(H_0 D_0) = H_0^{-1}$. The value of $H_0$ is measured to be about
73 (km/s)/Mpc or 22.4 (km/s)/Mly.7 Hence, roughly, $t_0 \cong H_0^{-1} \cong 13.4$ Gyr. As we
can see in Fig. 10.10, however, $t_0 < H_0^{-1}$ (see Exercise 10.4 for the precise relation of
$t_0$ and $H_0^{-1}$), and thus $t_0$ is less than 13.4 billion years in the FLRW model. However,
the observation for Type Ia supernovae in 1999 shows that the present universe is
7 Note that 1 Mpc = 10^6 pc, where pc stands for parsec (an abbreviation for “parallax second”),
which is a commonly used unit of length in astronomy. Roughly speaking, 1 pc ≈ 3.26 ly (light-
years). Also, 1 Mly = 10^6 ly and 1 Gly = 10^9 ly.
[Fig. 10.10 (not reproduced): D versus t, with $D_0$, $t_0$ and $H_0^{-1}$ marked on the axes.]
with 0.5 ≤ h ≤ 1, which reflects the discrepancy between the estimated results. So
far, the methods that have been used for measuring the Hubble constant include
two major kinds. One is the “late universe” method, which measures the redshifts
using the technique of a “calibrated distance ladder”. The values obtained from
these measurements agree on a value near 73 (km/s)/Mpc [the latest data gives
73.30 ± 1.04 (km/s)/Mpc in Riess et al. (2022)]. The other is the “early universe”
method, which is based on the CMB observations. The measurements of this kind
have agreed on a value near 67.7 (km/s)/Mpc [67.66 ± 0.42 (km/s)/Mpc in Aghanim
et al. (2018)]. Although the techniques of both kinds of measurements have been
improved over the years and they both clearly converge on some certain values,
these two values do not agree with each other. This discrepancy is called the Hubble
tension [see Di Valentino et al. (2021) for a review].
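The rough age estimate t0 ≅ 1/H0 used earlier in this subsection can be evaluated for both quoted values of the Hubble constant with a short unit conversion. The sketch below assumes 1 Mpc ≈ 3.0857 × 10^19 km and a Julian year of 365.25 days; the function name is hypothetical.

```python
KM_PER_MPC = 3.0856775814913673e19   # kilometres in one megaparsec
SECONDS_PER_GYR = 3.15576e16         # 1e9 Julian years of 365.25 days each


def hubble_time_gyr(h0_km_s_mpc):
    """Return 1 / H0 in Gyr for H0 given in (km/s)/Mpc."""
    seconds = KM_PER_MPC / h0_km_s_mpc
    return seconds / SECONDS_PER_GYR


print(hubble_time_gyr(73.0))    # "late universe" value quoted above
print(hubble_time_gyr(67.66))   # "early universe" value quoted above
```

The first value reproduces the figure of about 13.4 Gyr quoted above; the second is about 1 Gyr larger, which gives a sense of how the Hubble tension propagates into such rough estimates.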
8 The estimation and measurement of the age of the universe depend on the cosmological model;
this result is based on the ΛCDM model (see Sect. 10.3.3).
As we have stated at the beginning of this chapter, the universe is the maximal
spacetime containing everything in Nature. We should supplement this statement,
noting that the universe described by the RW metric is just the universe after being
“smoothed”, which only represents the behavior of the actual universe on a cosmo-
logical scale. If the local behavior of the actual universe in a relatively small scale is
of concern, one should choose a suitable metric according to the local distribution of
matter. Even though the universe is the maximal spacetime containing everything,
the RW metric does not reflect the spacetime geometry of the local regions (i.e., those
much smaller than cosmological scales).
As early as 1917, Einstein himself studied the universe using his field equation. Due
to the widely accepted philosophical idea at that time that the universe is supposed
to be invariant, he attempted to find a spacetime metric to describe a static universe.
Unfortunately, the Einstein equation is not compatible with such a static solution.
This is because static means ȧ = 0, and then (10.2.16) and (10.2.17) become
\[ 3k = 8\pi\rho a^2 , \tag{10.2.16$'$} \]
\[ k = -8\pi p a^2 , \tag{10.2.17$'$} \]
which are obviously incompatible with the physical conditions ρ > 0 and p > 0.
Einstein realized from the beginning that there are no static solutions to his equation.
However, at that time he believed firmly that our universe is static, and so he modified
his own equation just in order to acquire a static solution for the universe. He assumed
that the modified field equation has the form G̃ ab = 8π Tab . It follows from the
properties of Tab that G̃ ab must satisfy G̃ ab = G̃ ba and ∇ a G̃ ab = 0. For such a tensor
field G̃ ab constructed out of gab and its derivatives of the first and second orders, G̃ ab
can only be a linear combination of G ab and gab (we state this without proof). Therefore, in 1917,
Einstein published the modified Einstein equation
9 Einstein assumed that Λ is very small, so that the Λ-term is negligible in every problem
other than cosmology [see Rindler (1982)].
502 10 Cosmology I
and formally treat $T_{ab} - \Lambda g_{ab}/8\pi$ as a new “energy-momentum tensor”. In this
manner, (10.2.35) is still, formally, Einstein’s equation without the cosmological
constant. For convenience, the actual energy-momentum tensor $T_{ab}$ will now be
denoted by $\bar{T}_{ab}$, and $T_{ab}$ will denote the new energy-momentum tensor, i.e.,
$\bar{T}_{ab} - \Lambda g_{ab}/8\pi$. Then, (10.2.35) can now be expressed as
In the original model, there is only matter (dust), but no radiation, i.e., $\bar{T}_{ab}$ depends
only on $\bar\rho$ but not $\bar{p}$. Then, $\bar{T}_{ab} = \bar\rho U_a U_b$, and thus $\bar{T}_{00} = \bar\rho$, $\bar{T}_{ij} = 0$. It follows from
(10.2.13) that the ρ and p in the new energy-momentum tensor $T_{ab}$ satisfy
$p = -\frac{\Lambda}{8\pi}$ . (10.2.38′)
This indicates that the introduction of the Λ-term is equivalent to adding “matter” with
a negative pressure p into the universe (as long as Λ > 0). In this case, the equation
system of (10.2.16′) and (10.2.17′) will admit a solution. Plugging (10.2.38′) into
(10.2.17) yields
$k = \Lambda a^{2}$ . (10.2.39)
$2k = 8\pi\bar\rho a^{2}$ . (10.2.41)
$k = +1$ , (10.2.42a)
$\Lambda = a^{-2}$ , (10.2.42b)
$a^{2} = \frac{1}{4\pi\bar\rho}$ . (10.2.42c)
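As a quick numerical check of the static solution, the following minimal Python sketch (not part of the original text) evaluates (10.2.42) for a given dust density in geometrized units and verifies that the two static equations, 3k = 8πρa² and k = −8πpa², are satisfied. It assumes that the effective density is ρ = ρ̄ + Λ/8π, which follows from taking the new energy-momentum tensor to be the dust tensor minus Λg_ab/8π; the function names are hypothetical.

```python
import math


def einstein_static_universe(rho_bar):
    """Return (k, Lambda, a) from (10.2.42) for dust density rho_bar (G = c = 1)."""
    if rho_bar <= 0:
        raise ValueError("the dust density must be positive")
    a_squared = 1.0 / (4.0 * math.pi * rho_bar)  # (10.2.42c)
    lam = 1.0 / a_squared                        # (10.2.42b)
    k = 1                                        # (10.2.42a): spherical space
    return k, lam, math.sqrt(a_squared)


def static_residuals(rho_bar):
    """Residuals of 3k = 8*pi*rho*a^2 and k = -8*pi*p*a^2 (both should vanish)."""
    k, lam, a = einstein_static_universe(rho_bar)
    rho = rho_bar + lam / (8.0 * math.pi)  # effective density (assumed form, see lead-in)
    p = -lam / (8.0 * math.pi)             # effective pressure of the Lambda-term
    return 3 * k - 8.0 * math.pi * rho * a**2, k + 8.0 * math.pi * p * a**2


print(static_residuals(2.5))
```

Both residuals vanish up to rounding for any positive ρ̄, which illustrates that the negative pressure of the Λ-term is exactly what allows a static solution.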
10.3 The Thermal History of Our Universe 503
Equation (10.2.42) represents the unique static solution for a dust-only universe with
the Λ-term added, where $\bar\rho$ is the density of the dust. Equation (10.2.42a) indicates
that the spatial geometry of this solution is spherical, with the corresponding
4-dimensional line element
dominated, and it is not difficult to derive the relation between the temperature T
and the scale factor a. It follows from (10.2.22) that the energy density ρ of the
radiation-dominated universe is proportional to $a^{-4}$, and from quantum statistical
mechanics we know that ρ is proportional to $T^{4}$ for radiation,¹⁰ and hence $T \propto a^{-1}$.
On the other hand, the k in (10.2.24) is negligible when a is sufficiently small, and
thus its solution is $a = (2Bt)^{1/2}$, which combined with $T \propto a^{-1}$ yields $T t^{1/2} = \text{constant}$.
The value of this constant in SI units is about $10^{10}$, and hence
$T = \frac{10^{10}}{\sqrt{t}}$ (the units for T and t are K and s, respectively) . (10.3.1)
This is an approximate relation of the temperature T and time t for the early universe
(radiation-dominated), from which we can see that T = ∞ at t = 0 (the big bang).
Therefore, starting from the big bang singularity where the temperature is infinitely
high, the evolution of our universe is a process of adiabatic expansion with the
temperature continually decreasing.
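The relation (10.3.1) is easy to tabulate. The sketch below is only an illustration of this approximate formula (it applies to the radiation-dominated era only); the function names are hypothetical.

```python
import math

# Approximate constant T * sqrt(t) in (10.3.1), in units of K * s**0.5.
T_TIMES_SQRT_T = 1.0e10


def temperature_at(t_seconds):
    """Approximate temperature (K) of the early universe at cosmic time t (s)."""
    if t_seconds <= 0:
        raise ValueError("t must be positive; T diverges as t -> 0")
    return T_TIMES_SQRT_T / math.sqrt(t_seconds)


def time_at(temperature_kelvin):
    """Inverse relation: cosmic time (s) at which the temperature is T (K)."""
    if temperature_kelvin <= 0:
        raise ValueError("T must be positive")
    return (T_TIMES_SQRT_T / temperature_kelvin) ** 2


for t in (1e-2, 1.0, 1e2):
    print(f"t = {t:g} s  ->  T ~ {temperature_at(t):.1e} K")
```

For instance, t = 0.01 s corresponds to roughly 10¹¹ K and t = 1 s to roughly 10¹⁰ K, the two epochs used repeatedly in the discussion that follows.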
1. The big bang singularity.
The expansion of the universe starts from the big bang singularity (t = 0, T = ∞).
The spacetime singularity is one of the thorniest problems. Many physical quantities
approach infinity as one approaches the singularity, where all the physical laws also
become invalid. Before 1965, most physicists did not believe in the existence
of a spacetime singularity, and tried to put forward various reasons for avoiding singularities.
Making use of global differential geometry, R. Penrose and S. W. Hawking proved,
first individually and then jointly, a series of singularity theorems, which assert that
spacetime singularities (including the collapse of a star in its late stage and the big
bang at the beginning of the universe) are inevitable as long as some reasonable
conditions are satisfied (see Appendix E in Volume II for a qualitative introduction
to singularity theorems). What is notable is that these conditions do not contain any
requirement on symmetry. Subsequently, many relativists had to admit the existence
of singularities, and so a variety of intensive studies regarding singularities sprung up.
However, since it is hard to believe that physical quantities can be infinite, one may
look at singularity theorems from another perspective: rather than proving the existence
of singularities, singularity theorems indicate that classical general relativity fails to be
applicable near a singularity (where the spacetime curvature is very large). As is
well known, two great revolutions in physics took place in the early
20th century: the creation of relativity and of quantum theory. From the perspective of
understanding the spacetime structure and the essence of gravity, general relativity is
undoubtedly a revolutionary theory, while it is “not quite revolutionary” from another
perspective, since it does not obey the fundamental principles of quantum theory.
According to quantum theory, an observable cannot have a definite value (unless
10 For electromagnetic radiation, it follows from the law of blackbody radiation that $\rho \propto T^{4}$. If one
considers the contributions from other particles, ρ and T will have the relation $\rho = (\pi^{2}/30) N_{\rm eff} T^{4}$,
where $N_{\rm eff}$ is a number determined by the number of types of the particles whose rest energy is far
less than $k_B T$ ($k_B$ is the Boltzmann constant). Thus, $\rho \propto T^{4}$ only when $N_{\rm eff}$ is a constant.
the system is in an eigenstate of this observable), and one can only make probabilistic
predictions for the results of a measurement. However, all the observables (e.g., the
metric) in general relativity have determined values (as we describe the history of
a particle using its world line, we have assumed that it has a determined position
at each moment). Nowadays, the consensus is that a theory which does not
consider quantum effects is referred to as a classical theory, and thus Einstein’s
general relativity is referred to as classical general relativity.¹¹ Since singularity
theorems indicate that classical general relativity breaks down when the spacetime
curvature is sufficiently large, there should exist a critical time tC > 0 in the very
early universe, where classical general relativity is invalid in the period [0, tC ] and
should be substituted by a brand new theory of quantum gravity. Although people
have been actively searching for this quantum theory of gravity and important progress
keeps being made, so far we have not established a complete theory. Hence, we
still cannot consider the singularity or a region very close to it (within $[0, t_C]$) and our
discussion can only start from the critical time $t_C$. How do we estimate the value of $t_C$?
Since this question involves spacetime, gravity and quantum theory, $t_C$ should only
depend on the fundamental constants c, G and ħ, and the “unique” quantity with time
dimension constructed from c, G and ħ is the Planck time $t_P \equiv (G\hbar/c^{5})^{1/2} \sim 10^{-43}$ s.
Therefore, $t_P$ is taken as the critical time $t_C$, i.e., a rough bound for the region where
classical general relativity is valid is at $t_P$ (see Optional Reading 10.3.1 for details).
We will only discuss the history of the evolution after $t_P \sim 10^{-43}$ s.
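The order of magnitude of the Planck time can be checked directly. The sketch below assumes CODATA-style SI values for G, ħ and c (these numbers are not quoted in the text) and simply evaluates the combination (Għ/c⁵)^{1/2}.

```python
import math

# Assumed SI values of the fundamental constants (not quoted in the text).
G = 6.67430e-11         # gravitational constant, m^3 kg^-1 s^-2
HBAR = 1.054571817e-34  # reduced Planck constant, J s
C = 2.99792458e8        # speed of light, m s^-1


def planck_time():
    """Planck time (G * hbar / c**5)**0.5 in seconds."""
    return math.sqrt(G * HBAR / C**5)


print(f"t_P ~ {planck_time():.2e} s")
```

The result is a few times 10⁻⁴⁴ s, consistent with the order-of-magnitude statement above.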
[Optional Reading 10.3.1]
It can be said that the spacetime curvature in the period $[0, t_C]$ is so large that classical
general relativity breaks down. However, this statement needs some explaining. First of all,
what is the magnitude of the spacetime curvature? The spacetime curvature is a tensor, whose
magnitude usually refers to a scalar constructed from the curvature tensors (and metric),
such as the scalar curvature $R \equiv g^{ab} R_{ab}$ and the scalar $\mathcal{R} \equiv R^{ab} R_{ab}$. The early universe is
radiation-dominated, and the trace of the energy-momentum tensor of the electromagnetic
radiation (a null electromagnetic field) gives T = 0, and so from Einstein’s field equation
we can see that R = 0 in this case. Therefore, we may use $\mathcal{R} \equiv R^{ab} R_{ab}$ to represent the
magnitude of the spacetime curvature of the early universe. Second, what value of $\mathcal{R}$ is large
enough so that classical general relativity is invalid? We would like to find a critical value
$\mathcal{R}_C$ such that in a very rough sense we can say that classical general relativity is valid when
$\mathcal{R} < \mathcal{R}_C$ and it is not when $\mathcal{R} > \mathcal{R}_C$. The most solid way to determine this bound is by
a theory of quantum gravity, but we do not have such a theory yet. A concession would
be to obtain some information using perturbation techniques, from which one can get an
approximate order of magnitude of $\mathcal{R}_C$. Another cursory but quite convenient method is
dimensional analysis. The dimension of $\mathcal{R}$ in SI is $L^{-4}$ [which can be derived from (A.7) in
Appendix A], while the “unique” quantity with length dimension constructed from c, G and ħ
11 Note that the criterion for “classical physics” has become different from that in the first half of the
20th century. People used to refer to (both special and general) relativity and quantum mechanics
as “modern physics” and the previous physics as “classical physics”. As time went on (especially as
people realized that general relativity has to be combined with quantum theory), the term “classical”
gradually became a synonym of “non-quantum”, and general relativity without
quantum effects is referred to as classical general relativity to distinguish it from a theory of
quantum gravity. As this criterion for “classical physics” has become a consensus among physicists
internationally, the previous interpretation of the word “classical” now seems to be too “classical”.
is the Planck length $l_P \equiv (G\hbar/c^{3})^{1/2} \sim 10^{-35}$ m. Hence, it is generally accepted that
$\mathcal{R}_C \sim l_P^{-4}$ (∼ means they are of the same order).
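To get a feeling for how large this critical curvature is, the following sketch evaluates the Planck length and its inverse fourth power in SI units. The numerical values of G, ħ and c are assumed CODATA-style inputs, and the function names are hypothetical; the result is only an order-of-magnitude estimate, in the spirit of the dimensional analysis above.

```python
import math

# Assumed SI values of the fundamental constants (not quoted in the text).
G = 6.67430e-11         # m^3 kg^-1 s^-2
HBAR = 1.054571817e-34  # J s
C = 2.99792458e8        # m s^-1


def planck_length():
    """Planck length (G * hbar / c**3)**0.5 in metres."""
    return math.sqrt(G * HBAR / C**3)


def critical_curvature():
    """Dimensional estimate l_P**-4 of the critical curvature scalar, in m^-4."""
    return planck_length() ** -4


print(f"l_P ~ {planck_length():.2e} m, critical curvature ~ {critical_curvature():.1e} m^-4")
```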
In a word, one can roughly regard $\mathcal{R}_C \sim l_P^{-4}$ by means of dimensional analysis. On the
other hand, from dimensional analysis we also have $t_C \sim t_P$. It is natural to ask: if we assume
for now that classical general relativity is applicable, would the value of $\mathcal{R}$ really have the
same magnitude as $\mathcal{R}_C \sim l_P^{-4}$ when the universe evolves to $t_P$? From the expressions of the
Christoffel symbols below (10.2.5) we can find all the nonvanishing components of the Ricci
tensor of the FLRW universe as
12 Besides, there might also be other particles beyond the Standard Model that have not been
discovered yet.
of “stirring” is way faster than the speed of the expansion of the “pot” (the whole
space). Therefore, in most parts of the early universe all kinds of particles can reach
a local thermal equilibrium.
According to quantum statistical physics, the average energy of the radiation
particles emitted in the radiation with a temperature T is roughly equal to kB T ,
where kB is the Boltzmann constant. This conclusion can also be approximately
applied to matter particles with rest energy far less than kB T , whose speed is close
to the speed of light. Together with radiation particles, they are called relativistic
particles. For example, $k_B T \cong 10$ MeV when $T = 10^{11}$ K, while the rest energy of
an electron e is about 0.5 MeV, and hence an electron is a relativistic particle when
$T = 10^{11}$ K. According to quantum field theory, two photons can be transformed
into some particle-antiparticle pair (“pair production”), and a particle-antiparticle
pair can also be transformed into two photons (“pair annihilation”). Of course, both
of these two processes satisfy the energy conservation law. The average energy $k_B T$
of a photon at room temperature is far less than the rest energy of an electron, and thus
the probability of two photons becoming an electron-positron pair ($2\gamma \to e + e^{+}$) is
almost zero. However, at a high temperature like $T = 10^{11}$ K, the rate of this kind
of “pair production” is very large (basically proportional to the density of photons).
When e and $e^{+}$ collide, they can also annihilate into two photons ($e + e^{+} \to 2\gamma$), and
the rate of the annihilation is proportional to the density of $(e, e^{+})$ pairs. Therefore,
when equilibrium is reached, the density of $(e, e^{+})$ pairs is roughly equal to the density of photon
pairs whose energy is greater than the rest energy $m_e$ of an electron. In contrast, since
the rest energy of a proton p or a neutron n is about 1840 times the rest energy $m_e$
of an electron, the densities of (p, p̄) and (n, n̄) pairs are almost zero even at this
high temperature $T = 10^{11}$ K (where p̄ and n̄ stand for antiproton and antineutron,
respectively).
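The comparison between $k_B T$ and the rest energies can be made concrete with a few lines of Python. This is a rough sketch: the Boltzmann constant in MeV/K and the rest energies are assumed standard values, and the criterion "rest energy below $k_B T$" is a crude stand-in for the text's "far less than".

```python
# Assumed value of the Boltzmann constant in MeV per kelvin.
K_B_MEV_PER_K = 8.617333262e-11

# Assumed rest energies in MeV (rounded standard values).
REST_ENERGY_MEV = {"electron": 0.511, "proton": 938.3, "neutron": 939.6}


def thermal_energy_mev(temperature_kelvin):
    """Typical particle energy k_B * T in MeV at temperature T (K)."""
    return K_B_MEV_PER_K * temperature_kelvin


def is_relativistic(particle, temperature_kelvin):
    """Crude test: the rest energy lies below k_B * T."""
    return REST_ENERGY_MEV[particle] < thermal_energy_mev(temperature_kelvin)


T = 1.0e11
print(f"k_B T ~ {thermal_energy_mev(T):.1f} MeV at T = {T:.0e} K")
for name in REST_ENERGY_MEV:
    print(name, "relativistic" if is_relativistic(name, T) else "non-relativistic")
```

At 10¹¹ K the electron passes this test by a wide margin, while the proton and neutron fail it by two orders of magnitude, which is why nucleon-antinucleon pairs are essentially absent.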
3. Asymmetry of matter and antimatter.
When t = 0.01 s and $T = 10^{11}$ K, because $k_B T \gg m_e$ and $k_B T \ll m_p$, there exist plenty
of $(e, e^{+})$ (the same order as the number of γ) while there are almost no (p, p̄) and
(n, n̄). Therefore, the contents of the universe are: a large amount of neutrinos ν and
antineutrinos ν̄, a large amount of photons γ, a large amount of (e, e+ ) (the number
density of each kind of particle above is basically the same) and a small amount
of protons p and neutrons n. Earlier than this, such as when $T \gtrsim 10^{13}$ K, since
$k_B T > m_p$, there used to be a large amount of (p, p̄) and (n, n̄), which vanished due
to annihilation when the temperature decreased to $k_B T < m_p$. Since the p, p̄ and n,
n̄ annihilate in pairs, why could there still be a small amount of p and n that remain?
The reason we know that there must be a small amount of p and n is that the
matter in the present universe is all composed of p and n, while antiparticles in the
present universe are extremely rare. That is to say, there exists a particle-antiparticle
(matter-antimatter) asymmetry in the present universe. If we accept this fact, we
have to admit that besides a large amount of (p, p̄) and (n, n̄), there should be a small
amount of unpaired p and n before t = 0.01 s. When kB T < m p , p and p̄, n and n̄
annihilate in pairs, with only a small amount of p and n (both are baryons) remaining.
It is estimated that $n_b/n_\gamma$, the ratio of the number densities of baryons and photons
in the universe, is only on the order of $10^{-10}$, but it is surely not zero.
If one questions further about the source of this asymmetry of baryons and
antibaryons, then there are only two possible answers: ① The universe prefers par-
ticles over antiparticles from its beginning (which is obviously not quite natural);
② There were the same number of baryons and antibaryons at the beginning of the
universe, and for some reason baryons became favored during its very early
evolution. If one believes that the baryon number must be conserved, then the latter choice
would not be acceptable. Fortunately, there could be ways to bypass this difficulty.
For example, a Grand Unification Theory (GUT) proposed in the 1970s which unifies
the electromagnetic, weak, and strong interactions suggests that the baryon number
may not be conserved at a very high energy scale. It has been shown that the baryon
number not being conserved plus a temporary deviation from thermal equilibrium
in the very early universe may create surplus p and n from the universe which orig-
inally has particle-antiparticle symmetry. Although the present GUT models have
not been supported by experiments as anticipated, it is generally believed that a suc-
cessful Grand Unification Theory will sooner or later resolve the above difficulty in
cosmology.
4. Neutrino decoupling.
When t = 1 s and $T = 10^{10}$ K, $k_B T \cong 1$ MeV is still greater than $m_e$, and hence
there still exist plenty of $(e, e^{+})$. However, since the temperature and density have
decreased a lot compared with before, the interaction rate between neutrinos (or
antineutrinos) and other particles is far less than the expansion rate of the universe.
The mean free time of a neutrino is extended significantly, and so it becomes
approximately a free particle that does not interact with other particles, which means it
is no longer in thermal equilibrium with other particles. This is called the decoupling
of neutrinos. The decoupling time and temperature of neutrinos are denoted by tνd
and Tνd , respectively. Although neutrinos will fill up the universe after the decoupling
just like other particles, and keep affecting the evolution of the universe since they
still contribute to the total energy-momentum tensor, they are not correlated with
any other constituents of the universe in any other aspects. This huge amount of
neutrinos evolve independently up to today, and they now exist as an independent
particle system whose temperature is about 1.95 K, known as the cosmic neutrino
background. Since the interaction between neutrinos and a detector is extremely
small, it is almost impossible to observe the cosmic neutrino background directly.
However, indirect evidence has been observed from the fluctuations of the cosmic
microwave background [see Follin et al. (2015)].
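The quoted neutrino temperature of about 1.95 K can be related to the present photon temperature through the standard relation $T_\nu = (4/11)^{1/3} T_\gamma$. This relation is not derived in this section and is taken here as an assumption (it reflects the heating of the photons, but not of the already decoupled neutrinos, by the later annihilation of electron-positron pairs); the photon temperature 2.725 K is likewise an assumed input.

```python
def neutrino_background_temperature(photon_temperature_kelvin=2.725):
    """Present neutrino temperature from T_nu = (4/11)**(1/3) * T_gamma (assumed relation)."""
    return (4.0 / 11.0) ** (1.0 / 3.0) * photon_temperature_kelvin


print(f"T_nu ~ {neutrino_background_temperature():.2f} K")
```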
5. Primordial nucleosynthesis.
Observations indicate that about 1/4 of the total baryonic mass of the current
universe is helium. Except for primordial nucleosynthesis in the early universe, there is
no other known process that could have created this abundance of helium. (Although
the nuclear reactions inside stars keep producing helium, they only contribute
a small portion of the abundance above.) The temperature involved in primordial
nucleosynthesis is roughly between $10^{10}$ K and (slightly lower than) $10^{9}$ K. When
the temperature is higher than this range, even if protons and neutrons could
combine into a helium nucleus, it will be shattered by the high energy photons (called
“photofission”). Since the physics in this temperature interval is already well studied
and has been confirmed in labs on the Earth, people are confident in the
theory of primordial nucleosynthesis. Because the number density of the
nuclei is relatively low and the rapid expansion of the universe leads to a very short
reaction time (about $10^{2}$ s), only reactions between two high speed particles can happen in
primordial nucleosynthesis. First, protons and neutrons combine into deuterons, and
the rest of the energy and momentum is carried away by a photon ($p + n \to {}^{2}\mathrm{H} + \gamma$).
This is followed by a sequence of reactions which produce $^{3}$H (triton), $^{3}$He
and $^{4}$He, such as
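${}^{2}\mathrm{H} + n \to {}^{3}\mathrm{H} + \gamma$ , ${}^{2}\mathrm{H} + p \to {}^{3}\mathrm{He} + \gamma$ , ${}^{2}\mathrm{H} + {}^{2}\mathrm{H} \to {}^{3}\mathrm{He} + n$ ,
${}^{2}\mathrm{H} + {}^{3}\mathrm{H} \to {}^{4}\mathrm{He} + n$ , ${}^{3}\mathrm{He} + {}^{3}\mathrm{He} \to {}^{4}\mathrm{He} + 2p$ .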
Since there does not exist a stable nuclide whose mass number is 5, the reaction
chain stops here. As the main product, $^{4}$He gradually accumulates, and the nuclear
reaction continues until there are a large enough number of nuclei, which will lead
to the production of a tiny amount of $^{7}$Li. Since there does not exist a stable nuclide
whose mass number is 8, the reaction chain again ends here. The first step that
this whole reaction chain must go through is protons and neutrons combining into
deuterons. The binding energy of a deuteron is much lower than that of a
helium nucleus. A helium nucleus can remain stable when the temperature decreases
to $3 \times 10^{9}$ K, while at this temperature a deuteron will be broken right after it
is formed. Therefore, the nucleosynthesis process that is actually meaningful begins
after the “deuteron barrier” is passed when the temperature is slightly lower than
$10^{9}$ K, and the product is a large amount of $^{4}$He and a tiny amount of $^{2}$H, $^{3}$He and $^{7}$Li
($^{3}$H is unstable and will quickly decay to $^{3}$He). If we take the yield of $^{4}$He as the unit,
then the yields of $^{2}$H and $^{3}$He are about $10^{-5}$, while the yield of $^{7}$Li is about $10^{-10}$.
As for all kinds of elements that are heavier than $^{7}$Li in today’s universe, they mainly
come from the nuclear reactions in the interior of stars and supernova explosions.
The reason that the reactions inside a star can skip over the elements with A = 5
and A = 8 and yield heavy elements is that the self-gravity there is so strong that the
density of the star’s core is extremely high, and there is enough reaction time such
that three-particle collisions can happen.
The helium abundance produced by the primordial nucleosynthesis closely
depends on the ratio $n_n/n_p$ of the number densities of neutrons and protons before the
end of the nucleosynthesis (the reason will be seen shortly). This ratio can be derived
from the following discussion. Before the neutrinos decouple, protons and neutrons
can convert mutually by the following weak interaction processes: $p + e \leftrightarrow n + \nu_e$,
$p + \bar\nu_e \leftrightarrow n + e^{+}$. Since the mass of a neutron is slightly greater than the mass of
a proton ($m_n - m_p \cong 2.5 m_e$), it is more difficult for a proton to turn into a neutron
than the reverse. For example, since $m_p + m_e \cong m_n - 1.5 m_e < m_n$, it follows from
the conservation of energy that a proton and an electron at rest cannot even turn into
a neutron at rest, but the reverse process does not have such an issue. Certainly, the
energy of an electron when the temperature is above $10^{10}$ K is far greater than its rest
energy $m_e$, and thus $p + e \to n + \nu_e$ could happen, but nevertheless its probability
is always less than that of the reverse process. Therefore, $n_n/n_p$ should be less than
1 when the forward and reverse reactions reach a statistical equilibrium, and the
quantitative relation is given by the Boltzmann equation
$\frac{n_n}{n_p} = e^{-\Delta m/(k_B T)}$ , (10.3.3)
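To see how strongly the equilibrium ratio depends on temperature, the sketch below evaluates the Boltzmann factor in (10.3.3). It assumes that Δm denotes the neutron-proton mass difference, taken as 2.5 electron rest energies as quoted above, and uses an assumed value of the Boltzmann constant in MeV/K. The formula describes equilibrium only, so values at temperatures below the freeze-out are illustrative rather than physical.

```python
import math

K_B_MEV_PER_K = 8.617333262e-11  # assumed Boltzmann constant, MeV/K
M_E_MEV = 0.511                  # assumed electron rest energy, MeV
DELTA_M_MEV = 2.5 * M_E_MEV      # m_n - m_p, using the approximation quoted in the text


def neutron_proton_ratio(temperature_kelvin):
    """Equilibrium n_n / n_p = exp(-delta_m / (k_B * T)) at temperature T (K)."""
    return math.exp(-DELTA_M_MEV / (K_B_MEV_PER_K * temperature_kelvin))


for T in (1.0e11, 3.0e10, 1.0e10):
    print(f"T = {T:.0e} K  ->  n_n/n_p ~ {neutron_proton_ratio(T):.3f}")
```

The ratio is close to 1 at 10¹¹ K and has already dropped to roughly 0.2 at 10¹⁰ K, which shows why the precise freeze-out temperature matters for the final helium abundance.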
$Y = \frac{N_{\rm He}}{N} = \frac{2\sigma N_p}{(\sigma + 1) N_p} = \frac{2\sigma}{\sigma + 1}$ ,
i.e.,
$Y = 2\,\frac{n_n}{n_p}\left(1 + \frac{n_n}{n_p}\right)^{-1}$ . (10.3.4)
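Formula (10.3.4) is simple enough to evaluate by hand, but a short sketch makes it easy to see how sensitive Y is to the neutron-to-proton ratio. The function name is hypothetical.

```python
def helium_mass_fraction(neutron_proton_ratio):
    """Helium abundance Y = 2x / (1 + x) from (10.3.4), with x = n_n / n_p."""
    if neutron_proton_ratio < 0:
        raise ValueError("the ratio n_n / n_p cannot be negative")
    return 2.0 * neutron_proton_ratio / (1.0 + neutron_proton_ratio)


for x in (1.0 / 5.0, 1.0 / 7.0, 1.0 / 9.0):
    print(f"n_n/n_p = {x:.3f}  ->  Y = {helium_mass_fraction(x):.3f}")
```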
Plugging in $n_n/n_p \cong 1/7$ yields $Y \cong 0.25$. Apart from primordial nucleosynthesis,
the nuclear reactions in the interior of stars also produce $^{4}$He (a lot less than the
production of primordial nucleosynthesis though), and thus it is necessary to deduce the
primordial helium abundance (the abundance of helium when primordial
nucleosynthesis is over) from the observed helium abundance. As the accuracy of
measurements has improved over the years, recent estimates [Y = 0.245 ± 0.003 in Zyla
et al. (2020)] have matched the theoretical value above very well. Although the
abundances of other products ($^{2}$H, $^{3}$He and $^{7}$Li) are very small, they are also
significant for verifying the theory. There is another important physical parameter η
involved in the quantitative calculation of the abundances of the products of
primordial nucleosynthesis, which is defined as the ratio of the number densities of the baryons
and photons in the universe ($\eta \equiv n_b/n_\gamma$). $\eta^{-1}$ stands for the number of photons per
baryon, which affects the starting time of the nucleosynthesis by affecting the
difficulty of photofission, and thus affects the abundances of the products. The abun-
then the theoretical abundances of all four products above agree with their obser-
vational abundances [Zyla et al. (2020)]. This not only is a powerful support to the
theory of nucleosynthesis, but also sets a rather clear (and narrow) possible range for
this key parameter η, which provides another important contribution to cosmology.
Another important contribution of the theory of primordial nucleosynthesis is that
it determines the number of neutrino species Nν as 3, i.e., it confirms that there are
only 3 types of neutrinos (and thus leptons only have three generations). This is
supposed to be a problem of particle physics; the history of cosmology being used in
this subject started from 1976. The situation of high energy physics at that time had the
following features: ① there was already evidence that, besides the first two generations
of leptons e and μ (and their corresponding neutrinos $\nu_e$ and $\nu_\mu$), there exists a third
generation of leptons (and thus a third type of neutrino); ② the accelerators at that
time could not provide any meaningful restriction on $N_\nu$; ③ many physicists tended
to believe that the value of $N_\nu$ would increase as the energy of the accelerators
increased; ④ very few particle physicists believed that the study of cosmology could
be helpful to particle physics. However, G. Steigman and collaborators blazed a new
trail by pointing out that the increase of the number of neutrino species will lead
to an increase in the abundance of $^{4}$He coming from primordial nucleosynthesis,
and thus the observed abundance of $^{4}$He should give an upper bound for $N_\nu$. The
basic idea is as follows: since the k in (10.2.16) is negligible when a is small, it
is easy to see from $H \equiv \dot{a}/a$ that $H^{2} = 8\pi\rho/3$. More species of neutrinos lead to
a greater ρ, which leads to a greater H due to the equation above, namely a faster
expansion of the universe. This would make neutrinos decouple earlier, i.e., $t_{\nu d}$ would
be smaller, and thus the decoupling temperature $T_{\nu d}$ would be greater. It follows from
(10.3.3) that this “freezes out” $n_n/n_p$ at a greater value, and hence the abundance of
$^{4}$He would be higher. The upper bound they gave in Steigman (1977) was $N_\nu \lesssim 7$.
This article demonstrated the novel insight that “cosmology can provide important
constraints on particle physics, and the universe is an important supplement for high
energy accelerators.” Later on, more and more studies have been carried out along
this direction, which keep narrowing down the estimated value of $N_\nu$ [see Steigman
(2012) for a review]. A recent analysis in Cyburt et al. (2016) gives $N_\nu \lesssim 3.2$, which
agrees well with the result $N_\nu = 3$ obtained from the collider experiments by the European
Organization for Nuclear Research (CERN).
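The first link in this chain of reasoning, more neutrino species implying faster expansion, can be quantified in a rough sketch. It combines $H^{2} = 8\pi\rho/3$ with the relation ρ ∝ N_eff T⁴ of footnote 10, so that H ∝ √N_eff at a fixed temperature. The counting N_eff = 2 + (7/8)(4 + 2N_ν) for photons, electron-positron pairs and N_ν neutrino species is a standard result assumed here, not taken from the text.

```python
import math


def effective_dof(n_nu):
    """Assumed counting: photons (2) plus 7/8 of the fermionic states (e, e+ and n_nu neutrino species)."""
    return 2.0 + (7.0 / 8.0) * (4.0 + 2.0 * n_nu)


def hubble_ratio(n_nu, n_ref=3):
    """Ratio H(n_nu) / H(n_ref) at equal temperature, using H proportional to sqrt(N_eff)."""
    return math.sqrt(effective_dof(n_nu) / effective_dof(n_ref))


for n in (3, 4, 7):
    print(f"N_nu = {n}: H / H(N_nu = 3) ~ {hubble_ratio(n):.3f}")
```

Under these assumptions, going from three to seven species speeds up the expansion at a given temperature by a little under 30%, which is enough to shift the freeze-out noticeably.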
6. Cosmic microwave background radiation.
In a long period of time after primordial nucleosynthesis, nothing significant happens
in the universe until $t \cong 10^{13}$ s $\cong 4 \times 10^{5}$ years, at which time $T \cong 3000$ K (or
4000 K). At this temperature, nuclei and electrons start to combine into neutral atoms
(before this the electrons still have enough energy to escape from the electromagnetic
binding of a nucleus), and the matter in the universe starts to change quickly from
an ionized state (plasma) to the neutral state. In an ionized state, photons interact
frequently with charged particles (especially free electrons), and thus they are in
thermal equilibrium with the matter particles. However, photons have almost no
interaction with neutral particles, and thus the universe becomes transparent after
the charged particles are combined into neutral particles (the mean free time of a
photon is a lot longer than the present age of the universe). At this stage, photons
are decoupled from the “big family” of the particles in thermal equilibrium and
become an independent system. Before decoupling, these photons were in thermal
equilibrium with the matter particles (similar to the photons in an oven being in
thermal equilibrium with the particles of the oven’s wall), whose energy density
distribution in wavelength satisfies the blackbody radiation curve, which can be
described by Planck’s law:
$du = \frac{8\pi h c}{\lambda^{5}} \left( e^{hc/(k_B T \lambda)} - 1 \right)^{-1} d\lambda$ , (10.3.6)
where du stands for the energy per unit volume of the photons whose wavelength is
in the range (λ, λ + dλ), T is the temperature, and h and kB are the Planck constant
and Boltzmann constant, respectively. Although the photons are no longer in thermal
equilibrium with the matter particles after decoupling, their energy distribution in
wavelength still satisfies Planck’s law, except that the temperature T decreases in inverse
proportion to the scale factor a. The reason can be briefly explained as follows: after
the photon decoupling, suppose a is increased by a factor α, i.e., $a' = \alpha a$, then the
number of photons per unit volume is multiplied by a factor $\alpha^{-3}$. On the other hand,
the energy of each photon is also multiplied by a factor $\alpha^{-1}$ due to the redshift [see
(10.2.8)]. Therefore, the energy per unit volume of those photons whose wavelength was in the range
(λ, λ + dλ) will decrease (when the scale factor is $a'$) to
$du' = \alpha^{-4} du = \frac{8\pi h c}{\alpha^{4} \lambda^{5}} \left( e^{hc/(k_B T \lambda)} - 1 \right)^{-1} d\lambda$ ,
which, in terms of the redshifted wavelength $\lambda' = \alpha\lambda$, reads
$du' = \frac{8\pi h c}{\lambda'^{5}} \left( e^{hc/(k_B T' \lambda')} - 1 \right)^{-1} d\lambda'$ , where $T' \equiv \alpha^{-1} T$ . (10.3.7)
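The statement that a redshifted blackbody spectrum is again a blackbody spectrum can be checked numerically. The sketch below compares the diluted and redshifted energy density per unit wavelength with Planck's law evaluated at the lower temperature T/α. It assumes SI values for h, c and $k_B$ and that wavelengths stretch as αλ (so that dλ stretches by the same factor); the function names are hypothetical.

```python
import math

H = 6.62607015e-34  # Planck constant, J s (assumed SI value)
C = 2.99792458e8    # speed of light, m/s (assumed SI value)
K_B = 1.380649e-23  # Boltzmann constant, J/K (assumed SI value)


def planck_density(wavelength, temperature):
    """du/d(lambda) from Planck's law (10.3.6), in J m^-4."""
    x = H * C / (K_B * temperature * wavelength)
    return 8.0 * math.pi * H * C / wavelength**5 / math.expm1(x)


def redshifted_density(new_wavelength, temperature, alpha):
    """du'/d(lambda') obtained by diluting (alpha**-4) and stretching wavelengths by alpha."""
    old_wavelength = new_wavelength / alpha
    # The extra 1/alpha converts d(lambda) into d(lambda') = alpha * d(lambda).
    return alpha**-4 * planck_density(old_wavelength, temperature) / alpha


T, alpha, lam_new = 3000.0, 1100.0, 1.0e-3
print(redshifted_density(lam_new, T, alpha), planck_density(lam_new, T / alpha))
```

The two printed numbers agree to rounding error for any choice of wavelength, temperature and expansion factor, as (10.3.7) asserts.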
Thus, the distribution of energy density in wavelength when the scale factor increases
to $a' = \alpha a$ can still be described by Planck’s law, which just corresponds to a lower
temperature $T'$. Estimation shows that the temperature of the decoupled photon system
in the present day is $T_0 \sim 3$ K. That is to say, the present universe is filled with a
large amount of background photons homogeneously (all the galaxies are “soaked”
in the bath of ubiquitous photons), and the distribution of their energy in wavelength
is described by the blackbody radiation curve at 3 K. The radiation energy is mainly
concentrated in the microwave band (the wavelength of the maximum energy density
is about 0.1 cm), and therefore this is called the cosmic microwave background
radiation (CMB, CMBR).
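The quoted peak wavelength of about 0.1 cm follows from Wien's displacement law, $\lambda_{\max} T = b$, for the energy density per unit wavelength. The law itself and the value of the Wien constant b are standard results assumed here; they are not stated in the text.

```python
WIEN_B = 2.897771955e-3  # Wien displacement constant, m K (assumed standard value)


def peak_wavelength_cm(temperature_kelvin):
    """Wavelength (cm) at which the blackbody energy density per unit wavelength peaks."""
    if temperature_kelvin <= 0:
        raise ValueError("the temperature must be positive")
    return WIEN_B / temperature_kelvin * 100.0


for T in (3.0, 2.73):
    print(f"T = {T} K  ->  peak wavelength ~ {peak_wavelength_cm(T):.3f} cm")
```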
American physicists and radio engineers A. A. Penzias and R. W. Wilson detected
this isotropic radiation accidentally in 1965, and received the 1978 Nobel Prize in
Physics for this discovery. What they detected was in fact the signal at only one
wavelength (7.35 cm) (i.e., only one point on the curve). Assuming that this is blackbody
radiation, the temperature corresponding to the blackbody radiation curve passing
through this point is 3.5 K. American physicist R. H. Dicke and colleagues pointed
out immediately that this is a trace (a “fossil”) of the big bang, which was exactly the
cosmic background radiation they were preparing to search for. However, there were
also a few articles at that time which gave alternative explanations for this signal. In
order to confirm that this is indeed a trace of the big bang, two conditions need to be
satisfied: ① the distribution of the energy spectrum is a blackbody radiation curve;
② the radiation is highly isotropic [the intensity (or the corresponding temperature)
is the same in all directions]. This urged people to measure the other points of the
curve and to test the isotropy of the radiation. Soon (in 1967), it was confirmed
that the anisotropy is no more than 0.1–0.3%, and the results of measuring many
other points with wavelength greater than 0.3 cm all fit the blackbody radiation
curve. The radiation with wavelength less than 0.3 cm can be easily absorbed by
the atmosphere, but it can be measured above the atmosphere by balloons or
satellites. Starting in 1989, the Cosmic Background Explorer (COBE) satellite
measured a wide waveband with high precision and obtained a perfect blackbody
radiation curve. Figure 10.11¹³ illustrates the first results published in 1990, which
is regarded as the most perfect blackbody radiation observed by humans in nature.
COBE also presented a more precise result for the anisotropy of the background
radiation. Expanding the temperature T (as a function of the angular coordinates) in
terms of the spherical harmonics, other than the constant term $T_0$, the two lowest
order spherical harmonics are called the dipole moment and quadrupole moment,
which are the main manifestations of the anisotropy. Let $T_1$ and $T_2$ represent the
amplitudes of the dipole anisotropy and quadrupole anisotropy, respectively; then the
measurements of COBE give $T_1/T_0 \sim 10^{-3}$ and $T_2/T_0 \sim 10^{-5}$. The former can be
reasonably interpreted as the consequence of the small velocity of the Earth relative to
the isotropic reference frame: the Earth orbits around the Sun, the Sun moves relative
to the center of the Milky Way, and the Milky Way also has “peculiar motion” relative
to the isotropic reference frame. By definition, only isotropic observers can obtain
isotropic results from measurements, and thus it certainly makes sense that a small
anisotropy of the background radiation is observed by the Earth’s observer. Analysis
shows that the first-order approximation of this anisotropy manifests exactly as the
13 The luminance $B_\nu$ in this figure refers to the energy per unit frequency transmitted per unit area,
per unit solid angle, per unit time, whose unit is J·s⁻¹·m⁻²·sr⁻¹·Hz⁻¹, where sr stands for steradian.
The luminance corresponding to u in (10.3.6) is not $B_\nu$ but $B_\lambda$, i.e., the energy per unit wavelength
transmitted per unit area, per unit solid angle, per unit time, whose unit is J·s⁻¹·m⁻²·sr⁻¹·m⁻¹. For
the same temperature T, the peak frequency (wavelength) of the $B_\nu$-ν (or $B_\nu$-λ) curve is not
equal to that of the $B_\lambda$-λ (or $B_\lambda$-ν) curve. For T = 2.73 K, the peaks of the $B_\nu$-ν and $B_\lambda$-λ
curves are at wavelengths of approximately 1.9 and 1 mm, respectively.
514 10 Cosmology I
luminance
frequency (GHz)
dipole moment. [Intuitively, as the Earth is going across the “ocean” of background
radiation, the radiation in front of it should be stronger than that behind it, which
gives rise to the dipole anisotropy.] Therefore, the anisotropy of about one part per
thousand obtained by COBE (and the previous ground observations) is not only
reasonable, but also it can be used conversely to determine the precise velocity of
the Earth relative to the isotropic reference frame (the cosmic rest frame), and the
result is about 369 km·s−1 .14 After this anisotropy is subtracted out, one finds that the
anisotropy (mainly the quadrupole anisotropy) when photons are decoupled is only
about 10−5 . This tiny anisotropy is very critical for understanding the formation of
the large scale structure of the universe (e.g., galaxies), see 7 below; when it was first
discovered by COBE in 1992, this suddenly became the headline news all over the
world. The leaders of the COBE project, G. F. Smoot and J. C. Mather were awarded
the 2006 Nobel Prize in Physics for the discoveries of the blackbody spectrum and
the anisotropy of the CMB.
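As a rough numerical illustration of the dipole argument above, the following sketch assumes the standard first-order Doppler relation T(θ) ≈ T0 [1 + (v/c) cos θ], so that T1/T0 ≈ v/c. The function name and the use of T0 = 2.73 K (the value quoted in footnote 13) are choices made for this illustration only.

```python
# Sketch: dipole amplitude of the CMB seen by an observer moving with speed v.
# Assumption: first-order Doppler relation T(theta) ~ T0 * (1 + (v/c) * cos(theta)).

C_KM_S = 299_792.458  # speed of light in km/s


def dipole_ratio(v_km_s: float) -> float:
    """Return T1/T0 ~ v/c for an observer moving at v (in km/s)."""
    return v_km_s / C_KM_S


T0_KELVIN = 2.73                   # CMB temperature quoted in the text
ratio = dipole_ratio(369.0)        # speed relative to the isotropic frame
t1_millikelvin = ratio * T0_KELVIN * 1.0e3

print(f"T1/T0 ~ {ratio:.2e}")
print(f"T1    ~ {t1_millikelvin:.2f} mK")
```

The ratio comes out at the 10⁻³ level, in line with the COBE value T1/T0 ∼ 10⁻³ quoted above.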
Besides its intensity, represented by the temperature of the blackbody spectrum,
the CMB radiation, as electromagnetic radiation, also exhibits polarization. The CMB
polarization can be decomposed into two components, dubbed the E-mode and the
B-mode [for the details of the decomposition, the reader may refer to, e.g., Chap. 10
of Dodelson and Schmidt (2020)].¹⁵ E-modes can be produced by the interaction
between photons and free electrons (such as Compton scattering). However, if the
radiation is isotropic, the polarization will be equal in all directions, and the overall
effect is still unpolarized. In fact, the tiny anisotropy of the CMB temperature we
just mentioned plays a crucial role here: it allows the scattering process to
produce polarization. Thus, the spectrum of the E-mode polarization is smaller than
the anisotropy spectrum of the CMB. On the other hand, the spectrum of the B-mode
polarization is even smaller than that of the E-mode polarization. There are two
types of B-mode polarization: the first is caused by the gravitational lensing
of E-modes (gravitational lensing is the effect that, due to the deflection of light by
gravitational fields, massive bodies such as galaxies behave similarly to convex glass
lenses); the second is caused by the primordial gravitational waves produced in
the early universe. The E-mode polarization and the first type of B-mode polarization
have been detected, while the B-modes produced by primordial gravitational waves
have not been found yet. The latter, once detected, will provide a powerful tool
for probing the early universe, and open a new window for gravitational-wave
astronomy (see Sect. 7.9.4).
14 This is the average speed of the Earth relative to the isotropic reference frame, which is also the
speed of the Sun relative to the isotropic reference frame. Since the Earth orbits the Sun with a speed of
about 30 km/s, the actual speed of the Earth at each time of the year can be obtained by considering
the correction due to this relative motion between the Earth and the Sun.
15 For the CMB polarization one only considers the polarization patterns on the celestial sphere.
Since it takes some amount of time for the light emitted by a galaxy to reach the
Earth (see Fig. 10.8), from the observations of bright galaxies and quasars one can
obtain information about the universe earlier than the present time t0. The CMB data
carry information from far earlier than that (no galaxy had formed yet when photons
decoupled), which is highly valuable for the study of cosmology. The observation
of the CMB is regarded as the most powerful support for the standard model. One
of the drawbacks of a once strong competitor of the standard model, the steady
state model, is that it cannot provide a persuasive explanation for the background
radiation, and hence it has been largely abandoned since 1965.
7. Structure formation.
The basic premises of the standard model are the large scale spatial homogeneity
and isotropy. On a smaller scale, the universe presents a hierarchical structure: there
exist stars, galaxies, galaxy clusters and superclusters. A generally accepted idea
is that the complicated structure today originates from the extremely weak density
fluctuation (also called perturbation) δρ/ρ in the very early universe, where ρ is
the average density, and δρ is the difference between the density at a point and ρ.
Gravity has the effect of amplifying the density fluctuation: if δρ/ρ > 0 (density
is higher than the average density) somewhere, then the matter there will contract
under the action of gravity, which leads to a higher density fluctuation. J. H. Jeans
established the corresponding theory for static fluids in 1902, and E. M. Lifshitz
proposed the theory for the density fluctuation being amplified in an expanding
universe in 1946. Based on these theories, all kinds of models regarding structure
formation have been put forward. The early models (in the 1970s) considered that
baryons are the largest contributors to the matter in the universe, which leads to
serious troubles. Later, after the concept of non-baryonic dark matter was posed (see
Sect. 10.3.2 and Chap. 15), two theories of structure formation, namely the hot dark
matter model and cold dark matter model, appeared accordingly [see Longair
(2008) for details]. In the hot dark matter model, the formation of structures has
a top-down scenario: the superclusters are formed first, and then they break into
galaxy clusters and galaxies hierarchically. In contrast, the formation of structures
in the cold dark matter model has a bottom-up scenario: the galaxies are formed
first, and then galaxy clusters and superclusters are formed hierarchically. As to the
origin of the primordial perturbation, previously people could only treat it as a
pre-assigned initial condition. Nowadays, as the inflationary model has been generally
accepted (the basic idea is that the very early universe once experienced a dramatically
accelerating exponential expansion in a very short period of time, see Chap. 15 in
Volume II), the primordial perturbation can be completely explained by inflation. The
cold dark matter model has achieved great success and is now the favored model.
More precisely, the cold dark matter model with the inflationary model offering
the “seeds” of the primordial perturbation has become the most widely accepted
theory of structure formation. Some even consider it the fourth cornerstone of
modern cosmology (the first three are the cosmic expansion, primordial
nucleosynthesis and the CMB). However, there are still people who hold different
opinions.
At the end of this subsection, to help readers remember, we roughly summarize a
few important periods in the history of the universe’s evolution in Table 10.1.
Table 10.1
Time | Temperature | Energy | Events
… | … | … | … of ²H, ³He, ⁷Li; 2. (e⁻, e⁺) are all annihilated, there remains a small amount of electrons for balancing the charge of protons
10⁵ years | 3000 K | 0.3 eV | Neutral atoms are synthesized; photons decouple and become the background radiation
10⁹ years | | | Structure formation
It should be pointed out that, in the above description of the universe’s evolution,
our understanding of the time after t = 1 s is relatively reliable. However, for t < 1 s,
we do not have a description of the early universe with such high credibility, since
any “fossil” from that time is subject to undetermined factors.
There only exist three possibilities for the RW metric, k = 1, k = 0 and k = −1. As
we have seen in Sect. 10.1.3, the first one is a closed universe, while the latter two are
open universes. Which one does our universe really belong to? Is it closed or open?
Ω then can be interpreted as the density with ρC as the unit, and hence we can say
that the universe is closed if and only if Ω > 1. Notice that ρC itself is also a function
of t. It follows from (10.3.11) and (10.3.9) that Ω can be expressed as
$$ \Omega = \frac{8\pi G\rho}{3H^2} . \tag{10.3.12} $$
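To give a feeling for the numbers in (10.3.12), here is a minimal sketch that evaluates the critical density ρC = 3H²/(8πG) in SI units and the corresponding Ω. The Hubble parameter H0 = 70 km·s⁻¹·Mpc⁻¹ used in the example call is an illustrative assumption, not a value taken from this text, and the function names are hypothetical.

```python
import math

# Sketch: critical density and density parameter from (10.3.12), in SI units.
# Assumption: H0 = 70 km/s/Mpc is an illustrative value, not one quoted in the text.

G_SI = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
MPC_IN_M = 3.0857e22    # one megaparsec in metres


def critical_density(h0_km_s_mpc: float) -> float:
    """Return rho_C = 3 H^2 / (8 pi G) in kg/m^3."""
    h0_si = h0_km_s_mpc * 1.0e3 / MPC_IN_M   # convert to s^-1
    return 3.0 * h0_si ** 2 / (8.0 * math.pi * G_SI)


def omega(rho_kg_m3: float, h0_km_s_mpc: float) -> float:
    """Return Omega = rho / rho_C = 8 pi G rho / (3 H^2)."""
    return rho_kg_m3 / critical_density(h0_km_s_mpc)


rho_c = critical_density(70.0)
print(f"rho_C ~ {rho_c:.2e} kg/m^3")
```

With this assumed H0 the critical density is of order 10⁻²⁶ kg·m⁻³, i.e., a few hydrogen atoms per cubic metre.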
Once the present values of the Hubble parameter H0 and the mass density ρ0 are
measured, it can be determined from (10.3.12) whether our universe
is closed or not. Assume for now that the contents of the universe are
mainly in the form of galaxies. Suppose the present number density of the galaxies is
n, and the average mass of the galaxies is M̄, then ρ0 = n M̄. Suppose the luminosity
of the universe per unit volume is L (called the luminosity density), and the
average luminosity of the galaxies is L̄, then L = n L̄. Plugging this into ρ0 = n M̄
yields
$$ \rho_0 = \frac{\bar{M}}{\bar{L}}\, L , \tag{10.3.13} $$
where M̄/L̄ is called the average mass-to-light ratio of the galaxies. Let ρC0 and
Ω0 represent the present values of ρC and Ω, respectively, then
$$ \Omega_0 = \frac{\rho_0}{\rho_{C0}} = \frac{8\pi G}{3H_0^2}\,\frac{\bar{M}}{\bar{L}}\, L , \tag{10.3.14} $$
where (10.3.9) and (10.3.13) are used in the second equality. There is already a
relatively reliable observational value for L . Plugging the observational values of
L and H0 into the above equation, we can get the relation between Ω0 and the
average mass-to-light ratio M̄/L̄. The actual measurement is performed on a single galaxy,
and the result is only the mass-to-light ratio M/L for this galaxy; only if the galaxy
is highly representative can we plug M/L into (10.3.14) as M̄/L̄ and get a relatively
good result. The masses of different galaxies vary enormously (they may differ by
several orders of magnitude), while the differences in their mass-to-light ratios are
much smaller. This is one of the merits of substituting (10.3.12) with (10.3.14) (for
t = t0 ). Now we introduce the dynamical method (which considers the gravitational
effect of the mass) of measuring the mass of a spiral galaxy (e.g., the Milky Way).
Besides its random motion, a star in a spiral galaxy also undergoes revolution (orbital
motion) around the galactic center with the gravity of the galaxy as the centripetal
force. To simplify the discussion, we assume that the galaxy has spherical symmetry.
From Newton’s theory of gravity we know that
$$ v^2(r) = \frac{G M(r)}{r} , \tag{10.3.15} $$
where v(r ) is the speed of the orbital motion of a star at a distance r from the center,
and M(r ) is the mass of the galaxy within the radius r . The curve v(r ) is called
the rotation curve of the galaxy. The rotation curve for many galaxies has been
measured. Let R be the r at which the galaxy’s luminosity disappears, then M(R)
represents the mass of the luminous matter in the galaxy. Plugging the mass-to-light
ratio measured in this way into (10.3.14) yields the contribution of the luminous
matter to Ω0:
This indicates that the contribution from all the luminous matter to the mass density
is less than one percent of the critical density. Moreover, M(R) is also far less than
the total mass of the galaxy. If there is no mass outside r = R, then it follows
from (10.3.15) that the curve for v(r) should decrease as r^(−1/2) starting from r = R.
However, the rotation curves of plenty of galaxies have the following common
property: they first increase steeply from the galactic center, and then extend almost
horizontally until very far away from R, where measurement is no longer possible,¹⁶
as shown in Fig. 10.12. This indicates that there is a spherical “dark halo” outside
the luminous part of a spiral galaxy, formed by non-luminous dark matter, whose
radius is a lot greater than R, and the mass of this dark halo is 3–10 times as much
as the mass of the luminous part. There are also types of galaxies other than
spiral galaxies, e.g., elliptical galaxies. Evidence has indicated that there also exists
a considerable amount of dark matter in elliptical galaxies.
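The contrast between a flat rotation curve and the r^(−1/2) falloff can be made concrete with (10.3.15). The sketch below inverts it to get the enclosed mass M(r) = v²r/G and also gives the Keplerian speed expected if no mass lay outside the luminous edge. The sample values (220 km·s⁻¹ at 8 kpc) and the function names are hypothetical, chosen only for illustration.

```python
import math

# Sketch: enclosed mass from a rotation curve via (10.3.15), v^2 = G M(r) / r.
# Assumption: the sample speed and radius below are illustrative, not data from the text.

G_SI = 6.674e-11        # m^3 kg^-1 s^-2
KPC_IN_M = 3.0857e19    # one kiloparsec in metres
M_SUN_KG = 1.989e30     # solar mass in kg


def enclosed_mass_msun(v_km_s: float, r_kpc: float) -> float:
    """Mass inside radius r (in solar masses) implied by orbital speed v."""
    v = v_km_s * 1.0e3
    r = r_kpc * KPC_IN_M
    return v * v * r / G_SI / M_SUN_KG


def keplerian_speed(v_edge_km_s: float, r_edge_kpc: float, r_kpc: float) -> float:
    """Speed at r > r_edge if there is no mass outside r_edge (v ~ r^(-1/2))."""
    return v_edge_km_s * math.sqrt(r_edge_kpc / r_kpc)


m_inner = enclosed_mass_msun(220.0, 8.0)
m_outer = enclosed_mass_msun(220.0, 16.0)   # same speed at twice the radius
print(f"M(8 kpc)  ~ {m_inner:.2e} M_sun")
print(f"M(16 kpc) ~ {m_outer:.2e} M_sun")
print(f"Keplerian v(32 kpc) = {keplerian_speed(220.0, 8.0, 32.0):.0f} km/s")
```

A flat curve means M(r) keeps growing linearly with r, which is the quantitative content of the dark halo argument; a Keplerian curve would instead have halved the speed by four times the edge radius.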
Considering that there exists a large space between galaxies, it is very likely that a
large amount of matter is there. People have also applied a similar dynamical method
to galaxy clusters [assuming that the virial theorem holds, there is a
formula similar to (10.3.15)]. The result gives
$$ \Omega_0(\text{galaxy cluster}) \cong 10\% \sim 30\% . \tag{10.3.17} $$
This confirms that, apart from the galaxies, there is a large amount of dark matter
in a galaxy cluster. Since the above results are based on some hypotheses that are
not completely confirmed yet, and since only about 5% of the galaxies in
the universe belong to big clusters of galaxies, we cannot claim that the Ω0 of
the universe can be represented by (10.3.17) (although circumstantial evidence has
been found). However, one can at least conclude that the mass of dark matter in the
universe is far more than that of luminous matter.
If we take (10.3.17) as the contribution of all the matter in the universe to Ω0, we
would draw the conclusion that the universe is far from being closed. However, the
inflationary model proposed in 1981 (see Chap. 15 in Volume II) suggests that Ω0
may be very close to (or even equal to) 1; this is supported by some measurements and
analyses. As the inflationary model is now widely accepted, how can we reconcile
the result Ω0 ≅ 1 with (10.3.17)? Before 1998, in order to avoid the contradiction
with Ω0 ≅ 1, people had to think that the distribution of the galaxies and galaxy
clusters is far from the total matter distribution of the universe: besides the matter
associated with galaxies and galaxy clusters, there might also be about 80% of the
matter that is not clustered, or is even smoothly distributed in the universe. Note that so
far we have only considered the Einstein equation without the Λ-term. The important
progress on the measurement of the cosmological constant in 1998 made people
believe that one should use the Einstein equation with the Λ-term when discussing
cosmological problems. The key point is that, besides the contribution ΩM0 coming from
matter (including luminous matter and dark matter), Ω0 also has a contribution
from the cosmological constant. The contributions from Λ and ΩM0 are roughly in a
seventy-thirty ratio, and together they give Ω0 ≅ 1. For details, see Sect. 10.3.3.¹⁷
16 Each point of the curve is measured from the frequency shift of rays emitted by stars or neutral
gas clouds. These stars and gas clouds serve as test particles. It is hard to find a test particle when
r is much greater than R.
Ever since Einstein introduced the cosmological constant in 1917, the status of Λ has
experienced several ups and downs. Although Einstein himself abandoned Λ after
1923, it was still valued by many people until the 1950s. One of the reasons is that
the early measurement of H0 by Hubble was excessively large, and the existence of
a positive Λ could avoid the age of the universe being too small. Here is a qualitative
explanation. From (10.2.35) we can see that the existence of the cosmological
constant is equivalent to adding an “energy-momentum tensor” −Λgab/8π to the
universe. Comparing this with the energy-momentum tensor of a perfect fluid,
$$ T_{ab} = (\rho + p)\, U_a U_b + p\, g_{ab} , $$
we can see that the Λ-term can be considered as a “perfect fluid” with the equation
of state −ρ = p = −Λ/8π. If this is the only contributor, then (10.2.18) becomes
3ä = Λa, and for Λ > 0 we have ä > 0. Thus, contrary to the matter field, a positive
cosmological constant provides a repulsive force, unlike the usual attractive
gravitational force, and makes the universe undergo an accelerating expansion. In the
early universe, the energy density ρ of the radiation and matter is very large, and its
gravity is stronger than the repulsive effect, which leads to a decelerating expansion.
Then, ρ decreases as the universe expands, and the expansion will have a constant
rate when ρ is small enough and the gravity counterbalances the repulsive force (note
that Λ is fixed). As ρ keeps decreasing, the universe will turn into an accelerating
expansion when the repulsive force is stronger than gravity. By choosing a suitable
model, one can obtain the result that the present universe is undergoing an accelerating
expansion, and so the measured H0⁻¹ is less than t0 instead of greater than t0 (see
Fig. 10.13). So the “age paradox” of the universe can be resolved or at least relieved.
17 The important discovery in 1998 is that the present universe is experiencing an accelerating
expansion. People once regarded the Λ-term as the cause of this accelerating expansion. However,
difficulties still exist. The mechanism of the accelerating expansion of the universe has become
one of the biggest puzzles in cosmology or even in fundamental physics, called the “dark energy”
problem, see Chap. 15 in Volume II.
[Fig. 10.13: horizontal axis t, with 0, H0⁻¹ and t0 marked.]
However, the situation turned around in the 1950s. On the one hand, newer
measurements indicated that the value of H0 is about 1/8 of that measured by Hubble. On the
other hand, the development of the modern theory of stellar evolution also made the
age of stars a lot smaller than the estimated values in the 1930s. As a consequence,
the “age paradox” disappeared, and Λ became unnecessary again. Nevertheless, after
these ups and downs, nowadays the necessity of Λ has been revived once again.
The cosmological constant has influenced not only cosmology but also many other
areas of physics, and its significance has been generally recognized. However, people
are still facing difficulties regarding the cosmological constant. One aspect of
it is related to the vacuum energy in quantum field theory, which leads to the
cosmological constant problem for physicists. Another aspect is from the perspective
of astronomers. Now we will introduce the cosmological constant problem for
astronomers; the cosmological constant problem for physicists will be introduced in
Chap. 15.
A major concern of astronomers is whether or not a nonzero Λ can be obtained
from the observations. The existence of Λ affects the evolution of the universe.
After the Λ-term is added into Einstein’s equation, (10.2.16) and (10.2.17) should
be modified as follows:
$$ \frac{3(\dot{a}^2 + k)}{a^2} = 8\pi\rho + \Lambda , \tag{10.3.18} $$
$$ \frac{2\ddot{a}}{a} + \frac{\dot{a}^2 + k}{a^2} = -8\pi p + \Lambda . \tag{10.3.19} $$
Applying (10.3.18) to the present time t0 yields
$$ H_0^2 = \frac{8\pi\rho_0}{3} + \frac{\Lambda}{3} - \frac{k}{a_0^2} . \tag{10.3.20} $$
Let
$$ \Omega_{\Lambda 0} := \frac{\Lambda}{3H_0^2} , \tag{10.3.21} $$
then
$$ 1 = \Omega_{M0} + \Omega_{\Lambda 0} - \frac{k}{a_0^2 H_0^2} , \tag{10.3.22} $$
where ΩM0 stands for the contribution from matter to Ω0 (note that the present contribution
of radiation is negligible). Through observation, the following questions should be
answered: Do we need a nonzero ΩΛ0 to ensure that the above equation holds? If
so, what is the value of ΩΛ0? This may be referred to as the cosmological constant
problem for astronomers.
The speed ȧ and the acceleration ä of the universe’s evolution depend on the
overall effect of ΩM representing the gravity and ΩΛ representing the repulsive
force. Astronomers usually use a dimensionless quantity to represent the deceleration
−ä; what they care about is the present value of the (dimensionless) deceleration
parameter defined as follows:
$$ q_0 := -\left.\frac{a}{\dot{a}^2}\,\ddot{a}\,\right|_{t_0} . \tag{10.3.23} $$
Considering that in the present day we have ρ + 3p ≅ ρ, we can plug the present
values of (10.3.18) and (10.3.19) (take p = 0) into (10.3.23) and obtain
$$ q_0 = \frac{1}{2}\,\Omega_{M0} - \Omega_{\Lambda 0} . \tag{10.3.24} $$
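A quick numerical reading of (10.3.24) may help. The sketch below evaluates q0 for a matter-only flat universe and for the rough "seventy-thirty" split between the cosmological constant and matter mentioned earlier in this section. The function name and the exact inputs 0.3 and 0.7 are illustrative assumptions.

```python
# Sketch: present deceleration parameter from (10.3.24), q0 = Omega_M0 / 2 - Omega_Lambda0.
# Assumption: 0.3 and 0.7 are round illustrative numbers for the seventy-thirty split.


def deceleration_parameter(omega_m0: float, omega_lambda0: float) -> float:
    """Return q0; a negative value means the expansion is accelerating."""
    return 0.5 * omega_m0 - omega_lambda0


q0_matter_only = deceleration_parameter(1.0, 0.0)   # flat, Lambda = 0
q0_with_lambda = deceleration_parameter(0.3, 0.7)   # rough present-day split

print(f"matter only:   q0 = {q0_matter_only:+.2f}")
print(f"with Lambda:   q0 = {q0_with_lambda:+.2f}")
```

The sign flip between the two cases is the whole point: the Λ contribution wins as soon as ΩΛ0 exceeds ΩM0/2.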
Equation (10.3.24) intuitively reflects the fact that “ΩM0 leads to a deceleration
and a positive ΩΛ0 leads to an acceleration”. The direct measurement of q0 has
been going on for decades. One of the main difficulties is how to choose a suitable
object to measure (a distance indicator). The clustered galaxies used to be taken as
the measured objects, but they have a shortcoming: their own evolution brings
undetermined factors into the measurements. What we need is a distance indicator
that is not sensitive to evolution. Later, people found that Type Ia (also denoted
by 1a) supernovae can serve as ideal distance indicators, and the measurements of
them have become an active subject. A large number of observational results on high
redshift Type Ia supernovae published since 1998 [e.g., Riess et al. (1998); Perlmutter
et al. (1999)] have attracted huge attention internationally. These results indicate with a
high confidence level that: ① the cosmological constant is nonzero and positive; ②
unlike what people used to think, the present universe is experiencing an accelerating
expansion (the effect of ΩΛ0 exceeds the effect of ΩM0/2, which leads to q0 < 0).
Furthermore, the combination of these results and the observational results of the
anisotropy of the CMB provides the following quantitative result [Aghanim et al.
(2020)]:
Exercises
~10.1. Verify that the curvature tensor ⁽³⁾Rabc^d of the metric in (10.1.12) satisfies
$$ {}^{(3)}R_{ab}{}^{cd} = 2\bar{R}^{-2}\, \delta_a{}^{[c}\, \delta_b{}^{d]} . $$
10.2. Show that the world line of an isotropic observer is a geodesic. Hint: from
the expressions for the Christoffel symbols below (10.2.5) and (5.7.2), this
is almost obvious.
10.3. Derive the formula (10.2.8) for cosmological redshift from the following
steps:
(a) Show that any null geodesic η(β) (where β is an affine parameter) has
dω/dβ = −K a K b ∇a Z b , where
(c) Using the results of (a) and (b), derive that dω/ω = −da/a, which gives
(10.2.8).
10.4. The present age of the universe is the time it takes for the evolution from
a = 0 to a0 ≡ a(t0 ). Given any value of a, one can talk about the time it
takes for the scale factor of the universe to evolve to this value, which is
called the age of the universe corresponding to this value of a. Therefore,
the age t can be regarded as a function of a.
(a) Starting from (10.2.30) and (10.2.26), show that the age function of
a matter-dominated universe with Λ = 0 is given by the following three
equations:
for Ω0 = 1,
$$ t = \frac{2}{3} H_0^{-1} \left(\frac{a}{a_0}\right)^{3/2} , $$
for Ω0 > 1,
$$ t = H_0^{-1} \left\{ \frac{\Omega_0}{2(\Omega_0 - 1)^{3/2}} \cos^{-1}\!\left[ 1 - 2(1 - \Omega_0^{-1}) \frac{a}{a_0} \right] - \frac{1}{\Omega_0 - 1} \left[ \Omega_0 \frac{a}{a_0} - (\Omega_0 - 1) \frac{a^2}{a_0^2} \right]^{1/2} \right\} , $$
for Ω0 < 1,
$$ t = H_0^{-1} \left\{ \frac{-\Omega_0}{2(1 - \Omega_0)^{3/2}} \cosh^{-1}\!\left[ 1 + 2(\Omega_0^{-1} - 1) \frac{a}{a_0} \right] + \frac{1}{1 - \Omega_0} \left[ \Omega_0 \frac{a}{a_0} + (1 - \Omega_0) \frac{a^2}{a_0^2} \right]^{1/2} \right\} . $$
(b) Derive the expressions for the present age t0 of the universe in the cases
Ω0 = 1, Ω0 > 1, Ω0 < 1 from the above three equations.
~10.5. Show that the Einstein equation with the Λ-term does not admit a solution
having a flat metric even if there is no matter field (Tab = 0). Hint: find the
relation of R and T from the Einstein equation with the Λ-term, so that the
R in the equation can be eliminated. Then, it is easy to see that Rab cannot
vanish when Tab = 0.
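The three age functions in Exercise 10.4 can be checked numerically without deriving them. The sketch below compares each closed form with a direct midpoint-rule integration of dt = da/ȧ, assuming the standard matter-dominated Friedmann relation (ȧ/a)² = H0² [Ω0 (a0/a)³ + (1 − Ω0)(a0/a)²] with Λ = 0. That relation is written here from general knowledge rather than copied from (10.2.26) and (10.2.30), and all function names are hypothetical. Times are in units of H0⁻¹ and x stands for a/a0.

```python
import math

# Sketch: numerical check of the age functions of Exercise 10.4 (Lambda = 0, matter only).
# Assumption: (da/dt / a)^2 = H0^2 * [Omega0 * x**-3 + (1 - Omega0) * x**-2], x = a/a0.


def age_closed_form(x: float, omega0: float) -> float:
    """H0 * t(a) from the three closed-form expressions of the exercise."""
    if abs(omega0 - 1.0) < 1e-12:
        return (2.0 / 3.0) * x ** 1.5
    if omega0 > 1.0:
        d = omega0 - 1.0
        root = math.sqrt(omega0 * x - d * x * x)
        angle = math.acos(1.0 - 2.0 * (1.0 - 1.0 / omega0) * x)
        return omega0 / (2.0 * d ** 1.5) * angle - root / d
    d = 1.0 - omega0
    root = math.sqrt(omega0 * x + d * x * x)
    angle = math.acosh(1.0 + 2.0 * (1.0 / omega0 - 1.0) * x)
    return -omega0 / (2.0 * d ** 1.5) * angle + root / d


def age_numeric(x: float, omega0: float, steps: int = 20000) -> float:
    """H0 * t(a) by midpoint integration of sqrt(x' / (Omega0 + (1 - Omega0) x'))."""
    h = x / steps
    total = 0.0
    for i in range(steps):
        xm = (i + 0.5) * h
        total += math.sqrt(xm / (omega0 + (1.0 - omega0) * xm))
    return total * h


for omega0 in (0.3, 1.0, 2.0):
    print(omega0, age_closed_form(1.0, omega0), age_numeric(1.0, omega0))
```

Setting x = 1 gives H0 t0, which is the quantity asked for in part (b); the numerical column is only a consistency check and does not replace the derivation.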
References
Aghanim, N. et al. (2020), ‘Planck 2018 results. VI. Cosmological parameters’, Astron. Astrophys.
641, A6. arXiv:1807.06209.
Cyburt, R. H., Fields, B. D., Olive, K. A. and Yeh, T.-H. (2016), ‘Big bang nucleosynthesis: 2015’,
Rev. Mod. Phys. 88, 015004. arXiv:1505.01076.
Dodelson, S. and Schmidt, F. (2020), Modern Cosmology, Academic Press, London.
Ellis, G. F. R. (1989), The expanding universe: a history of cosmology from 1917 to 1960, in
D. Howard and J. Stachel, eds, ‘Einstein and the History of General Relativity’, Birkhäuser,
Boston, pp. 367–431.
Follin, B., Knox, L., Millea, M. and Pan, Z. (2015), ‘First detection of the acoustic oscillation
phase shift expected from the cosmic neutrino background’, Phys. Rev. Lett. 115, 091301.
arXiv:1503.07863.
Hicks, N. J. (1965), Notes on Differential Geometry, Van Nostrand, Princeton.
Kolb, E. W. and Turner, M. S. (1990), The Early Universe, Addison-Wesley Publishing Company,
Redwood City.
Appendix A: The Conversion Between Geometrized and Nongeometrized …
When discussing systems of units, one should pay attention to the distinction between
a quantity and a number. Besides quantity equations, what is more commonly used
are numerical-value equations. The form of a numerical-value equation depends on
the system of units, and thus when memorizing a physical formula we should also
remember in which system of units it holds. Since the speed of light in vacuum and
the gravitational constant are frequently involved in relativity, setting their numerical
values to 1 (i.e., c = G = 1) will simplify the equations a lot, and the corresponding
system of units is called the geometrized unit system. However, geometrized units
are inconvenient for calculating the numerical values of physical quantities. Now we
will introduce the conversion of physical equations between systems of geometrized
units and non-geometrized units (e.g., SI).
To avoid confusion, we will use bold and regular letters to represent quantities
and numbers, respectively (only in this appendix). A non-geometrized unit system
usually takes the time T , length L and mass M as the base quantities in mechanics.
In the geometrized unit system, since c = G = 1, only one of these three quantities
can be chosen arbitrarily, and hence we can say that there is only one base quantity.
For instance, one can choose time as the base quantity and choose s (second) as its
unit (base unit). However, the essence of c = 1 is to take the speed of light as a unit
of speed, and so one can consider the speed V as a base quantity in the geometrized
unit system, and the speed of light is a base unit. Similarly, G = 1 implies that
the gravitational constant G is also a base quantity in the geometrized unit system.
Therefore, one can also say that there are three base quantities, i.e., T , V and G. In
fact, the number of base quantities in the same system of units is flexible, and one
can choose them according to the specific context. Suppose A is an arbitrary quantity
whose numerical values in the International System of Units (SI) and the geometrized
unit system are A and Â, respectively, then their ratio
$$ \chi \equiv \frac{\hat{A}}{A} \tag{A.1} $$
is called the conversion factor of A between the two systems. The reason that χ is
not equal to 1 is that the units of T, L and M are different in the two systems. The
units of T, L and M in SI are the s, m and kg, respectively. In the geometrized unit
system, the only certain thing is that c = G = 1, while the units of T , L and M are
somewhat flexible. For the convenience of comparing the two systems, we stipulate
the time unit in the geometrized system to also be s. Under this stipulation, one
can determine the geometrized units of L and M using c = G = 1, and there is no
longer any flexibility (see Optional Reading A.1). According to dimensional analysis,
the relation between a derived unit and the base units is given by the dimensional
equation:
$$ [A] = [T]^{\tau} [L]^{\lambda} [M]^{\mu} . \tag{A.2} $$
When we only care about the conversion between the geometrized unit system and
SI (or the Gaussian unit system), the [T]^τ in this equation can be ignored since the
time unit is the same in the two systems:
$$ [A] = [L]^{\lambda} [M]^{\mu} . \tag{A.3} $$
What the dimensional equation describes is how a derived unit changes with a change
of the base units. For instance, once we treat the [L] and [M] in the above equation as
multiples of the units of the base quantities L and M, respectively, then [A] represents
the corresponding multiple of the unit of the derived quantity A. In this interpretation,
all of [A], [L] and [M] represent numbers, and (A.3) should be interpreted as a
numerical-value equation. Changes of the units of L and M lead to corresponding
changes of the units of the velocity V and the gravitational constant G, their relations
obey
$$ [V] = [L] , \qquad [G] = [L]^3 [M]^{-1} . \tag{A.4} $$
Solving (A.4) for [L] and [M] and substituting the result into (A.3) gives
$$ [A] = [V]^{\lambda + 3\mu} [G]^{-\mu} . \tag{A.5} $$
Suppose the multiples of the units of L and M when we turn from SI to the geometrized
system are [L] and [M], respectively; then the multiples of V and G are [V] and [G]
in (A.4), and the multiple of A is [A] in (A.5). Comparing with (A.1), we can see
that χ = [A]. The numerical values of the speed of light and the gravitational constant in
SI are c and G, which are both 1 in the geometrized system, and hence [V] = 1/c,
[G] = 1/G. Plugging these into (A.5) yields [A] = c^(−λ−3μ) G^μ. Therefore,
$$ \chi = c^{-\lambda - 3\mu} G^{\mu} . \tag{A.6} $$
Equations (A.1) and (A.6) indicate that to find the numerical value A in SI from the
value Â in the geometrized system, we only have to know the dimensional exponents
λ and μ of A with respect to the base quantities L and M, which can be easily derived
or looked up.
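The recipe in (A.6) is mechanical enough to script. The sketch below implements χ = c^(−λ−3μ) G^μ and applies it to the geometrized relation "Schwarzschild radius = 2 × mass", reading (A.1) as (geometrized value) = χ × (SI value), which is how the examples of this appendix use it. The function names and the solar mass used in the sample call are illustrative assumptions.

```python
# Sketch: conversion factor (A.6) between SI and the geometrized system (c = G = 1).
# Assumption: (geometrized value) = chi * (SI value), with chi = c**(-lam - 3*mu) * G**mu.

C_SI = 2.99792458e8   # speed of light, m/s
G_SI = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2


def chi(lam: float, mu: float) -> float:
    """Conversion factor for a quantity with dimension [L]^lam [M]^mu (time unit shared)."""
    return C_SI ** (-lam - 3.0 * mu) * G_SI ** mu


def schwarzschild_radius_si(mass_kg: float) -> float:
    """Convert the geometrized relation r_S = 2 M back to SI.

    Length has (lam, mu) = (1, 0) and mass has (lam, mu) = (0, 1), so
    chi(1, 0) * r_S = 2 * chi(0, 1) * M in SI numbers.
    """
    return 2.0 * chi(0.0, 1.0) * mass_kg / chi(1.0, 0.0)


r_sun = schwarzschild_radius_si(1.989e30)   # illustrative solar mass in kg
print(f"r_S(sun) ~ {r_sun:.0f} m")
```

The ratio chi(0, 1)/chi(1, 0) equals G/c², so the function reproduces the familiar SI form r_S = 2GM/c² without it ever being typed in by hand.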
Example 1 Find the expression of the Schwarzschild radius in SI from its expression
r̂S = 2M̂ in the geometrized system.
[dx^μ/dτ] = [T]⁻¹[L], and hence for the quantity dx^μ/dτ we have λ = 1, μ = 0,
χ = c⁻¹ and
gμν (dx̂^μ/dτ̂)(dx̂^ν/dτ̂) = c⁻² gμν (dx^μ/dτ)(dx^ν/dτ).
Example 3 In Sect. 10.2.2 we used the expression for the angular frequency of a
photon in the geometrized system, ω̂ = dt̂/dβ̂ (β is the affine parameter of the
photon’s world line). Find its form in SI.
Solution First we should figure out the dimension of β. The wave 4-vector
of the photon is K^a = (∂/∂β)^a. It follows from K^a = ω(∂/∂t)^a + k^a that [K^a] =
[k^a].¹ Since k^a = k^i (∂/∂x^i)^a, where k^i are the components of the wave 3-vector,
and [k^i] = [L]⁻¹, we have [K^i] = [L]⁻¹, and thus [β] = [L]². The expression ω̂ =
dt̂/dβ̂ can be written as ω̂ dβ̂/dt̂ = 1. Let Â ≡ ω̂ dβ̂/dt̂, then [A] = [T]⁻²[L]².
Hence λ = 2, μ = 0, χ = c⁻², Â = c⁻²A, i.e., ω̂ dβ̂/dt̂ = c⁻² ω dβ/dt, and so
ω = c² dt/dβ.
1 The dimension of a vector can be defined as the dimension of the real number (quantity) obtained
by acting the vector on a dimensionless scalar field. Similarly one can define the dimension of a
dual vector and a tensor.
In general relativity we often encounter tensors like gab, Rabc^d, Rab and R. When
it comes to the conversion of units, we will need to know the dimensions of these
quantities. For the convenience of the conversion, first we prove the following
conclusions (note that the indices can be unbalanced in a dimensional equation):
(1) [gab] = [L]²; (2) [∇a ωb] = [ωb]; (3) [Rabc^d] = 1; (4) [Rabcd] = [L]²; (5) [Rab] = 1; (6) [R] = [L]⁻². (A.7)
Proof (1) Since the essence of ds 2 = gμν dx μ dx ν is gab = gμν (dx μ )a (dx ν )b , we have
[gab ] = [ds 2 ] = [L]2 . (The readers who feel confused about this may consider it from
the perspective of the components. When x μ and x ν are both length coordinates, from
ds 2 = gμν dx μ dx ν and [ds 2 ] = [L]2 we can see that [gμν ] = 1. Then it follows from
gab = gμν (dx μ )a (dx ν )b that [gab ] = [L]2 . One should notice that, unlike [gab ] which
is absolute, [gμν ] relies on the dimension of the coordinates involved.)
(2) [∇a ωb ] = [∂a ωb ] = [(dx μ )a (dx ν )b ∂ων /∂ x μ ] = [(dx ν )b ][ων ] = [ωb ].
(3) ∇a ∇b ωc − ∇b ∇a ωc = Rabc d ωd . Considering that [∇a ∇b ωc ] = [ωc ], we have
[Rabc d ωd ] = [ωd ], and thus [Rabc d ] = 1.
(4) [Rabcd ] = [gde Rabc e ] = [L]2 .
(5) [Rac ] = [g bd Rabcd ] = [L]−2 [L]2 = 1.
(6) [R] = [g ac Rac ] = [L]−2 .
Example 4 Find the form of Einstein’s equation in SI from its form R̂ab − R̂ĝab/2 =
8πT̂ab in the geometrized system.
Solution For simplicity (and without loss of generality), take a perfect fluid as an
example. The energy-momentum tensor of a perfect fluid is
$$ \hat{T}_{ab} = (\hat{\rho} + \hat{p})\, \hat{U}_a \hat{U}_b + \hat{p}\, \hat{g}_{ab} . $$
Since the dimensions of the terms being summed are the same, we only have
to consider how the equation R̂ab = 8πp̂ĝab transforms. [Rab] = 1 leads to
R̂ab = Rab. Also [p gab] = [M][L]⁻¹[T]⁻²·[L]² = [M][L][T]⁻², and hence for
the quantity p gab we have λ = μ = 1, χ = c⁻⁴G. Thus, p̂ĝab = c⁻⁴G p gab, and so
Rab = 8πc⁻⁴G p gab. Therefore, the form of Einstein’s equation in SI is
$$ R_{ab} - \frac{1}{2} R\, g_{ab} = \frac{8\pi G}{c^4}\, T_{ab} . \tag{A.8} $$
Until now we have only discussed the conversion of units in mechanics. Although
we used SI as an example of a non-geometrized system, the discussion also
applies to the Gaussian system. However, when electromagnetism is involved, one
needs to add a fourth base quantity, and the difference between SI and the Gaussian
system is then revealed. The fourth base quantity in SI is the electric current $I$, whose
base unit is the ampere; the fourth base quantity in the Gaussian system is the permittivity
$\varepsilon$, whose base unit is the permittivity of the vacuum $\varepsilon_0$ (and thus the number
$\varepsilon_0 = 1$). Correspondingly, the equations in the geometrized system that involve
electromagnetic quantities also have two forms, which may be called the “geometrized
SI” and the “geometrized Gaussian system”. Besides the basic requirement $c = G = 1$,
the geometrized Gaussian system also requires $\varepsilon_0 = 1$, while the geometrized SI
stipulates that the unit of electric current is the ampere. To match the international
literature, we adopt the geometrized Gaussian system for all the equations in this text
that have electromagnetic quantities. The equations that do not have electromagnetic
quantities have the same form in the two geometrized systems. It is not difficult to see
that the method above can be applied to both the conversion from the geometrized
Gaussian system to the Gaussian system and that from the geometrized SI to SI.
For instance, it is straightforward for the reader to convert the form of the RN line
element in the geometrized Gaussian system
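$$ds^2 = -\left(1 - \frac{2M}{r} + \frac{Q^2}{r^2}\right) dt^2 + \left(1 - \frac{2M}{r} + \frac{Q^2}{r^2}\right)^{-1} dr^2 + r^2 (d\theta^2 + \sin^2\theta\, d\varphi^2), \tag{A.9}$$
to the following form in the Gaussian system:
$$ds^2 = -\left(1 - \frac{2GM}{c^2 r} + \frac{G Q^2}{c^4 r^2}\right) c^2 dt^2 + \left(1 - \frac{2GM}{c^2 r} + \frac{G Q^2}{c^4 r^2}\right)^{-1} dr^2 + r^2 (d\theta^2 + \sin^2\theta\, d\varphi^2). \tag{A.10}$$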
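The individual replacements behind this conversion can be written out term by term. The following is a sketch under two assumptions: that each geometrized value equals $c^{-\lambda-3\mu}G^{\mu}$ times the Gaussian value for a quantity of dimension $[L]^\lambda[M]^\mu[T]^\tau$, and that the Gaussian charge has $[Q]^2 = [M][L]^3[T]^{-2}$ (from Coulomb's law $F = Q^2/r^2$).

```latex
% Replacement rules, geometrized value on the left, Gaussian expression on the right
\begin{align*}
  M &\to \frac{G}{c^{3}}\,M, &
  r &\to \frac{r}{c}, &
  t &\to t, \\
  Q^{2} &\to \frac{G}{c^{6}}\,Q^{2}, &
  ds^{2} &\to \frac{ds^{2}}{c^{2}}, &
  dr^{2} &\to \frac{dr^{2}}{c^{2}}, \\
  \frac{2M}{r} &\to \frac{2GM}{c^{2}r}, &
  \frac{Q^{2}}{r^{2}} &\to \frac{GQ^{2}}{c^{4}r^{2}}. &&
\end{align*}
% Substituting into the geometrized line element and multiplying both sides by c^2
% turns dt^2 into c^2 dt^2 and restores dr^2 and r^2 (d\theta^2 + \sin^2\theta\, d\varphi^2).
```

The angular coordinates are dimensionless and are therefore unchanged.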
To facilitate lookup, we list some of the equations involving electromagnetic quan-
tities in the form of geometrized SI as follows (the equation numbers without * are
the corresponding equations in the geometrized Gaussian system):
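$$\partial^a F_{ab} = -\varepsilon_0^{-1} J_b, \tag{6.6.10*}$$
$$\vec\nabla \cdot \vec E = \frac{\rho}{\varepsilon_0}, \quad \vec\nabla \times \vec E = -\frac{\partial \vec B}{\partial t}, \quad \vec\nabla \cdot \vec B = 0, \quad \vec\nabla \times \vec B = \mu_0 \vec j + \frac{\partial \vec E}{\partial t}. \tag{6.6.12*}$$
$$T_{ab} = \varepsilon_0 \left(F_{ac} F_b{}^c - \frac{1}{4} \eta_{ab} F_{cd} F^{cd}\right), \tag{6.6.28*}$$
$$T_{ab} = \frac{\varepsilon_0}{2} \left(F_{ac} F_b{}^c + {}^*F_{ac}\, {}^*F_b{}^c\right), \tag{6.6.28$'$*}$$
$$T_{00} = \frac{\varepsilon_0}{2} (E^2 + B^2), \qquad w_i = -T_{i0} = \varepsilon_0 (\vec E \times \vec B)_i, \quad i = 1, 2, 3, \tag{no number}$$
$$ds^2 = -\left(1 - \frac{2M}{r} + \frac{Q^2}{4\pi\varepsilon_0 r^2}\right) dt^2 + \left(1 - \frac{2M}{r} + \frac{Q^2}{4\pi\varepsilon_0 r^2}\right)^{-1} dr^2 + r^2 (d\theta^2 + \sin^2\theta\, d\varphi^2), \tag{8.4.26*}$$
$$F_{ab} = -\frac{Q}{4\pi\varepsilon_0 r^2} (dt)_a \wedge (dr)_b, \quad \text{or} \quad A_a = -\frac{Q}{4\pi\varepsilon_0 r} (dt)_a. \tag{8.4.27*}$$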
All of the $1/2\pi$ in (8.8.7) are changed to $2\varepsilon_0$, and all of the factors 2 in (8.8.8) and
(8.8.9) are changed to $8\pi\varepsilon_0$. The $-2\pi J_\mu$ in Exercise 8.10 is changed to $-\frac{1}{2}\varepsilon_0^{-1} J_\mu$.
[Optional Reading A.1]
This optional reading further introduces the geometrized system (still restricted to
mechanics). Question: How large are the units of the length L and mass M (as quanti-
ties)? It will be convenient for answering this question if we choose T , V and G as the base
quantities. The dimension equations of L and M with respect to these three base quantities
are
$$[L] = [T][V], \qquad [M] = [T][V]^3 [G]^{-1}. \tag{A.11}$$
Let $L_G$ and $L_I$ represent the numbers obtained by measuring the same length using the
length units of the geometrized system and of SI respectively; then $L_G / L_I = [L]$. The two
systems share the same time unit, so $[T] = 1$, while the speed of light has the values 1 and
$c = 3 \times 10^8$ in the geometrized system and in SI respectively, so $[V] = 1/c$. Hence
$[L] = 1/c$, and the above equation gives $L_I = c L_G$. Thus,
$$\text{length unit in the geometrized system} = c \times \text{length unit in SI} = 3 \times 10^8\ \text{m}. \tag{A.12}$$
Similarly, it follows from the second equation in (A.11) and $[G] = 1/G$ (where the number
$G = 6.67 \times 10^{-11}$) that
$$\text{mass unit in the geometrized system} = \frac{c^3}{G} \times \text{mass unit in SI} = \frac{(3 \times 10^8)^3}{6.67 \times 10^{-11}}\ \text{kg} = 4 \times 10^{35}\ \text{kg}. \tag{A.13}$$
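The sizes quoted in (A.12) and (A.13) are easy to reproduce numerically. The sketch below uses the rounded values of $c$ and $G$ from the text; the solar mass of $1.989 \times 10^{30}$ kg is an outside value added only for illustration, and the variable names are not part of the text.

```python
C_NUMBER = 3.0e8       # numerical value of the speed of light in SI, as rounded in the text
G_NUMBER = 6.67e-11    # numerical value of the gravitational constant in SI, as in the text

# (A.12): one geometrized length unit expressed in metres
length_unit_in_m = C_NUMBER
# (A.13): one geometrized mass unit expressed in kilograms
mass_unit_in_kg = C_NUMBER**3 / G_NUMBER

# Illustration (assumed value): the solar mass measured in geometrized mass units
SOLAR_MASS_KG = 1.989e30
solar_mass_geometrized = SOLAR_MASS_KG / mass_unit_in_kg

print(f"length unit = {length_unit_in_m:.1e} m")
print(f"mass unit   = {mass_unit_in_kg:.2e} kg")
print(f"solar mass  = {solar_mass_geometrized:.2e} geometrized mass units")
```

The smallness of the last number reflects how large the geometrized mass unit is compared with astrophysical masses.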
On the other hand, when we do not need to change the units of V and G, it is quite beneficial to
regard the geometrized system as having only one base quantity T . In this case, we can view
three originally different quantities—time, length and mass—as the same type of quantity.
The “key” to identifying them is to regard 1 s, $3 \times 10^8$ m and $4 \times 10^{35}$ kg as equal, i.e.,
$$1\ \text{s} = 3 \times 10^8\ \text{m} = 4 \times 10^{35}\ \text{kg}.$$
The geometrized unit system is very convenient for general relativity. For a quantum
theory that does not involve gravity, a natural unit system is frequently used, in
which $c = \hbar = 1$. Depending on the field involved, sometimes one can also set a
third physical constant to 1. For example, $k_B$ (the Boltzmann constant) is set to 1
when thermodynamics is involved, $m_e$ (the value of the electron mass) is set to 1
when atomic physics is involved, $m_p$ or $m_n$ (the value of the proton or neutron mass)
is set to 1 when nuclear physics is involved, and $G = 1$ when gravity is involved
(e.g., a theory of quantum gravity). The unit system with $G = c = \hbar = 1$ is also
called the Planck unit system. Now we discuss the conversion between the Planck
system and SI. Compared with the geometrized system, the Planck system has an
additional constraint $\hbar = 1$ besides $G = c = 1$, which prevents one from choosing
the unit of time (and thus all the quantities) arbitrarily. Therefore, we should start
from (A.2) [instead of (A.3)], and change (A.4) to
It is not difficult to show that the “unique” quantity with time dimension constructed
from the speed of light, the gravitational constant, and the reduced Planck constant is
the Planck time $t_P$, whose numerical value in SI is $t_P = (G\hbar/c^5)^{1/2} \sim 10^{-43}$ (s),
where $c$, $G$ and $\hbar$ are the numerical values of the speed of light, the gravitational
constant, and the reduced Planck constant in SI, respectively. In the Planck system the
speed of light, the gravitational constant and the Planck time all have the value 1,
and hence $[V] = 1/c$, $[G] = 1/G$, $[T] = 1/t_P$. Suppose $\tilde\chi$
is the conversion factor for the quantity $A$ between SI and the Planck system; then it
follows from (A.16) that
$$\tilde\chi = c^{-\lambda-3\mu}\, G^{\mu}\, t_P^{-(\lambda+\mu+\tau)} = c^{-\lambda-3\mu}\, G^{\mu}\, (G\hbar/c^5)^{-(\lambda+\mu+\tau)/2}. \tag{A.17}$$
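Formula (A.17) lends itself to a direct numerical check. The sketch below is illustrative only: the function name and the SI values of $G$ and $\hbar$ are choices made here, not taken from the text. It evaluates the factor for a quantity with $\lambda = 2$, $\mu = 1$, $\tau = -1$ (the dimension of an action) and confirms that it equals $1/\hbar$.

```python
import math

C_SI = 299_792_458.0        # speed of light, m/s
G_SI = 6.674e-11            # gravitational constant, m^3 kg^-1 s^-2
HBAR_SI = 1.054_571_8e-34   # reduced Planck constant, J s

# Planck time t_P = (G*hbar/c^5)^(1/2), in seconds
T_PLANCK = math.sqrt(G_SI * HBAR_SI / C_SI**5)


def planck_factor(lam, mu, tau):
    """Conversion factor of (A.17) for a quantity of dimension [L]^lam [M]^mu [T]^tau."""
    return C_SI ** (-lam - 3 * mu) * G_SI**mu * T_PLANCK ** (-(lam + mu + tau))


# A quantity with the dimension of an action: [T]^-1 [M] [L]^2
factor_action = planck_factor(2, 1, -1)

print(f"t_P = {T_PLANCK:.3e} s")
print(f"factor * hbar = {factor_action * HBAR_SI:.12f}")
```

A dimensionless quantity ($\lambda = \mu = \tau = 0$) gives a factor of exactly 1, as it should.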
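Example 5 The relation of the energy $E$ and the frequency $\nu$ in the Planck form
reads $E = 2\pi\nu$. Find its form in SI.
Solution Suppose $A \equiv E/\nu$; then $[A] = [E][\nu]^{-1} = [T]^{-1}[M][L]^2$, and hence $\tau = -1$,
$\mu = 1$, $\lambda = 2$. Plugging this into (A.17) yields $\tilde\chi = c^{-5} G\, (G\hbar/c^5)^{-1} = \hbar^{-1}$.
The Planck-system value of $E/\nu$ is therefore $\hbar^{-1}$ times its SI value, so the form in SI is
$E = 2\pi\hbar\nu = h\nu$.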
Example 6 The “unique” quantity with mass dimension constructed from the speed
of light, the gravitational constant, and the reduced Planck constant is the Planck
mass $m_P$, whose numerical value in the Planck system is 1. Find its numerical
value $m_P$ in SI.
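One way to approach Example 6 numerically is to apply (A.17) to a pure mass ($\lambda = 0$, $\mu = 1$, $\tau = 0$), which gives $\tilde\chi = c^{-3} G\, t_P^{-1}$, and to require that the Planck-system value $\tilde\chi\, m_P$ equal 1. The sketch below follows that route; the variable names and the SI constants are assumptions made for illustration.

```python
import math

C_SI = 299_792_458.0        # speed of light, m/s
G_SI = 6.674e-11            # gravitational constant, m^3 kg^-1 s^-2
HBAR_SI = 1.054_571_8e-34   # reduced Planck constant, J s

# Planck time in seconds, t_P = (G*hbar/c^5)^(1/2)
t_planck = math.sqrt(G_SI * HBAR_SI / C_SI**5)

# (A.17) with lam = 0, mu = 1, tau = 0: factor = c^-3 * G * t_P^-1
factor_mass = C_SI**-3 * G_SI / t_planck

# The Planck-system value is 1 = factor_mass * m_P, so the SI value is:
m_planck_kg = 1.0 / factor_mass

# Closed form obtained by simplifying c^3 * t_P / G
m_planck_closed_form = math.sqrt(HBAR_SI * C_SI / G_SI)

print(f"m_P = {m_planck_kg:.3e} kg")
```

Both routes agree, giving $m_P = (\hbar c/G)^{1/2}$, of order $10^{-8}$ kg.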
Exercises
A.1. The relation between the energy, mass and momentum of a point mass reads
$E^2 = m^2 + p^2$ in the geometrized system. Find its form in SI.
A.2. Find the form of the hydrostatic equation in SI from its form $dp/dr = -\rho m/r^2$
in the geometrized system.
Reference
Sachs, R. K. and Wu, H. (1977), General Relativity for Mathematicians, Springer-Verlag, New York.
Conventions and Notation
Note on Conventions
(1) Starting from Sect. 2.6, this work has adopted the abstract index notation to
represent tensors. For instance, $v^a$ represents a vector, where the Latin letter
$a$, called an abstract index, plays a role similar to that of the arrow in the commonly
used notation $\vec v$. Do not interpret $v^a$ as the $a$th component of $\vec v$. When talking
about the components we use Greek letters as the indices (called component
indices or concrete indices); for example, $v^\mu$ represents the $\mu$th component of the
vector $v^a$. There is only one exception: a vector $v^a$ in a 4-dimensional spacetime
has three spatial components, for which we will use the most commonly used
convention, i.e., using $v^i$ (where $i = 1, 2, 3$) to represent the $i$th component of
$v^a$. Although this violates the stipulation of “using Latin letters to represent the
abstract indices”, it is convenient in many ways. In order to distinguish them from
the abstract indices $a, b, c, d, e, \ldots$, we only use Latin letters starting from $i$
(usually $i, j, k$) as the labels for the spatial components. Practice has shown that
this can effectively avoid confusion. For more details about the index notation,
see Sect. 2.3.
(2) This work adopts the signature convention $- + + +$ for the metric of 4-dimensional
spacetime.
(3) The definitions of the Riemann tensor $R_{abc}{}^d$ and the Ricci tensor $R_{ab}$ have various
conventions in the literature. This work follows the conventions of Wald (1984).
Notation List
{ } Set. First appears in Sect. 1.1. E.g., X = {1, 4, 5.6} stands for the set
formed by the real numbers 1, 4 and 5.6.
$\mathbb{R}$ The set of real numbers. First appears in Sect. 1.1.
$\mathbb{N}$ The set of natural numbers. First appears in Sect. 1.3.
$S^n$ $n$-dimensional sphere.
∀x For all x. First appears in Sect. 1.1.
∃ There exists. First appears in Sect. 1.1.
∈ Belongs to. First appears in Sect. 1.1. E.g., x ∈ X stands for “x belongs
to the set X ”, i.e., x is an element of X .
∈/ Does not belong to. First appears in Sect. 1.1.
⊂ Contained in. First appears in Sect. 1.1. E.g., A ⊂ X stands for “A is
contained in the set X ”, i.e., A is a subset of X .
⊊ Contained in but not equal to. First appears in Sect. 1.1. E.g., A ⊊ X
stands for “A is contained in but not equal to the set X ”, i.e., A is a
proper subset of X .
∪ Union (see Definition 2 of Sect. 1.1).
∩ Intersection (see Definition 2 of Sect. 1.1).
− Difference of sets, e.g., A − B stands for the difference of the sets A
and B (see Definition 2 of Sect. 1.1).
−A Complement of A (see Definition 2 of Sect. 1.1).
∅ Empty set. First appears in Sect. 1.1.
:= Defined as. First appears in Sect. 1.1.
≡ Identical to or denoted by. First appears in Sect. 1.1. E.g., A ≡ B ∪ C
means “denote B ∪ C by A”.
≅ Approximately equal to.
⇒ Implies (if ... then), e.g., A ⇒ B stands for “if A then B”.
⇔ Equivalent to (if and only if).
× Cartesian product (see Definition 3 of Sect. 1.1).
Q.E.D. (Denotes the end of a proof, aligned to the right.)
$\mathbb{R}^n$ The set of $n$-tuples $(x^1, \ldots, x^n)$ of real numbers, i.e., $\mathbb{R}^n = \mathbb{R} \times \cdots \times \mathbb{R}$
($n$ factors in total).
⊗ Tensor product (see Definition 2 of Sect. 2.4).
:→ Map. First appears in Sect. 1.1. E.g., f : X → Y stands for “the map
from X to Y ”.
f [A] Suppose f : X → Y , A ⊂ X , then the image of A under the action of
f is denoted by f [A] in order to distinguish it from the image f (x) of
x ∈ X under f .
↦ Maps to (image of a function), e.g., suppose f : X → Y , x ∈ X , y ∈ Y ,
then x ↦ y stands for “the image of x is y”.
◦ Composite map. First appears in Sect. 1.1. E.g., φ ◦ ψ stands for the
composite map of φ and ψ (first ψ, then φ).
$(X, \mathscr{T})$ The topological space with $X$ as the base set and $\mathscr{T}$ as the topology (see
Definition 2 of Sect. 1.2 and the following paragraph).
$\mathscr{T}_u$ Usual topology (see Example 3 of Sect. 1.2).
$C^r$ The first $r$ derivatives exist and are continuous.
$C^\infty$ Smooth (derivatives of all orders exist and are continuous).
Reference
Gravitational potential, 178, 239, 275, 294, 425
Gravitational radiation, 296, 348
Gravitational redshift, 414, 461, 463
Gravitational wave, 296
  cross-polarized, 307
  plus-polarized, 307
  polarization modes of, 299, 324
  primordial, 515
Graviton, 308, 326
Group, 35
H
Harmonic coordinate condition, 398
Harmonic function, 398
Hausdorff space, 14
Hodge dual, see dual differential form
Homeomorphism, 9
Homogeneity, 468, 471
  spatial homogeneity, 469
Homogeneous, 471
Hubble constant, 488
Hubble-Lemaître law, 488
Hubble parameter, 489
Hubble tension, 500
Hulse-Taylor binary, 318
Hydrostatic equilibrium, 426
Hypersurface, 119
  null, 122, 176, 228, 315, 452
  spacelike, 122, 170, 176, 398
  timelike, 122
Hypersurface orthogonal, 333, 340, 453
I
Identity map, see map
Incomplete geodesic, 440, 449
Incomplete vector field, see complete vector field
Indiscrete topology, 7
Inertial coordinate system, 165, 168
Inertial coordinate time, 171
Inertial force, 262, 267
Inertial mass, 241
Inertial reference frame, see reference frame
Inextensible integral curve, 35
Infinitesimal coordinate transformation, 290
Inflation, 317, 467
Injection, 4
Instantaneous observer, see observer
Instantaneous rest (inertial) reference frame (observer, coordinate system), 199, 263, 387
Integral curve, 34, 228
Integral of a function, 145
Integration on manifolds, 134
Interior, 12
Interior Schwarzschild solution, 428
Intersection, 2
Intrinsic curvature, 92, 100
Invariant, 195
Inverse image, 3
Inversion, 54
Isometry, 113, 333, 336, See also Exercise 4.12
Isometry group, 337
Isotropic coordinate system, 397
Isotropic observer, 471
Isotropic reference frame, see reference frame
Isotropic spacetime, 471
Isotropy, 212, 468
J
Jacobi identity, see Exercise 2.8
K
Killing equation, 114
Killing vector field, 113
Kinnersley metric, 383, 387
Kruskal extension (coordinates), 446
L
ΛCDM model, 523
Laser interferometer, 319
Leaf, 469, 470
Left-handed system (basis), 136
Length contraction, 179, 189, 219
Lie derivative, 110
Lightlike vector, 48
LIGO, 319
Line element, 50
  induced, 84
Linear approximation, 287
Linearized Einstein (field) equation, 288
Linearized Einstein tensor, 288
Linearized Riemann tensor, 288
Local inertial frame, 248, 269
Locally unique, 34, 85
Local Lorentz frame (system), 269
Local measurement, 196, 410, 424
Lorentz contraction, see length contraction
Lorentz covariance, 189, 239
Lorentz 4-force, 223, See also Exercise 6.18
U
Union, 2
Universe, 467
Usual topology, 7
V
Vacuum Einstein equation, see Einstein (field) equation
Vacuum Schwarzschild solution, 341
Vaidya metric, 378
Vector, 24
  (components) transformation law, 28
Vector field, 32
W
Wave 3-vector, 228, 305
Wave 4-vector, 228, 248, 303, 413
Wavefront, 228
Weak Equivalence Principle (WEP), 267
Weber bar, 318
Wedge product, 130
Weyl tensor, 96, 363, 374
  component in a null tetrad, 363, 374
White dwarf, 433
White hole, 452
World line, 164
World sheet, 175, 180, 228, See also Exercise 5.10