Graduate Texts in Physics

Canbin Liang
Bin Zhou

Differential Geometry and General Relativity
Volume 1
Translated and Revised by
Weizhen Jia and Bin Zhou
Graduate Texts in Physics

Series Editors
Kurt H. Becker, NYU Polytechnic School of Engineering, Brooklyn, NY, USA
Jean-Marc Di Meglio, Matière et Systèmes Complexes, Bâtiment Condorcet,
Université Paris Diderot, Paris, France
Sadri Hassani, Department of Physics, Illinois State University, Normal, IL,
USA
Morten Hjorth-Jensen, Department of Physics, Blindern, University of Oslo, Oslo,
Norway
Bill Munro, NTT Basic Research Laboratories, Atsugi, Japan
Richard Needs, Cavendish Laboratory, University of Cambridge, Cambridge, UK
William T. Rhodes, Department of Computer and Electrical Engineering and
Computer Science, Florida Atlantic University, Boca Raton, FL, USA
Susan Scott, Australian National University, Acton, Australia
H. Eugene Stanley, Center for Polymer Studies, Physics Department, Boston
University, Boston, MA, USA
Martin Stutzmann, Walter Schottky Institute, Technical University of Munich,
Garching, Germany
Andreas Wipf, Institute of Theoretical Physics, Friedrich-Schiller-University Jena,
Jena, Germany
Graduate Texts in Physics publishes core learning/teaching material for graduate-
and advanced-level undergraduate courses on topics of current and emerging fields
within physics, both pure and applied. These textbooks serve students at the MS- or
PhD-level and their instructors as comprehensive sources of principles, definitions,
derivations, experiments and applications (as relevant) for their mastery and teaching,
respectively. International in scope and relevance, the textbooks correspond to course
syllabi sufficiently to serve as required reading. Their didactic style, comprehensive-
ness and coverage of fundamental material also make them suitable as introductions
or references for scientists entering, or requiring timely knowledge of, a research
field.
Canbin Liang · Bin Zhou

Differential Geometry
and General Relativity
Volume 1
Canbin Liang
Department of Physics
Beijing Normal University
Beijing, China

Bin Zhou
Department of Physics
Beijing Normal University
Beijing, China

Translated by
Weizhen Jia
Department of Physics
University of Illinois Urbana-Champaign
Urbana, IL, USA

Bin Zhou
Department of Physics
Beijing Normal University
Beijing, China

ISSN 1868-4513 ISSN 1868-4521 (electronic)
Graduate Texts in Physics
ISBN 978-981-99-0021-3 ISBN 978-981-99-0022-0 (eBook)
https://doi.org/10.1007/978-981-99-0022-0

Jointly published with Science Press


The print edition is not for sale in China (Mainland). Customers from China (Mainland) please order the
print book from: Science Press.

Translation from the Chinese Simplified language edition: “微分几何与广义相对论/Wei fen ji he yu
guang yi xiang dui lun” by Canbin Liang and Bin Zhou, © Science Press 2006. Published by Science
Press. All Rights Reserved.
© Science Press 2023
This work is subject to copyright. All rights are reserved by the Publishers, whether the whole or part of
the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology
now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publishers, the authors, and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publishers nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publishers remain neutral with regard to jurisdictional
claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
Professor Canbin Liang (1938–2022) giving a lecture in 2012, photograph by Gui-Rong Liang
Foreword

I first met Canbin Liang during a visit I made to China in 1980. Liang had just
received authorization from the Chinese government to come to the United States
for two years as a visiting scholar, and he asked if I could serve as his advisor. I
agreed, and Liang spent 1981–83 in my group at the University of Chicago.
At that time, China had just emerged from the Cultural Revolution, so, as might
be expected, Liang had very little knowledge of modern developments in general
relativity when he arrived. But he had an enormous interest in acquiring a deep
understanding of the foundations of general relativity and a tremendous dedication
to doing so. At the time, I was in the midst of writing my General Relativity book and
was giving considerable thought as to how the subject should be presented. Liang
carefully read drafts of my book and we had many discussions of issues in general
relativity. Liang also had many interactions with Bob Geroch during his years in
Chicago.
After he returned to China, Liang and I maintained our friendship and we
continued to correspond on many issues in general relativity. Liang devoted himself
to teaching the modern approach to general relativity to students in China and he
published his book Differential Geometry and General Relativity in Chinese. I learned
from a number of students who came from China to our Ph.D. program at the University
of Chicago that this book—as well as Liang’s renowned lecture course in general
relativity—was instrumental in introducing generations of Chinese students to the
modern approach to general relativity.
Liang’s book is quite similar to my General Relativity book in its coverage and
presentation of the core material. However, while my treatment of the mathematical
material focuses on presenting the general, abstract results in as concise a way as
possible, Liang’s book provides many simple examples that illustrate these results.
Liang also makes considerable effort to warn readers of possible pitfalls that can
arise in their understanding of the material. I believe that many readers will find these
examples and additional discussion quite helpful for working through the meaning
of the general concepts.

I am very pleased that Differential Geometry and General Relativity has been
translated into English, so that students outside of China now also can benefit from
Liang’s insights and pedagogy.

January 2023 Robert Wald


Charles H. Swift Distinguished Service
Professor of Physics
University of Chicago
Chicago, USA
Preface to the English Edition

The present book is translated and revised based on the second edition of the original
Chinese text under the guidance of the first author, the late Prof. Canbin Liang. As
one of the most popular Chinese books on theoretical physics, this work has already
influenced generations of Chinese physicists since it first came out in 2000. Now
we are so pleased that this work is translated into English and can be accessed by
a broader group of readers. As Chinese readers can tell, the writing style
of Prof. Liang is quite distinctive, as he sometimes uses idioms, or even self-created
expressions, in order to make the text more vivid and intuitive to readers. Although
this brings certain difficulties to the translation, we have made our best attempt to
keep the style of the original text, so that this translation can be not only faithful to
but also as expressive as the original work.
Apart from translation, we also implemented a sizable amount of revision to this
work. From minor typo fixes to major content updates, every single chapter has been
improved to some extent. Some revision was based on the personal notes of Prof.
Liang himself, as he has been organizing possible improvements to this work based on
his teaching experience over the years. Besides this, the parts that have been heavily
modified are primarily those on gravitational radiation (Sect. 7.9) and cosmology
(Chap. 10), due to the rapid developments in these areas over the past two decades.
For Sect. 7.9, we expanded the discussions on the gauge conditions and gravitational
plane waves substantially. We also replaced the out-of-date introduction on the detec-
tion of gravitational waves with a new one, which includes the discussions on the
interferometric detection and the recent progress since the first direct observation by
LIGO. For Chap. 10, we decided to revise and upgrade the content of the original
chapter comprehensively, and it is now divided into two parts. The first half is the
new Chap. 10, which sets up the geometric foundation for cosmology, and focuses
on the standard cosmological model. Particularly, we enhanced our mathematical
descriptions for the spatial geometries of the universe and updated the observational
data to the latest ones. The second half will introduce the (currently in development)
“new standard cosmological model”, which includes inflation, dark matter and dark
energy. This part will be presented in Volume II along with other advanced topics.

As has been mentioned in the prefaces of the Chinese editions, this work was
influenced by many other textbooks, especially the classic text General Relativity by
Robert Wald. Many discussions in this work can be viewed as extensions of those
concise and incisive lines in Wald’s book, making them more accessible to beginners.
However, the aim of this work is in no way to be a replacement for Wald’s book.
In fact, we encourage readers to refer to his book after reading this text. As Prof.
Liang said, just like Wald’s book does a great job of paving the way for readers to
understand The Large Scale Structure of Space-Time, the masterpiece by Stephen
Hawking and George Ellis, one of the goals of this work is to pave the way for
reading Wald’s book (or one at that level). In addition, the material in these books
is also complementary in many ways. We hope this work, especially Volume I, can
be an initiation for the beginners to differential geometry and general relativity, and
can open up the world for them to absorb further knowledge from other great works.
We would like to thank those who helped us and contributed to this work. First,
we express special thanks to Prof. James Nester, who has supported the preparation
of this English edition since day one. He read the translated manuscript of this
entire book carefully, and provided not just English refinements, but also plenty of
professional comments, which notably improved the quality of this book. We also
benefited a lot from discussions with him. The translator Weizhen Jia would like to
thank his friend Brandon Buncher for his constant support. He read a large portion
of the translated manuscript and always offered nice suggestions when Jia consulted
him. In particular, he helped a lot with those expressions used by Prof. Liang that are
hard to translate, which makes this work more accessible to readers from Western
(and other non-Chinese) backgrounds. We would also like to sincerely thank Prof.
Zhoujian Cao, who provided valuable suggestions on the manuscript of Sect. 7.9,
and has always been very supportive of us during the preparation of this work. We
also appreciate Prof. Bin Hu for his comments and suggestions on the manuscript of
Chap. 10.
We would like to thank Nick Abboud and Marcus Rosales for their helpful
comments, and Jinhuan He for providing a translation draft for part of Chap. 9. In
addition, we thank Prof. Robert Wald for his support and the lovely foreword to the
English edition. We would also like to express thanks to Prof. Jerzy Lewandowski,
Prof. Zheng Zhao, Prof. Sijie Gao, Prof. Yongge Ma and Prof. Tong-Jie Zhang
for their support of the publication of this work, and to Dr. Mengchu Huang from
Springer Nature for his assistance. We are also grateful to Ms. Jinxing Zhang for her
generous help.
Prof. Liang was an extraordinary educator who dedicated his entire life to passing
on the knowledge of physics to the next generation. At age 84, he was still giving
lectures even just a few days before he became critically ill. He left us forever soon
after that, just when this volume was about to be finalized; it is unfortunate that he
could not be here in person to see it being published. We hope that the outcome of
this effort can benefit more readers around the world, fulfilling a wish of our beloved
and highly esteemed professor.

January 2023 Weizhen Jia


University of Illinois at Urbana-Champaign
Urbana, USA
Bin Zhou
Beijing Normal University
Beijing, China
Preface to the Second Edition

Since its publication in 2000, the first edition of this book (the first volume) has
attracted attention and received praise in the field of theoretical physics, especially
in general relativity, in China. Due to a small print run, it was sold out in only 2 years.
Since its publication, I have been using this book as a textbook; thus far, I have used
it five times to teach graduates and undergraduates. In addition to the teaching at
Beijing Normal University, I was also invited to give lectures to both undergraduates
in the Fundamental Science Class of Tsinghua University and graduate students
from the Academy of Mathematics and Systems Science at the Chinese Academy of
Sciences (CAS). The classes at the CAS have also attracted dozens of graduates and
undergraduates from other departments of the CAS and from another 11 institutions
of higher learning, including Peking University and Tsinghua University. I realized
the importance of this work in promoting the subject, and at the same time, I also found
that there are some mistakes and deficiencies in this work. Needing to improve the
content and to supplement the work with more details, I set out to write a draft of the
second edition. While creating the new draft, I consulted with many of my colleagues
and students, including (in alphabetical order by their last names) Zhoujian Cao,
Muxin Han, Zhiquan Kuang, Yongge Ma, Zhi Wang, Xiaoning Wu, Xuejun Yang,
Hao Zhang, Hongbao Zhang, Bin Zhou and Meike Zhou. Among them, Zhoujian Cao,
Zhiquan Kuang, Hongbao Zhang and Bin Zhou have made outstanding contributions.
Through many discussions with Dr. Bin Zhou, I found that he not only has a vast
amount of knowledge of mathematics and physics but also a clear and logical way
of thinking. He always has a relatively clear, deep and accurate understanding of the
mathematical and physical problems within and beyond this work. In this way, he is
truly a rare and outstanding physicist. In order to further improve the writing quality,
I decided to invite him to revise this work as the second author, and he agreed to my
request. The close collaboration in the past 5 months has proved that this was the
correct decision. I think Dr. Bin Zhou has indeed made remarkable contributions to
the revision.
I would like to express special thanks to two other friends who helped me with and
contributed to the writing of this work. The first one is Dr. Robert Wald, a professor
at the University of Chicago and a member of the National Academy of Sciences
(NAS). He is an excellent advisor who enlightened me and offered generous help to
my teaching and writing after I returned to China. The other is Researcher Zhiquan
Kuang from the Institute of Mathematics, CAS. He reviewed many chapters of this
work, putting forward many valuable suggestions. In addition, his profound thinking
and deep understanding always benefited me whenever we had discussions.
This work is divided into two volumes,¹ and their contents have been fully
introduced in the preface of the first edition. The revision is not only a comprehensive
rewrite of the first edition but also supplements the first edition with new content. The
main supplements to the first volume include the Vaidya metric and the Kinnersley
metric, conjugate points, embedding diagrams and dark energy. The main additions
to the second volume include fiber bundles and their applications in physics, spaces
of constant curvature, and the de Sitter and anti-de Sitter spacetimes. Although the
second edition contains much more difficult material, it still maintains the writing
style of the first edition, that is, it is made to be as understandable as possible.
In particular, the introductory parts of this work are designed to be accessible to
beginners. The whole work can be used as a text for a graduate course and a reference
for relativists as well. The first volume can also be used as a reference book for
undergraduates who are in the second year or above. Physicists who are not in the
field of relativity can also take the first five chapters of the first volume and some
other chapters in the whole work, such as those on Lie groups and Lie algebras and
fiber bundle theory, as an introduction to differential geometry.
As to the writing style, another feature of this work is that it contains two parts—
compulsory reading and optional reading—to meet the needs of readers at different
levels. There are a large number of exercises in each chapter. Recommendations on
the use of compulsory reading, optional reading and exercises are elaborated on in
the preface of the first edition.
In general relativity, there are many verbose formulae, and the adoption of the
system of geometrized units (where c = 1, G = 1) can greatly simplify these
formulae. This system will also be used throughout this book. To help our readers
understand the transition between geometric and non-geometric systems in a better
way, we have attached an appendix (Appendix A).
I want to express my sincere thanks to Academician Ti-Pei Li, Academician Tan
Lu, Researcher Han-Ying Guo, Prof. Liao Liu, Prof. Zheng Zhao, Researcher Runqiu
Liu, Associate Professor Yongge Ma, Prof. Xuejun Yang and Prof. Guihua Tian, along
with many other colleagues and readers for their contribution and support. I also want
to thank readers of the first edition for their concern and love.

April 2005 Canbin Liang


Beijing Normal University
Beijing, China

¹ The second volume of the second edition was further divided into two volumes when it was
published, which makes the entire work into three volumes.
Preface to the First Edition

Beginning in 1981, I was a visiting scholar at the relativity group of the University of
Chicago for two years. Before going abroad, for various reasons, I only knew a little
about general relativity, and even less about its essential mathematical tool, modern
differential geometry. Thanks to the strong academic atmosphere of the relativity
group of the University of Chicago, and thanks to the careful guidance of both
Professor Robert Wald (my advisor) and Professor Robert Geroch, I soon became
very interested in this research field. As a teacher, before I returned to China, I had a
strong urge to teach my students what I had learned in the past two years as much as
possible. As soon as I got back to China, I taught a series of graduate-level courses,
starting with “Differential Geometry and General Relativity”. I was also invited to
give lectures outside of Beijing. The past decade’s lecture notes have become the
main source for writing this work. Over the past decade, I taught and learned to further
deepen my understanding of the material I had been teaching. When confronted with
difficulties, I would write to my mentors, Professor Wald and Professor Geroch, for
help, and each time they gave me warm replies. Their insights would never fail to
enlighten me. Physicists often feel that modern differential geometry is abstract and
arcane at first pass, and fail to grasp its essence immediately. I think maybe I can help
them with this issue. As I was once a first-time learner, I can empathize with how
difficult it is to begin learning differential geometry. In addition, my past teaching
experience may help reduce the subject’s difficulty. Reducing difficulty has not only
become an aim of my past decade’s teaching, but also has become a major principle
in the writing of this work. In order to reduce the difficulty, I spared no effort to
elaborate, which dramatically increased this work’s length.
Modern differential geometry is not only crucial to the study of general relativity,
but also has important applications in many sub-disciplines of physics (and even
engineering). Many physicists have realized that modern differential geometry will
play an increasingly important role in their further study based on results from inter-
national conferences and a substantial volume of literature, but find it difficult to learn
the material properly. The heads of the Department of Physics of Beijing Normal
University recognized the importance of modern differential geometry to physicists
much earlier. They encouraged and supported me in converting my first graduate
course, Differential Geometry and General Relativity, into a one-semester elective
course for advanced undergraduates (about 70 lecture hours), offered since 1995. More than
half of the lecture hours are used for introducing basic knowledge of differential
geometry (which corresponds to the first five chapters of this book). More than half
of the remainder explains how to apply differential geometry to analyze special
relativity (that is, Chap. 6 of this book). The final part gives a brief introduction to
general relativity (part of Chap. 7). Practice shows that physics undergraduates who
like abstract thinking and have learned calculus and the basics of linear algebra can
pass the final exam, as long as they attend class and spend time reviewing what they
have learned and completing their homework assignments (about 5 questions per
week on average). I am thrilled to find that some undergraduates (including sopho-
mores) can understand the essence and develop a strong interest in the topics they
learned. These undergraduates continued to take and excel in the graduate courses I
offered, which covered all of the content following Sect. 7.4 of this volume.
This work is divided into two volumes. The first volume has 10 chapters. Among
them, the first five chapters are an introduction to differential geometry. We begin
to apply this knowledge in Chap. 6, in which we analyze special relativity. The last
four chapters introduce the basic content of general relativity. Although the material
and writing style of the first five chapters are geared toward those studying relativity,
physicists who are not in the field of relativity can also use it as an introduction to
differential geometry. The second volume further explores the advanced topics of
general relativity (focusing on global analysis, such as the global causal structures
of spacetime, asymptotically flat spacetimes, gravitational collapse, Kerr-Newman
black holes, the 3 + 1 decomposition of spacetimes, and the Lagrangian and Hamil-
tonian formulations of general relativity), as well as the mathematical tools required
(such as conformal transformations, Lie groups and Lie algebras). The two volumes
together can be used as a textbook for a graduate course and a reference for rela-
tivists. The first volume can also be used as a textbook for an advanced undergraduate
elective course.
As to the writing style, another feature of this work is that it contains two parts—
compulsory reading and optional reading—to meet the needs of readers at different
levels. The compulsory reading is typed in SimSun while each optional reading
is typed in KaiTi² and marked with the words [Optional Reading] and [The End
of Optional Reading]. The compulsory reading is self-contained, and skipping the
optional readings does not affect the understanding of the subsequent compulsory
reading. Footnotes may be treated similarly to optional readings. Beginners are
advised to skip all the optional readings and footnotes during their first reading.
There are a large number of exercises in each chapter; however, their difficulty
varies. In front of the title number, the most difficult exercises are marked with an
asterisk *, which refers to the difficulty and does not mean that the problem is related
to the optional reading. Questions marked with a tilde ∼ before their title numbers
are basic exercises that are strongly recommended. Among them, there are quite easy
questions and some relatively difficult ones. In order to reduce the difficulty, hints
are offered on those difficult questions. If time does not allow, the reader may choose
to complete some of the questions that are marked with a tilde. It is okay to read the
material without doing any exercises, but it is likely that you will find it difficult to
understand the later chapters due to the lack of a strong foundation.

² In the English edition, each optional reading is indented and typed in a smaller font size.

Owing to my limited knowledge and understanding, there may exist mistakes and
deficiencies in this book. As an important way to reduce mistakes and deficiencies,
I invited a large number of experts, colleagues and students to read part of the
manuscript of the first volume. In alphabetical order by their last names, they are Bin
Ao, Zhoujian Cao, Luru Dai, Xianxin Dai, Changjun Gao, Sijie Gao, Han He, Bo Hu,
*Chao-Guang Huang, Zhiquan Kuang, **Liao Liu, Xiaoqin Li, Yongge Ma, Junjie
Nan, **Shouyong Pei, **Wen-Chao Qiang, Hua Shen, *Qingjun Tian, *Xiaocen
Tian, Bo-Bo Wang, Jinshan Wu, Xiaoning Wu, **Kongqing Yang, **Yun-Qiang Yu,
*Xuejun Yang, Hongbao Zhang, Peng Zhang, Bin Zhou and Zong-Hong Zhu. (Those
marked with ** are professors or researchers, and those marked with * are associate
professors or associate researchers.) Those mentioned above have put forward many
valuable suggestions on the chapters they read. I would like to express special thanks
to two friends who helped me and contributed substantially to the writing of this
work. The first one is Professor Robert Wald of the University of Chicago. He is
an excellent advisor who enlightened me and provided me immense help with my
teaching and writing after I returned to China. His masterpiece General Relativity is
one of the major references for these volumes. The other one is Researcher Zhiquan
Kuang from the Institute of Mathematics, CAS. He has reviewed many chapters of
this book and put forward many important suggestions. Besides that, his profound
thinking and deep understanding always benefited me whenever we had discussions.
I would also like to express thanks to Professor Liao Liu from the Department of
Physics of Beijing Normal University, and Professor Yuanxing Gui from the Depart-
ment of Physics of Dalian University of Technology. Because of their recommen-
dations, this work was included in the publishing plan of Beijing Normal Univer-
sity Press and received financial support from the press as well. I would like to
sincerely thank Professor Zheng Zhao and Professor Yongcheng Wang for their
care and support during the writing and publication. Besides that, I want to express
thanks to Guifu Li, an editor of Beijing Normal University Press, for his support and
contribution. I am grateful to the Beijing Municipal Education Commission for their
approval and funding. I also appreciate the financial support provided by Beijing
Normal University Press.

February 2000 Canbin Liang


Beijing Normal University
Beijing, China
Contents

1 Topological Spaces in Brief
1.1 The ABCs of Set Theory
1.2 Topological Spaces
1.3 Compactness [Optional Reading]
Exercises
Reference
2 Manifolds and Tensor Fields
2.1 Differentiable Manifolds
2.2 Tangent Vectors and Tangent Vector Fields
2.2.1 Tangent Vectors
2.2.2 Tangent Vector Fields on Manifolds
2.3 Dual Vector Fields
2.4 Tensor Fields
2.5 Metric Tensor Fields
2.6 The Abstract Index Notation
Exercises
References
3 The Riemann (Intrinsic) Curvature Tensor
3.1 Derivative Operators
3.2 Derivative and Parallel Transport of a Vector Field Along a Curve
3.2.1 Parallel Transport of a Vector Field Along a Curve
3.2.2 The Derivative Operator Associated with a Metric
3.2.3 Relationship Between the Derivative and Parallel Transport of a Vector Field Along a Curve
3.3 Geodesics
3.4 The Riemann Curvature Tensor
3.4.1 Definition and Properties of the Riemann Curvature
3.4.2 Computing Riemann Curvature from a Metric
3.5 The Intrinsic Curvature and the Extrinsic Curvature
Exercises
References
4 Lie Derivatives, Killing Fields and Hypersurfaces
4.1 Maps of Manifolds
4.2 Lie Derivatives
4.3 Killing Vector Fields
4.4 Hypersurfaces
Exercises
References
5 Differential Forms and Their Integrals
5.1 Differential Forms
5.2 Integration on Manifolds
5.3 Stokes’s Theorem
5.4 Volume Elements
5.5 Integrating Functions on Manifolds, Gauss’s Theorem
5.6 Dual Differential Forms
5.7 Computing the Riemann Curvature Using the Tetrad Method [Optional Reading]
Exercises
References
6 Special Relativity
6.1 Foundations of the 4-Dimensional Formulation
6.1.1 Preliminaries
6.1.2 The Background Spacetime of Special Relativity
6.1.3 Inertial Observers and Inertial Frames . . . . . . . . . . . . . . . 167
6.1.4 Proper Time and Coordinate Time . . . . . . . . . . . . . . . . . . . 170
6.1.5 Spacetime Diagrams . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
6.1.6 Spacetime Structure: Special Relativity Versus
Pre-Relativity Physics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
6.2 Interesting Typical Effects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
6.2.1 Length Contraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
6.2.2 Time Dilation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
6.2.3 The Twin “Paradox” . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
6.2.4 The Garage “Paradox” . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
6.3 Kinematics and Dynamics of a Point Mass . . . . . . . . . . . . . . . . . . . 189
6.4 The Energy-Momentum Tensor of Continuous Media . . . . . . . . . . 206
6.5 Perfect Fluid Dynamics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211
6.6 Electrodynamics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
6.6.1 Electromagnetic Fields and 4-Current Densities . . . . . . . 216
6.6.2 Maxwell’s Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
6.6.3 Lorentz 4-Force . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
6.6.4 The Energy-Momentum Tensor
of an Electromagnetic Field . . . . . . . . . . . . . . . . . . . . . . . . 225
6.6.5 Electromagnetic 4-Potential and Its Equation
of Motion, Electromagnetic Waves . . . . . . . . . . . . . . . . . . 226
6.6.6 The Doppler Effect on a Light Wave . . . . . . . . . . . . . . . . . 233
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
7 Foundations of General Relativity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 239
7.1 Gravity and Spacetime Geometry . . . . . . . . . . . . . . . . . . . . . . . . . . . 239
7.2 Physical Laws in Curved Spacetime . . . . . . . . . . . . . . . . . . . . . . . . . 244
7.3 Fermi-Walker Transport and Non-Rotating Observers . . . . . . . . . 249
7.4 The Proper Coordinate System of an Arbitrary Observer . . . . . . . 259
7.5 Equivalence Principles and Local Inertial Frames . . . . . . . . . . . . . 267
7.6 Tidal Forces and the Geodesic Deviation Equation . . . . . . . . . . . . 273
7.7 The Einstein Field Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 282
7.8 Linear Approximation and the Newtonian Limit . . . . . . . . . . . . . . 287
7.8.1 Linearized Theory of Gravity . . . . . . . . . . . . . . . . . . . . . . . 287
7.8.2 The Newtonian Limit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 292
7.9 Gravitational Radiation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 296
7.9.1 Gauge Conditions of the Linearized Theory
of Gravity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 296
7.9.2 Gravitational Plane Waves . . . . . . . . . . . . . . . . . . . . . . . . . 302
7.9.3 Emission of Gravitational Waves . . . . . . . . . . . . . . . . . . . . 316
7.9.4 Detection of Gravitational Waves . . . . . . . . . . . . . . . . . . . 317
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 328
8 Solving Einstein’s Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331
8.1 Stationary Spacetimes and Static Spacetimes . . . . . . . . . . . . . . . . . 331
8.2 Spherically Symmetric Spacetimes . . . . . . . . . . . . . . . . . . . . . . . . . . 336
8.3 The Vacuum Schwarzschild Solution . . . . . . . . . . . . . . . . . . . . . . . . 340
8.3.1 Static Spherically Symmetric Metrics . . . . . . . . . . . . . . . . 340
8.3.2 The Vacuum Schwarzschild Solution . . . . . . . . . . . . . . . . 341
8.3.3 Birkhoff’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 347
8.4 The Reissner-Nordström Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . 349
8.4.1 Electrovacuum Spacetimes
and the Einstein-Maxwell Equations . . . . . . . . . . . . . . . . . 349
8.4.2 The Reissner-Nordström Solution . . . . . . . . . . . . . . . . . . . 351
8.5 Axisymmetric Metrics [Optional Reading] . . . . . . . . . . . . . . . . . . . 355
8.6 Plane Symmetric Metrics [Optional Reading] . . . . . . . . . . . . . . . . 357
8.7 The Newman-Penrose (NP) Formalism [Optional Reading] . . . . . 360
8.8 Solving the Einstein-Maxwell Equations Using the NP
Formalism [Optional Reading] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 367
8.8.1 Maxwell’s Equations and Einstein’s Equation
in the NP Formalism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 367
8.8.2 An Example of Solving the Einstein-Maxwell
Equations Under the Axisymmetric Condition . . . . . . . . 370
8.9 The Vaidya Metric and the Kinnersley Metric . . . . . . . . . . . . . . . . 378
8.9.1 From the Schwarzschild Metric to the Vaidya
Metric . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 378
8.9.2 The Kinnersley Metric . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 383
8.9.3 The Kinnersley Metric (Detailed Discussions) . . . . . . . . 387
8.10 Coordinate Conditions, the Gauge Freedom of General
Relativity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 396
8.10.1 Coordinate Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 396
8.10.2 The Gauge Freedom of General Relativity . . . . . . . . . . . . 401
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 404
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 405
9 Schwarzschild Spacetimes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 407
9.1 Geodesics in Schwarzschild Spacetimes . . . . . . . . . . . . . . . . . . . . . 407
9.2 Classical Experimental Tests of General Relativity . . . . . . . . . . . . 412
9.2.1 Gravitational Redshift . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 413
9.2.2 Perihelion Precession of Mercury . . . . . . . . . . . . . . . . . . . 415
9.2.3 Light Deflection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 419
9.3 Spherical Stars and Their Evolution . . . . . . . . . . . . . . . . . . . . . . . . . 422
9.3.1 Interior Solutions for Static Spherical Stars . . . . . . . . . . . 422
9.3.2 Stellar Evolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 430
9.4 The Kruskal Extension and Schwarzschild Black Holes . . . . . . . . 439
9.4.1 The Definition of a Spacetime Singularity . . . . . . . . . . . . 440
9.4.2 Coordinate Singularities of Rindler Metrics . . . . . . . . . . . 442
9.4.3 The Kruskal Extension of Schwarzschild
Spacetimes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 446
9.4.4 Surfaces of Infinite Redshift in Schwarzschild
Spacetimes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 453
9.4.5 Embedding Diagrams [Optional Reading] . . . . . . . . . . . . 454
9.4.6 The Gravitational Collapse of a Spherical Star
and Schwarzschild Black Holes . . . . . . . . . . . . . . . . . . . . . 456
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 463
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 465
10 Cosmology I . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 467
10.1 Kinematics of the Universe . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 468
10.1.1 Cosmological Principle . . . . . . . . . . . . . . . . . . . . . . . . . . . . 468
10.1.2 Spatial Geometries of the Universe . . . . . . . . . . . . . . . . . . 470
10.1.3 The Robertson-Walker Metric . . . . . . . . . . . . . . . . . . . . . . 480
10.2 Dynamics of the Universe . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 488
10.2.1 The Hubble-Lemaître Law . . . . . . . . . . . . . . . . . . . . . . . . . 488
10.2.2 Cosmological Redshift . . . . . . . . . . . . . . . . . . . . . . . . . . . . 490
10.2.3 Evolution of the Scale Factor . . . . . . . . . . . . . . . . . . . . . . . 493
10.2.4 The Cosmological Constant and Einstein’s Static
Universe . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 501
10.3 The Thermal History of Our Universe . . . . . . . . . . . . . . . . . . . . . . . 503
10.3.1 A Brief History of the Universe . . . . . . . . . . . . . . . . . . . . . 503
10.3.2 The Dark Matter Problem . . . . . . . . . . . . . . . . . . . . . . . . . . 516
10.3.3 The Cosmological Constant Problem
and the ΛCDM Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 520
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 523
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 524

Appendix A: The Conversion Between Geometrized
and Nongeometrized Unit Systems . . . . . . . . . . . . . . . . . . . . . . 527
Conventions and Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 535
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 539
Outline of Volume II

11 Global Causal Structure of Spacetime


11.1 Pasts and Futures
11.2 Inextendible Causal Curves
11.3 Causality Conditions
11.4 Domains of Dependence
11.5 Cauchy Surfaces, Cauchy Horizons and Globally Hyperbolic
Spacetimes
Exercises

12 Asymptotically Flat Spacetimes


12.1 Conformal Transformations
12.2 The Conformal Infinity of Minkowski Spacetime
12.3 The Conformal Infinity of Schwarzschild Spacetimes
12.4 Isolated Systems and Asymptotically Flat Spacetimes
12.5 Symmetries on ℐ± and i⁰, the BMS Group and SPI Group
12.6 The Non-locality of Gravitational Energy
12.7 The Total Energy and Total Momentum of an Asymptotically
Flat Spacetime
Exercises

13 Kerr-Newman (KN) Black Holes


13.1 Reissner-Nordström (RN) Black Holes
13.2 The Kerr-Newman Metric
13.3 The Maximum Extension of KN Spacetimes
13.4 Static Limit Surfaces, Ergospheres and More
13.5 Extracting Energy from a Rotating Black Hole (Penrose Process)
13.6 The Black Hole “No-Hair” Conjecture
Exercises


14 Revisiting Reference Frames


14.1 General Discussions on Reference Frames
14.2 Einstein’s Rotating Disk
14.3 Clock Synchronization in a Reference Frame [Optional Reading]
14.4 The 3 + 1 Decomposition of Spacetimes
14.5 An Application of the 3 + 1 Decomposition of Spacetimes—the
Initial Value Problem in General Relativity
Exercises

15 Cosmology II
15.1 Finding a Way Out of the Difficulties of the Standard
Cosmological Model—the Inflationary Model
15.2 Dark Matter
15.3 The Cosmological Constant and the Vacuum Energy Problem
15.4 Dark Energy and the “New Standard Cosmological Model”
Exercises

Appendix B Mathematical Foundation of Quantum Mechanics in Brief


B.1 The ABCs of a Hilbert Space
B.2 Unbounded Operators and Their Self-Adjointness
Exercises

Appendix C Geometric Phases in Quantum Mechanics


C.1 Berry’s Geometric Phase
C.2 The Aharonov-Anandan (AA) Geometric Phase

Appendix D Energy Conditions

Appendix E Singularity Theorems and the Cosmic Censorship Hypotheses


E.1 Introducing the Singularity Theorems
E.2 Cosmic Censorship Hypotheses
E.3 The Strong Cosmic Censorship Hypothesis in Terms of TIPs
[Optional Reading]
E.4 Singular Boundaries

Appendix F The Frobenius Theorem

Appendix G Lie Groups and Lie Algebras


G.1 The ABCs of Group Theory
G.2 Lie Groups
G.3 Lie Algebras
G.4 One Parameter Subgroups and Exponential Maps
G.5 Important Lie Groups and Lie Algebras

G.6 Structure Constants of a Lie Algebra


G.7 Lie Groups of Transformations and Killing Vector Fields
G.8 Adjoint Representations and Killing Forms [Optional Reading]
G.9 The Proper Lorentz Group and the Lorentz Algebra
Exercises
Outline of Volume III

16 Lagrangian and Hamiltonian Formulations of General Relativity


16.1 Lagrangian Formalism
16.2 Hamiltonian Formalism for Systems with a Finite Number of
Degrees of Freedom
16.3 Expressing Lagrangian and Hamiltonian Formalisms of Systems
with a Finite Number of Degrees of Freedom in Geometric
Language [Optional Reading]
16.4 Hamiltonian Formulation of Classical Field Theories
16.5 Hamiltonian Formulation of General Relativity
16.6 Tensor Densities [Optional Reading]
16.7 Symplectic Geometry and Its Applications in the Hamiltonian
Formalism [Optional Reading]
16.8 From Geometrodynamics to Connection Dynamics—A Brief
Introduction to Ashtekar’s New Variables [Optional Reading]
Exercises

17 Isolated Horizons, Dynamical Horizons and Black Hole
Thermodynamics
17.1 Traditional Black Hole Thermodynamics and Its Shortcomings
17.2 Null Geodesic Congruences and Their Raychaudhuri Equations
17.3 The Raychaudhuri Equations on a Null Hypersurface
17.4 Trapped Surfaces and Apparent Horizons
17.5 Weakly Isolated Horizons and Their Zeroth and First Laws
17.6 Further Discussions on Weakly Isolated Horizons [Optional
Reading]
17.7 Dynamical Horizons and Their Mechanical Laws
Exercises


Appendix H Spacetime Symmetries and Conservation Laws
(Noether’s Theorem)
H.1 Proving Noether’s Theorem Using Geometric Language
H.2 Canonical Energy-Momentum Tensors
H.3 On the Proof Using Coordinate Language

Appendix I Fiber Bundles and Their Applications in Gauge Theories


I.1 Principal Fiber Bundles
I.2 Connections on a Principal Fiber Bundle
I.3 Fiber Bundles Associated to a Principal Fiber Bundle
(Associated Bundles)
I.4 The Global Gauge Invariance of Physical Fields
I.5 The Local Gauge Invariance of Physical Fields
I.6 The Physical Meaning of Cross Sections
I.7 Gauge Potential and Connection
I.8 Gauge Field Strength and Curvature
I.9 Connections and Covariant Derivatives on a Vector Bundle
Exercises

Appendix J De Sitter Spacetime and Anti-de Sitter Spacetime


J.1 Spaces of Constant Curvature
J.2 De Sitter Spacetime
J.3 The Penrose Diagram of de Sitter Spacetime
J.4 More on Event Horizons and Particle Horizons
J.5 Schwarzschild-de Sitter Spacetime
J.6 Anti-de Sitter Spacetime
Chapter 1
Topological Spaces in Brief

1.1 The ABCs of Set Theory

A well-defined collection of objects is called a set. Each object
in the set is called an element or a point. If x is an element of a set X , then we say
“x belongs to X”, and denote it by x ∈ X . The symbol ∉ stands for “does not belong
to”. There are two ways to express a set. One is to list its elements, separated
by commas and enclosed in curly brackets; for example,

X = {1, 4, 5.6}

represents the set that consists of the real numbers 1, 4 and 5.6. The other way
is to state the defining property of the elements of the set; for example,

X = {x | x is a real number}

represents the set of all real numbers (the common notation of this specific set is R),
while

X = {x ∈ R | x > 9}

represents the set of all real numbers that are greater than 9.
The set that has no elements is called the empty set, denoted by ∅.
Definition 1 If each element of a set A belongs to a set X , then we say A is a subset
of X . We also say that A is contained in X , or X contains A, denoted by A ⊂ X
or X ⊃ A. We stipulate that ∅ is a subset of any set. Two sets X and Y are said to be
equal (denoted by X = Y ) if X ⊂ Y and Y ⊂ X . A is called a proper subset of X
(denoted by A ⊊ X ) if A ⊂ X and A ≠ X .

Remark 1 A more precise statement of the definition of a subset would be “a set A is
called a subset of the set X if and only if each element of A belongs to X ”. However,
for convenience’s sake, every “if” and “when” in a definition is understood
to mean “if and only if”.

© Science Press 2023
C. Liang and B. Zhou, Differential Geometry and General Relativity,
Graduate Texts in Physics, https://doi.org/10.1007/978-981-99-0022-0_1

This text will use := to represent “is defined as” and use ≡ to represent “identical
to” or “denoted by”. For example, C ≡ A − B means “denote A − B by C”. These
two symbols are adopted simply for clarity; they may be replaced by the
equal sign as well.
Definition 2 The union, intersection and difference of two sets A and B, and the
complement of a subset, are defined as follows:
Union A ∪ B := {x | x ∈ A or x ∈ B}.
Intersection A ∩ B := {x | x ∈ A, x ∈ B}. (The condition “x ∈ A, x ∈ B” is
short for “x ∈ A and x ∈ B”; the same applies below.)
Difference A − B := {x | x ∈ A, x ∉ B}. (Mathematics books usually denote the
difference by A\B or A ∼ B; this text will always write A − B.)
Complement If A is a subset of X , then −A, the complement of A, is defined as −A := X − A.

Theorem 1.1.1 The operations above obey the following laws:
Commutative law A ∪ B = B ∪ A, A ∩ B = B ∩ A.
Associative law (A ∪ B) ∪ C = A ∪ (B ∪ C), (A ∩ B) ∩ C = A ∩ (B ∩ C).
Distributive law (A ∩ B) ∪ C = (A ∪ C) ∩ (B ∪ C), (A ∪ B) ∩ C = (A ∩ C) ∪ (B ∩ C).
De Morgan’s law A − (B ∪ C) = (A − B) ∩ (A − C), A − (B ∩ C) = (A − B) ∪ (A − C).

Proof As an example, we prove the second equation of De Morgan’s law (the reader
should verify the rest). It suffices to show that each side of the equation is
contained in the other.
(A) Suppose x ∈ A − (B ∩ C); then x ∈ A and x ∉ B ∩ C. The latter leads to x ∉ B
or x ∉ C. Combining x ∈ A and x ∉ B, we have x ∈ A − B; combining x ∈ A and
x ∉ C, we have x ∈ A − C. In either case x ∈ (A − B) ∪ (A − C). Therefore,

A − (B ∩ C) ⊂ (A − B) ∪ (A − C) .

(B) Suppose x ∈ (A − B) ∪ (A − C); then x ∈ A − B or x ∈ A − C. The former
leads to x ∈ A, x ∉ B; the latter leads to x ∈ A, x ∉ C. In either case we have
x ∈ A, x ∉ B ∩ C, and hence x ∈ A − (B ∩ C). Therefore,

(A − B) ∪ (A − C) ⊂ A − (B ∩ C) . □
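The distributive and De Morgan laws can also be checked by brute force on a small example. The following is only a sanity check under stated assumptions, not a proof: the helper names `subsets` and `check_set_laws` and the three-element universe are illustrative choices, and Python's built-in set operators `|`, `&` and `-` stand in for ∪, ∩ and the difference.

```python
from itertools import combinations

def subsets(universe):
    """Return every subset of `universe` as a frozenset."""
    items = list(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def check_set_laws(universe):
    """Check the distributive and De Morgan laws for all triples of subsets."""
    all_subsets = subsets(universe)
    for A in all_subsets:
        for B in all_subsets:
            for C in all_subsets:
                # Distributive laws
                if (A & B) | C != (A | C) & (B | C):
                    return False
                if (A | B) & C != (A & C) | (B & C):
                    return False
                # De Morgan's laws, written with the set difference
                if A - (B | C) != (A - B) & (A - C):
                    return False
                if A - (B & C) != (A - B) | (A - C):
                    return False
    return True

# Illustrative universe; any small finite set would do.
laws_hold = check_set_laws({1, 2, 3})
```

A finite check of this kind can expose a mistyped identity quickly, but the general statement still needs an element-wise argument like the one given in the proof.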

Definition 3 The Cartesian product X × Y of two non-empty sets X and Y is
defined as

X × Y := {(x, y) | x ∈ X, y ∈ Y } .

That is, X × Y is the set whose elements are the ordered pairs (x, y) formed
by one element x from X and one element y from Y . The Cartesian product of a
(finite¹) number of sets can be defined similarly; for example,

X × Y × Z := {(x, y, z) | x ∈ X, y ∈ Y, z ∈ Z } .

We also stipulate that the Cartesian product satisfies the associative law, i.e., (X ×
Y ) × Z = X × (Y × Z ).
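A small sketch may help to see why the associative law has to be stipulated rather than derived. It assumes finite example sets (the particular elements are arbitrary) and uses `itertools.product` from the Python standard library to build the ordered pairs and triples.

```python
from itertools import product

X = {1, 2}
Y = {"a", "b"}
Z = {"p", "q"}

# X × Y as a set of ordered pairs (x, y)
XY = set(product(X, Y))

# (X × Y) × Z consists of nested pairs ((x, y), z),
# whereas X × Y × Z consists of flat triples (x, y, z).
nested = set(product(XY, Z))
flat = set(product(X, Y, Z))

# The identification ((x, y), z) <-> (x, y, z) on which the stipulated
# associative law relies
flattened = {(x, y, z) for ((x, y), z) in nested}
```

Taken literally, `nested` and `flat` are different sets; they agree only after the identification made in `flattened`, which is the content of the stipulation.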
Example 1 R² := R × R, Rⁿ := R × · · · × R (n factors in total). Since an element
of R² is an ordered pair formed by two real numbers, these two real numbers are called
the natural coordinates of this element. Similarly, each element of Rⁿ has n natural
coordinates. It follows that Rⁿ is intrinsically endowed with coordinates, though
this is not necessarily true for other sets. Using natural coordinates, the concept of
distance between any two elements of Rⁿ can be defined.
Definition 4 The distance, denoted by |y − x|, between any two elements x =
(x^1, . . . , x^n) and y = (y^1, . . . , y^n) of Rⁿ is defined as

\[ |y - x| := \sqrt{\sum_{i=1}^{n} (y^i - x^i)^2} \, . \]
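As a minimal sketch of this definition, the function below computes the distance from the natural coordinates. The name `distance` and the representation of points as tuples of floats are assumptions made for illustration.

```python
import math

def distance(x, y):
    """Distance |y - x| between two points of R^n given by their natural coordinates."""
    if len(x) != len(y):
        raise ValueError("both points must have the same number of coordinates")
    return math.sqrt(sum((yi - xi) ** 2 for xi, yi in zip(x, y)))

# Two points of R^2
d = distance((0.0, 0.0), (3.0, 4.0))
```

For n = 1 the expression reduces to the absolute value of the difference of the two real numbers.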

Starting from the next paragraph, this text will frequently use the mathematical symbols ∀
(“for all” or “for any”) and ∃ (“there exists”); the reader should
become familiar with them.
Definition 5 Suppose X and Y are non-empty sets. A map from X to Y (denoted
by f : X → Y ) is a rule that associates each element of X with a unique element
in Y . If y ∈ Y is the corresponding element of x ∈ X , we write y = f (x); we also
call y the image of x under the map f and call x the preimage (or inverse image)
of y. X is called the domain of the map f , and the set of images of all elements of
X (denoted by f [X ]) is called the range of the map f : X → Y . Maps f : X → Y
and f′ : X → Y are said to be equal if f (x) = f′(x) ∀x ∈ X .
Remark 2 Usually, y = f (x) is also written as f : x ↦ y. Please note the difference
between → and ↦: → in f : X → Y means that f is a map from X to Y (a set to
a set), while ↦ in f : x ↦ y means that the image of x ∈ X under the map f is y
(an element to an element).
Remark 3 Suppose A ⊂ X , then the subset formed by the images of elements of A
under the map f is denoted by f [A], i.e.,

f [A] ≡ {y ∈ Y | ∃x ∈ A such that y = f (x)} ⊂ Y .

Example 2 A single-valued function y = f (x) in standard calculus is a map from
R (or a subset of R) to R.

1 The Cartesian product of an infinite number of sets can also be defined, but this is outside the
scope of this text.
Fig. 1.1 Composite map g ◦ f . NB: First perform f , then perform g. [Diagram: x ∈ X is sent by f to f (x) ∈ Y , which is sent by g to g( f (x)) ∈ Z .]

Remark 4 A map from R² to R gives a function of two variables, because each point
in R² is described by two real numbers (natural coordinates). Similarly, a map from
Rⁿ to Rᵐ gives m functions of n variables.

Definition 6 A map f : X → Y is said to be one-to-one if there is no more than
one inverse image for any y ∈ Y (there may be none). f : X → Y is said to be onto
if there is at least one inverse image for any y ∈ Y (there may be more than one).²

Remark 5 ① A necessary and sufficient condition for f to be onto is that the range
f [X ] = Y . ② If f is a one-to-one map, then there exists an inverse map f⁻¹ :
f [X ] → X . However, whether or not f has an inverse map, we can always define
f⁻¹[B], the “inverse image” of any subset B ⊂ Y under f , as

f⁻¹[B] := {x ∈ X | f (x) ∈ B} ⊂ X .

Note that the “inverse image” here is a subset (rather than an element) of X . For example, if
X has exactly two elements x and x′, whose images under f are
both y ∈ Y , then although the inverse map f⁻¹ : Y → X does not exist, f⁻¹[{y}]
is still meaningful when y is regarded as a one-point subset (i.e., {y}) of Y , and its
meaning is f⁻¹[{y}] = {x, x′} ⊂ X .
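The notions of image, inverse image, one-to-one and onto can be sketched for maps between finite sets. In the snippet below a map is represented as a Python `dict` whose keys form the domain; this representation, the helper names and the element labels `"x"`, `"x_prime"`, `"y"`, `"z"` are all assumptions made for illustration.

```python
def image(f, A):
    """f[A]: the set of images of the elements of A."""
    return {f[x] for x in A}

def preimage(f, B):
    """f^{-1}[B]: all x in the domain whose image lies in B."""
    return {x for x in f if f[x] in B}

def is_one_to_one(f):
    """True if no element of Y has more than one inverse image."""
    return len(set(f.values())) == len(f)

def is_onto(f, Y):
    """True if every element of Y has at least one inverse image, i.e. f[X] = Y."""
    return set(f.values()) == set(Y)

# A map on a two-element domain, both of whose elements are sent to "y"
f = {"x": "y", "x_prime": "y"}
Y = {"y", "z"}

inverse_image_of_y = preimage(f, {"y"})
inverse_image_of_z = preimage(f, {"z"})
```

Here `f` is neither one-to-one nor onto, so no inverse map exists, yet `preimage` is defined for every subset of `Y`, including subsets whose inverse image is empty.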

Definition 7 f : X → Y is called a constant map if f (x) = f (x′) ∀x, x′ ∈ X .

Definition 8 Suppose X , Y , Z are sets and that f : X → Y and g : Y → Z are
maps, then the composite map denoted by g ◦ f is a map from X to Z , which is
defined as (g ◦ f )(x) := g( f (x)) ∈ Z ∀x ∈ X (see Fig. 1.1).

Remark 6 If X = Y = Z = R, then the composite map g ◦ f is the familiar
composite function of one variable.
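The order of operations in a composite map is easy to get backwards, so a short sketch may be useful. The helper `compose` and the two sample functions are hypothetical choices for the case X = Y = Z = R.

```python
def compose(g, f):
    """Return the composite map g ∘ f, which applies f first and then g."""
    return lambda x: g(f(x))

f = lambda x: x + 1      # f : R -> R
g = lambda x: 2 * x      # g : R -> R

g_after_f = compose(g, f)   # x -> 2 * (x + 1)
f_after_g = compose(f, g)   # x -> 2 * x + 1
```

The two composites differ in general, which is why the convention "first f, then g" for g ◦ f matters.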

2 Many mathematics books call the one-to-one and onto maps used in this text injections and
surjections, respectively, and call maps that are both injective and surjective one-to-one maps (also
called bijections). Thus, their one-to-one maps are stronger than the one-to-one maps in this text.

If X and Y are just two sets (with no additional structure), “one-to-one” and
“onto” are the only two requirements that we can impose on a map between X and
Y ; however, if some additional structure is assigned to X and Y , then
we can sometimes impose additional requirements on f : X → Y . For example, we
can ask f : R → R to be continuous, or even smooth. The continuity of a function
of one variable is already defined in calculus (the “ε − δ definition”); we restate it as
follows: ① f is said to be continuous at x if ∀ε > 0 ∃δ > 0 such that | f (x′) − f (x)| <
ε whenever |x′ − x| < δ; ② f is said to be continuous on R if it is continuous at every point
of R. It seems that this definition cannot be generalized to a map between two
sets that have no notion of distance, since it depends on the concept of distance
between any two elements of R (for R, the distance is the absolute value of the
difference of the natural coordinates).

However, on reflection one finds that the ε − δ definition can
be rephrased in terms of open intervals (rather than distance)
as follows: suppose X = Y = R; the map f : X → Y is said to be continuous if the
“inverse image” of any open interval in Y is a union of open intervals in X (or the
empty set). The equivalence between this statement and the familiar ε − δ statement
may be illustrated by Fig. 1.2 (we do not prove it here). In Fig. 1.2a, the
map f : X → Y is continuous according to the ε − δ definition; correspondingly, the
inverse image of an arbitrary open interval (a, b) in Y is the open interval (a′, b′). In
Fig. 1.2b, the map is also continuous; correspondingly, the inverse image of an arbitrary
open interval (a, b) in Y is the union of the open intervals (a′, b′) and (b″, a″). In
Fig. 1.2c, the map f : X → Y is discontinuous at c ∈ X ; correspondingly, there
exists an open interval (a, b) in Y whose inverse image f⁻¹[(a, b)] = (a′, c] ⊂ X is
neither an open interval nor a union of open intervals.

Fig. 1.2 Expressing continuity in terms of open intervals, the curve represents the map f : X → Y

The discussion above indicates
one use of the concept “union of open intervals”: to define the
continuity of a map f : R → R. This concept also has many other uses,
so it is often necessary to generalize it to sets X other than R. For convenience, we
call any subset of R that can be expressed as a union of open intervals (along
with the empty set ∅) an open subset. To generalize the concept of an open subset
to any set X , we should first identify the essential, abstract (and thus generalizable)
properties of the open subsets of R. They are: (a) both R and ∅ are open subsets;
(b) the intersection of a finite number of open subsets is still an open subset; (c) the
union of any number of open subsets is still an open subset. After generalizing these
three properties, we can define the concept of an open subset for any set X . A set with
open subsets defined is called a topological space. From the concept of open subsets,
one can also define many concepts and prove many theorems, which develop into a
complete and fruitful branch of mathematics, point-set topology. Sections 1.2 and
1.3 will give an introduction to the basics of topological spaces.
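The open-interval criterion for continuity can be probed numerically on a simple discontinuous map. The function below is a hypothetical example with a jump at c = 0 (it is not the function drawn in Fig. 1.2), and sampling a few points is only an illustration, not a proof.

```python
def f(x):
    """Hypothetical map R -> R with a jump at c = 0."""
    return x if x <= 0 else x + 1

def in_preimage(x, a=-0.5, b=0.5):
    """True if x belongs to the inverse image of the open interval (a, b) under f."""
    return a < f(x) < b

c = 0.0
# c itself lies in the inverse image of (a, b) ...
c_inside = in_preimage(c)
# ... but points just to the right of c do not, however small the step,
right_neighbours_inside = [in_preimage(c + 10.0 ** (-k)) for k in range(1, 12)]
# while points just to the left of c do.
left_neighbours_inside = [in_preimage(c - 10.0 ** (-k)) for k in range(1, 12)]
```

For this particular f the inverse image of (−0.5, 0.5) is the half-open interval (−0.5, 0], which contains c but no open interval around c, so it cannot be written as a union of open intervals.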

1.2 Topological Spaces

As we mentioned at the end of Sect. 1.1, subsets of R can be divided into two
categories: open subsets and non-open subsets. (Any subset is either open or non-
open. Do not refer to a non-open subset as a closed subset. According to the definition
of a closed subset that we will introduce later, a subset can be open, closed, both, or
neither.) The collection of open subsets has the above three properties (a), (b), (c).
For any non-empty set X , we can also assign some subsets to be open and others to
be non-open in an appropriate manner. To make this assignment useful, we require
that any method of assignment should satisfy three requirements: (a) X itself and the
empty set ∅ are open subsets; (b) the intersection of a finite number of open subsets is
an open subset; (c) the union of any number (which may be finite or infinite) of open
subsets is an open subset. For a given set, there are usually many ways of assigning
openness that meet these three requirements. For example, for any set X we can
assign X and ∅ as open subsets, and all others as non-open. This certainly satisfies
the above three requirements, and it has the smallest possible number of open
subsets (only two). However, we can also go to the other extreme, namely
to assign any subset of X to be an open subset. It is not hard to see that this method
of assigning openness also satisfies the three requirements above. Although the two
assignments above do not necessarily have much use, they at least can indicate that
there is more than one way of assigning openness to meet the three requirements
above. We say that each of the assignments which satisfies the three requirements
above gives an additional structure to the set X , called the topological structure. For
a set with a topological structure defined, we can point at any subset of it and ask: “Is
this an open subset?” The answer will be either “yes” or “no”, with no middle ground.
Conversely, for a set without any topological structure defined, this kind of question
is meaningless. If X is a set having a topological structure, then the collection of its
open subsets also forms a set, called a topology on X and denoted by T . Let P represent
the collection of all subsets of X (as shown in Fig. 1.3); then any open subset O and
any non-open subset V are both elements of P. All of the open subsets of X form a
subset T of P (note that it is not a subset of X ), which is the topology on X . Please
note the difference between the symbols ⊂ and ∈: O ⊂ X only indicates that O is
a subset of X , while O ∈ T indicates that O is an open subset of X . The discussion
above will pave the way for understanding the definitions expressed by the following
mathematical language.

Definition 1 A topology T on a non-empty set X is a collection of subsets
of X which satisfies:
(a) X, ∅ ∈ T ;
(b) If O_i ∈ T , i = 1, 2, . . . , n, then ⋂_{i=1}^{n} O_i ∈ T (where ⋂_{i=1}^{n} O_i stands for the
intersection of these O_i);
(c) If O_α ∈ T ∀α, then ⋃_α O_α ∈ T . (Adding ∀α after O_α ∈ T indicates that
each O_α belongs to T , with no restriction on the number of O_α. ⋃_α O_α ∈ T
indicates that the union of all O_α belongs to T .)

Fig. 1.3 P is the collection of all subsets of X . Any subset of X (e.g., O, V ) is an element of P .
T is a subset of P such that each element of it (e.g., O) is an open subset of X

Definition 2 A set X with a topology T assigned to it is called a topological space.


A subset O of the topological space X is called an open subset (or open set for
short) if O ∈ T .

For the same set X we can define different topologies (there may be many T that
satisfy Definition 1). Suppose T1 and T2 are both topologies on X; then a subset A
of X may satisfy both A ∈ T1 and A ∉ T2; that is, A is an open set for T1 (measured
by T1), but not an open set for T2. We thus see that T1 and T2 make X into two
different topological spaces. In order to make the choice of topology explicit, we use
(X, T) to represent a topological space. As a result, (X, T1) and (X, T2) represent
two different topological spaces, even though they share the same “base set” X. After a
topology is specified, one can also just use X to represent a topological space.
Which topology should we choose for a given set X to make it a topological
space? This depends on the properties of X itself, as well as what kind of problem
we are considering. For example, we may choose the so-called “usual topology” as
the topology for the set R in most of the problems we are usually concerned with
(see Example 3 below).

Example 1 Suppose X is an arbitrary non-empty set, and let T be the collection


of all subsets of X , then it obviously satisfies the three requirements in Definition 1,
and hence forms a topology on X , which is called the discrete topology.

Example 2 Suppose X is an arbitrary non-empty set, and let T = {X, ∅}, then it
obviously satisfies the three requirements in Definition 1 and hence forms a topology
on X, which is called the indiscrete topology. The indiscrete topology is the topology
with the fewest elements.
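
To see how Examples 1 and 2 sit among all possible topologies, one can enumerate every topology on a small set by brute force. The sketch below is only an illustration; the three-point set and the helper names `powerset` and `is_topology` are our own assumptions, not notation from the text.

```python
from itertools import combinations

def powerset(items):
    """All subsets of a finite collection, as frozensets."""
    items = list(items)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def is_topology(X, T):
    """Definition 1 for finite X; pairwise tests suffice for a finite T."""
    X, T = frozenset(X), set(T)
    if X not in T or frozenset() not in T:
        return False
    return all(A & B in T and A | B in T for A in T for B in T)

X = frozenset({"a", "b", "c"})
subsets = powerset(X)              # the collection P of all subsets of X
discrete = set(subsets)            # Example 1
indiscrete = {frozenset(), X}      # Example 2

# Test every collection of subsets of X (2**8 = 256 candidates).
topologies = [set(T) for T in powerset(subsets) if is_topology(X, T)]
print(len(topologies))                      # 29
print(min(len(T) for T in topologies))      # 2, attained by the indiscrete topology
print(max(len(T) for T in topologies))      # 8, attained by the discrete topology
```

Every topology on X lies between the indiscrete and the discrete one in the sense of inclusion, which is why they give the smallest and largest counts.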

Example 3 (1) Suppose X = R, then Tu :={the empty set or subsets of R that can
be expressed as a union of open intervals} is called the usual topology on R.
(2) Suppose X = Rn , then Tu :={the empty set or subsets of Rn that can be
expressed as a union of open balls} is called the usual topology on Rn , where an
open ball is defined as B(x0 , r ) := {x ∈ Rn | |x − x0 | < r }, x0 is called the center
and r > 0 is called the radius. An open ball in R2 is also called an open disk; an
open ball in R is just an open interval.
It is not difficult to check that the Tu in (1) and (2) satisfy the three requirements
in Definition 1. According to the definition above, any open interval of R is an open
set measured by Tu . However, in principle we can also choose other topologies to
make R a topological space different from (R, Tu ). For example, if measured by the
indiscrete topology, then no subset is an open set other than R and ∅. In contrast, if
measured by the discrete topology, then any subset of R (including any closed interval
or half-closed interval) is an open set. From now on, we will consider (Rn , Tu ) when
we treat Rn as a topological space unless stated otherwise.

Example 4 Suppose (X1, T1) and (X2, T2) are topological spaces, and X = X1 × X2
(i.e., X is the Cartesian product of X1 and X2). Define the topology on X as

$T := \{\, O \subset X \mid O \text{ can be expressed as a union of subsets of the form } O_1 \times O_2,\ O_1 \in T_1,\ O_2 \in T_2 \,\}$, (1.2.1)

then T is called the product topology on X.
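
For finite spaces, (1.2.1) can be carried out literally: form all “boxes” O1 × O2 and then take all unions. The following sketch is an illustration under our own assumptions (the two-point space with topology {∅, {0}, {0, 1}} and the name `product_topology` are not from the text). It shows that the product topology generally contains open sets that are not themselves boxes.

```python
from itertools import product

def product_topology(T1, T2):
    """Return (boxes, T): the sets O1 x O2 and all unions of them."""
    boxes = {frozenset(product(O1, O2)) for O1 in T1 for O2 in T2}
    T = set(boxes)
    changed = True
    while changed:                      # close under pairwise unions
        changed = False
        for A in list(T):
            for B in list(T):
                if A | B not in T:
                    T.add(A | B)
                    changed = True
    return boxes, T

X1 = frozenset({0, 1})
T1 = {frozenset(), frozenset({0}), X1}  # a topology on the two-point set
boxes, T = product_topology(T1, T1)

L_shape = frozenset({(0, 0), (0, 1), (1, 0)})
print(len(boxes), len(T))                    # 5 6
print(L_shape in T, L_shape in boxes)        # True False
```

The L-shaped set is the union of {0} × X1 and X1 × {0}; it is open in the product topology although it is not of the form O1 × O2, which is why the word “union” in (1.2.1) is essential.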
[Optional Reading 1.2.1]
It is not difficult to generalize the definition of the product topology from two topological
spaces to a finite number of topological spaces. However, there is a point we should make:
suppose X = X 1 × X 2 × X 3 , then X can be regarded as either (X 1 × X 2 ) × X 3 or X 1 ×
(X2 × X3). If X12 = X1 × X2, and we use T12 to represent its product topology, then we
can define the product topology for the set X = X12 × X3, denoted by T′. Similarly, if
X23 = X2 × X3, and we use T23 to represent its product topology, then we can also define
the product topology for the set X = X1 × X23, denoted by T″. Following (1.2.1), we can
also define the product topology T̃ for X = X1 × X2 × X3 as follows:

$\tilde{T} := \{\, O \subset X \mid O \text{ can be expressed as a union of subsets of the form } O_1 \times O_2 \times O_3,\ O_1 \in T_1,\ O_2 \in T_2,\ O_3 \in T_3 \,\}$.

It can be proved that T̃ = T′ = T″; that is, these three definitions for the product topology
on X = X 1 × X 2 × X 3 are different routes to the same end, so there is no ambiguity. The
conclusion for the case of a finite number of topological spaces is also similar. A simple
example is Rn = R × · · · × R, where one can also prove that the product topology defined
in this way agrees with the topology defined in Example 3 in terms of open balls.
[The End of Optional Reading 1.2.1]

Example 5 Suppose (X, T) is a topological space, and A is an arbitrary non-empty
subset of X. Regarding A as a set, we can certainly assign a topology to it, denoted by
S (script S), and thereby make it a topological space, denoted by (A, S). Since A
is a subset of X, we would hope that S and T are closely related. If A ∈ T, then
the problem is simple: we just need to define S := {V ⊂ A | V ∈ T}. However,
if A ∉ T, this definition gives A ∉ S, which contradicts
condition (a) of Definition 1. Therefore, this definition of S is not valid in general. A better
definition is
S := {V ⊂ A | ∃O ∈ T such that V = A ∩ O}. (1.2.2)

Fig. 1.4 The bold line segment (excluding the endpoints) is a subset V of A, since it can be considered as an intersection of O ∈ Tu and A. From (1.2.2) we can see that V ∈ S

It can be proved from the equation above that A ∈ S even if A ∉ T; furthermore,
S satisfies the other conditions of Definition 1 (see Exercise 1.6). The S defined
in this way is called the induced topology on A(⊂ X ) derived from T . Later on,
unless stated otherwise, when we treat a subset A of (X, T ) as a topological space,
we will consider it as (A, S ), where S is the topology induced by T . (A, S ) is
called a topological subspace of (X, T ).
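
The difference between the naive definition and (1.2.2) is easy to see on a small finite example. The sketch below is only illustrative; the four-point space and the name `induced_topology` are our own assumptions.

```python
def induced_topology(A, T):
    """Equation (1.2.2): S = {A ∩ O | O ∈ T}."""
    A = frozenset(A)
    return {A & frozenset(O) for O in T}

X = frozenset({1, 2, 3, 4})
T = {frozenset(), frozenset({1}), frozenset({1, 2}), X}   # a topology on X
A = frozenset({2, 3})                                     # A is not open in X

naive = {V for V in T if V <= A}      # {V ⊂ A | V ∈ T}
S = induced_topology(A, T)

print(A in naive)                     # False: the naive choice violates (a)
print(A in S)                         # True
print(frozenset({2}) in S, frozenset({2}) in T)   # True False
```

As in the circle example that follows in the text, a subset such as {2} can be open as measured by S without being open as measured by T.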
The example below is helpful for a better understanding of the induced topology.
Consider a unit circle S¹ in R² defined by S¹ := {x ∈ R² | |x − x0| = 1}, whose
center is at x0. Let A ⊂ R² be S¹; then A is not open as measured by Tu on R²,
since it cannot be expressed as a union of open balls in R² (a curve is too thin to
contain any open disk). If we define an induced topology S for A using (1.2.2), then
A is open as measured by S. Moreover, suppose V is an arbitrary segment (arc) of A
(excluding the two endpoints); then, as shown by the bold line in Fig. 1.4, although
V is not an open set as measured by Tu, it is open as measured by S, since there exists
an open disk O ∈ Tu such that V = A ∩ O.
Using the concept of open sets, we can define the continuity of maps between
topological spaces. Two equivalent definitions are given below; the proof of the
equivalence is left as an exercise (Exercise 1.10).
Definition 3a Suppose (X, T ) and (Y, S ) are topological spaces. A map f : X →
Y is said to be continuous if f −1 [O] ∈ T ∀O ∈ S (for the definition of f −1 [O],
see ② of Remark 5 in Sect. 1.1).
Definition 3b Suppose (X, T) and (Y, S) are topological spaces. A map f : X →
Y is said to be continuous at a point x ∈ X if ∀G′ ∈ S that satisfies f(x) ∈ G′,
∃G ∈ T such that x ∈ G and f[G] ⊂ G′. f : X → Y is said to be continuous if it
is continuous at every point x ∈ X.
Remark 1 It is easy to see that if X = Y = R, and T = S = Tu, then Definition 3b
(and thus Definition 3a) reduces to the familiar ε–δ definition.
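
On finite spaces both definitions of continuity can be tested directly, which gives a quick sanity check of their equivalence (it is of course not a proof). In this sketch a map is stored as a Python dict, and the function names and the two-point example space are our own assumptions.

```python
from itertools import product

def preimage(f, O, X):
    """f^{-1}[O] for a map f given as a dict on X."""
    return frozenset(x for x in X if f[x] in O)

def continuous_3a(f, X, TX, TY):
    # Definition 3a: the inverse image of every open set is open.
    return all(preimage(f, O, X) in TX for O in TY)

def continuous_at(f, x, TX, TY):
    # Definition 3b at x: every open G' containing f(x) admits an open G
    # with x in G and f[G] contained in G'.
    return all(
        any(x in G and all(f[p] in G_prime for p in G) for G in TX)
        for G_prime in TY if f[x] in G_prime
    )

def continuous_3b(f, X, TX, TY):
    return all(continuous_at(f, x, TX, TY) for x in X)

X = frozenset({0, 1})
T = {frozenset(), frozenset({0}), X}          # a topology on {0, 1}

# All four maps from X to X, with the same topology T on both sides.
maps = [dict(zip((0, 1), values)) for values in product((0, 1), repeat=2)]
for f in maps:
    print(f, continuous_3a(f, X, T, T), continuous_3b(f, X, T, T))
```

With this topology only the map that swaps 0 and 1 is discontinuous, and the two definitions agree on all four maps.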
Definition 4 Topological spaces (X, T ) and (Y, S ) are said to be homeomorphic
to each other if there exists a map f : X → Y such that, (a) f is one-to-one and onto,
and (b) both f and f −1 are continuous.3 Such a map f is called a homeomorphism
from (X, T ) to (Y, S ).

3 Motivated readers can try to find an example of a continuous map that is both one-to-one and onto
but whose inverse is discontinuous (hint: think about discrete and indiscrete topologies).
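
Condition (b) of Definition 4 really does require the continuity of both f and f⁻¹. The sketch below, which follows the hint about discrete and indiscrete topologies, checks this on a two-point set; the helper names are hypothetical and maps are again stored as dicts.

```python
def preimage(f, O, X):
    return frozenset(x for x in X if f[x] in O)

def is_continuous(f, X, TX, TY):
    """Definition 3a for finite spaces."""
    return all(preimage(f, O, X) in TX for O in TY)

def is_homeomorphism(f, X, TX, Y, TY):
    """Definition 4: f is one-to-one and onto, f and its inverse are continuous."""
    values = set(f.values())
    if set(f) != set(X) or values != set(Y) or len(values) != len(X):
        return False
    inverse = {y: x for x, y in f.items()}
    return is_continuous(f, X, TX, TY) and is_continuous(inverse, Y, TY, TX)

X = frozenset({0, 1})
discrete = {frozenset(), frozenset({0}), frozenset({1}), X}
indiscrete = {frozenset(), X}
identity = {0: 0, 1: 1}

print(is_continuous(identity, X, discrete, indiscrete))        # True
print(is_homeomorphism(identity, X, discrete, X, indiscrete))  # False
print(is_homeomorphism(identity, X, discrete, X, discrete))    # True
```

The identity map is a continuous bijection from (X, discrete) to (X, indiscrete), but its inverse is not continuous, so the two spaces are not shown to be homeomorphic by it.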

Fig. 1.5 ∀x ∈ S¹ (except for a), one can use the way shown in the figure to define its image f(x) on R

The continuity and differentiability of an ordinary function y = f (x) can be repre-


sented by C r , where r is a non-negative integer. C 0 stands for continuous, C r indicates
that the r th derivative exists and is continuous, and C ∞ denotes that derivatives of
all orders exist and are continuous (called smooth). Although one can cleverly gen-
eralize the C 0 property to maps between topological spaces using the concept of
open sets, this cannot be done for C r with r > 0. In fact, the highest requirement
for maps between topological spaces has already been reflected in the definition
of homeomorphism. A homeomorphism f : X → Y not only sets up a one-to-one
correspondence between the points in X and Y , but also sets up a one-to-one corre-
spondence between the open sets of X and Y ; hence, all of the properties that depend
on the topology can be “carried” into Y by f . Therefore, from a purely topological
point of view, two topological spaces that are homeomorphic to each other “cannot
be more alike”, and can be considered to be equivalent.
Example 6 Any open interval (a, b) ⊂ R is homeomorphic to R (the proof is left to
the reader as Exercise 1.11).
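
One standard candidate for such a homeomorphism is a rescaled tangent function. The sketch below is our own choice of map, not necessarily the one intended by the exercise, and the names `to_line` and `to_interval` are hypothetical. It checks numerically that the two maps are mutually inverse; the proof that both are continuous is still left to the reader.

```python
import math

def to_line(x, a, b):
    """Map x in (a, b) to R using a rescaled tangent."""
    mid = (a + b) / 2
    return math.tan(math.pi * (x - mid) / (b - a))

def to_interval(y, a, b):
    """Inverse map from R back to (a, b)."""
    mid = (a + b) / 2
    return mid + (b - a) * math.atan(y) / math.pi

a, b = 2.0, 5.0
for x in (2.1, 3.5, 4.9):
    y = to_line(x, a, b)
    print(x, y, to_interval(y, a, b))   # the last value reproduces x
```

Points near the endpoints a and b are sent far out along R, which is how a bounded interval can be stretched onto the whole real line.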

Example 7 A circle S 1 ⊂ R2 together with the induced topology (induced by Tu of


R2 ) can be treated as a topological space. Is it homeomorphic to R? At first glance,
one might think it is possible to define a homeomorphism from S 1 to R in terms of
Fig. 1.5. However, f is not a map from S 1 to R in that a ∈ S 1 has no image. It is not
difficult to show that f : (S 1 − {a}) → R is a homeomorphism; thus, a circle with
a point removed is homeomorphic to R. However, S 1 is not homeomorphic to R;
we will give a concise proof after Theorem 1.3.8 in Sect. 1.3 (optional reading) in
which the concept of “compactness” that will be discussed in Sect. 1.3 is used. The
key points are: ① S 1 is compact but R is not; ② The image of a compact subset under
a continuous map is still compact. ① and ② imply that S 1 cannot be homeomorphic
to R.
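
The map f : (S¹ − {a}) → R can be written down explicitly once coordinates are chosen. The sketch below assumes, purely for illustration, that S¹ is the unit circle centered at the origin, that a = (0, 1), and that R is identified with the x¹-axis; Fig. 1.5 may use a different but equivalent arrangement. The names `project` and `lift` are hypothetical.

```python
def project(p):
    """Send p = (x1, x2) on the unit circle, p != (0, 1), to a point of R.

    The image is where the straight line through a = (0, 1) and p
    meets the x1-axis.
    """
    x1, x2 = p
    if x2 == 1:
        raise ValueError("the point a = (0, 1) has no image")
    return x1 / (1 - x2)

def lift(t):
    """Inverse map from R back to the circle with the point a removed."""
    d = t * t + 1
    return (2 * t / d, (t * t - 1) / d)

for t in (-10.0, -1.0, 0.0, 0.5, 7.0):
    p = lift(t)
    print(t, p, project(p))   # project(lift(t)) reproduces t
```

As |t| grows, `lift(t)` approaches a from either side without reaching it, which reflects the fact that a itself has no image under f.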

Example 8 Consider a circle and an ellipse on a Euclidean plane. From the viewpoint
of Euclidean geometry they are certainly different: Euclidean geometry has a
concept of distance, with respect to which circles and ellipses are defined. However,
from the viewpoint of pure topology, (R², Tu) is a topological space, and a circle S¹
as well as an ellipse E are two subsets of R²: S¹, E ⊂ R². One can make S¹ and
E topological spaces (S¹, S_{S¹}) and (E, S_E), where S_{S¹} and S_E are the topologies
induced by Tu. It can be proved (and it is intuitively easy to believe) that there
exists a homeomorphism f : (S¹, S_{S¹}) → (E, S_E); thus, from the perspective of
pure topology they are exactly the same. Conversely, if we cut a gap in S¹, the result
will be homeomorphic to R, and thus has a different topology from that of S¹ and
E. If we imagine R² as a rubber sheet and deform it, then the
shape of a curve on the sheet will change with it. However, as long as there is no
cutting or gluing, the curves before and after the deformation are homeomorphic
to each other. Hence, topology is also colloquially called “rubber sheet geometry”.
The major difference between topology and Euclidean geometry is that
the former does not have a concept of distance. At first glance, it may seem that a
geometry without a concept of distance would not be useful, but this is not the case.
A simple example is the electric circuit problem. Although there is a big difference
between Fig. 1.6a and b from the Euclidean point of view, they are identical as circuits.
Conversely, if we cut one of the branches in (b) [turning it into (c)], it will be very
different from the viewpoint of circuits. This is the same as the viewpoint of topology. In fact,
topology is very useful in the study of complex circuits (networks), which forms an
applied branch called “network topology”.

Fig. 1.6 From the perspective of circuits or topology, a and b are identical, while b and c are
different. From the viewpoint of geometry, all three are different

Definition 5 N ⊂ X is called a neighborhood of x ∈ X if ∃ O ∈ T such that x ∈


O ⊂ N . A neighborhood that is an open set is called an open neighborhood.

Remark 2 Suppose X = R, and N = [a, b], then N is a neighborhood of x accord-


ing to Definition 5 if and only if a < x < b. Please pay particular attention to the
“sideswipe” case: if x = a, then N is not a neighborhood of x, since there is no
open set O in R such that x ∈ O ⊂ N . Intuitively, to make [a, b] a neighborhood
of x, x should have “neighbors” on both sides of it. Since none of the “neighbors”
on the left side belongs to [a, b], [a, b] cannot be a neighborhood of x = a. Thus,
Definition 5 reflects this intuitive requirement to some extent. Please also note the
following subtle example: In the topological space [0, ∞) ⊂ R (with the induced topology), the interval [0, 1) is
an open neighborhood of 0, while [0, 1] is a neighborhood of 0 that is not open.

Definition 5’ (Neighborhoods of a Subset) N ⊂ X is called a neighborhood of


A ⊂ X if ∃ O ∈ T such that A ⊂ O ⊂ N .

Theorem 1.2.1 A ⊂ X is open if and only if A is a neighborhood of x ∀x ∈ A.

Proof (A) Suppose A is open, then ∀x ∈ A, ∃A ∈ T such that x ∈ A ⊂ A. Hence,


A is a neighborhood of x according to Definition 5.
(B) Suppose A is a neighborhood of x ∀x ∈ A, and let $O = \bigcup_{x \in A} O_x$ (Ox ∈ T
satisfies x ∈ Ox ⊂ A in Definition 5); then O = A (the reader should complete the
proof of this). Also, from Definition 1 (c) we know that O ∈ T. Hence, A ∈ T, i.e.,
A is an open set. □
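
Theorem 1.2.1 can be confirmed exhaustively on a small finite space. The following sketch is only a check on one example, not a substitute for the proof; the three-point space and the helper names are our own assumptions.

```python
from itertools import combinations

def powerset(items):
    items = list(items)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def is_neighborhood(N, x, T):
    """Definition 5: some open O satisfies x ∈ O ⊂ N."""
    return any(x in O and O <= N for O in T)

X = frozenset({1, 2, 3})
T = {frozenset(), frozenset({1}), frozenset({1, 2}), X}

for A in powerset(X):
    is_open = A in T
    nbhd_of_all_points = all(is_neighborhood(A, x, T) for x in A)
    print(sorted(A), is_open, nbhd_of_all_points)   # the two flags always agree
```

Note that {1, 3} is a neighborhood of the point 1 (because 1 ∈ {1} ⊂ {1, 3}) without being open; it fails to be a neighborhood of 3, in agreement with the theorem.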
Definition 6 C ⊂ X is called a closed set if −C ∈ T (where −C ≡ X − C is the complement of C in X).
Theorem 1.2.2 Closed sets have the following properties:
(a) The intersection of any number of closed sets is a closed set;
(b) The union of a finite number of closed sets is a closed set;
(c) X and ∅ are closed sets.
Proof They can be easily proved using Definitions 1, 6 and De Morgan’s Law. 
Thus, any topological space (X, T ) has two subsets that are both open and closed,
namely X and ∅.
Definition 7 A topological space (X, T ) is said to be connected if it does not
contain a subset that is both open and closed other than X and ∅.
Example 9 Suppose A and B are open intervals of R, and A ∩ B = ∅ (draw a picture
of this). If we use T to represent the topology induced on the subset X ≡ A ∪ B
by the usual topology of R, then, in addition to X and ∅, the topological space
(X, T) also has subsets A and B that are both open and closed (A and B are open
under the induced topology, and they are also closed since they are complements of
one another). Thus, (X, T) is not connected, which coincides with the fact that the
picture of A and B you drew is intuitively not connected.4
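
A finite analogue of Example 9 makes Definition 7 concrete: a space is connected exactly when its only subsets that are both open and closed are X and ∅. The four-point set, the two topologies, and the function names below are our own illustrative assumptions.

```python
def clopen_subsets(X, T):
    """Subsets that are both open (O ∈ T) and closed (X − O ∈ T)."""
    return {O for O in T if (X - O) in T}

def is_connected(X, T):
    """Definition 7 for a finite topological space."""
    return clopen_subsets(X, T) == {frozenset(), X}

X = frozenset({1, 2, 3, 4})
A, B = frozenset({1, 2}), frozenset({3, 4})

T_split = {frozenset(), A, B, X}                  # mimics X = A ∪ B in Example 9
T_chain = {frozenset(), frozenset({1}), A, X}     # another topology on X

print(is_connected(X, T_split))   # False: A and B are both open and closed
print(is_connected(X, T_chain))   # True
```

The same base set is connected under one topology and not under the other, so connectedness is a property of (X, T) rather than of X alone.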
Suppose (X, T ) is a topological space, and let A ⊂ X . The closure, interior and
boundary of A are defined as follows:
Definition 8 The closure of A, denoted by Ā, is the intersection of all of the closed
sets that contain A, i.e.,

$\bar{A} := \bigcap_{\alpha} C_\alpha, \quad A \subset C_\alpha \text{ and } C_\alpha \text{ is closed.}$ (1.2.3)

Definition 9 The interior of A, denoted by i(A), is the union of all open sets that
are contained in A, i.e.,

$i(A) := \bigcup_{\alpha} O_\alpha, \quad O_\alpha \subset A,\ O_\alpha \in T.$ (1.2.4)

Definition 10 The boundary of A is defined as Ȧ := Ā − i(A). x ∈ Ȧ is called a


boundary point. Ȧ is also denoted by ∂ A.
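
Definitions 8 to 10 translate directly into a computation on finite spaces. The sketch below uses a three-point space of our own choosing, and the function names are hypothetical.

```python
def closure(A, X, T):
    """(1.2.3): intersection of all closed sets containing A."""
    result = X
    for O in T:
        C = X - O                 # C is closed because its complement O is open
        if A <= C:
            result = result & C
    return result

def interior(A, T):
    """(1.2.4): union of all open sets contained in A."""
    result = frozenset()
    for O in T:
        if O <= A:
            result = result | O
    return result

def boundary(A, X, T):
    """Definition 10: closure minus interior."""
    return closure(A, X, T) - interior(A, T)

X = frozenset({1, 2, 3})
T = {frozenset(), frozenset({1}), frozenset({1, 2}), X}
A = frozenset({2})

print(sorted(closure(A, X, T)))    # [2, 3]
print(sorted(interior(A, T)))      # []
print(sorted(boundary(A, X, T)))   # [2, 3]
```

Here the boundary {2, 3} has the open complement {1}, so it is a closed set, as one expects of any boundary.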

4 What is more consistent with our intuition is the concept called “arcwise connected”. There are
subtle differences between this and the concept of connected (see the first footnote of Sect. 5.2).

Theorem 1.2.3 Ā, i(A) and Ȧ have the following properties:


(a) ① Ā is a closed set, ② A ⊂ Ā, ③ A = Ā if and only if A is a closed set;
(b) ① i(A) is an open set, ② i(A) ⊂ A, ③ i(A) = A if and only if A ∈ T ;
(c) Ȧ is a closed set.

Proof (a), (b) are easy to prove. (c) can be proved as follows: X − Ȧ = X − [Ā −
i(A)] = (X − Ā) ∪ i(A), where we used the conclusion of Exercise 1.2 in the last
step. Since Ā is closed, X − Ā is open. In addition, i(A) is open; hence
X − Ȧ is open. Therefore, Ȧ is closed. □

The definition below will be used in the whole of Sect. 1.3 and the beginning of
Chap. 2:

Definition 11 A set {Oα} of open sets of X is called an open cover of A ⊂ X if
$A \subset \bigcup_{\alpha} O_\alpha$. We can also say that {Oα} covers A.

1.3 Compactness [Optional Reading]

Definition 1 Suppose {Oα} is an open cover of A ⊂ X. If {Oα1, . . . , Oαn}, a subset of {Oα}
with finitely many elements, also covers A, then we say {Oα} has a finite subcover.

Definition 2 A ⊂ X is said to be compact if any of its open covers has a finite subcover.

Example 1 Suppose x ∈ X , then the one-point subset A ≡ {x} must be compact.

Proof Suppose {Oα} is an arbitrary open cover of A; then there exists at least one element
of {Oα} (denoted by Oα1) that satisfies x ∈ Oα1. Hence, {Oα1} (as a subset of {Oα}) is
an open cover of A ≡ {x}, and thus {Oα} has a finite subcover. □

Example 2 A ≡ (0, 1] ⊂ R is not compact.

Proof Let N represent the set of natural numbers; then {(1/n, 2) | n ∈ N} is an open cover
of A that does not have a finite subcover. □
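
The reason this cover has no finite subcover can be made explicit: any finite subfamily has a largest index N, its union is (1/N, 2), and the point 1/N ∈ A is left uncovered. The sketch below illustrates this with exact rational arithmetic; it assumes N = {1, 2, 3, . . .}, and the function names are hypothetical.

```python
import math
from fractions import Fraction

def covered(x, indices):
    """Is x in the union of the intervals (1/n, 2) with n in indices?"""
    return any(Fraction(1, n) < x < 2 for n in indices)

def missed_point(indices):
    """A point of A = (0, 1] not covered by the finite subfamily."""
    return Fraction(1, max(indices))

def index_covering(x):
    """An index n with 1/n < x, showing that the full family covers A."""
    return math.floor(1 / x) + 1

finite_subfamily = [1, 5, 12]
p = missed_point(finite_subfamily)
print(p, covered(p, finite_subfamily))        # 1/12 False

x = Fraction(1, 1000)
print(index_covering(x), covered(x, [index_covering(x)]))   # 1001 True
```

Every individual point of A is covered by some interval of the family, yet no finite collection of these intervals covers all of A at once.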

Similarly, any open interval or half-open interval in R is noncompact.

Example 3 R is not compact (the proof is left as Exercise 1.16).

Theorem 1.3.1 Any closed interval of R is compact.

Proof Omitted. 

Remark 1 Do not think that closed sets are necessarily compact (even R has noncompact
closed subsets; the reader should try to find an example). Compactness and closedness are
closely related, but not equivalent. Their relationship is shown in the following two theorems.

Fig. 1.7 Figure for the proof of Theorem 1.3.2

To give Theorem 1.3.2, we first introduce the following definition.

Definition 3 A topological space (X, T) is called a T2 space or Hausdorff space if ∀x, y ∈
X, x ≠ y, ∃O1, O2 ∈ T such that x ∈ O1, y ∈ O2 and O1 ∩ O2 = ∅.

Remark 2 Almost all of the common topological spaces (such as Rn ) are T2 spaces. The
indiscrete topological space is an example of a non-T2 space. Hawking and Ellis (1973)
(pp. 13–14) provided an example that is “closer to practical use”.

Theorem 1.3.2 If (X, T ) is a T2 space, A ⊂ X is compact, then A is a closed set.

Proof The theorem obviously holds when A = ∅, and thus we suppose A ≠ ∅ below. All
we have to prove is that X − A ∈ T ; to prove this we only have to show that ∀x ∈ X − A,
∃O ∈ T such that x ∈ O ⊂ X − A (see Theorem 1.2.1). Since X is a T2 space, when x is
given, ∀y ∈ A, ∃O y , G y ∈ T such that x ∈ O y , y ∈ G y and O y ∩ G y = ∅ (see Fig. 1.7).
Varying y over A yields two sets of subsets {G y | y ∈ A} and {O y | y ∈ A}. It is easy to see that
{G y | y ∈ A} is an open cover of A. The compactness of A assures that it must contain a finite
subcover {G y1 , . . . , G yn }. Let O ≡ O y1 ∩ · · · ∩ O yn , then we have: ① O ∈ T ; ② x ∈ O; ③
O ∩ A = ∅ (the proof is left as an exercise), i.e., O ⊂ X − A. Thus, from Theorem 1.2.1
we know that X − A ∈ T , and hence A is closed. 

Theorem 1.3.3 If (X, T ) is compact and A ⊂ X is closed, then A is compact.

Proof Since A is closed, X − A is open. Suppose {Oα} is an arbitrary open cover of A;
then {Oα, X − A} is an open cover of X (here we used the fact that A is a closed set). The
compactness of X implies that {Oα, X − A} has a finite subcover {O1, . . . , On; X − A}.
Since X − A contains no point of A, {O1, . . . , On} covers A, and hence {Oα} has a finite subcover. □

Definition 4 A ⊂ Rn is said to be bounded if ∃ an open ball B ⊂ Rn such that A ⊂ B.

Theorem 1.3.4 A ⊂ R is compact if and only if A is a bounded closed set.

Proof (A) Suppose A is compact. (a) Since R is a T2 space, from Theorem 1.3.2 we know that
A is a closed set. (b) {(−n, n) | n ∈ N} is an open cover of A, the compactness of A assures
that for this open cover there exists a finite subcover {(−1, 1), (−2, 2), . . . , (−m, m)}, i.e.,
A ⊂ (−1, 1) ∪ (−2, 2) ∪ · · · ∪ (−m, m) = (−m, m), and thus A is bounded.
(B) Suppose A is a bounded closed set. The boundedness assures that ∃M ∈ R such
that A ⊂ [−M, M]. From Theorem 1.3.1 we know that [−M, M], being a subset
of (R, Tu), is compact. Let C ≡ [−M, M] and use S to represent the topology induced on
C by Tu; then it can be proved that (C, S) is also compact (exercise). Regarding (C, S)
as (X, T) in Theorem 1.3.3 and noticing that A ⊂ C is closed, we conclude that A is
compact.5 □

Theorem 1.3.5 Suppose A ⊂ X is compact, and f : X → Y is continuous, then f [A] ⊂ Y


is compact.

Proof Suppose {Oα} is an arbitrary open cover of f[A]. The continuity of f assures that
each f⁻¹[Oα] is open, and thus {f⁻¹[Oα]} is an open cover of A. Since A is compact, there
exists a finite subcover {f⁻¹[O1], . . . , f⁻¹[On]}; thus, {O1, . . . , On} is a finite subcover
of {Oα} that covers f[A]. Therefore, f[A] ⊂ Y is compact. □

From Theorem 1.3.5 we can obtain a corollary: homeomorphisms preserve the compactness
of subsets.

Definition 5 A property that is invariant under homeomorphisms is called a topological
property or topological invariant.

Example 4 Compactness, connectedness, and the T2 property are all topological properties.
Boundedness is not a topological property; for example, although an interval (a, b) is
homeomorphic to R, the former is bounded while the latter is unbounded. From this it can
also be seen that length is not a topological property either.

There is a well-known theorem in mathematical analysis: Any continuous function on a


closed interval must attain its maximum and minimum value on this interval. The following
theorem is a generalization of this theorem.

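Theorem 1.3.6 Suppose X is compact (and non-empty) and f : X → R is continuous, then f[X] ⊂ R is
bounded, and f attains its maximum and minimum values on X.

Proof This is a corollary of Theorems 1.3.4 and 1.3.5: f[X] is compact by Theorem 1.3.5, and hence
bounded and closed by Theorem 1.3.4; a non-empty bounded closed subset of R contains its supremum
and infimum. □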

Theorem 1.3.7 Suppose (X1, T1), (X2, T2) are compact, then (X1 × X2, T) is compact (T
is the product topology of T1 and T2).

Proof Omitted. 

Theorem 1.3.8 A ⊂ Rn is compact if and only if it is a bounded closed set.

Proof This is a corollary of Theorem 1.3.7 and previous theorems (Rⁿ is the Cartesian
product of n copies of R). □

Simple Application Example. Consider (R2 , Tu ). Suppose S 1 is an arbitrary circle in


R2 , then it is easy to see that it is a bounded closed set; thus, from Theorem 1.3.8 we
know that it is compact. From Theorem 1.3.5 we know that continuous maps preserve
compactness; however, neither R nor any of its open intervals is compact. Therefore, S 1
cannot be homeomorphic to R or any of its open intervals. Similarly, from Theorems 1.3.1
and 1.3.5 we know that any closed interval cannot be homeomorphic to R or any of its open
intervals.

5 Strictly speaking, to get the conclusion that A is compact from Theorem 1.3.3 we should generalize
this theorem slightly as follows (its proof is similar to the original theorem): suppose C is a compact
subset of a topological space (X, T ), A ⊂ C and A is a closed subset of (X, T ), then A must be
compact.

Definition 6 A map S : N → X is called a sequence of X .

Remark 3 Usually a sequence is denoted by {xn }, where xn ≡ S(n) ∈ X , n ∈ N. {xn } is


actually just a series of ordered points in X .

Definition 7 x ∈ X is called the limit of a sequence {xn } if for any open neighborhood O
of x there exists N ∈ N such that xn ∈ O ∀n > N . If x is the limit of {xn }, then we say {xn }
converges to x.

Definition 8 x ∈ X is called an accumulation point of a sequence {xn } if any open neigh-


borhood of x contains infinitely many points of {xn }.

Remark 4 x is the limit of {xn } ⇒ x is an accumulation point of {xn }, but not vice versa.

One of the conditions in the following theorem involves the concept “second countable”. A
set with a finite number of elements is called a finite set; otherwise, it is called an infinite set.
For a finite set one can always number its elements and count them one by one, so a finite set
must be a countable set. However, an infinite set is not necessarily uncountable; for example,
N is a countable infinite set. Finite sets are simpler than infinite sets, and countable infinite
sets are simpler than uncountable infinite sets. A topological space (X, T) is said to be
second countable if T has a countable subset, {O1, . . . , OK} ⊂ T or {O1, . . .} ⊂ T,
such that any O ∈ T can be expressed as a union of elements from {O1, . . . , OK} [or
{O1, . . .}]. For example, (Rⁿ, Tu) is second countable since Tu has a countable subset such
that any O ∈ Tu can be expressed as a union of elements from this subset. (This countable
subset consists of the open balls whose centers have rational natural coordinates and
whose radii are rational.)

Theorem 1.3.9 If A ⊂ X is compact, then any sequence in A has an accumulation point


within A. Conversely, if X is second countable and any sequence in A ⊂ X has an accumu-
lation point within A, then A is compact.

Proof Omitted. 

Exercises

˜1.1. Show that A − B = A ∩ (X − B), ∀A, B ⊂ X.
˜1.2. Show that X − (B − A) = (X − B) ∪ A, ∀A, B ⊂ X.
˜1.3. Fill in the following table with “True” or “False”:

f : R → R              is a one-to-one map    is an onto map
f(x) = x³              ____                   ____
f(x) = eˣ              ____                   ____
f(x) = cos x           ____                   ____
f(x) = 5, ∀x ∈ R       ____                   ____

˜1.4. Determine whether each of the following statements is true or false, and give
a brief explanation:
(a) the tangent function is a map from R → R;


(b) a logarithmic function is a map from R → R;
(c) (a, b] ⊂ R is an open set measured by Tu ;
(d) [a, b] ⊂ R is a closed set measured by Tu .

˜1.5. Give a counterexample to show that the statement “the intersection of an


infinite number of open subsets of (R, Tu ) is open” is not true.
˜1.6. Show that the induced topology defined in Example 5 of Sect. 1.2 satisfies
the three conditions in Definition 1 of that section.
1.7. Use an example to show that (R3 , Tu ) has a subset that is neither open nor
closed.
˜1.8. Is the constant map f : (X, T ) → (Y, S ) continuous? Why?
˜1.9. Suppose T is a discrete topology on a set X , and S is an indiscrete topology
on a set Y .
(a) Find all the continuous maps from (X, T ) to (Y, S ). (b) Find all the
continuous maps from (Y, S ) to (X, T ).
˜1.10. Show that Definitions 3a and 3b are equivalent.
1.11. Show that any open interval (a, b) ⊂ R is homeomorphic to R.
1.12. Suppose X 1 and X 2 are subsets of R, X 1 ≡ (1, 2) ∪ (2, 3), X 2 ≡ (1, 2) ∪
[2, 3). The topologies induced on X 1 and X 2 by the usual topology of R,
respectively, are denoted by T1 and T2 . Are the topological spaces (X 1 , T1 )
and (X 2 , T2 ) connected?
1.13. Is the topological space formed by a set X with the discrete topology T
connected?
˜1.14. Suppose A ⊂ B. Show that (a) Ā ⊂ B̄. Hint: A ⊂ B implies that B̄ is a
closed set that contains A. (b) i(A) ⊂ i(B).
˜1.15. Show that x ∈ Ā ⇔ the intersection of A and any neighborhood of x is
non-empty. A hint for ⇒: suppose O ∈ T and O ∩ A = ∅. First show that
A ⊂ X − O, then (using the definition of closure) show that Ā ⊂ X − O.
1.16. Show that R is not compact.

Reference

Hawking, S. W. and Ellis, G. F. R. (1973), The Large Scale Structure of Space-Time, Cambridge
University Press, Cambridge.
Chapter 2
Manifolds and Tensor Fields

2.1 Differentiable Manifolds

Physics cannot be done without a background space. For example, classical mechan-
ics and electrodynamics study the time evolution of matter and electromagnetic fields
in R3 , statistical physics and Hamiltonian theory often use phase spaces, special rel-
ativity has R4 as its spacetime background, etc. Colloquially, these spaces are all
“continuous” rather than consisting of discrete points. The spacetime of general rel-
ativity is also a “continuous 4-dimensional space”, which locally looks like R4 , yet
is not necessarily R4 . However, the meaning of the word “continuous” is not yet
clear. “Differentiable manifold” (or “manifold” for short) is the accurate term used
for all kinds of “continuous spaces” with differential structures. Rn is the simplest
n-dimensional manifold. Roughly speaking, differentiable manifolds are topological
spaces with differential structures, which look locally like Rn , but globally may be
different from Rn . The precise definition is as follows:
Definition 1 A topological space M is called an n-dimensional differentiable
manifold, or n-dimensional manifold for short, if M has an open cover {Oα}, i.e.,
$M = \bigcup_{\alpha} O_\alpha$ (see Definition 11 of Sect. 1.2), satisfying
(a) for each Oα ∃ a homeomorphism ψα : Oα → Vα (Vα is an open subset of Rⁿ
measured by the usual topology);
(b) If Oα ∩ Oβ ≠ ∅, then the composite map ψβ ◦ ψα⁻¹ (see Fig. 2.1) is C∞
(smooth).1

Remark 1 ① ψβ ◦ ψα−1 is a map from ψα [Oα ∩ Oβ ] ⊂ Rn to ψβ [Oα ∩ Oβ ] ⊂ Rn .


Since each point of Rn has n natural coordinates, ψβ ◦ ψα−1 provides n functions
of n variables (see Remark 4 of Sect. 1.1). “ψβ ◦ ψα−1 is C ∞ ” means that all these
functions of n variables are C ∞ (the smoothness of n-variable functions has already

1 Definition 1 is the general definition of a smooth manifold. In this text, and usually in physics,
manifolds also satisfy the following additional conditions: as a topological space, M is Hausdorff and
second countable (for both see Sect. 1.3). From now on, our manifolds will satisfy these conditions.
© Science Press 2023 19
C. Liang and B. Zhou, Differential Geometry and General Relativity,
Graduate Texts in Physics, https://doi.org/10.1007/978-981-99-0022-0_2

Fig. 2.1 Figure for the definition of a manifold. ψβ ◦ ψα⁻¹ is the composite map of ψα⁻¹ and ψβ

been defined in calculus).2 ② Suppose p ∈ Oα , then ψα ( p) ∈ Rn , and thus the point


ψα ( p) has n natural coordinates. It is natural to call these n numbers the coordinates
of p acquired through the map ψα . Being a topological space, M in general does
not have coordinates; being a manifold, however, its elements (points) in Oα can
acquire coordinates from the map ψα. If Oα ∩ Oβ ≠ ∅, then the points in Oα ∩ Oβ
can acquire coordinates from either ψα or ψβ, and these two sets of coordinates
are different in general. We say that (Oα, ψα) forms a (local) coordinate system
whose coordinate patch is Oα; (Oβ, ψβ) forms another coordinate system whose
coordinate patch is Oβ. Thus, a point in Oα ∩ Oβ has at least two sets of coordinates,
denoted by {x^μ} and {x′^ν} (μ, ν = 1, . . . , n), respectively. The map ψβ ◦ ψα⁻¹
naturally provides a relation connecting these two sets of coordinates, which is represented
by n functions of n variables as follows:

$x'^1 = \phi^1(x^1, \ldots, x^n), \ \ldots, \ x'^n = \phi^n(x^1, \ldots, x^n).$

This relation is called a coordinate transformation. Condition (b) in Definition 1
assures that the functions x′^μ = φ^μ(x^1, . . . , x^n) in a coordinate transformation
are all C∞. For convenience's sake, {x^ν} is also usually called a coordinate
system, although we cannot see the domain of the coordinate patch explicitly from
{x^ν}. Physicists also often denote x′^μ = φ^μ(x^1, . . . , x^n) as x′^μ = x′^μ(x^1, . . . , x^n).
Definition 2 In mathematics, a coordinate system (Oα , ψα ) is also called a chart;
the collection of all charts {(Oα , ψα )} satisfying conditions (a) and (b) in Definition 1
is called an atlas. Condition (b) is also called the compatibility condition. Therefore,
any two charts in an atlas are compatible.
Example 1 Suppose M = (R2 , Tu ). Choose O1 = R2 , ψ1 = identity map (i.e., each
image coincides with its inverse image), then {(O1 , ψ1 )} is an atlas that contains
only one chart. Thus, R2 is a 2-dimensional manifold that can be covered by a single
coordinate patch, called a trivial manifold. According to this atlas, the coordinates
of each point in R2 are the natural coordinates it is endowed with as an element of

2 The manifold in Definition 1 is short for smooth manifold. Change C ∞ in condition (b) to C r (r
is a natural number), then it becomes the definition of a C r manifold.

R2 . Of course points in R2 can also be described by other coordinates (such as polar


coordinates). This is actually nothing but choosing another chart (O2 , ψ2 ) that is
compatible with (O1 , ψ1 ), where ψ2 maps p ∈ O2 to ψ2 ( p) ∈ R2 , and referring to
the natural coordinates of ψ2 ( p) as the new coordinates of p. However, it should be
noted that the coordinate patch of R2 does not necessarily include all points of R2
(e.g., polar coordinates).
Similarly, we can see that Rn is an n-dimensional trivial manifold.

Example 2 Suppose M = (S 1 , S ), where S 1 := {x ∈ R2 | |x − o| = 1} is a unit cir-


cle with center o and S is the topology on S 1 induced by Tu of R2 . Since S 1 is not
homeomorphic to R (see Example 7 in Sect. 1.2), any atlas that defines (S 1 , S ) as
a manifold cannot contain only one chart. Suppose x 1 and x 2 are the natural coor-
dinates of R2 , and O1+ , O1− , O2+ , O2− are “open semicircles” defined as follows:
Oi+ = {(x 1 , x 2 ) ∈ S 1 |x i > 0}, Oi− = {(x 1 , x 2 ) ∈ S 1 |x i < 0}, i = 1, 2, then {Oi± }
can cover S 1 . Define the homeomorphism ψi± from Oi± to the open interval (−1, 1)
as the following “projection map”: ψ1± ((x 1 , x 2 )) = x 2 , ψ2± ((x 1 , x 2 )) = x 1 , then it
is easy to prove that the overlap regions of the open semicircles satisfy the compati-
bility condition (Exercise 2.1), and thus S 1 is a 1-dimensional manifold. In fact, one
can cover S 1 with an atlas containing only two charts; motivated readers may try to
verify that.
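As a concrete illustration of the compatibility condition in Example 2, the following Python sketch evaluates the transition map ψ2+ ◦ (ψ1+)−1 on the overlap O1+ ∩ O2+ , where it reduces to s ↦ √(1 − s²) for 0 < s < 1. The function names (psi1_plus, transition, and so on) are illustrative choices, not notation from the text. The sketch only spot-checks the map and its closed-form first derivative at a few points; it does not replace the proof asked for in Exercise 2.1.

```python
import math

# Charts of Example 2 on the unit circle: open semicircles with projection maps.
# All function names here are hypothetical, chosen only for this illustration.

def psi1_plus(point):
    """Chart on O1+ = {x1 > 0}: project a point of the circle onto x2."""
    x1, x2 = point
    if x1 <= 0:
        raise ValueError("point is not in O1+")
    return x2

def psi1_plus_inverse(s):
    """Inverse of psi1_plus: the point of O1+ whose x2-coordinate is s."""
    return (math.sqrt(1.0 - s * s), s)

def psi2_plus(point):
    """Chart on O2+ = {x2 > 0}: project a point of the circle onto x1."""
    x1, x2 = point
    if x2 <= 0:
        raise ValueError("point is not in O2+")
    return x1

def transition(s):
    """psi2_plus o psi1_plus^{-1}, defined on the overlap where 0 < s < 1."""
    return psi2_plus(psi1_plus_inverse(s))

def transition_derivative(s):
    """Closed-form derivative of sqrt(1 - s^2); finite for 0 < s < 1."""
    return -s / math.sqrt(1.0 - s * s)

# Compare a central-difference derivative with the closed form at s = 0.5.
h = 1e-6
numeric_derivative = (transition(0.5 + h) - transition(0.5 - h)) / (2.0 * h)
value_at_06 = transition(0.6)
```

The derivative diverges only as s → 1, which lies outside the overlap, so the transition map is well behaved on its whole domain (0, 1).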

Example 3 Suppose M = (S 2 , S ), where S 2 = {x ∈ R3 | |x − o| = 1} is a unit


sphere with center o and S is the topology on S 2 induced by Tu of R3 . Follow-
ing the method in the last example, one can cover S 2 with six open hemispheres and
define the homeomorphism from each hemisphere to the corresponding open disk
on R2 [Wald (1984)]. By showing that the overlap regions satisfy the compatibility
condition one can show that S 2 is a 2-dimensional manifold. It can also be proved
that S 2 can be covered with an atlas that contains only two charts. The surface of
the Earth can be regarded as S 2 , while it looks locally like R2 . You cannot tell that
human beings are living on a sphere from just a city map of Beijing (R2 ). On the
contrary, a globe tells you clearly that the surface of the Earth is globally not R2 .

Suppose the atlas {(Oα , ψα )} defines a topological space M as a manifold, then


any two charts in this atlas are naturally compatible. However, we can also use another
atlas {(Oβ , ψβ )} to define the same M as a manifold. Now there are two possibilities:
① These two atlases are not compatible with each other, i.e., there exist Oα and Oβ
such that Oα ∩ Oβ ≠ ∅, and ψα and ψβ do not satisfy condition (b) of Definition 1 on
Oα ∩ Oβ . We then say that these two atlases define M as two different differentiable
manifolds, and these two atlases represent two different differential structures (The
concept of differential structures should be understood gradually, rather than in one
step.); ② These two atlases are compatible, then we say they define M as the same
differentiable manifold (with only one differential structure). For convenience, we
may treat {(Oα , ψα ); (Oβ , ψβ )} as one atlas. Furthermore, we may even put all of
the charts compatible with (Oα , ψα ) together and create one large atlas. Later on,
when we talk about a manifold, we always assume that the largest possible atlas
has been chosen as the differential structure, so that we can perform any coordinate
transformation.

[Fig. 2.2: The map ψ′β ◦ f ◦ ψα−1 corresponds to n′ functions of n variables, whose C r -differentiability defines the C r -differentiability of f : M → M′ ]

A significant difference between differentiable manifolds and topological spaces
is that the former has differential structures additional to topological structures.
Therefore, for a map between two manifolds we can not only talk about whether
it is continuous, but also whether it is differentiable, or even whether it is C ∞ . Suppose
M and M′ are two manifolds whose dimensions are n and n′ , whose atlases are
{(Oα , ψα )} and {(O′β , ψ′β )}, respectively, and f : M → M′ is a continuous map
(see Fig. 2.2). ∀ p ∈ M, choose any coordinate system (Oα , ψα ) such that p ∈ Oα ,
and any coordinate system (O′β , ψ′β ) such that f ( p) ∈ O′β , then ψ′β ◦ f ◦ ψα−1 is a map
from Vα ≡ ψα [Oα ] (or an open subset of Vα ) to Rn′ . Thus, this map corresponds to
n′ functions of n variables, and their C r -differentiability can be used to define the
C r -differentiability of f : M → M′ .
Definition 3 f : M → M′ is called a C r map if ∀ p ∈ M, the n′ functions of n
variables corresponding to the map ψ′β ◦ f ◦ ψα−1 are of class C r .
Remark 2 Since charts in the same atlas are compatible, the definition above is
independent of the choice of (Oα , ψα ) and (O′β , ψ′β ).
Definition 4 Differentiable manifolds M and M′ are said to be diffeomorphic to each
other if ∃ f : M → M′ satisfying (a) f is one-to-one and onto; (b) f and f −1 are
C ∞ . Such an f is called a diffeomorphism from M to M′ .
Remark 3 ① Being a diffeomorphism is the highest requirement one can impose on a
map of manifolds (if there are additional structures imposed on these manifolds then
it is another matter); manifolds that are diffeomorphic to each other can be considered
to be equivalent. ② A necessary condition for two manifolds to be diffeomorphic to
each other is that they have the same dimension. ③ In Definition 1, ψα : Oα → Vα was
required to be a homeomorphism instead of a diffeomorphism since a diffeomorphism
is a relationship between manifolds, and we did not have the concept of manifold
yet. But now that Definition 4 has been introduced, one may naturally ask: if we
treat Oα and Vα in Definition 1 as manifolds, is ψα a diffeomorphism? The answer
is affirmative, and motivated readers should try to verify this. From this, one can
further their understanding of the statement “a manifold M looks locally like Rn ”.

A simple, but important, example of a map f : M → M′ is the case M′ = R.


In this case each point of M corresponds to a real number, and hence we have the
following definition:
Definition 5 f : M → R is called a function on M or a scalar field on M. If f is
C ∞ , then it is called a smooth function on M. The collection of all smooth functions
on M is denoted by F M , abbreviated with F when there is no confusion. From now
on, functions will refer to smooth functions unless stated otherwise.

Example 4 The electric potential of a point charge at a point q in R3 is a smooth


function on the manifold M ≡ R3 − {q}.

Example 5 The μth coordinate x μ of a coordinate system (O, ψ) is a smooth func-


tion defined on O; interested readers may try to show that it satisfies the definition
of a smooth function.

Since n coordinates determine a unique point p in O and from f : M → R we


get a unique real number f ( p), when a function f : M → R is combined with a
coordinate system (O, ψ), we get a function of n variables F(x 1 , . . . , x n ). However,
when f is combined with another coordinate system (O′ , ψ′ ), we have another function
of n variables F′ (x′ 1 , . . . , x′ n ). The function relations F and F′ are different
since F = f ◦ ψ −1 while F′ = f ◦ ψ′ −1 . Thus, the multivariate function (function
relation) corresponding to f : M → R is coordinate dependent. One should pay
attention to distinguishing a function (scalar field) f from the multivariate function
which comes from combining f with a coordinate system.
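The distinction between a scalar field and its coordinate expressions can be made concrete with a small Python sketch. It assumes, purely for illustration, the plane as the manifold, the scalar field f( p) = x² + y², the natural (Cartesian) coordinates as (O, ψ) and polar coordinates as (O′ , ψ′ ); the names psi_inverse, psi_prime_inverse, F and F_prime are hypothetical.

```python
import math

def f(point):
    """Scalar field on the plane; a point is stored by its natural coordinates."""
    x, y = point
    return x * x + y * y

def psi_inverse(x, y):
    """Inverse of the Cartesian chart: coordinates (x, y) -> point."""
    return (x, y)

def psi_prime_inverse(r, phi):
    """Inverse of the polar chart: coordinates (r, phi) -> point."""
    return (r * math.cos(phi), r * math.sin(phi))

def F(x, y):
    """F = f o psi^{-1}: the field expressed in Cartesian coordinates."""
    return f(psi_inverse(x, y))

def F_prime(r, phi):
    """F' = f o psi'^{-1}: the same field expressed in polar coordinates."""
    return f(psi_prime_inverse(r, phi))

# Same argument values, different function relations:
same_arguments = (F(1.0, 2.0), F_prime(1.0, 2.0))

# Same point of the manifold described in both systems, same field value:
same_point = (F(3.0, 4.0), F_prime(5.0, math.atan2(4.0, 3.0)))
```

Feeding the same pair of numbers to F and F′ gives different results because the pair labels different points in the two systems, whereas evaluating both at the coordinates of one and the same point gives the single value f | p .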
Suppose M, N are manifolds, then they must be topological spaces, and thus
M × N is also a topological space. It is not difficult to go a step further and define
M × N as a manifold using the manifold structures of M and N [see Wald (1984)
p. 13]. Suppose M and N have dimensions m and n respectively, then the dimension
of M × N is m + n, i.e., dim(M × N ) = dim M + dim N .

2.2 Tangent Vectors and Tangent Vector Fields

2.2.1 Tangent Vectors

First we review the definition of a vector space (i.e., linear space) in linear algebra.
Definition 1 A vector space over the field of real numbers is a set V together with
two maps, namely V × V → V (called addition) and R × V → V (called scalar
multiplication), satisfying the following conditions:
(a) v1 + v2 = v2 + v1 , ∀v1 , v2 ∈ V ;
(b) (v1 + v2 ) + v3 = v1 + (v2 + v3 ) , ∀v1 , v2 , v3 ∈ V ;
(c) ∃ a zero element 0 ∈ V such that 0 + v = v , ∀v ∈ V ;
(d) α1 (α2 v) = (α1 α2 )v , ∀v ∈ V, α1 , α2 ∈ R;

(e) (α1 + α2 )v = α1 v + α2 v , ∀v ∈ V, α1 , α2 ∈ R;
(f) α(v1 + v2 ) = αv1 + αv2 , ∀v1 , v2 ∈ V, α ∈ R;
(g) 1 · v = v , 0 · v = 0 , ∀v ∈ V .

Remark 1 From these 7 conditions we can deduce that: ∀v ∈ V , ∃ u ∈ V such that


v + u = 0. u is conventionally denoted by −v.

We will also often denote the zero element of V as 0; that is, the symbol 0 stands
for both 0 ∈ R and 0 ∈ V . The reader should be able to identify its meaning by the
context or equation.
Algebraically speaking, any set that satisfies Definition 1 is called a vector space,
and any element of it is called a vector. Suppose p is a point in 3-dimensional
Euclidean space, and V p is the collection of straight line segments (or arrows v) that
start at p with all possible directions and lengths. Define the addition of two arrows
as adding them by the parallelogram law, and define the scalar multiplication α v
(∀α ∈ R, v ∈ V p ) as the manipulation that preserves the direction of the arrow (or
turns into the opposite direction when α < 0) while multiplying its length by |α|,
then V p is a vector space according to Definition 1, and hence each arrow starting at
p is a vector. We want to generalize this kind of concept of vectors to an arbitrary
manifold M; that is, we want to define infinitely many vectors at each point p of
M, the collection of which forms a vector space at p. Since “straight line segment”,
“direction” and “length” are not defined on a general manifold (yet), the way we
defined vectors in terms of arrows cannot be carried over to general manifolds. To
generalize, we should pick the most essential property of an arrow which is the
easiest to be generalized. Suppose v is an arrow at an arbitrary point p in R3 , then
we can take the directional derivative of an arbitrary C ∞ function f on R3 along v,
and the value of this derivative function at p is a real number. Thus, v is a map that
turns f into a real number. Let FR3 represent the collection of all smooth functions
on R3 , then f ∈ FR3 , and hence v is a map from FR3 to R, i.e., v : FR3 → R.
Since the manipulation of taking the directional derivative is linear and satisfies the
Leibniz rule, we finally have found the essential property of an arrow v that can be
easily generalized: it is a linear map from FR3 to R that satisfies the Leibniz rule.
Generalizing this to an arbitrary point p of an arbitrary manifold M, we arrive at the
following definition:
Definition 2 A map v : F M → R is called a vector at a point p ∈ M if ∀ f, g ∈
F M , α, β ∈ R we have
(a) (Linearity) v(α f + βg) = αv( f ) + βv(g);
(b) (Leibniz rule) v( f g) = f | p v(g) + g| p v( f ), where f | p stands for the value
of the function f at p, which can also be denoted by f ( p).

Remark 2 Since f and g are functions on M, f g is also a function on M whose


value at any point p of M is defined as the product of f ( p) and g( p).
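To see the two conditions of Definition 2 at work in the motivating case M = R3 , the sketch below models an arrow at p as the directional-derivative map f ↦ v( f ) and checks linearity and the Leibniz rule numerically. The helper make_vector, the sample functions f and g, and the central-difference step are all assumptions made for this illustration; the finite difference only approximates the derivative, so the checks hold up to a small tolerance.

```python
import math

def make_vector(p, arrow, h=1e-5):
    """Return the map f -> directional derivative of f at p along `arrow`.

    The derivative is approximated by a central difference with step h.
    """
    def v(f):
        forward = [pi + h * ai for pi, ai in zip(p, arrow)]
        backward = [pi - h * ai for pi, ai in zip(p, arrow)]
        return (f(forward) - f(backward)) / (2.0 * h)
    return v

p = (1.0, 2.0, 0.5)
v = make_vector(p, (0.3, -1.0, 2.0))

def f(x):
    return x[0] * x[1] + math.sin(x[2])

def g(x):
    return math.exp(x[0]) * x[2]

alpha, beta = 2.0, -3.0

# (a) Linearity: v(alpha f + beta g) = alpha v(f) + beta v(g)
lhs_linear = v(lambda x: alpha * f(x) + beta * g(x))
rhs_linear = alpha * v(f) + beta * v(g)

# (b) Leibniz rule: v(f g) = f|p v(g) + g|p v(f)
lhs_leibniz = v(lambda x: f(x) * g(x))
rhs_leibniz = f(p) * v(g) + g(p) * v(f)
```

Note that v takes a whole function as its argument and returns a single real number, which is exactly the viewpoint generalized in Definition 2.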
[Optional Reading 2.2.1]
Theorem 2.2.1 Suppose f 1 , f 2 ∈ F M are equal in a neighborhood N of p ∈ M, i.e.,
f 1 | N = f 2 | N , then for any vector v at p we have v( f 1 ) = v( f 2 ).

Proof First we prove two lemmas.

Lemma 1 If f ∈ F M is a zero function, i.e., f | p = 0 ∀ p ∈ M, then for any vector v at p


we have v( f ) = 0.

Proof of Lemma 1 ∀g ∈ F M we have f + g = g. The linearity of the action of v yields

v(g) = v( f + g) = v( f ) + v(g) ,

and hence v( f ) = 0. 

Lemma 2 If f ∈ F M is zero in a neighborhood N of p ∈ M, i.e., f | N = 0, then for any


vector v at p we have v( f ) = 0.

Proof of Lemma 2 Let h ∈ F M satisfy h| M−N = 0 and h| p ≠ 0, then f h| M = 0, and from


Lemma 1 we have v( f h) = 0. On the other hand, the Leibniz rule also yields v( f h) =
f | p v(h) + h| p v( f ) = h| p v( f ), and hence h| p v( f ) = 0. Since h| p ≠ 0, we therefore
have v( f ) = 0. 
Now we can prove Theorem 2.2.1. Let f = f 1 − f 2 , then f | N = 0. From Lemma 2 we know
that v( f ) = 0. On the other hand, from the linearity we know that v( f ) = v( f 1 − f 2 ) =
v( f 1 ) − v( f 2 ); therefore, v( f 1 ) = v( f 2 ). 

Remark 3 Definition 2 requires that a vector v at p can only act on f ∈ F M . If a function


f is only defined on a neighborhood U (≠ M) of p ∈ M, i.e., f ∈ FU , f ∉ F M , then v( f )
is meaningless. However, one can always find an f¯ ∈ F M and a neighborhood N ⊂ U of
p such that f¯| N = f | N , and thus we can define v( f ) as v( f¯). Although for the same f
there are infinitely many f¯ that satisfy the requirement above, Theorem 2.2.1 assures that
v( f¯) are the same for all these f¯. Thus, it is legal to define v( f ) in terms of v( f¯). This
conclusion is very useful. For example, suppose (O, ψ) is a coordinate system of M, then
the μth coordinate x μ is a function on O (instead of M), but it is still valid to talk about a
vector v at any point p of O acting on x μ ; that is, v(x μ ) is meaningful.

[The End of Optional Reading 2.2.1]

According to Definition 2, to define a vector at p all we have to do is assign a map


from F M to R that satisfies conditions (a) and (b); that is, assign a corresponding
rule according to which each f ∈ F M corresponds to a specific real number. Since
there are lots of maps like this, there are (infinitely) many vectors at p. For example,
suppose (O, ψ) is a coordinate system with coordinates x μ , then any smooth function
f ∈ F M on M combined with (O, ψ) yields a function of n variables F(x 1 , . . . , x n ).
In this way, we can define n vectors for any point p in O, denoted by X μ (where
μ = 1, . . . , n), whose action on an arbitrary f ∈ F M , i.e., X μ ( f ), are defined as the
following real number

$$X_\mu(f) := \left.\frac{\partial F(x^1, \ldots, x^n)}{\partial x^\mu}\right|_p, \quad \forall f \in \mathscr{F}_M, \tag{2.2.1}$$

where ∂ F(x 1 , . . . , x n )/∂ x μ | p is an abbreviation for ∂ F(x 1 , . . . , x n )/


∂ x μ |(x 1 ( p),...,x n ( p)) . We will also abbreviate ∂ F(x 1 , . . . , x n )/∂ x μ as ∂ f (x 1 , . . . , x n )/
∂ x μ or ∂ f (x)/∂ x μ , even ∂ f /∂ x μ ; the reader should recognize that the f in ∂ f /∂ x μ

stands for a function of n variables F(x 1 , . . . , x n ) rather than a scalar field f . Thus,
(2.2.1) can be shortened as

$$X_\mu(f) := \left.\frac{\partial f(x)}{\partial x^\mu}\right|_p, \quad \forall f \in \mathscr{F}_M. \tag{2.2.1'}$$

Theorem 2.2.2 Let V p represent the collection of all vectors at p in M, then V p


is an n-dimensional vector space (where n is the dimension of M), i.e., dim V p =
dim M ≡ n.

Proof (A) Define the addition, scalar multiplication and zero element according to
the following three equations; it is not hard to verify that V p satisfies Definition 1,
and hence is a vector space.
(1) (v1 + v2 )( f ) := v1 ( f ) + v2 ( f ), ∀ f ∈ F M , v1 , v2 ∈ V p ;
(2) (αv)( f ) := α · v( f ), ∀ f ∈ F M , v ∈ V p , α ∈ R;
(3) Define the zero element 0 ∈ V p to satisfy 0( f ) = 0, ∀ f ∈ F M .
(B) Choose an arbitrary coordinate system whose coordinate patch contains p,
then (2.2.1) defines n vectors X μ at p, where μ = 1, . . . , n. We want to show that they
are linearly independent. Suppose n real numbers α μ (μ = 1, . . . , n) are such that
α μ X μ = 0. (In this text we adopt the Einstein summation convention; that is, repeated
indices are assumed to be summed over; here α μ X μ is short for $\sum_{\mu=1}^{n} \alpha^\mu X_\mu$.) Since
the coordinates x ν (ν = 1, . . . , n) can be treated as functions on the coordinate patch,
both sides of this equation should give us the same result when applied to x ν . According
to the definition of the zero element 0 [(3) of (A) in this proof], the action of the
right-hand side yields

$$0(x^\nu) = 0, \tag{2.2.2a}$$

while the action of the left-hand side yields

$$\alpha^\mu X_\mu(x^\nu) = \alpha^\mu \left.\frac{\partial x^\nu}{\partial x^\mu}\right|_p = \alpha^\mu \delta^\nu{}_\mu = \alpha^\nu, \tag{2.2.2b}$$

where we used (2.2.1) in the first step, and δ μ ν is defined as

$$\delta^\mu{}_\nu \equiv \begin{cases} 1, & \mu = \nu; \\ 0, & \mu \neq \nu. \end{cases}$$

Comparing (2.2.2a) and (2.2.2b), we can see that α ν = 0, ν = 1, . . . , n. Therefore,
X 1 , . . . , X n are linearly independent, and thus dim V p ≥ n.
(C) It remains to show that ∀v ∈ V p , we have

$$v = v^\mu X_\mu, \tag{2.2.3}$$

where

$$v^\mu = v(x^\mu) \tag{2.2.3'}$$

[this is the tricky step; see Wald (1984) p. 16 for a proof]. Equation (2.2.3) indicates
that any element of V p can be expressed linearly in terms of these X μ , and (2.2.3′)
says that its coefficients are the real numbers given by the action of v on x μ . The
combination of (B) and (C) indicates that {X 1 , . . . , X n } is a basis of V p , and therefore
dim V p = n. □

Definition 3 {X 1 , . . . , X n } of any point p in a coordinate patch is called a coordi-


nate basis; each X μ is called a coordinate basis vector, and the coefficients v μ of
v ∈ V p expressed by {X μ } are called the coordinate components of v.

Theorem 2.2.3 Suppose {x μ } and {x′ ν } are two coordinate systems, the intersection
of their coordinate patches is non-empty, p is a point in the intersection, v ∈ V p , {v μ }
and {v′ ν } are the coordinate components of v in these two systems, then

$$v'^\nu = \left.\frac{\partial x'^\nu}{\partial x^\mu}\right|_p v^\mu, \tag{2.2.4}$$

where x′ ν is short for the coordinate transformation function x′ ν (x σ ) between these
two systems.

Proof We first derive the relationship between the two coordinate bases {X μ } and {X′ ν }
at p. From the definition of the coordinate basis vectors we can see that ∀ f ∈ F M ,

$$X_\mu(f) = \left.\frac{\partial f(x)}{\partial x^\mu}\right|_p, \qquad X'_\nu(f) = \left.\frac{\partial f'(x')}{\partial x'^\nu}\right|_p,$$

where f (x) and f′ (x′ ) are abbreviations for f (x 1 , . . . , x n ) and f′ (x′ 1 , . . . , x′ n ),
namely the two functions of n variables coming from the combination of the scalar
field f : M → R and the coordinate systems {x μ } and {x′ ν }, respectively. Suppose
q is an arbitrary point in the intersection of the two coordinate patches, then the
scalar field f has the value f |q at q such that f |q = f (x(q)) = f′ (x′ (q)), denoted
by f (x) = f′ (x′ ) for short. On the other hand, the x′ ν are given by n functions
of x μ (coordinate transformation relations), denoted by x′ ν = x′ ν (x) for short, and
thus f (x) = f′ (x′ (x)). Hence,

$$X_\mu(f) = \left.\frac{\partial f'(x'(x))}{\partial x^\mu}\right|_p = \left.\frac{\partial f'(x')}{\partial x'^\nu}\right|_p \left.\frac{\partial x'^\nu}{\partial x^\mu}\right|_p = \left.\frac{\partial x'^\nu}{\partial x^\mu}\right|_p X'_\nu(f), \quad \forall f \in \mathscr{F}_M.$$

The equation above indicates that the maps X μ and (∂ x′ ν /∂ x μ )| p X′ ν are equal,
i.e.,

$$X_\mu = \left.\frac{\partial x'^\nu}{\partial x^\mu}\right|_p X'_\nu. \tag{2.2.5}$$

Therefore, v = v μ X μ = v′ ν X′ ν can be expressed as

$$v^\mu \left.\frac{\partial x'^\nu}{\partial x^\mu}\right|_p X'_\nu = v'^\nu X'_\nu.$$

Since the n basis vectors in {X′ ν } are linearly independent, we arrive at (2.2.4). □

[Fig. 2.3: The map C : I → M is called a curve on M]

Equation (2.2.4) is called the vector (components) transformation law; many books
use this equation as the definition of a vector.
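A small numerical check of the transformation law may help. The Python sketch below assumes the plane with Cartesian coordinates {x, y} as the unprimed system and polar coordinates {r, ϕ} as the primed one, computes the polar components of a vector from the Jacobian as in (2.2.4), and compares them with the components obtained by letting the vector act on the new coordinate functions as in (2.2.3′). All function names are hypothetical, and the second computation uses central differences, so agreement is only approximate.

```python
import math

def jacobian_polar_wrt_cartesian(x, y):
    """Matrix of partial derivatives d(r, phi)/d(x, y) at the point (x, y)."""
    r2 = x * x + y * y
    r = math.sqrt(r2)
    return [[x / r, y / r],
            [-y / r2, x / r2]]

def transform_components(v_cart, x, y):
    """Transformation law: v'^nu = (dx'^nu / dx^mu) v^mu, summed over mu."""
    J = jacobian_polar_wrt_cartesian(x, y)
    return [sum(J[nu][mu] * v_cart[mu] for mu in range(2)) for nu in range(2)]

def act_on(v_cart, func, x, y, h=1e-6):
    """v(func) = v^mu dF/dx^mu at p, with partials from central differences."""
    dfdx = (func(x + h, y) - func(x - h, y)) / (2.0 * h)
    dfdy = (func(x, y + h) - func(x, y - h)) / (2.0 * h)
    return v_cart[0] * dfdx + v_cart[1] * dfdy

def r_coord(x, y):
    return math.hypot(x, y)

def phi_coord(x, y):
    return math.atan2(y, x)

p = (1.0, 1.0)
v_cart = [2.0, 0.0]

# Components from the Jacobian (transformation law)
v_polar = transform_components(v_cart, *p)

# Components from the action of v on the new coordinates
v_polar_check = [act_on(v_cart, r_coord, *p), act_on(v_cart, phi_coord, *p)]

# Hand computation at p = (1, 1): v'^r = 2 / sqrt(2), v'^phi = -1
expected = [math.sqrt(2.0), -1.0]
```

The point p is chosen away from the origin and from the branch cut of atan2, where the polar chart is not defined.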
Next, we introduce the definitions of a curve and its tangent vector.

Definition 4 Suppose I is an interval of R, then a C r function C : I → M is called


a curve of class C r on M. From now on, the term “curve” will refer to a smooth
(C ∞ ) curve unless stated otherwise. For any t ∈ I , there is a corresponding unique
point C(t) ∈ M (see Fig. 2.3). t is called the parameter of the curve.

Remark 4 The curve described here is closely related to the intuitive concept of a
curve, but there is also a difference. An intuitive curve usually refers to the image of
the map C : I → M above, namely a subset C[I ] of M (see Fig. 2.3), without any
parameter having been mentioned. The curve defined above refers to the map itself,
which is a “curve with a parameter”.3 Suppose the images of the maps C : I → M
and C′ : I′ → M coincide (see Fig. 2.4), then one would intuitively regard them
as the same curve; however, as long as C and C′ are different maps, according to
Definition 4, they are different curves. Nevertheless, we can say in most instances
that C and C′ are two parametrizations of “the same curve”. To be accurate, the curve
C′ : I′ → M is called a reparametrization of the curve C : I → M if ∃ an onto
map α : I → I′ satisfying (a) C = C′ ◦ α, (b) the function t′ = α(t) induced by α
has a nonvanishing derivative. Here is the explanation: from C = C′ ◦ α we have

C(t) = C′ (α(t)) = C′ (t′ ) , ∀t ∈ I .

The map α being onto assures C′ [I′ ] = C[I ], i.e., the maps of these two curves have
the same image.4

3 However, there also exists a curve C : I → M such that its image covers the whole M, which
seems quite far from an intuitive curve.
4 The fact that α satisfies condition (b) assures that α is one-to-one; together with α being
onto, we can see that C is also a reparametrization of C′ .

[Fig. 2.4: The reparametrization of a curve]

Remark 5 ① The image of a curve C is also often denoted by C(t) (instead of C[I ])
in order to indicate that the parameter of the curve is t. Note that if t is one specific
value (“dead”), then C(t) only stands for a point in the image of the curve; only
when we consider t “can run all over I ” (“alive”) does C(t) stand for the image
of the curve. We also usually refer to the image of a curve simply as a curve; the
reader should recognize by the context whether the word “curve” means the map
or its image. ② The definition of a curve is independent of the coordinate system,
and hence is absolute. However, it is convenient to express a curve explicitly with
the help of a coordinate system. Suppose (O, ψ) is a coordinate system, C[I ] ⊂ O,
then ψ ◦ C is a map from I ⊂ R to Rn , which amounts to n functions of one variable
x μ = x μ (t), μ = 1, . . . , n. These n equations are called the parametric equations,
or a parametric representation of the curve. A simple example is as follows: let
M = R2 , {x 1 , x 2 } is the natural coordinate system of R2 , then x 1 = cos t, x 2 = sin t
are the parametric equations of a curve C : R → R2 , which is a unit circle in R2
centered at the origin.

Definition 5 Suppose (O, ψ) is a coordinate system and x μ are coordinates, then


the following subset of O:

{ p ∈ O | x 2 ( p) = constant, . . . , x n ( p) = constant}

can be regarded as (the image of) a curve with the parameter x 1 (changing the
constant value of x 2 , . . . , x n gives another curve), called the x 1 -coordinate line.
The x µ -coordinate line can be defined likewise.

Example 1 In a 2-dimensional Euclidean space, the x- and y-coordinate lines of the


Cartesian coordinate system {x, y} are two sets of parallel lines that are perpendicular
to each other; the ϕ-coordinate lines of the polar coordinate system {r, ϕ} are an
infinite number of concentric circles centered at the origin, and the r -coordinate
lines are an infinite number of half-lines that start at the origin.

Now let us discuss the tangent vector of a curve. Intuitively, one may think there
are infinitely many tangent vectors parallel to each other at one point of a curve.
However, if we define a curve as a map (“curve with a parameter”), then there is only
one tangent vector at one point of a curve. The definition is as follows:
Definition 6 Suppose C(t) is a C 1 curve on a manifold M, then the tangent vector
T at C(t0 ) tangent to C(t) is a vector at C(t0 ), whose action on f ∈ F M is defined
as

$$T(f) := \left.\frac{d(f \circ C)}{dt}\right|_{t_0}, \quad \forall f \in \mathscr{F}_M. \tag{2.2.6}$$

[Fig. 2.5: A function f : M → R on M combined with a curve C yields a map f ◦ C : I → R, i.e., a function of one variable f (C(t))]

Remark 6 ① f : M → R is a function (scalar field) on M but not generally a function


of one variable. However, f ◦ C, the combination of f and a curve C : I → M, is a
function of one variable with the argument t [also denoted by f (C(t)), see Fig. 2.5].
When there is no confusion, d( f ◦ C)/dt can also be denoted by d f /dt. ② The
tangent vector at a point C(t0 ) tangent to C(t) is also often denoted by ∂/∂t|C(t0 ) ,
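and thus (2.2.6) can also be written as

$$\left.\frac{\partial}{\partial t}\right|_{C(t_0)}(f) := \left.\frac{d(f \circ C)}{dt}\right|_{t_0} = \left.\frac{df(C(t))}{dt}\right|_{t_0}, \quad \forall f \in \mathscr{F}_M. \tag{2.2.6'}$$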

Example 2 Since the x μ -coordinate line is a curve with x μ as the parameter, the
coordinate basis vector X μ at p defined in (2.2.1) is a tangent vector of the x μ -
coordinate line passing through p. Hence, it is also usually denoted by ∂/∂ x μ | p , and
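therefore (2.2.1′) can also be expressed as

$$\left.\frac{\partial}{\partial x^\mu}\right|_p (f) := \left.\frac{\partial f(x)}{\partial x^\mu}\right|_p, \quad \forall f \in \mathscr{F}_M. \tag{2.2.6''}$$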

Thus, the symbol ∂ f /∂ x μ can be interpreted as either ∂ F(x 1 , . . . , x n )/∂ x μ [see


(2.2.1)], or the action of a coordinate line tangent vector ∂/∂ x μ on a scalar field f .

Theorem 2.2.4 Suppose the parametric equations of a curve C(t) in a given coordinate
system are x μ = x μ (t), then the expansion of the tangent vector at an arbitrary
point on the curve in this coordinate basis gives

$$\frac{\partial}{\partial t} = \frac{dx^\mu(t)}{dt}\frac{\partial}{\partial x^\mu}. \tag{2.2.7}$$

That is, the coordinate components of the tangent vector ∂/∂t of the curve C(t) are the
derivatives of the parametric representation x μ (t) of C(t) in this system with respect
to t.
Proof Exercise. 
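Without giving away the exercise, the statement of Theorem 2.2.4 can at least be checked numerically on an example. The Python sketch below assumes the unit circle x 1 = cos t, x 2 = sin t in the natural coordinates of R2 and a sample function f = (x 1 )² x 2 , both chosen only for illustration. It compares the definition (2.2.6) of the tangent vector's action with the component expansion (2.2.7).

```python
import math

def C(t):
    """Parametric representation of the curve: the unit circle."""
    return (math.cos(t), math.sin(t))

def f(x, y):
    """Sample smooth function on the plane (an arbitrary choice)."""
    return x * x * y

def tangent_action(f, C, t0, h=1e-6):
    """T(f) = d(f o C)/dt at t0, approximated by a central difference."""
    return (f(*C(t0 + h)) - f(*C(t0 - h))) / (2.0 * h)

def component_expansion(t0):
    """(dx^mu/dt) * (df/dx^mu) at C(t0), with all derivatives in closed form."""
    x, y = C(t0)
    dx_dt, dy_dt = -math.sin(t0), math.cos(t0)
    df_dx, df_dy = 2.0 * x * y, x * x
    return dx_dt * df_dx + dy_dt * df_dy

t0 = 0.4
lhs = tangent_action(f, C, t0)   # left-hand side: definition of the tangent vector
rhs = component_expansion(t0)    # right-hand side: expansion in the coordinate basis
```

The two numbers agree up to the finite-difference error, as the chain rule suggests they should.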
Definition 7 Two nonzero vectors v, u ∈ V p are said to be parallel if ∃α ∈ R such
that v = αu.
From Definition 6 we can see that the tangent vectors of a curve depend on
the parametrization of the curve; there is only one vector at each point C(t0 ) of a
curve C(t) that is tangent to C(t). The reason why it intuitively seems that there
are infinitely many (parallel) tangent vectors at one point of a curve is that, in that
case, we understand a curve as being the image of the map rather than the map itself
[thereby collapsing the infinitely many curves (maps) with the same image into
one curve]. The theorem below indicates that if two curves C and C′ have the same
image (more precisely, if C′ is a reparametrization of C), then their tangent vectors at any point are parallel.
Theorem 2.2.5 Suppose a curve C′ : I′ → M is a reparametrization of C : I → M,
then their tangent vectors at any image point have the following relation:

$$\frac{\partial}{\partial t} = \frac{dt'(t)}{dt}\frac{\partial}{\partial t'}, \tag{2.2.8}$$

where t′ (t) is the function of one variable induced by the map α : I → I′ (see
Remark 4), i.e., α(t).
Proof We can combine any f ∈ F M with C and C′ to induce functions f (C(t))
and f (C′ (t′ )) of one variable, denoted by f (t) and f′ (t′ ), respectively. Suppose the
image of t ∈ I under the map (C′ −1 ◦ C) is t′ , then f (t) = f′ (t′ (t)) (see Fig. 2.6).
Hence,

$$\frac{\partial}{\partial t}(f) = \frac{df(t)}{dt} = \frac{df'(t'(t))}{dt} = \frac{df'(t')}{dt'}\frac{dt'}{dt} = \frac{\partial}{\partial t'}(f)\,\frac{dt'}{dt} = \left(\frac{dt'}{dt}\frac{\partial}{\partial t'}\right)(f), \quad \forall f \in \mathscr{F}_M,$$

and thus we have (2.2.8). □
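The relation (2.2.8) can also be spot-checked numerically. In the Python sketch below, the curve C′ (the unit circle), the map α(t) = t³ + t (onto R, with derivative 3t² + 1 that never vanishes) and the test function f are assumptions made for this illustration only; derivatives along the curves are approximated by central differences.

```python
import math

def alpha(t):
    """Reparametrization map t' = alpha(t); onto R with nonvanishing derivative."""
    return t ** 3 + t

def alpha_prime(t):
    """dt'/dt in closed form."""
    return 3.0 * t * t + 1.0

def C_prime(tp):
    """The curve C' with parameter t'."""
    return (math.cos(tp), math.sin(tp))

def C(t):
    """The curve C = C' o alpha with parameter t."""
    return C_prime(alpha(t))

def f(x, y):
    """Sample smooth function on the plane (an arbitrary choice)."""
    return x * y + y

def d_dt(func, t, h=1e-6):
    """Central-difference derivative of a function of one variable."""
    return (func(t + h) - func(t - h)) / (2.0 * h)

t0 = 0.5
# Tangent vector of C at C(t0), acting on f
lhs = d_dt(lambda t: f(*C(t)), t0)
# (dt'/dt) times the tangent vector of C' at the same image point, acting on f
rhs = alpha_prime(t0) * d_dt(lambda tp: f(*C_prime(tp)), alpha(t0))
```

Since the same factor dt′ /dt appears for every f , the two tangent vectors differ only by a real multiple, which is the sense in which they are parallel.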


From Definition 6 we can see that, ∀ p ∈ M, if we choose an arbitrary curve C(t)
such that p = C(t0 ), then there must be an element in V p that can be regarded as the
tangent vector of this curve at C(t0 ). Now we ask: if we choose an arbitrary element
v in V p , can we find a curve that passes through p whose tangent vector at p is v? The
answer is affirmative: such curves not only exist, but are numerous
(Fig. 2.7 is a visual representation). For instance, if we choose an arbitrary coordinate
system {x μ } such that p is contained in its coordinate patch, then the curve with the
parametric equations x μ (t) = x μ | p + v μ t is such a curve, where v μ are the coordinate
components of v in this system.
In conclusion, any element in V p can be viewed as the tangent vector of a curve
passing through p. Therefore, a vector at p is also called a tangent vector, and V p is
called the tangent space at p.
[Fig. 2.6: Figure for the proof of Theorem 2.2.5]

[Fig. 2.7: A vector v at p is the common tangent vector of many curves]

2.2.2 Tangent Vector Fields on Manifolds

Definition 8 Suppose A is a subset of M. If we assign a vector to each point of A,


we obtain a vector field defined on A.

Example 3 The tangent vectors at all points of a non-self-intersecting curve C(t)


form a vector field on C(t) (as a subset of M).

Suppose v is a vector field on M and f is a function on M, then the value of v


at any point p of M will map f to a real number v| p ( f ) according to Definition 2,
which forms a function v( f ) when varying p over M. Therefore, a vector field v can
be viewed as a map that turns a function f into a function v( f ).
Definition 9 A vector field v on M is said to be of class C ∞ (smooth) if the result
of v acting on a C ∞ function is a C ∞ function, i.e., v( f ) ∈ F M , ∀ f ∈ F M . v is said
to be of class C r if the result of v acting on a C ∞ function is a C r function.
From now on, unless stated otherwise, the term “vector field” will mean smooth
(C ∞ ) vector field.
Example 4 (1) A set of coordinate basis vectors {X μ ≡ ∂/∂ x μ } form n smooth vector
fields in the coordinate patch, called coordinate basis vector fields. (2) The electric
field E of a point charge at point q in R3 is a smooth vector field on the manifold
M ≡ R3 − {q}.

[Optional Reading 2.2.2]


If C : I → M is a self-intersecting curve, i.e., ∃t1 , t2 ∈ I such that C(t1 ) = C(t2 ) ≡ p ∈ M,
then there are two tangent vectors at p, and thus we cannot say the tangent vectors form a
vector field defined on C(t) (the image of the map C). However, we can define a vector field
along a curve C (the map C) [see Spivak (1970) Vol. II, p. 247; Sachs and Wu (1977)
pp. 36–37], which is a map that assigns to each t ∈ I a vector v ∈ VC(t) ; its domain is I
rather than the subset C[I ] of M. Thus, this vector field can be denoted by v(t). In the case
of Fig. 2.8, both “the tangent vector field on C[I ]” and “the tangent vector of C[I ] at p” are
meaningless, but “the tangent vector field T (t) along C” is meaningful. Also, we can talk
about the tangent vector T (t1 ) of C at t1 (T1 in the figure) and the tangent vector T (t2 ) of C
at t2 (T2 in the figure). In this text, we often loosely speak of “a vector field on a curve C(t)”;
for a self-intersecting curve, this actually means the vector field along C.

[Fig. 2.8: A point p on a self-intersecting curve C[I ] has two tangent vectors T1 and T2 , but one can still talk about the tangent vector field along the curve (map) C]
[The End of Optional Reading 2.2.2]

Theorem 2.2.6 A necessary and sufficient condition for a vector field v on M to be


C ∞ (or C r ) is that its components in any coordinate basis are C ∞ (or C r ) functions.

Proof Exercise. 

Suppose v is a smooth vector field on M, then v( f ) ∈ F M , ∀ f ∈ F M . If u is another


smooth vector field on M, then u(v( f )) ∈ F M . However, the function u(v( f )) ∈
F M is not necessarily equal to v(u( f )) ∈ F M , and thus we have the following
definition:
Definition 10 The commutator of two smooth vector fields u and v is a smooth
vector field [u, v], defined as

[u, v]( f ) := u(v( f )) − v(u( f )) , ∀ f ∈ FM . (2.2.9)

Remark 7 The equation above is the definition of the commutator [u, v] (as a vector
field); the definition of its value [u, v]| p at each point should be understood as

[u, v]| p ( f ) := u| p (v( f )) − v| p (u( f )) , ∀ f ∈ F M . (2.2.9′)

To be convinced that [u, v]| p defined by the equation above is a vector at p, one
should also show (Exercise 2.8) that it satisfies the two conditions in Definition 2.

Theorem 2.2.7 Suppose {x μ } is an arbitrary coordinate system, then [∂/∂ x μ ,


∂/∂ x ν ] = 0, μ, ν = 1, . . . , n.

Proof Suppose f (x) is the function of n variables that comes from the combination
of f and this coordinate system. From calculus we know that

$$\frac{\partial}{\partial x^\mu}\frac{\partial}{\partial x^\nu} f(x) = \frac{\partial}{\partial x^\nu}\frac{\partial}{\partial x^\mu} f(x),$$

and the theorem follows immediately. □


Theorem 2.2.7 indicates that two arbitrary basis vector fields of any coordinate
system commute with each other.5
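
For contrast, here is a small worked example (not taken from the text) of two smooth vector fields that do not commute. It assumes M = R2 with natural coordinates (x, y) and the illustrative choices u = x ∂/∂y and v = ∂/∂x. Applying (2.2.9) to an arbitrary smooth f gives:

```latex
\begin{aligned}
[u, v](f) &= x\,\frac{\partial}{\partial y}\!\left(\frac{\partial f}{\partial x}\right)
            - \frac{\partial}{\partial x}\!\left(x\,\frac{\partial f}{\partial y}\right) \\
          &= x\,\frac{\partial^2 f}{\partial y\,\partial x}
            - \frac{\partial f}{\partial y}
            - x\,\frac{\partial^2 f}{\partial x\,\partial y}
           = -\frac{\partial f}{\partial y} ,
\end{aligned}
\qquad\text{so}\qquad [u, v] = -\frac{\partial}{\partial y} \neq 0 .
```

The second-derivative terms cancel and only a first-derivative term survives, which is why [u, v] is again a vector field. Here u is not a coordinate basis field, so there is no conflict with Theorem 2.2.7.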
Definition 11 A curve C(t) is called an integral curve of a vector field v if the
tangent vector at each point on it equals the value of v at this point.
Theorem 2.2.8 Suppose v is a smooth vector field on M, then for any point p of
M there exists a unique integral curve C(t) of v passing through it [which satis-
fies C(0) = p] (“unique” should be understood as “locally unique”, see Optional
Reading 2.2.3).
Proof Choose an arbitrary coordinate system {x μ } whose coordinate patch contains
p. Suppose the parametric equations of the integral curve are x μ = x μ (t), then it
follows from (2.2.7) that x μ (t) satisfies the first-order ordinary differential equations

$$\frac{\mathrm{d}x^\mu(t)}{\mathrm{d}t} = v^\mu(x^1(t), \ldots, x^n(t)) , \quad \mu = 1, \ldots, n ,$$

where v μ is the μth component of v in this coordinate basis field, which is a given
function of x 1 , . . . , x n . From calculus we know that there exists a unique solution for
this system of equations under given initial conditions x μ (0) (μ = 1, . . . , n). When
a point p is given, a set of initial conditions is given, namely x μ (0) = x μ | p ; hence,
there must be a unique solution x 1 (t), . . . , x n (t), and the curve described by these
n functions is the integral curve we want. It should also be verified that the curve
obtained in this way is independent of the coordinate system; the proof is left to the
reader.
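
The construction in this proof can be tried numerically. The following is a minimal sketch, assuming M = R2 with natural coordinates and the illustrative field v = −y ∂/∂x + x ∂/∂y; the function names and the use of a classical Runge-Kutta step are choices made for this example, not part of the text. The exact integral curve through p = (1, 0) is C(t) = (cos t, sin t).

```python
import math

def v(point):
    """Components (v^1, v^2) of the illustrative field v = -y d/dx + x d/dy."""
    x, y = point
    return (-y, x)

def rk4_step(field, point, h):
    """One classical Runge-Kutta step for the system dx^mu/dt = v^mu(x)."""
    def shifted(p, k, s):
        return tuple(pi + s * ki for pi, ki in zip(p, k))
    k1 = field(point)
    k2 = field(shifted(point, k1, h / 2))
    k3 = field(shifted(point, k2, h / 2))
    k4 = field(shifted(point, k3, h))
    return tuple(
        p + h / 6 * (a + 2 * b + 2 * c + d)
        for p, a, b, c, d in zip(point, k1, k2, k3, k4)
    )

def integral_curve(field, p, t_end, steps=1000):
    """Approximate C(t_end) for the integral curve with initial condition C(0) = p."""
    h = t_end / steps
    point = p
    for _ in range(steps):
        point = rk4_step(field, point, h)
    return point

# The initial condition x^mu(0) = x^mu|_p selects the curve through p = (1, 0).
end = integral_curve(v, (1.0, 0.0), math.pi / 2)
print(end)  # close to (cos(pi/2), sin(pi/2)) = (0, 1)
```

Changing the initial point changes the initial conditions and hence selects a different integral curve of the same field.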
[Optional Reading 2.2.3]
The word “unique” in Theorem 2.2.8 should be understood as “locally unique”. Suppose you
have found an integral curve C : (a, b) → M of v, and 0 ∈ (a, b), C(0) = p. Your friend
can always choose a smaller interval (a′, b′) ⊂ (a, b) that contains 0 and define a new curve
C′ : (a′, b′) → M as C′(t) = C(t) ∀t ∈ (a′, b′). The domains of the maps C and C′ are not
the same, and hence C′ ≠ C. In this sense, the integral curves which pass through p are
not unique. However, C is nothing but an extension of C′; they are locally the same, and
Theorem 2.2.8 holds as long as we interpret “unique” as “locally unique”.
Speaking of extension, one may naturally ask: Is it always possible to extend the domain
of an integral curve C from (a, b) to the entirety of R? The answer is negative. The following
simple example is helpful for understanding this. Let x1 and x2 be the natural coordinates of
R2 , define a curve C : R → R2 as C(t) := (0, t) ∈ R2 ∀t ∈ R. The image of this curve is

5 Conversely, suppose X 1 , . . . , X n are n C ∞ vector fields on a neighborhood N of p ∈ M that are
linearly independent everywhere, and

[X μ , X ν ] = 0, μ, ν = 1, . . . , n ,

then there must exist a coordinate system {x μ } whose coordinate patch O ⊂ N contains p, and on
O we have X μ = ∂/∂ x μ , μ = 1, . . . , n. See Wald (1984) Exercise 5 in Chap. 2 for a hint of the
proof of this theorem. For a complete proof see Spivak (1970) Vol. I, pp. 219–220.

Fig. 2.9 The charge distribution of a parallel-plate capacitor has translational symmetry along
the x- and y-axes

the x 2 -coordinate axis in R2 . It is not difficult to see that this is the integral curve of the vector
field ∂/∂ x 2 passing through (0, 0). If we cut out the “upper half” of R2 and regard the rest
of it as a manifold M, or more precisely, define M as M := {(x 1 , x 2 ) ∈ R2 | x 2 < 1}, then
the map C has no image at t ≥ 1, and its domain is just the open interval (−∞, 1) instead of
R. This domain cannot be extended anymore, and thus the map C : (−∞, 1) → M is called
an inextensible integral curve of the vector field ∂/∂ x 2 on M. Therefore, Theorem 2.2.8 can
also be expressed as:

Theorem 2.2.8′ Suppose v is a smooth (actually, C 1 is already enough) vector field on M,
then for any point p in M there exists a unique inextensible integral curve C(t) passing
through it [and satisfying C(0) = p].

[The End of Optional Reading 2.2.3]

We will use some basic knowledge of group theory below, so we introduce the
following definition as a supplement (see Appendix G in Volume II for a detailed
introduction of the theory of Lie groups and Lie algebras):
Definition 12 A group is a set G together with a map G × G → G (called the
group multiplication, the product of elements g1 and g2 is denoted by g1 g2 ), that
satisfies the following conditions:
(a) (g1 g2 )g3 = g1 (g2 g3 ), ∀g1 , g2 , g3 ∈ G;
(b) ∃ an identity element e such that eg = ge = g, ∀g ∈ G;
(c) ∀g ∈ G, ∃ an inverse element g −1 ∈ G such that g −1 g = gg −1 = e.
Symmetry has a great significance for physics, and group theory is a powerful tool
for the study of symmetry. If an object is invariant under a certain transformation, then
we say it has a symmetry under this transformation. Take Fig. 2.9 as an example;
consider a moving point on a charged plane that is translating along the x- (or
y-) axis. Since the surface charge density σ at the moving point is invariant under
translation, we say σ has translational symmetry along the x- (or y-) axis. More
precisely, the translational symmetry of σ along the x-axis means that the function
σ (x, y, z) satisfies

σ (x, y, z) = σ (x + a, y, z) , ∀a ∈ R , (2.2.10)

where the point transformation represented by

x → x + a , y → y , z → z (2.2.11)

is called a translation along the x-axis. Suppose G is the collection of all the
translations along the x-axis, then an element in G can be characterized by a real
number a, denoted by φa ∈ G. Consider p ≡ (x, y, z) and q ≡ (x + a, y, z) as two
points in R3 , then the transformation (2.2.11) corresponds to the map φa : R3 →
R3 [satisfying φa ( p) = q], which is a diffeomorphism. Moreover, if we define the
multiplication for G as

φa φb := φa+b , ∀φa , φb ∈ G , (2.2.12)

then G forms a group (φ0 is the identity, and φ−a is the inverse of φa ). Each of
the infinitely many elements of this group can be characterized by a real number a,
which is therefore called a parameter, and G is called a one-parameter group.
Also, since each group element φa ∈ G is a diffeomorphism on R3 , we also call G a
one-parameter group of diffeomorphisms on R3 . To help the reader understand the
definition of a one-parameter group of diffeomorphisms, let us set the stage for it first.
Suppose M is a manifold, then R × M is a manifold that has one more dimension
than M (see the last paragraph of Sect. 2.1). Suppose φ is a map from R × M to M
(i.e., φ : R × M → M), then it can turn a real number t ∈ R and a point p ∈ M into
a point φ(t, p) ∈ M. We can also visualize φ as a machine with two slots, denoted
by φ(•, •); in order to produce an “end product” φ(t, p) ∈ M, one has to input two
“raw materials”, namely, a real number t ∈ R and a point p ∈ M. If we input t ∈ R
alone, then what it can produce is only a semi-finished product φ(t, •), which is also a
machine that gives an “end product” after we input p ∈ M. φ(t, •) is usually denoted
by φt , i.e., φt : M → M. On the other hand, if we input p ∈ M to φ(•, •) first, we get
a semi-finished product φ(•, p), which is also a machine waiting for t ∈ R to be input.
φ(•, p) is usually denoted by φ p , i.e., φ p : R → M.
Definition 13 A C ∞ map φ : R × M → M is called a one-parameter group of
diffeomorphisms on M if
(a) φt : M → M is a diffeomorphism ∀t ∈ R;
(b) φt ◦ φs = φt+s , ∀t, s ∈ R.

Remark 8 The set {φt |t ∈ R} is a group with map composition as the multiplication,
whose group elements φt are diffeomorphisms from M to M, and φ0 is the identity
[from Definition 13(b) we know that φt ◦ φ0 = φt , and hence φ0 is the identity map].
“φ : R × M → M is a one-parameter group of diffeomorphisms on M” actually
means that {φt |t ∈ R} is a one-parameter group of diffeomorphisms.
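
The “machine with two slots” picture and Definition 13(b) can be illustrated with a short sketch. It assumes M = R2 and takes φ(t, p) to be the rotation of p by the angle t about the origin; this choice and the helper names are illustrative assumptions and do not come from the text.

```python
import math

def phi(t, p):
    """phi(t, p): rotate the point p = (x, y) of R^2 by the angle t about the origin."""
    x, y = p
    return (x * math.cos(t) - y * math.sin(t),
            x * math.sin(t) + y * math.cos(t))

def phi_t(t):
    """Fill the first slot only: phi_t = phi(t, .) is a map M -> M."""
    return lambda p: phi(t, p)

def phi_p(p):
    """Fill the second slot only: phi_p = phi(., p) is a curve R -> M (the orbit through p)."""
    return lambda t: phi(t, p)

def close(a, b, tol=1e-12):
    """Compare two points up to floating-point rounding."""
    return all(abs(ai - bi) <= tol for ai, bi in zip(a, b))

p = (2.0, -1.0)
t, s = 0.7, -1.9

composed = phi_t(t)(phi_t(s)(p))   # (phi_t o phi_s)(p)
direct = phi_t(t + s)(p)           # phi_{t+s}(p)
print(close(composed, direct))     # group law, Definition 13(b)
print(close(phi_t(0.0)(p), p))     # phi_0 is the identity map
print(close(phi_p(p)(0.0), p))     # the orbit through p starts at p
```

Since φ−t undoes φt , every φt here is invertible with a smooth inverse, in line with Definition 13(a).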

Suppose φ : R × M → M is a one-parameter group of diffeomorphisms, then
∀ p ∈ M, φ p : R → M is a smooth curve that passes through p [satisfying φ p (0) = p],
called the orbit of this one-parameter group of diffeomorphisms that passes through
p. Denote the tangent vector of this curve at φ p (0) by v| p , then we have a smooth
vector field v on M. Thus, a one-parameter group of diffeomorphisms on M gives rise
to a smooth vector field on M. Now let us see if the converse holds or not. Suppose v
is a smooth vector field on M; it seems that ∀t ∈ R one can use its integral curves to
define a diffeomorphism φt from M to M. [∀ p ∈ M, define φt ( p) as the point on the
integral curve through p whose parameter differs from that of p by t.] So it looks
like we can obtain a one-parameter
group of diffeomorphisms. However, the following problem may occur: the image
point does not exist for certain parameters of a curve (cutting out a region of M could
make this situation happen); therefore, we can only say that a smooth vector field
on M gives rise to a one-parameter local group of diffeomorphisms, see Optional
Reading 2.2.4.
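
As a concrete illustration, the cut plane used in Optional Reading 2.2.3 can be worked out explicitly. The following sketch assumes M := {(x 1 , x 2 ) ∈ R2 | x 2 < 1} and v = ∂/∂ x 2 ; the specific choices of ε, I and U are illustrative and are only one way of restricting the map.

```latex
\begin{aligned}
&\phi_t(x^1, x^2) = (x^1,\, x^2 + t), \qquad \text{defined only when } x^2 + t < 1 . \\[4pt]
&\text{For } t > 0,\ \phi_t \text{ has no image for points with } 1 - t \le x^2 < 1,
  \text{ so } \phi_t \text{ is not a map } M \to M . \\[4pt]
&\text{For a given } p_0 = (a, b) \in M, \text{ put } \varepsilon := \tfrac{1}{2}(1 - b) > 0,\quad
  I := (-\varepsilon, \varepsilon),\quad
  U := \{ (x^1, x^2) \mid x^2 < b + \varepsilon \} . \\[4pt]
&\text{Then for all } t \in I \text{ and } p \in U:\quad
  x^2 + t < b + 2\varepsilon = 1, \text{ so } \phi : I \times U \to M \text{ is well defined.}
\end{aligned}
```

On this restricted domain φt ◦ φs = φt+s holds whenever both sides are defined, which is the sense in which v generates only a local group.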
[Optional Reading 2.2.4]
Suppose the integral curve C of a vector field v that passes through p has a range of parameters
that cannot reach all of R (see the second paragraph of Optional Reading 2.2.3), namely
∃t ∈ R such that C(t) is not a point of M, then φt defined above is not even a map from M to
M [at least the image point φt ( p) does not exist], so it is clearly not a diffeomorphism from M
to M. However, it can be proved that ∀ p0 ∈ M, one can always find an open neighborhood
U of p0 and an open interval I in R that contains 0 to make the map φ in the text above
meaningful when restricted to I × U (i.e., there exists a map φ : I × U → M). The precise
definition is ∀t ∈ I , φt : U → M is a map that maps any p ∈ U to a point on the integral
curve that passes through p, with t the difference between the parameters of p and this point
(the reader may understand this with the help of the simple example in the second paragraph
of Optional Reading 2.2.3). Moreover, it can also be proved that φ : I × U → M has the
following properties:
(a) ∀t ∈ I , φt : U → φt [U ] is a diffeomorphism;
(b) If t, s, t + s ∈ I (real numbers t, s and t + s are all in the open interval I ), then φt ◦ φs =
φt+s .
Such a {φt |t ∈ I } is called a one-parameter local group of diffeomorphisms or a one-
parameter family of diffeomorphisms.
A vector field is said to be complete if the range of the parameter of every (inextensible) inte-
gral curve is R. Obviously, each complete smooth vector field can produce a one-parameter
group of diffeomorphisms. It can be proved that any smooth vector field on a compact manifold is
complete. [References for this optional reading: Hawking and Ellis (1973) p. 27; Straumann
(1984) pp. 21–22.]
[The End of Optional Reading 2.2.4]

2.3 Dual Vector Fields

Definition 1 Suppose V is a finite dimensional vector space over R. A linear map
ω : V → R is called a dual vector on V . The collection of all the dual vectors on
V is called the dual vector space of V , denoted by V ∗ .6
Remark 1 Since addition and scalar multiplication are each defined on V , the lin-
earity requirement for ω can be explicitly written as

ω(αv + βu) = αω(v) + βω(u) , ∀v, u ∈ V , α, β ∈ R . (2.3.1)

6 When talking about dual vectors (and tensors, see Sect. 2.4) in the future, unless stated otherwise,
we will always assume V is a finite dimensional vector space over R.

Example 1 Suppose V is the collection of all 2 × 1 real matrices, then it forms a 2-
dimensional vector space under the rule of matrix addition and scalar multiplication.
Let ω represent an arbitrary 1 × 2 real matrix (c, d), whose action on any element
$v = \begin{pmatrix} a \\ b \end{pmatrix}$ in V can be defined using matrix multiplication:

$$\omega(v) := (c, d)\begin{pmatrix} a \\ b \end{pmatrix} = (ac + bd) .$$

The result is a 1 × 1 real matrix, which can be identified with the real number
ac + bd. A map ω : V → R defined in this way is clearly linear, and thus any 1 × 2
real matrix is a dual vector on V . More generally, if we consider column matrices
(n × 1 matrices) as vectors, then row matrices (1 × n matrices) are dual vectors.

Theorem 2.3.1 V ∗ is a vector space, and dim V ∗ = dim V .

Proof Define addition, scalar multiplication and the zero element for V ∗ as follows:

(ω1 + ω2 )(v) :=ω1 (v) + ω2 (v) , ∀ω1 , ω2 ∈ V ∗ , v ∈ V ;


(αω)(v) :=α · ω(v) , ∀ω ∈ V ∗ , v ∈ V, α ∈ R ;
0(v) :=0 ∈ R , ∀v ∈ V .

It is not difficult to see that such a V ∗ is a vector space. Suppose {eμ } is a basis of V ,
we can define n special elements $e^{1*}, \ldots, e^{n*}$ in V ∗ using the following equation:

$$e^{\mu *}(e_\nu) := \delta^\mu{}_\nu , \quad \mu, \nu = 1, \ldots, n . \tag{2.3.2}$$

The equation above only defines the action of $e^{\mu *}$ on the basis vectors in V , but since
the action of $e^{\mu *}$ is linear, it actually defines the action of $e^{\mu *}$ on an arbitrary element
in V . Now we only have to show that $\{e^{\mu *}\}$ is a basis of V ∗ . It is easy to show that
$e^{1*}, \ldots, e^{n*}$ are linearly independent (exercise). ∀ω ∈ V ∗ , let

ωμ ≡ ω(eμ ) , μ = 1, . . . , n , (2.3.3)

and then one can easily show that (Exercise 2.11)

ω = ωμ eμ∗ . (2.3.4)

(Hint: the equation above is an equality of dual vectors. Since the action of a dual vector on
v is linear, all we have to do to prove this equation is to verify that both sides of it
acting on any basis vector eν give the same real number.) Equation (2.3.4) indicates
that any element in V ∗ can be expressed linearly in terms of {eμ∗ }, and thus {eμ∗ }
is a basis of V ∗ , called the dual basis to the basis {eμ }, from which we find that
dim V ∗ = dim V . 
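
The dual basis can be computed explicitly in the setting of Example 1. The sketch below assumes V is the space of 2 × 1 real matrices with an illustrative (non-orthogonal) basis; it relies on the standard fact that if the basis vectors are the columns of a matrix E, then the rows of E−1 satisfy (2.3.2). The helper names and sample numbers are hypothetical.

```python
from fractions import Fraction

def inverse_2x2(m):
    """Inverse of a 2 x 2 matrix given as a list of rows."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("the given vectors do not form a basis")
    return [[d / det, -b / det], [-c / det, a / det]]

def act(omega, v):
    """omega(v) for a row matrix omega acting on a column matrix v."""
    return sum(w * x for w, x in zip(omega, v))

# Illustrative basis of V: e_1 = (1, 1)^T, e_2 = (1, 2)^T.
basis = [(Fraction(1), Fraction(1)), (Fraction(1), Fraction(2))]

# E has the basis vectors as columns; the rows of E^{-1} are e^{1*}, e^{2*}.
E = [[basis[nu][row] for nu in range(2)] for row in range(2)]
dual_basis = [tuple(row) for row in inverse_2x2(E)]

# (2.3.2): e^{mu*}(e_nu) = delta^mu_nu
delta = [[act(dual_basis[mu], basis[nu]) for nu in range(2)] for mu in range(2)]

# (2.3.3) and (2.3.4) for a sample dual vector omega = (3, 5)
omega = (Fraction(3), Fraction(5))
components = [act(omega, e) for e in basis]   # omega_mu = omega(e_mu)
rebuilt = tuple(
    sum(components[mu] * dual_basis[mu][i] for mu in range(2)) for i in range(2)
)
print(dual_basis, components, rebuilt)
```

Exact rational arithmetic is used so that the Kronecker delta and the reconstruction of ω come out exactly rather than up to rounding.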

Review. Two vector spaces are said to be isomorphic if there exists a one-to-one and
onto linear map between them (this map is called an isomorphism). A necessary
and sufficient condition for two finite dimensional vector spaces to be isomorphic is
that they have the same dimension.

Since dim V ∗ = dim V , of course V ∗ is isomorphic to V . An isomorphism is not
difficult to find. For example, suppose {eμ } is a basis of V , and {eμ∗ } is its dual
basis, then the linear map defined by eμ ↦ eμ∗ is an isomorphism. However,
the choice of {eμ } is quite arbitrary, and the isomorphism defined in this manner will
change when the basis is changed; as a matter of fact, there does not exist a special
(distinguished) isomorphism between V and V ∗ unless an additional structure is
added to V (see Sect. 2.5).
Since V ∗ is a vector space, it naturally has a dual space, denoted by V ∗∗ . Unlike the
relationship between V and V ∗ , there exists a natural, distinguished isomorphism
between V and V ∗∗ that is defined as follows: ∀v ∈ V , we want a naturally defined image v ∗∗ ∈ V ∗∗ .
Since V ∗∗ is the dual space of V ∗ , v ∗∗ should be a linear map from V ∗ to R. Giving
it a definition is nothing but establishing a rule, according to which every ω ∈ V ∗
corresponds to a unique real number v ∗∗ (ω). Since v ∗∗ is about to be defined as
the image of v, v ∗∗ (ω) should relate to both v and ω; seeing that the simplest real
number one can construct from v and ω is ω(v), it is natural to define v ∗∗ as

v ∗∗ (ω) := ω(v) ∀ω ∈ V ∗ . (2.3.5)

This map V → V ∗∗ is an isomorphism (the proof is left as Exercise 2.13). This
natural isomorphism indicates that V and V ∗∗ can be viewed as the same
space (identify each v ∈ V with its image v ∗∗ ∈ V ∗∗ ). Therefore, it is V and V ∗ that
are actually useful; taking further duals yields no new spaces, no
matter how many times this is done [for the precise meaning of natural isomorphism
see Spivak (1970) Chap. 4, Exercise 6].
Theorem 2.3.2 If there is a basis transformation $e'_\mu = A^\nu{}_\mu e_\nu$ in a vector space V
($A^\nu{}_\mu$ is simply the νth component of the new basis vector $e'_\mu$ expanded in the old
basis), and the (non-degenerate) matrix constituted by the elements $A^\nu{}_\mu$ is denoted by
A, then the corresponding dual basis transformation is

$$e'^{\mu *} = (\tilde{A}^{-1})_\nu{}^\mu \, e^{\nu *} , \tag{2.3.6}$$

where Ã is the transposed matrix of A, and Ã−1 is the inverse of Ã.

Remark 2 The reader may be used to writing matrix elements as $A_{\nu\mu}$; here we write
them as $A^\nu{}_\mu$. The reason for distinguishing the upper and lower indices is to make the
summation crystal clear (an upper ν together with a lower ν implies the summation
over ν) and to distinguish the type of a tensor (see Sect. 2.4 for details). However, what
is important in the matrix operation is just differentiating the left and right indices.
Therefore, if you want, you may change all the upper indices to lower indices for
now; for instance, (2.3.6) may be written as $e'_{\mu}{}^{*} = (\tilde{A}^{-1})_{\nu\mu}\, e_{\nu}{}^{*}$.

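Proof All we have to prove is that both sides of the equation give the same result
when applied to $e'_\alpha$. The proof is as follows:

$$\begin{aligned}
(\tilde{A}^{-1})_\nu{}^\mu e^{\nu *}(e'_\alpha) &= (\tilde{A}^{-1})_\nu{}^\mu e^{\nu *}(A^\beta{}_\alpha e_\beta) = A^\beta{}_\alpha (\tilde{A}^{-1})_\nu{}^\mu e^{\nu *}(e_\beta) \\
&= \tilde{A}_\alpha{}^\beta (\tilde{A}^{-1})_\nu{}^\mu \delta^\nu{}_\beta = \tilde{A}_\alpha{}^\nu (\tilde{A}^{-1})_\nu{}^\mu = \delta_\alpha{}^\mu = e'^{\mu *}(e'_\alpha) ,
\end{aligned}$$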

where we used the linearity of the action by a dual vector on a vector in the second
equality, the definitions of a transpose matrix and inverse matrix in the third and
fifth equality respectively, and the definition of dual vector basis (2.3.2) in the sixth
equality. 

The discussion above is purely algebraic; now we get back to a manifold M.
Since each point p ∈ M has a tangent space V p , it also has a dual space V p∗ . If we assign a dual vector at
each point of M (or A ⊂ M), we obtain a dual vector field on M (or A). A dual
vector field ω on M is said to be smooth if ω(v) ∈ F M ∀ smooth vector fields v.
Suppose f ∈ F M , let us show that f naturally induces a dual vector field on M,
denoted by d f . (The d f that our readers are familiar with stands for the differential
of a function f . From the perspective of differential geometry, the differential of f is
essentially a dual vector field. Optional Reading 2.3.1 will introduce the connection
between this brand new understanding and classical calculus.) To define d f we only
have to give the definition of its value d f | p ∈ V p∗ at any point p of M, and to define
d f | p we only have to specify the real number that comes from its action on an
arbitrary vector v ∈ V p at p. This number should be related to both f and v, and the
most natural (simplest) real number that can be constructed from f and v is v( f );
therefore, we define d f | p as

d f | p (v) := v( f ) , ∀v ∈ V p . (2.3.7)

From this it is easy to show that

d( f g)| p = f | p (dg)| p + g| p (d f )| p , (2.3.8)

which is exactly the Leibniz rule satisfied by the differential operator d.
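
For readers who want the intermediate steps behind (2.3.8), they can be spelled out as follows. This is a sketch that assumes the Leibniz rule is one of the two defining conditions of a vector in Definition 2 of Sect. 2.2; v denotes an arbitrary element of V p .

```latex
\begin{aligned}
\mathrm{d}(fg)|_p(v) &= v(fg) && \text{by (2.3.7)} \\
  &= f|_p\, v(g) + g|_p\, v(f) && \text{Leibniz rule for the vector } v \\
  &= f|_p\, (\mathrm{d}g)|_p(v) + g|_p\, (\mathrm{d}f)|_p(v) && \text{by (2.3.7) again} \\
  &= \bigl[\, f|_p\, (\mathrm{d}g)|_p + g|_p\, (\mathrm{d}f)|_p \,\bigr](v) .
\end{aligned}
```

Since v is arbitrary, the two dual vectors agree, which is (2.3.8).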


Suppose (O, ψ) is a coordinate system, then the μth coordinate x μ can be viewed
as a function on O, and thus dx μ (as a special d f ) is a dual vector field defined on
O. Suppose p ∈ O and ∂/∂ x ν is the νth coordinate basis vector of V p , then from
(2.3.7) we know that at p we have

$$\mathrm{d}x^\mu\!\left(\frac{\partial}{\partial x^\nu}\right) = \frac{\partial}{\partial x^\nu}(x^\mu) = \delta^\mu{}_\nu .$$

Comparing this with (2.3.2) we can see that {dx μ | p } is exactly the dual coordinate
basis that corresponds to the coordinate basis {∂/∂ x ν | p }. The equation above holds
at any point of O. Therefore, just like ∂/∂ x ν is the νth coordinate basis vector field
on O, dx μ is the μth dual coordinate basis vector field on O, and {dx μ } is a dual
coordinate basis field on O. Any dual vector field ω on O can be expanded in terms
of {dx μ }:
ω = ωμ dx μ , (2.3.9)

where ωμ are called the coordinate components of ω in this coordinate system whose
expression can be obtained from (2.3.3) as

ωμ = ω(∂/∂ x μ ) . (2.3.10)

Theorem 2.3.3 Suppose (O, ψ) is a coordinate system, f is a smooth function on
O, and f (x) denotes the function of n variables f (x 1 , . . . , x n ) corresponding to
f ◦ ψ −1 , then d f can be expanded using the dual coordinate basis {dx μ } as follows:

$$\mathrm{d}f = \frac{\partial f(x)}{\partial x^\mu}\,\mathrm{d}x^\mu , \quad \forall f \in \mathscr{F}_O . \tag{2.3.11}$$
Proof All we have to prove is that we obtain the same result after applying
both sides of this equation to any coordinate basis vector ∂/∂ x ν , which is very
straightforward. 
Theorem 2.3.4 Suppose the coordinate patches of the coordinate systems {x μ } and
{x′ ν } have an intersection, and ω is a dual vector at an arbitrary point p in the
intersection, then the transformation relation between ωμ and ω′ν , the components
of ω in these two coordinate systems, is

$$\omega'_\nu = \left.\frac{\partial x^\mu}{\partial x'^\nu}\right|_p \omega_\mu . \tag{2.3.12}$$

Proof Exercise 2.12. 
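
A concrete instance of (2.3.12) may help fix the index placement. The sketch below assumes M = R2 with the unprimed system taken to be Cartesian coordinates (x, y) and the primed system taken to be polar coordinates (r, φ), on the region where both are defined; these choices are illustrative and not from the text.

```latex
\begin{aligned}
& x = r\cos\varphi , \qquad y = r\sin\varphi , \\[4pt]
& \omega'_r = \frac{\partial x}{\partial r}\,\omega_x + \frac{\partial y}{\partial r}\,\omega_y
            = \cos\varphi\;\omega_x + \sin\varphi\;\omega_y , \\[4pt]
& \omega'_\varphi = \frac{\partial x}{\partial \varphi}\,\omega_x + \frac{\partial y}{\partial \varphi}\,\omega_y
            = -r\sin\varphi\;\omega_x + r\cos\varphi\;\omega_y .
\end{aligned}
```

As a consistency check with (2.3.11), take ω = dx, so that ωx = 1 and ωy = 0. The formulas give ω′r = cos φ and ω′φ = −r sin φ, which are exactly ∂(r cos φ)/∂r and ∂(r cos φ)/∂φ.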


[Optional Reading 2.3.1]
Now we will develop an understanding of d f . First, let us talk about classical calculus.
Consider a function y = f (x). Suppose that the argument x is shifted by a small
increment Δx at x0 , and the corresponding shift of the function y is Δy. If f (x) is a linear
function, i.e., y = ax + b (a, b are constants), then Δy = aΔx, i.e., Δy is proportional
to Δx. If f (x) is a nonlinear function, then Δy = aΔx + ε, where a ≡ f ′(x0 ) and in general ε ≠ 0. In
classical calculus aΔx is called the differential of y = f (x), denoted by dy, and it may be
proved that ε is a higher order infinitesimal than Δx when Δx approaches zero. Δx may also
be denoted by dx, and thus dy = f ′(x0 )dx. However, an “infinitesimal nonzero quantity” is
a very subtle concept that involves logical inconsistencies. Gottfried Leibniz was criticized
by many mathematicians of his time when he introduced and used this concept [see Kline
(1980) for details], and still to this day there are mathematicians who do not subscribe to it.
[For example, see Spivak (1970) Vol. I. This book points out on p. 153 that an “infinitely
small” change d x i is “nonsense”.] We do not mean to discuss the validity of these subtle
issues; but rather, we only want to explain that modern differential geometry has already
provided d f , the differential of a function, a brand new interpretation that is independent of
the concept of “infinitesimal nonzero quantities”: d f is a clearly defined dual vector field.
Suppose {x μ } is a coordinate system of a manifold M with a coordinate patch O, and f is
a function on M, i.e., f : M → R, then f induces a function f (x 1 , . . . , x n ) of n variables.
Suppose p ∈ O; classical calculus attempts to describe d f | p as an (infinitesimal) increment
of the function value at p. However, this increment is not yet certain since it depends on
“how far along which direction” a moving point would “move” from p. Now that a vector

v at p shows exactly “how far along which direction” a moving point would potentially
“move” from p, we can let d f | p actually “become” a real number (increment) by assigning
a v ∈ V p . And since d f | p evaluates to a real number when v is given, d f | p is actually a map
from the tangent space V p of p to R. To ensure that d f | p has the properties of differential
from classical calculus, this map is also required to be linear. Thus, d f | p is a dual vector on
V p while d f is a dual vector field on O. This is the most concrete and precise interpretation
of d f .
Physicists usually do not make any distinction between d f and Δf , and like to say
“d f | p equals f (q) − f ( p), where q is a point infinitely close to p.” They may even sketch
two points p and q on paper. In fact, p and q cannot be infinitely close once they have
been specified (marked out in a picture), which means f (q) − f ( p) is not an infinitesimal
quantity, and hence it can only be Δf instead of d f . However, since certain approximations
are always allowed in physics, treating a sufficiently small Δf as an approximation of d f
is not only allowed, but often quite useful. In fact, suppose a curve C(t) satisfies C(0) = p,
(∂/∂t)| p = v, and q = C(α) with α small enough, then from (2.3.7) and (2.2.6′) we can see
that the result of d f | p acting on αv is

$$\begin{aligned}
\mathrm{d}f|_p(\alpha v) = \alpha v(f) &= \alpha \lim_{\Delta t \to 0} \frac{1}{\Delta t}\{ f[C(\Delta t)] - f[C(0)] \} \\
&\cong \alpha\,\frac{1}{\alpha}[f(q) - f(p)] = f(q) - f(p) \equiv \Delta f ,
\end{aligned}$$

and we see that (after acting on αv) d f | p really gives us Δf approximately. Albert Einstein
once said: “As far as the laws of mathematics refer to reality, they are not certain; and as far as
they are certain, they do not refer to reality.” As a physics book, this text also approximates
d f as Δf in multiple places.
[The End of Optional Reading 2.3.1]

2.4 Tensor Fields

Definition 1 A tensor of type (k, l) on a vector space V is a multilinear map

$$T : \underbrace{V^* \times \cdots \times V^*}_{k\ \text{terms}} \times \underbrace{V \times \cdots \times V}_{l\ \text{terms}} \to \mathbb{R} .$$

Remark 1 T can be likened to a machine with k “upper slots” and l “lower slots”.
So long as we input k dual vectors and l vectors into the upper and lower slots,
respectively, this machine produces a real number which is linearly dependent on
each of the inputs (this is the meaning of a “multilinear map”).
Example 1 (1) A dual vector on V is a tensor of type (0, 1) on V . (2) An element of
V can be regarded as a tensor of type (1, 0) on V . (This is because v can be identified
as v ∗∗ , and v ∗∗ is a linear map from V ∗ to R.)
From now on, we will use TV (k, l) to represent the collection of all tensors of
type (k, l) on V ; thus, V = TV (1, 0), V ∗ = TV (0, 1).
Suppose T ∈ TV (1, 1), then T : V ∗ × V → R. However, T can also be viewed
as another type of map. Since ∀ω ∈ V ∗ , v ∈ V , we have T (ω; v) ∈ R, so T (ω; •) is
a machine with only a lower slot that can turn a vector linearly into a real number,
which means that T (ω; •) is a dual vector on V , i.e., T (ω; •) ∈ V ∗ . After T is given,
we can create T (ω; •) with one ω ∈ V ∗ ; hence, T can also be viewed as a map (and it
is linear) that turns a dual vector ω into a dual vector T (ω; •), i.e., $T : V^* \xrightarrow{\text{linearly}} V^*$.
Similarly, we can also view T as $T : V \xrightarrow{\text{linearly}} V$. These three viewpoints for the
same T ∈ TV (1, 1) are equivalent. For expositional convenience, we call this way of
viewing the same tensor as different maps “the multifaceted view of tensors”. Being
able to have a “multifaceted view” is one of the advantages of defining tensors as
maps. We will use this frequently in the future.
Definition 2 The tensor product T ⊗ T′ of a tensor T of type (k, l) and a tensor
T′ of type (k′, l′) on V is a tensor of type (k + k′, l + l′) defined as follows:

$$\begin{aligned}
&T \otimes T'(\omega^1, \ldots, \omega^k, \omega^{k+1}, \ldots, \omega^{k+k'}; v_1, \ldots, v_l, v_{l+1}, \ldots, v_{l+l'}) \\
&\quad := T(\omega^1, \ldots, \omega^k; v_1, \ldots, v_l)\, T'(\omega^{k+1}, \ldots, \omega^{k+k'}; v_{l+1}, \ldots, v_{l+l'}) .
\end{aligned}$$

In Euclidean vector field theory, a dyadic v u is actually the tensor product of two
vectors v and u simply with the symbol ⊗ being omitted.7
Do tensor products satisfy the commutative law? Suppose ω ∈ V ∗ , v ∈ V ≡ V ∗∗ ,
then v ⊗ ω ∈ TV (1, 1), ω ⊗ v ∈ TV (1, 1). It follows from Definition 2 that ∀μ ∈ V ∗
and u ∈ V we have v ⊗ ω(μ; u) = v(μ)ω(u) = ω(u)v(μ) = ω ⊗ v(μ; u) [where
v(μ) should be interpreted as v ∗∗ (μ)], and hence v ⊗ ω = ω ⊗ v. However, the
tensor product of two vectors (or two dual vectors) usually becomes another tensor
after exchanging the order, i.e., in general v ⊗ u ≠ u ⊗ v, ω ⊗ μ ≠ μ ⊗ ω. For instance, a
dyadic in Euclidean space does not satisfy the commutative law.
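
The two statements above can be checked on small examples. The sketch below assumes V = R2 with its standard basis, represents vectors and dual vectors by tuples of components, and builds the tensor products directly as maps following Definition 2; all names and numbers are illustrative.

```python
def act(omega, v):
    """omega(v), i.e. omega_mu v^mu in the standard basis."""
    return sum(w * x for w, x in zip(omega, v))

def tensor_vv(v, u):
    """v (x) u, a type (2, 0) tensor: (w1, w2) -> v(w1) u(w2)."""
    return lambda w1, w2: act(w1, v) * act(w2, u)

def tensor_v_omega(v, omega):
    """v (x) omega, a type (1, 1) tensor: (mu; u) -> v(mu) omega(u)."""
    return lambda mu, u: act(mu, v) * act(omega, u)

def tensor_omega_v(omega, v):
    """omega (x) v, a type (1, 1) tensor: (mu; u) -> omega(u) v(mu)."""
    return lambda mu, u: act(omega, u) * act(mu, v)

def components_20(t, n):
    """Components t^{mu nu} = t(e^{mu*}, e^{nu*}) in the standard basis of R^n."""
    dual = [tuple(1 if i == m else 0 for i in range(n)) for m in range(n)]
    return [[t(dual[m], dual[k]) for k in range(n)] for m in range(n)]

v, u = (1, 2), (3, 5)
vu = components_20(tensor_vv(v, u), 2)   # (v (x) u)^{mu nu} = v^mu u^nu
uv = components_20(tensor_vv(u, v), 2)   # (u (x) v)^{mu nu} = u^mu v^nu
print(vu, uv)                            # the two matrices are transposes of each other

omega, mu, w = (7, -1), (2, 3), (4, 1)
same = tensor_v_omega(v, omega)(mu, w) == tensor_omega_v(omega, v)(mu, w)
print(same)
```

The component matrices of v ⊗ u and u ⊗ v differ unless v and u are proportional, whereas v ⊗ ω and ω ⊗ v agree because the upper and lower slots are filled separately.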
Theorem 2.4.1 TV (k, l) is a vector space, with dim TV (k, l) = $n^{k+l}$.
Proof (A) We define the addition, scalar multiplication and zero element in a natural
way and make TV (k, l) a vector space (see the first part of the proof of Theorem 2.3.1).
(B) Show that there are $n^{k+l}$ basis vectors. Take n = 2, k = 2, l = 1 as an example
(it is not difficult to prove this in the general case). Suppose {e1 , e2 } is a basis of V ,
and {e1∗ , e2∗ } is its dual basis. All we have to prove is that the following 8 elements
form a basis of TV (2, 1):

e1 ⊗ e1 ⊗ e1∗ , e1 ⊗ e1 ⊗ e2∗ , e1 ⊗ e2 ⊗ e1∗ , e1 ⊗ e2 ⊗ e2∗ ,


e2 ⊗ e1 ⊗ e1∗ , e2 ⊗ e1 ⊗ e2∗ , e2 ⊗ e2 ⊗ e1∗ , e2 ⊗ e2 ⊗ e2∗ .

One can first show that they are linearly independent (left as an exercise), and then
show that any T ∈ TV (2, 1) can be expressed as

7 Similarly, |ψ⟩|φ⟩ in quantum mechanics is also a tensor product of |ψ⟩ and |φ⟩ simply with the
symbol ⊗ being omitted. However, in quantum mechanics the vector space of |ψ⟩ is an infinite
dimensional vector space over C, which is more complicated than the finite dimensional vector spaces
over R which we are discussing. For details see Appendix B in Volume II.

$$T = T^{\mu\nu}{}_\sigma \, e_\mu \otimes e_\nu \otimes e^{\sigma *} , \tag{2.4.1}$$

where

$$T^{\mu\nu}{}_\sigma = T(e^{\mu *}, e^{\nu *}; e_\sigma) . \tag{2.4.2}$$

The proof is left as an exercise. [NB: The equation to be proved, i.e., (2.4.1), is a
tensor equation of type (2, 1).] 

Remark 2 T μν σ are the components of T in the basis {eμ ⊗ eν ⊗ eσ ∗ }, or simply
“the components of T in the basis {eμ }” for short.

Now we introduce another important operation on tensors: contraction. As we
just claimed, a tensor T of type (1, 1) can be regarded as a linear map from V to V ,
which in fact is the linear transformation we know from linear algebra. The matrix
$(T^\mu{}_\nu)$ constituted by the components of T in an arbitrary basis $\{e_\mu \otimes e^{\nu *}\}$ clearly
depends on the choice of basis, and it is not difficult to show that the two matrices
$(T'^\mu{}_\nu)$ and $(T^\mu{}_\nu)$ that correspond to the components of the same T in two different
bases are similar to each other; the proof is as follows. As in (2.4.2) we have

$$\begin{aligned}
T'^\mu{}_\nu &= T(e'^{\mu *}; e'_\nu) = T((\tilde{A}^{-1})_\rho{}^\mu e^{\rho *}; A^\sigma{}_\nu e_\sigma) = (\tilde{A}^{-1})_\rho{}^\mu A^\sigma{}_\nu T(e^{\rho *}; e_\sigma) \\
&= (\tilde{A}^{-1})_\rho{}^\mu A^\sigma{}_\nu T^\rho{}_\sigma = (A^{-1})^\mu{}_\rho T^\rho{}_\sigma A^\sigma{}_\nu = (A^{-1} T A)^\mu{}_\nu ,
\end{aligned} \tag{2.4.3}$$

where in the first and fourth steps we used (2.4.2), in the second step we used
Theorem 2.3.2, and in the third step we used the linearity of T . As a result, we have
the matrix equation T′ = A−1 T A (where T′, A and T all represent matrices; T
sometimes represents a tensor and sometimes represents a matrix, and the reader should
interpret it from the context). Thus we can see that T′ and T are two similar matrices.
Using $T'^\mu{}_\mu$ (short for $\sum_{\mu=1}^n T'^\mu{}_\mu$) and $T^\rho{}_\rho$ to represent the traces of T′ and T , then
from (2.4.3) we get

$$T'^\mu{}_\mu = (A^{-1})^\mu{}_\rho T^\rho{}_\sigma A^\sigma{}_\mu = A^\sigma{}_\mu (A^{-1})^\mu{}_\rho T^\rho{}_\sigma = \delta^\sigma{}_\rho T^\rho{}_\sigma = T^\rho{}_\rho .$$

This shows that a tensor of type (1, 1) has the same trace in different bases. When
we are considering tensors we should pay attention to the features that do not depend
on the basis, and the trace of a tensor of type (1, 1) is exactly one of these features,
which is usually called the contraction of T , denoted by CT for now; namely,

CT := T μ μ = T (eμ∗ ; eμ ) . (2.4.4)
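
The basis independence of the trace can be checked numerically. The sketch below assumes n = 2 and uses hypothetical component matrices: T holds the components T μ ν in the old basis, A holds the basis transformation coefficients Aσ ν , and the new components are computed from the matrix equation T′ = A−1 T A.

```python
from fractions import Fraction

def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def inverse_2x2(m):
    """Inverse of a 2 x 2 matrix given as a list of rows."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("A must be non-degenerate")
    return [[d / det, -b / det], [-c / det, a / det]]

def trace(m):
    """Sum of the diagonal components T^mu_mu."""
    return sum(m[i][i] for i in range(len(m)))

# Hypothetical components of a type (1, 1) tensor and of a change of basis.
T = [[Fraction(1), Fraction(2)], [Fraction(3), Fraction(4)]]
A = [[Fraction(1), Fraction(1)], [Fraction(1), Fraction(2)]]

T_new = matmul(inverse_2x2(A), matmul(T, A))   # T' = A^{-1} T A, as in (2.4.3)
print(T_new, trace(T_new), trace(T))
```

The individual components change under the transformation while the trace does not, which is what makes CT a property of the tensor rather than of the basis.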

And now we discuss the contraction of a tensor T of type (2, 1). T can be denoted
by T (• , • ; •); it has two upper slots and one lower slot, and thus there are two
possible contractions: ① the contraction on the first upper slot and the lower slot,
$C^1_1 T := T(e^{\mu *}, \bullet\,; e_\mu)$; ② the contraction on the second upper slot and the lower
slot, $C^2_1 T := T(\bullet\,, e^{\mu *}; e_\mu)$. If we define these two contractions using another basis
$\{e'_\rho\}$ and denote them by $(C^1_1 T)'$ and $(C^2_1 T)'$, respectively, then it is easy to show
that (Exercise 2.14) $(C^1_1 T)' = C^1_1 T$, $(C^2_1 T)' = C^2_1 T$. From the “multifaceted view
of tensors” we can see that both $C^1_1 T$ and $C^2_1 T$ are tensors of type (1, 0), whose
components in any basis can be expressed in terms of the components of T in this basis
as $(C^1_1 T)^\nu = T(e^{\mu *}, e^{\nu *}; e_\mu) = T^{\mu\nu}{}_\mu$ and $(C^2_1 T)^\nu = T^{\nu\mu}{}_\mu$ (the summation symbol
has been omitted). It is not difficult to generalize the discussion above and give a
definition for the contraction of a tensor of type (k, l) as follows:
Definition 3 The contraction on the ith upper index (i ≤ k) and the jth lower index
( j ≤ l) of T ∈ TV (k, l) is defined as

$$C^i_j T := T(\bullet, \ldots, \underbrace{e^{\mu *}}_{i\text{th upper slot}}, \bullet, \ldots ; \bullet, \ldots, \underbrace{e_\mu}_{j\text{th lower slot}}, \bullet, \ldots) \in \mathscr{T}_V(k-1, l-1) \quad (\text{sum over } \mu) . \tag{2.4.5}$$

Remark 3 ① $C^i_j T$ does not depend on the choice of a basis. ② It can be easily
seen from (2.4.5) that any contraction of a tensor of type (k, l) is a tensor of type
(k − 1, l − 1). ③ One can construct all kinds of new tensors using tensor products
in conjunction with contractions. For example, suppose v ∈ V , ω ∈ V ∗ , then v ⊗ ω
is a tensor of type (1, 1), while C(v ⊗ ω) is a tensor of type (0, 0) (a scalar).

Later, we will frequently encounter the operation of contracting after taking the tensor
product, whose result can be regarded as the action of a tensor on
a vector (or a dual vector). As examples, here we write out three equations and then
prove them.

(a) $C(v \otimes \omega) = \omega_\mu v^\mu = \omega(v) = v(\omega)$ , ∀v ∈ V, ω ∈ V ∗ , (2.4.6)
(where $v^\mu$ and $\omega_\mu$ are the components of v and ω in the same basis.)

(b) $C^1_2(T \otimes v) = T(\bullet\,, v)$ , ∀v ∈ V, T ∈ TV (0, 2) . (2.4.7)

(c) $C^2_2(T \otimes \omega) = T(\bullet\,, \omega; \bullet)$ , ∀ω ∈ V ∗ , T ∈ TV (2, 1) . (2.4.8)

We will only give the proof of (2.4.7); the other two equations are left as exercises.
$T \otimes v$ on the left-hand side of (2.4.7) is a tensor of type (1, 2), which is a machine
with 1 upper slot and 2 lower slots, and can be expressed as $T \otimes v(\bullet\,; \bullet, \bullet)$; hence,

\[ C^1_2(T \otimes v) = T \otimes v(e^{\mu*}; \bullet, e_\mu) . \]

Therefore, to prove (2.4.7) we only have to show the following equation:

\[ T \otimes v(e^{\mu*}; \bullet, e_\mu) = T(\bullet, v) . \tag{2.4.7$'$} \]

Seeing that this is an equality of dual vectors, we only have to show that both sides
give the same real number when applied to any u ∈ V :
\[ \text{l.h.s. acting on } u = T \otimes v(e^{\mu*}; u, e_\mu) = T(u, e_\mu)\, v(e^{\mu*}) = T(u, e_\mu)\, e^{\mu*}(v) = T(u, e_\mu)\, v^\mu = T(u, v) = \text{r.h.s. acting on } u , \]

(where we used the result of Exercise 2.11 in the fourth equality), and thus we have
(2.4.7).
Apart from the three equalities above, there are many similar ones. Those equal-
ities represent the following rule: “The action of T on ω (or v) is contracting after
taking the tensor product of T and ω (or v)”, or roughly speaking, “the action is
contracting after taking product”. The manipulation of contracting after taking the
tensor product of two tensors is also usually called contraction for short, and thus
the expression above can even be simplified as “action means contraction”.
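
The rule “action means contraction” can be checked numerically in components. The following is a minimal sketch, assuming a 3-dimensional vector space and arbitrarily chosen illustrative components for $T_{\mu\nu}$ and $v^\mu$ (these numbers are not from the text); it compares the components of $C^1_2(T \otimes v)$ with those of $T(\bullet, v)$ as in (2.4.7).

```python
# Component check of (2.4.7): C^1_2(T ⊗ v) = T(•, v).
# The numbers below are arbitrary illustrative values, not taken from the text.
n = 3
T = [[1.0, 2.0, 0.0],
     [0.5, -1.0, 3.0],
     [4.0, 0.0, 2.0]]      # components T_{mu nu} of a tensor of type (0, 2)
v = [1.0, -2.0, 0.5]       # components v^mu of a vector

# Tensor product T ⊗ v of type (1, 2), stored as TV[rho][nu][sigma] = v^rho T_{nu sigma}
# (one upper slot first, then the two lower slots).
TV = [[[v[r] * T[a][b] for b in range(n)] for a in range(n)] for r in range(n)]

# Contract the upper slot with the second lower slot (sum over the repeated index).
contracted = [sum(TV[m][a][m] for m in range(n)) for a in range(n)]

# Direct action T(•, v): components T_{nu mu} v^mu.
action = [sum(T[a][m] * v[m] for m in range(n)) for a in range(n)]

print(contracted)
print(action)
```

Both lists hold the components of the same dual vector, which is what (2.4.7) asserts.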
Now we return to a manifold M. The collection of all tensors of type (k, l) on
the tangent space V p of an arbitrary point p in M is denoted by TV p (k, l). Suppose
{eμ } and {eν∗ } are an arbitrary basis of V p and its dual basis, respectively, then
T ∈ TV p (2, 1) can also be written in an expanded form similar to (2.4.1). If we
choose a coordinate system such that the coordinate patch contains p, then we can
choose the coordinate basis vectors ∂/∂ x μ and dual basis vectors dx μ to be eμ and
eμ∗ ; namely, we rewrite (2.4.1) as

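\[ T = T^{\mu\nu}{}_{\sigma}\, \frac{\partial}{\partial x^\mu} \otimes \frac{\partial}{\partial x^\nu} \otimes dx^\sigma , \tag{2.4.1$'$} \]

where the coordinate components $T^{\mu\nu}{}_{\sigma}$ can be expressed following (2.4.2) as

\[ T^{\mu\nu}{}_{\sigma} = T(dx^\mu, dx^\nu; \partial/\partial x^\sigma) . \tag{2.4.2$'$} \]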

If we assign a tensor of type (k, l) at every point on a manifold M, we then
obtain a tensor field of type (k, l). A tensor field T on M is said to be smooth if
$T(\omega^1, \dots, \omega^k; v_1, \dots, v_l) \in \mathscr{F}_M$ for all smooth dual vector fields $\omega^1, \dots, \omega^k$ and smooth
vector fields $v_1, \dots, v_l$. From now on, the term “tensor field” will refer to a smooth
($C^\infty$) tensor field unless stated otherwise.
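Theorem 2.4.2 The transformation relation for the components of a tensor of type
(k, l) in two coordinate systems $\{x^\mu\}$ and $\{x'^\mu\}$ is as follows (called the tensor transformation law):

\[ T'^{\mu_1 \dots \mu_k}{}_{\nu_1 \dots \nu_l} = \frac{\partial x'^{\mu_1}}{\partial x^{\rho_1}} \cdots \frac{\partial x'^{\mu_k}}{\partial x^{\rho_k}} \frac{\partial x^{\sigma_1}}{\partial x'^{\nu_1}} \cdots \frac{\partial x^{\sigma_l}}{\partial x'^{\nu_l}}\, T^{\rho_1 \dots \rho_k}{}_{\sigma_1 \dots \sigma_l} . \]

Proof Exercise. □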

Remark 4 Many textbooks adopt the above equation as the definition of a tensor.
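
As an illustration of the transformation law, here is a minimal sketch for a tensor of type (1, 1) on a 2-dimensional space under a linear change of coordinates $x'^\mu = A^\mu{}_\rho x^\rho$, for which the Jacobian matrices are constant. The matrix `A` and the components `T` are illustrative assumptions, not values from the text. The sketch also checks that the contraction $T^\mu{}_\mu$ comes out the same in both systems.

```python
# Transformation law for a (1, 1) tensor under a linear coordinate change x' = A x.
# A, A_inv and T are illustrative values chosen so that all arithmetic is exact.
A = [[2.0, 1.0],
     [1.0, 1.0]]           # A[mu][rho] = d x'^mu / d x^rho  (determinant 1)
A_inv = [[1.0, -1.0],
         [-1.0, 2.0]]      # A_inv[sigma][nu] = d x^sigma / d x'^nu
T = [[1.0, 2.0],
     [3.0, 4.0]]           # T[rho][sigma] = T^rho_sigma in the unprimed system

n = 2
# T'^mu_nu = (d x'^mu / d x^rho) (d x^sigma / d x'^nu) T^rho_sigma
T_prime = [[sum(A[m][r] * A_inv[s][k] * T[r][s]
                for r in range(n) for s in range(n))
            for k in range(n)] for m in range(n)]

trace = sum(T[m][m] for m in range(n))
trace_prime = sum(T_prime[m][m] for m in range(n))

print(T_prime)
print(trace, trace_prime)
```

The components change from one system to the other, while the contraction does not, in line with Remark 3 ① of this section.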
2.5 Metric Tensor Fields

Definition 1 A metric g on a vector space V is a symmetric, non-degenerate tensor
of type (0, 2) on V . “Symmetric” means g(v, u) = g(u, v) ∀v, u ∈ V , and
“non-degenerate” means g(v, u) = 0 ∀u ∈ V ⇒ v = 0 ∈ V .
Remark 1 This abstract definition of non-degeneracy is closely related to the non-
degeneracy of a matrix (that is, the determinant is nonzero) which is familiar to the
reader. It can be proved that [see the paragraph after (2.6.8)] if g is non-degenerate,
then the matrix constituted by the components gμν ≡ g(eμ , eν ) in an arbitrary basis
{eμ } of V is also non-degenerate. Conversely, if V has a basis in which the component
matrix of g is non-degenerate, then g is non-degenerate.
A metric is similar to the familiar inner product. However, the difference between
the metric above and a normal inner product is that g(v, v) can be negative, and
g(v, v) = 0 does not mean v = 0. Later on, we will also often call g(v, u) the inner
product of v and u with respect to a metric g. Once a metric is defined for a vector
space V , the lengths of and the orthogonality between its elements can be defined as
follows:

Definition 2 The length or magnitude of v ∈ V is defined as $|v| := \sqrt{|g(v, v)|}$.
Two vectors v, u ∈ V are said to be orthogonal if g(v, u) = 0. A basis {eμ } of V is
said to be orthonormal if any two basis vectors are orthogonal and each basis vector
eμ satisfies g(eμ , eμ ) = ±1 (not summed over μ).
Remark 2 Definition 2 indicates that the components of a metric g in any orthonormal
basis satisfy

\[ g_{\mu\nu} = \begin{cases} 0, & \mu \ne \nu \\ \pm 1, & \mu = \nu \end{cases} . \tag{2.5.1} \]

Thus, the matrix constituted by the components of a metric in an orthonormal basis
is a diagonal matrix, and the diagonal elements are either +1 or −1.
Theorem 2.5.1 Any vector space assigned with a metric has an orthonormal basis.
When the components of the metric in an orthonormal basis are written as a diagonal
matrix, the numbers of +1 and −1 among the diagonal elements do not depend on the
choice of the orthonormal basis.
Proof Omitted. [See, for example, Schutz (1980) pp. 65–66.] □
Definition 3 Consider the diagonal matrix formed by the components of a metric in an
orthonormal basis. Metrics whose diagonal elements are all +1 are said to be positive
definite or Riemannian, metrics whose diagonal elements are all −1 are said to be
negative definite, and the others are said to be indefinite. The indefinite metrics whose
diagonal elements have only one −1 are said to be Lorentzian. The sum of all the diagonal
elements is called the signature of a metric. The ones most used in relativity are the
Lorentzian metrics and positive definite metrics.
Remark 3 For Lorentzian metrics, there are two conventions in the literature. Definition 3
presents the first convention, in which the diagonal elements of a 4-dimensional
Lorentzian metric are (−1, 1, 1, 1) (up to a trivial reordering⁸) and the signature is
+2. In the other convention, a Lorentzian metric is defined as a metric whose diagonal
elements have only one +1, and thus the diagonal elements of a 4-dimensional
Lorentzian metric read (1, −1, −1, −1), and the signature is −2. This text adopts
the convention with the +2 signature.
Definition 4 There are three types of vectors in a vector space V with a Lorentzian
metric g: ① any v that satisfies g(v, v) > 0 is called a spacelike vector; ② any v that
satisfies g(v, v) < 0 is called a timelike vector; ③ any v that satisfies g(v, v) = 0
is called a lightlike vector or a null vector.
Remark 4 ① In the convention with the −2 signature, the definitions of spacelike
vectors and timelike vectors are the exact opposite: a spacelike vector is defined as
g(v, v) < 0, while a timelike vector is defined as g(v, v) > 0. Nonetheless, there is
no essential difference: a vector that is timelike in the −2 signature is also timelike
in the +2 signature, and vice versa. ② The zero vector is certainly a null vector, but
not vice versa. Many readers may only be familiar with positive definite metrics, and
may think v = 0 (the zero element) whenever g(v, v) = 0. However, if a metric is
Lorentzian, then g(v, v) = 0 does not necessarily lead to v = 0 (the zero element
is unique, while there are infinitely many null vectors). Nonzero 4-dimensional null
vectors play a significant role in relativity. For instance, it is convenient to use them
to describe the propagation of electromagnetic waves and gravitational waves.
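
Definition 4 can be illustrated with a short sketch. It assumes a 4-dimensional vector space with an orthonormal basis in which the Lorentzian metric has diagonal elements (−1, 1, 1, 1); the function names `inner` and `classify` are hypothetical helpers introduced only for this illustration.

```python
# Diagonal elements of a Lorentzian metric in an orthonormal basis (+2 signature).
ETA = (-1.0, 1.0, 1.0, 1.0)

def inner(v, u, diag=ETA):
    """g(v, u) for a metric that is diagonal in the chosen basis."""
    return sum(d * a * b for d, a, b in zip(diag, v, u))

def classify(v, diag=ETA):
    """Classify v by the sign of g(v, v), following Definition 4."""
    s = inner(v, v, diag)
    if s > 0:
        return "spacelike"
    if s < 0:
        return "timelike"
    return "null"

print(classify((1.0, 0.0, 0.0, 0.0)))   # timelike
print(classify((0.0, 1.0, 0.0, 0.0)))   # spacelike
print(classify((1.0, 1.0, 0.0, 0.0)))   # null, although the vector is nonzero
print(classify((0.0, 0.0, 0.0, 0.0)))   # null (the zero vector)
```

The third example is a nonzero null vector, which cannot occur for a positive definite metric. The exact comparison with zero is adequate for these hand-picked components; for general floating-point input a tolerance would be needed.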
A metric g is a tensor of type (0, 2), which is a bilinear map from V × V to ℝ,
so ∀v, u ∈ V we have g(v, u) ∈ ℝ, and thus g(v, •) ∈ V∗. Given g, we can create
g(v, •) ∈ V∗ for any v ∈ V , and hence g can be viewed as a linear map from V to V∗,
i.e., $g : V \xrightarrow{\text{linearly}} V^*$, which is an isomorphism (the proof is left as Exercise 2.15).
Therefore, V acquires a natural, distinguished isomorphism from V to V∗ after a
metric is assigned to it, using which we can naturally identify V and V∗. Summary:
V is naturally identified with V∗∗ whether or not there is a metric; if there is a metric,
then V can also be identified with V∗.
Now we return to a manifold M.
Definition 5 A symmetric, everywhere non-degenerate tensor field of type (0, 2) is
called a metric tensor field.
Remark 5 In this text, we only care about metric fields each of which has a signature
that is the same everywhere.
One of the uses of a metric field is to define the arc length of a curve. First, we
discuss a 2-dimensional Euclidean space. Suppose the parametric equation of a curve
C(t) in the natural coordinate system {x, y} is x = x(t), y = y(t), then the square
of the length of a curve segment dl 2 [short for (dl)2 ] is

8 The modern convention in physics is to take “time” as the first component.


\[ dl^2 = dx^2 + dy^2 = [(dx/dt)^2 + (dy/dt)^2]\,dt^2 = [(T^1)^2 + (T^2)^2]\,dt^2 = |T|^2 dt^2 , \]

where T is a tangent vector of C(t). From this we get

\[ dl = |T|\,dt , \tag{2.5.2} \]

and thus we define the arc length of C(t) as

\[ l = \int |T|\,dt . \tag{2.5.3} \]

The equation above can be generalized to any manifold M with a positive definite
metric field g. Suppose C(t) is an arbitrary $C^1$ curve on M and T is its tangent
vector, i.e., $T \equiv \partial/\partial t$, then $|T| = \sqrt{g(T, T)}$, and hence the arc length of C(t) can
be naturally written as

\[ l := \int \sqrt{g(T, T)}\,dt . \tag{2.5.4} \]

For a manifold M with a Lorentzian metric field g one should pay attention to the
type of a curve before defining its arc length. If the tangent vector at each point of a
$C^1$ curve C(t) is spacelike, then C(t) is called a spacelike curve. Similarly, we can
define a timelike curve and a null curve. The arc lengths of spacelike and null curves
are also defined by (2.5.4) (and thus the arc length of a null curve is always zero).
Note that for a timelike curve we have g(T, T ) < 0, so the length of a segment of
the curve is defined as $dl := \sqrt{-g(T, T)}\,dt$. Thus, we have the following definition:
Definition 6 Suppose a manifold M has a Lorentzian metric field g, then the arc
length of a spacelike, null or timelike curve C(t) can be defined as

\[ l := \int \sqrt{|g(T, T)|}\,dt , \quad \text{where } T \equiv \frac{\partial}{\partial t} . \tag{2.5.5} \]

As for the arc length of an outlandish curve that can turn from spacelike into
timelike (or the other way round), we will leave it undefined. Although the following
discussion about arc length is for the Lorentzian metrics, it also applies to positive
definite metrics (if we consider all curves as spacelike curves).
It is not difficult to show that (Exercise 2.16) the arc length of a curve is inde-
pendent of its parametrization; that is, the reparametrization (which keeps the image
unchanged and adjusts the parameter) of a curve does not change the arc length of the
curve. In addition, since the definition of arc length (Definition 6) does not involve a
coordinate system, the arc length is certainly independent of the coordinate system.
However, if the curve lies inside the coordinate patch of a coordinate system {x μ },
the arc length can also be calculated with the help of the coordinate system. Since

g(T, T ) = g(T μ ∂/∂ x μ , T ν ∂/∂ x ν ) = T μ T ν g(∂/∂ x μ , ∂/∂ x ν ) = (dx μ /dt)(dx ν /dt)gμν ,


[In the last step, we used the fact that “the coordinate components of a tangent
vector of a curve are equal to the derivatives of the parametric equations of the curve
in this system with respect to the parameter” (Theorem 2.2.4), i.e., $T^\mu = dx^\mu/dt$.]
the length of a line segment is

\[ dl = \sqrt{|g_{\mu\nu}\, dx^\mu dx^\nu|} . \tag{2.5.6} \]

Introduce the notation

\[ ds^2 \equiv g_{\mu\nu}\, dx^\mu dx^\nu , \tag{2.5.7} \]

then the arc length reads

\[ l = \int \sqrt{ds^2} \quad (\text{for spacelike curves}) , \tag{2.5.8} \]

\[ l = \int \sqrt{-ds^2} \quad (\text{for timelike curves}) . \tag{2.5.9} \]

The notation $ds^2$ shows up very frequently in differential geometry; it is usually
called a line element. For a spacelike curve, $ds^2$ is equal to $dl^2$, i.e., the square
of the length of a line segment dl; for a timelike curve, ds 2 is equal to −dl 2 , and
thus it is not the square of any real number. In fact, ds 2 is just a notation defined
by (2.5.7), which is not the square of any real number at all for any timelike curve
[see Optional Reading 2.5.2 for a precise interpretation of (2.5.7)]. However, since
the right-hand side of ds 2 ≡ gμν dx μ dx ν contains all the components gμν of g in the
coordinate system involved, one can “read off” all the coordinate components of
the metric directly from the expression of the line element. For example, suppose
the expression for the line element of a metric g on a 2-dimensional manifold in a
coordinate system {x, t} is

\[ ds^2 = -x\,dt^2 + dx^2 + 4\,dt\,dx , \tag{2.5.10} \]

then we can read off the components of g in this system as $g_{tt} = -x$, $g_{xx} = 1$, $g_{tx} = g_{xt} = 2$.
Thus, we can see that a given line element (expression) is equivalent to the
given metric field.
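
The “reading off” step can be made mechanical. The sketch below is only an illustration: the function `metric_from_line_element` and its dictionary input format are assumptions introduced here, not notation from the text. It takes the coefficient of each product of coordinate differentials in $ds^2$ and splits each cross term equally between the two symmetric components.

```python
def metric_from_line_element(coeffs, coords):
    """Build symmetric components g[(a, b)] from the coefficients of ds^2.

    coeffs maps a pair of coordinate names (a, b) to the coefficient of
    d(a) d(b) in the line element; a cross term such as 4 dt dx is split
    equally between g_tx and g_xt because g is symmetric.
    """
    g = {(a, b): 0.0 for a in coords for b in coords}
    for (a, b), c in coeffs.items():
        if a == b:
            g[(a, a)] += c
        else:
            g[(a, b)] += c / 2
            g[(b, a)] += c / 2
    return g

# Line element (2.5.10), evaluated at the illustrative point x = 3.
x = 3.0
g = metric_from_line_element(
    {("t", "t"): -x, ("x", "x"): 1.0, ("t", "x"): 4.0},
    coords=("t", "x"),
)
print(g[("t", "t")], g[("x", "x")], g[("t", "x")], g[("x", "t")])
```

At x = 3 this reproduces $g_{tt} = -x = -3$, $g_{xx} = 1$ and $g_{tx} = g_{xt} = 2$.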
Suppose C : I → M is a spacelike or timelike curve, then |T |, the length of the
tangent vector T at an arbitrary point C(t), is a function of t denoted by |T |(t). If we
assign a point $C(t_0)$ on the curve arbitrarily as the starting point for measuring length,
then the curve segment between $C(t_0)$ and $C(t)$ has the length $l(t) = \int_{t_0}^{t} |T|(t')\,dt'$,
which is an increasing function of t. Hence, l can also act as the parameter of this
curve, called the arc length parameter. From $dl \equiv \sqrt{|g(T, T)|}\,dt$ we can see that a
tangent vector of a curve with the arc length as its parameter satisfies |g(T, T )| = 1,
namely it has unit length.
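
The coordinate formula for arc length lends itself to a numerical check. The following is a rough sketch, not a definitive implementation: it approximates $l = \int \sqrt{|g_{\mu\nu} (dx^\mu/dt)(dx^\nu/dt)|}\,dt$ with the midpoint rule and a central difference for the tangent components. The helper names (`arc_length`, `euclidean`, `minkowski`) and the sample curves are assumptions made for illustration.

```python
import math

def arc_length(curve, metric, t0, t1, steps=20000):
    """Approximate the integral of sqrt(|g(T, T)|) dt with the midpoint rule."""
    h = (t1 - t0) / steps
    total = 0.0
    for i in range(steps):
        t = t0 + (i + 0.5) * h
        before, after = curve(t - 0.5 * h), curve(t + 0.5 * h)
        # Central difference for the tangent components T^mu = dx^mu/dt.
        T = [(b - a) / h for a, b in zip(before, after)]
        g = metric(curve(t))          # components g_{mu nu} at the midpoint
        n = len(T)
        gTT = sum(g[m][k] * T[m] * T[k] for m in range(n) for k in range(n))
        total += math.sqrt(abs(gTT)) * h
    return total

def euclidean(point):
    return [[1.0, 0.0], [0.0, 1.0]]

def minkowski(point):
    return [[-1.0, 0.0], [0.0, 1.0]]  # coordinates ordered as (t, x)

def circle(t):
    return (math.cos(t), math.sin(t))

def circle_reparam(s):
    # Same image as `circle`, with the parameter changed by t = s^2 - 1.
    return (math.cos(s * s - 1.0), math.sin(s * s - 1.0))

def worldline(t):
    return (t, 0.5 * t)               # a timelike straight line

l_circle = arc_length(circle, euclidean, 0.0, 2.0 * math.pi)
l_reparam = arc_length(circle_reparam, euclidean, 1.0, math.sqrt(1.0 + 2.0 * math.pi))
l_world = arc_length(worldline, minkowski, 0.0, 2.0)
print(l_circle, l_reparam, l_world)
```

The first two values should both be close to 2π, illustrating that the arc length does not depend on the parametrization, and the third should be close to √3, the length given by (2.5.5) for this timelike curve.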
Definition 7 Suppose a metric field g is given on a manifold M, then (M, g) is called
a generalized Riemannian space. (If g is positive definite, it is called a Riemannian
space; if g is Lorentzian, it is called a pseudo-Riemannian space, or in physics, it
is called a spacetime.⁹)
Now, we introduce two simple but significant examples of generalized Riemannian
spaces, namely Euclidean space and Minkowski space.
Definition 8 Suppose x μ are the natural coordinates of Rn . Define a metric tensor
field δ on Rn as
δ := δμν dx μ ⊗ dx ν , (2.5.11)

then (Rn , δ) is called the n-dimensional Euclidean space, and δ is called the
Euclidean metric.
The equation above indicates that the components of δ in a dual coordinate basis
of the natural coordinate system are

\[ \delta_{\mu\nu} = \begin{cases} 0, & \mu \ne \nu \\ +1, & \mu = \nu \end{cases} . \]

Therefore, according to (2.5.7), the expression for the line element of the Euclidean
metric in the natural coordinate system should be ds 2 = δμν dx μ dx ν . If n = 2, then
we have ds 2 = (dx 1 )2 + (dx 2 )2 . This is exactly the well-known expression for the
line element of the 2-dimensional Euclidean space. It follows from (2.5.11) that the
natural coordinate basis is orthonormal measured by the Euclidean metric, since from

δ(∂/∂ x α , ∂/∂ x β ) = δμν dx μ ⊗ dx ν (∂/∂ x α , ∂/∂ x β ) = δμν dx μ (∂/∂ x α )dx ν (∂/∂ x β )

we can easily see that


δ(∂/∂ x α , ∂/∂ x β ) = δαβ . (2.5.12)

However, a coordinate system that satisfies (2.5.12) is not necessarily the natural
coordinate system. For example, for 2-dimensional Euclidean space, the coordinate
system $\{x', y'\}$ defined based on the natural coordinate system {x, y} as follows

\[ x' = x + a , \quad y' = y + b \quad (a, b \text{ are constants}) \tag{2.5.13} \]

has a basis $\{\partial/\partial x', \partial/\partial y'\}$ that satisfies (2.5.12) (and thus it is orthonormal).
Furthermore, it is not difficult to show that (Exercise 2.17) the coordinate bases
$\{\partial/\partial x', \partial/\partial y'\}$ of $\{x', y'\}$ defined by the following three equations also satisfy
(2.5.12):

9 More precisely, (M, g) is called a spacetime if M is a connected manifold, and g is a Lorentzian
metric field with adequate differentiability.
\[ x' = x \cos\alpha + y \sin\alpha , \quad y' = -x \sin\alpha + y \cos\alpha \quad (\alpha \text{ is a constant}) , \tag{2.5.14} \]

\[ x' = -x , \quad y' = y , \tag{2.5.15} \]

\[ x' = x , \quad y' = -y . \tag{2.5.16} \]

Definition 9 A coordinate system in an n-dimensional Euclidean space that satisfies
(2.5.12) is called a Cartesian coordinate system or rectangular coordinate system.
In other words, a coordinate system is called a Cartesian system if its coordinate basis
is orthonormal measured by the Euclidean metric δ.
Remark 6 ① Since (2.5.12) is equivalent to (2.5.11), one can also say that a coor-
dinate system that satisfies (2.5.11) is a Cartesian system. ② The natural coordinate
system is certainly a Cartesian system. ③ The relationship between any two Carte-
sian systems in 2-dimensional Euclidean space can only have one of the forms in
(2.5.13)–(2.5.16) (or a composite of them). The first is called a translation, the sec-
ond is called a rotation, and each of the final two is called a reflection. ④ One should
distinguish between the symbols δ and δμν . δ stands for the Euclidean metric field,
which is a tensor field, while δμν are the components of δ in a Cartesian system. Also
note that the components of δ in a non-Cartesian system are not δμν .
The polar coordinate system {r, ϕ} is an example of a non-Cartesian system in
2-dimensional Euclidean space. When using the polar coordinate system, physics books
usually use $\{\hat{e}_r, \hat{e}_\varphi\}$ as the corresponding basis (the hat stands for unit vectors),
which is an orthonormal basis. However, it is not the coordinate basis of the polar
coordinate system; the point is that ∂/∂ϕ is not normalized since
$\delta(\partial/\partial\varphi, \partial/\partial\varphi) = r^2 \ne 1$ (the proof is left as an exercise). In fact, $\hat{e}_\varphi$ is the result of normalizing ∂/∂ϕ,
i.e., $\hat{e}_\varphi := r^{-1}\,\partial/\partial\varphi$. Thus, the frequently used basis $\{\hat{e}_r, \hat{e}_\varphi\}$ in physics books is not
the coordinate basis of the polar coordinate system but rather it is the orthonormal
basis that corresponds to the polar coordinate system.
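
Whether a coordinate system is Cartesian can be tested by computing the components of δ in its coordinate basis. The sketch below assumes the standard relations $x = r\cos\varphi$, $y = r\sin\varphi$ for polar coordinates and uses the inverse of (2.5.14) for the rotated system; the helper `components_in_new_system` is a hypothetical name. It evaluates $\delta(\partial/\partial x'^\alpha, \partial/\partial x'^\beta)$ from the Jacobian $J^\mu{}_\alpha = \partial x^\mu/\partial x'^\alpha$.

```python
import math

def components_in_new_system(g, J):
    """Return g'_{ab} = g_{mk} J[m][a] J[k][b], where J[m][a] = d x^m / d x'^a."""
    n = len(g)
    return [[sum(g[m][k] * J[m][a] * J[k][b] for m in range(n) for k in range(n))
             for b in range(n)] for a in range(n)]

DELTA = [[1.0, 0.0], [0.0, 1.0]]      # components of delta in natural coordinates

def rotation_jacobian(alpha):
    # Inverse of (2.5.14): x = x' cos(alpha) - y' sin(alpha), y = x' sin(alpha) + y' cos(alpha).
    c, s = math.cos(alpha), math.sin(alpha)
    return [[c, -s], [s, c]]

def polar_jacobian(r, phi):
    # x = r cos(phi), y = r sin(phi); columns are d/dr and d/dphi.
    return [[math.cos(phi), -r * math.sin(phi)],
            [math.sin(phi), r * math.cos(phi)]]

g_rotated = components_in_new_system(DELTA, rotation_jacobian(0.3))
g_polar = components_in_new_system(DELTA, polar_jacobian(2.0, 0.7))
print(g_rotated)   # close to the identity matrix
print(g_polar)     # close to diag(1, r^2) with r = 2
```

Up to rounding, the rotated system gives back $\delta_{\mu\nu}$, so it is Cartesian, whereas the polar system gives diag(1, r²), consistent with ∂/∂ϕ not being normalized.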
Euclidean space is the simplest Riemannian space. Now we introduce the simplest
pseudo-Riemannian space—Minkowski space. The diagonal elements of the diago-
nalized 4-dimensional Lorentzian metric are (−1, 1, 1, 1). To highlight the only −1,
we denote its position as row 0 and column 0, and denote the position of the three
+1s as rows 1, 2, 3 and columns 1, 2, 3. Denote the elements of this diagonal matrix
as ημν (to distinguish it from δμν ), i.e., η00 ≡ −1, η11 ≡ η22 ≡ η33 ≡ 1. Generalizing
it to n dimensions, we have

\[ \eta_{\mu\nu} = \begin{cases} 0, & \mu \ne \nu , \\ -1, & \mu = \nu = 0 , \\ +1, & \mu = \nu = 1, \dots, n-1 . \end{cases} \]

Now we present the definition of Minkowski space.


Definition 10 Suppose $x^\mu$ are the natural coordinates of $\mathbb{R}^n$. Define a metric tensor
field η on $\mathbb{R}^n$ as

\[ \eta := \eta_{\mu\nu}\, dx^\mu \otimes dx^\nu , \tag{2.5.17} \]

then $(\mathbb{R}^n, \eta)$ is called the n-dimensional Minkowski space (also known as the
n-dimensional Minkowski spacetime in physics), and η is called the Minkowski
metric.
From Definition 10 we can see that the expression for the line element of
Minkowski space in the natural coordinate system is ds 2 = ημν dx μ dx ν . Take n = 4,
for example, we have ds 2 = −(dx 0 )2 + (dx 1 )2 + (dx 2 )2 + (dx 3 )2 . This is exactly
the well-known expression for the line element of 4-dimensional Minkowski space-
time. It is easy to show that

η(∂/∂ x α , ∂/∂ x β ) = ηαβ , (2.5.18)

and thus the natural coordinate basis $\{\partial/\partial x^\mu\}$ is also orthonormal as measured by
the Minkowski metric. (The 0th coordinate basis vector is normalized to −1, the
others are normalized to 1.) However, a basis satisfying (2.5.18) is not necessarily
the basis of the natural coordinate system. For instance, suppose t and x are the
natural coordinates for 2-dimensional Minkowski space, then the coordinate basis
$\{\partial/\partial t', \partial/\partial x'\}$ of

\[ t' = t + a , \quad x' = x + b \quad (a, b \text{ are constants}) \tag{2.5.19} \]

also satisfies (2.5.18). It is not difficult to verify that (Exercise 2.18) the coordinate
bases $\{\partial/\partial t', \partial/\partial x'\}$ of $\{t', x'\}$ defined by the following three equations also satisfy
(2.5.18):

\[ t' = t \cosh\lambda + x \sinh\lambda , \quad x' = t \sinh\lambda + x \cosh\lambda \quad (\lambda \text{ is a constant}) , \tag{2.5.20} \]

\[ t' = -t , \quad x' = x , \tag{2.5.21} \]

\[ t' = t , \quad x' = -x . \tag{2.5.22} \]

Definition 11 A coordinate system in the n-dimensional Minkowski space that
satisfies (2.5.18) is called a Lorentzian coordinate system or pseudo-Cartesian
coordinate system; some works may also refer to it as a Cartesian (or Minkowski)
coordinate system.

Remark 7 ① The natural coordinates of Minkowski space are certainly Lorentzian
coordinates. ② The relationship between any two Lorentzian coordinate systems
in 2-dimensional Minkowski space can only have one of the forms as in (2.5.19)–
(2.5.22) (or a composite of them). The first one is called a translation, the second
one (2.5.20) is called a boost, and each of the final two is called a reflection. ③ The
components of the Minkowski metric tensor η in a non-Lorentzian coordinate basis
are not equal to $\eta_{\mu\nu}$.
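
One consequence of a boost relating two Lorentzian systems is that the quantity $-(\Delta t)^2 + (\Delta x)^2$ between two points takes the same value in both systems, since the transformation is linear and the metric components are $\eta_{\mu\nu}$ in each. A minimal numerical sketch follows, using the form of the boost written in (2.5.20); the function names and the sample events are illustrative assumptions.

```python
import math

def boost(event, lam):
    """Apply (2.5.20): t' = t cosh(lam) + x sinh(lam), x' = t sinh(lam) + x cosh(lam)."""
    t, x = event
    return (t * math.cosh(lam) + x * math.sinh(lam),
            t * math.sinh(lam) + x * math.cosh(lam))

def interval(p, q):
    """-(dt)^2 + (dx)^2 between two points, with coordinates ordered as (t, x)."""
    dt, dx = q[0] - p[0], q[1] - p[1]
    return -dt * dt + dx * dx

p, q = (0.0, 0.0), (2.0, 1.0)       # illustrative events with timelike separation
lam = 0.7                           # illustrative boost parameter

before = interval(p, q)
after = interval(boost(p, lam), boost(q, lam))
print(before, after)
```

Both values agree up to rounding (here −3), because $\cosh^2\lambda - \sinh^2\lambda = 1$.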
[Optional Reading 2.5.1]


Unlike any translation, rotation or boost, a reflection is a “discrete” transformation. There is
another kind of “discrete” transformation called inversion, defined as $x' = -x$, $y' = -y$ and
$t' = -t$, $x' = -x$ for 2-dimensional Euclidean space and Minkowski space, respectively.
Unlike a reflection, an inversion is a symmetry transformation with respect to a point. However,
an inversion is not an independent transformation. Specifically, $x' = -x$, $y' = -y$ is a
special case of (2.5.14) when α = π, while $t' = -t$, $x' = -x$ can be regarded as a composite
of (2.5.21) and (2.5.22).
[The End of Optional Reading 2.5.1]

[Optional Reading 2.5.2]


In the text above, we have interpreted dl 2 as the square of the length of a line segment. This
is just a “popular” interpretation used a lot by physicists, which is actually not precise. For
instance, the 2-dimensional Euclidean space has

dl 2 = dx 2 + dy 2 . (2.5.23)
If a curve is a straight line, the essence of (2.5.23) is then

\[ (\Delta l)^2 = (\Delta x)^2 + (\Delta y)^2 , \tag{2.5.24} \]

where Δl is a finite segment of the straight line, and Δx and Δy are the finite increments of
the coordinates x and y for this segment, respectively. If the curve is not a straight line, then
(2.5.24) is not applicable. However, no matter how short the segment is (as long as the two
ends are not coincident), its length is a definite real number rather than an infinitesimal, while
the prerequisite of (2.5.23) is that dl is an infinitesimal (and thus so are dx and dy). Again, we
run into the trouble of an “infinitesimal nonzero quantity” (see Optional Reading 2.3.1). The
same problem also occurs for the $dl^2$ and $ds^2$ of a curved space. Physicists are accustomed
to using approximation methods to deal with similar issues; they (including the authors of
this text) may write (2.5.23), but interpret dl, dx, dy, etc. as definite nonzero quantities Δl,
Δx, Δy, etc., i.e., interpret (2.5.23) as (2.5.24). Now, we will discuss how to understand the
differential geometry generalization of (2.5.23), i.e., the expression for the line element

ds 2 = gμν dx μ dx ν . (2.5.25)
Since $dx^\mu$ and $dx^\nu$ are both dual vectors, their “product” $dx^\mu dx^\nu$ can only be the tensor
product “$dx^\mu \otimes dx^\nu$”; thus, the right-hand side of (2.5.25) is actually an abbreviation for
gμν dx μ ⊗ dx ν . However, gμν dx μ ⊗ dx ν is nothing but the expansion of the metric tensor
g in the dual coordinate basis, i.e.,

g = gμν dx μ ⊗ dx ν . (2.5.26)
On the other hand, in differential geometry, one cannot find any other interpretation for $ds^2$
on the left-hand side of (2.5.25); it is actually nothing but another notation for g! Therefore,
we can see that the precise meaning of (2.5.25) turns out to be the tensor equation (2.5.26).
This interpretation is accurate, but it also sounds pedantic and is hard to popularize. In
contrast, one of the important reasons why (2.5.25) is commonly used is that when using
approximations, dl 2 can be viewed as the square of the length of a line segment, and ds 2 is
nothing but a notation for dl 2 (for spacelike segments) or −dl 2 (for timelike segments). Many
equations in this section can only be understood with this interpretation of approximation.
For instance, if we insist on using the true, precise definition from differential geometry,
then (2.5.8) should be rewritten as

\[ l = \int \sqrt{g_{\mu\nu}\, \frac{dx^\mu}{dt} \frac{dx^\nu}{dt}}\; dt , \tag{2.5.8$'$} \]

where t is the parameter of the curve we are talking about. Unlike (2.5.8), each symbol in
this equation has a precise meaning; for example, dx μ /dt is the μth coordinate component
of a tangent vector of the curve, while dt together with the integral sign indicate that the
variable of integration is t.
[The End of Optional Reading 2.5.2]

2.6 The Abstract Index Notation

There are two common ways to express a tensor. The first is to use a letter without
any index (such as T ) to represent a tensor, though this has two drawbacks: ① one
cannot tell the type of a tensor; ② it is not easy to indicate which upper slot and
which lower slot a contraction involves. (The symbol $C^i_j T$ we used before is only
temporary; it is not convenient to use in computations.) The second notation is to use
the components (such as $T^{\mu\nu}{}_{\rho}$) to represent a tensor, and to use the equalities obeyed
by the components to represent the equalities obeyed by tensors. The equalities of
components are equalities of numbers, and thus all of the tensor equations in the
literature using this notation are equalities of numbers. This notation can overcome
the two difficulties of the first notation; however, it has a serious disadvantage of its own:
sometimes, one can choose a special basis and obtain a relatively simple equation
relating the components, but this equation only holds for this basis, and cannot be
used to represent the tensor equation in general. We want to know which equations
can and which cannot represent tensor equations, yet this is difficult to tell in the
component notation. To overcome this problem, Roger Penrose created the “abstract
index notation”. The main points are as follows:
1. A tensor of type (k, l) is represented by a letter with k upper indices and l lower
indices. All the indices are lower-case Latin letters, which only indicate the type of
a tensor, and thus are called abstract indices. For example, $v^a$ stands for a vector,
in which the upper index a plays the same role as the arrow in $\vec{v}$ (and hence one cannot
say a = 1 or a = 2), $\omega_a$ stands for a dual vector, $T^{ab}{}_{c}$ stands for a tensor of type
(2, 1), and so on. $v^b$ and $v^a$ stand for the same vector (i.e., v); however, we should
pay attention to the “balance of indices” when writing an equation. For example, one
can write $\alpha u^a + v^a = w^a$ or $\alpha u^b + v^b = w^b$, but not $\alpha u^a + v^b = w^a$.
2. Repeated upper and lower indices represent the contraction between these two
indices; for example,

\[ T^a{}_a = T(e^{\mu*}; e_\mu) = T^\mu{}_\mu , \quad T^{ab}{}_a = T(e^{\mu*}, \bullet\,; e_\mu) , \quad T^{ab}{}_b = T(\bullet, e^{\mu*}; e_\mu) . \]

3. The tensor product symbol is omitted. For instance, suppose $T \in \mathcal{T}_V(2, 1)$,
$S \in \mathcal{T}_V(1, 1)$, then $T \otimes S$ can be written as $T^{ab}{}_c S^d{}_e$. In the notation without indices,
generally, $\omega \otimes \mu \ne \mu \otimes \omega$, since when acting on (v, u), whether ω acts on u or v
depends on the order of these letters [the first letter in $\omega \otimes \mu$ acts on the first letter
in (v, u), i.e., ω acts on v]. In the abstract index notation, since repeated upper and
lower indices are assumed to be contracted, $\omega \otimes \mu(v, u)$ can be written as either
$\omega_a \mu_b v^a u^b$ or $\mu_b \omega_a v^a u^b$ [both stand for ω(v)μ(u)]. Since the acting target of both
$\omega_a \mu_b$ and $\mu_b \omega_a$ is the same $v^a u^b$, we have $\omega_a \mu_b = \mu_b \omega_a$. That is, the letters that
represent tensors can be interchanged provided their indices travel with them. The
non-commutativity of the order of a tensor product is now manifested by
$\omega_a \mu_b \ne \omega_b \mu_a$.
4. When we are talking about the components of a tensor, the corresponding
indices are labeled by lower-case Greek letters, such as μ, ν, α, β, etc. (as we
used before). These indices are called component indices or concrete indices, and
we can ask whether μ = 1 or μ = 2. A basis expansion of a tensor
$T = T^{\mu\nu}{}_{\sigma}\, e_\mu \otimes e_\nu \otimes e^{\sigma*}$ can now be written as

\[ T^{ab}{}_c = T^{\mu\nu}{}_{\sigma}\, (e_\mu)^a (e_\nu)^b (e^\sigma)_c , \tag{2.6.1} \]

[the lower index c of $(e^\sigma)_c$ has already indicated that it is a dual basis vector, so there
is no need to write $(e^{\sigma*})_c$] while $T^{\mu\nu}{}_{\sigma} = T(e^{\mu*}, e^{\nu*}; e_\sigma)$ can now be written as

\[ T^{\mu\nu}{}_{\sigma} = T^{ab}{}_c\, (e^\mu)_a (e^\nu)_b (e_\sigma)^c . \tag{2.6.2} \]

Note that the indices of both (2.6.1) and (2.6.2) (whether abstract or concrete) are
“balanced”. Suppose $T \in \mathcal{T}_V(0, 2)$, then T should be denoted by $T_{ab}$. Let $e_\mu$ be the
μth basis vector of a basis, then from (2.4.7) we can see that $T(\bullet, e_\mu) = C^1_2(T \otimes e_\mu)$,
and since $T \otimes e_\mu$ should be denoted by $T_{ab}(e_\mu)^c$ using the abstract index notation,
$T(\bullet, e_\mu)$ should be denoted by $T_{ab}(e_\mu)^b$, also abbreviated as $T_{a\mu}$, i.e.,

\[ T(\bullet, e_\mu) \equiv T_{ab}(e_\mu)^b = T_{a\mu} . \tag{2.6.3} \]

This is an expression with both abstract and component indices; we may consider
$T_{a1}, \dots, T_{an}$ as n dual vectors, where $T_{a\mu}$ stands for “the μth dual vector”.
5. From the “multifaceted view of tensors”, we can see that a tensor $T^a{}_b$ of type
(1, 1) on V can be viewed either as a linear map from V to V or a linear
map from V∗ to V∗. That is, $T^a{}_b$ acting on a vector $v^b \in V$ still returns a vector,
denoted by $u^a \equiv T^a{}_b v^b \in V$, while $T^a{}_b$ acting on a dual vector $\omega_a \in V^*$ still returns
a dual vector, denoted by $\mu_b \equiv T^a{}_b \omega_a \in V^*$. Actually, it can be seen at a glance
from the abstract index notation that $T^a{}_b v^b$ and $T^a{}_b \omega_a$ are a vector and a dual vector,
respectively. Thus, the abstract index notation is a simple and intuitive representation
of the “multifaceted view of tensors”. Using $\delta^a{}_b$ to represent the identity map from
V to V , i.e., $\delta^a{}_b v^b := v^a$ $\forall v^b \in V$, we can easily see that it is also an identity map
from V∗ to V∗, i.e., $\delta^a{}_b \omega_a = \omega_b$ $\forall \omega_a \in V^*$. It is not difficult to further show that
(exercise) the result of $\delta^a{}_b$ contracting with any tensor is substituting the upper index
b of that tensor with a (or substituting the lower index a with b), such as $\delta^a{}_b T_{ac} = T_{bc}$,
$\delta^a{}_b T^{cb}{}_e = T^{ca}{}_e$. Suppose $\{(e_\mu)^a\}$ is a basis of V , and $\{(e^\mu)_a\}$ is the dual basis, then

\[ (e^\mu)_a (e_\mu)^b = \delta^b{}_a . \tag{2.6.4} \]


This is an equality between tensors of type (1, 1); to prove it, we only need to verify that the
result of each side acting on an arbitrary vector $v^a$ is the same (exercise). Suppose $\{(e_\mu)^a\}$ is a basis
of V and $\{(e^\mu)_a\}$ is the dual basis, then the components of $\delta^a{}_b$ in this basis,
$\delta^\mu{}_\nu \equiv \delta^a{}_b (e^\mu)_a (e_\nu)^b$, satisfy

\[ \delta^\mu{}_\nu = \begin{cases} +1, & \mu = \nu \\ 0, & \mu \ne \nu \end{cases} . \]

The proof is very simple: taking $\delta^1{}_1$ as an example, $\delta^1{}_1 = \delta^a{}_b (e^1)_a (e_1)^b = (e^1)_a (e_1)^a = 1$.
Note that $\delta^0{}_0 = +1$ even for the Lorentzian signature.
6. Since a metric $g \in \mathcal{T}_V(0, 2)$, it should be denoted by $g_{ab}$. Suppose v ∈ V , then
$g(\bullet, v) \in V^*$ (see the paragraph after Example 1 in Sect. 2.4). Regarding g as the T
in (2.4.7), we get $g(\bullet, v) = C^1_2(g \otimes v) = C^1_2(g_{ab} v^c) = g_{ab} v^b$; hence, $g(\bullet, v)$ should
be denoted by $g_{ab} v^b$. Further, when there is a metric g, V is identified with V∗ under
the isomorphism g : V → V∗, and $g_{ab} v^b \equiv g(\bullet, v)$ is exactly the image of $v^a$ under
this map. Hence $g_{ab} v^b$ should be identified with $v^a$, and may just simply be denoted
by $v_a$ (which can be taken as a definition of $v_a$). That is, although mathematically
speaking $v^a$ and $v_a$ are two different types of objects (a vector and a dual vector), in
application they represent the same thing (and thus both are denoted by v). Thus, we
usually write

\[ v_a = g_{ab} v^b . \tag{2.6.5} \]

On the other hand, since g : V → V∗ is an isomorphism, its inverse map $g^{-1}$ naturally
exists. It is not difficult to show that $g^{-1}$ is a tensor of type (2, 0). Though it seems it
should be denoted by $(g^{-1})^{ab}$, it is usually denoted by $g^{ab}$ (the upper indices prevent
confusion with $g_{ab}$). Using similar reasoning, the image of any $\omega_b \in V^*$ under the
map $g^{ab}$ is $g^{ab}\omega_b$, which may simply be denoted by $\omega^a$ in order to indicate that it
represents the same thing as $\omega_a$; therefore (see Fig. 2.10),

\[ \omega^a = g^{ab} \omega_b . \tag{2.6.6} \]

Equations (2.6.5) and (2.6.6) indicate that one can use $g_{ab}$ to “lower” upper indices
and $g^{ab}$ to “raise” lower indices. These operations of raising and
lowering indices are applicable to any abstract index in any tensor. For instance, a
tensor T of type (1, 1) can be denoted by $T^a{}_b$ in abstract index notation, and lowering
the index using the metric is actually performing the tensor product and contraction
between g and T to obtain a tensor of type (0, 2), $g(\bullet, e_\mu) \otimes T(e^{\mu*}; \bullet)$, which is
denoted by $T_{ab}$ in abstract index notation, i.e., $T_{ab} \equiv g_{ac} T^c{}_b$.
Fig. 2.10 A metric $g$ can naturally identify $V$ with $V^*$, and thus the indices may be raised and lowered

Using (2.6.6) and (2.6.5) in turn we have
$$\omega^a = g^{ab} \omega_b = g^{ab} (g_{bc} \omega^c) , \quad \forall \omega^a \in V ,$$
and hence
$$g^{ab} g_{bc} = \delta^a{}_c , \tag{2.6.7}$$
which is actually a corollary of the fact that $g^{ab}$ is the inverse of $g_{ab}$.


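Suppose $\{(e_\mu)^a\}$ is a basis of $V$ and $\{(e^\mu)_a\}$ is the dual basis, and use $g_{\mu\nu}$ and $g^{\mu\nu}$ to represent the components of $g_{ab}$ and $g^{ab}$ in this basis, respectively. Then on the one hand we have $\delta^a{}_c = \delta^\mu{}_\sigma (e_\mu)^a (e^\sigma)_c$, and on the other hand
$$\delta^a{}_c = g^{ab} g_{bc} = g^{\mu\nu} (e_\mu)^a (e_\nu)^b \, g_{\rho\sigma} (e^\rho)_b (e^\sigma)_c = g^{\mu\nu} g_{\nu\sigma} (e_\mu)^a (e^\sigma)_c ,$$
where the third equality holds because $(e_\nu)^b g_{\rho\sigma} (e^\rho)_b = \delta^\rho{}_\nu g_{\rho\sigma} = g_{\nu\sigma}$, and hence
$$g^{\mu\nu} g_{\nu\sigma} = \delta^\mu{}_\sigma . \tag{2.6.8}$$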

The above equation indicates that the matrix formed by the components $g_{\mu\nu}$ of the metric $g_{ab}$ in any basis is invertible (its inverse is the matrix formed by the components $g^{\mu\nu}$ of the inverse metric $g^{ab}$ in the same basis), and thus is non-degenerate. Therefore, the non-degeneracy of $g_{ab}$ assures the non-degeneracy of its matrix $(g_{\mu\nu})$ in any basis. Conversely, suppose there exist a basis $\{(e_\mu)^a\}$ and its dual basis $\{(e^\mu)_a\}$ such that $(g_{\mu\nu})$ is non-degenerate, then $(g_{\mu\nu})$ has an inverse matrix $(g^{\mu\nu})$. Let $g^{ab} \equiv g^{\mu\nu} (e_\mu)^a (e_\nu)^b$, then it is easy to prove from $g^{\mu\nu} g_{\nu\sigma} = \delta^\mu{}_\sigma$ that $g^{ab} g_{bc} = \delta^a{}_c$, and thus $g_{ab} : V \to V^*$ is non-degenerate since it has an inverse map $g^{ab}$. (The proof of “the inverse exists ⇒ non-degenerate” is left as an exercise. Hint: $g_{ab} : V \to V^*$ having an inverse indicates that it is a one-to-one map, while if $g_{ab}$ were degenerate, then, besides the zero element, there would be a $v^a \neq 0$ in $V$ whose image is also $0 \in V^*$, which contradicts the fact that $g_{ab}$ is one-to-one.)
It is not difficult to see that the indices of the components of a tensor can be lowered and raised using the components $g_{\mu\nu}$ of a metric and the components $g^{\mu\nu}$ of its inverse, respectively. For instance, we can write $g_{\mu\nu} v^\nu$ as $v_\mu$ because
$$g_{\mu\nu} v^\nu = g_{ab} (e_\mu)^a (e_\nu)^b v^\nu = g_{ab} (e_\mu)^a v^b = v_a (e_\mu)^a = v_\mu .$$
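The component statements (2.6.8) and $v_\mu = g_{\mu\nu} v^\nu$ can be checked numerically. The following is a minimal sketch, assuming the Minkowski components $\eta_{\mu\nu} = \mathrm{diag}(-1, 1, 1, 1)$ in a Lorentzian basis; the helper names `lower`, `raise_index` and `matmul` and the sample vector are illustrative choices, not notation from the text.

```python
# Components of the Minkowski metric in a Lorentzian basis (signature -+++).
ETA = [[-1, 0, 0, 0],
       [0, 1, 0, 0],
       [0, 0, 1, 0],
       [0, 0, 0, 1]]
# For this particular metric the inverse matrix has the same entries.
ETA_INV = [row[:] for row in ETA]


def lower(g, v_up):
    """Return v_mu = g_{mu nu} v^nu (sum over nu)."""
    n = len(v_up)
    return [sum(g[mu][nu] * v_up[nu] for nu in range(n)) for mu in range(n)]


def raise_index(g_inv, w_down):
    """Return w^mu = g^{mu nu} w_nu (sum over nu)."""
    n = len(w_down)
    return [sum(g_inv[mu][nu] * w_down[nu] for nu in range(n)) for mu in range(n)]


def matmul(a, b):
    """Plain matrix product, used to check g^{mu nu} g_{nu sigma} = delta^mu_sigma."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


v_up = [2, 1, 0, 3]                       # illustrative components v^mu
v_down = lower(ETA, v_up)                 # only the time component changes sign
v_back = raise_index(ETA_INV, v_down)     # raising again recovers v^mu
identity = matmul(ETA_INV, ETA)           # component form of (2.6.8)
```

Lowering and then raising returns the original components precisely because the two matrices are inverse to each other, which is the content of (2.6.8).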

As an example of the abstract index notation, here we introduce the abstract index expression for the 4-dimensional Minkowski metric $\eta_{ab}$. The definition of the Minkowski metric (2.5.17) can be expressed in abstract index notation as
$$\eta_{ab} := \eta_{\mu\nu} (dx^\mu)_a (dx^\nu)_b ,$$
where $\{(dx^\mu)_a\}$ is the dual basis of the Lorentzian coordinate system. If we use $\{t, x, y, z\}$ to represent $\{x^0, x^1, x^2, x^3\}$, then since the only nonzero $\eta_{\mu\nu}$ are $\eta_{00} = -1$ and $\eta_{11} = \eta_{22} = \eta_{33} = 1$, the equation above can be expressed as
$$\eta_{ab} = -(dt)_a (dt)_b + (dx)_a (dx)_b + (dy)_a (dy)_b + (dz)_a (dz)_b , \tag{2.6.9a}$$
which corresponds to the expression for the line element $ds^2 = -dt^2 + dx^2 + dy^2 + dz^2$. If we use the spherical coordinate system $\{t, r, \theta, \varphi\}$ instead, then using
$$x = r \sin\theta \cos\varphi , \quad y = r \sin\theta \sin\varphi , \quad z = r \cos\theta ,$$
it is not difficult to derive from (2.6.9a) that
$$\eta_{ab} = -(dt)_a (dt)_b + (dr)_a (dr)_b + r^2 (d\theta)_a (d\theta)_b + r^2 \sin^2\theta \, (d\varphi)_a (d\varphi)_b , \tag{2.6.9b}$$
which corresponds to the line element $ds^2 = -dt^2 + dr^2 + r^2 (d\theta^2 + \sin^2\theta \, d\varphi^2)$.
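The passage from (2.6.9a) to (2.6.9b) can be spot-checked numerically. The sketch below assumes the usual component transformation law $g'_{\mu\nu} = (\partial x^\rho/\partial x'^\mu)(\partial x^\sigma/\partial x'^\nu)\,\eta_{\rho\sigma}$, with the Jacobian of $(t, x, y, z)$ with respect to $(t, r, \theta, \varphi)$ written out by hand; the function names and the sample point are illustrative, not taken from the text.

```python
import math

# Minkowski components in the Lorentzian system {t, x, y, z}.
ETA = [[-1, 0, 0, 0],
       [0, 1, 0, 0],
       [0, 0, 1, 0],
       [0, 0, 0, 1]]


def jacobian(r, theta, phi):
    """d(t, x, y, z)/d(t, r, theta, phi): rows are old coordinates, columns new ones."""
    st, ct = math.sin(theta), math.cos(theta)
    sp, cp = math.sin(phi), math.cos(phi)
    return [
        [1, 0, 0, 0],
        [0, st * cp, r * ct * cp, -r * st * sp],
        [0, st * sp, r * ct * sp, r * st * cp],
        [0, ct, -r * st, 0],
    ]


def metric_in_spherical(r, theta, phi):
    """Components of eta_ab in {t, r, theta, phi} via the transformation law."""
    J = jacobian(r, theta, phi)
    return [[sum(J[rho][mu] * ETA[rho][sigma] * J[sigma][nu]
                 for rho in range(4) for sigma in range(4))
             for nu in range(4)]
            for mu in range(4)]


r0, theta0, phi0 = 2.0, 0.7, 1.1           # arbitrary sample point
g_sph = metric_in_spherical(r0, theta0, phi0)
# Components read off from (2.6.9b) at the same point.
expected = [[-1, 0, 0, 0],
            [0, 1, 0, 0],
            [0, 0, r0 ** 2, 0],
            [0, 0, 0, (r0 * math.sin(theta0)) ** 2]]
```

Agreement at one point is of course not a proof; it only guards against sign and factor slips when carrying out the derivation by hand.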
In much of the literature that does not use the abstract index notation, the compo-
nent indices in a 4-dimensional spacetime and a 3-dimensional Riemannian space are
denoted by Greek letters μ, ν, . . . (each can be 0, 1, 2, 3) and Latin letters i, j, k, . . .
(each can be 1, 2, 3), respectively. According to what we mentioned previously, the
Latin indices in this text are supposed to represent abstract indices. However, in order
to distinguish the component indices of 4 dimensions and 3 dimensions, we allow
one exception: whenever we discuss a 3-dimensional Riemannian space, the Latin
letters that start from i (i, j, k, . . . ) are component indices (each can be 1, 2, 3), and
the other Latin letters (such as $a$, $b$, $c$, etc.) are still abstract indices. For example, a 3-dimensional vector $v$ can be expressed as $v^a = v^i (\partial/\partial x^i)^a$ ($i$ is summed from 1 to 3).
In the abstract index notation, a coordinate basis vector is denoted by $(\partial/\partial x^\mu)^a$, and a dual coordinate basis vector is denoted by $(dx^\mu)_a$. Using a metric $g_{ab}$ to lower the index of the former and its inverse $g^{ab}$ to raise the index of the latter, we obtain a dual vector $g_{ab} (\partial/\partial x^\mu)^b$ and a vector $g^{ab} (dx^\mu)_b$. Denote $g_{ab} (\partial/\partial x^\mu)^b$ by $\omega_a$ for short and expand it using the dual coordinate basis as $g_{ab} (\partial/\partial x^\mu)^b = \omega_\nu (dx^\nu)_a$. Applying both sides to $(\partial/\partial x^\sigma)^a$ yields $g_{\sigma\mu} = \omega_\sigma$; hence,
$$g_{ab} (\partial/\partial x^\mu)^b = g_{\mu\nu} (dx^\nu)_a . \tag{2.6.10a}$$

Thus, in general g ab (dx μ )b does not equal (dx μ )a . Similarly, we have

$$g^{ab} (dx^\mu)_b = g^{\mu\nu} (\partial/\partial x^\nu)^a . \tag{2.6.10b}$$

When $g_{ab} = \delta_{ab}$ (Euclidean metric) and $\{x^\mu\}$ is a Cartesian coordinate system, the two equations above simplify to
$$\delta_{ab} (\partial/\partial x^\mu)^b = (dx^\mu)_a , \quad \delta^{ab} (dx^\mu)_b = (\partial/\partial x^\mu)^a , \tag{2.6.11}$$
and when $g_{ab} = \eta_{ab}$ (take 4-dimensional Minkowski space as an example) and $\{x^\mu\}$ is a Lorentzian coordinate system, we have
$$\eta_{ab} (\partial/\partial x^0)^b = -(dx^0)_a , \quad \eta_{ab} (\partial/\partial x^i)^b = (dx^i)_a ; \tag{2.6.12a}$$
$$\eta^{ab} (dx^0)_b = -(\partial/\partial x^0)^a , \quad \eta^{ab} (dx^i)_b = (\partial/\partial x^i)^a . \tag{2.6.12b}$$
Here, $i = 1, 2, 3$ are not abstract indices.


An upper index and a lower index are also called a contravariant index and a covariant index, respectively. Correspondingly, a vector $v^a$ and a dual vector $\omega_a$ are also called a contravariant vector and a covariant vector, respectively.
The symmetry of a tensor can be expressed conveniently as follows using the abstract index notation.
Definition 1 $T \in \mathscr{T}_V(0, 2)$ is said to be symmetric if $T(u, v) = T(v, u)$, $\forall u, v \in V$.
Since $T(u, v) = T_{ab} u^a v^b$ and $T(v, u) = T_{ab} v^a u^b = T_{ba} u^a v^b$, in abstract index notation the necessary and sufficient condition for $T$ to be symmetric is that $T_{ab} = T_{ba}$. In the abstract index notation, a tensor of type $(0, 2)$ can be denoted as either $T_{ab}$ or $T_{ba}$. However, only when $T$ is symmetric can we write $T_{ab} = T_{ba}$; thus, one should be more careful when writing an equation than when writing a tensor using abstract indices. Similarly, a tensor of type $(1, 1)$ can be expressed as either $T^a{}_b$ or $T_b{}^a$, whose upper indices can be lowered using a metric as $g_{ca} T^a{}_b = T_{cb}$ and $g_{ca} T_b{}^a = T_{bc}$. Although they stand for the same tensor, only a tensor of type $(1, 1)$ that is symmetric after the index is lowered can be written as $T^a{}_b = T_b{}^a$. When the indices are not raised or lowered using a metric, the upper and lower indices of a tensor of type $(k, l)$ are ordered separately, and there is no order between an upper index and a lower index. Therefore, if we want, a tensor of type $(1, 1)$ can be written with its indices stacked vertically as $T^a_b$, a tensor of type $(2, 1)$ can be written as $T^{ab}_c$, etc. However, ambiguity will occur when raising and lowering indices in this kind of expression. Since we will raise and lower indices frequently, in this text we stagger the upper and lower indices from the very beginning, e.g., $T^{ab}{}_c$.
The discussion above indicates that abstract index notation is formally quite sim-
ilar to the component index notation. This is exactly one of the merits of the abstract
index notation: it can represent tensor equations yet retains many advantages of the
component index notation.
Definition 2 For a tensor $T_{ab}$ of type $(0, 2)$, the symmetric part (denoted by $T_{(ab)}$) and the antisymmetric part ($T_{[ab]}$) are defined respectively as
$$T_{(ab)} := \frac{1}{2} (T_{ab} + T_{ba}) , \quad T_{[ab]} := \frac{1}{2} (T_{ab} - T_{ba}) .$$
Generally, the symmetric and antisymmetric parts of a tensor $T_{a_1 \cdots a_l}$ of type $(0, l)$ are defined as
$$T_{(a_1 \cdots a_l)} := \frac{1}{l!} \sum_\pi T_{a_{\pi(1)} \cdots a_{\pi(l)}} , \tag{2.6.13}$$
$$T_{[a_1 \cdots a_l]} := \frac{1}{l!} \sum_\pi \delta_\pi T_{a_{\pi(1)} \cdots a_{\pi(l)}} , \tag{2.6.14}$$
where $\pi$ represents a permutation of $(1, \ldots, l)$, $\pi(1)$ stands for the first number in the permutation described by $\pi$, $\sum_\pi$ represents the summation over all permutations, and $\delta_\pi \equiv \pm 1$ ($+$ for even permutations, $-$ for odd permutations). For example,
$$T_{(a_1 a_2 a_3)} := \frac{1}{6} (T_{a_1 a_2 a_3} + T_{a_3 a_1 a_2} + T_{a_2 a_3 a_1} + T_{a_1 a_3 a_2} + T_{a_3 a_2 a_1} + T_{a_2 a_1 a_3}) ,$$
$$T_{[a_1 a_2 a_3]} := \frac{1}{6} (T_{a_1 a_2 a_3} + T_{a_3 a_1 a_2} + T_{a_2 a_3 a_1} - T_{a_1 a_3 a_2} - T_{a_3 a_2 a_1} - T_{a_2 a_1 a_3}) .$$
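The definitions (2.6.13) and (2.6.14) translate directly into a sum over permutations of component indices. The following is a minimal sketch, assuming a tensor is stored as a dictionary from index tuples to components in some fixed basis; the names `sign` and `part` and the sample tensors are illustrative choices, not part of the text.

```python
from fractions import Fraction
from itertools import permutations, product
from math import factorial


def sign(perm):
    """Sign of a permutation of (0, ..., l-1), found by counting inversions."""
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1


def part(T, rank, dim, antisymmetric=False):
    """Symmetric part (2.6.13) or antisymmetric part (2.6.14) of the components T."""
    out = {}
    for idx in product(range(dim), repeat=rank):
        total = Fraction(0)
        for perm in permutations(range(rank)):
            permuted = tuple(idx[perm[k]] for k in range(rank))
            coeff = sign(perm) if antisymmetric else 1
            total += coeff * T[permuted]
        out[idx] = total / factorial(rank)
    return out


# Type (0, 2): the two parts add up to the original tensor.
T2 = {(0, 0): 1, (0, 1): 2, (1, 0): 5, (1, 1): 7}
S2 = part(T2, 2, 2)
A2 = part(T2, 2, 2, antisymmetric=True)

# Type (0, 3): the two parts no longer add up to the original tensor in general.
T3 = {idx: idx[0] + 10 * idx[1] + 100 * idx[2]
      for idx in product(range(3), repeat=3)}
S3 = part(T3, 3, 3)
A3 = part(T3, 3, 3, antisymmetric=True)
```

Exact rational arithmetic (`Fraction`) is used so that the factor $1/l!$ does not introduce rounding when comparing components.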

Definition 3 $T \in \mathscr{T}_V(0, l)$ is said to be totally symmetric if $T_{a_1 \cdots a_l} = T_{(a_1 \cdots a_l)}$; $T$ is said to be totally antisymmetric if $T_{a_1 \cdots a_l} = T_{[a_1 \cdots a_l]}$.
The concepts above (Definitions 1–3) can also be applied to tensors of type $(k, 0)$. For instance, $T$ is said to be totally symmetric if $T^{a_1 \cdots a_k} = T^{(a_1 \cdots a_k)}$.
Remark 1 Any tensor of type $(0, 2)$ can be expressed as the sum of its symmetric part and antisymmetric part, i.e., $T_{ab} = T_{(ab)} + T_{[ab]}$. However, this is not true for tensors of type $(0, l)$ where $l > 2$. For example, in general $T_{abc} \neq T_{(abc)} + T_{[abc]}$, yet $T_{abc} = T_{(abc)} \Rightarrow T_{[abc]} = 0$ [see Theorem 2.6.2 (e)].
Theorem 2.6.1 (a) Suppose $T_{a_1 \cdots a_l} = T_{(a_1 \cdots a_l)}$, then
$$T_{a_1 \cdots a_l} = T_{a_{\pi(1)} \cdots a_{\pi(l)}} \quad (\text{where } \pi \text{ represents an arbitrary permutation}) , \tag{2.6.15}$$
i.e., any term in the expansion of $T_{(a_1 \cdots a_l)}$ ($l!$ terms in total) equals $T_{a_1 \cdots a_l}$; for example,
$$T_{abc} = T_{(abc)} \Rightarrow T_{abc} = T_{acb} = T_{cab} = T_{cba} = T_{bca} = T_{bac} . \tag{2.6.16}$$
(b) Suppose $T_{a_1 \cdots a_l} = T_{[a_1 \cdots a_l]}$, then
$$T_{a_1 \cdots a_l} = \delta_\pi T_{a_{\pi(1)} \cdots a_{\pi(l)}} , \tag{2.6.17}$$
i.e., any even permutation term in the expansion of $T_{[a_1 \cdots a_l]}$ equals $T_{a_1 \cdots a_l}$, and any odd permutation term equals $-T_{a_1 \cdots a_l}$; for example,
$$T_{abc} = T_{[abc]} \Rightarrow T_{abc} = -T_{acb} = T_{cab} = -T_{cba} = T_{bca} = -T_{bac} . \tag{2.6.18}$$
Similar conclusions also hold for tensors of type $(k, 0)$ whose upper indices are totally symmetric or totally antisymmetric.
Proof We only take the case of $l = 3$ as an example. For the other integer values of $l$, one can prove it in the same manner.
(a) From $T_{abc} = T_{(abc)}$ we have $T_{acb} = T_{(acb)}$ (the latter equation is nothing but the result of relabeling the abstract indices on both sides of the former one), and since $T_{(acb)} = T_{(abc)}$ (which is manifest from the definition of $T_{(abc)}$), we have $T_{acb} = T_{(acb)} = T_{abc}$. The other equalities on the right-hand side of (2.6.16) can be proved likewise.
(b) From $T_{abc} = T_{[abc]}$ we have $T_{acb} = T_{[acb]} = -T_{[abc]} = -T_{abc}$. The other equalities on the right-hand side of (2.6.18) can be proved likewise. □

In the future we will often deal with the computations that involve parentheses
and square brackets, and the theorem below will bring great convenience for many
computations.
Theorem 2.6.2 (a) The brackets are “contagious” in a contraction process, i.e.,
$$T_{[a_1 \cdots a_l]} S^{a_1 \cdots a_l} = T_{[a_1 \cdots a_l]} S^{[a_1 \cdots a_l]} = T_{a_1 \cdots a_l} S^{[a_1 \cdots a_l]} , \tag{2.6.19}$$
and so are the parentheses.
(b) One can arbitrarily add or delete one kind of bracket (parentheses or square brackets) inside a pair of the same kind of bracket; for example,
$$T_{[[ab]c]} = T_{[abc]} , \quad \text{where } T_{[[ab]c]} \equiv \frac{1}{2} (T_{[abc]} - T_{[bac]}) . \tag{2.6.20}$$
(c) A pair of brackets inside a pair of the other kind of brackets yields zero; for example,
$$T_{[(ab)c]} = 0 , \quad T_{(a[bcd])} = 0 . \tag{2.6.21}$$
(d) The contraction of different kinds of brackets yields zero; for example,
$$T^{(abc)} S_{[abc]} = 0 . \tag{2.6.22}$$
(e)
$$T_{a_1 \cdots a_l} = T_{(a_1 \cdots a_l)} \Rightarrow T_{[a_1 \cdots a_l]} = 0 , \tag{2.6.23}$$
$$T_{a_1 \cdots a_l} = T_{[a_1 \cdots a_l]} \Rightarrow T_{(a_1 \cdots a_l)} = 0 . \tag{2.6.24}$$
Similar conclusions also hold for tensors of type $(k, 0)$ whose upper indices are totally symmetric or totally antisymmetric.
Proof The proof of (a), (b), (c) are left as exercises. (d) is a corollary of (a) and (c),
and (e) is a corollary of (c). 
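As an illustration of how (d) and (e) follow from the earlier items, here is a short sketch for the three-index case. It takes (a) and (c) as given (their proofs remain exercises) and uses the parenthesis version of (a), which the theorem asserts but does not write out.

```latex
% (d): move the parentheses onto S by (a), then apply (c).
T^{(abc)} S_{[abc]}
  \overset{(a)}{=} T^{(abc)} S_{([abc])}
  \overset{(c)}{=} 0 .

% (e): substitute the premise T_{abc} = T_{(abc)} inside the square brackets, then apply (c).
T_{abc} = T_{(abc)}
  \;\Rightarrow\;
  T_{[abc]} = T_{[(abc)]} \overset{(c)}{=} 0 .
```

The same two steps work for any number of indices, and for the roles of parentheses and square brackets interchanged.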

Exercises

˜2.1. Show that the homeomorphism ψi± defined in Example 2 of Sect. 2.1 sat-
isfies the compatibility condition on all the overlap regions of Oi± , which
verifies that S 1 is indeed a 1-dimensional manifold.
2.2. Deduce that an n-dimensional vector space can be regarded as an n-
dimensional trivial manifold.
2.3. Suppose X and Y are topological spaces, f : X → Y is a homeomorphism.
If X is also a manifold, define a differential structure for Y such that f :
X → Y is upgraded to a diffeomorphism.

˜2.4. Suppose x, y are the natural coordinates of R2 , C(t) is a curve whose para-
metric equations are x = cos t and y = sin t, t ∈ (0, π ). If p = C(π/3),
write down the components of the tangent vector of the curve at p in the
natural coordinate basis, and sketch this curve as well as this tangent vector.
2.5. Suppose the tangent vectors of two curves $C(t)$ and $C'(t) \equiv C(2t_0 - t)$ at $C(t_0) = C'(t_0)$ are $v$ and $v'$, respectively. Show that $v + v' = 0$.
˜2.6. Suppose O is the coordinate patch of the coordinate system {x μ }, p ∈ O,
v ∈ V p , v μ are the coordinate components of v. Regarding x μ as a C ∞
function on O, show that v μ = v(x μ ). Hint: act both sides of v = v ν X ν on
a function x μ .
2.7. Suppose M is a 2-dimensional manifold, (O, ψ) and (O , ψ ) are two coor-
dinate systems on M whose coordinates are x, y and x , y , respectively,
and the coordinate transformation on O ∩ O is x = x, y = y − x ( =
constant). Write down the expression for the expansion of ∂/∂ x and ∂/∂ y
in terms of ∂/∂ x and ∂/∂ y .
˜2.8. (a) Show that [u, v] in (2.2.9) pointwisely satisfies the two conditions in
the definition of a vector (Definition 2 in Sect. 2.2), and thus is a vector
field. (b) Suppose u, v, w are smooth vector fields on M. Show that

[[u, v], w] + [[w, u], v] + [[v, w], u] = 0 (this is called the Jacobi identity) .

˜2.9. Suppose r, ϕ are the polar coordinates on an open set (the coordinate patch)
in R2 , x and y are natural coordinates.
(a) Write down the expression for the expansion of the polar coordinate
basis ∂/∂r and ∂/∂ϕ (as vector fields on the coordinate patch) in terms of
∂/∂ x and ∂/∂ y.
(b) Derive the expression for the expansion of a vector [∂/∂r, ∂/∂ x] in
terms of ∂/∂ x and ∂/∂ y.
(c) Set êr ≡ ∂/∂r , êϕ ≡ r −1 ∂/∂ϕ. Derive the expression for the expansion
of [êr , êϕ ] in terms of ∂/∂ x and ∂/∂ y.
˜2.10. Suppose $u$, $v$ are vector fields on $M$. Show that the components of $[u, v]$ in any coordinate basis satisfy
$$[u, v]^\mu = u^\nu \frac{\partial v^\mu}{\partial x^\nu} - v^\nu \frac{\partial u^\mu}{\partial x^\nu} .$$
Hint: use (2.2.3′) and (2.2.3).

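˜2.11. Suppose $\{e_\mu\}$ is a basis of $V$, and $\{e^{\mu *}\}$ is the dual basis, $v \in V$, $\omega \in V^*$. Show that
$$\omega = \omega(e_\mu) e^{\mu *} , \quad v = e^{\mu *}(v) e_\mu .$$
˜2.12. Show that $\omega'_\nu = \dfrac{\partial x^\mu}{\partial x'^\nu} \omega_\mu$ (Theorem 2.3.4).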
˜2.13. Show that the map v → v ∗∗ defined by (2.3.5) is an isomorphism. Hint:
one may use a conclusion of linear algebras, i.e., a one-to-one linear map
between two vector spaces with the same dimension must be onto.

˜2.14. Suppose $C^1_1 T$ and $(C^1_1 T)'$ are contractions of a tensor $T$ of type $(2, 1)$ defined in two different bases $\{e_\mu\}$ and $\{e'_\mu\}$. Show that $(C^1_1 T)' = C^1_1 T$.
*˜2.15. Suppose g is a metric of V . Show that g : V → V ∗ is an isomorphism (see
the hint for Exercise 2.13).
˜2.16. Show that the arc length of a curve does not depend on the parametrization.
2.17. Suppose {x, y} is a Cartesian coordinate system of 2-dimensional Euclidean
space. Show that {x , y } defined by (2.5.14) is also a Cartesian system.
2.18. Suppose {t, x} is a Lorentzian coordinate system of 2-dimensional
Minkowski space. Show that {t , x } defined by (2.5.20) is also a Lorentzian
system.
˜2.19. (a) Using the tensor transformation law, derive all the components $g'_{\mu\nu}$ of the 3-dimensional Euclidean metric in a spherical coordinate system. (b) Given the expression for the line element of the 4-dimensional Minkowski metric in a Lorentzian system, $ds^2 = -dt^2 + dx^2 + dy^2 + dz^2$, derive all the components of $g$ and its inverse $g^{-1}$ in a new coordinate system $\{t', x', y', z'\}$, denoted by $g'_{\mu\nu}$ and $g'^{\mu\nu}$. This new coordinate system is defined as follows:
$$t' = t , \quad z' = z , \quad x' = (x^2 + y^2)^{1/2} \cos(\varphi - \omega t) ,$$
$$y' = (x^2 + y^2)^{1/2} \sin(\varphi - \omega t) , \quad \omega = \text{constant} ,$$
where $\varphi$ satisfies $\cos\varphi = y (x^2 + y^2)^{-1/2}$, $\sin\varphi = x (x^2 + y^2)^{-1/2}$. Hint: first derive $g'^{\mu\nu}$ and then derive $g'_{\mu\nu}$.
˜2.20. Show that the lengths of the spherical coordinate basis vectors ∂/∂r , ∂/∂θ ,
∂/∂ϕ in 3-dimensional Euclidean space are 1, r and r sin θ .
˜2.21. Using the abstract index notation, show that $T'^\mu{}_\nu = \dfrac{\partial x'^\mu}{\partial x^\rho} \dfrac{\partial x^\sigma}{\partial x'^\nu} T^\rho{}_\sigma$.
2.22. Using $g$ and $g'$ to represent the determinants of the two $n \times n$ matrices constituted by the components $g_{\mu\nu}$ and $g'_{\mu\nu}$ of $g_{ab}$ in coordinate systems $\{x^\mu\}$ and $\{x'^\mu\}$, respectively, show that $g' = |\partial x^\rho/\partial x'^\sigma|^2 g$, where $|\partial x^\rho/\partial x'^\sigma|$ is the Jacobian determinant of the coordinate transformation between $\{x^\mu\}$ and $\{x'^\mu\}$, i.e., the $n \times n$ determinant constituted by $\partial x^\rho/\partial x'^\sigma$. NB: This exercise indicates that the determinant of a metric is not an invariant under a coordinate transformation. Hint: take the determinant of the equality $g'_{\rho\sigma} = (\partial x^\mu/\partial x'^\rho)(\partial x^\nu/\partial x'^\sigma) g_{\mu\nu}$.
˜2.23. Suppose {x μ } is an arbitrary local coordinate system on a manifold. Deter-
mine whether each of the following equation is true or false:
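(1) $(\partial/\partial x^\mu)^a (\partial/\partial x^\nu)_a = g_{\mu\nu}$, where $(\partial/\partial x^\nu)_a \equiv g_{ab} (\partial/\partial x^\nu)^b$;
(2) $(dx^\mu)_a (dx^\nu)^a = g^{\mu\nu}$, where $(dx^\mu)^a \equiv g^{ab} (dx^\mu)_b$;
(3) $(\partial/\partial x^\mu)_a = (dx^\mu)_a$;
(4) $(dx^\mu)^a = (\partial/\partial x^\mu)^a$;
(5) $v^\mu \omega_\mu = v_\mu \omega^\mu$;
(6) $g_{\mu\nu} T^{\nu\rho} S_\rho{}^\sigma = T_{\mu\rho} S^{\rho\sigma}$;
(7) $v^a u^b = v^b u^a$;
(8) $v^a u^b = u^b v^a$.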

2.24. Suppose Tab is a tensor of type (0, 2) on a vector space V . Show that
Tab v a v b = 0, ∀v a ∈ V ⇒ Tab = T[ab] . Hint: express v a as the sum of two
arbitrary vectors u a and wa .
2.25. Show that Tabcd = Ta[bc]d = Tab[cd] ⇒ Tabcd = Ta[bcd] .
Remark (1) The above claim has the following generalization:

T···a···b···c··· = T···[a···b]···c··· = T···a···[b···c]··· ⇒ T···a···b···c··· = T···[a···b···c]··· .

The premise above only contains two equal signs, the key point is that
the index b from both T···[a···b]···c··· and T···a···[b···c]··· are inside the square
brackets.
(2) Both the original and generalized claims will still hold when changing
the square brackets in the premise and conclusion to parentheses.

References

Hawking, S. W. and Ellis, G. F. R. (1973), The Large Scale Structure of Space-Time, Cambridge
University Press, Cambridge.
Kline, M. (1980), Mathematics: The Loss of Certainty, Oxford University Press, New York.
Sachs, R. K. and Wu, H. (1977), General Relativity for Mathematicians, Springer-Verlag, New York.
Schutz, B. F. (1980), Geometrical Methods of Mathematical Physics, Cambridge University Press,
Cambridge.
Spivak, M. (1970), A Comprehensive Introduction to Differential Geometry, Vol. I, II, Publish or
Perish INC, Berkeley.
Straumann, N. (1984), General Relativity and Relativistic Astrophysics, Springer-Verlag, Berlin.
Wald, R. M. (1984), General Relativity, The University of Chicago Press, Chicago.
Chapter 3
The Riemann (Intrinsic) Curvature
Tensor

3.1 Derivative Operators

In Euclidean space there is a familiar derivative operator $\vec\nabla$, the action of which on, for example, a function (scalar field) $f$ yields a vector field $\vec\nabla f$ (gradient), and on a vector field $\vec v$ (with contraction) it yields a scalar field $\vec\nabla \cdot \vec v$ (divergence). Since there exists a Euclidean metric $\delta_{ab}$, a vector $v^a$ can be naturally identified with a dual vector $v_a = \delta_{ab} v^b$. Now we want to generalize $\vec\nabla$ to an arbitrary manifold that may not have a metric, so we need to distinguish vectors and dual vectors. It turns out that $\vec\nabla$ behaves more like a dual vector after being generalized, and hence should be denoted by $\nabla_a$. Actually, $\nabla$ itself is an operator, which is neither a vector nor a dual vector; by regarding $\nabla$ as a dual vector, we mean that the result of it acting on a function $f$ is a dual vector $\nabla_a f$. More generally, the result of $\nabla$ acting on a tensor field of type $(k, l)$ is a tensor field of type $(k, l + 1)$. Therefore, we have the following definition:
Definition 1 Use $\mathscr{F}_M(k, l)$ to represent the collection of all $C^\infty$ tensor fields of type $(k, l)$ on a manifold $M$. [A function $f$ can be viewed as a tensor field of type $(0, 0)$ (scalar field), and hence $\mathscr{F}_M(0, 0) \equiv \mathscr{F}_M$.] A map $\nabla : \mathscr{F}_M(k, l) \to \mathscr{F}_M(k, l + 1)$ is called a derivative operator¹ on $M$ if it satisfies the following conditions:
(a) Linearity:
$$\nabla_a (\alpha T^{b_1 \cdots b_k}{}_{c_1 \cdots c_l} + \beta S^{b_1 \cdots b_k}{}_{c_1 \cdots c_l}) = \alpha \nabla_a T^{b_1 \cdots b_k}{}_{c_1 \cdots c_l} + \beta \nabla_a S^{b_1 \cdots b_k}{}_{c_1 \cdots c_l}$$
$$\forall T^{b_1 \cdots b_k}{}_{c_1 \cdots c_l} , S^{b_1 \cdots b_k}{}_{c_1 \cdots c_l} \in \mathscr{F}_M(k, l) , \quad \alpha, \beta \in \mathbb{R} ;$$
(b) Leibniz rule:
$$\nabla_a (T^{b_1 \cdots b_k}{}_{c_1 \cdots c_l} S^{d_1 \cdots d_{k'}}{}_{e_1 \cdots e_{l'}}) = T^{b_1 \cdots b_k}{}_{c_1 \cdots c_l} \nabla_a S^{d_1 \cdots d_{k'}}{}_{e_1 \cdots e_{l'}} + S^{d_1 \cdots d_{k'}}{}_{e_1 \cdots e_{l'}} \nabla_a T^{b_1 \cdots b_k}{}_{c_1 \cdots c_l}$$
$$\forall T^{b_1 \cdots b_k}{}_{c_1 \cdots c_l} \in \mathscr{F}_M(k, l) , \quad S^{d_1 \cdots d_{k'}}{}_{e_1 \cdots e_{l'}} \in \mathscr{F}_M(k', l') ;$$
(c) Commutativity with contraction;
(d) $v(f) = v^a \nabla_a f$, $\forall f \in \mathscr{F}_M$, $v \in \mathscr{F}_M(1, 0)$.

¹ $\mathscr{F}_M(k, l)$ can be relaxed to the collection of all $C^1$ tensor fields of type $(k, l)$; that is, $\nabla_a$ can act on an arbitrary tensor field of class $C^1$.

© Science Press 2023
C. Liang and B. Zhou, Differential Geometry and General Relativity,
Graduate Texts in Physics, https://doi.org/10.1007/978-981-99-0022-0_3

Remark 1 (1) Condition (c) can also be expressed as $\nabla \circ C = C \circ \nabla$, where $C$ stands for contraction. In the future, we will often write equations like
$$\nabla_a (v^b \omega_b) = v^b \nabla_a \omega_b + \omega_b \nabla_a v^b ,$$
which requires condition (c) since the derivation of this equation reads
$$\nabla_a (v^b \omega_b) = \nabla_a [C(v^b \omega_c)] = C^1_2 [\nabla_a (v^b \omega_c)] = C^1_2 (v^b \nabla_a \omega_c) + C^1_2 [(\nabla_a v^b) \omega_c] = v^b \nabla_a \omega_b + \omega_b \nabla_a v^b ,$$
where we used condition (c) in the second step (see Sect. 2.4 for a refresher on the operation of $C$).
(2) The function $v(f)$ on the left-hand side of condition (d) should not be denoted by $v^a(f)$ since it may easily be mistaken for a vector field. This is one of the few cases where we should, but do not, put on an abstract index. To understand condition (d), one can use $\vec\nabla$ in Euclidean space as an example. Suppose $v^a$ is a vector field in Euclidean space whose expansion in the Cartesian coordinates is
$$v^a = v^1 (\partial/\partial x)^a + v^2 (\partial/\partial y)^a + v^3 (\partial/\partial z)^a ,$$
then the action of it on a function $f$ can be expressed as
$$v(f) = v^1 (\partial f/\partial x) + v^2 (\partial f/\partial y) + v^3 (\partial f/\partial z) = \vec v \cdot \vec\nabla f = v^a \nabla_a f .$$
Thus, condition (d) is a generalization of this property to a general manifold.


(3) Suppose ∇a is an arbitrary derivative operator, then from condition (d) it is
easy to show that (exercise)

∇a f = (d f )a , ∀ f ∈ FM , (3.1.1)

where (d f )a is the abstract index expression for a dual vector field d f generated by
a function f [see (2.3.7)].
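For readers who want to see the shape of the argument behind (3.1.1), here is a brief sketch. It assumes that (2.3.7) defines $df$ through $df(v) = v(f)$, which in abstract index notation reads $(df)_a v^a = v(f)$; that reading of (2.3.7) is an assumption here, since the equation itself is not reproduced in this section.

```latex
% Condition (d) of Definition 1 and the assumed definition of df give, for every vector field v^a,
v^a \nabla_a f = v(f) = v^a (df)_a
\quad\Longrightarrow\quad
v^a \left[ \nabla_a f - (df)_a \right] = 0 , \qquad \forall\, v^a \in \mathscr{F}_M(1, 0) .

% A dual vector field that annihilates every vector field vanishes, hence
\nabla_a f = (df)_a .
```

The last step is applied pointwise: at each $p \in M$, every vector of $V_p$ is the value of some smooth vector field.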
(4) In general relativity, a derivative operator also satisfies $\nabla_a \nabla_b f = \nabla_b \nabla_a f$, $\forall f \in \mathscr{F}_M$. From Definition 1 in Sect. 2.6 we can see that this is essentially the abstract index expression for
$$(\nabla\nabla f)(u, v) = (\nabla\nabla f)(v, u) , \quad \forall u, v \in \mathscr{F}_M(1, 0) ,$$
which means $\nabla\nabla f$ is a symmetric tensor of type $(0, 2)$. A derivative operator that satisfies this additional condition is called a torsion-free derivative operator. Unless stated otherwise, all $\nabla_a$ in this text will stand for torsion-free derivative operators.
[Optional Reading 3.1.1]
This optional reading has the same spirit as Optional Reading 2.2.1. For the sake of concise-
ness, here we abbreviate a tensor field T b1 ···bk c1 ···cl as T .

Theorem 3.1.1 Suppose T1 , T2 ∈ F M (k, l) are equal in a neighborhood N of p ∈ M, i.e.,


T1 | N = T2 | N , then ∇a T1 | p = ∇a T2 | p .

Proof The proof is similar to that of Theorem 2.2.1, and should be carried out by the
reader. 

Remark 2 Suppose a tensor field $T$ is only defined on a neighborhood $U$ of $p \in M$ ($U \neq M$), i.e., $T \in \mathscr{F}_U(k, l)$, $T \notin \mathscr{F}_M(k, l)$. According to Definition 1, $\nabla_a$ can only act on a tensor field on $M$, and so $\nabla_a T$ is meaningless. However, one can always find a $\bar T \in \mathscr{F}_M(k, l)$ and a neighborhood $N \subset U$ of $p$ such that $\bar T|_N = T|_N$, and thus one can define $\nabla_a T$ as $\nabla_a \bar T$. Although for the same $T$ there are infinitely many $\bar T$ that satisfy the requirement above, Theorem 3.1.1 guarantees that $\nabla_a \bar T|_p$ is the same for all $\bar T$. Thus, it is legitimate to define $\nabla_a T$ as $\nabla_a \bar T$. Therefore, we say that $\nabla_a$ is a local operator: the value at $p$ of its action on $T$ only depends on the behavior of $T$ on a neighborhood of $p$ (no matter how “small” it is). The reader may already be familiar with the similar property of the derivative of a function in calculus.

[The End of Optional Reading 3.1.1]

For any manifold, there always exists a derivative operator that satisfies Defini-
tion 1 [see Theorem 1.1 in Chap. 4 of Chern et al. (1999)]. In fact, derivative operators
on a manifold not only exist, but also they are numerous. Now we will discuss how
many there can be. From (3.1.1) we know that two different derivative operators $\nabla_a$ and $\tilde\nabla_a$ acting on the same function give the same result, i.e.,
$$\nabla_a f = \tilde\nabla_a f = (df)_a , \quad \forall f \in \mathscr{F}_M . \tag{3.1.2}$$
Thus, the difference between $\nabla_a$ and $\tilde\nabla_a$ can only be manifested by the action on a tensor field not of type $(0, 0)$. First we discuss the action on a tensor field of type $(0, 1)$ (a dual vector field). Suppose a dual vector $\mu_b \in V_p^*$ is given at a point $p \in M$, and consider two arbitrary dual vector fields $\omega'_b, \omega_b \in \mathscr{F}_M(0, 1)$ on $M$ that satisfy $\omega'_b|_p = \omega_b|_p = \mu_b$ ($\omega'_b$ and $\omega_b$ are called two extensions of $\mu_b$ on $M$). Suppose $\nabla_a$ is a derivative operator on $M$, then $\nabla_a \omega'_b|_p$ and $\nabla_a \omega_b|_p$ are not the same in general. This is similar to the fact that two functions $f'(x)$ and $f(x)$ (the prime labels a second function, not a derivative) that have the same value at $x_0$ [i.e., $f'(x_0) = f(x_0)$] are not assured to have $(df'/dx)|_{x_0} = (df/dx)|_{x_0}$. However, we are about to show that for any two derivative operators $\nabla_a$ and $\tilde\nabla_a$ on $M$, as long as $\omega'_b|_p = \omega_b|_p$, we have
$$[(\tilde\nabla_a - \nabla_a) \omega'_b]_p = [(\tilde\nabla_a - \nabla_a) \omega_b]_p ,$$
where $(\tilde\nabla_a - \nabla_a) \omega_b$ is short for $\tilde\nabla_a \omega_b - \nabla_a \omega_b$.

Theorem 3.1.2 Suppose $p \in M$ and $\omega'_b, \omega_b \in \mathscr{F}_M(0, 1)$ satisfy $\omega'_b|_p = \omega_b|_p$, then
$$[(\tilde\nabla_a - \nabla_a) \omega'_b]_p = [(\tilde\nabla_a - \nabla_a) \omega_b]_p . \tag{3.1.3}$$

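Proof Equivalently, this equation can be rearranged as
$$[\nabla_a (\omega'_b - \omega_b)]_p = [\tilde\nabla_a (\omega'_b - \omega_b)]_p . \tag{3.1.4}$$
Let $\Omega_b \equiv \omega'_b - \omega_b$. Choose a coordinate system $\{x^\mu\}$ such that its coordinate patch includes $p$, then $\omega'_b|_p = \omega_b|_p$ leads to $\Omega_\mu(p) = 0$, where $\Omega_\mu$ are the coordinate components of $\Omega_b$. Hence, at $p$ we have
$$[\nabla_a (\omega'_b - \omega_b)]_p = [\nabla_a \Omega_b]_p = \{\nabla_a [\Omega_\mu (dx^\mu)_b]\}|_p = \Omega_\mu(p) [\nabla_a (dx^\mu)_b]_p + [(dx^\mu)_b \nabla_a \Omega_\mu]_p = [(dx^\mu)_b \nabla_a \Omega_\mu]_p .$$
Similarly we have $[\tilde\nabla_a (\omega'_b - \omega_b)]_p = [(dx^\mu)_b \tilde\nabla_a \Omega_\mu]_p$. From (3.1.2) we know that $[\nabla_a \Omega_\mu]_p = [\tilde\nabla_a \Omega_\mu]_p$, which completes the proof. □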

Although $[\nabla_a \omega_b]_p$ and $[\tilde\nabla_a \omega_b]_p$ depend on the value of $\omega_b$ in a neighborhood of $p$, Theorem 3.1.2 indicates that $[(\tilde\nabla_a - \nabla_a) \omega_b]_p$ only depends on the value of $\omega_b$ at $p$. This means that $(\tilde\nabla_a - \nabla_a)$ is a linear map that turns $\omega_b|_p$, a dual vector at $p$, into $[(\tilde\nabla_a - \nabla_a) \omega_b]_p$, which is a tensor of type $(0, 2)$ at $p$. (For a given dual vector $\mu_b$ at $p$, we choose any dual vector field $\omega_b$ such that $\omega_b|_p = \mu_b$, then $[(\tilde\nabla_a - \nabla_a) \omega_b]_p$ is the image of $\mu_b$ under this linear map.) Therefore, $(\tilde\nabla_a - \nabla_a)$ at $p$ corresponds to a tensor $C^c{}_{ab}$ of type $(1, 2)$, which satisfies
$$[(\tilde\nabla_a - \nabla_a) \omega_b]_p = C^c{}_{ab} \omega_c |_p . \tag{3.1.5}$$
Since $p$ is chosen arbitrarily, the difference between two derivative operators $\nabla_a$ and $\tilde\nabla_a$ on $M$ is manifested by a tensor field $C^c{}_{ab}$ of type $(1, 2)$; that is:
Theorem 3.1.3
$$\nabla_a \omega_b = \tilde\nabla_a \omega_b - C^c{}_{ab} \omega_c , \quad \forall \omega_b \in \mathscr{F}_M(0, 1) . \tag{3.1.6}$$

The torsion-free condition on $\nabla_a$ and $\tilde\nabla_a$ gives rise to the following symmetry of the tensor field $C^c{}_{ab}$:
Theorem 3.1.4 $C^c{}_{ab} = C^c{}_{ba}$.

Proof Let $\omega_b = \nabla_b f = \tilde\nabla_b f$ [see (3.1.2)], where $f \in \mathscr{F}_M$, then (3.1.6) yields $\nabla_a \nabla_b f = \tilde\nabla_a \tilde\nabla_b f - C^c{}_{ab} \nabla_c f$. Switching the indices $a$ and $b$, we get $\nabla_b \nabla_a f = \tilde\nabla_b \tilde\nabla_a f - C^c{}_{ba} \nabla_c f$. Subtracting these two equations and noticing the torsion-free condition, we have $C^c{}_{ab} \nabla_c f = C^c{}_{ba} \nabla_c f$. Let $T^c{}_{ab} \equiv C^c{}_{ab} - C^c{}_{ba}$, then $\forall f \in \mathscr{F}_M$ we have $T^c{}_{ab} \nabla_c f = 0$, and hence the components of $T^c{}_{ab}$ in an arbitrary coordinate basis are $T^\sigma{}_{\mu\nu} = T^c{}_{ab} (dx^\sigma)_c (\partial/\partial x^\mu)^a (\partial/\partial x^\nu)^b = 0$ [where the second step holds because $T^c{}_{ab} (dx^\sigma)_c = T^c{}_{ab} \nabla_c x^\sigma = 0$ (regard $x^\sigma$ as $f$)]. Therefore, $T^c{}_{ab} = 0$. □

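Theorem 3.1.5
$$\nabla_a v^b = \tilde\nabla_a v^b + C^b{}_{ac} v^c , \quad \forall v^b \in \mathscr{F}_M(1, 0) . \tag{3.1.7}$$

Proof Suppose $\omega_b$ is an arbitrary dual vector field on $M$, then
$$\nabla_a (\omega_b v^b) = \omega_b \nabla_a v^b + v^b \nabla_a \omega_b = \omega_b \nabla_a v^b + v^b (\tilde\nabla_a \omega_b - C^c{}_{ab} \omega_c) ,$$
where we used (3.1.6) in the last step. On the other hand, $\tilde\nabla_a (\omega_b v^b) = \omega_b \tilde\nabla_a v^b + v^b \tilde\nabla_a \omega_b$. Since $\omega_b v^b$ is a scalar field, it follows from (3.1.2) that $\nabla_a (\omega_b v^b) = \tilde\nabla_a (\omega_b v^b)$, and hence the right-hand sides of the two equations above are equal. Therefore, we obtain
$$\omega_b \nabla_a v^b = \omega_b \tilde\nabla_a v^b + C^c{}_{ab} v^b \omega_c = \omega_b \tilde\nabla_a v^b + C^b{}_{ac} v^c \omega_b , \quad \forall \omega_b \in \mathscr{F}_M(0, 1) ,$$
and thus we have (3.1.7). □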

By a similar analysis, one can also show that the difference between the results of $\nabla_a$ and $\tilde\nabla_a$ acting on a tensor field $T^{b_1 \cdots b_k}{}_{c_1 \cdots c_l}$ of type $(k, l)$, i.e., $\nabla_a T^{b_1 \cdots b_k}{}_{c_1 \cdots c_l} - \tilde\nabla_a T^{b_1 \cdots b_k}{}_{c_1 \cdots c_l}$, can be expressed in $k + l$ terms, each of which contains a $C^c{}_{ab}$. In front of each term there is a $+$ sign if it contracts with an upper index of $T$, and a $-$ sign if it contracts with a lower index of $T$; for example,
$$\nabla_a T^b{}_c = \tilde\nabla_a T^b{}_c + C^b{}_{ad} T^d{}_c - C^d{}_{ac} T^b{}_d ,$$
and the general form is given in the following theorem:

Theorem 3.1.6
$$\nabla_a T^{b_1 \cdots b_k}{}_{c_1 \cdots c_l} = \tilde\nabla_a T^{b_1 \cdots b_k}{}_{c_1 \cdots c_l} + \sum_i C^{b_i}{}_{ad} T^{b_1 \cdots d \cdots b_k}{}_{c_1 \cdots c_l} - \sum_j C^d{}_{a c_j} T^{b_1 \cdots b_k}{}_{c_1 \cdots d \cdots c_l} , \quad \forall T \in \mathscr{F}_M(k, l) . \tag{3.1.8}$$

Proof Exercise. 
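As a consistency check on the sign pattern (not a full proof of the theorem), consider the special case in which a type $(1, 1)$ tensor field is a product $T^b{}_c = v^b \omega_c$. The sketch below uses only the Leibniz rule together with (3.1.6) and (3.1.7); the restriction to a product is an assumption made here for illustration.

```latex
\nabla_a (v^b \omega_c)
  = v^b \nabla_a \omega_c + \omega_c \nabla_a v^b
  = v^b \left( \tilde\nabla_a \omega_c - C^d{}_{ac}\, \omega_d \right)
    + \omega_c \left( \tilde\nabla_a v^b + C^b{}_{ad}\, v^d \right)
  = \tilde\nabla_a (v^b \omega_c)
    + C^b{}_{ad}\, (v^d \omega_c)
    - C^d{}_{ac}\, (v^b \omega_d) .
```

This reproduces $\nabla_a T^b{}_c = \tilde\nabla_a T^b{}_c + C^b{}_{ad} T^d{}_c - C^d{}_{ac} T^b{}_d$ for such products; since a general $T^b{}_c$ is locally a sum of products, linearity suggests how the general case would go.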

Theorem 3.1.6 indicates that the difference between two arbitrary derivative operators is only manifested by a tensor field $C^c{}_{ab}$. Conversely, it is not difficult to verify that given an arbitrary derivative operator $\tilde\nabla_a$ and a smooth tensor field $C^c{}_{ab}$ with symmetric lower indices, $\nabla_a$ defined by (3.1.8) satisfies all of the conditions in Definition 1, and thus this $\nabla_a$ is also a derivative operator. Therefore, there exist numerous derivative operators on a manifold as long as there is one. A manifold with a chosen derivative operator can be denoted by $(M, \nabla_a)$, and this combination has more structure than $M$ itself ($\nabla_a$ provides additional structure); for instance, we can now talk about the parallel transport of a vector along a curve (see Sect. 3.2) and the curvature of $(M, \nabla_a)$ (see Sect. 3.4).
Suppose $\{x^\mu\}$ is a coordinate system of $M$ with coordinate patch $O$, the coordinate basis and dual basis of which are $\{(\partial/\partial x^\mu)^a\}$ and $\{(dx^\mu)_a\}$. Define a map $\partial_a : \mathscr{F}_O(k, l) \to \mathscr{F}_O(k, l + 1)$ on the coordinate patch $O$ as follows [we only write down the case $T^b{}_c \in \mathscr{F}_O(1, 1)$ as an example]:
$$\partial_a T^b{}_c := (dx^\mu)_a (\partial/\partial x^\nu)^b (dx^\sigma)_c \, \partial_\mu T^\nu{}_\sigma , \tag{3.1.9}$$
where $T^\nu{}_\sigma$ are the components of $T^b{}_c$ in this coordinate system, and $\partial_\mu$ is short for $\partial/\partial x^\mu$, the partial derivative with respect to the coordinate $x^\mu$. It is not difficult to verify that $\partial_a$ satisfies all of the conditions in Definition 1 plus the torsion-free condition, and thus $\partial_a$ is a torsion-free derivative operator on $O$. This derivative operator by definition depends on the coordinate system and is only defined in the coordinate patch of this coordinate system; it is called the ordinary derivative operator of this coordinate system. Equation (3.1.9) indicates that $\partial_\mu T^\nu{}_\sigma$ are the components of $\partial_a T^b{}_c$ in this coordinate system, and therefore the definition of $\partial_a$ can also be formulated as: the coordinate components of the ordinary derivative $\partial_a T^{b_1 \cdots b_k}{}_{c_1 \cdots c_l}$ of a tensor field $T^{b_1 \cdots b_k}{}_{c_1 \cdots c_l}$ are equal to $\partial (T^{\nu_1 \cdots \nu_k}{}_{\sigma_1 \cdots \sigma_l})/\partial x^\mu$, the derivatives of the coordinate components of this tensor field with respect to the coordinates. Thus, we can easily see that:
(1) $\partial_a$ of any coordinate system acting on a coordinate basis vector or a dual coordinate basis vector of this system yields zero, i.e.,
$$\partial_a (\partial/\partial x^\nu)^b = 0 , \quad \partial_a (dx^\mu)_b = 0 . \tag{3.1.10}$$
(2) $\partial_a$ satisfies a much stronger condition than the torsion-free condition, i.e.,
$$\partial_a \partial_b T^{\cdots}{}_{\cdots} = \partial_b \partial_a T^{\cdots}{}_{\cdots} , \quad \text{or} \quad \partial_{[a} \partial_{b]} T^{\cdots}{}_{\cdots} = 0 ,$$
where $T^{\cdots}{}_{\cdots}$ is a tensor field of any type.

Although $\partial_a$ can be viewed as a special case of $\nabla_a$, its definition depends on the coordinate system. We call those $\nabla_a$ that are independent of a coordinate system (and of any other externally imposed factor) covariant derivative operators; $\partial_a$ is not among them.
Definition 2 Suppose $\partial_a$ is an ordinary derivative operator of a given coordinate
system on $(M, \nabla_a)$, then the tensor field $C^c{}_{ab}$ that manifests the difference between
$\nabla_a$ and $\partial_a$ [regard $\partial_a$ as $\tilde\nabla_a$ in (3.1.6)] is called the Christoffel symbol of $\nabla_a$ in this
coordinate system, denoted by $\Gamma^c{}_{ab}$.
Remark 3 Textbooks normally emphasize that Christoffel symbols are not tensors,
while this text and some other books [e.g., Wald (1984)] call a Christoffel symbol a tensor.
There is no substantial conflict between the two; the difference lies only in how a
Christoffel symbol is defined. In books that use the component index notation, a
Christoffel symbol is defined as an array of numbers which do not obey the tensor
transformation law under a coordinate transformation, and hence do not constitute
a tensor. From the very beginning, however, we define a Christoffel symbol as
a tensor, which is a multilinear map; but since it corresponds to $\partial_a$, which depends on
the coordinate system, a Christoffel symbol is a tensor associated with the coordinate
system (the tensor itself will change under a coordinate transformation). Suppose $\nabla_a$
is a derivative operator assigned on M, $\{x^\mu\}$ and $\{x'^\mu\}$ are two coordinate systems
on M, the intersection of their coordinate patches is U, and the Christoffel symbols
of $\nabla_a$ in these two systems are $\Gamma^c{}_{ab}$ and $\bar\Gamma^c{}_{ab}$, respectively. As tensors, they can be
expressed in components (in U) using the $\{x^\mu\}$ system or the $\{x'^\mu\}$ system. Suppose
the components of $\Gamma^c{}_{ab}$ in the $\{x^\mu\}$ and $\{x'^\mu\}$ systems are $\{\Gamma^\sigma{}_{\mu\nu}\}$ and $\{\Gamma'^\sigma{}_{\mu\nu}\}$ (these
two arrays of numbers certainly satisfy the tensor transformation law), and the
components of $\bar\Gamma^c{}_{ab}$ in the $\{x^\mu\}$ and $\{x'^\mu\}$ systems are $\{\bar\Gamma^\sigma{}_{\mu\nu}\}$ and $\{\bar\Gamma'^\sigma{}_{\mu\nu}\}$ (which also
satisfy the tensor transformation law); however, $\{\Gamma^\sigma{}_{\mu\nu}\}$ and $\{\bar\Gamma'^\sigma{}_{\mu\nu}\}$ do not satisfy
the tensor transformation law. Nevertheless, textbooks normally just define $\{\Gamma^\sigma{}_{\mu\nu}\}$
and $\{\bar\Gamma'^\sigma{}_{\mu\nu}\}$ to be the Christoffel symbols in the coordinate systems $\{x^\mu\}$ and $\{x'^\mu\}$,
respectively, and therefore there is no doubt that they do not constitute a tensor. It
is right for those books to emphasize “a Christoffel symbol is not a tensor”, but we
instead emphasize that “a Christoffel symbol is a tensor associated with a coordinate
system”. The reader may ask: why do you have to describe a Christoffel symbol as
a tensor? The answer is: as long as we use the abstract index notation and follow
the above reasoning (including the elegant argument from the “multifaceted view of
tensor”), we must admit that $C^c{}_{ab}$ is a tensor that reflects the difference
between $\nabla_a$ and $\tilde\nabla_a$. Under the premise that a derivative operator has been assigned
to M, for a given coordinate system there is a derivative operator $\partial_a$, and if we regard $\partial_a$ as $\tilde\nabla_a$,
then $C^c{}_{ab}$ (which is now denoted by $\Gamma^c{}_{ab}$) is, of course, a tensor. It would be
self-contradictory not to admit that $\Gamma^c{}_{ab}$ is a tensor. However, at the same time, we should
emphasize that $\Gamma^c{}_{ab}$ is a tensor associated with a coordinate system. (There are as
many $\partial_a$, and thus as many $\Gamma^c{}_{ab}$, as there are coordinate systems.) This emphasis is
essentially the same as the emphasis in many books that say “a Christoffel symbol is
not a tensor”. They are just two ways of wording the same issue. What is important
is not the wording but the substance, i.e., we should keep in mind that the tensor
transformation law is not satisfied between $\{\Gamma^\sigma{}_{\mu\nu}\}$ and $\{\bar\Gamma'^\sigma{}_{\mu\nu}\}$.

Similarly, suppose $v^b$ is a vector field, then $\partial_a v^b$ is also a tensor field associated
with the coordinate system. Expand $\partial_a v^b$ in the coordinate system associated with
$\partial_a$:
$\partial_a v^b = (dx^\mu)_a (\partial/\partial x^\nu)^b \, v^\nu{}_{,\mu}$ ,

where $v^\nu{}_{,\mu} \equiv \partial_\mu v^\nu \equiv \partial v^\nu/\partial x^\mu$ (the comma stands for the partial derivative). Again,
textbooks often emphasize that $v^\nu{}_{,\mu}$ does not constitute a tensor, while we say that
$\partial_a v^b$ is a tensor field associated with the coordinate system; they are also just two
ways of wording the same issue. More specifically, suppose $\partial_a$ and $\partial'_a$ are the ordinary
derivative operators of two coordinate systems $\{x^\mu\}$ and $\{x'^\mu\}$, respectively, then
usually $\partial_a v^b \neq \partial'_a v^b$ (that is why $\partial_a v^b$ is a tensor field associated with the coordinate
system). If we expand $\partial_a v^b$ and $\partial'_a v^b$ in terms of their own coordinate bases:

$\partial_a v^b = (dx^\mu)_a (\partial/\partial x^\nu)^b \, v^\nu{}_{,\mu}$ , $\partial'_a v^b = (dx'^\mu)_a (\partial/\partial x'^\nu)^b \, v'^\nu{}_{,\mu}$ ,

where $v'^\nu{}_{,\mu} \equiv \partial v'^\nu/\partial x'^\mu$, then $\partial_a v^b \neq \partial'_a v^b$ implies that the tensor transformation law
is not generally satisfied between $v^\nu{}_{,\mu}$ and $v'^\nu{}_{,\mu}$ (this can also be verified directly, see
Exercise 3.2). That is why textbooks usually say that $v^\nu{}_{,\mu}$ is not a tensor. As for $\nabla_a v^b$,
it is a tensor independent of any coordinate system, whose components in a coordinate
system are usually denoted by $v^\nu{}_{;\mu}$, i.e., $\nabla_a v^b = v^\nu{}_{;\mu} (dx^\mu)_a (\partial/\partial x^\nu)^b$. Since $\nabla_a v^b$ is
independent of the coordinate system, $v^\nu{}_{;\mu}$ satisfies the tensor transformation law;
thus, textbooks usually say that it is a tensor (actually, the components of a tensor),
and call it the covariant derivative of $v^\nu$ (actually, the coordinate components of
the covariant derivative $\nabla_a v^b$). Similarly, $\omega_{\nu;\mu}$, the coordinate components of $\nabla_a \omega_b$,
are also called the covariant derivative of $\omega_\nu$.
Theorem 3.1.7

$v^\nu{}_{;\mu} = v^\nu{}_{,\mu} + \Gamma^\nu{}_{\mu\sigma} v^\sigma$ , $\omega_{\nu;\mu} = \omega_{\nu,\mu} - \Gamma^\sigma{}_{\mu\nu}\,\omega_\sigma$ , (3.1.11)

where $v^\nu$ and $\omega_\nu$ are the components of any vector field and dual vector field in an
arbitrary coordinate basis, and $\Gamma^\nu{}_{\mu\sigma}$ are the components of the Christoffel symbol of this
system in this basis. (Many books may say “$\Gamma^\nu{}_{\mu\sigma}$ is the Christoffel symbol of this
system”; later on, we will also say this for simplicity.)

Proof Exercise 3.3. 
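The first formula in (3.1.11) can be illustrated numerically. The following is a minimal sketch, not a proof: it assumes plane polar coordinates (r, φ), takes ∇a to be the ordinary derivative operator of a Cartesian system (whose nonvanishing Christoffel components in polar coordinates are Γ^r_φφ = −r and Γ^φ_rφ = Γ^φ_φr = 1/r), and uses the Cartesian basis field ∂/∂x as the test vector field. The function names and the finite-difference step are hypothetical choices made for this illustration. Since ∂/∂x is constant in the Cartesian sense, all components v^ν_;μ should vanish even though the v^ν_,μ do not.

```python
import math

def gamma_polar(x):
    """G[nu][mu][sigma] = Gamma^nu_{mu sigma} in polar coordinates x = (r, phi).

    Assumption: the connection is the ordinary derivative operator of a
    Cartesian system, expressed in polar coordinates.
    """
    r, _ = x
    G = [[[0.0] * 2 for _ in range(2)] for _ in range(2)]
    G[0][1][1] = -r                       # Gamma^r_{phi phi}
    G[1][0][1] = G[1][1][0] = 1.0 / r     # Gamma^phi_{r phi} = Gamma^phi_{phi r}
    return G

def v_field(x):
    """Polar components (v^r, v^phi) of the Cartesian basis field d/dx."""
    r, phi = x
    return [math.cos(phi), -math.sin(phi) / r]

def covariant_derivative(v, gamma, x, h=1e-5):
    """Return D with D[nu][mu] = v^nu_{,mu} + Gamma^nu_{mu sigma} v^sigma."""
    n = len(x)
    G, vx = gamma(x), v(x)
    D = [[0.0] * n for _ in range(n)]
    for mu in range(n):
        xp, xm = list(x), list(x)
        xp[mu] += h
        xm[mu] -= h
        vp, vm = v(xp), v(xm)
        for nu in range(n):
            comma = (vp[nu] - vm[nu]) / (2 * h)   # central difference for v^nu_{,mu}
            D[nu][mu] = comma + sum(G[nu][mu][s] * vx[s] for s in range(n))
    return D

components = covariant_derivative(v_field, gamma_polar, [2.0, 0.7])
```

Here `components[nu][mu]` approximates v^ν_;μ at the point r = 2, φ = 0.7; every entry should be zero up to finite-difference error.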

Theorem 3.1.8 Condition (c) in Definition 1 is equivalent to

∇a δ b c = 0 , (3.1.12)

where δ b c is a tensor field of type (1, 1), whose definition at each point p ∈ M is
δ b c vc = vb , ∀vc ∈ V p .

Proof [Optional Reading]


(A) Suppose ∇a satisfies all of the conditions in Definition 1, we would like to show that it
satisfies (3.1.12). ∀v b ∈ F M (1, 0) we have

∇a v b = ∇a (δ b c v c ) = ∇a [C(δ b c v d )] = C[∇a (δ b c v d )]
= C(v d ∇a δ b c + δ b c ∇a v d ) = vc ∇a δ b c + δ b c ∇a v c = v c ∇a δ b c + ∇a v b ,

where C stands for the contraction of indices c and d; in the third equality we used condition
(c) and in the last step we used δ b c Ta c = Ta b ∀Ta c . The above equation indicates that
vc ∇a δ b c = 0 ∀v c ∈ F M (1, 0), and therefore ∇a δ b c = 0.
(B) Suppose ∇˜ a satisfies conditions (a), (b), (d) in Definition 1 and (3.1.12). We would like
to show that it also satisfies condition (c). For this purpose, suppose ∇a satisfies all of the
conditions in Definition 1. Since the proof of Theorem 3.1.2 does not need condition (c),
(3.1.6) is satisfied. We cannot use Theorem 3.1.4 directly since the proof of it needs condition
(c); however, one can still prove it from the properties of ∇a and ∇˜ a (motivated readers may
try it as a challenging exercise), and therefore we have (3.1.8). From this, using the fact that
∇a satisfies (c) we can show that ∇˜ a satisfies condition (c). 

The commutator $[u, v]^a$ of two vector fields on M does not require M to have any
additional structure [see (2.2.9)]; however, the inconvenience of that definition is that
the commutator cannot be separated from the object it acts on (a scalar field f). Now that we
have the concept of a derivative operator, we can write an explicit expression for the commutator
$[u, v]^a$ of vector fields by means of an arbitrary torsion-free derivative operator, as
shown in the following theorem:
Theorem 3.1.9
[u, v]a = u b ∇b va − vb ∇b u a , (3.1.13)

where ∇b is an arbitrary torsion-free derivative operator.


Proof $\forall f \in \mathcal{F}_M$ we have

$[u, v](f) = u(v(f)) - v(u(f)) = u^b\nabla_b(v^a\nabla_a f) - v^b\nabla_b(u^a\nabla_a f)$
$= u^b(\nabla_b v^a)\nabla_a f + v^a u^b\nabla_b\nabla_a f - v^b(\nabla_b u^a)\nabla_a f - u^a v^b\nabla_b\nabla_a f$
$= (u^b\nabla_b v^a - v^b\nabla_b u^a)\nabla_a f$ ,

where in the second step we used condition (d) of a derivative operator, in the third
step we used conditions (b) and (c), and in the fourth step we used the torsion-
free condition. Finally, using (d) again, namely [u, v]( f ) = [u, v]a ∇a f , we arrive at
(3.1.13). 
Remark 4 Choose the derivative operator ∂b of an arbitrary coordinate system {x μ }
as ∇b from (3.1.13), then we have

[u, v]μ = (dx μ )a [u, v]a = u ν ∂ν vμ − vν ∂ν u μ .

This is the claim in Exercise 2.10.
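The component formula in Remark 4 is easy to evaluate numerically. The sketch below is only an illustration under stated assumptions: the manifold is R² with coordinates (x, y), the partial derivatives are approximated by central differences, and the two fields u = ∂/∂x and v = x ∂/∂y are hypothetical examples chosen because their commutator, ∂/∂y, is known in closed form.

```python
def partial(f, x, nu, h=1e-5):
    """Central-difference estimate of the components of d f / d x^nu at x."""
    xp, xm = list(x), list(x)
    xp[nu] += h
    xm[nu] -= h
    return [(a - b) / (2 * h) for a, b in zip(f(xp), f(xm))]

def commutator(u, v, x):
    """[u, v]^mu = u^nu d_nu v^mu - v^nu d_nu u^mu, evaluated at the point x."""
    n = len(x)
    ux, vx = u(x), v(x)
    result = [0.0] * n
    for nu in range(n):
        dv = partial(v, x, nu)   # dv[mu] = d_nu v^mu
        du = partial(u, x, nu)   # du[mu] = d_nu u^mu
        for mu in range(n):
            result[mu] += ux[nu] * dv[mu] - vx[nu] * du[mu]
    return result

# Hypothetical example fields on R^2 with coordinates (x, y).
def u_field(x):
    return [1.0, 0.0]        # u = d/dx

def v_field(x):
    return [0.0, x[0]]       # v = x d/dy

bracket = commutator(u_field, v_field, [0.3, -1.2])
```

For these two fields the result should be close to the components (0, 1) of ∂/∂y at every point.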

3.2 Derivative and Parallel Transport of a Vector Field Along a Curve

3.2.1 Parallel Transport of a Vector Field Along a Curve

After a derivative operator is assigned to a manifold M, we can introduce the concept
of the parallel transport of a vector field along a curve.
Definition 1 Suppose $v^a$ is a vector field along a curve C(t). $v^a$ is said to be parallelly
transported along C(t) if $T^b\nabla_b v^a = 0$, where $T^a \equiv (\partial/\partial t)^a$ is the tangent
vector field of the curve.
Just like $T^a\nabla_a f = T(f)$ can be interpreted as the derivative of f along $T^a$ [i.e., along
C(t)], $T^b\nabla_b v^a$ can be interpreted as the derivative of the vector field $v^a$ along $T^a$ (see
Sect. 3.2.3 for details). Thus, Definition 1 can also be interpreted as: a necessary and
sufficient condition for $v^a$ to be parallelly transported along C(t) is that its derivative
along $T^b$ vanishes.
Theorem 3.2.1 Suppose a curve C(t) is in the coordinate patch of a coordinate system
$\{x^\mu\}$ and the parametric representation of the curve is $x^\mu(t)$. Let $T^a \equiv (\partial/\partial t)^a$,
then a vector field $v^a$ along C(t) satisfies

$T^b\nabla_b v^a = (\partial/\partial x^\mu)^a \,(dv^\mu/dt + \Gamma^\mu{}_{\nu\sigma} T^\nu v^\sigma)$ . (3.2.1)

Proof Let $\partial_a$ be the ordinary derivative operator of the coordinate system $\{x^\mu\}$, then
it follows from (3.1.7) that

$T^b\nabla_b v^a = T^b(\partial_b v^a + \Gamma^a{}_{bc} v^c) = T^b[(dx^\nu)_b (\partial/\partial x^\mu)^a \partial_\nu v^\mu + \Gamma^a{}_{bc} v^c]$
$= T^\nu (\partial/\partial x^\mu)^a (\partial v^\mu/\partial x^\nu) + \Gamma^a{}_{bc} T^b v^c = (\partial/\partial x^\mu)^a [T^\nu (\partial v^\mu/\partial x^\nu) + \Gamma^\mu{}_{\nu\sigma} T^\nu v^\sigma]$ , (3.2.2)

where $T^\nu$ are the coordinate components of the tangent vector $T^b$ of that curve. From
(2.2.7) we know that $T^\nu = dx^\nu(t)/dt$, and hence, by the chain rule,

$T^\nu (\partial v^\mu/\partial x^\nu) = [dx^\nu(t)/dt][\partial v^\mu(t(x))/\partial x^\nu] = dv^\mu(t)/dt$ .

Plugging it into (3.2.2) we obtain (3.2.1).


[Optional Reading 3.2.1]
There is one thing that we need to clarify. According to Optional Reading 3.1.1, ∀ p ∈ C(t),
to make ∇a va | p meaningful va needs to be defined at least in a neighborhood U of p.
Unfortunately, va is only defined on the curve C(t), while any neighborhood of p would
contain points that are not on C(t), and thus the ∇b va in (3.2.1) is actually meaningless!
Thankfully, ∇b va only shows up in the form of T b ∇b va in (3.2.1), and T b ∇b va does not
have this problem. The key point is that adding T b before ∇b va tells us to take the derivative
of va along the tangent direction of the curve, and therefore it only involves the value of va
on this curve. Now we will explain it precisely. Take the value of (3.2.2) at p ∈ C(t) we
have
$T^b\nabla_b v^a|_p = (\partial/\partial x^\mu)^a|_p \,[T^\nu (\partial v^\mu(x)/\partial x^\nu) + \Gamma^\mu{}_{\nu\sigma} T^\nu v^\sigma]|_p$ . (3.2.3)
$\partial v^\mu(x)/\partial x^\nu$ in the square bracket is the derivative of a function $v^\mu$ with respect to the
argument $x^\nu$. When taking the derivative, the tiny change $\Delta x^\nu$ of $x^\nu$ will take the point with
coordinates $x^\nu$ away from p. Since $\Delta x^\nu$ (ν = 1, . . . , n) is arbitrary, the point can move in a
neighborhood U of p, which contains points that are not on C(t), where $v^\mu$ is not defined.
Thus, $\partial v^\mu(x)/\partial x^\nu$ is essentially a meaningless quantity. To resolve this issue, one can define
a vector field $\bar v^a$ (called an extension of $v^a$) on the neighborhood U, which is only required to
agree with $v^a$ on $U \cap C(t)$. Now define $T^b\nabla_b v^a|_p$ as $T^b\nabla_b \bar v^a|_p$, i.e.,

$T^b\nabla_b v^a|_p \equiv T^b\nabla_b \bar v^a|_p = (\partial/\partial x^\mu)^a|_p \,[T^\nu (\partial \bar v^\mu(x)/\partial x^\nu) + \Gamma^\mu{}_{\nu\sigma} T^\nu \bar v^\sigma]|_p$ . (3.2.4)
Unlike $\partial v^\mu(x)/\partial x^\nu$, $\partial \bar v^\mu(x)/\partial x^\nu$ has a precise meaning. However, each $v^a$ has an infinite
number of extensions $\bar v^a$; if $T^b\nabla_b \bar v^a|_p$ were different for different extensions, then it would
be meaningless to define $T^b\nabla_b v^a|_p$ using (3.2.4). In fact, suppose $\bar v^a$ and $\bar v'^a$ are two different
extensions, then in general we do have $\partial \bar v^\mu(x)/\partial x^\nu \neq \partial \bar v'^\mu(x)/\partial x^\nu$. However, this is no longer
a problem once $T^\nu$ is contracted with $\partial \bar v^\mu(x)/\partial x^\nu$, since at p we have

$T^\nu\, \partial \bar v^\mu(x)/\partial x^\nu = [dx^\nu(t)/dt][\partial \bar v^\mu(x)/\partial x^\nu] = d\bar v^\mu(t)/dt$
$= d\bar v'^\mu(t)/dt = T^\nu\, \partial \bar v'^\mu(x)/\partial x^\nu$ ,

where the key step (the third equality) holds because $\bar v^\mu(t)$ [the function of one variable that
comes from the combination of a vector field $\bar v^a$ on U and C(t)] is equal to $v^\mu(t)$, and likewise
for $\bar v'^\mu(t)$. In conclusion, $\nabla_b v^a|_p$ is meaningless, but $T^b\nabla_b v^a|_p$ is meaningful.
[The End of Optional Reading 3.2.1]

Theorem 3.2.2 A point C(t0 ) on a curve and a vector at this point uniquely defines
a vector field that is parallelly transported along the curve.

Proof If there exists a coordinate system whose coordinate patch contains the whole
curve, then it follows from (3.2.1) that T b ∇b va = 0, the definition of parallel trans-
port, is equivalent to

$\dfrac{dv^\mu}{dt} + \Gamma^\mu{}_{\nu\sigma} T^\nu v^\sigma = 0$ , μ = 1, . . . , n . (3.2.5)

These are n first-order ordinary differential equations for the n unknown functions $v^\nu(t)$
(note that both $\Gamma^\mu{}_{\nu\sigma}$ and $T^\nu$ are given functions of t), and by giving a vector at a
point C(t0) we are actually giving the initial conditions $v^\nu(t_0)$ for the equations, and thus
there is a unique solution $v^\nu(t)$. The reader can use the “relay method”
to generalize the above proof to the case where the curve cannot be covered by one
coordinate patch.
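The initial-value problem (3.2.5) can be integrated numerically. The following is a minimal sketch under stated assumptions rather than a general-purpose solver: the manifold is the Euclidean plane in polar coordinates (r, φ), the connection is the one whose nonvanishing components are Γ^r_φφ = −r and Γ^φ_rφ = Γ^φ_φr = 1/r (the ordinary derivative operator of a Cartesian system written in polar coordinates), the curve is the unit circle r = 1, φ = t, and the integrator is the classical fourth-order Runge-Kutta method. All function names are hypothetical.

```python
import math

def christoffel_polar(x):
    """G[mu][nu][sigma] = Gamma^mu_{nu sigma} for polar coordinates x = (r, phi)."""
    r, _ = x
    G = [[[0.0] * 2 for _ in range(2)] for _ in range(2)]
    G[0][1][1] = -r                       # Gamma^r_{phi phi}
    G[1][0][1] = G[1][1][0] = 1.0 / r     # Gamma^phi_{r phi} = Gamma^phi_{phi r}
    return G

def transport_rhs(t, v, curve, tangent, gamma):
    """dv^mu/dt = -Gamma^mu_{nu sigma} T^nu v^sigma, i.e. (3.2.5) solved for dv/dt."""
    G, T = gamma(curve(t)), tangent(t)
    n = len(v)
    return [-sum(G[mu][nu][s] * T[nu] * v[s] for nu in range(n) for s in range(n))
            for mu in range(n)]

def parallel_transport(v0, t0, t1, curve, tangent, gamma, steps=2000):
    """Integrate (3.2.5) from t0 to t1 with classical RK4 and return v(t1)."""
    h = (t1 - t0) / steps
    t, v = t0, list(v0)
    for _ in range(steps):
        k1 = transport_rhs(t, v, curve, tangent, gamma)
        k2 = transport_rhs(t + h / 2, [a + h / 2 * b for a, b in zip(v, k1)], curve, tangent, gamma)
        k3 = transport_rhs(t + h / 2, [a + h / 2 * b for a, b in zip(v, k2)], curve, tangent, gamma)
        k4 = transport_rhs(t + h, [a + h * b for a, b in zip(v, k3)], curve, tangent, gamma)
        v = [a + h / 6 * (p + 2 * q + 2 * w + z) for a, p, q, w, z in zip(v, k1, k2, k3, k4)]
        t += h
    return v

def curve(t):
    return (1.0, t)          # unit circle: r = 1, phi = t

def tangent(t):
    return (0.0, 1.0)        # T^r = dr/dt = 0, T^phi = dphi/dt = 1

# Start with the radial unit vector at phi = 0 and transport it to phi = pi/2.
v_end = parallel_transport([1.0, 0.0], 0.0, math.pi / 2, curve, tangent, christoffel_polar)

# Convert the result to Cartesian components at (r, phi) = (1, pi/2).
r_end, phi_end = curve(math.pi / 2)
vx = v_end[0] * math.cos(phi_end) - r_end * v_end[1] * math.sin(phi_end)
vy = v_end[0] * math.sin(phi_end) + r_end * v_end[1] * math.cos(phi_end)
```

With this flat connection the polar components change along the circle (here from (1, 0) to approximately (0, −1)), while the Cartesian components (vx, vy) stay at (1, 0), which is what parallel transport in the Euclidean plane should give.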

Suppose p, q ∈ M, then $V_p$ and $V_q$ are two vector spaces, and their elements cannot
be compared. However, if there is a curve C(t) that connects p and q, we can define
a map from $V_p$ to $V_q$ in the following way: $\forall v^a \in V_p$, from Theorem 3.2.2 we know
that there is a unique parallelly transported vector field on C(t) (whose value at p
is $v^a$), and its value at q can be defined as the image of $v^a$. Note that this is a map
that depends on the curve, which means the image of $v^a$ could be different for another curve that
connects p and q. Nevertheless, the existence of $\nabla_a$ does (albeit in a curve-dependent
way) connect the two vector spaces $V_p$ and $V_q$, which were completely
unrelated before. Therefore, $\nabla_a$ is also called a connection.²
Beginners often raise questions like: why do we call $\nabla_a$ a derivative operator?
In other words, why is this $\nabla_a$ some kind of generalization of the familiar $\vec\nabla$ in
3-dimensional Euclidean space to a general manifold? Why do we interpret $T^b\nabla_b v^a$
as the derivative of $v^a$ along $T^b$? Why do we call a $v^a$ that satisfies $T^b\nabla_b v^a = 0$ a
vector field that is parallelly transported along the curve? In order to answer these
questions, we need Sect. 3.2.2 first.

² For the formal definition of a connection given in terms of the language of fiber bundles, see
Appendix I in Volume III.

3.2.2 The Derivative Operator Associated with a Metric

Until now, no metric has been involved in this chapter, rather, we only assumed a
connection (i.e., a derivative operator) ∇a is assigned to M. If a metric gab is also
assigned to M, then one can talk about the inner product between two vectors. To
make the concept of parallel transport agree with the familiar parallel transport in
Euclidean space, we should add the following requirement: suppose u a and va are
vector fields parallelly transported along C(t), then u a va (≡ gab u a vb ) is a constant on
C(t); that is, the “inner product” of two vectors is invariant under parallel transport.
Suppose T a is the tangent vector field of C(t), then this requirement is equivalent to

0 = T c ∇c (gab u a v b ) = gab u a T c ∇c v b + gab v b T c ∇c u a + u a v b T c ∇c gab = u a v b T c ∇c gab .

A necessary and sufficient condition for the above equation to hold for any curve and
any two vector fields that are parallelly transported along the curve is

∇c gab = 0 . (3.2.6)

When there is no metric, the choice of ∇c is very arbitrary. After a metric is assigned,
we may choose a ∇c that satisfies the additional requirement ∇c gab = 0. Now we
will prove that this requirement determines a unique ∇a .
Theorem 3.2.3 After assigning a metric gab to a manifold M, there exists a unique
∇a such that ∇a gbc = 0.
Proof Suppose ∇˜ a is an arbitrary derivative operator. We want an appropriate C c ab
such that the ∇a determined by it and ∇˜ a satisfies ∇a gbc = 0. From (3.1.8) we have

∇a gbc = ∇˜ a gbc − C d ab gdc − C d ac gbd = ∇˜ a gbc − Ccab − Cbac .

Hence, it follows from ∇a gbc = 0 that

Ccab + Cbac = ∇˜ a gbc , (3.2.7)

Similarly, we have

Ccba + Cabc = ∇˜ b gac , (3.2.8)


Cbca + Cacb = ∇˜ c gab . (3.2.9)

Adding (3.2.7) to (3.2.8) and subtracting (3.2.9), we obtain by using Ccab = Ccba
that

2Ccab = ∇˜ a gbc + ∇˜ b gac − ∇˜ c gab ,

or

$C^c{}_{ab} = \tfrac{1}{2}\, g^{cd} (\tilde\nabla_a g_{bd} + \tilde\nabla_b g_{ad} - \tilde\nabla_d g_{ab})$ . (3.2.10)

The combination of this $C^c{}_{ab}$ and $\tilde\nabla_a$, namely $\nabla_a$, is then the solution to the equation
$\nabla_a g_{bc} = 0$. This must be the unique solution, since if $\nabla'_a$ also satisfies $\nabla'_a g_{bc} = 0$,
then treating $\nabla'_a$ as the $\tilde\nabla_a$ in (3.2.10) we can see that $C^c{}_{ab}$ vanishes, which means there is no
difference between $\nabla'_a$ and $\nabla_a$.

The $\nabla_a$ that satisfies $\nabla_a g_{bc} = 0$ is called the derivative operator associated (or
compatible) with $g_{bc}$. From now on, unless stated otherwise, whenever a metric $g_{ab}$ is
present we will take $\nabla_a$ to be the derivative operator associated
with $g_{ab}$. It can be proved (exercise) that $\nabla_a g_{bc} = 0$ assures that $\nabla_a g^{bc} = 0$ (and
vice versa), which is exceptionally convenient for performing calculations.

Example 1 In Euclidean space, there exist infinitely many derivative operators that
satisfy Definition 1 in Sect. 3.1. However, there is only one derivative operator
that is associated with the Euclidean metric δab , namely the ordinary derivative
operator ∂a of a Cartesian coordinate system {x μ } (all Cartesian systems have
the same $\partial_a$), since it follows from the definition of $\delta_{ab}$ (2.5.11) that $\partial_c \delta_{ab} =
(dx^\sigma)_c (dx^\mu)_a (dx^\nu)_b \,\partial_\sigma \delta_{\mu\nu} = 0$. For the 3-dimensional Euclidean space, the $\partial_a$ of
a Cartesian coordinate system is the familiar $\vec\nabla$ in the standard vector field theory.

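Suppose $\nabla_a$ is associated with $g_{bc}$, and choose the $\partial_a$ of an arbitrary coordinate
system as $\tilde\nabla_a$, then the $C^c{}_{ab}$ in (3.2.10) is the Christoffel symbol $\Gamma^c{}_{ab}$ of $\nabla_a$ in
this coordinate system. From this equation it is not difficult to derive the following
expression for the components $\Gamma^\sigma{}_{\mu\nu}$ of $\Gamma^c{}_{ab}$ in this system:

$\Gamma^\sigma{}_{\mu\nu} = \tfrac{1}{2}\, g^{\sigma\rho} (g_{\rho\mu,\nu} + g_{\nu\rho,\mu} - g_{\mu\nu,\rho})$ . (3.2.10′)

The derivation is as follows: regard $\partial_a$ as the $\tilde\nabla_a$ in (3.2.10), and then the $C^c{}_{ab}$ in
the equation is $\Gamma^c{}_{ab}$; hence,

$\Gamma^c{}_{ab} = \tfrac{1}{2}\, g^{cd} (\partial_a g_{bd} + \partial_b g_{ad} - \partial_d g_{ab})$ ,
$\Gamma^\sigma{}_{\mu\nu} = \Gamma^c{}_{ab}\, (dx^\sigma)_c (\partial/\partial x^\mu)^a (\partial/\partial x^\nu)^b$
$= \tfrac{1}{2}\, (dx^\sigma)_c (\partial/\partial x^\mu)^a (\partial/\partial x^\nu)^b\, g^{cd} (\partial_a g_{bd} + \partial_b g_{ad} - \partial_d g_{ab})$
$= \tfrac{1}{2}\, g^{\sigma\rho} (\partial_\mu g_{\nu\rho} + \partial_\nu g_{\mu\rho} - \partial_\rho g_{\mu\nu}) = \tfrac{1}{2}\, g^{\sigma\rho} (g_{\rho\mu,\nu} + g_{\nu\rho,\mu} - g_{\mu\nu,\rho})$ .

[We used $\partial_a (\partial/\partial x^\nu)^b = 0$ in the second-to-last equality.] Using the symmetry
$\Gamma^\sigma{}_{\mu\nu} = \Gamma^\sigma{}_{\nu\mu}$ it is not difficult to see that this is exactly (3.2.10′). By combining
this equation and (3.1.11), it is easy to derive the coordinate components $v^\nu{}_{;\mu}$ and
$\omega_{\nu;\mu}$ of the covariant derivatives $\nabla_a v^b$ and $\nabla_a \omega_b$.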
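Formula (3.2.10′) translates directly into a short numerical routine. The sketch below is an illustration under assumptions, not part of the derivation: it is restricted to two dimensions (the helper `inverse_2x2` only inverts 2×2 matrices), approximates the derivatives g_{μν,ρ} by central differences, and uses the Euclidean metric of the plane in polar coordinates (r, φ), with components g_rr = 1 and g_φφ = r², as the test case. All names are hypothetical.

```python
def metric_polar(x):
    """Components g_{mu nu} of the flat plane metric in polar coordinates (r, phi)."""
    r, _ = x
    return [[1.0, 0.0], [0.0, r * r]]

def inverse_2x2(g):
    """Inverse of a 2x2 matrix (sufficient for this two-dimensional sketch)."""
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    return [[g[1][1] / det, -g[0][1] / det],
            [-g[1][0] / det, g[0][0] / det]]

def metric_derivative(metric, x, rho, h=1e-5):
    """Return d with d[mu][nu] = g_{mu nu, rho}, estimated by central differences."""
    xp, xm = list(x), list(x)
    xp[rho] += h
    xm[rho] -= h
    gp, gm = metric(xp), metric(xm)
    n = len(x)
    return [[(gp[m][k] - gm[m][k]) / (2 * h) for k in range(n)] for m in range(n)]

def christoffel(metric, x):
    """gamma[s][mu][nu] = (1/2) g^{s rho} (g_{rho mu,nu} + g_{nu rho,mu} - g_{mu nu,rho})."""
    n = len(x)
    g_inv = inverse_2x2(metric(x))
    dg = [metric_derivative(metric, x, rho) for rho in range(n)]   # dg[rho][mu][nu]
    gamma = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for s in range(n):
        for mu in range(n):
            for nu in range(n):
                gamma[s][mu][nu] = 0.5 * sum(
                    g_inv[s][rho] * (dg[nu][rho][mu] + dg[mu][nu][rho] - dg[rho][mu][nu])
                    for rho in range(n))
    return gamma

gamma = christoffel(metric_polar, [2.0, 0.7])
```

At r = 2 the routine should return Γ^r_φφ ≈ −2 and Γ^φ_rφ = Γ^φ_φr ≈ 0.5, with all other components close to zero, in agreement with the closed forms −r and 1/r.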

Remark 1 $\Gamma^\sigma{}_{\mu\nu}$ depends on both the $\nabla_a$ assigned on M and the coordinate system.
If M has a metric $g_{ab}$, then $\nabla_a$ will be referred to as the derivative operator associated
with $g_{ab}$ except when otherwise stated, and “the Christoffel symbol of the coordinate
system” will refer to the Christoffel symbol of this $\nabla_a$ in this coordinate system. For
instance, when talking about the Christoffel symbol of a coordinate system in the
3-dimensional Euclidean space, we mean the Christoffel symbol of the $\nabla_a$ associated
with the Euclidean metric (i.e., the $\partial_a$ of a Cartesian system) in this coordinate
system. The Christoffel symbol of the $\partial_a$ of a Cartesian system is obviously zero in
any Cartesian system. As an exercise (Exercise 3.7), the reader may derive all the
nonvanishing $\Gamma^\sigma{}_{\mu\nu}$ of $\partial_a$ in the spherical coordinate system using (3.2.10′).

Suppose $\vec T$ is a vector field in the 3-dimensional Euclidean space, then $\vec T \cdot \vec\nabla f$
is the component of $\vec\nabla f$ in the direction of $\vec T$, i.e., the derivative of f along $\vec T$. On
the other hand, it follows from condition (d) of a derivative operator that $T^a \partial_a f =
T(f)$, and the right-hand side of it is exactly the derivative of f along $T^a$. Thus,
$T^a \partial_a f = \vec T \cdot \vec\nabla f$. A further question is: what does $T^b \partial_b v^a$ stand for? The answer is
that it stands for the derivative of $v^a$ along $T^a$. For the details, see Sect. 3.2.3.

3.2.3 Relationship Between the Derivative and Parallel Transport of a Vector Field Along a Curve

First we talk about the simplest case, i.e., Euclidean space. There is one type of
special coordinate system (Cartesian system) in Euclidean space, using which we
can define the absolute (curve-independent) parallel transport of a vector.
Definition 2 A vector $\tilde{\vec v}$ at p in Euclidean space is referred to as the result of a vector
$\vec v$ at q parallelly transported to p if their components in the same Cartesian system
are the same. (NB: The parallel transport for one Cartesian system is the parallel
transport for all Cartesian systems.)
Definition 3 In Euclidean space, the derivative of a vector field $\vec v$ on a curve C(t)
along the curve, denoted by $d\vec v/dt$, is defined as

$\left.\dfrac{d\vec v}{dt}\right|_p := \lim_{\Delta t \to 0} \dfrac{1}{\Delta t}\, (\tilde{\vec v}|_p - \vec v|_p)$ ∀ p ∈ C(t) , (3.2.11)

where $\tilde{\vec v}|_p$ is the result of $\vec v|_q$ parallelly transported to p (q is a neighboring point of
p on the curve), and $\Delta t \equiv t(q) - t(p)$. Now we will show that $d\vec v/dt$ is $T^b \partial_b v^a$ in
the abstract index notation [where $T^b$ is the tangent vector field of C(t), and $\partial_b$ is
the ordinary derivative operator of a Cartesian system]. To do so, we only have to
show that their components are the same in a Cartesian system $\{x^i\}$:

the ith component of $T^b \partial_b v^a = (dx^i)_a T^b \partial_b v^a = T^b \partial_b [(dx^i)_a v^a] = T^b \partial_b v^i = T(v^i) = \dfrac{dv^i}{dt}$

[where we used condition (d) of a derivative operator in the fourth equality, and the
definition of a tangent vector, (2.2.6′), in the fifth equality]. On the other hand, from
(3.2.11) we can see that

the ith component of $\left.\dfrac{d\vec v}{dt}\right|_p = \lim_{\Delta t \to 0} \dfrac{1}{\Delta t}\, (\tilde v^i|_p - v^i|_p) = \lim_{\Delta t \to 0} \dfrac{1}{\Delta t}\, (v^i|_q - v^i|_p) = \left.\dfrac{dv^i}{dt}\right|_p$

[where we used Definition 2 in the second equality, and the third equality is nothing
but the definition of the derivative of a function $v^i(t)$]. Comparing the two equations
we can see that

$\dfrac{d\vec v}{dt} = T^b \partial_b v^a$ . (3.2.12)

Generalizing to any curve C(t) on any manifold M with any $\nabla_a$, we can naturally
call $T^b\nabla_b v^a$ the derivative of $v^a$ along $T^b$ [or along C(t)]. Sometimes this derivative
is also denoted by $\mathrm{D}v^a/\mathrm{d}t$, i.e.,

$\dfrac{\mathrm{D}v^a}{\mathrm{d}t} \equiv T^b\nabla_b v^a$ . (3.2.13)

However, Definition 2 cannot be generalized to an arbitrary manifold $(M, \nabla_a)$ with
an arbitrary connection. In Sect. 3.4 we will introduce the concept of the intrinsic
curvature of a manifold, and it will be pointed out in Sect. 3.5 that only a space
whose intrinsic curvature is zero has the concept of absolute (i.e., curve-independent)
parallel transport. However, seeing that $d\vec v/dt = 0$ is equivalent to $\vec v$ being parallelly
transported along C(t) in Euclidean space, we can naturally regard $T^b\nabla_b v^a = 0$ as the definition
of the parallel transport of a vector field $v^a$ on a curve C(t) in $(M, \nabla_a)$. This explains
the motivation of Definition 1 in Sect. 3.2.1 (although one should keep in mind
that the parallel transport defined in this way usually depends on the curve). In this
manner, we first defined $T^b\nabla_b v^a$, the derivative of $v^a$ along the curve, and then used
it to define the parallel transport of $v^a$ along the curve (the opposite of the
order used for Euclidean space). Since the derivative $T^b\nabla_b v^a$ of $v^a$ along the
curve is rather abstract, once the curve-dependent notion of parallel transport is available
it is helpful to interpret $T^b\nabla_b v^a$ with the aid of parallel transport. The essence
of this interpretation is the meaning of (3.2.11); see the following theorem.

Theorem 3.2.4 Suppose $v^a$ is a vector field on a curve C(t) of $(M, \nabla_a)$, $T^b$ is the
tangent vector field of C(t), p and q are neighboring points on C(t) (see Fig. 3.1),
then

$T^b\nabla_b v^a|_p = \lim_{\Delta t \to 0} \dfrac{1}{\Delta t}\, (\tilde v^a|_p - v^a|_p)$ , (3.2.14)

where $\Delta t \equiv t(q) - t(p)$, and $\tilde v^a|_p$ is the result of $v^a|_q$ parallelly transported to p.



Fig. 3.1 Parallelly transporting $v^a|_q$ along the curve to p yields $\tilde v^a|_p$; we can subtract $v^a|_p$ from it and define the derivative along the curve

Fig. 3.2 $\psi_{s,t}$ maps $v^a(s)$ to $\tilde v^a(t)$

Proof [Optional Reading]


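All we have to prove is the following equivalent statement:

$T^b\nabla_b v^a|_t = \left.\dfrac{d}{ds} [\psi_{s,t} v(s)]^a \right|_{s=t}$ , (3.2.15)

where $T^b\nabla_b v^a|_t$ and $v^a(s)$ are short for $T^b\nabla_b v^a|_{C(t)}$ and $v^a(C(s))$, respectively, and $\psi_{s,t}$ is
the parallel transport map from the vector space $V_{C(s)}$ to $V_{C(t)}$ (see Fig. 3.2). It is not difficult to show
(Exercise 3.8) that $\psi_{s,t} : V_{C(s)} \to V_{C(t)}$ is an isomorphism.
Suppose $\tilde v^a$ is the vector field parallelly transported along C(t) that is determined by $v^a(s)$,
then
$\tilde v^a(t) = [\psi_{s,t} v(s)]^a$ , (3.2.16)
$T^b\nabla_b \tilde v^a = 0$ . (3.2.17)
The coordinate component expression for (3.2.16) is

$\tilde v^\mu(t) = (\psi_{s,t})^\mu{}_\nu\, v^\nu(s)$ , (3.2.16′)

where $(\psi_{s,t})^\mu{}_\nu$ are the elements of the matrix $(\psi_{s,t})$. The coordinate component expression
for (3.2.17) is
$\dfrac{d\tilde v^\mu(t)}{dt} + \Gamma^\mu{}_{\nu\sigma} T^\sigma \tilde v^\nu = 0$ .
Using (3.2.16′) this equation can also be written as
$\dfrac{d}{dt} [(\psi_{s,t})^\mu{}_\nu\, v^\nu(s)] + \Gamma^\mu{}_{\nu\sigma} T^\sigma (\psi_{s,t})^\nu{}_\rho\, v^\rho(s) = 0$ .
Applying the above equation to t = s, and noticing that $\psi_{s,s}$ is the identity map and $v^\nu(s)$ is arbitrary, we obtain

$\left.\dfrac{d}{dt} [(\psi_{s,t})^\mu{}_\nu] \right|_{t=s} = -(\Gamma^\mu{}_{\nu\sigma} T^\sigma)|_s$ . (3.2.18)

On the other hand, by definition, $\psi_{t,s}$ is the inverse map of $\psi_{s,t}$, i.e., $(\psi_{s,t})^\mu{}_\rho (\psi_{t,s})^\rho{}_\nu = \delta^\mu{}_\nu$,
and hence

$0 = \left.\left[\dfrac{d(\psi_{s,t})^\mu{}_\rho}{ds} (\psi_{t,s})^\rho{}_\nu\right]\right|_{s=t} + \left.\left[(\psi_{s,t})^\mu{}_\rho \dfrac{d(\psi_{t,s})^\rho{}_\nu}{ds}\right]\right|_{s=t}$
$= \left.\dfrac{d(\psi_{s,t})^\mu{}_\nu}{ds}\right|_{s=t} + \left.\dfrac{d(\psi_{t,s})^\mu{}_\nu}{ds}\right|_{s=t}$ . (3.2.19)

Now let us prove (3.2.15). The μth component of the right-hand side of this equation is

$\left.\dfrac{d}{ds} [\psi_{s,t} v(s)]^\mu \right|_{s=t} = \left.\dfrac{d}{ds} [(\psi_{s,t})^\mu{}_\nu\, v^\nu(s)] \right|_{s=t}$
$= \left.\dfrac{d}{ds} (\psi_{s,t})^\mu{}_\nu \right|_{s=t} v^\nu(t) + (\psi_{s,t})^\mu{}_\nu \big|_{s=t} \left.\dfrac{dv^\nu(s)}{ds}\right|_{s=t}$
$= -\left.\dfrac{d}{ds} (\psi_{t,s})^\mu{}_\nu \right|_{s=t} v^\nu(t) + \delta^\mu{}_\nu \left.\dfrac{dv^\nu(s)}{ds}\right|_{s=t}$
$= (\Gamma^\mu{}_{\nu\sigma} T^\sigma)|_t\, v^\nu(t) + \left.\dfrac{dv^\mu(s)}{ds}\right|_{s=t}$
$= (\Gamma^\mu{}_{\nu\sigma} T^\sigma v^\nu)|_t + \left.\dfrac{dv^\mu(s)}{ds}\right|_{s=t} = (T^b\nabla_b v^a)^\mu|_t$ ,

where we used (3.2.19) in the third step and (3.2.18) in the fourth step. The right-hand side
of the above equation is the μth component of the left-hand side of (3.2.15), and (3.2.14)
is therefore proved.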

3.3 Geodesics

Definition 1 A curve γ (t) on (M, ∇a ) is called a geodesic if its tangent vector field
T a satisfies T b ∇b T a = 0.

Remark 1 ① We can see that a necessary and sufficient condition for a curve to be
a geodesic is that its tangent vector field is parallelly transported along the curve. ②
T b ∇b T a = 0 is called a geodesic equation. ③ Suppose there is a metric field gab
on a manifold M, then the geodesics of (M, gab ) refer to the geodesics of (M, ∇a ),
where ∇a is associated with gab .
Suppose a geodesic γ(t) is located in the coordinate patch of a coordinate system,
then substituting $T^a$ for the $v^a$ in (3.2.5) yields

$\dfrac{dT^\mu}{dt} + \Gamma^\mu{}_{\nu\sigma} T^\nu T^\sigma = 0$ , μ = 1, . . . , n .

Suppose $x^\nu = x^\nu(t)$ are the parametric equations of γ(t), then $T^\mu = dx^\mu/dt$. Hence,
the equation above can be rewritten as

$\dfrac{d^2 x^\mu}{dt^2} + \Gamma^\mu{}_{\nu\sigma} \dfrac{dx^\nu}{dt} \dfrac{dx^\sigma}{dt} = 0$ , μ = 1, . . . , n . (3.3.1)
This is the coordinate component expression for the geodesic equation.

Example 1 The Christoffel symbol of the Euclidean (Minkowski) metric in a Cartesian
(Lorentzian) system vanishes, and the general solution to the geodesic equation
(3.3.1) is $x^\mu(t) = a^\mu t + b^\mu$ (where $a^\mu$, $b^\mu$ are constants). If we call the curve in
Euclidean (Minkowski) space with parametric equations $x^\mu(t) = a^\mu t + b^\mu$ a straight
line (segment), then a geodesic in Euclidean (Minkowski) space is synonymous with
a straight line (segment). Thus, a geodesic can be viewed as the generalization of the
concept of a straight line in Euclidean space to a generalized Riemannian space.

Example 2 Suppose $S^2$ is a 2-dimensional sphere in a 3-dimensional Euclidean
space. Set up a spherical coordinate system whose origin is at the center of the sphere, then the
3-dimensional Euclidean line element is $ds^2 = dr^2 + r^2 (d\theta^2 + \sin^2\theta\, d\phi^2)$. If the
line element lies on $S^2$ then r = R (the radius of the sphere) leads to dr = 0,
and hence the “induced line element” (called the standard spherical line element) is
$d\hat s^2 = R^2 (d\theta^2 + \sin^2\theta\, d\phi^2)$. That is, the 3-dimensional Euclidean metric $\delta_{ab}$ induces
a 2-dimensional metric $g_{ab}$ on the sphere $S^2$, whose components in the coordinate basis
$\{(\partial/\partial\theta)^a, (\partial/\partial\phi)^a\}$ are $g_{\theta\theta} = R^2$, $g_{\phi\phi} = R^2 \sin^2\theta$, $g_{\theta\phi} = g_{\phi\theta} = 0$. It can be proved
from (3.3.1) that, when measured by this metric, a curve on the sphere is a geodesic
if and only if it is a great circle (with an appropriate parametrization).
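The great-circle statement of Example 2 can be checked numerically by integrating the geodesic equation (3.3.1) on the sphere. The following is a rough sketch under stated assumptions, not a proof. The nonvanishing Christoffel components of the spherical metric, Γ^θ_φφ = −sin θ cos θ and Γ^φ_θφ = Γ^φ_φθ = cot θ, are taken as input (they follow from (3.2.10′) and do not depend on R). The integrator is the classical fourth-order Runge-Kutta method, R is set to 1 for the embedding, and the initial data are a hypothetical choice that keeps the curve away from the coordinate singularities at the poles.

```python
import math

def geodesic_rhs(y):
    """Right-hand side of (3.3.1) on the sphere for y = (theta, phi, dtheta/dt, dphi/dt)."""
    theta, phi, dtheta, dphi = y
    return [dtheta,
            dphi,
            math.sin(theta) * math.cos(theta) * dphi ** 2,               # -Gamma^theta_{phi phi} (dphi)^2
            -2.0 * math.cos(theta) / math.sin(theta) * dtheta * dphi]    # -2 Gamma^phi_{theta phi} dtheta dphi

def rk4_step(f, y, h):
    """One classical Runge-Kutta step for the autonomous system y' = f(y)."""
    k1 = f(y)
    k2 = f([a + 0.5 * h * b for a, b in zip(y, k1)])
    k3 = f([a + 0.5 * h * b for a, b in zip(y, k2)])
    k4 = f([a + h * b for a, b in zip(y, k3)])
    return [a + h / 6.0 * (p + 2 * q + 2 * w + z) for a, p, q, w, z in zip(y, k1, k2, k3, k4)]

def integrate_geodesic(y0, t_end, steps):
    """Return the list of states along the geodesic from t = 0 to t = t_end."""
    h = t_end / steps
    y, path = list(y0), [list(y0)]
    for _ in range(steps):
        y = rk4_step(geodesic_rhs, y, h)
        path.append(list(y))
    return path

def embed(theta, phi):
    """Cartesian position on the unit sphere (R = 1 assumed)."""
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))

# Hypothetical initial data: start on the equator at phi = 0.
y0 = [math.pi / 2, 0.0, 0.5, 1.0]
path = integrate_geodesic(y0, 3.0, 3000)

# Initial position is (1, 0, 0) and initial velocity is (0, 1, -0.5) in the embedding,
# so a great circle through them lies in the plane with normal (1,0,0) x (0,1,-0.5).
normal = (0.0, 0.5, 1.0)
max_offset = max(abs(sum(n * c for n, c in zip(normal, embed(y[0], y[1])))) for y in path)

# g_ab T^a T^b with R = 1; it should stay at its initial value 0.5**2 + 1.0**2 = 1.25.
speed_sq = [y[2] ** 2 + math.sin(y[0]) ** 2 * y[3] ** 2 for y in path]
```

If the integrated curve is a great circle, `max_offset` stays near zero, i.e., every point of the path lies in a plane through the center of the sphere. The constancy of `speed_sq` reflects the fact that the tangent vector of an affinely parametrized geodesic keeps a constant magnitude.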

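Theorem 3.3.1 Suppose γ(t) is a geodesic, then the tangent vector field $T'^a$ of its
reparametrization $\gamma'(t')$ [= γ(t)] satisfies

$T'^b\nabla_b T'^a = \alpha T'^a$ [α is a function defined on γ(t)] . (3.3.2)

Proof

$T^a = \left(\dfrac{\partial}{\partial t}\right)^a = \dfrac{dt'}{dt} \left(\dfrac{\partial}{\partial t'}\right)^a = \dfrac{dt'}{dt}\, T'^a$ ,

$0 = T^b\nabla_b T^a = \dfrac{dt'}{dt}\, T'^b\nabla_b \left(\dfrac{dt'}{dt}\, T'^a\right) = \left(\dfrac{dt'}{dt}\right)^2 T'^b\nabla_b T'^a + T'^a\, \dfrac{dt'}{dt}\, T'^b\nabla_b \left(\dfrac{dt'}{dt}\right)$
$= \left(\dfrac{dt'}{dt}\right)^2 T'^b\nabla_b T'^a + T'^a\, \dfrac{dt'}{dt}\, \dfrac{d}{dt'}\left(\dfrac{dt'}{dt}\right) = \left(\dfrac{dt'}{dt}\right)^2 T'^b\nabla_b T'^a + T'^a\, \dfrac{d^2 t'}{dt^2}$ ,

and hence $T'^b\nabla_b T'^a = -\left(\dfrac{dt}{dt'}\right)^2 \dfrac{d^2 t'}{dt^2}\, T'^a$. Set $\alpha \equiv -\left(\dfrac{dt}{dt'}\right)^2 \dfrac{d^2 t'}{dt^2}$, then (3.3.2) is satisfied.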
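As a quick check of Theorem 3.3.1, here is a small worked example. The choices are illustrative assumptions only: Euclidean space in a Cartesian system (so that all Christoffel symbols vanish and ∇a reduces to ∂a), the straight line x^μ(t) = a^μ t, and the new parameter t′ = e^t. The value of α obtained from the formula α = −(dt/dt′)² d²t′/dt² agrees with a direct computation of T′^b ∂_b T′^a.

```latex
% Assumptions (not from the text): flat space, Cartesian coordinates,
% the straight line x^\mu(t) = a^\mu t, and the reparametrization t' = e^{t}.
\begin{align*}
\frac{dt'}{dt} &= e^{t} = t', \qquad \frac{d^2 t'}{dt^2} = e^{t} = t', \\
\alpha &= -\left(\frac{dt}{dt'}\right)^{2} \frac{d^2 t'}{dt^2}
        = -\frac{1}{t'^{2}}\, t' = -\frac{1}{t'}, \\
x^\mu &= a^\mu \ln t' \;\Longrightarrow\;
T'^\mu = \frac{dx^\mu}{dt'} = \frac{a^\mu}{t'}, \\
T'^b \partial_b T'^\mu &= \frac{dT'^\mu}{dt'} = -\frac{a^\mu}{t'^{2}}
        = -\frac{1}{t'}\, T'^\mu = \alpha\, T'^\mu .
\end{align*}
```

Since α ≠ 0 here, t′ = e^t is not an affine parameter of the straight line, in line with the fact that it is not a linear function of t.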

Theorem 3.3.2 Suppose the tangent vector field $T^a$ of a curve γ(t) satisfies
$T^b\nabla_b T^a = \alpha T^a$ [α is a function on γ(t)], then there exists a $t' = t'(t)$ such that
$\gamma'(t')$ [= γ(t)] is a geodesic.

Proof Exercise 3.9. 

Definition 2 A parameter which makes a curve a geodesic is called an affine
parameter of this curve.

Remark 2 Sometimes a curve that satisfies $T^b\nabla_b T^a = \alpha T^a$ is also called a geodesic.
Nonetheless, in order to avoid confusion, it is better to call it a “non-affinely
parametrized geodesic”.

Theorem 3.3.3 If t is an affine parameter of a geodesic, then a necessary and
sufficient condition for any parameter $t'$ of this curve to be an affine parameter is
$t' = at + b$ (where a, b are constants and $a \neq 0$).

Proof Exercise 3.9. 

Theorem 3.3.4 A point p of a manifold $(M, \nabla_a)$ with connection and a vector $v^a$
at p determine a unique geodesic γ(t) that satisfies
(1) γ(0) = p;
(2) the tangent vector of γ(t) at p is equal to $v^a$.

Proof Choose an arbitrary coordinate system $\{x^\mu\}$ whose coordinate patch contains
p, then the definition of a geodesic is equivalent to (3.3.1). Consider this equation as
n second-order ordinary differential equations for the n unknown functions
$x^\mu(t)$, then giving p ∈ M and $v^a \in V_p$ amounts to giving the initial conditions $x^\mu(0) = x^\mu|_p$
and $(dx^\mu/dt)|_0 = v^\mu$, and hence there is a unique solution.

Remark 3 Similar to Theorem 2.2.8, the word “unique” in Theorem 3.3.4 should
also be understood as “locally unique”.

The discussions above do not involve a metric. From now on, we will suppose there
is a metric field gab on M. Since a tangent vector T a is parallelly transported along
a geodesic, and since the self “inner product” gab T a T b of a parallelly transported
vector is a constant, the sign of gab T a T b does not change along the geodesic, which
indicates that geodesics can always be classified as three types: timelike, spacelike
and null (there is no “outlandish” geodesic that can turn from one type into another).

Theorem 3.3.5 The arc length parameter of a (nonnull) geodesic is an affine param-
eter.

Proof Exercise 3.9. Hint: First show that a tangent vector of an affinely parametrized
geodesic has a constant magnitude along the curve. 
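
The initial-value statement of Theorem 3.3.4 and the hint for Theorem 3.3.5 can be illustrated numerically. The following is only a sketch under stated assumptions: the manifold is taken to be the unit 2-sphere with coordinates (θ, φ) and metric dθ² + sin²θ dφ² (an example not used in the text at this point), the integrator is a hand-written fourth-order Runge-Kutta step, and all function names are hypothetical. Given a point and a tangent vector, the geodesic equation is integrated as a system of second-order ODEs, and the squared magnitude of the tangent vector is compared at the two ends.

```python
import math

def geodesic_rhs(state):
    """Geodesic equation on the unit 2-sphere as a first-order system.

    state = (theta, phi, dtheta, dphi). The nonvanishing Christoffel symbols
    are Gamma^theta_{phi phi} = -sin(theta)cos(theta) and
    Gamma^phi_{theta phi} = Gamma^phi_{phi theta} = cot(theta).
    """
    theta, phi, dtheta, dphi = state
    ddtheta = math.sin(theta) * math.cos(theta) * dphi ** 2
    ddphi = -2.0 * (math.cos(theta) / math.sin(theta)) * dtheta * dphi
    return (dtheta, dphi, ddtheta, ddphi)

def rk4_step(state, h):
    """One classical Runge-Kutta step of size h in the affine parameter."""
    def shifted(s, k, c):
        return tuple(si + c * ki for si, ki in zip(s, k))
    k1 = geodesic_rhs(state)
    k2 = geodesic_rhs(shifted(state, k1, h / 2))
    k3 = geodesic_rhs(shifted(state, k2, h / 2))
    k4 = geodesic_rhs(shifted(state, k3, h))
    return tuple(s + h / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def integrate_geodesic(p, v, t_end, steps=2000):
    """Integrate from gamma(0) = p with initial tangent components v."""
    state = (p[0], p[1], v[0], v[1])
    h = t_end / steps
    for _ in range(steps):
        state = rk4_step(state, h)
    return state

def speed_squared(state):
    """g_ab T^a T^b for the tangent vector stored in state."""
    theta, _, dtheta, dphi = state
    return dtheta ** 2 + math.sin(theta) ** 2 * dphi ** 2

if __name__ == "__main__":
    start = (1.0, 0.0, 0.3, 0.7)
    end = integrate_geodesic((1.0, 0.0), (0.3, 0.7), t_end=2.0)
    # The two numbers should agree up to the integration error.
    print(speed_squared(start), speed_squared(end))
```

The coordinates are singular at the poles, so the sketch should only be used for initial data whose geodesic stays away from θ = 0 and θ = π.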

As we all know, a straight line (segment) is the shortest path between two points in
Euclidean space. Now we will discuss to what extent this conclusion can be applied
to a manifold with a Lorentzian metric (a spacetime).

Theorem 3.3.6 Suppose $g_{ab}$ is a Lorentzian metric field on a manifold $M$, and $p, q \in M$, then a smooth spacelike (timelike) curve between $p$ and $q$ is a geodesic if and only if it extremizes the arc length of the curve.

Remark 4 ① This theorem also holds for any case where gab is positive definite [in
this case the modifier “spacelike (timelike)” is omitted]. ② The meaning of extrem-
izing the arc length is as follows: suppose C is a spacelike (timelike) curve between
p and q, then one can modify it slightly and obtain many spacelike (timelike) curves between p and q that are “infinitely close” to C. Theorem 3.3.6 claims that a necessary and sufficient condition for C to be a geodesic is that its length is an extremum among the lengths of all these nearby spacelike (timelike) curves. A necessary condition for a function f(x) of one variable to take an extremum is that its first-order derivative is zero. However, the “argument” corresponding to the length l (which can be seen as the “function value”) in Theorem 3.3.6 is not a real number but a curve. Here we are concerned with the change of l when a curve turns into another curve, and thus l is not a function but a functional. According to the calculus of variations, the necessary and sufficient condition for l to be extremized is that its variation δl vanishes.

Proof [Optional Reading]


We will give a proof by means of a coordinate system. Suppose $C(t)$ is a curve, $x^\mu(t)$ are its parametric equations in a coordinate system, $p \equiv C(t_1)$ and $q \equiv C(t_2)$; then it follows from (2.5.6) that the arc length from $p$ to $q$ can be expressed in coordinate language as
\[
l = \int_{t_1}^{t_2} \left( g_{\mu\nu}\frac{dx^\mu}{dt}\frac{dx^\nu}{dt} \right)^{1/2} dt . \tag{3.3.3}
\]
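[We assume that $C(t)$ is a spacelike curve. If $C(t)$ is timelike, then a minus sign should be added in the parentheses of this equation, which does not affect the result.] Suppose $C'(t)$ is an “infinitely close” spacelike curve whose parametric equations $x'^\mu(t)$ satisfy $x'^\mu(t_1) = x^\mu(t_1)$, $x'^\mu(t_2) = x^\mu(t_2)$, and the variation $\delta x^\mu(t) \equiv x'^\mu(t) - x^\mu(t)$ is “infinitesimal”. This variation causes small changes in $g_{\mu\nu}$ and in the components $dx^\mu/dt$ of the tangent vector:
\[
\delta g_{\mu\nu} \equiv g_{\mu\nu}[x^\sigma(t) + \delta x^\sigma(t)] - g_{\mu\nu}[x^\sigma(t)] = \frac{\partial g_{\mu\nu}}{\partial x^\sigma}\,\delta x^\sigma(t)
\]
and
\[
\delta\!\left(\frac{dx^\mu}{dt}\right) \equiv \frac{d(x^\mu + \delta x^\mu)}{dt} - \frac{dx^\mu}{dt} = \frac{d(\delta x^\mu)}{dt} ,
\]
which, through (3.3.3), give rise to the following variation of $l$:
\[
\delta l = \frac{1}{2}\int_{t_1}^{t_2} \left( g_{\mu\nu}\frac{dx^\mu}{dt}\frac{dx^\nu}{dt} \right)^{-1/2}
\left[ g_{\mu\nu}\frac{dx^\mu}{dt}\frac{d}{dt}(\delta x^\nu) + g_{\mu\nu}\frac{dx^\nu}{dt}\frac{d}{dt}(\delta x^\mu) + \frac{\partial g_{\mu\nu}}{\partial x^\sigma}(\delta x^\sigma)\frac{dx^\mu}{dt}\frac{dx^\nu}{dt} \right] dt .
\]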

Since arc length is independent of the parametrization of a curve, one can choose the most convenient parameter for the calculation. Theorem 3.3.5 indicates that no matter what the old parameter is (denoted by $\tilde t$ for now), we can always choose a new parameter $t = t(\tilde t)$ such that the length of the tangent vector at each point of the curve is normalized, i.e., $g_{\mu\nu}\frac{dx^\mu}{dt}\frac{dx^\nu}{dt} = 1$ (namely the arc length parameter). Also, noticing the symmetry of $g_{\mu\nu}$, the equation above can then be simplified as
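\[
\begin{aligned}
\delta l &= \int_{t_1}^{t_2} \left[ g_{\mu\nu}\frac{dx^\mu}{dt}\frac{d}{dt}(\delta x^\nu) + \frac{1}{2}\frac{\partial g_{\mu\nu}}{\partial x^\sigma}(\delta x^\sigma)\frac{dx^\mu}{dt}\frac{dx^\nu}{dt} \right] dt \\
&= \int_{t_1}^{t_2} \left[ \frac{d}{dt}\!\left( g_{\mu\nu}\frac{dx^\mu}{dt}\,\delta x^\nu \right) - \frac{d}{dt}\!\left( g_{\mu\nu}\frac{dx^\mu}{dt} \right)\delta x^\nu + \frac{1}{2}\frac{\partial g_{\mu\nu}}{\partial x^\sigma}(\delta x^\sigma)\frac{dx^\mu}{dt}\frac{dx^\nu}{dt} \right] dt \\
&= \int_{t_1}^{t_2} \left[ -\frac{d}{dt}\!\left( g_{\mu\sigma}\frac{dx^\mu}{dt} \right) + \frac{1}{2}\frac{\partial g_{\mu\nu}}{\partial x^\sigma}\frac{dx^\mu}{dt}\frac{dx^\nu}{dt} \right] (\delta x^\sigma)\, dt ,
\end{aligned}
\]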

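where in the last step we used the premise that $\delta x^\sigma$ vanishes at $C(t_1)$ and $C(t_2)$. The equation above indicates that the necessary and sufficient condition for $\delta l$ to vanish for any $\delta x^\sigma$ is that
\[
\begin{aligned}
0 &= -\frac{d}{dt}\!\left( g_{\mu\sigma}\frac{dx^\mu}{dt} \right) + \frac{1}{2}\frac{\partial g_{\mu\nu}}{\partial x^\sigma}\frac{dx^\mu}{dt}\frac{dx^\nu}{dt} \\
&= -g_{\mu\sigma}\frac{d^2x^\mu}{dt^2} - \frac{\partial g_{\mu\sigma}}{\partial x^\nu}\frac{dx^\nu}{dt}\frac{dx^\mu}{dt} + \frac{1}{2}\frac{\partial g_{\mu\nu}}{\partial x^\sigma}\frac{dx^\mu}{dt}\frac{dx^\nu}{dt} .
\end{aligned}
\]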
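Contracting this equation with $g^{\rho\sigma}$ yields
\[
\begin{aligned}
0 &= -\frac{d^2x^\rho}{dt^2} - g^{\rho\sigma}\left( g_{\mu\sigma,\nu} - \frac{1}{2}g_{\mu\nu,\sigma} \right)\frac{dx^\mu}{dt}\frac{dx^\nu}{dt} \\
&= -\frac{d^2x^\rho}{dt^2} - \frac{1}{2}g^{\rho\sigma}\left( g_{\sigma\mu,\nu} + g_{\nu\sigma,\mu} - g_{\mu\nu,\sigma} \right)\frac{dx^\mu}{dt}\frac{dx^\nu}{dt} \\
&= -\frac{d^2x^\rho}{dt^2} - \Gamma^\rho{}_{\mu\nu}\frac{dx^\mu}{dt}\frac{dx^\nu}{dt} .
\end{aligned}
\]
This is exactly the coordinate expression for the geodesic equation (3.3.1). As a problem for thinking, the reader may consider what result it leads to if we do not set $g_{\mu\nu}\frac{dx^\mu}{dt}\frac{dx^\nu}{dt} = 1$.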

An extremum of a function of one variable can be either a minimum [a sufficient condition is $f'(x) = 0$, $f''(x) > 0$], a maximum [a sufficient condition is $f'(x) = 0$, $f''(x) < 0$] or neither of these [a necessary condition is $f'(x) = 0$, $f''(x) = 0$].
Similar to this, the extremum of arc length also has the above three possibilities,
which we shall discuss below.
First, we discuss the case where gab is positive definite. Given an arbitrary curve
between p and q, one can always modify it a little and obtain a curve with greater
length, and hence there is no maximum length of a curve between p and q. Suppose
C is a curve between p and q with minimum length, then it follows from Theorem 3.3.6
that it must be a geodesic. However, the length of a geodesic between p and q is not
necessarily minimum since an extremum can be neither a minimum nor a maximum.
For instance, Fig. 3.3 represents a sphere, γ1 and γ2 are two geodesics from the
south pole s to the north pole n that are very close to each other, and γ is another
geodesic. Although the curve sand is a geodesic between s and d, its length is not a
minimum. The point is that there is a north pole n on the curve, which is “conjugate”
to the south pole s; that is, there exists a geodesic γ2 from s to n that is “infinitely
close” to γ1 (for the precise definition of a pair of “conjugate points”, see Optional
Reading 7.6.3). It can be proved that the necessary and sufficient condition for the length of a geodesic to be minimum is that there is no pair of conjugate points on the curve. There are certainly no conjugate points in Euclidean space, and therefore a straight line (segment) is the shortest between two points.

Fig. 3.3 The length of a geodesic sand is not a minimum

Fig. 3.4 Given a timelike curve $C$ between $p$ and $q$, one can always find a nearby timelike curve $C'$ shorter than it

Next we discuss the case where $g_{ab}$ is a Lorentzian metric. We first look at
Minkowski spacetime as the simplest example. We have said that straight lines and
geodesics are synonymous in Minkowski spacetime. Suppose p and q are connected
by a timelike geodesic $\gamma$. Is it the shortest curve between p and q? No. Since the length of a null curve is zero, no timelike curve $C$ is the shortest: one can always modify it slightly into a timelike curve $C'$ that is so close to null that its length is less than that of $C$ (see Fig. 3.4). In fact, not only is a timelike geodesic $\gamma$ not
the shortest, but it is also the longest curve between p and q. Here we show it in
2-dimensional Minkowski spacetime as an example (it can be easily carried over to
an arbitrary dimensional Minkowski spacetime). Since the parametric representation
x μ (t) of γ are linear functions, by performing a translation and a boost [(2.5.19),
(2.5.20)] of the Lorentzian coordinates, we can choose a Lorentzian system {x 0 , x 1 }
that makes the coordinate line of $x^0$ coincide with $\gamma$. Suppose $C$ is an arbitrary timelike non-geodesic between p and q. We can use many constant-$x^0$ lines to divide $\gamma$ and $C$ into many short segments (see Fig. 3.5). From the expression for the Minkowski line element we can see that the arc lengths of the segments $pa$ (on $\gamma$) and $pb$ (on $C$) are, respectively,

\[
\begin{aligned}
dl_{pa} &= \sqrt{-ds^2} = \sqrt{-[-(dx^0)^2 + 0]} = dx^0 , \\
dl_{pb} &= \sqrt{-[-(dx^0)^2 + (dx^1)^2]} < dx^0 = dl_{pa} .
\end{aligned}
\]

Fig. 3.5 A geodesic $\gamma$ is the longest timelike curve between p and q

This result can also be applied to any other line segment, and thus lγ > lC , i.e., a
timelike geodesic is the longest timelike curve between two points in Minkowski
spacetime. In other words, a (timelike) straight line (segment) is the longest between
two points in Minkowski spacetime. And since the longest curve must be a geodesic,
the necessary and sufficient condition for a timelike curve between two points in a
Minkowski spacetime to be the longest is that it is a geodesic. Now let us talk about
a general spacetime. Suppose C is the timelike curve between p and q that has the
greatest length, then it follows from Theorem 3.3.6 that it is a geodesic. However,
the converse is not necessarily true, because Theorem 3.3.6 only assures that the
length of a geodesic between p and q is an extremum, but does not guarantee that it
is a maximum. (Of course, it is definitely not a minimum either since the length of
a null curve is zero.) It can be proved that the necessary and sufficient condition for
the length of a geodesic in an arbitrary spacetime to be a maximum is that there is
no pair of conjugate points on the curve. Summary: for two points that are timelike
related in any spacetime: ① the longest curve between them is a timelike geodesic;
② a timelike geodesic between them is not necessarily the longest curve (though for
Minkowski spacetime it certainly is); ③ there is no shortest timelike curve between
them.
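
The three statements in the summary can be checked on a toy case. The sketch below is an illustration only, assuming 2-dimensional Minkowski spacetime in Lorentzian coordinates $(x^0, x^1)$ and using broken lines made of straight timelike segments as stand-ins for smooth timelike curves; the function name `timelike_length` and the sample events are hypothetical. It compares the straight line between two events with a detour and with a detour that is almost null.

```python
import math

def timelike_length(events):
    """Length of a broken timelike line in 2-dimensional Minkowski spacetime.

    events is a list of (x0, x1) pairs; each straight segment contributes
    sqrt((dx0)^2 - (dx1)^2), i.e. sqrt(-ds^2) for the line element
    ds^2 = -(dx0)^2 + (dx1)^2.
    """
    total = 0.0
    for (t_a, x_a), (t_b, x_b) in zip(events, events[1:]):
        dt, dx = t_b - t_a, x_b - x_a
        interval = dt * dt - dx * dx
        if interval <= 0.0:
            raise ValueError("segment is not timelike")
        total += math.sqrt(interval)
    return total

# Straight line (geodesic) from (0, 0) to (10, 0).
geodesic = timelike_length([(0.0, 0.0), (10.0, 0.0)])
# A detour through (5, 3): two segments of length sqrt(25 - 9) = 4 each.
detour = timelike_length([(0.0, 0.0), (5.0, 3.0), (10.0, 0.0)])
# A detour that hugs the null directions; its length is close to zero.
near_null = timelike_length([(0.0, 0.0), (5.0, 4.999), (10.0, 0.0)])

print(geodesic, detour, near_null)
```

The geodesic gives the largest value, and pushing the intermediate event closer to the null line makes the length as small as one likes without ever reaching zero, in line with items ① and ③ above.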
[Optional Reading 3.3.1]
Using geodesics we can define two useful concepts; namely, the exponential map of a gen-
eralized Riemannian space (M, gab ) and Riemannian normal coordinates.
The exponential map of p ∈ M is a map from V p (or a subset of it) to a manifold M,
denoted by
exp p : V p (or a subset of it) → M ,
defined as follows: ∀va ∈ V p , ( p, va ) determines a unique geodesic γ (t). If we set the affine
parameter t as zero at p, then the image of va under the map exp p is defined as the point
with t = 1 on the geodesic, i.e., exp p (va ) := γ (1). Suppose 0 is the zero element of V p .
Since the unique geodesic determined by ( p, 0) maps all the points of R (or an interval of
it) to p, we have exp p (0) = p. However, if we remove the point γ (1) from M, i.e., we use
M − {γ (1)} as the background manifold (see Fig. 3.6), then va has no image under the map
exp p . Therefore, the domain of the exponential map can only be a subset of V p , denoted by
$\hat V_p$, i.e., $\exp_p : \hat V_p \to M$. Figure 3.7 indicates that two geodesics $\gamma(t)$ and $\gamma'(t)$ determined by $(p, v^a)$ and $(p, v'^a)$ intersect at $q$. Choosing the magnitudes of $v^a$ and $v'^a$ appropriately, one can make $q = \gamma(1) = \gamma'(1)$, so that
\[
q = \exp_p(v^a) = \exp_p(v'^a) .
\]
Thus, in this case $\exp_p$ is not a one-to-one map. Since we have removed a point as shown in Fig. 3.6, there is no $u^a \in V_p$ for $q$ such that $q = \exp_p(u^a)$; thus, in this case $\exp_p$ is not an onto map. However, it can be proved that as long as we add proper constraints on the domain and the range of $\exp_p$, it will be not only one-to-one and onto, but also a diffeomorphism. See the following theorem:

Fig. 3.6 Removing a point $\gamma(1)$, then $v^a$ has no image under $\exp_p$

Fig. 3.7 Two geodesics determined by $(p, v^a)$ and $(p, v'^a)$ intersect at $q$. Choosing the magnitudes of $v^a$ and $v'^a$ such that $q = \gamma(1) = \gamma'(1)$, we can see that $\exp_p$ is not one-to-one

Fig. 3.8 $\exp_p : \hat V_p \to N$ is a diffeomorphism

Theorem 3.3.7 ∀ p ∈ M, one can always find an open subset V̂ p that contains the zero
element in the tangent space V p of p (regarded as an n dimensional manifold), and find an
open subset N of M that contains p such that exp p : V̂ p → N is a diffeomorphism (see
Fig. 3.8).

Proof See Hawking and Ellis (1973) pp. 33–34. 

Definition 3 A neighborhood $N$ of $p \in M$ is called a normal neighborhood of $p$ if $V_p$ has an open subset $\hat V_p$ such that $\exp_p : \hat V_p \to N$ is a diffeomorphism.

Using this diffeomorphism $\exp_p : \hat V_p \to N$ we can define coordinates inside $N$: choose an arbitrary basis $\{(e_\mu)^a\}$ of $V_p$, and define the $n$ components of $v^a \equiv \exp_p^{-1}(q) \in \hat V_p$, the inverse image of $q \in N$ under $\exp_p$, as the $n$ coordinates of $q$. The coordinate system defined in this way is called the Riemannian normal coordinate system, whose coordinate patch is $N$.

Theorem 3.3.8 Suppose $(N, \psi)$ is a Riemannian normal coordinate system of a point $p$, then the image of each geodesic $\gamma(t)$ in $N$ that passes through $p$ under the map $\psi$, denoted by $\psi(\gamma(t))$, is a straight line in $\mathbb{R}^n$ that passes through the origin (see Fig. 3.9).

Fig. 3.9 Figure of Theorem 3.3.8

Proof Without loss of generality, we may take $p = \gamma(0)$. Denote $v_1^a \equiv (\partial/\partial t)^a|_p$, $q_1 \equiv \gamma(1)$; then $q_1 = \exp_p(v_1^a)$. Suppose $q$ is an arbitrary point on $\gamma(t)$, $q \equiv \gamma(t_q)$. Reparametrizing $\gamma(t)$ by a new parameter $t' = \alpha^{-1}t$ ($\alpha$ = constant) yields a geodesic $\gamma'(t') = \gamma(t)$. Choosing an appropriate constant $\alpha$ we can make $\gamma'(1) = q$, and hence $q = \exp_p(v^a)$, where
\[
v^a \equiv (\partial/\partial t')^a|_p = [(\partial/\partial t)^a\, dt/dt']_p = \alpha v_1^a .
\]
Therefore, the Riemannian normal coordinate values of $q$ are
\[
x^\mu(q) = v^\mu = \alpha v_1^\mu , \tag{3.3.4}
\]
where $v^\mu$ and $v_1^\mu$ are the components of $v^a$ and $v_1^a$ in the selected basis. $\gamma'(1) = q$ indicates that the new parameter at $q$ has the value $t'_q = 1$, and together with the fact that $t'_q = \alpha^{-1}t_q$ due to $t' = \alpha^{-1}t$, we get $\alpha = t_q$. Hence, (3.3.4) becomes $x^\mu(q) = t_q v_1^\mu$, which can also be expressed as $x^\mu(t_q) = v_1^\mu t_q$. Since $q$ is an arbitrary point of $\gamma(t)$, dropping the lower index $q$ we obtain $x^\mu(t) = v_1^\mu t$; noticing that $v_1^\mu$ = constant, we can see that the curve $\psi(\gamma(t))$ with parametric equations $x^\mu(t) = v_1^\mu t$ is a straight line in $\mathbb{R}^n$ that passes through the origin.

Theorem 3.3.9 The Christoffel symbol of the connection $\nabla_a$ of $(M, g_{ab})$ in the Riemannian normal coordinate system at $p$ satisfies $\Gamma^c{}_{ab}|_p = 0$.

Proof Any geodesic $\gamma(t)$ that passes through $p$ can be expressed using the Riemannian normal coordinate system at $p$ as
\[
\frac{d^2x^\mu}{dt^2} + \Gamma^\mu{}_{\nu\sigma}\frac{dx^\nu}{dt}\frac{dx^\sigma}{dt} = 0, \qquad \mu = 1, \ldots, n .
\]
Since a Riemannian normal coordinate system $(N, \psi)$ maps $\gamma(t)$ into a straight line in $\mathbb{R}^n$, we have $d^2x^\mu/dt^2 = 0$. Thus,
\[
\Gamma^\mu{}_{\nu\sigma}\frac{dx^\nu}{dt}\frac{dx^\sigma}{dt} = 0, \qquad \mu = 1, \ldots, n .
\]
Every geodesic $\gamma(t)$ that passes through $p$ satisfies the above equation. Using $T^a$ to represent the tangent vector of the geodesic at $p$, the above equation gives
\[
\Gamma^\mu{}_{\nu\sigma}|_p\, T^\nu T^\sigma = 0, \qquad \mu = 1, \ldots, n .
\]
For each $\mu$, the left-hand side of the above equation is a quadratic polynomial in the $n$ variables $T^\nu$. Since it vanishes for any $T^\nu$ and $\Gamma^\mu{}_{\nu\sigma}|_p$ is symmetric in $\nu$ and $\sigma$, all the coefficients must be zero, i.e., $\Gamma^\mu{}_{\nu\sigma}|_p = 0$, $\nu, \sigma = 1, \ldots, n$, and therefore $\Gamma^a{}_{bc}|_p = 0$.

[The End of Optional Reading 3.3.1]

3.4 The Riemann Curvature Tensor

3.4.1 Definition and Properties of the Riemann Curvature

A derivative operator being torsion free assures that $(\nabla_a\nabla_b - \nabla_b\nabla_a)f = 0$, i.e., $\nabla_a\nabla_b f$ is a symmetric tensor of type (0, 2). If we refer to the operator $\nabla_a\nabla_b - \nabla_b\nabla_a$ as the commutator of the derivative operator $\nabla_a$, then $\nabla_a$ being torsion free is manifested by the fact that the action of its commutator on a function yields zero. However, the commutator of a torsion-free derivative operator acting on a tensor field of another type does not necessarily yield zero, and the Riemann curvature tensor is exactly the manifestation of this non-commutativity.
Theorem 3.4.1 Suppose $f \in \mathscr{F}$, $\omega_c \in \mathscr{F}(0, 1)$, then
\[
(\nabla_a\nabla_b - \nabla_b\nabla_a)(f\omega_c) = f(\nabla_a\nabla_b - \nabla_b\nabla_a)\omega_c . \tag{3.4.1}
\]

Proof Expand $\nabla_a\nabla_b(f\omega_c)$ and $\nabla_b\nabla_a(f\omega_c)$, respectively, into 4 terms and subtract them. Noticing the torsion-free condition, we get (3.4.1).
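
For readers who want to see the four terms explicitly, the expansion described in the proof can be written out as follows. This is just the Leibniz rule applied twice; the two mixed terms are symmetric in $a$ and $b$ and cancel in the commutator, and the term containing $(\nabla_a\nabla_b - \nabla_b\nabla_a)f$ vanishes by the torsion-free condition.

```latex
\begin{align*}
\nabla_a\nabla_b(f\omega_c)
 &= \nabla_a\bigl(\omega_c\nabla_b f + f\nabla_b\omega_c\bigr) \\
 &= \omega_c\nabla_a\nabla_b f + (\nabla_b f)\nabla_a\omega_c
    + (\nabla_a f)\nabla_b\omega_c + f\nabla_a\nabla_b\omega_c , \\
(\nabla_a\nabla_b-\nabla_b\nabla_a)(f\omega_c)
 &= \omega_c(\nabla_a\nabla_b-\nabla_b\nabla_a)f
    + f(\nabla_a\nabla_b-\nabla_b\nabla_a)\omega_c \\
 &= f(\nabla_a\nabla_b-\nabla_b\nabla_a)\omega_c .
\end{align*}
```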

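Theorem 3.4.2 Suppose $\omega_c, \omega'_c \in \mathscr{F}(0, 1)$ and $\omega'_c|_p = \omega_c|_p$, then
\[
[(\nabla_a\nabla_b - \nabla_b\nabla_a)\omega'_c]|_p = [(\nabla_a\nabla_b - \nabla_b\nabla_a)\omega_c]|_p . \tag{3.4.2}
\]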

Proof Exercise 3.11. Hint: use Theorem 3.4.1. 

Theorem 3.4.2 indicates that (∇a ∇b − ∇b ∇a ) is a linear map that turns a dual
vector ωc | p at p into a tensor [(∇a ∇b − ∇b ∇a )ωc ]| p of type (0, 3). The way of doing
this is: extend ωc | p arbitrarily into a dual vector field ωc defined on a neighborhood
of p, evaluate (∇a ∇b − ∇b ∇a )ωc , and then taking the value of it at p we obtain
the image of the map. Theorem 3.4.2 assures that this image does not depend on
the choice of extension. Therefore, (∇a ∇b − ∇b ∇a ) corresponds to a tensor of type
(1, 3) at p, called the Riemann curvature tensor, denoted by Rabc d . Since p is
arbitrary, Rabc d is also a tensor field. Hence, we have:

Definition 1 The Riemann curvature tensor field $R_{abc}{}^d$ of a derivative operator $\nabla_a$ is defined by the following equation
\[
(\nabla_a\nabla_b - \nabla_b\nabla_a)\omega_c = R_{abc}{}^d\,\omega_d , \qquad \forall\,\omega_c \in \mathscr{F}(0, 1) . \tag{3.4.3}
\]

The Riemann tensor field reflects the non-commutativity of a derivative operator, and it is a tensor field that describes the intrinsic properties of $(M, \nabla_a)$. As long as we choose a derivative operator we can talk about its Riemann tensor. Of course, we can also talk about the Riemann tensor of a generalized Riemannian space $(M, g_{ab})$, also called the Riemann tensor of $g_{ab}$, by which we mean the Riemann tensor field of the derivative operator $\nabla_a$ associated with $g_{ab}$. A metric whose Riemann
tensor field vanishes is called a flat metric. Now we will show that Euclidean and
Minkowski metrics are both flat metrics.
Theorem 3.4.3 The Riemann curvature tensor field of the Euclidean space (Rn , δab )
and Minkowski space (Rn , ηab ) are both zero.

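Proof In Euclidean (Minkowski) space, the ordinary derivative operator $\partial_a$ of any Cartesian (Lorentzian) system is the derivative operator associated with $\delta_{ab}$ ($\eta_{ab}$). Since
\[
(\partial_a\partial_b - \partial_b\partial_a)\omega_c = (dx^\mu)_a (dx^\nu)_b (dx^\sigma)_c\,(\partial_\mu\partial_\nu\omega_\sigma - \partial_\nu\partial_\mu\omega_\sigma) = 0 , \qquad \forall\,\omega_c ,
\]
the $R_{abc}{}^d$ of $\partial_a$ vanishes.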

Therefore, Euclidean space and Minkowski space are called flat spaces. In fact,
Minkowski space is similar to Euclidean space in many ways, and thus is also called
a pseudo-Euclidean space.
Equation (3.4.3) reflects the non-commutativity of a derivative operator acting on
a dual vector field. From this we can deduce the non-commutativity of a derivative
operator acting on a tensor field of an arbitrary type T c1 ···ck d1 ···dl , i.e., express (∇a ∇b −
∇b ∇a )T c1 ···ck d1 ···dl in terms of Rabc d . We have the following theorems:
Theorem 3.4.4

(∇a ∇b − ∇b ∇a )vc = −Rabd c vd ∀vc ∈ F (1, 0) . (3.4.4)

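Proof $\forall\,\omega_c \in \mathscr{F}(0, 1)$, we have $v^c\omega_c \in \mathscr{F}$; hence, it follows from the torsion-free condition that
\[
\begin{aligned}
0 &= (\nabla_a\nabla_b - \nabla_b\nabla_a)(v^c\omega_c) = \nabla_a(v^c\nabla_b\omega_c + \omega_c\nabla_b v^c) - \nabla_b(v^c\nabla_a\omega_c + \omega_c\nabla_a v^c) \\
&= v^c\nabla_a\nabla_b\omega_c + \omega_c\nabla_a\nabla_b v^c - v^c\nabla_b\nabla_a\omega_c - \omega_c\nabla_b\nabla_a v^c .
\end{aligned}
\]
Thus, $\omega_c(\nabla_a\nabla_b - \nabla_b\nabla_a)v^c = -v^c(\nabla_a\nabla_b - \nabla_b\nabla_a)\omega_c = -v^c R_{abc}{}^d\,\omega_d = -\omega_c R_{abd}{}^c\, v^d$, and therefore we get (3.4.4).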

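Theorem 3.4.5 $\forall\, T^{c_1\cdots c_k}{}_{d_1\cdots d_l} \in \mathscr{F}(k, l)$ we have
\[
(\nabla_a\nabla_b - \nabla_b\nabla_a)T^{c_1\cdots c_k}{}_{d_1\cdots d_l} = -\sum_{i=1}^{k} R_{abe}{}^{c_i}\, T^{c_1\cdots e\cdots c_k}{}_{d_1\cdots d_l} + \sum_{j=1}^{l} R_{abd_j}{}^{e}\, T^{c_1\cdots c_k}{}_{d_1\cdots e\cdots d_l} . \tag{3.4.5}
\]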
Proof Omitted. 
Theorem 3.4.6 A Riemann curvature tensor has the following properties [NB: (1)
and (4) are general, (2), (3) and (5) require the torsion-free condition] :

(1) $R_{abc}{}^d = -R_{bac}{}^d$ ; (3.4.6)
(2) $R_{[abc]}{}^d = 0$ [cyclic identity] ; (3.4.7)
(3) $\nabla_{[a}R_{bc]d}{}^e = 0$ [Bianchi identity, published by L. Bianchi in 1902] ; (3.4.8)

if there is a metric field $g_{ab}$ on $M$ and $\nabla_a g_{bc} = 0$, then we can define $R_{abcd} \equiv g_{de}R_{abc}{}^e$, which also satisfies

(4) $R_{abcd} = -R_{abdc}$ ; (3.4.9)
(5) $R_{abcd} = R_{cdab}$ . (3.4.10)

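Proof (1) It is obvious by definition.
(2) Since $R_{[abc]}{}^d\,\omega_d = \nabla_{[a}\nabla_b\omega_{c]} - \nabla_{[b}\nabla_a\omega_{c]} = 2\nabla_{[a}\nabla_b\omega_{c]}$, to prove (3.4.7) all we have to show is that
\[
\nabla_{[a}\nabla_b\omega_{c]} = 0 , \qquad \forall\,\omega_c \in \mathscr{F}(0, 1) . \tag{3.4.11}
\]
It follows from (3.1.8) that
\[
\begin{aligned}
\nabla_a(\nabla_b\omega_c) &= \partial_a(\nabla_b\omega_c) - \Gamma^d{}_{ab}\nabla_d\omega_c - \Gamma^d{}_{ac}\nabla_b\omega_d \\
&= \partial_a(\partial_b\omega_c - \Gamma^e{}_{bc}\,\omega_e) - \Gamma^d{}_{ab}\nabla_d\omega_c - \Gamma^d{}_{ac}\nabla_b\omega_d \\
&= (\partial_a\partial_b\omega_c - \Gamma^e{}_{bc}\,\partial_a\omega_e - \omega_e\,\partial_a\Gamma^e{}_{bc}) - \Gamma^d{}_{ab}\nabla_d\omega_c - \Gamma^d{}_{ac}\nabla_b\omega_d ,
\end{aligned} \tag{3.4.12}
\]
and hence
\[
\nabla_{[a}\nabla_b\omega_{c]} = \partial_{[a}\partial_b\omega_{c]} - \Gamma^e{}_{[bc}\,\partial_{a]}\omega_e - \omega_e\,\partial_{[a}\Gamma^e{}_{bc]} - \Gamma^d{}_{[ab}\nabla_{|d|}\omega_{c]} - \Gamma^d{}_{[ac}\nabla_{b]}\omega_d ,
\]
where $|d|$ in the lower indices $[ab|d|c]$ indicates that $d$ does not participate in the antisymmetrization. Noticing that $\partial_a\partial_b\omega_c = \partial_b\partial_a\omega_c$ and $\Gamma^e{}_{bc} = \Gamma^e{}_{cb}$, we see from Theorem 2.6.2 (c) that each term on the right-hand side of the above equation vanishes.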

(3) To prove (3.4.8), we only have to show that ωe ∇[a Rbc]d e = 0 ∀ωe ∈ F (0, 1).
Since

ωe ∇a Rbcd e = ∇a (Rbcd e ωe ) − Rbcd e ∇a ωe


= ∇a (∇b ∇c ωd − ∇c ∇b ωd ) − Rbcd e ∇a ωe ,

one has

ωe ∇[a Rbc]d e = ∇[a ∇b ∇c] ωd − ∇[a ∇c ∇b] ωd − R[bc|d| e ∇a] ωe


= ∇[a ∇b ∇c] ωd − ∇[b ∇a ∇c] ωd − R[bc|d| e ∇a] ωe . (3.4.13)

To derive the sum of the first two terms on the right, first we write out the expression
without the square bracket

∇a ∇b ∇c ωd − ∇b ∇a ∇c ωd = (∇a ∇b − ∇b ∇a )∇c ωd = Rabc e ∇e ωd + Rabd e ∇c ωe ,

where in the second equality we used (3.4.5). Antisymmetrizing the lower indices
a, b, c, and noticing (3.4.7), we then have

∇[a ∇b ∇c] ωd − ∇[b ∇a ∇c] ωd = R[ab|d| e ∇c] ωe = R[bc|d| e ∇a] ωe ,

which indicates that the right-hand side of (3.4.13) vanishes. Therefore, ωe ∇[a Rbc]d e
= 0.
(4) Applying (3.4.5) to gcd , it follows from ∇a gcd = 0 that

0 = (∇a ∇b − ∇b ∇a )gcd = Rabc e ged + Rabd e gce = Rabcd + Rabdc ,

and hence (3.4.9) holds.


(5) Left as an exercise (Exercise 3.12). 

Remark 1 Suppose $\dim M = n$, then $R_{abcd}$ has in total $n^4$ components $R_{\mu\nu\sigma\rho}$. However, since the algebraic equations (3.4.6), (3.4.7), (3.4.9) and (3.4.10) are satisfied, the number of independent components is only [for a proof, see Bergmann (1976) pp. 172–174]
\[
N = \frac{n^2(n^2 - 1)}{12} .
\]
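
To get a feeling for how strongly the algebraic identities cut down the component count, the formula of Remark 1 can be evaluated for low dimensions. The helper name `riemann_components` below is hypothetical and the snippet is only an arithmetic illustration of the formula $N = n^2(n^2-1)/12$.

```python
def riemann_components(n):
    """Return (total, independent) component counts of R_{abcd} in dimension n."""
    total = n ** 4
    # n^2 (n^2 - 1) is always divisible by 12, so integer division is exact.
    independent = n * n * (n * n - 1) // 12
    return total, independent

for n in (2, 3, 4):
    print(n, riemann_components(n))
# 2 (16, 1)
# 3 (81, 6)
# 4 (256, 20)
```

In particular, a 4-dimensional spacetime has 20 independent components out of 256, and a 2-dimensional surface has just one.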
After a metric is chosen, each tensor Tab of type (0, 2) corresponds to a tensor
T a b ≡ g ac Tcb of type (1, 1), which is nothing but a linear transformation on a vector
space. The components of this linear transformation in an arbitrary basis form a
matrix, and the matrices in different bases are similar to each other; hence, they have
the same trace, whose value is T a a = g ac Tac , called the trace of the tensor T a b , also
called the trace of Tab . Similarly, for a given tensor Rabcd of type (0, 4), we can
in principle obtain the following six “traces” through contraction [each “trace” is a
tensor of type (0, 2)]: g ab Rabcd , g ac Rabcd , g ad Rabcd , g bc Rabcd , g bd Rabcd , g cd Rabcd .
However, due to the properties of $R_{abcd}$ that come from lowering the upper index of the Riemann tensor $R_{abc}{}^d$ [(1), (4), (5) of Theorem 3.4.6] and the symmetry of $g^{ac}$, it is easy to see from (d) of Theorem 2.6.2 that the first and the sixth contractions above vanish; the second and the fifth are equal (reason: $g^{ac}R_{abcd} = g^{ac}R_{badc}$, which is essentially the same as $g^{bd}R_{abcd}$; we do not write $g^{ac}R_{abcd} = g^{bd}R_{abcd}$ only because we need to take care of the balance of indices); the third and the fourth are equal and they are the negative of the second and the fifth ones. Hence, among these six contractions there is only a single independent one; we can take, for example,
g bd Rabcd , denoted by Rac , called the Ricci tensor. What should be emphasized is
that we do not need a metric to define the Ricci tensor since Rac ≡ Rabc b is endowed
with a clear meaning. We can also take the trace of Rac using the metric, i.e., g ac Rac ,
denoted by R, called the scalar curvature. From (3.4.10), it is easy to show that
Rac = Rca . Besides, one should also be acquainted with the traceless part of Rabc d ,
which is called the Weyl tensor, defined as follows:
Definition 2 For a generalized Riemannian space of dimension $n \geq 3$, the Weyl tensor $C_{abcd}$ is defined by the following expression:
\[
C_{abcd} := R_{abcd} - \frac{2}{n-2}\left( g_{a[c}R_{d]b} - g_{b[c}R_{d]a} \right) + \frac{2}{(n-1)(n-2)}\, R\, g_{a[c}g_{d]b} . \tag{3.4.14}
\]

Theorem 3.4.7 Weyl tensors have the following properties:

(1) $C_{abcd} = -C_{bacd} = -C_{abdc} = C_{cdab}$ , $C_{[abc]d} = 0$ . (3.4.15)
(2) The trace of $C_{abcd}$ over any pair of indices vanishes, e.g., $g^{ac}C_{abcd} = 0$ .

Proof Exercise. 

Remark 2 Equation (3.4.14) indicates that $R_{abcd}$ is the sum of its traceless part $C_{abcd}$ and its trace part
\[
\frac{2}{n-2}\left( g_{a[c}R_{d]b} - g_{b[c}R_{d]a} \right) - \frac{2}{(n-1)(n-2)}\, R\, g_{a[c}g_{d]b} .
\]

Definition 3 The Einstein tensor of a generalized Riemannian space is defined by
\[
G_{ab} := R_{ab} - \frac{1}{2} R\, g_{ab} . \tag{3.4.16}
\]
Theorem 3.4.8
\[
\nabla^a G_{ab} = 0 \quad (\text{where } \nabla^a G_{ab} \equiv g^{ac}\nabla_c G_{ab}) . \tag{3.4.17}
\]

Proof From the Bianchi identity (3.4.8) and (3.4.6) we have $0 = \nabla_a R_{bcd}{}^e + \nabla_c R_{abd}{}^e + \nabla_b R_{cad}{}^e$. Contracting indices $a$ and $e$ yields $0 = \nabla_a R_{bcd}{}^a + \nabla_c R_{abd}{}^a + \nabla_b R_{cad}{}^a = \nabla_a R_{bcd}{}^a - \nabla_c R_{bd} + \nabla_b R_{cd}$. Contracting this with $g^{bd}$ we get
\[
\begin{aligned}
0 &= g^{bd}\nabla_a R_{bcd}{}^a - g^{bd}\nabla_c R_{bd} + g^{bd}\nabla_b R_{cd} \\
&= \nabla_a R_c{}^a - \nabla_c R + \nabla_b R_c{}^b = 2\nabla_a R_c{}^a - \nabla_c R .
\end{aligned} \tag{3.4.18}
\]
Hence, $\nabla^a G_{ab} = \nabla^a R_{ab} - \frac{1}{2}g_{ab}\nabla^a R = \nabla_a R_b{}^a - \frac{1}{2}\nabla_b R = 0$, where we used $R_{ab} = R_{ba}$ in the second equality and (3.4.18) in the third equality.

Equation (3.4.17), which the Einstein tensor satisfies, is significant for establishing
Einstein’s equation of general relativity; for details, see Sect. 7.7.

3.4.2 Computing Riemann Curvature from a Metric

Suppose M has a given metric gab , from ∇a gbc = 0 a unique connection ∇a is deter-
mined, and thus we have a Riemann tensor Rabc d . A common problem is to compute
Rabc d from the given gab . Computing a tensor means deriving its components in
a certain basis. There are two types of basis: coordinate basis and non-coordinate
basis. In this section, we only talk about the method of computing curvature using
a coordinate basis; the methods using non-coordinate bases are introduced in Sects.
5.7 and 8.7.
After we choose an arbitrary coordinate system, the components $g_{\mu\nu}$ of the metric are then known, and in this coordinate system the connection $\nabla_a$ satisfying $\nabla_a g_{bc} = 0$ can be characterized by its Christoffel symbol in this system:
\[
\Gamma^\sigma{}_{\mu\nu} = \frac{1}{2} g^{\sigma\rho}\left( g_{\rho\mu,\nu} + g_{\nu\rho,\mu} - g_{\mu\nu,\rho} \right) \quad [\text{i.e., } (3.2.10')] . \tag{3.4.19}
\]
$\Gamma^\sigma{}_{\mu\nu}$ has three component indices, and thus $\{\Gamma^\sigma{}_{\mu\nu}\}$ contains $n^3$ numbers. The symmetry $\Gamma^\sigma{}_{\mu\nu} = \Gamma^\sigma{}_{\nu\mu}$ makes it so that only $n^2(n+1)/2$ among the $n^3$ numbers are independent (when $n = 4$ there are 40 independent numbers). The first step for the calculation is to derive all the nonvanishing $\Gamma^\sigma{}_{\mu\nu}$ from the given $g_{\mu\nu}$.
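
As a rough illustration of this first step, the sketch below evaluates (3.4.19) numerically at a single point. Everything in it is an assumption made for the example: the metric is that of the unit 2-sphere in coordinates (θ, φ), the partial derivatives $g_{\mu\nu,\rho}$ are approximated by central finite differences, the matrix inverse is hard-coded for the 2-dimensional case, and the function names are hypothetical. In practice one would usually derive the Christoffel symbols by hand or with a computer algebra system.

```python
import math

def sphere_metric(x):
    """Components g_{mu nu} of the unit 2-sphere metric at x = (theta, phi)."""
    theta, _phi = x
    return [[1.0, 0.0], [0.0, math.sin(theta) ** 2]]

def inverse_2x2(g):
    """Inverse matrix g^{mu nu} (this helper only handles the 2x2 case)."""
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    return [[g[1][1] / det, -g[0][1] / det],
            [-g[1][0] / det, g[0][0] / det]]

def metric_derivative(metric, x, h=1e-5):
    """dg[rho][mu][nu] approximates g_{mu nu, rho} by central differences."""
    n = len(x)
    dg = []
    for rho in range(n):
        xp, xm = list(x), list(x)
        xp[rho] += h
        xm[rho] -= h
        gp, gm = metric(xp), metric(xm)
        dg.append([[(gp[m][v] - gm[m][v]) / (2.0 * h) for v in range(n)]
                   for m in range(n)])
    return dg

def christoffel(metric, x):
    """gamma[s][m][v] approximates Gamma^s_{m v} from Eq. (3.4.19)."""
    n = len(x)
    ginv = inverse_2x2(metric(x))
    dg = metric_derivative(metric, x)
    gamma = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for s in range(n):
        for m in range(n):
            for v in range(n):
                gamma[s][m][v] = 0.5 * sum(
                    ginv[s][r] * (dg[v][r][m] + dg[m][v][r] - dg[r][m][v])
                    for r in range(n))
    return gamma

theta0 = 1.0
gamma = christoffel(sphere_metric, [theta0, 0.3])
# Compare with the known closed forms -sin(theta)cos(theta) and cot(theta).
print(gamma[0][1][1], -math.sin(theta0) * math.cos(theta0))
print(gamma[1][0][1], math.cos(theta0) / math.sin(theta0))
```

The two printed pairs should agree up to the finite-difference error, and all other components come out (numerically) zero, which reflects the fact that only a few of the $n^2(n+1)/2$ independent numbers are nonvanishing for a simple metric.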
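From the definition of the Riemann tensor we have $R_{abc}{}^d\,\omega_d = 2\nabla_{[a}\nabla_{b]}\omega_c$, where $\nabla_a\nabla_b\omega_c$ can be expressed in six terms using (3.4.12) (there are five terms in this equation, and the $\nabla_b\omega_d$ in the fifth term can be expanded into two terms, i.e., $\partial_b\omega_d - \Gamma^e{}_{bd}\,\omega_e$). Antisymmetrizing the indices $a$, $b$ in each term, and noting that $\partial_{[a}\partial_{b]}\omega_c = 0$, $\Gamma^d{}_{[ab]} = \Gamma^d{}_{[(ab)]} = 0$, we obtain
\[
\begin{aligned}
R_{abc}{}^d\,\omega_d &= 2\left( -\Gamma^e{}_{c[b}\,\partial_{a]}\omega_e - \omega_e\,\partial_{[a}\Gamma^e{}_{b]c} - \Gamma^d{}_{c[a}\,\partial_{b]}\omega_d + \Gamma^d{}_{c[a}\Gamma^e{}_{b]d}\,\omega_e \right) \\
&= -2\,\omega_d\,\partial_{[a}\Gamma^d{}_{b]c} + 2\,\Gamma^e{}_{c[a}\Gamma^d{}_{b]e}\,\omega_d , \qquad \forall\,\omega_d \in \mathscr{F}(0, 1) .
\end{aligned}
\]
Hence,
\[
R_{abc}{}^d = -2\,\partial_{[a}\Gamma^d{}_{b]c} + 2\,\Gamma^e{}_{c[a}\Gamma^d{}_{b]e} , \tag{3.4.20}
\]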
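whose coordinate components are
\[
R_{\mu\nu\sigma}{}^\rho = \Gamma^\rho{}_{\mu\sigma,\nu} - \Gamma^\rho{}_{\nu\sigma,\mu} + \Gamma^\lambda{}_{\sigma\mu}\Gamma^\rho{}_{\nu\lambda} - \Gamma^\lambda{}_{\sigma\nu}\Gamma^\rho{}_{\mu\lambda} , \tag{3.4.20$'$}
\]
where $\Gamma^\rho{}_{\mu\sigma,\nu} \equiv \partial\Gamma^\rho{}_{\mu\sigma}/\partial x^\nu$. From the equation above we can also obtain the expression for the coordinate components of the Ricci tensor
\[
R_{\mu\sigma} = R_{\mu\nu\sigma}{}^\nu = \Gamma^\nu{}_{\mu\sigma,\nu} - \Gamma^\nu{}_{\nu\sigma,\mu} + \Gamma^\lambda{}_{\mu\sigma}\Gamma^\nu{}_{\lambda\nu} - \Gamma^\lambda{}_{\nu\sigma}\Gamma^\nu{}_{\lambda\mu} . \tag{3.4.21}
\]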
[Optional Reading 3.4.1]


If the components $g_{\mu\nu}$ of a metric $g_{ab}$ in a coordinate system are all constants, then all of its Christoffel symbols vanish ($\Gamma^\sigma{}_{\mu\nu} = 0$), and from (3.4.20′) we know that its Riemann tensor $R_{abc}{}^d = 0$; thus, (at least in this coordinate patch) it is a flat metric. Conversely, if we know that $R_{abc}{}^d = 0$ for a $g_{ab}$, does there always exist a coordinate system such that the coordinate components $g_{\mu\nu}$ of $g_{ab}$ are all constants? The answer is affirmative. See the following theorem.

Theorem 3.4.9 A metric field gab is (locally) flat (i.e., Rabc d = 0) if and only if there exists
a coordinate system such that the coordinate components of gab are all constants.

Proof The proof of this theorem requires techniques that we have not covered yet, see
Appendix J of Volume III. 

[The End of Optional Reading 3.4.1]


[Optional Reading 3.4.2]
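Equation (3.4.21) contains $\Gamma^\nu{}_{\nu\sigma}$. In fact, “contracted Christoffel symbols” like this are involved in many computations. For instance, inspired by the definition of the divergence $\vec\nabla \cdot \vec v$ in a 3-dimensional Euclidean space, we can define the divergence of a vector field $v^a$ in $(M, \nabla_a)$ as $\nabla_a v^a$ (we may also call $\nabla_a T^{ab}$ the divergence of a tensor field $T^{ab}$). Since $\nabla_a v^a = \partial_a v^a + \Gamma^a{}_{ab}\, v^b$, we need to deal with the “contracted Christoffel symbol” $\Gamma^a{}_{ab}$ when calculating the divergence. Now we shall derive the expression for $\Gamma^\nu{}_{\nu\sigma}$. It follows from (3.4.19) that
\[
\Gamma^\mu{}_{\mu\sigma} = \frac{1}{2} g^{\mu\lambda}\left( g_{\sigma\lambda,\mu} + g_{\mu\lambda,\sigma} - g_{\mu\sigma,\lambda} \right) = \frac{1}{2} g^{\mu\lambda} g_{\mu\lambda,\sigma} + g^{\mu\lambda} g_{\sigma[\lambda,\mu]} = \frac{1}{2} g^{\mu\lambda} g_{\mu\lambda,\sigma} ,
\]
where we used the fact that $g^{[\mu\lambda]} = 0$ in the last step. This equation can be rewritten as
\[
\Gamma^\mu{}_{\mu\sigma} = \frac{1}{2} g^{\mu\lambda}\frac{\partial g_{\mu\lambda}}{\partial x^\sigma} . \tag{3.4.22}
\]
On the other hand, the determinant $g$ of the matrix constituted by $g_{\mu\lambda}$ can be expanded with respect to the $\mu$th row as $g = g_{\mu\lambda}A^{\mu\lambda}$ (where $A^{\mu\lambda}$ is the cofactor of $g_{\mu\lambda}$, and the sum is only taken over $\lambda$); hence, $\partial g/\partial g_{\mu\lambda} = A^{\mu\lambda}$. Thus, from the expression for the inverse matrix elements $g^{\mu\lambda} = A^{\lambda\mu}/g$ we have
\[
\frac{\partial g}{\partial g_{\mu\lambda}} = g\, g^{\mu\lambda} . \tag{3.4.23}
\]
Since $g_{\mu\lambda}$ are functions of the coordinates $x^\sigma$, $g$ is a function of $x^\sigma$ as well, and
\[
\frac{\partial g}{\partial x^\sigma} = \frac{\partial g}{\partial g_{\mu\lambda}}\frac{\partial g_{\mu\lambda}}{\partial x^\sigma} = g\, g^{\mu\lambda}\frac{\partial g_{\mu\lambda}}{\partial x^\sigma} , \tag{3.4.24}
\]
where (3.4.23) is used in the last step. Combining (3.4.22) and (3.4.24) yields
\[
\Gamma^\mu{}_{\mu\sigma} = \frac{1}{2g}\frac{\partial g}{\partial x^\sigma} = \frac{1}{\sqrt{|g|}}\frac{\partial\sqrt{|g|}}{\partial x^\sigma} . \tag{3.4.25}
\]
This is the expression for the “contracted Christoffel symbol”. The divergence $\nabla_a v^a$ (as a scalar field) can be derived by means of an arbitrary basis. Using the coordinate basis, it is easy to derive from (3.4.25) and $\nabla_a v^a = \partial_a v^a + \Gamma^a{}_{ab}\, v^b$ that
\[
\nabla_a v^a = \frac{1}{\sqrt{|g|}}\frac{\partial}{\partial x^\sigma}\left( \sqrt{|g|}\, v^\sigma \right) . \tag{3.4.26}
\]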

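As an example of the application, now we derive the expression for the divergence $\vec\nabla \cdot \vec v$ of a vector field $\vec v$ in the 3-dimensional Euclidean space in both the Cartesian and the spherical coordinate systems. First, we rewrite the above equation as
\[
\vec\nabla \cdot \vec v = \nabla_a v^a = \frac{1}{\sqrt{|g|}}\frac{\partial}{\partial x^i}\left( \sqrt{|g|}\, v^i \right) . \tag{3.4.27}
\]
(1) For a Cartesian coordinate system, $g = 1$, $\vec\nabla \cdot \vec v = \dfrac{\partial v^i}{\partial x^i} = \dfrac{\partial v^1}{\partial x^1} + \dfrac{\partial v^2}{\partial x^2} + \dfrac{\partial v^3}{\partial x^3}$; this is the familiar formula for divergence.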

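(2) For a spherical coordinate system, $g = r^4\sin^2\theta$ (so that $\sqrt{|g|} = r^2\sin\theta$),
\[
\begin{aligned}
\vec\nabla \cdot \vec v &= \frac{1}{r^2\sin\theta}\frac{\partial}{\partial x^i}\left( v^i r^2\sin\theta \right) \\
&= \frac{1}{r^2\sin\theta}\left[ \frac{\partial(v^1 r^2\sin\theta)}{\partial r} + \frac{\partial(v^2 r^2\sin\theta)}{\partial\theta} + \frac{\partial(v^3 r^2\sin\theta)}{\partial\varphi} \right] ,
\end{aligned} \tag{3.4.28}
\]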

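where $v^1, v^2, v^3$ are the components of $v^a$ in the coordinate basis $\{(\partial/\partial r)^a, (\partial/\partial\theta)^a, (\partial/\partial\varphi)^a\}$. However, normally the formula in an electrodynamics textbook is written in terms of the components of $v^a$ in an orthonormal basis $\{(e_r)^a, (e_\theta)^a, (e_\varphi)^a\}$ (denoted by $v^r, v^\theta, v^\varphi$). Note that
\[
(e_r)^a = (\partial/\partial r)^a , \quad (e_\theta)^a = r^{-1}(\partial/\partial\theta)^a , \quad (e_\varphi)^a = (r\sin\theta)^{-1}(\partial/\partial\varphi)^a ,
\]
which means $v^1 = v^r$, $v^2 = r^{-1}v^\theta$, $v^3 = (r\sin\theta)^{-1}v^\varphi$. Plugging these into (3.4.28) yields
\[
\vec\nabla \cdot \vec v = \frac{1}{r^2}\frac{\partial(r^2 v^r)}{\partial r} + \frac{1}{r\sin\theta}\frac{\partial(v^\theta\sin\theta)}{\partial\theta} + \frac{1}{r\sin\theta}\frac{\partial v^\varphi}{\partial\varphi} ,
\]
which agrees with the formula in electrodynamics textbooks.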
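
A quick numerical sanity check of the divergence formula can be made in spherical coordinates. The sketch below is only illustrative: it assumes $\sqrt{|g|} = r^2\sin\theta$, approximates the partial derivatives in (3.4.26) by central finite differences, and uses two hypothetical test fields given by their coordinate-basis components, namely the position vector field $v^1 = r$ (whose Cartesian divergence is 3) and the rotation field $(\partial/\partial\varphi)^a$ (whose divergence is 0).

```python
import math

def sqrt_abs_g(x):
    """sqrt|g| = r^2 sin(theta) for Euclidean 3-space in spherical coordinates."""
    r, theta, _phi = x
    return r * r * math.sin(theta)

def divergence(v, x, h=1e-5):
    """Approximate (1/sqrt|g|) d(sqrt|g| v^sigma)/dx^sigma by central differences.

    v maps a coordinate triple (r, theta, phi) to the coordinate-basis
    components (v^1, v^2, v^3) of the vector field at that point.
    """
    total = 0.0
    for sigma in range(3):
        xp, xm = list(x), list(x)
        xp[sigma] += h
        xm[sigma] -= h
        flux_p = sqrt_abs_g(xp) * v(xp)[sigma]
        flux_m = sqrt_abs_g(xm) * v(xm)[sigma]
        total += (flux_p - flux_m) / (2.0 * h)
    return total / sqrt_abs_g(x)

def radial_field(x):
    """Position vector field: coordinate components (r, 0, 0)."""
    return (x[0], 0.0, 0.0)

def rotation_field(x):
    """The coordinate basis field (d/dphi)^a: components (0, 0, 1)."""
    return (0.0, 0.0, 1.0)

point = (2.0, 1.0, 0.5)
print(divergence(radial_field, point))    # close to 3
print(divergence(rotation_field, point))  # close to 0
```

The first value reproduces the Cartesian result $\partial x/\partial x + \partial y/\partial y + \partial z/\partial z = 3$ for the same field, which is the coordinate independence that the scalar $\nabla_a v^a$ must have.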
[The End of Optional Reading 3.4.2]

3.5 The Intrinsic Curvature and the Extrinsic Curvature

According to our intuition, a plane is flat while a curved surface is not. More precisely,
these “flat” and “curved” surfaces in our mind are all 2-dimensional surfaces (such as
spherical and cylindrical surfaces) embedded in the 3-dimensional Euclidean space.
Now we ask: given an n-dimensional manifold, can we talk about if it is curved by
following the same idea? As long as it can be embedded into an (n + 1)-dimensional
manifold, the answer will be yes. The curvature defined by embedding a manifold in
a manifold with one extra dimension is called the “extrinsic curvature”, which has
a precise definition (for details see Chap. 14). According to this definition, both
a sphere and a cylindrical surface in 3-dimensional Euclidean space have a nonzero
curvature, which tallies with our intuition. However, the Riemann curvature we intro-
duced in this chapter is the intrinsic curvature, which reflects the “intrinsic warping”
of a manifold M after a connection ∇a is assigned. Unlike the extrinsic curvature,
there is no need to embed M in a one-higher dimensional manifold to tell the intrinsic
curvature. [Generally speaking, any property of (M, gab ) that can be determined by
just gab (without having to embed the manifold in a higher dimensional manifold)
is called an intrinsic property of (M, gab ).] The term “intrinsic curvature” actu-
ally just reflects the following three equivalent properties; a generalized Riemannian
space with these properties is called a curved space.
(1) The non-commutativity of the derivative operator, i.e., (∇a ∇b − ∇b ∇a )ωc =
Rabc d ωd , ∀ωc ∈ F (0, 1), where the nonvanishing tensor field Rabc d is used as the
definition of the intrinsic (Riemann) curvature, see Sect. 3.4.
(2) The curve-dependence of the parallel transport of a vector.
As we have discussed in Sect. 3.2, for two points $p$ and $q$ in $(M, \nabla_a)$, there
exists a curve-dependent translation map between their tangent spaces $V_p$ and $V_q$;
that is, for a curve between $p$ and $q$, any vector $v^a$ at $p$ determines a vector field
$\tilde v^a$ (satisfying $\tilde v^a|_p = v^a$) parallelly transported along the curve, whose value at $q$
is defined as the image of $v^a$. In other words, $\tilde v^a|_q$ is the result of parallelly
transporting $v^a$ to $q$. For Euclidean, Minkowski and any other flat space, this parallel
transport is curve-independent; thus, there is no need to specify a curve when we
talk about “parallelly transporting a vector at $p$ to $q$”. This simplicity is called the
absoluteness of the parallel transport, which is quite familiar to us. (Do you specify a
curve when you parallelly transport a vector from one point to another in Euclidean
space?) However, it is not as simple for a curved space. It can be proved that [see
Wald (1984) pp. 37–38; Straumann (1984) Theorem 5.7] a necessary and sufficient
condition for the intrinsic curvature $R_{abc}{}^{d}$ to be nonvanishing is that there exists a
closed curve such that a vector at a point on the curve will not return to itself when
parallelly transported along the curve; therefore, the parallel transport depends on
the curve (there is only a curve-dependent concept of parallel transport). Spherical
geometry provides a simple and intuitive example of this phenomenon:

Example 1 It can be computed that the $R_{abc}{}^{d}$ of a 2-dimensional sphere (together
with the induced metric) in 3-dimensional Euclidean space is nonvanishing (see
Exercise 3.13). Figure 3.10 indicates that there exists a closed curve $abca$ (each
segment is an arc of a great circle) such that a vector fails to return to itself when
parallelly transported along the curve. Take the vector $v^a$ at $a$ in the figure for example;
it is a tangent vector of the geodesic $ab$. Since the tangent of a geodesic is parallelly
transported along the geodesic, the result of the parallel transport of $v^a$ to $b$ is $u^a$
(see Fig. 3.10), which is orthogonal to the tangent vector $T^a$ of $bc$. Since parallel
transport preserves orthogonality, as shown in the figure, the result of parallelly
transporting $u^a$ to $c$ is $w^a$. $w^a$ is tangent to the geodesic $ac$, and hence the vector $v'^a$
obtained by parallelly transporting $w^a$ back to $a$ is also tangent to $ac$; therefore, $v'^a \neq v^a$.
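
As a rough quantitative sketch of Example 1, assume (the figure is not reproduced here, so this configuration is an assumption) that $a$ is a pole of a sphere of radius $R$, that $b$ and $c$ lie on the equator, and that $\alpha$ denotes the angle between the two meridians at $a$. The symbols $R$ and $\alpha$ are introduced only for this illustration. Under these assumptions the rotation of the transported vector equals the area enclosed by the curve divided by $R^2$:

```latex
% Assumed configuration (not stated in the text): a is a pole of a sphere of radius R,
% b and c lie on the equator, and \alpha is the angle between the meridians ab and ac at a.
\begin{align*}
  & \angle(v'^a, v^a) = \alpha
     && \text{($v^a$ is tangent to $ab$ and $v'^a$ is tangent to $ac$ at $a$)}, \\
  & \mathrm{Area}(abc) = R^2\left(\alpha + \tfrac{\pi}{2} + \tfrac{\pi}{2} - \pi\right) = R^2\alpha
     && \text{(spherical excess; right angles at $b$ and $c$)}, \\
  & \angle(v'^a, v^a) = \frac{\mathrm{Area}(abc)}{R^2} .
\end{align*}
```

The vector returns to itself only in the degenerate case $\alpha = 0$, where the closed curve encloses no area.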
Fig. 3.10 A vector $v^a$ at $a$ becomes $v'^a \neq v^a$ after being parallelly transported along a
closed curve $abca$ on the sphere

Fig. 3.11 Identifying $l_1$ and $l_2$ yields a cylindrical surface

(3) There exist geodesics that are initially parallel but do not remain parallel.
The two meridians in Fig. 3.10 give an intuitive example. For the precise meaning
see Sect. 7.6.
The curvature tensor field $R_{abc}{}^{d}$ of a flat space vanishes, and thus a flat space does not
have any of the three properties above. Specifically, ① the derivative operator $\partial_a$
associated with the flat metric (i.e., the ordinary derivative operator of a Cartesian
or Lorentzian system) is commutative; ② the parallel transport
of a vector does not depend on the curve, and thus one can talk about the “absolute
parallel transport” of a vector; ③ parallel lines never intersect.
The intrinsic curvature and the extrinsic curvature are two different concepts. For
instance, a 2-dimensional cylindrical surface in 3-dimensional Euclidean space has
a nonzero extrinsic curvature but a zero intrinsic curvature. A cylindrical surface can
be viewed as the part of a plane between two parallel lines $l_1$ and $l_2$ after identifying
(gluing together) these two lines (see Fig. 3.11). Since the computation of $R_{abc}{}^{d}$
at $p$ only involves a neighborhood of $p$, it does not become nonzero due to the
identification of $l_1$ and $l_2$.
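
The vanishing intrinsic curvature of the cylinder can also be checked directly in coordinates. The following is a minimal sketch, assuming coordinates $(z, \varphi)$ on a cylinder of radius $R_0$ (these symbols are chosen here for illustration and do not appear in the text) and the standard expression for the Christoffel symbols in terms of the metric components:

```latex
% Assumed coordinates (z, \varphi) on a cylinder of radius R_0 embedded in Euclidean space.
\begin{align*}
  & ds^2 = dz^2 + R_0^2\, d\varphi^2
     && \text{(induced metric; all components are constant)}, \\
  & \Gamma^{\sigma}{}_{\mu\nu}
     = \tfrac12 g^{\sigma\rho}\left(\partial_\mu g_{\nu\rho} + \partial_\nu g_{\mu\rho}
        - \partial_\rho g_{\mu\nu}\right) = 0, \\
  & R_{\mu\nu\sigma}{}^{\rho} = 0
     && \text{(every term contains a $\Gamma$ or a derivative of a $\Gamma$)}.
\end{align*}
```

The same computation applied to the plane in Cartesian coordinates gives the same result, which is why gluing $l_1$ to $l_2$ cannot change the intrinsic curvature.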

Exercises

˜3.1. Now we give up the torsion-free condition.
(1) Show that there exists a tensor $T^c{}_{ab}$ (called the torsion tensor) such that

$$\nabla_a\nabla_b f - \nabla_b\nabla_a f = -T^c{}_{ab}\nabla_c f, \quad \forall f \in \mathscr{F}.$$
Hint: let $\tilde\nabla_a$ be a torsion-free derivative operator and imitate the derivation in
Theorem 3.1.4.
(2) Show that $T^c{}_{ab}u^a v^b = u^a\nabla_a v^c - v^a\nabla_a u^c - [u, v]^c$, $\forall u^a, v^a \in \mathscr{F}(1,0)$.
3.2. Suppose $v^a$ is a vector field, $v^\nu$ and $v'^\nu$ are the components of $v^a$ in coordinate
systems $\{x^\nu\}$ and $\{x'^\nu\}$, respectively; $A^\nu{}_\mu \equiv \partial v^\nu/\partial x^\mu$, $A'^\nu{}_\mu \equiv \partial v'^\nu/\partial x'^\mu$. Show
that the relation between $A^\nu{}_\mu$ and $A'^\nu{}_\mu$ generally does not satisfy the tensor
components transformation law. Hint: use the transformation law between $v^\nu$
and $v'^\nu$.
˜3.3. Prove Theorem 3.1.7.
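3.4. Using the following definition of $\Gamma^\sigma{}_{\mu\nu}$: $(\partial/\partial x^\nu)^b\,\nabla_b(\partial/\partial x^\mu)^a = \Gamma^\sigma{}_{\mu\nu}(\partial/\partial x^\sigma)^a$, show
that
(a) $\Gamma^\sigma{}_{\mu\nu} = \Gamma^\sigma{}_{\nu\mu}$; (Hint: use the fact that $\nabla_a$ is torsion free and that coordinate
basis vectors commute.)
(b) $v^\nu{}_{;\mu} = v^\nu{}_{,\mu} + \Gamma^\nu{}_{\mu\beta}v^\beta$. (NB: This is actually an equivalent definition of
Christoffel symbols.)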
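˜3.5. Determine whether each of the following equations is true or false:
(1) $\nabla_a(dx^\mu)_b = 0$;
(2) $v^\nu{}_{;\mu} = (\nabla_a v^b)(\partial/\partial x^\mu)^a(dx^\nu)_b$;
(3) $v^\nu{}_{,\mu} = (\partial_a v^b)(\partial/\partial x^\mu)^a(dx^\nu)_b$;
(4) $v^\nu{}_{;\mu} = (\partial/\partial x^\mu)^a\nabla_a v^\nu$;
(5) $v^\nu{}_{,\mu} = (\partial/\partial x^\mu)^a\nabla_a v^\nu$.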
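˜3.6. Suppose $C(t)$ is a curve in the coordinate patch of $\{x^\mu\}$, $x^\mu(t)$ is the parametric
representation of $C(t)$ in this coordinate system, and $v^a$ is a vector field on
$C(t)$. Let $\mathrm{D}v^\mu/\mathrm{d}t \equiv (dx^\mu)_a(\partial/\partial t)^b\nabla_b v^a$. Show that

$$\frac{\mathrm{D}v^\mu}{\mathrm{d}t} = \frac{\mathrm{d}v^\mu}{\mathrm{d}t} + \Gamma^\mu{}_{\nu\sigma}v^\sigma\frac{\mathrm{d}x^\nu(t)}{\mathrm{d}t}.$$

˜3.7. Find all of the nonvanishing $\Gamma^\sigma{}_{\mu\nu}$ of a spherical coordinate system in
3-dimensional Euclidean space.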
3.8. Suppose $I$ is an interval of $\mathbb{R}$, and $C: I \to M$ is a curve in $(M, \nabla_a)$. Show
that $\forall s, t \in I$, the translation map $\psi: V_{C(s)} \to V_{C(t)}$ (see Fig. 3.2) is an
isomorphism.
˜3.9. Prove Theorems 3.3.2, 3.3.3 and 3.3.5.
˜3.10. (a) Write down the geodesic equation of the spherical metric $ds^2 = R^2(d\theta^2 +
\sin^2\theta\, d\varphi^2)$ (where $R$ is a constant); (b) verify that any arc of a great circle
satisfies the geodesic equation. Hint: choose a spherical coordinate system
$\{\theta, \varphi\}$ such that the given great circle arc is part of the equator, and use $\varphi$ as
the affine parameter.
˜3.11. Prove Theorem 3.4.2.
*3.12. Prove (3.4.10).
˜3.13. Derive all of the components of the Riemann tensor of the spherical metric (see
Exercise 3.10) in the $\{\theta, \varphi\}$ coordinate system.
3.14. Derive all of the components of the Riemann tensor of the metric $ds^2 =
\Omega^2(t, x)(-dt^2 + dx^2)$ in the $\{t, x\}$ coordinate system (use $\dot\Omega$ and $\Omega'$ to represent
the partial derivatives of the function $\Omega$ with respect to $t$ and $x$, respectively).
3.15. Derive all of the components of the Riemann tensor of the metric $ds^2 =
z^{-1/2}(-dt^2 + dz^2) + z(dx^2 + dy^2)$ in the $\{t, x, y, z\}$ coordinate system.
3.16. Suppose $\alpha(z)$, $\beta(z)$, $\gamma(z)$ are three arbitrary functions, $h = t + \alpha(z)x +
\beta(z)y + \gamma(z)$. Derive all of the components of the Riemann tensor of the metric

$$ds^2 = -dt^2 + dx^2 + dy^2 + h^2 dz^2$$

in the $\{t, x, y, z\}$ coordinate system.


3.17. Show that the Einstein tensor of a 2-dimensional generalized Riemannian space
vanishes. Hint: the Riemann tensor of a 2-dimensional generalized Riemannian
space has only one independent component.

References

Bergmann, P. G. (1976), Introduction to the Theory of Relativity, Dover Publications Inc., New
York.
Chern, S. S., Chen, W. & Lam, K. S. (1999), Lectures on Differential Geometry, World Scientific
Publishing Company, Singapore.
Hawking, S. W. & Ellis, G. F. R. (1973), The Large Scale Structure of Space-Time, Cambridge
University Press, Cambridge.
Straumann, N. (1984), General Relativity and Relativistic Astrophysics, Springer-Verlag, Berlin.
Wald, R. M. (1984), General Relativity, The University of Chicago Press, Chicago.
Chapter 4
Lie Derivatives, Killing Fields
and Hypersurfaces

4.1 Maps of Manifolds

Suppose $M$ and $N$ are manifolds (whose dimensions can be different) and $\phi: M \to N$
is a smooth map. Let $\mathscr{F}_M(k, l)$ and $\mathscr{F}_N(k, l)$ represent the collections of all smooth
tensor fields of type $(k, l)$ on $M$ and $N$, respectively. $\phi$ naturally induces a series of
maps as follows.
Definition 1 The pullback map $\phi^*: \mathscr{F}_N \to \mathscr{F}_M$ is defined as

$$(\phi^* f)|_p := f|_{\phi(p)}, \quad \forall f \in \mathscr{F}_N,\ p \in M,$$

i.e., $\phi^* f = f\circ\phi$, see Fig. 4.1.


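From Definition 1 it is not difficult to prove that

(1) $\phi^*: \mathscr{F}_N \to \mathscr{F}_M$ is a linear map, i.e.,

$$\phi^*(\alpha f + \beta g) = \alpha\phi^*(f) + \beta\phi^*(g), \quad \forall f, g \in \mathscr{F}_N,\ \alpha, \beta \in \mathbb{R}.$$

(2) $$\phi^*(fg) = \phi^*(f)\,\phi^*(g), \quad \forall f, g \in \mathscr{F}_N. \tag{4.1.1}$$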

Definition 2 For any point $p \in M$ one can define the pushforward map $\phi_*: V_p \to
V_{\phi(p)}$ as follows: $\forall v^a \in V_p$, define its image $\phi_* v^a \in V_{\phi(p)}$ as

$$(\phi_* v)(f) := v(\phi^* f), \quad \forall f \in \mathscr{F}_N. \tag{4.1.2}$$

It should also be verified (Exercise 4.1) that the $\phi_* v^a$ defined in this manner satisfies
the two conditions for a vector in Definition 2 of Sect. 2.2 and is thus indeed a vector
at $\phi(p)$. Many works refer to $\phi_*$ as the tangent map of $\phi$.
Theorem 4.1.1 $\phi_*: V_p \to V_{\phi(p)}$ is a linear map, i.e.,

$$\phi_*(\alpha u^a + \beta v^a) = \alpha\phi_* u^a + \beta\phi_* v^a, \quad \forall u^a, v^a \in V_p,\ \alpha, \beta \in \mathbb{R}.$$


© Science Press 2023
C. Liang and B. Zhou, Differential Geometry and General Relativity,
Graduate Texts in Physics, https://doi.org/10.1007/978-981-99-0022-0_4
Fig. 4.1 The definition of $\phi^* f$

Proof Exercise 4.2. 

Theorem 4.1.2 Suppose $C(t)$ is a curve in $M$ and $T^a$ is the tangent vector of the
curve at a point $C(t_0)$, then $\phi_* T^a \in V_{\phi(C(t_0))}$ is the tangent vector of the curve $\phi(C(t))$
at $\phi(C(t_0))$ (the image of the tangent vector of a curve is the tangent vector of the
image of the curve).

Proof Exercise 4.2. Hint: use the definition of the tangent vector of a curve
[see (2.2.6)]. □

Definition 3 The pullback map can be extended to $\phi^*: \mathscr{F}_N(0, l) \to \mathscr{F}_M(0, l)$ in
the following way: $\forall T \in \mathscr{F}_N(0, l)$ define $\phi^* T \in \mathscr{F}_M(0, l)$ as

$$(\phi^* T)_{a_1\cdots a_l}|_p\,(v_1)^{a_1}\cdots(v_l)^{a_l} := T_{a_1\cdots a_l}|_{\phi(p)}\,(\phi_* v_1)^{a_1}\cdots(\phi_* v_l)^{a_l}, \quad \forall p \in M,\ v_1, \ldots, v_l \in V_p. \tag{4.1.3}$$

Definition 4 $\forall p \in M$ the pushforward map can be extended to $\phi_*: \mathscr{T}_{V_p}(k, 0) \to
\mathscr{T}_{V_{\phi(p)}}(k, 0)$ in the following manner [namely, $\phi_*$ is a map that turns a tensor of
type $(k, 0)$ at $p$ into a tensor of the same type at $\phi(p)$]: $\forall T \in \mathscr{T}_{V_p}(k, 0)$ its image
$\phi_* T \in \mathscr{T}_{V_{\phi(p)}}(k, 0)$ is defined by the following equation:

$$(\phi_* T)^{a_1\cdots a_k}(\omega^1)_{a_1}\cdots(\omega^k)_{a_k} := T^{a_1\cdots a_k}(\phi^*\omega^1)_{a_1}\cdots(\phi^*\omega^k)_{a_k}, \quad \forall \omega^1, \ldots, \omega^k \in V^*_{\phi(p)},$$

where $(\phi^*\omega)_a$ is defined as $(\phi^*\omega)_a v^a := \omega_a(\phi_* v)^a$, $\forall v^a \in V_p$.

Remark 1 Definition 2 is nothing but the special case of Definition 4 with $k = 1$. If
we regard a scalar field as a tensor field of type $(0, 0)$, then Definition 1 is nothing
but the special case of Definition 3 with $l = 0$. Definition 3 indicates that the pullback
map $\phi^*$ turns a tensor field of type $(0, l)$ on $N$ into a tensor field of the same type
on $M$, and thus is a map that turns a field into a field; while according to Definition 4,
the pushforward map $\phi_*$ only turns a tensor of type $(k, 0)$ at a point $p$ in $M$ into a
tensor of the same type at the image point $\phi(p)$. Can we extend $\phi_*$ to a map that
turns a tensor field of type $(k, 0)$ on $M$ into a tensor field of the same type on $N$?
Generally speaking, we cannot. Take a vector field as an example. Given a vector
field $v$ on $M$, to define the image field $\phi_* v$ on $N$ we need to define a vector at each
point $q$ of $N$, which always involves the inverse image $\phi^{-1}(q)$. [This is analogous
to Definition 3, according to which $\phi^*$ turns a field $T$ on $N$ into a field $\phi^* T$ on
$M$, and to define the value of $\phi^* T$ at any point $p$ of $M$ one needs the value of $T$
at $\phi(p)$.] If $\phi$ is not an onto map, then $\phi^{-1}(q)$ may not exist, and thus we cannot
use the $v$ at $\phi^{-1}(q)$ as the $v$ on the right-hand side of (4.1.2). If $\phi$ is not a one-to-one
map, then the inverse image $\phi^{-1}(q)$ may contain more than one point, and
thus we cannot determine which of these points should supply the $v$
on the right-hand side of (4.1.2). This implies that if $\phi$ is just a smooth map, then
$\phi_*$ cannot necessarily push forward a field to a field. However, if $\phi: M \to N$ is a
diffeomorphism, then the trouble we just mentioned disappears. The pushforward
map $\phi_*$ can be viewed as a map that turns a tensor field of type $(k, 0)$ on $M$ into a
tensor field of the same type on $N$, i.e., $\phi_*: \mathscr{F}_M(k, 0) \to \mathscr{F}_N(k, 0)$. Furthermore,
since $\phi^{-1}$ exists and is smooth, its pullback map $(\phi^{-1})^*$ maps $\mathscr{F}_M(0, l)$ to $\mathscr{F}_N(0, l)$,
and can be regarded as the pushforward map of $\phi$. Therefore, $\phi_*$ can be generalized
even further to $\phi_*: \mathscr{F}_M(k, l) \to \mathscr{F}_N(k, l)$. For instance, suppose $T^a{}_b \in \mathscr{F}_M(1, 1)$,
then $(\phi_* T)^a{}_b \in \mathscr{F}_N(1, 1)$ is defined as

$$(\phi_* T)^a{}_b|_q\,\omega_a v^b := T^a{}_b|_{\phi^{-1}(q)}\,(\phi^*\omega)_a(\phi^* v)^b, \quad \forall q \in N,\ \omega_a \in V_q^*,\ v^b \in V_q,$$

where $(\phi^* v)^b$ should be understood as $(\phi^{-1}_* v)^b$. Similarly, the pullback map can also
be generalized to $\phi^*: \mathscr{F}_N(k, l) \to \mathscr{F}_M(k, l)$. The generalized $\phi_*$ and $\phi^*$ are still
linear maps and are inverses of each other.

Suppose $\phi: M \to N$ is a diffeomorphism, $p \in M$, and $\{x^\mu\}$ and $\{y^\mu\}$ are local
coordinate systems of $M$ and $N$, respectively, whose coordinate patches $O_1$ and $O_2$ satisfy
$p \in O_1$ and $\phi(p) \in O_2$. Thus, $p \in \phi^{-1}[O_2]$. $\phi$ being a diffeomorphism ensures that
$M$ and $N$ have the same dimension, and hence the $\mu$ of both $\{x^\mu\}$ and $\{y^\mu\}$ ranges from
1 to $n$. A diffeomorphism is originally defined to be a transformation of points; however,
it can also be viewed as a transformation of coordinates since we can define a set
of new coordinates $\{x'^\mu\}$ on $\phi^{-1}[O_2]$ using $\phi: M \to N$ as follows: $\forall q \in \phi^{-1}[O_2]$
define $x'^\mu(q) := y^\mu(\phi(q))$. Thus, a diffeomorphism $\phi$ automatically induces a
coordinate transformation $x^\mu \mapsto x'^\mu$ in the neighborhood $O_1 \cap \phi^{-1}[O_2]$ of $p$. It is not
difficult to prove from Theorem 4.1.2 that $\forall q \in O_1 \cap \phi^{-1}[O_2]$ we have

$$\phi_*[(\partial/\partial x'^\mu)^a|_q] = (\partial/\partial y^\mu)^a|_{\phi(q)}, \tag{4.1.4}$$

from which one can also show that

$$\phi_*[(dx'^\mu)_a|_q] = (dy^\mu)_a|_{\phi(q)}. \tag{4.1.5}$$

Therefore, there exist two different viewpoints for a diffeomorphism $\phi: M \to N$:
① the active viewpoint, which considers $\phi$ as, by definition, a transformation of
points [which turns $p$ into $\phi(p)$] and the consequent tensor transformation [which
turns a tensor $T$ at $p$ into a tensor $\phi_* T$ at $\phi(p)$]; ② the passive viewpoint, which
regards $p$ and all tensors at $p$ as unchanged, and the consequence of $\phi: M \to N$ is
simply a coordinate transformation (which turns $\{x^\mu\}$ into $\{x'^\mu\}$). Although these two
viewpoints seem quite at odds with each other, they are equivalent for all practical
purposes. The theorem below can be seen as a manifestation of this
equivalence.

Theorem 4.1.3

$$(\phi_* T)^{\mu_1\cdots\mu_k}{}_{\nu_1\cdots\nu_l}|_{\phi(p)} = T'^{\mu_1\cdots\mu_k}{}_{\nu_1\cdots\nu_l}|_p, \quad \forall T \in \mathscr{F}_M(k, l), \tag{4.1.6}$$

where the left-hand side are the components of the new tensor $\phi_* T$ at the new point
$\phi(p)$ in the old coordinate system $\{y^\mu\}$, and the right-hand side are the components
of the old tensor $T$ at the old point $p$ in the new coordinate system $\{x'^\mu\}$.

Proof Exercise 4.2. 

Remark 2 Equation (4.1.6) is an equality of real numbers. The left-hand side is the
number coming from the active viewpoint (which regards the point and the tensor as
changed but the coordinate system as unchanged), while the right-hand side is the
number coming from the passive viewpoint (which regards the point and the tensor as
unchanged but the coordinate system as changed). The equality of the two sides indicates
that these two viewpoints are equivalent for all practical purposes.

Example 1 Suppose $T^{a_1\cdots a_k}{}_{b_1\cdots b_l}$ in Theorem 4.1.3 is a vector $v^a$. Let $u^a \equiv \phi_* v^a \in
V_{\phi(p)}$, then it is not difficult to prove from (4.1.6) that

$$u^\mu = v^\nu(\partial x'^\mu/\partial x^\nu)|_p. \tag{4.1.7}$$
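
To make (4.1.7) concrete, here is a small worked sketch. The map $\phi$ and the manifolds are chosen purely for illustration and are not taken from the text: $M = N = \mathbb{R}^2$ with natural coordinates $\{x^1, x^2\}$ on $M$ and $\{y^1, y^2\}$ on $N$, and $\phi(x^1, x^2) = (2x^1, x^2)$. The result of (4.1.7) is compared with the direct use of definition (4.1.2):

```latex
% Illustrative choice (not from the text): M = N = R^2 with natural coordinates {x^1, x^2} on M
% and {y^1, y^2} on N, and \phi(x^1, x^2) = (2x^1, x^2).
\begin{align*}
  & x'^1 = y^1\circ\phi = 2x^1, \quad x'^2 = y^2\circ\phi = x^2
     && \text{(coordinates on $M$ induced by $\phi$)}, \\
  & u^1 = v^\nu\,\partial x'^1/\partial x^\nu = 2v^1, \quad
    u^2 = v^\nu\,\partial x'^2/\partial x^\nu = v^2
     && \text{[from (4.1.7)]}, \\
  & (\phi_* v)(f) = v(f\circ\phi)
     = 2v^1\,\frac{\partial f}{\partial y^1} + v^2\,\frac{\partial f}{\partial y^2}
     && \text{[directly from (4.1.2)]}.
\end{align*}
```

Both routes give the components $(2v^1, v^2)$ in the coordinate basis of $\{y^\mu\}$ at $\phi(p)$, as the equivalence of the two viewpoints suggests.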

[Optional Reading 4.1.1]


Now we further explain the equivalence of the active and passive viewpoints. Suppose
$T_{ab}$ is a tensor field on $M$, then its components in a coordinate system $\{x^\sigma\}$ form a set
of functions of the coordinates $x^\sigma$, i.e., $T_{\mu\nu}(x^\sigma)$. Suppose there is a coordinate transformation
$\{x^\sigma\} \to \{x'^\sigma\}$, then the components of $T_{ab}$ in the coordinate system $\{x'^\sigma\}$ form a set of
functions of the coordinates $x'^\sigma$, i.e., $T'_{\mu\nu}(x'^\sigma)$. These two sets of functions are different in
general. (Here we mean that the expressions for $T_{\mu\nu}$ and $T'_{\mu\nu}$ are different; the symbols
for the arguments do not matter.) If we want to obtain the set of functions $T'_{\mu\nu}$ from
the function set $T_{\mu\nu}$, we just need to perform the coordinate transformation, but not a
transformation of points and tensors on the manifold; that is, there is no need to employ a
map between manifolds and the map of tensors induced by it. This can be called the “passive
approach” to acquiring the new set of functions $T'_{\mu\nu}$. However, the same effect can also be
obtained by adopting the following “active approach”. Suppose $N$ is another manifold and
there exists a diffeomorphism $\phi: M \to N$, then $\tilde T_{ab} \equiv \phi_* T_{ab}$ is a tensor field on $N$, the
components of which in a coordinate system $\{y^\sigma\}$ are also a set of functions $\tilde T_{\mu\nu}(y^\sigma)$ that,
in general, have a different form than $T_{\mu\nu}(x^\sigma)$. This approach involves the transformation
of points ($\phi: M \to N$) and the transformation of tensor fields ($\phi_*: T_{ab} \mapsto \tilde T_{ab}$) but not any
coordinate transformation, which is exactly what the active viewpoint means. In order to
make sure that they lead to the same end, that is, that the new function sets $\tilde T_{\mu\nu}$ and
$T'_{\mu\nu}$ coming from the active and passive approaches are the same, we only need to take the
coordinate transformation on $M$ induced by the diffeomorphism $\phi: M \to N$ in the active
approach as the coordinate transformation $\{x^\sigma\} \to \{x'^\sigma\}$ in the passive approach. In fact, if
we suppose $p \in M$ and $q \equiv \phi(p) \in N$, then
$$\tilde T_{\mu\nu}(y^\sigma(q)) = \tilde T_{\mu\nu}|_q = (\phi_* T)_{\mu\nu}|_q = T'_{\mu\nu}|_p = T'_{\mu\nu}(x'^\sigma(p)) = T'_{\mu\nu}(y^\sigma(q)),$$

where Theorem 4.1.3 and the requirement of “setting the coordinate transformation induced
by $\phi: M \to N$ as $\{x^\sigma\} \to \{x'^\sigma\}$” are applied in the third and the fifth equality, respectively.
The equation above indicates that $\tilde T_{\mu\nu}(y^\sigma) = T'_{\mu\nu}(y^\sigma)$, i.e., the functions $\tilde T_{\mu\nu}$ and $T'_{\mu\nu}$ are
the same.
This is only an example that shows the equivalence of the active and passive viewpoints
for practical purposes. The fact that Theorem 4.1.3 was used in a key step of the proof
indicates once again that this theorem is some kind of manifestation of this equivalence.
[The End of Optional Reading 4.1.1]

[Optional Reading 4.1.2]


In this optional reading, we introduce several useful theorems as supplements.

Theorem 4.1.4 Suppose $\phi: M \to N$ is a smooth map, then $\forall T \in \mathscr{F}_N(0, l)$, $T' \in \mathscr{F}_N(0, l')$
we have
$$\phi^*(T \otimes T') = \phi^*(T) \otimes \phi^*(T'). \tag{4.1.8}$$

Proof The reader should add abstract indices to the equation and carry out the proof. □

Theorem 4.1.5 Suppose $\phi: M \to N$ is a smooth map, then $\forall T \in \mathscr{T}_{V_p}(k, 0)$, $T' \in \mathscr{T}_{V_p}(k', 0)$
we have
$$\phi_*(T \otimes T') = \phi_*(T) \otimes \phi_*(T'). \tag{4.1.9}$$

Proof The reader should add abstract indices to the equation and carry out the proof. □

Theorem 4.1.6 Suppose $\phi: M \to N$ is a diffeomorphism, then $\forall T \in \mathscr{F}_M(k, l)$, $T' \in
\mathscr{F}_M(k', l')$ we have
$$\phi_*(T \otimes T') = \phi_*(T) \otimes \phi_*(T'). \tag{4.1.10}$$

Remark 3 ① The above equation is an equality of tensor fields on $N$, while (4.1.9) is just an
equality of tensors at a point $\phi(p) \in N$. ② The above equation still holds if we substitute
$\phi^*$ for $\phi_*$; however, $T$ and $T'$ in this case should be tensor fields on $N$, and the new equation
should be viewed as an equality of tensor fields on $M$.

Proof Exercise. □

Theorem 4.1.7 Suppose $\phi: M \to N$ is a diffeomorphism, then $\phi_*$ (and $\phi^*$) commute with
any contraction.

Proof To show that $\phi_*(CT) = C(\phi_* T)$, first we take a tensor field $T^a{}_b$ on $M$ as an example.
In this case $\phi_*(CT) = C(\phi_* T)$ is an equality of scalar fields on $N$, and all we have to do is to
show that it holds at the image point $\phi(p) \in N$ of any $p \in M$. Suppose $\{(e_\mu)^a\}$ and $\{(e^\mu)_a\}$
are a basis and its dual basis at $p$, then $T^a{}_b = T^\mu{}_\nu(e_\mu)^a(e^\nu)_b$. It follows from (4.1.10) that

$$\phi_* T^a{}_b = (\phi_* T^\mu{}_\nu)[\phi_*(e_\mu)^a][\phi_*(e^\nu)_b],$$

and hence
$$C(\phi_* T) = (\phi_* T^\mu{}_\nu)[\phi_*(e_\mu)^a][\phi_*(e^\nu)_a].$$
Taking $(\partial/\partial x'^\mu)^a$ from (4.1.4) and $(dx'^\mu)_a$ from (4.1.5) as $(e_\mu)^a$ and $(e^\mu)_a$, respectively,
yields
$$[\phi_*(e_\mu)^a][\phi_*(e^\nu)_a] = (\partial/\partial y^\mu)^a(dy^\nu)_a = \delta^\nu{}_\mu.$$
(Actually it can be proved that the above equation holds for any $\{(e_\mu)^a\}$ and $\{(e^\mu)_a\}$ at $p$.)
Therefore,

$$C(\phi_* T) = (\phi_* T^\mu{}_\nu)\delta^\nu{}_\mu = \phi_*(T^\mu{}_\nu\delta^\nu{}_\mu) = \phi_*(T^\mu{}_\mu) = \phi_*(CT).$$

The reader may generalize this proof to a tensor field of arbitrary type on $M$. □

[The End of Optional Reading 4.1.2]

4.2 Lie Derivatives

As we have discussed at the end of Sect. 2.2, a smooth vector field $v^a$ on $M$ gives rise
to a one-parameter group of diffeomorphisms $\phi$.¹ Suppose $T^{\cdots}{}_{\cdots}$ is a smooth tensor
field on $M$, then $\phi_t^* T^{\cdots}{}_{\cdots}$ is also a smooth tensor field of the same type, where $\phi_t$ is a
group element of the one-parameter group of diffeomorphisms $\phi$. The difference
of these two tensor fields at $p \in M$, namely $\phi_t^* T^{\cdots}{}_{\cdots}|_p - T^{\cdots}{}_{\cdots}|_p$, is a tensor at $p$,
and the limit of the quotient $(\phi_t^* T^{\cdots}{}_{\cdots}|_p - T^{\cdots}{}_{\cdots}|_p)/t$ as $t$ approaches zero can be
viewed as some kind of derivative of the tensor field $T^{\cdots}{}_{\cdots}$ at $p$. Therefore, we have
the following definition:
Definition 1
$$\mathscr{L}_v T^{a_1\cdots a_k}{}_{b_1\cdots b_l} := \lim_{t\to 0}\frac{1}{t}\left(\phi_t^* T^{a_1\cdots a_k}{}_{b_1\cdots b_l} - T^{a_1\cdots a_k}{}_{b_1\cdots b_l}\right) \tag{4.2.1}$$

is called the Lie derivative of a tensor field $T^{a_1\cdots a_k}{}_{b_1\cdots b_l}$ along a vector field $v^a$. (To
avoid confusion, the $v$ in $\mathscr{L}_v$ is not written as $v^a$.)
Remark 1 Since $\phi_t^*$ is a linear map, the Lie derivative is a linear map from $\mathscr{F}_M(k, l)$
to $\mathscr{F}_M(k, l)$. From (4.2.1) and Theorem 4.1.7 we can also see that $\mathscr{L}_v$ commutes with
contractions.
Theorem 4.2.1
$$\mathscr{L}_v f = v(f), \quad \forall f \in \mathscr{F}. \tag{4.2.2}$$

Proof $\forall p \in M$, suppose $C(t)$ is the orbit of $\phi$ that passes through $p$. Set $p = C(0)$,
then $\phi_t(p) = C(t)$, and $v^a|_p \equiv (\partial/\partial t)^a|_p$ is the tangent vector of $C(t)$ at $p$ (see
Fig. 4.2). Hence,

$$\mathscr{L}_v f|_p = \lim_{t\to 0}\frac{1}{t}(\phi_t^* f - f)|_p = \lim_{t\to 0}\frac{1}{t}[f(\phi_t(p)) - f(p)]
= \lim_{t\to 0}\frac{1}{t}[f(C(t)) - f(C(0))] = \frac{\mathrm{d}}{\mathrm{d}t}(f\circ C)\Big|_{t=0} = v(f)|_p. \quad\Box$$

¹ If $v^a$ is incomplete, then it can only give rise to a one-parameter local group of diffeomorphisms.
This section only involves local properties, so there is no need to distinguish local and global.
Fig. 4.2 Figure for the proof of Theorem 4.2.1

Taking $n = 2$ for example, we now introduce a special coordinate system that is
quite useful for computing Lie derivatives. Suppose $\{x^1, x^2\}$ is a coordinate system,
then the $x^1$-coordinate lines and $x^2$-coordinate lines comprise a “coordinate grid”.
To read off the coordinates of a point in the coordinate patch, all we have to do is to find
the two coordinate lines at whose intersection this point is located. Since any Lie
derivative is taken along a vector field $v^a$, we may choose the integral curves of $v^a$ as
the $x^1$-coordinate lines. More precisely, if $v^a = (\partial/\partial t)^a$, we simply choose $t$ to be
the first coordinate $x^1$ of this system, i.e., we set $v^a = (\partial/\partial x^1)^a$. Then, we arbitrarily
choose another set of curves that are transverse to them (that is, the two tangent
vectors at any point of intersection are not parallel) as the $x^2$-coordinate lines. Such
a coordinate system is called a coordinate system adapted to the vector field $v^a$.²
The discussion above can be generalized to manifolds of arbitrary dimensions.

Theorem 4.2.2 The components of the Lie derivative of a tensor field $T^{a_1\cdots a_k}{}_{b_1\cdots b_l}$
along $v^a$ in a coordinate system adapted to $v^a$ are

$$(\mathscr{L}_v T)^{\mu_1\cdots\mu_k}{}_{\nu_1\cdots\nu_l} = \frac{\partial T^{\mu_1\cdots\mu_k}{}_{\nu_1\cdots\nu_l}}{\partial x^1}. \tag{4.2.3}$$
Remark 2 The left-hand side of the above equation satisfies the tensor transformation
law under a coordinate transformation while the right-hand side does not. Hence, this
equation cannot be written as an equality of tensors.

Proof Here, we only take $n = 2$, $k = l = 1$ as an example (it is easily carried over
to the general case). Since $\phi_t^* = (\phi_t^{-1})_* = \phi_{-t*}$, the components of (4.2.1) in an
arbitrary coordinate system can be expressed as

$$(\mathscr{L}_v T)^\mu{}_\nu|_p = \lim_{t\to 0}\frac{1}{t}\left[(\phi_{-t*}T)^\mu{}_\nu|_p - T^\mu{}_\nu|_p\right], \quad \forall p \in M. \tag{4.2.4}$$

Let $q \equiv \phi_t(p)$. Since (4.2.4) only involves the points near $p$, one can always consider
$p$ and $q$ as being in the same adapted coordinate patch. For $\phi_{-t}$, $q$ is the old point
and $p$ is the new point, and hence it follows from (4.1.6) that

² As long as $v^a \neq 0$ at a point, one can always define a coordinate system adapted to $v^a$ in a
neighborhood of the point.
$$(\phi_{-t*}T)^\mu{}_\nu|_p = T'^\mu{}_\nu|_q = \left[\frac{\partial x'^\mu}{\partial x^\rho}\frac{\partial x^\sigma}{\partial x'^\nu}T^\rho{}_\sigma\right]_q, \tag{4.2.5}$$

where $x^\sigma$ are the adapted coordinates (the old coordinates), while $x'^\mu$ are the new
coordinates induced by $\phi_{-t}$. The right-hand side of the above equation involves the
values at $q$ of the partial derivatives between the new and old coordinates; to
calculate them, we need to find the coordinate transformation in a small neighborhood $N$ of
$q$. $\forall\bar q \in N$, denote $\bar p \equiv \phi_{-t}(\bar q)$. From the definition of adapted coordinates we know
that $x^1(\bar q) = x^1(\bar p) + t$, $x^2(\bar q) = x^2(\bar p)$; also, by definition, the new coordinates at $\bar q$
induced by $\phi_{-t}$ are $x'^1(\bar q) \equiv x^1(\bar p)$, $x'^2(\bar q) \equiv x^2(\bar p)$, and hence $x'^1(\bar q) = x^1(\bar q) - t$,
$x'^2(\bar q) = x^2(\bar q)$. Since $\bar q$ is an arbitrary point in $N$, on $N$ we have $x'^1 = x^1 - t$,
$x'^2 = x^2$, and taking the derivatives we get $(\partial x'^\mu/\partial x^\rho)|_q = \delta^\mu{}_\rho$, $(\partial x^\sigma/\partial x'^\nu)|_q =
\delta^\sigma{}_\nu$. Therefore, (4.2.5) becomes $(\phi_{-t*}T)^\mu{}_\nu|_p = T^\mu{}_\nu|_q$, and plugging this into (4.2.4)
yields $(\mathscr{L}_v T)^\mu{}_\nu|_p = \partial T^\mu{}_\nu/\partial x^1|_p$. □

It follows from Theorem 4.2.2 that Lv satisfies the Leibniz rule.

Theorem 4.2.3
$$\mathscr{L}_v u^a = [v, u]^a, \quad \forall u^a, v^a \in \mathscr{F}(1, 0), \tag{4.2.6}$$

or, by means of the expression for a commutator (3.1.13), we have

$$\mathscr{L}_v u^a = v^b\nabla_b u^a - u^b\nabla_b v^a, \tag{4.2.6$'$}$$

where $\nabla_a$ is an arbitrary torsion-free derivative operator.

Proof The claim we are about to prove is an equality of vectors, so all we have to show
is that the corresponding equality of components holds in some coordinate system. The
most convenient one to use is certainly the adapted coordinate system. Suppose the
ordinary derivative operator of a coordinate system $\{x^\mu\}$ adapted to $v^a$ is $\partial_a$, then

$$[v, u]^\mu = (dx^\mu)_a[v, u]^a = (dx^\mu)_a(v^b\partial_b u^a - u^b\partial_b v^a) = v^b\partial_b u^\mu
= v(u^\mu) = \partial u^\mu/\partial x^1 = (\mathscr{L}_v u)^\mu,$$

where the third equality comes from the fact that $v^a = (\partial/\partial x^1)^a$ leads to $\partial_b v^a = 0$,
condition (d) in the definition of a derivative operator is used in the fourth equality,
and (4.2.3) is used in the last step. □
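
As a quick illustration of (4.2.6$'$), consider the following sketch. The two vector fields are chosen here for illustration and are not taken from the text: on $\mathbb{R}^2$ with Cartesian coordinates $\{x, y\}$ and ordinary derivative operator $\partial_a$, take $v^a = -y(\partial/\partial x)^a + x(\partial/\partial y)^a$ and $u^a = (\partial/\partial x)^a$.

```latex
% Illustrative choice (not from the text): v^a = -y(\partial/\partial x)^a + x(\partial/\partial y)^a,
% u^a = (\partial/\partial x)^a on R^2, with \partial_a the ordinary derivative of {x, y}.
\begin{align*}
  & (\mathscr{L}_v u)^\mu = v^\nu\partial_\nu u^\mu - u^\nu\partial_\nu v^\mu
     && \text{[(4.2.6$'$) with $\nabla_a = \partial_a$]}, \\
  & v^\nu\partial_\nu u^\mu = 0
     && \text{(the components $u^\mu = (1, 0)$ are constant)}, \\
  & u^\nu\partial_\nu v^\mu = \partial_x(-y,\, x) = (0,\, 1), \\
  & \mathscr{L}_v u^a = -(\partial/\partial y)^a .
\end{align*}
```

This agrees with the geometric picture: the integral curves of this $v^a$ are circles about the origin, so $\phi_t$ is a rotation by the angle $t$, and pulling $u^a$ back by $\phi_t$ rotates it by $-t$, giving components $(\cos t, -\sin t)$ whose $t$-derivative at $t = 0$ is $(0, -1)$.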

Theorem 4.2.4

$$\mathscr{L}_v\omega_a = v^b\nabla_b\omega_a + \omega_b\nabla_a v^b, \quad \forall v^a \in \mathscr{F}(1, 0),\ \omega_a \in \mathscr{F}(0, 1), \tag{4.2.7}$$

where $\nabla_a$ is an arbitrary torsion-free derivative operator.

Proof Exercise 4.7. Hint: use Theorems 4.2.3 and 4.2.1, the latter of which gives
$\mathscr{L}_v(\omega_a u^a) = v^b\nabla_b(\omega_a u^a)$. □
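
One possible route along the lines of the hint is sketched below; it is an outline rather than the book's own solution. It assumes the Leibniz rule for $\mathscr{L}_v$ and its commutation with contraction, both of which are stated earlier in this section, and uses an arbitrary vector field $u^a$:

```latex
% Sketch following the hint; u^a is an arbitrary vector field.
\begin{align*}
  & \mathscr{L}_v(\omega_a u^a) = (\mathscr{L}_v\omega_a)u^a + \omega_a\mathscr{L}_v u^a
     && \text{(Leibniz rule and commutation with contraction)}, \\
  & \mathscr{L}_v(\omega_a u^a) = v^b\nabla_b(\omega_a u^a)
       = u^a v^b\nabla_b\omega_a + \omega_a v^b\nabla_b u^a
     && \text{[Theorem 4.2.1]}, \\
  & \omega_a\mathscr{L}_v u^a = \omega_a v^b\nabla_b u^a - \omega_a u^b\nabla_b v^a
     && \text{[(4.2.6$'$)]}, \\
  & (\mathscr{L}_v\omega_a)u^a = u^a v^b\nabla_b\omega_a + \omega_a u^b\nabla_b v^a
       = u^a\left(v^b\nabla_b\omega_a + \omega_b\nabla_a v^b\right)
     && \text{(dummy indices relabeled)}.
\end{align*}
```

Since $u^a$ is arbitrary, the last line reproduces (4.2.7).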
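Theorem 4.2.5

$$\mathscr{L}_v T^{a_1\cdots a_k}{}_{b_1\cdots b_l} = v^c\nabla_c T^{a_1\cdots a_k}{}_{b_1\cdots b_l} - \sum_{i=1}^{k} T^{a_1\cdots c\cdots a_k}{}_{b_1\cdots b_l}\nabla_c v^{a_i} + \sum_{j=1}^{l} T^{a_1\cdots a_k}{}_{b_1\cdots c\cdots b_l}\nabla_{b_j}v^c, \tag{4.2.8}$$

$\forall T \in \mathscr{F}(k, l)$, $v \in \mathscr{F}(1, 0)$, where $\nabla_a$ is an arbitrary torsion-free derivative operator.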

Proof Exercise. 

4.3 Killing Vector Fields

Up to this point, this chapter has not yet mentioned any metric or any derivative
operator associated with a metric since the definition of a Lie derivative does not
require any additional structure on the manifold $M$. However, if a metric field $g_{ab}$ is
assigned to $M$, then one can also impose a stronger requirement on a diffeomorphism
$\phi: M \to M$, i.e., $\phi^* g_{ab} = g_{ab}$. Therefore, we have the following definition:

Definition 1 A diffeomorphism $\phi: M \to M$ is called an isometric isomorphism,
or isometry for short, if $\phi^* g_{ab} = g_{ab}$.

Remark 1 ① An isometry is a special diffeomorphism that “preserves the metric”,
namely $\phi^* g_{ab} = g_{ab}$. Note that this is an equality of tensor fields, which means
the two tensors $g_{ab}|_p$ and $\phi^* g_{ab}|_p$ at each point $p$ are equal. ② From $(\phi^{-1})^*\circ\phi^* =
(\phi\circ\phi^{-1})^* = \text{identity}$ (see Exercise 4.5(c)) it is easy to see that $\phi: M \to M$ is an
isometry if and only if $\phi^{-1}: M \to M$ is an isometry.

Among all the vector fields on a manifold $M$ there is a special class of vector
fields, namely the smooth vector fields. Each smooth vector field gives rise to a
one-parameter group of diffeomorphisms.³ If a metric field $g_{ab}$ is assigned to $M$,
then we can also pick a special subclass among all the smooth vector fields, in
which the one-parameter group of diffeomorphisms given by each vector field is a
one-parameter group of isometries; that is, each group element $\phi_t: M \to M$ is an
isometry. Therefore, we have the following definition:

Definition 2 A vector field $\xi^a$ on $(M, g_{ab})$ is called a Killing vector field if its
one-parameter (local) group of diffeomorphisms is a one-parameter (local) group of
isometries. Equivalently (motivated readers should verify this), $\xi^a$ is called a Killing
vector field if $\mathscr{L}_\xi g_{ab} = 0$.

³ We do not require the vector field to be complete. When talking about an incomplete vector field,
the one-parameter group of diffeomorphisms refers to its one-parameter local group of
diffeomorphisms.
Theorem 4.3.1 The necessary and sufficient condition for $\xi^a$ to be a Killing vector
field on $(M, g_{ab})$ is that $\xi^a$ satisfies the following Killing equation:

$$\nabla_a\xi_b + \nabla_b\xi_a = 0, \quad\text{or}\quad \nabla_{(a}\xi_{b)} = 0, \quad\text{or}\quad \nabla_a\xi_b = \nabla_{[a}\xi_{b]}, \tag{4.3.1}$$

where $\nabla_a$ is the torsion-free derivative operator associated with $g_{bc}$ ($\nabla_a g_{bc} = 0$).
Proof For any vector field $\xi^a$, it follows from (4.2.8) that

$$\mathscr{L}_\xi g_{ab} = \nabla_a\xi_b + \nabla_b\xi_a, \tag{4.3.1$'$}$$

where we used the fact that $\nabla_a g_{bc} = 0$. By definition, $\xi^a$ being a Killing vector field
is equivalent to $\mathscr{L}_\xi g_{ab} = 0$. Hence, the necessary and sufficient condition for $\xi^a$ to
be a Killing vector field is that it satisfies the Killing equation $\nabla_a\xi_b + \nabla_b\xi_a = 0$. □
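
The step from (4.2.8) to (4.3.1$'$) can be spelled out as follows. This is a short sketch that uses only (4.2.8) with $k = 0$, $l = 2$ and the condition $\nabla_a g_{bc} = 0$:

```latex
\begin{align*}
  \mathscr{L}_\xi g_{ab}
    &= \xi^c\nabla_c g_{ab} + g_{cb}\nabla_a\xi^c + g_{ac}\nabla_b\xi^c
       && \text{[(4.2.8) with $k = 0$, $l = 2$]} \\
    &= \nabla_a(g_{cb}\xi^c) + \nabla_b(g_{ac}\xi^c)
       && \text{(using $\nabla_a g_{bc} = 0$)} \\
    &= \nabla_a\xi_b + \nabla_b\xi_a .
\end{align*}
```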
Theorem 4.3.2 If there exists a coordinate system $\{x^\mu\}$ such that all the components
of $g_{ab}$ satisfy $\partial g_{\mu\nu}/\partial x^1 = 0$, then $(\partial/\partial x^1)^a$ is a Killing vector field on the
coordinate patch.
Proof $\{x^\mu\}$ is a coordinate system adapted to $(\partial/\partial x^1)^a$. From (4.2.3) we can see that
$(\mathscr{L}_{\partial/\partial x^1}g)_{\mu\nu} = \partial g_{\mu\nu}/\partial x^1 = 0$, and hence $\mathscr{L}_{\partial/\partial x^1}g_{ab} = 0$, i.e., $(\partial/\partial x^1)^a$ is a Killing
vector field. □
Theorem 4.3.3 Suppose $\xi^a$ is a Killing vector field, and $T^a$ is the tangent of a
geodesic, then $T^a\nabla_a(T^b\xi_b) = 0$, i.e., $T^b\xi_b$ is a constant along the geodesic.
Proof $T^a\nabla_a(T^b\xi_b) = \xi_b T^a\nabla_a T^b + T^b T^a\nabla_a\xi_b = T^b T^a\nabla_a\xi_b = 0$, where the
definition of a geodesic is used in the second equality, and Theorem 4.3.1 (i.e.,
$\nabla_a\xi_b = \nabla_{[a}\xi_{b]}$) and Theorem 2.6.2 (d) are used in the third equality. □
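
A minimal illustration of Theorem 4.3.3, in a setting chosen here for concreteness: the Euclidean plane with Cartesian coordinates $\{x, y\}$, where $\nabla_a = \partial_a$, the geodesics are straight lines, and $\xi^a = -y(\partial/\partial x)^a + x(\partial/\partial y)^a$. The constants $x_0$, $y_0$ and the parameter $s$ are introduced only for this sketch.

```latex
% Illustrative setting: Euclidean plane, Cartesian coordinates {x, y}, \nabla_a = \partial_a.
% \xi^a = -y(\partial/\partial x)^a + x(\partial/\partial y)^a, so \xi_x = -y, \xi_y = x.
\begin{align*}
  & \partial_x\xi_x = 0, \quad \partial_y\xi_y = 0, \quad
    \partial_x\xi_y + \partial_y\xi_x = 1 - 1 = 0
     && \text{(Killing equation (4.3.1))}, \\
  & x(s) = x_0 + T^x s, \quad y(s) = y_0 + T^y s
     && \text{(geodesic: straight line, $T^x$, $T^y$ constant)}, \\
  & T^b\xi_b = -y\,T^x + x\,T^y = x_0 T^y - y_0 T^x
     && \text{(independent of $s$)}.
\end{align*}
```

In mechanical terms the conserved quantity is the angular momentum per unit mass of a free particle about the origin.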
Suppose $\xi^a$ and $\eta^a$ are Killing vector fields, and $\alpha$ and $\beta$ are real constants, then from
the linearity of the Killing equation we know that $\alpha\xi^a + \beta\eta^a$ is also a Killing vector
field. It is not difficult to see that the collection of all the Killing vector fields on $M$
is a vector space. It can also be proved (Exercise 4.13) that the commutator $[\xi, \eta]^a$
is also a Killing vector field.
Theorem 4.3.4 There are at most $n(n + 1)/2$ independent Killing vector fields
($n \equiv \dim M$) on $(M, g_{ab})$. That is, the dimension of the collection of all the Killing
vector fields on $M$ (as a vector space) is less than or equal to $n(n + 1)/2$.
Proof See Wald (1984) pp. 442–443. □
Remark 2 ① Isometries can be viewed as symmetry transformations
that “preserve the metric”, and thus a Killing vector field represents a symmetry of
$(M, g_{ab})$. A generalized Riemannian space that has $n(n + 1)/2$ independent Killing
vector fields is called a maximally symmetric space. ② The general method of finding
all the Killing vector fields on $(M, g_{ab})$ is to find the general solution of the Killing
equation. However, for some relatively simple $(M, g_{ab})$, there also exist
methods that are a lot easier. We provide several examples below.

Example 1 Find all the independent Killing vector fields of the following generalized
Riemannian spaces.
(1) 2-dimensional Euclidean space (R2 , δab ). Suppose {x, y} is a Cartesian coor-
dinate system, then ds 2 = dx 2 + dy 2 , i.e., all the components of the Euclidean metric
δab in this coordinate system are constant. Hence, it follows from Theorem 4.3.2 that
(∂/∂ x)a and (∂/∂ y)a are Killing vector fields. We expect a Euclidean space
to be maximally symmetric, in which case it follows from Theorem 4.3.4 that there should be
three independent Killing fields when n = 2. Indeed, if we change to a polar
coordinate system, then ds 2 = dr 2 + r 2 dϕ 2 , and thus all of the components of δab
in this coordinate system are independent of ϕ. Therefore, it follows from Theorem
4.3.2 that (∂/∂ϕ)a is a Killing vector field. Its expansion in the coordinate
basis of a Cartesian coordinate system is (∂/∂ϕ)a = −y(∂/∂ x)a + x(∂/∂ y)a .
The coefficients of the expansion depend on the coordinates, so (∂/∂ϕ)a cannot be
a constant-coefficient linear combination of the first two Killing fields, i.e., it is independent of them. (∂/∂ x)a
and (∂/∂ y)a being Killing reflects the translational invariance of the 2-dimensional
Euclidean metric along the x- and y-axes, while (∂/∂ϕ)a being Killing manifests the
rotational invariance of this metric.
(2) 3-dimensional Euclidean space (R3 , δab ). Since n = 3, there are six inde-
pendent Killing fields, namely (∂/∂ x)a , (∂/∂ y)a , (∂/∂z)a , −y(∂/∂ x)a + x(∂/∂ y)a ,
−z(∂/∂ y)a + y(∂/∂z)a and −x(∂/∂z)a + z(∂/∂ x)a . The first three reflect the trans-
lational invariance of the 3-dimensional Euclidean metric along the x-, y- and z-axes,
while the last three reflect the rotational invariance of the metric about the z-, x- and
y-axes, respectively.
(3) 2-dimensional Minkowski space (R2 , ηab ). In a Lorentzian coordinate system
{t, x} we have ds 2 = −dt 2 + dx 2 , and thus we see that (∂/∂t)a and (∂/∂ x)a are
Killing fields. To find the third one, we define new coordinates ψ and η as follows:

x = ψ cosh η , t = ψ sinh η , 0 < ψ < ∞, −∞ < η < ∞ . (4.3.2)

The Minkowski line element can be expressed in terms of the new coordinates as
ds 2 = dψ 2 − ψ 2 dη2 . This expression indicates that all the components of ηab in the
new coordinate system are independent of the coordinate η, and hence (∂/∂η)a is
also a Killing vector field (whose integral curves are hyperbolas). Its expansion
in the coordinate basis of a Lorentzian coordinate system is

(∂/∂η)a = t (∂/∂ x)a + x(∂/∂t)a . (4.3.3)

From the fact that the coefficients of the expansion are coordinate dependent we
can see that (∂/∂η)a is independent of the first two Killing fields. The coordinate
patch of η and ψ defined by (4.3.2) is just an open subset of R2 which is restricted
by x > |t| (see region A in Fig. 4.3). However, (4.3.3) is defined on the whole R2 ,
and it is not difficult to verify that (∂/∂η)a is a Killing field on R2 . It is timelike
in the regions A and B in Fig. 4.3, spacelike in the regions C and D, and null on
the two lines with 45° tilt. t (∂/∂ x)a + x(∂/∂t)a is called the boost Killing vector
field; it expresses the invariance of the Minkowski metric under a boost,
which corresponds to the Lorentz transformation (for details, see Theorem 4.3.5).

Fig. 4.3 The boost Killing vector field (∂/∂η)a is timelike in the regions A and B, spacelike in the regions C and D, and null on the two 45° lines
(4) 4-dimensional Minkowski space (R4 , ηab ). Since n = 4, there are in total 10
independent Killing fields, divided into three groups:
(a) 4 translations (∂/∂t)a , (∂/∂ x)a , (∂/∂ y)a , (∂/∂z)a ;
(b) 3 spatial rotations
−y(∂/∂ x)a + x(∂/∂ y)a , −z(∂/∂ y)a + y(∂/∂z)a , −x(∂/∂z)a + z(∂/∂ x)a ;
(c) 3 boosts t (∂/∂ x)a + x(∂/∂t)a , t (∂/∂ y)a + y(∂/∂t)a , t (∂/∂z)a + z(∂/∂t)a .
Group (a) reflects the translational invariance of the Minkowski metric along the
t-, x-, y-, z-axes; group (b) reflects the spatial rotational invariance with respect to
z-, x-, y-axes, respectively; group (c) reflects the invariance under the boosts within
the t x-, t y-, t z-planes.
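The ten fields listed in (4) can be checked against the Killing equation (4.3.1). The sketch below assumes Lorentzian coordinates, in which the Christoffel symbols vanish and (4.3.1) reduces to ∂μ ξν + ∂ν ξμ = 0; the derivatives are approximated by central differences, and all function names and the sample point are hypothetical. The dilation x μ (∂/∂ x μ )a is included only as a comparison field that is not Killing.

```python
ETA = [-1.0, 1.0, 1.0, 1.0]  # diagonal of eta_{mu nu} in coordinates (t, x, y, z)

def translation(i):
    return lambda p: [1.0 if j == i else 0.0 for j in range(4)]

def rotation(i, j):
    """-x^j d/dx^i + x^i d/dx^j, e.g. rotation(1, 2) = -y d/dx + x d/dy."""
    def field(p):
        v = [0.0] * 4
        v[i], v[j] = -p[j], p[i]
        return v
    return field

def boost(i):
    """t d/dx^i + x^i d/dt."""
    def field(p):
        v = [0.0] * 4
        v[i], v[0] = p[0], p[i]
        return v
    return field

def killing_residual(field, p, h=1e-3):
    """Largest |d_mu xi_nu + d_nu xi_mu| at p, using central differences."""
    def lowered(q):
        return [ETA[n] * c for n, c in enumerate(field(q))]
    def partial(mu):
        plus, minus = list(p), list(p)
        plus[mu] += h
        minus[mu] -= h
        return [(a - b) / (2 * h) for a, b in zip(lowered(plus), lowered(minus))]
    d = [partial(mu) for mu in range(4)]  # d[mu][nu] approximates d_mu xi_nu
    return max(abs(d[m][n] + d[n][m]) for m in range(4) for n in range(4))

fields = [translation(i) for i in range(4)]
fields += [rotation(1, 2), rotation(2, 3), rotation(3, 1)]
fields += [boost(1), boost(2), boost(3)]

point = [0.3, -1.2, 0.8, 2.5]
residuals = [killing_residual(f, point) for f in fields]
dilation_residual = killing_residual(lambda q: list(q), point)

print(max(residuals), dilation_residual)
```

Since all ten fields are at most linear in the coordinates, the central differences are exact up to rounding, so the residuals of the Killing fields vanish to machine precision while the dilation gives a residual of order one.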

In Sect. 4.1 we already introduced the active and passive viewpoints of a diffeomorphism
(in which the former is a transformation of points and tensor fields while
the latter is a coordinate transformation) and their relationship (a transformation of
points induces a coordinate transformation). Now that we know an isometry is a spe-
cial diffeomorphism, we can expect that the coordinate transformation induced by an
isometry is also a special coordinate transformation. In fact, this is true! First, we will
use the 2-dimensional Euclidean space (R2 , δab ) as an example. Each Killing vector
field will give rise to a one-parameter group of isometries {φλ : R2 → R2 | λ ∈ R}.
From the active viewpoint, there are three kinds of isometries in this group, i.e., three
independent Killing vector fields:
① Translational Killing vector field $(\partial/\partial x)^a$. It induces the translation along the
$x$-direction, which can be expressed as $x' = x + \lambda$, $y' = y$. (This is not difficult to prove
by following the proof of Theorem 4.3.5; the expressions in ② and ③ can be proved
similarly.)
② Translational Killing vector field $(\partial/\partial y)^a$. It induces the translation along the
$y$-direction, which can be expressed as $x' = x$, $y' = y + \lambda$.
③ Rotational Killing vector field $(\partial/\partial\varphi)^a = -y(\partial/\partial x)^a + x(\partial/\partial y)^a$. It induces
the rotation about the origin, which can be expressed in polar coordinates
as $r' = r$, $\varphi' = \varphi + \lambda$, or in Cartesian coordinates as $x' = x\cos\lambda - y\sin\lambda$,
$y' = x\sin\lambda + y\cos\lambda$.
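The rotation formula in ③ can be recovered by integrating the Killing field directly. The following sketch assumes a classical fourth-order Runge-Kutta integrator for the integral curves of −y(∂/∂x) + x(∂/∂y); the step count, starting point and parameter value are hypothetical choices.

```python
import math

def xi(p):
    """Cartesian components of the rotational Killing field at p = (x, y)."""
    x, y = p
    return (-y, x)

def flow(p, lam, steps=1000):
    """Follow the integral curve of xi through p for parameter distance lam (RK4)."""
    h = lam / steps
    x, y = p
    for _ in range(steps):
        k1 = xi((x, y))
        k2 = xi((x + 0.5 * h * k1[0], y + 0.5 * h * k1[1]))
        k3 = xi((x + 0.5 * h * k2[0], y + 0.5 * h * k2[1]))
        k4 = xi((x + h * k3[0], y + h * k3[1]))
        x += h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
        y += h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
    return (x, y)

lam = 0.9
p = (1.5, -0.4)
numeric = flow(p, lam)
closed_form = (p[0] * math.cos(lam) - p[1] * math.sin(lam),
               p[0] * math.sin(lam) + p[1] * math.cos(lam))

print(numeric, closed_form)
```

The numerically transported point agrees with the closed-form rotation up to the integration error, which is what one expects if the one-parameter group generated by the field is the group of rotations about the origin.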
Now we look at the 2-dimensional Minkowski space (R2 , ηab ). It also contains
three kinds of isometries, i.e., three independent Killing vector fields:

① Time translational Killing vector field $(\partial/\partial t)^a$. It induces the time translation
along the $t$-direction, which can be expressed as $t' = t + \lambda$, $x' = x$ (where $x$ and $t$
are Lorentzian coordinates).
② Spatial translational Killing vector field $(\partial/\partial x)^a$. It induces the spatial translation
along the $x$-direction, which can be expressed as $t' = t$, $x' = x + \lambda$.
③ Boost Killing vector field $(\partial/\partial\eta)^a = t(\partial/\partial x)^a + x(\partial/\partial t)^a$. The coordinate
transformation it induces is the well-known Lorentz transformation; see the following
theorem:

Theorem 4.3.5 Suppose $\{x, t\}$ is the Lorentzian coordinate system of the 2-dimensional
Minkowski space $(\mathbb{R}^2, \eta_{ab})$, and $\phi_\lambda : \mathbb{R}^2 \to \mathbb{R}^2$ is a group element of the
one-parameter group of isometries (i.e., $\phi_\lambda$ is the isometry labeled by the parameter
$\lambda \in \mathbb{R}$) that corresponds to the boost Killing field $\xi^a \equiv t(\partial/\partial x)^a + x(\partial/\partial t)^a$. Then
the coordinate transformation $\{x, t\} \to \{x', t'\}$ induced by $\phi_\lambda$ is a Lorentz transformation.

Remark 3 This theorem indicates that a boost and a Lorentz transformation are two
different wordings (active and passive) of the same transformation.

Proof The parametric equations of an integral curve of the vector field satisfy-
ing ξ a ≡ (∂/∂η)a are dx μ (η)/dη = ξ μ (μ = 0, 1). Noticing that ξ a ≡ t (∂/∂ x)a +
x(∂/∂t)a [see (4.3.3)], we have

$$\frac{dx(\eta)}{d\eta} = t(\eta)\,, \qquad \frac{dt(\eta)}{d\eta} = x(\eta)\,. \tag{4.3.4}$$

∀ p ∈ R2 , suppose C(η) is the integral curve that satisfies p = C(0), i.e., x(0) = x p ,
t (0) = t p , then it is not difficult to prove that (4.3.4) have the particular solutions
(i.e., the parametric equations of the curve)

x(η) = x p cosh η + t p sinh η , t (η) = x p sinh η + t p cosh η . (4.3.5)

Suppose $q \equiv \phi_\lambda(p)$, then $q$ is the point on $C(\eta)$ that has the parameter $\eta = \lambda$, i.e.,
$q = C(\lambda)$. Hence, the new coordinates $t'$ and $x'$ induced by $\phi_\lambda$ satisfy

$$x'_p \equiv x_q = x_p \cosh\lambda + t_p \sinh\lambda\,, \qquad t'_p \equiv t_q = x_p \sinh\lambda + t_p \cosh\lambda\,.$$

Since $p$ is arbitrary, we can drop the subscript $p$ and write

$$x' = x\cosh\lambda + t\sinh\lambda = \cosh\lambda\,(x + t\tanh\lambda)\,,$$
$$t' = t\cosh\lambda + x\sinh\lambda = \cosh\lambda\,(t + x\tanh\lambda)\,. \tag{4.3.6}$$

Let $v \equiv \tanh\lambda$, $\gamma \equiv (1 - v^2)^{-1/2} = \cosh\lambda$, then

$$x' = \gamma(x + vt)\,, \qquad t' = \gamma(t + vx)\,. \tag{4.3.7}$$



This is exactly the well-known Lorentz transformation. (Note that we have applied
the system of geometrized units, where the speed of light c = 1.) 
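The relations between (4.3.6) and (4.3.7) can be checked with a few lines of arithmetic. The sketch below assumes geometrized units (c = 1) and arbitrary sample values for the event and the parameters; it compares the hyperbolic form with the γ, v form, verifies that the interval −t² + x² is preserved, and illustrates the one-parameter group property (the parameters of successive boosts add).

```python
import math

def boost(t, x, lam):
    """Return (t', x') according to (4.3.6)."""
    return (t * math.cosh(lam) + x * math.sinh(lam),
            x * math.cosh(lam) + t * math.sinh(lam))

def interval(t, x):
    """The quantity -t^2 + x^2 built from the Minkowski line element."""
    return -t * t + x * x

lam1, lam2 = 0.75, -0.3
v = math.tanh(lam1)
gamma = 1.0 / math.sqrt(1.0 - v * v)

t, x = 2.0, 0.5
t_new, x_new = boost(t, x, lam1)

# Form (4.3.7) in terms of gamma and v.
t_lorentz, x_lorentz = gamma * (t + v * x), gamma * (x + v * t)

# Two successive boosts versus a single boost with the summed parameter.
composed = boost(*boost(t, x, lam1), lam2)
direct = boost(t, x, lam1 + lam2)

print(interval(t, x), interval(t_new, x_new))
```

Note that the additive parameter is λ, not v; the velocities tanh λ combine according to the relativistic addition law.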
[Optional Reading 4.3.1]
For any point p in R2 , C(η) in the proof above is a complete curve, i.e., η ∈ (−∞, ∞).
If p is in the region A or B, then C(η) is timelike; if p is in the region C or D, then C(η) is
spacelike; if p is on the lines with 45◦ tilt, then C(η) is null. The most special case is that
p = (0, 0), i.e., p is the origin of the {t, x} system, where C(η) = p (a single-point curve).
Thus, each line with 45◦ tilt is not one integral curve but the union of 3 integral curves,
in which the first and second ones are the upper and lower halves (excluding the origin),
respectively, and the third one is the single-point curve { p}. The parameter of each of
these 3 curves ranges over (−∞, ∞).
[The End of Optional Reading 4.3.1]

It is easy to obtain from $ds^2 = -dt^2 + dx^2$ and (4.3.7) that $ds^2 = -dt'^2 + dx'^2$,
and thus the coordinate transformation induced by the isometry that corresponds to
a boost turns a Lorentzian system $\{t, x\}$ into another Lorentzian system $\{t', x'\}$. This
conclusion can be generalized to the following theorem:

Theorem 4.3.6 Suppose $\{x^\mu\}$ is a Lorentzian coordinate system of $(\mathbb{R}^n, \eta_{ab})$, then
the necessary and sufficient condition for $\{x'^\mu\}$ to also be a Lorentzian coordinate
system is that it is induced by $\{x^\mu\}$ through an isometry $\phi : \mathbb{R}^n \to \mathbb{R}^n$.

Proof Denote $\eta_{ab}$ by $g_{ab}$, and denote its components in coordinate systems $\{x^\mu\}$ and
$\{x'^\mu\}$ as $g_{\mu\nu}$ and $g'_{\mu\nu}$, respectively.
(A) Suppose $\phi : \mathbb{R}^n \to \mathbb{R}^n$ is an isometry (i.e., $\phi^* g_{ab} = g_{ab}$), and $\{x'^\mu\}$ is the
coordinate system induced by the Lorentzian system $\{x^\mu\}$ through $\phi$, then $\forall p \in \mathbb{R}^n$
we have $g'_{\mu\nu}|_p = (\phi_* g)_{\mu\nu}|_{\phi(p)} = (\phi^{-1*} g)_{\mu\nu}|_{\phi(p)} = g_{\mu\nu}|_{\phi(p)} = \eta_{\mu\nu}$, where (4.1.6) is
used in the first equality, the third equality comes from the fact that $\phi$ being an
isometry makes $\phi^{-1}$ an isometry, and the fourth equality comes from the fact that
$\{x^\mu\}$ is Lorentzian. This equation shows that the components of $g_{ab}$ at $p$ in the system
$\{x'^\mu\}$ are $\eta_{\mu\nu}$, and hence $\{x'^\mu\}$ is a Lorentzian system.
(B) Suppose $\{x^\mu\}$ and $\{x'^\mu\}$ are both Lorentzian coordinate systems, and $\phi : \mathbb{R}^n \to \mathbb{R}^n$
is the diffeomorphism that corresponds to the coordinate transformation $\{x^\mu\} \to \{x'^\mu\}$,
then $\forall p \in \mathbb{R}^n$ we have $(\phi^{-1*} g)_{\mu\nu}|_p = (\phi_* g)_{\mu\nu}|_p = g'_{\mu\nu}|_{\phi^{-1}(p)} = \eta_{\mu\nu} = g_{\mu\nu}|_p$,
where (4.1.6) is used in the second equality, while the third and fourth equalities come
from the fact that $\{x'^\mu\}$ and $\{x^\mu\}$ are Lorentzian. This indicates that $\phi^{-1*} g_{ab} = g_{ab}$,
and hence $\phi^{-1}$ is (which means $\phi$ is also) an isometry.

Remark 4 This theorem can also be applied to Euclidean space, where one only
needs to change the Lorentzian system to a Cartesian system. We therefore can
say that under an isometry a Lorentzian (or Cartesian) coordinate system remains
Lorentzian (or Cartesian).

4.4 Hypersurfaces

Definition 1 Suppose $M$ and $S$ are manifolds, $\dim S \leq \dim M \equiv n$. A map $\phi : S \to M$
is called an embedding if $\phi$ is one-to-one and $C^\infty$, and $\forall p \in S$, the pushforward
map $\phi_* : V_p \to V_{\phi(p)}$ is non-degenerate [$V_{\phi(p)}$ is the tangent space at the point $\phi(p)$
in $M$], i.e., $\phi_* v^a = 0 \Rightarrow v^a = 0$.

Remark 1 The above conditions for an embedding ensure that the topology and manifold
structure of S can be naturally carried over to φ[S], which makes φ : S → φ[S]
a diffeomorphism.

Definition 2 An embedding φ : S → M is called an embedded submanifold of M,


or a submanifold of M for short. The image φ[S] is also often called an embedded
submanifold. If dim S = n − 1, then φ[S] ⊂ M is called a hypersurface of M.

Example 1 Suppose U is an open subset of M, and restrict the manifold structure
of M to U , then U is a manifold with the same dimension as M. Consider U as the
S in Definition 1, and set φ : U → M to be the identity map, then U ≡ φ[U ] is an
embedded submanifold (of the same dimension).

Example 2 Suppose S is the unit sphere S 2 in R3 (viewed as M), then the identity
map φ : S 2 → R3 gives rise to an embedded submanifold of R3 . Noticing that S 2
has one lower dimension than R3 , we conclude that S 2 is a hypersurface of R3 .
[Optional Reading 4.4.1]
An embedded submanifold φ[S] has two topologies, one is the topology that comes
naturally from the embedding (see Remark 1), and the other is the topology on φ[S] (as
a subset of M) induced by M (see Example 5 in Sect. 1.2). These two topologies are not
necessarily the same. However, if we further require them to be the same, then we impose a
stricter requirement on the embedding. An embedding satisfying this additional requirement
is called a regular embedding [see Chern et al. (1999)]. The term “embedding” in some
works [e.g., Hawking and Ellis (1973)] actually refers to a regular embedding. Suppose
S = R, and M = R2 , then an embedding φ : S → M is a smooth curve in R2 . The one-
to-one condition of φ in the definition does not allow the embedded submanifold to be a
self-intersecting curve (such as the figure-eight shaped curve in Fig. 4.4). Is the curve that
is “arbitrarily close to self-intersecting” but not self-intersecting in Fig. 4.5 an embedded
submanifold? The answer is: it is an embedded submanifold but not a regular embedded
submanifold. From now on, most of the cases in this text where we talk about an embedded
submanifold will refer to a regular embedded submanifold.
[The End of Optional Reading 4.4.1]

Fig. 4.4 A self-intersecting curve is not an embedded submanifold

Fig. 4.5 A curve that is “arbitrarily close to self-intersecting” is an embedded submanifold but not a regular embedded submanifold

Suppose $\phi[S]$ is a hypersurface of $M$, and $q \in \phi[S] \subset M$. As a point in $M$, $q$ has
an $n$-dimensional tangent space $V_q$. If $w^a \in V_q$ is a tangent vector of a curve passing
through $q$ and lying on $\phi[S]$ (“lying on” means each point of the curve is on $\phi[S]$),
then we say $w^a$ is tangent to $\phi[S]$. Use $W_q$ to denote the subset of $V_q$ formed by all
the elements which are tangent to $\phi[S]$. The definition of a hypersurface assures that
$W_q$ is an $(n-1)$-dimensional subspace of $V_q$. Speaking of a hypersurface, one
may naturally think of its normal vectors. Suppose $\phi[S]$ is a hypersurface, $q \in \phi[S]$,
then a normal vector $n^a$ at $q$ should be defined as a vector that is orthogonal to all the
vectors tangent to $\phi[S]$. However, orthogonality is only meaningful after a metric is
assigned. When there is no metric on $M$, one cannot define a normal vector $n^a$, but
can instead define a “normal covector” $n_a$. Covector is another name for dual vector.
Since a dual vector gives a real number when acting on a vector (with no need for a
metric), a normal covector can be defined as follows:

Definition 3 Suppose $\phi[S]$ is a hypersurface, $q \in \phi[S]$. A nonzero dual vector $n_a \in V_q^*$
is called a normal covector of $\phi[S]$ at $q$ if $n_a w^a = 0$, $\forall w^a \in W_q$.

Theorem 4.4.1 There exists a normal covector $n_a$ at each point $q$ on a hypersurface
$\phi[S]$. The normal covector at $q$ is unique up to a numerical factor.

Proof Suppose $\{(e_2)^a, \ldots, (e_n)^a\}$ is an arbitrary basis of $W_q$. Since $\dim V_q = n$, there
must be elements in $V_q$ that are linearly independent of $\{(e_2)^a, \ldots, (e_n)^a\}$. Choose any
one of such elements and denote it by $(e_1)^a$, then $\{(e_\mu)^a \mid \mu = 1, \ldots, n\}$ is a basis of
$V_q$, whose dual basis is denoted by $\{(e^\mu)_a\}$. Set $n_a = (e^1)_a$, then $n_a (e_\tau)^a = \delta^1{}_\tau = 0$
($\tau = 2, \ldots, n$). Hence $n_a w^a = 0$ $\forall w^a \in W_q$, and thus $n_a$ is a normal covector. If
there exists $m_a$ that satisfies $m_a (e_\tau)^a = 0$ ($\tau = 2, \ldots, n$), then its components in the
dual basis $\{(e^\mu)_a\}$ are $m_\tau = m_a (e_\tau)^a = 0$ ($\tau = 2, \ldots, n$), and thus $m_a = m_1 (e^1)_a = m_1 n_a$,
i.e., $m_a$ and $n_a$ only differ by multiplication by a numerical factor $m_1$.

Remark 2 A normal covector of an embedded submanifold that is not a hypersurface


does not have a uniqueness like this.

[Optional Reading 4.4.2]


Suppose $x, y, z$ are the natural coordinates of $\mathbb{R}^3$. Consider a function $f = ax + by + cz$
(where at least one of the constants $a, b, c$ is nonzero), then the points in $\mathbb{R}^3$ that satisfy $f = 0$
form a hypersurface (a plane) in $\mathbb{R}^3$. If $f = x^2 + y^2 + z^2 - a^2$, $a \neq 0$, then the equation
$f = 0$ represents another hypersurface (a sphere). However, if $f = x^2 + y^2 + z^2$, then only
the origin satisfies $f = 0$, and therefore $f = 0$ does not represent a hypersurface at all. The
key point is that in this case we have $df|_{f=0} = 0$. Another extreme example is the case where
$f : \mathbb{R}^3 \to \mathbb{R}$ is defined as $f(p) = 0$ $\forall p \in \mathbb{R}^3$. In this case the subset of points that satisfy
$f = 0$ is $\mathbb{R}^3$ itself, and thus is not a hypersurface either. The key point is still $df|_{f=0} = 0$.
Generalizing to the cases where $f$ is a smooth function on an arbitrary manifold $M$, it can
be proved that, as long as $df|_{f=c} \neq 0$ (i.e., $\nabla_a f|_{f=c} \neq 0$), $f = c$ (constant) gives a
hypersurface in $M$ [for details, see Chillingworth (1976) pp. 156–158].
[The End of Optional Reading 4.4.2]

Theorem 4.4.2 Let $\phi[S]$ represent the hypersurface defined by $f = \text{constant}$. Suppose
$q \in \phi[S]$, and $\nabla_a f|_q \neq 0$, then $\nabla_a f|_q$ is a normal covector of $\phi[S]$ at $q$.

Proof All we have to prove is that, for any $q \in \phi[S]$, we have $w^a \nabla_a f = 0$, $\forall w^a \in W_q$.
Since $w^a$ is always tangent to a curve $C(t)$ lying on $\phi[S]$ and passing through $q$, we
get $w^a \nabla_a f = \frac{\partial}{\partial t}(f) = 0$ $\forall w^a \in W_q$, where the last step holds because $f$ is constant
on $C(t)$.
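Theorem 4.4.2 can be illustrated on the sphere f = x² + y² + z² − a² = 0 in R³. The sketch below assumes a hypothetical radius a = 2 and a hypothetical point given by spherical angles; the gradient is approximated by central differences, and the two tangent vectors are those of the θ- and ϕ-coordinate curves on the sphere.

```python
import math

A = 2.0  # hypothetical sphere radius

def f(x, y, z):
    return x * x + y * y + z * z - A * A

def gradient(func, p, h=1e-5):
    """Components of df at p, by central differences."""
    g = []
    for i in range(3):
        plus, minus = list(p), list(p)
        plus[i] += h
        minus[i] -= h
        g.append((func(*plus) - func(*minus)) / (2 * h))
    return g

theta, phi = 1.1, 0.4
q = [A * math.sin(theta) * math.cos(phi),
     A * math.sin(theta) * math.sin(phi),
     A * math.cos(theta)]

# Tangent vectors at q of the coordinate curves lying on the sphere.
w_theta = [A * math.cos(theta) * math.cos(phi),
           A * math.cos(theta) * math.sin(phi),
           -A * math.sin(theta)]
w_phi = [-A * math.sin(theta) * math.sin(phi),
         A * math.sin(theta) * math.cos(phi),
         0.0]

n = gradient(f, q)  # candidate normal covector (nabla_a f) at q
contractions = [sum(ni * wi for ni, wi in zip(n, w)) for w in (w_theta, w_phi)]

print(n, contractions)
```

The covector is nonzero at q while both contractions vanish up to rounding, in agreement with Definition 3.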

Suppose $n_a$ is a normal covector of $\phi[S]$ at $q$. If there is a metric $g_{ab}$ on $M$, then
$n^a \equiv g^{ab} n_b \in V_q$ is orthogonal to all tangent vectors at $q$ on $\phi[S]$ (since $g_{ab} n^a w^b = n_b w^b = 0$
$\forall w^b \in W_q$), and hence $n^a$ is called a normal vector of the hypersurface
$\phi[S]$ at $q$. If $g_{ab}$ is positive definite (e.g., $\mathbb{R}^2$ embedded into 3-dimensional Euclidean
space), $n^a$ certainly does not belong to $W_q$, i.e., $n^a \in V_q - W_q$; however, if $g_{ab}$ is
Lorentzian, then it is possible that $n^a$ belongs to $W_q$. Now we will discuss the case
where $g_{ab}$ is a Lorentzian metric.

Theorem 4.4.3 Suppose $n^a$ is a normal vector of $\phi[S]$ at $q$, then a necessary and
sufficient condition for $n^a \in W_q$ is $n^a n_a = 0$.

Proof
(A) Suppose $n^a \in W_q$. Since $n_a$ is a normal covector of $\phi[S]$, regarding the $w^a$
in Definition 3 as the $n^a$ in the present expression $n^a n_a$, we have $n^a n_a = 0$.
(B) From the proof of Theorem 4.4.1 we know that for any normal covector $n_a$
there exists a basis $\{(e_\mu)^a\}$ such that $(e_2)^a, \ldots, (e_n)^a \in W_q$ and $n_a = (e^1)_a$; hence,
for the first component of $n^a$ in this basis we have $n^1 = n^a (e^1)_a = n^a n_a$. Therefore,
$n^a n_a = 0 \Rightarrow n^1 = 0 \Rightarrow n^a = \sum_{\tau=2}^{n} n^\tau (e_\tau)^a \in W_q$.


Example 3 Suppose $S = \mathbb{R}$, $M = \mathbb{R}^2$, the metric on $M$ is $g_{ab} = \eta_{ab}$, and $\phi : \mathbb{R} \to \mathbb{R}^2$
is an embedding, then $\phi[\mathbb{R}]$ is a hypersurface in the 2-dimensional Minkowski
spacetime. Suppose $t$ and $x$ are Lorentzian coordinates. Here we discuss the following
three representative cases [the noteworthy one is case (3), where the normal vector is
null]:
(1) $\phi[\mathbb{R}]$ is parallel to the $x$-axis [see Fig. 4.6a]. $\forall q \in \phi[\mathbb{R}]$, let $(e_2)^a = (\partial/\partial x)^a$,
and choose $(e_1)^a = \alpha(\partial/\partial t)^a + \beta(\partial/\partial x)^a$ ($\alpha, \beta$ can be arbitrary real numbers, but
$\alpha \neq 0$), then it is not difficult to verify that $(e^1)_a = \alpha^{-1}(dt)_a$. From the proof of
Theorem 4.4.1 we can see that $(e^1)_a$ is a normal covector $n_a$ whose corresponding
normal vector is $n^a = \alpha^{-1} g^{ab}(dt)_b = -\alpha^{-1}(\partial/\partial t)^a$, satisfying $n^a \notin W_q$ and $n^a n_a < 0$
(i.e., $n^a$ is timelike).

Fig. 4.6 Three cases of embedding $\mathbb{R}$ into $\mathbb{R}^2$ ($t$-axis points vertically upwards, $x$-axis points horizontally to the right): (a) $n^a n_a < 0$ (spacelike hypersurface); (b) $n^a n_a > 0$ (timelike hypersurface); (c) $n^a n_a = 0$ (null hypersurface)

(2) $\phi[\mathbb{R}]$ is parallel to the $t$-axis [see Fig. 4.6b]. $\forall q \in \phi[\mathbb{R}]$, let $(e_2)^a = (\partial/\partial t)^a$,
and choose $(e_1)^a = \alpha(\partial/\partial t)^a + \beta(\partial/\partial x)^a$ ($\alpha, \beta$ can be arbitrary real numbers, but
$\beta \neq 0$), then $(e^1)_a = \beta^{-1}(dx)_a$. Take $(e^1)_a$ to be the normal covector $n_a$, whose
corresponding normal vector is $n^a = \beta^{-1}(\partial/\partial x)^a$, satisfying $n^a \notin W_q$ and $n^a n_a > 0$
(i.e., $n^a$ is spacelike).
(3) $\phi[\mathbb{R}]$ makes an angle of 45° with the $x$-axis (in the Euclidean sense) [see Fig. 4.6c]. $\forall q \in \phi[\mathbb{R}]$,
let $(e_2)^a = (\partial/\partial t)^a + (\partial/\partial x)^a$, and choose $(e_1)^a = \alpha(\partial/\partial t)^a + \beta(\partial/\partial x)^a$,
$\alpha \neq \beta$, then $(e^1)_a = (\alpha - \beta)^{-1}[(dt)_a - (dx)_a]$. Take $(e^1)_a$ to be the normal covector
$n_a$, whose corresponding normal vector is

$$n^a = (\alpha-\beta)^{-1} g^{ab}[(dt)_b - (dx)_b] = -(\alpha-\beta)^{-1}[(\partial/\partial t)^a + (\partial/\partial x)^a] = -(\alpha-\beta)^{-1}(e_2)^a\,,$$

satisfying $n^a \in W_q$ and $n^a n_a = 0$ (i.e., $n^a$ is null). In this case, the normal vector $n^a$
of the hypersurface is not only perpendicular to all the vectors at $q$ tangent to the
hypersurface, but is itself one of these tangent vectors!
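The three cases of Example 3 can be reproduced with a small computation. The sketch below assumes that a straight line in 2-dimensional Minkowski space is specified by a tangent vector with components (w^t, w^x); the normalization of the normal covector and all function names are hypothetical choices, which is harmless because the normal covector is unique only up to a factor (Theorem 4.4.1).

```python
def normal_covector(tangent):
    """A covector (n_t, n_x) with n_a w^a = 0 for the tangent w = (w^t, w^x)."""
    wt, wx = tangent
    return (wx, -wt)

def raise_index(n):
    """n^a = eta^{ab} n_b with eta = diag(-1, 1)."""
    return (-n[0], n[1])

def classify(tangent):
    n_low = normal_covector(tangent)
    n_up = raise_index(n_low)
    norm = n_up[0] * n_low[0] + n_up[1] * n_low[1]  # n^a n_a
    if norm < 0:
        kind = "spacelike hypersurface"   # timelike normal
    elif norm > 0:
        kind = "timelike hypersurface"    # spacelike normal
    else:
        kind = "null hypersurface"        # null normal
    return n_up, norm, kind

case_1 = classify((0.0, 1.0))  # line parallel to the x-axis
case_2 = classify((1.0, 0.0))  # line parallel to the t-axis
case_3 = classify((1.0, 1.0))  # line at 45 degrees

for case in (case_1, case_2, case_3):
    print(case)
```

In the third case the returned normal vector is proportional to the tangent vector itself, which is the situation described by Theorem 4.4.3.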

Definition 4 A hypersurface is said to be spacelike if its normal vectors are everywhere
timelike ($n^a n_a < 0$); a hypersurface is said to be timelike if its normal vectors
are everywhere spacelike ($n^a n_a > 0$); a hypersurface is said to be null or lightlike
if its normal vectors are everywhere null ($n^a n_a = 0$).

If $n^a n_a \neq 0$, then whenever we talk about a normal vector later on, we will regard it as a
normalized normal vector, i.e., $n^a n_a = \pm 1$.

Definition 5 Suppose $\phi[S]$ is an embedded submanifold (not necessarily a hypersurface)
in $M$. Let $W_q$ be the subspace of $V_q$ formed by the vectors tangent to $\phi[S]$
at an arbitrary point $q \in \phi[S]$. A tensor $h_{ab}$ on $W_q$ is called the induced metric derived from the
metric $g_{ab}$ on $V_q$ if

$$h_{ab} w_1^a w_2^b = g_{ab} w_1^a w_2^b\,, \quad \forall w_1^a, w_2^b \in W_q\,. \tag{4.4.1}$$



The induced metric $h_{ab}$ is essentially the result of restricting the domain on which
$g_{ab}$ acts from $V_q$ to $W_q$. Since $h_{ab}$ is defined pointwise on $\phi[S]$, it gives rise to an
induced metric field on $\phi[S]$. When $\phi[S]$ is a timelike or spacelike hypersurface,
the induced metric can be conveniently expressed by the normalized normal vector
($n^a n_a = \pm 1$) as

$$h_{ab} = g_{ab} \mp n_a n_b\,. \quad (-\ \text{when } n^a n_a = +1, \text{ and } +\ \text{when } n^a n_a = -1.) \tag{4.4.2}$$

It is easy to see that $\forall w_1^a, w_2^b \in W_q$ we have $h_{ab} w_1^a w_2^b = g_{ab} w_1^a w_2^b \mp n_a w_1^a n_b w_2^b = g_{ab} w_1^a w_2^b$,
which satisfies (4.4.1). However, there are actually many $h_{ab}$ that satisfy
(4.4.1), so why do we only use the one defined by (4.4.2)? For the reason, see Optional
Reading 4.4.3.
[Optional Reading 4.4.3]
For convenience, we suppose $V_q$ to be 4-dimensional (and thus $W_q$ is 3-dimensional). As
an induced metric (a metric on $W_q$), $h_{ab}$ in (4.4.1) is a tensor on $W_q$ (a 3-dimensional tensor),
i.e., $h_{ab} \in \mathcal{T}_{W_q}(0, 2)$ (which cannot act on elements in $V_q - W_q$). However, for the convenience
of performing 4-dimensional calculations, we want to find a 4-dimensional tensor
of type $(0, 2)$ [i.e., an element of $\mathcal{T}_{V_q}(0, 2)$] which can represent the 3-dimensional tensor
$h_{ab}$. $h_{ab} \equiv g_{ab} \mp n_a n_b$ is such a 4-dimensional tensor (note that both terms on the right-hand
side are 4-dimensional tensors). To distinguish it from the $h_{ab}$ in (4.4.1), we temporarily
denote the $h_{ab}$ in $h_{ab} \equiv g_{ab} \mp n_a n_b$ as $\bar{h}_{ab}$. It can be proved that $\mathcal{T}_{V_q}(0, 2)$ has a subset
$\mathcal{S}_{V_q}(0, 2) \equiv \{T_{ab} \in \mathcal{T}_{V_q}(0, 2) \mid T_{ab} n^a = 0,\ T_{ab} n^b = 0\}$ that is naturally isomorphic to
$\mathcal{T}_{W_q}(0, 2)$, and thus $\mathcal{S}_{V_q}(0, 2)$ and $\mathcal{T}_{W_q}(0, 2)$ can be naturally identified (for details see Chap.
14). It is easy to see that $g_{ab} \notin \mathcal{S}_{V_q}(0, 2)$ while $\bar{h}_{ab} \in \mathcal{S}_{V_q}(0, 2)$, and $\bar{h}_{ab} w_1^a w_2^b = g_{ab} w_1^a w_2^b$
$\forall w_1^a, w_2^b \in W_q$; thus, one can identify $\bar{h}_{ab}$ with $h_{ab}$. It can also be proved (left to the reader)
that the only element in $\mathcal{S}_{V_q}(0, 2)$ that satisfies (4.4.1) (and thus can serve as $h_{ab}$) is $\bar{h}_{ab}$;
this is the reason why we regard the 4-dimensional tensor $\bar{h}_{ab} \equiv g_{ab} \mp n_a n_b$ as the induced
metric. From now on, we will not distinguish the notation of $\bar{h}_{ab}$ and $h_{ab}$.
The above conclusion about tensors of type $(0, 2)$ can also be generalized as follows:
a special subset of $\mathcal{T}_{V_q}(0, l)$, namely $\{T_{a_1 \cdots a_l} \in \mathcal{T}_{V_q}(0, l) \mid$ the contraction of $n^a$ with any
index of $T_{a_1 \cdots a_l}$ vanishes$\}$, is naturally isomorphic to $\mathcal{T}_{W_q}(0, l)$, and thus they can be naturally
identified. This identification makes it possible to substitute the elements of the former
for the elements of the latter when discussing and writing equations, which brings
us great convenience.
[The End of Optional Reading 4.4.3]

Remark 3 Equation (4.4.2) also holds when $g_{ab}$ is positive definite (just change the
sign $\mp$ to $-$). As an exercise, the reader should expand the 3-dimensional Euclidean
metric in the dual coordinate basis of a spherical coordinate system, and verify that
the induced metric $h_{ab} = g_{ab} - n_a n_b$ on the sphere
is the same as the induced metric $\hat{g}_{ab}$ defined in Example 2 of Sect. 3.3. [Hint: the
normalized normal covector on a sphere is $n_a = (dr)_a$.]

Suppose $\phi[S]$ is a timelike or spacelike hypersurface, $q \in \phi[S]$, and $h_{ab}$ satisfies
(4.4.2). Let

$$h^a{}_b \equiv g^{ac} h_{cb} = \delta^a{}_b \mp n^a n_b\,, \tag{4.4.3}$$

then $\forall v^a \in V_q$ we have $h^a{}_b v^b = v^a \mp n^a (n_b v^b)$, or

$$v^a = h^a{}_b v^b \pm n^a (n_b v^b)\,. \tag{4.4.4}$$

The above equation represents a decomposition of the vector $v^a$ (Fig. 4.7), where
$\pm n^a (n_b v^b)$ is parallel to $n^a$, called the normal component, and $h^a{}_b v^b$ is perpendicular
to $n^a$ [since $n_a (h^a{}_b v^b) = 0$], called the tangential component (the component tangent
to $\phi[S]$). $h^a{}_b$ is called the projection map from $V_q$ to $W_q$.

Fig. 4.7 $v^a \in V_q$ is decomposed into the normal component $\pm n^a (n_b v^b)$ and the tangential component $h^a{}_b v^b \in W_q$
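The decomposition (4.4.4) can be checked numerically. The sketch below assumes 4-dimensional Minkowski space in Lorentzian coordinates and a hypothetical unit timelike normal n^a = (cosh χ, sinh χ, 0, 0), so that the lower signs in (4.4.3) and (4.4.4) apply; the vector v^a and the value of χ are arbitrary sample data.

```python
import math

ETA = [-1.0, 1.0, 1.0, 1.0]  # diagonal of eta_{ab} in coordinates (t, x, y, z)

def lower(v):
    return [ETA[i] * v[i] for i in range(4)]

def dot(u, v):
    """g_ab u^a v^b for the Minkowski metric."""
    return sum(ETA[i] * u[i] * v[i] for i in range(4))

def projector(n):
    """h^a_b = delta^a_b -/+ n^a n_b, see (4.4.3); n must satisfy n^a n_a = +/-1."""
    sign = dot(n, n)
    n_low = lower(n)
    return [[(1.0 if a == b else 0.0) - sign * n[a] * n_low[b] for b in range(4)]
            for a in range(4)]

def apply(h, v):
    return [sum(h[a][b] * v[b] for b in range(4)) for a in range(4)]

chi = 0.6
n = [math.cosh(chi), math.sinh(chi), 0.0, 0.0]  # n^a n_a = -1
h = projector(n)

v = [1.0, 2.0, -0.5, 3.0]
tangential = apply(h, v)                                   # h^a_b v^b
normal = [dot(n, n) * n[a] * dot(n, v) for a in range(4)]  # +/- n^a (n_b v^b)

print(tangential, normal)
```

The two pieces add up to v^a, the tangential piece is orthogonal to n^a, and projecting a second time changes nothing, which is the sense in which h^a_b is a projection map.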
Theorem 4.4.4 The induced “metric” on a null hypersurface is degenerate (and
thus there is no induced metric).
Proof Let h ab represent the induced “metric”. The hypersurface being null leads to
the result that n a ∈ Wq (see Theorem 4.4.3), and hence there is a nonzero element
n a in Wq such that h ab n a wb = gab n a wb = 0, ∀wa ∈ Wq . Thus, h ab is a degenerate
tensor on Wq . 
Example 4 Suppose t, x, y, z are Lorentzian coordinates of the 4-dimensional Minkowski
space (R4 , ηab ), r, θ, ϕ are the spherical coordinates corresponding to x, y, z, then
ηab can be expressed in terms of the dual coordinate basis vectors as

ηab = −(dt)a (dt)b + (dr )a (dr )b + r 2 (dθ )a (dθ )b + r 2 sin2 θ (dϕ)a (dϕ)b . (4.4.5)

The equation $t - r = 0$ defines a null hypersurface $\mathcal{S}$, which is a cone with the
origin as its apex (see Fig. 4.8). $\forall q \in \mathcal{S} \subset \mathbb{R}^4$ ($q$ is not at the apex), we have a
4-dimensional tangent space $V_q$ and a 3-dimensional tangent space (tangent to $\mathcal{S}$)
$W_q \subset V_q$. Let

$$n^a|_q \equiv (\partial/\partial t)^a|_q + (\partial/\partial r)^a|_q \quad \text{(the subscript } q \text{ will be omitted below)}\,,$$

then $n^a$ is a null normal vector of $\mathcal{S}$ at $q$, and hence $n^a \in W_q$. Therefore, $\{(\partial/\partial\theta)^a, (\partial/\partial\varphi)^a, n^a\}$
is a basis of $W_q$. Now we calculate the components $h_{\mu\nu}$ of the “metric”
$h_{ab}$ induced by $\eta_{ab}$ on $W_q$.

Fig. 4.8 The induced “metric” of a null hypersurface $\mathcal{S}$ is degenerate


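$$h_{\theta\theta} \equiv h_{ab} (\partial/\partial\theta)^a (\partial/\partial\theta)^b = \eta_{ab} (\partial/\partial\theta)^a (\partial/\partial\theta)^b = r^2\,,$$

where (4.4.1) is used in the second equality and (4.4.5) is used in the third equality.
Similarly, we have $h_{\varphi\varphi} = r^2 \sin^2\theta$, and the third diagonal element of $h_{\mu\nu}$ (denoted
by $h_{nn}$) is

$$h_{nn} \equiv h_{ab} n^a n^b = \eta_{ab} [(\partial/\partial t)^a + (\partial/\partial r)^a][(\partial/\partial t)^b + (\partial/\partial r)^b] = -1 + 1 = 0\,.$$

Also, it is easy to verify that all of the off-diagonal elements vanish. Hence,

$$(h_{\mu\nu}) = \begin{bmatrix} r^2 & 0 & 0 \\ 0 & r^2\sin^2\theta & 0 \\ 0 & 0 & 0 \end{bmatrix},$$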

and therefore $h_{ab}$ is degenerate [we also say its “signature” is $(+, +, 0)$]. Thus,
$\eta_{ab}$ does not have an induced metric on the null hypersurface $\mathcal{S}$ (the cone $t - r = 0$). However, the
intersection $S$ of $\mathcal{S}$ and an arbitrary constant-$t$ surface ($t > 0$) is a 2-dimensional
sphere with radius $r = t$. Let $\hat{W}_q \subset W_q$ represent the subspace formed by all the
elements in $W_q$ that are tangent to $S$ (see Fig. 4.8), then $\eta_{ab}$ does have an induced
metric on $\hat{W}_q$, denoted by $\hat{h}_{ab}$. Also, it is not difficult to verify that

ĥ ab = r 2 (dθ )a (dθ )b + r 2 sin2 θ (dϕ)a (dϕ)b . (4.4.6)

It is not difficult for the reader to discuss the null hypersurface in (R4 , ηab ) defined
by t − z = 0 in a similar manner.
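The component calculation of Example 4 can be repeated mechanically. The sketch below assumes the coordinates (t, r, θ, ϕ) with the diagonal components of (4.4.5), the basis {(∂/∂θ)a , (∂/∂ϕ)a , n a } of Wq used above, and a hypothetical sample point on the cone; it forms the matrix of components of the induced “metric” and evaluates its determinant.

```python
import math

def eta_spherical(r, theta):
    """Diagonal components of eta_ab in coordinates (t, r, theta, phi), from (4.4.5)."""
    return [-1.0, 1.0, r * r, (r * math.sin(theta)) ** 2]

def gram(basis, metric_diag):
    """Matrix of eta_ab u^a w^b over all pairs of basis vectors."""
    return [[sum(g * u[i] * w[i] for i, g in enumerate(metric_diag)) for w in basis]
            for u in basis]

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

r, theta = 2.0, 0.9            # a point on the cone t = r, away from the apex
e_theta = [0.0, 0.0, 1.0, 0.0]
e_phi = [0.0, 0.0, 0.0, 1.0]
n = [1.0, 1.0, 0.0, 0.0]        # d/dt + d/dr

h = gram([e_theta, e_phi, n], eta_spherical(r, theta))
sphere_block_det = h[0][0] * h[1][1] - h[0][1] * h[1][0]

print(h, det3(h), sphere_block_det)
```

The full 3 × 3 matrix has vanishing determinant, while its upper-left 2 × 2 block, which corresponds to the sphere of radius r = t, is nondegenerate, in line with (4.4.6).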
In the discussion so far, we have considered an embedded submanifold φ[S] as
the image of the embedding map φ : S → M for convenience. However, sometimes
it is useful to regard the map φ itself as a submanifold, as it was originally defined
in Definition 2. In this case, the induced metric in Definition 5 can be equivalently
defined as follows:

Definition 5′ Suppose $(M, g_{ab})$ is a generalized Riemannian space, and $\phi : S \to M$
is an embedded submanifold of $M$. Let $W_q$ be the tangent space at an arbitrary point
$q \in S$. A metric $h_{ab}$ on $W_q$ is called the induced metric derived from $g_{ab}$ if

$$h_{ab} w_1^a w_2^b = g_{ab} (\phi_* w_1^a)(\phi_* w_2^b)\,, \quad \forall w_1^a, w_2^b \in W_q\,. \tag{4.4.7}$$

Since, by definition, $g_{ab} (\phi_* w_1^a)(\phi_* w_2^b) = (\phi^* g_{ab}) w_1^a w_2^b$, and since $w_1^a$ and $w_2^a$ are
arbitrary, the above equation can also be written simply as

$$h_{ab} = \phi^* g_{ab}\,. \tag{4.4.8}$$

Note that the above definition is valid at any point q ∈ S. Since q is arbitrary, the
induced metric as a tensor field on S is essentially the pullback of gab on M.
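The pullback formula (4.4.8) lends itself to a direct numerical check. The sketch below assumes the unit sphere embedded in 3-dimensional Euclidean space through the usual spherical angles (θ, ϕ); the pushforwards of the coordinate basis vectors are approximated by central differences, and the function names and sample point are hypothetical.

```python
import math

def embedding(theta, phi):
    """phi: S^2 -> R^3 in spherical angles, for the unit sphere."""
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))

def pushforward_basis(theta, phi, step=1e-6):
    """Approximate images of d/dtheta and d/dphi under the pushforward map."""
    def diff(p_plus, p_minus):
        return [(a - b) / (2 * step) for a, b in zip(p_plus, p_minus)]
    e_theta = diff(embedding(theta + step, phi), embedding(theta - step, phi))
    e_phi = diff(embedding(theta, phi + step), embedding(theta, phi - step))
    return e_theta, e_phi

def induced_metric(theta, phi):
    """Components of the pulled-back Euclidean metric, following (4.4.7)."""
    basis = pushforward_basis(theta, phi)
    return [[sum(a * b for a, b in zip(u, w)) for w in basis] for u in basis]

theta, phi = 0.8, 2.1
h_ab = induced_metric(theta, phi)

print(h_ab, math.sin(theta) ** 2)
```

Up to the finite-difference error the result is the matrix with diagonal entries 1 and sin²θ, i.e. the familiar line element dθ² + sin²θ dϕ² of the unit sphere.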

Theorem 4.4.5 Suppose $(M, g_{ab})$ is a generalized Riemannian space, and $\phi : S \to M$
is an embedded submanifold, with $h_{ab} = \phi^* g_{ab}$ the induced metric on $S$. Suppose
$\psi : M \to M$ is a diffeomorphism, then (1) $\phi' = \psi \circ \phi$ is an embedded submanifold;
(2) the induced metric $h'_{ab} = \phi'^* g_{ab}$ is equal to $h_{ab}$ when $\psi$ is an isometry of $g_{ab}$.

Proof (1) Since both $\phi$ and $\psi$ are one-to-one and $C^\infty$, so is $\phi' = \psi \circ \phi$. Let $W_p$
be the tangent space of $S$ at an arbitrary point $p \in S$. For any $w^a \in W_p$, we have
$\phi'_* w^a = (\psi \circ \phi)_* w^a = \psi_*(\phi_* w^a)$ (see Exercise 4.5). Since $\psi$ is a diffeomorphism,
$\psi_* : V_{\phi(p)} \to V_{\phi'(p)}$ is an isomorphism (see Exercise 4.4), and thus $\phi'_* w^a = 0$ implies
$\phi_* w^a = 0$. Also, since $\phi_* : W_p \to V_{\phi(p)}$ is nondegenerate, $\phi_* w^a = 0$ implies $w^a = 0$.
As a consequence, $\phi'_* : W_p \to V_{\phi'(p)}$ is nondegenerate at every $p \in S$. Therefore,
$\phi' : S \to M$ is an embedded submanifold.
(2) Using the result of Exercise 4.5 (c), $h'_{ab} = \phi'^* g_{ab} = (\psi \circ \phi)^* g_{ab} = (\phi^* \circ \psi^*) g_{ab} = \phi^*(\psi^* g_{ab})$.
When $\psi : M \to M$ is an isometry of $g_{ab}$, we have $\psi^* g_{ab} = g_{ab}$,
and so $h'_{ab} = \phi^* g_{ab} = h_{ab}$.
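Part (2) of Theorem 4.4.5 can be illustrated with a curve in the Euclidean plane. The sketch below assumes a hypothetical embedding of R as a parabola, an isometry consisting of a rotation followed by a translation, and, for contrast, a stretch that is a diffeomorphism but not an isometry; the single component of the induced metric is computed by central differences.

```python
import math

def curve(s):
    """Hypothetical embedding phi: R -> R^2 (a parabola)."""
    return (s, s * s)

def isometry(p, angle=0.7, shift=(3.0, -1.0)):
    """psi: rotation by `angle` followed by a translation (an isometry of delta_ab)."""
    x, y = p
    return (x * math.cos(angle) - y * math.sin(angle) + shift[0],
            x * math.sin(angle) + y * math.cos(angle) + shift[1])

def stretch(p):
    """A diffeomorphism of R^2 that does not preserve the Euclidean metric."""
    return (2.0 * p[0], p[1])

def induced_metric(embed, s, step=1e-5):
    """h_ss = delta_ab (phi_* d/ds)^a (phi_* d/ds)^b, by central differences."""
    plus, minus = embed(s + step), embed(s - step)
    return sum(((a - b) / (2 * step)) ** 2 for a, b in zip(plus, minus))

s0 = 0.4
h_original = induced_metric(curve, s0)
h_moved = induced_metric(lambda s: isometry(curve(s)), s0)
h_stretched = induced_metric(lambda s: stretch(curve(s)), s0)

print(h_original, h_moved, h_stretched)
```

The induced metric is unchanged by the isometry but changes under the stretch, which shows that the isometry assumption in part (2) cannot simply be dropped.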

Exercises

˜4.1. Show that (φ∗ v)a defined by (4.1.2) satisfies the two conditions for a vector
in Definition 2 of Sect. 2.2.
˜4.2. Prove Theorems 4.1.1, 4.1.2, 4.1.3.
4.3. Suppose φ : M → N is a smooth map, p ∈ M, and y μ are the coordinates
in a neighborhood of φ( p). Show that

(φ∗ v)a = v(φ ∗ y μ )(∂/∂ y μ )a , ∀va ∈ V p .

4.4. Suppose M and N are manifolds, φ : M → N is a diffeomorphism, p ∈ M,


and q ≡ φ( p). Show that the pushforward map φ∗ : V p → Vq is an isomor-
phism.
4.5. Suppose M, N , Q are manifolds, φ : M → N and ψ : N → Q are smooth
maps.
(a) Show that (ψ ◦ φ)∗ f = (φ ∗ ◦ ψ ∗ ) f , ∀ f ∈ F Q .
(b) Show that (ψ ◦ φ)∗ va = ψ∗ (φ∗ va ), ∀ p ∈ M, va ∈ V p .
(c) Regard both (ψ ◦ φ)∗ and φ ∗ ◦ ψ ∗ as maps from F Q (0, l) to F M (0, l).
Show that
(ψ ◦ φ)∗ = φ ∗ ◦ ψ ∗ .

4.6. Suppose φ : M → N is a diffeomorphism, va and u a are vector fields on M.


Show that φ∗ ([v, u]a ) = [φ∗ v, φ∗ u]a , where [v, u]a represents the commuta-
tor.

˜4.7. Prove Theorem 4.2.4.


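˜4.8. Suppose $v^a \in \mathcal{F}_M(1, 0)$, $\omega_a \in \mathcal{F}_M(0, 1)$. Show that for any coordinate system
$\{x^\mu\}$ we have

$$(\mathcal{L}_v \omega)_\mu = v^\nu \frac{\partial \omega_\mu}{\partial x^\nu} + \omega_\nu \frac{\partial v^\nu}{\partial x^\mu}\,.$$

Hint: use (4.2.7) and set the $\nabla_a$ to be $\partial_a$.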

˜4.9. Suppose u a , va ∈ F M (1, 0), then the following equality holds when both
sides acting on a tensor field of any type:

[Lv , Lu ] = L[v,u] (where [Lv , Lu ] ≡ Lv Lu − Lu Lv ) .

Prove the case where the acting targets are respectively f ∈ F M and wa ∈
F M (1, 0). Hint: when the acting target is wa one can use the Jacobi identity
(Exercise 2.8).
4.10. Suppose Fab is an antisymmetric tensor field on 4-dimensional Minkowski
space, whose components in a Lorentzian coordinate system {t, x, y, z}
are F01 = −F13 = xρ −1 , F02 = −F23 = yρ −1 , F03 = F12 = 0, where ρ ≡
(x 2 + y 2 )1/2 . Show that Fab has rotational symmetry, i.e., Lv Fab = 0, where
va = −y(∂/∂ x)a + x(∂/∂ y)a .
4.11. Suppose ξ a is a Killing vector field on (M, gab ), and ∇a is associated with
gab . Show that ∇a ξ a = 0.
4.12. Suppose ξ a is a Killing vector field on (M, gab ), φ : M → M is an isome-
try. Show that φ∗ ξ a is also a Killing vector field on (M, gab ). Hint: use the
conclusion in Exercise 4.5(c).
4.13. Suppose ξ a and ηa are Killing vector fields on (M, gab ). Show that their
commutator [ξ, η]a is also a Killing vector field. NB: This conclusion makes
the collection of all Killing vector fields on M not only a vector space, but
also a Lie algebra (for details, see Appendix G in Volume II).
4.14. Suppose ξ a is a Killing vector field of a generalized Riemannian space
(M, gab ), and Rabc d is the Riemann curvature tensor of gab .
(a) Show that ∇a ∇b ξc = −Rbca d ξd . NB: This equation is significant for
proving Theorem 4.3.4. Hint: from the definition of Rabc d and the Killing
equation (4.3.1) we can see that ∇a ∇b ξc + ∇b ∇c ξa = Rabc d ξd . Refer this
to as the first equation. By substituting the indices a → b, b → c, c → a
we get the second equation, and by substituting twice we get the third
equation. Adding the first equation to the second equation and subtracting
the third equation, and using (3.4.7), one can prove the claim.
(b) Use the conclusion of (a) to show that ∇ a ∇a ξ c = −Rcd ξ d , where Rcd is
the Ricci tensor.
˜4.15. Verify that (∂/∂η)a in (4.3.3) indeed satisfies the Killing equation (4.3.1).
˜4.16. Find the coordinate transformation induced by an arbitrary element φa
from the one-parameter group of isometries generated by R a = x(∂/∂ y)a −
y(∂/∂ x)a in the 2-dimensional Euclidean space.

*4.17. Suppose each point of a hypersurface $\phi[S]$ in a spacetime $(M, g_{ab})$ has a null tangent vector while it does not have any timelike tangent vector (“tangent vector” means the vector is tangent to $\phi[S]$). Show that this is a null hypersurface. Hints: ① show that any vector orthogonal to a timelike vector $t^a$ must be spacelike [choose an orthonormal basis $\{(e_\mu)^a\}$ such that $(e_0)^a = t^a$]; ② show that each point on a timelike hypersurface has a timelike tangent vector; ③ prove the original claim from these two lemmas.

References

Chern, S. S., Chen, W. and Lam, K. S. (1999), Lectures on Differential Geometry, World Scientific
Publishing Company, Singapore.
Chillingworth, D. (1976), Differential Topology with a View to Applications, Pitman Publishing,
London.
Hawking, S. W. and Ellis, G. F. R. (1973), The Large Scale Structure of Space-Time, Cambridge
University Press, Cambridge.
Wald, R. M. (1984), General Relativity, The University of Chicago Press, Chicago.
Chapter 5
Differential Forms and Their Integrals

5.1 Differential Forms

We first introduce “forms” on an n-dimensional vector space $V$, and then discuss “differential forms” on an n-dimensional manifold $M$.
Definition 1 $\omega_{a_1 \cdots a_l} \in \mathcal{T}_V(0, l)$ is called an l-form on $V$ if

$$\omega_{a_1 \cdots a_l} = \omega_{[a_1 \cdots a_l]} .$$

For convenience in writing, we will sometimes drop the lower indices and write an
l-form as ω.

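Theorem 5.1.1 (a) $\omega_{a_1 \cdots a_l} = \omega_{[a_1 \cdots a_l]}$ ⇒ for any basis we have $\omega_{\mu_1 \cdots \mu_l} = \omega_{[\mu_1 \cdots \mu_l]}$;
(b) ∃ a basis such that $\omega_{\mu_1 \cdots \mu_l} = \omega_{[\mu_1 \cdots \mu_l]}$ ⇒ $\omega_{a_1 \cdots a_l} = \omega_{[a_1 \cdots a_l]}$.

Proof Exercise. □

Theorem 5.1.2 Suppose $\omega$ is an l-form, then

(a) $$\omega_{a_1 \cdots a_l} = \delta_\pi\, \omega_{a_{\pi(1)} \cdots a_{\pi(l)}} . \tag{5.1.1}$$

[See the explanation after (2.6.14) for the meaning of $\delta_\pi$, $a_{\pi(1)}, \ldots, a_{\pi(l)}$.] For example, $\omega_{ab} = -\omega_{ba}$, $\omega_{abc} = -\omega_{acb} = \omega_{cab} = \cdots$;

(b) For any basis, $$\omega_{\mu_1 \cdots \mu_l} = \delta_\pi\, \omega_{\mu_{\pi(1)} \cdots \mu_{\pi(l)}} . \tag{5.1.1′}$$

Proof (a) See the proof of Theorem 2.6.1 (b);
(b) Exercise. □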

It follows from (5.1.1′) that any component $\omega_{\mu_1 \cdots \mu_l}$ of an l-form with repeated indices must vanish, e.g.,

© Science Press 2023 129


C. Liang and B. Zhou, Differential Geometry and General Relativity,
Graduate Texts in Physics, https://doi.org/10.1007/978-981-99-0022-0_5

$$\omega_{112} = \omega_{133} = \omega_{212} = 0 .$$

Denote the collection of all the l-forms on $V$ by $\Lambda(l)$. A 1-form is actually a dual vector on $V$, and hence $\Lambda(1) = V^*$. We stipulate that any real number is called a 0-form on $V$, so that $\Lambda(0) = \mathbb{R}$. Since an l-form is a tensor of type $(0, l)$, we naturally have $\Lambda(l) \subset \mathcal{T}_V(0, l)$. Moreover, it is easy to show that $\Lambda(l)$ is a linear subspace of $\mathcal{T}_V(0, l)$. The computation of the dimension of $\Lambda(l)$ can be modeled on the computation of the dimension of $\mathcal{T}_V(0, l)$ in Theorem 2.4.1: to find the dimension of $\mathcal{T}_V(0, l)$, one finds a basis first, and to do so one needs to define the tensor product. However, the tensor product of two forms (as two tensors) is in general not totally antisymmetric, and hence is no longer a form. Nonetheless, one can totally antisymmetrize all its indices and make it a form. Thus, we have the following definition:
Definition 2 Suppose $\omega$ and $\mu$ are respectively an l-form and an m-form, then their wedge product is an (l + m)-form defined by the following equation:

$$(\omega \wedge \mu)_{a_1 \cdots a_l b_1 \cdots b_m} := \frac{(l + m)!}{l!\,m!}\, \omega_{[a_1 \cdots a_l} \mu_{b_1 \cdots b_m]} . \tag{5.1.2}$$

In other words, the wedge product is a map $\wedge : \Lambda(l) \times \Lambda(m) \to \Lambda(l + m)$ which satisfies (5.1.2).
The wedge product $(\omega \wedge \mu)_{a_1 \cdots a_l b_1 \cdots b_m}$ can also be denoted by $\omega_{a_1 \cdots a_l} \wedge \mu_{b_1 \cdots b_m}$, or $\omega \wedge \mu$ for short.
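It follows from the definition that the wedge product satisfies both the associative law and the distributive law, i.e., $(\omega \wedge \mu) \wedge \nu = \omega \wedge (\mu \wedge \nu)$ (and thus $\omega \wedge \mu \wedge \nu$ has a clear meaning) and $\omega \wedge (\mu + \nu) = \omega \wedge \mu + \omega \wedge \nu$. However, the wedge product does not in general obey the commutative law. For instance, for 1-forms $\omega$ and $\mu$ we have

$$\omega \wedge \mu \equiv \omega_a \wedge \mu_b \equiv (\omega \wedge \mu)_{ab} = 2\omega_{[a} \mu_{b]} = \omega_a \mu_b - \omega_b \mu_a ,$$

$$\mu \wedge \omega \equiv (\mu \wedge \omega)_{ab} = 2\mu_{[a} \omega_{b]} = \mu_a \omega_b - \mu_b \omega_a ,$$

and thus for the wedge product of any two 1-forms we have $\omega \wedge \mu = -\mu \wedge \omega$. Carrying over to the general case, suppose $\omega$ and $\mu$ are an l-form and an m-form, respectively, then

$$\omega \wedge \mu = (-1)^{lm}\, \mu \wedge \omega . \tag{5.1.3}$$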
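The definition (5.1.2) and the sign rule (5.1.3) can be checked numerically on components. The following is a minimal sketch, not part of the original text: it assumes a form is stored as a Python dict from index tuples (0-based) to components in some fixed basis, and the helper names `perm_sign`, `antisymmetrize` and `wedge`, as well as the sample 1-forms, are hypothetical choices made only for illustration.

```python
from fractions import Fraction
from itertools import permutations, product
from math import factorial

def perm_sign(p):
    """Sign (+1 or -1) of a permutation given as a tuple of 0..k-1."""
    p = list(p)
    sign = 1
    for i in range(len(p)):
        while p[i] != i:          # move entries home by swaps, each swap flips the sign
            j = p[i]
            p[i], p[j] = p[j], p[i]
            sign = -sign
    return sign

def antisymmetrize(t, rank, n):
    """Components T_[a1...ak]: signed average of T over all index permutations."""
    out = {}
    for idx in product(range(n), repeat=rank):
        total = Fraction(0)
        for p in permutations(range(rank)):
            permuted = tuple(idx[p[k]] for k in range(rank))
            total += perm_sign(p) * t.get(permuted, 0)
        out[idx] = total / factorial(rank)
    return out

def wedge(omega, l, mu, m, n):
    """Wedge product with the normalization (l+m)!/(l! m!) of (5.1.2)."""
    tensor_product = {}
    for ia in product(range(n), repeat=l):
        for ib in product(range(n), repeat=m):
            tensor_product[ia + ib] = Fraction(omega.get(ia, 0)) * Fraction(mu.get(ib, 0))
    coeff = Fraction(factorial(l + m), factorial(l) * factorial(m))
    anti = antisymmetrize(tensor_product, l + m, n)
    return {idx: coeff * value for idx, value in anti.items()}

# Hypothetical sample 1-forms on a 3-dimensional V (components in some basis).
omega = {(0,): 1, (1,): 2, (2,): 3}
mu = {(0,): 4, (1,): 5, (2,): 6}
nu = {(0,): 7, (1,): 8, (2,): 10}

w_ab = wedge(omega, 1, mu, 1, 3)          # omega ^ mu
w_ba = wedge(mu, 1, omega, 1, 3)          # mu ^ omega
triple_left = wedge(w_ab, 2, nu, 1, 3)    # (omega ^ mu) ^ nu
triple_right = wedge(omega, 1, wedge(mu, 1, nu, 1, 3), 2, 3)  # omega ^ (mu ^ nu)
```

With this normalization the two bracketings of the triple product agree, and a 2-form commutes with a 1-form, as (5.1.3) predicts for $lm = 2$. Exact rational arithmetic (`Fraction`) is used so that the comparisons are not affected by rounding.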

Theorem 5.1.3 Suppose $\dim V = n$, then

$$\dim \Lambda(l) = \frac{n!}{l!\,(n - l)!} , \quad \text{if } l \leq n ; \tag{5.1.4}$$

$$\Lambda(l) = \{0\} \ \text{(only contains the zero element)} , \quad \text{if } l > n .$$

Proof Take $n = 3$, $l = 2$ as an example. Suppose $\{(e_1)^a, (e_2)^a, (e_3)^a\}$ is a basis of $V$, and $\{(e^1)_a, (e^2)_a, (e^3)_a\}$ is the corresponding dual basis, then $\omega_{ab}$ (as a tensor on $V$) can be expanded as

$$\begin{aligned}
\omega_{ab} ={}& \omega_{11} (e^1)_a (e^1)_b + \omega_{12} (e^1)_a (e^2)_b + \omega_{13} (e^1)_a (e^3)_b \\
&+ \omega_{21} (e^2)_a (e^1)_b + \omega_{22} (e^2)_a (e^2)_b + \omega_{23} (e^2)_a (e^3)_b \\
&+ \omega_{31} (e^3)_a (e^1)_b + \omega_{32} (e^3)_a (e^2)_b + \omega_{33} (e^3)_a (e^3)_b .
\end{aligned}$$

Noticing that $\omega_{11} = \omega_{22} = \omega_{33} = 0$, $\omega_{21} = -\omega_{12}$, $\omega_{32} = -\omega_{23}$, $\omega_{13} = -\omega_{31}$, the above equation becomes

$$\begin{aligned}
\omega_{ab} ={}& \omega_{12} [(e^1)_a (e^2)_b - (e^2)_a (e^1)_b] + \omega_{23} [(e^2)_a (e^3)_b - (e^3)_a (e^2)_b] \\
&+ \omega_{31} [(e^3)_a (e^1)_b - (e^1)_a (e^3)_b] \\
={}& \omega_{12} (e^1)_a \wedge (e^2)_b + \omega_{23} (e^2)_a \wedge (e^3)_b + \omega_{31} (e^3)_a \wedge (e^1)_b .
\end{aligned} \tag{5.1.5}$$

Thus, any $\omega_{ab} \in \Lambda(2)$ can be expressed linearly in terms of $\{(e^1)_a \wedge (e^2)_b, (e^2)_a \wedge (e^3)_b, (e^3)_a \wedge (e^1)_b\}$. It is not difficult to show that the three 2-forms in the curly brackets are linearly independent (Exercise 5.1), and hence they comprise a basis. Therefore, $\dim \Lambda(2) = 3$. The reader may generalize the above discussion to the case where $l$, $n$ are arbitrary positive integers with $l \leq n$, and show that any l-form $\omega$ can be expanded as

$$\omega_{a_1 \cdots a_l} = \sum_C \omega_{\mu_1 \cdots \mu_l} (e^{\mu_1})_{a_1} \wedge \cdots \wedge (e^{\mu_l})_{a_l} , \tag{5.1.6}$$

where $\{(e^1)_a, \ldots, (e^n)_a\}$ is an arbitrary basis of $V^*$, and $\omega_{\mu_1 \cdots \mu_l}$ are the components of $\omega$ in the basis of $\mathcal{T}_V(0, l)$ constituted by $\{(e^1)_a, \ldots, (e^n)_a\}$, i.e.,

$$\omega_{\mu_1 \cdots \mu_l} = \omega_{a_1 \cdots a_l} (e_{\mu_1})^{a_1} \cdots (e_{\mu_l})^{a_l} . \tag{5.1.7}$$

Here $\sum_C$ stands for summing over all combinations of taking $l$ numbers from the $n$ numbers $(1, \ldots, n)$, i.e., there are in total $C_n^l$ vectors in the basis of $\Lambda(l)$, and hence we obtain (5.1.4). As for the case of $l > n$, it can be easily seen from Theorem 5.1.2 (b) that all the components of $\omega \in \Lambda(l)$ in this case are 0, and then $\Lambda(l)$ has only one element, namely the zero element: $\Lambda(l) = \{0\}$. □

Equation (5.1.5) is a special case of (5.1.6) when $n = 3$ and $l = 2$. To make it easier to understand, here we provide another example: suppose $n = 4$ and $l = 3$, then (5.1.6) appears as

$$\begin{aligned}
\omega_{abc} ={}& \omega_{123} (e^1)_a \wedge (e^2)_b \wedge (e^3)_c + \omega_{124} (e^1)_a \wedge (e^2)_b \wedge (e^4)_c \\
&+ \omega_{134} (e^1)_a \wedge (e^3)_b \wedge (e^4)_c + \omega_{234} (e^2)_a \wedge (e^3)_b \wedge (e^4)_c ,
\end{aligned}$$

where each component is determined by (5.1.7), e.g., $\omega_{134} = \omega_{abc} (e_1)^a (e_3)^b (e_4)^c$.
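The counting behind (5.1.4) and (5.1.6) can be made concrete by enumerating the index combinations that label the basis l-forms. This is a small illustrative sketch, not from the original text; the function name `basis_labels` is a hypothetical choice, and indices run from 1 to n as in the book.

```python
from itertools import combinations, permutations
from math import comb, factorial

def basis_labels(n, l):
    """Index sets mu_1 < ... < mu_l, one per basis l-form in the sum over C in (5.1.6)."""
    return list(combinations(range(1, n + 1), l))

labels_3_2 = basis_labels(3, 2)   # the n = 3, l = 2 case of (5.1.5)
labels_4_3 = basis_labels(4, 3)   # the n = 4, l = 3 example
labels_3_4 = basis_labels(3, 4)   # l > n: no basis element at all

# Ordered index tuples with distinct entries: each combination appears l! times.
ordered_4_3 = list(permutations(range(1, 5), 3))
```

Note that (5.1.5) uses $(e^3)_a \wedge (e^1)_b$ where this enumeration produces the label (1, 3); the two basis elements differ only by a sign, so they span the same subspace. The empty list for $l > n$ corresponds to $\Lambda(l) = \{0\}$.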

[Optional Reading 5.1.1]

Equation (5.1.6) can also be expressed as

$$\omega_{a_1 \cdots a_l} = \frac{1}{l!} \sum_{\mu_1, \ldots, \mu_l}^{n} \omega_{\mu_1 \cdots \mu_l} (e^{\mu_1})_{a_1} \wedge \cdots \wedge (e^{\mu_l})_{a_l} \quad \text{(the symbol } \textstyle\sum \text{ is omitted by convention)} . \tag{5.1.6′}$$

The number of nonzero terms on the right-hand side is equal to the number of permutations of taking $l$ numbers from $n$ numbers, i.e., $P_n^l = n!/(n - l)!$, which can be divided into $C_n^l = n!/[l!(n - l)!]$ groups, each containing $l!$ terms. All the terms in each group are the same, so dividing by $l!$ yields $C_n^l = n!/[l!(n - l)!]$ terms, which is in agreement with (5.1.6).
[The End of Optional Reading 5.1.1]

Now let us get back to a manifold $M$. If we assign an l-form on $V_p$ to each point $p$ on $M$ (or $A \subset M$), we obtain an l-form field (the word “field” is usually omitted) on $M$ (or $A$). 1-form fields and 0-form fields are simply dual vector fields and scalar fields, respectively. A smooth l-form field on $M$ is called a differential l-form, also called an l-form field or an l-form for short.
Suppose $(O, \psi)$ is a coordinate system, then an l-form field on $O$ can be conveniently expanded pointwise using the dual coordinate basis field $\{(\mathrm{d}x^\mu)_a\}$. Setting $(e^\mu)_a$ in (5.1.6) to be $(\mathrm{d}x^\mu)_a$, we have

$$\omega_{a_1 \cdots a_l} = \sum_C \omega_{\mu_1 \cdots \mu_l} (\mathrm{d}x^{\mu_1})_{a_1} \wedge \cdots \wedge (\mathrm{d}x^{\mu_l})_{a_l} , \tag{5.1.8}$$

where

$$\omega_{\mu_1 \cdots \mu_l} = \omega_{a_1 \cdots a_l} (\partial/\partial x^{\mu_1})^{a_1} \cdots (\partial/\partial x^{\mu_l})^{a_l} \tag{5.1.9}$$

is a function on $O$. An important special case is $l = n$. Since now $C_n^l = C_n^n = 1$, there is only one term in the summation of (5.1.8), i.e.,

$$\omega_{a_1 \cdots a_n} = \omega_{1 \cdots n} (\mathrm{d}x^1)_{a_1} \wedge \cdots \wedge (\mathrm{d}x^n)_{a_n} , \tag{5.1.10}$$

which can be shortened as

$$\omega = \omega_{1 \cdots n}\, \mathrm{d}x^1 \wedge \cdots \wedge \mathrm{d}x^n . \tag{5.1.10′}$$

The equation above can be interpreted as follows: the collection of all the n-forms at any point $p$ in $M$ is a 1-dimensional vector space, which has only one independent basis vector. Take the basis vector to be $\mathrm{d}x^1 \wedge \cdots \wedge \mathrm{d}x^n|_p$, then (5.1.10′) is the expansion of $\omega|_p$ in this basis. Note that the coefficient $\omega_{1 \cdots n}$ can differ from point to point, and thus is a function on the coordinate patch, which can be expressed as a function of $n$ variables, namely $\omega_{1 \cdots n}(x^1, \ldots, x^n)$.
We will use $\Lambda_M(l)$ to represent the collection of all the l-forms on $M$.
Definition 3 The exterior differentiation operator on a manifold $M$ is the map $\mathrm{d} : \Lambda_M(l) \to \Lambda_M(l + 1)$, which can be defined as

$$(\mathrm{d}\omega)_{b a_1 \cdots a_l} := (l + 1) \nabla_{[b} \omega_{a_1 \cdots a_l]} , \tag{5.1.11}$$



where $\nabla_b$ is an arbitrary torsion-free derivative operator¹ (since it can be shown from $C^c{}_{ab} = C^c{}_{ba}$ that for arbitrary $\nabla$ and $\tilde{\nabla}$ we have $\tilde{\nabla}_{[b} \omega_{\cdots]} = \nabla_{[b} \omega_{\cdots]}$). Fundamentally, one does not have to assign a derivative operator (or any additional structure, e.g., a metric) to $M$ before defining the exterior differentiation operator.
Example 1 We have defined $(\mathrm{d}f)_a$ in Sect. 2.3, and we also know from (3.1.1) that $(\mathrm{d}f)_a = \nabla_a f$. Thus, $(\mathrm{d}f)_a$ is the exterior differentiation of $f \in \Lambda_M(0)$. This is exactly the reason why we used the symbol $\mathrm{d}f$.
An advantage of writing an l-form field $\omega$ in terms of the dual coordinate basis expansion (5.1.8) is the convenience of computing $\mathrm{d}\omega$. See the following theorem:

Theorem 5.1.4 Suppose $\omega_{a_1 \cdots a_l} = \sum_C \omega_{\mu_1 \cdots \mu_l} (\mathrm{d}x^{\mu_1})_{a_1} \wedge \cdots \wedge (\mathrm{d}x^{\mu_l})_{a_l}$, then

$$(\mathrm{d}\omega)_{b a_1 \cdots a_l} = \sum_C (\mathrm{d}\omega_{\mu_1 \cdots \mu_l})_b \wedge (\mathrm{d}x^{\mu_1})_{a_1} \wedge \cdots \wedge (\mathrm{d}x^{\mu_l})_{a_l} . \tag{5.1.12}$$

Proof Exercise 5.4. Hint: choose the ordinary derivative operator $\partial_a$ of this coordinate system as the $\nabla_b$ in (5.1.11). □
Theorem 5.1.5 $\mathrm{d} \circ \mathrm{d} = 0$.
Proof Choosing the ordinary derivative operator $\partial_a$ of an arbitrary coordinate system as the $\nabla_b$ in (5.1.11) yields

$$[\mathrm{d}(\mathrm{d}\omega)]_{c b a_1 \cdots a_l} = (l + 2)(l + 1)\, \partial_{[c} \partial_{[b} \omega_{a_1 \cdots a_l]]} = (l + 2)(l + 1)\, \partial_{[[c} \partial_{b]} \omega_{a_1 \cdots a_l]} = 0 ,$$

where Theorem 2.6.2 (b) is used in the second equality, and $\partial_{[a} \partial_{b]} T^{\cdots}{}_{\cdots} = 0$ from Sect. 3.1 is used in the third equality. □
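As a quick illustration of Theorems 5.1.4 and 5.1.5, the following worked computation may help. The 1-form $\omega = xy\,\mathrm{d}z$ on $\mathbb{R}^3$ with natural coordinates is a choice made here for illustration only and does not come from the text. Applying (5.1.12) twice gives:

```latex
\begin{aligned}
\omega &= xy\,\mathrm{d}z , \\
\mathrm{d}\omega &= \mathrm{d}(xy) \wedge \mathrm{d}z
  = y\,\mathrm{d}x \wedge \mathrm{d}z + x\,\mathrm{d}y \wedge \mathrm{d}z , \\
\mathrm{d}(\mathrm{d}\omega) &= \mathrm{d}y \wedge \mathrm{d}x \wedge \mathrm{d}z
  + \mathrm{d}x \wedge \mathrm{d}y \wedge \mathrm{d}z
  = -\,\mathrm{d}x \wedge \mathrm{d}y \wedge \mathrm{d}z
  + \mathrm{d}x \wedge \mathrm{d}y \wedge \mathrm{d}z = 0 .
\end{aligned}
```

The cancellation in the last line is the component form of the antisymmetry argument in the proof above: the two terms differ by exchanging the first two factors of the wedge product.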
Definition 4 Suppose $\omega$ is an l-form field on $M$. $\omega$ is said to be closed if $\mathrm{d}\omega = 0$; $\omega$ is said to be exact if there exists an (l − 1)-form field $\mu$ such that $\omega = \mathrm{d}\mu$.
Remark 1 Theorem 5.1.5 can be expressed alternatively as follows: if $\omega$ is exact, then $\omega$ is closed. However, for the converse to be true one has to impose an additional requirement on $M$. The requirement is omitted here; what the reader has to know is that the trivial manifold $\mathbb{R}^n$ satisfies this requirement. Since any manifold is locally trivial, one concludes that a closed l-form field on any manifold must be at least locally exact. That is, suppose $\omega$ is a closed l-form field on a manifold $M$, then for any point $p$ of $M$ there must be a neighborhood $N$ on which there exists an (l − 1)-form field $\mu$ such that $\omega = \mathrm{d}\mu$.
Corollary 5.1.6 When $M = \mathbb{R}^2$, Theorem 5.1.5 and its converse give the following proposition in standard calculus: given functions $X(x, y)$ and $Y(x, y)$, a necessary and sufficient condition for the existence of a function $f(x, y)$ such that $\mathrm{d}f = X\,\mathrm{d}x + Y\,\mathrm{d}y$ is $\partial X/\partial y = \partial Y/\partial x$.

¹ This definition is sufficient for this text, but the general definition of exterior differentiation does not require the torsion-free condition; see, e.g., Warner (1983); Chern et al. (1999).

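Proof It follows from Theorem 5.1.4 that the exterior differentiation of the 1-form field $X\,\mathrm{d}x + Y\,\mathrm{d}y$ is

$$\begin{aligned}
\mathrm{d}(X\,\mathrm{d}x + Y\,\mathrm{d}y) &= \mathrm{d}X \wedge \mathrm{d}x + \mathrm{d}Y \wedge \mathrm{d}y \\
&= \left( \frac{\partial X}{\partial x}\,\mathrm{d}x + \frac{\partial X}{\partial y}\,\mathrm{d}y \right) \wedge \mathrm{d}x + \left( \frac{\partial Y}{\partial x}\,\mathrm{d}x + \frac{\partial Y}{\partial y}\,\mathrm{d}y \right) \wedge \mathrm{d}y \\
&= \frac{\partial X}{\partial y}\,\mathrm{d}y \wedge \mathrm{d}x + \frac{\partial Y}{\partial x}\,\mathrm{d}x \wedge \mathrm{d}y = \left( \frac{\partial Y}{\partial x} - \frac{\partial X}{\partial y} \right) \mathrm{d}x \wedge \mathrm{d}y .
\end{aligned} \tag{5.1.13}$$

(A) If there exists a function $f$ such that the equality $\mathrm{d}f = X\,\mathrm{d}x + Y\,\mathrm{d}y$ of 1-form fields holds, then it follows from (5.1.13) that

$$\left( \frac{\partial Y}{\partial x} - \frac{\partial X}{\partial y} \right) \mathrm{d}x \wedge \mathrm{d}y = \mathrm{d}\mathrm{d}f = 0 .$$

Hence, $\partial X/\partial y = \partial Y/\partial x$.
(B) If $\partial X/\partial y = \partial Y/\partial x$, then it follows from (5.1.13) that $\mathrm{d}(X\,\mathrm{d}x + Y\,\mathrm{d}y) = 0$, namely the 1-form field $X\,\mathrm{d}x + Y\,\mathrm{d}y$ is closed. Therefore, $X\,\mathrm{d}x + Y\,\mathrm{d}y$ is exact (because $M = \mathbb{R}^2$), i.e., there exists a function $f$ such that $X\,\mathrm{d}x + Y\,\mathrm{d}y = \mathrm{d}f$. □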
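Two concrete pairs $(X, Y)$ may make the criterion of Corollary 5.1.6 more tangible. Both pairs are illustrative choices made here, not examples from the text; the first satisfies the condition and the second does not.

```latex
\begin{aligned}
&X = 2xy,\quad Y = x^2: &&
\frac{\partial X}{\partial y} = 2x = \frac{\partial Y}{\partial x}
\ \Longrightarrow\ f = x^2 y,\quad
\mathrm{d}f = 2xy\,\mathrm{d}x + x^2\,\mathrm{d}y ; \\
&X = -y,\quad Y = x: &&
\frac{\partial X}{\partial y} = -1 \neq 1 = \frac{\partial Y}{\partial x}
\ \Longrightarrow\
\mathrm{d}(-y\,\mathrm{d}x + x\,\mathrm{d}y) = 2\,\mathrm{d}x \wedge \mathrm{d}y \neq 0 .
\end{aligned}
```

In the second case (5.1.13) shows directly that the 1-form is not closed, so by part (A) of the proof no function $f$ with $\mathrm{d}f = -y\,\mathrm{d}x + x\,\mathrm{d}y$ can exist.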
[Optional Reading 5.1.2]
When we say a property holds locally on a manifold M, we mean ∀ p ∈ M ∃ a neigh-
borhood N of p such that this property holds on N . What is important is that ∀ p ∈ M
there is such an N ; thus, “holds locally” does not mean it only holds in a local area but not
anywhere else. The crucial point of the word “local” is to emphasize it does not necessarily
hold (globally) on the whole manifold M. We hereby give three examples to help the reader
understand this.
1. It is often said that “any manifold looks locally like $\mathbb{R}^n$”. The precise meaning is that every point $p$ of $M$ has a coordinate neighborhood $O$ such that there exists a homeomorphism (which can be promoted to a diffeomorphism) $\psi : O \to \psi[O] \subset \mathbb{R}^n$, and thus $O$ and $\psi[O]$ “cannot be more alike”. One can always choose an $O$ such that $\psi[O]$ is homeomorphic to $\mathbb{R}^n$, and hence $M$ looks locally like $\mathbb{R}^n$. However, $M$ may not look globally like $\mathbb{R}^n$, i.e., there may not exist a diffeomorphism from $M$ to $\mathbb{R}^n$.
2. “A closed l-form field is locally exact” means that ∀ $p \in M$ ∃ a neighborhood $N$ of $p$ and an (l − 1)-form field $\mu$ on $N$ such that $\omega = \mathrm{d}\mu$ on $N$. However, there may not exist a global (l − 1)-form field $\mu$ on $M$ that satisfies $\omega = \mathrm{d}\mu$.
3. “A Möbius strip (see Fig. 5.3) looks locally like C 2 (a cylinder)” means that ∀ p ∈ M
∃ an open neighborhood N of p such that N is diffeomorphic to an open subset of C 2 .
However, there does not exist a diffeomorphism from the whole Möbius strip to C 2 .
The properties involved in the three examples above all hold locally, which demonstrates
the importance of distinguishing local properties from global properties.
[The End of Optional Reading 5.1.2]

5.2 Integration on Manifolds

First, we take the 3-dimensional Euclidean space $(\mathbb{R}^3, \delta_{ab})$ as an example. Suppose $\vec{v}$ is a vector field, $L$ is a smooth curve, and $S$ is a smooth surface. Before we specify the direction of $L$ (the arrow in Fig. 5.1) and the normal direction of $S$ (the arrow $\vec{n}$ in Fig. 5.2), both the integrals $\int_L \vec{v} \cdot \mathrm{d}\vec{l}$ and $\iint_S \vec{v} \cdot \mathrm{d}\vec{S}$ can only be determined up to a sign. By extension, one should assign an “orientation” to a manifold before calculating an integral on it. However, not all manifolds are orientable.

Fig. 5.1 A curve in Euclidean space. The arrow represents the assigned direction of the integral

Fig. 5.2 A surface in Euclidean space. $\vec{n}$ is the assigned normal direction

Fig. 5.3 Möbius strip (an example of a non-orientable manifold)

Definition 1 An n-dimensional manifold is said to be orientable if there exists on it a $C^0$ nowhere vanishing n-form field $\boldsymbol{\varepsilon}$.

Example 1 $\mathbb{R}^3$ is an orientable manifold, since there exists a $C^\infty$ 3-form field $\boldsymbol{\varepsilon} \equiv \mathrm{d}x \wedge \mathrm{d}y \wedge \mathrm{d}z$ on $\mathbb{R}^3$, where $x, y, z$ are natural coordinates.

Example 2 A Möbius strip is a non-orientable manifold (Fig. 5.3).

Definition 2 If a $C^0$ nowhere vanishing n-form field $\boldsymbol{\varepsilon}$ is given on an n-dimensional orientable manifold $M$, then we say $M$ is oriented. Suppose $\boldsymbol{\varepsilon}_1$ and $\boldsymbol{\varepsilon}_2$ are two different $C^0$ nowhere vanishing n-form fields. If there exists a function $h$ that is positive everywhere such that $\boldsymbol{\varepsilon}_1 = h \boldsymbol{\varepsilon}_2$, then we say $\boldsymbol{\varepsilon}_1$ and $\boldsymbol{\varepsilon}_2$ provide the same orientation to $M$.

Remark 1 From the orientation point of view, $\boldsymbol{\varepsilon}_1$ and $\boldsymbol{\varepsilon}_2$ that satisfy $\boldsymbol{\varepsilon}_1 = h \boldsymbol{\varepsilon}_2$ ($h > 0$) are equivalent. Since the collection of all the n-forms at each point of an n-dimensional manifold $M$ is a 1-dimensional vector space (see (5.1.4)), for any two n-form fields $\boldsymbol{\varepsilon}_1$ and $\boldsymbol{\varepsilon}_2$ with $\boldsymbol{\varepsilon}_2$ nowhere vanishing there must be $\boldsymbol{\varepsilon}_1 = h \boldsymbol{\varepsilon}_2$, where $h$ is a (not necessarily positive) function on $M$. If $\boldsymbol{\varepsilon}_1$ and $\boldsymbol{\varepsilon}_2$ are nowhere vanishing, then $h$ is nowhere vanishing; if $\boldsymbol{\varepsilon}_1$ and $\boldsymbol{\varepsilon}_2$ are $C^0$, then $h$ is $C^0$. For a connected manifold² (we will only talk about connected manifolds), a continuous nowhere vanishing function can only be either positive everywhere or negative everywhere. Thus, a connected orientable manifold can only have two orientations.
Definition 3 After we choose an orientation on $M$ represented by $\boldsymbol{\varepsilon}$, a basis field $\{(e_\mu)^a\}$ on an open subset $O \subset M$ is said to be right-handed measured by $\boldsymbol{\varepsilon}$ if there exists a function $h$ on $O$ that is positive everywhere such that $\boldsymbol{\varepsilon} = h\,(e^1)_{a_1} \wedge \cdots \wedge (e^n)_{a_n}$, where $\{(e^\mu)_a\}$ is the dual basis of $\{(e_\mu)^a\}$ (otherwise it is said to be left-handed). A coordinate system is called a right (left)-handed system if its coordinate basis is right (left)-handed.
Now we introduce the integral of an n-form field $\boldsymbol{\omega}$ on an n-dimensional oriented manifold $M$. $\boldsymbol{\omega}$ can be expanded using the wedge product $\mathrm{d}x^1 \wedge \cdots \wedge \mathrm{d}x^n$ of a dual coordinate basis as [see (5.1.10′)]

$$\boldsymbol{\omega} = \omega_{1 \cdots n}(x^1, \ldots, x^n)\, \mathrm{d}x^1 \wedge \cdots \wedge \mathrm{d}x^n . \tag{5.2.1}$$

Thus, each n-form field $\boldsymbol{\omega}$ gives rise to a function of $n$ variables, i.e., $\omega_{1 \cdots n}(x^1, \ldots, x^n)$, in the coordinate patch. We call the n-tuple integral of this function of $n$ variables the integral of the n-form field $\boldsymbol{\omega}$; the precise definition is as follows:
Definition 4 Suppose $(O, \psi)$ is a right-handed coordinate system on an n-dimensional oriented manifold $M$, and $\boldsymbol{\omega}$ is a continuous n-form field on an open subset $G \subset O$, then the integral of $\boldsymbol{\omega}$ on $G$ is defined as

$$\int_G \boldsymbol{\omega} := \int_{\psi[G]} \omega_{1 \cdots n}(x^1, \ldots, x^n)\, \mathrm{d}x^1 \cdots \mathrm{d}x^n . \tag{5.2.2}$$

The right-hand side of the above equation is just the standard integral³ of a function of $n$ variables on an open subset $\psi[G]$ of $\mathbb{R}^n$, which is already well-defined.
Remark 2 (1) To show the validity of Definition 4, one should also prove that the integral of $\boldsymbol{\omega}$ on $G$ does not depend on the choice of the right-handed system. We only prove the case $n = 2$ below as an example; the reader should carry over the proof to the general case.
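² A topological space $(X, \mathcal{T})$ is said to be connected if it only has two subsets that are both open and closed (Definition 7 of Sect. 1.2), and is said to be arcwise connected if any two points in $X$ can be joined by a continuous curve in $X$. A manifold is said to be connected (or arcwise connected) if its base topological space is connected (or arcwise connected). For a topological space, an arcwise connected space must be connected, but a connected space is not necessarily arcwise connected (there exist “sideswipe” counterexamples). For a manifold, arcwise connected is equivalent to connected [see Abraham and Marsden (1978) Proposition 1.1.33].
³ Namely, the Riemann or Lebesgue integral.

Suppose $(O, \psi)$ and $(O', \psi')$ are right-handed coordinate systems that satisfy $G \subset O \cap O'$. The coordinates of these two systems are denoted by $x^1, x^2$ and $x'^1, x'^2$, respectively, then

$$\boldsymbol{\omega} = \omega_{12}\, \mathrm{d}x^1 \wedge \mathrm{d}x^2 = \omega'_{12}\, \mathrm{d}x'^1 \wedge \mathrm{d}x'^2 .$$

Let $\int_G \boldsymbol{\omega} \equiv \int_{\psi[G]} \omega_{12}\, \mathrm{d}x^1 \mathrm{d}x^2$ and $\left( \int_G \boldsymbol{\omega} \right)' \equiv \int_{\psi'[G]} \omega'_{12}\, \mathrm{d}x'^1 \mathrm{d}x'^2$. We want to prove

$$\left( \int_G \boldsymbol{\omega} \right)' = \int_G \boldsymbol{\omega} . \tag{5.2.3}$$

From the tensor transformation law we see that

$$\omega'_{12} = \frac{\partial x^1}{\partial x'^1} \frac{\partial x^2}{\partial x'^2}\, \omega_{12} + \frac{\partial x^2}{\partial x'^1} \frac{\partial x^1}{\partial x'^2}\, \omega_{21} = \omega_{12} \det\!\left( \frac{\partial x^\mu}{\partial x'^\nu} \right) ,$$

where

$$\det\!\left( \frac{\partial x^\mu}{\partial x'^\nu} \right) \equiv \begin{vmatrix} \dfrac{\partial x^1}{\partial x'^1} & \dfrac{\partial x^1}{\partial x'^2} \\[2mm] \dfrac{\partial x^2}{\partial x'^1} & \dfrac{\partial x^2}{\partial x'^2} \end{vmatrix}$$

is the Jacobian of this coordinate transformation, which is positive here because both systems are right-handed. According to a well-known law in multivariable calculus,

$$\int_{\psi[G]} \omega_{12}\, \mathrm{d}x^1 \mathrm{d}x^2 = \int_{\psi'[G]} \omega_{12} \det(\partial x^\mu/\partial x'^\nu)\, \mathrm{d}x'^1 \mathrm{d}x'^2 = \int_{\psi'[G]} \omega'_{12}\, \mathrm{d}x'^1 \mathrm{d}x'^2 , \tag{5.2.4}$$

and hence (5.2.3) is proved.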
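However, if $\{x^\mu\}$ and $\{x'^\mu\}$ are right-handed and left-handed systems, respectively, then we have $\det(\partial x^\mu/\partial x'^\nu) < 0$. From multivariable calculus we know that the $\det(\partial x^\mu/\partial x'^\nu)$ on the right-hand side of the first equality in (5.2.4) should be changed to $|\det(\partial x^\mu/\partial x'^\nu)| = -\det(\partial x^\mu/\partial x'^\nu)$, and hence (5.2.4) turns into

$$\int_{\psi[G]} \omega_{12}\, \mathrm{d}x^1 \mathrm{d}x^2 = -\int_{\psi'[G]} \omega_{12} \det(\partial x^\mu/\partial x'^\nu)\, \mathrm{d}x'^1 \mathrm{d}x'^2 = -\int_{\psi'[G]} \omega'_{12}\, \mathrm{d}x'^1 \mathrm{d}x'^2 . \tag{5.2.5}$$

Therefore, to make sure the definition of the integral is consistent, when $\{x^\mu\}$ is left-handed $\int_G \boldsymbol{\omega}$ should be defined as

$$\int_G \boldsymbol{\omega} := -\int_{\psi[G]} \omega_{1 \cdots n}(x^1, \ldots, x^n)\, \mathrm{d}x^1 \cdots \mathrm{d}x^n . \tag{5.2.6}$$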

(2) Whether a coordinate system is right-handed or left-handed is determined by the orientation one chooses for the manifold. Hence, $\int_G \boldsymbol{\omega}$ defined by (5.2.2) and (5.2.6) depends on the orientation given by $\boldsymbol{\varepsilon}$, and the sign of the integral will change when the orientation is changed.
(3) Definition 4 only defines the integral of $\boldsymbol{\omega}$ on an open subset $G$ in the coordinate patch. The integral of $\boldsymbol{\omega}$ over the whole manifold $M$ can be defined by “sewing” the local integrals, which entails the concept of a “partition of unity”. [The reader may refer to Wald (1984).]
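The coordinate independence discussed in Remark 2 (1), together with the sign convention (5.2.6), can be checked numerically. The sketch below is illustrative only and rests on stated assumptions: the 2-form $\boldsymbol{\omega} = (x^2 + y^2)\,\mathrm{d}x \wedge \mathrm{d}y$, the region $G$ (the open quarter of the unit disk in the first quadrant), the orientation $\mathrm{d}x \wedge \mathrm{d}y$, and the helper `midpoint_2d` are all choices made here, not taken from the text. Polar coordinates $(r, \theta)$ have Jacobian $r > 0$ and are right-handed; the swapped system $(\theta, r)$ has Jacobian $-r$ and is left-handed.

```python
import math

def midpoint_2d(f, a, b, c, d, n=300):
    """Midpoint-rule approximation of the double integral of f over [a, b] x [c, d]."""
    hu, hv = (b - a) / n, (d - c) / n
    total = 0.0
    for i in range(n):
        u = a + (i + 0.5) * hu
        for j in range(n):
            v = c + (j + 0.5) * hv
            total += f(u, v)
    return total * hu * hv

def omega_12(x, y):
    """Component of omega in the Cartesian system {x, y} (illustrative choice)."""
    return x * x + y * y

def omega_polar(r, theta):
    """Component in (r, theta): omega_12 times det d(x, y)/d(r, theta) = r."""
    return omega_12(r * math.cos(theta), r * math.sin(theta)) * r

def omega_swapped(theta, r):
    """Component in (theta, r): omega_12 times det d(x, y)/d(theta, r) = -r."""
    return omega_12(r * math.cos(theta), r * math.sin(theta)) * (-r)

# Definition (5.2.2) in the Cartesian system, with an indicator function for G.
cartesian = midpoint_2d(
    lambda x, y: omega_12(x, y) if x * x + y * y < 1.0 else 0.0,
    0.0, 1.0, 0.0, 1.0, n=400)

# Definition (5.2.2) in the right-handed polar system.
right_handed = midpoint_2d(omega_polar, 0.0, 1.0, 0.0, math.pi / 2)

# Raw coordinate integral in the left-handed system, then the extra minus sign of (5.2.6).
left_handed_raw = midpoint_2d(omega_swapped, 0.0, math.pi / 2, 0.0, 1.0)
left_handed = -left_handed_raw
```

All three values approximate $\pi/8$, the exact value of $\int_0^{\pi/2}\!\int_0^1 r^3\,\mathrm{d}r\,\mathrm{d}\theta$. The Cartesian estimate is the least accurate because the indicator function is discontinuous along the boundary of $G$.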

Suppose $S$ and $M$ are manifolds with dimensions $l$ and $n$ ($> l$), respectively, and $\phi : S \to M$ is an embedding (see Sect. 4.4). Since $\phi[S]$ is an l-dimensional submanifold, of course we can talk about the integral of an l-form field $\mu$ on it (Definition 4 applies). However, the fact that “$\phi[S]$ is embedded in $M$” leads to two possible meanings of “an l-form field on $\phi[S]$”. Just like “a vector field on $\phi[S]$” can be tangent or not tangent to $\phi[S]$, “an l-form field on $\phi[S]$” can also be classified as “tangent to” or not “tangent to” $\phi[S]$. Precisely speaking, an l-form field $\mu$ on $\phi[S]$ is said to be “tangent to” $\phi[S]$ if $\forall q \in \phi[S]$, $\mu|_q$ is an l-form on $W_q$ (rather than $V_q$); that is, $\mu|_q$ is a multilinear map that turns $l$ arbitrary elements of $W_q$ into a real number. Since we consider $\phi[S]$ as an independent manifold when we talk about the integral of an l-form on $\phi[S]$ (and do not care about the “outside” situation), only an l-form $\mu$ that is “tangent to” $\phi[S]$ is meaningful. Nevertheless, since an l-form field $\mu$ on $\phi[S]$ that is not “tangent to” $\phi[S]$ is a multilinear map that turns $l$ arbitrary elements of $V_q$ (rather than only $W_q$) at each point $q \in \phi[S]$ into a real number, and $W_q$ is nothing but a subspace of $V_q$, we can obtain an l-form that is “tangent to” $\phi[S]$ by just restricting the domain of $\mu$ to $W_q$. We denote it by $\tilde{\mu}$ and call it the restriction of $\mu$. Precisely, we have the following definition:
Definition 5 Suppose $\mu_{a_1 \cdots a_l}$ is an l-form field on an l-dimensional submanifold $\phi[S] \subset M$. An l-form field $\tilde{\mu}_{a_1 \cdots a_l}$ on $\phi[S]$ (viewed as a manifold independent of $M$) is called the restriction of the l-form field $\mu_{a_1 \cdots a_l}$ on $\phi[S]$ if

$$\tilde{\mu}_{a_1 \cdots a_l}|_q\, (w_1)^{a_1} \cdots (w_l)^{a_l} = \mu_{a_1 \cdots a_l}|_q\, (w_1)^{a_1} \cdots (w_l)^{a_l} , \quad \forall q \in \phi[S] ,\ (w_1)^a, \ldots, (w_l)^a \in W_q . \tag{5.2.7}$$

Similar to the induced metric (see Definition 5 of Sect. 4.4), from the perspective that a submanifold is the embedding map $\phi : S \to M$ itself, the restriction of a form $\mu$ is essentially the pullback $\phi^* \mu$ on $S$. In particular, one can show that the integral of the $\tilde{\mu}$ in Definition 5 satisfies

$$\int_{\phi[S]} \tilde{\mu} = \int_S \phi^* \mu .$$

Later on, whenever we talk about the integral of an l-form field $\mu$ over an l-dimensional submanifold $\phi[S]$, one should always interpret it as the integral of the restriction of $\mu$, i.e., always interpret $\int_{\phi[S]} \mu$ as $\int_{\phi[S]} \tilde{\mu}$ or $\int_S \phi^* \mu$.

5.3 Stokes’s Theorem

In the 3-dimensional Euclidean space, the Stokes theorem

$$\iint_S (\vec{\nabla} \times \vec{A}) \cdot \mathrm{d}\vec{S} = \oint_L \vec{A} \cdot \mathrm{d}\vec{l}$$

and Gauss’s theorem

$$\iiint_V (\vec{\nabla} \cdot \vec{A})\, \mathrm{d}V = \oint_S \vec{A} \cdot \vec{n}\, \mathrm{d}S$$

share a common property in that they manifest a relationship between an integral over a region and an integral on the boundary. Before we bring up the general Stokes theorem, we first introduce the concept of “a manifold with boundary”. The simplest example for an n-dimensional manifold with boundary is

$$\mathbb{R}^n_- := \{(x^1, \ldots, x^n) \in \mathbb{R}^n \mid x^1 \leq 0\} ,$$

where $x^1, \ldots, x^n$ are natural coordinates. The subset formed by all the points with $x^1 = 0$ is called the boundary of $\mathbb{R}^n_-$, which by itself is an (n − 1)-dimensional manifold (in fact it is just $\mathbb{R}^{n-1}$). Carrying over to the general case, an n-dimensional manifold $N$ with boundary is defined in a way similar to an n-dimensional manifold, except that the $\mathbb{R}^n$ in that definition is changed to $\mathbb{R}^n_-$. That is, each element in the open cover $\{O_\alpha\}$ of $N$ should be homeomorphic to an open subset of $\mathbb{R}^n_-$; all the points in $N$ that are mapped to $x^1 = 0$ (such as $p$ in Fig. 5.4) form the boundary of $N$, denoted by $\partial N$. Note that $\partial N$ is an (n − 1)-dimensional manifold; $\mathrm{i}(N) \equiv N - \partial N$ is an n-dimensional manifold. For instance, a solid ball $B$ in $\mathbb{R}^3$ is a 3-dimensional manifold with boundary, whose boundary (a 2-sphere) is a 2-dimensional manifold, while $\mathrm{i}(B)$ is a 3-dimensional manifold.
Theorem 5.3.1 (Stokes’s Theorem) Suppose a compact subset $N$ of an n-dimensional oriented manifold $M$ is an n-dimensional manifold with boundary, and $\boldsymbol{\omega}$ is an (n − 1)-form field (whose differentiability is at least $C^1$) on $M$, then

$$\int_{\mathrm{i}(N)} \mathrm{d}\boldsymbol{\omega} = \int_{\partial N} \boldsymbol{\omega} . \tag{5.3.1}$$

Proof See, for example, Chern et al. (1999). □


Remark 1 Restricting the orientation $\boldsymbol{\varepsilon}$ of $M$ to $N$ yields the orientation of $N$, also denoted by $\boldsymbol{\varepsilon}$, which naturally induces an orientation on the boundary $\partial N$ of $N$, denoted by $\bar{\boldsymbol{\varepsilon}}$, short for $\bar{\varepsilon}_{a_1 \cdots a_{n-1}}$. Take $\mathbb{R}^2_-$ as an example, in which $M = \mathbb{R}^2$, $N = \mathbb{R}^2_-$, $\partial N = \{(x^1, x^2) \mid x^1 = 0\}$. Suppose the orientation of $\mathbb{R}^2$ (and, consequently, $\mathbb{R}^2_-$) is $\varepsilon_{ab} = (\mathrm{d}x^1)_a \wedge (\mathrm{d}x^2)_b$, then $\{x^1, x^2\}$ is a right-handed system measured by $\varepsilon_{ab}$. Since $x^1|_{\partial N} = 0$, after getting rid of $x^1$, $\{x^2\}$ is a coordinate system of $\partial N$. We define $\bar{\varepsilon}_a$ as the induced orientation of $\partial N$ such that $\{x^2\}$ is a right-handed system measured by $\bar{\varepsilon}_a$. This requirement can be satisfied by choosing $\bar{\varepsilon}_a = (\mathrm{d}x^2)_a$. This basic requirement of an induced orientation can be generalized to any manifold $N$ with boundary [for details, see Wald (1984) p. 431]. The left-hand side of (5.3.1) is the integral of an n-form field $\mathrm{d}\boldsymbol{\omega}$ over an n-dimensional manifold $\mathrm{i}(N)$ (with $\boldsymbol{\varepsilon}$ as its orientation), and the right-hand side is the integral of an (n − 1)-form field $\boldsymbol{\omega}$ over an (n − 1)-dimensional manifold $\partial N$ (with $\bar{\boldsymbol{\varepsilon}}$ as its orientation).

Fig. 5.4 A diagrammatic sketch for a manifold $N$ with boundary, in which $p$ is a boundary point

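Example 1 Suppose $\vec{A}$ is a vector field on the 2-dimensional Euclidean space, $L$ is a smooth closed curve in $\mathbb{R}^2$, $S$ is the open subset surrounded by $L$ (see Fig. 5.5), and $x^1$ and $x^2$ are Cartesian coordinates. Then, the familiar Stokes theorem for 2-dimensional Euclidean space (also called Green’s theorem) is

$$\iint_S \left( \partial A_2/\partial x^1 - \partial A_1/\partial x^2 \right) \mathrm{d}x^1 \mathrm{d}x^2 = \oint_L A_l\, \mathrm{d}l . \tag{5.3.2}$$

Fig. 5.5 Stokes’s theorem for 2-dimensional Euclidean space

Now we will show that the equation above is a special case of Theorem 5.3.1. Let $M = \mathbb{R}^2$, then $S \cup L$ can be treated as the $N$ in Theorem 5.3.1, where $S$ and $L$ serve as $\mathrm{i}(N)$ and $\partial N$, respectively. If we turn $A^a$ into a 1-form field using the Euclidean metric $\delta_{ab}$, then $A_a$ can be treated as the $\boldsymbol{\omega}$ in Theorem 5.3.1. Expand $A_a$ using the dual coordinate basis vectors of the Cartesian system: $\boldsymbol{\omega} = A_a = A_\mu (\mathrm{d}x^\mu)_a$, then

$$\begin{aligned}
\mathrm{d}\boldsymbol{\omega} &= \mathrm{d}A_\mu \wedge \mathrm{d}x^\mu = \frac{\partial A_\mu}{\partial x^\nu}\, \mathrm{d}x^\nu \wedge \mathrm{d}x^\mu = \frac{\partial A_1}{\partial x^2}\, \mathrm{d}x^2 \wedge \mathrm{d}x^1 + \frac{\partial A_2}{\partial x^1}\, \mathrm{d}x^1 \wedge \mathrm{d}x^2 \\
&= \left( \frac{\partial A_2}{\partial x^1} - \frac{\partial A_1}{\partial x^2} \right) \mathrm{d}x^1 \wedge \mathrm{d}x^2 .
\end{aligned}$$

Thus, the left-hand side of (5.3.2) can be expressed as $\int_{\mathrm{i}(N)} \mathrm{d}\boldsymbol{\omega}$, which means it is a special case of the left-hand side of (5.3.1). On the other hand, the right-hand side of (5.3.1) is $\int_{\partial N} \boldsymbol{\omega} = \int_{\partial N} \tilde{\boldsymbol{\omega}}$. Setting the arc length $l$ as the local coordinate of $L$, expanding $\tilde{\boldsymbol{\omega}}$ using the dual coordinate basis vector as $\tilde{\omega}_a = \tilde{\omega}_1(l) (\mathrm{d}l)_a$, and contracting both sides with $(\partial/\partial l)^a$, we have

$$\tilde{\omega}_1(l) = \tilde{\omega}_a (\partial/\partial l)^a = \omega_a (\partial/\partial l)^a = A_a (\partial/\partial l)^a = A_l ,$$

and hence $\tilde{\boldsymbol{\omega}} = A_l\, \mathrm{d}l$. Therefore, the right-hand side of (5.3.2) can be written as

$$\oint_L A_l\, \mathrm{d}l = \int_{\partial N} \boldsymbol{\omega} . \tag{5.3.3}$$

Thus, we can see that (5.3.2) is a special case of (5.3.1).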
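Equation (5.3.2) can be sanity-checked numerically for a specific field. The following is a minimal sketch under assumptions made here for illustration: the field components $A_1 = -(x^2)^3$, $A_2 = (x^1)^3$, the unit disk as $S$, and the counterclockwise unit circle as $L$ are all hypothetical choices, and the surface integral is evaluated in (right-handed) polar coordinates with a simple midpoint rule.

```python
import math

def A(x, y):
    """Covariant components (A_1, A_2) of the illustrative field."""
    return (-y ** 3, x ** 3)

def curl_A(x, y):
    """dA_2/dx^1 - dA_1/dx^2 for the field above, differentiated by hand."""
    return 3 * x * x + 3 * y * y

def surface_integral(n=400):
    """Left-hand side of (5.3.2) over the unit disk, in polar coordinates."""
    hr, ht = 1.0 / n, 2 * math.pi / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * hr
        for j in range(n):
            t = (j + 0.5) * ht
            # the factor r is the Jacobian of (r, theta) -> (x, y)
            total += curl_A(r * math.cos(t), r * math.sin(t)) * r
    return total * hr * ht

def line_integral(n=2000):
    """Right-hand side of (5.3.2): A_l dl around the unit circle, counterclockwise."""
    ht = 2 * math.pi / n
    total = 0.0
    for j in range(n):
        t = (j + 0.5) * ht
        x, y = math.cos(t), math.sin(t)
        a1, a2 = A(x, y)
        # unit tangent (d/dl)^a = (-sin t, cos t); on the unit circle dl = dt
        total += (a1 * (-y) + a2 * x) * ht
    return total

lhs = surface_integral()
rhs = line_integral()
```

For this field both sides equal $3\pi/2$: the surface integrand is $3r^2$, and on the circle $A_l = \sin^4 t + \cos^4 t$. The counterclockwise direction is the one compatible with the induced orientation described in Remark 1, where the region lies on the side $x^1 \leq 0$ of a boundary traversed along increasing $x^2$.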


[Optional Reading 5.3.1]
There is one thing we need to make clear in the derivation of (5.3.3). The integral region of $\int_L \tilde{\boldsymbol{\omega}}$ is a closed curve $L$, which is a 1-dimensional non-trivial manifold that takes at least two coordinate patches to cover. Therefore, one should perform the local integral for each coordinate patch and then “sew” them together. Luckily, here we can deal with it in a simple way: suppose $L'$ is the manifold obtained by removing a point from $L$, then $L'$ can be covered by one coordinate patch, and since removing a point does not affect the value of the integral, the derivation is valid.
[The End of Optional Reading 5.3.1]

Now we have introduced the integral of a differential form on a manifold and some
related theorems. To talk about the integral of a function on a manifold, we first
introduce the concept of a volume element in Sect. 5.4.

5.4 Volume Elements

Definition 1 An arbitrary $C^0$ and nowhere vanishing n-form field $\boldsymbol{\varepsilon}$ on an n-dimensional orientable manifold $M$ is called a volume element.

Remark 1 The difference between a volume element and an orientation is the following: if $\boldsymbol{\varepsilon}_1$ and $\boldsymbol{\varepsilon}_2$ are two $C^0$ and nowhere vanishing n-form fields, and there is a function $h$ that is positive everywhere such that $\boldsymbol{\varepsilon}_1 = h \boldsymbol{\varepsilon}_2$, then $\boldsymbol{\varepsilon}_1$ and $\boldsymbol{\varepsilon}_2$ represent the same orientation; however, as long as $\boldsymbol{\varepsilon}_1 \neq \boldsymbol{\varepsilon}_2$, they are two different volume elements. For an orientable connected manifold, there are only two orientations, while there are infinitely many volume elements. When talking about integration or the volume elements on an orientable manifold, one does not need to assign a metric field to the manifold. The choice of a volume element is quite arbitrary, and no volume element is special. (There is only one requirement: the volume element has to be compatible with the orientation, i.e., the multiplicative factor between the $\boldsymbol{\varepsilon}$ representing the volume element and the $\boldsymbol{\varepsilon}$ representing the orientation is positive.) However, if a metric field $g_{ab}$ is assigned to the manifold, then there exists a natural way to choose a specific volume element.
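First we consider a 2-dimensional oriented manifold with a metric $g_{ab}$. Suppose
$\varepsilon_{a_1 a_2}$ is an arbitrary volume element; then $\varepsilon^{a_1 a_2} \equiv g^{a_1 b_1} g^{a_2 b_2} \varepsilon_{b_1 b_2}$ is meaningful, and
$\varepsilon^{a_1 a_2}\varepsilon_{a_1 a_2}$ is a scalar field that can be computed using any basis. Choose an orthonormal
basis. If $g_{ab}$ is a positive definite metric, then

$$\varepsilon^{a_1 a_2}\varepsilon_{a_1 a_2} = \delta^{\mu_1\nu_1}\delta^{\mu_2\nu_2}\varepsilon_{\nu_1\nu_2}\varepsilon_{\mu_1\mu_2} = \delta^{11}\delta^{22}\varepsilon_{12}\varepsilon_{12} + \delta^{22}\delta^{11}\varepsilon_{21}\varepsilon_{21} = 2(\varepsilon_{12})^2 .$$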


142 5 Differential Forms and Their Integrals

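If $g_{ab}$ is Lorentzian, then

$$\varepsilon^{a_1 a_2}\varepsilon_{a_1 a_2} = \eta^{11}\eta^{22}\varepsilon_{12}\varepsilon_{12} + \eta^{22}\eta^{11}\varepsilon_{21}\varepsilon_{21} = -2(\varepsilon_{12})^2 .$$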

Generalizing to an $n$-dimensional manifold with an arbitrary metric $g_{ab}$ we have

$$\varepsilon^{a_1\cdots a_n}\varepsilon_{a_1\cdots a_n} = (-1)^s\, n!\,(\varepsilon_{1\cdots n})^2 ,$$

where $\varepsilon_{1\cdots n}$ is a component of $\varepsilon_{a_1\cdots a_n}$ in the orthonormal basis, and $s$ is the number
of $-1$ among the components of $g_{ab}$ in the orthonormal basis; for instance, $s = 0$
for positive definite metrics, and $s = 1$ for Lorentzian metrics. To choose a specific
volume element using the given metric, one just needs to impose the following simple
requirement on the components of the volume element $\varepsilon_{a_1\cdots a_n}$ in the orthonormal basis
$\{(e_\mu)^a\}$:

$$\varepsilon_{1\cdots n} = \pm 1 , \tag{5.4.1}$$

i.e.,

$$\varepsilon_{a_1\cdots a_n} = \pm (e^1)_{a_1}\wedge\cdots\wedge(e^n)_{a_n} \quad \text{(for an orthonormal basis)} , \tag{5.4.2}$$

which is equivalent to requiring

$$\varepsilon^{a_1\cdots a_n}\varepsilon_{a_1\cdots a_n} = (-1)^s\, n! . \tag{5.4.3}$$

An $\varepsilon_{a_1\cdots a_n}$ that satisfies the above equation is called the volume element associated
(or compatible) with the metric $g_{ab}$. The above equation determines the
volume element only up to a sign; only together with the requirement that “the volume
element is compatible with the orientation” can the volume element be uniquely
determined. Thus, the $+$ and $-$ signs on the right-hand side of (5.4.2) correspond to
right-handed and left-handed orthonormal bases, respectively.
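The normalization (5.4.3) can be checked by brute force in an orthonormal basis. The following is a minimal sketch, assuming the volume element is normalized by $\varepsilon_{1\cdots n} = +1$ and the metric is diagonal with entries $\pm 1$; the function names are illustrative and not part of the text.

```python
import itertools
import math

def permutation_sign(perm):
    """Return +1 for an even permutation and -1 for an odd one (inversion count)."""
    sign = 1
    for i in range(len(perm)):
        for j in range(i + 1, len(perm)):
            if perm[i] > perm[j]:
                sign = -sign
    return sign

def full_contraction(metric_signs):
    """Compute eps^{a1...an} eps_{a1...an} in an orthonormal basis.

    metric_signs lists the diagonal entries (+1 or -1) of the metric; in such
    a basis the inverse metric has the same diagonal entries. The volume
    element is normalized by eps_{1...n} = +1, as in (5.4.1).
    """
    n = len(metric_signs)
    total = 0
    for perm in itertools.permutations(range(n)):
        eps_lower = permutation_sign(perm)
        # Raising every index multiplies by one diagonal metric entry per index.
        eps_upper = eps_lower * math.prod(metric_signs[i] for i in perm)
        total += eps_upper * eps_lower
    return total

euclidean_3d = full_contraction([1, 1, 1])        # s = 0, n = 3
lorentzian_4d = full_contraction([-1, 1, 1, 1])   # s = 1, n = 4
print(euclidean_3d, lorentzian_4d)
```

Only the $n!$ permutations of distinct indices contribute, which is why the sum runs over permutations rather than over all $n^n$ index values.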
Summary. When dealing with an integral, we are only concerned here with orientable
manifolds.⁴ First, one should choose an orientation and make $M$ an oriented manifold.
Whether a basis is right-handed or left-handed is stipulated by the orientation we choose.
When there is no metric field $g_{ab}$ (or any other available geometric structure), the volume
element is quite arbitrary, except for being required to be compatible with the orientation.
After $g_{ab}$ is assigned, $\varepsilon_{a_1\cdots a_n}$ is uniquely determined by $g_{ab}$ and the
requirement of being compatible with the orientation; it is called the associated volume
element for short. Later on, unless stated otherwise, all the volume elements we
mention when there is a metric will refer to this unique associated volume element.
Choose any right-handed Cartesian system $\{x, y, z\}$ in the 3-dimensional
Euclidean space $(\mathbb{R}^3, \delta_{ab})$ by intuition and assign the orientation using the 3-form
field $\varepsilon = dx\wedge dy\wedge dz$; then according to Definition 3 of Sect. 5.2, $\{x, y, z\}$ is a
right-handed system measured by $\varepsilon$. Comparing $\varepsilon = dx\wedge dy\wedge dz$ and (5.4.2) we
can see that $\varepsilon$ is an associated volume element. Suppose $G$ is an open subset of $\mathbb{R}^3$
and the integral $\int_G dx\,dy\,dz$ exists; then this integral naturally stands for the volume
of $G$ (by the definition of volume in standard calculus). On the other hand, it follows
from Definition 4 of Sect. 5.2 that the integral $\int_G \varepsilon$ of the 3-form field $\varepsilon$ on $G\subset\mathbb{R}^3$
is exactly $\int_G dx\,dy\,dz$, and thus $\int_G \varepsilon$ is the volume of $G$. Generalizing to any oriented
manifold $N$ with a positive definite metric $g_{ab}$: suppose $\varepsilon$ is the associated volume
element; if $\int_N \varepsilon$ exists, then we call it the volume (or length and area for 1- and
2-dimensional manifolds, respectively) of $N$ (measured by $g_{ab}$). This is the reason
why $\varepsilon$ is called a volume element.

⁴ Integration can also be defined on non-orientable manifolds. In this case, one needs the concept
of a “twisted” (also called “odd” or “pseudo”) form, which is outside the scope of this text.
Theorem 5.4.1 Suppose $\varepsilon$ is an associated volume element, $\{(e_\mu)^a\}$ and $\{(e^\mu)_a\}$ are
a basis and its dual basis, $g$ is the determinant of the components of $g_{ab}$ in this basis,
and $|g|$ is the absolute value of $g$; then ($+$ for a right-handed basis and $-$ for a left-handed
basis)

$$\varepsilon_{a_1\cdots a_n} = \pm\sqrt{|g|}\,(e^1)_{a_1}\wedge\cdots\wedge(e^n)_{a_n} . \tag{5.4.4}$$

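Proof [Optional Reading]

From (5.4.3) we know that $\varepsilon$ and the components of $g_{ab}$ in the given basis satisfy

$$(-1)^s n! = \varepsilon^{\mu_1\cdots\mu_n}\varepsilon_{\mu_1\cdots\mu_n} = g^{\mu_1\nu_1}\cdots g^{\mu_n\nu_n}\varepsilon_{\nu_1\cdots\nu_n}\varepsilon_{\mu_1\cdots\mu_n} . \tag{5.4.5}$$

The right-hand side of this equation should be interpreted as summing over each of $\mu_1,\dots,\mu_n$
and $\nu_1,\dots,\nu_n$ from 1 to $n$. Considering the total antisymmetry of $\varepsilon_{\nu_1\cdots\nu_n}$ and $\varepsilon_{\mu_1\cdots\mu_n}$, one
can simplify the summation above into a sum over the permutations. More precisely, let
$\sum_{\pi(\nu_1\cdots\nu_n)}$ represent summing over all the permutations of $1, 2, \dots, n$; then

$$\begin{aligned}
\text{r.h.s. of (5.4.5)} &= \sum_{\pi(\mu_1\cdots\mu_n)}\ \sum_{\pi(\nu_1\cdots\nu_n)} g^{\mu_1\nu_1}\cdots g^{\mu_n\nu_n}\varepsilon_{\nu_1\cdots\nu_n}\varepsilon_{\mu_1\cdots\mu_n} \\
&= \sum_{\pi(\mu_1\cdots\mu_n)} g^{\mu_1 1}g^{\mu_2 2}g^{\mu_3 3}\cdots g^{\mu_n n}\,\varepsilon_{123\cdots n}\,\varepsilon_{\mu_1\cdots\mu_n} \\
&\quad + \sum_{\pi(\mu_1\cdots\mu_n)} g^{\mu_1 2}g^{\mu_2 1}g^{\mu_3 3}\cdots g^{\mu_n n}\,\varepsilon_{213\cdots n}\,\varepsilon_{\mu_1\cdots\mu_n} + \cdots .
\end{aligned} \tag{5.4.5'}$$

There are $n!$ terms on the right-hand side of this equation. Using $\hat{\varepsilon}_{\mu_1\cdots\mu_n}$ to represent the
Levi-Civita symbol, i.e.,

$$\hat{\varepsilon}_{\mu_1\cdots\mu_n} = \begin{cases}
+1 , & \text{(when } \mu_1\cdots\mu_n \text{ is an even permutation of } 1, 2, \dots, n) , \\
-1 , & \text{(when } \mu_1\cdots\mu_n \text{ is an odd permutation of } 1, 2, \dots, n) , \\
\ \ 0 , & \text{(when two of } \mu_1, \dots, \mu_n \text{ are equal)} ,
\end{cases}$$

we have $\varepsilon_{\mu_1\cdots\mu_n} = \varepsilon_{123\cdots n}\,\hat{\varepsilon}_{\mu_1\cdots\mu_n}$. Denote $\sum_{\pi(\mu_1\cdots\mu_n)}$ as $\sum_\pi$ for short; then

$$\text{the first term on the r.h.s. of (5.4.5′)} = (\varepsilon_{123\cdots n})^2 \sum_\pi g^{\mu_1 1}g^{\mu_2 2}g^{\mu_3 3}\cdots g^{\mu_n n}\,\hat{\varepsilon}_{\mu_1\mu_2\mu_3\cdots\mu_n} = (\varepsilon_{123\cdots n})^2 \det(g^{\mu\nu}) ,$$

where $\det(g^{\mu\nu})$ stands for the determinant of the matrix $g^{\mu\nu}$ (the definition of the determinant
is used in the last step). Also,

$$\begin{aligned}
\text{the second term on the r.h.s. of (5.4.5′)}
&= -\sum_\pi g^{\mu_1 2}g^{\mu_2 1}g^{\mu_3 3}\cdots g^{\mu_n n}\,\varepsilon_{123\cdots n}\,\varepsilon_{\mu_1\mu_2\mu_3\cdots\mu_n} \\
&= -(\varepsilon_{123\cdots n})^2 \sum_\pi g^{\mu_1 2}g^{\mu_2 1}g^{\mu_3 3}\cdots g^{\mu_n n}\,\hat{\varepsilon}_{\mu_1\mu_2\mu_3\cdots\mu_n} \\
&= -(\varepsilon_{1\cdots n})^2 \sum_\pi g^{\mu_2 2}g^{\mu_1 1}g^{\mu_3 3}\cdots g^{\mu_n n}\,\hat{\varepsilon}_{\mu_2\mu_1\mu_3\cdots\mu_n} \\
&= (\varepsilon_{1\cdots n})^2 \sum_\pi g^{\mu_1 1}g^{\mu_2 2}g^{\mu_3 3}\cdots g^{\mu_n n}\,\hat{\varepsilon}_{\mu_1\mu_2\mu_3\cdots\mu_n} \\
&= (\varepsilon_{123\cdots n})^2 \det(g^{\mu\nu}) .
\end{aligned}$$

Similarly one can prove that each term on the right-hand side of (5.4.5′) equals $(\varepsilon_{1\cdots n})^2 \det(g^{\mu\nu})$.
Noticing that there are $n!$ terms on the right-hand side of (5.4.5′), plugging them back
into (5.4.5) yields $(-1)^s n! = (n!)(\varepsilon_{1\cdots n})^2\det(g^{\mu\nu})$, or $(-1)^s = (\varepsilon_{1\cdots n})^2\det(g^{\mu\nu})$.
The fact that the matrix $g^{\mu\nu}$ is the inverse of $g_{\mu\nu}$ gives $\det(g^{\mu\nu}) = 1/\det(g_{\mu\nu}) \equiv 1/g$.
Plugging into the previous equation, we obtain

$$(-1)^s g = (\varepsilon_{1\cdots n})^2 , \qquad \varepsilon_{1\cdots n} = \pm\sqrt{|g|} ,$$

and therefore we have (5.4.4). □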

Remark 2 For an orthonormal basis we have $|g| = 1$, and hence (5.4.4) reduces
to (5.4.2).
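Theorem 5.4.1 can also be checked numerically in a non-orthonormal basis. The sketch below assumes a 4-dimensional Lorentzian metric and a hypothetical matrix `E` whose columns are the components of the new basis vectors in a right-handed orthonormal basis; both `E` and the helper names are made up for illustration. Since $\varepsilon$ has component $+1$ in the orthonormal basis, $\varepsilon_{1\cdots n} = \varepsilon(e_1,\dots,e_n) = \det E$, which should agree with $\sqrt{|g|}$ when the new basis is right-handed.

```python
import itertools
import math

def permutation_sign(perm):
    """Return +1 for an even permutation and -1 for an odd one."""
    sign = 1
    for i in range(len(perm)):
        for j in range(i + 1, len(perm)):
            if perm[i] > perm[j]:
                sign = -sign
    return sign

def det(matrix):
    """Determinant by the Leibniz permutation sum (fine for small matrices)."""
    n = len(matrix)
    return sum(
        permutation_sign(p) * math.prod(matrix[i][p[i]] for i in range(n))
        for p in itertools.permutations(range(n))
    )

# Hypothetical example data: column mu of E holds the components of the basis
# vector e_mu with respect to a right-handed orthonormal basis.
E = [
    [1.0, 0.2, 0.0, 0.1],
    [0.3, 1.0, 0.1, 0.0],
    [0.0, 0.4, 1.0, 0.2],
    [0.1, 0.0, 0.3, 1.0],
]
eta = [-1.0, 1.0, 1.0, 1.0]  # Lorentzian signature in the orthonormal basis
n = len(E)

# Metric components in the new basis: g_{mu nu} = sum_k eta_k E[k][mu] E[k][nu].
g = [[sum(eta[k] * E[k][mu] * E[k][nu] for k in range(n)) for nu in range(n)]
     for mu in range(n)]

# eps_{1...n} in the new basis is eps(e_1, ..., e_n) = det(E); det(E) > 0 means
# the new basis is right-handed, so the + sign in (5.4.4) applies.
eps_component = det(E)
sqrt_abs_g = math.sqrt(abs(det(g)))
print(eps_component, sqrt_abs_g)
```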
Theorem 5.4.2 Suppose $\nabla_a$ and $\varepsilon$ are respectively the derivative operator and the
volume element associated with the metric; then

$$\nabla_b\,\varepsilon_{a_1\cdots a_n} = 0 . \tag{5.4.6}$$

Proof It follows from $\nabla_b g_{ac} = 0$ and (5.4.3) that $\varepsilon^{a_1\cdots a_n}\nabla_b\,\varepsilon_{a_1\cdots a_n} = 0$, and thus for
any vector field $v^b$ we have

$$\varepsilon^{a_1\cdots a_n} v^b\nabla_b\,\varepsilon_{a_1\cdots a_n} = 0 . \tag{5.4.7}$$

Since the collection of all the $n$-forms at a point in $M$ is a 1-dimensional vector
space, any two $n$-forms at this point can only differ by a multiplicative factor $h$ ($h$
can be different from point to point). Therefore, $v^b\nabla_b\,\varepsilon_{a_1\cdots a_n} = h\,\varepsilon_{a_1\cdots a_n}$. Plugging into
(5.4.7) gives $h = 0$, and thus $v^b\nabla_b\,\varepsilon_{a_1\cdots a_n} = 0$. Since $v^b$ is an arbitrary vector field,
we have $\nabla_b\,\varepsilon_{a_1\cdots a_n} = 0$. □
Now we are going to prove two identities about volume elements that are quite useful.
To do so, we need to prove the following lemma first.

Lemma 5.4.3

$$\delta^{[a_1}{}_{a_1}\cdots\delta^{a_j}{}_{a_j}\,\delta^{a_{j+1}}{}_{b_{j+1}}\cdots\delta^{a_n]}{}_{b_n} = \frac{(n-j)!\,j!}{n!}\,\delta^{[a_{j+1}}{}_{b_{j+1}}\cdots\delta^{a_n]}{}_{b_n} . \tag{5.4.8}$$

Proof [Optional Reading]
Here we only give the main steps. The reader should fill in the details of the proof for
each step. First, one can show that

$$\delta^{[a_1}{}_{a_1}\delta^{a_2}{}_{b_2}\cdots\delta^{a_n]}{}_{b_n} = \frac{1}{n}\,\delta^{[a_2}{}_{b_2}\cdots\delta^{a_n]}{}_{b_n} ,$$

$$\delta^{[a_2}{}_{a_2}\delta^{a_3}{}_{b_3}\cdots\delta^{a_n]}{}_{b_n} = \frac{2}{n-1}\,\delta^{[a_3}{}_{b_3}\cdots\delta^{a_n]}{}_{b_n} ,$$

and carrying over to the general case,

$$\delta^{[a_j}{}_{a_j}\delta^{a_{j+1}}{}_{b_{j+1}}\cdots\delta^{a_n]}{}_{b_n} = \frac{j}{n-(j-1)}\,\delta^{[a_{j+1}}{}_{b_{j+1}}\cdots\delta^{a_n]}{}_{b_n} .$$

Therefore, it can be proved that

$$\begin{aligned}
\delta^{[a_1}{}_{a_1}\cdots\delta^{a_j}{}_{a_j}\,\delta^{a_{j+1}}{}_{b_{j+1}}\cdots\delta^{a_n]}{}_{b_n}
&= \frac{1}{n}\,\frac{2}{n-1}\,\frac{3}{n-2}\cdots\frac{j}{n-j+1}\,\delta^{[a_{j+1}}{}_{b_{j+1}}\cdots\delta^{a_n]}{}_{b_n} \\
&= \frac{(n-j)!\,j!}{n!}\,\delta^{[a_{j+1}}{}_{b_{j+1}}\cdots\delta^{a_n]}{}_{b_n} . \qquad □
\end{aligned}$$

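Theorem 5.4.4

$$\text{(a)}\quad \varepsilon^{a_1\cdots a_n}\varepsilon_{b_1\cdots b_n} = (-1)^s\, n!\;\delta^{[a_1}{}_{b_1}\cdots\delta^{a_n]}{}_{b_n} , \tag{5.4.9}$$

$$\text{(b)}\quad \varepsilon^{a_1\cdots a_j a_{j+1}\cdots a_n}\varepsilon_{a_1\cdots a_j b_{j+1}\cdots b_n} = (-1)^s (n-j)!\,j!\;\delta^{[a_{j+1}}{}_{b_{j+1}}\cdots\delta^{a_n]}{}_{b_n} . \tag{5.4.10}$$

Proof $\varepsilon^{a_1\cdots a_n}\varepsilon_{b_1\cdots b_n} = \varepsilon^{[a_1\cdots a_n]}\varepsilon_{[b_1\cdots b_n]}$ indicates that all the upper indices and all the
lower indices of $\varepsilon^{a_1\cdots a_n}\varepsilon_{b_1\cdots b_n}$ are antisymmetric. It is not difficult to prove that the
collection of all tensors of type $(n, n)$ satisfying this condition is a 1-dimensional
vector space, and since $\delta^{[a_1}{}_{b_1}\cdots\delta^{a_n]}{}_{b_n}$ belongs to this collection (it is not difficult to
show that $\delta^{[a_1}{}_{b_1}\cdots\delta^{a_n]}{}_{b_n} = \delta^{[a_1}{}_{[b_1}\cdots\delta^{a_n]}{}_{b_n]}$), any two tensors in this collection can only
differ by a multiplicative factor. Thus, $\varepsilon^{a_1\cdots a_n}\varepsilon_{b_1\cdots b_n} = K\,\delta^{[a_1}{}_{b_1}\cdots\delta^{a_n]}{}_{b_n}$. Contracting
with $\varepsilon_{a_1\cdots a_n}\varepsilon^{b_1\cdots b_n}$, the left-hand side yields $(-1)^s n!\,(-1)^s n!$, and the right-hand side
yields $K\,\varepsilon_{b_1\cdots b_n}\varepsilon^{b_1\cdots b_n} = K(-1)^s n!$, and hence $K = (-1)^s n!$, which gives (5.4.9).
Contracting the first $j$ upper and lower indices on both sides gives

$$\begin{aligned}
\varepsilon^{a_1\cdots a_j a_{j+1}\cdots a_n}\varepsilon_{a_1\cdots a_j b_{j+1}\cdots b_n}
&= (-1)^s\, n!\;\delta^{[a_1}{}_{a_1}\cdots\delta^{a_j}{}_{a_j}\,\delta^{a_{j+1}}{}_{b_{j+1}}\cdots\delta^{a_n]}{}_{b_n} \\
&= (-1)^s (n-j)!\,j!\;\delta^{[a_{j+1}}{}_{b_{j+1}}\cdots\delta^{a_n]}{}_{b_n} .
\end{aligned}$$

(Lemma 5.4.3 is used in the last step.) Thus, we arrive at (5.4.10). □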
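For $n = 3$ with the Euclidean metric ($s = 0$), the identities of Theorem 5.4.4 reduce to the familiar $\varepsilon$-$\delta$ contraction rules, which can be verified by enumerating all index values. The following is a minimal sketch; it assumes a right-handed orthonormal basis, so that the components of $\varepsilon$ are the Levi-Civita symbol and upper and lower components coincide. The helper names are illustrative.

```python
import itertools

def levi_civita(*indices):
    """Levi-Civita symbol: 0 for a repeated index, else the permutation sign."""
    if len(set(indices)) < len(indices):
        return 0
    sign = 1
    for i in range(len(indices)):
        for j in range(i + 1, len(indices)):
            if indices[i] > indices[j]:
                sign = -sign
    return sign

def delta(a, b):
    """Kronecker delta."""
    return 1 if a == b else 0

n = 3  # Euclidean signature, so s = 0
idx = range(n)

# (5.4.10) with j = 1: eps^{abc} eps_{ade} = 2! 1! delta^{[b}_d delta^{c]}_e
#                                          = delta^b_d delta^c_e - delta^b_e delta^c_d
j1_holds = all(
    sum(levi_civita(a, b, c) * levi_civita(a, d, e) for a in idx)
    == delta(b, d) * delta(c, e) - delta(b, e) * delta(c, d)
    for b, c, d, e in itertools.product(idx, repeat=4)
)

# (5.4.10) with j = 2: eps^{abc} eps_{abd} = 1! 2! delta^c_d
j2_holds = all(
    sum(levi_civita(a, b, c) * levi_civita(a, b, d) for a in idx for b in idx)
    == 2 * delta(c, d)
    for c, d in itertools.product(idx, repeat=2)
)

# Full contraction, i.e. (5.4.3): eps^{abc} eps_{abc} = 3!
full = sum(levi_civita(a, b, c) ** 2 for a, b, c in itertools.product(idx, repeat=3))
print(j1_holds, j2_holds, full)
```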

5.5 Integrating Functions on Manifolds, Gauss’s Theorem

Definition 1 Suppose $\varepsilon$ is an arbitrary volume element on a manifold $M$, and $f$ is
a $C^0$ function on $M$; then the integral of $f$ on $M$ (denoted by $\int_M f$) is defined as
the integral of the $n$-form field $f\varepsilon$ on $M$, i.e.,

$$\int_M f := \int_M f\varepsilon . \tag{5.5.1}$$

From Definition 1 we see that the integral of a function depends on the choice
of a volume element. As long as a metric is given on the manifold, we
stipulate that the integral of a function is always defined using the associated volume
element. In this way, for an oriented manifold with a metric, the integral of
a given function is determined. Take the 3-dimensional Euclidean space $(\mathbb{R}^3, \delta_{ab})$
as an example. Suppose $\{x, y, z\}$ is a right-handed Cartesian coordinate system;
then $\varepsilon = dx\wedge dy\wedge dz$ is an associated volume element, and hence the integral of
a function $f:\mathbb{R}^3\to\mathbb{R}$ on $(\mathbb{R}^3, \delta_{ab})$ is, by definition, $\int_{\mathbb{R}^3} f = \int_{\mathbb{R}^3} f\varepsilon$. The right-hand
side is nothing but an integral of a 3-form field $\omega \equiv f\varepsilon$, and according
to its definition (Definition 4 of Sect. 5.2), one should express $\omega$ in the form of
(5.2.1) using the dual basis of the right-handed system. Let $F(x, y, z)$ be the function
of 3 variables obtained by combining $f$ with the Cartesian system $\{x, y, z\}$;
then

$$\omega = F(x, y, z)\,dx\wedge dy\wedge dz .$$

[This is a special case of (5.2.1).] Hence,

$$\int f = \int f\varepsilon = \int\omega = \int F(x, y, z)\,dx\wedge dy\wedge dz .$$

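If you like, you can also compute it using the (right-handed) spherical coordinate
system $\{r, \theta, \varphi\}$. It follows from the line element $ds^2 = dr^2 + r^2(d\theta^2 + \sin^2\theta\,d\varphi^2)$
that $g = r^4\sin^2\theta$, and thus from (5.4.4) we know that $\varepsilon = r^2\sin\theta\,dr\wedge d\theta\wedge d\varphi$.
Therefore, (5.2.1) in the present case is $\omega \equiv f\varepsilon = \hat{F}(r, \theta, \varphi)\,r^2\sin\theta\,dr\wedge d\theta\wedge d\varphi$
[where $\hat{F}(r, \theta, \varphi)$ comes from combining $f$ with $\{r, \theta, \varphi\}$]. Hence,

$$\int f = \int f\varepsilon = \int\omega = \int \hat{F}(r, \theta, \varphi)\,r^2\sin\theta\,dr\wedge d\theta\wedge d\varphi .$$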
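As a quick consistency check of the spherical-coordinate expression, one may take $f = 1$ on the open ball $G$ of radius $R$ centered at the origin ($G$ and $R$ are introduced here only for illustration). The spherical coordinate patch misses only a set that does not affect the value of the integral, so the result should be the familiar volume of a ball:

```latex
\begin{aligned}
\int_G \varepsilon
  &= \int_0^{2\pi}\!\!\int_0^{\pi}\!\!\int_0^{R} r^2 \sin\theta \, dr \, d\theta \, d\varphi \\
  &= \frac{R^3}{3}\cdot 2 \cdot 2\pi \;=\; \frac{4}{3}\pi R^3 .
\end{aligned}
```

The factor $r^2\sin\theta$ is exactly $\sqrt{|g|}$ from (5.4.4), which is why no separate Jacobian needs to be inserted by hand.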

Now we will introduce the general form of Gauss's theorem. The form of Gauss's
theorem that is familiar to readers is

$$\int_V (\vec{\nabla}\cdot\vec{A})\,dV = \oint_S \vec{A}\cdot\vec{n}\,dS . \tag{5.5.2}$$

Respectively, the two sides of the above equation can be colloquially described as “the
integral of the product of the function $\vec{\nabla}\cdot\vec{A}$ and the volume element $dV$” and “the
integral of the product of the function $\vec{A}\cdot\vec{n}$ and the area element (2-dimensional
volume element) $dS$”. Now we will show in two steps that the Stokes theorem
(5.3.1) leads to a formula which includes (5.5.2) as a special case. The first step
is to derive Theorem 5.5.1, the left-hand side of which can be seen as a generalization
of the left-hand side of (5.5.2).

Theorem 5.5.1 Suppose $M$ is an $n$-dimensional oriented manifold, $N$ is an $n$-dimensional
compact embedded submanifold with boundary in $M$, $g_{ab}$ is a metric
on $M$, $\varepsilon$ and $\nabla_a$ are the associated volume element and the associated derivative
operator, and $v^a$ is a $C^1$ vector field on $M$; then

$$\int_{i(N)} (\nabla_b v^b)\,\varepsilon = \int_{\partial N} v^b\varepsilon_{ba_1\cdots a_{n-1}} . \tag{5.5.3}$$

Remark 1 The left-hand side of the equation above can be seen as a generalization
of the left-hand side of (5.5.2).

Proof The exterior derivative of the $(n-1)$-form field $\omega_{a_1\cdots a_{n-1}} \equiv v^b\varepsilon_{ba_1\cdots a_{n-1}}$ is the
$n$-form field $(d\omega)_{ca_1\cdots a_{n-1}} = n\nabla_{[c}(v^b\varepsilon_{|b|a_1\cdots a_{n-1}]})$, in which $\nabla_c$ can be any torsion-free
derivative operator. The collection of all $n$-forms at any point in $N$ is a 1-dimensional
vector space. Hence, the two $n$-forms $d\omega$ and $\varepsilon$ only differ by a multiplicative factor, i.e.,

$$n\nabla_{[c}(v^b\varepsilon_{|b|a_1\cdots a_{n-1}]}) = h\,\varepsilon_{ca_1\cdots a_{n-1}} , \tag{5.5.4}$$

where $h$ is a function on $N$ that can be found as follows: contracting both sides with
$\varepsilon^{ca_1\cdots a_{n-1}}$, the right-hand side yields $(-1)^s h\,n!$, and the left-hand side yields

$$\begin{aligned}
n\,\varepsilon^{ca_1\cdots a_{n-1}}\nabla_{[c}(v^b\varepsilon_{|b|a_1\cdots a_{n-1}]})
&= n\,\varepsilon^{[ca_1\cdots a_{n-1}]}\nabla_c(v^b\varepsilon_{ba_1\cdots a_{n-1}}) \\
&= n\,\varepsilon^{ca_1\cdots a_{n-1}}\varepsilon_{ba_1\cdots a_{n-1}}\nabla_c v^b
 = n(-1)^s(n-1)!\,\delta^c{}_b\nabla_c v^b = (-1)^s n!\,\nabla_b v^b .
\end{aligned}$$

[Theorem 2.6.2(a) is used in the first equality; we stipulate $\nabla_c$ to be associated with
$g_{ab}$ starting from the second step; in the third equality we used (5.4.10).] Hence,
$h = \nabla_b v^b$, and $d\omega = \varepsilon\nabla_b v^b$. Therefore, the Stokes theorem in this case takes the
form of (5.5.3). □

Now we go one step further and rewrite the right-hand side of (5.5.3) into a form
like the right-hand side of (5.5.2). Since the latter involves the volume element $dS$ on
the boundary $S$, let us start with the volume element of $\partial N$. Here we only talk about the
case where $\partial N$ is not a null hypersurface, and thus we can talk about the normalized
normal vector $n^a$ of $\partial N$ that satisfies $n^a n_a = \pm 1$ (see Sect. 4.4). The induced metric
of the metric $g_{ab}$ on $\partial N$ is $h_{ab} = g_{ab} \mp n_a n_b$ [see (4.4.2)]. Regarding $\partial N$ as an
$(n-1)$-dimensional manifold with the metric $h_{ab}$, its volume element (denoted by
$\hat{\varepsilon}_{a_1\cdots a_{n-1}}$) should satisfy two conditions: ① compatible with the induced orientation
of $\partial N$ (denoted by $\bar{\varepsilon}_{a_1\cdots a_{n-1}}$, see Remark 1 of Sect. 5.3); ② associated with $h_{ab}$, i.e.,

$$\hat{\varepsilon}^{a_1\cdots a_{n-1}}\hat{\varepsilon}_{a_1\cdots a_{n-1}} = (-1)^{\hat{s}}(n-1)! , \tag{5.5.5}$$

where $\hat{\varepsilon}^{a_1\cdots a_{n-1}}$ is the result of raising the indices of $\hat{\varepsilon}_{a_1\cdots a_{n-1}}$ using $h^{ab}$, and $\hat{s}$ is the
number of negative numbers in the diagonal elements of $h_{ab}$. The volume element
$\hat{\varepsilon}_{a_1\cdots a_{n-1}}$ on $\partial N$ that satisfies these two conditions is called the induced volume
element. Suppose $n^b$ is the outgoing unit normal vector of $\partial N$ [with $i(N)$ being the
interior, there is a clear meaning for “outgoing”]; then the induced volume element
$\hat{\varepsilon}_{a_1\cdots a_{n-1}}$ and the volume element $\varepsilon_{ba_1\cdots a_{n-1}}$ on $N$ have the following relation (for a
proof, see Optional Reading 5.5.1):

$$\hat{\varepsilon}_{a_1\cdots a_{n-1}} = n^b\varepsilon_{ba_1\cdots a_{n-1}} . \tag{5.5.6}$$



[Optional Reading 5.5.1]


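Now we will show that the $\hat{\varepsilon}_{a_1\cdots a_{n-1}}$ in the equation above does satisfy the two conditions
of the induced volume element. $\forall q\in\partial N$, suppose $\{(e_\mu)^a\}$ is the right-handed orthonormal
basis at $q$ satisfying $(e_1)^a = n^a$; then

$$\varepsilon_{a_1\cdots a_n} = (e^1\wedge\cdots\wedge e^n)_{a_1\cdots a_n} = \pm n_{a_1}\wedge(e^2\wedge\cdots\wedge e^n)_{a_2\cdots a_n} .$$

From the spirit of Remark 1 in Sect. 5.3 [see Wald (1984) p. 431 for details] we know that
$(e^2\wedge\cdots\wedge e^n)_{a_2\cdots a_n}$ serves as the induced orientation $\bar{\varepsilon}_{a_2\cdots a_n}$ at $q\in\partial N$, and hence

$$\varepsilon_{a_1\cdots a_n} = \pm n_{a_1}\wedge\bar{\varepsilon}_{a_2\cdots a_n} , \quad \text{also written as} \quad \varepsilon_{ba_1\cdots a_{n-1}} = \pm n_b\wedge\bar{\varepsilon}_{a_1\cdots a_{n-1}} .$$

Using this, one can easily show that $\bar{\varepsilon}_{a_1\cdots a_{n-1}} = n^b\varepsilon_{ba_1\cdots a_{n-1}}$, and then it follows from (5.5.6)
that $\hat{\varepsilon}_{a_1\cdots a_{n-1}} = +1\cdot\bar{\varepsilon}_{a_1\cdots a_{n-1}}$. Thus, $\hat{\varepsilon}_{a_1\cdots a_{n-1}}$ is compatible with the induced orientation
$\bar{\varepsilon}_{a_1\cdots a_{n-1}}$, i.e., condition ① is satisfied. As an exercise (Exercise 5.10), the reader should
verify that $\hat{\varepsilon}_{a_1\cdots a_{n-1}} = n^b\varepsilon_{ba_1\cdots a_{n-1}}$ also satisfies condition ②, i.e., (5.5.5). Note that condition
② can only determine $\hat{\varepsilon}_{a_1\cdots a_{n-1}}$ up to a sign [i.e., $\hat{\varepsilon}_{a_1\cdots a_{n-1}} = -n^b\varepsilon_{ba_1\cdots a_{n-1}}$ also
satisfies (5.5.5)]. Only when taken together with condition ① can $\hat{\varepsilon}_{a_1\cdots a_{n-1}}$ be determined as
$n^b\varepsilon_{ba_1\cdots a_{n-1}}$.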
[The End of Optional Reading 5.5.1]

The theorem below is the general version of Gauss's theorem that contains (5.5.2)
as a special case.
Theorem 5.5.2 (Gauss's Theorem) Suppose $M$ is an $n$-dimensional oriented manifold,
$N$ is an $n$-dimensional compact submanifold with boundary in $M$, $g_{ab}$ is a
metric on $M$, $\varepsilon$ and $\nabla_a$ are, respectively, the associated volume element and the
associated derivative operator, $\hat{\varepsilon}$ is the induced volume element on $\partial N$, $n^a$ is the
outgoing unit normal vector of $\partial N$ satisfying $n^a n_a = \pm 1$, and $v^a$ is a $C^1$ vector field on
$M$. Then,

$$\int_{i(N)} (\nabla_a v^a)\,\varepsilon = \pm\int_{\partial N} v^a n_a\,\hat{\varepsilon} . \quad (+ \text{ for } n^a n_a = +1,\ - \text{ for } n^a n_a = -1.) \tag{5.5.7}$$


Proof From Theorem 5.5.1 we know that all we have to prove is $\int_{\partial N} v^b\varepsilon_{ba_1\cdots a_{n-1}} = \pm\int_{\partial N} v^a n_a\,\hat{\varepsilon}$.
Let $\omega_{a_1\cdots a_{n-1}} = v^b\varepsilon_{ba_1\cdots a_{n-1}}$. Noticing the discussion at the end of
Sect. 5.2 about $\int_{\phi[S]}\omega \equiv \int_{\phi[S]}\tilde{\omega}$, we can see that here $\int_{\partial N} v^b\varepsilon_{ba_1\cdots a_{n-1}}$ is $\int_{\partial N}\tilde{\omega}$. Hence,
all we have to prove is that

$$\tilde{\omega}_{a_1\cdots a_{n-1}} = \pm v^b n_b\,\hat{\varepsilon}_{a_1\cdots a_{n-1}} \quad \forall q\in\partial N , \tag{5.5.8}$$

where $n^a$ is the outgoing unit normal vector of $\partial N$. Both sides of the above equation
are $(n-1)$-forms on $W_q$, and hence there exists a $K$ such that

$$\tilde{\omega}_{a_1\cdots a_{n-1}} = K v^b n_b\,\hat{\varepsilon}_{a_1\cdots a_{n-1}} , \tag{5.5.9}$$

and thus all we have to prove is that $K = \pm 1$. Suppose $\{(e_0)^a = n^a, (e_1)^a, \dots, (e_{n-1})^a\}$
is a right-handed orthonormal basis of $V_q$. Contracting $(e_1)^{a_1}\cdots(e_{n-1})^{a_{n-1}}$
with the equation above, the right-hand side gives

$$K v^b n_b\,\hat{\varepsilon}_{12\cdots(n-1)} = \pm K v^b (e^0)_b\,\hat{\varepsilon}_{12\cdots(n-1)} = \pm K v^0 , \tag{5.5.10}$$

where we used $n_b = \pm(e^0)_b$ in the first equality; in the second equality we used
the following fact: it can be shown from the definition of the induced orientation
$\bar{\varepsilon}$ that the right-handedness of $\{(e_0)^a = n^a, (e_1)^a, \dots, (e_{n-1})^a\}$ (measured by the
orientation $\varepsilon$) assures the right-handedness of $\{(e_1)^a, \dots, (e_{n-1})^a\}$ (measured by $\bar{\varepsilon}$),
and thus $\hat{\varepsilon}_{12\cdots(n-1)} = 1$. On the other hand, the left-hand side of (5.5.9) after the
contraction yields

$$\begin{aligned}
\tilde{\omega}_{a_1\cdots a_{n-1}}(e_1)^{a_1}\cdots(e_{n-1})^{a_{n-1}} &= \omega_{a_1\cdots a_{n-1}}(e_1)^{a_1}\cdots(e_{n-1})^{a_{n-1}} \\
&= v^b\varepsilon_{ba_1\cdots a_{n-1}}(e_1)^{a_1}\cdots(e_{n-1})^{a_{n-1}} = v^\mu\varepsilon_{\mu 12\cdots(n-1)} = v^0\varepsilon_{012\cdots(n-1)} = v^0 ,
\end{aligned} \tag{5.5.11}$$

where (5.2.7) is used in the first equality, and the right-handedness of $\{(e_0)^a = n^a, (e_1)^a, \dots, (e_{n-1})^a\}$
is used in the fifth equality. Comparing (5.5.10) and (5.5.11) we
obtain $K = \pm 1$. □

Remark 2 One of the conditions for (5.5.7) is that $n^a$ is the outgoing unit normal
vector of $\partial N$. If we change the stipulation to “$n^a$ is outgoing when $n^a n_a = +1$, and $n^a$ is
ingoing [pointing towards $i(N)$] when $n^a n_a = -1$”, then the $\pm$ sign on the right-hand
side of (5.5.7) disappears, and Gauss's theorem takes the following form:

$$\int_{i(N)} (\nabla_a v^a)\,\varepsilon = \int_{\partial N} v^a n_a\,\hat{\varepsilon} . \tag{5.5.7'}$$

If $\partial N$ is a null hypersurface, i.e., $n^a n_a = 0$, then (5.5.7′) still holds; however, $\hat{\varepsilon}$ needs
to be defined as follows (without proof):

$$\frac{1}{n}\,\varepsilon_{a_1\cdots a_n} = n_{[a_1}\hat{\varepsilon}_{a_2\cdots a_n]} .$$
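In the Euclidean case, (5.5.7) can be checked numerically on the unit ball. The sketch below uses an illustrative vector field $\vec{v} = (x^3, y^3, z^3)$ (an assumption made for this example, not taken from the text), the volume element $\varepsilon = r^2\sin\theta\,dr\wedge d\theta\wedge d\varphi$, and the induced volume element $\hat{\varepsilon} = \sin\theta\,d\theta\wedge d\varphi$ on the unit sphere. Both sides should approximate $12\pi/5$; the midpoint rule and the step count are arbitrary choices.

```python
import math

def midpoint_sum(func, lower, upper, steps):
    """Approximate the integral of func over [lower, upper] by the midpoint rule."""
    width = (upper - lower) / steps
    return width * sum(func(lower + (k + 0.5) * width) for k in range(steps))

def to_cartesian(r, theta, phi):
    """Convert spherical coordinates to Cartesian coordinates."""
    return (r * math.sin(theta) * math.cos(phi),
            r * math.sin(theta) * math.sin(phi),
            r * math.cos(theta))

def divergence(x, y, z):
    """Divergence of the illustrative field v = (x^3, y^3, z^3)."""
    return 3.0 * (x * x + y * y + z * z)

def v_dot_n(theta, phi):
    """v.n on the unit sphere, where the outgoing unit normal is n = (x, y, z)."""
    x, y, z = to_cartesian(1.0, theta, phi)
    return x**4 + y**4 + z**4

STEPS = 40

# Left-hand side of (5.5.7): integral of (div v) eps over the unit ball.
volume_side = midpoint_sum(
    lambda r: midpoint_sum(
        lambda theta: midpoint_sum(
            lambda phi: divergence(*to_cartesian(r, theta, phi)) * r * r * math.sin(theta),
            0.0, 2.0 * math.pi, STEPS),
        0.0, math.pi, STEPS),
    0.0, 1.0, STEPS)

# Right-hand side of (5.5.7): integral of (v.n) eps_hat over the unit sphere.
surface_side = midpoint_sum(
    lambda theta: midpoint_sum(
        lambda phi: v_dot_n(theta, phi) * math.sin(theta),
        0.0, 2.0 * math.pi, STEPS),
    0.0, math.pi, STEPS)

exact = 12.0 * math.pi / 5.0
print(volume_side, surface_side, exact)
```

The two numerical values differ only by discretization error, which shrinks as `STEPS` grows.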

5.6 Dual Differential Forms

Use $\Lambda_p(l)$ to represent the collection of all $l$-forms ($l \leq n$) at $p\in M$. It follows from
(5.1.4) that

$$\dim\Lambda_p(l) = \frac{n!}{l!\,(n-l)!} = \dim\Lambda_p(n-l) .$$

If $M$ is an oriented manifold with a metric $g_{ab}$ and $\varepsilon$ is the associated volume element,
then we can define an isomorphism between $\Lambda_M(l)$ and $\Lambda_M(n-l)$ using $\varepsilon$ and $g_{ab}$
as follows:

Definition 1 $\forall\omega\in\Lambda_M(l)$, define its dual form ${}^*\omega\in\Lambda_M(n-l)$ as

$${}^*\omega_{a_1\cdots a_{n-l}} := \frac{1}{l!}\,\omega^{b_1\cdots b_l}\varepsilon_{b_1\cdots b_l a_1\cdots a_{n-l}} , \tag{5.6.1}$$

where

$$\omega^{b_1\cdots b_l} = g^{b_1c_1}\cdots g^{b_lc_l}\omega_{c_1\cdots c_l} .$$

Remark 1 The $*$ operator we defined above is called the Hodge star, and ${}^*\omega$ is also
called the Hodge dual of the form $\omega$. It is not difficult to see that: ① $* : \Lambda_M(l)\to\Lambda_M(n-l)$
is an isomorphism; ② for a 0-form field $f\in\mathscr{F}_M$, its dual form field by
definition is

$${}^*f_{a_1\cdots a_n} = \frac{1}{0!}\,f\varepsilon_{a_1\cdots a_n} = f\varepsilon_{a_1\cdots a_n} ,$$

i.e., ${}^*f$ equals $f$ times the volume element $\varepsilon$ associated with the metric. Therefore,
one can say that the integral of a function $f$ is defined as the integral of its dual form
field. Applying $*$ to the above equation again we have

$${}^*({}^*f) = {}^*(f\varepsilon) = \frac{1}{n!}\,f\varepsilon^{b_1\cdots b_n}\varepsilon_{b_1\cdots b_n} = (-1)^s f .$$

[Equation (5.4.3) is used in the third equality.] This result can be generalized into
the following theorem:

Theorem 5.6.1

$${}^{**}\omega = (-1)^{s+l(n-l)}\,\omega . \tag{5.6.2}$$

Proof Exercise 5.11. □
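Before attempting the exercise, one can test (5.6.2) numerically. The following minimal sketch implements (5.6.1) in a right-handed orthonormal basis with a diagonal metric, so that $\varepsilon$ has the Levi-Civita symbol as components and raising an index multiplies by $\pm 1$. The dictionary-based representation of forms and all function names are assumptions made for illustration.

```python
import itertools
import math

def levi_civita(indices):
    """Levi-Civita symbol: 0 for a repeated index, else the permutation sign."""
    if len(set(indices)) < len(indices):
        return 0
    sign = 1
    for i in range(len(indices)):
        for j in range(i + 1, len(indices)):
            if indices[i] > indices[j]:
                sign = -sign
    return sign

def hodge_star(form, degree, metric_signs):
    """Apply (5.6.1) in a right-handed orthonormal basis with diagonal metric.

    form maps every index tuple of length `degree` to a component; the result
    maps every index tuple of length n - degree to a component.
    """
    n = len(metric_signs)
    result = {}
    for free in itertools.product(range(n), repeat=n - degree):
        total = 0.0
        for summed in itertools.product(range(n), repeat=degree):
            sign = levi_civita(summed + free)
            if sign == 0:
                continue
            # Raise each summed index with the diagonal inverse metric.
            raised = form[summed] * math.prod(metric_signs[b] for b in summed)
            total += raised * sign
        result[free] = total / math.factorial(degree)
    return result

def antisymmetric_two_form(n, upper_triangle):
    """Build all n*n components of a 2-form from its independent components."""
    form = {pair: 0.0 for pair in itertools.product(range(n), repeat=2)}
    for (a, b), value in upper_triangle.items():
        form[(a, b)] = value
        form[(b, a)] = -value
    return form

lorentz = [-1, 1, 1, 1]  # n = 4, s = 1
one_form = {(0,): 1.0, (1,): 2.0, (2,): 3.0, (3,): 4.0}
two_form = antisymmetric_two_form(4, {(0, 1): 1.0, (0, 2): -0.5, (2, 3): 2.0})

star_star_one = hodge_star(hodge_star(one_form, 1, lorentz), 3, lorentz)
star_star_two = hodge_star(hodge_star(two_form, 2, lorentz), 2, lorentz)

# Signs predicted by (5.6.2): (-1)^(s + l(n - l)).
sign_one = (-1) ** (1 + 1 * 3)   # l = 1: +1
sign_two = (-1) ** (1 + 2 * 2)   # l = 2: -1
print(sign_one, sign_two)
```

For a Lorentzian 4-dimensional metric this reproduces the well-known fact that the double dual of a 2-form is minus the original 2-form.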

Now, from the differential geometry point of view, let us revisit the vector algebra and
vector field theory on 3-dimensional Euclidean space $(\mathbb{R}^3, \delta_{ab})$ that we are already
familiar with (where $M$ is $\mathbb{R}^3$).
(1) Why have we never heard of 1-, 2- and 3-form fields before? First, using
the Euclidean metric $\delta_{ab}$, one can turn a dual vector field $\omega_a$ into a vector field
$\omega^a = \delta^{ab}\omega_b$, which eliminates the need to use a 1-form field. Later on, we will
not distinguish the upper and lower indices strictly when we are dealing only with
$(\mathbb{R}^3, \delta_{ab})$. Second, since $n = 3$, $\Lambda_M(2)$ and $\Lambda_M(1)$ have the same dimension, and
$\omega\in\Lambda_M(2)$ and ${}^*\omega\in\Lambda_M(1)$ can be identified using the isomorphism $* : \Lambda_M(2)\to\Lambda_M(1)$,
which eliminates the need to use a 2-form field. Similarly, $\Lambda_M(3)$ and $\Lambda_M(0)$
have the same dimension, and using the isomorphism $* : \Lambda_M(3)\to\Lambda_M(0)$ one can
identify $\omega\in\Lambda_M(3)$ and ${}^*\omega\in\Lambda_M(0)$, the latter of which is exactly a function on
$\mathbb{R}^3$ (a 0-form field). Therefore, any differential form on the 3-dimensional Euclidean
space can be represented by a function or a vector field.
(2) Now we discuss the dot product and cross product operations of the vector
algebra. Denote the vectors $\vec{A}$ and $\vec{B}$ as $A^a$ and $B^a$, respectively. Naturally, the dot
product of $\vec{A}$ and $\vec{B}$ will be $A_a B^a$. However, how should we understand the cross
product $\vec{A}\times\vec{B}$? Let

$$\omega_{ab} \equiv A_a\wedge B_b = 2A_{[a}B_{b]} , \quad (\text{where } A_a \equiv \delta_{ab}A^b ,\ B_b \equiv \delta_{ba}B^a)$$

then

$${}^*\omega_c = \frac{1}{2}\,\omega^{ab}\varepsilon_{abc} = \varepsilon_{abc}A^{[a}B^{b]} = \varepsilon_{abc}A^aB^b , \tag{5.6.3}$$

where $\varepsilon_{abc}$ is the volume element associated with the Euclidean metric. Suppose
$\{x, y, z\}$ is a right-handed Cartesian coordinate system; then its coordinate basis is
orthonormal. It follows from (5.4.2) that the nonzero components $\varepsilon_{ijk}$ of $\varepsilon_{abc}$ in this
system are

$$\varepsilon_{123} = \varepsilon_{312} = \varepsilon_{231} = -\varepsilon_{132} = -\varepsilon_{321} = -\varepsilon_{213} = 1 ,$$

and thus $\varepsilon_{ijk}$ is the familiar Levi-Civita symbol. Therefore, the $k$th component of ${}^*\omega_c$
in this Cartesian system is

$${}^*\omega_k = \varepsilon_{ijk}A^iB^j , \quad k = 1, 2, 3 . \tag{5.6.4}$$

According to the definition of $\vec{A}\times\vec{B}$, the right-hand side of the above equation is
exactly the $k$th component $(\vec{A}\times\vec{B})_k$. Hence, $\vec{A}\times\vec{B}$ can be viewed as ${}^*\omega$ (or more
precisely, the vector corresponding to the dual vector ${}^*\omega$). Also $\omega = A\wedge B$, and
thus finding the cross product of $\vec{A}$ and $\vec{B}$ is the same as finding the wedge product
$A\wedge B$ and then taking its dual. This can be expressed simply as $\times = * \circ \wedge$.
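The statement $\times = * \circ \wedge$ can be illustrated with a few lines of code. This is a minimal sketch, assuming a right-handed Cartesian system so that upper and lower components coincide and $\varepsilon_{ijk}$ is the Levi-Civita symbol; the sample vectors and function names are arbitrary.

```python
import itertools

def levi_civita(indices):
    """Levi-Civita symbol: 0 for a repeated index, else the permutation sign."""
    if len(set(indices)) < len(indices):
        return 0
    sign = 1
    for i in range(len(indices)):
        for j in range(i + 1, len(indices)):
            if indices[i] > indices[j]:
                sign = -sign
    return sign

def wedge(A, B):
    """Components of the 2-form omega_ab = A_a ^ B_b = 2 A_[a B_b]."""
    return [[A[a] * B[b] - A[b] * B[a] for b in range(3)] for a in range(3)]

def hodge_dual_of_two_form(omega):
    """(5.6.3): (*omega)_c = (1/2) omega^{ab} eps_{abc} in Cartesian components."""
    return [
        0.5 * sum(omega[a][b] * levi_civita((a, b, c))
                  for a, b in itertools.product(range(3), repeat=2))
        for c in range(3)
    ]

def cross(A, B):
    """Textbook component formula for the cross product."""
    return [A[1] * B[2] - A[2] * B[1],
            A[2] * B[0] - A[0] * B[2],
            A[0] * B[1] - A[1] * B[0]]

A = [1.0, 2.0, 3.0]
B = [4.0, 5.0, 6.0]
via_forms = hodge_dual_of_two_form(wedge(A, B))
via_cross = cross(A, B)
print(via_forms, via_cross)
```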
(3) Now let us look at the vector field theory of the 3-dimensional Euclidean space
from the viewpoint of differential geometry. As we mentioned previously, $\vec{\nabla}$ in the
vector field theory is the derivative operator $\partial_a$ associated with the Euclidean metric
$\delta_{ab}$; in principle, any equation that involves $\vec{\nabla}$ can be expressed in terms of $\partial_a$. For
instance:

(a) $\vec{\nabla} f = \partial_a f$ ;
(b) $\vec{\nabla}\cdot\vec{A} = \partial_a A^a$ ;
(c) $\vec{\nabla}\times\vec{A} = \varepsilon^{abc}\partial_a A_b$ [the derivation is similar to (5.6.3)] ;
(d) $\vec{\nabla}\cdot(\vec{A}\vec{B}) = \partial_a(A^aB^b)$ ;
(e) $\vec{\nabla}\vec{A} = \partial^a A^b$ ;
(f) $\nabla^2 f = \partial_a\partial^a f$ ;
(g) $\nabla^2\vec{A} = \partial_a\partial^a A^b$ . (5.6.5)

By means of ∂a and the abstract index notation, one can also simplify the derivation
of some useful formulas and make the reasoning clearer. Here we give only two
examples.

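Example 1 Using $\partial_a$, show that

$$\vec{\nabla}\cdot(\vec{A}\times\vec{B}) = \vec{B}\cdot(\vec{\nabla}\times\vec{A}) - \vec{A}\cdot(\vec{\nabla}\times\vec{B}) . \tag{5.6.6}$$

Proof

$$\vec{\nabla}\cdot(\vec{A}\times\vec{B}) = \partial_c(\varepsilon^{cab}A_aB_b) = \varepsilon^{cab}(A_a\partial_cB_b + B_b\partial_cA_a) , \tag{5.6.7}$$

while

$$\begin{aligned}
\vec{B}\cdot(\vec{\nabla}\times\vec{A}) &= B_b(\vec{\nabla}\times\vec{A})^b = B_b\,\varepsilon^{bca}\partial_cA_a = \varepsilon^{cab}B_b\partial_cA_a , \\
-\vec{A}\cdot(\vec{\nabla}\times\vec{B}) &= -A_a(\vec{\nabla}\times\vec{B})^a = -A_a\,\varepsilon^{acb}\partial_cB_b = \varepsilon^{cab}A_a\partial_cB_b .
\end{aligned}$$

Plugging into (5.6.7) we get (5.6.6). □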

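Example 2 Using $\partial_a$, show that

$$\vec{\nabla}(\vec{A}\cdot\vec{B}) = (\vec{A}\cdot\vec{\nabla})\vec{B} + (\vec{B}\cdot\vec{\nabla})\vec{A} + \vec{A}\times(\vec{\nabla}\times\vec{B}) + \vec{B}\times(\vec{\nabla}\times\vec{A}) . \tag{5.6.8}$$

Proof For each term on the right-hand side of (5.6.8), we have

$$\text{the first term} = A_a\partial^aB^b , \qquad \text{the second term} = B_a\partial^aA^b ,$$

$$\begin{aligned}
\text{the third term} &= \vec{A}\times(\varepsilon^{cde}\partial_dB_e) = \varepsilon^{bac}A_a(\varepsilon_{cde}\partial^dB^e) \\
&= 2\delta^b{}_{[d}\delta^a{}_{e]}A_a\partial^dB^e = (\delta^b{}_d\delta^a{}_e - \delta^b{}_e\delta^a{}_d)A_a\partial^dB^e = A_a\partial^bB^a - A_a\partial^aB^b ,
\end{aligned}$$

and similarly,

$$\text{the fourth term} = B_a\partial^bA^a - B_a\partial^aA^b .$$

Hence,

$$\text{the r.h.s. of (5.6.8)} = A_a\partial^bB^a + B_a\partial^bA^a = \partial^b(A_aB^a) = \vec{\nabla}(\vec{A}\cdot\vec{B}) . \qquad □$$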

(4) The gradient, curl and divergence in the 3-dimensional Euclidean space can be
simply expressed using the exterior differentiation as follows:
Theorem 5.6.2 Suppose $f$ and $\vec{A}$ are respectively a function and a vector field on
the 3-dimensional Euclidean space; then

$$\operatorname{grad} f = df , \qquad \operatorname{curl}\vec{A} = {}^*dA , \qquad \operatorname{div}\vec{A} = {}^*d({}^*A) . \tag{5.6.9}$$

Proof Exercise 5.11. 



The fact that $\mathbb{R}^3$ is a trivial manifold assures that a closed form field on $\mathbb{R}^3$ is exact
(see Remark 1 of Sect. 5.1). Combining this with (5.6.9), one can easily prove (Exercise 5.15)
the following well-known propositions, which are not so straightforward
to prove by the standard vector analysis of 3-dimensional Euclidean space:
(1) a vector field with no curl must be a gradient field, i.e.,

$\operatorname{curl}\vec{E} = 0 \Rightarrow \exists$ a scalar field $\phi$ s.t. $\vec{E} = \operatorname{grad}\phi$ ,

(2) a vector field with no divergence must be a curl field, i.e.,

$\operatorname{div}\vec{B} = 0 \Rightarrow \exists$ a vector field $\vec{A}$ s.t. $\vec{B} = \operatorname{curl}\vec{A}$ .
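One possible route to these two propositions is sketched below; it is only an outline, and the full argument is the content of Exercise 5.15. Here $E$ and $B$ denote the 1-forms identified with $\vec{E}$ and $\vec{B}$ via $\delta_{ab}$, and the sketch uses (5.6.9), the fact that $*$ is an isomorphism, and the exactness of closed forms on $\mathbb{R}^3$:

```latex
\begin{aligned}
\operatorname{curl}\vec{E} = 0
  &\;\Rightarrow\; {}^*dE = 0
  \;\Rightarrow\; dE = 0
  \;\Rightarrow\; E = d\phi = \operatorname{grad}\phi , \\
\operatorname{div}\vec{B} = 0
  &\;\Rightarrow\; {}^*d({}^*B) = 0
  \;\Rightarrow\; d({}^*B) = 0
  \;\Rightarrow\; {}^*B = dA
  \;\Rightarrow\; B = {}^{**}B = {}^*dA = \operatorname{curl}\vec{A} .
\end{aligned}
```

The step $B = {}^{**}B$ uses (5.6.2) with $s = 0$, $n = 3$ and $l = 1$, for which the sign $(-1)^{s+l(n-l)}$ equals $+1$.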

5.7 Computing the Riemann Curvature Using the Tetrad Method [Optional Reading]

There are two major methods for computing the Riemann curvature $R_{abc}{}^d$ of a derivative
operator $\nabla_a$. The first one uses a coordinate basis field; the second one uses a non-coordinate
basis field. In Sect. 3.4.2 we have already introduced the first method, in which the key step
is to find the manifestation of $\nabla_a$ in a coordinate basis field, namely the Christoffel symbol
$\Gamma^\sigma{}_{\mu\tau}$. This section will discuss how to compute $R_{abc}{}^d$ using a non-coordinate basis field.
First, we need to find the manifestation of $\nabla_a$ in this non-coordinate basis field. For a given
derivative operator $\nabla_a$, suppose $\{(e_\mu)^a\}$ is an arbitrary basis field whose domain is $U\subset M$.
The derivative of the $\mu$th basis field $(e_\mu)^a$ along the $\tau$th basis field $(e_\tau)^a$, i.e., $(e_\tau)^b\nabla_b(e_\mu)^a$,
is also a vector field on $U$, and thus can be expanded in terms of the basis field $\{(e_\sigma)^a\}$:

$$(e_\tau)^b\nabla_b(e_\mu)^a = \gamma^\sigma{}_{\mu\tau}(e_\sigma)^a , \tag{5.7.1}$$

where $\gamma^\sigma{}_{\mu\tau}$, called the connection coefficients, can be regarded as the manifestation of $\nabla_a$
in the basis field $\{(e_\sigma)^a\}$. The $\gamma^\sigma{}_{\mu\tau}$ of a coordinate basis field are specifically denoted by
$\Gamma^\sigma{}_{\mu\tau}$. It is not difficult to show (Exercise 5.17) that these $\Gamma^\sigma{}_{\mu\tau}$ are exactly the components
of the Christoffel symbol $\Gamma^c{}_{ab}$ defined in Sect. 3.1 in this coordinate basis field. That is, the
coordinate components of a Christoffel symbol can be defined equivalently as follows:

$$\left(\frac{\partial}{\partial x^\tau}\right)^{b}\nabla_b\left(\frac{\partial}{\partial x^\mu}\right)^{a} = \Gamma^\sigma{}_{\mu\tau}\left(\frac{\partial}{\partial x^\sigma}\right)^{a} . \tag{5.7.2}$$

The contraction of (5.7.1) and the dual basis $(e^\nu)_a$ gives the explicit expression for $\gamma^\nu{}_{\mu\tau}$:

$$\gamma^\nu{}_{\mu\tau} = (e^\nu)_a(e_\tau)^b\nabla_b(e_\mu)^a . \tag{5.7.3}$$

$\tau$ can be chosen from 1 to $n$ for given values of $\mu$ and $\nu$. Thus, $\{\gamma^\nu{}_{\mu\tau} \mid \nu, \mu \text{ are fixed},\ \tau = 1, \dots, n\}$
is the collection of $n$ real functions $\gamma^\nu{}_{\mu 1}, \dots, \gamma^\nu{}_{\mu n}$. Using these components
we can define a 1-form $(\omega_\mu{}^\nu)_a$, called the connection 1-form of $\nabla_a$ in the basis field $\{(e_\mu)^a\}$,
denoted by $\omega_\mu{}^\nu{}_a$ for short, as follows:

$$\omega_\mu{}^\nu{}_a := -\gamma^\nu{}_{\mu\tau}(e^\tau)_a . \tag{5.7.4}$$

Note that the lower index $a$ of $\omega_\mu{}^\nu{}_a$ is an abstract index, indicating it is a 1-form; $\mu$ and
$\nu$ are the indices numbering the connection 1-forms. It is easy to derive from the equation
above and (5.7.3) that

$$\omega_\mu{}^\nu{}_a = -(e^\nu)_c\nabla_a(e_\mu)^c = (e_\mu)^c\nabla_a(e^\nu)_c , \tag{5.7.5}$$

where in the first equality we used $(e^\tau)_a(e_\tau)^b = \delta^b{}_a$ [see (2.6.4)], and in the second equality
we used the Leibniz rule and the definition of a dual basis $(e^\nu)_c(e_\mu)^c = \delta^\nu{}_\mu$. The collection
of all the connection 1-forms $\{\omega_\mu{}^\nu{}_a \mid \mu, \nu = 1, \dots, n\}$ can be viewed as the manifestation of
$\nabla_a$ in the basis field $\{(e_\mu)^a\}$. For a given $\nabla_a$, in principle one can choose a basis field $\{(e_\mu)^a\}$
and compute all the connection 1-forms $\omega_\mu{}^\nu{}_a$ of $\nabla_a$ with respect to this basis field, and then
compute the curvature tensor. A basis is also called a frame (a 4-dimensional frame is also
called a tetrad). In many cases, a frame actually will mean a non-coordinate basis. Now, we
will present the tetrad method for computing the curvature introduced by Élie Cartan.
Since both $\omega_\mu{}^\nu{}_a$ and the dual basis $(e^\mu)_c$ are 1-forms, we can drop the lower index $a$ and
denote them as $\omega_\mu{}^\nu$ and $e^\mu$, respectively. Under the torsion-free condition, they have the
following relation:

Theorem 5.7.1 (Cartan's first equation of structure)

$$de^\nu = -e^\mu\wedge\omega_\mu{}^\nu . \tag{5.7.6}$$

Proof

$$\begin{aligned}
-(e^\mu)_a\wedge\omega_\mu{}^\nu{}_b &= -(e^\mu)_a\wedge[(e_\mu)^c\nabla_b(e^\nu)_c] = -2(e^\mu)_{[a}(e_\mu)^c\nabla_{b]}(e^\nu)_c \\
&= -2\delta^c{}_{[a}\nabla_{b]}(e^\nu)_c = -2\nabla_{[b}(e^\nu)_{a]} = (de^\nu)_{ab} . \qquad □
\end{aligned}$$

Now we discuss how to calculate the curvature tensor $R_{abc}{}^d$ from $\omega_\mu{}^\nu$. Let

$$R_{ab\mu}{}^\nu \equiv R_{abc}{}^d(e_\mu)^c(e^\nu)_d . \tag{5.7.7}$$

Then $R_{ab\mu}{}^\nu = -R_{ba\mu}{}^\nu$ indicates that $R_{ab\mu}{}^\nu$ can be regarded as the $\mu$th and $\nu$th 2-form
fields, denoted for short as $R_\mu{}^\nu$. It follows from (5.7.7) that $\mu$ and $\nu$ are the component
indices (for the frame components) of $R_{abc}{}^d$, while they can also be viewed as the indices
numbering the 2-form fields $R_\mu{}^\nu$ (however, the $\mu$ and $\nu$ in $\omega_\mu{}^\nu$ can only be viewed as the
indices numbering the 1-forms). The curvature 2-forms $R_\mu{}^\nu$ and the connection 1-forms
$\omega_\mu{}^\nu$ have the following relation:

Theorem 5.7.2 (Cartan's second equation of structure)

$$R_\mu{}^\nu = d\omega_\mu{}^\nu + \omega_\mu{}^\lambda\wedge\omega_\lambda{}^\nu . \tag{5.7.8}$$

Proof It follows from (5.7.7) and the definition of $R_{abc}{}^d$ that

$$R_{ab\mu}{}^\nu = 2(e_\mu)^c\nabla_{[a}\nabla_{b]}(e^\nu)_c .$$

Also,

$$\begin{aligned}
(e_\mu)^c\nabla_a\nabla_b(e^\nu)_c &= \nabla_a[(e_\mu)^c\nabla_b(e^\nu)_c] - [\nabla_a(e_\mu)^c]\nabla_b(e^\nu)_c \\
&= \nabla_a\omega_\mu{}^\nu{}_b - [\nabla_a(e_\mu)^d]\,\delta^c{}_d\,\nabla_b(e^\nu)_c \\
&= \nabla_a\omega_\mu{}^\nu{}_b - [\nabla_a(e_\mu)^d](e^\lambda)_d(e_\lambda)^c\nabla_b(e^\nu)_c \\
&= \nabla_a\omega_\mu{}^\nu{}_b + \omega_\mu{}^\lambda{}_a\,\omega_\lambda{}^\nu{}_b .
\end{aligned}$$

Hence,

$$R_{ab\mu}{}^\nu = 2\nabla_{[a}\omega_\mu{}^\nu{}_{b]} + 2\omega_\mu{}^\lambda{}_{[a}\,\omega_\lambda{}^\nu{}_{b]} = (d\omega_\mu{}^\nu)_{ab} + (\omega_\mu{}^\lambda\wedge\omega_\lambda{}^\nu)_{ab} , \tag{5.7.8'}$$

which is exactly (5.7.8). □

Remark 1 As we mentioned, (5.7.6) only holds for torsion-free connections. When torsion
exists, one should add an additional torsion term, and the complete first equation of structure
can be written as (also see Appendix I in Volume III):

$$T^\nu = de^\nu + e^\mu\wedge\omega_\mu{}^\nu ,$$

where $T^\nu$ is the torsion 2-form, which relates to the torsion tensor defined in Exercise 3.1
as $T^\nu{}_{ab} \equiv T^c{}_{ab}(e^\nu)_c$. Note that the definition of the exterior differentiation (5.1.11) holds
only without torsion, so the last step in the proof of Theorem 5.7.1 will not hold in this case.

Remark 2 Equation (5.7.8) is equivalent to (3.4.20′); they are the component expressions for
the definition of the curvature (namely the relation between the connection and the curvature)
in a frame and in a coordinate basis, respectively.

When $\omega_\mu{}^\nu$ are already obtained, we can conveniently derive $R_\mu{}^\nu$ using the second equation
of structure; all we have to do is to take the exterior derivative of $\omega_\mu{}^\nu$ and the wedge
products among them. To find all the components $R_{\rho\sigma\mu}{}^\nu$ of $R_{abc}{}^d$ in the chosen frame, all we have
to do is to take the contraction using the following formula:

$$R_{\rho\sigma\mu}{}^\nu = R_{ab\mu}{}^\nu(e_\rho)^a(e_\sigma)^b . \tag{5.7.9}$$
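As a hedged illustration of this recipe (a sketch under stated assumptions, not the worked example of the text), consider the unit 2-sphere with the orthonormal dual frame $e^1 = d\theta$, $e^2 = \sin\theta\,d\varphi$. The connection 1-forms below are obtained by additionally assuming $\omega_1{}^1 = \omega_2{}^2 = 0$ and $\omega_2{}^1 = -\omega_1{}^2$, which is what one expects for a metric-compatible derivative operator in an orthonormal frame with a positive definite metric; one can check directly that they satisfy the first equation of structure (5.7.6), and then read off the curvature from (5.7.8) and (5.7.9):

```latex
\begin{aligned}
& de^1 = 0 , \qquad de^2 = \cos\theta \, d\theta\wedge d\varphi , \qquad
  \omega_1{}^2 = -\cos\theta \, d\varphi = -\omega_2{}^1 ; \\
& -e^\mu\wedge\omega_\mu{}^1 = -e^2\wedge\omega_2{}^1
   = -\sin\theta\cos\theta \, d\varphi\wedge d\varphi = 0 = de^1 , \\
& -e^\mu\wedge\omega_\mu{}^2 = -e^1\wedge\omega_1{}^2
   = \cos\theta \, d\theta\wedge d\varphi = de^2 ; \\
& R_1{}^2 = d\omega_1{}^2 + \omega_1{}^\lambda\wedge\omega_\lambda{}^2
   = \sin\theta \, d\theta\wedge d\varphi = e^1\wedge e^2 , \\
& R_{121}{}^2 = R_{ab1}{}^2 \, (e_1)^a (e_2)^b = 1 .
\end{aligned}
```

The wedge term in the fourth line vanishes because $\omega_1{}^1 = \omega_2{}^2 = 0$, so the whole curvature comes from $d\omega_1{}^2$.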


Many other works use i j (or i j ), Rmni j and θ i to denote Rμ ν , Rρσ μ ν and eμ in this text
(note that their i, j and a, b are all component indices), and write the relation between the
curvature 2-form i j and the tetrad components Rmni j of the curvature tensor as
1
i j = Rmni j θ m ∧ θ n ,
2
Using our notation, this equation may be expressed as
1 1
Rμ ν = Rρσ μ ν eρ ∧ eσ , i.e., Rabμ ν = Rρσ μ ν (eρ )a ∧ (eσ )b . (5.7.10)
2 2
In fact, this is nothing but a special case of (5.1.6 ) when l = 2.
If, besides ∇a , there is also a metric gab given on M, which satisfies ∇a gbc = 0, then we
will have even more to discuss. Using gμν and g μν to represent the components of gab and
g ab in the chosen frame, i.e.,

gμν = gab (eμ )a (eν )b , (5.7.11)


g μν = g ab (eμ )a (eν )b , (5.7.12)
and introducing these two notations:

$$\text{(a)}\ (e_\mu)_a \equiv g_{ab}(e_\mu)^b , \qquad \text{(b)}\ (e^\mu)^a \equiv g^{ab}(e^\mu)_b , \tag{5.7.13}$$

we have

$$\text{(a)}\ (e^\mu)^a = g^{\mu\nu}(e_\nu)^a , \qquad \text{(b)}\ (e_\mu)_a = g_{\mu\nu}(e^\nu)_a . \tag{5.7.14}$$

To prove (a) of (5.7.14), all we have to do is to verify that both sides acting on $(e^\sigma)_a$ give
the same result. To prove (b) of (5.7.14), all we have to do is to verify that both sides acting
on $(e_\sigma)^a$ give the same result. The proof is left to the reader.
Raising and lowering the indices for the two equations in (5.7.14) using $g^{ab}$ and $g_{ab}$
yields

$$\text{(a)}\ (e^\mu)_a = g^{\mu\nu}(e_\nu)_a , \qquad \text{(b)}\ (e_\mu)^a = g_{\mu\nu}(e^\nu)^a . \tag{5.7.15}$$

In (5.7.14) and (5.7.15), both (a) indicate that the number index ν of the basis can be
raised by $g^{\mu\nu}$, and both (b) indicate that the number index ν of the basis can be lowered by
$g_{\mu\nu}$. Similarly, one can use $g_{\mu\nu}$ to lower the indices for $\omega_\mu{}^\nu{}_a$, i.e., define (NB: $\omega_{\mu\nu a}$ was
meaningless without this definition)

$$\omega_{\mu\nu a} := g_{\nu\sigma}\,\omega_\mu{}^\sigma{}_a = g_{\nu\sigma}(e_\mu)^c \nabla_a (e^\sigma)_c . \tag{5.7.16}$$

A frame in which the $g_{\mu\nu}$ are constants (i.e., $\nabla_a g_{\mu\nu} = 0$) is called a rigid frame. An orthonormal
frame is the simplest rigid frame. For a Lorentzian metric, an orthonormal frame satisfies
$g_{\mu\nu} = \eta_{\mu\nu}$, which greatly simplifies the calculations (for details, see the example
at the end of this chapter). Another kind of rigid frame that is frequently used in
general relativity is the complex null frame, which will be discussed in detail in Sects. 8.7
and 8.8. It is easy to see from (5.7.16) and $\nabla_a g_{\mu\nu} = 0$ that the following relation holds for
rigid frames:

$$\omega_{\mu\nu a} = (e_\mu)^b \nabla_a (e_\nu)_b . \tag{5.7.17}$$

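Theorem 5.7.3 For a rigid frame, we have

$$\omega_{\mu\nu a} = -\omega_{\nu\mu a} . \tag{5.7.18}$$

Proof It follows from (5.7.17) that

$$\begin{aligned}
\omega_{\mu\nu a} &= \nabla_a[(e_\mu)^b (e_\nu)_b] - (e_\nu)_b \nabla_a (e_\mu)^b = \nabla_a[g_{bc}(e_\mu)^c (e_\nu)^b] - (e_\nu)_b \nabla_a (e_\mu)^b \\
&= \nabla_a g_{\nu\mu} - (e_\nu)_b \nabla_a (e_\mu)^b = -(e_\nu)_b \nabla_a (e_\mu)^b = -\omega_{\nu\mu a} ,
\end{aligned}$$

where the fourth equality comes from the fact that $\nabla_a g_{\nu\mu} = 0$. □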

Equation (5.7.18) indicates that, for a rigid frame, the $\omega_{\mu\nu a}$ are antisymmetric with respect
to μ and ν, which reduces the number of independent connection 1-forms from $n^2$ (where
n is the dimension of M) to $n(n-1)/2$ (there are 6 of them when n = 4). In a chosen
basis, the components $\omega_{\mu\nu\rho} \equiv \omega_{\mu\nu a}(e_\rho)^a$ play a role in the computation similar to that of the
Christoffel symbols $\Gamma^\sigma{}_{\mu\tau}$ in a coordinate basis. There are also $n^3$ of them,
but only $n^2(n-1)/2$ are independent (24 when n = 4). Hence, there are
fewer independent $\omega_{\mu\nu\rho}$ than independent $\Gamma^\sigma{}_{\mu\tau}$. [It follows from the symmetry of
its lower indices that there are $n^2(n+1)/2$ independent $\Gamma^\sigma{}_{\mu\tau}$.] The $\omega_{\mu\nu\rho}$ are called the Ricci
rotation coefficients.
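The counting above can be checked by brute-force enumeration. The following Python sketch is only an illustration; the function names are hypothetical and do not come from the text. It counts index triples, assuming that $\omega_{\mu\nu\rho}$ is antisymmetric in its first two indices and that $\Gamma^\sigma{}_{\mu\tau}$ is symmetric in its two lower indices.

```python
from itertools import product


def count_rotation_coefficients(n: int) -> int:
    """Count independent omega_{mu nu rho}: antisymmetric in (mu, nu), free in rho."""
    # One representative per antisymmetric pair: mu < nu (the case mu == nu vanishes).
    return sum(1 for mu, nu, rho in product(range(n), repeat=3) if mu < nu)


def count_christoffel_symbols(n: int) -> int:
    """Count independent Gamma^sigma_{mu tau}: symmetric in (mu, tau), free in sigma."""
    # One representative per symmetric pair: mu <= tau.
    return sum(1 for sigma, mu, tau in product(range(n), repeat=3) if mu <= tau)


if __name__ == "__main__":
    for n in (2, 3, 4):
        print(n, count_rotation_coefficients(n), count_christoffel_symbols(n))
```

For n = 4 the two counts should reproduce the values 24 and 40 given by $n^2(n-1)/2$ and $n^2(n+1)/2$.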
The “tetrad method” of computing the curvature tensor includes the following three steps:
(a) choosing a tetrad; (b) computing all the connection 1-forms $\omega_\mu{}^\nu$; (c) using Cartan’s second
equation of structure (5.7.8) to compute all the curvature 2-forms $R_\mu{}^\nu$ from the $\omega_\mu{}^\nu$. Among
them, step (b) needs to be further elaborated. Since rigid tetrads are the most commonly
used, here we only introduce the method of computing the $\omega_\mu{}^\nu$ using a rigid tetrad. Choosing
an arbitrary coordinate system $\{x^\mu\}$, we define

$$\Lambda_{\mu\nu\rho} \equiv [(e_\nu)_{\lambda,\tau} - (e_\nu)_{\tau,\lambda}](e_\mu)^\lambda (e_\rho)^\tau , \tag{5.7.19}$$

where $(e_\nu)_\lambda$ and $(e_\mu)^\lambda$ are the λth components of $(e_\nu)_a$ and $(e_\mu)^a$, i.e.,

$$(e_\nu)_\lambda \equiv (e_\nu)_a (\partial/\partial x^\lambda)^a , \qquad (e_\mu)^\lambda \equiv (e_\mu)^a (dx^\lambda)_a ,$$

and $(e_\nu)_{\lambda,\tau}$ is an abbreviation for $\partial(e_\nu)_\lambda/\partial x^\tau$. It can be easily seen that $\Lambda_{\mu\nu\rho} = -\Lambda_{\rho\nu\mu}$;
hence, there are only $n^2(n-1)/2$ independent $\Lambda_{\mu\nu\rho}$. After obtaining all the $\Lambda_{\mu\nu\rho}$ using
(5.7.19), one can compute all the $\omega_{\mu\nu\rho}$ using the following theorem.

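Theorem 5.7.4

$$\omega_{\mu\nu\rho} = \frac{1}{2}(\Lambda_{\mu\nu\rho} + \Lambda_{\rho\mu\nu} - \Lambda_{\nu\rho\mu}) . \tag{5.7.20}$$

Proof It follows from the torsion-free condition of $\nabla_a$ that the lower indices of the Christoffel
symbol are symmetric, i.e., $\Gamma^\mu{}_{\nu\sigma} = \Gamma^\mu{}_{\sigma\nu}$. Hence,

$$(e_\nu)_{\lambda,\tau} - (e_\nu)_{\tau,\lambda} = (e_\nu)_{\lambda;\tau} - (e_\nu)_{\tau;\lambda} ,$$

and thus (5.7.19) can be rewritten as

$$\begin{aligned}
\Lambda_{\mu\nu\rho} &= [\nabla_a (e_\nu)_b - \nabla_b (e_\nu)_a](\partial/\partial x^\tau)^a (\partial/\partial x^\lambda)^b (e_\mu)^\lambda (e_\rho)^\tau \\
&= [\nabla_a (e_\nu)_b - \nabla_b (e_\nu)_a](e_\rho)^a (e_\mu)^b = \omega_{\mu\nu\rho} - \omega_{\rho\nu\mu} .
\end{aligned}$$

From here, using the antisymmetry (5.7.18), it is not difficult to get (5.7.20). □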
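The last step of the proof is pure index algebra, so it can be spot-checked numerically. The sketch below is an illustration under stated assumptions, not part of the text's method: it fills $\omega_{\mu\nu\rho}$ with random rational numbers that are antisymmetric in the first two indices, forms $\Lambda_{\mu\nu\rho} = \omega_{\mu\nu\rho} - \omega_{\rho\nu\mu}$ as in the proof, and then applies (5.7.20) to recover $\omega_{\mu\nu\rho}$. All function names are hypothetical.

```python
import random
from fractions import Fraction
from itertools import product


def random_rotation_coefficients(n, seed=0):
    """Random omega[(mu, nu, rho)], antisymmetric in (mu, nu) as in (5.7.18)."""
    rng = random.Random(seed)
    omega = {}
    for mu, nu, rho in product(range(n), repeat=3):
        if mu < nu:
            value = Fraction(rng.randint(-9, 9))
            omega[(mu, nu, rho)] = value
            omega[(nu, mu, rho)] = -value
        elif mu == nu:
            omega[(mu, nu, rho)] = Fraction(0)
    return omega


def lambda_from_omega(omega, n):
    """Lambda_{mu nu rho} = omega_{mu nu rho} - omega_{rho nu mu}."""
    return {
        (mu, nu, rho): omega[(mu, nu, rho)] - omega[(rho, nu, mu)]
        for mu, nu, rho in product(range(n), repeat=3)
    }


def omega_from_lambda(lam, n):
    """Right-hand side of (5.7.20)."""
    half = Fraction(1, 2)
    return {
        (mu, nu, rho): half * (lam[(mu, nu, rho)] + lam[(rho, mu, nu)] - lam[(nu, rho, mu)])
        for mu, nu, rho in product(range(n), repeat=3)
    }


if __name__ == "__main__":
    n = 4
    omega = random_rotation_coefficients(n)
    recovered = omega_from_lambda(lambda_from_omega(omega, n), n)
    print(recovered == omega)
```

Exact rational arithmetic is used so that the comparison does not depend on floating-point tolerances.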

Equation (5.7.20) is the explicit expression for ωμνρ , which is convenient for calculating
ωμνρ directly. However, the drawback is that this formula involves too many equations. If
the metric has some symmetries, it is usually faster to find the ωμ ν for a rigid tetrad using
Cartan’s first equation of structure (see the method given after the solution of Example 1).
Now we give a specific example of the calculation.

Example 1 Given the expression for the line element of a spacetime metric $g_{ab}$ in the
{t, r, θ, ϕ} coordinate system:

$$ds^2 = -e^{2A(r)}dt^2 + e^{2B(r)}dr^2 + r^2(d\theta^2 + \sin^2\theta\, d\varphi^2) , \tag{5.7.21}$$

find all of its curvature 2-forms $R_\mu{}^\nu$ using an orthonormal tetrad.

Solution (a) Choose an orthonormal tetrad. It follows from (5.7.21) that the coordinate basis
vectors are orthogonal but not normalized; therefore, to make from them an orthonormal
basis, one may choose

$$\begin{aligned}
(e_0)^a &= e^{-A}(\partial/\partial t)^a , & (e_1)^a &= e^{-B}(\partial/\partial r)^a , \\
(e_2)^a &= r^{-1}(\partial/\partial\theta)^a , & (e_3)^a &= (r\sin\theta)^{-1}(\partial/\partial\varphi)^a ,
\end{aligned} \tag{5.7.22}$$

and the corresponding dual basis vectors are

$$\begin{aligned}
(e^0)_a &= e^{A}(dt)_a , & (e^1)_a &= e^{B}(dr)_a , \\
(e^2)_a &= r(d\theta)_a , & (e^3)_a &= (r\sin\theta)(d\varphi)_a .
\end{aligned} \tag{5.7.23}$$

Or, lowering the indices for (5.7.22) yields

$$\begin{aligned}
(e_0)_a &= -e^{A}(dt)_a , & (e_1)_a &= e^{B}(dr)_a , \\
(e_2)_a &= r(d\theta)_a , & (e_3)_a &= (r\sin\theta)(d\varphi)_a .
\end{aligned} \tag{5.7.24}$$

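(b) Compute $\Lambda_{\mu\nu\rho}$ using (5.7.19). In the calculation we need a coordinate system, and
naturally we choose the given system {t, r, θ, ϕ}. Noticing the antisymmetric relation $\Lambda_{\mu\nu\rho} = -\Lambda_{\rho\nu\mu}$,
one can first find all (six) independent $\Lambda_{\mu 0\rho}$ (namely, $\Lambda_{001}$, $\Lambda_{002}$, $\Lambda_{003}$, $\Lambda_{102}$, $\Lambda_{103}$,
$\Lambda_{203}$), and then find all the independent $\Lambda_{\mu 1\rho}$, and so on. Equation (5.7.24) indicates that the only
nonvanishing component of $(e_0)_\lambda$ is $(e_0)_0 = -e^{A}$, which is a function of r only, and hence
the only nonvanishing $(e_0)_{0,\tau}$ is $(e_0)_{0,1} = -A' e^{A}$ (where $'$ stands for the derivative
with respect to r). Thus,

$$\Lambda_{\mu 0\rho} = [(e_0)_{0,1} - 0](e_\mu)^0 (e_\rho)^1 = -A' e^{A}(e_\mu)^0 (e_\rho)^1 .$$

Also, $(e_\mu)^0$ and $(e_\rho)^1$ vanish unless μ = 0 and ρ = 1, respectively; hence, among these, the only
nonvanishing $\Lambda_{\mu 0\rho}$ is

$$\Lambda_{001} = -A' e^{A}(e_0)^0 (e_1)^1 = -A' e^{A} e^{-A} e^{-B} = -A' e^{-B} .$$

Similarly, one can find that the nonvanishing $\Lambda_{\mu\nu\rho}$ are

$$\begin{aligned}
\Lambda_{001} &= -\Lambda_{100} = -A' e^{-B} , & \Lambda_{122} &= -\Lambda_{221} = -r^{-1}e^{-B} , \\
\Lambda_{133} &= -\Lambda_{331} = -r^{-1}e^{-B} , & \Lambda_{233} &= -\Lambda_{332} = -r^{-1}\cot\theta .
\end{aligned}$$

Plugging into (5.7.20) yields the nonvanishing $\omega_{\mu\nu\rho}$ (note that $\omega_{\mu\nu\rho} = -\omega_{\nu\mu\rho}$):

$$\begin{aligned}
\omega_{010} &= -\omega_{100} = -A' e^{-B} , & \omega_{122} &= -\omega_{212} = -r^{-1}e^{-B} , \\
\omega_{133} &= -\omega_{313} = -r^{-1}e^{-B} , & \omega_{233} &= -\omega_{323} = -r^{-1}\cot\theta .
\end{aligned}$$

Therefore, the six independent connection 1-forms $\omega_{\mu\nu}$ are

$$\begin{aligned}
\omega_{01} &= -A' e^{-B} e^0 , & \omega_{02} &= 0 , & \omega_{03} &= 0 , \\
\omega_{12} &= -r^{-1}e^{-B} e^2 , & \omega_{13} &= -r^{-1}e^{-B} e^3 , & \omega_{23} &= -r^{-1}\cot\theta\, e^3 .
\end{aligned}$$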

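(c) Derive the curvature 2-forms using Cartan’s second equation of structure. To find the
exterior derivatives more conveniently, we rewrite the nonvanishing $\omega_{\mu\nu}$ in terms of the
dual coordinate basis vectors:

$$\begin{aligned}
\omega_{01} &= -A' e^{A-B} dt , & \omega_{12} &= -e^{-B} d\theta , \\
\omega_{13} &= -e^{-B}\sin\theta\, d\varphi , & \omega_{23} &= -\cos\theta\, d\varphi .
\end{aligned}$$

It follows from $\omega_\mu{}^\nu = g^{\nu\sigma}\omega_{\mu\sigma} = \eta^{\nu\sigma}\omega_{\mu\sigma}$ that $\omega_0{}^i = \omega_{0i}$, $\omega_i{}^j = \omega_{ij}$ (i, j = 1, 2, 3).
Plugging into (5.7.8), it is not difficult to find

$$\begin{aligned}
R_0{}^1 &= e^{-2B}(A'' - A'B' + A'^2)\, e^0 \wedge e^1 , & R_0{}^2 &= r^{-1}A' e^{-2B}\, e^0 \wedge e^2 , \\
R_0{}^3 &= r^{-1}A' e^{-2B}\, e^0 \wedge e^3 , & R_1{}^2 &= r^{-1}B' e^{-2B}\, e^1 \wedge e^2 , \\
R_1{}^3 &= r^{-1}B' e^{-2B}\, e^1 \wedge e^3 , & R_2{}^3 &= r^{-2}(1 - e^{-2B})\, e^2 \wedge e^3 .
\end{aligned}$$

Here we only give the calculation for the longest one, $R_1{}^3$:

$$\begin{aligned}
d\omega_1{}^3 &= -\left[\frac{\partial}{\partial r}(e^{-B}\sin\theta)\, dr + \frac{\partial}{\partial\theta}(e^{-B}\sin\theta)\, d\theta\right] \wedge d\varphi \\
&= e^{-B}(B'\sin\theta\, dr \wedge d\varphi - \cos\theta\, d\theta \wedge d\varphi) \\
&= r^{-1}B' e^{-2B}\, e^1 \wedge e^3 - r^{-2}e^{-B}\cot\theta\, e^2 \wedge e^3 , \\
\omega_1{}^\lambda \wedge \omega_\lambda{}^3 &= \omega_1{}^2 \wedge \omega_2{}^3 = \omega_{12} \wedge \omega_{23} = r^{-2}e^{-B}\cot\theta\, e^2 \wedge e^3 , \\
R_1{}^3 &= d\omega_1{}^3 + \omega_1{}^\lambda \wedge \omega_\lambda{}^3 = r^{-1}B' e^{-2B}\, e^1 \wedge e^3 .
\end{aligned}$$
□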
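As a sanity check on the six curvature 2-forms listed in step (c), one can insert a concrete choice of A(r) and B(r). The sketch below is illustrative only and rests on assumptions not made in the example: it takes the Schwarzschild form $e^{2A} = e^{-2B} = 1 - 2M/r$ with a hypothetical mass parameter M, and it writes each curvature 2-form as $R_\mu{}^\nu = K_{\mu\nu}\, e^\mu \wedge e^\nu$ (no sum). With this choice the coefficients reduce to multiples of $M/r^3$, and the sum $K_{01} + K_{02} + K_{03}$ vanishes, as one would expect for a vacuum metric under the usual contraction convention for the Ricci tensor.

```python
import math


def curvature_coefficients(r, A1, A2, B, B1):
    """Coefficients K[(mu, nu)] in R_mu^nu = K e^mu ^ e^nu, as listed in Example 1.

    A1 and A2 stand for A'(r) and A''(r); B1 stands for B'(r).
    """
    f = math.exp(-2.0 * B)
    return {
        (0, 1): f * (A2 - A1 * B1 + A1 ** 2),
        (0, 2): A1 * f / r,
        (0, 3): A1 * f / r,
        (1, 2): B1 * f / r,
        (1, 3): B1 * f / r,
        (2, 3): (1.0 - f) / r ** 2,
    }


def schwarzschild_inputs(r, M):
    """Assumed test case: e^{2A} = e^{-2B} = 1 - 2M/r (requires r > 2M)."""
    f = 1.0 - 2.0 * M / r
    f1 = 2.0 * M / r ** 2            # df/dr
    f2 = -4.0 * M / r ** 3           # d^2 f / dr^2
    A1 = f1 / (2.0 * f)              # A = (1/2) ln f
    A2 = f2 / (2.0 * f) - f1 ** 2 / (2.0 * f ** 2)
    B = -0.5 * math.log(f)           # B = -A
    B1 = -A1
    return A1, A2, B, B1


if __name__ == "__main__":
    r, M = 10.0, 1.0
    K = curvature_coefficients(r, *schwarzschild_inputs(r, M))
    for pair, value in sorted(K.items()):
        print(pair, value, value * r ** 3 / M)  # last column: value in units of M/r^3
```

The analytic derivatives of A are used instead of finite differences so that the check is limited only by rounding error.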

We have demonstrated above the computation of the connection 1-forms $\omega_\mu{}^\nu$ using
(5.7.20). Now, with the same example, we introduce an equivalent method of deriving the
$\omega_\mu{}^\nu$ using Cartan’s first equation of structure. Taking the exterior derivative of (5.7.23)
and plugging it into Cartan’s first equation (5.7.6), we find

$$A' e^{-B}\, e^1 \wedge e^0 = -e^1 \wedge \omega_1{}^0 - e^2 \wedge \omega_2{}^0 - e^3 \wedge \omega_3{}^0 , \tag{5.7.25a}$$

$$0 = -e^0 \wedge \omega_0{}^1 - e^2 \wedge \omega_2{}^1 - e^3 \wedge \omega_3{}^1 , \tag{5.7.25b}$$

$$r^{-1}e^{-B}\, e^1 \wedge e^2 = -e^0 \wedge \omega_0{}^2 - e^1 \wedge \omega_1{}^2 - e^3 \wedge \omega_3{}^2 , \tag{5.7.25c}$$

$$r^{-1}e^{-B}\, e^1 \wedge e^3 + r^{-1}\cot\theta\, e^2 \wedge e^3 = -e^0 \wedge \omega_0{}^3 - e^1 \wedge \omega_1{}^3 - e^2 \wedge \omega_2{}^3 . \tag{5.7.25d}$$

In principle, expanding the 1-forms $\omega_\mu{}^\nu$ in terms of the basis $e^\mu$ (e.g., $\omega_1{}^0 = \alpha_0 e^0 + \alpha_1 e^1 +
\alpha_2 e^2 + \alpha_3 e^3$) and plugging the result into (5.7.25), one can obtain all the $\omega_\mu{}^\nu$. In fact,
however, one can usually “read off” or even guess the correct solution in a much simpler
manner. For example, the following guesses for $\omega_1{}^0$, $\omega_2{}^0$ and $\omega_3{}^0$ will satisfy (5.7.25a):

$$\omega_1{}^0 = -A' e^{-B} e^0 + \alpha_1 e^1 , \qquad \omega_2{}^0 = \omega_3{}^0 = 0 . \tag{5.7.26}$$

Plugging the above results into (5.7.25b) yields

$$0 = -\alpha_1\, e^0 \wedge e^1 - e^2 \wedge \omega_2{}^1 - e^3 \wedge \omega_3{}^1 .$$

The last two terms in this equation do not contain $e^0 \wedge e^1$, and hence $\alpha_1 = 0$. It seems that one
could guess $\omega_2{}^1 = \omega_3{}^1 = 0$; however, $\omega_2{}^1 = 0$ cannot satisfy (5.7.25c) and $\omega_3{}^1 = 0$ cannot
satisfy (5.7.25d). From (5.7.25c) one can guess that $\omega_1{}^2 = -r^{-1}e^{-B} e^2$, and from (5.7.25d)
one can guess that $\omega_1{}^3 = -r^{-1}e^{-B} e^3$ and $\omega_2{}^3 = -r^{-1}\cot\theta\, e^3$. It can be easily seen that
these guesses also satisfy (5.7.25b) and (5.7.25c). Thus, the solution we just guessed, i.e.,

$$\begin{aligned}
&\omega_1{}^0 = -A' e^{-B} e^0 , \qquad \omega_2{}^0 = \omega_3{}^0 = 0 , \\
&\omega_1{}^2 = -r^{-1}e^{-B} e^2 , \qquad \omega_1{}^3 = -r^{-1}e^{-B} e^3 , \qquad \omega_2{}^3 = -r^{-1}\cot\theta\, e^3 ,
\end{aligned}$$

satisfies Cartan’s equation, and therefore is the correct answer [which is the same as the
result of step (b) in the solution of Example 1].
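The guessed solution can also be checked mechanically against all four equations (5.7.25) at a sample point. The following sketch is an illustration with hypothetical helper names. It assumes arbitrary numerical values for A′, B, r and θ, stores each 2-form as an antisymmetric 4×4 matrix of coefficients of $e^\alpha \wedge e^\beta$, and evaluates the residual $de^\nu + e^\mu \wedge \omega_\mu{}^\nu$, which should vanish for every ν.

```python
import math

ETA = (-1.0, 1.0, 1.0, 1.0)  # diagonal of eta^{mu nu} for the orthonormal tetrad


def zero_two_form():
    """A 2-form sum_{a<b} F[a][b] e^a ^ e^b, stored as an antisymmetric 4x4 matrix."""
    return [[0.0] * 4 for _ in range(4)]


def add_wedge(form, a, b, coeff):
    """Add coeff * e^a ^ e^b to `form`, keeping the matrix antisymmetric."""
    form[a][b] += coeff
    form[b][a] -= coeff


def connection_one_forms(A1, B, r, theta):
    """omega_mu^nu as coefficient lists in the basis e^rho (the guessed solution)."""
    lower = [[[0.0] * 4 for _ in range(4)] for _ in range(4)]  # omega_{mu nu rho}

    def set_pair(mu, nu, rho, value):
        lower[mu][nu][rho] = value
        lower[nu][mu][rho] = -value  # antisymmetry (5.7.18)

    set_pair(0, 1, 0, -A1 * math.exp(-B))
    set_pair(1, 2, 2, -math.exp(-B) / r)
    set_pair(1, 3, 3, -math.exp(-B) / r)
    set_pair(2, 3, 3, -1.0 / (r * math.tan(theta)))
    # Raise the second index with the diagonal eta^{nu nu}.
    return [[[ETA[nu] * lower[mu][nu][rho] for rho in range(4)]
             for nu in range(4)] for mu in range(4)]


def d_tetrad(A1, B, r, theta):
    """Exterior derivatives d e^nu, read off from the left-hand sides of (5.7.25)."""
    d = [zero_two_form() for _ in range(4)]
    add_wedge(d[0], 1, 0, A1 * math.exp(-B))
    add_wedge(d[2], 1, 2, math.exp(-B) / r)
    add_wedge(d[3], 1, 3, math.exp(-B) / r)
    add_wedge(d[3], 2, 3, 1.0 / (r * math.tan(theta)))
    return d


def max_residual(A1, B, r, theta):
    """Largest coefficient of d e^nu + e^mu ^ omega_mu^nu over all nu."""
    omega = connection_one_forms(A1, B, r, theta)
    d = d_tetrad(A1, B, r, theta)
    worst = 0.0
    for nu in range(4):
        total = [row[:] for row in d[nu]]
        for mu in range(4):
            for rho in range(4):
                add_wedge(total, mu, rho, omega[mu][nu][rho])
        worst = max(worst, max(abs(x) for row in total for x in row))
    return worst


if __name__ == "__main__":
    print(max_residual(A1=0.3, B=0.2, r=2.0, theta=1.1))
```

A residual at the level of rounding error is consistent with the guess; it does not by itself prove uniqueness, which in the text follows from the structure of Cartan's equation for a rigid frame.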
So far we have introduced two methods for computing the Riemann tensor $R_{abc}{}^d$: the
coordinate basis method and the tetrad method (especially the orthonormal tetrad method).
Each of these two methods has advantages and disadvantages; one can choose which one
to use based on the specific problem and one's own proficiency. One might wish for
a method that combines the coordinate basis method and the orthonormal tetrad
method, namely for an orthonormal coordinate basis. However, this is impossible
unless $g_{ab}$ is a flat metric. The reason is simple: the coordinate basis being orthonormal
means that $g^{ab} = \eta^{\mu\nu}(\partial/\partial x^\mu)^a (\partial/\partial x^\nu)^b$. Suppose $\partial_a$ is the ordinary derivative operator
of the coordinate system; then $\partial_a g_{bc} = 0$, and hence $\partial_a$ is the derivative operator associated
with $g_{ab}$. Since $\partial_{[a}\partial_{b]}\omega_c = 0$ for all $\omega_c$, we see that the $R_{abc}{}^d$ of $g_{ab}$ vanishes, i.e., $g_{ab}$ is flat.

Exercises

~5.1. Complete the proof of Theorem 5.1.3 by showing that the 2-forms $(e^1)_a \wedge (e^2)_b$,
$(e^2)_a \wedge (e^3)_b$ and $(e^3)_a \wedge (e^1)_b$ are linearly independent.
~5.2. Suppose V is a vector space and $\{(e^1)_a, (e^2)_a, (e^3)_a, (e^4)_a\}$ is a basis of
$V^*$. Find the expansion of $\omega_a \in \Lambda(1)$, $\omega_{ab} \in \Lambda(2)$, $\omega_{abc} \in \Lambda(3)$ and $\omega_{abcd} \in \Lambda(4)$
in this basis and explain the definition of the coefficients (e.g., $\omega_{12}$).
~5.3. Using mathematical induction, show that $(\omega^1)_{a_1} \wedge \cdots \wedge (\omega^l)_{a_l} = l!\,(\omega^1)_{[a_1} \cdots (\omega^l)_{a_l]}$,
where $(\omega^1)_a, \ldots, (\omega^l)_a$ are arbitrary dual vectors.
~5.4. Prove Theorem 5.1.4.
~5.5. Suppose ω is a 1-form field and u and v are vector fields. Show that $d\omega(u, v) =
u(\omega(v)) - v(\omega(u)) - \omega([u, v])$. The left-hand side represents the result of dω
acting on u and v, i.e., $(d\omega)_{ab} u^a v^b$.
~5.6. Suppose $v^b$ and $\omega_{a_1 \cdots a_l}$ are a vector field and an l-form field, respectively, on
a manifold M. Show that
(a) $\mathcal{L}_v \omega_{a_1 \cdots a_l} = d_{a_1}(v^b \omega_{b a_2 \cdots a_l}) + (d\omega)_{b a_1 \cdots a_l} v^b$.
NB: Let $\mu_{a_2 \cdots a_l} \equiv v^b \omega_{b a_2 \cdots a_l}$, then $d_{a_1}\mu_{a_2 \cdots a_l}$ means $(d\mu)_{a_1 a_2 \cdots a_l}$.
(b) $\mathcal{L}_v d\omega = d\mathcal{L}_v \omega$ (this is actually a very useful identity).
Hints: (1) One can first prove the special case of (a) where l = 2, and
then it is not difficult to generalize it after getting the feeling.
(2) The result of (a) can make the proof of (b) quite simple.
5.7. Suppose O is the coordinate patch of the coordinate system $\{x^\mu\}$ on an n-dimensional
manifold M (and O is homeomorphic to $\mathbb{R}^n$) and that $\omega_a$ is a
1-form field on O. Show that

$$\frac{\partial\omega_\mu}{\partial x^\nu} = \frac{\partial\omega_\nu}{\partial x^\mu} \qquad (\mu, \nu = 1, \ldots, n)$$

if and only if there exists $f : O \to \mathbb{R}$ such that $\nabla_a f = \omega_a$. Hint: follow the
proof of Corollary 5.1.6 in Sect. 5.1.
5.8. Suppose {x, y, z} and {r, θ, ϕ} are a Cartesian coordinate system and a spherical
coordinate system, respectively, of the 3-dimensional Euclidean space.
Write down the expression for dr ∧ dθ ∧ dϕ in terms of dx ∧ dy ∧ dz.
~5.9. A connected manifold M together with a metric field $g_{ab}$ with a Lorentzian
signature is called a spacetime. Suppose $F_{ab}$ is a 2-form field on an arbitrary
4-dimensional spacetime (we will see in Chap. 6 that the electromagnetic
field tensor $F_{ab}$ is exactly a 2-form field), show that

$$\frac{1}{2}\left(F_{ac}F_b{}^c + {}^*F_{ac}\,{}^*F_b{}^c\right) = F_{ac}F_b{}^c - \frac{1}{4}\, g_{ab} F_{cd}F^{cd} ,$$

where ${}^*F_{ac} \equiv ({}^*F)_{ac}$, ${}^*F_b{}^c = g^{ac}\,{}^*F_{ba}$ (this identity is helpful for studying
electromagnetic fields).
*5.10. Show that ε̂a1 ···an−1 ≡ ±n b ε̂ba1 ···an−1 is the volume element on ∂ N associated
with the induced metric field h ab .

5.11. Prove Theorems 5.6.1 and 5.6.2.
~5.12. Suppose x, y, z are Cartesian coordinates of the 3-dimensional Euclidean
space. Show that (a) ${}^*dx = dy \wedge dz$; (b) ${}^*(dx \wedge dy \wedge dz) = 1$.
5.13. Suppose {r, θ, ϕ} is a spherical coordinate system of the 3-dimensional
Euclidean space. Show that ${}^*dr = (r^2 \sin\theta)\, d\theta \wedge d\varphi$.
5.14. Suppose $\vec{A}$ and $\vec{B}$ are vector fields on $\mathbb{R}^3$ and $\vec{\nabla}$ is the derivative operator on
$\mathbb{R}^3$ associated with the Euclidean metric. Show that

$$\vec{\nabla} \times (\vec{A} \times \vec{B}) = (\vec{B} \cdot \vec{\nabla})\vec{A} + (\vec{\nabla} \cdot \vec{B})\vec{A} - (\vec{A} \cdot \vec{\nabla})\vec{B} - (\vec{\nabla} \cdot \vec{A})\vec{B} .$$

5.15. Using differential forms, prove the following well-known propositions that
are not so easy to prove by the vector analysis of the 3-dimensional Euclidean
space (see the end of Sect. 5.6):
(1) a vector with no curl must be a gradient field;
(2) a vector with no divergence must be a curl field.
5.16. Suppose $\nabla_a$ is the associated derivative operator on a generalized Riemannian
space $(M, g_{ab})$ (i.e., $\nabla_a g_{bc} = 0$), ε is the associated volume element
(i.e., $\nabla_a \varepsilon_{b_1 \cdots b_n} = 0$), $v^a$ is a vector field on M, $v_a \equiv g_{ab}v^b$ is the 1-form corresponding
to $v^a$, and ${}^*v$ is the dual form field of $v_a$. Show that $(\nabla_a v^a)\varepsilon = d{}^*v$.
NB: This conclusion can be generalized as follows: suppose $F_{a_1 \cdots a_k}$ is a k-form
field ($k \le n$), denoted by F for short, and denote the (k − 1)-form
field $\nabla^{a_k} F_{a_1 \cdots a_k}$ as div F, then ${}^*(\mathrm{div} F) = d{}^*F$. The Maxwell equations of an
electromagnetic field (see Sect. 12.6.1) provide an example.
5.17. Show that the $\Gamma^\sigma{}_{\mu\tau}$ defined by (5.7.2) are exactly the components of the
Christoffel symbol defined in Sect. 3.1 with respect to the given coordinate
basis in (5.7.2).
*5.18. Using the orthonormal tetrad method, find all the tetrad components of the
curvature tensors of the metrics in Exercises 14–16 of Chap. 3, and verify
that the results are the same as those of the curvature tensors derived from
the coordinate basis method. To distinguish from the coordinate components
$R_{\mu\nu\sigma}{}^\rho$ of $R_{abc}{}^d$, one may change the notation of the tetrad components to
$R_{(\mu)(\nu)(\sigma)}{}^{(\rho)}$ after obtaining all the tetrad components of $R_{abc}{}^d$.

References

Abraham, R. and Marsden, J. (1978), Foundations of Mechanics, Addison-Wesley Publishing
Company, Redwood City.
Chern, S. S., Chen, W. and Lam, K. S. (1999), Lectures on Differential Geometry, World Scientific
Publishing Company, Singapore.
Wald, R. M. (1984), General Relativity, The University of Chicago Press, Chicago.
Warner, F. W. (1983), Foundations of Differentiable Manifolds and Lie Groups, Springer-Verlag,
New York.
Chapter 6
Special Relativity

6.1 Foundations of the 4-Dimensional Formulation

The traditional way to formulate special relativity is to use the so-called 3 + 1-dimensional
(or, for short, 3-dimensional) formulation, in which space and time
are treated separately in specific coordinate systems. However, after acquiring an
understanding of differential geometry in the previous chapters, one can also use a
4-dimensional “global” way to formulate special relativity, which not only makes it
easier to grasp the essence of the theory but also provides a necessary foundation for
learning general relativity. The mission of this chapter is to provide this geometric
reformulation of special relativity, that is, rather than using the 3-dimensional for-
mulation, we will develop a clearer and deeper understanding by approaching the
theory through the language of 4-dimensional geometry. Note that in this chapter we
assume that our readers have learned the basics of special relativity.

6.1.1 Preliminaries

Physics studies the evolution of physical objects. For the convenience of study, people
usually use physical models to describe physical objects. Models are the idealized
version of objects, such as point masses, point charges, charged surfaces, etc.
Now let us introduce a few fundamental concepts that will later be frequently
encountered using the language of models.
© Science Press 2023
C. Liang and B. Zhou, Differential Geometry and General Relativity,
Graduate Texts in Physics, https://doi.org/10.1007/978-981-99-0022-0_6

An “event” is supposed to be a very intuitive concept. A bomb explosion, a car
crash, a cough are all events, each of which occupies a certain part of space and
lasts for a certain period of time. The concept of an event in physics, however, is
the modeling of a real event, i.e., we regard every event as happening at a point in
space and an instant in time. No matter what is happening, the combination of a point
in space and an instant in time is called an event. The collection of all the events
is called a spacetime, and thus each event is a spacetime point. According to our
physical intuition, a spacetime should be a “4-dimensional continuum”; however,
the precise definition of this phrase was not clear at first. Later, people found that
the mathematical concept that can make this precise is a 4-dimensional manifold.
Treating a spacetime as a 4-dimensional manifold (together with proper additional
structures, e.g., a Lorentzian metric) is a basic starting point (postulate) assumed in
physics. Both pre-relativity physics and special relativity assume that the spacetime
manifold is R4 (the difference is their additional structures on R4 , see later); general
relativity, on the other hand, allows the spacetime manifold to be any 4-dimensional
connected manifold.
A point mass in Newtonian mechanics is a modeling concept that refers to a
massive point in space. To discuss relativity, we generalize the concept of a point
mass to a particle. Here a particle means a modeling particle, which is related to
but different from those specific particles in physics, such as protons, neutrons, etc.,
in that it has no size at all. We can classify the particles into two types [see Synge
(1956)]: those with (rest) mass, which are the same as point masses; and those
without (rest) mass, which are often called photons for convenience’s sake. The
whole history of a particle is formed by a series of events, and therefore corresponds
to a curve in spacetime, called the world line of this particle. Suppose a boy (viewed
as a point mass) is at rest on the ground, and a fly is in uniform circular
motion around the boy (see Fig. 6.1a); then the world lines of the boy and the
fly are as shown in Fig. 6.1b (called a spacetime diagram). In a spacetime diagram,
the upward direction represents the time direction, and the horizontal directions
represent spatial directions. Each horizontal slice represents the whole space at a
certain moment of time. One can see the whole process of the motion (evolution) by
viewing the spacetime diagram from the bottom up.
A person who makes physical measurements is called an observer. Usually an
observer is modeled as a point mass. To make a measurement, the observer should
be equipped with an accurate clock, called a standard clock, and the reading of this
clock is called the proper time of this observer (see Sect. 6.1.4 for details). More
generally, one can consider that any point mass carries a standard clock, and each
point mass has its own proper time. Mathematically speaking, proper time is nothing
but a special parameter for the world line of a point mass. An observer can only make
direct measurements of the events which happen on its own world line.

Fig. 6.1 A boy is at rest on the ground, around which a fly is experiencing a uniform circular motion

In order to
observe any event in the whole spacetime (or in an open subset of it), one needs
to set observers everywhere (like a “patrol”), and these ubiquitous observers form a
reference frame. More precisely, the set R of an infinite number of observers is called
a frame of reference, or a reference frame, if it satisfies the following condition:
any point in spacetime (or in an open subset of spacetime) is passed through by
one and only one observer in R. This abstract definition is actually the specification
and generalization of the often used concept of a reference frame. Take the familiar
example of a moving train. Imagine the train being filled with passengers (observers),
each of which carries a standard clock and is labeled by three real numbers (the spatial
coordinates). Any event which happens inside the train must happen to an observer,
who can record the spacetime coordinates t, x, y, z of this event (where t is the
reading of the standard clock and x, y, z are the spatial coordinates of the observer).
Although a train has only a limited size (length, width and height), when we talk
about the “train frame”, i.e., the reference frame of the train, as a modeled concept,
we have already assumed that the whole space is filled with observers. To be specific,
each spatial point is occupied by an observer in the train frame; these observers move
along with the train, which means they are motionless with respect to the observers
inside the train. On the other hand, the observers in the “ground frame” also fill up
the whole space, but they have a relative velocity with respect to the observers in
the train frame. If we use vertical lines to represent the world lines of the ground
frame observers in a spacetime diagram, the world lines of the train frame observers
will be parallel oblique lines (the reader should draw a picture). The specification
and generalization of this understanding (allowing two world lines in a frame to be
non-parallel, i.e., allowing the distance between two observers to change with time)
lead us to the preceding definition of a reference frame.

6.1.2 The Background Spacetime of Special Relativity

The so-called “geometric formulation” of special relativity actually refers to the con-
struction of a 4-dimensional (rather than 3-dimensional) model using the language
of differential geometry. The conclusions we derive will certainly agree with the
3-dimensional formulation of special relativity. To construct this geometric formu-
lation, the first problem is: what manifold, together with what additional structure,
should we use as the background spacetime? Physically speaking, any event in spe-
cial relativity can be described by the coordinates of an inertial frame. The ranges
for the coordinates t, x, y, z of any inertial frame are all from −∞ to ∞. Suppose
p and q are two neighboring points (see Fig. 6.2), which represent two neighboring
events in physics. According to special relativity, the important physical quantity
that describes the relationship between p and q is the infinitesimal interval, which
can be defined by means of an inertial coordinate system {t, x, y, z} as

$$ds^2 = -dt^2 + dx^2 + dy^2 + dz^2 . \tag{6.1.1}$$

[This book adopts the geometrized unit system, in which c = 1 (for details, see
Appendix A)]. An important property of an infinitesimal interval is that it preserves
its form when transformed from one inertial frame to another inertial frame, i.e.,

$$-dt^2 + dx^2 + dy^2 + dz^2 = -dt'^2 + dx'^2 + dy'^2 + dz'^2 .$$

(This invariance of the interval can be verified by performing a Lorentz transformation).
This reminds us of the 4-dimensional Minkowski space in mathematics: the
line element in Minkowski space $(\mathbb{R}^4, \eta_{ab})$ can be expressed in terms of a Lorentzian
coordinate system $\{x^0, x^1, x^2, x^3\}$ as

$$ds^2 = -(dx^0)^2 + (dx^1)^2 + (dx^2)^2 + (dx^3)^2 . \tag{6.1.1$'$}$$

This equation has the same form as (6.1.1), and it preserves this form when trans-
formed from one Lorentzian system to another Lorentzian system. Hence, one can
see that an infinitesimal interval in physics corresponds to a Minkowski line element
in mathematics, an inertial coordinate system in physics corresponds to a Lorentzian
coordinate system in mathematics, and the background spacetime of special relativ-
ity corresponds to the Minkowski space (R4 , ηab ). (Thus, a Minkowski space is also
called a Minkowski spacetime. We may regard Minkowski space as an expression
leaning towards the mathematics side, and Minkowski spacetime as leaning towards
the physics side). Even further, by changing “corresponds to” to “is identical to”, one
can say that the background spacetime of special relativity is Minkowski spacetime.
That is, special relativity is the study regarding the evolution of physical objects in
Minkowski spacetime. Any physical phenomenon happening in Minkowski space-
time belongs to the scope of special relativity.
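The invariance of the interval under a change of inertial frame can be illustrated numerically. The sketch below assumes the standard form of a Lorentz boost along the x-axis with relative velocity v (in units with c = 1), namely t′ = γ(t − vx), x′ = γ(x − vt), y′ = y, z′ = z; this explicit form and the function names are assumptions made for the illustration and are not quoted from the passage above.

```python
import math


def interval(dt, dx, dy, dz):
    """Interval ds^2 = -dt^2 + dx^2 + dy^2 + dz^2 of (6.1.1), with c = 1."""
    return -dt ** 2 + dx ** 2 + dy ** 2 + dz ** 2


def boost_x(v, dt, dx, dy, dz):
    """Coordinate differences in a frame moving with velocity v (|v| < 1) along x."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return gamma * (dt - v * dx), gamma * (dx - v * dt), dy, dz


if __name__ == "__main__":
    separation = (1.0, 0.3, -0.2, 0.5)
    for v in (0.0, 0.6, -0.9):
        print(v, interval(*separation), interval(*boost_x(v, *separation)))
```

The two printed values of the interval should agree for each v up to rounding error, which is the form invariance stated in the text.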
Fig. 6.2 The world line of a particle and a line segment

Using an inertial coordinate system, one can define the speed of any particle.
Suppose L is the world line of a particle, p and q are two neighboring points on L
(see Fig. 6.2), and $(t_1, x_1, y_1, z_1)$ and $(t_2, x_2, y_2, z_2)$ are the coordinates of p and q
in an inertial frame R. Let

$$dt \equiv t_2 - t_1 , \quad dx \equiv x_2 - x_1 , \quad dy \equiv y_2 - y_1 , \quad dz \equiv z_2 - z_1 ,$$

then the speed of the particle at p relative to the frame R is defined as

$$u := \frac{\sqrt{dx^2 + dy^2 + dz^2}}{dt} . \tag{6.1.2}$$

Hence, it follows from (6.1.1) that the line element of L between p and q is

$$ds^2 = -(1 - u^2)\, dt^2 . \tag{6.1.3}$$

The equation above indicates that u = 1 is equivalent to $ds^2 = 0$ (line element being
null); u < 1 is equivalent to $ds^2 < 0$ (line element being timelike). Therefore, the
two significant basic tenets of special relativity expressed in the 3 + 1-dimensional
formulation—① the speed of a photon relative to any inertial frame is u = 1; ② the
speed of a point mass relative to any inertial frame is u < 1—can now be reformulated
in terms of the 4-dimensional language as follows:
① The world line of a photon is a null curve in Minkowski spacetime;
② The world line of a point mass is a timelike curve in Minkowski spacetime.
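The correspondence between the speed u and the character of the line element can be made concrete with a few lines of code. The following Python sketch is illustrative only; the function names and the tolerance parameter are hypothetical. It evaluates (6.1.3) and classifies the result as timelike, null or spacelike.

```python
def interval_from_speed(u, dt):
    """Line element ds^2 = -(1 - u^2) dt^2 of (6.1.3), in units with c = 1."""
    return -(1.0 - u * u) * dt * dt


def classify(ds2, tol=1e-12):
    """Classify a line element by the sign of ds^2 (signature -, +, +, +)."""
    if abs(ds2) <= tol:
        return "null"
    return "timelike" if ds2 < 0 else "spacelike"


if __name__ == "__main__":
    for u in (0.0, 0.5, 1.0, 1.5):
        print(u, classify(interval_from_speed(u, dt=0.5)))
```

A speed u > 1 in this sense gives a spacelike line element, which is the case that relativity forbids for a point mass.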
In the 3 + 1-formulation one always needs a reference frame, and the basic
tenets above also require a definition of the speed u. Both “a reference frame”
and “a definition of speed” depend on one’s own choice, and therefore belong to
“human factors”. The use of these “human factors” not only makes the 3 + 1-formulation
less concise and self-contained than the 4-dimensional formulation, but sometimes may
also lead to misunderstanding. For instance, if we adopt a different definition of the
speed (there are several that, in some sense, are qualified to be called “speed”),
then a “faster-than-light” particle would not contradict the 4-dimensional tenets ① and ②
above (the world line of a “faster-than-light” point mass can still be a timelike curve).
An important example of “faster-than-light” speed that does not violate
relativity is the recessional velocity of a galaxy, which will be discussed in detail in
Sect. 10.2.1. However, someone who has only heard that “relativity does not allow travel at
a speed faster than light” might naively think that this kind of seemingly
“faster-than-light” travel is forbidden by relativity. Nevertheless, a point mass whose
world line is a spacelike curve is certainly forbidden by relativity. Therefore, we can
see that the 4-dimensional geometric formulation naturally clarifies what is and is
not a violation of special relativity.

6.1.3 Inertial Observers and Inertial Frames

The fundamental postulates of special relativity are: the principle of invariant light
speed; and the special principle of relativity. The latter further contains the following
two aspects.
① Among all observers (i.e., point masses), there exists a special kind of observer,
called inertial observers, which are essentially distinguished from all the other
observers (non-inertial observers); that is, one can choose a special subset from the
collection of all the observers, in which each element is an inertial observer.

② All inertial observers are on an equal footing, i.e., no inertial observer is pre-
ferred over any other; that is, one cannot choose a special element (or several) from
the subset formed by inertial observers. For example, one cannot ask which inertial
observer is at absolute rest.
Now we discuss the mathematical counterpart for an inertial observer. According
to the 3-dimensional formulation of special relativity, the speed of an inertial observer
relative to its own inertial coordinate system {t, x, y, z} is u = 0, and thus its world
line coincides with a t-coordinate line in this system. Suppose $\partial_a$ is the ordinary
derivative operator of this system, then $\partial_a(\partial/\partial t)^b = 0$. Hence,

$$\left(\frac{\partial}{\partial t}\right)^a \partial_a \left(\frac{\partial}{\partial t}\right)^b = 0 . \tag{6.1.4}$$

Noting that an inertial coordinate system is a Lorentzian coordinate system, we see
that $\partial_a$ is the derivative operator associated with the Minkowski metric $\eta_{ab}$ (satisfying
∂a ηbc = 0). Thus, (6.1.4) is a geodesic equation of Minkowski space, and hence the
world line of any inertial observer is a timelike geodesic. On the other hand, it can also
be proved that for any given timelike geodesic G one can always find a Lorentzian
coordinate system such that G is a t-coordinate line, and thus G represents an inertial
observer. Therefore, an inertial observer in physics corresponds to a timelike geodesic
in mathematics, or one can say that the world line of an inertial observer is a timelike
geodesic. From the mathematical perspective, timelike geodesics are the most natural
and simplest type of timelike curves; from the physical perspective, inertial observers
are the most natural and simplest type of observers. This is an elegant correspondence
between inertial observers and timelike geodesics.
Since each t-coordinate line in a Lorentzian coordinate system corresponds to
an inertial observer, the reference frame formed by all the t-coordinate lines in
this system is called an inertial reference frame, and this coordinate system is
called an inertial coordinate system in this inertial reference frame. When it is not
necessary to distinguish a reference frame from a coordinate system, both an inertial
reference frame and an inertial coordinate system are called an inertial frame for
short. The domain of an inertial frame is the whole spacetime (the whole $\mathbb{R}^4$), and
thus it is also called a global inertial frame. The world lines of all the observers in the
same inertial frame are parallel geodesics; in contrast, if two inertial observers belong
to two different inertial frames (such as the “train frame” and the “ground frame”
we mentioned above), then their world lines are geodesics that are not parallel to
each other. A point mass is said to be “free” if its world line is a geodesic, i.e., it is
undergoing inertial motion.
According to Theorem 4.3.6, a coordinate transformation between two Lorentzian
systems in the 4-dimensional Minkowski spacetime $(\mathbb{R}^4, \eta_{ab})$ corresponds to an
isometry of $(\mathbb{R}^4, \eta_{ab})$. Any isometry can be constructed from several basic
isometries, which include “continuous” ones and “discrete” ones.
The “discrete” ones are reflections and inversions, while the “continuous” ones
include three types [see Sect. 4.3, Example 1(4)]: (a) translations, represented by
4 independent Killing vector fields: $(\partial/\partial t)^a$, $(\partial/\partial x)^a$, $(\partial/\partial y)^a$, $(\partial/\partial z)^a$;
(b) spatial rotations, represented by 3 independent Killing vector fields:
$-y(\partial/\partial x)^a + x(\partial/\partial y)^a$, $-z(\partial/\partial y)^a + y(\partial/\partial z)^a$, $-x(\partial/\partial z)^a + z(\partial/\partial x)^a$;
and (c) boosts, represented by 3 independent Killing vector fields:
$t(\partial/\partial x)^a + x(\partial/\partial t)^a$, $t(\partial/\partial y)^a + y(\partial/\partial t)^a$, $t(\partial/\partial z)^a + z(\partial/\partial t)^a$.
Now we interpret the physical meaning of these
three types of transformations by providing an example for each of them.
(a) Without loss of generality, consider time translation. In this case, the coordinate
transformation induced by the one-parameter group of isometries corresponding to
the Killing field $(\partial/\partial t)^a$ is

$$t' = t + a , \quad x' = x , \quad y' = y , \quad z' = z ,$$

where a serves as the parameter for this one-parameter group. Physically, this
transformation corresponds to adding a value a to the initial setting of the standard clocks
of all the observers in the inertial frame R. Daylight saving time is an example of it,
where a = 1 (hour).
(b) Consider a rotation in the xy-surface. The coordinate transformation induced
by the one-parameter group of isometries corresponding to the Killing field
$-y(\partial/\partial x)^a + x(\partial/\partial y)^a$ is

$$t' = t , \quad x' = x \cos\alpha - y \sin\alpha , \quad y' = x \sin\alpha + y \cos\alpha , \quad z' = z ,$$

where α is a constant that serves as the parameter. Physically, this corresponds to a
spatial coordinate rotation inside the inertial reference frame.
(c) Consider a boost in the tx-surface. The coordinate transformation induced by
the one-parameter group of isometries corresponding to the Killing field
$t(\partial/\partial x)^a + x(\partial/\partial t)^a$ is (see Theorem 4.3.5)

$$t' = \gamma (t - vx) , \quad x' = \gamma (x - vt) , \quad y' = y , \quad z' = z , \qquad (6.1.5)$$

where v is a constant that serves as the parameter, and $\gamma \equiv (1 - v^2)^{-1/2}$. Physically,
this corresponds to the Lorentz transformation between two reference frames R
and R′. The coordinate axes of these two reference frames are parallel and oriented
so that the frame R′ is moving relative to R in the positive (or negative) x-direction with
a constant speed |v|, and the origins of their spatial coordinates coincide at t = t′ = 0.
Both translations and spatial rotations correspond to coordinate transformations in
the same inertial reference frame. For example, a time translation is just a resetting
of the zero of the standard clock of each observer in the same inertial
reference frame; neither the observers nor the reference frame are changed. For
another example, after a spatial rotation of {t, x, y, z}, the new coordinate system
{t′ = t, x′, y′, z′} will still be an inertial coordinate system in this reference frame.
Thus, there exist many inertial coordinate systems in the same inertial reference
frame. Two inertial coordinate systems related by a boost, however, must belong to
two different inertial reference frames since their t-coordinate lines are different.
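
The claim that translations, rotations and boosts are isometries can be checked numerically. The following is a minimal sketch, not part of the original text: it applies one example of each type to two arbitrarily chosen events and confirms that the Minkowski interval between them is unchanged. The function names and the sample events are illustrative assumptions; units with c = 1 are used as in the text.

```python
import math

def interval(e1, e2):
    """Minkowski interval -dt^2 + dx^2 + dy^2 + dz^2 between two events (c = 1)."""
    dt, dx, dy, dz = (b - a for a, b in zip(e1, e2))
    return -dt**2 + dx**2 + dy**2 + dz**2

def time_translation(e, a):
    """Example (a): t' = t + a, spatial coordinates unchanged."""
    t, x, y, z = e
    return (t + a, x, y, z)

def rotation_xy(e, alpha):
    """Example (b): rotation by angle alpha in the xy-surface."""
    t, x, y, z = e
    return (t,
            x * math.cos(alpha) - y * math.sin(alpha),
            x * math.sin(alpha) + y * math.cos(alpha),
            z)

def boost_tx(e, v):
    """Example (c): the boost (6.1.5) with parameter v, |v| < 1."""
    t, x, y, z = e
    gamma = 1.0 / math.sqrt(1.0 - v**2)
    return (gamma * (t - v * x), gamma * (x - v * t), y, z)

# Two arbitrary sample events (hypothetical values, chosen only for illustration).
p = (0.3, 1.0, -2.0, 0.5)
q = (2.1, -0.4, 1.5, 3.0)
s2 = interval(p, q)

transforms = {
    "time translation": lambda e: time_translation(e, 1.0),
    "rotation": lambda e: rotation_xy(e, 0.7),
    "boost": lambda e: boost_tx(e, 0.6),
}
for name, f in transforms.items():
    preserved = math.isclose(interval(f(p), f(q)), s2, rel_tol=1e-9)
    print(name, "preserves the interval:", preserved)
```

A check on two events is of course not a proof; it only illustrates what "isometry" means for the three coordinate transformations above.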
6.1.4 Proper Time and Coordinate Time

The proper time of an observer (a point mass) is the reading of his standard clock.
However, what exactly is a standard clock? We will need to add the following defi-
nition:

Definition 1 A clock is called a standard clock or ideal clock if the difference
between the two readings $\tau_1$ and $\tau_2$ at two arbitrary points $p_1$ and $p_2$ on its world
line equals the arc length of its world line between $p_1$ and $p_2$, i.e.,

$$\tau_2 - \tau_1 = \int_{p_1}^{p_2} \sqrt{-ds^2} . \qquad (6.1.6)$$

Remark 1 If we do not take c = 1, then the right-hand side of the above equation
should be multiplied by 1/c.

Remark 2 One should distinguish two concepts related to clocks: rate and (initial)
setting. A standard clock only has a requirement on its rate (i.e., the difference of
the readings at any two points on the world line equals the arc length), while the
synchronization problem in a reference frame only involves the initial (zero) setting.
Many regions in the world use daylight saving time, which requires clocks to be set
“one hour faster” on a certain date each year. The word “faster” may be misunderstood
as raising the rate, but it actually just means changing the setting.

Remark 3 According to Definition 1, the proper time of an observer is equal to the arc
length of its world line. The zero of τ on the world line only depends on the setting,
which is arbitrary when there is only one observer (or a few observers). However,
if we consider a reference frame, then the zero of the proper time of each observer
needs to satisfy a certain kind of requirement. For instance, suppose R is an inertial
reference frame, and G is one of its observers. Let any $p_0 \in G$ be the zero of
the proper time of G and let $\Sigma_0$ represent the hypersurface passing through $p_0$ that is
orthogonal to the world lines of all the observers; then any observer G′ in R must
choose the intersection of $\Sigma_0$ and their world line as the zero of their proper time.
This kind of requirement is called clock synchronization in an inertial frame. At
first sight, it seems that this could be realized as follows: Alice (observer G) sets her
clock to zero, denoted by event $p_0$, and simultaneously tells Bob (observer G′) “set
your clock to zero right now.” However, since it takes time for a signal to propagate,
if we use q to represent the event of Bob receiving the notice, then q cannot be on the
hypersurface $\Sigma_0$. If Bob follows the order, i.e., sets his clock to zero at event q, then
it certainly cannot satisfy the requirement of clock synchronization. Thus, we can
see that clock synchronization is a nontrivial process in relativity. Here we introduce
a method of synchronization. First, Alice should tell Bob beforehand, “take a mirror
with you and zero your clock when you see the light signal I send.” At a point (event)
$p_1$, Alice sends a light signal to Bob; the light is reflected when it arrives
at Bob’s mirror (event p′), and Alice sees this reflected light when she is at point
$p_2$ (see Fig. 6.3). To synchronize her clock with that of Bob, Alice just needs to zero
her clock at $p_0$, namely the midpoint of $p_1 p_2$ (measured by the arc length). Note that
in this method we have used the fact that the speed of light does not depend on the
direction (the path of a photon is a null geodesic).

Fig. 6.3 Method of clock synchronization
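
The synchronization method of Remark 3 can be illustrated with a few lines of arithmetic. The sketch below is an illustration under stated assumptions, not part of the original text: Alice is at rest at x = 0 and Bob at rest at x = distance in the same inertial frame, light travels at speed 1, and the function name `radar_sync` and the sample numbers are hypothetical.

```python
def radar_sync(t_emit, distance):
    """Coordinate times of the events in the radar method (c = 1).

    Alice (at x = 0) emits light at p1 (time t_emit); Bob (at x = distance)
    reflects it at p'; Alice receives the reflection at p2.
    Returns (t at p', t at p2, t at the midpoint p0 of p1 p2).
    """
    t_reflect = t_emit + distance        # outgoing light ray, speed 1
    t_return = t_reflect + distance      # returning light ray, speed 1
    t_mid = 0.5 * (t_emit + t_return)    # midpoint p0 on Alice's world line
    return t_reflect, t_return, t_mid

t_reflect, t_return, t_mid = radar_sync(t_emit=2.0, distance=3.0)

# p0 and p' have the same coordinate time, so they lie on the same surface
# orthogonal to the observers' world lines.
print("t(p') =", t_reflect, " t(p0) =", t_mid)

# For comparison: a "zero your clock now" message sent at p0 reaches Bob later.
t_naive = t_mid + 3.0
print("naive scheme zeroes Bob's clock at t =", t_naive)
```

In this setup the midpoint event on Alice's world line and the reflection event on Bob's world line share the same inertial coordinate time, which is the synchronization requirement stated in the remark.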

Remark 4 A standard clock is also a model. What kind of real clock can be regarded
as a standard clock? Experiments show that, in most cases, atomic clocks can be
treated as standard clocks to a high degree of accuracy, and even the clocks in our
daily life provide a good approximation. However, any real clock will deviate sub-
stantially from a standard clock in certain special cases [see Misner et al. (1973)
pp. 393–395; Rindler (1982) p. 31]. For example, a pendulum clock, the mechanism
of which highly depends on the gravitational acceleration of Earth, will become
completely useless in a spaceship far away from the Earth. Nonetheless, this only
affects the choice of a clock in an experiment and is thus irrelevant to our theoretical
discussions. In theory, all we need is the concept of a standard clock.

Remark 5 Later, when we talk about a world line, we always assume that we are using
the proper time τ as the parameter. Since the proper time is equal to the arc length of
the world line, the length of its tangent vector $(\partial/\partial\tau)^a$ is 1 (see the paragraph before
Definition 7 in Sect. 2.5). Thus, one should interpret an observer as a timelike curve
with a unit tangent vector field.

Remark 6 Photons do not have the notion of proper time (the length of a null curve
vanishes), and therefore cannot serve as observers.

Suppose $x^0$ is the timelike coordinate of a coordinate system [i.e.,
$\eta_{ab} (\partial/\partial x^0)^a (\partial/\partial x^0)^b < 0$], and $x^1$, $x^2$, $x^3$ are spacelike coordinates [i.e.,
$\eta_{ab} (\partial/\partial x^i)^a (\partial/\partial x^i)^b > 0$, i = 1, 2, 3]; then the value of $x^0$ at any point p in the
coordinate patch is called the coordinate time of the event p in this system. The coordinate
time for an inertial reference frame is called an inertial coordinate time, whose domain is
the whole $\mathbb{R}^4$. One should pay close attention to the following two differences between the
coordinate time and the proper time:
① Proper time only makes sense in relation to the points on a world line, and
so without a world line one cannot talk about proper time. If two world lines $L_1$ and
$L_2$ intersect at p, then p’s proper time on $L_1$ can be different from its proper time on
$L_2$. In contrast, coordinate time does not depend on a world line. As long as p is a
point in the coordinate patch, we can talk unambiguously about its coordinate time
in this system.
② The same spacetime point p can have different coordinate times in different
coordinate systems, while the proper time of an observer at p is independent of the
coordinate system.
The following proposition provides the relation between the proper time and
inertial coordinate time on a timelike curve.
Proposition 6.1.1 Suppose L(τ) is the world line of a point mass, τ is the proper
time, and t is the coordinate time of an inertial frame R; then

$$\frac{dt}{d\tau} = \gamma_u , \qquad (6.1.7)$$

where $\gamma_u \equiv (1 - u^2)^{-1/2}$, with u being the speed of the point mass relative to R.

Proof Again, we use Fig. 6.2. It follows from $d\tau = \sqrt{-ds^2}$ and (6.1.3) that
$d\tau^2 = (1 - u^2)\,dt^2$, and hence we have (6.1.7). □

If L(τ ) is a t-coordinate line in the inertial frame R, then from (6.1.2) we see that
u = 0, and hence it follows from (6.1.7) that dt = dτ . Thus, the coordinate time for
an inertial observer in their inertial frame is equal to their proper time.
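
Relation (6.1.7) says that the proper time along a world line is obtained by integrating dτ = (1 − u²)^{1/2} dt. As a rough numerical sketch (an illustration, not the book's method), the function below integrates this with the midpoint rule for a speed profile u(t) supplied by the caller; the name `proper_time`, the step count and the sample speeds are assumptions made for the example.

```python
import math

def proper_time(speed, t_start, t_end, steps=10_000):
    """Integrate d(tau) = sqrt(1 - u(t)^2) dt with the midpoint rule (c = 1).

    `speed` is a function t -> u(t) giving the speed relative to the inertial
    frame whose coordinate time is t; |u| < 1 is assumed.
    """
    h = (t_end - t_start) / steps
    total = 0.0
    for k in range(steps):
        t_mid = t_start + (k + 0.5) * h
        u = speed(t_mid)
        total += math.sqrt(1.0 - u * u) * h
    return total

# Uniform motion with u = 0.6: gamma_u = 1.25, so tau = t / gamma_u = 0.8 t.
tau_uniform = proper_time(lambda t: 0.6, 0.0, 10.0)

# An observer at rest in the frame (u = 0): proper time equals coordinate time.
tau_rest = proper_time(lambda t: 0.0, 0.0, 10.0)

print("u = 0.6:", tau_uniform, "  u = 0:", tau_rest)
```

The second case reproduces the statement just made in the text: for an inertial observer at rest in the frame, dt = dτ.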

6.1.5 Spacetime Diagrams

Diagrams are used frequently in the study of motion. The diagrams people usually use
to show spatial trajectories are spatial diagrams. For example, the spatial trajectory
for a projectile is a parabola. This kind of diagram does not have time involved, and
cannot reflect at which point the object is located at a certain moment of time. A
spacetime diagram, however, can overcome this drawback. It uses points to represent
events, and curves to represent the motion (evolution) of a particle in spacetime,
etc. If we only consider 1-dimensional motion, then we only need to draw a 2-
dimensional spacetime diagram. When drawing the diagram, one should choose an
arbitrary inertial frame R, and then draw a vertical axis pointing upward as its t-axis
(this axis represents the flow of time), and a horizontal axis as its x-axis (see Fig. 6.4).
All kinds of particles moving along the x-axis can be represented by the curves in
the figure. For example, the t-axis represents the world line of the observer $G_0$ at
x = 0 in the frame R, another vertical line in the picture represents the observer $G_1$
at $x = x_1$ in the frame R (vertical indicates that the observer is at rest relative to
frame R), while the dashed line in the figure represents the world line of a photon.
For any given moment $\hat{t}$, we have a point $(\hat{t}, \hat{x})$ on the line, whose spatial coordinate
$\hat{x}$ reflects the position of the photon at $\hat{t}$. What does the tilted line $G'_0$ in Fig. 6.4 stand
for? Since it is a tilted straight line, its x-coordinate changes linearly with the t-coordinate. It
follows from (6.1.2) that its speed relative to R is a constant less than 1, and thus it
is a point mass experiencing inertial motion.

Fig. 6.4 A 2-dimensional spacetime diagram based on the reference frame R

Fig. 6.5 The x′- and t′-axes lie on two different sides of the 45° line

Actually, based on the fact that $G'_0$ is a timelike straight line (geodesic), we can
directly tell that $G'_0$ is an inertial observer. Also, from the fact that $G'_0$ passes through
the origin we can see that it is the observer at x′ = 0 in the frame R′ obtained by
the transformation (6.1.5), i.e., $G'_0$ is the t′-coordinate line (the t′-axis) of R′. This
conclusion can also be verified another way: plugging x′ = 0 into the Lorentz transformation
(6.1.5) yields t = x/v, and thus the t′-axis is a straight line passing through the origin, with
a slope 1/v. How do we draw the x′-axis of the frame R′? The x′-axis satisfies
t′ = 0, and plugging this into (6.1.5) yields t = vx, and thus the x′-axis is a straight
line passing through the origin that has a slope v; the dashed line bisects the angle
between the x′-axis and the t′-axis (see Fig. 6.5). One may ask: does this indicate
that the x′-axis and t′-axis are not orthogonal, and therefore R is preferred over
R′? This actually is the “deception of the spacetime diagram” that comes from our
Euclidean way of thinking. In fact, noticing that {t′, x′, y′, z′} is also a Lorentzian
system, naturally we have $\eta_{ab} (\partial/\partial t')^a (\partial/\partial x')^b = 0$, i.e., $(\partial/\partial t')^a$ and $(\partial/\partial x')^b$ are
orthogonal as measured by the Minkowski metric. So R and R′ are still on an equal
footing. Indeed, when drawing a picture, we usually choose a reference frame first,
and set its t-axis and x-axis to be, respectively, vertical and horizontal; however,
the choice of this reference frame is totally arbitrary. For instance, if we choose R′
first, then the spacetime diagram will look like Fig. 6.6, which seems to be different
from Fig. 6.5, but essentially they are the same.
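
The orthogonality of the tilted axes can be made concrete with a small computation. The sketch below is illustrative only: it takes the (t, x) components of the coordinate basis vectors of the primed system, which follow from inverting (6.1.5) as t = γ(t′ + vx′), x = γ(x′ + vt′), and compares their Minkowski and Euclidean inner products. The function names and the sample value v = 0.5 are assumptions for the example.

```python
import math

def primed_axis_vectors(v):
    """(t, x) components of the basis vectors d/dt' and d/dx' of the boosted system."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    e_t = (gamma, gamma * v)   # tangent to the t'-axis (slope dt/dx = 1/v)
    e_x = (gamma * v, gamma)   # tangent to the x'-axis (slope dt/dx = v)
    return e_t, e_x

def minkowski_dot(a, b):
    """Inner product with the 2-dimensional Minkowski metric, signature (-, +)."""
    return -a[0] * b[0] + a[1] * b[1]

def euclidean_dot(a, b):
    """Inner product the eye uses when looking at the diagram."""
    return a[0] * b[0] + a[1] * b[1]

e_t, e_x = primed_axis_vectors(0.5)
print("Minkowski product of the axes:", minkowski_dot(e_t, e_x))
print("Euclidean product of the axes:", euclidean_dot(e_t, e_x))
print("Minkowski norms:", minkowski_dot(e_t, e_t), minkowski_dot(e_x, e_x))
```

The Minkowski product of the two axis vectors vanishes while the Euclidean one does not, which is the "deception" described above.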
The “deception” of a spacetime diagram is manifested not only in orthogonality,
but also in the judgement of length. Suppose p = (t, x) is an arbitrary spacetime
point and op is the straight line segment between o and p (see Fig. 6.7), whose length
measured by the Minkowski metric is $l_{op} = \sqrt{|-t^2 + x^2|}$. Thus, the straight line
segment between o and each point on the hyperbola $-t^2 + x^2 = K$ (constant) has the
same length, e.g., $l_{op} = l_{oq}$, even though intuitively (i.e., according to the Euclidean
metric) their lengths are not the same in the diagram. The hyperbola in Fig. 6.7 is
called a calibration curve.

Fig. 6.6 The spacetime diagram based on frame R′, equivalent to Fig. 6.5

Fig. 6.7 The length of the line segment between o and a point on the hyperbola is a constant

Fig. 6.8 The Earth’s world sheet (with one dimension suppressed) and the satellite’s world line
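
The role of the calibration curve can be seen by sampling points on one branch of the hyperbola −t² + x² = K and comparing the two notions of length. This is a minimal sketch under the assumption K > 0, using the parametrization t = √K sinh χ, x = √K cosh χ; the parameter name `chi` and the sample values are hypothetical.

```python
import math

def minkowski_length(t, x):
    """Length of the straight segment from the origin o to (t, x), Minkowski metric."""
    return math.sqrt(abs(-t * t + x * x))

def euclidean_length(t, x):
    """Length of the same segment as it appears on paper."""
    return math.hypot(t, x)

K = 4.0
samples = []
for chi in (0.0, 0.5, 1.0, 1.5):
    t = math.sqrt(K) * math.sinh(chi)
    x = math.sqrt(K) * math.cosh(chi)
    samples.append((minkowski_length(t, x), euclidean_length(t, x)))

for l_mink, l_eucl in samples:
    print("Minkowski length:", round(l_mink, 6), " Euclidean length:", round(l_eucl, 6))
```

Every sampled point has the same Minkowski length √K, while the apparent (Euclidean) length grows along the hyperbola.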
If the physical phenomenon also involves the second and the third spatial dimen-
sions, then the 2-dimensional spacetime diagram will not be enough. However, even
if we draw in perspective we can only represent three dimensions on a piece of paper,
and since we need one dimension to represent time, there are only two dimensions
left; therefore, one spatial dimension cannot be represented on paper (and has to be
“suppressed” in the diagram).
Luckily, in many cases, there will be one dimension (or even two) that is not
important, or there exists some kind of symmetry that allows us to suppress one
dimension without losing anything useful. Take an artificial satellite orbiting
the Earth as an example (see Fig. 6.8). The surface of the Earth is a 2-dimensional
sphere; however, one dimension is suppressed when we draw the diagram, so at
each moment the surface of the Earth is represented by a circle (C in the figure).
Approximately, the Earth can be considered as undergoing inertial motion, which we
can exploit to draw a diagram. The world line of each point on the ground (such as
Beijing or New York) is a vertical line, all of which together form a cylinder, called
the world sheet of the Earth’s surface. The helix in the figure represents the world
line of the satellite, the slope of which reflects its orbital speed.

Fig. 6.9 A surface of simultaneity of an inertial frame R
Suppose R is an inertial reference frame in Minkowski spacetime; then each point
on a 3-dimensional plane (hyperplane) $\Sigma_t$ that is orthogonal to the world lines of all the
observers has the same t coordinate, and thus $\Sigma_t$ is called a surface of simultaneity of the
frame R (see Fig. 6.9), which represents the “whole space” for R at t. Suppose C is a curve
on $\Sigma_t$; then dt vanishes along every element of C, and hence the Minkowski line
element $ds^2 = -dt^2 + dx^2 + dy^2 + dz^2$ induces a spatial line element
$d\hat{s}^2 = dx^2 + dy^2 + dz^2$, namely the Euclidean line element. Thus, the space at any time
for an inertial frame R is a 3-dimensional Euclidean space; this is exactly what
is assumed in the 3-dimensional formulation of special relativity. If we change to
another inertial frame R′, since the world lines of its observers are not parallel
to the world lines of the observers in R, the surfaces of simultaneity of R′ are
certainly different from the surfaces of simultaneity of R. This can be regarded as
the cause of the relativity of simultaneity.
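
The relativity of simultaneity can be illustrated with two events on the same surface of simultaneity of R. The sketch below is an illustration only: it applies the boost (6.1.5), restricted to the (t, x) coordinates, and prints the time coordinates assigned by R′. The function name `boost` and the sample events are assumptions.

```python
import math

def boost(event, v):
    """Apply the boost (6.1.5) to an event (t, x), with c = 1; returns (t', x')."""
    t, x = event
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return gamma * (t - v * x), gamma * (x - v * t)

# Two events that are simultaneous in R (both at t = 0) but at different places.
p = (0.0, 0.0)
q = (0.0, 4.0)
v = 0.6

t_p, _ = boost(p, v)
t_q, _ = boost(q, v)
print("t'(p) =", t_p, "  t'(q) =", t_q)
```

For v > 0 the event with the larger x receives the earlier t′, so p and q do not lie on a common surface of simultaneity of R′.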

6.1.6 Spacetime Structure: Special Relativity Versus Pre-Relativity Physics

In pre-relativity physics, space and time are the most primary concepts that everyone
knows. From the historical perspective, the concept of space and time came first, and
then, after the birth of relativity, the concept of spacetime was gradually developed.
Many people would think spacetime is not difficult to understand since “it is nothing
but space and time”. However, in relativity spacetime itself is the most primary con-
cept, while space and time are relative notions derived from spacetime. By “derived”
we mean the notions of space and time only come from applying a “3 + 1” decom-
position to spacetime using a reference frame, and by “relative” we mean there exist
many different ways of 3 + 1 decomposition for a spacetime (Fig. 6.5 represents two
different ways of decomposition for Minkowski spacetime using the reference frames
R and R′). From the viewpoint of 4-dimensional geometry, the difference between
the concepts of space and time in relativity and pre-relativity physics comes from the
difference between their spacetime structures. Pre-relativity physics assumes that
the spacetime manifold is $\mathbb{R}^4$, equipped with some intrinsic additional structures.
The first one is a smooth function $t : \mathbb{R}^4 \to \mathbb{R}$, called the absolute time, such that
$\mathbb{R}^4$ can be foliated into infinitely many slices. Each slice is a constant-t surface $\Sigma_t$
(a hypersurface in $\mathbb{R}^4$, see Fig. 6.10), called a surface of absolute simultaneity,
with a 3-dimensional Euclidean metric, which represents the “whole 3-dimensional
space” at t (see Optional Reading 6.1.1 for details). All the points on the same $\Sigma_t$
represent events happening simultaneously at different places, and points on
different $\Sigma_t$ represent events happening at different times. The so-called “absolute
simultaneity” means that whether two events are simultaneous does not depend on the
reference frame, which is clearly different from the situation in relativity. In special
relativity, two events that are simultaneous in
one reference frame can be non-simultaneous in another reference frame. There are
only surfaces of relative simultaneity in special relativity. If we compare surfaces of
simultaneity to playing cards, then pre-relativity physics only contains one deck of
cards (which is independent of the reference frame, and thus is absolute) while special
relativity has infinitely many decks of cards (which depend on reference frames,
and thus are relative; each individual card represents the whole space at a given time
in a given reference frame). This is a significant difference between the spacetime
structures of these two kinds of theory.

Fig. 6.10 Surfaces of absolute simultaneity in pre-relativity physics
Now we discuss the difference between these two kinds of spacetime structure
from the perspective of causality. Given an event $p \in \mathbb{R}^4$, one can always write
$\mathbb{R}^4 - \{p\}$ as the union of three nonintersecting subsets $M_1$, $M_2$ and $M_3$, i.e.,
$\mathbb{R}^4 - \{p\} = M_1 \cup M_2 \cup M_3$, where

$$M_1 \equiv \{ q \in \mathbb{R}^4 - \{p\} \mid \text{there exists an observer that experiences } q \text{ before } p \} ,$$
$$M_2 \equiv \{ q \in \mathbb{R}^4 - \{p\} \mid \text{there exists an observer that experiences } p \text{ before } q \} ,$$
$$M_3 \equiv \{ q \in \mathbb{R}^4 - \{p\} \mid \text{there is no observer that experiences both } p \text{ and } q \} .$$

Pre-relativity physics assumes that the subset $M_3$ is the surface $\Sigma_t$ of absolute
simultaneity passing through p (with p removed), while $M_2$ and $M_1$ are respectively
the “upper half of $\mathbb{R}^4$” and the “lower half of $\mathbb{R}^4$” on the two sides of $\Sigma_t$
(see Fig. 6.11). Their physical meanings are: if $q \in M_2$, then we say that the event
q happens in the future of p; if $q \in M_1$, then we say that q happens in the past of p.
However, in special relativity, since the world lines of observers are timelike curves,
$M_2$ and $M_1$ are respectively the subsets enclosed by the future light cone surface (a
null hypersurface) of p and the past light cone surface of p (excluding the points
on the surface), while $M_3$ is a lot “bigger” than the 3-dimensional submanifold
$\Sigma_t$ in Fig. 6.11: it contains all the points that are not contained in $M_1$ or $M_2$
(including the points on the light cone surfaces); see Fig. 6.12.

Fig. 6.11 The spacetime structure of pre-relativity physics. The surface of absolute simultaneity
passing through p is a 3-dimensional surface, above and below which are the future and the past of
p

Fig. 6.12 The spacetime structure of special relativity. There is no surface of absolute simultaneity.
The future and past of p are much smaller than the corresponding subsets in Fig. 6.11, while the
subset $M_3$ that has no causal relation with p is much larger than the $M_3$ in Fig. 6.11

Fig. 6.13 The timelike vector $T^a$ is future-directed, while $T'^a$ is past-directed

Suppose $q \in M_2$; then the tangent vector $T^a$ of the geodesic from p to q at p
must be timelike. Similarly, if $q' \in M_1$, then the tangent vector $T'^a$ of the geodesic
from p to q′ at p must also be timelike. However, physically $T^a$ and $T'^a$ are quite
different after all: $T^a$ is future-directed while $T'^a$ is past-directed (see Fig. 6.13).
In relativity, $T^a$ and $T'^a$ are called a future-directed timelike vector and a
past-directed timelike vector, respectively. A timelike vector at p is either future-directed
or past-directed. The (nonvanishing) future-directed and past-directed null vectors
can be defined similarly.
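
The two decompositions of $\mathbb{R}^4 - \{p\}$ can be compared directly in code. The following sketch is only an illustration of the definitions above, written in inertial (respectively Galilean) coordinates with c = 1 and assuming q ≠ p; the function names and the sample events are hypothetical.

```python
def classify_special_relativity(p, q):
    """Return 'M1', 'M2' or 'M3' for q relative to p in Minkowski spacetime (q != p)."""
    dt, dx, dy, dz = (b - a for a, b in zip(p, q))
    s2 = -dt**2 + dx**2 + dy**2 + dz**2
    if s2 < 0:                      # timelike separation: some observer passes through both
        return "M2" if dt > 0 else "M1"
    return "M3"                     # spacelike or null separation (light cone surface included)

def classify_pre_relativity(p, q):
    """Return 'M1', 'M2' or 'M3' for q relative to p using the absolute time (q != p)."""
    dt = q[0] - p[0]
    if dt > 0:
        return "M2"
    if dt < 0:
        return "M1"
    return "M3"                     # same surface of absolute simultaneity

p = (0.0, 0.0, 0.0, 0.0)
events = {
    "inside future cone": (2.0, 1.0, 0.0, 0.0),
    "inside past cone": (-2.0, 1.0, 0.0, 0.0),
    "on the light cone": (1.0, 1.0, 0.0, 0.0),
    "later but spacelike": (1.0, 2.0, 0.0, 0.0),
}
for name, q in events.items():
    print(name, classify_special_relativity(p, q), classify_pre_relativity(p, q))
```

The last sample event shows the difference between the two structures: it belongs to the future half in the pre-relativity picture but to $M_3$ in special relativity.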
[Optional Reading 6.1.1]
It is instructive to compare the spacetime structures of special relativity and general
relativity with that of pre-relativity physics. According to general relativity, gravity in essence
is the warping of 4-dimensional spacetime (see Sect. 7.1). Special relativity is about the
physics when gravity is not present (or can be ignored), and thus the background spacetime
is $(\mathbb{R}^4, \eta_{ab})$. General relativity is about the physics when there is gravity, the background
spacetime of which is an arbitrary (connected) 4-dimensional manifold M together with a
curved metric field $g_{ab}$, i.e., $(M, g_{ab})$. The background spacetime of pre-relativity physics
can be revisited by formulating Newton’s theory of gravity in 4-dimensional geometric
language. In Newton’s theory of gravity, the gravitational field in space is
described by the gravitational potential φ, which is related to the mass density μ by
the Poisson equation

$$\nabla^2 \phi = 4\pi\mu . \qquad (6.1.8)$$

[We have set the gravitational constant G = 1, as we adopt the geometrized unit system.] A
point mass that is not subjected to any forces other than gravity is called a free point mass.
A free point mass with unit mass obeys the following equation of motion:

$$\frac{d^2 x^i}{dt^2} = -\frac{\partial \phi}{\partial x^i} , \quad i = 1, 2, 3 , \qquad (6.1.9)$$

where t is the Newtonian absolute time, and $x^i$ are the spatial Galilean coordinates (i.e., the
Cartesian coordinates in mathematics). After the initial conditions are given, the solutions
$x^i(t)$ of (6.1.9) can be viewed as the parametric representation of a curve in space with t
as the parameter, which represents the spatial trajectory of the point mass. For instance, the
trajectory of a point mass projectile near the ground is a parabola. Cartan et al. reformulated
the facts above using geometric language; the key points are as follows [also see Misner
et al. (1973), Chap. 12]:
The background spacetime of Newton’s theory of gravity is called Newtonian spacetime,
which is formed by a manifold $\mathbb{R}^4$ and the following additional structures: (a) there exists
a smooth function $t : \mathbb{R}^4 \to \mathbb{R}$, called the absolute time, satisfying certain conditions; (b)
there exists a derivative operator $\nabla_a$ on $\mathbb{R}^4$, whose Christoffel symbols in a given coordinate
system $\{x^\mu\}$ satisfy

$$\Gamma^i{}_{00} = \frac{\partial f}{\partial x^i} , \quad i = 1, 2, 3 \ \ (f \text{ is a function on } \mathbb{R}^4), \qquad \text{the other } \Gamma^\mu{}_{\nu\sigma} = 0 . \qquad (6.1.10)$$

From these two points, one finds the following:
(1) The existence of the absolute time provides an absolute “stratification” of the spacetime
manifold $\mathbb{R}^4$: $\forall p \in \mathbb{R}^4$, there exists a constant-t surface $\Sigma_t$ (a hypersurface in $\mathbb{R}^4$) such
that $p \in \Sigma_t$ (see Fig. 6.10), which represents the “whole 3-dimensional space” at t, called a
surface of absolute simultaneity. Events p and q are said to be simultaneous if t(p) = t(q).
(2) Suppose γ(λ) is an arbitrary geodesic in Newtonian spacetime (λ is an affine parameter);
then its parametric representation $x^\mu(\lambda)$ in a coordinate system satisfying (6.1.10) obeys
the following equations:

$$\frac{d^2 x^\mu}{d\lambda^2} + \Gamma^\mu{}_{\nu\sigma} \frac{dx^\nu}{d\lambda} \frac{dx^\sigma}{d\lambda} = 0 , \quad \mu = 0, 1, 2, 3 . \qquad (6.1.11)$$

Let μ = 0; it follows from $\Gamma^0{}_{\nu\sigma} = 0$ (ν, σ = 0, 1, 2, 3) that $0 = d^2 x^0/d\lambda^2 = d^2 t/d\lambda^2$, and
hence

$$t = \alpha\lambda + \beta , \quad \alpha, \beta \text{ are constants.} \qquad (6.1.12)$$

This equation indicates that the absolute time t can serve as the affine parameter of any
geodesic with α ≠ 0. Now let μ = i in (6.1.11); it follows from (6.1.12) and
$\Gamma^i{}_{00} = \partial f/\partial x^i$, $\Gamma^i{}_{jk} = 0$ (i, j, k = 1, 2, 3) that

$$\frac{d^2 x^i}{dt^2} + \frac{\partial f}{\partial x^i} = 0 , \quad i = 1, 2, 3 . \qquad (6.1.9')$$

Comparing this with (6.1.9) we know that, as long as we interpret f as the gravitational
potential φ and interpret $x^i$ as the Galilean coordinates, a geodesic with the absolute
time t as its affine parameter in Newtonian spacetime corresponds to the world line of a free
point mass.
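(3) Plugging (6.1.10) into (3.4.20′) and (3.4.21), it is not difficult to find the components of
the Riemann tensor and Ricci tensor of $\nabla_a$ as follows (f has been changed to φ):

$$R_{0i0}{}^{j} = -R_{i00}{}^{j} = \frac{\partial^2 \phi}{\partial x^i \partial x^j} , \qquad \text{all the other } R_{\mu\nu\rho}{}^{\sigma} = 0 , \qquad (6.1.13)$$

$$R_{00} = \sum_{i=1}^{3} \frac{\partial}{\partial x^i} \frac{\partial \phi}{\partial x^i} = \nabla^2 \phi = 4\pi\mu , \qquad \text{all the other } R_{\mu\nu} = 0 . \qquad (6.1.14)$$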

Equation (6.1.13) indicates that Newtonian spacetime is not flat. (In comparison, in Einstein’s
theory a spacetime with gravity is not flat either.) However, the derivative operator $\hat{\nabla}_a$
induced by $\nabla_a$ on each surface $\Sigma_t$ of simultaneity is flat. [(6.1.10) indicates that $\Gamma^i{}_{jk} = 0$,
i, j, k = 1, 2, 3, and the corresponding 3-dimensional Riemann tensor vanishes.] This can
also be verified from another point of view: when the α in (6.1.12) vanishes, the geodesic
γ(λ) lies on the surface $\Sigma_\beta$ of simultaneity at t = β; from (6.1.11), $\Gamma^i{}_{jk} = 0$ and t = β
we get $d^2 x^i/d\lambda^2 = 0$, and thus

$$x^i(\lambda) = \alpha^i \lambda + \beta^i , \quad \alpha^i, \beta^i \text{ are constants.} \qquad (6.1.15)$$

This is a set of linear equations, and, as long as we interpret $x^i$ as the Cartesian coordinates
of $\Sigma_\beta$, (6.1.15) indicates that the geodesic on $\Sigma_\beta$ is a straight line. In fact, one just
needs to define the Euclidean metric on $\Sigma_\beta$ in terms of $x^i$ as follows:

$$\delta_{ab} = \delta_{ij} (dx^i)_a (dx^j)_b ,$$

then the ordinary derivative operator $\partial_a$ of the system $\{x^i\}$ will naturally satisfy $\partial_a \delta_{bc} = 0$,
and the Christoffel symbols of $\partial_a$ in $\{x^i\}$ will certainly vanish, i.e.,

$$\Gamma^i{}_{jk} = 0 , \quad i, j, k = 1, 2, 3 .$$

Thus, $\partial_a$ is exactly the $\hat{\nabla}_a$ on $\Sigma_\beta$ induced by $\nabla_a$. Therefore, each surface $\Sigma_t$ of absolute
simultaneity is a 3-dimensional Euclidean space, and the Galilean coordinates commonly
used in physics are exactly the Cartesian coordinates of this space. $\{t, x^i\}$ is also called a
4-dimensional Galilean coordinate system.
Here is a natural question to ask: can we define a metric on R4 that is associated with ∇a ?
The answer is negative: as long as gravity exists, the “metric” one finds must be degenerate
[the signature for the “metric” with upper indices is (0, +, +, +)]. This optional reading
provides an example for the physical application of a manifold without metric but with a
derivative operator (and thus with curvature defined).
[The End of Optional Reading 6.1.1]
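
The statement in the optional reading that geodesics of Newtonian spacetime reproduce the motion of free point masses can be tried out numerically. The sketch below is an illustration under explicit assumptions that are not in the text: the function f is taken to be the uniform-field potential f = g z (so that the only nonvanishing Christoffel symbol in (6.1.10) is $\Gamma^z{}_{00} = g$), one spatial dimension is kept, and the geodesic equation (6.1.11) is integrated in the affine parameter λ with a classical fourth-order Runge-Kutta step. All names and numerical values are hypothetical.

```python
import math

def geodesic_rhs(state, g):
    """Right-hand side of (6.1.11) for f = g*z, written as a first-order system.

    state = (t, z, dt/dlam, dz/dlam); Gamma^z_00 = df/dz = g, all other symbols vanish.
    """
    t, z, t_dot, z_dot = state
    return (t_dot, z_dot, 0.0, -g * t_dot * t_dot)

def rk4_step(state, h, g):
    """One classical Runge-Kutta step of size h in the affine parameter."""
    def shifted(s, k, c):
        return tuple(si + c * ki for si, ki in zip(s, k))
    k1 = geodesic_rhs(state, g)
    k2 = geodesic_rhs(shifted(state, k1, h / 2), g)
    k3 = geodesic_rhs(shifted(state, k2, h / 2), g)
    k4 = geodesic_rhs(shifted(state, k3, h), g)
    return tuple(s + h / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def integrate(state, lam_end, steps, g):
    """Integrate the geodesic from lam = 0 to lam = lam_end."""
    h = lam_end / steps
    for _ in range(steps):
        state = rk4_step(state, h, g)
    return state

g = 9.8          # assumed field strength, arbitrary units
alpha = 2.0      # dt/dlam, the constant alpha of (6.1.12), with beta = 0
v0 = 10.0        # initial dz/dt, so dz/dlam = v0 * alpha
t_end, z_end, _, _ = integrate((0.0, 0.0, alpha, v0 * alpha), lam_end=0.5, steps=100, g=g)

# Compare with the Newtonian projectile solution z = v0 t - g t^2 / 2.
print("t =", t_end, " z (geodesic) =", z_end, " z (Newton) =", v0 * t_end - 0.5 * g * t_end**2)
```

With t = αλ, the integrated geodesic agrees with the familiar parabola z(t) = v₀t − gt²/2, in line with (6.1.9′).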

6.2 Interesting Typical Effects

6.2.1 Length Contraction

In the 3-dimensional language, a point mass is a point in space, but in the
4-dimensional language it is a timelike curve (world line) in spacetime. Similarly,
a ruler in the 3-dimensional language is a line segment in space, while in the
4-dimensional language it is a 2-dimensional surface (a world sheet, see Fig. 6.14)
formed by the world lines of all points on the ruler. So now the length of a ruler,
a concept crystal clear in the 3-dimensional language, becomes vague in the
4-dimensional language: the length of which line segment is the length of the ruler?

Fig. 6.14 The world sheet of the ruler is absolute; the line segments oa and ob are
the 1-dimensional rulers observed in R and R′ at t = 0 and t′ = 0, respectively

Those who speak the 4-dimensional language know that a ruler is not a 1-dimensional
object; rather, it is 2-dimensional. This is an absolute object, which does not depend
on reference frames, coordinate systems, or observers. Why is a ruler a 1-dimensional
object in the traditional (3-dimensional) language? This is because people see things
from the perspective of their own reference frames, which are relative in the first
place. Since each surface t of simultaneity of an inertial frame R represents the
whole space at t, the intersection of t and the ruler naturally represents “the ruler
measured (seen) by an observer in the frame R at t”. For instance, the intersec-
tion oa of the surface of simultaneity at t = 0 and the world sheet of the ruler is
the ruler measured by R at t = 0. Suppose the difference of the spatial coordinates
between o and a are x, y and z, then the length of the ruler measured by R is
(x 2 + y 2 + z 2 )1/2 , i.e., the length of the line segment oa. [One can either say it
is the Euclidean arc length, as oa is a line lying on a 3-dimensional Euclidean space
(the surface of simultaneity), or one can say it is the Minkowski arc length, as oa
is a line in 4-dimensional Minkowski spacetime. The fact that t is a constant on the
surface of simultaneity assures the consistency of these two viewpoints]. However,
the simultaneity is relative, and the intersection (line segment ob) of the world sheet
of the same ruler and the surface of simultaneity of a frame R  at t  = 0 represents
“the ruler measured by an observer in the frame R  at t  = 0”; hence, the length
of the ruler should be the length of ob. Since oa and ob are two different line seg-
ments in spacetime, it is not surprising that they have different lengths. Therefore,
length contraction (also called Lorentz contraction) is not caused by elasticity or any
physical mechanism (there is no “contraction” at all), and the essential cause of it
is nothing but the fact that different surfaces of simultaneity for different reference
frames lead to different measurements of 1-dimensional rulers (which are relative),
although there is only one ruler (only one world sheet of the ruler). So it is certainly
6.2 Interesting Typical Effects 181

Fig. 6.15 Based on the simultaneity lines $t = t_b$ and $t = 0$ of R, this reference frame regards $C'$ as the clock which runs slower

not surprising that different 1-dimensional rulers have different lengths. In this way,
“length contraction” is just like the classic parable of the blind men and an elephant.$^{1}$
Since the frame R treats the ruler as at rest while $R'$ treats the ruler as moving,
the lengths of the line segments oa and ob are the rest length and “moving” length,
respectively. Now the only problem left is to compare $l_{oa}$ and $l_{ob}$. Intuitively we
have $l_{ob} > l_{oa}$, so it seems that the moving ruler is longer! However, this is the
“deception” of the spacetime diagram again. From the calibration curve passing
through a we see that $l_{ob} < l_{oa}$, and thus the moving ruler is shorter. To find the
quantitative relation between them, we just need to compute the lengths of both
line segments. The arc length of a spacetime curve is an absolute quantity and is
independent of the coordinate system. For the convenience of comparison, we choose
the same coordinate system (the inertial frame that corresponds to R) to compute
both of them. Noticing that the coordinates of o in this system are (0, 0, 0, 0), from
the expression of the Minkowski line element in this system we obtain
$l_{oa} = \sqrt{x_a^2 - 0} = x_a$, and $l_{ob} = \sqrt{x_b^2 - t_b^2}$. Also, it follows from (6.1.5) that the equation for the $x'$-axis
is $t = vx$, and hence $t_b = v x_b$. From Fig. 6.14 we can see that $x_b = x_a$, and plugging
these into the equations above yields $l_{ob} = \gamma^{-1} x_b = \gamma^{-1} x_a = \gamma^{-1} l_{oa}$. This is the
well-known quantitative relation for length contraction.
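
The computation of $l_{oa}$ and $l_{ob}$ can be checked numerically. The following is only a minimal sketch in units with c = 1; the function names (`interval_length`, `moving_ruler_length`, `gamma`) are hypothetical names chosen here for illustration and do not come from the text.

```python
import math

def gamma(v):
    """Lorentz factor for speed v, in units with c = 1."""
    return 1.0 / math.sqrt(1.0 - v * v)

def interval_length(dt, dx):
    """Minkowski length of a straight spacelike segment (c = 1)."""
    s2 = dx * dx - dt * dt
    if s2 < 0:
        raise ValueError("segment is not spacelike")
    return math.sqrt(s2)

def moving_ruler_length(rest_length, v):
    """Length l_ob of the segment ob cut out by the x'-axis, t = v x."""
    x_b = rest_length      # x_b = x_a, the rest length measured in R
    t_b = v * x_b          # b lies on the simultaneity line t = v x of R'
    return interval_length(t_b, x_b)

if __name__ == "__main__":
    for v in (0.0, 0.6, 0.8):
        print(v, moving_ruler_length(1.0, v), 1.0 / gamma(v))
```

Both columns printed for each v should agree, which is the relation $l_{ob} = \gamma^{-1} l_{oa}$ obtained above.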

6.2.2 Time Dilation

Consider two standard clocks $C_1$ and $C_2$ in an inertial frame R and a standard clock
$C'$ in another inertial frame $R'$. The world lines of these three clocks are shown in
Fig. 6.15. From the viewpoint of R, the clocks $C_1$ and $C_2$ are at rest, while $C'$ is
moving. At the beginning, $C'$ and $C_1$ coincide at event o, at which both clocks are
zeroed. After a while, $C'$ will coincide with $C_2$ at event b. From the fact that proper

1 This is the story of a group of blind men who have never come across an elephant trying to
conceptualize the elephant by touching it; however, their understandings of an elephant turn out to
be completely different, since each of them touches a different part of the elephant’s body.

Fig. 6.16 Based on the simultaneity lines $t' = t'_b$ and $t' = 0$ of $R'$, this frame regards $C_2$ as the clock which runs slower

time is equal to the arc length of the world line, we can see that the reading of the
clock $C'$ at b is equal to $l_{ob}$. Both $C_2$ and $C_1$ belong to the same frame R, and the
x-axis is a line of simultaneity of R, so since the reading of $C_1$ at o is zero, the reading
of $C_2$ at c should also be zero. Hence, the reading of $C_2$ at b equals $l_{cb} = l_{oa}$. Plotting
the calibration curve through p we see that $l_{ob} < l_{oa} = l_{cb}$, and thus the frame R
regards $C'$ (the moving clock) as running slower. However, from the viewpoint of
$R'$, the event o happens simultaneously with d (see Fig. 6.16) rather than c. Since
the reading of $C_2$ is zero at c, it must have a reading δ > 0 at d. A short time later, $C_2$
coincides with $C'$ at b. Although the reading $l_{ob}$ of $C'$ is smaller than the reading $l_{cb}$ of
$C_2$ (which is admitted by both frames), the observer in $R'$ does not admit that $C'$ runs
slower, because when $C'$ is zeroed, judged by the line of simultaneity of $R'$, $C_2$ has
already had a reading δ ($C_2$ “jumped the gun”). Hence, one should subtract δ from
the reading $l_{cb}$ of $C_2$ at b, and then compare with $l_{ob}$, i.e., the observer in $R'$ thinks
that one should compare $l_{db}$ and $l_{ob}$. From the calibration curve passing through b we
know that $l_{ob} > l_{oe} = l_{db}$, and hence $R'$ regards $C_2$ as running slower, and it is still
the moving clock that runs slower. Figure 6.17 is a 3-dimensional demonstration of
the discussion above, where (a) and (b) are the 3-dimensional perspectives of R and
$R'$, respectively. Thus, we can see again that the 3-dimensional perspective depends
on the reference frame; only spacetime diagrams and the 4-dimensional formulation
can be independent of the reference frame.

Fig. 6.17 The 3-dimensional perspectives of R and $R'$



Following the derivation of the quantitative relation between the lengths of a ruler at rest
and a moving ruler, and noticing $x_b = v t_b$, it is not difficult to find from Fig. 6.15
that the quantitative relation between the time intervals of a clock at rest and a moving clock
is $l_{ob} = \gamma^{-1} l_{oa}$.
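
The symmetry of the two viewpoints can be made concrete with numbers. The sketch below (units with c = 1) assumes that $C'$ moves with speed v in R and meets $C_2$ at the event b with R-coordinates $(t_b, v t_b)$, and that the event d on the world line of $C_2$ satisfies $t = vx$, the equation of the line $t' = 0$. The function name `clock_readings` is a hypothetical name used only for this illustration.

```python
import math

def clock_readings(t_b, v):
    """Return (l_ob, l_cb, l_db) for the three-clock comparison, c = 1.

    l_ob: reading of C' at b (proper time of C' from o to b)
    l_cb: reading of C2 at b (proper time of C2 from c to b)
    l_db: reading of C2 at b minus its reading delta at d
    """
    x_b = v * t_b                        # C' moves with speed v in R
    l_ob = math.sqrt(t_b**2 - x_b**2)    # Minkowski length of ob
    l_cb = t_b                           # C2 is at rest in R
    delta = v * x_b                      # d lies on t = v x with x = x_b
    l_db = l_cb - delta
    return l_ob, l_cb, l_db

if __name__ == "__main__":
    v = 0.6
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    l_ob, l_cb, l_db = clock_readings(10.0, v)
    print("R  compares l_ob with l_cb:", l_ob, l_cb, l_cb / gamma)
    print("R' compares l_db with l_ob:", l_db, l_ob, l_ob / gamma)
```

Under these assumptions one finds $l_{ob} = \gamma^{-1} l_{cb}$ and $l_{db} = \gamma^{-1} l_{ob}$: each frame assigns the factor $\gamma^{-1}$ to the clock that moves relative to it.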
The discussion above clearly indicates that, just like nothing is really contracting
in the phenomenon of “length contraction”, none of the clocks really runs slower
in the phenomenon of “time dilation”. (They all have the rate of a standard clock,
i.e., the difference of the readings equals the arc length). It should be emphasized
that there are a variety of methods for “clock comparison” (comparing the readings
of standard clocks in different inertial frames), which can lead to different results.
Therefore, when talking about clock comparison, one must stipulate all the details
of the method beforehand. The method we just used in the phenomenon of “time
dilation” is standard, but it is only one of the various methods for clock comparison.
A feature of this method is that it involves three clocks $C_1$, $C_2$ and $C'$, two of which
are in the same inertial frame and have been synchronized beforehand. Without $C_2$,
one can still get $l_{ob} < l_{oa}$ from Fig. 6.15; however, one cannot conclude that “the
observer who carries $C_1$ measures that $C'$ runs slower”, because there is no way
for that observer to measure it directly since the event b is not on the world line of
$C_1$. The only way for $C_1$ to observe b is to receive a light signal (or other signals)
coming from b, which leads to the problem that the propagation of light takes time.
(One can certainly apply this method, but to do so this problem needs to be taken
into account). Actually, when we draw the conclusion that “$C'$ is slower” by means
of $C_2$, we have already cleverly let the light signal play the role of a “messenger”,
since a light signal has been used when synchronizing $C_1$ and $C_2$ (see Sect. 6.1.4).
In summary, without $C_2$, one cannot compare $C_1$ and $C'$ using the method above; in
other words, this method of clock comparison will not have any physical meaning
without the third clock.
When there are only two clocks C and $C'$, there are still methods of clock comparison
that are physically meaningful. For example, as shown in Fig. 6.18, the observer
G who carries the clock C can compare the clocks by looking at the two clocks using the
left and right eyes respectively at a time a. “Looking at $C'$ using the right eye” means
that the light signal sent from $C'$ at e is received by the right eye at a (the photon
goes from e to a along a future-directed null geodesic). If both clocks are zeroed
at o, then the reading of C at a equals $l_{oa}$, and the reading of $C'$ at e equals $l_{oe}$. Since
$l_{oe} < l_{oa}$, the observer G will also conclude that “the moving clock runs slower”,
but the difference is that this method makes the moving clock appear even slower
than the method in Fig. 6.15 does. To compute how much the clock slows down,
one can draw a line through e parallel to the x-axis, which intersects the world line of
C at f (see Fig. 6.19). Let $\tau \equiv l_{oa}$, $\tau' \equiv l_{oe}$, $p \equiv l_{of}$, $q \equiv l_{fa}$; then $l_{ef} = q$. From
$p = \gamma \tau'$ (the quantitative relation for the regular time dilation) and $u = q/p$, the
geometric expression for the relative speed of the two clocks (C regards $C'$ as having moved
a distance q during a time p), we can easily find that$^{2}$

2 This is the relativistic Doppler relation. It is valid for both positive and negative u, giving
respectively red and blue shift (see Sect. 6.6.6).



Fig. 6.18 G looks at C from the left eye and $C'$ from the right eye at a, and finds that the moving clock $C'$ runs slower. The observation is a “redshift” since the clock is moving away

Fig. 6.19 The simple geometric method for finding the relation between $\tau$ and $\tau'$

Fig. 6.20 G looks at C from the left eye and $C'$ from the right eye at a, and finds that the moving clock $C'$ runs faster. The observation is a “blueshift” since the clock is moving towards the observer


$$\tau' = \sqrt{\frac{1 - u}{1 + u}}\,\tau . \tag{6.2.1}$$
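
The geometric steps leading to (6.2.1) can be retraced numerically. This is a sketch in units with c = 1; it uses $\tau = p + q$, which is read off from the definitions of p and q along the world line of C, together with $u = q/p$ and $p = \gamma \tau'$. The function names are hypothetical.

```python
import math

def observed_reading(tau, u):
    """Reading tau' of C' seen by G when C reads tau (c = 1)."""
    gamma = 1.0 / math.sqrt(1.0 - u * u)
    p = tau / (1.0 + u)    # from tau = p + q and q = u p
    return p / gamma       # from p = gamma * tau'

def doppler_factor(u):
    """The factor sqrt((1 - u)/(1 + u)) appearing in (6.2.1)."""
    return math.sqrt((1.0 - u) / (1.0 + u))

if __name__ == "__main__":
    for u in (0.6, -0.6):   # receding, then approaching
        print(u, observed_reading(1.0, u), doppler_factor(u))
```

For negative u the factor exceeds 1, in line with the remark in the footnote that the same relation covers the blueshift case.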

One can even come up with a method of clock comparison (see Fig. 6.20) that
leads to the result that “the moving clock runs faster”! Suppose C and $C'$ are both
zeroed at o, then an observer G who uses two eyes to observe C and $C'$ respectively
at a will get negative readings from both clocks. From the figure it is easy to see
that $l_{oa} < l_{oe}$, and hence the reading of $C'$ is even more negative than that of C.
Thus, G will think “the moving clock runs faster”. Though it sounds ridiculous, this
conclusion is beyond reproach: it is nothing but a result coming from the specific
method of clock comparison in Fig. 6.20. Therefore, to compare clocks we must
indicate all the details of the method beforehand, for which a spacetime diagram
would be very helpful.

Fig. 6.21 The spacetime diagram for Example 1

The examples above only involve 1-dimensional space. Here is an example in
2-dimensional space [Guo (2008) p. 235, Problem 5].
Example 1 A light source S and a receiver G are at rest in an inertial frame R. The
distance between them is l. Immerse the S–G apparatus in a homogeneous infinite
liquid medium (whose rest refractive index is n). Suppose the liquid is flowing with
a speed u relative to R along a direction that is perpendicular to the line connecting
S and G. Find the time $\Delta t$ for a light signal going from S to G (measured in R).

Solution Denote the inertial frame at rest relative to the liquid as $R'$. Draw a 3-dimensional
spacetime diagram based on the frame $R'$ (see Fig. 6.21), where events
$p_1$ and $p_2$ represent a photon coming out of S and arriving at G, respectively, and
hence the line segment $p_1 p_2$ represents the propagation of that photon. Viewed from
R and $R'$, the times this propagation takes are, respectively, the Minkowski lengths
of the line segments $p_3 p_2$ and $p_4 p_2$ (the world line of S can be used as the t-axis for
the frame R). The length of $p_1 p_3$ is obviously l. Using $\sigma$ and $\alpha$ to represent the
lengths of $p_1 p_4$ and $p_3 p_4$, we can easily see that

$\Delta t = \gamma^{-1} \Delta t'$, where $\gamma \equiv (1 - u^2)^{-1/2}$ (the phenomenon of “time dilation”),
$u = \alpha / \Delta t'$ (in $R'$, G moves a distance $\alpha$ in the time $\Delta t'$ with speed u),
$\sigma^2 = l^2 + \alpha^2$ (the Pythagorean theorem in 3-dimensional Euclidean space),
$1/n = \sigma / \Delta t'$ (the speed of light is isotropic in a medium at rest, with value 1/n).

Solving the equations above yields

$$\Delta t = l \sqrt{\frac{1 - u^2}{n^{-2} - u^2}} .$$
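
As a check on the algebra, the four relations of the solution can be solved step by step in code. This is a minimal sketch with c = 1; the function name `solve_example` and the guard `u < 1/n` (needed for the square root to be real) are assumptions added for the illustration.

```python
import math

def solve_example(l, n, u):
    """Return (dt, dt_prime, alpha, sigma) for Example 1, with c = 1."""
    if not 0.0 <= u < 1.0 / n:
        raise ValueError("this sketch assumes 0 <= u < 1/n")
    # sigma = dt'/n and alpha = u dt' in sigma^2 = l^2 + alpha^2
    dt_prime = l / math.sqrt(n ** -2 - u ** 2)
    alpha = u * dt_prime
    sigma = dt_prime / n
    dt = dt_prime * math.sqrt(1.0 - u ** 2)   # dt = dt'/gamma
    return dt, dt_prime, alpha, sigma

if __name__ == "__main__":
    # Liquid at rest: the familiar result dt = n l.
    print(solve_example(1.0, 1.5, 0.0)[0])
    # n = 1 (no medium): dt = l, independent of u.
    print(solve_example(3.0, 1.0, 0.5)[0])
```

The two limiting cases in the usage lines are simple sanity checks of the final formula: a liquid at rest gives $\Delta t = nl$, and n = 1 gives $\Delta t = l$ for any u.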

Fig. 6.22 The twin “paradox”

6.2.3 The Twin “Paradox”

Figure 6.22a is the spacetime diagram for the twin “paradox”, where the two curves
are the world lines of the twin brothers A and B. The curve A is a vertical line
indicating that A stays at home (as an inertial observer), while the curve B is a
non-geodesic indicating that B goes for a journey in space and returns. Suppose
the brothers are of the same age when they separate, would they be the same age
when they meet again? If not, then who is older? This is nothing but a question of
comparing the proper times of A and B between p and q, i.e., a question of comparing
the arc lengths $l_A$ and $l_B$ between p and q. Since the timelike geodesic is the longest
timelike curve between two points in Minkowski spacetime (see the paragraph above
Optional Reading 3.3), we have $l_A > l_B$, and thus B is younger than A when they
meet again. Figure 6.22b is the simplest example of the twin “paradox” (where the
world line of B is a broken line composed of two timelike geodesics); using the
quantitative relation for time dilation, it is easy to see that $l_A = \gamma l_B > l_B$.
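
The broken-line case of Fig. 6.22b is easy to evaluate directly as a sum of Minkowski lengths. The following sketch (c = 1) uses a hypothetical helper `proper_time` that adds up the lengths of straight timelike segments; the turning point (T/2, vT/2) is an assumption about how the symmetric trip is parametrized.

```python
import math

def proper_time(events):
    """Proper time along a broken timelike world line.

    events: list of (t, x) pairs in one inertial coordinate system, c = 1.
    """
    total = 0.0
    for (t0, x0), (t1, x1) in zip(events, events[1:]):
        dt, dx = t1 - t0, x1 - x0
        if abs(dx) >= abs(dt):
            raise ValueError("segment is not timelike")
        total += math.sqrt(dt * dt - dx * dx)
    return total

def twin_ages(T, v):
    """(l_A, l_B) for a symmetric out-and-back trip at speed v lasting T in A's frame."""
    l_A = proper_time([(0.0, 0.0), (T, 0.0)])
    l_B = proper_time([(0.0, 0.0), (T / 2, v * T / 2), (T, 0.0)])
    return l_A, l_B

if __name__ == "__main__":
    print(twin_ages(10.0, 0.6))
```

Because the arc length is independent of the coordinate system, the same two numbers would result from any other inertial coordinates; the frame of A is used here only for convenience.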
These are the essentials of the twin “paradox”, and the problem itself is just that
simple. However, due to a lack of deep understanding at the early stage of relativity
study, people used to consider this problem as a paradox. The argument regarding
the twin “paradox” even had an upsurge in 1957–1958 (though most physicists had
agreed that the problem had been solved long ago), and some papers were even
published in journals like Nature and Science. The representatives for the two sides
were physicist W. H. McCrea and physicist and natural philosopher H. Dingle. Dingle
claimed that according to relativity everything is relative, and thus the twins should
be the same age when they meet again. McCrea, however, pointed out shrewdly that
it is not true that everything is relative in relativity; it is the fact that the twin brother
B has an acceleration while A does not have one which leads to the result of the age
difference. As the study went deeper, especially after the geometric language became
widely used, physicists have already reached a consensus on the twin “paradox”,
which is exactly what we have shown at the beginning of this subsection [see, for
example, Sachs and Wu (1977) pp. 42–43; Wald (1977) pp. 25–26; Misner et al. (1973)
p. 167]. It should be particularly emphasized that one should not have the idea that
“everything is relative in relativity” based solely on the name of the theory, as this is
a critical misunderstanding!

The twin “paradox” was verified experimentally in 1971 using cesium atomic
clocks, not humans, of course; the reader may refer to Hafele and Keating (1972a;
1972b) for more information and look at Exercise 6.10.
Now, we answer a few frequently asked questions regarding the twin “paradox”.
Q: In the phenomenon of time dilation, the two observers are on an equal footing:
A thinks B’s clock runs slower, and B thinks A’s clock runs slower. Why in the twin
“paradox” are A and B not on an equal footing (everyone thinks B is younger than
A)?
A: The premises of these two phenomena are different. In the phenomenon of
time dilation, both observers are undergoing inertial motion; since inertial frames
are on an equal footing, the results for both of them are certainly the same. However,
in the twin “paradox”, one of the brothers is experiencing a non-inertial motion (the
world line is not a geodesic), otherwise they will not meet again after they have
separated. The premise implies an inequality between the two observers, and thus
the conclusion is also one-sided.
Q: The conclusion of the twin “paradox” is that the accelerating brother is younger.
However, since acceleration is relative, B accelerating relative to A means that A is
accelerating relative to B. In this way, wouldn’t B also think A is younger?
A: One needs to distinguish the 3-dimensional and 4-dimensional accelerations
(namely 3-acceleration and 4-acceleration, see Sect. 6.3), the former of which is rel-
ative, while the latter of which is absolute (independent of the choice of the observer,
reference frame, coordinate system, etc.). On the other hand, the concepts of inertial
motion and non-inertial motion are both absolute: a point mass undergoes inertial
motion if and only if its world line is a geodesic (independent of the reference
frame!). When we consider the term “accelerated motion” as a synonym of “non-
inertial motion”, it is supposed to be understood as the 4-acceleration. If one uses
the word “acceleration” to describe the twin “paradox”, then one should say “the
brother with a 4-acceleration is younger”. No observer would say that A has a 4-
acceleration, and there is no issue anymore. It is already a convention for physicists
that in the 3-dimensional language, when talking about acceleration without spec-
ifying a reference frame, it is always assumed to be relative to an inertial frame.
Under this agreement, expressions like “the brother experiencing accelerated motion
is younger” and “an electric charge only radiates under accelerated motion” are both
correct.
Q: It is sometimes heard that the twin “paradox” is inside the scope of general
relativity, and thus cannot be interpreted by just using special relativity. Is that right?
A: No. Didn’t we just interpret it clearly in the first paragraph of this subsection?
The misconception of some people that the twin “paradox” is related to general
relativity may occur when they choose the coordinate system corresponding to the
reference frame of B for calculating the time experienced by B. This is a non-
inertial frame, and some people may think general relativity is involved as long
as we talk about non-inertial frames. The explanations of this misunderstanding
are the following: ① The time experienced by B is the length of his world line,
which is a geometric quantity independent of the coordinate system, and hence it
is not necessary at all to trouble yourself by choosing a non-inertial frame for the
calculation. ② At the very least, even if you insist on using a non-inertial frame for
the calculation, there is no need to use general relativity at all. We should specify
the division criteria for special versus general relativity. At first, people thought
they could use coordinate systems as the criterion, and it would be considered to
be in the scope of general relativity as long as a non-inertial frame is involved.
Later, however, people realized that it is much more natural (and elegant) to use the
absolute spacetime geometry (which is independent of any human choice) as the
criterion. Therefore, now we shall have the following standard: any physics problem
that has Minkowski spacetime as its background is in the scope of special relativity,
while general relativity must be used when spacetime is curved (see Chap. 7). When
discussing any physics problem, an important but often ignored step is to identify
the spacetime background beforehand, i.e., to specify in what spacetime the physical
phenomenon happens. The premise for the twin “paradox” is that the whole process
happens in Minkowski spacetime, and thus is in the realm of special relativity (unless
one stipulates that the background spacetime is not Minkowski, which means the
gravitational field is not negligible, see Chap. 7). Unfortunately, some people went
even further and mistakenly thought that accelerated motion could lead to curved
spacetime and so general relativity must be involved. (Some may conclude that the
spacetime is curved just based on the fact that the Christoffel symbols $\Gamma^{\mu}{}_{\nu\sigma}$ do not
all vanish in a non-inertial coordinate system, but do not realize that it is absolutely
normal to find the $\Gamma^{\mu}{}_{\nu\sigma}$ of a Minkowski metric to be nonvanishing in a non-inertial
frame). Another famous example similar to this is Einstein’s rotating disk, which is
sometimes also misunderstood as a problem involving general relativity. The premise
of this problem is actually also that the whole phenomenon (including the motions of
the disk and the observers on it) takes place in Minkowski spacetime, and therefore
it is also within the scope of special relativity. The best way to clearly analyze the
problem of Einstein’s rotating disk is also by using the geometric language; however,
it is much more complex than the twin “paradox”, see Sect. 14.2 (Volume II) for
details.

6.2.4 The Garage “Paradox”

Suppose a car has the same rest length as a garage. When driving the car into the
garage, the driver thinks: “the moving garage becomes shorter, which will not be large
enough to fit the car.” However, the doorman of the garage thinks: “the moving car has
shrunk, and the garage will be more than enough to fit the car.” Is the driver correct? Is
the doorman correct? It will be crystal clear in the 4-dimensional geometric language.
To make the problem simple and more specific, we assume that the garage does not
have a back wall (its “back wall” will be just a line on the ground). Figure 6.23 is the
spacetime diagram of the car coming into the garage at a uniform speed (one can use
a calibration curve to make sure the car and the garage have the same rest length in
the diagram). It is easy to see from the diagram that, measured by the inertial frame
of the doorman, the garage is longer than the car and has enough room for the car;

Fig. 6.23 The spacetime diagram for the garage “paradox” (without a real back wall)

measured by the inertial frame of the driver, however, the car is longer than the garage
and cannot fit into the garage. Both viewpoints are correct, and the
divergence of their conclusions comes from the relativity of simultaneity. Actually,
a question like “can the car be fit in or not?” is not well-defined and should not be
raised. Since the conclusion is relative, this absolute type of question is meaningless,
just like in the phenomenon of “length contraction” one cannot ask “which ruler is
longer?” The case where the garage has a hard back wall is a little more complicated.
The basic principle is that, according to relativity, no information can propagate faster than
the speed of light, and so the information that the head of the car crashes into the wall
(and stops moving) takes time to propagate to the tail of the car; the tail of the car will
start to decelerate and come to a stop after “receiving” this information. Therefore,
the car would be physically compressed into a length that can be fit into the garage
(no matter from whose perspective). Motivated readers should draw a spacetime
diagram that roughly describes the whole process and finish Exercise 6.11.
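
For the case without a back wall, the role of the relativity of simultaneity can be illustrated with two events. The sketch below (c = 1) assumes a particular setup that is not spelled out in the text: in the doorman's frame the door is at x = 0, the back line is at x = L, the car moves in the +x direction with speed v, and the tail passes the door at t = 0. The function names are hypothetical.

```python
import math

def lorentz_time(t, x, v):
    """Time coordinate t' = gamma (t - v x) in the driver's frame, c = 1."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return gamma * (t - v * x)

def garage_events(L, v):
    """Two events in the doorman's frame, each as (t, x).

    P: the tail of the car passes the door.
    Q: the head of the car reaches the back line.
    """
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    tail_at_door = (0.0, 0.0)
    # At t = 0 the head is at x = L/gamma; it still has L - L/gamma to go.
    head_at_back = ((L - L / gamma) / v, L)
    return tail_at_door, head_at_back

if __name__ == "__main__":
    P, Q = garage_events(1.0, 0.6)
    print("doorman:", P[0], Q[0])   # P happens first: the car is inside for a while
    print("driver: ", lorentz_time(*P, 0.6), lorentz_time(*Q, 0.6))   # Q happens first
```

The two events are spacelike separated, so their time order is frame dependent; this is why the doorman and the driver can both be right.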

6.3 Kinematics and Dynamics of a Point Mass

Due to the importance and the subtleties of concepts like momentum, energy and
mass in special relativity, it is necessary to first review some relevant issues.
The principle of relativity requires that the laws of physics have the same form
in all inertial frames. The transformation between reference frames in Newtonian
mechanics is a Galilean transformation, while in special relativity it is a Lorentz
transformation. Hence, in Newtonian mechanics the principle of relativity requires
the mathematical expressions for the laws of physics to be invariant under Galilean
transformations (called Galilean covariance), while in special relativity it requires
the mathematical expressions for the laws of physics to be invariant under Lorentz
transformations (called Lorentz covariance). This principle is very powerful in that
it is a “law of laws”, which means any law that does not have Lorentz covariance
must be modified in order to be fit into special relativity. One notable example is
the law of conservation of momentum. In Newtonian mechanics, the momentum
of a point mass is defined as the product of the mass m and the velocity $\mathbf{u}$, i.e.,
$\mathbf{p} := m\mathbf{u}$, and the corresponding force is defined by the time rate of change of the

Fig. 6.24 A perfectly inelastic collision of two identical balls

point mass’s momentum, i.e., $\mathbf{f} := d\mathbf{p}/dt$; plugging in the definition of $\mathbf{p}$ yields
$\mathbf{f} = m\,d\mathbf{u}/dt = m\mathbf{a}$. Thus, even though $\mathbf{f} = m\mathbf{a}$ is called Newton’s second law, it is
just a definition of a force; only when combined with the expression for a force under
a specific physical circumstance can it provide a law of physics. For instance, in the
circumstance of a spring, combined with $f = -Kx$ it yields $dp/dt = -Kx$, which
is Hooke’s law. Now let us consider the collision of two balls in terms of the principle
of relativity. Use $\mathbf{p}_1$ and $\mathbf{f}_1$ to represent, respectively, the momentum of ball 1 and
the force acting on it (from ball 2), then $\mathbf{f}_1 = d\mathbf{p}_1/dt$, and similarly $\mathbf{f}_2 = d\mathbf{p}_2/dt$.
Newton’s third law guarantees that $\mathbf{f}_1 = -\mathbf{f}_2$, and hence $d(\mathbf{p}_1 + \mathbf{p}_2)/dt = 0$, i.e.,
momentum is conserved in the collision. Thus, the conservation of momentum is
the result of combining the definition of a force with Newton’s third law. If we
observe the same collision process from another inertial frame, then according to
the velocity transformation formula derived from the Galilean transformation it is
not difficult to see that the momentum is still conserved, and so the conservation of
momentum is Galilean covariant, which satisfies the principle of relativity. However,
if one still uses the Newtonian definition of momentum in special relativity, i.e.,
$\mathbf{p} := m\mathbf{u}$ (m = constant), then the following simple example is sufficient to show
that the conservation of this momentum is not Lorentz covariant. Consider a perfectly
inelastic collision of two identical balls. Suppose the velocities of these two balls in
the frame $R'$ are equal in magnitude and opposite in direction before the collision
(and thus the total momentum is zero), then from the symmetry we can see that
the velocities of both balls will be zero after the collision (see Fig. 6.24), which indicates that the
momentum is conserved in $R'$ in the collision process. Now we observe this process
in the frame R. Suppose ball 2 is at rest relative to R before the collision, then the velocity of $R'$ relative
to R is equal to the velocity v of ball 1 relative to $R'$ before the collision. From the
relativistic velocity transformation formula (which can be found in any textbook on
special relativity) we know that the velocity of ball 1 in the frame R is (for now we
keep the speed of light explicit, rather than set c = 1)

$$u = \frac{v + v}{1 + v^2/c^2} = \frac{2v}{1 + v^2/c^2} . \tag{6.3.1}$$

Suppose the Newtonian mass for both of the balls is m, then the total momenta
of the balls in R before and after the collision are, respectively,

$$\text{initial total momentum (magnitude)} = mu + 0 = \frac{2mv}{1 + v^2/c^2} ,$$
$$\text{final total momentum (magnitude)} = 2mv .$$

(The conservation of Newtonian mass is used in the second line). The total momenta
before and after the collision are not the same, and thus the momentum is not con-
served in the frame R. This indicates that now the conservation of momentum is not
Lorentz covariant, and hence is not a law. Now we have two choices: either give up
on the conservation of momentum or render the conservation of momentum Lorentz
covariant by modifying the definitions of mass and momentum. In consideration of
the significance of the laws of conservation in physics, we certainly would like to
choose the latter one. To get an idea how to modify them, let us consider the fol-
lowing: suppose a point mass is accelerated by a constant force. Based on Newton’s
second law, its speed must eventually exceed the speed of light at some point in
time, which contradicts special relativity. In order to get rid of this inconsistency, it
would be reasonable to suspect that the mass of a point mass increases with its speed
in relativity. In this way, the acceleration of the point mass under a constant force
would become smaller and smaller, and so it is possible that its speed will never
reach the speed of light. Therefore, we can suggest the following modification: we
still define momentum as mass times velocity; however, the mass, now denoted by
$m_u$ (called the relativistic mass), is no longer a constant but depends on the speed
u. Now based on this idea let us reconsider the conservation of momentum in the
frame R in Fig. 6.24. Since ball 2 is at rest before the collision while ball 1 moves
with velocity u, their relativistic masses are, respectively, $m_0$ (called the rest mass)
and $m_u$. Hence,

$$\text{initial total momentum (magnitude)} = m_u u + 0 = \frac{2 m_u v}{1 + v^2/c^2} , \tag{6.3.2}$$
$$\text{final total momentum (magnitude)} = M_v v , \tag{6.3.3}$$

where $M_v$ represents the total mass of the combined body after the collision. Assume
that the total mass is invariant in the collision, i.e., $m_u + m_0 = M_v$. (This is a very
natural assumption, the meaning of which will be explained later). Then (6.3.3)
becomes
$$\text{final total momentum (magnitude)} = (m_u + m_0) v . \tag{6.3.4}$$

Comparing (6.3.2) and (6.3.4) we can see that, in order to let the conservation of
momentum hold in the frame R, we only have to require that

$$m_u = m_0\,\frac{1 + v^2/c^2}{1 - v^2/c^2} . \tag{6.3.5}$$

Also, a simple calculation starting from (6.3.1) shows that

$$\sqrt{1 - u^2/c^2} = \frac{1 - v^2/c^2}{1 + v^2/c^2} , \tag{6.3.6}$$

and comparing this with (6.3.5) yields

$$m_u = \frac{m_0}{\sqrt{1 - u^2/c^2}} . \tag{6.3.7}$$

Thus, for the collision shown in Fig. 6.24, we can only guarantee that momentum
is conserved in R if we allow $m_u$ to change with the speed u according to (6.3.7).
From this, we extrapolate that the momentum of a point mass in special relativity
should be defined as
$$\mathbf{p} := m_u \mathbf{u} \quad [\text{where } m_u \text{ is given by (6.3.7)}] . \tag{6.3.8}$$

Usually we denote $\gamma_u \equiv (1 - u^2/c^2)^{-1/2}$, and hence the momentum can also be
expressed as
$$\mathbf{p} = \gamma_u m_0 \mathbf{u} , \quad \text{or, for short,} \quad \mathbf{p} = \gamma m_0 \mathbf{u} = m_u \mathbf{u} . \tag{6.3.9}$$
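
The collision of Fig. 6.24 can be replayed with numbers to see both the failure of the Newtonian momentum and the success of the relativistic one in the frame R. This is a minimal sketch; the function names are hypothetical, and c is a parameter that defaults to 1.

```python
import math

def velocity_in_R(v, c=1.0):
    """Speed u of ball 1 in R before the collision, from (6.3.1)."""
    return 2.0 * v / (1.0 + v * v / (c * c))

def relativistic_mass(m0, speed, c=1.0):
    """m_u from (6.3.7)."""
    return m0 / math.sqrt(1.0 - speed ** 2 / c ** 2)

def newtonian_momenta(m, v, c=1.0):
    """(before, after) total momentum in R with p = m u, m constant."""
    return m * velocity_in_R(v, c), 2.0 * m * v

def relativistic_momenta(m0, v, c=1.0):
    """(before, after) total momentum in R with p = m_u u and M_v = m_u + m_0."""
    u = velocity_in_R(v, c)
    m_u = relativistic_mass(m0, u, c)
    return m_u * u, (m_u + m0) * v

if __name__ == "__main__":
    print("Newtonian:   ", newtonian_momenta(1.0, 0.5))
    print("relativistic:", relativistic_momenta(1.0, 0.5))
```

The relativistic pair agrees only because the total mass after the collision is taken to be $m_u + m_0$, the assumption made in passing from (6.3.3) to (6.3.4).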

Now that we have the definition of momentum, we can define force. In relativity, the
force $\mathbf{f}$ acting on a point mass is still defined by the time rate of change of the point
mass’s momentum:
$$\mathbf{f} := \frac{d\mathbf{p}}{dt} . \tag{6.3.10}$$
The principle of relativity requires the above equation to be Lorentz covariant, which
determines the transformation law of forces between inertial frames (see textbooks
on special relativity for details).
Now we introduce the definition of energy. First, following Newtonian mechanics,
we define the kinetic energy $E_k$ of a point mass using the following two requirements:
① $E_k = 0$ when the point mass is at rest (u = 0), ② the time rate of change of the
kinetic energy equals the power $\mathbf{f} \cdot \mathbf{u}$, from which we obtain

$$\frac{dE_k}{dt} = \mathbf{f} \cdot \mathbf{u} = \frac{d\mathbf{p}}{dt} \cdot \mathbf{u} = \mathbf{u} \cdot \frac{d(m_u \mathbf{u})}{dt} = m_u \mathbf{u} \cdot \frac{d\mathbf{u}}{dt} + \mathbf{u} \cdot \mathbf{u}\,\frac{dm_u}{dt} = m_u u \frac{du}{dt} + u^2 \frac{dm_u}{dt} , \tag{6.3.11}$$

where $dm_u/dt$ can also be expressed using (6.3.7) as

$$\frac{dm_u}{dt} = \frac{d}{dt}\left(\frac{c\,m_0}{\sqrt{c^2 - u^2}}\right) = \frac{m_u u}{c^2 - u^2}\,\frac{du}{dt} . \tag{6.3.12}$$

Plugging this into (6.3.11) yields

$$\frac{dE_k}{dt} = (c^2 - u^2)\frac{dm_u}{dt} + u^2 \frac{dm_u}{dt} = c^2 \frac{dm_u}{dt} . \tag{6.3.13}$$

Noticing that $m_u = m_0$ and $E_k = 0$ when u = 0, by integrating the above equation
we find the kinetic energy at the speed u is

$$E_k(u) = \int_{m_0}^{m_u} c^2\,dm = m_u c^2 - m_0 c^2 . \tag{6.3.14}$$
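
The result (6.3.14) can be cross-checked by accumulating the work $dE_k = u\,dp$ numerically for one-dimensional motion and comparing with the closed form. This is only a sketch; the function names and the number of steps are assumptions made for the illustration.

```python
import math

def relativistic_mass(m0, u, c=1.0):
    """m_u from (6.3.7)."""
    return m0 / math.sqrt(1.0 - (u / c) ** 2)

def kinetic_energy(m0, u, c=1.0):
    """Closed form (6.3.14): E_k = m_u c^2 - m_0 c^2."""
    return (relativistic_mass(m0, u, c) - m0) * c * c

def kinetic_energy_numeric(m0, u, c=1.0, steps=10000):
    """Sum u dp from rest up to the speed u, using the mean speed on each step."""
    h = u / steps
    total = 0.0
    for i in range(steps):
        w0, w1 = i * h, (i + 1) * h
        dp = relativistic_mass(m0, w1, c) * w1 - relativistic_mass(m0, w0, c) * w0
        total += 0.5 * (w0 + w1) * dp
    return total

if __name__ == "__main__":
    print(kinetic_energy(1.0, 0.8), kinetic_energy_numeric(1.0, 0.8))
    # At low speed the closed form approaches the Newtonian value m0 u^2 / 2.
    print(kinetic_energy(2.0, 1e-3), 0.5 * 2.0 * 1e-6)
```

The second usage line illustrates that the definition reduces to the Newtonian kinetic energy when u is much smaller than c.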

Albert Einstein boldly claimed the $m_u c^2$ on the right-hand side of the equation
above to be the (total) energy of the point mass at the speed u (denoted by $E = mc^2$,
where m is short for $m_u$). Thus, $m_0 c^2$ is the energy when the point mass is at rest
(denoted by $E_0 = m_0 c^2$, called the rest energy of the point mass), and the kinetic
energy is the difference of the total energy and rest energy. $E = mc^2$ indicates that the
energy E is proportional to the mass m (by which we mean the relativistic mass $m_u$),
called the equivalence of mass and energy. In the geometrized unit system c = 1, and
hence E = m, i.e., energy is equal to mass, and $E_0 = m_0$ indicates that an object has
the same amount of energy as its rest mass even at rest. This is an incredibly huge
amount of energy: the energy of an object with $m_0 = 1$ g (which is about 1% of a
bag of instant noodles) is

$$m_0 c^2 = 10^{-3} \times (3 \times 10^8)^2 = 9 \times 10^{13}\ \mathrm{J} ,$$

which is roughly the energy released by an atomic bomb!
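
The arithmetic can be reproduced in a few lines. In the sketch below the function names are hypothetical, c is rounded to 3 × 10^8 m/s as in the estimate above, and the conversion 1 kiloton of TNT = 4.184 × 10^12 J is the conventional definition, brought in here only to put the number on a familiar scale.

```python
def rest_energy_joules(m0_kg, c=3.0e8):
    """Rest energy E_0 = m_0 c^2 in joules (c rounded as in the text)."""
    return m0_kg * c ** 2

def joules_to_kilotons_tnt(energy_j):
    """Convert joules to kilotons of TNT (1 kt = 4.184e12 J by convention)."""
    return energy_j / 4.184e12

if __name__ == "__main__":
    e0 = rest_energy_joules(1.0e-3)   # one gram
    print(e0, joules_to_kilotons_tnt(e0))
```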


In Newtonian mechanics, there are both the law of conservation of mass and
the law of conservation of energy. What about in special relativity? First of all, the
energy defined by E = mc2 (NB: m is short for m u ) satisfies the law of conservation
of energy. This fact should be regarded as a theoretical hypothesis, which has been
supported by numerous experiments. As for whether the mass is conserved, it depends
on which mass you are talking about. Since E = mc2 , the conservation of E also
implies the conservation of the relativistic mass m, and they are not independent.3
As to the rest mass, we should emphasize that it does not obey the conservation law.
For instance, suppose in a fission process an atomic nucleus at rest is split into two
pieces (both are moving). Use $M$, $m_1$, $m_2$ to represent the relativistic masses of the
nucleus and the two pieces, respectively; then from energy conservation we have

$$M c^2 = m_1 c^2 + m_2 c^2 . \tag{6.3.15}$$

Before the fission the nucleus is at rest, so its relativistic mass is equal to the
rest mass $M_0$. Use $m_{01}$, $m_{02}$, $u_1$ and $u_2$ to respectively represent the rest masses and
the velocities of the two pieces. Let $\gamma_1 \equiv (1 - u_1^2/c^2)^{-1/2}$, $\gamma_2 \equiv (1 - u_2^2/c^2)^{-1/2}$;
then $m_1 = \gamma_1 m_{01}$, $m_2 = \gamma_2 m_{02}$. Hence, it follows from (6.3.15) that

$$M_0 = \gamma_1 m_{01} + \gamma_2 m_{02} > m_{01} + m_{02} . \tag{6.3.16}$$

Thus, the rest mass is not conserved! The difference $\Delta m_0 = M_0 - (m_{01} + m_{02})$ is
called the mass defect. In summary: in special relativity there are in total only two
laws of conservation regarding momentum, energy, rest mass and relativistic mass,

3 However, do not regard this “law of the conservation of mass” as the same as the one
in Newtonian mechanics. The former is a conservation law of a physical quantity (the relativistic
mass), while the latter reflects the following tenet of Newton: matter can neither be created nor
destroyed. From today’s vantage point, this tenet is not quite true, since matter can be “destroyed”
— it can be turned into radiation, even though the energy does not change. Thus, energy is conserved
while matter is not.

namely the conservation of momentum and the conservation of energy. Earlier, we
assumed $m_u + m_0 = M_v$ when we rewrote (6.3.3) into (6.3.4), and now we
see that this equation represents the conservation of energy. Thus, energy should be
assumed to be conserved when proving the Lorentz covariance of the momentum
$\mathbf{p} = \gamma m_0 \mathbf{u}$.
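The fission example can be checked numerically. The sketch below is only an illustration under stated assumptions: the rest masses 10, 4 and 3 are hypothetical numbers in arbitrary units with $c = 1$, the parent is at rest so the fragments carry equal and opposite 3-momenta, and each fragment satisfies $E^2 = m^2 + p^2$, the relation obtained later in this section as (6.3.34). The function name `two_body_fission` is illustrative.

```python
import math

def two_body_fission(M0, m01, m02):
    """Fragment energies, common momentum magnitude and Lorentz factors (c = 1).

    Assumes the parent nucleus is at rest, so the two fragments carry equal and
    opposite 3-momenta, and uses E^2 = m^2 + p^2 for each fragment.
    """
    if M0 <= m01 + m02:
        raise ValueError("fission into moving fragments requires M0 > m01 + m02")
    # Solve E1 + E2 = M0 together with E1^2 - m01^2 = E2^2 - m02^2.
    E1 = (M0**2 + m01**2 - m02**2) / (2.0 * M0)
    E2 = (M0**2 + m02**2 - m01**2) / (2.0 * M0)
    p = math.sqrt(E1**2 - m01**2)
    return E1, E2, p, E1 / m01, E2 / m02   # gamma_i = E_i / m_0i

M0, m01, m02 = 10.0, 4.0, 3.0              # hypothetical rest masses
E1, E2, p, gamma1, gamma2 = two_body_fission(M0, m01, m02)
mass_defect = M0 - (m01 + m02)

# Energy is conserved, as in (6.3.16), while the rest mass is not.
print(E1 + E2, gamma1 * m01 + gamma2 * m02, mass_defect)
```

Both Lorentz factors come out larger than 1, which is exactly why the inequality in (6.3.16) is strict.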
In the original formulation of special relativity all four of these concepts existed:
rest mass $m_0$, rest energy $E_0$, relativistic mass $m$ (i.e., $m_u$) and total energy $E$.
However, the relations $E = mc^2$ and $E_0 = m_0 c^2$ indicate that among them there are
only two independent ones. In fact, the modern literature (except for popular science)
usually only keeps two concepts: mass and energy, where “mass” $m$ refers to the rest
mass (since we only keep one mass, there is no need to add “rest” to it, and the
subscript “0” is also unnecessary), and “energy” refers to the total energy $E$. Now
the relationship between $E$ and $m$ is $E = \gamma mc^2$ [where $\gamma \equiv (1 - u^2/c^2)^{-1/2}$]. In
this way, there is only the law of the conservation of energy, while there is no law
of the conservation of mass (notice that there is a mass defect). Later on, when we
talk about mass in this text, unless otherwise stated, we always refer to the rest mass,
and denote it by $m$ (although we have used $m$ for the relativistic mass before). Since
the geometrized unit system is adopted, we have $E = \gamma m$. After the concept of mass had
followed a winding course of development in the early days of special relativity, Einstein wrote in
1948 in a personal letter: “It is not good to introduce the mass $M = m(1 - v^2/c^2)^{-1/2}$
of a body for which no clear definition can be given, ... It is better to introduce no
other mass concept other than the ‘rest mass’ m.”
The above is a review. From now on we go back to the geometrized unit system,
in which $c = 1$. Before we introduce the 4-dimensional formulation of particle kinematics
and dynamics, it is necessary to lay out the major definitions and relations in
the 3-dimensional formulation as follows (except for the mass $m$ and charge $q$, which
do not depend on the observer, all the quantities are defined relative to the inertial
frame $\{t, x, y, z\}$):

3-velocity (short for 3-dimensional velocity) of a point mass: $\mathbf{u} := d\mathbf{r}/dt$, where the position vector $\mathbf{r} = \mathbf{i}x + \mathbf{j}y + \mathbf{k}z$. (6.3.17)
3-acceleration of a point mass: $\mathbf{a} := d\mathbf{u}/dt$. (6.3.18)
3-momentum of a point mass: $\mathbf{p} := \gamma m \mathbf{u}$, $\gamma \equiv (1 - u^2)^{-1/2}$, $u \equiv |\mathbf{u}|$. (6.3.19)
Energy of a point mass: $E := \gamma m$. (6.3.20)
3-force acting on a point mass: $\mathbf{f} := d\mathbf{p}/dt$. (6.3.21)
The relation between the power of the 3-force $\mathbf{f}$ and the energy of the point mass: $\mathbf{f} \cdot \mathbf{u} := dE/dt$. (6.3.22)
3-force acting on a charged point mass in an electromagnetic field (Lorentz force): $\mathbf{f} := q(\mathbf{E} + \mathbf{u} \times \mathbf{B})$, (6.3.23)

where q is the electric charge of the point mass, u is the 3-velocity, E and B are the
electric field and magnetic field, respectively.

Remark 1 ① The $\gamma$ here is short for $\gamma_u \equiv (1 - u^2)^{-1/2}$, while the $\gamma$ in the Lorentz
transformation (6.1.5) stands for $(1 - v^2)^{-1/2}$, where $v$ is the relative speed between
two inertial frames, and $u$ is the speed of a particle with respect to the chosen
inertial frame. ② The transformations of coordinate systems are frequently involved
in relativity, and therefore people often use the term “invariant”. Note that “invariant”
and “conserved quantity” are two different concepts. A conserved quantity is a
quantity whose value remains a constant (does not change with time) in a physical
process; an invariant is a quantity that does not change with human factors such as a
coordinate system, reference frame, or observer. The former emphasizes the physical
process, and the latter emphasizes the transformation of the coordinate system, etc.
Energy is a conserved quantity rather than an invariant; (rest) mass is an invariant
rather than a conserved quantity; the electric charge of a charged particle is both an
invariant and a conserved quantity.
This is the 3-dimensional formulation based on a specific inertial frame. Now we
will introduce the 4-dimensional formulation, as well as the relationship between the
3- and 4-dimensional languages.

Definition 1 The 4-dimensional velocity (4-velocity) $U^a$ of a point mass is the
tangent vector of the world line (parametrized by the proper time $\tau$) of the point
mass, i.e.,

$$U^a := \left(\frac{\partial}{\partial\tau}\right)^a . \tag{6.3.24}$$

Proposition 6.3.1 Let $U_a \equiv \eta_{ab} U^b$, then $U^a U_a = -1$.

Proof The proper time is the arc length parameter of a timelike curve, and the tangent
vector of a curve whose parameter is the arc length has unit length (see Sect. 2.5). □
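Proposition 6.3.1 can be illustrated numerically. The sketch below assumes a hypothetical example world line, $t = \sinh\tau$, $x = \cosh\tau$ (hyperbolic motion, which is parametrized by proper time), approximates the tangent vector $U^\mu = dx^\mu/d\tau$ by a central difference, and evaluates $\eta_{\mu\nu} U^\mu U^\nu$. The helper names are illustrative.

```python
import math

ETA = (-1.0, 1.0, 1.0, 1.0)  # diagonal of eta_{mu nu}, signature (-,+,+,+), c = 1

def minkowski_dot(a, b):
    """eta_{mu nu} a^mu b^nu for 4-component tuples."""
    return sum(e * x * y for e, x, y in zip(ETA, a, b))

def world_line(tau):
    """Hypothetical example: hyperbolic motion, (t, x, y, z) as a function of proper time."""
    return (math.sinh(tau), math.cosh(tau), 0.0, 0.0)

def four_velocity(tau, step=1e-5):
    """Central-difference approximation to U^mu = dx^mu / dtau."""
    ahead, behind = world_line(tau + step), world_line(tau - step)
    return tuple((p - q) / (2.0 * step) for p, q in zip(ahead, behind))

for tau in (0.0, 0.5, 1.0):
    U = four_velocity(tau)
    print(tau, minkowski_dot(U, U))  # each value should be close to -1
```

If the curve were parametrized by the coordinate time $t$ instead of $\tau$, the same computation would return $-1/\gamma^2$ rather than $-1$, which is one way to see why the proper time is singled out in Definition 1.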

Remark 2 The 4-velocity is not defined outside the world line.


To observe the motion of a point mass, one can choose an arbitrary reference
frame $R$. Suppose $L(\tau)$ is the world line of the point mass; then for any point $p$
on $L(\tau)$ there is always an observer $G$ in $R$ passing through it (see Fig. 6.25), and
so $G$ can measure the event $p$. Let $Z^a$ and $U^a$ represent the 4-velocities of $G$ and
$L(\tau)$, respectively, at $p$. Physically, it is not difficult to understand that if $Z^a = U^a$,
the observer $G$ will think the point mass $L$ is at rest at the time of $p$. Otherwise, the
observer $G$ will think the point mass has some kind of velocity (the 3-velocity) at the
time of $p$. Before giving the definition of the 3-velocity of the point mass relative to
the observer $G$ at $p$, first we would like to set the stage as follows.
Imagine you are the observer G. ① You can only make direct measurements
on the events happening on your world line. If an event happens outside your
world line, you certainly may hear it or see it (observe indirectly), but it involves
a signal propagating from this event to you, which takes some time and makes the

Fig. 6.25 The observer $G$ and the point mass $L$ intersect at $p$, so $G$ can measure the event $p$

discussion a little complicated. (On the theoretical side, the shape of an object moving
at high speed is a problem in this category; on the practical side, all astronomical
observations are indirect measurements.) The simplest, clearest, and most basic kind
of measurement is a direct measurement, i.e., the measurement of an event happening
on the observer’s world line, also called a local measurement. Luckily, a reference
frame is formed by ubiquitous observers, and the events happening elsewhere can
just be measured by another observer. ② When you measure an event happening at a
point p on your world line, in many cases what is important is just the 4-velocity at
p but not the whole world line. Then, there is no need to emphasize the world line of
the observer, and one only needs to know the tangent vector $Z^a$ of this world line at
$p$. Hence, we can extract a more abstract concept, called an instantaneous observer
[see Sachs and Wu (1977)], which contains two key elements, namely the point $p$ and
a (future-directed) timelike unit vector $Z^a$ at $p$, together denoted by $(p, Z^a)$. ③ You,
as an observer, have a sense of spatial direction besides a sense of time (from your
standard clock). Assume you hold an arrow in your hand, and any direction it points
to represents a spatial direction you can perceive. The collection of all the directions
you can perceive at $p$ (a point on your world line $G$) is of course a 3-dimensional set
$W_p$, while for $p$ as a point in $\mathbb{R}^4$, its tangent space $V_p$ is 4-dimensional. What is the
relationship between $W_p$ and $V_p$? First let us consider the simplest case. Suppose
you are an inertial observer in an inertial frame $R$. The surface of simultaneity of $R$
is the 3-dimensional space of $R$ at a certain time, which is orthogonal to the world
lines of all the inertial observers in this frame, and thus all the spatial vectors you
have at $p$ are orthogonal to your 4-velocity $Z^a$ at $p$. Therefore, $W_p$ corresponds to
the 3-dimensional subspace of $V_p$ orthogonal to $Z^a$, i.e.,

$$W_p = \{ w^a \in V_p \mid \eta_{ab} w^a Z^b = 0 \} .$$

This correspondence also applies to non-inertial observers, since we only care about
the situation at one point $p$ on the world line of the observer.
In Fig. 6.26, $W_p$ is represented as a small plane, but it is actually an “infinitesimal”
plane. The most precise interpretation of $W_p$ is a subspace of the tangent space
at $p$, which in the figure can only be drawn as a small plane. Suppose $w^a \in V_p$; when
$w^a \in W_p$, we say that $w^a$ is a spatial vector for the observer $G$. A (nonzero) spatial
vector must be a spacelike vector, but the converse is not true. From the definition

Fig. 6.26 $W_p$ is the 3-dimensional subspace of $V_p$ orthogonal to $Z^a$; any $w^a \in W_p$ can be seen as a spatial vector of $G$ at $p$

Fig. 6.27 The observer $G$ measures that the spatial displacement of the point mass $L$ during a time period $dt$ is $(\partial/\partial x^i)^a dx^i$, and thus the 3-velocity should be defined by (6.3.27)

we can see that a spacelike vector is absolute (does not depend on factors such as
the observer, reference frame or coordinate system), while a spatial vector is
relative (depends on the 4-velocity $Z^a$ of the observer). It follows from (4.4.2) that
the induced metric of $\eta_{ab}$ on $W_p$ at $p$ is $h_{ab} = \eta_{ab} + Z_a Z_b$, and from the paragraph
below (4.4.4) we know that $h^a{}_b = \delta^a{}_b + Z^a Z_b$ is the projection map from $V_p$ to $W_p$,
i.e., $h^a{}_b v^b \in W_p$ is the projection of $v^a \in V_p$ onto $W_p$.
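The properties of the projection map can be verified in components. The following sketch, written under the assumption of an inertial coordinate system with $\eta_{\mu\nu} = \mathrm{diag}(-1, 1, 1, 1)$, builds $h^a{}_b = \delta^a{}_b + Z^a Z_b$ for an observer moving at speed 0.6 along $x$ (a hypothetical choice) and checks that projected vectors are orthogonal to $Z^a$, that $Z^a$ itself is annihilated, and that projecting twice changes nothing. Function names are illustrative.

```python
import math

ETA = (-1.0, 1.0, 1.0, 1.0)  # diagonal of eta_{ab} in an inertial frame, c = 1

def minkowski_dot(a, b):
    """eta_{ab} a^a b^b for 4-component tuples."""
    return sum(e * x * y for e, x, y in zip(ETA, a, b))

def projector(Z):
    """Components h^a_b = delta^a_b + Z^a Z_b for a unit timelike Z^a."""
    Z_lower = [e * z for e, z in zip(ETA, Z)]          # Z_b = eta_{ab} Z^a
    return [[(1.0 if a == b else 0.0) + Z[a] * Z_lower[b] for b in range(4)]
            for a in range(4)]

def project(h, v):
    """Apply h^a_b to the components v^b."""
    return tuple(sum(h[a][b] * v[b] for b in range(4)) for a in range(4))

speed = 0.6                                   # hypothetical observer speed along x
gamma = 1.0 / math.sqrt(1.0 - speed**2)
Z = (gamma, gamma * speed, 0.0, 0.0)          # unit timelike 4-velocity of the observer
h = projector(Z)

v = (2.0, -1.0, 0.5, 3.0)                     # arbitrary element of V_p
w = project(h, v)                             # its projection onto W_p

print(minkowski_dot(w, Z))                    # close to 0: w is spatial for this observer
print(project(h, Z))                          # close to the zero vector
print(project(h, w), w)                       # the two tuples agree: h is idempotent
```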
Suppose the world line $L$ of a point mass and the world line $G$ of an observer
intersect at $p$, let us discuss the 3-velocity of $L$ relative to $G$ at $p$. First we discuss
the case where $L(\tau)$ and $G$ are both geodesics. Let $U^a$ and $Z^a$ be, respectively, the
4-velocities of $L(\tau)$ and $G$ at $p$ (see Fig. 6.27), and $\{t, x^i\}$ be the coordinates of the
inertial frame that the inertial observer $G$ belongs to. Then,

$$U^a = \left(\frac{\partial}{\partial\tau}\right)^a = \left(\frac{\partial}{\partial t}\right)^a \frac{dt}{d\tau} + \left(\frac{\partial}{\partial x^i}\right)^a \frac{dx^i}{d\tau} , \tag{6.3.25}$$

which can also be expressed as

$$U^a\, d\tau = \left(\frac{\partial}{\partial t}\right)^a dt + \left(\frac{\partial}{\partial x^i}\right)^a dx^i . \tag{6.3.26}$$

Suppose $p = L(\tau_1)$, and let $q \equiv L(\tau_1 + d\tau)$; then the geodesic segment $pq$ represents
the “infinitesimal” process of the point mass from the proper time $\tau_1$ to $\tau_1 + d\tau$.
For the observer $G$, the time of this process would be $dt$ in (6.3.26), and the corresponding
spatial displacement is $(\partial/\partial x^i)^a dx^i$. Hence, the 3-velocity of a point mass
$L$ relative to $G$ (also called the 3-velocity of $L$ measured by $G$) should be defined
as

$$u^a := \left(\frac{\partial}{\partial x^i}\right)^a \frac{dx^i}{dt} = \left(\frac{\partial}{\partial x^i}\right)^a \frac{dx^i/d\tau}{dt/d\tau} . \tag{6.3.27}$$
It follows from (6.3.25) that $(\partial/\partial x^i)^a\, dx^i/d\tau$ is the spatial projection of $U^a$, i.e., $h^a{}_b U^b$.
Now if we set $\gamma \equiv dt/d\tau$, then (6.3.27) can be rewritten as

$$u^a := \frac{h^a{}_b U^b}{\gamma} . \tag{6.3.28}$$

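The $\gamma$ we just introduced (i.e., $\gamma \equiv dt/d\tau$) can also be expressed as

$$\gamma = -U^a Z_a , \tag{6.3.29}$$

since $-U^a Z_a = -\eta_{ab} U^a Z^b = -\eta_{\mu\nu} U^\mu (\partial/\partial t)^\nu = -\eta_{00} U^0 (\partial/\partial t)^0 = U^0 = dt/d\tau = \gamma$.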

Remark 3 ① It is easy to see that the 3-velocity $u^a$ is a spatial vector of the observer
$G$ at $p$ (and thus can be denoted by $\mathbf{u}$). This is the most basic requirement for $u^a$:
since the 3-velocity is a vector in the 3-dimensional language (called a 3-vector), of
course it should be a spatial vector. ② Although we have used a coordinate system in
the discussion above, the definition (6.3.28) of $u^a$ is independent of the coordinate
system. ③ Suppose $R$ is an inertial reference frame that the inertial observer $G$
belongs to; then the $u^a$ in (6.3.28) is also called the 3-velocity of the point mass $L$
at $p$ relative to $R$. Suppose $\{t, x^i\}$ is an arbitrary inertial coordinate system in $R$;
then it follows from (6.3.27) that the components of the 3-velocity in this system are
$u^i = dx^i/dt$. Note that the components of $\mathbf{u}$ defined by (6.3.17) are also $u^i = dx^i/dt$,
and thus this agrees with the definition of $u^a$ in (6.3.28). ④ A 3-vector (e.g., $u^a$) at
any point $p$ in the 4-dimensional spacetime is an element of $V_p$, and hence is also a
4-vector; it is just that its time component $u^0$ is zero.
Since (6.3.28) only involves the tangent space of p (only involves an “infinites-
imal” neighborhood of p), it also applies to the cases where L(τ ) and G are not
geodesics, and therefore we have the following definitions:

Definition 2 Suppose $L(\tau)$ is an arbitrary point mass, and $p \in L$; then the 3-velocity
$u^a$ of the point mass relative to any instantaneous observer $(p, Z^a)$ is defined by
(6.3.28), where $h_{ab} = \eta_{ab} + Z_a Z_b$, and $\gamma \equiv -U^a Z_a$.

Definition 3 The magnitude $u = \sqrt{u^a u_a}$ of the 3-velocity vector $u^a$ of a point mass
with respect to an instantaneous observer is called the 3-speed of the point mass with
respect to this instantaneous observer, where $u_a := \eta_{ab} u^b = h_{ab} u^b$.
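Definitions 2 and 3 can be exercised with a small numerical sketch. It assumes an inertial coordinate system with $\eta_{\mu\nu} = \mathrm{diag}(-1, 1, 1, 1)$, a point mass moving at speed 0.8 along $x$ and an observer moving at speed 0.5 along $x$ (both hypothetical values). The 3-speed obtained from (6.3.28) should then reproduce the familiar relative speed $(0.8 - 0.5)/(1 - 0.8 \times 0.5) = 0.5$. Function names are illustrative.

```python
import math

ETA = (-1.0, 1.0, 1.0, 1.0)  # diagonal of eta_{mu nu}, c = 1

def minkowski_dot(a, b):
    """eta_{mu nu} a^mu b^nu for 4-component tuples."""
    return sum(e * x * y for e, x, y in zip(ETA, a, b))

def four_velocity_along_x(speed):
    """Unit timelike vector gamma * (1, speed, 0, 0) for |speed| < 1."""
    gamma = 1.0 / math.sqrt(1.0 - speed**2)
    return (gamma, gamma * speed, 0.0, 0.0)

def three_velocity(U, Z):
    """Return (u^a, gamma) with gamma = -U^a Z_a and u^a = h^a_b U^b / gamma."""
    gamma = -minkowski_dot(U, Z)
    # h^a_b U^b = U^a + Z^a (Z_b U^b) = U^a - gamma Z^a
    u = tuple((Ua - gamma * Za) / gamma for Ua, Za in zip(U, Z))
    return u, gamma

Z = four_velocity_along_x(0.5)   # instantaneous observer (p, Z^a)
U = four_velocity_along_x(0.8)   # 4-velocity of the point mass at p

u, gamma = three_velocity(U, Z)
three_speed = math.sqrt(minkowski_dot(u, u))

print(three_speed)               # close to 0.5
print(minkowski_dot(u, Z))       # close to 0: u^a is spatial for this observer
```

Note that the components of $u^a$ printed in the original coordinates include a nonzero time component; it vanishes only in coordinates adapted to the observer, in agreement with Remark 3 ④.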

Remark 4 Suppose $p \in L$, and $G$ is the geodesic determined by $(p, Z^a)$; then the
3-speed of the point mass $L$ relative to the instantaneous observer $(p, Z^a)$ agrees
with the 3-speed of $L$ relative to the inertial frame $R$ that $G$ belongs to [defined by
(6.1.2)]. For now we relax $L(\tau)$ to be either a timelike, null, or spacelike curve. For

Fig. 6.28 The instantaneous rest inertial reference frame of a point mass $L$ at $p$

the timelike and spacelike cases, $\tau$ represents the arc length, and for the null case, $\tau$
represents an arbitrary parameter. Let $U^a \equiv (\partial/\partial\tau)^a$; we still use (6.3.28) to define
$u^a$. Then,

$$u^2 = h_{ab} u^a u^b = h_{ab} \frac{(h^a{}_c U^c)(h^b{}_d U^d)}{\gamma^2} = h_{cd} \frac{U^c U^d}{\gamma^2} = \frac{\eta_{cd} U^c U^d + Z_c Z_d U^c U^d}{\gamma^2} = \frac{\eta_{cd} U^c U^d + \gamma^2}{\gamma^2} .$$

The equation above indicates that $u < 1 \Leftrightarrow \eta_{cd} U^c U^d < 0$, $u = 1 \Leftrightarrow \eta_{cd} U^c U^d = 0$,
$u > 1 \Leftrightarrow \eta_{cd} U^c U^d > 0$. Thus, if we define the 3-velocity using (6.3.28), then the
basic tenet of special relativity can be expressed using the 3-dimensional language
as “the 3-speed of a point mass is less than the speed of light”.
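The three cases can be seen at a glance in a short numerical sketch. It evaluates $u^2 = (\eta_{cd} U^c U^d + \gamma^2)/\gamma^2$ with $\gamma = -U^a Z_a$ for one timelike, one null and one spacelike tangent vector, relative to the observer $Z^a = (1, 0, 0, 0)$. The sample vectors are hypothetical, and the function name is illustrative.

```python
ETA = (-1.0, 1.0, 1.0, 1.0)  # diagonal of eta_{cd}, c = 1

def minkowski_dot(a, b):
    """eta_{cd} a^c b^d for 4-component tuples."""
    return sum(e * x * y for e, x, y in zip(ETA, a, b))

def three_speed_squared(U, Z):
    """u^2 = (eta_{cd} U^c U^d + gamma^2) / gamma^2 with gamma = -U^a Z_a (assumed nonzero)."""
    gamma = -minkowski_dot(U, Z)
    return (minkowski_dot(U, U) + gamma**2) / gamma**2

Z = (1.0, 0.0, 0.0, 0.0)               # observer at rest in the chosen inertial frame

timelike = (1.25, 0.75, 0.0, 0.0)      # unit timelike tangent, 3-speed 0.6
null = (1.0, 1.0, 0.0, 0.0)            # null tangent (arbitrary parametrization)
spacelike = (1.0, 2.0, 0.0, 0.0)       # spacelike tangent (normalization is irrelevant for u^2)

for name, U in (("timelike", timelike), ("null", null), ("spacelike", spacelike)):
    print(name, three_speed_squared(U, Z))
```

The printed values are below, equal to and above 1 respectively, matching the three equivalences stated above.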
If the instantaneous observer $(p, Z^a)$ is tangent to the world line $L$ of the particle,
then $(p, Z^a)$ is called an instantaneous rest observer of this particle (the particle
$L$ is at rest at $p$ relative to the observer). The geodesic $G$ determined by $p$ and $Z^a$ is called
the instantaneous rest inertial observer of $L$ at $p$, and the inertial reference frame
that $G$ belongs to is called an instantaneous rest inertial reference frame of $L$ at
$p$, in which any inertial coordinate system is called an instantaneous rest inertial
coordinate system of $L$ at $p$. The concept of an instantaneous rest inertial frame
will be very useful (Fig. 6.28).

Proposition 6.3.2 The 4-velocity of a point mass can be 3+1 decomposed by
means of an instantaneous observer $(p, Z^a)$:

$$U^a = \gamma (Z^a + u^a) , \tag{6.3.30}$$

where $u^a$ is the 3-velocity of the point mass relative to the instantaneous observer,
and $\gamma \equiv -Z^a U_a$.

Proof It follows from (6.3.28) that

$$\gamma u^a = h^a{}_b U^b = (\delta^a{}_b + Z^a Z_b) U^b = U^a - \gamma Z^a ,$$

and hence we have (6.3.30). □



Remark 5 From (6.3.30) we can see that $\gamma u^a$ is the spatial component of $U^a$.
Choosing an inertial frame $\{t, x, y, z\}$ such that $(\partial/\partial t)^a = Z^a$, we can see from
(6.3.30) that $\gamma Z^a$ is the time component of $U^a$. Hence, one can also write (6.3.30)
as $U^a = \gamma (1, u^a)$, which agrees with the commonly used expression $U^\mu = \gamma (c, \mathbf{u})$
in texts on special relativity.

Remark 6 $U^a$ is absolute (independent of any observer or coordinate system), while
the 3+1 decomposition of $U^a$ depends on the observer (or coordinate system), and
thus is relative. For another instantaneous observer $(p, Z'^a)$, the same $U^a$ can be
expressed as $U^a = \gamma' Z'^a + \gamma' u'^a$, i.e., both the time component $\gamma' Z'^a$ and the spatial
component $\gamma' u'^a$ are different from $\gamma Z^a$ and $\gamma u^a$.

Definition 4 Suppose the (rest) mass of a point mass is $m$, and the 4-velocity is $U^a$;
then the 4-momentum $P^a$ of the point mass is defined as

$$P^a := m U^a . \tag{6.3.31}$$

Proposition 6.3.3 The 4-momentum of a point mass can be 3+1 decomposed by
means of an instantaneous observer $(p, Z^a)$:

$$P^a = E Z^a + p^a , \tag{6.3.32}$$

where the energy $E$ and the 3-momentum $p^a$ are defined by (6.3.20) and (6.3.19).

Proof It follows from Definition 4, (6.3.30), (6.3.19) and (6.3.20) that

$$P^a = m U^a = m(\gamma Z^a + \gamma u^a) = E Z^a + p^a .$$

Remark 7 Equation (6.3.32) indicates that the 3-momentum $p^a$ and the energy $E$ are
respectively the spatial and time components of the 4-momentum $P^a$, the latter of
which can be expressed as

$$E = -P^a Z_a . \tag{6.3.33}$$

[This can be easily seen by contracting $Z_a$ with (6.3.32).] The concept of the
4-momentum of a point mass unifies two different concepts (the energy and momentum
of a point mass) organically into one physical quantity, which is independent
of the observer (P a is absolute). However, the way of decomposing P a into time
and spatial components depends on the observer, and thus is relative. If there is no
observer making a local measurement, then the 4-momentum still exists objectively,
but the energy and 3-momentum are meaningless. Now we can further understand
why most modern literature only uses the (rest) mass m and (total) energy E—they
are two fundamentally different types of quantity. The mass m of a point mass (e.g.,
an electron) is an invariant (just like its electric charge), which reflects an intrinsic

property of a point mass. The energy E of a point mass depends on the observer (and
thus is not an invariant). The energy measured by an instantaneous rest observer is
the rest energy; although it has the same value as the mass, they are not quantities
of the same type [mass is an invariant, while the rest energy is a special case of an
observer dependent quantity (energy)].

Remark 8 It is easy to derive the relation of mass, energy and 3-momentum from
(6.3.32) as follows:

$$P^a P_a = (E Z^a + p^a)(E Z_a + p_a) = -E^2 + p^2 ,$$

where $p$ stands for the magnitude of the 3-momentum. On the other hand,
$P^a P_a = (m U^a)(m U_a) = -m^2$, and therefore

$$E^2 = m^2 + p^2 , \tag{6.3.34}$$

which is exactly the well-known formula $E^2 = m^2 c^4 + p^2 c^2$ when $c = 1$.
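A short numerical sketch makes the observer dependence of $E$ and $p^a$, and the observer independence of $m$, concrete. It assumes inertial components with $\eta_{\mu\nu} = \mathrm{diag}(-1, 1, 1, 1)$, a hypothetical point mass with $m = 2$ moving at speed 0.6 along $x$, and two hypothetical observers. For each observer it computes $E = -P^a Z_a$ and $p^a = P^a - E Z^a$ and then evaluates $E^2 - p^2$. Function names are illustrative.

```python
import math

ETA = (-1.0, 1.0, 1.0, 1.0)  # diagonal of eta_{mu nu}, c = 1

def minkowski_dot(a, b):
    """eta_{mu nu} a^mu b^nu for 4-component tuples."""
    return sum(e * x * y for e, x, y in zip(ETA, a, b))

def four_velocity(v):
    """Unit timelike vector gamma * (1, v) for a 3-velocity v with |v| < 1."""
    gamma = 1.0 / math.sqrt(1.0 - sum(x * x for x in v))
    return (gamma, *(gamma * x for x in v))

def decompose(P, Z):
    """3+1 decomposition (6.3.32): return (E, p^a) with E = -P^a Z_a, p^a = P^a - E Z^a."""
    E = -minkowski_dot(P, Z)
    p = tuple(Pa - E * Za for Pa, Za in zip(P, Z))
    return E, p

m = 2.0                                                   # hypothetical rest mass
P = tuple(m * Ua for Ua in four_velocity((0.6, 0.0, 0.0)))  # P^a = m U^a

results = []
for observer_velocity in ((0.0, 0.0, 0.0), (0.0, -0.8, 0.0)):
    Z = four_velocity(observer_velocity)
    E, p = decompose(P, Z)
    results.append((E, E**2 - minkowski_dot(p, p)))
    print(E, E**2 - minkowski_dot(p, p))   # E differs; E^2 - p^2 stays close to m**2
```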


[Optional Reading 6.3.1]
The laws of the conservation of the energy and 3-momentum in a collision process
are theoretical hypotheses that have been verified by countless experiments, which can be
expressed together in terms of the 4-dimensional language as: the total 4-momentum is
conserved in the collision (the law of the conservation of 4-momentum). The “collision”
is quite general, including all the interactions that happen at the same spacetime point; the
particles involved can be either point masses or photons (see the paragraph before Optional
Reading 6.6.5 for the definitions of the energy and 3-momentum of a photon), and the
particle numbers before and after the collision can be different (see Fig. 6.29). Let $P^a$ and
$\bar{P}^a$ represent the sum of the 4-momenta of all the particles before and after the collision,
respectively. (For Fig. 6.29 they are $P^a = P_1^a + P_2^a$, $\bar{P}^a = P_3^a + P_4^a + P_5^a$.) Then, the law of
the conservation of 4-momentum can be expressed as $P^a = \bar{P}^a$. Note that this kind of vector
equation is endowed with Lorentz covariance, and there is no need to worry anymore about
whether the energy or 3-momentum is conserved in one frame but not conserved in another
frame. The fact that the energy and 3-momentum are respectively the “time component” and
“spatial component” of the 4-momentum is significant in many aspects. As an example, let
us show that merely from the conservation of 3-momentum one can derive the conservation
of 4-momentum, and thus the conservation of energy. First we choose an instantaneous
observer $(p, Z^a)$ such that $Z^a$ is parallel to $P^a$; then $P^a$ has no spatial component relative
to $Z^a$, i.e., the 3-momentum $p^a = 0$. From the conservation of 3-momentum we know that
$\bar{p}^a = 0$, i.e., $\bar{P}^a$ has no spatial component relative to $Z^a$ either, and hence $\bar{P}^a$ and $P^a$ at
most only differ by multiplication by a numerical factor (denoted by $\sigma$): $\bar{P}^a = \sigma P^a$. Choose
another instantaneous observer $(p, Z'^a)$ (where $Z'^a$ and $P^a$ are not parallel), and let $h'^a{}_b$
represent the projection map determined by $Z'^a$. Then, the 3-momentum $\bar{p}'^a$ of $\bar{P}^a$ relative to
$Z'^a$ satisfies $\bar{p}'^a = h'^a{}_b \bar{P}^b = \sigma h'^a{}_b P^b = \sigma p'^a$, and the fact that 3-momentum is conserved
in any inertial frame (which is the key point) assures that $\bar{p}'^a = p'^a$, and hence $\sigma = 1$
(note that $p'^a \neq 0$ because $Z'^a$ is not parallel to $P^a$).
Therefore, $P^a = \bar{P}^a$, i.e., the 4-momentum is conserved.
[The End of Optional Reading 6.3.1]

Definition 5 The 4-acceleration of a point mass is defined as

$$A^a := U^b \partial_b U^a , \tag{6.3.35}$$

Fig. 6.29 Two particles become three particles after a collision

where $U^a$ is the 4-velocity of the point mass, and $\partial_b$ is the derivative operator
associated with $\eta_{ab}$ (i.e., $\partial_a \eta_{bc} = 0$).

Remark 9 By definition we can see that ① the 4-acceleration is absolute; ② $A^a = 0$
is equivalent to $U^b \partial_b U^a = 0$ (the world line being a geodesic), i.e., the point mass
undergoes inertial motion. Thus, a necessary and sufficient condition for a point mass
to undergo inertial motion (be a free point mass) is that its 4-acceleration is
zero.

Proposition 6.3.4 The 4-acceleration $A^a$ at each point on the world line of a point
mass is orthogonal to the 4-velocity $U^a$, i.e., $A^a U_a = \eta_{ab} A^a U^b = 0$.

Proof Exercise 6.12. (Hint: use $U^b \partial_b (U^a U_a) = 2 U_a U^b \partial_b U^a$.) □

Unlike the 3-velocity $u^a$, the 3-acceleration of a point mass $L$ cannot be determined
just by one observer $G$, since to determine the 3-acceleration of $L$ at $p$ (the intersection
of $G$ and $L$) one needs to compare the 3-velocities of $L$ at $p$ and at another
point $p'$ sitting next to $p$ on $L$, while the latter in general is not an intersection of
$G$ and $L$. This difficulty can be overcome by means of a coordinate system: one can
define the 3-acceleration of $L$ at an arbitrary $p$ on it relative to any coordinate system
(called the “coordinate 3-acceleration”). The most commonly used one is
the 3-acceleration of $L$ relative to an inertial coordinate system.

Definition 6 Suppose the parametric equations of the world line $L(\tau)$ of a point
mass in an inertial coordinate system $\{t, x^i\}$ are $t = t(\tau)$, $x^i = x^i(\tau)$; then its
3-acceleration relative to this system is defined as

$$a^a := \frac{d^2 x^i(t)}{dt^2} \left(\frac{\partial}{\partial x^i}\right)^a , \tag{6.3.36}$$

where $x^i(t)$ is the function $x^i = x^i(t)$ obtained by combining $x^i = x^i(\tau)$ and
$t = t(\tau)$ (namely the parametric equation of $L$ with $t$ as the parameter).

Remark 10 It is easy to see that this definition agrees with (6.3.18).


Now we discuss the relation between the 4-acceleration $A^a$ of a point mass and
its 3-acceleration $a^a$ relative to an inertial frame $R$.
Proposition 6.3.5 The components of the 4-acceleration $A^a$ in an inertial frame $R$
are

$$A^0 = \gamma^4\, \mathbf{u} \cdot \mathbf{a} , \qquad A^i = \gamma^2 a^i + \gamma^4 (\mathbf{u} \cdot \mathbf{a}) u^i , \tag{6.3.37}$$

where $\mathbf{u}$ and $\mathbf{a}$ are respectively the 3-velocity and 3-acceleration of the point mass
relative to $R$, $\gamma \equiv (1 - u^2)^{-1/2}$, and $u \equiv (\mathbf{u} \cdot \mathbf{u})^{1/2}$.

Proof Suppose $\{(dx^\mu)_a\}$ is the dual coordinate basis of the frame $R$; then it follows
from the definition of $A^a$ that

$$A^\mu = A^a (dx^\mu)_a = (dx^\mu)_a U^b \partial_b U^a = U^b \partial_b [(dx^\mu)_a U^a] = U^b \partial_b U^\mu = \frac{dU^\mu}{d\tau} = \gamma \frac{dU^\mu}{dt} .$$

(The third equality holds because the $\partial_a$ that satisfies $\partial_a \eta_{bc} = 0$ is exactly the ordinary
derivative operator of the Lorentzian system.) From (6.3.30) we can see that $U^0 = \gamma$,
$U^i = \gamma u^i$, and thus

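$$A^0 = \gamma \frac{dU^0}{dt} = \gamma \frac{d\gamma}{dt} ,$$

$$A^i = \gamma \frac{dU^i}{dt} = \gamma \frac{d(\gamma u^i)}{dt} = \gamma^2 \frac{du^i}{dt} + u^i \gamma \frac{d\gamma}{dt} = \gamma^2 a^i + u^i \gamma \frac{d\gamma}{dt} .$$

Also, from $\gamma \equiv (1 - u^2)^{-1/2}$ we get $d\gamma/dt = \gamma^3 u\, du/dt = \gamma^3\, \mathbf{u} \cdot \mathbf{a}$. Plugging this
into the two equations above yields (6.3.37). □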
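Formula (6.3.37) can be sanity-checked on a case where the answer is known in closed form. The sketch below assumes the hypothetical world line $x(t) = \sqrt{1 + t^2}$ (hyperbolic motion with unit proper acceleration, $c = 1$), for which $u = t/\sqrt{1+t^2}$, $a = (1+t^2)^{-3/2}$ and the exact 4-acceleration components are $(t, \sqrt{1+t^2}, 0, 0)$. It also checks the orthogonality $A^a U_a = 0$ of Proposition 6.3.4. The function name `four_acceleration` is illustrative.

```python
import math

def four_acceleration(u, a):
    """Components (A^0, A^1, A^2, A^3) from (6.3.37) for 3-velocity u and 3-acceleration a (c = 1)."""
    gamma = 1.0 / math.sqrt(1.0 - sum(x * x for x in u))
    u_dot_a = sum(x * y for x, y in zip(u, a))
    A0 = gamma**4 * u_dot_a
    Ai = tuple(gamma**2 * ai + gamma**4 * u_dot_a * ui for ui, ai in zip(u, a))
    return (A0, *Ai)

# Hypothetical test case: hyperbolic motion x(t) = sqrt(1 + t^2) evaluated at t = 0.75.
t = 0.75
u = (t / math.sqrt(1.0 + t**2), 0.0, 0.0)
a = ((1.0 + t**2) ** -1.5, 0.0, 0.0)

A = four_acceleration(u, a)
gamma = 1.0 / math.sqrt(1.0 - u[0] ** 2)
U = (gamma, gamma * u[0], 0.0, 0.0)

# Minkowski product A^a U_a with signature (-,+,+,+).
A_dot_U = -A[0] * U[0] + sum(x * y for x, y in zip(A[1:], U[1:]))

print(A)        # close to (0.75, 1.25, 0.0, 0.0)
print(A_dot_U)  # close to 0
```

Setting `u = (0.0, 0.0, 0.0)` returns `(0.0, *a)`, which is the component statement of Proposition 6.3.6.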

Remark 11 For a free point mass we have $A^a = 0$, and from (6.3.37) we can see that
its 3-acceleration relative to any inertial frame is $a^a = 0$.

Proposition 6.3.6 The 4-acceleration of a point mass is equal to its 3-acceleration
relative to an instantaneous rest inertial coordinate system.

Proof Plugging $\mathbf{u} = 0$ into (6.3.37) yields $A^0 = 0$ and $A^i = a^i$. □

Definition 7 The 4-force on a point mass is defined as

$$F^a := U^b \partial_b P^a , \tag{6.3.38}$$

where $U^a$ and $P^a$ are the 4-velocity and 4-momentum of the point mass, respectively.
Equation (6.3.38) is also called (the 4-dimensional expression of) the relativistic
equation of motion for a point mass, but actually it is just the definition of the 4-force.

To obtain the real physical laws, one also needs to combine (6.3.38) with the specific
expression of the 4-force in each specific case.

Remark 12 In this section, we only care about the case where the (rest) mass $m$ of
the point mass remains constant ($dm/d\tau = 0$). In this case, plugging $P^a = m U^a$
into (6.3.38) we obtain $F^a = m A^a$. However, if $m$ is changing during the motion
($dm/d\tau \neq 0$), then this conclusion does not hold; see Optional Reading 6.3.2.

Proposition 6.3.7 The spatial components $F^i$ ($i = 1, 2, 3$) of the 4-force on a point
mass in an inertial coordinate system $\{x^\mu\}$ are equal to $\gamma$ times the corresponding
components $f^i$ of the 3-force acting on it, and the time component $F^0$ of the 4-force
is equal to $\gamma$ times $\mathbf{f} \cdot \mathbf{u}$, the power of the 3-force. That is,

$$F^i = \gamma f^i , \qquad F^0 = \gamma\, \mathbf{f} \cdot \mathbf{u} , \tag{6.3.39}$$

where $\gamma \equiv (1 - u^2)^{-1/2}$, and $u$ is the magnitude of the 3-velocity $\mathbf{u}$ of the point mass
with respect to this system.

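Proof In $\{x^\mu\}$, the components of $F^a$ are

$$F^\mu = F^a (dx^\mu)_a = (dx^\mu)_a U^b \partial_b P^a = U^b \partial_b [(dx^\mu)_a P^a] = U^b \partial_b P^\mu .$$

Take the $i$th component. It follows from (6.3.32) and (6.3.19) that

$$F^i = U^b \partial_b P^i = \frac{dp^i}{d\tau} = \frac{dp^i}{dt} \frac{dt}{d\tau} = \gamma f^i .$$

Now take the 0th component. It follows from (6.3.32) and (6.3.20) that

$$F^0 = U^b \partial_b P^0 = U^b \partial_b E = \frac{dE}{d\tau} = \frac{dE}{dt} \frac{dt}{d\tau} = \gamma \frac{dE}{dt} = \gamma\, \mathbf{f} \cdot \mathbf{u} .$$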
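As an illustration of (6.3.39), the sketch below builds the 4-force components from the Lorentz 3-force (6.3.23) and checks that $F^a U_a = 0$, which must hold when $m$ is constant because $F^a = m A^a$ and $A^a U_a = 0$. The charge, field values and velocity are hypothetical numbers, the units are those of the text ($c = 1$), and the function names are illustrative.

```python
import math

def dot3(a, b):
    """Euclidean dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    """Euclidean cross product a x b."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def lorentz_three_force(q, E_field, B_field, u):
    """3-force f = q (E + u x B), as in (6.3.23)."""
    return tuple(q * (e + c) for e, c in zip(E_field, cross(u, B_field)))

def four_force(f, u):
    """Components (F^0, F^i) = (gamma f.u, gamma f^i), as in (6.3.39)."""
    gamma = 1.0 / math.sqrt(1.0 - dot3(u, u))
    return (gamma * dot3(f, u), *(gamma * fi for fi in f))

q = 1.0                          # hypothetical charge
E_field = (0.2, 0.0, 0.1)        # hypothetical electric field
B_field = (0.0, 0.3, 0.0)        # hypothetical magnetic field
u = (0.3, 0.4, 0.0)              # 3-velocity with |u| = 0.5

f = lorentz_three_force(q, E_field, B_field, u)
F = four_force(f, u)

gamma = 1.0 / math.sqrt(1.0 - dot3(u, u))
U = (gamma, *(gamma * ui for ui in u))
F_dot_U = -F[0] * U[0] + dot3(F[1:], U[1:])   # eta_{ab} F^a U^b, signature (-,+,+,+)

print(F)
print(F_dot_U)                   # close to 0

# With the electric field switched off the magnetic force does no work, so F^0 = 0.
F_magnetic = four_force(lorentz_three_force(q, (0.0, 0.0, 0.0), B_field, u), u)
print(F_magnetic[0])
```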

[Optional Reading 6.3.2]
So far we only discussed the case where the (rest) mass $m$ of the point mass is a constant
($dm/d\tau = 0$), but more generally, $m$ may change in the motion, i.e., $dm/d\tau \neq 0$. For instance,
consider a resistor in a DC circuit at rest in an inertial frame $R$. The Joule heat (which is
also a form of energy) caused by the current makes the rest energy $mc^2$ of the resistor
increase, and thus $dm/d\tau > 0$. In the case where $dm/d\tau \neq 0$, some previous conclusions
need to be modified. For example:
(1) Although $\mathbf{f} \cdot \mathbf{u}$ can still be called the power of the 3-force (there are also people who
think it is improper to call it so), it is no longer equal to the rate of change of the total energy.
The relation between them is now

$$\mathbf{f} \cdot \mathbf{u} = \frac{dE}{dt} - \frac{c^2}{\gamma} \frac{dm}{dt} . \tag{6.3.40}$$
dt γ dt

Fig. 6.30 The triad of an observer (spatial diagram)

Fig. 6.31 The tetrad field along a world line

(2) The kinetic energy should be defined as the difference between the total energy $\gamma mc^2$
and the rest energy $mc^2$, i.e., $E_k = (\gamma - 1) mc^2$, which does not satisfy $\mathbf{f} \cdot \mathbf{u} = dE_k/dt$. In
fact, if $dm/d\tau \neq 0$ then $\mathbf{f} \cdot \mathbf{u}$ is equal to neither $dE/dt$ nor $dE_k/dt$.
(3) The 4-force is still defined as $U^b \partial_b P^a$; however, $F^a \neq m A^a$.
(4) Proposition 6.3.7 now should be stated as

$$F^i = \gamma f^i , \qquad F^0 = \gamma \frac{dE}{dt} \ (\neq \gamma\, \mathbf{f} \cdot \mathbf{u}) . \tag{6.3.41}$$
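The relation (6.3.40) in item (1) is stated without derivation. One possible route is sketched below, assuming only $E = \gamma m c^2$, $\mathbf{p} = \gamma m \mathbf{u}$, $\mathbf{f} = d\mathbf{p}/dt$ with $m = m(t)$, together with $d\gamma/dt = \gamma^3\, \mathbf{u} \cdot \mathbf{a}/c^2$ and $c^2 - u^2 = c^2/\gamma^2$; this is a reconstruction and not necessarily the argument intended by the authors.

```latex
% Assumptions: E = \gamma m c^2, \mathbf{p} = \gamma m \mathbf{u},
% \mathbf{f} = d\mathbf{p}/dt, and the rest mass m may depend on t.
\begin{align*}
\mathbf{f}\cdot\mathbf{u}
  &= \frac{d(\gamma m)}{dt}\,u^2 + \gamma m\,\mathbf{u}\cdot\mathbf{a}, \\
\frac{dE}{dt} - \mathbf{f}\cdot\mathbf{u}
  &= \frac{d(\gamma m)}{dt}\,(c^2 - u^2) - \gamma m\,\mathbf{u}\cdot\mathbf{a} \\
  &= \frac{c^2}{\gamma^2}\left(\gamma\,\frac{dm}{dt} + m\,\frac{d\gamma}{dt}\right)
     - \frac{m c^2}{\gamma^2}\,\frac{d\gamma}{dt}
   = \frac{c^2}{\gamma}\,\frac{dm}{dt}.
\end{align*}
```

The last step uses $\gamma m\, \mathbf{u} \cdot \mathbf{a} = (m c^2/\gamma^2)\, d\gamma/dt$. Since $E_k = E - mc^2$, the same computation gives $dE_k/dt - \mathbf{f} \cdot \mathbf{u} = (1/\gamma - 1)\, c^2\, dm/dt$, which is nonzero whenever $u \neq 0$ and $dm/dt \neq 0$, consistent with item (2).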
dt

[The End of Optional Reading 6.3.2]

To make a measurement, besides a standard clock, each observer also needs to be


equipped with a triad (3-dimensional frame). Intuitively, a triad is a frame welded
by three short rods with unit length that are orthogonal to each other (see Fig. 6.30);
the observer chooses which direction each rod points at, which represents a direction
of the measurement. Mathematically, a triad is abstracted as three orthonormal spatial
vector fields $\{(e_i)^a, i = 1, 2, 3\}$ on the observer’s world line; “spatial” means that
they are all orthogonal to the 4-velocity $Z^a$ of the observer. Hence, including
$(e_0)^a = Z^a$, there are four orthonormal vector fields along the observer’s world line, called
the tetrad field (4-dimensional frame field), see Fig. 6.31. Later, unless stated
otherwise, when we talk about a tetrad field it will refer to a right-handed tetrad field.
Recall that a reference frame is formed by infinitely many observers filling the
spacetime, and at each spacetime point there is one and only one observer’s world
line passing through it. Thus, given a reference frame, we will have a tetrad field in
the whole spacetime (or in an open subset of it). Any tensor at any spacetime point
can be expressed in terms of the tetrad at this point as a basis.

Previously, when we talked about an observer we meant a timelike curve, while


this is not sufficient in many situations, in which one also needs to add a requirement
on the tetrad, i.e., an observer is a timelike curve with a tetrad field defined along it.
The precise definition of an inertial observer is then: an inertial observer is a non-
rotating observer undergoing inertial motion. “Undergoing inertial motion” means
the world line is a geodesic (before, this was the only requirement for an inertial
observer), while “non-rotating” is the requirement for the tetrad field on the world
line. Intuitively, suppose two boys A and B sit on two chairs on the ground. A sits on
a regular chair while B sits on a swivel chair (whose base is fixed on the ground) and
keeps rotating. Then, A can be viewed as an inertial observer while B cannot (since
he is rotating). Note that although the concept of an observer requires us to treat the
two boys as point-like (and thus each of them is represented by a world line, and
both are geodesics), the question whether or not there is rotation only depends on
whether or not the direction of each spatial basis vector at each point changes along
the curve, and thus it is a meaningful question (more detailed discussion of this can
be found in Sect. 7.3). For instance, suppose R is an inertial coordinate system, treat
each of the t-coordinate lines in it as the world line of an observer, and choose the
inertial coordinate basis as the tetrad field along the curve. Then, intuitively speaking
(or according to the precise definition in Sect. 7.3), each observer is non-rotating.
Normally when we talk about an inertial observer in an inertial frame, we
assume that the inertial coordinate basis is used as the tetrad. Correspondingly, to
determine an instantaneous observer, $p$ and $Z^a$ alone are not enough: $Z^a$ must be
replaced by an orthonormal tetrad at $p$. Therefore, an instantaneous
observer should be represented by $(p, (e_\mu)^a)$, where $(e_0)^a = Z^a$. When it is not
necessary to emphasize the tetrad, an instantaneous observer can still be represented
by $(p, Z^a)$.
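To make the notion of an orthonormal tetrad at a point concrete, the following sketch constructs one from a given $Z^a$ by a Gram-Schmidt procedure with respect to $\eta_{ab}$. This is only one possible construction, chosen for illustration: it assumes inertial components with $\eta_{\mu\nu} = \mathrm{diag}(-1, 1, 1, 1)$, seed vectors that are linearly independent of $Z^a$ and of each other, and it does not check right-handedness or the non-rotating condition discussed above. The function name `orthonormal_tetrad` is illustrative.

```python
import math

ETA = (-1.0, 1.0, 1.0, 1.0)  # diagonal of eta_{ab}, c = 1

def minkowski_dot(a, b):
    """eta_{ab} a^a b^b for 4-component tuples."""
    return sum(e * x * y for e, x, y in zip(ETA, a, b))

def orthonormal_tetrad(Z, seeds=((0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1))):
    """Gram-Schmidt with respect to eta_{ab}, starting from (e_0)^a = Z^a.

    The seeds are assumed to be linearly independent of Z and of each other.
    """
    tetrad = [tuple(float(z) for z in Z)]
    for seed in seeds:
        w = [float(s) for s in seed]
        for e in tetrad:
            # Dividing by e.e handles the timelike leg (e.e = -1) and the spatial legs (e.e = +1).
            coeff = minkowski_dot(w, e) / minkowski_dot(e, e)
            w = [wi - coeff * ei for wi, ei in zip(w, e)]
        length = math.sqrt(minkowski_dot(w, w))   # w is now orthogonal to Z, hence spacelike
        tetrad.append(tuple(wi / length for wi in w))
    return tetrad

speed = 0.6                                       # hypothetical observer speed along x
gamma = 1.0 / math.sqrt(1.0 - speed**2)
Z = (gamma, gamma * speed, 0.0, 0.0)

tetrad = orthonormal_tetrad(Z)
for mu in range(4):
    # Each row should reproduce the corresponding row of diag(-1, 1, 1, 1).
    print([round(minkowski_dot(tetrad[mu], tetrad[nu]), 12) for nu in range(4)])
```

For this choice of $Z^a$ the first spatial leg comes out as $\gamma(v, 1, 0, 0)$, i.e., the boosted $x$-axis, while the other two legs coincide with the coordinate $y$- and $z$-directions.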

6.4 The Energy-Momentum Tensor of Continuous Media

When discussing media that are continuously distributed (gases, liquids, solids,
plasma, etc.), we care not about the behavior of any specific particle, but about the
statistical average over all of the particles. We are interested in the energy/momentum
density and energy/momentum flux density, etc. at each point of space rather than the
energy and momentum of any individual particle. Thus, a continuous medium is similar
to an electromagnetic field in many aspects, and we call it a matter field. Suppose
$m$ is the rest mass in a macroscopically small volume $V$, the content of which has a
3-velocity $\mathbf{u}$ relative to an inertial frame; then its 3-momentum is $\mathbf{p} = \gamma m \mathbf{u} = (E/c^2)\mathbf{u}$,
where $E$ is its energy and the meaning of $\gamma$ is self-evident. Dividing the whole equation
by $V$ yields

$$\text{3-momentum density} = \frac{1}{c^2}\, \text{energy density} \times \mathbf{u} = \frac{1}{c^2}\, \text{energy flux density} . \tag{6.4.1}$$
c c

Fig. 6.32 $T_{12}$ is the second component of the force from the matter below the area
element on the matter above; $\hat{T}^{ab}(e^1)_b$ is the 3-momentum flux density along the
$(e_1)^a$-direction (see optional reading)

(The second equality can be understood by means of the following example: in
electromagnetism, suppose ρ is the charge density and u is the velocity of the charge
carrier, then the electric current density is j = ρu.) When we take c = 1, the
3-momentum density is then equal to the energy flux density.
The expressions for the energy density, momentum density, energy flux density
(Poynting vector) and momentum flux density can be found in textbooks on
electrodynamics, where the energy flux density is equal to the momentum density times $c^2$.
Just like the energy and 3-momentum of a particle together form the 4-momentum
vector $P^a$, these density quantities of an electromagnetic field form a tensor $T_{ab}$
of type (0, 2), called the energy-momentum tensor, which is a tensor field on
4-dimensional Minkowski spacetime, and all kinds of 3-dimensional densities are
nothing but different components of $T_{ab}$. In fact, just like electromagnetic fields,
each matter field also has its own energy-momentum tensor $T_{ab}$, which has the
following important properties and physical meanings:
1. $T_{ab} = T_{ba}$.
2. For any matter field that is closed (without interaction with the outside) we have
$\partial^a T_{ab} = 0$. We will see below that this is exactly the manifestation of conservation
of the energy, 3-momentum and angular momentum (the conservation of angular
momentum also requires $T_{ab} = T_{ba}$).
3. For an arbitrary instantaneous observer $(p, (e_\mu)^a)$ with $(e_0)^a = Z^a$, we have
(a) $\mu \equiv T_{ab} Z^a Z^b = T_{00}$ is the energy density measured by this observer;
(b) $w_i \equiv -T_{ab} Z^a (e_i)^b = -T_{0i}$ is the i-component of the 3-momentum density
(energy flux density) measured by this observer;
(c) $T_{ab} (e_i)^a (e_j)^b = T_{ij}$ is the ij-component of the 3-stress tensor measured by
this observer. For instance, take a spatial unit area element perpendicular to $(e_1)^a$
(Fig. 6.32 is the spatial diagram), then $T_{12}$ is equal to the second component of
the force exerted by the matter below the area element on the matter above (see
textbooks on the theory of elasticity).
Thus, the energy-momentum tensor $T_{ab}$ is absolute, while the energy density,
3-momentum density, etc. are relative.
[Optional Reading 6.4.1]
Since $\{(e_i)^a\}$ is orthonormal, it is not difficult to show that $T^{ij} = T_{ij}$. Suppose $\{(e^\mu)_a\}$
is the dual basis of $\{(e_\mu)^a\}$, and let us discuss the physical meaning of the spatial tensor field
$\hat{T}^{ab} \equiv T^{ij}(e_i)^a(e_j)^b$ [or $\hat{T}_{ab} \equiv T_{ij}(e^i)_a(e^j)_b$]. Let S represent the spatial unit area element
perpendicular to $(e_i)^a$ (i is any of 1, 2, 3). It follows from the text above that

$$T^{ij} = T_{ij} = j\text{-component of the force from the matter on one side of } S \text{ to the matter on the other side}, \qquad (6.4.2)$$

and thus T̂ ab should be interpreted as the 3-stress tensor. On the other hand,

$$T^{ij} = \hat{T}^{ab}(e^i)_a(e^j)_b = [\hat{T}^{ab}(e^i)_b](e^j)_a = j\text{-component of } \hat{T}^{ab}(e^i)_b . \qquad (6.4.3)$$


Combining (6.4.2) and (6.4.3) yields

$$j\text{-component of } \hat{T}^{ab}(e^i)_b = j\text{-component of the force from the matter on one side of } S \text{ to the matter on the other side},$$

and hence

$$\hat{T}^{ab}(e^i)_b = \text{the force from the matter on one side of } S \text{ to the matter on the other side}.$$

Also, a force is nothing but the rate of change of the 3-momentum of the object which the
force acts on, and the interaction between them is nothing but exchanging their 3-momenta.
Thus,

$\hat{T}^{ab}(e^i)_b$ = the 3-momentum crossing a unit area perpendicular to $(e_i)^a$
along the direction of $(e_i)^a$ in a unit time
= the 3-momentum flux density along the direction of $(e_i)^a$.

The $(e_i)^a$ in the equation above can be the unit vector of any spatial direction, and so
this equation indicates that the 3-momentum flux density along any spatial direction can
be obtained by contracting $\hat{T}^{ab}$ with the unit vector of this direction. Therefore,
$\hat{T}^{ab} \equiv T^{ij}(e_i)^a(e_j)^b$ can be interpreted as (and called) the 3-momentum flux density tensor.
[The End of Optional Reading 6.4.1]

Definition 1 $W^a := -T^a{}_b Z^b$ is called the 4-momentum density measured by the
instantaneous observer $(p, Z^a)$.

Proposition 6.4.1 The 4-momentum density $W^a$ measured by the instantaneous
observer $(p, (e_\mu)^a)$, $(e_0)^a = Z^a$, can be decomposed as follows:

$$W^a = \mu Z^a + w^a , \qquad (6.4.4)$$

where μ and $w^a \equiv w^i (e_i)^a$ are respectively the energy density and 3-momentum
density measured by this observer, the latter of which is a spatial vector of this
observer.

Proof The components of $W^a$ in the frame $\{(e_\mu)^a\}$ are

$$W^0 = W^a (e^0)_a = -T^a{}_b Z^b (-Z_a) = T_{ab} Z^b Z^a = \mu ,$$

$$W^i = W^a (e^i)_a = -T^a{}_b Z^b (e^i)_a = -T_{ab} Z^b (e_i)^a = w_i .$$

Hence, $W^a = \mu (e_0)^a + w^i (e_i)^a = \mu Z^a + w^a$. □
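
The decomposition (6.4.4) can be checked numerically. The following is a minimal sketch in pure Python, assuming the signature (−, +, +, +) and c = 1; the sample components of $T_{ab}$, the boost speed 0.6 and the helper names `contract` and `boosted_tetrad` are illustrative assumptions and are not taken from the text.

```python
# Signature (-,+,+,+), c = 1; all components refer to one inertial frame {t, x, y, z}.
ETA = [-1.0, 1.0, 1.0, 1.0]  # diagonal entries of eta_ab (and of eta^ab)

def contract(T, u, v):
    """Return T_ab u^a v^b for a (0,2) tensor stored as a 4x4 nested list."""
    return sum(T[a][b] * u[a] * v[b] for a in range(4) for b in range(4))

def boosted_tetrad(v):
    """Orthonormal tetrad (e_mu)^a of an observer moving with speed v along x."""
    g = 1.0 / (1.0 - v * v) ** 0.5
    return [
        [g, g * v, 0.0, 0.0],    # (e_0)^a = Z^a
        [g * v, g, 0.0, 0.0],    # (e_1)^a
        [0.0, 0.0, 1.0, 0.0],    # (e_2)^a
        [0.0, 0.0, 0.0, 1.0],    # (e_3)^a
    ]

# Hypothetical symmetric components T_ab (illustrative numbers only).
T = [
    [2.0, -0.3, 0.1, 0.0],
    [-0.3, 0.5, 0.2, 0.0],
    [0.1, 0.2, 0.4, 0.0],
    [0.0, 0.0, 0.0, 0.3],
]

e = boosted_tetrad(0.6)
Z = e[0]

mu = contract(T, Z, Z)                           # energy density T_ab Z^a Z^b
w = [-contract(T, Z, e[i]) for i in (1, 2, 3)]   # w_i = -T_ab Z^a (e_i)^b

# Left-hand side of (6.4.4): W^a = -T^a_b Z^b = -eta^{ac} T_cb Z^b.
W = [-ETA[a] * sum(T[a][b] * Z[b] for b in range(4)) for a in range(4)]

# Right-hand side of (6.4.4): mu Z^a + w^i (e_i)^a.
W_rhs = [mu * Z[a] + sum(w[i] * e[i + 1][a] for i in range(3)) for a in range(4)]

max_diff = max(abs(W[a] - W_rhs[a]) for a in range(4))
print(mu, w, max_diff)
```

Changing the boost speed changes μ and $w_i$ while the tensor components `T` stay fixed, which illustrates that $T_{ab}$ is absolute whereas the densities are observer-dependent.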



Remark 1 Equations (6.4.4) and (6.3.32) are very similar: the left-hand side of the
latter is the 4-momentum P a , and the left-hand side of the former is the 4-momentum
density W a . Both equations are the 3 + 1 decomposition of a 4-vector. However, one
should notice a difference: the 4-momentum P a is independent of the observer, while
the 4-momentum density W a depends on the observer (from Definition 1 one can
see that W a is a 4-vector that depends on the observer).

Proposition 6.4.2 $\partial^a T_{ab} = 0$ ⇒ energy conservation.

Proof Suppose t, x, y, z are the coordinates for an inertial frame R, and let
$Z^a \equiv (\partial/\partial t)^a$. Then taking the derivative of $W^a \equiv -T^a{}_b Z^b$ yields

$$\partial_a W^a = \partial_a (-T^a{}_b Z^b) = -Z^b \partial^a T_{ab} - T^a{}_b \partial_a Z^b .$$

The first term on the right-hand side of the above equation vanishes (since $\partial^a T_{ab} = 0$),
as does the second term [since $\partial_a Z^b = \partial_a (\partial/\partial t)^b = 0$], and hence

$$\partial_a W^a = 0 . \qquad (6.4.5)$$

Therefore,

$$0 = \partial_\mu W^\mu = \partial_0 W^0 + \partial_i W^i = \partial_0 \mu + \partial_i w^i = \frac{\partial \mu}{\partial t} + \nabla \cdot \mathbf{w} . \qquad (6.4.6)$$

Since μ and $w^a$ are respectively the energy density and energy flux density measured
by the frame R, the equation above has the same form as the continuity equation
$(\partial\rho/\partial t) + \nabla \cdot \mathbf{j} = 0$ in electrodynamics. Following the reasoning that leads from
the latter to the conservation of electric charge, one can deduce that (6.4.6) leads to
the conservation of energy. □

Remark 2 One can also derive the conservation of 3-dimensional momentum and
angular momentum from $\partial^a T_{ab} = 0$, and thus $\partial^a T_{ab} = 0$ is also called the
conservation equation.
[Optional Reading 6.4.2]
The conservation of energy can also be derived directly from (6.4.5) using the
4-dimensional version of Gauss’s theorem as follows: let Ω be the 4-dimensional “cuboid”
bounded by several hypersurfaces (3-dimensional!) in $\mathbb{R}^4$ (see Fig. 6.33; one dimension is
suppressed in the figure), i.e., (a segment of) the world tube of the 3-dimensional rectangular
box ω (shown in Fig. 6.34). It follows from Gauss’s theorem and (6.4.5) that

$$0 = \int_{\partial\Omega} W^a n_a = \int_{\sigma_1} W^a n_a + \int_{\sigma_2} W^a n_a + \int_{\Delta} W^a n_a . \qquad (6.4.7)$$

σ1 and σ2 are the “upper and lower bases” of Ω, and Δ represents all of the “sides” of Ω.
Noticing the requirement on the direction of the normal vector in (5.5.7 ), we can see that
the normal vectors of σ1, σ2 and Δ1 (one of the side surfaces) are in the directions shown in
Fig. 6.33. Thus,
$$\int_{\sigma_1} W^a n_a = \int_{\sigma_1} (\mu Z^a + w^a) n_a = \int_{\sigma_1} \mu Z^a Z_a = -\int_{\sigma_1} \mu = -E_1 = -(\text{the energy of the 3d box } \omega \text{ at } t_1) ,$$

where (6.4.4) is used in the first equality. Similarly,

$$\int_{\sigma_2} W^a n_a = E_2 = (\text{the energy of } \omega \text{ at } t_2) .$$

On the other hand,

$$\int_{\Delta_1} W^a n_a = -\int_{\Delta_1} T_{ab} Z^b (\partial/\partial x)^a = \int_{\Delta_1} w_1 = \int_{\Delta_1} w_1 \hat\varepsilon ,$$

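where $\hat\varepsilon$ is the 3-dimensional volume element induced by the 4-dimensional volume element
$\varepsilon = dt \wedge dx \wedge dy \wedge dz$ on Δ1, i.e.,

$$\hat\varepsilon_{abc} = (\partial/\partial x)^d (dt)_d \wedge (dx)_a \wedge (dy)_b \wedge (dz)_c = -(dt)_a \wedge (dy)_b \wedge (dz)_c .$$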

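Hence,

$$\int_{\Delta_1} W^a n_a = -\int_{\Delta_1} w_1\, dt \wedge dy \wedge dz = \int_{\Delta_1} w_1\, dt\, dy\, dz = \int_{t_1}^{t_2}\!\int_{y_1}^{y_2}\!\int_{z_1}^{z_2} (w_1\, dy\, dz)\, dt , \qquad (6.4.8)$$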
where the minus sign is dropped in the second equality because {t, y, z} is a left-handed
coordinate system as measured by $\hat\varepsilon = -dt \wedge dy \wedge dz$. Recalling that $w_1$ is the energy flux
density along the direction of $(\partial/\partial x)^a$, we can see that $\int_{y_1}^{y_2}\int_{z_1}^{z_2} (w_1\, dy\, dz)\, dt$ is the energy
flowing out of the side wall $S_1$ of ω within a time dt, and hence the right-hand side of
(6.4.8) is the energy flowing out of ω through the side wall $S_1$ (see Fig. 6.34) in a time $t_2 - t_1$,
and $-\int_{\Delta} W^a n_a$ is the energy flowing into ω through all of the side walls in $t_2 - t_1$, i.e., the energy
increase in this period of time. Therefore, (6.4.7) indicates that:

the energy increase of the box ω in $t_2 - t_1$
= the energy flowing into ω from each side wall.

Thus, the energy is conserved.

Finally, we shall point out one subtlety in the derivation of $\int_{\sigma_1} W^a n_a = -E_1$. The expression
$\int_{\sigma_1} W^a n_a$ is an abbreviation of $\int_{\sigma_1} (W^a n_a)\hat\varepsilon$, where $\hat\varepsilon$ is the induced volume element on σ1,
which, it seems, should be expressed according to (5.5.6) as $\hat\varepsilon_{abc} = n^d \varepsilon_{dabc}$. However, the $n^a$
in (5.5.6) is an outgoing unit normal vector, which differs from the $n^a$ here (see Fig. 6.33)
by a minus sign, and thus $\hat\varepsilon$ should be expressed using the $n^a$ here as

$$\hat\varepsilon_{abc} = -n^d \varepsilon_{dabc} = -(\partial/\partial t)^d (dt)_d \wedge (dx)_a \wedge (dy)_b \wedge (dz)_c = -(dx)_a \wedge (dy)_b \wedge (dz)_c .$$

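This indicates that the coordinate system {x, y, z} on σ1 is left-handed as measured by $\hat\varepsilon$.
Therefore,

$$\int_{\sigma_1} (W^a n_a)\hat\varepsilon = -\int_{\sigma_1} \mu\,\hat\varepsilon = \int_{\sigma_1} \mu\, (dx)_a \wedge (dy)_b \wedge (dz)_c = -\int_{\sigma_1} \mu\, dx\, dy\, dz = -E_1 .$$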

Fig. 6.33 Ω is the world tube of the 3-dimensional box ω in Fig. 6.34 (with one
dimension suppressed)

Fig. 6.34 The 3-dimensional box ω (spatial diagram)

[Since {x, y, z} is a left-handed system, we used (5.2.6) in the third equality.] Although the
conclusion is still $\int_{\sigma_1} W^a n_a = -E_1$, one should note that two minus signs appear in the
derivation and cancel each other, which is why the result is unchanged.
[The End of Optional Reading 6.4.2]

6.5 Perfect Fluid Dynamics

Definition 1. A perfect fluid is a matter field whose energy-momentum tensor can
be expressed as

$$T_{ab} = \mu U_a U_b + p(\eta_{ab} + U_a U_b) = (\mu + p) U_a U_b + p\,\eta_{ab} , \qquad (6.5.1)$$

where μ and p are functions (scalar fields), and $U^a$ is a future-directed timelike vector
field which satisfies $U^a U_a = -1$, called the 4-velocity field of the perfect fluid.
A fluid itself can be viewed as a reference frame. Suppose the 4-velocity $(e_0)^a$
of an instantaneous observer $(p, (e_\mu)^a)$ satisfies $(e_0)^a = U^a|_p$; then this observer
is at rest relative to the fluid reference frame, and thus is called an instantaneous
rest observer. However, relative to another reference frame, this observer moves with
the fluid, and hence $(p, U^a|_p)$ is also called an instantaneous comoving observer. For
a comoving observer,

$$T_{ab} (e_0)^a (e_0)^b = T_{ab} U^a U^b = (\mu + p) U_a U_b U^a U^b + p\,\eta_{ab} U^a U^b = (\mu + p) - p = \mu .$$

Thus, the μ in (6.5.1) is the energy density measured by a comoving observer,
also called the proper energy density. Let $(e_i)^a$ represent the triad of a comoving
observer; it follows from (6.5.1) that

$$T_{ab} (e_i)^a (e_j)^b = p\,\eta_{ab} (e_i)^a (e_j)^b = p\,\delta_{ij} .$$

Thus, the 3-dimensional stress tensor measured by a comoving observer has the
matrix form

$$\begin{pmatrix} p & 0 & 0 \\ 0 & p & 0 \\ 0 & 0 & p \end{pmatrix} ,$$

i.e., there is only pressure but no shear stress (which is exactly an important property
of a perfect fluid⁴). From $T_{11} = T_{22} = T_{33} = p$ and the arbitrariness of the triad of a
comoving observer we can see that a perfect fluid is isotropic.⁵ Also, $T_{ab} (e_0)^a (e_i)^b = 0$
indicates that the energy flux density measured by a comoving observer is zero, and
thus there is no thermal conduction. All of these are important properties of a perfect
fluid.
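
These properties can be verified numerically. The sketch below, in pure Python with signature (−, +, +, +) and c = 1, builds the components of (6.5.1) for a fluid moving along x and evaluates them in the comoving tetrad. The values μ = 1, p = 0.2, v = 0.5 and the helper names `perfect_fluid_T`, `contract` and `tetrad` are illustrative assumptions, not part of the text.

```python
# Minkowski metric eta_ab with signature (-,+,+,+); c = 1.
ETA = [[0.0] * 4 for _ in range(4)]
ETA[0][0] = -1.0
for k in (1, 2, 3):
    ETA[k][k] = 1.0

def perfect_fluid_T(mu, p, U_up):
    """Components T_ab = (mu + p) U_a U_b + p eta_ab, given the components U^a."""
    U_low = [sum(ETA[a][b] * U_up[b] for b in range(4)) for a in range(4)]
    return [[(mu + p) * U_low[a] * U_low[b] + p * ETA[a][b] for b in range(4)]
            for a in range(4)]

def contract(T, u, v):
    """Return T_ab u^a v^b."""
    return sum(T[a][b] * u[a] * v[b] for a in range(4) for b in range(4))

def tetrad(v):
    """Orthonormal tetrad of an observer moving with speed v along x."""
    g = 1.0 / (1.0 - v * v) ** 0.5
    return [[g, g * v, 0.0, 0.0],
            [g * v, g, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

mu, p, v = 1.0, 0.2, 0.5          # hypothetical proper energy density, pressure, speed
e = tetrad(v)                      # comoving tetrad: (e_0)^a = U^a
T = perfect_fluid_T(mu, p, e[0])

# Measurements of a comoving observer.
energy_density = contract(T, e[0], e[0])                      # expected: mu
energy_flux = [-contract(T, e[0], e[i]) for i in (1, 2, 3)]   # expected: zero
stress = [[contract(T, e[i], e[j]) for j in (1, 2, 3)] for i in (1, 2, 3)]  # p * delta_ij

# For comparison: the observer at rest in the inertial frame, Z^a = (d/dt)^a,
# measures the energy density gamma^2 (mu + p) - p.
lab = tetrad(0.0)
lab_energy_density = contract(T, lab[0], lab[0])

print(energy_density, energy_flux, stress, lab_energy_density)
```

With these sample numbers the non-comoving observer finds a larger energy density than μ, consistent with the statement that μ is the proper (comoving) energy density.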
It is necessary to give an explanation of the physical meaning of the 4-velocity
field U a . A perfect fluid is a continuous medium, which is a model obtained from the
statistical average over the microscopic discrete structure of the particles. Usually,
a fluid volume element that is large enough microscopically while small enough
macroscopically is called a fluid particle or fluid point mass [see Landau and
Lifshitz (1987) p. 1; Zhou et al. (2000) pp. 15–17]. The U a in (6.5.1) is the vector
field formed by the 4-velocity of all fluid particles. A comoving observer is the
observer at rest relative to a fluid particle, and a comoving reference frame (rest
reference frame) is the reference frame of the observers whose 4-velocity field is U a .
One should note the conceptual difference between fluid particles and microscopic
particles that form a fluid. This difference is especially prominent for an ideal gas
(which is an example of a perfect fluid). Due to frequent collisions, the world lines
of the gas molecules intersect frequently. Since the 4-velocity of a molecule changes
abruptly in a collision, the world lines of the molecules look very different from those
in Fig. 6.35, and so one should not treat the $U^a$ in (6.5.1) as the 4-velocity of a specific
molecule. In fact, we have already taken the statistical average over the microscopic
motion of the molecules when we regard an ideal gas as a perfect fluid, and U a
is the 4-velocity field after the average. Consider a box at rest in an inertial frame
{t, x, y, z}, which contains an ideal gas in thermal equilibrium. Since there is no
special direction, the average 3-velocity of the gas molecules is zero, and hence
$U^a = (\partial/\partial t)^a$, whose integral curves are the t-coordinate lines, as shown in Fig. 6.36.
Thus, a comoving observer is not an observer moving with a gas molecule, but is the
inertial observer at rest relative to the box.
The pressure p and the mass density μ of a perfect fluid have the following
well-known relation:

4 In Newtonian hydrodynamics, a perfect fluid is defined as a fluid with no thermal conductivity or
viscosity (and thus no shear stress for a rest observer); see Landau and Lifshitz (1987) p. 3.
5 As long as there exists a reference frame in which the measurement of a fluid has no directional
preference, the fluid is said to be isotropic. We have shown that a comoving frame meets this
requirement, so a perfect fluid is isotropic.

Fig. 6.35 The tangent vectors of the world lines of fluid particles form the fluid
4-velocity field $U^a$

Fig. 6.36 The 4-velocity field of an ideal gas (as a perfect fluid)

$$p = \frac{\mu\,\overline{u^2}}{3} , \qquad (6.5.2)$$

where $\overline{u^2}$ is the average of the square of the random motion velocity of each molecule.
Since $\overline{u^2} \ll c^2$, we have $(p/c^2) \ll \mu$, which in the unit system with c = 1 is $p \ll \mu$,
i.e., the pressure is much less than the density. This conclusion holds for any
non-relativistic fluid; for example, $p/\mu \sim 10^{-12}$ in a hurricane, and $p/\mu \sim 10^{-10}$ in the
Earth’s core. However, for relativistic fluids it will be quite different. The electromagnetic
radiation that reaches thermal equilibrium in an isothermal box (which is called
blackbody radiation) can be viewed as an example of an extreme relativistic perfect
fluid, where the reference frame at rest relative to the box is the rest frame (comoving
frame) of the fluid. The radiation inside the box is isotropic in this frame, and thus
this frame is also called the isotropic reference frame of blackbody radiation. The
electromagnetic radiation in the box has many similarities with an ideal gas, and can
be called a photon gas. The relation between the pressure p and energy density μ
of a photon gas also satisfies (6.5.2) (of course the derivation is different), but now
$\overline{u^2} = 1$, and hence

$$p = \frac{\mu}{3} . \qquad (6.5.3)$$
The key point for why blackbody radiation can be regarded as a perfect fluid is that,
relative to the isotropic reference frame, its photons have random motions in all
directions that are sufficiently disordered (see Appendix D in Volume II for details).
In contrast, the light rays coming from a searchlight cannot be regarded as a perfect
fluid, since there does not exist a reference frame in which these light rays are
isotropic.
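
To get a feeling for the orders of magnitude in (6.5.2) and (6.5.3), the following minimal sketch evaluates p/μ in units where speeds are measured against c. The molecular speed of 500 m/s is an assumed round figure for air near room temperature, not a value from the text, and the function name `pressure_to_density_ratio` is hypothetical.

```python
C = 299_792_458.0  # speed of light in m/s

def pressure_to_density_ratio(u_rms):
    """p/mu = <u^2> / (3 c^2), from (6.5.2), with the rms speed u_rms in m/s."""
    return (u_rms / C) ** 2 / 3.0

# Assumed typical thermal speed of air molecules (order of magnitude only).
ratio_air = pressure_to_density_ratio(500.0)

# Photon gas: every "molecule" moves at c, which reproduces (6.5.3).
ratio_photon = pressure_to_density_ratio(C)

print(ratio_air, ratio_photon)
```

The first ratio comes out at roughly $10^{-12}$, the same order as the hurricane estimate quoted above, while the second is exactly 1/3.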

A perfect fluid in Newtonian mechanics obeys two important laws, namely the
continuity equation that describes the rate of change of the mass density μ,

$$\frac{\partial \mu}{\partial t} + \nabla \cdot (\mu \mathbf{u}) = 0 \quad \text{(reflects the conservation of mass)} , \qquad (6.5.4)$$

and the Euler equation that describes the rate of change of the 3-velocity u (see
Optional Reading 6.5.1 for a derivation)

$$-\nabla p = \mu \left[ \frac{\partial \mathbf{u}}{\partial t} + (\mathbf{u} \cdot \nabla)\mathbf{u} \right] . \qquad (6.5.5)$$

Now we will introduce the generalization of these two laws in relativistic perfect
fluid mechanics. Suppose a perfect fluid has no interaction with the outside, then its
energy-momentum tensor satisfies $\partial^a T_{ab} = 0$. It follows from (6.5.1) that

$$0 = \partial^a T_{ab} = U_a U_b \partial^a (\mu + p) + (\mu + p)(U^a \partial_a U_b + U_b \partial_a U^a) + \partial_b p . \qquad (6.5.6)$$

This is an equality of 4-vectors, which can be projected onto the spatial and time
directions of a comoving observer. Contracting $U^b$ with the equation above yields

$$0 = U^b \partial^a T_{ab} = -U_a \partial^a (\mu + p) + (\mu + p)(U^b U^a \partial_a U_b - \partial_a U^a) + U^b \partial_b p .$$

Noticing

$$U^b U^a \partial_a U_b = \frac{1}{2} U^a \partial_a (U^b U_b) = 0 \quad (\text{since } U^b U_b = -1 = \text{constant}) ,$$

we have

$$U^a \partial_a \mu + (\mu + p)\,\partial_a U^a = 0 . \qquad (6.5.7)$$

This is the projection of (6.5.6) in the time direction. To find the spatial projection,
we contract the projection map $h_c{}^b = \delta_c{}^b + U_c U^b$ with (6.5.6) and obtain

$$(\mu + p) U^a \partial_a U_c + \partial_c p + U_c U^b \partial_b p = 0 . \qquad (6.5.8)$$

Equations (6.5.7) and (6.5.8) are the relativistic equations of motion for a perfect
fluid. A perfect fluid with zero pressure is called a dust. For a dust, (6.5.8) simplifies
to $U^a \partial_a U_c = 0$, and thus the world line of a dust particle is a geodesic. This
is natural since p = 0 indicates that there is no force exerted on the particle. To
find the non-relativistic approximation of (6.5.7) and (6.5.8), we choose an arbitrary
inertial frame $\{t, x^i\}$ and make the 3 + 1 decomposition of $U^a$ [see (6.3.21)]:

$$U^a = \gamma [(\partial/\partial t)^a + u^a] \cong (\partial/\partial t)^a + u^a , \qquad (6.5.9)$$

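where $u^a$ is the 3-velocity of the fluid in this system, and $\gamma = -(\partial/\partial t)^a U_a$ is
approximated as 1 in the non-relativistic limit. Plugging (6.5.9) into (6.5.7) and noticing
that $p \ll \mu$, we get (the approximation symbol is omitted from now on)

$$0 = \left( \frac{\partial}{\partial t} \right)^a \partial_a \mu + u^a \partial_a \mu + \mu\, \partial_a u^a = \frac{\partial \mu}{\partial t} + \partial_a (\mu u^a) .$$

Since $u^a$ is a spatial vector in the inertial frame we are using, $\partial_a (\mu u^a) = \partial_i (\mu u^i) = \nabla \cdot (\mu \mathbf{u})$,
and hence the equation above is exactly the continuity equation (6.5.4).
Contracting $(\partial/\partial x^i)^c$ with (6.5.8) and noticing (6.5.9) and $p \ll \mu$, we get

$$\begin{aligned}
0 &= \mu \left[ \left( \frac{\partial}{\partial t} \right)^a \partial_a u_i + u^a \partial_a u_i \right] + \left( \frac{\partial}{\partial x^i} \right)^c \partial_c p + u_i \left[ \left( \frac{\partial}{\partial t} \right)^b + u^b \right] \partial_b p \\
&= \mu \left[ \frac{\partial u_i}{\partial t} + u^a \partial_a u_i \right] + \frac{\partial p}{\partial x^i} + u_i \frac{\partial p}{\partial t} + u_i u^j \frac{\partial p}{\partial x^j} .
\end{aligned}$$

In the non-relativistic case ($u \ll 1$) we also have $u_i\, \partial p/\partial t \ll \partial p/\partial x^i$ and
$u_i u^j\, \partial p/\partial x^j \ll \partial p/\partial x^i$, and hence the last two terms in the equation above can be
neglected compared with $\partial p/\partial x^i$. Written in the form of a 3-vector equation, this is
exactly the Euler equation (6.5.5).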
[Optional Reading 6.5.1]
The readers who have learned Newtonian fluid mechanics will know that there exist
two descriptions of a fluid, namely the Lagrangian approach and the Eulerian approach [see,
for example, Zhou et al. (2000)]. The former focuses on fluid particles (the spatial
trajectory of a fluid particle is called a pathline), while the latter focuses on spatial
points (there is a flow velocity vector at each spatial point, and thus there is a flow velocity
vector field u in the 3-dimensional space, whose integral curves are called streamlines). One
advantage of the Eulerian description is that a flow velocity field can be defined on space.
Using the 4-dimensional language, one can acquire a deeper understanding of the difference
and relationship between these two descriptions. In the 4-dimensional language, the world
lines of fluid particles fill an open set O of the spacetime (∀p ∈ O there is a unique world
line passing through p). Since the Lagrangian description focuses on fluid particles (point
masses), in the 4-dimensional language it focuses on the world lines, whose tangent vectors
$U^a$ form a 4-dimensional vector field. That is, the Lagrangian description can be naturally
translated into a 4-dimensional description. In contrast, the Eulerian description is intrinsically
a (3 + 1)-description, since the concept of a “spacetime point” already exists in the (3 + 1)-
language: it is nothing but a point p on a surface $\Sigma_t$ of simultaneity in an inertial frame R. In
the 4-dimensional point of view, a spatial point is actually a world line of an inertial observer
in R, such as P in Fig. 6.37. ∀p ∈ O, let $\hat{u}^a|_p$ be the projection of $U^a|_p$ on the surface
$\Sigma_t$ of simultaneity passing through p; then we can see from (6.3.30) that $u^a|_p \equiv \gamma^{-1} \hat{u}^a|_p$
[where $\gamma \equiv -(U^a Z_a)|_p$] is the 3-velocity corresponding to the 4-velocity $U^a|_p$. Having a
value of $u^a$ at each point of the open set O, we obtain a spatial flow velocity field on O,
whose dependence on the spacetime point can be expressed using the inertial coordinates
$t, x^i$ of R as $u^a(t, x^i)$; on each surface $\Sigma_t$ of simultaneity it gives rise to Euler’s flow velocity
vector field u at the time t. As an example, we will now derive Euler’s equation in order to
further illustrate this idea. Imagine a fluid particle as a small cube with a volume V. It is not
difficult to show that the force f acting on the particle satisfies f/V = −∇p, where p is
the pressure at the location of the particle. Suppose the mass of the fluid particle is m, then

Fig. 6.37 The definition of Euler’s spatial flow velocity field $u^a$

$$-\nabla p = \frac{\mathbf{f}}{V} = \frac{m\, d\mathbf{u}/dt}{V} = \mu \frac{d\mathbf{u}}{dt} , \qquad (6.5.10)$$

where u is the 3-velocity of the fluid particle. There are two reasons that u changes with
time: ① the 3-velocity u at each spatial point can change with time (p and p′ in Fig. 6.37
can have different $u^a$); ② a fluid particle can move from one spatial point to another
(the mass point L in Fig. 6.37 moves from the spatial point P to Q), and the way it
moves is described by the parametric equations $x^i = x^i(t)$ of its trajectory. Let $\mathbf{u}(t, x^i(t))$
represent the dependence of u on t due to these two factors. Then, (6.5.10) can be expressed
as

$$-\nabla p = \mu \frac{d\mathbf{u}}{dt} = \mu \left[ \frac{\partial \mathbf{u}}{\partial t} + \frac{\partial \mathbf{u}}{\partial x^i} \frac{dx^i(t)}{dt} \right] = \mu \left[ \frac{\partial \mathbf{u}}{\partial t} + (\mathbf{u} \cdot \nabla)\mathbf{u} \right] ,$$

which is Euler’s equation (6.5.5).
[The End of Optional Reading 6.5.1]

6.6 Electrodynamics

6.6.1 Electromagnetic Fields and 4-Current Densities

As is well-known, Maxwell’s theory of electromagnetism is Lorentz covariant. The goal
of this section is to reformulate the main contents of electrodynamics using the
4-dimensional language.
There are two kinds of fields involved in electrodynamics: ① the electromagnetic
field; ② the matter field (a continuous fluid) formed by all of the charged particles. The
latter is not only the source of the electromagnetic field, but also interacts
with the electromagnetic field.
In the 4-dimensional language, the electromagnetic field is described by a 2-form
field Fab in Minkowski spacetime (called the electromagnetic field tensor). The
electric field E and magnetic field B that are familiar to readers are two spatial vec-
tors obtained by an observer measuring Fab .

Definition 1 The electric field $E^a$ and the magnetic field $B^a$ measured by an
instantaneous observer $(p, Z^a)$ are defined by the following equations

$$E_a := F_{ab} Z^b , \qquad B_a := -{}^*F_{ab} Z^b , \qquad (E^a := \eta^{ab} E_b , \ B^a := \eta^{ab} B_b) , \qquad (6.6.1)$$

where ${}^*F_{ab}$ is the dual differential form of $F_{ab}$ (see Sect. 5.6), which is also a 2-form
field.

Proposition 6.6.1 $E^a$ and $B^a$ are spatial vector fields of the instantaneous observer
$(p, (e_\mu)^a)$, $(e_0)^a = Z^a$, and

$$E_1 = F_{10} , \ E_2 = F_{20} , \ E_3 = F_{30} ; \qquad B_1 = F_{23} , \ B_2 = F_{31} , \ B_3 = F_{12} . \qquad (6.6.2)$$

Proof Since $F_{ab} = F_{[ab]}$, $Z^a Z^b = Z^{(a} Z^{b)}$, and ${}^*F_{ab} = {}^*F_{[ab]}$, we have

$$E_a Z^a = F_{ab} Z^a Z^b = 0 , \qquad B_a Z^a = -{}^*F_{ab} Z^a Z^b = 0 ,$$

and thus $E^a$ and $B^a$ are spatial vectors of the instantaneous observer $(p, Z^a)$. Since

$$E_i = E_a (e_i)^a = F_{ab} Z^b (e_i)^a = F_{ab} (e_0)^b (e_i)^a = F_{i0} ,$$

we have $E_1 = F_{10}$, $E_2 = F_{20}$, $E_3 = F_{30}$. Also, since

$$B_i = B_a (e_i)^a = -{}^*F_{ab} Z^b (e_i)^a = -\frac{1}{2} \varepsilon_{abcd} F^{cd} (e_0)^b (e_i)^a = \frac{1}{2} \varepsilon_{0icd} F^{cd} = \frac{1}{2} \varepsilon_{0ijk} F^{jk} ,$$

we have $B_1 = \frac{1}{2} (\varepsilon_{0123} F^{23} + \varepsilon_{0132} F^{32}) = F^{23} = F_{23}$, and similarly $B_2 = F_{31}$,
$B_3 = F_{12}$. □

From Proposition 6.6.1 we can see that the matrix constituted by the components of
$F_{ab}$ in terms of the observer’s tetrad $(e_\mu)^a$ is

$$(F_{\mu\nu}) = \begin{bmatrix} 0 & -E_1 & -E_2 & -E_3 \\ E_1 & 0 & B_3 & -B_2 \\ E_2 & -B_3 & 0 & B_1 \\ E_3 & B_2 & -B_1 & 0 \end{bmatrix} . \qquad (6.6.3)$$

Proposition 6.6.2 Suppose two inertial frames R and R′ are related by the Lorentz
transformation

$$t = \gamma (t' + v x') , \quad x = \gamma (x' + v t') , \quad y = y' , \quad z = z' . \qquad (6.6.4)$$

Then, the values (E, B) and (E′, B′) of the same electromagnetic field $F_{ab}$ measured
by two observers in these two frames have the following relationship:
$$\begin{aligned}
E'_1 &= E_1 , & E'_2 &= \gamma (E_2 - v B_3) , & E'_3 &= \gamma (E_3 + v B_2) ; \\
B'_1 &= B_1 , & B'_2 &= \gamma (B_2 + v E_3) , & B'_3 &= \gamma (B_3 - v E_2) .
\end{aligned} \qquad (6.6.5)$$

Proof Exercise 6.14. □

Proposition 6.6.3 Suppose the orthonormal tetrads of two instantaneous observers
$(p, (e_\mu)^a)$ and $(p, (e'_\mu)^a)$ at p have the following relation: $(e'_2)^a = (e_2)^a$, $(e'_3)^a = (e_3)^a$.
Then, the values (E, B) and (E′, B′) of the same electromagnetic field measured
by these two observers also have the relation (6.6.5), in which $\gamma \equiv -(e_0)^a (e'_0)_a$.

Proof This proposition is only about the local measurement at p and does not involve
any derivative. Choose the inertial frame R such that the 4-velocity of the observer
whose world line passes through p is $(e_0)^a$, and choose another inertial frame R′ such that
the 4-velocity of the observer whose world line passes through p is $(e'_0)^a$. Then, the relation
between R and R′ will be (6.6.4). Hence, we have (6.6.5). □

Proposition 6.6.3 indicates that (6.6.5) holds for any two instantaneous observers at
any spacetime point p that satisfy $(e'_2)^a = (e_2)^a$ and $(e'_3)^a = (e_3)^a$, which clarifies
the misunderstanding that “(6.6.5) only holds for an inertial frame.”
[Optional Reading 6.6.1]
Propositions 6.6.2 and 6.6.3 can also be proved using the orthonormal frame
transformation (see Fig. 6.38). According to (6.3.30), the 3 + 1 decomposition of the 4-velocity
$U^a \equiv (e'_0)^a$ of the instantaneous observer $(p, (e'_0)^a)$ relative to the instantaneous observer
$(p, (e_0)^a)$ gives

$$(e'_0)^a = \gamma (e_0)^a + \gamma u^a .$$

Since the 3-velocity $u^a$ is in the same direction as $(e_1)^a$, and $(e_1)^a$ is normalized, we have
$u^a = u (e_1)^a$, and thus the above equation becomes

$$(e'_0)^a = \gamma (e_0)^a + \gamma u (e_1)^a . \qquad (6.6.6)$$

This is the expansion of $(e'_0)^a$ in the orthonormal frame $\{(e_\mu)^a\}$. Now suppose the expansion
of $(e'_1)^a$ is

$$(e'_1)^a = \alpha (e_0)^a + \beta (e_1)^a \quad (\alpha, \beta \text{ to be determined}) .$$

It follows from $\eta_{ab} (e'_1)^a (e'_0)^b = 0$ and $\eta_{ab} (e'_1)^a (e'_1)^b = 1$ that β = γ, α = γu, and hence

$$(e'_1)^a = \gamma u (e_0)^a + \gamma (e_1)^a . \qquad (6.6.7)$$

Equations (6.6.6) and (6.6.7) together with $(e'_2)^a = (e_2)^a$ and $(e'_3)^a = (e_3)^a$ are the
transformation relations of the two orthonormal frames $\{(e'_\mu)^a\}$ and $\{(e_\mu)^a\}$, using which it is
easy to prove (6.6.5). Take $E'_2$ for example:

$$E'_2 = F'_{20} = F_{ab} (e'_2)^a (e'_0)^b = F_{ab} (e_2)^a [\gamma (e_0)^b + \gamma u (e_1)^b] = \gamma (F_{20} + u F_{21}) = \gamma (E_2 - u B_3) .$$

[The End of Optional Reading 6.6.1]
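
The tetrad relations (6.6.6) and (6.6.7) make (6.6.5) easy to check numerically. The sketch below, in pure Python with c = 1, builds $F_{\mu\nu}$ from (6.6.3), evaluates its components in the boosted tetrad, and compares the result with (6.6.5). The sample field values, the speed 0.6 and the helper names `field_tensor`, `boosted_tetrad`, `measure` and `formula_6_6_5` are illustrative assumptions.

```python
def field_tensor(E, B):
    """Components F_{mu nu} of (6.6.3) from E = (E1, E2, E3) and B = (B1, B2, B3)."""
    E1, E2, E3 = E
    B1, B2, B3 = B
    return [[0.0, -E1, -E2, -E3],
            [E1, 0.0, B3, -B2],
            [E2, -B3, 0.0, B1],
            [E3, B2, -B1, 0.0]]

def boosted_tetrad(v):
    """Tetrad (e'_mu)^a of (6.6.6), (6.6.7), expanded in the unprimed frame."""
    g = 1.0 / (1.0 - v * v) ** 0.5
    return [[g, g * v, 0.0, 0.0],     # (e'_0)^a
            [g * v, g, 0.0, 0.0],     # (e'_1)^a
            [0.0, 0.0, 1.0, 0.0],     # (e'_2)^a = (e_2)^a
            [0.0, 0.0, 0.0, 1.0]]     # (e'_3)^a = (e_3)^a

def measure(F, e):
    """Read off E_i = F_{i0}, B_1 = F_{23}, B_2 = F_{31}, B_3 = F_{12} in the tetrad e."""
    Fp = [[sum(F[a][b] * e[m][a] * e[n][b] for a in range(4) for b in range(4))
           for n in range(4)] for m in range(4)]
    return (Fp[1][0], Fp[2][0], Fp[3][0]), (Fp[2][3], Fp[3][1], Fp[1][2])

def formula_6_6_5(E, B, v):
    """Primed fields according to (6.6.5)."""
    g = 1.0 / (1.0 - v * v) ** 0.5
    E1, E2, E3 = E
    B1, B2, B3 = B
    return ((E1, g * (E2 - v * B3), g * (E3 + v * B2)),
            (B1, g * (B2 + v * E3), g * (B3 - v * E2)))

E, B, v = (1.0, 2.0, -0.5), (0.3, -1.2, 0.7), 0.6   # hypothetical sample values
E_meas, B_meas = measure(field_tensor(E, B), boosted_tetrad(v))
E_form, B_form = formula_6_6_5(E, B, v)

max_diff = max(abs(x - y) for x, y in zip(E_meas + B_meas, E_form + B_form))
print(E_meas, B_meas, max_diff)
```

Because only the tetrad at a single point enters `measure`, the check is purely local, in line with Proposition 6.6.3.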

The sources of the electromagnetic field are electric charges and electric currents.
In the 4-dimensional language, the continuously distributed electric charges and cur-
rents can be viewed as a dust formed by a large amount of charged particles [see

Fig. 6.38 The relationship between two orthonormal frames

Fig. 6.39 The volumes $V_0$ and V measured by the comoving observer $U^a$ and a
non-comoving observer $Z^a$ are different

Synge (1956), Chap. VIII Sect. 10, Chap. X Sect. 7]. To simplify the question, we
only talk about the case where all the charged particles are of the same kind (e.g.,
they are all electrons), whose electric charge is e.⁶ Let $U^a$ represent the 4-velocity
field of this charged dust; then $(p, U^a)$ is the instantaneous comoving observer at p.
Suppose there are N charged particles in the small volume $V_0$ of the local surface
of simultaneity perpendicular to $U^a$; then $\eta_0 = N/V_0$ is the particle number density
measured by the comoving observer (called the proper number density). Let
$(p, Z^a)$ be an arbitrary instantaneous observer at p. This observer will regard the
particles as in motion, i.e., will see a current, as long as it is not a comoving observer
(as long as $Z^a \neq U^a$). Suppose the N particles above take a volume V in the local
surface of simultaneity of $(p, Z^a)$ perpendicular to $Z^a$ (see Fig. 6.39); then from the
Lorentz contraction we know that $V_0 = \gamma V$, where $\gamma \equiv -Z^a U_a$, and thus the particle
number density measured by the observer $(p, Z^a)$ is $\eta = N/V = \gamma N/V_0 = \gamma \eta_0$.
Therefore, $\rho_0 \equiv e\eta_0$ and $\rho \equiv e\eta$ are, respectively, the charge density observed by
the comoving observer $(p, U^a)$ and that observed by an arbitrary observer $(p, Z^a)$,
which have the relation $\rho = \gamma \rho_0$. Suppose $u^a$ is the 3-velocity of the charged particle
relative to $(p, Z^a)$; then $j^a := \rho u^a$ is the 3-current density measured by $(p, Z^a)$.
The 3-current density measured by the comoving observer is zero.

Definition 2 The 4-current density of a stream of charged particles is defined as

$$J^a := \rho_0 U^a . \qquad (6.6.8)$$

6 This simplification does not affect the essence of the problem. What is important is that they form
a stream of particles, and unlike gas molecules which move randomly in all the directions, the value
of its 4-velocity field U a at each spacetime point is the 4-velocity of the dust particle whose world
line passes through this point.

Proposition 6.6.4 $J^a$ can be 3 + 1-decomposed by means of an instantaneous
observer $(p, (e_\mu)^a)$ as follows:

$$J^a = \rho Z^a + j^a . \qquad (6.6.9)$$

Proof $J^a = \rho_0 U^a = \rho_0 \gamma (Z^a + u^a) = \rho Z^a + \rho u^a = \rho Z^a + j^a$. □

Thus, the charge density ρ and 3-current density $j^a$ are respectively the time
component $J^0$ and spatial projection $h^a{}_b J^b$ of the 4-current density. The equation above
can also be expressed as

$$\rho = -Z_a J^a , \qquad j^i = J^i .$$

Like mass, electric charge is also a physical quantity that describes an intrinsic
property of a charged particle. The charged particles and electric charges remain the
same when they are not involved in any interaction. When they are interacting with
other particles, the total charge must be the same before and after the interaction.
This is the law of conservation of charge, which is a result confirmed by all the
experiments so far. In the 3-dimensional language of electrodynamics, this law is
expressed as the continuity equation: (∂ρ/∂t) + ∇ · j = 0 (for any inertial frame).
It is not difficult to see that the corresponding 4-dimensional expression is ∂a J a = 0.
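
Definition 2 and Proposition 6.6.4 can be illustrated with a few lines of pure Python. The sketch assumes c = 1, signature (−, +, +, +), and an observer at rest in the inertial frame, $Z^a = (\partial/\partial t)^a$; the proper charge density 2.0, the 3-velocity (0.6, 0, 0) and the function name `four_current` are hypothetical.

```python
def four_current(rho0, u):
    """Components J^a = rho0 U^a of a charged dust with 3-velocity u = (ux, uy, uz)."""
    u2 = sum(c * c for c in u)
    gamma = 1.0 / (1.0 - u2) ** 0.5
    U = [gamma] + [gamma * c for c in u]      # U^a = gamma [(d/dt)^a + u^a]
    return [rho0 * x for x in U]

rho0 = 2.0               # hypothetical proper charge density
u = (0.6, 0.0, 0.0)      # hypothetical 3-velocity relative to the observer

J = four_current(rho0, u)
rho = J[0]               # rho = -Z_a J^a reduces to J^0 for Z^a = (d/dt)^a
j = J[1:]                # j^i = J^i

# Invariant norm J^a J_a, which should equal -rho0^2.
norm = -J[0] ** 2 + sum(c * c for c in j)

print(rho, j, norm)
```

With these numbers γ = 1.25, so the observer finds a charge density larger than the proper one, $\rho = \gamma\rho_0$, and a current density $j = \rho u$ along x.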

6.6.2 Maxwell’s Equations

In electrodynamics textbooks, the equations of motion of E and B are the
well-known Maxwell equations. From these equations one can derive the 4-dimensional
formulation of Maxwell’s equations

$$\partial^a F_{ab} = -4\pi J_b , \qquad (6.6.10)$$

$$\partial_{[a} F_{bc]} = 0 . \qquad (6.6.11)$$

In our current framework, we will treat the above two equations as the starting point,
i.e., we will assume the electromagnetic field tensor obeys (6.6.10) and (6.6.11).
Note that (6.6.10) already contains the law of conservation of charge, since from it
we get

$$\partial^b J_b = -(4\pi)^{-1} \partial^b \partial^a F_{ab} = -(4\pi)^{-1} \partial^{(b} \partial^{a)} F_{[ab]} = 0 ,$$

and thus $\partial\rho/\partial t + \nabla \cdot \mathbf{j} = 0$, which is exactly the conservation of charge.


Proposition 6.6.5 For any inertial frame {t, x, y, z}, from (6.6.10) and (6.6.11) one
can derive the 3-dimensional formulation of Maxwell’s equations

$$\begin{aligned}
&\text{(a)}\ \nabla \cdot \mathbf{E} = 4\pi\rho , & &\text{(b)}\ \nabla \times \mathbf{E} = -\frac{\partial \mathbf{B}}{\partial t} , \\
&\text{(c)}\ \nabla \cdot \mathbf{B} = 0 , & &\text{(d)}\ \nabla \times \mathbf{B} = 4\pi \mathbf{j} + \frac{\partial \mathbf{E}}{\partial t} .
\end{aligned} \qquad (6.6.12)$$

The first and fourth equations here correspond to (6.6.10), and the second and third
equations correspond to (6.6.11).
Remark 1 Here we adopt the geometrized Gaussian unit system (see Appendix A), in
which the coefficients of the 3-dimensional Maxwell equations are slightly different
from the common form.

Proof Let δab represent the (induced) Euclidean metric on a constant-t surface of the
chosen inertial frame, and let ∂ˆa and ∂a represent the derivative operators associated
with the metrics δab and ηab , respectively. Setting Z a ≡ (∂/∂t)a , and noticing that
the spatial vector E a satisfies E 0 = 0, we have

$$\vec\nabla\cdot\vec E = \hat\partial_a E^a = \frac{\partial E^i}{\partial x^i} = \partial_a E^a = \partial^a (F_{ab} Z^b) = Z^b(-4\pi J_b) = 4\pi\rho\,.$$

This is (6.6.12)(a). Now we prove (6.6.12)(b). Suppose ε̂abc is the volume element
associated with δab on the constant-t surface, then from (c) of (5.6.5) we know that

(∇ × E)c = ε̂ab c ∂ˆa E b , (6.6.13)

where ∂ˆa E b can be expressed as [according to (3.1.9)]

∂ˆa E b = (dx i )a (dx j )b ∂ˆi E j = (dx i )a (dx j )b ∂i E j . (6.6.14)

On the other hand, E 0 = 0 leads to

∂a E b = (dx μ )a (dx j )b ∂μ E j = (dx 0 )a (dx j )b ∂0 E j + (dx i )a (dx j )b ∂i E j .

Comparing the projection of the above equation on the constant-t surface with
(6.6.14), and noticing that the projection of (dx 0 )a vanishes and the projection of
(dx i )a are themselves, we have

∂ˆa E b = h a d h b e ∂d E e . (6.6.15)

Since ε̂ab c is a spatial tensor, its projection is equal to itself. Plugging (6.6.15) into
(6.6.13) yields
(∇ × E)c = ε̂ab c h a d h b e ∂d E e = ε̂de c ∂d E e ,

and hence

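$$(\vec\nabla\times\vec E)_c = \hat\varepsilon^{ab}{}_c\,\partial_a E_b = \hat\varepsilon^{ab}{}_c\,\partial_a (F_{be} Z^e) = Z^e \hat\varepsilon^{ab}{}_c\,\partial_a F_{be} = -Z^e \hat\varepsilon^{ab}{}_c\,\partial_e F_{ab} - Z^e \hat\varepsilon^{ab}{}_c\,\partial_b F_{ea}\,,$$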

where in the last step we used (6.6.11) and the antisymmetry of Fab . Also, the second
term of the right-hand side of this equation is equal to −ε̂ab c ∂a (Fbe Z e ), i.e., is equal
to −(∇ × E)c , and hence

2(∇ × E)c = −Z e ε̂ab c ∂e Fab . (6.6.16)

Suppose εabcd is the volume element associated with ηab , then it follows from (5.5.6)
that
ε̂cab = Z d εdcab , (6.6.17)

and hence (6.6.16) becomes

$$2(\vec\nabla\times\vec E)_c = -Z^e Z^d \varepsilon_{dc}{}^{ab}\,\partial_e F_{ab} = -Z^e\partial_e(\varepsilon_{dc}{}^{ab} F_{ab} Z^d) = -Z^e\partial_e(2\,{}^*\!F_{dc} Z^d) = -2Z^e\partial_e B_c\,.$$

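Thus,
$$(\vec\nabla\times\vec E)_i = \left(\frac{\partial}{\partial x^i}\right)^{c} (\vec\nabla\times\vec E)_c = -Z^e\partial_e B_i = -\frac{\partial B_i}{\partial t}\,,$$
and therefore
$$\vec\nabla\times\vec E = -\frac{\partial\vec B}{\partial t}\,.$$
The derivations of the other two Maxwell equations are left to the reader in Exercise 6.16. □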
Remark 2 The 4-dimensional formulation of Maxwell’s equations is explicitly
Lorentz covariant, and is independent of the reference frame. The 3-dimensional
formulation of Maxwell’s equations is also Lorentz covariant, but it is not obvious
to see. Also, the 3-dimensional formulation only holds for inertial frames; for a non-
inertial frame, the equations derived from (6.6.10) and (6.6.11) will be different from
the regular 3-dimensional Maxwell equations.
[Optional Reading 6.6.2]
As a volume element associated with the induced metric h ab = ηab + Z a Z b on the
constant-t surface, ε̂cab can only be determined up to a minus sign (see the end of Optional
Reading 5.5.1), i.e., −Z d εdcab can also be taken as ε̂cab . Only after we take the orientation of
the constant-t surface into consideration can ε̂cab be uniquely determined as Z d εdcab . Unlike
the situation when we discuss Gauss’s theorem, here there does not naturally exist a manifold
N with boundary such that the constant-t surface can be treated as the boundary ∂ N , and
thus one cannot say whether its normal vector Z a is ingoing or outgoing. Equivalently, the
constant-t surface now does not have any induced orientation. The reason we write ε̂cab as
Z d εdcab rather than −Z d εdcab is based on the following consideration: the 3-dimensional
formulation of Maxwell’s equation ∇ × E = −∂ B/∂t involves curl, and the condition for
it to hold is that the chosen Cartesian coordinate system {x, y, z} is right-handed (otherwise
we have ∇ × E = ∂ B/∂t), i.e., the spatial orientation needs to be compatible with dx ∧
dy ∧ dz. Noting that εdcab = (dt)d ∧ (dx)a ∧ (dy)b ∧ (dz)c and Z d = (∂/∂t)d , we know
that the volume element ε̂cab = (dx)a ∧ (dy)b ∧ (dz)c , which is compatible with the needed
orientation.
[The End of Optional Reading 6.6.2]
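
The definitions used in the proof above can be checked numerically. The following is only a minimal sketch under stated assumptions: the signature is (−,+,+,+), the volume element has ε_{0123} = +1 (the orientation discussed in Optional Reading 6.6.2), and in an inertial frame the field tensor has components F_{i0} = E_i and F_{ij} = ε_{ijk} B_k, which is consistent with E_a = F_ab Z^b and B_a = −*F_ab Z^b. The helper names `levi_civita4`, `field_tensor` and `hodge_dual` are hypothetical and introduced here for illustration.

```python
from itertools import permutations

ETA = [-1.0, 1.0, 1.0, 1.0]  # diagonal of eta_ab (and of eta^ab)

def levi_civita4():
    """Return eps_{abcd} as a dict keyed by index tuples, with eps_{0123} = +1."""
    eps = {}
    for perm in permutations(range(4)):
        inversions = sum(1 for i in range(4) for j in range(i + 1, 4)
                         if perm[i] > perm[j])
        eps[perm] = -1.0 if inversions % 2 else 1.0
    return eps

def field_tensor(E, B):
    """Lower-index F_ab with F_{i0} = E_i and F_{ij} = eps_{ijk} B_k (assumed convention)."""
    F = [[0.0] * 4 for _ in range(4)]
    for i in range(3):
        F[i + 1][0] = E[i]
        F[0][i + 1] = -E[i]
    F[1][2], F[2][1] = B[2], -B[2]
    F[2][3], F[3][2] = B[0], -B[0]
    F[3][1], F[1][3] = B[1], -B[1]
    return F

def hodge_dual(F):
    """Dual 2-form *F_ab = (1/2) eps_abcd F^cd, indices raised with eta."""
    eps = levi_civita4()
    dual = [[0.0] * 4 for _ in range(4)]
    for a in range(4):
        for b in range(4):
            total = 0.0
            for c in range(4):
                for d in range(4):
                    sign = eps.get((a, b, c, d), 0.0)
                    if sign:
                        total += sign * ETA[c] * ETA[d] * F[c][d]
            dual[a][b] = 0.5 * total
    return dual

# Arbitrary illustrative field values (not taken from the text).
E = [0.3, -0.2, 0.5]
B = [0.1, 0.4, -0.7]
Z = [1.0, 0.0, 0.0, 0.0]          # Z^a = (d/dt)^a of the inertial frame

F = field_tensor(E, B)
dual_F = hodge_dual(F)
E_rec = [sum(F[a][b] * Z[b] for b in range(4)) for a in range(4)]        # E_a = F_ab Z^b
B_rec = [-sum(dual_F[a][b] * Z[b] for b in range(4)) for a in range(4)]  # B_a = -*F_ab Z^b
print(E_rec, B_rec)
```

In this sketch the time components of `E_rec` and `B_rec` vanish and the spatial components reproduce `E` and `B`, so both fields are spatial with respect to Z^a, as the proof assumes.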

6.6.3 Lorentz 4-Force

As we have pointed out previously, charged particles are the sources of the electro-
magnetic field (manifested by J a ), whose effect on the electromagnetic field Fab is
reflected by (6.6.10). Conversely, there are also forces exerted from the electromag-
netic field on the charged particles, namely the Lorentz force

f = q( E + u × B) , (6.6.18)

where q and u represent respectively the electric charge and 3-velocity of the point
mass. Combining the above equation and the definition of the 3-force f = d p/dt
yields the equation of motion of a charged particle in an electromagnetic field (assum-
ing no other force)
dp
= q( E + u × B) . (6.6.19)
dt
It should be pointed out that the equation above is Lorentz covariant (although it is
hard to see explicitly), which is also a manifestation of the conclusion “Maxwell’s
theory of electromagnetism is endowed with Lorentz covariance”. That is, for another
inertial frame R′, the equation of motion of the same point mass will have the same
form as (6.6.19), only the quantities that depend on the reference frame need to be
labeled by a prime (′), i.e.,
$$\frac{\mathrm{d}\vec p\,'}{\mathrm{d}t'} = q(\vec E' + \vec u' \times \vec B')\,. \tag{6.6.19'}$$
Note that q does not need to be primed, since the electric charge of a point mass is
an invariant.
Proposition 6.6.6 Suppose a point mass has electric charge q, 4-velocity U a and
4-momentum P a , then the force from the electromagnetic field Fab on it (called the
Lorentz 4-Force) is

F a = q F a bU b (where F a b ≡ ηac Fcb ) . (6.6.20)

Thus, the 4-dimensional equation of motion for a point mass that only experiences
the electromagnetic force is

q F a b U b = U b ∂b P a . (6.6.21)

Proof Suppose p is a point on the world line L of a charged particle, and $(p, Z^a)$ is
an instantaneous observer whose orthonormal tetrad is $\{(e_\mu)^a\}$, where $(e_0)^a = Z^a$.
All we have to prove is that the components $F^i$ and $F^0$ of $F^a$ in (6.6.20) with respect
to this instantaneous observer satisfy

$$F^i = \gamma f^i\,, \tag{6.6.22}$$
$$F^0 = \gamma \vec f\cdot\vec u\,, \tag{6.6.23}$$

where $\gamma = -Z^a U_a$, $f^i$ is the ith component of the Lorentz 3-force, and $\vec u \equiv u^a$ is
the 3-velocity of the point mass relative to $(p, Z^a)$.
It follows from (6.6.20) that

F a = γ q F a b (Z b + u b ) = γ q(E a + F a b u b ) , (6.6.24)

or
Fa = γ q(E a + Fab u b ) .

Hence,
Fi = (ei )a Fa = γ q(E i + Fi j u j ) . (6.6.25)

If we can show that


Fi j u j = (u × B)i , (6.6.26)

then from (6.6.25) and (6.6.18) we immediately obtain Fi = γ f i , namely (6.6.22).


Now we will prove (6.6.26).

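$$\begin{aligned}
(\vec u\times\vec B)_c &= \hat\varepsilon_c{}^{ab}u_a B_b = \hat\varepsilon_c{}^{ab}u_a(-{}^*\!F_{bd}Z^d) = \hat\varepsilon_c{}^{ab}u_a\Big(-\frac12\varepsilon_{bd}{}^{ef}F_{ef}Z^d\Big) = -\frac12 u^a\hat\varepsilon_{cab}\,\varepsilon^{bdef}F_{ef}Z_d \\
&= \frac12 u^a Z^g\varepsilon_{gcab}\,\varepsilon^{defb}F_{ef}Z_d = \frac12(-3!)\,u^a Z^g\,\delta^{[d}{}_g\,\delta^{e}{}_c\,\delta^{f]}{}_a\,Z_d F_{ef} = -3u^a Z^g Z_{[g}F_{ca]} \\
&= -u^a Z^g(Z_g F_{ca} + Z_a F_{gc} + Z_c F_{ag}) = F_{ca}u^a - Z_c u^a E_a\,,
\end{aligned} \tag{6.6.27}$$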

where in the second last equality we used Fca = −Fac , and in the last equality we
used Z g Z g = −1, Fag Z g = E a and u a Z a = 0. It follows from (6.6.27) that

(u × B)i = (ei )c (u × B)c = Fi j u j ,

which is exactly (6.6.26). The second term on the right-hand side of (6.6.27) is nec-
essary, otherwise the time component of the right-hand side would be nonvanishing,
which contradicts the fact that (u × B)c on the left-hand side is spatial. Now we will
prove (6.6.23).

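$$\begin{aligned}
F^0 &= (e^0)_a F^a = \gamma q (e^0)_a (E^a + F^a{}_b u^b) = -\gamma q (e_0)^a F_{ab} u^b = -\gamma q F_{0i} u^i \\
&= \gamma q E_i u^i = \gamma q [E_i + (\vec u\times\vec B)_i] u^i = \gamma f_i u^i = \gamma \vec f\cdot\vec u\,,
\end{aligned}$$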

which is (6.6.23). In the second equality we used (6.6.24), in the third equality we
used (e0 )a E a = 0 and (e0 )a = −(e0 )a , in the sixth equality we used the orthogonality
between u × B and u, and in the seventh equality we used (6.6.18). 
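
Proposition 6.6.6 can also be spot-checked with numbers. The sketch below is an illustration only, under the assumptions that the signature is (−,+,+,+), that c = 1, and that in the observer's frame F_{i0} = E_i and F_{ij} = ε_{ijk} B_k (consistent with E_a = F_ab Z^b and F_ij u^j = (u × B)_i). The function names `field_tensor` and `lorentz_4force` and all numerical values are hypothetical.

```python
import math

ETA = [-1.0, 1.0, 1.0, 1.0]  # diagonal of eta_ab, signature (-,+,+,+)

def cross(a, b):
    """Ordinary 3-dimensional cross product."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def field_tensor(E, B):
    """Lower-index F_ab with F_{i0} = E_i and F_{ij} = eps_{ijk} B_k (assumed convention)."""
    F = [[0.0] * 4 for _ in range(4)]
    for i in range(3):
        F[i + 1][0] = E[i]
        F[0][i + 1] = -E[i]
    F[1][2], F[2][1] = B[2], -B[2]
    F[2][3], F[3][2] = B[0], -B[0]
    F[3][1], F[1][3] = B[1], -B[1]
    return F

def lorentz_4force(q, F, U):
    """Components of F^a = q eta^{ac} F_cb U^b, Eq. (6.6.20)."""
    return [q * ETA[a] * sum(F[a][b] * U[b] for b in range(4)) for a in range(4)]

# Arbitrary illustrative values (|u| < 1 in units with c = 1).
q = 1.5
E = [0.3, -0.2, 0.5]
B = [0.1, 0.4, -0.7]
u = [0.2, -0.1, 0.3]

gamma = 1.0 / math.sqrt(1.0 - sum(x * x for x in u))
U = [gamma] + [gamma * x for x in u]                  # U^a = gamma (Z^a + u^a)

u_cross_B = cross(u, B)
f = [q * (E[i] + u_cross_B[i]) for i in range(3)]     # Lorentz 3-force (6.6.18)
F4 = lorentz_4force(q, field_tensor(E, B), U)

f_dot_u = sum(f[i] * u[i] for i in range(3))
print(F4, [gamma * x for x in f], gamma * f_dot_u)
```

The spatial components of `F4` agree with γ f^i as in (6.6.22), and the time component agrees with γ f · u as in (6.6.23).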

6.6.4 The Energy-Momentum Tensor of an Electromagnetic Field

In the 3-dimensional formulation of electrodynamics, the energy density, energy flux
density, momentum density and momentum flux density (i.e., the stress tensor) of an
electromagnetic field are already clearly defined [see, e.g., Griffiths (2013) Sects. 8.1
and 8.2]. These 3-dimensional quantities can be unified into a 4-dimensional tensor
(the energy-momentum tensor Tab of an electromagnetic field) as

$$T_{ab} = \frac{1}{4\pi}\Big(F_{ac}F_b{}^c - \frac14\,\eta_{ab}F_{cd}F^{cd}\Big)\,, \tag{6.6.28}$$
where Fac is the electromagnetic field tensor. Using the result in Exercise 5.9, one
can also rewrite the equation above into a more symmetric form:
$$T_{ab} = \frac{1}{8\pi}\big(F_{ac}F_b{}^c + {}^*\!F_{ac}\,{}^*\!F_b{}^c\big)\,, \tag{6.6.28'}$$

where ${}^*\!F_{ac}$ is the dual form of $F_{ac}$ and ${}^*\!F_b{}^c = \eta^{ac}\,{}^*\!F_{ba}$. It is not difficult to verify
that this tensor has the properties 1 and 3 of an energy-momentum tensor described
in Sect. 6.4. In particular, after choosing an arbitrary inertial frame, from (6.6.28′) one
can easily obtain that
$$T_{00} = \frac{1}{8\pi}(E^2 + B^2)\,,$$
and from (6.6.28) one can easily obtain that (see Exercise 6.17)
$$w_i = -T_{i0} = \frac{1}{4\pi}(\vec E\times\vec B)_i\,,\qquad i = 1, 2, 3\,,$$
which are exactly the energy density and energy flux density (which also equals the
momentum density) of the electromagnetic field measured by this inertial observer.
However, the property 2 of an energy-momentum tensor in Sect. 6.4 (i.e., ∂ a Tab = 0)
needs to be clarified here. When J a = 0 (source free), one can show that ∂ a Tab =
0 from the 4-dimensional formulation of Maxwell’s equation, i.e., a source-free
electromagnetic field obeys the conservation laws of energy, momentum and angular
momentum. However, if J a ≠ 0, then the Tab in (6.6.28) does not satisfy ∂ a Tab = 0
[Exercise 6.18(a)]. This is quite natural, since then there are interactions between
the electromagnetic field and the charged particles, which involve the exchange of
energy, momentum and angular momentum [Exercise 6.18(b)]. Nevertheless, the
total energy-momentum tensor of the electromagnetic field and charged particles is
still conserved.
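
As a numerical illustration of (6.6.28), the sketch below evaluates T_ab for given E and B in an inertial frame and compares T_00 and −T_i0 with the familiar energy density and energy flux density in Gaussian units. It assumes signature (−,+,+,+) and the component convention F_{i0} = E_i, F_{ij} = ε_{ijk} B_k; the names `field_tensor` and `energy_momentum` and the numerical field values are hypothetical.

```python
import math

ETA = [-1.0, 1.0, 1.0, 1.0]  # diagonal of eta_ab, signature (-,+,+,+)

def cross(a, b):
    """Ordinary 3-dimensional cross product."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def field_tensor(E, B):
    """Lower-index F_ab with F_{i0} = E_i and F_{ij} = eps_{ijk} B_k (assumed convention)."""
    F = [[0.0] * 4 for _ in range(4)]
    for i in range(3):
        F[i + 1][0] = E[i]
        F[0][i + 1] = -E[i]
    F[1][2], F[2][1] = B[2], -B[2]
    F[2][3], F[3][2] = B[0], -B[0]
    F[3][1], F[1][3] = B[1], -B[1]
    return F

def energy_momentum(F):
    """T_ab = (1/4pi) (F_ac F_b^c - (1/4) eta_ab F_cd F^cd), Eq. (6.6.28)."""
    invariant = sum(ETA[c] * ETA[d] * F[c][d] ** 2
                    for c in range(4) for d in range(4))      # F_cd F^cd
    T = [[0.0] * 4 for _ in range(4)]
    for a in range(4):
        for b in range(4):
            contraction = sum(F[a][c] * ETA[c] * F[b][c] for c in range(4))
            eta_ab = ETA[a] if a == b else 0.0
            T[a][b] = (contraction - 0.25 * eta_ab * invariant) / (4.0 * math.pi)
    return T

# Arbitrary illustrative field values.
E = [0.3, -0.2, 0.5]
B = [0.1, 0.4, -0.7]

T = energy_momentum(field_tensor(E, B))
energy_density = T[0][0]                              # expected (E^2 + B^2) / 8 pi
energy_flux = [-T[i + 1][0] for i in range(3)]        # expected (E x B)_i / 4 pi
trace = sum(ETA[a] * T[a][a] for a in range(4))       # eta^{ab} T_ab
print(energy_density, energy_flux, trace)
```

Besides reproducing the energy density and the energy flux density, the sketch shows that T_ab is symmetric and traceless for any choice of E and B.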

6.6.5 Electromagnetic 4-Potential and Its Equation of Motion, Electromagnetic Waves

Since Fab is a 2-form, one can rewrite Maxwell’s equation (6.6.11) using the notion
of exterior differentiation as dF = 0, i.e., F is a closed form. Since the background
manifold is R4 , from Remark 1 of Sect. 5.1 we can see that F is exact, i.e., there
exists a 1-form field Aa on R4 such that F = d A, or

Fab = ∂a Ab − ∂b Aa .

Definition 3 A 1-form field Aa that satisfies F = d A is called a 4-potential of the
electromagnetic field Fab .
If we decompose Aa into the time and spatial components using an arbitrary
inertial frame {t, x i }:
Aa = −φ(dt)a + aa , (6.6.29)

then it is not difficult to show that φ and aa are respectively the scalar potential and
the 3-vector potential of the electromagnetic field F (Exercise 6.19).
When F is given, the 4-potential will not be unique. Suppose A is a 4-potential
of F, and χ is an arbitrary C 2 function on R4 , then à ≡ A + dχ is also a 4-potential
of F since ddχ = 0. This is known as the gauge freedom of the electromagnetic
4-potential. One can impose an additional condition ∂ a Aa = 0 called the Lorenz⁷
gauge condition. An Aa that satisfies this condition always exists: if
∂ a Aa ≠ 0, then one can always choose a function χ such that à ≡ A + dχ satisfies
∂ a Ãa = 0, and to do so χ only has to satisfy ∂ a ∂a χ = −∂ a Aa . Noticing that

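$$\partial^a\partial_a\chi = \eta^{ab}\partial_b\partial_a\chi = -\frac{\partial^2\chi}{\partial t^2} + \frac{\partial^2\chi}{\partial x^2} + \frac{\partial^2\chi}{\partial y^2} + \frac{\partial^2\chi}{\partial z^2}\,,$$

we can see that nonzero solutions of ∂ a ∂a χ = −∂ a Aa not only exist, but are
also numerous.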
Using the 4-potential we can reformulate Maxwell’s equations. F = d A satisfies
(6.6.11) automatically, and (6.6.10) can be expressed as

− 4π Jb = ∂ a (∂a Ab − ∂b Aa ) = ∂ a ∂a Ab − ∂b ∂ a Aa . (6.6.30)

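In the second equality we used $\partial^a\partial_b A_a = \eta^{ac}\partial_c\partial_b A_a = \partial_c\partial_b(\eta^{ac}A_a) = \partial_c\partial_b A^c = \partial_b\partial_c A^c = \partial_b\partial^a A_a$.
Therefore, an $A_b$ under the Lorenz gauge will satisfy the following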
simple equation:
∂ a ∂a Ab = −4π Jb . (6.6.31)

7 Named after the Danish physicist Ludwig Lorenz, not to be confused with H. A. Lorentz.

The equation above is equivalent to the d’Alembert equation for the scalar potential
φ and the vector potential a in the 3-dimensional formulation of electrodynamics.
For a source-free electromagnetic field, this will become a wave equation

∂ a ∂a Ab = 0 . (6.6.32)

We want to find the wave solutions of the form of Ab = Cb cos θ for (6.6.32), where
θ is a real scalar field called the phase; C b is a nonvanishing constant vector field
(“constant” means ∂a C b = 0) called the polarization vector. Plugging these into
(6.6.32) yields
cos θ (∂ a θ )∂a θ + sin θ ∂ a ∂a θ = 0 , (6.6.33)

and thus all the Ab = Cb cos θ that satisfy both

$$(\partial^a\theta)\,\partial_a\theta = 0\,, \tag{6.6.34}$$
$$\partial^a\partial_a\theta = 0 \tag{6.6.35}$$

are solutions to the wave equation (6.6.32). Now we will discuss this important kind
of solution in detail.
Let K a ≡ ∂ a θ . We can expand K a in terms of the dual coordinate basis of an
inertial coordinate system:

(dθ )a = ∂a θ = K a = K μ (dx μ )a .

In the following, we only consider the simplest (which is also the most important)
case where K a is a constant vector field (∂b K a = 0). In this case K μ is a constant,
and integrating the above equation yields

θ = K μ x μ + θ0 (constant) . (6.6.36)

To see the physical meaning of $K^a$, let us look at the 3 + 1 decomposition of $K^a$ in
the inertial frame $\{t, x^i\}$:
$$K^a = \omega(\partial/\partial t)^a + k^a\,, \tag{6.6.37}$$
where $k^a$ and $\omega \equiv K^0$ represent the spatial and time components of $K^a$, respectively.
Now let
$$k_a \equiv \eta_{ab}k^b\,,\qquad k_i \equiv k_a(\partial/\partial x^i)^a\,,$$

and set θ0 = 0. Then, (6.6.36) becomes

θ = −ωt + ki x i , (6.6.38)

and hence Ab = Cb cos θ can now be expressed as



Ab = Cb cos(ωt − ki x i ) . (6.6.39)

This solution agrees with the familiar expression for a monochromatic plane wave,
and therefore can be called a monochromatic electromagnetic plane wave. “Plane”
means that the surface S0 of constant phase at a given time t0 , i.e., a wavefront,
described by ωt0 − ki x i = ϕ0 (constant), is a 2-dimensional plane in R3 . Since

∂a ϕ0 = (dϕ0 )a = −ki (dxi )a = −ka ,

we see that k a is the normal vector of S0 . Physically, k a is called the wave 3-vector,
which represents the direction of wave propagation, and ω is called the angular
frequency of the wave. Therefore, K a is called the wave 4-vector.
Now we will discuss $K^a$ in the 4-dimensional language. Consider a hypersurface
S of constant phase in spacetime, i.e., $S \equiv \{p \in \mathbb{R}^4 \mid \theta|_p = \text{constant}\}$. We can easily
see that $K_a$ is the normal covector of S (Theorem 4.4.2), and thus $K^a$ is the normal
vector of S. On the other hand, (6.6.34) indicates that $K^a K_a = 0$, and hence $K^a$ is
a null vector field and S is a null hypersurface. In addition, $K^a K_a = 0$ also gives

$$0 = \partial_b(K^a K_a) = 2K^a\partial_b K_a = 2K^a\partial_b\partial_a\theta = 2K^a\partial_a\partial_b\theta = 2K^a\partial_a K_b\,, \tag{6.6.40}$$

and thus the integral curves of K a are null geodesics lying on S . Also, from (6.6.35)
we can see that ∂ a K a = 0.
Suppose Σ0 is the surface of simultaneity of {t, x i } at t0 . Let S0 ≡ S ∩ Σ0 (see
Fig. 6.40), then S0 is the set of all the points in Σ0 that have the same phase, namely
a wavefront at t0 in the 3-dimensional language. When K μ is a constant, S is a
3-dimensional plane (a null hyperplane) and S0 is a 2-dimensional plane, and thus
once again we see that (6.6.39) represents a plane wave. S can be interpreted as the
world sheet of a 2-dimensional wavefront, which describes the time evolution of the
wavefront (the propagation of the wave). Suppose Σ1 is the surface of simultaneity
at t1 (> t0 ), then after a time t1 − t0 , S0 will propagate to a new plane S1 ≡ S ∩ Σ1 .
The direction of the propagation is the direction orthogonal to S0 in Σ0 , and the
speed of the propagation is exactly the speed of light (which is a consequence of the
fact that S is a null hypersurface). The integral curves of the projection of K a onto
Σ0 , i.e., the wave 3-vector k a , are orthogonal to S0 and represent the direction
of the wave propagation, and thus in the 3-dimensional language can be regarded
as light rays. Therefore, the integral curves of K a can be regarded as light rays in
the 4-dimensional language. In this perspective, we can also naturally see that K a
deserves the name wave 4-vector.

Fig. 6.40 A monochromatic electromagnetic plane wave. The world sheet of a wavefront S0 in the
3-language is a null hypersurface S . The integral curve of a normal vector K a of S represents the
world line of a photon

Given a monochromatic plane wave, its wave 4-vector is also naturally given
(which is a constant null vector field in R4 ); however, we can see from (6.6.37) that
its angular frequency ω and wave 3-vector k a will depend on the inertial frame we
choose. That is, K a is absolute, while ω and k a are relative. Similarly, the K a at
any point p can also be decomposed in terms of an arbitrary instantaneous observer
( p, Z a ) as
K a = ωZ a + k a , (6.6.41)

where
ω = −K a Z a (6.6.42)

and k a can be interpreted as the angular frequency and the wave 3-vector measured
by this observer, respectively. From the fact that the wave 4-vector K a is null, i.e.,
K a K a = 0, we can easily see the following relation between ω and k a :

ω2 = k a ka = k 2 . (6.6.43)
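
The decomposition (6.6.41) and the relation (6.6.43) can be illustrated with a short numerical sketch. It assumes signature (−,+,+,+) and c = 1; the helper names `inner` and `decompose`, the chosen null vector and the observer's velocity are hypothetical values picked for illustration.

```python
import math

ETA = [-1.0, 1.0, 1.0, 1.0]  # diagonal of eta_ab, signature (-,+,+,+)

def inner(X, Y):
    """Minkowski inner product eta_ab X^a Y^b."""
    return sum(ETA[a] * X[a] * Y[a] for a in range(4))

def decompose(K, Z):
    """Split K^a = omega Z^a + k^a with omega = -K^a Z_a, Eqs. (6.6.41)-(6.6.42)."""
    omega = -inner(K, Z)
    k = [K[a] - omega * Z[a] for a in range(4)]
    return omega, k

# A null wave 4-vector given by its components in some inertial frame:
# angular frequency 2 and unit propagation direction (0.6, 0, 0.8).
K = [2.0, 1.2, 0.0, 1.6]

# An observer moving with 3-velocity 0.5 along x relative to that frame.
v = 0.5
gamma = 1.0 / math.sqrt(1.0 - v * v)
Z = [gamma, gamma * v, 0.0, 0.0]

omega, k = decompose(K, Z)
print(omega, k)
print(inner(k, Z), inner(k, k) - omega ** 2)  # both vanish up to rounding
```

The same K^a yields different values of ω and k^a for different observers Z^a, while K^a K_a = 0 and ω² = k^a k_a hold for every observer, which is the sense in which K^a is absolute and ω, k^a are relative.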

The method of describing a monochromatic electromagnetic plane wave in terms
of (either 3- or 4-dimensional) light rays is called the geometric optics approximation.
However, the condition for a monochromatic electromagnetic plane wave
is that the Cb in Ab = Cb cos θ and K a ≡ ∂ a θ are constant vector fields in the whole
spacetime, which is an unattainable requirement and can only be a concept in theoretical
models. Luckily, many electromagnetic waves in practice can approximately
be treated as this kind of wave within a certain region of spacetime, and thus can be
approximated using geometric optics. Consider such an electromagnetic wave whose
4-potential can be expressed as Ab = Cb cos θ , where although Cb and K a ≡ ∂ a θ do
not satisfy ∂a Cb = 0 and ∂a K b = 0 their changes with respect to the spacetime point
are much “slower” than the change of the phase factor cos θ . Then, we may say that
Ab is the product of the “slowly changing” amplitude Cb and the “rapidly changing”
phase factor cos θ . Let L̃ represent such a characteristic length that the change of Cb
or K a can only be observed when the spacetime scale is at least on the same order as
L̃. Then, in a spacetime region U whose scale is smaller than L̃ but cos θ has changed
by many periods in it, we can deal with this kind of electromagnetic wave using the
geometric optics approximation [in the 3-dimensional language, this is to say that the
spatial scale is much larger than the wavelength λ ≡ 2π/ω (where ω ≡ −Z a K a )].
An electromagnetic wave satisfying this condition is called a locally monochro-
matic plane wave. The idea of geometric optics is to describe the propagation of an
electromagnetic wave using light rays. This idea exhibits the particle nature of light,
which encourages us to describe the propagation of an electromagnetic wave using
the terminology of photons. Properly, a photon is a quantum of the electromagnetic
field in the theory of quantum electrodynamics (QED); classical electrodynamics
can be viewed as the limit of quantum electrodynamics when the Planck constant
h → 0.⁸ Based on this description, a locally monochromatic electromagnetic plane
wave can be considered as a stream of photons, whose K a and C a are almost all
the same. A photon can be imagined as a particle similar to a regular point mass,
except that its mass m = 0. Now that the 4-momentum of a point mass defined by
(6.3.32) no longer applies, we can instead use the corresponding wave 4-vector of
the electromagnetic wave to define the 4-momentum of a photon as follows:

$$P^a := \hbar K^a \quad (\text{where } \hbar \equiv h/2\pi)\,, \tag{6.6.44}$$

and stipulate that the world line of the photon is a null geodesic such that its affine
parameter β satisfies
P a = (∂/∂β)a . (6.6.45)

Therefore, the world lines of the photons coincide with the integral curves of the
wave 4-vector of the corresponding electromagnetic wave. In terms of the 3 + 1
decomposition, we can follow that of a massive particle and define the time and
spatial components of a photon’s 4-momentum as the energy E and the 3-momentum
pa of the photon, respectively, i.e.,

P a = E Z a + pa . (6.6.46)

Noticing (6.6.44), we can compare the above equation with (6.6.41) and obtain

$$E = \hbar\omega\,,\qquad p^a = \hbar k^a\,, \tag{6.6.47}$$

i.e., the energy E and the 3-momentum pa of a photon are respectively proportional
to the angular frequency ω and the wave 3-vector k a of the corresponding electromagnetic
wave, with a coefficient ħ. From P a Pa = 0 one can easily see that the
energy E and the magnitude p of the 3-momentum pa have the following simple
relation:
$$E^2 = p^a p_a = p^2\,. \tag{6.6.48}$$

8 Note that a “photon” in the geometric optics approximation is still a classical concept since there
is no procedure of quantization. A key difference between a QED photon and this classical limit
is that the QED photon is not localizable, whereas the classical counterpart follows a specific ray
path.

[Optional Reading 6.6.3]
Equation (6.6.39) indicates that the 4-potential Ab propagates in the manner of a
monochromatic plane wave, from which it is not difficult to show that the electric field E and
the magnetic field B corresponding to an inertial frame R also propagate as monochromatic
plane waves, and from which we can also find some important properties of the E wave
and B wave. To proceed, we plug Ab = Cb cos θ into Fab = ∂a Ab − ∂b Aa . Noticing that
K a ≡ ∂a θ, we have

Fab = (Ca K b − Cb K a ) sin θ = 2C[a K b] sin θ . (6.6.49)


Using the gauge freedom of Ab , we can simplify the computation of finding E a and Ba from
Fab . Choose the Lorenz gauge condition ∂ b Ab = 0. Combining this with Ab = Cb cos θ and
K a ≡ ∂ a θ, we get K a Ca sin θ = 0, and thus

K a Ca = 0 . (6.6.50)
This is in fact an equivalent formulation for $A_a$ satisfying the Lorenz gauge condition. Now
let
$$C'_a = C_a + \alpha K_a \quad (\alpha = \text{constant})\,, \tag{6.6.51}$$
then from $K^a K_a = 0$ and (6.6.49) we can easily see that the electromagnetic field $F'_{ab}$
corresponding to $C'_a$ satisfies $F'_{ab} = F_{ab}$, and thus (6.6.51) is just a gauge transformation.
[It follows from $K^a C_a = 0$ and $K^a K_a = 0$ that (6.6.51) guarantees $K^a C'_a = 0$, and so it is also a gauge
transformation within the Lorenz gauge condition]. Using the fact that the time component
$K_0$ of $K_a$ is nonvanishing, we can choose $\alpha = -C_0/K_0$ so that $C'_0 = 0$. Thus, one can
always choose a proper gauge and render the polarization vector $C^a$ a spatial vector. Later
on we will assume that $C^a$ is a spatial vector.
Let Z a = (∂/∂t)a represent the zeroth coordinate basis vector of an inertial frame, then from
E a = Fab Z b and Ba = −∗ Fab Z b we can derive from (6.6.49) that

E a = Z b (Ca K b − Cb K a ) sin θ = −ωCa sin θ

[where in the second equality we used the facts that C a is spatial (Z b Cb = 0) and ω =
−Z b K b ] and also
$$B_a = -{}^*\!F_{ab}Z^b = -\frac12 Z^b\varepsilon_{abcd}F^{cd} = \frac12\,\hat\varepsilon_{acd}\,2C^{[c}K^{d]}\sin\theta = \hat\varepsilon_{acd}\,C^cK^d\sin\theta\,,$$
where ε̂ is the volume element associated with the spatial Euclidean metric. The above two
equations can be expressed in terms of “arrows” as

$$\vec E = -\omega\vec C\sin\theta = \omega\vec C\sin(\omega t - k_i x^i)\,, \tag{6.6.52}$$
$$\vec B = \vec C\times\vec k\,\sin\theta = \vec k\times\vec C\,\sin(\omega t - k_i x^i)\,. \tag{6.6.53}$$
Therefore,
B = k̂ × E , (6.6.54)
where k̂ stands for the unit vector in the direction of k. This is exactly the often seen relation
of the electric field E, magnetic field B and the direction k̂ of propagation.
Since C0 = 0, the condition K a Ca = 0 can now be rewritten as k a Ca = 0, and thus the
3-vector C is perpendicular to k. And from (6.6.52) we also know that E is parallel to C, and
hence the electric field E is perpendicular to the direction k̂, i.e., the E wave is transverse.
On the other hand, from (6.6.54) we can see that B is perpendicular to both k̂ and E, and
hence the B wave is also transverse. Conclusion: in a monochromatic electromagnetic plane
wave, both the E wave and B wave are transverse waves with the same frequency and phase,
and the vectors E, B and k̂ have the simple relation (6.6.54).
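
The relations (6.6.52) to (6.6.54) and the transversality of the wave can be verified numerically from (6.6.49). The sketch below is only an illustration: it assumes signature (−,+,+,+), a spatial polarization vector with k^a C_a = 0, and the component convention E_i = F_{i0}, B = (F_23, F_31, F_12), which follows from E_a = F_ab Z^b and F_ij = ε_ijk B_k. All numerical values are hypothetical.

```python
import math

def cross(a, b):
    """Ordinary 3-dimensional cross product."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def dot(a, b):
    """Ordinary 3-dimensional dot product."""
    return sum(x * y for x, y in zip(a, b))

# Illustrative wave data: |k| = omega (null K^a) and C perpendicular to k.
omega = 2.0
k = [1.2, 0.0, 1.6]
C = [0.8, 0.7, -0.6]
theta = 0.4                      # value of the phase at the chosen event

K_low = [-omega] + k             # K_a = eta_ab K^b with K^0 = omega
C_low = [0.0] + C                # spatial polarization vector, C_0 = 0

# F_ab = (C_a K_b - C_b K_a) sin(theta), Eq. (6.6.49)
F = [[(C_low[a] * K_low[b] - C_low[b] * K_low[a]) * math.sin(theta)
      for b in range(4)] for a in range(4)]

E = [F[i][0] for i in (1, 2, 3)]           # E_i = F_{i0}
B = [F[2][3], F[3][1], F[1][2]]            # B_k from F_ij = eps_ijk B_k
k_hat = [x / omega for x in k]

print(E, B)
print(cross(k_hat, E))                     # agrees with B, Eq. (6.6.54)
print(dot(E, k), dot(B, k))                # both vanish: transverse waves
```

In this example E equals −ωC sin θ, B equals C × k sin θ, and |E| = |B|, which are the properties stated above for a linearly polarized plane wave.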
Since C and k are both constant vector fields, (6.6.52) and (6.6.53) represent linearly polar-
ized light. To discuss the other polarizations, it is convenient to adopt the complex represen-
tation. First, we rewrite Ab = Cb cos θ as

Ab = Re(Cb eiθ ) (Re stands for “take the real part”) , (6.6.55)
and then generalize C a to a constant complex vector field. This method will provide to
us even richer physics. The previous proof of K a Ca = 0 and the argument that C a can be
chosen to be a spatial vector field are still valid when C a is complex, and thus the discussions
and conclusions based on them still hold (including the transverse property of E and B).
The key consequence of C a being complex is that the linearly polarized light is generalized
to elliptically polarized light. Here we will only discuss the electric field E as an example.
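Now (6.6.52) should be expressed as
$$\vec E = \mathrm{Re}\big[\mathrm{i}\omega\vec C\,e^{-\mathrm{i}(\omega t - k_i x^i)}\big]\,, \tag{6.6.52'}$$
where $\vec C$ is now a complex vector. Let
$$\vec\varepsilon \equiv \mathrm{i}\omega\vec C\,e^{\mathrm{i}k_i x^i}\,, \tag{6.6.56}$$
then (6.6.52′) becomes
$$\vec E = \mathrm{Re}(\vec\varepsilon\,e^{-\mathrm{i}\omega t})\,. \tag{6.6.57}$$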
For an arbitrary observer G 0 in the frame R , ε is a fixed vector while E changes with time
according to (6.6.57). Thus, the end point of the vector (arrow) E draws a closed plane curve;
we will show that it is an ellipse. Express the complex vector field ε as the sum of its real
and imaginary parts:

ε = μ + iν (where μ and ν are real vector fields) . (6.6.58)


Let β be an arbitrary real scalar field, and define real vector fields

m ≡ μ cos β + ν sin β and n ≡ −μ sin β + ν cos β , (6.6.59)


then
ε = μ + iν = (m + in)eiβ . (6.6.60)
The advantage of introducing β is that we can choose its value such that m and n are
orthogonal to each other, and to do so β just needs to satisfy

$$\tan 2\beta = \frac{2\vec\mu\cdot\vec\nu}{\mu^2 - \nu^2}\,, \tag{6.6.61}$$

where μ2 ≡ μ · μ, ν 2 ≡ ν · ν, and we have supposed μ2  ν 2 (without loss of generality).


Plugging (6.6.60) into (6.6.57) yields

E = m cos(ωt − β) + n sin(ωt − β) . (6.6.62)


Using the orthogonality between m and n we can choose an inertial coordinate system {t, x i }
in the inertial reference frame R according to the following two requirements: ① Let G 0 be
the origin of the spatial coordinates; ② the x- and y-axes point to the directions of m and n,
respectively. Then, the three coordinate components of E are accordingly,

E 1 = m cos(ωt − β) , E 2 = n sin(ωt − β) , E3 = 0 , (6.6.63)


where m ≡ (m · m)1/2 , n ≡ (n · n)1/2 . From these we can easily find that

$$\frac{E_1^2}{m^2} + \frac{E_2^2}{n^2} = 1\,. \tag{6.6.64}$$

Fig. 6.41 The figure for discussing the Doppler effect on a light wave

Thus, as time goes on, the end point of the vector E will draw an ellipse in the x y-plane, and
therefore E indeed represents elliptically polarized light. When m = n, it becomes circularly
polarized light, and when m or n is zero, it goes back to linearly polarized light.
[The End of Optional Reading 6.6.3]
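
The construction of the orthogonal vectors m and n in (6.6.59) to (6.6.64) can be sketched numerically. This is an illustration under assumptions: μ and ν are arbitrary real 3-vectors chosen here, ω is an arbitrary value, and β is taken as the particular solution of (6.6.61) returned by `math.atan2`. The function name `principal_axes` is hypothetical.

```python
import math

def dot(a, b):
    """Ordinary 3-dimensional dot product."""
    return sum(x * y for x, y in zip(a, b))

def principal_axes(mu, nu):
    """Return beta, m, n with m = mu cos(beta) + nu sin(beta),
    n = -mu sin(beta) + nu cos(beta) and tan(2 beta) as in Eq. (6.6.61)."""
    beta = 0.5 * math.atan2(2.0 * dot(mu, nu), dot(mu, mu) - dot(nu, nu))
    c, s = math.cos(beta), math.sin(beta)
    m = [c * a + s * b for a, b in zip(mu, nu)]
    n = [-s * a + c * b for a, b in zip(mu, nu)]
    return beta, m, n

# Illustrative real and imaginary parts of the complex vector epsilon.
mu = [1.0, 0.5, 0.0]
nu = [0.2, -0.8, 0.0]
omega = 2.0

beta, m, n = principal_axes(mu, nu)

def E_field(t):
    """E = Re[(mu + i nu) exp(-i omega t)] = mu cos(omega t) + nu sin(omega t)."""
    return [a * math.cos(omega * t) + b * math.sin(omega * t)
            for a, b in zip(mu, nu)]

def ellipse_lhs(t):
    """Left-hand side of Eq. (6.6.64), using components of E along m and n."""
    E = E_field(t)
    E1 = dot(E, m) / math.sqrt(dot(m, m))
    E2 = dot(E, n) / math.sqrt(dot(n, n))
    return E1 ** 2 / dot(m, m) + E2 ** 2 / dot(n, n)

print(dot(m, n))                                 # vanishes up to rounding
print([ellipse_lhs(0.1 * j) for j in range(5)])  # each value is close to 1
```

With this choice of β, m is the longer of the two axes; choosing μ and ν parallel would make n vanish, which is the linearly polarized limit mentioned above (the sketch would then need that case handled separately).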

6.6.6 The Doppler Effect on a Light Wave

With the knowledge above (especially the 3 + 1 decomposition of the 4-velocity U a
and the wave 4-vector K a ), the discussion of the Doppler effect on a light wave in
special relativity now becomes very accessible.
Suppose an observer and a light source are undergoing arbitrary motions (their
world lines are arbitrary timelike curves), and their 4-velocities are U a and V a (see
Fig. 6.41), respectively. The light emitted at p by the light source is received at q by
the observer. Assume that this light is a locally monochromatic plane wave (apply
the geometric optics approximation). Suppose the wave 4-vector of the photon is
K a , then from (6.6.42) we can see that the angular frequency measured by V a when
emitting the light is ω = (−K a Va )| p , and the angular frequency measured by U a
when receiving the light is ω′ = (−K a Ua )|q . Now let us find the relation between
ω and ω′.
Since in flat spacetime we have the notion of absolute parallel transport, we can
parallelly transport U a |q and K a |q to p, and from the fact that parallel transport
preserves the inner product we obtain ω′ = (−K a Ua )| p . Later on we will drop the
subscript p, but one should remember that the calculation is at the point p. It follows
from (6.6.41) that
K a = ωV a + k a .

Let
$$\gamma \equiv -V^a U_a\,,$$
then
$$U^a = \gamma V^a + \gamma u^a\,,$$
where $\gamma u^a$ is the projection of $U^a$ onto the “spatial small plane” of $(p, V^a)$. Hence,
$$\omega' = -(\omega V^a + k^a)(\gamma V_a + \gamma u_a) = \gamma(\omega - k^a u_a)\,.$$
Suppose the angle between the spatial vectors $k^a$ and $u^a$ is θ. It follows from (6.6.43)
that
$$\omega' = \gamma\omega(1 - u\cos\theta)\,. \tag{6.6.65}$$
This is the quantitative relation of the Doppler effect. If θ = 0, i.e., the observer
moves away from the light source, then (6.6.65) gives
$$\omega' = \gamma\omega(1 - u) = \omega\sqrt{\frac{1-u}{1+u}} < \omega\,, \tag{6.6.66a}$$
which represents a redshift; if θ = π, i.e., the observer moves towards the light
source, then
$$\omega' = \gamma\omega(1 + u) = \omega\sqrt{\frac{1+u}{1-u}} > \omega\,, \tag{6.6.66b}$$
which represents a blueshift; if θ = π/2, i.e., the observer moves transversely, then
the relation of the frequencies is
$$\omega' = \gamma\omega\,, \tag{6.6.66c}$$

which is called the transverse Doppler effect. The above are all the Doppler effects
for a rest light source, following which one can also discuss the Doppler effects for
a rest observer (Exercise 6.20).
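
The Doppler relation for a light source at rest can be illustrated with a short sketch that evaluates the closed formula and, independently, the invariant −K^a U_a in the rest frame of the source. It assumes signature (−,+,+,+) and c = 1; the function names `doppler_factor` and `received_frequency` and the chosen speed 0.6 are hypothetical.

```python
import math

ETA = [-1.0, 1.0, 1.0, 1.0]  # diagonal of eta_ab, signature (-,+,+,+)

def inner(X, Y):
    """Minkowski inner product eta_ab X^a Y^b."""
    return sum(ETA[a] * X[a] * Y[a] for a in range(4))

def doppler_factor(u, theta):
    """Ratio of received to emitted angular frequency, gamma (1 - u cos(theta))."""
    gamma = 1.0 / math.sqrt(1.0 - u * u)
    return gamma * (1.0 - u * math.cos(theta))

def received_frequency(omega, u_vec, n_hat):
    """Evaluate -K^a U_a in the rest frame of the source, where V^a = (1, 0, 0, 0),
    K^a = omega (1, n_hat) and U^a = gamma (1, u_vec)."""
    gamma = 1.0 / math.sqrt(1.0 - sum(x * x for x in u_vec))
    K = [omega] + [omega * x for x in n_hat]
    U = [gamma] + [gamma * x for x in u_vec]
    return -inner(K, U)

u = 0.6
receding = doppler_factor(u, 0.0)             # observer moves away: redshift
approaching = doppler_factor(u, math.pi)      # observer moves towards: blueshift
transverse = doppler_factor(u, math.pi / 2)   # transverse Doppler effect
print(receding, approaching, transverse)

# Cross-check against the 4-vector expression for an oblique angle.
theta = 1.0
oblique = received_frequency(1.0, [u * math.cos(theta), u * math.sin(theta), 0.0],
                             [1.0, 0.0, 0.0])
print(oblique, doppler_factor(u, theta))
```

For u = 0.6 the three special cases come out close to 0.5, 2 and 1.25, matching the square-root expressions for the redshift and blueshift and the factor γ for the transverse case.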

Exercises

˜6.1. The relative speed between two inertial observers is u = 0.6c. Both of their
clocks C and C′ are zeroed when they meet each other. Use a spacetime
diagram to discuss the following questions: (a) In the inertial reference frame
of C (according to its judgement of simultaneity), what is the reading of C′
when the reading of C is 5 µs? (b) When the reading of C is 5 µs, what is the
actual reading of C′ seen by the observer carrying C?
˜6.2. A celestial object is moving away from us with a constant speed 0.8c straight
forward. The light flash it radiates has a period of 5 days when detected by
us. Using a spacetime diagram, find the period of the light flash measured by
an observer on that celestial object.

Fig. 6.42 Figure for Exercise 6.4

˜6.3. Denote the arc length of the segments oa and oe in Fig. 6.20 as τ and τ′,
respectively. (a) Express τ′/τ in terms of the relative speed of the two clocks.
(b) Find the value of τ′/τ in the cases where u = 0.6c and u = 0.8c.
6.4. Three inertial point masses A, B and C are aligned and moving along a
straight line (see Fig. 6.42) with relative speeds u B A = 0.6c and u C A = 0.8c.
Suppose B thinks (measures) that C moves 60 m. Make a spacetime diagram
and find the time of this process measured by A.
˜6.5. A and B are two inertial observers in the same inertial frame that are emitting
neutrons toward each other. Each neutron leaves its neutron source at a relative
speed of 0.6c. Suppose the emission rate of the source B measured by B is
10⁴ s⁻¹ (i.e., 10⁴ per second). Using a spacetime diagram, find the emission
rate of the source B measured in the reference frame of a neutron emitted by
A (according to the neutron’s standard clock).
˜6.6. The mean lifetime of muons at rest is τ0 = 2 × 10⁻⁶ s. A muon produced by
cosmic rays is traveling down with a constant speed 0.995c relative to the
Earth. Using a spacetime diagram, find (a) the mean lifetime of the muon
measured by an Earth observer; (b) the distance that muon travels within its
lifetime measured by an Earth observer.
6.7. From the perspective of an inertial frame R, two standard clocks $C_1$ and $C_2$
at a place A start to move together with a constant speed v = 0.6c after being
zeroed. Both of the clocks arrive at another place B when their reading is 1 s.
$C_1$ turns back to A with a constant speed v right after it arrives at B, while
$C_2$ stays at B for 1 s (according to its reading) and then gets back to A with a
constant speed v. There is another clock $C_3$ staying at A all the time, which
is also zeroed at the time when $C_1$ and $C_2$ leave A. (a) Sketch the world lines
of $C_1$, $C_2$ and $C_3$. (b) Find the readings $\tau_1$, $\tau_2$ and $\tau_3$ of these three clocks
when $C_2$ gets back to A.
˜6.8. (Multiple choice). A pair of twins A and B stand still at the same spatial point
in an inertial frame R. At some moment when A and B are the same age, A
starts to move eastward under an inertial motion with a speed u relative to
the frame R. A while later, B also moves eastward and catches up with A at a
speed v > u. When they meet each other again, A will be
(1) older than B, (2) younger than B, (3) the same age as B.
˜6.9. Two standard clocks A and B stand still at the same spatial point in an
inertial frame. At some moment, A starts to move in a straight line with a
speed u = 0.6c. 2 s later (according to the clock A), A turns around and
moves back with a speed u = 0.6c. Both of the clocks are zeroed when they
are separated. (1) Find the readings of both clocks when they meet again. (2)
What is the reading of B viewed by A when A’s reading is 3 s?
˜6.10. The equatorial speed of the Earth’s rotation is about 1600 km/h. A and B are
twins standing on the equator. A flies westward by plane along the equator
for one lap at a speed of 1600 km/h and meets B again when he gets back.
(Ignore the effects of the gravitational fields of the Earth and the Sun. We
will see in Chap. 7 that the existence of gravitational fields corresponds to
a curved spacetime). (a) Sketch the world sheet of the Earth’s surface and
the world lines of A and B (note that the motion of A cancels the Earth’s
rotation, and thus A is the inertial observer). (b) Which one of A and B is
younger? (c) What is their age difference? (Answer: about $10^{-7}$ s). NB: This
experiment was done in 1971 using cesium atomic clocks, not humans,
of course. See Hafele and Keating (1972a; 1972b).
˜6.11. A car whose rest length is l = 5 m moves into a garage with a constant
speed u = 0.6c. The garage has a solid back wall. To simplify the problem,
we assume the information of the car’s front hitting the wall propagates
at the speed of light, and each part of the car will stop once receiving this
information. (a) Suppose the doorman of the garage measures that the reading
of a clock C at the back of the car is zero; find the reading of C when the
back of the car “learns” that the front hits the wall. (b) Find the rest length $\hat{l}$
of the car after it comes to a complete stop. (c) Express the ratio $\hat{l}/l$ in terms
of u.
6.12. Prove Proposition 6.3.4.
˜6.13. Suppose the world line of an observer is a hyperbola G in the tx-plane
(see Fig. 6.43), which satisfies x > 0 and $x^2 - t^2 = K^2$ (K is a constant).
Find $A^a A_a$, i.e., the squared magnitude of the observer’s 4-acceleration $A^a$.
(The result is a constant, and thus G is called an observer undergoing constant
acceleration motion. Note that the acceleration here refers to the 4-acceleration).
˜6.14. Prove Proposition 6.6.2.
*6.15. Suppose the electric field and the magnetic field measured from $F_{ab}$ by an
instantaneous observer are respectively $E^a$ and $B^a$ (also denoted by $\vec{E}$ and
$\vec{B}$). Show that:
(1) $F_{ab} F^{ab} = 2(B^2 - E^2)$,
(2) $F_{ab}\,{}^*F^{ab} = 4\,\vec{E} \cdot \vec{B}$. Hint: one may write $F_{ab}\,{}^*F^{ab}$ in terms of
its components in an inertial coordinate system.
NB: this problem indicates that, although $\vec{E}$ and $\vec{B}$ are observer-dependent,
$B^2 - E^2$ and $\vec{E} \cdot \vec{B}$ are independent of the observer. In fact, these are the
only two independent invariants one can construct from $F_{ab}$.
˜6.16. Prove Proposition 6.6.5 (one only needs to prove the last two Maxwell’s
equations).

Fig. 6.43 Figure for Exercise 6.13

˜6.17. Show that the energy density and the 3-momentum density of an electromagnetic
field measured by an instantaneous observer are respectively $T_{00} = (E^2 + B^2)/8\pi$
and $w_i = -T_{i0} = (\vec{E} \times \vec{B})_i/4\pi$, $i = 1, 2, 3$. Hint: using the
symmetric expression (6.6.28′) for $T_{ab}$, one can simplify the calculation of $T_{00}$.
6.18. (a) Show that the energy-momentum tensor $T_{ab}$ for an electromagnetic field
$F_{ab}$ whose 4-current density is $J^a$ satisfies $\partial^a T_{ab} = -F_{bc} J^c$. (Thus, we can
see that $\partial^a T_{ab} = 0$ when $J^a = 0$).
(b) Show that the time component of the above equation in an inertial coordinate
system reflects the conservation of energy [cf. (6.108) of Jackson
(1998)]; the spatial components reflect the conservation of 3-momentum
[cf. (6.121) of Jackson (1998)]. Hint: rewrite $F_{bc} J^c$ as the Lorentz force
density using the expression (6.6.20) for the Lorentz 4-force.
6.19. Show that the $a^a$ and φ in (6.6.29) satisfy $\vec{B} = \nabla \times \vec{a}$ and
$\vec{E} = -\nabla\phi - \partial \vec{a}/\partial t$, and thus are indeed the 3-vector potential and the
scalar potential in electrodynamics.
6.20. Discuss the Doppler effects for a rest observer by following the discussion
in Sect. 6.6.6. You will find that the frequency relation for the transverse
Doppler effect is $\omega' = \gamma^{-1}\omega$.
6.21. Read Optional Reading 6.1.6. (a) Show that $\nabla_a (dt)_b = 0$, where t is the
absolute time, and $\nabla_a$ is the derivative operator of the Newtonian spacetime.
Hint: start from (5.7.2). (b) Suppose $w^a$ is a spatial vector (i.e., a vector
tangent to a surface of absolute simultaneity), and $v^a$ is an arbitrary 4-vector.
Show that $v^a \nabla_a w^b$ is still a spatial vector. Hint: notice that $\nabla_a t$ is the normal
covector of a surface of absolute simultaneity.

References

Griffiths, D. J. (2013), Introduction to Electrodynamics, Pearson, London.
Guo, S.-H. (2008), Electrodynamics (in Chinese), Higher Education Press, Beijing.
Hafele, J. C. and Keating, R. E. (1972a), ‘Around-the-world atomic clocks: observed relativistic
time gains’, Science 177(4044), 168–170.
Hafele, J. C. and Keating, R. E. (1972b), ‘Around-the-world atomic clocks: predicted relativistic
time gains’, Science 177(4044), 166–167.
Jackson, J. D. (1998), Classical Electrodynamics, John Wiley & Sons, Inc., New York.
Landau, L. D. and Lifshitz, E. M. (1987), Fluid Mechanics, Pergamon Press, Oxford.
Misner, C., Thorne, K. and Wheeler, J. (1973), Gravitation, W. H. Freeman and Company, San
Francisco.
Rindler, W. (1982), Introduction to Special Relativity, Clarendon Press, Oxford.
Sachs, R. K. and Wu, H. (1977), General Relativity for Mathematicians, Springer-Verlag, New York.
Synge, J. L. (1956), Relativity: The Special Theory, North-Holland Publishing Company, Amsterdam.
Wald, R. M. (1977), Space, Time, and Gravity: The Theory of the Big Bang and Black Holes, The
University of Chicago Press, Chicago.
Zhou, G., Yan, Z., Xu, S. and Zhang, K. (2000), Fluid Mechanics (in Chinese), Vol. 1, Higher
Education Press, Beijing.
Chapter 7
Foundations of General Relativity

7.1 Gravity and Spacetime Geometry

The principle of relativity requires that the laws of physics have the same mathemat-
ical expression in all inertial coordinate systems. When applied to special relativity,
this “law of laws” requires that the mathematical expressions for the laws of physics
be Lorentz covariant. Therefore, when formulating physics in the framework of
special relativity, all the known laws of physics should be inspected; those that sat-
isfy this requirement remain laws, while those that do not must be reformed until
they meet this criterion. First, we inspect Maxwell’s theory of electromagnetism.
Maxwell’s equations are endowed with Lorentz covariance (which can be seen more
explicitly in its 4-dimensional formulation, see Sect. 6.6), and thus can be integrated
into the framework of special relativity without being reformed. This is in fact not
strange at all, since one of the important reasons special relativity came about is that
Maxwell’s theory contradicts the notion of pre-relativity spacetime. Next, we will
inspect Newton’s laws of motion. As an example, consider the law of conservation
of momentum. As we pointed out at the beginning of Sect. 6.3, if the definition of
momentum $\vec{p} = m\vec{u}$ is still used, then conservation of momentum violates Lorentz
covariance and must be modified. By redefining momentum as $\vec{p} = m\vec{u}(1 - u^2)^{-1/2}$,
the law of conservation of momentum becomes Lorentz covariant, making it a valid law
in the framework of special relativity. Thirdly, let us inspect Newton’s theory of universal
gravity. The basic equation in Newton’s theory of gravity is Poisson’s equation
$\nabla^2 \phi = 4\pi\rho$, which relates the gravitational potential φ to
the mass density ρ.¹ This equation has Galilean covariance but not Lorentz covariance,
and hence should be modified. From another perspective, Poisson’s equation
$\nabla^2 \phi = 4\pi\rho$ has a solution of the following form:

¹ In Chap. 6, we used ρ and μ to represent the charge density and mass density, respectively. From
this chapter on, since the charge density will show up less frequently, we will follow the convention
of the majority and use ρ to represent the mass density.
© Science Press 2023
C. Liang and B. Zhou, Differential Geometry and General Relativity,
Graduate Texts in Physics, https://doi.org/10.1007/978-981-99-0022-0_7

$$\phi(\vec{r}, t) = -\int \frac{\rho(\vec{r}\,', t)}{|\vec{r} - \vec{r}\,'|}\, dV' ,$$

which indicates that the gravitational potential φ at a point r and a time t is determined
by the mass density ρ at all spatial points at t. This means that the gravitational field
has an infinite speed of propagation, which obviously contradicts special relativity.
Thus, Newton’s theory of gravity must be modified.
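
As a brief check of the claim above, the following sketch verifies that the integral expression solves Poisson’s equation. It assumes the standard identity for the Laplacian of $1/|\vec{r} - \vec{r}\,'|$ and units with $G = 1$, as in the text; the contrast with the retarded potential of electrodynamics (with $c = 1$) is added here only for illustration.

```latex
% Assumed identity: the Laplacian (with respect to \vec r) of the Coulomb-type kernel
\nabla^2 \frac{1}{|\vec r - \vec r\,'|} = -4\pi\, \delta^3(\vec r - \vec r\,') .

% Applying \nabla^2 under the integral sign:
\nabla^2 \phi(\vec r, t)
  = -\int \rho(\vec r\,', t)\, \nabla^2 \frac{1}{|\vec r - \vec r\,'|}\, dV'
  = 4\pi \int \rho(\vec r\,', t)\, \delta^3(\vec r - \vec r\,')\, dV'
  = 4\pi \rho(\vec r, t) .

% For comparison, a retarded potential evaluates the source at the earlier time
% t - |\vec r - \vec r\,'| rather than at the same time t:
\phi_{\rm ret}(\vec r, t)
  = -\int \frac{\rho(\vec r\,',\, t - |\vec r - \vec r\,'|)}{|\vec r - \vec r\,'|}\, dV' .
```

Both sides of the Newtonian solution are evaluated at the same time t, which is precisely the instantaneous dependence described above.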
The form of Newton’s law of universal gravity is quite similar to Coulomb’s law
of electrostatics. Since James Clerk Maxwell reformulated and generalized electro-
statics to such a beautiful theory of electromagnetism, it does not seem that it should
be difficult to reformulate Newton’s theory of gravity into a theory that fits in the
framework of special relativity. However, the situation is much more complicated
than this. The key point is, although the law of universal gravity and Coulomb’s law
are similar, there exists a “sign difference”. There are two types of electric charge
(positive and negative; like charges repel, while opposite charges attract); however,
masses can only be positive, and hence can only attract other masses. Following the
theory of electromagnetism, one might construct a gravitational theory within the
framework of special relativity, and according to this theory there will be gravita-
tional waves similar to electromagnetic waves when the gravitational field changes,
which also propagate at the speed of light. Unfortunately, due to the sign difference
we just mentioned, the energy carried away by such a gravitational wave has to be
negative. This means that the energy of a system will increase when radiating gravi-
tational waves, which will result in the intensity of the radiation increasing, bringing
more energy into the system. This cycle inevitably leads to physically absurd consequences.
Although this difficulty can be overcome by modifying the theory, new
difficulties will show up. In fact, there exists far more than one gravitational theory in the
framework of special relativity; however, each theory has its own problems. Although
one cannot completely rule out the possibility of building a satisfactory gravitational
theory in the framework of special relativity, Albert Einstein struck out on his own
and successfully created a revolutionary gravitational theory independent of special
relativity; this brand new theory is named general relativity. Interestingly, after having
tried to modify a gravitational theory in the framework of special relativity in
order to overcome its difficulties, what people obtained at last is a theory exactly the
same as Einstein’s general relativity!
There are two important factors that motivated Einstein to set up general relativ-
ity: the “universality” of gravity and Mach’s principle. Here we will only introduce
the former one. The meaning of the “universality” of Newtonian gravity is twofold:
① Every massive object exerts forces on other massive objects as a source of the
gravitational field, and any massive object in a gravitational field will in return expe-
rience a gravitational force. (A neutral object in an electrostatic field neither exerts
nor experiences any electric force, and hence the electric force is not universal). ②
Any two objects with the same initial position and velocity experiencing only a grav-
itational force must have the same position and velocity as one another at any given
moment, regardless of their mass and composition. This conclusion has been verified
by numerous increasingly precise experiments, which can be expressed as: any two
point masses at the same point in a gravitational field have the same gravitational
acceleration. Although this is not a surprising conclusion at all, why is this so? Two
point charges in an electrostatic field are not like this. Suppose a point charge q of mass m
is located at a place where the electric field is $\vec{E}$; then the electric force
acting on it is $\vec{f} = q\vec{E}$, and the acceleration it acquires is

$$\vec{a} = \frac{\vec{f}}{m} = \frac{q}{m}\vec{E} . \tag{7.1.1}$$

If we place another point charge q′ with a mass m′ at this same point, then its
acceleration will be $\vec{a}\,' = (q'/m')\vec{E}$. $\vec{a}\,'$ and $\vec{a}$ are not equal unless the two
charges have the same charge-to-mass ratio. When having a similar discussion about gravity, we may also
distinguish the “mass” and the “charge”. The “charge” of a point mass is a measure
of the amount of matter it contains, which determines the force it experiences in
a gravitational field, and thus can be called the gravitational mass, denoted by
$m_G$; the “mass” of a point mass is a measure of its inertia, which determines its
acceleration when a force is applied, and thus can be called the inertial mass, denoted
by $m_I$ [i.e., the m in (7.1.1)].² Following the discussion above it is not difficult
to determine that the gravitational acceleration of a point mass in a gravitational
field is $\vec{a} = (m_G/m_I)\vec{g}$, where $\vec{g}$ is the gravitational field strength at this point. If
different point masses had different “charge-to-mass” ratios $m_G/m_I$, then they could not have the
same gravitational acceleration at the same point in a gravitational field. However,
countless experiments, each one more precise than the last, have shown that the ratio
$m_G/m_I$ is the same for any point mass; by adjusting the gravitational constant G one
can even set the ratio to 1 and make it as simple as $m_G = m_I$. This fact is usually
called the equivalence principle (see Sect. 7.5 for details). This is an extremely
unusual experimental fact which deserves serious consideration. The “charge” and
“mass” for gravity are two completely different concepts, so how could they be
equal? This question cannot be answered by Newton’s theory of gravity. In Newton’s
theory of gravity, this is admitted as an experimental fact (it is an axiom in Newton’s
formalism). Is $m_G = m_I$ just a coincidence? Could there be any deeper reason hiding
underneath this fact? Could there exist a theory that is more beautiful, in which
$m_G = m_I$ can be proved by reasoning? Pondering over the equivalence principle,
in addition to the inspiration from Mach’s principle, led Einstein to the creation of
general relativity.
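
The contrast between the two kinds of “charge-to-mass ratio” can be illustrated numerically. The following is only a minimal sketch: the function names and all numerical values are hypothetical, chosen for illustration, and the fields are treated as one-dimensional.

```python
# Illustrative comparison of a = (q/m) E and a = (m_G/m_I) g.
# All names and numbers here are hypothetical, not taken from the text.

def electric_acceleration(q, m, E):
    """Acceleration of a point charge q with inertial mass m in a field E, cf. (7.1.1)."""
    return q / m * E


def gravitational_acceleration(m_G, m_I, g):
    """Acceleration of a point mass with gravitational mass m_G and inertial mass m_I."""
    return m_G / m_I * g


E = 100.0  # electric field strength (arbitrary units)
g = 9.8    # gravitational field strength (arbitrary units)

# Two point charges with different charge-to-mass ratios accelerate differently.
a1 = electric_acceleration(q=1.0, m=1.0, E=E)
a2 = electric_acceleration(q=1.0, m=2.0, E=E)

# Two point masses with m_G = m_I accelerate identically, whatever their mass.
b1 = gravitational_acceleration(m_G=1.0, m_I=1.0, g=g)
b2 = gravitational_acceleration(m_G=5.0, m_I=5.0, g=g)

print(a1, a2)  # the electric accelerations differ
print(b1, b2)  # the gravitational accelerations coincide
```

The point of the sketch is only that the mass drops out of the gravitational case once $m_G = m_I$ is imposed, whereas nothing forces q/m to be the same for different charges.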
The fact that $m_G = m_I$ is equivalent to the fact that all the objects in a gravitational
field that experience no force other than gravity and have the same position and
velocity will “march together”. This kind of characterless collective behavior strongly
implies that gravity is an intrinsic property of the whole spacetime background, which

² Up to now, we have been discussing in terms of Newtonian gravity. In Newton’s theory of gravity,
there are two types of gravitational mass: active and passive. The former refers to the mass of an
object as a source of its gravitational field, which determines the strength of the gravitational field
it produces; the latter refers to the gravitational mass of the object as a test point mass in an external
gravitational field, which determines the strength of the gravitational force it experiences in a given
gravitational field. The gravitational mass in the main text refers to the passive gravitational mass.

is substantially different from all the other forces. Physics is the study of the motion
(evolution) of physical objects. Physical objects can be compared to actors. Just like
the performance of actors cannot be done without a stage, the evolution of physical
objects also always happens on some kind of stage (or background), and this stage
(background) is spacetime. Before general relativity came out, people used to assume
that the background spacetime of relativity is Minkowski spacetime. Minkowski
spacetime is so simple that people often forgot that it exists. The “marching together”
phenomenon in the gravitational field attracted Einstein’s attention to the spacetime
background. Just like the actors on a lifting stage can be raised simultaneously without
any effort due to the behavior of the stage itself, this “marching together” phenomenon
under the gravitational force rather strongly implies that gravity is purely an effect
of spacetime background. One may speculate as follows: when gravity is negligible,
then the spacetime is flat; when gravity is non-negligible (e.g., when the gravitational
field of the Earth or the Sun must be considered), the spacetime becomes curved,
and how it is curved depends on the distribution of the matter which produces the
gravitational field. According to this hypothesis, gravity is so distinct from other
forces that in the 4-dimensional language it is not even a force, but the effect of
curved spacetime! Therefore, a point mass that experiences no force other than gravity
should be called a free point mass. Recalling that the world line of a free point mass in
Minkowski spacetime is a geodesic, one can further assume naturally that the world
line of a free point mass in curved spacetime is also a geodesic (of that spacetime).³
A free point mass is the simplest point mass and a geodesic is the simplest world
line, and thus this assumption is also in conformity with aesthetic principles. Instead
of a 4-dimensional force called “gravity” being exerted on a point mass, the existence of
gravity is manifested by a curved spacetime, which changes the motion of a point
mass by changing its geodesic. This is the most basic postulate of general relativity.
Based on this postulate, one can deduce $m_G = m_I$ as a logical consequence. (Here
we come to the decisive step). Suppose two free point masses have the same initial
velocity and position, i.e., their world lines intersect and their tangent vectors are
equal at the intersection. Since the world line of a free point mass is a geodesic,
which is uniquely determined by the initial conditions, i.e., the starting point of the
geodesic and the tangent vector there (see Theorem 3.3.4), these two world lines must
coincide. Translating to the language of physics, this is to say that the states of two
free point masses with the same initial condition in a gravitational field must be the
same at any later time, which is exactly an equivalent expression for $m_G = m_I$. Thus,
once one realizes that gravity in essence is the curvature of 4-dimensional spacetime,
the experimental fact $m_G = m_I$, which was long mysterious in origin, becomes a very
natural conclusion. In its unique and elegant way, general relativity interprets gravity
as a geometric effect of a 4-dimensional spacetime for the first time (which also
unifies gravity and geometry for the first time), and the key to success is adding
the time dimension. Using 3-dimensional space alone, one cannot interpret
gravity as a geometric effect.

³ The gravitational field produced by the point mass is ignored (similar to the treatment of a test
charge in electromagnetism).

Remark 1 Optional Reading 8.3.2 will provide a more detailed interpretation of the
statement “gravity is an effect of curved spacetime”.

The discussion above indicates that general relativity is a theory independent of
the framework of special relativity. The framework of special relativity cannot
accommodate general relativity or gravity.
Formulated in more modern language, the most basic postulates of general relativity
can be summarized as the following three points. (The basic postulates of
general relativity are summarized differently in different literature. Here is just a
pedagogical way of listing them).
(a) Gravity in the 3-dimensional space in essence is the effect of the 4-dimensional
spacetime curvature. That is, when gravity exists, the spacetime background is no
longer Minkowski spacetime $(\mathbb{R}^4, \eta_{ab})$; instead, it is a curved spacetime $(M, g_{ab})$,
where M is a 4-dimensional manifold and $g_{ab}$ is a non-flat Lorentzian metric field
on M. This postulate boldly identifies gravity in physics as a pure geometric effect
of the spacetime. Based on this, a point mass that experiences no force other than
gravity is naturally a free point mass.
(b) The world line of a free point mass is a geodesic of the curved spacetime
$(M, g_{ab})$ it is in. Given postulate (a), postulate (b) is quite natural. When
gravity does not exist, the spacetime background is Minkowski spacetime $(\mathbb{R}^4, \eta_{ab})$.
According to Sect. 6.3, the equation of motion for a point mass is

$$F^a = U^b \partial_b P^a , \tag{7.1.2}$$

where $\partial_b$ is the derivative operator associated with the Minkowski metric $\eta_{ab}$. When
gravity exists, a natural assumption is to change the $\partial_b$ in the above equation to
the derivative operator $\nabla_b$ associated with the curved metric $g_{ab}$, and to regard the
4-force on a free point mass as vanishing. Hence, its equation of motion is

$$0 = U^b \nabla_b (m U^a) = m U^b \nabla_b U^a , \tag{7.1.3}$$

and thus a free point mass moves along a geodesic. This proposition is very similar to
the corresponding proposition without gravity; the only difference is that, when gravity
does not exist, the world line of a free point mass is a geodesic of Minkowski spacetime,
whereas when gravity exists, the world line of a free point mass is a geodesic of the curved
spacetime. This is exactly a manifestation of the fact that general relativity is independent
of special relativity. In general relativity, gravity is not represented by a 4-force on
the left-hand side of the equation of motion (7.1.2); instead, its effect on the motion of
a point mass is manifested by making the spacetime curved and requiring the point
mass to move along a geodesic in the curved spacetime. In other words, the effect of
gravity is substituting $\nabla_b$ for $\partial_b$ on the right-hand side of (7.1.2).
(c) The way that the spacetime is curved is affected by the matter distribution. The
specific relation is described by Einstein’s equations. [For details, see Sect. 7.7; once
we have Einstein’s equations, it will be clear that (b) is not an independent postulate
any more].
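
To make postulate (b) concrete, the geodesic condition in (7.1.3) can be written out in a coordinate system. The following is a sketch in standard notation; the coordinates $x^\mu$, the proper time τ as the curve parameter, and the Christoffel symbols $\Gamma^\mu{}_{\nu\sigma}$ of $g_{ab}$ in these coordinates are notational assumptions not introduced in this section.

```latex
% With U^\mu = dx^\mu / d\tau, the abstract equation U^b \nabla_b U^a = 0 reads
\frac{d^2 x^\mu}{d\tau^2}
  + \Gamma^\mu{}_{\nu\sigma}\, \frac{dx^\nu}{d\tau}\, \frac{dx^\sigma}{d\tau} = 0 ,
\qquad \mu = 0, 1, 2, 3 .

% In an inertial coordinate system of Minkowski spacetime all \Gamma^\mu{}_{\nu\sigma}
% vanish, and the equation reduces to that of a straight world line:
\frac{d^2 x^\mu}{d\tau^2} = 0 .
```

Note that the rest mass m does not appear in the component equation, which is consistent with the statement that all free point masses with the same initial position and velocity share the same world line.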

It can be proved that when gravity is weak enough and the velocity of the point
mass is low enough, the results of general relativity agree approximately with those of
Newtonian mechanics. Thus, Newtonian mechanics can be regarded as
the weak-field and low-speed approximation of general relativistic mechanics (see Sect.
7.8.2). Nonetheless, we should point out that, although the results are approximately the
same, the viewpoints are explicitly different. Take the free fall of an apple as an example.
According to Newtonian mechanics, this apple acquires an acceleration because
it experiences the Earth’s gravity, and thus undergoes a non-inertial motion. However,
according to general relativity, the apple does not experience a 4-force, and thus is a
free point mass. The effect from the Earth is that the spacetime becomes curved, and
the world line of the apple is a geodesic in this curved spacetime, whose 4-acceleration
(defined as $A^a \equiv U^b \nabla_b U^a$, where $U^a$ is the 4-velocity and $\nabla_b$ is the derivative operator
associated with the metric of the curved spacetime) is zero. That is, for the same motion
of the apple’s free fall, in Newton’s theory it has a (3-dimensional) acceleration (relative
to an inertial frame), while in general relativity it does not have any (4-dimensional)
acceleration. Conversely, now suppose the apple is at rest on the ground. In Newton’s
theory, the Earth’s gravity is canceled by the normal force from the ground, and thus the
apple remains at rest with zero (3-dimensional) acceleration, i.e., it undergoes an inertial
motion; while in general relativity, the apple only experiences one 4-dimensional
force (the normal force from the ground), and thus its world line is not a geodesic and
its (4-dimensional) acceleration is nonzero. Have you realized that while you sit cosily
reading this book, your 4-dimensional acceleration is not zero due to the curved spacetime
caused by the Earth?
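
The last statement can be made quantitative with a short estimate. The sketch below assumes the standard static weak-field line element with Newtonian potential φ (with $|\phi| \ll 1$, $G = c = 1$, signature $(-,+,+,+)$); this metric is not derived in this section and is used here only as an illustrative assumption.

```latex
% Assumed weak-field metric (static, \partial_t \phi = 0):
ds^2 = -(1 + 2\phi)\, dt^2 + (1 - 2\phi)\,(dx^2 + dy^2 + dz^2) .

% A static observer (the apple on the ground, or the seated reader) has 4-velocity
U^\mu = \big( (1 + 2\phi)^{-1/2},\, 0,\, 0,\, 0 \big) ,

% so its 4-acceleration A^\mu = U^\nu \partial_\nu U^\mu
%   + \Gamma^\mu{}_{\nu\sigma} U^\nu U^\sigma has components
A^t = 0 , \qquad
A^i = \Gamma^i{}_{tt}\, (U^t)^2
    = \frac{\partial_i \phi}{(1 - 2\phi)(1 + 2\phi)}
    \approx \partial_i \phi .
```

To first order in φ, the magnitude of the 4-acceleration is therefore $|\nabla\phi|$, i.e., the familiar Newtonian field strength (about $9.8\ \mathrm{m/s^2}$ at the Earth’s surface), and it points upward, in the direction of the normal force.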
Attributing gravity to curved spacetime is a great triumph of human wisdom.
Bernhard Riemann presented the concept of intrinsic curvature, as well as how
to compute it, when he was only 27 years old (in 1854). Before his early death (at
age 39), Riemann had attempted to find a theory that unifies electromagnetism and
gravity. The most important reason that it did not work out is that he focused on
space and the spatial curvature rather than spacetime and the spacetime curvature. It
was not until 1905, when special relativity came out, that space and time were treated
equally (in fact, it was not until 1908 that Hermann Minkowski brought up the
absolute concept of spacetime, see Chap. 14 in Volume II). Finally, a few years after
that, the groundbreaking idea that “gravity in essence is the curvature of spacetime”
was gradually established along with Einstein’s conception of general relativity.

7.2 Physical Laws in Curved Spacetime

In the view of general relativity, every physical phenomenon is nothing but the evolution
of physical objects in some curved spacetime background $(M, g_{ab})$. Therefore, to
study physics from the viewpoint of general relativity, one first needs to find the evolution
equations of those physical objects on the given curved spacetime background.
Since the gravitational field in everyday life or in a laboratory is too weak, the difference
between general relativity and Newton’s theory of gravity is normally hard to
measure, and it is hopeless to deduce the physical laws in curved spacetime from
observations or experiments. Therefore, one can only “guess” these laws by making
hypotheses based on some fundamental principles, and the validity of the hypotheses
can be verified by the consistency of the conclusions derived from them as well as,
if possible, the results of experiments. Of course, this “guess” is warranted, and
one of the important bases is the principle of general covariance. When formulating
general relativity, Einstein proposed the following principle of general covariance:
the mathematical expressions for all physical laws do not change under an arbitrary
coordinate transformation. However, an article by E. Kretschmann in 1917 argued
that this formulation of the principle of general covariance imposes no restriction
on the laws of physics. Even Newton’s equation of motion can be made generally
covariant by a non-substantive reformulation [see Ohanian and Ruffini (1994)]. This
criticism triggered a heated discussion among physicists (including Einstein himself),
and thus many different formulations of the principle of general covariance
were proposed. Here we introduce the following formulation, which not only grasps the
essence but is also convenient to apply [see Wald (1984) pp. 57, 68]:

Principle of General Covariance. The spacetime metric and quantities derivable
from it are the only background geometric quantities that are allowed to appear in
the expressions of physical laws.
Remark 1 Physical objects are like actors, while the spacetime is like the stage (background).
Once a spacetime $(M, g_{ab})$ is given, the actors have a stage. In physical laws,
there will certainly be physical quantities (dynamical quantities) that represent physical
objects, such as the 4-momentum $P^a$ of a point mass and the electromagnetic field
tensor $F_{ab}$, etc. However, physical laws can also contain spacetime geometric quantities
that reflect the stage (background), which are the spacetime metric $g_{ab}$ and quantities
derivable from it (such as the derivative operator $\nabla_a$ associated with the metric $g_{ab}$
and its curvature quantities $R_{abc}{}^d$, $R_{ab}$, $R$, etc.). The essence of the principle of general
covariance is to eliminate all artificial factors (those independent of the intrinsic geometry
of the spacetime) in the expressions of physical laws. For instance, the ordinary derivative operator
$\partial_a$ of a coordinate system or a vector field $v^a$ assigned artificially cannot appear in
a physical law, since they are neither the physical objects we study nor the intrinsic
factors of the spacetime background $(M, g_{ab})$. Allowing $\partial_a$ to appear in a physical
law means that the coordinate system corresponding to $\partial_a$ is in a special position
among all the coordinate systems, which is not allowed by the principle of general
covariance.
Remark 2 The above formulation of the principle of general covariance is particularly
suitable for textbooks adopting the abstract index notation. Many textbooks
that do not use abstract indices have the following conclusion: Any physical law that
can be expressed as an equality of tensors must be generally covariant. For example,
suppose T and S are both tensors of type (1, 1); then the equation T = S is generally
covariant since its component expressions in any two coordinate systems $\{x^\mu\}$
and $\{x'^\mu\}$ are obviously $T^\mu{}_\nu = S^\mu{}_\nu$ and $T'^\mu{}_\nu = S'^\mu{}_\nu$, i.e., the component
expressions have the same form under any coordinate transformation (which agrees with
the formulation of the principle of general covariance by Einstein). In contrast, the
Christoffel symbols $\Gamma^\sigma{}_{\mu\nu}$ do not obey the tensor transformation law, which means
an equation that contains Christoffel symbols is not an equality of tensors, and thus
is not generally covariant. However, in textbooks that use abstract indices, even the
Christoffel symbol $\Gamma^c{}_{ab}$ is regarded as a tensor (associated with a coordinate system);
the same holds for $\partial_a v^b$, the result of $\partial_a$ of a coordinate system acting on a vector
field $v^a$. The equations that contain $\Gamma^c{}_{ab}$ and $\partial_a v^b$ are still to be viewed as equalities
of tensors. The reason why they are not generally covariant is that they do not
satisfy the formulation of the principle of general covariance we introduced above,
since they contain quantities not derivable from $g_{ab}$, i.e., $\Gamma^c{}_{ab}$ and $\partial_a v^b$, which puts
the coordinate system corresponding to $\Gamma^c{}_{ab}$ and $\partial_a v^b$ in a special position. In a word,
both kinds of textbooks say that an equation containing Christoffel symbols is not
generally covariant, but their reasons are different (due to different formulations of
the principle of general covariance).
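
For reference, the statement that the Christoffel symbols “do not obey the tensor transformation law” can be made explicit in the component language of textbooks that do not use abstract indices. The following is the standard transformation law between two coordinate systems $\{x^\mu\}$ and $\{x'^\mu\}$; it is quoted here as a sketch and is not derived in this section.

```latex
% Transformation law of the Christoffel symbols under x^\mu \to x'^\mu:
\Gamma'^{\sigma}{}_{\mu\nu}
  = \frac{\partial x'^{\sigma}}{\partial x^{\rho}}\,
    \frac{\partial x^{\alpha}}{\partial x'^{\mu}}\,
    \frac{\partial x^{\beta}}{\partial x'^{\nu}}\,
    \Gamma^{\rho}{}_{\alpha\beta}
  + \frac{\partial x'^{\sigma}}{\partial x^{\rho}}\,
    \frac{\partial^2 x^{\rho}}{\partial x'^{\mu}\, \partial x'^{\nu}} .

% The first term is the tensorial (homogeneous) part; the second, inhomogeneous term
% is absent for a genuine tensor of type (1, 2) and vanishes only when the coordinate
% transformation is linear.
```

Because of the inhomogeneous term, the components $\Gamma^\sigma{}_{\mu\nu}$ can vanish in one coordinate system without vanishing in another, which a tensor can never do.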
Based on the discussion above, we can put forward two principles that the physical
laws in curved spacetime must obey: (a) the principle of general covariance; (b) when
gab equals the Minkowski metric ηab , they should go back to the physical laws in
special relativity.4 Although these two necessary criteria cannot uniquely determine
the physical laws in curved spacetime, one can use them as guidance, together with
physical and aesthetic considerations, to acquire the physical laws naturally in many
cases. Since the difference between general relativity and special relativity is nothing
but the difference between the spacetime background [i.e., between (M, gab ) and
(R4 , ηab )], the 4-dimensional description of physical objects in special relativity
can be naturally generalized to general relativity. For instance, the world lines of
point masses and photons are still timelike and null curves, respectively (of course,
this actually already generalizes the connotation of “the principle of invariant light
speed” and “point masses must move slower than light” to general relativity); the
proper time of a point mass is still the length of its world line, the 4-velocity $U^a$
of a point mass is still defined as the unit tangent vector of its world line, and the
4-momentum is still defined as $P^a := mU^a$ ($m$ is the rest mass); the energy of a point
mass relative to an instantaneous observer $(p, Z^a)$ is still defined as $E := -P^a Z_a$,
and an electromagnetic field is still described by a 2-form field $F_{ab}$, etc. In order to
find the physical laws obeyed by these physical quantities, in most of the cases one
only needs to substitute all the ηab and ∂a in the expressions for the corresponding
laws in special relativity with gab and ∇a . This method may be dubbed the “minimal
substitution rule”. It is easy to see that a formula obtained in this manner obeys the
two principles we stated above. Here are some examples of applying this rule: the
4-acceleration of a point mass in curved spacetime is defined as

$$A^a := U^b \nabla_b U^a , \tag{7.2.1}$$

4 Principle (a) is put in the same way in all textbooks (although the formulation for the principle of
general covariance may be different); however, there are at least two ways of stating the principle
(b) in different books. The other one is: (b) the equivalence principle. With regard to the effects of
the physical laws being derived, these two ways are equivalent. For details, see Sect. 7.5.

and the 4-force exerted on the point mass is defined as

$$F^a := U^b \nabla_b P^a . \tag{7.2.2}$$

For a free point mass, $F^a = 0$ (gravity is not a 4-force!), and the equation above
becomes $U^b \nabla_b U^a = 0$, i.e., the geodesic equation, which agrees with the basic
postulate (b) of general relativity (see Sect. 7.1). For a point mass in an electromagnetic
field, its equation of motion is then

$$q F^a{}_b U^b = U^b \nabla_b P^a . \tag{7.2.3}$$

Note that the effect of the electromagnetic field $F_{ab}$ on the point mass is manifested
on the left-hand side of the equation (as a 4-force $q F^a{}_b U^b$), while the effect of
gravity on the point mass is manifested on the right-hand side of the equation (by
the derivative being $\nabla_a$ rather than $\partial_a$). The equations of motion of the electromagnetic field
$F_{ab}$ (Maxwell's equations in curved spacetime) should be

$$\nabla^a F_{ab} = -4\pi J_b , \tag{7.2.4}$$

$$\nabla_{[a} F_{bc]} = 0 . \tag{7.2.5}$$

The energy-momentum tensor of the electromagnetic field should be expressed as

$$T_{ab} = \frac{1}{4\pi}\left(F_{ac} F_b{}^c - \frac{1}{4}\, g_{ab} F_{cd} F^{cd}\right) . \tag{7.2.6}$$

Another important basis for this equation holding in curved spacetime is that it
satisfies $\nabla^a T_{ab} = -F_{bc} J^c$ [see Exercise 6.18 (a)], which indicates that the total
energy, momentum and angular momentum of the electromagnetic field and charged
particle field are all conserved (see the end of Sect. 6.6.4). The reader should verify
this equation.
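As a quick sanity check of (7.2.6), the following sketch evaluates $T_{ab}$ numerically at a single point where $g_{ab} = \eta_{ab}$. It assumes NumPy, the signature $(-,+,+,+)$, and the component convention $F_{0i} = -E_i$, $F_{ij} = \varepsilon_{ijk} B_k$; the function names `field_tensor` and `energy_momentum` are hypothetical and chosen only for illustration. Under these assumptions one expects the familiar energy density $T_{00} = (E^2 + B^2)/8\pi$ and a vanishing trace.

```python
import numpy as np

# Minkowski metric, signature (-,+,+,+); it is its own inverse.
eta = np.diag([-1.0, 1.0, 1.0, 1.0])
eta_inv = np.linalg.inv(eta)

def field_tensor(E, B):
    """Build F_ab (lower indices) from the 3-vectors E and B.

    Assumed convention (not taken from the text): F_{0i} = -E_i,
    F_{ij} = eps_{ijk} B_k.
    """
    F = np.zeros((4, 4))
    F[0, 1:] = -np.asarray(E, dtype=float)
    F[1:, 0] = np.asarray(E, dtype=float)
    Bx, By, Bz = B
    F[1, 2], F[2, 1] = Bz, -Bz
    F[2, 3], F[3, 2] = Bx, -Bx
    F[3, 1], F[1, 3] = By, -By
    return F

def energy_momentum(F, g, g_inv):
    """T_ab = (1/4pi) (F_ac F_b^c - (1/4) g_ab F_cd F^cd), Eq. (7.2.6)."""
    F_up = g_inv @ F @ g_inv        # F^{ab} = g^{ac} F_cd g^{db}
    invariant = np.sum(F * F_up)    # F_cd F^cd
    first = F @ g_inv @ F.T         # F_ac g^{cd} F_bd = F_ac F_b^c
    return (first - 0.25 * g * invariant) / (4.0 * np.pi)

E = np.array([1.0, 2.0, -0.5])
B = np.array([0.3, -1.0, 2.0])
T = energy_momentum(field_tensor(E, B), eta, eta_inv)
print("T_00          =", T[0, 0])
print("(E^2+B^2)/8pi =", (E @ E + B @ B) / (8.0 * np.pi))
print("trace         =", np.trace(eta_inv @ T))
```

The same `energy_momentum` function accepts any symmetric metric matrix `g` with inverse `g_inv`, so the check can be repeated at a point of a curved spacetime once the components of $g_{ab}$ and $F_{ab}$ are given there.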
Since (7.2.5) can be expressed as $\mathrm{d}F = 0$, we can at least locally introduce an
electromagnetic 4-potential $A$ such that $F = \mathrm{d}A$, and hence (7.2.4) can be expressed
in terms of $A$ as

$$-4\pi J_b = \nabla^a(\nabla_a A_b - \nabla_b A_a) = \nabla^a \nabla_a A_b - \nabla^a \nabla_b A_a . \tag{7.2.7}$$

In special relativity, the second term on the right-hand side of the equation above
is $-\partial^a \partial_b A_a$, which can be easily rewritten as $-\partial_b \partial^a A_a$, and then using the Lorenz
gauge condition we can express (7.2.7) in special relativity as

$$\partial^a \partial_a A_b = -4\pi J_b \quad [\text{cf. (6.6.31)}] .$$

However, now $\nabla_a$ and $\nabla_b$ do not commute. If we want to use the Lorenz condition
$\nabla^a A_a = 0$, we need to rewrite the second term on the right-hand side of (7.2.7) using
(3.4.4) as $-\nabla^a \nabla_b A_a = -\nabla_b \nabla^a A_a - R_b{}^d A_d = -R_b{}^d A_d$, which turns (7.2.7) into

$$\nabla^a \nabla_a A_b - R_b{}^d A_d = -4\pi J_b . \tag{7.2.8}$$

Interestingly, if we apply the minimal substitution rule directly to the equation (6.6.31)
in special relativity, we have

$$\nabla^a \nabla_a A_b = -4\pi J_b , \tag{7.2.9}$$

which is obviously different from (7.2.8). This example indicates that the minimal
substitution rule does not uniquely determine the physical laws in some circum-
stances. More consideration needs to be taken when cases like this are encountered.
For this example, it can be shown that (7.2.8) leads to the law of charge conservation
$\nabla_a J^a = 0$ (Exercise 7.1) while (7.2.9) does not. From this physical consideration,
we choose (7.2.8) as the equation of motion of the 4-potential A. The ambiguity of
this example comes from the non-commutativity of the derivative operators, which is
a problem that all the equations containing second or higher derivatives (with two or
more ∇a acting successively) will encounter when transferred from special relativity
to general relativity. The reader may compare this with the following fact: When
transferred from classical mechanics to quantum mechanics, the non-commutativity
of the operators is also the source of ambiguity.
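The commutation step that produces the curvature term in (7.2.8) can be spelled out as follows. This is only a sketch: it assumes the curvature conventions $(\nabla_a \nabla_b - \nabla_b \nabla_a)\omega_c = R_{abc}{}^d \omega_d$ and $R_{ac} = R_{abc}{}^b$, which are assumptions made here for illustration and should be checked against (3.4.4) and the definition of the Ricci tensor used in the text.

```latex
\begin{align*}
\nabla^a \nabla_b A_a
  &= g^{ac}\, \nabla_c \nabla_b A_a \\
  &= g^{ac} \left( \nabla_b \nabla_c A_a + R_{cba}{}^{d} A_d \right) \\
  &= \nabla_b \nabla^a A_a + R_b{}^{d} A_d ,
\end{align*}
% so that, once the Lorenz condition \nabla^a A_a = 0 is imposed,
\begin{equation*}
-4\pi J_b = \nabla^a \nabla_a A_b - \nabla^a \nabla_b A_a
          = \nabla^a \nabla_a A_b - R_b{}^{d} A_d .
\end{equation*}
```

In the second line the contraction $g^{ac} R_{cba}{}^{d} = R_b{}^{d}$ uses the symmetries $R_{abcd} = -R_{bacd} = -R_{abdc}$ of the Riemann tensor. The curvature term is precisely what a naive substitution of $\partial_a$ by $\nabla_a$ in $\partial^a \partial_a A_b = -4\pi J_b$ misses.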
[Optional Reading 7.2.1]
For a source-free electromagnetic field, (7.2.8) becomes

$$\nabla^a \nabla_a A_b - R_b{}^d A_d = 0 . \tag{7.2.8'}$$

Inspired by the discussion at the end of Sect. 6.6.5 (before Optional Reading 6.6.5), we
want to consider a wave solution $A_b = C_b \cos\theta$ of the equation above, which is a product
of the “slowly changing” amplitude $C_b$ and the “rapidly changing” phase factor, and look
for the possibility of applying the geometric optics approximation. The difference between
(7.2.8′) and the corresponding equation $\partial^a \partial_a A_b = 0$ in Minkowski spacetime is that the
former contains the curvature term $R_b{}^d A_d$, which needs to be negligible if we want to apply
the geometric optics approximation. Consider three length scales as follows:
(1) The characteristic length $\tilde{L}$ above which the change of $C_b$ or $K^a \equiv \nabla^a \theta$ is notable;
(2) The length that describes the “magnitude” of the spacetime curvature

$$\tilde{R} \equiv |R_{\mu\nu\sigma\rho}|^{-1/2} ,$$

where $R_{\mu\nu\sigma\rho}$ is a typical component of $R_{abcd}$ in a typical local inertial frame (see Sect. 7.5
for details);
(3) The wavelength $\lambda$ ($\lambda \equiv 2\pi/\omega$, $\omega \equiv -Z^a K_a$) of $A_b$ relative to the local inertial frame
we mentioned above.
If these three satisfy $\lambda \ll \tilde{L}$ and $\lambda \ll \tilde{R}$, then both the derivative term $\nabla^a \nabla_a C_b$ and the
curvature term $R_b{}^d A_d$ can be neglected, and thus we have approximately

$$(\nabla^a \theta) \nabla_a \theta = 0 . \tag{7.2.10}$$

Hence, $K^a \equiv \nabla^a \theta$ is still the null normal vector of the null hypersurface $S = \{ p \in M \mid \theta|_p = C \}$
($C$ = constant), the integral curves of $K^a$ are still null geodesics (the proof is similar to
Sect. 6.6.5; note that $\nabla_a$ being torsion free assures that $\nabla_a \nabla_b \theta = \nabla_b \nabla_a \theta$), a light signal still
propagates along a null geodesic, and the angular frequency of the electromagnetic wave
(photon) relative to an observer with a 4-velocity $Z^a$ is still

$$\omega = -K^a Z_a , \tag{7.2.11}$$

and so on. Thus, the geometric optics approximation holds when $\lambda \ll \tilde{L}$ and $\lambda \ll \tilde{R}$. This
approximation is used in many places in this text (such as Sect. 9.2.1 and Sect. 10.2.2).
References for the geometric optics in curved spacetime are: Wald (1984) p. 71; Misner et al.
(1973) Sect. 22.5; Straumann (1984) pp. 100–103.
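To make (7.2.10) and (7.2.11) concrete, here is a minimal sketch for the flat-spacetime limit, where a plane wave with phase $\theta = \omega_0 (\vec{n} \cdot \vec{x} - t)$ has $K^a = \omega_0 (1, \vec{n})$. It assumes NumPy, units with $c = 1$, and the signature $(-,+,+,+)$; the helper names `wave_vector`, `observer_velocity` and `measured_frequency` are hypothetical.

```python
import numpy as np

eta = np.diag([-1.0, 1.0, 1.0, 1.0])  # Minkowski metric, signature (-,+,+,+)

def wave_vector(omega0, direction):
    """K^a = grad^a(theta) for the plane-wave phase theta = omega0 * (n.x - t)."""
    n = np.asarray(direction, dtype=float)
    n = n / np.linalg.norm(n)
    return omega0 * np.concatenate(([1.0], n))

def observer_velocity(v):
    """4-velocity Z^a of an observer with 3-velocity v (|v| < 1)."""
    v = np.asarray(v, dtype=float)
    gamma = 1.0 / np.sqrt(1.0 - v @ v)
    return gamma * np.concatenate(([1.0], v))

def measured_frequency(K, Z):
    """omega = -K^a Z_a, Eq. (7.2.11)."""
    return -(K @ eta @ Z)

K = wave_vector(2.0, [1.0, 0.0, 0.0])
print("K^a K_a        =", K @ eta @ K)  # null condition, Eq. (7.2.10)
print("static omega   =", measured_frequency(K, observer_velocity([0.0, 0.0, 0.0])))
print("receding omega =", measured_frequency(K, observer_velocity([0.6, 0.0, 0.0])))
```

An observer receding along the propagation direction measures a lower frequency than a static one, which is the usual longitudinal Doppler shift recovered from the single covariant formula $\omega = -K^a Z_a$.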
[The End of Optional Reading 7.2.1]

[Optional Reading 7.2.2]


Maxwell's equations (7.2.4) and (7.2.5) in curved spacetime also have the following
formulation in terms of the exterior differentiation operator:

$$\mathrm{d}\,{}^*\!F = 4\pi\, {}^*\!J , \tag{7.2.4'}$$
$$\mathrm{d}F = 0 , \tag{7.2.5'}$$

where ${}^*\!F$ is the dual form of $F \equiv F_{ab}$ (see Sect. 5.6), which is still a 2-form, and ${}^*\!J$ is the
dual 3-form of the 1-form $J_a$. The equivalence of (7.2.5′) and (7.2.5) can be seen directly
from the definition of exterior differentiation, while the equivalence of (7.2.4′) and (7.2.4) is a
bit more tricky to show. By definition, $(\mathrm{d}\,{}^*\!F)_{fab} = \mathrm{d}_f(\varepsilon_{abcd} F^{cd}/2) = 3\nabla_{[f}(\varepsilon_{ab]cd} F^{cd})/2$.
Contracting the right-hand side with $\varepsilon^{efab}$ yields $3\varepsilon^{efab} \varepsilon_{cdab} (\nabla_f F^{cd})/2 = -3 \times 4\, \delta_c{}^e \delta_d{}^f (\nabla_f F^{cd})/2 = -6 \nabla_f F^{ef}$,
and so $\varepsilon^{efab} (\mathrm{d}\,{}^*\!F)_{fab} = 6 \nabla_f F^{fe}$. Contracting this equation
again with $\varepsilon_{egcd}$ yields $-(\mathrm{d}\,{}^*\!F)_{gcd} = \varepsilon_{egcd} \nabla_f F^{fe}$. It is not difficult to see from the
definition ${}^*\!J_{gcd} \equiv J^e \varepsilon_{egcd}$ that the above equation can be expressed as (7.2.4′) if and only
if (7.2.4) holds. Thus, (7.2.4′) and (7.2.4) are equivalent.
[The End of Optional Reading 7.2.2]

7.3 Fermi-Walker Transport and Non-Rotating Observers

After reading Sect. 7.1, many readers may want to learn more about topics like the
equivalence principles, Einstein’s elevator, local inertial frames, and the relationship
between gravity and an inertial force. To have a precise understanding of these topics
some basic concepts will be necessary. This section will introduce an important one,
namely the concept of a non-rotating observer. (The observer in an Einstein elevator
is not only free-falling, but also non-rotating).
Imagine you are traveling around the world on an airplane. A small arrow is fixed
in front of you, perpendicular to your chest and pointing away from you. At a proper
time $\tau_1$ you take a nap. When you wake up at $\tau_2$, the arrow will of course still be
perpendicular to your chest, but the spatial direction in which it points can be different from
that at $\tau_1$, since the motion of the airplane is arbitrary. If the direction has
changed, it is natural to say that the arrow “changed its direction” in $\Delta\tau \equiv \tau_2 - \tau_1$,
or that it rotated in $\Delta\tau$. However, what does “changing its direction” mean? How do
we judge whether the direction has changed or not? This is actually to ask: what is a rotation?
How does one determine whether a rotation occurs? The answer is clear in Newtonian mechanics:
the axis of a gyroscope flywheel (or “a gyroscope axis” for short) represents a fixed
direction [see Sachs and Wu (1977) pp. 50, 52]. If you have a gyroscope in your hand, and
the arrow and the gyroscope axis are parallel at $\tau_1$ but not parallel at $\tau_2$, then we can
conclude that the arrow has rotated within $\Delta\tau$. This criterion can be generalized to
general relativity.
Now we translate this criterion into 4-dimensional language. Let $G(\tau)$ represent
your world line; then at the time $\tau_1$ the arrow is represented by a spatial vector $w^a$
at a point $p_1 \equiv G(\tau_1)$ (“spatial” means it is perpendicular to your 4-velocity $Z^a|_{p_1}$
at $\tau_1$). For convenience, we set the magnitude of $w^a$ to 1. As your proper time
flows, the arrow corresponds to a spatial vector field with unit length on the curve
$G(\tau)$. Similarly, if we also represent the direction of the gyroscope axis at each time
using a unit vector, then the gyroscope axis corresponds to another spatial vector field
$X^a$ with unit length on $G(\tau)$. The 3-dimensional description we mentioned before
indicates that $w^a$ and $X^a$ coincide at $p_1 \equiv G(\tau_1)$ but do not coincide at $p_2 \equiv G(\tau_2)$
(see Fig. 7.1). Since we stipulate $X^a$ to represent the non-rotating direction, we say
that $w^a$ rotated in $\Delta\tau \equiv \tau_2 - \tau_1$. To describe the rotating vector field $w^a$ on the world
line $G(\tau)$, we should first describe the non-rotating vector field $X^a$, since it is the
criterion for measuring the rotation of $w^a$. As a non-rotating spatial vector field on
$G(\tau)$, what mathematical property does $X^a$ have? A natural guess is: $X^a$ is a vector
field parallelly transported along $G(\tau)$. However, except for special cases, this
is not a correct guess. The key point is that the vector field parallelly transported
along $G(\tau)$ determined by a spatial vector $X^a|_{p_1}$ at $p_1 \equiv G(\tau_1)$ is not a spatial vector
field in general. [Proof: suppose $X^a$ is parallelly transported along $G(\tau)$, then
$Z^b \nabla_b (X^a Z_a) = X^a Z^b \nabla_b Z_a = X^a A_a$, where $\nabla_a$ is the derivative operator associated
with the spacetime metric $g_{ab}$, and $A^a$ is the 4-acceleration of $G(\tau)$. As long
as $G(\tau)$ is not a geodesic and $X^a$ is not orthogonal to $A^a$, the right-hand side
of the above equation is nonzero. Hence, $X^a Z_a$ is not a constant along $G(\tau)$, and
cannot be everywhere vanishing on $G(\tau)$.] To describe the motion of a non-rotating
spatial vector field $X^a$ along $G(\tau)$, E. Fermi (in 1922) and A. G. Walker (in 1923)
introduced a derivative notion along a curve, which is of physical importance and
closely related to, but different from, a covariant derivative. This derivative, dubbed
the Fermi-Walker derivative, is defined as follows:

Fig. 7.1 $X^a$ and $w^a$ are both spatial vector fields on $G(\tau)$. $X^a$ represents the gyroscope axis. The spatial rotation of $w^a$ can be seen by comparing with $X^a$

Definition 1 Suppose $G(\tau)$ is a timelike curve$^{5}$ (where $\tau$ is the proper time) in the
spacetime $(M, g_{ab})$, and $\mathscr{F}_G(k, l)$$^{6}$ represents the collection of all smooth tensor
fields of type $(k, l)$ along $G(\tau)$. A map $\mathrm{D_F}/\mathrm{d}\tau : \mathscr{F}_G(k, l) \to \mathscr{F}_G(k, l)$ is called a
Fermi-Walker derivative operator (or Fermi derivative for short) if it satisfies the
following conditions:

(a) Linearity;
(b) Leibniz rule;
(c) Commutativity with contraction;
(d) $\dfrac{\mathrm{D_F} f}{\mathrm{d}\tau} = \dfrac{\mathrm{d}f}{\mathrm{d}\tau} \quad \forall f \in \mathscr{F}_G(0, 0)$; (7.3.1)
(e) $\dfrac{\mathrm{D_F} v^a}{\mathrm{d}\tau} = \dfrac{\mathrm{D}v^a}{\mathrm{d}\tau} + (A^a Z^b - Z^a A^b) v_b \quad \forall v^a \in \mathscr{F}_G(1, 0)$, (7.3.2)

where $Z^a \equiv (\partial/\partial\tau)^a$ represents the 4-velocity of $G(\tau)$, $A^a \equiv Z^b \nabla_b Z^a$ represents
the 4-acceleration of $G(\tau)$, and $\mathrm{D}v^a/\mathrm{d}\tau$ is another notation for $Z^b \nabla_b v^a$, the covariant
derivative along $G(\tau)$ (where $\nabla_b$ satisfies $\nabla_b g_{ac} = 0$).

Remark 1 Condition (e) stipulates the expression for the Fermi derivative of a vector
field, and combining it with the other conditions yields the results of D F /dτ acting
on an arbitrary tensor field.

Proposition 7.3.1 The Fermi derivative has the following properties:


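(1) If $G(\tau)$ is a geodesic, then $\mathrm{D_F} v^a/\mathrm{d}\tau = \mathrm{D}v^a/\mathrm{d}\tau$;
(2) $\mathrm{D_F} Z^a/\mathrm{d}\tau = 0$;
(3) If $w^a$ is a spatial vector field on $G(\tau)$ ($w^a Z_a = 0$ at each point of the world
line), then

$$\mathrm{D_F} w^a/\mathrm{d}\tau = h^a{}_b\, (\mathrm{D}w^b/\mathrm{d}\tau) , \tag{7.3.3}$$

where $h_{ab} = g_{ab} + Z_a Z_b$, and $h^a{}_b = g^{ac} h_{cb}$ is the projection map at each point
of $G(\tau)$. This property guarantees that the Fermi derivative of a spatial vector
field is still a spatial vector field.
(4) $\mathrm{D_F} g_{ab}/\mathrm{d}\tau = 0$, and equivalently

$$\mathrm{D_F}(g_{ab} v^a u^b)/\mathrm{d}\tau = g_{ab} v^a\, \mathrm{D_F} u^b/\mathrm{d}\tau + g_{ab} u^b\, \mathrm{D_F} v^a/\mathrm{d}\tau \quad \forall v^a, u^a \in \mathscr{F}_G(1, 0) . \tag{7.3.4}$$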

5 We only discuss the case where G(τ ) is a non-self-intersecting curve, otherwise one will encounter
causal difficulties (see Chap. 11 in Volume II). In fact, the timelike curves representing observers
in this text are all assumed to be non-self-intersecting curves.
6 Note that we are abusing the notation here, since $\mathscr{F}_M(k, l)$ technically denotes the collection of
all the tensor fields of type $(k, l)$ on the manifold $M$ but some fields in $\mathscr{F}_G(k, l)$ here do not lie on
the curve $G$.

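Proof Property (1) can be easily seen from (7.3.2). Property (2) can be easily proved
from (7.3.2) and the definition of $A^a$ (using $A^a Z_a = 0$). The proof for property (3)
is left as Exercise 7.3. The proof for property (4) is as follows:

$$\begin{aligned}
g_{ab} v^a\, \mathrm{D_F} u^b/\mathrm{d}\tau + g_{ab} u^b\, \mathrm{D_F} v^a/\mathrm{d}\tau
 &= v_a\, \mathrm{D_F} u^a/\mathrm{d}\tau + u_a\, \mathrm{D_F} v^a/\mathrm{d}\tau \\
 &= v_a (\mathrm{D}u^a/\mathrm{d}\tau + 2 A^{[a} Z^{b]} u_b) + u_a (\mathrm{D}v^a/\mathrm{d}\tau + 2 A^{[a} Z^{b]} v_b) \\
 &= v_a\, \mathrm{D}u^a/\mathrm{d}\tau + u_a\, \mathrm{D}v^a/\mathrm{d}\tau + 4 A^{[a} Z^{b]} v_{(a} u_{b)} \\
 &= \mathrm{D}(v_a u^a)/\mathrm{d}\tau = \mathrm{D_F}(g_{ab} v^a u^b)/\mathrm{d}\tau ,
\end{aligned}$$

where in the last step we used (7.3.1). □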

Definition 2 A vector field $v^a$ is said to be Fermi-Walker transported along $G(\tau)$
if

$$\frac{\mathrm{D_F} v^a}{\mathrm{d}\tau} = 0 .$$

Fermi-Walker transport is also called Fermi transport for short.

Remark 2 Property (1) of the Fermi derivative indicates that Fermi transport along
a geodesic is parallel transport; property (2) indicates that the 4-velocity of $G(\tau)$ is
always Fermi transported along $G(\tau)$; from property (4) we can see that $\mathrm{D_F} v^a/\mathrm{d}\tau =
0 = \mathrm{D_F} u^a/\mathrm{d}\tau \Rightarrow \mathrm{d}(g_{ab} v^a u^b)/\mathrm{d}\tau = 0$, which can be abbreviated as “Fermi transport
preserves the inner product”, similar to “parallel transport preserves the inner product”.
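The contrast between parallel transport and Fermi transport can be checked on a simple case. The sketch below takes a uniformly accelerated observer (hyperbolic motion with proper acceleration `a`) in Minkowski spacetime, written in Lorentzian coordinates $\{t, x, y, z\}$ so that $\mathrm{D}/\mathrm{d}\tau$ reduces to $\mathrm{d}/\mathrm{d}\tau$ on components. It assumes NumPy and the signature $(-,+,+,+)$; the world line, the candidate axis, and all function names are illustrative choices, not taken from the text.

```python
import numpy as np

# Minkowski metric in Lorentzian coordinates {t, x, y, z}, signature (-,+,+,+).
eta = np.diag([-1.0, 1.0, 1.0, 1.0])

def velocity(tau, a):
    """4-velocity Z^a of hyperbolic motion with proper acceleration a."""
    return np.array([np.cosh(a * tau), np.sinh(a * tau), 0.0, 0.0])

def acceleration(tau, a):
    """4-acceleration A^a = dZ^a/dtau (components in Lorentzian coordinates)."""
    return a * np.array([np.sinh(a * tau), np.cosh(a * tau), 0.0, 0.0])

def candidate_axis(tau, a):
    """Return (X^a, dX^a/dtau) for the candidate gyroscope axis X^a = A^a / a."""
    X = np.array([np.sinh(a * tau), np.cosh(a * tau), 0.0, 0.0])
    dX = a * np.array([np.cosh(a * tau), np.sinh(a * tau), 0.0, 0.0])
    return X, dX

def fermi_derivative(v, dv_dtau, Z, A):
    """D_F v^a/dtau = Dv^a/dtau + (A^a Z^b - Z^a A^b) v_b, Eq. (7.3.2)."""
    return dv_dtau + A * (Z @ eta @ v) - Z * (A @ eta @ v)

a, tau = 0.7, 1.3
Z, A = velocity(tau, a), acceleration(tau, a)
X, dX = candidate_axis(tau, a)
e_x = np.array([0.0, 1.0, 0.0, 0.0])  # parallel transported: constant components

print("D_F Z/dtau =", fermi_derivative(Z, A, Z, A))    # property (2)
print("D_F X/dtau =", fermi_derivative(X, dX, Z, A))   # X is Fermi transported
print("X.Z        =", X @ eta @ Z)                     # X stays spatial
print("e_x.Z      =", e_x @ eta @ Z)                   # e_x does not stay spatial
```

Here the constant vector `e_x` is parallelly transported and spatial at $\tau = 0$, yet it acquires a component along $Z^a$ at later times, whereas `X` remains orthogonal to $Z^a$ and has vanishing Fermi derivative.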

Proposition 7.3.2 A point $p \in G$ and a vector $v^a \in V_p$ uniquely determine a vector field Fermi
transported along $G(\tau)$.

Proof Omitted. [The reader may refer to Sachs and Wu (1977) p. 51 and the reference
therein]. □

Remark 3 ① From the fact that $Z^a$ is Fermi transported along $G(\tau)$ and that Fermi
transport preserves the inner product, we can see that the vector field $v^a$ Fermi
transported along $G(\tau)$ determined by a spatial vector $v^a|_p \in V_p$ is everywhere
perpendicular to $Z^a$, and hence is a spatial vector field. ② Each basis vector of an
orthonormal tetrad (whose zeroth basis vector equals $Z^a|_p$) at $p \in G$ determines a
vector field Fermi transported along $G(\tau)$ based on Proposition 7.3.2, and from the
fact that Fermi transport preserves the inner product we can see that these four vector
fields are orthonormal at each point of the curve. Thus, an orthonormal tetrad at $p$
uniquely determines an orthonormal tetrad field Fermi transported along $G(\tau)$, in
which the zeroth basis vector field is the tangent vector field $Z^a$ along $G(\tau)$.

Fermi transport has an important physical meaning: the necessary and sufficient
condition for a spatial vector field $w^a$ with a constant magnitude on a world line
$G(\tau)$ to have no spatial rotation is that $w^a$ is Fermi transported along $G(\tau)$, i.e.,
$\mathrm{D_F} w^a/\mathrm{d}\tau = 0$ (for the reason see Proposition 7.3.6). Therefore, a gyroscope axis
(which can be viewed as a unit vector) is a spatial vector field Fermi transported along
the world line of the gyroscope. For instance, suppose $\{t, x, y, z\}$ is a Lorentzian
system of Minkowski spacetime, and $G(\tau)$ is a $t$-coordinate line of this system,
then the coordinate basis vectors $(\partial/\partial t)^a$, $(\partial/\partial x)^a$, $(\partial/\partial y)^a$, $(\partial/\partial z)^a$ are all Fermi
transported along $G(\tau)$, and thus the latter three are non-rotating spatial vector fields
on $G(\tau)$, which physically represent the three axes of the gyroscope (orthogonal to
each other). Conversely, if a spatial vector field $w^a$ (with a constant magnitude) is not
Fermi transported along $G(\tau)$, then it has a spatial rotation.

Fig. 7.2 A rigid motion can be decomposed into a translation and a rotation
In order to introduce Proposition 7.3.6, we first talk about the definition of a spatial
rotation. In Newtonian mechanics, any motion of a rigid body can be decomposed
into a translation and a rotation. Figure 7.2 represents the motion of a rigid body
from a configuration $C_1$ to another configuration $C_2$. This can be done in two steps:
first move to a configuration $C_2'$ by a translation, and then arrive at $C_2$ by a rotation
with respect to a fixed point $o$ (called the “base point”). To describe this rotation,
one can choose another point of the rigid body, whose position turns from $a$ to $a'$
during the rotation. Just as the motion of the base point represents the translation of
the body, the motion of the point $a$ (from $a$ to $a'$) represents the rotation of the body.
Let $\vec{w}(t)$ be the position vector of $a$ relative to $o$; then the rotation of the rigid body is
manifested by $\mathrm{d}\vec{w}(t)/\mathrm{d}t \neq 0$, and thus can be described by the rotation of the vector
$\vec{w}(t)$. More precisely, the vector $\vec{w}(t)$ with one end fixed at $o$ is said to be rotating if
there exists a vector $\vec{\omega}(t)$ such that

$$\frac{\mathrm{d}\vec{w}(t)}{\mathrm{d}t} = \vec{\omega}(t) \times \vec{w}(t) , \tag{7.3.5}$$

where $\vec{\omega}(t)$ is called the (instantaneous) angular velocity of the rotation. Noticing
that $\mathrm{d}(\vec{w} \cdot \vec{w})/\mathrm{d}t = 2\vec{w} \cdot \mathrm{d}\vec{w}/\mathrm{d}t = 2\vec{w} \cdot (\vec{\omega} \times \vec{w}) = 0$, we can see that a rotation
preserves the magnitude of a vector. From the above definition of a vector's rotation,
one can prove using Newtonian mechanics that a gyroscope axis (as a unit vector)
is non-rotating, i.e., its $\vec{\omega} = 0$. Hence, a gyroscope axis represents a non-rotating
direction.
To generalize the Newtonian definition above for a rotation of a vector to special
relativity (and then to general relativity), we first rewrite (7.3.5) in terms of the
components in a Cartesian system (or physically called a Galilean system) as

$$\frac{\mathrm{d}w^i(t)}{\mathrm{d}t} = \varepsilon^i{}_{jk}\, \omega^j w^k , \tag{7.3.5'}$$

and imagine that there is an observer $G$ at the base point $o$ (the end of $\vec{\omega}$). Since $o$
is at rest relative to an inertial frame, the world line $G(\tau)$ of $G$ should be a geodesic
when carried over to special relativity, and $\vec{w}$ is a spatial vector field $w^a$ on the curve.
Let $\{t, x^i\}$ represent the coordinates of the observer $G$'s inertial frame, then on $G(\tau)$
we have $t = \tau$. Hence, we have the following generalization for the definition of a
rotation in special relativity: a spatial vector field $w^a(\tau)$ on a timelike geodesic $G(\tau)$
in Minkowski spacetime is said to be rotating if there exists a spatial vector field
$\omega^a(\tau)$ on $G(\tau)$ such that

$$\frac{\mathrm{d}w^i(\tau)}{\mathrm{d}\tau} = \varepsilon^i{}_{jk}\, \omega^j w^k , \tag{7.3.6}$$

where $w^i$ and $\omega^j$ are the $i$th and $j$th components of $w^a$ and $\omega^a$, respectively, in
the system $\{t, x^i\}$. For any point $p$ on $G(\tau)$, if we lower the index of the angular
velocity vector $\omega^a$ and make it an angular velocity 1-form $\omega_a$ using the induced
metric $h_{ab}$ of $W_p$, and use $\Omega_{ab}$ to represent the dual differential form of $\omega_a$ in $W_p$,
i.e., $\Omega_{ab} \equiv ({}^*\omega)_{ab} = \omega^c \varepsilon_{cab}$ (where $\varepsilon_{cab}$ is the volume element associated with $h_{ab}$),
then $\Omega_{ab}$ is called the angular velocity 2-form, using which one can rewrite (7.3.6)
as

$$\frac{\mathrm{d}w^i}{\mathrm{d}\tau} = -\Omega^i{}_j\, w^j . \tag{7.3.7}$$

Take an orthonormal spatial triad field $\{(e_i)^a\}$ on the world line such that $(e_3)^a$
is parallel to $\omega^a$, then $\omega^1 = \omega^2 = 0$, $\omega^3 \neq 0$, and so we can say that $w^a$ is rotating
with respect to the axis $(e_3)^a$. On the other hand, from $\Omega_{ab} = \omega^c \varepsilon_{cab}$ we know
that $\{\omega^1 = \omega^2 = 0,\ \omega^3 \neq 0\}$ corresponds to $\{\Omega_{23} = \Omega_{31} = 0,\ \Omega_{12} \neq 0\}$, and hence
one can also say that $w^a$ is rotating in the (1, 2)-plane (generally, a rotation in the
$(i, j)$-plane means that the nonzero components of $\Omega_{ab}$ are $\Omega_{ij}$ and $\Omega_{ji}$). These two
statements are equivalent for a 3-dimensional vector space $W_p$, but the latter one is
more convenient to carry over to 4 dimensions. Now, it is not necessary to restrict ourselves to
the spatial rotation of a spatial vector field on a geodesic in Minkowski spacetime.
Here we will generalize the definition to the “spacetime rotation” of an arbitrary
vector field on an arbitrary timelike curve in any spacetime.
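The equivalence of the two descriptions of a rotation, (7.3.6) with the angular velocity vector and (7.3.7) with the angular velocity 2-form, can be verified componentwise. The following sketch assumes NumPy and an orthonormal spatial triad, so that index positions do not matter; the function names are hypothetical.

```python
import numpy as np

def levi_civita():
    """Totally antisymmetric symbol eps_{ijk} in three dimensions."""
    eps = np.zeros((3, 3, 3))
    for i, j, k in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
        eps[i, j, k] = 1.0    # even permutations of (1, 2, 3)
        eps[i, k, j] = -1.0   # odd permutations of (1, 2, 3)
    return eps

def angular_velocity_2form(omega):
    """Omega_ab = omega^c eps_{cab}: the dual 2-form of the angular velocity."""
    return np.einsum("c,cab->ab", np.asarray(omega, dtype=float), levi_civita())

omega = np.array([0.0, 0.0, 2.0])   # rotation about the third axis
w = np.array([1.0, 0.5, -0.3])
Omega = angular_velocity_2form(omega)

print("omega x w  =", np.cross(omega, w))   # right-hand side of (7.3.6)
print("-Omega . w =", -Omega @ w)           # right-hand side of (7.3.7)
print("Omega =\n", Omega)
```

For `omega` along the third axis the only nonzero components of `Omega` are the (1, 2) and (2, 1) entries, which is the sense in which the rotation takes place in the (1, 2)-plane.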

Definition 3 Suppose $G(\tau)$ is the world line (not necessarily a geodesic) of an
arbitrary observer in the spacetime $(M, g_{ab})$, and $v^a$ is a vector field on $G(\tau)$ (not
necessarily a spatial vector field). If there exists a 2-form field $\Omega_{ab}$ on $G(\tau)$ such that

$$\frac{\mathrm{D}v^a}{\mathrm{d}\tau} = -\Omega^a{}_b\, v^b , \tag{7.3.8}$$

then we say that $v^a$ undergoes a spacetime rotation with an angular velocity $\Omega_{ab}$.
In other words, the angular velocity 2-form for the spacetime rotation of $v^a$ is $\Omega_{ab}$.
If $\mathrm{D}v^a/\mathrm{d}\tau = 0$, then we say $v^a$ has no spacetime rotation.
Proposition 7.3.3 Suppose two vector fields $v^a$ and $u^a$ on $G(\tau)$ undergo the same
spacetime rotation $\Omega_{ab}$, then $v^a u_a$ is a constant on $G(\tau)$.

Proof

$$\frac{\mathrm{D}}{\mathrm{d}\tau}(v^a u_a) = u_a \frac{\mathrm{D}v^a}{\mathrm{d}\tau} + v_a \frac{\mathrm{D}u^a}{\mathrm{d}\tau} = u_a(-\Omega^a{}_b v^b) + v_a(-\Omega^a{}_b u^b) = -2\Omega_{ab}\, v^{(a} u^{b)} = 0 ,$$

where the antisymmetry of $\Omega_{ab}$ is used in the last step. □

Proposition 7.3.3 indicates that a spacetime rotation preserves the magnitude of a
vector (which can be easily seen by taking $v^a = u^a$), and thus only a vector field
$v^a$ with a constant magnitude along $G(\tau)$ can be a vector field undergoing spacetime
rotation. Conversely, one can show (Exercise 7.4) that a vector field $v^a$ with a
(nonvanishing) constant magnitude on $G(\tau)$ must undergo a spacetime rotation.
Remark 4 Suppose $\vec{\omega}$ satisfies (7.3.5), and a spatial vector $\vec{\lambda}$ satisfies $\vec{\lambda} \times \vec{w} = 0$
(i.e., there exists a coefficient $\beta$ such that $\vec{\lambda} = \beta \vec{w}$), then $\vec{\omega}' \equiv \vec{\omega} + \vec{\lambda}$ also satisfies
(7.3.5). This reflects nothing but the following fact: no matter how $\vec{w}$ rotates, one can
always add to this rotation an additional arbitrary rotation $\vec{\lambda} = \beta \vec{w}$ with respect to
$\vec{w}$ itself, since a vector “rotating with respect to itself” is the same as non-rotating.
Similarly, suppose $\Omega_{ab}$ satisfies (7.3.8), and a 2-form $\Lambda_{ab}$ satisfies $\Lambda_{ab} v^b = 0$, then
$\Omega'_{ab} = \Omega_{ab} + \Lambda_{ab}$ also satisfies (7.3.8). This $\Lambda_{ab}$ reflects the “gauge freedom” of
$\Omega_{ab}$, i.e., to $v^a$ there is essentially no difference if two $\Omega_{ab}$ only differ by a $\Lambda_{ab}$
satisfying $\Lambda_{ab} v^b = 0$. One can choose the most convenient one among these $\Omega_{ab}$ in
a discussion (see Optional Reading 7.3.1). For instance, according to Definition 3,
one can say that a necessary and sufficient condition for $v^a$ to have no spacetime
rotation is that its $\Omega_{ab} = 0$, although there also exist many choices of nonvanishing
$\Omega_{ab}$ that satisfy $\mathrm{D}v^a/\mathrm{d}\tau = 0$. (Thus, this “necessary and sufficient condition” can
differ by a gauge transformation. The same holds for some other “necessary and sufficient
conditions” in this section.)
A non-geodesic world line has $\mathrm{D}Z^a/\mathrm{d}\tau \neq 0$, and hence its $Z^a$ undergoes a spacetime
rotation. We now would like to find the angular velocity for this spacetime
rotation.

Proposition 7.3.4 The angular velocity 2-form for the spacetime rotation of the
4-velocity $Z^a$ of $G(\tau)$ is $\tilde{\Omega}_{ab} = A_a \wedge Z_b$, where $A^a$ is the 4-acceleration of $G(\tau)$.

Proof $-\tilde{\Omega}_{ab} Z^b = -(A_a Z_b - Z_a A_b) Z^b = A_a = \mathrm{D}Z_a/\mathrm{d}\tau$. Comparing with (7.3.8),
we can see that the angular velocity 2-form for the spacetime rotation of $Z^a$
is $\tilde{\Omega}_{ab}$. □
From $\tilde{\Omega}_{ab} = A_a \wedge Z_b$ we can see that the spacetime rotation represented by $\tilde{\Omega}_{ab}$
takes place in the $(Z^a, A^a)$-plane, whose spatial components in the orthonormal tetrad
with $Z^a$ as $(e_0)^a$ are $\tilde{\Omega}_{ij} = 0$. Such a spacetime rotation is called a pseudo-rotation
[see Misner et al. (1973) p. 170]. Conversely, a spacetime rotation $\Omega_{ab}$ that only has
spatial components (i.e., $\Omega_{0i} = 0$, $i = 1, 2, 3$) is called a (pure) spatial rotation, or
simply a rotation when there is no confusion.
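The statements about the pseudo-rotation 2-form can be illustrated with explicit components. This sketch again uses a uniformly accelerated world line in Minkowski spacetime as an assumed example, with NumPy and the signature $(-,+,+,+)$; the function names and the specific tetrad are illustrative choices.

```python
import numpy as np

eta = np.diag([-1.0, 1.0, 1.0, 1.0])  # Minkowski metric, signature (-,+,+,+)

def pseudo_rotation_2form(Z, A):
    """Omega~_ab = A_a Z_b - Z_a A_b, with indices lowered by eta."""
    Z_low, A_low = eta @ Z, eta @ A
    return np.outer(A_low, Z_low) - np.outer(Z_low, A_low)

def tetrad_components(Omega, tetrad):
    """Omega_{mu nu} = Omega_ab (e_mu)^a (e_nu)^b; the rows of `tetrad` are (e_mu)^a."""
    return tetrad @ Omega @ tetrad.T

a, tau = 0.7, 1.3
Z = np.array([np.cosh(a * tau), np.sinh(a * tau), 0.0, 0.0])      # 4-velocity
A = a * np.array([np.sinh(a * tau), np.cosh(a * tau), 0.0, 0.0])  # 4-acceleration

# Orthonormal tetrad with (e_0)^a = Z^a and (e_1)^a along A^a.
tetrad = np.array([Z, A / a, [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]])

Omega_t = pseudo_rotation_2form(Z, A)
print("-Omega~_ab Z^b =", -Omega_t @ Z)   # compare with A_a below
print("A_a            =", eta @ A)
print("tetrad components:\n", tetrad_components(Omega_t, tetrad))
```

In the tetrad adapted to the observer, the only nonvanishing components are the (0, 1) and (1, 0) entries, so the rotation indeed lies in the $(Z^a, A^a)$-plane and has no purely spatial part.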
It follows from $\tilde{\Omega}_{ab} = A_a \wedge Z_b$ that the 4-acceleration $A^a \neq 0$ [i.e., $G(\tau)$ deviates
from a geodesic] is the basic reason (necessary and sufficient condition) for $Z^a$ to
undergo a pseudo-rotation. This can be explained intuitively by means of Fig. 7.3.

Fig. 7.3 The cause of a pseudo-rotation: $G(\tau)$ being non-geodesic forces the 4-velocity $Z^a$ to “rotate” in the $(Z^a, A^a)$-plane during the transition from $p$ to $q$

According to Theorem 3.2.4, we can see from $A^a = Z^b \nabla_b Z^a$ that

$$A^a|_p = \lim_{\Delta\tau \to 0} \frac{1}{\Delta\tau} \left( \tilde{Z}^a|_p - Z^a|_p \right) ,$$

where $\tilde{Z}^a|_p$ is the result of $Z^a|_q$ parallel transported along $G(\tau)$ to $p$, and $\Delta\tau \equiv
\tau(q) - \tau(p)$. It can be intuitively seen from Fig. 7.3 that: ① The deviation of $G(\tau)$
from a geodesic forces its tangent $Z^a$ to “rotate” (pseudo-rotate) during the transition
from $p$ to $q$; ② This pseudo-rotation is indeed in the $(Z^a, A^a)$-plane.
Since any spatial vector field $w^a$ on $G(\tau)$ is orthogonal to $Z^a$, the fact that $Z^a$
undergoes a pseudo-rotation $\tilde{\Omega}_{ab}$ also forces $w^a$ to undergo such a pseudo-rotation.
Now we will prove that subtracting this inevitable pseudo-rotation from the spacetime
rotation of $w^a$ must yield a pure spatial rotation.

Proposition 7.3.5 Suppose $\tilde{\Omega}_{ab}$ is the pseudo-rotation experienced by the 4-velocity
$Z^a$ of $G(\tau)$, and $\Omega_{ab}$ is the spacetime rotation experienced by a spatial vector field
$w^a$ ($\neq 0$) on $G(\tau)$, then $\hat{\Omega}_{ab} \equiv \Omega_{ab} - \tilde{\Omega}_{ab}$ is a pure spatial rotation (which may differ
by a gauge transformation).

Proof See Optional Reading 7.3.1. □

Proposition 7.3.6 The necessary and sufficient condition for a spatial vector field
$w^a$ with a constant magnitude on the world line $G(\tau)$ of an observer to have no
spatial rotation is that $w^a$ is Fermi transported along $G(\tau)$, i.e., $\mathrm{D_F} w^a/\mathrm{d}\tau = 0$.

Proof Since $w^a$ has a constant magnitude, from the paragraph above Remark 4 we
know that $w^a$ undergoes a spacetime rotation, i.e., there exists an $\Omega_{ab}$ such that
$\mathrm{D}w^a/\mathrm{d}\tau = -\Omega^a{}_b w^b$. Combining this with $\hat{\Omega}_{ab} \equiv \Omega_{ab} - \tilde{\Omega}_{ab}$ yields

$$-\hat{\Omega}^a{}_b\, w^b = \frac{\mathrm{D}w^a}{\mathrm{d}\tau} + \tilde{\Omega}^a{}_b\, w^b . \tag{7.3.9}$$

Noticing that $\tilde{\Omega}_{ab} = A_a Z_b - Z_a A_b$, the equation above can also be expressed as

$$\frac{\mathrm{D_F} w^a}{\mathrm{d}\tau} = -\hat{\Omega}^a{}_b\, w^b . \tag{7.3.10}$$

Since $\hat{\Omega}_{ab}$ represents the spatial rotation of $w^a$, the necessary and sufficient condition
for a spatial vector field $w^a$ with a constant magnitude to have no spatial rotation is
that $\mathrm{D_F} w^a/\mathrm{d}\tau = 0$. □

Conversely, suppose $w^a$ has a spatial rotation, and let $\omega_a$ be the dual form of $\hat{\Omega}_{ab}$ (the
Hodge dual in the 3-dimensional space $W_p$ of $p \in G$), i.e.,

$$\hat{\Omega}_{ab} = \omega^c \varepsilon_{cab} . \tag{7.3.11}$$

Equation (7.3.10) can then be rewritten as

$$\frac{\mathrm{D_F} w^a}{\mathrm{d}\tau} = -\varepsilon^a{}_{bc}\, w^b \omega^c . \tag{7.3.12}$$

Or, let $\varepsilon_{abcd}$ represent the volume element associated with $g_{ab}$, then (7.3.12) can also
be written using $\varepsilon_{bcd} = Z^a \varepsilon_{abcd}$ as

$$g_{ab} \frac{\mathrm{D_F} w^b}{\mathrm{d}\tau} = \varepsilon_{abcd}\, Z^b w^c \omega^d . \tag{7.3.12'}$$

The $\omega^a$ defined by (7.3.11) is called the spatial angular velocity (or angular velocity
for short) of the spatial vector field $w^a$. That is, a non-Fermi transported spatial vector
field $w^a$ can be described by a nonzero spatial angular velocity $\omega^a$.
Suppose $\{(e_i)^a\}$ is an orthonormal spatial triad field on $G(\tau)$. Since any two basis
vectors are orthogonal, they have a “rigid relationship”, and one can expect that these
three basis vectors have the same spacetime angular velocity $\Omega_{ab}$, and thus have the
same spatial angular velocity $\hat{\Omega}_{ab}$. See the following proposition:

Proposition 7.3.7 The three basis vector fields in any orthonormal spatial triad
field $\{(e_i)^a\}$ on $G(\tau)$ have the same spatial angular velocity $\hat{\Omega}_{ab}$ (no more gauge
freedom).

Proof See Optional Reading 7.3.1. □

Remark 5 ① This $\hat{\Omega}_{ab}$ shared by each $(e_i)^a$ is called the angular velocity 2-form
for the spatial rotation of this triad field, and the corresponding $\omega^a$ (satisfying
$\hat{\Omega}_{ab} = \omega^c \varepsilon_{cab}$) is called the spatial angular velocity vector of this triad field. ②
One may ask: suppose $(e_1)^a$ and $(e_2)^a$ rotate with respect to $(e_3)^a$ with an angular
velocity $\omega^a$ [parallel to $(e_3)^a$], then $(e_3)^a$ is non-rotating, and hence has zero
angular velocity. How can one say that these three vectors have the same angular
velocity? The answer is: using the “gauge freedom” (see Remark 4), one can say
that the angular velocity of $(e_3)^a$ is also $\omega^a$ (since a rotation with respect to itself is
equivalent to no rotation), and so there is no contradiction. Thus, we can also see that
the proof of Proposition 7.3.7 requires the use of the gauge freedom. It should be
emphasized that when one finds that a basis vector in a spatial triad field [e.g., $(e_3)^a$]
is non-rotating along a curve, one cannot assert based on Proposition 7.3.7 that the
other two basis vectors are also non-rotating, since they can rotate with respect to $(e_3)^a$.

Remark 6 The discussion above indicates that an observer is determined by two


factors: ① a world line G(τ ), and ② an orthonormal tetrad field on G(τ ) [which
satisfies (e0 )a = Z a ]. In some cases factor ② is not critical, so one only needs to
specify the world line G(τ ) when talking about an observer. Therefore, some authors
treat an observer as a world line [e.g., Sachs and Wu (1977) p. 41 defines an observer
as a future-directed timelike curve with a unit tangent vector field]. However, in many
cases both of these two factors are critical, and in such cases one should interpret
an observer as a world line G(τ ) equipped with a specific orthonormal tetrad field
[where (e0 )a equals the 4-velocity]. This world line describes the orbital motion of
the observer (as a point mass), while the spatial angular velocity ωa of the triad field
describes the rotation of the observer. As we mentioned at the end of Sect. 6.3, an
inertial observer in Minkowski spacetime refers to a non-rotating ($\omega^a = 0$) observer
whose world line is a geodesic (the 4-acceleration $A^a = 0$). This is the simplest
type of observer. Similarly, a free-falling ($A^a = 0$) non-rotating ($\omega^a = 0$) observer
in curved spacetime also belongs to the simplest type of observer, which has great
significance for understanding the equivalence principle and the concept of a local
inertial frame (see Sect. 7.5 for details). A clear understanding of the two factors of
an observer will be very helpful for distinguishing an inertial force from a Coriolis
force (see Sect. 7.4 for details).
[Optional Reading 7.3.1]
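To prove Propositions 7.3.5 and 7.3.7, it is necessary to have a quantitative discussion of
the gauge freedom of $\Omega_{ab}$. Suppose $\Omega_{ab}$ is the angular velocity for the spacetime rotation
of a spatial vector field $w^a$ on $G(\tau)$, i.e.,

$$\frac{D w^a}{d\tau} = -\Omega^{ab} w_b . \tag{7.3.13}$$

Choose an orthonormal tetrad field on $G(\tau)$ such that $(e_0)^a = Z^a$, $(e_1)^a = \alpha w^a$ (where $\alpha$
is the normalization factor), then a necessary and sufficient condition for
$\Omega'_{ab} \equiv \Omega_{ab} + \Delta\Omega_{ab}$ to satisfy (7.3.13) is that $\Delta\Omega_{ab}(e_1)^b = 0$. Thus,

$$0 = \Delta\Omega_{ab}(e_1)^b = \Delta\Omega_{\mu\nu}(e^\mu)_a (e^\nu)_b (e_1)^b = \Delta\Omega_{\mu 1}(e^\mu)_a = \Delta\Omega_{01}(e^0)_a + \Delta\Omega_{21}(e^2)_a + \Delta\Omega_{31}(e^3)_a ,$$

and hence $\Delta\Omega_{01} = \Delta\Omega_{21} = \Delta\Omega_{31} = 0$. Since $\Delta\Omega_{ab}(e_1)^b = 0$ is the only restriction on $\Delta\Omega_{ab}$, and
there is no restriction on the other 3 components $\Delta\Omega_{02}$, $\Delta\Omega_{03}$ and $\Delta\Omega_{23}$, one can choose $\Omega_{02}$,
$\Omega_{03}$ and $\Omega_{23}$ arbitrarily. This is the gauge freedom of the spacetime angular velocity $\Omega_{ab}$ of
$w^a$.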

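Proof of Proposition 7.3.5 Choose an orthonormal tetrad field such that $(e_0)^a = Z^a$, $(e_1)^a = \alpha w^a$
(where $\alpha$ is the normalization factor). It follows from

$$0 = \frac{D}{d\tau}(Z^a w_a) = w_a \frac{D Z^a}{d\tau} + Z_a \frac{D w^a}{d\tau} = -w_a \tilde{\Omega}^{ab} Z_b - Z_a \Omega^{ab} w_b$$
$$\phantom{0} = (\tilde{\Omega}_{ab} - \Omega_{ab}) Z^a w^b = (\tilde{\Omega}_{ab} - \Omega_{ab})(e_0)^a (e_1)^b \alpha^{-1} = (\tilde{\Omega}_{01} - \Omega_{01})\alpha^{-1}$$

that $\Omega_{01} = \tilde{\Omega}_{01}$. Using the gauge freedom of $\Omega_{ab}$ we can let $\Omega_{02} = \tilde{\Omega}_{02}$ and $\Omega_{03} = \tilde{\Omega}_{03}$.
Noticing that $\tilde{\Omega}_{ij} = 0$, we see that $\hat{\Omega}_{ab} \equiv \Omega_{ab} - \tilde{\Omega}_{ab} = \Omega_{ij}(e^i)_a (e^j)_b$ is a pure spatial
rotation. □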

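Proof of Proposition 7.3.7 Let $(\hat{\Omega}_i)_{ab}$ represent the angular velocity 2-form for the spatial
rotation of $(e_i)^a$. From

$$0 = \frac{D[(e_1)^a (e_2)_a]}{d\tau} = \frac{D[(e_2)^a (e_3)_a]}{d\tau} = \frac{D[(e_3)^a (e_1)_a]}{d\tau}$$

we can see that

$$\text{(a)}\ (\hat{\Omega}_1)_{12} = (\hat{\Omega}_2)_{12} , \qquad \text{(b)}\ (\hat{\Omega}_2)_{23} = (\hat{\Omega}_3)_{23} , \qquad \text{(c)}\ (\hat{\Omega}_3)_{31} = (\hat{\Omega}_1)_{31} . \tag{7.3.14}$$

Using the freedom of $(\hat{\Omega}_1)_{23}$ one can set it to equal $(\hat{\Omega}_2)_{23}$, and thus (b) in the equation above
can be developed into $(\hat{\Omega}_1)_{23} = (\hat{\Omega}_2)_{23} = (\hat{\Omega}_3)_{23}$. Similarly we have
$(\hat{\Omega}_1)_{12} = (\hat{\Omega}_2)_{12} = (\hat{\Omega}_3)_{12}$ and $(\hat{\Omega}_1)_{31} = (\hat{\Omega}_2)_{31} = (\hat{\Omega}_3)_{31}$, and hence
$(\hat{\Omega}_1)_{ab} = (\hat{\Omega}_2)_{ab} = (\hat{\Omega}_3)_{ab}$. The reader
should prove that $\hat{\Omega}_{ab}$ no longer has any gauge freedom. □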

[The End of Optional Reading 7.3.1]

7.4 The Proper Coordinate System of an Arbitrary Observer

The tetrad of an observer is only defined on the world line of the observer. In order to
record the events (experimental results) near the world line, one needs to extend this
tetrad in some way and form a coordinate system. We certainly want the coordinate
basis of this system on the world line to coincide with the tetrad of the observer. This
section will introduce a coordinate system which satisfies this requirement and is
quite convenient, called the proper coordinate system of an observer. This system
should be determined by two ingredients of the observer: the world line $G(\tau)$ and the
orthonormal tetrad field on $G(\tau)$. Since we will talk about general observers, $G(\tau)$
is not necessarily a geodesic, and it can have an arbitrary 4-acceleration $\hat{A}^a$ (the hat
stands for the 4-acceleration of the observer, as distinguished from the 4-acceleration
of a point mass being measured). Also, the orthonormal triad field $\{(e_i)^a\}$ is not
necessarily Fermi transported along $G(\tau)$, but can have an arbitrary angular velocity
$\omega^a$. Of course, both $\omega^a$ and $\hat{A}^a$ are spatial vector fields on $G(\tau)$, i.e., $\omega_a Z^a = 0$,
$\hat{A}_a Z^a = 0$. Suppose $\mu(s)$ is an arbitrary spacelike geodesic that starts from $p$ on
$G(\tau)$ and is orthogonal to $G(\tau)$ at $p$, where $s$ is the affine parameter that is equal
to the arc length, i.e., $T^a \equiv (\partial/\partial s)^a$ is the unit tangent vector. Let $q$ be a point near
$G(\tau)$, then there exists a unique such spacelike geodesic $\mu(s)$ passing through $q$. [See
Fig. 7.4. If $q$ is far from $G(\tau)$, then there may be more than one such geodesic, or
there may not be any such geodesic. Luckily, the observer $G$ only cares about events
close to themselves.]

Fig. 7.4 Defining the proper coordinates of q relative to G

Suppose the spacelike geodesic $\mu(s)$ passing through $q$ starts
from a point $p = \mu(0)$ on $G$, we would like to define four coordinates (called proper
coordinates) $t, x^1, x^2, x^3$ for $q$ using this geodesic $\mu(s)$. Suppose $V_p$ is the tangent
space at $p$, and $W_p$ is the 3-dimensional subspace of $V_p$ that is orthogonal to $Z^a|_p$,
then $T^a|_p \in W_p$. Denote $T^a|_p$ as $w^a$ for short, and denote its components in $(e_i)^a$ as
$w^i$, then the four proper coordinates of $q$ are defined as

$$t(q) := \tau_p , \qquad x^i(q) := s_q w^i , \quad i = 1, 2, 3 , \tag{7.4.1}$$

where $\tau_p$ is the proper time of $p$ (as a point on $G$), and $s_q$ is the parameter value of
$\mu(s)$ at $q$, namely the arc length of the segment $pq$ on $\mu(s)$. As long as $q$ is near $G(\tau)$,
we can use (7.4.1) to define the coordinates, and thus we obtain the proper coordinate
system $\{t, x^i\}$ of the observer $G$, whose coordinate patch is an open neighborhood
of $G(\tau)$ [or of a segment of $G(\tau)$]. As the simplest example, we point out that any
Lorentzian coordinate system in 4-dimensional Minkowski spacetime can be viewed
as the proper coordinate system of the inertial observer whose world line is an
$x^0$-coordinate line of this system. (Note that the word “inertial” already requires
the triad to be Fermi transported along the curve, which here means parallelly transported).
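
The Minkowski example can be made concrete with a short numerical sketch. It is only an illustration under stated assumptions: the spacetime is flat, the observer is inertial (so the world line is a straight line and the orthogonal spacelike geodesics are straight segments), units with c = 1 are used, and the helper names `minkowski_dot`, `boosted_tetrad` and `proper_coordinates` are hypothetical, introduced here and not taken from the text. The function applies (7.4.1) directly: it finds the point $p$ on the world line from which $q - p$ is orthogonal to $Z^a$, and then reads off $s_q w^i$ as the triad components of $q - p$.

```python
import math

def minkowski_dot(u, v):
    """Inner product with the Minkowski metric, signature (-, +, +, +)."""
    return -u[0] * v[0] + u[1] * v[1] + u[2] * v[2] + u[3] * v[3]

def boosted_tetrad(v):
    """Orthonormal tetrad of an inertial observer moving with speed v along x (c = 1)."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    e0 = (gamma, gamma * v, 0.0, 0.0)   # 4-velocity Z^a
    e1 = (gamma * v, gamma, 0.0, 0.0)   # spatial triad vector in the boost direction
    e2 = (0.0, 0.0, 1.0, 0.0)
    e3 = (0.0, 0.0, 0.0, 1.0)
    return e0, (e1, e2, e3)

def proper_coordinates(q, origin, e0, triad):
    """Proper coordinates (t, (x1, x2, x3)) of event q, following (7.4.1),
    for the straight world line G(tau) = origin + tau * e0 (assumed inertial)."""
    d = tuple(qc - oc for qc, oc in zip(q, origin))
    tau_p = -minkowski_dot(e0, d)       # foot point p: (q - p) is orthogonal to Z
    p = tuple(oc + tau_p * zc for oc, zc in zip(origin, e0))
    sep = tuple(qc - pc for qc, pc in zip(q, p))      # equals s_q * w^a
    x = tuple(minkowski_dot(e, sep) for e in triad)   # s_q * w^i
    return tau_p, x

q = (2.0, 1.0, 0.5, -0.3)               # event in the background Lorentzian coordinates
e0, triad = boosted_tetrad(0.6)
t, x = proper_coordinates(q, (0.0, 0.0, 0.0, 0.0), e0, triad)
print(t, x)
```

For this straight world line the result should agree with the Lorentz transformation of the event $q$ into the observer's rest frame, which is one way to check the sketch.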

Proposition 7.4.1 The coordinate basis vectors of a proper coordinate system at
any point $p \in G(\tau)$ are identical to the orthonormal tetrad of the observer $G(\tau)$,
and therefore the components of the metric $g_{ab}|_p$ in a proper coordinate system are
$g_{\mu\nu}|_p = \eta_{\mu\nu}$.

Proof Let $(e_1)^a$ represent the first basis vector of the orthonormal tetrad at $p$, and treat
it as the $w^a$ we mentioned above. The proper coordinates of each point on the spacelike
geodesic $\mu_1(s)$ determined by $(e_1)^a$ satisfy $x^2 = x^3 = 0$, $t = \tau_p$, and thus $\mu_1(s)$ is
an $x^1$-coordinate line. For this curve, $w^1 = 1$ in $x^1(q) = s_q w^1$, and hence $x^1 = s$ for
each point on the curve. Thus, the coordinate basis vector $(\partial/\partial x^1)^a|_p = (\partial/\partial s)^a|_p = w^a =
(e_1)^a$. In a similar manner we have $(\partial/\partial x^2)^a|_p = (e_2)^a$ and $(\partial/\partial x^3)^a|_p = (e_3)^a$.
Moreover, it is not difficult to see that $G(\tau)$ is the coordinate line for the proper
coordinate $t$, and $t = \tau$ on this curve, and hence $Z^a|_p = (\partial/\partial t)^a|_p$. This indicates
that the proper coordinate basis $\{(\partial/\partial x^\mu)^a\}$ coincides with the orthonormal tetrad
$\{Z^a|_p, (e_i)^a|_p\}$. Therefore, the components of $g_{ab}|_p$ in the proper coordinate system
are $g_{\mu\nu}|_p = \eta_{\mu\nu}$. □

$g_{\mu\nu}|_p = \eta_{\mu\nu}$ is a major feature of the proper coordinate system. Of course, this
simple result does not necessarily hold for a point outside $G(\tau)$.
A proper coordinate system has many uses. For example, by means of it one can
define the 3-velocity and 3-acceleration for a point mass.

Definition 1 Suppose $\{t, x^i\}$ is the proper coordinate system of an observer $G$, and
(at least a segment $L$ of) the world line of a point mass is located in the proper
coordinate patch of $G$, then the 3-velocity $u^a$ and the 3-acceleration $a^a$ are accordingly
defined as

$$u^a := \frac{d x^i(t)}{dt} \left(\frac{\partial}{\partial x^i}\right)^a , \tag{7.4.2}$$

$$a^a := \frac{d^2 x^i(t)}{dt^2} \left(\frac{\partial}{\partial x^i}\right)^a , \tag{7.4.3}$$

where $x^i(t)$ are the parametric representations for $L$ with $t$ as the parameter in the
proper coordinate system.

Remark 1 If $p$ is the intersection of $L$ and $G$, then following (6.3.28) we can also
define the 3-velocity of $L$ at $p$ relative to the observer $G$ as

$$u^a := \frac{h^a{}_b U^b}{\gamma} , \tag{7.4.4}$$

where $U^a$ is the 4-velocity of $L$, $\gamma \equiv -Z^a U_a$, $h_{ab} \equiv g_{ab} + Z_a Z_b$, and $h^a{}_b \equiv g^{ac} h_{cb}$.
Now we will show that (7.4.4) is equivalent to (7.4.2). Suppose $\tau_L$ is the proper time
of the point mass $L$, then the 4-velocity $U^a = (\partial/\partial\tau_L)^a$ at $p$ can be expanded in
terms of the proper coordinate basis as

$$U^a = \left(\frac{\partial}{\partial t}\right)^a \frac{dt}{d\tau_L} + \left(\frac{\partial}{\partial x^i}\right)^a \frac{dx^i}{d\tau_L} .$$

In the equation above, $(\partial/\partial t)^a$ is exactly $Z^a$, whose spatial projection vanishes;
$(\partial/\partial x^i)^a$ is orthogonal to $Z^a$, and thus its projection is equal to itself. Hence,

$$h^a{}_b U^b = \left(\frac{\partial}{\partial x^i}\right)^a \frac{dx^i}{d\tau_L} . \tag{7.4.5}$$

Using the proper coordinate system, one can also find another expression for $\gamma \equiv -Z^a U_a$:

$$\gamma = -g_{ab} Z^a U^b|_p = -g_{\mu\nu} Z^\mu U^\nu|_p = -\eta_{\mu\nu} (\partial/\partial t)^\mu U^\nu|_p = -\eta_{00} (\partial/\partial t)^0 U^0|_p = U^0|_p = dt/d\tau_L|_p , \tag{7.4.6}$$

where Proposition 7.4.1 is used in the third equality. It follows from (7.4.5) and
(7.4.6) that $h^a{}_b U^b/\gamma = (\partial/\partial x^i)^a\, dx^i/dt$, and thus (7.4.4) is equivalent to (7.4.2).
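
The equivalence of (7.4.4) and (7.4.2) can be checked numerically at the intersection point $p$. The following is a minimal sketch, assuming that all components are taken in the proper coordinate basis at $p$ (where $g_{\mu\nu} = \eta_{\mu\nu}$ and $Z^\mu = (1, 0, 0, 0)$ by Proposition 7.4.1) and that c = 1; the function names and the sample velocity are hypothetical choices made for illustration.

```python
import math

def minkowski_dot(u, v):
    """Inner product of two 4-vectors with metric diag(-1, 1, 1, 1)."""
    return -u[0] * v[0] + sum(a * b for a, b in zip(u[1:], v[1:]))

def three_velocity(U, Z):
    """u^a = h^a_b U^b / gamma, with h^a_b = delta^a_b + Z^a Z_b and gamma = -Z^a U_a."""
    z_dot_u = minkowski_dot(Z, U)
    gamma = -z_dot_u
    projected = tuple(Ua + Za * z_dot_u for Ua, Za in zip(U, Z))  # h^a_b U^b
    return tuple(c / gamma for c in projected)

Z = (1.0, 0.0, 0.0, 0.0)          # observer's 4-velocity in its own proper basis at p
speed = (0.3, 0.4, 0.0)           # hypothetical dx^i/dt of the point mass, as in (7.4.2)
gamma_L = 1.0 / math.sqrt(1.0 - sum(c * c for c in speed))
U = (gamma_L,) + tuple(gamma_L * c for c in speed)   # U^mu = (dt/dtau_L)(1, dx^i/dt)

u = three_velocity(U, Z)
print(u)
```

The spatial components of `u` should reproduce `speed`, and its time component should vanish, which is the content of (7.4.5) and (7.4.6).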

The 3-velocity defined above can help deepen the understanding of inertial forces
and Coriolis forces in Newtonian mechanics (and their generalizations in curved
spacetime). According to Newtonian mechanics, Newton’s second law does not hold
when a non-inertial observer G measures the motion of a point mass. To preserve
the form of this law, people introduced the concept of a fictitious force. Suppose the
3-acceleration of G relative to an inertial frame is $\hat{\mathbf{a}}$. (The hat is added to represent
the 3-acceleration of the observer, in order to distinguish it from the 3-acceleration $\mathbf{a}$
of the point mass being measured). When G makes a measurement, if they regard
any point mass L being measured as experiencing an imaginary inertial force $-m\hat{\mathbf{a}}$
(where $m$ is the mass of the point mass), then the equation of motion of a free point
mass after the inertial force is taken into account is $-m\hat{\mathbf{a}} = m\mathbf{a}$, and thus the
3-acceleration of L relative to G is $\mathbf{a} = -\hat{\mathbf{a}}$. This can be called the inertial acceleration
of L relative to G which, when multiplied by m, is the inertial force. (We stipulate
that the observer and the world line of the point mass intersect, and the measurement
is made at the intersection). When G is rotating, however, a Coriolis force must be
introduced in addition to the inertial force to preserve the form of Newton’s second
law. However, the phrase “the observer is rotating” may sometimes cause confusion,
so it is necessary to discuss this in greater detail.
Consider a large rigid disk which rotates around its own axis. A swivel chair is put
on the edge of the disk, and the chair base is fixed on the disk (but the chair can rotate
around the axis fixed on its base). Due to the rotation of the disk, the observer in the
swivel chair undergoes a circular motion (the world line is a helix), which is a special
case of orbital motion. Of course, the observer in the swivel chair can also rotate with
respect to their own axis. (This motion is unrelated to the shape of the world line; it is
described by the motion of the orthonormal frame attached to the observer along the
world line). Since the observer has been regarded as a point mass, and the motion of
a point mass cannot be separated as a rotation and a translation, “the orbital motion
of an observer on a rotating disk is circular motion” is the most accurate way to refer
to this type of motion. However, in our daily life we also often refer to the circular
motion of a point mass as a rotation, which can be easily confused with the rotation of
its frame. Unfortunately, distinguishing orbital motion and a frame rotation happens
to be the key for distinguishing inertial forces and Coriolis forces. Therefore, we refer
to the circular motion (a special case of orbital motion) of the observer caused by
the rotation of the disk and the rotation of the frame realized using a swivel chair as
revolution and rotation, respectively. This is similar to calling the Earth’s (viewed as
a point mass) circular motion around the Sun as revolution, while calling the Earth’s
(now treated as a rigid body) rotation around its axis as rotation. Certainly, the word
revolution is not as appropriate as the term orbital motion when the world line of
the observer is not a helix. Later we will see that inertial forces and Coriolis forces
originate from the orbital motion and the rotation of the observer, respectively. Now
let us have a quantitative discussion with an arbitrary spacetime as the background;
in the low speed approximation, the conclusions for Minkowski spacetime agree
with Newtonian mechanics. For simplicity, we only discuss the measurements on
a free point mass L by an arbitrary observer G. Although L is a free point mass, the
arbitrariness of the observer G (including the fact that the world line may not be a
geodesic and the orthonormal spatial triad may not be Fermi transported) means that
measurements of L by G will have an inertial acceleration and a Coriolis acceleration.
[An inertial (Coriolis) force in Newtonian mechanics is equal to the inertial (Coriolis)
acceleration times the mass of L]. See the following proposition.
Proposition 7.4.2 Suppose an observer G has a 4-acceleration $\hat{A}^a$ and an angular
velocity $\omega^a$ (i.e., the angular velocity for the rotation of its spatial triad). Also suppose
the world lines of G and the free point mass L being measured intersect at $p$, and
the 3-velocity of L at $p$ relative to G is $u^a$. Then the 3-acceleration of L at $p$ relative
to G is

$$a^a \equiv (d^2 x^i/dt^2)(e_i)^a = -\hat{A}^a - 2\varepsilon^a{}_{bc}\,\omega^b u^c + 2(\hat{A}_b u^b)u^a , \tag{7.4.7}$$

where $(e_i)^a$ is the orthonormal spatial triad of the observer at $p$, $\varepsilon_{abc} \equiv Z^d \varepsilon_{dabc}$,
$Z^d$ is the 4-velocity of G at $p$, and $\varepsilon_{abcd}$ is the volume element associated with the
spacetime metric $g_{ab}$.

Proof See Optional Reading 7.4.1. 

Now let us discuss the physical meaning of each term on the right-hand side of
(7.4.7). If G is a freely falling non-rotating observer (for Minkowski spacetime this
is an inertial observer), i.e., $\hat{A}^a = 0$, $\omega^a = 0$, then from (7.4.7) we can see that the
3-acceleration of L as measured by G is $a^a = 0$. Taking Minkowski spacetime as an
example, this indicates nothing but the simple fact that there is only a relative velocity
but no relative acceleration between two point masses undergoing inertial motion. In
contrast, if G is not a freely falling non-rotating observer, then there are the following
three possibilities:
(a) The world line of G is not a geodesic ($\hat{A}^a \neq 0$), but G is still a non-rotating
observer ($\omega^a = 0$, i.e., its tetrad is Fermi transported along the world line). Now
(7.4.7) becomes

$$a^a = -\hat{A}^a + 2(\hat{A}_b u^b)u^a . \tag{7.4.8}$$

Let $\hat{A}$ and $u$ represent the magnitudes of the spatial vectors $\hat{A}^a$ and $u^a$, and let $\theta$ be
the angle between them. Then the magnitude of the second term on the right-hand
side of the above equation is $2\hat{A}u^2\cos\theta \leq 2\hat{A}u^2$, and hence the second term can be
neglected under the non-relativistic approximation $u \ll 1$. For Minkowski spacetime,
suppose $G_I$ is the instantaneous rest inertial observer of G at $p$ (see Fig. 7.5), and
$\hat{a}^a$ is the 3-acceleration of G relative to $G_I$, then it follows from Proposition 6.3.6
that $\hat{a}^a = \hat{A}^a$. Since in Newtonian mechanics, $-\hat{a}^a$ is exactly the inertial acceleration
added for a point mass when observed by a non-inertial observer, the first term $-\hat{A}^a$
on the right-hand side of (7.4.8) can be interpreted as an inertial acceleration, and
the second term is the relativistic correction term for the inertial acceleration (which
vanishes under the Newtonian approximation $u \ll 1$). For curved spacetime, it can be
proved (using Lemma 7.4.3; left as Exercise 7.6) that as long as we interpret $G_I$ as
the freely falling observer that is at rest relative to G at $p$, we still have $\hat{a}^a = \hat{A}^a$
($\hat{a}^a$ is the 3-acceleration of G relative to $G_I$), and hence the first and second terms on
the right-hand side of (7.4.8) can still be interpreted as the inertial acceleration and the
corresponding correction term, respectively. In conclusion, the inertial acceleration
is caused by the 4-acceleration $\hat{A}^a$ of the observer (which depends on its orbital
motion).

Fig. 7.5 The 3-acceleration $\hat{a}^a$ of G relative to the instantaneous rest inertial observer $G_I$ is equal to the 4-acceleration $\hat{A}^a$ of G
(b) The world line of G is a geodesic ($\hat{A}^a = 0$), but G has a rotation ($\omega^a \neq 0$),
such as a rotating observer in the swivel chair fixed on the floor of a freely falling
spaceship. Now (7.4.7) becomes

$$a^a = -2\varepsilon^a{}_{bc}\,\omega^b u^c , \quad \text{i.e.,} \quad \mathbf{a} = 2\,\mathbf{u} \times \boldsymbol{\omega} . \tag{7.4.9}$$

This 3-acceleration of the free point mass L relative to G comes completely from
the rotation of the observer ($\omega^a \neq 0$). The right-hand side of the equation above is
the same as the expression for the Coriolis acceleration in Newtonian mechanics,
and hence in curved spacetime is also called the Coriolis acceleration. This clearly
indicates the difference between an inertial acceleration and a Coriolis acceleration:
the former originates from the non-geodesic motion of the observer, while the latter
comes from the rotation of the observer. In the case of a rotating disk, many textbooks
on mechanics assume that the observer on the rotating disk must have a corresponding
rotation due to the revolution, and attribute Coriolis forces to the revolution of the
observer. Actually, the rotation and revolution of the observer on a disk are in principle
independent. Suppose an observer is holding a gyroscope, sitting in a swivel chair
whose base is fixed on the edge of the disk. Then the observer can adjust (“rotate”)
the swivel chair properly and always face the direction indicated by the gyroscope,
and thus is non-rotating while revolving with the disk. In this case, a point mass
being measured will only have an inertial acceleration but no Coriolis acceleration!
(c) The world line of G is not a geodesic ($\hat{A}^a \neq 0$), and G has a rotation ($\omega^a \neq 0$).
A free point mass observed by G will have both an inertial acceleration and a Coriolis
acceleration.
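
The three cases can be explored numerically with a small sketch of (7.4.7). It assumes that all vectors are given by their components in the observer's orthonormal spatial triad at $p$, that the triad is right-handed so that $\varepsilon^a{}_{bc}\,\omega^b u^c$ becomes the ordinary cross product $\boldsymbol{\omega} \times \mathbf{u}$, and that c = 1; the function name `relative_acceleration` and the sample values are hypothetical.

```python
def dot(a, b):
    """Euclidean dot product of two 3-component tuples."""
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    """Cross product, assuming a right-handed orthonormal triad."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def relative_acceleration(A_hat, omega, u):
    """3-acceleration of a free point mass relative to the observer, eq. (7.4.7):
    a = -A_hat - 2 omega x u + 2 (A_hat . u) u."""
    omega_cross_u = cross(omega, u)
    a_dot_u = dot(A_hat, u)
    return tuple(-A_hat[i] - 2.0 * omega_cross_u[i] + 2.0 * a_dot_u * u[i]
                 for i in range(3))

zero = (0.0, 0.0, 0.0)
u = (0.1, 0.0, 0.0)

# Freely falling, non-rotating observer: no relative acceleration.
case_free = relative_acceleration(zero, zero, u)
# (a) accelerated, non-rotating: inertial acceleration plus a correction of order u^2.
case_a = relative_acceleration((0.0, 0.0, 1.0), zero, (0.0, 0.0, 0.01))
# (b) geodesic, rotating: pure Coriolis acceleration 2 u x omega.
case_b = relative_acceleration(zero, (0.0, 0.0, 1.0), u)
# (c) accelerated and rotating: both contributions.
case_c = relative_acceleration((0.0, 0.0, 1.0), (0.0, 0.0, 1.0), u)

print(case_free, case_a, case_b, case_c)
```

In case (a) the sample velocity is parallel to $\hat{A}^a$, so the correction term reaches its upper bound $2\hat{A}u^2$ and is still small compared with $\hat{A}$ when $u \ll 1$.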
Many authors regard the Coriolis force as a type of inertial force. This is merely
a matter of terminology and is perfectly fine. However, in order to distinguish the
orbital motion and rotation of an observer, this text prefers the terminology used by some
other authors [e.g., Misner et al. (1973)], i.e., calling the fictitious forces caused by
the orbital motion and rotation of an observer inertial forces and Coriolis forces,
respectively.
[Optional Reading 7.4.1]
To prove Proposition 7.4.2, we first prove the following Lemma.

Lemma 7.4.3 The Christoffel symbols of the spacetime metric $g_{ab}$ in the proper coordinate
system of $G(\tau)$ have the following simple forms on $G(\tau)$:

$$\Gamma^0{}_{00} = \Gamma^\sigma{}_{ij} = 0 , \qquad \Gamma^0{}_{0i} = \Gamma^0{}_{i0} = \Gamma^i{}_{00} = \hat{A}_i ,$$
$$\Gamma^i{}_{0j} = \Gamma^i{}_{j0} = -\omega^k \varepsilon_{0kij} , \qquad \sigma = 0, 1, 2, 3 , \quad i, j, k = 1, 2, 3 , \tag{7.4.10}$$

where $\hat{A}^a$ and $\omega^a$ are the 4-acceleration and spatial angular velocity of the observer G,
respectively, and $\varepsilon_{0kij}$ are the components of the volume element associated with $g_{ab}$ in the
proper coordinate system.

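Proof Since the orthonormal triad $\{(e_i)^a\}$ of the observer G has a spatial rotation with an
angular velocity $\omega^a$, from Sect. 7.3 we know that

$$(e_0)^b \nabla_b (e_\mu)^a = \frac{D(e_\mu)^a}{d\tau} = -\Omega^a{}_b (e_\mu)^b , \quad \mu = 0, 1, 2, 3 , \tag{7.4.11}$$

where

$$\Omega_{ab} = \hat{A}_a \wedge Z_b + \varepsilon_{abc}\,\omega^c . \tag{7.4.12}$$

From (5.7.2) we know the Christoffel symbols satisfy the following equation:

$$\left(\frac{\partial}{\partial x^\nu}\right)^b \nabla_b \left(\frac{\partial}{\partial x^\mu}\right)^a = \Gamma^\sigma{}_{\mu\nu} \left(\frac{\partial}{\partial x^\sigma}\right)^a , \tag{7.4.13}$$

where $\{(\partial/\partial x^\mu)^a\}$ is the coordinate basis of the coordinate system associated with the
Christoffel symbols. Now we are in the proper coordinate system of $G(\tau)$, and the proper
coordinate basis on $G(\tau)$ is the same as the orthonormal frame. Hence, (7.4.13) with $\nu = 0$
on $G(\tau)$ can also be expressed as

$$(e_0)^b \nabla_b (e_\mu)^a = \Gamma^\sigma{}_{\mu 0} (e_\sigma)^a . \tag{7.4.14}$$

Comparing (7.4.11) and (7.4.14) yields $\Gamma^\sigma{}_{\mu 0}(e_\sigma)^a = -\Omega^a{}_b (e_\mu)^b = -\Omega^\sigma{}_\mu (e_\sigma)^a$, and
therefore

$$\Gamma^\sigma{}_{\mu 0} = -\Omega^\sigma{}_\mu , \quad \sigma, \mu = 0, 1, 2, 3 .$$

If we rewrite (7.4.12) in component form, then the above equation becomes

$$\Gamma^\sigma{}_{\mu 0} = -(\hat{A}^\sigma Z_\mu - Z^\sigma \hat{A}_\mu + Z_\alpha \omega_\rho\, \varepsilon^{\alpha\rho\sigma}{}_\mu) = -(\hat{A}^\sigma Z_\mu - Z^\sigma \hat{A}_\mu - \omega_\rho\, \varepsilon^{0\rho\sigma}{}_\mu) ,$$

where in the last step we used $Z_i = 0$ and $Z_0 = -1$. Using also $Z^0 = 1$, $Z^i = 0$ and
$\hat{A}^0 = 0 = \hat{A}_0$, we have

$$\Gamma^0{}_{00} = -(\hat{A}^0 Z_0 - Z^0 \hat{A}_0 - \omega_\rho\, \varepsilon^{0\rho 0}{}_0) = 0 ,$$
$$\Gamma^0{}_{i0} = -(\hat{A}^0 Z_i - Z^0 \hat{A}_i - \omega_\rho\, \varepsilon^{0\rho 0}{}_i) = \hat{A}_i ,$$
$$\Gamma^i{}_{00} = -(\hat{A}^i Z_0 - Z^i \hat{A}_0 - \omega_\rho\, \varepsilon^{0\rho i}{}_0) = \hat{A}^i = \hat{A}_i ,$$
$$\Gamma^i{}_{j0} = -(\hat{A}^i Z_j - Z^i \hat{A}_j - \omega_\rho\, \varepsilon^{0\rho i}{}_j) = \omega_k\, \varepsilon^{0ki}{}_j = -\omega^k \varepsilon_{0kij} ,$$

where $i, j, k = 1, 2, 3$. Finally, we show that $\Gamma^\sigma{}_{ij} = 0$. Suppose $\mu(s)$ is a spacelike geodesic
starting from $p \in G$ (where $s$ is the arc length), whose tangent vector $T^a$ at $p$ is orthogonal
to $Z^a$, then along $\mu(s)$ we have

$$x^0 \equiv t = \tau_p = \text{constant} , \qquad x^i = s T^i , \qquad T^i = \text{constant} , \quad i = 1, 2, 3 .$$

Thus, $d^2 x^\sigma/ds^2 = 0$, $\sigma = 0, 1, 2, 3$. Hence, from the geodesic equation we have

$$0 = \frac{d^2 x^\sigma}{ds^2} + \Gamma^\sigma{}_{\mu\nu} \frac{dx^\mu}{ds} \frac{dx^\nu}{ds} = \Gamma^\sigma{}_{ij} \frac{dx^i}{ds} \frac{dx^j}{ds} , \quad \sigma = 0, 1, 2, 3 .$$

That is, $0 = \Gamma^\sigma{}_{ij} T^i T^j$ $(i, j = 1, 2, 3)$ $\forall$ unit vectors $T^a \in W_p$, and thus $0 = \Gamma^\sigma{}_{ij} w^i w^j$, $\forall w^a \in
W_p$. Since $\Gamma^\sigma{}_{ij}$ is symmetric in $i$ and $j$, at $p$ we have $\Gamma^\sigma{}_{ij} = 0$, $i, j = 1, 2, 3$ and $\sigma = 0, 1, 2, 3$. Since $p \in G$ is
arbitrary, this equation holds for any point on $G(\tau)$. □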
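
As a consistency check on (7.4.10), and not as part of the proof above, one can write down a metric near $G(\tau)$ that reproduces these Christoffel symbols. The expansion below is the form usually quoted from Misner et al. (1973) for the proper reference frame of an accelerated, rotating observer; it is brought in here as an assumption, it neglects terms of second order in $x^i$, and it uses $\varepsilon_{ijk} \equiv \varepsilon_{0ijk}$ with $\hat{A}_i$ and $\omega^j$ evaluated on the world line.

```latex
% Assumed first-order metric in proper coordinates (not derived in this section):
ds^2 = -\left(1 + 2\hat{A}_i x^i\right)dt^2
       + 2\,\varepsilon_{ijk}\,\omega^j x^k\,dx^i\,dt
       + \delta_{ij}\,dx^i\,dx^j + O(|x|^2)

% On the world line (x^i = 0) one has g_{\mu\nu} = \eta_{\mu\nu}, so that
\Gamma^i{}_{00} = -\tfrac{1}{2}\,\partial_i g_{00} = \hat{A}_i , \qquad
\Gamma^0{}_{0i} = \tfrac{1}{2}\,g^{00}\,\partial_i g_{00} = \hat{A}_i ,

\Gamma^i{}_{j0} = \tfrac{1}{2}\left(\partial_j g_{i0} - \partial_i g_{j0}\right)
               = \varepsilon_{ikj}\,\omega^k = -\omega^k \varepsilon_{0kij} , \qquad
\Gamma^0{}_{ij} = -\tfrac{1}{2}\left(\partial_i g_{0j} + \partial_j g_{0i}\right) = 0 , \qquad
\Gamma^k{}_{ij} = 0 .
```

Each line agrees with the corresponding entry of Lemma 7.4.3, which suggests that the sign conventions assumed for the cross term are compatible with those used in this section.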

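Proof of Proposition 7.4.2 The world line of a free point mass is a geodesic, and its equation
in the proper coordinate system of $G(\tau)$ is

$$\frac{d^2 x^\mu}{d\tau_L^2} + \Gamma^\mu{}_{\nu\sigma} \frac{dx^\nu}{d\tau_L} \frac{dx^\sigma}{d\tau_L} = 0 , \tag{7.4.15}$$

where the affine parameter $\tau_L$ of the geodesic is the proper time of the point mass L. Choose
$t \equiv x^0$ as another parameter of L [the coordinate time of the proper coordinate system of $G(\tau)$],
and denote $dt/d\tau_L$ as $\gamma$. Then, $\frac{dx^\mu}{d\tau_L} = \frac{dx^\mu}{dt}\frac{dt}{d\tau_L} = \gamma \frac{dx^\mu}{dt}$, and hence

$$\frac{d^2 x^\mu}{d\tau_L^2} = \gamma \frac{d}{dt}\left(\frac{dx^\mu}{d\tau_L}\right) = \gamma \frac{d}{dt}\left(\gamma \frac{dx^\mu}{dt}\right) = \gamma \left(\gamma \frac{d^2 x^\mu}{dt^2} + \frac{d\gamma}{dt} \frac{dx^\mu}{dt}\right) . \tag{7.4.16}$$

Setting $\mu = i$ $(= 1, 2, 3)$ in the above equation yields

$$\frac{d^2 x^i}{d\tau_L^2} = \gamma \left(\gamma a^i + \frac{d\gamma}{dt} u^i\right) . \tag{7.4.17}$$

Setting $\mu = i$ in (7.4.15), and plugging in (7.4.17), we get

$$\gamma \left(\gamma a^i + \frac{d\gamma}{dt} u^i\right) + \Gamma^i{}_{\nu\sigma} \frac{dx^\nu}{dt} \frac{dx^\sigma}{dt} \gamma^2 = 0 .$$

Hence,

$$a^i = -\gamma^{-1} u^i\, d\gamma/dt - (\Gamma^i{}_{00} + 2\Gamma^i{}_{0j} u^j + \Gamma^i{}_{jk} u^j u^k)$$
$$\phantom{a^i} = -\gamma^{-1} u^i\, d\gamma/dt - (\hat{A}^i - 2\omega^k \varepsilon_{0kij} u^j) = -\gamma^{-1} u^i\, d\gamma/dt - \hat{A}^i - 2\varepsilon^i{}_{jk}\,\omega^j u^k , \tag{7.4.18}$$

where in the second equality we used Lemma 7.4.3, and in the third equality we used
$\varepsilon_{0kij} = \varepsilon_{kij}$. To derive $\gamma^{-1} d\gamma/dt$, we set $\mu = 0$ in (7.4.16), and find $d^2 t/d\tau_L^2 = \gamma\, d\gamma/dt$.
Then setting $\mu = 0$ in (7.4.15) yields

$$0 = \frac{d^2 t}{d\tau_L^2} + \Gamma^0{}_{\nu\sigma} \frac{dx^\nu}{d\tau_L} \frac{dx^\sigma}{d\tau_L} = \gamma \frac{d\gamma}{dt} + 2\Gamma^0{}_{0i} \frac{dt}{dt} \frac{dx^i}{dt} \gamma^2 = \gamma \frac{d\gamma}{dt} + 2\hat{A}_i u^i \gamma^2 ,$$

where Lemma 7.4.3 is used in both the second and third equalities. From the above equation
we get $-\gamma^{-1} \frac{d\gamma}{dt} = 2\hat{A}_b u^b$. Plugging this into (7.4.18) and rewriting it using the abstract
indices, we obtain $a^a = -\hat{A}^a - 2\varepsilon^a{}_{bc}\,\omega^b u^c + 2(\hat{A}_b u^b)u^a$. □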

[The End of Optional Reading 7.4.1]



7.5 Equivalence Principles and Local Inertial Frames

Any inertial coordinate system in Minkowski spacetime is globally defined (the
coordinate patch covers the whole manifold), and thus is also called a global inertial
coordinate system. Suppose $\{t, x, y, z\}$ is a global inertial coordinate system, and
$G(\tau)$ is an arbitrary $t$-coordinate line in this system, then $\{(\partial/\partial t)^a, (\partial/\partial x)^a, (\partial/\partial y)^a,
(\partial/\partial z)^a\}|_p$ is a non-rotating orthonormal tetrad field on $G$. This coordinate line
together with this tetrad field forms an inertial observer, and $\{t, x, y, z\}$ is exactly
the proper coordinate system of this observer. Now we discuss to what extent
the concepts above can be generalized to curved spacetime. First, an inertial observer
in Minkowski spacetime corresponds naturally to a freely falling (the world line is
a geodesic) non-rotating observer. To study the results of a measurement by this
observer, let us discuss the famous “Einstein’s elevator”. Suppose an elevator close to
the ground is freely falling because its cable has broken. The observer at rest inside
this elevator will experience weightlessness, which is already a fact in Newtonian
mechanics. If he lets go of an apple in his hand, he will find that the apple does
not fall as usual but instead remains at rest. The reason is very simple: the elevator
observer G has a gravitational acceleration $\mathbf{g}$ relative to an inertial frame (the Earth),
and thus is a non-inertial observer (from the viewpoint of Newtonian mechanics;
from that of general relativity it is the opposite). Hence, he will think there are
two forces being applied to the apple: the gravitational force $m_G \mathbf{g}$ (where $m_G$ is the
gravitational mass of the apple) and the inertial force $-m_I \mathbf{g}$ (where $m_I$ is the inertial
mass of the apple). Since $m_G = m_I$ (which is the crux of this argument), the net force
vanishes, and therefore the apple is unaccelerated, or, in a state of weightlessness. If he is
an astronaut, he will feel that this apple behaves the same as an apple in an inertial
spaceship far away from celestial bodies (where the spacetime is approximately
flat). By extension, since $m_G = m_I$, according to Newtonian mechanics, every
(non-gravitational)⁷ mechanical experiment in Einstein’s elevator has the same result as
the corresponding experiment in an inertial spaceship far away from celestial bodies.
This is exactly the reason why $m_G = m_I$ is called an equivalence principle.
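
The force balance in the elevator can be written out as a tiny Newtonian sketch. It is only an illustration of the argument above: the function name and the numerical values (an apple of 0.2 kg and g = 9.8 m/s²) are hypothetical, and the second call uses a deliberately unequal pair of masses to show what the elevator observer would see if $m_G = m_I$ failed.

```python
def apple_acceleration_in_elevator(m_grav, m_inert, g):
    """Newtonian acceleration of the apple relative to the freely falling
    elevator observer: gravitational force m_G * g plus inertial force -m_I * g,
    divided by the inertial mass m_I (g taken along the downward direction)."""
    gravitational_force = m_grav * g
    inertial_force = -m_inert * g
    return (gravitational_force + inertial_force) / m_inert

# Equal gravitational and inertial mass: the apple stays at rest in the elevator.
equal_masses = apple_acceleration_in_elevator(0.2, 0.2, 9.8)

# Hypothetical violation of m_G = m_I: a residual acceleration would appear.
unequal_masses = apple_acceleration_in_elevator(0.2002, 0.2, 9.8)

print(equal_masses, unequal_masses)
```

The residual acceleration in the second case equals $(m_G/m_I - 1)\,g$, so the absence of any such drift is exactly what the weak equivalence principle asserts.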
During the process of conceiving general relativity, Einstein hypothetically extended
this principle from mechanical experiments to all physical experiments,
i.e., he assumed every (non-gravitational) experiment in a freely falling elevator has
the same results as the corresponding experiment in an inertial spaceship far away
from celestial bodies (i.e., in flat spacetime). Based on this, he derived conclusions
like the gravitational redshift of light and light rays following curved paths in a
gravitational field. The principle corresponding to $m_G = m_I$ is dubbed the weak
equivalence principle (WEP), and the one generalized by Einstein is dubbed the
Einstein equivalence principle (EEP). Now we will discuss this principle from the
perspective of general relativity.

⁷ A non-gravitational experiment refers to an experiment where the gravitational interaction between
the objects in the lab can be ignored, but there may exist a gravitational field produced by an object
outside the lab (e.g., the Earth).

Proposition 7.5.1 Suppose $G(\tau)$ is a freely falling non-rotating observer in curved
spacetime (e.g., an observer in Einstein’s elevator), $g_{\mu\nu}$ are the components of the
metric $g_{ab}$ in the proper coordinate system of $G(\tau)$, and $\Gamma^\sigma{}_{\mu\nu}$ are the Christoffel symbols
of the derivative operator $\nabla_a$ associated with $g_{ab}$ in this system, then

$$g_{\mu\nu}|_p = \eta_{\mu\nu} , \qquad \Gamma^\sigma{}_{\mu\nu}|_p = 0 \quad (\sigma, \mu, \nu = 0, 1, 2, 3) , \qquad \forall p \in G . \tag{7.5.1}$$

Proof $g_{\mu\nu}|_p = \eta_{\mu\nu}$ is the conclusion of Proposition 7.4.1 (which holds for the
proper coordinate system of any observer). Lemma 7.4.3 gives $\Gamma^\sigma{}_{\mu\nu}|_p = 0$ $(\sigma, \mu, \nu =
0, 1, 2, 3)$ when $G(\tau)$ is a geodesic and the corresponding observer is
non-rotating. □

Taking electromagnetic phenomena as an example, let us discuss the applications
of the above proposition. According to the minimal substitution rule (see Sect.
7.2), the expressions for Maxwell’s equations and the equation of the Lorentz force
in curved spacetime are

$$\text{(a)}\ \nabla^a F_{ab} = -4\pi J_b , \qquad \text{(b)}\ \nabla_{[a} F_{bc]} = 0 , \qquad \text{(c)}\ q F^a{}_b U^b = U^b \nabla_b P^a \equiv \frac{DP^a}{d\tau} . \tag{7.5.2}$$

Suppose $\{x^\mu\}$ is an arbitrary local coordinate system; we want to write down the
expressions for the components of (7.5.2) in this system. First we look at (a). Recall
that the coordinate components of $\nabla_a v^b$ are denoted by $v^\nu{}_{;\mu}$ (see Sect. 3.1), i.e., $v^\nu{}_{;\mu} \equiv
(dx^\nu)_b (\partial/\partial x^\mu)^a \nabla_a v^b$. Similarly, one should denote the coordinate components of
$\nabla_a F^c{}_b$ as $F^\sigma{}_{\nu;\mu}$, i.e., $F^\sigma{}_{\nu;\mu} \equiv (dx^\sigma)_c (\partial/\partial x^\mu)^a (\partial/\partial x^\nu)^b \nabla_a F^c{}_b$. Hence,

$$F^\mu{}_{\nu;\mu} = (dx^\mu)_c \left(\frac{\partial}{\partial x^\mu}\right)^a \left(\frac{\partial}{\partial x^\nu}\right)^b \nabla_a F^c{}_b = \delta^a{}_c \left(\frac{\partial}{\partial x^\nu}\right)^b \nabla_a F^c{}_b = \left(\frac{\partial}{\partial x^\nu}\right)^b \nabla_a F^a{}_b$$

are the coordinate components of $\nabla_a F^a{}_b$, and the component expression for (7.5.2)(a)
is

$$F^\mu{}_{\nu;\mu} = -4\pi J_\nu . \tag{7.5.3a}$$

Similarly, the coordinate components of $\nabla_a F_{bc}$ are denoted by $F_{\nu\sigma;\mu}$, and the
coordinate component expression for (7.5.2)(b) is

$$F_{[\nu\sigma;\mu]} = 0 . \tag{7.5.3b}$$

Finally, for (7.5.2)(c), the coordinate components of the left-hand side are obviously
$q F^\mu{}_\nu U^\nu$. Using $DP^\mu/d\tau$ to represent the coordinate components of $DP^a/d\tau$, we
have

$$q F^\mu{}_\nu U^\nu = \frac{DP^\mu}{d\tau} . \tag{7.5.3c}$$

Note that in general $DP^\mu/d\tau \neq dP^\mu/d\tau$, because it is not difficult to show that (see
Exercise 3.6) $DP^\mu/d\tau = dP^\mu/d\tau + \Gamma^\mu{}_{\nu\sigma} U^\nu P^\sigma$. Since $\forall p \in G$ we have $\Gamma^\mu{}_{\nu\sigma}|_p = 0$
for the proper coordinate system of G, covariant derivatives reduce to partial derivatives
(denoted by a comma) on $G$, and the above equations can be written there as

$$\text{(a)}\ F^\mu{}_{\nu,\mu} = -4\pi J_\nu , \qquad \text{(b)}\ F_{[\nu\sigma,\mu]} = 0 , \qquad \text{(c)}\ q F^\mu{}_\nu U^\nu = \frac{dP^\mu}{d\tau} . \tag{7.5.4}$$

These are exactly the expressions for the corresponding laws in (7.5.2) in a global
inertial (Lorentzian) coordinate system in Minkowski spacetime. The discussion
above can be generalized to other physical laws. Thus, the proper coordinate system
of a freely falling non-rotating observer is similar to a global inertial (Lorentzian)
coordinate system, and therefore is called a local inertial frame, also called a local
Lorentz system or local Lorentz frame.
People often say: The laws of physics are the same in any local Lorentz system of
curved spacetime as in an inertial coordinate system in Minkowski spacetime [Misner
et al. (1973) p. 207], and thus all the physical experiments done by a freely falling
non-rotating observer G have the same (equivalent) results as the corresponding
experiments done by an inertial observer in flat spacetime. This is the conclusion
required by the Einstein equivalence principle. However, the statement above is not
quite precise, since all we can be certain about is $\Gamma^\sigma{}_{\mu\nu}|_p = 0$, $\forall p \in G$, and once
one deviates from $G(\tau)$, we cannot guarantee that $\Gamma^\sigma{}_{\mu\nu} = 0$. In fact, if $\Gamma^\sigma{}_{\mu\nu}$ really
vanished in a neighborhood of $G(\tau)$, then $\forall p \in G$ we would have

$$R_{\mu\nu\rho}{}^\sigma|_p = (-2\partial_{[\mu} \Gamma^\sigma{}_{\nu]\rho} + 2\Gamma^\lambda{}_{\rho[\mu} \Gamma^\sigma{}_{\nu]\lambda})|_p = 0 ,$$

i.e., the curvature at each point on $G(\tau)$ vanishes, which is inconsistent with the
curved spacetime we supposed. The heart of the problem is that, by choosing a
coordinate system one can only make the $\Gamma^\sigma{}_{\mu\nu}$ on $G(\tau)$ vanish but not the curvature
(curvature is independent of the coordinate system). Thus, the statement “the laws of
physics are the same in any local inertial frame of curved spacetime as in an inertial
frame in Minkowski spacetime” is not necessarily true for a point in the coordinate
patch but outside the curve $G(\tau)$. Nevertheless, when the observer G is doing an
experiment, a “finitely small” spacetime neighborhood U of the world line is usually
involved (e.g., an elevator is involved for an observer in the elevator, see Fig. 7.6),
and thus the problem is not that simple. Luckily, the effect of spacetime
curvature can only be made manifest (detected by experiments) in a sufficiently
large spacetime region. Hence, as long as the spacetime neighborhood that is involved
in an experiment is sufficiently small (for an elevator, as long as its spatial scale
and the falling time are sufficiently small), the result of the experiment will be
virtually indistinguishable from the corresponding experiment in flat spacetime. [This
is similar to the following simple example: at each point on a 2-dimensional sphere,
$R_{abc}{}^d$ is nonvanishing; however, if one only cares about a small piece of the sphere
S in the vicinity of a point, then S can be substituted approximately by a small
region S of the tangent plane of this point (see Fig. 7.7). For instance, in order to
measure the angle between two meridians of the Earth at the North Pole, one may
treat a small segment of each meridian as a straight line]. In an arbitrary curved
spacetime, if one only cares about a point’s neighborhood that is sufficiently small,
special relativistic laws of physics can be used as an approximation.

Fig. 7.6 The “small” spacetime neighborhood U involved in the experiment done by an observer

Fig. 7.7 A small piece of sphere S in the vicinity of the north pole can be substituted approximately by a small region S of the tangent plane
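
The sphere analogy can be made quantitative with a small sketch. It compares the circumference of a circle of geodesic radius $r$ drawn around the pole of a sphere of radius $R$, which is $2\pi R \sin(r/R)$, with the flat value $2\pi r$ on the tangent plane. The function name and the sample radii (an Earth-like $R$ of 6371 km) are hypothetical choices made for illustration.

```python
import math

def circumference_deviation(r, R):
    """Relative difference between the flat circumference 2*pi*r and the
    circumference 2*pi*R*sin(r/R) of a circle of geodesic radius r on a
    sphere of radius R (same length units for r and R)."""
    flat = 2.0 * math.pi * r
    curved = 2.0 * math.pi * R * math.sin(r / R)
    return (flat - curved) / flat

R_SPHERE = 6371.0                       # sphere radius in km (Earth-like, assumed)
small = circumference_deviation(1.0, R_SPHERE)      # circle of radius 1 km
large = circumference_deviation(1000.0, R_SPHERE)   # circle of radius 1000 km

print(small, large)
```

For $r \ll R$ the deviation behaves like $(r/R)^2/6$, so it shrinks quadratically as the region gets smaller. This mirrors the statement that curvature effects only become detectable in a sufficiently large region.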
The proper coordinate system of a freely falling non-rotating observer is a
coordinate system in curved spacetime that is very similar to an inertial coordinate system
in Minkowski spacetime, and thus is called a local Lorentz system. In addition, in
curved spacetime, as long as the Christoffel symbols of a coordinate system vanish
at a point $p$ in the coordinate patch (i.e., $\Gamma^\sigma{}_{\mu\nu}|_p = 0$), this system can also be called
a local Lorentz system at $p$.
In Newtonian mechanics, suppose a spaceship far away from celestial bodies is
undergoing uniformly accelerated motion. An observer in the spaceship (which is
an accelerating observer) will see that the apple flying out of their hand undergoes
a uniformly accelerated motion in the opposite direction, just like what will happen
near the Earth. It is not difficult to believe that all the (non-gravitational) mechan-
ical experiments in the spaceship will have approximately the same results as the
corresponding experiments near the ground; this can be regarded as another formu-
lation of the weak equivalence principle. Based on this, people also often say that
“an astronaut in an accelerating spaceship finds themselves in a gravitational field”
or “an acceleration is equivalent to a gravitational field”. One should have a proper
interpretation for these two statements. Based on the first statement, beginners may
raise a question like this: since the astronaut in an accelerating spaceship feels gravity,
and gravity is spacetime curvature, does the astronaut not feel that they are in
a curved spacetime? The answer is negative: since we have already stipulated that
the spaceship is far away from celestial bodies, the spacetime in the vicinity of the
spaceship must be approximately flat, regardless of the observer. The
7.5 Equivalence Principles and Local Inertial Frames 271

reason that leads to the incorrect conclusion above is that the word “gravity” is used
twice in the deduction with two different meanings. The “gravity” felt by the
astronaut is only a fictitious apparent gravity, which is not produced by matter and
does not correspond to curved spacetime; the name only comes from the feeling of
the astronaut.
Physicists have varied opinions about the meaning and the value of the equivalence
principles. In particular, their opinions on the value differ because their
opinions on the meaning differ. Some consider the principles to be of
great significance. For example, Misner et al. (1973) p. 386 said that “The principle
of equivalence has great power. With it one can generalize all the special relativistic
laws of physics to curved spacetime.” They also said (p. 207) “The vehicle that
carries one from classical mechanics to quantum mechanics is the correspondence
principle. Similarly, the vehicle between flat spacetime and curved spacetime is the
equivalence principle.” Some others, however, take a completely opposite view. For
example, J. L. Synge wrote in the preface of Synge (1960) that: “I have never been able
to understand this principle. ...... Does it mean that the effects of a gravitational field
are indistinguishable from the effects of an observer’s acceleration? If so, it is false.
In Einstein’s theory, either there is a gravitational field or there is none, according
as the Riemann tensor does not or does vanish. This is an absolute property; it has
nothing to do with any observer’s world line. Spacetime is either flat or curved, and
in several places in this book I have been at considerable pains to separate truly
gravitational effects due to curvature of spacetime from those due to curvature of the
observer’s world line (in most ordinary cases the latter predominate). The principle
of equivalence performed the essential office of midwife at the birth of general
relativity, ...... I suggest that the midwife be now buried with appropriate honors
and the facts of absolute spacetime be faced.” This view on equivalence principles
might be somewhat extreme, but some statements in the quotation above can yet
be regarded as a sobering pill which prevents us from misconstruing concepts. For
instance, his warning on distinguishing the real gravity caused by the spacetime
curvature from the apparent (fake) gravity caused by the observer’s world line being
curved (non-geodesic) is extremely necessary.
Here, we talk briefly about our humble understanding of equivalence principles.
Firstly, the Einstein equivalence principle is a hypothetical generalization of the
weak equivalence principle, posed by Einstein during the conception of general
relativity; it played a very important role as a midwife at the birth of general relativity. Even
Synge agreed with this.
Secondly, as mentioned in Sect. 7.2, physical laws in curved spacetime must
obey two principles: (a) the principle of general covariance, and (b) when $g_{ab}$ equals
$\eta_{ab}$, they can go back to the corresponding laws in special relativity. This is
how this text and some other textbooks state them. More textbooks, however, state
principle (b) in another way: (b′) the Einstein equivalence principle. From (a) and
(b′) one can obtain the minimal substitution rule: “the equation of a physical law
in a local Lorentz system of curved spacetime can be obtained by changing the
commas in the equation of the corresponding physical law in a Lorentzian coordinate
system of Minkowski spacetime to semicolons (i.e., changing partial derivatives to
covariant derivatives).” Thus, using the Einstein equivalence principle (together with
the principle of general covariance) we can obtain the laws of physics in general
relativity from the corresponding laws of physics in special relativity, and therefore
it can be said to be the “bridge that brings us from special relativity to general
relativity”. However, just like what we did in Sect. 7.2, one can also get the physical
laws of curved spacetime not by mentioning equivalence principles but by saying (in
addition to the principle of general covariance) that “the physical laws should go back
to the corresponding laws in special relativity when $g_{ab}$ equals $\eta_{ab}$”. (Either way, we
obtain the minimal substitution rule.)⁸ Once the physical laws in curved spacetime are
accepted (and thus general relativity is formulated), one can discuss physics
problems entirely without using equivalence principles (although many authors like to use
equivalence principles in many problems). Therefore, from this perspective, “burying
the midwife” seems to have no influence on general relativity.
Thirdly, for some complicated situations (such as the question of whether a charged
particle moving along a geodesic in curved spacetime emits electromagnetic radiation),
“whether or not the principle of equivalence is violated” has been a controversial issue
for a long time. We think the point is that the precise meaning of the “principle of
equivalence” in these situations has yet to be clarified (another important problem is
the definition of radiation). In this sense, maybe it is not excessive at all when Synge
said “I have never been able to understand this principle.”
Fourthly, besides general relativity, there exist many different gravitational theories
[see, for example, Will (2018)]. All the gravitational theories can be classified
into two major kinds, namely metric theories (which require the spacetime to have a
metric, the world line of a free point mass to be a geodesic of this metric, etc.) and
non-metric theories. General relativity is of course a metric theory. There are also
many other metric theories. For example, another famous and competitive
metric theory is the Brans-Dicke theory, in which the quantities describing
gravity include a scalar field $\phi$ in addition to the metric field $g_{ab}$. The criterion
for judging which gravitational theory is the correct one is of course experiment.
To this end, we need a theory about gravitational experiments. R. H. Dicke had been
working on this kind of theory since the 1960s. His pioneering works have gradu-
ally deepened people’s understanding of equivalence principles and their meaning.
At last, people realized that one should put equivalence principles at the important
position of inspecting the foundation of gravitational theories (not just for general
relativity). There are three levels of equivalence principles, namely the weak equiv-
alence principle (WEP), the Einstein equivalence principle (EEP), and the strong
equivalence principle (SEP). The difference between the SEP and the EEP is that:
the EEP (and WEP) only consider the external gravitational field of a system (e.g.,
an elevator) but do not consider the self-gravitational field generated by the objects
in the system, i.e., they only consider the passive aspects of gravity but ignore the

8 However, this rule will lead to an ambiguity in the order of the operators when two derivative
operators act successively; other considerations need to be taken into account to overcome this issue
(See Sect. 7.2). Therefore, the claim that equivalence principles “can carry all the special relativistic
laws of physics to curved spacetime” seems to be too strong. Misner et al. (1973) pp. 390–391 has
a specific discussion on this.
active aspects; however, the SEP considers both the active and passive aspects, and
talks about “self-gravitational systems”, which range from stars with their self-gravity
all the way to the two lead balls attracting each other in the Cavendish experiment. The
EEP can be regarded as the special case of the SEP when self-gravity is negligible.
The experimental verification of these three equivalence principles is significant
for choosing the gravitational theory. Any gravitational theory will satisfy the WEP
(because the WEP has been verified by experiments which are more and more precise,
no one would like to create a theory that violates the WEP), but this is not true
for the EEP and the SEP. Studies show that [see Will (1995) and Will (2014)], if the
EEP is true, then only metric theories can be correct. This indicates that if experi-
ments that are more and more precise can verify the EEP, then there will be less and
less room for non-metric theories. Further discussion also shows that [still see Will
(1995) and Will (2018)], general relativity satisfies the SEP while none of the other
known theories (including the Brans-Dicke theory) does. (Unfortunately, this discus-
sion is not a rigorous proof, and thus till now the conclusion above is technically still
a conjecture). Therefore, if experiments that are more and more precise can verify
the SEP, then general relativity is very likely to be the only correct gravitational
theory. Thus, we can see that the experimental verification for the three equivalence
principles has very significant theoretical meaning, and these experiments are now
under way with higher and higher precision.

7.6 Tidal Forces and the Geodesic Deviation Equation

Proposition 7.5.1 only shows that the components $\Gamma^{\sigma}{}_{\mu\nu}$ of the Christoffel symbol
in the proper coordinate system of a freely falling non-rotating observer vanish on
the world line of this observer. Once off the world line, $\Gamma^{\sigma}{}_{\mu\nu}$ can be nonvanishing.
To see the physical effect of this statement, let us consider the following thought
experiment. Put eight balls in an elevator into a circular pattern (the plane of the
circle is perpendicular to the ground), as shown in the left part of Fig. 7.8. First we
will discuss what happens using Newtonian mechanics. Suppose the line that goes
through balls 1 and 2 happens to pass through the Earth’s center, and all the balls are
at rest relative to one another at the beginning. Since the gravitational field at ball 1 is
slightly stronger than that at ball 2, the gravitational acceleration of ball 1 is slightly
greater than that of ball 2, and thus the distance between them will gradually increase. A
while later, the whole system will look like what is shown in the right part of Fig. 7.8,
which is not round anymore. If ball 1 is regarded as an observer, they will find that the
distance between them and ball 2 increases with time. However, if the eight balls are
arranged in a circle in an inertial spaceship in a region without gravity (and relatively
at rest at the beginning), then ball 1 (as an observer) will not find any change in the
distance between them and ball 2. Thus, even for mechanical experiments,
an elevator on the Earth’s surface is not completely equivalent to a spaceship in a
region without gravity.
Fig. 7.8 The pattern of balls deforms during free fall [left: initial circular pattern; right: deformed pattern a while later; figure labels: ball numbers, g, ground]

Although these are thought experiments, phenomena with a similar principle can
also be found in daily life. One example is the changing of the tides. Now we will
have a simplified analysis of this phenomenon using Newtonian mechanics, in order
to highlight the essence of this concept. The leading cause of the tidal phenomenon
is the Moon, while the Sun gives a secondary contribution. Ignoring the effect of the
Sun can simplify the problem a lot without changing the essence of it. The Earth, as
an object, is located in the gravitational field of the Moon. Assume that the Earth’s
surface is covered by a layer of sea water. Consider two points A and B on the
water’s surface, such that the line going through them passes through the Earth’s
center. Suppose at a certain moment A is the closest to the Moon, then B is the
furthest from the Moon. The gravitational forces from the Moon on A and B are
different, so the two points will move away from each other; thus, the sea’s surface
near A and B will bulge outwards (left, Fig. 7.9).⁹ As the Earth rotates, A will no longer
face the Moon, and the sea level will drop. After the Earth rotates half a cycle, A is
the furthest from the Moon (right, Fig. 7.9), and the sea water will rise again. For
someone who is freely falling near the ground, the distances from the Earth’s center to
their head and feet are different, so there also exists a force that stretches their body
(if one only considers the Earth’s gravitational field), although this “tidal force” is
so small that they will not be able to feel it. If you are freely falling at the surface of
a neutron star, the tidal force can be as large as $10^{11}$ N, and you will be torn apart
and killed. Remark: ① A neutron star is a celestial object that is composed mainly of
neutrons, whose density can be as high as $10^{14}$ times that of water! The high density
causes an extremely high gradient of the surface gravitational field, see Sect. 9.3. ②
According to the usual estimation, the critical pressure or tension that a human body
can tolerate (above which the body will be torn apart) is about $10^{7}$ N/m$^2$.
The discussion above indicates that any object in the gravitational fields of the
Earth and the Moon experiences a tidal force. In fact, the tidal phenomenon is a

9 From the viewpoint of an observer on the Earth, the reason for A to bulge is the combination of
two forces: (a) the Moon’s gravitational force, and (b) the centrifugal force caused by the Earth’s
circular motion around the barycenter of the Earth and the Moon. The net force of these two forces
is called the tide-generating force (or tide-raising force).
Fig. 7.9 Schematic figures for the tidal phenomena [two panels, each showing the Earth with the points A and B and the Moon]

Fig. 7.10 The small balls put everywhere inside an Einstein’s elevator [figure labels: balls 1 and 2 at $\vec r(t)$ and $\vec r(t) + \vec\lambda(t)$, the relative position $\vec\lambda(t)$, origin o, g]

universal feature of gravitational fields. We will discuss the tidal phenomenon quan-
titatively using Newton’s theory of gravity and general relativity, respectively.
First, we use Newton’s theory of gravity. Without loss of generality, we still take
the example of an Einstein elevator close to the Earth’s surface. Suppose there are
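small balls everywhere inside the elevator (see Fig. 7.10). Let $\vec r(t)$ and $\vec r(t) + \vec\lambda(t)$
represent the position vectors of two balls 1 and 2 next to each other relative to the
origin o in a Cartesian coordinate system, then $\vec\lambda(t)$ is the position vector of ball 2
relative to ball 1, and hence $d^2\vec\lambda/dt^2$ is the acceleration (tidal acceleration) of ball
2 relative to ball 1. To calculate the tidal acceleration, one can use the gravitational
potential $\phi$ and Newton’s second law to write

\[
\frac{d^2 x^i}{dt^2} = -\left.\frac{\partial\phi}{\partial x^i}\right|_{\vec r} ,
\]
\[
\frac{d^2 (x^i + \lambda^i)}{dt^2} = -\left.\frac{\partial\phi}{\partial x^i}\right|_{\vec r + \vec\lambda}
\cong -\left.\frac{\partial\phi}{\partial x^i}\right|_{\vec r}
- \left.\frac{\partial}{\partial x^j}\frac{\partial\phi}{\partial x^i}\right|_{\vec r} \lambda^j .
\]

Subtracting these two equations yields

\[
\frac{d^2 \lambda^i}{dt^2} = -\left.\frac{\partial^2\phi}{\partial x^i\,\partial x^j}\right|_{\vec r} \lambda^j ,
\qquad i = 1, 2, 3 , \tag{7.6.1}
\]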

which is the expression for the tidal acceleration in Newton’s theory of gravity.
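As a concrete instance of (7.6.1), one may take the Earth to be a point mass at the origin, so that the potential is −GM/r. The following Python sketch works under this assumption only; the function names (`potential`, `tidal_matrix`, `tidal_matrix_fd`, `tidal_acceleration`) are illustrative and do not come from the text. It evaluates the matrix K_ij ≡ −∂²φ/∂x^i∂x^j analytically and cross-checks it against central finite differences of the potential.

```python
import math

G = 6.67e-11       # gravitational constant in SI units
M_EARTH = 6.0e24   # Earth's mass in kg (rounded value)


def potential(x, y, z, mass=M_EARTH):
    """Newtonian potential phi = -G*M/r of a point mass at the origin (assumed model)."""
    return -G * mass / math.sqrt(x * x + y * y + z * z)


def tidal_matrix(point, mass=M_EARTH):
    """Analytic K_ij = -d^2(phi)/dx^i dx^j = G*M*(3 x_i x_j - r^2 delta_ij) / r^5."""
    r2 = sum(c * c for c in point)
    r5 = r2 ** 2.5
    return [[G * mass * (3.0 * point[i] * point[j] - (r2 if i == j else 0.0)) / r5
             for j in range(3)] for i in range(3)]


def tidal_matrix_fd(point, h=1000.0, mass=M_EARTH):
    """The same matrix from central finite differences of phi (step h in metres)."""
    def phi(p):
        return potential(p[0], p[1], p[2], mass)

    K = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(3):
            pp = list(point); pp[i] += h; pp[j] += h
            pm = list(point); pm[i] += h; pm[j] -= h
            mp = list(point); mp[i] -= h; mp[j] += h
            mm = list(point); mm[i] -= h; mm[j] -= h
            K[i][j] = -(phi(pp) - phi(pm) - phi(mp) + phi(mm)) / (4.0 * h * h)
    return K


def tidal_acceleration(point, separation, mass=M_EARTH):
    """Right-hand side of (7.6.1): a^i = K_ij * lambda^j."""
    K = tidal_matrix(point, mass)
    return [sum(K[i][j] * separation[j] for j in range(3)) for i in range(3)]


if __name__ == "__main__":
    surface = (0.0, 0.0, 6.37e6)   # a point on the z-axis at the Earth's radius
    for row in tidal_matrix(surface):
        print(["%.3e" % entry for entry in row])
    print(tidal_acceleration(surface, (0.0, 0.0, 1.0)))
```

For a point on the z-axis the matrix is diagonal with entries (−GM/r³, −GM/r³, 2GM/r³): a separation along the radial direction is stretched while transverse separations are compressed, which is the kind of deformation sketched in Fig. 7.8.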

Using (7.6.1) one can have a clear idea about the change of the distance between
the two balls in Fig. 7.8. Choose a coordinate system {x, y, z} such that the z-axis is
pointing straight upward, then the z-component of the relative acceleration between
balls 1 and 2 is

\[
\tilde a^z \equiv \frac{d^2\lambda^z}{dt^2} = -\frac{d^2\phi}{dz^2}\lambda^z = -\frac{d^2\phi}{dr^2}\lambda^z , \tag{7.6.2}
\]

where r is the distance between ball 1 and the Earth’s center. The Earth’s gravitational
potential is $\phi_\oplus = -GM_\oplus/r$, and hence at the Earth’s surface ($r = r_\oplus$) we have
$\tilde a^z = 2GM_\oplus\lambda^z/r_\oplus^3$. Suppose the
initial distance between the two balls is $\lambda^z = 1$ m. Plugging in the following
numerical values in SI: $G = 6.67\times10^{-11}$, $M_\oplus = 6\times10^{24}$, $r_\oplus = 6.37\times10^{6}$ yields
$\tilde a^z = 0.31\times10^{-5}$ m·s$^{-2}$. Suppose the two balls are initially at rest with respect to
each other, then the increment of their distance after $\Delta t = 5$ s will be

\[
\Delta\lambda^z = \frac{1}{2}\tilde a^z(\Delta t)^2 = \frac{1}{2}\times 0.31\times10^{-5}\times 5^2 \cong 4\times10^{-5}\ \mathrm{m} . \tag{7.6.3}
\]
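The arithmetic of this estimate can be reproduced with a few lines of Python. The sketch below uses the same rounded SI values as the text and treats the tidal acceleration as constant over the short time interval; the function names are illustrative only.

```python
G = 6.67e-11       # gravitational constant (SI), rounded as in the text
M_EARTH = 6.0e24   # Earth's mass in kg, rounded as in the text
R_EARTH = 6.37e6   # Earth's radius in m, rounded as in the text


def radial_tidal_acceleration(separation, r=R_EARTH, mass=M_EARTH):
    """Relative acceleration 2*G*M*lambda/r^3 of two balls separated radially by `separation`."""
    return 2.0 * G * mass * separation / r**3


def separation_increment(separation, dt, r=R_EARTH, mass=M_EARTH):
    """Growth of the separation after time dt, starting from relative rest.

    Assumes the tidal acceleration stays constant during dt, which is
    reasonable here because the separation changes very little.
    """
    return 0.5 * radial_tidal_acceleration(separation, r, mass) * dt**2


if __name__ == "__main__":
    a_tidal = radial_tidal_acceleration(1.0)   # balls initially 1 m apart
    growth = separation_increment(1.0, 5.0)    # after 5 s of free fall
    print(f"tidal acceleration: {a_tidal:.2e} m/s^2")
    print(f"increment of the distance after 5 s: {growth:.1e} m")
```

Both numbers agree with the values quoted above to the stated rounding. Because the increment scales with the separation and with the square of the elapsed time, the effect only becomes noticeable for larger elevators or longer falls.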
Now we will investigate the tidal phenomenon from the perspective of general relativity.
What we will show is that the tidal phenomenon is an inevitable outcome
of the intrinsic curvature of spacetime. Again we will take Fig. 7.10 as an example.
Each ball can be viewed as a freely falling observer, whose world line is a timelike
geodesic with the proper time $\tau$ as the affine parameter. These geodesics form a
geodesic congruence in an open subset $U$ of the spacetime¹⁰ (physically it corresponds
to a freely falling reference frame), and the tangent vectors $Z^a \equiv (\partial/\partial\tau)^a$ of
the geodesics form a timelike vector field on $U$. Let $\mu_0(s)$ be a smooth transverse
curve¹¹ [transverse means that the tangent vector at any point of $\mu_0(s)$ is not tangent
to the geodesic passing through this point], then each geodesic $\gamma(\tau)$ in the congruence
that intersects $\mu_0(s)$ can be labeled by $s$, i.e., it can be denoted by $\gamma_s(\tau)$, in
which $s$ is the parameter value of $\mu_0$ at the intersection of this geodesic and $\mu_0(s)$. Choose the
initial setting of the proper time of each $\gamma_s(\tau)$ such that $\tau$ at the intersection of
$\mu_0(s)$ and each $\gamma_s(\tau)$ is zero. Suppose $\phi_\tau$ is an element of the one-parameter (local)
group of diffeomorphisms corresponding to the vector field $Z^a$ and $\mu_\tau(s)$ represents
the image of the curve $\mu_0(s)$ under the map $\phi_\tau$ (see Fig. 7.11). All the curves $\mu_\tau(s)$
with different values of $\tau$ cover a subset $S$ of $U$, on which each point is determined by
two real numbers (coordinates) $\tau$ and $s$, and therefore $S$ is a 2-dimensional manifold.
All the geodesics on $S$ form a subset of the geodesic congruence, where each
geodesic can be labeled using a parameter $s$, and thus this subset is also called a
one-parameter family of geodesics. (The geodesics in the congruence fill a 4-dimensional
open subset $U$ of the spacetime, while this one-parameter family of geodesics only
covers a 2-dimensional surface $S$.) In conclusion, a given transverse curve $\mu_0(s)$
picks a one-parameter family of geodesics $\{\gamma_s(\tau)\}$. Let $\eta^a \equiv (\partial/\partial s)^a$, then $Z^a$ and $\eta^a$
are the coordinate basis vector fields of $S$, and hence they commute:

10 A congruence of curves in U is a family of curves, such that for each p ∈ U there is a unique

curve in this family passing through p.


11 Some other conditions also need to be satisfied (e.g., non-self-intersecting).
Fig. 7.11 A transverse curve picks a one-parameter family of geodesics $\{\gamma_s(\tau)\}$; the paper represents the 2-dimensional surface spanned by this family

\[
0 = [Z, \eta]^a = Z^b\nabla_b\eta^a - \eta^b\nabla_b Z^a , \tag{7.6.4}
\]

where $\nabla_a$ can be any torsion-free derivative operator. Choose the $\nabla_a$ associated with
the spacetime metric, then

\[
Z^b\nabla_b(\eta_a Z^a) = \eta_a Z^b\nabla_b Z^a + Z^a Z^b\nabla_b\eta_a = Z^a Z^b\nabla_b\eta_a
= Z^a\eta^b\nabla_b Z_a = \frac{1}{2}\eta^b\nabla_b(Z^a Z_a) = 0 , \tag{7.6.5}
\]
where the second equality used the fact that $Z^a$ is the tangent vector of a geodesic,
the third equality used (7.6.4) and the fifth equality used the fact that $Z^a Z_a = -1$
at each point. Equation (7.6.5) indicates that $\eta_a Z^a$ is a constant along any geodesic
$\gamma_s(\tau)$. Therefore, as long as we choose $\mu_0(s)$ in the first place such that it is orthogonal
to all $\gamma_s(\tau)$ (which is always possible), then any $\mu_\tau(s)$ will be orthogonal to $\gamma_s(\tau)$.
After this choice, the $\eta^a$ at each point of $S$ can be viewed as a spatial vector of
the geodesic observer $\gamma_s(\tau)$ passing through this point, and thus from now on we
will denote $\eta^a$ by $w^a$ in this text. Suppose $\Delta s$ is small, then $\gamma_0(\tau)$ and $\gamma_{\Delta s}(\tau)$ can
be viewed as the world lines of ball 1 and ball 2 in Fig. 7.10, respectively. Now we
call $\gamma_0(\tau)$ the fiducial observer, and set $\lambda^a \equiv w^a\Delta s$, then $\lambda^a$ can be regarded as the
$\vec\lambda$ in Fig. 7.10, namely the position vector of ball 2 relative to the fiducial observer
(ball 1). Hence, $\tilde u^b \equiv Z^a\nabla_a\lambda^b$ can now be interpreted as the 3-velocity of ball 2
relative to the fiducial observer. [Note that it is a spatial vector field on the world
line $\gamma_0(\tau)$ of ball 1 since $Z_b(Z^a\nabla_a\lambda^b) = Z^a\nabla_a(Z_b\lambda^b) - \lambda^b Z^a\nabla_a Z_b = 0$, where in
the second equality we used the geodesic equation $Z^a\nabla_a Z_b = 0$ and the fact that
$\lambda^b$ is spatial (i.e., $Z_b\lambda^b = 0$)]. Similarly, $\tilde a^c \equiv Z^a\nabla_a(Z^b\nabla_b\lambda^c)$ can be interpreted as
the 3-acceleration of ball 2 relative to ball 1 [which is also a spatial vector field on
$\gamma_0(\tau)$]. Consider a third geodesic $\gamma_{\Delta\bar s}(\tau)$ in the one-parameter family of geodesics
(which corresponds to a ball $\bar 2$ next to ball 2 on the line passing through 1 and 2). The
position vector $\bar\lambda^a$ of it relative to ball 1 is naturally $\bar\lambda^a = w^a\Delta\bar s$, and hence the ratio
of the tidal accelerations of ball $\bar 2$ and ball 2 is a constant $\Delta\bar s/\Delta s$. Thus, instead of
considering specific balls 2, $\bar 2$, etc. (i.e., using $\lambda^a$), we can directly use $w^a$ to define
the following universal quantities which apply to all the balls close to ball 1 in the
one-parameter family of geodesics:

\[
u^b := Z^a\nabla_a w^b , \tag{7.6.6}
\]
\[
a^c := Z^a\nabla_a u^c = Z^a\nabla_a(Z^b\nabla_b w^c) , \tag{7.6.7}
\]

both of which are spatial vector fields living on the fiducial geodesic $\gamma_0(\tau)$. In fact, $w^a$
plays the role of a measuring unit of the position vectors of this family: the position
vector of any $\gamma_{\Delta s}(\tau)$ is equal to $w^a$ times $\Delta s$. Note that $w^a$ has different names in
different works; we refer to it as the separation vector, which is in agreement with
Misner et al. (1973) and Hawking and Ellis (1973). Similarly, $u^b$ and $a^c$ also play the
roles of measuring units for the 3-velocity and 3-acceleration of this family, which are
called the 3-velocity and the 3-acceleration (tidal acceleration) measured by ball 1,
respectively. Given a one-parameter family of geodesics and a fiducial geodesic $\gamma_0(\tau)$
in the family, a 3-velocity field $u^b$ and a 3-acceleration field $a^c$ will be determined. Our
mission is to reveal the close relationship between $a^c$ and the spacetime curvature,
see the following proposition:
Proposition 7.6.1 The tidal acceleration measured by an arbitrary fiducial geodesic
$\gamma_0(\tau)$ in any one-parameter family of timelike geodesics has the following relation
with the spacetime curvature tensor [called the geodesic deviation equation]:

\[
a^c = -R_{abd}{}^{c} Z^a w^b Z^d . \tag{7.6.8}
\]

Proof

\[
a^c = Z^a\nabla_a(Z^b\nabla_b w^c) = Z^a\nabla_a(w^b\nabla_b Z^c)
= w^b Z^a\nabla_a\nabla_b Z^c + (Z^a\nabla_a w^b)\nabla_b Z^c = p^c + q^c , \tag{7.6.9}
\]
[in the second step we used (7.6.4), i.e., $[Z, w]^b = 0$] where $p^c \equiv w^b Z^a\nabla_a\nabla_b Z^c$,
and $q^c \equiv (Z^a\nabla_a w^b)\nabla_b Z^c$. Also,

\[
\begin{aligned}
p^c &= w^b Z^a\nabla_b\nabla_a Z^c - w^b Z^a R_{abd}{}^{c} Z^d \\
    &= w^b\nabla_b(Z^a\nabla_a Z^c) - (w^b\nabla_b Z^a)\nabla_a Z^c - R_{abd}{}^{c} Z^a w^b Z^d \\
    &= -(Z^b\nabla_b w^a)\nabla_a Z^c - R_{abd}{}^{c} Z^a w^b Z^d = -q^c - R_{abd}{}^{c} Z^a w^b Z^d ,
\end{aligned}
\]

where in the third equality we used the geodesic equation and (7.6.4). Plugging the
above equation into (7.6.9) yields (7.6.8). □
Now we will make a few more comments on the geodesic deviation equation.
(1) The geodesic deviation equation (7.6.8) is an equation that describes the relative
acceleration $a^c$ between two neighboring (“infinitesimally nearby”) geodesics,
and $a^c$ is the second order derivative of the separation vector $w^a$ that describes the
separation of the two curves. Surely there will be a separation between the two curves
($w^a \neq 0$), and the separation vector may change with time ($u^a \neq 0$), but there is not
necessarily a deviation ($a^c$ is not necessarily nonvanishing).¹²

12 There exist geodesic families in flat spacetime for which $u^b = 0$ and $a^c = 0$ on a
fiducial geodesic $\gamma_0(\tau)$ (such as a family of parallel geodesics). There also exist geodesic families
in flat spacetime for which $u^b \neq 0$ on $\gamma_0(\tau)$ [one can just let $\gamma_0(\tau)$ and the nearby geodesics
be non-parallel]. However, there does not exist a geodesic family for which $a^c \neq 0$ on $\gamma_0(\tau)$
unless the spacetime is not flat.

(2) Equation (7.6.8) reflects the close relationship between $a^c$ and the spacetime
curvature tensor $R_{abc}{}^{d}$: for flat spacetime ($R_{abc}{}^{d} = 0$), $a^c$ must vanish, and thus
geodesics that are initially parallel will always be parallel [see the footnote after (1)].
However, as long as $R_{abc}{}^{d} \neq 0$, there will exist a geodesic family whose geodesic
deviation (characterized by $a^c$) is nonvanishing; this is reflected by the fact that
geodesics that are initially parallel will eventually no longer be parallel. The precise
meaning of “initially parallel” is that $u^b|_{\tau=0} \equiv Z^a\nabla_a w^b|_{\tau=0} = 0$. By the physical
meaning of $u^b$ for a timelike geodesic family, this equation indicates that the relative
3-velocity between two neighboring geodesics is zero
at the beginning ($\tau = 0$), and hence they are said to be “initially parallel”. However, as
long as $a^c|_{\tau=0} \equiv Z^b\nabla_b u^c|_{\tau=0} \neq 0$, after a while $u^b$ will not be zero anymore, i.e.,
the two geodesics will “become not parallel”. Just as we said in Sect. 3.5, one of
the equivalent formulations for the curvature tensor being nonvanishing is that there
exist geodesics that are parallel at first which become not parallel.
(3) The Christoffel symbols $\Gamma^{\sigma}{}_{\mu\nu}$ depend on the coordinate system. By choosing
the proper coordinate system of a freely falling non-rotating observer one can
make the Christoffel symbols vanish on the world line of the observer (see Proposition 7.5.1),
and this can account for the weightlessness of the observer in Einstein’s
elevator. However, the tidal acceleration $a^c$ is directly related to the Riemann tensor
$R_{abc}{}^{d}$ by (7.6.8), and as a tensor, the latter cannot be made to vanish by choosing any
coordinate system. Thus, the tidal acceleration cannot be eliminated by a coordinate
transformation. Although the observer in Einstein’s elevator cannot feel gravity (the
“gravitational field strength” at this observer is zero), they can still feel the tidal force.
This is an interpretation of Fig. 7.8 from general relativity. On the other hand, at least
indirectly, the smallness of $\Delta\lambda^z$ in (7.6.3) verifies the statement that “the effect of
spacetime curvature is only manifested in a spacetime region which is large enough.”
(4) So far we have only focused on timelike geodesic families. We chose the proper
time as the affine parameter, and chose $\mu_\tau(s)$ to be orthogonal to the geodesics.
This is for no reason except to emphasize the physical meaning that $a^c$ is the tidal
acceleration (in order to have a better correspondence with Fig. 7.10). From the
purely mathematical perspective, the geodesic deviation equation (7.6.8) also holds
for spacelike and null geodesic families; one just needs to interpret $\tau$ as the affine
parameter of a geodesic. In this case $a^c$ no longer has the physical interpretation
of the tidal acceleration, and the separation vector $\eta^a$ does not need to be orthogonal
to $Z^a$. Actually, the geodesic deviation equation also holds for a metric with a
non-Lorentzian signature. Furthermore, one can even talk about geodesics on a manifold
without a metric as long as there is a derivative operator; although orthogonality
is not defined, there is still a geodesic deviation equation, i.e., we have the following
proposition:
Proposition 7.6.1′ The geodesic deviation equation of an arbitrary one-parameter
family of geodesics $\{\gamma_s(\lambda)\}$ in $(M, \nabla_a)$ is

\[
a^c = -R_{abd}{}^{c} T^a \eta^b T^d , \tag{7.6.8$'$}
\]

where $R_{abd}{}^{c}$ is the Riemann tensor, $T^a \equiv (\partial/\partial\lambda)^a$ is the tangent vector of the fiducial
geodesic $\gamma_0(\lambda)$, $\eta^a$ is the separation vector on $\gamma_0(\lambda)$ (as defined before), and $a^c \equiv
T^a\nabla_a(T^b\nabla_b\eta^c)$.
Proof The same as the proof of Proposition 7.6.1. 
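Since the geodesic deviation equation also holds for metrics of non-Lorentzian signature, it can be checked numerically on a familiar example. The following Python sketch assumes the unit 2-sphere with its usual metric and takes the meridians through the north pole as the one-parameter family (the label s is the longitude and the affine parameter is the colatitude θ); these choices and all function names are illustrative and do not come from the text. For this family the separation vector has magnitude w(θ) = sin θ and, the Gaussian curvature being 1, the deviation equation reduces to d²w/dθ² = −w.

```python
import math


def meridian_point(theta, phi):
    """Point of the unit sphere at colatitude theta on the meridian of longitude phi."""
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))


def separation_per_unit_s(theta, ds=1e-4):
    """Distance between the meridians s = 0 and s = ds at parameter theta, divided by ds.

    For small ds this approximates the length of the separation vector.
    """
    p = meridian_point(theta, 0.0)
    q = meridian_point(theta, ds)
    chord = math.dist(p, q)
    arc = 2.0 * math.asin(0.5 * chord)   # great-circle distance on the unit sphere
    return arc / ds


def second_derivative(f, x, h=1e-2):
    """Central finite-difference approximation of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)


if __name__ == "__main__":
    for theta in (0.5, 1.0, 2.0):
        w = separation_per_unit_s(theta)
        w_dd = second_derivative(separation_per_unit_s, theta)
        # If the deviation equation holds, the last two columns should agree.
        print(f"theta = {theta:.1f}   w = {w:.6f}   w'' = {w_dd:.6f}   -w = {-w:.6f}")
```

The meridians leave the north pole with zero separation, spread apart, and then reconverge so that the separation vanishes again at θ = π; this is the behaviour dictated by the solution sin θ.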
[Optional Reading 7.6.1]
The tidal acceleration $a^c$ in (7.6.8) is defined in terms of $w^a$ (7.6.7). To compare with
Newton’s theory of gravity, we introduced $\lambda^a \equiv w^a\Delta s$ and considered it as corresponding
to the relative position vector $\vec\lambda$. Why can $\lambda^a$ be interpreted as the position vector of ball
2 relative to ball 1? Suppose p and q are two arbitrary points in flat space, $w'^a$ is the unit
tangent vector at p of the line between p and q, and $\Delta s'$ is the length of the line between
the two points, then $\lambda^a \equiv w'^a\Delta s'$ can be referred to as the position vector of q relative to
p (note that $|\lambda^a| = \Delta s'$). Now return to the problem of the geodesic deviation in curved space.
Take any $\mu_\tau(s)$ in the family of transverse curves, and let $p \equiv \mu_\tau(0)$, $q \equiv \mu_\tau(\Delta s)$ (p and q
represent the fiducial observer and the point mass being measured at a time $\tau$, respectively).
Use the arc length $s'$ to reparametrize $\mu_\tau(s)$, i.e., $\mu'_\tau(s') = \mu_\tau(s)$, and let $w^a$ and $w'^a$ be the
tangent vectors of $\mu_\tau(s)$ and $\mu'_\tau(s')$ at p, respectively, then $w'^a = w^a\,ds/ds'$. Hence, if we set
$\lambda^a \equiv w^a\Delta s$, then we have $\lambda^a \equiv w^a\Delta s = w'^a\Delta s'$ when $\Delta s$ is small. Noticing that $|w'^a| = 1$,
we see that $|\lambda^a| = \Delta s'$; comparing with the position vector in flat space, we may say that
$\lambda^a$ is the position vector of ball 2 relative to ball 1. In the main text one does not need to
introduce the arc length parameter $s'$, and thus there is no $w'^a$, so one just needs to care about
$w^a$, whose length changes with $\tau$ (note that $\Delta s$ does not change with $\tau$). The change of the
“distance” between balls 1 and 2 is completely manifested by the change of $|w^a|$ with $\tau$,
and so defining the relative 3-velocity and relative 3-acceleration using $\lambda^a \equiv w^a\Delta s$ has a
perfect correspondence with Fig. 7.10.
[The End of Optional Reading 7.6.1]
[Optional Reading 7.6.2]
If we add a constant depending on $s$ to the $\tau$ of each $\gamma_s(\tau)$, then $\mu_\tau(s)$ will become non-orthogonal
to the geodesics, and thus whether or not $\eta^a$ and $Z^a$ are orthogonal depends on the
zero setting of the proper time of each geodesic. Further, what if we take an arbitrary affine
parameter $\tau'$ to substitute for $\tau$? Since $\tau$ is an affine parameter, it follows from Theorem
3.3.3 that $\tau'$ is an affine parameter if and only if $\tau' = \alpha\tau + \beta$. $\alpha$ and $\beta$ should of course be
constants on each geodesic (and $\alpha \neq 0$), but they can be different for different geodesics, i.e.,
$\alpha$ and $\beta$ can be functions of $s$: $\tau' = \alpha(s)\tau + \beta(s)$. This change of the affine parameter can
be viewed as a coordinate transformation $\{\tau, s\} \to \{\tau', s'\}$ on the 2-dimensional manifold
$S$, where
\[
s' = s , \qquad \tau' = \alpha(s)\tau + \beta(s) . \tag{7.6.10}
\]
Let $Z'^a$ and $\eta'^a$ represent the new coordinate basis vectors, i.e., $Z'^a \equiv (\partial/\partial\tau')^a$ and $\eta'^a \equiv
(\partial/\partial s')^a$, then it is not difficult to show that
\[
Z'^a = \alpha^{-1}Z^a , \qquad \eta'^a = \eta^a + \nu Z'^a , \tag{7.6.11}
\]
where $\nu(\tau, s) \equiv -(\tau\,d\alpha/ds + d\beta/ds)$ can be viewed as a function on $S$. Since we only care
about the separation between the fiducial geodesic $\gamma_0(\tau')$ and a geodesic $\gamma_s(\tau')$ next to it,
$\eta^a$ and $\eta'^a$ can be viewed as vectors describing the same separation (Fig. 7.12). That is, if
the separation vectors $\eta^a$ and $\eta'^a$ differ only by a multiple of the tangent vector, they describe the
same separation. Thus, there exists a “gauge arbitrariness” in the choice of the separation
vector. If one insists on using the proper time, but allows each geodesic to have an arbitrary zero
setting, this is equivalent to setting $\alpha = 1$ in (7.6.10) while letting $\beta(s)$ be arbitrary. Then
$Z'^a = Z^a$, $\eta'^a = \eta^a + \nu Z^a$, and $\nu = -d\beta/ds$. Equation (7.6.8) can be expressed as

Fig. 7.12 $\eta^a$ and $\eta'^a$ describe the same separation

$$a'^c = -R_{abd}{}^{c} Z'^a \eta'^b Z'^d = -R_{abd}{}^{c} Z^a (\eta^b + \nu Z^b) Z^d = a^c\,,$$

(where we used $R_{abd}{}^{c} Z^a Z^b Z^d = R_{[ab]d}{}^{c} Z^{(a} Z^{b)} Z^d = 0$) and thus the zero setting does not
affect the value of $a^c$. However, if one does not insist on using the proper time, i.e., allows
$\alpha \neq 1$, then one can only have

$$a'^c = -R_{abd}{}^{c} Z'^a \eta'^b Z'^d = \alpha^{-2} a^c\,.$$

This is natural since substituting $\tau'$ for the proper time $\tau$ is equivalent to substituting a
“coordinate clock” for the standard clock. The rate of this coordinate clock is $\alpha^{-1}$ times the
rate of the standard clock, and the “tidal acceleration” measured using this clock is naturally
$\alpha^{-2}$ times the result measured by the standard clock.
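The step "it is not difficult to show" leading to (7.6.11) can be sketched with the chain rule. This is only an illustrative calculation based on (7.6.10); the inverse transformation written in the comment is derived here and is not quoted from the text.

```latex
% Inverse of (7.6.10):  s = s',  \tau = [\tau' - \beta(s')]/\alpha(s').
\begin{align*}
\frac{\partial\tau}{\partial\tau'} &= \alpha^{-1}, \qquad \frac{\partial s}{\partial\tau'} = 0
  \quad\Longrightarrow\quad
  Z'^a = \frac{\partial\tau}{\partial\tau'}\,Z^a + \frac{\partial s}{\partial\tau'}\,\eta^a = \alpha^{-1}Z^a ,\\
\frac{\partial\tau}{\partial s'} &= -\frac{1}{\alpha}\frac{d\beta}{ds} - \frac{\tau'-\beta}{\alpha^{2}}\frac{d\alpha}{ds}
  = -\frac{1}{\alpha}\Big(\tau\frac{d\alpha}{ds} + \frac{d\beta}{ds}\Big) = \frac{\nu}{\alpha},
  \qquad \frac{\partial s}{\partial s'} = 1 ,\\
\eta'^a &= \frac{\partial\tau}{\partial s'}\,Z^a + \frac{\partial s}{\partial s'}\,\eta^a
  = \eta^a + \nu\,\alpha^{-1}Z^a = \eta^a + \nu Z'^a .
\end{align*}
```

In the second line $\tau' - \beta = \alpha\tau$ was used. For $\alpha = 1$ the result reduces to $\eta'^a = \eta^a + \nu Z^a$ with $\nu = -d\beta/ds$.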
[The End of Optional Reading 7.6.2]
[Optional Reading 7.6.3]
A solution $\eta^b$ to the geodesic deviation equation (7.6.8′) is called a Jacobi field on the
geodesic $\gamma(\lambda)$ being considered. Two points $p, q \in \gamma(\lambda)$ are said to be conjugate if there
exists a non-vanishing Jacobi field $\eta^b$ on $\gamma(\lambda)$ which vanishes at $p$ and $q$. In this case, we
also say that $p$ and $q$ are a pair of conjugate points on the geodesic $\gamma(\lambda)$. For instance, the
south and north poles $s$ and $n$ on the 2-dimensional sphere shown in Fig. 7.13 are a pair of
conjugate points on the geodesic $\gamma$ from $s$ to $n$ (half of the great circle). It is not difficult
to accept the following intuitive statement: $p, q \in \gamma$ are a pair of conjugate points if there
exists a geodesic from $p$ to $q$ that is infinitesimally close to but different from $\gamma$ (such as the
$\gamma'$ in the figure). The precise meaning of the condition after the word “if” is: there exists a
one-parameter family of geodesics from $p$ to $q$ which includes $\gamma$. The logic above can be
formulated as:
There exists a geodesic from $p$ to $q$ that is infinitesimally close to but different from $\gamma$
⇔ there exists a one-parameter family of geodesics from $p$ to $q$ which includes $\gamma$
⇒ $p, q \in \gamma$ are a pair of conjugate points ⇔ there exists a non-vanishing Jacobi field $\eta^b$ on
$\gamma(\lambda)$ which vanishes at $p$ and $q$.
This logic can help us clarify two subtle problems, which will be introduced as follows
in the manner of Q&A (we stipulate that γ is a geodesic):
Q: Suppose p, q ∈ γ are a pair of conjugate points, does there exist a geodesic from p
to q that is different from but infinitesimally close to γ ?
A: Not necessarily, because the ⇒ in the relation above cannot be changed to ⇔. There
do exist situations in which $p, q \in \gamma$ are conjugate but one cannot find a geodesic
from $p$ to $q$ that is different from but infinitesimally close to $\gamma$ (omitted).
Q: Suppose there exists a geodesic $\gamma'$ that passes through $p, q \in \gamma$ and is different from
$\gamma$, can we say that $p$ and $q$ are conjugate?

Fig. 7.13 $s$ and $n$ are a pair of conjugate points, while $s$ and $d$ are not

A: No, because the mere existence of $\gamma'$ cannot guarantee that there exists a one-parameter family
of geodesics between $p$ and $q$ which includes $\gamma$. A counterexample: extend the great
arc $\gamma$ to $d$, and denote this major arc by $\tilde\gamma$; then $s, d \in \tilde\gamma$ and there exists a geodesic $\gamma''$
(the minor arc of the great circle) which passes through $s$ and $d$ and is different from $\tilde\gamma$. However,
$s, d \in \tilde\gamma$ are not a pair of conjugate points since (intuitively speaking) there does not exist a
geodesic connecting $s$ and $d$ that is “infinitesimally close” to $\tilde\gamma$ ($\gamma''$ is certainly not close to
it), or (precisely speaking) there does not exist a nonvanishing Jacobi field $\eta^b$ that
vanishes at both end points.
For the significance of conjugate points on the arc length problem, see Sect. 3.3; for the
use of it in the proofs of the singularity theorems, see Wald (1984) pp. 223–233.
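The sphere example can be made quantitative with a short calculation. What follows is a sketch under stated assumptions, not the book's own derivation: the sphere is taken to have unit radius (Gaussian curvature $K = 1$), $\lambda$ is the arc length along $\tilde\gamma$ measured from $s$, and we use the standard fact that on a 2-dimensional surface the component $f$ of a Jacobi field normal to a unit-speed geodesic obeys $f'' + Kf = 0$. The symbols $f$, $e^a$, $C$ and $\lambda_d$ are introduced here for illustration only.

```latex
% Unit 2-sphere; \tilde\gamma(\lambda) starts at s (\lambda = 0), passes n at \lambda = \pi,
% and reaches d at \lambda = \lambda_d with \pi < \lambda_d < 2\pi.
\begin{align*}
\eta^a(\lambda) &= f(\lambda)\,e^a(\lambda),
   \qquad e^a \text{ unit, normal to } \tilde\gamma \text{ and parallelly transported along it},\\
\frac{d^2 f}{d\lambda^2} + f &= 0, \quad f(0) = 0
   \quad\Longrightarrow\quad f(\lambda) = C\sin\lambda ,\\
f(\pi) &= 0 \ \text{ for any } C \neq 0
   \quad\Longrightarrow\quad s \text{ and } n \text{ are conjugate},\\
f(\lambda_d) &= C\sin\lambda_d \neq 0 \ \text{ unless } C = 0
   \quad\Longrightarrow\quad s \text{ and } d \text{ are not conjugate}.
\end{align*}
```

The component of a Jacobi field tangent to the geodesic is linear in $\lambda$, so it cannot vanish at two distinct points unless it vanishes identically; only the normal component matters for conjugacy.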
[The End of Optional Reading 7.6.3]

7.7 The Einstein Field Equation

Since the distribution of matter produces gravity, and gravity is manifested by the
spacetime curvature, a natural hypothesis is that the spacetime curvature is affected by
the matter distribution. The matter distribution is described by the energy-momentum
tensor Tab , and hence there should exist an equation that relates Tab and the spacetime
curvature. Considering that Newton’s theory of gravity should be the weak-field and
low-speed approximation of general relativity, the comparison between the geodesic
deviation equation (7.6.8) and the tidal force acceleration (7.6.1) in Newton’s theory
of gravity provides important clues for seeking (guessing) this equation. Since the a c
in (7.6.8) is defined in terms of wa instead of λa , for convenience’s sake, we should
change the λi in (7.6.1) to wi . Suppose {x i } is a Cartesian system of the 3-dimensional
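Euclidean space, then (7.6.1) can be written as

\begin{align*}
a^c = \left(\frac{\partial}{\partial x^i}\right)^{c} a^i
  &= \left(\frac{\partial}{\partial x^i}\right)^{c}\frac{d^2 w^i}{dt^2}
   = -\left(\frac{\partial}{\partial x^i}\right)^{c} w^j \frac{\partial}{\partial x^j}\left(\frac{\partial\phi}{\partial x^i}\right)\\
  &= -\left(\frac{\partial}{\partial x^i}\right)^{c} w^b\,\partial_b\!\left(\frac{\partial\phi}{\partial x^i}\right)
   = -w^b\,\partial_b\!\left[\left(\frac{\partial}{\partial x^i}\right)^{c}\frac{\partial\phi}{\partial x^i}\right]
   = -w^b\,\partial_b\partial^c\phi\,.
\end{align*}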

This is the tidal acceleration derived from Newton’s theory of gravity, which should be
an approximation of the a c derived from general relativity. Therefore, the comparison
between the above equation and (7.6.8) implies the following correspondence:

$$R_{abd}{}^{c} Z^a Z^d \leftrightarrow \partial_b\partial^c\phi\,. \tag{7.7.1}$$

Contracting the indices $b$ and $c$ yields

$$R_{abd}{}^{b} Z^a Z^d \leftrightarrow \partial_b\partial^b\phi = \nabla^2\phi = 4\pi\rho = 4\pi T_{ad} Z^a Z^d\,,$$

where $\nabla^2\phi = 4\pi\rho$ is Poisson’s equation in Newton’s theory of gravity, and in the
last step we used the property 3(a) of $T_{ab}$ in Sect. 6.4 ($\mu$ is changed to $\rho$). From the
above correspondence, we expect the following equation to hold:

$$R_{ad} Z^a Z^d = 4\pi T_{ad} Z^a Z^d\,. \tag{7.7.2}$$

The simplest assumption that satisfies the above equation is

$$R_{ab} = 4\pi T_{ab}\,. \tag{7.7.3}$$

In fact, this is what Einstein assumed and published initially. However, from Sect.
6.4 we can see that the energy-momentum tensor Tab satisfies ∂ a Tab = 0; using the
minimal substitution rule in Sect. 7.2 we have ∇ a Tab = 0, and hence (7.7.3) leads to

$$\nabla^a R_{ab} = 0\,, \tag{7.7.3$'$}$$

which will lead to physically unacceptable consequences. Contracting the Bianchi
identity $\nabla_{[a} R_{bc]d}{}^{e} = 0$ yields $\nabla_{[a} R_{bc]d}{}^{a} = 0$, and thus

$$0 = \nabla_a R_{bcd}{}^{a} + \nabla_c R_{abd}{}^{a} + \nabla_b R_{cad}{}^{a} = \nabla_a R_{bcd}{}^{a} - \nabla_c R_{bd} + \nabla_b R_{cd}\,.$$

Raising the index $d$ using the metric and contracting it with the lower index $b$ yields

$$0 = \nabla_a R_c{}^{a} - \nabla_c R + \nabla_b R_c{}^{b} = 2\nabla^a R_{ca} - \nabla_c R\,,$$

and so (7.7.3′) requires that

$$\nabla_c R = 0\,. \tag{7.7.4}$$

This is an additional condition imposed on $R_{ab}$ by (7.7.3′). To illustrate that this
condition is unacceptable, let $T \equiv g^{ab} T_{ab}$, raise the index $b$ in (7.7.3), and contract
it with the lower index $a$. Then we have $R = 4\pi T$, and hence (7.7.4) leads
to $\nabla_c T = 0$, i.e., $T$ is a constant throughout the matter field. Taking a perfect fluid as an
example, it follows from (6.5.1) (changing $\mu$ to $\rho$) that

T = Ta a = ρUa U a + p(δa a + Ua U a ) = −ρ + 3 p .

Under the Newtonian approximation we have $\rho \gg p$, and hence $T \cong -\rho$. Therefore,
$T$ being a constant means that the proper energy density $\rho$ is a constant in the whole
fluid field. This obviously does not agree with the perfect fluid case in physics, and
thus (7.7.3) must be modified. The problem is that ∇ a Tab = 0 while ∇ a Rab should
not vanish. If one can find a symmetric tensor G ab of type (0, 2) that only depends
on the spacetime geometry, which not only satisfies ∇ a G ab = 0, but can also replace
Rab in an equation similar to (7.7.3) which still leads to (7.7.2), then the issue will be
solved. This G ab is not difficult to find, since it can be easily seen from the equation
above (7.7.4) that

$$0 = \nabla^a R_{ab} - \tfrac{1}{2}\nabla_b R = \nabla^a R_{ab} - \tfrac{1}{2} g_{ab}\nabla^a R = \nabla^a\!\left(R_{ab} - \tfrac{1}{2} R g_{ab}\right),$$

so what is inside the parentheses on the right-hand side of this equation can be taken as
$G_{ab}$. Therefore, we can define $G_{ab}$ as

$$G_{ab} \equiv R_{ab} - \tfrac{1}{2} R g_{ab}\,, \qquad \nabla^a G_{ab} = 0\,, \tag{7.7.5}$$

($G_{ab}$ is called the Einstein tensor, see Definition 3 and Theorem 3.4.8 in Sect. 3.4),
and replace (7.7.3) with the equation $G_{ab} = 8\pi T_{ab}$, namely we assume

$$R_{ab} - \tfrac{1}{2} R g_{ab} = 8\pi T_{ab}\,. \tag{7.7.6}$$


The left-hand side of this equation satisfies $\nabla^a(R_{ab} - \tfrac{1}{2} R g_{ab}) = 0$ automatically,
and thus it is compatible with $\nabla^a T_{ab} = 0$. On the other hand, it is not difficult to see
that the above equation will return to (7.7.2) under the Newtonian approximation
($T \cong -\rho$) as we want. First, it follows from (7.7.6) that $8\pi T_a{}^{a} = R_a{}^{a} - \tfrac{1}{2}\delta_a{}^{a} R =
R - 2R = -R$, i.e.,

$$R = -8\pi T\,, \tag{7.7.7}$$

and hence (7.7.6) leads to

$$R_{ab} = 8\pi T_{ab} + \tfrac{1}{2} g_{ab} R = 8\pi T_{ab} + \tfrac{1}{2} g_{ab}(-8\pi T) = 8\pi\left(T_{ab} - \tfrac{1}{2} g_{ab} T\right). \tag{7.7.6$'$}$$

Thus,

\begin{align*}
R_{ab} Z^a Z^b &= 8\pi\left(T_{ab} Z^a Z^b - \tfrac{1}{2} g_{ab} Z^a Z^b\, T\right) = 8\pi\left(\rho + \tfrac{1}{2} T\right)\\
 &\cong 8\pi\left(\rho - \tfrac{1}{2}\rho\right) = 4\pi\rho = 4\pi T_{ab} Z^a Z^b\,,
\end{align*}

which is exactly (7.7.2). Therefore, one should take (7.7.6) as the equation describing
the relation between the spacetime curvature and a matter field. This equation is
dubbed the Einstein field equation, which is a basic postulate of general relativity.13
In Minkowski space Rabc d = 0 everywhere; hence, G ab = 0, and from Einstein’s
equation we know that Tab = 0. However, is there any physics if there is no matter?
In fact, special relativity studies the motion of physical objects and their interactions,
but the gravitational interaction between them is ignored, i.e., the gravitational fields
produced by the physical objects are ignored, and therefore the spacetime is approx-
imately flat. Thus, special relativity is the approximation of general relativity when
gravity (spacetime curvature) can be ignored. As long as gravity is not negligible,
the spacetime cannot be treated as flat, and in principle special relativity cannot be
applied.
An important special case is Tab = 0, in which Einstein’s equation becomes

1
Rab − Rgab = 0 , (7.7.8)
2
called the vacuum Einstein equation. Given a coordinate system, the components
Rμν of the Ricci tensor can be expressed by the components gμν of the metric and its
partial derivatives (up to the second order) [see (3.4.21)], and the dependence of Rμν
on gμν is highly nonlinear.14 Therefore, (7.7.8) can be viewed as a set of nonlinear
2nd-order partial differential equations for the unknown functions $g_{\mu\nu}$; each solution
$g_{ab}$ is called a vacuum metric. The Minkowski metric is naturally a solution to the equation
(7.7.8), while a solution to (7.7.8) can be a curved metric. An important example
is the vacuum solution found by Karl Schwarzschild within two months after the
publication of Einstein’s equation, see Sect. 8.3 and Chap. 9 for details.
It is not difficult to show that the scalar curvature R vanishes when Tab = 0, and
thus the vacuum Einstein equation (7.7.8) can be simplified as

$$R_{ab} = 0\,. \tag{7.7.8$'$}$$

This indicates that the Riemann tensor of a vacuum metric (i.e., a solution to the
vacuum Einstein equation) gab is equal to its Weyl tensor (see Definition 2 of Sect.
3.4), which is usually nonvanishing.
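The claim that the scalar curvature vanishes in vacuum takes one line to check. The following is a minimal worked step, assuming only that the spacetime is 4-dimensional, so that $g^{ab}g_{ab} = 4$.

```latex
% Contract the vacuum equation (7.7.8) with g^{ab}:
\begin{align*}
0 = g^{ab}\left(R_{ab} - \tfrac{1}{2} R\, g_{ab}\right) = R - \tfrac{1}{2}\cdot 4\,R = -R
\quad\Longrightarrow\quad R = 0
\quad\Longrightarrow\quad R_{ab} = \tfrac{1}{2} R\, g_{ab} = 0 .
\end{align*}
```

This is the special case $T = 0$ of (7.7.7).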
Equation (7.7.6) with $T_{ab} \neq 0$ is called Einstein’s equation with source, which
is similar to Maxwell’s equations with source in Minkowski spacetime [see (6.6.10)],
except there is an important difference. For Maxwell’s equations, one can solve for
the unknown Fab when the source (4-current density J a ) is assigned. It seems that
for Einstein’s equation one can also assign Tab (as a given quantity) and then solve

13 The story being told here is a cleaned-up version of the much more convoluted path which Einstein
actually followed. In fact, Einstein did not define the Einstein tensor first, and the form
of his equation published in November 1915 was (7.7.6′) instead of (7.7.6).
14 Specifically, the dependence of $G_{\mu\nu}$ on the second-order derivatives of $g_{\mu\nu}$ is linear, while the
dependence on the first-order derivatives is quadratic. What is worse, $G_{\mu\nu}$ also contains the inverse
$g^{\mu\nu}$ of $g_{\mu\nu}$ (for raising the indices), which is very complex when expressed as a function of $g_{\mu\nu}$.

for the unknown quantity gab ; however, there is an issue: Tab is not meaningful when
gab is undetermined. Take a perfect fluid with zero pressure (dust) as an example. To
define a dust as a matter field, we mean to assign a 4-velocity field U a and a proper
density field ρ to it. The energy-momentum tensor of the dust is Tab = ρUa Ub ,
where Ua ≡ gac U c . Therefore, as long as gac is undetermined, the value of Tab is not
known. Moreover, the 4-velocity field U a should be timelike and normalized, and
both of these concepts involve the metric gab , and so one can hardly view U a as a given
quantity when gab is unknown. Thus, it is improper to treat gab and Tab as respectively
unknown and given quantities. The source of this difference between Einstein’s
equation and Maxwell’s equations is that: the spacetime background (Minkowski
spacetime) is already stipulated in Maxwell’s theory, and the right-hand side of
the equation ∂ a Fab = −4π Jb will be a given quantity −4π ηbc J c when a 4-current
vector J a is given; for Einstein’s equation, however, gab that describes the spacetime
background is yet to be determined, and unfortunately, it appears on both sides of
the equation, and thus one cannot simply consider the right-hand side as being given
beforehand. When solving Einstein’s equation, one should treat gab and the quantities
describing matter fields (e.g., for a dust they are U a and ρ) together as unknown
quantities and solve for them simultaneously. We will provide an example of solving
Einstein’s equation in Sect. 8.4, where the “matter field” will be an electromagnetic
field.15
The non-linearity of Einstein’s equation means that it does not satisfy the superposition
principle, which leads to many consequences. For instance, the sum of
two solutions to Einstein’s equation is in general not a solution. This is another significant
difference between Einstein’s equation and Maxwell’s equations.
The Einstein tensor satisfies ∇ a G ab = 0 [see (7.7.5)], and therefore Einstein’s
equation contains ∇ a Tab = 0, which includes a lot of information about the motion
of matter. In fact, for a perfect fluid, this is the equation of motion for the matter
field (see Sect. 6.5). For a perfect fluid with zero pressure, i.e., a dust, it follows
from ∇ a Tab = 0 that the world line of a dust particle is a geodesic [see (6.5.8) and
a few sentences after that]. This conclusion can also be generalized to any object
whose self-gravity is weak enough [Fock (1939); Geroch and Jang (1975)]. Thus,
the postulate in Sect. 7.1 about the world lines of free particles being geodesics is no
longer an independent postulate.
Another completely different approach to obtain Einstein’s field equation is
through the Lagrangian formulation of general relativity, which will be introduced
in Chap. 16 (Volume III). Since it does not involve any knowledge that has not been
covered so far, readers who want to learn about deriving Einstein’s equation through
the variational principle may refer to Sect. 16.1 (except for the optional reading)
directly after reading this section.

15 Conventionally, an electromagnetic field is not classified as a matter field, but as the source of a
gravitational field we will later refer to it as a matter field for convenience.

7.8 Linear Approximation and the Newtonian Limit

7.8.1 Linearized Theory of Gravity

The non-linearity of the Einstein field equation brings many difficulties to the task of
solving the equation as well as the study of general relativity in general. In most of the
cases the gravitational field is weak, and one can approximate the field equation as a
linear equation, which will significantly simplify the problem. In the 4-dimensional
language, a weak gravitational field means that the spacetime metric gab is close to
the Minkowski metric ηab .16 Define γab using the following equation:

gab = ηab + γab , (7.8.1)

then $\gamma_{ab}$ is “small”, which means that the components of $\gamma_{ab}$ in a Lorentzian coordinate
system of $\eta_{ab}$ satisfy $|\gamma_{\mu\nu}| \ll 1$, so that the second and higher order terms can all
be neglected. Under this approximation, $\gamma_{ab}$ can be treated as some kind of physical
field (similar to the electromagnetic field) in Minkowski spacetime. The difference
between $\gamma_{ab}$ and an ordinary physical field is that the sum of $\gamma_{ab}$ and $\eta_{ab}$ gives the
spacetime metric. From this perspective (plus the fact that $\gamma_{ab}$ is “small”), $\gamma_{ab}$ can be
viewed as a perturbation of $\eta_{ab}$. For convenience and to avoid confusion, we stipulate
that the tensor indices are all raised and lowered by $\eta^{ab}$ and $\eta_{ab}$ (instead of $g^{ab}$ and
$g_{ab}$), with only one exception, which is $g^{ab}$: $g^{ab}$ will still represent the inverse of $g_{ab}$
rather than $\eta^{ac}\eta^{bd} g_{cd}$. Under the linear approximation, it is not difficult to see from
(7.8.1) that

$$g^{ab} = \eta^{ab} - \gamma^{ab}\,, \tag{7.8.2}$$

as from this we have $g^{ab} g_{bc} = \delta^a{}_c - (\text{second-order terms in } \gamma)$. Suppose $\partial_a$ and $\nabla_a$
are the derivative operators associated with $\eta_{ab}$ and $g_{ab}$, respectively, then from
(3.2.10) we know that the Christoffel symbol (i.e., the “difference” between $\partial_a$ and
$\nabla_a$) in a Lorentzian system is

$$\Gamma^c{}_{ab} = \tfrac{1}{2} g^{cd}\left(\partial_a g_{bd} + \partial_b g_{ad} - \partial_d g_{ab}\right). \tag{7.8.3}$$

Plugging (7.8.1) and (7.8.2) into the above equation and only keeping the first-order
terms in $\gamma_{ab}$, we have

$$\Gamma^{(1)c}{}_{ab} = \tfrac{1}{2}\eta^{cd}\left(\partial_a\gamma_{bd} + \partial_b\gamma_{ad} - \partial_d\gamma_{ab}\right). \tag{7.8.4}$$
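The statement that (7.8.2) inverts (7.8.1) up to second-order terms can be checked directly. The following is an illustrative calculation using only the stated convention that indices on $\gamma_{ab}$ are raised with $\eta^{ab}$, i.e., $\gamma^{ab} \equiv \eta^{ac}\eta^{bd}\gamma_{cd}$.

```latex
\begin{align*}
g^{ab} g_{bc} &= \left(\eta^{ab} - \gamma^{ab}\right)\left(\eta_{bc} + \gamma_{bc}\right)
  = \delta^a{}_c + \gamma^a{}_c - \gamma^a{}_c - \gamma^{ab}\gamma_{bc}
  = \delta^a{}_c - \gamma^{ab}\gamma_{bc} .
\end{align*}
% The leftover term \gamma^{ab}\gamma_{bc} is quadratic in \gamma and is dropped in the linear approximation.
```

The same counting explains (7.8.4): since $\partial_a \eta_{bd} = 0$ in a Lorentzian system, every term in the parentheses of (7.8.3) is already first order, so only the $\eta^{cd}$ part of $g^{cd}$ contributes at first order.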

16 In the linearized theory of gravity, people usually discuss the spacetime with the background
manifold R4 , or a spacetime region where a flat Lorentzian metric η̃ab can be defined. In the former
case the Minkowski metric ηab is globally defined, and in the latter case it is convenient to denote
the (locally) flat metric η̃ab as ηab .

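Using the property that $\Gamma^{(1)c}{}_{ab}$ itself is a first-order small term, plugging the above
equation into (3.4.20) yields the first-order approximation of the Riemann tensor
(with lower indices) of $g_{ab}$ (called the linearized Riemann tensor)

$$R^{(1)}_{acbd} = \partial_d\partial_{[a}\gamma_{c]b} - \partial_b\partial_{[a}\gamma_{c]d}\,. \tag{7.8.5}$$

Using $\eta^{cd}$ to raise and contract the indices, we obtain the first-order approximation
of the Ricci tensor of $g_{ab}$ (the linearized Ricci tensor)

$$R^{(1)}_{ab} = \partial^c\partial_{(a}\gamma_{b)c} - \tfrac{1}{2}\partial^c\partial_c\gamma_{ab} - \tfrac{1}{2}\partial_a\partial_b\gamma\,, \tag{7.8.6}$$

where $\gamma \equiv \gamma^a{}_a = \eta^{ab}\gamma_{ab}$. From this one can easily get the first-order approximation
of the Einstein tensor (called the linearized Einstein tensor)

$$G^{(1)}_{ab} = R^{(1)}_{ab} - \tfrac{1}{2}\eta_{ab} R^{(1)} = \partial^c\partial_{(b}\gamma_{a)c} - \tfrac{1}{2}\partial^c\partial_c\gamma_{ab} - \tfrac{1}{2}\partial_a\partial_b\gamma - \tfrac{1}{2}\eta_{ab}\left(\partial^c\partial^d\gamma_{cd} - \partial^c\partial_c\gamma\right). \tag{7.8.7}$$

Therefore,

$$\partial^c\partial_{(a}\gamma_{b)c} - \tfrac{1}{2}\partial^c\partial_c\gamma_{ab} - \tfrac{1}{2}\partial_a\partial_b\gamma - \tfrac{1}{2}\eta_{ab}\left(\partial^c\partial^d\gamma_{cd} - \partial^c\partial_c\gamma\right) = 8\pi T_{ab} \tag{7.8.8}$$

is called the linearized Einstein equation. Let

$$\bar\gamma_{ab} \equiv \gamma_{ab} - \tfrac{1}{2}\eta_{ab}\gamma\,, \tag{7.8.9}$$

then the linearized Einstein equation can be further simplified as

$$-\tfrac{1}{2}\partial^c\partial_c\bar\gamma_{ab} + \partial^c\partial_{(a}\bar\gamma_{b)c} - \tfrac{1}{2}\eta_{ab}\,\partial^c\partial^d\bar\gamma_{cd} = 8\pi T_{ab}\,. \tag{7.8.8$'$}$$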

The left-hand side of this equation vanishes when ∂ b ≡ ηbc ∂c acts on it, and thus
the equation above assures ∂ b Tab = 0. This has an important physical meaning: it
indicates that the divergence of the energy-momentum tensor vanishes in the lin-
earized theory of gravity, and hence assures that the laws of conservation of energy,
momentum and angular momentum also hold in the linearized theory of gravity (as
a physical theory).
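Both the passage from (7.8.8) to (7.8.8′) and the vanishing divergence of its left-hand side can be verified term by term. The calculation below is a sketch that uses only (7.8.9), the trace $\eta^{ab}\eta_{ab} = 4$, and the commutativity of the flat derivative $\partial_a$.

```latex
% Step 1: invert (7.8.9). Its trace gives \bar\gamma = \gamma - 2\gamma = -\gamma,
%         hence \gamma_{ab} = \bar\gamma_{ab} - (1/2)\eta_{ab}\bar\gamma.
% Step 2: substitute into the four terms on the left-hand side of (7.8.8).
\begin{align*}
\partial^c\partial_{(a}\gamma_{b)c} &= \partial^c\partial_{(a}\bar\gamma_{b)c} - \tfrac{1}{2}\partial_a\partial_b\bar\gamma ,\\
-\tfrac{1}{2}\partial^c\partial_c\gamma_{ab} &= -\tfrac{1}{2}\partial^c\partial_c\bar\gamma_{ab} + \tfrac{1}{4}\eta_{ab}\,\partial^c\partial_c\bar\gamma ,\\
-\tfrac{1}{2}\partial_a\partial_b\gamma &= +\tfrac{1}{2}\partial_a\partial_b\bar\gamma ,\\
-\tfrac{1}{2}\eta_{ab}\left(\partial^c\partial^d\gamma_{cd} - \partial^c\partial_c\gamma\right)
  &= -\tfrac{1}{2}\eta_{ab}\,\partial^c\partial^d\bar\gamma_{cd} - \tfrac{1}{4}\eta_{ab}\,\partial^c\partial_c\bar\gamma .
\end{align*}
% Adding the four lines, all terms containing the trace \bar\gamma cancel, which gives (7.8.8').
% Step 3: divergence of the left-hand side of (7.8.8').
\begin{align*}
&\partial^b\left(-\tfrac{1}{2}\partial^c\partial_c\bar\gamma_{ab} + \partial^c\partial_{(a}\bar\gamma_{b)c}
   - \tfrac{1}{2}\eta_{ab}\,\partial^c\partial^d\bar\gamma_{cd}\right)\\
&\quad = -\tfrac{1}{2}\partial^c\partial_c\partial^b\bar\gamma_{ab} + \tfrac{1}{2}\partial_a\partial^b\partial^c\bar\gamma_{bc}
   + \tfrac{1}{2}\partial^b\partial_b\partial^c\bar\gamma_{ac} - \tfrac{1}{2}\partial_a\partial^c\partial^d\bar\gamma_{cd} = 0 .
\end{align*}
```

In Step 3 the first and third terms cancel, as do the second and fourth, after relabeling dummy indices.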
Equation (7.8.8′) can also be further simplified. In order to do this, we first review
a heuristic example. Maxwell’s equation ∂ a Fab = −4π Jb in Minkowski spacetime
can be expressed using the electromagnetic 4-potential Aa as [see (6.6.30)]

∂ a ∂a Ab − ∂b ∂ a Aa = −4π Jb . (7.8.10)
Suppose χ is an arbitrary scalar field, then the following transformation for Aa :

Ãa = Aa + ∂a χ (7.8.11)

is called a gauge transformation since Ãa and Aa correspond to the same Fab . One
can always choose χ so that the 4-potential satisfies the Lorenz gauge:

∂ a Aa = 0 , (7.8.12)

then (7.8.10) can be simplified as

∂ a ∂a Ab = −4π Jb . (7.8.13)

In the linearized theory of gravity, there exists a very similar gauge freedom. Suppose
ξ a is an infinitesimal vector field (“infinitesimal” means that the components ξ μ of ξ a
are small enough so that the product with γαβ or itself can be regarded as second-order
terms and neglected), the following transformation of γab :

γ̃ab = γab + ∂a ξb + ∂b ξa (7.8.14)

is called a gauge transformation in the linearized theory of gravity, since it is not
difficult to verify from the commutativity of $\partial_a$ and $\partial_b$ that $\eta_{ab} + \tilde\gamma_{ab}$ and $\eta_{ab} + \gamma_{ab}$
have the same linearized Riemann tensor. $R^{(1)}_{abcd}$ being invariant leads to the fact
that $R^{(1)}_{ab}$ and $G^{(1)}_{ab}$ are invariant. Therefore, if $\gamma_{ab}$ is a solution to the linearized
Einstein equation, then γ̃ab will also be one. This gauge invariance allows us to
choose an appropriate γab among all the equivalent ones (i.e., to choose an appropriate
gauge) to simplify the linearized Einstein equation (7.8.8). As an analogue of the
electromagnetic Lorenz gauge condition (7.8.12), we will show below that there
exists a subclass in the equivalence class, in which the γ̄ab of each γab satisfies the
following equation:
∂ b γ̄ab = 0 , (7.8.15)

called the Lorenz gauge condition of the linearized theory of gravity.17 From the
equation above we can see that the second and third terms on the left-hand side
of the linearized Einstein equation (7.8.8′) for this type of $\bar\gamma_{ab}$ vanish, and hence the
equation can be simplified as

∂ c ∂c γ̄ab = −16π Tab , (7.8.16)

which is very similar to (7.8.13)! Now we will show that (7.8.15) can always be
satisfied by choosing $\xi^a$ appropriately. Suppose that $\bar\gamma_{ab}$ does not satisfy (7.8.15). We then
want to choose $\xi^a$ such that the $\tilde\gamma_{ab}$ determined by (7.8.14) has a corresponding

17 Also called the de Donder gauge condition or harmonic gauge condition of the linearized theory
of gravity.
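
$$\bar{\tilde\gamma}_{ab} = \tilde\gamma_{ab} - \tfrac{1}{2}\eta_{ab}\tilde\gamma \qquad (\tilde\gamma \equiv \eta^{ab}\tilde\gamma_{ab})$$

that satisfies (7.8.15). A simple calculation starting from (7.8.14) shows that $\partial^b\bar{\tilde\gamma}_{ab} =
\partial^b\bar\gamma_{ab} + \partial^b\partial_b\xi_a$, and hence as long as we choose a $\xi^a$ satisfying

$$\partial^b\partial_b\xi_a = -\partial^b\bar\gamma_{ab}\,, \tag{7.8.17}$$

then $\partial^b\bar{\tilde\gamma}_{ab} = 0$ is guaranteed. A $\xi^a$ that satisfies (7.8.17) must exist, since the
component form of this equation in an inertial coordinate system is the following
familiar equation:

$$-\frac{\partial^2\xi_\mu}{\partial t^2} + \frac{\partial^2\xi_\mu}{\partial x^2} + \frac{\partial^2\xi_\mu}{\partial y^2} + \frac{\partial^2\xi_\mu}{\partial z^2} = -\partial^\nu\bar\gamma_{\mu\nu}\,.$$

When $\bar\gamma_{\mu\nu}$ is given, solutions to this inhomogeneous wave equation not only exist but are also far from unique.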
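The "simple calculation starting from (7.8.14)" can be written out as follows. This is a sketch using only (7.8.9), (7.8.14) and the commutativity of $\partial_a$; the auxiliary field $\zeta_a$ in the closing remark is introduced here for illustration.

```latex
\begin{align*}
\tilde\gamma &= \eta^{ab}\tilde\gamma_{ab} = \gamma + 2\,\partial^c\xi_c ,\\
\bar{\tilde\gamma}_{ab} &= \tilde\gamma_{ab} - \tfrac{1}{2}\eta_{ab}\tilde\gamma
  = \bar\gamma_{ab} + \partial_a\xi_b + \partial_b\xi_a - \eta_{ab}\,\partial^c\xi_c ,\\
\partial^b\bar{\tilde\gamma}_{ab} &= \partial^b\bar\gamma_{ab} + \partial_a\partial^b\xi_b + \partial^b\partial_b\xi_a - \partial_a\partial^c\xi_c
  = \partial^b\bar\gamma_{ab} + \partial^b\partial_b\xi_a .
\end{align*}
```

The non-uniqueness can be read off from the last line: if $\xi_a$ solves (7.8.17), so does $\xi_a + \zeta_a$ for any $\zeta_a$ with $\partial^b\partial_b\zeta_a = 0$, so the Lorenz gauge condition leaves a residual gauge freedom.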
[Optional Reading 7.8.1]
There is a subtlety in the derivation from (7.8.3) to (7.8.4) that we should specify. Take
the term g cd ∂a gbd as an example, it can be expressed as

g cd ∂a gbd = (ηcd − γ cd )∂a γbd , (7.8.18)


but why do we only keep ηcd ∂a γbd ? Seeing that the off-diagonal components of ηcd vanish
but the off-diagonal components of γ cd can be nonzero, why can it still be neglected? This
can be interpreted from the perspective of perturbation theory. Consider a one-parameter
family of metrics, gab (s), and a one-parameter family of energy-momentum tensors Tab (s)
(with s as the parameter) satisfying
(a) G ab (s) = 8π Tab (s) [where G ab (s) is the Einstein tensor of gab (s)];
(b) gab (0) = ηab , Tab (0) = 0;
(c) There exists a small quantity ε > 0 such that (gab (ε), Tab (ε)) is the (gab , Tab ) of the
spacetime we are concerned with.
Moreover, we also require that $g_{ab}(s)$ and $T_{ab}(s)$ can both be Taylor expanded:

$$g_{ab}(s) = \eta_{ab} + s\,g^{(1)}_{ab} + s^2 g^{(2)}_{ab} + O(s^3)\,,$$
$$T_{ab}(s) = s\,T^{(1)}_{ab} + s^2 T^{(2)}_{ab} + O(s^3)\,.$$

Plugging the two equations above into G ab (s) = 8π Tab (s), and ignoring all the O(s 2 )
and higher order terms, what we obtain will be the linear (first-order) approximation of the
Einstein equation, namely (7.8.8). And the derivation from (7.8.3) to (7.8.4) is one of the
steps in this procedure. Since neither γ cd nor ∂a γbd in (7.8.18) contains a zeroth-order term
of s, γ cd ∂a γbd is at least a second-order term. Thus, this term can be neglected and we have
(7.8.4).
[The End of Optional Reading 7.8.1]

[Optional Reading 7.8.2]


In the main text above we have introduced gauge transformations in the linearized theory
of gravity using the active language. In the passive language, such a transformation is the
result of an infinitesimal coordinate transformation as follows:
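
$$x'^\mu = x^\mu - \xi^\mu(x)\,, \tag{7.8.19}$$

(the $x$ in the parentheses is an abbreviation for $x^\sigma$) where $\xi^\mu(x)$ are four arbitrary infinitesimal
functions of the same order as $\gamma_{ab}$ [see Misner et al. (1973) pp. 439–440]. Consider the
coordinate components $g_{\rho\sigma} = \eta_{\rho\sigma} + \gamma_{\rho\sigma}$. Under the above coordinate transformation, the
tensor transformation law

$$g'_{\mu\nu}(x') = \frac{\partial x^\rho}{\partial x'^\mu}\frac{\partial x^\sigma}{\partial x'^\nu}\, g_{\rho\sigma}(x) \tag{7.8.20}$$

can be reduced to

\begin{align*}
g'_{\mu\nu}(x') &= \left(\delta^\rho{}_\mu + \frac{\partial\xi^\rho}{\partial x^\mu}\right)\left(\delta^\sigma{}_\nu + \frac{\partial\xi^\sigma}{\partial x^\nu}\right) g_{\rho\sigma}(x)\\
 &= g_{\mu\nu}(x) + \frac{\partial\xi^\sigma}{\partial x^\nu}\, g_{\mu\sigma}(x) + \frac{\partial\xi^\rho}{\partial x^\mu}\, g_{\rho\nu}(x)\\
 &= \eta_{\mu\nu} + \gamma_{\mu\nu} + \frac{\partial\xi_\nu}{\partial x^\mu} + \frac{\partial\xi_\mu}{\partial x^\nu}\,, \qquad (\text{where } \xi_\mu = \eta_{\mu\rho}\xi^\rho)\,,
\end{align*}

up to terms of higher order than $\gamma_{ab}$ and $\xi^a$. Define $\gamma'_{\mu\nu} \equiv g'_{\mu\nu} - \eta_{\mu\nu}$. Then, up to higher
order terms,

$$\gamma'_{\mu\nu} = \gamma_{\mu\nu} + \xi_{\mu,\nu} + \xi_{\nu,\mu}\,.$$

On the other hand, $\gamma'_{\mu\nu}(x) = g'_{\mu\nu}(x') - \eta_{\mu\nu} = [g'_{\mu\nu}(x') - g_{\mu\nu}(x)] + [g_{\mu\nu}(x) - \eta_{\mu\nu}]$ turns
out to be

$$\gamma'_{\mu\nu}(x) = [g'_{\mu\nu}(x') - g_{\mu\nu}(x)] + \gamma_{\mu\nu}\,. \tag{7.8.21}$$

Hence, up to terms of higher order than $\gamma_{ab}$ and $\xi^a$, we have $g'_{\mu\nu}(x') - g_{\mu\nu}(x) = \xi_{\mu,\nu} + \xi_{\nu,\mu}$.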
To see how the above coordinate description is related to the gauge transformation (7.8.14) in
the active language, we consider the one-parameter local group of diffeomorphisms generated
by a vector field X a , denoted by φλ (see Optional Reading 2.2.2), with λ as the parameter.
Here we choose X a such that the infinitesimal vector field ξ a in the gauge transformation
is ξ a = ε X a , where ε is an infinitesimal number. For both gab and g̃ab (λ) ≡ φλ∗ gab we can
split them as

gab = ηab + γab , g̃ab (λ) = ηab + γ̃ab (λ) ,

and obtain that ηab + γ̃ab (λ) = g̃ab (λ) = φλ∗ gab = φλ∗ ηab + φλ∗ γab , i.e.,

γ̃ab (λ) = φλ∗ γab + φλ∗ ηab − ηab .

When λ is small, one can rewrite the above equation by means of Lie derivatives as

γ̃ab (λ) = γab + λL X γab + λL X ηab + O(λ2 ) = γab + LλX γab + LλX ηab + O(λ2 ) ,

where the last step can be easily seen from (4.2.8). Ignoring the higher order terms, we have

γ̃ab ≡ γ̃ab (ε) = γab + Lξ ηab = γab + ∂a ξb + ∂b ξa ,

where we have set $\lambda = \varepsilon$, and (4.3.1′) is used in the last step. Therefore, the gauge
transformation (7.8.14) can be obtained from changing the metric $g_{ab}$ to $\tilde g_{ab}(\varepsilon)$ by a one-parameter
local group of diffeomorphisms, with the perturbation background $\eta_{ab}$ being unchanged.
Suppose $\lambda$ is so small that both the domain $U$ and the range $\phi_\lambda[U]$ of the diffeomorphism
$\phi_\lambda : U \to \phi_\lambda[U]$ are contained in the coordinate patch of $\{x^\mu\}$. Then the four functions
$y^\mu(\lambda) \equiv \phi^*_{-\lambda} x^\mu$ ($\mu = 0, 1, 2, 3$) form a coordinate system on $\phi_\lambda[U]$. When $\lambda = \varepsilon$, the corresponding
$y^\mu(\varepsilon)$ is denoted by $x'^\mu$. Then, since $\varepsilon$ is infinitesimal, we have

$$x'^\mu = y^\mu(\varepsilon) = \phi^*_{-\varepsilon} x^\mu = x^\mu - \varepsilon\,\mathcal{L}_X x^\mu = x^\mu - \mathcal{L}_\xi x^\mu = x^\mu - \xi^\mu\,,$$

where in the last step we used (4.2.2) and (2.2.3′), and $\xi^\mu$ are the components of $\xi^a$ in $\{x^\mu\}$.
This is exactly the infinitesimal coordinate transformation (7.8.19). Noticing that $g_{ab} =
(\phi_\lambda)_* \tilde g_{ab}(\lambda)$, according to Theorem 4.1.3, $\forall q \in U$ the coordinate components of $g_{ab}|_{\phi_\lambda(q)}$
in $\{y^\mu(\lambda)\}$ equal the corresponding coordinate components of $\tilde g_{ab}(\lambda)|_q$ in $\{x^\mu\}$. Especially,
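for $\lambda = \varepsilon$, this yields

$$\tilde g_{cd}(\varepsilon)|_q \left(\frac{\partial}{\partial x^\mu}\right)^{c}\bigg|_q \left(\frac{\partial}{\partial x^\nu}\right)^{d}\bigg|_q
 = g_{cd}|_{\phi_\varepsilon(q)} \left(\frac{\partial}{\partial x'^\mu}\right)^{c}\bigg|_{\phi_\varepsilon(q)} \left(\frac{\partial}{\partial x'^\nu}\right)^{d}\bigg|_{\phi_\varepsilon(q)}
 = g'_{\mu\nu}(x')|_{\phi_\varepsilon(q)}\,,$$

where $g'_{\mu\nu}(x')$ are the coordinate components of $g_{cd}$ in $\{y^\mu(\varepsilon) \equiv x'^\mu\}$, as shown in (7.8.20).
Hence

$$\tilde g_{ab}(\varepsilon)|_q = \tilde g_{cd}(\varepsilon)|_q \left(\frac{\partial}{\partial x^\mu}\right)^{c}\bigg|_q \left(\frac{\partial}{\partial x^\nu}\right)^{d}\bigg|_q (dx^\mu)_a|_q\, (dx^\nu)_b|_q
 = g'_{\mu\nu}(x')|_{\phi_\varepsilon(q)}\, (dx^\mu)_a|_q\, (dx^\nu)_b|_q\,.$$

Since $g'_{\mu\nu}(x')|_{\phi_\varepsilon(q)} = g'_{\mu\nu}(x'|_{\phi_\varepsilon(q)}) = g'_{\mu\nu}(x|_q)$, it turns out that $\tilde g_{ab}(\varepsilon) = g'_{\mu\nu}(x)\,
(dx^\mu)_a (dx^\nu)_b$. Then, precisely to the order of $\varepsilon$, we have

$$\mathcal{L}_\xi g_{ab} = \varepsilon\,\mathcal{L}_X g_{ab} = \tilde g_{ab}(\varepsilon) - g_{ab} = [g'_{\mu\nu}(x) - g_{\mu\nu}(x)]\,(dx^\mu)_a (dx^\nu)_b\,.$$

This means that the coordinate components of $\mathcal{L}_\xi g_{ab}$ in $\{x^\mu\}$ are actually $g'_{\mu\nu}(x) - g_{\mu\nu}(x)$,
not the $g'_{\mu\nu}(x') - g_{\mu\nu}(x)$ in (7.8.21). However, their difference

\begin{align*}
[g'_{\mu\nu}(x') - g_{\mu\nu}(x)] - [g'_{\mu\nu}(x) - g_{\mu\nu}(x)] &= g'_{\mu\nu}(x') - g'_{\mu\nu}(x)\\
 &= -\frac{\partial g'_{\mu\nu}}{\partial x^\rho}\,\xi^\rho + O(\varepsilon^2) = -\xi^\rho\,\frac{\partial\gamma'_{\mu\nu}}{\partial x^\rho} + O(\varepsilon^2) = O(\varepsilon^2)
\end{align*}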
is negligible on the order of ξ a . Therefore, the gauge transformation (7.8.14) and the infinites-
imal coordinate transformation (7.8.19) are equivalent up to terms of higher order than ξ a .
In fact, what we just saw is a special case of gauge transformations in general relativity. We
will come back to a more general discussion in Sect. 8.10.
[The End of Optional Reading 7.8.2]

7.8.2 The Newtonian Limit

In this subsection we will show that Newton’s theory of gravity can be regarded as
the limit of general relativity under the weak-field and low-speed condition. First,
let us give an interpretation for the “weak-field and low-speed condition”. Taking the
gravitational field around the Earth as an example, it corresponds to a slightly curved
metric field $g_{ab} = \eta_{ab} + \gamma_{ab}$, where $\gamma_{ab}$ is “small”. In Fig. 7.14, $E$ and $D$ represent,
respectively, the world lines of the Earth and a shell shot from a cannon on the ground
(their relative speed $u_{ED} \ll 1$), and $\mu$ represents the world line of a “high speed”
muon from a cosmic ray. Its “high speed” is from the perspective of an observer
on the Earth; the muon regards itself as being at rest while $E$ is moving at a high
speed. Either way, their relative speed is close to the speed of light ($u_{\mu E} \cong 1$). As a
flat metric field, $\eta_{ab}$ has many inertial coordinate systems, such as the inertial frame
$\{t, x^i\}$ which uses the world line of $E$ as a $t$-coordinate line, and the inertial frame
$\{t', x'^i\}$ which uses the world line of $\mu$ as a $t'$-coordinate line; these two systems
differ by a boost. The 3-speeds of the Earth, the shell as well as cars, airplanes, etc.
relative to the system $\{t, x^i\}$ are all very small, while the 3-speeds of them relative
to $\{t', x'^i\}$ are large. The “weak-field and low-speed limit” should be interpreted as
follows: there exists an inertial coordinate system of $\eta_{ab}$ (in the example above it
is $\{t, x^i\}$), in which all the objects we are concerned with have coordinate speeds
much less than 1 (and thus in $\{t, x^i\}$ one cannot use the Newtonian theory to discuss
a problem that involves a muon), and $|\gamma_{\mu\nu}| \equiv |g_{\mu\nu} - \eta_{\mu\nu}| \ll 1$.

Fig. 7.14 The world lines of the Earth $E$, the shell $D$ and the muon $\mu$

Specifically speaking, the “weak-field and low-speed” condition guarantees that
there exists an ηab such that γab = gab − ηab is “small”, and there exists an inertial
coordinate system {t, x i } of ηab which satisfies:
(1) The energy-momentum tensor Tab of the source of the gravitational field can
be expressed in this system as:

$$T_{ab} \cong \rho\,(dt)_a (dt)_b\,. \tag{7.8.22}$$

That is, only $T_{00}$, the time-time component of $T_{ab}$, is nonvanishing in this system.
The space-time components $T_{0i}$ vanish since the small velocity of the source leads
to a small momentum density; the vanishing of the space-space components $T_{ij}$ indicates
that, compared with the mass density, the 3-dimensional stress can be ignored (for
instance, the pressure $p$ at the Earth’s center is only $10^{-10}$ times the density $\rho$). Thus,
although in general relativity each component of the energy-momentum tensor $T_{ab}$ of
the matter field contributes to the spacetime curvature, in Newton’s theory of gravity
(as is known to all) only the mass density $\rho$ contributes to the gravitational field.
(2) (a) The spacetime geometry changes slowly due to the low-speed motion of
the source, and hence $\partial\bar\gamma_{\mu\nu}/\partial t$ can be ignored; (b) the low-speed motion of an object
in the gravitational field leads to the fact that its 4-velocity $U^a$ is approximately equal
to the 4-velocity $Z^a \equiv (\partial/\partial t)^a$ of an observer in the $\{t, x^i\}$ system, i.e., $U^a \cong Z^a$.
The linearized Einstein equation under the Lorenz gauge condition can be sim-
plified under the above approximations:

components of the l.h.s. of (7.8.16) = ∂ σ ∂σ γ̄μν = ∂ 0 ∂0 γ̄μν + ∂ i ∂i γ̄μν ∼


= ∂ i ∂i γ̄μν = ∇ 2 γ̄μν ,
294 7 Foundations of General Relativity

where we used the approximation condition (2) in the third equality, and ∇ 2 is the
 in the 3-dimensional coordinate system {x i }. On
square of the derivative operator ∇
the other hand, from the approximate condition (1) we can see that the components of
the right-hand side of (7.8.16) ∼
= −16πρ when μ = ν = 0, and the other components
vanish, i.e.,

\[
\nabla^2\bar\gamma_{00} = -16\pi\rho , \tag{7.8.23}
\]

\[
\nabla^2\bar\gamma_{0i} = 0 , \tag{7.8.24}
\]

\[
\nabla^2\bar\gamma_{ij} = 0 . \tag{7.8.24$'$}
\]

The unique solutions $\bar\gamma_{0i}$ and $\bar\gamma_{ij}$ for equations (7.8.24) and (7.8.24$'$) that are well-
behaved at infinity are constants, which can be set to zero by means of a gauge
transformation. Thus, the only nonzero component of $\bar\gamma_{\mu\nu}$ is $\bar\gamma_{00}$, which satisfies
equation (7.8.23). Let

\[
\phi \equiv -\frac{1}{4}\bar\gamma_{00} , \tag{7.8.25}
\]

and interpret $\phi$ as the Newtonian gravitational potential; then equation (7.8.23)
becomes the well-known Poisson equation in Newton’s theory of gravity:

\[
\nabla^2\phi = 4\pi\rho . \tag{7.8.26}
\]

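The conclusion that the only nonzero component of $\bar\gamma_{\mu\nu}$ is $\bar\gamma_{00}$ can also be expressed
in terms of a tensor equation as

\[
\bar\gamma_{ab} = \bar\gamma_{00}(\mathrm{d}t)_a(\mathrm{d}t)_b = -4\phi\,(\mathrm{d}t)_a(\mathrm{d}t)_b . \tag{7.8.27}
\]

Hence,

\[
\bar\gamma \equiv \eta^{ab}\bar\gamma_{ab} = \bar\gamma_{00}\,\eta^{ab}(\mathrm{d}t)_a(\mathrm{d}t)_b = -\bar\gamma_{00} = 4\phi . \tag{7.8.28}
\]

Also, from $\gamma_{ab} = \bar\gamma_{ab} + \eta_{ab}\gamma/2$ we get $\gamma = \eta^{ab}\gamma_{ab} = \eta^{ab}\bar\gamma_{ab} + \eta^{ab}\eta_{ab}\gamma/2 = \bar\gamma + 2\gamma$,
and thus $\gamma = -\bar\gamma$. Therefore,

\[
\gamma_{ab} = \bar\gamma_{ab} - \frac{1}{2}\eta_{ab}\bar\gamma . \tag{7.8.29}
\]

By means of (7.8.27) and (7.8.28), the above equation can be rewritten as

\[
\gamma_{ab} = -\phi\,[4(\mathrm{d}t)_a(\mathrm{d}t)_b + 2\eta_{ab}] . \tag{7.8.30}
\]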
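The chain (7.8.27) to (7.8.30) can be checked component by component. The following is a minimal sketch in plain Python, assuming the signature $(-,+,+,+)$ used in the text; the sample value of `phi` and the helper names `trace` and `trace_reverse` are illustrative choices, not notation from the book.

```python
# Trace reversal in the Newtonian limit, checked component by component.
# Assumption: signature (-,+,+,+); phi is an arbitrary small sample value.

ETA = [[-1.0 if i == j == 0 else (1.0 if i == j else 0.0) for j in range(4)]
       for i in range(4)]  # eta_{mu nu}; numerically equal to its inverse


def trace(h):
    """eta^{mu nu} h_{mu nu} for a 4x4 nested list h."""
    return sum(ETA[m][n] * h[m][n] for m in range(4) for n in range(4))


def trace_reverse(h):
    """h_{mu nu} - (1/2) eta_{mu nu} eta^{ab} h_{ab}."""
    tr = trace(h)
    return [[h[m][n] - 0.5 * ETA[m][n] * tr for n in range(4)] for m in range(4)]


phi = -1.0e-6                       # sample weak potential (hypothetical value)
gamma_bar = [[0.0] * 4 for _ in range(4)]
gamma_bar[0][0] = -4.0 * phi        # (7.8.27): only the 00 component is nonzero

gamma = trace_reverse(gamma_bar)    # (7.8.29)

print("trace of gamma_bar:", trace(gamma_bar))      # compare with 4*phi, (7.8.28)
print("diagonal of gamma :", [gamma[m][m] for m in range(4)])  # compare with -2*phi
```

In this sketch every diagonal component of `gamma` equals $-2\phi$, which is what (7.8.30) gives for both the time-time and the space-space components.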

Based on the discussion above, we can derive the equation of motion for a point mass
under the Newtonian approximation. Suppose there is no force acting on the point
mass other than gravity; then from the viewpoint of general relativity its world line
should be a geodesic, whose equation in the inertial coordinate system of $\eta_{ab}$ is

\[
\frac{\mathrm{d}^2 x^\mu}{\mathrm{d}\tau^2} + \Gamma^\mu{}_{\nu\sigma}\frac{\mathrm{d}x^\nu}{\mathrm{d}\tau}\frac{\mathrm{d}x^\sigma}{\mathrm{d}\tau} = 0 , \tag{7.8.31}
\]
where $\tau$ is the proper time of the point mass. Under the Newtonian approximation, the
condition $U^a \cong Z^a$ satisfied by the 4-velocity $U^a$ of the point mass assures that $\tau \cong t$
(the proper time is approximately equal to the coordinate time) and $u^i \equiv \mathrm{d}x^i/\mathrm{d}t \cong 0$
(the 3-velocity is approximately zero), and hence $U^\nu \equiv \mathrm{d}x^\nu/\mathrm{d}\tau \cong \mathrm{d}x^\nu/\mathrm{d}t$ is approximately
$(1, 0, 0, 0)$. Therefore, (7.8.31) can be expressed approximately as

\[
\frac{\mathrm{d}^2 x^\mu}{\mathrm{d}t^2} = -\Gamma^\mu{}_{00} . \tag{7.8.32}
\]
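It follows from (7.8.4) that [the superscript (1) of $\Gamma$ is omitted]

\[
\begin{aligned}
\Gamma^0{}_{00} &= \frac{1}{2}\eta^{00}(\gamma_{00,0} + \gamma_{00,0} - \gamma_{00,0}) = -\frac{1}{2}\frac{\partial\gamma_{00}}{\partial t} \cong 0 , \\
\Gamma^i{}_{00} &= \frac{1}{2}\eta^{ij}(\gamma_{j0,0} + \gamma_{0j,0} - \gamma_{00,j}) \cong -\frac{1}{2}\delta^{ij}\gamma_{00,j} = -\frac{1}{2}\frac{\partial\gamma_{00}}{\partial x^i} , \quad i = 1, 2, 3 ,
\end{aligned}
\tag{7.8.33}
\]

where the second equality is due to $\gamma_{j0} = \bar\gamma_{j0} + \frac{1}{2}\gamma\,\eta_{j0} = 0$. Hence, (7.8.32) gives an
identity when $\mu = 0$, and gives $\dfrac{\mathrm{d}^2 x^i}{\mathrm{d}t^2} = \dfrac{1}{2}\dfrac{\partial\gamma_{00}}{\partial x^i}$ ($i = 1, 2, 3$) when $\mu = i$. Then, from
(7.8.29) and (7.8.28) we have $\gamma_{00} = \bar\gamma_{00}/2 = -2\phi$. Plugging this into the equation
above, and noticing (7.8.25), we obtain $\mathrm{d}^2 x^i/\mathrm{d}t^2 = -\partial\phi/\partial x^i$. Since $\mathrm{d}^2 x^i/\mathrm{d}t^2$ is
the $i$th component of the 3-acceleration $\vec a$ of the point mass relative to the inertial
coordinate system $\{t, x^i\}$, the above equation can be expressed in terms of an equality
of 3-vectors as

\[
\vec a = -\vec\nabla\phi . \tag{7.8.34}
\]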

This is exactly the equation of motion for a point mass that only undergoes gravi-
tational force in Newton’s theory of gravity. Equations (7.8.26) and (7.8.34) are the
basic equations of Newton’s theory of gravity; thus, Newton’s theory of gravity can
be regarded as the weak-field and low-speed limit of general relativity. From

\[
\phi \equiv -\frac{1}{4}\bar\gamma_{00} = -\frac{1}{2}\gamma_{00} \tag{7.8.35}
\]

we have $g_{00} = \eta_{00} + \gamma_{00} = -(1 + 2\phi)$, or,

\[
\phi = -\frac{1}{2}(1 + g_{00}) . \tag{7.8.36}
\]
This reflects the close relation between the metric component $g_{00}$ and the Newtonian
gravitational potential under the Newtonian approximation. Again, take the balls 1
and 2 in Fig. 7.10 as an example. Choose the inertial coordinate system $\{t, x, y, z\}$
of $\eta_{ab}$ such that the $z$-axis is vertically upwards, then $Z^a = (\partial/\partial t)^a$ in (7.6.8).
Noticing (7.6.7), one can rewrite (7.6.8) as $\tilde a^c = -R_{0b0}{}^{c}\lambda^b$, whose $z$-component
is $\tilde a^z = -R_{0z0}{}^{z}\lambda^z$ ($z$ is not summed over). In the Newtonian approximation, the
derivative with respect to $t$ can be ignored, and hence it follows from (7.8.5) that
$R_{0z0}{}^{z} = -\frac{1}{2}\partial^2\gamma_{00}/\partial z^2 = \partial^2\phi/\partial z^2 = \mathrm{d}^2\phi/\mathrm{d}r^2$, and therefore $\tilde a^z = -(\mathrm{d}^2\phi/\mathrm{d}r^2)\lambda^z$,
which is in agreement with (7.6.2). This is a verification of the following statement:
the tidal acceleration in general relativity determined by the curvature tensor
according to (7.6.8) will return to the tidal acceleration in the Newtonian mechanics
determined by (7.6.2) under the weak-field and low-speed approximation.
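As a numerical illustration of (7.8.32) to (7.8.36) and of the tidal statement above, the following sketch takes the potential of a point mass, $\phi = -M/r$ (an assumed example, in units with $G = c = 1$), sets $\gamma_{00} = -2\phi$, and evaluates $\Gamma^z{}_{00}$ and $-\frac{1}{2}\partial^2\gamma_{00}/\partial z^2$ by central differences at a point on the $z$-axis. The mass `M`, the radius `r` and the step sizes are arbitrary sample values.

```python
import math

M = 1.0e-3          # source mass in geometrized units (hypothetical sample value)


def phi(x, y, z):
    """Newtonian potential of a point mass at the origin."""
    return -M / math.sqrt(x * x + y * y + z * z)


def gamma00(x, y, z):
    """Metric perturbation component gamma_00 = -2 phi, from (7.8.35)."""
    return -2.0 * phi(x, y, z)


def d_dz(f, x, y, z, h=1e-4):
    """Central-difference first derivative along z."""
    return (f(x, y, z + h) - f(x, y, z - h)) / (2.0 * h)


def d2_dz2(f, x, y, z, h=1e-3):
    """Central-difference second derivative along z."""
    return (f(x, y, z + h) - 2.0 * f(x, y, z) + f(x, y, z - h)) / (h * h)


r = 10.0                                              # point on the z-axis, so z = r
christoffel_z00 = -0.5 * d_dz(gamma00, 0.0, 0.0, r)   # Gamma^z_00 from (7.8.33)
accel_z = -christoffel_z00                            # geodesic equation (7.8.32)
riemann_0z0z = -0.5 * d2_dz2(gamma00, 0.0, 0.0, r)    # static limit used in the text
tidal_coeff = -riemann_0z0z                           # tidal acceleration per unit lambda^z

print("geodesic acceleration:", accel_z, " Newtonian -dphi/dz:", -M / r**2)
print("tidal coefficient    :", tidal_coeff, " Newtonian -d2phi/dr2:", 2.0 * M / r**3)
```

Both pairs of numbers agree to within the finite-difference error: the acceleration points toward the source, and the positive tidal coefficient $2M/r^3$ describes radial stretching.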

7.9 Gravitational Radiation

The resemblance between the gravitational field and the electromagnetic field makes
people expect that there exists gravitational radiation in general relativity similar to
the electromagnetic radiation. Actually, the fact that there exists a wave solution
to Einstein’s equation which propagates at the speed of light was already well-
known soon after general relativity was published. Nevertheless, for quite a while the
authenticity of gravitational waves was in doubt. A. S. Eddington suggested in 1922
that a gravitational wave solution only represents the wave motion of the spacetime
coordinates, and thus has no observational effect. The situation has turned around
since the 1950s. Using a coordinate independent method, H. Bondi and collaborators
showed that gravitational waves indeed carry energy and momentum, and the mass of
the system must decrease when it emits gravitational waves. This led to the physical
authenticity and observability of gravitational radiation being gradually accepted.

7.9.1 Gauge Conditions of the Linearized Theory of Gravity

First we will discuss the gravitational waves under the approximation of linearized
gravity. Before introducing the wave solutions to the linearized Einstein equation,
let us first discuss some useful gauge conditions in the linearized theory of gravity.
As we have seen in Sect. 7.8.1, the Lorenz gauge condition

\[
\partial^b\bar\gamma_{ab} = \partial^b\gamma_{ab} - \frac{1}{2}\partial_a\gamma = 0 \tag{7.9.1}
\]

in linearized gravity is inspired by the Lorenz gauge condition $\partial^a A_a = 0$ of the
electromagnetic field. However, in electrodynamics, $\partial^a A_a = 0$ and the wave equation
(with source) $\partial^b\partial_b A_a = -4\pi J_a$ cannot determine the 4-potential $A_a$ completely,
since another 4-potential

\[
A'_a = A_a + \partial_a\chi \tag{7.9.2}
\]

also satisfies $\partial^a A'_a = 0$ and $\partial^a\partial_a A'_b = -4\pi J_b$ as long as

\[
\partial^a\partial_a\chi = 0 . \tag{7.9.3}
\]

In a source-free ($J_a = 0$) region, it can be proved that there exists a function $\chi$
satisfying the above condition such that $A'_0 = 0$. The condition $A'_0 = 0$ together
with the Lorenz gauge condition $\partial^a A'_a = 0$ is called the radiation gauge condition.
In order to find such a $\chi$, we first find a function $\chi'$ satisfying

\[
\frac{\partial\chi'}{\partial t} = -A_0 . \tag{7.9.4}
\]

As long as $A_0$ is a $C^2$ function, there is no problem for the existence of $\chi'$. Then, in
a source-free region we have

\[
\frac{\partial}{\partial t}(\partial^a\partial_a\chi') = \partial^a\partial_a\frac{\partial\chi'}{\partial t} = -\partial^a\partial_a A_0 = 0 ,
\]

which indicates that $\partial^a\partial_a\chi'$ is time-independent. Then, we can find a time-
independent function $\chi_0$ that satisfies

\[
\nabla^2\chi_0 = \partial^a\partial_a\chi' . \tag{7.9.5}
\]

It is easy to verify that $\chi = \chi' - \chi_0$ is the function $\chi$ that makes $A'_a = A_a + \partial_a\chi$
satisfy the radiation gauge condition. To put it more precisely, we have the following
proposition:
Proposition 7.9.1 Let $U$ be a non-empty open subset of a spacetime, on which a
flat Lorentzian metric $\eta_{ab}$ is defined, and let $\{x^\mu\}$ ($x^0 \equiv t$) be a coordinate system
where the components of $\eta_{ab}$ are $\eta_{\mu\nu}$.$^{18}$ Suppose $A_0$ is a $C^2$ function on $U$ satisfying
$\partial^a\partial_a A_0 = 0$. Then, $\forall p \in U$ there exists a $C^3$ function $\chi$ on $U$ satisfying $\partial^a\partial_a\chi = 0$
and $\partial\chi/\partial t = -A_0$ in an open neighborhood $U' \subset U$ of $p$. Moreover, if $A_0$ is smooth,
then there exists a smooth $\chi$ on $U$ satisfying $\partial^a\partial_a\chi = 0$ and $\partial\chi/\partial t = -A_0$ on $U'$.
Proof [Optional Reading]
According to the discussion above, all we have to prove is that there exists a $\chi_0$ satisfying
(7.9.5) in some neighborhood $U'$ of $p$. The Lorentzian system $\{x^\mu\}$ defines a chart $(U, \psi)$
of $M$. Then one can choose a neighborhood $U' \subset U$ of $p$ such that $\psi[U'] = I \times \Sigma' \subset \mathbb{R}^4$,
where $I$ is an open interval and $\Sigma' \subset \mathbb{R}^3$ is enclosed by a closed piecewise smooth surface.
In $\mathbb{R}^3$, one can choose an open ball $\Sigma$ satisfying $\overline{\Sigma'} \subset \Sigma$. Then, one can construct a time-
independent $C^1$ (or smooth if so is $A_0$) function $\rho$ on $\mathbb{R}^4$ as follows:

\[
\rho(\vec x) =
\begin{cases}
(\partial^a\partial_a\chi')|_q , & \text{if } (t, \vec x) = \psi(q) \in \psi[U'] , \\
0 , & \text{if } (t, \vec x) \in \mathbb{R}^4 - \psi^{-1}[\mathbb{R} \times \Sigma] .
\end{cases}
\]

Then, there exists the following integral:

\[
\phi(\vec x) = -\frac{1}{4\pi}\int_{\mathbb{R}^3}\frac{\rho(\vec x')}{|\vec x - \vec x'|}\,\mathrm{d}x'^1\wedge\mathrm{d}x'^2\wedge\mathrm{d}x'^3
= -\frac{1}{4\pi}\int_{\Sigma}\frac{\rho(\vec x')}{|\vec x - \vec x'|}\,\mathrm{d}x'^1\wedge\mathrm{d}x'^2\wedge\mathrm{d}x'^3 ,
\]

which satisfies Poisson’s equation $\nabla^2\phi = \rho$ (this is a well-known result in electrostatics). It
is easy to see that $\phi$ is $C^3$ if $A_0$ is $C^2$ and $\phi$ is smooth if $A_0$ is smooth. Now define $\chi_0 = \psi^*\phi$,
then $\nabla^2\chi_0 = \psi^*\rho$, which equals $\partial_a\partial^a\chi'$ when restricted to $U'$. □

18 Here we only require $\eta_{ab}$ to be flat on $U$, and the same for Proposition 7.9.2 (see the first footnote
in Sect. 7.8). It follows from Theorem 3.4.9 that for any (locally) flat metric there exists a coordinate
system such that the metric components are constant. For Lorentzian signature, one can further find
a coordinate transformation and turn them into $\eta_{\mu\nu}$.

The situation of linearized gravity is very similar: the linearized Einstein equation
and the Lorenz gauge condition $\partial^a\bar\gamma_{ab} = 0$ cannot determine $\gamma_{ab}$ completely, since
if we set

\[
\gamma'_{ab} = \gamma_{ab} + \partial_a\xi_b + \partial_b\xi_a , \tag{7.9.6}
\]

then $\gamma'_{ab}$ also satisfies (7.8.16) and $\partial^a\bar\gamma'_{ab} = 0$ as long as $\xi_a$ satisfies

\[
\partial^b\partial_b\xi_a = 0 . \tag{7.9.7}
\]

(The existence of such a $\xi^a$ will be proved in Optional Reading 7.9.1). In a source-free
region, one can further set $\gamma' = 0$ and $\gamma'_{0i} = 0$ ($i = 1, 2, 3$). Together with the Lorenz
gauge condition, this is called the radiation gauge condition of the linearized theory
of gravity. Furthermore, as we will show below, one can also set $\gamma_{00} = 0$, and the
gauge condition becomes

\[
\partial^b\bar\gamma_{ab} = 0 , \quad \gamma = 0 , \quad \gamma_{0\nu} = 0 , \quad \nu = 0, 1, 2, 3 , \tag{7.9.8}
\]

called the transverse-traceless gauge condition, or TT gauge condition for short.


Proposition 7.9.2 Let $U$ be a non-empty open subset of a spacetime, on which a flat
Lorentzian metric $\eta_{ab}$ is defined, and let $\{x^\mu\}$ ($x^0 \equiv t$) be a coordinate system where
the components of $\eta_{ab}$ are $\eta_{\mu\nu}$. Suppose $\gamma_{ab}$ is a smooth symmetric tensor field which
satisfies on $U$ the Lorenz gauge condition $\partial^a\bar\gamma_{ab} = 0$ and

\[
\partial^c\partial_c\gamma = 0 , \quad \partial^c\partial_c\gamma_{0\nu} = 0 , \quad \nu = 0, 1, 2, 3 . \tag{7.9.9}
\]

Then, $\forall p \in U$ there exists a smooth vector field $\xi^a$ on $U$ and an open neighborhood
$U' \subset U$ of $p$ such that $\gamma'_{ab} = \gamma_{ab} + \partial_a\xi_b + \partial_b\xi_a$ satisfies the transverse-traceless
gauge condition on $U'$.

Proof See Optional Reading 7.9.1. □

Note that in the above proposition, $\gamma_{ab}$ is not necessarily a solution to the linearized
Einstein equation. Now we consider $\gamma_{ab}$ as a solution to the source-free linearized
Einstein equation in the Lorenz gauge, then (7.8.16) with $T_{ab} = 0$ is reduced to

\[
\partial^c\partial_c\gamma_{ab} - \frac{1}{2}\eta_{ab}\,\partial^c\partial_c\gamma = 0 . \tag{7.9.10}
\]

Contracting both sides of the above equation with $\eta^{ab}$ yields $\partial^c\partial_c\gamma = 0$, and (7.9.10)
becomes

\[
\partial^c\partial_c\gamma_{ab} = 0 . \tag{7.9.11}
\]

In this case, the conditions in (7.9.9) are both satisfied. In fact, it is obvious that
(7.9.10) and (7.9.11) are equivalent to each other, since if one is satisfied, so is the
other. From the Lorenz gauge condition (7.9.1) we also see that $\partial^a\partial^b\gamma_{ab} = \frac{1}{2}\partial^a\partial_a\gamma$,
and hence $\partial^c\partial_c\gamma = 0$ also leads to

\[
\partial^a\partial^b\gamma_{ab} = 0 . \tag{7.9.12}
\]

As we have discussed above, given a solution $\gamma_{ab}$ of the source-free linearized
Einstein equation satisfying the Lorenz gauge condition, $\gamma'_{ab} = \gamma_{ab} + \partial_a\xi_b + \partial_b\xi_a$ is
automatically a solution of the same equation as long as it also satisfies the Lorenz
gauge condition ($\partial^b\partial_b\xi_a = 0$). Applying this to Proposition 7.9.2, we have the
following conclusion:

Corollary 7.9.3 Suppose a smooth symmetric tensor field $\gamma_{ab}$ is a solution of the
source-free linearized Einstein equation satisfying the Lorenz gauge condition. Then,
for each point $p$ in the domain $U$ of $\gamma_{ab}$, there exists $\gamma'_{ab} = \gamma_{ab} + \partial_a\xi_b + \partial_b\xi_a$ in an
open neighborhood $U' \subset U$ of $p$, which is a solution of the source-free linearized
Einstein equation satisfying the transverse-traceless gauge condition.

Now let us count the degrees of freedom of $\gamma_{ab}$ in the TT gauge. It follows from
$\gamma_{\mu\nu} = \gamma_{\nu\mu}$ that $\gamma_{ab}$ has at most 10 independent components, while they are also
constrained by (7.9.8). The conditions in (7.9.8) contain in total $4 + 4 + 1 = 9$ equations,
but $\partial^\nu\gamma_{0\nu} = 0$ is also an outcome of $\gamma_{0\nu} = 0$, and so among these 9 equations only
8 are independent. Therefore, $\gamma_{ab}$ has only $10 - 8 = 2$ independent components.$^{19}$
Later we will see that in physics they correspond to the two independent polarization
states (modes) of gravitational plane waves, see Sect. 7.9.2.
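The counting can be made concrete in the algebraic setting of the plane-wave ansatz $\gamma_{ab} = f(K_\mu x^\mu)H_{ab}$ treated in Sect. 7.9.2, where the conditions (7.9.8) reduce to the linear equations $K^b H_{ab} = 0$, $\eta^{ab}H_{ab} = 0$ and $H_{0\nu} = 0$ on the 10 components of a constant symmetric $H_{ab}$. The sketch below is an illustration under stated assumptions: it takes a null $K^a$ along $z$ with $\omega = k = 1$ and computes the rank of these 9 equations exactly over the rationals; the helper names (`row`, `rank`, `col`) are hypothetical.

```python
from fractions import Fraction
from itertools import combinations_with_replacement

# The 10 independent components H_{mu nu} with mu <= nu are the unknowns.
PAIRS = list(combinations_with_replacement(range(4), 2))
INDEX = {p: i for i, p in enumerate(PAIRS)}
ETA = [-1, 1, 1, 1]     # diagonal of eta (equal to its inverse)
K_UP = [1, 0, 0, 1]     # K^mu for a null wave vector along z (assumed omega = k = 1)


def col(m, n):
    """Column of the unknown H_{mn}, using the symmetry H_{mn} = H_{nm}."""
    return INDEX[(min(m, n), max(m, n))]


def row(terms):
    """Build one linear constraint from (coefficient, m, n) triples."""
    r = [Fraction(0)] * len(PAIRS)
    for c, m, n in terms:
        r[col(m, n)] += c
    return r


constraints = []
for a in range(4):                                           # K^b H_{ab} = 0
    constraints.append(row([(K_UP[b], a, b) for b in range(4)]))
constraints.append(row([(ETA[m], m, m) for m in range(4)]))  # eta^{ab} H_{ab} = 0
for n in range(4):                                           # H_{0 nu} = 0
    constraints.append(row([(1, 0, n)]))


def rank(rows):
    """Rank by exact Gaussian elimination over the rationals."""
    rows = [r[:] for r in rows]
    rk = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(rk, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for i in range(len(rows)):
            if i != rk and rows[i][c] != 0:
                factor = rows[i][c] / rows[rk][c]
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[rk])]
        rk += 1
    return rk


n_equations = len(constraints)
n_independent = rank(constraints)
n_free = len(PAIRS) - n_independent
print(n_equations, n_independent, n_free)   # prints: 9 8 2
```

The rank computation confirms that one of the 9 equations is redundant, leaving 2 free components, in line with the argument above. It does not address the subtleties of counting constraints for differential equations.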
19 Note that this is a handwaving discussion, since the constraint counting is actually very subtle
when it comes to partial differential equations. For example, the second equation in (7.9.15) can be
regarded as a constraint for $\xi_0$ in the first equation, but it does not mean that $\xi_0$ has no degree of
freedom! For another example, the 1-dimensional wave equation $\partial_t^2 u - c^2\partial_x^2 u = 0$ has the general
solution $u = f_+(x - ct) + f_-(x + ct)$, with $f_\pm$ being arbitrary $C^2$ functions of one variable. If
the wave equation is considered to be a constraint, is the number of constraints 1 or $-1$?

For the linearized Einstein equation (not necessarily source-free), there are also
some other common gauge conditions, such as the transverse gauge condition,
which requires

\[
\partial_i\gamma^{0i} = 0 , \quad \partial_i s^{ij} = 0 \quad \Bigl(\text{where } s_{ij} = \gamma_{ij} - \frac{1}{3}\delta^{kl}\gamma_{kl}\,\delta_{ij}\Bigr) , \quad i, j = 1, 2, 3 , \tag{7.9.13}
\]

and the synchronous gauge condition, which requires

\[
\gamma_{0\mu} = 0 , \quad \mu = 0, 1, 2, 3 . \tag{7.9.14}
\]

The reader may refer to Carroll (2019) for more discussions about these gauge
conditions.
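[Optional Reading 7.9.1]
Proof of Proposition 7.9.2 (1) According to Proposition 7.9.1, for any $p \in U$ there exists a
function $\xi_0$ on $U$ such that

\[
\partial_c\partial^c\xi_0 = 0 , \quad \frac{\partial\xi_0}{\partial t} = -\frac{1}{2}\gamma_{00} \tag{7.9.15}
\]

are both satisfied on a neighborhood $U_0$ of $p$.
(2) For each of $i = 1, 2, 3$, there is obviously a smooth function $\xi'_i$ on $U$ satisfying

\[
\frac{\partial\xi'_i}{\partial t} = -\gamma_{0i} - \frac{\partial\xi_0}{\partial x^i} . \tag{7.9.16}
\]

Then, using (7.9.16) and the second equation of (7.9.15) we can derive that

\[
\partial_c\partial^c\xi'_i = -\frac{\partial^2\xi'_i}{\partial t^2} + \nabla^2\xi'_i
= \frac{\partial\gamma_{0i}}{\partial t} + \frac{\partial}{\partial x^i}\frac{\partial\xi_0}{\partial t} + \nabla^2\xi'_i
= \frac{\partial\gamma_{0i}}{\partial t} - \frac{1}{2}\frac{\partial\gamma_{00}}{\partial x^i} + \nabla^2\xi'_i . \tag{7.9.17}
\]

On the other hand, from (7.9.16) we also have on $U_0$ that

\[
\frac{\partial}{\partial t}\,\partial_c\partial^c\xi'_i = \partial_c\partial^c\frac{\partial\xi'_i}{\partial t}
= -\partial_c\partial^c\gamma_{0i} - \frac{\partial}{\partial x^i}\,\partial_c\partial^c\xi_0 = 0 ,
\]

where (7.9.9) and the first equation in (7.9.15) are used in the last step. Thus, the right side
of (7.9.17) is independent of $t$ on $U_0$, and thus there exist smooth functions $X'_i$ ($i = 1, 2, 3$)
that satisfy on $U_0$

\[
\frac{\partial X'_i}{\partial t} = 0 , \quad \nabla^2 X'_i = -\frac{\partial\gamma_{0i}}{\partial t} + \frac{1}{2}\frac{\partial\gamma_{00}}{\partial x^i} - \nabla^2\xi'_i . \tag{7.9.18}
\]

Combining (7.9.16), (7.9.17) and (7.9.18), we can see that each function $\xi'_i + X'_i$ on $U_0$
satisfies

\[
\frac{\partial(\xi'_i + X'_i)}{\partial t} + \frac{\partial\xi_0}{\partial x^i} = -\gamma_{0i} , \quad \partial_c\partial^c(\xi'_i + X'_i) = 0 . \tag{7.9.19}
\]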

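(3) For convenience, denote $\vec\xi' \equiv (\xi'_1, \xi'_2, \xi'_3)$ and $\vec X' \equiv (X'_1, X'_2, X'_3)$. For example, the
notation $\vec\nabla\cdot\vec\xi'$ can be regarded as an abbreviation of $\delta^{ij}\,\partial\xi'_i/\partial x^j$. From the first equation in (7.9.19),
we obtain on $U_0$

\[
\vec\nabla\cdot\frac{\partial\vec\xi'}{\partial t} + \vec\nabla\cdot\frac{\partial\vec X'}{\partial t} + \nabla^2\xi_0 = -\delta^{ij}\frac{\partial\gamma_{0j}}{\partial x^i} . \tag{7.9.20}
\]

Then, one can find on $U_0$ that

\[
\frac{\partial}{\partial t}\Bigl[-\frac{1}{2}(\gamma_{00} + \gamma) - \vec\nabla\cdot\vec\xi' - \vec\nabla\cdot\vec X'\Bigr] = 0 .
\]

[The reader should complete the proof. Hint: use (7.9.20), (7.9.1) and (7.9.15)]. Thus,
$-\frac{1}{2}(\gamma_{00} + \gamma) - \vec\nabla\cdot\vec\xi' - \vec\nabla\cdot\vec X'$ is independent of $t$ when restricted to $U_0$. This allows us to
find a function $\phi$ defined on an open neighborhood $U_\phi \subset U_0$ of $p$ such that

\[
\frac{\partial\phi}{\partial t} = 0 , \quad \nabla^2\phi = \frac{1}{2}(\gamma_{00} + \gamma) + \vec\nabla\cdot\vec\xi' + \vec\nabla\cdot\vec X' . \tag{7.9.21}
\]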

(4) Applying $\nabla^2$ on both sides of the second equation in (7.9.21), one finds on $U_\phi$ that

\[
\nabla^2\nabla^2\phi = 0 . \tag{7.9.22}
\]

[The reader should complete the proof. Hint: use (7.9.18), (7.9.1) and (7.9.9)]. This is
equivalent to saying that $\vec\nabla\cdot\vec\nabla\nabla^2\phi = 0$, namely the 3-vector field $\vec\nabla\nabla^2\phi$ is divergence-free.
Thus, there exists an open neighborhood $U'_\phi \subset U_\phi$ of $p$ diffeomorphic to $\mathbb{R}^4$ such that
$\vec\nabla\nabla^2\phi = \vec\nabla\times\vec Y$ is satisfied on $U'_\phi$ for some 3-vector field $\vec Y$ defined on $U$. Since $\phi$ does
not depend on $t$ when restricted on $U_\phi$, we can require that $\vec Y$ does not depend on $t$ on $U'_\phi$.
Thus, there exists a 3-vector field $\vec X$ on $U$ which is independent of $t$ such that $\nabla^2\vec X = \vec Y$ is
satisfied on an open neighborhood $U' \subset U'_\phi$ of $p$. Then, we have on $U'$ that

\[
\nabla^2(\vec\nabla\times\vec X - \vec\nabla\phi) = 0 . \tag{7.9.23}
\]

(5) So far we have introduced a series of functions and 3-vector fields, whose domains
are open neighborhoods of $p$. Since we do not care about their behaviors outside these
neighborhoods, they can be extended arbitrarily to smooth functions or 3-vector fields on $U$.
Thus, from now on, all concerned functions and 3-vector fields are defined on $U$, while the
equations they satisfy are valid on $U' \subset U$.
Now we define a 3-vector field $\vec\xi = (\xi_1, \xi_2, \xi_3)$ on $U$ as follows:

\[
\vec\xi = \vec\xi' + \vec X' - \vec\nabla\phi + \vec\nabla\times\vec X . \tag{7.9.24}
\]

When restricted to $U' \subset U'_\phi \subset U_\phi \subset U_0 \subset U$, both $\phi$ and $\vec X$ are independent of $t$, and so
(7.9.19) gives

\[
\frac{\partial\xi_i}{\partial t} + \frac{\partial\xi_0}{\partial x^i} = -\gamma_{0i} , \tag{7.9.25}
\]

\[
\partial_c\partial^c\vec\xi = \partial_c\partial^c(\vec\nabla\times\vec X - \vec\nabla\phi) = \nabla^2(\vec\nabla\times\vec X - \vec\nabla\phi) = 0 , \tag{7.9.26}
\]

where (7.9.23) is used in the last step of (7.9.26). Using the second equation in (7.9.21), we
have

\[
\vec\nabla\cdot\vec\xi = \vec\nabla\cdot\vec\xi' + \vec\nabla\cdot\vec X' - \nabla^2\phi = -\frac{1}{2}(\gamma_{00} + \gamma) .
\]

Now, let $\xi_0$, $\xi_1$, $\xi_2$ and $\xi_3$ be the coordinate components of a 1-form $\xi_a$ on $U$. Then, combining
the above equation and the second equation in (7.9.15) yields

\[
\partial_a\xi^a = \eta^{\mu\nu}\frac{\partial\xi_\nu}{\partial x^\mu} = -\frac{\partial\xi_0}{\partial t} + \vec\nabla\cdot\vec\xi = -\frac{1}{2}\gamma . \tag{7.9.27}
\]

Similarly, the wave equations (7.9.15) and (7.9.26) for $\xi_0$ and $\vec\xi$ can be combined into

\[
\partial_c\partial^c\xi_a = 0 . \tag{7.9.28}
\]

Finally, the second equation in (7.9.15) can be combined with (7.9.25) into

\[
\frac{\partial\xi_\nu}{\partial t} + \frac{\partial\xi_0}{\partial x^\nu} = -\gamma_{0\nu} . \tag{7.9.29}
\]

(6) Now let us consider the tensor field $\gamma'_{ab} = \gamma_{ab} + \partial_a\xi_b + \partial_b\xi_a$ on $U$. It follows that
$\gamma' = \eta^{ab}\gamma'_{ab} = \gamma + 2\,\partial_a\xi^a$. From now on the equations will be restricted on $U'$. First, we
can see that $\gamma' = 0$ due to (7.9.27). From (7.9.27) and (7.9.28) we obtain that $\partial^b\bar\gamma'_{ab} =
\partial^b\gamma'_{ab} - \frac{1}{2}\partial_a\gamma' = 0$, i.e., $\gamma'_{ab}$ satisfies the Lorenz gauge condition. Also, it follows from
(7.9.29) that

\[
\gamma'_{0\nu} = \gamma_{0\nu} + \frac{\partial\xi_\nu}{\partial t} + \frac{\partial\xi_0}{\partial x^\nu} = 0 , \quad \nu = 0, 1, 2, 3 .
\]

Having these, we have proved the existence of a gauge transformation such that $\gamma'_{ab}$ satisfies
the TT gauge condition on $U'$. □
[The End of Optional Reading 7.9.1]

7.9.2 Gravitational Plane Waves

The source-free linearized Einstein equation is a good description for the gravitational
waves emitted by a source far away from an observer. In Sect. 7.8, we have seen
that under a gauge transformation γab satisfies the Lorenz gauge condition. Then,
according to Corollary 7.9.3, a further gauge transformation can make it satisfy the
transverse-traceless (TT) gauge condition at least in an open neighborhood of the
observer. Under the TT gauge condition, now we will investigate wave solutions of
the source-free linearized Einstein equation.
In the TT gauge, $\gamma_{ab}$ satisfies (7.9.8). The traceless condition reduces the Lorenz
gauge condition to $\partial^b\gamma_{ab} = 0$, and the source-free linearized Einstein equation
becomes (7.9.11). From now on, all the equations in this subsection are valid on
an open neighborhood $U$ of the observer’s world line, on which a flat Lorentzian
metric $\eta_{ab}$ is defined, whose components in a coordinate system $\{x^\mu\}$ are $\eta_{\mu\nu}$. Then,
the ordinary derivative operator $\partial_a$ of the coordinate system $\{x^\mu\}$ is the derivative
operator associated with $\eta_{ab}$.
As an ansatz, let us consider a solution to (7.9.11) of the following form:

\[
\gamma_{ab} = f(K_\mu x^\mu)\,H_{ab} , \tag{7.9.30}
\]

where $f$ is a $C^2$ function of one variable, $K_\nu = K^\mu\eta_{\mu\nu}$ with $K^\mu$ the components of a
constant 4-vector field $K^a$ in $\{x^\mu\}$, and $H_{ab}$ is a constant symmetric tensor field of
type $(0, 2)$. $K^a$ and $H_{ab}$ being constant vector and tensor fields, respectively, means
that

\[
\partial_b K^a = 0 , \quad \partial_c H_{ab} = 0 . \tag{7.9.31}
\]

In other words, all the components of $K^a$ and $H_{ab}$ in $\{x^\mu\}$ are constants. Note that
$\gamma_{ab}$ in (7.9.30) remains unchanged if we replace $f$ by $Cf$ and $H_{ab}$ by $H_{ab}/C$ for any
nonzero constant $C$. Hence, if the range of $f$ is bounded, we can assume that $-1 \le
f \le 1$ with $|f(K_\mu x^\mu)| = 1$ at some spacetime point. In this way, $H_{ab}$ represents the
amplitude of the wave solution (7.9.30), called the polarization tensor, and $K^a$ will
be the wave 4-vector for a gravitational wave.
Noticing that

\[
\partial_c(K_\mu x^\mu) = K_\mu\,\partial_c x^\mu = K_\mu(\mathrm{d}x^\mu)_c = K_c , \tag{7.9.32}
\]

we have

\[
\partial_c\gamma_{ab} = f'(K_\mu x^\mu)\,K_c H_{ab} , \quad \partial_c\partial_d\gamma_{ab} = f''(K_\mu x^\mu)\,K_c K_d H_{ab} , \tag{7.9.33}
\]

where $f'$ and $f''$ are the first and the second order derivatives of $f$, respectively.
Hence, to obtain a solution that is nonzero and non-constant, we should consider
$f \neq 0$, $f' \neq 0$ and $H_{ab} \neq 0$ (meaning that they are not identically zero, with possible
zero points). Then, the TT gauge condition is now equivalent to

\[
K^b H_{ab} = 0 , \quad \eta^{ab}H_{ab} = 0 , \quad H_{0\nu} = H_{\nu 0} = 0 , \quad \nu = 0, 1, 2, 3 . \tag{7.9.34}
\]

Plugging (7.9.30) into (7.9.11) yields

\[
K^c K_c\,H_{ab}\,f''(K_\mu x^\mu) = 0 . \tag{7.9.35}
\]
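Equation (7.9.35) can be illustrated numerically for a scalar profile. The sketch below is an assumed example, not part of the book's derivation: it takes $f = \cos$, a wave vector $K^\mu = (\omega, 0, 0, k)$ so that $K_\mu x^\mu = -\omega t + kz$ and $K^c K_c = -\omega^2 + k^2$, and compares a finite-difference evaluation of $\partial^c\partial_c f = (-\partial_t^2 + \partial_z^2)f$ with $K^c K_c f''$. The sample event and the values of $\omega$ and $k$ are arbitrary.

```python
import math


def box_of_profile(f, omega, k, t, z, h=1e-3):
    """Finite-difference value of (-d^2/dt^2 + d^2/dz^2) f(-omega*t + k*z)."""
    def g(tt, zz):
        return f(-omega * tt + k * zz)
    d2t = (g(t + h, z) - 2.0 * g(t, z) + g(t - h, z)) / (h * h)
    d2z = (g(t, z + h) - 2.0 * g(t, z) + g(t, z - h)) / (h * h)
    return -d2t + d2z


def predicted(f2, omega, k, t, z):
    """K^c K_c f''(K_mu x^mu) with K^c K_c = -omega^2 + k^2, as in (7.9.35)."""
    return (-omega**2 + k**2) * f2(-omega * t + k * z)


def f2(s):
    """Second derivative of the sample profile f = cos."""
    return -math.cos(s)


t0, z0 = 0.3, 0.7                                           # arbitrary sample event
null_case = box_of_profile(math.cos, 2.0, 2.0, t0, z0)      # omega = k: K^a is null
non_null_case = box_of_profile(math.cos, 2.0, 1.0, t0, z0)  # omega != k

print("null K:    ", null_case)
print("non-null K:", non_null_case, "vs", predicted(f2, 2.0, 1.0, t0, z0))
```

For $\omega = k$ the wave operator vanishes up to discretization error, while for $\omega \neq k$ it reproduces $K^c K_c f''$, which is nonzero wherever $f'' \neq 0$.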

First we consider a special case, namely $f'' = 0$. Since we have assumed that $f' \neq 0$,
without loss of generality, we can set $f$ to be $f(\lambda) = \lambda + \lambda_0$ with $\lambda_0$ a constant. The
corresponding $\gamma_{ab}$ then reads

\[
\gamma_{ab} = (K_\mu x^\mu + \lambda_0)\,H_{ab} . \tag{7.9.36}
\]

Since $\partial_c\partial_d\gamma_{ab} = 0$, it follows from (7.8.5) that the first-order Riemann curvature of
$g_{ab} = \eta_{ab} + \gamma_{ab}$ vanishes, i.e., $g_{ab}$ is a flat metric in the linear (first-order) approximation.
Then, there exists a coordinate system $\{x'^\mu\}$ such that $g_{ab} = \eta_{\mu\nu}(\mathrm{d}x'^\mu)_a(\mathrm{d}x'^\nu)_b$
(see the first footnote in Sect. 7.9.1) in the linear approximation. (Note that the above
coordinate transformation does not correspond to a gauge transformation described
in Optional Reading 7.8.1). Therefore, a solution of the form (7.9.36) turns out to be
a trivial solution at least in the linear approximation, and hence it is not regarded as
having any physical effect.

Fig. 7.15 The spacetime diagram of the gravitational plane wave propagating along the $z$-axis. $S$ is a
constant-phase surface in the spacetime, and $S_0$ is the wavefront at a time $t_0$
From now on we will assume $f'' \neq 0$. In this case, (7.9.35) implies

\[
K^a K_a = 0 . \tag{7.9.37}
\]

If $K^a = 0$, the corresponding $\gamma_{ab}$ is a constant tensor field, which is not interesting in
physics. Thus, we will focus on the case of $K^a \neq 0$, i.e., $K^a$ is a nonzero null vector.
Under the ansatz (7.9.30) with $f'' \neq 0$, (7.9.34) and (7.9.37) are the necessary and
sufficient condition to determine a nontrivial solution of the source-free linearized
Einstein equation in the TT gauge. Such a solution represents a (traveling) gravitational
plane wave, whose wavefronts are the surfaces described by $\Phi(x) \equiv K_\mu x^\mu = \text{const}$.
As shown in (7.9.32), $K_a = (\mathrm{d}\Phi)_a$ is the normal covector of each of these surfaces. It
follows from (7.9.37) that

\[
g^{ab}K_a K_b = (\eta^{ab} - \gamma^{ab})K_a K_b = \eta^{ab}K_a K_b - f(K_\mu x^\mu)\,H^{ab}K_a K_b = \eta^{ab}K_a K_b = 0 ,
\]

where (7.9.34) is used in the third equality. This indicates that in the linear
approximation, a wavefront $S$ is a null surface with respect to either $\eta_{ab}$ or $g_{ab}$. In
other words, gravitational waves described by (7.9.30) propagate at the speed of
light in vacuum just in a way similar to electromagnetic waves in Sect. 6.6.5, see
Fig. 7.15. (gab = ηab + γab and its curvature R a bcd correspond to the electromagnetic
4-potential Aa and the electromagnetic field Fab , respectively).
To demonstrate an important property of this plane wave solution, let us define
$\tilde K^a = g^{ab}K_b$ (note that in linearized gravity we stipulate that $K^a = \eta^{ab}K_b$). In the
linear approximation we have $g^{ab} = \eta^{ab} - f(K_\mu x^\mu)H^{ab}$, and thus up to higher order
terms,

\[
\tilde K^a = \eta^{ab}K_b - f(K_\mu x^\mu)\,H^{ab}K_b = K^a ,
\]

where (7.9.34) is used in the second equality. By means of the linearity and the
Leibniz rule of $\mathcal{L}_{\tilde K}$, the Lie derivative of $g_{ab} = \eta_{ab} + f(K_\mu x^\mu)H_{ab}$ can be written
as

\[
\mathcal{L}_{\tilde K}\,g_{ab} = \mathcal{L}_K\,\eta_{ab} + H_{ab}\,\mathcal{L}_K[f(K_\nu x^\nu)] + f(K_\nu x^\nu)\,\mathcal{L}_K H_{ab} .
\]

Using (4.2.8) and setting the $\nabla_a$ therein to the ordinary derivative $\partial_a$ in $\{x^\mu\}$, we have

\[
\mathcal{L}_K H_{ab} = K^c\partial_c H_{ab} + H_{cb}\,\partial_a K^c + H_{ac}\,\partial_b K^c = 0 ,
\]

where (7.9.31) is used in the last step. Similarly one finds $\mathcal{L}_K\,\eta_{ab} = 0$. Then,

\[
\mathcal{L}_{\tilde K}\,g_{ab} = H_{ab}\,\mathcal{L}_K[f(K_\nu x^\nu)] = H_{ab}\,K^c\partial_c[f(K_\nu x^\nu)] = H_{ab}\,K^c K_c\,f'(K_\nu x^\nu) = 0 . \tag{7.9.38}
\]

Hence, $K^a$ is a Killing vector field with respect to $\eta_{ab}$, and $\tilde K^a = K^a$ is a Killing
vector field with respect to $g_{ab}$ in the linear approximation. Then, (7.9.38) gives rise
to the Killing equation $\nabla_a K_b + \nabla_b K_a = 0$, where $\nabla_a$ is the torsion-free derivative
operator associated with $g_{ab}$. Since $K_b = (\mathrm{d}\Phi)_b = \nabla_b\Phi$ with $\Phi \equiv K_\mu x^\mu$, the torsion-free
condition of $\nabla_a$ leads to $\nabla_b K_a = \nabla_a K_b$. Thus, the Killing equation becomes

\[
\nabla_a K_b = 0 , \quad \text{i.e.,} \quad \nabla_a\tilde K^b = 0 . \tag{7.9.39}
\]

This indicates that the rays of these gravitational plane waves are parallel to each
other. Therefore, they are called plane-fronted gravitational waves with parallel
rays, or pp-waves for short.$^{20}$ Generally speaking, any spacetime that admits a
nonzero null vector field $\tilde K^a$ satisfying $\nabla_a\tilde K^b = 0$ is called a pp-wave, see Stephani
et al. (2003).
20 Notice that $g_{ab} = \eta_{ab} + f(K_\mu x^\mu)H_{ab}$ is a pp-wave only in the linear approximation.

Now we will find this wave solution explicitly. Since $K^a$ is nonzero, in a Lorentzian
coordinate system $\{x^\mu\}$ (with $t \equiv x^0$) it can be decomposed as

\[
K^a = \omega\,(\partial/\partial t)^a + k^a , \tag{7.9.40}
\]

where $\omega$ and $k^a$ can be interpreted as the angular frequency and the wave 3-vector,
respectively. Also, $K^a$ being null indicates that $\omega^2 = k^a k_a \equiv k^2$. One can further
choose $\{x^\mu\}$ such that $k^a$ is in the $z$-direction ($z \equiv x^3$), i.e., the wavefront of each
time $t$ is a constant-$z$ plane (the phase $K_\mu x^\mu = -\omega t + kz$ at $t$ is only a function of
$z$). Then, $K^a$ can be expressed as

\[
K^a = \omega\,(\partial/\partial t)^a + k\,(\partial/\partial z)^a , \tag{7.9.41}
\]

with $\omega = \sqrt{k^2} \equiv k$. In this coordinate system, the conditions in (7.9.34) result in

\[
H_{11} + H_{22} = 0 , \quad H_{\nu 3} = H_{3\nu} = H_{0\nu} = H_{\nu 0} = 0 , \quad \nu = 0, 1, 2, 3 . \tag{7.9.42}
\]

Thus, among the components $H_{\mu\nu}$ of $H_{ab}$, the nonvanishing ones can only be $H_{11} =
-H_{22}$ and $H_{12} = H_{21}$. Therefore, $H_{ab}$ can be written as

\[
H_{ab} = H_{11}\,H^{(+)}_{ab} + H_{12}\,H^{(\times)}_{ab} , \tag{7.9.43}
\]

where

\[
H^{(+)}_{ab} = (\mathrm{d}x^1)_a(\mathrm{d}x^1)_b - (\mathrm{d}x^2)_a(\mathrm{d}x^2)_b , \tag{7.9.44}
\]

\[
H^{(\times)}_{ab} = (\mathrm{d}x^1)_a(\mathrm{d}x^2)_b + (\mathrm{d}x^2)_a(\mathrm{d}x^1)_b . \tag{7.9.45}
\]

In (7.9.43), $H_{11}$ and $H_{12}$ are arbitrary real numbers, corresponding to the two degrees
of freedom found by the counting argument in Sect. 7.9.1. Correspondingly,
if we define

\[
\gamma^{(+)}_{ab} = f(K_\mu x^\mu)\,H^{(+)}_{ab} = f(-\omega t + kz)\,[(\mathrm{d}x^1)_a(\mathrm{d}x^1)_b - (\mathrm{d}x^2)_a(\mathrm{d}x^2)_b] , \tag{7.9.46}
\]

\[
\gamma^{(\times)}_{ab} = f(K_\mu x^\mu)\,H^{(\times)}_{ab} = f(-\omega t + kz)\,[(\mathrm{d}x^1)_a(\mathrm{d}x^2)_b + (\mathrm{d}x^2)_a(\mathrm{d}x^1)_b] , \tag{7.9.47}
\]

then a solution in the form of (7.9.30) can be expressed as

\[
\gamma_{ab} = H_{11}\,\gamma^{(+)}_{ab} + H_{12}\,\gamma^{(\times)}_{ab} . \tag{7.9.48}
\]

Plugging the above solution into (7.8.5) yields the linearized Riemann curvature
tensor

\[
R^{(1)}_{acbd} = (K_d K_{[a}H_{c]b} - K_b K_{[a}H_{c]d})\,f''(K_\mu x^\mu) .
\]

To verify that the curvature is indeed nonzero, we can decompose it into two terms:

\[
R^{(1)}_{acbd} = H_{11}\,R^{(1)(+)}_{acbd} + H_{12}\,R^{(1)(\times)}_{acbd} , \tag{7.9.49}
\]

where

\[
R^{(1)(+)}_{acbd} = \bigl(K_d K_{[a}H^{(+)}_{c]b} - K_b K_{[a}H^{(+)}_{c]d}\bigr)\,f''(K_\mu x^\mu) , \tag{7.9.50}
\]

\[
R^{(1)(\times)}_{acbd} = \bigl(K_d K_{[a}H^{(\times)}_{c]b} - K_b K_{[a}H^{(\times)}_{c]d}\bigr)\,f''(K_\mu x^\mu) . \tag{7.9.51}
\]

It is easy to see that $R^{(1)(+)}_{acbd} \neq 0$ since, for example,

\[
R^{(1)(+)}_{acbd}\Bigl(\frac{\partial}{\partial x^1}\Bigr)^d = K_b\,[K_c(\mathrm{d}x^1)_a - K_a(\mathrm{d}x^1)_c]\,\frac{f''(-\omega t + kz)}{2} ,
\]

and similarly $R^{(1)(\times)}_{acbd} \neq 0$. Since $H^{(+)}_{ab}$ and $H^{(\times)}_{ab}$ are linearly independent, $R^{(1)}_{acbd} \neq
0$ if either $H_{11}$ or $H_{12}$ is nonzero. Therefore, for a nontrivial $\gamma_{ab}$ in (7.9.48), the
metric $g_{ab} = \eta_{ab} + \gamma_{ab}$ is not flat, and hence it indeed describes a gravitational plane
wave. In the special case where $f(K_\mu x^\mu) = \cos(K_\mu x^\mu + \theta_0)$ with $\theta_0$ a constant, the
gravitational wave is called a monochromatic gravitational plane wave.
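The contraction quoted above for the plus mode can be checked by brute force over all index values. The following sketch works with the coefficient of $f''$ only and assumes sample values $\omega = k = 1$, so that $K_\mu = (-1, 0, 0, 1)$; the names `antisym`, `riemann` and `contracted_rhs` are illustrative helpers rather than notation from the text.

```python
OMEGA = K_MAG = 1.0                        # sample values with omega = k (assumption)
K_LOW = [-OMEGA, 0.0, 0.0, K_MAG]          # K_mu = eta_{mu nu} K^nu, K^nu = (omega, 0, 0, k)
H_PLUS = [[0.0] * 4 for _ in range(4)]
H_PLUS[1][1], H_PLUS[2][2] = 1.0, -1.0     # components of (7.9.44)


def antisym(a, c, b, H):
    """K_[a H_c]b = (K_a H_cb - K_c H_ab) / 2."""
    return 0.5 * (K_LOW[a] * H[c][b] - K_LOW[c] * H[a][b])


def riemann(a, c, b, d, H):
    """Coefficient of f'' in R^(1)_{acbd} = K_d K_[a H_c]b - K_b K_[a H_c]d."""
    return K_LOW[d] * antisym(a, c, b, H) - K_LOW[b] * antisym(a, c, d, H)


def contracted_rhs(a, c, b):
    """Coefficient of f'' in K_b [K_c (dx^1)_a - K_a (dx^1)_c] / 2."""
    return 0.5 * K_LOW[b] * (K_LOW[c] * (a == 1) - K_LOW[a] * (c == 1))


R = range(4)
# Contracting the last index with (d/dx^1)^d simply picks out d = 1.
matches = all(abs(riemann(a, c, b, 1, H_PLUS) - contracted_rhs(a, c, b)) < 1e-12
              for a in R for c in R for b in R)
nonzero = [(a, c, b, d) for a in R for c in R for b in R for d in R
           if riemann(a, c, b, d, H_PLUS) != 0.0]

print("contraction formula reproduced:", matches)
print("number of nonzero components  :", len(nonzero))
```

In this sketch the component with $(a, c, b, d) = (1, 0, 0, 1)$ has coefficient $\omega^2/2$, so the linearized curvature of the plus mode does not vanish wherever $f'' \neq 0$.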
A gravitational wave of the form (7.9.46) is said to be plus-polarized or of mode
$+$, and a gravitational wave of the form (7.9.47) is said to be cross-polarized or of
mode $\times$. Besides the plus-polarized mode and cross-polarized mode one can also
have an arbitrary polarized mode with $H_{ab} = \alpha H^{(+)}_{ab} + \beta H^{(\times)}_{ab}$ satisfying $\alpha^2 + \beta^2 =
1$. All these polarization modes are on an equal footing. In fact, if we set

\[
\begin{aligned}
x'^0 &= x^0 = t , & x'^1 &= \frac{1}{\sqrt 2}(x^1 + x^2) , \\
x'^3 &= x^3 = z , & x'^2 &= \frac{1}{\sqrt 2}(-x^1 + x^2) ,
\end{aligned}
\tag{7.9.52}
\]

then $\{x'^\mu\}$ is another Lorentzian coordinate system of $\eta_{ab}$. It is easy to verify that

\[
\gamma^{(+)}_{ab} = -f(-\omega t + kz)\,[(\mathrm{d}x'^2)_a(\mathrm{d}x'^1)_b + (\mathrm{d}x'^1)_a(\mathrm{d}x'^2)_b] , \tag{7.9.53}
\]

\[
\gamma^{(\times)}_{ab} = f(-\omega t + kz)\,[(\mathrm{d}x'^1)_a(\mathrm{d}x'^1)_b - (\mathrm{d}x'^2)_a(\mathrm{d}x'^2)_b] . \tag{7.9.54}
\]

Thus, in the new coordinate system $\{x'^\mu\}$, gravitational waves $\gamma^{(+)}_{ab}$ and $\gamma^{(\times)}_{ab}$ are now
cross-polarized and plus-polarized, respectively, which shows that the plus-polarized
and cross-polarized modes are equivalent up to a choice of the Lorentzian coordinate
system.
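The statement that (7.9.52) exchanges the two modes can be verified with the tensor transformation law restricted to the $(x^1, x^2)$ plane. The sketch below is a minimal check in plain Python; the matrix `J` holds $\partial x^k/\partial x'^i$ obtained by inverting (7.9.52), and the function name `to_primed` is a hypothetical helper.

```python
import math

S = 1.0 / math.sqrt(2.0)
# Jacobian J[k][i] = d x^k / d x'^i from inverting (7.9.52) in the (x^1, x^2) plane:
#   x^1 = (x'^1 - x'^2)/sqrt(2),   x^2 = (x'^1 + x'^2)/sqrt(2).
J = [[S, -S],
     [S, S]]

H_PLUS = [[1.0, 0.0], [0.0, -1.0]]    # components of (7.9.44) in {x^1, x^2}
H_CROSS = [[0.0, 1.0], [1.0, 0.0]]    # components of (7.9.45) in {x^1, x^2}


def to_primed(H):
    """H'_{ij} = (dx^k/dx'^i)(dx^l/dx'^j) H_{kl} for a type-(0,2) tensor."""
    return [[sum(J[k][i] * J[l][j] * H[k][l] for k in range(2) for l in range(2))
             for j in range(2)] for i in range(2)]


plus_in_primed = to_primed(H_PLUS)     # compare with minus the cross pattern, (7.9.53)
cross_in_primed = to_primed(H_CROSS)   # compare with the plus pattern, (7.9.54)

print(plus_in_primed)
print(cross_in_primed)
```

Up to rounding, the first matrix is minus the cross pattern and the second is the plus pattern, matching (7.9.53) and (7.9.54).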
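[Optional Reading 7.9.2]
To see more precisely that all the polarization modes of a gravitational wave are on an
equal footing, we define the following vector fields:

\[
\begin{aligned}
(e_0)^a &= (\partial/\partial t)^a , \qquad (e_1^{(+)})^a = (\partial/\partial x^1)^a , \qquad (e_2^{(+)})^a = (\partial/\partial x^2)^a , \\
(e_1^{(\times)})^a &= (\partial/\partial x'^1)^a = \frac{1}{\sqrt 2}\bigl[(e_1^{(+)})^a + (e_2^{(+)})^a\bigr] , \\
(e_2^{(\times)})^a &= (\partial/\partial x'^2)^a = \frac{1}{\sqrt 2}\bigl[-(e_1^{(+)})^a + (e_2^{(+)})^a\bigr] ,
\end{aligned}
\]

where $x'^\mu$ are given in (7.9.52). Then, it is easy to verify that

\[
\begin{aligned}
(H^{(+)})^a{}_b\,(e_0)^b &= 0 , & (H^{(+)})^a{}_b\,(e_1^{(+)})^b &= (e_1^{(+)})^a , \\
(H^{(+)})^a{}_b\,K^b &= 0 , & (H^{(+)})^a{}_b\,(e_2^{(+)})^b &= -(e_2^{(+)})^a , \\
(H^{(\times)})^a{}_b\,(e_0)^b &= 0 , & (H^{(\times)})^a{}_b\,(e_1^{(\times)})^b &= (e_1^{(\times)})^a , \\
(H^{(\times)})^a{}_b\,K^b &= 0 , & (H^{(\times)})^a{}_b\,(e_2^{(\times)})^b &= -(e_2^{(\times)})^a .
\end{aligned}
\]

We can see that ① $(e_0)^a$, $K^a$ and their linear combinations, such as $(e_3)^a = \frac{1}{\omega}K^a - (e_0)^a$,
are all eigenvectors of both $(H^{(+)})^a{}_b$ and $(H^{(\times)})^a{}_b$ with eigenvalue 0; ② $(e_1^{(+)})^a$ and $(e_2^{(+)})^a$
are eigenvectors of $(H^{(+)})^a{}_b$ with eigenvalues $\pm 1$, respectively; ③ $(e_1^{(\times)})^a$ and $(e_2^{(\times)})^a$ are
eigenvectors of $(H^{(\times)})^a{}_b$ with eigenvalues $\pm 1$, respectively.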

For the polarization tensor Hab expressed in (7.9.43), we can set H ≡ (H11 )2 + (H12 )2
and 0  ψ < π such that H11 = H cos 2ψ and H12 = H sin 2ψ. Then, (7.9.43) can be
written as
(ψ) (ψ) (+) (×)
Hab = H Hab , where Hab = Hab cos 2ψ + Hab sin 2ψ . (7.9.55)
a
It is easy to see that K a , (e0 )a and their linear combinations are all eigenvectors of H (ψ) b
with eigenvalue 0. Moreover,
308 7 Foundations of General Relativity

(ψ) a (+) a (+) a (ψ) a (+) a (+) a


e+ = e1 cos ψ + e2 sin ψ and e− = − e1 sin ψ + e2 cos ψ
(7.9.56)
a
are also eigenvectors of H (ψ) b
with eigenvalues ±1, respectively. Especially, we have
(+) (0) (×) (π/4)
Hab = Hab and Hab = Hab , and correspondingly
(+) a (0) a (+) a (0) a
e1 = e+ , e2 = e− , (7.9.57)
(π/4) a (π/4) a
e1(×) e2(×)
a a
= e+ , = e− . (7.9.58)
(+) (×) (ψ)
Therefore, we can see clearly that Hab and Hab are nothing but two special cases of Hab ,
(ψ)
and all Hab are on an equal footing.
(ψ) a
The geometric meaning of the e± in (7.9.56) is clear: by rotating about the z-axis by
(+) a (+) a (+)
an angle ψ, the eigenvectors e1 = (∂/∂ x 1 )a and e2 = (∂/∂ x 2 )a of Hab transform
(ψ) a (ψ) a (ψ) (ψ)
to e+ and e− , respectively, and become eigenvectors of Hab . In (7.9.55), Hab also
(+) (×)
looks like a rotation in the “plane” containing Hab and Hab . However, if the eigenvector
rotates by an angle ψ, the corresponding rotation angle of the polarization tensor is 2ψ.
This indicates that the polarization tensor will come back to itself after rotating about the
z-axis by (an integer times) π , which is different from the fact that the polarization of an
electromagnetic wave will come back after rotating by at least 2π . This difference manifests
that gravitons and photons have different spins. It is generally believed that general relativity
eventually must be combined with quantum theory and become a complete and consistent
quantum theory of gravity. Although until now this theory has yet to be found, physicists
still often talk about the quantization of the gravitational field and its quanta—gravitons.
Roughly speaking, the relation between gravitons and gravitational plane waves is similar
to the relation between photons and electromagnetic plane waves. Gravitons have no rest
mass just like photons, as they both propagate at the speed of light in vacuum, while the
different rotation angles between their polarization modes are closely related to the following
fact: photons have a spin of 1, while gravitons have a spin of 2.
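
The eigenvector statements around (7.9.55) and (7.9.56), and the return of the polarization tensor to itself after a rotation by π, can be checked numerically. The following is only a minimal sketch: it keeps just the transverse $(x^1, x^2)$ block and assumes that $H^{(+)}_{ab}$ and $H^{(\times)}_{ab}$ have the blocks diag(1, −1) and off-diagonal (1, 1) there, which is consistent with (7.9.57) and (7.9.58) but is not quoted from (7.9.43). The helper names are hypothetical.

```python
import math

def h_psi(psi):
    """Transverse (x^1, x^2) block of H^(psi) = H^(+) cos 2psi + H^(x) sin 2psi."""
    c, s = math.cos(2 * psi), math.sin(2 * psi)
    # Assumed blocks: H^(+) = [[1, 0], [0, -1]], H^(x) = [[0, 1], [1, 0]].
    return [[c, s], [s, -c]]

def apply(m, v):
    """Multiply a 2x2 matrix by a 2-component vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]

def eigenvectors(psi):
    """e_+ and e_- of (7.9.56): the x^1 and x^2 unit vectors rotated by psi."""
    e_plus = [math.cos(psi), math.sin(psi)]
    e_minus = [-math.sin(psi), math.cos(psi)]
    return e_plus, e_minus

def close(u, v, tol=1e-12):
    """Componentwise comparison of two equally long sequences."""
    return all(abs(a - b) < tol for a, b in zip(u, v))

psi = 0.3  # arbitrary sample angle
H = h_psi(psi)
e_plus, e_minus = eigenvectors(psi)
plus_ok = close(apply(H, e_plus), e_plus)                   # eigenvalue +1
minus_ok = close(apply(H, e_minus), [-x for x in e_minus])  # eigenvalue -1
# Rotating by pi flips the sign of the eigenvectors but gives back the same tensor.
H_shift = h_psi(psi + math.pi)
same_tensor = close(H[0] + H[1], H_shift[0] + H_shift[1])
print(plus_ok, minus_ok, same_tensor)
```

The tensor depends on ψ only through 2ψ, which is why a rotation by π (rather than 2π) already restores it.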
[The End of Optional Reading 7.9.2]

Now let us discuss the physical effect of polarized gravitational waves. Consider
the following monochromatic gravitational plane wave solution:

$$\gamma_{ab} = h\cos(\omega t - kz)\,[(dx^1)_a(dx^1)_b - (dx^2)_a(dx^2)_b]\,. \tag{7.9.59}$$

This is a plus-polarized gravitational wave, with the positive constants $h$, $\omega$ and
$k$ being the amplitude, angular frequency and wavenumber (the magnitude of the
wave 3-vector) of the gravitational wave, respectively. Imagine that there are some
particles in the source-free region, each labeled by a unique parameter $\varphi \in [0, 2\pi)$.
Suppose the world line of the particle labeled by the parameter $\varphi$ is described by the
following parametric equations:

$$t = t(\tau)\,, \quad x^1 = x^1(\tau) = a\cos\varphi\,, \quad x^2 = x^2(\tau) = a\sin\varphi\,, \quad z = z(\tau) = 0\,, \tag{7.9.60}$$

where $a > 0$ is a constant. When there is no gravitational wave, these particles are
located along a circle of radius $a$ at rest in the reference frame of $\{x^\mu\}$. When the
gravitational wave of the form (7.9.59) passes through this region, the metric becomes

$$g_{ab} = \eta_{ab} + \gamma_{ab} = -(dt)_a(dt)_b + [1 + h\cos(\omega t - kz)]\,(dx^1)_a(dx^1)_b + [1 - h\cos(\omega t - kz)]\,(dx^2)_a(dx^2)_b + (dz)_a(dz)_b\,.$$

It can be proved that for the particles on the circle described by (7.9.60), their world
lines are still geodesics with respect to $g_{ab}$. [See Exercise 7.10; in fact, its result
shows that the $t$-coordinate lines for any gravitational wave of the form
(7.9.30) are geodesics.] However, the coordinates $x^1$ and $x^2$ are no longer the spatial
Cartesian coordinates of these particles at a time $t$. Instead, their spatial Cartesian
coordinates at $t$ are now

$$y^1 = x^1\sqrt{1 + h\cos\omega t}\,, \qquad y^2 = x^2\sqrt{1 - h\cos\omega t}$$

and $z$. From the parametric equations of the world lines of these particles, we can
see that they are located along an ellipse at $t$, described by

$$\left(\frac{y^1}{a\sqrt{1 + h\cos\omega t}}\right)^2 + \left(\frac{y^2}{a\sqrt{1 - h\cos\omega t}}\right)^2 = 1\,, \qquad z = 0\,. \tag{7.9.61}$$

For any integer $n$, when $2n\pi - \frac{\pi}{2} \le \omega t \le 2n\pi + \frac{\pi}{2}$, the major and the minor axes
of the ellipse are along the $x^1$-axis and the $x^2$-axis, respectively; when $2n\pi + \frac{\pi}{2} \le \omega t \le 2n\pi + \frac{3\pi}{2}$,
the major and the minor axes of the ellipse are exchanged, along the
$x^2$-axis and $x^1$-axis, respectively. Therefore, as the gravitational wave passes through,
these particles are located along an oscillating ellipse, as shown in Fig. 7.16. The
eccentricity of the ellipse at $t$ can be calculated as

$$e_{\rm grav}(t) = \sqrt{\frac{2h|\cos\omega t|}{1 + h|\cos\omega t|}} \cong \sqrt{2h|\cos\omega t|}\,. \tag{7.9.62}$$

[Figure 7.16: four panels in the $x^1$-$x^2$ plane, at $\omega t = 0$, $\omega t = \frac{1}{2}\pi$, $\omega t = \pi$ and $\omega t = \frac{3}{2}\pi$]

Fig. 7.16 The effect of a linearly polarized gravitational plane wave on a circle in one period
Hence, the maximum value of $e_{\rm grav}(t)$ is $\sqrt{\frac{2h}{1+h}} \cong \sqrt{2h}$, which only depends on the
amplitude h. It is important that the directions of the major and the minor axes are
eigenvectors of the polarization tensor, which can be referred to as the polarization
directions of the gravitational wave. From the viewpoint of continuum mechanics,
the effect of a weak gravitational wave can be regarded as a strain.
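
The oscillating ellipse of (7.9.61) and the eccentricity (7.9.62) can be tabulated over one period, mirroring the four panels of Fig. 7.16. This is only an illustrative sketch: the amplitude `h = 0.1` is a hypothetical, hugely exaggerated value chosen so that the deformation is visible in the printed numbers, and the function names are not from the text.

```python
import math

def ring(a, h, omega_t, n=8):
    """Proper transverse positions (y^1, y^2) of n ring particles at phase omega_t."""
    c = math.cos(omega_t)
    pts = []
    for i in range(n):
        phi = 2 * math.pi * i / n
        # The coordinates x^1, x^2 of each particle stay fixed; only proper lengths change.
        y1 = a * math.cos(phi) * math.sqrt(1 + h * c)
        y2 = a * math.sin(phi) * math.sqrt(1 - h * c)
        pts.append((y1, y2))
    return pts

def eccentricity(h, omega_t):
    """Exact eccentricity of the ellipse (7.9.61), as given in (7.9.62)."""
    hc = h * abs(math.cos(omega_t))
    return math.sqrt(2 * hc / (1 + hc))

h = 0.1  # hypothetical amplitude, far larger than any realistic wave
for k in range(4):
    wt = k * math.pi / 2
    semi1 = math.sqrt(1 + h * math.cos(wt))  # semi-axis along x^1, in units of a
    semi2 = math.sqrt(1 - h * math.cos(wt))  # semi-axis along x^2, in units of a
    print(f"omega*t = {k}*pi/2: semi-axes = ({semi1:.4f}, {semi2:.4f}) a, "
          f"e = {eccentricity(h, wt):.4f}")
```

The printed semi-axes exchange roles between ωt = 0 and ωt = π, and the ring is circular at ωt = π/2 and 3π/2, as described above.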
[Optional Reading 7.9.3]
The polarization modes of gravitational waves we discussed above are analogous to
the linear polarization modes of electromagnetic waves, whose polarization directions are
fixed. As we know, electromagnetic waves can be circularly/elliptically polarized. Similarly,
gravitational waves can also be circularly/elliptically polarized. For example, given two
nonzero constants $h^{(+)}$ and $h^{(\times)}$, the gravitational wave described by
$$\gamma_{ab} = h^{(+)}H^{(+)}_{ab}\cos(-\omega t + kz) + h^{(\times)}H^{(\times)}_{ab}\sin(-\omega t + kz) \tag{7.9.63}$$
is elliptically polarized. It can be seen from (7.9.55) and (7.9.56) that when $h^{(+)}h^{(\times)} > 0$,
the angular velocity of the polarization (i.e., the angular velocity of the eigenvector with
eigenvalue $+1$) along the propagation direction is $-\omega/2$; when $h^{(+)}h^{(\times)} < 0$, the angular
velocity of the polarization is $\omega/2$.
Notice that the metric $g_{ab} = \eta_{ab} + \gamma_{ab}$ with the $\gamma_{ab}$ given in (7.9.63) does not abide by the
ansatz (7.9.30), and thus the conclusions for (7.9.30) may not be applicable to it. However,
one can show (exercise) that in the linear approximation, the spacetime corresponding to
(7.9.63) is still a pp-wave, and the $t$-coordinate lines are still geodesics.
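
The sign of the polarization's angular velocity can be illustrated numerically by reading the angle ψ off (7.9.55), i.e. by matching the coefficients of $H^{(+)}_{ab}$ and $H^{(\times)}_{ab}$ in (7.9.63) to $H\cos 2\psi$ and $H\sin 2\psi$. The sketch below is restricted, as an assumption made for simplicity, to the circular case $|h^{(+)}| = |h^{(\times)}|$, where the rotation is uniform; the function names and sample values are hypothetical.

```python
import math

def polarization_angle(h_plus, h_cross, omega, t, z=0.0, k=None):
    """Angle psi in [0, pi) of the +1 eigenvector for the wave (7.9.63)."""
    k = omega if k is None else k
    phase = -omega * t + k * z
    h11 = h_plus * math.cos(phase)   # coefficient of H^(+)
    h12 = h_cross * math.sin(phase)  # coefficient of H^(x)
    # Comparing with (7.9.55): h11 = H cos 2psi and h12 = H sin 2psi.
    return (0.5 * math.atan2(h12, h11)) % math.pi

def angular_velocity(h_plus, h_cross, omega, t, dt=1e-6):
    """Centered finite-difference estimate of d(psi)/dt, unwrapped modulo pi."""
    d = (polarization_angle(h_plus, h_cross, omega, t + dt)
         - polarization_angle(h_plus, h_cross, omega, t - dt))
    d = (d + math.pi / 2) % math.pi - math.pi / 2
    return d / (2 * dt)

omega = 2.0  # arbitrary sample angular frequency
rate_same_sign = angular_velocity(1.0, 1.0, omega, t=0.4)   # h(+) h(x) > 0
rate_opposite = angular_velocity(1.0, -1.0, omega, t=0.4)   # h(+) h(x) < 0
print(rate_same_sign, rate_opposite)
```

For these sample values the two estimates come out close to −ω/2 and +ω/2, respectively.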
[The End of Optional Reading 7.9.3]

By means of the geodesic deviation equation, the effect of polarized gravitational
waves can also be discussed by considering the tidal acceleration of nearby geodesic
observers. In this way, one can study how a family of geodesics will be distorted
by gravitational waves. This effect will be analyzed in Optional Reading 7.9.4; the
discussion therein can even be applied to gravitational waves without the linear
approximation.
So far we have discussed wave solutions to the linearized Einstein equation. How-
ever, Einstein’s equation is a nonlinear equation, and general relativity is a nonlinear
theory. Although in many cases we can apply the weak-field approximation, the non-
linearity must not be ignored for a strong gravitational field. This is a significant dif-
ference between electromagnetic waves (in Minkowski spacetime) and gravitational
waves. Maxwell’s equations are linear equations, where the superposition principle
is applicable, and so two electromagnetic waves propagating in the same space
do not influence each other. In contrast, generally speaking, there exists interaction
(scattering) between two gravitational waves. The collision of gravitational plane
waves has been investigated in the pioneering works of R. Penrose, K. Khan and
P. Szekeres; readers may refer to d’Inverno (1992) for a review. For an example
of gravitational plane waves not limited to the linear approximation, see Optional
Reading 7.9.4.
[Optional Reading 7.9.4]
Now we introduce a specific example of gravitational plane waves not limited to the linear
approximation [see Sachs and Wu (1977)]. Suppose {t, x, y, z} is a Lorentzian system in

Minkowski spacetime $(\mathbb{R}^4, \eta_{ab})$. Let $u \equiv t - z$, and $f(u)$ and $g(u)$ be two arbitrary smooth
functions of $u$ with the only requirement being that $f^2 + g^2$ is nonvanishing. Suppose $P$ is
a function of the coordinates $x$, $y$ and $u$ defined as follows:
$$P(x, y, u) = \frac{1}{2} f(u)(x^2 - y^2) + g(u)\,xy\,. \tag{7.9.64}$$
It is not difficult to verify that

$$g_{ab} := \eta_{ab} + 2P(du)_a(du)_b = \eta_{ab} + 2P[(dt)_a - (dz)_a][(dt)_b - (dz)_b] \tag{7.9.65}$$
is a Lorentzian metric field on $\mathbb{R}^4$. Firstly, it can be easily seen from the above equation that
$g_{ab}$ is symmetric. Secondly, let

$$K^a \equiv (\partial/\partial t)^a + (\partial/\partial z)^a\,, \tag{7.9.66}$$

then it is easy to verify that $g_{ab}K^aK^b = 0$, i.e., $K^a$ is a null vector field measured by $g_{ab}$.
Introduce a basis (tetrad) field on $\mathbb{R}^4$:

$$(e_1)^a = (\partial/\partial x)^a\,, \quad (e_2)^a = (\partial/\partial y)^a\,, \quad (e_3)^a = K^a\,, \quad (e_4)^a = \frac{1}{2}[(\partial/\partial t)^a - (\partial/\partial z)^a] + PK^a\,. \tag{7.9.67}$$
By a straightforward calculation (exercise) we can see that the metric components $g_{\mu\nu} \equiv
g_{ab}(e_\mu)^a(e_\nu)^b$ can be arranged into the following matrix:
$$(g_{\mu\nu}) = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & -1 \\ 0 & 0 & -1 & 0 \end{bmatrix}. \tag{7.9.68}$$

The matrix being invertible indicates that $g_{ab}$ is non-degenerate, and thus is a metric tensor
field. It is not difficult to see that it has the Lorentzian signature. The above discussion
indicates that $(\mathbb{R}^4, g_{ab})$ is a spacetime, which has the same base manifold $\mathbb{R}^4$ as Minkowski
spacetime but has a different metric field. By calculating the curvature tensor we can see that
this is a curved spacetime (see Proposition 7.9.4). The inverse matrix of (7.9.68) equipped with
the basis vectors in (7.9.67) gives

$$\begin{aligned} g^{ab} = {}& (\partial/\partial x)^a(\partial/\partial x)^b + (\partial/\partial y)^a(\partial/\partial y)^b - (1 + 2P)(\partial/\partial t)^a(\partial/\partial t)^b \\ & + (1 - 2P)(\partial/\partial z)^a(\partial/\partial z)^b - 2P[(\partial/\partial t)^a(\partial/\partial z)^b + (\partial/\partial z)^a(\partial/\partial t)^b]\,. \end{aligned} \tag{7.9.69}$$
We will use $g^{ab}$ and $g_{ab}$ to raise and lower indices.
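
The "straightforward calculation" behind (7.9.68) and the inverse metric (7.9.69) can be spot-checked at a single point with plain arithmetic. The sketch below works with coordinate components in the order $(t, x, y, z)$ and treats $P$ as a number, namely the value of $P(x, y, u)$ at the chosen point; the sample value and all function names are hypothetical.

```python
def metric(P):
    """Components g_{ab} of (7.9.65) in the coordinates (t, x, y, z)."""
    eta = [[-1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
    du = [1, 0, 0, -1]  # components of du = dt - dz
    return [[eta[a][b] + 2 * P * du[a] * du[b] for b in range(4)] for a in range(4)]

def inverse_metric(P):
    """Components g^{ab} read off from (7.9.69)."""
    return [[-(1 + 2 * P), 0, 0, -2 * P],
            [0, 1, 0, 0],
            [0, 0, 1, 0],
            [-2 * P, 0, 0, 1 - 2 * P]]

def tetrad(P):
    """Coordinate components of e_1, ..., e_4 from (7.9.67)."""
    K = [1, 0, 0, 1]                # e_3 = K = d/dt + d/dz
    e4 = [0.5 + P, 0, 0, -0.5 + P]  # (1/2)(d/dt - d/dz) + P K
    return [[0, 1, 0, 0], [0, 0, 1, 0], K, e4]

def tetrad_components(P):
    """g_{mu nu} = g_{ab} (e_mu)^a (e_nu)^b."""
    g, e = metric(P), tetrad(P)
    return [[sum(g[a][b] * e[m][a] * e[n][b] for a in range(4) for b in range(4))
             for n in range(4)] for m in range(4)]

def matmul(A, B):
    """Product of two 4x4 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)] for i in range(4)]

P = 0.75  # arbitrary sample value of P(x, y, u) at one point
g_tetrad = tetrad_components(P)
identity_check = matmul(inverse_metric(P), metric(P))
for row in g_tetrad:
    print(row)
```

The entry `g_tetrad[2][2]` is $g_{ab}K^aK^b$, so the same computation also confirms that $K^a$ is null with respect to $g_{ab}$.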

Proposition 7.9.4 The $g_{ab}$ defined by (7.9.65) is a non-flat solution to the vacuum Einstein
equation.

Proof First we compute the Riemann tensor $R_{abc}{}^d$ of $g_{ab}$ using the tetrad method introduced
in Sect. 5.7. Step one: choose the tetrad in (7.9.67). It follows from (7.9.68) that this is a
rigid tetrad (although not orthonormal). It is easy to verify that its dual tetrad reads

$$(e^1)_a = (dx)_a\,, \quad (e^2)_a = (dy)_a\,, \quad (e^3)_a = \frac{1}{2}[(dt)_a + (dz)_a] - P(du)_a\,, \quad (e^4)_a = (du)_a\,. \tag{7.9.70}$$
Step two: compute the connection 1-forms using Theorem 5.7.4. One finds that there are
only four nonvanishing $\omega_{\mu\nu}$:

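$$-\omega_{41} = \omega_{14} = \omega_{144}\,e^4 = -(fx + gy)\,du\,, \qquad -\omega_{42} = \omega_{24} = \omega_{244}\,e^4 = -(gx - fy)\,du\,. \tag{7.9.71}$$

From the inverse of (7.9.68) we can see that the components $g^{\mu\nu}$ of $g^{ab}$ in the dual basis
can also be arranged into the matrix on the right-hand side of (7.9.68), and hence it follows
from $\omega_\mu{}^\rho = g^{\rho\nu}\omega_{\mu\nu}$ that the nonvanishing $\omega_\mu{}^\rho$ are

$$\omega_4{}^1 = \omega_1{}^3 = (fx + gy)\,du\,, \qquad \omega_4{}^2 = \omega_2{}^3 = (gx - fy)\,du\,. \tag{7.9.72}$$

The third step is to compute all the curvature 2-forms $R_\mu{}^\nu$ from $\omega_\mu{}^\nu$ using Cartan’s second
equation of structure. Since all the nonvanishing $\omega_\mu{}^\rho$ are shown in (7.9.72), we have
$\omega_\mu{}^\lambda \wedge \omega_\lambda{}^\rho = 0$, and hence $R_\mu{}^\nu = d\omega_\mu{}^\nu$. Therefore, all the nonvanishing $R_\mu{}^\nu$ are

$$\begin{aligned} R_4{}^1 = R_1{}^3 &= f\,dx\wedge du + g\,dy\wedge du = f\,e^1\wedge e^4 + g\,e^2\wedge e^4\,, \\ R_4{}^2 = R_2{}^3 &= g\,dx\wedge du - f\,dy\wedge du = g\,e^1\wedge e^4 - f\,e^2\wedge e^4\,. \end{aligned} \tag{7.9.73}$$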

Thus, we obtain the Riemann tensor

$$\begin{aligned} R_{abc}{}^d ={}& R_{ab1}{}^3(e^1)_c(e_3)^d + R_{ab2}{}^3(e^2)_c(e_3)^d + R_{ab4}{}^1(e^4)_c(e_1)^d + R_{ab4}{}^2(e^4)_c(e_2)^d \\ ={}& [f(e^1)_a\wedge(e^4)_b + g(e^2)_a\wedge(e^4)_b]\,[(e^1)_c(e_3)^d + (e^4)_c(e_1)^d] \\ & + [g(e^1)_a\wedge(e^4)_b - f(e^2)_a\wedge(e^4)_b]\,[(e^2)_c(e_3)^d + (e^4)_c(e_2)^d]\,. \end{aligned} \tag{7.9.74}$$
This is a nonvanishing tensor, since at least one of the following components is nonvanishing
(the requirement for $f$ and $g$ is that $f^2 + g^2$ is nonvanishing):

$$R_{414}{}^1 = R_{abc}{}^d(e_4)^a(e_1)^b(e_4)^c(e^1)_d = -f\,, \qquad R_{424}{}^1 = R_{abc}{}^d(e_4)^a(e_2)^b(e_4)^c(e^1)_d = -g\,.$$

This indicates that $(\mathbb{R}^4, g_{ab})$ is not a flat spacetime. It is easy to find the Ricci tensor from
(7.9.74):
$$R_{ac} = R_{abc}{}^b = (f - f)(e^4)_a(e^4)_c = 0\,,$$
and thus $g_{ab}$ is a solution to the vacuum$^{21}$ Einstein equation.

For later use, we can also derive $R_{abcd}$ from (7.9.74); see the following proposition:

Proposition 7.9.5

$$\begin{aligned} R_{abcd} ={}& [f(e^1)_a\wedge(e^4)_b + g(e^2)_a\wedge(e^4)_b]\,(e^4)_c\wedge(e^1)_d \\ & + [g(e^1)_a\wedge(e^4)_b - f(e^2)_a\wedge(e^4)_b]\,(e^4)_c\wedge(e^2)_d\,. \end{aligned} \tag{7.9.75}$$

Proof Exercise 7.11. Hint: use $R_{abcd} = g_{de}R_{abc}{}^e$, and notice that

$$g_{de}(e_3)^e \equiv (e_3)_d = g_{3\mu}(e^\mu)_d = g_{34}(e^4)_d = -(e^4)_d\,, \qquad g_{de}(e_1)^e \equiv (e_1)_d = g_{11}(e^1)_d = (e^1)_d\,.$$

Given the importance of the null vector $K^a$ in the propagation of gravitational waves, let us
prove the following proposition:

21 In fact, this equation is $R_{ac} = -(\partial_1\partial_1 P + \partial_2\partial_2 P)(e^4)_a(e^4)_c$. $P$ taking the specific form in
(7.9.64) makes $\partial_1\partial_1 P = f = -\partial_2\partial_2 P$, which assures $R_{ac} = 0$.

Proposition 7.9.6 Suppose $\nabla_b$ is the torsion-free derivative operator associated with the
$g_{ab}$ in (7.9.65), then $\nabla_b K^a = 0$.

Proof Adopt the tetrad in (7.9.67) as well as its dual tetrad (7.9.70) and notice that
$K^a = (e_3)^a$. It follows from (5.7.4) that $\omega_3{}^\nu{}_a = -\gamma^\nu{}_{3\tau}(e^\tau)_a$, $\nu = 1, 2, 3, 4$. Since the
nonvanishing $\omega_\mu{}^\nu{}_a$ are shown in (7.9.72), we have $\omega_3{}^\nu{}_a = 0$, $\nu = 1, 2, 3, 4$. Thus, from the
above equation we get $\gamma^\nu{}_{3\tau} = 0$, $\nu, \tau = 1, 2, 3, 4$, and hence it follows from (5.7.1) that

$$(e_\tau)^b\nabla_b(e_3)^a = \gamma^\nu{}_{3\tau}(e_\nu)^a = 0\,, \qquad \tau = 1, 2, 3, 4\,.$$

Since $(e_\tau)^b$ is an arbitrary basis vector, the above equation indicates that $\nabla_b(e_3)^a = 0$.
Noticing that $(e_3)^a = K^a$, we have $\nabla_b K^a = 0$.

From Proposition 7.9.6 we obtain $K^b\nabla_b K^a = 0$ and $\nabla_{(a}K_{b)} = 0$, and thus ① the integral
curves of $K^a$ are (null) geodesics; ② $K^a$ is a Killing vector field.
The above discussions are purely mathematical. Physically speaking, the curved spacetime
$(\mathbb{R}^4, g_{ab})$ represents a gravitational plane wave. It follows from (7.9.65) that $P$ is
the only available quantity that determines $(\mathbb{R}^4, g_{ab})$, and thus the first thing we should
investigate when studying gravitational waves is the function $P(x, y, u)$. To facilitate
understanding, we first look at a simple example. Suppose $f(u)$ and $g(u)$ can be expressed as

$$f(u) = F\cos\omega u\,, \qquad g(u) = G\cos\omega u\,, \tag{7.9.76}$$

where $F$, $G$ and $\omega$ are positive constants, then by (7.9.64)

$$2P(x, y, u) = [F(x^2 - y^2) + 2Gxy]\cos(\omega t - kz) \quad (\text{where } k \equiv \omega). \tag{7.9.77}$$

The allure of the above equation is that it looks like some kind of monochromatic plane
wave. However, notice that although $(\partial/\partial t)^a$ and $(\partial/\partial z)^a$ are respectively timelike and
spacelike vector fields when measured by $\eta_{ab}$, this is not necessarily true when measured by
$g_{ab}$. If $(\partial/\partial t)^a$ were not timelike or $(\partial/\partial z)^a$ were not spacelike, one could not treat $t$ and $z$ as
time and spatial coordinates, and the wave interpretation of (7.9.77) would become unclear.
Fortunately, it can be proved that there indeed exist certain spacetime regions in $(\mathbb{R}^4, g_{ab})$,
where $(\partial/\partial t)^a$ and $(\partial/\partial z)^a$ are timelike and spacelike when measured by $g_{ab}$, and thus at
least in these regions we can interpret (7.9.77) as a monochromatic gravitational plane wave
propagating along the z-direction at the speed of light $c = 1$. The product of $K^a$ defined by
(7.9.66) and $\omega$ can be interpreted as the wave 4-vector $\omega K^a$, since (7.9.66) indicates that
the time and spatial components of $\omega K^a$ in the coordinate system $\{t, x, y, z\}$ are the angular
frequency $\omega$ and the wave 3-vector $\vec{k}$ measured in this system:

$$\omega K^0 = \omega\,, \qquad \omega K^1 = \omega K^2 = 0\,, \qquad \omega K^3 = k = \omega\,.$$

$K^a$ (and hence $\omega K^a$) being null reflects the fact that the phase $\omega u$ of the above gravitational
wave propagates at the speed of light, see Fig. 7.15 (in which $K^a$ should be substituted
by $\omega K^a$). Suppose $G_1$ and $G_2$ are two inertial observers (measured by $\eta_{ab}$), whose spatial
coordinates are $(x, y, z_1)$ and $(x, y, z_2)$, respectively. They have different phases at the time
$t_1$, which are $\omega t_1 - kz_1$ and $\omega t_1 - kz_2$. Suppose after some amount of time $t_2 - t_1$, $G_2$
“acquires” the phase of $G_1$ at $t_1$, i.e.,

$$\omega t_2 - kz_2 = \omega t_1 - kz_1\,,$$

then we say that the value of the phase $\omega t_1 - kz_1$ propagates from $G_1$ to $G_2$ in a time interval
$t_2 - t_1$, and so the speed of the propagation is
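$$v = \frac{z_2 - z_1}{t_2 - t_1} = \frac{\omega}{k} = 1\,.$$

Fig. 7.17 The phase value $\omega t_1 - kz_1$ of the observer $G_1$ at $t_1$ propagates to $G_2$ after $t_2 - t_1$ [diagram: the world lines of $G_1$ and $G_2$, with the null vector $K^a$ pointing from $p_1 = (t_1, x, y, z_1)$ to $p_2 = (t_2, x, y, z_2)$]
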
Thus, the speed of the propagation of gravitational waves is the speed of light. (This is
only the coordinate speed; what is more meaningful in the geometric language is the phase
velocity. The wavefront being null in the 4-dimensional language assures that this phase
velocity is the speed of light.) Figure 7.17 is a 4-dimensional illustration of this discussion,
in which $\gamma$ is an integral curve of the null 4-vector $K^a$ (a null geodesic), and $p_1$ and $p_2$ are
the intersections of $\gamma$ and the world lines of $G_1$ and $G_2$. The phase value $\omega t_1 - kz_1$ at $p_1$ is
“acquired” by $G_2$ at $p_2$: the phase propagates from $p_1$ to $p_2$ along the null geodesic. Note
that the physical interpretation above can only apply to certain regions of $(\mathbb{R}^4, g_{ab})$,
where $(\partial/\partial t)^a$ is timelike and $(\partial/\partial z)^a$ is spacelike. However, now we can pull out the
non-intrinsic factors such as observers and coordinates and only leave the null geodesic $\gamma$ and
two arbitrary points $p_1$ and $p_2$ on it. In this way, the wave interpretation can be carried
over to the whole spacetime. In fact, $K^a$ represents the direction of the propagation of all
the information (not only the phase) of the gravitational wave. The reason is as follows: as
$K^a$ is a Killing vector field, its corresponding one-parameter group of diffeomorphisms is
a one-parameter group of isometries, and the integral curves of $K^a$ are exactly the orbits
of this isometry group. Suppose $U_2$ is an arbitrary neighborhood of $p_2$ (see Fig. 7.18),
then there must exist a neighborhood $U_1$ of $p_1$ and an isometry $\phi : U_1 \to U_2$ such that
$p_2 = \phi(p_1)$. Therefore, any information about the gravitational wave in $U_2$ is completely
contained in $U_1$ (due to the isometry). In this sense, we can say that all the information
of the gravitational wave propagates along $K^a$ (at the speed of light). This interpretation
based on the isometries can be applied to not only the special case in (7.9.76), but also
the $g_{ab}$ defined by (7.9.64) [in which $f(u)$ and $g(u)$ are arbitrary] and (7.9.65). Hence,
we say that there exists a gravitational plane wave in the spacetime $(\mathbb{R}^4, g_{ab})$, or refer to
$(\mathbb{R}^4, g_{ab})$ as a gravitational plane wave spacetime. Sachs and Wu (1977) also provide
a deeper argument for this gravitational plane wave interpretation from the perspective of
group theory by comparing it with the electromagnetic plane waves in Minkowski spacetime.
Furthermore, Proposition 7.9.6 indicates that $g_{ab}$ is a pp-wave.$^{22}$ When $f$ and $g$ are linearly
dependent, $(\mathbb{R}^4, g_{ab})$ is called a monochromatic gravitational plane wave spacetime.

22 In fact, the metric for any pp-wave can be expressed in the Brinkmann coordinate system in the
following general form:

$$ds^2 = 2P(u, x, y)\,du^2 - 2\,du\,dv + dx^2 + dy^2\,,$$

where $P$ is an arbitrary smooth function. It is not difficult to see that this is equivalent to (7.9.65)
(by setting $v = \frac{t+z}{2}$, so that $-2\,du\,dv = -dt^2 + dz^2$), and taking $P$ to be of the form (7.9.64) is just a special case.
Fig. 7.18 $K^a$ carries the information of the gravitational wave in $U_1$ to $U_2$ faithfully [diagram: the neighborhoods $U_1$ of $p_1$ and $U_2$ of $p_2$, connected along $K^a$]

To further understand the gravitational wave of $(\mathbb{R}^4, g_{ab})$, we supplement the above with
the following propositions and remarks. For generality, we do not put any constraint on the
form of the function $P(x, y, u)$ in the following two propositions.

Proposition 7.9.7 Let $\nabla_a$ represent the derivative operator associated with $g_{ab}$ in (7.9.65),
then
$$\nabla^a\nabla_a P = \frac{\partial^2 P}{\partial x^2} + \frac{\partial^2 P}{\partial y^2}\,. \tag{7.9.78}$$

Proof Exercise 7.14.

Remark 1 Given a function $Q(t, x, y, z)$ in Minkowski spacetime, the equation

$$\partial^\mu\partial_\mu P(t, x, y, z) = Q(t, x, y, z) \tag{7.9.79}$$

is called in mathematical physics a wave equation (with source) for the function $P(t, x, y, z)$;
the physical quantity $P(t, x, y, z)$ satisfying this equation represents some kind of wave
motion. $\nabla^a\nabla_a P$ on the left-hand side of (7.9.78) can also be written as $g^{\mu\nu}\nabla_\mu\nabla_\nu P$; when
$g_{ab} = \eta_{ab}$ it goes back to $\partial^\mu\partial_\mu P$. Thus, $\nabla^a\nabla_a P$ is the generalization of $\partial^\mu\partial_\mu P$ in curved
spacetime, and hence (7.9.78) represents some kind of wave motion of the physical quantity
$P(x, y, u)$ in the curved spacetime $(\mathbb{R}^4, g_{ab})$. When $P$ takes the form of (7.9.64), we have

$$\frac{\partial^2 P}{\partial x^2} + \frac{\partial^2 P}{\partial y^2} = 0\,,$$

and hence $\nabla^a\nabla_a P = 0$, i.e., the $P(x, y, u)$ in (7.9.64) is a solution to the source-free wave
equation in curved spacetime. Together with $R_{ac} = 0$ (i.e., $g_{ab}$ satisfies the vacuum Einstein
equation), we can see the legitimacy of the statement “the curved spacetime $(\mathbb{R}^4, g_{ab})$
represents a gravitational wave in vacuum”. This also shows (at least partially) the motivation
for taking $P$ to be of the form in (7.9.64).
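
By (7.9.78), the wave operator acting on $P$ reduces to the transverse Laplacian, so the source-free condition can be checked with simple central differences. The sketch below compares a profile of the form (7.9.64), with $f$ and $g$ chosen as in (7.9.76), against a profile outside that family. The numerical constants, the evaluation point and the function names are all hypothetical.

```python
import math

def profile(f, g):
    """P(x, y, u) of (7.9.64) for given functions f(u) and g(u)."""
    return lambda x, y, u: 0.5 * f(u) * (x * x - y * y) + g(u) * x * y

def transverse_laplacian(P, x, y, u, step=1e-3):
    """Central-difference estimate of d^2P/dx^2 + d^2P/dy^2 at fixed u."""
    pxx = (P(x + step, y, u) - 2 * P(x, y, u) + P(x - step, y, u)) / step**2
    pyy = (P(x, y + step, u) - 2 * P(x, y, u) + P(x, y - step, u)) / step**2
    return pxx + pyy

# Hypothetical sample profiles f = F cos(omega u), g = G cos(omega u), as in (7.9.76).
F, G, omega = 1.3, 0.7, 2.0
P_wave = profile(lambda u: F * math.cos(omega * u), lambda u: G * math.cos(omega * u))

def P_other(x, y, u):
    """A profile outside the family (7.9.64), used only for contrast."""
    return math.cos(omega * u) * (x * x + y * y)

lap_wave = transverse_laplacian(P_wave, 0.4, -1.1, 0.25)
lap_other = transverse_laplacian(P_other, 0.4, -1.1, 0.25)
print(lap_wave, lap_other)
```

The first value vanishes up to rounding error, while the second does not; according to footnote 21, a profile of the second kind would give a nonvanishing Ricci tensor and hence would not solve the vacuum equation.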

Proposition 7.9.8 The constant-$u$ surfaces in $(\mathbb{R}^4, g_{ab})$ are null hypersurfaces.

Proof It follows from (7.9.67) that $K_a = g_{ab}K^b = g_{ab}(e_3)^b$. Following the derivation in
(2.6.10a) we get $g_{ab}(e_3)^b = g_{3\mu}(e^\mu)_a$, and hence

$$K_a = g_{3\mu}(e^\mu)_a = g_{34}(e^4)_a = -(e^4)_a = -\nabla_a u\,,$$

where in the last step we used (7.9.70). Noticing that $\nabla_a u$ is a normal covector of a constant-$u$
surface, we can see that its normal vector $\nabla^a u = -K^a$ is null.

Remark 2 In the special case of (7.9.76), $\omega u = \omega t - kz$ represents the phase of the wave,
while $\omega$ is a constant, and hence a constant-$u$ surface is a 3-dimensional wavefront $S$ in
the 4-dimensional language. $S$ being a null hypersurface indicates that the gravitational
wave in (7.9.76) propagates at the speed of light. Proposition 7.9.8 guarantees that the
constant-$u$ surfaces are still null hypersurfaces (still have $K^a$ as the normal vector) for
general $P = P(x, y, u)$. Therefore, one may regard $u$ as some kind of (generalized) phase,
and the constant-$u$ surfaces being null hypersurfaces indicates that the phase velocity of the
gravitational wave represented by this general $P(x, y, u)$ is still the speed of light.

[The End of Optional Reading 7.9.4]

7.9.3 Emission of Gravitational Waves

Now we introduce the emission of gravitational waves. First, let us make a comparison
with electromagnetic waves. If a charged particle in a system moves with a non-uniform
velocity (relative to an inertial frame), it will emit electromagnetic waves.
As is well-known, the major contribution to the radiation field comes from electric
dipole radiation, which is much stronger than the magnetic dipole radiation and
electric quadrupole radiation (these two are of the same order). Similarly, under the
Newtonian approximation, if a point mass in a system moves with a non-uniform
velocity, it will emit gravitational waves. What corresponds to the electric dipole
moment is the mass dipole moment

$$\vec{D} = \sum_P m_P\,\vec{r}_P\,, \tag{7.9.80}$$

where $m_P$ and $\vec{r}_P$ are the mass and position vector of the point mass $P$, and the
right-hand side of the above equation is summed over all the point masses in the system.
Since the intensity of electric dipole radiation is proportional to the square of the
second order time derivative of the electric dipole moment, one may expect that the
contribution from the mass dipole moment to the intensity of gravitational radiation
is proportional to the square of $\ddot{\vec{D}}$. However, from (7.9.80) we can see that $\dot{\vec{D}} = \sum_P m_P\,\dot{\vec{r}}_P$ is
equal to the total momentum $\vec{p}$ of the system; it follows from the conservation of
momentum that $\dot{\vec{p}} = 0$, and thus $\ddot{\vec{D}} = 0$, i.e., gravitational waves do not include
gravitational dipole radiation corresponding to electric dipole radiation. According to
the theory of electromagnetic radiation, the intensity of magnetic dipole radiation is
proportional to the square of the second order time derivative of the magnetic dipole
moment. The quantity in a gravitational system corresponding to the magnetic dipole
moment is

$$\vec{\mu} = \sum_P \vec{r}_P \times (m_P\,\vec{u}_P)\,,$$

where $\vec{u}_P$ is the velocity of the point mass $P$, and $m_P\,\vec{u}_P$ is the current contribution
of $P$. The right-hand side of the above equation is nothing but the total angular
momentum of the system. It follows from the conservation law of angular momentum
that $\dot{\vec{\mu}} = 0$, and hence gravitational waves do not include gravitational dipole
radiation corresponding to magnetic dipole radiation either. In short, there does not
exist any dipole radiation in gravitational waves. One can only get a nonvanishing
result when studying quadrupole radiation [see Misner et al. (1973) pp. 974–978 for
details]. Since the order of quadrupole radiation is higher than dipole radiation, the
gravitational waves emitted from a gravitational system are weaker than the
electromagnetic waves emitted by an electromagnetic system in a similar condition.
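
The absence of dipole radiation can be illustrated with a toy Newtonian binary. In the sketch below, two point masses move on circular orbits about their common center of mass (an assumption made for simplicity); the mass dipole moment (7.9.80) and its time derivative, the total momentum, stay constant, while a component of the second mass moment oscillates. That second moment stands in for the quadrupole moment here (the traceless part is not taken), and all names, masses and units are hypothetical.

```python
import math

def binary_state(m1, m2, separation, Omega, t):
    """Positions and velocities of two point masses on circular orbits about their center of mass."""
    M = m1 + m2
    r1, r2 = separation * m2 / M, separation * m1 / M
    c, s = math.cos(Omega * t), math.sin(Omega * t)
    pos = [(r1 * c, r1 * s), (-r2 * c, -r2 * s)]
    vel = [(-r1 * Omega * s, r1 * Omega * c), (r2 * Omega * s, -r2 * Omega * c)]
    return pos, vel

def mass_dipole(masses, pos):
    """D = sum_P m_P r_P, as in (7.9.80)."""
    return tuple(sum(m * r[i] for m, r in zip(masses, pos)) for i in range(2))

def total_momentum(masses, vel):
    """dD/dt = sum_P m_P dr_P/dt, the total momentum."""
    return tuple(sum(m * v[i] for m, v in zip(masses, vel)) for i in range(2))

def second_moment_xx(masses, pos):
    """xx-component of the second mass moment sum_P m_P x_P x_P."""
    return sum(m * r[0] * r[0] for m, r in zip(masses, pos))

masses = (3.0, 1.0)    # hypothetical masses in arbitrary units
Omega, sep = 2.0, 1.5  # hypothetical orbital angular frequency and separation
samples = [binary_state(masses[0], masses[1], sep, Omega, t) for t in (0.0, 0.3, 0.6)]
dipoles = [mass_dipole(masses, pos) for pos, _ in samples]
momenta = [total_momentum(masses, vel) for _, vel in samples]
q_xx = [second_moment_xx(masses, pos) for pos, _ in samples]
print(dipoles, momenta, q_xx)
```

In the center-of-mass frame both the dipole moment and the momentum are zero at every sampled time, whereas `q_xx` changes from sample to sample, so only the quadrupole-type moment has a nonvanishing time dependence.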
The source emitting a strong gravitational wave is usually considered to be related
to a dramatic change of an astrophysical or cosmological process, such as the col-
lapse of a star that is not spherically symmetric,23 a supernova explosion (see Sect.
9.3.2), the dramatic disturbance inside an active galactic nucleus, the merger of
a pair of black holes or neutron stars, cosmic inflation (see Chap. 15), etc. [See
Cai et al. (2017) for a review of different sources of gravitational waves]. In these
cases the gravitational field is not weak, and thus the linear approximation is not
applicable. The rigorous analysis of these processes must involve the arduous task of
solving the nonlinear Einstein equation in a non-spherically symmetric case. The
emission of gravitational waves is still a problem that has not been fully compre-
hended. Nowadays, the understanding of this problem has been furthered with the
help of numerical analysis and computational simulation, which has developed into
an important branch called numerical relativity.

7.9.4 Detection of Gravitational Waves

Since general relativity predicts the physical existence of gravitational radiation, the
detection of gravitational waves becomes a significant subject. As we have discussed,
sources for the gravitational waves that reach the solar system are all very far away.
Hence, the gravitational waves being detected can be totally regarded as plane waves,
and they are so weak that the linear approximation is applicable. Unfortunately, this
also makes it very difficult to directly detect a gravitational wave on or near the Earth.
(The currently observed gravitational waves have amplitudes as small as $h \sim 10^{-21}$.)
Due to such a difficulty, there were no direct observations of gravitational waves until
2015, although Joseph Weber initiated the detection of gravitational waves early in
the 1960s. In the 20th century, evidence for the existence of gravitational waves
merely came from indirect detections, among which the most important one is the
observation of binary pulsars.

23 According to Birkhoff’s theorem (see Sect. 8.3.3), the spherical evolution of any spherically
symmetric star (such as collapse and oscillation) will not emit a gravitational wave no matter
how dramatic it is, just like there does not exist a spherically symmetric electromagnetic wave in
Maxwell’s theory. (The spherical wave of an oscillating electric dipole in a distant region is not a
spherically symmetric electromagnetic wave, since the fields E and B are not spherically symmetric.
In fact, a spherically symmetric electromagnetic wave corresponds to the radiation of an electric
monopole, but this kind of radiation does not exist in Maxwell’s theory).

A pulsar is a rapidly rotating neutron star (see Sect. 9.3.2), which has a mechanism
of emitting electromagnetic waves. If the Earth lies in the sweeping range of a beam
of radiation, then one can receive radio pulse signals with a precise period. An
approximately isolated gravitational system formed by two stars orbiting around
their center of mass is called a binary star, which emits gravitational waves due to
the accelerating motion of the two stars. Like electromagnetic waves, gravitational
waves carry energy and momentum as well as angular momentum when they are
emitted. As a consequence, the radii of the orbits of the stars become smaller and
smaller, and the period becomes shorter and shorter. However, unlike many other
astrophysical processes, the emission of gravitational waves from a binary system
is very weak, and so the linearized theory of gravity can be applied to calculate the
loss of energy and the change of the orbital period. In order to be detectable, these
effects need to satisfy at least two conditions: ① the orbit is sufficiently small (i.e.,
the two stars are close enough), such that the effect of general relativity is evident;
② a method for measuring the orbital period with rather high precision is available.
The binary pulsar PSR 1913+16 discovered by R. A. Hulse and J. H. Taylor in 1974
happens to satisfy these two conditions. [A binary pulsar is a binary that contains
a pulsar; PSR is the identifier for pulsars, while 1913 and +16 stand for its right
ascension and declination (angular coordinates).] The maximum distance between
the two stars in this binary is only about $3 \times 10^9$ m (about 4.8 solar radii), which
satisfies the condition ①; the pulsar in the binary makes it satisfy the condition ②:
since the period of the radio signal emitted from a pulsar is reputed to be “as precise
as the tick of a clock”, one can use this to record how its orbital period changes,
and compare with the result calculated from general relativity. If the observation
agrees with the calculation on account of gravitational waves, it will be evidence
for the existence of gravitational waves. Taylor and collaborators carried out this
observation with extraordinarily high accuracy and obtained the rate of change of
the orbital period. After thousands of observations, their results were announced in
1978 and agree very well with the predictions calculated from the quadrupole
radiation formula in the linearized theory of gravity. This was the first quantitative
evidence of gravitational waves ever since gravitational waves were proposed, even
though it was indirect evidence. Hulse and Taylor were awarded the 1993 Nobel
Prize in Physics for this discovery.
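
To give a feeling for the size of the effect, the predicted rate of change of the orbital period can be estimated numerically. The formula used below is the standard Peters-Mathews quadrupole result for two point masses on an eccentric Kepler orbit; it is not stated in this section and is brought in here as an assumption. The masses, orbital period and eccentricity are approximate published values for PSR 1913+16, likewise assumed for illustration, and the function name is hypothetical.

```python
import math

T_SUN = 4.925490947e-6  # G * M_sun / c^3 in seconds

def orbital_period_derivative(m1, m2, period, ecc):
    """dP_b/dt from quadrupole radiation; masses in solar masses, period in seconds."""
    # Eccentricity enhancement factor of the Peters-Mathews formula.
    enhancement = (1 + (73 / 24) * ecc**2 + (37 / 96) * ecc**4) / (1 - ecc**2) ** 3.5
    return (-(192 * math.pi / 5)
            * T_SUN ** (5 / 3)
            * (period / (2 * math.pi)) ** (-5 / 3)
            * enhancement
            * m1 * m2 / (m1 + m2) ** (1 / 3))

# Approximate parameters of PSR 1913+16 (assumed values, not taken from the text).
pdot = orbital_period_derivative(m1=1.44, m2=1.39, period=27907.0, ecc=0.617)
print(f"dP_b/dt = {pdot:.3e} (seconds per second)")
print(f"change of the orbital period per year = {pdot * 3.156e7 * 1e6:.1f} microseconds")
```

The result is negative, i.e. the period shortens, and its magnitude is tiny, which is why a clock as stable as a pulsar is needed to measure it.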
The first attempt to directly detect gravitational waves was started by Joseph Weber
at the University of Maryland in 1966. He designed a resonant mass antenna for
detecting gravitational waves, called the Weber bar. It is a suspended aluminum
cylinder with length 153 cm and diameter 66 cm, which has a resonance frequency
of 1660 Hz. When a gravitational wave near the resonant frequency passes through
the Weber bar in a proper direction, the resonance of the bar will be excited, which will
amplify the vibration and could potentially be detected by piezoelectric sensors if the
change of the bar’s length is large enough. After years of effort, Weber announced
that evidence of gravitational waves had been observed by detectors at two
different locations. Unfortunately, Weber’s observation could not be confirmed by
the experiments of any other group [Ohanian and Ruffini (1994); Liu and Zhao
(2004)].
Fig. 7.19 Schematic diagram of an interferometric detector. The light paths along the two arms change slightly when a gravitational wave comes by, which causes the interference of the composite signal

In the 1970s, there appeared another important type of gravitational wave detector,
namely a laser interferometer. An interferometer has two long orthogonal arms,
and the idea is similar to the resonant mass antenna, i.e., to detect the length pertur-
bations of its arms due to gravitational waves. However, the range of the detectable
frequencies of a laser interferometer is much wider instead of only near a resonance
frequency. Here we briefly review the principle of interferometers. An interferometer
consists of two mirrors and a beam splitter (see Fig. 7.19). When a laser beam is shot
to the beam splitter through the vertical arm in Fig. 7.19, part of it will be transmitted
while the remaining part will be reflected, and thus the laser will be divided into two
beams which propagate along the two arms of the interferometer. Each of the two
beams hits the mirror placed at the end of each arm and gets bounced back to the beam
splitter. After that the beams recombine and propagate towards the right through the
horizontal arm in Fig. 7.19, where they are received by the sensor at the end of that
arm. When there is no gravitational wave, the recombined beams are tuned
to have opposite phases (a crest meets a trough) by applying a waveplate, so that
the composite signal vanishes. However, when a gravitational wave comes by, the
lengths of the arms will change slightly (similar to the effect shown in Fig. 7.16) and
so the light paths of the two beams will change slightly, causing a nonvanishing com-
posite signal to be received by the sensor, called an interference signal. Therefore,
interference will be present when there is a gravitational wave passing by.
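
The dark-fringe idea can be made quantitative with an idealized model. The sketch below assumes a simple single-pass Michelson interferometer with a perfect 50/50 beam splitter and lossless mirrors, which is far simpler than a real detector; under these assumptions the fraction of the input power reaching the sensor is $\sin^2(2\pi\,\Delta L/\lambda)$ for an arm-length difference $\Delta L$. The wavelength and the sample length differences are assumed values, and the function name is hypothetical.

```python
import math

WAVELENGTH = 1064e-9  # metres; assumed laser wavelength, not given in the text

def dark_port_fraction(delta_l, wavelength=WAVELENGTH):
    """Fraction of the input power reaching the sensor for an arm-length difference delta_l."""
    # The round trip doubles the path difference, so the relative phase is 4*pi*delta_l/lambda.
    phase = 4 * math.pi * delta_l / wavelength
    # With the beams tuned to cancel at delta_l = 0, the output is sin^2(phase / 2).
    return math.sin(phase / 2) ** 2

no_wave = dark_port_fraction(0.0)
tiny_shift = dark_port_fraction(1e-12)  # hypothetical picometre-scale difference
quarter_wave = dark_port_fraction(WAVELENGTH / 4)
print(no_wave, tiny_shift, quarter_wave)
```

With equal arms the sensor sees nothing, and any small difference between the arm lengths lets a small amount of light through, which is the interference signal described above.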
Based on the above idea, research groups at MIT and Caltech began jointly
building the Laser Interferometer Gravitational-Wave Observatory (LIGO) in
the 1980s (early discussions and attempts on interferometric detectors began in the
late 1960s). After decades of preparation, LIGO started its first operation in 2002.
However, it was not sensitive enough to detect any gravitational wave successfully.
In 2010, LIGO was shut down and upgraded into an improved version, Advanced
LIGO, whose sensitivity is about ten times that of its previous version. The operation of
320 7 Foundations of General Relativity

LIGO restarted in 2015. At 09:50:45 UTC on 14 September 2015, LIGO made the
first direct observation of gravitational waves [Abbott et al. (2016)]. The signal of
this event was named GW150914; it came from a merger of two black holes that
occurred 1.3 billion light-years away, with the amplitude of γμν (the components of
γab in a Lorentzian coordinate system) being so small that it is equivalent to changing
a length of 4 km by a thousandth of the width of a proton. Due to this unprecedented
observation, three leaders of LIGO, Rainer Weiss, Barry Barish and Kip Thorne,
were awarded the 2017 Nobel Prize in Physics.
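To get a feeling for the size of the effect quoted above, the following minimal sketch converts "a thousandth of the width of a proton over 4 km" into a fractional length change. The proton width used here (about 1.7 × 10⁻¹⁵ m, roughly twice the proton charge radius) is an assumed order-of-magnitude value, not a number taken from the text.

```python
ARM_LENGTH_M = 4.0e3        # arm length quoted in the text
PROTON_WIDTH_M = 1.7e-15    # assumed proton width (order of magnitude only)

# "A thousandth of the width of a proton"
delta_l = PROTON_WIDTH_M / 1000.0

# Dimensionless fractional change of the arm length
fractional_change = delta_l / ARM_LENGTH_M

print(f"length change     ~ {delta_l:.1e} m")
print(f"fractional change ~ {fractional_change:.1e}")
```

Under these assumptions the fractional change is of order 10⁻²², which indicates how small the metric perturbation of a detectable gravitational wave is.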
To make precise detections, the LIGO observatory consists of two identical inter-
ferometers, located in Washington state and Louisiana state, USA, respectively. The
distance between them is about 3030 km over the Earth’s surface (the straight line
distance is about 3002 km). Besides making independent measurements, an impor-
tant utility of having two detectors far apart is to determine the location of the source
of the gravitational wave. Since the gravitational wave travels at the speed of light,
it takes at most about 10 ms to propagate from one LIGO interferometer to the other. In the
GW150914 event, the time delay between the two detectors was 7 ms. Using this time
delay, the source of the signal can be located through triangulation. This is exactly
the principle of how human ears identify the location of the source of a sound wave.
Interestingly, the signal of GW150914 has a frequency sweeping from 35 Hz to
250 Hz, which happens to lie inside the human audible range. In 2017, the Virgo
interferometer in Italy started to detect gravitational waves, which provides “a third
ear” for locating the source of the gravitational wave more precisely. Furthermore,
having two identical LIGO detectors also helps to extract the actual gravitational
wave signal from the noise. Since the detectors are extremely sensitive, any vibra-
tion from the local environment will be recorded, and one of the challenges of the
detection is to remove this noise. By comparing the signals obtained by the two
detectors located far apart, one can filter out the random vibrations that do not occur
at both sites, so that only the gravitational wave signal common to both remains. To
minimize the noise, LIGO also applies a series of mechanisms to isolate the vibrations,
including optics suspensions and seismic isolation, and many techniques in
the data analysis, such as matched filtering. The reader may refer to Saulson (2017)
for more technical details of noise reduction.
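The timing argument above can be checked with a short sketch. It assumes a plane wave travelling at the speed of light, so that the arrival-time difference is Δt = (d/c) cos θ, where d is the straight-line separation of the detectors and θ is the angle between the propagation direction and the line joining them. The function names are illustrative assumptions.

```python
import math

C_M_PER_S = 299_792_458.0   # speed of light
BASELINE_M = 3002e3         # straight-line separation quoted in the text

def max_delay_s(baseline_m: float = BASELINE_M) -> float:
    """Largest possible arrival-time difference (wave travelling along the baseline)."""
    return baseline_m / C_M_PER_S

def angle_to_baseline_deg(delay_s: float, baseline_m: float = BASELINE_M) -> float:
    """Angle between the propagation direction and the baseline for a plane wave."""
    cos_theta = C_M_PER_S * delay_s / baseline_m
    if abs(cos_theta) > 1.0:
        raise ValueError("delay exceeds the light travel time between the detectors")
    return math.degrees(math.acos(cos_theta))

print(f"maximum delay: {max_delay_s() * 1e3:.2f} ms")
print(f"angle for a 7 ms delay: {angle_to_baseline_deg(7e-3):.1f} degrees")
```

A single time delay fixes only this angle, i.e., it confines the source to a ring on the sky around the baseline; this is one way to see why a third detector improves the localization.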
Since the first direct observation in 2015, there have already been numerous events
of direct observation of gravitational waves, mainly detected by LIGO and Virgo.
Moreover, more and more gravitational wave detectors are becoming
available or are under preparation. For example, KAGRA (Kamioka Gravitational
Wave Detector) started its observation in 2020. Also, third-generation interferometric
detectors with longer arms and a greater sensitivity, such as the Einstein Telescope
and Cosmic Explorer, have been proposed and are expected to be available in the
2030s. Besides the ground-based interferometers, there are also multiple on-going
projects for space-based interferometric detectors, such as LISA (Laser Interferom-
eter Space Antenna), TianQin, Taiji, and DECIGO (Deci-hertz Interferometer Grav-
itational wave Observatory), where the long arms are replaced by the laser beams
between spacecraft. Once available, they will be used to detect low-frequency gravitational
waves. In addition to interferometric detectors, there are also other methods
Table 7.1 Methods of gravitational wave detection and their frequency bands

Frequency band             Frequency range         Detection method               Current and future observatories
High-frequency             10 Hz–10^6 Hz           Ground-based interferometer    LIGO, Virgo, KAGRA, Einstein Telescope, Cosmic Explorer
Low-frequency              10^-7 Hz–10 Hz          Space-based interferometer     LISA, TianQin, Taiji, DECIGO
Very-low-frequency         10^-10 Hz–10^-7 Hz      Pulsar timing array            IPTA
Extremely-low-frequency    10^-18 Hz–10^-14 Hz     CMB polarization               BICEP, AliCPT

of detecting gravitational waves: pulsar timing arrays [e.g., IPTA
(International Pulsar Timing Array)] can detect very-low-frequency gravitational
waves, and measurements of the polarization pattern of the cosmic microwave
background (CMB) [e.g., BICEP (Background Imaging of Cosmic Extragalactic
Polarization), AliCPT (Ali CMB Polarization Telescope)] can detect extremely-
low-frequency gravitational waves, including the primordial gravitational waves
generated in the early universe (see Sect. 10.3). For a detailed introduction to the methods
of the pulsar timing array and CMB polarization, see, for example, Maggiore (2018).
The above-mentioned detecting methods and their corresponding frequency bands
are summarized in Table 7.1 [see also Chen et al. (2017)].
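As a small illustration of how the frequency bands of Table 7.1 partition the spectrum, the sketch below looks up which detection method covers a given frequency. The band limits are those listed in the table; the data structure and the function name are illustrative assumptions.

```python
# (band name, lower limit in Hz, upper limit in Hz, detection method), from Table 7.1
BANDS = [
    ("Extremely-low-frequency", 1e-18, 1e-14, "CMB polarization"),
    ("Very-low-frequency", 1e-10, 1e-7, "Pulsar timing array"),
    ("Low-frequency", 1e-7, 10.0, "Space-based interferometer"),
    ("High-frequency", 10.0, 1e6, "Ground-based interferometer"),
]

def detection_methods(frequency_hz: float) -> list:
    """Return (band, method) pairs whose frequency range contains frequency_hz."""
    return [
        (name, method)
        for name, low, high, method in BANDS
        if low <= frequency_hz <= high
    ]

# GW150914 swept through roughly 35 Hz to 250 Hz.
print(detection_methods(100.0))
# A millihertz signal falls in the band targeted by space-based detectors.
print(detection_methods(1e-3))
# Between 1e-14 Hz and 1e-10 Hz the table lists no method.
print(detection_methods(1e-12))
```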
The observation of gravitational waves is significant not only because it con-
firmed the last undetected prediction of general relativity, but more importantly, it
also opened up a brand new window for observing the universe. Traditionally, people
could only make astronomical observations by detecting the electromagnetic waves in
different frequency bands. Now that gravitational waves can also be directly detected,
it enables more possibilities for astronomical observation. For example, since the
electromagnetic field interacts with matter, the electromagnetic waves from a distant
celestial object can be easily scattered or absorbed during the propagation. However,
the interaction between gravitational waves and matter is much weaker, so it is
possible to observe celestial events we could not observe before (like the binary black
hole merger of GW150914). Furthermore, the earliest electromagnetic radiation we
can observe is the cosmic microwave background radiation emitted when photon
decoupling occurred (see Sect. 10.3), but through gravitational waves it becomes possible to
observe the universe at even earlier times. With these prospects, gravitational-wave
astronomy is currently emerging, and hopefully it will lead to more revolutionary
discoveries of the universe in the near future.
[Optional Reading 7.9.5]
Using the example of gravitational plane waves in Optional Reading 7.9.2, we will now
introduce the mechanism of receiving gravitational waves in a geodesic reference frame
(where the world lines of the observers are geodesics) [see also Sachs and Wu (1977)]. In a
vibrating mechanical detector like a Weber bar, each molecule of the aluminum bar can be
considered as an observer, and the bar can be viewed as a reference frame in a sub-spacetime
of $(\mathbb{R}^4, g_{ab})$. Since there also exist non-gravitational interactions between the molecules,
the world lines of the molecules are not geodesics. However, in practice one can still use a
geodesic reference frame (which is the simplest choice). This is because the response to the
gravitational waves in the reference frame of the bar can be derived from the response in the
geodesic frame through Newtonian mechanics and solid state physics [see Weber (1961)].
The relative acceleration of two neighboring observers in a geodesic reference frame
under the action of the spacetime curvature is the tidal acceleration (see Sect. 7.6). Under
the action of the gravitational wave in (7.9.77), the magnitude and direction of the tidal
acceleration will change periodically, which leads to a relative oscillation between two
neighboring observers. Taking a geodesic $\gamma(\tau)$ as the fiducial observer, let us compute the
tidal 3-acceleration $a^c$ of the neighboring observers around this observer. Suppose $p \in \gamma$,
$Z^a$ is the 4-velocity of $\gamma$ at $p$ (namely the unit tangent vector of $\gamma$), and $W_p$ is the
3-dimensional subspace of the tangent space $V_p$ at $p$ which is orthogonal to $Z^a$ (in a picture
it would be a small plane orthogonal to $Z^a$); then a spatial separation vector $w^a \in W_p$ represents a
neighboring observer (Sect. 7.6).24 The tidal acceleration $a^c$ of the observer corresponding to
$w^a$ relative to the fiducial observer $\gamma(\tau)$ is given by the geodesic deviation equation (7.6.8):

$$a^c = -R_{abd}{}^{c} Z^a w^b Z^d . \tag{7.9.81}$$
For every $w^b \in W_p$, the above equation determines an $a^c \in W_p$, and thus it defines
a linear map $\psi : W_p \to W_p$. From the “multifaceted view of tensors” (see Sect. 2.4) we can
see that $\psi$ can be viewed as a tensor of type (1, 1) on $W_p$, denoted by $\psi^c{}_b$, i.e.,

$$a^c = \psi^c{}_b w^b . \tag{7.9.82}$$
Comparing with (7.9.81) yields

$$\psi^c{}_b = -R_{abd}{}^{c} Z^a Z^d . \tag{7.9.83}$$
In order to compute $\psi^c{}_b$, one can first choose a convenient orthonormal triad $\{(E_i)^a\}$:

$$\begin{aligned}
(E_1)^a &= (\partial/\partial x)^a + E^{-1} Z_1 K^a , \\
(E_2)^a &= (\partial/\partial y)^a + E^{-1} Z_2 K^a , \\
(E_3)^a &= E^{-1} K^a - Z^a ,
\end{aligned} \tag{7.9.84}$$

where $E \equiv -g_{ab} Z^a K^b > 0$, $Z_1 \equiv g_{ab} Z^a (\partial/\partial x)^b = Z_b (\partial/\partial x)^b$ (and hence $Z_1$ is a
coordinate component of $Z_b$ instead of a frame component), and $Z_2 \equiv g_{ab} Z^a (\partial/\partial y)^b =
Z_b (\partial/\partial y)^b$. The reader should verify that: ① $\{(E_i)^a\}$ is indeed orthonormal as measured by
$g_{ab}$; ② $(E_3)^a$ is the result of normalizing $h^a{}_b K^b = K^a + Z^a Z_b K^b$, namely the projection
of $K^a$ at $p$ onto $W_p$; ③ $\{(E_i)^a\}$ is parallelly transported (and thus is Fermi transported)
along a geodesic. [Hint for the proof: It follows from γ (τ ) being geodesic and ∇a K b = 0
that E is a constant along the curve, from which one can easily show that Z b ∇b (E 3 )a = 0.
Noticing that ∇b (∂/∂ x)a = −K a ω1 3 b , one can show that Z b ∇b (E 1 )a = 0]. Let S be
the wavefront that includes $p \in \gamma$ (see the null hypersurface in Fig. 7.20), $\hat{S}$ represent
the 3-dimensional subspace formed by all the elements in $V_p$ tangent to $S$, and
$S_p \equiv \hat{S} \cap W_p = \{w^a \in W_p \mid g_{ab} w^a K^b = 0\}$; then $\{(E_1)^a, (E_2)^a\}$ is a basis of $S_p$. Since
in a picture we always draw a subspace (e.g., $W_p$) as a small plane (draw a subspace of $V_p$ as
a subspace of $M$), there is no difference between $\hat{S}$ and $S$ in Fig. 7.20. The physical meaning

24 More precisely, $w^a$ only gives the direction of the “separation”; it is really $w^a s$ (where $s$ is
small) that determines a neighboring observer in this direction, see Sect. 7.6.

Fig. 7.20 In the view of a geodesic observer $\gamma(\tau)$, the gravitational wave passes by along the spatial direction $(E_3)^a$ at $p$; the wavefront $S_p$ is orthogonal to $(E_3)^a$

of the mathematical settings above is very clear: in the view of a geodesic observer $\gamma(\tau)$, the
gravitational wave passes by along the spatial direction $(E_3)^a$, and the 2-dimensional wavefront
$S_p$ is orthogonal to the direction of propagation $(E_3)^a$ (see Fig. 7.20). The components
of $\psi^c{}_b$ in the triad $\{(E_i)^a\}$ are

$$\psi^i{}_j = \psi^c{}_b (E^i)_c (E_j)^b = \psi_{cb} (E^i)^c (E_j)^b = \psi_{cb} (E_i)^c (E_j)^b = -R_{abcd} Z^a (E_j)^b Z^c (E_i)^d , \tag{7.9.85}$$
where we used the property $(E^i)^c = \delta^{ij} (E_j)^c = (E_i)^c$ of an orthonormal frame. Plugging
(7.9.84) and the $R_{abcd}$ in (7.9.75) into the equation above yields the matrix of $\psi^i{}_j$:
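$$(\psi^i{}_j) = \begin{bmatrix} \alpha & \beta & 0 \\ \beta & -\alpha & 0 \\ 0 & 0 & 0 \end{bmatrix} , \qquad \alpha \equiv -E^2 f , \quad \beta \equiv -E^2 g . \tag{7.9.86}$$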

The derivation of the above equation is left as Exercise 7.13. Hints:

(1) Make use of $(e^1)_a (E_2)^a = (e^2)_a (E_1)^a = (e^4)_a (E_1)^a = (e^4)_a (E_2)^a = 0$;
(2) $(e^4)_a Z^a = g_{ab} Z^a (e^4)^b = g_{ab} Z^a g^{43} (e_3)^b = -g_{ab} Z^a K^b = E$, where $g^{43}$ is a component
of $g^{ab}$ in the frame $\{(e_\mu)^a\}$, see (7.9.68);
(3) $(e^4)_a (E_3)^a = (e^4)_a [E^{-1} (e_3)^a - Z^a] = -(e^4)_a Z^a = -E$.
Now we discuss the physical meaning of (7.9.86). Suppose $\gamma(\tau)$ is the fiducial observer,
$p \in \gamma$, and $Q$ is a sphere of small radius centered at $p$ in the orthogonal “small plane” $W_p$;
then each point on the sphere represents the behavior of a neighboring
observer at the moment $p$ (see Fig. 7.21). Using (7.9.82) and (7.9.86), let us discuss the
tidal acceleration $a^c$ of these neighboring observers relative to $\gamma(\tau)$ under the action of
gravitational waves. Each point on the sphere corresponds to a $w^b$. Suppose its components
in the orthonormal triad $\{(E_i)^a\}$ are $w^1, w^2, w^3$; then the column matrix constituted by the
components of its 3-acceleration is

$$\begin{bmatrix} a^1 \\ a^2 \\ a^3 \end{bmatrix} = \begin{bmatrix} \alpha & \beta & 0 \\ \beta & -\alpha & 0 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} w^1 \\ w^2 \\ w^3 \end{bmatrix} . \tag{7.9.87}$$

Fig. 7.21 Each point on the small sphere $Q$ in the orthogonal plane $W_p$ with $p$ as the center represents the behavior of a neighboring observer at the same time as the event $p$

If $w^1 = w^2 = 0$, i.e., $w^a$ is parallel to $(E_3)^a$, the direction of the gravitational wave
propagation, then it follows from (7.9.87) that $a^1 = a^2 = a^3 = 0$, and thus this kind of
neighboring observer has no 3-acceleration at all. This is a physical manifestation of the
transverseness of gravitational waves: a neighboring observer in the longitudinal direction
(which is parallel to the direction of propagation) will experience nothing, and only the
neighboring observers in the transverse direction will be affected. In other words, all the
tidal accelerations are orthogonal to the direction of propagation $(E_3)^a$, and thus lie in the
wavefront $S_p$ in Fig. 7.20. Hence, we only care about the transverse response, i.e., simplify
(7.9.87) as
$$\begin{bmatrix} a^1 \\ a^2 \end{bmatrix} = \begin{bmatrix} \alpha & \beta \\ \beta & -\alpha \end{bmatrix} \begin{bmatrix} w^1 \\ w^2 \end{bmatrix} , \tag{7.9.88}$$
i.e., only care about the response of the points on a small circle in the 2-dimensional subspace
spanned by (E 1 )a and (E 2 )a . Take 8 representative points A, B, C, D, E, F, G, H on the
circle (see Fig. 7.22). Let us discuss the following two special cases: (a) β ≡ 0, α > 0; (b)
α ≡ 0, β > 0. From a straightforward calculation one can obtain the results in Table 7.2 and
Fig. 7.22. The deformation shown in Fig. 7.22 is called a shear (see Sect. 14.1 for details).
Table 7.2 and Fig. 7.22 only reflect the tidal acceleration of the circle (and the trend of its
deformation) at a certain moment. To figure out the situation of the deformation (oscillation)
of the circle in a period of time, one needs the specific form of the functions $f(u)$ and
$g(u)$. We still only discuss the case of $f(u) = F \cos(\omega t - kz)$ and $g(u) = G \cos(\omega t - kz)$.
Equation (7.9.86) indicates that the direct factors that determine the tidal acceleration
are $E^2 f$ and $E^2 g$ instead of $f$ and $g$. However, it follows from $K^a$ being geodesic (see
below Proposition 7.9.6) that $E$ is a constant on the geodesic $\gamma(\tau)$, and thus what the tidal
acceleration reflects is also the values of $f$ and $g$. Moreover, since the $u$ on the geodesic $\gamma(\tau)$
and the proper time $\tau$ have a linear relation $\mathrm{d}u/\mathrm{d}\tau = E$ (the proof is left as an exercise), the
$a^i$-$\tau$ curve measured by the observer reflects the $f$-$u$ or $g$-$u$ curves of the gravitational wave
after suitable rescaling of the horizontal and vertical coordinates. The two basic polarization
modes of a gravitational wave are: ① $G = 0$ [and thus $g(u) \equiv 0$], corresponding to mode
$+$; ② $F = 0$ [and thus $f(u) \equiv 0$], corresponding to mode $\times$. In the approximation where
the gravitational wave is weak enough, the oscillation patterns of the circle in one period
under the actions of these two modes are illustrated in Fig. 7.23. A general oscillation can
be expressed as the superposition of these two modes, which has been discussed in Sect.
7.9.2.
The effect of a gravitational wave on the test particles shown in Figs. 7.22 and 7.23 is
different from that of an electromagnetic wave. A gravitational wave is the “propagation
of the oscillation of curvature”, and curvature leads to tidal acceleration. Therefore, the

Fig. 7.22 The deformation of a circle under a gravitational wave at a certain time (see Table 7.2)

Table 7.2 The tidal acceleration a of 8 points on a circle relative to the center (see Fig. 7.22 for the overall effect)

(a) β = 0, α > 0, so that a^1 = α w^1, a^2 = −α w^2

Point    A      B        C      D        E      F        G      H
w^1      1      1/√2     0      −1/√2    −1     −1/√2    0      1/√2
w^2      0      −1/√2    −1     −1/√2    0      1/√2     1      1/√2
a^1      α      α/√2     0      −α/√2    −α     −α/√2    0      α/√2
a^2      0      α/√2     α      α/√2     0      −α/√2    −α     −α/√2

(b) α = 0, β > 0, so that a^1 = β w^2, a^2 = β w^1

Point    A      B        C      D        E      F        G      H
w^1      1      1/√2     0      −1/√2    −1     −1/√2    0      1/√2
w^2      0      −1/√2    −1     −1/√2    0      1/√2     1      1/√2
a^1      0      −β/√2    −β     −β/√2    0      β/√2     β      β/√2
a^2      β      β/√2     0      −β/√2    −β     −β/√2    0      β/√2

Fig. 7.23 The oscillation of a circle in one period under a linearly polarized gravitational plane wave

Fig. 7.24 The oscillation of a charged particle in one period under a linearly polarized electromagnetic wave

gravitational wave can be detected by measuring the relative acceleration between a free
particle and another (fiducial) free particle, as we have discussed above. An electromagnetic
wave is the propagation of the oscillation of the electromagnetic field; to detect it, one
just needs to measure the acceleration of a charged particle relative to an inertial frame, whose
expression, namely $\vec{a} = (q/m)\vec{E}$, is much simpler than the tidal acceleration. Suppose the
electromagnetic wave being detected is linearly polarized, then what corresponds to Fig. 7.23
is the much simpler Fig. 7.24. We have shown in Optional Reading 7.9.2 that the polarization
tensor of a gravitational wave will come back to itself after rotating by (an integer times) π
about the z-axis, while the polarization vector of an electromagnetic wave will come back
after rotating by (an integer times) 2π . This difference is also manifested by the polarization
patterns in Figs. 7.23 and 7.24: the pattern in any square in Fig. 7.23 (i.e., at any time) will
come back to itself after rotating by (an integer times) π about the direction of propagation
(the line perpendicular to the page that passes through the centre of symmetry), while the
pattern in any square in Fig. 7.24 will come back after rotating by (an integer times) 2π .
This difference between Figs. 7.23 and 7.24 reflects again the fact that photons are spin-1
while gravitons are spin-2, as mentioned in Optional Reading 7.9.2.
[The End of Optional Reading 7.9.5]

Exercises

˜7.1. Show that Maxwell’s equation in curved spacetime, ∇ a Fab = −4π Jb , contains
the law of conservation of charge, i.e., ∇a J a = 0. NB: ∇ a Fab = −4π Jb is
equivalent to (7.2.8) rather than (7.2.9), and hence this problem indicates that
(7.2.8) rather than (7.2.9) gives the charge conservation.
˜7.2. Show that $\dfrac{\mathrm{D}_{\mathrm{F}} \omega_a}{\mathrm{d}\tau} = \dfrac{\mathrm{D} \omega_a}{\mathrm{d}\tau} + (A_a \wedge Z_b)\, \omega^b$, $\forall \omega_a \in \mathscr{F}_G(0, 1)$.
˜7.3. Prove property (3) of the Fermi derivative in Proposition 7.3.1.


7.4. Show that a (nonvanishing) vector field $v^a$ on a timelike curve $G(\tau)$ with a
constant magnitude must undergo a spacetime rotation. Hint: Let $u^a \equiv \mathrm{D}v^a/\mathrm{d}\tau$,
then $u^a v_a = 0$. First show that no matter whether $v^a v_a$ vanishes or not, one
always has a vector field $v'^a$ on $G(\tau)$ such that $v'^a v_a = 1$. And then show that $v^a$
undergoes a spacetime rotation with the angular velocity 2-form $\Omega_{ab} \equiv 2 v'_{[a} u_{b]}$.
7.5. Suppose {T, X, Y, Z } is a Lorentzian coordinate system in Minkowski space-
time, and the parametric equation of a curve G(τ ) is
T = A−1 sinh Aτ , X = A−1 cosh Aτ , Y =Z =0 (A is a constant).

(a) Show that G(τ ) is a timelike hyperbola (i.e., G in Fig. 6.43), τ is the proper
time, and A is the magnitude of the 4-acceleration Aa of G(τ ).
*(b) Show that any ray μ(s) starting from the origin o of the system {T, X, Y, Z }
that intersects G(τ ) is orthogonal to G(τ ).
*(c) Suppose the parameter s of μ(s) in (b) is the arc length of μ, as we collect
all of the rays μ(s) starting from o that intersect G(τ ), we obtain a spatial vec-
tor field wa ≡ (∂/∂s)a on G(τ ). Show that wa is Fermi transported along G(τ ).
*(d) Let Z a ≡ (∂/∂τ )a , and choose {Z a , wa , (∂/∂Y )a , (∂/∂ Z )a } as an orthonor-
mal tetrad field on G(τ ), find the proper coordinate system {t, x, y, z} of G(τ )
and specify its coordinate patch.
Answer: T = (A−1 + x) sinh At, X = (A−1 + x) cosh At, Y = y, Z = z.
(e) Write down the expression for the line element of the Minkowski metric
in the above proper coordinate system. Compute the Christoffel symbol of the
Minkowski metric in this system, and verify that it satisfies Lemma 7.4.3, i.e.,
(7.4.10).
7.6. Suppose G is a non-rotating, freely falling, instantaneous rest observer of a
point mass L at a point p ∈ L (i.e., the 4-velocity Z a of G and the 4-velocity
U a of L are tangent at p), Aa is the 4-acceleration of L at p, and a a is the
3-acceleration of L at p relative to G [defined by (7.4.3)]. Show that a a = Aa .
NB: This claim can be viewed as the generalization of Proposition 6.3.6 to
curved spacetime.
˜7.7. A metric gab is said to be Ricci flat if the Ricci tensor of gab vanishes. Show
that a necessary and sufficient condition for a 4-dimensional Lorentzian metric
gab being a solution to the vacuum Einstein equation is that gab is Ricci flat.
˜7.8. Suppose (M, gab ) is a Ricci flat spacetime (see the above problem for the def-
inition), and ξ a is one of the Killing vector fields of the spacetime. Show that
Fab := (dξ )ab satisfies the source-free (Ja = 0) Maxwell equation of (M, gab ).
Hint: use ∇a ξ a = 0 satisfied by any Killing vector field ξ a (the result of Exer-
cise 4.11).
7.9. Suppose $\gamma_{ab}$ satisfies (a) $\partial^a \bar{\gamma}_{ab} = 0$; (b) $\gamma = 0$; (c) $\gamma_{0i} = 0$ ($i = 1, 2, 3$); (d)
$\gamma_{00} = \text{constant}$. Find an “infinitesimal” vector field $\xi^a$ such that $\tilde{\gamma}_{ab} \equiv \gamma_{ab} +
\partial_a \xi_b + \partial_b \xi_a$ satisfies the transverse-traceless gauge conditions:
(a) $\partial^a \bar{\tilde{\gamma}}_{ab} = 0$; (b) $\tilde{\gamma} = 0$; (c) $\tilde{\gamma}_{0i} = 0$ ($i = 1, 2, 3$); (d) $\tilde{\gamma}_{00} = 0$.

7.10. Suppose gab = ηab + γab represents a gravitational wave with γab of the form
(7.9.30), {t, x i } is a Lorentzian coordinate system, ∇a is the derivative operator
associated with gab , and Z a ≡ (∂/∂t)a . Show that Z a ∇a Z b = 0, i.e., the t-
coordinate lines are geodesics. Hint: compute L Z gab by plugging in the ansatz
(7.9.30), contract it with Z a , then use (4.3.1 ) and the TT gauge condition.
NB: By a similar proof, this conclusion can also be applied to the elliptically
polarized waves of the form (7.9.63).
7.11. Prove Proposition 7.9.5.
7.12. Verify the properties ①–③ of $\{(E_i)^a\}$ in (7.9.84).
7.13. Prove (7.9.86).
7.14. Prove (7.9.78), i.e., $\nabla^a \nabla_a P = (\partial^2 P/\partial x^2) + (\partial^2 P/\partial y^2)$.

References

Abbott, B. P. et al. (2016), ‘Observation of Gravitational Waves from a Binary Black Hole Merger’,
Phys. Rev. Lett. 116(6), 061102. arXiv:1602.03837.
Cai, R.-G., Cao, Z., Guo, Z.-K., Wang, S.-J. and Yang, T. (2017), ‘The Gravitational-Wave Physics’,
Natl. Sci. Rev. 4(5), 687–706. arXiv:1703.00187.
Carroll, S. M. (2019), Spacetime and Geometry, Cambridge University Press, Cambridge.
Chen, C.-M., Nester, J. M. and Ni, W.-T. (2017), ‘A brief history of gravitational wave research’,
Chin. J. Phys. 55, 142–169. arXiv:1610.08803.
Fock, V. A. (1939), ‘Sur le mouvement des masses finies d’après la théorie de gravitation
einsteinienne’, J. Phys. U.S.S.R. 1, 81–166.
Geroch, R. P. and Jang, P. S. (1975), ‘Motion of a body in general relativity’, J. Math. Phys.
16, 65–67.
Hawking, S. W. and Ellis, G. F. R. (1973), The Large Scale Structure of Space-Time, Cambridge
University Press, Cambridge.
d’Inverno, R. A. (1992), Introducing Einstein’s Relativity, Clarendon Press, Oxford.
Liu, L. and Zhao, Z. (2004), General Relativity (in Chinese), Higher Education Press, Beijing.
Maggiore, M. (2018), Gravitational Waves: Volume 2: Astrophysics and Cosmology, Oxford Uni-
versity Press, Oxford.
Misner, C., Thorne, K. and Wheeler, J. (1973), Gravitation, W H Freeman and Company, San
Francisco.
Ohanian, H. C. and Ruffini, R. (1994), Gravitation and Spacetime, W W Norton and Company,
Inc., New York.
Sachs, R. K. and Wu, H. (1977), General Relativity for Mathematicians, Springer-Verlag, New York.
Saulson, P. R. (2017), Fundamentals Of Interferometric Gravitational Wave Detectors, World Sci-
entific, Singapore.
Stephani, H., Kramer, D., MacCallum, M. A. H., Hoenselaers, C. and Herlt, E. (2003), Exact
Solutions of Einstein’s Field Equations, Cambridge University Press, Cambridge.
Straumann, N. (1984), General Relativity and Relativistic Astrophysics, Springer-Verlag, Berlin.
Synge, J. L. (1960), Relativity: The General Theory, North-Holland Publishing Company, Amster-
dam.
Wald, R. M. (1984), General Relativity, The University of Chicago Press, Chicago.
Weber, J. (1961), General Relativity and Gravitational Waves, Wiley-Interscience, New York.
Will, C. M. (1995), Stable clocks and general relativity, in ‘30th Rencontres de Moriond: Euro-
conferences: Dark Matter in Cosmology, Clocks and Tests of Fundamental Laws’, pp. 417–428.
arXiv:gr-qc/9504017.
Will, C. M. (2014), ‘The confrontation between general relativity and experiment’, Living Reviews
in Relativity 17(1), 4. arXiv:1403.7377.
Will, C. M. (2018), Theory and Experiment in Gravitational Physics, Cambridge University Press,
Cambridge.
Chapter 8
Solving Einstein’s Equation

Solving Einstein’s equation is an important problem in general relativity. Many exact
solutions play important roles in the study and development of general relativity.
Since Einstein’s equation is a highly nonlinear partial differential equation, finding
an (exact) solution in the general case is rather difficult. The first exact solution—
the vacuum Schwarzschild solution—was found by Karl Schwarzschild under the
premise that the spacetime is static and has spherical symmetry. Schwarzschild’s
solution, which is often regarded as one of the most important solutions in general
relativity, was found within two months after Einstein’s equation was published.1

8.1 Stationary Spacetimes and Static Spacetimes

Definition 1 A spacetime $(M, g_{ab})$ is said to be stationary if it has a timelike Killing
vector field. In this case, we also call $g_{ab}$ a stationary metric.

Suppose there exists a timelike Killing vector field $\xi^a$ in $(M, g_{ab})$, whose integral
curves have the parameter $t$, i.e., $\xi^a = (\partial/\partial t)^a$. Choose any coordinate system $\{x^\mu\}$
where $t$ is the zeroth coordinate (i.e., $t = x^0$) and the integral curves of $\xi^a$ are the
$x^0$-coordinate lines (namely a coordinate system adapted to $\xi^a$, see Sect. 4.2). Let
$g_{\mu\nu}$ be the components of $g_{ab}$ in this coordinate system, then

$$\frac{\partial g_{\mu\nu}}{\partial t} = (\mathscr{L}_\xi g)_{\mu\nu} = 0 , \tag{8.1.1}$$

1 To be precise, within 34 days. Inspired by Einstein’s Mercury perihelion result of November 18,
1915, he looked for an exact solution. He communicated what he found in a letter to Einstein on
December 22, 1915. His solution was published in January 1916. Furthermore, this was in the
middle of World War I, and Schwarzschild was in the army on the Russian front!
© Science Press 2023
C. Liang and B. Zhou, Differential Geometry and General Relativity,
Graduate Texts in Physics, https://doi.org/10.1007/978-981-99-0022-0_8

where we used Theorem 4.2.2 in the first equality, and the second equality is due
to the fact that ξ a is a Killing vector field. Equation (8.1.1) indicates that all of the
components gμν are independent of the time coordinate t, i.e., gμν is “time-translation
invariant”. This is exactly where the term “stationary” comes from.
Conversely, if there exists a local coordinate system $\{x^\mu\}$ in $(M, g_{ab})$ such that

$$\frac{\partial g_{\mu\nu}}{\partial t} = 0 \quad (t \equiv x^0 \text{ is a timelike coordinate}) , \tag{8.1.2}$$

then $\xi^a \equiv (\partial/\partial t)^a$ is a smooth vector field on the coordinate patch $O$, and $\{x^\mu\}$
is exactly a coordinate system adapted to this vector field. Hence, it follows from
Theorem 4.2.2 that
$$(\mathscr{L}_\xi g)_{\mu\nu} = \frac{\partial g_{\mu\nu}}{\partial t} = 0 ,$$

which means that on $O$ we have $\mathscr{L}_\xi g_{ab} = 0$, and thus $\xi^a \equiv (\partial/\partial t)^a$ is a timelike
Killing vector field. Therefore, a stationary spacetime can also be defined in terms of the
coordinate language as follows: if there exists a local coordinate system $\{x^\mu\}$ (whose
coordinate patch is $O$) such that all of the components of $g_{ab}$ are independent of the
timelike coordinate $x^0$, then $(O, g_{ab})$ is a stationary spacetime.
Intuitively speaking, a stationary spacetime corresponds to a gravitational field that
does not change with time. However, the notion of time depends on the observer. For
instance, since the Earth’s gravitational field on the ground is stronger than that in the
upper air, you (as an observer) will find that the Earth’s gravitational field “changes
with time” if you keep measuring the gravitational field while moving from the
ground up into the air. This certainly does not indicate that the Earth’s gravitational
field is not a stationary gravitational field. Thus, when judging the stationarity of
a gravitational field by means of an observer, one needs to choose an appropriate
observer (reference frame). If you somehow can keep yourself at a fixed height above
a certain point on the ground (your world line is parallel to a generatrix of the world
sheet of the Earth’s surface), you will see that the Earth’s gravitational field “does not
change with time”. That is, the spacetime corresponding to the Earth’s gravitational
field has the following property: there exists a specific class of timelike curves (which
coincides with the integral curves of the timelike Killing vector field), such that the
metric components measured by the observers whose world lines are these curves do
not change with time. Many spacetimes (e.g., the expanding universe) do not have
this property (i.e., do not have a timelike Killing vector field), and Definition 1 is
exactly the mathematical formulation of this property.
Example 1 Minkowski spacetime is a stationary spacetime, since the zeroth coordi-
nate basis vector field (∂/∂ x 0 )a of its Lorentzian coordinate system {x μ } is a timelike
Killing vector field.

Example 2 The metric of a certain 2-dimensional spacetime can be expressed in
some coordinate system $\{t, x\}$ as $\mathrm{d}s^2 = -t^{-4} \mathrm{d}t^2 + \mathrm{d}x^2$, $t > 0$. Some people may
say this is not a stationary metric since its component $g_{00} = -t^{-4}$ depends on the
time coordinate $t$. However, a simple coordinate transformation $T = t^{-1}$, $X = x$ will
turn the line element into $\mathrm{d}s^2 = -\mathrm{d}T^2 + \mathrm{d}X^2$. This is nothing but a 2-dimensional
Minkowski metric, which is of course stationary!
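The coordinate transformation in Example 2 can be checked in two lines. The following worked computation is a sketch that uses only the transformation stated in the example; the last line, which expresses the timelike Killing vector field in the original coordinates, is an added remark rather than part of the example.

```latex
\begin{align*}
T = t^{-1} \;&\Longrightarrow\; \mathrm{d}T = -t^{-2}\,\mathrm{d}t , \qquad
X = x \;\Longrightarrow\; \mathrm{d}X = \mathrm{d}x , \\
-\mathrm{d}T^2 + \mathrm{d}X^2
  &= -\left(-t^{-2}\,\mathrm{d}t\right)^2 + \mathrm{d}x^2
   = -t^{-4}\,\mathrm{d}t^2 + \mathrm{d}x^2 = \mathrm{d}s^2 , \\
\left(\frac{\partial}{\partial T}\right)^a
  &= \frac{\mathrm{d}t}{\mathrm{d}T}\left(\frac{\partial}{\partial t}\right)^a
   = -t^{2}\left(\frac{\partial}{\partial t}\right)^a .
\end{align*}
```

Thus the timelike Killing vector field exists in both coordinate systems; it simply is not the coordinate basis vector $(\partial/\partial t)^a$ of the system $\{t, x\}$.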

Example 2 in a way suggests that confusion may arise if one does not take the
geometric perspective. Stationarity is an intrinsic property of the spacetime geometry,
which does not depend on the choice of the coordinate system. Note that both of the
following statements are wrong:
(1) (WRONG!) If some coordinate components gμν of the metric depend on the
timelike coordinate t of this coordinate system, then the spacetime is not stationary.
(2) (WRONG!) The spacetime in Example 2 is a stationary spacetime in the
coordinate system {T, X }, but is not a stationary spacetime in the coordinate system
{t, x}.

Definition 2 A vector field $v^a$ in $(M, g_{ab})$ is said to be hypersurface orthogonal if
$\forall p \in M$ there exists a hypersurface $\Sigma$ that is everywhere orthogonal to $v^a$ such that
$p \in \Sigma$.

Definition 3 A spacetime $(M, g_{ab})$ is said to be static if it has a hypersurface orthogonal
timelike Killing vector field. In this case, we also call $g_{ab}$ a static metric.

Thus, a static spacetime must be stationary, but not vice versa.


Proposition 8.1.1 Suppose $\xi^a = (\partial/\partial t)^a$ is a Killing vector field, and $\Sigma_0 = \{p \in
M \mid t(p) = 0\}$ is a hypersurface everywhere orthogonal to $\xi^a$, then the hypersurface
$\Sigma_{t_1} = \{p \in M \mid t(p) = t_1\}$ is also everywhere orthogonal to $\xi^a$.

Proof Exercise 8.1. Hint: $\Sigma_t = \phi_t[\Sigma_0]$, where $\phi_t$ is an element in the one-parameter
group of isometries corresponding to $\xi^a$, i.e., an isometry. □


Suppose $(M, g_{ab})$ is a static spacetime, $\xi^a = (\partial/\partial t)^a$ is a timelike Killing field,
and $\Sigma_0$ is a hypersurface orthogonal to $\xi^a$. Choose the intersection of $\Sigma_0$ and each
integral curve of $\xi^a$ as the zero of the curve’s parameter, and choose a local coordinate
system $\{x^i\}$ on $\Sigma_0$. Since we have $\xi^a \neq 0$ at each point on $\Sigma_0$, we can “carry” these
three coordinates outside $\Sigma_0$ (i.e., set the $x^i$ of each point on the integral curve of
$\xi^a$ to be the $x^i$ of the intersection of $\Sigma_0$ and this curve), and take the parameter $t$ of
each integral curve as the timelike coordinate $x^0$ (called the Killing coordinate time)
of each point on the curve, then we obtain a 4-dimensional local coordinate system
$\{t, x^i\}$, whose $t$-coordinate lines are the integral curves of $\xi^a$. Also, since the
$x^i$-coordinate lines lie on the orthogonal surface $\Sigma_t$, the timelike coordinate basis vector
$(\partial/\partial t)^a$ is orthogonal to the spacelike coordinate basis vectors $(\partial/\partial x^i)^a$. Therefore,

$$g_{0i} = g_{ab} (\partial/\partial t)^a (\partial/\partial x^i)^b = 0 , \quad i = 1, 2, 3 ,$$

and hence the expression for the line element of $g_{ab}$ in this system is simplified as

$$\mathrm{d}s^2 = g_{00}(x^1, x^2, x^3)\,\mathrm{d}t^2 + g_{ij}(x^1, x^2, x^3)\,\mathrm{d}x^i \mathrm{d}x^j . \tag{8.1.3}$$

Such a coordinate system is called a time-orthogonal coordinate system.


Suppose (M, gab ) is a stationary spacetime, then the reference frame correspond-
ing to the integral curves of the timelike Killing vector field ξ a is called a stationary
reference frame (“corresponding to” means to reparametrize the integral curves and
substitute the Killing time t with the proper time τ ). A stationary reference frame
whose ξ a is hypersurface orthogonal is called a static reference frame. An observer
is called a stationary (static) observer if they are an observer of a stationary (static)
reference frame. The $\Sigma_t$ defined in Proposition 8.1.1 is called a surface of simultaneity of the static reference frame. Note that the “time” $t$ here is the coordinate time rather than the proper time $\tau$ of the static observer (unless $g_{00} = -1$); it is easy to show that they have the following relation: $d\tau = \sqrt{-g_{00}}\,dt$.
A static spacetime has not only a time-translation invariance that any stationary
spacetime has, but also a time-reflection invariance (except for some possible subtle
cases). Suppose ξ a = (∂/∂t)a is a hypersurface orthogonal timelike Killing vector
field, then a time reflection transformation is referring to the diffeomorphism φ :
M → M satisfying t (φ( p)) = −t ( p), x i (φ( p)) = x i ( p), ∀ p ∈ M. Now we will
show that this φ is an isometry, and thus a static spacetime is said to be time reflection
invariant.
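Suppose $C(t)$ is the integral curve of $\xi^a$ passing through $p$, and $p = C(t_1)$. From $x^i(\phi(p)) = x^i(p)$ we can see that $q \equiv \phi(p)$ is also on $C(t)$. First we show that $\phi_*[(\partial/\partial t)^a|_p] = -(\partial/\partial t)^a|_q$. Let $v^a \equiv (\partial/\partial t)^a|_p$, $u^a \equiv -(\partial/\partial t)^a|_q$, $r \equiv C(t_1 + \Delta t)$, and $s \equiv \phi(r)$ (see Fig. 8.1). Suppose $f$ is an arbitrary smooth function on $M$, then the result of the vector $\phi_* v^a$ at $q$ acting on $f$ is

$$(\phi_* v)(f) = v(\phi^* f) = \left.\frac{\partial}{\partial t}\right|_{t=t_1}(\phi^* f) = \lim_{\Delta t \to 0}\frac{1}{\Delta t}\left[(\phi^* f)|_r - (\phi^* f)|_p\right] = \lim_{\Delta t \to 0}\frac{1}{\Delta t}\left(f|_s - f|_q\right) = u(f),$$

and hence $\phi_* v^a = u^a$, i.e., $\phi_*[(\partial/\partial t)^a|_p] = -(\partial/\partial t)^a|_q$. Similarly, one can show that

$$\phi_*[(\partial/\partial x^i)^a|_p] = (\partial/\partial x^i)^a|_q, \quad i = 1, 2, 3.$$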

Let $g_{\mu\nu}$ and $(\phi^* g)_{\mu\nu}$ represent the components of $g_{ab}$ and $(\phi^* g)_{ab}$, respectively, in the system $\{t, x^i\}$, then

$$(\phi^* g)_{00}|_p = [(\phi^* g)_{ab}(\partial/\partial t)^a(\partial/\partial t)^b]|_p = [g_{ab}(\phi_*\partial/\partial t)^a(\phi_*\partial/\partial t)^b]|_q = [g_{ab}(\partial/\partial t)^a(\partial/\partial t)^b]|_q = g_{00}|_q = g_{00}|_p,$$

where the last step is because of $0 = (\mathcal{L}_\xi g)_{\mu\nu} = \partial g_{\mu\nu}/\partial t$, i.e., $g_{\mu\nu}$ are constants along $C(t)$. Similarly, we have $(\phi^* g)_{ij}|_p = g_{ij}|_p$, but $(\phi^* g)_{0i}|_p = -g_{0i}|_p$. Luckily, $g_{0i} = 0$ (where the hypersurface orthogonality is used), and hence $(\phi^* g)_{\mu\nu}|_p = g_{\mu\nu}|_p$. Noticing that $p$ is arbitrary, we know that $(\phi^* g)_{ab} = g_{ab}$, and so $\phi : M \to M$ is an isometry.

Fig. 8.1 Time reflection $\phi : M \to M$

Fig. 8.2 A strong static spacetime becomes a weak static spacetime after $W$ is removed
[Optional Reading 8.1.1]
Technically, the definition of a Killing vector field has a strong version and a weak version.
The weak definition only cares about the local properties: any vector field ξ a satisfying the
Killing equation ∇(a ξb) = 0 (equivalent to Lξ gab = 0) is called a Killing vector field. This
ξ a may be incomplete, i.e., the range of its parameter t is not the whole R but an interval of
R. The strong definition, however, requires that ξ a be complete. Accordingly, the definitions
of stationary and static spacetimes also have a weak version and a strong one, depending on
whether or not the timelike Killing vector field is complete. When we are only concerned
with local issues, it is not necessary to emphasize the difference between them; however,
when global issues are involved, some conclusions only hold if the spacetime satisfies the
strong condition. For instance, if a region W is removed from a strong static spacetime
(M, gab ), this spacetime will become a weak static spacetime. Suppose what is shown in
Fig. 8.2 is the $\Sigma_0$ in Proposition 8.1.1, then $\Sigma_{t_1} = \{p \in M \mid t(p) = t_1\}$ is meaningless when $t_1$ is sufficiently large, since the Killing field $\xi^a$ is not well-defined at the zero of the parameter $t$ of each integral curve inside the “shadow region” (and hence $t$ is not well-defined). Thus, it is possible that Proposition 8.1.1 only holds locally for a static spacetime.
In a word, the key difference between the strong and weak definitions is whether ξ a is
complete or not. ξ a generates a one-parameter group of isometries when it is complete,
while it only generates a one-parameter local group of isometries when it is incomplete. For
convenience’s sake, we usually omit the word “local” in the text.
[The End of Optional Reading 8.1.1]

8.2 Spherically Symmetric Spacetimes

First we discuss a 2-dimensional sphere $(S^2, h_{ab})$ in the 3-dimensional Euclidean space $(\mathbb{R}^3, \delta_{ab})$, where $h_{ab}$ is the induced metric of $\delta_{ab}$. The expression for the line element of $h_{ab}$ in the spherical coordinate system $\{\theta, \varphi\}$ is

$$ds^2 = r^2(d\theta^2 + \sin^2\theta\, d\varphi^2),$$

where $r$ is the radius of the sphere. Without loss of generality, here we only talk about the unit sphere ($r = 1$), whose line element is

$$ds^2 = d\theta^2 + \sin^2\theta\, d\varphi^2. \tag{8.2.1}$$

It follows from the equation above that

$$\xi_1^a \equiv (\partial/\partial\varphi)^a \tag{8.2.2a}$$

is a Killing vector field, which reflects the invariance of $(S^2, h_{ab})$ under a rotation with respect to the $z$-axis. The integral curves of such rotations are all the circles of latitude on the sphere (the circle at each of the two poles shrinks to a point), see Fig. 8.3. It is intuitively not difficult to believe that $(S^2, h_{ab})$ has maximal symmetry, and thus should have 3 independent Killing vector fields. In fact, it does. It is not difficult to verify that

$$\xi_2^a \equiv (\partial/\partial\theta)^a \sin\varphi + (\partial/\partial\varphi)^a \cot\theta\cos\varphi, \tag{8.2.2b}$$

and

$$\xi_3^a \equiv [\xi_1, \xi_2]^a = (\partial/\partial\theta)^a \cos\varphi - (\partial/\partial\varphi)^a \cot\theta\sin\varphi \tag{8.2.2c}$$

are also Killing fields, and ξ1a , ξ2a , ξ3a are linearly independent. From Sect. 4.3 we
have learned that the one-parameter group of diffeomorphisms corresponding to a
Killing vector field is a one-parameter group of isometries, and hence the collection
of all the isometries on (S 2 , h ab ) is a 3-parameter group, which is isomorphic to the
rotation group S O(3) of the 3-dimensional Euclidean space. Readers who are not
familiar with group theory do not have to worry too much about this, one just needs
to know that S O(3) is such a group, each element of which is a rotation that keeps
the origin in the 3-dimensional Euclidean space fixed (see Appendix G in Volume II
for details).
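The claim that the three fields in (8.2.2a)–(8.2.2c) are Killing fields of the unit-sphere metric (8.2.1) can be checked numerically. The sketch below evaluates the coordinate components of the Lie derivative, $(\mathcal{L}_\xi h)_{\mu\nu} = \xi^\lambda\partial_\lambda h_{\mu\nu} + h_{\lambda\nu}\partial_\mu\xi^\lambda + h_{\mu\lambda}\partial_\nu\xi^\lambda$, with central finite differences at one sample point. The helper names, the sample point and the step size are illustrative assumptions; the field ∂/∂θ is included only as a contrast that is not a Killing field.

```python
import math

def metric(x):
    """Components h_{mu nu} of the unit-sphere metric (8.2.1) at x = (theta, phi)."""
    theta, _ = x
    return [[1.0, 0.0], [0.0, math.sin(theta) ** 2]]

def xi1(x):  # (8.2.2a): components (xi^theta, xi^phi)
    return [0.0, 1.0]

def xi2(x):  # (8.2.2b)
    theta, phi = x
    return [math.sin(phi), math.cos(phi) / math.tan(theta)]

def xi3(x):  # (8.2.2c)
    theta, phi = x
    return [math.cos(phi), -math.sin(phi) / math.tan(theta)]

def d_theta(x):  # contrast: d/dtheta, not a Killing field
    return [1.0, 0.0]

def lie_derivative_of_metric(xi, x, h=1e-5):
    """Coordinate components of L_xi h at x, using central differences."""
    def d(f, mu):
        xp, xm = list(x), list(x)
        xp[mu] += h
        xm[mu] -= h
        return (f(xp) - f(xm)) / (2.0 * h)

    g = metric(x)
    v = xi(x)
    out = [[0.0, 0.0], [0.0, 0.0]]
    for mu in range(2):
        for nu in range(2):
            total = 0.0
            for lam in range(2):
                total += v[lam] * d(lambda y: metric(y)[mu][nu], lam)
                total += g[lam][nu] * d(lambda y: xi(y)[lam], mu)
                total += g[mu][lam] * d(lambda y: xi(y)[lam], nu)
            out[mu][nu] = total
    return out

def max_abs(matrix):
    return max(abs(c) for row in matrix for c in row)

point = (1.0, 0.7)  # arbitrary sample point away from the poles
for name, field in [("xi1", xi1), ("xi2", xi2), ("xi3", xi3), ("d/dtheta", d_theta)]:
    print(name, max_abs(lie_derivative_of_metric(field, point)))
```

A value near zero (up to finite-difference error) for the first three fields is consistent with the Killing equation; a check at a single point is of course not a proof.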
When talking about spacetime symmetries, one should pay attention to the relation and difference between isometries and diffeomorphisms. An isometry must be a diffeomorphism, but not vice versa. Each smooth vector field corresponds to a one-parameter group of diffeomorphisms (we will omit the term “local” from now on), and so any manifold $M$ has infinitely many one-parameter groups of diffeomorphisms. The collection of all the diffeomorphisms is a group of infinitely many parameters, called the diffeomorphism group on $M$. Each Killing vector field on $(M, g_{ab})$ corresponds to a one-parameter group of isometries, which is a subgroup of the diffeomorphism group on $M$. The collection of all the isometries is called the isometry group of $(M, g_{ab})$. Since a 4-dimensional spacetime has at most 10 independent Killing vector fields, the isometry group of this spacetime has at most 10 parameters. Suppose $G_1$ is a one-parameter group of diffeomorphisms on $M$, then $\forall p \in M$, the collection of points obtained by letting each element of $G_1$ act on $p$ is called an orbit of $G_1$ passing through $p$ (see Sect. 2.2). This definition of an orbit can be carried over to any subgroup of the diffeomorphism group on $M$. It is not difficult to see the following: suppose $G_3$ is the isometry group on $(S^2, h_{ab})$ [which is isomorphic to $SO(3)$], then any orbit of $G_3$ passing through $p \in S^2$ is $S^2$ itself.

Fig. 8.3 The integral curves of a Killing vector field on a sphere

Definition 1 A spacetime $(M, g_{ab})$ is said to be spherically symmetric if its isometry group has a subgroup $G_3$ that is isomorphic to $SO(3)$ and all the orbits of $G_3$ (except for the fixed points) are 2-dimensional spheres. These spheres are called orbit spheres.

Remark 1 ① The isometry group of a spherically symmetric spacetime can be larger than $SO(3)$. For instance, the isometry group of Minkowski spacetime has 10 parameters, but it is a spherically symmetric spacetime, since it contains a subgroup isomorphic to $SO(3)$, whose orbits (except for a fixed point) are all 2-dimensional spheres. ② Precisely speaking, Definition 1 only defines a spherically symmetric metric field rather than a spherically symmetric spacetime. If there exists a matter field in spacetime (i.e., $T_{ab} \neq 0$), then $(M, g_{ab})$ is called a spherically symmetric spacetime only if the metric field and the matter field are both spherically symmetric (Sect. 8.6 will involve the relation between the symmetry of the matter field and the symmetry of the metric field).

Fig. 8.4 An orbit sphere $\mathcal{S}$ on a surface $\Sigma$ of simultaneity of an inertial frame in Minkowski spacetime (with one dimension suppressed)

The subgroup $G_3$ of the isometry group which is isomorphic to $SO(3)$ corresponds to three independent Killing vector fields $\xi_1^a$, $\xi_2^a$, and $\xi_3^a$. Suppose $\mathcal{S}$ is an orbit of $G_3$ (a 2-sphere), then the integral curves of $\xi_1^a$, $\xi_2^a$, $\xi_3^a$ starting from any point on $\mathcal{S}$ all lie on $\mathcal{S}$, and hence $\xi_1^a$, $\xi_2^a$, $\xi_3^a$ at any point on $\mathcal{S}$ are all tangent to $\mathcal{S}$. Suppose $\hat{g}_{ab}$ is the 2-dimensional metric on $\mathcal{S}$ induced by $g_{ab}$, then from the definition of an induced metric we can see that $\xi_1^a$, $\xi_2^a$, $\xi_3^a$ on $\mathcal{S}$ are also Killing fields measured by $\hat{g}_{ab}$, and thus $(\mathcal{S}, \hat{g}_{ab})$ has the maximal symmetry represented by $\xi_1^a$, $\xi_2^a$, and $\xi_3^a$. Therefore (see Optional Reading 8.2.1 for a proof), $\hat{g}_{ab}$ can only be a standard spherical metric $h_{ab}$ (the metric induced on a sphere by the 3-dimensional Euclidean metric), i.e., there exists a constant $K > 0$ and a coordinate system $\{\theta, \varphi\}$ such that the line element of $\hat{g}_{ab}$ can be expressed by

$$d\hat{s}^2 = K(d\theta^2 + \sin^2\theta\, d\varphi^2). \tag{8.2.3}$$

Take Minkowski spacetime as an example. Suppose $\Sigma$ is a surface of simultaneity of an inertial frame. By assigning a set of concentric 2-spheres on $\Sigma$ (see Fig. 8.4), we can pick out from the 10-dimensional isometry group a subgroup $G_3$ isomorphic to $SO(3)$, whose orbit passing through any point $p$ in $\Sigma$ (except for the center $o$) is the sphere that $p$ lies on. The line element of the Minkowski metric in the chosen inertial coordinate system is

$$ds^2 = -dt^2 + dr^2 + d\hat{s}^2,$$

where

$$d\hat{s}^2 = r^2(d\theta^2 + \sin^2\theta\, d\varphi^2).$$

Thus, for the Minkowski metric, the $K$ in (8.2.3) is the square of the radius of the orbit 2-sphere $\mathcal{S}$ which we have been discussing. To figure out the meaning of $K$ in a non-flat spacetime, a geometric concept will be helpful to us, namely the area of $\mathcal{S}$. Suppose $\hat{\varepsilon}$ is the area element on $\mathcal{S}$ associated with $\hat{g}_{ab}$, then the area of $\mathcal{S}$ will be $A = \int_{\mathcal{S}} \hat{\varepsilon}$. Also, $\hat{\varepsilon}$ can be expressed using the coordinate system $\{\theta, \varphi\}$ on $\mathcal{S}$ as $\hat{\varepsilon} = \sqrt{\hat{g}}\, d\theta \wedge d\varphi$, in which $\hat{g}$ is the determinant of $\hat{g}_{ab}$ in the system $\{\theta, \varphi\}$. After reading off $\hat{g}_{ij}$ from (8.2.3) we can find $\hat{g} = K^2 \sin^2\theta$, and hence $\hat{\varepsilon} = K \sin\theta\, d\theta \wedge d\varphi$. Therefore,

$$A = K \int_0^{2\pi} d\varphi \int_0^{\pi} \sin\theta\, d\theta = 4\pi K.$$

Thus, $K$ is the area of the sphere divided by $4\pi$. Define

$$r := \sqrt{\frac{A}{4\pi}}, \tag{8.2.4}$$

and call $r$ the radius, then $K = r^2$, and (8.2.3) can be rewritten as

$$d\hat{s}^2 = r^2(d\theta^2 + \sin^2\theta\, d\varphi^2). \tag{8.2.5}$$

Fig. 8.5 A cylindrical surface in the 3-dimensional Euclidean space. The center $p$ of any circle in the surface is not on the surface

Seemingly, this is the same as the expression for the $d\hat{s}^2$ of Minkowski spacetime given above, and the $r$ in each equation is called the radius. However, the radius $r$ in general does not necessarily have the meaning of “the distance between the center and each point on $\mathcal{S}$”. In fact, the following three cases are all possible: ① There does not exist a point that can be regarded as the center of $\mathcal{S}$ at all. Let us look at a simplified example: suppose $S^1$ is a circle in the manifold $\mathbb{R} \times S^1$ (a cylindrical surface), then the center of $S^1$ will not be on the manifold $\mathbb{R} \times S^1$ (Fig. 8.5). Similarly, in $\mathbb{R} \times S^2$ there does not exist a point that can be regarded as the center of $S^2$ either. ② There exists a point in the spacetime that can be regarded as the center of $\mathcal{S}$, but due to the curved metric, the distance between $\mathcal{S}$ and this point is not equal to the radius $r$ defined by (8.2.4). ③ There exists more than one center of $\mathcal{S}$.
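The definition of the radius through the area can be illustrated numerically. The sketch below integrates the area element $K\sin\theta\, d\theta \wedge d\varphi$ with a simple midpoint rule and then applies (8.2.4); the value of K, the number of subintervals and the function names are arbitrary choices made for this illustration.

```python
import math

def sphere_area(K, n=2000):
    """Integrate the area element K sin(theta) dtheta dphi (midpoint rule in theta)."""
    dtheta = math.pi / n
    theta_integral = sum(math.sin((i + 0.5) * dtheta) for i in range(n)) * dtheta
    return K * 2.0 * math.pi * theta_integral  # the phi integral contributes 2*pi

def areal_radius(area):
    """Radius r defined from the area by (8.2.4): r = sqrt(A / (4 pi))."""
    return math.sqrt(area / (4.0 * math.pi))

K = 9.0                      # sample value of the constant in (8.2.3)
A = sphere_area(K)
r = areal_radius(A)
print(A / (4.0 * math.pi), r)  # approximately K and sqrt(K)
```

Note that this radius is computed purely from the intrinsic area of the orbit sphere; nothing in the computation refers to a center or to a distance from it.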
[Optional Reading 8.2.1]
Before writing down (8.2.3) we assumed the following claim: suppose $(\mathcal{S}, \hat{g}_{ab})$ has the maximal symmetry represented by $\xi_1^a$, $\xi_2^a$, and $\xi_3^a$, then the line element of $\hat{g}_{ab}$ can always be expressed as (8.2.3). Now we briefly introduce how to prove this claim. Suppose the components of $\hat{g}_{ab}$ in the coordinate system $\{\theta, \varphi\}$ are $\hat{g}_{11}$, $\hat{g}_{22}$ and $\hat{g}_{12}$, then from $\xi_1^a = (\partial/\partial\varphi)^a$ we can see that $\hat{g}_{11}$, $\hat{g}_{22}$ and $\hat{g}_{12}$ are not functions of $\varphi$. Writing down the equations of the coordinate components of $\mathcal{L}_{\xi_2}\hat{g}_{ab} = 0$ satisfied by $\xi_2$, and taking $\hat{g}_{11}(\theta)$, $\hat{g}_{22}(\theta)$ and $\hat{g}_{12}(\theta)$ as functions to be solved for, we obtain $\hat{g}_{12} = 0$, $\hat{g}_{11} = K$ (constant) and $\hat{g}_{22} = K\sin^2\theta$. It is not difficult to verify that $\mathcal{L}_{\xi_3}\hat{g}_{ab} = 0$, which completes the proof.
[The End of Optional Reading 8.2.1]
Fig. 8.6 The orbit sphere passing through any point on $\Sigma$ lies on $\Sigma$ (with one dimension suppressed). The dashed line is an integral curve of the vector field $n^a$ normal to the orbit spheres

8.3 The Vacuum Schwarzschild Solution

8.3.1 Static Spherically Symmetric Metrics

Proposition 8.3.1 Suppose a static spherically symmetric spacetime $(M, g_{ab})$ has only one$^2$ hypersurface orthogonal timelike Killing vector field $\xi^a$, and $G_3$ is the subgroup of its isometry group that is isomorphic to $SO(3)$, then all of the orbit spheres of $G_3$ must be orthogonal to $\xi^a$.

Proof $\phi \in G_3$ can be viewed as an isometry from $M$ to $M$. Since whether or not a vector field is timelike, Killing and hypersurface orthogonal is determined entirely by the metric, $\phi_* \xi^a$ is also a hypersurface orthogonal timelike Killing vector field (see Exercise 4.12). Since we only have one such vector field, we have $\phi_* \xi^a = \xi^a$. Assume that $\xi^a$ is not orthogonal to an orbit sphere $\mathcal{S}$ of $G_3$, then $\xi^a$ has a nonvanishing projection $\hat{\xi}^a$ which is tangent to $\mathcal{S}$. One can always find a rotation $\hat{\phi} : \mathcal{S} \to \mathcal{S}$ on the sphere such that $\hat{\xi}^a$ will change under this rotation, i.e., $\hat{\phi}_* \hat{\xi}^a \neq \hat{\xi}^a$. However, $\hat{\phi} : \mathcal{S} \to \mathcal{S}$ can be regarded as the result of some $\phi \in G_3$ ($\phi : M \to M$) restricted to $\mathcal{S}$. That is, as long as $\hat{\xi}^a$ is nonvanishing, there exists a $\phi \in G_3$ such that $\phi_* \hat{\xi}^a \neq \hat{\xi}^a$, and thus $\phi_* \xi^a \neq \xi^a$, which contradicts $\phi_* \xi^a = \xi^a$. □

Suppose $\Sigma$ is a hypersurface orthogonal to $\xi^a$, then according to Proposition 8.3.1, an orbit surface of $G_3$ passing through any point of $\Sigma$ lies on $\Sigma$, as shown in Fig. 8.6. Using this geometric property, we can further simplify the static line element (8.1.3). To do this we only have to specify how to define the 3-dimensional local coordinate system $\{x^1, x^2, x^3\}$ on the constant-$t$ surface $\Sigma$. $x^1$ can be defined using the radius of the orbit sphere: the $x^1$ of each point is defined as the radius $r$ of the orbit sphere on which the point lies. $x^2$ and $x^3$ can be defined using the “carry method”: suppose $\mathcal{S}$ is an orbit sphere in $\Sigma$, then it is a (2-dimensional) hypersurface in $\Sigma$, on which there exists a unit normal vector field $n^a$ tangent to $\Sigma$. Since for any point on $\Sigma$ there exists an orbit sphere lying on $\Sigma$ that passes through the point, $n^a$ is a vector field defined on $\Sigma$ whose integral curves (one of them is shown as the dashed line in Fig. 8.6) are everywhere orthogonal to the orbit spheres. By choosing any spherical coordinates $\theta$ and $\varphi$ on $\mathcal{S}$, we can “carry” these two coordinates to the other orbit spheres by means of the integral curves of $n^a$ (that is, setting the values of $\theta$ and $\varphi$ at each point on each integral curve as the values of them at the intersection of $\mathcal{S}$ and this curve), then we get a local coordinate system $\{r, \theta, \varphi\}$ on $\Sigma$. In this coordinate system, $g_{ij}\,dx^i dx^j$ in (8.1.3) takes the simplest form. From the above definition of $\theta$ and $\varphi$ we can see that the integral curves of the normal vector field coincide with the $r$-coordinate lines (only with different parameters), and thus $g_{ab}(\partial/\partial r)^a(\partial/\partial\theta)^b = 0$, $g_{ab}(\partial/\partial r)^a(\partial/\partial\varphi)^b = 0$. Hence, the coefficients of the terms $dr\,d\theta$ and $dr\,d\varphi$ in $g_{ij}\,dx^i dx^j$ vanish. Also considering that the induced metric of $g_{ij}\,dx^i dx^j$ on each orbit sphere is given by (8.2.5), we have

$$g_{ij}\,dx^i dx^j = g_{11}\,dr^2 + r^2(d\theta^2 + \sin^2\theta\, d\varphi^2),$$

and therefore

$$ds^2 = g_{00}\,dt^2 + g_{11}\,dr^2 + r^2(d\theta^2 + \sin^2\theta\, d\varphi^2). \tag{8.3.1}$$

$^2$ Of course, $\xi^a$ multiplied by an arbitrary constant is also a Killing vector field. Here by “one” we mean “one linearly independent”.

According to (8.1.3), neither $g_{00}$ nor $g_{11}$ is a function of $t$. Considering the spherical symmetry, we can believe that $g_{00}$ and $g_{11}$ are not functions of $\theta$ or $\varphi$ either [motivated readers may prove this using the property that $\theta$ and $\varphi$ are constants on the integral curves of $(\partial/\partial r)^a$ and $(\partial/\partial t)^a$]. Denote $g_{00}$ and $g_{11}$ as $-e^{2A(r)}$ and $e^{2B(r)}$, respectively, then (8.3.1) becomes

$$ds^2 = -e^{2A(r)}dt^2 + e^{2B(r)}dr^2 + r^2(d\theta^2 + \sin^2\theta\, d\varphi^2). \tag{8.3.2}$$

This is a quite general line element expression for a spherically symmetric metric
that has a unique static Killing vector field in the above coordinate system {t, r, θ, ϕ}.
We emphasize that {t, r, θ, ϕ} is a local coordinate system in M, by which we mean
that its domain (coordinate patch) cannot be the whole manifold M. Surely, even the
coordinates θ and ϕ on each orbit sphere cannot be defined on the whole sphere (one
cannot use a coordinate system to cover the whole S 2 , see Sect. 2.1). Moreover, for
instance, a point where (dr )a = 0 is not in the coordinate patch of {t, r, θ, ϕ} (the
point at X = T = 0 in Fig. 9.13 is such a point).

8.3.2 The Vacuum Schwarzschild Solution

The static spherically symmetric metric satisfying the vacuum Einstein equation is
called the vacuum Schwarzschild solution, or Schwarzschild solution for short,
which in physics describes the outer gravitational field of a spherically symmetric
star (e.g., the Sun). We have pointed out in Chap. 7 that the vacuum Einstein equation
is equivalent to (see Exercise 7.7)

Rab = 0 . (8.3.3)

Since the general form of a static spherically symmetric metric (line element) (8.3.2) only contains two undetermined functions of one variable, namely $A(r)$ and $B(r)$, solving this equation now becomes simple: one can just express the Ricci tensor
Rab in terms of these two functions, set it to zero, and then solve for A(r ) and B(r )
from the resulting differential equations. In Sect. 5.7 we have introduced in detail the
method and outcomes of computing the Riemann tensor of the line element (8.3.2)
using the orthonormal tetrad, from which we can easily obtain the expression of Rab
in terms of A(r ) and B(r ). To help the readers to better understand the coordinate
basis method of computing the curvature, here we compute Rab again directly using
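the coordinate basis. First we compute the Christoffel symbols of the line element (8.3.1). It follows from (3.4.19) that the nonvanishing Christoffel symbols are

$$\begin{aligned}
&\Gamma^0{}_{01} = \Gamma^0{}_{10} = A', \quad \Gamma^1{}_{00} = A' e^{2(A-B)}, \quad \Gamma^1{}_{11} = B', \\
&\Gamma^1{}_{22} = -r e^{-2B}, \quad \Gamma^1{}_{33} = -r \sin^2\theta\, e^{-2B}, \quad \Gamma^2{}_{12} = \Gamma^2{}_{21} = \frac{1}{r}, \\
&\Gamma^2{}_{33} = -\sin\theta\cos\theta, \quad \Gamma^3{}_{13} = \Gamma^3{}_{31} = \frac{1}{r}, \quad \Gamma^3{}_{23} = \Gamma^3{}_{32} = \cot\theta,
\end{aligned} \tag{8.3.4}$$

where $'$ stands for the derivative with respect to $r$. Plugging (8.3.4) into (3.4.21) we find that the nonvanishing $R_{\mu\nu}$ are

$$R_{00} = -e^{2(A-B)}(-A'' + A'B' - A'^2 - 2r^{-1}A'), \tag{8.3.5}$$

$$R_{11} = -A'' + A'B' - A'^2 + 2r^{-1}B', \tag{8.3.6}$$

$$R_{22} = -e^{-2B}[1 + r(A' - B')] + 1, \tag{8.3.7}$$

$$R_{33} = -\{e^{-2B}[1 + r(A' - B')] - 1\}\sin^2\theta. \tag{8.3.8}$$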

Thus, $R_{ab} = 0$ is equivalent to the following three differential equations for the undetermined functions $A(r)$ and $B(r)$ [Equations (8.3.7) and (8.3.8) give the same equation]:

$$-A'' + A'B' - A'^2 - 2r^{-1}A' = 0, \tag{8.3.9}$$

$$-A'' + A'B' - A'^2 + 2r^{-1}B' = 0, \tag{8.3.10}$$

$$-e^{-2B}[1 + r(A' - B')] + 1 = 0. \tag{8.3.11}$$

Subtracting (8.3.10) from (8.3.9) yields

$$A' = -B', \tag{8.3.12}$$

and hence

$$A = -B + \alpha, \quad \alpha = \text{constant}. \tag{8.3.13}$$

Noticing (8.3.12), (8.3.11) can be rewritten as an equation with only one undetermined function $B(r)$:

$$1 - 2rB' = e^{2B}, \tag{8.3.14}$$

whose general solution is

$$e^{2B} = \left(1 + \frac{C}{r}\right)^{-1}, \tag{8.3.15}$$

where C is a constant of integration. By a direct check we can see that (8.3.13) and
(8.3.15) also satisfy (8.3.9) and (8.3.10), and hence they are the general solutions of
the unsolved equations (8.3.9)–(8.3.11). Plugging the A and B in these two results
into the line element (8.3.2) yields

$$ds^2 = -\left(1 + \frac{C}{r}\right)e^{2\alpha}dt^2 + \left(1 + \frac{C}{r}\right)^{-1}dr^2 + r^2(d\theta^2 + \sin^2\theta\, d\varphi^2). \tag{8.3.16}$$
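The direct check mentioned above can also be carried out numerically. The following sketch takes $A(r)$ and $B(r)$ from (8.3.13) and (8.3.15), approximates their derivatives by central differences, and evaluates the left-hand sides of (8.3.9)–(8.3.11). The sample values of $C$ and $\alpha$, the step size and the function names are assumptions chosen for illustration only.

```python
import math

def B_of_r(r, C):
    """B(r) from (8.3.15): e^{2B} = (1 + C/r)^(-1)."""
    return -0.5 * math.log(1.0 + C / r)

def A_of_r(r, C, alpha):
    """A(r) from (8.3.13): A = -B + alpha."""
    return -B_of_r(r, C) + alpha

def d1(f, r, h=1e-4):
    """First derivative by central differences."""
    return (f(r + h) - f(r - h)) / (2.0 * h)

def d2(f, r, h=1e-4):
    """Second derivative by central differences."""
    return (f(r + h) - 2.0 * f(r) + f(r - h)) / (h * h)

def residuals(r, C, alpha):
    """Left-hand sides of (8.3.9), (8.3.10), (8.3.11) at radius r."""
    A = lambda x: A_of_r(x, C, alpha)
    B = lambda x: B_of_r(x, C)
    A1, B1, A2 = d1(A, r), d1(B, r), d2(A, r)
    eq9 = -A2 + A1 * B1 - A1 ** 2 - 2.0 * A1 / r
    eq10 = -A2 + A1 * B1 - A1 ** 2 + 2.0 * B1 / r
    eq11 = -math.exp(-2.0 * B(r)) * (1.0 + r * (A1 - B1)) + 1.0
    return eq9, eq10, eq11

C, alpha = -2.0, 0.3  # sample constants of integration (illustrative)
for r in (5.0, 10.0, 50.0):
    print(r, residuals(r, C, alpha))  # all three entries should be close to zero
```

The residuals vanish only up to finite-difference error, so this is a plausibility check rather than a substitute for the analytic verification.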

Defining a new coordinate $\hat{t} := e^{\alpha}t$, we obtain

$$ds^2 = -\left(1 + \frac{C}{r}\right)d\hat{t}^2 + \left(1 + \frac{C}{r}\right)^{-1}dr^2 + r^2(d\theta^2 + \sin^2\theta\, d\varphi^2). \tag{8.3.17}$$

The fact that $\alpha$ is a constant assures that $(\partial/\partial\hat{t})^a$ is a Killing vector field just like $(\partial/\partial t)^a$. One may choose $\hat{t}$ to be the Killing time coordinate in the first place when the coordinate system $\{t, r, \theta, \varphi\}$ was defined, then the $\hat{t}$ in (8.3.17) can be simply written as $t$:

$$ds^2 = -\left(1 + \frac{C}{r}\right)dt^2 + \left(1 + \frac{C}{r}\right)^{-1}dr^2 + r^2(d\theta^2 + \sin^2\theta\, d\varphi^2). \tag{8.3.17$'$}$$

This is the vacuum Schwarzschild solution (Schwarzschild metric). When $r$ is sufficiently large, the equation above will approximately return to the expression for the Minkowski line element in a spherical coordinate system, and thus the Schwarzschild metric is asymptotically flat. However, when $r \to \infty$, (8.3.16) can only approach

$$ds^2 = -e^{2\alpha}dt^2 + dr^2 + r^2(d\theta^2 + \sin^2\theta\, d\varphi^2),$$

which shows one of the benefits of choosing $\hat{t}$ as the time coordinate in the first place.
When $r$ is sufficiently large, the linearized approximation of general relativity (see Sect. 7.8.1) can be applied. Also, $(1 + C/r)^{-1} \cong 1 - C/r$, and hence (8.3.17$'$) approximately gives

$$ds^2 = [-dt^2 + dr^2 + r^2(d\theta^2 + \sin^2\theta\, d\varphi^2)] - \frac{C}{r}(dt^2 + dr^2).$$
The first term on the right-hand side of the above equation is a flat line element, which can be rewritten as $[-dt^2 + dx^2 + dy^2 + dz^2]$ by a coordinate transformation $x = r\sin\theta\cos\varphi$, $y = r\sin\theta\sin\varphi$, $z = r\cos\theta$. Thus, the Schwarzschild metric can be expressed as $g_{ab} = \eta_{ab} + \gamma_{ab}$ when $r$ is large, where the 00- (i.e., $tt$-) component of the small quantity $\gamma_{ab}$ is $\gamma_{00} = -C/r$. Comparing with (7.8.35) we get $\phi = C/(2r)$, and from Newton’s theory of gravity we also know that $\phi = -M/r$ (where $M$ is the mass of the star). Therefore, $C = -2M$, and hence (8.3.17$'$) can be expressed as

$$ds^2 = -\left(1 - \frac{2M}{r}\right)dt^2 + \left(1 - \frac{2M}{r}\right)^{-1}dr^2 + r^2(d\theta^2 + \sin^2\theta\, d\varphi^2). \tag{8.3.18}$$

This is the most common expression of the vacuum Schwarzschild solution, in which $M$ is the mass of the star. For a precise understanding of the concept of the “mass of a star”, see Optional Reading 9.3.1 and Chap. 12 in Volume II.

Fig. 8.7 The spatial distance between static observers $G_1$ and $G_2$ at $t$ is the arc length of the geodesic $\gamma$ (lying on $\Sigma_t$) between $p_1$ and $p_2$
Now, let us discuss the Schwarzschild metric in a more “physical” manner, i.e., we will discuss the spatial geometry outside a static spherically symmetric star. The cylindrical surface in Fig. 8.7 represents the world sheet of the surface of a static spherically symmetric star, and the spacetime geometry outside this surface is described by the Schwarzschild metric. There exists a static reference frame in Schwarzschild spacetime, in which each constant-$t$ surface $\Sigma_t$ can be interpreted as the space in this reference frame at $t$. The intersecting surface $\mathcal{S}$ of $\Sigma_t$ and the cylindrical surface represents the surface of the star at $t$ (which is suppressed as a 1-dimensional circle in the figure). Suppose $G_1$ and $G_2$ are two static observers who have the same values of $\theta$ and $\varphi$, and the intersections $p_1$ and $p_2$ of their world lines and $\Sigma_t$ represent the positions of these two observers at $t$. The spatial geometry outside of $\mathcal{S}$ in $\Sigma_t$ is described by the induced metric $h_{ab}$ of the Schwarzschild metric; the corresponding line element is

$$d\hat{s}^2 = \left(1 - \frac{2M}{r}\right)^{-1}dr^2 + r^2(d\theta^2 + \sin^2\theta\, d\varphi^2). \tag{8.3.19}$$

Let us compute the spatial distance $l$ between $p_1$ and $p_2$. The distance between two points in a Riemannian space (the metric is positive definite) is defined as the arc length of the shortest curve among all the curves connecting these two points.$^3$ It is not difficult to show that the curve $\gamma$ on $\Sigma_t$ from $p_1$ to $p_2$ with $\theta$ and $\varphi$ being constants is the shortest curve between $p_1$ and $p_2$, whose length (and thus the distance between $p_1$ and $p_2$) is

$$l = \int (h_{ij}\,dx^i dx^j)^{1/2} = \int_{r_1}^{r_2}\left(1 - \frac{2M}{r}\right)^{-1/2}dr > r_2 - r_1,$$

where $r_1$ and $r_2$ are the $r$-coordinates of $G_1$ and $G_2$, respectively. The equation above indicates that the spatial distance between $G_1$ and $G_2$ at any time $t$ is a constant (which is a property of static observers). $l$ is also called the proper distance between $G_1$ and $G_2$, which is not equal to their coordinate distance $r_2 - r_1$. This is exactly a reflection of $(\Sigma_t, h_{ab})$ being non-Euclidean.

$^3$ Technically speaking, the distance between two points in a Riemannian space is defined as the infimum of the set of the lengths of all the curves between these two points (as a subset of $\mathbb{R}$).
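To get a feeling for the size of the effect, the proper distance integral can be evaluated numerically. The sketch below uses Simpson’s rule and compares the result with a closed-form antiderivative, $F(r) = \sqrt{r(r - 2M)} + 2M\ln(\sqrt{r} + \sqrt{r - 2M})$, which is not given in the text but can be checked by differentiation. Geometrized units ($G = c = 1$) and the sample values of $M$, $r_1$, $r_2$ are assumptions for illustration.

```python
import math

def proper_distance_numeric(r1, r2, M, n=1000):
    """Simpson's-rule integral of (1 - 2M/r)^(-1/2) dr from r1 to r2 (n must be even)."""
    f = lambda r: (1.0 - 2.0 * M / r) ** -0.5
    h = (r2 - r1) / n
    total = f(r1) + f(r2)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(r1 + i * h)
    return total * h / 3.0

def proper_distance_closed(r1, r2, M):
    """Same integral via the antiderivative F(r) quoted in the lead-in."""
    F = lambda r: (math.sqrt(r * (r - 2.0 * M))
                   + 2.0 * M * math.log(math.sqrt(r) + math.sqrt(r - 2.0 * M)))
    return F(r2) - F(r1)

M, r1, r2 = 1.0, 3.0, 10.0  # sample values with r1 > 2M
l_num = proper_distance_numeric(r1, r2, M)
l_exact = proper_distance_closed(r1, r2, M)
print(l_num, l_exact, r2 - r1)  # the proper distance exceeds the coordinate distance
```

For these sample values the proper distance comes out noticeably larger than the coordinate distance $r_2 - r_1 = 7$, in line with the inequality above.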
In this chapter, the main point regarding the Schwarzschild metric is about find-
ing the solution from Einstein’s equation. We will have a detailed discussion on
Schwarzschild spacetime later in Chap. 9.
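To facilitate future lookup, here we list the components of the Christoffel symbol and the Riemann tensor (with lower indices) of the Schwarzschild metric in the Schwarzschild coordinate system as follows (in which $x^0, x^1, x^2, x^3$ stand for $t, r, \theta, \varphi$, respectively):

$$\begin{aligned}
&\Gamma^0{}_{01} = \Gamma^0{}_{10} = \frac{M}{r^2}(1 - 2M/r)^{-1}, \quad \Gamma^1{}_{00} = \frac{M}{r^2}(1 - 2M/r), \\
&\Gamma^1{}_{11} = -\frac{M}{r^2}(1 - 2M/r)^{-1}, \quad \Gamma^1{}_{22} = -r(1 - 2M/r), \quad \Gamma^1{}_{33} = -r(1 - 2M/r)\sin^2\theta, \\
&\Gamma^2{}_{12} = \Gamma^2{}_{21} = \frac{1}{r}, \quad \Gamma^2{}_{33} = -\sin\theta\cos\theta, \quad \Gamma^3{}_{13} = \Gamma^3{}_{31} = \frac{1}{r}, \quad \Gamma^3{}_{23} = \Gamma^3{}_{32} = \cot\theta,
\end{aligned} \tag{8.3.20}$$

$$\begin{aligned}
&R_{0101} = -\frac{2M}{r^3}, \quad R_{0202} = \frac{M}{r}(1 - 2M/r), \quad R_{0303} = \frac{M}{r}(1 - 2M/r)\sin^2\theta, \\
&R_{1212} = -\frac{M}{r}(1 - 2M/r)^{-1}, \quad R_{1313} = -\frac{M}{r}(1 - 2M/r)^{-1}\sin^2\theta, \quad R_{2323} = 2Mr\sin^2\theta.
\end{aligned} \tag{8.3.21}$$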
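The Christoffel symbols listed in (8.3.20) can be spot-checked against the defining formula $\Gamma^\sigma{}_{\mu\nu} = \tfrac{1}{2}g^{\sigma\rho}(\partial_\mu g_{\rho\nu} + \partial_\nu g_{\rho\mu} - \partial_\rho g_{\mu\nu})$. The sketch below does this for the diagonal metric (8.3.18) with finite-difference derivatives at one sample point; the value of $M$, the sample point and the helper names are illustrative assumptions, and geometrized units are used.

```python
import math

M = 1.0  # mass in geometrized units (G = c = 1); illustrative value

def metric_diag(x):
    """Diagonal components (g_00, g_11, g_22, g_33) of (8.3.18) at x = (t, r, theta, phi)."""
    _, r, th, _ = x
    f = 1.0 - 2.0 * M / r
    return [-f, 1.0 / f, r * r, (r * math.sin(th)) ** 2]

def christoffel(s, m, n, x, h=1e-6):
    """Gamma^s_{mn} for a diagonal metric, metric derivatives by central differences."""
    def dg(a, b, c):  # partial_c g_{ab}; off-diagonal components vanish
        if a != b:
            return 0.0
        xp, xm = list(x), list(x)
        xp[c] += h
        xm[c] -= h
        return (metric_diag(xp)[a] - metric_diag(xm)[a]) / (2.0 * h)
    return 0.5 / metric_diag(x)[s] * (dg(s, n, m) + dg(s, m, n) - dg(m, n, s))

x = (0.0, 5.0, 1.1, 0.4)  # sample point outside r = 2M
r, th = x[1], x[2]
f = 1.0 - 2.0 * M / r
expected = {               # values from (8.3.20), keyed by (sigma, mu, nu)
    (0, 0, 1): M / r**2 / f,
    (1, 0, 0): M / r**2 * f,
    (1, 1, 1): -M / r**2 / f,
    (1, 2, 2): -r * f,
    (1, 3, 3): -r * f * math.sin(th) ** 2,
    (2, 1, 2): 1.0 / r,
    (2, 3, 3): -math.sin(th) * math.cos(th),
    (3, 1, 3): 1.0 / r,
    (3, 2, 3): 1.0 / math.tan(th),
}
errors = {k: abs(christoffel(*k, x) - v) for k, v in expected.items()}
print(max(errors.values()))  # small if (8.3.20) matches the definition
```

Only one representative of each symmetric pair is checked, since the formula is manifestly symmetric in the lower indices.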

[Optional Reading 8.3.1]


We have repeatedly mentioned that “gravity is an effect of curved spacetime”. Now that
we have introduced the concept of a stationary spacetime, we can provide a deeper and more
specific interpretation for this. The earliest concept of gravity came from the study of the
motion of objects near the Earth. When you release an apple in your hand, it will fall to the
ground with an acceleration | g | = 9.8 m · s−2 , and so we say that the apple experiences the
Earth’s gravity, or the Earth produces a gravitational field outside itself. What does such an
important quantity | g | correspond to in general relativity? From the perspective of general
relativity, the apple undergoes geodesic motion with vanishing 4-acceleration. In contrast,
although you (as a stationary observer) feel that you are sitting comfortably in a chair, your
4-acceleration is nonzero. The 4-acceleration of a stationary observer is (see Exercise 8.3)

$$A^a = \nabla^a \ln\chi, \tag{8.3.22}$$

where $\chi \equiv (-\xi_a\xi^a)^{1/2}$, and $\xi^a$ is the timelike Killing vector field of the stationary spacetime. Since the 4-acceleration is orthogonal to the 4-velocity, $A^a$ is a spatial vector field on the world line of the stationary observer. This is an intrinsic vector field of the stationary spacetime geometry itself. The gravitational field strength $\vec{g}$ in the Newtonian language must correspond to a certain intrinsic geometric quantity in general relativity. $-A^a$ is exactly such a quantity, and thus can be called the “gravitational field” (gravitational acceleration field) in the stationary spacetime. Now we will show that this terminology indeed agrees with the value of the gravitational field strength $|\vec{g}| = 9.8\ \mathrm{m \cdot s^{-2}}$ in your mind. Consider approximately that there is a Schwarzschild metric outside the Earth, then

$$\chi \equiv (-\xi_a\xi^a)^{1/2} = (-g_{00})^{1/2} = (1 - 2M/r)^{1/2},$$

and (8.3.22) becomes

$$A_a = \chi^{-1}\nabla_a\chi = \frac{M}{r^2}(1 - 2M/r)^{-1}(dr)_a.$$

Thus,

$$|A^a| = \sqrt{g_{ab}A^aA^b} = \frac{M}{r^2}(1 - 2M/r)^{-1}\sqrt{g^{ab}(dr)_a(dr)_b} = \frac{M}{r^2}(1 - 2M/r)^{-1}\sqrt{g^{11}},$$

and hence

$$|A^a| = \frac{M}{r^2}(1 - 2M/r)^{-1/2}. \tag{8.3.23}$$
Suppose the world lines of an apple $G$ and a stationary observer $G_s$ are tangent at $p$ (see Fig. 8.8). Due to its free fall, $G$ corresponds to an inertial observer in Minkowski spacetime, and it follows from Proposition 6.3.6 and the equivalence principle that the 3-acceleration $a^a$ of $G$ at $p$ relative to $G_s$ is equal to the negative of the (absolute) 4-acceleration of $G_s$. Also since $a^a$ is $\vec{g}$, changing (8.3.23) back to the International System of Units (SI) we have

$$|\vec{g}| = |A^a| = \frac{GM}{r^2}\left(1 - \frac{2GM}{c^2 r}\right)^{-1/2}. \tag{8.3.24}$$

Applying this to the Earth’s surface, and plugging in (all in SI units) $M = M_\oplus = 6 \times 10^{24}$, $r = r_\oplus = 6.4 \times 10^{6}$, $c = 3 \times 10^{8}$, $G = 6.7 \times 10^{-11}$, we find that the parentheses on the right-hand side of the above equation equals $1 - 10^{-9} \cong 1$. Hence,

$$|\vec{g}| \cong \frac{GM_\oplus}{r_\oplus^2} \cong 9.8.$$
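The arithmetic behind this estimate is easy to reproduce. The sketch below evaluates (8.3.24) with the rounded SI values quoted in the text; the variable and function names are illustrative.

```python
# Rounded SI values quoted in the text.
G = 6.7e-11        # gravitational constant, m^3 kg^-1 s^-2
M_earth = 6e24     # mass of the Earth, kg
r_earth = 6.4e6    # radius of the Earth, m
c = 3e8            # speed of light, m/s

def g_magnitude(M, r):
    """Return (|g| from (8.3.24), value of the parenthesis 1 - 2GM/(c^2 r))."""
    correction = 1.0 - 2.0 * G * M / (c * c * r)
    return G * M / (r * r) * correction ** -0.5, correction

g_value, correction = g_magnitude(M_earth, r_earth)
print(g_value)            # close to 9.8 m/s^2
print(1.0 - correction)   # of order 1e-9, so the relativistic factor is negligible
```

With these rounded inputs the relativistic correction is of order $10^{-9}$, so the result is indistinguishable from the Newtonian value $GM_\oplus/r_\oplus^2$ at the quoted precision.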

Therefore, we say that there exists a gravitational field $-A^a = -\nabla^a \ln\chi$ in a stationary spacetime, which is the general relativity formulation for the Earth’s gravitational field $\vec{g}$.
However, a new question arises: there is no stationary observer in a non-stationary spacetime,
and gravity in the above sense does not exist, so how do we interpret the statement “a curved
spacetime must have gravity”? As we have mentioned, as long as the spacetime is curved,
there will be a geodesic deviation effect (tidal effect), which can be referred to as a relative
gravitational effect. This effect is inherent to curved spacetime, which is different from the
gravitational effect in the former sense. (That is, in a stationary spacetime one can always
eliminate gravity in the first sense by choosing a freely falling elevator, but one cannot
eliminate the tidal effect). In fact, the geodesic deviation effect is a common property that
all the curved spacetimes share. When saying “a curved spacetime must have gravity”, for a
non-stationary spacetime this is referring to the relative gravity (tidal effect) between freely
falling bodies. When the spacetime curvature is everywhere vanishing, gravity in either sense
does not exist, and therefore we say “there is no gravity without curved spacetime”.
[The End of Optional Reading 8.3.1]
Fig. 8.8 A freely falling apple $G$ has a 3-acceleration $a^a = -A^a$ relative to the stationary observer $G_s$

[Optional Reading 8.3.2]


Suppose $\{t, r, \theta, \varphi\}$ is a Schwarzschild coordinate system outside an isolated static spherically
symmetric star, and $G$ is a static observer outside the star, whose spatial coordinate
values are $r = r_G$, $\theta = \pi/2$ and $\varphi = 0$; $G'$ is a free observer undergoing circular motion
around the star due to the star’s gravitational field, whose $\theta$ value is always $\pi/2$ (always right
above the star’s equator). At the beginning ($\tau = 0$), the world lines of $G$ and $G'$ intersect
at $p$, and they intersect again at $q$ after $G'$ goes around the star (see Fig. 8.9). It is easy to
see from (8.3.18) that the contributions from $dt$ to the line elements of $G$ and $G'$ are equal,
while the line element of $G'$ has another contribution from $d\varphi$, namely $r^2 \sin^2\theta\, d\varphi^2$, which
has an opposite sign. Hence, the curve $G'$ between $p$ and $q$ is shorter than $G$. How could a
timelike geodesic $G'$ be shorter than a non-geodesic $G$? First of all, “the length of a timelike
geodesic is a maximum” is talking about the comparison among the infinitesimally nearby
timelike curves (“local maximum”), while $G$ and $G'$ are not nearby. Secondly, there does
exist a timelike curve that is infinitesimally close to $G'$ and longer than $G'$, which is not
surprising since the necessary and sufficient condition for “the length of a timelike geodesic
is a maximum” to hold is that there does not exist a pair of conjugate points on the curve, and
$G'$ does not satisfy this condition. In fact, we can believe that (one can prove this based on the
definition of conjugate points in Optional Reading 7.6.3) there exist infinitely many pairs of
conjugate points on $G'$ [two points satisfying $\varphi = \varphi_0$ and $\varphi = \varphi_0 + \pi$ (where $0 \leq \varphi_0 < \pi$)
make a pair]. A timelike geodesic between $p$ and $q$ without conjugate points corresponds
to the following physical situation: suppose a free observer $G''$ is projected straight up with
some initial speed at an event $p$ and falls freely (follows a radial geodesic), and then meets
the curve $G$ again at $q$ (see Fig. 8.9), then its world line is at least a local maximum due to
the nonexistence of conjugate points. It is actually longer than both $G$ and $G'$. [Note that
from (8.3.18) one can tell that the argument for $G'$ being shorter than $G$ does not apply to
$G''$ since its $r$-value does not equal the $r$-value of $G$ and so the contributions from $dt$ to these
line elements are not equal].
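The comparison between the static observer and the circular geodesic can be made quantitative with a short numerical sketch. It works in geometrized units ($G = c = 1$) and assumes the standard result, not derived in this section, that a circular geodesic at radius $r$ has coordinate angular velocity $d\varphi/dt = \sqrt{M/r^3}$; the function name is hypothetical.

```python
import math


def proper_time_per_orbit(M, r):
    """Proper times between p and q for the static observer and the circular geodesic.

    Geometrized units (G = c = 1). Assumes d(phi)/dt = sqrt(M / r**3) for the
    circular geodesic, a standard result that is not derived in this section.
    """
    if r <= 3 * M:
        raise ValueError("timelike circular geodesics require r > 3M")
    omega = math.sqrt(M / r**3)       # coordinate angular velocity of the orbit
    delta_t = 2 * math.pi / omega     # coordinate time for one revolution
    # Static observer: d(tau)^2 = (1 - 2M/r) dt^2
    tau_static = math.sqrt(1 - 2 * M / r) * delta_t
    # Circular geodesic at theta = pi/2: d(tau)^2 = (1 - 2M/r) dt^2 - r^2 d(phi)^2
    tau_circular = math.sqrt(1 - 2 * M / r - r**2 * omega**2) * delta_t
    return tau_static, tau_circular


tau_static, tau_circular = proper_time_per_orbit(M=1.0, r=10.0)
print("static observer  :", tau_static)
print("circular geodesic:", tau_circular)
print("geodesic is shorter:", tau_circular < tau_static)
```

Under this assumption the ratio of the two proper times is $\sqrt{(1 - 3M/r)/(1 - 2M/r)} < 1$, so the circular geodesic is indeed the shorter curve.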
[The End of Optional Reading 8.3.2]

8.3.3 Birkhoff’s Theorem

Schwarzschild showed that the static spherically symmetric solution to the vacuum
Einstein equation is the Schwarzschild solution, as we have introduced above. Later
it was found that the static condition can actually be removed, because in 1923
G. D. Birkhoff proved the following theorem: a spherically symmetric solution to
the vacuum Einstein equation must be static. Here we briefly sketch the idea of the proof.

Fig. 8.9 Observers $G$, $G'$ and $G''$ part from each other at an event $p$ and reunite at $q$. $G'$ and $G''$ are geodesics, and their arc lengths have the relation $l_{G'} < l_G < l_{G''}$. (a) Spatial diagram (b) Spacetime diagram

The general form of a static spherically symmetric line element is (8.3.2). If
one removes the static condition, the expression for the line element will not be as
simple, for example the coefficient of the cross term dtdr will be nonzero. However,
by an appropriate coordinate transformation, one can change the line element to the
same form as (8.3.2), and the only difference is that the functions of one variable
$A(r)$ and $B(r)$ now become functions of two variables $A(t, r)$ and $B(t, r)$. Let $A'$,
$B'$, $\dot A$ and $\dot B$ represent $\partial A/\partial r$, $\partial B/\partial r$, $\partial A/\partial t$ and $\partial B/\partial t$, respectively. Through a
procedure which is slightly more complicated than the computation in Sect. 8.3.2
[see Carmeli (1982); Stephani (1982)], we will still obtain the Schwarzschild line
element (8.3.18).
Birkhoff’s theorem is a powerful theorem, which asserts that as long as a non-static
matter distribution remains spherically symmetric (such as a star that is sharply
contracting, expanding, oscillating, or even exploding in the radial direction), the
external spacetime geometry will still be described by the vacuum Schwarzschild
solution. This provides great convenience for the study of stellar evolution (see Sects.
9.3 and 9.4).
Birkhoff’s theorem is very similar to the following theorem in electrodynamics:
the electromagnetic field of a spherically symmetric charge distribution (i.e., a spher-
ically symmetric solution to the vacuum Maxwell equations) must be an electrostatic
field. An electromagnetic wave is the propagation of a time-dependent electromag-
netic field in space, and “a spherically symmetric electromagnetic field must be an
electrostatic field” indicates that there does not exist any spherically symmetric elec-
tromagnetic wave. (A spherical electromagnetic wave is an electromagnetic wave
whose wavefront is a sphere; its electromagnetic field does not have spherical sym-
metry, and thus it is not a spherically symmetric electromagnetic wave). Similarly,
since a gravitational wave will not appear in a stationary gravitational field (station-
ary means time-independent), Birkhoff’s theorem indicates that there does not exist
any spherically symmetric gravitational wave. Noticing that spherically symmetric
radiation is monopole radiation, an equivalent statement of the conclusion above is:
there does not exist monopole electromagnetic or gravitational radiation. The major
contribution of electromagnetic radiation comes from dipole radiation. In contrast,
from Sect. 7.9 we can see that for gravity there exists neither monopole radiation
nor dipole radiation. The major contribution of gravitational radiation comes from
quadrupole radiation. Table 8.1 provides a comparison between these two kinds of
radiation.
Later, it was found that the original formulation by Birkhoff was not precise
enough. The revised Birkhoff’s theorem can be formulated as follows: a spherically
symmetric solution to the vacuum Einstein equation must be the Schwarzschild
metric. The difference between this revised version and the original version is that
the extended Schwarzschild metric will be non-stationary in some spacetime region,
see Sect. 9.4.3 for details. The original Birkhoff’s theorem was first challenged by
A. Z. Petrov in 1963 [see Stephani et al. (2003) p. 232 and the references therein].
For a proof of the revised Birkhoff’s theorem, see Appendix B of Hawking and Ellis
(1973). Kuang and Liang (1988) further generalized this theorem by weakening the
spherical symmetry condition to “conformally spherical symmetry”. The definition
of the term “conformal” will be introduced in Sect. 12.1 (Volume II).

8.4 The Reissner-Nordström Solution

8.4.1 Electrovacuum Spacetimes and the Einstein-Maxwell Equations

The Schwarzschild metric describes the curved spacetime (vacuum) outside a static
spherically symmetric star. Many actual stars (or celestial bodies) carry electric
charges, and their exterior spacetime is not vacuum but filled with an electromag-
netic field. A spacetime with only an electromagnetic field but without a matter
field is called an electrovacuum (or electrovac for short) spacetime. The $T_{ab}$ in the
electrovacuum Einstein equation $G_{ab} = 8\pi T_{ab}$ is the energy-momentum tensor for
some electromagnetic field $F_{ab}$ (we will only talk about source-free electromagnetic
fields), i.e.,
$$T_{ab} = \frac{1}{4\pi}\left(F_{ac}F_b{}^c - \frac{1}{4}g_{ab}F_{cd}F^{cd}\right) . \tag{8.4.1}$$
Hence, the electrovacuum Einstein equation can also be expressed as
$$G_{ab} \equiv R_{ab} - \frac{1}{2}Rg_{ab} = 2\left(F_{ac}F_b{}^c - \frac{1}{4}g_{ab}F_{cd}F^{cd}\right) , \tag{8.4.2}$$

Table 8.1 Comparative table for gravitational radiation and electromagnetic radiation

                            Monopole radiation   Dipole radiation   Quadrupole radiation
Electromagnetic radiation   Nonexistent          Exists (major)     Exists
Gravitational radiation     Nonexistent          Nonexistent        Exists (major)

where $F_{ab}$ satisfies the source-free Maxwell equations in curved spacetime
$$\nabla^a F_{ab} = 0 , \tag{8.4.3a}$$
$$\nabla_{[a}F_{bc]} = 0 . \tag{8.4.3b}$$

Here $\nabla_a$ is the derivative operator associated with the metric $g_{ab}$, and $g_{ab}$ must satisfy
(8.4.2). Thus, an electrovacuum spacetime is determined by three ingredients: a
background manifold $M$, a metric field $g_{ab}$ and an electromagnetic field $F_{ab}$, among
which $g_{ab}$ and $F_{ab}$ are the solutions of the simultaneous equations formed by (8.4.2)
and (8.4.3). This system of equations is called the Einstein-Maxwell equations. It
is easy to show from (8.4.1) that (Exercise 8.4) the trace of the energy-momentum
tensor $T_{ab}$ of the electromagnetic field is $T \equiv g^{ab}T_{ab} = 0$, and hence from Einstein’s
equation $R_{ab} - \frac{1}{2}Rg_{ab} = 8\pi T_{ab}$ one can easily see that (Exercise 8.4) the scalar
curvature $R = 0$. Therefore, the electrovacuum Einstein equation can be simplified
as
$$R_{ab} = 8\pi T_{ab} . \tag{8.4.4}$$

Based on their physical properties, electromagnetic fields $F_{ab}$ can be classified
into null electromagnetic fields and nonnull electromagnetic fields. Define a complex
tensor field
$$\mathcal{F}_{ab} := F_{ab} + \mathrm{i}\,{}^*F_{ab} , \tag{8.4.5}$$
where ${}^*F_{ab}$ is the Hodge dual of $F_{ab}$. $F_{ab}$ is called a null electromagnetic field if
$$\mathcal{F}_{ab}\mathcal{F}^{ab} = 0 , \tag{8.4.6}$$
otherwise it is called a nonnull electromagnetic field. It is easy to show that (Exercise 8.5)
$$\mathcal{F}_{ab}\mathcal{F}^{ab} = 2(F_{ab}F^{ab} + \mathrm{i}F_{ab}{}^*F^{ab}) , \tag{8.4.7}$$
and thus the null condition (8.4.6) of an electromagnetic field is equivalent to
$$F_{ab}F^{ab} = 0 , \tag{8.4.8a}$$
and
$$F_{ab}{}^*F^{ab} = 0 . \tag{8.4.8b}$$
The electric field and magnetic field measured by an instantaneous observer $(p, Z^a)$
at a point $p$ are by definition $E_a := F_{ab}Z^b$ and $B_a := -{}^*F_{ab}Z^b$ (see Sect. 6.1.1), from
which one can show that (see Exercise 6.15)
$$F_{ab}F^{ab} = 2(B^2 - E^2) , \tag{8.4.9}$$
$$F_{ab}{}^*F^{ab} = 4\vec E \cdot \vec B \equiv 4g^{ab}E_aB_b . \tag{8.4.10}$$

Thus, although both $\vec E$ and $\vec B$ depend on the observer, $B^2 - E^2$ and $\vec E \cdot \vec B$ are two
invariants (i.e., scalar fields). (In fact, these are the only two independent invariants
that one can construct out of $\vec E$ and $\vec B$). The two equations above indicate that (8.4.8)
is equivalent to
$$B^2 = E^2 , \tag{8.4.11a}$$
$$\vec E \cdot \vec B = 0 . \tag{8.4.11b}$$
These two equations indicate that the $\vec E$ and $\vec B$ measured by an instantaneous observer
are orthogonal and have the same magnitude, which are exactly the two basic properties
of an electromagnetic plane wave in Minkowski spacetime. It can be proved
(see Appendix D in Volume II) that if in an arbitrary spacetime there exists
a null electromagnetic field $F_{ab}$ whose energy-momentum tensor is $T_{ab}$, then the
4-momentum density $W^a \equiv -T^a{}_b Z^b$ of $F_{ab}$ (see Sect. 6.4) measured by an
instantaneous observer $(p, Z^a)$ is a future-directed null vector.
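The two invariants (8.4.9) and (8.4.10) can be checked numerically in Minkowski spacetime. The sketch below is restricted to an inertial observer with $Z^a = (\partial/\partial t)^a$ and assumes the orientation convention $\varepsilon_{0123} = +1$; under these assumptions $E_a = F_{ab}Z^b$ and $B_a = -{}^*F_{ab}Z^b$ give $F_{i0} = E_i$ and $F_{jk} = \varepsilon_{jkl}B_l$. All function names are hypothetical.

```python
ETA = (-1.0, 1.0, 1.0, 1.0)  # Minkowski metric diag(-1, 1, 1, 1)


def levi_civita(a, b, c, d):
    """Totally antisymmetric symbol with eps_{0123} = +1 (assumed convention)."""
    idx = [a, b, c, d]
    if len(set(idx)) < 4:
        return 0
    sign = 1
    for i in range(4):
        while idx[i] != i:          # sort by transpositions, tracking the parity
            j = idx[i]
            idx[i], idx[j] = idx[j], idx[i]
            sign = -sign
    return sign


def field_tensor(E, B):
    """Covariant F_{ab} for the observer Z = d/dt: F_{i0} = E_i, F_{jk} = eps_{jkl} B_l."""
    F = [[0.0] * 4 for _ in range(4)]
    for i in range(3):
        F[i + 1][0] = E[i]
        F[0][i + 1] = -E[i]
    F[1][2], F[2][3], F[3][1] = B[2], B[0], B[1]
    F[2][1], F[3][2], F[1][3] = -B[2], -B[0], -B[1]
    return F


def raise_both(T):
    """Raise both indices with the diagonal Minkowski metric."""
    return [[ETA[a] * ETA[b] * T[a][b] for b in range(4)] for a in range(4)]


def hodge_dual(F):
    """(*F)_{ab} = (1/2) eps_{abcd} F^{cd}."""
    F_up = raise_both(F)
    return [[0.5 * sum(levi_civita(a, b, c, d) * F_up[c][d]
                       for c in range(4) for d in range(4))
             for b in range(4)] for a in range(4)]


def invariants(E, B):
    """Return the pair (F_ab F^ab, F_ab *F^ab)."""
    F = field_tensor(E, B)
    F_up = raise_both(F)
    dual_up = raise_both(hodge_dual(F))
    first = sum(F[a][b] * F_up[a][b] for a in range(4) for b in range(4))
    second = sum(F[a][b] * dual_up[a][b] for a in range(4) for b in range(4))
    return first, second


# A generic field, then a plane-wave-like field with |E| = |B| and E orthogonal to B.
print(invariants((1.0, 2.0, 3.0), (0.5, -1.0, 2.0)))
print(invariants((1.0, 0.0, 0.0), (0.0, 1.0, 0.0)))
```

For the second field both invariants vanish, which is the null condition (8.4.11).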

8.4.2 The Reissner-Nordström Solution

Now we will solve the Einstein-Maxwell equations of a static spherically symmetric
star. According to the discussion in Sect. 8.3.1, in the static spherically symmetric
case one can choose a coordinate system $\{x^\mu\} \equiv \{t, r, \theta, \varphi\}$ adapted to two geometric
properties (staticity and spherical symmetry) of the metric and express the line
element in the following simple form [i.e., (8.3.2)]:
$$ds^2 = -e^{2\alpha(r)}dt^2 + e^{2\beta(r)}dr^2 + r^2(d\theta^2 + \sin^2\theta\, d\varphi^2) . \tag{8.4.12}$$

[The $A(r)$ and $B(r)$ in (8.3.2) may be confused with the 4-potential $A_a$ and the
magnetic field $\vec B$, and hence we now denote them by $\alpha(r)$ and $\beta(r)$]. This coordinate
system can not only simplify the line element, but also simplify the components of
the electromagnetic field. The electromagnetic field $F_{ab}$ produced by a charged static
spherically symmetric star is also static and spherically symmetric. The components
$A_\mu$ of its electromagnetic 4-potential $A_a$ are independent of the coordinates $t, \theta, \varphi$,
and there is no component tangent to the orbit sphere, i.e., $A_2 = A_3 = 0$. Note that $A_a$
has a gauge freedom: suppose $\chi$ is an arbitrary function of $r$, then $\tilde A_a = A_a + \nabla_a\chi$
and $A_a$ correspond to the same $F_{ab}$. From this equation we get
$$\tilde A_1 = (\partial/\partial r)^a(A_a + \nabla_a\chi) = A_1 + \partial\chi/\partial r .$$
Thus, for any given $A_a$ one can always choose a suitable $\chi(r)$ such that $\tilde A_1 = 0$, and
hence $A_0$ can be regarded as the only component of $A_a$. Also from
$$F_{\mu\nu} = 2\partial_{[\mu}A_{\nu]} = \partial_\mu A_\nu - \partial_\nu A_\mu$$



we can see that the only nonvanishing $F_{\mu\nu}$ are
$$-F_{01} = F_{10} = \partial_1 A_0 = \frac{dA_0}{dr} , \tag{8.4.13}$$
i.e., $F_{ab}$ only has one independent component $F_{01}$, whose expression can be obtained
by solving Maxwell’s equations (8.4.3). Equation (8.4.3b) is automatically satisfied
since it follows from $F = dA$ that $dF = d(dA) = 0$. The coordinate component
form of (8.4.3a) reads
$$F^{\mu\nu}{}_{;\mu} = 0 , \quad \nu = 0, 1, 2, 3 . \tag{8.4.14}$$

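Using a similar way of deriving (3.4.26) we get
$$F^{\mu\nu}{}_{;\mu} = \frac{1}{\sqrt{-g}}\frac{\partial}{\partial x^\mu}\left(\sqrt{-g}\,F^{\mu\nu}\right) + \Gamma^\nu{}_{\sigma\mu}F^{\mu\sigma} = \frac{1}{\sqrt{-g}}\frac{\partial}{\partial x^\mu}\left(\sqrt{-g}\,F^{\mu\nu}\right) , \tag{8.4.15}$$
where the second term vanishes because $\Gamma^\nu{}_{\sigma\mu}$ is symmetric in $\sigma\mu$ while $F^{\mu\sigma}$ is antisymmetric,
and it follows from (8.4.13) and (8.4.12) that the only nonvanishing $\sqrt{-g}\,F^{\mu\nu}$ are
$\sqrt{-g}\,F^{01} = -\sqrt{-g}\,F^{10} = r^2F_{10}e^{-(\alpha+\beta)}\sin\theta$. Hence, when $\nu = 1, 2, 3$, (8.4.14) are
identities, and when $\nu = 0$ it gives
$$\frac{d}{dr}\left[r^2F_{10}(r)e^{-\alpha(r)-\beta(r)}\right] = 0 ,$$
whose general solution is
$$F_{10} = \frac{Q}{r^2}e^{\alpha+\beta} , \quad \text{where } Q = \text{constant}. \tag{8.4.16}$$
So far, an electromagnetic field $F_{ab}$ satisfying Maxwell’s equations has the following
expression:
$$F_{ab} = -\frac{Q}{r^2}e^{\alpha+\beta}(dt)_a \wedge (dr)_b . \tag{8.4.17}$$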
The equation above still contains undetermined functions $\alpha(r)$ and $\beta(r)$, which
should be obtained from Einstein’s equation (8.4.4). From the very beginning, we
have two sets of undetermined functions, namely $\{F_{\mu\nu}(r)\}$ and $\{\alpha(r), \beta(r)\}$. Do not
naively think that the former only appears in Maxwell’s equations and the latter only
appears in Einstein’s equation, so that they can be solved independently. In truth, both
of them appear in both sets of equations, and thus the Einstein-Maxwell equations
are coupled equations, which means they are interdependent. Now we
will solve Einstein’s equation $R_{ab} = 8\pi T_{ab}$. In order to do so, first we compute the
energy-momentum tensor $T_{ab}$ of $F_{ab}$. It follows from (8.4.1) and (8.4.12) that the
nonvanishing coordinate components of $T_{ab}$ are
$$\begin{aligned} T_{00} &= F_{10}^2e^{-2\beta}/8\pi , & T_{11} &= -F_{10}^2e^{-2\alpha}/8\pi , \\ T_{22} &= r^2F_{10}^2e^{-2(\alpha+\beta)}/8\pi , & T_{33} &= r^2F_{10}^2e^{-2(\alpha+\beta)}\sin^2\theta/8\pi . \end{aligned} \tag{8.4.18}$$
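The components (8.4.18), together with the vanishing of the trace $T = g^{ab}T_{ab}$, can be checked numerically. The sketch below evaluates (8.4.1) for a diagonal metric at one point; the sample values of $\alpha$, $\beta$, $r$, $\theta$ and $F_{10}$ are arbitrary, and the helper name `em_stress_tensor` is our own.

```python
import math


def em_stress_tensor(F, g_diag):
    """T_{ab} from (8.4.1) for a diagonal metric; F holds the covariant F_{ab}."""
    n = 4
    g_inv = [1.0 / g for g in g_diag]
    # F_{cd} F^{cd} = sum over c, d of g^{cc} g^{dd} (F_{cd})^2 for a diagonal metric
    F2 = sum(g_inv[c] * g_inv[d] * F[c][d] ** 2 for c in range(n) for d in range(n))
    T = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            # F_{ac} F_b{}^c = sum over c of g^{cc} F_{ac} F_{bc}
            first = sum(g_inv[c] * F[a][c] * F[b][c] for c in range(n))
            g_ab = g_diag[a] if a == b else 0.0
            T[a][b] = (first - 0.25 * g_ab * F2) / (4 * math.pi)
    return T


# Arbitrary sample values, used only to test the algebra at a single point.
alpha, beta, r, theta, F10 = 0.3, -0.1, 2.0, 1.0, 0.7
g_diag = [-math.exp(2 * alpha), math.exp(2 * beta), r**2, (r * math.sin(theta)) ** 2]
F = [[0.0] * 4 for _ in range(4)]
F[1][0], F[0][1] = F10, -F10

T = em_stress_tensor(F, g_diag)
expected = [
    F10**2 * math.exp(-2 * beta) / (8 * math.pi),
    -F10**2 * math.exp(-2 * alpha) / (8 * math.pi),
    r**2 * F10**2 * math.exp(-2 * (alpha + beta)) / (8 * math.pi),
    r**2 * F10**2 * math.exp(-2 * (alpha + beta)) * math.sin(theta) ** 2 / (8 * math.pi),
]
trace = sum(T[a][a] / g_diag[a] for a in range(4))
print("diagonal components match (8.4.18):",
      all(math.isclose(T[a][a], expected[a], rel_tol=1e-12) for a in range(4)))
print("trace:", trace)
```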

On the other hand, the expressions for the nonvanishing coordinate components $R_{\mu\nu}$
of the Ricci tensor $R_{ab}$ are given by (8.3.5)–(8.3.8), and hence the component equations
for Einstein’s equation (8.4.4), $R_{00} = 8\pi T_{00}$ and $R_{11} = 8\pi T_{11}$, are equivalent
to
$$-e^{2(\alpha-\beta)}\left(-\alpha'' + \alpha'\beta' - \alpha'^2 - 2r^{-1}\alpha'\right) = F_{10}^2e^{-2\beta} , \tag{8.4.19}$$
$$-\alpha'' + \alpha'\beta' - \alpha'^2 + 2r^{-1}\beta' = -F_{10}^2e^{-2\alpha} . \tag{8.4.20}$$
We can easily get from the two equations above that $\alpha' = -\beta'$, which is the same as
(8.3.12) in the process of finding the Schwarzschild solution; hence, here we can also
set $\alpha = -\beta$ by redefining $t$. Under this premise, we can see from (8.4.16) that the
remaining two component equations $R_{22} = 8\pi T_{22}$ and $R_{33} = 8\pi T_{33}$ are equivalent
to
$$(re^{2\alpha})' = 1 - \frac{Q^2}{r^2} .$$
Hence,
$$e^{2\alpha} = 1 + \frac{Q^2}{r^2} + \frac{C}{r} , \tag{8.4.21}$$
and thus
$$e^{2\beta} = \left(1 + \frac{Q^2}{r^2} + \frac{C}{r}\right)^{-1} . \tag{8.4.22}$$

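Plugging into (8.4.12) yields the spacetime line element
$$ds^2 = -\left(1 + \frac{Q^2}{r^2} + \frac{C}{r}\right)dt^2 + \left(1 + \frac{Q^2}{r^2} + \frac{C}{r}\right)^{-1}dr^2 + r^2(d\theta^2 + \sin^2\theta\, d\varphi^2) , \tag{8.4.23}$$
and plugging $\alpha = -\beta$ into (8.4.16) yields
$$F_{10} = \frac{Q}{r^2} . \tag{8.4.24}$$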
One can now check that these expressions for $\alpha$, $\beta$ and $F_{10}$ do satisfy (8.4.19) and
(8.4.20). When $r$ is sufficiently large, $Q^2/r^2 \ll |C|/r$, and hence (8.4.23) becomes
approximately
$$ds^2 \cong -\left(1 + \frac{C}{r}\right)dt^2 + \left(1 + \frac{C}{r}\right)^{-1}dr^2 + r^2(d\theta^2 + \sin^2\theta\, d\varphi^2) . \tag{8.4.25}$$
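The check of (8.4.19) and (8.4.20) mentioned above can be sketched numerically. With $f \equiv e^{2\alpha} = e^{-2\beta}$ taken from (8.4.21), the derivatives of $\alpha = \frac{1}{2}\ln f$ are written out analytically and the residuals of both equations, as well as of $(re^{2\alpha})' = 1 - Q^2/r^2$, are evaluated at a sample point. The function name and the sample values are our own.

```python
def rn_residuals(r, Q, C):
    """Residuals of (8.4.19), (8.4.20) and of (r e^{2 alpha})' = 1 - Q^2/r^2."""
    f = 1 + Q**2 / r**2 + C / r              # e^{2 alpha} = e^{-2 beta}, from (8.4.21)
    f1 = -2 * Q**2 / r**3 - C / r**2         # df/dr
    f2 = 6 * Q**2 / r**4 + 2 * C / r**3      # d^2 f / dr^2
    a1 = f1 / (2 * f)                        # alpha'   (alpha = ln(f) / 2)
    a2 = f2 / (2 * f) - f1**2 / (2 * f**2)   # alpha''
    b1 = -a1                                 # beta' = -alpha'
    F10 = Q / r**2                           # (8.4.24)
    e2a, e2b = f, 1 / f
    res_19 = -(e2a / e2b) * (-a2 + a1 * b1 - a1**2 - 2 * a1 / r) - F10**2 / e2b
    res_20 = (-a2 + a1 * b1 - a1**2 + 2 * b1 / r) + F10**2 / e2a
    res_ode = (f + r * f1) - (1 - Q**2 / r**2)
    return res_19, res_20, res_ode


# Sample point with C = -2M and M = 1; all three residuals should be at rounding level.
print(rn_residuals(r=5.0, Q=0.5, C=-2.0))
```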

From the physical perspective, when $r$ is sufficiently large, the gravitational field
of a charged spherically symmetric star should approximately obey Newton’s theory
of gravity, and the spacetime metric should be approximately the same as the
Schwarzschild metric, and thus $C = -2M$. On the other hand, the star can be viewed
as a point charge when $r$ is sufficiently large, and the $F_{10}$ it produces should be equal
to its electric charge divided by $r^2$, and hence from (8.4.24) we can see that the physical
meaning of the constant $Q$ is the electric charge of the star. Therefore, ultimately
(8.4.23) can be written as
$$ds^2 = -\left(1 - \frac{2M}{r} + \frac{Q^2}{r^2}\right)dt^2 + \left(1 - \frac{2M}{r} + \frac{Q^2}{r^2}\right)^{-1}dr^2 + r^2(d\theta^2 + \sin^2\theta\, d\varphi^2) , \tag{8.4.26}$$
which is called the Reissner-Nordström line element (or RN line element for
short). It describes the exterior spacetime geometry of a static spherically symmetric
star (object) with a mass $M$ and electric charge $Q$, whose corresponding electromagnetic
field $F_{ab}$ and 4-potential $A_a$ are
$$F_{ab} = -\frac{Q}{r^2}(dt)_a \wedge (dr)_b , \qquad A_a = -\frac{Q}{r}(dt)_a . \tag{8.4.27}$$
The metric $g_{ab}$ expressed by (8.4.26) together with the electromagnetic field expressed
by (8.4.27) form the RN solution of the Einstein-Maxwell equations.
Now let us have some discussion on the electromagnetic field of the RN solution.
From (8.4.27) we can easily obtain $F_{ab}F^{ab} = -2Q^2/r^4 \neq 0$, and thus the $F_{ab}$ of RN
spacetime is a nonnull electromagnetic field. People always say that the electromagnetic
field of RN spacetime is an electrostatic field. To understand this statement, one
should notice that an observer needs to be specified when talking about an electric
field and magnetic field. Now we will show that the electric field and magnetic field
for the $F_{ab}$ of an RN solution measured by a static observer $G$ are, respectively, an
electrostatic field and zero. The 4-velocity of $G$ is
$$Z^a = f^{-1/2}(\partial/\partial t)^a \quad [\text{where } f \equiv 1 - (2M/r) + Q^2/r^2] .$$
Normalizing the dual coordinate basis vectors $(dr)_a$, $(d\theta)_a$, $(d\varphi)_a$, we have the
orthonormal spatial triad of $G$:
$$(e^1)_a = f^{-1/2}(dr)_a , \quad (e^2)_a = r(d\theta)_a , \quad (e^3)_a = r\sin\theta\,(d\varphi)_a .$$
It is easy to show that (Exercise 8.5) the electric field $E_a \equiv F_{ab}Z^b$ and magnetic
field $B_a \equiv -{}^*F_{ab}Z^b$ measured by $G$ are $E_a = \frac{Q}{r^2}(e^1)_a$ and $B_a = 0$, or
$$E^a = \frac{Q}{r^2}(e_1)^a , \quad B^a = 0 \quad [\text{where } (e_1)^a \equiv f^{1/2}(\partial/\partial r)^a] . \tag{8.4.28}$$
Thus, the result of $F_{ab}$ measured by a static observer in RN spacetime is an
electrostatic field generated by a point charge $Q$ and with no magnetic field, which also
confirms the fact that $F_{ab}$ is nonnull.⁴
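The measurement by a static observer can be verified component by component. The sketch below assumes the static observer’s 4-velocity is normalized, $g_{ab}Z^aZ^b = -1$ (so that $Z^t = f^{-1/2}$), and uses only the $t$ and $r$ components of the RN metric (8.4.26) and of $F_{ab}$ in (8.4.27); the function name and sample values are our own.

```python
import math


def rn_static_observer(M, Q, r):
    """Return (g_ab Z^a Z^b, |E|, F_ab F^ab) for a static observer in RN spacetime."""
    f = 1 - 2 * M / r + Q**2 / r**2
    if f <= 0:
        raise ValueError("static observers require f(r) > 0")
    g_tt, g_rr = -f, 1 / f
    F_tr = -Q / r**2                 # F_{01}, from (8.4.27)
    F_rt = -F_tr                     # F_{10}
    Z_t = f ** -0.5                  # Z^t, fixed by the normalization of Z^a
    norm_Z = g_tt * Z_t**2           # should equal -1
    E_r = F_rt * Z_t                 # E_r = F_{r t} Z^t
    E_magnitude = math.sqrt(E_r**2 / g_rr)   # sqrt(g^{rr} E_r E_r)
    FF = 2 * F_tr**2 / (g_tt * g_rr)         # F_ab F^ab = 2 g^{tt} g^{rr} (F_{tr})^2
    return norm_Z, E_magnitude, FF


norm_Z, E_magnitude, FF = rn_static_observer(M=1.0, Q=0.5, r=4.0)
print("g(Z, Z)   =", norm_Z)
print("|E|       =", E_magnitude, " vs Q/r^2 =", 0.5 / 4.0**2)
print("F_ab F^ab =", FF, " vs -2Q^2/r^4 =", -2 * 0.5**2 / 4.0**4)
```

The magnitude of the measured electric field is $Q/r^2$ regardless of $M$, in agreement with (8.4.28).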

⁴ In Volume II we will introduce the electromagnetic duality transformation, which only changes
the formulation but does not change the essence of the physics. For instance, one can either say that
a charged static star carries electric charge but no magnetic charge, or say that it carries magnetic
charge but no electric charge, or even say that it has both electric and magnetic charges (and the
amount is flexible, as long as the sum of the squares of them is invariant). When we discuss the RN
solution in this section we adopt the most common formulation, i.e., the star carries only electric
charge but no magnetic charge, and the corresponding electromagnetic field has only an electrostatic
field but no magnetic field.

If we do not assume that the metric is static, i.e., we change the $\alpha(r)$ and $\beta(r)$
in (8.4.12) to $\alpha(t, r)$ and $\beta(t, r)$, then we will arrive at exactly the same result as
we obtained above. [For details of the derivation, see Carmeli (1982)]. This can be
regarded as a generalization of Birkhoff’s theorem: the electrovacuum spherically
symmetric solution to Einstein’s equation must be the RN solution.

8.5 Axisymmetric Metrics [Optional Reading]

Many celestial bodies also have rotation. Due to the rotation, the symmetry of a spherically
symmetric star will be degraded to axial symmetry. Moreover, an axisymmetric matter
distribution will have axial symmetry whether or not it has any rotation with respect to the
axis. Mathematically speaking, a metric $g_{ab}$ is said to be axisymmetric if there exists a
one-parameter group of isometries whose orbits (except for the fixed points) are closed spacelike
curves. Thus, in an axisymmetric spacetime there exists a spatial Killing vector field $\psi^a$
whose integral curves are closed curves. An axisymmetric metric $g_{ab}$ is said to be stationary
axisymmetric if it has a timelike Killing field $\xi^a$, and $\xi^a$ commutes with the Killing field $\psi^a$
which represents the axial symmetry:
$$[\xi, \psi]^a = 0 . \tag{8.5.1}$$
Using this commutativity we can choose a coordinate system $\{x^0 \equiv t, x^1 \equiv \varphi, x^2, x^3\}$ such
that $\xi^a = (\partial/\partial t)^a$, $\psi^a = (\partial/\partial\varphi)^a$. Suppose $g_{\mu\nu}$ are the components of $g_{ab}$ in this system,
then it follows from (4.2.3) and the fact that $\xi^a$ and $\psi^a$ are Killing that
$$\frac{\partial g_{\mu\nu}}{\partial t} = (\mathcal{L}_\xi g)_{\mu\nu} = 0 , \qquad \frac{\partial g_{\mu\nu}}{\partial\varphi} = (\mathcal{L}_\psi g)_{\mu\nu} = 0 , \tag{8.5.2}$$
and hence $g_{\mu\nu}$ can only be functions of $x^2$ and $x^3$. In order to further simplify the solving
process, here we only discuss the stationary axisymmetric metrics satisfying the following
condition: $\forall p \in M$, $\exists$ a 2-dimensional surface $S$ passing through $p$ and orthogonal to both
$\xi^a|_p$ and $\psi^a|_p$. That is, for any vector $u^a$ at $p$ that is tangent to $S$ we have $g_{ab}u^a\xi^b|_p =
g_{ab}u^a\psi^b|_p = 0$. (Note that since a 2-dimensional surface in a 4-dimensional spacetime is
not a hypersurface, it has more than one linearly independent normal vector, see Fig. 8.10).
Many important stationary axisymmetric metrics satisfy this condition. Choose an arbitrary
coordinate system $\{x^2, x^3\}$ on an orthogonal surface $S_0$, carry $x^2$ and $x^3$ to any point outside
$S_0$ using the integral curves of $\xi^a$ and $\psi^a$ (i.e., set the $x^2$ and $x^3$ on each integral curve
as constants), and set the zeros of the Killing parameters $t$ and $\varphi$ such that $t$ and $\varphi$ are
constants on each orthogonal surface $S$ (from a proposition similar to Proposition 8.1.1
we can see that this is always possible). In this way we obtain a local coordinate system
$\{x^0 \equiv t, x^1 \equiv \varphi, x^2, x^3\}$, where the coordinate lines of $x^0$ and $x^1$ are the integral curves of
$\xi^a$ and $\psi^a$, respectively, while the coordinate lines of $x^2$ and $x^3$ lie on the orthogonal surface
$S$. Thus, the components $g_{\mu\nu}$ of $g_{ab}$ in this system satisfy
$$g_{02} = g_{20} = g_{ab}\xi^a(\partial/\partial x^2)^b = 0 , \qquad g_{03} = g_{30} = g_{ab}\xi^a(\partial/\partial x^3)^b = 0 ,$$
$$g_{12} = g_{21} = g_{ab}\psi^a(\partial/\partial x^2)^b = 0 , \qquad g_{13} = g_{31} = g_{ab}\psi^a(\partial/\partial x^3)^b = 0 .$$

Fig. 8.10 $S$ is a 2-dimensional surface orthogonal to both $\xi^a$ and $\psi^a$ (with one dimension suppressed in the figure)

Let $V \equiv -g_{00} = -g_{ab}\xi^a\xi^b$, $W \equiv g_{01} = g_{ab}\xi^a\psi^b$, $X \equiv g_{11} = g_{ab}\psi^a\psi^b$, then the line
element can be expressed as
$$ds^2 = -V dt^2 + X d\varphi^2 + 2W dt\,d\varphi + g_{22}(dx^2)^2 + g_{33}(dx^3)^2 + 2g_{23}\,dx^2dx^3 . \tag{8.5.3}$$
From (8.5.2) we know that $V, X, W, g_{22}, g_{33}, g_{23}$ can only be functions of $x^2$ and $x^3$, and thus
solving Einstein’s equation can be boiled down to the problem of finding these 6 functions
of two variables. However, the problem can be further simplified. Define a function $\rho$ using
the following equation:
$$\rho^2 := VX + W^2 . \tag{8.5.4}$$
$V, X, W$ are not functions of $t$ and $\varphi$, which leads to $\xi^a\nabla_a\rho = \partial\rho/\partial t = 0$ and $\psi^a\nabla_a\rho =
\partial\rho/\partial\varphi = 0$, i.e., $\nabla^a\rho$ is orthogonal to $\xi^a$ and $\psi^a$, and thus is tangent to each $S$. We will do
two things on the surface $S_0$: ① choose $\rho$ as the second coordinate $x^2$, ② take any constant-$\rho$
line and define arbitrarily a 1-dimensional coordinate $z$ on the line, and then carry $z$ to the
other points on $S_0$ using the integral curves of $\nabla^a\rho$. The coordinate basis vector $(\partial/\partial\rho)^a$ of
the 2-dimensional coordinate system $\{x^2 \equiv \rho, x^3 \equiv z\}$⁵ obtained in this way is orthogonal
to $(\partial/\partial z)^a$, and hence $g_{23}|_{S_0} = 0$. Carry the $x^2$ and $x^3$ outside $S_0$ using the integral curves
of $\xi^a$ and $\psi^a$ as we mentioned above, then we get a coordinate system $\{x^\mu\}$, in which
$x^0 \equiv t$, $x^1 \equiv \varphi$, $x^2 \equiv \rho$, $x^3 \equiv z$. Two points need to be elucidated: ① $\rho$ is defined by
(8.5.4), while $x^2 \equiv \rho$ is only defined on $S_0$, and then we carry it outside the surface. Why
do we also have $x^2 \equiv \rho$ outside the surface? This is the outcome of $\xi^a\nabla_a\rho = 0$, $\xi^a\nabla_a x^2 =
0$ (requirements of the carry method) [and the corresponding $\psi^a\nabla_a\rho = 0$, $\psi^a\nabla_a x^2 = 0$]
together with $(x^2 - \rho)|_{S_0} = 0$. ② From $g_{23}|_{S_0} = 0$, $\xi^c\nabla_c g_{23} = 0$ and $\psi^c\nabla_c g_{23} = 0$ one can
easily see that $g_{23} = 0$ holds on the whole coordinate patch. The proofs of the latter two
equations are as follows (here we only take $\xi^c\nabla_c g_{23} = 0$ as an example):

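$$\begin{aligned} \xi^c\nabla_c g_{23} &= \xi^c\nabla_c[g_{ab}(\partial/\partial x^2)^a(\partial/\partial x^3)^b] = \mathcal{L}_\xi[g_{ab}(\partial/\partial x^2)^a(\partial/\partial x^3)^b] \\ &= g_{ab}[\mathcal{L}_\xi(\partial/\partial x^2)^a](\partial/\partial x^3)^b + g_{ab}(\partial/\partial x^2)^a\mathcal{L}_\xi(\partial/\partial x^3)^b = 0 , \end{aligned}$$
where we used $\mathcal{L}_\xi g_{ab} = 0$ in the second equality, and $\mathcal{L}_\xi(\partial/\partial x^2)^a = [\xi, \partial/\partial x^2]^a = [\partial/\partial t, \partial/\partial x^2]^a = 0$
and $\mathcal{L}_\xi(\partial/\partial x^3)^a = 0$ in the last step.
Now let $\Omega^2 \equiv g_{22}$, $\Lambda \equiv g_{33}/\Omega^2$, $w = W/V$, then (8.5.3) can be rewritten as
$$ds^2 = -V(dt - w\,d\varphi)^2 + V^{-1}\rho^2 d\varphi^2 + \Omega^2(d\rho^2 + \Lambda\,dz^2) . \tag{8.5.5}$$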
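That (8.5.5) reproduces the $t$-$\varphi$ block of (8.5.3) follows from expanding the square and using (8.5.4). The sketch below performs this expansion for arbitrary sample values of $V$, $W$, $X$; the function name `expand_8_5_5` is our own.

```python
def expand_8_5_5(V, W, X):
    """Expand the t-phi part of (8.5.5) and return (g_tt, g_tphi, g_phiphi)."""
    rho2 = V * X + W**2                # definition (8.5.4)
    w = W / V
    # -V (dt - w dphi)^2 = -V dt^2 + 2 V w dt dphi - V w^2 dphi^2
    g_tt = -V
    g_tphi = V * w                     # the cross term 2 g_tphi dt dphi
    g_phiphi = -V * w**2 + rho2 / V    # includes the term V^{-1} rho^2 dphi^2
    return g_tt, g_tphi, g_phiphi


# Arbitrary sample values; the result should be (-V, W, X) up to rounding.
print(expand_8_5_5(V=0.8, W=0.3, X=2.5))
```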


Thus, the number of functions of two variables that determine the components of the metric
is reduced from 6 to 4, namely $V(\rho, z)$, $w(\rho, z)$, $\Omega(\rho, z)$ and $\Lambda(\rho, z)$. If the equation to
be solved is the vacuum Einstein equation, then (8.5.5) can also be simplified as [see Wald
(1984) p. 166]
$$ds^2 = -V(dt - w\,d\varphi)^2 + V^{-1}[\rho^2 d\varphi^2 + e^{2\gamma}(d\rho^2 + dz^2)] , \qquad \gamma \equiv \frac{1}{2}\ln(V\Omega^2) . \tag{8.5.6}$$
The equation above indicates that the undetermined functions of two variables are now
reduced from 4 to 3, namely $V(\rho, z)$, $w(\rho, z)$ and $\gamma(\rho, z)$. In the special case of $V = 1$, $w =
\gamma = 0$, the equation above will turn into the line element expression of the Minkowski metric
in the cylindrical coordinate system
$$ds^2 = -dt^2 + \rho^2 d\varphi^2 + d\rho^2 + dz^2 .$$
Readers interested in the derivation of (8.5.6) may refer to Chap. 20 in Stephani et al. (2003),
while those who only want to see the conclusion and a sketch of the derivation may refer to
Wald (1984) pp. 166–168.

⁵ This definition will be invalid when $\nabla_a\rho = 0$, and hence the coordinate patch does not contain
points with $\nabla_a\rho = 0$.
An important example of a stationary axisymmetric solution to the vacuum Einstein equation
is the Kerr solution, which describes the exterior spacetime geometry of a particular kind of
uncharged rotating star,⁶ see Chap. 13 for details.
If an axisymmetric metric also has translational invariance along the axis of symmetry, then
it is called a cylindrically symmetric metric. Precisely speaking, besides the Killing vector
field $\psi^a$ reflecting the axial symmetry, for a cylindrically symmetric metric there also exists a
Killing vector field $\eta^a$ reflecting the “translational invariance along the axis”, which satisfies
① $[\eta, \psi]^a = 0$; ② the integral curves of $\eta^a$ are homeomorphic to $\mathbb{R}$.
Readers interested in cylindrically symmetric metrics may refer to Chap. 22 in Stephani et al.
(2003).

8.6 Plane Symmetric Metrics [Optional Reading]

Before the definition of a spherically symmetric metric was given in Sect. 8.2, we
discussed the symmetry of a 2-dimensional surface $(S^2, h_{ab})$ in 3-dimensional Euclidean
space. In a similar sense, we shall go over the symmetry of a 2-dimensional Euclidean plane
$(\mathbb{R}^2, \delta_{ab})$ before introducing the definition of a plane symmetric metric. In a simple manner,
we have found all 3 independent Killing vector fields of $(\mathbb{R}^2, \delta_{ab})$ in Example (1) of Sect.
4.3, i.e., $\xi_1^a \equiv (\partial/\partial x)^a$ and $\xi_2^a \equiv (\partial/\partial y)^a$ reflecting the translational invariance and $\xi_3^a \equiv
-y(\partial/\partial x)^a + x(\partial/\partial y)^a$ reflecting the rotational invariance. From the linear combinations
of $\xi_1^a$, $\xi_2^a$, $\xi_3^a$ one can have infinitely many Killing vector fields (note that the coefficients
should be constants instead of functions on $\mathbb{R}^2$), and the corresponding isometries form a
3-parameter group of isometries, called the Euclidean group, denoted by $E(2)$ (see Sect.
G.5.5 in Volume II for details). Following the definition of a spherically symmetric metric
(see Definition 1 of Sect. 8.2), we have the following definition of a plane symmetric metric:

Definition 1 A spacetime metric $g_{ab}$ is said to be plane symmetric if its group of isometries
has a subgroup $G_3$ that is isomorphic to $E(2)$, and all the orbits of $G_3$ are 2-dimensional
planes.

H. Taub proved the following theorem [Taub (1951)]: a plane symmetric solution to the
vacuum Einstein equation must be a static metric, whose line element expression is
$$ds^2 = \frac{1}{\sqrt{1 + kZ}}(-dT^2 + dZ^2) + (1 + kZ)(dX^2 + dY^2) , \tag{8.6.1}$$
where $k$ is a constant. The coefficient of $(-dT^2 + dZ^2)$ being positive indicates that $T$ and
$Z$ are respectively timelike and spacelike coordinates. The components of the metric not
containing $T$ means that $(\partial/\partial T)^a$ is a timelike Killing field, and thus the metric is static. At
the beginning, Taub’s paper only required the metric to have the plane symmetry, i.e., it only
required three Killing vector fields $(\partial/\partial X)^a$, $(\partial/\partial Y)^a$ and $-Y(\partial/\partial X)^a + X(\partial/\partial Y)^a$, based
on which he showed that it must contain the fourth (extra) Killing vector field $(\partial/\partial T)^a$.
This is very much like Birkhoff’s theorem. Moreover, Taub’s original theorem has the same
shortcoming as Birkhoff’s theorem: it omitted another possibility when deriving (8.6.1)
which is on an equal footing with it. In fact, it can be proved from the vacuum condition
and the plane symmetry that the metric will have either the form of (8.6.1) or the following
form:
$$ds^2 = -\frac{1}{\sqrt{1 + kZ}}(-dT^2 + dZ^2) + (1 + kZ)(dX^2 + dY^2) . \tag{8.6.2}$$
The coefficient of $(-dT^2 + dZ^2)$ in the above equation is negative, which means $Z$ is a
timelike coordinate and $T$ is a spacelike coordinate. The metric components not depending on
$T$ indicates that $(\partial/\partial T)^a$ is a spacelike Killing field; together with the other two spatial Killing
fields $(\partial/\partial X)^a$ and $(\partial/\partial Y)^a$, this indicates that the spacetime is spatially homogeneous,
since it has the translational invariance in the three spatial directions (represented by the $T$-,
$X$- and $Y$-axes). This metric does not have a timelike Killing vector field, and hence is not
static. Thus, Taub’s theorem should be revised as follows: a plane symmetric solution to the
vacuum Einstein equation is either static or spatially homogeneous.

⁶ Not all exterior spacetimes of uncharged rotating stars can be described by the Kerr solution;
see Hawking and Ellis (1973) p. 161 for this caveat.
Another drawback of Taub’s original paper is that (8.6.1) contains an arbitrary constant
$k$, which may mislead people to think that (8.6.1), just like the Schwarzschild metric, is
a one-parameter family of metrics. (Indeed, the parameter $M$ of the Schwarzschild metric
indicates that it is a one-parameter family). In the case $k \neq 0$, we introduce new coordinates
$t = k^{-1/3}T$, $z = k^{-4/3}(1 + kZ)$, $x = k^{2/3}X$ and $y = k^{2/3}Y$, then (8.6.1) and (8.6.2) will
turn into
$$ds^2 = z^{-1/2}(-dt^2 + dz^2) + z(dx^2 + dy^2) , \tag{8.6.1$'$}$$
$$ds^2 = -z^{-1/2}(-dt^2 + dz^2) + z(dx^2 + dy^2) . \tag{8.6.2$'$}$$
This indicates that in the case $k \neq 0$, each of (8.6.1) and (8.6.2) represents one metric rather
than a family of metrics. From this aspect, the Taub metric is very much different from the
Schwarzschild metric.
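The claim that the constant $k$ can be absorbed by a rescaling of the coordinates can be tested numerically: the interval between two nearby events must be the same whether it is computed from (8.6.1) or (8.6.2) or from the rescaled form. The sketch below assumes $k > 0$ and $1 + kZ > 0$; the function names are our own.

```python
import math


def ds2_taub(k, Z, dT, dZ, dX, dY, sign=+1):
    """Interval from (8.6.1) for sign=+1, or from (8.6.2) for sign=-1."""
    u = 1 + k * Z
    return sign * (-dT**2 + dZ**2) / math.sqrt(u) + u * (dX**2 + dY**2)


def ds2_rescaled(k, Z, dT, dZ, dX, dY, sign=+1):
    """The same interval, computed in the rescaled coordinates (t, z, x, y)."""
    z = k ** (-4 / 3) * (1 + k * Z)
    dt, dz = k ** (-1 / 3) * dT, k ** (-1 / 3) * dZ
    dx, dy = k ** (2 / 3) * dX, k ** (2 / 3) * dY
    return sign * z ** -0.5 * (-dt**2 + dz**2) + z * (dx**2 + dy**2)


# The two columns should agree for every k, so k carries no invariant information.
for k in (0.5, 2.0, 7.3):
    original = ds2_taub(k, 0.4, 0.1, 0.2, 0.3, -0.5)
    rescaled = ds2_rescaled(k, 0.4, 0.1, 0.2, 0.3, -0.5)
    print(k, original, rescaled)
```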
The study of plane symmetric solutions of the electrovac Einstein equation can be dated
back to 1926. However, the discovery of the general solution of this type started in the
1970s. Based on the work of Patnaik (1970), Letelier and Tabenski (1974) found the general
solution of a plane symmetric metric produced by a plane symmetric electromagnetic field
[see Stephani et al. (2003)]
$$ds^2 = \frac{1}{2}Y'(z)(-dt^2 + dz^2) + Y^2(z)(dx^2 + dy^2) , \tag{8.6.3}$$
where $Y'(z) \equiv dY/dz$, and $Y(z)$ is given implicitly by the following equation:
$$(Y - A)^2 + 2A^2\ln(Y + A) = -Cz , \quad A, C \text{ are constants.} \tag{8.6.4}$$


The electromagnetic field $F_{ab}$ corresponding to (8.6.3) is a source-free nonnull
electromagnetic field, whose coordinate components are ($t, x, y, z$ are identified as $x^0, x^1, x^2, x^3$,
respectively)
$$F_{12} = C_1 , \quad F_{30} = \frac{C_2}{2}Y'Y^{-2} , \quad A \equiv \frac{4\pi}{C}(C_1^2 + C_2^2) , \quad C_1, C_2 \text{ are constants.} \tag{8.6.5}$$
When $F_{ab} = 0$, the metric (8.6.3) will be simplified to (8.6.1$'$) [for $(\nabla_a Y)\nabla^a Y > 0$, i.e., when $Y$
depends on a spacelike coordinate] or (8.6.2$'$) [for $(\nabla_a Y)\nabla^a Y < 0$].
The expression (8.6.3) represents the plane symmetric metric produced by a plane symmetric
electromagnetic field $F_{ab}$. A plane symmetric electromagnetic field is one that satisfies
$$\mathcal{L}_{\xi_i}F_{ab} = 0 , \quad i = 1, 2, 3 , \tag{8.6.6}$$
where $\xi_i^a$ represents the three Killing vector fields reflecting the plane symmetry, i.e.,
$$\xi_1^a \equiv (\partial/\partial x)^a , \quad \xi_2^a \equiv (\partial/\partial y)^a , \quad \xi_3^a \equiv -y(\partial/\partial x)^a + x(\partial/\partial y)^a . \tag{8.6.7}$$
It is not difficult to verify that (Exercise 8.9) the $F_{ab}$ in (8.6.5) satisfies (8.6.6). However,
a plane symmetric metric can also be produced by a non-plane symmetric electromagnetic
field. An electromagnetic field with only translational symmetries but no rotational symmetry
[i.e., (8.6.6) only holds for i = 1, 2] is called a semi-plane symmetric electromagnetic
field (“2/3-plane symmetric” may be more appropriate). Some special solutions of a plane
symmetric metric produced by this kind of electromagnetic field are scattered in the literature.
Li and Liang (1985) found the general solutions of plane symmetric metrics produced by
semi-plane symmetric electromagnetic fields, and classified them into two types:

$$\text{Type A}\qquad ds^2 = \pm\frac{J(T+Z)}{\sqrt{T}}(-dT^2 + dZ^2) + T(dX^2 + dY^2) , \tag{8.6.8a}$$

$$\text{Type B}\qquad ds^2 = \pm\frac{J(T+Z)}{\sqrt{T+Z}}(-dT^2 + dZ^2) + (T+Z)(dX^2 + dY^2) , \tag{8.6.8b}$$

where $J(T+Z)$ is an arbitrary function satisfying $\dot{J}/J > 0$ ($\dot{J} \equiv \partial J/\partial T$).⁷ The electromagnetic
field corresponding to (8.6.8a) and (8.6.8b) is a semi- (2/3-) plane symmetric
source-free null electromagnetic field. The general solutions (8.6.3) and (8.6.8) correspond
to a nonnull, plane symmetric and a null, semi-plane symmetric source-free electromagnetic
field, respectively. It is natural to ask: is there any plane symmetric metric produced by an
electromagnetic field (no matter what symmetry it has) other than (8.6.3) and (8.6.8)? Kuang
et al. (1987) proved that: ① The plane symmetric metrics produced by electromagnetic fields
are of only three types, namely (8.6.3), (8.6.8a) and (8.6.8b) (and the line elements obtained
from them by coordinate transformations); ② The plane symmetric metric (8.6.3) cannot be
produced by an electromagnetic field with source; ③ Plane symmetric metrics (8.6.8a) and
(8.6.8b) can also be produced by electromagnetic fields with source, i.e., every metric of
type A or B can be interpreted as being produced either by a source-free electromagnetic
field or by an electromagnetic field with source. [These two interpretations correspond to the
same energy-momentum tensor $T_{ab}$, called a dual interpretation.⁸] Both of them are null
electromagnetic fields; the former is semi- (2/3-) plane symmetric (has only translational
symmetries but no rotational symmetry), while the latter, on the contrary, has only rotational
symmetry but no translational symmetry (i.e., $\mathscr{L}_{\xi_3}F_{ab} = 0$, $\mathscr{L}_{\xi_1}F_{ab} \neq 0$, $\mathscr{L}_{\xi_2}F_{ab} \neq 0$),
and may also be called a semi-plane symmetric electromagnetic field (of another kind), or
more precisely a 1/3-plane symmetric electromagnetic field. With this, the plane symmetric
metrics produced by electromagnetic fields are finally exhausted.

7 The line elements given by two different functions $J(T+Z)$ based on either (8.6.8a) or (8.6.8b)
could differ only by a coordinate transformation (i.e., one can be obtained from the other via a
coordinate transformation). Two such line elements represent the same geometry, and the two
functions $J(T+Z)$ are then said to be equivalent. To figure out all the different geometries described by
(8.6.8a) and (8.6.8b), one needs a criterion for determining whether two arbitrary functions
$J(T+Z)$ are equivalent. This necessary and sufficient criterion was found in Kuang et al. (1986).
8 The energy-momentum tensor of the source of the electromagnetic field (dust) should also
appear on the right-hand side of Einstein’s equation, just like the energy-momentum tensor $T_{ab}$ of
the electromagnetic field, which makes the question very complicated. One simplified discussion
is to stipulate that the energy-momentum tensor of the source vanishes; see Tariq and Tupper (1976) for its physical meaning.
The fact that a plane symmetric metric can be produced by a semi-plane symmetric elec-
tromagnetic field indicates that the symmetry of the electromagnetic field can be weaker
than the symmetry of the metric. It is natural to ask: can the symmetry of the metric be
weaker than the symmetry of the electromagnetic field? For example, does there exist a
semi-plane symmetric metric produced by a plane symmetric electromagnetic field? The
answer is affirmative: Li and Liang (1989) provided a specific example (a special solution).
We mention in passing that the three Killing fields reflecting the spherical symmetry are on an
equal footing, and there does not exist any spherically symmetric metric produced by a semi-
(2/3- or 1/3-) spherically symmetric electromagnetic field. A spherically symmetric metric
produced by an electromagnetic field can only be the RN metric, whose electromagnetic
field can only be a spherically symmetric, source-free nonnull electromagnetic field.

8.7 The Newman-Penrose (NP) Formalism [Optional Reading]

Besides the coordinate basis method and the orthonormal tetrad method, there is a third
commonly used method of computing curvature, namely the “null tetrad method” proposed
by Newman and Penrose (1962). This method can be viewed as a variant of the rigid tetrad
method: instead of an orthonormal tetrad, one uses a complex⁹ “null tetrad”.
Suppose $p$ is a point of a 4-dimensional spacetime $(M, g_{ab})$, and $\{(e_\mu)^a\}$ is an orthonormal
tetrad at $p$. Define 4 special vectors at $p$ as follows:

$$m^a := \frac{1}{\sqrt{2}}[(e_1)^a - i(e_2)^a] , \qquad \bar{m}^a := \frac{1}{\sqrt{2}}[(e_1)^a + i(e_2)^a] ,$$
$$l^a := \frac{1}{\sqrt{2}}[(e_0)^a - (e_3)^a] , \qquad k^a := \frac{1}{\sqrt{2}}[(e_0)^a + (e_3)^a] , \tag{8.7.1}$$

then $g_{ab}m^am^b = g_{ab}\bar{m}^a\bar{m}^b = g_{ab}l^al^b = g_{ab}k^ak^b = 0$, i.e., all 4 of them are null vectors.
Note that $m^a$ and $\bar{m}^a$ are complex vectors conjugate to each other. To distinguish it from
other tetrads, this text uses $\{(\varepsilon_\mu)^a\}$ to represent a null tetrad, and stipulates the numbering
as [in agreement with Stephani et al. (2003)]

$$(\varepsilon_1)^a \equiv m^a , \qquad (\varepsilon_2)^a \equiv \bar{m}^a , \qquad (\varepsilon_3)^a \equiv l^a , \qquad (\varepsilon_4)^a \equiv k^a . \tag{8.7.2}$$

The corresponding dual basis vectors are

$$(\varepsilon^1)_a \equiv \bar{m}_a , \qquad (\varepsilon^2)_a \equiv m_a , \qquad (\varepsilon^3)_a \equiv -k_a , \qquad (\varepsilon^4)_a \equiv -l_a . \tag{8.7.2$'$}$$
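
The construction (8.7.1) can be checked numerically at a single point. The sketch below is illustrative only: it assumes the orthonormal tetrad has components $\mathrm{diag}(-1, 1, 1, 1)$ for the metric, and the helper names `inner`, `null_tetrad` and `gram_matrix` are hypothetical, not taken from the text. It builds the four null vectors from the orthonormal basis and computes all their mutual inner products.

```python
import math

# Orthonormal-frame metric diag(-1, 1, 1, 1); components ordered (e0, e1, e2, e3).
# The signature is an assumption consistent with l.k = -1 in the text.
ETA = (-1.0, 1.0, 1.0, 1.0)


def inner(u, v):
    """Bilinear (not Hermitian) product g_ab u^a v^b of two complex vectors."""
    return sum(g * a * b for g, a, b in zip(ETA, u, v))


def null_tetrad():
    """Return (m, mbar, l, k) built from the orthonormal tetrad as in (8.7.1)."""
    s = 1.0 / math.sqrt(2.0)
    m = (0.0, s, -1j * s, 0.0)     # (e1 - i e2) / sqrt(2)
    mbar = (0.0, s, 1j * s, 0.0)   # (e1 + i e2) / sqrt(2)
    l = (s, 0.0, 0.0, -s)          # (e0 - e3) / sqrt(2)
    k = (s, 0.0, 0.0, s)           # (e0 + e3) / sqrt(2)
    return m, mbar, l, k


def gram_matrix():
    """Inner products of the tetrad vectors in the order (m, mbar, l, k).

    Only the real part is kept (the imaginary parts vanish here), and the
    result is rounded to remove floating-point noise.
    """
    tetrad = null_tetrad()
    return [[round(complex(inner(u, v)).real, 12) + 0.0 for v in tetrad]
            for u in tetrad]


if __name__ == "__main__":
    for row in gram_matrix():
        print(row)
```

The only nonzero entries are the $m$-$\bar{m}$ pair (equal to $1$) and the $l$-$k$ pair (equal to $-1$); all diagonal entries vanish, which is the statement that the four vectors are null.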

9 Change the $\mathbb{R}$ in Definition 2 of Sect. 2.2 to $\mathbb{C}$; then a map $v : \mathscr{F}_M \to \mathbb{C}$ is called a complex
vector, and thus the tangent space $V_p$ at $p$ is generalized to an $n$-dimensional complex vector
space (the scalar multiplication uses complex numbers). Suppose $u$ and $w$ are real vectors at $p$
and $v(f) = u(f) + iw(f)$, $\forall f \in \mathscr{F}_M$; then we say that $v = u + iw$, and call $u$ and $w$ the real and
imaginary parts of $v$, respectively. Similarly, it is not difficult to define a complex tensor as well as
its real and imaginary parts.

$(\varepsilon_\mu)^a$ can be regarded as a special case of an arbitrary basis field $(e_\mu)^a$, which we mentioned
at the beginning of Sect. 5.7; however, one should not confuse this with the $(e_\mu)^a$ in (8.7.1),
which refers only to an orthonormal tetrad. It is not difficult to see that, among the inner products of
any two basis vectors in a null tetrad, only the following two pairs are nonzero:

$$m^a\bar{m}_a \equiv g_{ab}m^a\bar{m}^b = g_{12} = g_{21} = 1 , \qquad l^ak_a \equiv g_{ab}l^ak^b = g_{34} = g_{43} = -1 ,$$

and thus the matrices constituted by the components $g_{\mu\nu}$ and $g^{\mu\nu}$ of the metric $g_{ab}$ and its
inverse $g^{ab}$ are

$$(g_{\mu\nu}) = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & -1 \\ 0 & 0 & -1 & 0 \end{bmatrix} = (g^{\mu\nu}) . \tag{8.7.3}$$

Just like in §5.7, the number indices $\mu$ of $(\varepsilon_\mu)^a$ and $(\varepsilon^\mu)_a$ can also be raised and lowered
using $g^{\mu\nu}$ and $g_{\mu\nu}$. Applying (5.7.5) to a null tetrad yields

$$\omega_\mu{}^\nu{}_a = (\varepsilon_\mu)^c\nabla_a(\varepsilon^\nu)_c , \tag{8.7.4}$$

and the corresponding Ricci rotation coefficients are

$$\omega_\mu{}^\nu{}_\rho = (\varepsilon_\mu)^b(\varepsilon_\rho)^a\nabla_a(\varepsilon^\nu)_b .$$

Equation (8.7.3) indicates that $(\varepsilon_\mu)^a$ is a (complex) rigid tetrad, and hence we have
$\omega_{\mu\nu a} = (\varepsilon_\mu)^b\nabla_a(\varepsilon_\nu)_b$ and $\omega_{\mu\nu a} = -\omega_{\nu\mu a}$ (i.e., $\omega_{\mu\nu} = -\omega_{\nu\mu}$), and for the corresponding Ricci
rotation coefficients

$$\omega_{\mu\nu\rho} = (\varepsilon_\mu)^b(\varepsilon_\rho)^a\nabla_a(\varepsilon_\nu)_b , \qquad \omega_{\mu\nu\rho} = -\omega_{\nu\mu\rho} . \tag{8.7.5}$$

Since the null tetrad indices are numbered 1, 2, 3, 4 rather than 0, 1, 2, 3, the number
indices of the corresponding connection 1-forms are also changed to $\omega_{12}$, $\omega_{13}$, $\omega_{14}$, $\omega_{23}$,
$\omega_{24}$, $\omega_{34}$. Note that the $\omega_{\mu\nu\rho}$ corresponding to a null tetrad are complex-valued, and obey
the following proposition:

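Proposition 8.7.1 If we exchange all the 1s and 2s in the subscripts of $\omega_{\mu\nu\rho}$ (and keep
all the 3s and 4s unchanged), we obtain its complex conjugate $\bar{\omega}_{\mu\nu\rho}$, e.g., $\omega_{134} = \bar{\omega}_{234}$,
$\omega_{342} = \bar{\omega}_{341}$, $\omega_{421} = \bar{\omega}_{412}$, $\omega_{122} = \bar{\omega}_{211}$, $\omega_{344} = \bar{\omega}_{344}$.

Proof It follows from (8.7.5) that $\bar{\omega}_{\mu\nu\rho} = (\bar{\varepsilon}_\mu)^b(\bar{\varepsilon}_\rho)^a\nabla_a(\bar{\varepsilon}_\nu)_b$, and it is not difficult to prove
this proposition using this equation. For example,

$$\bar{\omega}_{412} = (\bar{\varepsilon}_4)^b(\bar{\varepsilon}_2)^a\nabla_a(\bar{\varepsilon}_1)_b = (\varepsilon_4)^b(\varepsilon_1)^a\nabla_a(\varepsilon_2)_b = \omega_{421} . \qquad \square$$

Proposition 8.7.1 holds not only for $\omega_{\mu\nu\rho}$, but also for all the quantities (including
tensors) that carry null tetrad indices, e.g., $\omega_{41} = \bar{\omega}_{42}$, $\omega_{21} = \bar{\omega}_{12}$, $R_{31} = \bar{R}_{32}$, $R_{12} = \bar{R}_{21}$,
$R_{34} = \bar{R}_{34}$.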
The process of computing the curvature tensor using the null tetrad method is similar to that
using the orthonormal tetrad method; that is, one finds all the connection 1-forms $\omega_{\mu\nu}$ of
the chosen null tetrad and then finds all the curvature 2-forms $R_{\mu\nu}$. The components $\omega_{\mu\nu\rho}$
of the connection 1-forms can still be computed from (5.7.19) and (5.7.20), in which the
$(e_\mu)^a$ should now be interpreted as $(\varepsilon_\mu)^a$. After finding all the $\omega_{\mu\nu}$ one can still use Cartan’s
second equation of structure to compute all the $R_{\mu\nu}$.

Proposition 8.7.2 In a null tetrad, Cartan’s second structure equation (5.7.8) reads

$$R_{41} = d\omega_{41} + \omega_{41}\wedge(\omega_{21} + \omega_{43}) , \tag{8.7.6a}$$
$$R_{32} = d\omega_{32} - \omega_{32}\wedge(\omega_{21} + \omega_{43}) , \tag{8.7.6b}$$
$$R_{21} + R_{43} = d(\omega_{21} + \omega_{43}) + 2\,\omega_{32}\wedge\omega_{41} . \tag{8.7.6c}$$

Proof When we have a metric $g_{ab}$, Cartan’s second equation (5.7.8) can be written as

$$R_{\mu\nu} = d\omega_{\mu\nu} + \omega_\mu{}^\tau\wedge\omega_{\tau\nu} = d\omega_{\mu\nu} + g^{\lambda\tau}\omega_{\mu\lambda}\wedge\omega_{\tau\nu} ,$$

where $g^{\lambda\tau}$ are the components of $g^{ab}$ in the null tetrad. Noticing that the only nonzero $g^{\lambda\tau}$
are $g^{12} = g^{21} = 1$ and $g^{34} = g^{43} = -1$, we can write down all 6 independent components
of $R_{\mu\nu}$ as follows:

$$R_{43} = d\omega_{43} + \omega_{41}\wedge\omega_{23} + \omega_{42}\wedge\omega_{13} , \tag{8.7.7a}$$
$$R_{42} = d\omega_{42} + \omega_{42}\wedge(\omega_{12} + \omega_{43}) , \tag{8.7.7b}$$
$$R_{41} = d\omega_{41} + \omega_{41}\wedge(\omega_{21} + \omega_{43}) , \tag{8.7.7c}$$
$$R_{32} = d\omega_{32} + \omega_{32}\wedge(\omega_{12} + \omega_{34}) , \tag{8.7.7d}$$
$$R_{31} = d\omega_{31} + \omega_{31}\wedge(\omega_{21} + \omega_{34}) , \tag{8.7.7e}$$
$$R_{21} = d\omega_{21} - \omega_{23}\wedge\omega_{41} - \omega_{24}\wedge\omega_{31} . \tag{8.7.7f}$$

Considering Proposition 8.7.1, these 6 equalities are not all independent, since from $R_{31} = \bar{R}_{32}$
and $R_{42} = \bar{R}_{41}$ we can derive (8.7.7e) and (8.7.7b) from (8.7.7d) and (8.7.7c). Moreover,
(8.7.7a) and (8.7.7f) can be written as

$$R_{43} = d\omega_{43} + \omega_{32}\wedge\omega_{41} + \overline{\omega_{32}\wedge\omega_{41}} = d\omega_{43} + 2\,\mathrm{Re}(\omega_{32}\wedge\omega_{41}) , \tag{8.7.7a$'$}$$
$$R_{21} = d\omega_{21} + \omega_{32}\wedge\omega_{41} - \overline{\omega_{32}\wedge\omega_{41}} = d\omega_{21} + 2i\,\mathrm{Im}(\omega_{32}\wedge\omega_{41}) . \tag{8.7.7f$'$}$$

These two equations together are equivalent to (8.7.6c). Therefore, (8.7.7a)–(8.7.7f) are
equivalent to (8.7.6a)–(8.7.6c). □
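
The bookkeeping in this proof, namely which terms of $g^{\lambda\tau}\omega_{\mu\lambda}\wedge\omega_{\tau\nu}$ survive for each pair $\mu\nu$, is mechanical and can be enumerated. The following is a minimal sketch; the names `G_INV` and `wedge_terms` are hypothetical and introduced here only for illustration. It uses just two facts: $\omega_{\mu\mu} = 0$ by antisymmetry, and a 1-form wedged with itself (or with its negative) vanishes.

```python
# Nonzero components g^{lambda tau} of the inverse metric in the null tetrad, from (8.7.3).
G_INV = {(1, 2): 1, (2, 1): 1, (3, 4): -1, (4, 3): -1}


def wedge_terms(mu, nu):
    """Surviving terms of g^{lam tau} omega_{mu lam} ^ omega_{tau nu}.

    Each term is returned as (sign, (mu, lam), (tau, nu)), meaning
    sign * omega_{mu lam} ^ omega_{tau nu}.
    """
    terms = []
    for (lam, tau), sign in sorted(G_INV.items()):
        first, second = (mu, lam), (tau, nu)
        if first[0] == first[1] or second[0] == second[1]:
            continue  # omega_{mu mu} = 0 by antisymmetry
        if sorted(first) == sorted(second):
            continue  # a 1-form wedged with plus or minus itself vanishes
        terms.append((sign, first, second))
    return terms


if __name__ == "__main__":
    for pair in [(4, 3), (4, 2), (4, 1), (3, 2), (3, 1), (2, 1)]:
        print(pair, wedge_terms(*pair))
```

For the pair 43 the function returns the two terms $\omega_{41}\wedge\omega_{23}$ and $\omega_{42}\wedge\omega_{13}$ of (8.7.7a); for 41 it returns $\omega_{41}\wedge\omega_{21} - \omega_{43}\wedge\omega_{41}$, which is (8.7.7c) after using the antisymmetry of the wedge product.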


The whole formalism introduced by Newman and Penrose based on the null tetrad method
is called the Newman-Penrose formalism, or NP formalism for short. The basic idea of
the NP formalism is to separate all kinds of sets of condensed equations [e.g., (8.7.6)] into
multiple component equations, which will certainly lead to the appearance of quantities with
many indices, such as $\omega_{\mu\nu\rho}$, $R_{\rho\sigma\mu\nu}$, etc. For the sake of making the equations look simpler
(and for other purposes), the NP formalism uses many notations which carry fewer or no indices
to represent these quantities with many indices. We will introduce all three kinds of them as
follows:
(1) Due to Proposition 8.7.1, only 12 out of the 24 linear combinations of the complex
$\omega_{\mu\nu\rho}$ are linearly independent. (Comparing with the fact that there are 24 linearly independent
real $\omega_{\mu\nu\rho}$ in an orthonormal tetrad, you will find this is quite natural.) Use 12 Greek letters
without indices to represent 12 linearly independent combinations of $\omega_{\mu\nu\rho}$ as follows [(8.7.5)
is used]:

$$\kappa \equiv -\omega_{144} = -m^ak^b\nabla_bk_a , \tag{8.7.8a}$$
$$\rho \equiv -\omega_{142} = -m^a\bar{m}^b\nabla_bk_a , \tag{8.7.8b}$$
$$\sigma \equiv -\omega_{141} = -m^am^b\nabla_bk_a , \tag{8.7.8c}$$
$$\tau \equiv -\omega_{143} = -m^al^b\nabla_bk_a , \tag{8.7.8d}$$
$$\nu \equiv \omega_{233} = \bar{m}^al^b\nabla_bl_a , \tag{8.7.8e}$$
$$\mu \equiv \omega_{231} = \bar{m}^am^b\nabla_bl_a , \tag{8.7.8f}$$
$$\lambda \equiv \omega_{232} = \bar{m}^a\bar{m}^b\nabla_bl_a , \tag{8.7.8g}$$
$$\pi \equiv \omega_{234} = \bar{m}^ak^b\nabla_bl_a , \tag{8.7.8h}$$
$$\varepsilon \equiv \frac{1}{2}(\omega_{214} - \omega_{344}) = \frac{1}{2}(\bar{m}^ak^b\nabla_bm_a - l^ak^b\nabla_bk_a) , \tag{8.7.8i}$$
$$\beta \equiv \frac{1}{2}(\omega_{211} - \omega_{341}) = \frac{1}{2}(\bar{m}^am^b\nabla_bm_a - l^am^b\nabla_bk_a) , \tag{8.7.8j}$$
$$\gamma \equiv \frac{1}{2}(\omega_{433} - \omega_{123}) = \frac{1}{2}(k^al^b\nabla_bl_a - m^al^b\nabla_b\bar{m}_a) , \tag{8.7.8k}$$
$$\alpha \equiv \frac{1}{2}(\omega_{432} - \omega_{122}) = \frac{1}{2}(k^a\bar{m}^b\nabla_bl_a - m^a\bar{m}^b\nabla_b\bar{m}_a) . \tag{8.7.8l}$$

These 12 Greek letters are called the spin coefficients.
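Proposition 8.7.3 The 24 $\omega_{\mu\nu\rho}$ can be expressed in terms of the 12 spin coefficients as
follows:

$$\omega_{121} = \bar{\alpha} - \beta , \quad \omega_{122} = \bar{\beta} - \alpha , \quad \omega_{123} = \bar{\gamma} - \gamma , \quad \omega_{124} = \bar{\varepsilon} - \varepsilon ,$$
$$\omega_{131} = \bar{\lambda} , \quad \omega_{132} = \bar{\mu} , \quad \omega_{133} = \bar{\nu} , \quad \omega_{134} = \bar{\pi} ,$$
$$\omega_{141} = -\sigma , \quad \omega_{142} = -\rho , \quad \omega_{143} = -\tau , \quad \omega_{144} = -\kappa ,$$
$$\omega_{231} = \mu , \quad \omega_{232} = \lambda , \quad \omega_{233} = \nu , \quad \omega_{234} = \pi ,$$
$$\omega_{241} = -\bar{\rho} , \quad \omega_{242} = -\bar{\sigma} , \quad \omega_{243} = -\bar{\tau} , \quad \omega_{244} = -\bar{\kappa} ,$$
$$\omega_{341} = -(\bar{\alpha} + \beta) , \quad \omega_{342} = -(\alpha + \bar{\beta}) , \quad \omega_{343} = -(\gamma + \bar{\gamma}) , \quad \omega_{344} = -(\varepsilon + \bar{\varepsilon}) .$$

Proof Only 8 out of the 24 equations above need to be checked (the others can be read
directly from the definition of the spin coefficients); the verification is as follows.
Firstly, since $(\varepsilon_3)^a$ and $(\varepsilon_4)^a$ are real vectors, $\omega_{343}$ and $\omega_{344}$ are real. Secondly, it follows
from $\omega_{213} = -\omega_{123} = -\bar{\omega}_{213}$ that $\omega_{213} + \bar{\omega}_{213} = 0$, and hence $\omega_{213}$ is imaginary. Similarly
we can see that $\omega_{214}$ is also imaginary. Also, $\varepsilon \equiv \frac{1}{2}(\omega_{214} - \omega_{344}) = \frac{1}{2}(\omega_{434} + \omega_{214})$, and
hence $\omega_{434} = 2\,\mathrm{Re}(\varepsilon) = \varepsilon + \bar{\varepsilon}$, $\omega_{214} = 2i\,\mathrm{Im}(\varepsilon) = \varepsilon - \bar{\varepsilon}$. Similarly, we have $\omega_{433} = \gamma + \bar{\gamma}$,
$\omega_{213} = \gamma - \bar{\gamma}$. Furthermore, from the definitions of $\alpha$ and $\beta$ we get $\beta = -\frac{1}{2}(\omega_{121} + \omega_{341})$,
$\bar{\alpha} = \frac{1}{2}(\omega_{121} - \omega_{341})$. Thus, $\omega_{341} = -(\bar{\alpha} + \beta)$, $\omega_{121} = \bar{\alpha} - \beta$, from which we can easily get
$\omega_{122} = \bar{\beta} - \alpha$, $\omega_{342} = -(\alpha + \bar{\beta})$. □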
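
The table in Proposition 8.7.3 can be cross-checked against Proposition 8.7.1 mechanically. The sketch below is illustrative only (the names `TABLE`, `omega`, `conj` and `violations` are hypothetical): each entry is stored as a linear combination of spin-coefficient symbols, with a trailing `*` marking a complex conjugate, and the check confirms that swapping the indices 1 and 2 always produces the complex conjugate.

```python
def conj(expr):
    """Conjugate a linear combination {symbol: coeff}; 'x*' denotes the conjugate of 'x'."""
    return {(s[:-1] if s.endswith("*") else s + "*"): c for s, c in expr.items()}


def neg(expr):
    """Negate a linear combination."""
    return {s: -c for s, c in expr.items()}


# The 24 entries of Proposition 8.7.3, keyed by (mu, nu, rho) with mu < nu.
TABLE = {
    (1, 2, 1): {"alpha*": 1, "beta": -1},
    (1, 2, 2): {"beta*": 1, "alpha": -1},
    (1, 2, 3): {"gamma*": 1, "gamma": -1},
    (1, 2, 4): {"epsilon*": 1, "epsilon": -1},
    (1, 3, 1): {"lambda*": 1}, (1, 3, 2): {"mu*": 1},
    (1, 3, 3): {"nu*": 1}, (1, 3, 4): {"pi*": 1},
    (1, 4, 1): {"sigma": -1}, (1, 4, 2): {"rho": -1},
    (1, 4, 3): {"tau": -1}, (1, 4, 4): {"kappa": -1},
    (2, 3, 1): {"mu": 1}, (2, 3, 2): {"lambda": 1},
    (2, 3, 3): {"nu": 1}, (2, 3, 4): {"pi": 1},
    (2, 4, 1): {"rho*": -1}, (2, 4, 2): {"sigma*": -1},
    (2, 4, 3): {"tau*": -1}, (2, 4, 4): {"kappa*": -1},
    (3, 4, 1): {"alpha*": -1, "beta": -1},
    (3, 4, 2): {"alpha": -1, "beta*": -1},
    (3, 4, 3): {"gamma": -1, "gamma*": -1},
    (3, 4, 4): {"epsilon": -1, "epsilon*": -1},
}

SWAP = {1: 2, 2: 1, 3: 3, 4: 4}  # exchange the indices 1 and 2, keep 3 and 4


def omega(mu, nu, rho):
    """Look up omega_{mu nu rho}, using antisymmetry in the first two indices."""
    if mu == nu:
        return {}
    if (mu, nu, rho) in TABLE:
        return TABLE[(mu, nu, rho)]
    return neg(TABLE[(nu, mu, rho)])


def violations():
    """Entries for which swapping 1 <-> 2 does NOT give the complex conjugate."""
    return [key for key, value in TABLE.items()
            if omega(*(SWAP[i] for i in key)) != conj(value)]


if __name__ == "__main__":
    print(len(TABLE), violations())
```

An empty list of violations means the 24 expressions are mutually consistent with the conjugation rule; it does not by itself verify the definitions (8.7.8).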


(2) Since the derivatives of spin coefficients along the 4 basis vectors appear frequently in
all kinds of equations, we introduce the following 4 notations for derivatives:

$$\delta \equiv m^a\nabla_a , \qquad \bar{\delta} \equiv \bar{m}^a\nabla_a , \qquad \Delta \equiv l^a\nabla_a , \qquad D \equiv k^a\nabla_a . \tag{8.7.9}$$

(3) The components of the Riemann tensor $R_{abc}{}^d$ have 4 indices. We would like to denote
them using notations with fewer indices. $R_{abc}{}^d$ is determined by its “traceless part” (Weyl
tensor) $C_{abc}{}^d$ and “trace part” (Ricci tensor) $R_{ab}$. Due to various symmetries, the Weyl
tensor has only 10 real independent components, which can be represented by 5 complex
quantities $\Psi_0, \Psi_1, \Psi_2, \Psi_3, \Psi_4$ defined as

$$\Psi_0 := C_{4141} , \quad \Psi_1 := C_{4341} , \quad \Psi_2 := \frac{1}{2}(C_{4343} - C_{4312}) , \quad \Psi_3 := C_{3432} , \quad \Psi_4 := C_{3232} , \tag{8.7.10}$$

where $C_{\mu\nu\rho\sigma}$ are the components of $C_{abcd}$ in the null tetrad. The Ricci tensor $R_{ab}$ has only
10 real independent components due to the symmetry $R_{ab} = R_{ba}$. In the null tetrad, among
the 10 independent components $R_{44}, R_{43}, R_{42}, R_{41}, R_{33}, R_{32}, R_{31}, R_{22}, R_{21}, R_{11}$, 6 are
complex and 4 are real. It is obvious that $R_{44}, R_{43}, R_{33}$ are real, and $R_{21}$ is also real since
$R_{21} = R_{12} = \bar{R}_{21}$. In terms of linear combinations of these 4 real numbers, one can define
the following 4 real quantities:

$$\Phi_{00} := \frac{1}{2}R_{44} , \quad \Phi_{11} := \frac{1}{4}(R_{21} + R_{43}) , \quad \Phi_{22} := \frac{1}{2}R_{33} , \quad R := 2(R_{21} - R_{43}) . \tag{8.7.11a}$$

The fourth real quantity $R$ is actually the scalar curvature [it is easy to show that the
scalar curvature indeed equals $2(R_{21} - R_{43})$]. In terms of the 6 complex components
$R_{42}, R_{41}, R_{32}, R_{31}, R_{22}, R_{11}$, one can define 6 complex quantities

$$\Phi_{01} := \frac{1}{2}R_{41} , \quad \Phi_{10} := \frac{1}{2}R_{42} , \quad \Phi_{02} := \frac{1}{2}R_{11} , \quad \Phi_{20} := \frac{1}{2}R_{22} , \quad \Phi_{12} := \frac{1}{2}R_{31} , \quad \Phi_{21} := \frac{1}{2}R_{32} . \tag{8.7.11b}$$

The above 10 quantities excluding $R$ can be arranged into a $3 \times 3$ “conjugate symmetric”
matrix $[\Phi_{\lambda\tau}]$ (satisfying $\Phi_{\lambda\tau} = \bar{\Phi}_{\tau\lambda}$; $\lambda, \tau = 0, 1, 2$):

$$\begin{array}{c|ccc}
 & 0 & 1 & 2 \\ \hline
0 & \frac{1}{2}R_{44} & \frac{1}{2}R_{41} & \frac{1}{2}R_{11} \\
1 & \frac{1}{2}R_{42} & \frac{1}{4}(R_{21} + R_{43}) & \frac{1}{2}R_{31} \\
2 & \frac{1}{2}R_{22} & \frac{1}{2}R_{32} & \frac{1}{2}R_{33}
\end{array}$$

The 3 independent off-diagonal elements together with the 3 real diagonal elements and the
real number $R$ represent exactly the 10 real independent components of $R_{ab}$.
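
The claim that the scalar curvature equals $2(R_{21} - R_{43})$ follows in one step from the form (8.7.3) of the inverse metric components together with the symmetry of the Ricci tensor; the short computation is spelled out here for convenience:

$$R = g^{\mu\nu}R_{\mu\nu} = g^{12}R_{12} + g^{21}R_{21} + g^{34}R_{34} + g^{43}R_{43} = 2R_{21} - 2R_{43} .$$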
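The NP formalism contains 3 systems of equations that are very useful, namely (A) the NP
equations; (B) the Bianchi identities; (C) the commutation relations. We introduce
them as follows.
(A) NP equations.
Expressing $R_{41}, R_{32}, R_{21}, R_{43}$ in terms of $\Psi_0, \Psi_1, \Psi_2, \Psi_3, \Psi_4$ as well as the 10 quantities
$\Phi_{00}, \cdots, \Phi_{22}$ and $R$, and expressing $\omega_{41}, \omega_{32}, \omega_{21}, \omega_{43}$ in terms of the 12 spin coefficients,
one can reformulate (8.7.6) into the following 18 equations, called the NP equations:

$$D\rho - \bar{\delta}\kappa = (\rho^2 + \sigma\bar{\sigma}) + \rho(\varepsilon + \bar{\varepsilon}) - \bar{\kappa}\tau - \kappa(3\alpha + \bar{\beta} - \pi) + \Phi_{00} , \tag{8.7.12a}$$
$$D\sigma - \delta\kappa = \sigma(\rho + \bar{\rho}) + \sigma(3\varepsilon - \bar{\varepsilon}) - \kappa(\tau - \bar{\pi} + \bar{\alpha} + 3\beta) + \Psi_0 , \tag{8.7.12b}$$
$$D\tau - \Delta\kappa = \rho(\tau + \bar{\pi}) + \sigma(\bar{\tau} + \pi) + \tau(\varepsilon - \bar{\varepsilon}) - \kappa(3\gamma + \bar{\gamma}) + \Psi_1 + \Phi_{01} , \tag{8.7.12c}$$
$$D\alpha - \bar{\delta}\varepsilon = \alpha(\rho + \bar{\varepsilon} - 2\varepsilon) + \beta\bar{\sigma} - \bar{\beta}\varepsilon - \kappa\lambda - \bar{\kappa}\gamma + \pi(\varepsilon + \rho) + \Phi_{10} , \tag{8.7.12d}$$
$$D\beta - \delta\varepsilon = \sigma(\alpha + \pi) + \beta(\bar{\rho} - \bar{\varepsilon}) - \kappa(\mu + \gamma) - \varepsilon(\bar{\alpha} - \bar{\pi}) + \Psi_1 , \tag{8.7.12e}$$
$$D\gamma - \Delta\varepsilon = \alpha(\tau + \bar{\pi}) + \beta(\bar{\tau} + \pi) - \gamma(\varepsilon + \bar{\varepsilon}) - \varepsilon(\gamma + \bar{\gamma}) + \tau\pi - \nu\kappa + \Psi_2 + \Phi_{11} - R/24 , \tag{8.7.12f}$$
$$D\lambda - \bar{\delta}\pi = (\rho\lambda + \bar{\sigma}\mu) + \pi^2 + \pi(\alpha - \bar{\beta}) - \nu\bar{\kappa} - \lambda(3\varepsilon - \bar{\varepsilon}) + \Phi_{20} , \tag{8.7.12g}$$
$$D\mu - \delta\pi = (\bar{\rho}\mu + \sigma\lambda) + \pi\bar{\pi} - \mu(\varepsilon + \bar{\varepsilon}) - \pi(\bar{\alpha} - \beta) - \nu\kappa + \Psi_2 + R/12 , \tag{8.7.12h}$$
$$D\nu - \Delta\pi = \mu(\pi + \bar{\tau}) + \lambda(\bar{\pi} + \tau) + \pi(\gamma - \bar{\gamma}) - \nu(3\varepsilon + \bar{\varepsilon}) + \Psi_3 + \Phi_{21} , \tag{8.7.12i}$$
$$\Delta\lambda - \bar{\delta}\nu = -\lambda(\mu + \bar{\mu}) - \lambda(3\gamma - \bar{\gamma}) + \nu(3\alpha + \bar{\beta} + \pi - \bar{\tau}) - \Psi_4 , \tag{8.7.12j}$$
$$\delta\rho - \bar{\delta}\sigma = \rho(\bar{\alpha} + \beta) - \sigma(3\alpha - \bar{\beta}) + \tau(\rho - \bar{\rho}) + \kappa(\mu - \bar{\mu}) - \Psi_1 + \Phi_{01} , \tag{8.7.12k}$$
$$\delta\alpha - \bar{\delta}\beta = (\mu\rho - \lambda\sigma) + \alpha\bar{\alpha} + \beta\bar{\beta} - 2\alpha\beta + \gamma(\rho - \bar{\rho}) + \varepsilon(\mu - \bar{\mu}) - \Psi_2 + \Phi_{11} + R/24 , \tag{8.7.12l}$$
$$\delta\lambda - \bar{\delta}\mu = \nu(\rho - \bar{\rho}) + \pi(\mu - \bar{\mu}) + \mu(\alpha + \bar{\beta}) + \lambda(\bar{\alpha} - 3\beta) - \Psi_3 + \Phi_{21} , \tag{8.7.12m}$$
$$\delta\nu - \Delta\mu = (\mu^2 + \lambda\bar{\lambda}) + \mu(\gamma + \bar{\gamma}) - \bar{\nu}\pi + \nu(\tau - 3\beta - \bar{\alpha}) + \Phi_{22} , \tag{8.7.12n}$$
$$\delta\gamma - \Delta\beta = \gamma(\tau - \bar{\alpha} - \beta) + \mu\tau - \sigma\nu - \varepsilon\bar{\nu} - \beta(\gamma - \bar{\gamma} - \mu) + \alpha\bar{\lambda} + \Phi_{12} , \tag{8.7.12o}$$
$$\delta\tau - \Delta\sigma = (\mu\sigma + \bar{\lambda}\rho) + \tau(\tau + \beta - \bar{\alpha}) - \sigma(3\gamma - \bar{\gamma}) - \kappa\bar{\nu} + \Phi_{02} , \tag{8.7.12p}$$
$$\Delta\rho - \bar{\delta}\tau = -(\rho\bar{\mu} + \sigma\lambda) + \tau(\bar{\beta} - \alpha - \bar{\tau}) + \rho(\gamma + \bar{\gamma}) + \nu\kappa - \Psi_2 - R/12 , \tag{8.7.12q}$$
$$\Delta\alpha - \bar{\delta}\gamma = \nu(\rho + \varepsilon) - \lambda(\tau + \beta) + \alpha(\bar{\gamma} - \bar{\mu}) + \gamma(\bar{\beta} - \bar{\tau}) - \Psi_3 . \tag{8.7.12r}$$

Remark 1 Cartan’s second equations (8.7.6) contain 3 equations of complex antisymmetric
tensors of type (0, 2), each of which is equivalent to 6 complex component equations, and
thus there are 18 complex NP equations altogether.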

Now we illustrate the verification of the NP equations by some examples. First take
(8.7.12a) as an example; it is in fact a reformulation of the fourth and second components
of (8.7.6a). In the null tetrad, the components $R_{4241}$ of $R_{abcd}$ can be expressed as

$$R_{4241} = (\varepsilon_4)^a(\varepsilon_2)^bR_{ab41} = (\varepsilon_4)^a(\varepsilon_2)^b[(d\omega_{41})_{ab} + \omega_{41a}\wedge(\omega_{21b} + \omega_{43b})] ,$$

where (8.7.6a) is used in the second step. Since $(\omega_{41})_b = \sigma(\varepsilon^1)_b + \rho(\varepsilon^2)_b + \tau(\varepsilon^3)_b + \kappa(\varepsilon^4)_b$, we have

$$(\varepsilon_4)^a(\varepsilon_2)^b(d\omega_{41})_{ab} = (\varepsilon_4)^a(\varepsilon_2)^b(\nabla_a\omega_{41b} - \nabla_b\omega_{41a})$$
$$= -\sigma\bar{\sigma} + \rho(\varepsilon - \bar{\varepsilon}) + D\rho - \rho^2 + \bar{\kappa}\tau - \kappa\pi + \kappa(\alpha + \bar{\beta}) - \bar{\delta}\kappa .$$

The last step is tedious but not difficult, and is left as an exercise. The operation of lowering
the index of $\omega_\mu{}^\nu{}_\rho$ occurs frequently in the derivation, and relies on the expression (8.7.3) for
the components $g^{\nu\sigma}$ of $g^{ab}$ in the null tetrad. Since the matrix in (8.7.3) is quite simple, the
calculation is easy. For instance,

$$\omega_4{}^1{}_2 = g^{1\mu}\omega_{4\mu 2} = g^{12}\omega_{422} = \omega_{422} .$$

Moreover,

$$(\varepsilon_4)^a(\varepsilon_2)^b[\omega_{41a}\wedge(\omega_{21b} + \omega_{43b})] = \kappa(\omega_{212} + \omega_{432}) - \rho(\omega_{214} + \omega_{434}) = 2\kappa\alpha - 2\rho\varepsilon ,$$

and hence

$$R_{4241} = (D\rho - \bar{\delta}\kappa) - (\rho^2 + \sigma\bar{\sigma}) - \rho(\varepsilon + \bar{\varepsilon}) + \bar{\kappa}\tau + \kappa(3\alpha + \bar{\beta} - \pi) . \tag{8.7.13}$$

On the other hand, it follows from the definition of $\Phi_{00}$ and $R_{\mu\nu} = R_{\mu\sigma\nu}{}^\sigma$ that

$$\Phi_{00} \equiv \frac{1}{2}R_{44} = \frac{1}{2}R_{4\mu 4}{}^\mu = \frac{1}{2}(R_{414}{}^1 + R_{424}{}^2 + R_{434}{}^3) = \frac{1}{2}(R_{4142} + R_{4241} - R_{4344}) = R_{4241} . \tag{8.7.14}$$

Comparing (8.7.13) and (8.7.14) yields (8.7.12a). Thus, (8.7.12a) is nothing but a component
equation of (8.7.6a). This may not be apparent to beginning readers, since
$\Phi_{00}$, which represents the curvature component, is written on the right-hand side of the
equation. Now we introduce the derivation of a more complicated equation, (8.7.12f). This
is a reformulation of the 4th and 3rd component equations of (8.7.6c). First,

$$R_{4321} + R_{4343} = (\varepsilon_4)^a(\varepsilon_3)^b(R_{ab21} + R_{ab43}) = (\varepsilon_4)^a(\varepsilon_3)^b[(d\omega_{21})_{ab} + (d\omega_{43})_{ab} + 2\omega_{32a}\wedge\omega_{41b}] ,$$

where (8.7.6c) is used in the second equality. Through a tedious but straightforward computation we get

$$R_{4321} + R_{4343} = 2[(D\gamma - \Delta\varepsilon) - \alpha(\tau + \bar{\pi}) - \beta(\bar{\tau} + \pi) + \gamma(\varepsilon + \bar{\varepsilon}) + \varepsilon(\gamma + \bar{\gamma}) - \tau\pi + \nu\kappa] . \tag{8.7.15}$$

On the other hand, from the definition (8.7.10) we know that $\Psi_2 = (C_{4343} - C_{4312})/2$.
Applying the definition of the Weyl tensor [Equation (3.4.14)] to the $n = 4$ case yields

$$C_{abcd} = R_{abcd} - \frac{1}{2}[(g_{ac}R_{db} - g_{ad}R_{cb}) - (g_{bc}R_{da} - g_{bd}R_{ca})] + \frac{1}{6}R(g_{ac}g_{db} - g_{ad}g_{cb}) .$$

Noticing (8.7.3), we have $C_{4343} = R_{4343} - R_{34} - R/6$, $C_{4312} = R_{4312}$, and hence

$$\Psi_2 = \frac{1}{2}(R_{4343} - R_{4312}) - \frac{1}{2}R_{34} - \frac{1}{12}R . \tag{8.7.16}$$

It follows from (8.7.11a) that $\Phi_{11} = (R_{12} + R_{43})/4$ and $R = 2(R_{12} - R_{34})$, and hence

$$2(\Psi_2 + \Phi_{11} - R/24) = R_{4343} - R_{4312} = R_{4321} + R_{4343} . \tag{8.7.17}$$

From (8.7.17) and (8.7.15) we arrive at (8.7.12f).
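
The cancellation behind (8.7.17) is short enough to write out. Using (8.7.16), the definitions of $\Phi_{11}$ and $R$ in (8.7.11a), and the symmetry $R_{43} = R_{34}$, one finds

$$2\Big(\Psi_2 + \Phi_{11} - \frac{R}{24}\Big) = (R_{4343} - R_{4312}) - R_{34} - \frac{R}{6} + \frac{1}{2}(R_{12} + R_{34}) - \frac{R}{12} = (R_{4343} - R_{4312}) + \frac{1}{2}(R_{12} - R_{34}) - \frac{R}{4} ,$$

and the last two terms cancel because $R = 2(R_{12} - R_{34})$. The final equality in (8.7.17) then uses the antisymmetry $R_{4312} = -R_{4321}$.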
In short, the 18 NP equations are nothing but the manifestation of Cartan’s second equation
of structure in the null tetrad. One of their features is that the summations in the condensed
expression (8.7.6) are written out term by term, which is convenient for practical computation.
Although the set of NP equations contains many equations, each of them involves only
first-order derivatives, and thus they are not so difficult to solve. By means of the gauge freedom
in choosing the null tetrad [there are 6 real parameters to choose, see Stephani et al. (2003)
p. 33] one can simplify the NP equations even further.
(B) Bianchi identities.
From the definition of the Riemann tensor $R_{abc}{}^d$, we have already proved in Chap. 3 that
it satisfies the Bianchi identity $\nabla_{[a}R_{bc]d}{}^e = 0$. For convenience of application, one can
formulate it into component equations by means of the NP null tetrad; see Stephani et al.
(2003) pp. 81–82.
(C) Commutation relations.
To compute the Riemann tensor we need to choose a basis field $\{(e_\mu)^a\}$ first. If we choose
the coordinate basis, then any two basis vector fields must commute with each other, i.e.,
$[\partial/\partial x^\mu, \partial/\partial x^\nu]^a = 0$. However, it is not as simple for a non-coordinate basis. The commutator
of two arbitrary basis vector fields $(e_\mu)^a$ and $(e_\nu)^a$ in the basis $\{(e_\mu)^a\}$ can be expressed
using (3.1.13) as

$$[e_\mu, e_\nu]^a = (e_\mu)^b\nabla_b(e_\nu)^a - (e_\nu)^b\nabla_b(e_\mu)^a , \tag{8.7.18}$$

where $\nabla_a$ is an arbitrary torsion-free derivative operator. Choose the derivative operator (connection)
we assigned when computing the Riemann tensor as the $\nabla_a$ in the above equation;
then it follows from (5.7.1) that

$$[e_\mu, e_\nu]^a = -2\gamma^\sigma{}_{[\mu\nu]}(e_\sigma)^a , \tag{8.7.19}$$

where $\gamma^\sigma{}_{\mu\nu}$ are the connection coefficients defined by (5.7.1), whose relation with the
connection 1-form $\omega_\mu{}^\nu{}_a$ is given by (5.7.4). Equation (8.7.19) is exactly the commutation
relation when computing the Riemann tensor using the tetrad method. Now we discuss its
specific expression in the NP formalism. By means of the components $g^{\mu\nu}$ of the (inverse) metric
in the null tetrad one can rewrite (5.7.4) as

$$-\gamma^\sigma{}_{\mu\nu} = g^{\sigma\beta}\omega_{\mu\beta\nu} , \tag{8.7.20}$$

and hence (8.7.19) in the null tetrad becomes

$$[\varepsilon_\mu, \varepsilon_\nu]^a = g^{\sigma\beta}(\omega_{\mu\beta\nu} - \omega_{\nu\beta\mu})(\varepsilon_\sigma)^a . \tag{8.7.21}$$

Take $\mu\nu$ to be 34, 14, 13, 21, respectively; then the equation above turns into the following
4 commutation relations when applied to a real function (if one also takes $\mu\nu$ to
be 24 and 23, the results will be the complex conjugates of the results for 14 and
13, and thus are not independent):

$$\Delta D - D\Delta = (\gamma + \bar{\gamma})D + (\varepsilon + \bar{\varepsilon})\Delta - (\tau + \bar{\pi})\bar{\delta} - (\bar{\tau} + \pi)\delta , \tag{8.7.22a}$$
$$\delta D - D\delta = (\bar{\alpha} + \beta - \bar{\pi})D + \kappa\Delta - \sigma\bar{\delta} - (\bar{\rho} + \varepsilon - \bar{\varepsilon})\delta , \tag{8.7.22b}$$
$$\delta\Delta - \Delta\delta = -\bar{\nu}D + (\tau - \bar{\alpha} - \beta)\Delta + \bar{\lambda}\bar{\delta} + (\mu - \gamma + \bar{\gamma})\delta , \tag{8.7.22c}$$
$$\bar{\delta}\delta - \delta\bar{\delta} = (\bar{\mu} - \mu)D + (\bar{\rho} - \rho)\Delta - (\bar{\alpha} - \beta)\bar{\delta} + (\bar{\beta} - \alpha)\delta . \tag{8.7.22d}$$

When acting on a real function $f$, (8.7.22a) gives a real equation, (8.7.22d) gives an imaginary
equation, and each of (8.7.22b) and (8.7.22c) gives a complex equation; hence, (8.7.22) is
equivalent to 6 real equations. To check (8.7.22a), one only has to show that both sides of it
acting on any (complex) scalar field $f$ give the same scalar field. It follows from (8.7.9) that

$$(\Delta D - D\Delta)f = (l^b\nabla_bk^a - k^b\nabla_bl^a)\nabla_af = [l, k]^a\nabla_af$$
$$= [\varepsilon_3, \varepsilon_4]^a\nabla_af = g^{\sigma\beta}(\omega_{3\beta 4} - \omega_{4\beta 3})(\varepsilon_\sigma)^a\nabla_af$$
$$= [g^{12}(\omega_{324} - \omega_{423})(\varepsilon_1)^a + g^{21}(\omega_{314} - \omega_{413})(\varepsilon_2)^a + g^{34}(\omega_{344} - \omega_{443})(\varepsilon_3)^a + g^{43}(\omega_{334} - \omega_{433})(\varepsilon_4)^a]\nabla_af$$
$$= \{(-\pi - \bar{\tau})m^a + (-\bar{\pi} - \tau)\bar{m}^a - [-(\varepsilon + \bar{\varepsilon}) - 0]l^a - [0 - (\gamma + \bar{\gamma})]k^a\}\nabla_af$$
$$= (\gamma + \bar{\gamma})Df + (\varepsilon + \bar{\varepsilon})\Delta f - (\tau + \bar{\pi})\bar{\delta}f - (\bar{\tau} + \pi)\delta f ,$$

and hence we obtain (8.7.22a). The other 3 equations can be verified in a similar manner.
To help readers better understand the method of solving Einstein’s equation
using the NP formalism, this text provides two specific examples in Sect. 8.8.2 and
Optional Reading 8.9.1.

8.8 Solving the Einstein-Maxwell Equations Using the NP Formalism [Optional Reading]

8.8.1 Maxwell’s Equations and Einstein’s Equation in the NP Formalism

Due to the antisymmetry, the electromagnetic tensor $F_{ab}$ has at most 6 independent
complex components in the null tetrad, which may be chosen as
$F_{43}, F_{42}, F_{41}, F_{32}, F_{31}, F_{21}$. Moreover, they also satisfy the following relations:

$$F_{43} = \bar{F}_{43} , \qquad F_{42} = \bar{F}_{41} , \qquad F_{32} = \bar{F}_{31} , \qquad F_{21} = -F_{12} = -\bar{F}_{21} ,$$

and thus among all 6 of them, $F_{43}$ and $F_{21}$ are respectively real and imaginary (their
sum is complex), and the other 4 are equivalent to two independent complex quantities
(we may take $F_{41}$ and $F_{23}$). Therefore, they are represented by 3 complex quantities
$\Phi_0$, $\Phi_1$ and $\Phi_2$, defined as

$$\Phi_0 := F_{41} = F_{ab}k^am^b , \tag{8.8.1a}$$
$$\Phi_1 := \frac{1}{2}(F_{43} + F_{21}) = \frac{1}{2}F_{ab}(k^al^b + \bar{m}^am^b) , \tag{8.8.1b}$$
$$\Phi_2 := F_{23} = F_{ab}\bar{m}^al^b . \tag{8.8.1c}$$

The source-free Maxwell equations

$$\nabla^aF_{ab} = 0 , \tag{8.8.2a}$$
$$\nabla_{[a}F_{bc]} = 0 \tag{8.8.2b}$$

have the following form in the NP formalism:

$$D\Phi_1 - \bar{\delta}\Phi_0 = (\pi - 2\alpha)\Phi_0 + 2\rho\Phi_1 - \kappa\Phi_2 , \tag{8.8.3a}$$
$$D\Phi_2 - \bar{\delta}\Phi_1 = -\lambda\Phi_0 + 2\pi\Phi_1 + (\rho - 2\varepsilon)\Phi_2 , \tag{8.8.3b}$$
$$\delta\Phi_1 - \Delta\Phi_0 = (\mu - 2\gamma)\Phi_0 + 2\tau\Phi_1 - \sigma\Phi_2 , \tag{8.8.3c}$$
$$\delta\Phi_2 - \Delta\Phi_1 = -\nu\Phi_0 + 2\mu\Phi_1 + (\tau - 2\beta)\Phi_2 . \tag{8.8.3d}$$

As an example, here we only provide the verification of (8.8.3a) as follows:

$$2D\Phi_1 = k^c\nabla_c[F_{ab}(k^al^b + \bar{m}^am^b)] = F_{ab}k^ak^c\nabla_cl^b + F_{ab}l^bk^c\nabla_ck^a + k^al^bk^c\nabla_cF_{ab}$$
$$\qquad + F_{ab}\bar{m}^ak^c\nabla_cm^b + F_{ab}m^bk^c\nabla_c\bar{m}^a + \bar{m}^am^bk^c\nabla_cF_{ab} . \tag{8.8.4}$$

The first and second terms on the right-hand side of the equation above are respectively

$$F_{ab}k^ak^c\nabla_cl^b = F_{4\nu}(\varepsilon^\nu)_b(\varepsilon_4)^c\nabla_c(\varepsilon_3)^b = F_{4\nu}g^{\nu\mu}\omega_{\mu 34}$$
$$\qquad = F_{41}g^{12}\omega_{234} + F_{42}g^{21}\omega_{134} + F_{43}g^{34}\omega_{434} = \pi\Phi_0 + \bar{\pi}\bar{\Phi}_0 + F_{43}\omega_{344} ,$$
$$F_{ab}l^bk^c\nabla_ck^a = -\bar{\kappa}\bar{\Phi}_2 - \kappa\Phi_2 - F_{43}\omega_{344} ,$$

and hence the sum of the first and second terms on the right-hand side of (8.8.4) is
$\pi\Phi_0 + \bar{\pi}\bar{\Phi}_0 - \kappa\Phi_2 - \bar{\kappa}\bar{\Phi}_2$. Similarly, the sum of the fourth and fifth terms on the
right-hand side of (8.8.4) is $-\kappa\Phi_2 + \bar{\kappa}\bar{\Phi}_2 - \bar{\pi}\bar{\Phi}_0 + \pi\Phi_0$, and therefore,

$$2D\Phi_1 = 2(\pi\Phi_0 - \kappa\Phi_2) + k^al^bk^c\nabla_cF_{ab} + \bar{m}^am^bk^c\nabla_cF_{ab} .$$

In a similar manner one can also obtain that

$$\bar{\delta}\Phi_0 = 2(\alpha\Phi_0 - \rho\Phi_1) + k^am^b\bar{m}^c\nabla_cF_{ab} .$$

Hence,

$$D\Phi_1 - \bar{\delta}\Phi_0 = (\pi - 2\alpha)\Phi_0 + 2\rho\Phi_1 - \kappa\Phi_2 + \frac{1}{2}(k^al^bk^c + \bar{m}^am^bk^c - 2k^am^b\bar{m}^c)\nabla_cF_{ab} . \tag{8.8.5}$$

Let $G \equiv (k^al^bk^c + \bar{m}^am^bk^c - 2k^am^b\bar{m}^c)\nabla_cF_{ab}$; then to verify (8.8.3a) one only has
to show that $G = 0$. Maxwell’s equations are certainly involved in verifying this. From
(8.7.3) we can see that

$$g^{ac} = m^a\bar{m}^c + \bar{m}^am^c - l^ak^c - k^al^c , \tag{8.8.6}$$

and hence Maxwell’s equation $\nabla^aF_{ab} = 0$ can be written as
$(m^a\bar{m}^c + \bar{m}^am^c - l^ak^c - k^al^c)\nabla_cF_{ab} = 0$. Contracting this with $k^b$ yields

$$0 = [m^ak^b\bar{m}^c + \bar{m}^ak^bm^c - (l^ak^bk^c + k^ak^bl^c)]\nabla_cF_{ab}$$
$$= [m^ak^b\bar{m}^c - (m^a\bar{m}^bk^c + k^am^b\bar{m}^c) + k^al^bk^c]\nabla_cF_{ab}$$
$$= [-m^bk^a\bar{m}^c - (-m^b\bar{m}^ak^c + k^am^b\bar{m}^c) + k^al^bk^c]\nabla_cF_{ab} = G ,$$

where the second equality holds because $\nabla_{[c}F_{ab]} = 0$ leads to $\bar{m}^{[a}k^bm^{c]}\nabla_cF_{ab} = 0$ and
$l^{[a}k^bk^{c]}\nabla_cF_{ab} = 0$, and the third equality comes from the fact that $F_{ab} = -F_{ba}$. The
other equations in (8.8.3) can be verified similarly.
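It follows from (7.2.6) that (Exercise 8.11)

$$T_{11} = \frac{1}{2\pi}\Phi_0\bar{\Phi}_2 , \qquad T_{12} = T_{21} = \frac{1}{2\pi}\Phi_1\bar{\Phi}_1 , \qquad T_{13} = T_{31} = \frac{1}{2\pi}\bar{\Phi}_2\Phi_1 ,$$
$$T_{14} = T_{41} = \frac{1}{2\pi}\Phi_0\bar{\Phi}_1 , \qquad T_{22} = \frac{1}{2\pi}\Phi_2\bar{\Phi}_0 , \qquad T_{23} = T_{32} = \frac{1}{2\pi}\Phi_2\bar{\Phi}_1 , \tag{8.8.7}$$
$$T_{24} = T_{42} = \frac{1}{2\pi}\bar{\Phi}_0\Phi_1 , \qquad T_{33} = \frac{1}{2\pi}\Phi_2\bar{\Phi}_2 , \qquad T_{34} = T_{43} = \frac{1}{2\pi}\Phi_1\bar{\Phi}_1 , \qquad T_{44} = \frac{1}{2\pi}\Phi_0\bar{\Phi}_0 .$$

Then, from (8.7.11a), (8.7.11b) and the component form of Einstein’s equation
$R_{\mu\nu} = 8\pi T_{\mu\nu}$ we obtain the following succinct relations between $\Phi_{00}, \cdots, \Phi_{22}$,
which represent the curvature tensor, and $\Phi_0, \Phi_1, \Phi_2$, which represent the
electromagnetic field tensor:

$$\Phi_{00} = 2\Phi_0\bar{\Phi}_0 , \qquad \Phi_{01} = 2\Phi_0\bar{\Phi}_1 , \qquad \Phi_{02} = 2\Phi_0\bar{\Phi}_2 ,$$
$$\Phi_{11} = 2\Phi_1\bar{\Phi}_1 , \qquad \Phi_{12} = 2\Phi_1\bar{\Phi}_2 , \qquad \Phi_{22} = 2\Phi_2\bar{\Phi}_2 . \tag{8.8.8}$$

This is how Einstein’s equation in an electrovac spacetime is expressed in the NP
formalism, which can be formulated into the following algebraic equations:

$$\Phi_{\lambda\tau} = 2\Phi_\lambda\bar{\Phi}_\tau , \qquad \lambda, \tau = 0, 1, 2 . \tag{8.8.9}$$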

In Sect. 8.4.1 we introduced a complex quantity to define a null electromagnetic
field. It is not difficult to show (Exercise 8.11) that the full contraction of this quantity
with itself can be expressed in terms of the electromagnetic field components
$\Phi_0, \Phi_1, \Phi_2$ in a null tetrad as

$$16(\Phi_0\Phi_2 - \Phi_1^2) . \tag{8.8.10}$$

Hence, the null condition for an electromagnetic field can also be expressed equivalently as

$$\Phi_0\Phi_2 - \Phi_1^2 = 0 . \tag{8.8.11}$$
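
The null condition (8.8.11) is a purely algebraic test on the three complex components, so it is easy to try on sample values. The following is a minimal sketch; the function names `field_invariant` and `is_null`, and the tolerance `tol`, are hypothetical choices made for illustration.

```python
def field_invariant(phi0, phi1, phi2):
    """Return 16 (Phi_0 Phi_2 - Phi_1^2), the expression in (8.8.10)."""
    return 16 * (phi0 * phi2 - phi1 ** 2)


def is_null(phi0, phi1, phi2, tol=1e-12):
    """True if the components satisfy the null condition (8.8.11) within tol."""
    return abs(field_invariant(phi0, phi1, phi2)) <= tol


if __name__ == "__main__":
    print(is_null(0, 0, 1 + 2j))   # Phi_0 = Phi_1 = 0: null
    print(is_null(0, 1j, 0))       # Phi_0 = Phi_2 = 0, Phi_1 != 0: nonnull
    print(is_null(1, 0, 1j))       # Phi_1 = 0, Phi_0 Phi_2 != 0: nonnull
```

A field with $\Phi_0 = \Phi_2 = 0$ and $\Phi_1 \neq 0$ always fails the test, whereas a field with $\Phi_1 = 0$ passes it exactly when $\Phi_0\Phi_2 = 0$.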

8.8.2 An Example of Solving the Einstein-Maxwell Equations Under the Axisymmetric Condition

In this subsection, we present the detailed process of solving the Einstein-Maxwell
equations using the Newman-Penrose formalism through a specific example
[see Liang (1995)]. Suppose the metric to be found has the following line element
expression in a coordinate system $\{t, z, \varphi, \rho\}$:

$$ds^2 = e^\xi(-dt^2 + d\rho^2) + e^\eta dz^2 + e^{\eta+\chi}d\varphi^2 , \tag{8.8.12}$$

where $\xi$, $\eta$ and $\chi$ are undetermined functions of $t$ and $\rho$ which are independent of $z$
and $\varphi$. One can readily see from the equation above that $(\partial/\partial z)^a$ and $(\partial/\partial\varphi)^a$ are two
commuting Killing vector fields. Suppose the integral curves of $(\partial/\partial\varphi)^a$ are closed;
then (8.8.12) represents a cylindrically symmetric metric, see Sect. 8.5.
Let $v = t + \rho$, $u = t - \rho$; then (8.8.12) becomes

$$ds^2 = -e^\xi\,du\,dv + e^\eta dz^2 + e^{\eta+\chi}d\varphi^2 , \tag{8.8.13}$$

where $\xi$, $\eta$ and $\chi$ should be regarded as functions of the new coordinates $u$ and $v$.
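
The passage from (8.8.12) to (8.8.13) is a one-line computation, written out here for convenience. Inverting the definitions gives $t = (v + u)/2$ and $\rho = (v - u)/2$, so that

$$-dt^2 + d\rho^2 = -\frac{1}{4}(dv + du)^2 + \frac{1}{4}(dv - du)^2 = -du\,dv .$$

The same inversion gives $(\partial/\partial u)^a = \frac{1}{2}[(\partial/\partial t)^a - (\partial/\partial\rho)^a]$ and $(\partial/\partial v)^a = \frac{1}{2}[(\partial/\partial t)^a + (\partial/\partial\rho)^a]$.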


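Normalizing the orthogonal coordinate basis fields

$$\{(\partial/\partial t)^a , (\partial/\partial\rho)^a , (\partial/\partial z)^a , (\partial/\partial\varphi)^a\}$$

one obtains the orthonormal tetrad fields

$$(e_0)^a = e^{-\xi/2}(\partial/\partial t)^a , \qquad (e_3)^a = e^{-\xi/2}(\partial/\partial\rho)^a ,$$
$$(e_1)^a = e^{-\eta/2}(\partial/\partial z)^a , \qquad (e_2)^a = e^{-(\eta+\chi)/2}(\partial/\partial\varphi)^a . \tag{8.8.14}$$

By means of (8.7.1), starting from the above orthonormal tetrad fields one can
conveniently construct the following null tetrad fields:

$$m^a = \frac{1}{\sqrt{2}}[e^{-\eta/2}(\partial/\partial z)^a - ie^{-(\eta+\chi)/2}(\partial/\partial\varphi)^a] , \tag{8.8.15a}$$
$$\bar{m}^a = \frac{1}{\sqrt{2}}[e^{-\eta/2}(\partial/\partial z)^a + ie^{-(\eta+\chi)/2}(\partial/\partial\varphi)^a] , \tag{8.8.15b}$$
$$l^a = \frac{1}{\sqrt{2}}e^{-\xi/2}[(\partial/\partial t)^a - (\partial/\partial\rho)^a] = \sqrt{2}\,e^{-\xi/2}(\partial/\partial u)^a , \tag{8.8.15c}$$
$$k^a = \frac{1}{\sqrt{2}}e^{-\xi/2}[(\partial/\partial t)^a + (\partial/\partial\rho)^a] = \sqrt{2}\,e^{-\xi/2}(\partial/\partial v)^a . \tag{8.8.15d}$$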
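
That the vectors (8.8.15) really form a null tetrad for the metric (8.8.12) can be confirmed numerically at a sample point. The sketch below is illustrative: the function names and the sample values of $\xi$, $\eta$, $\chi$ are hypothetical, and components are listed in the coordinate order $(t, \rho, z, \varphi)$.

```python
import math


def metric_diag(xi, eta, chi):
    """Diagonal components of (8.8.12) in the coordinate order (t, rho, z, phi)."""
    return (-math.exp(xi), math.exp(xi), math.exp(eta), math.exp(eta + chi))


def null_tetrad(xi, eta, chi):
    """Coordinate components of (m, mbar, l, k) as given in (8.8.15)."""
    s = 1.0 / math.sqrt(2.0)
    a = math.exp(-xi / 2.0)
    b = math.exp(-eta / 2.0)
    c = math.exp(-(eta + chi) / 2.0)
    m = (0.0, 0.0, s * b, -1j * s * c)
    mbar = (0.0, 0.0, s * b, 1j * s * c)
    l = (s * a, -s * a, 0.0, 0.0)
    k = (s * a, s * a, 0.0, 0.0)
    return m, mbar, l, k


def gram_matrix(xi, eta, chi):
    """Inner products g_ab u^a v^b of the tetrad vectors, in the order (m, mbar, l, k)."""
    g = metric_diag(xi, eta, chi)
    tetrad = null_tetrad(xi, eta, chi)

    def inner(u, v):
        return sum(gi * ui * vi for gi, ui, vi in zip(g, u, v))

    return [[inner(u, v) for v in tetrad] for u in tetrad]


if __name__ == "__main__":
    # Arbitrary sample values of the metric functions at one point.
    for row in gram_matrix(0.3, -0.7, 1.1):
        print([complex(x) for x in row])
```

Up to rounding, the result reproduces the matrix (8.7.3) regardless of the values chosen for $\xi$, $\eta$ and $\chi$.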

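After computing all the $\omega_{\rho\mu\nu}$ using (5.7.19) [in which the $(e_\mu)^a$ should be interpreted
as $(\varepsilon_\mu)^a$] and (5.7.20) or any other method, one can find all 12 complex spin
coefficients from (8.7.8) as follows:

$$\kappa = \tau = \nu = \pi = \beta = \alpha = 0 , \tag{8.8.16a}$$
$$\rho = -\frac{\sqrt{2}}{4}e^{-\xi/2}\left(2\frac{\partial\eta}{\partial v} + \frac{\partial\chi}{\partial v}\right) , \tag{8.8.16b}$$
$$\mu = \frac{\sqrt{2}}{4}e^{-\xi/2}\left(2\frac{\partial\eta}{\partial u} + \frac{\partial\chi}{\partial u}\right) , \tag{8.8.16c}$$
$$\varepsilon = \frac{\sqrt{2}}{4}e^{-\xi/2}\frac{\partial\xi}{\partial v} , \tag{8.8.16d}$$
$$\sigma = \frac{\sqrt{2}}{4}e^{-\xi/2}\frac{\partial\chi}{\partial v} , \tag{8.8.16e}$$
$$\lambda = -\frac{\sqrt{2}}{4}e^{-\xi/2}\frac{\partial\chi}{\partial u} , \tag{8.8.16f}$$
$$\gamma = -\frac{\sqrt{2}}{4}e^{-\xi/2}\frac{\partial\xi}{\partial u} . \tag{8.8.16g}$$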
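When solving the Einstein-Maxwell equations, we have already assumed that there
is only an electromagnetic field but no matter fields (“electrovacuum”). The
tracelessness of the energy-momentum tensor $T_{ab}$ of the electromagnetic field implies
that the scalar curvature $R$ vanishes. Noticing (8.8.16a), we can see that the NP
equations take the following form:

$$D\rho = \rho(\rho + 2\varepsilon) + \sigma^2 + \Phi_{00} , \tag{8.8.17a}$$
$$D\sigma = 2\sigma(\rho + \varepsilon) + \Psi_0 , \tag{8.8.17b}$$
$$0 = \Psi_1 + \Phi_{01} , \tag{8.8.17c}$$
$$0 = \Phi_{10} , \tag{8.8.17d}$$
$$0 = \Psi_1 , \tag{8.8.17e}$$
$$D\gamma - \Delta\varepsilon = -4\varepsilon\gamma + \Psi_2 + \Phi_{11} , \tag{8.8.17f}$$
$$D\lambda = \lambda(\rho - 2\varepsilon) + \sigma\mu + \Phi_{20} , \tag{8.8.17g}$$
$$D\mu = \mu(\rho - 2\varepsilon) + \sigma\lambda + \Psi_2 , \tag{8.8.17h}$$
$$0 = \Psi_3 + \Phi_{21} , \tag{8.8.17i}$$
$$\Delta\lambda = -2\lambda(\mu + \gamma) - \Psi_4 , \tag{8.8.17j}$$
$$0 = -\Psi_1 + \Phi_{01} , \tag{8.8.17k}$$
$$0 = \mu\rho - \lambda\sigma - \Psi_2 + \Phi_{11} , \tag{8.8.17l}$$
$$0 = -\Psi_3 + \Phi_{21} , \tag{8.8.17m}$$
$$-\Delta\mu = \mu(\mu + 2\gamma) + \lambda^2 + \Phi_{22} , \tag{8.8.17n}$$
$$0 = \Phi_{12} , \tag{8.8.17o}$$
$$-\Delta\sigma = \sigma(\mu - 2\gamma) + \lambda\rho + \Phi_{02} , \tag{8.8.17p}$$
$$\Delta\rho = \rho(2\gamma - \mu) - \sigma\lambda - \Psi_2 , \tag{8.8.17q}$$
$$0 = -\Psi_3 . \tag{8.8.17r}$$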

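Our discussion is limited to the case of a source-free electromagnetic field, and hence when (8.8.16a) holds Maxwell's equations take the following form:

\[ D\Phi_1 - \bar{\delta}\Phi_0 = 2\rho\Phi_1 , \tag{8.8.18a} \]
\[ D\Phi_2 - \bar{\delta}\Phi_1 = -\lambda\Phi_0 + (\rho - 2\varepsilon)\Phi_2 , \tag{8.8.18b} \]
\[ \delta\Phi_1 - \Delta\Phi_0 = (\mu - 2\gamma)\Phi_0 - \sigma\Phi_2 , \tag{8.8.18c} \]
\[ \delta\Phi_2 - \Delta\Phi_1 = 2\mu\Phi_1 . \tag{8.8.18d} \]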

From Einstein's equations (8.8.9) we can see that (8.8.17d) and (8.8.17o) lead to $\Phi_1 = 0$ or $\Phi_0 = \Phi_2 = 0$. It follows from the null condition $\Phi_0\Phi_2 - \Phi_1^2 = 0$ that an electromagnetic field with $\Phi_0 = \Phi_2 = 0$ can only be a nonnull electromagnetic field, while an electromagnetic field with $\Phi_1 = 0$ can be either null or nonnull. Here we only seek solutions for nonnull electromagnetic fields with $\Phi_1 = 0$ (which must have $\Phi_0 \neq 0$ and $\Phi_2 \neq 0$). In this case, Maxwell's equations (8.8.18) simplify to

\[ \bar{\delta}\Phi_0 = 0 , \tag{8.8.19a} \]
\[ D\Phi_2 = -\lambda\Phi_0 + (\rho - 2\varepsilon)\Phi_2 , \tag{8.8.19b} \]
\[ -\Delta\Phi_0 = (\mu - 2\gamma)\Phi_0 - \sigma\Phi_2 , \tag{8.8.19c} \]
\[ \delta\Phi_2 = 0 . \tag{8.8.19d} \]

By solving the Einstein-Maxwell equations, we mean finding expressions for the metric functions $\xi(t,\rho)$, $\eta(t,\rho)$, $\chi(t,\rho)$ as well as the electromagnetic field functions $\Phi_0$ and $\Phi_2$ which satisfy these equations. They appear in the following 3 systems of equations (which are coupled with each other): ① Maxwell's equations (8.8.19); ② Einstein's equations $\Phi_{\lambda\tau} = 2\Phi_\lambda\bar{\Phi}_\tau$ ($\lambda, \tau = 0, 1, 2$); ③ the NP equations (8.8.17). The solving process is as follows.

Equation (8.8.19d) leads to

\[ \frac{\partial\Phi_2}{\partial z} - i e^{-\chi/2}\frac{\partial\Phi_2}{\partial\varphi} = 0 . \tag{8.8.20} \]

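However, one cannot yet conclude that $\partial\Phi_2/\partial z = \partial\Phi_2/\partial\varphi = 0$, since $\Phi_2$ is a complex-valued function. Suppose $\Phi_2 = Ce^{i\theta}$, where $C$ and $\theta$ are real-valued functions; then

\[ \Phi_{22} = 2\Phi_2\bar{\Phi}_2 = 2C^2 . \tag{8.8.21} \]

Since $\mu$, $\gamma$, $\lambda$ are all independent of $z$ and $\varphi$, (8.8.17n) indicates that $C$ is independent of $z$ and $\varphi$, and hence (8.8.20) gives

\[ \left(\frac{\partial}{\partial z} - i e^{-\chi/2}\frac{\partial}{\partial\varphi}\right)e^{i\theta} = 0 . \]

Thus,

\[ \frac{\partial\theta}{\partial z} = \frac{\partial\theta}{\partial\varphi} = 0 , \]

i.e., $\Phi_2$ is indeed independent of $z$ and $\varphi$. Similarly, it follows from (8.8.19a), (8.8.17a) and Einstein's equation $\Phi_{00} = 2\Phi_0\bar{\Phi}_0$ that $\Phi_0$ is also independent of $z$ and $\varphi$. On the other hand, (8.8.19b) and (8.8.19c) can be expressed as

\[ -4\frac{\partial\Phi_2}{\partial v} = \left(2\frac{\partial\xi}{\partial v} + 2\frac{\partial\eta}{\partial v} + \frac{\partial\chi}{\partial v}\right)\Phi_2 - \frac{\partial\chi}{\partial u}\Phi_0 , \tag{8.8.22} \]
\[ -4\frac{\partial\Phi_0}{\partial u} = \left(2\frac{\partial\xi}{\partial u} + 2\frac{\partial\eta}{\partial u} + \frac{\partial\chi}{\partial u}\right)\Phi_0 - \frac{\partial\chi}{\partial v}\Phi_2 . \tag{8.8.23} \]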
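The step from (8.8.19b), (8.8.19c) to (8.8.22), (8.8.23) is pure algebra once the spin coefficients (8.8.16) are substituted. The following is a minimal numerical sketch of that substitution, not part of the original derivation. It assumes that on scalars $D = \sqrt{2}e^{-\xi/2}\partial/\partial v$ and $\Delta = \sqrt{2}e^{-\xi/2}\partial/\partial u$, as suggested by (8.8.15c), (8.8.15d); the function names and all sample numbers are hypothetical.

```python
import math

def spin_coefficients(xi, xi_u, xi_v, eta_u, eta_v, chi_u, chi_v):
    """Spin coefficients (8.8.16) at one point; *_u and *_v are sample derivative values."""
    c = math.sqrt(2) / 4 * math.exp(-xi / 2)
    return {
        "rho": -c * (2 * eta_v + chi_v),
        "mu": c * (2 * eta_u + chi_u),
        "eps": c * xi_v,
        "sigma": c * chi_v,
        "lam": -c * chi_u,
        "gamma": -c * xi_u,
    }

def residuals(xi, xi_u, xi_v, eta_u, eta_v, chi_u, chi_v, P0, P2, P0_u, P2_v):
    """Return 'lhs - rhs' for (8.8.19b), (8.8.19c), (8.8.22), (8.8.23)."""
    s = spin_coefficients(xi, xi_u, xi_v, eta_u, eta_v, chi_u, chi_v)
    d = math.sqrt(2) * math.exp(-xi / 2)  # assumed prefactor of D and Delta on scalars
    np_b = d * P2_v - (-s["lam"] * P0 + (s["rho"] - 2 * s["eps"]) * P2)
    np_c = -d * P0_u - ((s["mu"] - 2 * s["gamma"]) * P0 - s["sigma"] * P2)
    eq22 = -4 * P2_v - ((2 * xi_v + 2 * eta_v + chi_v) * P2 - chi_u * P0)
    eq23 = -4 * P0_u - ((2 * xi_u + 2 * eta_u + chi_u) * P0 - chi_v * P2)
    return np_b, np_c, eq22, eq23, d

# Hypothetical sample point: arbitrary real derivatives, complex field values.
sample = dict(xi=0.3, xi_u=-0.4, xi_v=0.7, eta_u=1.1, eta_v=-0.6,
              chi_u=0.25, chi_v=0.9, P0=0.5 - 0.2j, P2=-0.3 + 0.8j,
              P0_u=0.1 + 0.4j, P2_v=-0.7 + 0.05j)
np_b, np_c, eq22, eq23, d = residuals(**sample)
# Each NP-form residual is a fixed multiple of the corresponding simplified one.
print(abs(np_b + d / 4 * eq22), abs(np_c - d / 4 * eq23))
```

Both printed numbers should be at rounding level, which shows that each pair of equations differs only by the nonzero factor $\mp\sqrt{2}e^{-\xi/2}/4$.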

To make the solving process more tractable, we will only discuss the case where $\partial\chi/\partial u = 0$. Any solution found under this condition is an exact solution. Of course, we cannot be sure beforehand that a solution exists in this case, and so this is a tentative approach. Now we only have to care about the case $\partial\chi/\partial v \neq 0$. This is because $\partial\chi/\partial u = \partial\chi/\partial v = 0$ would make the line element (8.8.13) locally the same as a plane symmetric metric, and the plane symmetric metrics generated by "semi-plane symmetric" (which locally looks like cylindrically symmetric) electromagnetic fields have been exhausted by Li and Liang (1985). The condition $\partial\chi/\partial u = 0$ brings many simplifications; for instance, it leads to $\lambda = 0$, and one can now integrate (8.8.22) to get

\[ \Phi_2(u, v) = a(u)\, e^{-(2\xi+2\eta+\chi)/4} , \tag{8.8.24} \]

where $a(u)$ is an arbitrary complex-valued function of $u$, and $a(u) \neq 0$ (otherwise $\Phi_2 = 0$, which contradicts the premise). Hence, it follows from Einstein's equation $\Phi_{22} = 2\Phi_2\bar{\Phi}_2$ that

\[ \Phi_{22}(u, v) = 2|a(u)|^2 e^{-(2\xi+2\eta+\chi)/2} . \tag{8.8.25} \]

There is now only one unsolved Maxwell equation remaining, namely (8.8.23), which can be simplified as
 
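\[ -4\frac{\partial\Phi_0}{\partial u} = 2\left(\frac{\partial\xi}{\partial u} + \frac{\partial\eta}{\partial u}\right)\Phi_0 - \chi'\, a(u)\, e^{-(2\xi+2\eta+\chi)/4} , \tag{8.8.26} \]

where the prime $'$ represents the derivative of a function of one variable (in the above equation $\chi' \equiv d\chi/dv$). The condition $\partial\chi/\partial u = 0$ also simplifies the NP equations; for instance, (8.8.17g) now becomes

\[ -\Phi_{20} = \frac{1}{4}\, e^{-\xi}\chi'\frac{\partial\eta}{\partial u} , \tag{8.8.27} \]

which says that $\Phi_{20}$ is real, and thus $\Phi_{02} = \Phi_{20}$. Noticing that $a(u) \neq 0$ (otherwise the electromagnetic field vanishes), by combining (8.8.27) with (8.8.9) and (8.8.24) we get

\[ \Phi_0(u, v) = -\frac{1}{8\bar{a}(u)}\,\chi'\frac{\partial\eta}{\partial u}\, e^{(-2\xi+2\eta+\chi)/4} . \tag{8.8.28} \]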
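As a brief worked step, the expression for $\Phi_0$ can be traced as follows. This is only a sketch; it assumes that (8.8.9) has the form $\Phi_{\lambda\tau} = 2\Phi_\lambda\bar{\Phi}_\tau$ quoted earlier in this section, and it uses the fact that $\xi$, $\eta$, $\chi$ are real:

\[ \Phi_{02} = 2\Phi_0\bar{\Phi}_2 \quad\Longrightarrow\quad \Phi_0 = \frac{\Phi_{02}}{2\bar{\Phi}_2} = \frac{-\frac{1}{4}e^{-\xi}\chi'\,\partial\eta/\partial u}{2\bar{a}(u)\,e^{-(2\xi+2\eta+\chi)/4}} = -\frac{1}{8\bar{a}(u)}\,\chi'\frac{\partial\eta}{\partial u}\,e^{(-2\xi+2\eta+\chi)/4} . \]

Here $\Phi_{02} = \Phi_{20}$ is taken from (8.8.27), $\bar{\Phi}_2$ from (8.8.24), and the exponents combine as $-\xi + (2\xi+2\eta+\chi)/4 = (-2\xi+2\eta+\chi)/4$.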

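Taking the derivative of the above equation with respect to $u$ and plugging into (8.8.26) we obtain

\[ -2|a|^2 = \left[-\bar{a}^{-1}\bar{a}'\frac{\partial\eta}{\partial u} + \frac{\partial^2\eta}{\partial u^2} + \left(\frac{\partial\eta}{\partial u}\right)^2\right] e^{\eta+\chi/2} . \tag{8.8.29} \]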

Now we look back at the NP equations (8.8.17). Equation (g) has been used. By means of (8.8.16) and (8.8.27) it is not difficult to verify that (p) is automatically satisfied. The assumption that $\Phi_1 = 0$ leads to $\Phi_{01} = \Phi_{10} = \Phi_{12} = \Phi_{21} = \Phi_{11} = 0$, and so (d) and (o) become identities; also, (c) becomes equivalent to (k) and (e), which states nothing but the fact that the Weyl tensor of the spacetime has the component

\[ \Psi_1 = 0 . \tag{8.8.30} \]

Similarly, (i), (m) and (r) are equivalent and give

\[ \Psi_3 = 0 . \tag{8.8.31} \]

In addition, $\lambda = 0$ simplifies (j) and (l) considerably and gives

\[ \Psi_4 = 0 , \tag{8.8.32} \]
\[ \Psi_2 = \mu\rho . \tag{8.8.33} \]

If we leave (l) [i.e., (8.8.33)] and (b) to the end to determine $\Psi_2$ and $\Psi_0$ (no need to solve), then the NP equations (8.8.17) have only 5 unsolved equations remaining, namely (a), (f), (h), (n) and (q). Noticing $\Phi_{11} = 0$, $\lambda = 0$ and (8.8.33), we see that these 5 equations take the following form:

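\[ D\rho = \rho(\rho + 2\varepsilon) + \sigma^2 + \Phi_{00} , \tag{8.8.34} \]
\[ D\gamma - \Delta\varepsilon = -4\varepsilon\gamma + \mu\rho , \tag{8.8.35} \]
\[ D\mu = 2\mu(\rho - \varepsilon) , \tag{8.8.36} \]
\[ -\Delta\mu = \mu(\mu + 2\gamma) + \Phi_{22} , \tag{8.8.37} \]
\[ \Delta\rho = 2\rho(\gamma - \mu) . \tag{8.8.38} \]

Equations (8.8.36) and (8.8.38) are both equivalent to

\[ \frac{\partial^2\eta}{\partial u\,\partial v} = -\frac{\partial\eta}{\partial u}\left(\frac{\partial\eta}{\partial v} + \frac{1}{2}\chi'\right) ; \]

integrating this yields

\[ \eta(u, v) = -\frac{1}{2}\chi + \ln[g(v) - f(u)] , \tag{8.8.39} \]

where $g(v)$ and $f(u)$ are arbitrary functions. Hence, (8.8.35) becomes

\[ \frac{\partial^2\xi}{\partial u\,\partial v} = -\frac{1}{2}(g - f)^{-2} f' g' ; \]

integrating this yields

\[ \xi(u, v) = -\frac{1}{2}\ln(g - f) + F(u) + G(v) , \tag{8.8.40} \]

where $F(u)$ and $G(v)$ are arbitrary functions. Plugging (8.8.39) and (8.8.40) into (8.8.13) yields

\[ ds^2 = -(g - f)^{-1/2} e^{F+G}\, du\, dv + (g - f)\left(e^{-\chi/2} dz^2 + e^{\chi/2} d\varphi^2\right) . \tag{8.8.41} \]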
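The two integrations leading to (8.8.39) and (8.8.40) can be spot-checked numerically. The sketch below is an illustration under stated assumptions, not the authors' procedure: the sample functions `f`, `g`, `chi`, `F`, `G` and the step `H` are arbitrary hypothetical choices (any smooth choice with $g(v) > f(u)$ would do), and derivatives are approximated by central differences.

```python
import math

# Hypothetical sample functions; chosen only so that g(v) - f(u) > 0 near the test point.
f = lambda u: 0.3 * u * u + 0.2 * u
g = lambda v: 5.0 + math.sin(v)
chi = lambda v: 0.7 * v
F = lambda u: 0.1 * u
G = lambda v: math.cos(v)

def eta(u, v):
    """Candidate solution (8.8.39)."""
    return -0.5 * chi(v) + math.log(g(v) - f(u))

def xi(u, v):
    """Candidate solution (8.8.40)."""
    return -0.5 * math.log(g(v) - f(u)) + F(u) + G(v)

H = 1e-4  # finite-difference step

def d_u(fn, u, v):
    return (fn(u + H, v) - fn(u - H, v)) / (2 * H)

def d_v(fn, u, v):
    return (fn(u, v + H) - fn(u, v - H)) / (2 * H)

def d_uv(fn, u, v):
    return (fn(u + H, v + H) - fn(u + H, v - H)
            - fn(u - H, v + H) + fn(u - H, v - H)) / (4 * H * H)

def check(u, v):
    """Residuals of the two mixed-derivative equations that precede (8.8.39) and (8.8.40)."""
    chi_p = (chi(v + H) - chi(v - H)) / (2 * H)
    f_p = (f(u + H) - f(u - H)) / (2 * H)
    g_p = (g(v + H) - g(v - H)) / (2 * H)
    res_eta = d_uv(eta, u, v) + d_u(eta, u, v) * (d_v(eta, u, v) + 0.5 * chi_p)
    res_xi = d_uv(xi, u, v) + 0.5 * f_p * g_p / (g(v) - f(u)) ** 2
    return res_eta, res_xi

print(check(0.5, 1.0))
```

Both residuals should vanish up to finite-difference error, independently of the particular sample functions.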

Define new coordinates $\tilde{u}$ and $\tilde{v}$ as follows: $d\tilde{u} = e^{F(u)} du$, $d\tilde{v} = e^{G(v)} dv$; then

\[ ds^2 = -(g - f)^{-1/2}\, d\tilde{u}\, d\tilde{v} + (g - f)\left(e^{-\chi/2} dz^2 + e^{\chi/2} d\varphi^2\right) . \tag{8.8.42} \]

If we take $F(u) = G(v) = 0$, then it follows from (8.8.41) that

\[ ds^2 = -(g - f)^{-1/2}\, du\, dv + (g - f)\left(e^{-\chi/2} dz^2 + e^{\chi/2} d\varphi^2\right) . \tag{8.8.42$'$} \]

Equations (8.8.42) and (8.8.42$'$) represent the same line element (the only difference is that the coordinate names $u$ and $v$ are changed to $\tilde{u}$ and $\tilde{v}$, which is not essential), and thus we do not lose any solution by taking $F(u) = G(v) = 0$. Henceforth we will make this choice, i.e., take (8.8.42$'$) as the line element.

Now, 3 unsolved equations remain, namely (8.8.29), (8.8.34) and (8.8.37), and the undetermined functions are $g(v)$, $f(u)$, $\chi(v)$ and $a(u)$. Equation (8.8.37) is equivalent to

\[ \frac{\partial^2\eta}{\partial u^2} - \frac{\partial\xi}{\partial u}\frac{\partial\eta}{\partial u} + \frac{1}{2}\left(\frac{\partial\eta}{\partial u}\right)^2 + 2|a|^2 e^{-(\eta+\chi/2)} = 0 . \tag{8.8.43} \]

By means of (8.8.39) and (8.8.40) (where $F = G = 0$) one can rewrite the equation above as

\[ f'' = 2|a(u)|^2 . \tag{8.8.44} \]

Plugging (8.8.39) into (8.8.29) yields

\[ \bar{a}^{-1}\bar{a}' f' = f'' - 2|a|^2 = 0 , \tag{8.8.45} \]

where (8.8.44) is used in the second equality. The equation above indicates that either $\bar{a}' = 0$ or $f' = 0$; however, from (8.8.44) we know that the latter leads to $a = 0$, which is not allowed, and hence we have only $\bar{a}' = 0$, i.e., $a$ = constant. Thus, integrating (8.8.44) yields

\[ f' = 2A^2 u + c_1 , \quad f = A^2 u^2 + c_1 u + c_2 , \tag{8.8.46} \]

where $A \equiv |a|$, and $c_1$, $c_2$ are real constants of integration.


Now let us consider the last unsolved equation, namely (8.8.34). It follows from (8.8.28), (8.8.39), (8.8.40) and (8.8.9) that

\[ \Phi_{00} = (32|a|^2)^{-1}(g - f)^{-1/2}\chi'^2 f'^2 . \tag{8.8.47} \]

Plugging this into (8.8.34), by a brief calculation we can see that (8.8.34) is equivalent to

\[ 8g''(v)\,\chi'^{-2}(v) + g(v) = f(u) - (4|a|^2)^{-1} f'^2(u) . \tag{8.8.48} \]

In the above equation, the left-hand side is not a function of $u$ and the right-hand side is not a function of $v$, and thus both sides are equal to a constant, denoted by $K$, i.e.,

\[ 8g''(v)\,\chi'^{-2}(v) + g(v) = K , \tag{8.8.49} \]
\[ f(u) - (4A^2)^{-1} f'^2(u) = K . \tag{8.8.50} \]

Plugging (8.8.46) into (8.8.50) yields

\[ K = c_2 - (4A^2)^{-1} c_1^2 . \tag{8.8.51} \]
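That the quadratic $f$ of (8.8.46) makes the left-hand side of (8.8.50) independent of $u$ can be confirmed with exact rational arithmetic. The helper name `K_from_f` and the sample constants below are hypothetical; this is only an illustrative check.

```python
from fractions import Fraction

def K_from_f(A2, c1, c2, u):
    """Evaluate f - f'^2 / (4 A^2) at u for f = A^2 u^2 + c1 u + c2, with A2 = A^2."""
    f = A2 * u * u + c1 * u + c2
    f_prime = 2 * A2 * u + c1
    return f - f_prime * f_prime / (4 * A2)

# Hypothetical sample constants (A2 must be nonzero).
A2, c1, c2 = Fraction(3, 2), Fraction(1, 3), Fraction(5, 7)
values = {K_from_f(A2, c1, c2, Fraction(n, 4)) for n in range(-8, 9)}
print(values)  # a single element, equal to c2 - c1^2 / (4 A^2)
```

The set contains one value only, which illustrates (8.8.51).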



Therefore, the line element (8.8.42$'$) can be expressed as

\[ ds^2 = -[g(v) - A^2u^2 - c_1u - c_2]^{-1/2}\, du\, dv + [g(v) - A^2u^2 - c_1u - c_2]\left(e^{-\chi(v)/2} dz^2 + e^{\chi(v)/2} d\varphi^2\right) , \tag{8.8.52} \]

where $A$, $c_1$ and $c_2$ are arbitrary constants, and the functions $g(v)$ and $\chi(v)$ are quite arbitrary but are related by (8.8.48), in which the value $K$ on both sides depends on our choice of the constants $A$, $c_1$ and $c_2$.

Conclusion: After choosing the constants $A$, $c_1$ and $c_2$, any real function pair $(g(v), \chi(v))$ satisfying (8.8.49) determines a cylindrically symmetric metric by (8.8.52), whose corresponding source is a cylindrically symmetric nonnull electromagnetic field described by a complex-valued function pair $(\Phi_0, \Phi_2)$ satisfying (8.8.28) and (8.8.24). There are many real function pairs $(g(v), \chi(v))$ that satisfy (8.8.49); for instance, the following 3 function pairs all satisfy (8.8.49) with $c_1$ and $c_2$ chosen to be zero, i.e., $K = 0$:

(1) $g(v) = \sin v$, $\chi(v) = 2\sqrt{2}\, v$.
(2) $g(v) = \ln v$, $\chi(v) = 4\sqrt{2}\,(\ln v)^{1/2}$.
(3) $g(v) = v^{1/\alpha}$, $\chi(v) = (2/\alpha)\sqrt{2(\alpha - 1)}\,\ln v$, where $\alpha \in (1, \infty)$. This example forms a one-parameter subfamily (with the parameter $\alpha$) of the cylindrically symmetric solution family of the Einstein-Maxwell equations, in which the simplest one is the solution characterized by $\alpha = 2$, i.e., $g(v) = v^{1/2}$, $\chi(v) = \sqrt{2}\,\ln v$.
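The three example pairs can be checked against (8.8.49) with $K = 0$ numerically. The sketch below uses central differences; the step `H`, the value of `ALPHA` and the test point are arbitrary hypothetical choices, and the function name `lhs_8_8_49` is introduced here only for illustration.

```python
import math

H = 1e-4  # finite-difference step (an arbitrary small value)

def lhs_8_8_49(g, chi, v):
    """Left-hand side 8 g'' / chi'^2 + g of (8.8.49), derivatives by central differences."""
    g_pp = (g(v + H) - 2 * g(v) + g(v - H)) / (H * H)
    chi_p = (chi(v + H) - chi(v - H)) / (2 * H)
    return 8 * g_pp / chi_p ** 2 + g(v)

ALPHA = 3.0  # any value in (1, infinity) for example (3)

examples = {
    1: (math.sin, lambda v: 2 * math.sqrt(2) * v),
    2: (math.log, lambda v: 4 * math.sqrt(2) * math.sqrt(math.log(v))),
    3: (lambda v: v ** (1 / ALPHA),
        lambda v: (2 / ALPHA) * math.sqrt(2 * (ALPHA - 1)) * math.log(v)),
}

for label, (g, chi) in examples.items():
    # v = 2.5 keeps ln v > 0, which example (2) needs for a real chi.
    print(label, lhs_8_8_49(g, chi, 2.5))
```

Each printed value should be zero up to finite-difference error, consistent with $K = 0$.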
The electromagnetic field $F_{ab}$ described by a complex-valued function pair $(\Phi_0, \Phi_2)$ satisfying (8.8.28) and (8.8.24) can also be expressed in terms of its nonvanishing components in the coordinate basis $\{(\partial/\partial t)^a, (\partial/\partial\rho)^a, (\partial/\partial z)^a, (\partial/\partial\varphi)^a\}$:

\[ F_{tz} = -F_{zt} = -a_1 e^{-\chi/4}\left(1 - \frac{1}{4}u\chi'\right) , \tag{8.8.53a} \]
\[ F_{\rho z} = -F_{z\rho} = a_1 e^{-\chi/4}\left(1 + \frac{1}{4}u\chi'\right) , \tag{8.8.53b} \]
\[ F_{t\varphi} = -F_{\varphi t} = -a_2 e^{\chi/4}\left(1 + \frac{1}{4}u\chi'\right) , \tag{8.8.53c} \]
\[ F_{\rho\varphi} = -F_{\varphi\rho} = a_2 e^{\chi/4}\left(1 - \frac{1}{4}u\chi'\right) , \tag{8.8.53d} \]

where $F_{tz} \equiv F_{ab}(\partial/\partial t)^a(\partial/\partial z)^b$, and the others are defined similarly; $a_1, a_2 \in \mathbb{R}$ are the real and imaginary parts of $a$, respectively. It is not difficult to verify that the $F_{ab}$ constituted by (8.8.53) satisfies the source-free Maxwell equations $\nabla^a F_{ab} = 0$ and $\nabla_{[a}F_{bc]} = 0$, and that the energy-momentum tensor $T_{ab}$ constructed from $F_{ab}$ according to (8.4.1) satisfies Einstein's equation $T_{ab} = R_{ab}/8\pi$, where $R_{ab}$ is the Ricci tensor of the metric (8.8.52).

8.9 The Vaidya Metric and the Kinnersley Metric

8.9.1 From the Schwarzschild Metric to the Vaidya Metric

The line element of the vacuum Schwarzschild solution in the Schwarzschild coordinate system $\{t, r, \theta, \varphi\}$ is given by

\[ ds^2_{\rm Sch} = -\left(1 - \frac{2M}{r}\right)dt^2 + \left(1 - \frac{2M}{r}\right)^{-1}dr^2 + r^2(d\theta^2 + \sin^2\theta\, d\varphi^2) \quad (r > 2M) . \]

Starting from the Schwarzschild coordinate system, we apply the coordinate transformation $\{t, r, \theta, \varphi\} \to \{u, r, \theta, \varphi\}$, where

\[ u \equiv t - r_* , \quad r_* \equiv r + 2M\ln\left(\frac{r}{2M} - 1\right) \quad (r_* \text{ is called the tortoise coordinate}), \tag{8.9.1} \]

then the Schwarzschild line element turns into the following form:

\[ ds^2_{\rm Sch} = -(1 - 2Mr^{-1})\,du^2 - 2\,du\,dr + r^2(d\theta^2 + \sin^2\theta\, d\varphi^2) = [-du^2 - 2\,du\,dr + r^2(d\theta^2 + \sin^2\theta\, d\varphi^2)] + 2Mr^{-1}du^2 . \tag{8.9.2} \]

The square bracket on the right-hand side of the above equation can also be written as $-dt^2 + dr^2 + r^2(d\theta^2 + \sin^2\theta\, d\varphi^2)$ (here $t$ stands for the flat time coordinate $u + r$, not the Schwarzschild $t$), which is nothing but the flat line element, and hence

\[ ds^2_{\rm Sch} = ds^2_{\rm flat} + 2Mr^{-1}du^2 . \tag{8.9.3} \]
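The passage from the Schwarzschild form to (8.9.2) only uses $dt = du + dr_*$ with $dr_*/dr = (1 - 2M/r)^{-1}$. A minimal check with exact rationals is sketched below; the function names and sample numbers are hypothetical, and the angular part is omitted because the transformation leaves it unchanged.

```python
from fractions import Fraction

def schwarzschild_interval(M, r, dt, dr):
    """-(1 - 2M/r) dt^2 + dr^2 / (1 - 2M/r), the (t, r) part of the Schwarzschild line element."""
    h = 1 - 2 * M / r
    return -h * dt * dt + dr * dr / h

def eddington_interval(M, r, du, dr):
    """-(1 - 2M/r) du^2 - 2 du dr, the (u, r) part of (8.9.2)."""
    h = 1 - 2 * M / r
    return -h * du * du - 2 * du * dr

def dt_from(M, r, du, dr):
    """dt = du + dr_*, using dr_*/dr = 1 + 2M/(r - 2M) = 1/(1 - 2M/r) from (8.9.1)."""
    return du + dr / (1 - 2 * M / r)

# Hypothetical sample displacement at r = 5M.
M, r, du, dr = Fraction(1), Fraction(5), Fraction(1, 3), Fraction(2, 7)
dt = dt_from(M, r, du, dr)
print(schwarzschild_interval(M, r, dt, dr) == eddington_interval(M, r, du, dr))
```

The two intervals agree exactly for any displacement $(du, dr)$ with $r > 2M$.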

Once we change the constant $M$ in the above equation to a function $m(u)$ of the coordinate $u$, we obtain the following new line element [called the Vaidya line element]:

\[ ds^2_{\rm Vai} = ds^2_{\rm flat} + 2m(u)r^{-1}du^2 = -[1 - 2m(u)r^{-1}]\,du^2 - 2\,du\,dr + r^2(d\theta^2 + \sin^2\theta\, d\varphi^2) . \tag{8.9.4} \]

Let $g_{ab}$ represent the Vaidya metric; then from the equation above one can read off all of its nonvanishing components in the system $\{u, r, \theta, \varphi\}$:

\[ g_{uu} = -[1 - 2m(u)r^{-1}] , \quad g_{ur} = g_{ru} = -1 , \quad g_{\theta\theta} = r^2 , \quad g_{\varphi\varphi} = r^2\sin^2\theta . \tag{8.9.5} \]
Hence, from $g_{ab} = g_{\mu\nu}(dx^\mu)_a(dx^\nu)_b$ we get the abstract index expression for the Vaidya metric:

\[ g_{ab} = -[1 - 2m(u)r^{-1}](du)_a(du)_b - (du)_a(dr)_b - (dr)_a(du)_b + r^2(d\theta)_a(d\theta)_b + r^2\sin^2\theta\,(d\varphi)_a(d\varphi)_b . \tag{8.9.6} \]

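It is not difficult to verify that the inverse of $g_{ab}$ is

\[ g^{ab} = -\left(\frac{\partial}{\partial u}\right)^a\left(\frac{\partial}{\partial r}\right)^b - \left(\frac{\partial}{\partial r}\right)^a\left(\frac{\partial}{\partial u}\right)^b + \left[1 - \frac{2m(u)}{r}\right]\left(\frac{\partial}{\partial r}\right)^a\left(\frac{\partial}{\partial r}\right)^b + \frac{1}{r^2}\left(\frac{\partial}{\partial\theta}\right)^a\left(\frac{\partial}{\partial\theta}\right)^b + \frac{1}{r^2\sin^2\theta}\left(\frac{\partial}{\partial\varphi}\right)^a\left(\frac{\partial}{\partial\varphi}\right)^b . \tag{8.9.7} \]

Now that we have the metric we can compute its Einstein tensor $G_{ab} \equiv R_{ab} - Rg_{ab}/2$, and from Einstein's equation $G_{ab} = 8\pi T_{ab}$ we can find its energy-momentum tensor $T_{ab}$ in order to figure out what source is associated with this metric. The nonvanishing components of the Vaidya metric are already given in (8.9.5), and the corresponding inverse matrix has the nonvanishing components

\[ g^{rr} = 1 - 2m(u)r^{-1} , \quad g^{ur} = g^{ru} = -1 , \quad g^{\theta\theta} = r^{-2} , \quad g^{\varphi\varphi} = (r\sin\theta)^{-2} . \tag{8.9.8} \]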
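That (8.9.8) is indeed the inverse of (8.9.5) can be confirmed by multiplying the two component matrices. The sketch below uses exact rationals; the function names and sample values are hypothetical, and `sin2` stands for $\sin^2\theta$.

```python
from fractions import Fraction

def vaidya_metric(m, r, sin2):
    """Components (8.9.5) in the coordinate order (u, r, theta, phi)."""
    h = 1 - 2 * m / r
    return [[-h, -1, 0, 0],
            [-1, 0, 0, 0],
            [0, 0, r * r, 0],
            [0, 0, 0, r * r * sin2]]

def vaidya_inverse(m, r, sin2):
    """Components (8.9.8) in the same coordinate order."""
    h = 1 - 2 * m / r
    return [[0, -1, 0, 0],
            [-1, h, 0, 0],
            [0, 0, 1 / (r * r), 0],
            [0, 0, 0, 1 / (r * r * sin2)]]

def matmul(a, b):
    """Plain 4x4 matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

# Hypothetical sample values: m(u) = 1/2, r = 3, sin^2(theta) = 1/4.
m, r, sin2 = Fraction(1, 2), Fraction(3), Fraction(1, 4)
identity = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
print(matmul(vaidya_metric(m, r, sin2), vaidya_inverse(m, r, sin2)) == identity)
```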
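Plugging them into (3.2.10$'$) yields the nonvanishing Christoffel symbols

\[ \Gamma^u{}_{uu} = -mr^{-2} , \quad \Gamma^u{}_{\theta\theta} = r , \quad \Gamma^u{}_{\varphi\varphi} = r\sin^2\theta , \]
\[ \Gamma^r{}_{uu} = -\dot{m}r^{-1} + mr^{-3}(r - 2m) , \quad \Gamma^r{}_{ur} = \Gamma^r{}_{ru} = mr^{-2} , \]
\[ \Gamma^r{}_{\theta\theta} = 2m - r , \quad \Gamma^r{}_{\varphi\varphi} = (2m - r)\sin^2\theta , \]
\[ \Gamma^\theta{}_{r\theta} = \Gamma^\theta{}_{\theta r} = r^{-1} , \quad \Gamma^\theta{}_{\varphi\varphi} = -\sin\theta\cos\theta , \]
\[ \Gamma^\varphi{}_{r\varphi} = \Gamma^\varphi{}_{\varphi r} = r^{-1} , \quad \Gamma^\varphi{}_{\theta\varphi} = \Gamma^\varphi{}_{\varphi\theta} = \cot\theta , \tag{8.9.9} \]

where $\dot{m} \equiv dm(u)/du$. Plugging these into (3.4.21), we see that the Ricci tensor $R_{ab}$ has only one nonvanishing component, i.e.,

\[ R_{uu} = -2\dot{m}r^{-2} , \tag{8.9.10} \]

and hence

\[ R_{ab} = -2\dot{m}r^{-2}(du)_a(du)_b . \tag{8.9.11} \]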
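The Christoffel symbols (8.9.9) that feed into this Ricci tensor can be spot-checked numerically from the standard formula $\Gamma^\sigma{}_{\mu\nu} = \frac{1}{2}g^{\sigma\lambda}(\partial_\mu g_{\lambda\nu} + \partial_\nu g_{\lambda\mu} - \partial_\lambda g_{\mu\nu})$. The sketch below is an illustration only: the mass function `m`, the step size and the sample point are hypothetical, and the partial derivatives are approximated by central differences.

```python
import math

def m(u):
    """Hypothetical smooth mass function, used only for this check."""
    return 1.0 + 0.1 * math.sin(u)

def m_dot(u):
    return 0.1 * math.cos(u)

def metric(x):
    """Vaidya components (8.9.5) at x = (u, r, theta, phi)."""
    u, r, th, ph = x
    g = [[0.0] * 4 for _ in range(4)]
    g[0][0] = -(1 - 2 * m(u) / r)
    g[0][1] = g[1][0] = -1.0
    g[2][2] = r * r
    g[3][3] = (r * math.sin(th)) ** 2
    return g

def inverse(x):
    """Inverse components (8.9.8)."""
    u, r, th, ph = x
    gi = [[0.0] * 4 for _ in range(4)]
    gi[0][1] = gi[1][0] = -1.0
    gi[1][1] = 1 - 2 * m(u) / r
    gi[2][2] = r ** -2
    gi[3][3] = (r * math.sin(th)) ** -2
    return gi

def d_metric(x, k, eps=1e-5):
    """Central-difference approximation of the partial derivative of g_ij along coordinate k."""
    xp, xm = list(x), list(x)
    xp[k] += eps
    xm[k] -= eps
    gp, gm = metric(xp), metric(xm)
    return [[(gp[i][j] - gm[i][j]) / (2 * eps) for j in range(4)] for i in range(4)]

def christoffel(x):
    """Return G with G[s][a][b] approximating Gamma^s_{ab}."""
    gi = inverse(x)
    dg = [d_metric(x, k) for k in range(4)]  # dg[k][i][j] = d_k g_ij
    return [[[0.5 * sum(gi[s][l] * (dg[a][l][b] + dg[b][l][a] - dg[l][a][b])
                        for l in range(4))
              for b in range(4)] for a in range(4)] for s in range(4)]

x0 = (0.3, 5.0, 1.1, 0.4)  # hypothetical sample point (u, r, theta, phi)
G = christoffel(x0)
print(G[1][0][0], -m_dot(0.3) / 5.0 + m(0.3) * (5.0 - 2 * m(0.3)) / 5.0 ** 3)
```

The two printed numbers compare the numerical $\Gamma^r{}_{uu}$ with the closed form in (8.9.9); the other entries can be compared in the same way.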

From the above equation we also get $R = g^{uu}R_{uu} = 0$, and hence $G_{ab} = R_{ab}$. Therefore, it follows from Einstein's equation $G_{ab} = 8\pi T_{ab}$ that

\[ T_{ab} = -\frac{\dot{m}}{4\pi r^2}(du)_a(du)_b . \tag{8.9.12} \]

Let

\[ k_a \equiv -(du)_a , \quad k^a \equiv g^{ab}k_b = -g^{ab}(du)_b , \tag{8.9.13} \]

then from (8.9.7) we can easily see that

\[ k^a = (\partial/\partial r)^a . \tag{8.9.14} \]

Hence, $k^a k_a = 0$, and thus $k^a$ is a null vector field. Now (8.9.12) can also be expressed as

\[ T_{ab} = -\frac{\dot{m}}{4\pi r^2}k_a k_b . \tag{8.9.12$'$} \]
When $\dot{m} < 0$, the above equation can be viewed as a special case of the energy-momentum tensor in the following form:

\[ T_{ab} = \Phi^2 k_a k_b \quad (\Phi^2 \text{ is a positive definite function}). \tag{8.9.15} \]

What kind of field can have an energy-momentum tensor like that shown in the above equation? It can be proved (see Appendix D in Volume II) that the energy-momentum tensor of a source-free null electromagnetic field (satisfying $F_{ab}F^{ab} = 0$) can be expressed in the form (8.9.15), in which

\[ \Phi^2 \equiv \frac{E^2}{4\pi} \quad (E \text{ is the electric field measured by an orthonormal tetrad}). \tag{8.9.16} \]
A null electromagnetic field can be viewed as a "matter field" formed by many photons propagating along the null direction $k^a$. Moreover, a matter field formed by other particles with zero rest masses (such as massless scalar particles and neutrinos$^{10}$) moving along the $k^a$-direction also has an energy-momentum tensor of the form (8.9.15). This kind of matter field is called a pure radiation field. In summary, the matter fields whose energy-momentum tensors can be expressed in the form (8.9.15) can be classified into two kinds: ① source-free null electromagnetic fields; ② pure radiation fields. The difference between them is that for the former there exists a 2-form field $F_{ab}$ which satisfies the source-free Maxwell equations and $T_{ab} = F_{ac}F_b{}^c/4\pi$. It can be proved (see Optional Reading 8.9.1) that the matter field corresponding to (8.9.12$'$) does not obey the source-free Maxwell equations, and thus the source of the Vaidya metric is a pure radiation field instead of a null electromagnetic field.
When compared with the Schwarzschild metric, the Vaidya metric has mainly
the following three differences. ① The mass parameter M of the former one is a
constant while the m in the latter is a function of u. ② The former is a solution to
the vacuum Einstein equation G ab = 0 while the latter is a solution of the Einstein
equation with source G ab = 8π Tab , where Tab represents a pure radiation field. ③
By finding the general solution to the Killing equation one can show that, the former
has four independent Killing vector fields, in which one of them is timelike, and
hence is a stationary metric; the latter has only three independent Killing vector
fields (which are exactly those three reflecting spherical symmetry) with no timelike
Killing field, and hence the Vaidya solution is not a stationary metric. The above
three properties of the Vaidya metric are closely related. If we interpret m still as the

10 This is included because neutrinos are massless in the Standard Model of particle physics. However, it has now been experimentally confirmed that neutrinos have nonzero masses, and thus strictly speaking they should no longer be included.

mass of a spherically symmetric star, and interpret u as the proper time of the star
(Sect. 8.9.3 will justify this interpretation), then m being a function of u (property
①) indicates that the mass of the star changes with time with a rate ṁ. Why is it
so? Because it keeps emitting massless particles (property ②) (for convenience they
are also called “photons”, although they are not quanta of an electromagnetic field),
which carry away energy ceaselessly. A calculation (see Sect. 8.9.3) shows that the energy flowing to infinity per unit time is exactly equal to $-\dot{m}$, i.e., equal to the rate of decrease of the energy (mass) $m$ of the star (assuming $\dot{m} < 0$),$^{11}$ which agrees
with the law of the conservation of energy. It is exactly the feature that m is time
dependent which renders the Vaidya metric a non-stationary metric (property ③).
In consideration of the above-mentioned properties, P. C. Vaidya himself called this
kind of star a “shining star”, although the “shining” is not caused by photons but
other massless particles. It is natural to ask: does not a static star described by the
Schwarzschild metric shine? Of course a star shines, but the thing is, to simplify
the solving process, Schwarzschild ignored the energy-momentum tensor of the
photons emitted from the star (which also form a bath for the star) and treated its
exterior as a vacuum. This is how we can have the well-known, exceptionally easy,
while extensively used, vacuum Schwarzschild solution. Thus, the familiar physical
interpretation “the vacuum Schwarzschild solution describes the exterior metric field
of a static spherically symmetric star” is only an approximate statement.

[Optional Reading 8.9.1]


As another example of applying the NP formalism, here we compute the Riemann tensor of the Vaidya metric again using the null tetrad. The first step is to choose an appropriate null tetrad $\{(\varepsilon_\mu)^a\}$. The expression (8.9.6) for the Vaidya metric $g_{ab}$ can also be written as

\[ g_{ab} = -h(du)_a(du)_b - (du)_a(dr)_b - (dr)_a(du)_b + r^2(d\theta)_a(d\theta)_b + r^2\sin^2\theta\,(d\varphi)_a(d\varphi)_b , \tag{8.9.17} \]

where

\[ h \equiv 1 - \frac{2m(u)}{r} . \tag{8.9.18} \]

The following reformulation of the above expression is suggestive:

\[ g_{ab} = -(du)_a\left[\frac{1}{2}h(du)_b + (dr)_b\right] - \left[\frac{1}{2}h(du)_a + (dr)_a\right](du)_b + \{r[(d\theta)_a - i\sin\theta\,(d\varphi)_a]/\sqrt{2}\}\{r[(d\theta)_b + i\sin\theta\,(d\varphi)_b]/\sqrt{2}\} + \{r[(d\theta)_a + i\sin\theta\,(d\varphi)_a]/\sqrt{2}\}\{r[(d\theta)_b - i\sin\theta\,(d\varphi)_b]/\sqrt{2}\} . \tag{8.9.19} \]

Comparing with the general expression in the null tetrad $\{(\varepsilon_\mu)^a\}$

\[ g_{ab} = g_{\mu\nu}(\varepsilon^\mu)_a(\varepsilon^\nu)_b = -k_a l_b - l_a k_b + \bar{m}_a m_b + m_a\bar{m}_b , \tag{8.9.20} \]

we can "read off" $m_a$, $\bar{m}_a$, $l_a$ and $k_a$ as follows:

11 As a solution to Einstein’s equation, the derivative of the parameter m(u) can either be positive
or negative (also, of course, zero). However, in order to make this solution a metric corresponding
to a matter field which is physically acceptable, we need to require ṁ < 0.

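\[ k_a = -(du)_a , \quad l_a = -\frac{1}{2}h(du)_a - (dr)_a , \quad m_a = \frac{r}{\sqrt{2}}[(d\theta)_a - i\sin\theta\,(d\varphi)_a] , \quad \bar{m}_a = \frac{r}{\sqrt{2}}[(d\theta)_a + i\sin\theta\,(d\varphi)_a] , \tag{8.9.21} \]

and the corresponding $m^a$, $\bar{m}^a$, $l^a$ and $k^a$ are

\[ k^a = (\partial/\partial r)^a , \quad l^a = (\partial/\partial u)^a - \frac{1}{2}h(\partial/\partial r)^a , \]
\[ m^a = \frac{1}{\sqrt{2}\,r}[(\partial/\partial\theta)^a - i\sin^{-1}\theta\,(\partial/\partial\varphi)^a] , \quad \bar{m}^a = \frac{1}{\sqrt{2}\,r}[(\partial/\partial\theta)^a + i\sin^{-1}\theta\,(\partial/\partial\varphi)^a] . \tag{8.9.21$'$} \]

The readers should verify that this null tetrad indeed satisfies

\[ g_{ab}m^a m^b = g_{ab}\bar{m}^a\bar{m}^b = g_{ab}l^a l^b = g_{ab}k^a k^b = 0 , \quad g_{ab}m^a\bar{m}^b = 1 , \quad g_{ab}l^a k^b = -1 . \]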
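The suggested verification can be carried out numerically with complex arithmetic. The sketch below evaluates the bilinear form $g_{ab}X^aY^b$ (no complex conjugation) on the tetrad vectors in the coordinate order $(u, r, \theta, \varphi)$; the function names and sample values are hypothetical.

```python
import math

def tetrad(m, r, th):
    """Components of k^a, l^a, m^a, mbar^a in the coordinate order (u, r, theta, phi)."""
    h = 1 - 2 * m / r
    s = 1 / (math.sqrt(2) * r)
    k = [0, 1, 0, 0]
    l = [1, -h / 2, 0, 0]
    mv = [0, 0, s, -1j * s / math.sin(th)]
    mbar = [0, 0, s, 1j * s / math.sin(th)]
    return k, l, mv, mbar

def metric(m, r, th):
    """Vaidya components (8.9.5)."""
    h = 1 - 2 * m / r
    return [[-h, -1, 0, 0],
            [-1, 0, 0, 0],
            [0, 0, r * r, 0],
            [0, 0, 0, (r * math.sin(th)) ** 2]]

def dot(g, a, b):
    """Bilinear form g_ab a^a b^b."""
    return sum(g[i][j] * a[i] * b[j] for i in range(4) for j in range(4))

# Hypothetical sample point: m(u) = 0.7, r = 4, theta = 0.9.
g = metric(0.7, 4.0, 0.9)
k, l, mv, mbar = tetrad(0.7, 4.0, 0.9)
print(dot(g, mv, mv), dot(g, mv, mbar), dot(g, l, k), dot(g, l, l), dot(g, k, k))
```

Up to rounding, the printed values are 0, 1, −1, 0, 0, matching the stated relations.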

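After computing all of the $\omega_{\rho\mu\nu}$ using (5.7.19) [in which $(e_\mu)^a$ should be interpreted as $(\varepsilon_\mu)^a$] and (5.7.20) or any other method, one can find all of the 12 spin coefficients using (8.7.8) as follows:

\[ \kappa = \sigma = \nu = \tau = \lambda = \pi = \varepsilon = 0 , \tag{8.9.22a} \]
\[ \rho = -\frac{1}{r} , \quad \mu = -\frac{1}{2r}\left[1 - \frac{2m(u)}{r}\right] , \quad \gamma = \frac{m}{2r^2} , \quad \beta = -\alpha = \frac{1}{2\sqrt{2}\,r}\cot\theta . \tag{8.9.22b} \]

Using (8.9.22a) one can simplify the NP equations into the following form:

\[ D\rho = \rho^2 + \Phi_{00} , \tag{8.9.23a} \]
\[ 0 = \Psi_0 , \tag{8.9.23b} \]
\[ 0 = \Psi_1 + \Phi_{01} , \tag{8.9.23c} \]
\[ D\alpha = \alpha\rho + \Phi_{10} , \tag{8.9.23d} \]
\[ D\beta = \beta\bar{\rho} + \Psi_1 , \tag{8.9.23e} \]
\[ D\gamma = \Psi_2 + \Phi_{11} - R/24 , \tag{8.9.23f} \]
\[ 0 = \Phi_{20} , \tag{8.9.23g} \]
\[ D\mu = \bar{\rho}\mu + \Psi_2 + R/12 , \tag{8.9.23h} \]
\[ 0 = \Psi_3 + \Phi_{21} , \tag{8.9.23i} \]
\[ 0 = -\Psi_4 , \tag{8.9.23j} \]
\[ \delta\rho = \rho(\bar{\alpha} + \beta) - \Psi_1 + \Phi_{01} , \tag{8.9.23k} \]
\[ \delta\alpha - \bar{\delta}\beta = \mu\rho + (\alpha\bar{\alpha} + \beta\bar{\beta} - 2\alpha\beta) - \Psi_2 + \Phi_{11} + R/24 , \tag{8.9.23l} \]
\[ -\bar{\delta}\mu = -\Psi_3 + \Phi_{21} , \tag{8.9.23m} \]
\[ -\Delta\mu = \mu^2 + \mu(\gamma + \bar{\gamma}) + \Phi_{22} , \tag{8.9.23n} \]
\[ -\Delta\beta = \gamma(-\bar{\alpha} - \beta) - \beta(\gamma - \bar{\gamma} - \mu) + \Phi_{12} , \tag{8.9.23o} \]
\[ 0 = \Phi_{02} , \tag{8.9.23p} \]
\[ \Delta\rho = -\rho\bar{\mu} + \rho(\gamma + \bar{\gamma}) - \Psi_2 - R/12 , \tag{8.9.23q} \]
\[ \Delta\alpha = \alpha(\bar{\gamma} - \bar{\mu}) + \gamma\bar{\beta} - \Psi_3 . \tag{8.9.23r} \]

Plugging (8.9.22b) into (8.9.23), one can readily find the 5 complex quantities $\Psi_0 \sim \Psi_4$ representing the Weyl tensor and the 4 real quantities $\Phi_{00}$, $\Phi_{11}$, $\Phi_{22}$, $R$ representing the Ricci tensor as well as the 3 independent complex quantities $\Phi_{01}$, $\Phi_{02}$, $\Phi_{12}$. Among them only two are nonvanishing: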
\[ \Psi_2 = -m(u)/r^3 , \tag{8.9.24} \]
\[ \Phi_{22} = -\dot{m}(u)/r^2 . \tag{8.9.25} \]
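These two values can be traced from (8.9.23f) and (8.9.23n). The following is a sketch that assumes, in line with (8.9.21$'$), that on scalars $D = k^a\nabla_a = \partial/\partial r$ and $\Delta = l^a\nabla_a = \partial/\partial u - \frac{1}{2}h\,\partial/\partial r$, and that $\Phi_{11} = R = 0$ has already been obtained from the remaining equations:

\[ \Psi_2 = D\gamma = \frac{\partial}{\partial r}\left(\frac{m}{2r^2}\right) = -\frac{m}{r^3} , \]
\[ \Phi_{22} = -\Delta\mu - \mu(\mu + 2\gamma) = -\frac{\dot{m}}{r^2} + \frac{h}{2}\left(\frac{1}{2r^2} - \frac{2m}{r^3}\right) - \frac{h}{2}\left(\frac{1}{2r^2} - \frac{2m}{r^3}\right) = -\frac{\dot{m}}{r^2} , \]

where $\mu = -h/(2r) = -1/(2r) + m/r^2$, $\mu + 2\gamma = -1/(2r) + 2m/r^2$, and $\gamma$ is real so that $\gamma + \bar{\gamma} = 2\gamma$.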
Noticing (8.7.11a), especially $\Phi_{22} = R_{33}/2$ therein, we can see that the Ricci tensor of the Vaidya metric is

\[ R_{ab} = R_{33}(\varepsilon^3)_a(\varepsilon^3)_b = R_{33}(-k_a)(-k_b) = 2\Phi_{22}k_a k_b = -2\dot{m}(u)r^{-2}(du)_a(du)_b , \tag{8.9.26} \]

which agrees with the $R_{ab}$ [see (8.9.11)] derived using the coordinate basis method [Equation (3.4.21)].

Using the above result one can now also show that the matter field corresponding to the Vaidya metric is not an electromagnetic field. In the NP formalism, an electromagnetic field $F_{ab}$ is represented by the complex quantities $\Phi_0$, $\Phi_1$, $\Phi_2$, whose relations with $\Phi_{00}, \cdots, \Phi_{22}$ representing the Ricci tensor are given in (8.8.8). Since $\Phi_{22}$ is the only nonvanishing one among $\Phi_{00}, \cdots, \Phi_{22}$, (8.8.8) gives

\[ \Phi_0 = \Phi_1 = 0 , \quad \Phi_2 = Ae^{i\alpha} , \tag{8.9.27} \]

where $A \equiv \sqrt{-\dot{m}(u)/2}\; r^{-1}$, and $\alpha$ is a real function of the coordinates. Plugging (8.9.27) into the source-free Maxwell equations (8.8.3), one finds that (a), (c) are identities and (b), (d) lead to, respectively,

\[ \frac{\partial\alpha}{\partial r} = 0 , \quad -\frac{1}{\sin\theta}\frac{\partial\alpha}{\partial\varphi} - i\frac{\partial\alpha}{\partial\theta} = \cot\theta . \tag{8.9.28} \]

The first equation indicates that $\alpha = \alpha(u, \theta, \varphi)$, and the real and imaginary parts of the second equation give $\partial\alpha/\partial\theta = 0$ [and hence $\alpha = \alpha(u, \varphi)$] and $\partial\alpha/\partial\varphi = -\cos\theta$. These two equations contradict each other. Thus, the matter field of the Vaidya metric is not an electromagnetic field, and therefore can only be a pure radiation field.
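The contradiction between the two conditions on $\alpha$ can be made explicit by comparing mixed partial derivatives (a short check, assuming $\alpha$ is twice continuously differentiable):

\[ \frac{\partial}{\partial\theta}\left(\frac{\partial\alpha}{\partial\varphi}\right) = \frac{\partial}{\partial\theta}(-\cos\theta) = \sin\theta , \qquad \frac{\partial}{\partial\varphi}\left(\frac{\partial\alpha}{\partial\theta}\right) = 0 , \]

and these cannot agree wherever $\sin\theta \neq 0$.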
[The End of Optional Reading 8.9.1]

8.9.2 The Kinnersley Metric

The Vaidya metric is a generalization of the Schwarzschild metric, and a new metric defined by W. Kinnersley is a generalization of the Vaidya metric [Kinnersley (1969)]. We now introduce this metric. Suppose $L(u)$ is an arbitrary smooth timelike curve (imagine it as the world line of a rocket) in 4-dimensional Minkowski spacetime $(\mathbb{R}^4, \eta_{ab})$, where $u$ is the proper time. (Here we use $u$ instead of $\tau$; the purpose will be clear later.) Following Kinnersley, we will use $\lambda^a$ (instead of $U^a$ in the convention of this text) to represent the 4-velocity of $L(u)$, i.e., $\lambda^a \equiv (\partial/\partial u)^a$. Suppose $p$ is an arbitrary point in $\mathbb{R}^4$; then $L$ and the past light cone surface of $p$ have exactly one intersection,$^{12}$ denoted by $q$ (see Fig. 8.11). Let $\{X^\mu\}$ be an arbitrary inertial coordinate system, $\lambda^\mu$ be the components of $\lambda^a$ in this system, and $\psi^a$, $\xi^a$ be the position vectors of $p$, $q$ in this system, i.e., $\psi^a \equiv \psi^\mu(\partial/\partial X^\mu)^a|_p$, $\xi^a \equiv \xi^\mu(\partial/\partial X^\mu)^a|_q$, where $\psi^\mu \equiv X^\mu(p)$, $\xi^\mu \equiv X^\mu(q)$. Originally, $u$ and $\lambda^a$ are

12 There is an exception when L(u) is asymptotically null (e.g., the hyperbola in Exercise 6.13).

Kinnersley (1969) did not discuss this exception.



only a scalar field and a vector field defined on $L(u)$; however, their domains can be naturally extended to the whole $\mathbb{R}^4$: $\forall p \in \mathbb{R}^4$, we have a unique $q \in L$, and thus we can define $u(p) := u(q)$, $\lambda^\mu(p) := \lambda^\mu(q)$. [Define $\lambda^a|_p$ by defining its coordinate components $\lambda^\mu(p)$, i.e., $\lambda^a|_p := \lambda^\mu(q)(\partial/\partial X^\mu)^a|_p$.] Thus, the parametric equations for each integral curve $C(u)$ of $\lambda^a$ in the coordinate system $\{X^\mu\}$ are

\[ X^\mu(u) = \xi^\mu(u) + \sigma^\mu \quad (\text{constants } \sigma^\mu \text{ satisfy } \eta_{\mu\nu}\sigma^\mu\sigma^\nu = 0). \tag{8.9.29} \]

[This is because the tangent of the curve represented by the above parametric equations has components $dX^\mu(u)/du = d\xi^\mu(u)/du = \lambda^\mu$ in the system $\{X^\mu\}$. When $\sigma^\mu = 0$ the above equation degenerates to $X^\mu(u) = \xi^\mu$, namely the parametric equations of $L(u)$.] This indicates that the $\lambda^a$ of any point $p$ satisfies

\[ \lambda^a\partial_a u = (\partial/\partial u)^a\partial_a u = 1 . \tag{8.9.30} \]

Define a vector $\sigma^a|_p := \psi^a - \xi^a$ at $p$. From Fig. 8.11 we can see that $\sigma^a|_p$ is null, and hence $\sigma^a$ is a null vector field, which is the normal vector field of a family of null hypersurfaces (the family formed by the future light cone surfaces whose apices are points on $L$).

Since each point $p$ has a timelike vector $\lambda^a|_p$ and a null vector $\sigma^a|_p$, one can apply the "3 + 1 decomposition" to $\sigma^a|_p$ and take $\lambda^a|_p$ as the time direction; that is, one can decompose $\sigma^a$ into the sum of a component parallel to $\lambda^a$ (denoted by $r\lambda^a$) and a component perpendicular to $\lambda^a$ (denoted by $\hat{\sigma}^a$), i.e., (see Fig. 8.12)

\[ \sigma^a = r\lambda^a + \hat{\sigma}^a . \tag{8.9.31} \]

Contracting both sides of this equation with $\lambda_a \equiv \eta_{ab}\lambda^b$, and noticing that $\lambda^a\lambda_a = -1$ and that $\hat{\sigma}^a$ is orthogonal to $\lambda^a$, we obtain

\[ r = -\lambda_a\sigma^a . \tag{8.9.32} \]

Fig. 8.11 Each spacetime point $p$ determines a point $q$ on the timelike curve $L(u)$

Fig. 8.12 The "3 + 1 decomposition" of a null vector $\sigma^a$ (and $k^a$) at $p$

Also, let

\[ k^a \equiv r^{-1}\sigma^a , \quad n^a \equiv r^{-1}\hat{\sigma}^a , \tag{8.9.33} \]

then we have

\[ \text{(a) } k^a = \lambda^a + n^a , \quad \text{(b) } \lambda_a k^a = -1 , \quad \text{(c) } \eta_{ab}n^a n^b = 1 . \tag{8.9.34} \]

$k^a$ can be regarded as some kind of "normalization" of $\sigma^a$: the magnitudes of the time component $\lambda^a$ and the spatial component $n^a$ of $k^a$ are both 1.
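For completeness, the three relations in (8.9.34) follow in a few lines from (8.9.31), (8.9.33), $\lambda^a\lambda_a = -1$, $\lambda_a\hat{\sigma}^a = 0$ and the fact that $\sigma^a$ is null; the following is a short worked sketch, with all indices raised and lowered by $\eta_{ab}$:

\[ k^a = r^{-1}(r\lambda^a + \hat{\sigma}^a) = \lambda^a + n^a , \]
\[ \lambda_a k^a = \lambda_a\lambda^a + \lambda_a n^a = -1 + 0 = -1 , \]
\[ 0 = \eta_{ab}k^a k^b = \lambda_a\lambda^a + 2\lambda_a n^a + \eta_{ab}n^a n^b = -1 + \eta_{ab}n^a n^b \quad\Longrightarrow\quad \eta_{ab}n^a n^b = 1 . \]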
Since $\sigma^a$ (and thus $k^a$) is a normal vector field on each future light cone surface with each point on $L$ as the apex, $k_a \equiv \eta_{ab}k^b$ is the normal covector of each of these hypersurfaces. On the other hand, these hypersurfaces being constant-$u$ surfaces indicates that $\partial_a u$ is their normal covector, and thus $k_a$ and $\partial_a u$ differ at most by a multiplicative factor, i.e., $k_a = \alpha\,\partial_a u$. Combining this with (8.9.34)(b) and (8.9.30) yields $\alpha = -1$, and hence

\[ k_a = -(du)_a . \tag{8.9.35} \]

Based on the preceding discussion, Kinnersley defined a metric on $\mathbb{R}^4$ which was later named after him. This metric can be expressed in terms of abstract indices as

\[ g_{ab} := \eta_{ab} + 2m(u)r^{-1}k_a k_b = \eta_{ab} + 2m(u)r^{-1}(du)_a(du)_b , \tag{8.9.36} \]

where $m(u)$ is a function of $u$. Now that there are two metrics ($\eta_{ab}$ and $g_{ab}$) on $\mathbb{R}^4$ [with $L(u)$ removed], we need to pay additional attention to raising and lowering indices (and other constructions involving a metric). For those quantities defined as vectors (each carrying an upper index) in the first place, such as $\lambda^a$, $\sigma^a$ and $k^a$, there is no ambiguity. We stipulate that for all the tensors obtained by raising and lowering indices (e.g., $\lambda_a$, $\sigma_a$, $k_a$), the indices are raised and lowered by $\eta_{ab}$. For those tensors

Fig. 8.13 Choose the geodesic $L(u)$ as the world line of the spatial origin of the inertial frame $\{T, X, Y, Z\}$

whose indices are raised and lowered by $g_{ab}$ we will write out $g_{ab}$ explicitly; for instance, $g_{ab}\lambda^b$ is not equal to $\lambda_a$ ($=\eta_{ab}\lambda^b$).$^{13}$

First we discuss the simple case where $L(u)$ is a geodesic of $\eta_{ab}$ (we will call it an $\eta$-geodesic for brevity). In this case, the Kinnersley metric (8.9.36) reduces to the Vaidya metric (when $\dot{m} \neq 0$) or the Schwarzschild metric (when $\dot{m} = 0$). In order to see this, one only needs to write out the line element of $g_{ab}$ in an appropriate coordinate system $\{u, r, \theta, \varphi\}$ and compare it with (8.9.4). Take $u$ and $r$, which are already defined for each point, as the first two coordinates of the system $\{u, r, \theta, \varphi\}$, and leave $\theta$ and $\varphi$ to be defined below. Suppose $\{T, X, Y, Z\}$ is the inertial coordinate system of $\eta_{ab}$ whose origin of the spatial coordinates ($X = Y = Z = 0$), as a world line, coincides with the geodesic $L(u)$; then the components of $\lambda^a$ in these coordinates are $\lambda^\mu = (1, 0, 0, 0)$ (see Fig. 8.13), and hence the $r$ in (8.9.32) satisfies

\[ r = -\eta_{\mu\nu}\sigma^\mu\lambda^\nu = -\eta_{00}\sigma^0\lambda^0 = \sigma^0 . \]

On the other hand, the 3-dimensional space $\Sigma_p$ in Fig. 8.13 can be viewed as the whole space at the time of $p$. From the figure we can see that $\sigma^0$ = the length of the line segment $qa$ = the length of the line segment $ap$, and thus $r = \sigma^0$ indicates that the value of $r$ at $p$ is the spatial distance between $p$ and the geodesic $L(u)$. Set up a spherical coordinate system $\{r, \theta, \varphi\}$ on $\Sigma_p$ with $a$ as the origin and $r$ as the radial coordinate, in which $\theta$ and $\varphi$ are defined as follows:

\[ X = r\sin\theta\cos\varphi , \quad Y = r\sin\theta\sin\varphi , \quad Z = r\cos\theta . \]

Combining this $\{r, \theta, \varphi\}$ with $u$ yields the 4-dimensional coordinate system we want. The $u$ and $r$ of this system and the $T$ of $\{T, X, Y, Z\}$ are related by $T = u + r$. Hence, the line element of $\eta_{ab}$ in this system is $-du^2 - 2\,du\,dr + r^2(d\theta^2 + \sin^2\theta\, d\varphi^2)$, and therefore the line element of the Kinnersley metric $g_{ab}$ is

\[ ds^2 = -[1 - 2m(u)r^{-1}]\,du^2 - 2\,du\,dr + r^2(d\theta^2 + \sin^2\theta\, d\varphi^2) , \tag{8.9.37} \]

13 However, $g_{ab}k^b$ is equal to $k_a$ ($=\eta_{ab}k^b$). This is because it follows from (8.9.36) that $g_{ab}k^b = \eta_{ab}k^b + 2mr^{-1}(du)_a(du)_b k^b$, and $k^b(du)_b = k^b\partial_b u = 0$ (according to the definition of $u$, it is constant on an integral curve of $k^a$).

which has the same form as (8.9.4). Thus, the Kinnersley metric (8.9.36) reduces to the Vaidya metric (when $\dot{m} \neq 0$) or the Schwarzschild metric (when $\dot{m} = 0$) in the case where $L(u)$ is an $\eta$-geodesic. The actual generalization by Kinnersley is the case where $L(u)$ is not an $\eta$-geodesic, which will be discussed in detail below.

8.9.3 The Kinnersley Metric (Detailed Discussions)

When $L(u)$ is not an $\eta$-geodesic, the 4-acceleration $\lambda^b\partial_b\lambda^a$ of $L(u)$ is nonvanishing ($\partial_b$ is the derivative operator associated with $\eta_{ab}$), and is orthogonal to the 4-velocity $\lambda^a$. Following Kinnersley, we use $\dot{\lambda}^a$ to represent the 4-acceleration $\lambda^b\partial_b\lambda^a$; then $\eta_{ab}\lambda^a\dot{\lambda}^b = 0$. We still want to choose an appropriate coordinate system $\{u, r, \theta, \varphi\}$ to represent the Kinnersley metric. The definitions of $u$ and $r$ are the same as in the case where $L(u)$ is a geodesic; only the geometric interpretation of $r$ is slightly different now. Suppose $p$ is an arbitrary spacetime point; then it determines a unique point $q$ on $L(u)$. Let $G_q$ be the $\eta$-geodesic passing through $q$ (tangent to $L$ at $q$), and denote the inertial reference frame determined by $G_q$ by $R_q$. Let $\Sigma_p$ and $\Sigma_q$ represent two surfaces of simultaneity in $R_q$, which respectively include $p$ and $q$. Following the discussion in Sect. 8.9.2 we can see that the $r$ determined by (8.9.32) represents the spatial distance between $p$ and $G_q$ (see Fig. 8.14). Let $G_p$ represent the inertial observer that passes through $p$ in $R_q$; then $r$ can also be viewed as the spatial distance between $G_p$ and $L(u)$ at the moment $\Sigma_p$ (i.e., the distance between $p$ and $q$ in the figure). In electrodynamics, $\Sigma_q$ is referred to as the "retarded time" corresponding to $\Sigma_p$ (note that $\Sigma_p$ is actually later than $\Sigma_q$, but $\Sigma_q$ is conventionally called the retarded time), and hence what $r$ stands for is the retarded distance between $G_p$ and $L(u)$.

Now we introduce the definition of the coordinates $\theta$ and $\varphi$ in the system $\{u, r, \theta, \varphi\}$. Any point $q$ on $L$ determines an instantaneous rest inertial observer $G_q$ and an instantaneous rest inertial reference frame $R_q$. Define an instantaneous rest inertial coordinate system $\{X^\mu\} \equiv \{T, X, Y, Z\}$ in $R_q$ based on the following requirements: ① take $G_q$ as the world line of the origin of the spatial coordinates ($X = Y = Z = 0$), ② take $q$ as the point at $T = 0$ on this world line, ③ the $Z$-axis and $\dot{\lambda}^a|_q$ are in the same direction (the direction of the $X$-axis still has arbitrariness).

Fig. 8.14 The value of $r$ at $p$ gives the retarded distance between the observer $G_p$ and the rocket $L(u)$
388 8 Solving Einstein’s Equation

Take the Z -axis of this system as the polar axis, define the coordinates θ and ϕ on
the future light cone surface of q (including q) as follows:

X = r sin θ cos ϕ , Y = r sin θ sin ϕ , Z = r cos θ . (8.9.38)

Since the direction of the 4-acceleration λ̇a changes continuously as q moves along
L, when defining θ and ϕ we need to keep rotating the direction pointing at the north
pole in order to guarantee that it stays aligned with λ̇a .
By a calculation based on the preceding discussion (see Optional Reading 8.9.2
for details) one can find all the nonvanishing components of the Kinnersley metric
gab in the system {u, r, θ, ϕ}:

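g_{uu} = -1 - 2a(u)\,r\cos\theta + r^2 (f^2 + g^2\sin^2\theta) + 2m(u)\,r^{-1} , \quad g_{ur} = g_{ru} = -1 ,
g_{u\theta} = g_{\theta u} = -r^2 f , \quad g_{u\varphi} = g_{\varphi u} = -r^2 g\sin^2\theta , \quad g_{\theta\theta} = r^2 , \quad g_{\varphi\varphi} = r^2\sin^2\theta ,   (8.9.39a)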

with

f ≡ a(u) sin θ + b(u) sin ϕ − c(u) cos ϕ , g ≡ [b(u) cos ϕ + c(u) sin ϕ] cot θ ,
(8.9.39b)

where

a(u) \equiv |\dot\lambda^a(u)| \equiv [\eta_{ab}\,\dot\lambda^a(u)\,\dot\lambda^b(u)]^{1/2}   (8.9.39c)

is the magnitude of the 4-acceleration of L(u),14 and b and c describe the time rate of
change (u as the time) of the direction of λ̇a , see Optional Reading 8.9.2 for details.
If a segment of L(u) is a timelike hyperbola (see Exercise 6.13), then a = constant
and b = c = 0 in this segment.
One can further calculate the Ricci tensor Rab and scalar curvature R of the
Kinnersley metric:

R_{ab} = -2r^{-2} (\dot m + 3ma\cos\theta)\,k_a k_b , \quad R = 0 ,   (8.9.40)

and hence the corresponding Tab is

T_{ab} = -\frac{1}{4\pi r^2} (\dot m + 3ma\cos\theta)\,k_a k_b .   (8.9.41)
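
The step from (8.9.40) to (8.9.41) can be spelled out as a short worked equation. This is only a sketch, assuming Einstein's equation in the form G_ab = 8π T_ab (the form used in Sect. 8.10 of this text); because R = 0, the Einstein tensor reduces to the Ricci tensor:

```latex
G_{ab} = R_{ab} - \tfrac{1}{2} R\, g_{ab} = R_{ab} \quad (\text{since } R = 0),
\qquad
T_{ab} = \frac{1}{8\pi} G_{ab}
       = \frac{1}{8\pi}\left[-\frac{2}{r^{2}}(\dot m + 3ma\cos\theta)\,k_a k_b\right]
       = -\frac{1}{4\pi r^{2}}(\dot m + 3ma\cos\theta)\,k_a k_b .
```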

Similar to (8.9.12 ), the matter field corresponding to the above expression is also a
pure radiation field rather than an electromagnetic field. Although this matter field
is formed by massless particles which are not photons, we will refer to them as
“photons” for the sake of convenience.

14 Note that the a(u) defined in this text has a sign difference compared with Kinnersley (1969) and
Bonnor (1994).

As we have mentioned, the Kinnersley metric comes down to the Vaidya metric
when L(u) is an η-geodesic and ṁ ≠ 0. By means of L(u) we can provide a more
intuitive interpretation for the physical meaning of the Vaidya metric. In this
interpretation one should note that there are two metric fields on R4 , namely ηab and the
Vaidya metric g^{(Vai)}_{ab} ; the geodesic, 4-acceleration, etc. we mentioned above are all
measured by ηab and its associated derivative operator ∂a .
One may imagine this: a star is undergoing geodesic motion (inertial motion) in
Minkowski space with L(u) as its world line. Since it keeps emitting particles, its
mass (energy) keeps decreasing (ṁ < 0). The 4-momentum of the star together with
the energy-momentum Tab of the surrounding radiation field produce a gravitational
field which makes the spacetime curved, and the spacetime is described by g^{(Vai)}_{ab} .
[However, one cannot ask a question like “is the world line a geodesic measured by
g^{(Vai)}_{ab} ” since g^{(Vai)}_{ab} is not defined on the curve (r = 0)]. Since the geodesic L(u) “holds
the scales even”, i.e., it is isotropic, g^{(Vai)}_{ab} has spherical symmetry, but ṁ ≠ 0 makes it
lose the stationarity. This intuitive physical interpretation can also be carried over to
the Kinnersley metric g^{(Kin)}_{ab} . Now L(u) is not an η-geodesic, and its radiation is not
isotropic anymore; hence, it is not appropriate to regard L(u) as the world line of a star.
Thus, we now change the star to a rocket, which keeps emitting “photons” outwards in
an anisotropic manner (to some extent similar to a real rocket emitting jets), and hence
is called a “photon rocket” in the literature. The recoil experienced by this rocket due
to the fact that it emits photons makes its energy and 3-momentum keep changing;
the former is manifested by ṁ < 0, and the latter renders the time rate of change of
the 3-momentum nonvanishing. Formulating in the 4-dimensional language, using
P a to represent the 4-momentum of the rocket, we have P a = mλa , and hence its
time rate of change is Ṗ a = ṁλa + m λ̇a , where Ṗ a ≡ λb ∂b P a . The first and second
terms represent the time rates of change of the energy and 3-momentum, respectively.
In the instantaneous rest inertial frame {X μ } at q, the component expression for this
equation reads
Ṗ μ = ṁλμ + m λ̇μ . (8.9.42)

Since at q we have

\lambda^\mu = (1, 0, 0, 0) , \quad \dot\lambda^\mu = (0, 0, 0, a) \quad (\dot\lambda^a \text{ is in the direction of the } Z\text{-axis}),   (8.9.43)

the rocket has the following increasing rates:

\text{increasing rate of the energy} = \dot P^0 = \dot m\,\lambda^0 = \dot m ,   (8.9.44a)
\text{increasing rate of the } i\text{-component of the momentum} = \dot P^i = m\,\dot\lambda^i = (0, 0, ma) .   (8.9.44b)

Now we will show that the energy and momentum increasing rates of the rocket
caused by this recoil are exactly the energy and momentum carried by the “photons”
emitted by the rocket to infinity per unit time times −1. To do so, we should calculate
the energy and momentum flowing out of the sphere S in Fig. 8.14. Suppose {X μ } is an
instantaneous rest inertial frame at q, then {(eμ )a } ≡ {(∂/∂ X μ )a } is an orthonormal
tetrad field on R4 . Sect. 6.4 points out that T 0 j (= −T0 j ) is the j-component of the
energy flux density, and hence T 0 j (e j )a is the energy flux density vector. Therefore,
 
\text{energy flowing out of } S \text{ per unit time} = \iint_S T^{0j} (e_j)^a n_a \, dS = \iint_S T^{0j} n_j \, dS ,   (8.9.45)
where n a ≡ ηab n b , while n b is the outgoing unit normal vector of the sphere S,
namely the n a in (8.9.33). Moreover, Sect. 6.4 also points out that T i j (ei )a (e j )b is
the 3-momentum flux density tensor, whose contraction with any spatial unit vector
gives the 3-momentum flux density vector. Therefore,
 
\text{3-momentum flowing out of } S \text{ per unit time} = \iint_S T^{ij} (e_i)^a (e_j)^b n_b \, dS = \iint_S T^{ij} (e_i)^a n_j \, dS ,   (8.9.46)

i\text{-component of the 3-momentum flowing out of } S \text{ per unit time} = \iint_S T^{ij} n_j \, dS .   (8.9.47)

Summarizing (8.9.45) and (8.9.47) we can write

\mu\text{-component of the 4-momentum flowing out of } S \text{ per unit time} = \iint_S T^{\mu\nu} n_\nu \, dS .   (8.9.48)

It follows from the definition of the instantaneous rest inertial frame {X μ } at q and
the coordinates θ and ϕ that at any point on S we have (see Figs. 8.14 and 8.12)

λμ = (1, 0, 0, 0) and n μ = (0, sin θ cos ϕ, sin θ sin ϕ, cos θ ) . (8.9.49)

Plugging in k a = λa + n a yields

k μ = (1, sin θ cos ϕ, sin θ sin ϕ, cos θ ) , (8.9.50)

and thus k μ n μ = 1. Combining this with (8.9.41) yields

T^{\mu\nu} n_\nu = -\frac{1}{4\pi r^2} (\dot m + 3ma\cos\theta)\, k^\mu k^\nu n_\nu = -\frac{1}{4\pi r^2} (\dot m + 3ma\cos\theta)\, k^\mu ,   (8.9.51)
and hence

\text{energy flowing out of } S \text{ per unit time} = -\frac{1}{4\pi r^2} \iint_S (\dot m + 3ma\cos\theta)\, k^0 \, dS
    = -\frac{1}{4\pi r^2} \int_0^{2\pi} d\varphi \int_0^{\pi} (\dot m + 3ma\cos\theta)\, r^2 \sin\theta \, d\theta = -\dot m ,   (8.9.52a)

\text{the 3rd component of the 3-momentum flowing out of } S \text{ per unit time}
    = -\frac{1}{4\pi r^2} \iint_S (\dot m + 3ma\cos\theta)\, k^3 \, dS
    = -\frac{1}{4\pi r^2} \int_0^{2\pi} d\varphi \int_0^{\pi} (\dot m + 3ma\cos\theta) \cos\theta \, r^2 \sin\theta \, d\theta = -ma .   (8.9.52b)

Similarly we obtain that

\text{the 1st and 2nd components of the 3-momentum flowing out of } S \text{ per unit time} = 0 .   (8.9.52c)
Equations (8.9.52) also hold when r → ∞. Comparing them with (8.9.44) proves
the conclusion we claimed above, i.e., the increasing rates of the rocket’s energy and
momentum are exactly the energy and momentum carried by the “photons” it emits
to infinity per unit time times −1.
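
The surface integrals (8.9.52) can also be checked numerically. The following is a minimal sketch, not part of the original derivation: it integrates T^{μν} n_ν from (8.9.51), with k^μ from (8.9.50), over the sphere using a simple midpoint rule. The function name `outflow_through_sphere` and the sample values of m, ṁ and a are assumptions chosen only for illustration.

```python
import math


def outflow_through_sphere(m, m_dot, a, r, n_theta=400, n_phi=64):
    """Midpoint-rule estimate of the surface integral of T^{mu nu} n_nu over S.

    Returns [energy, P_x, P_y, P_z] outflow rates, using the integrand of
    (8.9.51) with k^mu given by (8.9.50).
    """
    d_theta = math.pi / n_theta
    d_phi = 2.0 * math.pi / n_phi
    total = [0.0, 0.0, 0.0, 0.0]
    for i in range(n_theta):
        theta = (i + 0.5) * d_theta
        sin_t, cos_t = math.sin(theta), math.cos(theta)
        # Scalar prefactor of (8.9.51).
        prefactor = -(m_dot + 3.0 * m * a * cos_t) / (4.0 * math.pi * r * r)
        # Area element dS = r^2 sin(theta) dtheta dphi.
        area = r * r * sin_t * d_theta * d_phi
        for j in range(n_phi):
            phi = (j + 0.5) * d_phi
            k = (1.0, sin_t * math.cos(phi), sin_t * math.sin(phi), cos_t)
            for mu in range(4):
                total[mu] += prefactor * k[mu] * area
    return total


if __name__ == "__main__":
    m, m_dot, a = 2.0, -0.3, 0.04  # arbitrary illustrative values
    for r in (1.0, 50.0):
        print(r, outflow_through_sphere(m, m_dot, a, r))
```

Within the accuracy of the quadrature, the result should be independent of r and reproduce (−ṁ, 0, 0, −ma), in line with (8.9.52) and the balance against (8.9.44).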
Based on the physical interpretation above, we may refer to the Kinnersley solu-
tion as the “solution of an arbitrary accelerating point mass”, or say that the Kin-
nersley metric represents the “gravitational field of an arbitrary accelerating point
mass”. However, one should note that: ① “accelerating point mass” means that
the 4-acceleration λ̇a ≡ λb ∂b λa of the rocket is nonvanishing (a ≠ 0), and this
4-acceleration is measured by ηab . Why is it not measured by gab ? The answer is: the
world line of the rocket has r = 0, while gab is not well-defined (is singular) on this
curve, and so it cannot be used to measure any quantity on the rocket’s world line.
② This “gravitational field of an accelerating point mass” is generated by the point
mass (rocket) together with the “photons” it emits, and the Tab corresponding to gab
is the energy-momentum tensor of the pure radiation field outside the rocket.
The preceding discussion about the Kinnersley metric also has a few subtleties,
as we will list below:
(1) When computing the energy and momentum flowing out of the sphere S, we
have tacitly used ηab for everything that involves a metric; however, the
metric of Kinnersley spacetime is supposed to be the Kinnersley metric, and so the
legitimacy of the above calculation should be called into question. Regarding this,
Bonnor (1994) provides an answer as follows (gist, not exact words): the difference
between gab and ηab is only in the term with m r^{-1} . Adding this term will affect the
normalization of the n ν in (8.9.51), but its contribution to the integral will approach
zero when S approaches infinity. Thus, it turns out that ignoring the term with m r^{-1}
will not affect the result.
(2) For any matter field known by physicists, the energy density measured by
any observer at any time is non-negative (called the weak energy condition, see
Appendix D in Volume II for details). Suppose ( p, Z a ) is an arbitrary instantaneous
observer, then it follows from (8.9.41) that

T_{00} = T_{ab} Z^a Z^b = -\frac{1}{4\pi r^2} (k_a Z^a)^2 (\dot m + 3ma\cos\theta) .

When a = 0 (Vaidya), we only have to let ṁ < 0 to guarantee T00 > 0. However,
the case where a ≠ 0 is not that simple since cos θ can be both positive and negative.
Nevertheless, as long as we assume m > 0, it is not difficult to see that T00 ≥ 0 is
equivalent to −ṁ/3m ≥ a cos θ . Therefore, in order to make T00 non-negative for
any value of θ , besides ṁ < 0, we should also require that a ≤ −ṁ/3m. One can
consider this as some sort of constraint coming from the energy condition on the
relation between the two parameters m and a of the Kinnersley metric.
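
The constraint in item (2) can be illustrated with a small numerical scan. This is a sketch under stated assumptions: the positive factor (k_a Z^a)²/(4πr²) is dropped because it does not affect the sign of T00, and the function names `weak_energy_ok` and `max_acceleration` are hypothetical helpers introduced here, not notation from the text.

```python
import math


def max_acceleration(m, m_dot):
    """Upper bound on a required by T_00 >= 0 for all theta (assumes m > 0)."""
    return -m_dot / (3.0 * m)


def weak_energy_ok(m, m_dot, a, n_samples=2001):
    """Check -(m_dot + 3 m a cos(theta)) >= 0 on a grid of theta in [0, pi].

    The positive prefactor (k_a Z^a)^2 / (4 pi r^2) is omitted because it
    cannot change the sign of T_00.
    """
    for i in range(n_samples):
        theta = math.pi * i / (n_samples - 1)
        if -(m_dot + 3.0 * m * a * math.cos(theta)) < 0.0:
            return False
    return True


if __name__ == "__main__":
    m, m_dot = 1.0, -0.3  # arbitrary illustrative values
    bound = max_acceleration(m, m_dot)
    for a in (0.0, 0.9 * bound, 1.1 * bound):
        print(a, weak_energy_ok(m, m_dot, a))
```

The worst case is θ = 0, where cos θ = 1, which is why the bound involves only the magnitude a.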
(3) Bonnor (1994) points out that, since the rocket undergoes an accelerating
motion, it should emit gravitational waves which carry energy and momentum out to
infinity. However, we have proved that, on the premise that there are no gravitational waves,
the energy and momentum carried only by the “photons” to infinity already
satisfy the balance requirement, i.e., they are exactly the energy and momentum
increasing rates times −1. This implies that the energy and momentum carried by
gravitational waves to infinity vanish. Hence, there is a paradox: does the Kinnersley
spacetime have any gravitational radiation at all? Regarding this problem,
Damour (1995) and Dain et al. (1996) studied the gravitational radiation of the Kin-
nersley metric using very different approaches, and the basic conclusion is: both the
point-like accelerating rocket and the “photons” it is surrounded by emit gravita-
tional radiation; the energy and momentum carried by them cancel each other, and
so overall there are no gravitational waves in Kinnersley spacetime (the energy and
momentum carried by the gravitational waves to infinity vanish).
[Optional Reading 8.9.2]
Now we provide the detailed derivation of (8.9.39). It follows from (8.9.36) that among
all the components of gab and ηab in the system {u, r, θ, ϕ} the only different one is the
uu-component. Specifically speaking, if we use g_{uu}, g_{ur}, ···, g_{ϕϕ} and {}^0g_{uu}, {}^0g_{ur}, ···, {}^0g_{ϕϕ} to
represent the components of gab and ηab in the system {u, r, θ, ϕ}, then

g_{uu} = {}^0g_{uu} + 2mr^{-1} , \quad g_{ur} = {}^0g_{ur} , \quad g_{u\theta} = {}^0g_{u\theta} , \quad g_{u\varphi} = {}^0g_{u\varphi} ,
g_{rr} = {}^0g_{rr} , \quad g_{r\theta} = {}^0g_{r\theta} , \quad g_{r\varphi} = {}^0g_{r\varphi} , \quad g_{\theta\theta} = {}^0g_{\theta\theta} , \quad g_{\theta\varphi} = {}^0g_{\theta\varphi} , \quad g_{\varphi\varphi} = {}^0g_{\varphi\varphi} .   (8.9.53)

Therefore, we only have to compute {}^0g_{uu}, {}^0g_{ur}, ···, {}^0g_{ϕϕ}.

Let us compute {}^0g_{uu}|_p, {}^0g_{ur}|_p, ···, {}^0g_{ϕϕ}|_p for an arbitrary point p in R4 . A point p
determines a point q on L, and we have defined an instantaneous rest inertial coordinate system
{X μ } ≡ {T, X, Y, Z } by means of q. Denoting the ψ μ in σ μ = ψ μ − ξ μ by X μ yields
X μ = σ μ + ξ μ . Then using (8.9.33) we obtain the coordinate transformation between the
two systems:
X μ = σ μ + ξ μ = r k μ (u, θ, ϕ) + ξ μ (u) , (8.9.54)
where k μ represents the components of k a in the system {X μ }. [Any quantity with indices
μ, ν, · · · or 0, 1, · · · represents the components of a certain tensor in the system {X μ } (not
{u, r, θ, ϕ})]. Since {X μ } is an inertial coordinate system, the components of ηab in {X μ }
are certainly ημν , and using the coordinate transformation (8.9.54) one can write down the
expressions for the components of ηab in the system {u, r, θ, ϕ}. First,

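{}^0g_{uu} = \eta_{\mu\nu} \frac{\partial X^\mu}{\partial u} \frac{\partial X^\nu}{\partial u} = \eta_{\mu\nu} (r\dot k^\mu + \dot\xi^\mu)(r\dot k^\nu + \dot\xi^\nu)
          = r^2 \eta_{\mu\nu} \dot k^\mu \dot k^\nu + 2r\, \eta_{\mu\nu} \dot k^\mu \dot\xi^\nu + \eta_{\mu\nu} \dot\xi^\mu \dot\xi^\nu ,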

where the dotted quantities stand for the (partial) derivatives with respect to u, e.g., ξ̇ 0 ≡
dξ 0 /du, k̇ 1 ≡ ∂k 1 /∂u. Since the parametric equations of the curve L(u) are X μ (u) = ξ μ (u),
ξ̇ μ ≡ dξ μ /du is equal to the components λμ of the tangent vector λa of L(u) in the system
{X μ }. From ημν λμ λν = −1 we obtain
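{}^0g_{uu} = -1 + 2r\, \eta_{\mu\nu} \dot k^\mu \lambda^\nu + r^2 \eta_{\mu\nu} \dot k^\mu \dot k^\nu .   (8.9.55a)
Second,
{}^0g_{ur} = \eta_{\mu\nu} \frac{\partial X^\mu}{\partial u} \frac{\partial X^\nu}{\partial r} = \eta_{\mu\nu} (r\dot k^\mu + \dot\xi^\mu) k^\nu = r\, \eta_{\mu\nu} \dot k^\mu k^\nu + \eta_{\mu\nu} \lambda^\mu k^\nu = -1 ,   (8.9.55b)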

where ημν k̇ μ k ν = 0 can be derived from ημν k μ k ν = 0, while ημν λμ k ν = −1 comes from
λa k a = −1. In a similar manner one can find the expressions for the other components of
ηab in {u, r, θ, ϕ}:
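{}^0g_{u\theta} = r^2 \eta_{\mu\nu} \dot k^\mu k^\nu{}_{,\theta} + r\, \eta_{\mu\nu} \lambda^\mu k^\nu{}_{,\theta} ,   (8.9.55c)
{}^0g_{u\varphi} = r^2 \eta_{\mu\nu} \dot k^\mu k^\nu{}_{,\varphi} + r\, \eta_{\mu\nu} \lambda^\mu k^\nu{}_{,\varphi} ,   (8.9.55d)
{}^0g_{rr} = \eta_{\mu\nu} k^\mu k^\nu = 0 ,   (8.9.55e)
{}^0g_{r\theta} = r\, \eta_{\mu\nu} k^\mu k^\nu{}_{,\theta} = 0 ,   (8.9.55f)
{}^0g_{r\varphi} = r\, \eta_{\mu\nu} k^\mu k^\nu{}_{,\varphi} = 0 ,   (8.9.55g)
{}^0g_{\theta\theta} = r^2 \eta_{\mu\nu} k^\mu{}_{,\theta} k^\nu{}_{,\theta} ,   (8.9.55h)
{}^0g_{\theta\varphi} = r^2 \eta_{\mu\nu} k^\mu{}_{,\theta} k^\nu{}_{,\varphi} ,   (8.9.55i)
{}^0g_{\varphi\varphi} = r^2 \eta_{\mu\nu} k^\mu{}_{,\varphi} k^\nu{}_{,\varphi} ,   (8.9.55j)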
where the second equalities in (f) and (g) come from ημν k μ k ν = 0. In order to find the final
form of the above expressions, one must compute the partial derivatives of k μ with respect
to u, θ and ϕ, i.e., k̇ μ , k μ ,θ and k μ ,ϕ . To find k μ ,θ and k μ ,ϕ , one only needs to care about
the k μ on the future light cone surface with the fixed q being the apex. In this case (8.9.50)
holds, and we can again list the following expression (with a new equation number):

k μ = (1, sin θ cos ϕ, sin θ sin ϕ, cos θ) , (8.9.56a)

and hence

k μ ,θ = (0, cos θ cos ϕ, cos θ sin ϕ, − sin θ) , (8.9.56b)


k μ ,ϕ = (0, − sin θ sin ϕ, sin θ cos ϕ, 0) . (8.9.56c)
Thus,

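\eta_{\mu\nu} k^\mu{}_{,\theta} k^\nu{}_{,\theta} = 1 , \quad \eta_{\mu\nu} k^\mu{}_{,\theta} k^\nu{}_{,\varphi} = 0 , \quad \eta_{\mu\nu} k^\mu{}_{,\varphi} k^\nu{}_{,\varphi} = \sin^2\theta .   (8.9.56d)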


Moreover, λμ = (1, 0, 0, 0) also leads to

ημν λμ k ν ,θ = ημν λμ k ν ,ϕ = 0 . (8.9.56e)


Now we have the most complicated step remaining, namely computing k̇ μ .
Suppose p and p̃ are two neighboring spacetime points, whose values of r, θ, ϕ are the
same, while the values of u are respectively u and u + du. Let q and q̃ represent the points
corresponding to p and p̃ on L(u), then k a | p points from q to p and k a | p̃ points from q̃ to
p̃. Denote k a ≡ k a | p , χ a ≡ k a | p̃ , then

\dot k^\mu|_p = \lim_{du \to 0} \frac{\chi^\mu - k^\mu}{du} ,   (8.9.57)
where k μ and χ μ are respectively the components of k a and χ a in the instantaneous rest
inertial coordinate system {X μ } ≡ {T, X, Y, Z } at q. Now that k μ has already been expressed
as (8.9.56a), the main thing is how to derive χ μ . Let { X̃ μ } ≡ {T̃ , X̃ , Ỹ , Z̃ } represent the
instantaneous rest inertial coordinate system at q̃ [according to the definition in the paragraph
containing (8.9.38), one only needs to change q to q̃], then the components of χ a in the system
{ X̃ μ } are
χ̃ μ = (1, sin θ cos ϕ, sin θ sin ϕ, cos θ) . (8.9.58)
To derive χ μ from χ̃ μ , we should first clarify the relation between the systems { X̃ μ } and
{X μ }. According to our requirement, the Z -axis in {X μ } should be aligned with the direction
of λ̇a |q , and the Z̃ -axis in { X̃ μ } should be aligned with the direction of λ̇a |q̃ . Note that { X̃ μ }
and {X μ } are inertial coordinate systems in two different inertial reference frames Rq̃ and
Rq , since the T -coordinate line G q and the T̃ -coordinate line G q̃ [the η-geodesic tangent
to L(u)] are not parallel in general. However, since both of them are inertial coordinate
systems, one can always transform one into the other via an appropriate translation and Lorentz
transformation. This transformation can be realized by the following three steps: ① Transfer
the origin of {X μ } (namely the point with T = X = Y = Z = 0) from q to q̃ and obtain a
coordinate system {X′ μ }. ② Use a boost in the T Z -plane to transform {X′ μ } into another system
{ X̂ μ } (where the T̂ -axis is parallel to the T̃ -axis). This is an inertial coordinate system in the
inertial reference frame Rq̃ just like { X̃ μ }, only the Ẑ -axis is in general not parallel to λ̇a |q̃ ,
which is the key difference between { X̂ μ } and { X̃ μ }. ③ Apply a spatial rotation R to { X̂ μ }
and turn it into { X̃ μ }, in which the Z̃ -axis is aligned with λ̇a |q̃ . This R can be considered as
two rotations R1 and R2 acting successively (a composite map). R1 is a rotation around the
X̂ -axis that turns the Ẑ -axis to a new position (denoted by Ẑ˜ , see Fig. 8.15), which is the
intersection of the Ŷ Ẑ -plane and the cone with the Ŷ -axis as the axis and Z̃ as a generatrix;
R2 is a rotation around the Ŷ -axis that turns the Ẑ˜ -axis to the Z̃ -axis. Suppose the angles for
R1 and R2 are bdu and cdu.15 These three steps can be expressed as

\{T, X, Y, Z\} \xrightarrow{\text{translation}} \{T', X', Y', Z'\} \xrightarrow{\text{boost}} \{\hat T, \hat X, \hat Y, \hat Z\} \xrightarrow{\text{spatial rotation } R} \{\tilde T, \tilde X, \tilde Y, \tilde Z\} .   (8.9.59)
The necessity of the spatial rotation R comes from our requirement that the polar axis
(Z -axis) of θ and ϕ is always aligned with the direction of λ̇a . If λ̇a |q̃ and λ̇a |q are parallel
(which means the direction of λ̇a does not change in time du), then since the boost in the
T Z -plane preserves the directions of X , Y and Z , we can assure that the Ẑ -axis is aligned
with the direction of λ̇a |q̃ without another spatial rotation, and thus b = c = 0. Conversely,
as long as λ̇a |q̃ and λ̇a |q are not parallel, then the Ẑ -axis will not be aligned with the direction
of λ̇a |q̃ , and hence we must rotate it by bdu and cdu so that the Z̃ -axis is in the direction of
λ̇a |q̃ . Thus, b and c indeed reflect the rate of change of the direction of the 4-acceleration
λ̇a .
Having the ideas above, one can compute χ μ from the expression (8.9.58) of χ̃ μ , and then
derive k̇ μ by plugging the result into (8.9.57). Since the spatial coordinate systems { X̂ , Ŷ , Ẑ }
and { X̃ , Ỹ , Z̃ } are related by a spatial rotation R, the components of χ a in these two systems,
χ̂ i and χ̃ i , can be expressed in terms of column matrices satisfying the following equation:

15 After these two rotations, the X -axis may still not coincide with the X̃ -axis, but this is not
a problem since the choice of the X -axis of the instantaneous rest inertial frame at each point is
flexible. One should “foresee” this and choose the X̃ -axis based on the result of rotating the X -axis.

Fig. 8.15 After rotating around the X̂ -axis by bdu, and then rotating around the Ŷ -axis by cdu, the Ẑ -axis will turn into the Z̃ -axis

\begin{bmatrix} \hat\chi^1 \\ \hat\chi^2 \\ \hat\chi^3 \end{bmatrix} = R \begin{bmatrix} \tilde\chi^1 \\ \tilde\chi^2 \\ \tilde\chi^3 \end{bmatrix} ,   (8.9.60)

where R = R2 R1 is the 3 × 3 matrix described by the rotating angles bdu and cdu. From
Fig. 8.15 and Appendix G in Volume II we have
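R = R_2 R_1 = \begin{bmatrix} \cos(c\,du) & 0 & \sin(c\,du) \\ 0 & 1 & 0 \\ -\sin(c\,du) & 0 & \cos(c\,du) \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos(b\,du) & -\sin(b\,du) \\ 0 & \sin(b\,du) & \cos(b\,du) \end{bmatrix}
  = \begin{bmatrix} \cos(c\,du) & \sin(b\,du)\sin(c\,du) & \cos(b\,du)\sin(c\,du) \\ 0 & \cos(b\,du) & -\sin(b\,du) \\ -\sin(c\,du) & \sin(b\,du)\cos(c\,du) & \cos(b\,du)\cos(c\,du) \end{bmatrix} .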

Since du will approach zero at last, one can take cos(bdu) ≅ cos(cdu) ≅ 1, sin(bdu) ≅ bdu,
sin(cdu) ≅ cdu. Ignoring the 2nd-order small terms containing (du)^2 , we obtain

R = \begin{bmatrix} 1 & 0 & c\,du \\ 0 & 1 & -b\,du \\ -c\,du & b\,du & 1 \end{bmatrix} .   (8.9.61)

Plugging the above equation and χ̃ i given by (8.9.58) into (8.9.60) yields
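\begin{bmatrix} \hat\chi^1 \\ \hat\chi^2 \\ \hat\chi^3 \end{bmatrix} = \begin{bmatrix} 1 & 0 & c\,du \\ 0 & 1 & -b\,du \\ -c\,du & b\,du & 1 \end{bmatrix} \begin{bmatrix} \sin\theta\cos\varphi \\ \sin\theta\sin\varphi \\ \cos\theta \end{bmatrix}
  = \begin{bmatrix} \sin\theta\cos\varphi + c\,du\cos\theta \\ \sin\theta\sin\varphi - b\,du\cos\theta \\ -c\,du\sin\theta\cos\varphi + b\,du\sin\theta\sin\varphi + \cos\theta \end{bmatrix} .   (8.9.62a)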

Since the spatial rotation does not affect the 0-component of a 4-vector, we have

χ̂ 0 = χ̃ 0 = 1 [(8.9.58) is used in the second equality], (8.9.62b)


and combining this with (8.9.62a) we obtain all χ̂ μ . However, we want χ μ in (8.9.57). It
follows from (8.9.59) that { X̂ μ } and {X μ } are related by a translation and a boost; since
the translation does not affect the components of a 4-vector, we only have to consider the
effect of the boost. Since { X̂ μ } is moving along the Z -axis relative to {X μ } with a speed
v ≡ adu and du → 0 assures that γ ≡ (1 − v^2)^{−1/2} → 1, from the Lorentz transformation
with γ ≅ 1 we get

χ 0 = χ̂ 0 + (adu)χ̂ 3 , χ 1 = χ̂ 1 , χ 2 = χ̂ 2 , χ 3 = χ̂ 3 + (adu)χ̂ 0 . (8.9.63)



Plugging the χ̂ μ of (8.9.62) into the equation above gives

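\chi^0 = 1 + (a\,du)(-c\,du\sin\theta\cos\varphi + b\,du\sin\theta\sin\varphi + \cos\theta) \cong 1 + a\,du\cos\theta ,
\chi^1 = \sin\theta\cos\varphi + c\,du\cos\theta , \quad \chi^2 = \sin\theta\sin\varphi - b\,du\cos\theta ,
\chi^3 = (-c\,du\sin\theta\cos\varphi + b\,du\sin\theta\sin\varphi + \cos\theta) + a\,du .   (8.9.64)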
Plugging these equations and (8.9.56a) into (8.9.57) yields (written as a row matrix to save
space)

k̇ μ = (a cos θ, c cos θ, −b cos θ, −c sin θ cos ϕ + b sin θ sin ϕ + a) . (8.9.65)
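
The chain of steps (8.9.58) to (8.9.65) can be mimicked numerically as a consistency check. The sketch below is an illustration only: it applies the exact rotations R1, R2 and the exact boost (rather than their first-order forms) to χ̃^μ for a small but finite du, forms the difference quotient of (8.9.57), and compares it with (8.9.65). The function names and the step size `du = 1e-6` are assumptions made for this example.

```python
import math


def k_dot_formula(a, b, c, theta, phi):
    """Right-hand side of (8.9.65)."""
    st, ct = math.sin(theta), math.cos(theta)
    sp, cp = math.sin(phi), math.cos(phi)
    return (a * ct, c * ct, -b * ct, -c * st * cp + b * st * sp + a)


def k_dot_finite_difference(a, b, c, theta, phi, du=1e-6):
    """Difference quotient (chi - k) / du built from rotation plus boost."""
    st, ct = math.sin(theta), math.cos(theta)
    sp, cp = math.sin(phi), math.cos(phi)
    k = (1.0, st * cp, st * sp, ct)          # (8.9.56a); also chi-tilde, (8.9.58)
    x, y, z = k[1], k[2], k[3]
    # R1: rotation about the X-hat axis by the angle b*du.
    cb, sb = math.cos(b * du), math.sin(b * du)
    x1, y1, z1 = x, cb * y - sb * z, sb * y + cb * z
    # R2: rotation about the Y-hat axis by the angle c*du.
    cc, sc = math.cos(c * du), math.sin(c * du)
    x2, y2, z2 = cc * x1 + sc * z1, y1, -sc * x1 + cc * z1
    # Boost along Z with speed v = a*du, applied to chi-hat = (1, x2, y2, z2).
    v = a * du
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    chi = (gamma * (1.0 + v * z2), x2, y2, gamma * (z2 + v))
    return tuple((chi[i] - k[i]) / du for i in range(4))


if __name__ == "__main__":
    args = (0.3, 0.2, -0.1, 0.7, 1.1)  # a, b, c, theta, phi (arbitrary)
    print(k_dot_formula(*args))
    print(k_dot_finite_difference(*args))
```

The two tuples should agree up to terms of order du, which is the numerical counterpart of dropping the (du)^2 terms in the text.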


Finally, plugging (8.9.56b), (8.9.56c) and the above equation into (8.9.55) we find all the
components of ηab in the system {u, r, θ, ϕ}, and then plugging the results into (8.9.53)
yields all the components of the Kinnersley metric gab in {u, r, θ, ϕ}. The result will be
(8.9.39), which essentially agrees with (13) and (14) of Kinnersley (1969) up to some sign
differences. The sign differences come from two reasons: ① the signature in that paper is
different from ours; ② the a and c we defined correspond to −a and −c in that paper.
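
The final assembly described above can be checked numerically. The following sketch, assuming the signature (−, +, +, +) used in this text, evaluates the contractions in (8.9.55) with k^μ, its derivatives (8.9.56) and k̇^μ from (8.9.65), adds the 2m r^{-1} term of (8.9.53), and compares the result with the closed form (8.9.39). The dictionary keys and function names are hypothetical labels introduced for this example.

```python
import math

ETA_DIAG = (-1.0, 1.0, 1.0, 1.0)  # components of eta_{mu nu}, signature (-,+,+,+)


def eta(u, v):
    """Contraction eta_{mu nu} u^mu v^nu."""
    return sum(ETA_DIAG[i] * u[i] * v[i] for i in range(4))


def metric_from_derivation(m, a, b, c, r, theta, phi):
    """Components built from (8.9.53), (8.9.55), (8.9.56) and (8.9.65)."""
    st, ct = math.sin(theta), math.cos(theta)
    sp, cp = math.sin(phi), math.cos(phi)
    lam = (1.0, 0.0, 0.0, 0.0)
    k = (1.0, st * cp, st * sp, ct)
    k_th = (0.0, ct * cp, ct * sp, -st)
    k_ph = (0.0, -st * sp, st * cp, 0.0)
    k_dot = (a * ct, c * ct, -b * ct, -c * st * cp + b * st * sp + a)
    return {
        "uu": -1.0 + 2.0 * r * eta(k_dot, lam) + r * r * eta(k_dot, k_dot) + 2.0 * m / r,
        "ur": r * eta(k_dot, k) + eta(lam, k),
        "ut": r * r * eta(k_dot, k_th) + r * eta(lam, k_th),
        "up": r * r * eta(k_dot, k_ph) + r * eta(lam, k_ph),
        "rr": eta(k, k),
        "rt": r * eta(k, k_th),
        "rp": r * eta(k, k_ph),
        "tt": r * r * eta(k_th, k_th),
        "tp": r * r * eta(k_th, k_ph),
        "pp": r * r * eta(k_ph, k_ph),
    }


def metric_closed_form(m, a, b, c, r, theta, phi):
    """Components as stated in (8.9.39a) and (8.9.39b)."""
    st, ct = math.sin(theta), math.cos(theta)
    sp, cp = math.sin(phi), math.cos(phi)
    f = a * st + b * sp - c * cp
    g = (b * cp + c * sp) * ct / st
    return {
        "uu": -1.0 - 2.0 * a * r * ct + r * r * (f * f + g * g * st * st) + 2.0 * m / r,
        "ur": -1.0, "ut": -r * r * f, "up": -r * r * g * st * st,
        "rr": 0.0, "rt": 0.0, "rp": 0.0,
        "tt": r * r, "tp": 0.0, "pp": r * r * st * st,
    }


if __name__ == "__main__":
    args = (1.5, 0.3, 0.2, -0.1, 2.0, 0.7, 1.1)  # m, a, b, c, r, theta, phi
    derived, stated = metric_from_derivation(*args), metric_closed_form(*args)
    for key in derived:
        print(key, derived[key], stated[key])
```

Here `t` and `p` in the keys stand for θ and ϕ. Agreement of the two dictionaries at sample points does not replace the algebra, but it is a quick way to catch a sign slip in k̇^μ.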
[The End of Optional Reading 8.9.2]

8.10 Coordinate Conditions, the Gauge Freedom of General Relativity

8.10.1 Coordinate Conditions

The vacuum Einstein equation


G ab = 0 (8.10.1)

is a tensor equation. To solve it, one can choose a suitable coordinate system and
write it as a system of component equations

G μν (x) = 0 , μ, ν = 0, 1, 2, 3 , (8.10.2)

where the x in G μν (x) indicates that each G μν is a function of 4 coordinates. Since

G_{\mu\nu}(x) = R_{\mu\nu}(x) - \frac{1}{2} R(x)\, g_{\mu\nu}(x) ,
where Rμν (x) and R(x) can be expressed in terms of gμν (x) and its partial deriva-
tives, (8.10.2) can be viewed as a system of partial differential equations for the
unknown functions gμν (x). Also, since gμν = gνμ , gμν (x) only contains 10 inde-
pendent undetermined functions. On the other hand, due to the symmetry of μ and
ν, (8.10.2) also contains 10 algebraically independent partial differential equations.
Under suitable boundary conditions, it is reasonable that 10 independent equations
could determine 10 independent functions. However, things are not as simple as
that. The curvature tensor Rabc d satisfies the Bianchi identity ∇[a Rbc]d e = 0, from
which we have ∇a G a b = 0 [Equation (3.4.17)]. Written in terms of components, this
corresponds to 4 differential identities satisfied by the functions gμν (x):

G μ ν;μ = 0 , (8.10.3)

and thus there are only 10 − 4 = 6 independent equations. The 10 undetermined
functions gμν (x) only have to satisfy 6 independent differential equations; is that
not too much freedom? Indeed it is. The key point is that (8.10.2) is the
component equation system of the tensor equation G ab = 0, and the undetermined
functions gμν (x) are the components of the metric tensor gab ; if the functions gμν (x)
form a solution to the equation system (8.10.2), then the tensor gab formed by gμν (x)
together with the coordinate basis satisfies the tensor equation G ab = 0, and so a new
set of functions g′μν (x′) obtained from gμν (x) by the tensor component
transformation law is also a solution to (8.10.2). In general, gμν (x) (as a function of x)
and g′μν (x′) (as a function of x′) have different functional forms, and hence gμν (x) and
g′μν (x′) are two different sets of solutions to (8.10.2). Thus, boundary conditions can
only determine the solution of (8.10.2) “up to a coordinate transformation”; that is, they
determine a unique spacetime geometry, but they cannot determine which coordinate
system should be used. (This is quite reasonable: the choice of the coordinate system
is arbitrary, so it would be strange if the coordinate system could be determined).
For instance, the Schwarzschild solution (8.3.18) can be specified by the following 10
functions gμν (x):

g_{00}(r) = -(1 - 2M/r) , \quad g_{11}(r) = (1 - 2M/r)^{-1} , \quad g_{22}(r) = r^2 , \quad g_{33}(r) = r^2\sin^2\theta ,
g_{01} = g_{02} = g_{03} = g_{12} = g_{13} = g_{23} = 0 .   (8.10.4)

Define a new coordinate system {t′, r′, θ′, ϕ′} (isotropic coordinate system) as follows:

t' = t , \quad r = r'(1 + M/2r')^2 , \quad \theta' = \theta , \quad \varphi' = \varphi ,   (8.10.5)

then (8.3.18) becomes

ds^2 = -[(1 - M/2r')/(1 + M/2r')]^2 \, dt'^2 + (1 + M/2r')^4 [dr'^2 + r'^2 (d\theta'^2 + \sin^2\theta' \, d\varphi'^2)] ,   (8.10.6)

and the 10 functions g′μν (x′) representing it are different from those in (8.10.4). For
example, the dependence of g′00 (r′) = −[(1 − M/2r′)/(1 + M/2r′)]^2 on its argument
r′ is obviously different from the dependence of g00 (r ) on its argument r .
However, all R′μν derived from g′μν (x′) also vanish, and thus (8.10.6) and (8.3.18) are
both (spherically symmetric) solutions to the vacuum Einstein equations G μν = 0
that satisfy the same boundary conditions, and therefore they represent the same
geometry. This is an example of “boundary conditions cannot determine a unique
solution, but they can determine a unique geometry”.
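
The relation between (8.10.4) and (8.10.6) can be checked numerically with the tensor transformation law. The sketch below is an illustration under stated assumptions: it takes the transformation (8.10.5) in the form r = r′(1 + M/2r′)², estimates dr/dr′ by a central difference, and compares the transformed components with the isotropic ones. All function names are hypothetical, and the sample point is restricted to r′ > M/2.

```python
def areal_radius(r_iso, M):
    """Schwarzschild r as a function of the isotropic r', from (8.10.5)."""
    return r_iso * (1.0 + M / (2.0 * r_iso)) ** 2


def schwarzschild_g00(r, M):
    return -(1.0 - 2.0 * M / r)


def schwarzschild_g11(r, M):
    return 1.0 / (1.0 - 2.0 * M / r)


def isotropic_g00(r_iso, M):
    """Coefficient of dt'^2 in (8.10.6)."""
    x = M / (2.0 * r_iso)
    return -((1.0 - x) / (1.0 + x)) ** 2


def isotropic_spatial_factor(r_iso, M):
    """Common factor (1 + M/2r')^4 of the spatial part of (8.10.6)."""
    return (1.0 + M / (2.0 * r_iso)) ** 4


def transformed_g11(r_iso, M, h=1e-6):
    """g'_{r'r'} = g_{rr} (dr/dr')^2, with dr/dr' from a central difference."""
    dr = (areal_radius(r_iso + h, M) - areal_radius(r_iso - h, M)) / (2.0 * h)
    return schwarzschild_g11(areal_radius(r_iso, M), M) * dr * dr


if __name__ == "__main__":
    M, r_iso = 1.0, 3.0  # arbitrary sample point with r' > M/2
    r = areal_radius(r_iso, M)
    print(schwarzschild_g00(r, M), isotropic_g00(r_iso, M))
    print(transformed_g11(r_iso, M), isotropic_spatial_factor(r_iso, M))
    print(r * r, isotropic_spatial_factor(r_iso, M) * r_iso * r_iso)
```

Each printed pair should agree, which shows concretely that the two sets of functions differ in form while describing the same metric tensor.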
Since a coordinate transformation involves 4 arbitrary functions (new coordinates
expressed using old coordinates), we can say that general covariance provides 4
“degrees of freedom” for the equation system (8.10.2). To remove this uncertainty,
one needs to assign a specific coordinate system, i.e., assign 4 additional equations for
the functions gμν (x); these 4 equations together are called a coordinate condition.
The 4 equations below are an example of a coordinate condition:

g00 = −1 , g0i = 0 (i = 1, 2, 3) . (8.10.7)

The coordinates satisfying this condition are called Gaussian normal coordinates (see
Optional Reading 8.10.1 for details). Another example of a coordinate condition is
requiring the coordinates x σ to satisfy the following 4 equations:

g ab ∇a ∇b x σ = 0 (σ = 0, 1, 2, 3) . (8.10.8)

Calculation shows that [see Weinberg (1972) pp. 161–163] the above equations are
equivalent to the following 4 equations:

g^{\mu\nu} \Gamma^\lambda{}_{\mu\nu} = 0 \quad (\lambda = 0, 1, 2, 3) .   (8.10.8′)

Equation (8.10.8) or (8.10.8′) is called the harmonic coordinate condition, since
a function f satisfying g ab ∇a ∇b f = 0 is called a harmonic function.
Equation (8.10.8′) indicates more clearly that this coordinate condition indeed consists of additional
equations restricting the functions gμν (x).
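
The equivalence of (8.10.8) and (8.10.8′) can be sketched in a few lines. This is only an outline of the standard argument (the full calculation is in the Weinberg reference cited above): each coordinate x^σ is treated as a scalar function, and the component form of the second covariant derivative of a scalar is used.

```latex
\begin{aligned}
(\nabla_a \nabla_b f)_{\mu\nu} &= \partial_\mu \partial_\nu f - \Gamma^{\lambda}{}_{\mu\nu}\, \partial_\lambda f , \\
f = x^\sigma : \quad \partial_\lambda x^\sigma &= \delta^\sigma{}_\lambda , \qquad \partial_\mu \partial_\nu x^\sigma = 0 , \\
g^{ab} \nabla_a \nabla_b x^\sigma &= g^{\mu\nu} \left( 0 - \Gamma^{\lambda}{}_{\mu\nu}\, \delta^\sigma{}_\lambda \right) = - g^{\mu\nu}\, \Gamma^{\sigma}{}_{\mu\nu} .
\end{aligned}
```

Hence g^{ab}∇_a∇_b x^σ = 0 holds for all four coordinates exactly when g^{μν}Γ^σ_{μν} = 0 for σ = 0, 1, 2, 3.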
A coordinate condition is obviously not a generally covariant equation, since its
mission is to pick a special coordinate system to remove the uncertainty of gμν (x)
coming from the general covariance of Einstein’s equation. A coordinate condition
should also satisfy the following requirement: starting from an arbitrary set of
functions gμν (x), one can always find g′μν (x′) satisfying the coordinate condition by a
coordinate transformation.
[Optional Reading 8.10.1]
A Gaussian normal coordinate system is a coordinate system defined by means of
geodesics. Suppose Σ is an arbitrary spacelike hypersurface in a spacetime (M, gab ), n a
is a unit normal vector field on Σ, and {x i } is an arbitrary 3-dimensional coordinate system
on an open region U ⊂ Σ. Any point p in U together with its unit normal vector n a | p (normal
to Σ) determines a unique geodesic γ (t) [stipulate t ( p) = 0], which is orthogonal to Σ.
Although these geodesics emanating from U may intersect (see Fig. 8.16) or run into other
non-ideal situations, it can be proved that as long as we take an appropriate U , there must be
an open subset N ⊂ M containing U , in which for any point q there exists a unique p in U
such that q is on the geodesic γ (t) starting from p. Define x i |q ≡ x i | p and take the value
t (q) of the parameter of γ (t) at q as the zeroth coordinate, then the coordinate system {t, x i }
(with N as the coordinate patch) is called a Gaussian normal coordinate system. Now we
will show that Gaussian normal coordinates satisfy (8.10.7). Noticing that the geodesic γ (t)
is a t-coordinate line, whose tangent (∂/∂t)a is the zeroth coordinate basis vector, we have

Fig. 8.16 Constructing a Gaussian normal coordinate system from Σ, whose coordinate patch is N

g00 = gab (∂/∂t)a (∂/∂t)b .

Also (∂/∂t)a | p = n a | p , and hence g00 | p = (n a n a )| p = −1. Since the tangent vector (∂/∂t)a
is transported parallelly along γ (t), and parallel transport preserves the inner product, we
have

g00 |q = [(∂/∂t)a (∂/∂t)a ]q = [(∂/∂t)a (∂/∂t)a ] p = −1 , ∀q ∈ N . (8.10.9)


Let Σ_t be a constant-t hypersurface, then the x i -coordinate lines lie on Σ_t since t is
constant along them, and hence the three spatial coordinate basis vectors (∂/∂ x i )a are everywhere
tangent to Σ_t . Since g0i = gab (∂/∂t)a (∂/∂ x i )b , in order to prove g0i = 0 (i = 1, 2, 3) we
only have to show that γ (t) is orthogonal to any Σ_t . From the construction of γ (t) we can
see that this is surely correct for Σ_0 ≡ Σ, i.e., g0i | p = 0, and therefore we only have to show
that g0i = (∂/∂t)a (∂/∂ x i )a is a constant along γ (t). The following derivation indicates that
this is indeed true:

(\partial/\partial t)^b \nabla_b [(\partial/\partial t)^a (\partial/\partial x^i)_a] = (\partial/\partial t)^a (\partial/\partial t)^b \nabla_b (\partial/\partial x^i)_a = (\partial/\partial t)^a (\partial/\partial x^i)^b \nabla_b (\partial/\partial t)_a
    = \frac{1}{2} (\partial/\partial x^i)^b \nabla_b [(\partial/\partial t)^a (\partial/\partial t)_a] = \frac{1}{2} (\partial/\partial x^i)^b \nabla_b \, g_{00} = 0 ,

where the geodesic equation is used in the first equality, the commutativity of the coordinate
basis vectors is used in the second equality, the Leibniz rule is used in the third equality, and
(8.10.9) is used in the last step.
[The End of Optional Reading 8.10.1]

Now we discuss the Einstein equations with source G μν = 8π Tμν . Suppose the
matter field has N components, then usually it needs to satisfy N equations (such
as equations of motion). If the equations are independent (see Example 2 for the
non-independent case), then combining them with the Einstein equations we obtain
10 + N equations. It seems that they can determine 10 + N functions. However, the
10 gμν automatically satisfy G μ ν;μ = 0, and the equations of motion of the matter
field automatically lead to T μ ν;μ = 0, and hence G μ ν;μ − 8π T μ ν;μ automatically
vanishes. That is, we have the following differential identities

G μ ν;μ − 8π T μ ν;μ = 0 , ν = 0, 1, 2, 3 , (8.10.10)

which will “dispose of” 4 equations. Together with 4 coordinate conditions, these
equations determine 10 + N unknown functions exactly.

Example 1 Suppose the matter field is a perfect fluid, whose components contain the
proper density ρ, pressure p and the 4-velocity components U μ , and hence N = 6.
The equations they satisfy are: (a) the equation of state f (ρ, p) = 0, where f is a
certain function [see above (9.3.20)], (b) the divergence-free condition ∇ a Tab = 0
for the energy-momentum tensor,16 (c) the normalization conditions gμν U μ U ν = −1
for the 4-velocity. In total there are 1 + 4 + 1 = 6 equations, which agrees with the
generic discussion above.

Example 2 Suppose the matter field is a source-free electromagnetic field. The
4-potential only has to satisfy the equation of motion

∇ a ∇a Ab − ∇ a ∇b Aa = 0 , [a special case of (7.2.7)] (8.10.11)

and thus the numbers of the component equations and field components are both 4.
However, among the 4 equations above only 3 are independent, since Aa (or any
1-form) satisfies the following differential identity (which can be proved following
the proof of Exercise 7.1)

∇ b ∇ a (∇a Ab − ∇b Aa ) = 0 , (i.e., ∇ b ∇ a Fab = 0) (8.10.12)

which will “dispose of” one equation, and make (8.10.11) one equation short. This
is caused by the gauge freedom of Aa , and so after adding the Lorenz condition
∇ a Aa = 0 (choosing a gauge), we can apply the generic discussion above. Assigning
a gauge condition here is similar to assigning a coordinate condition for gμν . As a
matter of fact, the latter is also some kind of gauge choice, see the next subsection
for details.

Finally, we should point out that for partial differential equations, the claim “given
suitable boundary conditions, there is a unique solution as long as the number of
equations is equal to the number of the undetermined functions” is not as simple as
that for ordinary differential equations. There are many subtleties in this case. One
may view this subsection as a hand-waving discussion (for illustrating the necessity
of coordinate conditions), and should not regard it as a rigorous analysis.

16 From Sect. 6.5 we can see that the divergence-free condition ∂ a Tab = 0 for the energy-momentum
tensor of a perfect fluid in Minkowski spacetime contains the equations of motion of the fluid, namely
(6.5.7) and (6.5.8), which has in total 1 + 3 = 4 equations. For a curved spacetime, the condition
∇ a Tab = 0 also leads to 4 similar equations.

8.10.2 The Gauge Freedom of General Relativity

The above discussion can also be formulated using the geometric language, i.e.,
instead of talking about the component equations G μν = 8π Tμν , we can discuss
the tensor equation G ab = 8π Tab . Take the vacuum field equation G ab = 0 as an
example. One can prove the following claim (see later): suppose φ : M → M is a
diffeomorphism, Rab [g] is the Ricci tensor of the metric gab , then

φ∗ (Rab [g]) = Rab [φ∗ g] . (8.10.13)

From this we can easily get G ab [g] = 0 ⇔ G ab [φ∗ g] = 0. This indicates that gab is a
solution to G ab = 0 if and only if φ∗ gab is also a solution. Thus, the boundary condi-
tions can only determine a solution gab to Einstein’s equation up to a diffeomorphism.
This is actually the active formulation (see Optional Reading 4.1.1) equivalent to the
passive version above that “boundary conditions can only determine gμν up to a
coordinate transformation”. In the passive formulation, the components gμν and g′μν
of the same metric field gab in different coordinate systems represent the same (local)
geometry; in the active formulation, suppose φ : M → M̃ is a diffeomorphism, then
gab and g̃ab ≡ φ∗ gab represent the same geometry. In order to avoid confusion, first
we consider two manifolds M and M̃. If there exists a diffeomorphism φ : M → M̃,
then M and M̃ “cannot be more alike”. Then, we consider two spacetimes (or more
generally, two generalized Riemannian spaces) (M, gab ) and ( M̃, g̃ab ). If there exists
a diffeomorphism φ : M → M̃ and φ∗ gab = g̃ab , then these two spacetimes “cannot
be more alike”, i.e., they have the same spacetime geometry, and every phenomenon
that can be described by (M, gab ) can be described equivalently by ( M̃, g̃ab ). For
instance, suppose there are two vectors u a and v b at a point p in M, then there are
two corresponding vectors φ∗ u a and φ∗ v b at the point φ( p) in M̃. In addition, the
inner product of φ∗ u a and φ∗ v b , g̃ab |φ( p) (φ∗ u)a (φ∗ v)b , equals the inner product of
u a and v b , gab | p u a v b , because

gab | p u a v b = (φ ∗ g̃)ab | p u a v b = g̃ab |φ( p) (φ∗ u)a (φ∗ v)b .

One can also show that the tensor product of φ∗ u a and φ∗ v b corresponds to the tensor
product of u a and v b , i.e., (φ∗ u a )(φ∗ v b ) = φ∗ (u a v b ), etc. In short, we have at φ( p)
whatever we have at p, and we can do at φ( p) whatever we can do at p and get the
same result (matched by φ∗ ). If we consider the metric at p as a stage, and consider
manipulating the quantities at p as putting on a play, one can say colloquially that
φ∗ “carries the stage” of (M, gab ) to ( M̃, g̃ab ) so that we can “perform a play in
a different town” (i.e., manipulate the pushforward of the quantities at a different
point).
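The statement that the inner product is preserved can be checked numerically at a single point. The following is a minimal sketch, assuming a 2-dimensional example in which the Jacobian matrix J of φ at p and the components of g̃ab at φ( p) are simply given as hypothetical numbers; the helper names are illustrative and do not come from the text.

```python
# Pointwise check that g_ab u^a v^b at p equals
# g~_ab (phi_* u)^a (phi_* v)^b at phi(p), where g = phi^* g~.
# J and g_tilde below are hypothetical sample values.

def matvec(A, x):
    """Matrix-vector product for lists of lists."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def transpose(A):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Matrix-matrix product for lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def inner(g, u, v):
    """Inner product g_ab u^a v^b."""
    return sum(ui * wi for ui, wi in zip(u, matvec(g, v)))

J = [[1, 2], [0, 1]]           # Jacobian of phi at p (assumed)
g_tilde = [[-1, 0], [0, 1]]    # components of g~ at phi(p) (assumed)

# Components of the pullback metric at p: g = J^T g~ J
g = matmul(transpose(J), matmul(g_tilde, J))

u, v = [1, 3], [2, -1]
lhs = inner(g, u, v)                               # evaluated at p
rhs = inner(g_tilde, matvec(J, u), matvec(J, v))   # evaluated at phi(p)
print(lhs, rhs)
```

The two numbers agree because pushing the vectors forward with J while pulling the metric back with J is exactly the "carry the stage" operation described above.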
This discussion can also be applied to the case where M = M̃. Suppose on M
we have a metric field gab and a diffeomorphism φ : M → M, then based on the
discussion that (M, gab ) and ( M̃, g̃ab ) “cannot be more alike”, we can see that
(M, gab ) and (M, φ∗ gab ) are equivalent geometrically. However, one should notice

that now there are two metrics gab | p and φ∗ gab | p at a point p in M. Suppose u a
and v a are vectors at p; by saying that (M, gab ) and (M, φ∗ gab ) are equivalent we do not mean
that gab | p u a v b = (φ∗ g)ab | p u a v b (this only holds when φ is an isometry), instead
we mean that gab | p u a v b = (φ∗ g)ab |φ( p) (φ∗ u)a (φ∗ v)b , i.e., we can “carry the whole
stage and perform the same play at φ( p)”. Here we give an application example.
Let Rabc d and R̃abc d represent the Riemann tensor fields of gab and g̃ab ≡ φ∗ gab ,
respectively. Given Rabc d | p we would like to find R̃abc d |φ( p) . Knowing that we can
“perform the same play in a different town”, all we have to do is to push forward
Rabc d | p to φ( p) using φ∗ . More precisely speaking, when calculating Rabc d | p we
have done the following manipulation: first find the ∇a associated with gab , then find
Rabc d | p from (∇a ∇b − ∇b ∇a )ωc = Rabc d ωd . This manipulation is just like “per-
forming a play”. In order to find R̃abc d |φ( p) , in principle we need to perform a
similar manipulation: first find the ∇˜ a associated with g̃ab , then find R̃abc d |φ( p) from
(∇˜ a ∇˜ b − ∇˜ b ∇˜ a )ωc = R̃abc d ωd . Nevertheless, it is in fact not necessary to do it all
over again like this, because it is natural to believe that as long as we push forward
the result of the manipulation on gab (and quantities derivable from it) at p to φ( p)
using φ∗ , it must be equal to the result of the manipulation on g̃ab (and quantities
derivable from it) at φ( p). That is, we can believe that

φ∗ (Rabc d | p ) = R̃abc d |φ( p) . (8.10.14)

For all quantities determined by gab (all geometric quantities), such as Rab , R, G ab ,
etc., we have similar relations, and thus (8.10.13) holds. If desired, the reader can
also verify (8.10.14) by computing it directly; hint: first verify that the ∇˜ a associated
with g̃ab satisfies

∇˜ a (φ∗ T ) = φ∗ (∇a T ) (where T is a tensor field of any type). (8.10.15)

In a word, in the sense of “performing the same play in a different town” we
can say that two metric fields gab and φ∗ gab on M (when φ is a diffeomorphism)
describe the same geometry, or say that gab and φ∗ gab are equivalent. Thus, metric
fields do not have a one-to-one correspondence with spacetime geometries; instead,
one kind of spacetime geometry corresponds to an equivalence class {gab }. This is
similar to the fact that in the theory of electromagnetism a gauge transformation of the
4-potential Aa does not change the electromagnetic field Fab . Therefore, the property
that “changing gab to φ∗ gab does not change the geometry” is called the gauge free-
dom of general relativity. As a physical theory, general relativity is endowed with
gauge freedom, which is an important feature of this theory (just like the electro-
magnetic theory formulated by the 4-potential is endowed with gauge freedom). This
gauge freedom has great significance for further studying relativity. For instance, it
will play an important role in Chaps. 14 and 16. The concepts of “gauge transforma-
tion” and “gauge invariance” came originally from the theory of electromagnetism,
and have become extremely important in theoretical physics. Roughly speaking,
any transformation that does not change the essence of the physics can be called

a gauge transformation, and the corresponding invariance (freedom) is then called
the gauge invariance (freedom). For convenience of computation, when discussing
a specific problem one can choose a certain gauge, which is called “gauge fixing”.
As for general relativity, choosing a coordinate system is nothing but fixing a gauge.
Besides the transformation of the electromagnetic 4-potential, there are mainly two
other kinds of gauge transformation in this text so far: ① the gauge transformation
in the theory of linearized gravity [Equation (7.8.14)], ② the gauge transformation
in general relativity. (In the active language this is a diffeomorphism φ : M → M,
and in the passive language this is a coordinate transformation). Actually, ① is just
an infinitesimal version of ②, and the reason is as follows. The gauge transformation
in the theory of linearized gravity is given by

γab → γ̃ab = γab + ∂a ξb + ∂b ξa

(where ξ a is an “infinitesimal” vector field). The difference between the metrics before
and after the transformation, namely gab = ηab + γab and g̃ab = ηab + γ̃ab , is

g̃ab − gab = ∂a ξb + ∂b ξa . (8.10.16)

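Introduce a vector field λ a and a real number t to express ξ a as ξ a = tλ a (where t is
a small quantity of the same order as ξ a , i.e., a first-order small quantity), then it
follows from the formula of the Lie derivative that $\mathcal{L}_\lambda \eta_{ab} = \partial_a \lambda_b + \partial_b \lambda_a$. Since

$$ \mathcal{L}_\lambda \eta_{ab} = \mathcal{L}_\lambda (g_{ab} - \gamma_{ab}) \cong \mathcal{L}_\lambda g_{ab} $$

(the second term is ignored as it is a second-order small term), we have

$$ \partial_a \lambda_b + \partial_b \lambda_a \cong \mathcal{L}_\lambda g_{ab} \cong \frac{\phi_t^* g_{ab} - g_{ab}}{t} , \tag{8.10.17} $$

where $\phi_t$ is the one-parameter group of diffeomorphisms generated by λ a .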

Multiplying (8.10.17) by t and comparing it with (8.10.16) yields $\phi_t^* g_{ab} - g_{ab} \cong \tilde g_{ab} - g_{ab}$,
and hence $\tilde g_{ab} \cong \phi_t^* g_{ab}$. Thus, the original metric gab and the new metric g̃ab after the
transformation only differ by a diffeomorphism under the first-order approximation.
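For reference, the expression for the Lie derivative of ηab used in this argument can be spelled out with the standard component formula for the Lie derivative of a tensor of type (0, 2). The following is a sketch in a Lorentzian coordinate system of ηab (so that the components ημν are constants), with indices lowered by ημν as is usual in the linearized theory; this choice of coordinates is an assumption made here for the illustration.

```latex
\begin{aligned}
(\mathcal{L}_\lambda \eta)_{\mu\nu}
 &= \lambda^\sigma \partial_\sigma \eta_{\mu\nu}
  + \eta_{\sigma\nu}\, \partial_\mu \lambda^\sigma
  + \eta_{\mu\sigma}\, \partial_\nu \lambda^\sigma \\
 &= 0 + \partial_\mu (\eta_{\sigma\nu} \lambda^\sigma)
      + \partial_\nu (\eta_{\mu\sigma} \lambda^\sigma)
  = \partial_\mu \lambda_\nu + \partial_\nu \lambda_\mu .
\end{aligned}
```

Multiplying by t and using ξ a = tλ a reproduces the right-hand side of (8.10.16), which is why the linearized gauge transformation agrees with an infinitesimal diffeomorphism to first order.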
Of course, it is not just the spacetime geometry that we care about, but also
physics. Here is a general conclusion: suppose a physical theory is described by a
manifold M and some tensor fields T (i) living on it (for instance, for an electrovac
spacetime, T (i) includes at least gab and Fab ), then (M, T (i) ) and (M, T̃ (i) ) describe
the same physics if and only if there exists a diffeomorphism φ : M → M such that
T̃ (i) = φ∗ T (i) .

Exercises

˜8.1. Prove Proposition 8.1.1.
˜8.2. Suppose γ (r ) is a curve from p1 to p2 on Σt in Fig. 8.7 where θ and ϕ are
both constants (with the radial coordinate r as the parameter of the curve).
Show that γ (r ) is a (non-affinely parametrized) geodesic. Hint: use (5.7.2).
˜8.3. Suppose ξ a is a timelike Killing vector field in a stationary spacetime, and
χ ≡ (−gab ξ a ξ b )1/2 .
(a) Show that χ is a constant along an integral curve of ξ a ;
(b) Show that the 4-acceleration Aa = ∇ a (ln χ ). Hint: use the Killing equation
∇ (a ξ b) = 0 and the result of (a).
˜8.4. Show that: (a) the trace of the energy-momentum tensor of an electromagnetic
field is zero, i.e., T ≡ g ab Tab = 0; (b) the scalar curvature of an electrovac
spacetime is R = 0.
˜8.5. Prove (8.4.7) and (8.4.28).
8.6. Suppose Fab is a 2-form field in an arbitrary spacetime, ∗ Fab is the dual
2-form field of Fab , and α ∈ [0, 2π ] is a constant real number, then F′ab ≡
Fab cos α − ∗ Fab sin α is called a duality rotation of Fab with the angle α.
(a) Show that F′ab is a source-free electromagnetic field if and only if Fab is a
source-free electromagnetic field. [The proof is straightforward. One can see
this directly from the exterior differential expressions (7.2.4′) and (7.2.5′) of
Maxwell’s equations].
(b) Show that the electromagnetic fields Fab and F′ab have the same energy-
momentum tensor. Hint: the proof can be simplified by using the symmetric
expression (6.6.28′).
(c) Let M ≡ 2Fab F ab , N ≡ 2Fab ∗ F ab , M′ ≡ 2F′ab F′ ab , N′ ≡ 2F′ab ∗ F′ ab .
Show that

M′ = M cos 2α − N sin 2α , N′ = M sin 2α + N cos 2α .

(d) Let ℱab ≡ Fab + i∗ Fab , and ℱ′ab ≡ F′ab + i∗ F′ab , then K ≡ ℱab ℱ ab and
K′ ≡ ℱ′ab ℱ′ ab are complex scalar fields, and hence the K and K′ at each
spacetime point correspond to two vectors in the complex plane. Using the
result of (c) show that the vector K′ is the result of rotating the vector K
counterclockwise by an angle 2α (i.e., |K′| = |K |, and the arguments of K′
and K differ by 2α).
(e) Suppose $(\vec E, \vec B)$ and $(\vec E', \vec B')$ are the electric and magnetic fields of Fab
and F′ab measured by an instantaneous observer, respectively. Show that

$$ \vec E' = \vec E \cos\alpha + \vec B \sin\alpha , \qquad \vec B' = -\vec E \sin\alpha + \vec B \cos\alpha . \tag{8.10.18} $$

NB: For further interpretations of the physical meaning of the duality rotation,
see Volume II and Jackson (1998).

8.7. An n-dimensional spacetime is called an Einstein spacetime if Rab = Rgab /n,
where gab , Rab and R are the metric, Ricci tensor and scalar curvature,
respectively. Show that an electrovac spacetime (where the electromagnetic
field is not vanishing) is not an Einstein spacetime. NB: It follows
from Exercise 3.17 that any 2-dimensional spacetime must be an Einstein
spacetime.
8.8. Consider Taub’s plane symmetric vacuum solution (8.6.1′).
(a) Write down the expression for the 4-velocity of a static observer in terms of
the coordinate basis vectors; (b) Suppose the spatial coordinates of two static
observers are (x, y, z 1 ) and (x, y, z 2 ), respectively. Find the spatial distance
between them.
8.9. Show that the Fab in (8.6.5) has plane symmetry, i.e., Lξi Fab = 0 (i = 1, 2, 3),
where ξ1a ≡ (∂/∂ x)a , ξ2a ≡ (∂/∂ y)a , ξ3a ≡ −y(∂/∂ x)a + x(∂/∂ y)a are the
Killing fields reflecting the plane symmetry of the metric (8.6.3).
*8.10. Derive the expressions for the Maxwell equations with source in the NP for-
malism. Answer: For each of the equations in (8.8.3), one needs to add a term
to the right-hand side. In sequence they are −4π J4 , −4π J2 , −4π J1 , −4π J3
(where J1 , J2 , J3 , J4 are the components of Ja in the null tetrad).
*8.11. Prove (8.8.7) and (8.8.10).

References

Bonnor, W. B. (1994), ‘The photon rocket’, Class. Quant. Grav. 11, 2007–2012.
Carmeli, M. (1982), Classical Fields General Relativity and Gauge Theory, John Wiley & Sons,
New York.
Damour, T. (1995), ‘Photon rockets and gravitational radiation’, Class. Quant. Grav. 12, 725–738.
arXiv:gr-qc/9412063.
Dain, S., Moreschi, O. M. and Gleiser, R. J. (1996), ‘Photon rockets and the Robinson-Trautman
geometries’, Class. Quant. Grav. 13, 1155–1160. arXiv:gr-qc/0203064.
Hawking, S. W. and Ellis, G. F. R. (1973), The Large Scale Structure of Space-Time, Cambridge
University Press, Cambridge.
Jackson, J. D. (1998), Classical Electrodynamics, John Wiley & Sons, Inc., New York.
Kinnersley, W. (1969), ‘Field of an arbitrarily accelerating point mass’, Phys. Rev. 186, 1335–1336.
Kuang, Z. and Liang, C. (1988), ‘Birkhoff and Taub theorems generalized to metrics with conformal
symmetries’, J. Math. Phys. 29, 2475–2478.
Kuang, Z., Li, J. and Liang, C. (1986), ‘Gauge freedom of plane-symmetric line elements with
semi-plane-symmetric null electromagnetic fields’, Phys. Rev. D 34, 2241–2245.
Kuang, Z., Li, J. and Liang, C. (1987), ‘Completion of plane-symmetric metrics yielded by elec-
tromagnetic fields’, Gen. Rela. Grav. 19, 345–350.
Letelier, P. S. and Tabenski, R. R. (1974), ‘The general solution to Einstein-Maxwell equations with
plane symmetry’, J. Math. Phys. 15, 594.
Li, J. and Liang, C. (1985), ‘An extension of the plane-symmetric electrovac general solution to
Einstein equations’, Gen. Rela. Grav. 17, 1001–1013.
Li, J. and Liang, C. (1989), ‘Static semi-plane-symmetric metrics yielded by plane-symmetric
electromagnetic fields’, J. Math. Phys. 30, 2915–2917.

Liang, C. (1995), ‘A family of cylindrically symmetric solutions to Einstein-Maxwell equations’,
Gen. Rela. Grav. 27, 669–677.
Newman, E. and Penrose, R. (1962), ‘An approach to gravitational radiation by a method of spin
coefficients’, J. Math. Phys. 3, 566.
Patnaik, S. (1970), ‘Einstein-Maxwell fields with plane symmetry’, Proc. Camb. Phil. Soc. 67, 127.
Stephani, H. (1982), General Relativity: An Introduction to the Theory of Gravitational Field,
Cambridge University Press, Cambridge.
Stephani, H., Kramer, D., MacCallum, M. A. H., Hoenselaers, C. and Herlt, E. (2003), Exact
Solutions of Einstein’s Field Equations, Cambridge University Press, Cambridge.
Tariq, N. and Tupper, B. O. J. (1976), ‘Einstein-Maxwell metrics admitting a dual interpretation’,
J. Math. Phys. 17, 292–296.
Taub, H. (1951), ‘Empty space-times admitting a three parameter group of motions’, Ann. Math.
53, 472.
Wald, R. M. (1984), General Relativity, The University of Chicago Press, Chicago.
Weinberg, S. (1972), Gravitation and Cosmology: Principles and Applications of the General
Theory of Relativity, John Wiley and Sons, New York.
Chapter 9
Schwarzschild Spacetimes

In the first three sections of Chap. 8 we had a discussion on static spherically sym-
metric metrics and the vacuum Schwarzschild solution, which focused mainly on
finding the solution. In view of the importance of the Schwarzschild solution, this
chapter will further discuss several intimately related problems: Sect. 9.1 discusses
the timelike and null geodesics in Schwarzschild spacetime; Sect. 9.2 introduces
three experimental tests of general relativity proposed by Einstein using the vacuum
Schwarzschild solution in his early years, namely the gravitational redshift, the pre-
cession of the perihelion of Mercury and the bending of starlight in the Sun’s gravita-
tional field; Sect. 9.3 discusses the spacetime geometric structure and physical states
in the interior of a spherically symmetric star, as well as the evolution of a spherically
symmetric star; Sect. 9.4 analyzes the theory of the extension of the Schwarzschild
spacetime in detail.

9.1 Geodesics in Schwarzschild Spacetimes

Let γ (τ ) be a timelike (or null) geodesic. For a timelike geodesic, τ represents the
proper time; for a null geodesic, τ represents a chosen affine parameter. In order to
find the parametric equations x μ (τ ) of γ (τ ), generally we need to solve the following
differential equations:

$$ \frac{d^2 x^\mu}{d\tau^2} + \Gamma^\mu{}_{\nu\sigma} \frac{dx^\nu}{d\tau} \frac{dx^\sigma}{d\tau} = 0 , \qquad \mu = 0, 1, 2, 3 . \tag{9.1.1} $$

Since in these equations the unknown functions x μ (τ ) and their derivatives are cou-
pled with each other, solving for them is in general not simple. However, if the
spacetime has sufficiently many Killing vector fields, one can find x μ (τ ) in a
clever way using Theorem 4.3.3. Schwarzschild spacetime is an example of this.
© Science Press 2023
C. Liang and B. Zhou, Differential Geometry and General Relativity,
Graduate Texts in Physics, https://doi.org/10.1007/978-981-99-0022-0_9

Fig. 9.1 Figures for the proof of Proposition 9.1.1

Before applying this theorem, we can also simplify the coordinate representation of
the geodesic γ (τ ) using the spherical symmetry of Schwarzschild spacetime.
Proposition 9.1.1 Suppose γ (τ ) is a timelike or null geodesic in Schwarzschild
spacetime, then one can always choose the Schwarzschild coordinates such that
θ = π/2 along γ (τ ), in other words, such that γ (τ ) lies in the “equatorial plane”.

Proof ∀ p ∈ γ (τ ), the orbit sphere S passing through p (see Definition 1 in Sect.
8.2) must lie in the constant-t surface Σt passing through p (Fig. 9.1a). The choice
of the coordinates θ and ϕ of the Schwarzschild system is quite arbitrary. Since the
Schwarzschild line element (8.3.18) is invariant under the transformation
θ → π − θ , the northern and southern hemispheres are symmetric with
respect to the equator. Hence, if one can choose θ and ϕ such that the value of θ at a point
p ∈ γ (τ ) is π/2 and the θ -component of the 4-velocity U a ≡ (∂/∂τ )a vanishes at
p, then the whole γ (τ ) has θ (τ ) = π/2. Thus, we only have to show
that this kind of choice of θ and ϕ is indeed available. Since the geodesic is not
everywhere orthogonal to the constant-t surfaces (otherwise it would be the world line
of a static observer, which is not a geodesic), one can always find a point p at which
U a has a projection β a ≠ 0 on Σt . The orbit sphere passing through p is S ⊂ Σt .
If β a has a component v a tangent to S , then ( p, v a ) determines a unique geodesic
on S , i.e., a great circle (see Fig. 9.1b), and we can define the coordinates θ and
ϕ on S with this great circle as the equator; if β a does not have a component v a
tangent to S [i.e., γ (τ ) is a radial geodesic], then we can choose any great circle
passing through p as the equator. Using the “carry method” in Sect. 8.3 to carry the
θ and ϕ coordinates away from S , we obtain the Schwarzschild coordinate system
{t, r, θ, ϕ} that satisfies our requirements, namely ① θ ( p) = π/2; ② the component
dθ/dτ | p of (∂/∂τ )a | p along the coordinate basis vector (∂/∂θ )a | p vanishes. □
The conclusion that a point p ∈ γ (τ ) satisfying ① and ② assures that θ = π/2 along
the whole curve can also be proved analytically as follows: from the expression
(8.3.20) for the Γ σ μν of the Schwarzschild line element we can see that for μ = 2,
(9.1.1) is

$$ \frac{d^2\theta}{d\tau^2} + \frac{2}{r} \frac{dr}{d\tau} \frac{d\theta}{d\tau} - \sin\theta \cos\theta \left( \frac{d\varphi}{d\tau} \right)^2 = 0 . \tag{9.1.2} $$

Since the geodesic γ (τ ) has been given beforehand, the functions t (τ ), r (τ ), θ (τ ),
ϕ(τ ) are all determined once a coordinate system is chosen. In order to show that
θ = π/2 for the entire curve, we only have to notice that (9.1.2) is a 2nd-order
ordinary differential equation, and θ (τ ) = π/2 is the unique solution satisfying the
initial conditions θ ( p) = π/2 and dθ/dτ | p = 0.
According to Proposition 9.1.1, one can always choose the Schwarzschild coor-
dinates so that the parametric equations for the above-mentioned geodesic γ (τ ) are

t = t (τ ) , r = r (τ ) , θ = π/2 , ϕ = ϕ(τ ) .

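Suppose U a ≡ (∂/∂τ )a is the tangent of γ (τ ), and define κ := −gab U a U b , then

$$ \kappa = \begin{cases} 1 , & \text{(for timelike geodesics)} \\ 0 , & \text{(for null geodesics)} \end{cases} $$

and

$$
\begin{aligned}
-\kappa &= g_{ab} \left( \frac{\partial}{\partial\tau} \right)^a \left( \frac{\partial}{\partial\tau} \right)^b
 = g_{00} \left( \frac{dt}{d\tau} \right)^2 + g_{11} \left( \frac{dr}{d\tau} \right)^2 + g_{22} \left( \frac{d\theta}{d\tau} \right)^2 + g_{33} \left( \frac{d\varphi}{d\tau} \right)^2 \\
 &= -\left( 1 - \frac{2M}{r} \right) \left( \frac{dt}{d\tau} \right)^2 + \left( 1 - \frac{2M}{r} \right)^{-1} \left( \frac{dr}{d\tau} \right)^2 + r^2 \left( \frac{d\varphi}{d\tau} \right)^2 ,
\end{aligned}
\tag{9.1.3}
$$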

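where in the last step we used θ = π/2. Noticing that (∂/∂t)a and (∂/∂ϕ)a are
Killing vector fields, by means of Theorem 4.3.3, we can define two constants on the
geodesic γ (τ ):

$$ E := -g_{ab} \left( \frac{\partial}{\partial t} \right)^a \left( \frac{\partial}{\partial\tau} \right)^b = -g_{00} \frac{dt}{d\tau} = \left( 1 - \frac{2M}{r} \right) \frac{dt}{d\tau} , \tag{9.1.4} $$

$$ L := g_{ab} \left( \frac{\partial}{\partial\varphi} \right)^a \left( \frac{\partial}{\partial\tau} \right)^b = g_{33} \frac{d\varphi}{d\tau} = r^2 \frac{d\varphi}{d\tau} . \tag{9.1.5} $$

Plugging (9.1.4) and (9.1.5) into (9.1.3) yields

$$ -\kappa = -\left( 1 - \frac{2M}{r} \right)^{-1} E^2 + \left( 1 - \frac{2M}{r} \right)^{-1} \left( \frac{dr}{d\tau} \right)^2 + \frac{L^2}{r^2} . \tag{9.1.6} $$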

This equation, which contains only the unknown function r (τ ) and its 1st-order
derivative, is solvable in principle. Plugging the r (τ ) we just obtained into (9.1.4)
and (9.1.5), in principle we can find the unknown functions t (τ ) and ϕ(τ ), and hence
find the parametric equations of γ (τ ).
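As an illustration of how (9.1.6) is used in practice, one can solve it for the radial velocity, which gives (dr/dτ)² = E² − (1 − 2M/r)(κ + L²/r²). The sketch below evaluates this expression and, for timelike geodesics, locates the circular orbits, i.e., the radii at which V(r) ≡ (1 − 2M/r)(1 + L²/r²) is stationary; setting dV/dr = 0 leads to M r² − L² r + 3M L² = 0. The function names, the symbol V, and the default M = 1 are assumptions made for this example and are not notation from the text.

```python
import math

def radial_velocity_squared(r, E, L, M=1.0, kappa=1):
    """(dr/dtau)^2 obtained by solving (9.1.6) for the radial term."""
    return E**2 - (1.0 - 2.0 * M / r) * (kappa + L**2 / r**2)

def effective_potential(r, L, M=1.0, kappa=1):
    """V(r) = (1 - 2M/r)(kappa + L^2/r^2); motion requires E^2 >= V(r)."""
    return (1.0 - 2.0 * M / r) * (kappa + L**2 / r**2)

def circular_orbit_radii(L, M=1.0):
    """Radii of timelike circular orbits: roots of M r^2 - L^2 r + 3 M L^2 = 0.

    Returns an empty list when L^2 < 12 M^2 (no circular orbit exists).
    """
    disc = L**4 - 12.0 * M**2 * L**2
    if disc < 0:
        return []
    root = math.sqrt(disc)
    return sorted({(L**2 - root) / (2.0 * M), (L**2 + root) / (2.0 * M)})

# Example with M = 1 and L = 4: an inner and an outer circular orbit.
radii = circular_orbit_radii(4.0)
for r in radii:
    # On a circular orbit E^2 equals V(r), so dr/dtau vanishes there.
    E = math.sqrt(effective_potential(r, 4.0))
    print(r, E, radial_velocity_squared(r, E, 4.0))
```

Once r (τ ) is known from integrating the radial equation, t (τ ) and ϕ(τ ) follow from dt/dτ = E(1 − 2M/r)⁻¹ and dϕ/dτ = L/r², which are just (9.1.4) and (9.1.5) rearranged.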

Fig. 9.2 Local measurement on a freely falling particle γ (τ ) made by a static observer G

We now discuss the physical meaning of these two constants E and L. Suppose
γ (τ ) is a timelike geodesic, then it represents the world line of a free point mass.
Let m be the mass of the point mass, then U a ≡ (∂/∂τ )a and P a ≡ mU a are its
4-velocity and 4-momentum, respectively. Suppose p is a point on γ (τ ), G is the
static observer passing through p, Z a is the 4-velocity of G at p (see Fig. 9.2), and
ξ a = (∂/∂t)a is the static Killing vector field, then it follows from Z a Z a = −1 that

Z a = χ −1 ξ a , (9.1.7)

where χ ≡ (−ξ b ξb )1/2 . From (6.3.17) we can see that −Z a P a is the energy value
obtained from the local measurement made by observer G on the point mass, which
was denoted by E; to avoid confusion, now we will denote it as E local . The E defined
by (9.1.4) can be rewritten as

$$ E = -\xi_a U^a = -\frac{1}{m} \xi_a P^a = -\frac{\chi}{m} Z_a P^a = \frac{\chi}{m} E_{\rm local} , \tag{9.1.8} $$

and thus E ≠ E local . If the geodesic γ (τ ) reaches infinity, then E → E local /m when
r → ∞ (since χ → 1 there), and hence E can be interpreted as the energy per unit mass obtained from
the local measurement made on the point mass by a static observer at infinity. Since
E is a constant on γ (τ ), E local is not a constant on it, i.e., it is E that is conserved
in the motion of a free point mass instead of E local . Therefore, E can be interpreted
physically as the total energy (including gravitational potential energy) per unit mass
of a free point mass. In contrast, E local is the energy obtained from the local measure-
ment made by a static observer G, which does not include the gravitational potential
energy, and is not a conserved quantity along a geodesic. This can be interpreted
physically as follows: although a free point mass does not experience any force other
than gravity, the gravitational force does work on it when it is moving. Hence, as an
energy excluding the gravitational potential energy, E local is not a constant. Similarly,
if γ (τ ) is a null geodesic, then E can be interpreted as the total energy of the photon
times ℏ−1 .
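The relation E local = mE/χ contained in (9.1.8) can be made concrete with numbers. The sketch below assumes geometrized units and uses E local = γ m with γ ≡ −U a Z a (which follows from E local = −Z a P a and P a = mU a ), so that the speed measured by the static observer is v = (1 − γ⁻²)¹ᐟ². The function names and sample radii are hypothetical choices made for the illustration.

```python
import math

def chi(r, M=1.0):
    """chi = (1 - 2M/r)^(1/2) for the static Killing field, valid for r > 2M."""
    return math.sqrt(1.0 - 2.0 * M / r)

def local_energy_per_unit_mass(E, r, M=1.0):
    """E_local / m = E / chi, rearranged from (9.1.8)."""
    return E / chi(r, M)

def local_speed(E, r, M=1.0):
    """Speed relative to the static observer at r, using E_local = gamma * m."""
    gamma = local_energy_per_unit_mass(E, r, M)
    return math.sqrt(1.0 - 1.0 / gamma**2)

# A particle released from rest at infinity has E = 1 (E_local -> m as r -> infinity).
for r in (100.0, 10.0, 4.0):
    print(r, local_energy_per_unit_mass(1.0, r), local_speed(1.0, r))
```

The conserved E stays equal to 1 along the whole fall, while E local /m grows as the particle approaches the star, which is the numerical counterpart of the statement that E local excludes the gravitational potential energy. For E = 1 the locally measured speed reduces to (2M/r)¹ᐟ².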

[Optional Reading 9.1.1]


For a timelike geodesic γ (τ ) that does not reach infinity (e.g., the Earth orbiting the
Sun), E is certainly still a constant; however, one cannot find a direct connection between
it and an observer at infinity anymore. Although some literature still refers to this as the
energy measured by observers at infinity, in this text we prefer the following perspective:
the clearest meaning of the word “measure” in “a quantity measured by an observer” is a
local measurement, which requires the world lines of the observer and the point mass to
intersect. When the world line of the point mass does not reach infinity, to add a modifier
like “measured by an observer at infinity” we should specify a plan of indirect measurement
(e.g., by means of the light emitted to infinity). However, in many cases it is difficult to find a
practical plan to make an indirect measurement; the E of a timelike geodesic that does not
reach infinity may be an example of this. We prefer to refer to E as the energy of the point
mass whose world line is this geodesic without any additional modifier [see Wald (1984)].
It has the dimension of energy, and can even be interpreted physically as the sum of E local
and the gravitational potential energy, and thus totally deserves the designation “energy”.
However, this is not the energy measured by any observer, and a modifier like “measured by
an observer at infinity” seems to be not necessary at all.
[The End of Optional Reading 9.1.1]

Now we come to the physical interpretation of the constant L. Suppose p is an
arbitrary point on a timelike geodesic γ (τ ), and Z a is the 4-velocity of a static
observer at p. Normalizing the coordinate basis vectors at p yields an orthonormal
tetrad of the tangent space V p at p:

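$$
\begin{aligned}
(e_0)^a &\equiv (1 - 2M/r)^{-1/2} (\partial/\partial t)^a = Z^a , & (e_1)^a &\equiv (1 - 2M/r)^{1/2} (\partial/\partial r)^a , \\
(e_2)^a &\equiv r^{-1} (\partial/\partial\theta)^a , & (e_3)^a &\equiv r^{-1} (\partial/\partial\varphi)^a ,
\end{aligned}
$$

whose dual frame is

$$
\begin{aligned}
(e^0)_a &= (1 - 2M/r)^{1/2} (dt)_a , & (e^1)_a &= (1 - 2M/r)^{-1/2} (dr)_a , \\
(e^2)_a &= r (d\theta)_a , & (e^3)_a &= r (d\varphi)_a .
\end{aligned}
$$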

Let $W_p$ be the 3-dimensional subspace orthogonal to $Z^a$ in $V_p$, then $\{(e_1)^a, (e_2)^a, (e_3)^a\}$
and $\{(e^1)_a, (e^2)_a, (e^3)_a\}$ in the equations above are an orthonormal triad and
its dual triad of $W_p$, respectively. The following discussion in the 3-dimensional
language is relative to a static reference frame. Suppose $U^a$ is the 4-velocity of a free
point mass γ (τ ) at p, and $u^a \in W_p$ is its 3-velocity, then its 3-momentum is $p^a \equiv \gamma m u^a$,
where m is the mass of the point mass and $\gamma \equiv -U^a Z_a$. Following the definition
of the angular momentum $\vec j$ in Euclidean space, i.e., $\vec j := \vec r \times \vec p$, we can define the
angular momentum for a free point mass represented by γ (τ ) as $j^a := \varepsilon^a{}_{bc}\, \gamma m r^b u^c$,
where $r^b \equiv r (e_1)^b$. Now let us show that the |L| defined by (9.1.5) is the magnitude
of the angular momentum per unit mass of a free point mass represented by γ (τ ).
Note that $r^b$ is in the radial direction, and thus the radial component of $u^c$ does not
contribute to $j^a$; moreover, the $(e_2)^c$ component of $u^c$ vanishes since θ = π/2 along γ (τ ). Therefore,

$$
\begin{aligned}
j_a &= \gamma m\, \varepsilon_{abc}\, r^b u^3 (e_3)^c = \gamma m\, \varepsilon_{213}\, r u^3 (e^2)_a = -\gamma m r u^3 (e^2)_a , \\
|\vec j| &= |\gamma m r u^3| = |\gamma m r u^a (e^3)_a| = |\gamma m r^2 u^a (d\varphi)_a| = |m r^2 U^a (d\varphi)_a| \\
 &= |m r^2 (\partial/\partial\tau)^a (d\varphi)_a| = m |r^2\, d\varphi/d\tau| = m |L| ,
\end{aligned}
\tag{9.1.9}
$$

where in the fourth equality of the second line we used the decomposition $U^a = \gamma (Z^a + u^a)$ and
$Z^a (e^3)_a = 0$, and in the last step we used (9.1.5). Thus, (the absolute value of)
L is the magnitude of the 3-angular momentum per unit mass of a free point mass relative to
a static reference frame, or the angular momentum per unit mass for short. Similarly,
if γ is a null geodesic, then L is the angular momentum of a photon times ℏ−1 .

9.2 Classical Experimental Tests of General Relativity

Basically, Einstein’s original motivation for creating general relativity was purely
theoretical. However, any physical theory must face the challenge of experimental
verification after it comes out. We have seen in Sect. 7.9 that the direct detection of
gravitational waves provided a strong confirmation of Einstein’s theory one century
after it came out. Nevertheless, during the formulation of general relativity, Ein-
stein already made three important predictions early on by means of the vacuum
Schwarzschild solution which could be compared with experiments (later on dubbed
the three classical experimental tests). The earliest one (in 1907) is the gravitational
redshift of light waves, and the other two are the precession of the perihelion of Mer-
cury and the bending of starlight in the gravitational field of the Sun. The result of
the perihelion precession calculation already agreed with the existing observational
data, and the prediction of light deflection was also supported by observation very
soon. However, due to the lack of experimental techniques for measuring extremely
weak general relativity effects (including the gravitational redshift) with sufficient
precision, experimental research on general relativity progressed slowly, or even
almost stopped, for 45 years after the late 1910s. Since the
1960s, with the advancement of technology and the new discoveries of astronom-
ical observations, the experimental verification of general relativity has entered its
heyday; there appeared not only verifications of light deflection and gravitational
redshift with a higher precision, but also a series of brand new experiments. It is
safe to say that general relativity has passed all experimental tests so far, although
many experiments with higher precision and difficulty are yet to be conducted. In
this section, we will only discuss the three classical experimental tests proposed by
Einstein. For the past, present, and future of the experimental tests of the relativistic
theory of gravity, the reader may refer to Ni (2005; 2016).

Fig. 9.3 The derivation of the gravitational redshift in a stationary spacetime. G and G′ are stationary observers

9.2.1 Gravitational Redshift

In this subsection we first discuss the gravitational redshift in a stationary spacetime,
then, as an example, provide the explicit expression for the redshift of Schwarzschild
spacetime. Under the geometric optics approximation, a light signal can be consid-
ered as propagating along a null geodesic (See the end of Sect. 7.2), and the angular
frequency of a photon with a wave 4-vector K a relative to an observer with a 4-
velocity Z a is [see (7.2.11)] ω = −K a Z a .
Consider a stationary spacetime. Suppose G and G′ are two observers in an
arbitrary stationary reference frame. The photon emitted at p by G reaches G′ at p′
(see Fig. 9.3). Let Z a be the 4-velocity of an observer, and K a be the wave 4-vector
of the photon, then the angular frequencies of the photon at p and p′ relative to the
stationary observers at these points are, respectively,

$$ \omega = -(K^a Z_a)|_p , \qquad \omega' = -(K^a Z_a)|_{p'} . \tag{9.2.1} $$

The world lines of the stationary observers coincide with the integral curves of the Killing vector field $\xi^a$, and hence $\xi^a = \chi Z^a$, where $\chi$ can be obtained from $Z^a Z_a = -1$ to be $\chi \equiv (-\xi^b \xi_b)^{1/2}$. Thus, (9.2.1) becomes $\omega = [(-K_a \xi^a)\chi^{-1}]|_{p}$ and $\omega' = [(-K_a \xi^a)\chi^{-1}]|_{p'}$. Since the world line of a photon is a geodesic whose tangent vector is $K^a$, and $\xi^a$ is a Killing vector field, from Theorem 4.3.3 we can see that $K_a \xi^a$ is constant along the curve, i.e., $(K_a \xi^a)|_{p} = (K_a \xi^a)|_{p'}$. Thus, it follows from (9.2.1) that¹

$$\frac{\omega'}{\omega} = \frac{\chi}{\chi'} \quad \text{or} \quad \frac{\lambda'}{\lambda} = \frac{\chi'}{\chi}\,, \tag{9.2.2}$$

where $\lambda$ and $\lambda'$ are the wavelengths corresponding to $\omega$ and $\omega'$, respectively, and $\chi \equiv (-\xi^b \xi_b)^{1/2}|_{p}$, $\chi' \equiv (-\xi^b \xi_b)^{1/2}|_{p'}$.

¹ There can be more than one null geodesic between two points $p$ and $p'$ in a stationary spacetime [see Sachs and Wu (1977), Exercise 7.3.2]. Equation (9.2.2) indicates that the redshift depends only on the points $p$ and $p'$ and has nothing to do with which null geodesic is taken.

Now we will give the quantitative result for a static observer in Schwarzschild spacetime as a specific example. Suppose $\{t, r, \theta, \varphi\}$ is the Schwarzschild coordinate system; then the timelike Killing vector field representing the staticity is $\xi^a = (\partial/\partial t)^a$, and hence

$$\chi^2 = -\xi^b \xi_b = -g_{ab}\left(\frac{\partial}{\partial t}\right)^a \left(\frac{\partial}{\partial t}\right)^b = -g_{00} = 1 - \frac{2M}{r}\,.$$

Plugging this into (9.2.2) yields

$$\lambda'/\lambda = (1 - 2M/r')^{1/2}\,(1 - 2M/r)^{-1/2}\,. \tag{9.2.3}$$

When $r' > r$ (i.e., the light source is closer to the star than the receiver), we have $\lambda' > \lambda$, and thus the wavelength of the light received by the receiver is longer than it was when emitted, which is called a redshift. Since there is no relative motion between the two stationary observers G and G′, the redshift can be interpreted as purely an effect of the gravitational field (curved spacetime). Hence, this effect is called a gravitational redshift, and $\chi \equiv (-\xi^b \xi_b)^{1/2}$ is called the (gravitational) redshift factor.
The magnitude of a redshift can be described by the relative redshift parameter (or simply redshift) $z \equiv (\lambda' - \lambda)/\lambda$. Calculation indicates that when light emitted from the Sun arrives at the Earth (regarding the Sun as the source of the gravitational field), the relative redshift is only about $2 \times 10^{-6}$. In order to enhance the redshift, one can measure the light coming from a white dwarf. A white dwarf is a celestial body which has a much higher density than a normal star (see Sect. 9.3.2 for details). Due to its high density, its surrounding gravitational field is far stronger than that of the Sun, and the redshift of the light coming from a white dwarf can be dozens of times larger than that of the light from the Sun. After general relativity was published, the redshift of the light from white dwarfs was measured several times, but the results were not sufficiently precise to confirm the prediction of the theory. The first successful high-precision gravitational redshift experiment was done by R. V. Pound and G. A. Rebka Jr. in 1960 using the Mössbauer effect. In 1958, R. L. Mössbauer discovered that some nuclei (e.g., ⁵⁷Fe) can emit γ-rays with a very narrow (very sharp) linewidth under certain conditions, and that crystals containing this kind of nucleus can resonantly absorb γ-rays at this frequency with very high selectivity. If the frequency of this kind of γ-ray is changed slightly for some reason, the absorption by the crystal is significantly reduced. This provides a powerful tool for measuring the extremely weak gravitational redshift caused by the Earth’s gravitational field. Place two pieces of such a crystal at different heights above the surface of the Earth, the lower one (E in Fig. 9.4) as the emitter and the higher one (A in the figure) as the receiver. Although the redshift calculated from the height difference between them (12.5 m) is merely $1.36 \times 10^{-15}$, the rate at which A absorbs the γ-rays emitted by E still decreases due to this weak gravitational redshift. To confirm and measure this decrease,
one can let A move towards E at a constant speed, and use the “blueshift” (decrease in wavelength) due to the Doppler effect to offset the gravitational redshift. When the speed is adjusted to an appropriate value (only $3 \times 10^{-7}$ m/s), the absorption rate reaches its maximum, and the value of the gravitational redshift can then be measured. The precision of this experiment is very high (the relative uncertainty is about 1%), and the results obtained agree well with the theoretical values. Since then, experimental tests with still higher precision have been carried out [the reader may refer to Will (2018)].

Fig. 9.4 Measuring the gravitational redshift near the ground using the Mössbauer effect
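The orders of magnitude quoted in this subsection can be reproduced with a few lines of Python. The sketch below is only illustrative: it restores $G$ and $c$ in (9.2.3), uses rounded SI constants that are not taken from the text, ignores the Earth's own field for the Sun-to-Earth case, and uses the weak-field estimate $z \approx gh/c^2$ (an assumption of this sketch) for the two crystals separated by 12.5 m.

```python
# Rounded SI constants (assumed reference values, not taken from the text).
G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
C = 2.998e8          # speed of light, m/s
M_SUN = 1.989e30     # mass of the Sun, kg
R_SUN = 6.957e8      # radius of the Sun, m
AU = 1.496e11        # Sun-Earth distance, m
G_EARTH = 9.81       # gravitational acceleration at the ground, m/s^2


def chi(mass_kg, r_m):
    """Redshift factor chi = (1 - 2GM/(c^2 r))^(1/2) of a static observer."""
    return (1.0 - 2.0 * G * mass_kg / (C**2 * r_m)) ** 0.5


def redshift(mass_kg, r_emit, r_recv):
    """z = lambda'/lambda - 1 = chi(r')/chi(r) - 1, following (9.2.3)."""
    return chi(mass_kg, r_recv) / chi(mass_kg, r_emit) - 1.0


def weak_field_redshift(height_m, g=G_EARTH):
    """Near the ground, z is approximately g*h/c^2 for a height difference h."""
    return g * height_m / C**2


z_sun = redshift(M_SUN, R_SUN, AU)      # light leaving the solar surface
z_tower = weak_field_redshift(12.5)     # emitter and receiver 12.5 m apart
v_doppler = z_tower * C                 # speed whose Doppler shift cancels z

print(f"Sun -> Earth redshift:    {z_sun:.2e}")
print(f"12.5 m height difference: {z_tower:.2e}")
print(f"compensating speed:       {v_doppler:.1e} m/s")
```

The compensating speed comes out at a few times $10^{-7}$ m/s, the same order of magnitude as the value quoted above.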

9.2.2 Perihelion Precession of Mercury

According to Newtonian mechanics, the orbit of a planet is an ellipse with the Sun at one focus. However, the observational results deviate slightly from this. Take Mercury, the planet closest to the Sun, as an example. Although in each period its orbit is very close to an ellipse, the major axes of the “ellipses” in two adjacent periods do not coincide, which shows up as a slight shift of its perihelion. As time goes on, this effect accumulates, and the slow rotation of the major axis of the “ellipse” (and thus of the perihelion) around the Sun becomes observable. This phenomenon is called the precession of the perihelion. Before the advent of general relativity, the precession rate of Mercury’s perihelion had already been measured as about 5600″ per century (″ stands for arcseconds). People studied this in depth and identified many possible causes (including the influence of the other planets). It was found that the precession rate caused by all these factors is 5557″ per century, and there is still 43″ per century that cannot be explained. This is the famous “43-second problem”. Based on general relativity, Einstein treated Mercury as a free point mass in the curved spacetime produced by the Sun. His approximate calculation of a timelike geodesic in Schwarzschild spacetime naturally leads to the conclusion that the orbit of Mercury is not a closed curve, and that the precession rate of its perihelion is exactly 43″ per century. This result greatly strengthened people’s confidence in general relativity. Now we will introduce the derivation of the perihelion precession in general relativity.
Suppose there are only the Sun and Mercury in the solar system and the gravitational field of Mercury can be neglected, i.e., we only discuss the motion of Mercury under the action of the Sun’s gravitational field (external gravitational field). First we discuss this using Newton’s theory of gravity. Let the masses of the Sun and Mercury be $M$ and $m$, respectively; then the gravitational potential energy of Mercury is

$$U(r) = -Mm/r \quad \text{(this text uses the system of geometrized units, where } G = 1\text{)}. \tag{9.2.4}$$

Take the spherical coordinate system such that the orbit of Mercury is on the equatorial plane ($\theta = \pi/2$, which is always possible, see Sect. 9.1); then the velocity of Mercury has only a radial component $u_r = dr/dt$ and a tangential component $u_\varphi = r\,d\varphi/dt$, and hence the kinetic energy is $m\left(u_r^2 + u_\varphi^2\right)/2$. From the law of conservation of mechanical energy we have

$$\frac{1}{2}\, m \left(u_r^2 + u_\varphi^2\right) + U(r) = A\,, \tag{9.2.5}$$

where the constant $A$ is the total mechanical energy of Mercury. Suppose the angular momentum of Mercury per unit mass is $L$, then

$$L = r u_\varphi = r^2 \frac{d\varphi}{dt}\,. \tag{9.2.6}$$

From (9.2.4), (9.2.5) and (9.2.6) we can find by calculation that

$$\left(\frac{dr}{d\varphi}\right)^2 + r^2 = \frac{2Mr^3}{L^2} + \frac{2Ar^4}{mL^2}\,. \tag{9.2.7}$$
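For readers who want the intermediate steps behind (9.2.7), one possible route is sketched below; it uses only (9.2.4), (9.2.5) and (9.2.6), and the bracketed annotations are added here for orientation.

```latex
\begin{align*}
\frac{dr}{dt} &= \frac{dr}{d\varphi}\,\frac{d\varphi}{dt} = \frac{L}{r^2}\,\frac{dr}{d\varphi},
  \qquad u_\varphi = \frac{L}{r}
  && \text{[from (9.2.6)]} \\
A &= \frac{m}{2}\,\frac{L^2}{r^4}\left[\left(\frac{dr}{d\varphi}\right)^2 + r^2\right] - \frac{Mm}{r}
  && \text{[substituting into (9.2.5) with (9.2.4)]} \\
\left(\frac{dr}{d\varphi}\right)^2 + r^2 &= \frac{2Mr^3}{L^2} + \frac{2Ar^4}{mL^2}
  && \text{[multiplying by } 2r^4/(mL^2) \text{ and rearranging]}
\end{align*}
```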

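Let $\mu \equiv r^{-1}$, then $\mu \neq 0$, and hence the above formula becomes

$$\left(\frac{d\mu}{d\varphi}\right)^2 + \mu^2 = \frac{2A}{mL^2} + \frac{2M}{L^2}\,\mu\,. \tag{9.2.8}$$

Taking the derivative with respect to $\varphi$ yields $\dfrac{d\mu}{d\varphi}\left(\dfrac{d^2\mu}{d\varphi^2} + \mu - \dfrac{M}{L^2}\right) = 0$. Thus, either $d\mu/d\varphi = 0$ (circular orbit), or

$$\frac{d^2\mu}{d\varphi^2} + \mu = \frac{M}{L^2}\,. \tag{9.2.9}$$

The solution to the above equation is

$$\mu(\varphi) = \frac{M}{L^2}\left[1 + e\cos(\varphi - \varphi_0)\right], \tag{9.2.10}$$

where $e$ and $\varphi_0$ are constants of integration. Without loss of generality, take $\varphi_0 = 0$, then

$$\mu(\varphi) = \frac{M}{L^2}\,(1 + e\cos\varphi)\,. \tag{9.2.11}$$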

This is the equation of a conic section, with $e$ as the eccentricity. Plugging (9.2.11) and its derivative back into (9.2.8) yields

$$e^2 = 1 + \frac{2AL^2}{mM^2}\,. \tag{9.2.12}$$

When $0 \leqslant e < 1$ this is an ellipse, and $d\mu/d\varphi = 0$ (circular orbit) is included as the special case $e = 0$.
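However, general relativity provides a slightly different result. Let $\kappa = 1$ (timelike geodesic). Dividing (9.1.6) by $(d\varphi/d\tau)^2$ and using (9.1.5), we can find by calculation that

$$\left(\frac{dr}{d\varphi}\right)^2 - \frac{E^2 r^4}{L^2} + r^2\left(1 + \frac{r^2}{L^2}\right)\left(1 - \frac{2M}{r}\right) = 0\,. \tag{9.2.13}$$

Again let $\mu \equiv r^{-1}$; the above equation turns into

$$\left(\frac{d\mu}{d\varphi}\right)^2 + \mu^2 = \frac{1}{L^2}\left(E^2 - 1\right) + \frac{2M}{L^2}\,\mu + 2M\mu^3\,. \tag{9.2.14}$$

Taking the derivative with respect to $\varphi$ yields

$$\frac{d^2\mu}{d\varphi^2} + \mu = \frac{M}{L^2} + 3M\mu^2\,. \tag{9.2.15}$$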

Comparing this with (9.2.9) we find an additional term $3M\mu^2$ (the general relativity correction term). Since the $r$ of Mercury is far larger than the $M$ of the Sun,² i.e., $M/r \ll 1$, the correction term satisfies $3M\mu^2 = (3M/r)\mu \ll \mu$, and thus one can find an approximate solution. The solution (9.2.11) in Newton’s theory of gravity can be viewed as the zeroth-order approximation, denoted by $\mu_0(\varphi)$ for clarity, i.e.,

$$\mu_0(\varphi) = \frac{M}{L^2}\,(1 + e\cos\varphi)\,. \tag{9.2.16}$$

Plugging this zeroth-order approximate solution into the second term on the right-hand side of (9.2.15), we obtain an equation that the first-order approximate solution $\mu_1(\varphi)$ should satisfy, i.e.,

$$\frac{d^2\mu_1}{d\varphi^2} + \mu_1 = \frac{M}{L^2} + 3M\mu_0^2 = \frac{M}{L^2} + \frac{3M^3}{L^4}\left(1 + 2e\cos\varphi + e^2\cos^2\varphi\right). \tag{9.2.17}$$

² When doing the quantitative calculation, it is better to go back to the International System of Units (SI), i.e., to fill in the physical constants $G$ and $c$. From Appendix A one can see that $M/r$ is actually $(GM/c^2)/r$. The mass of the Sun $M$ corresponds to $GM/c^2 \cong 1.5$ km, while the distance between the perihelion of Mercury and the Sun is about $5 \times 10^7$ km, and hence $(GM/c^2)/r \ll 1$.

It is not difficult to verify that its solution is

$$\mu_1(\varphi) = \mu_0(\varphi) + \frac{3M^3}{L^4}\left[1 + e\varphi\sin\varphi + e^2\left(\frac{1}{2} - \frac{1}{6}\cos 2\varphi\right)\right]. \tag{9.2.18}$$
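The claim that (9.2.18) solves (9.2.17) can also be checked numerically. The following is a minimal sketch, not part of the original derivation: the values of `M`, `L` and `ECC` are arbitrary illustrative choices in geometrized units, and the second derivative is approximated by a central finite difference.

```python
import math

# Illustrative parameters in geometrized units (assumed, not from the text).
M, L, ECC = 1.0, 10.0, 0.2


def mu0(phi):
    """Zeroth-order (Newtonian) solution (9.2.16)."""
    return M / L**2 * (1.0 + ECC * math.cos(phi))


def mu1(phi):
    """First-order solution (9.2.18)."""
    bracket = (1.0 + ECC * phi * math.sin(phi)
               + ECC**2 * (0.5 - math.cos(2.0 * phi) / 6.0))
    return mu0(phi) + 3.0 * M**3 / L**4 * bracket


def residual(phi, h=1.0e-3):
    """Left-hand side minus right-hand side of (9.2.17) at angle phi."""
    second = (mu1(phi + h) - 2.0 * mu1(phi) + mu1(phi - h)) / h**2
    return second + mu1(phi) - (M / L**2 + 3.0 * M * mu0(phi) ** 2)


# Sample the residual over roughly one revolution.
max_residual = max(abs(residual(0.1 * k)) for k in range(1, 63))
print(f"largest residual: {max_residual:.1e}")
```

The residual should be limited only by the finite-difference error, which is many orders of magnitude below the size of the individual terms (about $10^{-2}$ for these parameters).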

What we care about is the perihelion. For $\mu_0(\varphi)$, the values of $\varphi$ at the perihelion are $0, 2\pi, \cdots$. Although there are many differences between the expressions of $\mu_1(\varphi)$ and $\mu_0(\varphi)$, the values of $\varphi$ at the perihelion would not change if the term $e\varphi\sin\varphi$ were absent. Only this term makes Mercury’s orbit deviate from a closed curve, which leads to the precession of the perihelion, and the precession angle increases as the value of $\varphi$ increases (the effect accumulates). Therefore, when we only care about the perihelion precession, we can neglect all terms inside the square brackets in (9.2.18) except $e\varphi\sin\varphi$ and write it as [where $\mu_0(\varphi)$ has been substituted by (9.2.16)]

$$\mu_1(\varphi) = \frac{M}{L^2}\left[1 + e\left(\cos\varphi + \frac{3M^2}{L^2}\,\varphi\sin\varphi\right)\right]. \tag{9.2.19}$$

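Since $M/L^2 \sim \mu$ [see (9.2.16)], we have $M^2/L^2 \sim M\mu = M/r \ll 1$. Let

$$\varepsilon \equiv \frac{3M^2}{L^2}\,, \tag{9.2.20}$$

then $\cos\varepsilon\varphi \cong 1$, $\sin\varepsilon\varphi \cong \varepsilon\varphi$, and thus it follows from (9.2.19) that

$$\frac{1}{r(\varphi)} = \mu_1(\varphi) \cong \frac{M}{L^2}\left[1 + e\cos(\varphi - \varepsilon\varphi)\right]. \tag{9.2.21}$$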

This indicates that the orbit of Mercury is approximately an ellipse. Although the right-hand side of (9.2.21) is still a periodic function, the period is not $2\pi$ as in (9.2.16). The perihelion is the point with the smallest $r$, i.e., the point where $\cos(\varphi - \varepsilon\varphi) = 1$. $\varphi = 0$ is certainly a perihelion; however, when $\varphi = 2\pi$,

$$\cos(\varphi - \varepsilon\varphi) = \cos(2\pi - 2\pi\varepsilon) \neq 1\,.$$

Suppose $\hat\varphi$ is the value of $\varphi$ satisfying $\cos(\hat\varphi - \varepsilon\hat\varphi) = 1$ that is the closest to $2\pi$, then it is not difficult to show that (neglecting the higher order term $2\pi\varepsilon^2$)

$$\hat\varphi \cong 2\pi + 2\pi\varepsilon\,. \tag{9.2.22}$$
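The step leading to (9.2.22) can be spelled out as follows; this is only a short sketch of the expansion in the small parameter $\varepsilon$ defined in (9.2.20).

```latex
\begin{align*}
\cos\big[(1-\varepsilon)\hat\varphi\big] = 1 \ \text{with } \hat\varphi \text{ nearest to } 2\pi
  &\;\Longrightarrow\; (1-\varepsilon)\,\hat\varphi = 2\pi , \\
\hat\varphi = \frac{2\pi}{1-\varepsilon}
  &= 2\pi\left(1 + \varepsilon + \varepsilon^2 + \cdots\right) \cong 2\pi + 2\pi\varepsilon .
\end{align*}
```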

Thus, the precession angle of the perihelion of Mercury in each period is (see Fig. 9.5)

$$\varphi_P \cong 2\pi\varepsilon = \frac{6\pi M^2}{L^2}\,. \tag{9.2.23}$$

The discussion above is valid for any planet. Plugging in the specific data, one obtains a precession rate of 43″ per century for the perihelion of Mercury.
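As an illustration of "plugging in the specific data", the sketch below evaluates (9.2.23) for Mercury. It relies on two things not spelled out above: the Newtonian ellipse (9.2.11) has semi-latus rectum $L^2/M = a(1 - e^2)$, with $a$ the semi-major axis, so that $6\pi M^2/L^2 = 6\pi M/[a(1 - e^2)]$; and the orbital elements used are rounded reference values assumed for this example.

```python
import math

# Rounded reference data (assumed for this sketch, not taken from the text).
GM_SUN_OVER_C2 = 1476.6     # G*M_sun/c^2 in metres (about 1.5 km)
A_MERCURY = 5.791e10        # semi-major axis of Mercury's orbit, m
E_MERCURY = 0.2056          # eccentricity of Mercury's orbit
PERIOD_DAYS = 87.969        # orbital period of Mercury, days
DAYS_PER_CENTURY = 36525.0
ARCSEC_PER_RAD = 180.0 * 3600.0 / math.pi


def precession_per_orbit(m_geom, a, e):
    """Precession angle (9.2.23) in radians, using L^2/M = a(1 - e^2)."""
    return 6.0 * math.pi * m_geom / (a * (1.0 - e**2))


per_orbit = precession_per_orbit(GM_SUN_OVER_C2, A_MERCURY, E_MERCURY)
orbits_per_century = DAYS_PER_CENTURY / PERIOD_DAYS
arcsec_per_century = per_orbit * orbits_per_century * ARCSEC_PER_RAD

print(f"precession per orbit:   {per_orbit:.3e} rad")
print(f"precession per century: {arcsec_per_century:.1f} arcsec")
```

With these inputs the result lands close to the 43″ per century quoted in the text; the same function can be reused for other planets by changing `a`, `e` and the period.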

Fig. 9.5 The precession angle $\varphi_P$ of the perihelion (aphelion) of Mercury in each period (with obvious exaggeration)

9.2.3 Light Deflection

When a light ray from a distant star passes by the Sun on its way to the ground, it is bent by the Sun’s gravitational field. This is an important prediction of general relativity. In this subsection we introduce the derivation of this prediction. In the 4-dimensional language, the world line of a photon is a null geodesic. Let $\kappa = 0$ in (9.1.6); then, using a method similar to the derivation of (9.2.13), it is not difficult to derive that

$$\left(\frac{dr}{d\varphi}\right)^2 - \frac{E^2 r^4}{L^2} + r^2\left(1 - \frac{2M}{r}\right) = 0\,. \tag{9.2.24}$$

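Again let $\mu \equiv r^{-1}$, then the equation above becomes

$$\left(\frac{d\mu}{d\varphi}\right)^2 + \mu^2 = \frac{E^2}{L^2} + 2M\mu^3\,. \tag{9.2.25}$$

Taking the derivative with respect to $\varphi$ yields

$$\frac{d^2\mu}{d\varphi^2} + \mu = 3M\mu^2\,. \tag{9.2.26}$$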

When $M = 0$ (flat spacetime), the general solution to (9.2.26) is

$$\mu(\varphi) = \frac{1}{l}\sin(\varphi + \alpha)\,, \tag{9.2.27}$$

where $l$ and $\alpha$ are constants of integration. Suppose the photon is at infinity when $\varphi = 0$, i.e., $\mu(0) = 1/r(0) = 0$, then $\alpha = 0$, and hence

$$\mu(\varphi) = \frac{1}{l}\sin\varphi\,. \tag{9.2.28}$$

This is the equation of a straight line in 2-dimensional Euclidean space expressed in a polar coordinate system $\{r, \varphi\}$. To see this, we take $r = 0$ as the origin of the Cartesian coordinate system $\{x, y\}$, then

$$x = r\cos\varphi\,, \tag{9.2.29}$$

$$y = r\sin\varphi = \frac{1}{\mu}\sin\varphi = l = \text{constant}\,, \tag{9.2.30}$$

where in the third equality we used (9.2.28). Thus, the spatial trajectory of a photon is a straight line whose distance from the origin is $l$ (see Fig. 9.6). Note that both $r$ and $\varphi$ are changing along this straight line ($y$ is a constant). Since the range of $r$ is $(0, \infty)$, (9.2.29) indicates that the range of $x$ is $(-\infty, \infty)$. To discuss the deflection of starlight, obviously we cannot take $M = 0$. However, since $M/r \ll 1$, finding the first-order approximate solution is sufficient for us. Taking the $\mu(\varphi)$ in (9.2.28) as the zeroth-order approximate solution $\mu_0(\varphi)$ and plugging it into the right-hand side of (9.2.26), we obtain the differential equation satisfied by $\mu_1(\varphi)$:

$$\frac{d^2\mu_1}{d\varphi^2} + \mu_1(\varphi) = \frac{3M}{l^2}\sin^2\varphi\,. \tag{9.2.31}$$

It is not difficult to verify that the solution to (9.2.31) is as follows:

$$\mu_1(\varphi) = \frac{1}{l}\sin\varphi + \frac{M}{l^2}\,(1 - \cos\varphi)^2\,. \tag{9.2.32}$$
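The verification can be carried out directly. Since $\sin\varphi$ satisfies the homogeneous equation, only the second term of (9.2.32) needs to be checked; a short sketch of the computation is:

```latex
\begin{align*}
(1-\cos\varphi)^2 &= 1 - 2\cos\varphi + \cos^2\varphi , \\
\frac{d^2}{d\varphi^2}(1-\cos\varphi)^2 &= 2\cos\varphi - 2\cos 2\varphi , \\
\frac{d^2}{d\varphi^2}(1-\cos\varphi)^2 + (1-\cos\varphi)^2
  &= 1 + \cos^2\varphi - 2\cos 2\varphi = 3 - 3\cos^2\varphi = 3\sin^2\varphi .
\end{align*}
```

Multiplying the last line by $M/l^2$ reproduces the right-hand side of (9.2.31).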

From the above equation we know that $\mu_1(0) = 0$, i.e., $r(0) = \infty$, which indicates that when the $\varphi$-coordinate of a photon is zero, it is infinitely far from the Sun (the $r$ of a distant star can be regarded as $\infty$). However, the difference between (9.2.32) and (9.2.28) indicates that the photon is “heading” in different directions when it approaches and when it leaves the Sun: it follows from (9.2.28) that $\mu_0(\pi) = 0$, which indicates that the $\varphi$-coordinate is $\pi$ when the photon is going away from the “Sun” (this “Sun” has $M = 0$); however, it follows from (9.2.32) that $\mu_1(\pi) \neq 0$, and so we expect it to leave the Sun in a direction $\pi + \beta$ which is slightly different from $\pi$, i.e., $\mu_1(\pi + \beta) = 0$. To find the deflection angle $\beta$ (see Fig. 9.7), by plugging $\varphi = \pi + \beta$ into (9.2.32) and using $\mu_1(\pi + \beta) = 0$ we obtain

$$0 = \mu_1(\pi + \beta) = \frac{1}{l}\sin(\pi + \beta) + \frac{M}{l^2}\left[1 - \cos(\pi + \beta)\right]^2 .$$

The fact that $\beta$ is small leads to $\sin(\pi + \beta) \cong -\beta$, $\cos(\pi + \beta) \cong -1$. Plugging them into the above equation yields

$$\beta \cong \frac{4M}{l}\,. \tag{9.2.33}$$
The above equation indicates that the deflection angle $\beta$ increases as $l$ decreases. The minimum value of $l$ is equal to the radius of the Sun. Plugging this into (9.2.33) as the value of $l$ [after adding the physical constants $G$ and $c$, (9.2.33) becomes $\beta \cong 4GM/lc^2$], we find $\beta = 1.75''$. This is the quantitative prediction of general relativity for the deflection angle of starlight. In order to verify this prediction by observation, we can try to photograph the apparent position of the star when the starlight is deflected by the Sun, and compare it with the actual position of the star photographed six months later (or earlier), when the Earth is on the other side of the Sun. However, it is not easy to observe the apparent position of a star, since the Sun is much closer to the Earth than the star we want to observe, and the starlight cannot be seen at all against the sunlight. (You cannot “watch the stars in the daytime”!) Then people came up with the idea of using a total solar eclipse. During a total solar eclipse, the sunlight is blocked by the Moon between the Sun and the Earth, but the light from a distant star can “bypass” the Sun and reach the Earth. Soon after World War I, two expedition teams set off from the United Kingdom to Brazil and Africa to observe the total solar eclipse on May 29th, 1919. The observational results of the two teams were, respectively, $1.13 \pm 0.07$ times and $0.92 \pm 0.17$ times the theoretical prediction, which was considered important support for the theory. The announcement in many European and American newspapers attracted the attention of the war-weary public, and also made Einstein famous. However, Einstein responded quite calmly to this. He believed in his own theory so much (based on its elegance and internal self-consistency) that, when asked what he would have done if the observations had not agreed with the theory, he once replied “Then I would feel sorry for the dear Lord.” Note that Newton’s theory of gravity can also predict the bending of starlight by the Sun, but the deflection angle is half of the value predicted by general relativity. The results of the 1919 UK teams indeed favored Einstein over Newton, but they were not of high precision. Although there were a few more observations of total solar eclipses in the next several decades, which continued to give mild support to general relativity, the improvement in accuracy was very small for various reasons, especially the weather. In modern times, with technological advances, people can test the bending of distant quasar light by Jupiter and also the bending of radio waves, and the measurements are now considerably more precise.³ Therefore, we now have high-precision results for light deflection which provide strong support for general relativity [for details see Will (2018)].

Fig. 9.6 The spatial trajectory of a photon in flat spacetime
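The number $1.75''$ can be reproduced from (9.2.33) once $G$ and $c$ are restored. The sketch below is only an illustration; the solar data are rounded reference values assumed here, and the Newtonian figure is simply taken as half of the general-relativistic one, as stated in the text.

```python
import math

# Rounded reference data (assumed for this sketch, not taken from the text).
GM_SUN_OVER_C2 = 1476.6     # G*M_sun/c^2 in metres
R_SUN = 6.957e8             # radius of the Sun, m (smallest possible l)
ARCSEC_PER_RAD = 180.0 * 3600.0 / math.pi


def deflection_angle(m_geom, l):
    """Deflection angle beta = 4GM/(l c^2) in radians, from (9.2.33)."""
    return 4.0 * m_geom / l


beta_gr = deflection_angle(GM_SUN_OVER_C2, R_SUN) * ARCSEC_PER_RAD
beta_newton = 0.5 * beta_gr   # Newtonian value is half of the GR prediction

print(f"general relativity: {beta_gr:.2f} arcsec")
print(f"Newtonian theory:   {beta_newton:.2f} arcsec")
```

Because $\beta \propto 1/l$, a ray passing at two solar radii is deflected by half as much, which is why stars close to the solar limb are the most useful targets during an eclipse.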

³ For example, an analysis based on the very-long-baseline interferometry (VLBI) database gives a result which is $0.99983 \pm 0.00045$ times the predicted value of general relativity, where the standard error is reduced to $4.5 \times 10^{-4}$; see Shapiro et al. (2004).

Fig. 9.7 The Sun warps the spacetime, and the spatial trajectory of the starlight is deflected, where the deflection angle is obviously exaggerated

9.3 Spherical Stars and Their Evolution

9.3.1 Interior Solutions for Static Spherical Stars

In this subsection we discuss the interior spacetime metric and internal states of a static spherically symmetric star. The matter field inside a star can be regarded rather precisely as a perfect fluid, whose energy-momentum tensor is

$$T_{ab} = (\rho + p)\,U_a U_b + p\,g_{ab}\,. \tag{9.3.1}$$

The star being static means that every comoving observer inside the star can be regarded as a static observer, whose 4-velocity $U^a$ is parallel to the static Killing vector field $\xi^a = (\partial/\partial t)^a$. Again choosing the Schwarzschild coordinate system, the line element can still be expressed by (8.3.2). From $U^a U_a = -1$ and $\xi^a \xi_a = g_{00} = -e^{2A}$, we get

$$U^a = e^{-A}(\partial/\partial t)^a \quad \text{and} \quad U_a = -e^{A}(dt)_a\,, \tag{9.3.2}$$

and hence it follows from (9.3.1) that the nonvanishing components of $T_{ab}$ are as follows:

$$T_{00} = T_{ab}(\partial/\partial t)^a(\partial/\partial t)^b = \rho e^{2A}\,, \qquad T_{11} = T_{ab}(\partial/\partial r)^a(\partial/\partial r)^b = p e^{2B}\,,$$
$$T_{22} = T_{ab}(\partial/\partial\theta)^a(\partial/\partial\theta)^b = p r^2\,, \qquad T_{33} = T_{ab}(\partial/\partial\varphi)^a(\partial/\partial\varphi)^b = p r^2\sin^2\theta\,.$$

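The Einstein equations $R_{\mu\nu} - R g_{\mu\nu}/2 = 8\pi T_{\mu\nu}$ can be rewritten as $R^{\mu}{}_{\nu} - R\,\delta^{\mu}{}_{\nu}/2 = 8\pi T^{\mu}{}_{\nu}$. Noticing that $g^{\mu\nu}$ has only diagonal components, we obtain

$$T^{0}{}_{0} = g^{00}T_{00} = -\rho\,, \quad T^{1}{}_{1} = g^{11}T_{11} = p\,, \quad T^{2}{}_{2} = g^{22}T_{22} = p\,, \quad T^{3}{}_{3} = g^{33}T_{33} = p\,. \tag{9.3.3}$$
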
On the other hand, from the nonzero $R_{\mu\nu}$ in (8.3.5)–(8.3.8), we can see that

$$R = 2e^{-2B}\left[-A'' + A'B' - A'^2 + 2r^{-1}(B' - A') - r^{-2}\right] + 2r^{-2}\,,$$

and further find that the nonzero $R^{\mu}{}_{\nu} - R\,\delta^{\mu}{}_{\nu}/2$ are as follows:

$$R^{0}{}_{0} - \frac{1}{2}R\,\delta^{0}{}_{0} = -e^{-2B}\left(2B'r^{-1} - r^{-2}\right) - r^{-2}\,,$$
$$R^{1}{}_{1} - \frac{1}{2}R\,\delta^{1}{}_{1} = e^{-2B}\left(2A'r^{-1} + r^{-2}\right) - r^{-2}\,,$$
$$R^{2}{}_{2} - \frac{1}{2}R\,\delta^{2}{}_{2} = e^{-2B}\left[A'' - A'B' + A'^2 + (A' - B')r^{-1}\right],$$
$$R^{3}{}_{3} - \frac{1}{2}R\,\delta^{3}{}_{3} = e^{-2B}\left[A'' - A'B' + A'^2 + (A' - B')r^{-1}\right].$$

Plugging these together with (9.3.3) into $R^{\mu}{}_{\nu} - R\,\delta^{\mu}{}_{\nu}/2 = 8\pi T^{\mu}{}_{\nu}$, we see that there are only 3 independent equations as follows:

$$-8\pi\rho = -e^{-2B}\left(2B'r^{-1} - r^{-2}\right) - r^{-2}\,, \tag{9.3.4}$$

$$8\pi p = e^{-2B}\left(2A'r^{-1} + r^{-2}\right) - r^{-2}\,, \tag{9.3.5}$$

$$8\pi p = e^{-2B}\left[A'' - A'B' + A'^2 + (A' - B')r^{-1}\right]. \tag{9.3.6}$$

Equation (9.3.4) can be rewritten as

$$8\pi\rho r^2 = 2r e^{-2B}B' - e^{-2B} + 1 = 1 - \frac{d}{dr}\left(r e^{-2B}\right),$$

and hence by integration we get

$$r e^{-2B(r)} = r - 2m(r) + C\,, \tag{9.3.7}$$

where $C$ is the constant of integration, and the function $m(r)$ is defined as

$$m(r) := 4\pi\int_0^r \rho(x)\,x^2\,dx\,. \tag{9.3.8}$$

If $C \neq 0$, then from (9.3.7) and (9.3.8) we can see that $e^{-2B} \to \infty$ when $r \to 0$; however, $e^{-2B} = g^{11}$, and it is unreasonable to have $g^{11} = \infty$ at the center ($r = 0$) of the star, and hence $C = 0$. Thus, it follows from (9.3.7) that

$$g_{11}(r) = e^{2B(r)} = \left[1 - \frac{2m(r)}{r}\right]^{-1}. \tag{9.3.9}$$

Suppose the radius of the star is $R$, then when $r > R$ the metric should be the vacuum Schwarzschild solution [see (8.3.18)]. The interior metric and exterior metric should be continuous at the surface ($r = R$) of the star. Plugging $r = R$ into (9.3.9) yields

$$g_{11}(R) = \left[1 - \frac{2m(R)}{R}\right]^{-1}.$$

On the other hand, it follows from the vacuum Schwarzschild solution that

$$g_{11}(R) = \left(1 - \frac{2M}{R}\right)^{-1}.$$

Comparing these two equations and using (9.3.8) we obtain

$$M = m(R) = 4\pi\int_0^R \rho(r)\,r^2\,dr\,. \tag{9.3.10}$$

[Optional Reading 9.3.1]

It seems that (9.3.10) is the same as the relation between the mass $M$ and density $\rho(r)$ of a star in Newtonian mechanics. However, things are not as simple as that. The space $\Sigma_t$ at a time $t$ of a static reference frame has a non-Euclidean geometry $h_{ab}$ (similar to the discussion at the end of Sect. 8.3.2), whose 3-dimensional proper volume element (i.e., the volume element associated with $h_{ab}$) is

$$\varepsilon = \sqrt{h}\;dr\wedge d\theta\wedge d\varphi = \left[1 - \frac{2m(r)}{r}\right]^{-1/2} r^2\sin\theta\;dr\wedge d\theta\wedge d\varphi\,.$$

Therefore, when calculating the integral one cannot use $r^2\sin\theta\,dr\wedge d\theta\wedge d\varphi$ as the volume element as in 3-dimensional Euclidean space. However, the $M$ in (9.3.10) is the result of integrating $\rho(r)$ with $r^2\sin\theta\,dr\wedge d\theta\wedge d\varphi$ as the volume element. From the mathematical perspective, the integral (9.3.10) in the 3-dimensional non-Euclidean space $(\Sigma_t, h_{ab})$ with $r^2\sin\theta\,dr\wedge d\theta\wedge d\varphi$ as the volume element is somewhat strange, but this does not mean that the $M$ in (9.3.10) is a weird quantity. In fact, as the only parameter of the Schwarzschild solution, the physical meaning of $M$ is crystal clear: it is the total mass (total energy) of Schwarzschild spacetime, which includes the gravitational potential energy (see Chap. 12 for details). However, $\rho(r)$ is the energy density obtained from the local measurement made by a static observer inside the star, which contains the rest energy density of the particles (mainly the nuclei) and the internal energy (heat, pressure, etc.) density, but not the gravitational potential energy. This is similar to the discussion about the difference between $E$ and $E_{\text{local}}$ in Sect. 9.1: the result of a local measurement made by an observer does not contain the energy contribution from the gravitational field. Therefore, the $M$ including gravitational potential energy is surely not equal to the integral $\int\rho(r)\,\varepsilon$, since the latter does not include the contributions from the gravitational field. Note particularly that

$$\int\rho(r)\,\varepsilon = \int\rho(r)\left[1 - \frac{2m(r)}{r}\right]^{-1/2} r^2\sin\theta\;dr\wedge d\theta\wedge d\varphi = 4\pi\int_0^R\rho(r)\left[1 - \frac{2m(r)}{r}\right]^{-1/2} r^2\,dr \;\neq\; 4\pi\int_0^R\rho(r)\,r^2\,dr = M\,.$$

The fact that $\rho(r)$ does not contain the contribution from the gravitational field is closely related to another fact, namely that the gravitational field energy is non-local. To put it simply, the so-called non-locality of the gravitational field energy means that the energy density of the gravitational field is meaningless: there does not exist a quantity which can be reasonably interpreted as the energy density of the gravitational field (NB: compare with the fact that the energy density of an electromagnetic field has a clear meaning and an explicit expression); see Chap. 12 for details. However, the non-locality of the gravitational field energy does not indicate that the gravitational field itself has no energy. An important result people found after a long and tortuous path of study is: for an asymptotically flat spacetime (physically corresponding to an isolated gravitational system), one can always define the notion of total energy, which contains all the energy contributions including that of the gravitational field. Applying this definition to Schwarzschild spacetime with a parameter $M$, one finds that $M$ is exactly the total energy of this spacetime (as an asymptotically flat spacetime).
[The End of Optional Reading 9.3.1]

Plugging (9.3.9) into (9.3.5) yields

$$\frac{dA}{dr} = \frac{m(r) + 4\pi p r^3}{r\,[r - 2m(r)]}\,. \tag{9.3.11}$$

Under the Newtonian approximation, ① the 3-dimensional space of a static reference frame can be approximately regarded as Euclidean space, with $1 \cong g_{11} = [1 - 2m(r)/r]^{-1}$, and hence $m(r) \ll r$; ② $p \ll \rho$ leads to $p r^3 \ll \rho r^3 \sim m(r)$, and thus (9.3.11) can be approximated as

$$\frac{dA}{dr} \cong \frac{m(r)}{r^2}\,. \tag{9.3.12}$$

Since the Newtonian gravitational potential $\phi$ with spherical symmetry satisfies

$$\frac{d\phi}{dr} = \frac{m(r)}{r^2}\,, \tag{9.3.13}$$

we can see that $A$ is, in a sense, the quantity corresponding to the Newtonian gravitational potential in a static spherically symmetric curved spacetime. Equation (9.3.13) is actually a manifestation of the Poisson equation $\nabla^2\phi = 4\pi\rho$ in Newton’s theory of gravity in the spherically symmetric case. When we have spherical symmetry, $\nabla^2\phi = 4\pi\rho$ becomes

$$\frac{1}{r^2}\frac{d}{dr}\left(r^2\frac{d\phi}{dr}\right) = 4\pi\rho\,.$$

Integrating this we obtain

$$r^2\frac{d\phi}{dr} = 4\pi\int_0^r \rho(x)\,x^2\,dx = m(r)\,,$$

which is exactly (9.3.13).


Until now, among the 3 independent equations, the only one we have not dealt with is (9.3.6). By plugging (9.3.9) and (9.3.11) into (9.3.6), this equation is in principle solvable, but the calculation is quite complicated. The calculation can be simplified by means of the following fact (for proof see Optional Reading 9.3.2): under the premise that (9.3.4) and (9.3.5) are satisfied, (9.3.6) is equivalent to

$$(\partial/\partial r)^b\,\nabla^a T_{ab} = 0\,. \tag{9.3.6$'$}$$

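It follows from (9.3.1) that

$$\nabla^a T_{ab} = U_a U_b\,\nabla^a(\rho + p) + (\rho + p)\left(U^a\nabla_a U_b + U_b\nabla_a U^a\right) + \nabla_b p\,.$$

Noticing that $(\partial/\partial r)^b$ is orthogonal to $U^a$, we can see that (9.3.6′) is equivalent to

$$0 = (\partial/\partial r)^b\,\nabla^a T_{ab} = (\rho + p)(\partial/\partial r)^b\,U_a\nabla^a U_b + (\partial/\partial r)^b\,\nabla_b p\,. \tag{9.3.14}$$

Also,

$$(\partial/\partial r)^b\,\nabla_b p = dp/dr\,, \tag{9.3.15}$$

$$(\partial/\partial r)^b\,U_a\nabla^a U_b = -U_b U^a\nabla_a(\partial/\partial r)^b = e^{A}(dt)_b\,e^{-A}(\partial/\partial t)^a\,\nabla_a(\partial/\partial r)^b = (dt)_b\,\Gamma^{\sigma}{}_{10}\,(\partial/\partial x^{\sigma})^b = \Gamma^{0}{}_{10} = dA/dr\,,$$

where (5.7.2) (the equivalent definition of Christoffel symbols) is used in the third equality, and (8.3.4) is used in the fifth equality. Plugging the above equation and (9.3.15) into (9.3.14) yields

$$\frac{dp}{dr} = -(p + \rho)\frac{dA}{dr}\,. \tag{9.3.16}$$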
Then, using (9.3.11) we obtain

$$\frac{dp}{dr} = -(p + \rho)\,\frac{m(r) + 4\pi p r^3}{r\,[r - 2m(r)]}\,. \tag{9.3.17}$$

This is the famous Oppenheimer-Volkoff (OV) equation of hydrostatic equilibrium, whose Newtonian approximation is

$$\frac{dp}{dr} \cong -\frac{\rho\,m(r)}{r^2}\,, \tag{9.3.18}$$

where we used $p \ll \rho$, $m(r) \ll r$ and $p r^3 \ll \rho r^3 \sim m(r)$. The equation above is the well-known equation of hydrostatic equilibrium in Newtonian mechanics, which can be easily derived by means of Fig. 9.8. Due to the spherical symmetry, the gravitational force (self-gravity) from the star acting on any volume element $dV$ in a thin spherical shell points to the center of the sphere. On the other hand, $dV$ also experiences an outward force coming from the pressure gradient ($dp/dr < 0$), and the star is in hydrostatic equilibrium when this force balances the self-gravity [i.e., when (9.3.18) is satisfied].

Fig. 9.8 The pressure gradient keeps the force on the volume element $dV$ and the self-gravity in balance, from which (9.3.18) can be derived

In summary, the interior spacetime metric in a static spherically symmetric star is

$$ds^2 = -e^{2A(r)}\,dt^2 + \left[1 - \frac{2m(r)}{r}\right]^{-1} dr^2 + r^2\left(d\theta^2 + \sin^2\theta\,d\varphi^2\right), \tag{9.3.19}$$

where the function $m(r)$ is defined by (9.3.8), and the function $A(r)$ needs to satisfy (9.3.11). A necessary and sufficient condition for hydrostatic equilibrium is (9.3.17). The internal state of a spherical star is determined by the 4 functions $A(r)$, $m(r)$, $p(r)$ and $\rho(r)$, and there are only three equations they have to satisfy, namely (9.3.8), (9.3.11) and (9.3.17). In order to determine the internal state of a star, one also has to specify a fourth equation, called the equation of state. To put it simply, an equation of state is a relation between the energy density $\rho$ and the pressure $p$ of the form $f(\rho, p) = 0$ (where $f$ is a certain specific function).⁴ After an equation of state is specified, only 3 undetermined functions $A(r)$, $m(r)$ and $p(r)$ remain, which have to satisfy the differential equations (9.3.11), (9.3.17), and

$$\frac{dm(r)}{dr} = 4\pi\rho(r)\,r^2 \tag{9.3.20}$$

coming from (9.3.8). These 3 equations are all first-order differential equations, which have a unique solution once the initial conditions $A(0)$, $m(0)$ and $p(0)$ are given. It follows from (9.3.8) that $m(0) \equiv 0$, and thus $m(0)$ does not need to be (and cannot be arbitrarily) assigned. After $A(0)$ (to be adjusted later) and $p_0 \equiv p(0)$ are assigned, we can integrate the above-mentioned 3 differential equations from $r = 0$ to the point where $p = 0$. [As long as the equation of state satisfies the following reasonable requirement: for all $p \geqslant 0$, we have $\rho \geqslant 0$, the OV equation (9.3.17) assures automatically that the pressure decreases monotonically outwards.] The place where $p = 0$ is the surface of the star, whose corresponding value of $r$ is the radius $R$ of the star, and $m(R)$ is the total mass (energy) $M$ of the star (including the gravitational potential energy!). After obtaining $R$ we need to come back and modify the value of $A(0)$ (by adding a constant) so that it satisfies the condition at the surface of the star for matching the vacuum solution outside the sphere, i.e.,

$$e^{2A(R)} = 1 - \frac{2M}{R}\,. \tag{9.3.21}$$

⁴ Generally speaking, the pressure $p$ is not only a function of the density $\rho$, but also depends on the specific entropy (i.e., the average entropy per nucleus) and the chemical components of the star. Only when the specific entropy and chemical components are the same everywhere inside the star can $p$ be solely a function of $\rho$, and the equation of state be expressed as $f(p, \rho) = 0$. The specific entropy of a normal star (including the Sun) is not the same everywhere. However, the specific entropy inside a white dwarf or neutron star, which will be discussed later, can be considered as vanishing everywhere. The discussion in the main text is valid for the study of these “abnormal celestial bodies”.

Therefore, by assigning a value of p0 one can determine a set of functions A(r ),
m(r ) and p(r ), and the internal state and metric is then completely determined. For
an equation of state in the real world, the exact solutions of equations like (9.3.17)
are hard to be find, and thus a numerical method is used. However, for an idealized
equation of state, we can perform the integral analytically. The simplest and most
useful idealization is the following equation of state: ρ = constant. This is actually a
very special equation of state whose energy density ρ is independent of the pressure.
Although this is not a perfect model of a star, it can still be regarded as a first-order
approximation of a small star whose pressure is not that high. Then (9.3.8) becomes

$$m(r) = \frac{4\pi\rho r^3}{3}. \tag{9.3.22}$$
The equation above holds for both general relativity and Newton’s theory of gravity.
For Newton’s theory of gravity, (9.3.18) simplifies to $dp/dr = -\frac{4\pi}{3}\rho^2 r$ when
ρ is a constant. After the initial value $p_0$ is assigned, the unique solution is
$p(r) = -\frac{2}{3}\pi\rho^2 r^2 + p_0$, and the radius R of the star can be determined by p(R) = 0:

$$0 = p(R) = -\frac{2}{3}\pi\rho^2 R^2 + p_0.$$
Thus, p0 can then be expressed in terms of R:

$$p_0 = \frac{2}{3}\pi\rho^2 R^2, \tag{9.3.23}$$
and hence p(r ) can also be expressed in terms of R as

$$p(r) = \frac{2}{3}\pi\rho^2 (R^2 - r^2). \tag{9.3.24}$$
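As a quick numerical cross-check of (9.3.23) and (9.3.24), the following minimal sketch integrates the simplified Newtonian equation $dp/dr = -\frac{4\pi}{3}\rho^2 r$ outwards from the center and compares the radius at which the pressure vanishes with the analytic value. It works in geometrized units (G = 1); the function name `surface_radius_newton`, the step size and the sample values of ρ and $p_0$ are illustrative assumptions, not taken from the text.

```python
import math

def surface_radius_newton(rho, p0, dr=1e-4):
    """Integrate dp/dr = -(4*pi/3) * rho**2 * r outwards from r = 0 (G = 1).

    Returns the radius at which the pressure first reaches zero.
    """
    r, p = 0.0, p0
    while True:
        # Midpoint rule; exact here because the right-hand side is linear in r.
        dp = -(4.0 * math.pi / 3.0) * rho**2 * (r + 0.5 * dr) * dr
        if p + dp <= 0.0:
            # Linear interpolation inside the last step to locate p = 0.
            return r + dr * p / (-dp)
        r, p = r + dr, p + dp

def pressure_analytic(r, rho, R):
    """Equation (9.3.24): p(r) = (2/3) * pi * rho**2 * (R**2 - r**2)."""
    return (2.0 / 3.0) * math.pi * rho**2 * (R**2 - r**2)

rho = 1.0e-3   # illustrative uniform density (geometrized units)
p0 = 1.0e-5    # illustrative central pressure

R_numeric = surface_radius_newton(rho, p0)
R_analytic = math.sqrt(3.0 * p0 / (2.0 * math.pi * rho**2))  # (9.3.23) solved for R

print("R (numerical integration):", R_numeric)
print("R (from 9.3.23):          ", R_analytic)
print("p(0) from (9.3.24):       ", pressure_analytic(0.0, rho, R_analytic))
```

The two radii should agree to within the interpolation error of the last step, and (9.3.24) evaluated at r = 0 should return the assigned $p_0$.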
When Newton’s theory of gravity is not a good approximation, one needs to solve
the OV equation (9.3.17). The solution was found by Schwarzschild in 1916 (which is
why the metric inside a star with uniform density is called the interior Schwarzschild
solution):

$$p(r) = \rho\,\frac{(1 - 2M/R)^{1/2} - (1 - 2Mr^2/R^3)^{1/2}}{(1 - 2Mr^2/R^3)^{1/2} - 3(1 - 2M/R)^{1/2}}, \tag{9.3.25}$$

and the corresponding central pressure is



$$p_0 = p(0) = \rho\,\frac{1 - (1 - 2M/R)^{1/2}}{3(1 - 2M/R)^{1/2} - 1}. \tag{9.3.26}$$

It is not difficult to show that (9.3.26) approximately reduces to the Newtonian
result (9.3.23) when $R \gg M$ (Exercise 9.4).
Let $Y \equiv (1 - 2M/R)^{1/2}$; then (9.3.26) becomes

$$p_0 = \frac{\rho(1 - Y)}{3Y - 1}, \tag{9.3.27}$$

and it follows from $dp_0/dY < 0$ that $p_0$ increases as M/R increases. This is easy to
understand since if M is larger, the self-gravity will be stronger, and so the pressure
gradient for balancing the self-gravity will be greater, and the central pressure p0
will be greater when R is fixed. In contrast, if M is fixed and R is smaller, then for
the purpose of creating the pressure gradient we need, p0 has to be higher as well.
When M/R is so large that Y approaches 1/3, we have $p_0 \to \infty$, which indicates
that equilibrium cannot be maintained no matter how large the central pressure is.
Thus, the M/R of a static star with a uniform density has an upper limit, and from
Y = 1/3 we can see that this upper limit is

$$(M/R)_{\max} = 4/9. \tag{9.3.28}$$
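To see how the central pressure behaves between the Newtonian regime and the limit (9.3.28), the sketch below evaluates (9.3.27) for several values of the compactness M/R and compares it with the Newtonian value. The Newtonian expression used here, $p_0 = \rho M/(2R)$, is (9.3.23) rewritten with $M = 4\pi\rho R^3/3$. The function names and the sample compactness values are illustrative assumptions.

```python
import math

def central_pressure_gr(rho, compactness):
    """Equation (9.3.27): p0 = rho * (1 - Y) / (3Y - 1), Y = sqrt(1 - 2M/R).

    Only defined for 0 < M/R < 4/9, where 3Y - 1 > 0.
    """
    if not 0.0 < compactness < 4.0 / 9.0:
        raise ValueError("a static uniform-density star requires 0 < M/R < 4/9")
    y = math.sqrt(1.0 - 2.0 * compactness)
    return rho * (1.0 - y) / (3.0 * y - 1.0)

def central_pressure_newton(rho, compactness):
    """Equation (9.3.23) with M = 4*pi*rho*R**3/3, i.e. p0 = rho * (M/R) / 2."""
    return 0.5 * rho * compactness

rho = 1.0  # the ratio p0/rho is all that matters here
for c in (1e-6, 1e-2, 0.1, 0.3, 0.4, 0.44):
    gr = central_pressure_gr(rho, c)
    newton = central_pressure_newton(rho, c)
    print(f"M/R = {c:<8g} p0/rho (GR) = {gr:.6g}   GR/Newton = {gr / newton:.6g}")
```

For small M/R the ratio of the two central pressures should be close to 1, and it should grow without bound as M/R approaches 4/9.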

Of course, the M/R of a normal star is far smaller than this upper bound. To make
a numerical evaluation, we should restore the constant $G/c^2$, i.e., replace M by
$GM/c^2$. Take the Sun as an example: $GM_\odot/c^2 \cong 1.5$ km, $R_\odot \cong 7 \times 10^5$ km, and
hence

$$\frac{GM_\odot/c^2}{R_\odot} \cong 2 \times 10^{-6} \ll \frac{4}{9}.$$

It follows from (9.3.22) that $M = 4\pi\rho R^3/3$, and eliminating R by using (9.3.28)
yields

$$M_{\max} = \frac{4}{9}\,\frac{1}{\sqrt{3\pi\rho}}. \tag{9.3.29}$$

This is the maximum allowable mass of a star with uniform density ρ (note that there
is no maximum allowable mass in Newton’s theory of gravity). The existence of the
upper mass limit in general relativity is not a result specifically for a star with uniform
density. It can be proved that as long as one assumes $\rho(r) \geq 0$ and $d\rho/dr \leq 0$, the
mass of any spherically symmetric static star with any radius R cannot exceed 4R/9.
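The numbers quoted above can be reproduced with a few lines of code. The sketch below restores G and c in (9.3.28) and (9.3.29), so that the maximum mass reads $M_{\max} = \frac{4}{9}\, c^3 / \sqrt{3\pi G^3 \rho}$ in SI units. The rounded constants, the function names and the sample density are assumptions made for illustration.

```python
import math

G = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2 (rounded)
C = 2.998e8        # speed of light, m/s (rounded)
M_SUN = 1.989e30   # solar mass, kg (rounded)
R_SUN = 6.96e8     # solar radius, m (rounded)

def compactness(mass_kg, radius_m):
    """Dimensionless ratio G M / (c^2 R), i.e. M/R in geometrized units."""
    return G * mass_kg / (C**2 * radius_m)

def max_mass_uniform(rho_si):
    """Equation (9.3.29) with G and c restored; rho in kg/m^3, result in kg."""
    return (4.0 / 9.0) * C**3 / math.sqrt(3.0 * math.pi * G**3 * rho_si)

print("G M_sun / c^2 [km]:", G * M_SUN / C**2 / 1e3)
print("Compactness of the Sun:", compactness(M_SUN, R_SUN))

rho = 1.0e17  # illustrative density, of the order of nuclear density
m_max = max_mass_uniform(rho)
r_max = (3.0 * m_max / (4.0 * math.pi * rho)) ** (1.0 / 3.0)  # from M = 4*pi*rho*R^3/3
print("M_max / M_sun at this density:", m_max / M_SUN)
print("Compactness at M_max (should be 4/9):", compactness(m_max, r_max))
```

The last line is a consistency check: a uniform-density star with mass $M_{\max}$ and the corresponding radius should sit exactly at the bound (9.3.28).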
We mention in passing that, as we have emphasized in Sect. 8.10, when solving
Einstein’s equations, one should solve for the functions reflecting the matter field and
the components of the metric simultaneously. In Example 1 of Sect. 8.10 we have
pointed out that when the matter field is a perfect fluid, there are 16 undetermined
functions $g_{\mu\nu}(x)$, $\rho(x)$, $p(x)$, $U^\mu(x)$ and 16 equations to be solved. The discussion
in this section provides a specific example of that.

[Optional Reading 9.3.2]


Let $H_{ab} \equiv 8\pi T_{ab} - G_{ab}$; then $H_{ab}$ only has diagonal components in the coordinate
system $\{t, r, \theta, \varphi\}$. The fact that (9.3.4) and (9.3.5) hold is equivalent to $H_{00} = H_{11} = 0$. Now
we will show that (9.3.6) is equivalent to (9.3.6′) under the premise of (9.3.4) and (9.3.5).
Noticing that $\nabla^a G_{ab} = 0$, we have

$$8\pi (\partial/\partial r)^b \nabla^a T_{ab} = (\partial/\partial r)^b \nabla^a H_{ab} = \nabla^a[(\partial/\partial r)^b H_{ab}] - H_{ab}\nabla^a (\partial/\partial r)^b. \tag{9.3.30}$$

Since $(\partial/\partial r)^b H_{ab} = H_{a1} = H_{11}(dr)_a = 0$, the first term on the right-hand side of the above
equation vanishes. Let $\partial_a$ be the ordinary derivative operator of the coordinate system
$\{t, r, \theta, \varphi\}$; then the above equation becomes

$$8\pi (\partial/\partial r)^b \nabla^a T_{ab} = -H^a{}_b\left[\partial_a (\partial/\partial r)^b + \Gamma^b{}_{ac}(\partial/\partial r)^c\right] = -H^a{}_b \Gamma^b{}_{a1} = -(H^2{}_2 \Gamma^2{}_{21} + H^3{}_3 \Gamma^3{}_{31}) = -2H^2{}_2/r,$$

where in the last step we used $H^2{}_2 = H^3{}_3$ and $\Gamma^2{}_{21} = \Gamma^3{}_{31} = 1/r$. Thus, $H^2{}_2 = 0$ [i.e.,
(9.3.6)] is equivalent to $(\partial/\partial r)^b \nabla^a T_{ab} = 0$.
[The End of Optional Reading 9.3.2]

9.3.2 Stellar Evolution

In this subsection we introduce the formation and evolution of a spherically symmetric
star. When the density is not very high, the gravitational field is not that strong,
and Newton’s theory of gravity is approximately applicable. General relativity is
only necessary in the last part of this subsection. The predecessor of a star was a
cloud of gas (mainly hydrogen) of inhomogeneous density. Where the density is
higher, the gravity is stronger, which will attract more gas, and gradually forms a
spherically symmetric gas cloud. The gravitational force (self-gravity) from the gas
cloud acting on any volume element in any thin spherical shell (see Fig. 9.8) points
to the center of the sphere, and so the entire gas cloud will contract under the action
of self-gravity. This process transforms gravitational potential energy into heat,
and thus the temperature T keeps rising. According to the formula for the pressure
of a classical ideal gas

$$p = k_B n T \quad (k_B \text{ is the Boltzmann constant, } n \text{ is the number density}), \tag{9.3.31}$$

the pressure p in the gas cloud rises as T increases. Thus, the outward force on any thin
spherical layer caused by the pressure gradient d p/dr [see (9.3.18)] also increases
with T , and it seems that the contraction may stop when the temperature is high
enough. However, this is not possible without an energy source: since the temperature
of the gas cloud is higher than its surroundings, it keeps radiating energy outwards.
If the contraction stops, the temperature (and thus the pressure) will decrease, and
the pressure difference between two sides of the thin shell cannot counterbalance the
self-gravity. From the perspective of energy it also has to keep contracting, so that (a

Fig. 9.9 A rough sketch of the interior stellar core after the hydrogen in the core is burned up

part of) the gravitational potential energy keeps being converted into radiant energy.
After the gas cloud contracts slowly for a period of time, the temperature and the
density at the center are finally high enough to ignite a thermonuclear reaction. Near
the center (a central sphere called the stellar core), the hydrogen is transformed into
helium by thermonuclear fusion (which is the same reaction as in a hydrogen bomb
explosion), and at the same time releases a huge amount of energy. This supplements
the energy lost due to radiation (no need to rely on the gravitational potential energy
conversion), and so the gas cloud will reach equilibrium and no longer contract. At
this time, the gas cloud starts to become a star. The pressure gradient d p/dr at any
point in the gas cloud satisfies the stable equilibrium condition (9.3.18). The Sun is
an example of an ordinary star. It has spent about 4.5 billion years in this stable state
maintained by burning hydrogen into helium inside the stellar core, and can maintain
this state for about another 5 billion years. One day, all the hydrogen in the stellar
core will become helium, with only a thin layer of hydrogen around it still burning.
The situation inside the star is roughly sketched in Fig. 9.9.
When the temperature of the stellar core has not reached the level of igniting
helium nuclear fusion, the situation will be similar to the previous situation when it
has not reached the point required to ignite hydrogen: the helium ball contracts again
under the action of self-gravity and becomes hotter at the same time. This intensifies
the burning of hydrogen in the surrounding thin layer, which leads to the expansion
and cooling down of the outer part of the star, and turns it into a red giant. “Red” is
due to the decrease of the surface temperature, while “giant” comes from its inflated
size. The high temperature and density caused by the contraction of the helium sphere
may reach the level of the nuclear fusion reaction that ignites helium (burning helium
into carbon or oxygen), and the energy released will bring the stellar core to a stable
equilibrium again. The duration of this balance maintained by helium combustion is
much shorter than that of hydrogen combustion. When helium is burned into carbon
(or oxygen), the stellar core will contract again. The fate of a star in its later years
varies with its mass. For a star with a smaller mass (including the Sun), the contraction
of the stellar core cannot provide enough temperature for carbon to undergo nuclear
fusion, and thus it is no longer possible to maintain the equilibrium by nuclear energy.
Is there any power strong enough to counterbalance the self-gravity? There does not
exist such a power in classical physics. To prevent the contraction due to self-gravity,
there must be a sufficiently large pressure gradient [which is represented by (9.3.18)
in Newton’s theory of gravity, and (9.3.17) in general relativity].

A star is composed of hydrogen, helium and other elements. The high temperature
in the star puts these atoms in ionized states. According to classical physics, this
combination of ions and electrons can be regarded as an ideal gas. From (9.3.31) we
can see that a high temperature is required in order to obtain a high pressure for a
given density. Since the star keeps radiating energy, except for nuclear reactions, there
does not exist such a mechanism that can provide energy for maintaining the high
temperature. However, according to quantum physics, even a system at absolute zero
temperature may have a considerable pressure. Take an electron gas for example. In
classical physics, the average kinetic energy of the electrons is 3kB T /2; the average
kinetic energy vanishes when T = 0, and so all electrons are in a state with zero
energy. However, according to quantum physics, electrons are subject to the Pauli
exclusion principle, i.e., any energy state can be occupied by at most two electrons
(which have opposite spins and, hence, must be in different states). Therefore, when
T = 0, the electrons on the one hand must “squeeze” into a state with the lowest
possible energy; on the other hand, since each energy state can only be occupied
by two electrons, electrons must fill up all the states with energy values from
zero all the way up to a certain value $E_F$ (while all states with energy greater than
$E_F$ are empty). $E_F$ is called the Fermi energy, whose value increases as the density
increases. This indicates that even at absolute zero, the electrons in the electron gas
are not completely motionless as classical physics claims, they carry kinetic energy
that is not due to thermal motion (but due to the exclusion principle). This kind of
kinetic energy contributes to both pressure and energy density. An electron gas with
T = 0 is called a (completely) degenerate electron gas, and the pressure caused
by the above reasons is called the electron degeneracy pressure. At an ordinary
density, the Fermi energy E F is very small (for instance, the E F of the electron
gas in a common metal is only a few electronvolts), and the corresponding electron
degeneracy pressure is negligible. However, the degeneracy pressure will have a
considerable effect in the high density case. The high density caused by the second
contraction of the stellar core when the hydrogen and helium are burnt up gives the
electrons a rather high Fermi energy E F . Although the temperature T in the stellar
core is very high by the usual standard, due to the large $E_F$, we have $k_B T \ll E_F$, and
thus the contribution of electrons to the pressure p due to the thermal motion is much
smaller than that due to the kinetic energy of the electrons coming from the exclusion
principle and high E F . In this sense, it is not much different from the T = 0 case.
So at this time, the electrons in the star can be regarded as a degenerate electron gas,
whose degeneracy pressure may cancel the self-gravity, which will keep the star in
equilibrium and never contract. This kind of stable star supported by the electron
degeneracy pressure is called a white dwarf. “Dwarf” means that it is much smaller
than an ordinary star, and “white” refers to the high temperature at its surface.
Once an isolated star evolves into a white dwarf, there will be no important further
evolution anymore. Since the temperature is higher than its surroundings, it will
continuously radiate energy. Since there is no energy source, the radiation will cause
the star’s temperature to decrease until it is equal to that of the surroundings, and
so the star will no longer be visible (some literature refers to it as a “black dwarf”).
The existence of white dwarfs has been confirmed by astronomical observations

dim and distant. Sirius B is the first white dwarf discovered by humans. Intuitively,
the more massive a star is, the stronger the self-gravity it has; only a star with a
sufficiently small mass can be supported by electron degeneracy pressure and form
a white dwarf. S. Chandrasekhar first found the upper mass limit of a white dwarf,
$M_{\rm Ch} \cong 1.3 M_\odot$ [see Chandrasekhar (1939)]. This work along with his extraordinary
contribution to astrophysics earned him the Nobel Prize in Physics in 1983. Optional
Reading 9.3.3 will briefly introduce the derivation of the Chandrasekhar limit.
During its evolution, a star will eject matter which makes its mass decrease. We
say that a white dwarf satisfies $M < M_{\rm Ch}$, where M is the remaining mass. According
to estimates, any star with an initial mass less than about 6 to 8 $M_\odot$ will go through a red
giant phase, eject a large amount of matter and become a white dwarf with a mass
of around 0.5 to 0.6 $M_\odot$.
If $M > M_{\rm Ch}$, then the electron degeneracy pressure is not enough to maintain
the equilibrium of the star, and the nuclear fusion reactions inside the stellar core
will continue stage by stage until the core is burned into iron and nickel. These are the
most tightly bound nuclei (with the maximum average binding energy), so they do
not release energy by nuclear fusion. Hence, the stellar core contracts sharply under
the action of the self-gravity, and the density and temperature increase sharply. At
this time the self-gravity is very strong, the Newtonian approximation (9.3.18) is
no longer applicable, and (9.3.17) in general relativity must be used. For a given
ρ(r ) > 0, the right-hand side (absolute value) of (9.3.17) is always greater than that
of (9.3.18); thus, to achieve an equilibrium in general relativity a greater central
pressure is needed, and so the equilibrium is more difficult to achieve. At such a high
temperature and high density, high-energy photons can break the iron-nickel nuclei
into neutrons, protons, or light nuclei (photofission), and the electrons will also react
with protons (electron capture) and form neutrons and neutrinos (the latter will run
out of the star). Therefore, neutrons account for the vast majority in the stellar core.
Neutrons are also fermions, which also obey the Pauli exclusion principle. When
the nuclear density ($\sim 10^{17}$ kg·m$^{-3}$) is reached, the Fermi energy $E_F$ (divided by the
Boltzmann constant $k_B$) of the neutrons is much higher than the temperature T in
the star,⁵ and so it can be regarded as a degenerate neutron gas (i.e., $T \cong 0$), whose
degeneracy pressure may also counterbalance the self-gravity, making the star reach
a stable equilibrium. This kind of stable star supported by the neutron degeneracy
pressure is called a neutron star. Since the density inside a neutron star reaches or
even exceeds nuclear density, people’s understanding of the equation of state under
this kind of condition is far less accurate than that at lower densities, which makes it
rather difficult to calculate the maximum mass of a neutron star. Different sources
give different values, and one can only roughly say that the upper mass
limit of a neutron star is $2M_\odot$ (or 2 to 3 $M_\odot$). Since it reaches nuclear density, one
may consider a neutron star as a “super-large atomic nucleus”. A neutron star is
much smaller than a white dwarf. The typical radius of a neutron star is only on the
order of 10 km, whereas a white dwarf has a radius between about 3,000 and 20,000

5 A more precise statement is: since it releases a large amount of high-energy neutrinos, a few
seconds after the formation of the neutron star it has $E_F \gg k_B T$.

kilometers. A neutron star is a very special (and complex) celestial body, which exhibits
various “extreme” (abnormal) behaviors: a density up to nuclear density, an unusually
strong magnetic field (up to $10^{12}$ gauss), very rapid rotation (with frequency
from 1 Hz to nearly 1000 Hz), a high speed of sound close to the speed of light,
a superfluid interior, and so on. Even today, it is still difficult to understand thoroughly.
The first theoretical model of a neutron star was published by J. R. Oppenheimer
and G. M. Volkoff in 1939. Since their article did not provide any observable physical
effect, the study of neutron stars was neglected for 28 years. The existence of
neutron stars has been confirmed since the discovery of a pulsar in 1967. A pulsar is
a source of periodic electromagnetic pulse signals measured on Earth, with a
period of about 1 s or less. The only persuasive explanation is: this is a rotating neutron
star whose strong magnetic field on the surface leads to magnetic dipole radiation,
and the combination of orientation of the radiation and the rotation of the neutron
star lets the Earth receive an electromagnetic pulse signal (the electromagnetic pulse
of the pulsar discovered in 1967 is a radio pulse). Only neutron stars (with small
radius and strong surface gravity) can rotate at such a high angular velocity without
“falling apart”.
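The claim that only a very compact object can rotate this fast can be made semi-quantitative with a rough Newtonian estimate, which is an assumption of this sketch and is not derived in the text: matter at the equator stays bound only if $\Omega^2 R \leq GM/R^2$, which for a rotation period P gives a minimum mean density $\bar\rho \geq 3\pi/(G P^2)$. The function name and the sample periods below are illustrative.

```python
import math

G = 6.674e-11  # gravitational constant, m^3 kg^-1 s^-2 (rounded)

def min_mean_density(period_s):
    """Newtonian break-up estimate (an assumption of this sketch).

    Omega^2 R <= G M / R^2 with Omega = 2*pi/P and M = (4/3)*pi*R^3*rho_mean
    gives rho_mean >= 3*pi / (G * P^2). Result in kg/m^3.
    """
    return 3.0 * math.pi / (G * period_s**2)

for period in (1.0, 0.033, 0.001):  # seconds; illustrative rotation periods
    print(f"P = {period:g} s -> mean density >= {min_mean_density(period):.2e} kg/m^3")
```

Even a period of 1 s already requires a mean density far above that of an ordinary star, and millisecond periods push the requirement towards nuclear density.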
The stellar core contracts very sharply before it forms a neutron star, and thus
this process is called gravitational collapse. Once the rapidly collapsing stellar core
reaches a sufficient density and is stopped by the neutron degeneracy pressure, its
high energy will appear as an outward shock wave and blow off the outer material,
forming a supernova explosion with enormous energy. Pulsars have been found in
two famous supernova remnants, the Crab Nebula and the Vela supernova remnant,
which provides important support for the above-mentioned theory. Ancient Chinese
documents have extremely rich records of supernova explosions. For example,
Volume Nine of Zhi (Records) in Song Shi (History of Song) published in 1346
recorded the supernova (SN1054) observed in 1054 AD (during the Northern Song
Dynasty), a record particularly valued by modern astronomers worldwide [a photo
of one page of it can be found at the title page of Misner et al. (1973)]. The Crab
Nebula is exactly the remnant of SN1054. The most recently observed supernova
explosion visible to the naked eye on Earth was in 1987 (SN1987A). This supernova
is located in a neighboring galaxy of the Milky Way—the Large Magellanic Cloud,
which is about 160,000 light-years away from the Earth. The detailed mechanism of
the supernova explosion is still a subject being studied in depth.
If the mass of a spherically symmetric star is still greater than the upper mass
limit of a neutron star ($\sim 2M_\odot$) after ejecting matter, nothing can
prevent its gravitational collapse. Then, it will contract without any restriction into
a “singularity” with infinite density and curvature, and form a Schwarzschild black
hole (see Sect. 9.4).
[Optional Reading 9.3.3]
This optional reading introduces the derivation of the formula of electron degeneracy
pressure and the upper mass limit of a white dwarf. First we discuss the electron degeneracy
pressure. Suppose x, y, z are the spatial coordinates of an electron, and $k_x$, $k_y$, $k_z$ are the three
coordinate components of the electron’s momentum; then $\{x, y, z; k_x, k_y, k_z\}$ is a coordinate
system of the 6-dimensional phase space. A phase space can be divided into many quantum
phase cells $dx\,dy\,dz\,dk_x\,dk_y\,dk_z$ (each phase cell corresponds to an energy level), and the
volume of a phase cell is $h^3$ (where h is the Planck constant). Hence,

$$dx\,dy\,dz\,dk_x\,dk_y\,dk_z = h^3. \tag{9.3.32}$$

Fig. 9.10 Electrons inside an oblique cylinder that go through the area element dσ in a unit time, from which one can calculate the pressure at dσ

Let $k \equiv (k_x^2 + k_y^2 + k_z^2)^{1/2}$; then the points whose values of k are in the range $(k, k + dk)$
in the momentum space constitute a spherical shell with volume $4\pi k^2 dk$. Then, the points
in the phase space representing states whose position is in $dx\,dy\,dz$ and whose value of k is in
$(k, k + dk)$ constitute a shell with volume $4\pi k^2 dk\,dx\,dy\,dz$. Since the volume of each quantum
phase cell is $h^3$, there will be $4\pi k^2 dk\,dx\,dy\,dz/h^3$ phase cells in the shell. Since each cell
corresponds to an energy level, and each energy level is occupied by at most two electrons,
the number of electrons in a shell will not exceed $8\pi k^2 dk\,dx\,dy\,dz/h^3$. For a completely
degenerate electron gas with T = 0, each energy level with $E \leq E_F$ has two electrons, and
all the energy levels with $E > E_F$ are empty. Therefore, the number of electrons with their
values of k in $(k, k + dk)$ per unit volume, denoted by $f(k)dk$, satisfies

$$f(k)\,dk = \begin{cases} 8\pi k^2 dk/h^3, & k < k_F, \\ 0, & k > k_F, \end{cases} \tag{9.3.33}$$

where $k_F$ is the Fermi momentum corresponding to $E_F$. Hence, the number density of
electrons (the number of electrons per unit volume regardless of their momenta) is

$$n_e = \int_0^{k_F} \frac{8\pi k^2\,dk}{h^3} = \frac{8\pi}{3h^3} k_F^3. \tag{9.3.34}$$
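Inverting (9.3.34) gives $k_F = h\,(3n_e/8\pi)^{1/3}$, which makes it easy to compare the Fermi energy of an ordinary metal with that of a white-dwarf interior. The sketch below does this in SI units and takes the Fermi energy to be the kinetic energy $\sqrt{(k_F c)^2 + (m_e c^2)^2} - m_e c^2$. The electron densities (about $8.5 \times 10^{28}$ m$^{-3}$ for a metal such as copper, $3 \times 10^{35}$ m$^{-3}$ for a dense stellar core) and the temperature $10^7$ K are illustrative assumptions, as are the function names and the rounded constants.

```python
import math

H = 6.626e-34     # Planck constant, J s (rounded)
M_E = 9.109e-31   # electron mass, kg (rounded)
C = 2.998e8       # speed of light, m/s (rounded)
K_B = 1.381e-23   # Boltzmann constant, J/K (rounded)
EV = 1.602e-19    # joules per electronvolt (rounded)

def fermi_momentum(n_e):
    """Invert (9.3.34): k_F = h * (3 n_e / (8 pi))**(1/3), SI units."""
    return H * (3.0 * n_e / (8.0 * math.pi)) ** (1.0 / 3.0)

def fermi_kinetic_energy(n_e):
    """Kinetic energy at the Fermi surface, sqrt((k c)^2 + (m c^2)^2) - m c^2.

    Written as (k c)^2 / (E_total + m c^2) to avoid cancellation at low density.
    """
    kc = fermi_momentum(n_e) * C
    rest = M_E * C**2
    return kc**2 / (math.sqrt(kc**2 + rest**2) + rest)

n_metal = 8.5e28   # assumed conduction-electron density of a common metal, m^-3
n_dense = 3.0e35   # assumed electron density in a dense stellar core, m^-3
T_core = 1.0e7     # assumed core temperature, K

print("E_F (metal) [eV]:       ", fermi_kinetic_energy(n_metal) / EV)
print("E_F (dense core) [keV]: ", fermi_kinetic_energy(n_dense) / EV / 1e3)
print("k_B T (core) [keV]:     ", K_B * T_core / EV / 1e3)
```

With these assumed inputs the metal comes out at a few electronvolts, while in the dense core $E_F$ exceeds $k_B T$ by about two orders of magnitude, which is the sense in which the gas is degenerate.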
The predominant contribution to the mass density ρ in a star comes from the nuclei. Suppose
the number density and mass of nucleons are $n_N$ and $m_N$; then $\rho = n_N m_N$. Let $\mu \equiv n_N/n_e$ (for
a star with hydrogen burnt up, $\mu \cong 2$); then

$$\rho = \mu n_e m_N, \tag{9.3.35}$$
where $n_e$ is given by (9.3.34). To obtain the equation of state, one should also compute the
degeneracy pressure $p_{\rm de}$. Pressure is the normal force per unit area, i.e., the force that the matter
on the left side of an area element exerts on the matter on the right side, or the momentum
exchanged through the area element per unit time (as the definition of force is the rate
of change of momentum $d\mathbf{k}/dt$). This exchange of momentum is caused by the electrons
going across the area from left to right or the other way around (each electron carries a
certain momentum). Therefore, the pressure equals the vector sum of the momenta of the
electrons going through per unit area per unit time. Suppose dσ is an area element in the
internal space of the star, whose normal vector is $\mathbf{n}$ (see Fig. 9.10). First, consider an electron

with momentum $\mathbf{k}$ whose corresponding velocity is $\mathbf{u}$, i.e., $\mathbf{k} = (1 - u^2)^{-1/2} m_e \mathbf{u}$ (where
$u^2 \equiv \mathbf{u} \cdot \mathbf{u}$). Take an oblique cylinder with dσ as the base and u as the generatrix length,
whose generatrix is parallel to $\mathbf{k}$. Suppose θ is the angle between $\mathbf{n}$ and $\mathbf{k}$; then the volume
of the cylinder equals $u \cos\theta\, d\sigma$, and it follows from (9.3.33) that the number of electrons
inside the cylinder whose values of k are in $(k, k + dk)$ is $f(k) u \cos\theta\, d\sigma\, dk$. So far we have
not considered the direction of $\mathbf{k}$ yet. Take a sphere with a point q in the area element dσ as
the center. Divide the right hemisphere into many area elements, each of which corresponds
to a solid angle element, where the one with $\mathbf{k}$ as the axis is denoted by $d\Omega_k$. Since the
solid angle corresponding to the whole sphere is 4π, the number of electrons inside the
cylinder whose momenta are oriented within $d\Omega_k$ and whose magnitudes are in $(k, k + dk)$ is only
$f(k) u \cos\theta\, d\sigma\, dk\, d\Omega_k/4\pi$. These electrons (carrying momentum) go through dσ in
every unit of time, and thus the normal component of the total momentum of the electrons
going through dσ per unit time satisfying the above-mentioned conditions [① the magnitudes
are in $(k, k + dk)$; ② the directions are in $d\Omega_k$] is

$$f(k) u \cos\theta\, d\sigma\, dk\, \frac{d\Omega_k}{4\pi}\, k \cos\theta = f(k) u k \cos^2\theta\, d\sigma\, dk\, \frac{d\Omega_k}{4\pi}.$$
Hence, the total momentum of all electrons (regardless of the magnitudes and directions of
k) going through per unit area per unit time, i.e., the degeneracy pressure at dσ is

$$p_{\rm de} = \frac{1}{4\pi}\int_{\rm sphere} \cos^2\theta\, d\Omega_k \int_0^\infty f(k) u(k) k\, dk = \frac{8\pi}{3h^3}\int_0^{k_F} k^3 u(k)\, dk, \tag{9.3.36}$$

where (9.3.33) is used in the second equality. Using $k = (1 - u^2)^{-1/2} m_e u$, one can rewrite
(9.3.36) as

$$p_{\rm de} = \frac{8\pi}{3h^3}\int_0^{k_F} \frac{k^4\,dk}{(k^2 + m_e^2)^{1/2}}. \tag{9.3.37}$$
Then, using (9.3.34), one can rewrite (9.3.35) as

$$k_F = h\left(\frac{3\rho}{8\pi\mu m_N}\right)^{1/3}. \tag{9.3.38}$$

Plugging this into (9.3.37) yields the explicit expression for the equation of state. This
equation is quite complex, but some useful conclusions can be obtained by analyzing two
extreme cases. When $m_e \gg k_F$, $u_F \ll 1$, the motion of electrons can be described by Newtonian
mechanics, which is called the non-relativistic case; when $m_e \ll k_F$, $u_F \cong 1$, the motion
of electrons must be described by special relativity, which is called the ultra-relativistic
case. The non-relativistic condition $m_e \gg k_F$ and the ultra-relativistic condition $m_e \ll k_F$
can also be expressed as $\rho \ll \rho_C$ and $\rho \gg \rho_C$, respectively, where the critical density $\rho_C$
is defined by $m_e = k_F$, which can be found explicitly from (9.3.38) as

$$\rho_C = \frac{8\pi\mu m_N m_e^3}{3h^3}. \tag{9.3.39}$$

Rewriting this in SI units (restoring $c^3$) and plugging in the specific values (taking μ = 2), we obtain

$$\rho_C = \frac{8\pi\mu m_N m_e^3 c^3}{3h^3} \cong 2 \times 10^9\ \mathrm{kg \cdot m^{-3}},$$

and thus the critical density $\rho_C$ is about $2 \times 10^6$ times the density of water. For $\rho \ll \rho_C$
(non-relativistic case), (9.3.37) gives approximately

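$$p_{\rm de} = \frac{8\pi k_F^5}{15 h^3 m_e}. \tag{9.3.40}$$

Plugging in (9.3.38) along with the specific values in SI yields

$$p_{\rm de} = \frac{1}{20}\left(\frac{3}{\pi}\right)^{2/3} \frac{h^2}{m_e m_N^{5/3}} \left(\frac{\rho}{\mu}\right)^{5/3} = 10^7 \left(\frac{\rho}{\mu}\right)^{5/3} \text{ (SI)}. \tag{9.3.41}$$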

Under the ultra-relativistic condition, (9.3.37) gives approximately

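$$p_{\rm de} = \frac{2\pi k_F^4}{3h^3}. \tag{9.3.42}$$

Plugging (9.3.38) into the above equation, rewriting it in SI units (restoring c) and plugging in the
specific values, we obtain

$$p_{\rm de} = \left(\frac{3}{\pi}\right)^{1/3} \frac{hc}{8 m_N^{4/3}} \left(\frac{\rho}{\mu}\right)^{4/3} = 1.24 \times 10^{10} \left(\frac{\rho}{\mu}\right)^{4/3} \text{ (SI)}. \tag{9.3.43}$$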
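The numerical coefficients quoted in (9.3.41) and (9.3.43), and the critical density following (9.3.39), can be reproduced from the physical constants. The sketch below does so with rounded SI values; the constants, the use of the proton mass for $m_N$ and the function names are assumptions made for illustration. With these inputs the results should agree with the quoted values to within about one percent.

```python
import math

H = 6.626e-34     # Planck constant, J s (rounded)
C = 2.998e8       # speed of light, m/s (rounded)
M_E = 9.109e-31   # electron mass, kg (rounded)
M_N = 1.673e-27   # nucleon mass, kg (proton mass used as an assumption)

def critical_density(mu):
    """Equation (9.3.39) with c restored: 8*pi*mu*m_N*(m_e c)^3 / (3 h^3), kg/m^3."""
    return 8.0 * math.pi * mu * M_N * (M_E * C) ** 3 / (3.0 * H**3)

def coefficient_nonrelativistic():
    """Coefficient of (rho/mu)^(5/3) in (9.3.41), SI units."""
    return (1.0 / 20.0) * (3.0 / math.pi) ** (2.0 / 3.0) * H**2 / (M_E * M_N ** (5.0 / 3.0))

def coefficient_ultrarelativistic():
    """Coefficient of (rho/mu)^(4/3) in (9.3.43), SI units."""
    return (3.0 / math.pi) ** (1.0 / 3.0) * H * C / (8.0 * M_N ** (4.0 / 3.0))

print(f"rho_C (mu = 2):        {critical_density(2.0):.3e} kg/m^3")
print(f"coefficient, gamma=5/3: {coefficient_nonrelativistic():.3e}")
print(f"coefficient, gamma=4/3: {coefficient_ultrarelativistic():.3e}")
```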

Eqs. (9.3.41) and (9.3.43) can be unified as

$$p_{\rm de} = K\rho^\gamma, \quad K = \text{constant}, \quad \gamma = \text{constant}. \tag{9.3.44}$$
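The statement that the power law (9.3.44) holds only in the two limits can be checked numerically. The sketch below, which is not part of the original derivation, evaluates the integral in (9.3.37) with Simpson's rule in units where $m_e = 1$, writes $x \equiv k_F/m_e$, and uses $\rho \propto x^3$ from (9.3.38) to compute the effective exponent $d\ln p_{\rm de}/d\ln\rho = x^5 / [3 (x^2+1)^{1/2} I(x)]$, where $I(x) = \int_0^x t^4 (t^2+1)^{-1/2}\,dt$. The function names and sample values of x are illustrative assumptions.

```python
import math

def pressure_integral(x, n=2000):
    """Simpson's rule for I(x) = int_0^x t^4 / sqrt(t^2 + 1) dt (n must be even)."""
    h = x / n

    def f(t):
        return t**4 / math.sqrt(t * t + 1.0)

    total = f(0.0) + f(x)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(i * h)
    return total * h / 3.0

def effective_gamma(x):
    """d ln p / d ln rho, using rho ~ x^3 and dI/dx = x^4 / sqrt(x^2 + 1)."""
    return x**5 / (3.0 * math.sqrt(x * x + 1.0) * pressure_integral(x))

for x in (0.01, 0.1, 1.0, 10.0, 100.0):
    print(f"k_F/m_e = {x:<6g} effective gamma = {effective_gamma(x):.5f}")
```

The exponent should approach 5/3 for $k_F \ll m_e$ and 4/3 for $k_F \gg m_e$, and take intermediate values in between, where the gas is not a polytrope.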


A celestial body whose equation of state is (9.3.44) is called a polytrope. Under both of the
extreme cases, a star constituted of a degenerate electron gas is a polytrope (but in between
the two cases it is not), where for the non-relativistic case we have γ = 5/3, and for the
ultra-relativistic case we have γ = 4/3. If a star is a polytrope, we can rewrite the
hydrostatic equilibrium condition (9.3.18) using (9.3.8) as

$$\frac{d}{dr}\left[\frac{r^2}{\rho(r)}\frac{dp(r)}{dr}\right] = -4\pi\rho(r) r^2. \tag{9.3.45}$$

Starting from this equation, using some calculation techniques [see Weinberg (1972) pp. 308–
310], one finds the dependence of the radius R and the mass M of the star on the central
density ρ0 :
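$$R = a_\gamma \rho_0^{(\gamma - 2)/2}, \tag{9.3.46}$$
$$M = b_\gamma \rho_0^{(3\gamma - 4)/2}, \tag{9.3.47}$$

where the constants $a_\gamma$ and $b_\gamma$ depend on γ; for γ = 5/3 and γ = 4/3 they are, respectively (in SI units),

$$\begin{aligned} a_{5/3} &= 6.3 \times 10^8 \mu^{-5/6}, & b_{5/3} &= 1.7 \times 10^{26} \mu^{-5/2}, \\ a_{4/3} &= 5.3 \times 10^{10} \mu^{-2/3}, & b_{4/3} &= 11.6 \times 10^{30} \mu^{-2}. \end{aligned} \tag{9.3.48}$$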
Based on this we can further discuss white dwarfs. When the mass M of a star is small
enough, $\rho_0 \ll \rho_C$, then (9.3.41) is valid everywhere inside the star, and the electron gas
inside the whole star forms a polytrope with γ = 5/3. When the central degeneracy pressure
equals the central pressure for keeping equilibrium, the star will be in an equilibrium state.
The relation between the radius R and the mass M in equilibrium can be seen from (9.3.46),
(9.3.47) (with γ = 5/3) and (9.3.48) as

$$R \propto M^{-1/3}. \tag{9.3.49}$$
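Relation (9.3.49) follows by eliminating $\rho_0$ between (9.3.46) and (9.3.47): for γ = 5/3 one has $R = a_{5/3}\rho_0^{-1/6}$ and $M = b_{5/3}\rho_0^{1/2}$, so $R\,M^{1/3} = a_{5/3} b_{5/3}^{1/3}$ is independent of $\rho_0$. The sketch below tabulates R and M for a few central densities using the constants of (9.3.48). The function name, the choice μ = 2 and the sample densities (all well below $\rho_C$) are illustrative assumptions.

```python
M_SUN = 2.0e30  # solar mass in kg, rounded as in the text

def nonrelativistic_white_dwarf(rho0, mu=2.0):
    """Radius (m) and mass (kg) from (9.3.46)-(9.3.48) with gamma = 5/3.

    rho0 is the central density in kg/m^3 and should satisfy rho0 << rho_C.
    """
    a = 6.3e8 * mu ** (-5.0 / 6.0)
    b = 1.7e26 * mu ** (-5.0 / 2.0)
    radius = a * rho0 ** (-1.0 / 6.0)   # exponent (gamma - 2)/2 = -1/6
    mass = b * rho0 ** 0.5              # exponent (3*gamma - 4)/2 = 1/2
    return radius, mass

for rho0 in (1.0e6, 1.0e7, 1.0e8):
    radius, mass = nonrelativistic_white_dwarf(rho0)
    print(f"rho0 = {rho0:.0e} kg/m^3: R = {radius / 1e3:.0f} km, "
          f"M = {mass / M_SUN:.3f} M_sun, R*M^(1/3) = {radius * mass ** (1.0 / 3.0):.4e}")
```

The last column should be the same on every row, while R decreases as M increases.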

Thus, the radius of a white dwarf with γ = 5/3 decreases as the mass increases. This seems
to contradict our life experience and the experience from the planets; later we will give a
rough explanation of this. If (9.3.49) always holds, then the electron degeneracy pressure
can support stars of any mass, since one can always plug a value of M into (9.3.49) and
find a radius R of a star in equilibrium. However, when the mass M is sufficiently large,
the central pressure will be so large that the (special) relativistic effect of electrons has to
be considered. Then, the star can no longer be regarded as a polytrope with γ = 5/3, and
(9.3.49) no longer holds. In fact, since ρ0 increases as M increases, the electrons near the
center will be the first to reach the ultra-relativistic level. A spherical core which can be
regarded as a polytrope with γ = 4/3 will appear in the star, and then it will gradually
expand to the entire body. From (9.3.47)⁶ we can see that M is independent of $\rho_0$ when
γ = 4/3, which is quite different from the case where γ = 5/3. When the entire star can
be regarded as a polytrope with γ = 4/3, it follows from (9.3.48) that the value of this M
(denoted by MCh ) which is independent of ρ0 is

$$M_{\rm Ch} = b_{4/3} = 5.8 \times (2 \times 10^{30}) \mu^{-2} \text{ (SI)}.$$

Noticing that $M_\odot = 2 \times 10^{30}$ kg in SI units, we have

$$M_{\rm Ch} = \frac{5.8}{\mu^2} M_\odot. \tag{9.3.50}$$
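The value of $b_{4/3}$ can be reproduced with the standard Lane-Emden treatment of a polytrope, a technique that the text does not spell out (it refers to Weinberg for the calculation). Writing $\rho = \rho_0\theta^n$ with $\gamma = 1 + 1/n$, the equilibrium equation (9.3.45) becomes $\theta'' + (2/\xi)\theta' + \theta^n = 0$, and for n = 3 the mass is $M = 4\pi\,(K/\pi G)^{3/2}\,[-\xi_1^2\theta'(\xi_1)]$, independent of $\rho_0$. The sketch below integrates this equation with a hand-written fourth-order Runge-Kutta scheme, taking K from (9.3.43). The function names, step size and rounded constants are assumptions.

```python
import math

G = 6.674e-11    # gravitational constant, m^3 kg^-1 s^-2 (rounded)
M_SUN = 2.0e30   # solar mass in kg, rounded as in the text
K_UR = 1.24e10   # coefficient of (rho/mu)^(4/3) in (9.3.43), SI units

def lane_emden_mass_factor(n=3.0, h=1e-3):
    """Integrate theta'' + (2/xi) theta' + theta^n = 0 with RK4.

    Returns (xi1, -xi1^2 * theta'(xi1)), where xi1 is the first zero of theta.
    """
    def rhs(xi, theta, dtheta):
        return dtheta, -max(theta, 0.0) ** n - 2.0 * dtheta / xi

    xi = 1e-2  # start slightly off-centre using the series expansion of theta
    theta = 1.0 - xi**2 / 6.0 + n * xi**4 / 120.0
    dtheta = -xi / 3.0 + n * xi**3 / 30.0
    while True:
        k1 = rhs(xi, theta, dtheta)
        k2 = rhs(xi + h / 2, theta + h / 2 * k1[0], dtheta + h / 2 * k1[1])
        k3 = rhs(xi + h / 2, theta + h / 2 * k2[0], dtheta + h / 2 * k2[1])
        k4 = rhs(xi + h, theta + h * k3[0], dtheta + h * k3[1])
        new_theta = theta + h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        new_dtheta = dtheta + h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        if new_theta <= 0.0:
            # Linear interpolation inside the last step to locate the surface.
            frac = theta / (theta - new_theta)
            xi1 = xi + frac * h
            d1 = dtheta + frac * (new_dtheta - dtheta)
            return xi1, -xi1**2 * d1
        xi, theta, dtheta = xi + h, new_theta, new_dtheta

def chandrasekhar_mass(mu):
    """Mass (kg) of a gamma = 4/3 polytrope with p = K_UR * (rho/mu)^(4/3)."""
    _, factor = lane_emden_mass_factor(3.0)
    k = K_UR * mu ** (-4.0 / 3.0)
    return 4.0 * math.pi * (k / (math.pi * G)) ** 1.5 * factor

xi1, factor = lane_emden_mass_factor(3.0)
print("xi_1 =", xi1, "  -xi_1^2 theta'(xi_1) =", factor)
print("M_Ch / M_sun for mu = 2:", chandrasekhar_mass(2.0) / M_SUN)
```

With these inputs the result for μ = 2 should lie close to the value $5.8/\mu^2 \cong 1.45$ solar masses given by (9.3.50).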
Equation (9.3.47) is derived under the condition of hydrostatic equilibrium. If the mass is
larger than $M_{\rm Ch}$, the star cannot be in equilibrium. In fact, from (9.3.49) and (9.3.44) we can
see that the conclusion above can be interpreted as follows: as a rough estimate, we assume
that the star has a uniform density, so that $\rho \propto M R^{-3}$; then it follows from (9.3.23) that the
central pressure for keeping the equilibrium is

$$p_{\rm grav} \propto M^2 R^{-4}, \tag{9.3.51}$$

where $p_0$ is now denoted as $p_{\rm grav}$ to emphasize that this is the central pressure for counterbalancing
the self-gravity. It follows from (9.3.44) that the degeneracy pressure provided by
the degenerate electron gas is $p_{\rm de} \propto M^\gamma R^{-3\gamma}$, and hence

$$\frac{p_{\rm grav}}{p_{\rm de}} \propto M^{2-\gamma} R^{3\gamma - 4} = \left\{\begin{array}{lll} M^{1/3} R, & \text{for } \gamma = 5/3, & \quad (9.3.52\text{a}) \\ M^{2/3}, & \text{for } \gamma = 4/3. & \quad (9.3.52\text{b}) \end{array}\right.$$
Suppose the electron gas in the star is in the non-relativistic case (γ = 5/3) and $M < M_{\rm Ch}$;
then from (9.3.52a) we know that there exists an R such that $p_{\rm grav}/p_{\rm de} = 1$, and the star is in
equilibrium when its radius equals this value of R. If M increases slightly, then $p_{\rm grav}/p_{\rm de} > 1$,
i.e., the self-gravity is slightly larger than the degeneracy pressure, and the star will
contract to a smaller radius to reach equilibrium again. (This can be considered as a specific
interpretation of the conclusion “a white dwarf with a greater mass has a smaller radius”.)
However, if M is so large that the entire star has γ = 4/3, then it follows from (9.3.52b)
that $p_{\rm grav}/p_{\rm de}$ is independent of R. Under this extreme circumstance, only when M equals a
suitable value $M_{\rm Ch}$ can the star be in equilibrium. If $M < M_{\rm Ch}$, then $p_{\rm grav}/p_{\rm de} < 1$, i.e., the
degeneracy pressure is greater than the self-gravity, and R will increase until the star leaves the
ultra-relativistic regime. In contrast, if $M > M_{\rm Ch}$, then $p_{\rm grav}/p_{\rm de} > 1$, and the star will contract,
which makes γ closer to exactly 4/3. Then $p_{\rm grav}/p_{\rm de}$ will not change as R decreases,
and hence the star can only continue contracting and cannot reach equilibrium under the
support of the electron degeneracy pressure. Thus, $M_{\rm Ch}$ is indeed the upper mass limit of a
white dwarf (whose defining feature is that the electron degeneracy pressure maintains the equilibrium).

6 Here we still have $p \ll \rho$ and $m(r) \ll r$, and hence the Newtonian equation (9.3.18) and (9.3.45)–
(9.3.48) derived from it are still applicable.

Since the interior of a white dwarf is mostly helium, carbon or oxygen, one can take μ = 2 in
(9.3.50) and obtain $M_{\rm Ch} = 1.45 M_\odot$. The discussion above is just a simplified version. Some
more precise discussions and calculations give an $M_{\rm Ch}$ slightly smaller than this value, such
as $M_{\rm Ch} = 1.3 M_\odot$.
[The End of Optional Reading 9.3.3]

9.4 The Kruskal Extension and Schwarzschild Black Holes

The line element of the vacuum Schwarzschild metric in the Schwarzschild coordinate
system is

$$
ds^2 = -\left(1 - \frac{2M}{r}\right) dt^2 + \left(1 - \frac{2M}{r}\right)^{-1} dr^2 + r^2 \left(d\theta^2 + \sin^2\theta\, d\varphi^2\right) . \tag{9.4.1}
$$

When r = 2M, g11 = ∞ (singular); when r = 0, both g00 and g11 are singular. These
places where gμν is singular (or degenerate) are called singularities. Note that the
word “singularity” may be used to refer to both the property of being singular and
the place that has the singularity.7 There are two reasons accounting for the appear-
ance of a singularity: ① the metric tensor gab is well-behaved at this place, just
some components are not well-behaved in certain coordinate systems; this is called
a coordinate singularity, which can be removed by choosing a suitable coordinate
system; ② The metric tensor gab itself is ill-behaved (singular) at this place; this is
called a true singularity or spacetime singularity, which is really a thorny prob-
lem in general relativity. Later we will see that the singularity at r = 2M is only
a coordinate singularity, and a spacetime singularity exists only at r = 0. Denoting
$r_{\rm S} \equiv 2M$ (called the Schwarzschild radius) and restoring the constants G and c, we
get

$$
r_{\rm S} = \frac{2GM}{c^2} \cong 3\,\frac{M}{M_\odot}\ \text{(km)} .
$$

For the Sun, $r_{\rm S} \cong 3$ km, which is far less than its radius. Since the external
Schwarzschild solution does not apply to the interior of the Sun, there is no
singularity problem for it (or for any normal celestial body). However, for a spherically
symmetric star which experiences gravitational collapse and turns into a black hole
(Birkhoff’s theorem assures that the external spacetime geometry is described by the
Schwarzschild metric), the singularity problem is of great significance.
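
The estimate $r_{\rm S} \cong 3$ km per solar mass can be checked numerically. The following is a minimal sketch, assuming rounded SI values for G, c and the solar mass (these numbers and the function name `schwarzschild_radius_km` are not from the text).

```python
# Assumed rounded SI constants (not taken from the text).
G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
C = 2.998e8          # speed of light, m/s
M_SUN = 1.989e30     # solar mass, kg


def schwarzschild_radius_km(mass_kg):
    """Return r_S = 2GM/c^2 in kilometres for a mass given in kilograms."""
    return 2.0 * G * mass_kg / C**2 / 1000.0


r_s_sun = schwarzschild_radius_km(M_SUN)        # roughly 3 km
r_s_ten = schwarzschild_radius_km(10 * M_SUN)   # scales linearly with the mass
print(r_s_sun, r_s_ten)
```

Because $r_{\rm S}$ is linear in M, the value for any other mass follows by multiplying the solar value by $M/M_\odot$.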

7 Also note that a singularity may not be a point, since in the 4-dimensional language r = 0 (or
r = 2M) represents a hypersurface instead of a point.

9.4.1 The Definition of a Spacetime Singularity

The concept of singularity is closely related to the divergence of physical quantities,
which existed long before general relativity appeared. However, the problem
of spacetime singularities in general relativity is much more troublesome than the
singularity problem in any other physical theory (even the definition is more
troublesome). The key point is that, in any theory that is not generally covariant, a
background spacetime is given beforehand (e.g., Minkowski spacetime). Wherever
the physical field we care about is divergent (or undefined), we say that this is a
singularity of the physical field. For instance, in the 3-dimensional language, the
electrostatic field strength $\vec E = Q\,\vec r/4\pi r^3$ of a point charge is undefined at r = 0
(i.e., $|\vec E| \to \infty$ when r → 0), and thus we say that this point is the singularity of the
electrostatic field $\vec E$, or say that $\vec E$ is singular at r = 0. However, things are different
in general relativity. Since what we care about is the spacetime singularity, i.e., the
singularity of the metric, the metric field plays a double role of both the background
field and the physical field in this problem (it acts as both the stage and the actor).
Following the definition of the singularity in an electrostatic field, it seems that one
can define a spacetime singularity as follows: “a spacetime (M, gab ) is said to be
singular if ∃ p ∈ M such that gab is undefined (or divergent) at p, and the point p is
called a spacetime singularity.” However, the spacetime itself, by definition, is a 4-
dimensional manifold M equipped with a Lorentzian metric, and at each point of M
the metric gab should be not only well-defined, but also well-behaved, such as being
continuous and differentiable to a certain order. If there exists a point p in M
such that $g_{ab}|_p$ is undefined, then p does not even belong to the spacetime (it is not a valid
spacetime point), and thus the actual spacetime is $(M', g_{ab})$, where $M'$ is the result of
eliminating the point p, i.e., $M' = M - \{p\}$. For example, if we use $(M, g_{ab})$ to
represent Schwarzschild spacetime, then it should be made clear that all the points with
r = 0 do not belong to M. It seems that the definition of spacetime singularity can be
modified as: “a spacetime is said to be singular if some region of it is eliminated.”
But the problem is how to determine whether “one or some regions are eliminated”.
Here we introduce an ingenious method. Take Minkowski spacetime as an example.
Suppose γ(λ) is an arbitrary inextendible geodesic in $(\mathbb{R}^4, \eta_{ab})$ (meaning both ends
have been extended until they cannot be extended anymore), then its affine parameter
λ takes values from −∞ to +∞ [i.e., $\gamma(\lambda) \in \mathbb{R}^4$, ∀λ ∈ (−∞, ∞)]. If we eliminate
a point p of γ(λ), there will be a “hole” left in $\mathbb{R}^4$, making it $M \equiv \mathbb{R}^4 - \{p\}$, which
splits γ(λ) into two geodesics γ′(λ) and γ″(λ), whose affine parameter λ has ranges
(the domains of the curve maps γ′ and γ″) $(-\infty, \lambda_p)$ and $(\lambda_p, \infty)$, respectively. We
refer to both γ′(λ) and γ″(λ) as incomplete geodesics. Generally speaking, an
inextendible geodesic in $(M, g_{ab})$ is called an incomplete geodesic if the range of its
affine parameter is not (−∞, ∞). The existence of an incomplete geodesic, to some
extent, can be regarded as the sign of some region in spacetime being eliminated (and
thus there is a “hole”). Hence, one may consider a definition as follows: “if there
exists one (or more than one) incomplete geodesic in spacetime, then we call it a
singular spacetime.” However, this definition has a serious flaw, namely the “scope of
attack” is overly broad. A spacetime that should not have been singular, if we remove
a point by hand, would be a singular spacetime according to the preceding definition,
which is not what we want. One way to overcome this flaw is to add a restriction
in the definition: the spacetime we consider must be inextendible, i.e., it cannot be
enlarged by adding some points to it.8 A spacetime with some points removed arti-
ficially is not inextendible, and hence does not meet this definition. Then we inspect
the above definition from the perspective of whether it is physically singular. If there
exists an incomplete timelike geodesic in an inextendible spacetime, physically it is
indeed quite singular: the freely falling observer it represents will actually vanish in
the spacetime within a finite time (according to its own standard clock) or not even
have existed a finite amount of time earlier! Similarly, an incomplete null geodesic
is also physically singular, since it represents the world line of a photon. However, a
spacelike geodesic is not the world line of any particle, and so there is no reason to
consider a spacetime which only has incomplete spacelike geodesics as physically
singular. Hence, we take the following definition [see Hawking and Ellis (1973)]:
Definition 1 If there exists one (or more than one) incomplete timelike or null
geodesic in an inextendible spacetime, we say that it is a singular spacetime, or
it has a spacetime singularity.
However, Definition 1 still has drawbacks. For instance, there exists such a space-
time [see Geroch (1968)] which has no incomplete geodesics, but has a bizarre
non-geodesic timelike curve (which has been maximally extended) whose arc length
is finite and 4-acceleration (magnitude) is bounded. This indicates that the observer
in a spaceship traveling along the curve will vanish in the spacetime after a finite
time! (The finite arc length and bounded 4-acceleration ensure that the spaceship
can finish this curve with a finite amount of fuel, and a spaceship like this exists in
principle.) Such a spacetime is singular enough to be called a singular spacetime,
but unfortunately it is not according to Definition 1. This indicates that this defi-
nition has a drawback that the “scope of attack” is too narrow. Another drawback
of Definition 1 is that the intuitive statement that the spacetime has a “hole” does
not always coincide with the existence of an incomplete geodesic. For example, there exists
such a geodesically incomplete spacetime (which contains an incomplete timelike,
null or spacelike geodesic) whose background manifold is compact, and hence has
no “holes” (according to Theorem 1.3.9, any point sequence in a compact manifold
has an accumulation point, and thus the manifold has no “holes”), see Wald (1984).
Although Definition 1 has these drawbacks, it may still be considered as the first
choice of the definition of a singularity. The proof of the Penrose-Hawking singu-
larity theorems used exactly this definition (Appendix E of Volume II provides a
brief introduction of singularity theorems). Later we will see that in the maximally
extended Schwarzschild spacetime there still exist many incomplete timelike and
null geodesics (whose existence is related to the elimination of r = 0). Therefore,

8 The precise mathematical definition is: a spacetime $(M, g_{ab})$ is said to be inextendible if there
does not exist a spacetime $(M', g'_{ab})$ such that $(M, g_{ab})$ is isometric to a proper subset
of $(M', g'_{ab})$.

Schwarzschild spacetime is a singular spacetime, which has a spacetime singularity
at r = 0 (and thus the points at r = 0 do not belong to Schwarzschild spacetime).
Many singular spacetimes exhibit “curvature divergence” as an incomplete
geodesic approaches the singularity. The curvature is a tensor, whose components
depend on the basis. Even the components of a well-behaved tensor can diverge
in a bad basis, and therefore when talking about curvature divergence one needs to
first give it a clear and valid definition. First, we may consider all kinds of scalars
constructed from $R_{abc}{}^{d}$, $g_{ab}$ and $\nabla_a$ [such as $R$, $R_{ab}R^{ab}$, $R_{abcd}R^{abcd}$, $R_{abcd}R^{cdef}R_{ef}{}^{ab}$,
$(\nabla^c R^{ab})\nabla_c R_{ab}$] and their polynomials. If one of these quantities is divergent along an
incomplete geodesic, then we say there exists an s.p. curvature singularity, where
s.p. is the abbreviation for scalar polynomial. However, there also exist spacetimes
whose scalar polynomials all vanish while $R_{abc}{}^{d} \neq 0$ (which is similar to the
fact that the self-contraction of a null vector vanishes while the vector itself does not).
Hence, we should also consider an alternative definition for curvature divergence: if
at least one component of $R_{abc}{}^{d}$ or of its covariant derivatives in a frame parallelly
transported along the geodesic is divergent, we say that the spacetime has a p.p.
curvature singularity, where p.p. stands for parallelly propagated basis. Note that
every s.p. curvature singularity is also a p.p. curvature singularity, but not vice versa.
Once we find at least one incomplete timelike or null geodesic in the spacetime, we
can say that the spacetime is singular. Then we inspect if these incomplete geodesics
have curvature divergence, which has three possibilities: ① there is an s.p. singu-
larity; ② there is a p.p. singularity but no s.p. singularity; ③ there is no curvature
singularity (no curvature divergence). Taub-NUT spacetime is an example of a sin-
gular spacetime without curvature singularity [see Hawking and Ellis (1973) Sect.
5.8 and p. 261]. On the other hand, there are spacetimes whose curvature diverges
only when “approaching infinity”, and these should not be regarded
as singular spacetimes. Thus, it is inappropriate to define spacetime singularity in
terms of curvature divergence alone, without geodesic incompleteness.

9.4.2 Coordinate Singularities of Rindler Metrics

If you can find a coordinate system such that the components of the Schwarzschild
metric in this system behave ordinarily at r = 2M, you can claim that r = 2M is only
a coordinate singularity. This is a sufficient condition for determining a coordinate
singularity. Unfortunately, finding this kind of “good” coordinate system in general
is not easy, and is in no way guaranteed. Luckily, the singularity of the Schwarzschild
metric at r = 2M involves only the first two dimensions in the total 4-dimensional
line element, and finding a “good” coordinate system in a 2-dimensional spacetime
is much easier than doing so in a 4-dimensional spacetime. In this section we will
first introduce a simple but heuristic example. Consider the 2-dimensional Rindler
spacetime, whose metric has the following line element expression in the coordinate
system {t, x}:

$$
ds^2 = -x^2\, dt^2 + dx^2 . \tag{9.4.2}
$$

The determinant of the metric components, $g = -x^2$, vanishes at x = 0, and hence
the matrix $(g_{\mu\nu})$ has no inverse (is degenerate), which means $g_{\mu\nu}$ has a singularity
at x = 0. We want to show that this is a coordinate singularity. First, note that the
range of x should not include x = 0. There is a basic stipulation in relativity, namely
the background manifold has to be a connected manifold. Therefore, the range of x
can either be x > 0 or x < 0, but not the union of both. Without loss of generality,
we take x > 0, i.e., we restrict the range of t, x to be

− ∞ < t < ∞, 0 < x < ∞. (9.4.3)

The approach of finding a “good” coordinate system for determining the singularity
at x = 0 as a coordinate singularity is based on the following fact: each point in
a 2-dimensional spacetime has only two null directions (while in 4-dimensional
spacetime there are infinitely many), and hence (locally) there are only two null
geodesics passing through each point, which sorts the null geodesics in the entire
spacetime into two families. If we find that a null geodesic is incomplete, then we
should suspect that certain regions have been eliminated from the given spacetime. If
one can show that these eliminated regions can be mended, i.e., the given spacetime
can be extended, and x = 0 is a point in the extended spacetime, then one can claim
that the singularity at x = 0 is only a coordinate singularity. Here is how it works:
Suppose η(λ) is a null geodesic in Rindler spacetime, with λ as the affine param-
eter, then its tangent vector

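$$
\left(\frac{\partial}{\partial\lambda}\right)^a = \left(\frac{\partial}{\partial t}\right)^a \frac{dt}{d\lambda} + \left(\frac{\partial}{\partial x}\right)^a \frac{dx}{d\lambda}
$$

satisfies

$$
0 = g_{ab} \left(\frac{\partial}{\partial\lambda}\right)^a \left(\frac{\partial}{\partial\lambda}\right)^b = g_{00} \left(\frac{dt}{d\lambda}\right)^2 + g_{11} \left(\frac{dx}{d\lambda}\right)^2 = -x^2 \left(\frac{dt}{d\lambda}\right)^2 + \left(\frac{dx}{d\lambda}\right)^2 .
$$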

Thus, for η(λ) we have

$$
\frac{dt}{dx} = \pm\frac{1}{x} , \qquad t = \pm\ln x + c \quad (c \text{ is the constant of integration}), \tag{9.4.4}
$$

where the negative sign and the positive sign represent the “ingoing” family and the
“outgoing” family of null geodesics, respectively (the names “ingoing” and “outgoing” here
are introduced simply for the sake of convenience; one can choose either family as
ingoing, and the other as outgoing), and different values of c correspond to different
geodesics in the same family. Hence, t + ln x is a constant on each “ingoing” null
geodesic, and t − ln x is a constant on each “outgoing” null geodesic. Define coordinates v and u as follows:

$$
v := t + \ln x , \qquad u := t - \ln x . \tag{9.4.5}
$$

Then v is a constant on each “ingoing” null geodesic, and u is a constant on each
“outgoing” null geodesic. It follows from (9.4.5) that

$$
x = e^{\frac{1}{2}(v-u)} , \qquad t = \frac{1}{2}(v+u) , \tag{9.4.6}
$$

and plugging these into (9.4.2) after differentiating them yields

$$
ds^2 = -e^{v-u}\, dv\, du . \tag{9.4.7}
$$

Thus, $0 = g_{vv} = g_{ab}(\partial/\partial v)^a(\partial/\partial v)^b$, which indicates that the basis vector $(\partial/\partial v)^a$
is a null vector. Similarly, $(\partial/\partial u)^a$ is also a null vector. Therefore, we refer to v and
u as null coordinates. The ranges of the coordinates t and x [see (9.4.3)] correspond
to the following ranges of v and u (see Fig. 9.11):

− ∞ < v < ∞, −∞ < u < ∞ . (9.4.8)

This seems to suggest that all the null geodesics are complete, but actually it does
not since v and u are not affine parameters. The affine parameters can be obtained
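by means of the timelike Killing vector field $(\partial/\partial t)^a$. According to Theorem 4.3.3,
the quantity E defined as follows is a constant along any null geodesic η(λ):

$$
E := -g_{ab} \left(\frac{\partial}{\partial t}\right)^a \left(\frac{\partial}{\partial\lambda}\right)^b = -g_{00}\, \frac{dt}{d\lambda} = x^2\, \frac{dt}{d\lambda} , \tag{9.4.9}
$$

where λ is the affine parameter. Noticing that u is a constant on any “outgoing”
null geodesic, we can plug (9.4.6) into (9.4.9) and get

$$
d\lambda = \frac{e^{-u}}{2E}\, e^{v}\, dv , \qquad \lambda = \frac{e^{-u}}{2E} \int e^{v}\, dv = \frac{e^{-u}}{2E}\, e^{v} + c_1 , \qquad c_1 = \text{constant} . \tag{9.4.10}
$$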

Define

$$
V := e^{v} . \tag{9.4.11}
$$

Since $e^{-u}/2E$ and $c_1$ are constants, and λ is an affine parameter, (9.4.10) indicates
that $V \equiv e^{v}$ is also an affine parameter of the “outgoing” null geodesics (see Theorem
3.3.3). From (9.4.8) and $V \equiv e^{v}$ we can see that the range of V is (0, ∞), and thus the
“outgoing” null geodesics are incomplete. Similarly, for “ingoing” null geodesics,

$$
U := -e^{-u} \tag{9.4.12}
$$

is an affine parameter. From (9.4.8) and (9.4.12) we know that the range of U is
(−∞, 0), and thus the “ingoing” null geodesics are also incomplete. Does this
indicate that Rindler spacetime is a singular spacetime with a spacetime singularity at
x = 0? No, the key is that Rindler spacetime is not an inextendible spacetime, but
is the result of eliminating certain regions from a larger spacetime. To confirm this
conclusion, one can derive from (9.4.11) and (9.4.12) that

$$
dV\, dU = e^{v-u}\, dv\, du , \tag{9.4.13}
$$



and plugging into (9.4.7) yields

$$
ds^2 = -dV\, dU . \tag{9.4.14}
$$
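
The “ingoing” case mentioned above, which the text only states by analogy, can be spelled out as follows. This is a sketch that repeats the steps of (9.4.10) with the roles of v and u exchanged; the integration constant $c_2$ is a label introduced here. On an “ingoing” null geodesic v is constant, so (9.4.6) gives $dt = \frac{1}{2}\,du$ and $x^2 = e^{v-u}$, and (9.4.9) becomes

$$
E = \frac{1}{2}\, e^{v-u}\, \frac{du}{d\lambda}
\quad\Longrightarrow\quad
d\lambda = \frac{e^{v}}{2E}\, e^{-u}\, du ,
\qquad
\lambda = -\frac{e^{v}}{2E}\, e^{-u} + c_2 = \frac{e^{v}}{2E}\, U + c_2 .
$$

Since $e^{v}/2E$ and $c_2$ are constants along the geodesic, λ is a linear function of $U = -e^{-u}$, which is the sense in which U is an affine parameter.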

The ranges of the new coordinates V and U,

$$
0 < V < \infty , \qquad -\infty < U < 0 , \tag{9.4.15}
$$

are derived from the range of the original coordinate x, i.e., 0 < x < ∞. However,
now it is not necessary to stick to these ranges, since it follows from (9.4.14) that
the only nonvanishing component of the metric in the coordinate system {V, U},
$g_{VU} = -1/2$, behaves quite normally. Even if V, U take values exceeding the ranges
in (9.4.15), the line element (9.4.14) still behaves well, with no singularity at all.
If we had presented (9.4.14) to you in the first place without mentioning the previous
discussion, you would naturally consider that the ranges of V, U have no constraints,
i.e., they can take any value within (−∞, +∞). In this way, the extension of the
domain of the Rindler metric is realized by introducing the new coordinates V, U. The locus x = 0
represents points in the extended domain (the positive semi-axis of the V-axis, see
Fig. 9.12). The metric behaves normally at these points; only its components in the
original coordinate system {t, x} behave badly there. This is actually quite natural:
since x = 0 never belongs to the coordinate patch of the original coordinate system
(it only “touches the edge” from the outside), the so-called singularity at x = 0
is nothing but the result of applying the original coordinate system inappropriately outside its
coordinate patch. Therefore, the singularity at x = 0 is only a coordinate singularity.
If we further define coordinates T, X as follows:

$$
T := \frac{V+U}{2} , \qquad X := \frac{V-U}{2} , \tag{9.4.16}
$$

then it follows from (9.4.14) that $ds^2 = -dT^2 + dX^2$. Thus, the Rindler metric is
actually a flat metric,9 just that its true colors are concealed by the original coordinates
t, x. The Rindler spacetime defined in (9.4.2) is nothing but a sub-spacetime of the
2-dimensional Minkowski spacetime [a quadrant defined by (9.4.15), see region R
in Fig. 9.12]. The Minkowski spacetime in Fig. 9.12 is the maximal extension of
the Rindler spacetime in Fig. 9.11. It follows from $x^2 = e^{v-u} = -VU$ that both of
the two lines V = 0 and U = 0 in Fig. 9.12 correspond to x = 0, which is exactly
a specific manifestation of the statement that x = 0 does not belong to the coordinate patch of the
original coordinate system (it only “touches the edge” from the outside). Although
the two families of null geodesics in Fig. 9.11 appear differently from those in region
R of Fig. 9.12, they are essentially the same. This again indicates that, even for the
same spacetime, the spacetime diagram can vary widely due to different choices of
the coordinate system.
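
The chain of coordinate changes (9.4.5), (9.4.11), (9.4.12) and (9.4.16) can be checked numerically. The following is a minimal sketch; the function names are hypothetical and the finite-difference comparison is an illustration chosen here, not a method from the text. It maps a Rindler point to (T, X) and compares the Minkowski interval between two nearby points with the Rindler line element (9.4.2).

```python
import math


def rindler_to_minkowski(t, x):
    """Map Rindler (t, x), x > 0, to (T, X) via v, u -> V, U -> T, X."""
    v = t + math.log(x)          # (9.4.5)
    u = t - math.log(x)
    V = math.exp(v)              # (9.4.11)
    U = -math.exp(-u)            # (9.4.12)
    return (V + U) / 2.0, (V - U) / 2.0   # (9.4.16)


def interval_rindler(t, x, dt, dx):
    """Line element (9.4.2) evaluated for a small displacement (dt, dx)."""
    return -x**2 * dt**2 + dx**2


def interval_minkowski(t, x, dt, dx):
    """-dT^2 + dX^2 between the images of (t, x) and (t + dt, x + dx)."""
    T0, X0 = rindler_to_minkowski(t, x)
    T1, X1 = rindler_to_minkowski(t + dt, x + dx)
    return -(T1 - T0)**2 + (X1 - X0)**2


T, X = rindler_to_minkowski(0.3, 2.0)
ds2_a = interval_rindler(0.3, 2.0, 1e-4, 3e-4)
ds2_b = interval_minkowski(0.3, 2.0, 1e-4, 3e-4)
print(T, X, ds2_a, ds2_b)
```

The two intervals agree up to terms of third order in the displacement, and every image point satisfies X > |T|, i.e., it lies in the quadrant that the text calls region R.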

9 It only differs from the Minkowski metric up to a diffeomorphism, and thus they are equivalent
(they have the same geometry, see Sect. 8.10.2).

Fig. 9.11 The behavior of the “ingoing” family (1) and “outgoing” family (2) of null geodesics in the 2-dimensional Rindler spacetime in the coordinate system {t, x}

Fig. 9.12 2-dimensional Rindler spacetime is a sub-spacetime (region R) of the 2-dimensional Minkowski spacetime

9.4.3 The Kruskal Extension of Schwarzschild Spacetimes

As a differential equation, Einstein’s equation is local (one can talk about a differential
equation at any given point and its neighborhood on the manifold). Each solution
of the equation represents a metric. As for which manifold the metric is defined on
(this is a global problem), one can only discuss it after solving the equation. Take
the original Schwarzschild line element (9.4.1) as an example. We have pointed out
that this line element has singularities at r = 0 and r = 2M. Since the background
manifold has to be connected, the range of r can either be r > 2M or r < 2M, but
cannot be their union. We may take r > 2M, and then try to show that r = 2M
is a coordinate singularity. The way of proving this is quite similar to that of the
Rindler case. The Rindler line element (9.4.2) is not only 2-dimensional, but also
has a timelike Killing vector field (∂/∂t)a , i.e., the metric components do not contain
t, which greatly simplifies the task of finding a “good” coordinate system. We may
summarize the way of accomplishing this task as the following procedure:

$$
ds^2 = -x^2\, dt^2 + dx^2 = x^2 \left(-dt^2 + x^{-2}\, dx^2\right) .
$$

Define a function $x_*(x)$ such that $dx_* = x^{-1}\,dx$, then $ds^2 = x^2(-dt^2 + dx_*^2)$. Let
$v := t + x_*$, $u := t - x_*$, i.e., $t = (v+u)/2$, $x_* = (v-u)/2$, then $-dt^2 + dx_*^2 = -dv\,du$. Hence

$$
ds^2 = -x^2\, dv\, du = -e^{v-u}\, dv\, du = -dV\, dU ,
$$

where $V := e^{v}$, $U := -e^{-u}$. The Schwarzschild metric also has a Killing vector
field $(\partial/\partial t)^a$, or equivalently, the coefficients of the first two dimensions of its line
element (9.4.1) (denoted by $d\hat s^2$) do not contain t, and so the previous procedure is
still applicable:

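$$
\begin{aligned}
d\hat s^2 &= -(1 - 2M/r)\, dt^2 + (1 - 2M/r)^{-1}\, dr^2 \\
&= (1 - 2M/r)\left[-dt^2 + (1 - 2M/r)^{-2}\, dr^2\right] = (1 - 2M/r)\left(-dt^2 + dr_*^2\right) ,
\end{aligned}
\tag{9.4.17}
$$

where

$$
dr_* := (1 - 2M/r)^{-1}\, dr . \tag{9.4.18}
$$

Take

$$
r_* := r + 2M \ln\left(\frac{r}{2M} - 1\right) , \tag{9.4.19}
$$

which is the tortoise coordinate $r_*$ in (8.9.1). Let

$$
v := t + r_* , \quad u := t - r_* \qquad \text{or} \qquad t = \frac{v+u}{2} , \quad r_* = \frac{v-u}{2} , \tag{9.4.20}
$$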
then the ranges of v and u are

− ∞ < v, u < ∞ . (9.4.21)

It follows from (9.4.20) that $-dt^2 + dr_*^2 = -dv\,du$, and hence

$$
d\hat s^2 = -(1 - 2M/r)\, dv\, du . \tag{9.4.22}
$$
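
The tortoise coordinate (9.4.19) and its defining property (9.4.18) can be illustrated numerically. The following is a minimal sketch, assuming units with M = 1 by default; the function names and the central-difference check are introduced here for illustration only.

```python
import math


def tortoise(r, M=1.0):
    """Tortoise coordinate r_* = r + 2M ln(r/2M - 1) of (9.4.19), for r > 2M."""
    if r <= 2.0 * M:
        raise ValueError("(9.4.19) is only defined for r > 2M")
    return r + 2.0 * M * math.log(r / (2.0 * M) - 1.0)


def tortoise_slope(r, M=1.0, h=1e-6):
    """Central-difference estimate of dr_*/dr, to compare with (9.4.18)."""
    return (tortoise(r + h, M) - tortoise(r - h, M)) / (2.0 * h)


for r in (2.5, 3.0, 10.0):
    expected = 1.0 / (1.0 - 2.0 / r)      # (1 - 2M/r)^(-1) with M = 1
    print(r, tortoise_slope(r), expected)

near_horizon = tortoise(2.0001)           # large and negative close to r = 2M
```

The last line shows the behavior that matters for (9.4.21): as r approaches 2M from above, $r_*$ runs off to −∞, so v and u cover the whole real line.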

Let

$$
V := e^{\beta v} , \qquad U := -e^{-\beta u} \quad (\beta \text{ is an undetermined constant}) , \tag{9.4.23}
$$

then the ranges of V and U are

$$
0 < V < \infty , \qquad -\infty < U < 0 . \tag{9.4.24}
$$

Also

$$
dv\, du = \beta^{-2} e^{\beta(u-v)}\, dV\, dU ,
$$

and hence

$$
d\hat s^2 = -\beta^{-2}\, \frac{r - 2M}{r}\, e^{\beta(u-v)}\, dV\, dU .
$$

The factor $e^{\beta(u-v)}$ on the right-hand side of the above equation can be expressed using
(9.4.20) as $e^{\beta(u-v)} = e^{-2\beta r_*}$. Using (9.4.19) to express $-2\beta r_*$, we find

$$
e^{\beta(u-v)} = e^{-2\beta r} \left(\frac{2M}{r - 2M}\right)^{4\beta M} .
$$

Hence,

$$
d\hat s^2 = -\beta^{-2}\, \frac{r - 2M}{r}\, e^{-2\beta r} \left(\frac{2M}{r - 2M}\right)^{4\beta M} dV\, dU .
$$

The cases where the above equation may be singular are r = 0 and r − 2M = 0, of
which the latter can be eliminated by choosing

$$
\beta = \frac{1}{4M} , \tag{9.4.25}
$$

since then

$$
d\hat s^2 = -\beta^{-2}\, \frac{2M}{r}\, e^{-2\beta r}\, dV\, dU = -\frac{32M^3}{r}\, e^{-r/2M}\, dV\, dU . \tag{9.4.26}
$$
r r
This equation indicates that the metric components are no longer singular at r = 2M,
and hence the range of V, U can be extended to the regions where V ≤ 0 and U ≥ 0.
Unlike the Rindler case, (9.4.26) indicates that r = 0 is still a singularity, and thus
the range of r is constrained to r > 0. Thus, the values of V and U are in no way
arbitrary; together they must satisfy the condition r > 0. Also let

$$
T := \frac{1}{2}(V + U) , \qquad X := \frac{1}{2}(V - U) , \tag{9.4.27}
$$

and complete it with the other two dimensions, then we obtain the expression for
the line element of the Schwarzschild metric in the Kruskal coordinate system
{T, X, θ, ϕ}:

$$
ds^2 = \frac{32M^3}{r}\, e^{-r/2M} \left(-dT^2 + dX^2\right) + r^2 \left(d\theta^2 + \sin^2\theta\, d\varphi^2\right) . \tag{9.4.28}
$$
Fig. 9.13 The maximal Kruskal extension of Schwarzschild spacetime

The above equation indicates that the Schwarzschild metric can be defined on a
manifold much larger than the original domain (r > 2M). Generally speaking, a
spacetime $(\tilde M, \tilde g_{ab})$ is called an extension of a spacetime $(M, g_{ab})$ if $M \subset \tilde M$ and
$\tilde g_{ab}|_p = g_{ab}|_p$, ∀p ∈ M. The extension of the original Schwarzschild spacetime we
obtained just now is called the Kruskal extension [Kruskal (1960)]. In this extension,
the coordinates T, X can take all the values allowed by r > 0. The r in the line
element (9.4.28) should be regarded as a function of the coordinates T and X, which
is defined as follows (it is not difficult to prove this from the relation of the old and
new coordinates):

$$
\left(\frac{r}{2M} - 1\right) e^{r/2M} = X^2 - T^2 . \tag{9.4.29}
$$
Due to the spherical symmetry, one can sketch the spacetime diagram with only the
first two dimensions (see Fig. 9.13); imagining each point in the diagram
as an $S^2$ (2-dimensional sphere) then yields the 4-dimensional spacetime. The factor
$-dT^2 + dX^2$ in (9.4.28) indicates that in the 2-dimensional Schwarzschild spacetime
diagram with T and X as the coordinate axes, the (radial) null curves are all straight lines
with slope ±1. This greatly simplifies our discussion.
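
Equation (9.4.29) defines r only implicitly as a function of T and X. One way to evaluate it in practice is to solve the equation numerically. The following is a minimal sketch using bisection, which works because the left-hand side of (9.4.29) is increasing in r for r > 0; the function name `r_from_kruskal` and the choice of bisection are assumptions made here, not part of the text.

```python
import math


def r_from_kruskal(T, X, M=1.0, tol=1e-12):
    """Solve (r/2M - 1) e^{r/2M} = X^2 - T^2 for r > 0 by bisection."""
    rhs = X**2 - T**2
    if rhs <= -1.0:
        raise ValueError("(T, X) violates X^2 - T^2 > -1, i.e. r > 0")

    def f(s):  # s = r / 2M; f is increasing for s > 0 and f(0) = -1 - rhs < 0
        return (s - 1.0) * math.exp(s) - rhs

    lo, hi = 0.0, 1.0
    while f(hi) < 0.0:        # expand the bracket until the root is enclosed
        hi *= 2.0
    while hi - lo > tol:      # standard bisection on the bracket [lo, hi]
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 2.0 * M * 0.5 * (lo + hi)


r_origin = r_from_kruskal(0.0, 0.0)                               # on r = 2M
r_outside = r_from_kruskal(0.0, math.sqrt(0.5 * math.exp(1.5)))   # r = 3M
r_inside = r_from_kruskal(0.5, 0.0)                               # 0 < r < 2M
print(r_origin, r_outside, r_inside)
```

Points with $X^2 - T^2 \le -1$ are rejected, which mirrors the statement that T and X may take only the values allowed by r > 0.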
From (9.4.29) we can see that r = constant corresponds to X 2 − T 2 = constant,
i.e., a hyperbola in the T X -plane (a pair of 45◦ lines when r = 2M), which becomes
a circular hyperboloid with the other two dimensions no longer suppressed (a hyper-
surface in the 4-dimensional manifold). There are two important special cases:
(1) r = 0 corresponds to X 2 − T 2 = −1. Thus, the bound of the Kruskal exten-
sion, r > 0, can be expressed in terms of the coordinates as

X 2 − T 2 > −1 . (9.4.30)

It is not difficult to show that any radial null or timelike geodesic with r → 0 is
incomplete. By calculation one also finds that the value of the scalar field $R_{abcd}R^{abcd}$
approaches ∞ when r → 0 along these geodesics (which is obviously distinct
from the fact that $R_{abcd}R^{abcd}$ approaches a finite value when r → 2M), and thus
there exists an s.p. curvature singularity. This implies that the spacetime cannot be
extended to r = 0 and beyond (r < 0), which means r = 0 is a spacetime singularity,
and the Kruskal extension is the maximal extension of Schwarzschild spacetime.
The shaded region in Fig. 9.13a does not belong to the extended spacetime. The
two zigzag hyperbolas stand for the spacetime singularity at r = 0 (the singularity
in 4-dimensional spacetime is not a point). The spacetime domain in the diagram is an open
subset of $\mathbb{R}^2$, which is homeomorphic to $\mathbb{R}^2$, while the topological structure of a
2-dimensional sphere is $S^2$, and thus the topological structure of the maximally
extended 4-dimensional Schwarzschild spacetime is $\mathbb{R}^2 \times S^2$. Note that each point
outside the shaded region in Fig. 9.13 represents an $S^2$, and two different points
stand for two different $S^2$. In particular, the points p and p′ stand for neither
the same $S^2$ nor two points on the same $S^2$; each corresponds to its own $S^2$.
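
For reference, the curvature scalar mentioned in case (1) has a simple closed form for the Schwarzschild metric. The expression below is the standard result (often called the Kretschmann scalar, a name not used in the text), quoted here in geometrized units rather than derived:

$$
R_{abcd}R^{abcd} = \frac{48M^2}{r^6} ,
\qquad
R_{abcd}R^{abcd}\Big|_{r=2M} = \frac{3}{4M^4} .
$$

It is finite at r = 2M and grows without bound as r → 0, consistent with the contrast drawn above between the coordinate singularity and the s.p. curvature singularity.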
(2) r = 2M corresponds to $X^2 - T^2 = 0$, i.e., T = ±X, which represents two
lines tilted at 45° passing through the origin in the 2-dimensional diagram ($N_1$ and
$N_2$ in Fig. 9.13a). In the 4-dimensional spacetime they are two 3-dimensional surfaces
(null hypersurfaces), which divide the spacetime into four open regions. Region A is
characterized by X > 0 and $X^2 > T^2$ (i.e., V > 0, U < 0). It follows from (9.4.29)
that this region corresponds to the spacetime region with r > 2M; this is the coordinate
patch of the original coordinate system {t, r}, which is also the “base area” from which
the Kruskal extension starts. Treat Fig. 9.13a as the $\tilde M$ in the above-mentioned
general definition of an extended spacetime, whose metric $\tilde g_{ab}$ is described by the
line element (9.4.28); treat the region A as the manifold M, whose metric $g_{ab}$ is
described by the line element (9.4.1). The line element (9.4.28) is nothing but the
result of applying a coordinate transformation to (9.4.1); they represent the same
metric field $g_{ab}$ in the region A, and hence $(\tilde M, \tilde g_{ab})$ is indeed an extension of $(A, g_{ab})$
(the original Schwarzschild spacetime). All of B, W and A′ are the outcome of the
extension starting from A. The boundary points of A satisfy V = 0 or U = 0, while
from (9.4.20) and the definition of V and U we get t = 2M[ln V − ln(−U)]. Thus,
the points of A have t → ±∞ when approaching the boundary (see Fig. 9.13b),
which indicates that t is not defined on the two tilted lines. This is exactly the reason
why the Schwarzschild line element (9.4.1) behaves singularly at r = 2M (the two
lines do not belong to the coordinate patch of the coordinate system {t, r}, but only
“touch the edge” from the outside).
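
The expression for t used above follows in one line from the definitions. As a short worked step, using only (9.4.20), (9.4.23) and (9.4.25), in region A one has $V = e^{v/4M}$ and $-U = e^{-u/4M}$, so

$$
\ln V - \ln(-U) = \frac{v}{4M} + \frac{u}{4M} = \frac{v+u}{4M} = \frac{t}{2M}
\quad\Longrightarrow\quad
t = 2M\left[\ln V - \ln(-U)\right] .
$$

As V → 0 with U fixed the first logarithm gives t → −∞, and as U → 0 with V fixed the second gives t → +∞.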
The coordinate t is not defined yet in the three regions B, W and A′. Recall the
relations between the coordinates V, U and t, $r_*$ in region A:

$$
V = \exp[(r_* + t)/4M] , \qquad U = -\exp[(r_* - t)/4M] . \tag{9.4.31}
$$

Conversely, we can define the t coordinate in the other three regions in terms of V, U
using the following relations:

$$
\begin{array}{lll}
\text{Region } B & V = \exp[(r_* + t)/4M] , & U = \exp[(r_* - t)/4M] , \\
\text{Region } W & V = -\exp[(r_* + t)/4M] , & U = -\exp[(r_* - t)/4M] , \\
\text{Region } A' & V = -\exp[(r_* + t)/4M] , & U = \exp[(r_* - t)/4M] ,
\end{array}
\tag{9.4.31$'$}
$$

where

$$
r_* \equiv r + 2M \ln\left|r/2M - 1\right| . \tag{9.4.32}
$$

Applying the line element (9.4.28) to regions B, W, A′ and rewriting the line element
in terms of t, r by means of (9.4.31′) and (9.4.32), we still get (9.4.1), where the
range of r for regions B, W is 0 < r < 2M, and for A′ is r > 2M. Thus, the metrics
in A, A′ and in B, W are, respectively, the Schwarzschild line element (9.4.1) restricted
to r > 2M and 0 < r < 2M. The relations between the coordinates T, X and t, r in
these 4 regions are as follows:

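$$
\begin{array}{ll}
\text{Region } A & T = (r/2M - 1)^{1/2}\, e^{r/4M} \sinh(t/4M) , \\
 & X = (r/2M - 1)^{1/2}\, e^{r/4M} \cosh(t/4M) , \qquad (9.4.33) \\
\text{Region } B & T = (1 - r/2M)^{1/2}\, e^{r/4M} \cosh(t/4M) , \\
 & X = (1 - r/2M)^{1/2}\, e^{r/4M} \sinh(t/4M) , \qquad (9.4.34) \\
\text{Region } W & T = -(1 - r/2M)^{1/2}\, e^{r/4M} \cosh(t/4M) , \\
 & X = -(1 - r/2M)^{1/2}\, e^{r/4M} \sinh(t/4M) , \qquad (9.4.35) \\
\text{Region } A' & T = -(r/2M - 1)^{1/2}\, e^{r/4M} \sinh(t/4M) , \\
 & X = -(r/2M - 1)^{1/2}\, e^{r/4M} \cosh(t/4M) . \qquad (9.4.36)
\end{array}
$$

The inverse transformations are

$$
\begin{array}{ll}
\text{Regions } A, B, W, A' & (r/2M - 1)\, e^{r/2M} = X^2 - T^2 , \qquad (9.4.37) \\
\text{Regions } A, A' & t/2M = 2 \tanh^{-1}(T/X) , \qquad (9.4.38) \\
\text{Regions } B, W & t/2M = 2 \tanh^{-1}(X/T) . \qquad (9.4.39)
\end{array}
$$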
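
The transformations (9.4.33), (9.4.34) and their inverses (9.4.37)–(9.4.39) can be checked by a round trip. The following is a minimal sketch restricted to regions A and B, with M = 1 by default; the function names are hypothetical and are introduced only for this illustration.

```python
import math


def kruskal_from_schwarzschild(t, r, M=1.0):
    """Return (T, X) in region A (r > 2M) or region B (0 < r < 2M)."""
    if r <= 0.0 or r == 2.0 * M:
        raise ValueError("need r > 0 and r != 2M")
    amp = math.sqrt(abs(r / (2.0 * M) - 1.0)) * math.exp(r / (4.0 * M))
    s = math.sinh(t / (4.0 * M))
    c = math.cosh(t / (4.0 * M))
    if r > 2.0 * M:
        return amp * s, amp * c    # region A, (9.4.33)
    return amp * c, amp * s        # region B, (9.4.34)


def schwarzschild_time(T, X, M=1.0):
    """Recover t from (9.4.38) when |T| < |X| and from (9.4.39) when |X| < |T|."""
    ratio = T / X if abs(T) < abs(X) else X / T
    return 4.0 * M * math.atanh(ratio)


T_a, X_a = kruskal_from_schwarzschild(1.2, 3.0)     # a point in region A
T_b, X_b = kruskal_from_schwarzschild(-0.7, 1.0)    # a point in region B
print(schwarzschild_time(T_a, X_a), schwarzschild_time(T_b, X_b))
```

In both regions $X^2 - T^2$ reproduces the left-hand side of (9.4.37), with a positive value in A and a value between −1 and 0 in B.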

We have mentioned at the beginning of this subsection that according to (9.4.1), on
the one hand we cannot take r = 2M, while on the other hand we cannot take both
r > 2M and 0 < r < 2M (otherwise the manifold would be disconnected). However, things are
different now that we have the Kruskal extension. This extension indicates that the
Schwarzschild metric is defined on regions A, B and their common boundary $N_1^+$ (on which
r = 2M, t = ∞), and $A \cup N_1^+ \cup B$ is a connected manifold. An “ingoing” (r keeps
decreasing), future-directed null curve starting from any point in A will inevitably
cross $N_1^+$ and enter B. (However, a timelike curve can go to infinity with $N_1^+$ as the
asymptote.) In contrast, a future-directed timelike or null curve starting from any
point in B cannot cross $N_1^+$ and enter A, and it can only end by falling
into the singularity. (The singularity does not belong to the spacetime. The precise
meaning of “falling into the singularity” is that the r of this world line becomes
smaller and smaller, and approaches zero. For a timelike geodesic, falling into the
singularity means that the freely falling observer it represents vanishes from the
spacetime when the proper time reaches a certain value, which is indeed incredibly
singular.) This indicates that $N_1^+$ is a “one-way membrane” with no way out. Any
object (including a photon) in the region A can never return to A (but can only fall
into the singularity) once it enters the region B. Therefore, the region B is called
a black hole, and $N_1^+$ is called the event horizon. Since each point in
Fig. 9.13 represents a 2-dimensional sphere, the black hole is a 4-dimensional
spacetime region, while the event horizon is a (3-dimensional) null hypersurface (the
proof that the event horizon is a null hypersurface is left as Exercise 9.11, see the
hint therein). The region A′ is characterized by X < 0 and $X^2 > T^2$, and it also has
r > 2M. In fact, it has exactly the same properties as the region A, including that
its relationship with the black hole B is similar to the relationship between A and B,
and hence $N_2^+$ is the event horizon of A′. However, A and A′ do not have any causal
relation: no timelike or null curve starting from A can enter A′, and vice versa. In
this sense, people also often refer to A and A′ as two (independent) “universes”. The
region W is characterized by T < 0 and $X^2 < T^2$, and it also has r < 2M. W and A
(or A′) are only divided by a “membrane”, which is the null hypersurface $N_2^-$ (or $N_1^-$).
Both $N_2^-$ and $N_1^-$ are “one-way membranes”, this time with no way in. Any future-directed
timelike or null curve in W will cross $N_2^-$ (or $N_1^-$) and enter A (or A′). Since B is
called a black hole, W is naturally called a white hole.
The above discussion is about the maximal extension of Schwarzschild spacetime obtained under the premise that the entire spacetime is a vacuum. Although this extension introduces some tempting notions such as black hole, white hole, event horizon and two identical “universes”, its physical existence (authenticity) deserves further discussion. From the perspective of the initial value problem, the chance for this entire spacetime to exist is very small, while part of it (including part of A, part of B and the event horizon in between) is physically very meaningful; see Sect. 9.4.6 for details.
At the end of this subsection, we would like to discuss the Killing vector fields of the maximally extended Schwarzschild spacetime. Before the extension, the spacetime has 4 independent Killing vector fields, of which 3 reflect the spherical symmetry (see $\xi_1^a$, $\xi_2^a$, $\xi_3^a$ in Sect. 8.2) and the fourth reflects the staticity, namely $\xi^a = (\partial/\partial t)^a$. For the maximally extended Schwarzschild spacetime, $\xi_1^a$, $\xi_2^a$, $\xi_3^a$ still reflect the spherical symmetry. Since the t coordinate is defined in all four regions A, A′, B, W, and the line elements in all regions written in terms of t, r take the original Schwarzschild form, $\xi^a = (\partial/\partial t)^a$ in each region is still a Killing field. Note that in B and W, $\xi^a$ is not timelike but spacelike, since it follows from the line element (9.4.1) that r < 2M leads to $g_{ab}(\partial/\partial t)^a(\partial/\partial t)^b > 0$. There do not exist other independent Killing vector fields besides $\xi_1^a$, $\xi_2^a$, $\xi_3^a$ and $\xi^a$, and hence B and W are not static spacetime regions. $(\partial/\partial t)^a$ is undefined on the null hypersurfaces $N_1$ and $N_2$, since the coordinate t is undefined on them (t = ±∞). However, one can express $\xi^a$ in A using the coordinate basis vectors $(\partial/\partial V)^a$ and $(\partial/\partial U)^a$ as

$$\xi^a = (\partial/\partial t)^a = \frac{1}{4M}\left[V(\partial/\partial V)^a - U(\partial/\partial U)^a\right]. \tag{9.4.40}$$
Since $(\partial/\partial V)^a$ and $(\partial/\partial U)^a$ are well-defined on $N_1$ and $N_2$, one can define the vector field $\xi^a$ on $N_1$ and $N_2$ using the above equation, and verify that it is a null Killing
vector field. Hence, on the whole manifold there is a fourth $C^\infty$ Killing vector field $\xi^a$, which is orthogonal to the other 3 independent Killing vector fields. Thus, the symmetry of the maximally extended Schwarzschild spacetime is characterized by 4 Killing fields, of which three reflect the spherical symmetry and the fourth (i.e., $\xi^a$) is timelike in A and A′, spacelike in B and W, and null on $N_1$ and $N_2$. Thus, we can see the necessity of changing “static” to “Schwarzschild” in the original formulation of Birkhoff’s theorem “a spherically symmetric solution of the vacuum Einstein equation must be a static metric” (see Sect. 8.3.3): the Schwarzschild metric is not necessarily a static metric. From the geometric perspective, the essence of Birkhoff’s theorem is: if the metric satisfies the vacuum Einstein equation and has the three Killing vector fields reflecting the spherical symmetry, then it must have a fourth (additional, not preassigned) Killing vector field $\xi^a$, which can be timelike, spacelike, or even null, depending on where the spacetime point is located.
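The causal character of $\xi^a$ quoted above can be checked in one line from (9.4.40). The sketch below assumes that the two-dimensional part of the Kruskal line element has the standard form in the null coordinates V, U, with $UV = T^2 - X^2$ so that it agrees with (9.4.37); this normalization is an assumption about the earlier part of the section, not something restated here.

```latex
% Assumed (standard normalization, consistent with (9.4.37)):
%   d\hat{s}^2 = -\frac{32M^3}{r}\, e^{-r/2M}\, dV\, dU ,
%   UV = -\left(\frac{r}{2M}-1\right) e^{r/2M} .
% From (9.4.40): \xi^V = V/4M , \quad \xi^U = -U/4M .
\begin{align*}
g_{ab}\,\xi^a \xi^b
  &= 2\, g_{VU}\, \xi^V \xi^U
   = 2\left(-\frac{16M^3}{r}\, e^{-r/2M}\right)
      \left(\frac{V}{4M}\right)\left(-\frac{U}{4M}\right) \\
  &= \frac{2M}{r}\, e^{-r/2M}\, UV
   = -\left(1-\frac{2M}{r}\right).
\end{align*}
```

Under these assumptions the norm is negative for r > 2M (A and A′), positive for r < 2M (B and W), and vanishes where UV = 0, i.e., on $N_1$ and $N_2$, in agreement with the statement in the text.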

9.4.4 Surfaces of Infinite Redshift in Schwarzschild Spacetimes

Suppose the radial coordinates of static observers G and G′ outside the event horizon are r and r′ (> r), respectively, and G emits light toward G′. One can derive the redshift z ≡ (λ′ − λ)/λ from (9.2.3). If r′ is fixed, then z is a function of r satisfying dz(r)/dr < 0 and $\lim_{r \to 2M} z(r) = \infty$. Therefore, the hypersurface r = 2M is also called the surface of infinite redshift. However, one should not say “the light emitted from the surface of infinite redshift will have an infinite redshift when it reaches G′”, since any outgoing null geodesic emitted from the hypersurface r = 2M (event horizon) can only lie on the horizon and can never reach G′.
The Schwarzschild spacetime (region A) has only one static reference frame (it has only one hypersurface-orthogonal Killing vector field, namely $\xi^a$), but there are infinitely many stationary reference frames. This is because a linear combination of $\xi^a$ and the spatial Killing field $(\partial/\partial\varphi)^a$, $\tilde\xi^a \equiv \xi^a + \beta(\partial/\partial\varphi)^a$ (where β is a constant), is also a Killing field, and so $\tilde\xi^a$ corresponds to a stationary reference frame in the region where it is timelike. Equation (9.2.2) can be applied to any stationary reference frame, where one just needs to interpret the $\xi^a$ as $\tilde\xi^a$. The surface of infinite redshift corresponds to $-\tilde\xi^a\tilde\xi_a = 0$, and thus depends on the choice of stationary reference frame. In fact, if one wants, one can even find a stationary reference frame that has a surface of infinite redshift for Minkowski spacetime. Since the static reference frame in Schwarzschild spacetime is unique, unless otherwise indicated, the surface of infinite redshift will refer to the surface $-\xi^a\xi_a = 0$ (which coincides with the event horizon), and the “redshift factor” will mean

$$\chi = (-\xi^a\xi_a)^{1/2} = (1 - 2M/r)^{1/2}.$$
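The divergence of the redshift as the emitter approaches r = 2M can be illustrated numerically. This is a minimal sketch, assuming that the redshift between static observers takes the form λ′/λ = χ′/χ, with χ the redshift factor defined above; the function names and sample radii are illustrative.

```python
import math

def redshift_factor(r, M=1.0):
    """chi = (1 - 2M/r)^(1/2), defined for static observers at r > 2M."""
    if r <= 2 * M:
        raise ValueError("static observers exist only for r > 2M")
    return math.sqrt(1 - 2 * M / r)

def static_redshift(r_emit, r_recv, M=1.0):
    """z = lambda'/lambda - 1, assuming lambda'/lambda = chi(r')/chi(r)."""
    return redshift_factor(r_recv, M) / redshift_factor(r_emit, M) - 1

if __name__ == "__main__":
    r_recv = 100.0  # fixed receiver radius r' (illustrative value, in units of M)
    for r_emit in (10.0, 3.0, 2.1, 2.001, 2.000001):
        # z grows without bound as the emitter radius approaches 2M
        print(r_emit, static_redshift(r_emit, r_recv))
```

With r′ held fixed, the printed values increase monotonically as r decreases, which is the behavior dz/dr < 0 stated in the text.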


Fig. 9.14 The embedding diagram of the maximally extended Schwarzschild spacetime (T = 0, one dimension suppressed)

9.4.5 Embedding Diagrams [Optional Reading]

Much of the literature, including textbooks and popular science books, uses embedding diagrams (see Fig. 9.14) to describe the Schwarzschild black hole intuitively. This subsection provides an introduction to embedding diagrams. To start with, we first discuss the simple case of the embedding diagram for a static spherically symmetric star. Equation (9.3.19) represents the metric inside a static spherically symmetric star, whose induced line element on any constant-t surface $\Sigma_t$ reads

$$ds^2 = \left[1 - \frac{2m(r)}{r}\right]^{-1} dr^2 + r^2\left(d\theta^2 + \sin^2\theta\, d\varphi^2\right). \tag{9.4.41}$$

Let R be the radius of the star. If we let m(r) take a constant value M ≡ m(R) when r ≥ R, then the above equation applies to both the interior and the exterior of the star. This is the line element of a curved space. Due to the spherical symmetry, we can just consider the cross section with θ = π/2 in $\Sigma_t$ (denoted by S), whose induced line element is

$$ds^2 = \left[1 - \frac{2m(r)}{r}\right]^{-1} dr^2 + r^2 d\varphi^2. \tag{9.4.42}$$

Let $g_{ab}$ represent the metric corresponding to this line element, then $(S, g_{ab})$ is a 2-dimensional Riemannian space. To intuitively manifest its intrinsic warping, one can embed it into the one higher dimensional Euclidean space $(\mathbb{R}^3, \delta_{ab})$, i.e., consider the embedding $\phi : S \to \mathbb{R}^3$, and use the warping of φ[S] in $\mathbb{R}^3$ to intuitively reflect the intrinsic warping of $(S, g_{ab})$. Figure 9.15 is the embedding diagram that embeds $(S, g_{ab})$ into $(\mathbb{R}^3, \delta_{ab})$. From this figure we can see that the further from the center of the star, the less the space is warped, and as r → ∞ it approaches flat space. However, how is this diagram drawn? Based on what principle can we draw a diagram with this kind of effect?
The line element expression of the 3-dimensional Euclidean metric $\delta_{ab}$ in a cylindrical coordinate system {z, r, ϕ} reads

$$ds^2 = dz^2 + dr^2 + r^2 d\varphi^2. \tag{9.4.43}$$
Fig. 9.15 The embedding diagram of a static spherically symmetric star (one dimension suppressed)

Take a radial line segment on S. The difference between the values of r at its ends p and p′ is dr (see Fig. 9.16 left). If we move this segment to somewhere on S with a different value of r, then although the new segment has the same dr as the old one, its arc length is in general different [see (9.4.42)]. This is a significant manifestation of the intrinsic warping of $(S, g_{ab})$. Let q ≡ φ(p) and q′ ≡ φ(p′). As long as we ensure that the line segments qq′ and pp′ have the same arc length when drawing the diagram, the external warping of φ[S] in $\mathbb{R}^3$ reflects the above-mentioned intrinsic warping of $(S, g_{ab})$. This is the principle of making the embedding diagram. Based on this one can find the equation of the surface φ[S], and then draw φ[S]. As a hypersurface in $\mathbb{R}^3$, the equation of φ[S] can be expressed as f(z, r) = 0 (the axial symmetry makes f independent of ϕ). This corresponds to a function of one variable, z = z(r), which gives the dependence of the value of z on the value of r at an arbitrary point on φ[S]. Hence, the line element on φ[S] is

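$$ds^2 = dz^2 + dr^2 + r^2 d\varphi^2 = \{[dz(r)/dr]^2 + 1\}\, dr^2 + r^2 d\varphi^2. \tag{9.4.44}$$

Comparing this with (9.4.42) yields

$$\left[\frac{dz(r)}{dr}\right]^2 + 1 = \left[1 - \frac{2m(r)}{r}\right]^{-1}, \quad \text{i.e.,} \quad \frac{dz(r)}{dr} = \sqrt{\frac{2m(r)}{r - 2m(r)}}. \tag{9.4.45}$$

Stipulate z(0) = 0, then

$$z(r) = \int_0^r \sqrt{\frac{2m(r)}{r - 2m(r)}}\; dr \quad (\text{for } 0 < r < \infty). \tag{9.4.46}$$

Since m(r) = M for r ≥ R, we have

$$z(r) = \sqrt{8M(r - 2M)} + C \quad (\text{for } r \geq R), \tag{9.4.47}$$

where

$$C \equiv -\sqrt{8M(R - 2M)} + \int_0^R \sqrt{\frac{2m(r)}{r - 2m(r)}}\; dr \tag{9.4.48}$$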
is a constant. Although for points with r < R, z(r) depends on the form of the function m(r), r > 2m(r) guarantees that z(r) is monotonically increasing, and (9.4.47) indicates that φ[S] is a circular paraboloid when r > R. Hence, we have the embedding diagram shown in Fig. 9.15 (the r < R part is only a qualitative sketch). Note that the background Euclidean space $(\mathbb{R}^3, \delta_{ab})$ is introduced by hand only for showing φ[S], and the points with actual physical meaning are only the points on φ[S]. (Do not think there is something filled in the “hat”!)
Now it is not difficult to understand Fig. 9.14, which is actually the embedding of the “whole space” $\Sigma_0$ at T = 0 (at t = 0) in the Kruskal extension of the Schwarzschild solution (see Fig. 9.13). $\Sigma_0$ contains all the points (each represents an $S^2$) on the X-axis in Fig. 9.13. Following the above derivation we can see that the hypersurface $\phi(\Sigma_0)$ in the embedding diagram (one dimension is suppressed) is a circular paraboloid determined by the equation

$$z(r) = \pm\sqrt{8M(r - 2M)}. \tag{9.4.49}$$
Fig. 9.16 Embedding a surface S in the spacetime of a static spherically symmetric star into $(\mathbb{R}^3, \delta_{ab})$

(Due to the asymptotic flatness, it develops from above and below into two surfaces that are approximately planes.) Since the value of r at any point of $\Sigma_0$ is greater than or equal to 2M, there does not exist a point with r < 2M in the embedding diagram. The whole “space” is divided by the circle formed by the points of r = 2M (it is actually a sphere, called the throat) into upper and lower halves, which correspond respectively to the X > 0 and X < 0 parts on the X-axis in Fig. 9.13b. For instance, the points p and p′ in Fig. 9.13b correspond respectively to the circles (spheres) φ(p) and φ(p′) in Fig. 9.14. It is necessary to reiterate that only the circular paraboloid represents the “whole space” $\Sigma_0$ at t = 0, while the points outside the surface do not have any physical meaning.
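The exterior profile can be checked against the slope condition it was derived from. The sketch below assumes the vacuum case m(r) = M and compares a finite-difference derivative of the upper branch of (9.4.49) with the right-hand side of (9.4.45); the function names and the step size h are illustrative choices.

```python
import math

def z_exterior(r, M=1.0):
    """Upper branch of the embedding profile, Eq. (9.4.49), for r >= 2M."""
    return math.sqrt(8 * M * (r - 2 * M))

def slope_from_metric(r, M=1.0):
    """dz/dr from Eq. (9.4.45) with m(r) = M, valid for r > 2M."""
    return math.sqrt(2 * M / (r - 2 * M))

def numerical_slope(r, M=1.0, h=1e-6):
    """Central finite-difference estimate of dz/dr."""
    return (z_exterior(r + h, M) - z_exterior(r - h, M)) / (2 * h)

if __name__ == "__main__":
    for r in (3.0, 5.0, 50.0, 5000.0):
        # The two slopes agree, and both tend to zero at large r
        # (the surface flattens out, reflecting asymptotic flatness).
        print(r, numerical_slope(r), slope_from_metric(r))
```

The slope diverges as r → 2M, which corresponds to the vertical tangent of the paraboloid at the throat.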

9.4.6 The Gravitational Collapse of a Spherical Star and Schwarzschild Black Holes

Fig. 9.17 The late-time collapse of a massive star described by the Kruskal coordinates. The vacuum Schwarzschild solution only applies to the exterior of the star’s surface. The unshaded region B represents the black hole caused by the collapse

As we mentioned in Sect. 9.3.2, if a star in its late stage of evolution is to maintain hydrostatic equilibrium in its interior [satisfying (9.3.17)], its mass must be less than the upper mass limit of a neutron star. If a star whose initial mass is greater than this upper bound cannot eject enough mass during its evolution to become a stable white dwarf or neutron star, then it cannot be stable at all and can only keep contracting until it becomes a black hole. According to Birkhoff’s theorem (see Sect. 8.3.3), the exterior of the star must have a Schwarzschild metric, and hence can be described by the spacetime diagram shown in Fig. 9.17. The unshaded region in the diagram is identical to the corresponding part in Fig. 9.13, while the shaded region is described by the interior metric (a non-vacuum solution to Einstein’s equation). Therefore, the spacetime of a collapsing star does not have the white hole region W at all, and it does not have the region A′ either, while the black hole region B and part of the region A in this case are of great significance. No matter how solid the matter constituting the star is, once the surface of the star crosses the event horizon, it has to keep contracting until the entire star is squashed into the singularity. The reason is simple: the world line of any point on the surface of the star must lie inside the light cone (must be timelike), and thus the angle between the T-axis and the world line must be smaller than 45° (note that the lines tilted at 45° in Figs. 9.13 and 9.17 represent radial null geodesics). The Schwarzschild coordinates can only cover the spacetime region of r > 2M (or 0 < r < 2M), and thus cannot exhibit the whole process of a star collapsing into a black hole in its late stage; in particular, they cannot exhibit the most crucial step in the whole process, namely the surface of the star shrinking to the event horizon. If we want to describe the collapse of a star using the Schwarzschild coordinates, then we can only draw it as Fig. 9.18. Since the Schwarzschild coordinate t is undefined at r = 2M, this figure is actually just the result of combining two diagrams (representing r > 2M and 0 < r < 2M). The right part of the figure (r > 2M) may mislead people into thinking that the surface of the collapsing star is always outside the event horizon r = 2M. This kind of misunderstanding comes from confusing t = ∞ with “always” (readers who are familiar with Zeno’s paradox may notice the similarity between the coordinate time t and “Achilles time”). From Fig. 9.17 we can see that the intersection of the star’s surface and r = 2M (see p in Fig. 9.17) corresponds to t = ∞. However, the proper time τ of an observer at a point p on the star’s surface has a finite value, and they will enter the black hole and fall into the singularity in a very short time τ. (For a black hole with $M = M_\odot$, τ is approximately $2 \times 10^{-5}$ s.)
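The order of magnitude quoted above can be reproduced with a short calculation. The text does not state which world line the figure refers to, so the sketch below evaluates two standard closed-form results as assumptions: the maximal proper time from the horizon to the singularity, πM, and the proper time 4M/3 for radial free fall from rest at infinity, both converted to seconds via $GM/c^3$. The constants are rounded textbook values.

```python
import math

# Rounded physical constants (SI units); illustrative precision only.
G = 6.674e-11       # gravitational constant, m^3 kg^-1 s^-2
C_LIGHT = 2.998e8   # speed of light, m/s
M_SUN = 1.989e30    # solar mass, kg

def geometric_time(mass_kg):
    """Mass expressed as a time, GM/c^3, in seconds."""
    return G * mass_kg / C_LIGHT**3

def tau_horizon_to_singularity_max(mass_kg):
    """Assumed case 1: maximal proper time from r = 2M to r = 0, pi * M."""
    return math.pi * geometric_time(mass_kg)

def tau_horizon_to_singularity_from_infinity(mass_kg):
    """Assumed case 2: radial free fall from rest at infinity, 4M/3."""
    return 4.0 / 3.0 * geometric_time(mass_kg)

if __name__ == "__main__":
    print(tau_horizon_to_singularity_max(M_SUN))            # about 1.5e-5 s
    print(tau_horizon_to_singularity_from_infinity(M_SUN))  # about 6.6e-6 s
```

Both values are of order $10^{-5}$ s for one solar mass and scale linearly with M.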
Fig. 9.18 The late-time collapse of a massive star described by the Schwarzschild coordinates. Although not until t → ∞ will the star’s surface shrink to r = 2M, this does not indicate that it will always be outside the horizon, since the coordinate time approaching infinity does not represent “always”

The process of a star collapsing into a black hole can be represented more intuitively by another coordinate system, the ingoing Eddington-Finkelstein coordinate system {v, r, θ, ϕ}. Although it cannot cover the maximally extended Schwarzschild spacetime like the Kruskal system, it can cover the regions A and B (unlike the Schwarzschild coordinate system, which can only cover any one of the four regions). The r, θ, ϕ in this system are the same as the corresponding coordinates in the Schwarzschild system, and $v := t + r_*$. The line element of the first two dimensions in the Schwarzschild system

$$d\hat{s}^2 = -(1 - 2M/r)\, dt^2 + (1 - 2M/r)^{-1} dr^2 \tag{9.4.50}$$

in the ingoing Eddington-Finkelstein system becomes

$$d\hat{s}^2 = -(1 - 2M/r)\, dv^2 + 2\, dv\, dr. \tag{9.4.51}$$

It follows from the equation above that the nonvanishing components $g_{vv} = -(1 - 2M/r)$, $g_{vr} = 1$ and the determinant g = −1 are all well-behaved at r = 2M, and hence r = 2M is no longer a singularity. Also, considering that v ∈ (−∞, ∞) corresponds to V ∈ (0, ∞), we can see that {v, r} can cover the regions A and B in Fig. 9.13. $g_{rr} = 0$ and $g_{vv} = -(1 - 2M/r)$ also indicate that the coordinate basis vector $(\partial/\partial r)^a$ of the Eddington-Finkelstein system {v, r, θ, ϕ} is a null vector while $(\partial/\partial v)^a$ is a timelike (for region A) or spacelike (for region B) vector. Suppose η(λ) is an arbitrary radial null geodesic in the regions A and B, then it follows from (9.4.51) that

$$0 = -\left(1 - \frac{2M}{r}\right)\left(\frac{dv}{d\lambda}\right)^2 + 2\,\frac{dv}{d\lambda}\frac{dr}{d\lambda} = \frac{dv}{d\lambda}\left[-\left(1 - \frac{2M}{r}\right)\frac{dv}{d\lambda} + 2\,\frac{dr}{d\lambda}\right].$$

This indicates that the radial null geodesics can be classified into two families, which are characterized respectively by the following conditions:

$$(1)\quad \frac{dv}{d\lambda} = 0, \quad \text{i.e.,} \quad v = \text{constant}, \tag{9.4.52}$$

$$(2)\quad -\left(1 - \frac{2M}{r}\right)\frac{dv}{d\lambda} + 2\,\frac{dr}{d\lambda} = 0, \quad \text{and hence} \quad \frac{dv}{dr} = \frac{2r}{r - 2M}. \tag{9.4.53}$$
Fig. 9.19 The behavior of the two families of null geodesics in 2-dimensional Schwarzschild spacetime in the coordinate system $\{\tilde{t}, r\}$

The first family of null geodesics are horizontal lines in the (v, r) diagram, which is not intuitive enough. Define $\tilde{t} := v - r$, then it follows from (9.4.51) that

$$d\hat{s}^2 = -(1 - 2M/r)\, d\tilde{t}^2 + (4M/r)\, d\tilde{t}\, dr + (1 + 2M/r)\, dr^2. \tag{9.4.54}$$

The behaviors of the two families of null geodesics in the coordinate system $\{\tilde{t}, r\}$ are shown in Fig. 9.19: the equation for family (1) (ingoing family) is $d\tilde{t}/dr = -1$, and thus it is a family of parallel lines with slope −1; the equation for family (2) (outgoing family) is

$$\frac{d\tilde{t}}{dr} = \frac{r + 2M}{r - 2M}.$$
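Both slopes can be verified directly against the line element (9.4.54). The following minimal sketch plugs each slope into the quadratic form and checks that it vanishes on either side of r = 2M; the function names and sample radii are illustrative.

```python
def line_element(dt, dr, r, M=1.0):
    """Quadratic form of Eq. (9.4.54) evaluated on the displacement (dt~, dr)."""
    f = 2 * M / r
    return -(1 - f) * dt * dt + 2 * f * dt * dr + (1 + f) * dr * dr

def ingoing_slope(r, M=1.0):
    """dt~/dr for family (1): straight lines of slope -1."""
    return -1.0

def outgoing_slope(r, M=1.0):
    """dt~/dr for family (2); undefined at r = 2M (the vertical line)."""
    return (r + 2 * M) / (r - 2 * M)

if __name__ == "__main__":
    for r in (0.5, 1.0, 1.9, 2.5, 4.0, 20.0):
        for slope in (ingoing_slope(r), outgoing_slope(r)):
            # Each value should vanish up to rounding error: both directions are null.
            print(r, slope, line_element(slope, 1.0, r))
```

For r > 2M the second slope is positive, so r grows with $\tilde{t}$; for r < 2M it is negative, so r shrinks as $\tilde{t}$ grows, even though the curve belongs to the “outgoing” family.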

The behavior of this family is rather special: all of the null geodesics are curves except for one vertical line (r = 2M); for a curve on the right of the vertical line, the value of r increases as the affine parameter λ increases (which is truly outgoing), while for a curve on the left of the vertical line, the value of r decreases as λ increases (which is actually ingoing, but belongs to the outgoing family). This oddity reflects an important property of the black hole: r = 2M is the event horizon, and any photon inside the horizon (r < 2M) cannot cross the horizon and come out of the black hole (to r > 2M); its value of r can only keep decreasing to zero. Based on the two families of null geodesics one can easily draw the light cone at each point, which is helpful for analyzing the motion of a point mass, since the world line of a point mass is a timelike curve and the tangent vector at each point on the line must lie inside the light cone at this point. Thus, a point mass outside the event horizon can cross the horizon and enter the black hole, while once it enters there is no way out, and it can only fall into the singularity. Revolving Fig. 9.19 about the $\tilde{t}$-axis we get the 3-dimensional spacetime diagram (see Fig. 9.20), and then adding the world tube of the surface of the collapsing star (the cannon-shaped surface in Fig. 9.20) we can intuitively represent the exterior spacetime geometry of a star collapsing into a black hole. To aid understanding, let us consider the following thought experiment (ignoring the effects of the tidal force). Suppose you are an observer exploring a black hole on a spaceship with enough fuel. If you do not turn on the engine, the spaceship will fall freely, and will evidently cross the event horizon, enter the black hole and die at the singularity. If you turn around before reaching the horizon and go full steam ahead (namely let r increase before it reaches 2M), you will be able to return safely and write an exploration report. However, if you go one step further and reach the horizon (note that you will not feel anything special when your world line intersects the horizon), then this moment will become the regret of a lifetime, because from the light cone on the horizon we can see that you cannot return once you are at the horizon. You cannot even make a wireless phone call to your distant friend, since the “outgoing” photon on the horizon can only go vertically upward along the horizon (r remains equal to 2M), see Fig. 9.20.

Fig. 9.20 The spacetime diagram of the star collapsing into a black hole in the system $\{\tilde{t}, r\}$. A black hole explorer will inevitably fall into the singularity if they do not turn around before reaching the horizon
Now we will discuss the appearance of the collapsing star. Figure 9.21 shows the situation of a photon emitted from the surface of the star reaching an exterior static observer. Since photons emitted from the star’s surface after it has crossed the horizon cannot reach the exterior, it might seem that an exterior observer will find that the star gets smaller gradually and vanishes suddenly. However, if we look at Fig. 9.21 more carefully we will see that this is not the case. Since the world line of an outgoing photon outside the horizon becomes steeper as it comes closer to the horizon, and becomes completely vertical on the horizon (lies on the horizon), the exterior observers will always (no matter how large their proper time τ is) receive the light emitted from the surface of the star outside the horizon. They will observe that the star contracts slower and slower, and approaches a certain size,¹⁰ i.e., the radius of the star will approach 2M with a smaller and smaller speed and get “frozen” at this size. This phenomenon is also called the time dilation effect of the gravitational field. We have mentioned in Chap. 6 that when comparing the rates of clocks, one first needs to stipulate a specific method for “clock comparison”. In the situation of Fig. 9.21, the world lines of photons become the key to clock comparison: we take as the objects to compare the proper time intervals Δτ and Δτ′ that the world lines of two neighboring radial photons cut out on the world lines of an observer on the star’s surface and an exterior observer, respectively. Calculation shows that (see Optional Reading 9.4.2) Δτ′ > Δτ, and if $\Delta\tau_2 = \Delta\tau_1$ then $\Delta\tau'_2 > \Delta\tau'_1$, i.e., Δτ′/Δτ increases as τ increases. Thus, the exterior observer finds that a standard clock on the star’s surface is not only slower than their own clock, but also becomes slower and slower as it goes (note that this kind of “view” is the outcome of both the spacetime geometry and the method of clock comparison we just stipulated). Another manifestation of this time dilation effect is the redshift. Regard two neighboring null geodesics as the world lines of two neighboring wave crests, then Δτ and Δτ′ are the periods of the light waves measured by the observer on the star’s surface and the exterior observer, respectively. Δτ′ > Δτ indicates that the light wave received by the exterior observer has a longer wavelength, i.e., it has a redshift, and Δτ′/Δτ increasing with τ indicates that the redshift is getting stronger. However, this kind of redshift is different from the redshift between stationary observers which we discussed in Sect. 9.2.1, since the observer on the star’s surface is not a stationary observer; see Optional Reading 9.4.2 for details.

Fig. 9.21 An exterior observer in principle can always receive the light emitted from the star’s surface. $\Delta\tau'_1 > \Delta\tau_1$ indicates that there exists redshift; $\Delta\tau_2 = \Delta\tau_1$ and $\Delta\tau'_2 > \Delta\tau'_1$ indicate that the redshift is getting stronger

¹⁰ Later we will see that the light waves received by the exterior observers will have stronger and stronger redshift. Thus, only if we assume (theoretically) that the observer is sensitive to light of any wavelength and intensity can they observe this phenomenon.
[Optional Reading 9.4.1]
Beginners often feel confused by the fact that the spacetime diagrams of the same physical process in different coordinate systems can be so different. The essence of the problem is actually very simple: a coordinate system by definition is nothing but a map from an open set O of the manifold to an open set V of $\mathbb{R}^n$, and a spacetime diagram is a diagram in V. The same physical process can certainly have different spacetime diagrams in different coordinate systems. So we may say that a physical process is absolute, while a spacetime diagram is relative (since a coordinate system is involved). We pointed this out when we first introduced spacetime diagrams in Sect. 6.1.5.
It follows from (9.4.54) that the coordinate basis vectors $(\partial/\partial\tilde{t})^a$ and $(\partial/\partial r)^a$ are not orthogonal. In Fig. 9.19, the $\tilde{t}$-axis and the r-axis are drawn to be orthogonal; this is because the spacetime diagram is a diagram in an open set V of $\mathbb{R}^2$, which does not reflect the spacetime metric, and thus does not reflect the orthogonality of vectors. All Fig. 9.19 tells us is: all the vertical lines have r as a constant, and all the horizontal lines have $\tilde{t}$ as a constant. Only in this way can the two families of null geodesics characterized by $d\tilde{t}/dr = -1$ and $d\tilde{t}/dr = (r + 2M)/(r - 2M)$ be represented as the two families of curves in the diagram.
[The End of Optional Reading 9.4.1]

[Optional Reading 9.4.2]


To simplify the discussion, we consider the simplest model of a star, i.e., a spherically symmetric star with a uniform density and no pressure (i.e., a dust cloud). Since the pressure gradient is zero, the world line of each point on the star’s surface is a radial timelike geodesic. Figure 9.22 shows the situation of a photon emitted from an event p on the star’s surface reaching an exterior observer (event p′). $Z^a$ and $\tilde{Z}^a$ represent the 4-velocities of a radial freely falling observer and a static observer at p, respectively; $Z'^a$ represents the 4-velocity of an exterior static observer at p′. Suppose λ, $\tilde\lambda$ and λ′ are the wavelengths of the same photon measured by $Z^a$, $\tilde{Z}^a$ and $Z'^a$, respectively, then it follows from (9.4.55) that

$$\frac{\lambda'}{\tilde\lambda} = \frac{\chi'}{\chi}, \tag{9.4.55}$$

where

$$\chi \equiv (-\xi^a\xi_a)^{1/2}\big|_{p} = \left[1 - \frac{2M}{r(p)}\right]^{1/2}, \qquad \chi' \equiv (-\xi^a\xi_a)^{1/2}\big|_{p'} = \left[1 - \frac{2M}{r(p')}\right]^{1/2}. \tag{9.4.56}$$

However, the redshift corresponding to Δτ′ > Δτ we mentioned before refers to (λ′ − λ)/λ. Based on (9.4.55), to find λ′/λ one only has to find $\tilde\lambda/\lambda$. In this case, only p is involved, and we can deal with this by using the same approach as in special relativity (see Sects. 7.2 and 7.5). This is essentially a problem of Doppler frequency shift, and so we can use (6.6.66a) directly, in which the γ can be derived as follows:

$$\gamma \equiv -g_{ab} Z^a \tilde{Z}^b = -g_{ab}(\partial/\partial\tau)^a\, \chi^{-1} (\partial/\partial t)^b = \chi^{-1} E.$$

(E is the energy of the timelike geodesic with $Z^a$ as the tangent vector.) Then, from $\gamma = (1 - u^2)^{-1/2}$ we find the 3-speed of $Z^a$ relative to $\tilde{Z}^a$, $u = \sqrt{E^2 - \chi^2}/E$. Plugging this into (6.6.66a) yields

$$\frac{\tilde\lambda}{\lambda} = \frac{E + \sqrt{E^2 - \chi^2}}{\chi}. \tag{9.4.57}$$

The above equation indicates that when p approaches the event horizon arbitrarily closely, there exists an infinite (Doppler) redshift between the wavelengths measured by the observers $Z^a$ and $\tilde{Z}^a$. Combining (9.4.55) and (9.4.57) we can find λ′/λ:

$$\frac{\lambda'}{\lambda} = \frac{\chi'\left(E + \sqrt{E^2 - \chi^2}\right)}{\chi^2}. \tag{9.4.58}$$

The above equation can be viewed as a combination of the Doppler redshift and the gravitational redshift. By means of this equation we can also give a proof of a conclusion we claimed before: for Fig. 9.21 we have Δτ′ > Δτ, and Δτ′/Δτ increases as τ increases. Since Δτ and Δτ′ can be interpreted as the periods of the light wave when it is emitted and received, we have

$$\frac{\Delta\tau'}{\Delta\tau} = \frac{\chi'\left(E + \sqrt{E^2 - \chi^2}\right)}{\chi^2} > \frac{\chi'}{\chi^2}\, E > E. \tag{9.4.59}$$

The E in the above equation is the energy of the world line (geodesic) of a point on the surface of the collapsing star, i.e.,

$$E = -g_{ab}(\partial/\partial t)^a (\partial/\partial\tau)^b = -\chi\, g_{ab} \tilde{Z}^a (\partial/\partial\tau)^b = \chi\gamma,$$

where $\gamma \equiv -g_{ab}\tilde{Z}^a(\partial/\partial\tau)^b$. Extend this geodesic backwards to r = ∞, then χ = 1 while γ is still greater than (or equal to) 1, and hence E ≥ 1. From (9.4.59) we see that Δτ′ > Δτ, and Δτ′/Δτ increases when χ decreases, and thus Δτ′/Δτ increases as τ increases. That is, as the star collapses, the light emitted from its surface will have a stronger and stronger redshift when it reaches an exterior observer.
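The growth of the combined redshift can be made concrete with a few numbers. The sketch below evaluates (9.4.58) as the product of the Doppler factor (9.4.57) and the gravitational factor (9.4.55) for an emitter that falls from rest at infinity (E = 1, an illustrative choice) toward r = 2M; the function names and sample radii are hypothetical.

```python
import math

def chi(r, M=1.0):
    """Redshift factor (1 - 2M/r)^(1/2) of a static observer at r > 2M."""
    return math.sqrt(1 - 2 * M / r)

def doppler_factor(r_emit, E=1.0, M=1.0):
    """lambda~/lambda from Eq. (9.4.57), for a geodesic of energy E >= 1."""
    x = chi(r_emit, M)
    return (E + math.sqrt(E * E - x * x)) / x

def gravitational_factor(r_emit, r_recv, M=1.0):
    """lambda'/lambda~ from Eq. (9.4.55)."""
    return chi(r_recv, M) / chi(r_emit, M)

def wavelength_ratio(r_emit, r_recv, E=1.0, M=1.0):
    """lambda'/lambda from Eq. (9.4.58)."""
    x, xp = chi(r_emit, M), chi(r_recv, M)
    return xp * (E + math.sqrt(E * E - x * x)) / (x * x)

if __name__ == "__main__":
    r_recv = 100.0  # exterior static observer (illustrative radius, units of M)
    for r_emit in (10.0, 4.0, 2.5, 2.01, 2.0001):
        # The ratio grows without bound as the star's surface nears r = 2M.
        print(r_emit, wavelength_ratio(r_emit, r_recv))
```

Because χ decreases monotonically along the infalling surface, the printed ratio increases at each step, matching the conclusion drawn from (9.4.59).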
[The End of Optional Reading 9.4.2]

Exercises

~9.1. Consider Taub’s plane symmetric static spacetime, whose line element is (8.6.1′). By means of the Killing vector fields, write down the decoupled equations satisfied by the parametrization t(τ), x(τ), y(τ) and z(τ) of the timelike geodesic γ(τ) (the reader may refer to Sect. 9.1).
9.2. In Newton’s theory of gravity, derive (9.3.18) directly using Fig. 9.8.
~9.3. Show that the OV equation of hydrostatic equilibrium (9.3.17) can be rewritten as

$$\left[1 - \frac{2m(r)}{r}\right]^{1/2} \frac{dp}{dr} = -(\rho + p)\, g, \tag{9.4.60}$$

where g represents the magnitude of the 4-acceleration $U^b\nabla_b U^a$ of a fluid particle.
Remark. Under the Newtonian approximation $[1 - 2m(r)/r]^{1/2} \cong 1$, $p \cong 0$, and (9.4.60) becomes $dp/dr \cong -\rho g$. Also, $g \cong m(r)/r^2$, and hence we get (9.3.18), i.e., $dp/dr \cong -\rho\, m(r)/r^2$.
~9.4. Show that (9.3.26) approximately goes back to (9.3.23) of Newton’s theory of gravity when R ≫ M.
~9.5. Find the relation between the Rindler coordinates t, x and the Lorentzian coordinates T, X in Minkowski spacetime.
~9.6. Which Killing vector field in Minkowski spacetime is the timelike Killing vector field $(\partial/\partial t)^a$ in Rindler spacetime?
~9.7. Find the magnitude $A \equiv (A^a A_a)^{1/2}$ of the 4-acceleration of a static observer in Schwarzschild spacetime. Hint: one may use the conclusion of Exercise 8.3, i.e., $A_a = \nabla_a \ln\chi$.
~9.8. Name the null geodesic represented by $N_1$ (or $N_2$) in Fig. 9.13a as $N_1$ (or $N_2$) for short. Show that: (1) the coordinate V (or U) is an affine parameter of the null geodesic $N_1$ (or $N_2$); (2) the coordinate r is an affine parameter of a radial null geodesic other than $N_1$ and $N_2$.
~9.9. By introducing coordinates similar to the Kruskal coordinates, eliminate the coordinate singularity of the following line element at r = R:

$$ds^2 = -(1 - r^2/R^2)\, dt^2 + (1 - r^2/R^2)^{-1} dr^2 + r^2(d\theta^2 + \sin^2\theta\, d\varphi^2), \qquad R = \text{constant}.$$

9.10. Show that the maximally extended Schwarzschild spacetime has an s.p. curvature singularity. Hint: use (8.3.21).
9.11. Show that the $N_1$ in Fig. 9.13 is a null hypersurface. Hint: one only has to show that its normal vector $n^a$ is null. Note that the equation of $N_1$ is U = 0, and hence its normal covector is $n_a \equiv \nabla_a U$.
9.12. Derive (9.4.51) from (9.4.50), and then derive (9.4.54).
~9.13. Write down the expression for the line element of the Schwarzschild metric in the outgoing Eddington-Finkelstein coordinate system {u, r, θ, ϕ} ($u \equiv t - r_*$).
*9.14. Show that the $\xi^a$ defined in terms of $(\partial/\partial V)^a$ and $(\partial/\partial U)^a$ [see (9.4.40)] is a null Killing vector field on $N_1$ and $N_2$.
*9.15. Transform Figs. 9.21, 9.22 and 9.23. Give another derivation for (9.4.58) by calculating the Δτ′/Δτ in the figure. Hints: (1) $U \equiv -e^{(r_* - t)/4M}$ is a constant on each outgoing null geodesic. Along the world line of an exterior static observer and the world line of a freely falling observer on the star’s surface, derive two expressions for the same dU (with dτ′ and dτ respectively). Then putting an equal sign between the two expressions yields (9.4.58). (2) When writing the expression of dU in terms of dτ one needs to use the formulae of dt/dτ and dr/dτ expressed by the energy E, which can be obtained using the approach in Sect. 9.1.
Fig. 9.22 The relation between the wavelengths measured by $Z'^a$ and $\tilde{Z}^a$ is gravitational redshift; the relation between the wavelengths measured by $Z^a$ and $\tilde{Z}^a$ is Doppler redshift

Fig. 9.23 Another method of deriving (9.4.58) (see Exercise 9.15)

References

Chandrasekhar, S. (1939), An Introduction to the Study of Stellar Structure, University of Chicago
Press, Chicago.
Geroch, R. P. (1968), ‘What is a spacetime singularity in general relativity’, Ann. Phys. 48, 526–540.
Hawking, S. W. and Ellis, G. F. R. (1973), The Large Scale Structure of Space-Time, Cambridge
University Press, Cambridge.
Kruskal, M. D. (1960), ‘Maximal extension of Schwarzschild metric’, Phys. Rev. 119, 1743–1745.
Misner, C., Thorne, K. and Wheeler, J. (1973), Gravitation, W H Freeman and Company, San
Francisco.
Ni, W.-T. (2005), ‘Empirical foundations of relativistic gravity’, Int. J. Mod. Phys. D 14, 901–922.
arXiv:gr-qc/0504116.
Ni, W.-T. (2016), ‘Solar-system tests of the relativistic gravity’, Int. J. Mod. Phys. D 25(14), 1630003.
arXiv:1611.06025.
Sachs, R. K. and Wu, H. (1977), General Relativity for Mathematicians, Springer-Verlag, New York.
Shapiro, S. S., Davis, J. L., Lebach, D. E. and Gregory, J. S. (2004), ‘Measurement of the solar
gravitational deflection of radio waves using geodetic very-long-baseline interferometry data,
1979-1999’, Phys. Rev. Lett. 92, 121101.
Wald, R. M. (1984), General Relativity, The University of Chicago Press, Chicago.
Weinberg, S. (1972), Gravitation and Cosmology: Principles and Applications of the General
Theory of Relativity, John Wiley and Sons, New York.
Will, C. M. (2018), Theory and Experiment in Gravitational Physics, Cambridge University Press,
Cambridge.
Chapter 10
Cosmology I

Thinking about the universe began at the dawn of mankind and has been full of mystery,
imagination, and wisdom. Almost every sage has thought about, talked about, and
drawn conclusions concerning the universe. However, it was only after the development
of general relativity that cosmology became a genuine science. From the point of view
of general relativity, the universe is the maximal spacetime containing everything in
Nature, with its curvature on large scales and a distribution of matter satisfying the
Einstein field equation.
Among the various branches of physics, cosmology is the most special one in the
following sense: the object it concerns is unique—our universe. There are no other
objects that could be compared with the universe. It is impossible to do experiments
again and again as is done in other branches of physics, because the evolution of the
universe cannot be replayed. The only way to study cosmology is to accumulate data
from observations, to develop cosmological models in order to interpret these data,
to speculate on the unknown past history of the universe, and to predict its future.
There are many cosmological models. In this chapter only the most widely accepted one
is introduced, known as the standard cosmological model¹ for its notable success.
There are still various problems in the standard model. Hence, it has been continuously
amended ever since its birth. For example, an important amendment is the
insertion of an “inflation” period in the very early universe. Furthermore, observations
in 1998 showed that the universe is currently undergoing an accelerating expansion,
which also requires that the standard model be amended. A new
standard cosmological model is in development, although there are still open questions.
We will introduce the inflationary model and the new standard cosmological
model in Volume II.

¹ We will refer to the standard cosmological model as the “standard model” for short when there is
no confusion with the Standard Model of particle physics.

© Science Press 2023
C. Liang and B. Zhou, Differential Geometry and General Relativity,
Graduate Texts in Physics, https://doi.org/10.1007/978-981-99-0022-0_10

Cosmology is an actively progressing subject. Therefore, even though the conclusions
and data were being updated at the time of writing this book, they may
already be out of date by the time the book is published. For the latest progress
in cosmology, readers should refer to the current literature.

10.1 Kinematics of the Universe

10.1.1 Cosmological Principle

A fundamental postulate in the standard cosmological model, as well as other models,
is the cosmological principle: at each moment, the cosmic space is homogeneous
and isotropic when viewed on a very large scale.
Homogeneity of the cosmic space means that the physical properties are the same
at every point in the space, with no single point being more special than any other.
This is, of course, incorrect on ordinary scales, for stars are found here and there in the
universe, but not everywhere. In fact, matter in the universe tends to clump:
matter accumulates into stars, stars accumulate into galaxies (a galaxy contains
approximately $10^6$ to $10^{13}$ stars; the Milky Way is a quite ordinary galaxy,
consisting of several hundred billion stars), and some galaxies accumulate into
clusters of galaxies of various sizes, or even superclusters. However, based on
observations on scales of $10^{10}$ ly (where ly denotes light-year) or larger, it is believed
that the universe is homogeneous on a very large scale (one greater than $3 \times 10^8$ ly,
referred to as a cosmological scale, which is large enough to contain many clusters of
galaxies), with the density being constant from point to point. Here the density refers
to the average density over a cosmological scale, that is, the density obtained by smoothing
matter over a volume that is small enough when viewed on a cosmological scale.
Smoothing is a common trick in physics. For example, matter is not continuously
distributed on microscopic scales (mainly concentrated in atomic nuclei), but it is
regarded as being continuously distributed when viewed on macroscopic scales, so
that a macroscopic density can be defined. If the macroscopic density is uniform from
point to point, in a certain volume, then we say that the matter is homogeneously
distributed in this volume. Here, a “point” in the volume is a “macroscopic point”,
which is a volume that is small enough on macroscopic scales and large enough on
microscopic scales, containing a large number of molecules.
By isotropy, we mean that there is a reference frame such that, for any observer
in it, all directions appear the same, with no single direction being more special
than others. This is also not true on ordinary scales, since we may see a star in
one direction, but no star in another direction. However, it is accepted that there
exists such a reference frame, as described above, if the universe is smoothed on a
cosmological scale.
The postulate that cosmic space is homogeneous and isotropic at a large scale
was suggested by Albert Einstein in 1917, when he applied general relativity to
cosmology. At that time, there was little observational data to support it. He suggested
such a postulate in order to simplify the discussion. [Mach’s principle also contributed
to the development of this postulate, see Peebles (1993) pp. 10–16.]

Fig. 10.1 A foliation of the spacetime; each slice represents the space at a certain time
The cosmological principle concerns the properties of the cosmic space. In pre-
relativity physics, the word “space” is rather simple to understand. In relativity,
however, it is not that simple. The key point is that spacetime is an absolute object
in relativity, while space and time are related to a decomposition by an observer.
For the same spacetime, different space and time are the result of different 3 + 1
decompositions. To provide a precise interpretation of “spatial homogeneity”, one
must first have a clear definition of “space”.
In pre-relativity physics, each surface $\Sigma$ of absolute simultaneity is the “whole
space” at a certain moment, see Fig. 6.10. In this sense the concept of space (as an
absolute concept) is rather simple. In special relativity, given an inertial reference
frame $\{t, x, y, z\}$, a constant-$t$ surface $\Sigma_t$ is the whole space at the time $t$ relative
to this inertial reference frame. In the above cases, each spacetime is foliated by
constant-$t$ surfaces. (This means that, for each spacetime point $p$, there is exactly
one constant-$t$ surface $\Sigma_t$ such that $p \in \Sigma_t$.) In the former case, the foliation (slicing)
is absolute (i.e., the foliation is unique, as shown in Fig. 10.1a), while in the latter,
the foliation is relative (i.e., the foliations for different reference frames are different,
as shown in Fig. 10.1b).
In special relativity, the foliation relative to a given inertial reference frame has
the following properties:
① Each leaf (slice) is a connected spacelike hypersurface.
② The set $\{\Sigma_t\}$ of all leaves of the foliation is a 1-parameter family. That is, each
real number $t$ corresponds to a unique leaf $\Sigma_t$ in the set, whose $t$ is the coordinate
time of this leaf relative to the given inertial reference frame. Thus a leaf is also
called a specific “time”.
In general relativity, there are no global inertial reference frames in a curved
spacetime. Instead, any foliation similar to the above is acceptable, with each leaf of
the foliation identified with a time. To be more precise, in cosmology, a foliation (or
slicing) $\{\Sigma_t\}$ of a spacetime $(M, g_{ab})$ is characterized by a smooth function $\tau$ on $M$
satisfying the following conditions:
① $(d\tau)_a$ is a timelike 1-form with $(d\tau)_a Z^a > 0$ for any future-directed timelike
vector field $Z^a$. It follows that each constant-$\tau$ surface is a spacelike hypersurface.
② For any real number $t$, the corresponding constant-$\tau$ surface, denoted by
$\Sigma_t \equiv \{p \in M \mid \tau(p) = t\}$, is either empty or connected.
③ For each pair of real numbers $t$ and $t'$, if both $\Sigma_t$ and $\Sigma_{t'}$ are not empty, they
must be diffeomorphic to each other.
It is easy to see that, for each $p \in M$, there exists a unique $t$ such that $p \in \Sigma_t$.
In fact, $p \in \Sigma_{\tau(p)}$. Each surface $\Sigma_t$ is called a “time”, which is a leaf (or slice) of
the foliation, see Fig. 10.1c.
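Condition ① can be checked numerically in the simplest case. The following is a minimal sketch, assuming Minkowski spacetime with signature (−, +, +, +) and units with c = 1; the function names and the sample velocities are illustrative choices, not taken from the text. It takes τ = γ(t − vx), the coordinate time of an inertial frame moving with velocity v along x, and confirms that its differential is a timelike 1-form with positive contraction on a few future-directed timelike vectors. Different values of v give different, equally acceptable foliations, as in Fig. 10.1b.

```python
import math

def contract(one_form, vector):
    """Contract a 1-form with a vector: omega_a Z^a."""
    return sum(w * z for w, z in zip(one_form, vector))

def norm_squared_one_form(one_form):
    """g^{ab} omega_a omega_b for the Minkowski metric, signature (-,+,+,+)."""
    return -one_form[0] ** 2 + sum(w ** 2 for w in one_form[1:])

def d_tau(v):
    """Components in {t, x, y, z} of d(tau) for tau = gamma * (t - v * x)."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return (gamma, -gamma * v, 0.0, 0.0)

def future_directed_timelike(speed, direction):
    """A future-directed timelike vector (1, speed * direction), with |speed| < 1."""
    return (1.0,) + tuple(speed * n for n in direction)

# Hypothetical sample of inertial frames (velocities) and test vectors.
frames = [-0.9, 0.0, 0.5, 0.9]
observers = [future_directed_timelike(s, n)
             for s in (0.0, 0.5, 0.99)
             for n in ((1, 0, 0), (-1, 0, 0), (0, 1, 0))]

for v in frames:
    one_form = d_tau(v)
    timelike = norm_squared_one_form(one_form) < 0
    positive = all(contract(one_form, z) > 0 for z in observers)
    print(f"v = {v:+.1f}: d(tau) timelike: {timelike}, (d tau)_a Z^a > 0: {positive}")
```

A finite sample of vectors is of course not a proof; the general statement follows from γ(Z^t − v Z^x) > 0 whenever |v| < 1 and |Z^x| < Z^t.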
To be specific, the foliation $\{\Sigma_t\}$ above is also called the foliation associated
with $\tau$. Let $\{\Sigma'_t\}$ be a foliation of $M$ associated with a function $\tau'$. Then, since
both $\tau$ and $\tau'$ are smooth maps from the connected manifold $M$ to $\mathbb{R}$, and since both
$(d\tau)_a$ and $(d\tau')_a$ are nonvanishing on $M$, the images $I \equiv \tau[M]$ and $I' \equiv \tau'[M]$ are
both open intervals. In this chapter, we regard $\{\Sigma_t\}$ and $\{\Sigma'_t\}$ as the same (or an
equivalent) foliation if there is a diffeomorphism $f : I \to I'$ such that $\tau' = f \circ \tau$. In other
words, the foliations $\{\Sigma_t\}$ and $\{\Sigma'_t\}$ are equivalent if they differ from each other by
a reparametrization. Later, when we say a spacetime admits a unique homogeneous
foliation, it will be in this sense.
For a more detailed discussion on the topics of foliation as well as 3 + 1 decom-
position, the reader may refer to Sect. 14.4 in Volume II.
In relativity, acceptable foliations for a spacetime satisfying the above conditions
are not unique. This is the origin of the arbitrariness in the concepts of space
and time in relativity. If the spacetime has some nontrivial symmetries, foliations adapted to
these symmetries are more convenient. In fact, as a special case with zero curvature,
Minkowski spacetime admits various foliations, among which a foliation consisting
of surfaces of simultaneity relative to an inertial reference frame is the most widely used,
because it is associated with the symmetries of Minkowski spacetime.
Spatial homogeneity and spatial isotropy of a spacetime are closely related to
some intrinsic symmetries of the spacetime. By spatial homogeneity, we refer to the
existence of a foliation of the spacetime such that, at all points in the same arbitrary
leaf of the foliation, the geometric properties and physical properties are the same.
Hence each leaf in this foliation is called a surface of homogeneity. Such a foliation
is adapted to the intrinsic symmetries. Although other foliations are acceptable, they
are not convenient for use in that some of their leaves are not surfaces of homogeneity.
Unless otherwise stated, when we talk about spaces in cosmology they are all surfaces
of homogeneity. Spatial isotropy is relative to a reference frame. If there exists a
reference frame in spacetime where no observer can find any spatial direction (a
direction orthogonal to the world line) distinct from other spatial directions in a local
experiment at a certain time, then we say that the spacetime is spatially isotropic.

10.1.2 Spatial Geometries of the Universe

In this subsection we will show that on each surface of homogeneity, there are
only three kinds of possible geometry satisfying the cosmological principle. This
conclusion will significantly simplify the subsequent discussion.
The cosmological principle assumes that the universe is spatially homogeneous
and spatially isotropic in both physics and geometry. To be precise, and for
convenience in discussing the spatial geometry of the universe, we shall first define
some geometric concepts, including spatial homogeneity and spatial isotropy, using
mathematical language.

Fig. 10.2 Figure for defining spatial homogeneity
Definition 1 A generalized Riemannian space $(M, g_{ab})$ is said to be homogeneous
if, for any $p, q \in M$, there exists an isometry $\phi : M \to M$ of $g_{ab}$ such that $\phi(p) = q$.
An embedded submanifold $i : S \to M$ is said to be homogeneous if $(S, h_{ab})$
is a homogeneous generalized Riemannian space, where $h_{ab} = i^* g_{ab}$ is the induced
metric. For convenience, we will also refer to the image $i[S]$ as a homogeneous
submanifold of $M$.
A foliation $\{\Sigma_t\}$ of a spacetime $(M, g_{ab})$ is called a homogeneous foliation if
each leaf in it is a homogeneous submanifold.
A spacetime $(M, g_{ab})$ is said to be spatially homogeneous if it admits a homogeneous
foliation $\{\Sigma_t\}$ (see Fig. 10.2). Each $\Sigma_t$ in the foliation is called a surface of
homogeneity.

Definition 2 A reference frame $\mathscr{R}$ in a spacetime $(M, g_{ab})$ is said to be spatially
isotropic, or isotropic for short, if for any point $p$ on the world line of any observer
(with a 4-velocity $Z^a$) in $\mathscr{R}$, and for any two spatial vectors $w_1^a$ and $w_2^a$ at $p$ with the
same magnitude, there exists an isometry $\psi : M \to M$ of $g_{ab}$, such that $\psi(p) = p$,
$\psi_* Z^a = Z^a$ and $\psi_* w_1^a = w_2^a$ (see Fig. 10.3).
An observer in an isotropic reference frame is called an isotropic observer.
A spacetime admitting an isotropic reference frame is called an isotropic spacetime.
In terms of scale, a galaxy is to the universe as a drop is to the ocean.
Thus, a galaxy is treated as a world line in the spacetime. One may conjecture
that every galaxy is an isotropic observer (observations indicate that this conjecture
is almost true, with only a tiny deviation). In the following, galaxies will refer to
isotropic observers unless stated otherwise.

Fig. 10.3 Figure for defining an isotropic observer
For a spacetime that is both spatially homogeneous and isotropic, it is natural to
ask about the relation between $\mathscr{R}$, its isotropic reference frame, and $\Sigma_t$, a surface
of homogeneity. We certainly hope that the world lines of the observers in $\mathscr{R}$ are
orthogonal to $\Sigma_t$. However, this is not necessarily true for Minkowski spacetime.
This is because any inertial reference frame in Minkowski spacetime is isotropic,
and the family of surfaces of simultaneity relative to an inertial reference frame
is a homogeneous foliation, but the world line of an observer stationary in one
inertial reference frame is not orthogonal to the surface of simultaneity relative
to another inertial reference frame. However, if a spacetime that is both spatially
homogeneous and isotropic possesses a unique family of surfaces of homogeneity,
then the orthogonality holds. See Proposition 10.1.1, which follows.

Proposition 10.1.1 If an isotropic spacetime $(M, g_{ab})$ has a unique homogeneous
foliation $\{\Sigma_t\}$, then the world line of an isotropic observer is orthogonal to each leaf
in the homogeneous foliation.
[Optional Reading 10.1.1]
Before giving the proof of Proposition 10.1.1, we first prove Lemma 10.1.2 and Proposi-
tion 10.1.3.

Lemma 10.1.2 Suppose $\Sigma \subset M$ is a homogeneous submanifold of the generalized
Riemannian space $(M, g_{ab})$. Let $\psi : M \to M$ be an arbitrary isometry of $g_{ab}$, then $\psi[\Sigma]$ is
also homogeneous.

Proof For clarity, we will distinguish an embedded submanifold (as a map) and its image
in this proof. Consider a generalized Riemannian space $(S, h_{ab})$ so that the map $i : S \to M$
is the embedded submanifold, whose image $i[S] = \Sigma$. Then $h_{ab} = i^* g_{ab}$. Since $\psi$ is an
isometry, according to Theorem 4.4.5, $i' = \psi \circ i : S \to M$ is also an embedded submanifold
of $M$, and the corresponding induced metric $h'_{ab} = i'^* g_{ab}$ is equal to $h_{ab}$.
The image $\Sigma$ of $i : S \to M$ being homogeneous means that $(S, h_{ab})$ is a homogeneous
generalized Riemannian space. Now that $h'_{ab} = h_{ab}$, $i' : S \to M$ is also a homogeneous
submanifold. It is easy to see that $\psi[\Sigma]$ is the image of $i' : S \to M$. Therefore, $\psi[\Sigma]$ is also
homogeneous.

Proposition 10.1.3 Suppose $\{\Sigma_t\}$ is a homogeneous foliation of a spacetime $(M, g_{ab})$
associated with a function $\tau$. Let $\psi : M \to M$ be an isometry of $(M, g_{ab})$. If $\psi$ preserves the
future direction (time orientation), then $\{\psi[\Sigma_t]\}$ is a homogeneous foliation of $(M, g_{ab})$
associated with the function $(\psi^{-1})^* \tau$; otherwise, $\{\psi[\Sigma_t]\}$ is a homogeneous foliation of
$(M, g_{ab})$ associated with the function $-(\psi^{-1})^* \tau$.

Proof First, we show that $\{\psi[\Sigma_t]\}$ satisfies the three conditions of a foliation. Let
$\tau' = (\psi^{-1})^* \tau$, then $\tau'$ is a smooth function on $M$. Since $\psi$ is an isometry,
$(d\tau')_a = (\psi^{-1})^* (d\tau)_a$ is nonvanishing on $M$. For an arbitrary future-directed timelike
vector field $Z^a$ on $M$ and any $p \in M$,

$$[(d\tau')_a Z^a]_p = [(\psi^{-1})^* (d\tau)_a](Z^a|_p) = (d\tau)_a [(\psi^{-1})_* (Z^a|_p)] .$$

If the isometry $\psi$ preserves the future direction, so does $\psi^{-1}$. Thus, $(\psi^{-1})_* (Z^a|_p)$ is a
future-directed timelike vector at $\psi^{-1}(p)$. It follows that $(d\tau)_a (\psi^{-1})_* (Z^a|_p) > 0$, i.e.,
$[(d\tau')_a Z^a]_p > 0$.
For any $p \in M$, $\tau'|_p = (\tau \circ \psi^{-1})|_p = \tau|_{\psi^{-1}(p)}$. Therefore, $\forall t \in \mathbb{R}$, the necessary and
sufficient condition for $p \in \psi[\Sigma_t]$ is $\psi^{-1}(p) \in \Sigma_t$. The latter is equivalent to
$t = \tau|_{\psi^{-1}(p)} = \tau'|_p$. Hence, $p \in \psi[\Sigma_t]$ if and only if $p$ is in the constant-$\tau'$ surface with $\tau' = t$. That is,
$\forall t \in \mathbb{R}$, $\psi[\Sigma_t]$ is a constant-$\tau'$ surface. Since $\{\Sigma_t\}$ is a foliation, each leaf is either empty or
connected. And since $\psi$ is a diffeomorphism, we see that $\psi[\Sigma_t]$ is either empty or connected.
$\{\Sigma_t\}$ being a foliation also means that any two leaves $\Sigma_t$ and $\Sigma_{t'}$ are diffeomorphic to
each other. Since $\psi$ is a diffeomorphism, it is easy to see that $\psi[\Sigma_t]$ and $\psi[\Sigma_{t'}]$ are also
diffeomorphic to each other. Therefore, $\{\psi[\Sigma_t]\}$ is a foliation of $(M, g_{ab})$ associated with
the function $\tau' = (\psi^{-1})^* \tau$.
Since $\{\Sigma_t\}$ is a homogeneous foliation, each $\Sigma_t \in \{\Sigma_t\}$ is a homogeneous hypersurface of
$M$. Then, according to Lemma 10.1.2, $\psi[\Sigma_t]$ is homogeneous. As a consequence, $\{\psi[\Sigma_t]\}$
is a homogeneous foliation.
In a similar manner, one can show that if $\psi$ does not preserve the future direction, then
$\{\psi[\Sigma_t]\}$ is a homogeneous foliation associated with the function $-(\psi^{-1})^* \tau$.

Proof of Proposition 10.1.1 Suppose, for contradiction, that $p$ is a point on the world line of an
isotropic observer $G$, and that the surface of homogeneity $\Sigma$ containing $p$ is not orthogonal to the
4-velocity $Z^a$ of $G$ at $p$ (see Fig. 10.4). Suppose $V_p$ is the 4-dimensional tangent space at $p$,
and $W_p$ is the linear subspace of $V_p$ orthogonal to $Z^a$. Then the elements in $W_p$ are spatial
vectors (with respect to $G$) at $p$. Let $w_1^a \in W_p$ be a unit vector tangent to $\Sigma$. Since $\Sigma$ is not
orthogonal to $Z^a$, there also exists a unit vector $w_2^a \in W_p$ that is not tangent to $\Sigma$.
Let $\psi$ be an arbitrary isometry of $(M, g_{ab})$ such that $\psi(p) = p$ and $\psi_* Z^a = Z^a$. Then,
according to Proposition 10.1.3, $\{\Sigma_t\}$ being a homogeneous foliation assures that $\{\psi[\Sigma_t]\}$
is also a homogeneous foliation. Due to the uniqueness of homogeneous foliations, we have
$\{\psi[\Sigma_t]\} = \{\Sigma_t\}$. In particular, for the leaf $\Sigma$ containing $p$, $\psi[\Sigma]$ is also a leaf in the same
foliation, which contains $\psi(p) = p$. Hence, $\psi[\Sigma] = \Sigma$. Since $w_1^a$ is tangent to $\Sigma$, $\psi_* w_1^a$
is tangent to $\psi[\Sigma] = \Sigma$, and thus $\psi_* w_1^a \neq w_2^a$. Consequently, there is no isometry $\psi$ such
that $\psi(p) = p$, $\psi_* Z^a = Z^a$ and $\psi_* w_1^a = w_2^a$, which contradicts the fact that $G$ is an isotropic observer.

[The End of Optional Reading 10.1.1]

Corollary 10.1.4 If an isotropic spacetime has a unique homogeneous foliation,
then
(1) its isotropic reference frame is unique;
(2) an isometry of the spacetime maps one isotropic observer to another.

Fig. 10.4 Figure for the proof of Proposition 10.1.1

In addition to the cosmological principle, it is also assumed that the homogeneous
foliation of the cosmic spacetime (i.e., the universe) is unique. Therefore, the world
lines of isotropic observers are orthogonal to the surfaces of homogeneity, and the
unique isotropic reference frame is called the cosmic rest frame (it corresponds to
the comoving reference frame in Sect. 6.5). In this way there is a unique “orthogonal
3 + 1 decomposition” of the cosmic spacetime, in which each surface of homogeneity
is the whole space at a time, and each world line of an isotropic observer (which is
orthogonal to the surfaces of homogeneity) represents the whole history of a spatial
point. Now we will discuss the 3-dimensional geometry of the space (surface of
homogeneity) at an arbitrary time.

Proposition 10.1.5 Suppose $h_{ab}$ is the metric on a surface $\Sigma_t$ of homogeneity
induced by the metric $g_{ab}$ of the cosmic spacetime. Let $\hat{R}_{abc}{}^{d}$ be the curvature tensor
of $h_{ab}$, and $\hat{R}_{abcd} \equiv h_{de} \hat{R}_{abc}{}^{e}$, then there exists a constant $K$ such that

$$\hat{R}_{abcd} = 2K h_{c[a} h_{b]d} . \tag{10.1.1}$$

Proof Let $\Lambda_p(2)$ be the collection of all the 2-forms on $\Sigma_t$ at an arbitrary $p \in \Sigma_t$,
where $\Sigma_t$ is treated as an independent 3-dimensional manifold. According to
Theorem 5.1.3, $\Lambda_p(2)$ is a 3-dimensional vector space. Let $\hat{R}_{ab}{}^{cd} \equiv h^{ce} \hat{R}_{abe}{}^{d}$, then,
$\forall Y_{cd} \in \Lambda_p(2)$, we have $\hat{R}_{ab}{}^{cd} Y_{cd} \in \Lambda_p(2)$. Hence, $\hat{R}_{ab}{}^{cd}$ is a linear map from $\Lambda_p(2)$
to itself, namely a linear operator on $\Lambda_p(2)$. It follows from the symmetry of the
curvature tensor, $\hat{R}_{abcd} = \hat{R}_{cdab}$, that $\hat{R}_{ab}{}^{cd}$ can be regarded as a symmetric operator²
$L$ on $\Lambda_p(2)$, i.e., the corresponding matrix of $L$ in an orthonormal basis of $\Lambda_p(2)$ is
symmetric (for a proof, see Optional Reading 10.1.2). Hence, $L$ is diagonalizable, i.e.,
one can choose a basis of $\Lambda_p(2)$ such that each vector in this basis is an eigenvector
of $L$. Due to the isotropy and the uniqueness of the homogeneous foliation, it can be
proved that all the eigenvalues of $L$ are equal (see Optional Reading 10.1.2). Thus,
$L$ can only be a scalar multiple of the identity map $I$ on $\Lambda_p(2)$, with some real number $2K$
as the factor. That is,

$$L = 2K I , \qquad K \in \mathbb{R} . \tag{10.1.2}$$

Also, it follows from

$$\delta_a{}^{[c} \delta_b{}^{d]} Y_{cd} = \delta_a{}^{c} \delta_b{}^{d} Y_{[cd]} = \delta_a{}^{c} \delta_b{}^{d} Y_{cd} = Y_{ab}$$

that the identity map $I$ on $\Lambda_p(2)$ corresponds to the tensor $\delta_a{}^{[c} \delta_b{}^{d]}$. Thus, (10.1.2)
can be rewritten as an equality of tensors

$$\hat{R}_{ab}{}^{cd} = 2K \delta_a{}^{[c} \delta_b{}^{d]} . \tag{10.1.3}$$

So far the above equation is valid at a point $p \in \Sigma_t$. Since $p$ is arbitrary, (10.1.3)
holds pointwise on $\Sigma_t$ with $K$ being a scalar field. On account of spatial homogeneity,
$K$ is required to be a constant. Then, contracting both sides of (10.1.3) with $h_{ce} h_{df}$
and noticing that $h_{ce} h_{df} \hat{R}_{ab}{}^{cd} = \hat{R}_{abef}$, we obtain (10.1.1).

² In linear algebra, a linear operator $L$ on a real vector space $V$ is said to be symmetric or
self-adjoint if $(u, Lv) = (Lu, v)$ for arbitrary $u, v \in V$. More discussion on the self-adjointness of
linear operators will be given in Appendix B in Volume II.
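The claim that $\delta_a{}^{[c} \delta_b{}^{d]}$ acts as the identity on 2-forms, so that (10.1.3) multiplies every 2-form by $2K$, can be checked with a few lines of code. This is a minimal sketch, assuming component arrays in a 3-dimensional basis; the function names and the sample matrices are illustrative, not from the text.

```python
def delta(i, j):
    """Kronecker delta."""
    return 1.0 if i == j else 0.0

def identity_on_two_forms(a, b, c, d):
    """Components of the antisymmetrized product delta_a^[c delta_b^d]."""
    return 0.5 * (delta(a, c) * delta(b, d) - delta(a, d) * delta(b, c))

def apply_curvature(K, Y):
    """Return R_ab^cd Y_cd for R_ab^cd = 2 K delta_a^[c delta_b^d], as in (10.1.3)."""
    n = len(Y)
    return [[sum(2.0 * K * identity_on_two_forms(a, b, c, d) * Y[c][d]
                 for c in range(n) for d in range(n))
             for b in range(n)]
            for a in range(n)]

# An antisymmetric array (a 2-form) and, for contrast, a symmetric one.
two_form = [[0.0, 1.0, -2.0],
            [-1.0, 0.0, 3.0],
            [2.0, -3.0, 0.0]]
symmetric = [[1.0, 2.0, 3.0],
             [2.0, 4.0, 5.0],
             [3.0, 5.0, 6.0]]

print(apply_curvature(0.5, two_form))   # 2K = 1: the 2-form is reproduced
print(apply_curvature(0.5, symmetric))  # symmetric part is projected out
```

With $2K = 1$ the operator returns the 2-form unchanged, while a symmetric array is sent to zero, which is why the operator is only meaningful on $\Lambda_p(2)$.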
[Optional Reading 10.1.2]
Now we prove the following two conclusions we used in the proof above:
(1) For each $p \in \Sigma_t$, when $\hat{R}_{ab}{}^{cd}$ is regarded as a linear operator on $\Lambda_p(2)$, it is a
symmetric operator $L$.
(2) $L = 2K I$ with $K \in \mathbb{R}$, where $I$ is the identity map on $\Lambda_p(2)$.
Before talking about (2), let us first consider an $n$-dimensional vector space $V$ over $\mathbb{R}$
with an inner product $(\cdot, \cdot)$. According to the theory of linear algebra, for any symmetric
operator $L$ on $V$, there exists an orthonormal basis of $V$ formed by the eigenvectors of
$L$. $\Lambda_p(2)$ is a 3-dimensional vector space over $\mathbb{R}$. In order to apply the above conclusion
to $\Lambda_p(2)$ and its linear operator $\hat{R}_{ab}{}^{cd}$, an inner product must be defined on $\Lambda_p(2)$: for
any $X_{ab}, Y_{ab} \in \Lambda_p(2)$, the inner product is defined by $(X, Y) := X^{ab} Y_{ab}$, where
$X^{ab} \equiv h^{ac} h^{bd} X_{cd}$. It is obvious that the inner product $(\cdot, \cdot)$ of $\Lambda_p(2)$ is symmetric and bilinear.
Since $h_{ab}$ is positive definite, $(\cdot, \cdot)$ is positive definite. Thus, $(\cdot, \cdot)$ is indeed an inner product.
For each $X_{ab} \in \Lambda_p(2)$, $\hat{R}_{ab}{}^{cd} X_{cd}$ will be denoted by $LX$, with abstract indices omitted.
Then, for arbitrary $X_{ab}, Y_{ab} \in \Lambda_p(2)$,

$$(X, LY) = X^{ab} (LY)_{ab} = X^{ab} \hat{R}_{ab}{}^{cd} Y_{cd} = \hat{R}_{abcd} X^{ab} Y^{cd} ,$$

$$(LX, Y) = (LX)^{ab} Y_{ab} = \hat{R}^{abcd} X_{cd} Y_{ab} = \hat{R}_{abcd} X^{cd} Y^{ab} = \hat{R}_{cdab} X^{ab} Y^{cd} .$$

It follows from $\hat{R}_{abcd} = \hat{R}_{cdab}$ that $(X, LY) = (LX, Y)$. This indicates that the linear
operator $L$ (i.e., $\hat{R}_{ab}{}^{cd}$) is a symmetric operator on $\Lambda_p(2)$. Therefore, there exists an orthonormal
basis of $\Lambda_p(2)$ formed by the eigenvectors of $L$. In other words, the corresponding matrix of
$L$ in this basis is diagonal, and each diagonal element is the eigenvalue of a corresponding
eigenvector.
Let $I$ be the identity map on $\Lambda_p(2)$. In order to show that $L = 2K I$ with $K \in \mathbb{R}$, we only
have to show that all the eigenvalues of $L$ are equal and then denote them by $2K$, see the
following proposition:

Proposition 10.1.6 Suppose $Y_{ab}, Y'_{ab} \in \Lambda_p(2)$ are two arbitrary eigenvectors of $\hat{R}_{ab}{}^{cd}$,
and $\lambda$ and $\lambda'$ are the corresponding eigenvalues, i.e.,

$$\hat{R}_{ab}{}^{cd} Y_{cd} = \lambda Y_{ab} , \qquad \hat{R}_{ab}{}^{cd} Y'_{cd} = \lambda' Y'_{ab} . \tag{10.1.4}$$

Then, $\lambda = \lambda'$ is assured by the isotropy and the uniqueness of the homogeneous foliation.

Proof Suppose $(\Sigma_t, h_{ab})$ is a 3-dimensional Riemannian space. Let $W_p$ be the tangent space
of $\Sigma_t$ at $p \in \Sigma_t$. Then $(W_p, h_{ab}|_p)$ is a 3-dimensional vector space together with a positive
definite metric, and $\Lambda_p(2)$ is nothing but the set of all the 2-forms on $W_p$. Denote the dual
form of $Y_{ab} \in \Lambda_p(2)$ (which is a 1-form) by $w_a$. Then, according to (5.6.1), $w_c = Y^{ab} \hat{\varepsilon}_{abc}/2$,
where $\hat{\varepsilon}_{abc}$ is the volume element associated with $h_{ab}$. Raising the index of $w_c$ by $h^{ac}$, we
obtain a spatial vector $w^a = \hat{\varepsilon}^{abc} Y_{bc}/2$. Similarly, there is also $w'^a = \hat{\varepsilon}^{abc} Y'_{bc}/2$. We assume
that $Y_{ab}$ and $Y'_{ab}$ are chosen such that $w^a$ and $w'^a$ have the same magnitudes, then it follows
from the fact that the cosmic spacetime is isotropic that there exists an isometry $\psi$ of $(M, g_{ab})$
satisfying $\psi(p) = p$ and $\psi_* w^a = w'^a$. Since the homogeneous foliation is unique, we have
$\psi[\Sigma_t] = \Sigma_t$ for the above isometry $\psi$. Hence, the restriction of $\psi$ to $\Sigma_t$ is an isometry
$\psi|_{\Sigma_t} : \Sigma_t \to \Sigma_t$ of $h_{ab}$. For simplicity, we still denote $\psi|_{\Sigma_t}$ by $\psi$. Then, $\psi^* \hat{\varepsilon}_{abc} = \hat{\varepsilon}_{abc}$
and $\psi^* \hat{R}_{ab}{}^{cd} = \hat{R}_{ab}{}^{cd}$. It follows from the forms $Y_{ab}$ and $w_c$ being dual to each other that
$Y_{ab} = \hat{\varepsilon}_{cab} w^c$. Similarly, $Y'_{ab} = \hat{\varepsilon}_{cab} w'^c$. Hence,

$$\psi^* Y'_{ab} = \psi^* (\hat{\varepsilon}_{cab} w'^c) = \hat{\varepsilon}_{cab}\, \psi_*^{-1} w'^c = \hat{\varepsilon}_{cab} w^c = Y_{ab} . \tag{10.1.5}$$

On the other hand, from (10.1.4) we can see that

$$\psi^* (\hat{R}_{ab}{}^{cd} Y'_{cd}) = \psi^* (\lambda' Y'_{ab}) . \tag{10.1.6}$$

Now let us look at both sides of (10.1.6):

$$\text{l.h.s. of (10.1.6)} = \hat{R}_{ab}{}^{cd}\, \psi^* Y'_{cd} = \hat{R}_{ab}{}^{cd} Y_{cd} = \lambda Y_{ab} , \tag{10.1.7}$$

where we used $\psi^* \hat{R}_{ab}{}^{cd} = \hat{R}_{ab}{}^{cd}$ in the first equality, (10.1.5) in the second equality, and
(10.1.4) in the third equality. Also,

$$\text{r.h.s. of (10.1.6)} = \lambda' \psi^* Y'_{ab} = \lambda' Y_{ab} . \tag{10.1.8}$$

Combining (10.1.6), (10.1.7) and (10.1.8) yields $\lambda' = \lambda$.
[The End of Optional Reading 10.1.2]
[Optional Reading 10.1.3]
In the proof of Proposition 10.1.5, $K$ being a constant on $\Sigma_t$ follows from the spatial
homogeneity. However, in order to show that $K$ is a constant on $\Sigma_t$, spatial homogeneity
is in fact not necessary, as shown in the following. Let $\nabla_a$ be the derivative operator on
$\Sigma_t$ associated with $h_{ab}$. Then, applying the Bianchi identity (3.4.8) to (10.1.2) yields
$0 = \nabla_{[e} \hat{R}_{ab]cd} = 2 h_{c[a} h_{b|d|} \nabla_{e]} K$. Contracting both sides with $h^{ad} h^{cb}$, we have $\nabla_e K = 0$ (where
we have considered that the dimension $n$ of $\Sigma_t$ satisfies $n \geq 3$). This indicates that $K$ is a
constant on $\Sigma_t$, since $\Sigma_t$ is connected as a leaf in a foliation. Therefore, when we only care
about the geometry of $\Sigma_t$, spatial homogeneity is not required for obtaining the conclusion
in Proposition 10.1.5.
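The contraction step can be written out explicitly. The following is a sketch of the intermediate algebra, which is not spelled out in the text; it assumes only that $h^{ab} h_{ab} = n$ and that $d$ is excluded from the antisymmetrization, as the notation $|d|$ indicates.

```latex
\begin{aligned}
6\, h_{c[a} h_{b|d|} \nabla_{e]} K
 &= h_{ca} h_{bd} \nabla_e K - h_{cb} h_{ad} \nabla_e K + h_{cb} h_{ed} \nabla_a K \\
 &\quad - h_{ce} h_{bd} \nabla_a K + h_{ce} h_{ad} \nabla_b K - h_{ca} h_{ed} \nabla_b K , \\[4pt]
h^{ad} h^{cb} \bigl( 6\, h_{c[a} h_{b|d|} \nabla_{e]} K \bigr)
 &= (n - n^2 + n - 1 + n - 1)\, \nabla_e K
  = -(n-1)(n-2)\, \nabla_e K .
\end{aligned}
```

The Bianchi identity therefore gives $0 = -\tfrac{1}{3}(n-1)(n-2)\nabla_e K$, which forces $\nabla_e K = 0$ exactly when $n \geq 3$; for $n = 2$ the factor vanishes and no conclusion about $K$ can be drawn this way.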
[The End of Optional Reading 10.1.3]
A generalized Riemannian space $(M, g_{ab})$ is called a space of constant curvature
if there exists a constant $K$ such that the Riemann curvature tensor satisfies

$$R_{abcd} = 2K g_{c[a} g_{b]d} . \tag{10.1.1$'$}$$

According to a proposition in Appendix J, ① a space of constant curvature is locally
maximally symmetric; that is, the dimension of its isometry group (which may only
be a local group), which is also the number of independent Killing vector fields, is
$n(n + 1)/2$, where $n$ is the dimension of the space. ② Two spaces of constant curvature
with the same dimension, metric signature and $K$ are (locally) isometric; that is,
they have the same local geometry. Equation (10.1.1) indicates that each surface of
homogeneity of the universe, $(\Sigma_t, h_{ab})$, is a space of constant curvature.³ Therefore,
if we can list the $h_{ab}$ corresponding to each real number $K$, we can exhaust all
the possible (local) geometries of $\Sigma_t$.

³ As shown in the proof of Proposition 10.1.6, the isotropy and the uniqueness of the homogeneous
foliation together form a sufficient condition for $(\Sigma_t, h_{ab})$ to be a space of constant curvature. In fact, the
uniqueness of the homogeneous foliation is not necessary, since Minkowski spacetime is obviously
a space of constant curvature whose homogeneous foliation is not unique.

Speaking of the geometries with maximal symmetry, the first thing one may have
in mind is a flat metric. For a flat metric, its curvature tensor vanishes, which trivially
satisfies (10.1.3) and (10.1.1$'$) with $K = 0$. Hence, on a surface $\Sigma_t$ of homogeneity,
the line element with $K = 0$ can be expressed by means of Cartesian coordinates as

$$dl^2 = dx^2 + dy^2 + dz^2 . \tag{10.1.9}$$

Of course, for $K \neq 0$ we have $\hat{R}_{ab}{}^{cd} \neq 0$, and thus it is impossible for $h_{ab}$ to be
flat when $K \neq 0$. Since the metric is maximally symmetric, besides the flat one, it
is also natural to think about a spherically symmetric metric. Usually, a spherically
symmetric metric refers to the metric on a 2-sphere, which is induced by the $\delta_{ab}$
of the 3-dimensional Euclidean space $(\mathbb{R}^3, \delta_{ab})$. The corresponding line element
can be expressed as $d\theta^2 + \sin^2\theta\, d\varphi^2$. But now we are looking for a spherically
symmetric metric on a 3-dimensional space, which can be induced on a 3-sphere in
4-dimensional Euclidean space $(\mathbb{R}^4, \delta_{ab})$ by $\delta_{ab}$. Let $x$, $y$, $z$ and $w$ be the Cartesian
coordinates on $\mathbb{R}^4$, then the equation of a 3-sphere (denoted by $S_{\bar{R}}$) reads

$$x^2 + y^2 + z^2 + w^2 = \bar{R}^2 , \tag{10.1.10}$$

where $\bar{R} > 0$ is the radius of the 3-sphere. Similar to the 3-dimensional Euclidean
space, the spherical coordinates in the 4-dimensional Euclidean space, $R$, $\psi$, $\theta$ and
$\varphi$, are defined by

$$\begin{aligned}
x &= R \sin\psi \sin\theta \cos\varphi , \\
y &= R \sin\psi \sin\theta \sin\varphi , \\
z &= R \sin\psi \cos\theta , \\
w &= R \cos\psi .
\end{aligned} \tag{10.1.11}$$

Then, the line element of the 4-dimensional Euclidean space can be expressed as

$$ds^2 = dx^2 + dy^2 + dz^2 + dw^2 = dR^2 + R^2 [d\psi^2 + \sin^2\psi\, (d\theta^2 + \sin^2\theta\, d\varphi^2)] .$$

From (10.1.10) and (10.1.11) we can see that, on a 3-sphere $S_{\bar{R}}$ with radius $\bar{R}$, we
have $R = \bar{R}$ and $dR = 0$. Thus, the line element on $S_{\bar{R}}$ induced by the line element
of the 4-dimensional Euclidean space is

$$dl^2 = \bar{R}^2 [d\psi^2 + \sin^2\psi\, (d\theta^2 + \sin^2\theta\, d\varphi^2)] . \tag{10.1.12}$$

From the above line element, one can find that the curvature of $S_{\bar{R}}$ reads
$\hat{R}_{ab}{}^{cd} = 2 \bar{R}^{-2} \delta_a{}^{[c} \delta_b{}^{d]}$ (Exercise 10.1). Hence, the curvature of the 3-sphere $S_{\bar{R}}$ satisfies
(10.1.3) with $K = \bar{R}^{-2}$. Since $\bar{R}^{-2}$ can be any positive number, the 3-spheres with
various radii exhaust the local geometries of the 3-dimensional spaces of constant
curvature with $K > 0$.

Fig. 10.5 A circular hyperboloid in Minkowski spacetime (with two dimensions suppressed)
For spaces of constant curvature with K < 0, let us consider a 3-dimensional
circular hyperboloid (denoted by Sξ̄ , shown in Fig. 10.5 with two dimensions
suppressed) in 4-dimensional Minkowski spacetime, determined by the following
equation:
t 2 − x 2 − y 2 − z 2 = ξ̄ 2 , (10.1.13)

where t, x, y and z are the Lorentzian coordinates, and ξ̄ is a positive constant.


In the region of Minkowski spacetime where t 2 − x 2 − y 2 − z 2 > 0, we can define
hyperbolic coordinates ξ , ψ, θ and ϕ by the following equations:

$$
\begin{aligned}
x &= \xi \sinh\psi \sin\theta \cos\phi , \\
y &= \xi \sinh\psi \sin\theta \sin\phi , \\
z &= \xi \sinh\psi \cos\theta , \\
t &= \xi \cosh\psi .
\end{aligned}
\tag{10.1.14}
$$

Then, the 4-dimensional Minkowski line element can be expressed in the above
region as

ds 2 = −dt 2 + dx 2 + dy 2 + dz 2
= −dξ 2 + ξ 2 [dψ 2 + sinh2 ψ (dθ 2 + sin2 θ dϕ 2 )] . (10.1.15)

From (10.1.14) we can see that on the 3-dimensional hyperboloid Sξ̄ defined by
(10.1.13), we have ξ = ξ̄ and dξ = 0, and hence the line element on Sξ̄ induced by
the 4-dimensional Minkowski line element reads

dl 2 = ξ̄ 2 [dψ 2 + sinh2 ψ (dθ 2 + sin2 θ dϕ 2 )] . (10.1.16)

From the above line element, one can find that the curvature of Sξ̄ is (left as an
exercise)
R̂ab cd = −2ξ̄ −2 δa [c δb d] . (10.1.17)

Hence, the curvature on the 3-dimensional hyperboloid Sξ̄ satisfies (10.1.3) with
K = −ξ̄ −2 . Since ξ̄ −2 could be any positive number, the 3-dimensional hyperboloid
Sξ̄ with various positive ξ̄ exhausts the local geometries of the 3-dimensional spaces
of constant curvature with K < 0.
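The same kind of informal check can be made for the hyperboloid. The Python sketch below evaluates the Minkowski interval between two nearby points of Sξ̄, using the embedding (10.1.14) with ξ = ξ̄, and compares it with the induced line element (10.1.16). Function names and sample values are illustrative assumptions.

```python
import math

def embed_hyperboloid(xi, psi, theta, phi):
    """Lorentzian coordinates (t, x, y, z), following (10.1.14)."""
    return (
        xi * math.cosh(psi),
        xi * math.sinh(psi) * math.sin(theta) * math.cos(phi),
        xi * math.sinh(psi) * math.sin(theta) * math.sin(phi),
        xi * math.sinh(psi) * math.cos(theta),
    )

def minkowski_interval2(p, q):
    """-dt^2 + dx^2 + dy^2 + dz^2 between two events p and q."""
    dt, dx, dy, dz = (b - a for a, b in zip(p, q))
    return -dt**2 + dx**2 + dy**2 + dz**2

def induced_dl2_hyperbolic(xi, psi, theta, phi, dpsi, dtheta, dphi):
    """Right-hand side of (10.1.16) for a small coordinate displacement."""
    return xi**2 * (dpsi**2 + math.sinh(psi)**2
                    * (dtheta**2 + math.sin(theta)**2 * dphi**2))

def hyperboloid_mismatch(xi=1.5, psi=0.8, theta=1.0, phi=0.4, step=1e-4):
    """Relative difference between the Minkowski interval and (10.1.16)."""
    d = (step, 2 * step, -1.5 * step)  # arbitrary small displacement
    p = embed_hyperboloid(xi, psi - d[0] / 2, theta - d[1] / 2, phi - d[2] / 2)
    q = embed_hyperboloid(xi, psi + d[0] / 2, theta + d[1] / 2, phi + d[2] / 2)
    exact = induced_dl2_hyperbolic(xi, psi, theta, phi, *d)
    return abs(minkowski_interval2(p, q) - exact) / exact
```

Although the ambient metric is Lorentzian, the interval between nearby points of the hyperboloid comes out positive, in line with the induced metric (10.1.16) being Riemannian.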
Summary. As a consequence of the cosmological principle, at any time of the
universe (i.e., for any surface of homogeneity), there are only three kinds of possible
local spatial geometries, described by the following three kinds of metrics:
(a) 3-dimensional spherical metric, whose line element can be expressed in terms
of the spherical coordinates ψ, θ and ϕ as

dl 2 = K −1 [dψ 2 + sin2 ψ (dθ 2 + sin2 θ dϕ 2 )] , K > 0. (10.1.18)

(b) 3-dimensional flat metric, whose line element can be expressed in terms of
the Cartesian coordinates as

dl 2 = dx 2 + dy 2 + dz 2 , K = 0. (10.1.19)

In terms of the spherical coordinates ψ, θ and ϕ, the above line element can also be
written in a form similar to (10.1.18):

$$ dl^2 = d\psi^2 + \psi^2 \left( d\theta^2 + \sin^2\theta \, d\phi^2 \right) . \tag{10.1.19′} $$

(c) 3-dimensional hyperbolic metric, whose line element can be expressed in terms
of the hyperbolic coordinates ψ, θ and ϕ as

dl 2 = −K −1 [dψ 2 + sinh2 ψ (dθ 2 + sin2 θ dϕ 2 )] , K < 0. (10.1.20)

Is the universe spatially finite? Throughout history, this question has been
answered in both the affirmative and the negative. Each of these conceptions has
dominated the mainstream in certain periods, for roughly equal lengths of time.
Now, as it is certain that the spatial geometry of the universe can be classified into
the above three cases, with some further requirements on the global topology (see
the paragraph below) of the universe, this question becomes absolutely clear. In
case (a), the space of the universe is a 3-dimensional sphere, resulting in a “closed
universe” whose volume is finite. Although the universe is “finite” in this case, it
is “boundless” since a sphere has no boundary. In cases (b) and (c), the space of
the universe is respectively a 3-dimensional Euclidean space and a 3-dimensional
hyperboloid, each resulting in an “open universe” with an infinite volume, and hence
we say that the universe is “infinite” in these two cases. However, which case does
our universe really belong to? We will discuss this problem in Sect. 10.3 and Chap.
15 in Volume II.
It is necessary to elucidate that a space of constant curvature only requires its
metric to satisfy condition (10.1.1′), and nothing more. In particular, there is no condition
on its global topological structure. Take a surface $(\Sigma_t, h_{ab})$ of homogeneity in the
universe as an example. In the case of $K > 0$, (10.1.1) only leads to the conclusion
that the metric $h_{ab}$ is a “3-dimensional spherical metric” satisfying (10.1.18). It does
not indicate that $\Sigma_t$ itself is a 3-sphere, since $\Sigma_t$ may only be part of a 3-sphere, or,
for example, a space obtained by identifying antipodal points of a 3-sphere (namely
a quotient space of a 3-sphere), etc. All such spaces share the same local geometry.
In the case of $K = 0$, it is possible that $\Sigma_t$ is a space obtained by identifying a pair
of points in $\mathbb{R}^3$ whenever the differences of their x-coordinates, y-coordinates and
z-coordinates are all integers. For such a space $\Sigma_t$, which is known as a quotient
space of $\mathbb{R}^3$, its metric is still flat, and its volume is finite (being exactly 1). By virtue
of the cosmological principle, $\Sigma_t$ cannot be a proper subset of a 3-sphere, or of a
3-dimensional Euclidean space, or of a 3-dimensional hyperboloid, but the quotient
spaces we mentioned above still satisfy the cosmological principle. Nevertheless,
from the perspective of physics, the quotient spaces are not regarded as being natural.
Then, this finally leads to the conclusion about the global spatial geometry of the
universe: when $K > 0$, $\Sigma_t$ is a 3-sphere (resulting in a closed universe); when $K = 0$
or $K < 0$, $\Sigma_t$ is the Euclidean space or a circular hyperboloid, respectively (resulting
in an open universe).

10.1.3 The Robertson-Walker Metric

The cosmic spacetime should be equipped with a metric $g_{ab}$ such that, on each surface
$\Sigma_t$ of homogeneity, the induced metric $h_{ab}$ is one of those we obtained in Sect. 10.1.2
(corresponding to the line element $dl^2$). One can introduce a suitable coordinate
system so that the line element of $g_{ab}$ can be expressed in a simple form. To do that,
we first point out the following conclusion: for two arbitrary isotropic observers A
and B, and for two arbitrary surfaces $\Sigma_{t_1}$, $\Sigma_{t_2}$ of homogeneity with $t_1 < t_2$, the world
line segments of A and B between $\Sigma_{t_1}$ and $\Sigma_{t_2}$ are of the same length. Physically, it
is easy to accept this statement, since all isotropic observers should be on an equal
footing, and the existence of an exceptional observer is hard to imagine. In Optional
Reading 10.1.4, we will give a rigorous proof for this conclusion.
Now let us introduce the coordinate system. On a surface $\Sigma_0$ of homogeneity,
set (local) coordinates $x^1 \equiv \psi$, $x^2 \equiv \theta$ and $x^3 \equiv \phi$ (as described in Sect. 10.1.2 for
the different signs of $K$), then the world lines of isotropic observers can carry these
coordinates out of $\Sigma_0$ in the following way: along the world line $\gamma$ of an isotropic
observer, the coordinates $\psi$, $\theta$ and $\phi$ remain constant, determined by their values
at the intersecting point of $\gamma$ and $\Sigma_0$. Next, set the standard clock carried by each
isotropic observer (i.e., the proper time $\tau$) to zero on $\Sigma_0$, and define the coordinate
time $t$ at each spacetime point $p$ to be the proper time $\tau$ of the isotropic observer
passing through $p$. In this way, we have a coordinate system $\{t, x^i\}$ of the cosmic
spacetime, called the Robertson-Walker (RW) coordinate system. This system is
obviously a comoving coordinate system of an isotropic reference frame. Note that
the value of $t$ on a surface of homogeneity can be different from the parameter for
the family $\{\Sigma_t\}$ of the surfaces of homogeneity, since the parameter for the family
in principle can be assigned arbitrarily. In other words, the homogeneous foliation
$\{\Sigma_t\}$ can be associated with a function different from the coordinate $t$. However, to
avoid confusion, we will stipulate that the homogeneous foliation $\{\Sigma_t\}$ is indeed
associated with the coordinate $t$ in the RW system. That is, for an arbitrary surface
$\Sigma_t$ of homogeneity, the time coordinate at every point equals the parameter of $\Sigma_t$ in
the foliation. The RW coordinate system has two major virtues:
① Each constant-$t$ surface is a surface of homogeneity. Therefore a surface $\Sigma_t$ of
homogeneity is also a surface of simultaneity, representing the whole space of the
universe at a time $t$.
② The world line of an isotropic observer is also a $t$-coordinate line, with its
coordinate time being its proper time $\tau$, called the cosmic time. Unless otherwise
stated, the “time” in cosmology refers to the cosmic time.
Due to virtue ②, the coordinate basis vector $(\partial/\partial t)^a$ equals the 4-velocity $Z^a$
of isotropic observers. Hence

$$ g_{00} = g_{ab}\,(\partial/\partial t)^a (\partial/\partial t)^b = g_{ab} Z^a Z^b = -1 . $$

Due to virtue ①, the three spatial coordinate basis vectors $(\partial/\partial x^i)^a$ are all tangent
to surfaces of homogeneity, and thus are orthogonal to $(\partial/\partial t)^a$. Hence,

$$ g_{0i} = g_{ab}\,(\partial/\partial t)^a (\partial/\partial x^i)^b = 0 , \qquad i = 1, 2, 3 . $$

Since $h_{ab}$ is the metric induced by $g_{ab}$, we have

$$ g_{ij} = g_{ab}\,(\partial/\partial x^i)^a (\partial/\partial x^j)^b = h_{ab}\,(\partial/\partial x^i)^a (\partial/\partial x^j)^b = h_{ij} , \qquad i, j = 1, 2, 3 , $$

where the definition of the induced metric (see Definition 1 in Sect. 4.4) is used in the
second step. Generally speaking, $h_{ij}$ depends on $t$, $x^1$, $x^2$ and $x^3$, so we may
denote it by $h_{ij}(t, x)$ (where $x$ stands for $x^1, x^2, x^3$). By means of the uniqueness
of the homogeneous foliation, it can be proved (see Optional Reading 10.1.5) that
$h_{ij}(t, x)$ has the form of “separation of variables”, i.e.,

h i j (t, x) = a 2 (t)ĥ i j (x) , (10.1.21)

with a(t) depending only on t, and ĥ i j (x) depending only on x i . Consequently, the
line element of the cosmic metric gab in the RW coordinate system reads

ds 2 = −dt 2 + a 2 (t)ĥ i j (x) dx i dx j . (10.1.22)

The induced line element on $\Sigma_0$ is

$$ dl^2 = a^2(0)\, \hat{h}_{ij}(x)\, dx^i dx^j , $$

which belongs to one of the three cases summarized in Sect. 10.1.2. First we look at the
simplest case, i.e., case (b), where the spatial metric is flat. In terms of the Cartesian
coordinates $x^i$, the induced line element on $\Sigma_0$ is $a^2(0)\, \hat{h}_{ij}(x)\, dx^i dx^j = \delta_{ij}\, dx^i dx^j$.
We may further take $a(0) = 1$ so that $\hat{h}_{ij}(x) = \delta_{ij}$. Then, (10.1.22) in this case
becomes

ds 2 = −dt 2 + a 2 (t) (dx 2 + dy 2 + dz 2 ) [case (b)] . (10.1.23b)

In terms of the spherical coordinates $\psi$, $\theta$, $\phi$, where $\psi \equiv (x^2 + y^2 + z^2)^{1/2}$, the
above line element reads

$$ ds^2 = -dt^2 + a^2(t) \left[ d\psi^2 + \psi^2 (d\theta^2 + \sin^2\theta \, d\phi^2) \right] \quad \text{[case (b)]} . \tag{10.1.23b′} $$

The form of the function $a(t)$ is determined by the Einstein field equation (see
Sect. 10.2.3). Similarly, if the line element induced on $\Sigma_0$ has the form of (10.1.18)
or (10.1.20), then we have, respectively,

$$ dl^2 = a^2(0)\, \hat{h}_{ij}\, dx^i dx^j = \frac{1}{K} \left[ d\psi^2 + \sin^2\psi \, (d\theta^2 + \sin^2\theta \, d\phi^2) \right] , $$

$$ dl^2 = a^2(0)\, \hat{h}_{ij}\, dx^i dx^j = -\frac{1}{K} \left[ d\psi^2 + \sinh^2\psi \, (d\theta^2 + \sin^2\theta \, d\phi^2) \right] . $$

For these two cases, we may further take $a^2(0) K = 1$ and $a^2(0) K = -1$, respectively.
Then, (10.1.22) in cases (a) and (c) becomes

$$ ds^2 = -dt^2 + a^2(t) \left[ d\psi^2 + \sin^2\psi \, (d\theta^2 + \sin^2\theta \, d\phi^2) \right] \quad \text{[case (a)]} , \tag{10.1.23a} $$

$$ ds^2 = -dt^2 + a^2(t) \left[ d\psi^2 + \sinh^2\psi \, (d\theta^2 + \sin^2\theta \, d\phi^2) \right] \quad \text{[case (c)]} . \tag{10.1.23c} $$

Remark 1 The spacetime metric in (10.1.23b) is curved (unless a(t) is a constant,
while in Sect. 10.2.3 we will see that a(t) is not a constant). It is each 3-dimensional
surface of homogeneity that is flat, not the spacetime itself. Therefore, the spacetime
corresponding to case (b) is a curved spacetime with a flat spatial geometry on each
slice.
For cases (a), (b) and (c), we define $r$ as follows:

$$
r = \begin{cases}
\sin\psi , & \text{[case (a)]} \\
\psi , & \text{[case (b)]} \\
\sinh\psi , & \text{[case (c)]}
\end{cases}
\tag{10.1.24}
$$

Then, (10.1.23a), (10.1.23b′) and (10.1.23c) can be combined to become

$$ ds^2 = -dt^2 + a^2(t) \left[ \frac{dr^2}{1 - k r^2} + r^2 (d\theta^2 + \sin^2\theta \, d\phi^2) \right] , \tag{10.1.25} $$

where

$$
k \equiv \begin{cases}
1 , & \text{[case (a)]} \\
0 , & \text{[case (b)]} \\
-1 , & \text{[case (c)]}
\end{cases}
$$

(notice the similarity and distinction between $k$ and $K$).

Fig. 10.6 Spacetime points a and b represent galaxies A and B at the time t. The length of the
geodesic segment γ(l) passing through them is the distance between galaxies A and B at t

The metric in (10.1.23b) or (10.1.25) is called the Robertson-Walker metric,4 or
RW metric for short. The above discussion indicates that, by only the cosmological
principle (and the assumed uniqueness of the homogeneous foliation), the cosmic
spacetime metric can be determined to be an RW metric, up to only two undetermined
factors: ① the value of k specifying which case our universe belongs to (which will
be discussed in Sect. 10.3.2); ② the function a(t), determined by Einstein’s equation
(see Sect. 10.2.3).
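The step from (10.1.23a), (10.1.23b′) and (10.1.23c) to the unified form (10.1.25) rests on the identity dr²/(1 − kr²) = dψ² for the three choices of r(ψ) in (10.1.24). A small Python sketch, with hypothetical function names and an arbitrary sample point, can confirm this numerically by differentiating r(ψ) with a central difference:

```python
import math

def r_of_psi(psi, k):
    """The radial coordinate r defined in (10.1.24) for k = 1, 0, -1."""
    if k == 1:
        return math.sin(psi)
    if k == 0:
        return psi
    if k == -1:
        return math.sinh(psi)
    raise ValueError("k must be 1, 0 or -1")

def radial_factor(psi, k, h=1e-6):
    """(dr/dpsi)^2 / (1 - k r^2), which should equal 1 if
    dr^2 / (1 - k r^2) = dpsi^2."""
    dr_dpsi = (r_of_psi(psi + h, k) - r_of_psi(psi - h, k)) / (2 * h)
    r = r_of_psi(psi, k)
    return dr_dpsi**2 / (1 - k * r * r)
```

For k = 1 the identity is cos²ψ/(1 − sin²ψ) = 1, and for k = −1 it is cosh²ψ/(1 + sinh²ψ) = 1, so the numerical value only confirms what the hyperbolic and trigonometric identities already give.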
Before the end of this section, let us talk about the physical meaning of $a(t)$.
Suppose $h_{ab}$ is the metric on a surface $\Sigma_t$ of homogeneity induced by the RW
metric. Then $(\Sigma_t, h_{ab})$ is a 3-dimensional Riemannian space. Let $a$ and $b$ be the
intersecting points of $\Sigma_t$ and the world lines of two galaxies A and B, respectively, and $\gamma(l)$ be the
geodesic segment lying on $\Sigma_t$ connecting $a$ and $b$ (see Fig. 10.6). Then the arc length
of $\gamma(l)$ is the distance between $a$ and $b$ in the space $\Sigma_t$. In physics, this length can
be interpreted as the distance between the galaxies A and B at the time $t$, denoted
by $D_{AB}(t)$. Let $l_1$ and $l_2$ be the parameters of $\gamma(l)$ at $a$ and $b$ (assuming $l_1 < l_2$),
respectively. Then, according to the definition of arc length,

$$ D_{AB}(t) = \int_{l_1}^{l_2} \sqrt{ h_{ab} \left( \frac{\partial}{\partial l} \right)^a \left( \frac{\partial}{\partial l} \right)^b } \; dl . $$

Define ĥ ab ≡ a −2 (t) h ab with a(t) > 0, then

D AB (t) = a(t) D̂ AB , (10.1.26)

where

$$ \hat{D}_{AB} = \int_{l_1}^{l_2} \sqrt{ \hat{h}_{ab} \left( \frac{\partial}{\partial l} \right)^a \left( \frac{\partial}{\partial l} \right)^b } \; dl . \tag{10.1.27} $$

4 This metric is also called the Friedmann–Lemaître–Robertson–Walker (FLRW) metric, since
A. Friedmann and G. Lemaître had considered certain special cases as dynamical solutions to
Einstein’s equation much earlier than H. P. Robertson and A. G. Walker (see Sect. 10.2.3), whereas
Robertson and Walker (each independently) first obtained this general form of the metric as a purely
geometric result under the spatial homogeneity and isotropy conditions before using it to solve
Einstein’s equation.

From the above equation and (10.1.25), we can see that D̂ AB is only determined by
the galaxies A and B, and it is independent of time. Equation (10.1.26) then indicates
that a(t) is the value obtained by measuring the distance between A and B at t, with
D̂ AB as the unit. Hence, a(t) reflects the time dependence of the distance between
any two galaxies, and thus is called the scale factor of the universe. If the spatial
coordinates of the galaxies A and B are (r A , θ, ϕ) and (r B , θ, ϕ), then the parameter
l can be chosen to be r along the geodesic. It can be verified that both θ and ϕ remain
constant along the geodesic γ(l). Then (10.1.26) can be expressed as

$$ D_{AB}(t) = a(t) \int_{r_A}^{r_B} \frac{dr}{\sqrt{1 - k r^2}} . \tag{10.1.28} $$

It is easy to perform the above integral for all cases of k = 1, 0 and −1. Note that the
−dt 2 in (10.1.25) is −c2 dt 2 in SI, which has the dimension of length. Thus, when
k = ±1, the coordinate r is dimensionless, and a(t) has the dimension of length;
when k = 0, the dimensions of a(t) and r can be arbitrary, with a(t)r having the
dimension of length.
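The integral in (10.1.28) has the elementary closed forms arcsin r, r and arcsinh r for k = 1, 0 and −1, respectively. The Python sketch below (function names are illustrative, and the Simpson-rule comparison is only a sanity check, not a method used in the text) evaluates the time-independent factor and the proper distance D_AB(t) = a(t) D̂_AB:

```python
import math

def comoving_distance(r_a, r_b, k):
    """Closed form of the integral in (10.1.28), without the factor a(t)."""
    if k == 1:
        return math.asin(r_b) - math.asin(r_a)
    if k == 0:
        return r_b - r_a
    if k == -1:
        return math.asinh(r_b) - math.asinh(r_a)
    raise ValueError("k must be 1, 0 or -1")

def comoving_distance_simpson(r_a, r_b, k, n=2000):
    """Composite Simpson approximation of the same integral (n even)."""
    if n % 2:
        raise ValueError("n must be even")
    h = (r_b - r_a) / n

    def integrand(r):
        return 1.0 / math.sqrt(1.0 - k * r * r)

    total = integrand(r_a) + integrand(r_b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(r_a + i * h)
    return total * h / 3.0

def proper_distance(a_t, r_a, r_b, k):
    """D_AB(t) = a(t) * D_hat_AB, as in (10.1.26) and (10.1.28)."""
    return a_t * comoving_distance(r_a, r_b, k)
```

For k = 1 the coordinate r is restricted to r < 1 in this form of the integral, so the sample values should be chosen accordingly.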
When k = 1, the universe is closed. In this case, one can also ask about the volume
of the universe at any time t (as the volume of a 3-sphere), which is obviously related
to a(t). It follows from (10.1.23a) that the volume element associated with the spatial
induced metric h ab is

ε̂ = a 3 sin2 ψ sin θ dψ ∧ dθ ∧ dϕ .

Hence, the volume of the whole space (as a 3-sphere) is

$$ V = \int \hat{\varepsilon} = a^3 \int_0^{2\pi} d\phi \int_0^{\pi} \sin\theta \, d\theta \int_0^{\pi} \sin^2\psi \, d\psi = 2\pi^2 a^3 . \tag{10.1.29} $$

Therefore, a 3 (t) is proportional to the volume of the universe at the time t, and a(t)
is exactly the radius of the closed universe at t.
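As a quick sanity check of (10.1.29), the three one-dimensional integrals can be evaluated numerically. The sketch below uses a composite Simpson rule; the helper names are assumptions made for illustration.

```python
import math

def simpson(f, lo, hi, n=200):
    """Composite Simpson rule on [lo, hi] with n (even) subintervals."""
    if n % 2:
        raise ValueError("n must be even")
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0

def closed_universe_volume(a):
    """Closed form 2 * pi^2 * a^3 from (10.1.29)."""
    return 2.0 * math.pi**2 * a**3

def closed_universe_volume_numeric(a, n=200):
    """Product of the phi, theta and psi integrals in (10.1.29)."""
    i_phi = 2.0 * math.pi
    i_theta = simpson(math.sin, 0.0, math.pi, n)
    i_psi = simpson(lambda psi: math.sin(psi)**2, 0.0, math.pi, n)
    return a**3 * i_phi * i_theta * i_psi
```

Doubling a multiplies the volume by eight, which is the sense in which a³(t) tracks the volume of the closed universe.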
[Optional Reading 10.1.4]
Now we will prove a statement we claimed in the beginning of Sect. 10.1.3, which is
summarized as Proposition 10.1.8. But before that, we shall introduce some facts which will
be useful in the proof of Proposition 10.1.8 and the later discussions.
First, there is a theorem which assures that each point in a Riemannian space has a convex
neighborhood, in which any pair of points can be joined by a unique geodesic segment lying
in it [for a proof of this theorem, see Hicks (1965)]. Then, we have the following proposition.

Proposition 10.1.7 Suppose $(M, g_{ab})$ is a spacetime satisfying the cosmological principle,
which has a unique homogeneous foliation. Let $\Sigma_t$ be a slice in the foliation, and $P$ be
an isotropic observer which intersects $\Sigma_t$ at $p$, i.e., $p = P \cap \Sigma_t$.5 Suppose $N$ is a convex
neighborhood of $p$ in $\Sigma_t$. Then, for an arbitrary isotropic observer $P'$ such that $P' \cap \Sigma_t \subset N$,
there exists an isometry $\psi : M \to M$ of $g_{ab}$ satisfying the following conditions:
(1) For each hypersurface $\Sigma_{t'}$ of homogeneity, $\psi[\Sigma_{t'}] = \Sigma_{t'}$;
(2) Each isotropic observer is mapped by $\psi$ to an isotropic observer;
(3) $P$ is mapped by $\psi$ to $P'$;
(4) There exists an isotropic observer which is preserved under $\psi$ (i.e., each point on the
world line is fixed by $\psi$).

Proof Let $p' \in N$ be the intersecting point of $P'$ and $\Sigma_t$. Since $N$ is a convex neighborhood
of $p$, there is a unique geodesic $\gamma(l)$ lying in $N$ from $p$ to $p'$, with $l$ the arc length parameter.
Let $q$ be the middle point of $\gamma(l)$, i.e., $l_{pq} = l_{qp'}$, and let $w'^a \equiv -w^a = -(\partial/\partial l)^a|_q$. It follows
from the definition of isotropy that there exists an isometry $\psi : M \to M$ such that $\psi(q) = q$,
$\psi_*(Z^a|_q) = Z^a|_q$ and $\psi_* w^a = w'^a$, where $Z^a$ is the 4-velocity of the isotropic observers.
Then, (1) due to the uniqueness of the homogeneous foliation and Proposition 10.1.3, we
have $\{\psi[\Sigma_{t'}]\} = \{\Sigma_{t'}\}$, and $\psi(q) = q$ assures that $\psi[\Sigma_{t'}] = \Sigma_{t'}$ for each hypersurface of
homogeneity. (2) It follows from Corollary 10.1.4 that $\psi$ sends one isotropic observer to
another. (3) Since geodesics and arc lengths are all defined based on the metric $g_{ab}$, $\psi$
being an isometry makes $\psi[\gamma_{pq}] = \gamma_{qp'}$. Thus, $\psi(p) = p'$, and so $\psi[P] = P'$. (4) Let
$Q$ be the isotropic observer passing through $q$. Since $\psi[Q]$ is still an isotropic observer,
$\psi(q) = q$ indicates that $\psi[Q] = Q$. Then, for each point $q' \in Q$, $\psi(q') = q'$, i.e., $Q$ is
fixed pointwise by $\psi$.

Proposition 10.1.8 Assume that the spacetime $(M, g_{ab})$ satisfies the cosmological principle
and admits a unique homogeneous foliation. Then, for any pair of isotropic observers and
any pair of surfaces $\Sigma_{t_1}$ and $\Sigma_{t_2}$ of homogeneity, the lengths of the world line segments of
these two observers between $\Sigma_{t_1}$ and $\Sigma_{t_2}$ are equal.6

Proof Suppose $A$ and $B$ are two different isotropic observers, and $\Sigma_{t_1}$ and $\Sigma_{t_2}$ are two
different surfaces of homogeneity. Let $a_1$ and $a_2$ be the intersection points of the world line
of $A$ with $\Sigma_{t_1}$ and $\Sigma_{t_2}$, i.e., $a_1 = A \cap \Sigma_{t_1}$ and $a_2 = A \cap \Sigma_{t_2}$. Similarly, let $b_1 = B \cap \Sigma_{t_1}$ and
$b_2 = B \cap \Sigma_{t_2}$ (Fig. 10.7).
Now being mindful that $(\Sigma_{t_1}, h_{ab})$ is a 3-dimensional Riemannian space, let us first
suppose that $b_1$ is in a convex neighborhood of $a_1$. Then, according to Proposition 10.1.7,
there exists an isometry $\psi$ of $(M, g_{ab})$ satisfying $\psi[\Sigma_{t_1}] = \Sigma_{t_1}$, $\psi[\Sigma_{t_2}] = \Sigma_{t_2}$ and $\psi[A] = B$.
Since $a_2 = A \cap \Sigma_{t_2}$ and $b_2 = B \cap \Sigma_{t_2}$, it follows that $\psi(a_2) = b_2$. Hence, for the line
segments $A_{a_1 a_2}$ and $B_{b_1 b_2}$, we have $\psi[A_{a_1 a_2}] = B_{b_1 b_2}$. As a consequence, the arc lengths of
$A_{a_1 a_2}$ and $B_{b_1 b_2}$ are equal.
Since $(\Sigma_{t_1}, h_{ab})$ is connected, and since it can be covered by convex neighborhoods, one
can easily show that the arc lengths of $A_{a_1 a_2}$ and $B_{b_1 b_2}$ are still equal even if $b_1$ is not in a
convex neighborhood of $a_1$.

[The End of Optional Reading 10.1.4]

5 Technically, $P \cap \Sigma_t = \{p\}$ is a set with $p$ as its only element. We will regard it as a point for
convenience.
6 If the homogeneous foliation is not unique, then the lengths of the world line segments of two
isotropic observers between $\Sigma_{t_1}$ and $\Sigma_{t_2}$ can be different, even if $\Sigma_{t_1}$ and $\Sigma_{t_2}$ are leaves in the same
homogeneous foliation.

Fig. 10.7 It takes isotropic observers A and B the same proper time from $\Sigma_{t_1}$ to $\Sigma_{t_2}$

[Optional Reading 10.1.5]


Now we provide the proof for (10.1.21). That is, $h_{ij}(t, x)$ has the form of “separation of
variables”.
Suppose $\Sigma_{t_1}$ and $\Sigma_{t_2}$ are surfaces of homogeneity, and $G$ is an isotropic observer. Let
$p_1 \equiv G \cap \Sigma_{t_1}$ and $p_2 \equiv G \cap \Sigma_{t_2}$, which can be expressed in terms of coordinates as $p_1 = (t_1, x_G^i)$
and $p_2 = (t_2, x_G^i)$. Let $X^a|_{p_1}$ and $Y^a|_{p_1}$ be two spatial vectors at $p_1$ with the same magnitude,
whose coordinate components are $X^i$ and $Y^i$, respectively, i.e., $X^a|_{p_1} = X^i (\partial/\partial x^i)^a|_{p_1}$ and
$Y^a|_{p_1} = Y^i (\partial/\partial x^i)^a|_{p_1}$. Then

$$ h_{ij}(t_1, x_G)\, X^i X^j = h_{ij}(t_1, x_G)\, Y^i Y^j . \tag{10.1.30} $$

Since $G$ is an isotropic observer, there is an isometry $\psi : M \to M$ such that $\psi(p_1) = p_1$ and
$\psi_*(X^a|_{p_1}) = Y^a|_{p_1}$. From the definition of the RW coordinate system and Corollary 10.1.4,
we can see that the coordinate transformation $\{t, x^i\} \to \{t', x'^i\}$ induced by $\psi$ satisfies

$$ t' = t , \qquad x'^i = f^i(x) , \qquad i = 1, 2, 3, \tag{10.1.31} $$

and $x_G^i = f^i(x_G)$, where the $x$ in the parentheses stands for $x^1$, $x^2$ and $x^3$, and similarly for
$x_G$. It follows from (4.1.7) that

$$ Y^i = X^j \left. \frac{\partial f^i}{\partial x^j} \right|_{x_G} . \tag{10.1.32} $$

Consider the spatial vectors $X^a|_{p_2} = X^i (\partial/\partial x^i)^a|_{p_2}$ and $Y^a|_{p_2} = Y^i (\partial/\partial x^i)^a|_{p_2}$ at $p_2$.
Since the real numbers $X^i$ and $Y^i$ satisfy (10.1.32), we have $\psi_*(X^a|_{p_2}) = Y^a|_{p_2}$. Hence,
$X^a|_{p_2}$ and $Y^a|_{p_2}$ are of the same magnitude. That is,

$$ h_{ij}(t_2, x_G)\, X^i X^j = h_{ij}(t_2, x_G)\, Y^i Y^j . \tag{10.1.33} $$

Next, suppose $X^a|_{p_1}$ and $Y^a|_{p_1}$ do not have the same magnitude (and $Y^a|_{p_1} \neq 0$). Since the
$3 \times 3$ matrix constituted by $h_{ij}$ is positive definite, there exists $\lambda \in \mathbb{R}$ such that

$$ h_{ij}(t_1, x_G)\, X^i X^j = \lambda^2 h_{ij}(t_1, x_G)\, Y^i Y^j = h_{ij}(t_1, x_G)\, (\lambda Y^i)(\lambda Y^j) , $$

and hence

$$ h_{ij}(t_2, x_G)\, X^i X^j = h_{ij}(t_2, x_G)\, (\lambda Y^i)(\lambda Y^j) = \lambda^2 h_{ij}(t_2, x_G)\, Y^i Y^j . $$

Therefore, for any nonvanishing row vectors $(X^1, X^2, X^3)$ and $(Y^1, Y^2, Y^3)$, we have

$$ \frac{h_{ij}(t_1, x_G)\, X^i X^j}{h_{kl}(t_2, x_G)\, X^k X^l} = \frac{h_{ij}(t_1, x_G)\, Y^i Y^j}{h_{kl}(t_2, x_G)\, Y^k Y^l} . \tag{10.1.34} $$

Note that the indices $i$, $j$, $k$ and $l$ in the above equation are all summed over 1, 2 and 3. The
ratio in the above equation does not depend on $(X^1, X^2, X^3)$ and $(Y^1, Y^2, Y^3)$, but is only
determined by $t_1$, $t_2$ and $x_G$. Thus, there exists a real number $\omega(t_1, t_2, x_G)$ such that

$$ h_{ij}(t_1, x_G)\, X^i X^j = \omega(t_1, t_2, x_G)\, h_{ij}(t_2, x_G)\, X^i X^j , \qquad \forall X^1, X^2, X^3 \in \mathbb{R} . \tag{10.1.35} $$

Consequently, for arbitrary $(X^1, X^2, X^3)$ and $(Y^1, Y^2, Y^3)$, we have

$$ h_{ij}(t_1, x_G)(X^i + Y^i)(X^j + Y^j) = \omega(t_1, t_2, x_G)\, h_{ij}(t_2, x_G)(X^i + Y^i)(X^j + Y^j) , $$

$$ h_{ij}(t_1, x_G)(X^i - Y^i)(X^j - Y^j) = \omega(t_1, t_2, x_G)\, h_{ij}(t_2, x_G)(X^i - Y^i)(X^j - Y^j) , $$

and thus, subtracting the second equation from the first, for arbitrary $(X^1, X^2, X^3)$ and $(Y^1, Y^2, Y^3)$,

$$ h_{ij}(t_1, x_G)\, X^i Y^j = \omega(t_1, t_2, x_G)\, h_{ij}(t_2, x_G)\, X^i Y^j . $$

Hence, $h_{ij}(t_1, x_G) = \omega(t_1, t_2, x_G)\, h_{ij}(t_2, x_G)$. Since the isotropic observer $G$ is arbitrary,
we can simply denote $x_G$ as $x$ and obtain

$$ h_{ij}(t_1, x) = \omega(t_1, t_2, x)\, h_{ij}(t_2, x) . \tag{10.1.36} $$

Finally, we show that $\omega$ actually does not depend on $x$. For an arbitrary isotropic
observer $G$ with $p_1 = G \cap \Sigma_{t_1}$ and $p_2 = G \cap \Sigma_{t_2}$, there is a convex neighborhood $N$ of
$p_1$ in $\Sigma_{t_1}$. For simplicity, we may assume that the coordinate patch of $x^i$ covers $N$. Then,
according to Proposition 10.1.7, for an arbitrary point $p'_1 \in N$, there is an isometry $\phi$ of
$(M, g_{ab})$ satisfying ① $\phi$ maps each $\Sigma_t$ to itself; ② $\phi$ maps each isotropic observer to an
isotropic observer, and, especially, ③ $\phi$ maps $G$ to $G'$, the isotropic observer that contains
$p'_1 = \phi(p_1) \in G'$. Let $p'_2 \in G' \cap \Sigma_{t_2}$, then it follows that $p'_2 = \phi(p_2)$. Because of ① and ②
above, the coordinate transformation induced by $\phi$ will have the form of (10.1.31). In terms
of the new coordinates $t'$ and $x'^i$, the line element of the cosmic metric $g_{ab}$ can be expressed
as

$$ ds^2 = -dt'^2 + h_{ij}(t', x')\, dx'^i dx'^j = -dt^2 + h_{ij}(t, x')\, \frac{\partial f^i}{\partial x^k} \frac{\partial f^j}{\partial x^l}\, dx^k dx^l . $$

Since $\phi$ is an isometry, comparing with the line element in the old coordinate system,
$ds^2 = -dt^2 + h_{kl}(t, x)\, dx^k dx^l$, we obtain

$$ h_{ij}(t, x')\, \frac{\partial f^i}{\partial x^k} \frac{\partial f^j}{\partial x^l} = h_{kl}(t, x) . \tag{10.1.37} $$

Setting $t$ to be $t_1$ and $t_2$ yields

$$ h_{ij}(t_1, x')\, \frac{\partial f^i}{\partial x^k} \frac{\partial f^j}{\partial x^l} = h_{kl}(t_1, x) , \tag{10.1.38} $$

$$ h_{ij}(t_2, x')\, \frac{\partial f^i}{\partial x^k} \frac{\partial f^j}{\partial x^l} = h_{kl}(t_2, x) . \tag{10.1.39} $$

Applying (10.1.36) to both sides of (10.1.38), we have

$$ \omega(t_1, t_2, x')\, h_{ij}(t_2, x')\, \frac{\partial f^i}{\partial x^k} \frac{\partial f^j}{\partial x^l} = \omega(t_1, t_2, x)\, h_{kl}(t_2, x) . $$

Noticing (10.1.39), we obtain $\omega(t_1, t_2, x') = \omega(t_1, t_2, x)$. To see more clearly that $\omega$ does not
depend on $x$, let us go to the active perspective, i.e., viewing the coordinate transformation
$x^i \to x'^i$ under $\phi$ as the map between two observers $G$ and $G'$ in the old coordinates $x^i$.
Then, we have

$$ \omega(t_1, t_2, x_{G'}) = \omega(t_1, t_2, x_G) . $$

Since $G'$ is arbitrary, the above equation shows that $\omega(t_1, t_2, x_{G'})$ is independent of $x_{G'}$ for
$G' \cap N \neq \emptyset$. As $\Sigma_{t_1}$ can be covered by convex neighborhoods, it follows that $\omega(t_1, t_2, x)$
does not depend on $x$, so it can be denoted by $\omega(t_1, t_2)$. Then, (10.1.36) turns out to be
$h_{ij}(t_1, x) = \omega(t_1, t_2)\, h_{ij}(t_2, x)$. Particularly, let $t_2$ be fixed and $t_1$ be arbitrary. Denoting $t_1$
by $t$, we have

$$ h_{ij}(t, x) = \omega(t, t_2)\, h_{ij}(t_2, x) . \tag{10.1.40} $$

Letting $\hat{h}_{ij}(x) \equiv h_{ij}(t_2, x)$ and $a^2(t) \equiv \omega(t, t_2)$, the above equation becomes
$h_{ij}(t, x) = a^2(t)\, \hat{h}_{ij}(x)$, i.e., (10.1.21).
[The End of Optional Reading 10.1.5]

10.2 Dynamics of the Universe

10.2.1 The Hubble-Lemaître Law

In the early 20th century, the American astronomer V. S. Slipher observed the spectral
lines of 41 galaxies. He discovered redshifts in 36 of these galaxies. Recall that
a redshift is defined to be $z \equiv (\lambda' - \lambda)/\lambda$, where $\lambda$ and $\lambda'$ are the wavelengths of
light when it is emitted and observed, respectively. If the redshifts are attributed to the
Doppler effect, Slipher’s discovery shows that these 36 galaxies are moving away
from our galaxy, the Milky Way. In other words, this indicates that our universe is
expanding. (Since the solar system is orbiting around the Galactic Center, i.e., the
center of the Milky Way, the blueshifts of the other 5 galaxies can be interpreted as
being caused by their motion toward the Sun.) So far, spectra from tens of thousands
of galaxies have been measured, and all but a few of them (those from nearby
galaxies) are redshifted. This provides a solid observational basis for the expansion
of the universe. In 1923, the American astronomer Edwin Hubble began to make
measurements of the distances of other galaxies from us, which is more
difficult than the measurement of redshifts. He found that the redshift $z$ of a galaxy is
proportional to its distance $D$ from us, and $z$ is equal to the recessional speed $u$ when
the latter is very small. [The Taylor expansion of (6.6.66a) to the first order yields
$z \cong u$.] Hence, Hubble published the well-known Hubble law in 1929, stating that

$$ u_0 = H_0 D_0 , \quad \text{(the subscript 0 stands for “the current value”)} \tag{10.2.1} $$

where H0 is a constant independent of galaxies, known as the Hubble constant.
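To illustrate the statement that z is approximately equal to u for small recessional speeds, the following Python sketch evaluates the redshift from its definition and from the special-relativistic longitudinal Doppler formula 1 + z = [(1 + u)/(1 − u)]^{1/2} in units with c = 1. It is an assumption of this sketch that (6.6.66a) is this standard formula; the function names are hypothetical.

```python
import math

def redshift_from_wavelengths(lam_emitted, lam_observed):
    """z = (lambda' - lambda) / lambda, with lambda the emitted and
    lambda' the observed wavelength."""
    return (lam_observed - lam_emitted) / lam_emitted

def doppler_redshift(u):
    """Redshift of a source receding radially at speed u (c = 1),
    assuming the standard special-relativistic Doppler formula."""
    if not -1.0 < u < 1.0:
        raise ValueError("u must satisfy |u| < 1 in units with c = 1")
    return math.sqrt((1.0 + u) / (1.0 - u)) - 1.0

def first_order_error(u):
    """Difference between the exact Doppler redshift and the
    first-order approximation z = u."""
    return doppler_redshift(u) - u
```

The difference z − u is of second order in u, so for the nearby galaxies measured by Hubble the Doppler reading of the redshift gives essentially the recessional speed.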


From the theory side, it is not difficult to derive Hubble’s law from the RW metric.
Define the relative speed, also called the speed of separation, between two galaxies
as u(t) := dD(t)/dt, where D(t) is the proper distance between them at a time t.
From (10.1.26) we can easily see that

$$ u(t) = \frac{da(t)}{dt}\, \hat{D} = \frac{\dot{a}(t)}{a(t)}\, D(t) , \qquad \dot{a}(t) \equiv \frac{da(t)}{dt} . \tag{10.2.2} $$

Thus, one can define the Hubble parameter

$$ H(t) := \frac{\dot{a}(t)}{a(t)} , \tag{10.2.3} $$

which is independent of both the galaxies and the distance between the galaxies, so
that

u(t) = H (t)D(t) . (10.2.4)

The above equation indicates that, at any time t, the recessional speed (the speed
of separation) between two galaxies is proportional to the proper distance between
them. Let t0 be the present time, and denote H (t0 ) simply by H0 (i.e., the Hubble
constant), then

$$ u(t_0) = H_0 D(t_0) , \tag{10.2.1′} $$

which is exactly (10.2.1). The result in (10.2.4) was derived by the Belgian physicist
G. Lemaître two years before Hubble’s article, and thus, more properly, Hubble’s law
is also called the Hubble-Lemaître law. Note that the Hubble parameter is different
from the Hubble constant in that the former depends on $t$, while the latter is
merely the current value of the former. Because observational results indicate that
$H_0 > 0$, it follows from (10.2.1′) that $u(t_0) > 0$ whenever $D(t_0) \neq 0$. That is, on an
arbitrary galaxy, the observation of another galaxy will show that the latter is moving
away. [The measurement by Hubble only reveals that other galaxies are going away
from the Milky Way, while (10.2.1) asserts that any pair of galaxies are going away
from each other.] Thus, the fact that all galaxies are measured to be moving away from us
does not mean that the Milky Way is the center of the expanding universe. As an
analogy, one can imagine a balloon with lots of ants on its surface. When such a
balloon is expanding, each of these ants finds that the others are going away from
it, with no ants being more special than the others. Just like the expanding balloon,
there is no center of expansion in the universe.
According to (10.2.1), u could be greater than the speed of light in vacuum when
D is large enough. This does not contradict relativity. To see this, recall that one
of the principles of relativity states that “the world line of a point mass must be
timelike.” This is an absolute and unambiguous statement, which, by a properly
defined concept of speed, is equivalent to the statement that “the speed of a point
mass is less than the speed of light in vacuum” (which is a relative statement). One
must keep in mind that the “speed” in the latter statement refers to the magnitude
of the 3-velocity u a defined in (6.3.28), i.e., the 3-speed of a point mass obtained
from a local measurement by an instantaneous observer. If the observer is an inertial
observer in Minkowski spacetime, then the speed is nothing but the speed of the
particle relative to the inertial frame the observer belongs to. However, there are
various definitions of speed. For a definition different from the above definition of
speed, a speed greater than the speed of light does not necessarily violate relativity.
The recessional speed of galaxies is such an example. It is defined as the derivative
of the distance of the galaxies with respect to the cosmic time, which is, of course,
physically meaningful and can reasonably be called a speed. However, this is not the
speed obtained from a local measurement by an instantaneous observer, and hence
it is not a contradiction to the principle of relativity. In fact, when deriving the RW
metric, the world line of each galaxy has been recognized to be timelike, which
automatically obeys the principle of relativity stated above. As a consequence, for
any instantaneous observer (not necessarily an isotropic observer), the speed of a
galaxy obtained by a local measurement is certainly less than the speed of light in
vacuum.
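To get a feeling for when the recessional speed u = HD exceeds the speed of light, the Python sketch below evaluates u from (10.2.4) and the distance c/H at which u = c. The value of 70 km/s/Mpc used for the Hubble constant is an illustrative assumption and is not taken from the text; the function names are likewise hypothetical.

```python
# Speed of light in vacuum in km/s (exact by the SI definition of the metre).
C_KM_PER_S = 299792.458

# Illustrative value of the Hubble constant in km/s/Mpc.
# This number is an assumption of the sketch, not a value quoted in the text.
H0_ASSUMED = 70.0

def recession_speed(hubble_km_s_mpc, distance_mpc):
    """u = H D from (10.2.4), in km/s, for a proper distance in Mpc."""
    return hubble_km_s_mpc * distance_mpc

def hubble_distance_mpc(hubble_km_s_mpc):
    """Proper distance (in Mpc) at which the recessional speed equals c."""
    return C_KM_PER_S / hubble_km_s_mpc

def is_superluminal(hubble_km_s_mpc, distance_mpc):
    """True if the recessional speed u = H D exceeds c."""
    return recession_speed(hubble_km_s_mpc, distance_mpc) > C_KM_PER_S
```

With the assumed value of the Hubble constant, galaxies beyond roughly four thousand megaparsecs recede faster than light in this sense, without any locally measured speed exceeding c.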

10.2.2 Cosmological Redshift

Hubble interpreted cosmic redshifts as the Doppler effect in flat spacetime, from
which he obtained the recessional speed of the galaxies. According to general
relativity, the existence of matter in the universe results in the curvature of spacetime,
and cosmic redshifts are actually an effect of the curved spacetime. Compared to
cosmological scales, the galaxies Hubble measured are of very small distances from
us, and the redshift is relatively small. In this case, it is acceptable to treat these
redshifts as the Doppler effect. However, for galaxies at sufficiently large distances
from us, their redshifts have to be interpreted as being due to the curved spacetime
geometry.
Under the geometric optics approximation (see Optional Reading 7.2.1), light
signals are regarded as propagating along null geodesics. Suppose a photon emitted
by a galaxy A at $p_1$ travels along a null geodesic $\eta(\beta)$ (where $\beta$ is an affine parameter),
and this photon is received by another galaxy B at $p_2$ (see Fig. 10.8). Let
$K^a = (\partial/\partial\beta)^a$ be the wave 4-vector of the above photon; then its angular frequency at $p_1$
relative to the observer A is $\omega_1 = -g_{ab} Z^a K^b|_{p_1}$, where $Z^a|_{p_1}$ is the 4-velocity of A
at $p_1$. Noticing that $Z^a$ is the same as the coordinate basis vector $(\partial/\partial t)^a$ in the RW
coordinate system $\{t, r, \theta, \phi\}$, and that

$$K^b = \frac{dt}{d\beta}\left(\frac{\partial}{\partial t}\right)^b + \frac{dx^i}{d\beta}\left(\frac{\partial}{\partial x^i}\right)^b ,$$

we have $\omega_1 = dt/d\beta|_{p_1}$. Similarly, the angular frequency of this photon at $p_2$ relative
to the observer B is $\omega_2 = dt/d\beta|_{p_2}$. The cosmic redshift can be obtained by means
of the geodesic equation of $\eta(\beta)$:
Fig. 10.8 Derivation of the cosmic redshift. $\eta(\beta)$ is a null geodesic

$$\frac{d^2 x^\mu}{d\beta^2} + \Gamma^\mu{}_{\nu\sigma}\frac{dx^\nu}{d\beta}\frac{dx^\sigma}{d\beta} = 0 , \quad (\mu = 0, 1, 2, 3) , \tag{10.2.5}$$

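where $x^0, x^1, x^2, x^3$ are respectively $t$, $r$, $\theta$, $\phi$. From (10.1.25), we can obtain the
Christoffel symbols, among which the nonvanishing ones are

$$\begin{aligned}
&\Gamma^0{}_{11} = \frac{a\dot a}{1 - kr^2} , \quad \Gamma^0{}_{22} = a\dot a r^2 , \quad \Gamma^0{}_{33} = a\dot a r^2 \sin^2\theta , \\
&\Gamma^1{}_{01} = \Gamma^1{}_{10} = \frac{\dot a}{a} , \quad \Gamma^1{}_{11} = \frac{kr}{1 - kr^2} , \\
&\Gamma^1{}_{22} = -r(1 - kr^2) , \quad \Gamma^1{}_{33} = -r(1 - kr^2)\sin^2\theta , \\
&\Gamma^2{}_{02} = \Gamma^2{}_{20} = \Gamma^3{}_{03} = \Gamma^3{}_{30} = \frac{\dot a}{a} , \quad \Gamma^2{}_{12} = \Gamma^2{}_{21} = \Gamma^3{}_{13} = \Gamma^3{}_{31} = \frac{1}{r} , \\
&\Gamma^2{}_{33} = -\sin\theta\cos\theta , \quad \Gamma^3{}_{23} = \Gamma^3{}_{32} = \cot\theta .
\end{aligned}$$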

It is easy to verify that the world line of an isotropic observer is a geodesic. We leave
this as Exercise 10.2. [From the above expressions for the Christoffel symbols as
well as (5.7.2), this is in fact almost obvious.] Setting μ = 2, 3 in (10.2.5), we have
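$$\begin{aligned}
&\frac{d^2\theta}{d\beta^2} + \frac{2\dot a}{a}\frac{dt}{d\beta}\frac{d\theta}{d\beta} + \frac{2}{r}\frac{dr}{d\beta}\frac{d\theta}{d\beta} - \sin\theta\cos\theta\left(\frac{d\phi}{d\beta}\right)^2 = 0 , \\
&\frac{d^2\phi}{d\beta^2} + \frac{2\dot a}{a}\frac{dt}{d\beta}\frac{d\phi}{d\beta} + \frac{2}{r}\frac{dr}{d\beta}\frac{d\phi}{d\beta} + 2\cot\theta\,\frac{d\theta}{d\beta}\frac{d\phi}{d\beta} = 0 .
\end{aligned} \tag{10.2.6}$$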

The RW coordinate system can be chosen such that $\theta(p_1) = \theta_0$, $\phi(p_1) = \phi_0$ and
that both the $\theta$- and $\phi$-components of $K^a|_{p_1}$ vanish. In this way, we have $\theta = \theta_0$
and $\phi = \phi_0$ along the whole geodesic $\eta(\beta)$. That is, for an arbitrary geodesic $\eta(\beta)$,
one can always redefine $\theta$ and $\phi$ and make $\eta(\beta)$ a radial geodesic. [Since $\eta(\beta)$ has
been given, functions $t(\beta)$, $r(\beta)$, $\theta(\beta)$ and $\phi(\beta)$ in its parametric equation are all
determined in a given coordinate system. To show that $\theta(\beta) = \theta_0$ and $\phi(\beta) = \phi_0$, one
only needs to notice that (10.2.6) is a system of 2nd-order equations for two unknown
functions $\theta(\beta)$ and $\phi(\beta)$, while $\theta(\beta) = \theta_0$ and $\phi(\beta) = \phi_0$ give the unique solution
satisfying the initial conditions $\theta(p_1) = \theta_0$, $\phi(p_1) = \phi_0$, $d\theta/d\beta|_{p_1} = 0$ and
$d\phi/d\beta|_{p_1} = 0$.]
Furthermore, setting μ = 0 in (10.2.5), we have
$$\frac{d^2 t}{d\beta^2} + \frac{a\dot a}{1 - kr^2}\left(\frac{dr}{d\beta}\right)^2 = 0 .$$

Since $K^a$ is null, i.e., $K^a K^b g_{ab} = 0$, we also have

$$\left(\frac{dt}{d\beta}\right)^2 = \frac{a^2}{1 - kr^2}\left(\frac{dr}{d\beta}\right)^2 . \tag{10.2.7}$$

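Combining the above two equations yields

$$\frac{d^2 t}{d\beta^2} + \frac{\dot a}{a}\left(\frac{dt}{d\beta}\right)^2 = 0 .$$

Define $\omega = dt/d\beta$; then the above equation becomes
$\frac{d\omega}{d\beta} + \frac{\omega}{a}\frac{da}{d\beta} = 0$. Its general solution gives

$$\omega = \frac{\omega_0}{a} , \tag{10.2.8}$$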
where $\omega_0$ is a constant of integration. In the manner discussed above, we
see that for any value of $\beta$, the corresponding value of $\omega$ is the angular frequency of
the photon measured by the isotropic observer that passes through the point $\eta(\beta)$.
Thus, the above equation can be interpreted as follows: as the universe is expanding,
the wavelength of each photon in the universe (with respect to an isotropic observer) is
stretched proportionally, which leads to the redshift. Applying (10.2.8) to points $p_1$
and $p_2$, respectively, we obtain

$$\frac{\omega_2}{\omega_1} = \frac{a(t_1)}{a(t_2)} , \tag{10.2.9}$$

where $t_1 = t(p_1)$ and $t_2 = t(p_2)$. Hence, the relative redshift is

$$z = \frac{\lambda_2 - \lambda_1}{\lambda_1} = \frac{\omega_1}{\omega_2} - 1 = \frac{a(t_2)}{a(t_1)} - 1 . \tag{10.2.10}$$
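As a numerical cross-check of (10.2.8) and (10.2.10), the following sketch integrates the combined equation $d^2t/d\beta^2 + (\dot a/a)(dt/d\beta)^2 = 0$ with a fourth-order Runge-Kutta step and compares the resulting redshift with $a(t_2)/a(t_1) - 1$. The scale factor $a(t) = \sqrt{t}$, the initial data and all function names are illustrative assumptions, not taken from the text.

```python
import math

def scale_factor(t):
    # Assumed toy model: a(t) = sqrt(t), i.e. (10.2.25b) with 2B = 1.
    return math.sqrt(t)

def scale_factor_dot(t):
    # Time derivative of the assumed scale factor.
    return 0.5 / math.sqrt(t)

def rhs(state):
    # state = (t, omega) with omega = dt/dbeta.
    # d(omega)/dbeta = -(adot/a) * omega**2 is the combined geodesic equation.
    t, omega = state
    return (omega, -scale_factor_dot(t) / scale_factor(t) * omega ** 2)

def rk4_step(state, h):
    # One classical Runge-Kutta step of size h in the affine parameter.
    k1 = rhs(state)
    k2 = rhs(tuple(s + 0.5 * h * k for s, k in zip(state, k1)))
    k3 = rhs(tuple(s + 0.5 * h * k for s, k in zip(state, k2)))
    k4 = rhs(tuple(s + h * k for s, k in zip(state, k3)))
    return tuple(s + h / 6.0 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def propagate(t_start, omega_start, beta_span, steps=2000):
    # Follow the photon from emission over an affine-parameter interval.
    state = (t_start, omega_start)
    h = beta_span / steps
    for _ in range(steps):
        state = rk4_step(state, h)
    return state

t1, omega1 = 1.0, 1.0                      # hypothetical emission data
t2, omega2 = propagate(t1, omega1, beta_span=5.0)
z = omega1 / omega2 - 1.0                  # definition in (10.2.10)
z_expected = scale_factor(t2) / scale_factor(t1) - 1.0
print(z, z_expected)
```

The two printed numbers should agree to within the integration error, which is another way of saying that $\omega a$ stays constant along the null geodesic.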

If the distance between A and B is sufficiently small, we have $t_2 - t_1 \cong D(t_1)$
because the world line $\eta(\beta)$ of the photon is null, as shown in Fig. 10.8. Neglecting
the higher order terms in the Taylor expansion yields

$$a(t_2) \cong a(t_1) + \dot a(t_1)\,(t_2 - t_1) \cong a(t_1) + \dot a(t_1)\, D(t_1) .$$

Hence,

$$z = \frac{\dot a(t_1)}{a(t_1)} D(t_1) = H(t_1)\, D(t_1) , \tag{10.2.11}$$

where (10.2.3) is used in the last step. Denote $t_2\ (\cong t_1)$ as $t_0$; then the above equation
is exactly the observational result by Hubble.
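To see how quickly the linear relation (10.2.11) departs from the exact redshift (10.2.10), one can compare the two for a concrete scale factor. The sketch below assumes a dust-like law $a(t) \propto t^{2/3}$ and units with $c = 1$, so that $D(t_1) \cong t_2 - t_1$; these choices and the function names are assumptions made only for illustration.

```python
def scale_factor(t):
    # Assumed toy model: a(t) proportional to t^(2/3).
    return t ** (2.0 / 3.0)

def hubble_parameter(t):
    # H = (da/dt)/a = 2/(3t) for the assumed scale factor.
    return 2.0 / (3.0 * t)

def exact_redshift(t1, t2):
    # Equation (10.2.10).
    return scale_factor(t2) / scale_factor(t1) - 1.0

def hubble_law_redshift(t1, t2):
    # Equation (10.2.11) with D(t1) approximated by t2 - t1 (c = 1).
    return hubble_parameter(t1) * (t2 - t1)

for dt in (0.001, 0.1, 1.0):
    print(dt, exact_redshift(1.0, 1.0 + dt), hubble_law_redshift(1.0, 1.0 + dt))
```

For small separations the two values nearly coincide, while for separations comparable to the age of the model universe they differ noticeably, in line with the remark that (10.2.11) holds only for nearby galaxies.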

Remark 1 In Exercise 10.3 we will see another approach for deriving (10.2.8), which
takes advantage of the geodesic equation in the form of $K^a \nabla_a K^b = 0$, instead
of using the component form (10.2.5). Alternatively, (10.2.8) can also be derived
in a purely geometric fashion (which makes use of the fact that the contraction of
the tangent vector field of a geodesic and a Killing field remains constant along the
geodesic). For details, see Wald (1984), pp. 103–104.

10.2.3 Evolution of the Scale Factor

The Einstein tensor G ab for the Robertson-Walker metric can be expressed in terms
of a(t). When Tab , the energy-momentum tensor of all the content in the universe, is
also expressed in terms of a(t), Einstein’s equation G ab = 8π Tab will give rise to a
set of differential equations for a(t), from which we can solve for the time evolution
of the universe.
The contents of the universe can be classified into two types: those consisting of
particles with nonzero rest masses are called matter; those consisting of particles
with zero rest mass are called radiation. Matter is mainly accumulated in galaxies,
while the main contribution to radiation is the cosmic microwave background
radiation (CMB, or CMBR), which consists of electromagnetic microwaves distributed
throughout the whole universe and was discovered in 1965 (for details, see Sect. 10.3.1). On
a cosmological scale, each galaxy can be treated as a point mass (like a drop in the
ocean), and all the galaxies are regarded as forming a perfect fluid. The pressure of
such a perfect fluid is negligible (namely the random motions of the galaxies can
be neglected), and thus such a perfect fluid can be approximated as a dust, with
each galaxy being a particle in this dust. Hence, the world line of each galaxy is a
geodesic (see Sect. 6.5). Furthermore, since a perfect fluid is isotropic, each galaxy
can be approximately regarded as an isotropic observer [as we have seen below
(10.2.5), the world lines of isotropic observers are indeed geodesics]. The
energy-momentum tensor of all the matter (i.e., the dust) in the universe can be expressed
as

$$T_{ab}(\text{matter}) = \rho_M U_a U_b ,$$

where U a is the 4-velocity field of the isotropic observers, and ρM is the energy
density of matter measured by the isotropic observers. On the other hand, all the
radiation in the universe may also be treated as a special kind of perfect fluid, whose
4-velocity is the same as U a . Then, the energy-momentum tensor of all the radiation
in the universe reads

$$T_{ab}(\text{radiation}) = \rho_R U_a U_b + p\,(g_{ab} + U_a U_b) ,$$

where the energy density ρR and the pressure p of the radiation are both measured by
the isotropic observers and satisfy p = ρR /3 [see (6.5.3)]. Combining these two con-
tributions, the total energy-momentum tensor of the universe can be approximately
written as
$$T_{ab} = \rho U_a U_b + p\,(g_{ab} + U_a U_b) , \tag{10.2.12}$$

where ρ = ρM + ρR is the sum of the energy densities of the dust (galaxies) and the
radiation. In the actual universe, there are also other kinds of matter, in addition to
galaxies. However, according to the cosmological principle, one can expect that their
4-velocities on average are still $U^a$, and so their energy-momentum tensors will also
have the form of (10.2.12). In other words, the $T_{ab}$ in (10.2.12) can be regarded as
including the contributions from all kinds of matter in the universe ($\rho$ and $p$ have
included the contributions from all of them). In summary, in the standard model,
there are only two types of content in the universe, matter with the characteristic
$p \cong 0$, and radiation with the characteristic $p = \rho/3$. The contributions from both
of them have been included in (10.2.12), where $\rho$ and $p$ are independent of the spatial
coordinates due to the spatial homogeneity of the universe.
In the RW line element (10.1.25), t, r , θ and ϕ are the comoving coordinates. The
nonvanishing components of Tab obtained from (10.2.12) in this system are

$$T_{00} = \rho , \quad T_{ij} = p\, g_{ij} , \tag{10.2.13}$$

where the only nonvanishing $g_{ij}$ are

$$g_{11} = \frac{a^2}{1 - kr^2} , \quad g_{22} = a^2 r^2 , \quad g_{33} = a^2 r^2 \sin^2\theta .$$

On the other hand, from (10.1.25), one can find the nonvanishing components of the
Einstein tensor G ab (the calculation is left as an exercise), which can be expressed
in terms of a(t) as

$$G_{00} = \frac{3(\dot a^2 + k)}{a^2} , \tag{10.2.14}$$

$$G_{ij} = -\left(\frac{2\ddot a}{a} + \frac{\dot a^2 + k}{a^2}\right) g_{ij} . \tag{10.2.15}$$

Then, the components of Einstein’s equation, $G_{00} = 8\pi T_{00}$ and $G_{ij} = 8\pi T_{ij}$, can
be expressed as

$$\frac{3(\dot a^2 + k)}{a^2} = 8\pi\rho , \tag{10.2.16}$$

$$\frac{2\ddot a}{a} + \frac{\dot a^2 + k}{a^2} = -8\pi p . \tag{10.2.17}$$
Equations (10.2.16) and (10.2.17) are the fundamental equations that determine the
scale factor $a(t)$. From these two equations we can easily get

$$\frac{\ddot a}{a} = -\frac{4\pi}{3}(\rho + 3p) . \tag{10.2.18}$$
Equation (10.2.16) is called the Friedmann equation. Equations (10.2.16) and
(10.2.18) are also called the first Friedmann equation and the second Friedmann
equation, respectively. Differentiating (10.2.16) yields

$$\frac{\dot a}{a}\left(\frac{\ddot a}{a} - \frac{\dot a^2 + k}{a^2}\right) = \frac{4\pi\dot\rho}{3} . \tag{10.2.19}$$

Then, using the Friedmann equations (10.2.16) and (10.2.18) to eliminate $\ddot a/a$ and
$(\dot a^2 + k)/a^2$ in the above equation, we obtain

$$\dot\rho + 3(\rho + p)\frac{\dot a}{a} = 0 . \tag{10.2.20}$$
On the other hand, once we have (10.2.16) and (10.2.20), we can apply both of them
to (10.2.19) and get

$$\frac{\dot a}{a}\left[\frac{\ddot a}{a} + \frac{4\pi}{3}(\rho + 3p)\right] = 0 .$$

Therefore, the Friedmann equations are equivalent to (10.2.16) and (10.2.20) when
$\dot a \neq 0$.
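The chain from (10.2.16) and (10.2.17) to (10.2.18) and (10.2.20) can be checked numerically. In the sketch below, $\rho$ and $p$ are defined from an arbitrarily chosen scale factor through (10.2.16) and (10.2.17), and the residuals of (10.2.20) and (10.2.18) are then evaluated. The power law $a(t) = t^{0.6}$, the choice $k = +1$ and the function names are assumptions made only for this test; they do not describe a physical model.

```python
import math

K = 1        # assumed spatial curvature index
N = 0.6      # assumed exponent of the trial scale factor a(t) = t**N

def a(t):
    return t ** N

def a_dot(t):
    return N * t ** (N - 1)

def a_ddot(t):
    return N * (N - 1) * t ** (N - 2)

def rho(t):
    # Energy density defined through (10.2.16).
    return 3.0 * (a_dot(t) ** 2 + K) / (8.0 * math.pi * a(t) ** 2)

def p(t):
    # Pressure defined through (10.2.17).
    return -(2.0 * a_ddot(t) / a(t) + (a_dot(t) ** 2 + K) / a(t) ** 2) / (8.0 * math.pi)

def rho_dot(t, h=1e-5):
    # Central finite difference, independent of the algebra being tested.
    return (rho(t + h) - rho(t - h)) / (2.0 * h)

def continuity_residual(t):
    # Left-hand side of (10.2.20); should vanish up to discretization error.
    return rho_dot(t) + 3.0 * (rho(t) + p(t)) * a_dot(t) / a(t)

def acceleration_residual(t):
    # Left-hand side minus right-hand side of (10.2.18).
    return a_ddot(t) / a(t) + 4.0 * math.pi / 3.0 * (rho(t) + 3.0 * p(t))

print(continuity_residual(2.0), acceleration_residual(2.0))
```

Both residuals should be zero up to rounding and finite-difference error, whatever smooth $a(t)$ is chosen, since (10.2.18) and (10.2.20) are consequences of (10.2.16) and (10.2.17).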
For ρ > 0 and p ≥ 0, (10.2.18) indicates that ä < 0. This means that the universe
is either expanding (ȧ > 0) or contracting (ȧ < 0), but cannot be static, since ȧ = 0
can at most happen at a special moment when ȧ > 0 is turning into ȧ < 0. Since
the observation results indicate that the universe is expanding in the present day,
i.e., ȧ(t0 ) > 0 (t0 is the current time coordinate), it follows from ä < 0 that ȧ(t) >
ȧ(t0 ) > 0 for arbitrary t < t0 , and, the smaller t is, the greater ȧ(t) is. Hence, when
we trace backward in time, the universe shrinks more and more rapidly, and finally,
at a certain time (set as t = 0), the value of a becomes zero. At this time, the density
becomes infinite, and so we say that the universe expanded out of a singularity, called
the big bang singularity. In fact, it is not so appropriate to refer to the origin of the
universe as a “big bang”. The word “bang” usually means a hit striking violently
with a loud noise, which occurs as an event in a regular spacetime background, with
no singularity at the spacetime point where the hit occurs. Furthermore, there exists
something (e.g., a bomb before its explosion) whose world line ends in the future
direction at the spacetime point where the bang occurs (each bomb fragment has its
own world line after the explosion). The big bang of the universe is quite different.
First, it corresponds to a spacetime singularity. All timelike geodesics are incomplete
in the past direction, and all of them approach the singularity as t tends to zero: for
any pair of such geodesics, γ1 (t) and γ2 (t), the distance between the spacetime points
γ1 (t) and γ2 (t) tends to zero as t > 0 tends to zero. On the other hand, there does
not exist a timelike geodesic that approaches the big bang singularity in the future
direction. Intuitively, one may imagine that at the beginning of time, all particles in
the universe are jammed in a spatial volume that “cannot be smaller”. During the
expansion of the universe, each particle runs away from the others.
Before solving (10.2.16) and (10.2.20) in the general cases, let us discuss two
extreme cases: ① the dust-only universe, whose contribution to Tab comes com-
pletely from matter (dust); and ② the radiation-only universe, whose contribution
to Tab comes completely from radiation. For the dust-only universe, p = 0, and
integration of (10.2.20) gives

$$\rho_M a^3 = \text{constant} . \tag{10.2.21}$$

This is pretty natural, because a comoving volume is proportional to $a^3$, and the
number of particles within this volume is a constant. Thus, the energy contained in
this comoving volume is a constant, and so its density is proportional to $a^{-3}$. For the
radiation-only universe, $p = \rho_R/3$, and integration of (10.2.20) gives

$$\rho_R a^4 = \text{constant} . \tag{10.2.22}$$

This is because the number of photons within a comoving volume is a constant, while
the frequency (energy) of every photon is proportional to $a^{-1}$ due to the redshift
[see (10.2.8)]. Thus, the energy density of radiation is proportional to $a^{-4}$. Our
present universe is matter-dominated, which is closer to a dust-only universe than
to a radiation-only universe. However, when $t$ is sufficiently small, the universe is
radiation-dominated (although there were no galaxies yet in the early universe, only
particles). In the following, we will solve (10.2.16) and (10.2.20) for these two
extreme cases.
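The scalings (10.2.21) and (10.2.22) can be reproduced by integrating (10.2.20) directly. Writing the equation of state as $p = w\rho$ (the symbol $w$ is notation introduced here for convenience, with $w = 0$ for dust and $w = 1/3$ for radiation), (10.2.20) becomes $d\rho/da = -3(1 + w)\rho/a$. The following sketch, with hypothetical function names and the normalization $\rho = 1$ at $a = 1$, integrates this with a Runge-Kutta scheme.

```python
def integrate_density(w, a_end, rho_start=1.0, steps=1000):
    """Integrate d(rho)/da = -3 (1 + w) rho / a from a = 1 to a = a_end."""
    def slope(a, rho):
        return -3.0 * (1.0 + w) * rho / a

    a, rho = 1.0, rho_start
    h = (a_end - 1.0) / steps
    for _ in range(steps):
        k1 = slope(a, rho)
        k2 = slope(a + 0.5 * h, rho + 0.5 * h * k1)
        k3 = slope(a + 0.5 * h, rho + 0.5 * h * k2)
        k4 = slope(a + h, rho + h * k3)
        rho += h / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        a += h
    return rho

dust = integrate_density(w=0.0, a_end=2.0)             # compare with 2**-3
radiation = integrate_density(w=1.0 / 3.0, a_end=2.0)  # compare with 2**-4
print(dust, 2.0 ** -3, radiation, 2.0 ** -4)
```

Doubling the scale factor dilutes dust by a factor of 8 and radiation by a factor of 16; the extra factor of 2 for radiation is the redshift of each photon.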
For the radiation-only universe, we write (10.2.22) as

$$B^2 = \frac{8\pi}{3}\rho a^4 , \tag{10.2.23}$$

where $B > 0$ is a constant, and $\rho_R$ is denoted by $\rho$. Then (10.2.16) can be rewritten
as

$$\dot a^2 = \frac{B^2}{a^2} - k . \tag{10.2.24}$$

Fig. 10.9 The curves of $a(t)$ for the radiation-only universe (curves labeled $k = -1$, $k = 0$ and $k = +1$)

By setting $b(t) \equiv a^2(t)$, under the condition $a(t) \to 0$ as $t \to 0$, we can find that
a special solution of the above equation is $a^2(t) = 2Bt - kt^2$. Thus, for the three
cases of $k$, we have

$$\text{case (a)}\ (k = +1), \quad a^2(t) = 2Bt - t^2 , \tag{10.2.25a}$$

$$\text{case (b)}\ (k = 0), \quad a^2(t) = 2Bt , \tag{10.2.25b}$$

$$\text{case (c)}\ (k = -1), \quad a^2(t) = 2Bt + t^2 . \tag{10.2.25c}$$

The diagrams for a(t) in these cases are shown in Fig. 10.9. Since radiation is
dominant when t is sufficiently small, the behaviors of the three curves of a(t) are of
significance near the origin. It follows from (10.2.25) or Fig. 10.9 that a = 0 when
t = 0, which corresponds to the big bang singularity. In (10.2.24), k can be neglected
when a is sufficiently small. Therefore, the three curves are approximately the same
near the origin.
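A quick numerical check that $a^2(t) = 2Bt - kt^2$ solves (10.2.24) for all three values of $k$ is sketched below. The value $B = 1$, the sample time and the function names are assumptions for illustration; $\dot a$ is obtained by a central finite difference so that the check does not rely on the algebra it is meant to test.

```python
import math

def radiation_scale_factor(t, B, k):
    # Equation (10.2.25): a(t) = sqrt(2 B t - k t^2).
    return math.sqrt(2.0 * B * t - k * t * t)

def scale_factor_dot(t, B, k, h=1e-6):
    # Central finite-difference estimate of da/dt.
    return (radiation_scale_factor(t + h, B, k)
            - radiation_scale_factor(t - h, B, k)) / (2.0 * h)

def friedmann_residual(t, B, k):
    # Left-hand side minus right-hand side of (10.2.24).
    a = radiation_scale_factor(t, B, k)
    return scale_factor_dot(t, B, k) ** 2 - (B * B / (a * a) - k)

for k in (+1, 0, -1):
    print(k, friedmann_residual(0.5, B=1.0, k=k))

# For k = +1 the expression 2Bt - t^2 peaks at t = B, where a = B,
# and returns to zero at t = 2B.
print(radiation_scale_factor(1.0, B=1.0, k=1))
```

The residuals should vanish up to discretization error. The last line illustrates that the closed ($k = +1$) model reaches a maximal scale factor $a = B$ at $t = B$.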
For the matter (dust)-only universe, we write (10.2.21) as

$$A = \frac{8\pi}{3}\rho a^3 , \tag{10.2.26}$$

where $A > 0$ is a constant, and $\rho_M$ in (10.2.21) is replaced by $\rho$. Then, (10.2.16) can
be rewritten as

$$\dot a^2 = \frac{A}{a} - k . \tag{10.2.27}$$

In order to solve it, we introduce a new variable

$$\hat t(t) \equiv \int_0^t \frac{dt'}{a(t')} . \tag{10.2.28}$$

Denote $da/d\hat t$ as $a'$; then (10.2.27) becomes

$$a'^2 = Aa - ka^2 . \tag{10.2.29}$$

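Note that $\hat t = 0$ only when $t = 0$. Then, the special solutions to the above equation
satisfying the initial condition $a(0) = 0$ can be listed case by case as follows:

$$\text{case (a)}\ (k = +1), \quad a = \frac{A}{2}(1 - \cos\hat t) , \quad t = \frac{A}{2}\left(\hat t - \sin\hat t\right) , \tag{10.2.30a}$$

$$\text{case (b)}\ (k = 0), \quad a = \left(\frac{9A}{4}\right)^{1/3} t^{2/3} , \tag{10.2.30b}$$

$$\text{case (c)}\ (k = -1), \quad a = \frac{A}{2}(\cosh\hat t - 1) , \quad t = \frac{A}{2}\left(\sinh\hat t - \hat t\right) . \tag{10.2.30c}$$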

For each of these cases, the graph of a(t) is similar to that in Fig. 10.9, so it is not
shown here separately. The solution for the dust-only universe was first obtained by
the Soviet physicist and mathematician Alexander Friedmann in 1922, and then inde-
pendently by Georges Lemaître in 1927, much earlier than the discoveries by Howard
P. Robertson and Arthur G. Walker in 1935. Therefore, the standard cosmological
model is also often referred to as the Friedmann-Lemaître-Robertson-Walker
(FLRW) model.
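The parametric solutions (10.2.30) can be tested against (10.2.27) numerically. In the sketch below, $\dot a$ is computed as the ratio of finite differences of $a$ and $t$ with respect to the parameter $\hat t$ for $k = \pm 1$, and directly in $t$ for $k = 0$. The value $A = 1$, the sample points and the function names are illustrative assumptions.

```python
import math

def dust_parametric(k, t_hat, A=1.0):
    """Return (a, t) at parameter t_hat for k = +1 or k = -1, per (10.2.30a, c)."""
    if k == 1:
        return 0.5 * A * (1.0 - math.cos(t_hat)), 0.5 * A * (t_hat - math.sin(t_hat))
    if k == -1:
        return 0.5 * A * (math.cosh(t_hat) - 1.0), 0.5 * A * (math.sinh(t_hat) - t_hat)
    raise ValueError("use dust_flat for k = 0")

def dust_flat(t, A=1.0):
    # Equation (10.2.30b).
    return (9.0 * A / 4.0) ** (1.0 / 3.0) * t ** (2.0 / 3.0)

def residual_curved(k, t_hat, A=1.0, h=1e-5):
    # da/dt = (da/dt_hat) / (dt/dt_hat), estimated by central differences.
    a_plus, t_plus = dust_parametric(k, t_hat + h, A)
    a_minus, t_minus = dust_parametric(k, t_hat - h, A)
    a_dot = (a_plus - a_minus) / (t_plus - t_minus)
    a_mid, _ = dust_parametric(k, t_hat, A)
    return a_dot ** 2 - (A / a_mid - k)      # residual of (10.2.27)

def residual_flat(t, A=1.0, h=1e-6):
    a_dot = (dust_flat(t + h, A) - dust_flat(t - h, A)) / (2.0 * h)
    return a_dot ** 2 - A / dust_flat(t, A)  # residual of (10.2.27) with k = 0

print(residual_curved(1, 1.0), residual_flat(1.0), residual_curved(-1, 1.0))
```

All three residuals should be zero up to finite-difference error. For $k = +1$ the scale factor returns to zero at $\hat t = 2\pi$, which is the cycloid behavior of the closed dust model.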
So far we have discussed the two extreme cases. The actual universe contains
both matter and radiation. In this case, it is very difficult to solve the Friedmann
equations quantitatively. However, a qualitative discussion is still possible. Firstly,
observations indicate that our universe is presently in expansion, i.e., ȧ(t0 ) > 0, with
$t_0$ the present value of $t$. According to (10.2.18), $\ddot a$ is negative, and hence the smaller
$t > 0$ is, the greater $\dot a(t)$ is. Thus, for $0 < t < t_0$, the curve of $a(t)$ is convex upwards,
and it intersects the $t$-axis at a certain time before $t_0$ (which has
been stipulated to be $t = 0$), similar to the curves in Fig. 10.9. Secondly, for $t > t_0$,
we can write (10.2.20) as

$$\frac{d(\rho a^3)}{da} = -3pa^2 . \tag{10.2.31}$$

Since $p$ is nonnegative, the above equation indicates that $\rho a^3$ does not increase as $a$
increases, and hence $\rho$ decreases no more slowly than $a^{-3}$. Now rewrite (10.2.16) as

$$3(\dot a^2 + k) = 8\pi\rho a^2 . \tag{10.2.32}$$

Then, as $a$ increases, its right-hand side decreases no more slowly than $a^{-1}$. The above
equation indicates that the behavior of $\dot a$ depends on $k$. For $k = 0$, we have
$\dot a^2 = \frac{8}{3}\pi\rho a^2$, which implies that $\dot a^2$ decreases as $a$ increases, and that $\dot a$ approaches zero as
$a$ goes to infinity. Noticing that $\dot a(t_0) > 0$, we see that $\dot a(t)$ is positive for any $t > t_0$.

Hence, $a$ increases as $t$ increases, but the slope of the curve $a(t)$ keeps decreasing.
Note that this does not ensure that $a$ approaches infinity when $t$ goes to infinity. In
this case, the curve of $a(t)$ qualitatively behaves the same as the curve for $k = 0$ in
Fig. 10.9. For $k = -1$, (10.2.32) results in $\dot a^2 = \frac{8}{3}\pi\rho a^2 + 1$, which is similar to the
case of $k = 0$, except that the slope of the curve $a(t)$ tends to 1, instead of 0, as $a \to \infty$
(which corresponds to $t \to \infty$). This qualitatively behaves the same as the curve
for $k = -1$ in Fig. 10.9. For $k = +1$, (10.2.32) results in $\dot a^2 = \frac{8}{3}\pi\rho a^2 - 1$, which
indicates that $\dot a^2$ decreases as $a$ increases. When $\frac{8}{3}\pi\rho a^2$ decreases to 1, $\dot a$ decreases
to zero. Let $a_C$ be the value of $a$ when $\frac{8}{3}\pi\rho a^2 = 1$; then it represents the critical
state: in the process of $a$ increasing from $a(t_0)$ to $a_C$, the value of $\dot a$ decreases from
$\dot a(t_0) > 0$, and becomes zero when $a$ reaches $a_C$ (with the corresponding value
of $t$ denoted by $t_C$). Since $\ddot a$ is always negative, according to (10.2.18), $\dot a$ decreases
as $t$ increases. Consequently, $\dot a$ becomes negative when $t > t_C$. That is, for $t > t_C$, $a$
decreases as $t$ increases until it becomes zero, and so $a_C$ is obviously a maximum of
$a$, which qualitatively behaves the same as the curve for $k = +1$ in Fig. 10.9. Thus, as
we discussed case by case, the behavior of $a$ for the actual universe can still roughly
be described by Fig. 10.9.
[Optional Reading 10.2.1]
Rigorously speaking, it should also be proved that both values of $a_C$ and $t_C$ are finite. Let
$f(a) \equiv a\dot a^2$. Then, due to $\dot a^2 = \frac{8}{3}\pi\rho a^2 - 1$, one has

$$f(a) = \frac{8}{3}\pi\rho a^3 - a = f_1(a) - f_2(a) ,$$

where $f_1(a) \equiv \frac{8}{3}\pi\rho a^3$ and $f_2(a) = a$. As shown in (10.2.31), the curve of $f_1$ is a curve
with negative slope and positive function value. Thus, it is located in the first quadrant, and
has an intersection with the graph of $f_2$, a straight line with slope 1. It follows from
$a\dot a^2 = f_1(a) - f_2(a)$ that the value of $a$ at the intersection is exactly $a_C$, and thus $a_C$ is finite. Let
$\rho_C$ be the value of $\rho$ corresponding to $a_C$; then $\dot a^2 = \frac{8}{3}\pi\rho a^2 - 1$ results in
$\rho_C = \frac{3}{8\pi a_C^2} > 0$.
Since $p \geq 0$ in (10.2.18), we have $\ddot a(t_C) < 0$, or, more precisely, $\lim_{t\to t_C} \ddot a(t) < 0$, which
guarantees that $t_C$ is finite. In other words, it is impossible that $a$ increases to $a_C$ only when
$t \to \infty$.
[The End of Optional Reading 10.2.1]

Since the universe has an initial time ($t = 0$), we can talk about its age in the
present day, which is $t_0 - 0 = t_0$. Suppose $D(t)$ is the distance between two
arbitrarily chosen galaxies, and denote $D(t_0)$ by $D_0$ for short. Roughly, assume that
the universe expands at a constant rate, which is the presently observed rate $u_0$.
Then $t_0 \cong D_0/u_0 = D_0/(H_0 D_0) = H_0^{-1}$. The value of $H_0$ is measured to be about
73 (km/s)/Mpc or 22.4 (km/s)/Mly.7 Hence, roughly, $t_0 \cong H_0^{-1} \cong 13.4$ Gyr. As we
can see in Fig. 10.10, however, $t_0 < H_0^{-1}$ (see Exercise 10.4 for the precise relation of
$t_0$ and $H_0^{-1}$), and thus $t_0$ is less than 13.4 billion years in the FLRW model. However,
the observation of Type Ia supernovae in 1999 shows that the present universe is
not undergoing a decelerating expansion, as shown in Fig. 10.10, but an accelerating
expansion, as shown in Fig. 10.13 (see Sect. 10.3.3 for details), which leads to
$t_0 > H_0^{-1}$. This is clear evidence that the FLRW model is not complete. The latest
estimated value of the age of the universe is 13.787 ± 0.020 billion years [Aghanim
et al. (2018)].8

Fig. 10.10 The age of the universe $t_0 < H_0^{-1}$

7 Note that 1 Mpc = $10^6$ pc, where pc stands for parsec (an abbreviation for “parallax second”),
which is a commonly used unit of length in astronomy. Roughly speaking, 1 pc ≈ 3.26 ly
(light-years). Also, 1 Mly = $10^6$ ly and 1 Gly = $10^9$ ly.
For various reasons, it is difficult to measure an accurate value for H0 . The value
obtained by Hubble in the early days was about 8 times the value above, which gives
a t0 less than 1.7 billion years. This is unacceptable, because the age of the Earth
is estimated to be 4.6 billion years based on the relative abundance of radioactive
elements. Moreover, in the 1930s, the age of some stars had been incorrectly estimated
to be $10^3$ billion years, making it even less acceptable that the age of the universe is just
1.7 billion years. This made people raise questions regarding the theory of the Big
Bang, and consequently many cosmological theories were brought out. At the end
of the 1950s, when the value of H0 was estimated to be about 1/8 of that measured
by Hubble, those questions regarding the Big Bang theory were finally resolved.
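The rough ages quoted above follow from a unit conversion of $H_0^{-1}$, which can be sketched as follows. The conversion constants (kilometres per megaparsec and seconds per $10^9$ Julian years) are standard values supplied here, and the function names are hypothetical. The factor $2/3$ applies only to the dust-only $k = 0$ model, where (10.2.30b) gives $a \propto t^{2/3}$ and hence $H = 2/(3t)$.

```python
KM_PER_MPC = 3.0857e19        # kilometres in one megaparsec
SECONDS_PER_GYR = 3.15576e16  # seconds in 10^9 Julian years

def hubble_time_gyr(h0_km_s_mpc):
    """Return 1/H0 in Gyr for H0 given in (km/s)/Mpc."""
    h0_per_second = h0_km_s_mpc / KM_PER_MPC
    return 1.0 / h0_per_second / SECONDS_PER_GYR

def dust_flat_age_gyr(h0_km_s_mpc):
    """Age t0 = (2/3)/H0 of a dust-only k = 0 universe, from a ~ t^(2/3)."""
    return 2.0 / 3.0 * hubble_time_gyr(h0_km_s_mpc)

print(hubble_time_gyr(73.0))        # Hubble time for the value used in the text
print(hubble_time_gyr(8 * 73.0))    # Hubble time for a value 8 times larger
print(dust_flat_age_gyr(73.0))      # decelerating dust model: t0 < 1/H0
```

With $H_0 = 73$ (km/s)/Mpc the Hubble time comes out close to 13.4 Gyr, and a value 8 times larger gives less than 1.7 Gyr, matching the figures in the text.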
For the study of cosmology, it is of great significance to improve the accuracy for
the measurement of H0 . For a long time, the maximum value of H0 was as great as
2.5 times the minimum value. Usually it is regarded that

H0 = 100h (km/s)/Mpc (10.2.33)

with 0.5 ≤ h ≤ 1, which reflects the discrepancy between the estimated results. So
far, the methods that have been used for measuring the Hubble constant include
two major kinds. One is the “late universe” method, which measures the redshifts
using the technique of a “calibrated distance ladder”. The values obtained from
these measurements agree on a value near 73 (km/s)/Mpc [the latest data gives
73.30 ± 1.04 (km/s)/Mpc in Riess et al. (2022)]. The other is the “early universe”
method, which is based on the CMB observations. The measurements of this kind
have agreed on a value near 67.7 (km/s)/Mpc [67.66 ± 0.42 (km/s)/Mpc in Aghanim
et al. (2018)]. Although the techniques of both kinds of measurements have been
improved over the years and each kind clearly converges on a definite value,
these two values do not agree with each other. This discrepancy is called the Hubble
tension [see Di Valentino et al. (2021) for a review].
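One common way to put a number on such a discrepancy is to divide the difference of the two values by their uncertainties combined in quadrature. The sketch below does this for the two results quoted above; it assumes independent Gaussian errors, which is a simplification introduced here and not a statement about how the cited analyses quantify the tension. The function name is hypothetical.

```python
import math

def tension_in_sigma(value_1, error_1, value_2, error_2):
    # Difference in units of the quadrature-combined uncertainty,
    # assuming independent Gaussian errors (an assumption of this sketch).
    return abs(value_1 - value_2) / math.hypot(error_1, error_2)

late_universe = (73.30, 1.04)   # (km/s)/Mpc, distance-ladder value quoted above
early_universe = (67.66, 0.42)  # (km/s)/Mpc, CMB-based value quoted above

sigma = tension_in_sigma(*late_universe, *early_universe)
h_late = late_universe[0] / 100.0    # dimensionless h of (10.2.33)
h_early = early_universe[0] / 100.0
print(sigma, h_late, h_early)
```

Both values of $h$ lie well inside the historical range in (10.2.33), yet under this simple estimate they differ by several combined standard deviations.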

8 The estimated age of the universe depends on the cosmological model;
this result is based on the ΛCDM model (see Sect. 10.3.3).

As we have stated at the beginning of this chapter, the universe is the maximal
spacetime containing everything in Nature. We should supplement this statement,
noting that the universe described by the RW metric is just the universe after being
“smoothed”, which only represents the behavior of the actual universe on a cosmo-
logical scale. If the local behavior of the actual universe in a relatively small scale is
of concern, one should choose a suitable metric according to the local distribution of
matter. Even though the universe is the maximal spacetime containing everything,
the RW metric does not reflect the spacetime geometry of the local regions (i.e., those
much smaller than cosmological scales).

10.2.4 The Cosmological Constant and Einstein’s Static Universe

As early as 1917, Einstein himself studied the universe using his field equation. Due
to the widely accepted philosophical idea at that time that the universe is supposed
to be invariant, he attempted to find a spacetime metric to describe a static universe.
Unfortunately, the Einstein equation is not compatible with such a static solution.
This is because static means ȧ = 0, and then (10.2.16) and (10.2.17) become

$$3k = 8\pi\rho a^2 , \tag{10.2.16$'$}$$

$$k = -8\pi p a^2 , \tag{10.2.17$'$}$$

which are obviously incompatible with the physical conditions ρ > 0 and p > 0.
Einstein realized that there are no static solutions to his equation from the beginning.
However, at that time he believed firmly that our universe is static, and so he modified
his own equation just in order to acquire a static solution for the universe. He assumed
that the modified field equation has the form $\tilde G_{ab} = 8\pi T_{ab}$. It follows from the
properties of $T_{ab}$ that $\tilde G_{ab}$ must satisfy $\tilde G_{ab} = \tilde G_{ba}$ and $\nabla^a \tilde G_{ab} = 0$. For such a tensor
field $\tilde G_{ab}$ constructed out of $g_{ab}$ and its derivatives of the first and second orders, $\tilde G_{ab}$
can only be a linear combination of $G_{ab}$ and $g_{ab}$ (sans proof). Therefore, in 1917,
Einstein published the modified Einstein equation

$$G_{ab} + \Lambda g_{ab} = 8\pi T_{ab} , \tag{10.2.34}$$

where $\Lambda$ is a constant, called the cosmological constant.9

Now we will show that (10.2.34) indeed admits a static solution. First, we rewrite
(10.2.34) as

$$G_{ab} = 8\pi\,(T_{ab} - \Lambda g_{ab}/8\pi) , \tag{10.2.35}$$

and formally treat $T_{ab} - \Lambda g_{ab}/8\pi$ as a new “energy-momentum tensor”. In this
manner, (10.2.35) is still, formally, Einstein’s equation without the cosmological
constant. For convenience, the actual energy-momentum tensor $T_{ab}$ will be now
denoted by $\bar T_{ab}$, and $T_{ab}$ will denote the new energy-momentum tensor, i.e.,
$\bar T_{ab} - \Lambda g_{ab}/8\pi$. Then, (10.2.35) can now be expressed as

$$G_{ab} = 8\pi T_{ab} = 8\pi\,(\bar T_{ab} - \Lambda g_{ab}/8\pi) . \tag{10.2.36}$$

9 Einstein assumed that $\Lambda$ is very small, so that the $\Lambda$-term is negligible in every other problem
except for cosmology [see Rindler (1982)].

In the original model, there is only matter (dust), but no radiation, i.e., $\bar T_{ab}$ depends
only on $\bar\rho$ but not $\bar p$. Then, $\bar T_{ab} = \bar\rho U_a U_b$, and thus $\bar T_{00} = \bar\rho$, $\bar T_{ij} = 0$. It follows from
(10.2.13) that the $\rho$ and $p$ in the new energy-momentum tensor $T_{ab}$ satisfy

$$\rho = T_{00} = \bar T_{00} - \Lambda g_{00}/8\pi = \bar\rho + \Lambda/8\pi , \tag{10.2.37}$$

$$p\, g_{ij} = T_{ij} = \bar T_{ij} - \Lambda g_{ij}/8\pi = -\Lambda g_{ij}/8\pi , \tag{10.2.38}$$

where (10.2.38) is also equivalent to

$$p = -\frac{\Lambda}{8\pi} . \tag{10.2.38$'$}$$

This indicates that the introduction of the $\Lambda$-term is equivalent to adding “matter” with
a negative pressure $p$ into the universe (as long as $\Lambda > 0$). In this case, the equation
system of (10.2.16′) and (10.2.17′) will admit a solution. Plugging (10.2.38′) into
(10.2.17′) yields

$$k = \Lambda a^2 . \tag{10.2.39}$$

On the other hand, plugging (10.2.37) into (10.2.16′) yields

$$3k = (8\pi\bar\rho + \Lambda)\, a^2 . \tag{10.2.40}$$

Subtracting the two equations above, we have

$$2k = 8\pi\bar\rho a^2 . \tag{10.2.41}$$

The condition $\bar\rho > 0$ leads to $k > 0$, and hence

$$k = +1 . \tag{10.2.42a}$$

Then, (10.2.39) and (10.2.41) give, respectively,

$$\Lambda = a^{-2} , \tag{10.2.42b}$$

$$a^2 = \frac{1}{4\pi\bar\rho} . \tag{10.2.42c}$$

Equation (10.2.42) represents the unique static solution for a dust-only universe with
the $\Lambda$-term added, where $\bar\rho$ is the density of the dust. Equation (10.2.42a) indicates
that the spatial geometry of this solution is spherical, with the corresponding
4-dimensional line element

$$ds^2 = -dt^2 + a^2\,[d\psi^2 + \sin^2\psi\,(d\theta^2 + \sin^2\theta\, d\phi^2)] , \tag{10.2.43}$$

called the metric of Einstein’s static universe, where $a^2 = 1/\Lambda$ is a constant.


Although (10.2.43) describes a static solution, this is not a stable solution, which
will turn into a contracting or an expanding solution once we apply a perturbation to
it. However, since Einstein did not write down the differential equations for a(t), he
did not notice the instability, nor did anyone else until A. S. Eddington in 1930 [see
Ellis (1989)].
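The static solution and its instability can be illustrated with a few lines of arithmetic. Substituting (10.2.37) and (10.2.38′) into (10.2.18) gives $\ddot a/a = -\frac{4\pi}{3}\bar\rho + \frac{\Lambda}{3}$; the sketch below evaluates this for scale factors slightly displaced from the static value, assuming that the dust dilutes as $\bar\rho \propto a^{-3}$ [see (10.2.21)] while $\Lambda$ stays fixed. The dust density value and the function names are hypothetical, and only a homogeneous perturbation of $a$ is considered.

```python
import math

def einstein_static_parameters(rho_bar):
    """Return (Lambda, a) for the static solution, from (10.2.42b) and (10.2.42c)."""
    cosmological_constant = 4.0 * math.pi * rho_bar
    scale_factor = 1.0 / math.sqrt(cosmological_constant)
    return cosmological_constant, scale_factor

def acceleration(a, a_static, rho_static, cosmological_constant):
    # Second Friedmann equation (10.2.18) with rho = rho_bar + Lambda/(8 pi)
    # and p = -Lambda/(8 pi); the dust density is assumed to scale as a^-3.
    rho_bar = rho_static * (a_static / a) ** 3
    return a * (-(4.0 * math.pi / 3.0) * rho_bar + cosmological_constant / 3.0)

rho_static = 0.02                         # hypothetical dust density (geometrized units)
lam, a_static = einstein_static_parameters(rho_static)

print(acceleration(a_static, a_static, rho_static, lam))          # zero at equilibrium
print(acceleration(1.01 * a_static, a_static, rho_static, lam))   # positive: keeps expanding
print(acceleration(0.99 * a_static, a_static, rho_static, lam))   # negative: keeps contracting
```

The sign of $\ddot a$ always pushes the scale factor further from the static value, which is the instability mentioned above.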
Because Einstein firmly believed that the universe is static, he refuted, as a referee,
the paper submitted to Zeitschrift für Physik by Friedmann in 1922, and neglected
Friedmann’s appealing letter to him. In 1923, Einstein felt that Friedmann’s paper
was plausible, after Yuri Krutkov, a friend of Friedmann, explained Friedmann’s
opinions to him. Then, Einstein sent the retraction of his refutation to the journal,
saying that he was convinced that Friedmann’s results are “correct and shed new
light”. Einstein abandoned the cosmological constant in 1931, after Hubble’s con-
firmation that the universe is expanding. It is reported that Einstein referred to the
introduction of the cosmological constant as his “biggest blunder” [Peebles and Ratra
(2003)]. However, the story of the cosmological constant does not end here. Over
the past century, its profound impact on cosmology, and even the entirety of physics,
has developed in many startling ways. We will get back to this in Sect. 10.3.3 and
Chap. 15 in Volume II.

10.3 The Thermal History of Our Universe

10.3.1 A Brief History of the Universe

As we have discussed, the universe is radiation-dominated when the scale factor
$a$ is sufficiently small. The transition to the matter-dominated universe happens at
about $t = 10^{11}$ s, i.e., thousands of years after the big bang. Based on the standard
cosmological model, in this subsection we will briefly introduce the history of our
universe’s evolution up to today. Some basics of high energy physics will be inevitably
involved in the discussion, so it will be helpful if the reader has learned some basic
knowledge before. However, to avoid too much particle physics, thermodynamics and
quantum statistical mechanics, we will only provide a rough introduction [for more
precise, more detailed discussions, see Kolb and Turner (1990); Weinberg (2008)].
Since the universe contains everything in Nature, and does not have an exterior, its
evolution can be regarded as an adiabatic expansion. The early universe is
radiation-dominated, and it is not difficult to derive the relation between the temperature $T$
and the scale factor $a$. It follows from (10.2.22) that the energy density $\rho$ of the
radiation-dominated universe is proportional to $a^{-4}$, and from quantum statistical
mechanics we know that $\rho$ is proportional to $T^4$ for radiation,10 and hence $T \propto a^{-1}$.
On the other hand, the $k$ in (10.2.24) is negligible when $a$ is sufficiently small, and
thus its solution is $a = (2Bt)^{1/2}$, which combined with $T \propto a^{-1}$ yields $T t^{1/2} = \text{constant}$.
The value of this constant in SI units is about $10^{10}$, and hence

$$T = \frac{10^{10}}{\sqrt{t}} \quad (\text{the units for } T \text{ and } t \text{ are K and s, respectively}) . \tag{10.3.1}$$

This is an approximate relation of the temperature T and time t for the early universe
(radiation-dominated), from which we can see that T = ∞ at t = 0 (the big bang).
Therefore, starting from the big bang singularity where the temperature is infinitely
high, the evolution of our universe is a process of adiabatic expansion with the
temperature continually decreasing.
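Relation (10.3.1) is easy to evaluate in both directions. The helper functions below are a small sketch with hypothetical names; they simply encode $T = 10^{10}/\sqrt{t}$ and its inverse, and are meaningful only within the radiation-dominated era in which (10.3.1) was derived.

```python
import math

def temperature_kelvin(t_seconds):
    # Equation (10.3.1): T in kelvin for cosmic time t in seconds.
    return 1.0e10 / math.sqrt(t_seconds)

def time_seconds(temperature_k):
    # Inverse of (10.3.1): cosmic time at which the temperature is T.
    return (1.0e10 / temperature_k) ** 2

for t in (1.0e-4, 1.0, 1.0e2, 1.0e11):
    print(t, temperature_kelvin(t))
```

At $t = 1$ s the formula gives $10^{10}$ K, and at $t = 10^{11}$ s, roughly the end of radiation domination quoted above, it gives a few times $10^4$ K.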
1. The big bang singularity.
The expansion of the universe starts from the big bang singularity (t = 0, T = ∞).
The spacetime singularity is one of the thorniest problems. Many physical quantities
approach infinity as one approaches the singularity, where all the physical laws also
become invalid. Before 1965, most physicists did not believe in the existence
of a spacetime singularity, and tried to put forward various reasons for avoiding singularities.
Making use of global differential geometry, R. Penrose and S. W. Hawking proved,
first individually and then jointly, a series of singularity theorems, which assert that
spacetime singularities (including the collapse of a star in its late stage and the big
bang at the beginning of the universe) are inevitable as long as some reasonable
conditions are satisfied (see Appendix E in Volume II for a qualitative introduction
to singularity theorems). What is notable is that these conditions do not contain any
requirement on symmetry. Subsequently, many relativists had to admit the existence
of singularities, and so a variety of intensive studies regarding singularities sprung up.
However, since it is hard to believe that physical quantities can be infinite, one may
look at singularity theorems from another perspective: rather than prove the existence
of singularities, singularity theorems indicate that classical general relativity fails to be
applicable near a singularity (where the spacetime curvature is very large). As is
well-known, there were two great revolutions of physics that happened in the early
20th century—the creation of relativity and quantum theory. In the perspective of
understanding the spacetime structure and the essence of gravity, general relativity is
undoubtedly a revolutionary theory, while it is “not quite revolutionary” from another
perspective, since it does not obey the fundamental principles of quantum theory.
According to quantum theory, no observable can have a determined value (unless

10 For electromagnetic radiation, it follows from the law of blackbody radiation that $\rho \propto T^4$. If one
considers the contributions from other particles, $\rho$ and $T$ will have the relation $\rho = (\pi^2/30) N_{\rm eff} T^4$,
where $N_{\rm eff}$ is a number determined by the number of types of the particles whose rest energy is far
less than $k_B T$ ($k_B$ is the Boltzmann constant). Thus, $\rho \propto T^4$ only when $N_{\rm eff}$ is a constant.
10.3 The Thermal History of Our Universe 505

the system is in an eigenstate of this observable), and one can only make probabilistic
predictions for the results of a measurement. However, all the observables (e.g., the
metric) in general relativity have determined values (as we describe the history of
a particle using its world line, we have assumed that it has a determined position
at each moment). Nowadays, it has become a consensus that a theory which does not
consider quantum effects is referred to as a classical theory, and thus Einstein’s
general relativity is referred to as classical general relativity.11 Since singularity
theorems indicate that classical general relativity breaks down when the spacetime
curvature is sufficiently large, there should exist a critical time tC > 0 in the very
early universe, where classical general relativity is invalid in the period [0, tC ] and
should be replaced by a brand new theory of quantum gravity. Although people
have been actively searching for this quantum theory of gravity and important progress
keeps being made, a complete theory has not yet been established. Hence, we
still cannot consider the singularity or a region very close to it (within [0, tC ]) and our
discussion can only start from the critical time tC . How do we estimate the value of tC ?
Since this question involves spacetime, gravity and quantum theory, $t_C$ should only
depend on the fundamental constants $c$, $G$ and $\hbar$, and the “unique” quantity with time
dimension constructed from $c$, $G$ and $\hbar$ is the Planck time $t_P \equiv (G\hbar/c^5)^{1/2} \sim 10^{-43}$ s.
Therefore, $t_P$ is taken as the critical time $t_C$, i.e., a rough bound for the region where
classical general relativity is valid is at $t_P$ (see Optional Reading 10.3.1 for details).
We will only discuss the history of the evolution after $t_P \sim 10^{-43}$ s.
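The dimensional-analysis estimate above can be checked numerically. The following is a minimal sketch, assuming standard SI values of $c$, $G$ and $\hbar$ (these numbers are not quoted in the text, and the function names are illustrative only):

```python
import math

# Standard SI values of the constants (assumed here, not quoted in the text).
C = 2.99792458e8         # speed of light, m/s
G = 6.67430e-11          # gravitational constant, m^3 kg^-1 s^-2
HBAR = 1.054571817e-34   # reduced Planck constant, J s

def planck_time():
    """t_P = (G * hbar / c^5)^(1/2), in seconds."""
    return math.sqrt(G * HBAR / C**5)

def planck_length():
    """l_P = (G * hbar / c^3)^(1/2), in metres."""
    return math.sqrt(G * HBAR / C**3)

if __name__ == "__main__":
    print(f"t_P = {planck_time():.3e} s")
    print(f"l_P = {planck_length():.3e} m")
```

Both quantities come out at the orders of magnitude quoted in this section, and they satisfy $l_P = c\,t_P$ by construction.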
[Optional Reading 10.3.1]
It can be said that the spacetime curvature in the period $[0, t_C]$ is so large that classical
general relativity breaks down. However, this statement needs some explaining. First of all,
what is the magnitude of the spacetime curvature? The spacetime curvature is a tensor, whose
magnitude usually refers to a scalar constructed from the curvature tensors (and metric),
such as the scalar curvature $R \equiv g^{ab} R_{ab}$ and the scalar $\mathcal{R} \equiv R^{ab} R_{ab}$. The early universe is
radiation-dominated, and the trace of the energy-momentum tensor of the electromagnetic
radiation (a null electromagnetic field) gives $T = 0$, and so from Einstein’s field equation
we can see that $R = 0$ in this case. Therefore, we may use $\mathcal{R} \equiv R^{ab} R_{ab}$ to represent the
magnitude of the spacetime curvature of the early universe. Second, what value of $\mathcal{R}$ is large
enough so that classical general relativity is invalid? We would like to find a critical value
$\mathcal{R}_C$ such that in a very rough sense we can say that classical general relativity is valid when
$\mathcal{R} < \mathcal{R}_C$ and it is not when $\mathcal{R} > \mathcal{R}_C$. The most solid way to determine this bound is by
a theory of quantum gravity, but we do not have such a theory yet. A concession would
be to obtain some information using perturbation techniques, from which one can get an
approximate order of magnitude of $\mathcal{R}_C$. Another cursory but quite convenient method is
dimensional analysis. The dimension of $\mathcal{R}$ in SI is $L^{-4}$ [which can be derived from (A.7) in
Appendix A], while the “unique” quantity with length dimension constructed from $c$, $G$ and

11 Note that the criterion for “classical physics” has become different from that in the first half of the
20th century. People used to refer to (both special and general) relativity and quantum mechanics
as “modern physics” and the previous physics as “classical physics”. As time went on (especially as
people realized that general relativity has to be combined with quantum theory), the term “classical”
gradually became a synonym of “non-quantum”, and general relativity without considering
quantum effects is referred to as classical general relativity to distinguish it from a theory of
quantum gravity. As this criterion for “classical physics” has become a consensus among physicists
internationally, the previous interpretation of the word “classical” now seems to be too “classical”.
506 10 Cosmology I

$\hbar$ is the Planck length $l_P \equiv (G\hbar/c^3)^{1/2} \sim 10^{-35}$ m. Hence, it is generally accepted that
$\mathcal{R}_C \sim l_P^{-4}$ ($\sim$ means they are of the same order).
In a word, one can roughly regard $\mathcal{R}_C \sim l_P^{-4}$ by means of dimensional analysis. On the
other hand, from dimensional analysis we also have $t_C \sim t_P$. It is natural to ask: if we assume
for now that classical general relativity is applicable, would the value of $\mathcal{R}$ really have the
same magnitude as $\mathcal{R}_C \sim l_P^{-4}$ when the universe evolves to $t_P$? From the expressions of the
Christoffel symbols below (10.2.5) we can find all the nonvanishing components of the Ricci
tensor of the FLRW universe as

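$$R_{00} = -3\ddot{a}/a , \qquad R_{11} = (1 - kr^2)^{-1}(a\ddot{a} + 2\dot{a}^2 + 2k) ,$$
$$R_{22} = r^2 (a\ddot{a} + 2\dot{a}^2 + 2k) , \qquad R_{33} = r^2 \sin^2\theta \, (a\ddot{a} + 2\dot{a}^2 + 2k) ,$$

from which we obtain

$$\mathcal{R} \equiv R^{\mu\nu} R_{\mu\nu} = 9(\ddot{a}/a)^2 + 3a^{-4}(a\ddot{a} + 2\dot{a}^2 + 2k)^2 . \qquad (10.3.2)$$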


Since the very early universe is radiation-dominated, and since we can take $k = 0$ when $a$
is very small, it follows from (10.2.25b) that $a(t) = (2Bt)^{1/2}$. Taking the derivative we find
$-\ddot{a}/a = (\dot{a}/a)^2 = (1/4)t^{-2}$; plugging this into (10.3.2) yields $\mathcal{R} = 0.75\,t^{-4}$. Converting back
to SI (restoring $c$), one finds that the value of $\mathcal{R}$ at $t$ reads $\mathcal{R}(t) = 0.75\,c^{-4} t^{-4}$. Noticing
that $l_P = c\,t_P$, we have
$$\mathcal{R}(t_P) = 0.75\,l_P^{-4} \sim l_P^{-4} .$$
That is, the curvature $\mathcal{R}(t_P)$ at $t_P$ is of the same order as $\mathcal{R}_C \sim l_P^{-4}$, and thus classical general
relativity is not applicable in the period $[0, 10^{-43}\,\mathrm{s}]$.
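The arithmetic leading from (10.3.2) to the coefficient 0.75 can be reproduced with a few lines of code. This is only a sketch in geometrized units ($c = 1$); the function names and the sample values of $t$ and $B$ are illustrative assumptions, and the derivatives of $a(t) = (2Bt)^{1/2}$ are entered analytically.

```python
import math

def ricci_squared(a, adot, addot, k=0.0):
    """R^{mu nu} R_{mu nu} of the FLRW metric, Eq. (10.3.2), with c = 1."""
    spatial = a * addot + 2.0 * adot**2 + 2.0 * k
    return 9.0 * (addot / a)**2 + 3.0 * spatial**2 / a**4

def radiation_scale_factor(t, B=1.0):
    """a(t) = (2 B t)^(1/2) together with its first two time derivatives."""
    a = math.sqrt(2.0 * B * t)
    adot = a / (2.0 * t)           # da/dt = a / (2 t)
    addot = -a / (4.0 * t**2)      # d^2 a/dt^2 = -a / (4 t^2)
    return a, adot, addot

if __name__ == "__main__":
    for t in (0.5, 2.0, 10.0):
        value = ricci_squared(*radiation_scale_factor(t, B=3.7))
        # The product value * t^4 should not depend on t or on B.
        print(t, value * t**4)
```

The printed product is independent of both $t$ and the constant $B$, which is why the estimate at $t_P$ involves only $l_P$.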
[The End of Optional Reading 10.3.1]

2. Thermal equilibrium in the early universe.


Although classical general relativity becomes valid from about $t = 10^{-43}$ s, the
temperature is extremely high for a short period of time right after $t = 10^{-43}$ s, and
the corresponding energies are still too high for high energy physics to tackle. According to the
Standard Model of particle physics, the universe consists of particles and antiparticles
with extremely high energy, including quarks, leptons, gauge bosons (e.g., photons)
that mediate interactions, and the Higgs boson which is associated with a mechanism
that gives masses to other elementary particles.12 The frequent interactions between
these high energy particles make them live in thermal equilibrium, which may be
described as “a pot of thoroughly stirred soup of elementary particles” (the “pot”
refers to the whole space of the universe at a certain time). Take a photon γ as an
example: it could not travel a long path as unimpeded as it could in the present
universe; its mean free time (i.e., the average time between collisions) is very short
due to the frequent collisions with other particles (including scattering, absorption
and emission). Although the universe is also expanding rapidly, the photon would
have collided with other particles numerous times before it “notices” that the universe
has expanded. Besides photons, neutrinos have the same experience in the very early
universe. In a word, although the universe keeps expanding, the rate of the interactions
between particles is greater than the rate of the expansion of the universe during most
of the universe’s history (especially the early universe). To put it intuitively, the speed

12 Besides, there might also be other particles beyond the Standard Model that have not been
discovered yet.

of “stirring” is way faster than the speed of the expansion of the “pot” (the whole
space). Therefore, in most parts of the early universe all kinds of particles can reach
a local thermal equilibrium.
According to quantum statistical physics, the average energy of the particles in
radiation at temperature $T$ is roughly equal to $k_B T$,
where $k_B$ is the Boltzmann constant. This conclusion can also be approximately
applied to matter particles with rest energy far less than $k_B T$, whose speed is close
to the speed of light. Together with radiation particles, they are called relativistic
particles. For example, $k_B T \cong 10$ MeV when $T = 10^{11}$ K, while the rest energy of
an electron $e$ is about 0.5 MeV, and hence an electron is a relativistic particle when
$T = 10^{11}$ K. According to quantum field theory, two photons can be transformed
into a particle-antiparticle pair (“pair production”), and a particle-antiparticle
pair can also be transformed into two photons (“pair annihilation”). Of course, both
of these processes satisfy the energy conservation law. The average energy $k_B T$
of a photon at room temperature is far less than the rest energy of an electron, and thus
the probability of two photons becoming an electron-positron pair ($2\gamma \to e + e^+$) is
almost zero. However, at a high temperature like $T = 10^{11}$ K, the rate of this kind
of “pair production” is very large (basically proportional to the density of photons).
When $e$ and $e^+$ collide, they can also annihilate into two photons ($e + e^+ \to 2\gamma$), and
the rate of the annihilation is proportional to the density of $(e, e^+)$ pairs. Therefore,
when equilibrium is reached, the density of $(e, e^+)$ pairs is roughly equal to the density of photon
pairs whose energy is greater than the rest energy $m_e$ of an electron. By contrast, since
the rest energy of a proton $p$ or a neutron $n$ is about 1840 times the rest energy $m_e$
of an electron, the densities of $(p, \bar{p})$ and $(n, \bar{n})$ pairs are almost zero even at this
high temperature $T = 10^{11}$ K (where $\bar{p}$ and $\bar{n}$ stand for antiproton and antineutron,
respectively).
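The comparison between $k_B T$ and the rest energies used above is easy to tabulate. The sketch below assumes standard values for the Boltzmann constant and for the electron and proton rest energies (the text only quotes them to one significant figure); the helper names are hypothetical.

```python
# Assumed standard values, not quoted to this precision in the text.
K_B_MEV_PER_K = 8.617333262e-11   # Boltzmann constant in MeV per kelvin
M_E_MEV = 0.511                   # electron rest energy in MeV
M_P_MEV = 938.272                 # proton rest energy in MeV

def thermal_energy_mev(temperature_kelvin):
    """Typical particle energy k_B T, in MeV."""
    return K_B_MEV_PER_K * temperature_kelvin

def energy_ratio(rest_energy_mev, temperature_kelvin):
    """k_B T divided by a rest energy; values well above 1 mean 'relativistic'."""
    return thermal_energy_mev(temperature_kelvin) / rest_energy_mev

if __name__ == "__main__":
    for T in (300.0, 1e10, 1e11, 1e13):
        print(f"T = {T:.0e} K: k_B T = {thermal_energy_mev(T):.3e} MeV, "
              f"k_B T/m_e = {energy_ratio(M_E_MEV, T):.3e}, "
              f"k_B T/m_p = {energy_ratio(M_P_MEV, T):.3e}")
```

At $T = 10^{11}$ K the ratio for the electron is well above 1 while the ratio for the proton is well below 1, which is the situation described in the paragraph above.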
3. Asymmetry of matter and antimatter.
When $t = 1$ s and $T = 10^{10}$ K, because $k_B T \gtrsim m_e$ and $k_B T \ll m_p$, there exist plenty
of $(e, e^+)$ pairs (of the same order as the number of $\gamma$) while there are almost no $(p, \bar{p})$ and
$(n, \bar{n})$ pairs. Therefore, the contents of the universe are: a large number of neutrinos $\nu$ and
antineutrinos $\bar{\nu}$, a large number of photons $\gamma$, a large number of $(e, e^+)$ pairs (the number
density of each kind of particle above is basically the same) and a small number
of protons $p$ and neutrons $n$. Earlier than this, such as when $T \gtrsim 10^{13}$ K, since
$k_B T > m_p$, there used to be a large number of $(p, \bar{p})$ and $(n, \bar{n})$ pairs, which vanished due
to annihilation when the temperature decreased to $k_B T < m_p$. Since $p$, $\bar{p}$ and $n$,
$\bar{n}$ annihilate in pairs, why could a small number of $p$ and $n$ still remain?
We know that there must be a small number of $p$ and $n$ because the
matter in the present universe is all composed of $p$ and $n$, while antiparticles in the
present universe are extremely rare. That is to say, there exists a particle-antiparticle
(matter-antimatter) asymmetry in the present universe. If we accept this fact, we
have to admit that besides a large number of $(p, \bar{p})$ and $(n, \bar{n})$ pairs, there should be a small
number of unpaired $p$ and $n$ before $t = 0.01$ s. When $k_B T < m_p$, $p$ and $\bar{p}$, $n$ and $\bar{n}$
annihilate in pairs, with only a small number of $p$ and $n$ (both are baryons) remaining.

It is estimated that $n_b/n_\gamma$, the ratio of the number densities of baryons and photons
in the universe, is only on the order of $10^{-10}$, but it is surely not zero.
If one questions further about the source of this asymmetry of baryons and
antibaryons, then there are only two possible answers: ① The universe prefers
particles over antiparticles from its beginning (which is obviously not quite natural);
② There were the same number of baryons and antibaryons at the beginning of the
universe, and for some reason baryons became favored during its very early
evolution. If one believes that the baryon number must be conserved, then the latter choice
would not be acceptable. Fortunately, there could be ways to bypass this difficulty.
For example, a Grand Unification Theory (GUT) proposed in the 1970s which unifies
the electromagnetic, weak, and strong interactions suggests that the baryon number
may not be conserved at a very high energy scale. It has been shown that the baryon
number not being conserved plus a temporary deviation from thermal equilibrium
in the very early universe may create surplus p and n from the universe which orig-
inally has particle-antiparticle symmetry. Although the present GUT models have
not been supported by experiments as anticipated, it is generally believed that a suc-
cessful Grand Unification Theory will sooner or later resolve the above difficulty in
cosmology.
4. Neutrino decoupling.
When $t = 1$ s and $T = 10^{10}$ K, $k_B T \cong 1$ MeV is still greater than $m_e$, and hence
there still exist plenty of $(e, e^+)$ pairs. However, since the temperature and density have
decreased a lot compared with before, the interaction rate between neutrinos (or
antineutrinos) and other particles is far less than the expansion rate of the universe.
The mean free time of a neutrino is extended significantly, and so it becomes
approximately a free particle that does not interact with other particles, which means it
is no longer in thermal equilibrium with other particles. This is called the decoupling
of neutrinos. The decoupling time and temperature of neutrinos are denoted by $t_{\nu d}$
and $T_{\nu d}$, respectively. Although neutrinos fill up the universe after the decoupling
just like other particles, and keep affecting the evolution of the universe since they
still contribute to the total energy-momentum tensor, they are not correlated with
the other constituents of the universe in any other respect. This huge number of
neutrinos has evolved independently up to today, and they now exist as an independent
particle system whose temperature is about 1.95 K, known as the cosmic neutrino
background. Since the interaction between neutrinos and a detector is extremely
weak, it is almost impossible to observe the cosmic neutrino background directly.
However, indirect evidence has been observed from the fluctuations of the cosmic
microwave background [see Follin et al. (2015)].
5. Primordial nucleosynthesis.
Observations indicate that about 1/4 of the total baryonic mass of the current
universe is helium. Except for primordial nucleosynthesis in the early universe, there is
no other known process that could have created this abundance of helium. (Although
the nuclear reactions inside stars keep producing helium, they only contribute
a small portion of the abundance above.) The temperature involved in primordial
nucleosynthesis is roughly between $10^{10}$ K and (slightly lower than) $10^9$ K. When

the temperature is higher than this range, even if protons and neutrons could
combine into a helium nucleus, it would be shattered by the high energy photons (called
“photofission”). Since the physics in this temperature interval is already well-studied
and has been confirmed in laboratories on the Earth, people are quite confident in the
theory of primordial nucleosynthesis. Because the number density of the
nuclei is relatively low and the rapid expansion of the universe leads to a very short
reaction time (about $10^2$ s), only reactions between two high speed particles can happen in
primordial nucleosynthesis. First, protons and neutrons combine into deuterons, and
the rest of the energy and momentum is carried away by a photon ($p + n \to {}^2\mathrm{H} + \gamma$).
This is followed by a sequence of reactions which produce ${}^3\mathrm{H}$ (triton), ${}^3\mathrm{He}$
and ${}^4\mathrm{He}$, such as
$${}^2\mathrm{H} + n \to {}^3\mathrm{H} + \gamma , \qquad {}^2\mathrm{H} + p \to {}^3\mathrm{He} + \gamma , \qquad {}^2\mathrm{H} + {}^2\mathrm{H} \to {}^3\mathrm{He} + n ,$$
$${}^2\mathrm{H} + {}^3\mathrm{H} \to {}^4\mathrm{He} + n , \qquad {}^3\mathrm{He} + {}^3\mathrm{He} \to {}^4\mathrm{He} + 2p .$$

Since there does not exist a stable nuclide whose mass number is 5, the reaction
chain stops here. As the main product, ${}^4\mathrm{He}$ gradually accumulates, and the nuclear
reaction continues until there are a large enough number of nuclei, which leads
to the production of a tiny amount of ${}^7\mathrm{Li}$. Since there does not exist a stable nuclide
whose mass number is 8, the reaction chain again ends here. The first step that
this whole reaction chain must go through is protons and neutrons combining into
deuterons. The binding energy of a deuteron is much lower than that of a
helium nucleus. A helium nucleus can remain stable when the temperature decreases
to $3 \times 10^9$ K, while at this temperature a deuteron will be broken up right after it
is formed. Therefore, the nucleosynthesis process that is actually meaningful begins
after the “deuteron barrier” is passed when the temperature is slightly lower than
$10^9$ K, and the product is a large amount of ${}^4\mathrm{He}$ and a tiny amount of ${}^2\mathrm{H}$, ${}^3\mathrm{He}$ and ${}^7\mathrm{Li}$
(${}^3\mathrm{H}$ is unstable and will quickly decay to ${}^3\mathrm{He}$). If we take the yield of ${}^4\mathrm{He}$ as the unit,
then the yields of ${}^2\mathrm{H}$ and ${}^3\mathrm{He}$ are about $10^{-5}$, while the yield of ${}^7\mathrm{Li}$ is about $10^{-10}$.
As for the elements heavier than ${}^7\mathrm{Li}$ in today’s universe, they mainly
come from the nuclear reactions in the interior of stars and supernova explosions.
The reason that the reactions inside a star can skip over the elements with $A = 5$
and $A = 8$ and yield heavy elements is that the self-gravity there is so strong that the
density of the star’s core is extremely high, and there is enough reaction time such
that three-particle collisions can happen.
The helium abundance produced by the primordial nucleosynthesis depends closely
on the ratio $n_n/n_p$ of the number densities of neutrons and protons before the
end of the nucleosynthesis (the reason will be seen shortly). This ratio can be derived
from the following discussion. Before the neutrinos decouple, protons and neutrons
can convert mutually by the following weak interaction processes: $p + e \leftrightarrow n + \nu_e$,
$p + \bar{\nu}_e \leftrightarrow n + e^+$. Since the mass of a neutron is slightly greater than the mass of
a proton ($m_n - m_p \cong 2.5 m_e$), it is more difficult for a proton to turn into a neutron
than the reverse. For example, since $m_p + m_e \cong m_n - 1.5 m_e < m_n$, it follows from
the conservation of energy that a proton and an electron at rest cannot even turn into
a neutron at rest, but the reverse process does not have such an issue. Certainly, the

energy of an electron when the temperature is above $10^{10}$ K is far greater than its rest
energy $m_e$, and thus $p + e \to n + \nu_e$ could happen, but nevertheless its probability
is always less than that of the reverse process. Therefore, $n_n/n_p$ should be less than
1 when the forward and reverse reactions reach a statistical equilibrium, and the
quantitative relation is given by the Boltzmann equation
$$\frac{n_n}{n_p} = e^{-\Delta m/(k_B T)} , \qquad (10.3.3)$$

where $\Delta m \equiv m_n - m_p$. When the temperature drops to $T_{\nu d} \cong 10^{10}$ K, the neutrinos
are decoupled and the above weak interaction processes of $n$ and $p$ converting into
each other basically stop; $n_n/n_p$ then almost freezes out at the value $e^{-\Delta m/(k_B T)}$. The
discussion above neglected the spontaneous decay of free neutrons, $n \to p + e + \bar{\nu}_e$,
since its half-life (about 10 min) is far greater than the age of the universe at $T_{\nu d}$
($t_{\nu d} \cong 1$ s). However, the age of the universe when helium is synthesized ($t \cong 10^2$ s)
already takes a considerable portion of the half-life of neutrons, and thus $n_n/n_p$ will be
slightly lower than its freeze-out value $e^{-\Delta m/(k_B T_{\nu d})}$, which is a little below 1/7. Let $N_n$ and
$N_p$ represent the total neutron number and total proton number, and let $\sigma \equiv N_n/N_p$;
then the total nucleon number is $N = N_n + N_p = (\sigma + 1)N_p$. All the neutrons
are combined with the same number of protons and turn into helium, and hence the
nucleon number contained in helium is $N_{\rm He} = 2N_n = 2\sigma N_p$. Therefore, the helium
abundance (measured by mass) produced by the primordial nucleosynthesis is

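$$Y = \frac{N_{\rm He}}{N} = \frac{2\sigma N_p}{(\sigma + 1)N_p} = \frac{2\sigma}{\sigma + 1} ,$$

i.e.,
$$Y = 2\,\frac{n_n}{n_p}\left(1 + \frac{n_n}{n_p}\right)^{-1} . \qquad (10.3.4)$$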

Plugging in $n_n/n_p \cong 1/7$ yields $Y \cong 0.25$. Apart from primordial nucleosynthesis,
the nuclear reactions in the interior of stars also produce ${}^4\mathrm{He}$ (a lot less than the
production of primordial nucleosynthesis though), and thus it is necessary to deduce the
primordial helium abundance (the abundance of helium when primordial
nucleosynthesis is over) from the observed helium abundance. As the accuracy of
measurements has improved over the years, recent estimates [$Y = 0.245 \pm 0.003$ in Zyla
et al. (2020)] match the theoretical value above very well. Although the
abundances of other products (${}^2\mathrm{H}$, ${}^3\mathrm{He}$ and ${}^7\mathrm{Li}$) are very small, they are also
significant for verifying the theory. There is another important physical parameter $\eta$
involved in the quantitative calculation of the abundances of the products of
primordial nucleosynthesis, which is defined as the ratio of the number densities of the baryons
and photons in the universe ($\eta \equiv n_b/n_\gamma$). $\eta^{-1}$ stands for the number of photons per
baryon, which affects the starting time of the nucleosynthesis by affecting the
difficulty of photofission, and thus affects the abundances of the products. The
abundance of ${}^4\mathrm{He}$ depends only weakly on $\eta$, while the abundances of ${}^2\mathrm{H}$, ${}^3\mathrm{He}$ and ${}^7\mathrm{Li}$
are rather sensitive to $\eta$. Calculation shows that as long as one assumes that $\eta$ is in
the range of $5.8 \times 10^{-10} \sim 6.5 \times 10^{-10}$, i.e.,
$$\eta \equiv \frac{n_b}{n_\gamma} \cong (5.8 \sim 6.5) \times 10^{-10} , \qquad (10.3.5)$$

then the theoretical abundances of all four products above agree with their observed
abundances [Zyla et al. (2020)]. This not only provides powerful support for the
theory of nucleosynthesis, but also sets a rather clear (and narrow) possible range for
this key parameter $\eta$, which is another important contribution to cosmology.
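The relations (10.3.3) and (10.3.4) can be turned into a small numerical sketch. The value of $\Delta m$ in MeV and the Boltzmann constant are assumed standard values, and the function names are illustrative. Because the text quotes $T_{\nu d}$ only as an order of magnitude, the Boltzmann factor is used here only to show the trend with temperature, not to predict the freeze-out ratio.

```python
import math

# Assumed standard values; the text only gives Delta m as about 2.5 m_e.
DELTA_M_MEV = 1.293               # m_n - m_p in MeV
K_B_MEV_PER_K = 8.617333262e-11   # Boltzmann constant in MeV per kelvin

def neutron_proton_ratio(temperature_kelvin):
    """Equilibrium n_n/n_p from the Boltzmann factor of Eq. (10.3.3)."""
    return math.exp(-DELTA_M_MEV / (K_B_MEV_PER_K * temperature_kelvin))

def helium_mass_fraction(ratio):
    """Helium abundance Y of Eq. (10.3.4) for a given n_n/n_p."""
    return 2.0 * ratio / (1.0 + ratio)

if __name__ == "__main__":
    # The value n_n/n_p = 1/7 used in the text gives Y = 0.25.
    print(helium_mass_fraction(1.0 / 7.0))
    # A higher freeze-out temperature gives a larger ratio and hence a larger Y.
    for T in (0.8e10, 1.0e10, 1.2e10):
        r = neutron_proton_ratio(T)
        print(f"T = {T:.1e} K: n_n/n_p = {r:.3f}, Y = {helium_mass_fraction(r):.3f}")
```

The monotonic growth of $Y$ with the freeze-out temperature is the property that makes the helium abundance sensitive to the expansion rate of the early universe.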
Another important contribution of the theory of primordial nucleosynthesis is that
it determines the number of neutrino species $N_\nu$ as 3, i.e., it confirms that there are
only 3 types of neutrinos (and thus leptons only have three generations). This would
normally be regarded as a problem of particle physics; the use of cosmology in
this subject dates back to 1976. The situation of high energy physics at that time had the
following features: ① there was already evidence that, besides the first two generations
of leptons $e$ and $\mu$ (and their corresponding neutrinos $\nu_e$ and $\nu_\mu$), there exists a third
generation of leptons (and thus a third type of neutrino); ② the accelerators at that
time could not provide any meaningful restriction on $N_\nu$; ③ many physicists tended
to believe that the value of $N_\nu$ would increase as the energy of the accelerators
increased; ④ very few particle physicists believed that the study of cosmology could
be helpful to particle physics. However, G. Steigman and collaborators blazed a new
trail by pointing out that an increase in the number of neutrino species will lead
to an increase in the abundance of ${}^4\mathrm{He}$ coming from primordial nucleosynthesis,
and thus the observed abundance of ${}^4\mathrm{He}$ should give an upper bound for $N_\nu$. The
basic idea is as follows: since the $k$ in (10.2.16) is negligible when $a$ is small, it
is easy to see from $H \equiv \dot{a}/a$ that $H^2 = 8\pi\rho/3$. More species of neutrinos lead to
a greater $\rho$, which leads to a greater $H$ due to the equation above, namely a faster
expansion of the universe. This would make neutrinos decouple earlier, i.e., $t_{\nu d}$ would
be smaller, and thus the decoupling temperature $T_{\nu d}$ would be greater. It follows from
(10.3.3) that this “freezes out” $n_n/n_p$ at a greater value, and hence the abundance of
${}^4\mathrm{He}$ would be higher. The upper bound they gave in Steigman (1977) was $N_\nu \lesssim 7$.
This article demonstrated the novel insight that “cosmology can provide important
constraints on particle physics, and the universe is an important supplement for high
energy accelerators.” Later on, more and more studies have been carried out along
this direction, which keep tightening the estimated bound on $N_\nu$ [see Steigman
(2012) for a review]. A recent analysis in Cyburt et al. (2016) gives $N_\nu \lesssim 3.2$, which
agrees well with the result $N_\nu = 3$ obtained from the collider experiments by the European
Organization for Nuclear Research (CERN).
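The first link in this chain of reasoning, that more neutrino species imply a faster expansion at a given temperature, can be sketched numerically. The sketch combines $H^2 = 8\pi\rho/3$ with the relation $\rho = (\pi^2/30) N_{\rm eff} T^4$ from footnote 10. The counting $N_{\rm eff} = 2 + (7/8)(4 + 2N_\nu)$ is an assumption taken from standard statistical mechanics (photons, $e$ and $e^+$, and $N_\nu$ neutrino species at a common temperature); it is not given in the text, and the function names are hypothetical.

```python
import math

def n_eff(n_nu):
    """Assumed counting of relativistic degrees of freedom near T ~ 10^10 K:
    2 for photons, plus 7/8 for each fermionic degree of freedom
    (4 for electrons and positrons, 2 per neutrino species)."""
    return 2.0 + (7.0 / 8.0) * (4.0 + 2.0 * n_nu)

def hubble_ratio(n_nu, n_ref=3):
    """H(n_nu) / H(n_ref) at the same temperature, using H^2 proportional to rho
    and rho proportional to N_eff * T^4."""
    return math.sqrt(n_eff(n_nu) / n_eff(n_ref))

if __name__ == "__main__":
    for n_nu in (3, 4, 7):
        print(n_nu, n_eff(n_nu), round(hubble_ratio(n_nu), 4))
```

Under these assumptions each additional species speeds up the expansion by a few percent, which in turn raises $T_{\nu d}$ and the frozen-out $n_n/n_p$ as described above.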
6. Cosmic microwave background radiation.
For a long period of time after primordial nucleosynthesis, nothing significant happens
in the universe until $t \cong 10^{13}$ s $\cong 4 \times 10^5$ years, at which time $T \cong 3000$ K (or
4000 K). At this temperature, nuclei and electrons start to combine into neutral atoms

(before this the electrons still have enough energy to escape from the electromagnetic
binding of a nucleus), and the matter in the universe starts to change quickly from
an ionized state (plasma) to the neutral state. In an ionized state, photons interact
frequently with charged particles (especially free electrons), and thus they are in
thermal equilibrium with the matter particles. However, photons have almost no
interaction with neutral particles, and thus the universe becomes transparent after
the charged particles are combined into neutral particles (the mean free time of a
photon is a lot longer than the present age of the universe). At this stage, photons
are decoupled from the “big family” of the particles in thermal equilibrium and
become an independent system. Before decoupling, these photons were in thermal
equilibrium with the matter particles (similar to the photons in an oven being in
thermal equilibrium with the particles of the oven’s wall), and their energy density
distribution in wavelength follows the blackbody radiation curve, which is
described by Planck’s law:

$$du = \frac{8\pi hc}{\lambda^5}\left(e^{hc/(k_B T\lambda)} - 1\right)^{-1} d\lambda , \qquad (10.3.6)$$

where $du$ stands for the energy per unit volume of the photons whose wavelength is
in the range $(\lambda, \lambda + d\lambda)$, $T$ is the temperature, and $h$ and $k_B$ are the Planck constant
and Boltzmann constant, respectively. Although the photons are no longer in thermal
equilibrium with the matter particles after decoupling, their energy distribution in
wavelength still satisfies Planck’s law, except that the temperature $T$ decreases in inverse
proportion to the scale factor $a$. The reason can be briefly explained as follows: after
the photon decoupling, suppose $a$ is increased by a factor $\alpha$, i.e., $a' = \alpha a$; then the
number of photons per unit volume changes by a factor $\alpha^{-3}$. On the other hand,
the energy of each photon also changes by a factor $\alpha^{-1}$ due to the redshift [see
(10.2.8)]. Therefore, the energy per unit volume of those photons whose wavelength was in the range
$(\lambda, \lambda + d\lambda)$ (when the scale factor was $a$) will decrease to

$$du' = \alpha^{-4} du = \frac{8\pi hc}{\alpha^4 \lambda^5}\left(e^{hc/(k_B T\lambda)} - 1\right)^{-1} d\lambda .$$

Expressed in terms of the new wavelength $\lambda' = \alpha\lambda$, this becomes

$$du' = \frac{8\pi hc}{\lambda'^5}\left(e^{hc/(k_B T'\lambda')} - 1\right)^{-1} d\lambda' , \quad \text{where } T' \equiv \alpha^{-1} T . \qquad (10.3.7)$$
Thus, the distribution of energy density in wavelength when the scale factor increases
to $a'$ can still be described by Planck’s law, which just corresponds to a lower
temperature $T'$. Estimation shows that the temperature of the decoupled photon system
in the present day is $T_0 \sim 3$ K. That is to say, the present universe is filled homogeneously
with a large number of background photons (all the galaxies are “soaked”
in the bath of ubiquitous photons), and the distribution of their energy in wavelength
is described by the blackbody radiation curve at 3 K. The radiation energy is mainly
concentrated in the microwave band (the wavelength of the maximum energy density

is about 0.1 cm), and therefore this is called the cosmic microwave background
radiation (CMB, CMBR).
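The scaling argument behind (10.3.7) and the location of the peak can both be checked numerically. The sketch below assumes SI values of $h$, $c$ and $k_B$; the decoupling temperature 3000 K and the expansion factor 1100 are illustrative inputs chosen so that $T' \approx 2.73$ K, and the brute-force peak search is simply a convenient choice, not a method taken from the text.

```python
import math

# Assumed SI values of the constants.
H = 6.62607015e-34    # Planck constant, J s
C = 2.99792458e8      # speed of light, m/s
K_B = 1.380649e-23    # Boltzmann constant, J/K

def planck_u_lambda(lam, T):
    """du/dlambda of Eq. (10.3.6): energy density per unit wavelength, J m^-4."""
    x = H * C / (K_B * T * lam)
    return 8.0 * math.pi * H * C / lam**5 / math.expm1(x)

def redshifted_u_lambda(lam_new, T, alpha):
    """du'/dlambda' built from the argument in the text: du' = alpha^-4 du,
    with lambda = lambda'/alpha and dlambda = dlambda'/alpha."""
    lam_old = lam_new / alpha
    return alpha ** (-4) * planck_u_lambda(lam_old, T) / alpha

def peak_wavelength(T, lam_min=1e-4, lam_max=1e-2, steps=20000):
    """Wavelength (m) maximizing du/dlambda, found on a uniform grid."""
    best_lam, best_u = lam_min, planck_u_lambda(lam_min, T)
    for i in range(1, steps + 1):
        lam = lam_min + (lam_max - lam_min) * i / steps
        u = planck_u_lambda(lam, T)
        if u > best_u:
            best_lam, best_u = lam, u
    return best_lam

if __name__ == "__main__":
    T_dec, alpha = 3000.0, 1100.0     # illustrative values
    lam = 1.0e-3                      # 1 mm today
    print(redshifted_u_lambda(lam, T_dec, alpha))
    print(planck_u_lambda(lam, T_dec / alpha))   # same number: Planck law at T' = T/alpha
    print(peak_wavelength(2.73))                 # close to 1 mm, i.e. about 0.1 cm
```

The first two printed numbers coincide, which is the content of (10.3.7), and the peak of the 2.73 K curve lands near 0.1 cm as stated above.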
American physicists and radio engineers A. A. Penzias and R. W. Wilson detected
this isotropic radiation accidentally in 1965, and received the 1978 Nobel Prize in
Physics for this discovery. What they detected was in fact the signal at only one
wavelength (7.35 cm) (i.e., only one point on the curve). Assuming that this is blackbody
radiation, the temperature corresponding to the blackbody radiation curve passing
through this point is 3.5 K. American physicist R. H. Dicke and colleagues pointed
out immediately that this is a trace (a “fossil”) of the big bang, which was exactly the
cosmic background radiation they were preparing to search for. However, there were
also a few articles at that time which gave alternative explanations for this signal. In
order to confirm that this is indeed a trace of the big bang, two conditions need to be
satisfied: ① the distribution of the energy spectrum is a blackbody radiation curve;
② the radiation is highly isotropic [the intensity (or the corresponding temperature)
is the same in all directions]. This urged people to measure the other points of the
curve and to test the isotropy of the radiation. Soon (in 1967), it was confirmed
that the anisotropy is no more than 0.1–0.3%, and the results of measuring many
other points with wavelength greater than 0.3 cm all fit the blackbody radiation
curve. The radiation with wavelength less than 0.3 cm can be easily absorbed by
the atmosphere, which could be measured outside the atmosphere by balloons or
satellites. Starting in 1989, the Cosmic Background Explorer (COBE) satellite
measured a wide wave band with high precision and obtained a perfect blackbody
radiation curve. Figure 10.11 (see footnote 13) illustrates the first results published in 1990, which
is regarded as the most perfect blackbody radiation observed by humans in nature.
COBE also presented a more precise result for the anisotropy of the background
radiation. Expanding the temperature $T$ (as a function of the angular coordinates) in
terms of the spherical harmonics, the two lowest-order terms other than the constant term $T_0$
are called the dipole moment and quadrupole moment,
which are the main manifestations of the anisotropy. Let $T_1$ and $T_2$ represent the
amplitudes of the dipole anisotropy and quadrupole anisotropy, respectively; then the
measurements of COBE give $T_1/T_0 \sim 10^{-3}$ and $T_2/T_0 \sim 10^{-5}$. The former can be
reasonably interpreted as the consequence of the small velocity of the Earth relative to
the isotropic reference frame: the Earth orbits around the Sun, the Sun moves relative
to the center of the Milky Way, and the Milky Way also has “peculiar motion” relative
to the isotropic reference frame. By definition, only isotropic observers can obtain
isotropic results from measurements, and thus it certainly makes sense that a small
anisotropy of the background radiation is observed by an observer on the Earth. Analysis
shows that the first-order approximation of this anisotropy manifests exactly as the

13 The luminance $B_\nu$ in this figure refers to the energy per unit frequency transmitted per unit area,
per unit solid angle, per unit time, whose unit is J·s$^{-1}$·m$^{-2}$·sr$^{-1}$·Hz$^{-1}$, where sr stands for steradian.
The luminance corresponding to $u$ in (10.3.6) is not $B_\nu$ but $B_\lambda$, i.e., the energy per unit wavelength
transmitted per unit area, per unit solid angle, per unit time, whose unit is J·s$^{-1}$·m$^{-2}$·sr$^{-1}$·m$^{-1}$. For
the same temperature $T$, the peak frequency (wavelength) of the $B_\nu$-$\nu$ (or $B_\nu$-$\lambda$) curve is not
equal to that of the $B_\lambda$-$\lambda$ (or $B_\lambda$-$\nu$) curve. For $T = 2.73$ K, the peaks of the $B_\nu$-$\nu$ and $B_\lambda$-$\lambda$
curves are approximately 1.6 and 1 mm, respectively.

Fig. 10.11 The cosmic background radiation curve measured by COBE (based on the first results in 1990); the corresponding blackbody temperature is $T \cong 2.735$ K. [Plot of luminance against frequency (GHz, lower axis, 30 to 600) and wavelength (mm, upper axis, 10.00 to 0.50).]

dipole moment. [Intuitively, as the Earth is going across the “ocean” of background
radiation, the radiation in front of it should be stronger than that behind it, which
gives rise to the dipole anisotropy.] Therefore, the anisotropy of about one part per
thousand obtained by COBE (and the previous ground observations) is not only
reasonable, but can also be used conversely to determine the precise velocity of
the Earth relative to the isotropic reference frame (the cosmic rest frame); the
result is about 369 km·s$^{-1}$.¹⁴ After this anisotropy is subtracted out, one finds that the
anisotropy (mainly the quadrupole anisotropy) at the time when photons decoupled is only
about $10^{-5}$. This tiny anisotropy is critical for understanding the formation of
the large scale structure of the universe (e.g., galaxies); see item 7 below. When it was first
discovered by COBE in 1992, it immediately became headline news all over the
world. The leaders of the COBE project, G. F. Smoot and J. C. Mather, were awarded
the 2006 Nobel Prize in Physics for the discoveries of the blackbody spectrum and
the anisotropy of the CMB.
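As a rough numerical check of the dipole interpretation, the sketch below assumes the standard first-order Doppler relation $T_1/T_0 \approx v/c$ (not derived in this section) and uses the quoted speed of 369 km·s$^{-1}$ together with $T_0 = 2.73$ K from footnote 13. The function name is purely illustrative.

```python
# Minimal sketch: dipole amplitude expected from the observer's motion,
# assuming the first-order Doppler relation T1/T0 ~ v/c (an assumption here).
C_KM_S = 299_792.458  # speed of light in km/s


def dipole_ratio(v_km_s: float) -> float:
    """Return the fractional dipole amplitude T1/T0 to first order in v/c."""
    return v_km_s / C_KM_S


ratio = dipole_ratio(369.0)   # speed quoted in the text
t1_mk = ratio * 2.73 * 1e3    # dipole amplitude in millikelvin for T0 = 2.73 K
print(f"T1/T0 = {ratio:.2e}, T1 = {t1_mk:.2f} mK")
```

The ratio comes out at roughly one part per thousand, in line with the COBE value quoted above.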
Besides its intensity represented by the temperature of the blackbody spectrum,
the CMB radiation as electromagnetic radiation also exhibits polarization. The CMB
polarization can be decomposed into two components, dubbed an E-mode and a B-mode
[for the details of the decomposition, the reader may refer to, e.g., Chap. 10
of Dodelson and Schmidt (2020)].¹⁵ E-modes can be produced by the interaction
between photons and free electrons (such as Compton scattering). However, if the
radiation is isotropic, the polarization will be equal in all directions, and the overall
effect is still unpolarized. In fact, the tiny anisotropy of the CMB temperature we
just mentioned plays a crucial role here, which allows the scattering process to

14 This is the average speed of the Earth relative to the isotropic reference frame, which is also the
speed of the Sun relative to the isotropic reference frame. Since the Earth orbits the Sun with a speed of
about 30 km/s, the actual speed of the Earth at each time of the year can be obtained by considering
the correction due to this relative motion between the Earth and the Sun.
15 For the CMB polarization one only considers the polarization patterns on the celestial sphere.

produce polarization. Thus, the spectrum of the E-mode polarization is smaller than
the anisotropy spectrum of the CMB. On the other hand, the spectrum of the B-mode
polarization is even smaller than that of the E-mode polarization. There are two
types of B-mode polarization: the first is caused by the gravitational lensing
of E-modes (gravitational lensing is the effect that, due to the deflection of light by
gravitational fields, massive bodies such as galaxies behave similarly to convex glass
lenses); the second is caused by the primordial gravitational waves produced in
the early universe. The E-mode polarization and the first type of B-mode polarization
have been detected, while the B-modes produced by primordial gravitational waves
have not been found yet. The latter, once detected, will provide a powerful tool for
probing the early universe, and open a new window for gravitational-wave
astronomy (see Sect. 7.9.4).
Since it takes some time for the light emitted by a galaxy to reach the
Earth (see Fig. 10.8), from observations of bright galaxies and quasars one can
obtain information about the universe at times earlier than the present time $t_0$. The CMB data
carry information from far earlier than that (no galaxy had formed yet when photons
decoupled), which is highly valuable for the study of cosmology. The observation
of the CMB is regarded as the most powerful support for the standard model. One
of the drawbacks of a once strong competitor of the standard model, the steady
state model, is that it cannot provide a persuasive explanation for the background
radiation, and hence it has faded from the stage of history since 1965.
7. Structure formation.
The basic premises of the standard model are the large scale spatial homogeneity
and isotropy. On a smaller scale, the universe presents a hierarchical structure: there
exist stars, galaxies, galaxy clusters and superclusters. A generally accepted idea
is that the complicated structure today originates from extremely weak density
fluctuations (also called perturbations) $\delta\rho/\rho$ in the very early universe, where $\rho$ is
the average density, and $\delta\rho$ is the difference between the density at a point and $\rho$.
Gravity has the effect of amplifying the density fluctuation: if $\delta\rho/\rho > 0$ (density
is higher than the average density) somewhere, then the matter there will contract
under the action of gravity, which leads to a higher density fluctuation. J. H. Jeans
established the corresponding theory for static fluids in 1902, and E. M. Lifshitz
proposed the theory for the density fluctuation being amplified in an expanding
universe in 1946. Based on these theories, all kinds of models regarding structure
formation have been put forward. The early models (in the 1970s) assumed that
baryons are the largest contributors to the matter in the universe, which leads to
serious difficulties. Later, after the concept of non-baryonic dark matter was proposed (see
Sect. 10.3.2 and Chap. 15), two theories of structure formation, namely the hot dark
matter model and cold dark matter model, appeared accordingly [see Longair
(2008) for details]. In the hot dark matter model, the formation of structures has
a top-down scenario: the superclusters are formed first, and then they break into
galaxy clusters and galaxies hierarchically. In contrast, the formation of structures
in the cold dark matter model has a bottom-up scenario: the galaxies are formed
first, and then galaxy clusters and superclusters are formed hierarchically. As to the
origin of the primordial perturbation, previously people could only treat it as a
pre-assigned initial condition. Nowadays, as the inflationary model has been generally
accepted (the basic idea is that the very early universe once experienced a dramatically
accelerating exponential expansion in a very short period of time, see Chap. 15 in
Volume II), the primordial perturbation can be completely explained by inflation. The
cold dark matter model has achieved great success and is now the favored model.
More precisely, the cold dark matter model with the inflationary model offering
the “seeds” of the primordial perturbation has become the most widely accepted
theory of structure formation. Some even consider it the fourth cornerstone of
modern cosmology (the first three are the consensual cosmic expansion, primordial
nucleosynthesis and the CMB). However, there are still people who have different
opinions.

Table 10.1 Chronicle of the evolution of the universe

| $t$ | $T$ | $k_B T$ | Main events |
|---|---|---|---|
| 0.01 s | $10^{11}$ K | 10 MeV | A large amount of $\nu$ (and $\bar\nu$), $\gamma$, $(e, e^+)$ and a small amount of p, n are in thermal equilibrium |
| 1 s | $10^{10}$ K | 1 MeV | $\nu$ (and $\bar\nu$) decouple; a large amount of $\gamma$, $(e, e^+)$ and a small amount of p, n are in thermal equilibrium |
| 14 s | $3 \times 10^{9}$ K | 0.3 MeV | $(e, e^+)$ annihilate rapidly |
| >100 s | <$10^{9}$ K | <0.1 MeV | 1. Primordial nucleosynthesis. The products are $^4$He (about 25%), H (about 75%) and a tiny amount of $^2$H, $^3$He, $^7$Li. 2. $(e, e^+)$ are all annihilated; there remains a small amount of electrons for balancing the charge of protons |
| $10^{5}$ years | 3000 K | 0.3 eV | Neutral atoms are synthesized; photons decouple and become the background radiation |
| $10^{9}$ years | | | Structure formation |

At the end of this subsection, to help readers remember, we roughly summarize a
few important periods in the history of the universe’s evolution in Table 10.1.
It should be pointed out that, in the above description of the universe’s evolution,
our understanding of the time after $t = 1$ s is relatively reliable. However, for $t < 1$ s,
we do not have a description of the early universe with such high credibility, since
any “fossil” from that time is subject to undetermined factors.

10.3.2 The Dark Matter Problem

There only exist three possibilities for the RW metric, k = 1, k = 0 and k = −1. As
we have seen in Sect. 10.1.3, the first one is a closed universe, while the latter two are
open universes. Which one does our universe really belong to? Is it closed or open?
The answer of course closely relies on astronomical observations. In this subsection
we will have some theoretical discussion and introduce some observational results.
It follows from $H \equiv \dot a/a$ and (10.2.16) that $H^2 = 8\pi\rho/3 - k/a^2$. Adding back
the physical constants $G$ and $c$ (see Appendix A for the details of adding constants),
we have

$$H^2 = \frac{8\pi G\rho}{3} - \frac{kc^2}{a^2}. \tag{10.3.8}$$

Define the critical density

$$\rho_C := \frac{3H^2}{8\pi G}, \tag{10.3.9}$$

then

$$\rho = \rho_C + \frac{3kc^2}{8\pi G a^2}. \tag{10.3.10}$$
Thus, $k = 0$ corresponds to $\rho = \rho_C$, and $k = \pm 1$ correspond to $\rho \gtrless \rho_C$. That is to
say, if the mass density $\rho$ of the universe is greater than the critical density $\rho_C$, then
it is a closed universe ($k = 1$); otherwise it is an open universe. This conclusion
can be understood intuitively as follows: all kinds of particles fly apart at high
speeds at the big bang, and gradually slow down due to the gravitational effect.
If the gravity is strong enough, their speed will gradually decrease to zero and they will start
to accelerate in the opposite direction, which means they will eventually gather back
together again (corresponding to a universe that first expands and then contracts);
if the gravity is not that strong, although the particles will still keep slowing down,
they will never come to a stop and turn around (corresponding to a universe that
expands forever). This may be likened to a rocket launched from the Earth: it will
eventually fall back if the initial speed is less than some critical
value (the escape velocity), while if the initial speed is large enough it will leave forever
and never come back. The strength of gravity depends on the mass density $\rho$ of the
universe, and thus one can expect that there exists a critical value $\rho_C$, and the universe
is closed if and only if $\rho > \rho_C$. Define the density parameter
$$\Omega := \frac{\rho}{\rho_C}, \tag{10.3.11}$$

then $\Omega$ can be interpreted as the density with $\rho_C$ as the unit, and hence we can say
that the universe is closed if and only if $\Omega > 1$. Notice that $\rho_C$ itself is also a function
of $t$. It follows from (10.3.11) and (10.3.9) that $\Omega$ can be expressed as

$$\Omega = \frac{8\pi G\rho}{3H^2}. \tag{10.3.12}$$
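To give a feeling for the size of the critical density, the following sketch evaluates (10.3.9) and (10.3.11) in SI units. The Hubble parameter value of 67.7 km·s$^{-1}$·Mpc$^{-1}$ is an assumed illustrative input, not a value taken from this section, and the function names are hypothetical.

```python
import math

G_SI = 6.674e-11          # gravitational constant, m^3 kg^-1 s^-2
MPC_IN_M = 3.0857e22      # one megaparsec in metres


def critical_density(h0_km_s_mpc: float) -> float:
    """Critical density 3 H^2 / (8 pi G) in kg/m^3, following (10.3.9)."""
    h0_si = h0_km_s_mpc * 1e3 / MPC_IN_M   # convert H0 to s^-1
    return 3.0 * h0_si ** 2 / (8.0 * math.pi * G_SI)


def density_parameter(rho: float, h0_km_s_mpc: float) -> float:
    """Omega = rho / rho_C, following (10.3.11)."""
    return rho / critical_density(h0_km_s_mpc)


rho_c = critical_density(67.7)   # H0 = 67.7 km/s/Mpc is an assumed illustrative value
print(f"rho_C = {rho_c:.2e} kg/m^3")
```

For this assumed $H_0$ the critical density is of order $10^{-26}$ kg·m$^{-3}$, i.e., a few hydrogen atoms per cubic metre.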
Once the present values of the Hubble parameter $H_0$ and the mass density $\rho_0$ are
measured, it can be determined from (10.3.12) whether our universe
is closed or not. Assume for now that the contents of the universe are present
mainly in the form of galaxies. Suppose the present number density of the galaxies is
$n$, and the average mass of the galaxies is $\bar M$; then $\rho_0 = n\bar M$. Suppose the luminosity
of the universe per unit volume is $\mathcal{L}$ (called the luminosity density), and the
average luminosity of the galaxies is $\bar L$; then $\mathcal{L} = n\bar L$. Plugging this into $\rho_0 = n\bar M$
yields

$$\rho_0 = \frac{\bar M}{\bar L}\,\mathcal{L}, \tag{10.3.13}$$

where $\bar M/\bar L$ is called the average mass-to-light ratio of the galaxies. Let $\rho_{C0}$ and
$\Omega_0$ represent the present values of $\rho_C$ and $\Omega$, respectively; then

$$\Omega_0 = \frac{\rho_0}{\rho_{C0}} = \frac{8\pi G}{3H_0^2}\,\frac{\bar M}{\bar L}\,\mathcal{L}, \tag{10.3.14}$$

where (10.3.9) and (10.3.13) are used in the second equality. There is already a
relatively reliable observational value for $\mathcal{L}$. Plugging the observational values of
$\mathcal{L}$ and $H_0$ into (10.3.14), we can get the relation between $\Omega_0$ and the
average mass-to-light ratio $\bar M/\bar L$. The actual measurement is performed on a galaxy,
and the result is only the mass-to-light ratio $M/L$ for this galaxy; only if the galaxy
is highly representative can we plug $M/L$ into (10.3.14) as $\bar M/\bar L$ and get a relatively
good result. The masses of different galaxies vary enormously (they may differ by
several orders of magnitude), while the differences in their mass-to-light ratios are
much smaller. This is one of the merits of replacing (10.3.12) with (10.3.14) (for
$t = t_0$). Now we introduce the dynamical method (which considers the gravitational
effect of the mass) of measuring the mass of a spiral galaxy (e.g., the Milky Way).
Besides its random motion, a star in a spiral galaxy also undergoes revolution (orbital
motion) around the galactic center with the gravity of the galaxy as the centripetal
force. To simplify the discussion, we assume that the galaxy has spherical symmetry.
From Newton’s theory of gravity we know that

$$v^2(r) = \frac{G M(r)}{r}, \tag{10.3.15}$$
where v(r ) is the speed of the orbital motion of a star at a distance r from the center,
and M(r ) is the mass of the galaxy within the radius r . The curve v(r ) is called
the rotation curve of the galaxy. The rotation curve for many galaxies has been
measured. Let $R$ be the value of $r$ at which the galaxy’s luminosity disappears; then $M(R)$
represents the mass of the luminous matter in the galaxy. Plugging the mass-to-light
ratio measured in this way into (10.3.14) yields the contribution of the luminous
matter to $\Omega_0$:

$$\Omega_0(\text{luminous matter}) < 1\% \ (\text{tends to } 0.5\%). \tag{10.3.16}$$

This indicates that the contribution from all the luminous matter to the mass density
is less than one percent of the critical density. Moreover, $M(R)$ is also far less than
the total mass of the galaxy. If there were no mass outside $r = R$, then it would follow
from (10.3.15) that the curve for $v(r)$ should decrease as $r^{-1/2}$ starting from $r = R$.
However, the rotation curves of plenty of galaxies have the following common
property: they first increase steeply from the galactic center, and then extend almost
horizontally until very far away from $R$, where measurement is no longer possible,¹⁶
as shown in Fig. 10.12. This indicates that there is a spherical “dark halo” outside
the luminous part of a spiral galaxy, formed by non-luminous dark matter, whose
radius is much greater than $R$, and the mass of this dark halo is 3–10 times as much
as the mass of the luminous part. There are also types of galaxies other than
spiral galaxies, e.g., elliptical galaxies. Evidence has indicated that there also exists
a considerable amount of dark matter in elliptical galaxies.

Fig. 10.12 The rotation curve of a galaxy (sketch). [Plot of the measured $v(r)$ versus $r$, with $R$ marked on the $r$ axis.]
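The contrast between a flat rotation curve and the $r^{-1/2}$ falloff can be made concrete with (10.3.15). In the sketch below, the flat speed of 220 km·s$^{-1}$ and the luminous radius of 10 kpc are assumed, purely illustrative inputs (they are not measurements quoted in this section), and the function names are hypothetical.

```python
G_SI = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
KPC_IN_M = 3.0857e19    # one kiloparsec in metres
M_SUN = 1.989e30        # solar mass in kg


def enclosed_mass(v_m_s: float, r_m: float) -> float:
    """Mass inside radius r implied by (10.3.15): M(r) = v^2 r / G."""
    return v_m_s ** 2 * r_m / G_SI


def keplerian_speed(mass_kg: float, r_m: float) -> float:
    """Orbital speed outside all the mass: v = sqrt(G M / r), falling as r^(-1/2)."""
    return (G_SI * mass_kg / r_m) ** 0.5


v_flat = 220e3                      # assumed flat rotation speed, m/s (illustrative)
R = 10 * KPC_IN_M                   # assumed edge of the luminous part (illustrative)
m_lum = enclosed_mass(v_flat, R)    # mass inside R implied by the flat speed

for factor in (1, 2, 4):
    r = factor * R
    m_flat = enclosed_mass(v_flat, r)     # mass needed to keep v(r) flat out to r
    v_kep = keplerian_speed(m_lum, r)     # speed if no mass lay beyond R
    print(f"r = {factor}R: M_flat = {m_flat / M_SUN:.2e} M_sun, "
          f"v_kepler = {v_kep / 1e3:.0f} km/s")
```

A flat curve requires $M(r) \propto r$, so the enclosed mass keeps growing beyond $R$, whereas without extra mass the speed would have halved by $r = 4R$.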
Considering that there exists a large space between galaxies, it is very likely that a
large amount of matter is there. People have also applied a similar dynamical method
for measuring galaxy clusters. [Assuming that the virial theorem holds, there is a
formula similar to (10.3.15).] The result gives

$$\Omega_0(\text{galaxy cluster}) \cong 10\% \sim 30\%. \tag{10.3.17}$$

This confirms that, apart from the galaxies, there is a large amount of dark matter
in a galaxy cluster. Since the above results are based on some hypotheses that are
not completely confirmed yet, and since only about 5% of the galaxies in
the universe belong to big clusters of galaxies, we cannot claim that the $\Omega_0$ of
the universe can be represented by (10.3.17) (although circumstantial evidence has
been found). However, one can at least conclude that the mass of dark matter in the
universe is far greater than that of luminous matter.
If we take (10.3.17) as the contribution of all the matter in the universe to $\Omega_0$, we
would draw the conclusion that the universe is far from being closed. However, the
inflationary model proposed in 1981 (see Chap. 15 in Volume II) suggests that $\Omega_0$
may be very close to (or even equal to) 1; this is supported by some measurements and

16 Each point of the curve is measured from the frequency shift of rays emitted by stars or neutral
gas clouds. These stars and gas clouds serve as test particles. It is hard to find a test particle when
r is much greater than R.

analyses. As the inflationary model is now widely accepted, how can we reconcile
the result $\Omega_0 \cong 1$ with (10.3.17)? Before 1998, in order to avoid the contradiction
with $\Omega_0 \cong 1$, people had to think that the distribution of the galaxies and galaxy
clusters is far from the total matter distribution of the universe: besides the matter
associated with galaxies and galaxy clusters, there might also be about 80% of the
matter that is not clustered, or is even smoothly distributed in the universe. Note that so
far we have only considered the Einstein equation without the $\Lambda$-term. The important
progress on the measurement of the cosmological constant $\Lambda$ in 1998 made people
believe that one should use the Einstein equation with the $\Lambda$-term when discussing
cosmological problems. The key point is that, besides the contribution $\Omega_{M0}$ coming from
matter (including luminous matter and dark matter), $\Omega_0$ also has a contribution $\Omega_{\Lambda 0}$
from the cosmological constant. The contributions $\Omega_{\Lambda 0}$ and $\Omega_{M0}$ are roughly in a
seventy-thirty ratio, and together they give $\Omega_0 \cong 1$. For details, see Sect. 10.3.3.¹⁷

10.3.3 The Cosmological Constant Problem and the $\Lambda$CDM Model

Ever since Einstein introduced the cosmological constant in 1917, the status of $\Lambda$ has
experienced several ups and downs. Although Einstein himself abandoned $\Lambda$ after
1923, it was still valued by many people until the 1950s. One of the reasons is that
the early measurement of $H_0$ by Hubble was excessively large, and the existence of
a positive $\Lambda$ could avoid the age of the universe being too small. Here is a qualitative
explanation. From (10.2.35) we can see that the existence of the cosmological
constant is equivalent to adding an “energy-momentum tensor” $-\Lambda g_{ab}/8\pi$ to the
universe. Comparing this with the energy-momentum tensor of a perfect fluid

$$T_{ab} = (\rho + p)U_a U_b + p g_{ab},$$

we can see that the $\Lambda$-term can be considered as a “perfect fluid” with the equation
of state $-\rho = p = -\Lambda/8\pi$. If this is the only contributor, then (10.2.18) becomes
$3\ddot a = \Lambda a$, and for $\Lambda > 0$ we have $\ddot a > 0$. Thus, contrary to the matter field, a positive
cosmological constant $\Lambda$ provides a repulsive force, unlike the usual attractive
gravitational force, and makes the universe undergo an accelerating expansion. In the
early universe, the energy density $\rho$ of the radiation and matter is very large, and its
gravity is stronger than the repulsive effect, which leads to a decelerating expansion.
Then, $\rho$ decreases as the universe expands, and the expansion will have a constant
rate when $\rho$ is small enough that the gravity counterbalances the repulsive force (note

17 The important discovery in 1998 is that the present universe is experiencing an accelerating
expansion. People once regarded the $\Lambda$-term as the cause of this accelerating expansion. However,
difficulties still exist. The mechanism of the accelerating expansion of the universe has become
one of the biggest puzzles in cosmology or even in fundamental physics, called the “dark energy”
problem; see Chap. 15 in Volume II.

Fig. 10.13 The curve of $a(t)$ in the model with $\Lambda > 0$. The age of the universe is $t_0 > H_0^{-1}$. [Plot of $a(t)$ versus $t$, with $H_0^{-1}$ and $t_0$ marked on the $t$ axis.]

that $\Lambda$ is fixed). As $\rho$ keeps decreasing, the universe will turn to an accelerating
expansion when the repulsive force becomes stronger than gravity. By choosing a suitable
model, one can obtain the result that the present universe is undergoing an accelerating
expansion, and so the measured $H_0^{-1}$ is less than $t_0$ instead of greater than $t_0$ (see
Fig. 10.13). So the “age paradox” of the universe can be resolved or at least relieved.
However, the situation turned around in the 1950s. On the one hand, newer measurements
indicated that the value of $H_0$ is about 1/8 of that measured by Hubble. On the
other hand, the development of the modern theory of stellar evolution also made the
age of stars a lot smaller than the estimated values in the 1930s. As a consequence,
the “age paradox” disappeared, and $\Lambda$ became unnecessary again. Nevertheless, after
having three ups and downs, nowadays the necessity of $\Lambda$ has revived once again.
The cosmological constant has influenced not only cosmology but also many other
areas of physics, and its significance has been generally recognized. However, people
are still facing difficulties regarding the cosmological constant. One aspect of
it is related to the vacuum energy in quantum field theory, which leads to the cosmological
constant problem for physicists. Another aspect is from the perspective
of astronomers. Now we will introduce the cosmological constant problem for
astronomers; the cosmological constant problem for physicists will be introduced in
Chap. 15.
A major concern of astronomers is whether or not a nonzero $\Lambda$ can be obtained
from the observations. The existence of $\Lambda$ affects the evolution of the universe.
After the $\Lambda$-term is added into Einstein’s equation, (10.2.16) and (10.2.17) should
be modified as follows:

$$\frac{3(\dot a^2 + k)}{a^2} = 8\pi\rho + \Lambda, \tag{10.3.18}$$

$$\frac{2\ddot a}{a} + \frac{\dot a^2 + k}{a^2} = -8\pi p + \Lambda. \tag{10.3.19}$$

Applying (10.3.18) to the present time $t_0$ yields

$$H_0^2 = \frac{8\pi\rho_0}{3} + \frac{\Lambda}{3} - \frac{k}{a_0^2}. \tag{10.3.20}$$

Following the definition of $\Omega_0$, one can define

$$\Omega_{\Lambda 0} := \frac{\Lambda}{3H_0^2}, \tag{10.3.21}$$

then (10.3.20) can be written as

$$1 = \Omega_{M0} + \Omega_{\Lambda 0} - \frac{k}{a_0^2 H_0^2}, \tag{10.3.22}$$

where $\Omega_{M0}$ stands for the contribution from matter to $\Omega_0$ (note that the present contribution
of radiation is negligible). Through observation, the following questions should be
answered: Do we need a nonzero $\Omega_{\Lambda 0}$ to assure that the above equation holds? If
so, what is the value of $\Omega_{\Lambda 0}$? This may be referred to as the cosmological constant
problem for astronomers.
The speed $\dot a$ and the acceleration $\ddot a$ of the universe’s evolution depend on the
overall effect of $\Omega_M$, representing the gravity, and $\Omega_\Lambda$, representing the repulsive
force. Astronomers usually use a dimensionless quantity to represent the deceleration
$-\ddot a$; what they care about is the present value of the (dimensionless) deceleration
parameter defined as follows:

$$q_0 := -\left.\frac{a}{\dot a^2}\,\ddot a\right|_{t_0}. \tag{10.3.23}$$

Considering that in the present day we have $\rho + 3p \cong \rho$, we can plug the present
values of (10.3.18) and (10.3.19) (taking $p = 0$) into (10.3.23) and obtain

$$q_0 = \frac{1}{2}\Omega_{M0} - \Omega_{\Lambda 0}. \tag{10.3.24}$$
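The step from (10.3.18) and (10.3.19) to (10.3.24) can be spelled out as follows. This is only a sketch: it uses the two modified equations with $p = 0$, the definition (10.3.21), and the geometrized form $\Omega_{M0} = 8\pi\rho_0/(3H_0^2)$ of (10.3.12).

```latex
\begin{align*}
\frac{2\ddot a}{a} &= \Lambda - \frac{\dot a^2 + k}{a^2}
  = \Lambda - \frac{8\pi\rho + \Lambda}{3}
  = \frac{2\Lambda}{3} - \frac{8\pi\rho}{3}, \\
q_0 &= -\left.\frac{\ddot a}{a}\,\frac{a^2}{\dot a^2}\right|_{t_0}
  = \frac{1}{H_0^2}\left(\frac{4\pi\rho_0}{3} - \frac{\Lambda}{3}\right)
  = \frac{1}{2}\,\Omega_{M0} - \Omega_{\Lambda 0}.
\end{align*}
```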
Equation (10.3.24) intuitively reflects the fact that “$\Omega_{M0}$ leads to a deceleration
and a positive $\Omega_{\Lambda 0}$ leads to an acceleration”. The direct measurement of $q_0$ has
been going on for decades. One of the main difficulties is how to choose a suitable
object to measure (a distance indicator). The clustered galaxies used to be taken as
the measured objects, but they have a shortcoming: their own evolution brings
undetermined factors into the measurements. What we need is a distance indicator
that is not sensitive to evolution. Later, people found that Type Ia (also denoted
by 1a) supernovae can serve as ideal distance indicators, and the measurements of
them have become an active subject. A large number of observational results on high
redshift Type Ia supernovae published since 1998 [e.g., Riess et al. (1998); Perlmutter
et al. (1999)] have attracted huge attention internationally. These results indicate with a
high confidence level that: ① the cosmological constant is nonzero, and is positive; ②
unlike what people used to think, the present universe is experiencing an accelerating
expansion (the effect of $\Omega_{\Lambda 0}$ exceeds the effect of $\Omega_{M0}/2$, which leads to $q_0 < 0$).
Furthermore, the combination of these results and the observational results of the
anisotropy of the CMB provides the following quantitative result [Aghanim et al.
(2020)]:

$$\Omega_{M0} = 0.311 \pm 0.006, \qquad \Omega_{\Lambda 0} = 0.689 \pm 0.006. \tag{10.3.25}$$

This indicates that: ① $\Omega_{M0} + \Omega_{\Lambda 0} \cong 1$, and thus [according to (10.3.22)] the curvature term $k/(a_0^2 H_0^2) \cong 0$,
i.e., the present universe is very close to flat, which agrees with the prediction of the
inflationary model introduced later in Chap. 15; ② $\Omega_{\Lambda 0}$ is not only far from negligible,
but also dominates the contribution to $\Omega_0$. The ratio of $\Omega_{\Lambda 0}$ to $\Omega_{M0}$ is about seventy-thirty.
The accelerating expansion of the universe is regarded as one of the most
groundbreaking discoveries in the 20th century. S. Perlmutter, B. P. Schmidt and
A. G. Riess were awarded the 2011 Nobel Prize in Physics for this discovery. Further
interpretation of this accelerating expansion leads to the topic of “dark energy”,
which will be introduced in Chap. 15.
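The two conclusions drawn from (10.3.25) can be checked with a few lines of arithmetic. The sketch below simply evaluates (10.3.24) and the rearranged form of (10.3.22) at the central values; the function names are illustrative, and the quoted uncertainties are ignored.

```python
def deceleration_parameter(omega_m0: float, omega_lambda0: float) -> float:
    """q0 = Omega_M0 / 2 - Omega_Lambda0, following (10.3.24)."""
    return 0.5 * omega_m0 - omega_lambda0


def curvature_term(omega_m0: float, omega_lambda0: float) -> float:
    """k / (a0^2 H0^2) = Omega_M0 + Omega_Lambda0 - 1, rearranged from (10.3.22)."""
    return omega_m0 + omega_lambda0 - 1.0


omega_m0, omega_lambda0 = 0.311, 0.689   # central values from (10.3.25)
q0 = deceleration_parameter(omega_m0, omega_lambda0)
curv = curvature_term(omega_m0, omega_lambda0)
print(f"q0 = {q0:.4f}, curvature term = {curv:.3f}")
```

With these central values $q_0 \approx -0.53$, which is negative and hence corresponds to an accelerating expansion, while the curvature term vanishes to the quoted precision. For comparison, a matter-only model with $\Omega_{M0} = 1$ would give $q_0 = 0.5$.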
The above result also has support from another aspect: although the cold dark
matter (CDM) model is the most successful model in the theory of structure formation,
the CDM model based on $\Omega_{M0} = 1$ cannot fit the observations. In contrast, if
a positive $\Lambda$ is included in the theory with $\Omega_{M0} \cong 0.3$ and $\Omega_{\Lambda 0} \cong 0.7$, the resulting
model, dubbed the $\Lambda$CDM model, fits the observational results very well.
As a relatively simple model developed based on the FLRW model, the $\Lambda$CDM
model is also often called the standard cosmological model or the concordance
cosmological model in modern literature due to its success in explaining the observational
results. However, this is not the end of the story, as the standard model is
still facing challenges and can be further extended. We will come back to this in
Chap. 15.

Exercises

~10.1. Verify that the curvature tensor $^{(3)}R_{abc}{}^{d}$ of the metric in (10.1.12) satisfies
$^{(3)}R_{ab}{}^{cd} = 2\bar R^{-2}\delta_a{}^{[c}\delta_b{}^{d]}$.
10.2. Show that the world line of an isotropic observer is a geodesic. Hint: from
the expressions for the Christoffel symbols below (10.2.5) and (5.7.2), this
is almost obvious.
10.3. Derive the formula (10.2.8) for cosmological redshift from the following
steps:
(a) Show that any null geodesic $\eta(\beta)$ (where $\beta$ is an affine parameter) has
$d\omega/d\beta = -K^a K^b \nabla_a Z_b$, where

$$K^a \equiv (\partial/\partial\beta)^a, \quad Z^a \equiv (\partial/\partial t)^a, \quad \omega \equiv -g_{ab} Z^a K^b.$$

(b) Show that $\nabla_a Z_b = (\dot a/a) h_{ab}$, where $h_{ab}$ is the metric on the surface of
homogeneity induced by $g_{ab}$, and $\dot a \equiv da/dt$.
Hint: first show that $\nabla_a Z_b$ is a spatial tensor field, i.e., $Z^a \nabla_a Z_b = 0 = Z^b \nabla_a Z_b$,
and then show that the results of applying both sides of $\nabla_a Z_b = (\dot a/a) h_{ab}$
on $(\partial/\partial x^i)^a (\partial/\partial x^j)^b$ ($i, j = 1, 2, 3$) are the same.
(c) Using the results of (a) and (b), derive that $d\omega/\omega = -da/a$, which gives
(10.2.8).
10.4. The present age of the universe is the time it takes for the evolution from
a = 0 to a0 ≡ a(t0 ). Given any value of a, one can talk about the time it
takes for the scale factor of the universe to evolve to this value, which is
called the age of the universe corresponding to this value of a. Therefore,
the age t can be regarded as a function of a.
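(a) Starting from (10.2.30) and (10.2.26), show that the age function of
a matter-dominated universe with $\Lambda = 0$ is given by the following three
equations:

for $\Omega_0 = 1$,
$$t = \frac{2}{3} H_0^{-1} \left(\frac{a}{a_0}\right)^{3/2},$$
for $\Omega_0 > 1$,
$$t = H_0^{-1} \left\{ \frac{\Omega_0}{2(\Omega_0 - 1)^{3/2}} \cos^{-1}\left[1 - 2(1 - \Omega_0^{-1})\frac{a}{a_0}\right] - \frac{1}{\Omega_0 - 1}\left[\Omega_0 \frac{a}{a_0} - (\Omega_0 - 1)\left(\frac{a}{a_0}\right)^2\right]^{1/2} \right\},$$
for $\Omega_0 < 1$,
$$t = H_0^{-1} \left\{ \frac{-\Omega_0}{2(1 - \Omega_0)^{3/2}} \cosh^{-1}\left[1 + 2(\Omega_0^{-1} - 1)\frac{a}{a_0}\right] + \frac{1}{1 - \Omega_0}\left[\Omega_0 \frac{a}{a_0} + (1 - \Omega_0)\left(\frac{a}{a_0}\right)^2\right]^{1/2} \right\}.$$

(b) Derive the expressions for the present age $t_0$ of the universe in the cases
$\Omega_0 = 1$, $\Omega_0 > 1$, $\Omega_0 < 1$ from the above three equations.
~10.5. Show that the Einstein equation with the $\Lambda$-term does not admit a solution
having a flat metric even if there is no matter field ($T_{ab} = 0$). Hint: find the
relation of $R$ and $T$ from the Einstein equation with the $\Lambda$-term, so that the
$R$ in the equation can be eliminated. Then, it is easy to see that $R_{ab}$ cannot
vanish when $T_{ab} = 0$.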
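As a numerical sanity check on the three age formulas of Exercise 10.4 (not a solution of the exercise), the sketch below compares them with a direct integration. It assumes the age integral $H_0 t = \int_0^{a/a_0} dx / \sqrt{\Omega_0/x - (\Omega_0 - 1)}$ for a matter-dominated universe with $\Lambda = 0$; whether this is exactly the route through (10.2.30) and (10.2.26) intended by the exercise is an assumption, and the function names are hypothetical.

```python
import math


def age_closed_form(x: float, omega0: float) -> float:
    """Dimensionless age H0*t at x = a/a0 from the three formulas of Exercise 10.4."""
    if abs(omega0 - 1.0) < 1e-12:
        return (2.0 / 3.0) * x ** 1.5
    if omega0 > 1.0:
        pref = omega0 / (2.0 * (omega0 - 1.0) ** 1.5)
        angle = math.acos(1.0 - 2.0 * (1.0 - 1.0 / omega0) * x)
        root = math.sqrt(omega0 * x - (omega0 - 1.0) * x * x)
        return pref * angle - root / (omega0 - 1.0)
    pref = omega0 / (2.0 * (1.0 - omega0) ** 1.5)
    angle = math.acosh(1.0 + 2.0 * (1.0 / omega0 - 1.0) * x)
    root = math.sqrt(omega0 * x + (1.0 - omega0) * x * x)
    return -pref * angle + root / (1.0 - omega0)


def age_numeric(x: float, omega0: float, n: int = 2000) -> float:
    """Simpson integration of the assumed age integral, using x' = s^2 to smooth the integrand."""
    def integrand(s: float) -> float:
        return 2.0 * s * s / math.sqrt(omega0 - (omega0 - 1.0) * s * s)

    upper = math.sqrt(x)
    h = upper / n                      # n must be even for Simpson's rule
    total = integrand(0.0) + integrand(upper)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3.0


for omega0 in (0.3, 1.0, 2.0):
    print(omega0, age_closed_form(1.0, omega0), age_numeric(1.0, omega0))
```

Setting $x = 1$ gives the present age in units of $H_0^{-1}$; for $\Omega_0 = 1$ this reduces to $2/3$.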

References

Aghanim, N. et al. (2020), ‘Planck 2018 results. VI. Cosmological parameters’, Astron. Astrophys.
641, A6. arXiv:1807.06209.
Cyburt, R. H., Fields, B. D., Olive, K. A. and Yeh, T.-H. (2016), ‘Big bang nucleosynthesis: 2015’,
Rev. Mod. Phys. 88, 015004. arXiv:1505.01076.
Dodelson, S. and Schmidt, F. (2020), Modern Cosmology, Academic Press, London.
Ellis, G. F. R. (1989), The expanding universe: a history of cosmology from 1917 to 1960, in
D. Howard and J. Stachel, eds, ‘Einstein and the History of General Relativity’, Birkhäuser,
Boston, pp. 367–431.
Follin, B., Knox, L., Millea, M. and Pan, Z. (2015), ‘First detection of the acoustic oscillation
phase shift expected from the cosmic neutrino background’, Phys. Rev. Lett. 115, 091301.
arXiv:1503.07863.
Hicks, N. J. (1965), Notes on Differential Geometry, Van Nostrand, Princeton.
Kolb, E. W. and Turner, M. S. (1990), The Early Universe, Addison-Wesley Publishing Company,
Redwood City.
Longair, M. S. (2008), Galaxy Formation, Springer-Verlag, Berlin.
Peebles, P. J. E. (1993), Principles of Physical Cosmology, Princeton Press, Princeton.
Peebles, P. J. E. and Ratra, B. (2003), ‘The cosmological constant and dark energy’, Rev. Mod. Phys.
75, 559–606. arXiv:astro-ph/0207347.
Perlmutter, S. et al. (1999), ‘Measurements of Ω and Λ from 42 high redshift supernovae’, Astrophys.
J. 517, 565–586. arXiv:astro-ph/9812133.
Riess, A. G. et al. (1998), ‘Observational evidence from supernovae for an accelerating universe
and a cosmological constant’, Astron. J. 116, 1009–1038. arXiv:astro-ph/9805201.
Riess, A. G. et al. (2022), ‘A comprehensive measurement of the local value of the Hubble constant
with 1 km s$^{-1}$ Mpc$^{-1}$ uncertainty from the Hubble space telescope and the SH0ES team’,
Astrophys. J. Lett. 934(1), L7. arXiv:2112.04510.
Rindler, W. (1982), Introduction to Special Relativity, Clarendon Press, Oxford.
Steigman, G., Schramm, D. N. and Gunn, J. E. (1977), ‘Cosmological limits to the number of
massive leptons’, Phys. Lett. B 66, 202–204.
Steigman, G. (2012), ‘Neutrinos and big bang nucleosynthesis’, Adv. High Energy Phys.
2012, 268321. arXiv:1208.0032.
Di Valentino, E., Mena, O., Pan, S., Visinelli, L., Yang, W., Melchiorri, A., Mota, D. F., Riess, A. G.
and Silk, J. (2021), ‘In the realm of the Hubble tension—a review of solutions’, Class. Quant.
Grav. 38(15), 153001. arXiv:2103.01183.
Wald, R. M. (1984), General Relativity, The University of Chicago Press, Chicago.
Weinberg, S. (2008), Cosmology, Oxford University Press, Oxford.
Zyla, P. A. et al. (2020), ‘Review of particle physics’, PTEP 2020(8), 083C01.
Appendix A
The Conversion Between Geometrized and
Nongeometrized Unit Systems

When discussing systems of units, one should pay attention to the distinction between
a quantity and a number. Besides quantity equations, what is more commonly used
are numerical-value equations. The form of a numerical-value equation depends on
the system of units, and thus when memorizing a physical formula we should also
remember in which system of units it holds. Since the speed of light in vacuum and
the gravitational constant are frequently involved in relativity, setting their numerical
values to 1 (i.e., c = G = 1) will simplify the equations a lot, and the corresponding
system of units is called the geometrized unit system. However, geometrized units
are inconvenient for calculating the numerical values of physical quantities. Now we
will introduce the conversion of physical equations between systems of geometrized
units and non-geometrized units (e.g., SI).
To avoid confusion, we will use bold and regular letters to represent quantities
and numbers, respectively (only in this appendix). A non-geometrized unit system
usually takes the time T , length L and mass M as the base quantities in mechanics.
In the geometrized unit system, since c = G = 1, only one of these three quantities
can be chosen arbitrarily, and hence we can say that there is only one base quantity.
For instance, one can choose time as the base quantity and choose s (second) as its
unit (base unit). However, the essence of c = 1 is to take the speed of light as a unit
of speed, and so one can consider the speed V as a base quantity in the geometrized
unit system, and the speed of light is a base unit. Similarly, G = 1 implies that
the gravitational constant G is also a base quantity in the geometrized unit system.
Therefore, one can also say that there are three base quantities, i.e., T , V and G. In
fact, the number of base quantities in the same system of units is flexible, and one
can choose them according to the specific context. Suppose A is an arbitrary quantity
whose numerical value in the International System of Units (SI) and the geometrized
unit system are A and A , respectively, then their ratio

A
χ≡ (A.1)
A

© Science Press 2023 527


C. Liang and B. Zhou, Differential Geometry and General Relativity,
Graduate Texts in Physics, https://doi.org/10.1007/978-981-99-0022-0

is called the conversion factor of A between the two systems. The reason that χ is
not equal to 1 is that the units of T , L and M are different in the two systems. The
units of T, L and M in SI are the s, m and kg, respectively. In the geometrized unit
system, the only certain thing is that c = G = 1, while the units of T , L and M are
somewhat flexible. For the convenience of comparing the two systems, we stipulate
the time unit in the geometrized system to also be s. Under this stipulation, one
can determine the geometrized units of L and M using c = G = 1, and there is no
longer any flexibility (see Optional Reading A.1). According to dimensional analysis,
the relation between a derived unit and the base units is given by the dimensional
equation:
\[ [A] = [T]^{\tau} [L]^{\lambda} [M]^{\mu} . \tag{A.2} \]

When we only care about the conversion between the geometrized unit system and
SI (or the Gaussian unit system), the factor $[T]^{\tau}$ in this equation can be ignored since the
time unit is the same in the two systems:

\[ [A] = [L]^{\lambda} [M]^{\mu} . \tag{A.3} \]

What the dimensional equation describes is how a derived unit changes with a change
of the base units. For instance, once we treat the [L] and [M] in the above equation as
multiples of the units of the base quantities L and M, respectively, then [A] represents
the corresponding multiple of the unit of the derived quantity A. In this interpretation,
all of [A], [L] and [M] represent numbers, and (A.3) should be interpreted as a
numerical-value equation. Changes of the units of L and M lead to corresponding
changes of the units of the velocity V and the gravitational constant G; their relations
obey

\[ [V] = [L] , \qquad [G] = [L]^{3} [M]^{-1} . \tag{A.4} \]

Combining the equation above with (A.3) yields

\[ [A] = [V]^{\lambda+3\mu} [G]^{-\mu} . \tag{A.5} \]

Suppose the multiples of the units of L and M when we turn from SI to the geometrized
system are [L] and [M], respectively; then the multiples for V and G are the [V] and [G]
in (A.4), and the multiple for A is the [A] in (A.5). Comparing with (A.1), we can see
that χ = [A]. The numerical values of the speed of light and the gravitational constant in
SI are c and G, while both are 1 in the geometrized system, and hence [V] = 1/c,
[G] = 1/G. Plugging these into (A.5) yields $[A] = c^{-\lambda-3\mu} G^{\mu}$. Therefore,

\[ \chi = c^{-\lambda-3\mu} G^{\mu} . \tag{A.6} \]

Equations (A.1) and (A.6) indicate that to find the numerical value $A$ of a quantity in SI from its
value $\hat{A}$ in the geometrized system, we only have to know the dimensional exponents
λ and μ of A with respect to the base quantities L and M, which can be easily derived
or looked up.
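
The recipe contained in (A.1) and (A.6) can be sketched numerically. The Python snippet below is only an illustration under stated assumptions: the function names `chi` and `geometrized_to_si` are hypothetical, the constants are rounded SI values chosen for this sketch, and the time unit is the second in both systems, as stipulated above.

```python
# Rounded SI numerical values (assumptions of this sketch).
C = 2.998e8      # speed of light, m/s
G = 6.674e-11    # gravitational constant, m^3 kg^-1 s^-2


def chi(lam: float, mu: float) -> float:
    """Conversion factor (A.6): chi = c**(-lam - 3*mu) * G**mu.

    lam and mu are the dimensional exponents of the quantity with respect
    to length and mass; the time exponent drops out because the time unit
    is the second in both systems.
    """
    return C ** (-lam - 3 * mu) * G ** mu


def geometrized_to_si(value_geom: float, lam: float, mu: float) -> float:
    """Invert (A.1): A_SI = A_geom / chi."""
    return value_geom / chi(lam, mu)


# A mass has lam = 0, mu = 1. Taking one solar mass as 1.989e30 kg
# (an assumed rounded value), its geometrized value is a time in seconds.
m_sun_geom = chi(0, 1) * 1.989e30
print(m_sun_geom)                            # of order 5e-6 (seconds)
print(geometrized_to_si(m_sun_geom, 0, 1))   # back to about 1.989e30 (kg)
```

Keeping the exponents λ and μ as explicit arguments mirrors the text: once they are read off from the dimensional equation, the conversion is purely mechanical.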

Example 1 Find the expression of the Schwarzschild radius in SI from its expression
$\hat{r}_S = 2\hat{M}$ in the geometrized system.

Solution Suppose A is a quantity whose numerical value in SI is $A \equiv r_S/M$;
then its numerical value in the geometrized system is $\hat{A} \equiv \hat{r}_S/\hat{M} = 2$. It follows
from $[A] = [L][M]^{-1}$ that λ = 1, μ = −1, and it follows from (A.6) that
$\chi = c^{2} G^{-1}$. Then, from $\chi \equiv \hat{A}/A$ we have $\hat{A} = c^{2} G^{-1} A$, and hence
$r_S/M \equiv A = c^{-2} G \hat{A} = 2c^{-2} G$. Therefore, the expression for the Schwarzschild radius in SI is
$r_S = 2GM/c^{2}$. □

Example 2 Convert the form of the timelike normalization condition $\hat{Z}^a \hat{Z}_a = -1$
in the geometrized system to its form in SI.

Solution $\hat{Z}^a \hat{Z}_a = -1$ is equivalent to
$\hat{g}_{\mu\nu} (d\hat{x}^{\mu}/d\hat{\tau})(d\hat{x}^{\nu}/d\hat{\tau}) = -1$. Choose $x^{\mu}$
and $x^{\nu}$ as length coordinates; then it follows from $ds^2 = g_{\mu\nu} dx^{\mu} dx^{\nu}$ that $[g_{\mu\nu}] = 1$ and
$[dx^{\mu}/d\tau] = [T]^{-1}[L]$, and hence for the quantity $dx^{\mu}/d\tau$ we have λ = 1, μ = 0,
$\chi = c^{-1}$ and

\[ \hat{g}_{\mu\nu} (d\hat{x}^{\mu}/d\hat{\tau})(d\hat{x}^{\nu}/d\hat{\tau}) = c^{-2} g_{\mu\nu} (dx^{\mu}/d\tau)(dx^{\nu}/d\tau) . \]

Therefore, $g_{\mu\nu} (dx^{\mu}/d\tau)(dx^{\nu}/d\tau) = -c^{2}$, i.e., $Z^a Z_a = -c^{2}$. □

Example 3 In Sect. 10.2.2 we used the expression for the angular frequency of a
photon in the geometrized system, $\hat{\omega} = d\hat{t}/d\hat{\beta}$ ($\beta$ is the affine parameter of the
photon’s world line). Find its form in SI.

Solution First we should figure out the dimension of $\beta$. The wave 4-vector
of the photon is $\hat{K}^a = (\partial/\partial\hat{\beta})^a$. It follows from
$\hat{K}^a = \hat{\omega} (\partial/\partial\hat{t})^a + \hat{k}^a$ that $[K^a] = [k^a]$ (see footnote 1).
Since $k^a = k^i (\partial/\partial x^i)^a$, where $k^i$ are the components of the wave 3-vector,
and $[k^i] = [L]^{-1}$, we have $[K^i] = [L]^{-1}$, and thus $[\beta] = [L]^{2}$. The expression
$\hat{\omega} = d\hat{t}/d\hat{\beta}$ can be written as $\hat{\omega}\, d\hat{\beta}/d\hat{t} = 1$.
Let $\hat{A} \equiv \hat{\omega}\, d\hat{\beta}/d\hat{t}$; then $[A] = [T]^{-2}[L]^{2}$.
Hence λ = 2, μ = 0, $\chi = c^{-2}$, $\hat{A} = c^{-2} A$, i.e.,
$\hat{\omega}\, d\hat{\beta}/d\hat{t} = c^{-2} \omega\, d\beta/dt$, and so
$\omega = c^{2}\, dt/d\beta$. □

In general relativity we often encounter tensors like $g_{ab}$, $R_{abc}{}^{d}$, $R_{ab}$ and $R$. When
it comes to the conversion of units, we will need to know the dimensions of these
quantities. For the convenience of the conversion, first we prove the following
conclusions (note that the indices can be unbalanced in a dimensional equation):

\[ (1)\ [g_{ab}] = [L]^{2} , \quad (2)\ [\nabla_a \omega_b] = [\omega_b] , \quad (3)\ [R_{abc}{}^{d}] = 1 , \]
\[ (4)\ [R_{abcd}] = [L]^{2} , \quad (5)\ [R_{ac}] = 1 , \quad (6)\ [R] = [L]^{-2} . \tag{A.7} \]

1 The dimension of a vector can be defined as the dimension of the real number (quantity) obtained
by acting the vector on a dimensionless scalar field. Similarly one can define the dimension of a
dual vector and a tensor.

Proof (1) Since the essence of $ds^2 = g_{\mu\nu} dx^{\mu} dx^{\nu}$ is $g_{ab} = g_{\mu\nu} (dx^{\mu})_a (dx^{\nu})_b$, we have
$[g_{ab}] = [ds^2] = [L]^{2}$. (Readers who feel confused about this may consider it from
the perspective of the components. When $x^{\mu}$ and $x^{\nu}$ are both length coordinates, from
$ds^2 = g_{\mu\nu} dx^{\mu} dx^{\nu}$ and $[ds^2] = [L]^{2}$ we can see that $[g_{\mu\nu}] = 1$. Then it follows from
$g_{ab} = g_{\mu\nu} (dx^{\mu})_a (dx^{\nu})_b$ that $[g_{ab}] = [L]^{2}$. One should notice that, unlike $[g_{ab}]$, which
is absolute, $[g_{\mu\nu}]$ relies on the dimension of the coordinates involved.)
(2) $[\nabla_a \omega_b] = [\partial_a \omega_b] = [(dx^{\mu})_a (dx^{\nu})_b\, \partial\omega_{\nu}/\partial x^{\mu}] = [(dx^{\nu})_b][\omega_{\nu}] = [\omega_b]$.
(3) $\nabla_a \nabla_b \omega_c - \nabla_b \nabla_a \omega_c = R_{abc}{}^{d} \omega_d$. Considering that $[\nabla_a \nabla_b \omega_c] = [\omega_c]$, we have
$[R_{abc}{}^{d} \omega_d] = [\omega_d]$, and thus $[R_{abc}{}^{d}] = 1$.
(4) $[R_{abcd}] = [g_{de} R_{abc}{}^{e}] = [L]^{2}$.
(5) $[R_{ac}] = [g^{bd} R_{abcd}] = [L]^{-2} [L]^{2} = 1$.
(6) $[R] = [g^{ac} R_{ac}] = [L]^{-2}$. □
 
Example 4 Find the form of Einstein’s equation in SI from its form
$\hat{R}_{ab} - \hat{R}\hat{g}_{ab}/2 = 8\pi \hat{T}_{ab}$ in the geometrized system.

Solution For simplicity (and without loss of generality), take a perfect fluid as an
example. The energy-momentum tensor of a perfect fluid is

\[ \hat{T}_{ab} = (\hat{\rho} + \hat{p}) \hat{U}_a \hat{U}_b + \hat{p}\, \hat{g}_{ab} . \]

Since the dimensions of the terms being summed are the same, we only have
to consider how the equation $\hat{R}_{ab} = 8\pi \hat{p}\, \hat{g}_{ab}$ transforms. $[R_{ab}] = 1$ leads to
$\hat{R}_{ab} = R_{ab}$. Also $[p g_{ab}] = [M][L]^{-1}[T]^{-2} \cdot [L]^{2} = [M][L][T]^{-2}$, and hence for
the quantity $p g_{ab}$ we have λ = μ = 1, $\chi = c^{-4} G$. Thus, $\hat{p}\, \hat{g}_{ab} = c^{-4} G p g_{ab}$, and so
$R_{ab} = 8\pi c^{-4} G p g_{ab}$. Therefore, the form of Einstein’s equation in SI is

\[ R_{ab} - \frac{1}{2} R g_{ab} = \frac{8\pi G}{c^{4}} T_{ab} . \tag{A.8} \]
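
To get a feeling for the size of the coefficient in (A.8), the following Python sketch evaluates $8\pi G/c^{4}$ and checks that it equals $8\pi\chi$ with λ = μ = 1, as used in the solution. The helper names `chi` and `einstein_coupling_si` are hypothetical, and the constants are rounded SI values assumed for this illustration.

```python
import math

# Rounded SI numerical values (assumptions of this sketch).
C = 2.998e8      # speed of light, m/s
G = 6.674e-11    # gravitational constant, m^3 kg^-1 s^-2


def chi(lam: float, mu: float) -> float:
    """Conversion factor (A.6) between SI and the geometrized system."""
    return C ** (-lam - 3 * mu) * G ** mu


def einstein_coupling_si() -> float:
    """Coefficient 8*pi*G/c**4 multiplying T_ab in (A.8)."""
    return 8 * math.pi * G / C ** 4


kappa = einstein_coupling_si()
print(f"{kappa:.2e}")                 # of order 1e-43 in SI units
print(kappa / (8 * math.pi * chi(1, 1)))   # ratio is 1 up to rounding
```

The tiny value of the coefficient reflects how much energy-momentum is needed, in everyday units, to produce appreciable curvature.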


Until now we only talked about the conversion of the units in mechanics. Although
we used SI as an example for non-geometrized systems, the discussion can also be
applied to the Gaussian system. However, when electromagnetism is involved, one
needs to add a fourth base quantity, then the difference between SI and the Gaussian
system will be revealed. The fourth base quantity in SI is the electric current I , whose
base unit is the ampere; the fourth base quantity in the Gaussian system is the
permittivity $\varepsilon$, whose base unit is the permittivity of the vacuum $\varepsilon_0$ (and thus the number
$\varepsilon_0 = 1$). Correspondingly, the equations in the geometrized system that involve
electromagnetic quantities also have two forms, which may be called the “geometrized
SI” and “geometrized Gaussian system”. Besides the basic requirement c = G = 1,
the geometrized Gaussian system also requires $\varepsilon_0 = 1$, while the geometrized SI
stipulates that the unit of electric current is the ampere. To match the international
literature, we adopt the geometrized Gaussian system for all the equations in this text
that have electromagnetic quantities. The equations that do not have electromagnetic
quantities have the same form in the two geometrized systems. It is not difficult to see
that the method above can be applied to both the conversion from the geometrized
Gaussian system to the Gaussian system and that from the geometrized SI to SI.
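For instance, it is straightforward for the reader to convert the form of the RN line
element in the geometrized Gaussian system,

\[ d\hat{s}^2 = -\left(1 - \frac{2\hat{M}}{\hat{r}} + \frac{\hat{Q}^2}{\hat{r}^2}\right) d\hat{t}^2 + \left(1 - \frac{2\hat{M}}{\hat{r}} + \frac{\hat{Q}^2}{\hat{r}^2}\right)^{-1} d\hat{r}^2 + \hat{r}^2 (d\hat{\theta}^2 + \sin^2\hat{\theta}\, d\hat{\varphi}^2) , \tag{A.9} \]

to the following form in the Gaussian system:

\[ ds^2 = -\left(1 - \frac{2GM}{c^2 r} + \frac{G Q^2}{c^4 r^2}\right) c^2 dt^2 + \left(1 - \frac{2GM}{c^2 r} + \frac{G Q^2}{c^4 r^2}\right)^{-1} dr^2 + r^2 (d\theta^2 + \sin^2\theta\, d\varphi^2) . \tag{A.10} \]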
To facilitate lookup, we list some of the equations involving electromagnetic quan-
tities in the form of geometrized SI as follows (the equation numbers without * are
the corresponding equations in the geometrized Gaussian system):
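\[ \partial^a F_{ab} = -\varepsilon_0^{-1} J_b , \tag{6.6.10*} \]

\[ \vec{\nabla} \cdot \vec{E} = \frac{\rho}{\varepsilon_0} , \quad \vec{\nabla} \times \vec{E} = -\frac{\partial \vec{B}}{\partial t} , \quad \vec{\nabla} \cdot \vec{B} = 0 , \quad \vec{\nabla} \times \vec{B} = \mu_0 \vec{j} + \frac{\partial \vec{E}}{\partial t} . \tag{6.6.12*} \]

\[ T_{ab} = \varepsilon_0 \left(F_{ac} F_b{}^{c} - \frac{1}{4} \eta_{ab} F_{cd} F^{cd}\right) , \tag{6.6.28*} \]

\[ T_{ab} = \frac{\varepsilon_0}{2} \left(F_{ac} F_b{}^{c} + {}^{*}F_{ac}\, {}^{*}F_b{}^{c}\right) , \tag{6.6.28′*} \]

\[ T_{00} = \frac{\varepsilon_0}{2} (E^2 + B^2) , \qquad w_i = -T_{i0} = \varepsilon_0 (\vec{E} \times \vec{B})_i , \quad i = 1, 2, 3 , \tag{no number} \]

\[ ds^2 = -\left(1 - \frac{2M}{r} + \frac{Q^2}{4\pi\varepsilon_0 r^2}\right) dt^2 + \left(1 - \frac{2M}{r} + \frac{Q^2}{4\pi\varepsilon_0 r^2}\right)^{-1} dr^2 + r^2 (d\theta^2 + \sin^2\theta\, d\varphi^2) , \tag{8.4.26*} \]

\[ F_{ab} = -\frac{Q}{4\pi\varepsilon_0 r^2} (dt)_a \wedge (dr)_b , \quad \text{or} \quad A_a = -\frac{Q}{4\pi\varepsilon_0 r} (dt)_a . \tag{8.4.27*} \]

All of the $1/2\pi$ in (8.8.7) are changed to $2\varepsilon_0$, and all of the factors 2 in (8.8.8) and
(8.8.9) are changed to $8\pi\varepsilon_0$. The $-2\pi J_{\mu}$ in Exercise 8.10 is changed to $-\frac{1}{2\varepsilon_0} J_{\mu}$.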
[Optional Reading A.1]
This optional reading further introduces the geometrized system (still restricted to
mechanics). Question: How large are the units of the length L and mass M (as quanti-
ties)? It will be convenient for answering this question if we choose T , V and G as the base
quantities. The dimension equations of L and M with respect to these three base quantities
are
\[ [L] = [T][V] , \qquad [M] = [T][V]^{3}[G]^{-1} . \tag{A.11} \]
Let $L_G$ and $L_I$ represent the numbers obtained by measuring the same length using the
length units in the geometrized system and SI, respectively; then $L_G/L_I = [L]$. Note that the
numerical values of the speed of light in the geometrized system and in SI are respectively 1 and
$c = 3 \times 10^{8}$, which means [V] = 1/c. Since the time unit is the same in both systems, [T] = 1, so the
first equation in (A.11) gives [L] = 1/c, and hence $L_I = c L_G$. Thus,

\[ \text{length unit in the geometrized system} = c \times \text{length unit in SI} = 3 \times 10^{8}\ \mathrm{m} . \tag{A.12} \]

Similarly, it follows from the second equation in (A.11) and [G] = 1/G (where the number
$G = 6.67 \times 10^{-11}$) that

\[ \text{mass unit in the geometrized system} = \frac{c^{3}}{G} \times \text{mass unit in SI} = \frac{(3 \times 10^{8})^{3}}{6.67 \times 10^{-11}}\ \mathrm{kg} = 4 \times 10^{35}\ \mathrm{kg} . \tag{A.13} \]

On the other hand, when we do not need to change the units of V and G, it is quite beneficial to
regard the geometrized system as having only one base quantity T . In this case, we can view
three originally different quantities—time, length and mass—as the same type of quantity.
The “key” to identifying them is to regard 1 s, $3 \times 10^{8}$ m and $4 \times 10^{35}$ kg as equal, i.e.,

\[ 1\ \mathrm{s} = 3 \times 10^{8}\ \mathrm{m} = 4 \times 10^{35}\ \mathrm{kg} , \tag{A.14} \]

and so a quantity has either no unit or a unit of s [or a power of s (which is a quantity)].
For instance, ① the Earth has a speed $v \cong 10^{-3}$ relative to the center of the Milky Way;
this numerical value ($\ll 1$) strongly indicates that the Earth’s speed is so slow that the
observations of the universe made by an observer on the Earth can be regarded as those made by
an (imaginary) observer at the center of the Milky Way. ② In the geometrized system, the
distance from the Earth to the Sun is 480 s, which indicates intuitively that it takes eight
minutes for light to travel from the Sun to the Earth. ③ The Earth’s radius and mass in the
geometrized system are $R_{\oplus} \cong 2 \times 10^{-2}$ s and $M_{\oplus} \cong 1.5 \times 10^{-11}$ s; $M_{\oplus} \ll R_{\oplus}$ indicates
that the gravitational field on the Earth’s surface is so weak that Newton’s theory is a good
approximation in most cases.
[The End of Optional Reading A.1]
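
The identification (A.14) can be checked with a few lines of Python. This is a minimal sketch, not part of the original text: the helper names are hypothetical, c and G are rounded as in (A.12) and (A.13), and the Earth’s radius and mass are assumed rounded SI values.

```python
# Rounded SI numerical values, as used in (A.12) and (A.13).
C = 3.0e8        # speed of light, m/s
G = 6.67e-11     # gravitational constant, m^3 kg^-1 s^-2

LENGTH_UNIT_M = C            # (A.12): one geometrized length unit in metres
MASS_UNIT_KG = C ** 3 / G    # (A.13): one geometrized mass unit in kilograms


def length_in_seconds(metres: float) -> float:
    """Express an SI length as a geometrized value (in s)."""
    return metres / LENGTH_UNIT_M


def mass_in_seconds(kilograms: float) -> float:
    """Express an SI mass as a geometrized value (in s)."""
    return kilograms / MASS_UNIT_KG


# Assumed rounded SI values for the Earth.
r_earth = length_in_seconds(6.37e6)    # radius, about 2e-2 s
m_earth = mass_in_seconds(5.97e24)     # mass, about 1.5e-11 s

print(MASS_UNIT_KG)          # about 4e35
print(r_earth, m_earth)
print(m_earth / r_earth)     # much smaller than 1: weak field
```

The last ratio is the dimensionless number behind item ③ above: it is many orders of magnitude below 1.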

The geometrized unit system is very convenient for general relativity. For a quantum
theory that does not involve gravity, a natural unit system is frequently used, in
which $c = \hbar = 1$. Depending on the field involved, sometimes one can also set a
third physical constant to 1. For example, $k_B$ (the Boltzmann constant) is set to 1
when thermodynamics is involved, $m_e$ (the value of the electron mass) is set to 1
when atomic physics is involved, $m_p$ or $m_n$ (the value of the proton or neutron mass)
is set to 1 when nuclear physics is involved, and G = 1 when gravity is involved
(e.g., a theory of quantum gravity). The unit system with $G = c = \hbar = 1$ is also
called the Planck unit system. Now we discuss the conversion between the Planck
system and SI. Compared with the geometrized system, the Planck system has an
additional constraint $\hbar = 1$ besides G = c = 1, which prevents one from choosing
the unit of time (and thus all the quantities) arbitrarily. Therefore, we should start
from (A.2) [instead of (A.3)], and change (A.4) to

\[ [V] = [T]^{-1}[L] , \qquad [G] = [T]^{-2}[M]^{-1}[L]^{3} . \tag{A.15} \]



Combining (A.2) and (A.15) yields

\[ [A] = [V]^{\lambda+3\mu} [G]^{-\mu} [T]^{\lambda+\mu+\tau} . \tag{A.16} \]

It is not difficult to show that the “unique” quantity with time dimension constructed
from the speed of light, the gravitational constant, and the reduced Planck constant is
the Planck time, whose numerical value in SI is $t_P = (\hbar G/c^{5})^{1/2} \sim 10^{-43}$ (s),
where c, G and $\hbar$ are the numerical values of the speed of light, the gravitational
constant, and the reduced Planck constant in SI, respectively. The values of these three
quantities in the Planck system are all 1 (and hence so is that of the Planck time), so that
[V] = 1/c, [G] = 1/G, $[T] = 1/t_P$. Suppose $\tilde{\chi}$
is the conversion factor for the quantity A between SI and the Planck system; then it
follows from (A.16) that

\[ \tilde{\chi} = c^{-\lambda-3\mu} G^{\mu} t_P^{-(\lambda+\mu+\tau)} = c^{-\lambda-3\mu} G^{\mu} (\hbar G/c^{5})^{-(\lambda+\mu+\tau)/2} . \tag{A.17} \]

Example 5 The relation between the energy E and the frequency ν in the Planck system
reads $\hat{E} = 2\pi \hat{\nu}$. Find its form in SI.

Solution Suppose $A \equiv E/\nu$; then $[A] = [E][\nu]^{-1} = [T]^{-1}[M][L]^{2}$, and hence τ =
−1, μ = 1, λ = 2. Plugging this into (A.17) yields

\[ \tilde{\chi} = c^{-5} G (\hbar G/c^{5})^{-1} = \hbar^{-1} . \]

Thus, $\hat{A} = \tilde{\chi} A = \hbar^{-1} A$, i.e., $\hat{E}/\hat{\nu} = \hbar^{-1} E/\nu$. Hence, it follows from $\hat{E}/\hat{\nu} = 2\pi$
that $E/\nu = 2\pi\hbar$, or $E = 2\pi\hbar\nu$. □

Example 6 The “unique” quantity with mass dimension constructed from the speed
of light, the gravitational constant, and the reduced Planck constant is the Planck
mass, whose numerical value in the Planck system is $\hat{m}_P = 1$. Find its numerical
value $m_P$ in SI.

Solution From $[m_P] = [M]$ we can see that τ = λ = 0, μ = 1. Plugging this
into (A.17) yields $\tilde{\chi} = c^{-3} G (\hbar G/c^{5})^{-1/2} = (\hbar c/G)^{-1/2}$, and hence
$\hat{m}_P = \tilde{\chi} m_P = (\hbar c/G)^{-1/2} m_P$. Thus, it follows from $\hat{m}_P = 1$ that $m_P = (\hbar c/G)^{1/2}$. Plugging in
the numerical values $\hbar = 10^{-34}$, $c = 3 \times 10^{8}$, $G = 6.67 \times 10^{-11}$ we obtain
$m_P = 2.1 \times 10^{-8}$ kg. □
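
Examples 5 and 6 can be reproduced numerically from (A.17). The sketch below is only an illustration: the function name `chi_planck` is hypothetical, and the constants are rounded SI values assumed here (slightly more precise than the ones quoted in Example 6).

```python
# Rounded SI numerical values (assumptions of this sketch).
HBAR = 1.055e-34   # reduced Planck constant, J s
C = 2.998e8        # speed of light, m/s
G = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2

# Planck time in seconds: t_P = (hbar * G / c**5) ** (1/2)
T_PLANCK = (HBAR * G / C ** 5) ** 0.5


def chi_planck(tau: float, lam: float, mu: float) -> float:
    """Conversion factor (A.17) between SI and the Planck system.

    tau, lam, mu are the dimensional exponents with respect to
    time, length and mass, as in (A.2).
    """
    return C ** (-lam - 3 * mu) * G ** mu * T_PLANCK ** (-(lam + mu + tau))


# Example 5: A = E/nu has tau = -1, lam = 2, mu = 1, so chi = 1/hbar.
print(chi_planck(-1, 2, 1) * HBAR)     # 1 up to rounding

# Example 6: a mass has tau = lam = 0, mu = 1; the Planck value 1
# corresponds to m_P = 1/chi in SI.
m_planck = 1.0 / chi_planck(0, 0, 1)
print(m_planck)                        # about 2.2e-8 (kg)
print(T_PLANCK)                        # about 5e-44 (s)
```

The same function applies to any other quantity once its three exponents are known, e.g. τ = 0, λ = 1, μ = 0 for a length.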

Exercises

A.1. The form of the relation of the energy, mass and momentum of a point mass in
the geometrized system reads $\hat{E}^2 = \hat{m}^2 + \hat{p}^2$. Find its form in SI.
A.2. Find the form of the hydrostatic equation in SI from its form $d\hat{p}/d\hat{r} = -\hat{\rho}\hat{m}/\hat{r}^2$
in the geometrized system.
A.3. The form of the Euler equation of non-relativistic hydrodynamics in the
geometrized system reads $-\hat{\vec{\nabla}} \hat{p} = \hat{\rho}\, [\partial \hat{\vec{u}}/\partial \hat{t} + (\hat{\vec{u}} \cdot \hat{\vec{\nabla}}) \hat{\vec{u}}]$. Find its form in
SI.
A.4. Show that the equations $U^a = \gamma (Z^a + u^a)$ [see (6.3.30)] and $\omega = -K^a Z_a$
[see (6.6.42)] have the same form in the geometrized system and SI.
A.5. The geometrized system in some literature [e.g., Sachs and Wu (1977)] is
defined by c = 1 = 8πG (the time unit is still s). Find the units of length
and mass in this system (in this problem, one should regard the gravitational
constant as one of the base quantities of the geometrized system).
A.6. The “unique” quantity with length dimension constructed from the speed of light,
the gravitational constant, and the reduced Planck constant is the Planck length,
whose numerical value in the Planck system is $\hat{l}_P = 1$. Find its numerical
value $l_P$ in SI.

Reference

Sachs, R. K. and Wu, H. (1977), General Relativity for Mathematicians, Springer-Verlag, New York.
Conventions and Notation

Note on Conventions

(1) Starting from Sect. 2.6, this work has adopted the abstract index notation to
represent tensors. For instance, $v^a$ represents a vector, where the Latin letter
a, called an abstract index, plays a similar role to the arrow in the commonly
used notation $\vec{v}$. Do not interpret $v^a$ as the a-th component of a vector. When talking
about the components we use Greek letters as the indices (called component
indices or concrete indices); for example, $v^{\mu}$ represents the μth component of the
vector $v^a$. There is only one exception: a vector $v^a$ in a 4-dimensional spacetime
has three spatial components, for which we will use the most commonly used
convention, i.e., using $v^i$ (where i = 1, 2, 3) to represent the ith component of
$v^a$. Although this violates the stipulation of “using Latin letters to represent the
abstract indices”, it is convenient in many ways. In order to distinguish them from
the abstract indices a, b, c, d, e, . . ., we only use Latin letters starting from i
(usually i, j, k) as the labels for the spatial components. Practice has shown that
this can effectively avoid confusion. For more details about the index notation,
see Sect. 2.3.
(2) This work adopts the signature convention − + + + for the metric of
4-dimensional spacetime.
(3) The definitions of the Riemann tensor $R_{abc}{}^{d}$ and the Ricci tensor $R_{ab}$ have various
conventions in the literature. This work follows the conventions of Wald (1984).


Notation List

{ }    Set. First appears in Sect. 1.1. E.g., X = {1, 4, 5.6} stands for the set formed by the real numbers 1, 4 and 5.6.
ℝ    The set of real numbers. First appears in Sect. 1.1.
ℕ    The set of natural numbers. First appears in Sect. 1.3.
$S^n$    n-dimensional sphere.
∀x    For all x. First appears in Sect. 1.1.
∃    There exists. First appears in Sect. 1.1.
∈    Belongs to. First appears in Sect. 1.1. E.g., x ∈ X stands for “x belongs to the set X”, i.e., x is an element of X.
∉    Does not belong to. First appears in Sect. 1.1.
⊂    Contained in. First appears in Sect. 1.1. E.g., A ⊂ X stands for “A is contained in the set X”, i.e., A is a subset of X.
⊊    Contained in but not equal to. First appears in Sect. 1.1. E.g., A ⊊ X stands for “A is contained in but not equal to the set X”, i.e., A is a proper subset of X.
∪    Union (see Definition 2 of Sect. 1.1).
∩    Intersection (see Definition 2 of Sect. 1.1).
−    Difference of sets, e.g., A − B stands for the difference of the sets A and B (see Definition 2 of Sect. 1.1).
−A    Complement of A (see Definition 2 of Sect. 1.1).
∅    Empty set. First appears in Sect. 1.1.
:=    Defined as. First appears in Sect. 1.1.
≡    Identical to or denoted by. First appears in Sect. 1.1. E.g., A ≡ B ∪ C means “denote B ∪ C by A”.
≅    Approximately equal to.
⇒    Implies (if ... then), e.g., A ⇒ B stands for “if A then B”.
⇔    Equivalent to (if and only if).
×    Cartesian product (see Definition 3 of Sect. 1.1).
□    Q.E.D. (Denotes the end of a proof, aligned to the right.)
$\mathbb{R}^n$    The set of n-tuples $(x^1, \ldots, x^n)$ of real numbers, i.e., $\mathbb{R}^n = \mathbb{R} \times \cdots \times \mathbb{R}$ (n factors in total).
⊗    Tensor product (see Definition 2 of Sect. 2.4).
: →    Map. First appears in Sect. 1.1. E.g., f : X → Y stands for “the map from X to Y”.
f[A]    Suppose f : X → Y, A ⊂ X; then the image of A under the action of f is denoted by f[A] in order to distinguish it from the image f(x) of x ∈ X under f.
↦    Maps to (image of a function), e.g., suppose f : X → Y, x ∈ X, y ∈ Y; then x ↦ y stands for “the image of x is y”.
◦    Composite map. First appears in Sect. 1.1. E.g., φ ◦ ψ stands for the composite map of φ and ψ (first ψ, then φ).
(X, T)    The topological space with X as the base set and T as the topology (see Definition 2 of Sect. 1.2 and the following paragraph).
$T_u$    Usual topology (see Example 3 of Sect. 1.2).
$C^r$    The first r derivatives exist and are continuous.
$C^{\infty}$    Smooth (derivatives of all orders exist and are continuous).

Ā    The closure of the set A (see Definition 8 in Sect. 1.2).
i(A)    The interior of the set A (see Definition 9 in Sect. 1.2).
Ȧ or ∂A    The boundary of the set A (see Definition 10 in Sect. 1.2).
$T_2$ space    Hausdorff space (see Definition 3 in Sect. 1.3).
dim V    The dimension of V.
$V^*$    The dual space of the vector space V. First appears in Sect. 2.3.
$V_p$    The tangent space at a point p in the manifold. First appears in Sect. 2.2.
$V_p^*$    The dual space of the vector space $V_p$.
$\vec{E}$    3-dimensional (spatial) vector. (Use an arrow above the letter instead of boldface, which is used elsewhere; see the $\boldsymbol{\omega}$ later.)
$e_{\mu}$ or $(e_{\mu})^a$    The μth basis vector in the chosen basis $\{(e_{\mu})^a\}$.
$e^{\mu *}$ or $(e^{\mu})_a$    The μth dual basis vector of the basis $\{(e_{\mu})^a\}$.
$\partial/\partial x^{\mu}$ or $(\partial/\partial x^{\mu})^a$    The μth coordinate basis vector field. First appears in Example 2 of Sect. 2.2.
$dx^{\mu}$ or $(dx^{\mu})_a$    The μth dual coordinate basis vector field. First appears after (2.3.8).
$\mathcal{T}_V(k, l)$    The set of all the tensors of type (k, l) on a vector space V. First appears after Example 1 of Sect. 2.4.
$\mathcal{F}_M$ or $\mathcal{F}$    The set of all the smooth functions on a manifold M (see Definition 5 of Sect. 2.1).
$\mathcal{F}_M(k, l)$    The set of all the smooth tensor fields of type (k, l) on a manifold M. First appears in Definition 1 of Sect. 3.1.
Ã    The transpose of a matrix A.
[u, v]    The commutator of the vector fields u and v (see Definition 10 of Sect. 2.2).
C    Contraction. E.g., suppose $T \in \mathcal{T}_V(2, 2)$; then $C^1_2$ stands for the contraction between the first upper index and the second lower index. First appears in Remark 2 of Sect. 2.4. In terms of abstract indices it is expressed by $C^1_2 T \equiv T^{ab}{}_{ca}$.
δ or $\delta_{ab}$    Euclidean metric (see Definition 8 of Sect. 2.5).
η or $\eta_{ab}$    Minkowski metric (see Definition 8 of Sect. 2.5).
$\delta^a{}_b$    Identity map. First appears in the paragraph around (2.6.4).
g(u, v)    The result of acting the metric tensor g on u and v. First appears in Definition 1 of Sect. 2.5. Same as $g_{ab} u^a v^b$.
$T_{(abc)}$    The total symmetrization over the indices a, b, c [see (2.6.13) for definition].
$T_{[abc]}$    The total antisymmetrization over the indices a, b, c [see (2.6.14) for definition].
$\nabla_a$    Derivative operator (see Definition 1 of Sect. 3.1).
$\partial_a$    The ordinary derivative operator in a coordinate system [see (3.1.9) for definition]. In special relativity it refers to the ordinary derivative operator in an inertial coordinate system, satisfying $\partial_a \eta_{bc} = 0$.
$\Gamma^a{}_{bc}$    The Christoffel symbol of a derivative operator in a coordinate system (see Definition 2 of Sect. 3.1).
$\Gamma^{\mu}{}_{\nu\sigma}$    The components of the Christoffel symbol $\Gamma^a{}_{bc}$ in a coordinate system.
exp    Exponential map (see Optional Reading 3.3.1).
$\phi^*$    The pullback map induced by the map φ (see Definitions 1 and 3 of Sect. 4.1).

$\phi_*$    The pushforward map induced by the map φ (see Definitions 2 and 4 of Sect. 4.1).
$\mathscr{L}_v T^{\cdots}{}_{\cdots}$    The Lie derivative of the tensor field $T^{\cdots}{}_{\cdots}$ along a vector field $v^a$ (see Definition 1 of Sect. 4.2).
$\boldsymbol{\omega}$    Differential form (field) (in boldface, indices are omitted). For instance, $\boldsymbol{\varepsilon}$ is the abbreviation for the volume element $\varepsilon_{a_1 \cdots a_n}$ (n-form field).
${}^{*}\boldsymbol{\omega}$    The dual differential form of $\boldsymbol{\omega}$ (see Definition 1 of Sect. 5.6).
$\Lambda(l)$    The set of all the l-forms on a vector space V. First appears after Theorem 5.1.2.
$\Lambda_M(l)$    The set of all the l-form fields on a manifold M. First appears in Definition 3 of Sect. 5.1.
$\Lambda_p(l)$    The set of all the l-forms at a point p (i.e., on $V_p$). First appears in the beginning of Sect. 5.6.
d    Exterior differentiation operator (see Definition 3 of Sect. 5.1), e.g., $d\boldsymbol{\omega}$ stands for the exterior differentiation of the differential form $\boldsymbol{\omega}$.
∧    Wedge product (see Definition 2 of Sect. 5.1).
$\boldsymbol{\omega}_{\mu}{}^{\nu}$    Connection 1-form, also denoted by $\omega_{\mu}{}^{\nu}{}_{a}$. First appears in (5.7.4).
$\boldsymbol{R}_{\mu}{}^{\nu}$    Curvature 2-form, also denoted by $R_{ab\mu}{}^{\nu}$. First appears in (5.7.7).
$\mathscr{R}$    Reference frame. First appears in Sect. 6.1.1.
$D/d\tau$    The covariant derivative along a curve G(τ), e.g., $Dv^a/d\tau$ means the same as $T^b \nabla_b v^a$ [$T^b$ stands for the tangent vector of G(τ)].
$D_F/d\tau$    The Fermi derivative along a curve G(τ) (see Definition 1 of Sect. 7.3).
Re    Take the real part.
Im    Take the imaginary part.
$(\varepsilon_{\mu})^a$    The μth basis vector in a null tetrad. First appears in Sect. 8.7.

Reference

Wald, R. M. (1984), General Relativity, The University of Chicago Press, Chicago.


Index

A Big bang, 495, 504


Absolute simultaneity, 176 Bijection, 4
surface of, 176, 178, See also Exercise Binary star, 318
6.21 Binding energy, 509
Absolute time, 176, 178 Birkhoff’s theorem, 347, 355, 358, 453, 456
Abstract index notation, 55 Black hole, 317, 434, 439, 452
Abundance, 500, 508 Blackbody radiation (theorem, curve, spec-
Accumulation point, 16 trum), 213, 512
Active viewpoint, 107 Blueshift
Active viewpoint (language), 401 Doppler, 234
Adapted coordinate system, 111 B-mode, 514
Affine parameter, 84 Boost, 53, 115, 169
Age of the universe, 499, See also Exercise Boundary, 12, 139, 146
10.4
Angular frequency, 228, 249, 305, 413
Angular momentum, 411 C
Angular velocity 2-form, 254 Calibration curve, 174
Antisymmetric tensor, 60, 130 Cartan’s first equation of structure, 154
Arc length, 49, 344 Cartan’s second equation of structure, 154,
of a geodesic, 85, 347 361
parameter, 50 Cartesian coordinate system, 52
Arcwise connected, 136 Cartesian product, 2
Associated volume element, 142 Charge density, 219
Asymmetry of matter and antimatter, 507 Chart, 20
Atlas, 20 Christoffel symbol, 72, 79, 97, 153, 268
Average mass-to-light ratio, 518 contracted, 99
Axisymmetry, 355 equivalent definition of, 153
of a static spherically symmetric line ele-
ment, 342
B of the Schwarzschild metric, 345
Baryon, 508, 510, 515 of the Vaidya metric, 379, See also Exer-
Baryonic dark matter, 515 cise 3.4
Basis Clock synchronization, 170
dual, 38, 43 Closed differential form, 133, 226
dual coordinate, 40 Closed set, 12
orthonormal, 47 Closed universe, 479
Bianchi identity, 94, 283, 366 Closure, 12

Cluster of galaxies, 468, 515 Cosmic rest frame, 474, 514


CMB, see cosmic microwave background Cosmic time, 481
radiation Cosmological constant, 501
CMB polarization, 321, 514 Cosmological scale, 468
COBE, 513 Cosmology, 467
Commutation relation, 366 Covariant derivative, 74, 79
Commutator, 33, 75, 112, 276 Covariant index, 60
Comoving observer, see observer Covariant vector, 60
Comoving reference frame, 212, 474 Covector, 120
Compactness, 13 Critical density, 436, 517
Compatible, 20, 141, 142, 147 Curvature singularity, 442
Complement, 2 Curvature tensor, 92, 100, 154
Complete vector field, 37, 110, 113 Curve, 28
Component index notation, 56 Cyclic identity, 94
Composite map, see map Cylindrical symmetry, 357, 370
Concrete index notation, see component
index notation
Congruence, 276 D
Conjugate points, 87, 281, 347 Dark matter, 515, 519
Connected manifold, 136 Deceleration parameter, 522
Connected topological space, 12, 136 Degeneracy pressure, 432
Connection, 77, 81 Degenerate electron gas, 432
Degenerate “metric”, 124, 179
Connection 1-form, 153, 361
De Morgan’s law, 2
Connection coefficients, 153
Density fluctuation, 515
Conservation equation, 209
Derivative operator, 67
Conservation of baryon number, 508
associated with a metric, 79
Conservation of mass, 193
covariant, 72
Conserved quantity, 195
non-commutativity of, 93, 100, 247
Constant map, see map
ordinary, 72
Continuity equation, 209, 214
torsion-free, 69
Continuous map, see map
Diffeomorphism, 22
Contraction, 44 group, 337, See also one-parameter
Contravariant index, 60 group of diffeomorphisms
Contravariant vector, 60 Difference, 2
Conversion factor, 528 Differentiable manifold, 19
Convex neighborhood, 484 Differential form, 132
Coordinate Differential structure, 21
basis, 27 Dipole anisotropy, 513
basis vector, 27 Discrete topology, 7
components, 27 Distance, 3, 339, 344, 387
line, 29, 111, 168, 253, 260, 267, 331, Doppler effect, 233, 462
341, 355, 398 Dual basis, see basis
patch, 20 Dual coordinate basis, see basis
singularity, 439, 442 Dual differential form, 149, 217, 249, 254
system, 20 Duality rotation, 404
time, 171 Dual space, 37
transformation, 20, See also Exercise 9.9 Dual vector, 37
Coordinate clock, 281 Dust, 214, 218, 286, 462
Coordinate condition, 398
Coordinates, 20
Coriolis force, 262 E
Cosmic microwave background radiation, Eddington-Finkelstein coordinates, 457, See
493, 511, 513 also Exercise 9.13

Einstein (field) equation, 282 F


linearized, 288 Faster-than-light, 167
vacuum, 285, 341, 396, 453 Fermi energy, 432
with source, 285, 349, 399, 422 Fermi momentum, 435
Einstein’s elevator, 267, 275, 279 Fermi-Walker derivative, 251
Einstein’s equation with source, see Einstein Fermi-Walker transport, 252
(field) equation Finite subcover, 13
Einstein equivalence principle (EEP), 267 Fluid particle, 212
Einstein-Maxwell equations, 350, 370 Foliation, 469
Einstein spacetime, 405 leaf, 469
Einstein’s static universe, 503 homogeneous, 471
Einstein tensor, 284 4-acceleration, 187, 201, 244, 246, 250
linearized, 288 4-current density, 219
4-force, 203
Electric field, 217, 350, 354
4-momentum, 200, 389
Electromagnetic 4-potential, 226, 247, 288
conservation of, 201
equation of motion of, 226, 248
density, 208
gauge freedom of, 226, 289
4-potential, see electromagnetic 4-potential
of the RN solution, 351 4-velocity, 195
Electromagnetic field 4-velocity field, 211
nonnull, 350, 354, 358, 372 Frame, 154, 205, 411
null, 350, 359, 369, 372 Frame of reference, see reference frame
Electromagnetic field tensor, 220, 223, 225, Freely falling observer, 263, 462, 464
245 Freely falling reference frame, 276
in the NP formalism, 369, 372 Free point mass, 178, 202, 242, 247, 263,
Electromagnetic wave, 226, 248, 348 272
Electron degeneracy pressure, 432 Friedmann equation, 495
Electrovacuum, 349, 355, 371 Friedmann-Lemaître-Robertson-Walker
Elliptical polarization, 232 model, 498
Embedded submanifold, 119 Future (past)-directed, 177
Embedding, 119
Embedding diagram, 454
E-mode, 514 G
Empty set, 1 Galaxy, 468, 515
Energy conservation, 193, 209, 212 Galilean coordinates, 178
Energy density, 206, 211, 225, 284, 424, 512 Garage paradox, 188
Gauge freedom (gauge transformation)
Energy flux density, 207, 225, 390
of angular velocity, 255, 257, 258
Energy-momentum tensor, 207
of electromagnetic 4-potential, 226, 289,
of an electromagnetic field, 225
351
of a perfect fluid, 211, See also Exercise
of general relativity, 402
6.17, 6.18
of the linearized theory of gravity, 289
Equivalence principle, 267 Gaussian normal coordinates, 398
Euclidean group, 357 Gauss’s theorem, 148, 209
Euclidean metric (space), 51 General covariance, principle of, 245
Euler equation, 214 Generalized Riemannian space, 50
Event, 163 Geodesic, 83, 178, 407
Event horizon, 452, 456, 459 Geodesic deviation equation, 278, 282, 322
Exact differential form, 133, 226 Geometric optics approximation, 229, 248,
Exponential map, see map 413
Extension, 34, 92, 349, 440, 445 Geometrized unit system, 527
Kruskal, 449 Gravitational collapse, 317, 434, 439, 456
Exterior differentiation, 132 Gravitational lensing, 515
Extrinsic curvature, 100 Gravitational mass, 241

Gravitational potential, 178, 239, 275, 294, Integral curve, 34, 228
425 Integral of a function, 145
Gravitational radiation, 296, 348 Integration on manifolds, 134
Gravitational redshift, 414, 461, 463 Interior, 12
Gravitational wave, 296 Interior Schwarzschild solution, 428
cross-polarized, 307 Intersection, 2
plus-polarized, 307 Intrinsic curvature, 92, 100
polarization modes of, 299, 324 Invariant, 195
primordial, 515 Inverse image, 3
Graviton, 308, 326 Inversion, 54
Group, 35 Isometry, 113, 333, 336, See also Exercise 4.12
Isometry group, 337
H Isotropic coordinate system, 397
Harmonic coordinate condition, 398 Isotropic observer, 471
Harmonic function, 398 Isotropic reference frame, see reference frame
Hausdorff space, 14
Hodge dual, see dual differential form Isotropic spacetime, 471
Homeomorphism, 9 Isotropy, 212, 468
Homogeneity, 468, 471
spatial homogeneity, 469
Homogeneous, 471 J
Hubble constant, 488 Jacobi identity, see Exercise 2.8
Hubble-Lemaître law, 488
Hubble parameter, 489
Hubble tension, 500 K
Hulse-Taylor binary, 318 Killing equation, 114
Hydrostatic equilibrium, 426 Killing vector field, 113
Hypersurface, 119 Kinnersley metric, 383, 387
null, 122, 176, 228, 315, 452 Kruskal extension (coordinates), 446
spacelike, 122, 170, 176, 398
timelike, 122
Hypersurface orthogonal, 333, 340, 453 L
ΛCDM model, 523
Laser interferometer, 319
I Leaf, 469, 470
Identity map, see map Left-handed system (basis), 136
Incomplete geodesic, 440, 449 Length contraction, 179, 189, 219
Incomplete vector field, see complete vector field Lie derivative, 110
Lightlike vector, 48
Indiscrete topology, 7 LIGO, 319
Inertial coordinate system, 165, 168 Line element, 50
Inertial coordinate time, 171 induced, 84
Inertial force, 262, 267 Linear approximation, 287
Inertial mass, 241 Linearized Einstein (field) equation, 288
Inertial reference frame, see reference frame Linearized Einstein tensor, 288
Inextensible integral curve, 35 Linearized Riemann tensor, 288
Infinitesimal coordinate transformation, 290 Local inertial frame, 248, 269
Inflation, 317, 467 Locally unique, 34, 85
Injection, 4 Local Lorentz frame (system), 269
Instantaneous observer, see observer Local measurement, 196, 410, 424
Instantaneous rest (inertial) reference frame (observer, coordinate system), 199, 263, 387 Lorentz contraction, see length contraction
Lorentz covariance, 189, 239
Lorentz 4-force, 223, See also Exercise 6.18
Lorentzian coordinate system, 53, 118, 260, 269, 478 Neutrino background, 508
Neutrino decoupling, 508
Lorentzian metric, see metric Neutron degeneracy pressure, 433
Lorentz transformation, 117, 169 Neutron star, 317, 433
Lorenz gauge, 226 Newman-Penrose formalism, 360
of linearized gravity, 289, 296 Newtonian gravity
Luminosity density, 518 4-dimensional formulation of, 177
Newtonian limit, 292
Newtonian spacetime, 178
M Non-degenerate, 47, 58
Mach’s principle, 240, 469 Non-locality of the gravitational field energy, 425
Macroscopic point, 468
Magnetic field, 217, 350, 351 Non-rotating observer, 249, 263, 267
Manifold, 19 Normal coordinates, see Gaussian or Riemannian normal coordinates
Manifold with boundary, 139
Map, 3 Normal covector, 120, 228, 315
composite, 4 Normal neighborhood, 90
constant, 4 Normal vector, 121, 228, 315
continuous, 5, 9 NP equations, 364, 371
exponential, 89 Null curve, 49
identity, 20 Null electromagnetic field, see electromagnetic field
one-to-one, 4
onto, 4 Null hypersurface, see hypersurface
projection, 124, See also Exercise 1.9 Null tetrad, 360, 367, 381
Mass conservation, 214 Null vector, 48, 360, 380
Mass defect, 193 Number of neutrino species, 511
Mass dipole moment, 316 Numerical-value equation, 527
Matter, 493
Matter field, 206
Maxwell’s equations, 220, 350
in NP formalism (source-free), 368 O
in NP formalism (with source), 405 Observer, 164
Metric, 47 comoving, 211
indefinite, 47 inertial, 167
induced, 122, 125, 147 instantaneous, 196
Lorentzian, 47 instantaneous rest, 199
negative definite, 47 instantaneous rest inertial, 199, 263, 387
positive definite, 47 non-rotating, 249, 263, 267
Microwave background, 511 stationary (static), 334
Milky Way, 468 One-parameter family of geodesics, 276
Minimal substitution rule, 246 One-parameter group of diffeomorphisms, 36, 110, 113
Minkowski metric (space, spacetime), 53, 166 One-parameter group of isometries, 113
Mode +, 307, 324 One-parameter local group of diffeomorphisms, 37
Mode ×, 307, 324 One-to-one map, see map
Momentum density, 207, 225, See also Exercise 6.17 Onto map, see map
Momentum flux density, 208, 390 Open (sub)set, 7
Open ball, 7
Open cover, 13, 19
N Open disk, 8
Natural coordinates, 3, 19 Open universe, 479
Natural unit system, 532 Oppenheimer-Volkoff equation, 426
Neighborhood, 11 Orbit, 36, 337
Orbit sphere, 337, 408 Pullback map, 105
Orientable manifold, 135 Pure radiation field, 380, 388, 391
Orientation, 135, 139 Pushforward map, 105
induced, 140, 147
Orthonormal, 47
Q
Quadrupole anisotropy, 513
P Quantity equation, 527
P.p. curvature singularity, 442 Quantum gravity, 308, 505
Parallel (vectors), 31
Parallel transport, 75, 80
R
Parameter, 28, 36
Radiation, 213, 380, 493
Parametric representation (equation), 29
Radiation gauge, 297
Parametrization, 28, 49
of linearized gravity, 298
Passive viewpoint (language), 107, 401
Radius, 340
Perfect fluid, 211, 422
Schwarzschild, 439, 529
Perihelion precession, 415
Rate, 170
Perturbation, 515 Ratio of the densities of the baryons and photons, 510
Phase, 227
Photon, 164, 167, 230 Recessional velocity, 488
background, 512 Red giant, 431
decoupling, 512 Redshift, 453, 461, 488
energy, 230 cosmological, 490
momentum, 230 Doppler, 234, 463
rocket, 389 gravitational, 414, 463
Photon gas, 213 Redshift parameter, 414
Planck energy (mass), 533 Reference frame, 165
Planck length, 506, 534 geodesic, 276, 321
Planck’s law of blackbody radiation, 512 inertial, 168, 172
Planck time, 505, 533 isotropic, 213, 471, 480
Planck unit system, 532 static, 334
Plane symmetry, 357 stationary, 334
Polarization Reflection, 52, 53
cross-polarized, 307 Regular embedding, 119
plus-polarized, 307 Reissner-Nordström (RN) metric (line element), 354
Polarization direction, 310
Polarization vector (tensor), 227, 303 Relativistic mass, 191
Polytrope, 437 Relativistic particle, 507
Pp-wave, 305 Reparametrization, 28
Preimage, 3 Rest energy, 193
Primordial nucleosynthesis, 508 Rest mass, 191, 206
Primordial perturbation, 516 Restriction, 138, 139
Product topology, 8 Retarded distance (time), 387
Projection map, see map Revolution, 262
Proper coordinate system, 259 Ricci flat, see Exercise 7.7
Proper distance, 345 Ricci rotation coefficients, 156, 361
Proper number density, 219 Ricci tensor, 96, 363
Proper subset, 1 components in a null tetrad, 363
Proper time, 164, 170 first-order (linear) approximation of, 288
Pseudo-Cartesian coordinates, 53 Riemann (curvature) tensor, 92, 153
Pseudo-Euclidean space, 93 of the Schwarzschild metric, 345
Pseudo-Riemannian space, 51 Riemannian normal coordinates, 91
Pseudo-rotation, 255 Riemannian space, 51
Riemann tensor Stationary spacetime, 331
linearized, 288 Stokes’s Theorem, 139
Right-handed system (basis), 136, 140, 142, 143, 146, 148, 151 Strong equivalence principle (SEP), 272
Structure formation, 515
Rigid frame, 156, 360 Subset, 1
Rindler spacetime (coordinates), 442 Supercluster, 468, 515
Robertson-Walker metric, 483 Supernova, 317, 434, 522
Rotation, 52, 116, 254, 262 Surface of homogeneity, 471
Rotation curve, 518 Surface of infinite redshift, 453
Surjection, 4
Symmetric tensor, 47, 60
S Synchronous gauge, 300
Scalar curvature, 96, 285, 364
Scalar field, 23, 26, 351
Scalar multiplication, 23 T
Scale factor, 484 T2 space, 14
Schwarzschild radius, 439, 529 Tangent space (tangent plane), 31, 119, 269
Semi-plane symmetric, 359 Tangent vector, 31, 119
Sequence, 16 Tensor, 42
Setting, 170 multifaceted view of, 43, 56, 322
Signature, 47 product, 43, 55
Simultaneity transformation law, 46
absolute, 176, 178 Tensor field, 46
relative, 175, 176 Tetrad, 205, 252, 360, 367, 381, 411
surface of, 175, 334, 470 3-acceleration, 187, 195, 202, 241, 261, 278
Singularity, 439
big bang, see big bang 3-current density, 219
coordinate, 439, 442 3 + 1 decomposition, 175, 469
spacetime, 440, 441, 450, 504 3-force, 195, 204
theorems, 504, See also Exercise 9.9 3-momentum, 195, 200, 389
Singular spacetime, 441 conservation of, 201
Slice, 470 of a photon, 230
Slicing, 469 3-momentum flux density (tensor), 208, 390
Smooth, 10, 19, 23, 32, 40, 46 3-speed, 198
Space of constant curvature, 476 3-velocity, 195, 198, 261, 278
Spacelike curve, 49 Tidal acceleration, 275, 278, 296, 322
Spacelike hypersurface, see hypersurface Tidal force (effect), 273, 346, 460
Spacelike vector, 48 Time, 469
Spacetime, 163, See also Exercise 5.9 Time dilation, 181, 461
Spacetime diagram, 164, 172, 462 Timelike curve, 49
Spacetime rotation, 254 Timelike hypersurface, see hypersurface
Spatial distance, 344, 387, See also Exercise 8.8 Timelike vector, 48
Time-orthogonal coordinate system, 334
Spatial homogeneity, 470, 471 Topological space, 7
Special relativity, 469 Topology, 6
Spherically symmetric metric field, 337 induced, 9
Spherically symmetric spacetime, 337 Torsion 2-form, 155
Spin coefficients, 363 Torsion tensor, 155, See also Exercise 3.1
Standard clock, 164, 170 Tortoise coordinate, 378, 447
Standard cosmological model, 498, 515 Totally (anti)symmetric tensor, 61
Standard model, see standard cosmological model Trace, 44, 95, 350, See also Exercise 8.4
Static reference frame (observer, spacetime), 334 Translation, 36, 52, 53, 116, 168
Translational invariance, 35, 115, 332, 357
Transverse gauge, 299
Transverse-traceless (TT) gauge, 298, 303, See also Exercise 7.9 Vector space, 23
Triad, 205, 254, 411 Volume, 143
Trivial manifold, 20, See also Exercise 2.2 Volume element, 141
True singularity, 439 compatible with the metric (orientation), 142, 147
Twin paradox, 186 induced, 147

U W
Union, 2 Wave 3-vector, 228, 305
Universe, 467 Wave 4-vector, 228, 248, 303, 413
Usual topology, 7 Wavefront, 228
Weak Equivalence Principle (WEP), 267
Weber bar, 318
V Wedge product, 130
Vacuum Einstein equation, see Einstein (field) equation Weyl tensor, 96, 363, 374
component in a null tetrad, 363, 374
Vacuum Schwarzschild solution, 341 White dwarf, 433
Vaidya metric, 378 White hole, 452
Vector, 24 World line, 164
(components) transformation law, 28 World sheet, 175, 180, 228, See also Exercise 5.10
Vector field, 32
