A Second Course in Analysis: M. Ram Murty


HBA Lecture Notes in Mathematics

IMSc Lecture Notes in Mathematics

M. Ram Murty

A Second Course
in Analysis
HBA Lecture Notes in Mathematics

IMSc Lecture Notes in Mathematics

Series Editor
Sanoli Gun, C.I.T. Campus, Institute of Mathematical Sciences, Chennai,
Tamil Nadu, India

Editorial Board
R. Balasubramanian, Institute of Mathematical Sciences, Chennai
Abhay G. Bhatt, Indian Statistical Institute, New Delhi
Yuri F. Bilu, Université Bordeaux I, France
Partha Sarathi Chakraborty, Institute of Mathematical Sciences, Chennai
Carlo Gasbarri, University of Strasbourg, Germany
Anirban Mukhopadhyay, Institute of Mathematical Sciences, Chennai
V. Kumar Murty, University of Toronto, Toronto
D. S. Nagaraj, Institute of Mathematical Sciences, Chennai
Olivier Ramaré, Centre National de la Recherche Scientifique, France
Purusottam Rath, Chennai Mathematical Institute, Chennai
Parameswaran Sankaran, Institute of Mathematical Sciences, Chennai
Kannan Soundararajan, Stanford University, Stanford
V. S. Sunder, Institute of Mathematical Sciences, Chennai
The IMSc Lecture Notes in Mathematics series is a subseries of the HBA Lecture
Notes in Mathematics series. This subseries publishes high-quality lecture notes
of the Institute of Mathematical Sciences, Chennai, India. Undergraduate and
graduate students of mathematics, research scholars, and teachers would find this
book series useful. The volumes are carefully written as teaching aids and highlight
characteristic features of the theory. The books in this series are co-published with
Hindustan Book Agency, New Delhi, India.

More information about this subseries at https://link.springer.com/bookseries/15465


M. Ram Murty

A Second Course in Analysis

M. Ram Murty
Department of Mathematics and Statistics
Queen’s University
Kingston, ON, Canada

ISSN 2509-8063 ISSN 2509-8071 (electronic)
HBA Lecture Notes in Mathematics
ISSN 2509-808X ISSN 2509-8098 (electronic)
IMSc Lecture Notes in Mathematics
ISBN 978-981-16-7246-0 (eBook)
https://doi.org/10.1007/978-981-16-7246-0
This work is a co-publication with Hindustan Book Agency, New Delhi, licensed for sale in all countries
in electronic form only. Sold and distributed in print across the world by Hindustan Book Agency, 11,
Zamrudpur Community Centre, Kailash Colony Extension, New Delhi 110048, India. ISBN:
978-93-86279-84-2 © Hindustan Book Agency 2020.

Jointly published with Hindustan Book Agency

Mathematics Subject Classification: 26-01, 30-01, 42-01, 55-01

© Hindustan Book Agency 2022


This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission
or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt from
the relevant protective laws and regulations and therefore free for general use.
The publishers, the authors, and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publishers nor the
authors or the editors give a warranty, express or implied, with respect to the material contained herein or
for any errors or omissions that may have been made. The publishers remain neutral with regard to
jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
This therefore is Mathematics:
She reminds you of the invisible forms of the
soul; She gives life to her own discoveries;
She awakens the mind and purifies the
intellect; She brings light to our intrinsic
ideas.
—Proclus (412–485 CE)
Preface

This book is based on graduate courses given by the author since 2007 at Queen’s
University. The course was meant to bring the typical graduate student (in pure
mathematics) up to speed and orient his or her mind toward the research frontier.
This is a difficult task. On the one hand, one cannot begin again a program of
instruction covered in at least half a dozen undergraduate courses. On the other
hand, if the course is pitched at too high a level, there will be a high dropout rate
and the student is prone to say “I used to like mathematics until I took this course!”
Therefore, a balance must be achieved in the process of instruction.
For the most part, we may assume that our student began his or her college education
as a teenager and was drilled in the routine theorems of single variable
and multivariable calculus, real and complex analysis, probability and measure
theory. But now at the graduate level, a synthesis of these topics is not only
appropriate but essential for the formation of the research mind. Such a mind
must re-examine many of the subtle notions and have some understanding of
how mathematics evolved over the centuries. Perhaps, through such a presentation,
the student may find the necessary subliminal suggestions that form the intuitive
basis of understanding of mathematics at a deeper level.
In this context, the student will understand that the process of mathematical
research follows a basic template involving six steps. First, one identifies the need
to quantify or make precise a particular idea. This is a process of pattern recognition
and requires the student to have a wealth of examples already learned from his or
her undergraduate days. Concepts have an evolutionary history, and they often arise
as a response to a particular need to understand phenomena in the physical world.
For instance, calculus arose from the study of physical motion. Multivariable
calculus arose from a study of motion in three dimensions and a deeper study of
electrostatics and fluid dynamics. Mathematical rigor and the need to establish
foundations for mathematics arose to avoid pathologies and logical contradictions.
Thus, the process of identification leads to a precise mathematical definition. This
is the source of power in mathematics. What the multitude at large often refers to
in hazy and nebulous terms is made precise and quantified by the
mathematician. Having made the definition, the mathematician should then provide

examples in which the mathematical principle that has been identified and defined
is highlighted and the student can analyze them and see a common thread that
unites the seemingly diverse array of examples. Such an intuitive understanding
leads to the formulation of a theorem or even a collection of theorems. Having the
theorem in hand not only allows one to understand all the disparate examples in one
unifying comprehensive glance, but it allows for further applications and
extensions. This is the predictive power of mathematics. The six-step process of higher
mathematical learning at the graduate level can then be summarized by the
appropriate acronym “ideate.”
This text assumes therefore that the reader is a graduate student who has had the
standard regimen of mathematical courses at the undergraduate level. Often, in such
a phase, one gains only information and not the intuition that is essential for the
research mathematician. It is hoped that this text can form the basis for a
semester-long or year-long graduate course in analysis so as to prepare the student
for a career as a research mathematician. The author has tested the material over a
span of two decades and can certify that indeed the presentation here takes the
student through the sacred rites needed to enter the sanctum sanctorum of the
temple of mathematics.

Kingston, Ontario M. Ram Murty


Acknowledgements

This book is based on graduate courses given by the author at Queen’s University
since 2007. I would like to thank Drs. Jung-Jo Lee and Purusottam Rath for helping
to put into LaTeX a large part of these notes. I thank Drs. Akshaa Vatwani, Siddhi
Pathak, Kumar Murty, Steven Spallone, Seoyoung Kim and the referees for reading
sections of the original manuscript and giving me feedback.

Kingston, Ontario M. Ram Murty

Contents

1 Background
1.1 Axioms of Set Theory
1.2 Constructing Numbers from Sets
1.3 Set-Theoretic Construction of the Real Numbers
1.4 Sequences of Real Numbers
1.5 Infinite Series
1.6 Sequences of Functions
1.7 Power Series
1.8 Metric Spaces and Euclidean Spaces
1.9 The Heine–Borel Theorem
1.10 Vector-Valued Functions
1.11 Derivatives of Multivariable Functions
1.12 The Inverse Function Theorem
1.13 The Implicit Function Theorem
1.14 The Lagrange Multiplier Method
1.15 Level Sets and Tangent Spaces
1.16 Changing Variables in Integrals
1.17 Volume and Surface Area of the Hypersphere
1.18 Green’s Theorem
1.19 Theorems of Gauss and Stokes
1.20 Differential Forms
References
2 Measure Theory
2.1 Topological Spaces and Measure Spaces
2.2 The Lebesgue Integral
2.3 Inner Product Spaces
2.4 Orthonormal Sets
2.5 Trigonometric Series
2.6 Banach Spaces
2.7 Baire’s Theorem
2.8 Hahn–Banach Theorem
2.9 Examples of Dual Spaces
References
3 Fourier Transforms
3.1 Fubini’s Theorem and Convolutions
3.2 The Fourier Transform
3.3 Differentiation Under the Integral Sign
3.4 Further Examples of Fourier Transforms
3.5 A Convolution Theorem
3.6 The Inversion Theorem
3.7 Further Properties of the Fourier Transform
3.8 The Plancherel Theorem
3.9 The Uncertainty Principle
3.10 Trigonometric Polynomials
3.11 The Isoperimetric Inequality
3.12 Weyl’s Criterion and Uniform Distribution
3.13 Fourier Series
3.14 The Poisson Summation Formula
3.15 A Fourier Analytic Proof of the Central Limit Theorem
References
4 Complex Analysis
4.1 Basic Definitions
4.2 Integration over Paths
4.3 The Local Cauchy Theorem
4.4 Zeros and Singularities
4.5 The Maximum Modulus Principle
4.6 The Global Cauchy Theorem
4.7 The Calculus of Residues
4.8 Further Examples
4.9 Rouché’s Theorem
4.10 Infinite Products and Weierstrass Factorization
4.11 The Logarithm
4.12 The Phragmén–Lindelöf Theorem and Jensen’s Theorem
4.13 Entire Functions of Order 1
4.14 The Gamma Function
4.15 Stirling’s Formula
4.16 The Wiener–Ikehara Tauberian Theorem
4.17 The Analytic Theorem
4.18 The Proof of the Tauberian Theorem
4.19 The Prime Number Theorem
4.20 Further Applications
4.21 The Paley–Wiener Theorems
References
5 Introduction to Algebraic Topology
5.1 A Very Brief Historical Introduction
5.2 Homotopic Paths
5.3 The Fundamental Group
5.4 Examples of Some Fundamental Groups
5.5 Covering Spaces
5.6 Applications
5.7 Group Actions and Orbit Spaces
5.8 Automorphisms of Covering Spaces
5.9 The Universal Covering Space
5.10 Suggestions for Further Reading

Index
Chapter 1
Background

1.1 Axioms of Set Theory

The concept of number lies at the heart of mathematics. The subject of mathematical
analysis, in the formal sense of the word, began when mathematicians initiated the
exploration of infinity as a precise mathematical idea. This is perhaps the greatest
event of the nineteenth century, on a par with the discovery of zero as a mathematical
idea. Both zero and infinity are now essential to the understanding of limits and the
related processes of real and complex analysis.
Because these two ideas were poorly understood, mathematicians in the nineteenth
century introduced the axiomatic method with the theory of sets as the foundation
for all mathematics. In this approach, Georg Cantor (1845–1918) was the
leading figure. Sadly, his work met considerable opposition at first, but it is now
universally accepted. It is not an exaggeration to say that research into the foundations
of mathematics has given us a larger understanding of the nature of mathematical
logic, both its strengths and its limitations.
Set theory is the standard foundation of all mathematics. Mathematical proofs use
set-theoretic ideas in one form or another. Below, we give a quick overview of the
axioms of set theory so that the student is introduced to the relevant vocabulary. One
need not become overly preoccupied with these formalities; a serious study of these
axioms would take us into the realm of mathematical logic, which is not the province
of this short survey. Rather, we indicate the main themes that are relevant to the later
chapters, much like a quick sightseeing tour of an ancient city. There will undoubtedly
be many side streets to explore, but since time is of the essence, we must be content
with a one-hour tour highlighting the significant monuments. For a gentle introduction
to mathematical logic, we refer the reader to
[1].
The basic objects of set theory are classes, some of which are sets. An informal
definition of a set is that it is an unordered collection of objects. The basic relation
between them is membership indicated by the symbol ∈. Thus x ∈ A means that x
is a member of A. Its negation, x is not a member of A, is written x ∉ A. Already,

with the notion of a set and the relation of membership, we can easily run into
logical contradictions if we do not formulate proper axioms for the construction
of sets. The famous “Russell paradox” (named after the British philosopher and
mathematician Bertrand Russell (1872–1970)) concerns the collection R of all sets
that are not members of themselves: if R were a set, then R ∈ R would force R ∉ R,
and R ∉ R would force R ∈ R. Consequently, the “set of all sets” is not a set, for
otherwise its members that are not members of themselves would form such a set R.
The paradox is often posed as the following riddle: a barber shaves exactly those
men who do not shave themselves; does the barber shave himself? The
contradiction arises from self-reference and so it was quickly realized that to avoid
logical contradictions in mathematics, one must formulate precise axioms for the
construction of sets. In a formal system of axiomatic set theory, the notions of “class”
and “∈” are undefined terms. There are several well-developed axiomatic systems
such as the Zermelo–Fraenkel system and the von Neumann–Gödel–Bernays system.
Our treatment below follows the latter.
The underlying idea of any of these systems is that we construct new sets from
old sets in a systematic manner, with the existence of the empty set as a basic axiom.
This is how we avoid the Russell paradox.
The words “and” and “not” are used as propositional connectives. We usually write
these words out and do not use symbols for them, though some logicians do. To say
that “Proposition P implies Proposition Q” we write “P ⟹ Q”. If P ⟹ Q and
Q ⟹ P, we write P ⟺ Q. The usual logical quantifiers are: ∀ which means
“for all” and ∃ which means “there exists”. The first is sometimes called the
universal quantifier and the second the existential quantifier. The statement,
“For all x, the proposition F(x) holds” is denoted ∀x (F(x)) with the parentheses
used to designate the proposition. Such a statement is called a universal proposition.
A statement of the form “There exists x such that F(x) holds” is written ∃x (F(x)).
Such a statement is called an existential proposition. Using class variables like
x, y, A, B and so on, together with the basic relation of membership ∈, quantifiers,
connectives and parentheses, we can construct “well-formed” formulas. If variables
are quantified, they are called “bound” and should only appear after the quantifier.
Variables without quantifiers are called “free”. For example,

∀x∃y((x ∈ y) ⟹ (x ∈ z))

is well-formed in which x and y are bound variables and z is free, whereas

∃x((x ∈ z) ⟹ ∀z(z ∈ y))

is not well-formed since z precedes its quantifier.


As noted earlier, it is not our goal here to develop axiomatic set theory. On the
other hand, the mature mathematician cannot ignore the need for foundations and
must have some idea of the basic axioms and notation. So we content ourselves with a
cursory overview of the basic axioms and set the stage for both notation and structure
of logical proofs.
The essential idea of set theory is that sets are built up in a progressive hierarchy
and there are precise rules as to how to form new sets from old sets. We begin with
the axiom of extensionality which allows us to determine when two sets are equal.
Axiom 1: (extensionality) For any classes A and B, A = B if and only if

∀x(x ∈ A ⟺ x ∈ B).

This axiom can be taken as the definition of equality.


Axiom 2: (existence of the empty set) There is a set ∅ such that

∀x(x ∉ ∅).

The axiom of extensionality implies that the empty set ∅ is unique.


Axiom 3: (unordered pairs) For all sets x, y there is a set {x, y} such that

u ∈ {x, y} ⟺ (u = x or u = y).

If x = y, we obtain the singleton {x}: u ∈ {x} if and only if u = x. Again, these sets
are unique by extensionality.
As defined earlier, a variable is called free if no quantifier binds it.
Otherwise, the variable is said to be bound. This concept is used in the next axiom
and allows us to form new sets from old using well-formed formulas. In fact, the next
axiom is more appropriately called an axiom schema, that is, a collection of axioms.
Axiom 4: (selection) Given any well-formed formula P(x) containing one free variable
x, there is a class A such that for any set x, we have x ∈ A if and only if P(x)
holds.
We often write
A = {x : P(x)}

and read this as “A is the class of all (sets) x such that P(x) holds.”
The distinction between sets and classes in the last axiom is important as noted
earlier. Without it, we fall into “Russell’s paradox”. Indeed, let R = {x : x ∉ x}. If R
were a set, then R ∈ R implies R ∉ R, and R ∉ R implies R ∈ R, a contradiction in
both cases. Technically, R is not a set but a proper class, and since only sets can be
members of classes, R ∉ R no longer forces R ∈ R. Once the axiom of regularity
(Axiom 8 below) is adopted, no set is a member of itself (Theorem 1.1), so R consists
of all sets; we call this class “the universe”, and the paradox no longer yields a
contradiction. The previous axioms do allow us to construct new sets from old ones in
a systematic manner. Thus, a set may consist of other sets that have already been
well-defined.
Axiom 5 : (union) For any set x, there is a set ∪x such that y ∈ ∪x if and only if
y ∈ z for some z ∈ x.
We sometimes write $\bigcup_{y \in x} y$ for ∪x in the last axiom. Thus,

x ∪ y = ∪{x, y} = {z : z ∈ x or z ∈ y}.
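As an informal illustration only, the pairing and union axioms can be mimicked on hereditarily finite sets by modelling each set as a Python `frozenset`. This modelling choice, and the helper names `EMPTY`, `unordered_pair` and `big_union`, are assumptions of the sketch rather than part of the formal theory. It checks that x ∪ y computed as ∪{x, y} agrees with the usual union.

```python
# Sketch: hereditarily finite sets modelled as Python frozensets (an assumption).
# frozenset equality is extensional, which matches Axiom 1.
EMPTY = frozenset()  # the empty set of Axiom 2


def unordered_pair(x: frozenset, y: frozenset) -> frozenset:
    """The pair {x, y} of Axiom 3; it collapses to the singleton {x} when x == y."""
    return frozenset({x, y})


def big_union(x: frozenset) -> frozenset:
    """∪x from Axiom 5: every y such that y ∈ z for some z ∈ x."""
    return frozenset(y for z in x for y in z)


a = frozenset({EMPTY})       # {∅}
b = frozenset({EMPTY, a})    # {∅, {∅}}

# x ∪ y defined as ∪{x, y} agrees with Python's built-in union operator.
assert big_union(unordered_pair(a, b)) == a | b
```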
For any well-formed formula Q(x) with a free variable x,

(∀x ∈ A) Q(x)

means
(∀x)(x ∈ A ⟹ Q(x)).

The relation of A being a subset of B is defined by the formula

(∀x ∈ A) (x ∈ B)

and we write this as A ⊂ B or equivalently B ⊃ A.
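The bounded quantifier and the definition of A ⊂ B can be read off almost literally as code over a finite universe. The sketch below assumes a small hypothetical universe of integers (standing in for sets) and encodes the implication P ⟹ Q as `(not P) or Q`; the function names are illustrative.

```python
from typing import Callable, Iterable


def forall_in(universe: Iterable, A: set, Q: Callable[[object], bool]) -> bool:
    """(∀x ∈ A) Q(x), unpacked as (∀x)(x ∈ A ⟹ Q(x)) over a finite universe.

    The implication P ⟹ Q is encoded as (not P) or Q.
    """
    return all((x not in A) or Q(x) for x in universe)


def is_subset(universe: Iterable, A: set, B: set) -> bool:
    """A ⊂ B means (∀x ∈ A)(x ∈ B)."""
    return forall_in(universe, A, lambda x: x in B)


U = range(10)        # hypothetical finite universe of discourse
A = {2, 4, 6}
B = {0, 2, 4, 6, 8}

# The unpacked form agrees with simply checking Q on the members of A.
assert forall_in(U, A, lambda x: x % 2 == 0) == all(x % 2 == 0 for x in A)
assert is_subset(U, A, B) and not is_subset(U, B, A)
```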


Axiom 6: (power set axiom) For any set x, there is a set $2^x$ such that y ∈ $2^x$ if and
only if y ⊂ x.
The symbol $2^x$ is purely notation for the set of all subsets of x; no exponentiation
is meant, since x is a set. It will always be clear from the context which meaning is
indicated.
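For a finite set x, the power set can be enumerated explicitly. In this sketch, frozensets model hereditarily finite sets (an assumption) and `power_set` is a hypothetical helper. For finite x the power set has 2**len(x) members, which is one way to remember the notation, although, as noted above, the symbol itself denotes a set and not a number.

```python
from itertools import combinations

EMPTY = frozenset()


def power_set(x: frozenset) -> frozenset:
    """2^x from Axiom 6: the set of all subsets y ⊂ x (finite x only)."""
    elems = list(x)
    return frozenset(
        frozenset(c)
        for r in range(len(elems) + 1)
        for c in combinations(elems, r)
    )


one = frozenset({EMPTY})        # {∅}
two = frozenset({EMPTY, one})   # {∅, {∅}}
P = power_set(two)
# Here the subsets are ∅, {∅}, {{∅}} and {∅, {∅}}.
assert len(P) == 2 ** len(two)
assert EMPTY in P and two in P
```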
For any two classes A and B, we define the intersection as

A ∩ B = {x : x ∈ A and x ∈ B}.

If A is a set, then A ∩ B is a set for any class B by Axiom 6 and the definition of a
set. If x is a non-empty set, its intersection is defined by

∩x = {y : (∀u ∈ x)(y ∈ u)}.

This is a set: it is contained in any member u of x, so ∩x = u ∩ (∩x) is a set by the
previous remark. We can also write ∩x as $\bigcap_{y \in x} y$.
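The definition of ∩x translates into a filter over the elements of any single member u of x, which mirrors the argument that ∩x is a set. This is a sketch only: integers stand in for sets to keep the example readable, and `big_intersection` is a hypothetical name.

```python
def big_intersection(x: frozenset) -> frozenset:
    """∩x = {y : (∀u ∈ x)(y ∈ u)} for a non-empty set x."""
    if not x:
        raise ValueError("∩x is only defined here for non-empty x")
    # Every y in ∩x already lies in any fixed member u of x, so filtering the
    # elements of one member suffices; this mirrors why ∩x is a set.
    u = next(iter(x))
    return frozenset(y for y in u if all(y in v for v in x))


x = frozenset({frozenset({1, 2, 3}), frozenset({2, 3, 4}), frozenset({0, 2, 3})})
assert big_intersection(x) == frozenset({2, 3})
```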

We can use set theory to define an ordered pair (x, y) as the set of the form

{{x}, {x, y}}.

Indeed, for any ordered pair Q, its first member is the unique x such that {x} ∈ Q
and its second member is the unique y such that Q = {{x}, {x, y}} where x is the
first member.
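The encoding {{x}, {x, y}} can be tested mechanically. The sketch below (frozensets as sets, hypothetical helper names `kpair`, `first`, `second`) recovers the two members exactly as described above and runs a brute-force check over a few small values; such a check is evidence for Exercise 1 below, not a proof of it.

```python
from itertools import product


def kpair(x, y) -> frozenset:
    """Ordered pair (x, y) := {{x}, {x, y}}."""
    return frozenset({frozenset({x}), frozenset({x, y})})


def first(q: frozenset):
    """The unique x with {x} ∈ q (q has exactly one singleton member)."""
    (singleton,) = [m for m in q if len(m) == 1]
    (x,) = singleton
    return x


def second(q: frozenset):
    """The unique y with q == {{x}, {x, y}}, where x is the first member of q."""
    x = first(q)
    (y,) = {y for m in q for y in m if kpair(x, y) == q}
    return y


# Brute-force evidence on small values (not a proof):
# the pairs agree as sets exactly when a == c and b == d.
for a, b, c, d in product(range(3), repeat=4):
    assert (kpair(a, b) == kpair(c, d)) == (a == c and b == d)
```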
A relation is any set of ordered pairs. For any relation E, we define the inverse
relation by
$$E^{-1} := \{(y, x) : (x, y) \in E\}.$$

From the definition of an ordered pair, we are led to the definition of a function
as a set f of ordered pairs such that (x, y) ∈ f and (x, z) ∈ f imply y = z. We
write f (x) = y to mean (x, y) ∈ f for any given function f. Thus, a function is a
special kind of relation. In other words, all functions are relations but not all relations
are functions.
If f is a function, $f^{-1}$ is not necessarily a function. A function f is called
one-to-one if and only if $f^{-1}$ is a function.
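Relations, inverse relations and the one-to-one criterion can be checked directly on finite relations. In this sketch, Python tuples stand in for the set-theoretic ordered pairs (an assumption made for readability), and the helper names are illustrative.

```python
def inverse(E: set) -> set:
    """E^{-1} := {(y, x) : (x, y) ∈ E}."""
    return {(y, x) for (x, y) in E}


def is_function(f: set) -> bool:
    """(x, y) ∈ f and (x, z) ∈ f together imply y == z."""
    value_of = {}
    for x, y in f:
        if x in value_of and value_of[x] != y:
            return False
        value_of[x] = y
    return True


def is_one_to_one(f: set) -> bool:
    """A function f is one-to-one exactly when f^{-1} is also a function."""
    return is_function(f) and is_function(inverse(f))


square = {(x, x * x) for x in range(-2, 3)}
assert is_function(square)
# (-1, 1) and (1, 1) put both (1, -1) and (1, 1) into the inverse.
assert not is_one_to_one(square)
assert is_one_to_one({(0, 1), (1, 2), (2, 0)})
```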
This now allows us to construct the Cartesian product of two sets X and Y ,
denoted X × Y , which consists of ordered pairs (x, y) with x ∈ X and y ∈ Y .
With the axioms listed so far, we cannot prove the existence of an infinite set. The
next axiom allows us to do this along with our earlier axioms.
Axiom 7: (axiom of infinity) There is a set N ≠ ∅ such that for all x ∈ N, x ∪ {x} ∈ N.
This axiom allows us to construct the natural numbers using the empty set because
we can now define inductively the sequence of sets

∅, {∅}, {∅, {∅}}, . . .

and define the non-negative integers using the “zero symbol” 0 to designate the empty
set and define
1 := {0}, 2 := {0, 1}, 3 := {0, 1, 2}, . . .
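The inductive construction of 0, 1, 2, 3, ... from the empty set can be reproduced with frozensets. This is a minimal sketch under that modelling assumption; `successor` and `von_neumann` are hypothetical names. It confirms, for small k, that k = {0, 1, ..., k − 1} has exactly k members.

```python
EMPTY = frozenset()  # 0 := ∅


def successor(n: frozenset) -> frozenset:
    """n ∪ {n}, the operation under which the set N of Axiom 7 is closed."""
    return n | frozenset({n})


def von_neumann(k: int) -> frozenset:
    """Build the set-theoretic natural number k by applying successor k times to ∅."""
    n = EMPTY
    for _ in range(k):
        n = successor(n)
    return n


zero, one, two, three = (von_neumann(k) for k in range(4))
assert one == frozenset({zero})                # 1 = {0}
assert three == frozenset({zero, one, two})    # 3 = {0, 1, 2}
assert all(len(von_neumann(k)) == k for k in range(10))  # k has exactly k members
```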

We could have also stated Axiom 7 in logical notation as: ∃N such that ∅ ∈ N
and ∀x ∈ N, x ∪ {x} ∈ N.

To prevent x ∪ {x} from being equal to x, and in general to keep order among sets
and prevent unwanted closed cycles for the membership relation, we need a further
axiom:
Axiom 8: (axiom of regularity) For any set A ≠ ∅, (∃X ∈ A) X ∩ A = ∅.
This allows us to prove:

Theorem 1.1 For all x and y, x ∉ x, and if x ∈ y, then y ∉ x.

Proof Applying Axiom 8 to the non-empty set {x}, whose only member is x, we get
x ∩ {x} = ∅. If x ∈ x, then clearly x ∈ x ∩ {x}, which implies x ∈ ∅, a contradiction.
This proves the first half of our assertion. For the second half, suppose that x ∈ y
and y ∈ x. By Axiom 8, the set {x, y} has a member disjoint from it. This member
cannot be x, since y ∈ x ∩ {x, y}, and it cannot be y, since x ∈ y ∩ {x, y}. This
contradiction shows that y ∉ x. □
Exercises

1. Recall that an ordered pair (a, b) can be defined as the set {{a}, {a, b}}. Show that
(a, b) = (c, d) if and only if a = c and b = d.

2. Define the ordered triple (a, b, c) to be the ordered pair ((a, b), c) where the
ordered pair is defined as usual. Show that

(a, b, c) = (a′, b′, c′)

if and only if a = a′, b = b′ and c = c′.

3. If x ∈ y ∈ z ∈ w, prove that w ∉ x.

4. Which of the following are functions and why?


(a) {(1, 2), (2, 3), (3, 1)};
(b) {(1, 2), (2, 3), (2, 1)};
(c) {(2, 1), (3, 1), (1, 2)};
(d) $\{(x, y) \in \mathbb{R}^2 : x = y^2\}$;
(e) $\{(x, y) \in \mathbb{R}^2 : y = x^2\}$.
5. For any relation V (that is, any set of ordered pairs), define the domain of V as

{x : (x, y) ∈ V for some y}

and the range of V as


{y : (x, y) ∈ V for some x}.

Find the domain and range of each relation in the previous question, whether or
not it is a function.

1.2 Constructing Numbers from Sets

Mathematicians and philosophers of the nineteenth century pondered deeply the
nature of number. The question “what is a number?” is not a simple one. But
since mathematicians decided to give foundations of mathematics using the axiomatic
method and sets as the basic building blocks, we are led to define numbers using
sets. We follow Richard Dedekind (1831–1916) and Giuseppe Peano (1858–1932)
in the following construction. It was only in 1888 and 1889 that this construction
was described, in two papers written independently by Dedekind and Peano.
We construct a sequence of sets to represent the natural numbers. As noted earlier,
zero is represented by the empty set. We have already described the construction of
the natural numbers using the empty set. For each natural number n, the successor
of n is denoted n + 1 (and sometimes n′) and defined as

n ∪ {n}.

Thus, each natural number n is a set with n elements, namely

{0, 1, 2, . . . , n − 1}.

We designate the set of natural numbers by the symbol ℕ. (It is a matter of personal
convenience whether to include zero as a natural number or not. In this discussion,
zero is a natural number. In other settings, it may not be. There is no universal
convention regarding this, and the student is expected to infer it from
the context. Some authors use the term “whole numbers” to indicate that zero is
included in the discussion.)
The arithmetic operations on ℕ are now defined recursively. Addition is defined
as a function from ℕ × ℕ to ℕ:

+(m, n) := m + n

where m + n is defined recursively by 0 + n = n and m′ + n = (m + n)′. A similar
definition is given for multiplication × by defining 0 × n := 0 and

m′ × n := (m × n) + n.

We also write m × n simply as mn, which is the familiar notation.
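The recursive definitions of + and × can be run directly on the set-theoretic naturals. The sketch assumes frozensets for sets and introduces a hypothetical `pred` helper: because each natural number is a transitive set (every member is also a subset), ∪(m ∪ {m}) = m, so taking the union undoes the successor. It then checks the recursion against ordinary integer arithmetic for small inputs.

```python
EMPTY = frozenset()  # the natural number 0


def succ(n: frozenset) -> frozenset:
    """Successor n' = n ∪ {n}."""
    return n | frozenset({n})


def pred(n: frozenset) -> frozenset:
    """For n = m' = m ∪ {m} this recovers m, since ∪(m ∪ {m}) = m for naturals."""
    return frozenset(y for z in n for y in z)


def add(m: frozenset, n: frozenset) -> frozenset:
    """0 + n = n and m' + n = (m + n)', recursing on the first argument."""
    if m == EMPTY:
        return n
    return succ(add(pred(m), n))


def mul(m: frozenset, n: frozenset) -> frozenset:
    """0 × n = 0 and m' × n = (m × n) + n."""
    if m == EMPTY:
        return EMPTY
    return add(mul(pred(m), n), n)


def nat(k: int) -> frozenset:
    """Hypothetical helper building the set-theoretic natural number k."""
    n = EMPTY
    for _ in range(k):
        n = succ(n)
    return n


for a in range(5):
    for b in range(5):
        assert add(nat(a), nat(b)) == nat(a + b)
        assert mul(nat(a), nat(b)) == nat(a * b)
```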


An equivalence relation on a set S is a subset R of S × S satisfying
1. (reflexive axiom) (a, a) ∈ R ∀a ∈ S,
2. (symmetry axiom) (a, b) ∈ R ⇐⇒ (b, a) ∈ R,
3. (transitive axiom) (a, b) ∈ R, and (b, c) ∈ R implies (a, c) ∈ R.
The notion of an equivalence relation is an abstraction of our concept of equality,
or at least what we implicitly expect of the notion of equality. It is more suggestive
to write the equivalence relation, not as a subset of S × S as indicated above, but
rather more symbolically as ∼ so that our axioms become
1. (reflexive) a ∼ a ∀a ∈ S,
2. (symmetry) a ∼ b if and only if b ∼ a,
3. (transitive) a ∼ b and b ∼ c implies a ∼ c.
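For a finite set S, the three axioms can be checked exhaustively. The following is a sketch with hypothetical names; congruence modulo 3 serves as an example of an equivalence relation, and ≤ as a relation that fails the symmetry axiom.

```python
def is_equivalence(S, R) -> bool:
    """Check the reflexive, symmetry and transitive axioms for R ⊂ S × S (S finite)."""
    reflexive = all((a, a) in R for a in S)
    symmetric = all((b, a) in R for (a, b) in R)
    transitive = all((a, d) in R for (a, b) in R for (c, d) in R if b == c)
    return reflexive and symmetric and transitive


S = range(6)
same_remainder_mod_3 = {(a, b) for a in S for b in S if (a - b) % 3 == 0}
less_or_equal = {(a, b) for a in S for b in S if a <= b}
assert is_equivalence(S, same_remainder_mod_3)
assert not is_equivalence(S, less_or_equal)   # reflexive and transitive, not symmetric
```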
Equivalence relations play a fundamental role in all of mathematics. They allow
us to understand a set by grouping its elements according to certain properties.
To construct negative integers, we define an equivalence relation on ℕ × ℕ. We
write
(m, n) ∼ (j, k) ⟺ m + k = j + n.
Intuitively, we think of (m, n) as m − n; the condition m + k = j + n is simply
m − n = j − k rewritten so that only addition, which has already been defined, is
used. This is very similar to
how the ancients worked with negative numbers that appeared in an equation. They
usually moved them to the other side so that the equation became an equation of
non-negative numbers. However, with our set-theoretic definition, we have reached
a more fundamental and higher level of abstraction. Thus, with the equivalence
relation above on ℕ × ℕ, we define the set of integers as the set of
equivalence classes of such ordered pairs. It is now easy to see that the following
lemma holds:
Lemma 1.1 If (j, k) is an ordered pair of non-negative integers, then exactly one
of the following statements holds:
(a) (j, k) is equivalent to (m, 0) for a unique positive integer m;
(b) (j, k) is equivalent to (0, m) for a unique positive integer m;
(c) (j, k) is equivalent to (0, 0).
Sometimes we denote by [(j, k)] the equivalence class of (j, k). With this lemma
in place, we now denote by m the set of pairs of non-negative integers equivalent
to (m, 0), by −m the set of pairs equivalent to (0, m), and by 0 the set of pairs
equivalent to (0, 0). We denote the set of these equivalence classes by Z. This gives us a
set-theoretic construction of the set of integers.
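Lemma 1.1 amounts to choosing a canonical representative in each class. In the following Python sketch (an illustration, not part of the construction), a pair of natural numbers is reduced to (m, 0), (0, m) with m > 0, or (0, 0); Python's subtraction is used only when the result is a natural number, as a shortcut for the corresponding statement about natural numbers.

```python
def normalize(pair):
    """Canonical representative of [(j, k)] from Lemma 1.1:
    (m, 0) with m > 0, (0, m) with m > 0, or (0, 0)."""
    j, k = pair
    assert j >= 0 and k >= 0, "entries must be natural numbers"
    if j > k:
        return (j - k, 0)    # case (a): the class denoted m, with m = j - k
    if k > j:
        return (0, k - j)    # case (b): the class denoted -m, with m = k - j
    return (0, 0)            # case (c): the class denoted 0

def equivalent(p, q):
    """(m, n) ~ (j, k) iff m + k = j + n."""
    (m, n), (j, k) = p, q
    return m + k == j + n
```

For instance, `normalize((7, 3))` gives `(4, 0)`, the integer 4, and `normalize((2, 9))` gives `(0, 7)`, the integer −7.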
We can define the operations of addition and multiplication by setting

$$[(j_1, k_1)] + [(j_2, k_2)] = [(j_1 + j_2,\ k_1 + k_2)],$$

$$[(j_1, k_1)] \times [(j_2, k_2)] = [(j_1 j_2 + k_1 k_2,\ j_1 k_2 + j_2 k_1)].$$

This latter definition is best understood if we recall that the symbol (j, k) represents
j − k, so that the left-hand side of the above equation is

$$(j_1 - k_1)(j_2 - k_2) = j_1 j_2 + k_1 k_2 - (j_1 k_2 + j_2 k_1).$$

One needs to check that these definitions are “well-defined” in the sense that they are
independent of the representatives chosen for the equivalence class. We leave that to
the student as an exercise (see exercises below).
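Well-definedness cannot be proved by computation, but it can be spot-checked. The Python sketch below (the function name `well_defined_on` and its parameter `bound` are our own, hypothetical choices) tests, for all pairs with entries below `bound`, that equivalent inputs give equivalent outputs; a finite check of this kind only illustrates the exercise and does not replace the proof.

```python
from itertools import product

def equivalent(p, q):
    # (m, n) ~ (j, k) iff m + k = j + n
    return p[0] + q[1] == q[0] + p[1]

def add(p, q):
    # [(j1, k1)] + [(j2, k2)] = [(j1 + j2, k1 + k2)]
    return (p[0] + q[0], p[1] + q[1])

def mul(p, q):
    # [(j1, k1)] x [(j2, k2)] = [(j1 j2 + k1 k2, j1 k2 + j2 k1)]
    (j1, k1), (j2, k2) = p, q
    return (j1 * j2 + k1 * k2, j1 * k2 + j2 * k1)

def well_defined_on(op, bound):
    """Check that op(p, q) ~ op(p2, q2) whenever p ~ p2 and q ~ q2, for all
    pairs with entries in range(bound). A finite check, not a proof."""
    pairs = list(product(range(bound), repeat=2))
    for p, p2, q, q2 in product(pairs, repeat=4):
        if equivalent(p, p2) and equivalent(q, q2):
            if not equivalent(op(p, q), op(p2, q2)):
                return False
    return True
```

Both `add` and `mul` pass the check, and `mul((3, 5), (0, 4))` returns `(20, 12)`, a representative of (−2)(−4) = 8. By contrast, a rule such as `(p[0] * q[0], 0)` fails, since it depends on the representatives chosen.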
In this way, we have now extended the notion of addition and multiplication from
the set of natural numbers to the set of integers. Subtraction of integers can be defined
by
$$[(j_1, k_1)] - [(j_2, k_2)] = [(j_1, k_1)] + (-1)[(j_2, k_2)],$$

where −1 represents the equivalence class [(0, 1)]. All of these definitions correspond
to our usual notions of addition, subtraction and multiplication. Their virtue lies in
their purely set-theoretic formulation.
We can also order the set of integers in the usual way. Thus,

$$j_1 + k_2 < k_1 + j_2 \iff [(j_1, k_1)] < [(j_2, k_2)]$$

and

$$j_1 + k_2 \le k_1 + j_2 \iff [(j_1, k_1)] \le [(j_2, k_2)].$$

This corresponds to our usual notion of “less than” and “less than or equal to”.
Finally, we can define the absolute value on the set of integers by setting

$$|k| = \begin{cases} k & \text{if } 0 < k,\\ 0 & \text{if } k = 0,\\ -k & \text{if } k < 0. \end{cases}$$

We can now construct the rational numbers Q from the set of integers. We do
this by defining an equivalence relation on the set Z × Z>0 by stating that two pairs
$(j_1, k_1)$ and $(j_2, k_2)$ are equivalent if and only if $j_1 k_2 = j_2 k_1$. Intuitively, we think of
$(j_1, k_1)$ as representing the “fraction” $j_1/k_1$, and the condition $j_1 k_2 = j_2 k_1$ expresses what we
would mean by $j_1/k_1 = j_2/k_2$ using only notions already defined. The set of rational numbers
Q is then defined as the set of such equivalence classes.
The expected operations of addition and multiplication are now evident:

$$[(j_1, k_1)] + [(j_2, k_2)] = [(j_1 k_2 + j_2 k_1,\ k_1 k_2)],$$

$$[(j_1, k_1)]\,[(j_2, k_2)] = [(j_1 j_2,\ k_1 k_2)].$$

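Again, these definitions are easily verified to be well-defined. Finally, we can now
define “division”. If $[(j_1, k_1)], [(j_2, k_2)] \in \mathbb{Q}$ with $j_2 \neq 0$, we define

$$\frac{[(j_1, k_1)]}{[(j_2, k_2)]} := [(j_1 k_2,\ j_2 k_1)].$$

(If $j_2 < 0$, the second entry $j_2 k_1$ is negative; one then takes instead the pair $(-j_1 k_2, -j_2 k_1)$, which lies in $\mathbb{Z} \times \mathbb{Z}_{>0}$ and represents the same fraction.)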

These operations satisfy the familiar laws of associativity, commutativity and dis-
tributivity. Subtraction of rational numbers then can be written as:

$$[(j_1, k_1)] - [(j_2, k_2)] = [(j_1, k_1)] + [(-1, 1)]\,[(j_2, k_2)].$$

The ordering of rational numbers can also be written as

$$[(j_1, k_1)] < [(j_2, k_2)] \iff j_1 k_2 < j_2 k_1,$$

$$[(j_1, k_1)] \le [(j_2, k_2)] \iff j_1 k_2 \le j_2 k_1.$$

These definitions agree with our usual notions of ordering of the rational numbers.
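The operations on pairs (j, k) with k > 0 can be written out directly in Python. The sketch below is illustrative: `normalize` reduces to lowest terms using `math.gcd` (a convenience that goes beyond the construction), and `div` flips both signs when needed so that the second entry stays positive, as membership in Z × Z>0 requires.

```python
from math import gcd

def normalize(p):
    """Representative of [(j, k)] in lowest terms with k > 0."""
    j, k = p
    assert k > 0
    g = gcd(j, k)             # gcd is non-negative; gcd(0, k) == k, so 0 becomes (0, 1)
    return (j // g, k // g)

def equivalent(p, q):
    # (j1, k1) ~ (j2, k2) iff j1 k2 = j2 k1
    return p[0] * q[1] == q[0] * p[1]

def add(p, q):
    # [(j1, k1)] + [(j2, k2)] = [(j1 k2 + j2 k1, k1 k2)]
    return (p[0] * q[1] + q[0] * p[1], p[1] * q[1])

def mul(p, q):
    # [(j1, k1)][(j2, k2)] = [(j1 j2, k1 k2)]
    return (p[0] * q[0], p[1] * q[1])

def div(p, q):
    # [(j1, k1)] / [(j2, k2)] = [(j1 k2, j2 k1)], requires j2 != 0
    j2, k2 = q
    assert j2 != 0
    j, k = p[0] * k2, j2 * p[1]
    return (-j, -k) if k < 0 else (j, k)   # keep the second entry positive

def less(p, q):
    # valid because both second entries are positive
    return p[0] * q[1] < q[0] * p[1]
```

With p = (1, 2) and q = (−2, 3), representing 1/2 and −2/3, `normalize(add(p, q))` is `(-1, 6)`, `div(p, q)` is `(-3, 4)`, and `less(q, p)` is `True`.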
Finally, the definition of absolute value can be extended as:

$$|[(j, k)]| = \begin{cases} [(j, k)] & \text{if } [(0, 1)] < [(j, k)],\\ [(0, 1)] & \text{if } [(j, k)] = [(0, 1)],\\ -[(j, k)] & \text{if } [(j, k)] < [(0, 1)]. \end{cases}$$

Again, our familiar properties of the absolute value of rational numbers hold. With
this foundational construction in place, we can conveniently represent the equivalence
class of ( j, k) as simply the fraction j/k and continue to work with these numbers
as we were (hopefully) taught from childhood.
In the next sections, we construct the real numbers from this axiomatic framework.

Exercises

1. Let $[(j_1, k_1)], [(j_2, k_2)]$ be two elements of Z. Show that the addition

$$[(j_1, k_1)] + [(j_2, k_2)] = [(j_1 + j_2,\ k_1 + k_2)]$$

is well-defined. That is, prove that for any $(j_1', k_1') \in [(j_1, k_1)]$ and $(j_2', k_2') \in [(j_2, k_2)]$, the pair $(j_1' + j_2', k_1' + k_2')$ is equivalent to $(j_1 + j_2, k_1 + k_2)$.

2. For j1 , j2 , k ∈ Z, prove the distributive law:

$$(j_1 + j_2) \cdot k = j_1 k + j_2 k.$$

3. Show that the relations < and ≤ on Z have the following properties:
(a) [(0, j)] < [(0, 0)] for all j ∈ Z>0;
(b) [(0, j)] < [(k, 0)] for all j, k ∈ Z>0;
(c) for j, k ∈ Z>0, [(0, j)] < [(0, k)] if and only if k < j;
(d) [(0, 0)] < [(j, 0)] for all j ∈ Z>0;
(e) for j, k ∈ Z≥0, [(j, 0)] < [(k, 0)] if and only if j < k;
(f) [(0, j)] ≤ [(0, 0)] for all j ∈ Z≥0;
(g) [(0, j)] ≤ [(k, 0)] for all j, k ∈ Z≥0;
(h) for j, k ∈ Z≥0, [(0, j)] ≤ [(0, k)] if and only if k ≤ j;
(i) [(0, 0)] ≤ [(j, 0)] for all j ∈ Z≥0;
(j) for j, k ∈ Z≥0, [(j, 0)] ≤ [(k, 0)] if and only if j ≤ k.

1.3 Set-Theoretic Construction of the Real Numbers

The rational number system is insufficient to “measure” all the lengths that arise in the
“real world.” This was discovered by the ancient school of Pythagoras which viewed
the world through a strange mix of mathematics and mysticism. The aphorism “all
is number” seems to have been the underlying mantra of the Pythagoreans. In our
modern digital world, this mantra imposes its universality more than in any earlier
age.
Pythagoras seems to have been a contemporary of the Buddha in India and of Confucius
and Lao-Tzu in China. He seems to have traveled widely in Egypt, Babylon
and India. The Pythagoreans believed in reincarnation, and this was wedded to their
strict adherence to vegetarianism, for by eating meat they might be eating a friend!
They are credited with coining both of the words philosophy and mathematics.
The word “philosophy” literally means “the love of wisdom” and the word “mathematics”
means “that which is learned.” Philosophy, then, was seen by the Pythagoreans
as a byproduct of mathematics.
The celebrated theorem of Pythagoras dealing with the length of the hypotenuse
of a right-angled triangle was known to earlier cultures stretching far back in time
to Babylon, China and India. It is possible that Pythagoras learned of this theorem
during his extensive peregrinations. But the Pythagoreans could see the mysticism
inherent in the theorem, for the rule applied to all right angled triangles and there
was an element of universality in it and in mathematics in general.
But when the Pythagoreans declared “all is number” they were referring to whole
numbers, and it is accurate to say that the concept of number had not been clearly
articulated at that time. With the underlying idea that whole numbers govern the
universe, their theory of ratio and proportion was in for a rude shock. Legend has
it that the Pythagorean who discovered that √2 is an irrational number (that is, a
number which is not rational) was drowned!
The irrationality of √2 represents a major episode in the history of mathematics,
and we owe this to the unfortunate Pythagorean who lost his life by discovering
it. The proof is by contradiction and invokes the rudimentary arithmetical idea of
a prime number, a concept first enunciated in Euclid’s Elements. A natural number
p > 1 is called a prime number if its only positive divisors are 1 and p. Thus, the sequence of prime
numbers begins 2, 3, 5, 7, 11, . . . (The number 1 is not included in the list of
prime numbers. It is called a unit, a concept which generalizes to rings.) Two natural
numbers a and b are said to be coprime if they have no common prime factor. When
we write a fraction a/b in “lowest terms”, we are writing it so that a and b are
coprime. We sometimes write (a, b) = 1 to indicate that a and b are coprime. This
is not to be confused with the concept of an ordered pair introduced earlier.
The irrationality of √2 is established by the method of contradiction. It
is instructive to see all the details. If √2 is a rational number, then we can write

$$\sqrt{2} = \frac{a}{b}, \qquad (a, b) = 1.$$

Upon squaring both sides, we see

$$2b^2 = a^2,$$

so $a^2$ is even, since the left-hand side of the equation is patently an even number. Hence a is even, because the square of an odd number is odd. Writing a = 2c, we now get

$$b^2 = 2c^2,$$

and so, by the same reasoning, b is even. But then 2 is a common prime factor of a and b, which contradicts their coprimality.


This was the theorem that led to the demise of the heroic Pythagorean, for it
violated the aphorism “all is number”, because the ancients viewed numbers as only
whole numbers. They also allowed for rational numbers, since these can be viewed as
representing ratio and proportion of whole quantities. This innocuous theorem opens
the door to the discovery of real numbers, for it signals the lack of the “least upper
bound property” in the realm of the rational numbers.
The student should keep in mind that so far we have only constructed the universe
of rational numbers, and thus any further definition must be given only in terms of
rational numbers. A set A of rational numbers is said to be bounded if there is a
rational number M such that |x| ≤ M for all x ∈ A. An upper bound for A is any
rational number u such that x ≤ u for all x ∈ A. A lower bound for A is a rational
number ℓ such that ℓ ≤ x for all x ∈ A. A rational number v is called a least upper
bound for A if v is an upper bound for A and v ≤ u for every upper bound u of A. A rational number m is called
a greatest lower bound for A if m is a lower bound for A and ℓ ≤ m for every lower bound ℓ of A. Thus, a set A
is bounded if and only if it has both an upper bound and a lower bound.
Several questions now arise. Given a set A of rational numbers, does it always
have a least upper bound? Does it have a greatest lower bound? If so, are they unique?
The theorem discovered by the Pythagoreans can be recast in these terms as follows. Nascent
in this example are many of the fundamental properties we would expect of the real
numbers. Essential to the understanding is the familiar fact that the sequence of
rational numbers 1/n gets smaller and smaller and tends to zero.
If we consider the set

$$A = \{x \in \mathbb{Q} : x^2 \le 2\},$$

then A has no least upper bound in Q. Indeed, suppose u is a least upper bound, so that x ≤ u for all
x ∈ A. By definition, u is a rational number and so cannot equal √2. If u < √2,
then we can choose a natural number N such that

$$\frac{1}{N} < \sqrt{2} - u,$$

so that

$$u + \frac{1}{N} < \sqrt{2}.$$

Since u ≥ 1 > 0 (because 1 ∈ A), the rational number u + 1/N satisfies $(u + 1/N)^2 < 2$, so it belongs to A and exceeds u, contradicting that u is an upper bound for the set A. If u > √2, then we can similarly
find N such that

$$u - \frac{1}{N} > \sqrt{2},$$

so that u − 1/N is a smaller upper bound for A, which again contradicts that u is the least upper bound for A. Thus, A does not have
a least upper bound in the realm of rational numbers.
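Both halves of this argument can be carried out with exact rational arithmetic. The Python sketch below uses the standard-library `fractions.Fraction` type (our choice of tool). Since √2 is not available in Q, it tests the squared conditions $(u \pm 1/N)^2 \gtrless 2$ directly instead of the inequalities involving √2, and it searches N = 1, 2, 3, . . . until a suitable N appears; the argument above guarantees that the search terminates.

```python
from fractions import Fraction

def larger_element_of_A(u: Fraction) -> Fraction:
    """For rational u >= 1 with u^2 < 2, return u + 1/N with (u + 1/N)^2 < 2:
    an element of A = {x in Q : x^2 <= 2} that exceeds u."""
    assert u >= 1 and u * u < 2
    N = 1
    while (u + Fraction(1, N)) ** 2 >= 2:
        N += 1                      # terminates since (u + 1/N)^2 -> u^2 < 2
    return u + Fraction(1, N)

def smaller_upper_bound(u: Fraction) -> Fraction:
    """For rational u > 0 with u^2 > 2, return u - 1/N with u - 1/N > 0 and
    (u - 1/N)^2 > 2: still an upper bound for A, but smaller than u."""
    assert u > 0 and u * u > 2
    N = 1
    while u - Fraction(1, N) <= 0 or (u - Fraction(1, N)) ** 2 <= 2:
        N += 1                      # terminates since (u - 1/N)^2 -> u^2 > 2
    return u - Fraction(1, N)
```

For u = 3/2 the search stops at N = 12 and returns 17/12, whose square 289/144 still exceeds 2.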
Two ways of constructing the real numbers were proposed independently by
Richard Dedekind (1831–1916) and Augustin Cauchy (1789–1857). Each method
has its virtues. The method of Dedekind, using what are called Dedekind cuts, is
closer to the axiomatic foundations we have been discussing and is suggested by the
example involving √2 above. The method of Cauchy, using what are now called
Cauchy sequences, has wider applicability. In our example above, an essential role
is played by the ordering of the rational numbers. The method of Dedekind cuts
uses this idea in a fundamental way. It coincides with our intuitive and visual idea of
the real numbers as being “points on the number line stretching from minus infinity
to plus infinity.”
A Dedekind cut in Q is a partition of Q into two sets A and B with the following
properties:
1. A and B are both non-empty;
2. A ∪ B = Q;
3. A is “closed downwards”, that is, if r ∈ A and q < r , then q ∈ A;
4. B is “closed upwards”, that is, if q ∈ B and q < r , then r ∈ B;
5. A contains no greatest element, that is, there is no q ∈ A such that r ≤ q for all
r ∈ A.
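The set R of real numbers is then defined as the set of all Dedekind cuts.
For example, the irrational number √2 is defined by the Dedekind cut

$$\mathbb{Q} = A \cup B, \qquad A = \{x \in \mathbb{Q} : x < 0 \text{ or } x^2 < 2\}, \qquad B = \mathbb{Q} \setminus A.$$

(The condition x < 0 is needed for A to be closed downwards; the set $\{x \in \mathbb{Q} : x^2 < 2\}$ alone would exclude, for example, −2.)

Rational numbers are also defined by Dedekind cuts. Indeed, the rational number q
is defined by the cut

$$A = \{x \in \mathbb{Q} : x < q\}, \qquad B = \mathbb{Q} \setminus A.$$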

For example, the number zero is represented by A ∪ B with A being the set of all
negative rational numbers and B being the set of all non-negative rational
numbers. As the Dedekind cut A ∪ B is uniquely determined by A, with B being the
complement of A in the set of rationals, we may as well associate A to each real
number, or even think of A as the real number. One would then have to define how
to work with this definition from the perspective of addition, multiplication
and so on. This is not difficult to do. For instance, we have

$$A_1 < A_2 \iff A_1 \subsetneq A_2, \qquad A_1 \le A_2 \iff A_1 \subseteq A_2.$$
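Because a cut is determined by its lower set A, a convenient way to model a real number on a computer is as a membership test on rationals. The Python sketch below is illustrative (the function names are hypothetical). For √2 it uses the lower set {x ∈ Q : x < 0 or x² < 2}, whose negative rationals are included so that the set is closed downwards, and it exhibits property 5 by mapping each x in A to a larger element of A. For x ≥ 1 it uses the map x ↦ (2x + 2)/(x + 2), a classical device that is our choice here: it satisfies y − x = (2 − x²)/(x + 2) > 0 and y² − 2 = 2(x² − 2)/(x + 2)² < 0.

```python
from fractions import Fraction

def in_sqrt2_cut(x: Fraction) -> bool:
    """Membership in the lower set {x in Q : x < 0 or x^2 < 2} of the cut for sqrt(2)."""
    return x < 0 or x * x < 2

def in_rational_cut(q: Fraction):
    """Membership test for the lower set {x in Q : x < q} of the cut for q."""
    return lambda x: x < q

def bigger_in_sqrt2_cut(x: Fraction) -> Fraction:
    """Given x in the cut for sqrt(2), return y in the cut with y > x,
    witnessing that the lower set has no greatest element."""
    assert in_sqrt2_cut(x)
    if x < 1:
        return Fraction(1)          # 1 belongs to the cut since 1^2 < 2
    # for 1 <= x with x^2 < 2: y > x and y^2 < 2
    return (2 * x + 2) / (x + 2)
```

Since cuts are infinite sets, a relation such as A1 ⊆ A2 can only be spot-checked on finitely many rationals; for example, every sampled rational below 7/5 also lies in the cut for √2.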

Addition is simple enough:

A1 + A2 = {a1 + a2 : a1 ∈ A1 , a2 ∈ A2 }.

Multiplication is defined (rather cumbersomely) by setting

$$A_1 \cdot A_2 = \begin{cases} \{a_1 a_2 : a_1 \in A_1,\ a_2 \in A_2,\ a_1, a_2 \ge 0\} \cup \{q \in \mathbb{Q} : q < 0\} & \text{if } A_1, A_2 \ge 0,\\ -\big(A_1 \cdot (-A_2)\big) & \text{if } A_1 \ge 0,\ A_2 < 0,\\ -\big((-A_1) \cdot A_2\big) & \text{if } A_1 < 0,\ A_2 \ge 0,\\ (-A_1) \cdot (-A_2) & \text{if } A_1, A_2 < 0. \end{cases}$$

The absolute value function is then

$$|A| = \begin{cases} A & \text{if } A > 0,\\ 0 & \text{if } A = 0,\\ -A & \text{if } A < 0. \end{cases}$$

One can show that all the usual operations with numbers satisfy the expected prop-
erties of commutativity, associativity and distributivity.
By contrast, the method of Cauchy sequences begins with a definition of convergence.
In some ways, it is motivated by our discussion of √2 earlier. If A is the
Dedekind cut representing √2, then the least upper bound of A “is” √2. Thus, the
number √2 can be thought of as a limit of a sequence of rational numbers. But
there can be many such sequences, and so one needs a more formal approach. The
underlying observation is that a limit of a sequence of rational numbers need not be
rational.
This is perhaps best illustrated by Euler’s series for e. Students of calculus are
familiar with

$$e = \sum_{j=0}^{\infty} \frac{1}{j!},$$

and the right-hand side can be thought of as a limit of a sequence of rational numbers

$$S_n = \sum_{j=0}^{n} \frac{1}{j!}.$$

The irrationality of e is easily proved by the method of contradiction. Suppose that
e = a/b for two coprime natural numbers a and b. Then

$$b!\,e = \sum_{j=0}^{b} \frac{b!}{j!} + \sum_{j=b+1}^{\infty} \frac{b!}{j!} \tag{1.1}$$

and the second term on the right-hand side is positive and strictly less than the geometric series

$$\sum_{k=1}^{\infty} \frac{1}{(b+1)^k} = \frac{1}{b},$$

which is less than or equal to 1. (Indeed, for j = b + k with k ≥ 1, we have $b!/j! = 1/\big((b+1)(b+2)\cdots(b+k)\big) \le 1/(b+1)^k$, with strict inequality for k ≥ 2.) But the left-hand side of (1.1) equals $a \cdot (b-1)!$, a positive integer; the first term on the right-hand side of (1.1) is a positive integer, since j! divides b! for j ≤ b; and the second term lies strictly between 0 and 1, so it is not an integer. This is a contradiction.
What this example shows is that the “limit” of the sequence of partial sums Sn
(each of which is a rational number) is the irrational number e.
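The partial sums $S_n$ can be computed exactly as rational numbers, which makes the two ingredients of the argument visible. In the Python sketch below (using `fractions.Fraction` and `math.factorial`; the function names are hypothetical), `S(n)` is the exact partial sum, and `scaled_tail(b, M)` is the finite piece $b!\,(S_M - S_b)$ of the second sum in (1.1). Since no program can sum infinitely many terms, the cutoff M is an assumption of the sketch, and the bound $0 < b!\,(S_M - S_b) < 1/b$ is checked only for these truncations.

```python
from fractions import Fraction
from math import factorial

def S(n: int) -> Fraction:
    """Exact partial sum S_n = sum_{j=0}^{n} 1/j!, a rational number."""
    return sum(Fraction(1, factorial(j)) for j in range(n + 1))

def scaled_tail(b: int, M: int) -> Fraction:
    """b! * (S_M - S_b) = sum_{j=b+1}^{M} b!/j!, a truncation of the second sum in (1.1)."""
    assert M > b
    return factorial(b) * (S(M) - S(b))
```

Already `S(15)` agrees with e to about thirteen decimal places, while each `scaled_tail(b, M)` lies strictly between 0 and 1/b, so it is never an integer.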
The reader is familiar with the usual notion of convergence. We say a sequence
of real numbers $x_n$ converges to L if given ε > 0 there exists N such that

$$|x_n - L| < \varepsilon \quad \forall\, n \ge N.$$

One may want to view real numbers as “limits of sequences of rational numbers.” But
in our formal construction of numbers, several difficulties arise with this definition.
The first is that, as we have so far only constructed the rational numbers, we must
take all the $x_n$ to be rational numbers. Secondly, the limit L need not be a
rational number, as the example with e shows. This motivates the formal definition
of a Cauchy sequence of rational numbers.
A sequence $q_n$ in Q is called a Cauchy sequence if given ε > 0, there is an N
such that

$$|q_n - q_m| < \varepsilon \quad \forall\, m, n \ge N.$$

Now it is easy to see that any convergent sequence is a Cauchy sequence. For future
reference, we record this below.
Theorem 1.2 Any convergent sequence $q_n$ is a Cauchy sequence.
Proof Suppose that $q_n$ converges to L. Then, given ε > 0, choosing N such that

$$|q_n - L| < \frac{\varepsilon}{2} \quad \forall\, n \ge N,$$

we have, for all m, n ≥ N, by the triangle inequality,

$$|q_m - q_n| \le |q_m - L| + |L - q_n| < \varepsilon.$$

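Later, we will show that every Cauchy sequence of real numbers converges (Theorem 1.8). Thus, in R, a sequence is
Cauchy if and only if it is convergent, and the two notions are equivalent. (In Q this fails: the partial sums $S_n$ above form a Cauchy sequence of rationals with no rational limit.) But the
advantage of the notion of a Cauchy sequence is that the limit value L is not mentioned
in the definition.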
The construction of the real numbers is now brought about by defining an equivalence
relation on the set of all Cauchy sequences of rational numbers. Since we have
constructed rational numbers, we are now allowed to construct the set of all Cauchy
sequences of rational numbers. On this set, we say two Cauchy sequences $\{q_n\}$ and
$\{r_n\}$ are equivalent if for any ε > 0, there exists a natural number N such that

$$|q_n - r_n| < \varepsilon \quad \forall\, n \ge N.$$

This is easily seen to define an equivalence relation on the set of all Cauchy sequences
of rational numbers. The set of real numbers R is then defined as the set of all
equivalence classes. We will denote the equivalence class of the sequence {qn } by
[{qn }].
We leave it as an exercise (see Exercise 4 below) to verify that if $\{q_n\}$ and $\{r_n\}$ are
two Cauchy sequences, then so are $\{q_n + r_n\}$ and $\{q_n \cdot r_n\}$. Thus, the usual operations
of addition and multiplication are easily defined:

[{qn }] + [{rn }] := [{qn + rn }], [{qn }] · [{rn }] := [{qn · rn }].

One needs to verify that these are well-defined by checking that the definition does
not depend on the choice of the representative of the equivalence class.
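As a concrete illustration of these term-wise operations, one can represent √2 by a specific Cauchy sequence of rationals. The Python sketch below assumes Newton's recurrence $q_0 = 2$, $q_{k+1} = (q_k + 2/q_k)/2$, which is our choice of representative and is not taken from the text. Its term-wise square $\{q_n q_n\}$, a representative of $[\{q_n\}] \cdot [\{q_n\}]$, approaches 2 very quickly, so it is equivalent to the constant sequence {2}; in R this says that $[\{q_n\}]^2 = 2$.

```python
from fractions import Fraction

def sqrt2_seq(n: int) -> Fraction:
    """n-th term of q_0 = 2, q_{k+1} = (q_k + 2/q_k)/2: a Cauchy sequence of
    rationals whose equivalence class represents sqrt(2)."""
    q = Fraction(2)
    for _ in range(n):
        q = (q + 2 / q) / 2
    return q

def product_seq(n: int) -> Fraction:
    """n-th term of the term-wise product {q_n * q_n}."""
    q = sqrt2_seq(n)
    return q * q
```

The first terms are 2, 3/2, 17/12, 577/408, all rational, and by n = 5 the difference $q_n^2 - 2$ is already below $10^{-20}$.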
The order relation of the real numbers can also be defined. Thus, viewing real
numbers as equivalence classes of Cauchy sequences of rational numbers, we define
$[\{q_n\}] < [\{r_n\}]$ if there are a rational δ > 0 and an N such that $q_n + \delta < r_n$ for all n ≥ N. (A positive gap δ is needed: −1/n < 1/n for every n, yet {−1/n} and {1/n} are equivalent and both represent 0.) We also write $[\{q_n\}] \le [\{r_n\}]$ if either $[\{q_n\}] < [\{r_n\}]$ or $[\{q_n\}] = [\{r_n\}]$. Again, these order relations have
the expected properties.
Finally, the absolute value function can also be defined. We set

|[{qn }]| := [{|qn |}]

and the reader can easily verify that (indeed) the right hand side is a Cauchy sequence
if {qn } is Cauchy.
The absolute value has the usual properties, the most notable being the triangle
inequality:
|[{qn }] + [{rn }]| ≤ |[{qn }]| + |[{rn }]|.

Having given this formal definition of the real numbers using Cauchy sequences, it
is convenient to drop the sequence notation and just represent the real numbers as
single letters. Thus, if a, b ∈ R, the triangle inequality is the familiar

|a + b| ≤ |a| + |b|.

It is now convenient to visualize the set of real numbers in the usual way as points
of the line stretching from −∞ to +∞ and introduce the following notation. If a and
b are real numbers such that a ≤ b, then the interval [a, b] indicates the set of real
numbers {x : a ≤ x ≤ b}. The half-open interval (a, b] represents {x : a < x ≤ b}.
Similarly [a, b) means {x : a ≤ x < b} and the open interval (a, b) is {x : a < x <
b}.
By its very construction, the set of real numbers has the property of complete-
ness in that every Cauchy sequence converges. The reader will also observe that the
construction depended only on the property of the absolute value viewed as a metric
to measure the distance between two rational numbers. This signals a wider appli-
cability of the method of Cauchy sequences valid more generally in metric spaces
which we discuss later.
With the construction of the real numbers, we have restored the least upper bound
property and can now define that a sequence of numbers (rational or real) $x_n$ converges
to a number L ∈ R if given any ε > 0, there is an N such that

$$|x_n - L| < \varepsilon \quad \forall\, n \ge N.$$

A non-empty subset A of R is said to be bounded below if there is a number
s ∈ R such that s ≤ a for all a ∈ A. Any such number s is called a lower bound for
A. Analogously, A is said to be bounded above if there is a number t ∈ R such that
a ≤ t for all a ∈ A. Any such number t is called an upper bound for A. The set A is
said to be bounded if it is both bounded below and bounded above, and unbounded
if it is not bounded.
The construction of the real numbers using Dedekind cuts shows that every non-
empty subset of R that is bounded above has a least upper bound denoted sup A.
The least upper bound is also called the supremum. Every non-empty subset which
is bounded below has a greatest lower bound denoted inf A. The greatest lower
bound is also called the infimum. It is clear that if A and B are non-empty subsets
of R such that A ⊆ B, then sup A ≤ sup B (see Exercise 5). The following theorem
gives a convenient characterization of the supremum.
Theorem 1.3 Let A be a non-empty subset of R that is bounded above. Then s =
sup A if and only if
(i) a ≤ s for all a ∈ A and
(ii) for any ε > 0, A ∩ (s − ε, s] ≠ ∅.

Proof Let s = sup A. Then s is an upper bound for A, so that (i) holds. Now if (ii)
were false, there would be an ε > 0 such that A ∩ (s − ε, s] = ∅. But then s − ε
would also be an upper bound for A, a contradiction. Conversely, suppose that (i)
and (ii) hold. Then (i) implies s is an upper bound for A. If s were not the least
upper bound for A, there would be a t < s such that a ≤ t for all a ∈ A. Taking
ε = (s − t)/2 > 0 in (ii), we get A ∩ (s − ε, s] ≠ ∅. But s − ε = t + ε, so that
A ∩ (t + ε, s] ≠ ∅, which is a contradiction since a ≤ t for all a ∈ A.

Theorem 1.3 can also be stated as follows. If A is a non-empty subset of R that is
bounded above and s = sup A, then either s ∈ A or A ∩ (s − ε, s) ≠ ∅ for all ε > 0.
A similar theorem can be written for the infimum of a set (see Exercise 6).
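Condition (ii) of Theorem 1.3 can be made concrete on a simple example of our own: A = {1 − 1/n : n ≥ 1}, whose supremum s = 1 does not belong to A. For each ε > 0 the Python sketch below (a hypothetical helper using exact `Fraction` arithmetic) searches for an element of A in (s − ε, s]; the search stops at the first n with 1/n < ε.

```python
from fractions import Fraction

def element_near_sup(eps: Fraction) -> Fraction:
    """For A = {1 - 1/n : n >= 1} with s = sup A = 1, return some a in A with
    s - eps < a <= s, as condition (ii) of Theorem 1.3 guarantees."""
    assert eps > 0
    n = 1
    while not (1 - eps < 1 - Fraction(1, n)):
        n += 1                      # stops at the first n with 1/n < eps
    return 1 - Fraction(1, n)
```

For ε = 1/10 this returns 10/11. No call returns s itself, matching the alternative “s ∉ A, but A ∩ (s − ε, s) ≠ ∅ for all ε > 0”.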

Exercises

1. If p is a prime number, show that √p is irrational.

2. Show that

$$\sum_{n=2}^{\infty} \frac{1}{n!} < 1.$$

3. Show that if $e_n = \pm 1$ for every n, then the number

$$\sum_{n=1}^{\infty} \frac{e_n}{n!}$$

is an irrational number.

4. Show that if {qn } and {rn } are two Cauchy sequences, then so are {qn + rn } and
{qn · rn }.

5. If A and B are non-empty subsets of R such that A ⊆ B and B is bounded above, show that sup A ≤ sup B.

6. Let A be a non-empty subset of R that is bounded below. Show that s = inf A if and
only if (i) s ≤ a for all a ∈ A and (ii) for any ε > 0, A ∩ [s, s + ε) ≠ ∅.

1.4 Sequences of Real Numbers

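A sequence $x_n$ of real numbers is said to be bounded if there exists an M > 0 such
that $|x_n| \le M$ for all n. If a sequence of numbers is not convergent, it is said to be
divergent. If for every M there is an N such that $x_n > M$ for all n ≥ N, the sequence is said to diverge to ∞; if for every M there is an N such that $x_n < M$ for all n ≥ N, it is said to diverge to −∞.
Every convergent sequence of real numbers is bounded: if $x_n$ converges to L, taking ε = 1 gives an N with $|x_n| < |L| + 1$ for all n ≥ N, so $|x_n| \le \max\{|x_1|, \ldots, |x_{N-1}|, |L| + 1\}$ for all n. A sequence $x_n$ is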
said to be increasing if xm ≤ xn whenever m < n. It is said to be strictly increasing
if xm < xn whenever m < n. A similar definition applies for decreasing and strictly
decreasing sequences. A sequence is said to be monotonic if it is either increasing
or decreasing. We use the terms monotonic increasing and monotonic decreasing
according as the monotonic sequence is increasing or decreasing.
Theorem 1.4 Every bounded monotonic sequence is convergent.

Proof Suppose first that $a_n$ is a bounded monotonic increasing sequence. Then L =
sup{$a_n$ : n ≥ 1} is finite. By Theorem 1.3, we see that given any ε > 0 there is an m
such that $L - \varepsilon < a_m \le L$. But as our sequence is increasing, $a_m \le a_n \le L$ for all
n ≥ m. Thus

$$|L - a_n| < \varepsilon \quad \forall\, n \ge m.$$

In other words, the sequence converges to L. The case when the sequence is monotonic
decreasing is similar.

Given a sequence of numbers $x_n$, a subsequence is a sequence of the form

$$x_{n_k}, \qquad k = 1, 2, \ldots$$

with $n_k$ being a strictly increasing sequence of natural numbers. The reader can easily
check that if $x_n$ is a convergent sequence with limit L then any subsequence is also
convergent with the same limit L. (See Exercise 1 below.)
The following theorem is fundamental in the theory. It was discovered indepen-
dently by Bernard Bolzano (1781–1848) and Karl Weierstrass (1815–1897). Bolzano
studied mathematics, physics, philosophy and theology at the University of Prague
and became a Catholic priest in 1804. He then taught at the University of Prague
where he was chair of philosophy of religion and later was the dean of the philoso-
phy department. However, his pacifist views and strong opposition to militarism led
to his alienation from both academics and church leaders. He was dismissed from
the university in 1819 and exiled to the countryside when he refused to retract his
views. Much of his mathematical and philosophical work was recognized only after
his death. Bolzano is now remembered for the introduction of rigor in mathematics,
especially with regard to the ε-δ definition of continuity, the greatest lower bound
property of the real numbers and the intermediate value theorem.
Weierstrass trained to be a high-school teacher of mathematics, obtaining his teaching
certificate in 1841 at the age of twenty-six. For more than a dozen years he taught
at various high schools, until 1854, when his paper on abelian functions, published
in Crelle’s Journal, brought him such instant recognition from mathematicians
at the University of Berlin that he was offered a professorship there when he
was nearly forty years old. Until his retirement in 1890, he developed the theory of
infinite series and infused mathematical rigor into analysis. Weierstrass discovered
the theorem below independently, and it was only much later that historians of mathematics
recognized that Bolzano had foreseen it and proved it much earlier. Today, we
honor both mathematicians by referring to it as the Bolzano–Weierstrass theorem.

Theorem 1.5 (Bolzano–Weierstrass) Every bounded sequence has a convergent subsequence.

Proof Since our sequence $x_n$ is bounded, let m and M be such that $m \le x_n \le M$
for all n. We bisect the interval $I_0 = [m, M]$ using the midpoint c = (m + M)/2 and
consider the two intervals [m, c] and [c, M]. For at least one of these intervals (call
it $I_1$), there are infinitely many n such that $x_n$ lies in $I_1$. Choose $n_1$ such
that $x_{n_1} \in I_1$. Writing $I_1 = [m_1, M_1]$, we iterate the procedure, at each step choosing $n_k > n_{k-1}$ (which is possible since infinitely many terms lie in $I_k$), to successively obtain a
sequence of nested intervals $I_k = [m_k, M_k]$ with the length of $I_k$ being $(M - m)/2^k$ and $x_{n_k} \in I_k$.
The sequence $m_k$ is monotonic increasing and bounded above by M, so by Theorem 1.4 it converges to a limit
L (say). Since $M_k - m_k = (M - m)/2^k$, we see that the monotonic decreasing
sequence $M_k$ also converges to L. As $m_k \le x_{n_k} \le M_k$, the subsequence $x_{n_k}$ converges to L.
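The bisection in this proof is an algorithm, except for one step: a computer cannot decide which half contains infinitely many terms. The Python sketch below therefore replaces “infinitely many” with a finite stand-in, which is an assumption of the sketch and not part of the proof: among the indices in the window $(n_{k-1}, H]$, for a chosen horizon H, it keeps the half containing more of them. The function and the sample sequence $x_n = (-1)^n(1 + 1/n)$ are hypothetical illustrations.

```python
from fractions import Fraction

def bisection_subsequence(x, m, M, steps, horizon):
    """Mimic the bisection proof of Theorem 1.5 on the finite window n <= horizon.

    x maps n >= 1 to x_n with m <= x_n <= M. Returns pairs (n_k, x_{n_k}) with
    n_1 < n_2 < ... and x_{n_k} in an interval of length (M - m)/2^k.
    """
    lo, hi = Fraction(m), Fraction(M)
    last = 0
    chosen = []
    for _ in range(steps):
        mid = (lo + hi) / 2
        later = range(last + 1, horizon + 1)
        left = [n for n in later if lo <= x(n) <= mid]
        right = [n for n in later if mid <= x(n) <= hi]
        # keep the half holding more of the remaining indices
        # (a finite stand-in for "infinitely many"); ties go left
        if len(left) >= len(right):
            hi, indices = mid, left
        else:
            lo, indices = mid, right
        if not indices:
            break                   # the finite window has run out
        last = indices[0]           # the next index n_k, larger than n_{k-1}
        chosen.append((last, x(last)))
    return chosen

def example(n):
    # hypothetical bounded sequence x_n = (-1)^n (1 + 1/n), with -2 <= x_n <= 2
    return (-1) ** n * (1 + Fraction(1, n))
```

With `steps=10` and `horizon=2000`, the chosen terms are $x_1, x_3, x_5, \ldots$, closing in on the limit point −1. Choosing the right half in the tie would instead produce a subsequence tending to 1, which shows that the limit of the subsequence depends on these choices.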

This theorem allows us to define the important notions of lim sup and lim inf of
any sequence of real numbers. Indeed, given a sequence of numbers $x_n$, if it is not
bounded above, we write

$$\limsup_{n\to\infty} x_n = +\infty.$$

If it is not bounded below, we write

$$\liminf_{n\to\infty} x_n = -\infty.$$

If the sequence is bounded, let T be the set of limits of all convergent subsequences. By
Theorem 1.5, this is a non-empty set, so we define

$$\limsup_{n\to\infty} x_n = \sup T, \qquad \liminf_{n\to\infty} x_n = \inf T.$$

There are several characterizations of lim sup and lim inf that are convenient in many
applications. We record these below.
Theorem 1.6 Let L be a real number and $a_n$ a sequence of real numbers. Then

$$L = \limsup_{n\to\infty} a_n$$

if and only if
(a) for each ε > 0, there is a positive integer N such that $a_n < L + \varepsilon$ for all n ≥ N;
(b) for each ε > 0, the inequality $L - \varepsilon < a_n$ holds for infinitely many n.
Proof Suppose first that $\limsup_{n\to\infty} a_n = L$. As L is a real number, the sequence is
bounded above by M (say). From the definition of lim sup, we also see that L is the
limit of a convergent subsequence and that it is the largest of such limits. If (a) did
not hold, then for some ε > 0 we would have $a_n \ge L + \varepsilon$ for infinitely many n, so that by
the Bolzano–Weierstrass theorem there would be a subsequence converging to a limit
point in [L + ε, M], contradicting the definition of L. Therefore (a) holds. If (b) did
not hold, then for some ε > 0 the inequality $L - \varepsilon < a_n$ would hold for only finitely many n.
Thus, there would be an N such that $a_n \le L - \varepsilon$ for all n ≥ N. But then $\limsup_{n\to\infty} a_n \le L - \varepsilon$, contradicting our definition of lim sup.
In an analogous fashion, the reader can show (see Exercise 2):
Theorem 1.7 Let L be a real number and $a_n$ a sequence of real numbers. Then

$$L = \liminf_{n\to\infty} a_n$$

if and only if
(a) for each ε > 0, there is a positive integer N such that $a_n > L - \varepsilon$ for all n ≥ N;
(b) for each ε > 0, the inequality $L + \varepsilon > a_n$ holds for infinitely many n.
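One practical way to estimate lim sup and lim inf is through the tail suprema and infima $\sup\{x_k : k \ge n\}$ and $\inf\{x_k : k \ge n\}$, which converge to them (see Exercise 4 below). The Python sketch below approximates each tail by the finite window n ≤ k ≤ H, an assumption that is exact only when the extreme values of the tail occur early, as they do in our hypothetical example $a_k = (-1)^k(1 + 1/k)$, for which lim sup = 1 and lim inf = −1.

```python
from fractions import Fraction

def tail_bounds(x, n, horizon):
    """Approximate (sup{x_k : k >= n}, inf{x_k : k >= n}) by the max and min
    over n <= k <= horizon, a finite stand-in for the infinite tail."""
    values = [x(k) for k in range(n, horizon + 1)]
    return max(values), min(values)

def example(k):
    # hypothetical sequence a_k = (-1)^k (1 + 1/k): lim sup = 1, lim inf = -1
    return (-1) ** k * (1 + Fraction(1, k))
```

For n = 100 the tail bounds are 101/100 and −102/101. Consistently with Theorem 1.6, with L = 1 and ε = 1/50, every term from k = 100 on lies below L + ε, while infinitely many (here, all even-indexed) terms exceed L − ε.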
We can now return to a general discussion of Cauchy sequences of real numbers. A
sequence $x_n$ of real numbers is called a Cauchy sequence if for every ε > 0, there
is an N such that

$$|x_n - x_m| < \varepsilon \quad \forall\, m, n \ge N.$$

We are now in a position to prove the important theorem:

Theorem 1.8 A sequence $x_n$ of real numbers is convergent if and only if it is a
Cauchy sequence.

Proof The fact that a convergent sequence is a Cauchy sequence was proved earlier
(see Theorem 1.2). Now suppose our sequence is Cauchy. We want to show that it
converges. Taking ε = 1 in the definition of a Cauchy sequence, we find an $N_1$ such that $|x_n| < |x_{N_1}| + 1$ for all $n \ge N_1$; since only finitely many terms precede $x_{N_1}$, the sequence $x_n$ is
bounded. By the Bolzano–Weierstrass theorem, there is a convergent subsequence
$x_{n_k}$. Putting

$$L = \lim_{k\to\infty} x_{n_k},$$

we see that given ε > 0, there is a K such that

$$|x_{n_k} - L| < \frac{\varepsilon}{2} \quad \forall\, k \ge K.$$

As our sequence is Cauchy, there is an N such that

$$|x_n - x_m| < \frac{\varepsilon}{2} \quad \forall\, n, m \ge N.$$

As $n_k$ is a strictly increasing sequence of natural numbers, we can choose k such that
$n_k \ge N$ and k ≥ K. Now, by the triangle inequality,

$$|x_n - L| \le |x_n - x_{n_k}| + |x_{n_k} - L| < \varepsilon$$

provided n ≥ N. In other words, $x_n$ converges to L. This completes the proof.

The precise understanding of the concept of a limit allows us to give a formal
definition of continuity. A function f : [a, b] → R is said to be continuous at
c ∈ [a, b] if given any ε > 0 there is a δ > 0 such that

$$|f(x) - f(c)| < \varepsilon \quad \text{whenever } x \in [a, b] \text{ and } |x - c| < \delta.$$

Intuitively, f(x) is close to f(c) whenever x is sufficiently close to c. A function
is said to be continuous on [a, b] if it is continuous at every point of the interval
[a, b]. At the point a we speak of right continuity and at b we speak of left continuity
because, technically, we can only take our limit from the right or from the left,
respectively.
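As a worked example of our own, not taken from the text, consider f(x) = x² on R (or on any [a, b] containing c). The choice δ = min(1, ε/(2|c| + 1)) works: if |x − c| < δ ≤ 1, then |x + c| ≤ |x − c| + 2|c| < 2|c| + 1, hence |x² − c²| = |x − c| |x + c| < δ(2|c| + 1) ≤ ε. The Python sketch below computes this δ exactly and spot-checks the inequality at finitely many points of (c − δ, c + δ); the check only illustrates the estimate and does not prove it.

```python
from fractions import Fraction

def delta_for_square(c: Fraction, eps: Fraction) -> Fraction:
    """A delta that works for f(x) = x^2 at c, namely min(1, eps / (2|c| + 1))."""
    assert eps > 0
    return min(Fraction(1), eps / (2 * abs(c) + 1))

def spot_check(c: Fraction, eps: Fraction, samples: int = 1000) -> bool:
    """Test |x^2 - c^2| < eps at samples - 1 evenly spaced points x with |x - c| < delta."""
    delta = delta_for_square(c, eps)
    points = [c - delta + 2 * delta * Fraction(i, samples) for i in range(1, samples)]
    return all(abs(x * x - c * c) < eps for x in points)
```

For c = 3 and ε = 1/10, this gives δ = 1/70. The dependence of δ on c is expected: the function is continuous at every point, but the steeper the graph near c, the smaller δ must be.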
It is convenient to introduce the notation $B_r(c)$ for the “open ball”, or open
interval, of radius r centered at c:

$$B_r(c) = \{x \in \mathbb{R} : |x - c| < r\}.$$

With this notation, we can reformulate the notion of continuity as follows. A function
f : [a, b] → R is continuous at c if and only if for every ε > 0, the inverse image
under f of the open ball $B_\varepsilon(f(c))$ contains $B_\delta(c) \cap [a, b]$ for some δ > 0.
This reformulation allows us to generalize the notion of continuity to a wider context,
namely to the setting of metric spaces and later to topological spaces.

Exercises

1. If $x_n$ is a convergent sequence with limit L, show that any subsequence is also
convergent with the same limit L.

2. Prove Theorem 1.7.

3. For any bounded sequence $x_n$, show that the sequences defined by

$$M_n = \sup\{x_k : k \ge n\}, \qquad m_n = \inf\{x_k : k \ge n\}$$

are monotonic sequences and hence convergent.


4. For any bounded sequence $x_n$, show that

$$\limsup_{n\to\infty} x_n = \lim_{n\to\infty} \sup\{x_k : k \ge n\}$$

and

$$\liminf_{n\to\infty} x_n = \lim_{n\to\infty} \inf\{x_k : k \ge n\}.$$

5. Show that the sequence

$$a_n := 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \cdots + \frac{(-1)^{n-1}}{n}$$

is Cauchy and hence converges. (We will see in the next section that it converges
to log 2.)
6. Let $a_1 = \sqrt{2}$ and define recursively $a_{n+1} = \sqrt{2 + a_n}$.
(a) Show by induction that $a_n < 2$ for all n ≥ 1.
(b) Prove that $a_n$ is a strictly increasing sequence.
(c) Show that $\lim_{n\to\infty} a_n = 2$.

7. Prove that lim supn→∞ sin n = 1 and lim inf n→∞ sin n = −1. [Hint: You may
use the fact that π is irrational.]

1.5 Infinite Series

Given a sequence $a_k$ of real numbers, the formal expression

$$\sum_{k=1}^{\infty} a_k$$

is called an infinite series. The nth partial sum is defined as

$$s_n := \sum_{k=1}^{n} a_k.$$

If the limit

$$\lim_{n\to\infty} s_n$$

exists and equals L, we say the series converges to L. Otherwise, we say the series
diverges.
The geometric series
$$\sum_{k=0}^{\infty} r^k$$
plays an important role in mathematics. Its partial sums are easily seen to be (see Exercise 1)
$$s_n = 1 + r + r^2 + \cdots + r^n = \frac{1 - r^{n+1}}{1 - r},$$
provided $r \neq 1$. It is clear that if $|r| < 1$, the geometric series converges to
$$\frac{1}{1 - r},$$
and diverges otherwise.
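As a quick numerical sanity check (a sketch only; the function names below are illustrative and not from the text), one can compare the partial sums of the geometric series computed term by term with the closed form $(1 - r^{n+1})/(1 - r)$, which requires $r \neq 1$:

```python
def geometric_partial_sum(r: float, n: int) -> float:
    """s_n = 1 + r + ... + r^n, summed term by term."""
    return sum(r ** k for k in range(n + 1))


def geometric_closed_form(r: float, n: int) -> float:
    """(1 - r^(n+1)) / (1 - r); only valid when r != 1."""
    if r == 1:
        raise ValueError("the closed form requires r != 1")
    return (1 - r ** (n + 1)) / (1 - r)


# For |r| < 1 both values approach 1/(1 - r); for r = 2 they grow without bound.
for r in (0.5, -0.9, 2.0):
    print(r, geometric_partial_sum(r, 50), geometric_closed_form(r, 50))
```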


The reader is undoubtedly familiar with some standard tests for the convergence of an infinite series. We review these here. The comparison test is by far the most useful:
Theorem 1.9 (Comparison test) Let $a_k$ and $b_k$ be sequences of real numbers. If $|a_k| \le b_k$ and $\sum_{k=1}^{\infty} b_k$ converges, then $\sum_{k=1}^{\infty} a_k$ converges.

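Proof We show that the sequence of partial sums
$$s_n = \sum_{k=1}^{n} a_k$$
is a Cauchy sequence and hence converges. Indeed, as $\sum_{k=1}^{\infty} b_k$ converges, the sequence of its partial sums is Cauchy, so that given $\varepsilon > 0$, there is a positive integer $N$ such that
$$\left| \sum_{k=m}^{n} b_k \right| < \varepsilon, \quad \forall\, n \ge m \ge N.$$
Hence
$$|s_n - s_m| = \left| \sum_{k=m+1}^{n} a_k \right| \le \sum_{k=m+1}^{n} |a_k| \le \left| \sum_{k=m}^{n} b_k \right| < \varepsilon, \quad \forall\, n \ge m \ge N.$$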

Theorem 1.10 (Ratio test) Suppose that $a_k$ is a sequence of nonzero real numbers and
$$\limsup_{k\to\infty} \left| \frac{a_{k+1}}{a_k} \right| < 1.$$
Then the series $\sum_{k=1}^{\infty} a_k$ converges.

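Proof Let $L$ be
$$\limsup_{k\to\infty} \left| \frac{a_{k+1}}{a_k} \right|.$$
Let $r$ be any real number such that $L < r < 1$. Then, by the definition of limsup (see Theorem 1.6), there exists an $N$ such that
$$\left| \frac{a_{k+1}}{a_k} \right| < r, \quad \forall\, k \ge N.$$
An easy induction now shows that
$$|a_{N+j}| \le |a_N| r^j, \quad j = 1, 2, \ldots$$
Since $\sum_{j} |a_N| r^j$ is a convergent geometric series, our series converges by the comparison test.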

In the above theorem, if $L = 1$, the result is inconclusive. If
$$\lim_{k\to\infty} \left| \frac{a_{k+1}}{a_k} \right| > 1,$$
and the sequence $a_k$ is strictly positive for sufficiently large $k$, the student can verify that the series diverges.

Theorem 1.11 (Root test) Let $a_k$ be a sequence of positive real numbers, and put
$$L := \limsup_{k\to\infty} a_k^{1/k}.$$
If $L < 1$, then the series $\sum_{k=1}^{\infty} a_k$ converges. If $L > 1$, the series diverges. If $L = 1$, the test is inconclusive.

Proof If $L < 1$, then as before let $r$ be such that $L < r < 1$. By the property of limsup (see Theorem 1.6) we see that $a_k^{1/k} < r$, that is, $a_k < r^k$, for all $k$ sufficiently large. Hence, our series converges by comparison with the geometric series $\sum r^k$. If $L > 1$, then again by Theorem 1.6, we have $a_k^{1/k} > 1$ for infinitely many $k$. Thus, $a_k > 1$ for infinitely many $k$, so the terms do not tend to zero and our series diverges. This completes the proof.

Theorem 1.12 (Integral test) Suppose that $a_k$ is a sequence of non-negative real numbers and that $f$ is a continuous decreasing function on $[1, \infty)$ such that $f(k) = a_k$ for all $k \ge 1$. Then, the infinite series $\sum_{k=1}^{\infty} a_k$ converges if and only if the integral
$$\int_1^{\infty} f(x)\,dx$$
converges.

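Proof Since $f$ is decreasing, it is clear by examining the graph of $f(x)$ that
$$a_{k+1} = f(k+1) \le \int_k^{k+1} f(x)\,dx \le f(k) = a_k$$
for all $k \ge 1$. Summing this inequality from $k = 1$ to $n$ gives
$$\sum_{k=2}^{n+1} a_k \le \int_1^{n+1} f(x)\,dx \le \sum_{k=1}^{n} a_k.$$
Since $f \ge 0$, both the partial sums and the integrals $\int_1^{n+1} f(x)\,dx$ are non-decreasing in $n$, so one is bounded if and only if the other is, which gives the result.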

Thus, for example, we immediately deduce that
$$\sum_{n=1}^{\infty} \frac{1}{n^s}$$
converges if and only if $s > 1$.
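The two-sided estimate in the proof of the integral test can be checked numerically. Summing $a_{k+1} \le \int_k^{k+1} f \le a_k$ over $k$ gives $\int_1^{N+1} f \le S_N$ and $S_N - a_1 \le \int_1^{N} f$, where $S_N = a_1 + \cdots + a_N$. The following Python sketch (helper names are our own, chosen for illustration) tests this for $f(x) = x^{-s}$, whose integrals are known in closed form:

```python
import math


def partial_sum(s: float, N: int) -> float:
    """S_N = sum_{n=1}^{N} n^(-s)."""
    return sum(n ** (-s) for n in range(1, N + 1))


def integral(s: float, M: float) -> float:
    """Exact value of the integral of x^(-s) over [1, M]."""
    if s == 1:
        return math.log(M)
    return (M ** (1 - s) - 1) / (1 - s)


def integral_test_bounds(s: float, N: int) -> tuple[float, float, float]:
    """Return (lower, S_N, upper) with lower = int_1^{N+1}, upper = a_1 + int_1^N."""
    return integral(s, N + 1), partial_sum(s, N), 1 + integral(s, N)


for s in (0.5, 1.0, 2.0):
    low, S, up = integral_test_bounds(s, 10_000)
    print(f"s={s}: {low:.4f} <= {S:.4f} <= {up:.4f}")
```

For $s \le 1$ both bounds grow without limit, while for $s > 1$ they stay bounded, exactly as the theorem predicts.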


It is convenient to introduce the “big O” notation. Given two sequences of numbers $a_n$ and $b_n$, with the $b_n$’s non-negative, we write $a_n = O(b_n)$ if there is a constant $C$ such that $|a_n| \le C b_n$ for all $n \ge 1$. A similar notation is applicable for functions. Given two functions $f(x)$ and $g(x)$ defined on an interval $[a,b]$ and with $g(x)$ non-negative, we write $f(x) = O(g(x))$ if there is a constant $C$ such that $|f(x)| \le C g(x)$ for all $x \in [a,b]$. The interval need not be finite. Thus, our proof of the integral test allows us to deduce that for $s > 1$ (see Exercise 4 below),
$$\sum_{n > N} \frac{1}{n^s} = O(N^{1-s}).$$

The “little o” notation is also convenient. We write $a_n = o(b_n)$ if $a_n/b_n \to 0$ as $n$ tends to infinity. Similarly, we write $f(x) = o(g(x))$ as $x \to c$ to mean that $f(x)/g(x) \to 0$ as $x \to c$. Depending on the context, the mention of $c$ may be suppressed if it is clear what it is. In practice, $c$ is usually infinity or zero. If it is not evident, one can of course spell it out for clarity.
The following example illustrates that the idea behind the integral test is capable of further refinement.

Example 1.1 Let
$$a_k = \frac{1}{k} - \int_k^{k+1} \frac{dx}{x}.$$
Since $1/x$ is a decreasing function of $x$, we see that $a_k > 0$. On the other hand,
$$a_k = \int_k^{k+1} \left( \frac{1}{k} - \frac{1}{x} \right) dx = \int_k^{k+1} \frac{x - k}{kx}\,dx.$$

The numerator in the integrand is at most $1$ in the interval $[k, k+1]$ and the denominator is at least $k^2$, so the integrand is at most $1/k^2$. Consequently, $a_k \le 1/k^2$. Therefore, by the comparison test, the series
$$\sum_{k=1}^{\infty} \left( \frac{1}{k} - \int_k^{k+1} \frac{dx}{x} \right)$$
converges to a finite limit $\gamma$, often called Euler’s constant. In fact, our estimate on $a_k$ shows that
$$\sum_{k=1}^{n} a_k = \gamma + O(1/n),$$
since $0 < \sum_{k>n} a_k \le \sum_{k>n} 1/k^2 \le 1/n$. In other words,
$$\sum_{k \le n} \frac{1}{k} = \log(n+1) + \gamma + O(1/n),$$
giving us a refinement on the familiar divergence of the harmonic series.
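The estimate in Example 1.1 can be observed numerically. The sketch below computes $H_n - \log(n+1) = a_1 + \cdots + a_n$ and compares it with $\gamma$; the constant `EULER_GAMMA` is the standard decimal value of Euler's constant, quoted only for comparison, and the function name is illustrative:

```python
import math

# Standard decimal value of Euler's constant, quoted only for comparison.
EULER_GAMMA = 0.5772156649015329


def harmonic_minus_log(n: int) -> float:
    """H_n - log(n + 1), which equals a_1 + ... + a_n in Example 1.1."""
    return sum(1.0 / k for k in range(1, n + 1)) - math.log(n + 1)


for n in (10, 100, 1000, 10_000):
    gap = EULER_GAMMA - harmonic_minus_log(n)
    # The bound a_k <= 1/k^2 gives 0 < gap <= sum_{k>n} 1/k^2 <= 1/n.
    print(n, harmonic_minus_log(n), gap, n * gap)
```

The last column stays bounded (it settles near $1/2$), which is what the $O(1/n)$ error term asserts.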

The tests of convergence that we have adumbrated so far are not sufficient to deal with alternating series such as
$$\sum_{k=1}^{\infty} \frac{(-1)^{k-1}}{k}.$$
The following theorem deals with this situation.


Theorem 1.13 (Alternating series test) If $a_k$ is a decreasing sequence of positive real numbers tending to zero, then the series
$$\sum_{k=1}^{\infty} (-1)^{k-1} a_k$$
converges.
Proof Let us consider the partial sums
$$s_n = \sum_{k=1}^{n} (-1)^{k-1} a_k.$$
Then
$$s_{2n} = (a_1 - a_2) + \cdots + (a_{2n-1} - a_{2n})$$
is easily seen to be non-negative and at least $s_{2n-2}$, so that $s_{2n}$ is increasing. Moreover,
$$s_{2n} = a_1 - (a_2 - a_3) - \cdots - (a_{2n-2} - a_{2n-1}) - a_{2n} \le a_1,$$
so that this sequence is bounded. By Theorem 1.4, it converges to a finite limit. A similar analysis applies to the partial sums of odd index:
$$s_{2n+1} = a_1 - (a_2 - a_3) - \cdots - (a_{2n} - a_{2n+1}).$$
We can see that $s_{2n+1}$ is a decreasing sequence which is bounded below by $a_1 - a_2$, since we can write
$$s_{2n+1} = a_1 - a_2 + (a_3 - a_4) + \cdots + (a_{2n-1} - a_{2n}) + a_{2n+1}.$$
Again by Theorem 1.4, this sequence converges. But now, as
$$s_{2n+1} = s_{2n} + a_{2n+1},$$
and $a_{2n+1}$ tends to zero, the two limits coincide. Hence the whole sequence $s_n$ converges, that is, the series converges.
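The mechanism of this proof can be watched numerically. The sketch below takes $a_k = 1/\sqrt{k}$ (an arbitrary choice of a decreasing sequence tending to zero; the helper `alternating_partial_sums` is hypothetical, written only for this illustration) and checks that the even-indexed partial sums increase, the odd-indexed ones decrease, and the two families trap the limit between them:

```python
import math
from typing import Callable


def alternating_partial_sums(a: Callable[[int], float], n_terms: int) -> list[float]:
    """Return [s_1, ..., s_n] for the series sum_{k>=1} (-1)^(k-1) a(k)."""
    sums, s = [], 0.0
    for k in range(1, n_terms + 1):
        s += (-1) ** (k - 1) * a(k)
        sums.append(s)
    return sums


s = alternating_partial_sums(lambda k: 1 / math.sqrt(k), 2001)
evens = s[1::2]  # s_2, s_4, ..., s_2000
odds = s[0::2]   # s_1, s_3, ..., s_2001
print(all(u <= v for u, v in zip(evens, evens[1:])))  # even partial sums increase
print(all(u >= v for u, v in zip(odds, odds[1:])))    # odd partial sums decrease
print(evens[-1], odds[-1])  # the limit lies between these two values
```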


Our theorem now implies that the alternating series
$$\sum_{k=1}^{\infty} \frac{(-1)^{k-1}}{k}$$
converges. That its partial sums are Cauchy was the content of Exercise 5 in Sect. 1.4. However, the special nature of the series can be used to evaluate it explicitly, as the following example shows.
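Example 1.2 We observe that
$$\frac{1}{k} = \int_0^1 x^{k-1}\,dx,$$
so that
$$\sum_{k=1}^{n} \frac{(-1)^{k-1}}{k} = \sum_{k=1}^{n} \int_0^1 (-1)^{k-1} x^{k-1}\,dx.$$
Interchanging the integral and the (finite) sum on the right-hand side and noting the geometric series
$$\sum_{k=1}^{n} (-1)^{k-1} x^{k-1} = \frac{1 - (-x)^n}{1 + x},$$
we find
$$\sum_{k=1}^{n} \frac{(-1)^{k-1}}{k} = \int_0^1 \frac{1 - (-x)^n}{1 + x}\,dx = \int_0^1 \frac{dx}{1 + x} - \int_0^1 \frac{(-x)^n}{1 + x}\,dx.$$
The first integral is $\log 2$, and the second integral is bounded in absolute value by
$$\int_0^1 x^n\,dx = \frac{1}{n+1},$$
so that, letting $n \to \infty$,
$$1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \cdots = \log 2.$$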
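The error bound from Example 1.2, $|s_n - \log 2| \le 1/(n+1)$, is easy to confirm numerically. A short Python sketch (the function name is illustrative):

```python
import math


def alt_harmonic(n: int) -> float:
    """Partial sum 1 - 1/2 + 1/3 - ... + (-1)^(n-1)/n."""
    return sum((-1) ** (k - 1) / k for k in range(1, n + 1))


for n in (10, 100, 1000):
    err = abs(alt_harmonic(n) - math.log(2))
    # Example 1.2 bounds the error by the integral of x^n over [0, 1], i.e. 1/(n + 1).
    print(n, err, 1 / (n + 1))
```

The observed error is roughly half of the bound, so the estimate is of the right order of magnitude.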

Exercises

1. If $r \neq 1$, show that
$$1 + r + r^2 + \cdots + r^n = \frac{1 - r^{n+1}}{1 - r}.$$

2. If the series $\sum_{k=1}^{\infty} a_k$ converges, show that $\lim_{k\to\infty} a_k = 0$.

3. If $\sum_{k=1}^{\infty} |a_k|$ converges, show that $\sum_{k=1}^{\infty} a_k$ converges.

4. For $s > 1$, show that
$$\sum_{n > N} \frac{1}{n^s} = O(N^{1-s}).$$

5. Show that $\log(n+1) = \log n + O(1/n)$.

6. Show that the series
$$\sum_{k=2}^{\infty} \frac{1}{k(\log k)^s}$$
converges if and only if $s > 1$.

7. Show that¹
$$\frac{\pi}{4} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots.$$
[Hint: Follow the template of Example 1.2 by noting that $1/(2k+1) = \int_0^1 x^{2k}\,dx$.]

¹ This seems to have been first discovered by Madhava in fourteenth-century India. He is also credited with discovering the familiar Taylor series for the sine and cosine functions more than 250 years before the advent of Sir Isaac Newton. In fact, the Kerala school of mathematics led by Madhava is now credited with discovering much of what we would now describe as precalculus. See p. 224 of [2]. The formula in this exercise was rediscovered several centuries later by G. Leibniz. We therefore refer to the series as the Madhava–Leibniz series.

1.6 Sequences of Functions

Let $I$ be an interval in $\mathbb{R}$ and suppose that we have a sequence of functions $f_n(x)$ (sometimes written simply as $f_n$) defined on $I$. We say that the sequence converges pointwise if
$$\lim_{n\to\infty} f_n(x)$$
exists for each $x \in I$. The series
$$\sum_{n=1}^{\infty} f_n(x)$$
is said to converge pointwise on $I$ if the sequence of partial sums converges pointwise for each point $x$ in $I$. Thus, the sequence of functions $f_n$ converges pointwise to $f$ if for every $\varepsilon > 0$ and every $x \in I$, there is an $N$ such that
$$|f_n(x) - f(x)| < \varepsilon \quad \forall\, n \ge N.$$

The reader should understand that in the definition of pointwise convergence, $N$ may depend on both $\varepsilon$ and $x$. For example, observe that
$$\lim_{n\to\infty} x^n = \begin{cases} 1 & x = 1, \\ 0 & 0 \le x < 1. \end{cases}$$
So if $f_n(x) = x^n$ for $x \in [0,1]$ and if
$$f(x) = \begin{cases} 1 & x = 1, \\ 0 & 0 \le x < 1, \end{cases}$$
then $\lim_{n\to\infty} f_n(x) = f(x)$ pointwise. Note that even though each $f_n(x)$ is a continuous function, the limit function is not continuous. In many applications, this phenomenon is undesirable. Thus, pointwise convergence is too weak a notion of convergence for a sequence of functions. This motivates the definition of uniform convergence. As before, let $I$ be an interval and $f_n$ a sequence of functions defined on $I$. We say the sequence converges uniformly to $f$ on $I$ if for every $\varepsilon > 0$, there is an $N$ such that
$$|f_n(x) - f(x)| < \varepsilon \quad \forall\, n \ge N \text{ and } \forall\, x \in I.$$
The important thing to note here is that $N$ does not depend on $x$: it works for all $x \in I$ simultaneously, hence the term uniform.
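The failure of uniformity for $f_n(x) = x^n$ on $[0,1]$ can be seen numerically. At a fixed point such as $x = 0.9$ the error $|f_n(x) - f(x)|$ tends to zero, but at the moving point $x = 1 - 1/n$ it stays near $1/e$ (indeed $(1 - 1/n)^n \ge 1/4$ for $n \ge 2$), so no single $N$ can work for all $x$. A minimal Python sketch, with illustrative function names:

```python
import math


def f_n(x: float, n: int) -> float:
    return x ** n


def f_limit(x: float) -> float:
    """Pointwise limit of x^n on [0, 1]."""
    return 1.0 if x == 1 else 0.0


for n in (10, 100, 1000, 10_000):
    x_fixed = 0.9          # a fixed point: the error tends to 0
    x_moving = 1 - 1 / n   # a point that moves toward 1 as n grows
    print(n,
          abs(f_n(x_fixed, n) - f_limit(x_fixed)),
          abs(f_n(x_moving, n) - f_limit(x_moving)),
          math.exp(-1))
```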
In a similar vein, we say
$$\sum_{n=1}^{\infty} f_n(x)$$
converges uniformly to $f(x)$ if the sequence of partial sums converges uniformly to $f$. This property is sufficiently strong to prevent the pathology observed earlier, as the following theorems elucidate.
Theorem 1.14 Suppose that $f_n$ is a sequence of continuous functions defined on an interval $I$. If there is a function $f$ on $I$ such that $\lim_{n\to\infty} f_n = f$ uniformly on $I$, then $f$ is continuous.
Proof Let $x_0 \in I$. We want to show that $f$ is continuous at $x_0$. That is, given $\varepsilon > 0$, we want to find $\delta > 0$ such that
$$|f(x) - f(x_0)| < 3\varepsilon \quad \text{if } |x - x_0| < \delta;$$
since $\varepsilon$ is arbitrary, this suffices. By the triangle inequality,
$$|f(x) - f(x_0)| \le |f(x) - f_n(x)| + |f_n(x) - f_n(x_0)| + |f_n(x_0) - f(x_0)|$$
for all $x \in I$ and all $n$. By the uniform convergence property, there is an $N$ such that
$$|f_n(x) - f(x)| < \varepsilon \quad \forall\, n \ge N \text{ and } \forall\, x \in I.$$
With this choice of $N$, we have
$$|f(x) - f(x_0)| \le \varepsilon + |f_N(x) - f_N(x_0)| + \varepsilon$$
for all $x$ in $I$. Now, as $f_N(x)$ is continuous on $I$, there is a $\delta > 0$ such that
$$|f_N(x) - f_N(x_0)| < \varepsilon \quad \text{when } |x - x_0| < \delta.$$
With this choice of $\delta$ we deduce that $|f(x) - f(x_0)| < 3\varepsilon$ whenever $|x - x_0| < \delta$, which is the desired result.


Theorem 1.15 Suppose that $f_n$ is a sequence of continuous functions defined on an interval $I = [a,b]$. If there is a function $f$ on $I$ such that $\lim_{n\to\infty} f_n = f$ uniformly on $I$, then
$$\lim_{n\to\infty} \int_a^b f_n(x)\,dx = \int_a^b f(x)\,dx.$$

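Proof By Theorem 1.14, the limit function $f$ is continuous and so all the integrals in the statement of the theorem exist. Now, given $\varepsilon > 0$, we want to find an $N$ such that
$$\left| \int_a^b f_n(x)\,dx - \int_a^b f(x)\,dx \right| \le (b-a)\varepsilon \quad \forall\, n \ge N;$$
since $\varepsilon$ is arbitrary, this suffices. Since $f_n$ converges to $f$ uniformly on $I$, there is an $N$ such that
$$|f_n(x) - f(x)| < \varepsilon \quad \forall\, n \ge N,\ \forall\, x \in I.$$
Thus, for all $n \ge N$, we have
$$\left| \int_a^b f_n(x)\,dx - \int_a^b f(x)\,dx \right| \le \int_a^b |f_n(x) - f(x)|\,dx \le (b-a)\varepsilon.$$
This completes the proof.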


For the next theorem, we introduce a new class of functions. We write $C^1[a,b]$ to denote the space of continuously differentiable functions defined on $[a,b]$. It is understood that at the end points $a$ and $b$, we require only the right derivative and the left derivative, respectively. For functions in this space, we sometimes say $f$ is $C^1$ (the domain being understood). More generally, we say $f$ is $C^r$ if it is $r$-times continuously differentiable.
Theorem 1.16 Suppose that $f_n$ is a sequence of functions defined on an interval $[a,b]$ such that
(a) $f_n$ is $C^1$ for all $n$;
(b) there is a function $g$ defined on $[a,b]$ such that $\lim_{n\to\infty} f_n' = g$ uniformly on $[a,b]$;
(c) there is a point $c \in [a,b]$ for which the limit $\lim_{n\to\infty} f_n(c)$ exists.
Then, there is a differentiable function $f$ on $[a,b]$ such that $\lim_{n\to\infty} f_n = f$ uniformly on $[a,b]$ and $f' = g$.
Proof We first note that $g$ is continuous by (a) and (b) and an application of Theorem 1.14 to the sequence $f_n'$. Hence $g$ is integrable. The desired function $f$ is the primitive of $g$ and by the fundamental theorem of calculus
$$f(x) = f(a) + \int_a^x g(t)\,dt = \lim_{n\to\infty} f_n(a) + \int_a^x g(t)\,dt,$$
which suggests that we first show that
$$\lim_{n\to\infty} f_n(a)$$
exists and define $f$ by
$$f(x) := \lim_{n\to\infty} f_n(a) + \int_a^x g(t)\,dt. \tag{1.2}$$
Indeed, again by the fundamental theorem of calculus, with $c$ as in (c),
$$f_n(c) - f_n(a) = \int_a^c f_n'(t)\,dt,$$
so that
$$\lim_{n\to\infty} f_n(a) = \lim_{n\to\infty} \left( f_n(c) - \int_a^c f_n'(t)\,dt \right). \tag{1.3}$$
Condition (b) and an application of Theorem 1.15 show that
$$\lim_{n\to\infty} \int_a^c f_n'(t)\,dt = \int_a^c g(t)\,dt.$$
Together with (c), this shows that the limit of the right-hand side of (1.3) exists. Therefore $f$ is well-defined by (1.2) and, by the fundamental theorem of calculus, $f' = g$. To complete the proof, we need to show that $\lim_{n\to\infty} f_n = f$ uniformly on $[a,b]$. By (b), given $\varepsilon > 0$, there is an $N$ such that
$$|f_n'(x) - g(x)| < \varepsilon \quad \forall\, x \in [a,b],\ \forall\, n \ge N.$$
Hence,
$$|f_m(x) - f(x)| = \left| f_m(a) + \int_a^x f_m'(t)\,dt - \lim_{n\to\infty} f_n(a) - \int_a^x g(t)\,dt \right| \le \left| f_m(a) - \lim_{n\to\infty} f_n(a) \right| + \left| \int_a^x \big(f_m'(t) - g(t)\big)\,dt \right|.$$
If $m \ge N$, the absolute value of the integrand is less than $\varepsilon$, so that the integral is at most $(b-a)\varepsilon$ in absolute value. Since $\lim_{n\to\infty} f_n(a)$ exists, there is an $N_1$ such that for $m \ge N_1$, we have $|f_m(a) - \lim_{n\to\infty} f_n(a)| < \varepsilon$. Therefore if $m \ge \max(N, N_1)$, we have
$$|f_m(x) - f(x)| < (1 + b - a)\varepsilon \quad \forall\, x \in [a,b].$$
Since $\varepsilon > 0$ is arbitrary, this completes the proof.


Condition (c) of the theorem is not superfluous. For example, if f n (x) = n for all
x ∈ [0, 1], we see that both (a) and (b) are satisfied with g identically zero. But (c)
is not satisfied. And our sequence of functions does not even converge pointwise on
the interval.
The following is now a simple consequence of our theorem.
Corollary 1.1 Suppose that $f_n$ is a sequence of $C^1$-functions defined on an interval $[a,b]$ and assume that there are two functions $f$ and $g$ also defined on $[a,b]$ such that $\lim_{n\to\infty} f_n = f$ and $\lim_{n\to\infty} f_n' = g$ uniformly on $[a,b]$. Then, $f' = g$.
A convenient test for uniform convergence of a series of functions is provided by the following Weierstrass M-test.
Theorem 1.17 (Weierstrass M-test) Suppose that $f_n$ is a sequence of functions defined on an interval $I$ and assume that there is a sequence of positive real numbers $M_n$ such that
(a) $|f_n(x)| \le M_n$ for all $x \in I$ and all $n \ge 1$, and
(b) the series $\sum_{n \ge 1} M_n$ converges.
Then, the series $\sum_{n=1}^{\infty} f_n$ converges uniformly on $I$.

Proof Let
$$S_n(x) := \sum_{k=1}^{n} f_k(x) \quad \text{and} \quad T_n := \sum_{k=1}^{n} M_k$$
be the respective partial sums. Then, as the series $\sum_{n \ge 1} M_n$ converges, the sequence of partial sums $T_n$ is Cauchy. That is, given $\varepsilon > 0$, there is an $N$ such that
$$|T_m - T_n| < \varepsilon \quad \forall\, m \ge n \ge N.$$
This means that for each $x$, the sequence $S_n(x)$ is also Cauchy because
$$|S_m(x) - S_n(x)| = \left| \sum_{k=n+1}^{m} f_k(x) \right| \le \sum_{k=n+1}^{m} |f_k(x)| \le \sum_{k=n+1}^{m} M_k = |T_m - T_n| < \varepsilon$$
for all $x \in I$. Therefore the sequence $S_n(x)$ converges pointwise to a limit $f(x)$ (say). We need to show this convergence is uniform. Indeed,
$$\begin{aligned} \left| f(x) - \sum_{k=1}^{n} f_k(x) \right| &= |f(x) - S_m(x) + S_m(x) - S_n(x)| \\ &\le |f(x) - S_m(x)| + |S_m(x) - S_n(x)| \\ &\le |f(x) - S_m(x)| + |T_m - T_n| \end{aligned}$$
for any $m \ge n$. Given $\varepsilon > 0$, we have an $N$ such that $|T_m - T_n| < \varepsilon$ for $m \ge n \ge N$, so that for all $x \in I$,
$$\left| f(x) - \sum_{k=1}^{n} f_k(x) \right| \le |f(x) - S_m(x)| + \varepsilon$$
whenever $m \ge n \ge N$. Letting $m$ tend to infinity, we see that the first term on the right-hand side of the inequality tends to zero, so that $|f(x) - S_n(x)| \le \varepsilon$ for all $x \in I$ and all $n \ge N$. This completes the proof.
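The key estimate of the M-test, $|S_m(x) - S_n(x)| \le T_m - T_n$ uniformly in $x$, can be checked numerically for the series $\sum \sin(kx)/k^3$ of Exercise 2 below, taking $M_k = 1/k^3$. In this Python sketch the grid on $[0, 2\pi]$ and the cut-offs $n$ and $m$ are arbitrary choices made for illustration:

```python
import math


def S(x: float, n: int) -> float:
    """Partial sum S_n(x) = sum_{k=1}^{n} sin(kx)/k^3."""
    return sum(math.sin(k * x) / k ** 3 for k in range(1, n + 1))


def T_diff(n: int, m: int) -> float:
    """T_m - T_n = sum_{k=n+1}^{m} M_k with M_k = 1/k^3."""
    return sum(1 / k ** 3 for k in range(n + 1, m + 1))


grid = [2 * math.pi * j / 100 for j in range(101)]
n, m = 10, 1000
# Largest observed |S_m(x) - S_n(x)| over the grid; it never exceeds T_m - T_n.
worst = max(abs(S(x, m) - S(x, n)) for x in grid)
print(worst, T_diff(n, m))
```

Because the bound $T_m - T_n$ does not involve $x$, the same $N$ works at every grid point, which is exactly what uniformity means.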

The concept of uniform convergence suggests that it is convenient to introduce the following spaces. Let $I$ be an interval and define $B(I)$ to be the space of bounded functions on $I$ and $C(I)$ the space of all continuous functions on $I$. If $I$ is closed and bounded, $C(I) \subseteq B(I)$. Both of these spaces are infinite-dimensional vector spaces over $\mathbb{R}$, as the reader can verify using basic calculus. We can make $B(I)$, and hence $C(I)$ when $I$ is closed and bounded, into a metric space by defining the sup norm
$$\|f\| := \sup\{|f(x)| : x \in I\}.$$
Metric spaces are reviewed in Sect. 1.8. The student can verify that given two functions $f, g \in B(I)$, the distance $\|f - g\|$ defines a metric. It is now clear from the above discussion that a sequence of functions $f_n$ in $B(I)$ converges to $f \in B(I)$ in this metric if and only if $\lim_{n\to\infty} f_n = f$ uniformly on $I$.

Exercises

1. Show that each of the following sequences of functions converges pointwise on $[0,1]$ to the zero function and determine if the convergence is uniform on $[0,1]$.
(a) $f_n(x) = nxe^{-nx}$;
(b) $f_n(x) = nxe^{-n^2 x}$;
(c) $f_n(x) = nxe^{-nx^2}$.

2. Show that the series
$$\sum_{n=1}^{\infty} \frac{\sin nx}{n^3}$$
converges uniformly to a differentiable function $f(x)$ on $[0, 2\pi]$ and
$$f'(x) = \sum_{n=1}^{\infty} \frac{\cos nx}{n^2}.$$

3. Let $f_n(x) = nx/(nx + 1)$ for $x \in [0,1]$ and $n \ge 1$.
(a) Compute the pointwise limit $\lim_{n\to\infty} f_n(x)$ for $x \in [0,1]$. Is the convergence uniform?
(b) Compute
$$\lim_{n\to\infty} \int_0^1 f_n(x)\,dx \quad \text{and} \quad \int_0^1 \lim_{n\to\infty} f_n(x)\,dx.$$
Are they equal?


4. Let
$$f_n(x) = \frac{x}{1 + n^2 x^2}$$
for $n \ge 1$ and $x \in [-1,1]$. Prove that
$$|f_n(x)| \le \frac{1}{2n} \quad \forall\, n \ge 1 \text{ and } \forall\, x \in [-1,1].$$
Deduce that $\lim_{n\to\infty} f_n = 0$ uniformly on $[-1,1]$.

5. With $f_n(x)$ as in the previous exercise, show that
$$f_n'(x) = \frac{1 - n^2 x^2}{(1 + n^2 x^2)^2}.$$
Deduce that
$$\lim_{n\to\infty} f_n'(x) = \begin{cases} 1, & x = 0, \\ 0, & 0 < |x| \le 1. \end{cases}$$
Deduce that $f_n'(x)$ does not converge uniformly on $[-1,1]$.

1.7 Power Series

With $x$ a real variable and $c$ a real number, an infinite series of the form
$$\sum_{n=0}^{\infty} a_n (x - c)^n = a_0 + a_1(x - c) + a_2(x - c)^2 + \cdots$$
is called a power series. Such series are ubiquitous in mathematics, with applications ranging from approximation theory to the solution of differential equations. As we shall see later, they play a vital role in the development of complex analysis.
There is no loss of generality if we consider series of the form
$$\sum_{n=0}^{\infty} a_n x^n \tag{1.4}$$
as can be seen by a simple change of variable. Thus, for the sake of elegance and simplicity, we will formulate our results with $c = 0$.
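By a simple application of the root test to $\sum |a_n x^n|$, the series converges (absolutely) if
$$\limsup_{n\to\infty} |a_n|^{1/n} |x| < 1.$$
In other words, the series converges for $|x| < R$ where
$$R = \frac{1}{\limsup_{n\to\infty} |a_n|^{1/n}}, \tag{1.5}$$
where we understand $1/0$ to be infinity and $1/\infty$ to be zero. The root test also shows that for $|x| > R$ the terms $a_n x^n$ do not tend to zero, so the series diverges there. This motivates the definition of the radius of convergence as the largest number $R$ such that the series (1.4) converges for $|x| < R$; it is given by the formula (1.5).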
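Formula (1.5) can be probed numerically, though only heuristically: a single large index cannot in general detect a limsup. The Python sketch below (helper names are our own) evaluates $|a_n|^{-1/n}$ at one large $n$, working with logarithms to avoid overflow. For $a_n = \binom{2n}{n}$ (compare Exercise 4 below) the estimates approach $1/4$; for $a_n = 1/n!$ they grow without bound, consistent with $R = \infty$:

```python
import math


def radius_estimate(log_abs_a, n: int) -> float:
    """Heuristic estimate of R = 1 / limsup |a_n|^(1/n) from the single index n.

    log_abs_a(n) must return log|a_n|; logarithms avoid overflow for huge a_n.
    """
    return math.exp(-log_abs_a(n) / n)


def log_central_binomial(n: int) -> float:
    """log of a_n = binom(2n, n); the expected radius is 1/4."""
    return math.log(math.comb(2 * n, n))


def log_inverse_factorial(n: int) -> float:
    """log of a_n = 1/n!; the expected radius is infinite."""
    return -math.lgamma(n + 1)


for n in (10, 100, 1000):
    print(n, radius_estimate(log_central_binomial, n), radius_estimate(log_inverse_factorial, n))
```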
An important application of the theory of power series is the expansion of a function as a Taylor series. One can also view the theory as giving polynomial approximations to $C^\infty$ functions. The idea is simple enough and is an application of repeated integration by parts.
Without any loss of generality, suppose that $f$ is a $C^\infty$ function defined on $[-1,1]$. By the fundamental theorem of calculus
$$f(x) - f(0) = \int_0^x f'(t)\,dt.$$
Integrating by parts, we deduce
$$f(x) - f(0) = \Big[ f'(t)(t - x) \Big]_{t=0}^{t=x} - \int_0^x f''(t)(t - x)\,dt.$$
In other words,
$$f(x) = f(0) + f'(0)x + \int_0^x f''(t)(x - t)\,dt.$$
The key observation here is that a primitive of $1$ is $t + C$ for any constant $C$, and we have chosen the constant to be $-x$ in this calculation. Iterating this procedure, it is now evident that we have the following theorem.
Theorem 1.18 (Taylor’s theorem) Suppose that $f$ is a function defined on an open interval $I$ and that $c$ is a point in $I$. Suppose that $f^{(n+1)}$ exists and is continuous on $I$. Then,
$$f(x) = \sum_{k=0}^{n} \frac{f^{(k)}(c)}{k!}(x - c)^k + R_n(x),$$
where
$$R_n(x) = \frac{1}{n!} \int_c^x (x - t)^n f^{(n+1)}(t)\,dt. \tag{1.6}$$
Moreover, there is a point $\xi$ between $c$ and $x$ (in particular, $\xi \in I$) such that
$$R_n(x) = \frac{f^{(n+1)}(\xi)}{(n+1)!}(x - c)^{n+1}. \tag{1.7}$$

Proof It is clear that our discussion preceding the statement of the theorem (replacing zero by $c$) leads to the first assertion. Indeed,
$$f(x) - f(c) = \int_c^x (x - t)^0 f'(t)\,dt = (x - c) f'(c) + \int_c^x (x - t) f''(t)\,dt,$$
by the fundamental theorem of calculus and an integration by parts. Iterating this process $n$ times leads to the first assertion with $R_n(x)$ given by (1.6). For the estimation of the remainder, let us suppose first that $c < x$ (the case $c > x$ being similar), and put
$$m = \inf\{f^{(n+1)}(t) : c \le t \le x\}, \qquad M = \sup\{f^{(n+1)}(t) : c \le t \le x\}.$$
Then, since $(x - t)^n \ge 0$ for $c \le t \le x$,
$$\frac{m}{n!} \int_c^x (x - t)^n\,dt \le R_n(x) \le \frac{M}{n!} \int_c^x (x - t)^n\,dt.$$
The integral is easily evaluated to be
$$\frac{(x - c)^{n+1}}{n+1}.$$
Consequently,
$$m \le \frac{R_n(x)\,(n+1)!}{(x - c)^{n+1}} \le M.$$
By the intermediate value theorem applied to the continuous function $f^{(n+1)}$ on $[c,x]$, we see that there is a $\xi \in [c,x]$ such that (1.7) holds. This completes the proof.
It is worth remarking that in Theorem 1.18, the polynomial
$$\sum_{k=0}^{n} \frac{f^{(k)}(c)}{k!}(x - c)^k$$
is called the $n$th-order Taylor polynomial. Equation (1.7) is called the Lagrange form of the remainder. For $n = 0$, the theorem reduces to the usual mean value theorem.
Theorem 1.18 allows us to derive polynomial approximations of some familiar functions such as $e^x$ and the trigonometric functions $\sin x$ and $\cos x$.

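Example 1.3 We have for $x \in (0, 2\pi)$,
$$\left| \sin x - \sum_{k=0}^{n} \frac{(-1)^k x^{2k+1}}{(2k+1)!} \right| \le \frac{|x|^{2n+2}}{(2n+2)!}.$$
Indeed, the sum on the left is the Taylor polynomial of order $2n+1$ of $\sin x$ about $0$, so by Taylor’s theorem the remainder $R_{2n+1}(x)$ is equal to
$$\sin^{(2n+2)}(\xi)\,\frac{x^{2n+2}}{(2n+2)!}$$
for some $\xi \in (0, 2\pi)$, and the assertion now follows since $|\sin^{(2n+2)}(\xi)| = |\sin \xi| \le 1$.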
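The bound of Example 1.3 is easy to test numerically. In the Python sketch below (function names are illustrative), `sin_taylor` is the Taylor polynomial of order $2n+1$ and `lagrange_bound` is $|x|^{2n+2}/(2n+2)!$; the observed error should never exceed the bound, up to floating-point rounding:

```python
import math


def sin_taylor(x: float, n: int) -> float:
    """Taylor polynomial sum_{k=0}^{n} (-1)^k x^(2k+1)/(2k+1)! of sin about 0."""
    return sum((-1) ** k * x ** (2 * k + 1) / math.factorial(2 * k + 1) for k in range(n + 1))


def lagrange_bound(x: float, n: int) -> float:
    """|x|^(2n+2)/(2n+2)!, which follows from |sin^(2n+2)(xi)| <= 1."""
    return abs(x) ** (2 * n + 2) / math.factorial(2 * n + 2)


for x in (0.5, 2.0, 6.0):
    for n in (1, 3, 8):
        err = abs(math.sin(x) - sin_taylor(x, n))
        print(x, n, err, lagrange_bound(x, n))
```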

Exercises

1. If $P$ is a non-zero polynomial, show that $\lim_{n\to\infty} |P(n)|^{1/n} = 1$.

2. If
$$\sum_{n=0}^{\infty} a_n x^n$$
has radius of convergence $R$, and $P$ and $Q$ are non-zero polynomials such that $Q(n) \neq 0$ for all $n \ge 0$, show that
$$\sum_{n=0}^{\infty} \frac{P(n)}{Q(n)} a_n x^n$$
also has radius of convergence $R$. In particular,
$$\sum_{n=0}^{\infty} n a_n x^{n-1} \quad \text{and} \quad \sum_{n=0}^{\infty} \frac{a_n}{n+1} x^{n+1}$$
both have radius of convergence $R$.

3. Show that the Taylor series of $(1 + x)^r$ when $r$ is a real number is given by
$$\sum_{k=0}^{\infty} \binom{r}{k} x^k,$$
where
$$\binom{r}{k} = \frac{r(r-1)\cdots(r-k+1)}{k!}.$$
Show further that the power series converges for $|x| < 1$.

4. Using the previous exercise, show that the Taylor series of $(1 - 4x)^{-1/2}$ about $x = 0$ is given by
$$\sum_{n=0}^{\infty} \binom{2n}{n} x^n.$$
Show that the series converges for $x \in (-1/4, 1/4)$.

5. Prove that
$$\sum_{n=0}^{\infty} \binom{2n}{n} \frac{(-1)^n}{4^n} = \frac{1}{\sqrt{2}}.$$

6. Let $r$ be a fixed non-negative integer. Show that the power series
$$J_r(x) := \sum_{n=0}^{\infty} \frac{(-1)^n x^{2n+r}}{2^{2n+r}\, n!\, (n+r)!}$$
has radius of convergence $R = \infty$. ($J_r(x)$ is called the Bessel function of the first kind.)

7. With $J_r(x)$ as in the previous exercise, show that $J_r(x)$ is a solution $y(x)$ to Bessel’s differential equation
$$x^2 y'' + x y' + (x^2 - r^2) y = 0.$$
(Bessel functions have many applications in the study of the propagation of electromagnetic waves through cylindrical waveguides.)

8. (Generalized mean value theorem for integrals) If $f$ and $g$ are two continuous functions on an interval $[a,b]$ and if $f$ is non-negative there, show that there is a $\xi \in [a,b]$ such that
$$\int_a^b f(t) g(t)\,dt = g(\xi) \int_a^b f(t)\,dt.$$

9. (Second mean value theorem for integrals) Let $f(x)$ be a bounded, monotonic decreasing, non-negative, differentiable function on $[a,b]$ and let $g(x)$ be a bounded integrable function. Show that for some $\xi \in [a,b]$, we have
$$\int_a^b f(x) g(x)\,dx = f(a) \int_a^{\xi} g(x)\,dx.$$
[Hint: Define the function
$$G(t) = \int_a^t g(x)\,dx.$$
Then integrate
$$\int_a^b f(x) g(x)\,dx$$
by parts and apply the previous exercise.]
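Exercise 7 above can be sanity-checked numerically, though this is not a proof: truncate the series for $J_r$, differentiate term by term, and evaluate the left-hand side of Bessel's equation. The truncation at 40 terms and the helper names are assumptions made for this sketch; for moderate $x$ the residual should be at rounding-error level. The comparison value $J_0(1) \approx 0.7651976865579666$ is the standard tabulated value:

```python
import math


def bessel_J_and_derivatives(r: int, x: float, terms: int = 40) -> tuple[float, float, float]:
    """Evaluate truncated J_r, J_r', J_r'' at x != 0 by term-by-term differentiation."""
    y = dy = d2y = 0.0
    for n in range(terms):
        c = (-1) ** n / (2 ** (2 * n + r) * math.factorial(n) * math.factorial(n + r))
        p = 2 * n + r  # exponent of x in the n-th term
        y += c * x ** p
        dy += c * p * x ** (p - 1)
        d2y += c * p * (p - 1) * x ** (p - 2)
    return y, dy, d2y


def bessel_residual(r: int, x: float) -> float:
    """x^2 y'' + x y' + (x^2 - r^2) y for the truncated series; should be close to 0."""
    y, dy, d2y = bessel_J_and_derivatives(r, x)
    return x * x * d2y + x * dy + (x * x - r * r) * y


for r in (0, 1, 2):
    print(r, [bessel_residual(r, x) for x in (0.5, 1.5, 3.0)])
print(bessel_J_and_derivatives(0, 1.0)[0])
```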

1.8 Metric Spaces and Euclidean Spaces

The construction of the real numbers from the set of rational numbers used two fundamental ideas: the concept of a metric and Cauchy sequences. It is convenient to define both of these ideas in a more general context.
Given a set $X$, a pseudo-metric for $X$ is a function $d : X \times X \to \mathbb{R}_+$, the set of non-negative real numbers, satisfying the following properties:
1. $d(x,x) = 0$ for all $x \in X$;
2. $d(x,y) = d(y,x)$ for all $x, y \in X$ (symmetry);
3. $d(x,z) \le d(x,y) + d(y,z)$ for all $x, y, z \in X$ (triangle inequality).
If in addition $d(x,y) = 0$ implies $x = y$, then $d$ is called a metric. We sometimes write $(X, d)$ to indicate either the pseudo-metric space or metric space accordingly.
The usual absolute value $|x - y|$ for $x, y \in \mathbb{R}$ defines a metric and, more generally, for $m$-dimensional space $\mathbb{R}^m$, we define the distance between $x = (x_1, \ldots, x_m)$ and $y = (y_1, \ldots, y_m)$ as
$$|x - y| := \sqrt{\sum_{j=1}^{m} (x_j - y_j)^2}.$$
Occasionally, we also use the notation $\|x\|$ to denote the length of the vector $x$. The fact that this distance function satisfies the triangle inequality is not totally trivial and is equivalent to the Cauchy–Schwarz inequality. (Note that there is no “t” in the spelling of Schwarz here.) Essentially, we need to show that
$$|x + y| \le |x| + |y|, \quad \forall\, x, y \in \mathbb{R}^m. \tag{1.8}$$

It is convenient to introduce the dot product (sometimes called the inner product):
$$x \cdot y := \sum_{j=1}^{m} x_j y_j. \tag{1.9}$$

The reader can verify that the distributive property holds: $x \cdot (y + z) = x \cdot y + x \cdot z$. Then, it is easily seen that $|x| = \sqrt{x \cdot x}$, so that, after squaring, the triangle inequality (1.8) is equivalent to
$$(x + y) \cdot (x + y) \le x \cdot x + y \cdot y + 2|x||y|.$$
Since $(x + y) \cdot (x + y) = x \cdot x + 2\, x \cdot y + y \cdot y$, we are reduced to showing
$$x \cdot y \le |x||y|,$$
which is the Cauchy–Schwarz inequality.


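This ubiquitous Cauchy–Schwarz inequality is easily proved as follows. Clearly, since $(x_j - y_j)^2 \ge 0$,
$$2 x_j y_j \le x_j^2 + y_j^2.$$
The left-hand side is unchanged if, for any $\lambda \neq 0$, we replace $x_j$ by $\lambda x_j$ and $y_j$ by $y_j/\lambda$, so we get
$$2 x_j y_j \le \lambda^2 x_j^2 + \lambda^{-2} y_j^2.$$
We sum this over $j = 1$ to $m$ to deduce
$$2 \sum_{j=1}^{m} x_j y_j \le \lambda^2 \sum_{j=1}^{m} x_j^2 + \lambda^{-2} \sum_{j=1}^{m} y_j^2. \tag{1.10}$$
We may assume that $x$ and $y$ are non-zero, the inequality being trivial otherwise, and we choose $\lambda$ so as to minimize the right-hand side. Thus, setting
$$\lambda^4 = \left( \sum_{j=1}^{m} y_j^2 \right) \left( \sum_{j=1}^{m} x_j^2 \right)^{-1},$$
the right-hand side of (1.10) becomes $2|x||y|$, which gives the result.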
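The choice of $\lambda$ in this proof can be checked numerically. The Python sketch below (the helper `cs_lambda_bound` is our own illustrative name; it assumes non-zero vectors of equal length) computes both sides of (1.10) with $\lambda^4 = \sum y_j^2 / \sum x_j^2$ and confirms that the right-hand side equals $2|x||y|$:

```python
import math
import random


def cs_lambda_bound(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (2 x.y, right-hand side of (1.10)) with lambda^4 = sum y_j^2 / sum x_j^2."""
    sx = sum(t * t for t in x)
    sy = sum(t * t for t in y)
    lam2 = math.sqrt(sy / sx)                   # lambda^2
    lhs = 2 * sum(a * b for a, b in zip(x, y))
    rhs = lam2 * sx + sy / lam2                 # lambda^2 sum x_j^2 + lambda^-2 sum y_j^2
    return lhs, rhs


random.seed(0)
x = [random.uniform(-1, 1) for _ in range(4)]
y = [random.uniform(-1, 1) for _ in range(4)]
lhs, rhs = cs_lambda_bound(x, y)
# With this lambda the right-hand side collapses to 2|x||y|.
print(lhs, rhs, 2 * math.hypot(*x) * math.hypot(*y))
```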


Yet another proof of the Cauchy–Schwarz inequality can be deduced from the
geometric meaning of the dot product which is a simple consequence of the cosine
law that is in turn derived from the celebrated Pythagorean theorem.
The oldest source of the theorem seems to be rooted in the Sulva Sutras of Baudhayana dated as being written around 800 BCE. It has been suggested by many historians that Pythagoras, who was a contemporary of the Buddha and Lao-Tzu, learned of the theorem from his travels in India and later formed a religious and mystical cult around the theorem. In his book, “The Message of Plato”, Urwick writes that Pythagoras may have actually been an Indian teacher of Vedanta philosophy, “whose very name was only the Greek form of the Indian title Pitta Guru, or Father-teacher.”² Some Indian connection seems evident from all written accounts of Pythagoras and his school with its belief in reincarnation and its insistence on vegetarianism and moral practices as a means for attaining higher knowledge. In his research article highlighting the influence of Indian philosophy on Greek thought, Marlow writes that the philosophy of Pythagoras and Plato “is as unlike anything in Greek thought as it is like the Hindu mysticism of the Upanisads.”³

[Fig. 1.1 Bhaskaracharya’s proof of the Pythagorean theorem: right triangle $ABC$ with the perpendicular $CD$ to $AB$; labels $a$, $b$, $x$, $c - x$ and $\theta$.]
The Indian penchant for abstraction, with its legendary discovery of zero and the
decimal system, demonstrates a striking difference between the Greek geometric (and
visual or sensory) approach to mathematics and the Indian abstract (or transcendental)
approach. This is evidenced by the short and elegant proof of the Pythagorean theorem
by Bhaskaracharya (1114–1185 CE). By contrast, Euclid’s proof constructs squares
on each side and proceeds to subdivide them, comparing them with areas of numerous
triangles on each side. The proof is more geometric and visual.
Here is Bhaskara’s short proof of the Pythagorean theorem. Consider the right-angled triangle $ABC$ with the right angle at $C$ along with the perpendicular $CD$ to $AB$ (see Fig. 1.1). The triangles $ABC$, $ACD$ and $CBD$ are similar.
Thus, comparing the smaller triangles to the big triangle $ABC$, we get
$$\cos\theta = \frac{c - x}{b} = \frac{b}{c},$$
so that $b^2 = c^2 - cx$. Similarly,
$$\sin\theta = \frac{x}{a} = \frac{a}{c},$$
so that $a^2 = cx$. Putting these two equations together gives us the familiar Pythagorean theorem:
$$c^2 = a^2 + b^2,$$
or alternatively,
$$c = \sqrt{a^2 + b^2}.$$

² See p. 14 of The Message of Plato by Edward J. Urwick, Methuen and Company Limited, 36 Essex Street W.C., London, 1920.
³ See p. 39 of A.N. Marlow, Hinduism and Buddhism in Greek Philosophy, Philosophy East and West, Vol. 4, No. 1 (Apr., 1954), pp. 35–45.



The Pythagorean theorem lies at the heart of measuring distance in Euclidean space. An easy induction to $n$ dimensions shows that the length of the vector $(x_1, x_2, \ldots, x_n)$ in $\mathbb{R}^n$ is
$$\sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}.$$

But the concept of a vector was slow in coming. William Rowan Hamilton (1805–1865) is said to be the first mathematician who defined the notion of a vector in the context of developing his theory of quaternions. Throughout his life, he was obsessed with developing his quaternionic theory and in doing so, was led to give a precise definition of a vector. Both vectors and quaternions have revolutionized mathematics, but it is the idea of a vector that has had the more profound impact. Hamilton’s work, however, was confined to four dimensions. It was Hermann Grassmann (1809–1877) who gave us the modern version of a vector in higher-dimensional space.
Grassmann, interestingly, never had an academic position in a university. He was
a high-school teacher who also had expertise in languages. In particular, he was a
Sanskrit scholar who compiled the first translation of the Rig Veda into German.
It was his mastery of Sanskrit that inspired him to introduce the term “matrix”
in mathematics. In doing this, he was fully aware of the Latin roots, mater and
the Sanskrit root matr, both signifying “mother” or more accurately, “the womb,”
because he felt that the theory of matrices is the “womb of mathematics” from which
everything emerges.
Grassmann saw mathematics more as a “theory of forms” than as a “theory of measurement.” It would be accurate to say that the modern viewpoint is that mathematics is both. But Grassmann’s emphasis on forms led to fundamental abstractions, namely the concepts of vectors and matrices.
The concept of a vector is inevitable if one wants to study motion in three-dimensional space. In fact, it would be fair to say that multivariable calculus arose partly from the need to study three-dimensional motion, just as one-variable calculus arose to describe linear motion, or motion in the plane. Forces, for example, are represented by vectors: they have a magnitude and a direction. For the novice, the formal operations with vectors and matrices seem ad hoc. A familiar case in point is the dot product of two vectors, which lies at the heart of the definition of matrix multiplication:
$$a = (a_1, a_2, \ldots, a_n), \quad b = (b_1, b_2, \ldots, b_n), \quad a \cdot b = a_1 b_1 + a_2 b_2 + \cdots + a_n b_n.$$

Often, in the development of mathematics, we forget the physical origins of concepts and present these ideas very abstractly, and thus lose the visual aspect of the theory as stressed by Grassmann.
The familiar cosine law, which generalizes the Pythagorean theorem, motivates the definition of the dot product. Indeed, given two vectors $a$ and $b$, let $\theta$ be the angle between them. Let us consider first the simple case where the vectors are two-dimensional. Without any loss of generality, we may choose coordinates in the plane spanned by these two vectors as indicated in Fig. 1.2. The cosine law is easily deduced by calculating the distance from the origin to $(a - b\cos\theta, b\sin\theta)$. We find
$$c^2 = (a - b\cos\theta)^2 + (b\sin\theta)^2 = a^2 + b^2 - 2ab\cos\theta.$$

[Fig. 1.2 Cosine law: points $(0,0)$, $(a,0)$ and $(a - b\cos\theta, b\sin\theta)$; labels $c$, $b$, $\theta$, $\varphi$.]

From Fig. 1.2, the student can easily see that the vector sum $a + b$ has coordinates
$$(a + b\cos\varphi, b\sin\varphi).$$
Since $\theta + \varphi = \pi$, we find, either by looking at the graphs of the cosine and sine functions or by using the addition formulas, that
$$a + b\cos\varphi = a - b\cos\theta, \qquad b\sin\varphi = b\sin\theta,$$
and our claim is now evident.


Thus, given two vectors a and b in R2 with angle θ between them (see Fig. 1.3),
the length of the vector c = b − a is by the cosine law,

||b − a||2 = ||b||2 + ||a||2 − 2||a|| ||b|| cos θ, (1.11)

since we may choose, without any loss of generality, our coordinate x-axis along the
vector a.
On the other hand, if $\mathbf{a} = (a_1, a_2)$ and $\mathbf{b} = (b_1, b_2)$, then $\mathbf{b} - \mathbf{a} = (b_1 - a_1, b_2 - a_2)$, so that by the Pythagorean theorem, we have

$$\|\mathbf{b} - \mathbf{a}\|^2 = (b_1 - a_1)^2 + (b_2 - a_2)^2 = b_1^2 + b_2^2 + a_1^2 + a_2^2 - 2(a_1 b_1 + a_2 b_2). \tag{1.12}$$

The right-hand side of (1.12) is

$$\|\mathbf{b}\|^2 + \|\mathbf{a}\|^2 - 2(\mathbf{a} \cdot \mathbf{b}).$$



Fig. 1.3 Dot product

Comparing this with (1.11) gives

$$\mathbf{a} \cdot \mathbf{b} = \|\mathbf{a}\|\,\|\mathbf{b}\|\cos\theta,$$

which is the visual interpretation of the dot product.


This is easily generalized to higher dimensions. Two linearly independent
vectors determine a plane, and the lengths of the two vectors and the angle between them are
independent of the coordinatization. Hence, if θ is the angle between the two vectors
a and b, then

$$\mathbf{a} \cdot \mathbf{b} = \|\mathbf{a}\|\,\|\mathbf{b}\|\cos\theta,$$

and the Cauchy–Schwarz inequality $|\mathbf{a} \cdot \mathbf{b}| \le \|\mathbf{a}\|\,\|\mathbf{b}\|$ is now immediate, since $|\cos\theta| \le 1$.
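The identity a · b = ||a|| ||b|| cos θ and the Cauchy–Schwarz inequality can be checked numerically. The Python sketch below is an illustration under our own assumptions:

- In $\mathbb{R}^2$, it measures θ without using the dot product, taking the difference of the polar angles returned by `math.atan2`.
- It then tests the inequality on random vectors in $\mathbb{R}^5$, allowing a small floating-point tolerance.

The sample vectors are arbitrary.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


# In R^2, measure the angle between a and b without using the dot product:
# take the difference of their polar angles and reduce it to [0, pi].
a, b = (3.0, 1.0), (-1.0, 2.0)
theta = abs(math.atan2(a[1], a[0]) - math.atan2(b[1], b[0]))
theta = min(theta, 2 * math.pi - theta)
print(dot(a, b), norm(a) * norm(b) * math.cos(theta))

# Cauchy-Schwarz |u . v| <= ||u|| ||v|| for random vectors in R^5,
# allowing a tiny tolerance for floating-point rounding.
random.seed(0)
for _ in range(1000):
    u = [random.uniform(-1, 1) for _ in range(5)]
    v = [random.uniform(-1, 1) for _ in range(5)]
    assert abs(dot(u, v)) <= norm(u) * norm(v) + 1e-12
print("Cauchy-Schwarz held in all 1000 trials")
```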


In physics, the dot product arises in many contexts such as giving a mathematical
description of work. The physical definition of “work” is the dot product of force
and displacement. In mathematical notation,

W = F · d.

In an introductory course in physics, one often confines the study of motion to a
single dimension. However, motion is three-dimensional, and the notions of vectors
and dot products are inevitable. For example, the kinetic energy of a particle with
mass m and velocity v is often defined as $\frac{1}{2} m v^2$. However, if the particle is moving
in space and velocity is seen as a vector v, then the kinetic energy is

$$\frac{1}{2} m (\mathbf{v} \cdot \mathbf{v}).$$

Energy is a scalar and has no direction, whereas velocity, being a vector, does.

From the geometric meaning of the dot product, we see that two vectors x and y are
orthogonal if and only if their dot product is zero. It is important to keep the geometric
meaning of vectors in mind. It adds another dimension to our understanding.
It is convenient to recall here the cross-product of two vectors in $\mathbb{R}^3$. Formally,
given two vectors

$$\mathbf{a} = (a_1, a_2, a_3), \qquad \mathbf{b} = (b_1, b_2, b_3)$$

in $\mathbb{R}^3$, the cross-product is by definition

$$\mathbf{a} \times \mathbf{b} := (a_2 b_3 - a_3 b_2,\ a_3 b_1 - a_1 b_3,\ a_1 b_2 - a_2 b_1).$$

The formal definition lacks any hint of its importance and meaning. In physics, the
concept arises to describe torque. If a represents the displacement of a particle from
a fixed point to a movable point, and b is the force applied at the movable point,
then the cross-product a × b is the torque exerted by the force about the fixed point.
The above opaque definition is better remembered using determinants. One writes
symbolically

$$\mathbf{a} \times \mathbf{b} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{vmatrix},$$

where i, j, k are the unit vectors (1, 0, 0), (0, 1, 0) and (0, 0, 1) respectively. Expanding
the determinant along the first row leads to the earlier definition, so that the “determinant
expression” serves as a useful mnemonic for the cross-product. A straightforward
but tedious computation shows that

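$$|\mathbf{a} \times \mathbf{b}|^2 = (a_1^2 + a_2^2 + a_3^2)(b_1^2 + b_2^2 + b_3^2) - (a_1 b_1 + a_2 b_2 + a_3 b_3)^2,$$

so that, by the geometric meaning of the dot product,

$$|\mathbf{a} \times \mathbf{b}|^2 = |\mathbf{a}|^2 |\mathbf{b}|^2 - |\mathbf{a}|^2 |\mathbf{b}|^2 \cos^2\theta. \tag{1.13}$$

Consequently, since $\sin\theta \ge 0$ for $0 \le \theta \le \pi$,

$$|\mathbf{a} \times \mathbf{b}| = |\mathbf{a}|\,|\mathbf{b}|\sin\theta,$$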
where θ is the angle between the two vectors a and b. This formula implies a geo-
metric interpretation of the magnitude of the cross-product vector. It is the area of the
parallelogram spanned by the two vectors a and b. The direction of the cross-product
is given by the familiar “right-hand rule,” where if you align the fingers of your right
hand along the vector a and bend your fingers around in the direction of rotation
from a to b, your thumb will point in the direction of a × b. We also note that the
cross-product of a and b is zero if and only if they are parallel.
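The following short Python sketch implements the cross-product directly from the component formula above. The function names and the sample pair of vectors are our own. It checks three things:

- a × b is orthogonal to both factors;
- the identity preceding (1.13) holds;
- i × j = k, in line with the right-hand rule.

```python
def cross(a, b):
    """Cross-product of a and b in R^3, from the component formula."""
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2, a3 * b1 - a1 * b3, a1 * b2 - a2 * b1)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


a, b = (1, 2, 3), (4, 5, 6)
c = cross(a, b)
print(c)                    # (-3, 6, -3)
print(dot(c, a), dot(c, b))  # 0 0: a x b is orthogonal to a and to b

# |a x b|^2 = |a|^2 |b|^2 - (a . b)^2, the identity behind (1.13)
print(dot(c, c), dot(a, a) * dot(b, b) - dot(a, b) ** 2)  # 54 54

# Right-hand rule: i x j = k
print(cross((1, 0, 0), (0, 1, 0)))  # (0, 0, 1)
```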
With this interlude on the dot product and cross-product, we return to our general
discussion and consider sequences in $\mathbb{R}^m$. It will be clear that much of this discussion
applies to a general metric space. For the sake of clarity, we will confine our attention
to a discussion of sequences in $\mathbb{R}^m$. They will be sequences of vectors $x_n$, and we can
write

$$x_n = (x_{n1}, \ldots, x_{nm}).$$

A sequence is said to converge to $L \in \mathbb{R}^m$ if, given ε > 0, there exists N such that

$$|x_n - L| < \varepsilon \quad \forall\, n \ge N.$$

We then write $\lim_{n\to\infty} x_n = L$. The reader can easily verify that convergence of a
sequence of vectors is equivalent to the convergence of the component sequences
(see Exercise 1).
The notion of a Cauchy sequence in $\mathbb{R}^m$ is analogous (as it is in any metric space).
The student can show again that a sequence in $\mathbb{R}^m$ is Cauchy if and only if it converges
(see Exercise 3).
A sequence $x_n$ in $\mathbb{R}^m$ is said to be bounded if there is a number M such that
$|x_n| \le M$ for all n. Again, the reader can verify that a sequence in $\mathbb{R}^m$ is bounded if
and only if each component sequence is bounded (Exercise 2).
The analogue of the Bolzano–Weierstrass theorem goes through.
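Theorem 1.19 Every bounded sequence in $\mathbb{R}^m$ has a convergent subsequence.

Proof Let $x_n$ be a bounded sequence in $\mathbb{R}^m$, so that each component sequence is bounded (Exercise 2). We first apply Theorem 1.5 to the
sequence $x_{n1}$ in $\mathbb{R}$ and obtain a subsequence $x_{n_k}$ whose first components $x_{n_k,1}$ converge. We again apply
Theorem 1.5 to the bounded sequence $x_{n_k,2}$ to get a further subsequence whose second components converge. Its first components still converge, since they form a subsequence of a convergent sequence.
Admittedly, the notation is cumbersome, but it is clear that after m applications of
Theorem 1.5 in this way, we derive a subsequence all of whose component sequences converge. This is a convergent subsequence of vectors (Exercise 1), as desired. ∎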


Exercises

1. Writing $x_n = (x_{n1}, \ldots, x_{nm})$ and $L = (L_1, \ldots, L_m)$, show that $\lim_{n\to\infty} x_n = L$ if and only if

$$\lim_{n\to\infty} x_{nj} = L_j, \qquad 1 \le j \le m.$$

2. Show that a sequence $x_n$ is bounded if and only if each component sequence $x_{nj}$ is bounded for $1 \le j \le m$.

3. Show that a sequence $x_n$ in $\mathbb{R}^m$ is Cauchy if and only if it converges.

4. (Cauchy–Schwarz inequality for integrals) Let f and g be integrable on [a, b]. Prove that

$$\left( \int_a^b f(t) g(t)\, dt \right)^2 \le \int_a^b f(t)^2\, dt \int_a^b g(t)^2\, dt.$$

5. A contraction on $\mathbb{R}^d$ is a function $f : \mathbb{R}^d \to \mathbb{R}^d$ such that there is a number K < 1 satisfying

$$|f(u) - f(v)| \le K |u - v|,$$

for all $u, v \in \mathbb{R}^d$. Prove that f is continuous.

6. Let f be a contraction as in the previous exercise. The following exercises establish the contraction mapping theorem: any contraction has a unique fixed point.
(a) Prove that f has at most one fixed point. [Hint: suppose there were two.]
(b) Let $x_1$ be any point in $\mathbb{R}^d$. Define a sequence recursively by $x_{n+1} = f(x_n)$ for n ≥ 1. Show that for m ≥ 2,

$$|x_{m+1} - x_m| \le K^{m-1} |x_2 - x_1|.$$

(c) Obtain an estimate for $|x_n - x_m|$ in terms of K and $|x_2 - x_1|$ for any two positive integers m, n.
(d) Show that $x_n$ is a Cauchy sequence in $\mathbb{R}^d$.
(e) Show that the vector

$$x := \lim_{n\to\infty} x_n$$

is a fixed point of f.
7. For each number a, put $f_a(x) = ax$. For which numbers a is $f_a$ a contraction on $\mathbb{R}$? When $f_a$ is a contraction, what is the fixed point?

8. For what intervals [0, r] with r ≤ 1 is the map $f : [0, r] \to [0, r]$ given by $x \mapsto x^2$ a contraction?
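The iteration of Exercise 6 is easy to try numerically. In the following Python sketch, the map f(u) = (cos(u₂)/2, sin(u₁)/2) on $\mathbb{R}^2$ is a hypothetical example of our own. It is a contraction with K = 1/2, since |cos s − cos t| ≤ |s − t| and |sin s − sin t| ≤ |s − t|. The stopping tolerance and the iteration cap are arbitrary choices.

```python
import math


def iterate_to_fixed_point(f, x1, tol=1e-12, max_iter=10_000):
    """Iterate x_{n+1} = f(x_n), starting from x1, until successive
    iterates are closer than tol in the Euclidean distance."""
    x = x1
    for _ in range(max_iter):
        x_next = f(x)
        if math.dist(x_next, x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("no convergence within max_iter steps")


def f(u):
    # Hypothetical contraction on R^2 with K = 1/2.
    return (0.5 * math.cos(u[1]), 0.5 * math.sin(u[0]))


p = iterate_to_fixed_point(f, (10.0, -7.0))
print(p, math.dist(f(p), p))  # the fixed point, and |f(p) - p| (tiny)
```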

1.9 The Heine–Borel Theorem

Much of our discussion regarding analysis on the real line extends to $\mathbb{R}^m$. For example,
in $\mathbb{R}^2$, the closed rectangle [a, b] × [c, d] is

$$\{(x, y) : x \in [a, b],\ y \in [c, d]\},$$

and more generally we can speak of closed rectangles in $\mathbb{R}^m$ (even though they are
not rectangles per se, but rather generalizations of them). A similar definition can
be made for open rectangles (a, b) × (c, d) in $\mathbb{R}^2$ and analogously in higher
dimensions. A subset $U \subset \mathbb{R}^m$ is called open if for each point x ∈ U there is an open
rectangle A such that x ∈ A ⊂ U. Equivalently, for each x ∈ U there is an
open ball $B_r(x)$ contained in U. A subset C of $\mathbb{R}^m$ is closed if its complement is

open. A set A is said to be compact if every open cover of A can be reduced to a
finite subcover. The reader should be familiar with the following theorem of Heine
and Borel.
The theorem has a history somewhat reminiscent of that of the Bolzano–Weierstrass
theorem: it was first discovered in 1872 by H.E. Heine (1821–1881) and later
rediscovered in 1895 by Émile Borel (1871–1956).⁴
Theorem 1.20 (Heine–Borel theorem) The closed interval [a, b] is compact.
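Proof If O is an open cover of [a, b], let A be the set

$$\{x : a \le x \le b \text{ and } [a, x] \text{ is covered by a finite number of open sets in } O\}.$$

Clearly a ∈ A and A is bounded above by b. We want to show that b ∈ A. We will
do this by showing that the least upper bound α (say) of A lies in A and that α = b.
First, assuming a < b, we have α > a. Indeed, a is contained in some open set U of O, so there is an r > 0 such that the open ball (a − r, a + r) is contained in U. Hence [a, min(a + r/2, b)] is covered by U alone, and min(a + r/2, b) ∈ A.
Since α is the least upper bound of A and α ≤ b, α lies in some open set V of O, and there is an r > 0 such that (α − r, α + r) ⊂ V. As α is the least upper bound of A, there is some x ∈ A with α − r < x ≤ α. Finitely many sets of O cover [a, x]. Together with V, they cover [a, y] for every y ∈ [a, b] with y < α + r, so every such y lies in A. In particular, α ∈ A. If α < b, then min(α + r/2, b) is an element of A larger than α, which contradicts the definition of α. Hence α = b, and so b ∈ A. ∎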
It is not difficult to show that if $A \subset \mathbb{R}^m$ and $B \subset \mathbb{R}^n$ are compact, then $A \times B \subset \mathbb{R}^{m+n}$ is compact. In particular, a closed rectangle is compact. In fact, in $\mathbb{R}^m$, compact
sets are characterized by the fact that they are closed and bounded (see Exercise 4).

Exercises

1. Show that a closed rectangle is a closed set.

2. If $B \subset \mathbb{R}^m$ is compact and $x \in \mathbb{R}^n$, show that $\{x\} \times B \subset \mathbb{R}^{n+m}$ is compact.

3. If B is compact and O is an open cover of {x} × B, show that there is an open set
$U \subset \mathbb{R}^n$ containing x such that U × B is covered by a finite number of sets in O.

4. Show that a closed and bounded subset of $\mathbb{R}^m$ is compact and, conversely, that every compact subset of $\mathbb{R}^m$ is closed and bounded.

5. In $\mathbb{R}^m$, show that the union of any (even infinite) number of open sets is open.
Prove that the intersection of finitely many open sets is open, and find a counterexample to show how this fails for infinitely many open sets.

6. If A is a closed set that contains every rational number r ∈ [0, 1], show that
[0, 1] ⊂ A.

4 See p. 618 of Boyer [3].



1.10 Vector-Valued Functions

A function f from $\mathbb{R}^n$ to $\mathbb{R}^m$ is called a vector-valued function of n variables. We
write $f : \mathbb{R}^n \to \mathbb{R}^m$, and the component functions are designated $f_i(x)$ for $1 \le i \le m$
and $x = (x_1, \ldots, x_n)$. If n = 1 and f is continuous, then one calls f a curve, so
that the component functions are functions of one variable. (In practice, a curve is
a map from an interval in $\mathbb{R}$ into $\mathbb{R}^m$.) If $f : \mathbb{R}^n \to \mathbb{R}^m$ and m = 1, one calls f a
scalar field. When m = n, we speak of f as a vector field.
If I is an interval [a, b] in $\mathbb{R}$ and $X : I \to \mathbb{R}^m$ is a curve with differentiable component functions
$x_1(t), \ldots, x_m(t)$, it is convenient to view the domain as
a time parameter and the curve as being traced out in m-dimensional space as t moves
from a to b. By the usual Pythagorean theorem and the familiar limit process, the
length of the curve traced out up to time t is

$$s(t) := \int_a^t \sqrt{x_1'(u)^2 + \cdots + x_m'(u)^2}\, du, \tag{1.14}$$

as the reader can easily verify. For example, the circle of radius r is parameterized
by
(x(t), y(t)) = (r cos t, r sin t), 0 ≤ t ≤ 2π.

The circumference of the circle is readily seen to be 2πr by formula (1.14).
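Formula (1.14) can also be approximated numerically. This Python sketch is our own illustration and uses the composite trapezoid rule with an arbitrarily chosen number of subintervals. For the circle of radius r = 3 it recovers the circumference 2πr. Because the speed of this parameterization is constant, the rule is exact up to rounding here.

```python
import math


def arc_length(dx, dy, a, b, n=10_000):
    """Composite trapezoid approximation of (1.14) for a plane curve,
    given the derivative functions dx = x'(t) and dy = y'(t) on [a, b]."""
    def speed(t):
        return math.hypot(dx(t), dy(t))

    h = (b - a) / n
    interior = sum(speed(a + k * h) for k in range(1, n))
    return h * (0.5 * (speed(a) + speed(b)) + interior)


r = 3.0
L = arc_length(lambda t: -r * math.sin(t), lambda t: r * math.cos(t), 0.0, 2 * math.pi)
print(L, 2 * math.pi * r)
```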


A curve, being a function of a single variable, can easily be studied using one-variable calculus. If $r : I \to \mathbb{R}^m$ is a curve C whose component functions are differentiable, then r′(t) is the vector of component derivatives and is often called the
velocity vector or tangent vector. When r′(t) ≠ 0, the vector

$$\mathbf{T}(t) = \frac{r'(t)}{|r'(t)|}$$

is called the unit tangent vector to the curve C. It is again a vector-valued function
of a single variable. The magnitude of the rate of change of the unit tangent vector with
respect to the arc length of the curve is called the curvature of C. More precisely, the
curvature κ is given by

$$\kappa := \left| \frac{d\mathbf{T}}{ds} \right|,$$

where s(t) is the arc length given by (1.14).


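Using the chain rule, one can rewrite this formula in a more amenable form.
Indeed,

$$\mathbf{T}'(t) = \frac{d\mathbf{T}}{dt} = \frac{d\mathbf{T}}{ds}\,\frac{ds}{dt},$$

so that

$$\kappa = \left| \frac{d\mathbf{T}}{ds} \right| = \frac{|\mathbf{T}'(t)|}{\left| \frac{ds}{dt} \right|}.$$

By (1.14),

$$\frac{ds}{dt} = |r'(t)|.$$

Combining these formulas, we obtain

$$\kappa = \frac{|\mathbf{T}'(t)|}{|r'(t)|}.$$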
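The formula κ = |T′(t)|/|r′(t)| lends itself to a numerical check. The Python sketch below approximates both derivatives by central differences with an arbitrarily chosen step h. It is an illustration, not a robust method. We use the helix r(t) = (cos t, sin t, t), a standard example not taken from the text. For it, |r′(t)| = √2 and |T′(t)| = 1/√2, so κ = 1/2.

```python
import math


def curvature(r, t, h=1e-4):
    """Estimate kappa = |T'(t)| / |r'(t)| for a curve r (a function returning
    a tuple of coordinates), using central differences with step h."""
    def deriv(g, s):
        plus, minus = g(s + h), g(s - h)
        return tuple((p - m) / (2 * h) for p, m in zip(plus, minus))

    def length(v):
        return math.sqrt(sum(c * c for c in v))

    def unit_tangent(s):
        v = deriv(r, s)
        speed = length(v)
        return tuple(c / speed for c in v)

    return length(deriv(unit_tangent, t)) / length(deriv(r, t))


def helix(t):
    return (math.cos(t), math.sin(t), t)


print(curvature(helix, 0.7))  # close to 0.5
```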

The notion of a continuous function is easily extended to vector-valued functions
of several variables. Let A be an open subset of $\mathbb{R}^n$. The function $f : A \to \mathbb{R}^m$ is
said to be continuous at a ∈ A if

$$\lim_{x \to a} f(x) = f(a).$$

Continuity at every point of A can also be characterized without using limits: f is continuous on A if and only if
$f^{-1}(U)$ is open for every open set U in $\mathbb{R}^m$. The following important theorem states
that the continuous image of a compact set is compact.
Theorem 1.21 If $f : A \to \mathbb{R}^m$ is continuous and $A \subset \mathbb{R}^n$ is compact, then f(A) is
compact.

Proof Let O be an open cover of f(A). For each open set U of O, there is an open
set $V_U$ such that $f^{-1}(U) = V_U \cap A$, by the continuity of f. The collection of the sets $V_U$
is an open cover of A. Since A is compact, it can be reduced to a finite subcover $V_{U_1}, \ldots, V_{U_k}$ (say). Every point of A then lies in some $f^{-1}(U_i)$, so $U_1, \ldots, U_k$ cover f(A). ∎

Exercises

1. A straight line in $\mathbb{R}^3$ is parameterized by the vector-valued function

$$\mathbf{r}(t) = (at + b,\ ct + d,\ et + f)$$

for some constants a, b, c, d, e, f (with a, c, e not all zero). Show that the curvature of the line is zero.

2. Noting that the circle of radius a in $\mathbb{R}^2$ is parameterized by

$$\mathbf{r}(t) = (a\cos t,\ a\sin t),$$

show that the curvature of the circle of radius a is 1/a.

3. Given a curve r(t), assume that the unit tangent vector to this curve is differentiable. Define the principal unit normal vector N(t) to be

$$\frac{\mathbf{T}'(t)}{|\mathbf{T}'(t)|}.$$

Show that

$$\mathbf{N}(t) = \frac{1}{\kappa}\,\frac{d\mathbf{T}}{ds},$$

where κ denotes the curvature.

4. Under the same conditions as in the previous exercise, show that |r(t)| is constant
if and only if r(t) and r′(t) are orthogonal to each other, for all values of t in the
domain of r(t).

5. If A is a compact subset of $\mathbb{R}^d$ and $f : A \to \mathbb{R}$ is continuous, show that f attains a maximum value and a minimum value on A. In other words, there are points u
and v such that

$$f(u) \le f(x) \le f(v),$$

for all x in A.

1.11 Derivatives of Multivariable Functions

The concept of a derivative from one-variable calculus is generalized to the multidimensional case in several ways: there are the directional derivative, the partial derivatives,
and the Jacobian matrix. The Jacobian emanates from the idea that the derivative is
really a linear transformation. Let us first consider the case $f : \mathbb{R}^n \to \mathbb{R}$. Given a
vector $u \in \mathbb{R}^n$, we can define the directional derivative of f in the direction u at
$x_0$, sometimes denoted $f_u(x_0)$, by the limit

$$\lim_{h \to 0} \frac{f(x_0 + hu) - f(x_0)}{h},$$

provided the limit exists.


In the case that the range space is one-dimensional, it is convenient to consider
partial derivatives. Indeed, if $f : \mathbb{R}^n \to \mathbb{R}$ and $a = (a_1, \ldots, a_n) \in \mathbb{R}^n$, the limit

$$\lim_{h \to 0} \frac{f(a_1, \ldots, a_i + h, \ldots, a_n) - f(a_1, \ldots, a_n)}{h}$$

is called the ith partial derivative of f at a and denoted $D_i f(a)$. We can continue taking partial derivatives, provided they exist. For example, we can consider
$D_j(D_i f)$, sometimes denoted $D_{j,i} f$. There are various theorems that ensure that
$D_{j,i} f = D_{i,j} f$. For instance, if both of these are continuous on an open set containing a, then $D_{i,j} f(a) = D_{j,i} f(a)$ (see Exercise 7). The equality also holds
under weaker hypotheses, which we do not discuss here. The function $D_{i,j} f$ is
called a second-order partial derivative, or a mixed partial derivative when i ≠ j.
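Mixed partial derivatives can be approximated with nested central differences. In the Python sketch below, the function f(x, y) = x²y³ + sin(xy), the point and the step h are hypothetical choices of ours. Both nested difference quotients combine the same four values of f, so their agreement says little by itself. The meaningful comparison is with the exact value $D_{1,2} f = D_{2,1} f = 6xy^2 + \cos(xy) - xy\sin(xy)$.

```python
import math


def f(x, y):
    return x**2 * y**3 + math.sin(x * y)


def D1(g, h=1e-4):
    """Central-difference approximation of the partial derivative in x."""
    return lambda x, y: (g(x + h, y) - g(x - h, y)) / (2 * h)


def D2(g, h=1e-4):
    """Central-difference approximation of the partial derivative in y."""
    return lambda x, y: (g(x, y + h) - g(x, y - h)) / (2 * h)


x0, y0 = 0.6, -1.3
d21 = D2(D1(f))(x0, y0)  # approximates D_2(D_1 f) = D_{2,1} f
d12 = D1(D2(f))(x0, y0)  # approximates D_1(D_2 f) = D_{1,2} f
exact = 6 * x0 * y0**2 + math.cos(x0 * y0) - x0 * y0 * math.sin(x0 * y0)
print(d21, d12, exact)
```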
In the case of functions of two variables f(x, y), it is convenient to write $f_x$ and $f_y$
for $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ respectively. Thus $f_{xy}$ would be $(f_x)_y$, and so on. A similar convention is
sometimes adopted for functions of three or more variables.
The directional derivatives and the partial derivatives are related. Namely, if $e_i$
is the element of the standard basis of $\mathbb{R}^n$ with 1 in the ith component and all other
components zero, then $D_i f(a) = f_{e_i}(a)$.
For functions whose range is contained in $\mathbb{R}$, we can speak about maximum
and minimum values. The reader can verify that if $A \subset \mathbb{R}^n$ is open, $f : A \to \mathbb{R}$ is differentiable, and the maximum value of f occurs at a, then $D_i f(a) = 0$ for every i.
Historically, multivariable calculus emerged in the study of electromagnetism
and in particular, in the derivation of Maxwell’s equations. This is analogous to the
development of one-variable calculus that arose from Newton’s study of motion and
gravitation. Partly motivated by this context and for other practical reasons, the study
of scalar fields has been given detailed attention. Thus, given a function
$f : \mathbb{R}^n \to \mathbb{R}$ whose partial derivatives exist, we define the gradient of f, denoted ∇f, as the function from $\mathbb{R}^n$ to $\mathbb{R}^n$
given by

$$\nabla f = (D_1 f, \ldots, D_n f).$$

Thus, ∇f is a vector field. Points $x \in \mathbb{R}^n$ where ∇f(x) = 0 are called critical points.
If $f : \mathbb{R}^2 \to \mathbb{R}$ is a function f(x, y) of two variables and we have a curve $z(t) = (x(t), y(t)) \in \mathbb{R}^2$, we can study f along this curve, where it can be viewed as a function
of a single variable. That is, we can consider

$$F(t) = f(x(t), y(t)).$$

We want to calculate $F'(t_0)$. Writing $(x(t_0), y(t_0)) = (x_0, y_0)$ and x = x(t), y = y(t), we have from one-variable calculus

$$F'(t_0) = \lim_{t \to t_0} \frac{F(t) - F(t_0)}{t - t_0} = \lim_{t \to t_0} \frac{f(x, y) - f(x_0, y_0)}{t - t_0}.$$

We write the numerator in the last limit as

$$[f(x, y) - f(x_0, y)] + [f(x_0, y) - f(x_0, y_0)].$$

We apply the mean value theorem to each of the bracketed terms:



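$$f(x, y) - f(x_0, y) = f_x(\xi, y)(x - x_0), \qquad f(x_0, y) - f(x_0, y_0) = f_y(x_0, \eta)(y - y_0),$$

for suitable ξ between $x(t_0)$ and x(t), and η between $y(t_0)$ and y(t). Thus,

$$F'(t_0) = \lim_{t \to t_0} \left[ f_x(\xi, y)\,\frac{x - x_0}{t - t_0} + f_y(x_0, \eta)\,\frac{y - y_0}{t - t_0} \right].$$

If x(t) and y(t) are differentiable at $t_0$ and $f_x$, $f_y$ are continuous in a neighborhood of $(x_0, y_0)$, this limit equals

$$f_x(x_0, y_0)\,x'(t_0) + f_y(x_0, y_0)\,y'(t_0).$$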

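In other words,

$$\frac{df}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt}.$$

In general, if $f : \mathbb{R}^n \to \mathbb{R}$ is a function of n variables, we can study the function f
along a curve

$$t \mapsto x(t) = (x_1(t), \ldots, x_n(t))$$

as a function of a single variable, and we have

$$\frac{df}{dt} = \sum_{i=1}^n \frac{\partial f}{\partial x_i}\frac{dx_i}{dt}.$$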

This can be rewritten as a dot product:

$$\frac{df}{dt} = \nabla f(x(t)) \cdot x'(t), \tag{1.15}$$

and we see the analogy to the one-variable version of the chain rule. Thus, (1.15)
is often referred to as the chain rule for functions of several variables, though it is
only a special case of the general chain rule for the derivative of the composition of
functions $f : \mathbb{R}^n \to \mathbb{R}^m$ and $g : \mathbb{R}^m \to \mathbb{R}^r$ (see Exercise 3 at the end of this section).
This observation allows us to prove Taylor's theorem in several variables.
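As a sanity check of (1.15), the following Python sketch compares a symmetric difference quotient of F(t) = f(x(t)) with ∇f(x(t)) · x′(t). The function f(x, y) = x²y + sin y and the curve (cos t, t²) are hypothetical examples. The gradient and the velocity are computed by hand.

```python
import math


def f(x, y):
    return x**2 * y + math.sin(y)


def grad_f(x, y):
    return (2 * x * y, x**2 + math.cos(y))


def curve(t):
    return (math.cos(t), t**2)


def velocity(t):
    return (-math.sin(t), 2 * t)


def F(t):
    return f(*curve(t))


t0, h = 0.8, 1e-6
numeric = (F(t0 + h) - F(t0 - h)) / (2 * h)
chain_rule = sum(g * v for g, v in zip(grad_f(*curve(t0)), velocity(t0)))
print(numeric, chain_rule)
```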
We say a subset S of $\mathbb{R}^n$ is star-shaped with respect to the point c if for every x
in S the line segment joining x to c lies in S.
Lemma 1.2 Let f be a $C^\infty$ function on an open subset U of $\mathbb{R}^n$ which is star-shaped
with respect to a point $c = (c_1, \ldots, c_n)$ in U. Then there are $C^\infty$-functions $g_1, \ldots, g_n$
on U such that

$$f(x) = f(c) + \sum_{i=1}^n (x_i - c_i)\, g_i(x), \qquad g_i(c) = \frac{\partial f}{\partial x_i}(c).$$

Proof Since U is star-shaped with respect to c, we have for any x in U that the line segment

$$c + t(x - c), \qquad 0 \le t \le 1,$$

lies in U. Thus, f(c + t(x − c)) is defined for 0 ≤ t ≤ 1. By the chain rule,

$$\frac{d}{dt} f(c + t(x - c)) = \sum_{i=1}^n (x_i - c_i)\,\frac{\partial f}{\partial x_i}(c + t(x - c)).$$

We integrate both sides with respect to t for 0 ≤ t ≤ 1 to get

$$f(c + t(x - c))\Big|_{t=0}^{t=1} = \sum_{i=1}^n (x_i - c_i) \int_0^1 \frac{\partial f}{\partial x_i}(c + t(x - c))\, dt.$$

Let

$$g_i(x) = \int_0^1 \frac{\partial f}{\partial x_i}(c + t(x - c))\, dt.$$

Then $g_i$ is $C^\infty$, since the integrand is $C^\infty$ in x and t and we may differentiate under the integral sign. We obtain

$$f(x) = f(c) + \sum_{i=1}^n (x_i - c_i)\, g_i(x).$$

Moreover,

$$g_i(c) = \int_0^1 \frac{\partial f}{\partial x_i}(c)\, dt = \frac{\partial f}{\partial x_i}(c).$$

∎
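The construction of the functions $g_i$ in Lemma 1.2 can be imitated numerically. This Python sketch rests on choices of our own:

- f(x₁, x₂) = e^{x₁} cos x₂ on U = $\mathbb{R}^2$, which is star-shaped with respect to every point;
- specific points c and x;
- the midpoint rule to approximate the integral defining $g_i$.

Because of the midpoint rule, the identity of the lemma holds only up to a small discretization error. Python's index 0 corresponds to $g_1$.

```python
import math


def f(x1, x2):
    return math.exp(x1) * math.cos(x2)


def grad_f(x1, x2):
    return (math.exp(x1) * math.cos(x2), -math.exp(x1) * math.sin(x2))


def g(i, x, c, n=2000):
    """Midpoint-rule approximation of g_{i+1}(x): the integral over 0 <= t <= 1
    of the (i+1)-th partial derivative of f at the point c + t(x - c)."""
    total = 0.0
    for k in range(n):
        t = (k + 0.5) / n
        point = [ci + t * (xi - ci) for xi, ci in zip(x, c)]
        total += grad_f(*point)[i]
    return total / n


c, x = (0.0, 0.0), (0.5, 1.2)
rhs = f(*c) + sum((x[i] - c[i]) * g(i, x, c) for i in range(2))
print(f(*x), rhs)                 # nearly equal
print(g(0, c, c), grad_f(*c)[0])  # g_1(c) equals D_1 f(c)
```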

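The lemma shows that the linear polynomial

$$f(c) + \sum_{i=1}^n (x_i - c_i)\,\frac{\partial f}{\partial x_i}(c)$$

is a good approximation to f(x) in a neighborhood of c. Indeed, by the lemma, f(x) differs from this polynomial by $\sum_{i=1}^n (x_i - c_i)(g_i(x) - g_i(c))$. This difference is $o(|x - c|)$ as x → c, because each $g_i$ is continuous. To derive the general Taylor
formula for multivariable functions, it is convenient to introduce the operator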


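$$h \cdot \nabla := \sum_{i=1}^n h_i \frac{\partial}{\partial x_i},$$

where $h = (h_1, \ldots, h_n)$. We will then write

$$(h \cdot \nabla) f(c) := \sum_{i=1}^n h_i \left.\frac{\partial f}{\partial x_i}\right|_{x = c}.$$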

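More generally, we can introduce powers $(h \cdot \nabla)^r$ of this operator. For instance,

$$(h \cdot \nabla)^2 := \sum_{i,j=1}^n h_i h_j \frac{\partial^2}{\partial x_i \partial x_j}.$$

For a function $f : \mathbb{R}^n \to \mathbb{R}$ which has partial derivatives up to order r, we write

$$(h \cdot \nabla)^r f(c) := \left.(h \cdot \nabla)^r f\right|_{x = c}.$$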

With this understanding, we can now state:

Theorem 1.22 (Taylor's theorem with remainder) Let f be a real-valued function
with continuous partial derivatives up to order r + 1 on an open subset U of $\mathbb{R}^n$ which is star-shaped with respect to a point c ∈ U. Then, for every x ∈ U,

$$f(x) = f(c) + \sum_{j=1}^r \frac{((x - c) \cdot \nabla)^j f(c)}{j!} + \frac{((x - c) \cdot \nabla)^{r+1} f(c + \xi(x - c))}{(r + 1)!},$$

for some ξ ∈ (0, 1).

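Proof Fix x ∈ U and put h = x − c. We apply the one-variable Taylor's theorem (see Theorem 1.18) to the function

$$g(t) := f(c + th), \qquad 0 \le t \le 1,$$

which is defined because U is star-shaped with respect to c. Applying the chain rule (1.15) repeatedly, we see that

$$g^{(j)}(t) = (h \cdot \nabla)^j f(c + th),$$

and the result now follows from Theorem 1.18, applied to g and evaluated at t = 1. ∎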
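For the second-order Taylor polynomial, Theorem 1.22 puts the error at order |h|³, so halving h should divide the error by roughly 8. The Python sketch below checks this for a hypothetical example of our own:

- the function f(x, y) = e^x cos y, expanded about c = (0.3, 0.4);
- the exact partial derivatives of f;
- the direction h = (s, −2s).

```python
import math


def f(x, y):
    return math.exp(x) * math.cos(y)


def taylor2(c, h):
    """Second-order Taylor polynomial of f about c, evaluated at c + h:
    f(c) + (h.grad) f(c) + (h.grad)^2 f(c) / 2!, with exact partials of f."""
    x, y = c
    ex, cy, sy = math.exp(x), math.cos(y), math.sin(y)
    fx, fy = ex * cy, -ex * sy
    fxx, fxy, fyy = ex * cy, -ex * sy, -ex * cy
    h1, h2 = h
    first = h1 * fx + h2 * fy
    second = h1 * h1 * fxx + 2 * h1 * h2 * fxy + h2 * h2 * fyy
    return f(x, y) + first + second / 2


c = (0.3, 0.4)
errors = []
for s in (0.1, 0.05, 0.025):
    h = (s, -2 * s)
    errors.append(abs(f(c[0] + h[0], c[1] + h[1]) - taylor2(c, h)))
ratios = [errors[k] / errors[k + 1] for k in range(len(errors) - 1)]
print(errors)
print(ratios)  # each ratio should be roughly 8
```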

It is instructive to review here the situation in the special case n = 2 and m = 1,
that is, for scalar-valued functions of two variables. The reader should be familiar
with the following.
Theorem 1.23 Let f(x, y) be continuous and have continuous partial derivatives
up to order two in an open region A of $\mathbb{R}^2$ containing $(x_0, y_0)$, and suppose that $(x_0, y_0)$ is a critical point of f. A sufficient condition
that $(x_0, y_0)$ is a relative maximum is that

$$0 < \begin{vmatrix} f_{xx}(x_0, y_0) & f_{xy}(x_0, y_0) \\ f_{yx}(x_0, y_0) & f_{yy}(x_0, y_0) \end{vmatrix} \quad \text{and} \quad \frac{\partial^2 f}{\partial x^2}(x_0, y_0) < 0.$$

Proof We use Taylor's theorem for two variables (Theorem 1.22 with r = 1). Since the first-order partial derivatives vanish at the critical point $(x_0, y_0)$, we have in a neighborhood of $(x_0, y_0)$

$$f(x_0 + h, y_0 + k) = f(x_0, y_0) + \frac{1}{2}\left(h^2 f_{xx} + 2hk f_{xy} + k^2 f_{yy}\right),$$

where the second derivatives on the right are evaluated at $(x_0 + \theta h, y_0 + \theta k)$ for
some 0 < θ < 1 (and $f_{xy} = f_{yx}$ by Exercise 7). The second expression on the right-hand side is a quadratic form
in h, k. Completing the square (where $f_{xx} \ne 0$), we can write

$$f(x_0 + h, y_0 + k) - f(x_0, y_0) = \frac{1}{2} f_{xx}\left[\left(h + \frac{f_{xy}}{f_{xx}} k\right)^2 + k^2\,\frac{f_{xx} f_{yy} - f_{xy}^2}{f_{xx}^2}\right].$$

By hypothesis and the continuity of the second-order partial derivatives, there is a neighborhood of $(x_0, y_0)$ in which $f_{xx} < 0$ and the numerator $f_{xx} f_{yy} - f_{xy}^2$ of the coefficient of $k^2$ is positive. In this neighborhood, the whole expression in the
square brackets is positive whenever (h, k) ≠ (0, 0), so the entire expression
on the right-hand side is negative. Therefore,
$f(x_0, y_0)$ is a local maximum. ∎

Fig. 1.4 Saddle point at (0, 0)

A similar result holds for relative minima (see Exercise 8). Critical points that are
neither relative maxima nor relative minima are called saddle points.
The notion of a saddle point is best illustrated by considering the function
$f(x, y) = x^2 - y^2$. Clearly, (0, 0) is a critical point. However, neither a local maximum
nor a local minimum occurs at (0, 0), since $f(x, 0) = x^2 > 0$ and $f(0, y) = -y^2 < 0$ for x, y ≠ 0. This can also be seen visually in Fig. 1.4.
The appearance of the quadratic form signals a generalization in the case of more
than two variables. Accordingly, we introduce the Hessian matrix. Given a function
$f : A \subset \mathbb{R}^n \to \mathbb{R}$ which has partial derivatives at least to order two, we define the Hessian
to be the n × n matrix

$$H = \left( \frac{\partial^2 f}{\partial x_i \partial x_j} \right)_{1 \le i, j \le n}.$$

Sometimes, we write H(f) for this matrix. Writing h = x − c, it is then easy to see from Taylor's Theorem 1.22 that, for f of class $C^2$ near c,

$$f(x) = f(c) + (h \cdot \nabla f)(c) + \frac{1}{2} h^T [H(f)(c)]\, h + o(|h|^2)$$

as h → 0, where $h^T$ denotes the transpose of h (regarded as a column vector).


The reader will recall that a matrix A is said to be positive definite if $h^T A h > 0$
for all h ≠ 0. It is negative definite if $h^T A h < 0$ for all h ≠ 0.
An analysis similar to the one for two variables leads us to the following:
Theorem 1.24 Let $A \subset \mathbb{R}^n$ be open and $f : A \to \mathbb{R}$ be a function of class $C^2$.
Suppose that c in A is a critical point of f.
(a) If the Hessian H(f)(c) is positive definite, then f has a local minimum at c.
(b) If the Hessian H(f)(c) is negative definite, then f has a local maximum at c.
(c) If det H(f)(c) ≠ 0 and H(f)(c) is neither positive nor negative definite, then
f has a saddle point at c.
If the determinant of the Hessian of f at c is zero, the test is inconclusive. In
such a case, the point c is said to be degenerate, and we must use other methods to
determine whether or not it is an extremum of f.
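The test of Theorem 1.24 can be sketched in Python as follows. Two ingredients are our own choices, not the text's:

- Positive definiteness of the (symmetric) Hessian is decided by attempting a Cholesky factorization, which succeeds with positive pivots exactly for positive definite symmetric matrices.
- The determinant is computed by Gaussian elimination, with a small tolerance standing in for "≠ 0".

```python
def is_positive_definite(H):
    """Attempt a Cholesky factorization H = L L^T of the symmetric matrix H.
    It succeeds with positive pivots exactly when H is positive definite."""
    n = len(H)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = H[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0:
                    return False
                L[i][i] = s ** 0.5
            else:
                L[i][j] = s / L[j][j]
    return True


def determinant(H):
    """Determinant by Gaussian elimination with partial pivoting."""
    A = [list(map(float, row)) for row in H]
    n, det = len(A), 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if A[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            A[col], A[pivot] = A[pivot], A[col]
            det = -det
        det *= A[col][col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for k in range(col, n):
                A[r][k] -= factor * A[col][k]
    return det


def classify_critical_point(H, tol=1e-12):
    """Apply the test of Theorem 1.24 to the Hessian H at a critical point."""
    if is_positive_definite(H):
        return "local minimum"
    if is_positive_definite([[-x for x in row] for row in H]):
        return "local maximum"
    if abs(determinant(H)) > tol:
        return "saddle point"
    return "degenerate (test inconclusive)"


print(classify_critical_point([[2, 0], [0, -2]]))   # Hessian of x^2 - y^2 at (0, 0)
print(classify_critical_point([[2, 0], [0, 2]]))    # Hessian of x^2 + y^2 at (0, 0)
print(classify_critical_point([[-2, 1], [1, -2]]))
```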
There is another way to generalize the notion of a derivative in the multidimensional case. It is motivated by Taylor's theorem. Indeed, given a differentiable function $f : \mathbb{R} \to \mathbb{R}$, we have

$$f(a + h) = f(a) + f'(a)h + o(h)$$

as h → 0. Thus, the linear map $h \mapsto f'(a)h$ can be viewed as a good linear approximation to the difference f(a + h) − f(a). Now, given $f : \mathbb{R}^n \to \mathbb{R}^m$, we say it is
differentiable at $a \in \mathbb{R}^n$ if there exists a linear transformation $\lambda : \mathbb{R}^n \to \mathbb{R}^m$ such
that

$$f(a + h) = f(a) + \lambda(h) + o(|h|)$$

as h → 0. The linear transformation λ depends on a, and so we denote it by Df(a)
and call it the derivative of f at a. It is easy to prove that this linear transformation
is unique (see Exercise 1).
It is convenient to consider the matrix of Df(a) with respect to the standard bases
of $\mathbb{R}^n$ and $\mathbb{R}^m$. In this way, we can represent the derivative as an m × n matrix called
the Jacobian matrix at a and denoted f′(a). If we write

$$f(x) = (f_1(x), \ldots, f_m(x)),$$

then

$$Df(a) = \begin{pmatrix} \frac{\partial f_1}{\partial x_1}(a) & \cdots & \frac{\partial f_1}{\partial x_n}(a) \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1}(a) & \cdots & \frac{\partial f_m}{\partial x_n}(a) \end{pmatrix},$$

where the ijth entry is

$$\frac{\partial f_i}{\partial x_j}(a).$$

When m = n, the Jacobian determinant is the determinant of the Jacobian matrix. We use the
notation

$$\frac{\partial(f_1, \ldots, f_n)}{\partial(x_1, \ldots, x_n)}$$

to designate this determinant.
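A finite-difference approximation of the Jacobian matrix is a convenient way to check hand computations. The Python sketch below is an illustration with an arbitrarily chosen step h. For the polar-coordinate map (r, θ) ↦ (r cos θ, r sin θ), the Jacobian determinant is r cos²θ + r sin²θ = r.

```python
import math


def jacobian(f, a, h=1e-6):
    """Central-difference approximation of the m x n Jacobian matrix of f at a;
    entry (i, j) approximates the partial derivative D_j f_i(a)."""
    n = len(a)
    columns = []
    for j in range(n):
        plus = f([x + (h if k == j else 0.0) for k, x in enumerate(a)])
        minus = f([x - (h if k == j else 0.0) for k, x in enumerate(a)])
        columns.append([(p - m) / (2 * h) for p, m in zip(plus, minus)])
    # columns[j][i] holds D_j f_i(a); transpose to get rows indexed by i
    return [list(row) for row in zip(*columns)]


def polar(v):
    r, theta = v
    return (r * math.cos(theta), r * math.sin(theta))


J = jacobian(polar, [2.0, math.pi / 6])
det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
print(J)
print(det)  # close to r = 2
```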


This generalized derivative enjoys some of the properties of its one-variable counterpart. For instance, if f is a constant function, then the derivative
is zero (see Exercise 4). If f is a linear transformation, then Df(a) = f (see
Exercise 5).
If A is an open set of $\mathbb{R}^n$ and f is defined only on A, we say f is differentiable
on A if it is differentiable at each point of A. The reader can verify that the usual
chain rule and product rule are valid (see Exercises 3 and 6).
As with many aspects of multivariable calculus, the chain rule is best understood
first for scalar-valued functions of several variables, and in particular of two variables. It
may amplify our understanding to reduce the calculation to single-variable calculus
by considering parametric curves. Indeed, let f(x, y) be a differentiable function of
two variables and consider g(t) = f(x(t), y(t)), where t ranges over an interval in
$\mathbb{R}$ and x(t), y(t) are differentiable. By one-variable calculus,

$$g'(t) = \lim_{h \to 0} \frac{f(x(t + h), y(t + h)) - f(x(t), y(t))}{h}.$$

In other words, as h → 0,

$$f(x(t + h), y(t + h)) = f(x(t), y(t)) + g'(t)\,h + o(h).$$

By definition of differentiability,

$$f(x(t + h), y(t + h)) = f(x(t), y(t)) + \frac{\partial f}{\partial x}\Delta x + \frac{\partial f}{\partial y}\Delta y + o(|\Delta x| + |\Delta y|),$$

where Δx = x(t + h) − x(t) and Δy = y(t + h) − y(t). But Δx = x′(t)h + o(h) and
Δy = y′(t)h + o(h); in particular, $o(|\Delta x| + |\Delta y|) = o(h)$. Dividing through by h and letting h → 0, we deduce

$$g'(t) = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt}.$$

It is convenient to view this equation as representing the infinitesimal change of f
in terms of the infinitesimal changes of x and y respectively, and to write it symbolically
as the total differential

$$df = \frac{\partial f}{\partial x}\,dx + \frac{\partial f}{\partial y}\,dy,$$

with a corresponding generalization to functions of more variables.



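There are certain notions in multivariable calculus that only pertain to dimensions
2 or 3, given that these concepts arose in the context of physics. For example, in $\mathbb{R}^3$,
if f is the vector field $(f_1, f_2, f_3)$, we define the vector field curl f by

$$\operatorname{curl} f = \left( \frac{\partial f_3}{\partial y} - \frac{\partial f_2}{\partial z},\ \frac{\partial f_1}{\partial z} - \frac{\partial f_3}{\partial x},\ \frac{\partial f_2}{\partial x} - \frac{\partial f_1}{\partial y} \right).$$

An easy way to remember this is to make symbolic use of the cross-product notation:

$$\nabla \times f = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \frac{\partial}{\partial x} & \frac{\partial}{\partial y} & \frac{\partial}{\partial z} \\ f_1 & f_2 & f_3 \end{vmatrix} = \left( \frac{\partial f_3}{\partial y} - \frac{\partial f_2}{\partial z},\ \frac{\partial f_1}{\partial z} - \frac{\partial f_3}{\partial x},\ \frac{\partial f_2}{\partial x} - \frac{\partial f_1}{\partial y} \right).$$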

As noted earlier, the notion has its roots in physics and was introduced to measure
the rotation of a vector field in $\mathbb{R}^3$. Thus, if curl f = 0 at a point, we say the vector field f is
irrotational at that point, or has no rotation there.
A vector field F is called conservative if F = ∇f for some scalar-valued function
f, which is referred to in the literature as a potential function. Suppose F is defined on all of $\mathbb{R}^3$ and has continuous
first-order partial derivatives. Then it is a consequence of Stokes' theorem (see below)
that F is conservative if and only if curl F = 0.
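It is not hard to see that if F = ∇f is conservative and f has continuous
second-order partial derivatives, then curl F = 0. Indeed, if $F = \nabla f = \left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z}\right)$, we have

$$\operatorname{curl} F = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \frac{\partial}{\partial x} & \frac{\partial}{\partial y} & \frac{\partial}{\partial z} \\ f_x & f_y & f_z \end{vmatrix} = (f_{zy} - f_{yz},\ f_{xz} - f_{zx},\ f_{yx} - f_{xy}) = 0,$$

since the mixed partial derivatives agree (Exercise 7).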

We record the following without proof.

Theorem 1.25 Suppose F is a vector field on $\mathbb{R}^3$ whose component functions have continuous first-order partial derivatives. Then curl F = 0 if and only if F is conservative.
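The following Python sketch approximates curl F by central differences. The step h and both vector fields are our own choices. For the rotation field (−y, x, 0), the curl is (0, 0, 2). For the gradient of f = x²y + yz + sin z, the curl vanishes, as the computation before Theorem 1.25 predicts.

```python
import math


def curl(F, p, h=1e-5):
    """Central-difference approximation of curl F at the point p in R^3."""
    def partial(i, j):
        # derivative of the i-th component of F with respect to the j-th variable
        plus = F([x + (h if k == j else 0.0) for k, x in enumerate(p)])[i]
        minus = F([x - (h if k == j else 0.0) for k, x in enumerate(p)])[i]
        return (plus - minus) / (2 * h)

    # indices 0, 1, 2 stand for x, y, z (and for the components F_1, F_2, F_3)
    return (partial(2, 1) - partial(1, 2),
            partial(0, 2) - partial(2, 0),
            partial(1, 0) - partial(0, 1))


def rotation(v):
    x, y, z = v
    return (-y, x, 0.0)


def grad_field(v):
    # gradient of the potential f(x, y, z) = x^2 y + y z + sin z
    x, y, z = v
    return (2 * x * y, x**2 + z, y + math.cos(z))


p = [0.4, -1.1, 0.7]
print(curl(rotation, p))    # approximately (0, 0, 2)
print(curl(grad_field, p))  # approximately (0, 0, 0)
```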
Generally, given a function $f : \mathbb{R}^n \to \mathbb{R}^m$, we can write

$$f(x) = (f_1(x), \ldots, f_m(x)),$$

so that we obtain component functions $f_i : \mathbb{R}^n \to \mathbb{R}$, and we can therefore consider
the partial derivatives of each of these component functions. It is now easily seen
that if f is differentiable at a, then f′(a) is the m × n matrix of partial derivatives
$D_j f_i(a)$, which is the Jacobian matrix. If m = 1, we see that the Jacobian matrix is
the row vector ∇f, the gradient of f.
If f is a scalar field (that is, m = 1), there is a nice relationship between the
directional derivative and the gradient of f. The reader can verify that if f is differentiable at x and u is a vector
of length 1, then

$$f_u(x) = \nabla f(x) \cdot u.$$
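The relation $f_u(x) = \nabla f(x) \cdot u$ is easy to test numerically. In this Python sketch, the field f(x, y, z) = xy² + e^z, the point x and the unit vector u are hypothetical. The limit defining $f_u(x)$ is approximated by a symmetric difference quotient, which for differentiable f tends to the same value.

```python
import math


def f(v):
    x, y, z = v
    return x * y**2 + math.exp(z)


def grad_f(v):
    x, y, z = v
    return (y**2, 2 * x * y, math.exp(z))


def directional_derivative(field, x, u, h=1e-6):
    """Symmetric difference quotient approximating field_u(x)."""
    plus = field([xi + h * ui for xi, ui in zip(x, u)])
    minus = field([xi - h * ui for xi, ui in zip(x, u)])
    return (plus - minus) / (2 * h)


x = [1.0, 2.0, 0.5]
u = [1 / 3, 2 / 3, -2 / 3]  # a unit vector: (1 + 4 + 4) / 9 = 1
numeric = directional_derivative(f, x, u)
via_gradient = sum(g * c for g, c in zip(grad_f(x), u))
print(numeric, via_gradient)
```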

Exercises

1. If $f : \mathbb{R}^n \to \mathbb{R}^m$ is differentiable at a, show that the derivative λ = Df(a) is unique. [Hint: Note that for a real number t, λ(th) = tλ(h) by the linearity of λ.]

2. Show that if $f : \mathbb{R}^n \to \mathbb{R}^m$ is differentiable at a, then it is continuous at a.

3. If $f : \mathbb{R}^n \to \mathbb{R}^m$ is differentiable at a and $g : \mathbb{R}^m \to \mathbb{R}^r$ is differentiable at f(a), show that $D(g \circ f)(a) = Dg(f(a)) \circ Df(a)$.

4. If $f : \mathbb{R}^n \to \mathbb{R}^m$ is a constant function, show that Df(a) = 0 for every $a \in \mathbb{R}^n$.

5. If $f : \mathbb{R}^n \to \mathbb{R}^m$ is a linear transformation, show that Df(a) = f.

6. If $f, g : \mathbb{R}^n \to \mathbb{R}$ are differentiable at a, show that

$$D(f \cdot g)(a) = g(a)\,Df(a) + f(a)\,Dg(a).$$

7. Suppose that $f : \mathbb{R}^n \to \mathbb{R}$ is differentiable. If $D_{i,j} f$ and $D_{j,i} f$ are continuous in an open set containing a, show that $D_{i,j} f(a) = D_{j,i} f(a)$.

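8. Let f(x, y) be continuous and have continuous partial derivatives up to order two in an open region A of $\mathbb{R}^2$ containing $(x_0, y_0)$, and suppose that $(x_0, y_0)$ is a critical point of f. Show that a sufficient condition for $(x_0, y_0)$ to be a relative minimum is that

$$\begin{vmatrix} f_{xx}(x_0, y_0) & f_{xy}(x_0, y_0) \\ f_{yx}(x_0, y_0) & f_{yy}(x_0, y_0) \end{vmatrix} > 0 \quad \text{and} \quad \frac{\partial^2 f}{\partial x^2}(x_0, y_0) > 0.$$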

9. Find the relative maxima and minima of $f(x, y) = x^3 + y^3 - 3x - 12y + 20$.

10. Let f be a $C^\infty$ function of two variables. Suppose that f(0, 0) = 0 and that
$f(ta, tb) = t^2 f(a, b)$ for all real numbers t and all vectors (a, b). Show that

$$f(x) = \frac{1}{2}(x \cdot \nabla)^2 f(0, 0).$$
11. A ray of light travels at a constant speed in a uniform medium. In different media
(such as air and water), light travels at different speeds. When light passes from
one medium into another, it is refracted as shown in Fig. 1.5. If the
speed of light is $v_1$ in medium 1 and $v_2$ in medium 2, then by Fermat's principle
of least time, light will strike the boundary between medium 1 and medium 2 at
a point P so that the total time is minimized. With the angles $\theta_1$ and $\theta_2$ shown
in the figure, derive Snell's law (Fig. 1.5):

$$\frac{\sin\theta_1}{\sin\theta_2} = \frac{v_1}{v_2}.$$

Fig. 1.5 Snell's law $\frac{\sin\theta_1}{\sin\theta_2} = \frac{v_1}{v_2}$ (a ray crossing the boundary at P, with angles θ1 and θ2)
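Exercise 11 can be explored numerically before it is solved. In the Python sketch below, the geometry is hypothetical:

- light travels from (0, a) in medium 1 to (d, −b) in medium 2;
- the boundary is the x-axis;
- the angles are assumed to be measured from the normal to the boundary.

The travel time is a convex function of the crossing point x, so a ternary search finds its minimum. The ratio sin θ₁ / sin θ₂ can then be compared with v₁/v₂.

```python
import math


def travel_time(x, a, b, d, v1, v2):
    """Time from (0, a) to P = (x, 0) at speed v1, then from P to (d, -b) at speed v2."""
    return math.hypot(x, a) / v1 + math.hypot(d - x, b) / v2


def minimize_convex(g, lo, hi, iters=200):
    """Ternary search for the minimizer of a convex function g on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if g(m1) < g(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2


a, b, d = 1.0, 2.0, 3.0  # hypothetical geometry
v1, v2 = 1.0, 0.75       # hypothetical speeds in media 1 and 2
x = minimize_convex(lambda s: travel_time(s, a, b, d, v1, v2), 0.0, d)
sin1 = x / math.hypot(x, a)            # sine of the angle with the normal in medium 1
sin2 = (d - x) / math.hypot(d - x, b)  # sine of the angle with the normal in medium 2
print(x, sin1 / sin2, v1 / v2)
```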

1.12 The Inverse Function Theorem

From single-variable calculus, the student is familiar with the following theorem: if
$f : \mathbb{R} \to \mathbb{R}$ is continuously differentiable in an open set containing a and f′(a) ≠ 0,
then there is an open set V with a ∈ V and an open set W with f(a) ∈ W such
that f : V → W has a continuous inverse $f^{-1} : W \to V$ which is differentiable.
Moreover, for y ∈ W, we have

$$(f^{-1})'(y) = [f'(f^{-1}(y))]^{-1}.$$

The reason for this is clear. Indeed, if f′(a) > 0, then by the continuity of f′ there is an open interval V containing
a such that f′(x) > 0 for x ∈ V. Thus, f is increasing on V and is therefore one-to-one, with an inverse function $f^{-1}$ defined on some open interval W containing f(a).
It is not difficult to show that $f^{-1}$ is differentiable and that

$$(f^{-1})'(y) = \frac{1}{f'(f^{-1}(y))}.$$

Indeed, once $f^{-1}$ is known to be differentiable, the chain rule applied to $f(f^{-1}(y)) = y$ (differentiating with respect
to y) gives

$$f'(f^{-1}(y))\,(f^{-1})'(y) = 1.$$

A similar statement can be made if f′(a) < 0. There is an analogous theorem for
higher dimensions, and this is the content of the general inverse function theorem.
To this end, we begin with the following lemma.
Lemma 1.3 Let $A \subset \mathbb{R}^n$ be a rectangle and let $f : A \to \mathbb{R}^n$ be continuously differentiable. If there is a number $M$ such that $|D_j f^i(x)| \le M$ for all $i, j$ and all $x$ in the interior $A^{\circ}$ of $A$, then
$$|f(x) - f(y)| \le n^{3/2} M |x - y|$$
for all $x, y \in A$.

Proof Suppose first $n = 2$. Then
$$f^i(y_1, y_2) - f^i(x_1, x_2) = [f^i(y_1, y_2) - f^i(y_1, x_2)] + [f^i(y_1, x_2) - f^i(x_1, x_2)].$$
Applying the mean value theorem to each of the bracketed terms on the right-hand side, we obtain
$$f^i(y_1, y_2) - f^i(y_1, x_2) = (y_2 - x_2) D_2 f^i(z_{i2}),$$
$$f^i(y_1, x_2) - f^i(x_1, x_2) = (y_1 - x_1) D_1 f^i(z_{i1}),$$
for some points $z_{ij}$, which lie in $A$ because $A$ is a rectangle. Thus,
$$|f^i(y_1, y_2) - f^i(x_1, x_2)| \le M(|y_1 - x_1| + |y_2 - x_2|) \le 2M|y - x|,$$
by Exercise 1. Now, summing the squares of these bounds over $i = 1, 2$,
$$|f(y) - f(x)| = \Big(\sum_{i=1}^{2} |f^i(y) - f^i(x)|^2\Big)^{1/2} \le \sqrt{2}\cdot 2M|y - x| = 2^{3/2} M|y - x|.$$

For the general case, we have
$$f^i(y) - f^i(x) = \sum_{j=1}^{n} [f^i(y_1, \ldots, y_j, x_{j+1}, \ldots, x_n) - f^i(y_1, \ldots, y_{j-1}, x_j, \ldots, x_n)].$$
Applying the mean value theorem to each term and then Exercise 1, we get
$$|f^i(y) - f^i(x)| \le \sum_{j=1}^{n} M|y_j - x_j| \le nM|y - x|.$$
j=1

Summing the squares of these bounds over $i = 1, \ldots, n$, we see
$$|f(y) - f(x)| = \Big(\sum_{i=1}^{n} |f^i(y) - f^i(x)|^2\Big)^{1/2} \le \sqrt{n}\cdot nM|y - x| = n^{3/2} M|y - x|.$$

Theorem 1.26 (The inverse function theorem) Suppose that
$$f : \mathbb{R}^n \to \mathbb{R}^n$$
is continuously differentiable in an open set containing $a$ and $\det f'(a) \neq 0$. Then there is an open set $V$ containing $a$ and an open set $W$ containing $f(a)$ such that $f : V \to W$ has a continuous inverse $f^{-1} : W \to V$ which is differentiable and for all $y \in W$ satisfies
$$(f^{-1})'(y) = [f'(f^{-1}(y))]^{-1}.$$

Proof Let $\lambda$ be the linear transformation $Df(a)$. As $\det f'(a) \neq 0$, $\lambda$ is non-singular. Now $D(\lambda^{-1} \circ f)(a) = D(\lambda^{-1})(f(a)) \circ Df(a)$. As $\lambda^{-1}$ is a linear transformation, $D(\lambda^{-1}) = \lambda^{-1} = (Df(a))^{-1}$. Thus, $D(\lambda^{-1} \circ f)(a)$ is the identity linear transformation. If the theorem is true for $\lambda^{-1} \circ f$, then it is clearly true for $f$, so we may assume without any loss of generality that $\lambda$ is the identity. Thus, whenever $f(a + h) = f(a)$ with $h \neq 0$, we have
$$\frac{|f(a + h) - f(a) - \lambda(h)|}{|h|} = \frac{|h|}{|h|} = 1.$$
But as $f$ is differentiable at $a$ with derivative $\lambda$, this quotient tends to zero as $h$ tends to zero. This means we cannot have $f(x) = f(a)$ for $x$ arbitrarily close to, but unequal to, $a$. Thus, there is a closed rectangle $U$ containing $a$ in its interior such that
$$f(x) \neq f(a) \quad \text{if } x \in U \text{ and } x \neq a. \tag{1.16}$$
Since $f$ is continuously differentiable in an open set containing $a$, we can also assume (shrinking $U$ if necessary) that
$$\det f'(x) \neq 0 \quad \text{for } x \in U \tag{1.17}$$
and
$$|D_j f^i(x) - D_j f^i(a)| \le \frac{1}{2n^{3/2}} \quad \text{for all } i, j \text{ and } x \in U.$$
Apply Lemma 1.3 to $g(x) = f(x) - x$. Since $f'(a)$ is the identity, $D_j g^i(x) = D_j f^i(x) - D_j f^i(a)$, so the last condition bounds these partials by $1/(2n^{3/2})$. The lemma then gives, for $x_1, x_2 \in U$,
$$|f(x_1) - x_1 - (f(x_2) - x_2)| \le \tfrac{1}{2}|x_1 - x_2|.$$
Since (by the triangle inequality)
$$|x_1 - x_2| - |f(x_1) - f(x_2)| \le |f(x_1) - x_1 - (f(x_2) - x_2)| \le \tfrac{1}{2}|x_1 - x_2|,$$
we deduce
$$|x_1 - x_2| \le 2|f(x_1) - f(x_2)| \quad \text{for } x_1, x_2 \in U. \tag{1.18}$$

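Now $f(\partial U)$ is a compact set which, by (1.16), does not contain $f(a)$. Therefore, there is a number $d > 0$ such that $|f(a) - f(x)| \ge d$ for $x \in \partial U$. Let
$$W = \{y : |y - f(a)| < d/2\}.$$
If $y \in W$ and $x \in \partial U$, then $|y - f(x)| \ge |f(a) - f(x)| - |y - f(a)| > d - d/2 > |y - f(a)|$, that is,
$$|y - f(a)| < |y - f(x)|. \tag{1.19}$$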

We will show that for any $y$ in $W$ there is a unique $x$ in the interior of $U$ such that $f(x) = y$. To prove this, consider the function $g : U \to \mathbb{R}$ given by
$$g(x) = |y - f(x)|^2 = \sum_{i=1}^{n} (y_i - f^i(x))^2.$$
This function is continuous and therefore has a minimum on the compact set $U$. If $x \in \partial U$, then by (1.19), we have $g(a) < g(x)$. Therefore, the minimum does not occur on the boundary of $U$. Thus, there is a point $x$ in the interior of $U$ such that $D_j g(x) = 0$ for all $j$. In other words,
$$\sum_{i=1}^{n} 2(y_i - f^i(x)) D_j f^i(x) = 0 \quad \text{for all } j.$$

By (1.17), the matrix $(D_j f^i(x))$ has nonzero determinant for all $x \in U$. The equations above say that the transpose of this matrix annihilates the vector $(y_i - f^i(x))_{1 \le i \le n}$. Therefore, $y_i = f^i(x)$ for all $i$. In other words, $y = f(x)$. This proves the existence of $x$. Uniqueness follows from (1.18). If $V = U^{\circ} \cap f^{-1}(W)$, we have shown that the function $f : V \to W$ has an inverse $f^{-1} : W \to V$. We can rewrite (1.18) as
$$|f^{-1}(y_1) - f^{-1}(y_2)| \le 2|y_1 - y_2|, \quad y_1, y_2 \in W. \tag{1.20}$$

This implies that $f^{-1}$ is continuous. To complete the proof, we need to show that $f^{-1}$ is differentiable. Let $x \in V$ and $u = Df(x)$, which is invertible by (1.17). We will show that $f^{-1}$ is differentiable at $y = f(x)$ with derivative $u^{-1}$. By the definition of the derivative,
$$f(x_1) = f(x) + u(x_1 - x) + r(x_1 - x),$$
where
$$\lim_{x_1 \to x} \frac{|r(x_1 - x)|}{|x_1 - x|} = 0. \tag{1.21}$$

Therefore
$$u^{-1}(f(x_1) - f(x)) = x_1 - x + u^{-1}(r(x_1 - x)).$$
Since every $y_1 \in W$ is of the form $f(x_1)$ for some $x_1 \in V$, we can rewrite this as
$$f^{-1}(y_1) = f^{-1}(y) + u^{-1}(y_1 - y) - u^{-1}\big(r(f^{-1}(y_1) - f^{-1}(y))\big),$$
so to complete the proof, it suffices to show that the last term is $o(|y_1 - y|)$ as $y_1 \to y$. As $u^{-1}$ is a linear transformation, it is bounded (see Exercise 2 below), so it suffices to show that
$$r(f^{-1}(y_1) - f^{-1}(y)) = o(|y_1 - y|).$$

But this is immediate as $f^{-1}$ is continuous. Indeed, for $y_1 \neq y$ we have $f^{-1}(y_1) \neq f^{-1}(y)$, and writing
$$\frac{|r(f^{-1}(y_1) - f^{-1}(y))|}{|y_1 - y|} = \frac{|r(f^{-1}(y_1) - f^{-1}(y))|}{|f^{-1}(y_1) - f^{-1}(y)|} \cdot \frac{|f^{-1}(y_1) - f^{-1}(y)|}{|y_1 - y|},$$
we see by (1.20) that the second factor is bounded by 2. The first factor approaches zero by (1.21), because $f^{-1}(y_1) \to f^{-1}(y)$ by the continuity of $f^{-1}$. □
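The theorem is an existence statement, but it can be illustrated numerically. The sketch below uses the map $f(x, y) = (e^x \cos y, e^x \sin y)$ (the map of Exercise 5 below) near $a = (0, 0)$, where $\det f'(a) = 1$. It computes the local inverse by Newton's method, which is our own choice of solver and not the minimization argument used in the proof. It then checks that a finite-difference Jacobian of $f^{-1}$ at a point $y$ near $f(a) = (1, 0)$ matches $[f'(f^{-1}(y))]^{-1}$. All function names are hypothetical.

```python
import math

def f(p):
    x, y = p
    return (math.exp(x) * math.cos(y), math.exp(x) * math.sin(y))

def jacobian(p):
    """The matrix (D_j f^i(p)) as a tuple of rows."""
    x, y = p
    ex = math.exp(x)
    return ((ex * math.cos(y), -ex * math.sin(y)),
            (ex * math.sin(y),  ex * math.cos(y)))

def inv2(m):
    """Inverse of a 2x2 matrix ((a, b), (c, d)); requires a nonzero determinant."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))

def local_inverse(target, start=(0.0, 0.0), iters=50):
    """Solve f(p) = target by Newton's method, starting at the base point a = start."""
    p = start
    for _ in range(iters):
        fp = f(p)
        r = (fp[0] - target[0], fp[1] - target[1])
        J = inv2(jacobian(p))
        p = (p[0] - (J[0][0] * r[0] + J[0][1] * r[1]),
             p[1] - (J[1][0] * r[0] + J[1][1] * r[1]))
    return p

y = (1.1, 0.2)                       # a point near f(a) = (1, 0)
x = local_inverse(y)
h = 1e-6
cols = []                            # cols[j] = derivative of f^{-1} in the direction e_j
for j in range(2):
    yp, ym = list(y), list(y)
    yp[j] += h
    ym[j] -= h
    xp, xm = local_inverse(tuple(yp)), local_inverse(tuple(ym))
    cols.append(((xp[0] - xm[0]) / (2 * h), (xp[1] - xm[1]) / (2 * h)))
numeric = ((cols[0][0], cols[1][0]), (cols[0][1], cols[1][1]))
predicted = inv2(jacobian(x))        # [f'(f^{-1}(y))]^{-1}
```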

Exercises

1. Prove that if $x = (x_1, \ldots, x_n) \in \mathbb{R}^n$, then
$$\sum_{j=1}^{n} |x_j| \le n|x|.$$

2. If $T : \mathbb{R}^m \to \mathbb{R}^n$ is a linear transformation, show that there is a number $M$ such that
$$|T(h)| \le M|h| \quad \text{for all } h \in \mathbb{R}^m.$$

3. Let $f : \mathbb{R} \to \mathbb{R}$ be differentiable such that $f'(a) \neq 0$ for all $a \in \mathbb{R}$. Show that $f$ is one-to-one on all of $\mathbb{R}$.

4. Determine if the following functions are locally $C^1$-invertible at the given point.

(a) $f(x, y) = (x^2 - y^2, 2xy)$ at $(x, y) = (0, 0)$;

(b) $f(x, y) = (x^3 y + 1, x^2 + y^2)$ at $(1, 2)$;

(c) $f(x, y) = (x/(x^2 + y^2), y/(x^2 + y^2))$ at $(x, y) = (0, 0)$;

(d) $f(x, y) = (x + x^2 + y, x^2 + y^2)$ at $(x, y) = (5, 8)$.



5. Let $f : \mathbb{R}^2 \to \mathbb{R}^2$ be given by $f(x, y) = (e^x \cos y, e^x \sin y)$. Show that $f$ is invertible in a neighborhood of each point of $\mathbb{R}^2$.

6. With f as in the previous exercise, show that f has no global inverse. That is,
there is no function g : R2 → R2 such that g( f (x)) = x.

7. Let $A \subset \mathbb{R}^n$ be an open set and $f : A \to \mathbb{R}^n$ a continuously differentiable one-to-one function such that $\det f'(x) \neq 0$ for all $x$. Show that $f(A)$ is an open set and $f^{-1} : f(A) \to A$ is differentiable.

8. Consider the function
$$f(x) = \begin{cases} \dfrac{x}{2} + x^2 \sin\dfrac{1}{x}, & x \neq 0,\\[4pt] 0, & x = 0, \end{cases}$$
to show that the continuity of the derivative cannot be eliminated from the hypothesis of Theorem 1.26.

1.13 The Implicit Function Theorem

The implicit function theorem and the inverse function theorem have a long and
luminous history that the student can find described in [4]. They seem to have roots
in the works of Isaac Newton (1642–1727) and Gottfried Leibniz (1646–1716).
Though Joseph Louis Lagrange (1736–1813) found a version of the theorem which is essentially the inverse function theorem discussed above, it was Augustin Louis
Cauchy (1789–1857) who studied the theorem with sufficient mathematical rigor
and so is acknowledged today as its discoverer.
We begin with a motivating example. Suppose we are given the function $f(x, y) = x^2 + y^2 - 1$. If we choose $a, b$ such that $f(a, b) = 0$ and $a \neq \pm 1$, then there are open intervals $A$ containing $a$ and $B$ containing $b$ such that if $x \in A$, there is a unique $y \in B$ with $f(x, y) = 0$. We can therefore define a function $g : A \to \mathbb{R}$ by the condition that $g(x) \in B$ and $f(x, g(x)) = 0$. If $b > 0$, then $g(x) = \sqrt{1 - x^2}$ (see Fig. 1.6). For our function, there is another number $b_1$ (namely $b_1 = -b$) such that $f(a, b_1) = 0$. There will also be another interval $B_1$ containing $b_1$ such that when $x \in A$, we have $f(x, h(x)) = 0$ for a unique $h(x) \in B_1$. In this case, $h(x) = -\sqrt{1 - x^2}$. Both $g$ and $h$ are differentiable, and these functions are said to be defined implicitly by the equation $f(x, y) = 0$. For $a = \pm 1$, it is impossible to find any such function $g$ defined on an open interval containing $a$.
We would like a simple criterion for deciding when such a function can be found
for any general continuously differentiable function f of several variables. This
is supplied by the implicit function theorem. More generally, we ask the following. Given $f : \mathbb{R}^n \times \mathbb{R} \to \mathbb{R}$ and $f(a_1, \ldots, a_n, b) = 0$, when can we find for each $(x_1, \ldots, x_n)$ near $(a_1, \ldots, a_n)$ a unique $y$ such that $f(x_1, \ldots, x_n, y) = 0$?

[Fig. 1.6 Graph of $x^2 + y^2 - 1 = 0$, showing the points $-1$ and $1$, an interval $A$ around $a$, an interval $B$ around $b$, and the second root $b_1$.]

We can even be a bit more general. Given a set of functions

$$f_i : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}, \quad 1 \le i \le m$$
that satisfy
$$f_i(a_1, \ldots, a_n, b_1, \ldots, b_m) = 0, \quad 1 \le i \le m,$$
when can we find, for each $(x_1, \ldots, x_n)$ near $(a_1, \ldots, a_n)$, a unique $(y_1, \ldots, y_m)$ near $(b_1, \ldots, b_m)$ which satisfies
$$f_i(x_1, \ldots, x_n, y_1, \ldots, y_m) = 0, \quad 1 \le i \le m?$$

This is the content of the following implicit function theorem which uses the inverse
function theorem in an essential way.
Theorem 1.27 (The implicit function theorem) Suppose $f : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^m$ is continuously differentiable in an open set containing $(a, b)$, where $a \in \mathbb{R}^n$ and $b \in \mathbb{R}^m$. Suppose further that $f(a, b) = 0$. Let $M$ be the $m \times m$ matrix
$$(D_{n+j} f^i(a, b)), \quad 1 \le i, j \le m.$$
If $\det M \neq 0$, then there is an open set $A \subset \mathbb{R}^n$ containing $a$ and an open set $B \subset \mathbb{R}^m$ containing $b$ such that for each $x \in A$, there is a unique $g(x) \in B$ such that $f(x, g(x)) = 0$. Moreover, the function $g$ is differentiable.

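Proof Define $F : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^n \times \mathbb{R}^m$ by $F(x, y) = (x, f(x, y))$. The matrix $F'(a, b)$ has the block form $\begin{pmatrix} I & 0 \\ * & M \end{pmatrix}$, so $\det F'(a, b) = \det M \neq 0$. By the inverse function theorem, there is an open set $W \subset \mathbb{R}^n \times \mathbb{R}^m$ containing $F(a, b) = (a, 0)$ and an open set in $\mathbb{R}^n \times \mathbb{R}^m$ containing $(a, b)$. We may take the latter to be of the form $A \times B$, such that $F : A \times B \to W$ has a differentiable inverse $h : W \to A \times B$. Clearly, $h$ is of the form $h(x, y) = (x, k(x, y))$ for some differentiable function $k$ (since $F$ is of this form). Let $p : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^m$ be the projection map $p(x, y) = y$. Then $p \circ F = f$. Therefore
$$f(x, k(x, y)) = f \circ h(x, y) = (p \circ F) \circ h(x, y) = p \circ (F \circ h)(x, y) = p(x, y) = y,$$
since $F \circ h$ is the identity map. Therefore, $f(x, k(x, 0)) = 0$ whenever $(x, 0) \in W$. Shrinking $A$ if necessary, we may assume this holds for all $x \in A$. In other words, we can define $g(x) = k(x, 0)$. Uniqueness of $g(x)$ in $B$ follows because $F$ is one-to-one on $A \times B$. This completes the proof. □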
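For the motivating example $f(x, y) = x^2 + y^2 - 1$ near $(a, b) = (0.6, 0.8)$, the $1 \times 1$ matrix $M$ of Theorem 1.27 is $D_2 f(a, b) = 2b = 1.6 \neq 0$. The sketch below (names hypothetical) computes $g(x)$ by applying Newton's method in the $y$ variable alone. It compares a finite-difference slope of $g$ with $-D_1 f / D_2 f$, the formula of Exercise 4 below. For this $f$ the exact answer is $g(x) = \sqrt{1 - x^2}$, with $g'(0.6) = -0.75$.

```python
def f(x, y):
    return x**2 + y**2 - 1

def d1f(x, y):
    return 2 * x      # D_1 f

def d2f(x, y):
    return 2 * y      # D_2 f, the 1x1 "matrix M" of Theorem 1.27

def g(x, y0=0.8, iters=50):
    """Solve f(x, y) = 0 for y near y0 by Newton's method in y (x held fixed)."""
    y = y0
    for _ in range(iters):
        y -= f(x, y) / d2f(x, y)
    return y

a = 0.6
h = 1e-6
numeric_slope = (g(a + h) - g(a - h)) / (2 * h)
implicit_slope = -d1f(a, g(a)) / d2f(a, g(a))    # -D_1 f / D_2 f at (a, g(a))
print(g(a), numeric_slope, implicit_slope)
```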

Exercises

1. The equations relating Cartesian coordinates and polar coordinates are given by
the familiar
x = r cos θ, y = r sin θ.

Using the inverse function theorem, show that we can solve (locally) for r and
θ uniquely in terms of x and y as long as we are away from the origin.

2. The cylindrical coordinates in R3 are given by

x = r cos θ, y = r sin θ, z = z.

Determine the points of R3 for which we can solve (locally) for r , θ and z in
terms of Cartesian coordinates.

3. Spherical coordinates in R3 are given by (Fig. 1.7)

x = r sin φ cos θ, y = r sin φ sin θ, z = r cos φ.

Determine the points in R3 for which we can solve (locally) for r, φ, θ in terms
of x, y and z.

[Fig. 1.7 Spherical coordinates: a point $P$ with angle $\varphi$ measured from the $z$-axis and angle $\theta$ in the $xy$-plane.]

4. Let $y = g(x)$ be an implicit function satisfying $f(x, g(x)) = 0$, with $f$ and $g$ continuously differentiable real-valued functions defined on $\mathbb{R}^2$ and $\mathbb{R}$, respectively. Show that
$$g'(x) = -\frac{D_1 f(x, g(x))}{D_2 f(x, g(x))}$$
whenever $D_2 f(x, g(x)) \neq 0$.

5. Generalize the previous exercise to functions of three variables as follows. Suppose that the equation $F(x, y, z) = 0$ implicitly defines a function $z = f(x, y)$, where $f$ is differentiable. Show that
$$\frac{\partial z}{\partial x} = -\frac{D_1 F}{D_3 F}, \qquad \frac{\partial z}{\partial y} = -\frac{D_2 F}{D_3 F},$$
provided $D_3 F \neq 0$.

1.14 The Lagrange Multiplier Method

The Lagrange multiplier method gives conditions for finding the maxima or minima of a scalar field subject to a side condition. Suppose we want to find the maximum or minimum of a function $f : \mathbb{R}^n \to \mathbb{R}$ subject to the side condition $g(x) = 0$ for some differentiable $g : \mathbb{R}^n \to \mathbb{R}$. Let $C$ be any differentiable curve given by $r : [0, 1] \to \mathbb{R}^n$ lying on the hypersurface defined by $g(x) = 0$. Thus, if $r(t) = (x_1(t), \ldots, x_n(t))$, then $g(r(t)) = 0$. Now if $f$ has an extremum at $x_0$ (say), and $C$ passes through $x_0$, set $h(t) := f(r(t))$. Then $h$ also has an extremum at $t_0$, where $t_0$ is such that $r(t_0) = x_0$. Thus, by the chain rule, we deduce that
$$0 = h'(t_0) = \nabla f(r(t_0)) \cdot r'(t_0).$$

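In other words, $\nabla f(x_0)$ is orthogonal to the tangent vector $r'(t_0)$ for every curve $C$ lying on $g = 0$ passing through $x_0$. But differentiating $g(r(t)) = 0$ at $t = t_0$ gives
$$0 = \nabla g(x_0) \cdot r'(t_0),$$
which means that $\nabla g(x_0)$ is also orthogonal to $r'(t_0)$. When $\nabla g(x_0) \neq 0$, the tangent vectors $r'(t_0)$ of such curves fill out the $(n-1)$-dimensional subspace orthogonal to $\nabla g(x_0)$. Therefore, $\nabla f$ and $\nabla g$ must be parallel at the extreme point. In other words, there is a $\lambda$ such that
$$\nabla f(x_0) = \lambda \nabla g(x_0).$$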

This is the essential idea in the Lagrange multiplier method. It was discovered by
Joseph Louis Lagrange (1736–1813) who made fundamental contributions not only
to analysis but also to number theory and group theory.
We summarize our discussion formally in the following theorem and leave the
proof to the reader.
Theorem 1.28 (Lagrange multiplier method) Let $U$ be an open set of $\mathbb{R}^n$ and suppose that $g : U \to \mathbb{R}$ is a continuously differentiable function on $U$. Let $S$ be the set of points $x$ in $U$ such that $g(x) = 0$ and $\nabla g(x) \neq 0$. Let $f : U \to \mathbb{R}$ be continuously differentiable on $U$ and assume that $x_0$ is a point of $S$ such that $x_0$ is an extreme point for $f$ on $S$. That is, $x_0$ is an extremum for $f$ subject to the constraint $g$. Then there is a number $\lambda$ such that
$$\nabla f(x_0) = \lambda \nabla g(x_0).$$
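As a small worked example of our own (not from the text), take $f(x, y) = x + 2y$ and the constraint $g(x, y) = x^2 + y^2 - 1 = 0$. The condition $\nabla f = \lambda \nabla g$ reads $(1, 2) = \lambda(2x, 2y)$, so $(x, y) = (1/(2\lambda), 1/\lambda)$. The constraint then forces $\lambda = \pm\sqrt{5}/2$, giving the candidates $\pm(1, 2)/\sqrt{5}$ with values $\pm\sqrt{5}$. The sketch below computes these candidates and compares them with a brute-force scan of the unit circle.

```python
import math

# Hypothetical example: extremize f(x, y) = x + 2y on the constraint g(x, y) = x^2 + y^2 - 1 = 0.
def f(x, y):
    return x + 2 * y

# grad f = (1, 2) and grad g = (2x, 2y), so grad f = lam * grad g gives x = 1/(2 lam), y = 1/lam.
# Substituting into g = 0 gives 5 / (4 lam^2) = 1, i.e. lam = +sqrt(5)/2 or -sqrt(5)/2.
candidates = []
for lam in (math.sqrt(5) / 2, -math.sqrt(5) / 2):
    candidates.append((1 / (2 * lam), 1 / lam, lam))
lagrange_values = sorted(f(x, y) for x, y, _ in candidates)

# Brute-force comparison: sample the constraint set (the unit circle) densely.
N = 100_000
samples = [f(math.cos(2 * math.pi * k / N), math.sin(2 * math.pi * k / N)) for k in range(N)]
best_min, best_max = min(samples), max(samples)
print(lagrange_values, best_min, best_max)
```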

We can consider the situation where there are more constraints. Suppose we are interested in finding the extrema of a function $f : \mathbb{R}^n \to \mathbb{R}$ subject to the constraints
$$g_i(x) = 0, \quad 1 \le i \le k,$$
with $k < n$. Let $S$ be the set defined by this system of constraints. If $f|_S$ has an extremum at $x_0$, with $\nabla g_1(x_0), \ldots, \nabla g_k(x_0)$ linearly independent, then there are scalars $\lambda_1, \ldots, \lambda_k$ such that
$$\nabla f(x_0) = \lambda_1 \nabla g_1(x_0) + \cdots + \lambda_k \nabla g_k(x_0).$$
This is deduced by reasoning similar to that indicated above.

Exercises

1. Find a formula for the surface area of an open box with length x, width y and
height z. If the volume V is fixed, determine the minimum surface area.

2. The unit hypersphere in $\mathbb{R}^n$ centered at the origin is defined by the equation
$$x_1^2 + x_2^2 + \cdots + x_n^2 = 1.$$
Find the pair of points $x = (x_1, \ldots, x_n)$ and $y = (y_1, \ldots, y_n)$ on the unit sphere that maximize and minimize the function
$$f(x_1, \ldots, x_n, y_1, \ldots, y_n) = \sum_{j=1}^{n} x_j y_j.$$
What are the maximum and minimum values of $f$?

3. Let $x = (x_1, \ldots, x_n)$ and $y = (y_1, \ldots, y_n)$ be any nonzero points in $\mathbb{R}^n$. Noting that the vectors $u$ and $v$ with
$$u_i = \frac{x_i}{|x|}, \qquad v_i = \frac{y_i}{|y|}$$
lie on the unit hypersphere, use the previous exercise to derive the Cauchy–Schwarz inequality:
$$|x \cdot y| \le |x|\,|y|.$$

4. Use the Lagrange multiplier method to show that the distance from the point $(x_0, y_0)$ to the line $ax + by = d$ is
$$\frac{|ax_0 + by_0 - d|}{\sqrt{a^2 + b^2}}.$$

5. Show that the distance from the point $(x_0, y_0, z_0) \in \mathbb{R}^3$ to the plane
$$ax + by + cz = d$$
is
$$\frac{|ax_0 + by_0 + cz_0 - d|}{\sqrt{a^2 + b^2 + c^2}}.$$

6. Show that the maximum of
$$f(x, y, z) = x^2 y^2 z^2$$
subject to the constraint $x^2 + y^2 + z^2 = a^2$ is $a^6/27$.

7. Use the previous question to show that for all $x, y, z$,
$$(x^2 y^2 z^2)^{1/3} \le \frac{x^2 + y^2 + z^2}{3}.$$
8. Prove the arithmetic mean-geometric mean inequality: for any positive numbers $x_1, x_2, \ldots, x_n$, we have
$$(x_1 x_2 \cdots x_n)^{1/n} \le \frac{x_1 + x_2 + \cdots + x_n}{n}.$$
9. Heron's formula for the area of a triangle whose sides have lengths $x, y, z$ is
$$\sqrt{s(s - x)(s - y)(s - z)},$$
where $s$ is the semiperimeter (i.e., $s = (x + y + z)/2$). If the perimeter is fixed, show that the triangle with the largest area is the equilateral triangle.

10. Let $A$ be an $n \times n$ symmetric matrix with real entries $a_{ij}$, so that $a_{ij} = a_{ji}$. Show that the global maximum value of the quadratic form
$$\sum_{i,j=1}^{n} a_{ij} x_i x_j$$
subject to the constraint
$$x_1^2 + x_2^2 + \cdots + x_n^2 = 1$$
is equal to the largest eigenvalue of $A$. Show that the global minimum value of the quadratic form subject to the same constraint is equal to the smallest eigenvalue of $A$. (Recall that the eigenvalues of a real symmetric matrix are real.)

1.15 Level Sets and Tangent Spaces

Given a function $f : \mathbb{R}^n \to \mathbb{R}^m$, it is often difficult to visualize the function because of the many dimensions involved. It is therefore convenient to consider level sets, which are defined as follows. For each $c \in \mathbb{R}^m$, the inverse image $f^{-1}(c)$ is called the level set at $c$. Suppose now that $f$ is differentiable at a point $a \in \mathbb{R}^n$. Then
$$h(x) = f(a) + f'(a)(x - a)$$
(where $f'(a)$ is the Jacobian matrix $Df$ evaluated at $a$) is the best linear approximation to $f(x)$ near $a$. The level set $h^{-1}(f(a))$ also contains $a$, and we call it the tangent space at $a$. Translated to the origin, this tangent space can be visualized as the vector space of all tangent vectors to all possible curves through $a$ on the level set $f^{-1}(f(a))$. Clearly, the tangent space at $a$ consists of the points $a + v$, where $v$ ranges over all vectors such that
$$f'(a)\, v = 0.$$

If $C$ is a curve parameterized by the map $t \mapsto (x_1(t), \ldots, x_n(t))$ for $a \le t \le b$, then the line integral of $f$ along $C$ is defined by
$$\int_C f(x_1, \ldots, x_n)\, ds := \int_a^b f(x_1(t), \ldots, x_n(t)) \big(x_1'(t)^2 + \cdots + x_n'(t)^2\big)^{1/2}\, dt.$$
In effect, we are integrating along the curve, and this can be seen as a generalization of (1.14). One can integrate with respect to the coordinate parameters as well and consider, more generally,
$$\int_C f(x_1, \ldots, x_n)\, dx_i, \quad 1 \le i \le n.$$
Thus, if $C$ is parameterized as before, then
$$\int_C f(x_1, \ldots, x_n)\, dx_i = \int_a^b f(x_1(t), \ldots, x_n(t))\, x_i'(t)\, dt.$$
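Both kinds of line integral can be approximated directly from the parameterization. The sketch below (helper names hypothetical) applies the midpoint rule in $t$. For the unit circle $t \mapsto (\cos t, \sin t)$, $0 \le t \le 2\pi$, it should reproduce $\int_C x^2\, ds = \pi$ and $\int_C x\, dy = \pi$. Note that Python indexes coordinates from 0, so $dx_2 = dy$ corresponds to `i = 1`.

```python
import math

def line_integral_ds(f, curve, dcurve, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of f ds along curve(t), a <= t <= b."""
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        t = a + (k + 0.5) * h
        speed = math.sqrt(sum(c * c for c in dcurve(t)))   # (x1'(t)^2 + ... + xn'(t)^2)^(1/2)
        total += f(*curve(t)) * speed
    return total * h

def line_integral_dxi(f, curve, dcurve, i, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of f dx_i (i is a 0-based coordinate index)."""
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        t = a + (k + 0.5) * h
        total += f(*curve(t)) * dcurve(t)[i]
    return total * h

circle = lambda t: (math.cos(t), math.sin(t))
dcircle = lambda t: (-math.sin(t), math.cos(t))
x2_ds = line_integral_ds(lambda x, y: x * x, circle, dcircle, 0.0, 2 * math.pi)   # exact: pi
x_dy = line_integral_dxi(lambda x, y: x, circle, dcircle, 1, 0.0, 2 * math.pi)    # exact: pi
print(x2_ds, x_dy)
```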

Exercises

1. In $\mathbb{R}^3$, quadric surfaces are described by equations of the form
$$Ax^2 + Bxy + Cxz + Dy^2 + Eyz + Fz^2 + Gx + Hy + Iz + J = 0$$
with $A, B, C, D, E, F, G, H, I, J$ being fixed real numbers. An example of a quadric surface is the ellipsoid given by
$$\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1.$$
Show that the volume of this ellipsoid is $\frac{4}{3}\pi abc$.

2. Find equations for the tangent plane to the surface
$$x^2 yz + 3y^2 - 2xz^2 - 8z = 0$$
at the point $(1, 2, -1)$.


3. Show that the surface
$$x^2 - 2yz + y^3 = 4$$
is perpendicular to any member of the family of surfaces
$$x^2 + (4a - 2)y^2 - az^2 + 1 = 0$$
at the point of intersection $(1, -1, 2)$ for any real value $a$.

1.16 Changing Variables in Integrals

The first application of the Jacobian matrix is the generalization to the multivariable setting of the change of variable method of one-variable integral calculus. The reader will recall that if $g : [a, b] \to \mathbb{R}$ is continuously differentiable and $f : \mathbb{R} \to \mathbb{R}$ is continuous, then
$$\int_{g(a)}^{g(b)} f(x)\, dx = \int_a^b f(g(x))\, g'(x)\, dx.$$
If $g$ is one-to-one, then this can be rewritten as
$$\int_{g((a,b))} f(x)\, dx = \int_{(a,b)} f(g(x))\, |g'(x)|\, dx.$$

The generalization of this formula to higher dimensions involves the Jacobian matrix. The situation is summarized by the following theorem, whose proof can be found in many books such as [5].
Theorem 1.29 Let $A \subset \mathbb{R}^n$ be an open set and $g : A \to \mathbb{R}^n$ a one-to-one, continuously differentiable function such that $\det g'(x) \neq 0$ for all $x \in A$. If $f : g(A) \to \mathbb{R}$ is integrable, then
$$\int_{g(A)} f(x)\, dx = \int_A f(g(x))\, |\det g'(x)|\, dx.$$

Alternately, this theorem can also be written as follows (which may be more familiar to the student). For ease of notation, we specialize to the case $n = 2$. Let $R'$ be a region in the $uv$-plane which is mapped in a one-to-one and onto fashion to the region $R$ in the $xy$-plane via the transformation
$$x = F(u, v), \qquad y = G(u, v).$$
Then
$$\iint_R f(x, y)\, dx\, dy = \iint_{R'} f(F(u, v), G(u, v)) \left| \frac{\partial(F, G)}{\partial(u, v)} \right| du\, dv. \tag{1.22}$$
The essential idea of the proof is understood by noting how an infinitesimal area element changes under a linear transformation. We illustrate this in the case of $\mathbb{R}^2$. The student will recall that if we have two vectors
$$v = \begin{pmatrix} a \\ c \end{pmatrix}, \qquad w = \begin{pmatrix} b \\ d \end{pmatrix},$$
the area of the parallelogram spanned by them is given by the absolute value of the determinant $ad - bc$. Indeed, the area is (see (1.13) and Fig. 1.8)
$$\sqrt{|v|^2 |w|^2 - (v \cdot w)^2}.$$

[Fig. 1.8 Area of the parallelogram spanned by $(a, c)$ and $(b, d)$, equal to $|ad - bc|$.]

Computing this directly using vector coordinates leads to the square root of
$$(a^2 + c^2)(b^2 + d^2) - (ab + cd)^2,$$
which is easily seen to equal $(ad - bc)^2$. Hence the area is $|ad - bc|$, the absolute value of the determinant of the $2 \times 2$ matrix whose columns are $v$ and $w$.
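For the record, the expansion behind "easily seen" is:

```latex
\begin{aligned}
(a^2 + c^2)(b^2 + d^2) - (ab + cd)^2
  &= a^2b^2 + a^2d^2 + b^2c^2 + c^2d^2 - (a^2b^2 + 2abcd + c^2d^2) \\
  &= a^2d^2 - 2abcd + b^2c^2 \\
  &= (ad - bc)^2 .
\end{aligned}
```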


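With this observation, it is possible to provide the following intuitive proof. Suppose we make the change of variables
$$x = F(u, v), \qquad y = G(u, v).$$
The following is suggested by the chain rule. The infinitesimal changes $\Delta x$ and $\Delta y$ in $x$ and $y$ are given by
$$\Delta x = F_u\, \Delta u + F_v\, \Delta v, \qquad \Delta y = G_u\, \Delta u + G_v\, \Delta v.$$
Thus a small rectangle with sides $\Delta u$ and $\Delta v$ in the $uv$-plane is carried (approximately) onto the parallelogram in the $xy$-plane spanned by the vectors
$$\begin{pmatrix} F_u \\ G_u \end{pmatrix} \Delta u \quad \text{and} \quad \begin{pmatrix} F_v \\ G_v \end{pmatrix} \Delta v.$$
By the computation above, the area of this parallelogram is $|\partial(F, G)/\partial(u, v)|\, \Delta u\, \Delta v$, where
$$\frac{\partial(F, G)}{\partial(u, v)} = F_u G_v - F_v G_u$$
is the Jacobian determinant. Thus, if the region $R'$ is transformed into $R$, it is now intuitively clear that (1.22) holds. A more rigorous proof can be found in [5]. In the case $n = 2$, we will show how this is done in the section on Green's theorem.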
how this is done in the section on Green’s theorem.
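A classical application of the above theorem is to the evaluation of the probability integral
$$I = \int_{-\infty}^{\infty} e^{-x^2}\, dx.$$
Indeed, we have
$$I^2 = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{-x^2 - y^2}\, dx\, dy.$$
The integration is over $\mathbb{R}^2$, which we can also parameterize using polar coordinates:
$$x = r\cos\theta, \quad y = r\sin\theta, \quad r > 0, \quad 0 \le \theta < 2\pi.$$
The Jacobian matrix is easily calculated to be
$$\begin{pmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{pmatrix},$$
whose determinant is $r$, so that our double integral becomes
$$I^2 = \int_0^{2\pi} \int_0^{\infty} e^{-r^2}\, r\, dr\, d\theta.$$
The inner integral is evidently $1/2$, so that $I^2 = \pi$. Hence $I = \sqrt{\pi}$.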
Even though the area of a circle can be calculated using elementary calculus, it is instructive to observe that the above integral $I$ allows us to find a formula for the area of a circle of radius $r$. This leads us to a generalization to higher dimensions. To this end, we can calculate our integral without invoking any Jacobians, appealing instead to geometric intuition. Indeed,
$$\pi = I^2 = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{-(x^2 + y^2)}\, dx\, dy,$$
and the integral on the right can be viewed as integrating
$$e^{-x^2 - y^2}$$
over expanding circles of radius $r$ (on which the integrand equals $e^{-r^2}$), with $r$ ranging from zero to infinity. In this expansion, the infinitesimal change in area as the circle expands is given by $2\pi r\, dr$. In other words, we are integrating over thin annuli of circumference $2\pi r$ and width $dr$ as $r$ ranges from zero to infinity. Thus,
$$I^2 = 2\pi \int_0^{\infty} e^{-r^2}\, r\, dr = \pi.$$
This point of view will be useful in the next section.


There is another way to evaluate the integral using only single-variable calculus. We have
$$I^2 = 4 \int_0^{\infty} \int_0^{\infty} e^{-(x^2 + y^2)}\, dx\, dy.$$
We change $x = ty$ in the inner integral and interchange the order, which we can do because of absolute convergence:
$$I^2 = 4 \int_0^{\infty} \int_0^{\infty} e^{-y^2(1 + t^2)}\, y\, dy\, dt.$$
The inner integral is easily evaluated and we find
$$I^2 = 4 \int_0^{\infty} \left[ \frac{-e^{-y^2(1 + t^2)}}{2(1 + t^2)} \right]_{y=0}^{y=\infty} dt = 2 \int_0^{\infty} \frac{dt}{1 + t^2} = 2 \big[ \arctan t \big]_0^{\infty} = \pi.$$
This idea will be used later when we discuss the beta function and its functional relation to the gamma function.
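A quick numerical check of $I = \sqrt{\pi}$ and of the polar form $I^2 = 2\pi \int_0^\infty e^{-r^2} r\, dr$, using a composite midpoint rule (a helper of our own) on truncated ranges. Truncating at $\pm 10$ discards tails of size about $e^{-100}$, far below double precision.

```python
import math

def midpoint(fn, a, b, n=200_000):
    """Composite midpoint rule for the integral of fn over [a, b]."""
    h = (b - a) / n
    return h * sum(fn(a + (k + 0.5) * h) for k in range(n))

# I = integral of e^{-x^2} over R, truncated to [-10, 10].
I = midpoint(lambda x: math.exp(-x * x), -10.0, 10.0)
# Polar form: I^2 = 2*pi * integral_0^inf e^{-r^2} r dr, truncated to [0, 10].
I2_polar = 2 * math.pi * midpoint(lambda r: math.exp(-r * r) * r, 0.0, 10.0)
print(I, math.sqrt(math.pi), I2_polar)
```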

Exercises

1. Let $a = (a_1, a_2, a_3)$, $b = (b_1, b_2, b_3)$ and $c = (c_1, c_2, c_3)$ be three linearly independent vectors spanning a parallelepiped in $\mathbb{R}^3$. Show that the volume of this parallelepiped is given by the absolute value of
$$a \cdot (b \times c).$$

2. With $a$, $b$ and $c$ as in the previous exercise, show that
$$a \cdot (b \times c) = \begin{vmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{vmatrix}.$$

3. With a, b and c as in the previous exercise, show that

a · (b × c) = (a × b) · c.

4. Show that
$$\iint_R \sqrt{x^2 + y^2}\, dx\, dy = \frac{2\pi a^3}{3},$$
where $R$ is the circular region $x^2 + y^2 \le a^2$.

5. With $R$ being the region in the previous exercise, show that
$$\iint_R e^{-(x^2 + y^2)}\, dx\, dy = \pi(1 - e^{-a^2}).$$
2 2

6. By making the transformation $x + y = u$ and $y = uv$, show that
$$\iint_T e^{y/(x + y)}\, dx\, dy = \frac{e - 1}{2},$$
where $T$ is the region defined by $x, y \ge 0$ and $x + y \le 1$.

7. Let R be the region bounded by x + y = 1, x = 0 and y = 0. Under the trans-


formation x − y = u and x + y = v, show that R is transformed into the region
|u| ≤ v ≤ 1 in the uv-plane.

8. With $R$ as in the previous exercise, show that
$$\iint_R \cos\left(\frac{x - y}{x + y}\right) dx\, dy = \frac{\sin 1}{2}.$$

9. Show that
$$\iiint_R \frac{dx\, dy\, dz}{(x^2 + y^2 + z^2)^{3/2}} = 4\pi \log(a/b),$$
where $R$ is the region bounded by the two spheres $x^2 + y^2 + z^2 = a^2$ and $x^2 + y^2 + z^2 = b^2$, with $a > b > 0$.

10. Let $R$ be the region $x^2 + xy + y^2 \le 1$ in $\mathbb{R}^2$. Show that
$$\iint_R e^{-(x^2 + xy + y^2)}\, dx\, dy = \frac{2\pi(e - 1)}{e\sqrt{3}},$$
by making the transformation
$$x = u\cos a - v\sin a, \qquad y = u\sin a + v\cos a,$$
with an appropriate choice of $a$ so as to eliminate the $xy$ term in the integrand.

11. Let $S$ be the square $[0, 1] \times [0, 1]$. Show that
$$\iint_S \frac{dx\, dy}{1 - x^2 y^2} = 1 + \frac{1}{3^2} + \frac{1}{5^2} + \cdots.$$
By making the transformation
$$x = \frac{\sin u}{\cos v}, \qquad y = \frac{\sin v}{\cos u}, \qquad u, v > 0, \quad u + v \le \pi/2,$$
show that the integral equals $\pi^2/8$. (This exercise is due to E. Calabi.)

1.17 Volume and Surface Area of the Hypersphere

We can apply this perspective to calculate the volume and surface area of the $n$-dimensional hypersphere of radius $R$. The reader will recall that this hypersphere is the locus of points $(x_1, \ldots, x_n)$ in $\mathbb{R}^n$ such that
$$x_1^2 + x_2^2 + \cdots + x_n^2 \le R^2.$$
Denoting its volume by $V_n(R)$, we see that it is given by the $n$-dimensional integral
$$V_n(R) = \int \cdots \int_{x_1^2 + \cdots + x_n^2 \le R^2} dx_1 \cdots dx_n.$$
Replacing $x_i$ by $Rx_i$, we find
$$V_n(R) = C_n R^n,$$
where $C_n$ is the volume of the hypersphere of radius 1 (whose boundary is denoted $S^{n-1}$). If we let $S_{n-1}(R)$ denote the surface area of the boundary of the hypersphere of radius $R$, we have
$$V_n(R) = \int_0^R S_{n-1}(r)\, dr,$$
so that by the fundamental theorem of calculus,
$$S_{n-1}(R) = \frac{dV_n(R)}{dR} = nC_n R^{n-1}.$$
Thus, the surface area of $S^{n-1}$ is $nC_n$. Equivalently, writing $d\omega_{n-1}$ for the surface area element of $S^{n-1}$,
$$nC_n = \int_{S^{n-1}} d\omega_{n-1}.$$

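This idea was alluded to in the previous section. To determine $C_n$, we use the following trick:
$$\pi^{n/2} = \left(\int_{-\infty}^{\infty} e^{-x^2}\, dx\right)^n = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} e^{-(x_1^2 + \cdots + x_n^2)}\, dx_1 \cdots dx_n.$$
The $n$-fold integral on the right can be viewed as an integration over the surface area element of the hypersphere
$$x_1^2 + \cdots + x_n^2 = r^2,$$
integrated over the radial component. Since this sphere is the unit sphere scaled by $r$, its surface area element is $r^{n-1}\, d\omega_{n-1}$. Thus,
$$\pi^{n/2} = \int_0^{\infty} \int_{S^{n-1}} e^{-r^2} r^{n-1}\, d\omega_{n-1}\, dr.$$
In other words,
$$nC_n \int_0^{\infty} e^{-r^2} r^{n-1}\, dr = \pi^{n/2}.$$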

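The integral on the left-hand side is really a special value of the $\Gamma$-function (which will be studied in more detail in Chap. 4). Defining
$$\Gamma(s) := \int_0^{\infty} e^{-t} t^{s-1}\, dt, \qquad s > 0,$$
we find (after the change of variable $t = r^2$) that
$$\int_0^{\infty} e^{-r^2} r^{n-1}\, dr = \frac{1}{2}\Gamma(n/2).$$
Therefore,
$$C_n = \frac{2\pi^{n/2}}{n\,\Gamma(n/2)},$$
so that
$$V_n(R) = \frac{2\pi^{n/2} R^n}{n\,\Gamma(n/2)} \qquad \text{and} \qquad S_{n-1}(R) = nC_n R^{n-1} = \frac{2\pi^{n/2} R^{n-1}}{\Gamma(n/2)}.$$
We thus have formulas for the volume and surface area of the $n$-dimensional sphere of radius $R$.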
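The closed forms are easy to evaluate with the standard library's `math.gamma`. The sketch below (function names hypothetical) checks that the formulas give $\pi R^2$, $\frac{4}{3}\pi R^3$, $2\pi R$ and $4\pi R^2$ in dimensions 2 and 3. It also adds an independent Monte Carlo estimate of $C_n$: $2^n$ times the fraction of random points of the cube $[-1, 1]^n$ that land in the unit ball.

```python
import math
import random

def unit_ball_volume(n):
    """C_n = 2 pi^(n/2) / (n Gamma(n/2)), the volume of the hypersphere of radius 1 in R^n."""
    return 2 * math.pi ** (n / 2) / (n * math.gamma(n / 2))

def ball_volume(n, R):
    """V_n(R) = C_n R^n."""
    return unit_ball_volume(n) * R ** n

def sphere_area(n, R):
    """S_{n-1}(R) = n C_n R^(n-1) = 2 pi^(n/2) R^(n-1) / Gamma(n/2)."""
    return n * unit_ball_volume(n) * R ** (n - 1)

def monte_carlo_unit_ball(n, samples=200_000, seed=1):
    """Estimate C_n as 2^n times the fraction of uniform points in [-1, 1]^n inside the unit ball."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        if sum(rng.uniform(-1.0, 1.0) ** 2 for _ in range(n)) <= 1.0:
            hits += 1
    return 2 ** n * hits / samples

print([round(unit_ball_volume(n), 4) for n in range(1, 7)])
```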

1.18 Green’s Theorem

The generalization of the fundamental theorem of calculus to R2 is called Green’s


theorem after George Green (1793–1841) who first discovered it. It seems that Green
was largely self-educated and the son of a miller. He received very little formal
education and spent most of his time working in his father’s bakery and teaching
himself mathematics. In 1828, he published an essay on electricity and magnetism in
which he proved the theorem now named after him. It is also here that he introduced
the term “potential function” as well as other functions which we now call Green’s
functions used in the study of partial differential equations. He was finally admitted to
Cambridge University at the age of 40 and managed to publish several papers before
his untimely death. His original essay was largely overlooked and many of its new
contributions were rediscovered in 1846 by Sir William Thomson (1824–1907) also
known as Lord Kelvin. It was again rediscovered in Russia by Mikhail Ostrogradski
(1801–1861) where Green’s theorem is called Ostrogradski’s theorem to this day.
The first formal statement of the theorem appears in a postscript to a letter to Stokes
written by Kelvin on July 2, 1850. This letter forms the cover page of Spivak’s little
book, Calculus on Manifolds [5]. We now understand that Green’s theorem is a
special case of Stokes’ theorem. The latter along with Gauss’s divergence theorem
are succinctly formulated using the calculus of differential forms.
Katz [6] writes that the theorems attributed to Green, Gauss and Stokes were
actually discovered implicitly much earlier. Special cases appear in the works of
Lagrange, Laplace, Cauchy and Riemann. All of them were interested in higher-dimensional versions of the fundamental theorem of calculus to tackle specific problems motivated by physics, ranging from the study of heat, electricity and magnetism to elasticity and fluid dynamics. But since most of these papers were quite long, the theoretical aspects that can now be identified as special cases of Stokes' theorem were buried in these earlier works. We refer the reader to [6] for this enchanting history.

[Fig. 1.9 Region $R$ bounded below by $C_1 : y = g_1(x)$ and above by $C_2 : y = g_2(x)$ for $a \le x \le b$, with $C = C_1 \cup C_2$.]

To state and prove Green’s theorem, we need first the notion of a simple closed
curve. A curve x : [a, b] → Rn is called closed if x(a) = x(b). It is called a simple
closed curve if it does not intersect itself except at the endpoints.
Theorem 1.30 (Green's theorem) Let $C$ be a piecewise smooth, simple closed curve in the plane with positive orientation and let $R$ be the region enclosed by $C$ together with $C$. Suppose that $M(x, y)$ and $N(x, y)$ are continuous and have continuous first-order partial derivatives in some open region $D$ that contains $R$. Then
$$\oint_C M(x, y)\, dx + N(x, y)\, dy = \iint_R \left( \frac{\partial N}{\partial x} - \frac{\partial M}{\partial y} \right) dA.$$

Proof (Sketch) We will assume that there are two differentiable functions $g_1(x)$ and $g_2(x)$ such that the region $R$ can be described as
$$R = \{(x, y) : a \le x \le b \text{ and } g_1(x) \le y \le g_2(x)\}$$
as illustrated in Fig. 1.9.
We can divide our contour $C$ into two oriented parts $C_1$ and $C_2$ as indicated in the figure, where $C_1$ is the bottom portion and $C_2$ is the top portion of the curve, defined by $y = g_1(x)$ and $y = g_2(x)$ respectively. Thus,
$$\oint_C M(x, y)\, dx = \int_{C_1} M(x, y)\, dx + \int_{C_2} M(x, y)\, dx.$$
Taking into account the orientation of $C_2$ (traversed from $x = b$ to $x = a$), we find
$$\oint_C M(x, y)\, dx = \int_a^b [M(x, g_1(x)) - M(x, g_2(x))]\, dx.$$
C a

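On the other hand, let us note that
$$\iint_R \frac{\partial M}{\partial y}\, dA = \int_a^b \int_{g_1(x)}^{g_2(x)} \frac{\partial M}{\partial y}\, dy\, dx,$$
and by the fundamental theorem of calculus, we get
$$\iint_R \frac{\partial M}{\partial y}\, dA = \int_a^b M(x, y) \Big|_{y = g_1(x)}^{y = g_2(x)} dx = \int_a^b [M(x, g_2(x)) - M(x, g_1(x))]\, dx.$$
Thus,
$$\oint_C M(x, y)\, dx = -\iint_R \frac{\partial M}{\partial y}\, dA.$$

[Fig. 1.10 Region $R$ between the curves $C_1 : x = h_1(y)$ and $C_2 : x = h_2(y)$ for $c \le y \le d$, with $C = C_1 \cup C_2$.]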
C R ∂y

In a similar way, we can write our region $R$ as
$$R = \{(x, y) : c \le y \le d \text{ and } h_1(y) \le x \le h_2(y)\}.$$
Proceeding as before, we find
$$\oint_C N(x, y)\, dy = \iint_R \frac{\partial N}{\partial x}\, dA.$$
Putting everything together gives Green's theorem. To convey the essential idea of the proof, we have assumed that the region can be described in both of the forms indicated in the figures. However, it is not difficult to modify our proof to consider the more general case. We discuss the necessary modifications below. □
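The following is a numerical check of Green's theorem on the unit disk, with test fields chosen by us: $M(x, y) = -x^2 y$ and $N(x, y) = x y^2$. Then $\partial N/\partial x - \partial M/\partial y = x^2 + y^2$, and both sides should equal $\pi/2$. The line integral uses the counterclockwise parameterization $(\cos t, \sin t)$. The double integral uses polar coordinates as in Sect. 1.16. The helper names are hypothetical.

```python
import math

# Hypothetical test fields with dN/dx - dM/dy = y^2 + x^2.
M = lambda x, y: -x * x * y
N = lambda x, y: x * y * y

def boundary_integral(n=20_000):
    """Line integral of M dx + N dy over the unit circle, counterclockwise (midpoint rule in t)."""
    h = 2 * math.pi / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        x, y = math.cos(t), math.sin(t)
        dx, dy = -math.sin(t), math.cos(t)
        total += M(x, y) * dx + N(x, y) * dy
    return total * h

def area_integral(n=2_000):
    """Double integral of x^2 + y^2 over the unit disk in polar coordinates (dA = r dr dtheta)."""
    h = 1.0 / n
    radial = h * sum(((k + 0.5) * h) ** 3 for k in range(n))   # integral of r^2 * r dr on [0, 1]
    return 2 * math.pi * radial                                # the integrand does not depend on theta

print(boundary_integral(), area_integral(), math.pi / 2)
```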

One important corollary of Green's theorem is the following formula for computing the area of $R$:
Corollary 1.2 Suppose that $C$ is a positively oriented, piecewise smooth, simple closed curve enclosing the region $R$ with area $A$. Then
$$2A = \oint_C x\, dy - y\, dx.$$
Proof We take $M(x, y) = -y$ and $N(x, y) = x$ in Green's theorem, so that $\partial N/\partial x - \partial M/\partial y = 2$, to deduce the result. □

To illustrate the use of the corollary, we show that the area of an ellipse defined by
$$\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1$$
is $\pi ab$. The ellipse is a simple closed curve which can be defined parametrically by
$$C = \{(x, y) : x = a\cos t,\ y = b\sin t,\ 0 \le t \le 2\pi\}.$$
Our line integral becomes
$$\frac{1}{2} \oint_C x\, dy - y\, dx = \frac{1}{2} \int_0^{2\pi} [(a\cos t)(b\cos t) - (b\sin t)(-a\sin t)]\, dt = \pi ab.$$
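Corollary 1.2 turns an area computation into a one-dimensional integral over the boundary. The sketch below (helper names hypothetical) evaluates $\frac{1}{2}\oint_C x\, dy - y\, dx$ by the midpoint rule. It does this for the ellipse above with $a = 3$, $b = 2$, and, as a second example of our own, for the cardioid $r = 1 + \cos t$, whose enclosed area is $3\pi/2$.

```python
import math

def enclosed_area(curve, dcurve, t0, t1, n=10_000):
    """(1/2) * integral of (x dy - y dx) over a positively oriented closed curve (Corollary 1.2)."""
    h = (t1 - t0) / n
    total = 0.0
    for k in range(n):
        t = t0 + (k + 0.5) * h
        x, y = curve(t)
        dx, dy = dcurve(t)          # (x'(t), y'(t))
        total += x * dy - y * dx
    return 0.5 * total * h

a, b = 3.0, 2.0
ellipse = lambda t: (a * math.cos(t), b * math.sin(t))
d_ellipse = lambda t: (-a * math.sin(t), b * math.cos(t))
area = enclosed_area(ellipse, d_ellipse, 0.0, 2 * math.pi)

# Cardioid r = 1 + cos t, traced counterclockwise: x = r cos t, y = r sin t.
cardioid = lambda t: ((1 + math.cos(t)) * math.cos(t), (1 + math.cos(t)) * math.sin(t))
d_cardioid = lambda t: (-math.sin(t) * (1 + 2 * math.cos(t)), math.cos(t) + math.cos(2 * t))
cardioid_area = enclosed_area(cardioid, d_cardioid, 0.0, 2 * math.pi)
print(area, math.pi * a * b, cardioid_area, 1.5 * math.pi)
```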

As mentioned earlier, in our sketch of the proof of Green's theorem, we assumed a simple shape for our region so that we could execute the main idea of the proof quickly. However, the argument extends to any region that can be decomposed into finitely many regions of the type we considered: one invokes the theorem for each piece and adds up the results. We illustrate the idea with the figure below. The cycle $C$ in the figure can be decomposed into two oriented cycles $C_1$ and $C_2$ which share the line segment indicated in Fig. 1.10, traversed in opposite directions, and thus the contributions along this segment cancel. In other words, we can derive Green's theorem for more general regions by applying this idea. We will not go into any further details (Fig. 1.11).
[Fig. 1.11 A typical region R, with boundary $C = C_1 \cup C_2$]

As an application of Green’s theorem, we derive the change of variables formula for integrals in the case $n = 2$. We will use Green’s theorem and Corollary 1.2. By the usual subdivision of the region of integration on the right-hand side into rectangles $R$, it suffices to prove the formula when $R$ is a rectangle and $f$ is a constant function. That is, we need to show

$$\iint_R |\det g'(x,y)|\,dx\,dy = \iint_{g(R)} dx\,dy.$$

Letting $S$ denote the region $g(R)$ and $C$ its boundary, we have by Corollary 1.2

$$2\iint_S dx\,dy = \oint_C x\,dy - y\,dx.$$

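We change variables to evaluate the integral on the right-hand side: $x = F(u,v)$, $y = G(u,v)$. Then

$$dy = \frac{\partial G}{\partial u}\,du + \frac{\partial G}{\partial v}\,dv, \qquad dx = \frac{\partial F}{\partial u}\,du + \frac{\partial F}{\partial v}\,dv.$$

Denoting by $C'$ the curve in the $uv$-plane that corresponds to $C$ under this change of variables, our integral becomes

$$\oint_{C'} \left(F\frac{\partial G}{\partial u} - G\frac{\partial F}{\partial u}\right) du + \left(F\frac{\partial G}{\partial v} - G\frac{\partial F}{\partial v}\right) dv.$$

If we denote by $R'$ the region bounded by $C'$ in the $uv$-plane, Green’s theorem shows that this integral equals

$$\iint_{R'} \left[\frac{\partial}{\partial u}\left(F\frac{\partial G}{\partial v} - G\frac{\partial F}{\partial v}\right) - \frac{\partial}{\partial v}\left(F\frac{\partial G}{\partial u} - G\frac{\partial F}{\partial u}\right)\right] du\,dv.$$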

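Assuming $F$ and $G$ are $C^2$, the mixed second-order partial derivatives cancel, and the integrand equals

$$2\left(\frac{\partial F}{\partial u}\frac{\partial G}{\partial v} - \frac{\partial F}{\partial v}\frac{\partial G}{\partial u}\right) = 2\,\frac{\partial(x,y)}{\partial(u,v)}.$$

We must also account for orientation: if $g$ reverses orientation, then $C'$ is traversed clockwise and the sign changes. With this taken into account, the integral simplifies to

$$2\iint_{R'} \left|\frac{\partial(x,y)}{\partial(u,v)}\right| du\,dv,$$

and comparing with $2\iint_S dx\,dy$ gives the required identity.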
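To see the identity at work, the sketch below takes the polar-coordinate map $F(u,v) = u\cos v$, $G(u,v) = u\sin v$ on the rectangle $[1,2]\times[0,\pi/2]$, an illustrative choice of ours. It computes the area of the image region $S$ in two ways: once as half of the boundary integral in the $uv$-plane derived above, and once as the integral of $|\partial(x,y)/\partial(u,v)|$. Both should be close to $3\pi/4$. Partial derivatives are approximated by central differences, which is an assumption of the sketch rather than part of the argument.

```python
import math

# Illustrative map (an assumption for this sketch): polar coordinates.
def F(u, v):
    return u * math.cos(v)

def G(u, v):
    return u * math.sin(v)

H = 1e-6  # step for central differences

def d_du(f, u, v):
    return (f(u + H, v) - f(u - H, v)) / (2 * H)

def d_dv(f, u, v):
    return (f(u, v + H) - f(u, v - H)) / (2 * H)

def jacobian(u, v):
    """d(x, y)/d(u, v) = F_u G_v - F_v G_u."""
    return d_du(F, u, v) * d_dv(G, u, v) - d_dv(F, u, v) * d_du(G, u, v)

def edge_integral(u0, v0, u1, v1, n=1000):
    """Midpoint rule for (F G_u - G F_u) du + (F G_v - G F_v) dv along a straight edge."""
    du, dv = (u1 - u0) / n, (v1 - v0) / n
    total = 0.0
    for k in range(n):
        u, v = u0 + (k + 0.5) * du, v0 + (k + 0.5) * dv
        p = F(u, v) * d_du(G, u, v) - G(u, v) * d_du(F, u, v)
        q = F(u, v) * d_dv(G, u, v) - G(u, v) * d_dv(F, u, v)
        total += p * du + q * dv
    return total

def area_via_boundary(a, b, c, d):
    """Half the boundary integral over the counterclockwise boundary of [a,b] x [c,d]."""
    corners = [(a, c), (b, c), (b, d), (a, d), (a, c)]
    return sum(edge_integral(*corners[i], *corners[i + 1]) for i in range(4)) / 2

def area_via_jacobian(a, b, c, d, n=200):
    """Midpoint rule for the double integral of |jacobian| over [a,b] x [c,d]."""
    du, dv = (b - a) / n, (d - c) / n
    return sum(abs(jacobian(a + (i + 0.5) * du, c + (j + 0.5) * dv))
               for i in range(n) for j in range(n)) * du * dv

print(area_via_boundary(1, 2, 0, math.pi / 2), area_via_jacobian(1, 2, 0, math.pi / 2))
```

Because the polar map preserves orientation ($\partial(x,y)/\partial(u,v) = u > 0$), the counterclockwise boundary of the rectangle maps to the counterclockwise boundary of $S$, and no sign correction is needed.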

Exercises

1. Evaluate

$$\oint_C y^2\,dx + x^2\,dy,$$

where $C$ is the closed path formed by the square with vertices $(0,0)$, $(1,0)$, $(1,1)$ and $(0,1)$, oriented counterclockwise.
2. Given an $n$-sided polygon in the plane whose vertices are

$$(a_1,b_1), (a_2,b_2), \ldots, (a_n,b_n)$$

arranged counterclockwise around the polygon, show that the area inside the polygon is given by

$$\frac{1}{2}\left( \begin{vmatrix} a_1 & b_1 \\ a_2 & b_2 \end{vmatrix} + \begin{vmatrix} a_2 & b_2 \\ a_3 & b_3 \end{vmatrix} + \cdots + \begin{vmatrix} a_{n-1} & b_{n-1} \\ a_n & b_n \end{vmatrix} + \begin{vmatrix} a_n & b_n \\ a_1 & b_1 \end{vmatrix} \right).$$

[Hint: parameterize each of the line segments of the polygon.]

3. If $R$ is a region to which Green’s theorem applies and $C$ is its oriented boundary, so that $R$ is always on the left as we travel along $C$, show that the area of $R$ is given by either of the two integrals:

$$\oint_C x\,dy = -\oint_C y\,dx.$$
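The formula in Exercise 2 (often called the shoelace formula) is easy to evaluate mechanically. The Python sketch below does so; the function name and the sample polygons are our own illustrative choices. It returns a signed area, which is negative when the vertices are listed clockwise, consistent with the orientation conventions above.

```python
def polygon_area(vertices):
    """Signed area (1/2) * sum of det [[a_i, b_i], [a_{i+1}, b_{i+1}]], indices taken cyclically.
    Positive for counterclockwise vertex order, negative for clockwise."""
    n = len(vertices)
    total = 0.0
    for i in range(n):
        a1, b1 = vertices[i]
        a2, b2 = vertices[(i + 1) % n]  # wraps around: the last term pairs (a_n, b_n) with (a_1, b_1)
        total += a1 * b2 - a2 * b1      # the 2x2 determinant
    return total / 2

print(polygon_area([(0, 0), (1, 0), (1, 1), (0, 1)]))  # 1.0
print(polygon_area([(0, 0), (4, 0), (0, 3)]))          # 6.0
```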

1.19 Theorems of Gauss and Stokes

There are several standard higher-dimensional versions of Green’s theorem that the student has undoubtedly seen in a second course in calculus. We recall these theorems below, mostly without proofs.
The first is the divergence theorem. Recall that a smooth surface $S$ is said to be orientable if, to each point on $S$, we can assign a unit normal vector in such a way that the resulting vector field of normals is continuous everywhere. Such a surface, together with a chosen continuous field of unit normals, is said to be oriented. Spheres and ellipsoids are common examples of orientable surfaces. The Möbius strip, obtained by taking a strip of paper, giving it a half-twist and then joining the ends, is not orientable. The following theorem is usually attributed to Gauss.
Theorem 1.31 (The divergence theorem) Let $\mathbf{f} = (f_1, f_2, f_3)$ be a vector field defined and continuously differentiable in a domain $D$ in $\mathbb{R}^3$. Let $R$ be a region in $D$ that is bounded by a piecewise smooth, closed orientable surface $S$. Then

$$\iint_S \mathbf{f}\cdot\mathbf{n}\,d\sigma = \iiint_R \nabla\cdot\mathbf{f}\,dV,$$

where $S$ is oriented so that at each point on $S$, $\mathbf{n}$ is the outwardly directed unit normal.
Given a continuously differentiable vector field $\mathbf{f} = (f_1, f_2, f_3)$ in $\mathbb{R}^3$, the function

$$\frac{\partial f_1}{\partial x} + \frac{\partial f_2}{\partial y} + \frac{\partial f_3}{\partial z}$$

is called the divergence of $\mathbf{f}$ and denoted $\operatorname{div}\mathbf{f}$. It is also symbolically denoted by $\nabla\cdot\mathbf{f}$, as in the theorem.
The divergence theorem is not difficult to prove once appropriate notation is introduced. One can give a proof along the lines of the one we gave for Green’s theorem, keeping in mind that we are now working in three dimensions. Without going into too much detail, here is the idea.

We can rewrite the equation to be proved as

$$\iint_S f_1\,dy\,dz + f_2\,dz\,dx + f_3\,dx\,dy = \iiint_R \left(\frac{\partial f_1}{\partial x} + \frac{\partial f_2}{\partial y} + \frac{\partial f_3}{\partial z}\right) dV.$$

For simplicity, we first assume that any line parallel to a coordinate axis cuts $S$ in at most two points. The reader may find it convenient to visualize our surface as that of an oblate football. We can then refer to the “lower” portion and “upper” portion of the surface when we view it from the $xy$-plane, the $yz$-plane or the $xz$-plane. Let us first look at the $xy$-plane. We can write $z = g_1(x,y)$ and $z = g_2(x,y)$ for the lower and upper portions of $S$, denoted $S_1$ and $S_2$ respectively. If we denote the projection of $S$ onto the $xy$-plane by $R_1$, then, noting that $dV = dx\,dy\,dz$,

$$\iiint_R \frac{\partial f_3}{\partial z}\,dV = \iint_{R_1}\left(\int_{g_1(x,y)}^{g_2(x,y)} \frac{\partial f_3}{\partial z}\,dz\right) dy\,dx,$$

and by the fundamental theorem of calculus this equals

$$\iint_{R_1}\big[f_3(x,y,g_2) - f_3(x,y,g_1)\big]\,dy\,dx.$$

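Let $\mathbf{k}$ denote the unit vector in the $z$-direction. If $\mathbf{n}_2$ is the outward unit normal to $S_2$, it is easily verified that $dy\,dx = \mathbf{k}\cdot\mathbf{n}_2\,dS_2$. Similarly, if $\mathbf{n}_1$ is the outward unit normal to the lower portion $S_1$, then $dy\,dx = -\mathbf{k}\cdot\mathbf{n}_1\,dS_1$, since the normal vector $\mathbf{n}_1$ makes an obtuse angle with $\mathbf{k}$. Combining these observations yields the equations

$$\iint_{R_1} f_3(x,y,g_2)\,dy\,dx = \iint_{S_2} f_3\,\mathbf{k}\cdot\mathbf{n}_2\,dS_2,$$

$$\iint_{R_1} f_3(x,y,g_1)\,dy\,dx = -\iint_{S_1} f_3\,\mathbf{k}\cdot\mathbf{n}_1\,dS_1.$$

Therefore

$$\iint_{R_1}\big[f_3(x,y,g_2) - f_3(x,y,g_1)\big]\,dy\,dx = \iint_S f_3\,\mathbf{k}\cdot\mathbf{n}\,dS.$$

In other words, putting everything together, we have

$$\iiint_R \frac{\partial f_3}{\partial z}\,dV = \iint_S f_3\,\mathbf{k}\cdot\mathbf{n}\,dS.$$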

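A similar analysis, projecting the surface onto the other two coordinate planes, gives

$$\iiint_R \frac{\partial f_1}{\partial x}\,dV = \iint_S f_1\,\mathbf{i}\cdot\mathbf{n}\,dS, \qquad \iiint_R \frac{\partial f_2}{\partial y}\,dV = \iint_S f_2\,\mathbf{j}\cdot\mathbf{n}\,dS.$$

Adding these three equations gives the divergence theorem, since $\mathbf{f}\cdot\mathbf{n} = f_1\,\mathbf{i}\cdot\mathbf{n} + f_2\,\mathbf{j}\cdot\mathbf{n} + f_3\,\mathbf{k}\cdot\mathbf{n}$.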
Suitable modifications can be made now to consider the more difficult case when
lines parallel to the coordinate axes meet the surface in more than two points. As in
the case of our proof of Green’s theorem, we subdivide the region into subregions
that satisfy our conditions above and then add up the final result, keeping in mind
that we have oriented surfaces and that cancellations take place in the appropriate
way.
An important theme in both Green’s theorem and Gauss’s divergence theorem is the way the notions of oriented surface and boundary impose themselves in the course of the proof. It is this essential ingredient that is formalized in the theory of differential forms.
One interesting application of the divergence theorem is a simple formula for the volume of a region $R$ bounded by a piecewise smooth, closed, oriented surface $S$. If $\mathbf{n}$ is the outward unit normal at each point of $S$, then

$$\text{Volume of } R = \frac{1}{3}\iint_S \mathbf{r}\cdot\mathbf{n}\,d\sigma,$$

where $\mathbf{r}$ is the position vector $(x,y,z)$. This formula is analogous to the one we obtained earlier using Green’s theorem for the area enclosed by a simple closed curve. Since $\mathbf{r} = (x,y,z)$, we see that $\nabla\cdot\mathbf{r} = 3$, and the result is immediate from the divergence theorem.
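The volume formula can be tested on an ellipsoid with semi-axes $a, b, c$; this example and the parameterization below are our own illustrative choices. With $\mathbf{r}(\phi,\theta) = (a\sin\phi\cos\theta,\ b\sin\phi\sin\theta,\ c\cos\phi)$, the cross product $\mathbf{r}_\phi\times\mathbf{r}_\theta$ points outward and its length is the area factor, so $\mathbf{r}\cdot\mathbf{n}\,d\sigma = \mathbf{r}\cdot(\mathbf{r}_\phi\times\mathbf{r}_\theta)\,d\phi\,d\theta$. The midpoint-rule result of the sketch should approximate $\frac{4}{3}\pi abc$.

```python
import math

def ellipsoid_volume_via_flux(a, b, c, n_phi=400, n_theta=200):
    """Approximate (1/3) * surface integral of r . n over x^2/a^2 + y^2/b^2 + z^2/c^2 = 1,
    using r(phi, theta) = (a sin phi cos theta, b sin phi sin theta, c cos phi)
    and the midpoint rule in (phi, theta)."""
    dphi, dtheta = math.pi / n_phi, 2 * math.pi / n_theta
    total = 0.0
    for i in range(n_phi):
        phi = (i + 0.5) * dphi
        for j in range(n_theta):
            theta = (j + 0.5) * dtheta
            r = (a * math.sin(phi) * math.cos(theta),
                 b * math.sin(phi) * math.sin(theta),
                 c * math.cos(phi))
            r_phi = (a * math.cos(phi) * math.cos(theta),
                     b * math.cos(phi) * math.sin(theta),
                     -c * math.sin(phi))
            r_theta = (-a * math.sin(phi) * math.sin(theta),
                       b * math.sin(phi) * math.cos(theta),
                       0.0)
            # r_phi x r_theta is the outward normal scaled by the area factor,
            # so (r . n) d(sigma) = r . (r_phi x r_theta) dphi dtheta.
            cross = (r_phi[1] * r_theta[2] - r_phi[2] * r_theta[1],
                     r_phi[2] * r_theta[0] - r_phi[0] * r_theta[2],
                     r_phi[0] * r_theta[1] - r_phi[1] * r_theta[0])
            total += sum(ri * ci for ri, ci in zip(r, cross)) * dphi * dtheta
    return total / 3

print(ellipsoid_volume_via_flux(1.0, 2.0, 3.0), 4 * math.pi * 1.0 * 2.0 * 3.0 / 3)
```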
As highlighted before, much of the development of multivariable calculus was motivated by questions arising in physics. For instance, in fluid mechanics, the divergence of a vector field $F$ at a point $(x,y,z)$ measures the net flow of fluid out of a small box centered at $(x,y,z)$, per unit volume. More precisely, if $\operatorname{div} F(x,y,z) = \nabla\cdot F(x,y,z) > 0$, more fluid goes out of the box than into it, in which case we call $(x,y,z)$ a source. If $\nabla\cdot F(x,y,z) < 0$, more fluid enters the box than leaves it, in which case we call $(x,y,z)$ a sink. If $\nabla\cdot F = 0$ throughout a region $D$, we say that the vector field is incompressible or source-free in $D$.
The divergence of a vector field is thus a measure of the outward “flow” of the vector field. For example, for the vector field $F(x,y) = (x,y)$, the divergence is 2, indicating that the flow is outward. For the vector field $F(x,y) = (y,-x)$, the divergence is zero, so the net outward flow across any simple closed curve is zero.
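The two planar examples can be checked by computing the outward flux $\oint F\cdot\mathbf{n}\,ds$ across a circle of radius $\rho$ (we take $\rho = 2$, an arbitrary choice). For $F = (x,y)$ the flux should equal $\operatorname{div}F \times \text{area} = 2\cdot\pi\rho^2$, and for $F = (y,-x)$ it should vanish. This is a minimal trapezoid-rule sketch; the function names are ours.

```python
import math

def outward_flux(F, radius=1.0, n=1000):
    """Trapezoid rule for the outward flux of the planar field F across the circle of the
    given radius: integral of F . n ds with n = (cos t, sin t) and ds = radius dt."""
    dt = 2 * math.pi / n
    total = 0.0
    for k in range(n):
        t = k * dt
        x, y = radius * math.cos(t), radius * math.sin(t)
        fx, fy = F(x, y)
        total += (fx * math.cos(t) + fy * math.sin(t)) * radius * dt
    return total

def radial(x, y):
    return (x, y)    # divergence 2

def rotation(x, y):
    return (y, -x)   # divergence 0

print(outward_flux(radial, 2.0), 2 * math.pi * 2.0 ** 2)  # both approximately 25.13
print(outward_flux(rotation, 2.0))                        # approximately 0
```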
The third theorem in the “holy trinity” of theorems from vector calculus is Stokes theorem.
Theorem 1.32 (Stokes theorem) Let $S$ be an oriented surface parameterized by a one-to-one parameterization $r : D \subseteq \mathbb{R}^2 \to \mathbb{R}^3$, where $D$ is a region to which Green’s theorem applies. Let $\partial S$ be the positively oriented, piecewise smooth boundary of $S$. Then

$$\oint_{\partial S} F\cdot ds = \iint_S \operatorname{curl} F\cdot dS$$

for any $C^1$ vector field $F$ on $S$.


We will not give the proof here. We only indicate that it follows lines of argument similar to those for Green’s theorem and our sketch of the divergence theorem. An important consequence of Stokes theorem is the following:
Theorem 1.33 Let $F$ be a vector field in $\mathbb{R}^3$ whose component functions have continuous first-order partial derivatives in all of $\mathbb{R}^3$. Then the following are equivalent:
(1) $F$ is conservative.
(2) $\int_C F\cdot dr$ is independent of the path for any piecewise smooth curve $C$ in $\mathbb{R}^3$.
(3) $\oint_C F\cdot dr = 0$ for any closed piecewise smooth curve $C$.
(4) $\nabla\times F = 0$.
(5) $F$ is a gradient field; that is, $F = \nabla f$ for some potential function $f$.
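Conditions (3) and (4) can be contrasted numerically. In the sketch below, both fields are our own illustrative choices. The circulation around the unit circle in the $xy$-plane is computed for the gradient of $f = x^2y + z$, which has zero curl, and for $F = (-y, x, 0)$, whose curl is $(0,0,2)$. The first circulation should vanish and the second should equal $2\pi$.

```python
import math

def circulation(F, n=1000):
    """Trapezoid rule for the line integral of F . dr around the unit circle in the
    xy-plane, r(t) = (cos t, sin t, 0), traversed counterclockwise."""
    dt = 2 * math.pi / n
    total = 0.0
    for k in range(n):
        t = k * dt
        x, y, z = math.cos(t), math.sin(t), 0.0
        dx, dy, dz = -math.sin(t), math.cos(t), 0.0
        f1, f2, f3 = F(x, y, z)
        total += (f1 * dx + f2 * dy + f3 * dz) * dt
    return total

def grad_f(x, y, z):
    """Gradient of the illustrative potential f = x^2 y + z; its curl is zero."""
    return (2 * x * y, x ** 2, 1.0)

def swirl(x, y, z):
    """F = (-y, x, 0), whose curl is (0, 0, 2) != 0."""
    return (-y, x, 0.0)

print(circulation(grad_f))  # approximately 0, as in (3)
print(circulation(swirl))   # approximately 2*pi, so swirl is not conservative
```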


As remarked earlier, vector calculus developed as a result of the study of electromagnetism. In this context, it may help students’ understanding if we briefly state the celebrated Maxwell equations.
If $E$ denotes the electric field and $B$ the magnetic field for electromagnetic waves propagating through a vacuum, Maxwell’s laws are (with $c$ denoting the speed of light):

$$\operatorname{div} E = 0, \qquad \operatorname{curl} E = -\frac{\partial B}{\partial t},$$

$$\operatorname{div} B = 0, \qquad \operatorname{curl} B = \frac{1}{c^2}\frac{\partial E}{\partial t}.$$

Exercises

1. Let $C(t) = (x(t), y(t))$ for $a \le t \le b$ be a curve with $x(t), y(t)$ continuously differentiable. The right normal vector at $t$ is defined as

$$N(t) = \left(\frac{dy}{dt}, -\frac{dx}{dt}\right).$$

Show that $N(t)$ is perpendicular to the curve.

2. Let $A$ be a region in $\mathbb{R}^2$ which is the interior of a simple closed curve oriented counterclockwise. Let $F = (p,q)$ be a vector field on $A$, and let $G$ be the vector field $G = (-q, p)$. Let $C$ be the boundary curve of $A$, parameterized as in the previous exercise.
(a) By applying Green’s theorem to $G$, show that

$$\oint_C (-q\,dx + p\,dy) = \iint_A \left(\frac{\partial p}{\partial x} + \frac{\partial q}{\partial y}\right) dy\,dx.$$

(b) Show that the left side of the equation in (a) is

$$\int_a^b F\cdot N(t)\,dt,$$

and thus deduce the divergence theorem.

3. Let $F(x,y) = (y,-x)$. Let $C$ be the circle of radius 1, oriented counterclockwise. Show that

$$\oint_C F\cdot n\,ds = 0.$$

4. Recall that a region is called simply connected if every closed path in it can be continuously shrunk to a point. Let $P(x,y)$ and $Q(x,y)$ be continuous and have continuous first-order partial derivatives at each point of a simply connected region $R$. Show that a necessary and sufficient condition that

$$\oint_C P\,dx + Q\,dy = 0$$

around every closed path $C$ in $R$ is that

$$\frac{\partial P}{\partial y} = \frac{\partial Q}{\partial x}$$

at every point of $R$.

5. With notation as in the previous exercise, we say $P\,dx + Q\,dy$ is an exact differential if there is a differentiable function $\phi(x,y)$ such that

$$d\phi = P\,dx + Q\,dy.$$

Prove that a necessary and sufficient condition that $P\,dx + Q\,dy$ is an exact differential is that

$$\frac{\partial P}{\partial y} = \frac{\partial Q}{\partial x}.$$

1.20 Differential Forms

The fundamental theorem of calculus generalizes to higher dimensions, and this generalization is again often called Stokes theorem. It is neatly stated using the language of differential forms. We give a cursory overview of the formalism, to suggest to the student that it is profitable to study differential topology in greater detail as a theory that unifies advanced calculus and extends it to the setting of manifolds. An excellent reference is the concise book by Spivak [5]. The student may also find [7–9] helpful.
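To motivate our discussion toward a “calculus of differentials”, let us look at the following “example”. Consider the double integral

$$\iint f(x,y)\,dx\,dy,$$

and let us effect the change of variables

$$x = x(u,v), \qquad y = y(u,v),$$

so that

$$dx\,dy = \begin{vmatrix} \dfrac{\partial x}{\partial u} & \dfrac{\partial x}{\partial v} \\[2mm] \dfrac{\partial y}{\partial u} & \dfrac{\partial y}{\partial v} \end{vmatrix}\,du\,dv.$$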

If we formally set $x = y$, we see that the determinant is zero, and this suggests $dx\,dx = 0$. Interchanging $x$ and $y$ changes the sign of the determinant, so that $dx\,dy = -dy\,dx$. These observations suggest a search for a suitable “algebra” of differential forms that satisfy these two properties. Such an algebra leads to an elegant synthesis of Green’s theorem and Stokes theorem and, in a single formula, to a grand generalization to higher dimensions, which can be seen as the ultimate fundamental theorem of calculus. Below, we give a lightning overview of the theory of differential forms. The student can find a detailed treatment in [5].
Let $V$ be a vector space over $\mathbb{R}$ and denote by $V^k$ the $k$-fold product

$$V \times \cdots \times V.$$

A function $T : V^k \to \mathbb{R}$ is called a multilinear map (or sometimes a $k$-linear map) if it is linear in each of its arguments. Such multilinear maps are also called $k$-tensors. We will denote by $\mathcal{T}^k(V)$ the set of all $k$-tensors on $V$. This set is easily seen to be a vector space over $\mathbb{R}$. Given $S \in \mathcal{T}^r(V)$ and $T \in \mathcal{T}^s(V)$, we define the tensor product $S\otimes T \in \mathcal{T}^{r+s}(V)$ by

$$(S\otimes T)(v_1,\ldots,v_r,v_{r+1},\ldots,v_{r+s}) := S(v_1,\ldots,v_r)\,T(v_{r+1},\ldots,v_{r+s}).$$

Note that the order of the factors $S$ and $T$ matters here, so the operation $\otimes$ is not commutative. The reader can easily verify the following properties:
$$(S_1+S_2)\otimes T = S_1\otimes T + S_2\otimes T,$$

$$S\otimes(T_1+T_2) = S\otimes T_1 + S\otimes T_2,$$

$$(aS)\otimes T = S\otimes(aT) = a(S\otimes T), \quad a\in\mathbb{R},$$

$$(S\otimes T)\otimes U = S\otimes(T\otimes U).$$
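A minimal Python sketch may make the definition concrete. Here tensors are represented as ordinary functions of their vector arguments, and the helper name `tensor_product` and the dual-basis functionals on $\mathbb{R}^2$ are our own illustrative choices. The example shows that $\varphi_1\otimes\varphi_2 \ne \varphi_2\otimes\varphi_1$.

```python
def tensor_product(S, r, T, s):
    """Return S (x) T as a function of r + s vector arguments, where S is an r-tensor
    and T is an s-tensor, both given as Python functions of their vector arguments."""
    def ST(*vs):
        if len(vs) != r + s:
            raise ValueError("expected %d arguments" % (r + s))
        return S(*vs[:r]) * T(*vs[r:])
    return ST

# Dual basis on R^2: phi1(v) = v[0], phi2(v) = v[1].
def phi1(v):
    return v[0]

def phi2(v):
    return v[1]

e1, e2 = (1.0, 0.0), (0.0, 1.0)
p12 = tensor_product(phi1, 1, phi2, 1)
p21 = tensor_product(phi2, 1, phi1, 1)
print(p12(e1, e2), p21(e1, e2))  # 1.0 0.0: the order of the factors matters
```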

The last property allows us to write $S\otimes T\otimes U$ unambiguously, and a similar remark applies to higher-order products. The space $\mathcal{T}^1(V)$ is simply the dual space $V^*$ of linear functionals on $V$. The tensor product allows us to describe the spaces of $k$-tensors in terms of $V^*$. Indeed, suppose $V$ is $n$-dimensional over $\mathbb{R}$, with basis $v_1,\ldots,v_n$ and corresponding dual basis $\varphi_1,\ldots,\varphi_n$, so that $\varphi_i(v_j) = \delta_{ij}$ (where $\delta_{ij}$ is the usual Kronecker delta function, equal to 1 if $i = j$ and zero otherwise). Then the set of $k$-fold tensor products

$$\varphi_{i_1}\otimes\cdots\otimes\varphi_{i_k}, \qquad 1 \le i_1,\ldots,i_k \le n,$$

is a basis for $\mathcal{T}^k(V)$; consequently, $\mathcal{T}^k(V)$ has dimension $n^k$.


The familiar dot product of two vectors is an example of a 2-tensor. In fact, one defines generally an inner product on a vector space $V$ over $\mathbb{R}$ to be a 2-tensor $T$ which is symmetric, that is, $T(v,w) = T(w,v)$, and positive definite, that is, $T(v,v) > 0$ for all vectors $v \ne 0$.
Another example of a tensor perhaps familiar to readers is the determinant function. Viewed as a function of the $n$ rows of a matrix, it is an $n$-tensor on the vector space $\mathbb{R}^n$. To explain this connection, we define a $k$-tensor $T$ to be alternating if

$$T(v_1,\ldots,v_i,\ldots,v_j,\ldots,v_k) = -T(v_1,\ldots,v_j,\ldots,v_i,\ldots,v_k) \quad \text{for all } v_1,\ldots,v_k \in V,$$

where we have interchanged $v_i$ and $v_j$ on the right-hand side and left all other $v$’s fixed in their positions. In other words, the sign of the tensor changes if we apply a transposition to the order of the arguments, a property shared by the determinant function. The set of alternating $k$-tensors is denoted $\Lambda^k(V)$. The reader will recall that for the permutation group $S_k$ on $k$ letters, the sgn function assigns $+1$ to a permutation if it is even and $-1$ if it is odd. This function allows us to construct an alternating tensor from any tensor as follows. Given $T \in \mathcal{T}^k(V)$, we define

$$\operatorname{Alt}(T)(v_1,\ldots,v_k) = \frac{1}{k!}\sum_{\sigma\in S_k} (\operatorname{sgn}\sigma)\,T(v_{\sigma(1)},\ldots,v_{\sigma(k)}).$$

We leave the verification that this is indeed an alternating tensor to the student, whom we assume to be familiar with basic linear algebra and to have already verified this property for the determinant function. The proof here is similar.
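The definition of $\operatorname{Alt}$ translates directly into code. In the sketch below, the names `sign` and `alt` and the sample tensor $T(v,w) = v_1w_2$ (written `v[0] * w[1]` in 0-based indexing) are our own illustrative choices. The sign of a permutation is computed by counting inversions. The example shows that $\operatorname{Alt}(T)$ changes sign when its arguments are swapped, even though $T$ itself is not alternating.

```python
from itertools import permutations
from math import factorial

def sign(perm):
    """Sign of a permutation of 0..k-1, computed by counting inversions."""
    inversions = sum(1 for i in range(len(perm)) for j in range(i + 1, len(perm))
                     if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def alt(T, k):
    """Alt(T)(v_1..v_k) = (1/k!) * sum over sigma in S_k of sgn(sigma) T(v_sigma(1), ..., v_sigma(k))."""
    def alt_T(*vs):
        return sum(sign(p) * T(*(vs[i] for i in p))
                   for p in permutations(range(k))) / factorial(k)
    return alt_T

# A non-alternating 2-tensor on R^2 (illustrative): T(v, w) = v[0] * w[1].
def T(v, w):
    return v[0] * w[1]

A = alt(T, 2)
v, w = (1.0, 2.0), (3.0, 5.0)
print(A(v, w), A(w, v))  # -0.5 0.5: swapping the arguments flips the sign
```

Applying `alt` to an already alternating tensor such as `A` returns the same values, in line with $\operatorname{Alt}(T) = T$ for alternating $T$.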
If $T$ is alternating, then $\operatorname{Alt}(T) = T$. Just as we defined the tensor product of two tensors, we can define the wedge product $\omega\wedge\eta$ of two alternating tensors $\omega\in\Lambda^r(V)$ and $\eta\in\Lambda^s(V)$ as

$$\omega\wedge\eta = \frac{(r+s)!}{r!\,s!}\operatorname{Alt}(\omega\otimes\eta).$$

The reason for the coefficient will be apparent later. The wedge product has the usual properties of distributivity and associativity. It also has the important property that $\omega\wedge\eta = (-1)^{rs}\,\eta\wedge\omega$. In particular, if $r$ is odd, then $\omega\wedge\omega = 0$. If $r$ or $s$ is even, then $\omega\wedge\eta = \eta\wedge\omega$; if $r$ and $s$ are both odd, then $\omega\wedge\eta = -\eta\wedge\omega$.
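The sign rule $\omega\wedge\eta = (-1)^{rs}\eta\wedge\omega$ can be checked on small examples. The sketch below implements the wedge product exactly as defined above, with the coefficient $(r+s)!/(r!\,s!)$ applied to $\operatorname{Alt}(\omega\otimes\eta)$. The dual-basis 1-forms `phi1`, `phi2`, `phi3` on $\mathbb{R}^3$ are our illustrative choices, and the `sign` helper repeats the inversion count so that the block is self-contained.

```python
from itertools import permutations
from math import factorial

def sign(perm):
    """Sign of a permutation, via the parity of its number of inversions."""
    inversions = sum(1 for i in range(len(perm)) for j in range(i + 1, len(perm))
                     if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def wedge(omega, r, eta, s):
    """omega ^ eta = (r+s)!/(r! s!) * Alt(omega (x) eta) for omega in Lambda^r, eta in Lambda^s."""
    k = r + s
    coeff = factorial(k) / (factorial(r) * factorial(s))
    def w(*vs):
        total = 0.0
        for p in permutations(range(k)):
            args = [vs[i] for i in p]
            total += sign(p) * omega(*args[:r]) * eta(*args[r:])
        return coeff * total / factorial(k)  # the 1/k! comes from Alt
    return w

# Dual-basis 1-forms on R^3 (illustrative).
def phi1(v):
    return v[0]

def phi2(v):
    return v[1]

def phi3(v):
    return v[2]

e1, e2, e3 = (1, 0, 0), (0, 1, 0), (0, 0, 1)
print(wedge(phi1, 1, phi2, 1)(e1, e2))  # 1.0
print(wedge(phi2, 1, phi1, 1)(e1, e2))  # -1.0, since r = s = 1 are both odd
print(wedge(phi1, 1, phi1, 1)(e1, e2))  # 0.0, since omega ^ omega = 0 for odd r
```

With this normalization, $(\varphi_1\wedge\varphi_2)\wedge\varphi_3$ evaluated at $(e_1,e_2,e_3)$ equals 1, the value of the determinant.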
We can construct a basis for $\Lambda^k(V)$ using the wedge product. In fact, if $V$ is an $n$-dimensional vector space over $\mathbb{R}$, the set of all

$$\varphi_{i_1}\wedge\cdots\wedge\varphi_{i_k}, \qquad 1 \le i_1 < i_2 < \cdots < i_k \le n,$$

is a basis for $\Lambda^k(V)$. It therefore has dimension $\binom{n}{k}$.
Thus, if $V$ has dimension $n$, the space $\Lambda^n(V)$ is one-dimensional; when $V = \mathbb{R}^n$, it is spanned by the determinant function. Therefore, it is not difficult to deduce the following. If $v_1,\ldots,v_n$ is a basis for $V$ and $w_1,\ldots,w_n$ are $n$ vectors such that

$$w_i = \sum_{j=1}^n a_{ij}v_j,$$

then for $\eta\in\Lambda^n(V)$, we have

$$\eta(w_1,\ldots,w_n) = \det(a_{ij})\,\eta(v_1,\ldots,v_n). \tag{1.23}$$

If $v_1,\ldots,v_n$ and $w_1,\ldots,w_n$ are two bases of $V$, the change of basis matrix $A = (a_{ij})$ is non-singular, and consequently $\det A$ is either positive or negative. This allows us to define an equivalence relation on the set of bases of $V$: two bases are in the same class if and only if their change of basis matrix has positive determinant. This relation partitions the bases into exactly two classes. Indeed, if $v_1,\ldots,v_n$ is a basis, all other bases are obtained by applying a non-singular matrix to this basis, and the sign of the determinant determines the class. Either of these two classes is called an orientation for $V$. Thus, an orientation is an equivalence class of bases, and we denote the orientation to which the basis $v_1,\ldots,v_n$ belongs by $[v_1,\ldots,v_n]$. If $e_1,\ldots,e_n$ is the standard basis for $\mathbb{R}^n$, we define the usual orientation as $[e_1,\ldots,e_n]$.
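Both (1.23) and the orientation test can be illustrated with integer matrices. The sketch takes $\eta = \det$ on $\mathbb{R}^3$, computed by the Leibniz formula. It builds $w_i = \sum_j a_{ij}v_j$ from a basis $v$ (stored as rows) and compares $\det(w)$ with $\det(A)\det(v)$. Two bases lie in the same orientation exactly when $\det A > 0$. The particular matrices and helper names are our own illustrative choices.

```python
from itertools import permutations

def sign(perm):
    """Sign of a permutation, via the parity of its number of inversions."""
    inversions = sum(1 for i in range(len(perm)) for j in range(i + 1, len(perm))
                     if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def det(rows):
    """Leibniz formula: sum over sigma of sgn(sigma) * prod_i rows[i][sigma(i)]."""
    n = len(rows)
    total = 0
    for p in permutations(range(n)):
        prod = 1
        for i in range(n):
            prod *= rows[i][p[i]]
        total += sign(p) * prod
    return total

def combine(A, basis):
    """Rows w_i = sum_j a_ij v_j, i.e. the new basis obtained from `basis` via A."""
    n = len(basis)
    return [[sum(A[i][j] * basis[j][c] for j in range(n)) for c in range(n)]
            for i in range(n)]

def same_orientation(A):
    """Bases v and w = A v lie in the same orientation iff det A > 0."""
    return det(A) > 0

v = [[1, 2, 0], [0, 1, 1], [1, 0, 1]]  # an illustrative basis of R^3, as rows
A = [[2, 1, 0], [0, 1, 0], [1, 0, 3]]
w = combine(A, v)
print(det(w), det(A) * det(v))  # 18 18, as (1.23) predicts with eta = det
print(same_orientation(A), same_orientation([[0, 1, 0], [1, 0, 0], [0, 0, 1]]))  # True False
```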
Equation (1.23) also allows us to single out a distinguished element of $\Lambda^n(V)$, using the following idea. Suppose we have an inner product $T$ on $V$, and that $v_1,\ldots,v_n$ and $w_1,\ldots,w_n$ are two bases of $V$ which are orthonormal with respect to $T$, with $w_i = \sum_{j=1}^n a_{ij}v_j$. Then

$$\delta_{ij} = T(w_i,w_j) = \sum_{k,\ell=1}^n a_{ik}a_{j\ell}\,T(v_k,v_\ell) = \sum_{k=1}^n a_{ik}a_{jk}.$$

In other words, if $A^t$ is the transpose of $A$, the above equation says $AA^t = I$, so that $\det A = \pm 1$.
From (1.23), we deduce that if $\omega\in\Lambda^n(V)$ satisfies $\omega(v_1,\ldots,v_n) = \pm 1$, then $\omega(w_1,\ldots,w_n) = \pm 1$. Thus, if an orientation for $V$ has been given, there is a unique $\omega\in\Lambda^n(V)$ such that $\omega(v_1,\ldots,v_n) = 1$ whenever $v_1,\ldots,v_n$ is an orthonormal basis lying in the given orientation. This unique $\omega$ is called the volume element of $V$ determined by the orientation and the inner product $T$. Thus, $\det$ is the volume element determined by the usual dot product on $\mathbb{R}^n$ and the usual orientation given by the standard basis.
The reader may already be familiar with vector-valued functions. As discussed earlier, these are sometimes called vector fields. It is now convenient to have a slightly refined version of this definition.
For a fixed point $p\in\mathbb{R}^n$, we consider the set of all pairs $(p,v)$ with $v\in\mathbb{R}^n$ and turn this set into a vector space over $\mathbb{R}$ in the obvious way:

$$(p,v) + (p,w) = (p,v+w), \qquad a\cdot(p,v) = (p,av)$$

for $a\in\mathbb{R}$. Essentially, $p$ serves as a “book-keeping” device. We call this the tangent space of $\mathbb{R}^n$ at $p$ and denote it by $T_p(\mathbb{R}^n)$. Clearly, an analogous construction can be made for any vector space $V$ over $\mathbb{R}$. We can endow $T_p(\mathbb{R}^n)$ with an inner product in the obvious way, $\langle (p,v),(p,w)\rangle = v\cdot w$, as well as with the usual orientation.
A vector field is a function $F$ on $\mathbb{R}^n$ such that $F(p)\in T_p(\mathbb{R}^n)$ for each $p\in\mathbb{R}^n$. If

$$(e_1)_p,\ldots,(e_n)_p$$

is the standard basis of $T_p(\mathbb{R}^n)$, our function can then be written as

$$F(p) = F_1(p)(e_1)_p + \cdots + F_n(p)(e_n)_p.$$

We thus obtain component functions $F_i : \mathbb{R}^n\to\mathbb{R}$. The vector field is called continuous, differentiable, and so on, according as the component functions are continuous, differentiable, and so on. Of course, these definitions can be made for any function $F$ defined on an open set of $\mathbb{R}^n$.
We define the divergence of $F$, denoted $\operatorname{div} F$, as

$$\sum_{i=1}^n D_iF_i,$$

where $D_i$ denotes differentiation with respect to the $i$th variable.


The set of all vector fields on $\mathbb{R}^n$ can be made into a vector space in the obvious way: given two vector fields $F, G$, we define $F + G$ to be the vector field that assigns to $p$ the vector $F(p) + G(p)$.
If we introduce the symbolism

$$\nabla = \sum_{i=1}^n D_i\cdot e_i,$$

then we can write, symbolically, $\operatorname{div} F = \langle\nabla, F\rangle$.


These definitions are motivated by what can be described as the algebraic view of
vector calculus. We have already seen that the derivative is a linear transformation
between two vector spaces. In order to speak about “higher derivatives”, we would
need to topologize these vector spaces of linear transformations. On the other hand,
the concept of partial derivatives and component functions allows us to descend back
into familiar Euclidean spaces.
It is convenient to consider alternating $k$-tensors on $T_p(\mathbb{R}^n)$. A function $\omega$ that assigns to each point $p$ an alternating $k$-tensor $\omega(p)\in\Lambda^k(T_p(\mathbb{R}^n))$ is called a differential $k$-form, or simply a $k$-form. In a way, differential forms generalize the notion of functions. Indeed, 0-forms are simply functions $f : \mathbb{R}^n\to\mathbb{R}$. The wedge product allows us to speak about the algebra of differential forms in the following sense.
If $f : \mathbb{R}^n\to\mathbb{R}$ is differentiable, then the total differential (which we met earlier in Sect. 1.11)

$$df := D_1f\cdot dx_1 + \cdots + D_nf\cdot dx_n$$

is a 1-form. Thus, the map $f\mapsto df$ changes 0-forms into 1-forms. This observation allows us to construct an explicit basis for the vector space of $k$-forms at each point $p$: the forms

$$dx_{i_1}\wedge\cdots\wedge dx_{i_k}, \qquad i_1 < i_2 < \cdots < i_k,$$

evaluated at $p$, form a basis for $\Lambda^k(T_p(\mathbb{R}^n))$. Every $k$-form $\omega$ can be written as a linear combination

$$\omega = \sum_{i_1<\cdots<i_k} \omega_{i_1,\ldots,i_k}\,dx_{i_1}\wedge\cdots\wedge dx_{i_k},$$

and we say $\omega$ is $C^r$ if the functions $\omega_{i_1,\ldots,i_k}$ are all $C^r$. If $\omega$ is a differentiable $k$-form, then we define $d\omega$ as

$$d\omega := \sum_{i_1<\cdots<i_k} d\omega_{i_1,\ldots,i_k}\wedge dx_{i_1}\wedge\cdots\wedge dx_{i_k},$$

which is a $(k+1)$-form. This is the algebra of differential forms that we have been looking for. The operator $d$ is called the exterior derivative operator.
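The definition of $d$ can be turned into a small numerical sketch. The representation is an assumption of ours: a $k$-form on $\mathbb{R}^n$ is a dictionary mapping increasing index tuples $(i_1,\ldots,i_k)$ (0-based) to coefficient functions. Derivatives are approximated by central differences. Moving $dx_j$ into sorted position past each $dx_i$ with $i < j$ flips the sign once, and $dx_j\wedge dx_j = 0$ drops terms with a repeated index. The examples recover $d(-y\,dx + x\,dy) = 2\,dx\wedge dy$ and show numerically that $d(df)$ vanishes for an illustrative $f$.

```python
import math

H = 1e-4  # step size for central differences (an arbitrary choice)

def partial(f, j, p):
    """Central-difference approximation of D_j f at the point p (a tuple)."""
    plus, minus = list(p), list(p)
    plus[j] += H
    minus[j] -= H
    return (f(tuple(plus)) - f(tuple(minus))) / (2 * H)

def d(form, n):
    """Exterior derivative of a k-form on R^n. A form is a dict mapping an increasing
    index tuple (i_1, ..., i_k) to its coefficient function; the result has the same format."""
    collected = {}
    for I, coeff in form.items():
        for j in range(n):
            if j in I:
                continue  # dx_j ^ dx_j = 0
            # Moving dx_j past each dx_i with i < j flips the sign once.
            sgn = (-1) ** sum(1 for i in I if i < j)
            J = tuple(sorted(I + (j,)))
            collected.setdefault(J, []).append((sgn, j, coeff))

    def make(terms):
        # Coefficient of dx_J: sum of sgn * D_j(coeff) over the contributing terms.
        return lambda p: sum(sgn * partial(c, j, p) for sgn, j, c in terms)

    return {J: make(terms) for J, terms in collected.items()}

# omega = -y dx + x dy on R^2 (indices 0 and 1 stand for x and y).
omega = {(0,): lambda p: -p[1], (1,): lambda p: p[0]}
d_omega = d(omega, 2)
print(d_omega[(0, 1)]((0.3, 0.7)))  # approximately 2.0, i.e. d(omega) = 2 dx ^ dy

# d(df) for an illustrative f on R^3; every coefficient should vanish.
def f(p):
    return p[0] ** 2 * p[1] + math.sin(p[0] * p[2])

df = {(j,): (lambda p, j=j: partial(f, j, p)) for j in range(3)}
ddf = d(df, 3)
print(max(abs(c((0.3, 0.7, 1.1))) for c in ddf.values()))  # approximately 0
```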
On the geometric side, we define a singular $n$-cube in $A\subset\mathbb{R}^n$ to be a continuous function $c : [0,1]^n\to A$. The standard $n$-cube is $I_n : [0,1]^n\to\mathbb{R}^n$ given by $I_n(x) = x$ for $x\in[0,1]^n$. We can consider the $\mathbb{Z}$-module generated by the singular $n$-cubes. Its elements are simply finite formal sums of the form

$$\sum_j n_jc_j,$$

with $n_j\in\mathbb{Z}$ and the $c_j$ singular $n$-cubes. Such sums are called $n$-chains. In particular, any singular $n$-cube is also an $n$-chain.
For each singular $n$-chain $c$ in $A$, we will define an $(n-1)$-chain in $A$ called the boundary of $c$ and denoted $\partial c$. For example, the boundary of $I_2$ may be defined as the sum of four singular 1-cubes arranged counterclockwise around the boundary of $[0,1]^2$, as indicated in Fig. 1.12. However, it will be convenient to define it as the sum of four singular 1-cubes with the coefficients indicated in Fig. 1.13.
With this as motivation, we can define generally the boundary of $I_n$ as follows.
For each $i$ with $1\le i\le n$, we define two singular $(n-1)$-cubes $I_n^{(i,0)}$ and $I_n^{(i,1)}$ as follows. If $x\in[0,1]^{n-1}$, then

$$I_n^{(i,0)}(x) = (x_1,\ldots,x_{i-1},0,x_i,\ldots,x_{n-1})$$

and

$$I_n^{(i,1)}(x) = (x_1,\ldots,x_{i-1},1,x_i,\ldots,x_{n-1}).$$

We call $I_n^{(i,0)}$ the $(i,0)$-face of $I_n$ and $I_n^{(i,1)}$ the $(i,1)$-face. Thus, in the case $n = 1$, $I_1^{(1,0)}$ and $I_1^{(1,1)}$ are simply the two points $0$ and $1$. In the case $n = 2$, illustrated in Figs. 1.12 and 1.13, they are the four edges of the square, as indicated in Fig. 1.14.
[Fig. 1.12 Boundary of $I_2$]
[Fig. 1.13 Signed boundary of $I_2$]
[Fig. 1.14 Faces of $I_2$: $I_2^{(1,0)}$, $I_2^{(1,1)}$, $I_2^{(2,0)}$, $I_2^{(2,1)}$]

Thus, the boundary of $I_2$ is seen to be the formal sum

$$I_2^{(2,0)} - I_2^{(2,1)} + I_2^{(1,1)} - I_2^{(1,0)}.$$

This partially motivates our definition of the boundary of $I_n$:

$$\partial I_n := \sum_{i=1}^n\sum_{a=0,1}(-1)^{i+a}I_n^{(i,a)}.$$

For a general singular $n$-cube $c : [0,1]^n\to A\subset\mathbb{R}^n$, we first define

$$c_{(i,a)} = c\circ I_n^{(i,a)}$$

and then define the boundary of $c$ as the alternating sum

$$\partial c = \sum_{i=1}^n\sum_{a=0,1}(-1)^{i+a}c_{(i,a)}.$$

We then define the boundary of an $n$-chain $\sum_i n_ic_i$ as

$$\partial\Big(\sum_i n_ic_i\Big) = \sum_i n_i\,\partial c_i.$$

It is somewhat tedious (but not difficult) to verify that $\partial^2 = 0$, that is, $\partial(\partial c) = 0$ for every chain $c$.
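For faces of faces of the standard cube, the identity $\partial^2 = 0$ can be checked mechanically. The encoding is an assumption of this sketch. Such a cube is determined by which coordinates are frozen and at what values, so we record it as a tuple whose entries are 0, 1, or `None` for a free coordinate. Taking the $(i,a)$-face, $c\circ I^{(i,a)}$, freezes the $i$th free coordinate at $a$, and the boundary operator uses the signs $(-1)^{i+a}$ from the text.

```python
from collections import defaultdict

FREE = None  # marks a free coordinate of a face of the standard cube

def face(cube, i, a):
    """c_(i,a) = c o I^(i,a): fix the i-th free coordinate (1-based) of `cube` at a."""
    free = [pos for pos, entry in enumerate(cube) if entry is FREE]
    out = list(cube)
    out[free[i - 1]] = a
    return tuple(out)

def boundary(chain):
    """Boundary of a chain {cube: integer coefficient}, using sum (-1)^(i+a) c_(i,a)."""
    result = defaultdict(int)
    for cube, coeff in chain.items():
        dim = sum(1 for entry in cube if entry is FREE)
        for i in range(1, dim + 1):
            for a in (0, 1):
                result[face(cube, i, a)] += coeff * (-1) ** (i + a)
    return {c: k for c, k in result.items() if k != 0}  # drop cancelled terms

def standard_cube(n):
    return {(FREE,) * n: 1}

# (0, None) is I_2^(1,0), (1, None) is I_2^(1,1), (None, 0) is I_2^(2,0), (None, 1) is I_2^(2,1).
print(boundary(standard_cube(2)))
print(all(boundary(boundary(standard_cube(n))) == {} for n in range(1, 6)))  # True
```

The first line of output reproduces the formal sum $I_2^{(2,0)} - I_2^{(2,1)} + I_2^{(1,1)} - I_2^{(1,0)}$ given above.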


This language and formalism allow an elegant reformulation of Stokes theorem:
Theorem 1.34 (Stokes theorem) If $\omega$ is a $(k-1)$-form on an open set $A\subset\mathbb{R}^n$ and $c$ is a $k$-chain in $A$, then

$$\int_c d\omega = \int_{\partial c}\omega.$$

This is the higher-dimensional generalization of the fundamental theorem of calculus. It generalizes naturally to smooth manifolds with boundary, as explained in the concise book by Spivak [5].
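A numerical sketch of Theorem 1.34 for the chain $c = I_2$ shows how the signed faces enter. For an illustrative 1-form of our own choosing, $\omega = xy^2\,dx + (\sin x + xy)\,dy$, the coefficient of $dx\wedge dy$ in $d\omega$ is $D_1N - D_2M = \cos x + y - 2xy$, and $\int_{I_2} d\omega = \sin 1$. The integral over $\partial I_2$ is computed face by face, with each singular 1-cube weighted by $(-1)^{i+a}$, and should give the same value. All quadrature choices are assumptions of the sketch.

```python
import math

# Illustrative 1-form (an assumption for this sketch): omega = M dx + N dy.
def M(x, y):
    return x * y ** 2

def N(x, y):
    return math.sin(x) + x * y

def omega_on(point, tangent):
    """omega at `point`, applied to the tangent vector `tangent`."""
    x, y = point
    return M(x, y) * tangent[0] + N(x, y) * tangent[1]

def integrate_1cube(c, dc, n=2000):
    """Integral of omega over the singular 1-cube c: [0,1] -> R^2, that is,
    integral of omega(c(t))(c'(t)) dt, by the midpoint rule."""
    h = 1.0 / n
    return sum(omega_on(c((k + 0.5) * h), dc((k + 0.5) * h)) for k in range(n)) * h

# The four faces I_2^(i,a) with their coefficients (-1)^(i+a) in the boundary of I_2.
faces = [
    (-1, lambda t: (0.0, t), lambda t: (0.0, 1.0)),  # I_2^(1,0)
    (+1, lambda t: (1.0, t), lambda t: (0.0, 1.0)),  # I_2^(1,1)
    (+1, lambda t: (t, 0.0), lambda t: (1.0, 0.0)),  # I_2^(2,0)
    (-1, lambda t: (t, 1.0), lambda t: (1.0, 0.0)),  # I_2^(2,1)
]
boundary_side = sum(sign * integrate_1cube(c, dc) for sign, c, dc in faces)

def d_omega_coeff(x, y, h=1e-5):
    """Coefficient of dx ^ dy in d(omega): D_1 N - D_2 M, by central differences."""
    return (N(x + h, y) - N(x - h, y)) / (2 * h) - (M(x, y + h) - M(x, y - h)) / (2 * h)

n = 300
interior_side = sum(d_omega_coeff((i + 0.5) / n, (j + 0.5) / n)
                    for i in range(n) for j in range(n)) / n ** 2
print(boundary_side, interior_side, math.sin(1))  # all approximately 0.8415
```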

Exercises

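1. Let $S\in\mathcal{T}^r(V)$ and $T\in\mathcal{T}^s(V)$. Suppose $\operatorname{Alt}(S) = 0$. Show that

$$\operatorname{Alt}(S\otimes T) = \operatorname{Alt}(T\otimes S) = 0.$$

2. Prove that if $\omega\in\Lambda^r(V)$, $\eta\in\Lambda^s(V)$ and $\theta\in\Lambda^t(V)$, then

$$(\omega\wedge\eta)\wedge\theta = \omega\wedge(\eta\wedge\theta) = \frac{(r+s+t)!}{r!\,s!\,t!}\operatorname{Alt}(\omega\otimes\eta\otimes\theta).$$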

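3. If $v_1,v_2,\ldots,v_{n-1}\in\mathbb{R}^n$, define $\varphi : \mathbb{R}^n\to\mathbb{R}$ by

$$\varphi(w) = \det\begin{pmatrix} v_1\\ \vdots\\ v_{n-1}\\ w\end{pmatrix}.$$

Show that there is a unique $z\in\mathbb{R}^n$ such that

$$\varphi(w) = \langle w, z\rangle \quad \text{for all } w\in\mathbb{R}^n.$$

We call $z$ the cross-product of $v_1,\ldots,v_{n-1}$. This generalizes the cross-product to higher dimensions.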

4. If $\omega$ and $\eta$ are $k$-forms, show that $d(\omega+\eta) = d\omega + d\eta$.

5. If $\omega$ is a $k$-form and $\eta$ is an $\ell$-form, prove that

$$d(\omega\wedge\eta) = d\omega\wedge\eta + (-1)^k\,\omega\wedge d\eta.$$

6. If $\omega$ is a $k$-form, show that $d(d\omega) = 0$.

7. Let $M(x,y)$ and $N(x,y)$ be $C^1$. Show that

$$d(M\,dx + N\,dy) = \left(\frac{\partial N}{\partial x} - \frac{\partial M}{\partial y}\right)dx\wedge dy.$$

Deduce Green’s theorem from Theorem 1.34.

8. If $F = (F_1,F_2,F_3)$ is a vector field in $\mathbb{R}^3$, let

$$\omega = F_1\,dy\wedge dz + F_2\,dz\wedge dx + F_3\,dx\wedge dy.$$

Show that $d\omega = (\operatorname{div}F)\,dx\wedge dy\wedge dz$. Deduce the divergence theorem from Theorem 1.34.

9. If $F = (F_1,F_2,F_3)$ is a vector field in $\mathbb{R}^3$, let

$$\eta = F_1\,dx + F_2\,dy + F_3\,dz.$$

Show that

$$d\eta = (D_2F_3 - D_3F_2)\,dy\wedge dz + (D_3F_1 - D_1F_3)\,dz\wedge dx + (D_1F_2 - D_2F_1)\,dx\wedge dy.$$

This observation can be used to give a proof of Theorem 1.32 using Theorem 1.34.

10. A $k$-form $\omega$ is called exact if either $\omega = 0$ or $\omega = d\eta$ for some form $\eta$. A $k$-form $\omega$ is called closed if $d\omega = 0$. Show that an exact form is closed.

11. Consider the 1-form on $\mathbb{R}^2\setminus\{(0,0)\}$:

$$\omega = \frac{-y}{x^2+y^2}\,dx + \frac{x}{x^2+y^2}\,dy.$$

Show that $\omega$ is closed but not exact. [Thus, the converse of the previous exercise does not hold. This is because the domain is not simply connected. In a simply connected domain, every closed 1-form is exact.]
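The phenomenon in Exercise 11 is easy to observe numerically. The sketch below evaluates the coefficient $D_1Q - D_2P$ of $dx\wedge dy$ in $d\omega$ at an arbitrary sample point; it should be close to 0, which is consistent with closedness. It also integrates $\omega$ around the unit circle. If $\omega$ were $df$ for a function $f$ on $\mathbb{R}^2\setminus\{(0,0)\}$, this loop integral would be 0, but it comes out as $2\pi$. The sample point and step sizes are our own choices, and the computation illustrates the claim rather than proving it.

```python
import math

def P(x, y):
    return -y / (x * x + y * y)

def Q(x, y):
    return x / (x * x + y * y)

def closedness_defect(x, y, h=1e-5):
    """D_1 Q - D_2 P, the coefficient of dx ^ dy in d(omega), by central differences."""
    return (Q(x + h, y) - Q(x - h, y)) / (2 * h) - (P(x, y + h) - P(x, y - h)) / (2 * h)

def loop_integral(n=1000):
    """Integral of omega = P dx + Q dy around the unit circle, counterclockwise (trapezoid rule)."""
    dt = 2 * math.pi / n
    total = 0.0
    for k in range(n):
        t = k * dt
        x, y = math.cos(t), math.sin(t)
        total += (P(x, y) * -math.sin(t) + Q(x, y) * math.cos(t)) * dt
    return total

print(closedness_defect(0.6, -1.3))  # approximately 0: omega is closed
print(loop_integral(), 2 * math.pi)  # 2*pi rather than 0, so omega is not exact
```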

References

1. M. Ram Murty, B. Fodden, Hilbert’s Tenth Problem: An Introduction to Logic, Number Theory and Computing, Student Mathematical Library, vol. 88 (American Mathematical Society, Providence, Rhode Island, 2019)
2. K. Plofker, Mathematics in India (Princeton University Press, Princeton, 2009)
3. C. Boyer, A History of Mathematics, 2nd edn., revised by U.C. Merzbach (Wiley, New York, 1991)
4. S.G. Krantz, H.R. Parks, The Implicit Function Theorem: History, Theory and Applications (Birkhäuser, Boston, 2002)
5. M. Spivak, Calculus on Manifolds (Benjamin, 1965)
6. V.J. Katz, The history of Stokes theorem. Math. Mag. 52, 146–156 (1979)
7. R. Bott, L. Tu, Differential Forms in Algebraic Topology (Springer, 1982)
8. I. Madsen, J. Tornehave, From Calculus to Cohomology: De Rham Cohomology and Characteristic Classes (Cambridge University Press, 1997)
9. V. Guillemin, A. Pollack, Differential Topology (Prentice Hall, Englewood Cliffs, 1974)
Chapter 2
Measure Theory

2.1 Topological Spaces and Measure Spaces

By the end of the nineteenth century, it was clear that Riemann integration (about
which one learns in a first course in calculus) had to be replaced by a more versatile
method of integration. For instance, the characteristic function of the irrational num-
bers is not Riemann integrable, and yet intuition suggests that over any finite interval
it should integrate to the length of that interval. It was Henri Lebesgue (1875–1941)
who came up with the appropriate theory in his doctoral thesis written in 1902. There
was an urgency in developing such a theory because there were other disciplines in
mathematics that were struggling to find an appropriate framework, most notably,
probability theory.
Metric spaces are examples of a larger universe of objects called topological spaces. A topological space is a pair $(X,\mathcal{O})$ consisting of a set $X$ and a collection $\mathcal{O}$ of subsets of $X$ (called open sets) such that
(a) $\emptyset\in\mathcal{O}$ and $X\in\mathcal{O}$;
(b) $U, V\in\mathcal{O}$ implies $U\cap V\in\mathcal{O}$;
(c) for any collection of open sets $U_a$, we have $\bigcup_a U_a\in\mathcal{O}$.
The complements of open sets are called closed sets. Given a topological space $(X,\mathcal{O})$, a base $\mathcal{U}$ for $\mathcal{O}$ is any collection of open sets such that every open set is a union of a subcollection of $\mathcal{U}$. A neighborhood of a point $x\in X$ is any set $N$ (not necessarily open) such that $x\in U\subset N$ for some open $U$. We say that a collection $\mathcal{N}$ of neighborhoods of $x$ is a neighborhood base at $x$ iff for every neighborhood $V$ of $x$, we have $x\in N\subset V$ for some $N\in\mathcal{N}$.
Every metric space gives rise to a topology. Indeed, if $(X,d)$ is a metric space, we define the open ball of radius $r$ centered at $x$ as

$$B_r(x) = \{y\in X : d(x,y) < r\}.$$

© Hindustan Book Agency 2022
M. R. Murty, A Second Course in Analysis, IMSc Lecture Notes in Mathematics,
https://doi.org/10.1007/978-981-16-7246-0_2

The closed ball is denoted $\overline{B}_r(x)$ and is defined as

$$\{y\in X : d(x,y)\le r\}.$$

The punctured ball is sometimes denoted $\dot{B}_r(x)$ and is defined as

$$\{y\in X : 0 < d(x,y) < r\}.$$

A subset $A$ of $X$ is then said to be open if for every $x\in A$, there is an $r > 0$ such that $B_r(x)\subseteq A$. The open sets of $(X,d)$ make $X$ into a topological space.
It is convenient to consider the extended real line $[-\infty,\infty]$, which is the set of real numbers together with two new elements $+\infty$ and $-\infty$. We make this into a topological space by declaring open, for any two real numbers $a < b$, the sets $(a,b)$, $[-\infty,a)$ and $(a,\infty]$, together with all unions of sets of this type.
As noted earlier, we can define the notion of a continuous function in the context
of general topological spaces. Given two topological spaces X and Y , a function
f : X → Y is said to be continuous if the inverse image f −1 (U ) is open in X for
every open set U of Y .
One advantage in the study of metric spaces is that we can “measure” the distance
between two points and this can be used as a starting point for an integration theory.
However, on close examination, we can identify certain properties of the metric and
abstract these properties so as to arrive at a more general notion of a measure space.
Let $X$ be an arbitrary set. A collection $\mathcal{A}$ of subsets of $X$ (that is, a subset $\mathcal{A}$ of the power set $2^X$) is called a σ-algebra if $X\in\mathcal{A}$ and $\mathcal{A}$ is closed under complements and countable unions. A map $\mu : \mathcal{A}\to[0,\infty]$ is called a positive measure (or simply a measure) if for any countable collection $\{A_i\}$ of pairwise disjoint sets in $\mathcal{A}$, we have

$$\mu\Big(\bigcup_i A_i\Big) = \sum_i \mu(A_i).$$

That is, $\mu$ is countably additive. We then speak of the triple $(X,\mathcal{A},\mu)$ (or simply $(X,\mu)$) as a measure space. One can also consider, more generally, real measures and complex measures, where $\mu$ takes values in $\mathbb{R}$ or $\mathbb{C}$, respectively. But we will only be dealing with positive measures here.
If $(X,\mu)$ is a measure space, members of $\mathcal{A}$ are called measurable sets. If $Y$ is a topological space and $f : X\to Y$, then $f$ is called measurable if $f^{-1}(U)$ is measurable for every open set $U$ of $Y$. In Lebesgue’s original theory, $Y$ was the set $[0,\infty]$ of non-negative extended real numbers. If $f : X\to\mathbb{R}$ is measurable, we say $f$ is a real measurable function. If $f : X\to\mathbb{C}$ is measurable, we say $f$ is a complex measurable function. Such a function can be decomposed as $f = u + iv$ with $u$ and $v$ real measurable functions. One can also show that $|f|$ is a real measurable function in this case.
Theorem 2.1 Let $X$ be a measure space. If $f_n : X\to[-\infty,\infty]$ is a sequence of measurable functions, and

$$g = \sup_{n\ge1} f_n, \qquad h = \limsup_{n\to\infty} f_n,$$

then $g$ and $h$ are measurable.

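Proof For any $\alpha\in\mathbb{R}$,

$$g^{-1}((\alpha,\infty]) = \bigcup_{n=1}^\infty f_n^{-1}((\alpha,\infty]).$$

Since the sets $(\alpha,\infty]$, $\alpha\in\mathbb{R}$, generate the Borel sets of $[-\infty,\infty]$, it follows that $g$ is measurable. Applying the same argument to $-f_n$ shows that the result also holds with sup replaced by inf. Thus, as

$$h = \inf_{k\ge1}\sup_{i\ge k} f_i,$$

we see that $h$ is also measurable. ∎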

Exercises

1. Let $X$ be a measure space and $E$ a measurable set of $X$. Show that the characteristic function $\chi_E$ of $E$, defined by $\chi_E(x) = 1$ if $x \in E$ and zero otherwise, is a measurable function.
2. If $f$ and $g$ are measurable functions with range in $[-\infty, \infty]$, show that the functions $\max\{f, g\}$ and $\min\{f, g\}$ are measurable functions.


3. Let $X$ be a set. Define $\mu(E)$ for any subset $E$ of $X$ as follows. If $E$ is infinite, $\mu(E) = \infty$. If $E$ is finite, $\mu(E)$ is the cardinality of $E$. Show that $\mu$ defines a measure on the σ-algebra $2^X$ of all subsets of $X$. (This is often called the counting measure.)
4. Let $(X, \mu)$ be a measure space. If
$$A_1 \subset A_2 \subset A_3 \subset \cdots$$
are measurable sets and
$$A = \bigcup_{n=1}^{\infty} A_n,$$
show that $\mu(A_n) \to \mu(A)$ as $n \to \infty$.


5. Let $(X, \mu)$ be a measure space. If
$$A_1 \supset A_2 \supset \cdots$$
106 2 Measure Theory

are measurable sets with $\mu(A_1)$ finite, show that if
$$A = \bigcap_{n=1}^{\infty} A_n,$$
then $\mu(A_n) \to \mu(A)$ as $n \to \infty$.


6. Let $\mu$ be the counting measure on the set of natural numbers. Let
$$A_n = \{n, n + 1, n + 2, \ldots\}.$$
If
$$A = \bigcap_{n=1}^{\infty} A_n,$$
then $A$ is empty and thus has measure zero, whereas $\mu(A_n) = \infty$ for every $n$. Does this contradict the previous exercise?
7. If $f$ is a real-valued function on a measurable space $X$ such that
$$\{x : f(x) \ge r\}$$
is measurable for every rational $r$, show that $f$ is measurable.

2.2 The Lebesgue Integral

A function on a measure space $X$ is called a simple function if its range is finite. Clearly, such a function $s$ can be written as a finite linear combination of characteristic functions
$$s = \sum_{i=1}^{n} \alpha_i \chi_{A_i},$$
where $\alpha_1, \ldots, \alpha_n$ are the distinct values of $s$ and $A_i = \{x : s(x) = \alpha_i\}$. It is clear that $s$ is measurable if and only if each $A_i$ is measurable.
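For a non-negative simple measurable function $s$ and a measurable set $E \in \mathcal{A}$, we define
$$\int_E s \, d\mu := \sum_{i=1}^{n} \alpha_i \, \mu(A_i \cap E),$$
with the convention $0 \cdot \infty = 0$.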
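The definition can be checked by hand on a small example. The sketch below is a minimal illustration under assumptions of our own: $X$ is a finite set, $\mu$ is a hypothetical measure given by point masses, and a simple function is stored as a dictionary; the helper name `integral_simple` is ours, not the text's. It groups points into the level sets $A_i$ and forms $\sum_i \alpha_i \mu(A_i \cap E)$ exactly, using rational arithmetic.

```python
from collections import defaultdict
from fractions import Fraction

def integral_simple(s, mu, E):
    """Integral over E of a non-negative simple function s, computed as
    sum_i alpha_i * mu(A_i ∩ E) with A_i = {x : s(x) = alpha_i}.

    s  : dict mapping each point of X to the value s(x)
    mu : dict mapping each point of X to its mass (hypothetical atomic measure)
    E  : set of points, the measurable set we integrate over
    """
    level_sets = defaultdict(set)            # alpha_i -> A_i
    for x, value in s.items():
        level_sets[value].add(x)
    total = Fraction(0)
    for alpha, A in level_sets.items():
        if alpha == 0:
            continue                         # convention 0 * mu(...) = 0
        total += alpha * sum((mu[x] for x in A & E), Fraction(0))
    return total

# A four-point space with masses 1/2, 1/4, 1/8, 1/8 (illustrative data only).
mu = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 8), "d": Fraction(1, 8)}
s = {"a": 3, "b": 3, "c": 5, "d": 0}
E = {"a", "c", "d"}
print(integral_simple(s, mu, E))  # prints 17/8, i.e. 3*(1/2) + 5*(1/8)
```

Grouping by level sets mirrors the formula literally; for an atomic measure one could equally sum $s(x)\mu(\{x\})$ over $x \in E$.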

It is not hard to show that for any measurable function $f : X \to [0, \infty]$, there are simple measurable functions $s_1, s_2, \ldots$ such that
$$0 \le s_1 \le s_2 \le \cdots \le f$$

and
$$s_n(x) \to f(x) \quad \text{as } n \to \infty$$
for every $x \in X$.
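The text leaves the construction of the $s_n$ to the reader. One standard choice, sketched below as an assumption rather than as the author's construction, is the dyadic staircase $s_n = \min\{n, \lfloor 2^n f \rfloor / 2^n\}$, which takes finitely many values, increases with $n$, and satisfies $0 \le f - s_n < 2^{-n}$ wherever $f < n$. The sketch evaluates it at a single hypothetical value of $f$.

```python
import math

def staircase(value, n):
    """n-th dyadic approximation s_n at a point where f takes `value` in [0, inf].
    s_n = k / 2**n on {k/2**n <= f < (k+1)/2**n} when f < n, and s_n = n where f >= n."""
    if value >= n:                  # also handles value == math.inf
        return float(n)
    return math.floor(value * 2**n) / 2**n

v = 1.7 ** 2                        # the value f(x) at some hypothetical point x
approximations = [staircase(v, n) for n in range(1, 30)]
print(approximations[:4])           # [1.0, 2.0, 2.875, 2.875]
```

At points where $f = \infty$, the same formula gives $s_n = n \to \infty$, as required.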
This allows us to define the Lebesgue integral. If $f : X \to [0, \infty]$ is measurable and $E \in \mathcal{A}$, we define the Lebesgue integral of $f$ over $E$ as
$$\int_E f \, d\mu := \sup \int_E s \, d\mu,$$
where the supremum is taken over all simple measurable functions $s$ such that $0 \le s \le f$. If $f$ is simple, the two definitions of the Lebesgue integral agree.
We now come to the interesting part of the theory. The following theorems encapsulate the versatility of Lebesgue's theory in handling limit operations. Throughout, $(X, \mu)$ is a measure space.
Theorem 2.2 (Lebesgue's monotone convergence theorem) Let $\{f_n\}$ be a sequence of measurable functions and suppose that
(a) $0 \le f_1(x) \le f_2(x) \le \cdots \le \infty$ for every $x \in X$;
(b) $f_n(x) \to f(x)$ as $n \to \infty$ for every $x \in X$.
Then $f$ is measurable and
$$\int_X f_n \, d\mu \to \int_X f \, d\mu$$
as $n \to \infty$.
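Proof Since the sequence of numbers
$$\int_X f_n \, d\mu$$
is monotonically increasing, there is an $\alpha \in [0, \infty]$ such that
$$\int_X f_n \, d\mu \to \alpha$$
as $n \to \infty$. By Theorem 2.1, $f = \sup_n f_n$ is measurable, and as $f_n \le f$, we have
$$\int_X f_n \, d\mu \le \int_X f \, d\mu$$
for every $n$. Consequently,
$$\alpha \le \int_X f \, d\mu.$$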

Let $s$ be a simple measurable function such that $0 \le s \le f$ and let $c$ be a constant with $0 < c < 1$. Put

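$$E_n = \{x : f_n(x) \ge c\, s(x)\}, \qquad n \in \mathbb{N}.$$
Then every $E_n$ is measurable and, since the $f_n$ increase, we have
$$E_1 \subset E_2 \subset E_3 \subset \cdots, \qquad X = \bigcup_{n \ge 1} E_n.$$
To see this last equality, let $x \in X$. If $f(x) = 0$, then $x \in E_1$; if $f(x) > 0$, then $c\,s(x) < f(x)$ because $c < 1$, and so $x \in E_n$ for some $n$. Clearly,
$$\int_X f_n \, d\mu \ge \int_{E_n} f_n \, d\mu \ge c \int_{E_n} s \, d\mu, \qquad n \in \mathbb{N}.$$
Letting $n \to \infty$, and using that $E \mapsto \int_E s \, d\mu$ is a measure (Exercise 1 below) together with Exercise 4 of Sect. 2.1, so that $\int_{E_n} s \, d\mu \to \int_X s \, d\mu$, we deduce
$$\alpha \ge c \int_X s \, d\mu.$$
This holds for every $c < 1$, and so
$$\alpha \ge \int_X s \, d\mu$$
for every simple measurable $s$ satisfying $0 \le s \le f$. Thus, by the definition of the Lebesgue integral, we now have
$$\alpha \ge \int_X f \, d\mu.$$
Combined with the reverse inequality obtained above, this gives $\alpha = \int_X f \, d\mu$. □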

The following theorem highlights the simplicity of Lebesgue's theory.

Theorem 2.3 If $f_n : X \to [0, \infty]$ is measurable for $n = 1, 2, \ldots$ and
$$f(x) = \sum_{n=1}^{\infty} f_n(x) \qquad (x \in X),$$
then
$$\int_X f \, d\mu = \sum_{n=1}^{\infty} \int_X f_n \, d\mu.$$

Proof Put
$$g_n = f_1 + f_2 + \cdots + f_n.$$

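Then $g_n$ converges monotonically to $f$. Moreover, $\int_X g_n \, d\mu = \sum_{k=1}^{n} \int_X f_k \, d\mu$ by the additivity of the integral over finite sums (which follows by approximating each $f_k$ by simple functions and applying Theorem 2.2). Letting $n \to \infty$ and applying Theorem 2.2 once more gives the result. □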




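Theorem 2.4 (Fatou's lemma) If $f_n : X \to [0, \infty]$ is measurable for each positive integer $n$, then
$$\int_X \liminf_{n \to \infty} f_n \, d\mu \le \liminf_{n \to \infty} \int_X f_n \, d\mu.$$

Proof Put
$$g_n(x) = \inf_{i \ge n} f_i(x), \qquad n = 1, 2, \ldots; \; x \in X.$$
Clearly $g_n \le f_n$, so that
$$\int_X g_n \, d\mu \le \int_X f_n \, d\mu, \qquad n = 1, 2, 3, \ldots$$
Moreover, $0 \le g_1 \le g_2 \le \cdots$, each $g_n$ is measurable, and by the definition of $\liminf$,
$$g_n(x) \to \liminf_{k \to \infty} f_k(x).$$
The monotone convergence theorem now implies the result: taking $\liminf$ in the previous inequality gives $\int_X \liminf_k f_k \, d\mu = \lim_n \int_X g_n \, d\mu \le \liminf_n \int_X f_n \, d\mu$. □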
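A standard example (our own illustration, not from the text) shows that the inequality in Fatou's lemma can be strict. On $X = [0, 1]$ with Lebesgue measure $m$, take $f_n = n\,\chi_{(0, 1/n)}$:

```latex
\int_{[0,1]} f_n \, dm = n \cdot m\big((0, 1/n)\big) = 1 \quad \text{for every } n,
\qquad
\liminf_{n \to \infty} f_n(x) = 0 \quad \text{for every } x \in [0, 1],
```

so that $\int_{[0,1]} \liminf f_n \, dm = 0 < 1 = \liminf \int_{[0,1]} f_n \, dm$.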

For $(X, \mu)$ a positive measure space, we define $L^1(\mu)$ to be the set of all complex measurable functions $f$ on $X$ for which
$$\int_X |f| \, d\mu < \infty.$$
The reader will recall that $f$ measurable implies that $|f|$ is measurable, and so the above integral is well-defined. Elements of $L^1(\mu)$ are then called Lebesgue integrable functions. We now come to what may be termed the most important theorem in Lebesgue's theory.

Theorem 2.5 (Lebesgue's dominated convergence theorem) Suppose $\{f_n\}$ is a sequence of measurable functions on $X$ such that
$$f(x) = \lim_{n \to \infty} f_n(x)$$
exists for every $x \in X$. If there is a function $g \in L^1(\mu)$ such that
$$|f_n(x)| \le g(x) \qquad \text{for all } x \in X \text{ and } n = 1, 2, \ldots,$$
then $f \in L^1(\mu)$ and
$$\lim_{n \to \infty} \int_X f_n \, d\mu = \int_X f \, d\mu.$$

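Proof Since $|f| \le g$ and $f$ is measurable (by Theorem 2.1, applied to real and imaginary parts), we see $f \in L^1(\mu)$. Now
$$|f_n - f| \le 2g,$$
and so Fatou's lemma applied to the non-negative functions $2g - |f_n - f|$ gives
$$\int_X 2g \, d\mu \le \liminf_{n \to \infty} \int_X (2g - |f_n - f|) \, d\mu,$$
and we deduce
$$\int_X 2g \, d\mu \le \int_X 2g \, d\mu + \liminf_{n \to \infty}\Big(-\int_X |f_n - f| \, d\mu\Big) = \int_X 2g \, d\mu - \limsup_{n \to \infty} \int_X |f_n - f| \, d\mu.$$
Since
$$\int_X 2g \, d\mu$$
is finite, we can subtract it from both sides of the inequality and deduce
$$\limsup_{n \to \infty} \int_X |f_n - f| \, d\mu \le 0.$$
Hence $\int_X |f_n - f| \, d\mu \to 0$, and since $\left|\int_X f_n \, d\mu - \int_X f \, d\mu\right| \le \int_X |f_n - f| \, d\mu$, the theorem follows. □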
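As a quick illustration (ours, assuming Lebesgue measure $m$ on $[0, 1]$ and the agreement of the Lebesgue and Riemann integrals for continuous functions), the functions $f_n(x) = x^n$ converge pointwise to $\chi_{\{1\}}$ and are dominated by the integrable constant $g = 1$, so the theorem applies:

```latex
\lim_{n \to \infty} \int_{[0,1]} x^n \, dm
  = \lim_{n \to \infty} \frac{1}{n+1}
  = 0
  = \int_{[0,1]} \chi_{\{1\}} \, dm ,
```

the last integral vanishing because $m(\{1\}) = 0$.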

Sets of measure zero play a special role in Lebesgue's theory. If a certain property holds apart from a set of measure zero, we say the property holds almost everywhere and abbreviate it as a.e. More precisely, a property $P$ holds almost everywhere on a measurable set $E$ of the σ-algebra $\mathcal{A}$ if there exists $N \in \mathcal{A}$ such that $\mu(N) = 0$ and $P$ holds on $E \setminus N$. In fact, we can define an equivalence relation on measurable functions and say two functions $f$ and $g$ are equivalent if $f = g$ a.e. In such a case, if $f$ and $g$ are non-negative or belong to $L^1(\mu)$, we have for every measurable set $E$ of $X$ that
$$\int_E f \, d\mu = \int_E g \, d\mu.$$

For $0 < p < \infty$ and a measure space $X$, we define $L^p(\mu)$ to be the set of measurable functions $f$ on $X$ for which
$$\|f\|_p := \left(\int_X |f|^p \, d\mu\right)^{1/p} < \infty.$$
We call $\|f\|_p$ the $L^p$-norm of $f$. For $1 \le p < \infty$, these spaces are examples of Banach spaces (for $0 < p < 1$, $\|\cdot\|_p$ fails the triangle inequality in general), and the student should review the basic properties of these spaces as adumbrated in Sect. 2.6. If $X = \mathbb{R}^k$ with Lebesgue measure, we denote this space by $L^p(\mathbb{R}^k)$. In particular, these are examples of complete metric spaces. Technically, $L^p(\mu)$ is not a space of functions, but rather a space of

equivalence classes of functions, where two functions $f$ and $g$ are equivalent if they are equal a.e. However, in common parlance, we drop this subtle distinction and continue to refer to elements of $L^p(\mu)$ as functions, with this tacit understanding in the background.
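When $X$ is a locally compact Hausdorff space and $\mu$ is a suitably regular Borel measure on $X$ (for instance, Lebesgue measure on $\mathbb{R}^k$), the space of continuous functions with compact support on $X$, denoted $C_c(X)$, is dense in $L^p(\mu)$ for $1 \le p < \infty$. This fact is often useful in proving many fundamental theorems concerning $L^p(\mu)$.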
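We also introduce the space $L^\infty(X)$ consisting of (real or complex-valued) measurable functions $f$ for which
$$\|f\|_\infty := \operatorname*{ess\,sup}_{x \in X} |f(x)| = \inf\{M \ge 0 : |f(x)| \le M \text{ for almost every } x \in X\} < \infty.$$
Elements of this space are called essentially bounded functions.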

Exercises

1. Let $s$ be a non-negative simple measurable function on a measure space $X$ and, for each measurable set $E$, define
$$\varphi(E) := \int_E s \, d\mu.$$
Show that $\varphi$ defines a measure on $X$.


2. If $a_{ij} \ge 0$ for $i, j \ge 1$ is a double sequence of non-negative real numbers, show that
$$\sum_{i=1}^{\infty} \sum_{j=1}^{\infty} a_{ij} = \sum_{j=1}^{\infty} \sum_{i=1}^{\infty} a_{ij}.$$

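3. Suppose $f : X \to [0, \infty]$ is measurable and, for each measurable set $E$, define
$$\varphi(E) := \int_E f \, d\mu. \tag{2.1}$$
Show that $\varphi$ is a measure on $X$ and that
$$\int_X g \, d\varphi = \int_X g f \, d\mu$$
for every measurable $g$ on $X$ with range in $[0, \infty]$. (A converse of this result is the Radon–Nikodym theorem: if $\mu$ is σ-finite and $\varphi$ is a finite positive measure on $X$ that is absolutely continuous with respect to $\mu$, meaning that $\mu(E) = 0$ implies $\varphi(E) = 0$, then $\varphi$ is of the form (2.1) for some $f \in L^1(\mu)$. One refers to $f$ as the Radon–Nikodym derivative of $\varphi$.)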
4. Let $f : X \to [0, \infty]$ be measurable and suppose that for some measurable set $E$, we have

$$\int_E f \, d\mu = 0.$$
Show that $f = 0$ a.e. on $E$. [Hint: for each $n$, show that the set
$$\{x \in E : f(x) > 1/n\}$$
has measure zero.]


5. If $f \in L^1(\mu)$ and $\int_E f \, d\mu = 0$ for every measurable set $E$ of $X$, show that $f = 0$ a.e. on $X$.

2.3 Inner Product Spaces

A vector space $H$ over $\mathbb{C}$ is called an inner product space if to every pair of elements $x, y \in H$ we associate a complex number $(x, y)$ (called the inner product of $x$ and $y$) satisfying the following:
(a) $(x, y) = \overline{(y, x)}$ (the bar denotes complex conjugation);
(b) $(x + y, z) = (x, z) + (y, z)$ for all $x, y, z \in H$;
(c) $(\alpha x, y) = \alpha (x, y)$ for all $x, y \in H$ and $\alpha \in \mathbb{C}$;
(d) $(x, x) \ge 0$ for all $x \in H$, with equality if and only if $x = 0$.
Property (c) implies that $(0, y) = 0$ for all $y \in H$. Properties (b) and (c) may be combined to say that the map $x \mapsto (x, y)$ is a linear functional on $H$. Properties (a) and (c) show that $(x, \alpha y) = \bar{\alpha}(x, y)$. Finally, (a) and (b) imply that $(z, x + y) = (z, x) + (z, y)$, and (d) allows us to define the norm of $x$, denoted $\|x\|$, to be the non-negative square root of $(x, x)$.

Theorem 2.6 (The parallelogram law) In any inner product space $H$, we have
$$\|x + y\|^2 + \|x - y\|^2 = 2(\|x\|^2 + \|y\|^2).$$

Proof We have

(x + y, x + y) = (x, x) + (x, y) + (y, x) + (y, y)

and
(x − y, x − y) = (x, x) − (x, y) − (y, x) + (y, y).

The result is obtained by adding these two equations. □

The above theorem has the following geometric interpretation in Euclidean geometry: in any parallelogram, the sum of the squares of the diagonals is equal to the sum of the squares of its sides.

Theorem 2.7 (Cauchy–Schwarz inequality) In any inner product space, we have
$$|(x, y)| \le \|x\| \, \|y\|.$$
Proof If $y = 0$, there is nothing to prove, so assume $y \ne 0$. Without any loss of generality, we may suppose that $(x, y)$ is real, since we can multiply $x$ by $e^{i\theta}$ for some real $\theta$ without altering $\|x\|$ or $|(x, y)|$. Thus, for $r$ real,
$$0 \le (x - ry, x - ry) = \|x\|^2 - 2r(x, y) + r^2\|y\|^2.$$
In particular, this quadratic polynomial in $r$ is non-negative for every real $r$, so its discriminant $4(x, y)^2 - 4\|x\|^2\|y\|^2$ must be non-positive. Thus,
$$|(x, y)|^2 - (x, x)(y, y) \le 0,$$
as desired. □

The Cauchy–Schwarz inequality is one of the most ubiquitous inequalities in mathematics. If we consider $\mathbb{R}^n$ with the usual inner product, the inequality
$$\Big|\sum_{j=1}^{n} a_j b_j\Big|^2 \le \Big(\sum_{j=1}^{n} |a_j|^2\Big)\Big(\sum_{j=1}^{n} |b_j|^2\Big)$$
was discovered by Cauchy in 1821. The corresponding inequality for integrals, namely
$$\left|\int_a^b f(x) g(x) \, dx\right|^2 \le \int_a^b |f(x)|^2 \, dx \int_a^b |g(x)|^2 \, dx,$$
seems to have been first stated by Buniakowsky in 1859 and later independently by Schwarz in 1885. The interested student may find an exposition of the influence of this inequality in mathematics, statistics and physics in [1].

Theorem 2.8 (The triangle inequality) For any $x, y \in H$, we have
$$\|x + y\| \le \|x\| + \|y\|.$$
Proof By the Cauchy–Schwarz inequality, we have
$$\|x + y\|^2 = (x + y, x + y) = (x, x) + (x, y) + (y, x) + (y, y) = \|x\|^2 + 2\operatorname{Re}(x, y) + \|y\|^2 \le \|x\|^2 + 2\|x\|\,\|y\| + \|y\|^2,$$
from which the result follows immediately. □

Applying this to $x - y$ and $y - z$ gives
$$\|x - z\| \le \|x - y\| + \|y - z\|,$$

for any $x, y, z \in H$. This result allows us to define a metric on $H$ by defining the distance between $x$ and $y$ to be $\|x - y\|$. Then all the axioms for a metric space are satisfied in view of the triangle inequality. If this metric space is complete, that is, every Cauchy sequence converges, then we call $H$ a Hilbert space.
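The parallelogram law, the Cauchy–Schwarz inequality and the triangle inequality can be spot-checked numerically in $\mathbb{C}^n$ (example (a) below). The sketch assumes the convention of this section, that $(x, y)$ is linear in $x$ and conjugate-linear in $y$; the helper names are ours, and random vectors are only a sanity check, not a proof.

```python
import random

def inner(x, y):
    """(x, y) = sum_i x_i * conj(y_i): linear in x, conjugate-linear in y."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def norm(x):
    """||x|| = sqrt((x, x)); (x, x) is real and non-negative."""
    return inner(x, x).real ** 0.5

rng = random.Random(0)

def random_vector(n):
    return [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]

x, y = random_vector(5), random_vector(5)
x_plus_y = [a + b for a, b in zip(x, y)]
x_minus_y = [a - b for a, b in zip(x, y)]

# Parallelogram law: the gap should vanish up to rounding error.
parallelogram_gap = norm(x_plus_y) ** 2 + norm(x_minus_y) ** 2 - 2 * (norm(x) ** 2 + norm(y) ** 2)
print(abs(parallelogram_gap) < 1e-12)              # True
print(abs(inner(x, y)) <= norm(x) * norm(y))        # True (Cauchy–Schwarz)
print(norm(x_plus_y) <= norm(x) + norm(y))          # True (triangle inequality)
```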
We now consider some examples of Hilbert spaces.
(a) For any fixed $n$, let $H = \mathbb{C}^n$ and define, for $x = (x_1, \ldots, x_n)$ and $y = (y_1, \ldots, y_n)$,
$$(x, y) = \sum_{i=1}^{n} x_i \overline{y_i}.$$
Since $\mathbb{C}$ is complete, we see that $H$ is a Hilbert space.


(b) If $\mu$ is any positive measure, then the space of square integrable functions $L^2(\mu)$ becomes an inner product space if we set
$$(f, g) = \int_X f \bar{g} \, d\mu.$$
How do we know the integral is well-defined? This follows from first applying the Cauchy–Schwarz inequality to simple functions and then taking limits. (Alternatively, the pointwise inequality $|f\bar{g}| \le \frac{1}{2}(|f|^2 + |g|^2)$ shows directly that $f\bar{g} \in L^1(\mu)$.) Since $L^p(\mu)$ is a complete metric space for every $1 \le p \le \infty$, we see that $L^2(\mu)$ is a Hilbert space.
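(c) The vector space of all continuous complex functions on $[0, 1]$ is an inner product space if
$$(f, g) = \int_0^1 f(t) \overline{g(t)} \, dt,$$
but it is not a Hilbert space. To show this, we need to produce a sequence of continuous functions which is Cauchy in the norm $\|f\| = (f, f)^{1/2}$ but does not converge, in this norm, to any continuous function. The functions $x^n$, whose pointwise limit is not continuous, do not suffice, since $\|x^n\|^2 = 1/(2n + 1) \to 0$, so that $x^n \to 0$ in this norm. Instead, for $n \ge 2$ let $f_n$ vanish on $[0, 1/2]$, equal $1$ on $[1/2 + 1/n, 1]$, and be linear in between. Then $\|f_n - \chi_{[1/2, 1]}\|^2 \le 1/n$, so $(f_n)$ is Cauchy, while a continuous limit would have to agree with $\chi_{[1/2, 1]}$ almost everywhere, which is impossible (see Exercise 4).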

Theorem 2.9 For any fixed $y \in H$, the maps
$$x \mapsto (x, y), \qquad x \mapsto (y, x), \qquad x \mapsto \|x\|$$
are continuous functions on $H$.

Proof The Cauchy–Schwarz inequality implies
$$|(x_1, y) - (x_2, y)| = |(x_1 - x_2, y)| \le \|x_1 - x_2\| \, \|y\|,$$
which shows that the map $x \mapsto (x, y)$ is uniformly continuous. The same argument works for $x \mapsto (y, x)$. Finally, the triangle inequality applied to $x_1 = (x_1 - x_2) + x_2$ shows that
$$\|x_1\| - \|x_2\| \le \|x_1 - x_2\|.$$



Interchanging $x_1$ and $x_2$ shows that
$$\big|\, \|x_1\| - \|x_2\| \,\big| \le \|x_1 - x_2\|,$$
again showing that $x \mapsto \|x\|$ is uniformly continuous. □


A subset $M$ of a vector space $V$ is called a subspace if $M$ is itself a vector space with respect to the addition and scalar multiplication defined on $V$. Clearly, $M$ is a subspace if and only if for all $x, y \in M$ and $\alpha \in \mathbb{C}$, we have $x + y \in M$ and $\alpha x \in M$. A closed subspace of $H$ is one that is closed relative to the topology of $H$. Let us observe that if $M$ is a subspace, then so is its closure.
A set $E$ in a vector space $V$ is said to be convex if for every $x, y \in E$ and $0 < t < 1$, we have
$$(1 - t)x + ty \in E.$$
It is evident that every subspace of $V$ is convex. Moreover, if $E$ is convex, then so are its translates.
If $(x, y) = 0$, we say $x$ and $y$ are orthogonal and sometimes write $x \perp y$. Let $x^\perp$ denote the set of all vectors orthogonal to $x$. If $M$ is a subspace, we denote by $M^\perp$ the set of all vectors orthogonal to every vector in $M$. It should be clear that $x^\perp$ is a subspace of $H$. Moreover, it is a closed subspace, since it is the zero set of the continuous map $y \mapsto (y, x)$ (Theorem 2.9). For any subspace $M$, it is clear that
$$M^\perp = \bigcap_{x \in M} x^\perp.$$
Since $M^\perp$ is an intersection of closed subspaces, it is also closed.

Notice that if $x$ and $y$ are orthogonal, then the proof of the parallelogram law shows that
$$\|x + y\|^2 = \|x\|^2 + \|y\|^2,$$
which can be viewed as a generalization of the theorem of Pythagoras.


Theorem 2.10 Every non-empty, closed, convex set $E$ in a Hilbert space $H$ contains a unique element of smallest norm. That is, there is a unique $x_0 \in E$ such that
$$\|x\| \ge \|x_0\|$$
for all $x \in E$.
Proof Let $\delta$ be the infimum of $\|x\|$ for $x \in E$. For any $x, y \in E$, we apply the parallelogram law to $x/2$ and $y/2$ to get
$$\frac{1}{4}\|x - y\|^2 = \frac{1}{2}\|x\|^2 + \frac{1}{2}\|y\|^2 - \Big\|\frac{x + y}{2}\Big\|^2.$$
Since $E$ is convex, $(x + y)/2 \in E$, so that $\|(x + y)/2\| \ge \delta$. Hence,

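$$\|x - y\|^2 \le 2\|x\|^2 + 2\|y\|^2 - 4\delta^2. \tag{2.2}$$
From this we see that if $x_0$ exists, then it must be unique, for $\|x\| = \|y\| = \delta$ implies $\|x - y\|^2 = 0$. The definition of $\delta$ implies that there is a sequence of elements $y_n \in E$ such that $\|y_n\| \to \delta$. Replacing $x, y$ in (2.2) by $y_n, y_m$, we see that $\|y_n - y_m\|^2 \le 2\|y_n\|^2 + 2\|y_m\|^2 - 4\delta^2 \to 0$ as $n, m \to \infty$, so the sequence $(y_n)$ is Cauchy. Since $H$ is complete, there is an $x_0 \in H$ such that $y_n \to x_0$. As $E$ is closed, we must have $x_0 \in E$, and by the continuity of the norm (Theorem 2.9), $\|x_0\| = \lim_n \|y_n\| = \delta$. □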

Theorem 2.11 Let $M$ be a closed subspace of a Hilbert space $H$. Then every $x \in H$ has a unique decomposition as $y + z$ with $y \in M$ and $z \in M^\perp$. The points $y$ and $z$ are the nearest points to $x$ in $M$ and $M^\perp$, respectively. The maps $x \mapsto y$ and $x \mapsto z$ are linear, and we have
$$\|x\|^2 = \|y\|^2 + \|z\|^2.$$

Proof Uniqueness is clear, for if $y + z = y' + z'$, we have $y - y' = z' - z$, and since $M \cap M^\perp = \{0\}$, we see that $y = y'$ and $z = z'$. To prove existence, consider the set
$$x + M = \{x - y : y \in M\}.$$
This set is non-empty, closed and convex. Define $z$ to be the element of smallest norm in $x + M$, which exists by the previous theorem, and put $y = x - z$, so that $y \in M$. We need to show that $z \in M^\perp$. Let $w \in M$ and consider $(w, z)$. Without loss of generality, we may suppose that $\|w\| = 1$. Since $z - \alpha w \in x + M$, the minimizing property of $z$ means that
$$\|z\|^2 \le \|z - \alpha w\|^2 = (z - \alpha w, z - \alpha w)$$
for any $\alpha \in \mathbb{C}$. Thus,
$$0 \le -\alpha (w, z) - \bar{\alpha}(z, w) + |\alpha|^2$$
for every $\alpha \in \mathbb{C}$. Choosing $\alpha = (z, w)$ gives $0 \le -|(z, w)|^2$, from which we deduce that $(z, w) = 0$, that is, $z \in M^\perp$. Thus, $x = y + z$ with $y \in M$ and $z \in M^\perp$. To see that $y$ and $z$ are the nearest points to $x$ in $M$ and $M^\perp$, respectively, we have by the theorem of Pythagoras, for $u \in M$,
$$\|x - u\|^2 = \|z + (y - u)\|^2 = \|z\|^2 + \|y - u\|^2,$$
which is clearly minimized for $u = y$. A similar argument shows that $z$ is the nearest point to $x$ in $M^\perp$. The linearity of the maps follows from the uniqueness of the decomposition, and the last assertion follows from Pythagoras. □
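In finite dimensions the decomposition can be computed directly. The sketch below assumes (our choice, for illustration) that $M$ is given by an orthonormal list $u_1, \ldots, u_m$ in $\mathbb{C}^n$, in which case $y = \sum_k (x, u_k) u_k$ (compare Exercise 3 of Sect. 2.4) and $z = x - y$. It checks that $z \perp M$, that $\|x\|^2 = \|y\|^2 + \|z\|^2$, and that moving away from $y$ inside $M$ increases the distance to $x$.

```python
import math

def inner(x, y):
    """(x, y) = sum_i x_i * conj(y_i)."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def project(x, orthonormal):
    """y = sum_k (x, u_k) u_k, the component of x in M = span(orthonormal)."""
    y = [0j] * len(x)
    for u in orthonormal:
        c = inner(x, u)
        y = [yi + c * ui for yi, ui in zip(y, u)]
    return y

def norm_sq(v):
    return inner(v, v).real

r = 1 / math.sqrt(2)
M_basis = [[1, 0, 0], [0, r, r]]          # an orthonormal basis of a plane M in C^3
x = [3, 1 + 2j, -1]
y = project(x, M_basis)                    # component in M
z = [xi - yi for xi, yi in zip(x, y)]      # component in M-perp
print(max(abs(inner(z, u)) for u in M_basis))   # ~0: z is orthogonal to M
```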

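Corollary 2.1 If $M$ is a closed subspace with $M \ne H$, there exists $z \in H$, $z \ne 0$, which is orthogonal to $M$.

Proof Take $x \in H$ with $x \notin M$, and take $z$ as in the theorem. Then $z \ne 0$, since otherwise $x = y \in M$. □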

We have already seen that the map x → (x, y) is a linear functional for all y ∈ H .
It is a very important result of Riesz that all continuous linear functionals on H are
of this type.

Theorem 2.12 (Riesz representation theorem) If $L$ is a continuous linear functional on $H$, then there is a unique $y \in H$ such that $L(x) = (x, y)$ for all $x \in H$.

Proof If $L(x) = 0$ for all $x \in H$, we can take $y = 0$. Otherwise, define $M = \{x : L(x) = 0\}$. Linearity of $L$ implies this is a subspace. Continuity of $L$ implies that it is closed. Thus, by Corollary 2.1, $M^\perp$ is not the zero subspace, and there is a $z \in M^\perp$ with $\|z\| = 1$. For $x \in H$, put
$$u = L(x)z - L(z)x.$$
Note that $L(u) = 0$, so that $u \in M$. Therefore, $(u, z) = 0$ and
$$L(x) = L(x)(z, z) = (L(x)z, z) = (u + L(z)x, z) = L(z)(x, z) = \big(x, \overline{L(z)}\, z\big).$$
In other words, $L(x) = (x, y)$ with $y = \overline{L(z)}\, z$. The uniqueness is obvious, since $(x, y) = (x, y')$ for all $x$ means that $y - y'$ is orthogonal to every $x \in H$. In particular, it is orthogonal to itself, which means that $\|y - y'\| = 0$, from which we deduce that $y = y'$. □
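The proof is constructive enough to run in $\mathbb{C}^3$. In the sketch below the functional $L(x) = \sum_i a_i x_i$ and its coefficients are hypothetical; we use the fact, specific to this finite-dimensional example, that $M^\perp$ is spanned by $\bar a = (\bar a_1, \ldots, \bar a_n)$ to pick the unit vector $z$, and then form $y = \overline{L(z)}\, z$ as in the proof. The result is $y = \bar a$, which shows why the conjugate is needed.

```python
import random

def inner(x, y):
    """(x, y) = sum_i x_i * conj(y_i)."""
    return sum(p * q.conjugate() for p, q in zip(x, y))

a = [2 - 1j, 0.5j, 3]            # hypothetical coefficients defining L

def L(x):
    """Linear functional L(x) = sum_i a_i x_i on C^3 (no conjugates)."""
    return sum(ai * xi for ai, xi in zip(a, x))

# The kernel M = {x : L(x) = 0} equals conj(a)-perp, so M-perp is spanned by conj(a).
norm_a = inner(a, a).real ** 0.5
z = [ai.conjugate() / norm_a for ai in a]          # unit vector in M-perp
y = [L(z).conjugate() * zi for zi in z]            # y = conj(L(z)) z, as in the proof

rng = random.Random(3)
x = [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(3)]
print(abs(L(x) - inner(x, y)) < 1e-12)             # True: L is represented by y
```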

Exercises

1. For real $a, b$ and $x > 0$, $y > 0$, show that
$$\frac{(a + b)^2}{x + y} \le \frac{a^2}{x} + \frac{b^2}{y}.$$
Replacing $b$ by $b + c$ and $y$ by $y + z$, deduce that
$$\frac{(a + b + c)^2}{x + y + z} \le \frac{a^2}{x} + \frac{b^2}{y} + \frac{c^2}{z}.$$
Iterating, deduce that for $x_i > 0$,
$$\frac{(a_1 + a_2 + \cdots + a_n)^2}{x_1 + x_2 + \cdots + x_n} \le \frac{a_1^2}{x_1} + \frac{a_2^2}{x_2} + \cdots + \frac{a_n^2}{x_n}.$$
Replacing $a_i$ with $y_i z_i$ and $x_i$ with $z_i^2$, deduce the Cauchy–Schwarz inequality over the reals.
2. Prove that
$$\Big(\sum_{k=1}^{n} a_k^2\Big)\Big(\sum_{k=1}^{n} b_k^2\Big) - \Big(\sum_{k=1}^{n} a_k b_k\Big)^2 = \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} (a_i b_j - a_j b_i)^2.$$

3. For any set of strictly positive real numbers $a_1, \ldots, a_n$, show that
$$(a_1 + a_2 + \cdots + a_n)\Big(\frac{1}{a_1} + \frac{1}{a_2} + \cdots + \frac{1}{a_n}\Big) \ge n^2.$$

4. Show that the space $C[0, 1]$ of real-valued continuous functions on $[0, 1]$ with the usual inner product
$$(f, g) = \int_0^1 f(t) g(t) \, dt$$
is not a Hilbert space.


5. If $M$ is a closed subspace of a Hilbert space $H$, prove that $(M^\perp)^\perp = M$.

2.4 Orthonormal Sets

It may be instructive to begin with an example. Consider the space $C[0, 1]$ and the function $f(x) = x - 1/2$. The functions $\varphi_n(x) = e^{2\pi i n x}$ ($n \in \mathbb{Z}$) are orthonormal with respect to the usual inner product. We can try to expand $f$ in terms of these functions:
$$c_n = \int_0^1 f(x) e^{-2\pi i n x} \, dx = -\frac{1}{2\pi i n} \qquad (n \ne 0),$$
by an easy calculation, while $c_0 = 0$. If we apply Parseval's formula, we get
$$\sum_{n \ne 0} \frac{1}{4\pi^2 n^2} = \int_0^1 (x - 1/2)^2 \, dx = \frac{1}{12}.$$
From this, we deduce
$$\sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6}.$$
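Both computations above are easy to confirm numerically before asking about rigour. The sketch approximates $c_n$ by a midpoint rule (an arbitrary choice; the sample count is ours) and compares it with $-1/(2\pi i n)$, then sums $|c_n|^2$ over $0 < |n| \le N$ using the exact formula; the partial sums approach $1/12$ as Parseval's formula predicts.

```python
import cmath
import math

def fourier_coefficient(f, n, samples=10_000):
    """Midpoint-rule approximation of c_n = ∫_0^1 f(x) e^{-2πinx} dx."""
    h = 1.0 / samples
    total = 0j
    for j in range(samples):
        x = (j + 0.5) * h
        total += f(x) * cmath.exp(-2j * math.pi * n * x)
    return total * h

f = lambda x: x - 0.5
numeric = {n: fourier_coefficient(f, n) for n in range(-3, 4)}
exact = {n: (0j if n == 0 else -1 / (2j * math.pi * n)) for n in range(-3, 4)}

# Parseval: sum of |c_n|^2 = 1/(4π²n²) over 0 < |n| <= N, using the exact c_n.
N = 100_000
parseval_partial = sum(2 / (4 * math.pi ** 2 * n ** 2) for n in range(1, N + 1))
print(max(abs(numeric[n] - exact[n]) for n in numeric))   # small quadrature error
print(parseval_partial, 1 / 12)                           # close to each other
```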

To what extent can this calculation be made rigorous?


Let us consider the general setting. A set of vectors $\{u_\alpha : \alpha \in A\}$ is orthonormal if $(u_\alpha, u_\beta) = 0$ for $\alpha \ne \beta$ and $(u_\alpha, u_\alpha) = 1$. We define the Fourier transform
$$\hat{x}(\alpha) := (x, u_\alpha),$$
and we refer to these numbers as the Fourier coefficients of $x$ relative to the set $\{u_\alpha\}$. We expect, in some sense,
$$x = \sum_{\alpha \in A} \hat{x}(\alpha) u_\alpha.$$

Given $x$ in a Hilbert space $H$ and $F$ any finite subset of $A$, we can consider the sum
$$s_F(x) = \sum_{\alpha \in F} (x, u_\alpha) u_\alpha.$$
If $s$ is in the span of $\{u_\alpha : \alpha \in F\}$, then we claim that
$$\|x - s_F\| \le \|x - s\|,$$
which is to say that $s_F$ is the vector nearest to $x$ in the space spanned by $\{u_\alpha : \alpha \in F\}$. Indeed, it is easy to see that $s_F$ is in this span and, by direct computation, that $x - s_F$ is orthogonal to every $u_\alpha$ with $\alpha \in F$. Thus, $x - s_F$ is orthogonal to $s_F - s$. Now $x - s = (x - s_F) + (s_F - s)$, and by Pythagoras, we get
$$\|x - s\|^2 = \|x - s_F\|^2 + \|s_F - s\|^2,$$
from which we see the stated claim immediately. Setting $s = 0$ gives us
$$\|s_F\|^2 \le \|x\|^2$$
for all finite $F$. We record this discussion in the following.

Theorem 2.13 Suppose that $\{u_\alpha : \alpha \in A\}$ is an orthonormal set in a Hilbert space $H$ and that $F$ is a finite subset of $A$. Let $M_F$ be the span of $\{u_\alpha : \alpha \in F\}$. If $\varphi$ is a complex-valued function on $A$ which is zero outside $F$, then there is a vector $y \in M_F$, namely
$$y = \sum_{\alpha \in F} \varphi(\alpha) u_\alpha,$$
which has $\hat{y}(\alpha) = \varphi(\alpha)$ for every $\alpha \in A$ and
$$\|y\|^2 = \sum_{\alpha \in F} |\varphi(\alpha)|^2.$$
If $x \in H$ and
$$s_F(x) = \sum_{\alpha \in F} \hat{x}(\alpha) u_\alpha,$$
then
$$\|x - s_F(x)\| < \|x - s\|$$
for every $s \in M_F$ except $s = s_F(x)$, and
$$\sum_{\alpha \in F} |\hat{x}(\alpha)|^2 \le \|x\|^2.$$

Proof Only the first part needs proving, and this is clear from the orthogonality relations; the remaining assertions were established in the discussion above. The last inequality is called Bessel's inequality. □
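We want to extend this discussion to infinite $F$. We want to give meaning to
$$\sum_{\alpha \in A} \varphi(\alpha),$$
where the $\varphi(\alpha)$ are non-negative numbers. To this end, we simply define the sum to be the supremum of the set of all sums over finite subsets of $A$. We consider the space $\ell^2(A)$, the set of functions $\varphi$ with domain $A$ satisfying
$$\sum_{\alpha \in A} |\varphi(\alpha)|^2 < \infty.$$
This is a Hilbert space with inner product
$$(\varphi, \psi) = \sum_{\alpha \in A} \varphi(\alpha) \overline{\psi(\alpha)}.$$
If $\varphi \in \ell^2(A)$, then it is easy to see that $\varphi(\alpha) \ne 0$ for at most countably many elements of $A$. Indeed, the set $A_n$ of $\alpha$ for which $|\varphi(\alpha)| > 1/n$ satisfies
$$\#A_n \le \sum_{\alpha \in A_n} |n\varphi(\alpha)|^2 \le n^2 \sum_{\alpha \in A} |\varphi(\alpha)|^2 < \infty,$$
so that $A_n$ is finite for every $n$, and the set where $\varphi \ne 0$ is the countable union $\bigcup_n A_n$.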


Recall that an isometry between two metric spaces $X$ and $Y$ is a one-to-one and onto map $f : X \to Y$ that preserves distances. That is, if $d_1$ and $d_2$ are the distance functions on $X$ and $Y$, respectively, then
$$d_2(f(x), f(x')) = d_1(x, x') \qquad \forall\, x, x' \in X.$$
We then say that the two spaces $X$ and $Y$ are isometric.


Lemma 2.1 Suppose that
(a) $X$ and $Y$ are metric spaces, and $X$ is complete;
(b) $f : X \to Y$ is continuous;
(c) $X$ has a dense subspace $X_0$ on which $f$ is an isometry; and
(d) $f(X_0)$ is dense in $Y$.
Then $f$ is an isometry of $X$ onto $Y$.
The important part of the conclusion is that $f$ maps $X$ onto all of $Y$.
Proof The fact that $f$ preserves distances on all of $X$ follows from the continuity of $f$ and the density of $X_0$ in $X$. To show surjectivity, let $y \in Y$ and let $x_n$ be a sequence in

$X_0$ satisfying $f(x_n) \to y$, which exists since $f(X_0)$ is dense in $Y$. Thus, $f(x_n)$ is a Cauchy sequence. Since $f$ is an isometry on $X_0$, it follows that $x_n$ is a Cauchy sequence in $X$. Since $X$ is complete, this sequence converges to some $x \in X$, and we have $f(x) = y$ by continuity. □

Theorem 2.14 Let $\{u_\alpha : \alpha \in A\}$ be an orthonormal set in a Hilbert space $H$. Let $P$ be the space of all finite linear combinations of the vectors $u_\alpha$. Then
$$\sum_{\alpha \in A} |\hat{x}(\alpha)|^2 \le \|x\|^2 \tag{2.3}$$
holds for every $x \in H$. The map $x \mapsto \hat{x}$ is a continuous linear map from $H$ onto $\ell^2(A)$ whose restriction to the closure $\overline{P}$ of $P$ is an isometry of $\overline{P}$ onto $\ell^2(A)$.

Proof The inequality is immediate from Theorem 2.13, taking the supremum over finite $F \subset A$. Define $f$ on $H$ by $f(x) = \hat{x}$. Then the inequality shows that $f$ maps $H$ into $\ell^2(A)$. The linearity of this map is clear. Applying (2.3) to $x - y$ gives
$$\|f(y) - f(x)\| = \|\hat{y} - \hat{x}\| \le \|y - x\|,$$
from which we see the continuity of $f$. Theorem 2.13 shows that $f$ is an isometry of $P$ onto the dense subspace of $\ell^2(A)$ consisting of those functions whose support is a finite set $F$ of $A$. The theorem now follows from the previous lemma applied to $X = \overline{P}$, $X_0 = P$, $Y = \ell^2(A)$. Note that $\overline{P}$, being a closed subspace of a complete metric space, is itself complete. □

The fact that the map $x \mapsto \hat{x}$ carries $H$ onto $\ell^2(A)$ is known as the Riesz–Fischer theorem.
We now prove the important theorem concerning orthonormal bases.

Theorem 2.15 Let $\{u_\alpha : \alpha \in A\}$ be an orthonormal set in $H$. The following are equivalent.
(a) $\{u_\alpha\}$ is a maximal orthonormal set;
(b) The set $P$ of all finite linear combinations of elements of $\{u_\alpha\}$ is dense in $H$;
(c) The equality
$$\sum_{\alpha \in A} |\hat{x}(\alpha)|^2 = \|x\|^2$$
holds for every $x \in H$;
(d) Parseval's identity
$$\sum_{\alpha \in A} \hat{x}(\alpha) \overline{\hat{y}(\alpha)} = (x, y)$$
holds for all $x, y \in H$.



Proof We will show that (a) ⇒ (b) ⇒ (c) ⇒ (d) ⇒ (a). If $P$ is not dense, then its closure is not all of $H$, and we can write
$$H = \overline{P} \oplus \overline{P}^{\perp}$$
with $\overline{P}^{\perp} \ne \{0\}$ (Theorem 2.11); a unit vector in this complement can be added to our orthonormal set, contrary to its maximality. That (b) implies (c) follows from Theorem 2.14, since then $\overline{P} = H$ and $x \mapsto \hat{x}$ is an isometry of $H$ onto $\ell^2(A)$. To see that (c) implies (d), we use the “polarization identity”
$$4(x, y) = \|x + y\|^2 - \|x - y\|^2 + i\|x + iy\|^2 - i\|x - iy\|^2,$$
which is easily verified (see the exercises below) to be valid in any Hilbert space. We apply this with $x, y$ replaced by $\hat{x}, \hat{y}$, since $\ell^2(A)$ is also a Hilbert space. The sum in (c) is $\|\hat{x}\|^2$ and the sum in (d) is $(\hat{x}, \hat{y})$; since $\hat{x} \pm \hat{y} = \widehat{x \pm y}$ and $\hat{x} \pm i\hat{y} = \widehat{x \pm iy}$, applying (c) to the four vectors $x \pm y$, $x \pm iy$ gives $(\hat{x}, \hat{y}) = (x, y)$. Finally, to see that (d) implies (a), suppose that (a) is false, so that our set is not a maximal orthonormal set. Then we can find a vector $u \ne 0$ which is orthogonal to every $u_\alpha$, and which (after normalization) could be added to this set. If $x = y = u$, we have $(x, y) = \|u\|^2 > 0$. On the other hand, all the Fourier coefficients of $u$ are zero, so that (d) does not hold for this vector, a contradiction. This completes the proof. □
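Both the polarization identity and statement (d) can be sanity-checked in $\mathbb{C}^n$. The sketch below is only an illustration: it uses our own helper names, the convention that $(x, y)$ is linear in $x$, and, as a maximal orthonormal set, the normalized discrete Fourier basis $u_k(j) = e^{2\pi i jk/n}/\sqrt{n}$ (our choice), for which the Fourier coefficients $\hat{x}(k) = (x, u_k)$ should satisfy Parseval's identity.

```python
import cmath
import math
import random

def inner(x, y):
    """(x, y) = sum_j x_j * conj(y_j), linear in the first argument."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def norm_sq(x):
    return inner(x, x).real

def polarization_rhs(x, y):
    """(||x+y||^2 - ||x-y||^2 + i||x+iy||^2 - i||x-iy||^2) / 4."""
    def comb(c):                       # the vector x + c*y
        return [a + c * b for a, b in zip(x, y)]
    return (norm_sq(comb(1)) - norm_sq(comb(-1))
            + 1j * norm_sq(comb(1j)) - 1j * norm_sq(comb(-1j))) / 4

n = 8
rng = random.Random(1)
x = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
y = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]

# Normalized discrete Fourier basis: n orthonormal vectors, hence maximal in C^n.
basis = [[cmath.exp(2j * math.pi * j * k / n) / math.sqrt(n) for j in range(n)]
         for k in range(n)]
x_hat = [inner(x, u) for u in basis]
y_hat = [inner(y, u) for u in basis]
parseval_lhs = sum(a * b.conjugate() for a, b in zip(x_hat, y_hat))

print(abs(polarization_rhs(x, y) - inner(x, y)) < 1e-10)   # True
print(abs(parseval_lhs - inner(x, y)) < 1e-10)             # True
```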

Exercises

1. Let $\{x_n : n = 1, 2, \ldots\}$ be a linearly independent set of vectors in a Hilbert space $H$. Show that the following construction yields an orthonormal set $\{u_n\}$ such that $\{x_1, \ldots, x_N\}$ and $\{u_1, \ldots, u_N\}$ have the same span for all $N$. Inductively define $u_1 = x_1/\|x_1\|$ and $u_n = v_n/\|v_n\|$, where
$$v_n = x_n - \sum_{j=1}^{n-1} (x_n, u_j) u_j$$
for $n \ge 2$. This is the extension of the Gram–Schmidt process that the student has undoubtedly seen in a first course in linear algebra.
2. If $M = \{x : L(x) = 0\}$, where $L$ is a continuous linear functional on $H$ which is not identically zero, prove that $M^\perp$ is a vector space of dimension 1.
3. Let $H$ be a Hilbert space and $\{u_1, \ldots, u_N\}$ an orthonormal system of vectors. Let $M$ be the span of these orthonormal vectors. Given $x \in H$, show that the vector
$$y = \sum_{i=1}^{N} (x, u_i) u_i \in M$$
is the unique vector in $M$ minimizing $\|x - y\|$ with $y \in M$.



4. In the space $L^2[-1, 1]$, let $M$ be the subspace spanned by the functions $1, x, x^2$. Find an orthonormal basis for $M$.
5. Compute
$$\min_{a, b, c} \int_{-1}^{1} |x^3 - a - bx - cx^2|^2 \, dx.$$

6. Fix a positive integer $N$, and put $\omega = e^{2\pi i/N}$. Prove the orthogonality relations
$$\frac{1}{N}\sum_{n=1}^{N} \omega^{nk} = 1$$
if $k \equiv 0 \pmod{N}$, and $0$ otherwise. Use this to derive the identities
$$(x, y) = \frac{1}{N}\sum_{n=1}^{N} \|x + \omega^n y\|^2 \omega^n,$$
which hold in every inner product space for $N \ge 3$. Show also that
$$(x, y) = \frac{1}{2\pi}\int_{-\pi}^{\pi} \|x + e^{i\theta} y\|^2 e^{i\theta} \, d\theta.$$
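Exercise 1 translates almost line by line into code. The following is a minimal sketch in $\mathbb{C}^n$ with our own function names; it follows the exercise in using the coefficient $(x_n, u_j)$ computed from the original vector (the classical process), whereas numerical libraries usually prefer the modified variant, which uses the partially reduced vector for better stability in floating point.

```python
def inner(x, y):
    """(x, y) = sum_i x_i * conj(y_i) on C^n."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def gram_schmidt(vectors, tol=1e-12):
    """Orthonormalize a linearly independent list, as in Exercise 1:
    v_n = x_n - sum_{j<n} (x_n, u_j) u_j  and  u_n = v_n / ||v_n||."""
    us = []
    for x in vectors:
        v = list(x)
        for u in us:
            c = inner(x, u)                     # classical variant: uses x_n itself
            v = [vi - c * ui for vi, ui in zip(v, u)]
        length = inner(v, v).real ** 0.5
        if length < tol:
            raise ValueError("the vectors are (numerically) linearly dependent")
        us.append([vi / length for vi in v])
    return us

vectors = [[1, 1, 0], [1, 0, 1], [0, 1, 1j]]
us = gram_schmidt(vectors)
gram = [[inner(a, b) for b in us] for a in us]   # should be close to the identity
print([[round(abs(g), 12) for g in row] for row in gram])
```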

2.5 Trigonometric Series

Let $\mathbb{T}$ be the unit circle. We would like to consider the Hilbert space $L^2(\mathbb{T})$ and show that functions in this space can be approximated by trigonometric polynomials. A trigonometric polynomial is a finite sum of the form
$$a_0 + \sum_{n=1}^{N} (a_n \cos nt + b_n \sin nt).$$
Because of the formulas
$$\cos t = \frac{e^{it} + e^{-it}}{2} \quad \text{and} \quad \sin t = \frac{e^{it} - e^{-it}}{2i},$$
we can rewrite these sums as
$$\sum_{n=-N}^{N} c_n e^{int}.$$
This is the form we will consider below. Observe that the set of functions

$$u_n(t) = e^{int}, \qquad n \in \mathbb{Z},$$
forms an orthonormal family with respect to the usual inner product on $L^2(\mathbb{T})$ given by
$$(f, g) = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(t) \overline{g(t)} \, dt.$$
By Theorem 2.15, to show that this is a maximal orthonormal family, it suffices to show that the trigonometric polynomials are dense in $L^2(\mathbb{T})$. Since $C(\mathbb{T})$ is dense in $L^2(\mathbb{T})$, it suffices to show that every continuous function can be approximated by a trigonometric polynomial in the $L^2$ norm. In fact, we will prove the following stronger theorem, which gives uniform approximation.

Theorem 2.16 Fix $\varepsilon > 0$ and $f \in C(\mathbb{T})$. There exists a trigonometric polynomial $P$ such that
$$|f(t) - P(t)| \le \varepsilon$$
uniformly for all $t \in \mathbb{T}$.

We need the following lemma.

Lemma 2.2 There exist trigonometric polynomials $Q_1, Q_2, Q_3, \ldots$ with the following properties:
(a) $Q_k(t) \ge 0$ for all $t \in \mathbb{R}$;
(b) $\frac{1}{2\pi}\int_{-\pi}^{\pi} Q_k(t) \, dt = 1$;
(c) if $\eta_k(\delta) = \sup\{Q_k(t) : \delta \le |t| \le \pi\}$, then
$$\lim_{k \to \infty} \eta_k(\delta) = 0$$
for every $\delta > 0$.

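Proof Put
$$Q_k(t) = c_k \left(\frac{1 + \cos t}{2}\right)^k,$$
where $c_k$ is chosen so that (b) holds. This is the family of trigonometric polynomials we will use. Clearly (a) also holds. We need to show (c). To this end, we first note that since $Q_k(t)$ is even and $\sin t \le 1$ on $[0, \pi]$,
$$1 = \frac{c_k}{\pi}\int_0^{\pi} \left(\frac{1 + \cos t}{2}\right)^k dt > \frac{c_k}{\pi}\int_0^{\pi} \left(\frac{1 + \cos t}{2}\right)^k \sin t \, dt = \frac{2c_k}{\pi(k + 1)},$$
where the last integral is evaluated by the substitution $u = (1 + \cos t)/2$.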

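Since $Q_k$ is decreasing in $[0, \pi]$, we have
$$Q_k(t) \le Q_k(\delta) \le \frac{\pi(k + 1)}{2}\left(\frac{1 + \cos \delta}{2}\right)^k$$
for $0 < \delta \le |t| \le \pi$. Since $1 + \cos \delta < 2$ and $(k + 1)r^k \to 0$ for $0 \le r < 1$, the result is now immediate. □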
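The constants $c_k$ can be made explicit, which gives a quick numerical check of the lemma. Writing $(1 + \cos t)/2 = \cos^2(t/2)$ and using Wallis' integral (a computation of ours, not in the text) gives $\frac{1}{2\pi}\int_{-\pi}^{\pi} \left(\frac{1 + \cos t}{2}\right)^k dt = \binom{2k}{k} 4^{-k}$, so that $c_k = 4^k / \binom{2k}{k}$. The sketch checks the normalization (b), the bound $c_k < \pi(k + 1)/2$ from the proof, and the decay of $\eta_k(\delta)$.

```python
import math

def c(k):
    """c_k = 4^k / C(2k, k), the constant making (1/2π) ∫ Q_k = 1."""
    return 4 ** k / math.comb(2 * k, k)

def Q(k, t):
    """Q_k(t) = c_k ((1 + cos t)/2)^k."""
    return c(k) * ((1 + math.cos(t)) / 2) ** k

def mean_over_circle(g, samples=20_000):
    """Midpoint approximation of (1/2π) ∫_{-π}^{π} g(t) dt; exact up to rounding
    for trigonometric polynomials of degree below `samples`."""
    h = 2 * math.pi / samples
    return sum(g(-math.pi + (j + 0.5) * h) for j in range(samples)) / samples

def eta(k, delta):
    """sup{Q_k(t) : delta <= |t| <= π}, attained at |t| = delta since Q_k decreases on [0, π]."""
    return Q(k, delta)

print(mean_over_circle(lambda t: Q(5, t)))          # ≈ 1, property (b)
print([eta(k, 0.5) for k in (1, 10, 100, 1000)])    # eventually tends to 0
```

Note that $\eta_k(\delta)$ need not decrease for small $k$ (at $\delta = 0.5$ it grows from $k = 1$ to $k = 10$); the lemma only concerns the limit.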

We can now prove Theorem 2.16. To each $f \in C(\mathbb{T})$, let
$$P_k(t) = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(t - s) Q_k(s) \, ds.$$
Replacing $s$ by $t - s$ and using the periodicity of $f$, we see that
$$P_k(t) = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(s) Q_k(t - s) \, ds.$$
Since each $Q_k$ is a trigonometric polynomial, expanding $Q_k(t - s)$ as a finite sum of terms $e^{in(t - s)}$ shows that each $P_k$ is also a trigonometric polynomial. Now let $\varepsilon > 0$. Since $f$ is continuous, it is uniformly continuous on the compact set $\mathbb{T}$. That is, there is a $\delta > 0$ such that
$$|f(t) - f(s)| < \varepsilon \quad \text{whenever } |t - s| < \delta.$$
By (b) of Lemma 2.2,
$$P_k(t) - f(t) = \frac{1}{2\pi}\int_{-\pi}^{\pi} \{f(t - s) - f(t)\} Q_k(s) \, ds.$$
Thus, by the positivity of $Q_k$, we have
$$|P_k(t) - f(t)| \le \frac{1}{2\pi}\int_{-\pi}^{\pi} |f(t - s) - f(t)| Q_k(s) \, ds = A_1 + A_2 \quad \text{(say)},$$
where $A_1$ is the integral over the interval $[-\delta, \delta]$ and $A_2$ is over the complementary set. In $A_2$, we use the estimate on $\eta_k(\delta)$ to see that
$$A_2 \le 2\|f\|_\infty \eta_k(\delta) < \varepsilon$$
for sufficiently large $k$. For $A_1$, we have by the uniform continuity of $f$ that $A_1 \le \varepsilon$, using (b) of Lemma 2.2. Hence $|P_k(t) - f(t)| < 2\varepsilon$ for all $t$ once $k$ is large enough; since $\varepsilon$ was arbitrary, this completes the proof. □
In Sect. 3.10, we will use the Fejér kernel to give a more constructive proof of the previous theorem.
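The convolution $P_k$ can also be evaluated numerically to watch the uniform error shrink. The sketch below is illustrative only: it uses the closed form $c_k = 4^k/\binom{2k}{k}$ (which follows from Wallis' integral, a computation of ours), a midpoint rule with an arbitrary number of nodes, and the test function $f(t) = |\sin t|$ (our choice), whose corners at $0$ and $\pi$ make the convergence slow enough to see.

```python
import math

def make_Q(k):
    """Return Q_k(t) = c_k ((1 + cos t)/2)^k with c_k = 4^k / C(2k, k)."""
    c_k = 4 ** k / math.comb(2 * k, k)
    return lambda t: c_k * ((1 + math.cos(t)) / 2) ** k

def P(f, k, t, samples=2000):
    """Midpoint approximation of P_k(t) = (1/2π) ∫_{-π}^{π} f(t - s) Q_k(s) ds."""
    Q = make_Q(k)
    h = 2 * math.pi / samples
    nodes = (-math.pi + (j + 0.5) * h for j in range(samples))
    return sum(f(t - s) * Q(s) for s in nodes) / samples

def sup_error(f, k, points=60):
    """max |P_k(t) - f(t)| over `points` equally spaced t in [-π, π)."""
    ts = [-math.pi + 2 * math.pi * i / points for i in range(points)]
    return max(abs(P(f, k, t) - f(t)) for t in ts)

f = lambda t: abs(math.sin(t))     # continuous and 2π-periodic, with corners at 0 and π
errors = {k: sup_error(f, k) for k in (4, 32, 256)}
print(errors)                       # decreases as k grows
```

Near the corners the error decays only roughly like $k^{-1/2}$, because $Q_k$ is concentrated on an interval of width of order $k^{-1/2}$.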
As a consequence of the theorem proved in the previous section, we have the Parseval theorem in the special case of $L^2(\mathbb{T})$. For $f \in L^2(\mathbb{T})$, we define the Fourier coefficients by
$$\hat{f}(n) = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(t) e^{-int} \, dt.$$

For $f, g \in L^2(\mathbb{T})$, we have
$$(f, g) = \sum_{n=-\infty}^{\infty} \hat{f}(n) \overline{\hat{g}(n)},$$
and specializing to $f = g$ gives
$$\|f\|^2 = \sum_{n=-\infty}^{\infty} |\hat{f}(n)|^2.$$
We can ask if the converse is true. That is, suppose that
$$\sum_{n=-\infty}^{\infty} |c_n|^2 < \infty.$$
Does there exist an $f \in L^2(\mathbb{T})$ whose sequence of Fourier coefficients coincides with the $c_n$? The answer is provided by the Riesz–Fischer theorem. Recall from Theorem 2.14 that the map $x \mapsto \hat{x}$ carries $H$ onto $\ell^2(A)$. Since this map is onto, there is an $f \in L^2(\mathbb{T})$ such that $\hat{f}(n) = c_n$ for all $n$.
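Let us look at the following example. Consider the function $f(t) = t$ for $-\pi \le t < \pi$, extended by periodicity to the real line. Its Fourier coefficients are easily computed: for $n \ne 0$,
$$\int_{-\pi}^{\pi} x e^{-inx} \, dx = \frac{2\pi i(-1)^n}{n},$$
so that $\hat{f}(n) = i(-1)^n/n$, while for $n = 0$ the Fourier coefficient is zero. Since $\|f\|^2 = \frac{1}{2\pi}\int_{-\pi}^{\pi} t^2 \, dt = \pi^2/3$, Parseval's formula gives $2\sum_{n=1}^{\infty} 1/n^2 = \pi^2/3$, and we immediately deduce that
$$\sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6}.$$
This gives another proof of this celebrated result of Euler.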
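For the reader who wants the details, the coefficient computation is a single integration by parts, and the Parseval step uses $\|f\|^2 = \frac{1}{2\pi}\int_{-\pi}^{\pi} t^2 \, dt$:

```latex
\begin{align*}
\int_{-\pi}^{\pi} x e^{-inx}\,dx
  &= \left[\frac{x e^{-inx}}{-in}\right]_{-\pi}^{\pi} + \frac{1}{in}\int_{-\pi}^{\pi} e^{-inx}\,dx
   = \frac{2\pi(-1)^n}{-in} + 0
   = \frac{2\pi i(-1)^n}{n} \qquad (n \neq 0), \\
\frac{1}{2\pi}\int_{-\pi}^{\pi} t^2\,dt = \frac{\pi^2}{3}
  &= \sum_{n \neq 0} |\hat f(n)|^2 = \sum_{n \neq 0} \frac{1}{n^2} = 2\sum_{n=1}^{\infty} \frac{1}{n^2}.
\end{align*}
```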

Exercises

1. Suppose $f$ is a continuous function on $\mathbb{R}$ with period 1. Prove that
$$\lim_{N \to \infty} \frac{1}{N}\sum_{n=1}^{N} f(n\alpha) = \int_0^1 f(t) \, dt$$

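for every irrational real number $\alpha$. [Hint: do it first for $f(t) = e^{2\pi i k t}$, with $k = 0, \pm 1, \pm 2, \ldots$]
2. Is the series
$$\sum_{n=1}^{\infty} \frac{\cos nx}{\sqrt{n}}, \qquad x \in \mathbb{T},$$
the Fourier series of some function in $L^2(\mathbb{T})$?
3. Is the series
$$\sum_{n=1}^{\infty} \frac{\sin nx}{n^{3/2}}, \qquad x \in \mathbb{T},$$
the Fourier series of some function in $L^2(\mathbb{T})$?
4. If $f \in L^2(\mathbb{T})$, show that
$$\hat{f}(n) \to 0$$
as $|n|$ tends to infinity.
5. Find the Fourier series of
$$f(x) = x^2$$
for $x \in \mathbb{T}$, and use it to show that
$$\sum_{n=1}^{\infty} \frac{1}{n^4} = \frac{\pi^4}{90}.$$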
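Exercise 1 (Weyl's equidistribution theorem) is easy to explore numerically. The sketch below uses our own choices: $f(t) = \cos^2(2\pi t)$, whose integral over $[0, 1]$ is $1/2$, and $\alpha = \sqrt{2}$ (in floating point, so only approximately irrational). For comparison, the rational value $\alpha = 1/2$ gives averages equal to $1$, which shows why irrationality is needed.

```python
import math

def weyl_average(f, alpha, N):
    """(1/N) * sum_{n=1}^{N} f(n * alpha) for a 1-periodic function f."""
    return sum(f(n * alpha) for n in range(1, N + 1)) / N

f = lambda t: math.cos(2 * math.pi * t) ** 2     # period 1; its integral over [0, 1] is 1/2
irrational = weyl_average(f, math.sqrt(2), 10_000)
rational = weyl_average(f, 0.5, 10_000)          # f(n/2) = cos^2(πn) = 1 for every n
print(irrational, rational)                       # close to 0.5, and close to 1
```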

2.6 Banach Spaces

A complex vector space $X$ is said to be a normed linear space if there is a map $\|\cdot\| : X \to \mathbb{R}$ such that
(a) $\|x\| \ge 0$, with equality if and only if $x = 0$;
(b) $\|\lambda x\| = |\lambda| \, \|x\|$ for every scalar $\lambda$;
(c) $\|x + y\| \le \|x\| + \|y\|$ for all $x, y \in X$ (triangle inequality).
It is easily seen that we may define a metric on $X$ by defining the distance between $x$ and $y$ as $\|x - y\|$. A Banach space is a normed linear space which is complete with respect to this metric. Every Hilbert space is a Banach space. However, the class of Banach spaces is wider than the class of Hilbert spaces, and we proceed to develop this theory. In particular, let us note that $C(\mathbb{T})$ is a Banach space relative to the sup norm.
Here are some examples. Let $X$ be a measure space with measure $\mu$ and let $1 \le p < \infty$. Consider the space $L^p(\mu)$ of all complex-valued measurable functions $f$ on $X$ with $\int_X |f|^p\,d\mu < \infty$, with norm defined by

$$\|f\|_p := \left(\int_X |f(x)|^p\,d\mu\right)^{1/p}.$$

We have to check that this satisfies the axioms for a norm. The first two are clear. The third follows from Minkowski's inequality. For $p = 1$, this is clear. So let us assume $1 < p < \infty$ and proceed to show that Minkowski's inequality follows from the Cauchy–Schwarz inequality.
To prove this, we begin with an elementary remark. Suppose that $\phi(t)$ is a twice differentiable function on $[0, 1]$ such that $\phi''(t) \ge 0$. Then $\phi'(t)$ is increasing, and $\phi(t)$ is concave up (convex) on this interval. Thus,

$$\frac{\phi(0) + \phi(1)}{2} \ge \phi(1/2).$$
We will make use of this observation below. For any set $a_1, \ldots, a_n, b_1, \ldots, b_n$ of non-negative real numbers, we have

$$\left(\sum_{j=1}^{n}(a_j + b_j)^p\right)^{1/p} \le \left(\sum_{j=1}^{n} a_j^p\right)^{1/p} + \left(\sum_{j=1}^{n} b_j^p\right)^{1/p},$$

from which the inequality for $L^p$-spaces follows by looking at finite sums and taking limits. To prove the inequality, let $A_j = ta_j + (1 - t)b_j$ and set

$$\phi(t) = \left(\sum_{j=1}^{n} A_j^p\right)^{1/p}.$$

The first derivative is

$$\phi'(t) = \left(\sum_{j=1}^{n} A_j^p\right)^{1/p-1}\sum_{j=1}^{n} A_j^{p-1}(a_j - b_j).$$

The second derivative is easily seen to be

$$\phi''(t) = (p-1)\left(\sum_{j=1}^{n} A_j^p\right)^{1/p-2}\left\{\left(\sum_{j=1}^{n} A_j^p\right)\left(\sum_{j=1}^{n} A_j^{p-2}(a_j - b_j)^2\right) - \left(\sum_{j=1}^{n} A_j^{p-1}(a_j - b_j)\right)^2\right\}.$$

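The quantity in the braces is non-negative by an application of the Cauchy–Schwarz inequality to the vectors $(A_j^{p/2})$ and $(A_j^{p/2-1}(a_j - b_j))$ (we may assume all $a_j, b_j > 0$, the general case following by continuity). Since $p \ge 1$, we see that $\phi''(t) \ge 0$. By our initial observation, we deduce that

$$\frac{\phi(0) + \phi(1)}{2} \ge \phi(1/2).$$

Since $\phi(0) = (\sum_j b_j^p)^{1/p}$, $\phi(1) = (\sum_j a_j^p)^{1/p}$ and $\phi(1/2) = \frac{1}{2}(\sum_j (a_j + b_j)^p)^{1/p}$, this is Minkowski's inequality:

$$\left(\sum_j (a_j + b_j)^p\right)^{1/p} \le \left(\sum_j a_j^p\right)^{1/p} + \left(\sum_j b_j^p\right)^{1/p},$$

for non-negative $a_j, b_j$.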
By the usual limiting process for integrals, we deduce

$$\left(\int_X |f + g|^p\,d\mu\right)^{1/p} \le \left(\int_X (|f| + |g|)^p\,d\mu\right)^{1/p} \le \|f\|_p + \|g\|_p,$$

showing that $L^p(\mu)$ is a normed linear space. Technically speaking, as stated earlier, $L^p(\mu)$ does not consist of functions but rather of equivalence classes of functions equal almost everywhere. One can show that this space is complete (Riesz–Fischer theorem). Thus, $L^p(\mu)$ is a Banach space.
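The finite-sum inequality, and the midpoint convexity of $\phi$ used to prove it, can be spot-checked numerically. The following Python sketch (random test data; the helper names are ours) asserts both for several exponents $p \ge 1$.

```python
import random


def p_norm(v, p):
    """(sum_j |v_j|^p)^(1/p) for a finite list v and 1 <= p < infinity."""
    return sum(abs(x) ** p for x in v) ** (1.0 / p)


def phi(t, a, b, p):
    """phi(t) = (sum_j (t a_j + (1 - t) b_j)^p)^(1/p), the function used in the proof."""
    return p_norm([t * x + (1 - t) * y for x, y in zip(a, b)], p)


random.seed(0)
for p in (1.0, 1.5, 2.0, 3.0, 7.5):
    for _ in range(200):
        n = random.randint(1, 8)
        a = [random.random() for _ in range(n)]   # non-negative a_j
        b = [random.random() for _ in range(n)]   # non-negative b_j
        # Minkowski's inequality for finite sums.
        assert p_norm([x + y for x, y in zip(a, b)], p) <= p_norm(a, p) + p_norm(b, p) + 1e-12
        # Midpoint convexity: phi(0) = ||b||_p, phi(1) = ||a||_p, phi(1/2) = ||a + b||_p / 2.
        assert phi(0.5, a, b, p) <= (phi(0.0, a, b, p) + phi(1.0, a, b, p)) / 2 + 1e-12
print("Minkowski checks passed")
```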
In this context, it is useful to derive another fundamental inequality called Hölder's inequality. We begin with an elementary observation. Let $p, q$ be positive real numbers such that $\frac{1}{p} + \frac{1}{q} = 1$. Then, for $a, b \ge 0$,

$$ab \le \frac{a^p}{p} + \frac{b^q}{q}. \tag{2.4}$$

To see this, let us first note that the inequality is trivially true if $a$ or $b$ equals zero. So, let us assume $a, b > 0$. Replacing $a$ and $b$ by $x^{1/p}$ and $y^{1/q}$, respectively, we need to prove that

$$x^{1/p}y^{1/q} \le \frac{x}{p} + \frac{y}{q}.$$

Taking logarithms, this is equivalent to

$$\frac{1}{p}\log x + \frac{1}{q}\log y \le \log\left(\frac{x}{p} + \frac{y}{q}\right).$$

But this is now immediate from the concavity of the logarithmic function, since the weights $1/p$ and $1/q$ are positive and sum to 1. We can now prove:

Theorem 2.17 (Hölder's inequality) Let $a_1, a_2, \ldots, a_n, b_1, b_2, \ldots, b_n$ be complex numbers and let $1 < p < \infty$, $1 < q < \infty$ satisfy $1/p + 1/q = 1$. Then,

$$\left|\sum_{k=1}^{n} a_k b_k\right| \le \left(\sum_{k=1}^{n}|a_k|^p\right)^{1/p}\left(\sum_{k=1}^{n}|b_k|^q\right)^{1/q}.$$

The case $p = q = 2$ is the Cauchy–Schwarz inequality.

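Proof We may assume that not all $a_k$ and not all $b_k$ are zero. In (2.4), putting

$$a = \frac{|a_k|}{\left(\sum_{k=1}^{n}|a_k|^p\right)^{1/p}}, \qquad b = \frac{|b_k|}{\left(\sum_{k=1}^{n}|b_k|^q\right)^{1/q}}$$

and summing over $k$ shows that $\sum_k |a_kb_k|$, divided by the right-hand side of the theorem, is at most $\frac{1}{p} + \frac{1}{q} = 1$. Since $|\sum_k a_kb_k| \le \sum_k |a_kb_k|$, this gives the desired result. ∎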
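A similar spot-check for (2.4) and Theorem 2.17 is sketched below in Python; the exponents, random data and tolerance are arbitrary choices of ours, and `holder_sides` is a hypothetical helper.

```python
import random


def holder_sides(a, b, p):
    """Return (|sum a_k b_k|, ||a||_p ||b||_q), where q = p/(p - 1) is the conjugate exponent."""
    q = p / (p - 1)                      # 1/p + 1/q = 1
    lhs = abs(sum(x * y for x, y in zip(a, b)))
    norm_a = sum(abs(x) ** p for x in a) ** (1 / p)
    norm_b = sum(abs(y) ** q for y in b) ** (1 / q)
    return lhs, norm_a * norm_b


random.seed(1)
for p in (1.25, 2.0, 3.0, 10.0):
    q = p / (p - 1)
    for _ in range(500):
        # Young's inequality (2.4): ab <= a^p/p + b^q/q for a, b >= 0.
        s, t = random.uniform(0, 3), random.uniform(0, 3)
        assert s * t <= s ** p / p + t ** q / q + 1e-9
        # Hoelder's inequality for complex vectors of random length.
        n = random.randint(1, 6)
        a = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)]
        b = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)]
        left, right = holder_sides(a, b, p)
        assert left <= right + 1e-9
print("Young and Hoelder checks passed")
```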

By the usual limit process, Hölder's inequality translates to an integral version:

$$\left|\int_X f(x)g(x)\,d\mu\right| \le \left(\int_X |f(x)|^p\,d\mu\right)^{1/p}\left(\int_X |g(x)|^q\,d\mu\right)^{1/q},$$

provided all of the integrals exist. An important consequence of this is the following.

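Theorem 2.18 Let $(X, \mu)$ be a measure space and let $1 \le p \le \infty$ and $q$ be such that

$$\frac{1}{p} + \frac{1}{q} = 1.$$

If $f \in L^p(\mu)$ and $g \in L^q(\mu)$, then $fg \in L^1(\mu)$ and

$$\|fg\|_1 \le \|f\|_p\|g\|_q.$$

Proof For $1 < p < \infty$, this is an immediate consequence of Hölder's inequality applied to $|f|$ and $|g|$. For $p = \infty$, we have

$$|f(x)g(x)| \le \|f\|_\infty|g(x)|$$

almost everywhere. Integrating this over $X$ gives the desired inequality in this case also. The case $p = 1$, $q = \infty$ follows in the same way with the roles of $f$ and $g$ exchanged. ∎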

An important class of examples is obtained by considering linear transformations from a normed linear space $X$ to a normed linear space $Y$. Let $T : X \to Y$ be linear. We say $T$ is a bounded linear transformation if there is an $M < \infty$ such that

$$\|T(x)\| \le M\|x\|$$

for all $x \in X$. (Note that the norm on the left-hand side of the inequality is in $Y$ and the norm on the right-hand side is in $X$. We will continue this convention, since it will be clear from the context which norm is being used.) It is easy to see that the set of all bounded linear transformations is a vector space. We define a norm on this space by setting

$$\|T\| = \sup\{\|T(x)\| : x \in X,\ \|x\| \le 1\}.$$

It turns out that the space of all bounded linear transformations from a normed vector space $X$ to a Banach space $Y$ is itself a Banach space.
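To make the definition of $\|T\|$ concrete, here is a finite-dimensional Python sketch. It assumes $T$ is given by a real $m \times n$ matrix acting from $(\mathbb{R}^n, \|\cdot\|_\infty)$ to $(\mathbb{R}^m, \|\cdot\|_\infty)$; in that setting a standard computation, not carried out in the text, gives $\|T\| = \max_i \sum_j |t_{ij}|$. The sketch compares this formula with a brute-force evaluation of the supremum over the vertices of the closed unit ball.

```python
from itertools import product


def matvec(matrix, x):
    """Apply the matrix (a list of rows) to the vector x."""
    return [sum(t * xj for t, xj in zip(row, x)) for row in matrix]


def sup_norm(v):
    return max(abs(c) for c in v)


def op_norm_formula(matrix):
    """Largest absolute row sum: ||T|| for T acting from (R^n, sup norm) to (R^m, sup norm)."""
    return max(sum(abs(t) for t in row) for row in matrix)


def op_norm_brute_force(matrix):
    """sup{||Tx|| : ||x|| <= 1}, searched over the vertices of the cube [-1, 1]^n.

    x -> ||Tx|| is convex, so its maximum over the cube is attained at a vertex.
    """
    n = len(matrix[0])
    return max(sup_norm(matvec(matrix, x)) for x in product((-1, 1), repeat=n))


T = [[1, -2, 0.5],
     [3, 1, -1]]
print(op_norm_formula(T), op_norm_brute_force(T))   # 5 5
```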
In the special case that $Y$ is the field of complex numbers, we call such linear transformations functionals. If $X$ is a Banach space, the space $X^*$ consisting of all bounded linear functionals is again a Banach space, called the dual space of $X$. Since $X^*$ is again a Banach space, one can take its dual $X^{**}$, and one can show that $X$ can be identified isometrically with a subspace of $X^{**}$. It is not in general true that $X$ is isomorphic to $X^{**}$. However, for some important spaces, like the $L^p$-spaces with $1 < p < \infty$, it is true that $X \cong X^{**}$. We refer the reader to [2] for further details.
The space $L^1(\mu)$ is of some interest in the theory of Fourier series. Recall that we proved that any $f \in L^2(\mathbb{T})$ has a Fourier series which converges to $f$ in the $L^2$-norm. A natural question to ask is whether, for any $f \in C(\mathbb{T})$, the Fourier series converges to $f$ pointwise. This question has stimulated extensive research in the subject. In 1876, du Bois-Reymond showed that there exists a continuous function whose Fourier series diverges at a point. He asked if the Fourier series converges almost everywhere. This was finally settled by Carleson in 1966, who showed that for any $L^2$-function, its Fourier series converges almost everywhere. We will use the theory of Banach spaces to show that there exist continuous functions whose Fourier series diverge on a dense subset of $\mathbb{T}$.
In this context, let us consider the problem of convergence of Fourier series. The question is whether the partial sums

$$s_n(f, x) = \sum_{j=-n}^{n}\hat f(j)e^{ijx}$$

converge to $f(x)$. To this end, we observe that

$$s_n(f, x) = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(t)D_n(x - t)\,dt,$$

where $D_n(t)$ is the Dirichlet kernel:

$$D_n(t) = \sum_{j=-n}^{n} e^{ijt}.$$

Noting that the sum telescopes, so that

$$e^{it/2}D_n(t) - e^{-it/2}D_n(t) = 2i\sin(n + 1/2)t,$$

and that $e^{it/2} - e^{-it/2} = 2i\sin(t/2)$, we find

$$D_n(t) = \frac{\sin(n + 1/2)t}{\sin(t/2)}.$$
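The closed form for $D_n$ is easy to confirm numerically. The Python sketch below (helper names are ours) compares the defining sum with $\sin((n + 1/2)t)/\sin(t/2)$ at a few sample points away from $t = 0$.

```python
import cmath
import math


def dirichlet_sum(n, t):
    """D_n(t) = sum_{j=-n}^{n} e^{ijt}, summed term by term."""
    return sum(cmath.exp(1j * j * t) for j in range(-n, n + 1))


def dirichlet_closed(n, t):
    """sin((n + 1/2) t) / sin(t/2); valid when sin(t/2) != 0."""
    return math.sin((n + 0.5) * t) / math.sin(t / 2)


for n in (0, 1, 5, 20):
    for t in (0.3, 1.0, 2.5, -2.9):
        s = dirichlet_sum(n, t)
        assert abs(s.imag) < 1e-9                          # D_n is real-valued
        assert abs(s.real - dirichlet_closed(n, t)) < 1e-9
print("closed form agrees with the direct sum")
```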

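Let us show that the $L^1$-norm of $D_n$ (with respect to $dt/2\pi$) tends to infinity as $n$ tends to infinity. Indeed, since $D_n$ is even, and using the inequality $|\sin t| \le |t|$ in the form $|\sin(t/2)| \le t/2$,

$$\|D_n\|_1 > \frac{2}{\pi}\int_0^{\pi}|\sin(n + 1/2)t|\,\frac{dt}{t} = \frac{2}{\pi}\int_0^{(n+1/2)\pi}|\sin t|\,\frac{dt}{t}.$$

The last expression is easily seen to be greater than

$$\frac{2}{\pi}\sum_{k=1}^{n}\frac{1}{k\pi}\int_{(k-1)\pi}^{k\pi}|\sin t|\,dt = \frac{4}{\pi^2}\sum_{k=1}^{n}\frac{1}{k} \to \infty.$$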
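The growth of $\|D_n\|_1$ (these numbers are known as the Lebesgue constants) can be observed numerically. The following Python sketch, assuming the normalized measure $dt/2\pi$ used above and a midpoint-rule quadrature of our choosing, prints $\|D_n\|_1$ next to the lower bound $(4/\pi^2)\sum_{k \le n}1/k$; the slow, logarithmic growth is visible.

```python
import math


def dirichlet(n, t):
    return math.sin((n + 0.5) * t) / math.sin(t / 2)


def dirichlet_l1(n, samples=20000):
    """||D_n||_1 = (1/2pi) int_{-pi}^{pi} |D_n| = (1/pi) int_0^pi |D_n|, by the midpoint rule.

    Midpoints never hit t = 0, where the closed form is 0/0.
    """
    h = math.pi / samples
    return sum(abs(dirichlet(n, (k + 0.5) * h)) for k in range(samples)) * h / math.pi


def lower_bound(n):
    """(4/pi^2)(1 + 1/2 + ... + 1/n), the lower bound derived in the text."""
    return 4 / math.pi ** 2 * sum(1 / k for k in range(1, n + 1))


for n in (1, 2, 4, 8, 16, 32, 64):
    print(n, round(dirichlet_l1(n), 4), round(lower_bound(n), 4))
```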

We also consider the space $L^\infty(\mu)$ consisting of essentially bounded functions. Recall that this space has norm defined by

$$\|f\|_\infty = \text{essential supremum of } |f|,$$

where the essential supremum is

$$\inf\{M : \mu(\{t : |f(t)| > M\}) = 0\}.$$

The fact that $L^\infty(\mu)$ is a normed linear space follows from the usual triangle inequality.
We will prove in the next section:

Theorem 2.19 (Banach–Steinhaus theorem) Suppose $X$ is a Banach space and $Y$ is a normed linear space. Suppose that $T_\alpha$, with $\alpha \in A$, is a family of bounded linear transformations from $X$ to $Y$. Then either the norms $\|T_\alpha\|$ are uniformly bounded, or

$$\sup_{\alpha\in A}\|T_\alpha(x)\| = +\infty$$

for all $x$ belonging to a dense subset of $X$.

Remark 2.1 One can show that this dense subset is of type $G_\delta$, that is, a countable intersection of open sets.

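Before we prove this theorem, we want to consider its application to the theory of Fourier series alluded to above. Consider the linear functionals $T_n : C(\mathbb{T}) \to \mathbb{C}$ given by $T_n(f) = s_n(f, 0)$. Clearly, each $T_n$ is bounded, since $|T_n(f)| \le \|f\|_\infty\|D_n\|_1$, that is,

$$\|T_n\| \le \|D_n\|_1.$$

Now fix $n$. Put $g(t) = 1$ if $D_n(t) \ge 0$ and $g(t) = -1$ if $D_n(t) < 0$. There exist continuous functions $f_j \in C(\mathbb{T})$ such that $-1 \le f_j \le 1$ and $f_j(t) \to g(t)$ for every $t$, as $j \to \infty$. By the dominated convergence theorem, and since $D_n$ is even (so that $g(-t) = g(t)$),

$$\lim_{j\to\infty} T_n(f_j) = \lim_{j\to\infty}\frac{1}{2\pi}\int_{-\pi}^{\pi} f_j(-t)D_n(t)\,dt = \frac{1}{2\pi}\int_{-\pi}^{\pi} g(-t)D_n(t)\,dt = \|D_n\|_1.$$

Thus $\|T_n\| = \|D_n\|_1 \to \infty$, so the norms $\|T_n\|$ are not uniformly bounded, and by the Banach–Steinhaus theorem $\sup_n |s_n(f, 0)| = \infty$ for all $f$ in a dense $G_\delta$ subset of $C(\mathbb{T})$. Applying the same argument at each point of a countable dense subset of $\mathbb{T}$ and intersecting the resulting countably many dense $G_\delta$ sets by Baire's theorem, we conclude that there exist functions in $C(\mathbb{T})$ whose Fourier series diverge on a dense subset of $\mathbb{T}$.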
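The approximation step can also be seen numerically. The Python sketch below makes one concrete choice of the functions $f_j$, namely $f_j = \max(-1, \min(1, jD_n))$; this choice is ours and not from the text. These $f_j$ are continuous with $-1 \le f_j \le 1$ and converge to $g$ wherever $D_n \ne 0$, that is, off a finite set. The sketch evaluates $T_n(f_j)$ and $\|D_n\|_1$ with the same midpoint rule.

```python
import math


def dirichlet(n, t):
    return math.sin((n + 0.5) * t) / math.sin(t / 2)


def grid(samples):
    """Midpoints of a uniform partition of (-pi, pi); t = 0 is never a node for even samples."""
    h = 2 * math.pi / samples
    return h, [-math.pi + (k + 0.5) * h for k in range(samples)]


def T_n_of_f_j(n, j, samples=20000):
    """(1/2pi) int f_j(-t) D_n(t) dt with the hypothetical choice f_j = clip(j * D_n, -1, 1)."""
    h, ts = grid(samples)
    total = 0.0
    for t in ts:
        f_j = max(-1.0, min(1.0, j * dirichlet(n, -t)))   # continuous, |f_j| <= 1
        total += f_j * dirichlet(n, t)
    return total * h / (2 * math.pi)


def l1_norm(n, samples=20000):
    """||D_n||_1 = (1/2pi) int |D_n| on the same grid."""
    h, ts = grid(samples)
    return sum(abs(dirichlet(n, t)) for t in ts) * h / (2 * math.pi)


n = 10
for j in (1, 10, 100, 1000):
    print(j, T_n_of_f_j(n, j), l1_norm(n))   # T_n(f_j) increases towards ||D_n||_1
```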

Exercises

1. Let $X = \mathbb{C}^n$. For $x = (x_1, \ldots, x_n)$, define
$$\|x\|_\infty = \max_{1\le i\le n}|x_i|.$$
Show that this is a norm on $X$.
2. With $X$ as in the previous exercise, show that
$$\|x\|_\infty = \lim_{p\to\infty}\|x\|_p.$$
3. Let $\ell^\infty$ be the space of all bounded sequences. Show that for $x = \{x_n\}$,
$$\|x\| := \sup_n |x_n|$$
defines a norm on $\ell^\infty$.
4. Let $c$ be the subspace of $\ell^\infty$ consisting of convergent sequences and $c_0$ the subspace of null sequences. Show that $c$ and $c_0$ are closed subspaces of $\ell^\infty$, and hence Banach spaces.
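For Exercise 2, a quick experiment (a sketch, not a proof) shows the convergence; the assertions display the elementary bounds $\|x\|_\infty \le \|x\|_p \le n^{1/p}\|x\|_\infty$, which are one natural route to the limit. The test vector is arbitrary.

```python
def p_norm(x, p):
    return sum(abs(c) ** p for c in x) ** (1.0 / p)


def sup_norm(x):
    return max(abs(c) for c in x)


x = [3 + 4j, -2.5, 1j, 0.5]       # an arbitrary test vector in C^4
m = sup_norm(x)                    # 5.0, coming from |3 + 4i|
for p in (1, 2, 4, 8, 16, 64, 256):
    value = p_norm(x, p)
    # Elementary bounds: ||x||_inf <= ||x||_p <= n^(1/p) ||x||_inf.
    assert m <= value + 1e-12
    assert value <= len(x) ** (1.0 / p) * m + 1e-12
    print(p, value)
```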

2.7 Baire’s Theorem

The proof of the Banach–Steinhaus theorem will rely on the following theorem due
to Baire.
Theorem 2.20 (Baire) If X is a complete metric space, then the intersection of a
countable collection of dense open subsets is dense in X .
Proof Let $V_1, V_2, \ldots$ be dense and open in $X$. Let $W$ be any non-empty open set of $X$. We have to show that $\bigcap V_n$ has a point in $W$. Let $B(x, r)$ denote the open ball of radius $r$ centered at $x \in X$ and let $\overline{B}(x, r)$ be its closure. Since $V_1$ is dense, $W \cap V_1$ is a non-empty open set, and so we can find $x_1, r_1$ such that

$$\overline{B}(x_1, r_1) \subset W \cap V_1 \quad\text{and}\quad 0 < r_1 < 1.$$

We now proceed inductively. If $n \ge 2$ and $x_{n-1}$ and $r_{n-1}$ have been chosen, the density of $V_n$ shows that $V_n \cap B(x_{n-1}, r_{n-1})$ is a non-empty open set, and so we can find $x_n$ and $r_n$ so that

$$\overline{B}(x_n, r_n) \subset V_n \cap B(x_{n-1}, r_{n-1}) \quad\text{and}\quad 0 < r_n < 1/n.$$

This process gives us a sequence of elements $x_n$ which is Cauchy in $X$: for $i, j > n$, both $x_i$ and $x_j$ lie in $B(x_n, r_n)$, so their distance is less than $2/n$. As $X$ is complete, the sequence has a limit $x \in X$. Since $x_i$ lies in the closed set $\overline{B}(x_n, r_n)$ for $i > n$, it follows that $x \in \overline{B}(x_n, r_n) \subset V_n$ for each $n$. By construction, we also have $x \in \overline{B}(x_1, r_1) \subset W$. This completes the proof. ∎

We can now prove the Banach–Steinhaus theorem.

Proof of Theorem 2.19 Put $\phi(x) = \sup_{\alpha\in A}\|T_\alpha(x)\|$ and let $V_n$ be the set of $x \in X$ for which $\phi(x) > n$. Since each $T_\alpha$ is continuous and the norm is a continuous map, each function $x \mapsto \|T_\alpha(x)\|$ is continuous on $X$. Hence each $V_n = \bigcup_{\alpha}\{x : \|T_\alpha(x)\| > n\}$ is open. If all of these $V_n$'s are dense, then by Baire's theorem, $\bigcap V_n$ is dense in $X$; since $\phi(x) = \infty$ for every $x \in \bigcap V_n$, the second half of the theorem is established. Therefore, let us suppose that $V_N$ fails to be dense. Then there is an $x_0$ and $r > 0$ such that $B(x_0, r) \cap V_N = \emptyset$. In other words,

$$\phi(x) = \sup_{\alpha\in A}\|T_\alpha(x)\| \le N$$

for all $x \in X$ satisfying $\|x - x_0\| \le r$ (the fact that we can take the closed ball follows by continuity). Putting $y = x - x_0$, we can rephrase this as $\phi(x_0 + y) \le N$ for all $\|y\| \le r$. In particular, with $y = 0$, we have $\phi(x_0) \le N$. Now,

$$T_\alpha(y) = T_\alpha(y + x_0) - T_\alpha(x_0),$$

so by the triangle inequality and taking the supremum over $\alpha \in A$, we get

$$\phi(y) \le \phi(y + x_0) + \phi(x_0) \le 2N.$$

Thus, if $\|y\| \le r$, we have $\|T_\alpha(y)\| \le 2N$ for all $\alpha$. Changing $y$ to $ry$ shows that for $\|y\| \le 1$, we have $\|T_\alpha(y)\| \le 2N/r$ for all $\alpha$, that is, $\|T_\alpha\| \le 2N/r$. This completes the proof. ∎

There are several important corollaries of the Banach–Steinhaus theorem. The most notable is the following.

Corollary 2.7.1 Let $X$ be a Banach space and $Y$ a normed linear space. Suppose that we have a sequence of bounded linear operators $T_n : X \to Y$ such that for each $x$, the sequence $T_n(x)$ is bounded. Then $\sup_n\|T_n\| < \infty$; that is, the operators $T_n$ are uniformly bounded.

Although we do not develop the theory here, it is clear that many of the notions of calculus generalize to the setting of normed linear spaces. For instance, the notion of derivative can be defined for any map $f : X \to Y$ of normed linear spaces. Fix $a \in X$. Then we say $f$ is differentiable at $a$ if there is a bounded linear transformation $D_{f,a} : X \to Y$ such that

$$\lim_{h\to 0}\frac{\|f(a + h) - f(a) - D_{f,a}(h)\|}{\|h\|} = 0.$$

One can show that this linear transformation, if it exists, is unique, and thus we can speak of $D_{f,a}$ as the Fréchet derivative of $f$ at $a$. It is then possible to develop a general theory of calculus in a Banach space. We refer the interested reader to [2].
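As a finite-dimensional illustration (our own example, not taken from the text), let $f : \mathbb{R}^2 \to \mathbb{R}^2$ be $f(x, y) = (x^2y, \sin x)$, whose Fréchet derivative at $a$ is given by the Jacobian matrix. The Python sketch below evaluates the quotient in the definition along a fixed direction and shows it shrinking roughly in proportion to $\|h\|$; the Euclidean norm is our choice, and all helper names are hypothetical.

```python
import math


def f(v):
    x, y = v
    return (x * x * y, math.sin(x))


def jacobian_apply(a, h):
    """D_{f,a}(h) for f(x, y) = (x^2 y, sin x): the Jacobian matrix at a applied to h."""
    x, y = a
    return (2 * x * y * h[0] + x * x * h[1], math.cos(x) * h[0])


def norm(v):
    return math.hypot(*v)


def frechet_quotient(a, h):
    """||f(a + h) - f(a) - D_{f,a}(h)|| / ||h||."""
    fa = f(a)
    fah = f((a[0] + h[0], a[1] + h[1]))
    dh = jacobian_apply(a, h)
    remainder = tuple(fah[i] - fa[i] - dh[i] for i in range(2))
    return norm(remainder) / norm(h)


a = (1.0, 2.0)
direction = (0.6, -0.8)            # a unit vector; h = s * direction
for s in (1e-1, 1e-2, 1e-3, 1e-4):
    h = (s * direction[0], s * direction[1])
    print(s, frechet_quotient(a, h))   # shrinks roughly linearly in s
```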

Exercises

1. Show that the norm of a bounded linear transformation $f$ on a Banach space $X \ne \{0\}$ can be evaluated on the boundary of the unit ball, that is, $\|f\| = \sup\{\|f(x)\| : \|x\| = 1\}$.
2. Show that the derivative in a Banach space, when it exists, is unique.
3. Let X be a complete metric space. A subset E of X is called nowhere dense if its
closure contains no non-empty open subset of X . Any countable union of nowhere
dense sets is called a set of the first category. All other subsets are said to be of
second category. Show that no complete metric space is of the first category.

2.8 Hahn–Banach Theorem

Theorem 2.21 If $M$ is a subspace of a normed linear space $X$ and if $f$ is a bounded linear functional on $M$, then $f$ can be extended to a bounded linear functional $F$ on $X$ so that $\|F\| = \|f\|$.

We make some comments. First, to say that $F$ is an extension of $f$ means that the domain of $F$ includes that of $f$ and that $F(x) = f(x)$ on the domain of $f$. Second, the norms are computed relative to the domains of $F$ and $f$, respectively. Thus,

$$\|f\| = \sup\{|f(x)| : x \in M,\ \|x\| \le 1\}$$

and

$$\|F\| = \sup\{|F(x)| : x \in X,\ \|x\| \le 1\}.$$

The third comment concerns the field of scalars. So far, we have been concerned with spaces over $\mathbb{C}$, but all of the theory is valid over $\mathbb{R}$. In fact, the Hahn–Banach theorem was originally proved for real normed spaces.
A complex function $\phi$ on a complex vector space $V$ is a complex linear functional if

$$\phi(x + y) = \phi(x) + \phi(y), \qquad \phi(\alpha x) = \alpha\phi(x), \tag{2.5}$$

for all $x, y \in V$ and all complex $\alpha$. A real-valued function $\phi$ on a complex (real) vector space is a real-linear functional if (2.5) holds for all real $\alpha$. If $u$ is the real part of a complex linear functional, then $u$ is a real-linear functional.

Proposition 2.8.1 Let $V$ be a complex vector space.

(a) If $u$ is the real part of a complex linear functional $f$ on $V$, then
$$f(x) = u(x) - iu(ix), \qquad (x \in V). \tag{2.6}$$
(b) If $u$ is a real-linear functional on $V$ and if $f$ is defined as in (2.6), then $f$ is a complex linear functional on $V$.
(c) If $V$ is a normed linear space and $f$ and $u$ are related as in (2.6), then $\|f\| = \|u\|$.
Proof If $z = \alpha + i\beta$ with $\alpha, \beta \in \mathbb{R}$, the real part of $iz$ is $-\beta$. Thus,

$$z = \mathrm{Re}(z) - i\,\mathrm{Re}(iz)$$

for all complex numbers $z$. Since $f$ is linear,

$$\mathrm{Re}(if(x)) = \mathrm{Re}(f(ix)) = u(ix),$$

from which (2.6) follows. To prove the second part, we have to check that if $f$ is defined as in (2.6), then it is complex linear. Clearly $f$ is additive and $f(\alpha x) = \alpha f(x)$ for real $\alpha$, so it suffices to note that

$$f(ix) = u(ix) - iu(-x) = u(ix) + iu(x) = if(x).$$

Finally, since $|u(x)| \le |f(x)|$, we have $\|u\| \le \|f\|$. On the other hand, to every $x \in V$ there is a complex number $\alpha$ with $|\alpha| = 1$ so that $\alpha f(x) = |f(x)|$. Since $f(\alpha x) = |f(x)|$ is real, it equals its real part $u(\alpha x)$, and so

$$|f(x)| = f(\alpha x) = u(\alpha x) \le \|u\|\,\|\alpha x\| = \|u\|\cdot\|x\|,$$

so that $\|f\| \le \|u\|$. This completes the proof. ∎
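Formula (2.6) can be tried out on $V = \mathbb{C}^n$. The Python sketch below assumes a real-linear functional of the form $u(z) = \sum_k (a_k\,\mathrm{Re}\,z_k + b_k\,\mathrm{Im}\,z_k)$ with real $a_k, b_k$ (every real-linear functional on $\mathbb{C}^n$ has this form); the $f$ produced by (2.6) is then $z \mapsto \sum_k (a_k - ib_k)z_k$, and the sketch checks complex linearity and $\mathrm{Re}\,f = u$ on random data. The helper names are ours.

```python
import random


def make_u(a, b):
    """A real-linear functional u(z) = sum_k a_k Re z_k + b_k Im z_k on C^n (a, b real)."""
    def u(z):
        return sum(ak * zk.real + bk * zk.imag for ak, bk, zk in zip(a, b, z))
    return u


def complexify(u):
    """f(x) = u(x) - i u(ix), formula (2.6)."""
    def f(z):
        return u(z) - 1j * u([1j * zk for zk in z])
    return f


random.seed(2)
n = 3
a = [random.uniform(-1, 1) for _ in range(n)]
b = [random.uniform(-1, 1) for _ in range(n)]
u = make_u(a, b)
f = complexify(u)

z = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)]
w = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)]
lam = complex(0.3, -1.7)

assert abs(f([lam * zk for zk in z]) - lam * f(z)) < 1e-12                 # f(lam z) = lam f(z)
assert abs(f([zk + wk for zk, wk in zip(z, w)]) - (f(z) + f(w))) < 1e-12   # additivity
assert abs(f(z).real - u(z)) < 1e-12                                        # Re f = u
expected = sum((ak - 1j * bk) * zk for ak, bk, zk in zip(a, b, z))
assert abs(f(z) - expected) < 1e-12
print("f is complex linear with real part u")
```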


In the proof of the Hahn–Banach theorem, a key role is played by Zorn's lemma (which is equivalent to the axiom of choice). We recall this for the sake of completeness.
A set $P$ is said to be a partially ordered set if there is a relation $\le$ on the set satisfying
(a) $a \le b$ and $b \le c$ implies $a \le c$,
(b) $a \le a$ for all $a \in P$, and
(c) $a \le b$ and $b \le a$ implies $a = b$.
A subset $Q$ of a partially ordered set $P$ is said to be totally (or linearly) ordered if for every pair $a, b \in Q$, either $a \le b$ or $b \le a$. A subset $Q$ is a maximal totally ordered subset if it is totally ordered and if, whenever an element of $P$ not in $Q$ is adjoined to $Q$, the resulting set is no longer totally ordered. The Hausdorff maximality theorem says that every non-empty partially ordered set contains a maximal totally ordered subset. This is equivalent to Zorn's lemma and the axiom of choice.
We are now ready to prove the Hahn–Banach theorem. We assume first that $X$ is a real normed linear space. Consequently, $f$ is a real-linear bounded functional on $M$. If $\|f\| = 0$, then $F = 0$ is the desired extension. Otherwise, dividing by $\|f\|$, there is no loss of generality in assuming that $\|f\| = 1$. If $M = X$ there is nothing to prove, so choose $x_0 \in X$ with $x_0 \notin M$. Let $M_1$ be the vector space spanned by $M$ and $x_0$. We first extend $f$ to $M_1$. $M_1$ consists of vectors of the form $x + \lambda x_0$ with $x \in M$ and $\lambda$ a real scalar, and this representation is unique since $x_0 \notin M$. So if we define $f_1(x + \lambda x_0) = f(x) + \lambda\alpha$ with $\alpha$ any fixed real number, then it is easy to verify that we have a linear extension of $f$ to $M_1$. The problem is to choose $\alpha$ so that the norm is preserved. That is, we must ensure that

$$|f(x) + \lambda\alpha| \le \|x + \lambda x_0\|, \qquad (x \in M,\ \lambda \in \mathbb{R}).$$

Replacing $x$ by $-\lambda x$ and dividing both sides of the above by $|\lambda|$ (the case $\lambda = 0$ being clear), the requirement is

$$|f(x) - \alpha| \le \|x - x_0\|, \qquad (x \in M).$$

That is, we must have

$$A_x := f(x) - \|x - x_0\| \le \alpha \le f(x) + \|x - x_0\| =: B_x.$$

There exists such an $\alpha$ if and only if all the intervals $[A_x, B_x]$ have a common point, that is, if and only if

$$A_x \le B_y$$

for all $x, y \in M$. But

$$f(x) - f(y) = f(x - y) \le \|x - y\| \le \|x - x_0\| + \|y - x_0\|,$$

from which we see that $A_x \le B_y$ for all $x, y \in M$. Thus, there is a norm-preserving extension $f_1$ of $f$ to $M_1$. Now let $P$ be the collection of ordered pairs $(M', f')$ where $M'$ is a subspace of $X$ containing $M$ and $f'$ is a real-linear extension of $f$ to $M'$ with $\|f'\| = 1$. Partially order $P$ by declaring $(M', f') \le (M'', f'')$ to mean that $M' \subset M''$ and $f''(x) = f'(x)$ for all $x \in M'$. The axioms for a partial order are clearly satisfied. Moreover, $P$ is not empty, since it contains $(M, f)$. Thus, by Hausdorff's maximality principle, there is a maximal totally ordered subcollection $\Omega$ of $P$. Let $\Phi$ be the collection of all $M'$ such that $(M', f') \in \Omega$. Then $\Phi$ is totally ordered by set inclusion, so that the union $\tilde M$ of all members of $\Phi$ is a subspace of $X$ (Exercise). We extend $f$ to $\tilde M$ as follows. If $x \in \tilde M$, then $x \in M'$ for some $M' \in \Phi$. Define $F(x) = f'(x)$, where $f'$ is the function that occurs in the pair $(M', f') \in \Omega$. Since $\Omega$ is totally ordered, it is immaterial which $M' \in \Phi$ we choose to define $F(x)$, as long as $M'$ contains $x$. It is now easy to check that $F$ is a linear functional on $\tilde M$ and $\|F\| = 1$. If $\tilde M$ were a proper subspace of $X$, then the construction in the first part of the proof would give a further extension, contradicting the maximality of $\Omega$. Thus, $\tilde M = X$. This completes the proof in the case of real scalars.
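The one-step extension can be computed explicitly in a small case. The Python sketch below assumes $X = \mathbb{R}^2$ with the norm $\|v\| = |v_1| + |v_2|$, $M = \mathrm{span}\{(1, 1)\}$ with $f(s(1, 1)) = 2s$ (so $\|f\| = 1$ on $M$), and $x_0 = (1, 0)$; all of these choices, and the helper names, are ours. For $x = s(1,1)$ the numbers $A_x$ and $B_x$ are piecewise linear in $s$, $A$ concave and $B$ convex, so the endpoints of the admissible interval for $\alpha$ can be found at the kinks.

```python
def l1_norm(v):
    return sum(abs(c) for c in v)


def admissible_alphas(m, f_m, x0):
    """Return (sup_x A_x, inf_y B_y) for extending f from M = span{m} to span{m, x0}.

    Points of M are x = s*m with f(x) = s*f_m. A(s) = s*f_m - ||s*m - x0|| is concave and
    B(s) = s*f_m + ||s*m - x0|| is convex; both are piecewise linear with kinks at
    s = x0_i / m_i. Assuming |f_m| <= ||m|| (so ||f|| <= 1) and m != 0, sup A and inf B
    are attained at a kink.
    """
    kinks = [x0i / mi for x0i, mi in zip(x0, m) if mi != 0]

    def dist(s):
        return l1_norm([s * mi - x0i for mi, x0i in zip(m, x0)])

    lower = max(s * f_m - dist(s) for s in kinks)
    upper = min(s * f_m + dist(s) for s in kinks)
    return lower, upper


m = (1.0, 1.0)        # M = span{m}
f_m = 2.0             # f(s m) = 2s, so |f(x)| = ||x|| on M
x0 = (1.0, 0.0)
print(admissible_alphas(m, f_m, x0))   # (1.0, 1.0)
```

Here the admissible interval collapses to the single point $\alpha = 1$, so in this example the norm-preserving extension is unique, namely $F(v) = v_1 + v_2$.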
If now $f$ is a complex linear functional on the subspace $M$ of the complex normed linear space $X$, let $u$ be the real part of $f$ and use the real Hahn–Banach theorem to extend $u$ to a real-linear functional $U$ on $X$ with $\|U\| = \|u\|$. Now define

$$F(x) = U(x) - iU(ix), \qquad x \in X.$$

By Proposition 2.8.1, $F$ is a complex linear extension of $f$ and $\|F\| = \|U\| = \|u\| = \|f\|$, which completes the proof. ∎
An important consequence is:

Theorem 2.22 If $X$ is a normed linear space and $x_0 \in X$ with $x_0 \ne 0$, then there is a bounded linear functional $f$ on $X$ of norm 1 so that $f(x_0) = \|x_0\|$.

Proof Let $M = \{\lambda x_0\}$ and define $f(\lambda x_0) = \lambda\|x_0\|$. Then $f$ is a linear functional of norm 1 on $M$. By the Hahn–Banach theorem, $f$ can be extended to $X$ with the norm preserved. ∎

The previous theorem allows us to define the notion of a dual space. If $X$ is a normed linear space, let $X^*$ be the set of all bounded linear functionals on $X$. With addition and scalar multiplication defined in the obvious way, we see that $X^*$ is again a normed linear space. In fact, as $\mathbb{C}$ and $\mathbb{R}$ are complete metric spaces, $X^*$ is a Banach space, as will be shown below. It is customary to denote elements of $X^*$ by $x^*$, and the value of $x^* \in X^*$ at the point $x \in X$ is denoted $x^*(x)$ or by the more suggestive notation $\langle x, x^*\rangle$.
By the previous theorem, $X^*$ is not trivial if $X$ is not trivial. In fact, $X^*$ separates points on $X$. That is, given $x_1, x_2 \in X$ with $x_1 \ne x_2$, there is a functional $f \in X^*$ such that $f(x_1) \ne f(x_2)$. Indeed, take $x_0 = x_1 - x_2$ in the previous theorem, and the result is immediate. We call $X^*$ the dual space of $X$.
Another consequence is that for $x \in X$, we have

$$\|x\| = \sup\{|f(x)| : f \in X^*,\ \|f\| = 1\}.$$

Indeed, the right-hand side is certainly at most $\|x\|$, since $|f(x)| \le \|f\|\,\|x\|$. However, by the Hahn–Banach theorem (Theorem 2.22), for a given $x \ne 0$ there is a continuous linear functional $f$ of norm 1 with $f(x) = \|x\|$. Thus, the right-hand side is also at least $\|x\|$, and the supremum is attained. Hence, for fixed $x \in X$, the map $f \mapsto f(x)$ is a bounded linear functional on $X^*$ whose norm is exactly $\|x\|$. This gives us an isometric linear injection

$$X \to X^{**}.$$

The study of the interplay between $X$ and $X^*$ forms a large part of what is called functional analysis.
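For $X = \mathbb{R}^n$ with the $\ell^p$ norm, $1 < p < \infty$, the functional of Theorem 2.22 can be written down explicitly. In the Python sketch below we take, as an assumption (a standard formula, not derived in the text), $y_i = \operatorname{sgn}(x_i)|x_i|^{p-1}/\|x\|_p^{p-1}$ acting by $x \mapsto \sum_i x_iy_i$; its norm is $\|y\|_q$ by Hölder's inequality and its equality case, and the sketch checks that $\|y\|_q = 1$ and $\sum_i x_iy_i = \|x\|_p$. The helper names are ours.

```python
import math


def p_norm(v, p):
    return sum(abs(c) ** p for c in v) ** (1.0 / p)


def norming_functional(x, p):
    """Coefficients y of a functional with ||y||_q = 1 and sum_i x_i y_i = ||x||_p.

    Assumes real x != 0 and 1 < p < infinity; q = p/(p - 1) is the conjugate exponent.
    """
    norm = p_norm(x, p)
    return [math.copysign(abs(c) ** (p - 1), c) / norm ** (p - 1) for c in x]


x = [3.0, -1.0, 0.5, 2.0]
for p in (1.5, 2.0, 3.0):
    q = p / (p - 1)
    y = norming_functional(x, p)
    pairing = sum(xi * yi for xi, yi in zip(x, y))
    print(p, pairing, p_norm(x, p), p_norm(y, q))   # expect pairing = ||x||_p and ||y||_q = 1
```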

Theorem 2.23 If X is a Banach space, then X ∗ is a Banach space.

Proof Since $X^*$ is already a normed linear space, we have to show that it is complete. Let $x_n^*$ be a Cauchy sequence in $X^*$. Then $\|x_n^* - x_m^*\|$ tends to zero as $n, m$ both tend to infinity. Thus, for any fixed $x \in X$, the sequence $x_n^*(x)$ is a Cauchy sequence of scalars, since

$$|x_n^*(x) - x_m^*(x)| \le \|x_n^* - x_m^*\|\cdot\|x\|.$$

Thus, for each $x \in X$, there is a scalar $x^*(x)$ such that $x_n^*(x) \to x^*(x)$. The functional $x^*$ defined on all of $X$ in this way is clearly linear, since

$$x^*(ax + by) = \lim x_n^*(ax + by) = \lim\left(ax_n^*(x) + bx_n^*(y)\right) = a\lim x_n^*(x) + b\lim x_n^*(y) = ax^*(x) + bx^*(y).$$

Now since $x_n^*$ is a Cauchy sequence, given $\epsilon > 0$, there is an $M$ such that

$$|x_n^*(x) - x_m^*(x)| \le \epsilon\|x\|$$

for all $n, m > M$ and all $x$. Letting $n$ tend to infinity, we get

$$|x^*(x) - x_m^*(x)| \le \epsilon\|x\|$$

for $m > M$. Thus,

$$|x^*(x)| = |x^*(x) - x_m^*(x) + x_m^*(x)| \le |x^*(x) - x_m^*(x)| + |x_m^*(x)| \le (\epsilon + \|x_m^*\|)\|x\|,$$

so we conclude that $x^*$ is a bounded linear functional. Also, from

$$|x^*(x) - x_m^*(x)| \le \epsilon\|x\|, \qquad m > M,$$

follows the inequality $\|x^* - x_m^*\| \le \epsilon$, so that $x_m^* \to x^*$ in $X^*$. ∎

Exercises

1. Recall that $\ell^1$ consists of sequences $x = \{x_n\}$ such that
$$\sum_{n=1}^{\infty}|x_n| < \infty.$$
Let $c_0$ be the space of null sequences. Fix $y \in \ell^1$. Show that $T : c_0 \to \mathbb{C}$ defined by
$$T(x) = \sum_{n=1}^{\infty} x_ny_n$$
defines a bounded linear functional on $c_0$ and that $\|T\| = \|y\|_1$.
2. Prove that $(c_0)^* = \ell^1$.
3. Using a construction similar to the first exercise, prove that $(\ell^1)^* = \ell^\infty$.
4. Show that $(\ell^\infty)^*$ contains non-trivial functionals that vanish on all of $c_0$.

2.9 Examples of Dual Spaces

Let us first consider the dual space of $\mathbb{R}^n$. Recall that the norm of the vector $x = (x_1, \ldots, x_n)$ is given by

$$\|x\| = \left(\sum_i x_i^2\right)^{1/2}.$$

Let $e_1, \ldots, e_n$ be the standard basis vectors. Then any linear map $f$ is determined by the values $f(e_i)$. Thus, letting $y_i = f(e_i)$, we have

$$f(x) = \sum_i x_iy_i,$$

since $x = \sum_i x_ie_i$. By Cauchy–Schwarz, we have

$$|f(x)| = \left|\sum_i x_iy_i\right| \le \left(\sum_i y_i^2\right)^{1/2}\|x\|.$$

We see that $f$ is a bounded linear functional with norm bounded by

$$\left(\sum_i y_i^2\right)^{1/2}.$$

However, choosing $x = (y_1, \ldots, y_n)$, we see that this bound is achieved. Hence, every bounded linear functional corresponds to a vector $(y_1, \ldots, y_n)$, and

$$f(x) = \sum_i x_iy_i$$

is the usual inner product. Thus, the dual space of $\mathbb{R}^n$ is again $\mathbb{R}^n$ as Banach spaces.
This example should not surprise us in view of our earlier characterization of bounded linear functionals on a Hilbert space $H$: they are all of the form $x \mapsto (x, y)$. What is the norm of this functional? By Cauchy–Schwarz, we have

$$|(x, y)| \le \|y\|\,\|x\|,$$

so that the norm is bounded by $\|y\|$. However, choosing $x = y$, this bound is attained, and hence the norm of this functional is $\|y\|$.
We will now determine the dual of the space $\ell^p$, consisting of all sequences $(x_n)$ such that

$$\sum_n |x_n|^p < \infty.$$

We need to use Hölder's inequality:

$$\left|\sum_i x_iy_i\right| \le \left(\sum_i |x_i|^p\right)^{1/p}\left(\sum_i |y_i|^q\right)^{1/q},$$

for any $p, q > 1$ satisfying $1/p + 1/q = 1$. The special case $p = q = 2$ is the familiar Cauchy–Schwarz inequality.
For $1 < p < \infty$, every bounded linear functional on $\ell^p$ is of the form

$$f(x) = \sum_n x_ny_n,$$

where $(y_n)$ is a sequence in $\ell^q$. Using Hölder's inequality, it is easy to verify that such an $f$ is a bounded linear functional on $\ell^p$. Conversely, if $e_i$ denotes the sequence which is 1 at the $i$-th coordinate and zero everywhere else, we see that $y_i = f(e_i)$ determines $f(x)$ by continuity (finitely supported sequences are dense in $\ell^p$), since

$$f(x) = \sum_i x_iy_i.$$

We want to show that the sequence $(y_i)$ lives in $\ell^q$. To this end, define for each natural number $N$ the sequence $x^N$ having for its $i$-th component $|y_i|^{q/p}\operatorname{sgn} y_i$ for $i \le N$ and zero otherwise. Then

$$\|x^N\| = \left(\sum_{i=1}^{N}|y_i|^q\right)^{1/p},$$

and, since $q/p + 1 = q$,

$$f(x^N) = \sum_{i=1}^{N}|y_i|^{q/p+1} = \sum_{i=1}^{N}|y_i|^q.$$

But $|f(x^N)| \le \|f\|\,\|x^N\|$, so that

$$\left(\sum_{i=1}^{N}|y_i|^q\right)^{1/q} \le \|f\|$$

for all $N$. Thus, the sequence $(y_i)$ is an element of $\ell^q$.
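The role of the test vectors $x^N$ can be seen numerically. In the Python sketch below (helper names are ours; sequences are real and truncated to finitely many terms), the ratio $f(x^N)/\|x^N\|_p$ is compared with $(\sum_{i \le N}|y_i|^q)^{1/q}$. For the sample sequence with $|y_i| = 1/\sqrt{i}$ and alternating signs, which is not in $\ell^2$, the ratio grows without bound, so $x \mapsto \sum_i x_iy_i$ cannot be bounded on $\ell^2$.

```python
def sgn(t):
    return 1 if t > 0 else -1 if t < 0 else 0


def ratio_on_test_vector(y, N, p):
    """f(x^N) / ||x^N||_p, where f(x) = sum x_i y_i and x^N_i = |y_i|^(q/p) sgn(y_i) for the
    first N indices (zero afterwards); assumes real y, 1 < p < infinity, y[:N] not all zero."""
    q = p / (p - 1)
    xN = [abs(t) ** (q / p) * sgn(t) for t in y[:N]]
    f_xN = sum(a * b for a, b in zip(xN, y))          # equals sum |y_i|^q
    norm_xN = sum(abs(a) ** p for a in xN) ** (1 / p)
    return f_xN / norm_xN


def q_norm_partial(y, N, p):
    """(sum over the first N terms of |y_i|^q)^(1/q), q the conjugate exponent of p."""
    q = p / (p - 1)
    return sum(abs(t) ** q for t in y[:N]) ** (1 / q)


p = 2.0
y = [(-1) ** i / (i + 1) ** 0.5 for i in range(10000)]   # |y_i| = 1/sqrt(i): not in l^2
for N in (10, 100, 1000, 10000):
    print(N, ratio_on_test_vector(y, N, p), q_norm_partial(y, N, p))
```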


For $p = 1$, the dual of $\ell^1$ is $\ell^\infty$, the space of bounded sequences. To see this, let as before $y_i = f(e_i)$. Define the sequence $u^N$ to have $N$-th term $\operatorname{sgn} y_N$ and all other terms zero. Then $\|u^N\| \le 1$ and

$$|y_N| = f(u^N) \le \|f\|\,\|u^N\| \le \|f\|.$$

Thus, the sequence of $y$'s is bounded. Conversely, given such an element $y$, we can clearly define an element of $(\ell^1)^*$ by setting

$$f(x) = \sum_i x_iy_i.$$

We caution the reader that the dual space of $\ell^\infty$ is NOT $\ell^1$ (see Exercise 4 in the previous section and the exercises below).
The above discussion can be extended to determine the dual space of $L^p(X)$ for any measure space $X$. It turns out that it is $L^q(X)$ for $1 \le p < \infty$ (for $p = 1$ one needs the measure to be $\sigma$-finite). The proof is similar to what we have said above and makes use of the Radon–Nikodym theorem. The dual space of a Hilbert space is itself, as we have already seen.
The notion of orthogonality can be introduced in normed spaces through the dual space. The vectors $x \in X$ and $x^* \in X^*$ are said to be orthogonal if $\langle x, x^*\rangle = 0$. If $S$ is a subset of a normed linear space $X$, the orthogonal complement of $S$, denoted $S^\perp$, consists of all elements $x^* \in X^*$ orthogonal to every vector of $S$. Recall that if $M$ is a closed subspace of a Hilbert space $H$, we have the decomposition

$$H = M \oplus M^\perp.$$

Given any vector $x \in H$, we have a unique decomposition

$$x = y + z,$$

with $y \in M$ and $z \in M^\perp$. The vector $y$ can be characterized as the unique vector in $M$ that comes closest to $x$. How close it gets to $x$ is measured by $\|z\|$, which (for $z \ne 0$) is also equal to $(x, z)/\|z\|$, since $z \in M^\perp$. The analogous result is the following.
Theorem 2.24 Let $x$ be an element in a real normed vector space $X$ and let $d$ be its distance from the subspace $M$. Then,

$$d = \inf_{y\in M}\|x - y\| = \max_{z\in M^\perp}\frac{\langle x, z\rangle}{\|z\|},$$

where the maximum on the right, taken over non-zero $z \in M^\perp$, is achieved for some $z \in M^\perp$.


Proof For $\epsilon > 0$, let $y_\epsilon \in M$ be such that $\|x - y_\epsilon\| \le d + \epsilon$. Then, for any $z \in M^\perp$, we have

$$\langle x, z\rangle = \langle x - y_\epsilon, z\rangle = z(x - y_\epsilon) \le \|z\|\,\|x - y_\epsilon\| \le \|z\|(d + \epsilon),$$

so that the right-hand side of the equation in the theorem is bounded by $d + \epsilon$. Since $\epsilon$ was arbitrary, the right-hand side is less than or equal to $d$. We now have to find a $z$ so that $\langle x, z\rangle = d\|z\|$. (Here we assume $d > 0$, so that $x \notin M$.) Let $N$ be the subspace spanned by $x$ and $M$. Elements of $N$ can be written uniquely as $u = ax + m$ with $a \in \mathbb{R}$ and $m \in M$. Define a linear functional on $N$ by setting $f(u) = ad$. The norm of this functional is given by

$$\sup\{|f(u)|/\|u\| : u \in N\} = \sup\frac{|a|d}{|a|\,\|x + m/a\|} = \frac{d}{\inf\|x + m/a\|} = 1.$$

By the Hahn–Banach theorem, $f$ has an extension $z$ to $X$ with $\|z\| = 1$. By construction, we have $z \in M^\perp$ and $\langle x, z\rangle = d$, which completes the proof. ∎
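A small worked case of Theorem 2.24, with choices of our own: $X = \mathbb{R}^2$ with $\|v\|_\infty = \max(|v_1|, |v_2|)$, $M = \mathrm{span}\{(1, 1)\}$ and $x = (1, 0)$. Functionals on $X$ are $z(v) = z_1v_1 + z_2v_2$ with norm $|z_1| + |z_2|$, and $M^\perp$ consists of the multiples of $(1, -1)$. The Python sketch below computes $d$ by a ternary search (valid because $s \mapsto \|x - s(1,1)\|_\infty$ is convex) and compares it with the maximum on the right-hand side; the helper names are hypothetical.

```python
def sup_norm(v):
    return max(abs(c) for c in v)


def distance_to_line(x, m, lo=-100.0, hi=100.0, iters=200):
    """inf_s ||x - s m||_inf by ternary search on the convex function s -> ||x - s m||."""
    def g(s):
        return sup_norm([xi - s * mi for xi, mi in zip(x, m)])
    for _ in range(iters):
        s1 = lo + (hi - lo) / 3
        s2 = hi - (hi - lo) / 3
        if g(s1) <= g(s2):
            hi = s2          # by convexity a minimizer lies in [lo, s2]
        else:
            lo = s1          # by convexity a minimizer lies in [s1, hi]
    return g((lo + hi) / 2)


def dual_ratio(x, z):
    """<x, z> / ||z||, where ||z|| = |z_1| + |z_2| is the dual norm of the max norm."""
    return sum(xi * zi for xi, zi in zip(x, z)) / sum(abs(zi) for zi in z)


x, m = (1.0, 0.0), (1.0, 1.0)
d = distance_to_line(x, m)
# M-perp is spanned by (1, -1); only the sign of the multiple matters in the ratio.
best = max(dual_ratio(x, (c, -c)) for c in (1.0, -1.0))
print(d, best)   # both 0.5, attained at s = 1/2 and z = (1, -1)
```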

Exercises

1. Give an explicit isometric isomorphism between $c^*$ and $\ell^1$.
2. Give an explicit isometric isomorphism between $c_0^*$ and $\ell^1$.
3. For $x \in c$, define $x_\infty$ to be $\lim_{n\to\infty} x_n$. Define a map $T : c \to c_0$ by setting $T(x)$ to be the sequence $x_\infty, x_1 - x_\infty, x_2 - x_\infty, \ldots$. Show that $T$ is one-to-one and that $T(c) = c_0$.

References

1. M. Ram Murty, The Cauchy-Schwarz inequality in mathematics, physics, and statistics. Math.
Student 88, 17–25 (2019)
2. W. Rudin, Functional Analysis (McGraw Hill, New York, 1973)
Chapter 3
Fourier Transforms

3.1 Fubini’s Theorem and Convolutions

Jean Baptiste Joseph Fourier (1768–1830), the son of a tailor, was orphaned at the
age of eight and brought up and educated by the clergy. In 1790, at the age of 22, he began his teaching career; he later became a professor at the École Polytechnique, and in 1798, Napoleon took him on his campaign to Egypt. His research on the heat equation began in 1800, and
by 1811, he had developed the theory now called the theory of Fourier series as the
essential tool for solving it. Fourier is best known today for his 1822 book, “Théorie
Analytique de la Chaleur” which was described by Kelvin as a “great mathematical
poem.” The theory of Fourier series and Fourier transforms that are in use today in
mathematics, physics and engineering can be traced back to this remarkable work.
In the subsequent years, Fourier's theory has been extended to the context of the Lebesgue integral. The essential theorems are easily proved using Fubini's theorem, discovered by Guido Fubini (1879–1943). We begin with a discussion of this theorem in the context of measure theory.
Let $(X, \mu)$ and $(Y, \nu)$ be two $\sigma$-finite measure spaces. We can define a measure on $X \times Y$ as follows. Let $A \subseteq X$ and $B \subseteq Y$ be measurable. Then we call $A \times B$ a measurable rectangle and define a measure $\lambda$ on $X \times Y$ by setting

$$\lambda(A \times B) = \mu(A)\nu(B).$$

One can check that $\lambda$ extends uniquely to a measure on the $\sigma$-algebra generated by these rectangles, and we use the same letter for its completion. Recall that a measure space is said to be complete if every subset of a set of measure zero is measurable. We denote $\lambda$ by $\mu \times \nu$. Now if $f$ is an integrable function on $X \times Y$, then the functions $f(x, \cdot)$ on $Y$ and $f(\cdot, y)$ on $X$ are both integrable for almost all $x$ and $y$. In this situation, we have the following theorem of Fubini.

Theorem 3.1 Let $f$ be a function on $X \times Y$, measurable with respect to the product $\sigma$-algebra defined above. If either
(i) $0 \le f \le \infty$, or
(ii) $\int_X\int_Y |f|\,d\nu\,d\mu < \infty$,
then
$$\int_{X\times Y} f\,d\lambda = \int_X\left(\int_Y f\,d\nu\right)d\mu = \int_Y\left(\int_X f\,d\mu\right)d\nu.$$
In the case (ii) holds, we have that $f \in L^1(\lambda)$ and that $f(x, \cdot) \in L^1(\nu)$ for almost every $x \in X$ and $f(\cdot, y) \in L^1(\mu)$ for almost every $y \in Y$.

© Hindustan Book Agency 2022
M. R. Murty, A Second Course in Analysis, IMSc Lecture Notes in Mathematics, https://doi.org/10.1007/978-981-16-7246-0_3
Throughout (unless stated otherwise),

$$d\mu = \frac{dx}{\sqrt{2\pi}},$$

where $dx$ is the Lebesgue measure on $\mathbb{R}$.

Definition 3.1 We define the convolution of two functions $f, g \in L^1(\mathbb{R})$ by

$$(f * g)(x) = \int_{-\infty}^{\infty} f(x - y)g(y)\,d\mu(y).$$

The translation invariance of the Lebesgue measure, together with Fubini's theorem, implies that the convolution is well-defined. More precisely, we have

Theorem 3.2 Suppose $f, g \in L^1(\mathbb{R})$. Then

$$\int_{-\infty}^{\infty}|f(x - y)g(y)|\,d\mu(y) < \infty$$

for almost every $x$. Moreover, $f * g \in L^1(\mathbb{R})$ and

$$\|f * g\|_1 \le \|f\|_1\|g\|_1.$$

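Proof Observe that $F(x, y) = f(x - y)g(y)$ is a measurable function on the product space $\mathbb{R}^2$. Moreover, by Fubini's theorem (case (i), applied to $|F|$),

$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}|F(x, y)|\,d\mu(x)\,d\mu(y) = \int_{-\infty}^{\infty}|g(y)|\left(\int_{-\infty}^{\infty}|f(x - y)|\,d\mu(x)\right)d\mu(y),$$

and as the Lebesgue measure is translation invariant, the inner integral is $\|f\|_1$, which is a constant. Hence

$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}|F(x, y)|\,d\mu(x)\,d\mu(y) = \|f\|_1\|g\|_1 < \infty.$$

By Fubini's theorem again, $\int|F(x, y)|\,d\mu(y) < \infty$ for almost every $x$, and since $|(f * g)(x)| \le \int|F(x, y)|\,d\mu(y)$, integrating over $x$ gives $\|f * g\|_1 \le \|f\|_1\|g\|_1$. ∎

We say that $h$ is the convolution of $f$ and $g$ and write $h = f * g$.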
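With the normalization $d\mu = dx/\sqrt{2\pi}$, Theorem 3.2 can be checked numerically. The Python sketch below (midpoint quadrature and test functions are our own choices; helper names are hypothetical) uses $f = g = \mathbf{1}_{[0,1]}$, for which equality $\|f * g\|_1 = \|f\|_1\|g\|_1 = 1/2\pi$ holds because both functions are non-negative, and then the signed function $\mathbf{1}_{[0,1]} - \mathbf{1}_{[1,2]}$, for which the inequality is strict (about $(8/3)/2\pi$ against $4/2\pi$).

```python
import math

SQRT_2PI = math.sqrt(2 * math.pi)


def integral_mu(func, a, b, n=400):
    """Midpoint rule for int_a^b func dmu, where dmu = dx / sqrt(2 pi)."""
    h = (b - a) / n
    return sum(func(a + (k + 0.5) * h) for k in range(n)) * h / SQRT_2PI


def convolve(f, g, g_support):
    """Return x -> (f * g)(x) = int f(x - y) g(y) dmu(y), assuming g vanishes off g_support."""
    a, b = g_support
    return lambda x: integral_mu(lambda y: f(x - y) * g(y), a, b)


def box(x):
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


def signed(x):
    return box(x) - box(x - 1.0)


# Non-negative case: ||f*g||_1 = ||f||_1 ||g||_1 = 1/(2 pi).
h = convolve(box, box, (0.0, 1.0))
lhs = integral_mu(lambda x: abs(h(x)), 0.0, 2.0)
rhs = integral_mu(box, 0.0, 1.0) ** 2
print(lhs, rhs, 1 / (2 * math.pi))

# Signed case: strict inequality, roughly (8/3)/(2 pi) versus 4/(2 pi).
k = convolve(signed, signed, (0.0, 2.0))
lhs_s = integral_mu(lambda x: abs(k(x)), 0.0, 4.0)
rhs_s = integral_mu(lambda x: abs(signed(x)), 0.0, 2.0) ** 2
print(lhs_s, rhs_s)
```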

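Exercises

1. Let
$$f(x, y) = \begin{cases}\dfrac{x^2 - y^2}{(x^2 + y^2)^2} & (x, y) \in (0, 1) \times (0, 1),\\[4pt] 0 & \text{otherwise.}\end{cases}$$
Show that
$$\int_{-\infty}^{\infty}\left(\int_{-\infty}^{\infty} f(x, y)\,dx\right)dy = -\frac{\pi}{4}$$
and
$$\int_{-\infty}^{\infty}\left(\int_{-\infty}^{\infty} f(x, y)\,dy\right)dx = \frac{\pi}{4}.$$
Why does Fubini's theorem not apply?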

2. Prove that
$$\int_0^{\infty}\left(\frac{2y}{1 + y^2} - \frac{2y}{x^2 + y^2}\right)dy = 2\log x.$$
Deduce that
$$-\frac{\log x}{1 - x^2} = \int_0^{\infty}\frac{y\,dy}{(1 + y^2)(x^2 + y^2)}.$$

3. Show that
$$I := \int_0^1\frac{\log x}{1 - x^2}\,dx = \int_1^{\infty}\frac{\log x}{1 - x^2}\,dx.$$
Deduce that
$$-2I = \int_0^{\infty}\int_0^{\infty}\frac{y\,dx\,dy}{(1 + y^2)(x^2 + y^2)}.$$

4. Applying Fubini's theorem in the last exercise, use the previous exercises to deduce that
$$\sum_{n=0}^{\infty}\frac{1}{(2n + 1)^2} = \frac{\pi^2}{8}.$$

5. For any natural number $n$, show that
$$\sum_{\substack{m=1\\ m \ne n}}^{\infty}\frac{1}{m^2 - n^2} = \frac{3}{4n^2}.$$

6. For $k > 1$, set
$$\zeta(k) = \sum_{n=1}^{\infty} \frac{1}{n^k}.$$
Show that
$$\left( k + \frac{1}{2} \right) \zeta(2k) = \sum_{j=1}^{k-1} \zeta(2j)\,\zeta(2k - 2j).$$
This shows that $\zeta(2k)$ can be recursively determined from $\zeta(2)$.¹ [Hint: write the summand as a double sum, interchange summations, and use the previous exercise.]
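A quick numerical check of Exercise 6 can be reassuring, although it is of course no substitute for a proof. The Python sketch below is our own illustration and assumes only the classical value $\zeta(2) = \pi^2/6$. It builds $\zeta(4), \zeta(6), \zeta(8), \zeta(10)$ from the recursion and compares each with a partial sum of the defining series. The function names are hypothetical.

```python
import math


def zeta_partial(s, terms=20_000):
    """Partial sum of sum_{n>=1} n^(-s); for s >= 4 the omitted tail is below terms**(1 - s)."""
    return math.fsum(n ** -s for n in range(1, terms + 1))


def zeta_even_from_recursion(k_max):
    """Return {2k: zeta(2k)} for 1 <= k <= k_max, starting from zeta(2) = pi^2 / 6 and using
    (k + 1/2) zeta(2k) = sum_{j=1}^{k-1} zeta(2j) zeta(2k - 2j)."""
    z = {2: math.pi ** 2 / 6}
    for k in range(2, k_max + 1):
        z[2 * k] = math.fsum(z[2 * j] * z[2 * k - 2 * j] for j in range(1, k)) / (k + 0.5)
    return z


z = zeta_even_from_recursion(5)
for s in (4, 6, 8, 10):
    # Recursive value versus a direct partial sum of the series.
    print(s, z[s], zeta_partial(s))
```

For instance, the recursion reproduces $\zeta(4) = \pi^4/90$ and $\zeta(6) = \pi^6/945$ to within rounding error.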

3.2 The Fourier Transform

For $f \in L^1(\mathbb{R})$, we define the Fourier transform $\hat{f}$ by
$$\hat{f}(t) = \int_{-\infty}^{\infty} f(x)\,e^{-ixt}\,d\mu(x),$$
where $i = \sqrt{-1}$. Here are some basic properties of the Fourier transform.
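Theorem 3.3 Suppose $f \in L^1(\mathbb{R})$ and $\alpha, \lambda \in \mathbb{R}$. Then
(1) If $g(x) = f(x)e^{i\alpha x}$, then $\hat{g}(t) = \hat{f}(t - \alpha)$.
(2) If $g(x) = f(x - \alpha)$, then $\hat{g}(t) = \hat{f}(t)e^{-i\alpha t}$.
(3) If $g \in L^1(\mathbb{R})$ and $h = f * g$, then $\hat{h} = \hat{f}\,\hat{g}$.
(4) If $g(x) = \overline{f(-x)}$, then $\hat{g}(t) = \overline{\hat{f}(t)}$.
(5) If $g(x) = f(x/\lambda)$ where $\lambda > 0$, then $\hat{g}(t) = \lambda \hat{f}(\lambda t)$.
(6) If $g(x) = -ix f(x)$ and $g \in L^1(\mathbb{R})$, then $\hat{f}$ is differentiable with derivative $\hat{f}'(t) = \hat{g}(t)$.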
Proof Property (1) is easy to see:
$$\hat{g}(t) = \int_{-\infty}^{\infty} f(x)e^{i\alpha x}e^{-ixt}\,d\mu(x) = \int_{-\infty}^{\infty} f(x)e^{-i(t - \alpha)x}\,d\mu(x) = \hat{f}(t - \alpha).$$

1 These exercises are taken from the author’s short note [1] in Math. Student.

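To prove (2), we have
$$\hat{g}(t) = \int_{-\infty}^{\infty} f(x - \alpha)e^{-ixt}\,d\mu(x).$$
Putting $u = x - \alpha$, we have
$$\hat{g}(t) = \int_{-\infty}^{\infty} f(u)e^{-i(\alpha + u)t}\,d\mu(u),$$
since the Lebesgue measure is translation invariant. Thus
$$\hat{g}(t) = e^{-i\alpha t}\int_{-\infty}^{\infty} f(u)e^{-iut}\,d\mu(u) = e^{-i\alpha t}\hat{f}(t).$$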

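To prove (3), we apply Fubini's theorem, which is justified because $\int\int |f(x-y)g(y)|\,d\mu(y)\,d\mu(x) = \|f\|_1\|g\|_1 < \infty$ (see the proof of Theorem 3.2):
$$\hat{h}(t) = \int_{-\infty}^{\infty} h(x)e^{-ixt}\,d\mu(x) = \int_{-\infty}^{\infty} \left( \int_{-\infty}^{\infty} f(x-y)g(y)\,d\mu(y) \right) e^{-ixt}\,d\mu(x).$$
We write $x$ as $(x - y) + y$ in the exponential to get
$$\begin{aligned} \hat{h}(t) &= \int_{-\infty}^{\infty} \left( \int_{-\infty}^{\infty} f(x-y)e^{-i(x-y)t}\,d\mu(x) \right) g(y)e^{-iyt}\,d\mu(y) \\ &= \hat{f}(t)\int_{-\infty}^{\infty} g(y)e^{-iyt}\,d\mu(y) = \hat{f}(t)\,\hat{g}(t), \end{aligned}$$
because the Lebesgue measure is translation invariant.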


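To prove (4), put $g(x) = \overline{f(-x)}$. Then
$$\hat{g}(t) = \int_{-\infty}^{\infty} \overline{f(-x)}\,e^{-ixt}\,d\mu(x) = \overline{\int_{-\infty}^{\infty} f(-x)e^{-i(-x)t}\,d\mu(x)} = \overline{\hat{f}(t)},$$
where in the last step we changed $x$ to $-x$.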

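To prove (5), we have
$$\hat{g}(t) = \int_{-\infty}^{\infty} f(x/\lambda)e^{-ixt}\,d\mu(x) = \int_{-\infty}^{\infty} f(x/\lambda)e^{-i(x/\lambda)\lambda t}\,d\mu(x).$$
We change variables and put $v = x/\lambda$, so that $d\mu(x) = \lambda\,d\mu(v)$ and
$$\hat{g}(t) = \int_{-\infty}^{\infty} f(v)e^{-iv(\lambda t)}\,\lambda\,d\mu(v) = \lambda \hat{f}(\lambda t).$$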

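Finally, to prove (6), we note that for $s \ne t$,
$$\frac{\hat{f}(s) - \hat{f}(t)}{s - t} = \int_{-\infty}^{\infty} f(x)e^{-ixt}\varphi(x, s - t)\,d\mu(x),$$
where
$$\varphi(x, u) = \frac{e^{-ixu} - 1}{u}.$$
Now
$$\varphi(x, u)e^{ixu/2} = \frac{e^{ixu/2}(e^{-ixu} - 1)}{u} = \frac{e^{-ixu/2} - e^{ixu/2}}{u} = -\frac{2i\sin(xu/2)}{u},$$
so that for $u \ne 0$ we have
$$|\varphi(x, u)| = \frac{|2\sin(xu/2)|}{|u|} = \left| \frac{\sin(xu/2)}{xu/2} \right| \cdot |x| \le |x|$$
because $|(\sin\theta)/\theta| \le 1$ (see Exercise 1 below). Hence the integrand is dominated by $|x f(x)| = |g(x)|$, which is integrable by hypothesis, and $\varphi(x, u) \to -ix$ as $u \to 0$. Thus, we can apply the dominated convergence theorem to get
$$\begin{aligned} \hat{f}'(t) &= \lim_{s \to t} \frac{\hat{f}(s) - \hat{f}(t)}{s - t} = \int_{-\infty}^{\infty} f(x)e^{-ixt}\lim_{s \to t}\varphi(x, s - t)\,d\mu(x) \\ &= \int_{-\infty}^{\infty} (-ix)e^{-ixt}f(x)\,d\mu(x) = \int_{-\infty}^{\infty} g(x)e^{-ixt}\,d\mu(x) = \hat{g}(t). \end{aligned}$$
□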

Example 3.1 Compute the Fourier transform of $f(x) = e^{-x^2/2}$.

We first compute the integral (which we have seen earlier)
$$I = \int_{-\infty}^{\infty} e^{-x^2/2}\,\frac{dx}{\sqrt{2\pi}}.$$
Observe that
$$I^2 = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-(x^2 + y^2)/2}\,\frac{dx\,dy}{2\pi}.$$
This suggests that we switch to polar coordinates. We put $x = r\cos\theta$, $y = r\sin\theta$ with $0 \le r < \infty$ and $0 \le \theta \le 2\pi$. The Jacobian determinant of this transformation is $r$, and so, substituting $v = r^2/2$,
$$I^2 = \int_0^{\infty}\int_0^{2\pi} e^{-r^2/2}\,\frac{r\,dr\,d\theta}{2\pi} = \int_0^{\infty} e^{-r^2/2}\,r\,dr = \int_0^{\infty} e^{-v}\,dv = \Big[-e^{-v}\Big]_0^{\infty} = 1.$$
Therefore $I = \pm 1$. But $I$ is the integral of a positive function, so we must have $I = 1$. □

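Now let us compute $\hat{f}(t)$. To this end, let
$$I(u) = \int_{-\infty}^{\infty} e^{-(x + iu)^2/2}\,d\mu(x).$$
We computed above that $I(0) = 1$.
Let us observe that, since $-(x + iu)e^{-(x + iu)^2/2}$ is the derivative of $e^{-(x + iu)^2/2}$ with respect to $x$,
$$\begin{aligned} I'(u) &= \frac{d}{du}\int_{-\infty}^{\infty} e^{-(x + iu)^2/2}\,d\mu(x) = \int_{-\infty}^{\infty} \frac{d}{du}\left( e^{-(x + iu)^2/2} \right) d\mu(x) \\ &= -\int_{-\infty}^{\infty} i(x + iu)e^{-(x + iu)^2/2}\,d\mu(x) = \frac{i}{\sqrt{2\pi}}\Big[ e^{-(x + iu)^2/2} \Big]_{x=-\infty}^{x=\infty} = 0. \end{aligned}$$
The interchange of integration and differentiation can be justified (see the next section), and so we deduce that $I(u)$ is constant. But since $I(0) = 1$, we have $I(u) = 1$ for all $u$. Since $-(x + iu)^2/2 = -x^2/2 - iux + u^2/2$, this says
$$\int_{-\infty}^{\infty} e^{-x^2/2}e^{-iux}e^{u^2/2}\,d\mu(x) = 1,$$
which implies
$$\hat{f}(u) = \int_{-\infty}^{\infty} e^{-x^2/2}e^{-iux}\,d\mu(x) = e^{-u^2/2} = f(u).$$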
−∞

That is, $f$ is its own Fourier transform. In other words, when we view the Fourier transform as a linear transformation, $f$ is an eigenvector with eigenvalue 1.
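The identity $\hat{f} = f$ for $f(x) = e^{-x^2/2}$, as well as properties (2) and (5) of Theorem 3.3, can be checked numerically. The Python sketch below is only an illustration under stated assumptions. It approximates $\hat{f}(t)$ with respect to $d\mu = dx/\sqrt{2\pi}$ by Simpson's rule on the truncated interval $[-40, 40]$. The truncation and step size are our choices and are adequate only for rapidly decaying integrands. The helper names are hypothetical.

```python
import cmath
import math


def fourier_transform(f, t, a=-40.0, b=40.0, n=8000):
    """Approximate f^(t) = int f(x) e^{-ixt} dmu(x), dmu = dx / sqrt(2 pi),
    by composite Simpson's rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = 0j
    for k in range(n + 1):
        weight = 1 if k in (0, n) else (4 if k % 2 else 2)
        x = a + k * h
        total += weight * f(x) * cmath.exp(-1j * x * t)
    return total * h / 3 / math.sqrt(2 * math.pi)


def gauss(x):
    return math.exp(-x * x / 2)


for t in (0.0, 1.0, 2.5):
    # Example 3.1: the transform should reproduce e^{-t^2/2}.
    print(t, fourier_transform(gauss, t).real, gauss(t))
```

The same helper can be applied to the translate $x \mapsto e^{-(x - \alpha)^2/2}$ or the dilate $x \mapsto e^{-(x/\lambda)^2/2}$ to observe the factors $e^{-i\alpha t}$ and $\lambda \hat{f}(\lambda t)$ predicted by Theorem 3.3.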

Exercises

1. Prove that
$$\left| \frac{\sin\theta}{\theta} \right| \le 1 \qquad (\theta \ne 0),$$
which was used in the proof of Theorem 3.3.
2. Compute the Fourier transform of the function $f \in L^1(\mathbb{R})$ given by $f(x) = 1$ if $x \in [-1, 1]$ and zero otherwise. Show that $\hat{f} \in L^2(\mathbb{R})$ but $\hat{f} \notin L^1(\mathbb{R})$.
3. Calculate the Fourier transform of $f(x) = e^{-2|x|}$.
4. In a Hilbert space, show that the map $x \mapsto (x, y)$ for fixed $y$ and the map $x \mapsto \|x\|$ are continuous maps.
5. For functions $f \in L^1(\mathbb{R}^n)$, we define the Fourier transform
$$\hat{f}(t) = (2\pi)^{-n/2}\int_{\mathbb{R}^n} f(x)e^{-i(x,t)}\,dx,$$
where $(x, t)$ denotes the usual inner product and $dx$ is the Lebesgue measure on $\mathbb{R}^n$. If $f(x) = e^{-|x|^2/2}$, show that $\hat{f} = f$. (Here $x = (x_1, \ldots, x_n)$ and $|x|^2 = x_1^2 + \cdots + x_n^2$.)
6. Prove that
$$\frac{1}{\sqrt{2\pi}}\int_x^{\infty} e^{-t^2/2}\,dt \sim \frac{1}{\sqrt{2\pi}}\,\frac{e^{-x^2/2}}{x}$$
as $x$ tends to infinity.

3.3 Differentiation Under the Integral Sign

In the previous section, we computed the derivative of $I(u)$ by differentiating under the integral sign. We prove below a general theorem that allows us to justify that step. For this purpose, let us recall Lebesgue's dominated convergence theorem: if $\{f_n\}_{n=1}^{\infty}$ is a sequence of measurable functions on $X$ such that $|f_n(x)| \le g(x)$ for almost all $x \in X$, with $g$ an integrable function, and $f(x) = \lim_{n\to\infty} f_n(x)$ almost everywhere, then
$$\int_X f(x)\,d\mu = \lim_{n\to\infty}\int_X f_n(x)\,d\mu.$$

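Theorem 3.4 Let $[a, b]$ be an interval in $\mathbb{R}$ (which may be infinite) and suppose that $f$ is a continuous function on the rectangle $[a, b] \times [c, d]$ such that
$$g(y) = \int_a^b f(x, y)\,dx$$
converges for each $y$. Suppose that $\partial f/\partial y$ exists and is continuous. Further suppose, if $[a, b]$ is infinite, that there is an integrable function $G$ on $[a, b]$ with $|\partial f/\partial y(x, y)| \le G(x)$ for all $x$ and $y$. Then $g$ is differentiable and
$$g'(y) = \int_a^b \frac{\partial f}{\partial y}(x, y)\,dx.$$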

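Proof Fix $y$ and let $h_n$ be any sequence of non-zero numbers tending to 0 with $y + h_n \in [c, d]$. We show that
$$\lim_{n\to\infty} \frac{g(y + h_n) - g(y)}{h_n}$$
exists and is equal to
$$\int_a^b \frac{\partial f}{\partial y}(x, y)\,dx.$$
Indeed,
$$\frac{g(y + h_n) - g(y)}{h_n} = \int_a^b \frac{f(x, y + h_n) - f(x, y)}{h_n}\,dx.$$
Consider the family of functions
$$f_n(x, y) = \frac{f(x, y + h_n) - f(x, y)}{h_n}.$$
By the assumption that $\partial f/\partial y$ exists and the mean value theorem, we have
$$f(x, y + h_n) - f(x, y) = h_n\,\frac{\partial f}{\partial y}(x, \xi)$$
for some intermediate point $\xi$ between $y$ and $y + h_n$ (depending on $x$). Hence $f_n(x, y) \to \frac{\partial f}{\partial y}(x, y)$ for each $x$. If $[a, b]$ is finite, then $\partial f/\partial y$ is bounded on the compact set $[a, b] \times ([y - 1, y + 1] \cap [c, d])$, so for all large $n$ the functions $f_n(x, y)$ are bounded by a constant, which is integrable over $[a, b]$; if $[a, b]$ is infinite, then $|f_n(x, y)| \le G(x)$. In either case, the Lebesgue dominated convergence theorem gives
$$\lim_{n\to\infty}\int_a^b f_n(x, y)\,dx = \int_a^b \lim_{n\to\infty} f_n(x, y)\,dx,$$
and the latter integral is precisely $\int_a^b \frac{\partial f}{\partial y}(x, y)\,dx$. Since the sequence $h_n$ was arbitrary, this proves the theorem. □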

We illustrate the theorem through the following interesting example. We want to evaluate the (conditionally convergent) integral
$$\int_0^{\infty} \frac{\sin x}{x}\,dx.$$
To this end, we consider
$$g(t) = \int_0^{\infty} e^{-tx}\,\frac{\sin x}{x}\,dx, \qquad (3.1)$$
which is absolutely convergent for $t > 0$. It is easy to check that the function
$$f(x, t) = e^{-tx}\,\frac{\sin x}{x}$$
satisfies the hypotheses of the theorem for $[a, b] = [0, \infty)$ and $[c, d] = [\varepsilon, \infty)$ with $\varepsilon > 0$. Differentiating under the integral sign with respect to $t$, we obtain
$$g'(t) = -\int_0^{\infty} e^{-tx}\sin x\,dx.$$

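We integrate by parts:
$$\int_0^{\infty} e^{-tx}\sin x\,dx = \left[ -\frac{e^{-tx}}{t}\sin x \right]_{x=0}^{x=\infty} + \int_0^{\infty} \frac{e^{-tx}}{t}\cos x\,dx.$$
The first term on the right is zero, so that
$$\int_0^{\infty} e^{-tx}\sin x\,dx = \int_0^{\infty} \frac{e^{-tx}}{t}\cos x\,dx.$$
One more integration by parts shows
$$\int_0^{\infty} \frac{e^{-tx}}{t}\cos x\,dx = \left[ -\frac{e^{-tx}}{t^2}\cos x \right]_{x=0}^{x=\infty} - \int_0^{\infty} \frac{e^{-tx}}{t^2}\sin x\,dx.$$
The first term on the right-hand side is $1/t^2$, so that we deduce
$$-g'(t) = \frac{1}{t^2} + \frac{g'(t)}{t^2}.$$
In other words,
$$g'(t) = -\frac{1}{1 + t^2}.$$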
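Integrating, we obtain
$$g(t) = -\arctan t + C$$
for some constant $C$. By the dominated convergence theorem applied to (3.1), $g(t) \to 0$ as $t \to \infty$, so that $C = \pi/2$. We conclude that for $t > 0$,
$$\int_0^{\infty} e^{-tx}\,\frac{\sin x}{x}\,dx = \frac{\pi}{2} - \arctan t,$$
a result which is of independent interest. Letting $t \to 0^+$, we find
$$\int_0^{\infty} \frac{\sin x}{x}\,dx = \frac{\pi}{2}.$$
Some care is needed in this last step, since $\sin x/x$ is not absolutely integrable on $[0, \infty)$. The dominated convergence theorem applies on each bounded interval $[0, A]$, while an integration by parts shows that $\left| \int_A^{\infty} e^{-tx}\,\frac{\sin x}{x}\,dx \right| \le 3/A$ uniformly in $t \ge 0$.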
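The formula $\int_0^{\infty} e^{-tx}\sin x/x\,dx = \pi/2 - \arctan t$ can be tested numerically for a few values of $t > 0$. In the following Python sketch (our own illustration), the truncation point $40/t$ and the number of Simpson steps are assumptions, chosen so that the neglected tail is below $e^{-40}$. The integrand is given the value 1 at $x = 0$, which is its limit there.

```python
import math


def simpson(g, a, b, n):
    """Composite Simpson's rule for int_a^b g(x) dx with n (even) subintervals."""
    h = (b - a) / n
    s = g(a) + g(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * g(a + k * h)
    return s * h / 3


def damped_dirichlet(t, n=20_000):
    """Approximate int_0^infty e^{-tx} sin(x)/x dx for t > 0, truncated at x = 40/t."""
    def integrand(x):
        return math.exp(-t * x) * (math.sin(x) / x if x != 0.0 else 1.0)
    return simpson(integrand, 0.0, 40.0 / t, n)


for t in (0.5, 1.0, 2.0):
    print(t, damped_dirichlet(t), math.pi / 2 - math.atan(t))
```

Letting $t \to 0^+$ numerically in this way is not advisable: the truncation point grows like $1/t$, which reflects the conditional convergence discussed above.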

In his semi-autobiography, “Surely You're Joking, Mr. Feynman!”, Richard Feynman narrates that he learned this method while in high school. It was the one trick he would use again and again, and this is how he earned a reputation for evaluating integrals.

Exercises

1. Suppose $f \in L^1(\mathbb{R})$ and $f(x) > 0$ for all real $x$. Show that $|\hat{f}(y)| < \hat{f}(0)$ for every $y \ne 0$.
2. Find the Fourier transform of
$$f(x) = \frac{x}{(1 + x^2)^2}.$$
3. Prove that
$$\int_0^1 \frac{x - 1}{\log x}\,dx = \log 2.$$
[Hint: Consider the function $f(a) := \int_0^1 \frac{x^a - 1}{\log x}\,dx$.]
4. Let
$$\varphi(\alpha) = \int_{u_1}^{u_2} f(x, \alpha)\,dx, \qquad a \le \alpha \le b,$$
where $u_1$ and $u_2$ may depend on the parameter $\alpha$. Prove that
$$\varphi'(\alpha) = \int_{u_1}^{u_2} \frac{\partial f}{\partial \alpha}\,dx + f(u_2, \alpha)\frac{du_2}{d\alpha} - f(u_1, \alpha)\frac{du_1}{d\alpha}.$$
This is sometimes called Leibniz's rule for differentiating under the integral sign.
5. Let
$$\varphi(a) = \int_a^{a^2} \frac{\sin ax}{x}\,dx, \qquad a \ne 0.$$
Show that
$$\varphi'(a) = \frac{3\sin a^3 - 2\sin a^2}{a}.$$
6. Prove that, for $t > 0$,
$$\int_{-\infty}^{\infty} e^{-tx^2/2}\,dx = \frac{\sqrt{2\pi}}{\sqrt{t}}.$$
Differentiating with respect to $t$ and setting $t = 1$, deduce that for even $n$,
$$\int_{-\infty}^{\infty} x^n e^{-x^2/2}\,dx = 1 \cdot 3 \cdot 5 \cdots (n - 1)\sqrt{2\pi}.$$
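For Exercise 5, a finite-difference check of the stated formula for $\varphi'(a)$ can catch sign errors before one attempts a proof. The Python sketch below is illustrative only. It evaluates $\varphi$ by Simpson's rule and compares a central difference quotient with $(3\sin a^3 - 2\sin a^2)/a$. The step sizes, the sample values of $a$, and the helper names are our own choices.

```python
import math


def simpson(g, a, b, n=400):
    """Composite Simpson's rule for int_a^b g(x) dx (n even); b < a is allowed."""
    h = (b - a) / n
    s = g(a) + g(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * g(a + k * h)
    return s * h / 3


def phi(a):
    """phi(a) = int_a^{a^2} sin(a x) / x dx, for a > 0."""
    return simpson(lambda x: math.sin(a * x) / x, a, a * a)


def phi_prime_claimed(a):
    """The formula for phi'(a) stated in Exercise 5."""
    return (3 * math.sin(a ** 3) - 2 * math.sin(a ** 2)) / a


def central_difference(func, a, step=1e-5):
    return (func(a + step) - func(a - step)) / (2 * step)


for a in (0.7, 1.1, 1.3):
    print(a, central_difference(phi, a), phi_prime_claimed(a))
```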

3.4 Further Examples of Fourier Transforms

The Fourier transform can be extended to $L^1(\mathbb{R}^n)$ by setting
$$\hat{f}(t) = \int_{\mathbb{R}^n} f(x)e^{-i(x,t)}\,d\mu(x),$$
where now $d\mu(x) = (2\pi)^{-n/2}\,dx$ is the product measure on $\mathbb{R}^n$ and $(x, t)$ is the dot product of the vectors $x, t \in \mathbb{R}^n$.
It can be shown that if
$$f(x) = e^{-|x|^2/2}$$
with $x = (x_1, \ldots, x_n)$ and $|x|^2 = x_1^2 + \cdots + x_n^2$, then
$$\hat{f}(t) = e^{-|t|^2/2}.$$
This was Exercise 5 in the penultimate section.


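As another explicit example of the Fourier transform, we consider
$$H(x) = e^{-|x|}.$$
Let $H_\lambda(x) = H(\lambda x)$. We want to compute the Fourier transform of $H_\lambda$. Set
$$h_\lambda(t) = \int_{-\infty}^{\infty} H_\lambda(x)e^{-ixt}\,d\mu(x).$$
This is
$$\int_{-\infty}^{\infty} e^{-|\lambda||x|}e^{-ixt}\,d\mu(x) = \int_{-\infty}^{0} e^{-|\lambda||x|}e^{-ixt}\,d\mu(x) + \int_0^{\infty} e^{-|\lambda||x|}e^{-ixt}\,d\mu(x).$$
Let us note that
$$\int_0^{\infty} e^{-|\lambda|x - ixt}\,d\mu(x) = \int_0^{\infty} e^{-x(|\lambda| + it)}\,d\mu(x) = \left[ -\frac{e^{-x(|\lambda| + it)}}{|\lambda| + it} \right]_{x=0}^{x=\infty} \cdot \frac{1}{\sqrt{2\pi}} = \frac{1}{\sqrt{2\pi}\,(|\lambda| + it)}.$$
Also,
$$\int_{-\infty}^{0} e^{-|\lambda||x|}e^{-ixt}\,d\mu(x) = -\int_{\infty}^{0} e^{-|\lambda||x| + ixt}\,d\mu(x)$$
upon changing $x$ to $-x$. This is equal to
$$\int_0^{\infty} e^{-|\lambda|x + ixt}\,d\mu(x) = \int_0^{\infty} e^{-x(|\lambda| - it)}\,d\mu(x) = \frac{1}{\sqrt{2\pi}\,(|\lambda| - it)}.$$
Putting the computations together, we get
$$\int_{-\infty}^{\infty} H(\lambda x)e^{-ixt}\,d\mu(x) = \frac{1}{\sqrt{2\pi}}\left( \frac{1}{|\lambda| + it} + \frac{1}{|\lambda| - it} \right) = \frac{2|\lambda|}{\sqrt{2\pi}\,(|\lambda|^2 + t^2)}.$$
In other words,
$$h_\lambda(t) = \sqrt{\frac{2}{\pi}}\,\frac{|\lambda|}{\lambda^2 + t^2}.$$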

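Let us observe that if $\lambda > 0$,
$$\int_{-\infty}^{\infty} h_\lambda(x)\,d\mu(x) = \int_{-\infty}^{\infty} \sqrt{\frac{2}{\pi}}\,\frac{\lambda}{\lambda^2 + x^2}\,d\mu(x) = 2\sqrt{\frac{2}{\pi}}\int_0^{\infty} \frac{\lambda\,d\mu(x)}{\lambda^2 + x^2}.$$
Changing $x$ to $\lambda x$ gives
$$\int_{-\infty}^{\infty} h_\lambda(x)\,d\mu(x) = 2\sqrt{\frac{2}{\pi}}\int_0^{\infty} \frac{1}{1 + x^2}\,\frac{dx}{\sqrt{2\pi}} = \frac{2}{\pi}\int_0^{\infty} \frac{dx}{1 + x^2} = \frac{2}{\pi}\Big[ \arctan x \Big]_0^{\infty} = 1.$$
We also note for future reference that if $H(x) = e^{-|x|}$, then
$$0 < H(x) \le 1,$$
and $H(\lambda x) \to 1$ as $\lambda \to 0^+$.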
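As a check on the computation of $h_\lambda$, the following Python sketch (an illustration, not part of the argument) evaluates $\int e^{-|\lambda x|}e^{-ixt}\,d\mu(x)$ numerically and compares it with $\sqrt{2/\pi}\,|\lambda|/(\lambda^2 + t^2)$. It splits the range at the kink $x = 0$ and truncates where $e^{-|\lambda||x|} < e^{-40}$. The truncation, the step count, and the helper names are our own assumptions.

```python
import cmath
import math


def simpson(g, a, b, n):
    """Composite Simpson's rule for int_a^b g(x) dx with n (even) subintervals."""
    h = (b - a) / n
    s = g(a) + g(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * g(a + k * h)
    return s * h / 3


def h_numeric(lam, t, n=8000):
    """int e^{-|lam x|} e^{-ixt} dmu(x), split at the kink x = 0 and truncated at |x| = 40/|lam|."""
    cutoff = 40.0 / abs(lam)
    integrand = lambda x: math.exp(-abs(lam * x)) * cmath.exp(-1j * x * t)
    total = simpson(integrand, -cutoff, 0.0, n) + simpson(integrand, 0.0, cutoff, n)
    return total / math.sqrt(2 * math.pi)


def h_closed(lam, t):
    """h_lambda(t) = sqrt(2/pi) |lam| / (lam^2 + t^2)."""
    return math.sqrt(2 / math.pi) * abs(lam) / (lam * lam + t * t)


for lam, t in ((0.5, 0.0), (0.5, 3.0), (2.0, 1.0), (-1.5, 1.0)):
    print(lam, t, h_numeric(lam, t).real, h_closed(lam, t))
```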

Exercises

1. Suppose $f \in L^1(\mathbb{R})$ has a Fourier transform. For any $t_1, \ldots, t_n \in \mathbb{R}$ and $z_1, z_2, \ldots, z_n \in \mathbb{C}$, prove that
$$\sum_{j=1}^{n}\sum_{k=1}^{n} f(t_j - t_k)\,z_j\,\overline{z_k} \ge 0.$$
2. Calculate the Fourier transform of $x^n e^{-x^2}$.
3. Prove that
$$\int_0^{\infty} e^{-x^2/2}\cos(tx)\,dx = \sqrt{\frac{\pi}{2}}\,e^{-t^2/2}.$$

3.5 A Convolution Theorem

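Proposition 3.5.1 Let $f \in L^1(\mathbb{R})$. Then
$$(f * h_\lambda)(x) = \int_{-\infty}^{\infty} H(\lambda t)\hat{f}(t)e^{ixt}\,d\mu(t).$$
Proof We have
$$\begin{aligned} (f * h_\lambda)(x) &= \int_{-\infty}^{\infty} f(x - y)h_\lambda(y)\,d\mu(y) \\ &= \int_{-\infty}^{\infty} f(x - y)\left( \int_{-\infty}^{\infty} H_\lambda(t)e^{-ity}\,d\mu(t) \right) d\mu(y) \\ &= \int_{-\infty}^{\infty} H(\lambda t)\left( \int_{-\infty}^{\infty} f(x - y)e^{-ity}\,d\mu(y) \right) d\mu(t) \end{aligned}$$
by an application of Fubini's theorem (the integrand is dominated by $|f(x - y)|H(\lambda t)$, which is integrable on $\mathbb{R}^2$). We change variables in the inner integral by putting $u = x - y$, so that our integral becomes
$$\int_{-\infty}^{\infty} H(\lambda t)e^{-itx}\left( \int_{-\infty}^{\infty} f(u)e^{itu}\,d\mu(u) \right) d\mu(t) = \int_{-\infty}^{\infty} H(\lambda t)e^{-itx}\hat{f}(-t)\,d\mu(t).$$
Changing $t$ to $-t$ and using $H(-\lambda t) = H(\lambda t)$ gives us the final result. □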

Exercises

1. Solve for $f$:
$$\int_{-\infty}^{\infty} f(x - t)e^{-|t|}\,dt = \frac{4}{3}e^{-|x|} - \frac{2}{3}e^{-2|x|}.$$
2. For $t > 1$, define
$$F(t) = \int_2^{\infty} \frac{dx}{x^t\log x}.$$
Show that
$$F(t) + \log(t - 1) = O(1)$$
as $t \to 1^+$.
3. For each natural number $n$, let $g_n$ be the characteristic function of $[-n, n]$ and let $h$ be the characteristic function of $[-1, 1]$. Compute the convolution $g_n * h$ explicitly. (The graph is piecewise linear.)
4. With notation as in the previous exercise, show that $g_n * h$ is the Fourier transform of a function $f_n \in L^1(\mathbb{R})$, where (apart from a multiplicative constant)
$$f_n(x) = \frac{(\sin x)(\sin nx)}{x^2}.$$
5. With $f_n$ as in the previous exercise, show that $\|f_n\|_1 \to \infty$ as $n \to \infty$. Conclude that the map $f \mapsto \hat{f}$ maps $L^1(\mathbb{R})$ into a proper subset of $C_0$, the space of continuous functions that vanish at infinity.

3.6 The Inversion Theorem

The integrand on the right-hand side of the equation in Proposition 3.5.1 is bounded by $|\hat{f}(t)|$ since $|H(x)| \le 1$. Moreover, as $\lambda \to 0^+$, $H(\lambda t) \to 1$, and so by the Lebesgue dominated convergence theorem,
$$\lim_{\lambda\to 0^+}\int_{-\infty}^{\infty} H(\lambda t)\hat{f}(t)e^{ixt}\,d\mu(t) = \int_{-\infty}^{\infty} \hat{f}(t)e^{ixt}\,d\mu(t),$$
provided $\hat{f} \in L^1(\mathbb{R})$. What we would like to prove now is that the right-hand side of the above limit is in fact equal to $f(x)$ almost everywhere. To this end, we need a few technical results, which we collect below.
We give two proofs of the inversion theorem. The first proof closely follows Rudin [2]. The second proof is shorter and makes essential use of the Lebesgue dominated convergence theorem, Fubini's theorem, and the fact that $e^{-x^2/2}$ is its own Fourier transform. Though the first proof is longer, it has the merit of invoking related results of independent interest. The second proof, of course, is crisp and succinct.
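Proposition 3.6.1 For any function $f$ on $\mathbb{R}$ and any $y \in \mathbb{R}$, let
$$f_y(x) := f(x - y).$$
If $1 \le p < \infty$ and $f \in L^p(\mathbb{R})$, then the map $y \mapsto f_y$ is a uniformly continuous map from $\mathbb{R}$ to $L^p(\mathbb{R})$.
Proof Fix $\varepsilon > 0$. Since continuous functions with compact support are dense in $L^p(\mathbb{R})$, we can find a continuous function $g$ with support in $[-A, A]$ such that
$$\|f - g\|_p < \varepsilon.$$
Since a continuous function with compact support is uniformly continuous, there exists $\delta$ with $0 < \delta < A$ satisfying
$$|g(s) - g(t)| < \frac{\varepsilon}{(3A)^{1/p}} \quad \text{when } |s - t| < \delta.$$
The translate $g_s$ of $g$ has support $[-A + s, A + s]$, and similarly for $g_t$. Hence the integrand of
$$\|g_s - g_t\|_p^p = \int_{-\infty}^{\infty} |g(x - s) - g(x - t)|^p\,d\mu(x)$$
has support in $I = [\min(-A + t, -A + s), \max(A + t, A + s)]$. Thus, when $|s - t| < \delta$,
$$\|g_s - g_t\|_p^p \le (3A)^{-1}\varepsilon^p\mu(I) \le \frac{\varepsilon^p(2A + |s - t|)}{3A} \le \frac{\varepsilon^p(2A + \delta)}{3A} \le \varepsilon^p.$$
Therefore, when $|s - t| < \delta$, $\|g_s - g_t\|_p \le \varepsilon$. Finally, by the translation invariance of the Lebesgue measure,
$$\|f_s - f_t\|_p \le \|f_s - g_s\|_p + \|g_s - g_t\|_p + \|g_t - f_t\|_p \le \|f - g\|_p + \|g_s - g_t\|_p + \|f - g\|_p \le 2\varepsilon + \varepsilon = 3\varepsilon.$$
This completes the proof. □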


Proposition 3.6.2 If $g \in L^\infty(\mathbb{R})$ is continuous at $x$, then
$$\lim_{\lambda\to 0^+}(g * h_\lambda)(x) = g(x).$$
Proof Observe that $h_\lambda(y) = \lambda^{-1}h_1(y/\lambda)$. Noting
$$\int_{-\infty}^{\infty} h_\lambda(y)\,d\mu(y) = 1,$$
we have
$$\begin{aligned} (g * h_\lambda)(x) - g(x) &= \int_{-\infty}^{\infty} [g(x - y) - g(x)]\lambda^{-1}h_1(y/\lambda)\,d\mu(y) \\ &= \int_{-\infty}^{\infty} [g(x - \lambda s) - g(x)]h_1(s)\,d\mu(s), \end{aligned}$$
where we have changed variables $y = \lambda s$ in the last step. Since $g \in L^\infty(\mathbb{R})$, the integrand is dominated by the integrable function $2\|g\|_\infty h_1(s)$, and we can apply the Lebesgue dominated convergence theorem with $\lambda \to 0^+$. The integrand tends to zero as $g$ is continuous at $x$. □

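Proposition 3.6.3 For $1 \le p < \infty$, let $f \in L^p(\mathbb{R})$. Then
$$\lim_{\lambda\to 0^+}\|f * h_\lambda - f\|_p = 0.$$
Proof It is clear that $h_\lambda \in L^q(\mathbb{R})$ for all $q \ge 1$ because
$$h_\lambda(t) = \sqrt{\frac{2}{\pi}}\,\frac{|\lambda|}{\lambda^2 + t^2}.$$
Choose $q$ such that
$$\frac{1}{p} + \frac{1}{q} = 1.$$
Now,
$$\begin{aligned} (f * h_\lambda)(x) - f(x) &= \int_{-\infty}^{\infty} f(x - y)h_\lambda(y)\,d\mu(y) - \int_{-\infty}^{\infty} f(x)h_\lambda(y)\,d\mu(y) \\ &= \int_{-\infty}^{\infty} (f(x - y) - f(x))h_\lambda(y)\,d\mu(y) \\ &= \int_{-\infty}^{\infty} (f(x - y) - f(x))h_\lambda(y)^{1/p}h_\lambda(y)^{1/q}\,d\mu(y). \end{aligned}$$
Applying Hölder's inequality,
$$\begin{aligned} |(f * h_\lambda)(x) - f(x)| &\le \left( \int_{-\infty}^{\infty} |f(x - y) - f(x)|^p h_\lambda(y)\,d\mu(y) \right)^{1/p}\left( \int_{-\infty}^{\infty} h_\lambda(y)\,d\mu(y) \right)^{1/q} \\ &= \left( \int_{-\infty}^{\infty} |f(x - y) - f(x)|^p h_\lambda(y)\,d\mu(y) \right)^{1/p} \end{aligned}$$
because
$$\int_{-\infty}^{\infty} h_\lambda(y)\,d\mu(y) = 1.$$
(When $p = 1$, this bound holds directly, without Hölder's inequality.) Taking $p$-th powers and integrating over $x$, we get
$$\|f * h_\lambda - f\|_p^p \le \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} |f(x - y) - f(x)|^p h_\lambda(y)\,d\mu(y)\,d\mu(x).$$
Letting $g(y) = \|f_y - f\|_p^p$, we have by Fubini's theorem that the double integral is equal to
$$\int_{-\infty}^{\infty} g(y)h_\lambda(y)\,d\mu(y).$$
By Proposition 3.6.1, the map $y \mapsto f_y$ is a uniformly continuous map from $\mathbb{R}$ to $L^p(\mathbb{R})$. Thus $g$ is a continuous function. Moreover,
$$|g(y)|^{1/p} \le \|f_y\|_p + \|f\|_p = 2\|f\|_p,$$
so that $g$ is bounded. The previous proposition can therefore be applied to $g$. That is,
$$\lim_{\lambda\to 0^+}(g * h_\lambda)(x) = g(x).$$
Applying this at $x = 0$ and using the fact that $h_\lambda$ is even, we get
$$\lim_{\lambda\to 0^+}\int_{-\infty}^{\infty} g(y)h_\lambda(y)\,d\mu(y) = \lim_{\lambda\to 0^+}\int_{-\infty}^{\infty} g(-y)h_\lambda(y)\,d\mu(y) = g(0) = 0.$$
This completes the proof. □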

Proposition 3.6.4 Let $1 \le p \le \infty$. If $f_n \in L^p(\mathbb{R})$ is a Cauchy sequence converging to $f \in L^p(\mathbb{R})$, then there is a subsequence $f_{n_k}$ that converges to $f$ almost everywhere.

Proof Exercise.
We are now ready to prove the inversion theorem.

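Theorem 3.5 Let $f \in L^1(\mathbb{R})$ and suppose $\hat{f} \in L^1(\mathbb{R})$. Then
$$f(x) = \int_{-\infty}^{\infty} \hat{f}(t)e^{ixt}\,d\mu(t) \quad \text{a.e.}$$
Proof By Proposition 3.5.1, we have
$$(f * h_\lambda)(x) = \int_{-\infty}^{\infty} H(\lambda t)\hat{f}(t)e^{ixt}\,d\mu(t),$$
and by the dominated convergence theorem,
$$\lim_{\lambda\to 0^+}(f * h_\lambda)(x) = \int_{-\infty}^{\infty} \hat{f}(t)e^{ixt}\,d\mu(t) = g(x) \text{ (say)}$$
for every $x$, because $\hat{f} \in L^1(\mathbb{R})$. But Proposition 3.6.3 says
$$\lim_{\lambda\to 0^+}\|f * h_\lambda - f\|_1 = 0.$$
Applying Proposition 3.6.4 to the sequence $f * h_{1/n}$, we obtain a sequence $\lambda_n \to 0$ such that
$$\lim_{n\to\infty}(f * h_{\lambda_n})(x) = f(x) \quad \text{a.e.}$$
But the left-hand side is $g(x)$. Thus $g(x) = f(x)$ a.e. This completes the proof. □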

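Here is the promised short proof of the inversion theorem. For $\varepsilon > 0$, let $g_\varepsilon(x) = e^{-\varepsilon^2x^2/2}$. Consider the absolutely convergent double integral
$$I_\varepsilon(x) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(u)e^{iw(x - u)}e^{-\varepsilon^2w^2/2}\,d\mu(u)\,d\mu(w). \qquad (3.2)$$
By Fubini's theorem, we can evaluate the inner integral and deduce that
$$I_\varepsilon(x) = \int_{-\infty}^{\infty} \hat{f}(w)e^{iwx}e^{-\varepsilon^2w^2/2}\,d\mu(w).$$
Since $\hat{f} \in L^1(\mathbb{R})$, we can apply the dominated convergence theorem to deduce that
$$\lim_{\varepsilon\to 0}I_\varepsilon(x) = \int_{-\infty}^{\infty} \hat{f}(w)e^{iwx}\,d\mu(w).$$
On the other hand, interchanging the order of integration in (3.2), we have
$$I_\varepsilon(x) = \int_{-\infty}^{\infty} f(u)\,\hat{g}_\varepsilon(x - u)\,d\mu(u). \qquad (3.3)$$
A straightforward computation using (5) of Theorem 3.3 shows that
$$\hat{g}_\varepsilon(x) = \frac{1}{\varepsilon}e^{-x^2/(2\varepsilon^2)}.$$
Changing variables by setting $u = x + \varepsilon v$ in (3.3), we have
$$I_\varepsilon(x) = \int_{-\infty}^{\infty} f(x + \varepsilon v)e^{-v^2/2}\,d\mu(v).$$
If $f$ is bounded and continuous, letting $\varepsilon \to 0$ and applying the dominated convergence theorem again shows that the limit is $f(x)$, since the probability integral $\int_{-\infty}^{\infty} e^{-v^2/2}\,d\mu(v)$ equals 1. For a general $f \in L^1(\mathbb{R})$, the same identity gives
$$\|I_\varepsilon - f\|_1 \le \int_{-\infty}^{\infty} \|f_{-\varepsilon v} - f\|_1\,e^{-v^2/2}\,d\mu(v),$$
which tends to 0 by Proposition 3.6.1 and the dominated convergence theorem; by Proposition 3.6.4, a sequence $I_{\varepsilon_n}$ with $\varepsilon_n \to 0$ then converges to $f$ almost everywhere. This completes the proof. □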

Exercises

1. Prove Proposition 3.6.4.
2. For $t > 0$, show that
$$\int_0^t \frac{\log(1 + tx)}{1 + x^2}\,dx = \frac{1}{2}(\arctan t)\log(1 + t^2).$$
Deduce that
$$\int_0^1 \frac{\log(1 + x)}{1 + x^2}\,dx = \frac{\pi\log 2}{8}.$$
[Hint: differentiate under the integral sign.]

3.7 Further Properties of the Fourier Transform

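Recall that $C_0$ is the space of continuous functions decaying to zero at infinity. We have
Theorem 3.6 (Riemann–Lebesgue Lemma) Let $f \in L^1(\mathbb{R})$. Then $\hat{f} \in C_0$. That is, $\hat{f}$ is continuous and
$$\lim_{|t|\to\infty}\hat{f}(t) = 0.$$
Proof Let $t_n \to t$. Then
$$\hat{f}(t_n) - \hat{f}(t) = \int_{-\infty}^{\infty} f(x)\left( e^{-it_nx} - e^{-itx} \right) d\mu(x),$$
so that
$$|\hat{f}(t_n) - \hat{f}(t)| \le \int_{-\infty}^{\infty} |f(x)|\,|e^{-it_nx} - e^{-itx}|\,d\mu(x).$$
The integrand is dominated by $2|f(x)|$ and tends to 0 pointwise, so by the dominated convergence theorem we have
$$\hat{f}(t_n) \to \hat{f}(t),$$
so that $\hat{f}$ is a continuous function.
Moreover, for $t \ne 0$, using $e^{i\pi} = -1$, we get
$$\begin{aligned} \hat{f}(t) &= \int_{-\infty}^{\infty} f(x)e^{-itx}\,d\mu(x) = -\int_{-\infty}^{\infty} f(x)e^{-it(x + \pi/t)}\,d\mu(x) \\ &= -\int_{-\infty}^{\infty} f(u - \pi/t)e^{-itu}\,d\mu(u) \end{aligned}$$
by a simple change of variable. Thus,
$$2\hat{f}(t) = \int_{-\infty}^{\infty} [f(u) - f(u - \pi/t)]e^{-itu}\,d\mu(u),$$
so that
$$|2\hat{f}(t)| \le \|f - f_{\pi/t}\|_1 \to 0$$
as $|t| \to \infty$, by an application of Proposition 3.6.1. □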
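The key inequality $|2\hat{f}(t)| \le \|f - f_{\pi/t}\|_1$ in this proof can be seen at work on a concrete example. The Python sketch below is our own illustration. It takes $f(x) = e^{-|x|}$, for which $\hat{f} = h_1$ by Section 3.4, and computes $\|f - f_s\|_1$ with respect to $d\mu$ by Simpson's rule on pieces where the integrand is smooth. An elementary computation gives the closed form $\|f - f_s\|_1 = 4(1 - e^{-|s|/2})/\sqrt{2\pi}$, which serves as a cross-check. The helper names and numerical parameters are hypothetical choices.

```python
import math

SQRT_2PI = math.sqrt(2 * math.pi)


def simpson(g, a, b, n=2000):
    """Composite Simpson's rule for int_a^b g(x) dx with n (even) subintervals."""
    h = (b - a) / n
    s = g(a) + g(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * g(a + k * h)
    return s * h / 3


def f(x):
    return math.exp(-abs(x))


def f_hat(t):
    """Fourier transform of e^{-|x|} with respect to dmu (h_1 in Section 3.4)."""
    return math.sqrt(2 / math.pi) / (1 + t * t)


def translation_distance(s, cutoff=40.0):
    """||f - f_s||_1 with respect to dmu for s > 0, where f_s(x) = f(x - s).
    The integrand is smooth between the break points 0, s/2 and s."""
    g = lambda x: abs(f(x) - f(x - s))
    points = [-cutoff, 0.0, s / 2, s, s + cutoff]
    return sum(simpson(g, p, q) for p, q in zip(points, points[1:])) / SQRT_2PI


for t in (1.0, 5.0, 25.0, 125.0):
    # Both columns tend to 0, and the first never exceeds the second.
    print(t, 2 * abs(f_hat(t)), translation_distance(math.pi / t))
```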

Exercises

1. Let $f \in C[a, b]$. Show that
$$\int_a^b f(x)\sin(tx)\,dx \to 0 \quad \text{and} \quad \int_a^b f(x)\cos(tx)\,dx \to 0$$
as $t \to \pm\infty$.
2. For any smooth function $f$ with compact support, show that
$$\lim_{t\to\infty}\int_{-\infty}^{\infty} f(x)\,\frac{\sin tx}{x}\,dx = \pi f(0).$$

3.8 The Plancherel Theorem

The main weakness of the inversion theorem is that $f \in L^1(\mathbb{R})$ does not guarantee $\hat{f} \in L^1(\mathbb{R})$. It turns out that by considering functions $f \in L^1(\mathbb{R}) \cap L^2(\mathbb{R})$, we can guarantee $\hat{f} \in L^2(\mathbb{R})$. We will show that $L^1(\mathbb{R}) \cap L^2(\mathbb{R})$ is dense in $L^2(\mathbb{R})$. Using this fact, we can extend the concept of the Fourier transform to all functions $f \in L^2(\mathbb{R})$. Moreover, we will see that for $f \in L^2(\mathbb{R})$, we have $\|f\|_2 = \|\hat{f}\|_2$, so that the Fourier transform is an isometry on $L^2(\mathbb{R})$.

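Theorem 3.7 (The Plancherel theorem) To each $f \in L^2(\mathbb{R})$ we can associate $\hat{f} \in L^2(\mathbb{R})$ (which is sometimes called the Plancherel transform of $f$) such that
(1) if $f \in L^1(\mathbb{R}) \cap L^2(\mathbb{R})$, then
$$\hat{f}(x) = \int_{-\infty}^{\infty} f(t)e^{-ixt}\,d\mu(t)$$
(that is, the Plancherel transform extends the Fourier transform);
(2) $\|f\|_2 = \|\hat{f}\|_2$ (that is, the Plancherel transform is an isometry);
(3) the map $f \mapsto \hat{f}$ is a Hilbert space isomorphism of $L^2(\mathbb{R})$ onto $L^2(\mathbb{R})$;
(4) if $\varphi_A(t) = \int_{-A}^{A} f(x)e^{-ixt}\,d\mu(x)$ and $\psi_A(x) = \int_{-A}^{A} \hat{f}(t)e^{ixt}\,d\mu(t)$, then
$$\|\varphi_A - \hat{f}\|_2 \to 0 \quad \text{and} \quad \|\psi_A - f\|_2 \to 0 \quad \text{as } A \to \infty.$$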

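Proof Fix $f \in L^2(\mathbb{R})$. Suppose first that $f \in L^1(\mathbb{R})$. Define
$$\tilde{f}(x) := \overline{f(-x)}$$
and
$$g = f * \tilde{f} \in L^1(\mathbb{R}).$$
Thus,
$$g(x) = \int_{-\infty}^{\infty} f(x - y)\overline{f(-y)}\,d\mu(y) = (f_{-x}, f).$$
From the fact that the map $x \mapsto f_x$ is a uniformly continuous map from $\mathbb{R}$ to $L^p(\mathbb{R})$ (for any $1 \le p < \infty$), and the fact that the inner product is a continuous map, we see that $g$ is continuous. Moreover, by the Cauchy–Schwarz inequality,
$$|g(x)| \le \|f_{-x}\|_2\,\|f\|_2 = \|f\|_2^2.$$
In other words, $g$ is bounded. We can therefore apply Proposition 3.6.2 to deduce
$$\lim_{\lambda\to 0^+}(g * h_\lambda)(0) = g(0) = (f, f) = \|f\|_2^2.$$
On the other hand, by (3) and (4) of Theorem 3.3, the definition of $g$ implies that
$$\hat{g} = \hat{f}\cdot\hat{\tilde{f}} = \hat{f}\,\overline{\hat{f}} = |\hat{f}|^2 \ge 0,$$
so that $H(\lambda t)\hat{g}(t)$ is non-negative and increases monotonically as $\lambda \to 0^+$. By Proposition 3.5.1 and the monotone convergence theorem,
$$\|f\|_2^2 = \lim_{\lambda\to 0^+}(g * h_\lambda)(0) = \lim_{\lambda\to 0^+}\int_{-\infty}^{\infty} H(\lambda t)\hat{g}(t)\,d\mu(t) = \int_{-\infty}^{\infty} |\hat{f}(t)|^2\,d\mu(t) = \|\hat{f}\|_2^2.$$
Thus, $\hat{f} \in L^2(\mathbb{R})$.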
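Next, we need to show that
$$Y = \{\hat{f} : f \in L^1(\mathbb{R}) \cap L^2(\mathbb{R})\}$$
is dense in $L^2(\mathbb{R})$. We just proved that $Y \subseteq L^2(\mathbb{R})$. Since the closure $\overline{Y}$ is a closed subspace of the Hilbert space $L^2(\mathbb{R})$, we can decompose
$$L^2(\mathbb{R}) = \overline{Y} \oplus Y^\perp.$$
It suffices to show that $Y^\perp = \{0\}$. Observe that the function
$$x \mapsto e^{i\alpha x}H(\lambda x)$$
lies in $L^1(\mathbb{R}) \cap L^2(\mathbb{R})$ for all $\alpha \in \mathbb{R}$ and $\lambda > 0$. Thus the Fourier transforms of these functions, namely
$$t \mapsto h_\lambda(\alpha - t) = \int_{-\infty}^{\infty} e^{i(\alpha - t)x}H(\lambda x)\,d\mu(x),$$
belong to $Y$. If $w \in Y^\perp$, then $(v, w) = 0$ for all $v \in Y$. In particular,
$$0 = \int_{-\infty}^{\infty} h_\lambda(\alpha - t)\overline{w(t)}\,d\mu(t) = (h_\lambda * \overline{w})(\alpha)$$
for every $\alpha$. Recall (Proposition 3.6.3) that for $f \in L^p(\mathbb{R})$, $\lim_{\lambda\to 0^+}\|f * h_\lambda - f\|_p = 0$. Applied to $\overline{w}$ with $p = 2$, this gives $\overline{w} = 0$, that is, $w = 0$ a.e. Therefore, $Y$ is dense in $L^2(\mathbb{R})$.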
Let us temporarily denote $\hat{f}$ by $\Phi(f)$. By our discussion above, $\Phi$ is an $L^2$-isometry from one dense subspace of $L^2$, namely $L^1 \cap L^2$, onto another, namely $Y$. Elementary Cauchy sequence arguments now imply that $\Phi$ extends to an isometry $\Phi$ of $L^2$ onto $L^2$. This suffices to prove (1) and (2). The Hilbert space isomorphism follows from Parseval's formula:
$$(f, g) = (\hat{f}, \hat{g}).$$
To prove this, we use the following identity:
$$4(x, y) = \|x + y\|^2 - \|x - y\|^2 + i\|x + iy\|^2 - i\|x - iy\|^2,$$
which expresses the inner product in terms of norms.
Finally, to prove (4), let $k_A$ be the characteristic function of the interval $[-A, A]$. Then $k_A f \in L^1(\mathbb{R}) \cap L^2(\mathbb{R})$ and $\varphi_A = \widehat{k_A f}$. But $\|f - k_A f\|_2 \to 0$ as $A \to \infty$, and so, taking Fourier transforms and using the isometry property, we get $\|\hat{f} - \varphi_A\|_2 \to 0$. The other limit is proved similarly. □
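Plancherel's theorem and Parseval's formula can be illustrated numerically with two transforms computed earlier in this chapter. The first is $e^{-|x|} \mapsto h_1(t) = \sqrt{2/\pi}/(1 + t^2)$ (Section 3.4), and the second is $e^{-x^2/2} \mapsto e^{-t^2/2}$ (Example 3.1). The Python sketch below is an illustration only; the truncation points and step counts are our choices. It compares $\|f\|_2^2$ with $\|\hat{f}\|_2^2$, and $(f, g)$ with $(\hat{f}, \hat{g})$. All functions involved are real, so no complex conjugation is needed.

```python
import math

SQRT_2PI = math.sqrt(2 * math.pi)


def simpson(g, a, b, n):
    """Composite Simpson's rule for int_a^b g(x) dx with n (even) subintervals."""
    h = (b - a) / n
    s = g(a) + g(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * g(a + k * h)
    return s * h / 3


def integrate_dmu(g, cutoff, n):
    """int_{-cutoff}^{cutoff} g dmu, split at 0 where e^{-|x|} has a kink."""
    return (simpson(g, -cutoff, 0.0, n) + simpson(g, 0.0, cutoff, n)) / SQRT_2PI


f = lambda x: math.exp(-abs(x))                          # H(x) of Section 3.4
f_hat = lambda t: math.sqrt(2 / math.pi) / (1 + t * t)  # h_1(t)
g = lambda x: math.exp(-x * x / 2)                       # Example 3.1, so g_hat = g

norm_f_sq = integrate_dmu(lambda x: f(x) ** 2, 40.0, 8000)
norm_f_hat_sq = integrate_dmu(lambda t: f_hat(t) ** 2, 400.0, 40_000)
inner = integrate_dmu(lambda x: f(x) * g(x), 40.0, 8000)
inner_hat = integrate_dmu(lambda t: f_hat(t) * g(t), 40.0, 8000)
print(norm_f_sq, norm_f_hat_sq, 1 / SQRT_2PI)  # both norms equal 1/sqrt(2 pi)
print(inner, inner_hat)
```

The slow $1/t^4$ decay of $|h_1(t)|^2$ is why the frequency-side integral uses a much longer interval.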

Exercises

1. Prove that
$$\int_{-\infty}^{\infty} \frac{e^{itx}\,dt}{1 + t^2} = \pi e^{-|x|}.$$
2. Suppose $f, g \in L^1(\mathbb{R}) \cap L^2(\mathbb{R})$ are such that $\hat{f}(x) = \hat{g}(x)$ a.e. Show that $f(x) = g(x)$ a.e.
3. Extend Plancherel's theorem to functions $f \in L^2(\mathbb{R}^k)$.

3.9 The Uncertainty Principle

Recall the following fundamental inequality which is ubiquitous in mathematics. We


have already seen this in different settings but now we state it in the context of an
inner product space. We also need to highlight the case of equality in the inequality.

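Theorem 3.8 (The Cauchy–Schwarz inequality) If $V$ is an inner product space (over $\mathbb{R}$ or $\mathbb{C}$), then
$$|(v, w)| \le \|v\|\,\|w\|$$
for all $v, w \in V$, with equality if and only if $v$ and $w$ are linearly dependent.
Proof If $w = 0$ there is nothing to prove, so assume $w \ne 0$. Decomposing $v$ into its components parallel and perpendicular to $w$, we can write
$$v = \lambda w + (v - \lambda w)$$
for some scalar $\lambda$. For $v - \lambda w$ to be perpendicular to $w$, we need
$$0 = (v - \lambda w, w) = (v, w) - \lambda\|w\|^2.$$
That is,
$$\lambda = \frac{(v, w)}{\|w\|^2}.$$
Now, $(v - \lambda w, v - \lambda w) \ge 0$ for any $\lambda$, so that
$$\|v\|^2 - \overline{\lambda}(v, w) - \lambda(w, v) + |\lambda|^2\|w\|^2 \ge 0.$$
With our choice of $\lambda$ in particular, we deduce
$$\|v\|^2 - \frac{|(v, w)|^2}{\|w\|^2} - \frac{|(v, w)|^2}{\|w\|^2} + \frac{|(v, w)|^2}{\|w\|^4}\|w\|^2 \ge 0.$$
In other words,
$$|(v, w)| \le \|v\|\,\|w\|.$$
Equality holds if and only if $(v - \lambda w, v - \lambda w) = 0$, that is, if and only if $v = \lambda w$. □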

The celebrated Heisenberg uncertainty principle, which is a cornerstone of quantum mechanics, is an immediate consequence of the Cauchy–Schwarz inequality combined with basic Fourier theory, once we understand the dictionary that translates the concepts of physics into mathematical language. When it was discovered by Werner Heisenberg (1901–1976), it created a sensation because it demonstrated an inherent obstruction to observations in physics. Heisenberg called his new theory “matrix mechanics.” A parallel theory was discovered by Erwin Schrödinger (1887–1961) at about the same time. Schrödinger called his theory “wave mechanics,” and we now know that the two theories are mathematically equivalent. Indeed, from a mathematical standpoint, Heisenberg's uncertainty principle states that a function $F$ and its Fourier transform $\hat{F}$ cannot both be sharply concentrated; in its crudest form, $F$ and $\hat{F}$ cannot both have compact support. To make this precise, it is convenient to introduce the following class of functions.

Definition 3.2 We define the Schwartz space² $\mathcal{S}$ to be the space of infinitely differentiable functions $F : \mathbb{R} \to \mathbb{C}$ such that
$$|x^k F^{(\ell)}(x)| \to 0$$
as $|x| \to \infty$, for all non-negative integers $k$ and $\ell$.

2 Notice the “t” in the spelling of Schwartz’s name. The space is named after Laurent Schwartz
(1915–2002), who is different from H.A. Schwarz (1848–1921) of the Cauchy–Schwarz inequality.

In the following discussion, it is convenient to use the following definition of the Fourier transform:
$$\hat{\psi}(x) = \int_{-\infty}^{\infty} \psi(t)e^{-2\pi itx}\,dt.$$

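Theorem 3.9 (Heisenberg's uncertainty principle) Let $\psi$ be a Schwartz function satisfying $\|\psi\|_2 = 1$. Then
$$\left( \int_{-\infty}^{\infty} x^2|\psi(x)|^2\,dx \right)\left( \int_{-\infty}^{\infty} x^2|\hat{\psi}(x)|^2\,dx \right) \ge \frac{1}{16\pi^2},$$
with equality if and only if $\psi(x) = Ae^{-Bx^2}$ for some $B > 0$ and
$$|A|^2 = \sqrt{\frac{2B}{\pi}}.$$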

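Proof Write
$$1 = \int_{-\infty}^{\infty} |\psi(x)|^2\,dx = \int_{-\infty}^{\infty} \psi(x)\overline{\psi(x)}\,dx.$$
We integrate by parts to get
$$1 = \Big[ x\psi(x)\overline{\psi(x)} \Big]_{-\infty}^{\infty} - \int_{-\infty}^{\infty} x\,\frac{d}{dx}\left( \psi(x)\overline{\psi(x)} \right) dx.$$
Because $\psi \in \mathcal{S}$, the first term equals zero. Thus,
$$1 = -\int_{-\infty}^{\infty} x\left( \psi(x)\overline{\psi'(x)} + \psi'(x)\overline{\psi(x)} \right) dx,$$
so that
$$1 \le 2\int_{-\infty}^{\infty} |x|\,|\psi(x)|\,|\psi'(x)|\,dx.$$
By the Cauchy–Schwarz inequality,
$$1 \le 2\left( \int_{-\infty}^{\infty} x^2|\psi(x)|^2\,dx \right)^{1/2}\left( \int_{-\infty}^{\infty} |\psi'(x)|^2\,dx \right)^{1/2}.$$
By the inversion theorem,
$$\psi(x) = \int_{-\infty}^{\infty} \hat{\psi}(t)e^{2\pi itx}\,dt,$$
so that
$$\psi'(x) = \int_{-\infty}^{\infty} (2\pi it)\hat{\psi}(t)e^{2\pi itx}\,dt.$$
In other words, $\psi'$ is the (inverse) Fourier transform of $(2\pi it)\hat{\psi}(t)$. By Parseval's formula, the square of the $L^2$-norm of $\psi'$ is equal to
$$\int_{-\infty}^{\infty} 4\pi^2t^2|\hat{\psi}(t)|^2\,dt.$$
Thus
$$1 \le 2(2\pi)\left( \int_{-\infty}^{\infty} x^2|\psi(x)|^2\,dx \right)^{1/2}\left( \int_{-\infty}^{\infty} x^2|\hat{\psi}(x)|^2\,dx \right)^{1/2},$$
from which the main inequality emerges upon squaring.
In our application of the Cauchy–Schwarz inequality, equality can occur if and only if
$$\psi'(x) = \lambda x\psi(x)$$
for some scalar $\lambda$. This is an ordinary differential equation which is easily solved:
$$\psi(x) = Ae^{-Bx^2},$$
with $B > 0$ since $\psi$ must be square integrable. Substituting this back and using $\|\psi\|_2 = 1$ gives the final result. □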
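The inequality of Theorem 3.9 can also be explored numerically with the convention $\hat{\psi}(x) = \int \psi(t)e^{-2\pi itx}\,dt$. The Python sketch below is our own illustration. It computes $\hat{\psi}$ by Simpson's rule on a truncated interval and divides the product of the two second moments by $\|\psi\|_2^4$, so that $\psi$ need not be normalized. The result is reported as a multiple of $1/(16\pi^2)$. For the Gaussian $e^{-x^2/2}$ the ratio should be 1 (the equality case). For $xe^{-x^2/2}$, a direct computation using Parseval's formula as in the proof gives 9. The truncation and step counts are our own assumptions.

```python
import cmath
import math


def simpson(g, a, b, n):
    """Composite Simpson's rule for int_a^b g(x) dx with n (even) subintervals."""
    h = (b - a) / n
    s = g(a) + g(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * g(a + k * h)
    return s * h / 3


def ft(psi, xi, cutoff=12.0, n=600):
    """psi^(xi) = int psi(x) e^{-2 pi i x xi} dx, truncated to [-cutoff, cutoff]."""
    return simpson(lambda x: psi(x) * cmath.exp(-2j * math.pi * x * xi), -cutoff, cutoff, n)


def uncertainty_ratio(psi, cutoff=12.0, xi_max=3.0):
    """(int x^2 |psi|^2)(int xi^2 |psi^|^2) / ||psi||_2^4, as a multiple of 1/(16 pi^2)."""
    norm_sq = simpson(lambda x: abs(psi(x)) ** 2, -cutoff, cutoff, 2400)
    spread_x = simpson(lambda x: x * x * abs(psi(x)) ** 2, -cutoff, cutoff, 2400)
    spread_xi = simpson(lambda s: s * s * abs(ft(psi, s)) ** 2, -xi_max, xi_max, 200)
    return spread_x * spread_xi / norm_sq ** 2 * (16 * math.pi ** 2)


gaussian = lambda x: math.exp(-x * x / 2)
x_gaussian = lambda x: x * math.exp(-x * x / 2)
print(uncertainty_ratio(gaussian), uncertainty_ratio(x_gaussian))
```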

In quantum mechanics, $|\psi(x)|^2$ represents the probability density that a particle is at position $x \in \mathbb{R}$. Thus, the probability that such a particle lies in the interval $[a, b]$ is given by
$$\int_a^b |\psi(x)|^2\,dx.$$
Our best guess for the particle's position is then given by the expectation
$$\mu = \int_{-\infty}^{\infty} x|\psi(x)|^2\,dx,$$
and the error (or uncertainty) involved in this guess is given by the variance
$$\int_{-\infty}^{\infty} (x - \mu)^2|\psi(x)|^2\,dx.$$
A similar analysis holds for the momentum of the particle. Indeed, the probability that the momentum of a particle lies in $(a, b)$ is given by
$$\int_a^b |\hat{\psi}(x)|^2\,dx.$$
The expectation and variance are defined as before. Without loss of generality, we can normalize our functions so that the expectations of both the position and the momentum are zero. With this normalization, the two integrals in Theorem 3.9 are precisely the variances, and we see that the Heisenberg uncertainty principle establishes a lower bound for the product of these variances. In other words, any attempt to lower the error in our observation of position increases the error in estimating momentum, and vice versa. Though the theorem originally arose in the context of quantum mechanics, we see that it is really a theorem about Fourier transforms.
In the terminology of physics, the uncertainty principle is written as
$$(\text{uncertainty of position}) \times (\text{uncertainty of momentum}) \ge \frac{h}{16\pi^2},$$
where $h$ denotes Planck's constant.

Exercises

1. The uncertainty principle can also be formulated in terms of the Hermite operator $H$ defined as
$$H(f) := -\frac{d^2f}{dx^2} + x^2 f.$$
Show that $(H(f), f) \ge (f, f)$ for all $f$ in the Schwartz space, with the usual inner product.
2. Suppose $f$ is a continuous function in $L^1(\mathbb{R})$. Show that if $f$ and $\hat{f}$ both have compact support, then $f$ is identically zero.

3.10 Trigonometric Polynomials

A “polynomial” of the form
$$\sum_{n=-N}^{N} a_n e^{2\pi inx}$$
represents a continuous function on the space $\mathbb{R}/\mathbb{Z}$ and is called a trigonometric polynomial, in that it is a “polynomial” in $e^{2\pi ix}$. We have already seen these “polynomials” in Sect. 2.5 in our brief study of Fourier series. Here, we will introduce the important Fejér kernel and give a constructive proof of Theorem 2.16 using this kernel, as well as the properties of convolutions that were nascent in our earlier treatment. The following approximation theorem is due to Weierstrass.
Theorem 3.10 Let $f \in C(\mathbb{R}/\mathbb{Z})$, that is, $f$ is a continuous function on $\mathbb{R}/\mathbb{Z}$. Let $\varepsilon > 0$ be given. Then there is a trigonometric polynomial $p$ such that
$$\|f - p\|_\infty \le \varepsilon.$$
To prove this, we introduce the following useful concept. Let $\varepsilon > 0$ and $0 < \delta < 1/2$. A function $f \in C(\mathbb{R}/\mathbb{Z})$ is said to be a periodic $(\varepsilon, \delta)$ approximation to the identity if
(a) $f(x) \ge 0$ for all $x \in \mathbb{R}$ and $\int_0^1 f(x)\,dx = 1$;
(b) $f(x) < \varepsilon$ for all $\delta \le |x| < 1 - \delta$.
In other words, the bulk of the mass of $f$ lies in $[0, \delta)$ and $(1 - \delta, 1]$.
Lemma 3.1 For every $\varepsilon > 0$ and $0 < \delta < 1/2$, there exists a trigonometric polynomial $P$ which is a periodic $(\varepsilon, \delta)$ approximation to the identity.
Proof We use the Fejér kernel:
$$F_N(x) = \sum_{n=-N}^{N}\left( 1 - \frac{|n|}{N} \right)e^{2\pi inx}.$$
Clearly, $F_N(x)$ is a trigonometric polynomial. It is easy to see (see Exercise 1 below) that
$$F_N(x) = \frac{1}{N}\left| \sum_{n=0}^{N-1} e^{2\pi inx} \right|^2,$$
an identity easily proved by induction. But the sum
$$\sum_{n=0}^{N-1} e^{2\pi inx}$$
is a geometric series which can be summed as (for $x \notin \mathbb{Z}$)
$$e^{\pi i(N-1)x}\cdot\frac{\sin N\pi x}{\sin\pi x},$$
so that
$$F_N(x) = \frac{1}{N}\left( \frac{\sin N\pi x}{\sin\pi x} \right)^2$$
for $x \notin \mathbb{Z}$. If $x \in \mathbb{Z}$, then $F_N(x) = N$. In any case, $F_N(x) \ge 0$ for all $x$. Now
$$\int_0^1 F_N(x)\,dx = \sum_{n=-N}^{N}\left( 1 - \frac{|n|}{N} \right)\int_0^1 e^{2\pi inx}\,dx = 1.$$
But $|\sin\pi Nx| \le 1$, so that
$$|F_N(x)| \le \frac{1}{N|\sin\pi x|^2} \le \frac{1}{N(\sin\pi\delta)^2}$$
for $\delta \le |x| \le 1 - \delta$, because the sine function is increasing on $[0, \pi/2]$ and decreasing on $[\pi/2, \pi]$. Thus, by choosing $N$ sufficiently large, we can make
$$|F_N(x)| < \varepsilon$$
for all $\delta \le |x| \le 1 - \delta$. This completes the proof. □
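The properties of $F_N$ used in this proof are easy to check numerically. In the following Python sketch, the values $N = 20$ and $\delta = 0.1$ and the sample grids are our own choices. The series is compared with the closed form. The integral over a period is computed by an equispaced average over $M = 64$ points, which is exact for trigonometric polynomials of degree less than $M$. Finally, the bound $1/(N\sin^2\pi\delta)$ is tested on $[\delta, 1 - \delta]$.

```python
import cmath
import math


def fejer_series(N, x):
    """F_N(x) = sum_{|n| <= N} (1 - |n|/N) e^{2 pi i n x}; the terms n = +-N vanish."""
    return sum((1 - abs(n) / N) * cmath.exp(2j * math.pi * n * x) for n in range(-N, N + 1)).real


def fejer_closed(N, x):
    """(1/N) (sin(pi N x) / sin(pi x))^2 for x not an integer, and N at the integers."""
    if x == round(x):
        return float(N)
    return (math.sin(math.pi * N * x) / math.sin(math.pi * x)) ** 2 / N


N, M, delta = 20, 64, 0.1
# Equispaced average over M points: exact for trigonometric polynomials of degree < M.
period_mean = sum(fejer_series(N, k / M) for k in range(M)) / M
# Largest value of F_N on [delta, 1 - delta], sampled with step 0.001.
tail_max = max(fejer_closed(N, k / 1000) for k in range(100, 901))
print(period_mean, tail_max, 1 / (N * math.sin(math.pi * delta) ** 2))
```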

Proof of Theorem 3.10 Let $f \in C(\mathbb{R}/\mathbb{Z})$. Since $f$ is continuous on $[0, 1]$, $f$ is bounded, so that $|f(x)| \le M$ for all $x \in \mathbb{R}$.
Let $\varepsilon > 0$. As $f$ is uniformly continuous, there is a $\delta$ with $0 < \delta < 1/2$ such that
$$|f(x) - f(y)| \le \varepsilon \quad \text{whenever } |x - y| \le \delta.$$
By Lemma 3.1, we can find a trigonometric polynomial $P(y) = \sum_{n=-N}^{N} a_n e^{2\pi iny}$ which is a periodic $(\varepsilon, \delta)$ approximation to the identity. Then $f * P$ is also a trigonometric polynomial because, using the periodicity of $f$ in the substitution $t = x - y$,
$$\begin{aligned} (f * P)(x) &= \int_0^1 f(x - y)P(y)\,dy = \int_0^1 f(x - y)\sum_{n=-N}^{N} a_n e^{2\pi iny}\,dy = \sum_{n=-N}^{N} a_n\int_0^1 f(x - y)e^{2\pi iny}\,dy \\ &= \sum_{n=-N}^{N} a_n\int_0^1 f(t)e^{2\pi in(x - t)}\,dt = \sum_{n=-N}^{N} a_n\left( \int_0^1 f(t)e^{-2\pi int}\,dt \right)e^{2\pi inx}, \end{aligned}$$
which we may write as
$$\sum_{n=-N}^{N} a_n\hat{f}(n)e^{2\pi inx},$$
where $\hat{f}(n) = \int_0^1 f(t)e^{-2\pi int}\,dt$ in the sense of classical Fourier series.
Thus,
$$|f(x) - (f * P)(x)| = \left| \int_0^1 f(x)P(y)\,dy - \int_0^1 f(x - y)P(y)\,dy \right|$$
because
$$\int_0^1 P(y)\,dy = 1.$$
Thus, as $P$ is non-negative,
$$|f(x) - (f * P)(x)| \le \int_0^1 |f(x) - f(x - y)|P(y)\,dy.$$
The right-hand side can be split into three integrals:
$$\int_0^{\delta} |f(x) - f(x - y)|P(y)\,dy + \int_{\delta}^{1-\delta} |f(x) - f(x - y)|P(y)\,dy + \int_{1-\delta}^{1} |f(x) - f(x - y)|P(y)\,dy.$$
By the uniform continuity and periodicity of $f$ (for $y \in [1 - \delta, 1]$ we have $f(x - y) = f(x - y + 1)$ with $|y - 1| \le \delta$), the first and the last integrals are at most
$$\varepsilon\int_0^{\delta} P(y)\,dy + \varepsilon\int_{1-\delta}^{1} P(y)\,dy \le 2\varepsilon$$
because $\int_0^1 P(y)\,dy = 1$. Since $|f(x) - f(x - y)| \le 2M$, the middle integral is bounded by
$$2M\int_{\delta}^{1-\delta} P(y)\,dy \le 2M\varepsilon$$
by property (b) of the trigonometric polynomial $P$. Hence
$$\|f - f * P\|_\infty \le (2M + 2)\varepsilon.$$
As $M$ is fixed and $\varepsilon$ is arbitrary, we can make $f * P$ arbitrarily close to $f$ in the sup norm. This completes the proof. □

The virtue of this proof is that everything can be made explicit. Indeed, given a continuous function $f$, the trigonometric polynomial approximations are provided by $f * F_N$, where $F_N$ is the Fejér kernel.
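To see the Fejér means $f * F_N$ at work, here is a short Python sketch (our own illustration). The test function is the triangle wave $f(x) = |\{x\} - 1/2|$, our choice, where $\{x\}$ denotes the fractional part. The sketch approximates the coefficients $\hat{f}(n)$ by an $M$-point equispaced sum and forms $\sum_{|n|<N}(1 - |n|/N)\hat{f}(n)e^{2\pi inx}$. It then reports the maximum error on a grid. Because $F_N \ge 0$ and $\int_0^1 F_N = 1$, the means stay between the minimum and the maximum of $f$.

```python
import cmath
import math


def triangle(x):
    """Test function: the continuous 1-periodic triangle wave |{x} - 1/2|."""
    return abs(x % 1.0 - 0.5)


def fourier_coefficients(f, n_max, M=4096):
    """Approximate f^(n) = int_0^1 f(t) e^{-2 pi i n t} dt for |n| <= n_max by an M-point equispaced sum."""
    samples = [f(k / M) for k in range(M)]
    return {n: sum(s * cmath.exp(-2j * math.pi * n * k / M) for k, s in enumerate(samples)) / M
            for n in range(-n_max, n_max + 1)}


def fejer_mean(coeffs, N, x):
    """(f * F_N)(x) = sum_{|n| < N} (1 - |n|/N) f^(n) e^{2 pi i n x}."""
    return sum((1 - abs(n) / N) * coeffs[n] * cmath.exp(2j * math.pi * n * x)
               for n in range(-N + 1, N)).real


coeffs = fourier_coefficients(triangle, 64)


def sup_error(N, grid=256):
    """Largest |f - f * F_N| over the points k / grid."""
    return max(abs(triangle(k / grid) - fejer_mean(coeffs, N, k / grid)) for k in range(grid))


for N in (8, 16, 32, 64):
    print(N, sup_error(N))
```

For this function the error decreases slowly (roughly like $\log N/N$), which is typical of Fejér means at a corner of the graph.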
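Exercises

1. Let
$$F_N(x) = \sum_{|n| \le N}\left( 1 - \frac{|n|}{N} \right)e^{2\pi inx}.$$
Show that
$$F_N(x) = \frac{1}{N}\left| \sum_{n=0}^{N-1} e^{2\pi inx} \right|^2 = \frac{1}{N}\left( \frac{\sin\pi Nx}{\sin\pi x} \right)^2.$$
2. Let $f$ be a Schwartz function. Show that
$$\lim_{R\to\infty}\int_{-R}^{R}\left( 1 - \frac{|x|}{R} \right)\hat{f}(x)e^{2\pi ixt}\,dx = f(t).$$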

3.11 The Isoperimetric Inequality

An important application of the theory of Fourier series is to the isoperimetric inequality. Is there a relation between the area enclosed by a simple closed curve and its length? Here is the answer.
Theorem 3.11 (The isoperimetric inequality) Suppose that $C$ is a simple closed curve in $\mathbb{R}^2$ of length $L$. If $A$ is the area of the region enclosed by $C$, then
$$A \le \frac{L^2}{4\pi},$$
with equality if and only if $C$ is a circle.

Proof Our proof is due to Hurwitz. The reader will recall that Green’s theorem gives us the formula for the area $A$:
$$A = \frac{1}{2}\int_C (x\,dy - y\,dx),$$
where we have parametrized the (positively oriented) curve $C$ by $\gamma : [0, 2\pi] \to \mathbb{R}^2$ with $\gamma(t) = (x(t), y(t))$. We first observe that we may suppose, without any loss of generality, that $L = 2\pi$, because any dilation $(x, y) \mapsto (\lambda x, \lambda y)$ multiplies the area $A$ by $\lambda^2$ and the length $L$ by $\lambda$. Thus, taking $\lambda = 2\pi/L$, we see that it suffices to show that when $L = 2\pi$, then $A \le \pi$, with equality if and only if $C$ is a circle. We may also take $\gamma$ to be the arc-length parametrization, so that $x'(t)^2 + y'(t)^2 = 1$ for all $t$. Utilizing our formula for the length of the curve, we then have
$$\frac{1}{2\pi}\int_0^{2\pi}\left(x'(t)^2 + y'(t)^2\right)dt = 1.$$

As our curve is closed, the functions $x(t)$, $y(t)$ are periodic with period $2\pi$. We may write down their Fourier series:
$$x(t) = \sum_n a_n e^{int}, \qquad y(t) = \sum_n b_n e^{int}.$$
Then
$$x'(t) = \sum_n a_n\, in\, e^{int}, \qquad y'(t) = \sum_n b_n\, in\, e^{int}.$$
Parseval’s identity now gives
$$\sum_{n=-\infty}^{\infty} |n|^2\left(|a_n|^2 + |b_n|^2\right) = 1. \qquad (3.4)$$

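Now $x(t)$ and $y(t)$ are real valued, so that $a_{-n} = \overline{a_n}$ and $b_{-n} = \overline{b_n}$. Our formula for the area via Green’s theorem combined with another application of Parseval’s formula now gives
$$A = \frac{1}{2}\int_0^{2\pi}\left(x(t)y'(t) - y(t)x'(t)\right)dt = \pi\left|\sum_{n=-\infty}^{\infty} n\left(a_n\overline{b_n} - b_n\overline{a_n}\right)\right|.$$
Observing that
$$\left|a_n\overline{b_n} - b_n\overline{a_n}\right| \le 2|a_n||b_n| \le |a_n|^2 + |b_n|^2, \qquad (3.5)$$
and since $|n| \le n^2$, we obtain
$$A \le \pi\sum_{n=-\infty}^{\infty}|n|\left(|a_n|^2 + |b_n|^2\right) \le \pi\sum_{n=-\infty}^{\infty}|n|^2\left(|a_n|^2 + |b_n|^2\right) = \pi,$$
by (3.4). This is the isoperimetric inequality.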


Let us now consider the case of equality. We used the inequality $|n| \le n^2$ in the above proof. But this is a strict inequality for $|n| \ge 2$. In other words, if $A = \pi$, we must have $a_n = b_n = 0$ for $|n| \ge 2$, and our two Fourier series become
$$x(t) = a_{-1}e^{-it} + a_0 + a_1e^{it}, \qquad y(t) = b_{-1}e^{-it} + b_0 + b_1e^{it}.$$
Now $a_{-1} = \overline{a_1}$, $b_{-1} = \overline{b_1}$, so that (3.4) gives
$$2\left(|a_1|^2 + |b_1|^2\right) = 1.$$

Since we have equality at every stage in (3.5), we deduce $|a_1| = |b_1| = 1/2$. Thus, writing
$$a_1 = \tfrac{1}{2}e^{i\alpha}, \qquad b_1 = \tfrac{1}{2}e^{i\beta},$$
the area formula (with $A = \pi$ and only the terms $n = \pm 1$ present) gives
$$1 = 2\left|a_1\overline{b_1} - \overline{a_1}b_1\right|,$$
which implies $|\sin(\alpha - \beta)| = 1$. Thus, $\alpha - \beta = k\pi/2$, where $k$ is an odd integer. We deduce from this that
$$x(t) = a_0 + \cos(\alpha + t), \qquad y(t) = b_0 \pm \sin(\alpha + t),$$
where the sign in $y(t)$ depends on the parity of $(k - 1)/2$. These functions parametrize the circle of radius 1 centered at $(a_0, b_0)$. This completes the proof. □
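The Fourier-side area formula $A = \pi\left|\sum_n n(a_n\overline{b_n} - b_n\overline{a_n})\right|$ can be checked numerically. In this sketch (our own illustration), the coefficients of $x(t)$ and $y(t)$ are computed by a discrete Fourier sum over $m$ equispaced samples, which is exact for the trigonometric parametrizations used here; `isoperimetric_ratio` computes $4\pi A/L^2$ for the inscribed polygon, itself a simple closed curve. The ellipse is not parametrized by arc length, and the area formula does not need that normalization.

```python
import cmath
import math


def fourier_coeffs(samples, kmax):
    """c_n ≈ (1/2π)∫_0^{2π} s(t) e^{-int} dt from samples s(2πj/m), for |n| ≤ kmax."""
    m = len(samples)
    return {
        n: sum(s * cmath.exp(-1j * n * 2 * math.pi * j / m) for j, s in enumerate(samples)) / m
        for n in range(-kmax, kmax + 1)
    }


def hurwitz_area(xs, ys, kmax=10):
    """Area via A = π |Σ_n n (a_n conj(b_n) - b_n conj(a_n))|."""
    a = fourier_coeffs(xs, kmax)
    b = fourier_coeffs(ys, kmax)
    s = sum(n * (a[n] * b[n].conjugate() - b[n] * a[n].conjugate()) for n in a)
    return math.pi * abs(s)


def isoperimetric_ratio(xs, ys):
    """4πA/L^2 for the closed polygon through the samples (shoelace area, polygon length)."""
    m = len(xs)
    area = 0.5 * abs(sum(xs[j] * ys[(j + 1) % m] - xs[(j + 1) % m] * ys[j] for j in range(m)))
    length = sum(math.hypot(xs[(j + 1) % m] - xs[j], ys[(j + 1) % m] - ys[j]) for j in range(m))
    return 4 * math.pi * area / length ** 2


m = 256
ts = [2 * math.pi * j / m for j in range(m)]
ellipse = ([2 * math.cos(t) for t in ts], [math.sin(t) for t in ts])  # semi-axes 2 and 1
circle = ([math.cos(t) for t in ts], [math.sin(t) for t in ts])
print(hurwitz_area(*ellipse), 2 * math.pi)
print(isoperimetric_ratio(*ellipse), isoperimetric_ratio(*circle))
```

The ratio stays below 1 and is close to 1 only for the circle, as Theorem 3.11 predicts.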

Exercises

1. If $f$ is $C^1$ with period $2\pi$, and
$$\int_0^{2\pi} f(t)\,dt = 0,$$
show that
$$\int_0^{2\pi}|f(t)|^2\,dt \le \int_0^{2\pi}|f'(t)|^2\,dt.$$
This is called Wirtinger’s inequality.

2. Show that equality can hold in the above inequality if and only if $f(t) = A\sin t + B\cos t$ for some constants $A$ and $B$.

3.12 Weyl’s Criterion and Uniform Distribution

A sequence of numbers $x_1, x_2, \ldots$ in $[0, 1]$ is said to be uniformly distributed (or equidistributed) if for every interval $[a, b] \subseteq [0, 1]$ we have
$$\lim_{N\to\infty}\frac{\#\{1 \le n \le N : x_n \in [a, b]\}}{N} = b - a.$$
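To get a feel for the definition, one can count how many of the first $N$ fractional parts $\{n\theta\}$ land in a fixed interval. The Python sketch below does this for $\theta = \sqrt2$ and $\theta = 1/4$; the interval $[0.3, 0.45]$ is an arbitrary choice, and floating-point fractional parts are adequate only for moderate $N$.

```python
import math


def empirical_fraction(theta, a, b, N):
    """#{1 ≤ n ≤ N : {nθ} ∈ [a, b]} / N for the fractional parts {nθ}."""
    return sum(1 for n in range(1, N + 1) if a <= (n * theta) % 1.0 <= b) / N


for theta in (math.sqrt(2), 0.25):
    print(theta, [empirical_fraction(theta, 0.3, 0.45, N) for N in (100, 1000, 10000)])
```

For $\theta = \sqrt2$ the fractions approach $b - a = 0.15$, while for $\theta = 1/4$ the points $\{n/4\}$ never enter the interval.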

Another way of stating the equidistribution criterion is as follows. Let $I = (a, b)$ and let $\chi_I(x)$ be the characteristic function of $I$. Then
$$\lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}\chi_I(x_n) = \int_0^1\chi_I(x)\,d\mu(x).$$
Since characteristic functions of intervals are not continuous, we would like to replace this criterion with one involving continuous functions.
To this end, let $\varepsilon > 0$. Then we can find continuous functions
$$f_\varepsilon^-(x) \le \chi_I(x) \le f_\varepsilon^+(x)$$
such that
$$\int_0^1 f_\varepsilon^\pm(x)\,dx = b - a \pm \varepsilon.$$
We call such functions pseudo-characteristic functions. Using them, it is easy to see that our equidistribution criterion can be replaced by
$$\frac{1}{N}\sum_{n=1}^{N} f(x_n) \to \int_0^1 f(x)\,d\mu(x)$$
for all continuous functions $f$.
We leave this equivalence as a simple exercise for the reader.
We are now ready to prove Weyl’s criterion.

Theorem 3.12 (Weyl, 1916) A sequence of numbers $\{x_n\}_{n=1}^\infty$ with $x_n \in [0, 1]$ is uniformly distributed if and only if
$$\lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N} e^{2\pi i m x_n} = 0 \quad\text{for all } m \in \mathbb{Z},\ m \ne 0.$$

Proof Recall from our previous remark that $\{x_n\}_{n=1}^\infty$ is equidistributed if and only if
$$\lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N} f(x_n) = \int_0^1 f(x)\,dx$$
for all continuous functions $f$. In particular, we can apply this to the continuous function
$$f_m(x) = e^{2\pi i m x},$$
and so if $\{x_n\}_{n=1}^\infty$ is equidistributed, then for $m \ne 0$,
$$\lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N} f_m(x_n) = \int_0^1 e^{2\pi i m x}\,dx = 0.$$

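To prove the converse, we recall from the previous section that trigonometric polynomials are dense in $C(\mathbb{R}/\mathbb{Z})$ with respect to the sup norm. Given a continuous function $f \in C(\mathbb{R}/\mathbb{Z})$, let $\varepsilon > 0$ be fixed. We can find a trigonometric polynomial
$$P(x) = \sum_{|m| \le R} a_m e^{2\pi i m x}$$
such that
$$\sup_{x\in[0,1]}|f(x) - P(x)| \le \varepsilon.$$
Thus,
$$\left|\frac{1}{N}\sum_{n=1}^{N} f(x_n) - \frac{1}{N}\sum_{n=1}^{N} P(x_n)\right| \le \varepsilon$$
and
$$\lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}\sum_{|m|\le R} a_m e^{2\pi i m x_n} = \sum_{|m|\le R} a_m\lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N} e^{2\pi i m x_n}.$$
The inner limits are all zero except for the case $m = 0$, where the limit is 1, so that $\frac{1}{N}\sum_{n=1}^{N} P(x_n) \to a_0$. On the other hand,
$$\left|\int_0^1 f(x)\,dx - \int_0^1 P(x)\,dx\right| \le \varepsilon$$
and $\int_0^1 P(x)\,dx = a_0$. From our construction in Theorem 3.10, we see that
$$a_0 = \int_0^1 f(x)\,dx.$$
Thus,
$$\limsup_{N\to\infty}\left|\frac{1}{N}\sum_{n=1}^{N} f(x_n) - \int_0^1 f(x)\,dx\right| \le \varepsilon,$$
and since $\varepsilon$ is arbitrary,
$$\lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N} f(x_n) = \int_0^1 f(x)\,dx,$$
as desired. □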

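Example 3.2 If $\theta \in \mathbb{Q}$, the sequence of fractional parts $\{n\theta\}_{n=1}^\infty$ is not equidistributed. If $\theta \notin \mathbb{Q}$, the sequence of fractional parts $\{n\theta\}_{n=1}^\infty$ is equidistributed.
We apply Weyl’s criterion, noting that $e^{2\pi i m\{n\theta\}} = e^{2\pi i m n\theta}$. Indeed, if $\theta = p/q$ (say), then
$$\sum_{n=1}^{N} e^{2\pi i q(np/q)} = N,$$
so that the Weyl limit is 1 for $m = q$. This proves the first part of the assertion. For the second part, we see that
$$\sum_{n=1}^{N} e^{2\pi i m n\theta}$$
is a geometric sum, easily summed to be
$$e^{2\pi i m\theta}\cdot\frac{(e^{2\pi i m\theta})^N - 1}{e^{2\pi i m\theta} - 1},$$
which for $m \ne 0$ is bounded in absolute value by $2/|e^{2\pi i m\theta} - 1|$, independently of $N$ (the denominator is non-zero because $m\theta \notin \mathbb{Z}$). Thus the Weyl limit is zero. □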
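Example 3.2 is easy to explore numerically. This sketch computes the Weyl averages $\frac{1}{N}\sum_{n=1}^{N} e^{2\pi i m n\theta}$ directly and compares the unnormalized sum with the geometric closed form above; the values $\theta = \sqrt2$, $\theta = 3/7$ and the choices of $m$ and $N$ are illustrative.

```python
import cmath
import math


def weyl_average(theta, m, N):
    """(1/N) Σ_{n=1}^N e^{2πi m nθ}, the Weyl average for x_n = {nθ}."""
    return sum(cmath.exp(2j * math.pi * m * n * theta) for n in range(1, N + 1)) / N


def geometric_closed_form(theta, m, N):
    """e^{2πimθ} ((e^{2πimθ})^N - 1) / (e^{2πimθ} - 1), valid when mθ is not an integer."""
    r = cmath.exp(2j * math.pi * m * theta)
    return r * (r ** N - 1) / (r - 1)


for N in (100, 1000, 10000):
    print(N, abs(weyl_average(math.sqrt(2), 1, N)), abs(weyl_average(3 / 7, 7, N)))
```

For $\theta = \sqrt2$ the averages shrink like $1/N$, whereas for $\theta = 3/7$ and $m = 7$ every term equals 1.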

Exercises

1. As $e$ is irrational, the sequence of fractional parts $\{ne\}$ is equidistributed mod 1. Show that
$$\lim_{n\to\infty}\{n!\,e\} = 0,$$
and deduce that the subsequence $\{n!\,e\}$ is not equidistributed mod 1.

2. Prove the Euler–Maclaurin summation formula: if $f(t)$ is a complex-valued function with continuous derivative on $1 \le t \le N$, with $N$ a natural number, then
$$\sum_{n=1}^{N} f(n) = \int_1^N f(t)\,dt + \frac{1}{2}\left(f(1) + f(N)\right) + \int_1^N\left(\{t\} - \frac{1}{2}\right)f'(t)\,dt.$$

3. Show that the sequence $\{\log n\}$ is not equidistributed mod 1. [Hint: apply the Euler–Maclaurin summation formula with $f(t) = e^{2\pi i\log t}$.]
4. Let $a$ and $b$ be integers with $a < b$ and suppose that $f$ is twice differentiable on $[a, b]$. Suppose that either $f''(x) \ge \delta > 0$ for all $x \in [a, b]$, or $f''(x) \le -\delta < 0$ for all $x \in [a, b]$. Then
$$\left|\sum_{n=a}^{b} e^{2\pi i f(n)}\right| \le \left(|f'(b) - f'(a)| + 2\right)\left(\frac{4}{\sqrt\delta} + 3\right).$$
[Hint: apply the Euler–Maclaurin summation formula.]

5. Let $\alpha$ satisfy $0 < \alpha < 1$. Prove that the sequence $\{n^\alpha\}$ is uniformly distributed mod 1.
6. Prove that the sequence $\{n\log n\}$ is uniformly distributed mod 1.

3.13 Fourier Series

Suppose $f : \mathbb{R}/\mathbb{Z} \to \mathbb{C}$ is a continuous function. We would like to investigate whether $f$ can be written as a Fourier series
$$f(x) \overset{?}{=} \sum_{n\in\mathbb{Z}} a_n e^{2\pi i n x}.$$
If such a series exists, we see, proceeding formally, that
$$\int_0^1\left(\sum_{n\in\mathbb{Z}} a_n e^{2\pi i n x}\right)e^{-2\pi i m x}\,dx = a_m.$$
In other words,
$$a_m = \int_0^1 f(x)e^{-2\pi i m x}\,dx.$$
Motivated by this observation, we make the following definition. Given a function $f \in C(\mathbb{R}/\mathbb{Z})$, we define its $n$-th Fourier coefficient as
$$\hat f(n) := \int_0^1 f(x)e^{-2\pi i n x}\,dx$$
and ask: when does the Fourier series
$$\sum_{n\in\mathbb{Z}}\hat f(n)e^{2\pi i n x}$$
converge pointwise to $f(x)$? What condition do we need to impose on $f$?


To this end, we begin with the Fourier series analogue of the Riemann–Lebesgue lemma, which we saw earlier in the context of Fourier transforms:

Lemma 3.2 (Riemann–Lebesgue lemma) For $f \in C(\mathbb{R}/\mathbb{Z})$,
$$\lim_{|n|\to\infty}\hat f(n) = 0.$$

Proof Let $\varepsilon > 0$. We take a trigonometric polynomial $P$ such that
$$\|f - P\|_\infty < \varepsilon.$$
This we can do by Theorem 3.10. Then
$$\hat f(n) = \int_0^1 f(x)e^{-2\pi i n x}\,dx = \int_0^1\left(f(x) - P(x)\right)e^{-2\pi i n x}\,dx$$
for $|n|$ sufficiently large (depending on $P$), because $P$, being a trigonometric polynomial, has $\hat P(n) = 0$ for all $|n|$ larger than its degree. But then, as the integrand is less than $\varepsilon$ in absolute value, we see that $|\hat f(n)| < \varepsilon$. Since $\varepsilon$ is arbitrary, this completes the proof. □


To study convergence of Fourier series, we introduce the Dirichlet kernel:
$$D_N(x) = \sum_{|n| \le N} e^{2\pi i n x}.$$
Clearly, $D_N \in C(\mathbb{R}/\mathbb{Z})$. Also
$$\int_0^1 D_N(x)\,dx = 1.$$
Finally, by summing the geometric series we find
$$D_N(x) = \frac{\sin(2N + 1)\pi x}{\sin\pi x} = \cos 2N\pi x + \cot\pi x\cdot\sin 2N\pi x$$
for $x \notin \mathbb{Z}$. For $x = 0$, $D_N(0) = 2N + 1$.
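The two closed forms for $D_N$ and the normalization $\int_0^1 D_N = 1$ can be checked numerically. In the sketch below, the integral is replaced by an average over $m$ equispaced points, which is exact here because $D_N$ is a trigonometric polynomial of degree $N < m$; the sample point $x = 0.123$ is arbitrary.

```python
import cmath
import math


def dirichlet_sum(N, x):
    """D_N(x) = Σ_{|n|≤N} e^{2πinx}, summed term by term."""
    return sum(cmath.exp(2j * math.pi * n * x) for n in range(-N, N + 1)).real


def dirichlet_closed(N, x):
    """sin((2N+1)πx) / sin(πx), valid for x not an integer."""
    return math.sin((2 * N + 1) * math.pi * x) / math.sin(math.pi * x)


def dirichlet_split(N, x):
    """cos(2Nπx) + cot(πx) sin(2Nπx), the form used in the proof of Theorem 3.13."""
    return math.cos(2 * N * math.pi * x) + math.sin(2 * N * math.pi * x) / math.tan(math.pi * x)


def dirichlet_integral(N, m=1000):
    """Equispaced average approximating ∫_0^1 D_N(x) dx (exact when m > 2N)."""
    return sum(dirichlet_sum(N, j / m) for j in range(m)) / m


print(dirichlet_sum(7, 0.123), dirichlet_closed(7, 0.123), dirichlet_split(7, 0.123))
print(dirichlet_sum(5, 0.0), dirichlet_integral(10))
```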

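Theorem 3.13 Suppose $f \in C(\mathbb{R}/\mathbb{Z})$ is differentiable and real valued. Then
$$f(x) = \sum_{n\in\mathbb{Z}}\hat f(n)e^{2\pi i n x}.$$

Proof Let
$$\begin{aligned}
S_N(x, f) &= \sum_{|n|\le N}\hat f(n)e^{2\pi i n x} = \sum_{|n|\le N} e^{2\pi i n x}\int_0^1 f(t)e^{-2\pi i n t}\,dt \\
&= \int_0^1 f(t)\sum_{|n|\le N} e^{2\pi i n(x - t)}\,dt = (f * D_N)(x).
\end{aligned}$$
Now,
$$\begin{aligned}
S_N(x, f) - f(x) &= \int_0^1 f(t)D_N(x - t)\,dt - f(x) \\
&= \int_0^1 f(x - u)D_N(u)\,du - \int_0^1 f(x)D_N(u)\,du,
\end{aligned}$$
so that
$$S_N(x, f) - f(x) = \int_0^1\left(f(x - u) - f(x)\right)\left\{\cos 2N\pi u + \cot\pi u\cdot\sin 2N\pi u\right\}du.$$
Let
$$g_1(u) = f(x - u) - f(x)$$
and
$$g_2(u) = g_1(u)\cot\pi u.$$
Clearly $g_1$ is continuous. The function $g_2$ is a priori defined only for $u \notin \mathbb{Z}$, but it extends to a continuous function on $\mathbb{R}/\mathbb{Z}$ because $f$ is differentiable at $x$. To see this, note that
$$\cos\pi u\,\frac{f(x - u) - f(x)}{\sin\pi u} = \cos\pi u\cdot\frac{f(x - u) - f(x)}{\pi u}\cdot\frac{\pi u}{\sin\pi u} \to -\frac{f'(x)}{\pi} \quad\text{as } u \to 0.$$
Since $g_1$ and $g_2$ are real valued, the last integral equals $\operatorname{Re}\hat g_1(N) - \operatorname{Im}\hat g_2(N)$, so that
$$|S_N(x, f) - f(x)| = \left|\operatorname{Re}\hat g_1(N) - \operatorname{Im}\hat g_2(N)\right| \to 0$$
as $N \to \infty$, by the Riemann–Lebesgue lemma (Lemma 3.2). □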
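The following sketch illustrates Theorem 3.13 for one differentiable test function of our choosing, the 1-periodic extension of $t^2(1 - t)^2$. Its Fourier coefficients are approximated by Riemann sums, an assumption that is harmless for this smooth example, and the evaluation point $x = 0.3$ is arbitrary.

```python
import cmath
import math


def fourier_coefficient(f, n, m=1024):
    """Riemann-sum approximation of f^(n) = ∫_0^1 f(t) e^{-2πint} dt."""
    return sum(f(j / m) * cmath.exp(-2j * math.pi * n * j / m) for j in range(m)) / m


def partial_sum(f, N, x):
    """S_N(x, f) = Σ_{|n|≤N} f^(n) e^{2πinx} (real, since f is real valued)."""
    return sum(
        fourier_coefficient(f, n) * cmath.exp(2j * math.pi * n * x) for n in range(-N, N + 1)
    ).real


# Hypothetical test function: periodic extension of t^2 (1 - t)^2, differentiable everywhere.
f = lambda t: ((t % 1.0) * (1 - t % 1.0)) ** 2
x = 0.3
for N in (2, 4, 8, 16, 32):
    print(N, abs(partial_sum(f, N, x) - f(x)))
```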

Exercises

1. With $D_N$ denoting the Dirichlet kernel,
$$D_N(x) = \frac{\sin(N + 1/2)x}{\sin(x/2)},$$
show that the Fejér kernel $F_N(x)$ can be written as
$$\frac{1}{N}\left(D_0(x) + D_1(x) + \cdots + D_{N-1}(x)\right).$$

2. Suppose that $f$ is periodic and of class $C^k$. Show that
$$\hat f(n) = o(1/|n|^k),$$
as $|n|$ tends to infinity.



3.14 The Poisson Summation Formula

We will now study how to make the following heuristic argument rigorous. As earlier, for $f \in L^1(\mathbb{R})$ we define
$$\hat f(t) = \int_{-\infty}^{\infty} f(x)e^{-2\pi i x t}\,dx.$$
This is a minor variation of the Fourier transform where $ixt$ has been replaced by $2\pi ixt$ in the exponential. This is motivated by a desire to have a more elegant formulation of the Poisson formula.
Given $f \in L^1(\mathbb{R})$, we define
$$F(x) = \sum_{n\in\mathbb{Z}} f(n + x).$$
Then $F(x + 1) = F(x)$ for all $x \in \mathbb{R}$, so that $F$ is periodic and can be viewed as a function defined on $\mathbb{R}/\mathbb{Z}$. Assuming some “suitable” conditions, we can expand $F$ as a Fourier series:
$$F(x) = \sum_{n\in\mathbb{Z}}\hat F(n)e^{2\pi i n x}.$$

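But
$$\hat F(n) = \int_0^1 F(x)e^{-2\pi i n x}\,dx = \int_0^1\sum_{m\in\mathbb{Z}} f(m + x)e^{-2\pi i n x}\,dx = \sum_{m\in\mathbb{Z}}\int_0^1 f(m + x)e^{-2\pi i n x}\,dx.$$
Changing variables in the integral by setting $u = m + x$, we obtain
$$\sum_{m\in\mathbb{Z}}\int_m^{m+1} f(u)e^{-2\pi i n(u - m)}\,du = \sum_{m\in\mathbb{Z}}\int_m^{m+1} f(u)e^{-2\pi i n u}\,du = \int_{-\infty}^{\infty} f(u)e^{-2\pi i n u}\,du = \hat f(n).$$
Thus,
$$\sum_{m\in\mathbb{Z}} f(m + x) = \sum_{n\in\mathbb{Z}}\hat f(n)e^{2\pi i n x}.$$
Setting $x = 0$, we obtain what is usually termed the Poisson summation formula:
$$\sum_{n\in\mathbb{Z}} f(n) = \sum_{n\in\mathbb{Z}}\hat f(n),$$
which is evidently a beautiful identity.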


To what extent can this be made rigorous? Firstly, we need
$$\sum_{n\in\mathbb{Z}} f(n + x)$$
to be absolutely convergent to ensure convergence of the series. Secondly, we want to expand $F$ as a Fourier series. To do so, it suffices for $F(x)$ to be differentiable, which suggests requiring $f(x)$ to be differentiable. We would also need
$$\sum_{n\in\mathbb{Z}}\hat f(n)$$
to be absolutely convergent.
Now, to ensure $F(x)$ is differentiable, we need to study
$$F(x + h) - F(x) = \sum_{n\in\mathbb{Z}}\left\{f(n + x + h) - f(n + x)\right\}.$$
Since we are assuming $f(x)$ is differentiable, we have by the mean value theorem
$$f(n + x + h) - f(n + x) = hf'(\xi_{n,x})$$
for some $\xi_{n,x}$ between $n + x$ and $n + x + h$ (if $f$ is complex valued, apply this to its real and imaginary parts separately).
In order to take the limit
$$\lim_{h\to 0}\frac{F(x + h) - F(x)}{h}$$
we would need some form of absolute convergence of
$$\sum_{n\in\mathbb{Z}}|f'(\xi_{n,x})|.$$
All of these conditions can be ensured if we impose the following conditions. Suppose $f \in C^2(\mathbb{R})$, that is, $f$ is twice continuously differentiable, and that for some $\delta > 0$,
$$|f(x)|,\ |f'(x)|,\ |f''(x)| \le c(1 + |x|)^{-1-\delta}$$
for some fixed constant $c > 0$.



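Theorem 3.14 (Poisson summation formula) Under the above conditions on $f$, we have
$$\sum_{n\in\mathbb{Z}} f(n) = \sum_{n\in\mathbb{Z}}\hat f(n).$$

Proof Let us observe first that under our assumptions,
$$\sum_{n\in\mathbb{Z}} f(n) \quad\text{and}\quad \sum_{n\in\mathbb{Z}} f(n + x)$$
are absolutely convergent for $0 \le x \le 1$. Also,
$$\hat f(t) = \int_{-\infty}^{\infty} f(x)e^{-2\pi i x t}\,dx.$$
Integrating by parts, we see that for $t \ne 0$ this is
$$\left[f(x)\frac{e^{-2\pi i x t}}{-2\pi i t}\right]_{-\infty}^{\infty} + \int_{-\infty}^{\infty} f'(x)\frac{e^{-2\pi i x t}}{2\pi i t}\,dx,$$
and the boundary term vanishes by our decay assumption. Another integration by parts shows
$$|\hat f(t)| \le \frac{1}{4\pi^2|t|^2}\left|\int_{-\infty}^{\infty} f''(x)e^{-2\pi i x t}\,dx\right| \le \frac{1}{4\pi^2|t|^2}\int_{-\infty}^{\infty}|f''(x)|\,dx$$
for $t \ne 0$. Our estimates on $f''(x)$ ensure absolute convergence of the integral. Therefore,
$$\sum_{n\in\mathbb{Z}}\hat f(n)$$
is absolutely convergent.
The differentiability of $f$, along with our estimates on $f$ and $f'$, now ensures that $F(x)$ is differentiable, so that it admits a Fourier series expansion.
With these remarks in place, the heuristic “proof” can now be made rigorous. We leave the details to the reader. □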
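Theorem 3.14 applies, for example, to the Gaussian $f(x) = e^{-\pi a x^2}$ with $a > 0$, whose transform in the present normalization is the standard $\hat f(t) = a^{-1/2}e^{-\pi t^2/a}$ (we take this formula as known). The sketch truncates both series symmetrically; the neglected tails are far below rounding error for the values of $a$ chosen here.

```python
import math


def poisson_sides(a, terms=50):
    """Both sides of Σ f(n) = Σ f^(n) for f(x) = exp(-π a x^2), with f^(t) = a^{-1/2} exp(-π t^2 / a)."""
    lhs = sum(math.exp(-math.pi * a * n * n) for n in range(-terms, terms + 1))
    rhs = sum(math.exp(-math.pi * n * n / a) for n in range(-terms, terms + 1)) / math.sqrt(a)
    return lhs, rhs


for a in (0.25, 1.0, 3.0):
    print(a, *poisson_sides(a))
```

For $a = 1$ the two sides coincide term by term, since $e^{-\pi x^2}$ is its own transform; for other $a$ the identity is the classical theta transformation.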

The Poisson summation formula can be generalized to arbitrary lattices in $\mathbb{R}^n$. Such a result finds numerous applications in number theory and, in particular, in the theory of modular forms. On the other hand, there are several variants of the Poisson summation formula. The following version is useful.

Theorem 3.15 (Poisson summation formula) Let $f$ be a continuous function on $\mathbb{R}$ such that, for some $\delta > 0$ and some fixed constant $c$,
$$|f(x)| \le \frac{c}{(1 + |x|)^{1+\delta}}.$$
Suppose further that
$$\sum_{m\in\mathbb{Z}}|\hat f(m)| < \infty.$$
Then
$$\sum_{m\in\mathbb{Z}}\hat f(m)e^{2\pi i m x} = \sum_{n\in\mathbb{Z}} f(n + x).$$
We leave the proof as an exercise to the reader and turn our attention to various applications.
Recall that if $F(x) = e^{-c|x|}$ with $\operatorname{Re}(c) > 0$, then
$$\hat F(u) = \frac{2c}{c^2 + 4\pi^2u^2},$$
which follows easily from our earlier calculation of the Fourier transform of the continuous function $e^{-|x|}$. The conditions of the theorem are satisfied, and we obtain
$$\sum_{n=-\infty}^{\infty} e^{-c|n|} = \sum_{n=-\infty}^{\infty}\frac{2c}{c^2 + 4\pi^2n^2}.$$
The left hand side is a geometric progression and equal to
$$1 + 2\sum_{n=1}^{\infty} e^{-cn} = 1 + \frac{2e^{-c}}{1 - e^{-c}} = \frac{1 + e^{-c}}{1 - e^{-c}} = \frac{e^c + 1}{e^c - 1}.$$
Therefore,
$$\frac{e^c + 1}{e^c - 1} = \sum_{n=-\infty}^{\infty}\frac{2c}{c^2 + 4\pi^2n^2}. \qquad (*)$$

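Putting $c = 2\pi$ gives
$$\pi\left(\frac{e^{2\pi} + 1}{e^{2\pi} - 1}\right) = \sum_{n=-\infty}^{\infty}\frac{1}{n^2 + 1} = 1 + 2\sum_{n=1}^{\infty}\frac{1}{n^2 + 1}.$$
Thus
$$2\sum_{n=1}^{\infty}\frac{1}{n^2 + 1} = \frac{\pi e^{2\pi} + \pi - (e^{2\pi} - 1)}{e^{2\pi} - 1} = \frac{\pi e^{2\pi} - e^{2\pi} + \pi + 1}{e^{2\pi} - 1}.$$
A famous theorem of Gelfond and Schneider implies that $e^\pi$ is transcendental. In 1996, Nesterenko proved that $\pi$ and $e^\pi$ are algebraically independent. Thus, the sum is a transcendental number.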
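Both $(*)$ and the closed form for $\sum_{n\ge1} 1/(n^2 + 1)$ can be compared against partial sums. The truncation length $T$ below is our choice; the neglected tails are of size about $c/(\pi^2 T)$ and $1/T$ respectively, and $c = 1.5$ is an arbitrary test value.

```python
import math


def lhs_star(c):
    """(e^c + 1)/(e^c - 1), the left side of (*)."""
    return (math.exp(c) + 1) / (math.exp(c) - 1)


def rhs_star(c, terms=200000):
    """Symmetric partial sum of Σ_n 2c/(c^2 + 4π^2 n^2)."""
    s = 2 / c  # the n = 0 term, 2c / c^2
    s += 2 * sum(2 * c / (c * c + 4 * math.pi ** 2 * n * n) for n in range(1, terms + 1))
    return s


e2pi = math.exp(2 * math.pi)
closed = (math.pi * e2pi - e2pi + math.pi + 1) / (2 * (e2pi - 1))  # Σ_{n≥1} 1/(n^2 + 1)
partial = sum(1 / (n * n + 1) for n in range(1, 200001))
print(lhs_star(1.5), rhs_star(1.5))
print(closed, partial)
```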

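The celebrated Basel problem asked for the evaluation of
$$\sum_{n=1}^{\infty}\frac{1}{n^2},$$
which was finally solved by Euler, who showed that it equals $\pi^2/6$. This can be deduced from our result.
Indeed, from $(*)$, we have
$$\frac{1}{2c}\left(1 + \frac{2}{e^c - 1}\right) = \sum_{n=-\infty}^{\infty}\frac{1}{c^2 + 4\pi^2n^2} = \frac{1}{c^2} + 2\sum_{n=1}^{\infty}\frac{1}{c^2 + 4\pi^2n^2}.$$
We would like to set $c = 0$ in the last sum. Therefore
$$\frac{1}{2\pi^2}\sum_{n=1}^{\infty}\frac{1}{n^2} = \lim_{c\to 0}\left(\frac{1}{2c}\left(1 + \frac{2}{e^c - 1}\right) - \frac{1}{c^2}\right).$$
Now
$$e^c - 1 = c + \frac{c^2}{2!} + \frac{c^3}{3!} + \cdots,$$
so that
$$\frac{1}{2c}\left(1 + \frac{2}{c\left(1 + \frac{c}{2} + \frac{c^2}{6} + \cdots\right)}\right) - \frac{1}{c^2} = \frac{1}{2c} + \frac{1}{c^2}\left(1 - \left(\frac{c}{2} + \frac{c^2}{6} + \cdots\right) + \left(\frac{c}{2} + \frac{c^2}{6} + \cdots\right)^2 - \cdots\right) - \frac{1}{c^2},$$
which simplifies to
$$\frac{1}{2c} - \frac{1}{2c} - \frac{1}{6} + \frac{1}{4} + O(c) = \frac{1}{12} + O(c).$$
Therefore,
$$\sum_{n=1}^{\infty}\frac{1}{n^2} = 2\pi^2\cdot\frac{1}{12} = \frac{\pi^2}{6}.$$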
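The limit in this computation can also be approached numerically. This sketch evaluates $\frac{1}{2c}\left(1 + \frac{2}{e^c - 1}\right) - \frac{1}{c^2}$ for decreasing $c$; it uses `math.expm1` to compute $e^c - 1$ accurately for small $c$, and stops at $c = 0.01$ because the final subtraction still cancels the large leading terms.

```python
import math


def basel_expression(c):
    """(1/2c)(1 + 2/(e^c - 1)) - 1/c^2, which tends to 1/12 as c -> 0."""
    return (1 + 2 / math.expm1(c)) / (2 * c) - 1 / (c * c)


for c in (1.0, 0.1, 0.01):
    print(c, basel_expression(c), 2 * math.pi ** 2 * basel_expression(c))
print(math.pi ** 2 / 6, sum(1 / n ** 2 for n in range(1, 100001)))
```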

The method can be adapted to deduce that
$$\sum_{n=1}^{\infty}\frac{1}{n^{2k}} \in \pi^{2k}\mathbb{Q},$$
which is a celebrated theorem of Euler.


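Another application of the Poisson summation formula is due to Riemann. Applying it to the function $F(t) = e^{-\pi t^2}$, we easily see that if
$$f(t) = e^{-\pi(a + t/\sqrt{x})^2},$$
then
$$\hat f(t) = \sqrt{x}\,e^{2\pi i a t\sqrt{x}}\cdot e^{-\pi t^2 x}.$$
Therefore,
$$\sum_{n=-\infty}^{\infty} e^{-\pi(a + n/\sqrt{x})^2} = \sqrt{x}\sum_{n=-\infty}^{\infty} e^{-\pi n^2 x + 2\pi i a n\sqrt{x}}. \qquad (3.6)$$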
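Identity (3.6) can be spot-checked numerically for particular values of $a$ and $x > 0$ (our choices below). The left side is real, and on the right the terms for $n$ and $-n$ are complex conjugates, so the imaginary part should vanish up to rounding.

```python
import cmath
import math


def theta_sides(a, x, terms=40):
    """Both sides of (3.6): Σ exp(-π(a + n/√x)^2) and √x Σ exp(-πn^2 x + 2πian√x)."""
    r = math.sqrt(x)
    lhs = sum(math.exp(-math.pi * (a + n / r) ** 2) for n in range(-terms, terms + 1))
    rhs = r * sum(
        cmath.exp(-math.pi * n * n * x + 2j * math.pi * a * n * r) for n in range(-terms, terms + 1)
    )
    return lhs, rhs


lhs_a, rhs_a = theta_sides(0.3, 2.0)
print(lhs_a, rhs_a)
```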

Riemann used this identity to derive the analytic continuation and functional equation for his famous zeta function:
$$\zeta(s) = \sum_{n=1}^{\infty}\frac{1}{n^s}.$$

Riemann’s insight was to study $\zeta(s)$ as a function of a complex variable $s$ and relate this study to the distribution of prime numbers. We will discuss this in more detail in the next chapter.
In this connection, it is useful to mention the heat equation: let $x \in \mathbb{R}/\mathbb{Z}$ (think of a circular ring) and let $u(x, t)$ denote the temperature at $x$ at time $t$. It is known that $u(x, t)$ satisfies
$$\frac{\partial u}{\partial t} = \kappa\frac{\partial^2 u}{\partial x^2},$$
where $\kappa$ is a constant (called the conductivity of the ring’s material).
Fourier solved this equation by the method of separation of variables, by assuming that
$$u(x, t) = A(x)B(t).$$
Then the heat equation reduces to
$$A(x)B'(t) = \kappa A''(x)B(t),$$
so that
$$\kappa\frac{A''(x)}{A(x)} = \frac{B'(t)}{B(t)} = \lambda = \text{constant},$$
since the left hand side is a function of $x$ alone and the right hand side is a function of $t$ alone.
We therefore get two ordinary differential equations, and they can be solved by classical methods. Since $A$ has period 1, the solutions for $A(x)$ are of the form
$$A(x) = e^{2\pi i n x}, \qquad n \in \mathbb{Z},$$
so that $\lambda = -4\pi^2 n^2\kappa$. For $B(t)$, we therefore get $B(t) = e^{-4\pi^2\kappa n^2 t}$; normalizing $\kappa = 1$ (by rescaling time), this is
$$B(t) = e^{-4\pi^2 n^2 t}.$$
The general solution is therefore a linear combination of such functions:
$$u(x, t) = \sum_{n=-\infty}^{\infty} a_n e^{-4\pi^2 n^2 t}e^{2\pi i n x}.$$
In this way, we see that the general theta function appears as the solution to the heat equation.
If, in addition, we have an initial condition
$$u(x, 0) = f(x)$$
for some continuous function $f(x)$, then
$$u(x, 0) = \sum_{n=-\infty}^{\infty} a_n e^{2\pi i n x}$$
must be the Fourier series of $f$, so that
$$a_n = \int_0^1 f(x)e^{-2\pi i n x}\,dx = \hat f(n).$$
This suggests that we put
$$\theta_t(x) = \sum_{n\in\mathbb{Z}} e^{-4\pi^2 n^2 t}e^{2\pi i n x}.$$
Then $e^{-4\pi^2 n^2 t}$ is the $n$-th Fourier coefficient of $\theta_t(x)$, and $a_n e^{-4\pi^2 n^2 t}$ is the $n$-th Fourier coefficient of $u(x, t)$. As $a_n = \hat f(n)$ and $e^{-4\pi^2 n^2 t} = \hat\theta_t(n)$, and the Fourier coefficients of a convolution are the products of the Fourier coefficients, we see that
$$u(x, t) = (f * \theta_t)(x).$$
In other words, the $\theta$-function enters in a fundamental way to resolve the general heat equation.
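The formula $u = f * \theta_t$ translates directly into a spectral method. In this sketch (our illustration, with $\kappa = 1$ as above), the Fourier coefficients of a hypothetical initial temperature $f$ are computed from equispaced samples, damped by $e^{-4\pi^2 n^2 t}$, and resummed; a finite-difference residual of $u_t - u_{xx}$ then checks the heat equation at one sample point. Because $f$ is a trigonometric polynomial, the sampled coefficients are exact.

```python
import cmath
import math


def heat_solution(f, t, m=64, kmax=20):
    """Return x -> u(x, t) = Σ a_n e^{-4π^2 n^2 t} e^{2πinx}, with a_n from m samples of f (κ = 1)."""
    coeffs = {
        n: sum(f(j / m) * cmath.exp(-2j * math.pi * n * j / m) for j in range(m)) / m
        for n in range(-kmax, kmax + 1)
    }

    def u(x):
        return sum(
            a * math.exp(-4 * math.pi ** 2 * n * n * t) * cmath.exp(2j * math.pi * n * x)
            for n, a in coeffs.items()
        ).real

    return u


# Hypothetical initial temperature on the ring R/Z.
f = lambda x: math.cos(2 * math.pi * x) + 0.5 * math.sin(6 * math.pi * x)


def pde_residual(x, t, h=1e-3, dt=1e-6):
    """Central-difference estimate of u_t - u_xx at (x, t), returned together with u_xx."""
    u = lambda s, tau: heat_solution(f, tau)(s)
    u_t = (u(x, t + dt) - u(x, t - dt)) / (2 * dt)
    u_xx = (u(x + h, t) - 2 * u(x, t) + u(x - h, t)) / h ** 2
    return u_t - u_xx, u_xx


print(pde_residual(0.2, 0.001))
```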

Exercises

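1. Let
$$g(x) = \begin{cases} 1 - |x| & \text{if } |x| \le 1,\\ 0 & \text{otherwise.}\end{cases}$$
Show that
$$\hat g(x) = \left(\frac{\sin\pi x}{\pi x}\right)^2.$$

2. Apply Poisson’s summation formula to the function $g$ in the previous exercise to deduce that
$$\sum_{n=-\infty}^{\infty}\frac{1}{(n + x)^2} = \frac{\pi^2}{(\sin\pi x)^2},$$
whenever $x$ is real and not an integer.

3. Show that
$$\sum_{n=-\infty}^{\infty}\frac{1}{n + x} = \frac{\pi}{\tan\pi x}$$
when $x$ is real and not an integer (the sum being interpreted as $\lim_{N\to\infty}\sum_{n=-N}^{N}$).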

3.15 A Fourier Analytic Proof of the Central Limit Theorem

The theory of characteristic functions in probability theory is really the theory of Fourier transforms reformulated in the vernacular of that theory. The central limit theorem plays a pivotal role in probability and statistics. After a long and sinuous journey spanning two centuries, the theorem reached its culmination in the middle of the twentieth century. The purpose of this section is to highlight that the central limit theorem is indeed a theorem about Fourier transforms. We assume the reader has a minimal background in probability theory so as to hasten our discussion. The prerequisites are not formidable, and we collect the basic ideas in the exercises at the end of this section; the reader may consult these in order to follow the narrative. However, for the sake of clarity, we summarize the dictionary of terms in the language of measure theory, which is presumably familiar to the reader of this text.
In measure theory, we have a measure space $\Omega$ and measurable sets $A$. In probability theory, we also have a measure space $\Omega$, often called the sample space, with total measure 1. Measurable sets $A$ of $\Omega$ are called events. The measure of $A$, denoted $P(A)$, is called the probability of the event $A$. A measurable real-valued function on $\Omega$ is called a random variable. Given a random variable $X$, its probability distribution function is given by
$$F_X(x) := P(X \le x).$$
If we can write this as
$$F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt,$$
we call $f_X$ the density function of $X$. Clearly,
$$\int_{-\infty}^{\infty} f_X(x)\,dx = 1.$$
We say $X$ and $Y$ are independent if
$$P(X \le a,\ Y \le b) = P(X \le a)P(Y \le b) \quad\text{for all } a, b \in \mathbb{R}.$$
If $X$ and $Y$ are independent and have density functions $f_X$ and $f_Y$, then $X + Y$ has density function $f_X * f_Y$ (see exercises).
We say $X$ and $Y$ are identically distributed if $f_X = f_Y$. Given a random variable $X$, its expectation, denoted $E(X)$, is defined as
$$E(X) := \int_{-\infty}^{\infty} x f_X(x)\,dx.$$
Sometimes, one denotes $E(X)$ as $\mu$ and calls it the mean of $X$. The variance of $X$, denoted $\operatorname{var}(X)$, is $E((X - \mu)^2)$.
With this terminology and understanding, we are now ready to state and prove the central limit theorem.

Theorem 3.16 (The central limit theorem) Let $X_1, X_2, \ldots$ be independent and identically distributed random variables with $E(X_i) = 0$ and $\operatorname{var}(X_i) = 1$. Let $S_n = X_1 + X_2 + \cdots + X_n$. Then
$$\lim_{n\to\infty} P(\alpha\sqrt{n} \le S_n \le \beta\sqrt{n}) = \frac{1}{\sqrt{2\pi}}\int_\alpha^\beta e^{-t^2/2}\,dt.$$

Proof We want to determine the density function of $S_n/\sqrt{n}$. Recall that if $X$ has density $f_X$ and $Y$ has density $f_Y$, then $X + Y$ has density $f_X * f_Y$, provided $X$ and $Y$ are independent. Since all the $X_i$’s are identically distributed with density function $f$ (say) and the $X_i$’s are independent, $S_n$ has density function given by
$$f^{*n} := \underbrace{f * \cdots * f}_{n \text{ times}}.$$
Now if $X$ has density $f(t)$, then $\lambda X$ has density $\lambda^{-1}f(t/\lambda)$ (Exercise). Therefore $S_n/\sqrt{n}$ has density $g^{*n}(t)$, where $g(t) = \sqrt{n}\,f(\sqrt{n}\,t)$. We want to show
$$\lim_{n\to\infty} g^{*n}(t) = \frac{1}{\sqrt{2\pi}}e^{-t^2/2}.$$
Taking Fourier transforms of both sides (here with $\hat f(x) = \int_{-\infty}^{\infty} f(t)e^{-ixt}\,dt$), this is equivalent to
$$\hat g(t)^n \to e^{-t^2/2}.$$
But $\hat g(t) = \hat f(t/\sqrt{n})$. Taking the Taylor expansion, we have
$$\hat f(t/\sqrt{n}) = \hat f(0) + \hat f'(0)\frac{t}{\sqrt{n}} + \hat f''(0)\frac{t^2}{2n} + O(1/n^{3/2})$$
(the error term is $O(1/n^{3/2})$ if, say, $E|X_i|^3 < \infty$; in general it is $o(1/n)$, which suffices for what follows).
Now
$$\hat f(0) = \int_{-\infty}^{\infty} f(t)\,dt = 1,$$
since $f$ is a probability density function. Also,
$$\hat f'(0) = \left.\int_{-\infty}^{\infty} -it f(t)e^{-ixt}\,dt\right|_{x=0} = -i\int_{-\infty}^{\infty} t f(t)\,dt = -iE(X) = 0$$
with $X = X_i$, by our hypothesis. Moreover, for $X = X_i$,
$$\hat f''(0) = \left.-\int_{-\infty}^{\infty} t^2 f(t)e^{-ixt}\,dt\right|_{x=0} = -\int_{-\infty}^{\infty} t^2 f(t)\,dt = -\operatorname{var}(X) = -1.$$
Thus, our Taylor expansion simplifies to
$$1 - \frac{t^2}{2n} + O(1/n^{3/2}).$$
By basic calculus, we immediately see that
$$\lim_{n\to\infty}\left(1 - \frac{t^2}{2n} + O\left(\frac{1}{n^{3/2}}\right)\right)^n = e^{-t^2/2}.$$
This completes the proof. □
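The key step of the proof, $\hat f(t/\sqrt{n})^n \to e^{-t^2/2}$, can be observed for a concrete density. As our own example, we take the uniform density on $[-\sqrt3, \sqrt3]$, which has mean 0 and variance 1 and Fourier transform $\hat f(t) = \sin(\sqrt3\,t)/(\sqrt3\,t)$ in the convention used above; since this density is even, the sign convention in the exponential does not matter. The value $t = 1.5$ is arbitrary.

```python
import math


def uniform_char(t):
    """Fourier transform of the uniform density on [-√3, √3]: sin(√3 t)/(√3 t)."""
    if t == 0:
        return 1.0
    s = math.sqrt(3) * t
    return math.sin(s) / s


def clt_gap(t, n):
    """|f^(t/√n)^n - e^{-t^2/2}| for this density."""
    return abs(uniform_char(t / math.sqrt(n)) ** n - math.exp(-t * t / 2))


for n in (1, 10, 100, 1000, 10000):
    print(n, clt_gap(1.5, n))
```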

Exercises

1. If $X$ and $Y$ are independent random variables with density functions $f_X$ and $f_Y$, respectively, show that $X + Y$ has density function $f_X * f_Y$.
2. Show that if $X$ has density $f(t)$ and $\lambda > 0$, then $\lambda X$ has density $\lambda^{-1}f(t/\lambda)$.
3. Prove that
$$\lim_{n\to\infty}\sum_{|k - np| \le a\sqrt{np(1-p)}}\binom{n}{k}p^k(1 - p)^{n-k} = \frac{1}{\sqrt{2\pi}}\int_{-a}^{a} e^{-t^2/2}\,dt.$$
[Hint: choose the random variables to be Bernoulli trials with $\Omega = \{H, T\}$ and independent random variables $X_i$ such that $P(X_i = H) = p$, and apply the central limit theorem.]
4. Prove that
$$\lim_{n\to\infty} e^{-n}\sum_{k=0}^{n}\frac{n^k}{k!} = \frac{1}{2}.$$
[Hint: consider $X_1, \ldots, X_n$ independent random variables with the Poisson distribution with parameter 1.]

References

1. M. Ram Murty, A simple proof that ζ(2) = π²/6. Math. Student 88, 113–115 (2019)
2. W. Rudin, Real and Complex Analysis, 3rd edn. (McGraw-Hill, New York, 1987)
Chapter 4
Complex Analysis

4.1 Basic Definitions

Let $\Omega$ be an open set of the complex plane $\mathbb{C}$. Suppose that $f : \Omega \to \mathbb{C}$ is differentiable. That is, the limit
$$\lim_{z\to z_0}\frac{f(z) - f(z_0)}{z - z_0}$$
exists for every $z_0 \in \Omega$. Writing $z = x + iy$, we can decompose $f(z)$ as
$$u(x, y) + iv(x, y),$$
where $u(x, y)$ and $v(x, y)$ are real-valued functions of $x$ and $y$.


Restricting the limiting process to the two orthogonal directions, namely the $x$ and $y$ directions, we get
$$\lim_{x\to x_0}\frac{f(x + iy_0) - f(x_0 + iy_0)}{x - x_0} = \lim_{y\to y_0}\frac{f(x_0 + iy) - f(x_0 + iy_0)}{i(y - y_0)}.$$
The first limit is
$$\lim_{x\to x_0}\frac{u(x, y_0) + iv(x, y_0) - u(x_0, y_0) - iv(x_0, y_0)}{x - x_0} = \left.\left(\frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x}\right)\right|_{x=x_0,\,y=y_0}.$$
The second limit is
$$-i\lim_{y\to y_0}\frac{u(x_0, y) + iv(x_0, y) - u(x_0, y_0) - iv(x_0, y_0)}{y - y_0} = \left.\left(-i\frac{\partial u}{\partial y} + \frac{\partial v}{\partial y}\right)\right|_{x=x_0,\,y=y_0}.$$
As these two limits are equal, we have

© Hindustan Book Agency 2022
M. R. Murty, A Second Course in Analysis, IMSc Lecture Notes in Mathematics,
https://doi.org/10.1007/978-981-16-7246-0_4

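$$\frac{\partial v}{\partial x} = -\frac{\partial u}{\partial y}, \qquad \frac{\partial u}{\partial x} = \frac{\partial v}{\partial y}.$$
These are called the Cauchy–Riemann equations. We have shown that if $f$ is differentiable in an open set $\Omega \subseteq \mathbb{C}$, then $f$ must satisfy the Cauchy–Riemann equations.
The converse is also essentially true. If $\frac{\partial u}{\partial x}, \frac{\partial v}{\partial x}, \frac{\partial u}{\partial y}, \frac{\partial v}{\partial y}$ exist, are continuous on $\Omega$ and satisfy the Cauchy–Riemann equations, then $f$ is differentiable on $\Omega$. This is easily seen and left as an exercise.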
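The Cauchy–Riemann equations can be tested numerically with central differences. For $f = u + iv$, the difference quotient in the $x$-direction estimates $u_x + iv_x$ and the one in the $y$-direction estimates $u_y + iv_y$; the sketch returns $(u_x - v_y,\ v_x + u_y)$, which should vanish for a holomorphic $f$. The step `h` and the point `z0` are arbitrary choices, and this is a numerical illustration rather than a proof.

```python
def cr_defect(f, z0, h=1e-6):
    """Central-difference estimates of (u_x - v_y, v_x + u_y) at z0 for f = u + iv."""
    fx = (f(z0 + h) - f(z0 - h)) / (2 * h)            # ≈ u_x + i v_x
    fy = (f(z0 + 1j * h) - f(z0 - 1j * h)) / (2 * h)  # ≈ u_y + i v_y
    return fx.real - fy.imag, fx.imag + fy.real


z0 = 0.7 - 0.4j
print(cr_defect(lambda z: z ** 3, z0))          # both components close to 0
print(cr_defect(lambda z: z.conjugate(), z0))   # approximately (2, 0): the equations fail
```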
The reader should understand that although the definition of differentiability of $f$ is similar to the usual derivative of a function of a real variable, the complex variable case is much richer. This is underscored by the Cauchy–Riemann equations. For instance, we see that for an analytic function $f(z) = u(x, y) + iv(x, y)$, the functions $u(x, y)$ and $v(x, y)$ of real variables satisfy Laplace’s equation:
$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0 \quad\text{and}\quad \frac{\partial^2 v}{\partial x^2} + \frac{\partial^2 v}{\partial y^2} = 0.$$
Functions of two real variables that satisfy Laplace’s equation are called harmonic. Thus, the real and imaginary parts of a holomorphic function are harmonic via the Cauchy–Riemann equations. Later, we will see that if $f$ is analytic, $u$ and $v$ are, in addition, infinitely differentiable, which justifies differentiating the Cauchy–Riemann equations in this computation.

There are many synonyms for the property of differentiability of a function $f$ defined on an open set in the complex plane. One says $f$ is holomorphic, or analytic, or regular, and these words are often used interchangeably.
Given an open set $\Omega$ of $\mathbb{C}$, we let $H(\Omega)$ be the set of all holomorphic functions on $\Omega$. The set $H(\Omega)$ forms a ring under pointwise addition and multiplication. Moreover, if $f \in H(\Omega)$ and $\Omega_1$ is an open set containing $f(\Omega)$, then letting $g \in H(\Omega_1)$, we may consider
$$h = g \circ f.$$
We can easily see that $h \in H(\Omega)$ and that the chain rule holds:
$$h' = (g' \circ f)f'.$$
We begin our study of holomorphic functions by first considering power series. A power series is a series of the form
$$\sum_{n=0}^{\infty} c_n(z - z_0)^n,$$
where the $c_n$’s are complex numbers.


For such a series, there is a unique number $R \in [0, \infty]$ such that the series converges absolutely in the open disk $B(z_0, R)$, where
$$B(z_0, r) = \{z : |z - z_0| < r\},$$
converges uniformly on every closed disk $\{z : |z - z_0| \le r\}$ with $r < R$, and diverges if $|z - z_0| > R$. $R$ is called the radius of convergence. By the classical theory of power series, we have various formulas for $R$:
$$\frac{1}{R} = \lim_{n\to\infty}\left|\frac{c_{n+1}}{c_n}\right|$$
whenever this limit exists, or, in general,
$$\frac{1}{R} = \limsup_{n\to\infty}|c_n|^{1/n},$$
the former using the ratio test and the latter from the root test for convergence of series.
A function $f : \Omega \to \mathbb{C}$ is representable by a power series in $\Omega$ if for every open ball $B(z_0, r) \subseteq \Omega$ there is a power series such that
$$f(z) = \sum_{n=0}^{\infty} c_n(z - z_0)^n$$
for all $z \in B(z_0, r)$.
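Both formulas for $R$ can be watched converging for a concrete series. The coefficients $c_n = n^2/3^n$ below are a hypothetical example with $R = 3$; logarithms are used so that $|c_n|$ never underflows.

```python
import math


def log_abs_c(n):
    """log |c_n| for the illustrative coefficients c_n = n^2 / 3^n."""
    return 2 * math.log(n) - n * math.log(3)


def root_test(n):
    """|c_n|^{1/n}, whose lim sup is 1/R."""
    return math.exp(log_abs_c(n) / n)


def ratio_test(n):
    """|c_{n+1}/c_n|, whose limit (when it exists) is also 1/R."""
    return math.exp(log_abs_c(n + 1) - log_abs_c(n))


for n in (10, 100, 1000, 10000):
    print(n, root_test(n), ratio_test(n))
```

Both columns approach $1/3$, the root-test values more slowly because of the factor $(n^2)^{1/n}$.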


Theorem 4.1 If $f$ is representable as a power series in $\Omega$, then $f \in H(\Omega)$, and $f'$ is also representable as a power series in $\Omega$ and hence is holomorphic. More precisely, if
$$f(z) = \sum_{n=0}^{\infty} c_n(z - z_0)^n$$
for $z \in B(z_0, r)$, then
$$f'(z) = \sum_{n=1}^{\infty} nc_n(z - z_0)^{n-1}$$
for all $z \in B(z_0, r)$.


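Proof Without any loss of generality, we may suppose $z_0 = 0$.
Fix $w \in B(0, r) \subseteq \Omega$. Choose $\rho$ such that
$$|w| < \rho < r.$$
Let
$$g(z) = \sum_{n=1}^{\infty} nc_n z^{n-1}.$$
Since $n^{1/n} \to 1$, the root test shows that $g$ also converges for all $z \in B(0, r)$.
Now consider
$$\frac{f(z) - f(w)}{z - w} - g(w) = \sum_{n=1}^{\infty} c_n\left(\frac{z^n - w^n}{z - w} - nw^{n-1}\right)$$
for $z \ne w$. Note that
$$\frac{z^n - w^n}{z - w} - nw^{n-1} = (z - w)\sum_{k=1}^{n-1} kw^{k-1}z^{n-k-1}$$
for all $n \ge 2$ (the term with $n = 1$ vanishes). To see this, we observe that
$$\lim_{z\to w}\left(\frac{z^n - w^n}{z - w} - nw^{n-1}\right) = 0,$$
so that $z - w$ is a factor of the left-hand side, which is a polynomial in $z$ and $w$. Thus, we may write
$$\frac{z^n - w^n}{z - w} - nw^{n-1} = (z - w)G(z, w),$$
say. Changing $z$ to $\lambda z$ and $w$ to $\lambda w$ shows that
$$G(\lambda z, \lambda w) = \lambda^{n-2}G(z, w).$$
In other words, $G(z, w)$ is a homogeneous polynomial of degree $n - 2$, so we can write
$$G(z, w) = \sum_{i+j=n-2} c_{ij}z^iw^j = \sum_{k=0}^{n-2}\alpha_k z^kw^{n-2-k}.$$
But
$$\frac{z^n - w^n}{z - w} - nw^{n-1} = \sum_{k=0}^{n-1} z^kw^{n-1-k} - nw^{n-1}.$$
We now only need to compare coefficients with
$$(z - w)G(z, w) = \sum_{k=0}^{n-2}\alpha_k z^{k+1}w^{n-2-k} - \sum_{k=0}^{n-2}\alpha_k z^kw^{n-1-k}.$$
Comparing the coefficients of $z^{n-1}$, of $z^iw^{n-1-i}$ for $1 \le i \le n - 2$, and of $w^{n-1}$, we get $\alpha_{n-2} = 1$, $\alpha_{i-1} - \alpha_i = 1$ for $1 \le i \le n - 2$, and finally $-\alpha_0 = 1 - n$. Hence $\alpha_k = n - 1 - k$, which, after replacing $k$ by $n - 1 - k$, is the claimed identity.
In any case, for $|z| < \rho$,
$$\left|\sum_{k=1}^{n-1} kw^{k-1}z^{n-k-1}\right| \le \sum_{k=1}^{n-1} k\rho^{n-2} = \frac{n(n-1)}{2}\rho^{n-2},$$
so that
$$\left|\frac{f(z) - f(w)}{z - w} - g(w)\right| \le |z - w|\sum_{n=2}^{\infty} n^2|c_n|\rho^{n-2}.$$
By the root test, the series on the right-hand side converges, since $\rho < r$. Therefore, we may take limits as $z \to w$ to get
$$f'(w) = g(w),$$
as desired. □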
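Theorem 4.1 can be illustrated with the geometric series $\sum z^n = 1/(1 - z)$ on $|z| < 1$. The sketch (our own example; the truncation at 200 terms, the point $w$ and the step $h$ are arbitrary choices) compares the termwise-differentiated series $g$ with a difference quotient and with the exact derivative $1/(1 - z)^2$.

```python
def series_value(coeffs, z):
    """Truncated power series Σ c_n z^n."""
    return sum(c * z ** n for n, c in enumerate(coeffs))


def termwise_derivative(coeffs, z):
    """Truncated Σ n c_n z^{n-1}, the series g of Theorem 4.1."""
    return sum(n * c * z ** (n - 1) for n, c in enumerate(coeffs) if n >= 1)


coeffs = [1.0] * 200          # geometric series, truncated
w = 0.3 + 0.4j                # |w| = 0.5
h = 1e-6
difference_quotient = (series_value(coeffs, w + h) - series_value(coeffs, w)) / h
print(termwise_derivative(coeffs, w), difference_quotient, 1 / (1 - w) ** 2)
```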

Applying the above theorem to $f'$, we deduce $f'' \in H(\Omega)$, and so on. Thus, if $f$ is representable as a power series, then $f$ has derivatives of all orders. We exploit this to prove:

Theorem 4.2 Suppose $\mu$ is a complex finite measure on a measure space $X$, $\varphi$ is a complex measurable function on $X$, and $\Omega \subseteq \mathbb{C}$ is an open set not intersecting $\varphi(X)$. Then
$$f(z) := \int_X\frac{d\mu(\zeta)}{\varphi(\zeta) - z}$$
is representable by a power series in $\Omega$.

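Proof Suppose $B(a, r) \subseteq \Omega$. Since $\Omega \cap \varphi(X) = \emptyset$, we have $|\varphi(\zeta) - a| \ge r$ for any $\zeta \in X$. Thus, for $z \in B(a, r)$, we have
$$|z - a| < r \le |\varphi(\zeta) - a|,$$
so that
$$\left|\frac{z - a}{\varphi(\zeta) - a}\right| \le \frac{|z - a|}{r} < 1.$$
Therefore, the geometric series
$$\frac{1}{\varphi(\zeta) - a}\sum_{n=0}^{\infty}\left(\frac{z - a}{\varphi(\zeta) - a}\right)^n$$
converges, uniformly in $\zeta \in X$, to
$$\frac{1}{\varphi(\zeta) - a}\cdot\frac{1}{1 - \frac{z - a}{\varphi(\zeta) - a}} = \frac{1}{\varphi(\zeta) - z}.$$
So by Fubini’s theorem,
$$f(z) = \int_X\frac{1}{\varphi(\zeta) - a}\sum_{n=0}^{\infty}\left(\frac{z - a}{\varphi(\zeta) - a}\right)^n d\mu(\zeta) = \sum_{n=0}^{\infty} c_n(z - a)^n,$$
where
$$c_n = \int_X\frac{d\mu(\zeta)}{(\varphi(\zeta) - a)^{n+1}}.$$
Note that, as $|\varphi(\zeta) - a| \ge r$, we have
$$|c_n| \le \frac{|\mu|(X)}{r^{n+1}},$$
where $|\mu|$ denotes the total variation of $\mu$, so that our series converges absolutely for $|z - a| < r$. □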
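A simple instance of Theorem 4.2 takes $\mu$ to be a finite combination of point masses, so that the integral becomes a finite sum. The weights and points below are hypothetical; the sketch compares $f(z)$ with the truncated power series $\sum c_n(z - a)^n$ and checks the bound $|c_n| \le |\mu|(X)/r^{n+1}$, with $r$ the distance from $a$ to the nearest point $\varphi(\zeta_j)$.

```python
# Hypothetical discrete measure μ = Σ_j w_j δ_{ζ_j}, with φ(ζ_j) = p_j.
weights = [1.0, 0.5 - 0.25j, -2.0j]
points = [3.0, -2.0 + 2.0j, 1.0 - 3.5j]


def f(z):
    """f(z) = ∫ dμ(ζ) / (φ(ζ) - z) for the discrete measure above."""
    return sum(w / (p - z) for w, p in zip(weights, points))


def coefficient(n, a):
    """c_n = ∫ dμ(ζ) / (φ(ζ) - a)^{n+1}."""
    return sum(w / (p - a) ** (n + 1) for w, p in zip(weights, points))


def power_series(z, a, terms=200):
    """Truncated Σ c_n (z - a)^n, convergent for |z - a| < min_j |p_j - a|."""
    return sum(coefficient(n, a) * (z - a) ** n for n in range(terms))


a, z = 0.5 + 0.5j, 1.2 + 0.9j
print(f(z), power_series(z, a))
```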

Exercises
1. Let $f(z) = u(x, y) + iv(x, y)$ with $z = x + iy$. If $\frac{\partial u}{\partial x}, \frac{\partial v}{\partial x}, \frac{\partial u}{\partial y}, \frac{\partial v}{\partial y}$ exist, are continuous on $\Omega$ and satisfy the Cauchy–Riemann equations, then show that $f$ is differentiable on $\Omega$.
2. Show that the function $f(z) = \bar z$ is not analytic.
3. Verify the Cauchy–Riemann equations for $f(z) = z^3$.
4. For $z = x + iy$, define the differential operator
$$\frac{\partial}{\partial\bar z} := \frac{1}{2}\left(\frac{\partial}{\partial x} + i\frac{\partial}{\partial y}\right).$$
Show that $f(z)$ is analytic if and only if
$$\frac{\partial f}{\partial\bar z} = 0.$$
5. Show that $f(z) = e^z$ is analytic in $\mathbb{C}$.

6. Let $w \in \mathbb{C}$ be a fixed complex number with $|w| < 1$. Let
$$f(z) = \frac{z - w}{1 - \bar wz}.$$
Show that $f$ is regular in $|z| \le 1$ and calculate $f(w)$ and $f'(w)$.

7. Find the radius of convergence of
$$\sum_{n=1}^{\infty} n!\,\frac{z^n}{n^n}.$$
8. Prove the following discrete version of integration by parts (often referred to as partial summation): for any two sequences of numbers $a_n$, $b_n$,
$$S_N := \sum_{n=1}^{N} a_nb_n = a_NB_N - \sum_{n=1}^{N-1} B_n(a_{n+1} - a_n),$$
where
$$B_n = \sum_{j=1}^{n} b_j.$$
Using this result, show that the power series
$$\sum_{n=1}^{\infty}\frac{z^n}{n}$$
converges for every complex number $z$ with $|z| = 1$ and $z \ne 1$.

9. Show that the series
$$\sum_{n=1}^{\infty}\frac{z^n}{n(n + 1)}$$
converges for all $|z| \le 1$. What is its radius of convergence? Does the series
$$\sum_{n=1}^{\infty} z^n$$
converge for any $z$ with $|z| = 1$?

10. Let $n$ be a non-negative integer. Define, for $\alpha \in \mathbb{C}$, the binomial coefficient
$$\binom{\alpha}{n} = \frac{\alpha(\alpha - 1)\cdots(\alpha - n + 1)}{n!}.$$
Show that the series
$$\sum_{n=0}^{\infty}\binom{\alpha}{n}z^n$$
converges for $|z| < 1$.

11. Let $A$ be an open set in $\mathbb{C}$, and suppose that $f : A \to \mathbb{C}$ is analytic. If $z_0 \in A$ is such that $f'(z_0) \ne 0$, show that there are open neighborhoods $U$ of $z_0$ and $V$ of $f(z_0)$ such that $f : U \to V$ is a bijection. Moreover, $f^{-1}$ is analytic with
$$\frac{d}{dw}f^{-1}(w) = \frac{1}{f'(z)}, \quad\text{where } w = f(z).$$
(This is the complex analytic version of the inverse function theorem of real variables.) [Hint: Apply the inverse function theorem from real variables along with the Cauchy–Riemann equations.]

4.2 Integration over Paths

We have shown that any function representable by a power series in some open subset of $\mathbb{C}$ is holomorphic there. Our next goal is to show that any holomorphic function has a power series representation.
Definition 4.1 A curve in a topological space $X$ is a continuous map $\gamma : [\alpha, \beta] \to X$. We let $\gamma^*$ be the image of $\gamma$. If $X = \mathbb{C}$ and the curve $\gamma$ is piecewise $C^1$, then we call $\gamma$ a path. A closed path is a path $\gamma : [\alpha, \beta] \to X$ such that $\gamma(\alpha) = \gamma(\beta)$.
Definition 4.2 If $\gamma : [\alpha, \beta] \to \mathbb{C}$ is a path and $f$ is a complex-valued function continuous on $\gamma^*$, we define the path integral of $f$ as
$$\int_\gamma f = \int_\gamma f(z)\,dz = \int_\alpha^\beta f(\gamma(t))\gamma'(t)\,dt.$$
Note that
$$\left|\int_\gamma f\right| \le \max_{z\in\gamma^*}|f(z)|\int_\alpha^\beta|\gamma'(t)|\,dt,$$
where the last integral is the length of the path.
For any $a, b \in \mathbb{C}$, we denote by $[a, b]$ the segment parametrized by $\gamma : [0, 1] \to \mathbb{C}$, $\gamma(t) = a + (b - a)t$. Thus, for a triangle $\Delta$ with vertices $(a, b, c) \in \mathbb{C}^3$ and orientation of the boundary $\partial\Delta$ given by the order of the vertices, we have
$$\int_{\partial\Delta} f = \int_{[a,b]} f + \int_{[b,c]} f + \int_{[c,a]} f.$$
Note that this integral is invariant under cyclic permutations of the vertices $(a, b, c)$.
Throughout, $\Omega$ is an open set in $\mathbb{C}$.
Definition 4.3 We define the index of $z$ with respect to a closed path $\gamma$ by
$$\mathrm{Ind}_\gamma(z) := \frac{1}{2\pi i}\int_\gamma \frac{dw}{w - z}$$
for $z \in \Omega := \mathbb{C} \setminus \gamma^*$.
We will show that this is always an integer. Intuitively, the index $\mathrm{Ind}_\gamma(z)$ measures how many times $\gamma$ "winds around" $z$; for this reason it is also called the winding number. Note that $\gamma^*$ is compact and hence lies in a bounded disk $D$. The complement $D^c$ is connected and hence lies in precisely one connected component of $\Omega$, which is therefore the unique unbounded component of $\Omega$.
Theorem 4.3 Let $\gamma$ be a closed path, and let $\Omega$ be the complement of $\gamma^*$. Then $\mathrm{Ind}_\gamma(z) \in \mathbb{Z}$ for all $z \in \Omega$, and $\mathrm{Ind}_\gamma$ is constant on each connected component of $\Omega$. Moreover, the index is zero on the unbounded component of $\Omega$.
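Proof By our previous theorem, $\mathrm{Ind}_\gamma(z)$ is holomorphic in $\Omega$ (as it is representable as a power series) and hence continuous. A continuous integer-valued function on a connected set is constant, so constancy on each connected component will follow once we show that $\mathrm{Ind}_\gamma$ is integer-valued.
Let $\gamma$ be defined on $[\alpha, \beta]$ and fix $z \in \Omega$. Then
$$\mathrm{Ind}_\gamma(z) = \frac{1}{2\pi i}\int_\alpha^\beta \frac{\gamma'(t)\,dt}{\gamma(t) - z}.$$
Observe that $\mathrm{Ind}_\gamma(z) \to 0$ as $|z| \to \infty$, since the integrand is bounded by $|\gamma'(t)|/\operatorname{dist}(z, \gamma^*)$. Hence, once we show that $\mathrm{Ind}_\gamma(z)$ is integer-valued, it will follow that it is zero on the unbounded component. To show that $\mathrm{Ind}_\gamma(z)$ is integer-valued, recall that
$$e^w = 1 \iff w = 2\pi i n \text{ for some integer } n.$$
To see this, write $w = u + iv$ with $u, v \in \mathbb{R}$. Then, as
$$e^w = e^u e^{iv} = e^u(\cos v + i\sin v),$$
we see that $e^w = 1$ implies $u = 0$ and $v \in 2\pi\mathbb{Z}$. It therefore suffices to show that $\exp\bigl(2\pi i\,\mathrm{Ind}_\gamma(z)\bigr) = 1$.
Let
$$\varphi(s) = \exp\left(\int_\alpha^s \frac{\gamma'(t)\,dt}{\gamma(t) - z}\right)$$
for $\alpha \le s \le \beta$, so that $\varphi(\beta) = \exp\bigl(2\pi i\,\mathrm{Ind}_\gamma(z)\bigr)$. Differentiating, we get
$$\varphi'(s) = \frac{\gamma'(s)}{\gamma(s) - z}\exp\left(\int_\alpha^s \frac{\gamma'(t)\,dt}{\gamma(t) - z}\right).$$
Thus,
$$\frac{\varphi'(s)}{\varphi(s)} = \frac{\gamma'(s)}{\gamma(s) - z},$$
which is valid except possibly on a finite set $S$ where $\gamma$ is not differentiable. Everywhere else, $\varphi'(s)$ is continuous. Thus,
$$\varphi'(s) = \frac{\varphi(s)\,\gamma'(s)}{\gamma(s) - z} \qquad (*)$$
on $[\alpha, \beta] \setminus S$. Now,
$$\frac{d}{ds}\left(\frac{\varphi(s)}{\gamma(s) - z}\right) = \frac{(\gamma(s) - z)\varphi'(s) - \varphi(s)\gamma'(s)}{(\gamma(s) - z)^2} = 0$$
by virtue of $(*)$. Thus $\varphi(s)/(\gamma(s) - z)$ is locally constant on $[\alpha, \beta] \setminus S$. But this function is continuous on all of $[\alpha, \beta]$ because $z \notin \gamma^*$, and $S$ is finite, so it is constant on $[\alpha, \beta]$. Thus,
$$\frac{\varphi(s)}{\gamma(s) - z} = \frac{\varphi(\alpha)}{\gamma(\alpha) - z} = \frac{1}{\gamma(\alpha) - z},$$
as $\varphi(\alpha) = 1$. Therefore,
$$\varphi(s) = \frac{\gamma(s) - z}{\gamma(\alpha) - z}.$$
But $\gamma$ is closed, so $\gamma(\alpha) = \gamma(\beta)$. Thus $\varphi(\beta) = \varphi(\alpha) = 1$, that is, $\exp\bigl(2\pi i\,\mathrm{Ind}_\gamma(z)\bigr) = 1$. Therefore, $\mathrm{Ind}_\gamma(z)$ is integer-valued.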

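Corollary 4.2.1 If $\gamma$ is a positively oriented circle with center $a$ and radius $r$, then
$$\mathrm{Ind}_\gamma(z) = \begin{cases} 1, & \text{if } |z - a| < r; \\ 0, & \text{if } |z - a| > r. \end{cases}$$
Proof We need only show that $\mathrm{Ind}_\gamma(z) = 1$ for $|z - a| < r$. Parametrize $\gamma : [0, 2\pi] \to \mathbb{C}$ by $\gamma(\theta) = a + re^{i\theta}$. Then, as $\mathrm{Ind}_\gamma(z)$ is constant for $|z - a| < r$,
$$\mathrm{Ind}_\gamma(z) = \mathrm{Ind}_\gamma(a) = \frac{1}{2\pi i}\int_0^{2\pi} \frac{ire^{i\theta}\,d\theta}{a + re^{i\theta} - a} = \frac{1}{2\pi i}\int_0^{2\pi} i\,d\theta = 1.$$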
The corollary makes complete sense from our intuitive understanding of the index
as a winding number.
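As a sanity check on Theorem 4.3 and Corollary 4.2.1, the index can be approximated numerically. This Python sketch (the helper `winding_number` and the trapezoidal discretization are illustrative assumptions, not from the text) evaluates $\frac{1}{2\pi i}\int_\alpha^\beta \gamma'(t)/(\gamma(t) - z)\,dt$ for a circle and for a circle traversed twice.

```python
import cmath
import math


def winding_number(gamma, dgamma, z, a=0.0, b=2 * math.pi, n=4000):
    """Approximate Ind_gamma(z) = (1/(2*pi*i)) * integral of gamma'(t) / (gamma(t) - z) dt
    over [a, b] by the composite trapezoidal rule (z must not lie on the path)."""
    h = (b - a) / n
    total = 0j
    for k in range(n + 1):
        t = a + k * h
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * dgamma(t) / (gamma(t) - z)
    return total * h / (2j * math.pi)


# Positively oriented circle with centre a0 and radius r, as in Corollary 4.2.1.
a0, r = 1 + 1j, 2.0
circle = lambda t: a0 + r * cmath.exp(1j * t)
dcircle = lambda t: 1j * r * cmath.exp(1j * t)

# The unit circle traversed twice: gamma(t) = e^{2it}, 0 <= t <= 2*pi.
twice = lambda t: cmath.exp(2j * t)
dtwice = lambda t: 2j * cmath.exp(2j * t)

inside = winding_number(circle, dcircle, 1.5 + 1j)
outside = winding_number(circle, dcircle, 5 + 0j)
double = winding_number(twice, dtwice, 0.2j)
print(round(inside.real), round(outside.real), round(double.real))  # prints: 1 0 2
```

The trapezoidal rule is used because the integrand is smooth and periodic in $t$, for which it converges very quickly.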

Exercises

1. Let $\gamma$ be a path. Show that the function of $\alpha$ defined by
$$\alpha \mapsto \int_\gamma \frac{dz}{z - \alpha},$$
for $\alpha$ not on the path, is a continuous function of $\alpha$.
2. If $\gamma$ is a closed path and $z \notin \gamma^*$, show that
$$\frac{1}{2\pi i}\int_\gamma \frac{w\,dw}{w - z} = z\,\mathrm{Ind}_\gamma(z).$$

3. Consider the closed curve $\gamma$ given by $\gamma(t) = \cos t + 3i\sin t$, $0 \le t \le 4\pi$. Show that $\mathrm{Ind}_\gamma(0) = 2$.
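4. If $\gamma$ is the unit circle oriented counterclockwise, show that
$$\int_\gamma \frac{\cos z}{z}\,dz = \int_\gamma \frac{\sin z}{z^2}\,dz = 2\pi i.$$
5. Prove that for a closed curve $\gamma$, we have $\mathrm{Ind}_\gamma(z) = -\mathrm{Ind}_{-\gamma}(z)$, where $-\gamma$ denotes $\gamma$ traversed in the opposite direction.
6. If $\gamma_1, \gamma_2$ are closed curves, show that $\mathrm{Ind}_{\gamma_1 + \gamma_2}(z) = \mathrm{Ind}_{\gamma_1}(z) + \mathrm{Ind}_{\gamma_2}(z)$.
7. Prove that
$$\int_0^\pi e^{\cos\theta}\cos(\sin\theta)\,d\theta = \pi$$
by considering
$$\int_\gamma \frac{e^z}{z}\,dz,$$
where $\gamma$ is the unit circle oriented counterclockwise.
8. Prove that if the image of a closed curve $\gamma$ lies in a simply connected region $A$ and if $z \notin A$, then $\mathrm{Ind}_\gamma(z) = 0$.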

4.3 The Local Cauchy Theorem

By a region, we mean a non-empty connected open subset of the complex plane.


The next theorem is a simple extension of the fundamental theorem of calculus to
holomorphic functions.

Theorem 4.4 If $F \in H(\Omega)$ and $F'$ is continuous in $\Omega$, then
$$\int_\gamma F'(z)\,dz = F(\gamma(\beta)) - F(\gamma(\alpha))$$
for every path $\gamma : [\alpha, \beta] \to \Omega$.

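Proof Since $(F \circ \gamma)'(t) = F'(\gamma(t))\gamma'(t)$ on each subinterval where $\gamma$ is $C^1$, the fundamental theorem of calculus gives
$$\int_\gamma F'(z)\,dz = \int_\alpha^\beta F'(\gamma(t))\,\gamma'(t)\,dt = F(\gamma(\beta)) - F(\gamma(\alpha)).$$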



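Corollary 4.3.1 If $F$ is as in the theorem, then
$$\int_\gamma F'(z)\,dz = 0$$
for every closed path $\gamma$ in $\Omega$.
Corollary 4.3.2 For any closed path $\gamma$,
$$\int_\gamma z^n\,dz = 0$$
for all $n \ge 0$; this also holds for $n \le -2$ if $0 \notin \gamma^*$. (Indeed, in both cases $z^n$ is the derivative of $z^{n+1}/(n+1)$.)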
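Corollary 4.3.2 can be observed numerically on a closed path that is not a circle. In this sketch (the helper names and the Simpson discretization are our own illustrative choices), the path is the boundary of the square with vertices $\pm 1 \pm i$, assembled from oriented segments $[a, b]$ as in the text. Only $n = -1$, the case the corollary excludes, gives a nonzero value, namely $2\pi i$.

```python
import math


def segment_integral(f, a, b, n=2000):
    """Composite Simpson rule for the integral of f over the oriented segment
    [a, b], parametrized by a + (b - a) t with 0 <= t <= 1 (n must be even)."""
    h = 1.0 / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + (b - a) * k * h)
    return total * h / 3 * (b - a)


def polygon_integral(f, vertices):
    """Integral of f over the closed polygon v0 -> v1 -> ... -> v0."""
    m = len(vertices)
    return sum(segment_integral(f, vertices[j], vertices[(j + 1) % m])
               for j in range(m))


# A square traversed counterclockwise around 0 (so 0 is not on the path).
square = [1 - 1j, 1 + 1j, -1 + 1j, -1 - 1j]
results = {n: polygon_integral(lambda z, n=n: z ** n, square)
           for n in (-3, -2, -1, 0, 1, 2, 5)}
for n, value in results.items():
    print(n, value)
```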

We are now ready to prove what is sometimes called the Cauchy–Goursat theorem:

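Theorem 4.5 (Cauchy's theorem for a triangle). Let $\Delta$ be a closed triangle in the open set $\Omega \subseteq \mathbb{C}$, and let $p \in \Omega$. Suppose $f$ is continuous on $\Omega$ and $f \in H(\Omega \setminus \{p\})$. Then
$$\int_{\partial\Delta} f(z)\,dz = 0.$$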

Proof First assume $p \notin \Delta$. Let $(a, b, c)$ be the oriented vertices of $\Delta$ (see Fig. 4.1), and let $a', b', c'$ be the midpoints of the sides opposite $a, b, c$, respectively (see Fig. 4.2). Now consider the four oriented triangles $\Delta_1, \Delta_2, \Delta_3, \Delta_4$ formed by joining these midpoints, with vertices $(a, c', b')$, $(b, a', c')$, $(c, b', a')$ and $(a', b', c')$, respectively. Then
$$J := \int_{\partial\Delta} f(z)\,dz = \sum_{j=1}^{4} \int_{\partial\Delta_j} f(z)\,dz.$$
The orientation is important, since the integrals over the "internal" segments cancel in pairs. For at least one of the four triangles, call it $\Delta^{(1)}$, we have
$$\left|\int_{\partial\Delta^{(1)}} f(z)\,dz\right| \ge \frac{|J|}{4}.$$

Fig. 4.1 Oriented triangle $abc$

Fig. 4.2 Oriented triangles $abc$ and $a'b'c'$

Repeating this process inductively, we get a sequence $\Delta^{(n)}$ of nested closed triangles such that the length of $\partial\Delta^{(n)}$ is $L/2^n$, where $L$ is the length of $\partial\Delta$. Moreover,
$$|J| \le 4^n \left|\int_{\partial\Delta^{(n)}} f(z)\,dz\right|.$$
Let $z_0 \in \bigcap_{n=1}^{\infty} \Delta^{(n)}$ (this intersection of nested compact sets is non-empty). We see that $z_0 \in \Delta$, so $z_0 \neq p$. Thus, for any $\varepsilon > 0$, there is an $r > 0$ such that
$$|f(z) - f(z_0) - f'(z_0)(z - z_0)| \le \varepsilon|z - z_0|$$
for $|z - z_0| < r$, because $f$ is holomorphic in $\Omega \setminus \{p\}$.
But for $n$ large, $|z - z_0| < r$ for all $z \in \Delta^{(n)}$. Moreover, we have $|z - z_0| \le L/2^n$ for $z \in \Delta^{(n)}$. By Corollary 4.3.2, applied to the polynomial $f(z_0) + f'(z_0)(z - z_0)$,
$$\int_{\partial\Delta^{(n)}} f(z)\,dz = \int_{\partial\Delta^{(n)}} \bigl[f(z) - f(z_0) - f'(z_0)(z - z_0)\bigr]\,dz,$$
so that
$$\left|\int_{\partial\Delta^{(n)}} f(z)\,dz\right| \le \varepsilon \int_{\partial\Delta^{(n)}} |z - z_0|\,|dz| \le \varepsilon \cdot \frac{L}{2^n} \cdot \frac{L}{2^n}.$$
Thus,
$$|J| \le 4^n \cdot \frac{\varepsilon L^2}{4^n} = \varepsilon L^2.$$
Since $\varepsilon$ was arbitrary, $J = 0$.
In the case that $p$ is a vertex of $\Delta$, choose two points of $\partial\Delta$, one on each side through $p$, very close to $p$, and join them with each other and with $p$ to form a small triangle (see Fig. 4.3).

Fig. 4.3 Oriented triangle $abp$

Fig. 4.4 Oriented triangle $abp$

Fig. 4.5 Case when $p$ lies inside the triangle

This splits $\Delta$ into a very small triangle containing the vertex $p$ and a larger quadrilateral region. This quadrilateral can be cut into two triangles, neither of which contains $p$ (see Fig. 4.4).
As before, the integral over $\partial\Delta$ is the sum of the integrals over the boundaries of these three oriented triangles. By our initial case, the integrals over the two triangles not containing $p$ are zero. So the integral over $\partial\Delta$ is simply the integral over the boundary of the remaining small triangle, which contains the vertex $p$. Since $f$ is continuous on $\Omega$ and hence bounded on $\Delta$, say by $M$, this integral is at most $M$ times the perimeter of the small triangle, which can be made arbitrarily small. Hence the integral is zero.
Finally, in the general case that $p$ lies in $\Delta$ but is not a vertex, join $p$ to the vertices of $\Delta$, so as to split $\Delta$ into three triangles, each having $p$ as a vertex (see Fig. 4.5). (If $p$ lies on a side of $\Delta$, one of these triangles is degenerate and its boundary integral is zero.)
By the previous case, the integral is again zero.
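The subdivision step of the proof can be checked numerically. The sketch below (helper names and Simpson's rule are illustrative choices, not from the text) splits a triangle into the four oriented triangles of the proof, using the non-holomorphic function $\bar z$ so that the integrals are not all zero.

```python
import cmath


def segment_integral(f, a, b, n=1000):
    """Composite Simpson rule for the integral of f over the oriented segment
    [a, b], parametrized by a + (b - a) t with 0 <= t <= 1 (n must be even)."""
    h = 1.0 / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + (b - a) * k * h)
    return total * h / 3 * (b - a)


def triangle_integral(f, a, b, c):
    """Integral of f over the oriented boundary [a, b] + [b, c] + [c, a]."""
    return (segment_integral(f, a, b) + segment_integral(f, b, c)
            + segment_integral(f, c, a))


def subdivide(a, b, c):
    """The four oriented triangles (a, c', b'), (b, a', c'), (c, b', a'),
    (a', b', c'), where a', b', c' are the midpoints opposite a, b, c."""
    a1, b1, c1 = (b + c) / 2, (c + a) / 2, (a + b) / 2
    return [(a, c1, b1), (b, a1, c1), (c, b1, a1), (a1, b1, c1)]


tri = (0j, 1 + 0j, 1j)            # counterclockwise triangle of area 1/2
conj = lambda z: z.conjugate()    # continuous, but not holomorphic

J = triangle_integral(conj, *tri)
parts = [triangle_integral(conj, *t) for t in subdivide(*tri)]
print(J, sum(parts))                                      # internal edges cancel
print(max(abs(p) for p in parts) >= abs(J) / 4 - 1e-12)   # some piece carries >= |J|/4
print(abs(triangle_integral(cmath.exp, *tri)))            # holomorphic: essentially 0
```

For $\bar z$ one has $\int_{\partial\Delta} \bar z\,dz = 2i \cdot \operatorname{area}(\Delta)$, so each subtriangle carries exactly a quarter of $J$ and the factor 4 lost at each step of the proof cannot be improved in general; for the entire function $e^z$ the integral vanishes, as the theorem asserts.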

Theorem 4.6 (Cauchy's theorem for a convex set). Let $\Omega$ be a convex open set. Let $p \in \Omega$ and suppose $f$ is continuous on $\Omega$ and $f \in H(\Omega \setminus \{p\})$. Then $f = F'$ for some $F \in H(\Omega)$. Therefore,
$$\int_\gamma f(z)\,dz = 0$$
for any closed path $\gamma$ in $\Omega$.

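Proof Let $[a, z]$ denote the oriented line segment joining $a$ to $z$. Fixing $a \in \Omega$, the convexity of $\Omega$ allows us to define
$$F(z) = \int_{[a,z]} f(\xi)\,d\xi$$
for all $z \in \Omega$. Now for $z, z_0 \in \Omega$, let $\Delta$ be the triangle with vertices $(a, z_0, z)$. The convexity of $\Omega$ implies that $\Delta$ lies in $\Omega$ because $a, z_0, z \in \Omega$. Hence, by Theorem 4.5,
$$0 = \int_{\partial\Delta} f(\xi)\,d\xi = \int_{[a,z_0]} f(\xi)\,d\xi + \int_{[z_0,z]} f(\xi)\,d\xi + \int_{[z,a]} f(\xi)\,d\xi.$$
By our definition of $F(z)$, this reads
$$0 = F(z_0) - F(z) + \int_{[z_0,z]} f(w)\,dw,$$
so that, for $z \neq z_0$, using $\int_{[z_0,z]} f(z_0)\,dw = f(z_0)(z - z_0)$,
$$\frac{F(z) - F(z_0)}{z - z_0} - f(z_0) = \frac{1}{z - z_0}\int_{[z_0,z]} [f(w) - f(z_0)]\,dw.$$
Now $f$ is continuous in $\Omega$, so given $\varepsilon > 0$, there exists $\delta > 0$ such that
$$|f(w) - f(z_0)| < \varepsilon \quad \text{when } |w - z_0| < \delta.$$
Since every $w \in [z_0, z]$ satisfies $|w - z_0| \le |z - z_0|$, we obtain
$$\left|\frac{F(z) - F(z_0)}{z - z_0} - f(z_0)\right| \le \varepsilon \quad \text{for } 0 < |z - z_0| < \delta.$$
This proves that $F' = f$. That is, $F \in H(\Omega)$. Since $F' = f$ is continuous, the result follows from Theorem 4.4.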
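The construction of $F$ can be imitated numerically. In this sketch we assume $\Omega = \mathbb{C}$ (which is convex), $a = 0$ and $f(z) = e^z$, so that the primitive should be $F(z) = e^z - 1$; the Simpson rule for the segment integral is an illustrative choice.

```python
import cmath


def segment_integral(f, a, b, n=1000):
    """Composite Simpson rule for the integral of f over the oriented segment [a, b]."""
    h = 1.0 / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + (b - a) * k * h)
    return total * h / 3 * (b - a)


a = 0j                                           # fixed base point in Omega = C


def F(z):
    """F(z) = integral of e^xi over [a, z], as in the proof."""
    return segment_integral(cmath.exp, a, z)


for z in (0.3 + 0.4j, 1 - 2j, -1.5 + 0.5j):
    print(z, F(z), cmath.exp(z) - cmath.exp(a))  # F(z) agrees with e^z - e^a

z0, dz = 0.3 + 0.4j, 1e-4
print((F(z0 + dz) - F(z0)) / dz, cmath.exp(z0))  # difference quotient ~ f(z0)
```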


Theorem 4.7 (Cauchy's formula for a convex set) Suppose $\gamma$ is a closed path in a convex open set $\Omega$ and $f \in H(\Omega)$. If $z \in \Omega$ and $z \notin \gamma^*$, then
$$f(z)\,\mathrm{Ind}_\gamma(z) = \frac{1}{2\pi i}\int_\gamma \frac{f(w)\,dw}{w - z}.$$

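Proof Define
$$g(w) = \begin{cases} \dfrac{f(w) - f(z)}{w - z}, & \text{for } w \in \Omega,\ w \neq z, \\[2mm] f'(z), & \text{if } w = z. \end{cases}$$
We see that $g \in H(\Omega \setminus \{z\})$ because $f$ is holomorphic. Also, $g$ is continuous in $\Omega$, including at $z$, by the definition of $f'(z)$. We can therefore apply the previous theorem (with $p = z$) to deduce
$$\int_\gamma g(w)\,dw = 0 \quad \text{for any closed path } \gamma \text{ in } \Omega.$$
Since $z \notin \gamma^*$, this says
$$\frac{1}{2\pi i}\int_\gamma \frac{f(w)\,dw}{w - z} = f(z)\cdot\frac{1}{2\pi i}\int_\gamma \frac{dw}{w - z} = f(z)\,\mathrm{Ind}_\gamma(z),$$
which is the result.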
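Cauchy's formula is easy to test numerically on a circle, where the trapezoidal rule is extremely accurate for smooth periodic integrands. In this sketch, the choice $f(w) = e^w\cos w$ (entire, so any disk is a convex $\Omega$ containing the circle) and the helper names are illustrative assumptions.

```python
import cmath
import math


def circle_integral(g, center, radius, n=256):
    """Trapezoidal rule for the integral of g over the positively oriented circle
    w = center + radius * e^{it}, 0 <= t <= 2*pi."""
    total = 0j
    for k in range(n):
        e = cmath.exp(2j * math.pi * k / n)
        total += g(center + radius * e) * 1j * radius * e   # g(w) * w'(t)
    return total * (2 * math.pi / n)


def cauchy_integral(f, z, center=0j, radius=1.0):
    """(1/(2*pi*i)) * integral of f(w) / (w - z) dw over the circle."""
    return circle_integral(lambda w: f(w) / (w - z), center, radius) / (2j * math.pi)


f = lambda w: cmath.exp(w) * cmath.cos(w)    # entire, so any disk serves as Omega
z_in, z_out = 0.3 + 0.2j, 1.5 - 0.5j          # index 1 and index 0, respectively
print(cauchy_integral(f, z_in), f(z_in))      # recovers f(z_in)
print(abs(cauchy_integral(f, z_out)))         # essentially 0
```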

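Theorem 4.8 If $\Omega \subseteq \mathbb{C}$ is open, then any $f \in H(\Omega)$ is representable as a power series in $\Omega$; that is, $f$ has a power series expansion about each point of $\Omega$.
Proof Let $B = B(a, R) \subseteq \Omega$ be an open ball, and let $\gamma$ be the positively oriented circle centered at $a$ with radius $r < R$. The convexity of the ball allows us to apply the previous theorem. As $\mathrm{Ind}_\gamma(z) = 1$ for $z \in B(a, r)$ by Corollary 4.2.1, we see
$$f(z) = \frac{1}{2\pi i}\int_\gamma \frac{f(w)\,dw}{w - z}$$
for $z \in B(a, r)$. By our earlier theorem, this can be written as a power series in $B(a, r)$.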
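Expanding $1/(w - z)$ as a geometric series in $(z - a)/(w - a)$, which is how the earlier theorem produces the power series, gives the familiar coefficient formula $c_n = \frac{1}{2\pi i}\int_\gamma \frac{f(w)\,dw}{(w - a)^{n+1}}$. The following Python sketch (helper name and sample sizes are our own choices) approximates these coefficients from equally spaced points on $\gamma$ and compares them with the known expansions of $e^z$ and $1/(1 - z)$.

```python
import cmath
import math


def taylor_coefficients(f, a, r, count, n=128):
    """Approximate c_k = (1/(2*pi*i)) * integral over |w - a| = r of f(w)/(w - a)^(k+1) dw
    for k = 0, ..., count - 1. With w = a + r e^{it} this equals the mean of
    f(w) e^{-ikt} / r^k, which we replace by an average over n equally spaced t."""
    samples = [f(a + r * cmath.exp(2j * math.pi * j / n)) for j in range(n)]
    coeffs = []
    for k in range(count):
        avg = sum(s * cmath.exp(-2j * math.pi * j * k / n)
                  for j, s in enumerate(samples)) / n
        coeffs.append(avg / r ** k)
    return coeffs


exp_coeffs = taylor_coefficients(cmath.exp, 0j, 1.0, 8)        # expect 1/k!
geo_coeffs = taylor_coefficients(lambda w: 1 / (1 - w), 0j, 0.5, 8)  # expect 1 for all k
print([round(c.real, 10) for c in exp_coeffs])
print([round(c.real, 10) for c in geo_coeffs])
```

The radius $r = 0.5$ for $1/(1 - z)$ keeps the circle inside the disk $|z| < 1$ where the function is holomorphic.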

Corollary 4.3.3 If $f \in H(\Omega)$, then $f' \in H(\Omega)$ (since a power series can be differentiated term by term).

The Cauchy theorem has a useful converse.

Theorem 4.9 (Morera's theorem). If $f : \Omega \to \mathbb{C}$ is continuous and
$$\int_{\partial\Delta} f(z)\,dz = 0$$
for every closed triangle $\Delta \subseteq \Omega$, then $f \in H(\Omega)$.

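Proof Let $V$ be a convex open set in $\Omega$ and fix $z_0 \in V$. As in the proof of Cauchy's theorem for a convex set, we can construct
$$F(z) = \int_{[z_0,z]} f(w)\,dw, \quad z \in V,$$
and, again by the vanishing of the triangle integrals and the continuity of $f$, we see that $F$ is holomorphic in $V$ with $F'(z) = f(z)$. By Corollary 4.3.3, $f = F' \in H(V)$. Since every point of $\Omega$ lies in an open disk contained in $\Omega$, we conclude that $f \in H(\Omega)$.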

Exercises

1. Evaluate
$$\int_\gamma \frac{z^2}{z - 1}\,dz,$$
where $\gamma$ is a circle of radius 2 centered at 0.


2. Evaluate
$$\int_\gamma \frac{e^{z^2}}{z}\,dz,$$
where $\gamma$ is the unit circle.


3. Evaluate
$$\int_\gamma \frac{z^2 - 1}{z^2 + 1}\,dz,$$
where $\gamma$ is a circle of radius 2 centered at 0.

4.4 Zeros and Singularities

In earlier sections, we studied how a holomorphic function $f$ can be expanded as a power series. In this section, we study the nature of the set of zeros of $f$. We also ask to what extent power series representations can be extended to functions that fail to be holomorphic at isolated points.
Theorem 4.10 Suppose $\Omega$ is a region, $f \in H(\Omega)$ and $Z(f) = \{a \in \Omega : f(a) = 0\}$. Then either $Z(f) = \Omega$ or $Z(f)$ has no limit point in $\Omega$. In the latter case, for each $a \in Z(f)$ there is a unique positive integer $m = m(a)$ such that $f(z) = (z - a)^m g(z)$ for $z \in \Omega$, where $g \in H(\Omega)$ and $g(a) \neq 0$. Moreover, in this case, $Z(f)$ is at most countable.
Remark 4.1 The number m = m(a) is called the order of f at a.
Proof Let $A$ be the set of limit points of $Z(f)$ in $\Omega$. By the continuity of $f$, we have $f|_A = 0$. Thus $A \subseteq Z(f)$. Fix $a \in Z(f)$ and take $r > 0$ with $B(a, r) \subseteq \Omega$, for which we can write
$$f(z) = \sum_{n=0}^{\infty} c_n (z - a)^n \quad \forall z \in B(a, r).$$
Suppose that not all $c_n$ are zero. Then there is a smallest $m$ such that $c_m \neq 0$. As $f(a) = c_0 = 0$, we have $m > 0$. Define
$$g(z) = \begin{cases} (z - a)^{-m} f(z), & \forall z \in \Omega \setminus \{a\}, \\ c_m, & \text{if } z = a, \end{cases}$$
so that $f(z) = (z - a)^m g(z)$. Also, $g \in H(\Omega \setminus \{a\})$. But the power series representation of $f$ gives
$$g(z) = \sum_{k=0}^{\infty} c_{m+k}(z - a)^k$$
for $z \in B(a, r)$. Hence $g \in H(B(a, r))$. Therefore, $g \in H(\Omega)$.
Finally, $g(a) = c_m \neq 0$. Therefore, by continuity, $g$ has no zeros in some disk around $a$, so $a$ is an isolated point of $Z(f)$ in the case that not all the $c_n$'s are zero. What happens if all the $c_n$'s are zero? In this case, $f \equiv 0$ in a neighborhood of $a$. Now if $a \in A$, then $a$ is not an isolated point of $Z(f)$, so all the $c_n$'s must vanish and $f \equiv 0$ near $a$; every point of that neighborhood then lies in $A$. Thus $A$ is open. But $A$, being the set of limit points of $Z(f)$ in $\Omega$, is closed in $\Omega$, so $\Omega \setminus A$ is open. Thus $\Omega$ is the disjoint union of the open sets $A$ and $\Omega \setminus A$. But $\Omega$ is connected, so $A = \Omega$ or $A = \emptyset$. If $A = \Omega$, then $Z(f) = \Omega$ and $f \equiv 0$. If $A = \emptyset$, then $Z(f)$ has at most finitely many points in any compact subset of $\Omega$. Since $\Omega$ can be covered by a countable union of compact sets, $Z(f)$ is at most countable.

Corollary 4.4.1 Suppose $\Omega$ is a region and $f, g \in H(\Omega)$. If $f(z) = g(z)$ for all $z$ in some set with a limit point in $\Omega$, then $f(z) = g(z)$ for all $z \in \Omega$.

Proof We apply the theorem to $f - g$.

An important consequence of these results is that if $f \in H(\Omega)$, $a \in \Omega$ and $B(a, r)$ is the largest open disk centered at $a$ and contained in $\Omega$, then the power series expansion of $f$ about $a$ converges to $f$ in all of $B(a, r)$.

Definition 4.4 If $a \in \Omega$ and $f \in H(\Omega \setminus \{a\})$, then $f$ is said to have an isolated singularity at $a$. If $f$ has an isolated singularity at $a$ and there is a function $g \in H(\Omega)$ such that $f(z) = g(z)$ for all $z \in \Omega \setminus \{a\}$, then $f$ is said to have a removable singularity at $a$.

Whenever we encounter a function with a removable singularity, we often replace it with its analytic extension.
Let us define $B^\circ(a, r) := B(a, r) \setminus \{a\}$, the punctured disk of radius $r$ centered at $a$. The following theorem is often called Riemann's removable singularity theorem.

Theorem 4.11 Suppose $f \in H(\Omega \setminus \{a\})$ and $f$ is bounded in $B^\circ(a, r)$ for some $r > 0$. Then $f$ has a removable singularity at $a$.

Proof Define
$$h(z) = \begin{cases} (z - a)^2 f(z), & \forall z \in \Omega \setminus \{a\}, \\ 0, & \text{if } z = a. \end{cases}$$
It is clear that $h \in H(\Omega \setminus \{a\})$. To show $h \in H(\Omega)$, we compute
$$\lim_{z \to a} \frac{h(z) - h(a)}{z - a} = \lim_{z \to a} \frac{(z - a)^2 f(z)}{z - a} = \lim_{z \to a} (z - a) f(z) = 0,$$
since $f$ is bounded in $B^\circ(a, r)$. Thus $h \in H(\Omega)$ with $h'(a) = 0$. So we can write
$$h(z) = \sum_{n=2}^{\infty} c_n (z - a)^n \quad (z \in B(a, r)),$$
because $h(a) = h'(a) = 0$. This suggests we define $f(a) = c_2$ to get
$$f(z) = \sum_{n=0}^{\infty} c_{n+2}(z - a)^n \quad (z \in B(a, r)).$$
This gives the desired holomorphic extension of $f$ to $\Omega$.


The following theorem classifies the possible types of isolated singularities.
Theorem 4.12 If $a \in \Omega$ and $f \in H(\Omega \setminus \{a\})$, then one of the following occurs:
(1) $f$ has a removable singularity at $a$.
(2) There are complex numbers $c_1, \ldots, c_m$ with $m > 0$ and $c_m \neq 0$ such that
$$f(z) - \sum_{k=1}^{m} \frac{c_k}{(z - a)^k}$$
has a removable singularity at $a$.
(3) If $r > 0$ and $B(a, r) \subseteq \Omega$, then $f(B^\circ(a, r))$ is dense in the plane.
Remark 4.2 In case (3), $f$ is said to have an essential singularity at $a$. If (2) holds, $f$ is said to have a pole of order $m$ at $a$. In this case,
$$Q_a(z) := \sum_{k=1}^{m} \frac{c_k}{(z - a)^k}$$
is called the principal part of $f$ at $a$.
Since the function
$$g(z) = f(z) - Q_a(z)$$
has a removable singularity at $a$, it can be expanded as a power series
$$g(z) = \sum_{k=0}^{\infty} b_k (z - a)^k$$
about $a$, and we get the Laurent series expansion
$$f(z) = \sum_{k=1}^{m} \frac{c_k}{(z - a)^k} + \sum_{k=0}^{\infty} b_k (z - a)^k$$
for $f$ about $a$, valid in a punctured disk $B^\circ(a, r)$.

Proof Suppose (3) does not hold. Then there are $r > 0$, $\delta > 0$ and $w \in \mathbb{C}$ such that $|f(z) - w| > \delta$ for all $z \in B^\circ(a, r)$.
Then, defining
$$g(z) = \frac{1}{f(z) - w} \quad \text{for } z \in B^\circ(a, r),$$
we see $g \in H(B^\circ(a, r))$. Moreover, $g$ is bounded in this punctured disk, as
$$|g(z)| = \frac{1}{|f(z) - w|} < \frac{1}{\delta}.$$
Hence, by the previous theorem, $g$ extends to a holomorphic function in $B(a, r)$.
There are now two cases to consider.
Case 1. $g(a) \neq 0$. Then there exist $c > 0$ and $0 < r_1 < r$ such that $|g(z)| \ge c$ for all $z \in B(a, r_1)$. Thus,
$$|f(z) - w| \le \frac{1}{c} \quad \forall z \in B^\circ(a, r_1).$$
Hence $f$ is bounded near $a$, and again by the previous theorem, $f$ has a removable singularity at $a$. So (1) holds.
Case 2. $g(a) = 0$. Since $g$ has no zeros in $B^\circ(a, r)$, Theorem 4.10 lets us write
$$g(z) = (z - a)^m g_1(z)$$
for some $m \ge 1$ and some $g_1 \in H(B(a, r))$ with $g_1(a) \neq 0$; as $g$ has no zeros in $B^\circ(a, r)$, neither does $g_1$. Thus $h(z) = 1/g_1(z)$ is holomorphic in $B(a, r)$ and $h$ has no zeros there. Therefore,
$$f(z) - w = \frac{1}{g(z)} = (z - a)^{-m}\frac{1}{g_1(z)} = (z - a)^{-m} h(z)$$
for all $z \in B^\circ(a, r)$. As $h$ is holomorphic at $a$, we can write
$$h(z) = \sum_{n=0}^{\infty} b_n (z - a)^n,$$
where $b_0 \neq 0$, because $h(a) \neq 0$. Thus,
$$f(z) - w = (z - a)^{-m}\sum_{n=0}^{\infty} b_n (z - a)^n,$$
and case (2) holds, with $c_k = b_{m-k}$ for $1 \le k \le m$; in particular, $c_m = b_0 \neq 0$.



Example 4.1 The function $f(z) = e^{1/z}$ is holomorphic in $\mathbb{C} \setminus \{0\}$ and has an essential singularity at $z = 0$.
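The density in Theorem 4.12 (3) can be made explicit for $e^{1/z}$: for any $w \neq 0$, the points $z_k = 1/(\log w + 2\pi i k)$ satisfy $e^{1/z_k} = w$ and $z_k \to 0$. A short Python check (the target value $w$ is an arbitrary illustrative choice):

```python
import cmath
import math

w = 3 - 4j                          # an arbitrary nonzero target value
log_w = cmath.log(w)                # principal branch of the logarithm
points = [1 / (log_w + 2j * math.pi * k) for k in (1, 10, 100, 1000)]
for z in points:
    print(abs(z), cmath.exp(1 / z))  # |z_k| -> 0, yet e^{1/z_k} = w each time
```

The value $0$ is never attained, since the exponential function has no zeros; this is consistent with the single exceptional value allowed by Picard's theorem in Exercise 8 below.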

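We saw above how two holomorphic functions that agree on a set with a limit point must be equal. A similar statement can be made about functions with poles. Let $\Omega$ be a region, and suppose $f$ and $g$ are holomorphic in $\Omega$ except for a pole at $a$ of order at most $n$, and that $f(z) = g(z)$ on a set with a limit point in $\Omega \setminus \{a\}$. Then $(z - a)^n f(z)$ and $(z - a)^n g(z)$ have removable singularities at $a$, so there are functions $f_1, g_1 \in H(\Omega)$ such that $(z - a)^n f(z) = f_1(z)$ and $(z - a)^n g(z) = g_1(z)$ for all $z \in \Omega \setminus \{a\}$. Then $f_1$ and $g_1$ agree on a set with a limit point in $\Omega$, so by Corollary 4.4.1 they agree for all $z \in \Omega$, and hence $f(z) = g(z)$ for all $z \in \Omega \setminus \{a\}$.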

Exercises

1. Find the Laurent series expansion of
$$\frac{z + 1}{z}$$
about $z = 0$ and $z = 1$.
2. Find the Laurent series expansion of
$$\frac{z}{z^2 + 1}$$
about $z = i$.
3. Suppose that $a \neq 0$. Write down the Laurent expansion of
$$\frac{1}{z - a}$$
about $z = 0$. If $k$ is a natural number, what about the expansion of $1/(z - a)^k$ at $z = 0$?
4. Show that the function $(\sin z)/z$ has a removable singularity at $z = 0$.
5. Show that the function $e^z/z$ does not have a removable singularity at $z = 0$.
6. Show that $z/(e^z - 1)$ has a removable singularity at $z = 0$.
7. Find the Laurent expansion of
$$f(z) = \frac{1}{z(z^2 + 1)}$$
that is valid for
(a) $0 < |z| < 1$.
(b) $|z| > 1$.
8. Suppose that $f(z)$ has an essential singularity at $z_0$. Let $w \in \mathbb{C}$. Show that there exists a sequence $z_1, z_2, \ldots$ such that $z_n \to z_0$ and $f(z_n) \to w$. (This is called the Casorati–Weierstrass theorem. A more general theorem, due to E. Picard, was proved in 1879: if $f$ has an essential singularity at $z_0$ and $U$ is an arbitrarily small punctured ball around $z_0$, then given $w \in \mathbb{C}$, the equation $f(z) = w$ has infinitely many solutions in $U$, except perhaps for at most one value of $w$. This is called Picard's theorem. The reader can find a proof in the author's problem book on modular forms [1], which uses the theory of the $j$-function.)

4.5 The Maximum Modulus Principle

Sometimes called the maximum principle, the maximum modulus principle is one of
the most important theorems in the theory of complex functions. We derive it below.
The following theorem, which represents a nice fusion of Fourier series and complex analysis, will serve as a catalyst for a collection of results leading to the maximum principle.
Theorem 4.13 If
$$f(z) = \sum_{n=0}^{\infty} c_n (z - a)^n$$
for all $z \in B(a, R)$ and $0 < r < R$, then
$$\sum_{n=0}^{\infty} |c_n|^2 r^{2n} = \frac{1}{2\pi}\int_0^{2\pi} |f(a + re^{i\theta})|^2\,d\theta.$$

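Proof We have
$$f(a + re^{i\theta}) = \sum_{n=0}^{\infty} c_n r^n e^{in\theta},$$
and, since $r < R$, the series is an absolutely convergent Fourier series in $\theta$. Thus, by Parseval's formula,
$$\sum_{n=0}^{\infty} |c_n|^2 r^{2n} = \frac{1}{2\pi}\int_0^{2\pi} |f(a + re^{i\theta})|^2\,d\theta.$$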
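Theorem 4.13 can be checked numerically for $f(z) = e^z$ and $a = 0$, where $c_n = 1/n!$ and $|f(re^{i\theta})|^2 = e^{2r\cos\theta}$. The radius $r = 1.5$, the truncation of the series and the number of sample points are illustrative choices.

```python
import cmath
import math

r = 1.5
# Left side: sum of |c_n|^2 r^(2n) with c_n = 1/n!, the Taylor coefficients of e^z at 0.
lhs = sum(r ** (2 * n) / math.factorial(n) ** 2 for n in range(60))

# Right side: mean of |f(r e^{i theta})|^2 over the circle (trapezoidal rule).
m = 256
rhs = sum(abs(cmath.exp(r * cmath.exp(2j * math.pi * k / m))) ** 2
          for k in range(m)) / m
print(lhs, rhs)
```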


The following theorem of Liouville is one of the central theorems of complex
analysis.

Corollary 4.5.1 (Liouville's theorem). A bounded entire function is constant.

Proof Suppose $f$ is entire and $|f(z)| < M$ for all $z \in \mathbb{C}$. Writing
$$f(z) = \sum_{n=0}^{\infty} c_n z^n$$
for the power series expansion about $z = 0$, we see that the series converges for all $z \in \mathbb{C}$ since $f$ is entire. By the theorem (with $a = 0$), for every $r > 0$,
$$\sum_{n=0}^{\infty} |c_n|^2 r^{2n} = \frac{1}{2\pi}\int_0^{2\pi} |f(re^{i\theta})|^2\,d\theta < M^2.$$
In particular, $|c_n| r^n < M$ for every $n$. If we let $r \to \infty$, we get a contradiction unless $c_1 = c_2 = \cdots = 0$. Thus $f$ is constant.

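Theorem 4.14 (The maximum modulus principle). If $\Omega$ is a bounded region, $f$ is continuous on $\overline{\Omega}$ and $f \in H(\Omega)$, then
$$\max_{z \in \overline{\Omega}} |f(z)| = \max_{z \in \partial\Omega} |f(z)|.$$
Moreover, if the maximum is attained at an interior point, then $f$ is constant.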

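Proof Suppose that the maximum of $|f|$ is attained at an interior point $a \in \Omega$, so that
$$|f(a)| = \max_{z \in \overline{\Omega}} |f(z)|.$$
Choose $R > 0$ such that $B(a, R) \subseteq \Omega$. Then, for $0 < r < R$, $|f(a + re^{i\theta})| \le |f(a)|$ for all $\theta \in [0, 2\pi]$. Thus, writing
$$f(z) = \sum_{n=0}^{\infty} c_n (z - a)^n \quad \forall z \in B(a, R),$$
we see from Theorem 4.13 that
$$\sum_{n=0}^{\infty} |c_n|^2 r^{2n} = \frac{1}{2\pi}\int_0^{2\pi} |f(a + re^{i\theta})|^2\,d\theta \le |f(a)|^2 = |c_0|^2,$$
which is a contradiction unless $c_1 = c_2 = \cdots = 0$. Hence $f$ is constant on $B(a, R)$, and therefore, by Corollary 4.4.1, constant on the region $\Omega$. Consequently, if $f$ is not constant, the maximum of $|f|$ over the compact set $\overline{\Omega}$ is attained only on $\partial\Omega$.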
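A quick numerical illustration of the maximum modulus principle on the unit disk (the polynomial $f(z) = z^3 - 2z + i$ and the sampling grid are illustrative assumptions): the largest value of $|f|$ found at interior grid points stays below the largest value found on the boundary circle. Here $|f(z)| \le 1 + 2 + 1 = 4$ on $|z| = 1$, with equality at $z = -i$, where $f(-i) = 4i$.

```python
import cmath
import math

f = lambda z: z ** 3 - 2 * z + 1j       # illustrative polynomial

# Interior sample: grid points with |z| < 0.95; boundary sample: the unit circle.
interior = [complex(x / 50, y / 50) for x in range(-50, 51) for y in range(-50, 51)
            if x * x + y * y <= 2256]
boundary = [cmath.exp(2j * math.pi * k / 2000) for k in range(2000)]

max_inside = max(abs(f(z)) for z in interior)
max_boundary = max(abs(f(z)) for z in boundary)
print(max_inside, max_boundary, max_inside < max_boundary)
```

Sampling only illustrates the theorem; it cannot prove it, since a finite grid can miss the true maximum.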

The minimum modulus theorem given below is an immediate consequence of the maximum principle.

Theorem 4.15 (The minimum modulus theorem). If $\Omega$ is a bounded region, $f$ is continuous on $\overline{\Omega}$, $f \in H(\Omega)$ and $f(z) \neq 0$ for all $z \in \overline{\Omega}$, then
$$\min_{z \in \overline{\Omega}} |f(z)| = \min_{z \in \partial\Omega} |f(z)|.$$
Moreover, if $f$ is non-constant, then its minimum modulus cannot be attained at an interior point.

Proof Since $f(z) \neq 0$ for all $z \in \overline{\Omega}$, the function $1/f$ is continuous on $\overline{\Omega}$ and $1/f \in H(\Omega)$. Applying the maximum modulus principle to $1/f$ yields the result.

An important consequence of the results of this section is the following proof of what has been called the fundamental theorem of algebra. Some mathematicians have expressed (fundamentalist) annoyance that all known proofs use some form of analysis!

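Theorem 4.16 (The fundamental theorem of algebra). If $f(z) \in \mathbb{C}[z]$ has degree $n \ge 1$, then there exist $A \in \mathbb{C} \setminus \{0\}$ and $\alpha_1, \ldots, \alpha_n \in \mathbb{C}$, unique up to ordering, such that
$$f(z) = A(z - \alpha_1)\cdots(z - \alpha_n).$$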

Proof Assume without loss of generality that $f$ is monic:
$$f(z) = z^n + a_{n-1}z^{n-1} + \cdots + a_1 z + a_0.$$
Let $M = \max(|a_0|, \ldots, |a_{n-1}|)$. Then, for $|z| \ge \max(1, 2nM)$, we have
$$|f(z) - z^n| \le |a_{n-1}||z|^{n-1} + \cdots + |a_1||z| + |a_0| \le nM|z|^{n-1} \le \frac{1}{2}|z|^n,$$
since $|z| \ge 1$ and $|z| \ge 2nM$. Thus, for such $z$,
$$\frac{1}{2}|z|^n \ge |z|^n - |f(z)|,$$
so that $|f(z)| \ge \frac{1}{2}|z|^n \ge \frac{1}{2}$, because $|z| \ge 1$. Now, if $f$ has no zeros, then $1/f$ is entire and
$$\left|\frac{1}{f(z)}\right| \le 2 \quad \text{for } |z| \ge R := \max(1, 2nM).$$
On the compact disk $|z| \le R$, the continuous function $1/f$ is also bounded. Thus $1/f$ is bounded and entire. By Liouville's theorem, $1/f$ is constant, and hence so is $f$, contradicting $n \ge 1$. Thus $f$ has a zero $\alpha_1$. Now, by the division algorithm,
$$f(z) = (z - \alpha_1) f_1(z),$$
where $f_1(z)$ has degree $n - 1$. Applying induction completes the proof.
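The key estimate in the proof, $|f(z)| \ge \frac{1}{2}|z|^n$ for $|z| \ge \max(1, 2nM)$, can be spot-checked numerically. The coefficients below are a hypothetical example, and random sampling is only an illustration, not a proof.

```python
import cmath
import math
import random

# Hypothetical coefficients a_0, ..., a_{n-1} of a monic polynomial of degree n = 4.
coeffs = [3 - 1j, -2j, 0.5, 4 + 4j]
n = len(coeffs)
M = max(abs(c) for c in coeffs)
R = max(1.0, 2 * n * M)


def f(z):
    """Evaluate z^n + a_{n-1} z^{n-1} + ... + a_0 by Horner's rule."""
    value = 1 + 0j                      # leading coefficient of the monic polynomial
    for c in reversed(coeffs):
        value = value * z + c
    return value


random.seed(0)
ok = True
for _ in range(10000):
    rho = R * (1 + 10 * random.random())                # a radius with |z| >= R
    z = rho * cmath.exp(2j * math.pi * random.random())
    ok = ok and abs(f(z)) >= 0.5 * abs(z) ** n
print(R, ok)
```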

The following theorem gives an estimate for coefficients of the power series
representation of an analytic function.

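Theorem 4.17 (Cauchy's estimates). If $f \in H(B(a, R))$ and $|f(z)| \le M$ for $z \in B(a, R)$, then
$$\frac{|f^{(n)}(a)|}{n!} \le \frac{M}{R^n}.$$

Proof We have
$$f(z) = \sum_{n=0}^{\infty} c_n (z - a)^n$$
for $z \in B(a, R)$, where $c_n = \dfrac{f^{(n)}(a)}{n!}$. But, for $0 < r < R$, Theorem 4.13 gives
$$\sum_{n=0}^{\infty} |c_n|^2 r^{2n} = \frac{1}{2\pi}\int_0^{2\pi} |f(a + re^{i\theta})|^2\,d\theta \le M^2,$$
so that $|c_n| \le M/r^n$ for each $n$. Letting $r \to R$, we get
$$|c_n| \le \frac{M}{R^n}.$$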


The Schwarz lemma is an important tool for many results in geometric complex
analysis.

Theorem 4.18 (Schwarz's lemma). Let $f(z)$ be analytic in $|z| \le 1$. Suppose that $f(0) = 0$ and $|f(z)| \le M$ for all $|z| \le 1$. Then
$$|f(z)| \le M|z|$$
for all $|z| \le 1$.

Proof Let
$$g(z) = \begin{cases} f(z)/z, & \text{for } z \neq 0; \\ f'(0), & \text{for } z = 0. \end{cases}$$
Then $g(z)$ is analytic for $|z| \le 1$ (the singularity at $0$ is removable because $f(0) = 0$). By the maximum modulus theorem,
$$\max_{|z| \le 1} |g(z)| = \max_{|z| = 1} |g(z)| = \max_{|z| = 1} \frac{|f(z)|}{|z|} = \max_{|z| = 1} |f(z)| \le M.$$
Thus $|f(z)| \le M|z|$.

The following theorem illustrates how one can use complex analysis to study
functions which are not analytic.

Theorem 4.19 Let $f_1(z), \ldots, f_n(z)$ be analytic in a domain $D$. Then
$$\varphi(z) = |f_1(z)| + \cdots + |f_n(z)|$$
attains its maximum on $\partial D$.

Proof We cannot apply the maximum modulus theorem directly to $\varphi(z)$ because it is not an analytic function. So we proceed as follows. Suppose the maximum is attained at an interior point $z_0$ of $D$. Write
$$|f_j(z_0)| = c_j f_j(z_0) \quad \text{with } |c_j| = 1$$
for $1 \le j \le n$. Consider the function
$$F(z) = c_1 f_1(z) + \cdots + c_n f_n(z),$$
which is analytic in $D$.
By the maximum modulus principle, there exists $z_1 \in \partial D$ such that
$$|F(z_1)| \ge |F(z_0)| = \varphi(z_0).$$
But as each $c_j$ has absolute value 1, we see that
$$|F(z_1)| \le |f_1(z_1)| + \cdots + |f_n(z_1)| = \varphi(z_1),$$
so that $\varphi(z_0) \le \varphi(z_1)$. Hence the maximum of $\varphi$ is also attained on $\partial D$.

We can sharpen the result of the previous theorem as follows: the maximum is attained only on the boundary unless all the $f_j$'s are constant. This is easily seen from the formula
$$|f_j(z_0)|^2 \le \frac{1}{2\pi}\int_0^{2\pi} |f_j(z_0 + re^{i\theta})|^2\,d\theta \le \max_{z \in \partial D} |f_j(z)|^2,$$
valid for small $r > 0$ by Theorem 4.13, where the first inequality is strict unless $f_j$ is constant.


Next, we study the growth rates of the maximum modulus of analytic functions. If $f$ is analytic in some domain containing the circle $|z| = r$, we define
$$M_f(r) = \max_{|z| = r} |f(z)|.$$

Theorem 4.20 (Hadamard's three circle theorem). Let $f$ be analytic in the annulus $0 < r_1 \le |z| \le r_3$. Then
$$\log M_f(r)$$
is a convex function of $\log r$. That is, if
$$r_1 \le r_2 \le r_3,$$
then, writing $r_2 = r_1^\alpha r_3^\beta$ with $\alpha + \beta = 1$ and $\alpha, \beta \ge 0$, we have
$$M_f(r_2) \le M_f(r_1)^\alpha M_f(r_3)^\beta.$$

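Proof The function $g(z) = z^m f(z)^n$ is analytic in the annulus for any $m \in \mathbb{Z}$ and $n \in \mathbb{N}$. By the maximum modulus principle,
$$r_2^m M_f(r_2)^n \le \max\bigl(r_1^m M_f(r_1)^n,\ r_3^m M_f(r_3)^n\bigr)$$
for such $m, n$. By homogeneity, both sides of this inequality can be raised to the power $\lambda$ with $\lambda > 0$. Thus the inequality holds for any $m \in \mathbb{Q}$ and $n \in \mathbb{Q}^+$. By continuity, this extends to $m \in \mathbb{R}$ and $n \in \mathbb{R}^+$. For
$$\lambda = \frac{\log\bigl(M_f(r_1)/M_f(r_3)\bigr)}{\log(r_3/r_1)},$$
we see that
$$\left(\frac{r_3}{r_1}\right)^\lambda = \frac{M_f(r_1)}{M_f(r_3)},$$
so that
$$r_1^\lambda M_f(r_1) = r_3^\lambda M_f(r_3).$$
Let us raise both sides to the power $n$, with $n \in \mathbb{R}^+$. Then, setting $m = \lambda n$, we have
$$r_1^m M_f(r_1)^n = r_3^m M_f(r_3)^n,$$
so both terms in the maximum above coincide. Now, for $\alpha + \beta = 1$,
$$\bigl(r_2^m M_f(r_2)^n\bigr)^\alpha \le \bigl(r_1^m M_f(r_1)^n\bigr)^\alpha,$$
$$\bigl(r_2^m M_f(r_2)^n\bigr)^\beta \le \bigl(r_3^m M_f(r_3)^n\bigr)^\beta.$$
Multiplying these two inequalities gives
$$r_2^m M_f(r_2)^n \le r_1^{m\alpha} r_3^{m\beta}\bigl[M_f(r_1)^\alpha M_f(r_3)^\beta\bigr]^n.$$
Since $r_2 = r_1^\alpha r_3^\beta$, we get
$$r_2^m M_f(r_2)^n \le r_2^m\bigl[M_f(r_1)^\alpha M_f(r_3)^\beta\bigr]^n,$$
from which we deduce the desired result by cancelling $r_2^m$ and taking $n$th roots.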
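Hadamard's inequality can be tested numerically for $f(z) = e^z + 1/z$, which is analytic in any annulus $0 < r_1 \le |z| \le r_3$. The helper `max_modulus` approximates $M_f(r)$ by sampling the circle; the function and the radii are illustrative choices. Since sampling can only underestimate a maximum, it is the comfortable gap between the two sides that makes the comparison meaningful.

```python
import cmath
import math


def max_modulus(f, r, samples=3600):
    """Approximate M_f(r) = max of |f(z)| over |z| = r by sampling the circle."""
    return max(abs(f(r * cmath.exp(2j * math.pi * k / samples))) for k in range(samples))


f = lambda z: cmath.exp(z) + 1 / z      # analytic in any annulus 0 < r1 <= |z| <= r3

r1, r3 = 0.5, 2.0
checks = []
for r2 in (0.7, 1.0, 1.5):
    alpha = math.log(r3 / r2) / math.log(r3 / r1)   # so that r2 = r1^alpha * r3^beta
    beta = 1 - alpha
    lhs = max_modulus(f, r2)
    rhs = max_modulus(f, r1) ** alpha * max_modulus(f, r3) ** beta
    checks.append(lhs <= rhs)
    print(r2, round(alpha, 4), lhs, rhs)
print(all(checks))
```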

Hadamard noticed that the growth rate of the function yields new information
about the function itself. Here is an example.

Theorem 4.21 If $f$ is entire and there is an $n \in \mathbb{N}$ such that
$$M_f(r) \le cr^n$$
for some constant $c > 0$ and all sufficiently large $r$, then $f(z)$ is a polynomial of degree at most $n$.

Proof By the local Cauchy theorem,
$$\frac{f^{(m)}(0)}{m!} = \frac{1}{2\pi i}\int_{C_r} \frac{f(z)}{z^{m+1}}\,dz,$$
where $C_r$ is the circle of radius $r$ centered at zero. By Cauchy's estimate,
$$\frac{|f^{(m)}(0)|}{m!} \le \frac{M_f(r)}{r^m} \le cr^{n-m},$$
which goes to zero as $r \to \infty$ if $m > n$. Thus $f^{(m)}(0) = 0$ for all $m > n$, so $f$ is a polynomial of degree $\le n$.

Exercises

1. Let $f$ be analytic and nonzero in a region $A$. Show that $|f|$ has no strict local minima in $A$.
2. Show that the maximum of $|\sin(x + iy)|$ for $0 \le x \le 2\pi$ and $0 \le y \le 2\pi$ is $\cosh 2\pi$.
3. The functions $u(x, y)$ and $v(x, y)$ defined on $\mathbb{R} \times \mathbb{R}$ are said to be harmonic conjugates if $f(z) = u(x, y) + iv(x, y)$ with $z = x + iy$ is analytic. Find the harmonic conjugate of $u(x, y) = x^2 - y^2$.
4. Let $g$ be analytic in the region $|z| < 1$ and assume that $|g(z)| = |z|$ for all $|z| < 1$. Show that $g(z) = e^{i\theta}z$ for some $\theta \in [0, 2\pi]$. [Hint: Use Schwarz's lemma.]
5. Consider the function $f(z) = e^{e^z}$ in the region $-\pi/2 \le y \le \pi/2$, where $y = \operatorname{Im}(z)$. Show that the function is bounded on the edges of this region, but is unbounded in the interior. (This exercise shows that the maximum modulus principle does not necessarily hold for unbounded regions. However, with suitable growth conditions, the Phragmén–Lindelöf theorem extends the principle to certain unbounded regions. See Sect. 4.12.)

4.6 The Global Cauchy Theorem

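Suppose $\gamma_1, \ldots, \gamma_n$ are paths in $\mathbb{C}$. Each $\gamma_i$ induces a linear functional
$$\tilde{\gamma}_i(f) = \int_{\gamma_i} f(z)\,dz$$
on the functions continuous on $\gamma_i^*$.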

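Define $\tilde{\Gamma} = \tilde{\gamma}_1 + \cdots + \tilde{\gamma}_n$. We call the formal sum $\Gamma = \gamma_1 + \cdots + \gamma_n$ a chain. If each $\gamma_i$ is a closed path, we call $\Gamma$ a cycle. If each $\gamma_i$ is a path in some open set $\Omega$, then we say that $\Gamma$ is a chain in $\Omega$. We also define
$$\Gamma^* = \gamma_1^* \cup \cdots \cup \gamma_n^*.$$
If $\Gamma$ is a cycle and $\alpha \notin \Gamma^*$, we define the index of $\alpha$ with respect to $\Gamma$ by
$$\mathrm{Ind}_\Gamma(\alpha) := \frac{1}{2\pi i}\int_\Gamma \frac{dz}{z - \alpha},$$
where we write
$$\int_\Gamma f(z)\,dz := \tilde{\Gamma}(f) = \sum_{i=1}^{n} \int_{\gamma_i} f(z)\,dz.$$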

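Theorem 4.22 (Cauchy's theorem). Suppose $f \in H(\Omega)$. If $\Gamma$ is a cycle in $\Omega$ such that $\mathrm{Ind}_\Gamma(\alpha) = 0$ for all $\alpha \notin \Omega$, then
$$f(z)\,\mathrm{Ind}_\Gamma(z) = \frac{1}{2\pi i}\int_\Gamma \frac{f(w)\,dw}{w - z}$$
for all $z \in \Omega \setminus \Gamma^*$, and
$$\int_\Gamma f(z)\,dz = 0.$$
Thus, if $\Gamma_1$ and $\Gamma_2$ are cycles in $\Omega$ such that
$$\mathrm{Ind}_{\Gamma_1}(\alpha) = \mathrm{Ind}_{\Gamma_2}(\alpha) \quad \forall \alpha \notin \Omega,$$
then
$$\int_{\Gamma_1} f(z)\,dz = \int_{\Gamma_2} f(z)\,dz.$$
Remark 4.3 We proved this theorem earlier for a convex set $\Omega$. The present version is more general.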

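Proof Define $g : \Omega \times \Omega \to \mathbb{C}$ by
$$g(z, w) = \begin{cases} \dfrac{f(w) - f(z)}{w - z}, & w \neq z, \\[2mm] f'(z), & w = z. \end{cases}$$
Then $g$ is a continuous function, and we can define
$$h(z) = \frac{1}{2\pi i}\int_\Gamma g(z, w)\,dw \quad (z \in \Omega).$$
For $z \in \Omega \setminus \Gamma^*$, we have
$$h(z) = \frac{1}{2\pi i}\int_\Gamma \frac{f(w)\,dw}{w - z} - f(z)\,\mathrm{Ind}_\Gamma(z),$$
so to prove our theorem, we need to show that $h(z) = 0$ for all $z \in \Omega \setminus \Gamma^*$. To this end, we first show $h \in H(\Omega)$. Since $g$ is uniformly continuous on any compact subset of $\Omega \times \Omega$, we see that if $z_n \to z \in \Omega$, then $g(z_n, w) \to g(z, w)$ uniformly in $w \in \Gamma^*$. Thus $h(z_n) \to h(z)$, so that $h$ is continuous in $\Omega$.
Now let $\Delta$ be a closed triangle in $\Omega$. Then
$$\int_{\partial\Delta} h(z)\,dz = \frac{1}{2\pi i}\int_\Gamma \int_{\partial\Delta} g(z, w)\,dz\,dw$$
by Fubini's theorem.
But for each $w \in \Omega$, the function
$$z \mapsto g(z, w)$$
is continuous in $\Omega$ and holomorphic in $\Omega \setminus \{w\}$ (the singularity $z = w$ is removable). Thus, by the local Cauchy theorem (Theorem 4.5),
$$\int_{\partial\Delta} g(z, w)\,dz = 0$$
for all $w \in \Gamma^*$. Hence
$$\int_{\partial\Delta} h(z)\,dz = 0$$
for all triangles $\Delta \subseteq \Omega$. By Morera's theorem, $h \in H(\Omega)$.
Now let $\Omega_1 = \{z \in \mathbb{C} \setminus \Gamma^* : \mathrm{Ind}_\Gamma(z) = 0\}$, an open set, and define
$$h_1(z) = \frac{1}{2\pi i}\int_\Gamma \frac{f(w)\,dw}{w - z}$$
for $z \in \Omega_1$; $h_1$ is holomorphic in $\Omega_1$.
Clearly, if $z \in \Omega \cap \Omega_1$, then $h_1(z) = h(z)$, since the term $f(z)\,\mathrm{Ind}_\Gamma(z)$ vanishes there. Thus there is a function $\varphi \in H(\Omega \cup \Omega_1)$ whose restriction to $\Omega$ is $h$ and whose restriction to $\Omega_1$ is $h_1$. By our hypothesis on $\Gamma$, $\mathrm{Ind}_\Gamma(\alpha) = 0$ for all $\alpha \notin \Omega$. That is, $\mathrm{Ind}_\Gamma(\alpha) = 0$ for all $\alpha \in \Omega^c$. Thus $\Omega_1 \supseteq \Omega^c$, so $\Omega \cup \Omega_1 = \mathbb{C}$. Therefore, $\varphi$ is entire. Moreover, $\Omega_1$ contains the unbounded component of $(\Gamma^*)^c$. Hence, since $f$ is bounded on the compact set $\Gamma^*$,
$$\lim_{|z| \to \infty} \varphi(z) = \lim_{|z| \to \infty} h_1(z) = 0.$$
So $\varphi$ is bounded, and by Liouville's theorem, $\varphi \equiv 0$. But then $h \equiv 0$, as required.
Next, pick $a \in \Omega \setminus \Gamma^*$ and define $F(z) = (z - a) f(z)$. Then, by the first part applied to $F$ at the point $a$,
$$\frac{1}{2\pi i}\int_\Gamma f(z)\,dz = \frac{1}{2\pi i}\int_\Gamma \frac{F(z)\,dz}{z - a} = F(a)\,\mathrm{Ind}_\Gamma(a) = 0.$$
Applying this to the cycle $\Gamma = \Gamma_1 - \Gamma_2$ gives the last result.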



Exercises

1. Evaluate
$$\int_C \frac{2z^2 - 15z + 30}{z^3 - 10z^2 + 32z - 32}\,dz,$$
where $C$ is the circle of radius 3 centered at zero and oriented counterclockwise.
2. Let $z_1, z_2, \ldots, z_k$ be complex numbers for which we have the estimate
$$\left|\sum_{j=1}^{k} z_j^n\right| \le CB^n$$
for some absolute constant $C$ and all $n$ sufficiently large. Show that
$$|z_j| \le B \quad \text{for all } j = 1, 2, \ldots, k.$$
[Hint: Consider
$$\sum_{n=0}^{\infty}\Bigl(\sum_{j=1}^{k} z_j^n\Bigr) z^n = \sum_{j=1}^{k} \frac{1}{1 - z_j z}.$$
By the given estimate, the left-hand side is a power series that converges for $|z| < 1/B$. Hence, the right side must be analytic there.]
3. Let $f(z, w)$ be a continuous function of two complex variables $z, w$, with $z$ in a region $A$ and $w$ on a curve $\gamma$. For each $w$ on $\gamma$, assume that $f$ is analytic in $z$. Let
$$F(z) = \int_\gamma f(z, w)\,dw.$$
Then, show that $F$ is analytic and
$$F'(z) = \int_\gamma \frac{\partial f}{\partial z}(z, w)\,dw.$$
[Hint: Let $z_0 \in A$ and let $\gamma_0$ be a circle in $A$ around $z_0$ whose interior also lies in $A$. Then, for $z$ inside $\gamma_0$, $F(z) = \int_\gamma \frac{1}{2\pi i}\int_{\gamma_0} \frac{f(s, w)}{s - z}\,ds\,dw$.]

4.7 The Calculus of Residues

Definition 4.5 A function $f$ is said to be meromorphic in an open set $\Omega \subseteq \mathbb{C}$ if there is a set $A \subseteq \Omega$ such that
(a) $A$ has no limit point in $\Omega$.
(b) $f \in H(\Omega \setminus A)$.
(c) $f$ has a pole at each point of $A$.

Remark 4.4 The condition that A has no limit points implies that there is no compact
subset of Ω containing infinitely many points of A. Thus, A is at most countable.

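Let $f$ be meromorphic with the set of poles $A$. For each $a \in A$, let
$$Q_a(z) = \sum_{k=1}^{m} c_k (z - a)^{-k}$$
be the principal part of $f$ at $a$. We define $c_1$ to be the residue of $f$ at $a$ and denote it by
$$\mathrm{Res}_{z=a} f(z) \quad \text{or} \quad \mathrm{Res}(f; a).$$
If $\Gamma$ is a cycle and $a \notin \Gamma^*$, then
$$\frac{1}{2\pi i}\int_\Gamma Q_a(z)\,dz = c_1\,\mathrm{Ind}_\Gamma(a) = \bigl(\mathrm{Res}_{z=a} f(z)\bigr)\,\mathrm{Ind}_\Gamma(a),$$
since
$$\frac{1}{2\pi i}\int_\Gamma \frac{dz}{(z - a)^n} = \begin{cases} 0, & n \neq 1, \\ \mathrm{Ind}_\Gamma(a), & n = 1, \end{cases}$$
the case $n \ge 2$ following from Corollary 4.3.2 after the translation $z \mapsto z - a$.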
We generalize this as follows.

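Theorem 4.23 (Residue theorem). Let $f$ be meromorphic on $\Omega$, and let $A$ be its set of poles. If $\Gamma$ is a cycle in $\Omega \setminus A$ such that $\mathrm{Ind}_\Gamma(\alpha) = 0$ for all $\alpha \notin \Omega$, then
$$\frac{1}{2\pi i}\int_\Gamma f(z)\,dz = \sum_{a \in A} \mathrm{Res}(f; a)\,\mathrm{Ind}_\Gamma(a).$$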

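Proof Let
$$B = \{a \in A : \mathrm{Ind}_\Gamma(a) \neq 0\},$$
and let $W = (\Gamma^*)^c$, so that $\mathrm{Ind}_\Gamma$ is constant on each connected component $V$ of $W$. Also, if $V$ is unbounded or intersects $\Omega^c$, then $\mathrm{Ind}_\Gamma(z) = 0$ for all $z \in V$. Hence the set $K = \Gamma^* \cup \{z \in W : \mathrm{Ind}_\Gamma(z) \neq 0\}$ is bounded, closed and contained in $\Omega$, that is, a compact subset of $\Omega$ containing $B$. Since $A$ has no limit point in $\Omega$, only finitely many points of $A$ lie in $K$, so $B$ is a finite set. Thus the sum in the statement of the theorem is finite. Writing $B = \{a_1, \ldots, a_n\}$ with corresponding principal parts $Q_1, \ldots, Q_n$, put
$$g = f - (Q_1 + \cdots + Q_n)$$
(or $g = f$ if $B = \emptyset$). Then $g$ has removable singularities at $a_1, \ldots, a_n$, so $g$ is holomorphic in the open set $\Omega_0 = \Omega \setminus (A \setminus B)$. Every $\alpha \notin \Omega_0$ either lies outside $\Omega$ or belongs to $A \setminus B$, so $\mathrm{Ind}_\Gamma(\alpha) = 0$. By Cauchy's theorem (Theorem 4.22) applied in $\Omega_0$,
$$\frac{1}{2\pi i}\int_\Gamma g(z)\,dz = 0,$$
from which, together with $\frac{1}{2\pi i}\int_\Gamma Q_j(z)\,dz = \mathrm{Res}(f; a_j)\,\mathrm{Ind}_\Gamma(a_j)$, the theorem follows immediately.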

In order to use the residue theorem, we need to know how to compute the residues of functions at their poles.
Let $f$ be meromorphic in $\Omega$ with a pole of order $n$ at $a \in \Omega$, and let
$$Q_a(z) = \sum_{k=1}^{n} c_{-k}(z - a)^{-k}$$
be its principal part. Then $g = f - Q_a$ is holomorphic at $a$ and has a power series expansion
$$g(z) = \sum_{k=0}^{\infty} c_k (z - a)^k.$$
Thus,
$$f(z) = g(z) + Q_a(z) = \sum_{k=-n}^{\infty} c_k (z - a)^k,$$
so that the function
$$h(z) := (z - a)^n f(z) = \sum_{k=0}^{\infty} c_{k-n}(z - a)^k$$
(extended to $a$ by continuity) is holomorphic near $a$. In particular, $\mathrm{Res}(f; a) = c_{-1} = c_{(n-1)-n}$, so it is the coefficient of
$$(z - a)^{n-1}$$
in the above expression. Hence, it is given by
$$\frac{h^{(n-1)}(a)}{(n-1)!}.$$
Thus,
$$\mathrm{Res}_{z=a} f(z) = \frac{h^{(n-1)}(a)}{(n-1)!}.$$
[Fig. 4.6 Contour $C_R$; marked points $-R$, $R$, $\rho$, $\rho^3$]

In the case that the pole is simple (i.e., $n = 1$), we have
$$\operatorname{Res}(f; a) = \lim_{z \to a} (z-a) f(z).$$
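The residue formulas can be sanity-checked numerically. The sketch below is not part of the text: it approximates $\frac{1}{2\pi i}\int_{|z-a|=r} f(z)\,dz$ with the trapezoidal rule, which for a function holomorphic on a punctured disk picks out the coefficient $c_{-1}$ up to a tiny aliasing error. The helper `residue_numeric`, the radius `r` and the node count `n` are illustrative choices. For $f(z) = e^{2z}/(z-1)^3$ we have $n = 3$ and $h(z) = e^{2z}$, so the formula predicts $h''(1)/2! = 2e^2$; for the simple pole of $1/(z^4+1)$ at $z_1 = e^{\pi i/4}$, the limit formula gives $1/(4z_1^3)$.

```python
import cmath
import math


def residue_numeric(f, a, r=0.5, n=256):
    """Approximate Res(f; a) = (1/2πi) ∮_{|z-a|=r} f(z) dz by the trapezoidal rule.

    Assumes f is holomorphic on 0 < |z - a| <= r. With z = a + w, w = r e^{iθ},
    dz = i w dθ, so the integral equals (1/2π) ∫_0^{2π} f(a + w) w dθ.
    """
    total = 0j
    for k in range(n):
        w = r * cmath.exp(2j * math.pi * k / n)
        total += f(a + w) * w
    return total / n


def f(z):
    # Pole of order 3 at z = 1, with h(z) = (z - 1)^3 f(z) = e^{2z}.
    return cmath.exp(2 * z) / (z - 1) ** 3


# h''(z) = 4 e^{2z}, so Res(f; 1) = h''(1) / 2! = 2 e^2.
print(residue_numeric(f, 1.0), 2 * math.e ** 2)

# Simple pole of 1/(z^4 + 1) at z1 = e^{iπ/4}: Res = lim (z - z1) f(z) = 1/(4 z1^3).
z1 = cmath.exp(1j * math.pi / 4)
print(residue_numeric(lambda z: 1 / (z ** 4 + 1), z1), 1 / (4 * z1 ** 3))
```

The trapezoidal rule is a natural choice here because the integrand is periodic in $\theta$, so the error decays geometrically in the number of nodes.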

Let us consider an example to see how this works.

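Example 4.2 Let us compute
$$\int_{-\infty}^{\infty} \frac{dx}{x^4+1}$$
using residue calculus.
The integrand $f(z) = \dfrac{1}{z^4+1}$ has poles at $e^{\pi i k/4}$ for $k = 1, 3, 5, 7$. The poles $\rho = e^{\pi i/4}$ and $\rho^3 = e^{3\pi i/4}$ lie in the upper half-plane. For $R$ large (indeed, any $R > 1$), they are contained inside the contour $C_R$ depicted in Fig. 4.6. By our residue theorem,
$$\frac{1}{2\pi i}\int_{C_R} \frac{dz}{z^4+1} = \operatorname{Res}_1 + \operatorname{Res}_3,$$
where $\operatorname{Res}_k = \operatorname{Res}(f; e^{\pi i k/4})$. Let $z_k = e^{\pi i k/4}$. Then, as these are all simple poles,
$$\operatorname{Res}_k = \lim_{z\to z_k} \frac{z - z_k}{z^4+1} = \lim_{z\to z_k} \frac{z - z_k}{g(z)}, \qquad g(z) = z^4 + 1.$$
Since $g(z_k) = 0$, this limit is $1/g'(z_k)$. Hence, $\operatorname{Res}_k = \dfrac{1}{4z_k^3}$. Therefore,
$$\frac{1}{2\pi i}\int_{C_R} \frac{dz}{z^4+1} = \frac14\left(e^{-3\pi i/4} + e^{-9\pi i/4}\right) = \frac14\left(e^{-3\pi i/4} + e^{-\pi i/4}\right) = -\frac{i}{2\sqrt2}.$$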

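Letting $C_R^*$ be the semicircular arc from $R$ to $-R$, we have
$$\frac{1}{2\pi i}\int_{C_R} f(z)\,dz = \frac{1}{2\pi i}\int_{-R}^{R} f(x)\,dx + \frac{1}{2\pi i}\int_{C_R^*} f(z)\,dz.$$
We can estimate the last integral: on $C_R^*$ we have $|z^4 + 1| \geq R^4 - 1$, and the arc has length $\pi R$, so
$$\left|\frac{1}{2\pi i}\int_{C_R^*} f(z)\,dz\right| \leq \frac{K}{R^3}$$
for some constant $K$ and all $R \geq 2$. As $R \to \infty$, this goes to zero. Thus,
$$\frac{1}{2\pi i}\int_{-\infty}^{\infty} f(x)\,dx = -\frac{i}{2\sqrt2}.$$
Hence,
$$\int_{-\infty}^{\infty} \frac{dx}{x^4+1} = \frac{\pi}{\sqrt2}.$$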

This example also illustrates how one can use complex analysis to calculate real
definite integrals.
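The arithmetic of Example 4.2 can be checked in a few lines of Python. This is an illustrative sketch, not the book’s method: it sums $1/(4z_k^3)$ over the two poles in the upper half-plane and, independently, approximates the real integral after the substitution $x = \tan\theta$, which turns it into $\int_{-\pi/2}^{\pi/2} \cos^2\theta/(\sin^4\theta + \cos^4\theta)\,d\theta$, a smooth $\pi$-periodic integrand for which the midpoint rule converges very quickly. The node count is an arbitrary choice.

```python
import cmath
import math

# Poles of f(z) = 1/(z^4 + 1) in the upper half-plane: z_k = e^{iπk/4}, k = 1, 3.
upper_poles = [cmath.exp(1j * math.pi * k / 4) for k in (1, 3)]

# Simple poles of 1/g with g(z) = z^4 + 1: Res = 1/g'(z_k) = 1/(4 z_k^3).
residue_sum = sum(1 / (4 * z ** 3) for z in upper_poles)
contour_value = 2 * math.pi * 1j * residue_sum  # expected to equal π/√2


def real_line_integral(n=400):
    """∫_R dx/(x^4+1) via x = tan θ and the midpoint rule on (-π/2, π/2)."""
    h = math.pi / n
    total = 0.0
    for k in range(n):
        t = -math.pi / 2 + (k + 0.5) * h
        c, s = math.cos(t), math.sin(t)
        total += c * c / (s ** 4 + c ** 4)
    return total * h


print(residue_sum, contour_value, real_line_integral(), math.pi / math.sqrt(2))
```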

Exercises

1. Compute the residues of the following functions at their singularities:
(a) $\dfrac{1}{(1-z)^3}$.
(b) $\dfrac{e^z}{(1-z)^3}$.
(c) $\dfrac{1}{z(1-z)^3}$.
(d) $\dfrac{e^z}{z(1-z)^3}$.
2. Evaluate the integral
$$\int_C \frac{dz}{z^2+z+1},$$
where $C$ is a circle oriented counterclockwise centered at zero of radius $1/2$.


3. Let $K$ be a natural number. A collection of pairs of integers $(a_j, n_j)$ with $0 \leq a_j < n_j$, $1 \leq j \leq K$, is called a covering system if every integer $y$ satisfies the congruence $y \equiv a_j \pmod{n_j}$ for at least one value of $j$. It is called an exact system if for every integer $y$ there is exactly one $j$ with $1 \leq j \leq K$ such that $y \equiv a_j \pmod{n_j}$.
(a) Show that if the system is exact, then
$$\sum_{j=1}^{K} \frac{z^{a_j}}{1 - z^{n_j}} = \frac{1}{1-z} \tag{4.1}$$
for $|z| < 1$.
(b) By computing the residues of both sides of (4.1) at $z = 1$, show that if the system is exact, then
$$\sum_{j=1}^{K} \frac{1}{n_j} = 1.$$
(c) Show that there is no exact covering system satisfying $n_1 \leq n_2 \leq \cdots \leq n_{K-1} < n_K$, where the last inequality is strict. [Hint: Observe that the right-hand side of (4.1) has a simple pole at $z = 1$ only, whereas the left-hand side has simple poles at primitive $n_K$th roots of unity. This result was conjectured by Erdős and was proved by L. Mirsky and D.J. Newman using this simple argument.]

4.8 Further Examples

We want to use residue calculus to evaluate
$$\lim_{A\to\infty}\int_{-A}^{A} \frac{\sin x}{x} e^{itx}\,dx,$$
an integral we met in our study of the Fourier transform. To do this, consider
$$f(z) = \frac{\sin z}{z} e^{itz},$$
which has a removable singularity at $z = 0$, since
$$\lim_{z\to0} \frac{\sin z}{z} = 1.$$
Thus, $f(z)$ is entire.


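Consider the path $\Gamma_A$ from $-A$ to $A$ indicated in Fig. 4.7.
By Cauchy’s theorem,
$$\frac{1}{2\pi i}\int_{\Gamma_A} f(z)\,dz + \frac{1}{2\pi i}\int_{A}^{-A} f(z)\,dz = 0.$$

[Fig. 4.7 Contour $\Gamma_A$; marked points $-A$, $-1$, $1$, $A$]

Thus,
$$\frac{1}{2\pi i}\int_{-A}^{A} \frac{\sin x}{x} e^{itx}\,dx = \frac{1}{2\pi i}\int_{\Gamma_A} f(z)\,dz. \tag{4.2}$$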

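Since $\sin z = \dfrac{e^{iz} - e^{-iz}}{2i}$, the integral on the right-hand side can be rewritten as
$$\frac{1}{2\pi i}\int_{\Gamma_A} \frac{e^{iz} - e^{-iz}}{2iz} e^{itz}\,dz = \frac{1}{2i}\left(\frac{1}{2\pi i}\int_{\Gamma_A} \frac{e^{i(t+1)z}}{z}\,dz - \frac{1}{2\pi i}\int_{\Gamma_A} \frac{e^{i(t-1)z}}{z}\,dz\right).$$
Defining
$$\varphi_A(s) = -\frac{1}{4\pi}\int_{\Gamma_A} \frac{e^{isz}}{z}\,dz \qquad (s \in \mathbb{R}),$$
we see that our goal is to evaluate
$$\varphi_A(t+1) - \varphi_A(t-1).$$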

Now, $e^{isz}/z$ has a singularity at $z = 0$; in fact, it is a simple pole. We complete $\Gamma_A$ to a closed path in two ways and then apply Cauchy’s theorem.
The closed path $\Gamma_A + I$ (indicated in Fig. 4.8) avoids the singularity, so that
$$\frac{1}{2\pi i}\int_{\Gamma_A + I} \frac{e^{isz}}{z}\,dz = 0.$$
The closed path $\Gamma_A + II$ (indicated in Fig. 4.9) encloses the singularity at $z = 0$. As the residue is 1, we see by the residue theorem that
$$\frac{1}{2\pi i}\int_{\Gamma_A + II} \frac{e^{isz}}{z}\,dz = 1.$$
[Fig. 4.8 Contour $\Gamma_A + I$; marked points $-A$, $-1$, $1$, $A$]

[Fig. 4.9 Contour $\Gamma_A + II$; marked points $-A$, $-1$, $1$, $A$]

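We parametrize the circular arc in each of these integrals. In our first integral,
$$0 = \frac{1}{2\pi i}\int_{\Gamma_A} \frac{e^{isz}}{z}\,dz + \frac{1}{2\pi i}\int_0^{-\pi} \frac{e^{isAe^{i\theta}}}{Ae^{i\theta}}\,(iAe^{i\theta})\,d\theta,$$
by putting $z = Ae^{i\theta}$, so that $dz = iAe^{i\theta}\,d\theta$.
Now,
$$\left|e^{isAe^{i\theta}}\right| = \left|e^{isA(\cos\theta + i\sin\theta)}\right| = e^{-sA\sin\theta},$$
which is $\leq 1$ as $s \leq 0$ and $\sin\theta \leq 0$ for $\theta \in [-\pi, 0]$.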


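Letting $0 < \varepsilon < \pi/2$, we have $|\sin\theta| \geq \sin\varepsilon$ for $\theta \in [-\pi+\varepsilon, -\varepsilon]$, so that
$$\int_{-\pi+\varepsilon}^{-\varepsilon} \left|e^{isAe^{i\theta}}\right| d\theta \leq \pi e^{-|s|A\sin\varepsilon} \to 0$$
as $A \to \infty$. Thus,
$$\left|\int_0^{-\pi} e^{isAe^{i\theta}}\,d\theta\right| \leq \int_{-\varepsilon}^{0} \left|e^{isAe^{i\theta}}\right| d\theta + \int_{-\pi+\varepsilon}^{-\varepsilon} \left|e^{isAe^{i\theta}}\right| d\theta + \int_{-\pi}^{-\pi+\varepsilon} \left|e^{isAe^{i\theta}}\right| d\theta.$$
Since the integrand is bounded by 1, the first and last integrals are together bounded by $2\varepsilon$. Letting $A \to \infty$ and then $\varepsilon \to 0$, we see that
$$\lim_{A\to\infty} \frac{1}{2\pi i}\int_{\Gamma_A} \frac{e^{isz}}{z}\,dz = 0 \tag{4.3}$$
if $s < 0$. Thus, $\lim_{A\to\infty}\varphi_A(s) = 0$ if $s < 0$. This fact could alternatively have been deduced by a simple application of the dominated convergence theorem (see Exercise 1 below).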
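We now consider the case $s > 0$. Proceeding as before and parametrizing the arc, we have
$$1 = \frac{1}{2\pi i}\int_{\Gamma_A} \frac{e^{isz}}{z}\,dz + \frac{1}{2\pi i}\int_0^{\pi} ie^{isAe^{i\theta}}\,d\theta,$$
and for $s > 0$,
$$\left|e^{isAe^{i\theta}}\right| = e^{-sA\sin\theta}.$$
As $\sin\theta \geq 0$ for $\theta \in [0, \pi]$, with strict inequality except at the endpoints, we obtain as before
$$\lim_{A\to\infty} \frac{1}{2\pi i}\int_0^{\pi} ie^{isAe^{i\theta}}\,d\theta = 0.$$
Therefore,
$$\lim_{A\to\infty} \frac{1}{2\pi i}\int_{\Gamma_A} \frac{e^{isz}}{z}\,dz = 1$$
if $s > 0$.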
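Therefore,
$$\lim_{A\to\infty}\varphi_A(s) = \lim_{A\to\infty}\left(-\frac{1}{4\pi}\int_{\Gamma_A} \frac{e^{isz}}{z}\,dz\right) = \frac{1}{2i}\lim_{A\to\infty} \frac{1}{2\pi i}\int_{\Gamma_A} \frac{e^{isz}}{z}\,dz = \begin{cases} \dfrac{1}{2i}, & s > 0,\\[1ex] 0, & s < 0. \end{cases}$$
That is, by (4.2), for $t \neq \pm1$,
$$\lim_{A\to\infty} \frac{1}{2\pi i}\int_{-A}^{A} \frac{\sin x}{x} e^{itx}\,dx = \lim_{A\to\infty}\{\varphi_A(t+1) - \varphi_A(t-1)\}.$$
In other words,
$$\lim_{A\to\infty}\int_{-A}^{A} \frac{\sin x}{x} e^{itx}\,dx = 2\pi i \lim_{A\to\infty}\{\varphi_A(t+1) - \varphi_A(t-1)\} = \begin{cases} \pi, & t+1 > 0 \text{ and } t-1 < 0;\\ -\pi, & t+1 < 0 \text{ and } t-1 > 0;\\ 0, & t+1 \text{ and } t-1 \text{ both positive or both negative}. \end{cases}$$
The first case means $|t| < 1$. The second case is impossible. The third case means $|t| > 1$. Therefore,
$$\lim_{A\to\infty}\int_{-A}^{A} \frac{\sin x}{x} e^{itx}\,dx = \begin{cases} \pi, & \text{if } |t| < 1;\\ 0, & \text{if } |t| > 1. \end{cases}$$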

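What happens at $t = \pm1$? We need to study $\varphi_A(0)$, which is
$$-\frac{1}{4\pi}\int_{\Gamma_A} \frac{dz}{z}.$$
But using $\Gamma_A + II$ (Fig. 4.9), we see
$$\frac{1}{2\pi i}\int_{\Gamma_A + II} \frac{dz}{z} = 1$$
and
$$\frac{1}{2\pi i}\int_{II} \frac{dz}{z} = \frac{1}{2\pi i}\int_0^{\pi} \frac{iAe^{i\theta}}{Ae^{i\theta}}\,d\theta = \frac12.$$
Thus,
$$\int_{\Gamma_A} \frac{dz}{z} = \pi i,$$
so that $\varphi_A(0) = -i/4$. Now by (4.2), if $t = 1$,
$$\frac{1}{2\pi i}\lim_{A\to\infty}\int_{-A}^{A} \frac{\sin x}{x} e^{ix}\,dx = \lim_{A\to\infty}\{\varphi_A(2) - \varphi_A(0)\} = \frac{1}{2i} + \frac{i}{4} = -\frac{i}{4}.$$
We can compute the case $t = -1$ in a similar way. Hence,
$$\lim_{A\to\infty}\int_{-A}^{A} \frac{\sin x}{x} e^{\pm ix}\,dx = -\frac{i}{4} \times 2\pi i = \frac{\pi}{2}.$$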
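The case analysis above can be checked numerically. The sketch below (an illustration, not from the text) evaluates the truncated integral for a fixed large $A$ with Simpson’s rule, using the fact that the imaginary part vanishes by symmetry. Since $\sin x\cos tx = \frac12(\sin(1+t)x + \sin(1-t)x)$, the truncated integral equals $\operatorname{Si}((1+t)A) + \operatorname{Si}((1-t)A)$ with $\operatorname{Si}(y) = \int_0^y \frac{\sin u}{u}\,du$, so it approaches its limit only at rate $O(1/A)$; the cutoff $A = 2000$, the step $h = 0.05$ and the loose tolerance in the checks are assumptions chosen for that reason.

```python
import math


def integrand(x, t):
    """sin(x) cos(t x) / x, extended continuously by its limit 1 at x = 0."""
    if x == 0.0:
        return 1.0
    return math.sin(x) * math.cos(t * x) / x


def truncated_integral(t, A=2000.0, h=0.05):
    """Simpson approximation of ∫_{-A}^{A} (sin x / x) e^{itx} dx.

    The odd part sin(x) sin(tx)/x integrates to 0 over [-A, A], so the
    integral equals 2 ∫_0^A sin(x) cos(tx)/x dx.
    """
    n = int(round(A / h))
    if n % 2:
        n += 1  # Simpson's rule needs an even number of subintervals
    step = A / n
    s = integrand(0.0, t) + integrand(A, t)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * integrand(k * step, t)
    return 2 * s * step / 3


for t in (0.0, 0.5, 1.0, 2.0):
    print(t, truncated_integral(t))
```

The values should be close to $\pi$ for $|t| < 1$, to $0$ for $|t| > 1$, and to $\pi/2$ at $t = \pm 1$.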

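Let us consider a related example. This was considered in the previous chapter and solved using the theorem related to differentiation under the integral sign. It is instructive to consider it now using complex analysis.
Let us compute $\displaystyle\int_0^\infty \frac{\sin x}{x}\,dx$. We have
$$\int_0^\infty \frac{\sin x}{x}\,dx = \frac12\int_{-\infty}^{\infty} \frac{\sin x}{x}\,dx = \frac12\int_{-\infty}^{\infty} \frac{e^{ix} - e^{-ix}}{2i}\,\frac{dx}{x}$$
$$= \frac12 \lim_{\substack{R\to\infty\\ \varepsilon\to0}}\left(\int_{-R}^{-\varepsilon} \frac{e^{ix}}{2ix}\,dx + \int_{\varepsilon}^{R} \frac{e^{ix}}{2ix}\,dx - \int_{-R}^{-\varepsilon} \frac{e^{-ix}}{2ix}\,dx - \int_{\varepsilon}^{R} \frac{e^{-ix}}{2ix}\,dx\right).$$
Changing variables $x \to -x$ in the last two integrals, we see that this is
$$\lim_{\substack{R\to\infty\\ \varepsilon\to0}}\left(\int_{-R}^{-\varepsilon} \frac{e^{ix}}{2ix}\,dx + \int_{\varepsilon}^{R} \frac{e^{ix}}{2ix}\,dx\right).$$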

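This suggests we consider the contour $C_{\varepsilon,R}$ of Fig. 4.10. By Cauchy’s theorem,
$$\frac{1}{2\pi i}\int_{C_{\varepsilon,R}} \frac{e^{iz}}{z}\,dz = 0.$$
The integral along the circular arc $C(R)$ of radius $R$ is
$$\frac{1}{2\pi i}\int_0^{\pi} \frac{e^{iRe^{i\theta}}}{Re^{i\theta}}\,iRe^{i\theta}\,d\theta = \frac{1}{2\pi}\int_0^{\pi} e^{iR(\cos\theta + i\sin\theta)}\,d\theta \to 0 \quad \text{as } R \to \infty$$
by the dominated convergence theorem, via an argument similar to the one we used in our previous example.
It remains to consider the integral along the smaller circle $C(\varepsilon)$:
$$\frac{1}{2\pi i}\int_{C(\varepsilon)} \frac{e^{iz}}{z}\,dz.$$

[Fig. 4.10 Contour $C_{\varepsilon,R}$; arcs $C(R)$ and $C(\varepsilon)$, marked points $-R$, $-\varepsilon$, $\varepsilon$, $R$]

To evaluate this integral, we prove the following lemma.
Lemma 4.1 Suppose $g(z)$ is holomorphic in a neighborhood of $0$, and let $C(\varepsilon)$ be the upper semicircle of radius $\varepsilon$ traversed from $-\varepsilon$ to $\varepsilon$, as in Fig. 4.10. Then
$$\lim_{\varepsilon\to0}\int_{C(\varepsilon)} \frac{g(z)}{z}\,dz = -\pi i g(0).$$
(Traversed in the opposite, counterclockwise direction, the limit is $\pi i g(0)$.)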

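Proof We write $g(z) = g(0) + h(z)z$, where $h(z)$ is holomorphic near $0$. Then
$$\lim_{\varepsilon\to0}\int_{C(\varepsilon)}\left(\frac{g(0)}{z} + h(z)\right)dz = \lim_{\varepsilon\to0}\int_{C(\varepsilon)} \frac{g(0)}{z}\,dz,$$
because $h$, being holomorphic, is bounded near $0$, while $C(\varepsilon)$ has length $\pi\varepsilon$. Now, putting $z = \varepsilon e^{i\theta}$, the integral on $C(\varepsilon)$ is (taking into account the orientation)
$$-g(0)\int_0^{\pi} \frac{d(\varepsilon e^{i\theta})}{\varepsilon e^{i\theta}} = -g(0)\int_0^{\pi} \frac{i\varepsilon e^{i\theta}\,d\theta}{\varepsilon e^{i\theta}} = -\pi i g(0).$$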

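Putting everything together, we get
$$\lim_{\substack{\varepsilon\to0\\ R\to\infty}}\left(\frac{1}{2\pi i}\left(\int_{-R}^{-\varepsilon} \frac{e^{iz}}{z}\,dz + \int_{\varepsilon}^{R} \frac{e^{iz}}{z}\,dz\right) - \frac12\right) = 0,$$
from which we deduce the formula:
$$\int_0^\infty \frac{\sin x}{x}\,dx = \frac{\pi}{2}.$$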

Exercises

1. Use the dominated convergence theorem to show that (4.3) holds.
2. Show that
$$\int_{-\infty}^{\infty} \frac{dx}{x^6+1} = \frac{2\pi}{3}.$$
3. For each natural number $n \geq 2$, show that
$$\int_0^\infty \frac{dx}{x^n+1} = \frac{\pi/n}{\sin(\pi/n)}.$$
[Hint: Consider the path from $0$ to $R$, then from $R$ to $Re^{2\pi i/n}$, and then back to $0$.]

4.9 Rouché’s Theorem

The logarithmic derivative of a function $f$ is defined as $f'/f$. If $f$ is holomorphic (or meromorphic) and not identically zero, $f'/f$ is meromorphic.
Theorem 4.24 (The argument principle). Let $\gamma$ be a closed curve in $\Omega$ such that $\operatorname{Ind}_\gamma(\alpha) = 0$ for all $\alpha \notin \Omega$ and $\operatorname{Ind}_\gamma(\alpha) = 0$ or $1$ for $\alpha \in \Omega \setminus \gamma^*$. Let
$$\Omega_1 = \{\alpha \in \Omega : \operatorname{Ind}_\gamma(\alpha) = 1\}.$$
If $f$ is meromorphic in $\Omega$, let $N$ be the number of zeros and $P$ the number of poles of $f$ in $\Omega_1$, counted with multiplicity. If none of these lie on $\gamma^*$, then
$$\frac{1}{2\pi i}\int_\gamma \frac{f'(z)}{f(z)}\,dz = N - P.$$
In other words, the difference between the number of zeros and the number of poles of $f$ can be computed by integrating the logarithmic derivative of $f$.
Proof If $a \in \Omega_1$ is a zero of $f$ of multiplicity $m$, then we can write
$$f(z) = (z-a)^m g(z)$$
with $g(a) \neq 0$. Also, $g$ and $1/g$ are holomorphic in a neighborhood of $a$. In this neighborhood,
$$\frac{f'(z)}{f(z)} = \frac{m(z-a)^{m-1}g(z) + (z-a)^m g'(z)}{(z-a)^m g(z)} = \frac{m}{z-a} + \frac{g'(z)}{g(z)}.$$
Thus, $\operatorname{Res}_{z=a}(f'/f)(z) = m$.
Similarly, if $b$ is a pole in $\Omega_1$ of order $r$, we can write
$$f(z) = (z-b)^{-r} h(z),$$
where $h(b) \neq 0$, and $h$ and $1/h$ are holomorphic in a neighborhood of $b$. Thus,
$$\frac{f'(z)}{f(z)} = \frac{-r(z-b)^{-r-1}h(z) + (z-b)^{-r}h'(z)}{(z-b)^{-r}h(z)} = \frac{-r}{z-b} + \frac{h'(z)}{h(z)},$$
so $\operatorname{Res}_{z=b}(f'/f)(z) = -r$. Since $f'/f$ is holomorphic away from the zeros and poles of $f$, the result now follows from the residue theorem.
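A numerical illustration of the argument principle: rather than integrating $f'/f$ directly, the sketch below tracks the total change of $\arg f$ along a circle, which equals $2\pi(N - P)$ by the same computation. The function `zeros_minus_poles`, its sampling density and the test function `example` are hypothetical choices for illustration; the method assumes $f$ has no zeros or poles on the circle and that $\arg f$ changes by less than $\pi$ between consecutive samples.

```python
import cmath
import math


def zeros_minus_poles(f, center=0j, radius=1.0, n=4096):
    """Estimate N - P for f inside the circle |z - center| = radius.

    (1/2πi)∫ f'/f dz is the winding number of f∘γ about 0, i.e. the total
    change of arg f along γ divided by 2π; we accumulate that change from
    n samples, taking each increment in [-π, π].
    """
    total = 0.0
    prev = f(center + radius)
    for k in range(1, n + 1):
        cur = f(center + radius * cmath.exp(2j * math.pi * k / n))
        total += cmath.phase(cur / prev)
        prev = cur
    return round(total / (2 * math.pi))


def example(z):
    # Zeros at 0.3 (double), -0.5i and 3; a simple pole at 0.2.
    return (z - 0.3) ** 2 * (z + 0.5j) * (z - 3) / (z - 0.2)


print(zeros_minus_poles(example))              # inside |z| = 1: N = 3, P = 1
print(zeros_minus_poles(example, radius=4.0))  # inside |z| = 4: N = 4, P = 1
```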

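Theorem 4.25 (Rouché’s theorem). Suppose $f, g \in H(\Omega)$, and let $\gamma$ be a closed path in $\Omega$ such that $\operatorname{Ind}_\gamma(\alpha) = 0$ for all $\alpha \notin \Omega$ and, as in Theorem 4.24, $\operatorname{Ind}_\gamma(\alpha) \in \{0, 1\}$ for $\alpha \in \Omega \setminus \gamma^*$. If $|f(z) - g(z)| < |f(z)|$ for all $z \in \gamma^*$, then $f$ and $g$ have the same number of zeros inside $\gamma$ (that is, where $\operatorname{Ind}_\gamma = 1$), counted with multiplicity.

Proof By the inequality in the hypothesis, neither $f$ nor $g$ vanishes on $\gamma^*$. Thus, the inequality can be rewritten as
$$\left|1 - \frac{g(z)}{f(z)}\right| < 1 \quad \text{for all } z \in \gamma^*.$$
That is, $g(z)/f(z)$ lies in the open unit disk centered at 1 for $z \in \gamma^*$. So if $\gamma$ is defined on $[a, b]$ and $F(z) = g(z)/f(z)$, then
$$F \circ \gamma : [a, b] \to B^o(1, 1).$$
Since $(F\circ\gamma)^* \subset B^o(1,1)$ and $0 \notin B^o(1,1)$, the point $0$ lies in the unbounded component of the complement of $(F\circ\gamma)^*$. Note also that
$$\frac{F'(z)}{F(z)} = \frac{g'(z)f(z) - f'(z)g(z)}{f(z)^2}\cdot\frac{f(z)}{g(z)} = \frac{g'(z)}{g(z)} - \frac{f'(z)}{f(z)}.$$
Therefore, putting $w = F(\gamma(t))$,
$$0 = \operatorname{Ind}_{F\circ\gamma}(0) = \frac{1}{2\pi i}\int_{F\circ\gamma} \frac{dw}{w} = \frac{1}{2\pi i}\int_a^b \frac{F'(\gamma(t))\gamma'(t)}{F(\gamma(t))}\,dt = \frac{1}{2\pi i}\int_a^b \left(\frac{g'(\gamma(t))}{g(\gamma(t))} - \frac{f'(\gamma(t))}{f(\gamma(t))}\right)\gamma'(t)\,dt.$$
Thus,
$$0 = \frac{1}{2\pi i}\int_\gamma \left(\frac{f'(z)}{f(z)} - \frac{g'(z)}{g(z)}\right)dz,$$
which, by the argument principle, means $f$ and $g$ have the same number of zeros inside $\gamma$.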


Example 4.3 Let $p(z) = z^8 - 5z^3 + z - 2$, and let $\gamma^*$ be the unit circle. If $f(z) = -5z^3$, then
$$|f(z) - p(z)| = |-z^8 - z + 2| \leq 4 < 5 = |f(z)|$$
for $z \in \gamma^*$. By Rouché’s theorem, $p(z)$ has three zeros in the unit disk, since $f$ has a zero of order 3 at the origin.
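Example 4.3 can also be checked numerically. The sketch below (illustrative only) samples the unit circle, confirms the Rouché inequality $|f - p| < |f|$ at the sample points, and counts the zeros of $p$ in the unit disk as the winding number of $p$ about $0$, as in the argument principle. Sampling cannot prove the inequality on the whole circle; it only corroborates the bound $|-z^8 - z + 2| \leq 4$ derived above. The number of sample points is an arbitrary choice.

```python
import cmath
import math


def p(z):
    return z ** 8 - 5 * z ** 3 + z - 2


def f(z):
    return -5 * z ** 3


n = 2000
circle = [cmath.exp(2j * math.pi * k / n) for k in range(n)]

# Rouché's hypothesis at the sample points: |f(z) - p(z)| < |f(z)| = 5.
max_diff = max(abs(f(z) - p(z)) for z in circle)
min_abs_f = min(abs(f(z)) for z in circle)

# Number of zeros of p in |z| < 1 = winding number of p(circle) about 0.
winding = sum(
    cmath.phase(p(circle[(k + 1) % n]) / p(circle[k])) for k in range(n)
) / (2 * math.pi)

print(max_diff, min_abs_f, round(winding))
```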
We can also give an alternative proof of the fundamental theorem of algebra, using Rouché’s theorem. Let
$$g(z) = z^n + a_{n-1}z^{n-1} + \cdots + a_1z + a_0.$$
Then if $f(z) = z^n$, we see
$$|f(z) - g(z)| \leq (|a_{n-1}| + \cdots + |a_1| + |a_0|)R^{n-1}$$
for $|z| = R \geq 1$. If we choose $R > \max(1, |a_{n-1}| + \cdots + |a_1| + |a_0|)$, we get
$$|f(z) - g(z)| < R^n = |f(z)|$$
for $|z| = R$. Thus, $f$ and $g$ have the same number of zeros in $|z| < R$; since $f$ has a zero of order $n$ at the origin, $g$ has $n$ zeros.
The proof also shows that all zeros lie in the disk
$$|z| \leq \max(1, |a_{n-1}| + \cdots + |a_1| + |a_0|).$$

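Another application of Rouché’s theorem is Hurwitz’s theorem.
Theorem 4.26 (Hurwitz). Let $f_n$ be a sequence of analytic functions on a region $\Omega$ converging uniformly on every closed disk in $\Omega$ to $f$. (By a theorem of Weierstrass, $f$ is analytic.) Assume $f$ is not identically zero. Let $z_0 \in \Omega$. Then $f(z_0) = 0$ if and only if there is a sequence $z_n \to z_0$ such that $f_n(z_n) = 0$ for all sufficiently large $n$. That is, a zero of $f$ is a limit of zeros of the functions $f_n$.
Proof Choose $\gamma$ to be a small circle centered at $z_0$ whose closed disk lies in $\Omega$. By assumption, $f_n \to f$ uniformly on and inside $\gamma$. Since the zeros of $f$ are isolated, by choosing $\gamma$ sufficiently small we may suppose that $z_0$ is the only possible zero of $f$ on and inside $\gamma$. Let $m$ be the minimum of $|f(z)|$ on $\gamma$. Then $|f(z)| \geq m > 0$ on $\gamma$. By uniform convergence, we can choose $N$ large so that
$$|f_n(z) - f(z)| < m \quad \text{on } \gamma \text{ for } n \geq N.$$
But now, on $\gamma^*$,
$$|f_n(z) - f(z)| < |f(z)|,$$
so that, by Rouché’s theorem, $f_n$ and $f$ have the same number of zeros inside $\gamma$ for $n \geq N$. If $f(z_0) = 0$, it follows that for every sufficiently small circle $\gamma$ about $z_0$, $f_n$ has a zero inside $\gamma$ for all large $n$; letting the radius tend to $0$ yields a sequence $z_n \to z_0$ with $f_n(z_n) = 0$. If $f(z_0) \neq 0$, then $f$, and hence $f_n$ for $n \geq N$, has no zeros inside $\gamma$, so no such sequence exists.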
Rouché’s theorem allows us to study one-to-one analytic functions and to deduce the open mapping theorem, which states that any non-constant analytic function $f$ on a region is an open map; that is, $f(U)$ is open for every open $U$. We relegate this to the exercises. A key tool needed is the following theorem, which is of independent interest.
Theorem 4.27 Let $A$ be an open subset of $\mathbb{C}$, and suppose that $f : A \to \mathbb{C}$ is analytic and one to one. Then $f'(z_0) \neq 0$ for all $z_0 \in A$.
Proof Suppose that for some $z_0 \in A$, we have $f'(z_0) = 0$. Then the function $f(z) - f(z_0)$ has a zero of order $k \geq 2$ at $z_0$. As $f$ is one to one and $A$ is open, $f$ is not constant near $z_0$, so the zeros of $f(z) - f(z_0)$ and of $f'$ are isolated. So there is a $\delta > 0$ such that $f(z) - f(z_0)$ has no zero and $f'(z) \neq 0$ for $0 < |z - z_0| \leq \delta$. In particular, there is an $m > 0$ such that $|f(z) - f(z_0)| \geq m$ on the circle $|z - z_0| = \delta$. For $0 < |\eta| < m$, we apply Rouché’s theorem to the functions $f(z) - f(z_0)$ and $f(z) - f(z_0) - \eta$ to conclude that $f(z) - f(z_0) - \eta$ has $k$ zeros inside the circle $|z - z_0| = \delta$. None of them is $z_0$ (where the value is $-\eta \neq 0$), and since $f'(z) \neq 0$ in the region $0 < |z - z_0| \leq \delta$, none of them is a multiple zero. Hence $f$ takes the value $f(z_0) + \eta$ at $k \geq 2$ distinct points, so $f$ is not one to one, a contradiction.

Exercises

1. Determine the number of zeros of the polynomial
$$z^{87} + 36z^{57} + 71z^4 + z^3 - z + 1$$
in the unit disk.
2. Determine the number of zeros of the polynomial
$$2z^5 - 6z^2 + z + 1$$
in the annulus $1 \leq |z| \leq 2$.
3. If $a > e$, show that the equation $e^z = az^n$ has $n$ solutions inside the unit circle.
4. Let
$$f(z) = \sum_{n=0}^{\infty} a_n z^n$$
with $a_0 = 0$, $a_1 = 1$ and $f$ not constant. Prove that $f$ is one to one in the unit disk $|z| < 1$ if
$$\sum_{n=2}^{\infty} n|a_n| \leq 1.$$
[Hint: Fix $z_0$ in the unit disk. Let $g(z) = z - z_0$ and $h(z) = f(z) - f(z_0)$ and apply Rouché’s theorem.]
5. By considering the example $f_n(z) = e^z/n$, show that the assumption that $f$ be not identically zero is essential in Hurwitz’s Theorem 4.26.
6. Suppose $f$ is analytic and not constant on a region $A$. Show that $f$ is an open map.

4.10 Infinite Products and Weierstrass Factorization

To what extent does the knowledge of the zeros and poles of a meromorphic function determine that function? Any polynomial is determined, up to a constant factor, by its zeros and their multiplicities: any polynomial $f$ can be written as
$$Az^r \prod_{i=1}^{n}\left(1 - \frac{z}{a_i}\right),$$
where the $a_i$ are the nonzero roots of $f$ (listed with multiplicity) and $r$ is the order of the zero of $f$ at $0$. Does such a factorization exist for other functions?
To this end, we begin by studying convergence of infinite products.
Let $\{u_n\}_{n=1}^\infty$ be a sequence of complex numbers. Let
$$P_N = \prod_{n=1}^{N}(1 + u_n).$$
If $\lim_{N\to\infty} P_N$ exists and equals $\alpha$, we say that the infinite product
$$\prod_{n=1}^{\infty}(1 + u_n) \tag{4.4}$$
exists and write
$$\alpha = \prod_{n=1}^{\infty}(1 + u_n).$$

Lemma 4.2 Suppose $u_1, \ldots, u_N \in \mathbb{C}$. Let
$$P_N = \prod_{n=1}^{N}(1 + u_n), \qquad P_N^* = \prod_{n=1}^{N}(1 + |u_n|).$$
Then $P_N^* \leq \exp(|u_1| + \cdots + |u_N|)$ and $|P_N - 1| \leq P_N^* - 1$.

Proof Recall that $1 + x \leq e^x$ for $x \geq 0$. This immediately implies the bound for $P_N^*$. To derive the bound for $P_N$, we proceed by induction.
For $N = 1$, this is trivially true. Assume the inequality holds for all subscripts $\leq N$. Then
$$P_{N+1} - 1 = P_N(1 + u_{N+1}) - 1 = (P_N - 1)(1 + u_{N+1}) + u_{N+1},$$
so that
$$|P_{N+1} - 1| \leq |P_N - 1|(1 + |u_{N+1}|) + |u_{N+1}| \leq (P_N^* - 1)(1 + |u_{N+1}|) + |u_{N+1}|$$
by the induction hypothesis. Hence,
$$|P_{N+1} - 1| \leq P_N^*(1 + |u_{N+1}|) - (1 + |u_{N+1}|) + |u_{N+1}| = P_{N+1}^* - 1,$$
as claimed.
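The two inequalities of Lemma 4.2 are easy to test on random data. The sketch below is an illustration with arbitrary choices (at most 20 factors, each $u_n$ uniform in the square $[-1,1]^2$, a fixed seed, and a small relative tolerance for floating-point rounding); it corroborates the lemma but is of course not a proof.

```python
import math
import random


def lemma_4_2_quantities(us):
    """Return (|P_N - 1|, P_N^* - 1, exp(|u_1| + ... + |u_N|) - 1)."""
    P, P_star = 1 + 0j, 1.0
    for u in us:
        P *= 1 + u
        P_star *= 1 + abs(u)
    return abs(P - 1), P_star - 1, math.exp(sum(abs(u) for u in us)) - 1


def count_violations(trials=1000, seed=0):
    """Count random trials in which |P_N - 1| <= P_N^* - 1 <= exp(Σ|u_n|) - 1 fails."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        N = rng.randint(1, 20)
        us = [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(N)]
        lhs, mid, rhs = lemma_4_2_quantities(us)
        tol = 1e-12 * (1 + rhs)  # allow for floating-point rounding
        if not (lhs <= mid + tol and mid <= rhs + tol):
            bad += 1
    return bad


print(count_violations())
```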

We will say the product (4.4) converges absolutely if the product with u n replaced
by |u n | converges. It is tempting to reduce the study of infinite products to the study of
infinite sums “by taking logarithms.” This can, in fact, be done but one must proceed
with some care to ensure the logarithm is well defined (see exercises below).

Theorem 4.28 Suppose $\{u_n(z)\}_{n=1}^\infty$ is a sequence of bounded complex functions in $\Omega$ such that
$$\sum_{n=1}^{\infty} |u_n(z)|$$
converges uniformly in $\Omega$. Then,
$$\prod_{n=1}^{\infty}(1 + u_n(z))$$
converges uniformly to a function $f(z)$ in $\Omega$. Moreover, if $f(z_0) = 0$ for some $z_0 \in \Omega$, then $u_n(z_0) = -1$ for some $n$.

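Proof The hypothesis implies that $\sum_{n=1}^\infty |u_n(z)|$ is bounded in $\Omega$. Indeed, let
$$f_N(z) = \sum_{n=1}^{N} |u_n(z)|;$$
the uniform convergence implies that for any $\varepsilon > 0$, there exists $N_0$ such that
$$|f_N(z) - f_M(z)| < \varepsilon$$
for $M, N \geq N_0$ and all $z \in \Omega$. In particular, for $N \geq N_0$,
$$|f_N(z)| = |f_{N_0}(z) + f_N(z) - f_{N_0}(z)| \leq |f_{N_0}(z)| + |f_N(z) - f_{N_0}(z)| \leq KN_0 + \varepsilon,$$
where $|u_n| \leq K$ on $\Omega$ for $n \leq N_0$.
By the previous lemma, the functions
$$P_N(z) = \prod_{n=1}^{N}(1 + u_n(z))$$
are therefore uniformly bounded in $\Omega$, say by $C$.
Since $\sum_{n=1}^\infty |u_n(z)|$ converges uniformly in $\Omega$, given $0 < \varepsilon < 1$, there exists $N_0$ such that
$$\sum_{n=N_0}^{\infty} |u_n(z)| < \varepsilon \quad \text{for all } z \in \Omega.$$
Thus, for $M > N \geq N_0$,
$$|P_M(z) - P_N(z)| = |P_N(z)|\left|\prod_{n=N+1}^{M}(1 + u_n(z)) - 1\right| \leq |P_N(z)|\,(e^{\varepsilon} - 1)$$
by the previous lemma. As the $P_N(z)$ are uniformly bounded by $C$, and $e^\varepsilon - 1 \leq 2\varepsilon$ for $0 < \varepsilon < 1$, we see
$$|P_M(z) - P_N(z)| \leq 2C\varepsilon,$$
so $(P_N)$ is uniformly Cauchy and converges uniformly to a function $f$.
Now if $z = z_0$ is a zero of $f$, then letting $M \to \infty$ above (with $N = N_0$), we get $|f(z_0) - P_N(z_0)| \leq |P_N(z_0)|(e^\varepsilon - 1)$, that is, $|P_N(z_0)| \leq |P_N(z_0)|(e^\varepsilon - 1)$. Since $e^\varepsilon - 1 < 1$ for small $\varepsilon$, this forces $P_N(z_0) = 0$. But $P_N(z)$ is a finite product, so one of its factors must vanish; that is, $u_n(z_0) = -1$ for some $n \leq N$. This completes the proof.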

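Example 4.4 The Riemann zeta function
$$\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s} = \prod_p \left(1 - \frac{1}{p^s}\right)^{-1},$$
where the product runs over the primes, converges absolutely for $\operatorname{Re}(s) > 1$; the equality of the series and the product is a consequence of unique factorization. We can apply our theorem (on each half-plane $\operatorname{Re}(s) \geq \sigma_0 > 1$, where $\sum_p 1/|p^s - 1|$ converges uniformly) to show $\zeta(s) \neq 0$ for $\operatorname{Re}(s) > 1$, since
$$\left(1 - \frac{1}{p^s}\right)^{-1} = 0 \iff \frac{p^s}{p^s - 1} = 0 \iff p^s = 0,$$
which cannot happen in the region $\operatorname{Re}(s) > 1$.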
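To see the Euler product of Example 4.4 at work, one can compare a partial sum of $\sum n^{-s}$ with a partial product over primes. In this sketch, `primes_up_to` is a plain sieve of Eratosthenes and the cutoffs are arbitrary; at $s = 2$ both truncations should be close to $\pi^2/6$, and at a complex point with $\operatorname{Re}(s) > 1$ they should be close to each other (and visibly nonzero).

```python
import math


def primes_up_to(x):
    """Primes p <= x by the sieve of Eratosthenes."""
    sieve = bytearray([1]) * (x + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(x ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, x + 1, p)))
    return [p for p in range(x + 1) if sieve[p]]


def zeta_partial_sum(s, N):
    """Sum of n^{-s} for 1 <= n <= N."""
    return sum(n ** -s for n in range(1, N + 1))


def zeta_partial_product(s, X):
    """Product of (1 - p^{-s})^{-1} over primes p <= X."""
    prod = 1
    for p in primes_up_to(X):
        prod *= 1 / (1 - p ** -s)
    return prod


print(zeta_partial_sum(2, 10 ** 5), zeta_partial_product(2, 10 ** 5), math.pi ** 2 / 6)
s = 2 + 3j
print(zeta_partial_sum(s, 10 ** 5), zeta_partial_product(s, 10 ** 5))
```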


Given a sequence of complex numbers $a_n$, a series of the form
$$\sum_{n=1}^{\infty} \frac{a_n}{n^s}$$
is called a Dirichlet series. Such series play an important role in number theory. In particular, the study of the distribution of prime numbers hinges on a study of such series (see Sects. 4.19 and 4.20).
Let us now set
$$P_n(z) = \sum_{k=1}^{n} \frac{z^k}{k}.$$
Define $E_0(z) = 1 - z$ and
$$E_n(z) = (1 - z)\exp(P_n(z)).$$
The only zero of $E_n(z)$ is at $z = 1$. Note that
$$P_n'(z) = \sum_{k=1}^{n} z^{k-1} = \frac{z^n - 1}{z - 1}.$$

Lemma 4.3 For $|z| \leq 1$,
$$|1 - E_n(z)| \leq |z|^{n+1}.$$

Proof This is trivially true for $n = 0$. For $n \geq 1$,
$$E_n'(z) = -\exp(P_n(z)) + (1 - z)\,\frac{z^n - 1}{z - 1}\exp(P_n(z)) = -z^n\exp(P_n(z)).$$
Therefore, $-E_n'(z)$ has a zero of order $n$ at $0$. Moreover, it has a power series expansion in $z$ with non-negative coefficients, because $P_n(z)$ has one and hence so does $\exp(P_n(z))$.
Now,
$$1 - E_n(z) = -\int_{[0,z]} E_n'(w)\,dw.$$
Thus, $1 - E_n(z)$ has a zero of order $n+1$ at $z = 0$, and its power series also has non-negative coefficients. Thus,
$$\varphi(z) = \frac{1 - E_n(z)}{z^{n+1}}$$
can be expressed as a power series
$$\varphi(z) = \sum_{m=0}^{\infty} a_m z^m$$
with $a_m \geq 0$. By the maximum modulus principle,
$$\max_{|z|\leq1} |\varphi(z)| = \max_{|z|=1} |\varphi(z)| \leq \sum_{m=0}^{\infty} a_m = \varphi(1) = 1,$$
and the result follows from this.
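Lemma 4.3 can be checked on a grid. The sketch below evaluates $|1 - E_n(z)|/|z|^{n+1}$ on a polar grid of the closed unit disk (the grid sizes are arbitrary). The proof above predicts that this ratio never exceeds 1 and equals 1 at $z = 1$, where $E_n(1) = 0$.

```python
import cmath
import math


def E(n, z):
    """Weierstrass elementary factor E_n(z) = (1 - z) exp(z + z^2/2 + ... + z^n/n)."""
    P = sum(z ** k / k for k in range(1, n + 1))
    return (1 - z) * cmath.exp(P)


def max_ratio(n, radial=40, angular=120):
    """Largest value of |1 - E_n(z)| / |z|^(n+1) over a polar grid in 0 < |z| <= 1."""
    worst = 0.0
    for i in range(1, radial + 1):
        r = i / radial
        for j in range(angular):
            z = r * cmath.exp(2j * math.pi * j / angular)
            worst = max(worst, abs(1 - E(n, z)) / r ** (n + 1))
    return worst


for n in range(6):
    print(n, max_ratio(n))
```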


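Theorem 4.29 Let $z_n$ be a sequence of nonzero complex numbers such that $|z_n| \to \infty$ as $n \to \infty$. If $p_n$ is a sequence of non-negative integers such that
$$\sum_{n=1}^{\infty}\left(\frac{r}{|z_n|}\right)^{1+p_n} < \infty \tag{4.5}$$
for all $r > 0$, then
$$P(z) = \prod_{n=1}^{\infty} E_{p_n}(z/z_n)$$
is an entire function with a zero at each $z_n$ and no other zeros in $\mathbb{C}$. Moreover, if $w_0$ occurs in the sequence $z_n$ exactly $m$ times, then $P(z)$ has a zero at $z = w_0$ of order exactly $m$.
Proof First, note that for any $r > 0$, $|z_n| > 2r$ for all $n$ sufficiently large. So the summation condition (4.5) holds for $p_n = n - 1$, since then $(r/|z_n|)^n < 2^{-n}$ for all large $n$; thus suitable $p_n$ always exist, and the condition is no real restriction. Now for $|z| < r$ and $n$ so large that $|z_n| > r$, we have by the previous lemma
$$|1 - E_{p_n}(z/z_n)| \leq \left|\frac{z}{z_n}\right|^{1+p_n} \leq \left(\frac{r}{|z_n|}\right)^{1+p_n}.$$
The convergence of the series (4.5) ensures, by Theorem 4.28, that the product
$$\prod_{n=1}^{\infty} E_{p_n}(z/z_n)$$
converges absolutely and uniformly on $|z| < r$ to $P(z)$, which is therefore holomorphic there; as $r$ is arbitrary, $P$ is entire. Moreover, $P(z) = 0$ if and only if $E_{p_n}(z/z_n) = 0$ for some $n$, which occurs if and only if $z = z_n$ for some $n$. The last assertion is easily derived by noting that $|z_n| \to \infty$, so that any value occurs only finitely many times among the $z_n$, and each factor $E_{p_n}(z/z_n)$ contributes a simple zero at $z_n$.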
We can use this to derive product expansions for entire functions.
Theorem 4.30 (Weierstrass factorization theorem). Let $f$ be entire and suppose $f(0) \neq 0$. Let $z_1, z_2, \ldots$ be the zeros of $f$, listed with multiplicity. Then there is an entire function $g$ and a sequence $p_1, p_2, \ldots$ of non-negative integers such that
$$f(z) = e^{g(z)}\prod_{n=1}^{\infty} E_{p_n}(z/z_n).$$
Remark 4.5 If $f$ has a zero of order $k$ at $z = 0$, we can apply the above theorem to $f(z)/z^k$. The above product expansion is not necessarily unique.
Proof Let
$$P(z) = \prod_{n=1}^{\infty} E_{p_n}(z/z_n),$$
with $p_n$ chosen as in Theorem 4.29 (for example, $p_n = n - 1$). Note that necessarily $|z_n| \to \infty$ if there are infinitely many zeros, since the zeros of an entire function that is not identically zero have no finite limit point. By the previous theorem, $P$ and $f$ have precisely the same zeros, with the same multiplicities. Since $f/P$ (after removing singularities) is entire and has no zeros, the next proposition tells us that $f(z)/P(z) = e^{g(z)}$ (absorbing the constant into $g$).
Proposition 4.10.1 If $f$ is entire and $f(z) \neq 0$ for all $z \in \mathbb{C}$, then there is an entire function $g$ such that $f(z) = Ce^{g(z)}$, with $C$ a constant.
Proof Since $f$ has no zeros, $f'/f$ is entire. Then there is an entire function $h$ such that $h' = f'/f$ (integrate the power series of $f'/f$ term by term).
Now let
$$F(z) = f(z)e^{-h(z)}.$$
Then,
$$F'(z) = f'(z)e^{-h(z)} - f(z)h'(z)e^{-h(z)} = e^{-h(z)}\{f'(z) - f(z)h'(z)\} = 0.$$
Therefore, $F(z) = C$ for some constant $C$, and $f(z) = Ce^{h(z)}$.


Example 4.5 Let us find a product expansion for the entire function $\sin z$. It has a zero of order 1 at $n\pi$, $n \in \mathbb{Z}$. By Theorem 4.29 (with $p_n = 1$), the product
$$f(z) = \prod_{n\neq0}\left(1 - \frac{z}{n\pi}\right)e^{z/n\pi}$$
is entire and has precisely the same zeros as $(\sin z)/z$.
Therefore,
$$\frac{\sin z}{z} = Ae^{g(z)}\prod_{n\neq0}\left(1 - \frac{z}{n\pi}\right)e^{z/n\pi}$$
for some constant $A$ and entire function $g(z)$.
Note that
$$\left(1 - \frac{z}{n\pi}\right)\left(1 + \frac{z}{n\pi}\right) = 1 - \frac{z^2}{n^2\pi^2}.$$
Thus (absorbing the constant $A$ into $g$),
$$\sin z = ze^{g(z)}\prod_{n=1}^{\infty}\left(1 - \frac{z^2}{n^2\pi^2}\right).$$
By logarithmic differentiation,
$$\frac{\cos z}{\sin z} = \frac1z + g'(z) + \sum_{n=1}^{\infty}\frac{2z}{z^2 - n^2\pi^2}.$$
In particular, for real $c \neq 0$,
$$\frac{\cos(ic/2)}{\sin(ic/2)} = \frac{2}{ic} + g'(ic/2) - \sum_{n=1}^{\infty}\frac{4ic}{c^2 + 4n^2\pi^2} = -\frac{2i}{c} + g'(ic/2) - \sum_{n=1}^{\infty}\frac{4ic}{c^2 + 4n^2\pi^2} = g'(ic/2) - \sum_{n=-\infty}^{\infty}\frac{2ic}{c^2 + 4n^2\pi^2}.$$
Recall that we proved, using the Poisson summation formula,
$$\frac{e^c + 1}{e^c - 1} = \sum_{n=-\infty}^{\infty}\frac{2c}{c^2 + 4\pi^2n^2},$$
so that
$$\frac{\cos(ic/2)}{\sin(ic/2)} = g'(ic/2) - i\cdot\frac{e^c + 1}{e^c - 1}.$$
On the other hand,
$$\frac{\cos(ic/2)}{\sin(ic/2)} = i\cdot\frac{e^{-c/2} + e^{c/2}}{e^{-c/2} - e^{c/2}} = -i\cdot\frac{e^c + 1}{e^c - 1},$$
so we deduce $g'(ic/2) = 0$ for all real $c \neq 0$. Thus, $g'$ vanishes on a segment of the imaginary axis and hence everywhere by analyticity; that is, $g$ is constant. So we can write
$$\sin z = Az\prod_{n=1}^{\infty}\left(1 - \frac{z^2}{n^2\pi^2}\right).$$

Now,
$$\lim_{z\to0}\frac{\sin z}{z} = 1,$$
so that $A = 1$. Replacing $z$ by $\pi z$, this proves:

Theorem 4.31
$$\sin \pi z = \pi z\prod_{n=1}^{\infty}\left(1 - \frac{z^2}{n^2}\right).$$
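A quick numerical look at Theorem 4.31: the sketch below truncates the product after $N$ factors. Since $\prod_{n>N}(1 - z^2/n^2) \approx e^{-z^2/N}$ for fixed $z$, the relative error is roughly $z^2/N$, so convergence is slow; the cutoff $N = 10^5$ and the tolerance in the checks are illustrative assumptions.

```python
import math


def sine_partial_product(z, N):
    """πz ∏_{n=1}^{N} (1 - z^2/n^2): Theorem 4.31 truncated after N factors."""
    prod = math.pi * z
    for n in range(1, N + 1):
        prod *= 1 - z * z / (n * n)
    return prod


for z in (0.3, 0.5, 1.7):
    approx = sine_partial_product(z, 10 ** 5)
    exact = math.sin(math.pi * z)
    print(z, approx, exact, abs(approx - exact))
```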

An interesting corollary is that
$$\sum_{n=1}^{\infty}\frac{1}{n^2} = \frac{\pi^2}{6}. \tag{4.6}$$
We leave the deduction of this from the theorem as an exercise to the reader.

Exercises

1. In some definitions of convergence of infinite products, we do not allow zero as a limit value for various reasons. (Sometimes, authors write that the product diverges to zero to emphasize this.) Suppose that the product
$$\prod_{n=1}^{\infty}(1 + u_n)$$
converges to a nonzero limit. Show that
$$\lim_{N\to\infty}\prod_{n=N}^{\infty}(1 + u_n) = 1.$$
2. For $z \in \mathbb{C}$, $z \neq 0$, write $z = |z|e^{i\theta}$ with $-\pi < \theta \leq \pi$. Define $\operatorname{Log} z$ as $\log|z| + i\theta$. If $b_n \neq 0$ for all $n \geq 1$, show that $\prod_{n=1}^\infty b_n$ converges to a nonzero limit if and only if $\sum_{n=1}^\infty \operatorname{Log} b_n$ converges.
3. Suppose that $c_n \neq -1$ for all $n \geq 1$. Show that $\prod_{n=1}^\infty (1 + c_n)$ converges absolutely if and only if $\sum_{n=1}^\infty c_n$ converges absolutely.
4. Suppose $a_n$ is a sequence of complex numbers with absolute value at most 1. Suppose further that $a_n$ is multiplicative, that is, $a_{mn} = a_m a_n$. Show that the Dirichlet series
$$\sum_{n=1}^{\infty}\frac{a_n}{n^s}$$
is analytic in the region $\operatorname{Re}(s) > 1$ and is nonvanishing there.
5. Prove (4.6) using Theorem 4.31.
6. Using Theorem 4.31, deduce that $\zeta(2k)$ is a nonzero rational multiple of $\pi^{2k}$.

4.11 The Logarithm

Let $X$ be a topological space. A curve in $X$ is a continuous map
$$\gamma : [0, 1] \to X.$$
A curve is said to be closed if $\gamma(0) = \gamma(1)$.
Two closed curves $\gamma_0, \gamma_1$ in $X$ are said to be homotopic in $X$ if there is a continuous map
$$H : [0, 1] \times [0, 1] \to X$$
such that $H(s, 0) = \gamma_0(s)$ and $H(s, 1) = \gamma_1(s)$ for all $s \in [0, 1]$, and $H(0, t) = H(1, t)$ for all $t \in [0, 1]$ (so that each intermediate curve $H(\cdot, t)$ is closed).
In this case, $H$ is said to be a homotopy from $\gamma_0$ to $\gamma_1$.
One should think of this as giving mathematical expression to the idea that we can deform $\gamma_0$ to $\gamma_1$ continuously.
Recall that a space $X$ is called path-connected if for any $a, b \in X$, there is a continuous map $\gamma : [0, 1] \to X$ such that $\gamma(0) = a$ and $\gamma(1) = b$.
Now suppose $X$ is path-connected. We say $X$ is simply connected if every closed path in $X$ is homotopic to a point; that is, it is homotopic to the trivial closed curve $\gamma(s) = x_0$ for all $s \in [0, 1]$ and some $x_0 \in X$.
For example, an annulus is not simply connected.
One can show that if $\gamma_0$ and $\gamma_1$ are homotopic closed paths in a domain $\Omega \subseteq \mathbb{C}$, then $\operatorname{Ind}_{\gamma_0}(\alpha) = \operatorname{Ind}_{\gamma_1}(\alpha)$ for all $\alpha \notin \Omega$.
The idea is that a homotopy $H$ from $\gamma_0$ to $\gamma_1$ induces a family of closed paths $\gamma_t = H(\cdot, t)$ that depend continuously on $t$. Thus, the index $\operatorname{Ind}_{\gamma_t}(\alpha)$ depends continuously on $t$. But as the index is integer-valued, it is independent of $t$. In particular, if $\Omega$ is simply connected, then $\operatorname{Ind}_\gamma(\alpha) = 0$ for any closed path $\gamma$ in $\Omega$ and any $\alpha \notin \Omega$.
Suppose $f \in H(\Omega)$, where $\Omega$ is a simply connected region. Fix $z_0 \in \Omega$ and define
$$g(z) = \int_{z_0}^{z} f(w)\,dw$$
for $z \in \Omega$, where $\int_{z_0}^{z}$ denotes the contour integral over a path in $\Omega$ from $z_0$ to $z$.
By our earlier comments on simply connected regions, Cauchy’s theorem (Theorem 4.5) tells us that the integral is path-independent. Thus, $g$ is well defined and holomorphic, with $g' = f$.
As a special case of this, let $\Omega$ be simply connected with $0 \notin \Omega$, and let $f(z) = 1/z$. Then $f \in H(\Omega)$. Fixing $z_0 \in \Omega$, take $w_0$ such that $e^{w_0} = z_0$. This we can do in several ways. For example, writing
$$z_0 = re^{i\theta}, \quad r > 0,$$
we can take $w_0 = \log r + i\theta$. Define
$$\log z = w_0 + \int_{z_0}^{z}\frac{dw}{w}$$
for $z \in \Omega$. Since there is freedom in choosing $w_0$ (any two choices differ by an integer multiple of $2\pi i$), this definition depends on the choice of $w_0$. Each choice of $w_0$ gives a different branch of the logarithm.
In each case, $\log z$ is a holomorphic function on $\Omega$ satisfying $e^{\log z} = z$.

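Example 4.6 Define
$$\log z = \int_1^z \frac{dw}{w}$$
for $z \in \mathbb{C} \setminus \{x \in \mathbb{R} : x \leq 0\}$. This function is called the principal branch of the logarithm. Writing $z = re^{i\theta}$ with $-\pi < \theta < \pi$, we see $\log z = \log r + i\theta$. In particular,
$$\log i = \frac{i\pi}{2} \quad \text{and} \quad \log(-i) = -\frac{i\pi}{2}.$$
Example 4.7 If instead we take
$$\Omega = \mathbb{C} \setminus \{x \in \mathbb{R} : x \geq 0\}$$
and choose the branch with $\log z = \log r + i\theta$ for $z = re^{i\theta}$, $0 < \theta < 2\pi$, then the corresponding values are $\log i = \dfrac{i\pi}{2}$ and $\log(-i) = \dfrac{3i\pi}{2}$.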
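The two branches in Examples 4.6 and 4.7 can be written in Python. Python’s `cmath.log` returns the principal value $\log|z| + i\,\texttt{cmath.phase}(z)$, which agrees with Example 4.6 off the negative real axis; for Example 4.7 the sketch below shifts the argument into $(0, 2\pi)$. The function names are hypothetical, chosen for illustration.

```python
import cmath
import math


def log_principal(z):
    """Example 4.6: log|z| + iθ with -π < θ < π (defined off the ray x <= 0)."""
    if z.imag == 0 and z.real <= 0:
        raise ValueError("z lies on the cut {x <= 0}")
    return cmath.log(z)  # Python's principal value


def log_cut_positive_axis(z):
    """Example 4.7: log|z| + iθ with 0 < θ < 2π (defined off the ray x >= 0)."""
    if z.imag == 0 and z.real >= 0:
        raise ValueError("z lies on the cut {x >= 0}")
    theta = cmath.phase(z)  # in [-π, π]
    if theta < 0:
        theta += 2 * math.pi
    return complex(math.log(abs(z)), theta)


for w in (1j, -1j):
    print(w, log_principal(w), log_cut_positive_axis(w))
```

Both functions satisfy $e^{\log z} = z$; they differ by $2\pi i$ in the lower half-plane.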

We can use the logarithm to deal with more complicated integrals such as
$$\int_0^\infty f(x)x^{a-1}\,dx,$$
because we can define $z^a$ as $e^{a\log z}$, provided we choose our domain properly.

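Theorem 4.32 Let $f(z)$ be analytic in $\mathbb{C}$ except for a finite number of poles $\alpha_i$, none of which lie on the positive real axis. Let $a \notin \mathbb{Z}$ and suppose
(1) there exists $b > a$ such that $|f(z)| \leq \dfrac{K_1}{|z|^b}$ for some constant $K_1$ as $|z| \to \infty$;
(2) there exists $b$ with $0 \leq b < a$ such that $|f(z)| \leq \dfrac{K_2}{|z|^b}$ as $|z| \to 0$.
Then,
$$\int_0^\infty f(x)x^{a-1}\,dx = -\frac{\pi e^{-\pi ia}}{\sin\pi a}\sum_i \operatorname{Res}_{z=\alpha_i}\left(f(z)z^{a-1}\right),$$
where the sum is over the poles of $f$. (Since $a \notin \mathbb{Z}$, $\sin\pi a \neq 0$.)

[Fig. 4.11 Contour $C$; arcs $C_R$ and $C_\varepsilon$, segments $L^+$ and $L^-$, angle $\varphi$, marked points $-R$, $-\varepsilon$, $\varepsilon$, $R$]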

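Proof Write $z^{a-1} = e^{(a-1)\log z}$, where $\log z$ is the branch of the logarithm of Example 4.7, with $0 < \arg z < 2\pi$, defined on the simply connected region
$$\mathbb{C} \setminus \{x \in \mathbb{R} : x \geq 0\}.$$
Let $C$ be the closed contour indicated in Fig. 4.11. By the residue theorem,
$$\frac{1}{2\pi i}\int_C f(z)z^{a-1}\,dz = \sum_{\alpha_i \text{ inside } C}\operatorname{Res}_{z=\alpha_i}\left(f(z)z^{a-1}\right).$$
The contour has four segments: $C_R$, $L^-$, $C_\varepsilon$ and $L^+$. By hypothesis (1), $\int_{C_R} f(z)z^{a-1}\,dz \to 0$ as $R \to \infty$, since the integrand is $O(R^{a-1-b})$ on an arc of length at most $2\pi R$. Also, hypothesis (2) implies that $\int_{C_\varepsilon} f(z)z^{a-1}\,dz \to 0$ as $\varepsilon \to 0$ for similar reasons.
Now, we are left with $L^+$ and $L^-$. We parametrize $L^+$:
$$z = re^{i\varphi}, \quad \varepsilon \leq r \leq R,$$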
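so that
$$\log z = \log r + i\varphi.$$
Then,
$$\int_{L^+} f(z)z^{a-1}\,dz = \int_\varepsilon^R f(re^{i\varphi})e^{(a-1)(\log r + i\varphi)}\cdot e^{i\varphi}\,dr = e^{ia\varphi}\int_\varepsilon^R f(re^{i\varphi})r^{a-1}\,dr.$$
The function $f$ is suitably bounded, and by the dominated convergence theorem
$$\lim_{\varphi\to0}\int_\varepsilon^R f(re^{i\varphi})r^{a-1}\,dr = \int_\varepsilon^R f(r)r^{a-1}\,dr.$$
On $L^-$, traversed from $R$ to $\varepsilon$, $z = re^{i(2\pi - \varphi)}$, so that
$$\log z = \log r + i(2\pi - \varphi)$$
and
$$\int_{L^-} f(z)z^{a-1}\,dz = \int_R^\varepsilon f(re^{i(2\pi-\varphi)})e^{(a-1)(\log r + i(2\pi-\varphi))}\cdot e^{i(2\pi-\varphi)}\,dr.$$
Now,
$$e^{(a-1)(\log r + i(2\pi-\varphi))}\cdot e^{i(2\pi-\varphi)} = r^{a-1}\cdot e^{2\pi i(a-1) - \varphi i(a-1)}\cdot e^{-\varphi i} = r^{a-1}e^{2\pi ia}\cdot e^{-\varphi ia},$$
so our integral is
$$\int_{L^-} f(z)z^{a-1}\,dz = -e^{2\pi ia}e^{-\varphi ia}\int_\varepsilon^R f(re^{-i\varphi})r^{a-1}\,dr,$$
so that letting $\varphi \to 0$ gives, by the dominated convergence theorem,
$$\int_{L^-} f(z)z^{a-1}\,dz \to -e^{2\pi ia}\int_\varepsilon^R f(r)r^{a-1}\,dr.$$
Therefore, in the limit $\varphi \to 0$,
$$\int_{L^+} f(z)z^{a-1}\,dz + \int_{L^-} f(z)z^{a-1}\,dz = (1 - e^{2\pi ia})\int_\varepsilon^R f(r)r^{a-1}\,dr.$$
As $\varepsilon \to 0$ and $R \to \infty$, this tends to
$$(1 - e^{2\pi ia})\int_0^\infty f(r)r^{a-1}\,dr.$$
Putting everything together, and using $1 - e^{2\pi ia} = -2ie^{\pi ia}\sin\pi a$, gives the result.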
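As a check of Theorem 4.32, take $f(z) = 1/(1+z)$ and $0 < a < 1$, so hypotheses (1) and (2) hold with $b = 1$ and $b = 0$ respectively. The only pole is $\alpha = -1$, where $\operatorname{Res}(f(z)z^{a-1}; -1) = (-1)^{a-1} = e^{\pi i(a-1)}$ in the branch $0 < \arg z < 2\pi$, and the theorem gives $\pi/\sin\pi a$. The sketch below (illustrative; the cutoff `L` and step `h` are arbitrary) compares this with a direct numerical integration after the substitution $x = e^t$, which makes the integrand decay exponentially in both directions.

```python
import cmath
import math


def branch_power(z, c):
    """z^c = exp(c log z) for the branch with 0 < arg z < 2π used in the proof."""
    theta = cmath.phase(z)
    if theta <= 0:
        theta += 2 * math.pi
    return cmath.exp(c * complex(math.log(abs(z)), theta))


def residue_side(a):
    """Right-hand side of Theorem 4.32 for f(z) = 1/(1+z), whose only pole is -1."""
    res = branch_power(-1 + 0j, a - 1)  # Res(f(z) z^{a-1}; -1) = (-1)^{a-1}
    return -math.pi * cmath.exp(-1j * math.pi * a) / math.sin(math.pi * a) * res


def integral_side(a, L=80.0, h=0.01):
    """∫_0^∞ x^{a-1}/(1+x) dx, computed as ∫_{-L}^{L} e^{at}/(1+e^t) dt (x = e^t)."""
    n = int(round(2 * L / h))
    total = 0.0
    for k in range(n + 1):
        t = -L + k * h
        weight = 0.5 if k in (0, n) else 1.0  # trapezoidal rule
        total += weight * math.exp(a * t) / (1 + math.exp(t))
    return total * h


for a in (0.25, 0.5, 0.75):
    print(a, residue_side(a), integral_side(a), math.pi / math.sin(math.pi * a))
```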

Exercises

1. Let $f$ be a meromorphic function and $A$ its set of poles. Suppose it has a finite number of poles, all of which lie in the upper half-plane (i.e., have positive imaginary part). Suppose there is a constant $K > 0$ such that for some fixed $\delta > 1$, we have
$$|f(z)| \leq K/|z|^\delta$$
for $|z|$ sufficiently large. Show that
$$\int_{-\infty}^{\infty} f(x)\,dx = 2\pi i\sum_{a\in A}\operatorname{Res}(f; a).$$
2. Show that
$$\int_{-\infty}^{\infty}\frac{\cos x}{x^2 + 1}\,dx = \frac{\pi}{e}.$$
3. Let $f(z)$ have a simple pole at $z = 0$. Let $C(\varepsilon)$ be the semicircular arc from $-\varepsilon$ to $\varepsilon$ of radius $\varepsilon > 0$ in the upper half-plane. Show that
$$\lim_{\varepsilon\to0}\int_{C(\varepsilon)} f(z)\,dz = -\pi i\operatorname{Res}(f; 0).$$
[Note: $C(\varepsilon)$ is not a closed path.]


4. Suppose $\alpha$ is a complex number, $|\alpha| \neq 1$. Compute
$$\int_0^{2\pi}\frac{d\theta}{1 - 2\alpha\cos\theta + \alpha^2}.$$

5. Let $C$ be the circle of radius 2 centered at the origin and oriented counterclockwise. Evaluate
$$\frac{1}{2\pi i}\int_C\frac{dz}{z^2(z-1)^3}.$$
4.12 The Phragmén–Lindelöf Theorem and Jensen’s Theorem

An entire function $f(z)$ is said to be of finite order if for some $\alpha \geq 0$, we have
$$f(z) = O\left(e^{|z|^\alpha}\right) \tag{4.7}$$
as $|z| \to \infty$. If $\alpha = 0$, then $f(z)$ is constant by Liouville’s theorem. The infimum of the numbers $\alpha$ such that the above estimate holds is called the order of $f(z)$. It is convenient to extend this notion as follows. Given a region $A$ and $f$ analytic on $A$, we say $f$ has finite order on $A$ if (4.7) is satisfied for some $\alpha$ and for all $z$ in $A$.
In 1908, E. Phragmén (1863–1937) and E. Lindelöf (1870–1946) discovered an extension of the maximum modulus principle to vertical strips for functions of finite order.

Theorem 4.33 (Phragmén–Lindelöf theorem). Let $f$ be holomorphic in the vertical strip $a \leq \operatorname{Re}(z) \leq b$ and of finite order there. If $f$ is bounded for $z$ with $\operatorname{Re}(z) = a$ and $\operatorname{Re}(z) = b$, then it is bounded in the whole vertical strip.

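Proof Let $B$ be a bound for $f$ on the two vertical lines. Writing $z = x + iy$ with $x, y$ real, we see that as $f$ has finite order in the region, there are constants $K$ and $c$ such that for all $y$, we have
$$|f(x + iy)| \leq Ke^{|y|^c}, \quad a \leq x \leq b.$$
Let $m$ be an integer $\equiv 2 \pmod 4$ with $m > c$. For $z$ in the vertical strip, $\arg z \to \pi/2$ as $y \to +\infty$ and $\arg z \to -\pi/2$ as $y \to -\infty$. Thus, there is a $T_1$ such that for $|y| > T_1$,
$$\left|\,|\arg z| - \frac{\pi}{2}\right| < \frac{\pi}{4m}.$$
Let $\varepsilon > 0$ and consider the function
$$g_\varepsilon(z) = e^{\varepsilon z^m}f(z).$$
With our choice of $m$, we have $\cos(m\arg z) \leq -1/\sqrt2$ for $|y| > T_1$, and it is now easily checked (see Exercise 1) that for $|y| \geq T_1$,
$$|g_\varepsilon(z)| \leq Ke^{|y|^c}e^{-\varepsilon|z|^m/\sqrt2}, \tag{4.8}$$
so that $g_\varepsilon(z) \to 0$ as $|y| \to \infty$, since $m > c$ and $|z| \geq |y|$. Let $T_2 \geq T_1$ be chosen so that $|g_\varepsilon(z)| \leq B$ for $|\operatorname{Im}(z)| \geq T_2$. Thus,
$$|f(z)| \leq B\left|e^{-\varepsilon z^m}\right| = Be^{-\varepsilon|z|^m\cos(m\arg z)} \leq Be^{\varepsilon|z|^m}$$
for $|\operatorname{Im}(z)| \geq T_2$. Moreover, since $m \equiv 2 \pmod 4$, we have $\operatorname{Re}(z^m) = -y^m + O(|y|^{m-2} + 1)$ on the strip, so there is a constant $D \geq 0$, depending only on $a$, $b$ and $m$, with $\operatorname{Re}(z^m) \leq D$ throughout the strip. Hence $|g_\varepsilon(z)| \leq Be^{\varepsilon D}$ on the boundary of the bounded region
$$a \leq \operatorname{Re}(z) \leq b, \quad 0 \leq |\operatorname{Im}(z)| \leq T_2.$$
Applying the maximum modulus principle to $g_\varepsilon$ on this region, we find $|f(z)| \leq Be^{\varepsilon D}e^{-\varepsilon\operatorname{Re}(z^m)} \leq Be^{\varepsilon(D + |z|^m)}$ there, and hence throughout the strip. Since this is valid for all $\varepsilon > 0$, we can let $\varepsilon \to 0$ to deduce $|f(z)| \leq B$, the desired result.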


In the 1890s, Hadamard developed the theory of entire functions of finite order.
He showed that, very much like polynomials, they can be factored into an infinite
product over the zeros of f (z).
In the next section, we will derive this factorization theorem of Hadamard for
entire functions of order 1. In this section, we will derive Jensen’s theorem, which will be used in proving Hadamard’s factorization theorem.
Let f (z) be an entire function of finite order β. Jensen’s theorem relates β to the
distribution of the zeros of f (z).
Theorem 4.34 An entire function $f(z)$ of finite order $\beta$ without any zeros must be of the form $f(z) = e^{g(z)}$, where $g(z)$ is a polynomial and $\beta = \deg g$.
Proof Let $h(z) = \log f(z) - \log f(0)$. Then, $h(z)$ is entire, since $f(z)$ has no zeros. Also, for any $\epsilon > 0$ and $|z| = R$,
$$\operatorname{Re} h(z) = \log|f(z)| - \log|f(0)| \ll R^{\beta+\epsilon}.$$

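Writing
$$h(z) = \sum_{n=0}^{\infty} (a_n + i b_n) z^n$$
with $a_n, b_n \in \mathbb{R}$, we see that for $z = Re^{i\theta}$,
$$\operatorname{Re}(h(z)) = \sum_{n=0}^{\infty} a_n R^n \cos n\theta - \sum_{n=0}^{\infty} b_n R^n \sin n\theta.$$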

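By Fourier analysis, we get, for $n \ge 1$,
$$|a_n| R^n,\ |b_n| R^n \le \frac{1}{\pi} \int_0^{2\pi} \left| \operatorname{Re} h(Re^{i\theta}) \right| d\theta.$$
Since $h(0) = 0$, we have $a_0 = 0$, and therefore,
$$\int_0^{2\pi} \operatorname{Re} h(Re^{i\theta})\, d\theta = 0.$$
Observe that for $x \in \mathbb{R}$, we have
$$|x| + x = \begin{cases} 2x & \text{if } x \ge 0, \\ 0 & \text{if } x < 0. \end{cases}$$
Hence,
258 4 Complex Analysis
$$|a_n| R^n,\ |b_n| R^n \le \frac{1}{\pi} \int_0^{2\pi} \left( |\operatorname{Re} h(Re^{i\theta})| + \operatorname{Re} h(Re^{i\theta}) \right) d\theta \ll R^{\beta+\epsilon}.$$
Letting $R \to \infty$ yields $a_n = b_n = 0$ if $n > \beta$. □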
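The Fourier step in the proof uses that, for $n \ge 1$, $a_n R^n = \frac{1}{\pi}\int_0^{2\pi} \operatorname{Re} h(Re^{i\theta}) \cos n\theta\, d\theta$ and $b_n R^n = -\frac{1}{\pi}\int_0^{2\pi} \operatorname{Re} h(Re^{i\theta}) \sin n\theta\, d\theta$, so the real part of $h$ on a single circle determines every Taylor coefficient except the constant term. The sketch below is our own illustration: the sample polynomial, the radius and the number of nodes `N` are arbitrary, and the helper name is hypothetical. It recovers the coefficients of a polynomial with $h(0) = 0$ from samples of $\operatorname{Re} h$ alone, using the trapezoid rule, which is exact here because the integrands are trigonometric polynomials of low degree.

```python
import cmath
import math

def coefficients_from_real_part(h, R, n_max, N=256):
    """Recover c_n = a_n + i b_n (1 <= n <= n_max) of h(z) = sum c_n z^n, h(0) = 0,
    using only samples of Re h on the circle |z| = R."""
    thetas = [2 * math.pi * k / N for k in range(N)]
    re_h = [h(R * cmath.exp(1j * t)).real for t in thetas]
    coeffs = []
    for n in range(1, n_max + 1):
        # (1/pi) * integral over [0, 2*pi] is approximated by (2/N) * sum (trapezoid rule)
        a = 2 / N * sum(v * math.cos(n * t) for v, t in zip(re_h, thetas))
        b = -2 / N * sum(v * math.sin(n * t) for v, t in zip(re_h, thetas))
        coeffs.append(complex(a, b) / R**n)
    return coeffs

def h(z):
    # Hypothetical example with h(0) = 0: c_1 = 1 + 2i, c_3 = -0.5 + 0.25i.
    return (1 + 2j) * z + (-0.5 + 0.25j) * z**3

print(coefficients_from_real_part(h, R=2.0, n_max=4))
```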

Notice that in this theorem the same result holds if the estimate
$$|f(z)| \ll e^{R_i^{\beta+\epsilon}}$$
holds for $|z| = R_i$, where $R_i$ is a sequence tending to infinity. This observation will be used later in our proof of the Hadamard factorization theorem.

Theorem 4.35 (Jensen’s theorem). Let $f(z)$ be an entire function of order $\beta$ such that $f(0) \ne 0$. If $z_1, z_2, \ldots, z_n$ are the zeros of $f(z)$ in $|z| < R$, counted with multiplicity, then
$$\frac{1}{2\pi} \int_0^{2\pi} \log|f(Re^{i\theta})|\, d\theta = \log|f(0)| + \log \frac{R^n}{|z_1| \cdots |z_n|}.$$

Proof We may assume, without loss of generality, that $f(0) = 1$. Also, it is clear that if the theorem is true for functions $g$ and $h$, then it is also true for the product $gh$. Thus, it suffices to prove it for functions with either no zero or one zero in $|z| < R$.
If $f$ has no zeros in $|z| < R$, the right-hand side is zero. The left-hand side is the real part of
$$\frac{1}{2\pi i} \int_{|z|=R} \log f(z)\, \frac{dz}{z},$$
which by Cauchy’s integral formula equals $\log f(0) = 0$. Taking real parts gives the desired result.
If $f$ has one zero $z = z_1$ in $|z| < R$, we consider the contour $|z| = R$ taken in the counterclockwise direction and cut it from $z_1$ to the boundary. We deform the contour so that we go around $z_1$ in a clockwise direction along a circle of radius $\epsilon$ (say). Then, by Cauchy’s theorem with $g(z) = \log f(z)$,
$$0 = \frac{1}{2\pi i} \int_C g(z)\, \frac{dz}{z},$$
where $C$ is the contour described above.


Since the argument of $f(z)$ changes by $-2\pi$ as $z$ goes once clockwise around the zero $z = z_1$, letting $\epsilon \to 0$ we deduce
$$\frac{1}{2\pi} \int_0^{2\pi} \log|f(Re^{i\theta})|\, d\theta = \log \frac{R}{|z_1|},$$
as desired. This completes the proof. □



An alternative proof of Jensen’s theorem can be given that avoids cutting the plane. One considers
$$f(z) = \frac{R(z - z_1)}{R^2 - \bar{z}_1 z}.$$
Then, $f(z)$ is regular for $|z| \le R$. Moreover, $|f(z)| = 1$ on $|z| = R$, and $|f(0)| = |z_1|/R$, as a simple calculation shows. Jensen’s theorem is easily verified for this choice of $f$. But any holomorphic function on $|z| \le R$ can be written as the product of a function with no zeros in $|z| \le R$ and functions of the form
$$\frac{R(z - z_i)}{R^2 - \bar{z}_i z}.$$
Now Jensen’s theorem easily follows.
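Jensen’s formula is easy to test numerically. In the sketch below, the function $f(z) = e^z (z - \tfrac12)(z - 2i)$ is an arbitrary choice with $|f(0)| = 1$, so the right-hand side reduces to $\sum \log(R/|z_k|)$ over the zeros inside the circle. The circle average is computed with the trapezoid rule, and the helper name `jensen_sides` is our own. The factor $e^z$ contributes nothing, since $\log|e^{Re^{i\theta}}| = R\cos\theta$ averages to zero.

```python
import cmath
import math

def jensen_sides(f, zeros_inside, R, N=4096):
    """Left- and right-hand sides of Jensen's formula on |z| = R.
    zeros_inside lists the zeros of f in |z| < R with multiplicity; f(0) != 0."""
    # Mean of log|f| over the circle, by the trapezoid rule in theta.
    lhs = sum(math.log(abs(f(R * cmath.exp(2j * math.pi * k / N))))
              for k in range(N)) / N
    rhs = math.log(abs(f(0))) + sum(math.log(R / abs(z)) for z in zeros_inside)
    return lhs, rhs

def f(z):
    # Hypothetical entire function of order 1 with |f(0)| = 1 and zeros 0.5 and 2i.
    return cmath.exp(z) * (z - 0.5) * (z - 2j)

print(jensen_sides(f, [0.5], R=1.0))        # only 0.5 lies in |z| < 1
print(jensen_sides(f, [0.5, 2j], R=3.0))    # both zeros lie in |z| < 3
```

With $R = 1$ both sides equal $\log 2$; with $R = 3$ both zeros are inside and both sides equal $\log 9$.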

Theorem 4.36 Let $f$ be as in Theorem 4.35. Then,
$$\log \frac{R^n}{|z_1| \cdots |z_n|} \le \max_{|z|=R} \log|f(z)| - \log|f(0)|.$$
Proof This is clear from Jensen’s theorem. □

Exercises

1. Prove inequality (4.8).
2. Suppose $f(z)$ is analytic in $a \le \operatorname{Re}(z) \le b$ and of finite order there. Suppose that there is a constant $A \ge 0$ such that $|f(z)| = O(|y|^A)$, where $y = \operatorname{Im}(z)$, on the two vertical lines $\operatorname{Re}(z) = a$ and $\operatorname{Re}(z) = b$. Show that the same estimate holds for points in the interior. (This generalizes the Phragmén–Lindelöf theorem.)
3. Let $f$ be a function of finite order and define $n_f(r) := n(r)$ to be the number of zeros of $f$ in $|z| \le r$. Show that
$$\int_0^R \frac{n(r)\, dr}{r} \le \max_{|z|=R} \log|f(z)| - \log|f(0)|,$$
with $f$ as in Jensen’s theorem.
4. If $f(z)$ is of order $\beta$, show that $n_f(r) = O(r^{\beta+\epsilon})$ for any $\epsilon > 0$.
5. Let $f(z)$ be an entire function of order $\beta$. Show that
$$\sum_{n=1}^{\infty} |z_n|^{-\beta-\epsilon}$$
converges for any $\epsilon > 0$ (here, we have indexed the zeros $z_i$ so that $|z_1| \le |z_2| \le \cdots$).

4.13 Entire Functions of Order 1

We will now derive a factorization theorem for entire functions of order 1. A similar
result holds for entire functions of higher order.

Theorem 4.37 Let $f(z)$ be an entire function of order 1 with zeros $z_1, z_2, \ldots$ arranged so that $|z_1| \le |z_2| \le \cdots$ and repeated with appropriate multiplicity. Then, $f$ can be written as
$$f(z) = e^{A+Bz} \prod_{n=1}^{\infty} \left(1 - \frac{z}{z_n}\right) e^{z/z_n},$$
where $A$ and $B$ are constants.

Proof The product
$$P(z) = \prod_{n=1}^{\infty} \left(1 - \frac{z}{z_n}\right) e^{z/z_n}$$
converges absolutely for all $z$, since
$$(1 - z)e^z = 1 - \frac{z^2}{2} + \cdots$$
and by Exercise 5 in the previous section. Thus, $P(z)$ represents an entire function.
If we write
$$f(z) = P(z)F(z),$$
then $F(z)$ is an entire function without zeros. If $F$ were of finite order, we could conclude by Theorem 4.34 that $F(z) = e^{g(z)}$, where $g(z)$ is a polynomial. By the remark after Theorem 4.34, it suffices to show that
$$|F(z)| \ll e^{R_i^{1+\epsilon}} \quad \text{on } |z| = R_i$$
to deduce that $F(z) = e^{g(z)}$ where $g(z)$ is of the form $A + Bz$ for certain constants $A$ and $B$.
To this end, we will choose $R_i$ satisfying
$$\bigl| R_i - |z_n| \bigr| > |z_n|^{-2}$$
for all $n$. This can be done, since the total measure of the intervals $(|z_n| - |z_n|^{-2}, |z_n| + |z_n|^{-2})$ is bounded by
$$2 \sum_{n=1}^{\infty} |z_n|^{-2} < \infty,$$
since $f(z)$ has order 1.



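We write
$$P(z) = P_1(z) P_2(z) P_3(z),$$
where in $P_1$, $|z_n| < \tfrac12 R_i$; in $P_2$, $\tfrac12 R_i \le |z_n| \le 2R_i$; and in $P_3$, $|z_n| > 2R_i$. For the factors of $P_1$, we have for $|z| = R_i$,
$$\left| \left(1 - \frac{z}{z_n}\right) e^{z/z_n} \right| \ge \left( \left|\frac{z}{z_n}\right| - 1 \right) e^{-|z|/|z_n|} > e^{-R_i/|z_n|}.$$
Since
$$\sum_{|z_n| < \frac12 R} |z_n|^{-1} \le \left(\tfrac12 R\right)^{\epsilon} \sum_{n=1}^{\infty} |z_n|^{-1-\epsilon},$$
we get
$$|P_1(z)| > \exp(-R_i^{1+\epsilon}).$$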

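For $P_2(z)$,
$$\left| \left(1 - \frac{z}{z_n}\right) e^{z/z_n} \right| \ge e^{-2}\, \frac{|z - z_n|}{2R_i} \gg R_i^{-3}$$
by the way we have chosen $R_i$. Since $n(R_i) = O(R_i^{1+\epsilon})$, we get
$$|P_2(z)| \gg \left(R_i^{-3}\right)^{R_i^{1+\epsilon}} \ge \exp(-c_1 R_i^{1+2\epsilon}).$$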

Finally, for $P_3(z)$, we have $|z/z_n| < 1/2$ so that
$$\left| \left(1 - \frac{z}{z_n}\right) e^{z/z_n} \right| \ge e^{-c_2 R_i^2/|z_n|^2}$$
and
$$\sum_{|z_n| > 2R} |z_n|^{-2} < (2R)^{-1+\epsilon} \sum_{n=1}^{\infty} |z_n|^{-1-\epsilon}.$$
Thus, on $|z| = R_i$ we have
$$|P(z)| > \exp(-R_i^{1+3\epsilon}),$$
so that
$$|F(z)| < \exp(R_i^{1+4\epsilon}).$$
Hence, $F(z) = e^{g(z)}$, where $g(z)$ is a polynomial of degree at most 1. This completes the proof. □
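The absolute convergence of $P(z)$ comes from the factor $E(u) = (1-u)e^u$ being $1 + O(|u|^2)$. Expanding, $1 - E(u) = \sum_{k \ge 2} (k-1)u^k/k!$, and since $\sum_{k \ge 2} (k-1)/k! = 1$, one gets $|1 - E(u)| \le |u|^2$ for $|u| \le 1$. With $u = z/z_n$, convergence then follows from $\sum |z_n|^{-2} < \infty$. The grid check below is only a sanity test of that bound (the grid is an arbitrary choice). The largest ratio, 1, is attained at $u = 1$, where $E(1) = 0$.

```python
import cmath
import math

def weierstrass_factor(u):
    """E(u) = (1 - u) e^u; P(z) is the product of E(z / z_n)."""
    return (1 - u) * cmath.exp(u)

# Largest ratio |1 - E(u)| / |u|^2 over a polar grid in 0 < |u| <= 1.
worst = 0.0
for i in range(1, 101):          # radii 0.01, 0.02, ..., 1.0
    r = i / 100
    for k in range(360):         # angles in 1-degree steps
        u = r * cmath.exp(1j * math.radians(k))
        worst = max(worst, abs(1 - weierstrass_factor(u)) / abs(u) ** 2)
print(worst)
```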

There is an additive analogue of Hadamard’s factorization theorem called the Mittag-Leffler theorem. A special case is given in the exercises below (see Exercise 4). The general version answers the question of whether a meromorphic function exists having a given principal part at each point of a sequence $a_n$.

Theorem 4.38 (Mittag-Leffler theorem) Let $a_n$ be a sequence of distinct complex numbers such that $|a_n| \to \infty$. Let $P_n$ be a sequence of polynomials without a constant term. Then, there exists a meromorphic function $f$ whose only poles are at the $a_n$, with principal part $P_n(1/(z - a_n))$ at $a_n$. The most general function of this kind can be written in the form
$$f(z) = \sum_{n=1}^{\infty} \left[ P_n\left(\frac{1}{z - a_n}\right) - Q_n(z) \right] + \varphi(z),$$
where $Q_n$ is a suitable polynomial and $\varphi$ is entire. The series converges absolutely and uniformly on any compact set not containing the poles.

Proof Without loss of generality, we may suppose that none of the $a_n$ are equal to zero, since we can always add the principal part at zero to our function, if necessary. As the function
$$P_n\left(\frac{1}{z - a_n}\right)$$
is analytic in $|z| < |a_n|$, in particular at $z = 0$, we can write down its power series expansion about $z = 0$. Let $Q_n(z)$ be the sum of the terms of degree less than $d_n$ (with $d_n$ being a natural number to be chosen) so that
$$\left| P_n\left(\frac{1}{z - a_n}\right) - Q_n(z) \right| \le C_n \left| \frac{z}{a_n} \right|^{d_n},$$
valid for $|z| \le |a_n|/2$ (say), where $C_n$ is a suitable positive constant greater than 1. We choose $d_n$ inductively so that $d_1 < d_2 < \cdots$ and
$$d_n > \max([\log C_n] + 1, d_1, d_2, \ldots, d_{n-1}).$$
Thus,
$$\frac{C_n^{1/d_n}}{|a_n|} \to 0,$$

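because $C_n^{1/d_n} < e$ (as $d_n > \log C_n$) and $|a_n| \to \infty$. We claim the series
$$\sum_{n=1}^{\infty} \left[ P_n\left(\frac{1}{z - a_n}\right) - Q_n(z) \right]$$
converges absolutely and uniformly for $z$ in any compact set not containing the $a_n$’s. Indeed, for any given $R$, let $N$ be such that $|a_n| \ge R$ for all $n > N$, and split the series as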

$$\sum_{n=1}^{N} \left[ P_n\left(\frac{1}{z - a_n}\right) - Q_n(z) \right] + \sum_{n=N+1}^{\infty} \left[ P_n\left(\frac{1}{z - a_n}\right) - Q_n(z) \right].$$
The first part is a finite sum. If $|z| \le R/2$, the second part is bounded by
$$\sum_{n=N+1}^{\infty} \frac{C_n}{|a_n|^{d_n}}\, |z|^{d_n},$$
which has a radius of convergence equal to infinity by the root test, since $C_n^{1/d_n}/|a_n| \to 0$. The finite sum has the desired polar part for $|z| \le R$. As this is true for every $R$, this completes the proof of the theorem. □
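To see where a constant like $C_n$ comes from, take the simplest principal part $P_n(w) = w$, i.e. a simple pole with residue 1 at $a_n$. Then $1/(z - a_n) = -\sum_{k \ge 0} z^k/a_n^{k+1}$ for $|z| < |a_n|$, $Q_n$ is the partial sum with $k < d_n$, and the remainder is exactly $(z/a_n)^{d_n}/(z - a_n)$. Since $|z - a_n| \ge |a_n|/2$ when $|z| \le |a_n|/2$, the remainder is at most $(2/|a_n|)\,|z/a_n|^{d_n}$. The sketch below checks this for an arbitrary pole `a` and degree `d`; both values and the helper name are hypothetical.

```python
def ml_remainder(z, a, d):
    """1/(z - a) minus Q(z), its Taylor polynomial about z = 0 of degree < d."""
    Q = -sum(z**k / a ** (k + 1) for k in range(d))   # 1/(z - a) = -sum_k z^k / a^(k+1)
    return 1 / (z - a) - Q

a, d = 5 + 5j, 4                          # illustrative pole and truncation degree
for z in (0.3, 1 - 2j, 2.5j):             # each satisfies |z| <= |a|/2
    r = ml_remainder(z, a, d)
    exact = (z / a) ** d / (z - a)        # closed form of the remainder
    bound = 2 / abs(a) * abs(z / a) ** d  # uses |z - a| >= |a|/2
    print(z, abs(r - exact), abs(r) <= bound)
```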

Exercises

1. Show that for $|z| < 1$,
$$\prod_{n=0}^{\infty} \left(1 + z^{2^n}\right) = \frac{1}{1-z}.$$
2. Prove that
$$\cos \pi z = \prod_{n=1}^{\infty} \left(1 - \frac{4z^2}{(2n-1)^2}\right).$$
3. Find the infinite product factorization of $\sinh z$ and $\cosh z$.
4. Let $f(z)$ be meromorphic with simple poles at $a_1, a_2, \ldots$, where $0 < |a_1| < |a_2| < \cdots$. Let $b_n$ be the residue of $f(z)$ at $z = a_n$. Suppose that $f$ is uniformly bounded on a sequence of circles centered at the origin whose radii approach infinity. Prove that
$$f(z) = f(0) + \sum_{n=1}^{\infty} \left( \frac{b_n}{z - a_n} + \frac{b_n}{a_n} \right).$$
5. Using the previous exercise, show that
$$\cot z = \frac{1}{z} + \sum_{n \ne 0} \left( \frac{1}{z - n\pi} + \frac{1}{n\pi} \right),$$
where the sum is over all integers $n \ne 0$.
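Exercises 2 and 5 can be sanity-checked numerically before they are proved. Since $\frac{1}{z - n\pi} + \frac{1}{n\pi} = \frac{z}{n\pi(z - n\pi)} = O(n^{-2})$, the cotangent series converges absolutely, and the sketch below simply sums it over $0 < |n| \le N$; the cosine product is truncated after $N$ factors. The point $z = 0.7$ and $N = 10^5$ are arbitrary choices, and both truncation errors are of order $1/N$.

```python
import math

def cot_partial(z, N):
    """1/z + sum over 0 < |n| <= N of (1/(z - n*pi) + 1/(n*pi))."""
    s = 1 / z
    for n in range(1, N + 1):
        for m in (n, -n):
            s += 1 / (z - m * math.pi) + 1 / (m * math.pi)
    return s

def cos_partial(z, N):
    """prod_{n=1}^{N} (1 - 4 z^2 / (2n - 1)^2), a truncation of Exercise 2."""
    p = 1.0
    for n in range(1, N + 1):
        p *= 1 - 4 * z * z / (2 * n - 1) ** 2
    return p

z, N = 0.7, 10**5
print(cot_partial(z, N), 1 / math.tan(z))
print(cos_partial(z, N), math.cos(math.pi * z))
```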



4.14 The Gamma Function

The Γ function is defined as follows.



$$\Gamma(z) = \int_0^{\infty} e^{-t} t^{z-1}\, dt, \qquad \operatorname{Re}(z) > 0.$$
It is easily verified that the integral represents an analytic function in the half plane $\operatorname{Re}(z) > 0$. Integration by parts shows
$$\Gamma(z+1) = z\Gamma(z). \tag{4.9}$$

For natural numbers $n$, we see that $\Gamma(n+1) = n!$, and so $\Gamma(z)$ is a complex analytic interpolation of the factorials. This functional equation allows us to derive an analytic continuation of $\Gamma(z)$ as a meromorphic function to the entire complex plane. Indeed, the above functional equation (4.9) can be rewritten as
$$\Gamma(z) = \frac{\Gamma(z+1)}{z},$$
and the right-hand side is analytic for $\operatorname{Re}(z) > -1$, $z \ne 0$, with a simple pole at $z = 0$. Thus, by this inductive procedure, it is easy to see that $\Gamma(z)$ extends to a meromorphic function in the entire complex plane with simple poles at $z = 0, -1, -2, \ldots$.
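The continuation argument is effectively an algorithm: apply $\Gamma(z) = \Gamma(z+1)/z$ repeatedly until the argument lands in $\operatorname{Re}(z) > 0$, where the integral definition applies. The sketch below does this for real arguments, calling Python’s `math.gamma` only at positive arguments. Because `math.gamma` itself also accepts negative non-integers, it doubles as an independent check. The helper name `gamma_continued` is hypothetical.

```python
import math

def gamma_continued(x):
    """Gamma(x) for real x other than 0, -1, -2, ..., obtained from a value at a
    positive argument via Gamma(x) = Gamma(x + k) / (x (x + 1) ... (x + k - 1))."""
    denom = 1.0
    while x <= 0:
        if x == int(x):
            raise ValueError("Gamma has a pole at non-positive integers")
        denom *= x        # accumulate x (x + 1) ... (x + k - 1)
        x += 1
    return math.gamma(x) / denom

print(gamma_continued(-2.5), math.gamma(-2.5))
print(gamma_continued(6), math.factorial(5))   # Gamma(n + 1) = n!
```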
We will prove that 1/Γ (z) is an entire function of order 1 and derive its Hadamard
factorization. To this end, we first derive a second functional equation satisfied by
the Γ function. The following result has the appearance of being a consequence of
Theorem 4.32 but cannot be deduced from that theorem.
Theorem 4.39 We have
$$\int_0^{\infty} \frac{v^{x-1}\, dv}{1+v} = \frac{\pi}{\sin \pi x}$$
for $0 < x < 1$.


Proof Consider the integral
$$\int_C \frac{z^{x-1}\, dz}{1+z},$$
where $C$ is the contour taken along the real axis from $\epsilon$ to $R$, then in the positive direction along the circle $c_1$ of radius $R$ centered at the origin, then back along the real axis to $z = \epsilon$, and finally around the circle $c_2$ of radius $\epsilon$ centered at the origin, taken in the negative direction. Here $z^{x-1}$ is defined using $0 \le \arg z \le 2\pi$.
The function
$$\frac{z^{x-1}}{1+z}$$
is regular except at $z = -1$, where it has a simple pole with residue


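$$e^{\pi i(x-1)}.$$
We will take $\epsilon < 1 < R$ so that integrating the function along the contour indicated above shows, by Cauchy’s theorem,
$$\int_\epsilon^R \frac{u^{x-1}\, du}{1+u} + \int_{c_1} \frac{z^{x-1}\, dz}{1+z} + \int_R^\epsilon \frac{(ue^{2\pi i})^{x-1}\, du}{1+u} + \int_{c_2} \frac{z^{x-1}\, dz}{1+z} = 2\pi i\, e^{\pi i(x-1)}.$$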

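The two integrals along the real axis together give
$$\left(1 - e^{2\pi i(x-1)}\right) \int_\epsilon^R \frac{u^{x-1}\, du}{1+u} = -2i e^{\pi i x} (\sin \pi x) \int_\epsilon^R \frac{u^{x-1}\, du}{1+u}.$$
The other two integrals tend to 0 as $R \to \infty$ and $\epsilon \to 0$, because on $c_1$,
$$\left| \frac{z^{x-1}}{1+z} \right| \le \frac{R^{x-1}}{R-1},$$
so that
$$\left| \int_{c_1} \frac{z^{x-1}\, dz}{1+z} \right| \le 2\pi R\, \frac{R^{x-1}}{R-1} = \frac{2\pi R^x}{R-1},$$
which tends to 0 as $R \to \infty$, since $x < 1$.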


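Similarly,
$$\left| \int_{c_2} \frac{z^{x-1}\, dz}{1+z} \right| \le \frac{2\pi \epsilon^x}{1-\epsilon},$$
which tends to 0 as $\epsilon \to 0$, since $x > 0$. Therefore,
$$-2i e^{\pi i x} (\sin \pi x) \int_0^{\infty} \frac{u^{x-1}\, du}{1+u} = -2\pi i\, e^{\pi i x},$$
which gives
$$\int_0^{\infty} \frac{u^{x-1}\, du}{1+u} = \frac{\pi}{\sin \pi x}.$$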
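Theorem 4.39 can also be checked by quadrature. After the substitution $v = e^t$ (a choice made for this sketch, not a step in the text), the integral becomes $\int_{-\infty}^{\infty} e^{xt}/(1+e^t)\, dt$, whose integrand decays exponentially in both directions, so the plain trapezoid rule on a long interval is very accurate. The truncation `T`, the step `h` and the helper name are illustrative assumptions.

```python
import math

def mellin_integral(x, T=400.0, h=0.05):
    """Trapezoid-rule approximation of int_0^inf v^(x-1)/(1+v) dv for 0 < x < 1,
    computed as int_{-T}^{T} e^(x t)/(1 + e^t) dt after substituting v = e^t."""
    n = int(round(2 * T / h))
    total = 0.0
    for k in range(n + 1):
        t = -T + k * h
        if t > 0:   # same integrand, rewritten so the exponentials stay small
            val = math.exp((x - 1) * t) / (1 + math.exp(-t))
        else:
            val = math.exp(x * t) / (1 + math.exp(t))
        total += val / 2 if k in (0, n) else val
    return h * total

for x in (0.25, 0.5, 0.9):
    print(x, mellin_integral(x), math.pi / math.sin(math.pi * x))
```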

Theorem 4.40 We have
$$\Gamma(x)\Gamma(y) = 2\Gamma(x+y) \int_0^{\pi/2} (\cos\theta)^{2x-1} (\sin\theta)^{2y-1}\, d\theta$$

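for $x, y > 0$.
Proof For $x, y > 0$, we have
$$\Gamma(x)\Gamma(y) = \left( \int_0^{\infty} e^{-t} t^{x-1}\, dt \right) \left( \int_0^{\infty} e^{-u} u^{y-1}\, du \right).$$
Putting $u = tv$ and inverting the order of integration, we obtain
$$\begin{aligned} \Gamma(x)\Gamma(y) &= \int_0^{\infty} e^{-t} t^{x-1}\, dt \int_0^{\infty} t^y v^y e^{-tv}\, \frac{dv}{v} \\ &= \int_0^{\infty} v^{y-1}\, dv \int_0^{\infty} e^{-t(1+v)} t^{x+y-1}\, dt \\ &= \Gamma(x+y) \int_0^{\infty} \frac{v^{y-1}\, dv}{(1+v)^{x+y}}. \end{aligned}$$
The interchanging of integrals is easily justified by Fubini’s theorem. This last integral is
$$2 \int_0^{\pi/2} (\cos\theta)^{2x-1} (\sin\theta)^{2y-1}\, d\theta,$$
where we have put $v = \tan^2\theta$. □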


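Making the substitution $\lambda = \cos^2\theta$ transforms $2\int_0^{\pi/2} (\cos\theta)^{2x-1} (\sin\theta)^{2y-1}\, d\theta$, the integral of Theorem 4.40 together with its factor 2, into
$$\int_0^1 \lambda^{x-1} (1-\lambda)^{y-1}\, d\lambda,$$
which is called the beta function $B(x, y)$.
In particular, for $0 < x < 1$, we obtain
$$\Gamma(x)\Gamma(1-x) = \int_0^1 \lambda^{x-1} (1-\lambda)^{-x}\, d\lambda.$$
Putting
$$v = \frac{\lambda}{1-\lambda}$$
in the integral gives
$$\Gamma(x)\Gamma(1-x) = \int_0^{\infty} \frac{v^{x-1}\, dv}{1+v},$$
which by Theorem 4.39 is
$$\frac{\pi}{\sin \pi x}.$$
These observations prove the following theorem.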

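Theorem 4.41 We have
$$\Gamma(x)\Gamma(y) = \Gamma(x+y) \int_0^1 \lambda^{x-1} (1-\lambda)^{y-1}\, d\lambda$$
for $x, y > 0$, and
$$\Gamma(x)\Gamma(1-x) = \frac{\pi}{\sin \pi x}$$
for $0 < x < 1$.
By the principle of analytic continuation, we deduce that for all complex $z$ that are not integers, we have
$$\Gamma(z)\Gamma(1-z) = \frac{\pi}{\sin \pi z}.$$
The theorem also provides an analytic continuation of the beta function, via $B(x, y) = \Gamma(x)\Gamma(y)/\Gamma(x+y)$.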
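Both identities of Theorem 4.41 can be compared with Python’s `math.gamma`. In the sketch below, the trigonometric integral of Theorem 4.40 is approximated by the midpoint rule. For $x = 5/2$, $y = 3/2$ it equals $2\int_0^{\pi/2} \cos^4\theta \sin^2\theta\, d\theta = \pi/16$, and the midpoint rule is exact for this trigonometric polynomial up to rounding. The test points and the helper name `beta_trig` are arbitrary choices.

```python
import math

def beta_trig(x, y, n=2000):
    """2 * int_0^{pi/2} cos(t)^(2x-1) sin(t)^(2y-1) dt, by the midpoint rule."""
    h = (math.pi / 2) / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        total += math.cos(t) ** (2 * x - 1) * math.sin(t) ** (2 * y - 1)
    return 2 * h * total

x, y = 2.5, 1.5
print(beta_trig(x, y), math.gamma(x) * math.gamma(y) / math.gamma(x + y), math.pi / 16)

for z in (0.3, -1.7, 2.2):     # non-integers, where the reflection formula applies
    print(z, math.gamma(z) * math.gamma(1 - z), math.pi / math.sin(math.pi * z))
```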

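Theorem 4.42 $1/\Gamma(z)$ is an entire function with simple zeros at $z = 0, -1, -2, \ldots$.

Proof From the functional equation
$$\Gamma(z)\Gamma(1-z) = \frac{\pi}{\sin \pi z},$$
we see that $\Gamma(z)\Gamma(1-z)$ is regular except when $z$ is an integer, in which case it has a simple pole. Since the right-hand side never vanishes, $\Gamma(z)$ has no zeros, so $1/\Gamma(z)$ is regular wherever $\Gamma(z)$ is, in particular in $\operatorname{Re}(z) > 0$. Moreover,
$$1/\Gamma(z) = \Gamma(1-z)(\sin \pi z)/\pi.$$
If $\operatorname{Re}(z) \le 0$, then $\operatorname{Re}(1-z) \ge 1$, so $\Gamma(1-z)$ is regular and nonzero there, and the right-hand side of the above equation is regular, with simple zeros exactly at the zeros $z = 0, -1, -2, \ldots$ of $\sin \pi z$. (The simple poles of $\Gamma(1-z)$ at $z = 1, 2, 3, \ldots$ lie in $\operatorname{Re}(z) > 0$, where they are cancelled by the zeros of $\sin \pi z$.) This completes the proof. □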

Exercises

1. Prove that
$$\Gamma\left(\tfrac12\right) = \sqrt{\pi}.$$
2. (Legendre’s duplication formula) Show that
$$\Gamma(2x)\Gamma\left(\tfrac12\right) = 2^{2x-1} \Gamma(x) \Gamma\left(x + \tfrac12\right)$$
for $x > 0$.

3. Let $c$ be a positive constant. Show that as $x \to \infty$,
$$\Gamma(x+c) \sim x^c \Gamma(x).$$

4.15 Stirling’s Formula

The student may be familiar with the classical Stirling’s formula
$$n! \sim \left(\frac{n}{e}\right)^n \sqrt{2\pi n},$$
as $n$ tends to infinity. As the $\Gamma$ function interpolates the factorials, it is natural to ask whether this asymptotic formula extends to the complex domain. This is the goal of this section. The classical Stirling’s formula will be a corollary of our discussion.
Theorem 4.43 (Stirling’s formula) We have
$$\Gamma(x) \sim e^{-x} x^{x-1/2} \sqrt{2\pi}$$
as $x \to \infty$.
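Proof By partial summation, we know that for a natural number $n$,
$$\log\Gamma(n) = \log(n-1)! = \left(n - \tfrac12\right)\log n - n + c_1 + o(1)$$
as $n \to \infty$ (with $c_1$ an absolute constant). If $x$ is not an integer, let us write $x = n + c$ for some $0 < c < 1$. By Exercise 3 of the previous section, we have
$$\Gamma(n+c) \sim n^c \Gamma(n),$$
so that
$$\log\Gamma(x) = \log\Gamma(n) + c\log n + o(1) = \left(x - \tfrac12\right)\log n - n + c_1 + o(1).$$
Also,
$$\log\frac{n+c}{n} = \log\left(1 + \frac{c}{n}\right) = \frac{c}{n} + O\left(\frac{1}{n^2}\right),$$
so that
$$\log x = \log n + \frac{c}{n} + O\left(\frac{1}{x^2}\right).$$
Inserting this observation above (the resulting term $-(x - \tfrac12)\frac{c}{n} = -c + o(1)$ cancels against $-n = -x + c$) gives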

$$\log\Gamma(x) = \left(x - \tfrac12\right)\log x - x + c_1 + o(1).$$
We can use the duplication formula to evaluate $c_1$. Indeed, on the one hand we have from above
$$\log\Gamma(2x) = \left(2x - \tfrac12\right)\log 2x - 2x + c_1 + o(1).$$
On the other hand, by the duplication formula (Exercise 2 of the last section) we have
$$\log\Gamma(2x) = (2x-1)\log 2 + \log\Gamma(x) + \log\Gamma\left(x + \tfrac12\right) - \tfrac12\log\pi,$$
which is equal to
$$\left(2x - \tfrac12\right)\log 2x - 2x - \tfrac12\log 2 + 2c_1 - \tfrac12\log\pi + o(1),$$
so that
$$c_1 = 2c_1 - \tfrac12\log\pi - \frac{\log 2}{2}.$$
Thus, as required,
$$c_1 = \log\sqrt{2\pi}.$$
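A quick numerical look at Theorem 4.43 with $c_1 = \log\sqrt{2\pi}$: the sketch below compares $\log\Gamma(x)$, computed with Python’s `math.lgamma`, against $(x - \tfrac12)\log x - x + \log\sqrt{2\pi}$. The classical refinement $\Gamma(x) = e^{-x}x^{x-1/2}\sqrt{2\pi}\,(1 + \tfrac{1}{12x} + O(x^{-2}))$, a standard fact not proved in this section, explains the printed deviations from 1. The sample points and the helper name are arbitrary.

```python
import math

def stirling_ratio(x):
    """Gamma(x) / (e^(-x) x^(x - 1/2) sqrt(2 pi)), computed with logarithms to avoid overflow."""
    log_approx = (x - 0.5) * math.log(x) - x + 0.5 * math.log(2 * math.pi)
    return math.exp(math.lgamma(x) - log_approx)

for x in (5.5, 50.5, 500.5, 5000.5):
    print(x, stirling_ratio(x), 1 + 1 / (12 * x))
```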

Theorem 4.44 For some constant $K$, we have, for $\operatorname{Re}(z) > 0$,
$$\frac{\Gamma'(z)}{\Gamma(z)} = \int_0^1 \left(1 - (1-t)^{z-1}\right) \frac{dt}{t} - K.$$

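Proof By Theorem 4.41, we have
$$\frac{\Gamma(z-h)\Gamma(h)}{\Gamma(z)} = \int_0^1 t^{h-1} (1-t)^{z-h-1}\, dt = \frac{1}{h} + \int_0^1 \left( (1-t)^{z-h-1} - 1 \right) t^{h-1}\, dt.$$
Writing $\Gamma(h) = \frac{1}{h} - K + O(h)$ for a constant $K$, the Taylor expansion of the left-hand side with respect to $h$ is
$$\frac{1}{\Gamma(z)} \left( \Gamma(z) - \Gamma'(z)h + \cdots \right) \left( \frac{1}{h} - K + \cdots \right) = \frac{1}{h} - \frac{\Gamma'(z)}{\Gamma(z)} - K + O(h).$$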

The Taylor expansion of the right-hand side is
$$\frac{1}{h} + \int_0^1 \left( (1-t)^{z-1} - 1 \right) \frac{dt}{t} + O(h),$$
so that by equating the constant terms we get the desired result.

Theorem 4.45 For $z$ not equal to $0$ or a negative integer,
$$\frac{\Gamma'(z)}{\Gamma(z)} = \sum_{n=0}^{\infty} \left( \frac{1}{n+1} - \frac{1}{n+z} \right) - K$$
for some constant $K$.

Proof First, for $z > 1$, we use Theorem 4.44 and expand
$$\frac{dt}{t} = \sum_{n=0}^{\infty} (1-t)^n\, dt$$
in the integrand and integrate term by term to obtain the result. The step is valid for $z > 1$, and by analytic continuation for all $z$ other than $0$ and the negative integers. This completes the proof. □
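Setting $z = 1$ in Theorem 4.45 makes every term of the series vanish, so $K = -\Gamma'(1)/\Gamma(1)$. It is a standard fact that $\Gamma'(1) = -\gamma$, so $K$ is Euler’s constant $\gamma$. The sketch below compares a truncated series with a central difference of Python’s `math.lgamma`. The truncation length and the step `h` are arbitrary, and the neglected tail of the series is roughly $(z-1)/\text{terms}$.

```python
import math

EULER_GAMMA = 0.5772156649015329   # Euler's constant, to double precision

def digamma_series(z, terms=10**6):
    """Truncation of sum_{n>=0} (1/(n+1) - 1/(n+z)) - K with K = Euler's constant."""
    return sum(1 / (n + 1) - 1 / (n + z) for n in range(terms)) - EULER_GAMMA

def digamma_numeric(z, h=1e-5):
    """Independent estimate of Gamma'(z)/Gamma(z): central difference of log Gamma."""
    return (math.lgamma(z + h) - math.lgamma(z - h)) / (2 * h)

for z in (0.5, 3.7):
    print(z, digamma_series(z), digamma_numeric(z))
```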

We are now ready to derive the Hadamard factorization of 1/Γ (z).

Theorem 4.46 The Hadamard factorization of $1/\Gamma(z)$ is given by
$$\frac{1}{\Gamma(z)} = e^{\gamma z} z \prod_{n=1}^{\infty} \left(1 + \frac{z}{n}\right) e^{-z/n},$$
where $\gamma$ denotes Euler’s constant.

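Proof We integrate the formula
$$\frac{\Gamma'(z)}{\Gamma(z)} = \sum_{n=0}^{\infty} \left( \frac{1}{n+1} - \frac{1}{n+z} \right) - K$$
from $1$ to $z$ and take exponentials, to obtain
$$\frac{1}{\Gamma(z)} = e^{Bz} z \prod_{n=1}^{\infty} \left(1 + \frac{z}{n}\right) e^{-z/n}$$
for some constant $B$. Putting $z = 1$ and taking logarithms gives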



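$$\begin{aligned} 0 &= B + \sum_{n=1}^{\infty} \left( \log\left(1 + \frac1n\right) - \frac1n \right) \\ &= B + \lim_{N\to\infty} \sum_{n=1}^{N} \left( \log\left(1 + \frac1n\right) - \frac1n \right) \\ &= B - \gamma. \end{aligned}$$
Hence $B = \gamma$. □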
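Theorem 4.46 can be compared with `math.gamma` directly. Truncating the product after $N$ factors leaves a relative error of about $z^2/(2N)$, since $\log\bigl((1 + z/n)e^{-z/n}\bigr) = -z^2/(2n^2) + O(n^{-3})$. The sketch below uses $N = 10^5$, an arbitrary choice, and the helper name is hypothetical. At $z = -2$ the factor with $n = 2$ is exactly zero, matching the zero of $1/\Gamma$ there (Theorem 4.42).

```python
import math

EULER_GAMMA = 0.5772156649015329

def reciprocal_gamma_product(z, N=10**5):
    """e^(gamma z) * z * prod_{n=1}^{N} (1 + z/n) e^(-z/n): a truncation of Theorem 4.46."""
    p = math.exp(EULER_GAMMA * z) * z
    for n in range(1, N + 1):
        p *= (1 + z / n) * math.exp(-z / n)
    return p

for z in (0.5, 2.5, -1.5):
    print(z, reciprocal_gamma_product(z), 1 / math.gamma(z))
print(reciprocal_gamma_product(-2.0))   # the factor with n = 2 vanishes
```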

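Exercises

1. Show that
$$\log\Gamma(z) = \left(z - \tfrac12\right)\log z - z + \tfrac12\log 2\pi + \int_0^{\infty} \frac{[u] - u + \frac12}{u+z}\, du.$$
2. For any $\delta > 0$, show that
$$\log\Gamma(z) = \left(z - \tfrac12\right)\log z - z + \tfrac12\log 2\pi + O\left(\frac{1}{|z|}\right)$$
uniformly for $-\pi + \delta \le \arg z \le \pi - \delta$.
3. If $\sigma$ is fixed and $|t| \to \infty$, show that
$$|\Gamma(\sigma + it)| \sim e^{-\frac12\pi|t|}\, |t|^{\sigma - \frac12} \sqrt{2\pi}.$$
4. Show that $1/\Gamma(z)$ is of order 1.
5. Show that
$$\frac{\Gamma'(z)}{\Gamma(z)} = \log z + O\left(\frac{1}{|z|}\right)$$
for $|z| \to \infty$ in the sector $-\pi + \delta < \arg z < \pi - \delta$, for any fixed $\delta > 0$.
6. Prove that $\Gamma(s)$ has poles only at $s = 0, -1, -2, \ldots$, and that these are simple, with
$$\operatorname{Res}_{s=-k} \Gamma(s) = (-1)^k/k!.$$
7. Show that
$$e^{-1/x} = \frac{1}{2\pi i} \int_{(\sigma)} x^s \Gamma(s)\, ds$$
for any $\sigma > 1$ and $x \ge 1$.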




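8. Let $f(s) = \sum_{n=1}^{\infty} a_n/n^s$ be an absolutely convergent Dirichlet series in the half plane $\operatorname{Re}(s) > 1$. Show that
$$\sum_{n=1}^{\infty} a_n e^{-n/x} = \frac{1}{2\pi i} \int_{(\sigma)} f(s) x^s \Gamma(s)\, ds$$
for any $\sigma > 1$.
9. With $\zeta(s)$ denoting the Riemann zeta function
$$\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s}, \qquad \operatorname{Re}(s) > 1,$$
prove that for $\operatorname{Re}(s) > 1$,
$$\pi^{-s/2}\Gamma\left(\frac{s}{2}\right)\zeta(s) = \int_0^{\infty} \left( \sum_{n=1}^{\infty} e^{-n^2\pi x} \right) x^{s/2}\, \frac{dx}{x}.$$
10. Use formula 3.6 (with $a = 0$) to show that
$$\Xi(s) := \pi^{-s/2}\Gamma\left(\frac{s}{2}\right)\zeta(s) = \frac{1}{s(s-1)} + \int_1^{\infty} w(x) \left( x^{s/2} + x^{\frac{1-s}{2}} \right) \frac{dx}{x},$$
where
$$w(x) = \sum_{n=1}^{\infty} e^{-n^2\pi x}.$$
Show that the integral converges absolutely for all $s \in \mathbb{C}$. This gives the analytic continuation and functional equation
$$\Xi(s) = \Xi(1-s)$$
of the Riemann zeta function. The proof is due to Riemann.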

4.16 The Wiener–Ikehara Tauberian Theorem

In this section, we present an important application of complex analysis to the theory of Dirichlet series. This application, known as the Wiener–Ikehara Tauberian theorem, is motivated by the prime number theorem, which is the statement that the number of primes $p \le x$, denoted $\pi(x)$, satisfies the asymptotic formula
$$\pi(x) \sim \frac{x}{\log x}$$

as $x$ tends to infinity. This theorem was proved independently in 1896 by J. Hadamard and C. de la Vallée Poussin. A critical ingredient in both proofs is the nonvanishing of the Riemann zeta function on the line $\operatorname{Re}(s) = 1$. Later, through Tauberian theory, Wiener showed that in fact the prime number theorem is equivalent to the assertion $\zeta(1 + it) \ne 0$ for all $t \in \mathbb{R}$. Thus, the general feeling in the 1940s was that the prime number theorem is essentially a theorem in complex analysis, so it was a surprise when an “elementary” proof of the prime number theorem, which avoided complex analysis, was discovered by A. Selberg (1917–2007) and P. Erdős (1913–1996) in 1948.
We motivate the notion of Tauberian theorems with a brief discussion of Abel’s theorem, which the reader may have already seen in an undergraduate course in analysis.
Let $\sum_{n=0}^{\infty} a_n x^n$, $x \in \mathbb{R}$, be a power series centered at 0 having radius of convergence 1. At the boundary of the region of convergence, i.e., at $|x| = 1$, the series may converge or diverge. In Chap. 1, we have already seen examples of this phenomenon. Abel’s theorem states that if the series converges at a boundary point, then it is reasonably well behaved in the sense that it is continuous at that point. More precisely, if
$$\sum_{n=0}^{\infty} a_n = A, \tag{4.10}$$
then
$$\lim_{x\to 1^-} \sum_{n=0}^{\infty} a_n x^n = A. \tag{4.11}$$

Broadly speaking, Tauberian theorems are conditional converses of Abel’s theorem. More precisely, any theorem in which conditions are imposed on the $a_n$’s to ensure the converse of Abel’s theorem holds is called a Tauberian theorem. They derive their name from a theorem of A. Tauber (1866–1942) [2] published in 1897, which states that if (4.11) is satisfied and we have the growth condition $a_n = o(1/n)$ on the coefficients of the power series, then (4.10) holds. These growth conditions were subsequently relaxed, most notably by Hardy and Littlewood and by K. Ananda Rau, who showed in 1928 that if $a_n = O(1/n)$, then (4.10) holds.
Some of the most interesting applications of Tauberian theorems pertain to analytic
number theory. In this context, Tauberian results can be thought of as estimates for
the partial sums of coefficients of certain Dirichlet series. An important result of this
type is the Wiener–Ikehara theorem. Introduced by Ikehara [3] in 1931, it generalizes
a theorem of E. Landau (1877–1938) [4], by applying a Tauberian result obtained
by Wiener. Proofs of this and other Tauberian theorems in the literature are usually
found to be quite involved.
A well-known application of the Wiener–Ikehara theorem is to the derivation of
the prime number theorem. In 1980, Newman [5] gave an ingenious short proof
of the prime number theorem. We modify Newman’s proof to derive the Wiener–
Ikehara Tauberian theorem. That this can be done was also recognized by Korevaar
[6]. However, our presentation is simplified and our theorem is more general. We
derive as a consequence an assortment of prime number theorems following the
arrangement of Serre [7].

Exercises

1. Let $\Lambda(n) = \log p$ if $n = p^a$ for some prime $p$ and some $a \ge 1$, and zero otherwise. Show that the prime number theorem is equivalent to
$$\psi(x) := \sum_{n \le x} \Lambda(n) \sim x,$$
as $x \to \infty$.
2. Prove that
$$\sum_{p \le x} \log p = \psi(x) + O(x^{1/2}\log x),$$
where the summation is over primes $p$ less than or equal to $x$.
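Exercises 1 and 2 concern $\psi(x) = \sum_{n \le x} \Lambda(n)$ and $\theta(x) = \sum_{p \le x} \log p$. The sketch below computes both with a sieve of Eratosthenes (our implementation choice; the helper name is hypothetical) so that one can watch $\psi(x)/x$ approach 1. The difference $\psi(x) - \theta(x)$ collects the prime powers $p^k \le x$ with $k \ge 2$, which is why it is much smaller than $\sqrt{x}\log x$ in practice.

```python
import math

def chebyshev_psi_theta(x):
    """Return (psi(x), theta(x)) for an integer x >= 1, using a sieve up to x."""
    is_prime = bytearray([1]) * (x + 1)
    is_prime[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(x) + 1):
        if is_prime[p]:
            is_prime[p * p :: p] = bytearray(len(range(p * p, x + 1, p)))
    psi = theta = 0.0
    for p in range(2, x + 1):
        if is_prime[p]:
            lp = math.log(p)
            theta += lp
            pk = p
            while pk <= x:          # every prime power p^k <= x contributes log p to psi
                psi += lp
                pk *= p
    return psi, theta

x = 10**5
psi, theta = chebyshev_psi_theta(x)
print(psi / x, psi - theta, math.sqrt(x) * math.log(x))
```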

4.17 The Analytic Theorem

The following analytic theorem of Newman [5] is the key result that will be used
to prove the Tauberian theorem. The proof is an application of Cauchy’s residue
theorem. Newman’s novel idea was the insertion of a new kernel into the relevant
integral, playing a role similar to that of the Fejér kernel in standard proofs of the
Tauberian theorem.
Theorem 4.47 For $t \ge 0$, let $f(t)$ be a bounded and locally integrable function and let $g(s) := \int_0^{\infty} f(t)e^{-st}\, dt$ for $\operatorname{Re}(s) > 0$. If $g(s)$ has an analytic continuation to $\operatorname{Re}(s) \ge 0$, then $\int_0^{\infty} f(t)\, dt$ exists and equals $g(0)$.
Proof For $T > 0$, let $g_T(s) = \int_0^T f(t)e^{-st}\, dt$. This integral converges for all values of $s$, and it is easy to see that $g_T(s)$ is an entire function. We need to show that
$$\lim_{T\to\infty} g_T(0) = g(0).$$

We will denote $\operatorname{Re}(s)$ by $\sigma$. Fix $R > 0$ and consider the positively oriented contour $C$ shown in Fig. 4.12. Here, $\delta > 0$ (depending on $R$) is chosen small enough so that $g(s)$ is analytic on and inside $C$. Indeed, as $g(s)$ is analytic on the line $\sigma = 0$, one can cover the vertical segment from $(0, R)$ to $(0, -R)$ with open balls, on each of which $g(s)$ is analytic. Compactness of this segment allows one to obtain a finite subcover, which then gives the desired $\delta$.

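Fig. 4.12 Contour C (figure not reproduced: it shows the part C− of the contour to the left of σ = 0 and the part C+ to the right, with the points (−R, 0), (−δ, 0), (0, 0) and (R, 0) marked)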

We use the following notation:
$$C_+ = C \cap \{s : \sigma > 0\}, \qquad C_- = C \cap \{s : \sigma < 0\}.$$
We also denote the semicircle of radius $R$ to the left of the line $\sigma = 0$ by $C_-'$. We will use the big O notation, treating everything other than the variables $T$, $R$ and $\sigma$ as constants.
Cauchy’s theorem gives us
$$I_C := \frac{1}{2\pi i} \int_C (g(s) - g_T(s))\, e^{sT} \left(1 + \frac{s^2}{R^2}\right) \frac{ds}{s} = g(0) - g_T(0), \tag{4.12}$$
as the integrand is analytic inside $C$ except for a simple pole at $s = 0$. We denote the corresponding integrals over $C_+$ and $C_-$ as $I_{C_+}$ and $I_{C_-}$, respectively. Let $M = \sup_{t \ge 0} |f(t)|$. On $C_+$, as $\sigma > 0$, we have
$$|g(s) - g_T(s)| = \left| \int_T^{\infty} f(t) e^{-st}\, dt \right| \le M \int_T^{\infty} e^{-\sigma t}\, dt = \frac{M e^{-\sigma T}}{\sigma}.$$

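Using $s = Re^{i\theta}$ and $R\cos\theta = \sigma$ on $C_+$, we obtain the following estimate for the kernel:
$$\left| e^{sT} \left(1 + \frac{s^2}{R^2}\right) \frac{1}{s} \right| = e^{\sigma T} \left| \frac{1}{Re^{i\theta}} + \frac{e^{i\theta}}{R} \right| = e^{\sigma T} \left| \frac{2\cos\theta}{R} \right| \ll e^{\sigma T} \frac{|\sigma|}{R^2}. \tag{4.13}$$
Thus, the contribution to (4.12) from the path $C_+$ of length $\pi R$ is
$$|I_{C_+}| \ll \frac{1}{R^2} \int_{C_+} |ds| \ll \frac{1}{R}.$$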

On $C_-$, we examine $g_T(s)$ and $g(s)$ separately. Consider first the integral
$$I_1 := \frac{1}{2\pi i} \int_{C_-} g_T(s)\, e^{sT} \left(1 + \frac{s^2}{R^2}\right) \frac{ds}{s}.$$
As $g_T(s)$ is entire and the rest of the integrand is analytic to the left of $\sigma = 0$, we have by Cauchy’s theorem
$$I_1 = \frac{1}{2\pi i} \int_{C_-'} g_T(s)\, e^{sT} \left(1 + \frac{s^2}{R^2}\right) \frac{ds}{s}.$$
That is, we can integrate over the semicircle $C_-'$ instead of $C_-$. Then, noting that $\sigma < 0$ in this case, we have
$$|g_T(s)| = \left| \int_0^T f(t) e^{-st}\, dt \right| \le M \int_0^T e^{-\sigma t}\, dt \le \frac{M e^{-\sigma T}}{|\sigma|},$$
and the estimate (4.13) holds on $C_-'$ exactly as it did on $C_+$. We obtain $|I_1| \ll 1/R$ in the same way as done for $|I_{C_+}|$ above. This leaves us with the integral
$$I_2 := \frac{1}{2\pi i} \int_{C_-} g(s)\, e^{sT} \left(1 + \frac{s^2}{R^2}\right) \frac{ds}{s}.$$

As $C_-$ is contained in a compact set on which $g(s)$ is analytic, $|g(s)|$ can be bounded in terms of $R$ on $C_-$. As the estimate (4.13) holds on the arcs of $C_-$, the integrand in this region is of the order of
$$|\sigma| e^{\sigma T} \quad \text{as } T \to \infty,$$
with the implicit constant depending on $R$. Recalling that $\sigma < 0$ in this region, we can write $|\sigma| e^{\sigma T} = T^{-1} \cdot x e^{-x}$ with $x = |\sigma| T$, and the real-valued function $x e^{-x}$ attains a global maximum of $e^{-1}$ on $x \ge 0$ (as can be checked by standard derivative tests). Thus,
$$|\sigma| e^{\sigma T} \le e^{-1}/T,$$
giving a bound of $O_R(1/T)$ for the integrand over the arcs of $C_-$. As the length of the arcs is again a function of $R$ which gets absorbed into the implied constant, we see that the contribution to $|I_2|$ from the arcs of $C_-$ is $O_R(1/T)$ as $T \to \infty$. On the vertical segment of $C_-$, as $\sigma = -\delta$, we have
$$|e^{sT}| = e^{-\delta T}.$$
The rest of the integrand of $I_2$ is analytic in this region and hence absolutely bounded in terms of $R$. The contribution to $|I_2|$ from this segment is thus $O_R(e^{-\delta T})$. Putting everything together, we have obtained, as $T \to \infty$,
everything together, we have obtained, as T → ∞,

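$$|g(0) - g_T(0)| = |I_C| \le |I_{C_+}| + |I_1| + |I_2| \le O\left(\frac{1}{R}\right) + O_R\left(\frac{1}{T}\right) + O_R(e^{-\delta T}).$$
Letting $T \to \infty$ with $R$ fixed gives $\limsup_{T\to\infty} |g(0) - g_T(0)| = O(1/R)$. As $R$ is arbitrary, the right-hand side can be made as small as needed. This completes the proof. □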
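The hypothesis that $g$ continues analytically to the whole closed half-plane $\operatorname{Re}(s) \ge 0$, and not merely to a neighbourhood of $s = 0$, cannot be dropped. A standard example (our addition, not from the text):
$$f(t) = \cos t, \qquad g(s) = \int_0^{\infty} \cos t\; e^{-st}\, dt = \frac{s}{s^2 + 1} \qquad (\operatorname{Re}(s) > 0).$$
Here $g$ is analytic at $s = 0$ with $g(0) = 0$, but it has poles at $s = \pm i$ on the line $\operatorname{Re}(s) = 0$, and indeed $\int_0^T \cos t\, dt = \sin T$ has no limit as $T \to \infty$.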

4.18 The Proof of the Tauberian Theorem

We establish the following version of the Tauberian theorem, applicable in many settings.

Theorem 4.48 Let
$$G(s) = \sum_{n=1}^{\infty} b_n/n^s$$
be a Dirichlet series with non-negative coefficients, satisfying
(a) $G(s)$ is absolutely convergent for $\operatorname{Re}(s) > 1$.
(b) The function $G(s)$ extends meromorphically to the region $\operatorname{Re}(s) \ge 1$, having no poles except possibly a simple pole at $s = 1$ with residue $R$.
(c) $B(x) := \sum_{n \le x} b_n = O(x)$.
Then, as $x \to \infty$,
$$B(x) = Rx + o(x).$$

We begin by making some elementary observations. For any $\epsilon > 0$,
$$\sum_{n \le x} b_n \le \sum_{n=1}^{\infty} b_n \left(\frac{x}{n}\right)^{1+\epsilon}.$$
The right-hand side is $x^{1+\epsilon} G(1+\epsilon)$, which is of the order of $x^{1+\epsilon}/\epsilon$ since $G(1+\epsilon) \ll 1/\epsilon$. Choosing $\epsilon = (\log x)^{-1}$, so that $x^{1+\epsilon} = ex$, gives $B(x) \ll x\log x$. Note that this estimate does not use any information about the behavior of $G(s)$ on $\operatorname{Re}(s) = 1$, except at $s = 1$.
Normally, (c) is not needed in the general Wiener–Ikehara Tauberian theorem. One can deduce it from the other assumptions, as indicated in the concluding remarks. However, in practically all applications, this condition is found to be readily available, and we retain it for the sake of a shorter proof.
A natural starting point for this and indeed most proofs of the Tauberian theorem is what is known as Abel’s trick: for $\operatorname{Re}(s) > 1$, we have
$$G(s) = s \int_1^{\infty} \frac{B(x)}{x^{s+1}}\, dx. \tag{4.14}$$

This can be derived using partial summation, as is done in Exercise 2.1.5 of [8]. We
proceed to prove the above Theorem 4.48.
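Identity (4.14) is summation by parts in disguise, and for a finitely supported sequence $(b_n)$ it can be checked exactly: $B(x)$ is constant on each interval $[n, n+1)$, and $s\int_n^{n+1} x^{-s-1}\, dx = n^{-s} - (n+1)^{-s}$. The sketch below compares both sides for random non-negative coefficients; the coefficients, their number (50), the values of $s$ and the helper names are all hypothetical choices.

```python
import random

def dirichlet_direct(b, s):
    """sum_{n=1}^{M} b_n n^(-s) for b = [b_1, ..., b_M]."""
    return sum(bn * n ** (-s) for n, bn in enumerate(b, start=1))

def dirichlet_via_abel(b, s):
    """s * int_1^inf B(x) x^(-s-1) dx, evaluated exactly for finitely supported b."""
    M = len(b)
    total, B = 0, 0.0
    for n in range(1, M + 1):
        B += b[n - 1]                                    # B(x) = b_1 + ... + b_n on [n, n+1)
        if n < M:
            total += B * (n ** (-s) - (n + 1) ** (-s))  # s * int_n^{n+1} x^(-s-1) dx
        else:
            total += B * n ** (-s)                      # B(x) = B(M) on [M, infinity)
    return total

random.seed(1)
b = [random.random() for _ in range(50)]
for s in (1.5, 2 + 3j):
    print(s, dirichlet_direct(b, s), dirichlet_via_abel(b, s))
```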

Proof Without loss of generality, we may suppose $R > 0$. Indeed, if $R \le 0$, it is enough to prove the result for $G(s) + m\zeta(s)$, where $\zeta$ is the Riemann zeta function and $m$ is an integer greater than $|R|$. For $R > 0$, replacing $b_n$ by $b_n/R$ if needed, we may assume $R = 1$. From our discussion above, we have for $\operatorname{Re}(s) > 1$,
$$\frac{G(s)}{s} - \frac{1}{s-1} = \int_1^{\infty} \frac{B(x) - x}{x^{s+1}}\, dx. \tag{4.15}$$
After the change of variable $x = e^u$ and then the shift from $s$ to $s + 1$, we have for $\operatorname{Re}(s) > 0$,
$$\frac{G(s+1)}{s+1} - \frac{1}{s} = \int_0^{\infty} \frac{B(e^u) - e^u}{e^u}\, e^{-su}\, du,$$
which is suitable for application of Theorem 4.47 because the function
$$f(u) := (B(e^u) - e^u)/e^u$$
is bounded on account of (c), and the left-hand side has an analytic continuation to $\operatorname{Re}(s) \ge 0$ by (b). Hence, by Theorem 4.47, the integral
$$\int_0^{\infty} \frac{B(e^u) - e^u}{e^u}\, du = \int_1^{\infty} \frac{B(t) - t}{t^2}\, dt \tag{4.16}$$
converges. We will show that $B(x) \sim x$ as $x \to \infty$. Suppose not. Then, $\lim_{x\to\infty} B(x)/x$ either does not exist or does not equal 1 if it exists. In either case, we see that $\limsup_{x\to\infty} B(x)/x > 1$ or $\liminf_{x\to\infty} B(x)/x < 1$. Suppose the former inequality holds (the latter case can be treated similarly). Then, there exists some $\lambda > 1$ such that $B(x) \ge \lambda x$ for arbitrarily large $x$. For each such $x$, since $B$ is an increasing function, we have
$$\int_x^{\lambda x} \frac{B(t) - t}{t^2}\, dt \ge \int_x^{\lambda x} \frac{\lambda x - t}{t^2}\, dt = \int_1^{\lambda} \frac{\lambda x - vx}{(vx)^2}\, x\, dv = \int_1^{\lambda} \frac{\lambda - v}{v^2}\, dv,$$
which is a positive quantity $c(\lambda)$ (say) depending only on $\lambda$. This gives
$$\left| \int_x^{\infty} \frac{B(t) - t}{t^2}\, dt - \int_{\lambda x}^{\infty} \frac{B(t) - t}{t^2}\, dt \right| \ge c(\lambda)$$
for arbitrarily large $x$.

For fixed $\lambda$, as $x \to \infty$, the above integrals are tails of the convergent integral (4.16) and can be made arbitrarily small, thereby giving a contradiction. This completes the proof. □
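The constant $c(\lambda)$ can be computed explicitly, which makes its positivity transparent:
$$c(\lambda) = \int_1^{\lambda} \left( \frac{\lambda}{v^2} - \frac{1}{v} \right) dv = \Bigl[ -\frac{\lambda}{v} - \log v \Bigr]_1^{\lambda} = \lambda - 1 - \log\lambda,$$
which is positive for $\lambda > 1$ because $\log\lambda < \lambda - 1$ there. For example, $c(2) = 1 - \log 2 \approx 0.307$.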

The result can be extended to Dirichlet series with complex coefficients as follows.

Corollary 4.1 Let
$$F(s) = \sum_{n=1}^{\infty} a_n/n^s$$
be a Dirichlet series with complex coefficients. Let $A(x)$ denote the partial sum of the coefficients:
$$A(x) = \sum_{n \le x} a_n.$$
Suppose there exists a Dirichlet series $G(s) = \sum_{n=1}^{\infty} b_n/n^s$ with non-negative coefficients, such that
(a) $|a_n| \le b_n$ for all $n$.
(b) $G(s)$ is absolutely convergent for $\operatorname{Re}(s) > 1$.
(c) The function $G(s)$ (resp. $F(s)$) extends meromorphically to the region $\operatorname{Re}(s) \ge 1$, having no poles except for a simple pole at $s = 1$ with residue $R$ (resp. $r$).
(d) $B(x) := \sum_{n \le x} b_n = O(x)$.
Then, as $x \to \infty$,
$$A(x) = rx + o(x).$$

Proof If the $a_n$’s are real, we consider the series $G(s) - F(s)$, which has non-negative coefficients $b_n - a_n$ and satisfies the conditions of Theorem 4.48, giving
$$\sum_{n \le x} (b_n - a_n) = (R - r)x + o(x),$$
as $x \to \infty$. As $B(x) = Rx + o(x)$ by Theorem 4.48 applied to $G(s)$, this proves the result in the case of real coefficients. If the coefficients $a_n$ are not real, we define
$$F^*(s) = \sum_{n=1}^{\infty} \bar{a}_n/n^s,$$
so that
$$F = \frac{F + F^*}{2} + i\, \frac{F - F^*}{2i},$$
and apply the result for real coefficients separately to the real and imaginary parts above after checking that the necessary conditions are satisfied. □

As remarked earlier, the added condition (c) in Theorem 4.48 is not restrictive for most practical purposes. However, it is possible to eliminate this condition altogether. We give a brief sketch of the argument. The key idea is to notice that the known bound $B(x) \ll x\log x$ implies that for any $\epsilon > 0$, the function
$$f_\epsilon(t) := \frac{f(t)}{e^{\epsilon t}} = \frac{B(e^t)}{e^{t(1+\epsilon)}} - \frac{1}{e^{\epsilon t}}$$
is bounded and satisfies the conditions of Theorem 4.47. Applying this theorem to $f_\epsilon(t)$ and following an elementary argument that exploits the increasing behavior of the function $B(e^t)/e^{t(1+\epsilon)}$, one obtains a uniform bound on $\sup_{t \ge 0} |f_\epsilon(t)|$. Letting $\epsilon \to 0$, we see that $f(t)$ must be bounded. A more detailed proof can be found in [6].

4.19 The Prime Number Theorem

The Riemann zeta function admits an analytic continuation to the entire complex plane, apart from a simple pole at s = 1. It satisfies a remarkable functional equation which we derived in an earlier chapter using the Poisson summation formula, essentially following the method of Riemann. However, for the purpose of deriving the prime number theorem, it is not necessary to have analytic continuation to the entire complex plane. It suffices to have continuation to Re(s) ≥ 1. We elaborate below.
Firstly, the series
$$\sum_{n=1}^{\infty} \frac{1}{n^s}$$
is analytic in the half plane $\operatorname{Re}(s) > 1$. Also, by an observation due to Euler, the series can be written as an infinite product over prime numbers:
$$\sum_{n=1}^{\infty} \frac{1}{n^s} = \prod_p \left(1 - \frac{1}{p^s}\right)^{-1},$$
again valid in the region $\operatorname{Re}(s) > 1$. In Example 4.4, we saw that this Euler product shows that $\zeta(s) \ne 0$ for $\operatorname{Re}(s) > 1$. This Euler product provides the fundamental link between $\zeta(s)$ and prime numbers. Indeed, if we define the von Mangoldt function by
$$\Lambda(n) := \begin{cases} \log p & \text{if } n = p^r,\ p \text{ prime},\ r \ge 1, \\ 0 & \text{otherwise}, \end{cases}$$
then it is easy to see that
$$-\frac{\zeta'(s)}{\zeta(s)} = \sum_{n=1}^{\infty} \frac{\Lambda(n)}{n^s}.$$

Since ζ(s) admits an analytic continuation to the entire complex plane, apart from
a simple pole at s = 1, we see that the left-hand side is a meromorphic function. If
ζ(s) = 0, for Re(s) = 1, then the left-hand side provides an analytic continuation
of the series on the right-hand side for Re(s) ≥ 1, except for a simple pole at s = 1
with residue 1. The Wiener–Ikehara Tauberian theorem then implies that

Λ(n) ∼ x,
n≤x

as x tends to infinity. It is a simple exercise (see Exercise 1 of Sect. 4.16) to show that this is equivalent to the prime number theorem.
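To see the asymptotic $\psi(x) := \sum_{n\le x}\Lambda(n) \sim x$ at work, here is a short Python sketch. The function name `chebyshev_psi` is our own, following the customary notation ψ(x). It sums log p over all prime powers $p^k \le x$. The printed ratios approach 1, although a finite computation of course proves nothing about the limit.

```python
import math


def chebyshev_psi(x):
    """psi(x) = sum_{n <= x} Lambda(n), i.e. log p summed over prime powers p^k <= x."""
    x = int(x)
    if x < 2:
        return 0.0
    flags = bytearray([1]) * (x + 1)
    total = 0.0
    for p in range(2, x + 1):
        if flags[p]:                      # p is prime
            flags[p * p :: p] = bytearray(len(range(p * p, x + 1, p)))
            pk = p
            while pk <= x:                # p, p^2, p^3, ... each contribute log p
                total += math.log(p)
                pk *= p
    return total


for x in (10**3, 10**4, 10**5, 10**6):
    print(x, chebyshev_psi(x) / x)
```

For example, ψ(10) = log 2520, since 2520 is the least common multiple of 1, 2, ..., 10.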
To show that ζ(s) ≠ 0 for Re(s) = 1, we begin by assuming to the contrary that ζ(1 + it) = 0 for some t ∈ R, t ≠ 0. We apply the elementary inequality (see Exercise 3)
$$3 + 4\cos\theta + \cos 2\theta \ge 0, \tag{4.17}$$

and with σ > 1, we have from the Euler product (see Exercise 4)
$$|\zeta(\sigma)^3\zeta(\sigma+it)^4\zeta(\sigma+2it)| = \exp\left(\sum_{p,\,r\ge 1}\frac{1}{r\,p^{r\sigma}}\bigl(3+4\cos(tr\log p)+\cos(2tr\log p)\bigr)\right). \tag{4.18}$$
By (4.17), the right-hand side of (4.18) is ≥ 1. This being valid for every σ > 1, we may take the limit of the left-hand side as σ → 1⁺. If ζ(1 + it) = 0, the left-hand side tends to 0: $\zeta(\sigma)^3$ has a pole of order 3 at σ = 1, $\zeta(\sigma+it)^4$ has a zero of order at least 4 there, and $\zeta(\sigma+2it)$ stays bounded because ζ is analytic at 1 + 2it ≠ 1. This contradiction proves:

Theorem 4.49
ζ(1 + it) ≠ 0 for all t ∈ R, t ≠ 0.
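The positivity behind Theorem 4.49 is local. The right-hand side of (4.18) is a product over primes of exponentials of series with non-negative terms, because $3 + 4\cos\theta + \cos 2\theta = 2(1+\cos\theta)^2$, so each prime contributes a factor ≥ 1. The Python sketch below (helper names are our own) checks this for a few primes and parameters, and also confirms that the logarithm of each local factor matches the corresponding series in the exponent of (4.18).

```python
import math


def local_factor(p, sigma, t):
    """|E(sigma)^3 E(sigma+it)^4 E(sigma+2it)| with E(s) = (1 - p^(-s))^(-1),
    i.e. the contribution of the prime p to the left-hand side of (4.18)."""
    def euler_factor(s):
        return 1.0 / (1.0 - p ** (-s))
    value = (euler_factor(complex(sigma, 0.0)) ** 3
             * euler_factor(complex(sigma, t)) ** 4
             * euler_factor(complex(sigma, 2 * t)))
    return abs(value)


def exponent_series(p, sigma, t, terms=80):
    """Truncation of sum_{r >= 1} (3 + 4 cos(t r log p) + cos(2 t r log p)) / (r p^(r sigma))."""
    total = 0.0
    for r in range(1, terms + 1):
        theta = t * r * math.log(p)
        total += (3 + 4 * math.cos(theta) + math.cos(2 * theta)) / (r * p ** (r * sigma))
    return total


for p in (2, 3, 5, 7):
    for t in (0.5, 1.0, 14.13):
        lf = local_factor(p, 1.01, t)
        print(p, t, lf, math.log(lf) - exponent_series(p, 1.01, t))
```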

The Tauberian theorem thus provides the simplest derivation of the prime number theorem once we understand the minimal requirements needed from the zeta function. That the zeta function admits an analytic continuation to Re(s) > 0 is easily seen by noting that the alternating series
$$\sum_{n=1}^{\infty}\frac{(-1)^{n-1}}{n^s} = (1-2^{1-s})\zeta(s) \tag{4.19}$$
provides immediately an analytic continuation of ζ(s) to Re(s) > 0 (Exercise 1), away from the zeros s = 1 + 2πik/log 2 (k ∈ Z) of the factor $1-2^{1-s}$. Indeed, the left-hand side converges in that region and thus defines an analytic function.
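A quick numerical check of (4.19) in Python. At s = 2 both sides should equal (1 − 1/2)π²/6 = π²/12. At s = 1/2, where the series for ζ(s) diverges, the alternating series still converges. The sketch averages two consecutive partial sums, a simple acceleration device for alternating series that we chose for convenience. The reference value η(1/2) ≈ 0.6048986 used in the check is computed from the known value ζ(1/2) ≈ −1.4603545 through (4.19).

```python
import math


def alternating_zeta(s, n_terms=100_000):
    """Approximate sum_{n >= 1} (-1)^(n-1) n^(-s) for real s > 0.

    Returns S_N + a_{N+1}/2, the average of the partial sums S_N and S_{N+1};
    for alternating series with smoothly decreasing terms this is much more
    accurate than S_N alone."""
    partial = 0.0
    for n in range(1, n_terms + 1):
        partial += (-1) ** (n - 1) * n ** (-s)
    next_term = (-1) ** n_terms * (n_terms + 1) ** (-s)
    return partial + next_term / 2


print(alternating_zeta(2.0), (1 - 2 ** (1 - 2.0)) * math.pi ** 2 / 6)
print(alternating_zeta(0.5))
```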
282 4 Complex Analysis

Exercises

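1. Suppose that $a_n$ is a sequence of complex numbers such that
$$A(x) := \sum_{n\le x}a_n = O(1).$$
Show (by partial summation) that, for Re(s) > 0,
$$\sum_{n=1}^{\infty}\frac{a_n}{n^s} = s\int_1^{\infty}\frac{A(t)}{t^{s+1}}\,dt.$$
Deduce that
$$\sum_{n=1}^{\infty}\frac{(-1)^{n-1}}{n^s}$$
defines an analytic function in Re(s) > 0.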


2. Show that for Re(s) > 1,
$$(1-2^{1-s})\zeta(s) = \sum_{n=1}^{\infty}\frac{(-1)^{n-1}}{n^s}.$$

3. Prove that for θ ∈ R,
$$3 + 4\cos\theta + \cos 2\theta \ge 0.$$

4. Show that for σ > 1 and t ∈ R,
$$|\zeta(\sigma)^3\zeta(\sigma+it)^4\zeta(\sigma+2it)| \ge 1.$$

5. The Möbius function is defined as follows: μ(1) = 1, and μ(n) = 0 if n is not a product of distinct primes; otherwise, $\mu(n) = (-1)^k$ if n is a product of k distinct primes. Show that, for Re(s) > 1,
$$\frac{1}{\zeta(s)} = \sum_{n=1}^{\infty}\frac{\mu(n)}{n^s}.$$

6. Prove that
$$\sum_{n\le x}\mu(n) = o(x)$$
as x tends to infinity.
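Exercises 5 and 6 can be explored numerically. The Python sketch below (helper names are our own) computes μ(n) with a sieve, compares a truncation of $\sum \mu(n)/n^2$ with $1/\zeta(2) = 6/\pi^2$, and prints the ratio $M(x)/x$ with $M(x) = \sum_{n\le x}\mu(n)$, which Exercise 6 predicts tends to 0. The output is evidence only, not a proof.

```python
import math


def mobius_table(limit):
    """mu[n] for 0 <= n <= limit (mu[0] is an unused placeholder set to 0)."""
    mu = [1] * (limit + 1)
    is_composite = bytearray(limit + 1)
    for p in range(2, limit + 1):
        if is_composite[p]:
            continue
        for m in range(p, limit + 1, p):          # one more prime factor: flip the sign
            if m > p:
                is_composite[m] = 1
            mu[m] = -mu[m]
        for m in range(p * p, limit + 1, p * p):  # divisible by p^2: not squarefree
            mu[m] = 0
    mu[0] = 0
    return mu


mu = mobius_table(100_000)
dirichlet_sum = sum(mu[n] / n ** 2 for n in range(1, len(mu)))
print(dirichlet_sum, 6 / math.pi ** 2)

mertens = 0
for n in range(1, len(mu)):
    mertens += mu[n]
    if n in (10**3, 10**4, 10**5):
        print(n, mertens / n)
```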

4.20 Further Applications

In this section, we demonstrate some applications of the Tauberian theorem, following the treatment of Serre [7], who gives a general setup in the context of equidistribution.
We make this more precise in an abstract setting as follows. Let G be a compact group and X be the space of conjugacy classes of G. Let $x_v$ be a family of elements of X, indexed by a countably infinite set P. Let N : P → Z be a function taking integer values ≥ 2, and let ρ be an irreducible complex representation of G with character χ. We define
$$\zeta_P(s) = \prod_{v\in P}\left(1-\frac{1}{(Nv)^s}\right)^{-1}, \qquad L(s,\rho) = \prod_{v\in P}\det\left(1-\frac{\rho(x_v)}{(Nv)^s}\right)^{-1}.$$

Thus, for the trivial representation ρ = 1, L(s, 1) = ζP (s).

Theorem 4.50 Suppose L(s, ρ) is absolutely convergent for Re(s) > 1 and extends to a meromorphic function on Re(s) ≥ 1 with no zeros or poles except for a pole of order $c_\chi$ at s = 1. Then,
$$\sum_{Nv\le n}\chi(x_v) = \bigl(c_\chi + o(1)\bigr)\frac{n}{\log n}.$$

The proof of the above theorem follows by applying the Tauberian theorem to L′/L. We refer the reader to the appendix of Chap. 1 of [7] for the details. If Theorem 4.50 holds for all irreducible representations ρ ≠ 1 with $c_\chi = 0$, then the Peter–Weyl theorem allows us to deduce that the $x_v$'s are equidistributed with respect to the normalized Haar measure of G. Special cases of this theorem lead to important results, among them the prime number theorem for arithmetic progressions (see exercises below), the Chebotarev density theorem and the Sato–Tate theorem. An excellent reference for the interested reader wishing to delve deeper into these topics is [9].
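As a toy instance of Theorem 4.50 (an example of our own choosing, not taken from [7] or [9]), take G = (Z/4Z)^*, let v run over the odd primes p with Np = p, and let $x_p = p \bmod 4$. The trivial character should give $c_\chi = 1$ (this is the prime number theorem), while the non-trivial character $\chi_4$ should give $c_\chi = 0$, since $L(s,\chi_4)$ has neither zeros nor poles on Re(s) ≥ 1. The Python sketch below divides both character sums by n/log n. Convergence is slow, so the trivial ratio is still noticeably above 1 at n = 10^5.

```python
import math


def primes_up_to(limit):
    """Sieve of Eratosthenes: all primes p <= limit."""
    flags = bytearray([1]) * (limit + 1)
    flags[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if flags[p]:
            flags[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return [p for p in range(2, limit + 1) if flags[p]]


def chi4(n):
    """The non-trivial character mod 4: 1 if n = 1 mod 4, -1 if n = 3 mod 4, 0 if n is even."""
    return (0, 1, 0, -1)[n % 4]


def normalized_character_sums(n):
    """(trivial-character sum, chi4 sum) over odd primes p <= n, each divided by n/log n."""
    odd_primes = [p for p in primes_up_to(n) if p % 2 == 1]
    scale = n / math.log(n)
    trivial = len(odd_primes) / scale
    nontrivial = sum(chi4(p) for p in odd_primes) / scale
    return trivial, nontrivial


for n in (10**3, 10**4, 10**5):
    print(n, normalized_character_sums(n))
```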

Exercises

1. The Hurwitz zeta function is defined for Re(s) > 1 by
$$\zeta(s,x) = \sum_{n=0}^{\infty}\frac{1}{(n+x)^s},$$
for 0 < x ≤ 1. Prove that



$$-\frac{1}{x^s} + \zeta(s,x) - \zeta(s) = \sum_{r=1}^{\infty}\binom{-s}{r}\zeta(s+r)\,x^r.$$
[Hint: Use the binomial theorem to write
$$\frac{1}{(n+x)^s} - \frac{1}{n^s} = \sum_{r=1}^{\infty}\binom{-s}{r}\frac{x^r}{n^{s+r}},$$
and then sum over n ≥ 1.]


2. Using the previous exercise, show that
$$(2^s-2)\zeta(s) = 2^s + \sum_{r=1}^{\infty}\binom{-s}{r}\frac{\zeta(s+r)}{2^r}.$$
Deduce that $(2^s-2)\zeta(s)$ extends analytically to Re(s) > 0. [Hint: Put x = 1/2 in the identity of the previous exercise.]
3. For any natural number q, show that
$$(q^s-q)\zeta(s) = \sum_{a=1}^{q-1}\left(\zeta\left(s,\frac{a}{q}\right)-\zeta(s)\right).$$

4. Deduce that $(3^s-3)\zeta(s)$ extends analytically to Re(s) > 0. Combining this with Exercise 2, conclude that ζ(s) extends to Re(s) > 0 apart from a possible pole at s = 1.
5. Show that $\lim_{s\to 1^+}(2^s-2)\zeta(s) = 2\log 2$. [Hint: Use Exercise 2.] Deduce from the previous exercise that ζ(s) extends to an analytic function for Re(s) > 0 except for a simple pole at s = 1 with residue 1.
6. For each character χ : (Z/qZ)^* → C, show that L(s, χ) can be written as
$$L(s,\chi) = q^{-s}\sum_{a=1}^{q}\chi(a)\,\zeta(s,a/q).$$
Deduce that L(s, χ) extends analytically to Re(s) > 0 for every non-trivial character χ.
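7. With notation as in the previous exercise, show that if L(s, χ) does not vanish on Re(s) = 1 for every non-trivial character χ mod q, then for a and q coprime,
$$\sum_{\substack{n\le x\\ n\equiv a \ (\mathrm{mod}\ q)}}\Lambda(n) \sim \frac{x}{\varphi(q)},$$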

where φ(q) is Euler’s function, equal to the order of the group (Z/qZ)^*. Deduce that every arithmetic progression a mod q with (a, q) = 1 contains infinitely many primes (Dirichlet’s theorem).
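The identity of Exercise 3 can be sanity-checked numerically. The sketch below evaluates ζ(s, x) for real s > 1 by summing the first terms directly and estimating the tail with the Euler–Maclaurin formula. This is a standard technique chosen by us and is not discussed in the text. It then compares both sides of Exercise 3 for s = 2.5 and q = 3, and checks $\zeta(s,1/2) = (2^s-1)\zeta(s)$, the special case behind Exercise 2.

```python
import math


def hurwitz_zeta(s, x, n_direct=100):
    """zeta(s, x) = sum_{n >= 0} (n + x)^(-s) for real s > 1 and 0 < x <= 1.

    The first n_direct terms are summed directly; the tail sum_{n >= n_direct}
    of f(n) = (n + x)^(-s) is approximated by Euler-Maclaurin:
    integral + f(N)/2 - f'(N)/12 + f'''(N)/720."""
    head = sum((n + x) ** (-s) for n in range(n_direct))
    a = n_direct + x
    tail = (a ** (1 - s) / (s - 1)                           # integral of (t + x)^(-s) from N
            + a ** (-s) / 2                                 # f(N) / 2
            + s * a ** (-s - 1) / 12                        # -f'(N) / 12
            - s * (s + 1) * (s + 2) * a ** (-s - 3) / 720)  # f'''(N) / 720
    return head + tail


def exercise_3_sides(s, q):
    """Both sides of (q^s - q) zeta(s) = sum_{a=1}^{q-1} (zeta(s, a/q) - zeta(s))."""
    zeta_s = hurwitz_zeta(s, 1.0)
    lhs = (q ** s - q) * zeta_s
    rhs = sum(hurwitz_zeta(s, a / q) - zeta_s for a in range(1, q))
    return lhs, rhs


print(hurwitz_zeta(2.0, 1.0), math.pi ** 2 / 6)
print(exercise_3_sides(2.5, 3))
```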

4.21 The Paley–Wiener Theorems

The theme of fusing Fourier transforms and analytic functions is the underlying goal of the Paley–Wiener theorems. It is sometimes the case that the Fourier transform of a function f on R can be extended to an analytic function in some region of the complex plane. For instance, we have already seen that if $f(t) = e^{-|t|}$, then (up to a constant factor) the Fourier transform $\hat f(x)$ is
$$\frac{1}{1+x^2},$$
which is a rational function and thus defines a meromorphic function in the complex plane (with simple poles at ±i). Thus, it is natural to ask under what conditions on f the transform $\hat f$ is analytic in certain regions of the complex plane. This is the aim of the Paley–Wiener theorems.
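The claim about $e^{-|t|}$ is easy to test numerically. Without the normalizing constant, $\int_{-\infty}^{\infty}e^{-|t|}e^{-itx}\,dt = 2\int_0^{\infty}e^{-t}\cos(tx)\,dt = 2/(1+x^2)$, because the sine part of the integrand is odd in t. The Python sketch below approximates the last integral by Simpson's rule, truncated at t = 40 (an arbitrary cut-off of ours, beyond which $e^{-t}$ is negligible), and compares it with $2/(1+x^2)$.

```python
import math


def transform_of_exp_abs(x, upper=40.0, steps=40_000):
    """Simpson's rule for 2 * int_0^upper e^(-t) cos(tx) dt, which approximates
    the (unnormalized) Fourier transform of e^(-|t|) at x. steps must be even."""
    h = upper / steps

    def g(t):
        return math.exp(-t) * math.cos(t * x)

    total = g(0.0) + g(upper)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * g(k * h)
    return 2 * total * h / 3


for x in (0.0, 0.5, 1.0, 3.0):
    print(x, transform_of_exp_abs(x), 2 / (1 + x * x))
```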
For instance, let $F\in L^2(0,\infty)$ and define
$$f(z) = \int_0^{\infty}F(t)e^{itz}\,dt, \qquad z\in H, \tag{4.20}$$

where H denotes the upper half plane (i.e., z ∈ C with Im(z) > 0). If Im(z) > δ > 0, and $z_n$ is a sequence of complex numbers with $\mathrm{Im}(z_n) > \delta$ tending to z, then a simple application of the dominated convergence theorem shows
$$\lim_{n\to\infty}\int_0^{\infty}|e^{itz_n}-e^{itz}|^2\,dt = 0,$$
because the integrand is bounded by the $L^1$-function $4e^{-2\delta t}$ and tends to zero for every t > 0. The Cauchy–Schwarz inequality then implies that f is continuous in H.
Furthermore, a direct application of Fubini’s theorem and Cauchy’s theorem shows that
$$\int_\gamma f(z)\,dz = 0$$
for every closed path γ in H. By Morera’s theorem, we can conclude that f is a holomorphic function on H. That is, $f\in\mathcal H(H)$. Thus, in this example, the Fourier transform extends to an analytic function on the upper half plane.
We can rewrite (4.20) with x, y ∈ R and y fixed as:


$$f(x+iy) = \int_0^{\infty}F(t)e^{-ty}e^{itx}\,dt.$$

Applying Plancherel’s theorem gives
$$\frac{1}{2\pi}\int_{-\infty}^{\infty}|f(x+iy)|^2\,dx = \int_0^{\infty}|F(t)|^2e^{-2ty}\,dt \le \int_0^{\infty}|F(t)|^2\,dt$$
for every y > 0 (keeping in mind that our Lebesgue measure was normalized by $1/\sqrt{2\pi}$ in the earlier chapter). This proves:

Theorem 4.51 If f is of the form (4.20), then f is holomorphic in H and its restrictions to horizontal lines in H form a bounded set in $L^2(\mathbb R)$.
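A concrete instance of Theorem 4.51, chosen by us for illustration: for $F(t) = e^{-t}$ on (0, ∞), (4.20) gives $f(z) = \int_0^{\infty}e^{-t(1-iz)}\,dt = 1/(1-iz)$, which is holomorphic for Im(z) > −1, and the Plancherel identity above predicts $\frac{1}{2\pi}\int_{-\infty}^{\infty}|f(x+iy)|^2\,dx = \int_0^{\infty}e^{-2t(1+y)}\,dt = \frac{1}{2(1+y)} \le \frac12 = \int_0^{\infty}|F(t)|^2\,dt$. The Python sketch checks the closed form by Simpson's rule in t, and the $L^2$ identity by the substitution x = tan u followed by the midpoint rule (a choice of ours that turns the infinite range into a bounded one).

```python
import cmath
import math


def f_closed_form(z):
    """f(z) = int_0^inf e^(-t) e^(itz) dt = 1/(1 - iz), valid for Im(z) > -1."""
    return 1 / (1 - 1j * z)


def f_by_simpson(z, upper=60.0, steps=60_000):
    """Simpson's rule for int_0^upper e^(-t) e^(itz) dt; steps must be even."""
    h = upper / steps

    def g(t):
        return cmath.exp(-t + 1j * t * z)

    total = g(0.0) + g(upper)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * g(k * h)
    return total * h / 3


def horizontal_l2(y, steps=4_000):
    """(1/2pi) int_R |f(x+iy)|^2 dx via x = tan(u), dx = du / cos(u)^2,
    using the midpoint rule on (-pi/2, pi/2)."""
    h = math.pi / steps
    total = 0.0
    for k in range(steps):
        u = -math.pi / 2 + (k + 0.5) * h
        total += abs(f_closed_form(complex(math.tan(u), y))) ** 2 / math.cos(u) ** 2
    return total * h / (2 * math.pi)


print(f_by_simpson(0.3 + 0.2j), f_closed_form(0.3 + 0.2j))
for y in (0.1, 1.0, 5.0):
    print(y, horizontal_l2(y), 1 / (2 * (1 + y)))
```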

Consider now a second class of functions f of the form
$$f(z) = \int_{-A}^{A}F(t)e^{itz}\,dt \tag{4.21}$$

where 0 < A < ∞ and $F\in L^2(X)$ with X = (−A, A). By the method of proof above, f is entire and, writing z = x + iy, it satisfies the growth condition
$$|f(z)| \le \int_{-A}^{A}|F(t)|e^{-ty}\,dt \le e^{A|y|}\int_{-A}^{A}|F(t)|\,dt.$$

If we denote by C this last integral, then C < ∞ (by the Cauchy–Schwarz inequality, since X has finite length) and, since |y| ≤ |z|, we have
$$|f(z)| \le Ce^{A|z|}. \tag{4.22}$$

Entire functions that satisfy (4.22) are said to be of exponential type. Our discussion
shows that

Theorem 4.52 Every f of the form (4.21) is an entire function that satisfies (4.22) and whose restriction to the real axis lies in $L^2$ (by the Plancherel theorem).
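The simplest example of (4.21), again our own illustration: with F ≡ 1 on X = (−A, A) we get $f(z) = \int_{-A}^{A}e^{itz}\,dt = 2\sin(Az)/z$ (with f(0) = 2A), and here C = 2A. The sketch below takes the hypothetical value A = 2, compares Simpson's rule with the closed form, and checks the intermediate bound $|f(z)| \le Ce^{A|y|}$, which implies (4.22), at a few points.

```python
import cmath
import math

A = 2.0      # hypothetical choice of the half-width A
C = 2 * A    # C = int_{-A}^{A} |F(t)| dt for F = 1 on (-A, A)


def f_by_simpson(z, steps=20_000):
    """Simpson's rule for (4.21) with F = 1: int_{-A}^{A} e^(itz) dt; steps must be even."""
    h = 2 * A / steps

    def g(t):
        return cmath.exp(1j * t * z)

    total = g(-A) + g(A)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * g(-A + k * h)
    return total * h / 3


def f_closed_form(z):
    """2 sin(Az)/z, extended by continuity to f(0) = 2A."""
    return complex(C) if z == 0 else 2 * cmath.sin(A * z) / z


for z in (0.5, 3 + 1j, -2 - 4j, 10j):
    value = f_by_simpson(z)
    bound = C * math.exp(A * abs(z.imag))
    print(z, abs(value - f_closed_form(z)), abs(value) <= bound)
```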

It is remarkable that the converses of the two theorems above are true. This is the
content of the Paley–Wiener theorems.

Theorem 4.53 Suppose $f\in\mathcal H(H)$ and
$$\sup_{0<y<\infty}\frac{1}{2\pi}\int_{-\infty}^{\infty}|f(x+iy)|^2\,dx = C < \infty. \tag{4.23}$$

Then, there exists an $F\in L^2(0,\infty)$ such that
$$f(z) = \int_0^{\infty}F(t)e^{itz}\,dt, \qquad z\in H,$$

and
$$\int_0^{\infty}|F(t)|^2\,dt = C.$$

Proof To gain some intuition on what F should look like, we can apply the inversion theorem (without worrying whether the conditions are met or not). Thus, our desired F should be of the form
$$F(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty}e^{ty}f(x+iy)e^{-itx}\,dx = \frac{1}{2\pi}\int_{\mathrm{Im}(z)=y}f(z)e^{-itz}\,dz,$$

and if this argument is correct, the last integral should not depend on which y is chosen, suggesting that Cauchy’s theorem should be applied. Motivated by this idea, let y be fixed with 0 < y < ∞. For each a > 0, let $\gamma_a$ be the rectangular path with vertices ±a + i and ±a + iy. Since f is holomorphic in the upper half plane, we have by Cauchy’s theorem
$$\int_{\gamma_a}f(z)e^{-itz}\,dz = 0.$$

We consider only real values of t. Let φ(b) be the integral of $f(z)e^{-itz}$ over the straight-line interval from b + i to b + iy (b ∈ R). Set I = [y, 1] if y < 1 and I = [1, y] if y > 1. Then, by the Cauchy–Schwarz inequality,
$$|\varphi(b)|^2 = \left|\int_I f(b+iu)e^{-it(b+iu)}\,du\right|^2 \le \left(\int_I|f(b+iu)|^2\,du\right)\left(\int_I e^{2tu}\,du\right). \tag{4.24}$$
Put
$$L(b) = \int_I|f(b+iu)|^2\,du,$$
so that (4.23) and Fubini’s theorem give
$$\frac{1}{2\pi}\int_{-\infty}^{\infty}L(b)\,db \le C\mu(I),$$
where μ(I) denotes the length of I.

Hence, there is a sequence $(a_j)$ with $a_j\to\infty$ such that
$$L(a_j) + L(-a_j) \to 0, \qquad j\to\infty.$$
By (4.24), this implies that
$$\varphi(a_j)\to 0, \quad \varphi(-a_j)\to 0, \qquad \text{as } j\to\infty.$$
This holds for every t ∈ R, and the sequence $(a_j)$ does not depend on t.

Let us now define
$$g_j(y,t) = \frac{1}{2\pi}\int_{-a_j}^{a_j}f(x+iy)e^{-itx}\,dx.$$

Then, we have from the above that
$$\lim_{j\to\infty}\left[e^{ty}g_j(y,t) - e^{t}g_j(1,t)\right] = 0, \qquad -\infty<t<\infty. \tag{4.25}$$

Writing $f_y(x)$ for f(x + iy), we see that $f_y\in L^2(-\infty,\infty)$, by hypothesis. The Plancherel theorem shows that
$$\lim_{j\to\infty}\int_{-\infty}^{\infty}|\hat f_y(t) - g_j(y,t)|^2\,dt = 0, \tag{4.26}$$

where $\hat f_y$ is the Fourier transform of $f_y$. Therefore, a subsequence of $\{g_j(y,t)\}$ converges pointwise to $\hat f_y(t)$ for almost all t. Thus, if we define
$$F(t) = e^{t}\hat f_1(t), \tag{4.27}$$

then it follows from (4.25) that, for almost all t,
$$F(t) = e^{ty}\hat f_y(t). \tag{4.28}$$

Notice that (4.27) does not involve y and (4.28) holds for every y > 0. Plancherel’s theorem can be applied to (4.28):
$$\int_{-\infty}^{\infty}e^{-2ty}|F(t)|^2\,dt = \int_{-\infty}^{\infty}|\hat f_y(t)|^2\,dt = \frac{1}{2\pi}\int_{-\infty}^{\infty}|f_y(x)|^2\,dx \le C.$$

By taking y arbitrarily large in the above inequality, we deduce that F(t) = 0 for almost all t < 0. On the other hand, letting y → 0 in the same inequality (by monotone convergence) shows that
$$\int_0^{\infty}|F(t)|^2\,dt \le C.$$

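From (4.28), the fact that F vanishes almost everywhere on (−∞, 0), and the Cauchy–Schwarz inequality, we deduce that
$$\int_{-\infty}^{\infty}|\hat f_y(t)|\,dt = \int_0^{\infty}e^{-ty}|F(t)|\,dt \le \left(\int_0^{\infty}e^{-2ty}\,dt\right)^{1/2}\left(\int_0^{\infty}|F(t)|^2\,dt\right)^{1/2} < \infty$$
for y > 0. Thus, $\hat f_y\in L^1(\mathbb R)$. The Fourier inversion theorem gives that
$$f_y(x) = \int_{-\infty}^{\infty}\hat f_y(t)e^{itx}\,dt.$$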

In other words,
$$f(z) = \int_0^{\infty}F(t)e^{-yt}e^{itx}\,dt = \int_0^{\infty}F(t)e^{itz}\,dt \qquad (z\in H).$$

Finally,
$$\int_0^{\infty}|F(t)|^2\,dt = C$$
follows from (4.23) and an application of Plancherel’s theorem. This completes the proof. ∎
Theorem 4.54 Suppose A and C are positive constants and f is an entire function such that
$$|f(z)| \le Ce^{A|z|} \qquad \forall\, z\in\mathbb C,$$
and
$$\int_{-\infty}^{\infty}|f(x)|^2\,dx < \infty.$$
Then, there exists an $F\in L^2(X)$ with X = (−A, A) such that
$$f(z) = \int_{-A}^{A}F(t)e^{itz}\,dt,$$
for all z ∈ C.
Proof Put $f_\epsilon(x) = f(x)e^{-\epsilon|x|}$ for ε > 0 and x ∈ R. We will show that
$$\lim_{\epsilon\to 0}\int_{-\infty}^{\infty}f_\epsilon(x)e^{-itx}\,dx = 0, \qquad t\in\mathbb R,\ |t|>A. \tag{4.29}$$

Since $\|f_\epsilon - f\|_2 \to 0$ as ε → 0, the Plancherel theorem implies that the Fourier transforms of $f_\epsilon$ converge in $L^2$ to the Fourier transform F of f. Then, (4.29) will imply that F vanishes outside of [−A, A], so that the Fourier inversion theorem implies the result. To prove (4.29), let, for each real a, $\gamma_a$ be the path defined by
$$\gamma_a(s) = se^{ia}, \qquad 0\le s<\infty.$$

In other words, $\gamma_a$ is the ray emanating from the origin to infinity with radial angle a. Put
$$P_a = \{w : \mathrm{Re}(we^{ia}) > A\},$$
and if $w\in P_a$, define


$$\Phi_a(w) = \int_{\gamma_a}f(z)e^{-wz}\,dz = e^{ia}\int_0^{\infty}f(se^{ia})\exp(-wse^{ia})\,ds.$$

By our hypothesis on f, the absolute value of the integrand above is at most
$$C\exp\bigl(-[\mathrm{Re}(we^{ia}) - A]s\bigr)$$
and it follows from our preliminary discussion that $\Phi_a$ is holomorphic in $P_a$. Now for a = 0 and a = π, we have
$$\Phi_0(w) = \int_0^{\infty}f(x)e^{-wx}\,dx, \qquad \mathrm{Re}(w) > 0,$$
$$\Phi_\pi(w) = -\int_{-\infty}^{0}f(x)e^{-wx}\,dx, \qquad \mathrm{Re}(w) < 0.$$

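Since $f\in L^2(\mathbb R)$, we see that both $\Phi_0$ and $\Phi_\pi$ are holomorphic in the indicated half planes. Now it is easily checked that for t ∈ R,
$$\int_{-\infty}^{\infty}f_\epsilon(x)e^{-itx}\,dx = \int_{-\infty}^{\infty}f(x)e^{-\epsilon|x|}e^{-itx}\,dx = \Phi_0(\epsilon+it) - \Phi_\pi(-\epsilon+it),$$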

so to prove our assertion, we need to show that the right-hand side of the above equation tends to zero as ε → 0 for |t| > A. We will do this by showing that any two of our functions $\Phi_a$ agree in the intersection of their domains of definition. That is, they are analytic continuations of each other. Once we have this, we can replace $\Phi_0$ and $\Phi_\pi$ by $\Phi_{\pi/2}$ when t < −A and by $\Phi_{-\pi/2}$ when t > A, and it is then obvious that the difference tends to zero as ε → 0. To this end, suppose 0 < b − a < π and put
$$c = \frac{a+b}{2}, \qquad d = \cos\frac{b-a}{2} > 0.$$
2 2

Putting $w = |w|e^{-ic}$, we have
$$\mathrm{Re}(we^{ia}) = d|w| = \mathrm{Re}(we^{ib}),$$
so that $w\in P_a\cap P_b$ as soon as |w| > A/d. Now consider the integral
$$\int_\gamma f(z)e^{-wz}\,dz, \tag{4.30}$$
where γ is the circular arc of radius r given by $\gamma(t) = re^{it}$, for a ≤ t ≤ b. Since
$$\mathrm{Re}(-wz) = -|w|r\cos(t-c) \le -|w|rd,$$



the absolute value of the integrand in (4.30) is bounded by
$$C\exp\bigl((A - |w|d)r\bigr).$$
If |w| > A/d, it follows that the integral in (4.30) tends to zero as r tends to infinity.
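We now apply Cauchy’s theorem to the closed path formed by the segment $[0, re^{ia}]$, the arc γ, and the segment from $re^{ib}$ back to 0:
$$\int_{[0,re^{ia}]}f(z)e^{-wz}\,dz + \int_\gamma f(z)e^{-wz}\,dz - \int_{[0,re^{ib}]}f(z)e^{-wz}\,dz = 0.$$
Since the middle integral tends to zero as r tends to infinity, we conclude that $\Phi_a(w) = \Phi_b(w)$ for $w = |w|e^{-ic}$ and |w| > A/d. As these two functions are holomorphic in the connected open set $P_a\cap P_b$ and agree on a ray inside it (a set with limit points there), we conclude that $\Phi_a$ and $\Phi_b$ agree in the intersection of the half planes in which they were originally defined. This completes the proof. ∎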

Exercises

1. Suppose f is an entire function of exponential type and
$$\varphi(y) = \int_{-\infty}^{\infty}|f(x+iy)|^2\,dx.$$
Prove that f = 0 if φ is a bounded function.
2. Suppose that f is an entire function of exponential type such that the restriction of f to two non-parallel lines belongs to $L^2$. Prove that f = 0.

References

1. M. Ram Murty, M. Dewar, H. Graves, Problems in the Theory of Modular Forms, IMSc Lecture Notes (Hindustan Book Agency, Delhi, 2015)
2. A. Tauber, Ein Satz aus der Theorie der unendlichen Reihen. Monatshefte f. Math. 8, 273–277 (1897)
3. S. Ikehara, An extension of Landau’s theorem in the analytic theory of numbers. J. Math. Phys. Mass. Inst. Technol. 10, 1–12 (1931)
4. E. Landau, Über die Bedeutung einiger neuerer Grenzwertsätze der Herren Hardy und Axer. Prace Mat.-Fiz. 21, 97–177 (1910)
5. D.J. Newman, Simple analytic proof of the prime number theorem. Am. Math. Mon. 87, 693–696 (1980)
6. J. Korevaar, The Wiener–Ikehara theorem by complex analysis. Proc. Am. Math. Soc. 134(4), 1107–1116 (2005)
7. J.-P. Serre, Abelian l-Adic Representations and Elliptic Curves, Lectures at McGill University (W. A. Benjamin Inc., New York-Amsterdam, 1968)

8. M. Ram Murty, Problems in Analytic Number Theory, 2nd edn., Graduate Texts in Mathematics (Springer, New York, 2008)
9. M. Ram Murty, V. Kumar Murty, Non-Vanishing of L-Functions and Applications, Modern Birkhäuser Classics (Birkhäuser/Springer Basel AG, Basel, 1997)
Chapter 5
Introduction to Algebraic Topology

5.1 A Very Brief Historical Introduction

The subject of algebraic topology was born in 1895, when Henri Poincaré introduced
the notion of the fundamental group π1 (X ) attached to a topological space X . His idea
was to study the topology of a space using group theory and more generally, algebra.
The modern viewpoint is somewhat analogous (in some perspectives, identical) to
the study of normal extensions of fields and their associated Galois groups.
Since the computations of fundamental groups of topological spaces often proved
difficult, Poincaré also introduced homology groups to measure connectivity proper-
ties of topological spaces. These being abelian groups, one could invoke the machin-
ery of Z-modules (or more generally, module theory) to study the topological spaces.
By this route, algebra entered the study of topology, and we call this branch of math-
ematics algebraic topology.
The subject developed rapidly in the early part of the twentieth century. For
example, in 1910, L.E.J. Brouwer proved his famous fixed point theorem stating that
any continuous map
f : D → D,

where D = {z ∈ C : |z| ≤ 1}, has a fixed point. Later, in 1926, S. Lefschetz extended
this to prove his celebrated fixed point theorem. In 1967, M. Atiyah and R. Bott
gave a far-reaching generalization of Lefschetz’s formulas. Their work led to the
development of the Atiyah–Singer index theorem the following year.
A fundamental concept in algebraic topology is the notion of homotopy. If X
and Y are two topological spaces, maps f : X → Y and g : X → Y are said to be
homotopically equivalent if there is a continuous map

h : X × [0, 1] → Y

such that h(x, 0) = f (x) and h(x, 1) = g(x). Two spaces X and Y are said to be
homotopically equivalent if there are maps F : X → Y and G : Y → X such that

© Hindustan Book Agency 2022 293


M. R. Murty, A Second Course in Analysis, IMSc Lecture Notes in Mathematics,
https://doi.org/10.1007/978-981-16-7246-0_5
294 5 Introduction to Algebraic Topology

F ◦ G and G ◦ F are homotopic to the identity maps of Y and X, respectively. A space X is said to be a contractible space if it is homotopy equivalent to a point. Contractible spaces are the simplest to study, and the notion of homotopy creates a hierarchy of topological spaces that can now be explored systematically.
Further advances were made in the 1950s. All along, several notable conjectures motivated an accelerated development of the theory. The most famous of these conjectures is the Poincaré conjecture predicting that any simply connected closed manifold of dimension three is homeomorphic to the 3-sphere $S^3$. The generalized Poincaré conjecture is that any homotopy n-sphere (i.e., a closed n-manifold homotopy equivalent to the n-sphere) is homeomorphic to the n-sphere $S^n$. In other words, if a closed n-manifold X is homotopy equivalent to $S^n$, then X is homeomorphic to $S^n$. For n ≥ 5, this was settled by S. Smale in 1961. In 1982, M. Freedman settled the case n = 4. But it was the case n = 3 that attained celebrity status, becoming a million dollar prize problem in the list of unsolved problems highlighted by the Clay Mathematics Institute in 2000. Finally, in 2003, G. Perelman solved the case n = 3. For this work, he was awarded the Fields Medal, but he refused to accept it, stating that “I am not interested in money or fame. I don’t want to be on display like an animal in a zoo.”
In 2010, the Clay Mathematics Institute decided to award its first million dollar
prize to Perelman for his work. But this too was turned down by him saying that
his work built on the work of many others and it was unfair to award it to a single
person. Though he was awarded many accolades, including professorships at top
universities, Perelman declined them all and was happy to work in his native Russia
at the Steklov Institute in Saint Petersburg. Truly, this is a superb example of a man
of renunciation, a mathematical ascetic.
Returning to our brief narrative, parallel developments occurred in the 1950s that hastened the development of algebraic geometry. Indeed, shortly after he formulated his famous conjectures regarding solutions of systems of equations over finite fields, André Weil fantasized that a theory of algebraic topology should exist over finite fields. If such a theory existed, then he indicated how many of his conjectures would follow from a (mod p) version of the Lefschetz fixed point theorem. This was realized in the magnificent edifice constructed by A. Grothendieck in the 1960s and 1970s. It culminated in the work of Pierre Deligne, who completed the proof of the Weil conjectures by establishing the notorious Riemann hypothesis for zeta functions over finite fields.
Given the important role that algebraic topology plays in mathematics, as well as
its central place of interconnectedness (no pun intended), it seemed relevant that any
good course in analysis for graduate students should include this topic. This chapter
is intended as a “crash course” and is meant as an elementary introduction to this
fascinating subject. Hopefully, the student will be inspired to study more and with
this goal, ample references for further reading are provided.

5.2 Homotopic Paths

Recall that a topological space is a pair (X, O) where X is a set and O is a collection
of subsets of X satisfying the following conditions:

(a) the empty set ∅ and X belong to O;
(b) A, B ∈ O implies A ∩ B ∈ O;
(c) $A_i\in O$ with i belonging to some index set I implies $\cup_{i\in I}A_i\in O$.

Elements of O are called open sets. The complement of an open set is called a closed
set.
If X and Y are topological spaces and we speak of maps from X to Y , we always
mean continuous maps. Given two topological spaces X and Y , a map f : X → Y
is called a homeomorphism if it is one to one, onto and f (U ) is open if and only
if U is open. If such a map exists, then X and Y are said to be homeomorphic. The
basic problem of topology is to classify topological spaces up to homeomorphism.
The real line R is an example of a topological space, with open sets being the (countable) disjoint unions of open intervals. More generally, a metric space X with metric d is a topological space with open sets characterized by those subsets O of X with the property that for every x ∈ O, there is an ε > 0 such that the ball B(x, ε) ⊂ O. Thus, $\mathbb R^n$ and $\mathbb C^n$ with the usual metric are examples of topological spaces. More generally, n-manifolds are topological spaces which are worthy of independent study. (Recall that X is said to be an n-manifold if X is a Hausdorff, second countable topological space and every element x ∈ X belongs to an open set U which is homeomorphic to an open set of $\mathbb R^n$.)
Any subset S of a topological space X inherits a topology from X, called the relative topology. Thus, the open sets of S are simply of the form S ∩ O for some open set O of X. For instance, the n-sphere, denoted $S^n$ and defined by
$$S^n = \{(x_1,\dots,x_{n+1})\in\mathbb R^{n+1} : x_1^2+\cdots+x_{n+1}^2 = 1\},$$
is an example of a topological space which plays an important role in algebraic topology. It is an example of an n-manifold. Another important example is the closed unit ball $B^n$ defined by
$$B^n = \{(x_1,\dots,x_n)\in\mathbb R^n : x_1^2+\cdots+x_n^2\le 1\},$$
which is an n-manifold with boundary. Notice that $S^n$ is a subspace of $\mathbb R^{n+1}$ while $B^n$ is a subspace of $\mathbb R^n$.


To determine if two given topological spaces are homeomorphic is not always easy. For instance, it is a non-trivial theorem that $\mathbb R^m$ and $\mathbb R^n$ are not homeomorphic whenever m ≠ n. The same is true of the spheres $S^m$ and $S^n$. In some sense, algebraic topology is an approach to this classification problem through algebraic methods. It has been highly successful in this and thus has played a central role in the development of modern mathematics, not only in topology, but also in other branches of mathematics such as number theory, algebraic geometry, and differential geometry.

The first notions of algebraic topology were introduced by Poincaré in a series of papers from 1895 to 1905 where he attempts to understand the notion of “connectivity” of a space. This is where we will begin.
Recall that a space X is said to be connected if it cannot be expressed non-trivially
as the disjoint union of two open sets. It is not hard to show that the continuous
image of a connected space is connected (Exercise 4). Therefore, connectedness
is preserved under homeomorphism. Thus, the 0-sphere S 0 consisting of {−1, 1}
which is disconnected is not homeomorphic to S n for n ≥ 1, for these latter spaces
are connected.
A stronger property than connectedness is path-connectedness. Given a topolog-
ical space X , with a, b ∈ X , a path from a to b is a continuous map α : I = [0, 1] →
X such that α(0) = a and α(1) = b. A space X is said to be path-connected if any
two points of X are connected by a path. If X is path-connected, it is also connected
(Exercise 5).
The property of path-connectedness lies at the heart of algebraic topology. One
of our goals in this chapter will be to prove:

Theorem 5.1 (Brouwer fixed point theorem, 1910) Any continuous map $f : B^n\to B^n$ has a fixed point.

For n = 1, this is straightforward. In this case, $B^1 = [-1,1]$ and we may suppose f(−1) ≠ −1 and f(1) ≠ 1, for otherwise we are done; thus f(−1) > −1 and f(1) < 1. The path-connectedness of [−1, 1] × [−1, 1] now implies the result since the graph of f must intersect the diagonal at some point. Indeed, the graph of f is a continuous path joining (−1, f(−1)), which lies above the diagonal, to (1, f(1)), which lies below the diagonal, so it must cross the diagonal (apply the intermediate value theorem to g(x) = f(x) − x). The intersection of the path with the diagonal is our fixed point.
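The one-dimensional argument is constructive. Since g(x) = f(x) − x satisfies g(−1) ≥ 0 ≥ g(1), bisection on g locates a fixed point to any desired accuracy. Here is a minimal Python sketch (the function name and tolerance are our own choices), applied to f = cos, which maps [−1, 1] into itself:

```python
import math


def fixed_point_on_interval(f, lo=-1.0, hi=1.0, tol=1e-12):
    """Bisection for a fixed point of a continuous f: [lo, hi] -> [lo, hi].

    After the endpoint checks, keeps the invariant g(lo) >= 0 > g(hi) for g(x) = f(x) - x."""
    g = lambda x: f(x) - x
    if g(lo) == 0:
        return lo
    if g(hi) == 0:
        return hi
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) >= 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


x = fixed_point_on_interval(math.cos)
print(x, math.cos(x) - x)
```

Nothing comparable is available for n ≥ 2: the bisection relies on the order structure of [−1, 1], which is exactly what the higher-dimensional proofs must replace.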
This simple proof does not seem to extend to higher dimensions. Already for
n = 2, it is non-trivial and for n ≥ 3, it requires more advanced notions.
The fixed point theorem has many applications in mathematics such as to the
existence of solutions of systems of equations. Even in the simple case of n = 1, we
have an interesting application to the problem of “how to slice a pancake evenly,” or
“how to slice two pancakes evenly.”
In a series of remarkable papers, Poincaré introduced the notions of homotopy
and fundamental group and thus initiated the study of modern algebraic topology.
Given a topological space X , two paths α, β from a to b are said to be homotopic if
there is a continuous map F : I × I → X such that

F(s, 0) = α(s), F(s, 1) = β(s),

F(0, t) = a, F(1, t) = b, 0 ≤ s ≤ 1, 0 ≤ t ≤ 1.

The map F is called a homotopy of α into β. Intuitively, α is homotopic to β if we can continuously deform α into β. It is clear that this is an equivalence relation on the set of all paths from a to b. Thus we write α ∼ β to indicate that α and β are homotopic. We already met this concept in our study of complex analysis.

A subset X of $\mathbb R^n$ is said to be convex if for any x, y ∈ X, the line segment joining x and y also lies in X.

Theorem 5.2 Let X be a convex subset of Rn , a, b ∈ X . Then any two paths from a
to b are homotopic.

Proof Let α and β be two paths from a to b. Define F(s, t) = (1 − t)α(s) + tβ(s). By convexity, F takes values in X, and F is easily seen to be a homotopy between α and β. ∎
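The straight-line homotopy of Theorem 5.2 can be checked concretely. In the sketch below (all paths are hypothetical examples of ours), X is the closed unit disk in R², which is convex, and α, β are two different paths from a = (−1/2, 0) to b = (1/2, 0). We sample F on a grid and confirm that it stays in X, restricts to α and β at t = 0 and t = 1, and keeps the endpoints fixed up to floating-point rounding.

```python
import math

A_POINT, B_POINT = (-0.5, 0.0), (0.5, 0.0)


def alpha(s):
    """Upper half circle of radius 1/2 from a to b."""
    return (-0.5 * math.cos(math.pi * s), 0.5 * math.sin(math.pi * s))


def beta(s):
    """A path dipping below the x-axis from a to b."""
    return (-0.5 + s, -0.4 * math.sin(math.pi * s))


def straight_line_homotopy(p, q):
    """F(s, t) = (1 - t) p(s) + t q(s), computed coordinatewise."""
    def F(s, t):
        (x0, y0), (x1, y1) = p(s), q(s)
        return ((1 - t) * x0 + t * x1, (1 - t) * y0 + t * y1)
    return F


F = straight_line_homotopy(alpha, beta)
grid = [k / 50 for k in range(51)]
stays_in_disk = all(math.hypot(*F(s, t)) <= 1.0 for s in grid for t in grid)
ends_fixed = all(math.dist(F(0.0, t), A_POINT) < 1e-12 and math.dist(F(1.0, t), B_POINT) < 1e-12
                 for t in grid)
restricts = all(F(s, 0.0) == alpha(s) and F(s, 1.0) == beta(s) for s in grid)
print(stays_in_disk, ends_fixed, restricts)
```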

Thus, for any convex subset, there is only one equivalence class of homotopic paths
from a to b.

Lemma 5.1 Let α be a path in X from a to b. Let ρ : I → I satisfy ρ(0) = 0, ρ(1) = 1. Then, α ◦ ρ is also a path in X from a to b which is homotopic to α.
Proof Define F(s, t) = α((1 − t)s + tρ(s)). Then, F(s, 0) = α(s) and F(s, 1) = α ◦ ρ(s), while F(0, t) = α(0) = a and F(1, t) = α(1) = b because ρ fixes 0 and 1. As F is the composition of continuous functions, F is a homotopy between α and α ◦ ρ. ∎

If α is a path from a to b and β is a path from b to c, then we can define the “product” of α and β as a path from a to c by setting
$$(\alpha\beta)(s) = \begin{cases}\alpha(2s) & 0\le s\le 1/2,\\ \beta(2s-1) & 1/2\le s\le 1.\end{cases}$$

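Lemma 5.2 Let a, b, c ∈ X, with $\alpha_0, \alpha_1$ paths from a to b, and $\beta_0, \beta_1$ paths from b to c. If $\alpha_0\sim\alpha_1$ and $\beta_0\sim\beta_1$, then $\alpha_0\beta_0\sim\alpha_1\beta_1$.
Proof If F is a homotopy of $\alpha_0$ into $\alpha_1$ and G is a homotopy of $\beta_0$ into $\beta_1$, then H(s, t) = F(2s, t) for 0 ≤ s ≤ 1/2 and H(s, t) = G(2s − 1, t) for 1/2 ≤ s ≤ 1 is well defined (since F(1, t) = b = G(0, t)) and is a homotopy of $\alpha_0\beta_0$ into $\alpha_1\beta_1$. ∎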

This lemma allows us to define products of homotopy classes. Let [α] be the
homotopy class of α. If α is a path from a to b, β a path from b to c, we define

[α][β] = [αβ],

which is well-defined by the previous lemma.


Now consider three paths α, β, γ from a to b, b to c, and c to d, respectively. Then (αβ)γ and α(βγ) are both defined. More precisely,
$$((\alpha\beta)\gamma)(s) = \begin{cases}\alpha(4s) & 0\le s\le 1/4,\\ \beta(4s-1) & 1/4\le s\le 1/2,\\ \gamma(2s-1) & 1/2\le s\le 1,\end{cases}$$
and


$$(\alpha(\beta\gamma))(s) = \begin{cases}\alpha(2s) & 0\le s\le 1/2,\\ \beta(4s-2) & 1/2\le s\le 3/4,\\ \gamma(4s-3) & 3/4\le s\le 1,\end{cases}$$

so we see that it is easy to give examples of α, β, γ such that (αβ)γ ≠ α(βγ) (see Exercise 6). Thus the product of paths (when defined) is not necessarily associative. However, we can prove that the two products are homotopic.
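Lemma 5.3
$$(\alpha\beta)\gamma \sim \alpha(\beta\gamma).$$
Proof We see from the above formulas that
$$(\alpha(\beta\gamma))(s) = ((\alpha\beta)\gamma)(\rho(s)),$$
where
$$\rho(s) = \begin{cases}s/2 & 0\le s\le 1/2,\\ s-1/4 & 1/2\le s\le 3/4,\\ 2s-1 & 3/4\le s\le 1.\end{cases}$$
By Lemma 5.1, we are done. ∎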
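The reparametrization identity in the proof can be spot-checked numerically. The sketch below uses three real-valued example paths of our own choosing (X = R, with a = 0, b = 1, c = 2, d = 3), implements the path product exactly as defined above, and verifies $(\alpha(\beta\gamma))(s) = ((\alpha\beta)\gamma)(\rho(s))$ on a grid. It also shows that the two products differ as maps: at s = 1/4 one equals α(1/2) and the other equals α(1).

```python
import math


def product(p, q):
    """Path product pq: p at double speed on [0, 1/2], then q on [1/2, 1]."""
    return lambda s: p(2 * s) if s <= 0.5 else q(2 * s - 1)


def rho(s):
    """The reparametrization from the proof of Lemma 5.3."""
    if s <= 0.5:
        return s / 2
    if s <= 0.75:
        return s - 0.25
    return 2 * s - 1


alpha = lambda s: s                               # path from 0 to 1
beta = lambda s: 1 + s * s                        # path from 1 to 2
gamma = lambda s: 2 + math.sin(math.pi * s / 2)   # path from 2 to 3

left = product(alpha, product(beta, gamma))       # alpha(beta gamma)
right = product(product(alpha, beta), gamma)      # (alpha beta) gamma

grid = [k / 256 for k in range(257)]
print(max(abs(left(s) - right(rho(s))) for s in grid))   # 0 up to rounding
print(left(0.25), right(0.25))                            # 0.5 versus 1.0
```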


The constant path at b will be simply denoted b.
Lemma 5.4 If α is a path from a to b,
$$a\alpha \sim \alpha \sim \alpha b.$$
That is, [aα] = [α] = [αb] and [a][α] = [α] = [α][b].
Proof We have
$$(a\alpha)(s) = \begin{cases}a & 0\le s\le 1/2,\\ \alpha(2s-1) & 1/2\le s\le 1.\end{cases}$$
If we set
$$\rho(s) = \begin{cases}0 & 0\le s\le 1/2,\\ 2s-1 & 1/2\le s\le 1,\end{cases}$$
then aα = α ◦ ρ, so that by Lemma 5.1, aα ∼ α, and similarly α ∼ αb. ∎


If α is a path from a to b, we define the inverse path $\alpha^{-1}$ from b to a by
$$\alpha^{-1}(s) = \alpha(1-s).$$

We insert a word of caution to the reader that α−1 is not to be confused with the
inverse function of α, which may not be defined.

Lemma 5.5 If $\alpha_0$ and $\alpha_1$ are paths from a to b with $\alpha_0\sim\alpha_1$, then $\alpha_0^{-1}\sim\alpha_1^{-1}$.
Proof If F is a homotopy between $\alpha_0$ and $\alpha_1$, define G(s, t) = F(1 − s, t), which is a homotopy between $\alpha_0^{-1}$ and $\alpha_1^{-1}$. ∎
Lemma 5.5 allows us to define the inverse homotopy class by setting
$$[\alpha]^{-1} := [\alpha^{-1}].$$

Lemma 5.6 If α is a path from a to b, then $\alpha\alpha^{-1}$ is homotopic to the constant path at a. That is, $[\alpha\alpha^{-1}] = [a]$.
Proof Define
$$F(s,t) = \begin{cases}\alpha(2s) & 0\le s\le t/2,\\ \alpha(t) & t/2\le s\le 1-t/2,\\ \alpha(2-2s) & 1-t/2\le s\le 1.\end{cases}$$
Then F(s, 0) = α(0) = a and $F(s,1) = (\alpha\alpha^{-1})(s)$, while F(0, t) = F(1, t) = a for all t, so that F is the required homotopy. ∎
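The homotopy in the proof of Lemma 5.6 can also be tested on an example path of our own choosing in R², α(s) = (s, 4s(1 − s)) from a = (0, 0) to b = (1, 0). The sketch checks the boundary conditions F(s, 0) = a, $F(s,1) = (\alpha\alpha^{-1})(s)$ and F(0, t) = F(1, t) = a on a grid and, as a crude proxy for continuity, prints the largest change of F between neighbouring grid points in the s-direction.

```python
import math


def alpha(s):
    """Example path from a = (0, 0) to b = (1, 0)."""
    return (s, 4 * s * (1 - s))


def product(p, q):
    """Path product pq: p on [0, 1/2], then q on [1/2, 1]."""
    return lambda s: p(2 * s) if s <= 0.5 else q(2 * s - 1)


def inverse(p):
    """The inverse path s -> p(1 - s)."""
    return lambda s: p(1 - s)


def F(s, t):
    """The homotopy from the proof of Lemma 5.6, for the path alpha above."""
    if s <= t / 2:
        return alpha(2 * s)
    if s <= 1 - t / 2:
        return alpha(t)
    return alpha(2 - 2 * s)


a = alpha(0.0)
loop = product(alpha, inverse(alpha))
grid = [k / 256 for k in range(257)]
h = grid[1]
print(all(F(s, 0.0) == a for s in grid))
print(all(F(s, 1.0) == loop(s) for s in grid))
print(all(F(0.0, t) == a and F(1.0, t) == a for t in grid))
print(max(math.dist(F(s, t), F(s + h, t)) for s in grid[:-1] for t in grid))
```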

A path α is called a loop based at a if α(0) = α(1) = a. The above discussion enables us to conclude that the set of all homotopy classes of loops based at a can be given a group structure. This is what we shall take up in the next section.

Exercises

1. Recall that a basis B for a topological space X consists of open sets such that
every open set of X is a union of elements of B. Prove that the collection of open
intervals in R is a basis for R with the usual Euclidean topology.
2. Let B be a non-empty collection of subsets of a set X . If B is closed under finite
intersections and every element of X belongs to some element of B, show that B
is a basis for some topology on X .
3. Consider Z with the arithmetic progression topology, where a basis of open sets is given by the sets {an + b : n ∈ Z} with a ≥ 1 and b ranging over all integers satisfying (a, b) = 1. Prove that for each prime p, the set pZ is closed. Deduce that there are infinitely many primes. (This topology was introduced by H. Furstenberg.) [Hint: if there are only finitely many primes p, the union $\cup_p(p\mathbb Z)$ is closed. The complement consists of {−1, 1} and is not open.]
4. Prove that the continuous image of a connected space is connected.
5. If X is path-connected, show that it is connected.
6. Give an example to show that the product of paths, when defined, is not necessarily associative.
7. Show that R and Rn for n ≥ 2 are not homeomorphic.

8. Let X be a topological space and U a connected subset. If U is dense in X, show that X is connected.
9. Consider the topologist’s sine curve X given by the following set of points in $\mathbb R^2$: {(x, sin 1/x) : x > 0} ∪ {(0, y) : −1 ≤ y ≤ 1}. Show that X is connected but not path-connected.
10. Let $f : S^1\to\mathbb R$ be a continuous map. Show that there is a point $x_0$ such that $f(x_0) = f(-x_0)$. [Hint: consider h(x) = f(x) − f(−x) and the map $H : B^1\to\mathbb R$ given by $H(x) = h(e^{\pi ix})$.]

5.3 The Fundamental Group

Let X be a topological space, b ∈ X a base point. A path α : I → X is called a loop at b if α(0) = α(1) = b. The set of all loops based at b (modulo homotopy) forms a group, by the results of the previous section. We call this group the fundamental group based at b and denote it by π(X, b). By Theorem 5.2, we deduce:
Theorem 5.3 If X is a convex subset of Rn , b ∈ X , then π(X, b) is trivial.
How does the fundamental group depend upon the base point? The next result
answers this question.
Theorem 5.4 Let b, c ∈ X and let α be a path from b to c. Given a homotopy class
[γ] of loops based at c, the function
$$\alpha_*([\gamma]) := [\alpha\gamma\alpha^{-1}]$$
is well-defined and gives an isomorphism from π(X, c) to π(X, b).


Proof Since the initial and terminal points match, α_* is well-defined. (Recall that
α^{-1} is a path from c to b.) Suppose β, γ are loops at c. Then
$$[\beta][\gamma] = [\beta][c][\gamma] = [\beta][\alpha]^{-1}[\alpha][\gamma].$$
Consequently,
$$\alpha_*([\beta][\gamma]) = [\alpha][\beta][\gamma][\alpha]^{-1} = [\alpha][\beta][\alpha]^{-1}[\alpha][\gamma][\alpha]^{-1} = \alpha_*([\beta])\,\alpha_*([\gamma]).$$
Thus, α_* is a homomorphism. If α_*([β]) = α_*([γ]), then
$$\beta \sim \alpha^{-1}\alpha\beta\alpha^{-1}\alpha \sim \alpha^{-1}\alpha\gamma\alpha^{-1}\alpha \sim \gamma,$$
so that α_* is one to one. α_* is also onto because if λ is a loop based at b, then
β = (α^{-1}λ)α is a loop based at c that satisfies α_*([β]) = [αα^{-1}λαα^{-1}] = [λ]. □
Therefore, if X is path-connected, the various groups π(X, b) are all isomorphic.
Thus, we can speak of the fundamental group of X for path-connected spaces, without
specifying the base point. For such spaces, we denote by π(X) this fundamental
group, determined up to isomorphism. The map X → π(X) is an example of a functor
from the category of (pointed) path-connected topological spaces to the category of
groups. We do not go into the detailed definition here but refer the reader to the book
by Fulton given in the section on suggestions for further reading. Suffice it to say that
this functor translates topological questions about X into group-theoretic questions
about π(X); this is the role that functors play in general.
A path-connected space X is called simply connected if π(X ) is trivial. Thus,
any convex subset of Rn is simply connected.

Exercises

1. Show that the punctured complex plane C \ {0} is path-connected but not simply
connected.

5.4 Examples of Some Fundamental Groups

1. A subset W of R^n is called star-shaped with respect to a point w ∈ W if, whenever
y ∈ W, the straight line segment from w to y is contained in W. Such a W is simply
connected: if γ is a loop based at w, then F(s, t) = (1 − t)γ(s) + tw stays in W and
deforms γ, with endpoints fixed, into the constant loop at w.
2. By stereographic projection from the north pole N = (0, 0, 1), we can show that
S^2 \ {N} is homeomorphic to R^2. Thus, π(S^2 \ {N}) is trivial. The same logic shows
that π(S^n \ {N}) is trivial for n ≥ 2. (See Exercise 2.)
3. It is not hard to see that π(X × Y) ≅ π(X) × π(Y). Thus, X × Y is simply
connected whenever X and Y are. (See Exercise 3.)
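Example 2 relies on stereographic projection. The Python sketch below uses the standard formulas for projecting from the north pole N = (0, 0, 1) onto the plane x_3 = 0 and back; these formulas are supplied here as an assumption (they are not derived in the text), and the function names are hypothetical. The round-trip checks illustrate that the two maps are mutually inverse.

```python
def stereo(p):
    """Project p = (x, y, z) on S^2, p != (0, 0, 1), from the north pole
    onto the plane z = 0."""
    x, y, z = p
    return (x / (1 - z), y / (1 - z))


def stereo_inv(q):
    """Inverse map from R^2 onto S^2 minus the north pole."""
    u, v = q
    r2 = u * u + v * v
    return (2 * u / (r2 + 1), 2 * v / (r2 + 1), (r2 - 1) / (r2 + 1))


def close(a, b, tol=1e-12):
    """Componentwise comparison of two tuples of floats."""
    return all(abs(s - t) < tol for s, t in zip(a, b))


# Round trips R^2 -> S^2 -> R^2 and S^2 -> R^2 -> S^2.
for q in [(0.0, 0.0), (1.5, -2.0), (-0.3, 0.7)]:
    p = stereo_inv(q)
    assert close([sum(c * c for c in p)], [1.0])   # p lies on S^2
    assert close(stereo(p), q)
for p in [(0.0, 0.0, -1.0), (1.0, 0.0, 0.0), (0.0, 0.6, 0.8)]:
    assert close(stereo_inv(stereo(p)), p)
```

Both maps are built from continuous operations with non-vanishing denominators away from N, which is why they give a homeomorphism.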
If f is a map between two pointed topological spaces, then f induces a map
between their fundamental groups. More precisely, let f : (X, b) → (Y, c) be a
map of topological spaces with f (b) = c. We will define a homomorphism between
π(X, b) and π(Y, c).

Lemma 5.7 Let X and Y be topological spaces and let f : X → Y be a continuous
map. Let f_0 and f_1 be paths in X which are homotopic with endpoints fixed. Then
f ◦ f_0 and f ◦ f_1 are paths in Y that are homotopic with endpoints fixed.
Proof If {f_t}_{0≤t≤1} is a homotopy of f_0 and f_1 with endpoints fixed, then the paths
{f ◦ f_t}_{0≤t≤1} form a homotopy between f ◦ f_0 and f ◦ f_1, again with endpoints fixed. □
Suppose now that b_0, b_1 ∈ X and that f is a map from X to Y. If α is a path in X
from b_0 to b_1, then f_*([α]) is defined to be the homotopy class of paths in Y from
f(b_0) to f(b_1) that includes f ◦ α:
$$f_*([\alpha]) = [f \circ \alpha].$$

By Lemma 5.7, this definition does not depend on the choice of the representative
in the homotopy class [α]. If the product path αβ is defined, then f ◦ (αβ) = (f ◦ α)(f ◦ β), so that
$$f_*([\alpha][\beta]) = f_*([\alpha])\, f_*([\beta]).$$
Since the inverse path of f ◦ α is f ◦ α^{-1}, we have
$$f_*([\alpha]^{-1}) = f_*([\alpha])^{-1}.$$
Finally, if b is a constant path in X, then
$$f_*([b]) = [f(b)].$$
We can now prove:


Theorem 5.5 Let (X, b), (Y, c) be pointed topological spaces and let f : (X, b) →
(Y, c) be a map. Then f_* is a homomorphism from π(X, b) to π(Y, c). If furthermore
g : (W, d) → (X, b) is a map, then (f ◦ g)_* = f_* ◦ g_*. If X = Y, b = c and f is the
identity map, then f_* is the identity homomorphism.
Proof That f_* is a well-defined homomorphism follows from Lemma 5.7 and the identities
above. The other assertions follow directly from the definitions, and we leave them to the reader as an exercise.
Corollary 5.1 If f : X → Y is a homeomorphism, b ∈ X and c = f(b), then
π(X, b) is isomorphic to π(Y, c).
Proof By Theorem 5.5, f_* ◦ (f^{-1})_* = (f ◦ f^{-1})_* and (f^{-1})_* ◦ f_* = (f^{-1} ◦ f)_*
are identity maps, so f_* is an isomorphism with inverse (f^{-1})_*. □
There is one important application of the notion of induced homomorphisms
to the computation of the fundamental groups of topological groups. Recall that a
topological group G is a group G endowed with a topology such that the map
G × G → G, (g, h) ↦ gh^{-1}, is continuous; equivalently, the multiplication map
$$m : G \times G \to G, \qquad m(g, h) = gh,$$
and the inversion map g ↦ g^{-1} are both continuous. For instance, R with the usual
topology (or more generally, R^n) is a topological group under addition. An important
example is given by S^1, where we identify it with the group of complex numbers of absolute value 1.
As indicated in Example 3 above, the fundamental group of the product space
X × Y consists of the classes of pairs of loops (α, β) with [α] ∈ π(X) and [β] ∈ π(Y).
In the case of a topological group, we will consider loops based at the identity 1, and
we also write 1 for the constant loop at 1. Since m : G × G → G is continuous with
m(1, 1) = 1, it induces a homomorphism
$$m_* : \pi(G) \times \pi(G) \to \pi(G)$$
given by
$$[(\alpha, \beta)] \mapsto [m(\alpha, \beta)] = [\alpha \cdot \beta],$$
where the product on the right is multiplication in the group. That is, α · β denotes
the path s ↦ α(s)β(s) for 0 ≤ s ≤ 1. On the one hand, since m_* is a homomorphism and
the path product (α, 1)(1, β) is homotopic to (α, β) in G × G,
$$[(\alpha, \beta)] = [(\alpha, 1)(1, \beta)] \mapsto [\alpha \cdot 1][1 \cdot \beta] = [\alpha][\beta].$$
On the other hand,
$$[(\alpha, \beta)] = [(1, \beta)(\alpha, 1)] \mapsto [\beta][\alpha].$$
We deduce:

Theorem 5.6 If G is a topological group, then π(G), taken with base point the
identity, is abelian.

The computation of fundamental groups is often not easy. The usual technique is
to look for subspaces whose fundamental groups we already know and then try “to
paste” this information together. Here is the simplest instance of this technique.

Theorem 5.7 Suppose X is the union of two open sets U and V which are simply
connected. Suppose further that U ∩ V is non-empty and path-connected. Then X
is simply connected.

The idea of the proof is to write any loop of X as a product of loops contained in U
or V . The reader will be easily convinced by drawing a picture to see what is going
on. The formal proof will involve the following lemma.

Lemma 5.8 (Lebesgue’s lemma) Let X be a compact metric space and F an open
cover. Then, there is a δ > 0 such that any subset of X of diameter < δ is contained
in some member of F.

Proof If the lemma is false, we can find a sequence of subsets A_1, A_2, . . . none of
which is contained in a member of F and whose diameters tend to zero. For each
n, choose x_n ∈ A_n. As X is compact, the sequence of x_n's must have a limit point p
(say). Let U be an element of F containing p. Choose ε > 0 such that B(p, ε) ⊆ U.
Since diam(A_n) → 0 and infinitely many x_n lie in B(p, ε/2), we can choose N so that
diam(A_N) < ε/2 and x_N ∈ B(p, ε/2). Then d(x_N, p) < ε/2 and d(x, x_N) < ε/2 for
all x ∈ A_N. Hence, d(x, p) < ε for all x ∈ A_N. Thus, A_N ⊆ U, a contradiction. □

Proof of Theorem 5.7 We show that any loop in X is homotopic to a product of
loops each of which is contained in U or V. This is enough to prove the theorem
since U and V are simply connected. (Note also that X is path-connected, since U and
V are path-connected and U ∩ V is non-empty.) Choose p ∈ U ∩ V. Let f : I → X be a
loop based at p. Applying Lebesgue's lemma to the open cover {f^{-1}(U), f^{-1}(V)} of
the compact metric space I, we can find N such that f([(j − 1)/N, j/N]) is contained
in U or V for each 1 ≤ j ≤ N. If we write f_j for the path
$$s \mapsto f\big((s + j - 1)/N\big), \qquad 0 \le s \le 1,$$
we can join p to f(j/N), 1 ≤ j ≤ N − 1, by a path γ_j which lies in U if f(j/N)
is in U, or which lies in V if f(j/N) lies in V. If f(j/N) lies in U ∩ V, we can
ensure that γ_j lies in U ∩ V, since U ∩ V is path-connected. The loop f is then
homotopic to the product
$$(f_1\gamma_1^{-1})(\gamma_1 f_2 \gamma_2^{-1})(\gamma_2 f_3 \gamma_3^{-1}) \cdots (\gamma_{N-1} f_N),$$
each member of which is a loop contained in U or V. This completes the proof. □

We can apply this theorem to compute the fundamental groups of S^n for n ≥ 2.
Write S^n as the union of the two open sets U = {x ∈ S^n : x_{n+1} > −1/2} and
V = {x ∈ S^n : x_{n+1} < 1/2}, which are slightly enlarged open hemispheres. Each of
them is homeomorphic to an open ball and is therefore simply connected, and their
intersection is a band around the equator S^{n−1}, homeomorphic to S^{n−1} × (−1/2, 1/2),
which is path-connected for n ≥ 2. We deduce immediately that π(S^n) = 1 for n ≥ 2.
Notice that this proof does not work for S^1, for in that case, when we write S^1 as the
union of two such enlarged semicircles, their intersection consists of two disjoint arcs
around −1 and 1, which is not path-connected. By Theorem 5.6, we know that π(S^1) is
abelian. We will see in the next two sections that π(S^1) ≅ Z.
A subset A of X is called a retract of X if there exists a continuous map r : X → A
(called a retraction) such that r(a) = a for all a ∈ A. For example, S^1 is a retract
of R^2 \ {0}. The retraction sends x ∈ R^2 \ {0} to the point where the ray from the origin
through x meets S^1, that is, r(x) = x/|x|. Clearly r is continuous and r(x) = x for all x ∈ S^1.
If r : X → A is a retraction and i : A → X is the inclusion map, we get the
induced maps
$$i_* : \pi(A, a) \to \pi(X, a), \qquad r_* : \pi(X, a) \to \pi(A, a)$$
for any a ∈ A. Because r ◦ i = 1, we have r_* ◦ i_* = 1. We conclude that i_* is a
monomorphism and r_* is an epimorphism. One can show that if i_*π(A) is a normal
subgroup of π(X), then π(X) is isomorphic to the direct product of i_*π(A) and the kernel of r_*.
If we impose a further restriction on r, we can show that π(A, a) and
π(X, a) are isomorphic for a ∈ A. A subset A of X is a deformation retract of X
if there is a retraction r : X → A and a homotopy F : X × I → X such that
$$F(x, 0) = x, \quad F(x, 1) = r(x) \quad \text{for all } x \in X, \qquad F(a, t) = a \quad \text{for all } a \in A,\ t \in I.$$
Thus, the identity map of X can be deformed into r (regarded as a map into X),
keeping A fixed throughout.

Theorem 5.8 If A is a deformation retract of X, the inclusion map i : A → X
induces an isomorphism of π(A, a) onto π(X, a) for all a ∈ A.

Proof r_* ◦ i_* is the identity map of π(A, a). Since i ◦ r is homotopic to the identity
map of X through maps that fix a (as F(a, t) = a), i_* ◦ r_* is the identity map of π(X, a). □

We shall use this theorem in two ways. First, we use it to determine when two spaces
have isomorphic fundamental groups. Second, we use it to show that certain retracts
are not deformation retracts, by showing that the fundamental groups involved are not isomorphic.
A space X is called contractible to a point if there is an x_0 ∈ X such that {x_0} is a
deformation retract of X.
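Any convex subset X of R^n is contractible to a point. To see this, let x_0 ∈ X and
define f : X × I → X by
$$f(x, t) = (1 - t)x + t x_0.$$
Clearly, f is continuous (it takes values in X by convexity) and f(x, 0) = x. As
f(x, 1) = x_0 and f(x_0, t) = x_0 for all t, {x_0} is a deformation retract of X.

Theorem 5.9 If X is contractible to a point, then X is simply connected.
Proof By Theorem 5.8, π(X, x_0) ≅ π({x_0}, x_0), which is trivial. Moreover, X is
path-connected, since the homotopy joins each x ∈ X to x_0. □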

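Example. S^{n−1} is a deformation retract of B^n \ {0}. Indeed, let X = B^n \ {0} and
consider the map f : X × I → X given by
$$f(x, t) = (1 - t)x + t\,\frac{x}{|x|}.$$
Since |f(x, t)| = (1 − t)|x| + t lies in (0, 1], f does take values in X; moreover
f(x, 0) = x, f(x, 1) = x/|x| ∈ S^{n−1}, and f(a, t) = a for a ∈ S^{n−1}. Thus, by
Theorem 5.8, B^n \ {0} and S^{n−1} have the same fundamental group.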

Exercises

1. Show that a star-shaped subset of R^n is contractible to a point.
2. Find an explicit homeomorphism between S^2 \ {(0, 0, 1)} and R^2 given by stereographic
projection, namely by sending x ∈ S^2 to the intersection of the plane x_3 = 0 with the
line joining x to (0, 0, 1).
3. Let (X, x_0) and (Y, y_0) be pointed topological spaces. Show that
π(X × Y, (x_0, y_0)) is isomorphic to the direct product π(X, x_0) × π(Y, y_0).
4. Show that the fundamental group of R^n \ {0} is trivial for n ≥ 3.
5. A pointed space (X, x_0) is called an H-space (after H. Hopf) if there exists a
pointed map
$$m : (X \times X, (x_0, x_0)) \to (X, x_0)$$
such that each of the (necessarily pointed) maps m(x_0, ·) and m(·, x_0) of (X, x_0)
to itself is homotopic, through pointed maps, to the identity map of X. Show that
π(X, x_0) is abelian.
6. Let Δ^n denote the n-simplex defined by
$$\Delta^n = \{(x_0, x_1, \ldots, x_n) : x_0 + x_1 + \cdots + x_n = 1,\ x_i \ge 0\}.$$
This is a subset of R^{n+1} with the induced topology. For instance, a 0-simplex is
a point, a 1-simplex is homeomorphic to the interval [0, 1] and so on. Show that
π(Δ^n) is trivial.

5.5 Covering Spaces

Let E and X be topological spaces and p : E → X a map. An open subset U of X
is said to be evenly covered by p if the inverse image p^{-1}(U) is a union of disjoint
open sets of E, each of which is mapped homeomorphically onto U by p. We call p
a covering map if p : E → X is continuous, surjective and every point of X has an
open neighborhood which is evenly covered by p. In such a case, we say that E is a
covering space of X.
A prototypical example to keep in mind is the map p : R → S^1 given by the
exponential function x ↦ e^{2πix}, where we view S^1 as the subset of C consisting
of the numbers of absolute value 1. If e^{2πix_0} ∈ S^1 and 0 < ε < 1/2, let
U = {e^{2πix} : |x − x_0| < ε}. Then p^{-1}(U) is the disjoint union of the intervals
(m + x_0 − ε, m + x_0 + ε), m ∈ Z, and each of these intervals is mapped homeomorphically
onto U by p.
If p : E → X is a covering map of topological spaces, we call the set p^{-1}(x) the
fiber over x. Thus, each x ∈ X has an open neighborhood U such that p^{-1}(U) is
homeomorphic to p^{-1}(x) × U, where the fiber p^{-1}(x) carries the discrete topology.
The subsets of p^{-1}(U) mapped homeomorphically onto U are called sheets of p^{-1}(U).
Now let p : E → X be a covering map. Suppose f : Y → X is a map of topological
spaces. We want to consider the lifting problem: when is there a map g : Y → E
such that f = p ◦ g? In other words, when is there a map g making the following
diagram commute?
$$\begin{array}{ccc} & & E \\ & {\scriptstyle g}\nearrow & \downarrow{\scriptstyle p} \\ Y & \xrightarrow{\ f\ } & X \end{array}$$
Such a map g is called a lift of f. We begin by showing that if Y is a connected
space, such a lift, if it exists, is determined by its value at a single point.

Lemma 5.9 Let p : E → X be a covering map, and let Y be a connected Hausdorff
topological space. Let f : Y → X be a map, and let g, h : Y → E be two lifts of f.
If g(y) = h(y) for some y ∈ Y, then g = h.

Proof Let
$$S = \{y \in Y : g(y) = h(y)\}, \qquad T = \{y \in Y : g(y) \neq h(y)\}.$$
Clearly Y is the disjoint union of S and T. If we can show that S and T are both open,
then S is both open and closed, and by the connectedness of Y we get S = Y because,
by hypothesis, S is non-empty.
Let y ∈ Y and U an open neighborhood of f(y) that is evenly covered by p. Let
V and W be sheets of p^{-1}(U) such that g(y) ∈ V, h(y) ∈ W. If g(y) = h(y), then
V = W, since distinct sheets are disjoint. If g(y) ≠ h(y), then V ≠ W, since g(y) and
h(y) both lie over f(y) and p is one-to-one on each sheet; hence V and W are disjoint.
Since g and h are continuous at y, there is an open neighborhood N of y such that
g(N) ⊂ V, h(N) ⊂ W. For y ∈ T, V ∩ W = ∅, so that g(z) ≠ h(z) for z ∈ N and
hence N ⊂ T. Thus, T is open. On the other hand, if y ∈ S, then V = W. Moreover,
for each z ∈ N, g(z) must be the unique point v ∈ V such that p(v) = f(z) and
h(z) must be the unique point w ∈ W such that p(w) = f(z). Thus, g = h on N and
N ⊂ S. Hence S is open. This completes the proof. □

We now address the problem of lifting paths. For this purpose, we will use
Lebesgue's lemma (Lemma 5.8).

Theorem 5.10 (Path lifting theorem) Let p : E → X be a covering map. Let γ :
I → X be a path and let e_0 ∈ E satisfy p(e_0) = γ(0). Then, there exists a unique
path α : I → E such that α(0) = e_0 and p ◦ α = γ.

Proof For each x ∈ X, choose an open neighborhood U_x of x that is evenly covered
by p. The sets γ^{-1}(U_x), x ∈ X, form an open cover of the compact metric space [0, 1].
By Lebesgue's lemma, we can find 0 = s_0 < s_1 < · · · < s_m = 1 and evenly covered
open sets U_1, . . . , U_m satisfying
$$[s_{j-1}, s_j] \subset \gamma^{-1}(U_j), \qquad 1 \le j \le m.$$
Thus, γ([s_{j−1}, s_j]) ⊂ U_j for 1 ≤ j ≤ m. The lift α is constructed in m steps.
Since p(e_0) = γ(0) ∈ U_1, there is a sheet V_1 of p^{-1}(U_1) containing e_0, and p maps
V_1 homeomorphically onto U_1. Define α on [0, s_1] such that α(s) is the unique point of
V_1 covering γ(s). That is,
$$\alpha = (p|_{V_1})^{-1} \circ \gamma \quad \text{on } [0, s_1].$$
Then, α(0) = e_0 and p ◦ α = γ on [0, s_1]. Now perform the same procedure with
e_0 = α(0) replaced by α(s_1) and U_1 replaced by U_2, to extend α to the interval
[s_1, s_2]. After m steps, we have lifted the entire path γ. The uniqueness follows from
Lemma 5.9, since I is connected. □
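For the covering p : R → S^1, p(x) = e^{2πix}, the step-by-step construction in this proof can be run directly. In the Python sketch below (the name lift_path is hypothetical), the path γ is sampled at s_j = j/steps; we assume steps is large enough that consecutive samples are less than half a turn apart, which plays the role of Lebesgue's lemma. On each step the local inverse (p restricted to a sheet)^{-1} is computed with cmath.phase.

```python
import cmath
import math


def lift_path(gamma, e0, steps=1000):
    """Lift a path gamma : [0, 1] -> S^1 through p(x) = exp(2 pi i x).

    Returns the values alpha(j / steps), j = 0, ..., steps, of the lift alpha
    with alpha(0) = e0 and p(alpha(s)) = gamma(s). Assumes steps is large
    enough that consecutive samples of gamma are less than half a turn apart.
    """
    if abs(cmath.exp(2j * math.pi * e0) - gamma(0.0)) > 1e-9:
        raise ValueError("e0 must lie over gamma(0)")
    alpha = [e0]
    prev = gamma(0.0)
    for j in range(1, steps + 1):
        cur = gamma(j / steps)
        # The open interval of length 1 centered at alpha[-1] is a sheet over
        # S^1 minus the antipode of prev; on it, the inverse of p sends cur to
        # alpha[-1] + arg(cur / prev) / (2 pi), with cmath.phase in [-pi, pi].
        alpha.append(alpha[-1] + cmath.phase(cur / prev) / (2 * math.pi))
        prev = cur
    return alpha
```

A convenient check: if γ(s) = e^{2πi g(s)} for a continuous real function g, the lift starting at g(0) is g itself. Lifting the loop γ_3(s) = e^{6πis} from e_0 = 5 ends at 8, a point of the fiber p^{-1}(1) = Z.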

We can actually lift homotopies. That is, if we have a family of paths depending
continuously on a parameter, the lifted paths also depend continuously on the parameter.

Theorem 5.11 (Homotopy lifting theorem) Let p : E → X be a covering map, and
let F : I × I → X be a map. Let e_0 ∈ E satisfy p(e_0) = F(0, 0). Then, there exists
a unique map G : I × I → E such that G(0, 0) = e_0 and p ◦ G = F.

Proof The uniqueness follows from Lemma 5.9 since I × I is connected. According
to the path lifting theorem, there is a unique path t ↦ e_t, 0 ≤ t ≤ 1, in E starting at
the given point e_0 such that p(e_t) = F(0, t), 0 ≤ t ≤ 1. By the same theorem, there
exists for each t a unique path s ↦ G(s, t), 0 ≤ s ≤ 1, such that G(0, t) = e_t and
(p ◦ G)(s, t) = F(s, t), 0 ≤ s ≤ 1. This defines the lift G of F. We must show it is continuous.
Let γ : [0, 1] → X be the path defined by
$$\gamma(s) = F(s, 0), \qquad 0 \le s \le 1.$$
As in the proof of Theorem 5.10 (now applying Lebesgue's lemma to the open cover of
I × I by the sets F^{-1}(U_x)), we can find evenly covered open sets U_1, . . . , U_m, points
0 = s_0 < s_1 < · · · < s_m = 1 and ε > 0 such that
$$F([s_{j-1}, s_j] \times [0, \varepsilon]) \subset U_j, \qquad 1 \le j \le m.$$
Let V_1 be the sheet of p^{-1}(U_1) containing e_0. Since the initial points e_t depend
continuously on t, after shrinking ε if necessary we can assume that they belong to V_1
for 0 ≤ t ≤ ε. As before, we can define G in m steps. The first step is to define
$$G = (p|_{V_1})^{-1} \circ F \quad \text{on } [0, s_1] \times [0, \varepsilon].$$
In particular, G is continuous on [0, s_1] × [0, ε]. Proceeding inductively, we find that
G is continuous on [s_{j−1}, s_j] × [0, ε] for 1 ≤ j ≤ m, so that G is continuous on
[0, 1] × [0, ε]. The same proof shows that for each t_0 satisfying 0 < t_0 ≤ 1, there
exists an ε > 0 such that G is continuous on [0, 1] × [t_0 − ε, t_0 + ε] (replacing t_0 + ε
by t_0 if t_0 = 1). Consequently, G is continuous on [0, 1] × [0, 1]. □

Now let (E, e) and (X, b) be pointed spaces and let p : (E, e) → (X, b) be a covering
map. That is, p : E → X is a covering map with p(e) = b. Let γ : I → X be a loop
based at b. By the path lifting theorem, there is a unique lift α : I → E such that
α(0) = e. The lift need not be a loop. However, the terminal point α(1) satisfies
p(α(1)) = γ(1) = b, so that α(1) lies in the fiber p^{-1}(b) over b.
Suppose now that γ_1 is another loop in X based at b such that γ_1 ∼ γ. Let
{γ_t : 0 ≤ t ≤ 1} be the homotopy, with endpoints fixed, so that γ_0 = γ. By the homotopy
lifting theorem applied to F(s, t) = γ_t(s), we obtain a map G : I × I → E such that
G(0, 0) = e and
$$p(G(s, t)) = \gamma_t(s), \qquad 0 \le s, t \le 1.$$
Then the path α_t in E defined by α_t(s) = G(s, t), 0 ≤ s ≤ 1, is a lift of γ_t and α_0 = α.
We claim that the α_t's all start at e. To see this, observe that the map t ↦ G(0, t)
is the unique lift to E starting at e of the constant path at b. Hence the lift coincides
with the constant path at e and e = G(0, t) = α_t(0), 0 ≤ t ≤ 1. Similarly, the map
t ↦ G(1, t) is the unique lift to E starting at G(1, 0) = α(1) of the constant path at
b. Hence α(1) = G(1, t) = α_t(1) for 0 ≤ t ≤ 1, so that all the paths α_t terminate at
α(1). In particular, the lift α_1 of γ_1 to E starts at e and terminates at α(1). Therefore,
the terminal point α(1) is the same for all loops in the homotopy class of γ.
Corollary 5.2 (Monodromy theorem) Let p : (E, e) → (X, b) be a covering map
of pointed spaces. If γ_0 and γ_1 are two homotopic loops based at b and α_0, α_1 are
the lifts of γ_0 and γ_1, respectively, with α_0(0) = α_1(0) = e, then α_0(1) = α_1(1).
Moreover, α_0 and α_1 are homotopic paths in E (with endpoints fixed).
This allows us to define a function
$$\Phi : \pi(X, b) \to p^{-1}(b)$$
so that Φ([γ]) is the terminal point of the lift of γ that starts at e.


Theorem 5.12 Let p : (E, e) → (X, b) be a covering map and suppose that E is
simply connected. Then Φ is a one-to-one correspondence between π(X, b) and the
fiber p^{-1}(b).
Proof Suppose that y ∈ p^{-1}(b). Since E is path-connected, there is a path α in E from
e to y; set γ = p ◦ α, a loop based at b. Then, α is the lift of γ to E that starts at e, so
that Φ([γ]) = α(1) = y. Hence the function is onto. Suppose that γ_0 and γ_1 are loops
in X based at b such that Φ([γ_0]) = Φ([γ_1]). Let α_0, α_1 be the lifts of γ_0 and γ_1
respectively that start at e. Then α_0, α_1 have the same terminal point, so that α_0α_1^{-1}
is a loop in E based at e. Since E is simply connected, there is a homotopy
$$F : I \times I \to E$$
of α_0α_1^{-1} to the point e. Then, p ◦ F : I × I → X is a homotopy of the loop γ_0γ_1^{-1}
to the point b. Hence [γ_0][γ_1]^{-1} = [b], so that [γ_0] = [γ_1] and Φ is one to one. □
Let us now consider the covering map p : (R, 0) → (S^1, 1) given by
$$p(t) = e^{2\pi i t}, \qquad t \in \mathbb{R}.$$
Since p^{-1}(1) coincides with Z, we deduce from Theorem 5.12 that the elements of
π(S^1, 1) are in one-to-one correspondence with the integers. (In the next section, we
shall see that π(S^1, 1) ≅ Z.)

Exercises

1. Show that p : C → C^* given by p(z) = e^z, z ∈ C, is a covering map.
2. Prove that the elements of π(C^*) are in one-to-one correspondence with the integers.
3. Show that the map p : R → S^1 given by
$$p(t) = (\cos t, \sin t), \qquad t \in \mathbb{R},$$
is a covering map.
4. Is the map f : (0, 3) → S^1 given by f(x) = e^{2πix} a covering map?
5. If X̃ is a covering space of X and Ỹ is a covering space of Y, show that X̃ × Ỹ is
a covering space of X × Y.
6. Show that the map from C^* to S^1 × R given by z ↦ (z/|z|, |z| − 1) if |z| ≥ 1
and z ↦ (z/|z|, (|z| − 1)/|z|) if 0 < |z| ≤ 1 is a homeomorphism. Deduce that C^* and S^1
have the same fundamental group.

5.6 Applications

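We will apply the theory of covering spaces to compute π(S^1, 1). We view S^1 as
the set of complex numbers of absolute value 1. Recall that the map p : R → S^1 given
by p(t) = e^{2πit} is a covering map. Given any loop γ of S^1 based at 1, we can lift it
uniquely to a path α : I → R with α(0) = 0, by the path lifting theorem. This allows
us to define the degree of the loop γ based at 1 by setting
$$\deg \gamma = \alpha(1),$$
which is necessarily an integer, since p(α(1)) = γ(1) = 1. In particular, each of the loops
$$\gamma_n(x) = e^{2\pi i n x}$$
has degree n, its lift being x ↦ nx. Moreover, no two of these loops are homotopic, for
otherwise (by the monodromy theorem) their lifts would have the same endpoint, which
is not the case. In addition, if γ is any loop of S^1, let n = deg γ and consider γγ_n^{-1}.
Its lift starting at 0 runs from 0 to n and then back to 0, so it is a loop in R; since R is
simply connected, this lift is homotopic to the constant loop at 0, and composing the
homotopy with p shows that γγ_n^{-1} is homotopic to the constant loop at 1. Thus,
γ ∼ γ_n. Finally, the lift of γ_mγ_n starting at 0 is the lift of γ_m (ending at m) followed
by the path s ↦ m + ns, so that [γ_m][γ_n] = [γ_{m+n}]. Hence [γ] ↦ deg γ is an
isomorphism. This proves:

Theorem 5.13
$$\pi(S^1, 1) \cong \mathbb{Z}.$$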
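The homomorphism step of this proof can be checked numerically. The Python sketch below computes deg γ by the lifting procedure (summing small phase increments, under the assumption that the loop is sampled finely enough) and confirms that the path product γ_mγ_n has degree m + n. It also checks that the pointwise product s ↦ γ_m(s)γ_n(s), which uses the group structure of S^1 as in Theorem 5.6, has the same degree. All function names are hypothetical.

```python
import cmath
import math


def degree(loop, steps=2000):
    """Degree of a loop in S^1 based at 1: the endpoint of its lift through
    p(x) = exp(2 pi i x) starting at 0, built from small phase increments.
    Assumes consecutive samples are less than half a turn apart."""
    lift = 0.0
    prev = loop(0.0)
    for j in range(1, steps + 1):
        cur = loop(j / steps)
        lift += cmath.phase(cur / prev) / (2 * math.pi)
        prev = cur
    return round(lift)


def gamma(n):
    """The loop gamma_n(x) = exp(2 pi i n x)."""
    return lambda x: cmath.exp(2j * math.pi * n * x)


def path_product(a, b):
    """The product path ab: traverse a on [0, 1/2], then b on [1/2, 1]."""
    return lambda s: a(2 * s) if s <= 0.5 else b(2 * s - 1)


def pointwise_product(a, b):
    """The loop s -> a(s) b(s), using multiplication in S^1."""
    return lambda s: a(s) * b(s)
```

For example, degree(path_product(gamma(-2), gamma(4))) and degree(pointwise_product(gamma(-2), gamma(4))) both equal 2.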

This gives us the first example of a topological space with a non-trivial fundamental
group. For any map f : S^1 → S^1, we have the induced map
$$f_* : \pi(S^1, 1) \to \pi(S^1, f(1)).$$
If α is a path in S^1 from 1 to f(1), we have, by Theorem 5.4, an isomorphism
$$\alpha_* : \pi(S^1, f(1)) \to \pi(S^1, 1).$$
Thus, under the composition α_* ◦ f_*, the class [γ_1] is sent to [γ_1]^d = [γ_d] for some
integer d; since π(S^1, 1) is abelian, d does not depend on the choice of α. We designate
this integer by deg f. This observation allows us to give a topological
proof of the fundamental theorem of algebra:

Theorem 5.14 Any non-constant polynomial f(z) ∈ C[z] has a root in C.

Proof Without loss of generality, we can suppose f is monic of degree n ≥ 1:
$$f(z) = z^n + c_{n-1}z^{n-1} + \cdots + c_1 z + c_0.$$
Assume f(z) ≠ 0 for all z ∈ C. Define a map f̃ : S^1 → S^1 by
$$\tilde f(z) = \frac{f(z)}{|f(z)|},$$
which is well-defined as the denominator does not vanish. We proceed to calculate
deg f̃. Consider the map F : S^1 × I → S^1 given by
$$F(x, t) = \frac{f(tx)}{|f(tx)|}.$$
Since F(x, 0) = f(0)/|f(0)| is constant and F(x, 1) = f̃(x), f̃ is homotopic to a
constant map. Since degree is preserved under homotopy, f̃ has degree zero. On the
other hand, consider the homogenized polynomial
$$G(x, t) = t^n f(x/t) = x^n + c_{n-1} t x^{n-1} + \cdots + c_1 t^{n-1} x + c_0 t^n.$$
For x ∈ S^1 we have G(x, t) ≠ 0: for 0 < t ≤ 1 this is because f has no zeros, and for
t = 0 we have G(x, 0) = x^n ≠ 0. Hence
$$H(x, t) = \frac{G(x, t)}{|G(x, t)|}$$
is a homotopy between the map z ↦ z^n (at t = 0) and f̃ (at t = 1). Since z ↦ z^n has
degree n, we deduce that f̃ has degree n ≥ 1, which is a contradiction. Thus,
f(z) = 0 for some z ∈ C. □
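The two homotopies in this proof can be watched numerically. For a polynomial f and a radius r, consider the loop s ↦ f(re^{2πis})/|f(re^{2πis})| (at r = 1 this is f̃). The Python sketch below computes its degree by the lifting procedure of Section 5.5, assuming the loop is sampled finely enough; for f(z) = z^3 + 2 it gives degree 0 for small r and degree 3 for large r. For a polynomial with no roots the degree could not change as r varies, which is exactly the contradiction in the proof. The function names are hypothetical.

```python
import cmath
import math


def winding_degree(loop, steps=4000):
    """Degree of s -> loop(s)/|loop(s)|, s in [0, 1], for a loop in C minus {0}
    with loop(0) = loop(1). Dividing by |loop(s)| does not change the
    argument, so we sum the argument increments of loop(s) directly."""
    total = 0.0
    prev = loop(0.0)
    for j in range(1, steps + 1):
        cur = loop(j / steps)
        total += cmath.phase(cur / prev)   # small increment of the argument
        prev = cur
    return round(total / (2 * math.pi))


def circle_loop(f, r):
    """The loop s -> f(r e^{2 pi i s})."""
    return lambda s: f(r * cmath.exp(2j * math.pi * s))
```

Here z^3 + 2 has its three roots on the circle |z| = 2^{1/3}, between the two radii used in the check below.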

Another application of the determination of π(S^1) is to prove the case n = 2 of the
Brouwer fixed point theorem. We need:

Lemma 5.10 There is no continuous map r : B^2 → S^1 such that r ◦ i = 1, where
i : S^1 → B^2 is the inclusion map and 1 denotes the identity map of S^1.

Proof If there is such a map, the composite homomorphism
$$\pi(S^1, 1) \xrightarrow{\ i_*\ } \pi(B^2, 1) \xrightarrow{\ r_*\ } \pi(S^1, 1)$$
is equal to the identity. But π(B^2, 1) is trivial, so the composition r_* ◦ i_* is trivial,
and it cannot be the identity of π(S^1, 1) ≅ Z. □

Theorem 5.15 (Brouwer fixed point theorem for n = 2) Any continuous map f :
B^2 → B^2 has a fixed point.

Proof Suppose that f(x) ≠ x for all x ∈ B^2. Define r(x) ∈ S^1 to be the intersection
with S^1 of the ray that starts at f(x) and passes through x. Certainly r(x) = x for
x ∈ S^1. By writing an equation for r in terms of f (see Exercise 1), we see that r is
continuous. But this contradicts Lemma 5.10. □
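One way to write the equation for r (our own derivation, offered as a sketch): with u = x − f(x) ≠ 0, the point r(x) is x + tu, where t is the larger root of |x + tu|^2 = 1, namely t = (−x·u + √((x·u)^2 + |u|^2(1 − |x|^2)))/|u|^2. Every operation here is continuous in x because u ≠ 0, and for |x| = 1 one has x·u = 1 − x·f(x) ≥ 0, so t = 0 and r(x) = x. The Python function below (a hypothetical helper named retraction) evaluates this formula.

```python
import math


def retraction(x, fx):
    """r(x): where the ray from f(x) through x meets the unit circle S^1.

    x and fx are points of the closed unit disc B^2, given as (a, b) tuples,
    with fx != x (this is where the no-fixed-point assumption is used).
    """
    u = (x[0] - fx[0], x[1] - fx[1])      # direction of the ray
    uu = u[0] ** 2 + u[1] ** 2            # |u|^2 > 0 since f(x) != x
    xu = x[0] * u[0] + x[1] * u[1]        # x . u
    xx = x[0] ** 2 + x[1] ** 2            # |x|^2 <= 1
    # |x + t u|^2 = 1 reads uu t^2 + 2 xu t + (xx - 1) = 0. Since xx - 1 <= 0,
    # the larger root is >= 0; max(..., 0.0) only guards against rounding.
    t = (-xu + math.sqrt(max(xu * xu + uu * (1 - xx), 0.0))) / uu
    return (x[0] + t * u[0], x[1] + t * u[1])
```

For instance, retraction((0.0, 0.0), (0.5, 0.0)) is (−1.0, 0.0): the ray from (0.5, 0) through the origin leaves the disc at (−1, 0).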

One can prove the higher dimensional version of Brouwer's theorem using higher
homotopy groups. Given a pointed topological space (X, b), we may consider the
loop space (ΩX, b̃) of all loops based at b, where b̃ is the constant loop at b. By abuse
of notation, we will denote this space by (ΩX, b). One can endow this space with the
compact-open topology: namely, the sets
$$(K; O) = \{f \in \Omega X : f(K) \subset O\},$$
with K a compact subset of I and O an open set in X, are used as a subbase for a
topology on ΩX. If X is a metric space with metric d, this topology agrees with
the topology induced by the metric d^* on ΩX given by
$$d^*(f, g) = \sup_{t \in I} d(f(t), g(t)).$$
We can now define the higher homotopy groups recursively as follows: π_1(X, b) :=
π(X, b), π_2(X, b) := π_1(ΩX, b) and generally
$$\pi_k(X, b) = \pi_{k-1}(\Omega X, b).$$
We say two spaces X and Y are homotopically equivalent if there exist maps f : X → Y
and g : Y → X such that f ◦ g is homotopic to the identity on Y, and g ◦ f is
homotopic to the identity on X. We leave as an exercise (see Exercise 3) to show
that if X and Y are homotopically equivalent, then π_k(X) ≅ π_k(Y) for every k ≥ 1.
In particular, if X is contractible to a point x_0, then X is homotopically equivalent
to {x_0}. In such a case, we conclude that π_k(X) is trivial for k ≥ 1. Thus, if X is a
convex subset of R^n, we conclude that π_k(X) is trivial. Hence, π_k(B^n) is trivial for
every k ≥ 1.
On the other hand, one can show that π_k(S^n) is trivial for k < n and π_n(S^n) ≅ Z.
This fact allows us to give a proof of the Brouwer fixed point theorem for n ≥ 2. Indeed,
if f : B^n → B^n did not have a fixed point, the ray that starts at f(x) and passes through
x would meet the boundary S^{n−1} at a point r(x) (say). Note that r(x) = x for all
x ∈ S^{n−1}. We leave as an exercise (see Exercise 4) to show that r is a continuous map.
This gives us a retraction
$$r : B^n \to S^{n-1}.$$
Together with the inclusion map, we obtain
$$S^{n-1} \xrightarrow{\ i\ } B^n \xrightarrow{\ r\ } S^{n-1}.$$
Applying the π_{n−1} functor, we get
$$\mathbb{Z} \xrightarrow{\ i_*\ } 0 \xrightarrow{\ r_*\ } \mathbb{Z}.$$
Since r ◦ i = 1, we must have r_* ◦ i_* = 1, which is a contradiction, since this
composition factors through the trivial group. This completes our sketch of proof of
the Brouwer fixed point theorem in the general case.
The groups π_k(S^n) for k > n are much harder to compute, and in general they are
not known. Serre showed (in his doctoral thesis) that for k > n, π_k(S^n) is a finite group
unless n is even and k = 2n − 1, in which case we have
$$\pi_{2n-1}(S^n) \cong \mathbb{Z} \oplus (\text{finite group}).$$

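We give several applications of the Brouwer fixed point theorem. The first is to
prove:

Theorem 5.16 (Brouwer, 1911)
$$\mathbb{R}^m \cong \mathbb{R}^n$$
if and only if m = n.

Proof Suppose R^n ≅ R^m with n > m ≥ 1. If m = 1, this is impossible, since removing
a point disconnects R but does not disconnect R^n for n ≥ 2. So assume m ≥ 2. After
composing with a translation, we may assume that the homeomorphism sends 0 to 0, so that
$$\mathbb{R}^n \setminus \{0\} \cong \mathbb{R}^m \setminus \{0\}.$$
As in the Example of Section 5.4, the map (x, t) ↦ (1 − t)x + tx/|x| shows that S^{n−1}
is a deformation retract of R^n \ {0}, so S^{n−1} and S^{m−1} are homotopically equivalent.
Applying the π_{m−1} functor, we get Z ≅ π_{m−1}(S^{m−1}) ≅ π_{m−1}(S^{n−1}), which is
trivial since m − 1 < n − 1. This contradiction shows that n = m. □

This theorem (usually called the invariance of dimension; it is also a consequence of
Brouwer's invariance of domain theorem), though not a direct application of the fixed
point theorem, has the same essential idea in its proof as the fixed point theorem. Its
significance is the corollary that the notion of dimension is well-defined. That is, an
n-manifold cannot at the same time be an m-manifold unless m = n.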
Our next application is to a result in linear algebra that plays a fundamental role in the
Google PageRank algorithm.

Theorem 5.17 (Perron) If A = (a_{ij}) is an n × n matrix with a_{ij} > 0, then A has
a positive eigenvalue λ. Moreover, there is a corresponding eigenvector v with all
coordinates positive such that Av = λv.

Proof For any vector x ∈ R^n define σ(x) to be the sum of its coordinates. Let Δ^{n−1}
denote the (n − 1)-simplex. Now consider the map f : Δ^{n−1} → Δ^{n−1} given by
$$f(x) = \frac{Ax}{\sigma(Ax)}.$$
It is not difficult (see Exercise 6) to see that this is indeed a map into Δ^{n−1}. As Δ^{n−1}
is homeomorphic to B^{n−1}, we can apply the fixed point theorem (see Exercise 5) to
obtain v ∈ Δ^{n−1} with f(v) = v, that is, Av = λv with λ = σ(Av) > 0. Since v ≠ 0 has
non-negative coordinates and all a_{ij} > 0, every coordinate of Av is positive, so every
coordinate of v = Av/λ is positive. □
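The fixed point theorem gives the existence of the Perron vector but not a way to find it. For a matrix with positive entries, simply iterating the map f(x) = Ax/σ(Ax) from the proof is known to converge to that fixed point; this is the power method, and its convergence is a fact beyond what the proof above establishes. The Python sketch below does this; the name perron_vector and the fixed iteration count are our own choices.

```python
def perron_vector(A, iters=500):
    """Iterate f(x) = Ax / sigma(Ax) on the simplex, starting at its barycenter.

    A is a list of rows with positive entries. Returns (lam, x) with x in the
    simplex and Ax approximately equal to lam * x.
    """
    n = len(A)
    x = [1.0 / n] * n

    def apply(v):
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    for _ in range(iters):
        y = apply(x)
        s = sum(y)                  # sigma(Ax) > 0 since A > 0 and x is in the simplex
        x = [yi / s for yi in y]    # f(x) lies in the simplex again
    lam = sum(apply(x))             # at a fixed point, sigma(Ax) = lam * sigma(x) = lam
    return lam, x
```

For A = [[1, 2], [3, 4]] this returns λ ≈ (5 + √33)/2 together with an eigenvector whose coordinates are both positive, as Theorem 5.17 predicts.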


The computation of the fundamental group of S^1 allows us to deduce the fundamental
groups of other topological spaces. For example, it follows immediately from Exercise 3
of Section 5.4 that the fundamental group of the torus T = S^1 × S^1 is Z × Z.

A theorem of Seifert and van Kampen allows us to write down more fundamental
groups from the fact that π(S^1) ≅ Z. This theorem is a vast generalization of Theorem
5.7, though essentially based on the same ideas. To state it precisely, we recall the
definition of free products of groups. Let G_i, i ∈ J, be a collection of groups indexed
by a set J, and let G be a group with homomorphisms φ_i : G_i → G. We say that G is
the free product of the groups G_i if the following condition holds: whenever H is a
group and ψ_i : G_i → H are homomorphisms, there exists a unique homomorphism
f : G → H such that
$$f \circ \varphi_i = \psi_i \qquad \text{for all } i \in J.$$
One can show that given any collection of groups, their free product exists. Intuitively,
the free product is to be thought of as a group consisting of “words” formed
using the “alphabet” of elements of the G_i for i ∈ J. More precisely, elements of
the free product are finite sequences (x_1, x_2, . . . , x_n), n ≥ 0, where each x_k belongs
to some G_i, any two successive terms belong to different groups and no term is the
identity element of any G_i; the empty sequence is the identity. Two words are multiplied
by concatenating them and then reducing repeatedly: adjacent terms from the same
group are multiplied together, and any identity elements produced are deleted.
The essential point is to ensure that all of this is well-defined and that G exists.
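The reduction of words can be made concrete for a free product of copies of Z, such as the group Z ∗ Z that appears in the figure-eight example of this section. In the Python sketch below (our own encoding, with hypothetical names), a letter (i, k) with k ≠ 0 stands for the element k of the i-th factor Z, and a reduced word is a list of letters in which successive letters come from different factors.

```python
def reduce_word(word):
    """Reduce a word in a free product of copies of Z.

    Adjacent letters from the same factor are combined (their k's add, since
    each factor is Z), and identity letters (k == 0) are deleted. The stack
    `out` is a reduced word after every step.
    """
    out = []
    for i, k in word:
        if out and out[-1][0] == i:
            k += out.pop()[1]       # merge with the previous letter
        if k != 0:
            out.append((i, k))
    return out


def multiply(u, v):
    """Product of two words: concatenate, then reduce."""
    return reduce_word(list(u) + list(v))


def inverse(w):
    """Inverse of a reduced word: reverse it and negate each letter."""
    return [(i, -k) for i, k in reversed(w)]
```

With a = [(0, 1)] and b = [(1, 1)], multiply(a, b) is [(0, 1), (1, 1)] while multiply(b, a) is [(1, 1), (0, 1)], so the two generators of Z ∗ Z do not commute.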
If G_1 and G_2 are two groups, we denote their free product by G_1 ∗ G_2. Now
suppose that X is a topological space with subspaces U and V, and let a ∈ U ∩ V. We
have the homomorphisms
$$\pi(U, a) \to \pi(X, a), \qquad \pi(V, a) \to \pi(X, a)$$
as well as
$$i_* : \pi(U \cap V, a) \to \pi(U, a), \qquad j_* : \pi(U \cap V, a) \to \pi(V, a),$$
induced by the inclusion maps. Then:

Theorem 5.18 (Seifert and van Kampen, 1933) Suppose that X = U ∪ V, where U
and V are open, path-connected subsets of X. If U ∩ V is path-connected and
a ∈ U ∩ V, then
$$\pi(X, a) \cong \big(\pi(U, a) * \pi(V, a)\big)/H,$$
where H is the normal subgroup generated by the words i_*(g) j_*(g)^{-1} with
g ∈ π(U ∩ V, a).

This theorem allows us to compute, for instance, the fundamental group of the “figure
eight” as the free product Z ∗ Z, since π(S^1) is Z and the two circles meet in a single
point. (More precisely, one applies the theorem to open sets U and V, each consisting of
one circle together with a small open arc of the other circle around the common point,
so that U ∩ V is simply connected.) More generally, the same logic shows that the
fundamental group of a flower with n petals (n circles meeting in a single point) is the
n-fold free product of the integers.
In particular, if X is the union of two open path-connected subspaces U and V with
U ∩ V simply connected, we deduce from Theorem 5.18 that π(X) is isomorphic to
the free product of π(U) and π(V).

Exercises

1. Prove that the map r constructed in the proof of Theorem 5.15 is continuous.
2. Show that any map f : S^1 → S^1 such that deg f ≠ 1 has a fixed point.
3. Prove that if X and Y are homotopically equivalent, then π_k(X) ≅ π_k(Y) for every
k ≥ 1.
4. Suppose that f : B^n → B^n has no fixed point. Define r(x) as the point of intersection
of S^{n−1} with the ray that starts at f(x) and passes through x. Prove that r is a
continuous function.
5. Suppose X is homeomorphic to B^n. Show that any continuous map f : X → X
has a fixed point.
6. Let A be an n × n matrix with positive entries. For any v ∈ R^n define σ(v) to be
the sum of the coordinates of v. Show that
$$f(x) = \frac{Ax}{\sigma(Ax)}$$
defines a map f : Δ^{n−1} → Δ^{n−1}, where Δ^{n−1} denotes the (n − 1)-simplex.

5.7 Group Actions and Orbit Spaces

Recall that an action of a group G acting on a set X is a map G × X → X , which


we denote by (g, x) → g · x, satisfying
(i) 1 · x = x for all x ∈ X and
(ii) g · (h · x) = (gh) · x for all g, h ∈ G and x ∈ X .
Two elements x, y of X are said to be in the same orbit if there is an element g of G
so that g · x = y. Clearly, this is an equivalence relation and thus, an action of G on
X partitions X into disjoint orbits.
Now let $G$ be a group acting on a topological space $X$. We also insist that, in addition to (i) and (ii) above, this action satisfies:
(iii) the map $x \mapsto g \cdot x$ is a homeomorphism of $X$ for all $g \in G$.
That is, $G$ defines a group of homeomorphisms of $X$. We can endow the space of orbits, which we denote by $X/G$, with a topology as follows. Let $p : X \to X/G$ be the map that sends each point to its orbit. A subset $U$ of $X/G$ is defined to be open if $p^{-1}(U)$ is open in $X$. (This is a special case of a more general construction of the quotient topology induced on a set $Y$ by a map $f : X \to Y$ from a topological space $X$.)
For example, $\mathbb{Z}$ acts on $\mathbb{R}$ by translation: $n \cdot x = x + n$ for $n \in \mathbb{Z}$ and $x \in \mathbb{R}$. The orbit space is homeomorphic to $S^1$, via $x \mapsto e^{2\pi i x}$. Another example is the action of $\mathbb{Z}/2\mathbb{Z}$ on $S^n$ given by $(\pm 1) \cdot x = \pm x$. Thus the non-trivial element of $\mathbb{Z}/2\mathbb{Z}$ takes $x$ to its antipodal point. The quotient space is called the real projective space and denoted $\mathbb{P}^n(\mathbb{R})$.
316 5 Introduction to Algebraic Topology

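Another important example concerns the following action of $\mathbb{Z} \times \mathbb{Z}$ on $\mathbb{C}$. Take $\tau \in \mathbb{C} \setminus \mathbb{R}$ and define $(m, n) \cdot z = z + m + n\tau$. The quotient space $\mathbb{C}/\mathbb{Z}^2$ is then homeomorphic to a torus.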
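Concretely, every orbit of this action meets the parallelogram $\{s + t\tau : 0 \le s, t < 1\}$ in exactly one point. The Python sketch below finds that representative by writing $z = s + t\tau$ with $s, t$ real and reducing $s$ and $t$ modulo 1. Gluing the opposite sides of the parallelogram produces the torus. The function name and the sample value of $\tau$ are our own choices.

```python
import math

def reduce_mod_lattice(z: complex, tau: complex) -> complex:
    """Representative of z modulo the lattice Z + Z*tau (requires Im(tau) != 0)."""
    # Solve z = s + t*tau with s, t real.
    t = z.imag / tau.imag
    s = z.real - t * tau.real
    # Reduce both coordinates modulo 1 by subtracting their integer parts.
    s -= math.floor(s)
    t -= math.floor(t)
    return s + t * tau

tau = complex(0.5, 1.0)   # an arbitrary non-real tau
z = complex(0.3, 0.4)
w = z + 3 + 2 * tau       # (3, 2) . z lies in the same orbit as z
print(reduce_mod_lattice(z, tau), reduce_mod_lattice(w, tau))
```

The two printed values agree up to floating-point rounding.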
We say that $G$ acts evenly or properly discontinuously if any point in $X$ has an open neighborhood $N$ such that $g \cdot N$ and $h \cdot N$ are disjoint for $g \neq h$. (The latter terminology is more prevalent in the literature, but here we opt for the terser adjective as introduced by Fulton, Algebraic Topology, p. 159.) The importance of even actions lies in the following theorem.

Theorem 5.19 If G acts evenly on X , then the projection map

p : X → X/G

is a covering map.

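Proof The map $p$ is continuous by construction of $X/G$. Let $N$ be an open neighborhood as in the definition of even action. Then $p^{-1}(p(N))$ is the union of the open sets $g \cdot N$ for $g \in G$, so $p(N)$ is open in $X/G$; the same argument shows that $p$ maps open sets to open sets. We claim that $p(N)$ is evenly covered. The sets $g \cdot N$ are pairwise disjoint, and since $p(g \cdot y) = p(y)$, the restriction of $p$ to $g \cdot N$ is the composition of $y \mapsto g^{-1} \cdot y$ with $p|_N$. So it suffices to show that $p|_N : N \to p(N)$ is a homeomorphism. This map is continuous, open and surjective, so it remains to check that it is injective. If $p(x) = p(y)$ for $x, y \in N$, then there is some $g \in G$ so that $g \cdot x = y$. If $g \neq 1$, this means $y \in N \cap g \cdot N$, contrary to the definition of even action. Hence $g = 1$ and $x = y$. □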

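Observe that the examples above (translation of $\mathbb{R}$ by $\mathbb{Z}$, the antipodal action on $S^n$, and the action of $\mathbb{Z}^2$ on $\mathbb{C}$) are all even actions. Thus, in each case,
$$p : X \to X/G$$
is a covering map. If $X$ is simply connected, then by what we have shown earlier in this chapter, there is a one-to-one correspondence between $\pi(X/G)$ and the elements of the fiber $p^{-1}(b)$ for any $b \in X/G$. In the antipodal example, every element of $\mathbb{P}^n(\mathbb{R})$ has exactly two preimages. Since $S^n$ is simply connected for $n \ge 2$, $\pi(\mathbb{P}^n(\mathbb{R}))$ is then a group with two elements, and we deduce that $\pi(\mathbb{P}^n(\mathbb{R})) \cong \mathbb{Z}/2\mathbb{Z}$ for $n \ge 2$.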
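An interesting example can be given with the group
$$SL_2(\mathbb{Z}) = \left\{ \begin{pmatrix} a & b \\ c & d \end{pmatrix} : a, b, c, d \in \mathbb{Z},\ ad - bc = 1 \right\}$$
acting on the upper half-plane $\mathbb{H} = \{z \in \mathbb{C} : \operatorname{Im}(z) > 0\}$ by
$$\begin{pmatrix} a & b \\ c & d \end{pmatrix} \cdot z = \frac{az + b}{cz + d}.$$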

This defines an action (see Exercise 1). This, however, is not an even action, because even actions are fixed point free: for an even action, $g \cdot x = x$ for any element $x$ implies $g = 1$ (see Exercise 2). In this example, $z = i = \sqrt{-1}$ is fixed by
$$\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \neq \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$
Therefore, $\mathbb{H} \to \mathbb{H}/SL_2(\mathbb{Z})$ is not a covering map: near $i$ it identifies $z$ with $-1/z$, so it is not locally injective there.
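The fixed point can be checked directly: $(0 \cdot i - 1)/(1 \cdot i + 0) = -1/i = i$. The Python sketch below encodes the action, with helper names of our own. It also checks, at one sample point, the compatibility $g \cdot (h \cdot z) = (gh) \cdot z$ from Exercise 1. A numerical check at a single point is of course not a proof.

```python
def act(m, z: complex) -> complex:
    """Moebius action (a b; c d) . z = (az + b)/(cz + d), m = ((a, b), (c, d))."""
    (a, b), (c, d) = m
    return (a * z + b) / (c * z + d)

def matmul(m, n):
    """Product of two 2x2 integer matrices given as nested tuples."""
    (a, b), (c, d) = m
    (e, f), (g, h) = n
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))

S = ((0, -1), (1, 0))
T = ((1, 1), (0, 1))
identity = ((1, 0), (0, 1))

print(act(S, 1j) == 1j)   # True: i is fixed by S
print(S != identity)      # True

z = complex(0.3, 2.0)     # an arbitrary point of the upper half-plane
print(abs(act(S, act(T, z)) - act(matmul(S, T), z)) < 1e-12)   # True
```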


However, if we consider the group $PSL_2(\mathbb{R})$ acting on the upper half-plane, there are situations where a discrete subgroup $\Gamma$ of $PSL_2(\mathbb{R})$ acts evenly on $\mathbb{H}$.
We now consider a path-connected space $X$ with an even action by a group $G$. Thus, $p : X \to X/G$ is a covering map. Fix $e \in X$ and let $b = p(e)$. Our goal now is to find a relationship between $G$ and $\pi(X/G, b)$. Notice that
$$p^{-1}(b) = \{g \cdot e : g \in G\}.$$
We will define an action of $\pi(X/G, b)$ on the fiber $p^{-1}(b)$. If $\gamma \in \pi(X/G, b)$, then there is a unique lift $\tilde\gamma$ of $\gamma$ that begins at $e$, and $\tilde\gamma(1) \in p^{-1}(b)$. Hence there is an element $g_\gamma \in G$ such that $\tilde\gamma(1) = g_\gamma \cdot e$. This element is unique, because an even action is fixed point free (Exercise 2). Clearly, $g_\gamma$ depends only on the path class of $\gamma$, since the endpoint of the lift depends only on the path class. Thus, we have a map
$$\phi : \pi(X/G, b) \to G$$
given by $\gamma \mapsto g_\gamma$.
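For the translation action of $\mathbb{Z}$ on $\mathbb{R}$, with $e = 0$, the map $\phi$ sends a loop in $\mathbb{R}/\mathbb{Z}$ to the endpoint of its lift, that is, to its winding number. The Python sketch below is a discrete stand-in for path lifting. It assumes the loop is given by finely spaced samples in $[0, 1)$, with consecutive samples closer than $1/2$ in $\mathbb{R}/\mathbb{Z}$. Each sample is replaced by its representative closest to the previously lifted point. The helper names are hypothetical.

```python
def lift_loop(samples):
    """Lift a sampled loop in R/Z (values in [0, 1)) to a path in R.

    Assumes consecutive samples are closer than 1/2 in R/Z, so the
    nearest representative of each sample continues the lift correctly.
    """
    lifted = [samples[0]]
    for s in samples[1:]:
        n = round(lifted[-1] - s)   # integer shift closest to the previous point
        lifted.append(s + n)
    return lifted

def phi(samples) -> int:
    """The element g of Z with (end of lift) = g . (start of lift)."""
    lifted = lift_loop(samples)
    return round(lifted[-1] - lifted[0])

def loop(k: int, steps: int = 1000):
    """Samples of the loop t -> k t (mod 1), which winds k times around R/Z."""
    return [(k * j / steps) % 1.0 for j in range(steps + 1)]

print(phi(loop(3)), phi(loop(-2)))   # 3 -2
```

Concatenating the samples of `loop(1)` and `loop(2)` gives `phi` equal to 3, consistent with Theorem 5.20.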

Theorem 5.20 The map φ is a homomorphism of groups.

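Proof Consider two loops $\gamma, \eta$ in $X/G$, based at $b$. Let $\widetilde{\gamma\eta}$ be the unique lift of $\gamma\eta$ that begins at $e$. Then $\widetilde{\gamma\eta} = \tilde\gamma\,\eta_a$, where $a = \tilde\gamma(1)$ and $\eta_a$ is the unique lift of $\eta$ that begins at $a$. This is because $\tilde\gamma\,\eta_a$ is also a lift of $\gamma\eta$ that begins at $e$. Let $\tilde\eta$ be the unique lift of $\eta$ that begins at $e$. Since $p(g \cdot x) = p(x)$, the path $g_\gamma \cdot \tilde\eta$ is also a lift of $\eta$, and it begins at $g_\gamma \cdot e$. As $a = \tilde\gamma(1) = g_\gamma \cdot e$, it follows that $\eta_a = g_\gamma \cdot \tilde\eta$. Hence,
$$\widetilde{\gamma\eta}(1) = (\tilde\gamma\,\eta_a)(1) = \eta_a(1) = g_\gamma \cdot \tilde\eta(1) = g_\gamma \cdot (g_\eta \cdot e) = (g_\gamma g_\eta) \cdot e.$$
It follows that $\phi$ is a homomorphism. □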

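Lemma 5.11 The kernel of $\phi$ is the subgroup $p_*\pi(X, e)$.

Proof The kernel consists of the elements $\gamma \in \pi(X/G, b)$ with $\phi(\gamma) = 1$, that is, those for which $\tilde\gamma(1) = e$. For such $\gamma$, the lift $\tilde\gamma$ is a loop in $X$ based at $e$ and $\gamma = p \circ \tilde\gamma$, so $\gamma \in p_*\pi(X, e)$. Conversely, if $\gamma = p \circ \alpha$ for a loop $\alpha$ based at $e$, then $\alpha = \tilde\gamma$ by uniqueness of lifts, so $\tilde\gamma(1) = e$. □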

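Corollary 5.3 The groups
$$\pi(X/G, b)/p_*\pi(X, e)$$
and $G$ are isomorphic.

Proof Since $X$ is path-connected, $\phi$ is surjective. Indeed, given $g \in G$, choose a path $\alpha$ in $X$ from $e$ to $g \cdot e$; then $\gamma = p \circ \alpha$ is a loop at $b$ with $\tilde\gamma = \alpha$, so $\phi(\gamma) = g$. The corollary now follows from Lemma 5.11 and the first isomorphism theorem. □

In particular, if $X$ is simply connected, we have the isomorphism
$$\pi(X/G, b) \cong G.$$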

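This result allows us to deduce (yet again) that $\pi(S^1) \cong \mathbb{Z}$, from the even action of $\mathbb{Z}$ on $\mathbb{R}$ by translation. Similarly, the even action of $\mathbb{Z} \times \mathbb{Z}$ on $\mathbb{C}$ described above shows that the fundamental group of the torus is $\mathbb{Z} \times \mathbb{Z}$.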
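We give one more example involving the Möbius strip. Let
$$X = \{z \in \mathbb{C} : z = x + iy,\ x \in \mathbb{R},\ 0 \le y \le 1\}.$$
Define a transformation $T$ on $X$ by setting
$$T(z) = \bar{z} + 1 + i,$$
that is, $T(x + iy) = (x + 1) + i(1 - y)$, and let $G$ be the cyclic group generated by $T$. Then $X/G$ is homeomorphic to the Möbius strip. Moreover, the action is even and
$$X \to X/G$$
is a covering map. Since $X$ is convex, hence simply connected, and $G$ is infinite cyclic ($T^2$ is translation by 2), we deduce immediately that the fundamental group of the Möbius strip is $\mathbb{Z}$.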
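Here is a quick numerical check of the two facts used above, with $T$ read as the glide reflection $T(z) = \bar z + 1 + i$. First, $T$ maps the strip $0 \le \operatorname{Im} z \le 1$ to itself. Second, $T^2$ is translation by 2, so $T$ has infinite order. The sample points are an arbitrary grid with dyadic coordinates.

```python
def T(z: complex) -> complex:
    """Glide reflection z -> conj(z) + 1 + i, i.e. (x, y) -> (x + 1, 1 - y)."""
    return z.conjugate() + 1 + 1j

def in_strip(z: complex) -> bool:
    """Membership in X = {0 <= Im z <= 1}."""
    return 0 <= z.imag <= 1

samples = [complex(x / 4, y / 4) for x in range(-8, 9) for y in range(0, 5)]

# T preserves the strip X.
print(all(in_strip(T(z)) for z in samples))                   # True
# T o T is the translation z -> z + 2, so T generates an infinite cyclic group.
print(all(abs(T(T(z)) - (z + 2)) < 1e-12 for z in samples))   # True
```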
We will compute the fundamental group of another important topological space, namely the Klein bottle. This will turn out to be a non-abelian group.
Define the following transformations of the plane:
$$A \cdot (x, y) = (x + 1, y); \qquad B \cdot (x, y) = (-x, y + 1).$$
Notice that
$$(AB) \cdot (x, y) = A \cdot (-x, y + 1) = (-x + 1, y + 1)$$
and
$$(BA) \cdot (x, y) = B \cdot (x + 1, y) = (-x - 1, y + 1).$$
Thus, the group of transformations of the plane generated by $A$ and $B$ is non-abelian.

We leave it as an exercise to show that this action is even. As $\mathbb{R}^2$ is simply connected, we deduce from Corollary 5.3 that the fundamental group of the quotient space is isomorphic to the group generated by $A$ and $B$. It is infinite (since $A$, for example, has infinite order) and non-abelian since $AB \neq BA$. The quotient space is called the Klein bottle.
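The computation of $AB$ and $BA$ is easy to confirm mechanically. In the Python sketch below, with function names of our own, the transformations are composed as functions, and $AB$ means "first $B$, then $A$" as in the displays above.

```python
def A(p):
    x, y = p
    return (x + 1, y)

def B(p):
    x, y = p
    return (-x, y + 1)

def compose(f, g):
    """(fg)(p) = f(g(p)): apply g first, then f."""
    return lambda p: f(g(p))

AB, BA = compose(A, B), compose(B, A)
print(AB((0, 0)), BA((0, 0)))   # (1, 1) (-1, 1)

def power(f, n):
    """f composed with itself n times (n >= 0)."""
    def h(p):
        for _ in range(n):
            p = f(p)
        return p
    return h

# A^n moves (0, 0) to (n, 0), so A has infinite order.
print([power(A, n)((0, 0)) for n in range(4)])   # [(0, 0), (1, 0), (2, 0), (3, 0)]
```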
The importance of $\mathbb{P}^2(\mathbb{R})$ arises in the classification theorem of surfaces. It was defined above as a quotient space of $S^2$ where we identify antipodal points. Every point of the southern hemisphere is identified with a point of the northern hemisphere
$$\{(x, y, z) \in S^2 : z \ge 0\},$$
so only antipodal points on the equator remain to be identified. Projecting the northern hemisphere onto the $xy$-plane, it is easy to see that $\mathbb{P}^2(\mathbb{R})$ is homeomorphic to the unit disk with antipodal points on the boundary identified. Thus,
$$\mathbb{P}^2(\mathbb{R}) \cong B^2/R,$$
where $R$ is the relation $x \sim y$ if and only if $x = y$, or $x, y \in S^1 = \partial B^2$ and $x = -y$.


The reader should be able to see that this is really the “Möbius strip sewn onto a disk.” It turns out that $\mathbb{P}^2(\mathbb{R})$ is essentially the only closed “non-orientable surface,” in the sense that all such surfaces are built out of it. We clarify this below.
Compact connected 2-manifolds are called surfaces. The 2-sphere $S^2$, the torus $T = S^1 \times S^1$ and the real projective plane $\mathbb{P}^2(\mathbb{R})$ are examples of surfaces. It turns out that every compact connected 2-manifold can be obtained from these three surfaces by forming what is called the “connected sum.”
More precisely, let $S_1$ and $S_2$ be two disjoint surfaces. Their connected sum is constructed as follows. Choose $D_1 \subseteq S_1$ and $D_2 \subseteq S_2$ so that $D_1, D_2$ are each homeomorphic to $B^2$, and let $h_i : D_i \to B^2$ be homeomorphisms. Define
$$S_1 \# S_2 = \big((S_1 \setminus D_1^{\circ}) \cup (S_2 \setminus D_2^{\circ})\big)/R,$$
where $R$ is the equivalence relation generated by identifying $x \in \partial D_1$ with $y \in \partial D_2$ whenever $h_1(x) = h_2(y)$. We can now state:

Theorem 5.21 (Classification of surfaces) Any compact surface $S$ is homeomorphic to precisely one of the following:
(a) $S^2 \# \underbrace{T \# T \# \cdots \# T}_{g \text{ times}}$, $g \ge 0$;
(b) $S^2 \# \underbrace{\mathbb{P}^2(\mathbb{R}) \# \cdots \# \mathbb{P}^2(\mathbb{R})}_{n \text{ times}}$, $n \ge 1$.

In case (a) we say the surface is orientable, and in case (b) non-orientable. The number $g$ (and sometimes $n$) is called the genus of the surface.
Work on Theorem 5.21 was initiated, and carried through in the orientable case, by A.F. Möbius (1790–1868) in a paper he submitted for the Grand Prix de Mathématiques of the Paris Academy of Sciences. He was 71 at the time. The jury did not consider any of the submitted papers worthy of the prize, and so the work of Möbius appeared as just another mathematical paper in their proceedings. It is not clear who finally proved the theorem in its full generality. Some ascribe it to H.R. Brahana, whose paper appeared in the Annals of Mathematics in 1922.

Exercises

1. Show that the action of $SL_2(\mathbb{Z})$ on the upper half-plane $\mathbb{H}$ given by
$$\begin{pmatrix} a & b \\ c & d \end{pmatrix} \cdot z = \frac{az + b}{cz + d}$$
is indeed an action.
2. If $G$ acts evenly on a topological space $X$, and $g \cdot x = x$ for some $x \in X$ and $g \in G$, show that $g = 1$.
3. Notice that $S^3$ is homeomorphic to
$$X = \{(z, w) : z, w \in \mathbb{C},\ |z|^2 + |w|^2 = 1\}.$$
We define an action of $G = \mathbb{Z}/q\mathbb{Z}$ on $X$ by $a \cdot (z, w) = (\zeta^a z, \zeta^a w)$, where $\zeta = e^{2\pi i/q}$ is a primitive $q$th root of unity. Show that this gives an even action of $\mathbb{Z}/q\mathbb{Z}$ on $X$. Deduce that $\pi(X/G) \cong \mathbb{Z}/q\mathbb{Z}$. (This space is called a lens space.)
4. Show that every finitely generated abelian group can be realized as the fundamental group of some path-connected space.
5. Show that the fundamental group of the cylinder is Z.
6. Prove that the action defined on $\mathbb{R}^2$ describing the Klein bottle as a quotient space is an even action.
7. With the action on $\mathbb{R}^2$ described by
$$A \cdot (x, y) = (x + 1, y); \qquad B \cdot (x, y) = (-x, y + 1),$$
show that $ABA = B$. Deduce that the fundamental group of the Klein bottle is the group generated by the two elements $A$ and $B$ with only the relation $ABA = B$.

5.8 Automorphisms of Covering Spaces

Let $p : (E, e) \to (X, b)$ be a covering map of topological spaces. We will now consider the group $\mathrm{Aut}(E/X, p)$ of all covering transformations, sometimes called deck transformations. These are the homeomorphisms $\phi : E \to E$ satisfying $p \circ \phi = p$. We begin with:

Lemma 5.12 Let $E$ be path-connected and $p : (E, e) \to (X, b)$ a covering map. Suppose $\phi : E \to E$ is a continuous map with $p \circ \phi = p$. If for some $x_1 \in E$ we have $\phi(x_1) = x_1$, then $\phi(x) = x$ for all $x \in E$.

Proof Let $x$ be any point of $E$ and let $\alpha : I \to E$ be a path from $x_1$ to $x$. Since $\phi(x_1) = x_1$, the paths $\alpha$ and $\phi \circ \alpha$ both begin at $x_1$. Moreover, $p \circ \alpha = p \circ \phi \circ \alpha$, so that $\alpha$ and $\phi \circ \alpha$ are lifts of the path $p \circ \alpha : I \to X$. By the uniqueness lemma (Lemma 5.9), $\alpha = \phi \circ \alpha$. In particular, the endpoints coincide, so that $\phi(x) = x$. □

It is clear that $\mathrm{Aut}(E/X, p)$ acts on $E$ simply by $\phi \cdot x = \phi(x)$. We would like to show that this is an even action (or equivalently, a properly discontinuous action) as defined in the previous section. To ensure this, we need to assume that $E$ is locally path-connected. That is, every neighborhood of each point of $E$ contains a path-connected open neighborhood of that point. It is not hard to see that the image of a locally path-connected space under a continuous open surjection, such as a covering map, is again locally path-connected (Exercise 1).

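Theorem 5.22 If $E$ is connected and locally path-connected, then the action of $\mathrm{Aut}(E/X, p)$ on $E$ is an even action.

Proof Note first that $E$ is path-connected (Exercise 2), so Lemma 5.12 applies. Let $x \in E$. Since $E$ (and hence $X$, by Exercise 1) is locally path-connected, we may choose a path-connected, evenly covered open neighborhood $U$ of $p(x)$. Thus $p^{-1}(U) = \bigcup_j V_j$ is a disjoint union of open sets, each mapped homeomorphically onto $U$ and hence path-connected, and $x \in V_k$ for some $k$. Let $\phi \in \mathrm{Aut}(E/X, p)$ with $\phi \neq 1$. If $\phi(x) = x$, then $\phi$ would be the identity map by Lemma 5.12; therefore $\phi(x) \neq x$. Since $p(\phi(x)) = p(x)$, it follows that $\phi(x) \in V_j$ for some $j$. If $V_j = V_k$, then $\phi(x) = x$, because $p$ is injective on $V_k$. This is not the case, so $V_j \cap V_k = \emptyset$. Now $p(\phi(V_k)) = p(V_k) = U$, so $\phi(V_k) \subseteq \bigcup_i V_i$. As $\phi(V_k)$ is path-connected, it lies in a single $V_i$, and since it contains $\phi(x) \in V_j$, we deduce that $\phi(V_k) \subseteq V_j$. Hence $V_k \cap \phi(V_k) = \emptyset$ for every $\phi \neq 1$. Applying this to $\psi^{-1}\phi$ shows that $\phi(V_k)$ and $\psi(V_k)$ are disjoint whenever $\phi \neq \psi$, so the action is even. □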
Let $p : (E, e) \to (X, b)$ be a covering of topological spaces. In Sect. 3, we proved that if $Y$ is a connected space and
$$f : Y \to X$$
is a continuous map, then a lift $\tilde f : (Y, c) \to (E, e)$ of $f$, if it exists, is unique. We will now consider the question of when such a lift exists. Thus, given a map $f : (Y, c) \to (X, b)$ of pointed spaces, does there exist an $\tilde f : (Y, c) \to (E, e)$ making the following diagram commute?
$$\begin{array}{ccc} & & (E, e) \\ & \overset{\tilde f}{\nearrow} & \big\downarrow p \\ (Y, c) & \xrightarrow{\ f\ } & (X, b) \end{array}$$
This diagram induces the following diagram of fundamental groups:
$$\begin{array}{ccc} & & \pi(E, e) \\ & \overset{\tilde f_*}{\nearrow} & \big\downarrow p_* \\ \pi(Y, c) & \xrightarrow{\ f_*\ } & \pi(X, b) \end{array}$$
The homomorphism $p_*$ is injective (Exercise 4) and $p_* \circ \tilde f_* = f_*$, so that
$$f_*\pi(Y, c) = p_*\tilde f_*\pi(Y, c) \subseteq p_*\pi(E, e).$$
Therefore, this inclusion is a necessary condition for the lift $\tilde f$ to exist. It turns out that it is also a sufficient condition, provided $Y$ is connected and locally path-connected. Though the proof of this is not difficult, it is rather long and we omit it. For future reference, we state it as:
Theorem 5.23 (Lifting criterion) Let $p : (E, e) \to (X, b)$ be a covering map. Suppose $Y$ is connected and locally path-connected, and
$$f : (Y, c) \to (X, b)$$
is a continuous map. Then there is a lift
$$\tilde f : (Y, c) \to (E, e),$$
that is, a continuous map with $p \circ \tilde f = f$ and $\tilde f(c) = e$, if and only if
$$f_*\pi(Y, c) \subseteq p_*\pi(E, e).$$

We use this lifting criterion to determine when we can have “isomorphic” coverings of a given topological space $X$.

Lemma 5.13 Let $p_1 : (E_1, e_1) \to (X, b)$ and $p_2 : (E_2, e_2) \to (X, b)$ be two coverings of $X$, with $E_1, E_2$ both connected and locally path-connected. If
$$p_{1*}\pi(E_1, e_1) = p_{2*}\pi(E_2, e_2),$$
then there is a homeomorphism $\phi : (E_1, e_1) \to (E_2, e_2)$ so that $p_2 \circ \phi = p_1$.

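Proof We have the diagram:
$$\begin{array}{ccc} & & (E_1, e_1) \\ & & \big\downarrow p_1 \\ (E_2, e_2) & \xrightarrow{\ p_2\ } & (X, b) \end{array}$$
Since $E_2$ is connected and locally path-connected and $p_{2*}\pi(E_2, e_2) \subseteq p_{1*}\pi(E_1, e_1)$, the map $p_2$ has a lift $\tilde p_2 : (E_2, e_2) \to (E_1, e_1)$ with $p_1 \circ \tilde p_2 = p_2$, by Theorem 5.23. By the uniqueness lemma (Lemma 5.9), this lift is unique. Similarly, we have a unique lift $\tilde p_1 : (E_1, e_1) \to (E_2, e_2)$ so that $p_2 \circ \tilde p_1 = p_1$. We thus have
$$(E_1, e_1) \xrightarrow{\ \tilde p_1\ } (E_2, e_2) \xrightarrow{\ \tilde p_2\ } (E_1, e_1),$$
so we can consider the composition
$$\psi := \tilde p_2 \circ \tilde p_1 : (E_1, e_1) \to (E_1, e_1).$$
Note that
$$p_1 \circ \psi = p_1 \circ (\tilde p_2 \circ \tilde p_1) = (p_1 \circ \tilde p_2) \circ \tilde p_1 = p_2 \circ \tilde p_1 = p_1.$$
As $\psi(e_1) = e_1$, we have by Lemma 5.12 that $\psi$ is the identity map. Similarly, by reversing the roles of $E_1$ and $E_2$, we deduce $\tilde p_1 \circ \tilde p_2 = 1$. Thus, $\tilde p_1$ and $\tilde p_2$ are mutually inverse homeomorphisms, and we may take $\phi = \tilde p_1$ to deduce the result. □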

This lemma has a converse, the proof of which we leave as an exercise (Exercise 5).

Lemma 5.14 Let $X$ be a Hausdorff space and let
$$p_1 : (E_1, e_1) \to (X, b), \qquad p_2 : (E_2, e_2) \to (X, b)$$
be covering maps with $E_1, E_2$ connected and locally path-connected. If there is a homeomorphism
$$\phi : (E_1, e_1) \to (E_2, e_2)$$
satisfying $p_2 \circ \phi = p_1$, then
$$p_{1*}\pi(E_1, e_1) = p_{2*}\pi(E_2, e_2).$$

These lemmas allow us to make the following definition. Two coverings
$$p_1 : (E_1, e_1) \to (X, b) \quad \text{and} \quad p_2 : (E_2, e_2) \to (X, b)$$
are equivalent if there is a homeomorphism
$$\phi : E_1 \to E_2$$
satisfying $p_2 \circ \phi = p_1$. (Note that the base points of $E_1$ and $E_2$ are not necessarily preserved by $\phi$.)
By Theorem 5.4, we immediately deduce:
Theorem 5.24 Let $p_1 : (E_1, e_1) \to (X, b)$ and $p_2 : (E_2, e_2) \to (X, b)$ be two coverings with $E_1, E_2$ connected and locally path-connected. The two coverings are equivalent if and only if
$$p_{1*}\pi(E_1, e_1) \quad \text{and} \quad p_{2*}\pi(E_2, e_2)$$
are conjugate subgroups in $\pi(X, b)$.
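For a concrete instance, consider the circle $X = S^1 \subset \mathbb{C}$ with $b = 1$ and the $n$-fold covering $p_n(z) = z^n$ of $S^1$ by itself. This is a standard example, not constructed in the text. The loop $t \mapsto e^{2\pi i k t}$ lifts, starting at $1$, to $t \mapsto e^{2\pi i k t/n}$, which is closed exactly when $n$ divides $k$. Hence $p_{n*}\pi(S^1, 1) = n\mathbb{Z}$. Since $\pi(S^1)$ is abelian, conjugate subgroups are equal, so by Theorem 5.24 these coverings are pairwise inequivalent for different $n \ge 1$. The Python sketch below checks the closedness criterion numerically; the helper names are hypothetical.

```python
import cmath

def lifted_endpoint(k: int, n: int) -> complex:
    """Endpoint of the lift through p_n(z) = z**n, starting at 1, of the loop
    t -> exp(2*pi*i*k*t); the lift is t -> exp(2*pi*i*k*t/n)."""
    return cmath.exp(2j * cmath.pi * k / n)

def lift_is_closed(k: int, n: int, tol: float = 1e-9) -> bool:
    """True when the lifted path returns to its starting point 1."""
    return abs(lifted_endpoint(k, n) - 1) < tol

n = 3
print([k for k in range(-6, 7) if lift_is_closed(k, n)])   # [-6, -3, 0, 3, 6]
```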


By combining Theorem 5.22 with Theorem 5.24, we will deduce the important result:
Theorem 5.25 Let $p : (E, e) \to (X, b)$ be a covering with $E$ connected and locally path-connected. If $p_*\pi(E, e)$ is a normal subgroup of $\pi(X, b)$, then $X$ is homeomorphic to $E/G$, where $G = \mathrm{Aut}(E/X, p)$.


Remark. Such coverings are called Galois coverings (or normal or regular coverings in the literature).
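Proof Changing the base point within the fiber $p^{-1}(b)$ replaces $p_*\pi(E, e)$ by a conjugate subgroup, and a normal subgroup equals all of its conjugates. Hence
$$p_*\pi(E, e_1) = p_*\pi(E, e_2)$$
for any $e_1, e_2 \in p^{-1}(b)$. By Lemma 5.13, there is an element $\phi \in \mathrm{Aut}(E/X, p)$ with $\phi(e_1) = e_2$. Since normality does not depend on the choice of base point in $X$, the same argument applies over every point of $X$. Thus, if $p(e_1) = p(e_2)$, there is an element $\phi \in \mathrm{Aut}(E/X, p)$ such that $\phi(e_1) = e_2$. Conversely, it is clear that if $\phi(e_1) = e_2$ for some $\phi \in \mathrm{Aut}(E/X, p)$, then $p(e_1) = p(e_2)$. Therefore, $\mathrm{Aut}(E/X, p)$ identifies points in $E$ the same way $p$ does. This gives a one-to-one correspondence, say $\psi$, between $X$ and $E/G$. These two spaces are also homeomorphic as topological spaces. Indeed, $U$ in $X$ is open if and only if $p^{-1}(U)$ is open in $E$ (as $p$ is a continuous open surjection), which holds if and only if $\psi(U)$ is open in $E/G$. This completes the proof. □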

Applying Corollary 5.3, we obtain:

Theorem 5.26 Let $p : (E, e) \to (X, b)$ be a covering map with $E$ connected and locally path-connected. If $p_*\pi(E, e)$ is a normal subgroup of $\pi(X, b)$, then
$$\pi(X, b)/p_*\pi(E, e) \cong \mathrm{Aut}(E/X, p).$$

In particular, if $E$ is simply connected, we deduce:

Theorem 5.27 If $p : (E, e) \to (X, b)$ is a covering with $E$ simply connected and locally path-connected, then
$$\pi(X, b) \cong \mathrm{Aut}(E/X, p).$$

This theorem allows us to compute the fundamental group of a topological space $X$ by first finding a covering $E$ which is simply connected and locally path-connected, and then computing its automorphism group. The question of when a given space possesses such a covering will be discussed in the next section.
For now, let us observe the similarity of Theorem 5.27 with the main theorem of Galois theory. The fundamental group has been identified as the group of automorphisms of the simply connected covering $E$. This analogy has other features that parallel Galois theory, and these we take up in the next section.

Exercises

1. Show that the image of a locally path-connected space under a continuous open surjection (for instance, a covering map) is locally path-connected.
2. Show that a connected space which is locally path-connected is path-connected.
3. Give an example of a space which is connected but not locally path-connected.
4. If $p : (E, e) \to (X, b)$ is a covering map, show that $p_* : \pi(E, e) \to \pi(X, b)$ is injective.
5. Prove Lemma 5.14.

5.9 The Universal Covering Space

In the last section, we showed that a covering $p : (E, e) \to (X, b)$ is determined up to equivalence by the conjugacy class of the subgroup $p_*\pi(E, e)$. This raises the question of whether, for a given conjugacy class of subgroups of $\pi(X, b)$, there exists a covering $p : (E, e) \to (X, b)$ with $p_*\pi(E, e)$ belonging to the conjugacy class. At one extreme, there is always the trivial covering $(X, b) \to (X, b)$ given by the identity map and corresponding to the entire fundamental group. At the other extreme, the covering $p : (E, e) \to (X, b)$ corresponding to the trivial subgroup is very interesting, since then we may realize $\pi(X, b)$ as $\mathrm{Aut}(E/X, p)$ by the results of the last section. Such a covering $E$ is called the universal covering space of $X$, when it exists.
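Suppose $p : (E, e) \to (X, b)$ is a covering with $E$ simply connected. If $x \in X$ and $y \in p^{-1}(x)$, then there is an evenly covered neighborhood $U$ of $x$ with $p^{-1}(U) = \bigcup_j V_j$, $y \in V_k$ for some $k$, and each $V_j$ homeomorphic to $U$. Let $V = V_k$, and let $j : V \hookrightarrow E$ and $i : U \hookrightarrow X$ be the inclusions. We have the commutative diagram
$$\begin{array}{ccc} V & \xrightarrow{\ j\ } & E \\ \big\downarrow p|_V & & \big\downarrow p \\ U & \xrightarrow{\ i\ } & X \end{array}$$
which induces the following diagram of fundamental groups:
$$\begin{array}{ccc} \pi(V, y) & \xrightarrow{\ j_*\ } & \pi(E, y) \\ \big\downarrow (p|_V)_* & & \big\downarrow p_* \\ \pi(U, x) & \xrightarrow{\ i_*\ } & \pi(X, x) \end{array}$$
Since $p|_V$ is a homeomorphism, $(p|_V)_*$ is an isomorphism, so $i_* = p_* \circ j_* \circ (p|_V)_*^{-1}$ factors through $\pi(E, y)$. Since $\pi(E, y)$ is trivial, we deduce that $i_*$ must be the trivial homomorphism. In other words, if $X$ has a universal covering, every point $x \in X$ has a neighborhood $U$ such that the homomorphism $\pi(U, x) \to \pi(X, x)$ is trivial. In such a situation, we say that $X$ is semilocally simply connected.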
Thus, combining the results of the last section, we see that for a space X that
is connected and locally path-connected to have a universal covering space, it is
necessary that it be semilocally simply connected. This condition turns out to be
sufficient.
Most spaces we encounter will satisfy these conditions. However, there are spaces which do not have universal coverings. For example,
$$X = \bigcup_{n \ge 1} C_n,$$
where $C_n$ is the circle in $\mathbb{R}^2$ of radius $1/n$ and center $(1/n, 0)$, is not semilocally simply connected at $(0, 0)$ (see Exercise 1). Thus, this space fails to have a universal covering.
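The source of the trouble is that every neighborhood of $(0, 0)$ contains entire circles $C_n$. The farthest point of $C_n$ from the origin is $(2/n, 0)$. So $C_n$ is contained in the open disk of radius $r$ about the origin as soon as $2/n < r$. The short Python sketch below, with helpers of our own, computes the first such $n$ and spot-checks it by sampling points of $C_n$.

```python
import math

def first_circle_inside(r: float) -> int:
    """Smallest n with 2/n < r, so C_n lies in the open disk of radius r about (0, 0)."""
    return math.floor(2 / r) + 1

def max_distance_on_circle(n: int, samples: int = 3600) -> float:
    """Largest sampled distance from the origin to C_n (center (1/n, 0), radius 1/n)."""
    return max(
        math.hypot(1 / n + math.cos(t) / n, math.sin(t) / n)
        for t in (2 * math.pi * j / samples for j in range(samples))
    )

for r in (0.5, 0.3, 0.07):
    n = first_circle_inside(r)
    print(r, n, max_distance_on_circle(n) < r)
```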

We will now sketch the proof of:

Theorem 5.28 Let $X$ be connected and locally path-connected. Then $X$ has a universal covering if and only if $X$ is semilocally simply connected.
Proof (Sketch) We saw above that the condition is necessary. Now we show that it is sufficient. We construct the universal covering $E$ as follows. Fix a base point $b \in X$, and let $E$ be the set of homotopy classes (with endpoints fixed) of paths that begin at $b$. Thus,
$$E = \{[\alpha] : \alpha : I \to X,\ \alpha(0) = b\}, \quad \text{where } [\alpha] = [\beta] \iff \alpha \sim \beta.$$

Define a map $p : E \to X$ by setting $p([\alpha]) = \alpha(1)$. This is well defined, since homotopic paths (with endpoints fixed) have the same endpoint. We have to put a topology on $E$ that makes $p$ a covering map and $E$ simply connected. Let $U$ be an open set in $X$ and $\alpha$ a path from $b$ to a point $c \in U$. Define
$$[U, \alpha] = \{[\alpha\beta] : \beta : I \to X,\ \beta(0) = \alpha(1),\ \beta(I) \subseteq U\}.$$

In other words, $[U, \alpha]$ consists of the equivalence classes of paths $\alpha\beta$ with $\beta$ lying in $U$ and beginning at $\alpha(1)$; taking $\beta$ constant shows that $[\alpha] \in [U, \alpha]$. We use the sets $[U, \alpha]$ as a basis for a topology on $E$ and leave to the reader (Exercises 2 and 3) the verification that they indeed generate a topology. With this topology, the map $p : E \to X$ is continuous. Indeed, let $U$ be open in $X$. If $p^{-1}(U)$ is empty, we are done. If $[\alpha] \in p^{-1}(U)$, then
$$p([U, \alpha]) = \{(\alpha\beta)(1) : [\alpha\beta] \in [U, \alpha]\} = \{\beta(1) : [\alpha\beta] \in [U, \alpha]\} \subseteq U,$$
so that
$$p^{-1}(U) = \bigcup_{[\alpha] \in p^{-1}(U)} [U, \alpha],$$
which is open in $E$. Thus, $p$ is continuous.


It remains to show that $p : E \to X$ is a covering. Thus, for each $x \in X$, we have to find a neighborhood which is evenly covered. This is where the semilocally simply connected property of $X$ is used. Since $X$ is also locally path-connected, we can choose a path-connected open neighborhood $V$ of $x$ such that every closed path in $V$ is equivalent, in $X$, to a constant path. Now
$$p^{-1}(V) = \bigcup_{[\alpha] \in p^{-1}(V)} [V, \alpha].$$
We claim that the sets $[V, \alpha]$ are either equal or disjoint. Indeed, if $[\gamma] \in [V, \alpha] \cap [V, \beta]$, then by Exercise 2,
$$[V, \gamma] = [V, \beta], \qquad [V, \gamma] = [V, \alpha],$$
so that $[V, \alpha] = [V, \beta]$. Thus, distinct sets $[V, \alpha]$ are disjoint. The map $p_\alpha = p|_{[V, \alpha]} : [V, \alpha] \to V$ is continuous, and it is surjective because $V$ is path-connected. To prove it is injective, suppose $p_\alpha([\alpha\beta]) = p_\alpha([\alpha\gamma])$. Then $\beta$ and $\gamma$ have the same endpoints. The path $\beta\gamma^{-1}$ is a closed path in $V$ and thus equivalent to the constant path by our choice of $V$. Thus, $\beta \sim \gamma$ and $[\alpha\beta] = [\alpha\gamma]$. To complete the proof, we need to check that $p_\alpha^{-1}$ is continuous. We leave this to the reader (Exercise 4).
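Finally, one needs to check that $E$ is simply connected. First we show that $E$ is path-connected. Let $\tilde e$ denote the class of the constant path at $b$. Let $[\alpha] \in E$. Define $\tilde\alpha : I \to E$ by $\tilde\alpha(s) = [\alpha_s]$ for $s \in I$, where $\alpha_s(t) = \alpha(st)$ for $t \in I$. Then $\tilde\alpha$ is a path in $E$ from $\tilde e$ to $[\alpha]$, and it is a lift of $\alpha$. This establishes path-connectedness.
If $\beta$ is a closed path in $E$ based at $\tilde e$, then $\beta$ and $\widetilde{p \circ \beta}$ are both lifts of $p \circ \beta$ starting at $\tilde e$, so by uniqueness of liftings
$$\beta = \widetilde{p \circ \beta},$$
so that
$$[p \circ \beta] = \widetilde{p \circ \beta}(1) = \beta(1) = \tilde e.$$
Thus $p \circ \beta$ is equivalent to the constant path, and since $p_*$ is injective, so is $\beta$. Hence $E$ is simply connected. This completes the proof. □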


We can now state:

Theorem 5.29 (Fundamental Theorem of Covering Spaces) Let $X$ be connected, locally path-connected and semilocally simply connected. If $H$ is a subgroup of $\pi(X, b)$, then there is a covering
$$p_H : (X_H, b_H) \to (X, b),$$
unique up to equivalence of coverings, such that
$$H = p_{H*}\pi(X_H, b_H).$$
This correspondence is one to one. Moreover, if $H$ is normal in $\pi(X, b)$, then $X_H$ is a Galois covering, and conversely.

Proof Let $p : E \to X$ be a universal covering of $X$, guaranteed by Theorem 5.28. Let $G = \mathrm{Aut}(E/X, p)$ be the group of covering transformations. Since
$$G \cong \pi(X, b)$$
by Theorem 5.27, we may regard $H$ as a subgroup of $G$ and take $X_H = E/H$, with $p_H$ the map induced by $p$. This space has the required property. □

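We can apply this to study complex manifolds of dimension one, which are called Riemann surfaces. Roughly speaking, a Riemann surface is a connected topological space $X$ in which every point has a neighborhood homeomorphic to the unit disk in $\mathbb{C}$, with holomorphic changes of coordinates between overlapping neighborhoods.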
The simplest example is of course $\mathbb{C}$ itself. Another example is the projective line over $\mathbb{C}$, $\mathbb{P}^1(\mathbb{C})$, called the Riemann sphere. This is the one-point compactification of $\mathbb{C}$. A third example is given by the upper half-plane $\mathbb{H}$ introduced in an earlier section.
A famous theorem of complex analysis states that any simply connected Riemann surface is biholomorphic (conformally equivalent) to one of these three spaces. This theorem is called the uniformization theorem.
This allows us to classify all compact connected Riemann surfaces. As 2-manifolds, they can be shown to be orientable, so we can speak of their genus via the classification of surfaces. A Riemann surface of genus zero is biholomorphic to the Riemann sphere. If it is of genus one, it is biholomorphic to $\mathbb{C}/L$ for some lattice $L$ in $\mathbb{C}$, and we call it an elliptic curve. If it has genus $\ge 2$, one can show it must be of the form $\mathbb{H}/\Gamma$ for some discrete subgroup $\Gamma \subseteq PSL_2(\mathbb{R})$. Moreover, every such discrete group gives rise to a Riemann surface. Every curve defined by an irreducible polynomial equation
$$f(x, y) = 0, \qquad f \in \mathbb{C}[x, y],$$
determines a compact connected Riemann surface $M$, where we identify the $\mathbb{C}$-points of the curve with points of $M$. Conversely, every compact Riemann surface determines a plane algebraic curve defined over the complex numbers. (We are assuming here that our curves have no singularities.)

Exercises

1. Prove that
$$X = \bigcup_{n \ge 1} \{(x, y) : (x - 1/n)^2 + y^2 = 1/n^2\}$$
is not semilocally simply connected at $(0, 0)$.


2. In the proof of Theorem 5.28, show that if [γ] ∈ [U, α], then [U, γ] = [U, α].
3. Prove that the intersection of two elements of the form [U, α] is again a union of
sets of this form.
4. In the proof of Theorem 5.28, check that $p_\alpha^{-1}$ is continuous.

5.10 Suggestions for Further Reading

Here is a brief guide to some books in algebraic topology. It is by no means exhaustive, but rather a cursory survey ranging from elementary texts to more advanced ones. I have ranked the books in what I perceive as their level of difficulty.

T.W. Gamelin and R.E. Greene, Introduction to Topology, Saunders Series, 1983.

This is a beautiful and brief introduction to the subject. One of the nice features is that
it treats the Borsuk–Ulam theorem without introducing homology or cohomology.
It is highly recommended.

C. Kosniowski, A First Course in Algebraic Topology, Cambridge University Press, 1980.

This has the features of the previous book and more. A full semester course can be based on the book since the chapters are short. The emphasis is on fundamental groups and covering spaces, though there is a brief chapter at the end on singular homology. This is highly recommended for the serious student of algebraic topology.

K. Jänich, Topology, Undergraduate Texts in Mathematics, Springer-Verlag, 1984.

This is a very informal introduction to the subject with more emphasis on the intuitive
aspects. Its drawback is that in some cases there are no proofs and the treatment is
sloppy. But its colloquial style makes up for it. It is recommended reading on the bus
or for bedtime.

W. Massey, Algebraic Topology, An Introduction, 1967.

This is a standard text for the subject. But it is lengthy and it is probably impossible
to cover all of it in a semester. Homology and cohomology are not discussed at all,
nor is higher homotopy theory.

W. Fulton, Algebraic Topology, A First Course, Graduate Texts in Mathematics, Springer-Verlag, 1995.

This is definitely an excellent text. I would have preferred an alternate arrangement of contents: covering spaces and fundamental groups are treated in the middle of the book, after a discussion of homology and cohomology. A beautiful aspect of the book is its discussion of Riemann surfaces, seldom found in introductory texts in algebraic topology.

M. Greenberg, Lectures on Algebraic Topology, Benjamin, 1967.

This book was later reprinted as Algebraic Topology, A first course by M. Greenberg
and J. Harper. The first part, consisting of about 30 pages on elementary homotopy
theory, is very accessible and can be covered within a month.

M. Armstrong, Basic Topology, Undergraduate Texts in Mathematics, Springer-Verlag, 1983.

This book is highly recommended; it gives a gentle introduction to simplicial homology. The book ends with a proof of the Lefschetz fixed point formula and a discussion of knot theory.

J. Dieudonné, A History of Algebraic and Differential Topology, 1900–1960, Birkhäuser, 1989.

This is a voluminous text full of interesting material. Reading it linearly is probably not a good idea. Part 3 of Chap. 1 of this book is relevant to our chapter. One gains a historical perspective on the evolution of these concepts by reading it. But definitely, this book is not bedtime reading. It is a serious text.

G. Bredon, Topology and Geometry, Graduate Texts in Mathematics, Springer-Verlag, 1993.

This is an excellent compendium with emphasis on manifold theory. Chapter 3 covers the theory of the fundamental group and covering spaces. It is rich in examples, especially those related to the topology of Lie groups.

J. Rotman, An Introduction to Algebraic Topology, Graduate Texts in Mathematics, Springer-Verlag, 1988.

Rotman's book has a strong algebraic flavor, and this appeals to many. Surprisingly, he treats covering spaces near the end of the book. Chapters 1–3, 10 and 11 are relevant to our chapter. The intuitive style in some places of the book makes for pleasant reading.

H. Sato, Algebraic Topology, An Intuitive Approach, Translations of Mathematical Monographs, Vol. 183, American Math. Society, 1996.

Despite its title, this is not an easy book to read. Its merit is a painless introduction
to fiber bundles and spectral sequences.

M. Waldschmidt, et al., From Number Theory to Physics, Springer-Verlag, 1992.

This is a collection of articles. The article by E. Reyssat entitled “Galois theory for coverings and Riemann surfaces” is excellent and should be read by every student of number theory. I would also recommend the article by J.-B. Bost, entitled “Introduction to Compact Riemann Surfaces, Jacobians and Abelian Varieties.”
Index

A
Abelian functions, 19
Abel’s theorem, 273
Absolute value, 9
Algebraic topology, 293
Almost everywhere, 110
Alternating tensor, 93
Analytic, 198
Analytic continuation, 264
Ananda-Rau, K., 273
Antipodal point, 315
Argument principle, 239
Arithmetic mean geometric mean inequality, 73
Arithmetic progression topology, 299
Atiyah, M., 293
Axiom of choice, 136
Axiom of infinity, 5
Axiom of regularity, 5
Axiom schema, 3

B
Babylon, 10
Baire’s theorem, 133
Banach space, 127
Banach–Steinhaus theorem, 132, 134
Base, 103
Basel problem, 189
Base point, 300
Basis of open sets, 299
Bernoulli trials, 195
Bessel differential equation, 39
Bessel function of the first kind, 39
Bessel inequality, 120, 121
Beta function, 78, 266, 267
Bhaskaracharya, 42
Big O notation, 25
Bolzano, B., 19
Bolzano-Weierstrass theorem, 47
Borel, E., 49
Bott, R., 293
Bound variable, 3
Boundary, 97
Bounded, 12, 17
Bounded above, 17
Bounded below, 17
Bounded linear transformation, 130
Bounded sequence, 18, 47
Branch of the logarithm, 252
Brouwer fixed point theorem, 296, 311, 312
Brouwer, L.E.J., 293, 313
Buddha, 10, 41

C
Calabi, E., 80
Calculus of residues, 228
Cantor, G., 1
Carleson, L., 131
Cartesian product, 5
Casorati-Weierstrass theorem, 218
Cauchy, A., 13, 67
Cauchy-Goursat theorem, 208
Cauchy-Riemann equations, 198
Cauchy-Schwarz for integrals, 47
Cauchy-Schwarz inequality, 45, 72, 113, 128, 168
Cauchy sequence, 13, 15, 20, 40
Cauchy’s estimates, 220
Cauchy’s theorem, 225
Cauchy’s theorem for a convex set, 210, 211
Cauchy’s theorem for a triangle, 208
Central limit theorem, 192, 193
© Hindustan Book Agency 2022 331
M. R. Murty, A Second Course in Analysis, IMSc Lecture Notes in Mathematics,
https://doi.org/10.1007/978-981-16-7246-0
332 Index

Chain rule, 54, 59
Change of basis matrix, 94
Change of variables, 75
Characteristic function, 105
Chebotarev density theorem, 283
China, 10
Chudnovski, G., 188
Classes, 1
Classification of surfaces, 318, 319
Closed ball, 104
Closed form, 100
Closed path, 204
Closed rectangle, 48
Closed sets, 49, 103, 295
Closed subspace, 115
Compact-open topology, 312
Comparison test, 23
Complete, 114
Completeness, 16
Complex measurable function, 104
Complex measure, 104
Component functions, 60
Confucius, 10
Connected space, 296
Connected 2-manifolds, 319
Connected sum, 319
Conservative vector field, 60
Continuity, 21
Contractible space, 294
Contractible to a point, 305
Contraction, 48
Contraction mapping theorem, 48
Convergence, 15, 17
Convergent sequence, 17
Converges, 23
Convex, 115, 297
Convolution, 146
Convolution theorem, 158
Coprime, 11
Cosine law, 43
Countably additive, 104
Counting measure, 105
Covering map, 306
Covering spaces, 306
Covering system, 231
Covering transformations, 320
C^r, 31
Critical points, 53
Cross product, 46, 99
Curl, 60
Curvature, 50
Curve, 50, 204, 251
Cylindrical co-ordinates, 69

D
Deck transformations, 320
Dedekind cuts, 13, 17
Dedekind, R., 6, 13
Deformation retract, 304
Degenerate point, 58
Deligne, P., 294
Density function, 193
Derivative, 58
Derivative in a Banach space, 134
Differential forms, 92, 96
Differentiation under the integral sign, 152
Directional derivative, 52
Dirichlet kernel, 131, 183, 184
Dirichlet series, 246, 250, 272
Dirichlet’s theorem, 285
Div, 87
Divergence, 95
Divergence theorem, 87, 89
Diverges, 23
Domain, 6
Dominated convergence theorem, 109, 132, 150, 235
Dot product, 40
Dual basis, 93
Dual space, 93, 131, 138, 140, 142
du Bois-Reymond, P., 131

E
Eigenvalues, 73
Electromagnetism, 90
Ellipsoid, 74
Elliptic curve, 328
Equidistribution, 178
Equivalence relation, 7
Erdős, P., 273
Essentially bounded, 111
Essentially bounded functions, 132
Essential singularity, 215, 217
Essential supremum, 132
Euclidean space, 40
Euler, L., 126, 189
Euler-Maclaurin summation formula, 181
Euler’s constant, 26, 270
Evenly-covered, 306
Exact differential, 91
Exact form, 100
Existential proposition, 2
Existential quantifier, 2
Expectation, 193
Exponential type, 286
Extended real line, 104
Extensionality, 3
Exterior derivative, 97

F
Fatou’s lemma, 109
Fejér kernel, 172, 173, 175, 184
Feynman, R., 155
Fiber, 306
Finite order, 256
First category, 135
Fourier, J., 145
Fourier series, 182
Fourier transform, 118, 148
Fréchet derivative, 135
Freedman, M., 294
Free product, 314
Free variable, 3
Fubini, G., 145
Fubini’s theorem, 145
Function, 5
Functional, 131
Functional analysis, 138
Functor, 301
Fundamental group, 293, 300
Fundamental theorem of algebra, 220, 241, 310
Fundamental theorem of covering spaces, 327

G
Galois covering, 323
Gamma function, 78, 264
Gauss, C.F., 87
Gelfond-Schneider theorem, 188
Generalized mean value theorem, 39
Genus, 319
Global Cauchy theorem, 224
Goursat theorem, 208
Gradient, 53
Gram–Schmidt process, 122
Grassmann, Hermann, 43
Greatest lower bound, 12, 17
Green, G., 82
Green’s theorem, 176
Grothendieck, A., 294
Group action, 315

H
Hadamard factorization, 260, 270
Hadamard, J., 273
Hadamard three circle theorem, 222
Hahn–Banach theorem, 135, 136, 143
Hamilton, William Rowan, 43
Hardy, G.H., 273
Harmonic conjugates, 224
Harmonic functions, 198
Harmonic series, 26
Hausdorff maximality principle, 137
Hausdorff maximality theorem, 136
Heat equation, 190
Heine-Borel theorem, 49
Heine, H.E., 49
Heisenberg’s uncertainty principle, 170
Heisenberg, W., 169
Heron’s formula, 73
Hessian matrix, 57
Higher homotopy groups, 312
Hilbert space, 114
Hölder’s inequality, 129, 130, 141
Holomorphic, 198
Homeomorphism, 295
Homotopically equivalent, 293, 312
Homotopy, 251, 293, 296
Homotopy lifting theorem, 307
H-space, 305
Hurwitz, A., 176
Hurwitz’s theorem, 241

I
Identically distributed random variables, 193
Implicit function theorem, 67
Incompressible, 89
Increasing sequence, 18
Independent random variables, 193
Index of z, 204
India, 10
Infimum, 17
Infinite series, 23
Inner product, 40, 93
Inner product space, 112, 168
Integers, 8
Intermediate value theorem, 37
Intersection of sets, 4
Interval, 16
Invariance of domain, 313
Inverse function theorem, 62, 204
Inverse relation, 4
Inversion theorem, 159
Irrotational, 60
Isolated singularity, 214
Isometry, 120, 166
Isoperimetric inequality, 176

J
Jacobian determinant, 59
Jacobian matrix, 58, 60, 75
Jensen’s theorem, 256, 258

K
Kelvin, Lord, 82, see also W. Thomson
k-forms, 96, 97
Klein bottle, 318, 320
Kronecker delta function, 93
k-tensor, 92

L
Lagrange multiplier method, 70, 71
Lagrange, J.L., 67, 71
Lao-Tzu, 10, 41
Laplace’s equation, 198
Laurent series expansion, 215
Least upper bound, 12, 17
Least upper bound property, 12
Lebesgue integrable, 109
Lebesgue integral, 106, 107
Lebesgue’s dominated convergence theorem, 109, 152
Lebesgue’s lemma, 303, 307
Lebesgue’s monotone convergence theorem, 107
Lebesgue, H., 103
Lefschetz, S., 293
Left continuity, 21
Left derivative, 31
Legendre’s duplication formula, 267
Leibniz, G., 29, 67
Leibniz’s rule, 156
Lens space, 320
Level sets, 73
Lift, 306
Lifting criterion, 322
Lifting problem, 306
Lim inf, 19
Lim sup, 19
Linearly ordered, 136
Liouville’s theorem, 219
Little o notation, 26
Littlewood, J.E., 273
Local Cauchy theorem, 207
Locally path connected, 320
Logarithm, 251
Logarithmic derivative, 239
Loop, 299, 300
Loop space, 312
Lower bound, 12, 17

M
Madhava, 29
Matrix, 43
Maximum modulus principle, 218, 219
Maximum principle, 218
Maxwell equations, 90
Mean, 193
Mean value theorem, 39
Measurable function, 104
Measurable sets, 104
Measure space, 104
Meromorphic function, 228
Method of contradiction, 11
Metric, 16, 40
Metric spaces, 40
Minimum modulus theorem, 219
Minkowski’s inequality, 128
Mirsky, L., 232
Mittag-Leffler theorem, 262
Mixed partial derivative, 53
Möbius, A.F., 319
Möbius function, 282
Möbius strip, 318
Monodromy theorem, 309, 310
Monotone convergence theorem, 107, 167
Monotonic sequence, 18
Morera’s theorem, 212
Multilinear, 92

N
n-chains, 97
Neighborhood base, 103
Newman, D.J., 232
Newton, I., 29, 67
n-manifold, 295
Non-orientable, 319
Norm, 112
Normal covering, 323
Normed linear space, 127
Nowhere dense, 135
n-simplex, 305

O
One-to-one, 5
Open ball, 103
Open interval, 16
Open map, 243
Open mapping theorem, 242
Open rectangles, 48
Open sets, 103, 295
Orbit space, 315
Ordered pair, 4
Order of a function, 256
Order of an analytic function, 213
Order of a pole, 215
Orientable, 87, 319
Orientation, 94
Oriented, 87
Orthogonal, 115
Orthonormal sets, 118
Ostrogradski, M., 82

P
PageRank algorithm, 313
Paley-Wiener theorem, 285, 287
Parallelepiped, 78
Parallelogram law, 112
Parseval’s formula, 118
Parseval’s identity, 121
Partial derivatives, 52
Partially ordered set, 136
Partial sum, 23
Partial summation, 203
Path, 204
Path-connected, 251, 300
Path connectedness, 296
Path integral, 204
Path lifting theorem, 307
Peano, G., 6
Perelman, G., 294
Perron, O., 313
Peter-Weyl theorem, 283
Phragmén-Lindelöf theorem, 224, 256, 259
Picard, E., 218
Picard’s theorem, 218
Plancherel’s theorem, 166, 286
Plancherel transform, 166
Planck’s constant, 172
Plato, 41
Poincaré conjecture, 294
Poincaré, H., 293, 296
Pointwise convergence, 29
Poisson summation formula, 185, 186, 249
Polar co-ordinates, 69
Positive definite, 93
Positive measure, 104
Potential function, 60
Poussin, C. de la Vallée, 273
Power series, 35, 199
Power set axiom, 4
Prime number, 11
Prime number theorem, 272
Principal branch of logarithm, 252
Principal part, 215
Probability distribution, 192
Product measure, 145
Projective space, 315
Properly discontinuous, 316
Pseudo-metric, 40
Punctured ball, 104
Punctured disk, 214
Pythagoras, 10, 41
Pythagoras theorem, 115

Q
Quadratic form, 73
Quadric surfaces, 74
Quantifier, 2
Quotient topology, 315

R
Radius of convergence, 36, 199
Radon–Nikodym theorem, 111, 142
Random variable, 192
Range, 6
Ratio test, 24
Rational numbers, 9
Real measurable function, 104
Real measure, 104
Real numbers, 13
Region, 207
Regular covering, 323
Regular function, 198
Relation, 4
Relative topology, 295
Removable singularity, 214
Representable analytic functions, 199
Residue, 228
Residue theorem, 228
Retract, 304
Retraction, 304
Riemann, G.F.B., 190
Riemann-Lebesgue lemma, 164, 182, 184
Riemann sphere, 327
Riemann surfaces, 327
Riemann zeta function, 190, 245, 272
Riesz-Fischer theorem, 121, 126, 129
Riesz representation theorem, 116
Rig Veda, 43
Right continuity, 21
Right derivative, 31
Right hand rule, 46
Root test, 25, 36
Rotation, 60
Rouché’s theorem, 239, 240
Russell, B., 2
Russell paradox, 2, 3

S
Saddle points, 57, 58
Sample space, 192
Sato-Tate theorem, 283
Scalar field, 50
Schrödinger, E., 169
Schwartz, L., 169
Schwarz, H.A., 169
Schwarz’s lemma, 221, 224
Second category, 135
Seifert-van Kampen theorem, 314
Selberg, A., 273
Semilocally simply connected, 325
Sgn function, 93
Sheets, 306
σ-algebra, 104
Sign of a permutation, 93
Simple closed curve, 83
Simple function, 106
Simple pole, 230
Simply connected, 91, 251, 301
Singleton, 3
Singularities, 213
Singular n-cube, 97
Sink, 89
Smale, S., 294
Snell’s law, 62
Source, 89
Source-free, 89
Spherical co-ordinates, 69
Star-shaped, 301
Steklov Institute, 294
Stereographic projection, 301, 313
Stirling’s formula, 268
Stokes theorem, 60, 89, 92, 99
Strictly increasing, 18
Subsequence, 19
Subspace, 115
Successor function, 7
Sulva Sutras, 41
Sup norm, 34
Supremum, 17
Surface area of the hypersphere, 80
Surfaces, 319
Symmetric matrix, 73
Symmetric tensor, 93

T
Tangent space, 74, 95
Tangent vector, 50
Tauber, A., 273
Tauberian theorem, 272, 277
Taylor polynomial, 37
Taylor series, 36
Taylor’s theorem, 36
Taylor’s theorem for several variables, 54, 56
Tensor product, 92
Theta function, 191
Thomson, W., 82
Topological group, 302
Topological space, 103, 295
Topologist’s sine curve, 300
Torque, 46
Total differential, 59
Triangle inequality, 16, 40, 113
Trigonometric polynomial, 123, 172

U
Unbounded, 17
Uncertainty principle, 168
Uniform convergence, 30
Uniform distribution, 178
Uniformization theorem, 328
Uniformly bounded, 132, 134
Unit tangent vector, 50
Universal covering space, 325
Universal proposition, 2
Universal quantifier, 2
Upanishads, 42
Upper bound, 12, 17
Upper half-plane, 285

V
Variance, 193
Vector field, 50, 95
Vector-valued functions, 50
Velocity vector, 50
Volume element, 95
Volume of the hypersphere, 80
Von Mangoldt function, 280
Von Neumann–Gödel–Bernays set theory, 2

W
Wedge product, 94
Weierstrass M-test, 33
Weierstrass, K., 19, 173
Weil, A., 294
Well-defined, 8, 16
Weyl criterion, 178, 179
Wiener–Ikehara Tauberian theorem, 272
Winding number, 205
Wirtinger’s inequality, 178

Z
Zermelo–Fraenkel set theory, 2
Zeros and singularities, 213
Zorn’s lemma, 136
